TY - GEN
T1 - Speech based emotion recognition using spectral feature extraction and an ensemble of kNN classifiers
AU - Rieger, Steven A.
AU - Muraleedharan, Rajani
AU - Ramachandran, Ravi P.
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2014/10/24
Y1 - 2014/10/24
AB - Security (and cyber security) is an important issue in existing and developing technology. It is imperative that cyber security go beyond password based systems to avoid criminal activities. A human biometric and emotion based recognition framework implemented in parallel can enable applications to access personal or public information securely. The focus of this paper is on the study of speech based emotion recognition using a pattern recognition paradigm with spectral feature extraction and an ensemble of k nearest neighbor (kNN) classifiers. The five spectral features are the linear predictive cepstrum (CEP), mel frequency cepstrum (MFCC), line spectral frequencies (LSF), adaptive component weighted cepstrum (ACW) and the post-filter cepstrum (PFL). The bagging algorithm is used to train the ensemble of kNNs. Fusion is implicitly accomplished by ensemble classification. The LDC emotional prosody speech database is used in all the experiments. Results show that the maximum gain in performance is achieved by using two kNNs as opposed to using a single kNN.
UR - http://www.scopus.com/inward/record.url?scp=84912074169&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84912074169&partnerID=8YFLogxK
U2 - 10.1109/ISCSLP.2014.6936711
DO - 10.1109/ISCSLP.2014.6936711
M3 - Conference contribution
AN - SCOPUS:84912074169
T3 - Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, ISCSLP 2014
SP - 589
EP - 593
BT - Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, ISCSLP 2014
A2 - Zheng, Thomas Fang
A2 - Li, Haizhou
A2 - Dong, Minghui
A2 - Tao, Jianhua
A2 - Lu, Yanfeng
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 9th International Symposium on Chinese Spoken Language Processing, ISCSLP 2014
Y2 - 12 September 2014 through 14 September 2014
ER -