TY - GEN
T1 - Robust speaker identification under noisy conditions using feature compensation and signal to noise ratio estimation
AU - Frankle, Megan N.
AU - Ramachandran, Ravi P.
PY - 2016/7/2
Y1 - 2016/7/2
N2 - For wireless remote access security, forensics, electronic commerce and surveillance applications, there is a growing need for biometric speaker identification systems that are robust to noise. This paper examines robustness to additive white noise at signal-to-noise ratios ranging from 0 to 30 dB. A Gaussian mixture model classifier based on adaptation of a universal background model is used. The system is trained on clean speech and tested on clean and noisy speech. To mitigate the performance loss due to mismatched training and testing conditions, five robust features, feature compensation and decision-level fusion strategies are used. The feature compensation is based on blind estimation of the signal-to-noise ratio of the test speech and the selection of an affine transform from a repertoire. A two-way analysis of variance compares the experimental scenarios (benchmark, control and practical) and the individual features/fusion at each signal-to-noise ratio. The practical scenario is always statistically better than the benchmark and sometimes equivalent to the control scenario.
AB - For wireless remote access security, forensics, electronic commerce and surveillance applications, there is a growing need for biometric speaker identification systems that are robust to noise. This paper examines robustness to additive white noise at signal-to-noise ratios ranging from 0 to 30 dB. A Gaussian mixture model classifier based on adaptation of a universal background model is used. The system is trained on clean speech and tested on clean and noisy speech. To mitigate the performance loss due to mismatched training and testing conditions, five robust features, feature compensation and decision-level fusion strategies are used. The feature compensation is based on blind estimation of the signal-to-noise ratio of the test speech and the selection of an affine transform from a repertoire. A two-way analysis of variance compares the experimental scenarios (benchmark, control and practical) and the individual features/fusion at each signal-to-noise ratio. The practical scenario is always statistically better than the benchmark and sometimes equivalent to the control scenario.
UR - http://www.scopus.com/inward/record.url?scp=85015916196&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85015916196&partnerID=8YFLogxK
U2 - 10.1109/MWSCAS.2016.7869973
DO - 10.1109/MWSCAS.2016.7869973
M3 - Conference contribution
T3 - Midwest Symposium on Circuits and Systems
BT - 2016 IEEE 59th International Midwest Symposium on Circuits and Systems, MWSCAS 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 59th IEEE International Midwest Symposium on Circuits and Systems, MWSCAS 2016
Y2 - 16 October 2016 through 19 October 2016
ER -