The affine transform and feature fusion for robust speaker identification in the presence of speech coding distortion

Robert W. Mudrowsky, Ravi Ramachandran, Sachin S. Shetty

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

For security in wireless, voice over IP and cellular telephony applications, there is an emerging need for speaker identification systems (SID) to be robust to speech coding distortion. This paper examines the robustness issue for the 8 kilobits/second ITU-T G.729 codec. The SID system is trained on clean speech and tested on the decoded speech of the G.729 codec. To mitigate the performance loss due to mismatched training and testing conditions, five features are considered and two approaches are used. Four of the five features are based on linear prediction analysis and the other is the mel frequency cepstrum. The first method is feature compensation based on the affine transform and is used to map the features from the test scenario to the train scenario. The second method is feature fusion based on the arithmetic combination of probabilities generated by the vector quantizer classifier. The affine transform and fusion of four features gives the best identification success rate (ISR) of 83.2%. The best performing single feature achieves an ISR of 70.5% without the affine transform and 77.4% with the affine transform.
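The abstract describes two techniques without giving equations: an affine transform that maps decoded-speech features toward clean-speech features, and an arithmetic (sum-rule style) fusion of the classifier probabilities from each feature stream. The sketch below is an illustrative reading of those ideas only, not the paper's actual method: it assumes a least-squares fit of the affine map from paired feature vectors, and the function names (`fit_affine`, `apply_affine`, `fuse_scores`) are hypothetical.

```python
import numpy as np

def fit_affine(X_test, X_train):
    """Least-squares estimate of (A, b) such that A @ x_test + b ~ x_train.

    X_test, X_train: (n_frames, dim) paired feature matrices
    (e.g. features from G.729-decoded speech vs. clean speech).
    """
    d = X_test.shape[1]
    # Augment with a column of ones so the bias b is fit jointly with A.
    Xa = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
    W, *_ = np.linalg.lstsq(Xa, X_train, rcond=None)
    return W[:d].T, W[d]          # A is (d, d), b is (d,)

def apply_affine(A, b, X):
    """Map test-condition features toward the training condition."""
    return X @ A.T + b

def fuse_scores(prob_list):
    """Arithmetic (average) fusion of per-feature speaker probabilities.

    prob_list: list of (n_speakers,) probability vectors, one per feature.
    """
    return np.mean(np.stack(prob_list), axis=0)
```

In this reading, the affine map is learned offline from development data where both clean and codec-distorted versions of the same utterances are available, and the fused score vector is simply arg-maxed to pick the identified speaker.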

Original language: English (US)
Title of host publication: Proceedings of the 2010 Asia Pacific Conference on Circuit and System, APCCAS 2010
Pages: 1063-1066
Number of pages: 4
DOIs
State: Published - Dec 1 2010
Event: 2010 Asia Pacific Conference on Circuit and System, APCCAS 2010 - Kuala Lumpur, Malaysia
Duration: Dec 6 2010 - Dec 9 2010

Other

Other: 2010 Asia Pacific Conference on Circuit and System, APCCAS 2010
Country: Malaysia
City: Kuala Lumpur
Period: 12/6/10 - 12/9/10

All Science Journal Classification (ASJC) codes

  • Electrical and Electronic Engineering
