Abstract
For security in wireless, voice over IP, and cellular telephony applications, there is an emerging need for speaker identification (SID) systems to be robust to speech coding distortion. This paper examines the robustness issue for the 8 kilobits/second ITU-T G.729 codec. The SID system is trained on clean speech and tested on speech decoded by the G.729 codec. To mitigate the performance loss due to mismatched training and testing conditions, five features are considered and two approaches are used. Four of the five features are based on linear prediction analysis, and the fifth is the mel frequency cepstrum. The first approach is feature compensation based on the affine transform, which maps the features from the test condition to the training condition. The second approach is feature fusion based on the arithmetic combination of probabilities generated by the vector quantizer classifier. Combining the affine transform with the fusion of four features gives the best identification success rate (ISR) of 83.2%. The best-performing single feature achieves an ISR of 70.5% without the affine transform and 77.4% with it.
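The two approaches named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the affine transform is estimated or which arithmetic combination is used, so everything here is an assumption: the transform is fitted by ordinary least squares on frame-aligned pairs of decoded and clean feature vectors, the per-speaker probabilities from each feature's classifier are taken as given, and fusion is a plain average. The function names `fit_affine`, `apply_affine`, and `fuse_probabilities` are hypothetical.

```python
import numpy as np


def fit_affine(decoded_feats, clean_feats):
    """Least-squares fit of (A, b) so that clean ≈ decoded @ A.T + b.

    Assumption: rows of the two arrays are frame-aligned pairs of the
    same feature computed on decoded (test) and clean (train) speech.
    """
    n_frames = decoded_feats.shape[0]
    # Append a column of ones so the bias b is estimated jointly with A.
    augmented = np.hstack([decoded_feats, np.ones((n_frames, 1))])
    weights, *_ = np.linalg.lstsq(augmented, clean_feats, rcond=None)
    A = weights[:-1].T  # linear part, shape (dim, dim)
    b = weights[-1]     # bias, shape (dim,)
    return A, b


def apply_affine(feats, A, b):
    """Map test-condition features toward the training condition."""
    return feats @ A.T + b


def fuse_probabilities(probs_per_feature):
    """Average per-speaker probabilities across features.

    probs_per_feature has shape (n_features, n_speakers). Averaging is
    an assumed form of "arithmetic combination"; returns the index of
    the winning speaker and the fused scores.
    """
    fused = np.mean(probs_per_feature, axis=0)
    return int(np.argmax(fused)), fused
```

In use, `fit_affine` would be run once on paired development data, `apply_affine` would be applied to each test utterance's features before classification, and `fuse_probabilities` would combine the resulting scores from the selected features into one decision.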
Original language | English (US) |
---|---|
Title of host publication | Proceedings of the 2010 Asia Pacific Conference on Circuit and System, APCCAS 2010 |
Pages | 1063-1066 |
Number of pages | 4 |
State | Published - Dec 1 2010 |
Event | 2010 Asia Pacific Conference on Circuit and System, APCCAS 2010 - Kuala Lumpur, Malaysia; Duration: Dec 6 2010 → Dec 9 2010 |
Other
Other | 2010 Asia Pacific Conference on Circuit and System, APCCAS 2010 |
---|---|
Country/Territory | Malaysia |
City | Kuala Lumpur |
Period | 12/6/10 → 12/9/10 |
All Science Journal Classification (ASJC) codes
- Electrical and Electronic Engineering