TY - GEN
T1 - The affine transform and feature fusion for robust speaker identification in the presence of speech coding distortion
AU - Mudrowsky, Robert W.
AU - Ramachandran, Ravi P.
AU - Shetty, Sachin S.
PY - 2010
Y1 - 2010
N2 - For security in wireless, voice over IP and cellular telephony applications, there is an emerging need for speaker identification (SID) systems to be robust to speech coding distortion. This paper examines the robustness issue for the 8 kilobits/second ITU-T G.729 codec. The SID system is trained on clean speech and tested on the decoded speech of the G.729 codec. To mitigate the performance loss due to mismatched training and testing conditions, five features are considered and two approaches are used. Four of the five features are based on linear prediction analysis and the other is the mel frequency cepstrum. The first method is feature compensation based on the affine transform and is used to map the features from the test scenario to the training scenario. The second method is feature fusion based on the arithmetic combination of probabilities generated by the vector quantizer classifier. Combining the affine transform with the fusion of four features gives the best identification success rate (ISR) of 83.2%. The best performing single feature achieves an ISR of 70.5% without the affine transform and 77.4% with the affine transform.
AB - For security in wireless, voice over IP and cellular telephony applications, there is an emerging need for speaker identification (SID) systems to be robust to speech coding distortion. This paper examines the robustness issue for the 8 kilobits/second ITU-T G.729 codec. The SID system is trained on clean speech and tested on the decoded speech of the G.729 codec. To mitigate the performance loss due to mismatched training and testing conditions, five features are considered and two approaches are used. Four of the five features are based on linear prediction analysis and the other is the mel frequency cepstrum. The first method is feature compensation based on the affine transform and is used to map the features from the test scenario to the training scenario. The second method is feature fusion based on the arithmetic combination of probabilities generated by the vector quantizer classifier. Combining the affine transform with the fusion of four features gives the best identification success rate (ISR) of 83.2%. The best performing single feature achieves an ISR of 70.5% without the affine transform and 77.4% with the affine transform.
UR - http://www.scopus.com/inward/record.url?scp=79959240178&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79959240178&partnerID=8YFLogxK
U2 - 10.1109/APCCAS.2010.5774905
DO - 10.1109/APCCAS.2010.5774905
M3 - Conference contribution
AN - SCOPUS:79959240178
SN - 9781424474561
T3 - IEEE Asia-Pacific Conference on Circuits and Systems, Proceedings, APCCAS
SP - 1063
EP - 1066
BT - Proceedings of the 2010 Asia Pacific Conference on Circuit and System, APCCAS 2010
T2 - 2010 Asia Pacific Conference on Circuit and System, APCCAS 2010
Y2 - 6 December 2010 through 9 December 2010
ER -