TY - GEN
T1 - Rank-based frame classification for usable speech detection in speaker identification systems
AU - Ethridge, James
AU - Ramachandran, Ravi P.
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2015/9/9
Y1 - 2015/9/9
AB - The performance of a speaker identification (SID) system degrades substantially when there is a mismatch between the training and testing conditions. Discriminating between temporal sections of speech signals which are speech-like (SID usable) and noise-like (SID unusable) while only retaining frames labeled SID usable can augment SID performance substantially. In this paper, a novel labeling system for SID usable and SID unusable frames is presented for a GMM-based SID system. This is motivated by a control experiment demonstrating that very high SID accuracies are theoretically achievable by removing frames that contribute more to the scores of competing speakers than to that of the true speaker. To blindly identify these SID usable and unusable frames, the Mahalanobis distance and an ensemble of decision tree classifiers (with boosting) were trained on a dataset which was different from the enrollment database for the SID system. The classifier-based techniques yielded improvements over the base speaker identification system (all frames used) in all cases when the speech signal was corrupted with additive white or additive pink noise.
UR - http://www.scopus.com/inward/record.url?scp=84961329201&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84961329201&partnerID=8YFLogxK
U2 - 10.1109/ICDSP.2015.7251878
DO - 10.1109/ICDSP.2015.7251878
M3 - Conference contribution
AN - SCOPUS:84961329201
T3 - International Conference on Digital Signal Processing, DSP
SP - 292
EP - 296
BT - 2015 IEEE International Conference on Digital Signal Processing, DSP 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - IEEE International Conference on Digital Signal Processing, DSP 2015
Y2 - 21 July 2015 through 24 July 2015
ER -