Progress in speech-related tasks depends on the quality of the speech signal being processed. While much progress has been made in various aspects of speech processing (including, but not limited to, speech recognition, language detection, and speaker diarization), the effect of enhancing a noise-corrupted speech signal on those tasks has not been rigorously evaluated. Speech enhancement aims to improve the signal-to-noise ratio of a noise-corrupted signal by boosting the speech elements (signal) and reducing the non-speech ones (noise). Speech enhancement techniques are evaluated using metrics that are either subjective (asking people their opinion of the enhanced signal) or objective (computed from the signal itself). The subjective measures are better indicators of improved quality but do not scale well to large datasets; the objective metrics have mostly been constructed to approximate the subjective results. Our goal in this work is to establish a benchmark for assessing the improvement from speech enhancement as it relates to the downstream task of automated speech recognition. In doing so, we retain the qualities of subjective measures while ensuring that evaluation can be done at a large scale in an automated fashion. We explore the impact of various noise types, including stationary noise, non-stationary noise, and a shift in noise distribution. We find that existing objective metrics are not strong indicators of improvement on a downstream task. As such, we believe that Word Error Rate should be used when the downstream task is automated speech recognition.
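
As a point of reference for the recommended metric, Word Error Rate is conventionally defined as the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and an ASR hypothesis, divided by the number of reference words. The sketch below is illustrative only (the function name and inputs are our own, not part of the benchmark); practical evaluations typically apply text normalization before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance between the
    reference and hypothesis, normalized by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )
    return dp[len(ref)][len(hyp)] / len(ref)

# A single dropped word out of six reference words gives WER = 1/6.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```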