Cochannel speaker count labelling based on the use of cepstral and pitch prediction derived features

Michael A. Lewis, Ravi P. Ramachandran

Research output: Contribution to journal › Article › peer-review

8 Scopus citations

Abstract

Cochannel interference of speech signals is a common practical problem, particularly in tactical communications. Ideally, separation of the individual speech signals is desired. However, it is known that when two equal bandwidth signals are added, such a separation is not possible. We examine the problem of identifying temporal regions or frames as being either one-speaker or two-speaker speech. This identification is important in making automatic speaker and speech recognition systems more robust and is based on feature extraction and subsequent classification, as is done in pattern recognition. The research has looked into both the closed-set problem, where the identities of the two interfering speakers are known a priori, and the more difficult open-set problem, where the identities are not known (speaker independent). For the feature extraction step, we propose a new pitch prediction feature (PPF), which is compared with the linear predictive cepstral coefficients (LPCC) and the mel frequency cepstral coefficients (MFCC). The features are computed and classified on a frame-by-frame basis. We compare the performance of two classifiers, namely, the neural tree network (NTN) and the vector quantizer (VQ). The results show that in both the closed- and open-set cases, (1) the VQ is the better classifier and (2) the PPF outperforms both the MFCC and LPCC features. The superiority of the PPF comes with the added benefits of using a scalar feature, as opposed to the 12-dimensional vectorial LPCC and MFCC features, and a lower VQ codebook size.
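The frame-by-frame VQ classification mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes one scalar feature per frame, one codebook per class trained with plain Lloyd (k-means) iterations, and a minimum-distortion decision rule. The feature values are synthetic placeholders rather than real PPF values, and the function names, codebook size, and class labels are hypothetical.

```python
import random


def train_codebook(values, size, iters=20, seed=0):
    """Train a scalar VQ codebook with Lloyd (k-means) iterations."""
    rng = random.Random(seed)
    codebook = rng.sample(values, size)  # initialise from the training data
    for _ in range(iters):
        cells = [[] for _ in range(size)]
        for v in values:
            nearest = min(range(size), key=lambda i: (v - codebook[i]) ** 2)
            cells[nearest].append(v)
        # Move each codeword to the mean of its cell; keep it if the cell is empty.
        codebook = [sum(c) / len(c) if c else codebook[i]
                    for i, c in enumerate(cells)]
    return sorted(codebook)


def distortion(value, codebook):
    """Squared error between a frame's feature and its nearest codeword."""
    return min((value - c) ** 2 for c in codebook)


def classify_frame(value, codebooks):
    """Pick the class whose codebook quantizes the frame with least distortion."""
    return min(codebooks, key=lambda label: distortion(value, codebooks[label]))


# Synthetic scalar features (assumed values, for illustration only).
rng = random.Random(1)
one_speaker_train = [rng.gauss(0.2, 0.05) for _ in range(200)]
two_speaker_train = [rng.gauss(0.8, 0.05) for _ in range(200)]

codebooks = {
    "one-speaker": train_codebook(one_speaker_train, size=4),
    "two-speaker": train_codebook(two_speaker_train, size=4),
}

# Label two unseen frames, one near each class.
labels = [classify_frame(v, codebooks) for v in (0.18, 0.83)]
```

With a scalar feature, each codeword is a single number, so both training and classification reduce to one-dimensional nearest-neighbour searches. The same decision rule would apply to 12-dimensional cepstral vectors with a vector distance in place of the scalar squared error.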

Original language: English (US)
Pages (from-to): 499-507
Number of pages: 9
Journal: Pattern Recognition
Volume: 34
Issue number: 2
State: Published - Feb 2001

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence
