Speaker identification based on the use of robust cepstral features obtained from pole-zero transfer functions

Mihailo S. Zilovic, Ravi Ramachandran, Richard J. Mammone

Research output: Contribution to journal (article)

33 Citations (Scopus)

Abstract

A common problem in speaker identification systems is that a mismatch between the training and testing conditions substantially degrades performance. We attempt to alleviate this problem by proposing new features that show less variation when speech is corrupted by convolutional noise (channel) and/or additive noise. The conventional feature used is the linear predictive (LP) cepstrum that is derived from an all-pole transfer function which, in turn, achieves a good approximation to the spectral envelope of the speech. Recently, a new cepstral feature based on a pole-zero function (called the adaptive component weighted or ACW cepstrum) was introduced. We propose four additional new cepstral features based on pole-zero transfer functions. One is an alternative way of doing adaptive component weighting and is called the ACW2 cepstrum. Two others (known as the PFL1 cepstrum and the PFL2 cepstrum) are based on a pole-zero postfilter used in speech enhancement. Finally, an autoregressive moving-average (ARMA) analysis of speech results in a pole-zero transfer function describing the spectral envelope. The cepstrum of this transfer function is the feature. Experiments involving a closed-set, text-independent, vector-quantizer-based speaker identification system are done to compare the various features. The TIMIT and King databases are used. The ACW and PFL1 features are the preferred features, since they perform as well as or better than the LP cepstrum for all the test conditions. The corresponding spectra show a clear emphasis of the formants and no spectral tilt. To enhance robustness, it is important to emphasize the formants. An accurate description of the spectral envelope is not required.
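
As a rough illustration of the kind of feature the abstract describes, the sketch below computes the cepstrum of a minimum-phase pole-zero transfer function H(z) = B(z)/A(z) using the standard polynomial-to-cepstrum recursion; the LP cepstrum is the all-pole special case B(z) = 1. The function names (`poly_cepstrum`, `pole_zero_cepstrum`, `lp_cepstrum`) and the coefficient convention (both polynomials monic, written as 1 + sum of c_k z^-k) are assumptions made for this example. It does not reproduce the ACW, ACW2, PFL1, PFL2 or ARMA constructions of the paper, which define specific choices of B(z) and A(z) not given in the abstract.

```python
import numpy as np


def poly_cepstrum(coeffs, n_ceps):
    """Cepstrum d[1..n_ceps] of P(z) = 1 + sum_k coeffs[k-1] * z^-k.

    Assumes P(z) is minimum phase (all roots inside the unit circle),
    so that log P(z) has a one-sided expansion in powers of z^-1.
    """
    p = np.zeros(n_ceps + 1)
    m = min(len(coeffs), n_ceps)
    p[1:m + 1] = np.asarray(coeffs, dtype=float)[:m]
    d = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        # Recursion: d_n = p_n - (1/n) * sum_{k=1}^{n-1} k * d_k * p_{n-k}
        acc = sum(k * d[k] * p[n - k] for k in range(1, n))
        d[n] = p[n] - acc / n
    return d[1:]


def pole_zero_cepstrum(b, a, n_ceps):
    """Cepstrum of H(z) = B(z)/A(z): log H = log B - log A."""
    return poly_cepstrum(b, n_ceps) - poly_cepstrum(a, n_ceps)


def lp_cepstrum(a, n_ceps):
    """LP cepstrum: the all-pole case H(z) = 1/A(z)."""
    return pole_zero_cepstrum([], a, n_ceps)


if __name__ == "__main__":
    # Single pole at z = 0.5: the cepstrum is 0.5**n / n.
    print(lp_cepstrum([-0.5], 4))
    # One zero at z = 0.2 and one pole at z = 0.5.
    print(pole_zero_cepstrum([-0.2], [-0.5], 4))
```

For a single pole at z = r the recursion gives c_n = r^n / n, which is a convenient sanity check; adding a zero at z = q subtracts q^n / n from each coefficient.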

Original language: English (US)
Pages (from-to): 260-267
Number of pages: 8
Journal: IEEE Transactions on Speech and Audio Processing
Volume: 6
Issue number: 3
DOI: 10.1109/89.668819
State: Published - Dec 1 1998

Fingerprint

transfer functions
poles
envelopes
system identification
Identification (control systems)
channel noise
autoregressive moving average
Speech enhancement
Additive noise
counters
education
augmentation
Testing
approximation
Experiments

All Science Journal Classification (ASJC) codes

  • Software
  • Acoustics and Ultrasonics
  • Computer Vision and Pattern Recognition
  • Electrical and Electronic Engineering

Cite this

@article{9a3ff3dd2dba4308bad606c2dc58be8a,
title = "Speaker identification based on the use of robust cepstral features obtained from pole-zero transfer functions",
abstract = "A common problem in speaker identification systems is that a mismatch in the training and testing conditions sacrifices much performance. We attempt to alleviate this problem by proposing new features that show less variation when speech is corrupted by convolutional noise (channel) and/or additive noise. The conventional feature used is the linear predictive (LP) cepstrum that is derived from an all-pole transfer function which, in turn, achieves a good approximation to the spectral envelope of the speech. Recently, a new cepstral feature based on a pole-zero function (called the adaptive component weighted or ACW cepstrum) was introduced. We propose four additional new cepstral features based on pole-zero transfer functions. One is an alternative way of doing adaptive component weighting and is called the ACW2 cepstrum. Two others (known as the PFL1 cepstrum and the PFL2 cepstrum) are based on a pole-zero postfilter used in speech enhancement. Finally, an autoregressive moving-average (ARMA) analysis of speech results in a pole-zero transfer function describing the spectral envelope. The cepstrum of this transfer function is the feature. Experiments involving a closed set, text-independent and vector quantizer based speaker identification system are done to compare the various features. The TIMIT and King databases are used. The ACW and PFL1 features are the preferred features, since they do as well or better than the LP cepstrum for all the test conditions. The corresponding spectra show a clear emphasis of the formants and no spectral tilt. To enhance robustness, it is important to emphasize the formants. An accurate description of the spectral envelope is not required.",
author = "Zilovic, {Mihailo S.} and Ramachandran, Ravi and Mammone, {Richard J.}",
year = "1998",
month = "12",
day = "1",
doi = "10.1109/89.668819",
language = "English (US)",
volume = "6",
pages = "260--267",
journal = "IEEE Transactions on Speech and Audio Processing",
issn = "1063-6676",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "3",
}

Speaker identification based on the use of robust cepstral features obtained from pole-zero transfer functions. / Zilovic, Mihailo S.; Ramachandran, Ravi; Mammone, Richard J.

In: IEEE Transactions on Speech and Audio Processing, Vol. 6, No. 3, 01.12.1998, p. 260-267.

TY - JOUR

T1 - Speaker identification based on the use of robust cepstral features obtained from pole-zero transfer functions

AU - Zilovic, Mihailo S.

AU - Ramachandran, Ravi

AU - Mammone, Richard J.

PY - 1998/12/1

Y1 - 1998/12/1

AB - A common problem in speaker identification systems is that a mismatch in the training and testing conditions sacrifices much performance. We attempt to alleviate this problem by proposing new features that show less variation when speech is corrupted by convolutional noise (channel) and/or additive noise. The conventional feature used is the linear predictive (LP) cepstrum that is derived from an all-pole transfer function which, in turn, achieves a good approximation to the spectral envelope of the speech. Recently, a new cepstral feature based on a pole-zero function (called the adaptive component weighted or ACW cepstrum) was introduced. We propose four additional new cepstral features based on pole-zero transfer functions. One is an alternative way of doing adaptive component weighting and is called the ACW2 cepstrum. Two others (known as the PFL1 cepstrum and the PFL2 cepstrum) are based on a pole-zero postfilter used in speech enhancement. Finally, an autoregressive moving-average (ARMA) analysis of speech results in a pole-zero transfer function describing the spectral envelope. The cepstrum of this transfer function is the feature. Experiments involving a closed set, text-independent and vector quantizer based speaker identification system are done to compare the various features. The TIMIT and King databases are used. The ACW and PFL1 features are the preferred features, since they do as well or better than the LP cepstrum for all the test conditions. The corresponding spectra show a clear emphasis of the formants and no spectral tilt. To enhance robustness, it is important to emphasize the formants. An accurate description of the spectral envelope is not required.

UR - http://www.scopus.com/inward/record.url?scp=0032075135&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0032075135&partnerID=8YFLogxK

U2 - 10.1109/89.668819

DO - 10.1109/89.668819

M3 - Article

AN - SCOPUS:0032075135

VL - 6

SP - 260

EP - 267

JO - IEEE Transactions on Speech and Audio Processing

JF - IEEE Transactions on Speech and Audio Processing

SN - 1063-6676

IS - 3

ER -