TY - GEN
T1 - Non-stationary analysis of DNA sequences
AU - Bouaynaya, Nidhal
AU - Schonfeld, Dan
PY - 2007
Y1 - 2007
N2 - Previous searches for long-range correlations in DNA sequences were carried out using statistical tools for stationary signals. However, genomic signals are non-stationary, as standard statistical tests for stationarity attest. In this paper, we address, in the light of non-stationary time-series analysis, the questions of (i) whether long-range correlations exist in DNA sequences and (ii) whether they are present in both coding and non-coding segments or only in the latter. It turns out that the statistical differences between coding and non-coding segments are more subtle than previously claimed on the basis of stationary analysis. Both coding and non-coding sequences exhibit long-range correlations, as evidenced by an evolutionary 1/f spectrum (i.e., one with a time-dependent spectral exponent). Moreover, the average spectral exponent of non-coding segments is higher than that of coding segments. To show that this observation is not an artifact of the evolutionary 1/f spectrum, we use an index of randomness, derived from the time-frequency distribution of the genomic signals, to demonstrate that coding sequences are "more random" (i.e., whiter) than non-coding sequences. We believe that this result is likely the source of the confusion and controversy in previous work, which relied on stationary analysis of DNA correlations.
AB - Previous searches for long-range correlations in DNA sequences were carried out using statistical tools for stationary signals. However, genomic signals are non-stationary, as standard statistical tests for stationarity attest. In this paper, we address, in the light of non-stationary time-series analysis, the questions of (i) whether long-range correlations exist in DNA sequences and (ii) whether they are present in both coding and non-coding segments or only in the latter. It turns out that the statistical differences between coding and non-coding segments are more subtle than previously claimed on the basis of stationary analysis. Both coding and non-coding sequences exhibit long-range correlations, as evidenced by an evolutionary 1/f spectrum (i.e., one with a time-dependent spectral exponent). Moreover, the average spectral exponent of non-coding segments is higher than that of coding segments. To show that this observation is not an artifact of the evolutionary 1/f spectrum, we use an index of randomness, derived from the time-frequency distribution of the genomic signals, to demonstrate that coding sequences are "more random" (i.e., whiter) than non-coding sequences. We believe that this result is likely the source of the confusion and controversy in previous work, which relied on stationary analysis of DNA correlations.
UR - http://www.scopus.com/inward/record.url?scp=47849117917&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=47849117917&partnerID=8YFLogxK
U2 - 10.1109/SSP.2007.4301247
DO - 10.1109/SSP.2007.4301247
M3 - Conference contribution
AN - SCOPUS:47849117917
SN - 142441198X
SN - 9781424411986
T3 - IEEE Workshop on Statistical Signal Processing Proceedings
SP - 200
EP - 204
BT - 2007 IEEE/SP 14th Workshop on Statistical Signal Processing, SSP 2007, Proceedings
T2 - 2007 IEEE/SP 14th Workshop on Statistical Signal Processing, SSP 2007
Y2 - 26 August 2007 through 29 August 2007
ER -