TY - GEN

T1 - A martingale framework for concept change detection in time-varying data streams

AU - Ho, Shen Shyang

PY - 2005

Y1 - 2005

N2 - In a data streaming setting, data points are observed one by one. The concepts to be learned from the data points may change infinitely often as the data is streaming. In this paper, we extend the idea of testing exchangeability online (Vovk et al., 2003) to a martingale framework to detect concept changes in time-varying data streams. Two martingale tests are developed to detect concept changes using: (i) martingale values, a direct consequence of the Doob's Maximal Inequality, and (ii) the martingale difference, justified using the Hoeffding-Azuma Inequality. Under some assumptions, the second test theoretically has a lower probability than the first test of rejecting the null hypothesis, "no concept change in the data stream", when it is in fact correct. Experiments show that both martingale tests are effective in detecting concept changes in time-varying data streams simulated using two synthetic data sets and three benchmark data sets.

AB - In a data streaming setting, data points are observed one by one. The concepts to be learned from the data points may change infinitely often as the data is streaming. In this paper, we extend the idea of testing exchangeability online (Vovk et al., 2003) to a martingale framework to detect concept changes in time-varying data streams. Two martingale tests are developed to detect concept changes using: (i) martingale values, a direct consequence of the Doob's Maximal Inequality, and (ii) the martingale difference, justified using the Hoeffding-Azuma Inequality. Under some assumptions, the second test theoretically has a lower probability than the first test of rejecting the null hypothesis, "no concept change in the data stream", when it is in fact correct. Experiments show that both martingale tests are effective in detecting concept changes in time-varying data streams simulated using two synthetic data sets and three benchmark data sets.

UR - http://www.scopus.com/inward/record.url?scp=31844442669&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=31844442669&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:31844442669

SN - 1595931805

T3 - ICML 2005 - Proceedings of the 22nd International Conference on Machine Learning

SP - 321

EP - 328

BT - ICML 2005 - Proceedings of the 22nd International Conference on Machine Learning

A2 - Raedt, L.

A2 - Wrobel, S.

T2 - ICML 2005: 22nd International Conference on Machine Learning

Y2 - 7 August 2005 through 11 August 2005

ER -