A martingale framework for concept change detection in time-varying data streams

Research output: Chapter in Book/Report/Conference proceedingConference contribution

39 Scopus citations

Abstract

In a data streaming setting, data points are observed one by one. The concepts to be learned from the data points may change infinitely often as the data is streaming. In this paper, we extend the idea of testing exchangeability online (Vovk et al., 2003) to a martingale framework to detect concept changes in time-varying data streams. Two martingale tests are developed to detect concept changes using: (i) martingale values, a direct consequence of the Doob's Maximal Inequality, and (ii) the martingale difference, justified using the Hoeffding-Azuma Inequality. Under some assumptions, the second test theoretically has a lower probability than the first test of rejecting the null hypothesis, "no concept change in the data stream", when it is in fact correct. Experiments show that both martingale tests are effective in detecting concept changes in time-varying data streams simulated using two synthetic data sets and three benchmark data sets.

Original languageEnglish (US)
Title of host publicationICML 2005 - Proceedings of the 22nd International Conference on Machine Learning
EditorsL. Raedt, S. Wrobel
Pages321-328
Number of pages8
Publication statusPublished - Dec 1 2005
EventICML 2005: 22nd International Conference on Machine Learning - Bonn, Germany
Duration: Aug 7 2005Aug 11 2005

Publication series

NameICML 2005 - Proceedings of the 22nd International Conference on Machine Learning

Other

OtherICML 2005: 22nd International Conference on Machine Learning
CountryGermany
CityBonn
Period8/7/058/11/05

    Fingerprint

All Science Journal Classification (ASJC) codes

  • Engineering(all)

Cite this

Ho, S. S. (2005). A martingale framework for concept change detection in time-varying data streams. In L. Raedt, & S. Wrobel (Eds.), ICML 2005 - Proceedings of the 22nd International Conference on Machine Learning (pp. 321-328). (ICML 2005 - Proceedings of the 22nd International Conference on Machine Learning).