Learning from streaming data with concept drift and imbalance: An overview

T. Ryan Hoens, Robi Polikar, Nitesh V. Chawla

Research output: Contribution to journal, Review article, peer-review

152 Scopus citations

Abstract

The primary focus of machine learning has traditionally been on learning from data assumed to be sufficient and representative of the underlying fixed, yet unknown, distribution. Such restrictions on the problem domain paved the way for the development of elegant algorithms with theoretically provable performance guarantees. As is often the case, however, real-world problems rarely fit neatly into such restricted models. For instance, class distributions are often skewed, resulting in the "class imbalance" problem. Data drawn from non-stationary distributions is also common in real-world applications, resulting in the "concept drift" or "non-stationary learning" problem, which is often associated with streaming data scenarios. Recently, these problems have independently received increased research attention; however, the combined problem of addressing all of the above-mentioned issues has received relatively little. If the ultimate goal of intelligent machine learning algorithms is to address a wide spectrum of real-world scenarios, then the need for a general framework for learning from, and adapting to, a non-stationary environment that may introduce imbalanced data can hardly be overstated. In this paper, we first present an overview of each of these challenging areas, followed by a comprehensive review of recent research toward developing such a general framework.

Original language: English (US)
Pages (from-to): 89-101
Number of pages: 13
Journal: Progress in Artificial Intelligence
Volume: 1
Issue number: 1
State: Published - Apr 2012

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
