TY - GEN
T1 - Quantifying the limited and gradual concept drift assumption
AU - Sarnelle, Joseph
AU - Sanchez, Anthony
AU - Capo, Robert
AU - Haas, Joshua
AU - Polikar, Robi
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2015/9/28
Y1 - 2015/9/28
AB - Nonstationary environments, where underlying distributions change over time, are becoming increasingly common in real-world applications. A specific example of such an environment is concept drift, where the joint probability distributions of observed data drift over time. Such environments call for a model that can update its parameters to adapt to the changing environment. An extreme case of this scenario, referred to as extreme verification latency, is where labeled data are only available at initialization, with unlabeled data becoming available in a streaming fashion thereafter. In such a scenario, the classifier must update its hypothesis based on only unlabeled data drawn from the drifting distributions. In our prior work, we described a framework, called COMPOSE, that works well in this type of environment, provided that the data distributions experience limited (or gradual) drift. The limited drift assumption is common in many concept drift algorithms, yet, surprisingly, there is little or no formal definition of this assumption. In this contribution, we describe a mechanism to formally quantify limited drift. We define two metrics: one that represents the normalized class separation drift, and another that uses the ratio of between-class separation to within-class drift through time. We test these metrics on both synthetic and real-world problems, and argue that the latter can be more suitably used.
UR - http://www.scopus.com/inward/record.url?scp=84945279499&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84945279499&partnerID=8YFLogxK
U2 - 10.1109/IJCNN.2015.7280850
DO - 10.1109/IJCNN.2015.7280850
M3 - Conference contribution
AN - SCOPUS:84945279499
T3 - Proceedings of the International Joint Conference on Neural Networks
BT - 2015 International Joint Conference on Neural Networks, IJCNN 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - International Joint Conference on Neural Networks, IJCNN 2015
Y2 - 12 July 2015 through 17 July 2015
ER -