TY - GEN
T1 - Resampling Techniques for Learning under Extreme Verification Latency with Class Imbalance
AU - Frederickson, Christopher
AU - Polikar, Robi
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/10/10
Y1 - 2018/10/10
N2 - A common, yet rarely addressed, real-world problem in computational intelligence applications is learning from nonstationary streaming data, where the underlying distribution of the data changes over time. This problem, also referred to as concept drift, is made even more challenging if, after the learner initially receives a small set of labeled data, the streaming data consist only of unlabeled data, requiring the learner to adapt to a changing underlying distribution without the benefit of labeled data. This particular scenario is typically referred to as learning in initially labeled nonstationary environments, or as extreme verification latency (EVL), pointing to the fact that the label verification of the test data is indefinitely delayed. In our prior work, we have noted that current EVL algorithms - including the algorithm COMPOSE that we have developed - are largely unable to track changing distributions if the data drawn from those distributions are even mildly imbalanced. In this work, we integrate COMPOSE with 13 different resampling-based modified algorithms, and compare accuracy, F1 score, and execution time. The results differed from what we originally expected and provided unique insight into how to choose a data rebalancing approach for different types of drift.
UR - http://www.scopus.com/inward/record.url?scp=85056488819&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85056488819&partnerID=8YFLogxK
U2 - 10.1109/IJCNN.2018.8489622
DO - 10.1109/IJCNN.2018.8489622
M3 - Conference contribution
AN - SCOPUS:85056488819
T3 - Proceedings of the International Joint Conference on Neural Networks
BT - 2018 International Joint Conference on Neural Networks, IJCNN 2018 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 International Joint Conference on Neural Networks, IJCNN 2018
Y2 - 8 July 2018 through 13 July 2018
ER -