A generative model with network regularization for semi-supervised collective classification

Ruichao Shi, Qingyao Wu, Yunming Ye, Shen Shyang Ho

Research output: Chapter in Book/Report/Conference proceedingConference contribution

5 Scopus citations

Abstract

In recent years much effort has been devoted to Collective Classification (CC) techniques for predicting labels of linked instances. Given a large number of labeled data, conventional CC algorithms make use of local labeled neighbours to increase accuracy. However, in many real-world applications, labeled data are limited and very expensive to obtain. In this situation, most of the data have no connection to labeled data, and supervision knowledge cannot be obtained from the local connections. Recently, Semi-Supervised Collective Classification (SSCC) has been examined to leverage unlabeled data for enhancing the classification performance of CC. In this paper we propose a probabilistic generative model with network regularization (GMNR) for SSCC. Our main idea is to compute label probability distributions for unlabeled instances by maximizing both the log-likelihood in the generative model and the label smoothness on the network topology of data. The proposed generative model is based on the Probabilistic Latent Semantic Analysis (PLSA) method using attribute features of all instances. A network regularizer is employed to smooth the label probability distributions on the network topology of data. Finally, we develop an effective EM algorithm to compute the label probability distributions for label prediction. Experimental results on three real sparsely-labeled network datasets show that the proposed model GMNR outperforms state-of-theart CC algorithms and other SSCC algorithms.

Original languageEnglish (US)
Title of host publicationSIAM International Conference on Data Mining 2014, SDM 2014
EditorsMohammed J. Zaki, Arindam Banerjee, Srinivasan Parthasarathy, Pang Ning-Tan, Zoran Obradovic, Chandrika Kamath
PublisherSociety for Industrial and Applied Mathematics Publications
Pages64-72
Number of pages9
ISBN (Electronic)9781510811515
DOIs
StatePublished - 2014
Externally publishedYes
Event14th SIAM International Conference on Data Mining, SDM 2014 - Philadelphia, United States
Duration: Apr 24 2014Apr 26 2014

Publication series

NameSIAM International Conference on Data Mining 2014, SDM 2014
Volume1

Other

Other14th SIAM International Conference on Data Mining, SDM 2014
Country/TerritoryUnited States
CityPhiladelphia
Period4/24/144/26/14

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Software

Fingerprint

Dive into the research topics of 'A generative model with network regularization for semi-supervised collective classification'. Together they form a unique fingerprint.

Cite this