Manifold Learning for Multivariate Variable-Length Sequences with an Application to Similarity Search

Shen Shyang Ho, Peng Dai, Frank Rudzicz

Research output: Contribution to journal › Article › peer-review

9 Scopus citations

Abstract

Multivariate variable-length sequence data are becoming ubiquitous with technological advances in mobile devices and sensor networks. Such data are difficult to compare, visualize, and analyze due to the nonmetric nature of data sequence similarity measures. In this paper, we propose a general manifold learning framework for arbitrary-length multivariate data sequences driven by similarity/distance (parameter) learning in both the original data sequence space and the learned manifold. Our proposed algorithm transforms the data sequences in a nonmetric data sequence space into feature vectors in a manifold that preserves the data sequence space structure. In particular, the feature vectors in the manifold representing similar data sequences remain close to one another and far from the feature vectors corresponding to dissimilar data sequences. To achieve this objective, we assume a semisupervised setting in which we know whether some of the data sequences are similar or dissimilar; this knowledge is expressed as instance-level constraints. Using this information, one learns the similarity measure for the data sequence space and the distance measures for the manifold. Moreover, we describe an approach to handle the similarity search problem given user-defined instance-level constraints in the learned manifold using a consensus voting scheme. Experimental results on both synthetic data and real tropical cyclone sequence data are presented to demonstrate the feasibility of our manifold learning framework and the robustness of performing similarity search in the learned manifold.
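
As an illustration of the "nonmetric nature" of sequence similarity measures mentioned in the abstract, the following minimal sketch uses dynamic time warping (DTW) as a stand-in measure. DTW is an assumption made here for illustration only; the abstract does not state which similarity measure the paper learns. The function name `dtw_distance` and the three toy sequences are hypothetical. The sketch shows that DTW accepts multivariate sequences of different lengths and that it can violate the triangle inequality, so it is not a true metric.

```python
import numpy as np


def dtw_distance(a, b):
    """Dynamic time warping cost between two multivariate sequences.

    a has shape (n, d) and b has shape (m, d); n and m may differ.
    The local cost is the Euclidean distance between time steps.
    Illustrative stand-in only, not the measure learned in the paper.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # cost[i, j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = local + min(
                cost[i - 1, j],      # advance in a only
                cost[i, j - 1],      # advance in b only
                cost[i - 1, j - 1],  # advance in both
            )
    return float(cost[n, m])


# Three toy two-dimensional sequences of lengths 1, 2, and 3 (hypothetical data).
seq_a = [[0.0, 0.0]]
seq_b = [[1.0, 0.0], [2.0, 0.0]]
seq_c = [[2.0, 0.0], [3.0, 0.0], [3.0, 0.0]]

d_ab = dtw_distance(seq_a, seq_b)
d_bc = dtw_distance(seq_b, seq_c)
d_ac = dtw_distance(seq_a, seq_c)

# A metric would require d_ac <= d_ab + d_bc; DTW breaks this here.
violates_triangle = d_ac > d_ab + d_bc
print(d_ab, d_bc, d_ac, violates_triangle)
```

Because such measures need not satisfy the triangle inequality, methods that assume a metric space cannot be applied to the raw sequences directly, which is the motivation the abstract gives for mapping sequences to feature vectors in a learned manifold.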

Original language: English (US)
Article number: 7060711
Pages (from-to): 1333-1344
Number of pages: 12
Journal: IEEE Transactions on Neural Networks and Learning Systems
Volume: 27
Issue number: 6
State: Published - Jun 2016
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Science Applications
  • Computer Networks and Communications
  • Artificial Intelligence
