Accuracy of class prediction using similarity functions in PAM

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations


Clustering has been proven to be an effective technique for finding data instances with similar characteristics. Clustering algorithms are based on the notion of distance between data points, often computed using the Euclidean metric. That is why clustering algorithms are mostly applicable to data sets consisting of numerical values. However, real-life data often include features that are categorical in nature. For example, to identify abnormal behavior or a cyberattack in a network, we usually examine packet headers, which contain categorical values such as source and destination IP addresses, source and destination port numbers, and upper-layer protocols. The Euclidean metric is not applicable to such data sets because it cannot compute the distance between categorical variables. To address this problem, similarity functions have been designed to determine the relationship between given categorical values. Similarity defines how closely related objects are to one another. Similarity can often be thought of as the opposite of distance: similar objects have a high value, while dissimilar objects have a low or zero value. In this paper, we explored the accuracy of various similarity functions using the Partitioning Around Medoids (PAM) clustering algorithm. We tested the similarity functions on several data sets to determine their ability to correctly predict class labels. We also examined the applicability of various similarity functions to different types of data sets.
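To make the idea concrete, the following is a minimal Python sketch of clustering categorical records with a similarity function and a medoid-based algorithm. It is an illustration under stated assumptions, not the method evaluated in the paper: the abstract does not name the similarity functions tested, so the simple overlap (matching) measure is used here as a stand-in, and the `pam` function is a simplified swap-only variant that starts from the first k records rather than running the full PAM BUILD phase. The sample packet-header records and all function names are hypothetical.

```python
from itertools import product


def overlap_similarity(a, b):
    """Fraction of attributes on which two categorical records agree (0 to 1).

    Assumption: the overlap measure is only one possible similarity function;
    the abstract does not say which functions the paper evaluates.
    """
    return sum(x == y for x, y in zip(a, b)) / len(a)


def dissimilarity(a, b):
    """Treat similarity as the opposite of distance."""
    return 1.0 - overlap_similarity(a, b)


def total_cost(records, medoids):
    """Sum of each record's dissimilarity to its nearest medoid."""
    return sum(min(dissimilarity(r, records[m]) for m in medoids) for r in records)


def pam(records, k):
    """Simplified PAM: start from the first k records, then keep applying
    any medoid/non-medoid swap that lowers the total cost."""
    medoids = list(range(k))
    best = total_cost(records, medoids)
    improved = True
    while improved:
        improved = False
        for i, cand in product(range(k), range(len(records))):
            if cand in medoids:
                continue
            trial = medoids[:i] + [cand] + medoids[i + 1:]
            cost = total_cost(records, trial)
            if cost < best:
                medoids, best, improved = trial, cost, True
    # Assign every record to the cluster of its nearest medoid.
    labels = [
        min(range(k), key=lambda j: dissimilarity(r, records[medoids[j]]))
        for r in records
    ]
    return medoids, labels


# Hypothetical packet-header records: (protocol, source IP, destination port).
records = [
    ("tcp", "10.0.0.1", "80"),
    ("tcp", "10.0.0.1", "443"),
    ("tcp", "10.0.0.2", "80"),
    ("udp", "10.0.0.9", "53"),
    ("udp", "10.0.0.9", "123"),
    ("udp", "10.0.0.8", "53"),
]

medoids, labels = pam(records, 2)
print(medoids, labels)
```

Because the algorithm only ever consults pairwise dissimilarities, a different similarity function can be swapped in by replacing `overlap_similarity` alone; comparing the resulting cluster labels against known class labels is one way to measure prediction accuracy.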

Original language: English (US)
Title of host publication: Proceedings - 2016 IEEE International Conference on Industrial Technology, ICIT 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 6
ISBN (Electronic): 9781467380751
State: Published - May 19 2016
Event: IEEE International Conference on Industrial Technology, ICIT 2016 - Taipei, Taiwan, Province of China
Duration: Mar 14 2016 to Mar 17 2016

Publication series

Name: Proceedings of the IEEE International Conference on Industrial Technology


Other: IEEE International Conference on Industrial Technology, ICIT 2016
Country/Territory: Taiwan, Province of China

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Electrical and Electronic Engineering

