TY - GEN

T1 - Optimal N-ary ECOC Matrices for Ensemble Classification

AU - Nguyen, Hieu D.

AU - Lavalva, Lucas J.

AU - Hot, Shen Shyang

AU - Khan, Mohammed Sarosh

AU - Kaegi, Nicholas

N1 - Funding Information:
The authors would like to acknowledge partial financial support from the Center for Undergraduate Research in Mathematics (CURM) through NSF grant DMS-1722563.
Publisher Copyright:
© 2021 IEEE.

PY - 2021

Y1 - 2021

N2 - A new recursive construction of $N$-ary error-correcting output code (ECOC) matrices for ensemble classification methods is presented, generalizing the classic doubling construction for binary Hadamard matrices. Given any prime integer $N$, this deterministic construction generates base-$N$ symmetric square matrices $M$ of prime-power dimension having optimal minimum Hamming distance between any two of its rows and columns. Experimental results for six datasets demonstrate that using these deterministic coding matrices for $N$-ary ECOC classification yields comparable, and in many cases higher, accuracy than using randomly generated coding matrices. This is particularly true when $N$ is adaptively chosen so that the dimension of $M$ matches closely with the number of classes in a dataset, which reduces the loss in minimum Hamming distance when $M$ is truncated to fit the dataset. This is verified through a distance formula for $M$, which shows that these adaptive matrices have significantly higher minimum Hamming distance than randomly generated ones.

AB - A new recursive construction of $N$-ary error-correcting output code (ECOC) matrices for ensemble classification methods is presented, generalizing the classic doubling construction for binary Hadamard matrices. Given any prime integer $N$, this deterministic construction generates base-$N$ symmetric square matrices $M$ of prime-power dimension having optimal minimum Hamming distance between any two of its rows and columns. Experimental results for six datasets demonstrate that using these deterministic coding matrices for $N$-ary ECOC classification yields comparable, and in many cases higher, accuracy than using randomly generated coding matrices. This is particularly true when $N$ is adaptively chosen so that the dimension of $M$ matches closely with the number of classes in a dataset, which reduces the loss in minimum Hamming distance when $M$ is truncated to fit the dataset. This is verified through a distance formula for $M$, which shows that these adaptive matrices have significantly higher minimum Hamming distance than randomly generated ones.

UR - http://www.scopus.com/inward/record.url?scp=85125767736&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85125767736&partnerID=8YFLogxK

U2 - 10.1109/SSCI50451.2021.9660146

DO - 10.1109/SSCI50451.2021.9660146

M3 - Conference contribution

AN - SCOPUS:85125767736

T3 - 2021 IEEE Symposium Series on Computational Intelligence, SSCI 2021 - Proceedings

BT - 2021 IEEE Symposium Series on Computational Intelligence, SSCI 2021 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2021 IEEE Symposium Series on Computational Intelligence, SSCI 2021

Y2 - 5 December 2021 through 7 December 2021

ER -