TY - GEN
T1 - Cluster tree based multi-label classification for protein function prediction
AU - Wu, Qingyao
AU - Ye, Yunming
AU - Zhang, Xiaofeng
AU - Ho, Shen Shyang
PY - 2013
Y1 - 2013
AB - Automatically assigning functions to unknown proteins is a key task in computational biology. Proteins in nature belong to multiple classes according to the functions they perform, and many efforts have been made to cast protein function prediction as a multi-label learning problem. This paper proposes a novel Cluster Tree based Multi-label Learning algorithm (CTML) for protein function prediction. The main idea is to compute, at each node of the tree, a set of predictive labels for multi-label prediction, using k-means clustering and predictive functions learned from the data at the nodes. Propagating the predictive labels from the root node to the leaf nodes preserves the correlations between labels. Experimental results on benchmark data (the genbase and yeast datasets) show that the proposed CTML algorithm is effective in predicting protein functions. Moreover, the classification performance of CTML is competitive with baseline multi-label learning algorithms.
UR - http://www.scopus.com/inward/record.url?scp=84894588111&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84894588111&partnerID=8YFLogxK
U2 - 10.1109/BIBM.2013.6732548
DO - 10.1109/BIBM.2013.6732548
M3 - Conference contribution
AN - SCOPUS:84894588111
SN - 9781479913091
T3 - Proceedings - 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013
SP - 513
EP - 516
BT - Proceedings - 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013
T2 - 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013
Y2 - 18 December 2013 through 21 December 2013
ER -