Classification of Proteomic Mass Spectrometry Data Based on Affinity Propagation Clustering and Semi-supervised Learning

Cited by: 2


Authors: Zhu Lei[1], Cao Kaimin[1], You Xiaolu[1], Xu Ping[1], Ying Nanjiao[1]

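Affiliation: [1] College of Life Information Science and Instrument Engineering, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018, China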

Source: Space Medicine & Medical Engineering, 2014, No. 5, pp. 367-372 (6 pages)

Funding: National Natural Science Foundation of China (60801054; 61205200); Natural Science Foundation of Zhejiang Province (LY12F01005)

Abstract: Objective: To propose a classification method based on affinity propagation clustering and semi-supervised learning for high-dimensional, redundant SELDI proteomic mass spectrometry data. Methods: A t-test is first applied for preliminary dimensionality reduction of the mass spectrometry data. Affinity propagation clustering is then applied to the processed data to reduce the dimensionality further. Finally, a semi-supervised learning algorithm propagates labels, drawing on both labeled and unlabeled samples, to classify the spectra. Results: The method achieved classification accuracies of 99.15% and 96.75% on the public ovarian cancer dataset OC-WCX2b and the public prostate cancer dataset PC-H4, respectively. On the clinical breast cancer dataset BC-WCX2a from Zhejiang Cancer Hospital, it achieved a classification accuracy of 95.18% and a sensitivity of 100%. Conclusion: Semi-supervised learning based on affinity propagation clustering makes effective use of the information in unlabeled mass spectrometry samples; compared with classical supervised learning algorithms, it gave better classification performance and is more practical.
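The three stages named in the abstract (t-test filtering, affinity propagation clustering, label propagation) can be sketched roughly as follows. This is only an illustration under several assumptions, not the authors' implementation. The abstract does not say which semi-supervised algorithm was used, so scikit-learn's `LabelSpreading` stands in for it. It is also assumed that clustering runs over features (m/z columns) and that one exemplar per cluster is kept. The function name `classify_spectra`, the parameter `n_ttest`, and the synthetic data are hypothetical, and the sketch makes no attempt to reproduce the reported accuracies.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.cluster import AffinityPropagation
from sklearn.semi_supervised import LabelSpreading


def classify_spectra(X, y_partial, n_ttest=50):
    """X: (samples, m/z features); y_partial: 0/1 labels, -1 = unlabeled."""
    labeled = y_partial != -1
    X_lab, y_lab = X[labeled], y_partial[labeled]

    # Step 1: Welch t-test per feature, using labeled spectra only;
    # keep the n_ttest features with the smallest p-values.
    _, p = ttest_ind(X_lab[y_lab == 0], X_lab[y_lab == 1], equal_var=False)
    top = np.argsort(p)[:n_ttest]

    # Step 2 (assumption): cluster the surviving feature columns with
    # affinity propagation and keep one exemplar feature per cluster.
    ap = AffinityPropagation(random_state=0).fit(X[:, top].T)
    exemplars = ap.cluster_centers_indices_
    if len(exemplars) == 0:  # no exemplars found: fall back to step 1 output
        exemplars = np.arange(len(top))
    kept = top[np.asarray(exemplars, dtype=int)]

    # Step 3 (assumption): spread labels over a k-nearest-neighbour graph
    # built from labeled and unlabeled spectra together.
    model = LabelSpreading(kernel="knn", n_neighbors=7)
    model.fit(X[:, kept], y_partial)
    return model.transduction_, kept


# Synthetic stand-in for spectra: 120 samples, 300 features, 20 informative.
rng = np.random.default_rng(0)
n, n_feat, n_inf = 120, 300, 20
y_true = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, n_feat))
X[y_true == 1, :n_inf] += 3.0

# Reveal labels for 15 samples per class; mark the rest as unlabeled (-1).
y_partial = np.full(n, -1)
lab_idx = np.concatenate([
    rng.choice(60, 15, replace=False),
    60 + rng.choice(60, 15, replace=False),
])
y_partial[lab_idx] = y_true[lab_idx]

pred, kept = classify_spectra(X, y_partial)
unlabeled = y_partial == -1
acc = float(np.mean(pred[unlabeled] == y_true[unlabeled]))
print(f"kept {len(kept)} features, accuracy on unlabeled samples: {acc:.3f}")
```

The t-test in this sketch uses only the labeled spectra, so the unlabeled samples influence the result solely through the label-spreading graph. Whether the original study made the same choice is not stated in the abstract.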

Keywords: proteomic mass spectrometry; cluster analysis; semi-supervised learning; feature extraction

Classification code: R318.04 [Medicine and Health: Biomedical Engineering]

 
