Feature Selection for Protein Mass Spectrum Data Based on Recursive Null Space Linear Discriminant Analysis Algorithm  (Cited by: 3)

Authors: WANG Yaojia (王尧佳)[1], ZHU Lei (祝磊)[1], HAN Bin (韩斌)[1], LI Lihua (厉力华)[1], ZHENG Zhiguo (郑智国)[2], MOU Hanzhou (牟瀚舟)[2]

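Affiliations: [1] Institute of Biomedical Engineering and Instrumentation, College of Automation, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018; [2] Zhejiang Cancer Research Institute, Hangzhou, Zhejiang 310022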

Source: Space Medicine & Medical Engineering (《航天医学与医学工程》), 2010, No. 5, pp. 324-328 (5 pages)

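Funding: National Natural Science Foundation of China (60801054; 60801055); National Science Fund for Distinguished Young Scholars (60788101); Zhejiang Province Major Science and Technology International Cooperation Project (2006C14026)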

Abstract: Objective: To extract discriminative features from protein mass spectrum data with a new feature-selection-based algorithm, and thereby improve the accuracy of computer-aided cancer diagnosis. Methods: Wavelet features were combined with a recursive null space linear discriminant analysis (LDA) feature selection algorithm. First, multi-resolution wavelet decomposition was applied to the data to extract the detail features of each sample. Next, a t-test was used to screen the features, giving a preliminary reduction in feature dimensionality. The null space LDA algorithm was then called recursively to select the most discriminative protein sites. Finally, a support vector machine (SVM) classifier was used to estimate the performance of the algorithm, with 10-fold cross-validation for testing. Results: On the public ovarian cancer dataset OC-WCX2a, the classification rate reached 98.3%. On the clinical breast cancer dataset BC-WCX2a, provided by Zhejiang Cancer Hospital, the classification rate was 91.45% and the sensitivity was 97.2%. The algorithm also effectively reduced the correlation among the selected features. Conclusion: By combining wavelet features with recursive null space LDA, the algorithm can fully extract the discriminative features of protein mass spectrum data and is therefore more helpful for computer-aided cancer diagnosis.
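The abstract names the stages of the method but not their details. The following is a minimal Python sketch of the first three stages, under assumptions that do not come from the paper: a Haar wavelet, a Welch t-statistic for screening, a two-class null space LDA direction obtained by projecting the class-mean difference onto the null space of the within-class scatter, and an elimination rule that repeatedly drops the features with the smallest absolute weights. All function names, parameters, and the synthetic data are hypothetical, and the authors' actual recursion may differ.

```python
import numpy as np

def haar_detail(X, levels=2):
    """Stack Haar detail coefficients of every row over `levels` scales."""
    approx, details = np.asarray(X, dtype=float), []
    for _ in range(levels):
        if approx.shape[1] % 2:  # pad odd lengths by repeating the last point
            approx = np.hstack([approx, approx[:, -1:]])
        even, odd = approx[:, 0::2], approx[:, 1::2]
        details.append((even - odd) / np.sqrt(2))
        approx = (even + odd) / np.sqrt(2)
    return np.hstack(details)

def t_scores(X, y):
    """Absolute Welch t-statistic of each feature for labels 0/1."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / (se + 1e-12)

def null_space_lda_direction(X, y):
    """Class-mean difference projected onto the null space of the within-class scatter."""
    a, b = X[y == 0], X[y == 1]
    centered = np.vstack([a - a.mean(axis=0), b - b.mean(axis=0)])
    diff = a.mean(axis=0) - b.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[s > 1e-10 * s.max()]  # orthonormal basis of the scatter's range
    if len(basis) >= X.shape[1]:
        # Assumption: when the null space is empty, fall back to classical Fisher LDA.
        return np.linalg.pinv(centered.T @ centered) @ diff
    return diff - basis.T @ (basis @ diff)

def recursive_select(X, y, n_keep, drop_frac=0.2):
    """Repeatedly refit the direction and drop the lowest-weight features."""
    keep = np.arange(X.shape[1])
    while len(keep) > n_keep:
        weights = np.abs(null_space_lda_direction(X[:, keep], y))
        n_drop = min(len(keep) - n_keep, max(1, int(drop_frac * len(keep))))
        keep = np.sort(keep[np.argsort(weights)[n_drop:]])
    return keep

# Synthetic stand-in for spectra: 20 samples, 64 points, one class-dependent peak.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 10)
spectra = rng.normal(size=(20, 64))
spectra[y == 1, 10] += 10.0

features = haar_detail(spectra, levels=2)            # wavelet detail features
screened = np.argsort(t_scores(features, y))[-30:]   # t-test screening
selected = screened[recursive_select(features[:, screened], y, n_keep=20)]
```

The columns in `selected` could then be passed to an SVM and scored with 10-fold cross-validation, for example with scikit-learn's `SVC` and `cross_val_score`. The null space step only applies while there are more features than the within-class scatter has rank, which is why the sketch keeps the fallback branch explicit.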

Keywords: cancer classification; protein mass spectrum; recursive null space linear discriminant analysis; feature selection

Classification code: Q789 [Biology / Molecular Biology]

 
