Cancer classification method based on feature interaction and weight integration

Cited by: 3

Authors: Chen Haonan; Jin Min (College of Computer Science & Electronic Engineering, Hunan University, Changsha 410082, China)

Affiliation: [1] College of Information Science and Engineering, Hunan University, Changsha 410082

Source: Application Research of Computers (《计算机应用研究》), 2021, No. 4, pp. 1051-1057 (7 pages)

Funding: National Natural Science Foundation of China (61773157).

Abstract: In cancer classification, gene expression data are high-dimensional, highly redundant, and class-imbalanced, which makes feature selection and classifier construction persistent obstacles to classification accuracy. To improve accuracy, this paper proposes a cancer classification method based on feature interaction and weight integration. At the feature selection level, the method exploits the information-gaining interaction among multiple features: it selects feature combinations whose joint mutual information with the label exceeds the sum of their individual mutual information, and then uses conditional mutual information to keep low-redundancy features, addressing the high dimensionality and high redundancy of gene expression data. At the classification model level, it proposes a re-learning ensemble model with a weight integration feedback mechanism, which combines the differing abilities of multiple models to fit samples of different classes and constructs class weights that do not depend on the number of samples, addressing the unbalanced class distribution. In classification tests on six cancer datasets, accuracy, sensitivity, precision, and F-measure all remained above 99.39%, and specificity remained above 94.74%, indicating that the method improves the accuracy and stability of cancer classification and generalizes across different cancers.

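Keywords: cancer classification; data science; feature interaction; multiple heterogeneous models; weight integration feedback; re-learning ensemble model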

Classification number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
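
The feature selection criterion stated in the abstract (joint mutual information with the label greater than the sum of the individual mutual information, followed by a conditional mutual information check for redundancy) can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes already-discretized features, uses plug-in entropy estimates, and the function names and toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Plug-in Shannon entropy (in bits) of a sequence of hashable symbols."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for discrete sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def interaction_gain(xi, xj, y):
    """I(Xi,Xj;Y) - I(Xi;Y) - I(Xj;Y).

    A positive value means the pair carries more label information
    jointly than the two features do separately.
    """
    joint = mutual_information(list(zip(xi, xj)), y)
    return joint - mutual_information(xi, y) - mutual_information(xj, y)


def conditional_mutual_information(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return (entropy(list(zip(x, z))) + entropy(list(zip(y, z)))
            - entropy(list(zip(x, y, z))) - entropy(z))


# Hypothetical toy data: the label is the XOR of two binary "genes",
# so neither gene is informative alone but the pair determines the label.
g1 = [0, 0, 1, 1]
g2 = [0, 1, 0, 1]
label = [a ^ b for a, b in zip(g1, g2)]

gain = interaction_gain(g1, g2, label)

# A duplicate of g1 adds no label information once g1 is known (redundant),
# whereas g2 still does.
g1_copy = list(g1)
redundant = conditional_mutual_information(g1_copy, label, g1)
complementary = conditional_mutual_information(g2, label, g1)
```

On this toy data the pair `(g1, g2)` has a positive interaction gain, while the duplicated feature contributes zero conditional mutual information given `g1`. Real gene expression values are continuous, so a discretization or a different estimator would be needed before applying such a criterion.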

 
