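Authors: Chen Haonan (陈昊楠); Jin Min (金敏) (College of Computer Science & Electronic Engineering, Hunan University, Changsha 410082, China)
Affiliation: [1] College of Information Science and Engineering, Hunan University, Changsha 410082
Source: Application Research of Computers (《计算机应用研究》), 2021, No. 4, pp. 1051-1057 (7 pages)
Funding: Supported by the National Natural Science Foundation of China (61773157).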
Abstract: In cancer classification, gene expression data are high-dimensional, highly redundant, and class-imbalanced, and performing feature selection and building a classification model on such data has long been a difficulty that limits classification accuracy. To improve the accuracy of cancer classification, this paper proposes a cancer classification method based on feature interaction and weight integration. At the feature selection level, the method exploits the gain-type interaction of multiple features with respect to classification information to select feature combinations whose joint mutual information with the label is greater than the sum of their individual mutual information, and then uses conditional mutual information to select low-redundancy features, which addresses the high dimensionality and high redundancy of gene expression data. At the classification model level, it proposes a re-learning ensemble model combined with a weight integration feedback mechanism. The model combines the differing abilities of different models to fit samples of different classes and constructs class weights that do not depend on the number of samples, which addresses the imbalanced class distribution. In classification tests on six kinds of cancer data, accuracy, sensitivity, precision, and F-measure all remained above 99.39%, and specificity remained above 94.74%, indicating that the method effectively improves the accuracy and stability of cancer classification and generalizes across different cancers.
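Keywords: cancer classification; data science; feature interaction; multiple heterogeneous models; weight integration feedback; re-learning ensemble model
Classification code: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]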
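
The feature-selection criterion in the abstract (keep feature combinations whose joint mutual information with the label exceeds the sum of their individual mutual information, then use conditional mutual information to screen out redundant features) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy XOR data, and the plug-in entropy estimate on discrete values are all assumptions made for illustration, and real gene expression data would first need discretization.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(samples):
    """Shannon entropy in bits, estimated from empirical frequencies."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def conditional_mutual_information(xs, ys, zs):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return (entropy(list(zip(xs, zs))) + entropy(list(zip(ys, zs)))
            - entropy(list(zip(xs, ys, zs))) - entropy(zs))


def interaction_gain(f1, f2, labels):
    """I(F1,F2;Y) - I(F1;Y) - I(F2;Y); a positive value means the pair
    carries more label information jointly than the two features do alone."""
    joint = mutual_information(list(zip(f1, f2)), labels)
    return joint - mutual_information(f1, labels) - mutual_information(f2, labels)


# Hypothetical toy data: the label is the XOR of two binary "genes",
# so neither gene is informative alone but the pair determines the label.
pairs = list(product([0, 1], repeat=2)) * 25
gene_a = [a for a, _ in pairs]
gene_b = [b for _, b in pairs]
label = [a ^ b for a, b in pairs]
gene_c = list(gene_a)  # an exact duplicate of gene_a, i.e. fully redundant

# Positive gain: the pair (gene_a, gene_b) would pass the interaction test.
print(interaction_gain(gene_a, gene_b, label))

# Conditional MI of the duplicate given gene_a: no new label information.
print(conditional_mutual_information(gene_c, label, gene_a))

# Conditional MI of gene_b given gene_a: it still adds label information.
print(conditional_mutual_information(gene_b, label, gene_a))
```

On this toy data the pair shows a positive interaction gain, while the duplicated feature contributes no conditional mutual information once `gene_a` is known, which is the kind of redundancy the abstract says conditional mutual information is used to remove. How the paper searches over combinations and sets thresholds is not stated in this record.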