Authors: Qin Xiwen (秦喜文) [1,2]; Wang Rui (王芮); Zhang Siqi (张斯琪)
Affiliations: [1] Institute of Big Data Science, Changchun University of Technology, Changchun 130012, China; [2] Graduate School, Changchun University of Technology, Changchun 130012, China; [3] School of Mathematics and Statistics, Changchun University of Technology, Changchun 130012, China
Source: Chinese Journal of Biomedical Engineering (《中国生物医学工程学报》), 2022, No. 2, pp. 177-185 (9 pages)
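Funding: National Natural Science Foundation of China (11301036, 12026430); Scientific Research Project of the Education Department of Jilin Province (JJKH20170540KJ, JJKH20210716KJ).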
Abstract: Classifying breast cancer gene data is of great importance in clinical medicine. Gene data are structurally complex, high-dimensional, and available only in small samples. To address these characteristics, this paper proposes a gene data classification method that combines max-relevance and min-conditional redundancy (mRMCR) with the multi-grained cascade forest (gcForest). The breast cancer gene expression data set of the Broad Institute was used, with 98 samples in total, each containing 1,213 characteristic genes. The data are first standardized, a feature subset is then selected with mRMCR, and finally the feature subset is classified with gcForest. Random forest, support vector machine, and BP neural network serve as comparison methods. The results show that the best classification accuracy of the proposed combination of mRMCR and gcForest reaches 93.78%, clearly better than that of the other methods. The method can effectively improve the classification accuracy of breast cancer gene data and has both theoretical significance and practical value for breast cancer classification based on gene data.
Keywords: breast cancer classification; gene expression data; variable selection; max-relevance and min-redundancy; deep cascade forest
Classification code: R318 [Medicine and Health: Biomedical Engineering]
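The first two stages named in the abstract (standardization, then relevance-versus-redundancy feature selection) can be illustrated with a small sketch. This is not the authors' mRMCR: the record gives no formula, so absolute Pearson correlation is swapped in here for the relevance and redundancy terms, with a plain greedy "relevance minus mean redundancy" score. The function names, the toy genes, and `k` are all hypothetical, and the gcForest classification stage is omitted.

```python
import math
import statistics


def standardize(rows):
    """Z-score every column of a samples-by-features matrix (list of lists)."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    # pstdev is 0 for a constant column; fall back to 1.0 to avoid dividing by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def abs_corr(x, y):
    """Absolute Pearson correlation; 0.0 if either vector is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def greedy_select(rows, labels, k):
    """Pick k feature indices, trading relevance to the label against
    redundancy with already chosen features (correlation-based stand-in,
    NOT the conditional-redundancy criterion of the paper)."""
    cols = [list(c) for c in zip(*rows)]
    relevance = [abs_corr(c, labels) for c in cols]
    selected = [max(range(len(cols)), key=relevance.__getitem__)]
    while len(selected) < k:
        best, best_score = None, -math.inf
        for j in range(len(cols)):
            if j in selected:
                continue
            redundancy = statistics.fmean(
                [abs_corr(cols[j], cols[s]) for s in selected]
            )
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected


# Toy data (hypothetical): 8 samples, 4 "genes".
labels = [0, 0, 0, 0, 1, 1, 1, 1]
gene_a = [y + n for y, n in zip(labels, [0.25, -0.25] * 4)]          # informative
gene_b = [2 * a + 1 for a in gene_a]                                 # linear copy of gene_a
gene_c = [y + n for y, n in zip(labels, [0.3, 0.3, -0.3, -0.3] * 2)]  # informative, different noise
gene_d = [1, -1, -1, 1] * 2                                          # unrelated to the label

rows = [list(r) for r in zip(gene_a, gene_b, gene_c, gene_d)]
X = standardize(rows)
selected = greedy_select(X, labels, k=2)
print(selected)
```

In this toy setup `gene_b` carries no information beyond `gene_a`, so the redundancy penalty steers the second pick toward `gene_c` rather than the duplicate. A real analysis would replace the correlation scores with the paper's mutual-information-based criterion and pass the selected columns to a classifier.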