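Affiliations: [1] School of Software, Harbin University of Science and Technology, Harbin 150040, Heilongjiang, China; [2] School of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin 150050, Heilongjiang, China
Source: Journal of Harbin University of Science and Technology, 2017, No. 2, pp. 112-116, 123 (6 pages)
Funding: 2014 General Project of Scientific and Technological Research, Heilongjiang Provincial Department of Education (12541124)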
Abstract: To address the complex correlations among genes in microarray expression datasets, a new weighted K-means gene clustering algorithm based on random forest variable importance scores is proposed. First, a random forest classifier is trained on the microarray expression data, with samples as objects and genes as features, and a variable importance score is computed for each gene. Then, K-means clustering is performed with genes as objects, samples as features, and each gene's variable importance score as its weight. Experiments on three microarray expression datasets (Leukemia, Breast, and DLBCL) show that, compared with the original K-means algorithm, the proposed weighted K-means algorithm raises the ratio of between-cluster distance to total distance by 17.7 percentage points on average, giving better homogeneity within clusters and greater difference between clusters.
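The second step of the abstract (K-means with genes as objects and importance scores as weights) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the weights enter the algorithm, so the sketch assumes they act as per-object weights in the centroid update (each centroid is the weighted mean of its members). The function name `weighted_kmeans`, the toy expression values, and the toy importance scores are all hypothetical. In practice the scores would come from a trained random forest, which is not shown here.

```python
import random


def weighted_kmeans(points, weights, k, max_iter=100, seed=0):
    """Cluster `points` into k groups using per-object weights.

    points  : list of equal-length lists of floats (one row per gene)
    weights : one non-negative weight per point (assumed importance score)
    Returns (labels, centroids).
    """
    rng = random.Random(seed)
    # Initialise centroids from k distinct randomly chosen points.
    centroids = [list(p) for p in rng.sample(points, k)]
    dim = len(points[0])
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: weighted mean of the members of each cluster.
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            total = sum(weights[i] for i in members)
            if total > 0:  # keep the old centroid for empty or zero-weight clusters
                centroids[c] = [
                    sum(weights[i] * points[i][d] for i in members) / total
                    for d in range(dim)
                ]
    return labels, centroids


# Toy data (hypothetical): 4 genes measured on 2 samples, one weight per gene.
genes = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
importance = [1.0, 3.0, 1.0, 1.0]
labels, centroids = weighted_kmeans(genes, importance, k=2)
```

With these toy values, the second gene has three times the weight of the first, so the centroid of their cluster lies closer to the second gene than an unweighted mean would.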
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]