A Weighted K-means Gene Clustering Algorithm (一种加权K-均值基因聚类算法)

Cited by: 12

Authors: YAO Dengju (姚登举)[1], ZHAN Xiaojuan (詹晓娟)[2], ZHANG Xiaojing (张晓晶)[1]

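Affiliations: [1] School of Software, Harbin University of Science and Technology, Harbin 150040, Heilongjiang, China; [2] College of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin 150050, Heilongjiang, China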

Source: Journal of Harbin University of Science and Technology (《哈尔滨理工大学学报》), 2017, No. 2, pp. 112-116, 123 (6 pages)

Funding: 2014 General Program for Science and Technology Research, Education Department of Heilongjiang Province (12541124)

Abstract: To address the complex correlations among genes in microarray expression data sets, a new weighted K-means gene clustering algorithm based on random forest variable importance scores is proposed. First, a random forest classifier is trained on the microarray expression data with samples as objects and genes as features, and a variable importance score is computed for each gene. Then, K-means clustering is performed with genes as objects, samples as features, and the genes' variable importance scores as weights. Experiments were carried out on three microarray data sets: Leukemia, Breast, and DLBCL. The results show that, compared with the original K-means clustering algorithm, the proposed weighted K-means clustering algorithm raises the ratio of between-cluster distance to total distance by 17.7 percentage points on average, indicating better within-cluster homogeneity and between-cluster separation.
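The second step of the abstract can be illustrated with a minimal sketch, under several assumptions that the abstract itself does not confirm. Here each gene (one row, with samples as columns) carries a weight that enters the centroid update as a weighted mean; this is one plausible reading of "importance scores as weights", not necessarily the authors' formulation. The importance scores are taken as given input (in practice they would come from a trained random forest), the farthest-first initialisation is a choice made only to keep the example deterministic, and the quality measure is interpreted as weighted between-cluster sum of squares divided by total sum of squares. All function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_mean(points, weights, idx):
    """Weighted mean of points[i] for i in idx; None if the total weight is zero."""
    idx = list(idx)
    total = sum(weights[i] for i in idx)
    if total == 0:
        return None
    dim = len(points[0])
    return [sum(weights[i] * points[i][d] for i in idx) / total for d in range(dim)]


def weighted_kmeans(points, weights, k, n_iter=100):
    """Lloyd-style K-means in which every object (gene) has a non-negative weight.

    points  : list of rows, one per gene, one column per sample
    weights : one weight per gene (assumed to be its importance score)
    """
    n = len(points)
    # Assumption of this sketch: deterministic farthest-first initialisation.
    # Start from the highest-weight gene, then repeatedly add the gene that is
    # farthest from the genes chosen so far.
    chosen = [max(range(n), key=lambda i: weights[i])]
    while len(chosen) < k:
        chosen.append(
            max(range(n), key=lambda i: min(sq_dist(points[i], points[j]) for j in chosen))
        )
    centroids = [list(points[i]) for i in chosen]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid becomes the weighted mean of its members.
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            mean = weighted_mean(points, weights, members)
            if mean is not None:  # keep the old centroid for an empty cluster
                centroids[c] = mean
    return labels, centroids


def between_total_ratio(points, weights, labels, centroids):
    """Weighted between-cluster sum of squares divided by total sum of squares.

    Assumes a positive total weight and at least two distinct points.
    """
    grand = weighted_mean(points, weights, range(len(points)))
    total = sum(w * sq_dist(p, grand) for p, w in zip(points, weights))
    within = sum(
        w * sq_dist(p, centroids[lab]) for p, w, lab in zip(points, weights, labels)
    )
    return (total - within) / total
```

Setting every weight to 1 reduces the update step to the ordinary mean, so the same functions give an unweighted K-means baseline (with the same initialisation) against which the ratio can be compared.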

Keywords: microarray expression data; cluster analysis; random forest; K-means

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
