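Affiliations: [1] College of Computer and Information Science, Southwest University, Chongqing 400715 [2] School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044 [3] Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing 100044 [4] China Computer Federation (CCF)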
Source: Chinese Journal of Computers (《计算机学报》), 2019, No. 6, pp. 1274-1288 (15 pages)
Funding: National Natural Science Foundation of China (61873214, 61872300, 61741217, 61871020, 61571163, 61532014); Chongqing Basic and Frontier Research Project (cstc2018jcyjAX0228, cstc2016jcyjA0351)
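Abstract: Cancer subtype identification is important for analyzing tumor heterogeneity. Bi-clustering clusters large-scale gene expression data along the gene and sample dimensions simultaneously. It finds bi-clusters in which a subset of samples shows similar expression over a subset of genes, and thereby reveals the corresponding cancer subtypes, providing important information for applications such as precise gene therapy for cancer. Incorporating gene interaction network data can further improve the accuracy of cancer subtype classification, but existing network-integrated bi-clustering algorithms usually select genes only by degree-based weighting, so they are easily disturbed by noisy interactions and misled by missing interactions in the network. To address this, the paper proposes a bi-clustering algorithm regularized by the gene interaction network (Network Regularized Bi-Clustering algorithm, NetRBC). NetRBC first obtains the gene-cluster and sample-cluster indicator matrices of the cancer gene expression data matrix by minimizing the mean squared residue over the clusters. It then constructs a graph regularization term from the gene network and the gene-cluster indicator matrix. Finally, it incorporates this term into the mean-squared-residue-based nonnegative matrix factorization to constrain the joint factorization of the gene-cluster and sample-cluster matrices, with the aim of improving the accuracy of cancer subtype classification. Experiments on multiple cancer gene expression datasets show that NetRBC distinguishes cancer subtypes more accurately than existing related methods.
Cancer subtype identification is crucial for understanding tumor heterogeneity. Existing methods for identifying cancer subtypes have primarily used traditional clustering algorithms (such as k-means and hierarchical clustering) to cluster gene expression data. These traditional approaches, however, group the data along either the gene or the sample dimension only, so they cannot discover patterns in which similar genes behave similarly over only a subset of conditions (or samples). Bi-clustering groups large-scale gene expression data along the sample and gene dimensions simultaneously and finds bi-clusters in which relevant samples exhibit similar expression profiles over a subset of genes, thereby identifying the corresponding cancer subtypes. The discovered bi-clusters bring insights for categorizing cancer subtypes and for precise gene treatments. Incorporating gene-gene interaction networks can further improve the quality of the discovered bi-clusters. However, current efforts generally use the networks only to weight and select genes, so they are often disturbed by noisy interactions and misled by missing interactions. There are many types of bi-clusters, including constant, constant-row, constant-column, coherent-value additive and coherent-value multiplicative bi-clusters.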
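The bi-cluster types listed above can be illustrated with a small numerical sketch. It assumes that the "mean squared residue" named in the abstract is the standard Cheng-Church score, which this record does not confirm. The function name `mean_squared_residue` and all matrix values are hypothetical and chosen only for illustration.

```python
import numpy as np

def mean_squared_residue(block):
    """Mean squared residue of a submatrix (rows = genes, columns = samples).

    Assumed form: the mean of (a_ij - rowmean_i - colmean_j + overallmean)^2.
    """
    block = np.asarray(block, dtype=float)
    row_mean = block.mean(axis=1, keepdims=True)
    col_mean = block.mean(axis=0, keepdims=True)
    residue = block - row_mean - col_mean + block.mean()
    return float(np.mean(residue ** 2))

# Hypothetical per-gene and per-sample effects (illustrative values only).
row_effect = np.array([[1.0], [2.0], [4.0]])        # shape (3, 1)
col_effect = np.array([[0.5, 1.5, 3.0, 5.0]])       # shape (1, 4)

constant = np.full((3, 4), 2.0)                     # constant bi-cluster
constant_row = row_effect + np.zeros((1, 4))        # constant-row bi-cluster
constant_col = np.zeros((3, 1)) + col_effect        # constant-column bi-cluster
additive = 1.0 + row_effect + col_effect            # coherent-value additive
multiplicative = row_effect * col_effect            # coherent-value multiplicative

for name, block in [("constant", constant), ("constant row", constant_row),
                    ("constant column", constant_col), ("additive", additive),
                    ("multiplicative", multiplicative)]:
    print(name, mean_squared_residue(block))
```

Under this score the first four types have a residue of zero, while a multiplicative bi-cluster does not unless it is log-transformed first, which turns the product of effects into a sum.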
To address these limitations and explore multiple types of bi-clusters, this paper introduces a gene-gene interaction Network Regularized Bi-Clustering algorithm (NetRBC) based on Semi-Nonnegative Matrix Tri-Factorization (SNMTF). NetRBC first integrates the mean squared residue into SNMTF and optimizes the gene-cluster and sample-cluster indicator matrices by minimizing the sum-squared loss of the discovered bi-clusters. Next, it constructs a graph regularization term from the gene networks and the gene-cluster indicator matrix. The core idea of the regularization term is that if a pair of genes interact with each ot
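A graph regularization term built from a gene network and a gene-cluster indicator matrix is often written as the graph-Laplacian penalty tr(GᵀLG), where L = D - W. Whether NetRBC uses exactly this form is not stated in this record, so the sketch below is an assumption. The function name `graph_regularizer`, the toy network `W`, and the indicator matrices are all hypothetical.

```python
import numpy as np

def graph_regularizer(W, G):
    """Laplacian penalty tr(G^T L G) with L = D - W.

    W: symmetric gene-gene adjacency matrix (n_genes x n_genes).
    G: gene-cluster indicator matrix (n_genes x n_clusters).
    Equals the sum over edges of W_ij * ||G_i - G_j||^2.
    """
    W = np.asarray(W, dtype=float)
    G = np.asarray(G, dtype=float)
    L = np.diag(W.sum(axis=1)) - W      # unnormalized graph Laplacian
    return float(np.trace(G.T @ L @ G))

# Toy network (assumed): gene 0 interacts with gene 1, gene 2 with gene 3.
W = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# Interacting genes share a cluster: no penalty.
G_agree = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
# Interacting genes are split across clusters: penalized.
G_split = np.array([[1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)

print(graph_regularizer(W, G_agree))
print(graph_regularizer(W, G_split))
```

In this assumed form, the penalty grows whenever two connected genes receive different cluster assignments, so adding it to a factorization objective pulls interacting genes toward the same gene cluster.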
Keywords: bi-clustering; mean squared residue; nonnegative matrix factorization; cancer subtype; gene network
Classification code: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]