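Affiliations: [1] Department of Computer Science and Technology, Taiyuan Normal University, Jinzhong, Shanxi 030619 [2] School of Computer and Information Technology, Shanxi University, Taiyuan 030006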
Source: Computer Science (《计算机科学》), 2018, No. 7, pp. 230-236, 242 (8 pages in total)
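Funding: Supported by the Shanxi Province Research Fund for Returned Overseas Scholars (2017-014), the National Natural Science Foundation of China (61572005), and the Shanxi Province Soft Science Research Program (2016041036-4)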
Abstract: Large cancer genome projects such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) have produced large volumes of cancer omics data, making in-depth study of cancer possible. Identifying the mutated genes that cause cancer remains a significant challenge. Gene mutations in cancer cells fall into two classes: driver mutations, which can lead to cancer, and passenger mutations, which have no effect on its onset or spread. Identifying cancer driver genes helps in understanding the pathogenesis and progression of cancer and in developing cancer drugs and targeted therapies, and it is an important problem in bioinformatics. This paper proposes GNDP, a cancer driver pathway identification algorithm based on mutated gene networks, which analyzes the somatic mutation data of cancer patients. GNDP defines a nonoverlap balance metric to measure how likely a pair of genes is to lie in the same driver pathway. It then builds a mutually exclusive gene network from the nonoverlap balance, exclusivity, and coverage of each gene pair, deleting edges for which these values are low; this greatly reduces the number of edges and improves computational efficiency. All maximal cliques in the network are found and treated as candidate driver pathway gene sets. Each clique is assigned a weight equal to the product of its exclusivity and its coverage, and every node of the clique is checked to see whether deleting it would yield a larger weight. The resulting maximal weight sub-cliques are reported as the identified driver pathways. GNDP was validated on simulated data and on lung adenocarcinoma and glioblastoma multiforme mutation data, and compared with the classical driver pathway identification algorithms Dendrix and Multi-Dendrix. The results show that GNDP does not require the number of genes in a driver pathway to be specified in advance and accurately detects all artificially planted driver pathways in the simulated data. On the lung adenocarcinoma and glioblastoma multiforme data, GNDP reaches high identification accuracy without any prior knowledge, identifies the main driver pathways efficiently, and outperforms the comparison algorithms.
Keywords: cancer genome; somatic mutation; mutually exclusive gene network; maximal clique; driver pathway
Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]
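The pipeline described in the abstract (pairwise filtering, maximal clique search, weight-based pruning) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract gives no formulas, so the definitions of coverage, exclusivity, and nonoverlap balance below are assumptions, as are the threshold values, every function name, and the toy mutation data.

```python
from itertools import combinations

def coverage(genes, mut, n_patients):
    # Assumed definition: fraction of patients mutated in at least one gene.
    return len(set().union(*(mut[g] for g in genes))) / n_patients

def exclusivity(genes, mut):
    # Assumed definition: among covered patients, the fraction mutated
    # in exactly one gene of the set.
    counts = {}
    for g in genes:
        for p in mut[g]:
            counts[p] = counts.get(p, 0) + 1
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c == 1) / len(counts)

def balance(a, b, mut):
    # Hypothetical stand-in for the paper's nonoverlap balance metric:
    # ratio of the smaller to the larger set of non-shared patients.
    only_a, only_b = len(mut[a] - mut[b]), len(mut[b] - mut[a])
    return min(only_a, only_b) / max(only_a, only_b) if max(only_a, only_b) else 0.0

def build_network(mut, n_patients, min_bal=0.2, min_excl=0.8, min_cov=0.3):
    # Keep an edge only if all three pairwise scores pass (assumed) thresholds.
    adj = {g: set() for g in mut}
    for a, b in combinations(mut, 2):
        if (balance(a, b, mut) >= min_bal
                and exclusivity((a, b), mut) >= min_excl
                and coverage((a, b), mut, n_patients) >= min_cov):
            adj[a].add(b)
            adj[b].add(a)
    return adj

def maximal_cliques(adj):
    # Bron-Kerbosch enumeration without pivoting; single nodes are skipped.
    found = []
    def expand(r, p, x):
        if not p and not x:
            if len(r) > 1:
                found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    expand(set(), set(adj), set())
    return found

def weight(genes, mut, n_patients):
    # Clique weight as stated in the abstract: exclusivity times coverage.
    return coverage(genes, mut, n_patients) * exclusivity(genes, mut)

def prune(clique, mut, n_patients):
    # Greedily delete a node while doing so strictly increases the weight.
    genes = set(clique)
    while len(genes) > 2:
        best = max(sorted(genes), key=lambda g: weight(genes - {g}, mut, n_patients))
        if weight(genes - {best}, mut, n_patients) <= weight(genes, mut, n_patients):
            break
        genes.remove(best)
    return frozenset(genes)

def find_driver_pathways(mut, n_patients):
    # Returns (gene set, weight) pairs, highest weight first.
    adj = build_network(mut, n_patients)
    pruned = {prune(c, mut, n_patients) for c in maximal_cliques(adj)}
    scored = [(c, weight(c, mut, n_patients)) for c in pruned]
    return sorted(scored, key=lambda cw: cw[1], reverse=True)

# Toy data (invented): gene -> set of patient indices carrying a mutation.
toy_mut = {
    "A": {0, 1, 2, 3},
    "B": {4, 5, 6},
    "C": {7, 8},
    "D": {0, 4, 7},  # overlaps A, B, and C, so it behaves like a passenger
}
pathways = find_driver_pathways(toy_mut, n_patients=10)
for genes, w in pathways:
    print(sorted(genes), round(w, 3))
```

On this toy input the genes A, B, and C are mutated in disjoint patient groups and together cover nine of ten patients, so they form the highest-weight set, while D is removed during pruning because its mutations overlap the others. Note that no pathway size is passed in anywhere; the size falls out of the clique search and pruning, which mirrors the property claimed in the abstract.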