Authors: FENG Yanxia (冯艳霞), ZHANG Zhihong (张志红), ZHANG Shaoqiang (张少强) [1] (College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China)
Affiliation: [1] College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387
Source: Journal of Computer Applications (《计算机应用》), 2018, No. 6, pp. 1826-1830 (5 pages)
Funding: National Natural Science Foundation of China (61572358); Tianjin Natural Science Foundation (16JCYBJC23600)
Abstract: To address motif finding in Chromatin Immunoprecipitation Sequencing (ChIP-Seq) datasets produced by Next-Generation Sequencing (NGS), a motif finding algorithm based on Fisher's exact test, called FisherNet, is proposed. First, Fisher's exact test is used to compute the P values of all k-mers, and the motif seeds are selected from them. Second, the position weight matrix of the initial motif is constructed. Finally, the position weight matrix is used to scan all k-mers to obtain the final motif. The algorithm was validated on ChIP-Seq datasets of mouse embryonic stem cells (mESC), mouse erythrocytes, and human lymphoblastoid cell lines, as well as on data from the ENCODE database. The results show that the proposed algorithm is more accurate and faster than other common motif finding algorithms, and that it finds more than 80% of the core motifs of known transcription factors and of their co-factors. The algorithm can be applied to large-scale sequencing datasets while maintaining high accuracy.
Keywords: motif finding algorithm; cis-regulation; eukaryote; chromatin immunoprecipitation sequencing (ChIP-Seq); transcription factor
Classification: TP301.6 [Automation and Computer Technology: Computer System Architecture]
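
The three steps named in the abstract (Fisher's exact test on k-mers, a position weight matrix built from the seeds, and a scan of all k-mers) can be illustrated with a small Python sketch. This is not the authors' implementation: the record does not give the seed-selection threshold, the background model, or the way the matrix is built, so the function names, the one-sided test against a separate background set, the 0.05 cutoff, the uniform base frequencies, the pseudocount, and the toy sequences below are all assumptions made for illustration. Sequences are assumed to contain only A, C, G, and T.

```python
from collections import Counter
from math import comb, log2


def kmer_presence(seqs, k):
    """For each k-mer, count how many sequences contain it at least once."""
    counts = Counter()
    for s in seqs:
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return counts


def fisher_right_tail(a, b, c, d):
    """One-sided Fisher's exact test, P(X >= a), for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    tail = sum(comb(row1, x) * comb(n - row1, col1 - x)
               for x in range(a, min(row1, col1) + 1))
    return tail / comb(n, col1)


def find_seeds(fg, bg, k, alpha=0.05):
    """Step 1 (assumed form): k-mers enriched in foreground vs. background."""
    fg_counts, bg_counts = kmer_presence(fg, k), kmer_presence(bg, k)
    seeds = {}
    for kmer, a in fg_counts.items():
        c = bg_counts.get(kmer, 0)
        p = fisher_right_tail(a, len(fg) - a, c, len(bg) - c)
        if p < alpha:
            seeds[kmer] = p
    return seeds


def build_pwm(kmers, pseudocount=1.0):
    """Step 2 (assumed form): log-odds matrix against uniform base frequencies."""
    total = len(kmers) + 4 * pseudocount
    pwm = []
    for i in range(len(kmers[0])):
        col = Counter(kmer[i] for kmer in kmers)
        pwm.append({b: log2((col[b] + pseudocount) / total / 0.25)
                    for b in "ACGT"})
    return pwm


def score(pwm, kmer):
    """Sum of per-position log-odds weights for one k-mer."""
    return sum(col[b] for col, b in zip(pwm, kmer))


def scan(pwm, kmers, min_score):
    """Step 3 (assumed form): k-mers scoring at least min_score, best first."""
    hits = [(kmer, score(pwm, kmer)) for kmer in kmers]
    return sorted((h for h in hits if h[1] >= min_score), key=lambda h: -h[1])


# Hypothetical toy data: every foreground sequence carries GATA, no background one does.
fg = ["TTGATAAC", "CCGATATG", "AGATAGGC", "GGCGATAT", "GATACCTT", "TCAGATAG"]
bg = ["TTCCGGAA", "ACGTACGT", "CCTTGGAA", "GGAACCTT", "TGCATGCA", "AATTCCGG"]

seeds = find_seeds(fg, bg, k=4)
pwm = build_pwm(sorted(seeds))
hits = scan(pwm, kmer_presence(fg, 4), min_score=1.0)
print(seeds)
print(hits)
```

The hypergeometric tail is computed with exact integer binomials from the standard library, which is adequate for a toy example; for datasets of realistic size a statistics library would likely be preferable.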