Authors: CUI Yaxuan; ZHANG Shaoqiang [1]
Affiliation: [1] College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
Source: Computer Engineering and Applications (《计算机工程与应用》), 2022, No. 3, pp. 201-206 (6 pages)
Funding: National Natural Science Foundation of China (61572358); Key Project of the Natural Science Foundation of Tianjin (19JCZDJC35100).
Abstract: To address the high error rate of third-generation sequencing data and improve genome assembly accuracy, an assembly and error-correction algorithm called SuperLLEC is designed for PacBio long-read data, based on 10X Genomics linked-read sequencing data. Wtdbg2 assembles the PacBio long reads of a genome into scaffolds. Bowtie2 then aligns each linked read to these scaffolds, and the scaffolds are further assembled according to the linked-read barcodes. For each mismatched alignment site, Fisher's exact test predicts whether the site is a single nucleotide polymorphism (SNP) or a base mis-sequenced by PacBio. Comparison experiments on long-read and linked-read data from three groups of human cells show that SuperLLEC noticeably improves genome assembly accuracy, NG50 length, and the accuracy of SNP site prediction.
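The abstract names Fisher's exact test as the tool for judging mismatched sites but does not give the contingency table or the decision rule. The sketch below is therefore only an illustration under stated assumptions: it assumes a 2x2 table whose rows are the two read types (linked reads, PacBio long reads) and whose columns are counts of reads supporting the scaffold base versus the alternative base. The function names, the table layout, and the 0.05 threshold are all hypothetical and are not taken from SuperLLEC.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution. The p-value sums the probabilities of every table that is
    no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # The small relative tolerance guards against floating-point ties.
    p = sum(q for q in map(prob, range(lo, hi + 1)) if q <= p_observed * (1 + 1e-9))
    return min(1.0, p)


def allele_counts_differ(
    linked_ref: int,
    linked_alt: int,
    long_ref: int,
    long_alt: int,
    alpha: float = 0.05,  # hypothetical significance threshold
) -> bool:
    """Return True if the two read types disagree significantly at a site.

    ASSUMPTION: rows are read types, columns are scaffold-base vs.
    alternative-base support. This is not the table used in the paper.
    """
    return fisher_exact_two_sided(linked_ref, linked_alt, long_ref, long_alt) < alpha
```

How a significant or non-significant result maps onto "SNP" versus "PacBio sequencing error" is part of the paper's design and is not reproduced here. The sketch only shows the statistical test that such a decision would rest on.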
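Keywords: linked-read; long-read; scaffold; assembly; error correction; Fisher's exact test
Classification: TP391 [Automation and Computer Technology - Computer Application Technology]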