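Authors: WEI Hao (韦好); ZHONG Cheng (钟诚) [1] (School of Computer and Electronics and Information, Guangxi University, Nanning 530004, China)
Affiliation: [1] Guangxi Colleges and Universities Key Laboratory of Parallel and Distributed Computing Technology, School of Computer and Electronics and Information, Guangxi University
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2019, No. 8, pp. 1804-1808 (5 pages)
Funding: Supported by the National Natural Science Foundation of China (61462005) and the Guangxi Natural Science Foundation (2014GXNSFAA118396)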
Abstract: Biological sequence alignment helps locate similar regions between sequences. With the rapid development of sequencing technology, alignment algorithms must flexibly handle longer reads with higher error rates. This paper proposes an improved long-read alignment algorithm: it indexes the reference sequence with an enhanced sparse suffix array, adaptively adjusts the minimum seed length, finds maximal exact matches and super-maximal exact matches between the reference sequence and the reads, and extends seeds from these matches. Experiments on simulated and real data show that, compared with existing representative algorithms, the proposed algorithm achieves essentially the same accuracy while clearly improving recall, has higher sensitivity overall, and identifies more reads.
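The notion of a maximal exact match (MEM) used in the abstract can be illustrated with a minimal sketch. It assumes a plain, full suffix array built by naive sorting and a fixed minimum seed length; it is not the paper's enhanced sparse suffix array or its adaptive seed-length scheme, and the function names `build_suffix_array` and `find_mems` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def build_suffix_array(ref):
    """Return the start positions of all suffixes of ref in lexicographic order.

    Naive construction by sorting; fine for a short illustration only.
    """
    return sorted(range(len(ref)), key=lambda i: ref[i:])


def find_mems(ref, read, min_len):
    """Return sorted (ref_pos, read_pos, length) triples for every MEM
    between ref and read whose length is at least min_len.

    A match is maximal when it cannot be extended to the left or right
    without hitting a mismatch or the end of either sequence.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    sa = build_suffix_array(ref)
    # Length-min_len prefixes of the sorted suffixes are themselves sorted,
    # so they can be binary-searched for each seed taken from the read.
    keys = [ref[i:i + min_len] for i in sa]
    mems = []
    for q in range(len(read) - min_len + 1):
        seed = read[q:q + min_len]
        lo = bisect_left(keys, seed)
        hi = bisect_right(keys, seed)
        for r in sa[lo:hi]:
            # Not left-maximal: this match is part of one starting earlier.
            if r > 0 and q > 0 and ref[r - 1] == read[q - 1]:
                continue
            # Extend to the right until a mismatch or a sequence end.
            length = min_len
            while (r + length < len(ref) and q + length < len(read)
                   and ref[r + length] == read[q + length]):
                length += 1
            mems.append((r, q, length))
    return sorted(mems)


if __name__ == "__main__":
    print(find_mems("ACGTACGTTT", "TACGTA", 3))
```

Run as a script, this prints `[(0, 1, 5), (3, 0, 5)]`: the read's suffix `ACGTA` matches the reference at position 0, and its prefix `TACGT` matches at position 3. Lowering `min_len` admits shorter seeds, which is the parameter the abstract says the algorithm adjusts adaptively.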
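Keywords: sequence alignment; enhanced sparse suffix array; index; maximal exact match
Classification: TP301 [Automation and Computer Technology: Computer System Architecture]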