Aligning High Error Rate Reads Using Enhanced Sparse Suffix Array Index (cited by: 1)


Authors: WEI Hao; ZHONG Cheng [1] (School of Computer and Electronics and Information, Guangxi University, Nanning 530004, China)

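Affiliation: [1] School of Computer and Electronics and Information, Guangxi University; Guangxi Colleges and Universities Key Laboratory of Parallel and Distributed Computing Technology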

Source: Journal of Chinese Computer Systems, 2019, No. 8, pp. 1804-1808 (5 pages)

Funding: Supported by the National Natural Science Foundation of China (61462005) and the Guangxi Natural Science Foundation (2014GXNSFAA118396)

Abstract: Biological sequence alignment helps locate similar regions between sequences. Rapid advances in sequencing technology require alignment algorithms to handle longer reads with higher error rates flexibly. This paper proposes an improved long-read alignment algorithm: the reference sequence is indexed with an enhanced sparse suffix array, the minimum seed length is adjusted adaptively, and maximal exact matches and super-maximal exact matches between the reference sequence and the reads are found and used as seeds for extension. Experiments on simulated and real data show that, compared with existing representative algorithms, the proposed algorithm achieves essentially the same precision while markedly improving recall, is generally more sensitive, and can identify more reads.
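The seeding idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a plain sparse suffix array (every `k`-th suffix) without the auxiliary tables of an enhanced index, a fixed rather than adaptive minimum seed length, and it omits super-maximal filtering and seed extension into alignments. The names `build_sparse_sa`, `find_mems`, `k`, and `min_len` are hypothetical and chosen only for this illustration.

```python
def build_sparse_sa(ref, k):
    """Sort only every k-th suffix of ref (k is the sparseness factor)."""
    return sorted(range(0, len(ref), k), key=lambda p: ref[p:])


def _bound(ref, ssa, pat, strict):
    """First index in ssa whose len(pat)-prefix is >= pat (> pat if strict)."""
    lo, hi, t = 0, len(ssa), len(pat)
    while lo < hi:
        mid = (lo + hi) // 2
        pre = ref[ssa[mid]:ssa[mid] + t]
        if pre < pat or (strict and pre == pat):
            lo = mid + 1
        else:
            hi = mid
    return lo


def find_mems(ref, read, ssa, k, min_len):
    """Return sorted (ref_start, read_start, length) maximal exact matches."""
    if min_len < k:
        raise ValueError("min_len must be at least the sparseness factor k")
    # A match of length >= min_len covers a sampled suffix within its first
    # k positions, so a shorter probe of length t is enough to locate it.
    t = min_len - k + 1
    mems = set()
    for j in range(len(read) - t + 1):
        pat = read[j:j + t]
        lo = _bound(ref, ssa, pat, strict=False)
        hi = _bound(ref, ssa, pat, strict=True)
        for p in ssa[lo:hi]:
            r, q = p, j
            # Extend to the left until a mismatch or a sequence boundary.
            while r > 0 and q > 0 and ref[r - 1] == read[q - 1]:
                r -= 1
                q -= 1
            r_end, q_end = p + t, j + t
            # Extend to the right in the same way.
            while (r_end < len(ref) and q_end < len(read)
                   and ref[r_end] == read[q_end]):
                r_end += 1
                q_end += 1
            if r_end - r >= min_len:
                mems.add((r, q, r_end - r))
    return sorted(mems)
```

In this sketch a larger `k` shrinks the index roughly by a factor of `k` but shortens the probe length `t`, so more candidate hits must be extended and checked. That memory-versus-search trade-off is the usual motivation for sparse suffix arrays.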

Keywords: sequence alignment; enhanced sparse suffix array; index; maximal exact match

Classification code: TP301 [Automation and Computer Technology: Computer Systems Architecture]

 
