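Authors: LI Yao; ZHONG Cheng[1] (School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China)
Affiliation: [1] School of Computer, Electronics and Information, Guangxi University; Guangxi Colleges and Universities Key Laboratory of Parallel and Distributed Computing Technology, Nanning 530004, China
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2020, No. 9, pp. 1999-2005 (7 pages)
Funding: Supported by the National Natural Science Foundation of China (61962004).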
Abstract: The large volumes of short reads produced by next-generation sequencing platforms contain many repetitive subsequences, which makes the short-read alignment problem harder to solve. How genomic regions containing repetitive subsequences are handled affects subsequent genome analysis. Existing short-read alignment algorithms based on the de Bruijn graph either perform poorly or do not account for repetitive subsequences. To align reads that contain many repetitive subsequences, the proposed method first builds a hash index from the keys generated by a given shape layout, according to the predefined seeds. Second, a region selection method based on a gapped-seed search strategy searches the index to filter candidate positions, reducing both the number of candidates to be aligned and the search space. Third, a Hough transformation grouping operation aggregates seed hits into a coarse alignment, reducing the time needed for subsequent alignment validation. Finally, a compact de Bruijn graph structure compresses, stores, and indexes sequence fragments of length k (k-mers), reducing the storage space needed for alignment. Analysis and experimental results show that, compared with existing representative algorithms of the same kind, the proposed algorithm maintains or improves the percentage of correctly aligned reads while reducing running time and storage space; for sequences with a high repetition rate in particular, it achieves a higher percentage of correct alignments.
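Keywords: sequence alignment; gapped seeds; region selection; compact de Bruijn graph; high repetition rate
Classification code: TP301 [Automation and Computer Technology: Computer System Architecture]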
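
The first three steps described in the abstract (a hash index keyed by a shape layout, gapped-seed search for candidate positions, and grouping of seed hits) can be illustrated with a small sketch. This is not the authors' implementation. The function names, the shape string `"1101"`, and the toy sequences are all hypothetical, and the grouping step is a simplified stand-in for the Hough transformation: it only votes on lines of slope 1, that is, on the diagonal offset between reference position and read position. The compact de Bruijn graph is not covered.

```python
from collections import defaultdict


def gapped_key(seq, pos, shape):
    """Key at `pos` under `shape`: '1' positions are sampled, '0' positions are skipped.

    Returns None when the shape would run past the end of `seq`.
    """
    if pos + len(shape) > len(seq):
        return None
    return "".join(seq[pos + i] for i, c in enumerate(shape) if c == "1")


def build_index(reference, shape):
    """Hash index: gapped key -> list of reference positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - len(shape) + 1):
        index[gapped_key(reference, pos, shape)].append(pos)
    return index


def seed_hits(read, index, shape):
    """All (read_pos, ref_pos) pairs whose gapped keys match."""
    hits = []
    for rpos in range(len(read) - len(shape) + 1):
        key = gapped_key(read, rpos, shape)
        for gpos in index.get(key, ()):
            hits.append((rpos, gpos))
    return hits


def group_by_diagonal(hits, min_votes=2):
    """Vote on the diagonal (ref_pos - read_pos); keep diagonals with enough votes.

    Simplified stand-in for Hough-style grouping (slope fixed to 1).
    Returns (diagonal, votes) pairs, most-voted first.
    """
    votes = defaultdict(int)
    for rpos, gpos in hits:
        votes[gpos - rpos] += 1
    kept = [(d, v) for d, v in votes.items() if v >= min_votes]
    return sorted(kept, key=lambda dv: (-dv[1], dv[0]))


# Toy data (hypothetical): the read matches the reference at offset 6,
# except for one substitution at read position 2.
reference = "ACGTACGTTTGACCA"
read = "GTATGAC"
shape = "1101"

index = build_index(reference, shape)
hits = seed_hits(read, index, shape)
candidates = group_by_diagonal(hits)
```

Because the shape skips its third position, the seed starting at read position 0 still hits the reference despite the substitution, which is the usual motivation for gapped over contiguous seeds. Grouping by diagonal then discards isolated hits, so only well-supported candidate regions would be passed on to the more expensive alignment validation.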