Authors: Wen Huaming; Xu Yun[1,2]; Yang Jinbao
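Affiliations: [1] School of Computer Science & Technology, University of Science & Technology of China, Hefei 230027, China; [2] Key Laboratory of High Performance Computing of Anhui Province, Hefei 230027, China; [3] College of Informatics, Huazhong Agricultural University, Wuhan 430070, China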
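Source: Application Research of Computers (《计算机应用研究》), 2024, No. 7, pp. 2160-2164 (5 pages)
Funding: General Program of the National Natural Science Foundation of China (61672480); 111 Project of the State Administration of Foreign Experts Affairs (BP0719016).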
Abstract: Tandem repeat sequences are difficult segments in genome construction. Because the repeat units are highly similar to one another and their copy numbers are uncertain, sequence alignment often yields multiple candidate positions, and selecting the correct alignment position quickly and accurately is a challenge. Existing methods use seeds (short sequences selected from sequencing fragments) to locate and extend candidate alignment positions, but they do not take the characteristics of tandem repeat sequences into account when selecting seeds. This paper therefore proposes a position filtering method for tandem repeat sequence alignment, which filters alignment results by computing the similarity of rare kmer sequences (a kmer is a subsequence of length k). In addition, it merges rare kmers to speed up computation and uses fuzzy search based on edit distance to increase the information density of the filter. Experimental results show that, on simulated datasets, the method improves both the recall and the accuracy of alignment results while running about 2 times faster than existing methods, and that it has good parallel speedup.
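Keywords: tandem repeat; single-molecule real-time sequencing; sequence alignment; seed-and-extend method
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]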
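
Illustrative sketch (not from the paper): the abstract describes filtering candidate alignment positions by the similarity of rare kmers, with edit-distance fuzzy matching. The Python below is a minimal toy version of that general idea under stated assumptions. It assumes "rare" means a kmer that occurs at most once in the read, scores each candidate by the fraction of rare kmers found in the reference window, and omits the kmer-merging speedup entirely. All function names, parameters, and the toy sequences are hypothetical and are not taken from the authors' implementation.

```python
from collections import Counter

def kmers(seq, k):
    """Return all length-k substrings of seq, in positional order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def rare_kmers(seq, k, max_count=1):
    """Assumption: a kmer is 'rare' if it occurs at most max_count times in seq."""
    counts = Counter(kmers(seq, k))
    return [km for km in kmers(seq, k) if counts[km] <= max_count]

def edit_distance(a, b):
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]

def score(read, window, k, max_edits=1):
    """Fraction of the read's rare kmers that match some kmer of window
    exactly or within max_edits edits (a simple fuzzy lookup)."""
    rare = rare_kmers(read, k)
    if not rare:
        return 0.0
    win = set(kmers(window, k))
    hits = sum(
        1 for km in rare
        if km in win or any(edit_distance(km, w) <= max_edits for w in win)
    )
    return hits / len(rare)

def best_candidate(read, reference, candidates, k=5, max_edits=1):
    """Pick the candidate start position whose window scores highest."""
    return max(
        candidates,
        key=lambda pos: score(read, reference[pos:pos + len(read)], k, max_edits),
    )

# Toy reference: two identical "AT" repeat blocks separated by a unique insert.
reference = "ACGTTGCA" + "AT" * 10 + "GGCCTTAG" + "AT" * 10 + "CCGATCGA"
read = reference[22:42]      # spans the unique insert plus repeat flanks
candidates = [8, 22, 36]     # hypothetical positions proposed by a seeding step
print(best_candidate(read, reference, candidates))
```

In this toy case the kmers drawn from the repeat unit occur more than once in the read and are ignored, so only the kmers overlapping the unique insert decide between the three candidates. A brute-force fuzzy lookup like this is quadratic in the number of kmers and is only meant to make the idea concrete.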