基于Contig的双面基因组片段填充算法研究  

Research on Two-sided Genome Scaffold Filling Algorithm Based on Contig

在线阅读下载全文

作  者:柳楠[1] 李胜华 朱永琦 LIU Nan;LI Sheng-hua;ZHU Yong-qi(School of Computer Science and Technology,Shandong Jianzhu University,Jinan 250101,China)

机构地区:[1]山东建筑大学计算机科学与技术学院,山东济南250101

出  处:《计算机技术与发展》2022年第12期213-220,共8页Computer Technology and Development

基  金:国家自然科学基金(61902221);山东省自然科学基金(ZR2018MF012)。

摘  要:随着生物测序技术的快速发展,人们能够在更短的时间内获得大规模的基因片段序列,如何对不完整的基因组片段进行填充使其变得完整已经被越来越多的人关注。基于普通序列的双面基因组问题是将缺失基因X填充到一个不完整基因组片段A中,得到A',将缺失基因Y填充到一个不完整基因组片段B中,得到B',使得A'和B'之间的邻接数最大化。测序获得的基因组片段通常由片段重叠群(contig)组成,缺失基因只能插入到contig两端。该算法针对基因组片段重叠群一类实例,首先为了简化基因组片段,进行合并符合条件的连续缺失串操作,其后进行具有一一对应关系的type-2类型缺失串的插入操作;其次采用贪婪搜索策略对type-1类型缺失串进行插入操作,然后对type-3串进行插入操作。设计了多项式时间算法,证明了算法的正确性,分析了算法的时间复杂度,开发了算法测试平台,进一步验证了算法的有效性。With the rapid development of biological sequencing technology,large scale gene scaffold sequences can be obtained by humans in a shorter time.How to fill the incomplete genome scaffold to make them complete has attracted more and more attention.The two-sided genome problem based on common sequence is to fill the missing gene X into an incomplete genome scaffold A to get A',and fill the missing gene Y into an incomplete genome scaffold B to get B',so as to maximize the number of adjacency between A'and B'.The genomic scaffold obtained by sequencing are usually composed of scaffold overlap groups(contig),and the missing genes can only be inserted at both ends of contig.For the example of overlapping groups of genome segments,in order to simplify the genome fragment,the operation of merging the qualified continuous missing strings is carried out firstly,and then the insertion of the type-2 missing strings with one-to-one correspondence is carried out.Secondly,the greedy search strategy is used to insert the type-1 missing string,and then the type-3 string is inserted.A polynomial time algorithm is designed and proved in correctness.The time complexity of the algorithm is analyzed,and the algorithm test platform is developed to further verify the effectiveness of the algorithm.

关 键 词:基因组 多项式时间 贪婪搜索 片段重叠群 算法测试 邻接 

分 类 号:TP301[自动化与计算机技术—计算机系统结构]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象