Authors: CAO Yue (曹跃); XIA Yun (夏云)[1]; ZHENG Yuchi (郑渝池)[1]
Affiliations: [1] Department of Herpetology, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610041, China; [2] University of Chinese Academy of Sciences, Beijing 100049, China
Source: Sichuan Journal of Zoology (《四川动物》), 2018, No. 3, pp. 261-267 (7 pages)
Funding: National Natural Science Foundation of China (31372181; 31572243)
Abstract: Regions of animal mitochondrial genomes that have undergone tandem duplication followed by random loss carry multiple gene copies, pseudogenes, and numerous insertions and deletions. They are often hypervariable and therefore difficult to align and to use for building gene trees. In theory, alignment-free comparison methods (AFM) can summarize and visually present the relationships and similarities among such sequences, but to our knowledge relevant evaluations and applications are lacking. We evaluated 3 types of commonly used k-mer-based AFM on a system of intraspecific sequence variation in one such region, located around the origin of light-strand replication. From the frog species Quasipaa boulengeri, 19 sequences ranging from 583 bp to 695 bp were clustered using 28 AFM. For each method, substring lengths of k = 4, 6, 8, 10, 12, 14, 16, 18, and 20 bp were tried. Mitochondrial protein-coding sequences of 1,518 bp from the same individuals were used to reconstruct a maximum likelihood tree as the reference topology. The Robinson-Foulds distance between the reference and each AFM topology was calculated, and the major topological differences were recorded. At a particular k value, typically 8, half of the methods produced a tree that differed from the reference by only 2 nodes (11.8%). Some methods, however, performed poorly at every k value. A small k value of 4 was suited to resolving relationships among relatively divergent sequences. These findings demonstrate the feasibility of applying AFM to animal mitochondrial tandem duplication regions. The best-performing combinations of method and k value obtained here may suit similar systems. For other types of duplication and rearrangement systems, similar evaluations are recommended.
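As an illustration of what a k-mer-based alignment-free comparison involves, the following is a minimal Python sketch. It assumes one simple measure, the Euclidean distance between k-mer relative-frequency profiles; the abstract does not name the 28 methods evaluated, so this is not necessarily one of them. The function names (`kmer_profile`, `euclidean_distance`, `distance_matrix`) and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def kmer_profile(seq, k):
    """Return the relative frequency of each overlapping k-mer in seq."""
    seq = seq.upper()
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def euclidean_distance(p, q):
    """Euclidean distance between two k-mer frequency profiles.

    This is an assumed, illustrative measure, not necessarily one
    of the methods evaluated in the study.
    """
    keys = set(p) | set(q)
    return sqrt(sum((p.get(x, 0.0) - q.get(x, 0.0)) ** 2 for x in keys))


def distance_matrix(seqs, k):
    """Pairwise distances for a dict of {name: sequence}; no alignment needed."""
    profiles = {name: kmer_profile(s, k) for name, s in seqs.items()}
    return {
        (a, b): euclidean_distance(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


if __name__ == "__main__":
    # Hypothetical toy sequences of unequal length.
    toy = {"s1": "ACGTACGTAC", "s2": "ACGTACGTTC", "s3": "TTTTGGGGCCCC"}
    for k in (4, 6, 8):
        print(k, distance_matrix(toy, k))
```

The resulting pairwise distances could then be passed to any standard clustering procedure to obtain a tree for comparison with a reference topology. Because the profiles are built from substring counts, sequences of different lengths with many insertions and deletions can be compared directly.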