Comparison of DNA Sequences Based on Prefix Identifiers and Their Locations  Cited by: 1

Authors: 王代 (Wang Dai), 陆超 (Lu Chao)

Affiliation: [1] Liaoning Normal University, Dalian, Liaoning

Source: Open Journal of Nature Science (《自然科学》), 2021, No. 2, pp. 281-290 (10 pages)

Abstract: Molecular sequence comparison is the most basic and important problem in bioinformatics, and DNA sequence similarity analysis is an important research topic within it. Alignment-free methods are one approach to sequence comparison; they overcome the limitations of alignment-based methods and are faster to compute. This paper proposes an alignment-free method for sequence analysis that starts from the positions of prefix identifiers and uses information entropy. A prefix tree is built for each biological sequence to obtain its prefix identifiers. For each pair of sequences, the prefix identifiers they have in common are taken as the objects of study and their positions in the two sequences are extracted. The absolute value of the difference between these positions is treated as a random variable, and its information entropy is used to define a new similarity measure for DNA sequences, yielding an effective model. The mitochondrial DNA sequences of 70 mammals are used as the experimental data set, and the similarity distances obtained from the model are used to construct a phylogenetic tree. The classification given by this tree conforms to the current biological classification.
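The pipeline described in the abstract can be illustrated with a minimal Python sketch. Several points are assumptions, since the abstract does not give definitions: a "prefix identifier" is taken here to be the shortest substring starting at a position that occurs nowhere else in the sequence; a brute-force scan stands in for the prefix tree; and the function names `prefix_identifiers` and `position_difference_entropy` and the `max_len` cap are hypothetical. How the paper turns the entropy into a similarity distance is not stated in the abstract, so the sketch stops at the entropy.

```python
import math
from collections import Counter


def prefix_identifiers(seq, max_len=32):
    """Map each assumed prefix identifier of seq to its start position.

    Assumption: the identifier for position i is the shortest substring
    starting at i that occurs exactly once in seq. Positions with no
    unique substring within max_len characters get no identifier.
    """
    ids = {}
    n = len(seq)
    for i in range(n):
        for k in range(1, min(max_len, n - i) + 1):
            sub = seq[i:i + k]
            first = seq.find(sub)
            # Unique if no second (possibly overlapping) occurrence exists.
            if seq.find(sub, first + 1) == -1:
                ids[sub] = i
                break
    return ids


def position_difference_entropy(a, b):
    """Shannon entropy (in bits) of |pos_a - pos_b| over common identifiers.

    Returns None when the two sequences share no identifier.
    """
    ids_a = prefix_identifiers(a)
    ids_b = prefix_identifiers(b)
    common = ids_a.keys() & ids_b.keys()
    if not common:
        return None
    # Empirical distribution of the absolute position differences.
    diffs = Counter(abs(ids_a[s] - ids_b[s]) for s in common)
    total = sum(diffs.values())
    return sum((c / total) * math.log2(total / c) for c in diffs.values())
```

Under these assumptions, two identical sequences give an entropy of 0, because every common identifier has a position difference of 0. The brute-force scan is only suitable for short inputs; for full mitochondrial genomes a prefix tree or suffix-based index would be needed.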

Keywords: alignment-free method; similarity measure; phylogenetic tree; prefix identifier; information entropy

Classification No.: G63 [Culture and Science: Education]

 
