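Affiliation: [1] School of Computer and Communication, Hunan University, Changsha, Hunan 410082
Source: Computers and Applied Chemistry (《计算机与应用化学》), 2009, No. 11, pp. 1380-1384 (5 pages)
Funding: National Natural Science Foundation of China (10571019: Application of Mathematical Methods in Molecular Biology)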
Abstract: Traditional methods for measuring the distance between sequences require sequence alignment, which lets subjective factors alter the original state of the data, so the computed results vary from one analyst to another. This paper introduces several measures based on information theory and proposes a similar new measure, for which a mathematical basis is established. These measures compute the distance between sequences without alignment and without interference from subjective factors. The complete mitochondrial genome sequences of 20 placental mammals were selected, their pairwise distances were computed with each of these measures, and phylogenetic trees were then built with the NEIGHBOR method. The comparison shows that the tree built with the new method, in less time, is in no way inferior to those built with earlier methods. This provides a new method for studying differences among molecular sequences.

Traditional sequence distances require an alignment and are therefore not directly applicable to the problem of whole-genome phylogeny, where events such as rearrangements make full-length alignments impossible. This paper introduces the information-theoretic concepts and the algorithm used to compute the information probability distribution of a sequence. Some information-theory-based measures of the discrepancy between such distributions are also introduced, such as Kullback-Leibler entropy, cross entropy, and the FDOD function. A new sequence measure is then presented, built on the information-theoretic concept of Shannon information, together with a program to estimate this distance. The new measure does not need to align sequences to measure their distance and is not subject to interference from subjective factors. Some properties of the new measure are proved. The distance matrix of the whole mitochondrial genome sequences of 20 mammals is computed with each measure, and phylogenies are then constructed with NEIGHBOR. As the experiment shows, the new measure has lower time complexity, and the phylogeny constructed with it is the most credible. It is useful for studying the discrepancy of biological sequences.
Classification: O6 [Science: Chemistry]; TP393 [Automation and Computer Technology: Computer Application Technology]
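
The alignment-free idea in the abstract can be illustrated with a minimal sketch: estimate a probability distribution over k-mers for each sequence, then compare the two distributions with a symmetrised Kullback-Leibler divergence. This is not the paper's new measure, and it is not the FDOD function. The function names, the choice of k = 2, the ACGT alphabet, and the pseudocount smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product


def kmer_distribution(seq, k=2, alphabet="ACGT", pseudocount=1.0):
    """Return smoothed k-mer probabilities over every word in alphabet**k.

    The pseudocount (an assumption of this sketch) keeps every probability
    positive so that the logarithms below are always defined.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[w] for w in words) + pseudocount * len(words)
    return {w: (counts[w] + pseudocount) / total for w in words}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p)


def symmetric_kl(seq_a, seq_b, k=2):
    """Symmetrised KL divergence between the k-mer distributions of two sequences."""
    p = kmer_distribution(seq_a, k)
    q = kmer_distribution(seq_b, k)
    return kl_divergence(p, q) + kl_divergence(q, p)


if __name__ == "__main__":
    # Toy sequences, not data from the paper.
    print(symmetric_kl("ACGTACGTACGT", "AAAACCCCGGGG"))
```

Computing this value for every pair of sequences would give a distance matrix of the kind that a neighbor-joining program takes as input. No alignment step is involved, which is the property the abstract emphasises.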