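Affiliation: [1] College of Information Technology, Nanjing Forestry University, Nanjing, Jiangsu 210037
Source: 《微电子学与计算机》 (Microelectronics & Computer), 2011, No. 9, pp. 193-196, 201 (5 pages)
Funding: National Natural Science Foundation of China (30671639); Natural Science Foundation of Jiangsu Province (BK2009393); Jiangsu Province Qing Lan Project for Academic Leaders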
Abstract: A method for identifying repetitive DNA sequences based on frequent subtree mining is proposed. The method bypasses traditional sequence alignment. Sequences are first organized as a suffix tree; the suffix tree is then reduced so that it is better suited to subtree mining; finally, a frequent subtree mining algorithm is applied to the reduced tree. The algorithm directly identifies repeats that meet a set threshold, which avoids the time wasted in splicing together short repeats. A "secondary identification" technique lets the algorithm also identify fuzzy repeats well, which improves identification completeness. Experiments show that the algorithm improves time efficiency compared with mainstream algorithms, with the advantage most pronounced for longer repeats, while remaining highly comparable in identification completeness.
Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]
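The abstract describes organizing a sequence as a suffix tree and reporting repeats that meet a threshold directly, without alignment. The record gives no details of the reduced suffix tree, the frequent subtree mining step, or the "secondary identification" technique, so the sketch below does not reproduce them. It swaps in a much simpler stand-in: a naive, depth-limited suffix trie with occurrence counts, from which exact repeats meeting a minimum length and minimum frequency are read off. All names (`TrieNode`, `build_suffix_trie`, `find_repeats`, `min_len`, `min_count`, `max_depth`) are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Tuple


class TrieNode:
    """One node of a naive suffix trie (illustrative, not the paper's structure)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.count = 0  # number of suffixes whose path passes through this node


def build_suffix_trie(seq: str, max_depth: int) -> TrieNode:
    """Insert every suffix of seq, truncated to max_depth characters."""
    root = TrieNode()
    for start in range(len(seq)):
        node = root
        for ch in seq[start:start + max_depth]:
            node = node.children.setdefault(ch, TrieNode())
            node.count += 1  # one more occurrence of this prefix in seq
    return root


def find_repeats(
    seq: str, min_len: int, min_count: int, max_depth: int = 32
) -> List[Tuple[str, int]]:
    """Return (substring, occurrences) pairs for exact repeats.

    A substring is reported when it occurs at least min_count times
    (overlapping occurrences are counted), is at least min_len long, and
    cannot be extended to the right without dropping below min_count.
    """
    root = build_suffix_trie(seq, max_depth)
    results: List[Tuple[str, int]] = []

    def walk(node: TrieNode, prefix: str) -> None:
        extended = False
        for ch, child in node.children.items():
            if child.count >= min_count:
                extended = True
                walk(child, prefix + ch)
        # Report only where no frequent right-extension exists.
        if not extended and len(prefix) >= min_len:
            results.append((prefix, node.count))

    walk(root, "")
    return sorted(results)
```

For instance, `find_repeats("ACGTTTACGTGGACGT", min_len=4, min_count=3)` returns `[("ACGT", 3)]`. This trie takes time and memory proportional to the sequence length times `max_depth`, and it handles exact repeats only; fuzzy repeats, which the abstract addresses with its secondary identification step, are outside this sketch.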