A DNA Repeat Sequence Identification Method Based on Frequent Subtree Mining  (Cited by: 2)

Algorithm of Identification the DNA Repeat Sequence Based on Frequent Subtree Mining


Authors: ZHOU Liuliu (周溜溜)[1], YE Ning (业宁)[1], XU Sheng (徐昇)[1], YAN Minli (严敏利)[1]

Affiliation: [1] School of Information Technology, Nanjing Forestry University, Nanjing, Jiangsu 210037, China

Source: Microelectronics & Computer (《微电子学与计算机》), 2011, No. 9, pp. 193-196, 201 (5 pages)

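Funding: National Natural Science Foundation of China (30671639); Natural Science Foundation of Jiangsu Province (BK2009393); Jiangsu Province Qing Lan Project for Academic Leaders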

Abstract: This paper proposes a method for identifying repeat sequences in DNA based on frequent subtree mining. Instead of the traditional sequence alignment approach, the method organizes the sequence as a suffix tree, then reduces the suffix tree so that it is better suited to subtree mining, and finally applies a frequent subtree mining algorithm to the resulting trees. The algorithm directly identifies repeat sequences that satisfy a preset threshold, which avoids the time otherwise spent splicing short repeats together. A "secondary identification" technique lets the algorithm also recognize fuzzy repeats, improving the completeness of identification. Experiments show that the algorithm improves identification efficiency compared with mainstream algorithms, with the advantage most pronounced for longer repeats, while remaining highly comparable in identification completeness.
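The abstract does not give the details of the tree reduction, the mining algorithm, or the "secondary identification" step, so none of them can be reproduced here. As a rough illustration of the general idea only (index a sequence as a tree of suffixes, then read off the branches that are shared often enough), the following minimal Python sketch builds a depth-limited suffix trie with occurrence counts and reports exact repeats that meet a length and frequency threshold. It is not the authors' algorithm: the names `Node`, `build_suffix_trie`, `frequent_repeats`, and the parameters `min_len`, `min_count`, and `max_depth` are all hypothetical, and a plain trie is used in place of a true suffix tree for brevity.

```python
from typing import Dict, List, Tuple


class Node:
    """One trie node; count = number of suffixes that pass through it."""
    __slots__ = ("children", "count")

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.count = 0


def build_suffix_trie(seq: str, max_depth: int) -> Node:
    """Insert every suffix of seq, truncated to max_depth characters."""
    root = Node()
    for i in range(len(seq)):
        node = root
        for ch in seq[i:i + max_depth]:
            node = node.children.setdefault(ch, Node())
            node.count += 1  # one more occurrence of this substring
    return root


def frequent_repeats(seq: str, min_len: int, min_count: int,
                     max_depth: int = 32) -> List[Tuple[str, int]]:
    """Return (substring, occurrences) for exact repeats meeting both thresholds.

    Only substrings that cannot be extended to the right while staying
    frequent are kept, and any result that is a suffix of another result
    with the same count is dropped as redundant.
    """
    root = build_suffix_trie(seq, max_depth)
    found: List[Tuple[str, int]] = []

    def walk(node: Node, prefix: str) -> None:
        extended = False
        for ch, child in sorted(node.children.items()):
            if child.count >= min_count:
                extended = True
                walk(child, prefix + ch)
        if not extended and len(prefix) >= min_len:
            found.append((prefix, node.count))

    walk(root, "")
    return [
        (s, c) for s, c in found
        if not any(t != s and t.endswith(s) and tc == c for t, tc in found)
    ]


if __name__ == "__main__":
    print(frequent_repeats("ACGTACGTTTACGT", min_len=3, min_count=3))
```

Because the thresholds are applied while walking the tree, a long repeat is reported directly instead of being assembled from shorter matches, which is the property the abstract highlights. Overlapping occurrences are counted, and the trie needs memory proportional to sequence length times `max_depth`, so this sketch is suitable only for short inputs.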

Keywords: DNA sequence; repeat identification; frequent subtree mining

Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
