k-长DNA子序列计数算法研究  被引量:2

Research on Counting Algorithm of k-mer Occurrence in DNA Sequence

在线阅读下载全文

作  者:王树林[1] 王戟[1] 陈火旺[1] 张鼎兴[1] 

机构地区:[1]国防科技大学计算机学院

出  处:《计算机工程》2007年第9期40-42,共3页Computer Engineering

基  金:国家自然科学基金资助项目(60233020)

摘  要:基因组的结构与功能存在密切联系,其功能主要通过DNA子序列来表达,因此研究DNA序列结构对于生物信息学来说具有重要的意义。该文研究了k-长DNA子序列在DNA全序列中出现频数的计数问题,设计并实现了k-长DNA子序列内部计数算法和外部计数算法。该算法通过一个哈希函数把k-长DNA子序列映射为整数关键字从而把k-长DNA子序列出现频数的计数问题转化为整数关键字的重复计数问题,使得能够利用经典B树算法来解决k-长DNA子序列的出现频数计数问题。针对所要解决的问题提出3种改进措施以进一步提高算法的性能。There is a close relationship between the structures of whole genome and its functions which are expressed by its subsequences. Researching the structure of DNA sequence has a profound meaning to bioinformatics. The problem that all k-mers in whole genome are counted is researched. The internal and external algorithm which counts all k-mers occurrence in DNA sequences is designed and implemented. This algorithm translates the problem of counting all k-mers into the problem of counting integer keys with the help of a hash function which maps a k-mer to an integer, and it applies the classic B-tree algorithm to solve the problem of counting k-mers in DNA sequence. It proposes three measures to further improve the efficiency of the algorithm according to the feature of the counting problem.

关 键 词:k-长DNA子序列 DNA序列 B树 全基因组 

分 类 号:TP311[自动化与计算机技术—计算机软件与理论]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象