A lightweight index structure for biological sequence analysis  Cited by: 1

A new lightweight index SUA for biological sequence analysis


Authors: Wang Di (王镝)[1], Wang Guoren (王国仁)[1], Chen Baichen (陈白尘)[1], Wu Qingquan (吴青泉)[1], Wang Bin (王斌)[1], Han Donghong (韩冬红)[1]

Affiliation: [1] School of Information Science and Engineering, Northeastern University, Shenyang 110004, Liaoning, China

Source: Journal of Huazhong University of Science and Technology (Natural Science Edition), 2005, Issue z1, pp. 209-212, 225 (5 pages)

Funding: Supported by the National Natural Science Foundation of China (6027379; 60273074)

Abstract: Searching for repetitions is an important topic in biological sequence analysis, but the indexes currently used for it, such as the suffix tree, consume too much space. To address this bottleneck, the Succeeding Unit Array (SUA), a lightweight index structure, is proposed on the basis of an analysis of repetitions in DNA sequences. SUA is constructed using radix sorting and is also suitable for multi-sequence analysis. Theoretical analysis shows the advantage of both SUA and multi-sequence SUA in storage space. In the experiments, a sequence of length n requires only about 5n of storage for SUA, and SUA is also faster to construct than other indexes such as the suffix tree.

Keywords: DNA sequence; repetitions; Succeeding Unit Array (SUA)

Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]

 
