High-Throughput DNA Sequence Data Compression Method Based on Codebook Index Transformation (cited by: 1)


Authors: Tan Li (谭丽)[1], Sun Jifeng (孙季丰)[1]

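Affiliation: [1] School of Electronic and Information Engineering, South China University of Technology, Guangzhou, Guangdong 510641, China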

Source: Acta Electronica Sinica (《电子学报》), 2015, No. 5, pp. 1007-1013 (7 pages)

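Funding: Young Scientists Fund of the National Natural Science Foundation of China (No. 61202292); Natural Science Foundation of Guangdong Province (No. 9151064101000037)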

Abstract: A novel compression method for high-throughput DNA sequence data, based on codebook index transformation (CITD), is proposed. CITD first applies a codebook index transformation (CIT) model, which replaces the traditional representation of codebook index values with quaternary numbers written using the four standard base characters, together with a simple encoding scheme that delimits replaced and non-replaced substrings. It then uses the value of the information entropy to decide whether to apply the Burrows-Wheeler transform (BWT), and finally applies move-to-front (MTF) coding and Huffman entropy coding. Experimental results on several sequencing data sets show that, in most cases, CITD achieves better compression performance than the dedicated high-throughput DNA compression methods it is compared against in the paper.
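The stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the codebook is built, how replaced and non-replaced substrings are delimited, or what entropy rule gates the BWT, so those parts are either omitted or marked as assumptions. All function names, the `threshold` value, and the direction of the entropy test are hypothetical.

```python
import heapq
import math
from collections import Counter

BASES = "ACGT"  # digits 0..3 of the quaternary index representation


def index_to_bases(index: int, width: int) -> str:
    """Write a codebook index as a fixed-width base-4 number over A, C, G, T."""
    digits = []
    for _ in range(width):
        index, d = divmod(index, 4)
        digits.append(BASES[d])
    if index:
        raise ValueError("index does not fit in the given width")
    return "".join(reversed(digits))


def bases_to_index(s: str) -> int:
    """Inverse of index_to_bases."""
    value = 0
    for ch in s:
        value = value * 4 + BASES.index(ch)
    return value


def entropy(data: str) -> float:
    """Zero-order Shannon entropy in bits per symbol."""
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def bwt(data: str, eos: str = "$") -> str:
    """Naive BWT via sorted rotations; eos must not occur in data."""
    s = data + eos
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)


def mtf(data: str, alphabet: str) -> list:
    """Move-to-front: emit each symbol's current rank, then move it to the front."""
    table = list(alphabet)
    out = []
    for ch in data:
        i = table.index(ch)
        out.append(i)
        table.insert(0, table.pop(i))
    return out


def huffman_lengths(symbols) -> dict:
    """Huffman code length (in bits) for each distinct symbol."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def estimate_bits(data: str, threshold: float = 1.9) -> int:
    """Entropy-gated BWT, then MTF, then Huffman; returns payload size in bits.

    ASSUMPTION: the threshold value and the '<' direction are placeholders;
    the abstract only says that the entropy value decides whether BWT is used.
    """
    stream = bwt(data) if entropy(data) < threshold else data
    ranks = mtf(stream, "".join(sorted(set(stream))))
    lengths = huffman_lengths(ranks)
    return sum(lengths[r] for r in ranks)
```

For example, `index_to_bases(27, 4)` writes index 27 as `"ACGT"` (0·64 + 1·16 + 2·4 + 3), so a codebook reference stays within the same four-letter alphabet as the sequence itself. The size returned by `estimate_bits` excludes the Huffman table and any header, so it is only a rough way to compare the BWT and non-BWT paths.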

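Keywords: high-throughput DNA sequence; codebook index transformation model; block-sorting compression transform (BWT); move-to-front coding; information entropy; data compression algorithm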

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
