Compression Algorithm of DNA Sequences Based on Mixed Statistical Model


Authors: SUN Jifeng (孙季丰)[1], TONG Xueke (仝雪珂), TAN Li (谭丽)[1]

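Affiliation: [1] School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China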

Source: Journal of South China University of Technology (Natural Science Edition), 2014, No. 3, pp. 8-14 (7 pages)

Funding: Young Scientists Fund of the National Natural Science Foundation of China (61202292)

Abstract: This paper proposes a DNA sequence compression algorithm based on a mixed statistical model. Following the principle of the expert model algorithm (XM algorithm), a mixture of finite-context statistical models estimates the probability of each symbol in a DNA sequence, and these estimates are passed to an arithmetic coder that encodes the symbols of a standard DNA sequence data set. Experimental results show that (1) the mixed statistical model compresses better than a single finite-context model; (2) the proposed algorithm achieves a higher compression ratio than several other classical DNA sequence compression algorithms; (3) it compensates for the weaker results that the XM algorithm, a relatively advanced statistics-based method, obtains on some sequences of the standard data set; and (4) its performance on high-throughput DNA sequences still needs improvement.
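The idea summarized in the abstract, several finite-context models whose probability estimates are mixed before arithmetic coding, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the model orders, the additive smoothing constant `alpha`, the weight-decay factor `decay`, and the names `FiniteContextModel` and `estimate_code_length` are all assumptions made for illustration. The weights follow a simple XM-style rule that favors models which predicted recent symbols well. Instead of running a real arithmetic coder, the sketch sums the ideal code length `-log2(p)` per symbol, which an arithmetic coder approaches closely.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


class FiniteContextModel:
    """Order-k model: counts of each symbol seen after each length-k context."""

    def __init__(self, order, alpha=1.0):
        self.order = order
        self.alpha = alpha  # additive smoothing constant (assumed value)
        self.counts = defaultdict(lambda: [0] * len(ALPHABET))

    def _context(self, history):
        # The last `order` symbols; shorter at the start of the sequence.
        return history[-self.order:] if self.order > 0 else ""

    def predict(self, history):
        """Return a smoothed probability for each symbol of ALPHABET."""
        c = self.counts[self._context(history)]
        total = sum(c) + self.alpha * len(ALPHABET)
        return [(n + self.alpha) / total for n in c]

    def update(self, history, symbol):
        self.counts[self._context(history)][ALPHABET.index(symbol)] += 1


def estimate_code_length(seq, orders=(1, 2, 4, 8), decay=0.9):
    """Ideal code length in bits of `seq` under a mixture of models."""
    models = [FiniteContextModel(k) for k in orders]
    log_w = [0.0] * len(models)  # log2 weights, uniform at the start
    max_order = max(orders)
    total_bits = 0.0
    for i, symbol in enumerate(seq):
        history = seq[max(0, i - max_order):i]
        s = ALPHABET.index(symbol)
        preds = [m.predict(history) for m in models]
        # Normalize the weights (shifted by the maximum for stability).
        top = max(log_w)
        w = [2.0 ** (lw - top) for lw in log_w]
        z = sum(w)
        p_mix = sum(wi / z * p[s] for wi, p in zip(w, preds))
        total_bits += -math.log2(p_mix)
        # Reward each model by how well it predicted the actual symbol,
        # discounting older evidence by `decay`, then update its counts.
        for j, (m, p) in enumerate(zip(models, preds)):
            log_w[j] = decay * log_w[j] + math.log2(p[s])
            m.update(history, symbol)
    return total_bits
```

Dividing the returned value by `len(seq)` gives bits per base, which can be compared against the 2 bits per base of a fixed-length encoding of the four nucleotides.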

Keywords: XM algorithm; finite-context model; mixed statistical model

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
