Application of Entropy Coding Genome Sequence Based on Context Modeling


Author: CHEN Hui (陈慧)[1] (Dianchi College of Yunnan University, Kunming, Yunnan Province, 650228, China)

Affiliation: [1] Dianchi College of Yunnan University, Kunming, Yunnan 650228, China

Source: Science & Technology Information (《科技资讯》), 2021, No. 9, pp. 25-27 (3 pages)

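Funding: University-level key project "Digital Watermarking Technology Based on Wavelet Transform and Support Vector Machine" (Project No. 2019XZD06).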

Abstract: This paper introduces biological features and biological meaning into the compression of DNA sequence data and proposes a Context-modeling entropy coding technique for genome sequences based on bioinformatics features. The algorithm splits a DNA sequence according to the biological meaning of its components and regroups the pieces into four sets: coding sequences (CDS), intron sequences, RNA sequences, and the remaining sequences. Each set is preprocessed according to the specific biological characteristics of its sequences and then compressed with an entropy coding algorithm. Experimental results show that, on benchmark sequences, the algorithm achieves better compression performance than existing DNA sequence compression methods. In particular, for long sequences with clear bioinformatics features, it reaches a higher compression ratio in a relatively short time.
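The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, not the author's algorithm. It assumes a hypothetical annotation format of non-overlapping `(start, end, label)` tuples with labels `"CDS"`, `"intron"`, and `"RNA"`, regroups the sequence into the four sets, and estimates the size each set would reach under an adaptive order-k context model with an ideal entropy coder (the sum of `-log2 p` over all bases). The function names, the Laplace-smoothed counts, and the default context order are all illustrative assumptions. The set-specific preprocessing mentioned in the abstract is not modeled here.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: sequences contain only these four bases


def split_by_annotation(seq, annotations):
    """Regroup seq into four sets using hypothetical (start, end, label) tuples.

    Assumes half-open, non-overlapping intervals; unannotated bases go to "other".
    """
    parts = {"CDS": [], "intron": [], "RNA": [], "other": []}
    pos = 0
    for start, end, label in sorted(annotations):
        if start > pos:
            parts["other"].append(seq[pos:start])
        parts[label].append(seq[start:end])
        pos = end
    if pos < len(seq):
        parts["other"].append(seq[pos:])
    return {label: "".join(chunks) for label, chunks in parts.items()}


def context_model_bits(seq, order=2):
    """Ideal adaptive code length in bits under an order-`order` context model.

    Each context (the preceding `order` bases) keeps its own symbol counts,
    initialised to 1 (Laplace smoothing). An entropy coder such as an
    arithmetic coder would approach this total.
    """
    counts = defaultdict(lambda: [1] * len(ALPHABET))
    total_bits = 0.0
    for i, base in enumerate(seq):
        context = seq[max(0, i - order):i]
        symbol_counts = counts[context]
        idx = ALPHABET.index(base)
        total_bits += -math.log2(symbol_counts[idx] / sum(symbol_counts))
        symbol_counts[idx] += 1  # update the model after coding the symbol
    return total_bits


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    sequence = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG" * 4
    toy_annotations = [(0, 39, "CDS"), (39, 78, "intron"), (78, 100, "RNA")]
    for label, subseq in split_by_annotation(sequence, toy_annotations).items():
        bits = context_model_bits(subseq, order=2)
        print(f"{label}: {len(subseq)} bases, about {bits:.1f} bits")
```

Modeling each set separately lets the context statistics adapt to one kind of sequence at a time, which is the intuition behind regrouping before entropy coding. Whether this helps on real data depends on the sequences and on the preprocessing applied to each set.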

Keywords: genome sequence; Context modeling; entropy coding; set

Classification code: G64 [Culture and Science: Higher Education]

 
