Integrated Feature Mining Based Approach for Calling Genomic Deletions


Authors: ZHANG Xiao-dong [1], LING Cheng [1], GAO Jing-yang [1]

Affiliation: [1] College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China

Source: Computer Science (《计算机科学》), 2017, Issue 1, pp. 80-83 (4 pages)

Funding: Supported by the National Natural Science Foundation of China (61472026) and the Guangzhou Science and Technology Program (2014J4100081)

Abstract: With the application and development of high-throughput (next-generation) sequencing technology, sequencing-based methods for calling genomic deletions have proliferated. However, any single method remains limited in applicability and falls short in precision and sensitivity. To address these problems, an integrated approach for calling deletions is proposed that combines feature mining grounded in multiple detection theories with machine learning. First, several callers are applied to an individual's data, and their results are merged into an initial deletion call set. Next, sequence features of each deletion in the initial set are mined and extracted from the sequencing data according to multiple detection strategies. Finally, a machine learning model is trained to identify false-positive deletions and remove them from the initial set, yielding the final call set. Experiments on 1000 Genomes Project data show that, compared with a single caller such as Pindel or SVseq2, the proposed approach achieves both higher precision and higher sensitivity; compared with directly merging the call sets of multiple callers, it significantly improves precision at the cost of a slight loss in sensitivity.
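The pipeline described in the abstract (merge per-caller results into an initial set, then filter out likely false positives and compare precision and sensitivity) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the `Deletion` type, the 50% reciprocal-overlap merging rule, the toy coordinates, and above all the filter, which is a simple "supported by at least two callers" rule standing in for the trained machine learning model that the paper actually uses on mined sequence features.

```python
from typing import NamedTuple


class Deletion(NamedTuple):
    chrom: str
    start: int  # 0-based start, inclusive
    end: int    # end, exclusive


def reciprocal_overlap(a, b):
    """Overlap length divided by the longer call's length (0.0 if disjoint)."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return overlap / max(a.end - a.start, b.end - b.start)


def merge_call_sets(call_sets, min_ro=0.5):
    """Merge per-caller deletion lists into an initial call set.

    Returns a list of [representative_deletion, set_of_supporting_callers].
    The first call seen for a locus is kept as its representative.
    """
    merged = []
    for caller, calls in call_sets.items():
        for call in calls:
            for entry in merged:
                if reciprocal_overlap(entry[0], call) >= min_ro:
                    entry[1].add(caller)
                    break
            else:  # no existing entry matched this call
                merged.append([call, {caller}])
    return merged


def precision_sensitivity(calls, truth, min_ro=0.5):
    """Return (precision, sensitivity) of calls against a truth set."""
    def hit(x, others):
        return any(reciprocal_overlap(x, y) >= min_ro for y in others)

    true_calls = sum(1 for c in calls if hit(c, truth))
    found_truth = sum(1 for t in truth if hit(t, calls))
    precision = true_calls / len(calls) if calls else 0.0
    sensitivity = found_truth / len(truth) if truth else 0.0
    return precision, sensitivity


# Toy data (hypothetical coordinates; caller names are labels only).
call_sets = {
    "pindel": [Deletion("chr1", 1000, 2000), Deletion("chr1", 5000, 5600),
               Deletion("chr2", 300, 900)],
    "svseq2": [Deletion("chr1", 1010, 1990), Deletion("chr1", 8000, 8400)],
}
truth = [Deletion("chr1", 1000, 2000), Deletion("chr1", 8000, 8400)]

initial = merge_call_sets(call_sets)
union_calls = [d for d, _ in initial]
# Placeholder filter: NOT the paper's classifier, just a caller-support rule.
filtered_calls = [d for d, callers in initial if len(callers) >= 2]

union_stats = precision_sensitivity(union_calls, truth)
filtered_stats = precision_sensitivity(filtered_calls, truth)
print("union    (precision, sensitivity):", union_stats)
print("filtered (precision, sensitivity):", filtered_stats)
```

On this toy input the direct union keeps every true deletion but also every false call, while the filtered set trades sensitivity for precision. A trained classifier over richer features is intended to make that trade far less costly than this crude support rule does.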

Keywords: deletion variants; feature mining; integrated detection

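Classification numbers: TP391 [Automation and Computer Technology: Computer Application Technology]; Q523 [Automation and Computer Technology: Computer Science and Technology]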

 
