Authors: HE Heng [1,2]; CHENG Kaili; ZHANG Kui; CHENG Shujun [3]
Affiliations: [1] School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430065, Hubei, China; [2] Key Laboratory of Intelligent Information Processing and Real Time Industrial Systems in Hubei Province, Wuhan 430065, Hubei, China; [3] School of Computing, Beijing University of Posts and Telecommunications, Beijing 100876, China
Source: Computer Engineering (《计算机工程》), 2025, No. 5, pp. 177-187 (11 pages)
Funding: National Natural Science Foundation of China (62372343, 61602351).
Abstract: Copy Number Variation (CNV) is a type of genetic variation that is widely distributed across the genes of the human genome. Improving the efficiency of CNV detection can give more patients faster and more accurate results, significantly reduce medical costs, and facilitate drug development and clinical application. Methods based on Read Depth (RD) are currently the most commonly used for CNV detection, but processing RD-related information is slow and accounts for a large share of total detection time. Existing methods cannot be applied effectively to whole-genome analysis and suffer from low computational efficiency and reduced detection accuracy. Building on RD-based CNV detection, this paper proposes EPPCNV, an efficient parallel processing scheme for sequencing data. In EPPCNV, two MapReduce jobs executed in sequence process whole-genome sequencing data in parallel and accurately extract RD-related information. To account for the effect of GC-content bias on CNV detection results, EPPCNV corrects the RDs of the sequencing data, which ensures high sensitivity and precision in the final results. EPPCNV also adopts a highly adaptable data processing approach that is independent of any specific CNV detection method: the RD-related information it generates can be combined directly with various mainstream CNV detection methods, substantially improving their overall performance without changing how the original method determines CNV regions. Experimental results show that EPPCNV achieves high overall accuracy and, when combined directly with each of CNV-LOF, HBOS-CNV, and CNVnator, significantly improves the computational efficiency of the original method while maintaining high sensitivity and precision. The higher the coverage depth and the larger the data volume of the sequencing data, the greater the efficiency gain from combining a CNV detection method with EPPCNV.
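The abstract names two steps, extracting per-region read depth and correcting it for GC-content bias, without giving formulas. The following is a minimal single-machine sketch of those two ideas, not the EPPCNV implementation. It assumes fixed-size bins and a widely used correction that rescales each bin's RD by the ratio of the global mean RD to the mean RD of bins with the same GC percentage; the record does not confirm that the paper uses this formula. The function names `bin_read_depth` and `gc_correct` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def bin_read_depth(read_starts, bin_size, n_bins):
    """Count reads per fixed-size bin (the kind of per-key aggregation
    a map/reduce job would perform; done serially here)."""
    depth = [0] * n_bins
    for pos in read_starts:
        idx = pos // bin_size
        if 0 <= idx < n_bins:
            depth[idx] += 1
    return depth


def gc_correct(depths, gc_percent):
    """Assumed correction: RD * (global mean RD / mean RD of bins
    sharing the same GC percentage)."""
    global_mean = mean(depths)
    by_gc = defaultdict(list)
    for rd, gc in zip(depths, gc_percent):
        by_gc[gc].append(rd)
    gc_mean = {gc: mean(values) for gc, values in by_gc.items()}
    corrected = []
    for rd, gc in zip(depths, gc_percent):
        m = gc_mean[gc]
        # Leave the value unchanged when the GC group has no coverage.
        corrected.append(rd * global_mean / m if m > 0 else float(rd))
    return corrected


if __name__ == "__main__":
    depths = bin_read_depth([0, 5, 12, 18, 19, 35], bin_size=10, n_bins=4)
    print(depths)
    print(gc_correct([10, 20, 30, 40], [40, 40, 60, 60]))
```

In this sketch the correction pulls every GC group toward the same mean depth, so the remaining differences between bins are less likely to reflect GC content alone.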
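Keywords: copy number variation detection; MapReduce jobs; sequencing data processing; read depth; whole genome
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]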