Research on Soybean Genotype Imputation Based on a GPU-Accelerated Random Forest Algorithm (cited by: 1)


Authors: LI Mingliang; LI Zhuo; HUANG Bin; YU Jun[1]; XIN Peng[1]; ZHANG Jicheng[3]; TANG You (School of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin 132022, China; Electrical and Information Engineering College, Jilin Agricultural Science and Technology University, Jilin 132101, China; College of Electronic and Information, Northeast Agricultural University, Harbin 150030, China)

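Affiliations: [1] School of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin, Jilin 132022; [2] Electrical and Information Engineering College, Jilin Agricultural Science and Technology University, Jilin, Jilin 132101; [3] College of Electronic and Information, Northeast Agricultural University, Harbin, Heilongjiang 150030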

Source: Soybean Science (《大豆科学》), 2023, No. 6, pp. 742-748 (7 pages)

Funding: Jilin Province Science and Technology Development Plan Project (YDZJ202201ZYTS692).

Abstract: Genotype imputation (GI) is a technique that uses existing genotype information to infer unobserved or incomplete genotypes. This study explores efficient imputation methods for handling incomplete data in soybean genome sequencing, with the goal of improving data processing speed and efficiency. Real soybean reference panel genotype data were used, and genotypes were set to missing completely at random at rates of 2%, 5%, 10%, and 25%. A GPU-accelerated random forest machine learning algorithm was used to build imputation models and fill in the missing data at each missing rate. The accuracy and performance obtained on different processors were also compared. The results show that the GPU-accelerated random forest algorithm achieves excellent imputation accuracy on the soybean genome. Compared with mainstream genotype imputation software, the method is at least four times faster in computation time. The GPU-accelerated genotype imputation strategy can therefore be applied to large-scale genotype data processing, improving the speed and efficiency of soybean genotype data processing while reducing computation time and resource consumption.
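The evaluation protocol described in the abstract (mask known genotypes completely at random at a given rate, impute them, then score the imputed calls against the hidden truth) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions: the function names `mask_mcar`, `impute_mode`, and `concordance` are hypothetical, the genotype coding 0/1/2 and the toy data are made up, and the per-marker majority imputer is a simple stand-in, not the GPU-accelerated random forest used in the study. A random forest classifier would take the place of `impute_mode`.

```python
import random
from collections import Counter

MISSING = None  # placeholder for an unobserved genotype call (assumption)


def mask_mcar(genotypes, rate, seed=0):
    """Hide a fraction `rate` of cells completely at random.

    Returns a masked copy of the matrix and the list of hidden
    (sample, marker) positions; the input matrix is left unchanged.
    """
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(genotypes))
             for j in range(len(genotypes[0]))]
    masked_cells = rng.sample(cells, round(rate * len(cells)))
    masked = [row[:] for row in genotypes]
    for i, j in masked_cells:
        masked[i][j] = MISSING
    return masked, masked_cells


def impute_mode(masked):
    """Stand-in imputer: fill each gap with the marker's most common call."""
    filled = [row[:] for row in masked]
    for j in range(len(masked[0])):
        observed = [row[j] for row in masked if row[j] is not MISSING]
        # Fall back to 0 if no call was observed at this marker.
        mode = Counter(observed).most_common(1)[0][0] if observed else 0
        for row in filled:
            if row[j] is MISSING:
                row[j] = mode
    return filled


def concordance(truth, imputed, masked_cells):
    """Fraction of hidden cells whose imputed call equals the true call."""
    if not masked_cells:
        return float("nan")
    hits = sum(truth[i][j] == imputed[i][j] for i, j in masked_cells)
    return hits / len(masked_cells)


if __name__ == "__main__":
    # Toy genotype matrix (40 samples x 50 markers), not data from the paper.
    rng = random.Random(42)
    toy = [[rng.choice([0, 0, 0, 1, 2]) for _ in range(50)] for _ in range(40)]
    for rate in (0.02, 0.05, 0.10, 0.25):  # missing rates named in the abstract
        masked, cells = mask_mcar(toy, rate, seed=1)
        filled = impute_mode(masked)
        score = concordance(toy, filled, cells)
        print(f"missing rate {rate:.0%}: concordance {score:.3f}")
```

Scoring only the masked cells keeps the accuracy estimate from being inflated by genotypes that were never hidden, which is why `mask_mcar` returns the hidden positions alongside the masked matrix.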

Keywords: soybean genotype imputation; random forest algorithm; GPU acceleration; data processing

Classification: S565.1 [Agricultural Sciences: Crop Science]; TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
