Impact of database choice and confidence score on the performance of taxonomic classification using Kraken2  


Authors: Yunlong Liu, Morteza H. Ghaffari, Tao Ma, Yan Tu

Affiliations: [1] Key Laboratory of Feed Biotechnology of the Ministry of Agriculture and Rural Affairs, Institute of Feed Research, Chinese Academy of Agricultural Sciences, Beijing 100081, China; [2] Institute of Animal Science, Physiology Unit, University of Bonn, Bonn 53115, Germany

Source: aBIOTECH (Chinese title 生物技术通报(英文版), "Biotechnology Bulletin, English edition"), 2024, Issue 4, pp. 465-475 (11 pages)

Funding: Supported by the Central Public-Interest Scientific Institution Basal Research Fund of the Chinese Academy of Agricultural Sciences (Y2022QC10) and the Agricultural Sciences and Technology Innovation Program of the Chinese Academy of Agricultural Sciences (CAAS-IFRZDRW202404, CAAS-ASTIP-2023-IFR-04).

Abstract: Accurate taxonomic classification is essential for understanding microbial diversity and function through metagenomic sequencing. However, two factors complicate this task: the vast diversity of microbial genomes and the computational limitations of bioinformatics tools. The aim of this study was to evaluate the impact of reference database selection and confidence score (CS) settings on the performance of Kraken2, a widely used k-mer-based metagenomic classifier. We generated simulated metagenomic datasets to evaluate systematically how key performance metrics of Kraken2 are affected by two choices: the reference database, ranging from the compact Minikraken v1 to the expansive nt and GTDB r202, and the CS value, ranging from 0 to 1.0. These metrics include classification rate, precision, recall, F1 score, and the accuracy of estimated versus true bacterial abundance. Our results show that higher CS values, which make taxonomic classification more stringent by requiring greater k-mer agreement, generally decrease the classification rate. This effect is particularly pronounced for smaller databases such as Minikraken and Standard-16, for which no reads could be classified when the CS was above 0.4. In contrast, for larger databases such as Standard, nt, and GTDB r202, precision and F1 scores improved significantly with increasing CS, highlighting their robustness under stringent conditions. Recovery rates were mostly stable, indicating consistent detection of species across CS settings. Crucially, the results show that a comprehensive reference database combined with a moderate CS (0.2 or 0.4) significantly improves classification accuracy and sensitivity. This finding underscores the need to choose database and CS parameters carefully. These choices should be tailored to the specific scientific question and the available computational resources to optimize the results of metagenomic analyses.
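The confidence score is the parameter the study varies. The Kraken2 manual describes how it works. A read's score for a candidate label is C/Q. Here C is the number of the read's k-mers whose LCA falls in the clade rooted at that label, and Q is the number of k-mers without ambiguous bases. If the score is below the threshold, the label moves to its parent until the threshold is met. If no ancestor qualifies, the read is left unclassified.

The Python sketch below illustrates only that thresholding step, under several assumptions:
- The taxonomy is a hypothetical three-node parent map.
- The initial label is supplied rather than computed.
- The k-mer hits are modeled on the manual's `562:13 561:4 A:31 0:1 562:3` mapping example, with `A` marking ambiguous k-mers and `0` marking k-mers absent from the database.
- The comparison is assumed to be "score >= threshold".

Function names are illustrative and are not taken from Kraken2's source.

```python
# Minimal sketch of Kraken2-style confidence thresholding (not Kraken2's source).
# Hypothetical toy taxonomy: child taxid -> parent taxid; 1 is the root.
PARENT = {562: 561, 561: 543, 543: 1}

# (taxid, k-mer count) runs for one read; "A" = ambiguous k-mers, 0 = not in database.
HITS = [(562, 13), (561, 4), ("A", 31), (0, 1), (562, 3)]


def in_clade(node, ancestor, parent):
    """True if `node` lies in the clade rooted at `ancestor`."""
    while True:
        if node == ancestor:
            return True
        if node not in parent:  # reached the root, or taxid 0 (no parent)
            return False
        node = parent[node]


def confidence(label, hits, parent):
    """Score C/Q: k-mers in the label's clade over all non-ambiguous k-mers."""
    queried = [(t, n) for t, n in hits if t != "A"]  # ambiguous k-mers are not queried
    q = sum(n for _, n in queried)
    c = sum(n for t, n in queried if in_clade(t, label, parent))
    return c / q if q else 0.0


def apply_threshold(label, hits, parent, threshold):
    """Climb from `label` toward the root until the score meets `threshold`.

    Returns the accepted taxid, or 0 (unclassified) if no ancestor qualifies.
    The ">=" comparison is an assumption of this sketch.
    """
    node = label
    while True:
        if confidence(node, hits, parent) >= threshold:
            return node
        if node not in parent:
            return 0
        node = parent[node]


if __name__ == "__main__":
    for cs in (0.0, 0.4, 0.8, 0.96):
        print(cs, apply_threshold(562, HITS, PARENT, cs))
```

With these toy hits, the label 562 scores 16/21 (about 0.76) and its parent 561 scores 20/21 (about 0.95). So thresholds of 0.0 and 0.4 keep 562, 0.8 moves the read to 561, and 0.96 leaves it unclassified.

The formula suggests one plausible reading of the small-database result. K-mers missing from a compact database still count in Q but never in C. Their scores therefore drop, and reads fall out as the CS rises.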
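The abstract does not define how classification rate, precision, recall, and F1 were computed. This sketch assumes one common convention:
- Classification rate is the fraction of reads that receive any label, with `0` marking an unclassified read as in Kraken2 output.
- Precision and recall compare the set of species the classifier reports with the set actually present in the simulated community, at the species level and by presence/absence.

The Python sketch below uses hypothetical function names and toy species lists. It is not the authors' evaluation code.

```python
def classification_rate(labels):
    """Fraction of reads assigned any taxon; 0 marks an unclassified read."""
    if not labels:
        return 0.0
    return sum(1 for taxid in labels if taxid != 0) / len(labels)


def precision_recall_f1(true_species, predicted_species):
    """Species-level presence/absence metrics (an assumed convention)."""
    truth, pred = set(true_species), set(predicted_species)
    tp = len(truth & pred)  # species both present and reported
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    truth = ["Escherichia coli", "Bacillus subtilis",
             "Staphylococcus aureus", "Pseudomonas aeruginosa"]
    reported = ["Escherichia coli", "Bacillus subtilis", "Staphylococcus aureus",
                "Klebsiella pneumoniae", "Salmonella enterica"]
    print(precision_recall_f1(truth, reported))
    print(classification_rate([562, 0, 561, 0]))
```

In this toy example, three of the five reported species are real and three of the four real species are found. That gives precision 0.6, recall 0.75, and F1 of about 0.67. Two of the four labels are classified, giving a classification rate of 0.5.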

Keywords: Metagenome; Taxonomic classification; Kraken2; Reference database; Confidence score

CLC number: Q78 [Biology: Molecular Biology]
