The impacts of reference database selection, indicator threshold determination and target data preparation on the sequence data analysis of eDNA monitoring: Taking fish as the target in the middle Yangtze River


Authors: Xu Lanxin, Yang Haile, Liu Zhigang[1], Du Hao[1]

Affiliations: [1] Key Laboratory of Freshwater Biodiversity Conservation, Ministry of Agriculture and Rural Affairs, Yangtze River Fisheries Research Institute, Chinese Academy of Fishery Sciences, Wuhan 430223, P.R. China; [2] Wuxi Fisheries College, Nanjing Agricultural University, Wuxi 214000, P.R. China

Source: Journal of Lake Sciences, 2024, No. 6, pp. 1843-1852 (10 pages)

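Funding: Jointly supported by the Fundamental Research Funds for Central Public-interest Scientific Institutions (YFI202201) and the Agricultural Financial Special Project "Special Project for Routine Monitoring after the Yangtze River Fishing Ban" (CJJC-2023-01).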

Abstract: In meta-barcoding-based eDNA monitoring, the analysis and annotation of eDNA sequence data are the foundation for accurate and reliable monitoring results. Reference database selection, indicator threshold determination and target data preparation are the three most critical technical steps in this analysis and annotation. To clarify the impacts of these three steps and to provide scientific support for standardizing eDNA monitoring technology, this study used two sets of COI gene sequence data from eDNA monitoring in the middle reach of the Yangtze River. It designed three sets of experiments on fish detection to test: 1) the impacts of different reference databases and species annotation algorithms on the annotation results; 2) the impacts of different OTU clustering sequence similarity thresholds and species annotation classification confidence thresholds (sequence identity and sequence coverage) on the annotation results; and 3) the impacts of the sequence richness of each species in the target data on the annotation results.

The results showed the following. 1) Under the Blast algorithm, the species annotated against three versions of the NCBI nt library were largely consistent (72%-78%), and so were those annotated against two local sequence reference libraries (91%-96%). Across all five reference libraries, 52%-68% of the annotated species were consistent. With the nt libraries, the species annotated by the RDP Classifier algorithm covered more than 95% of the species annotated by Blast but were 151%-443% more numerous, and most of the additional species were misannotations. With the local reference libraries, the species annotated by RDP Classifier covered 66%-85% of those annotated by Blast, and several results were annotated only to the family or genus level. 2) An OTU clustering sequence similarity threshold of 0.999 yielded 154%-209% more OTUs than a threshold of 0.99, and 240%-490% more OTUs annotated as fish. For the annotation classification confidence threshold (Blast algorithm; sequence identity and sequence coverage), values from 0.8 to 0.99 produced largely consistent species compositions (more than 94%) and OTU compositions (more than 83%). At a threshold of 0.7, both the species and OTU compositions differed considerably from those obtained at 0.8 and above. 3) With an OTU clustering sequence similarity threshold of 0.999 and an annotation classification confidence threshold of 0.9, the multi-sequence data yielded the most fish species and OTUs and the highest species annotation accuracy (81.49%). These were 7% more, 215% more and 5% higher, respectively, than the corresponding values for the single-sequence data.

In the analysis and annotation of specific eDNA sequence data, by establishing and improving local reference databases and optimizing the OTU clustering sequence similarity and species annotation …
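The abstract reports annotation outcomes under Blast classification confidence thresholds from 0.7 to 0.99, applied to both sequence identity and sequence coverage. It expresses results as percentage agreement between species sets and as percentage increases. The minimal Python sketch below shows one way such a threshold filter and comparison could be computed. It is not the authors' pipeline. It assumes Blast hits have already been parsed into records carrying percent identity and query coverage, for example from BLAST+ tabular output that includes the `pident` and `qcovs` fields. The `Hit` record, the function names, the top-hit tie-break and the use of the Jaccard index as the "consistency" measure are all hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One parsed Blast hit for an OTU (hypothetical record layout)."""
    otu_id: str
    species: str
    pident: float  # percent identity, 0-100
    qcov: float    # query coverage, 0-100


def best_hits(hits, threshold):
    """Per OTU, keep the top hit whose identity AND coverage both reach threshold (0-1)."""
    best = {}
    for h in hits:
        if h.pident / 100 < threshold or h.qcov / 100 < threshold:
            continue  # fails the classification confidence cut-off
        cur = best.get(h.otu_id)
        # Assumed tie-break: rank surviving hits by identity, then coverage.
        if cur is None or (h.pident, h.qcov) > (cur.pident, cur.qcov):
            best[h.otu_id] = h
    return best


def species_set(best):
    """Species names assigned to the retained OTUs."""
    return {h.species for h in best.values()}


def agreement(a, b):
    """Shared species over the union (Jaccard index); an assumed consistency metric."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def percent_more(new, old):
    """Relative increase in percent, e.g. OTU counts at 0.999 vs 0.99 similarity."""
    return (new - old) * 100 / old


if __name__ == "__main__":
    # Invented toy hits, not data from the study.
    hits = [
        Hit("OTU1", "species_A", 99.5, 100.0),
        Hit("OTU1", "species_D", 98.0, 100.0),
        Hit("OTU2", "species_B", 92.0, 98.0),
        Hit("OTU3", "species_C", 75.0, 95.0),
    ]
    sets = {t: species_set(best_hits(hits, t)) for t in (0.7, 0.8, 0.9, 0.99)}
    for t, s in sets.items():
        print(t, sorted(s))
    print("0.7 vs 0.8 agreement:", agreement(sets[0.7], sets[0.8]))
```

In this toy input, raising the threshold from 0.7 to 0.8 drops the low-identity OTU3 hit, so the two species sets differ. Thresholds of 0.8 and 0.9 give the same set, loosely mirroring the kind of stability the abstract describes.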

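Keywords: environmental DNA; fish; meta-barcoding; reference database; OTU clustering sequence similarity; species annotation classification confidence; middle Yangtze River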

Classification number: S932.4 [Agricultural Science: Fishery Resources]

 
