Molecular identification of Dalbergia species based on ITS barcode with machine learning approaches


Authors: KUANG Jiarong, LIU Qiaozhen, DAI Jiangpeng, TAN Zhijie, LIN Yuexia, GAO Xiaoxia, ZHU Shuang

Affiliations: [1] School of Life Sciences and Biopharmaceutics, Guangdong Pharmaceutical University, Guangzhou 510006, Guangdong, China; [2] School of Pharmacy, Guangdong Pharmaceutical University, Guangzhou 510006, Guangdong, China

Source: Chinese Traditional and Herbal Drugs (《中草药》), 2024, No. 11, pp. 3825-3834 (10 pages)

Funding: General Program of the Natural Science Foundation, Guangdong Basic and Applied Basic Research Foundation (2022A1515011268).

Abstract: Objective To improve the success rate of species identification in Dalbergia, machine learning methods were compared with the traditional distance-based and phylogenetic tree-based methods in order to select the optimal ITS barcode analysis method. Methods A total of 402 ITS sequences representing 96 Dalbergia species were used: 3 sequences obtained experimentally in this study and 399 downloaded from NCBI. With ITS as the molecular marker, the identification success rates of the distance method, the phylogenetic tree method and machine learning methods were compared for Dalbergia species. Results Across the machine learning methods, the average identification success rate for Dalbergia species was 39.59%. BLOG identified 42 Dalbergia species, with a correctly classified sequence percentage of 95.75%. SMO, Naïve Bayes, JRip and J48 identified 34 species, with correctly classified sequence percentages of 79.10%, 58.71%, 72.64% and 76.37%, respectively. The phylogenetic tree-based and distance-based methods achieved identification success rates of 28.13% and 36.46%, respectively. Conclusion ITS barcode identification of the botanical origin of Dalbergia based on machine learning achieves a higher identification success rate and greater socio-economic efficiency than the distance-based and phylogenetic tree-based methods. Machine learning methods based on the ITS barcode are recommended as the first choice for identifying the botanical origin of Dalbergia species.
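The abstract does not state which distance model or decision rule the distance method used. As a rough illustration only, the sketch below assumes uncorrected p-distances on pre-aligned sequences and a leave-one-out nearest-neighbour ("best match") rule, and it assumes a species counts as identified only when every one of its sequences is matched to a conspecific sequence. The function names `p_distance` and `species_success_rate` and the toy sequences are hypothetical, not the authors' pipeline or data.

```python
"""Hypothetical sketch (assumed method, not the authors' pipeline):
leave-one-out nearest-neighbour identification with uncorrected p-distances."""

VALID_BASES = set("ACGT")


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two pre-aligned sequences.

    Sites where either sequence has a gap or ambiguity code are skipped.
    Returns infinity if no site is comparable, so that pair is never 'nearest'.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in VALID_BASES and y in VALID_BASES:
            compared += 1
            diffs += x != y  # bool counts as 0 or 1
    return diffs / compared if compared else float("inf")


def species_success_rate(records: list[tuple[str, str]]) -> float:
    """Fraction of species whose every sequence is correctly identified.

    records: (species_name, aligned_sequence) pairs.
    Each sequence is queried against all others (leave-one-out); it is correct
    only if all of its equally nearest neighbours belong to its own species.
    Under this assumed rule, single-sequence species always fail.
    """
    if not records:
        raise ValueError("no records given")
    species_ok = {species: True for species, _ in records}
    for i, (query_sp, query_seq) in enumerate(records):
        dists = [
            (p_distance(query_seq, ref_seq), ref_sp)
            for j, (ref_sp, ref_seq) in enumerate(records)
            if j != i
        ]
        if not dists:
            species_ok[query_sp] = False
            continue
        best = min(d for d, _ in dists)
        nearest_species = {sp for d, sp in dists if d == best}
        if nearest_species != {query_sp}:
            species_ok[query_sp] = False
    return sum(species_ok.values()) / len(species_ok)


if __name__ == "__main__":
    # Made-up aligned fragments for illustration only (not real ITS data).
    toy = [
        ("D. odorifera", "ACGTACGTAC"),
        ("D. odorifera", "ACGTACGTAA"),
        ("D. tonkinensis", "TCGTACGAAC"),
        ("D. tonkinensis", "TCGTACGAAA"),
        ("D. cochinchinensis", "GGGGGGGGGG"),  # singleton species
    ]
    # Two of the three toy species are identified; the singleton fails.
    print(species_success_rate(toy))
```

A per-species criterion is used here because the abstract reports the distance and tree results as species-level success rates; whether singleton species were excluded, and which distance model was applied, would need to be checked against the full text.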

Keywords: ITS; Dalbergia; machine learning; DNA barcoding; botanical origin identification

CLC number: R286.12 [Medicine and Health: Chinese Materia Medica]
