Performance Comparison of Computational Methods for the Prediction of the Function and Pathogenicity of Non-coding Variants  Cited by: 1


Authors: Zheng Wang, Guihu Zhao, Bin Li, Zhenghuan Fang, Qian Chen, Xiaomeng Wang, Tengfei Luo, Yijing Wang, Qiao Zhou, Kuokuo Li, Lu Xia, Yi Zhang, Xun Zhou, Hongxu Pan, Yuwen Zhao, Yige Wang, Lin Wang, Jifeng Guo, Beisha Tang, Kun Xia, Jinchen Li

Affiliations: [1] National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China [2] Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China [3] Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China [4] Reproductive Medicine Center, Xiangya Hospital, Central South University, Changsha 410008, China

Source: Genomics, Proteomics & Bioinformatics (English edition), 2023, Issue 3, pp. 649-661 (13 pages)

Funding: Supported by the National Natural Science Foundation of China (Grant No. 81801133 to JL); the Young Elite Scientist Sponsorship Program by China Association for Science and Technology (Grant No. 2018QNRC001 to JL); the Innovation-Driven Project of Central South University, China (Grant No. 20180033040004 to JL); the Natural Science Foundation for Young Scientists of Hunan Province, China (Grant No. 2019JJ50974 to GZ); and the Natural Science Foundation of Hunan Province for Outstanding Young Scholars, China (Grant No. 2020JJ3059 to JL).

Abstract: Non-coding variants in the human genome significantly influence human traits and complex diseases via their regulation and modification effects. Hence, an increasing number of computational methods have been developed to predict the effects of variants in human non-coding sequences. However, it is difficult for inexperienced users to select appropriate computational methods from the dozens available. To address this issue, we assessed 12 performance metrics of 24 methods on four independent non-coding variant benchmark datasets: (1) rare germline variants from clinically relevant sequence variants (ClinVar), (2) rare somatic variants from the Catalogue Of Somatic Mutations In Cancer (COSMIC), (3) common regulatory variants from curated expression quantitative trait locus (eQTL) data, and (4) disease-associated common variants from curated genome-wide association studies (GWAS). All 24 tested methods performed differently under various conditions, indicating varying strengths and weaknesses under different scenarios. Importantly, the performance of existing methods was acceptable for rare germline variants from ClinVar, with an area under the receiver operating characteristic curve (AUROC) of 0.4481–0.8033, and poor for rare somatic variants from COSMIC (AUROC = 0.4984–0.7131), common regulatory variants from curated eQTL data (AUROC = 0.4837–0.6472), and disease-associated common variants from curated GWAS (AUROC = 0.4766–0.5188). We also compared the prediction performance of the 24 methods for non-coding de novo mutations in autism spectrum disorder, and found that the combined annotation-dependent depletion (CADD) and context-dependent tolerance score (CDTS) methods showed better performance. In summary, we assessed the performance of 24 computational methods under diverse scenarios, providing preliminary advice for proper tool selection and guiding the development of new techniques for interpreting non-coding variants.

Keywords: Non-coding variant; Pathogenicity estimation; Functional prediction; Performance assessment; Prediction model

Classification code: Q51 [Biology: Biochemistry]

 
