The Biological Significance of Multi-copy Regions and Their Impact on Variant Discovery  

Authors: Jing Sun, Yanfang Zhang, Minhui Wang, Qian Guan, Xiujia Yang, Jin Xia Ou, Mingchen Yan, Chengrui Wang, Yan Zhang, Zhi-Hao Li, Chunhong Lan, Chen Mao, Hong-Wei Zhou, Bingtao Hao, Zhenhai Zhang

Affiliations: [1] State Key Laboratory of Organ Failure Research, National Clinical Research Center for Kidney Disease, Division of Nephrology, Nanfang Hospital, Southern Medical University, Guangzhou 510515, China [2] Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China [3] Key Laboratory of Mental Health of the Ministry of Education, Guangdong-Hong Kong-Macao Greater Bay Area Center for Brain Science and Brain-Inspired Intelligence, Southern Medical University, Guangzhou 510515, China [4] Center for Precision Medicine, Shunde Hospital of Southern Medical University, Foshan 528399, China [5] Microbiome Medicine Center, Division of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510282, China [6] Division of Epidemiology, School of Public Health, Southern Medical University, Guangzhou 510515, China

Source: Genomics, Proteomics & Bioinformatics (Chinese title: 基因组蛋白质组与生物信息学报(英文版)), 2020, Issue 5, pp. 516-524 (9 pages)

Funding: Supported by the National Natural Science Foundation of China (NSFC, Grant No. 31771479); the Science Fund for Creative Research Groups of the NSFC (Grant No. 81521003); Projects of International Cooperation and Exchanges of NSFC (Grant No. 61661146004); Municipal Planning Projects of Scientific Technology of Guangdong (Grant No. 201804020083); the Science and Technology Program of Guangzhou (Grant No. 201400000004); the Natural Science Foundation of Guangdong (Grant No. 2015B050501006); the Team Program of Natural Science Foundation of Guangdong (Grant No. 2014A030312002); and the 1000 Talents Program of China.

Abstract: Identification of genetic variants via high-throughput sequencing (HTS) technologies has been essential for both fundamental and clinical studies. However, to what extent the genome sequence composition affects variant calling remains unclear. In this study, we identified 63,897 multi-copy sequences (MCSs) with a minimum length of 300 bp, each of which occurs at least twice in the human genome. The 151,749 genomic loci (multi-copy regions, or MCRs) harboring these MCSs account for 1.98% of the genome and are distributed unevenly across chromosomes. MCRs containing the same MCS tend to be located on the same chromosome. Gene Ontology (GO) analyses revealed that 3800 genes whose UTRs or exons overlap with MCRs are enriched for Golgi-related cellular component terms and various enzymatic activities in the GO biological function category. MCRs are also enriched for loci that are sensitive to neocarzinostatin-induced double-strand breaks. Moreover, genetic variants discovered by genome-wide association studies and recorded in dbSNP are significantly underrepresented in MCRs. Using simulated HTS datasets, we show that false variant discovery rates are significantly higher in MCRs than in other genomic regions. These results suggest that extra caution must be taken when identifying genetic variants in the MCRs via HTS technologies.
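The definitions in the abstract (a sequence of some minimum length that occurs at least twice, and the fraction of the genome covered by the loci harboring it) can be illustrated with a toy sketch. This is not the authors' pipeline, which the abstract does not describe: the function names, the fixed-length window approach, the forward-strand-only matching, and the tiny example genome are all assumptions made for illustration. The study uses a 300 bp minimum on the human genome; the example below uses 4 bp on two short strings.

```python
from collections import defaultdict


def find_multicopy_windows(genome, min_len):
    """Map each length-`min_len` window seen at least twice to its loci.

    `genome` is a dict of {chromosome name: sequence string}.
    Hypothetical helper: exact, forward-strand matching only, and windows
    are not extended into maximal repeats.
    """
    positions = defaultdict(list)
    for chrom, seq in genome.items():
        for start in range(len(seq) - min_len + 1):
            positions[seq[start:start + min_len]].append((chrom, start))
    # Keep only windows that occur at two or more loci.
    return {kmer: locs for kmer, locs in positions.items() if len(locs) >= 2}


def covered_fraction(genome, repeats, min_len):
    """Fraction of all bases that fall inside at least one repeated window."""
    covered = set()
    for locs in repeats.values():
        for chrom, start in locs:
            covered.update((chrom, i) for i in range(start, start + min_len))
    total = sum(len(seq) for seq in genome.values())
    return len(covered) / total


if __name__ == "__main__":
    # Toy genome, invented for this example.
    toy_genome = {"chrA": "ACGTACGTTT", "chrB": "GGACGTCC"}
    repeats = find_multicopy_windows(toy_genome, min_len=4)
    for kmer, locs in repeats.items():
        print(kmer, locs)
    print(f"covered fraction: {covered_fraction(toy_genome, repeats, 4):.3f}")
```

Storing every window in memory is only workable for toy inputs; a genome-scale analysis would need an indexed or alignment-based approach, which this sketch does not attempt.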

Keywords: Multi-copy sequence; Multi-copy region; Genetic study; Variant discovery; High-throughput sequencing

Classification number: Q811.4 [Biology: Bioengineering]

 
