The statistical power of k-mer based aggregative statistics for alignment-free detection of horizontal gene transfer  

Authors: Guan-Da Huang, Xue-Mei Liu, Tian-Lai Huang, Li C. Xia

Affiliations: [1] School of Physics and Optoelectronics, South China University of Technology, Guangzhou 510640, China; [2] Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA

Source: Synthetic and Systems Biotechnology (合成和系统生物技术, English edition), 2019, Issue 3, pp. 150-156 (7 pages)

Funding: L.C.X. was supported by the Innovation in Cancer Informatics Fund.

Abstract: Alignment-based database search and sequence comparison are commonly used to detect horizontal gene transfer (HGT). However, with the rapid increase in sequencing depth, hundreds of thousands of contigs are routinely assembled in metagenomics studies, which challenges alignment-based HGT analysis by overwhelming the known reference sequences. Detecting HGT by k-mer statistics thus becomes an attractive alternative. These alignment-free statistics have demonstrated high performance and efficiency in whole-genome and transcriptome comparisons. To adapt k-mer statistics for HGT detection, we developed two aggregative statistics, $T^{S}_{\mathrm{sum}}$ and $T^{*}_{\mathrm{sum}}$, which subsample metagenome contigs by their representative regions and summarize the regional $D_2^{S}$ and $D_2^{*}$ metrics by their upper bounds. We systematically studied the power of the aggregative statistics at different k-mer sizes using simulations. Our analysis showed that, in general, the power of $T^{S}_{\mathrm{sum}}$ and $T^{*}_{\mathrm{sum}}$ increases with sequencing coverage and reaches a maximum power > 80% at k = 6, with 5% Type-I error and a coverage ratio > 0.2x. The statistical power of $T^{S}_{\mathrm{sum}}$ and $T^{*}_{\mathrm{sum}}$ was evaluated with realistic simulations of HGT mechanism, sequencing depth, read length, and base error. We expect these statistics to be useful distance metrics for identifying HGT in metagenomic studies.
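
For readers unfamiliar with the regional metrics named in the abstract, the following is a minimal Python sketch of the pairwise D2S and D2* statistics as they are commonly defined in the alignment-free literature: centered k-mer counts under an i.i.d. background model, summed over all k-mers. The abstract itself does not give formulas, so these definitions, the uniform default background, and the helper names `kmer_counts` and `d2_statistics` are assumptions made for illustration. The aggregative step (subsampling representative regions and summarizing by upper bounds) is not reproduced here because the abstract does not specify it.

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"


def kmer_counts(seq, k):
    """Count overlapping k-mers that consist only of A, C, G, T."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if all(base in ALPHABET for base in word):
            counts[word] += 1
    return counts


def d2_statistics(seq_a, seq_b, k, base_probs=None):
    """Return (D2S, D2*) for two sequences.

    Assumption: an i.i.d. background model, uniform over A/C/G/T unless
    base_probs (a dict mapping each base to its probability) is given.
    """
    if base_probs is None:
        base_probs = {base: 0.25 for base in ALPHABET}

    x = kmer_counts(seq_a, k)
    y = kmer_counts(seq_b, k)
    # Number of counted k-mers; equals len(seq) - k + 1 for pure ACGT input.
    n_bar = sum(x.values())
    m_bar = sum(y.values())
    if n_bar == 0 or m_bar == 0:
        raise ValueError("both sequences must contain at least one valid k-mer")

    d2s = 0.0
    d2star = 0.0
    for letters in product(ALPHABET, repeat=k):
        word = "".join(letters)
        # Background probability of the word under the i.i.d. model.
        p_w = 1.0
        for base in word:
            p_w *= base_probs[base]
        # Centered counts: observed minus expected.
        x_c = x[word] - n_bar * p_w
        y_c = y[word] - m_bar * p_w
        cross = x_c * y_c
        norm = sqrt(x_c ** 2 + y_c ** 2)
        if norm > 0:  # skip words where both centered counts are zero
            d2s += cross / norm
        d2star += cross / (sqrt(n_bar * m_bar) * p_w)
    return d2s, d2star


if __name__ == "__main__":
    # Toy sequences, purely illustrative.
    print(d2_statistics("ACGTACGTACGGT", "ACGTTCGTACGAT", k=2))
```

Larger values of either statistic indicate more similar k-mer composition. In practice the background probabilities would likely be estimated from the data rather than fixed at 0.25, and k = 6 (the value highlighted in the abstract) requires iterating over 4^6 = 4096 words, which this sketch handles by plain enumeration.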

Keywords: Alignment-free sequence comparison; k-mer; Horizontal gene transfer; Statistical power

Classification code: F42 [Economics and Management: Industrial Economics]

 
