Identification of DNA Methylation Sites Based on Machine Learning Algorithms (基于机器学习算法识别DNA甲基化位点)


Authors: ZHANG Yaqun; LI Na[1]; YU Bo[1]; HAN Kunling; WU Shunjun (School of Mathematical Sciences, Dezhou University, Dezhou, Shandong 253000, China)

Affiliation: [1] School of Mathematics and Big Data, Dezhou University, Dezhou, Shandong 253023

Source: Journal of Dezhou University (《德州学院学报》), 2024, No. 4, pp. 1-6, 20 (7 pages in total)

Abstract: DNA methylation is one of the research hotspots in epigenetics. This study proposes a new method, named DPTRS, for predicting methylation sites in DNA sequences. First, dinucleotide composition (DNC), pseudo dinucleotide composition (PseDNC), and trinucleotide composition (TNC) are used to represent the sequences. Second, the resulting features are fused, and the repeated edited nearest neighbours (RENN) method is applied to handle class imbalance. Third, dimensionality is reduced with locally linear embedding (LLE). Finally, a support vector machine (SVM) is used to make predictions from the reduced feature subset. The final results are obtained with 10-fold cross-validation, and the prediction accuracy on the benchmark dataset is 91.48%. The results show that DPTRS can effectively identify methylation sites in DNA sequences.

Keywords: DNA methylation; imbalance handling; support vector machine

Classification number: Q811.4 [Biology: Bioengineering]
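
The sequence-representation step named in the abstract can be illustrated with a minimal sketch. It computes only DNC and TNC as normalized k-mer frequencies; PseDNC is left out because it depends on physicochemical property tables that the abstract does not give. The function names and the choice to normalize by the number of valid windows are assumptions made for illustration, not the authors' implementation. The later steps have counterparts in common libraries (for example, imbalanced-learn's `RepeatedEditedNearestNeighbours` and scikit-learn's `LocallyLinearEmbedding` and `SVC`), although the abstract does not say which implementations were used.

```python
from itertools import product

BASES = "ACGT"


def kmer_composition(seq, k):
    """Return normalized frequencies of all 4**k k-mers in lexicographic order.

    Assumption: frequencies are normalized by the number of valid windows;
    the paper's exact definition is not stated in the abstract.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


def dnc_tnc_features(seq):
    """Concatenate DNC (16 values) and TNC (64 values) into one vector."""
    return kmer_composition(seq, 2) + kmer_composition(seq, 3)


features = dnc_tnc_features("ACGTACGTCG")
print(len(features))  # 80
```

Each sequence then maps to a fixed-length vector regardless of its length, which is what allows the fused features to be passed on to resampling, dimensionality reduction, and classification.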

 
