Identification of nucleosome positioning using support vector machine method based on comprehensive DNA sequence feature

Cited by: 3

Authors: CUI Ying; XU Zelong [2]; LI Jianzhong

Affiliations: [1] Electronic Engineering College, Heilongjiang University, Harbin 150080, P.R. China; [2] School of Bioinformatics Sciences and Technology, Harbin Medical University, Harbin 150081, P.R. China; [3] School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, P.R. China

Source: Journal of Biomedical Engineering, 2020, No. 3, pp. 496-501 (6 pages)

Funding: National Natural Science Foundation of China (61832003).

Abstract: Based on z-curve theory and the position weight matrix (PWM), this article proposes a model for nucleosome DNA sequences. The nucleosome sequence dataset is transformed into three-dimensional coordinates, and the PWM of the sequence set is calculated to obtain a similarity weight score. Integrating the two yields a comprehensive sequence feature model named CSeqFM. The Euclidean distances from candidate nucleosome sequences and from linker sequences to the CSeqFM model are calculated as the feature set, which is fed into a support vector machine (SVM) for training and testing, with performance evaluated by ten-fold cross-validation. For S. cerevisiae, the sensitivity, specificity, accuracy and Matthews correlation coefficient (MCC) of nucleosome positioning identification were 97.1%, 96.9%, 94.2% and 0.89, respectively, and the area under the receiver operating characteristic (ROC) curve (AUC) reached 0.9801. Compared with other related z-curve methods, CSeqFM performed better on every evaluation metric. CSeqFM was also applied to nucleosome positioning identification in three other species, C. elegans, H. sapiens and D. melanogaster. The AUCs for the three species were all higher than 0.90, and CSeqFM also showed good stability and effectiveness compared with the iNuc-STNC and iNuc-PseKNC methods, which further indicates that the method is reliable and identifies nucleosome positioning well.
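The feature construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the z-curve coordinates and the PWM score are integrated, so this sketch assumes the model is the position-wise mean z-curve of the training sequences plus a log-odds PWM, and that each candidate yields two features (Euclidean distance to the mean curve, and PWM score). The function names, the pseudocount of 1 and the uniform background of 0.25 are all hypothetical choices. The SVM training and ten-fold cross-validation steps are omitted.

```python
import math

BASES = "ACGT"


def z_curve(seq):
    """Cumulative z-curve coordinates (x_n, y_n, z_n) for n = 1..len(seq)."""
    counts = {b: 0 for b in BASES}
    points = []
    for base in seq.upper():
        counts[base] += 1
        a, c, g, t = (counts[b] for b in BASES)
        points.append((
            (a + g) - (c + t),  # x: purine vs pyrimidine
            (a + c) - (g + t),  # y: amino vs keto
            (a + t) - (c + g),  # z: weak vs strong hydrogen bonding
        ))
    return points


def mean_curve(seqs):
    """Position-wise mean z-curve of equal-length sequences (assumed model)."""
    curves = [z_curve(s) for s in seqs]
    n = len(curves)
    return [tuple(sum(c[i][k] for c in curves) / n for k in range(3))
            for i in range(len(curves[0]))]


def euclidean(curve_a, curve_b):
    """Euclidean distance between two curves of equal length."""
    return math.sqrt(sum((p - q) ** 2
                         for pa, pb in zip(curve_a, curve_b)
                         for p, q in zip(pa, pb)))


def build_pwm(seqs, pseudocount=1.0, background=0.25):
    """Log2-odds PWM; pseudocount and background are assumed values."""
    pwm = []
    for i in range(len(seqs[0])):
        column = [s[i].upper() for s in seqs]
        total = len(column) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def pwm_score(pwm, seq):
    """Sum of per-position log-odds weights for a sequence."""
    return sum(col[b] for col, b in zip(pwm, seq.upper()))


def features(seq, model_curve, pwm):
    """Two-element feature vector: distance to mean curve, PWM score."""
    return [euclidean(z_curve(seq), model_curve), pwm_score(pwm, seq)]
```

Under these assumptions, the feature vectors for nucleosome candidates and linker sequences would then be passed to any SVM implementation for training and cross-validated evaluation.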

Keywords: sequence feature; support vector machine; nucleosome; z-curve; position weight matrix; Euclidean distance

Classification number: Q811.4 [Biology - Bioengineering]

 
