Authors: CUI Ying (崔颖); XU Zelong (徐泽龙) [2]; LI Jianzhong (李建中)
Affiliations: [1] Electronic Engineering College, Heilongjiang University, Harbin 150080, P.R. China; [2] School of Bioinformatics Sciences and Technology, Harbin Medical University, Harbin 150081, P.R. China; [3] School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, P.R. China
Source: Journal of Biomedical Engineering (《生物医学工程学杂志》), 2020, No. 3, pp. 496-501 (6 pages)
Funding: National Natural Science Foundation of China (61832003).
Abstract: This article proposes a model of nucleosome DNA sequences based on Z-curve theory and the position weight matrix (PWM). The nucleosome sequence dataset is transformed into three-dimensional coordinates, and a similarity weight score is obtained from the PWM computed over the same dataset. Integrating the two yields a comprehensive sequence feature model, named CSeqFM. The Euclidean distances from candidate nucleosome sequences and from linker sequences to the CSeqFM model are computed as the feature set, which is used to train and test a support vector machine (SVM), with performance evaluated by ten-fold cross-validation. For nucleosome positioning in yeast (S. cerevisiae), the sensitivity, specificity, accuracy and Matthews correlation coefficient (MCC) were 97.1%, 96.9%, 94.2% and 0.89, respectively, and the area under the receiver operating characteristic (ROC) curve (AUC) reached 0.9801. Compared with other related Z-curve methods, CSeqFM performed better on every evaluation metric. The method was also applied to nucleosome positioning in three other species, C. elegans, H. sapiens and D. melanogaster, with AUCs above 0.90 in all three. Compared with the iNuc-STNC and iNuc-PseKNC methods, CSeqFM also showed good stability and effectiveness, further indicating that the method is reliable and identifies nucleosome positioning well.
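The two building blocks named in the abstract, the Z-curve transform and a PWM similarity score, can be sketched as follows. This is only an illustrative sketch, not the authors' CSeqFM implementation: the abstract does not say how the two parts are integrated or how distances are taken, so the function names, the pseudocount of 1, the uniform 0.25 background, and the point-wise Euclidean distance are all assumptions. The Z-curve coordinates use the standard cumulative definition.

```python
import math

BASES = "ACGT"

def z_curve(seq):
    """Cumulative Z-curve coordinates (x_n, y_n, z_n) for each prefix of seq."""
    counts = {b: 0 for b in BASES}
    points = []
    for base in seq.upper():
        counts[base] += 1
        a, c, g, t = (counts[b] for b in BASES)
        points.append((
            (a + g) - (c + t),  # x: purine vs. pyrimidine
            (a + c) - (g + t),  # y: amino vs. keto
            (a + t) - (c + g),  # z: weak vs. strong hydrogen bonding
        ))
    return points

def position_weight_matrix(seqs, pseudocount=1.0, background=0.25):
    """Log-odds PWM from equal-length sequences (pseudocount and background are assumed values)."""
    length = len(seqs[0])
    pwm = []
    for i in range(length):
        column = {b: pseudocount for b in BASES}
        for s in seqs:
            column[s[i].upper()] += 1
        total = sum(column.values())
        pwm.append({b: math.log2(column[b] / total / background) for b in BASES})
    return pwm

def pwm_score(pwm, seq):
    """Sum of per-position log-odds weights for seq."""
    return sum(pwm[i][b] for i, b in enumerate(seq.upper()))

def euclidean_distance(points_a, points_b):
    """Euclidean distance between two equal-length lists of 3-D points (an assumed distance form)."""
    return math.sqrt(sum(
        (p - q) ** 2
        for pa, pb in zip(points_a, points_b)
        for p, q in zip(pa, pb)
    ))
```

In a pipeline like the one the abstract describes, distances and scores of this kind would form the feature vectors passed to an SVM classifier; that step is omitted here because the abstract gives no kernel or parameter details.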