Extraction Method of New Features of DNA Sequence and Its Application in Recombination Spots Identification


Authors: Cheng Lirong; Zhao Xiqiang [1]

Affiliation: [1] School of Mathematical Sciences, Ocean University of China, Qingdao 266100, Shandong, China

Source: Periodical of Ocean University of China (Natural Science Edition), 2023, No. 6, pp. 59-64 (6 pages)

Funding: Supported by the National Natural Science Foundation of China (11271341).

Abstract: To improve prediction performance in recombination spot identification, this paper proposes a new feature extraction method. Two groups of new features representing DNA sequences are obtained, one from 3-gram vectors encoded by the Word2Vec model and one from DNA properties, and are combined with existing features (obtained from the FastText model) to represent each sequence. A support vector machine (SVM) is used as the classification algorithm, with 5-fold cross-validation on the benchmark dataset. The proposed method achieves a sensitivity (Sen) of 93.88%, a specificity (Spec) of 95.08%, an accuracy (Acc) of 94.54%, a Matthews correlation coefficient (MCC) of 0.8902, and an area under the curve (AUC) of 0.99, all better than existing methods. The method also offers a new approach to sequence information extraction problems in biology.
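As a rough illustration of two ideas named in the abstract, the sketch below splits a DNA sequence into 3-grams and computes Sen, Spec, Acc and MCC from confusion-matrix counts using their standard definitions. It is not the authors' code: the function names `kmers` and `binary_metrics` are hypothetical, and the overlapping window with stride 1 is an assumption, since the abstract does not say how the 3-grams are cut.

```python
import math


def kmers(seq, k=3, stride=1):
    """Split a DNA string into k-letter words (3-grams by default).

    Assumption: overlapping windows with stride 1; the abstract does not
    specify the windowing scheme.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def binary_metrics(tp, tn, fp, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    Assumes both classes are present (tp + fn > 0 and tn + fp > 0).
    """
    sen = tp / (tp + fn)                    # sensitivity (recall on positives)
    spec = tn / (tn + fp)                   # specificity (recall on negatives)
    acc = (tp + tn) / (tp + tn + fp + fn)   # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sen": sen, "Spec": spec, "Acc": acc, "MCC": mcc}


if __name__ == "__main__":
    print(kmers("ACGTAC"))
    print(binary_metrics(tp=90, tn=95, fp=5, fn=10))
```

In a full pipeline, each 3-gram would presumably be mapped to its Word2Vec vector before classification; that step is omitted here because the abstract gives no details of how the vectors are aggregated.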

Keywords: DNA sequence; recombination spot; Word2Vec model; word vector; 3-gram; dinucleotide properties; support vector machine

Classification number: O236 [Science: Operations Research and Cybernetics]

 
