Splice-Site Prediction Based on Knowledge-Based Encoding  Cited by: 3

Knowledge-Based Encoding Applied to Splice-Site Recognition

Authors: 黄金艳 (Huang Jinyan)[1], 李通化 (Li Tonghua)[1], 陈开 (Chen Kai)[1]

Affiliation: [1] Department of Chemistry, Tongji University, Shanghai 200092, China

Source: Journal of Tongji University (Natural Science), 2007, No. 11, pp. 1548-1551, 1561 (5 pages)

Funding: National Natural Science Foundation of China (Grant No. 20275026)

Abstract (Chinese original, translated): In current biostatistics, the encoding of bases in DNA is mainly limited to four symbols: adenine, guanine, cytosine and thymine. This encoding provides too few variables and ignores the position of each base within the DNA sequence, so the accuracy of splice-site prediction does not exceed 90%. A knowledge-based encoding is therefore adopted, namely a table of statistical differences between true and false splice sites, and combined with the support vector machine method, which greatly improves the accuracy of splice-site recognition. A multivariate encoding based on the statistical features of the bases is then applied, bringing the prediction rates for true and false donor sites to 96.4% and 93.0%, respectively, and those for true and false acceptor sites to 94.4% and 93.0%, respectively.

Abstract (English original): In biological statistics, the encoding of bases (nucleotides) in DNA is usually limited to four types, i.e., adenine (A), cytosine (C), guanine (G) and thymine (T). Two issues make such an encoding inadequate for DNA sequences: the number of types is too small, and a given nucleotide is always encoded the same way regardless of its position. In splice-site prediction, for example, accuracy remains below ninety percent even though the sequences adjacent to splice sites are highly conserved. To improve prediction accuracy, much attention has been paid to the performance of the algorithms adopted, and little to the fundamental issue, namely nucleotide encoding. In this paper, a predictor based on support vector machines is constructed to distinguish true from false splice sites in higher eukaryotes. The results show that the prediction accuracies for true donor sites and pseudo donor sites are 96.3% and 93.1%, respectively, and those for true acceptor sites and pseudo acceptor sites are 94.0% and 93.1%, respectively.
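The idea of a statistical difference table can be illustrated with a minimal sketch. The abstract does not say how the table is computed, so the version below assumes the simplest reading: for each position in an aligned window, take the relative frequency of each base among true sites minus its frequency among false sites, then replace every base in a sequence with that position-specific value. The function names and the toy sequences are hypothetical and are not taken from the paper.

```python
from collections import Counter

BASES = "ACGT"


def position_frequencies(seqs):
    """Relative frequency of each base at each position of aligned sequences."""
    length = len(seqs[0])
    table = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        table.append({b: counts[b] / len(seqs) for b in BASES})
    return table


def difference_table(true_seqs, false_seqs):
    """Frequency among true sites minus frequency among false sites.

    Assumption: this per-position difference stands in for the paper's
    'statistical difference table', whose exact definition is not given
    in the abstract.
    """
    t = position_frequencies(true_seqs)
    f = position_frequencies(false_seqs)
    return [{b: t[i][b] - f[i][b] for b in BASES} for i in range(len(t))]


def encode(seq, table):
    """Replace each base with its position-specific difference value."""
    return [table[i][b] for i, b in enumerate(seq)]


# Toy, made-up aligned windows around a GT dinucleotide (not data from the paper).
true_sites = ["AGGTAA", "AGGTGA", "CGGTAA", "AGGTAT"]
false_sites = ["ACGTCA", "TTGTAC", "AGGTCC", "CAGTTG"]

table = difference_table(true_sites, false_sites)
vector = encode("AGGTAA", table)
```

Unlike a fixed four-symbol code, the same base maps to different numbers at different positions, which is the property the abstract argues for. The resulting numeric vectors could then be passed to any support vector machine implementation; that step is omitted here to keep the sketch dependency-free.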

Keywords: gene recognition; support vector machine; splice-site recognition; encoding method

Classification number: Q811.4 [Biology: Bioengineering]

 
