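Authors: Yang Wuritu (杨乌日吐)[1], Li Qianzhong (李前忠)[1], Liu Li (刘利)[1], Fan Guoliang (樊国梁)[1]
Affiliation: [1] Department of Physics, College of Sciences and Technology, Inner Mongolia University, Hohhot 010021
Source: Progress in Modern Biomedicine (《现代生物医学进展》), 2007, No. 5, pp. 790-792, 794 (4 pages)
Funding: Supported by the National Natural Science Foundation of China (30560039), the Specialized Research Fund for the Doctoral Program of Higher Education, and the Inner Mongolia Natural Science Foundation.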
Abstract (translated from the Chinese): Alternative splicing is an important mechanism for regulating gene expression, and identifying alternative splice sites is an important task in the post-genomic era. From the latest EBI human alternative splicing database, 5′/3′ alternative splice sites were selected as the positive set and pseudo splice sites near the splice sites as the negative set, and all alternative and pseudo splice sites were randomly divided into a training set and a test set. The prediction method is a support vector machine based on a position weight matrix and the increment of diversity. Using the training set only, the method takes the single-base probabilities at different positions and the triplet counts of sequence segments as information parameters, and combines the position weight matrix and increment-of-diversity algorithms with a support vector machine to obtain classifiers for alternative donor and acceptor sites. These classifiers are then used to predict the alternative donor and acceptor sites in the test set. On the independent test set, the prediction success rates for alternative donor and acceptor sites were 88.74% and 90.86%, with specificities of 85.62% and 81.19%, respectively. The success rate is higher than that of other alternative splice site prediction methods, which further improves the theoretical ability to predict alternative splice sites.

Abstract (English, as published): Alternative splicing, which allows the same DNA sequence to produce more than one protein sequence, plays an important role in regulating gene expression. Recognition of alternative splicing sites is one of the most important tasks in the post-genomic era. In this paper, the alternative 5′/3′ splicing sites (alternative donor/acceptor sites) obtained from the latest human alternative splicing database of EBI were selected as the positive set, and the pseudo splicing sites flanking the splicing sites were selected as the negative set. The pseudo donor and pseudo acceptor sites are GT/AG sites in the DNA sequence at which splicing never occurs. All alternative splicing sites and pseudo splicing sites were randomly divided into two independent parts, a training set and a testing set. The training set included 723 alternative donor sites, 1060 alternative acceptor sites, 727 pseudo donor sites and 755 pseudo acceptor sites; the testing set included 2894 alternative donor sites, 4244 alternative acceptor sites, 38284 pseudo donor sites and 29458 pseudo acceptor sites. A new method, based on a support vector machine combined with a position weight matrix and the increment of diversity, was introduced to predict alternative splicing sites. The mononucleotide frequencies at different positions in the training set were used as the parameters of the position weight matrix, and the 3-mer frequencies of sequence segments in the training set were used as the parameters of the diversity source. The resulting scoring function and increment of diversity served as the input parameters of the support vector machine. The alternative donor sites and alternative acceptor sites in the independent testing set were then predicted by this classifier, which combines the support vector machine with the position weight matrix and the increment of diversity. The predictive results showed that the accuracies of prediction were 88.74% and 90.86%, respectively, for alternative donor sites and alternative acc…
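The two feature types named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give formulas, so everything here is an assumption: the position weight matrix score is taken as a sum of log-odds against a uniform background, and the increment of diversity uses the commonly cited definition D(X) = N log N − Σ n_i log n_i with ID(X, S) = D(X + S) − D(X) − D(S). The function names, the pseudocount, and the background value are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

BASES = "ACGT"

def build_pwm(seqs, pseudocount=1.0):
    """Per-position base probabilities from equal-length training sequences."""
    length = len(seqs[0])
    total = len(seqs) + pseudocount * len(BASES)
    pwm = []
    for pos in range(length):
        counts = Counter(s[pos] for s in seqs)
        pwm.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return pwm

def pwm_score(seq, pwm, background=0.25):
    """Sum of log-odds of each base versus a uniform background (assumed form)."""
    return sum(math.log(pwm[i][b] / background) for i, b in enumerate(seq))

def kmer_counts(seq, k=3):
    """Counts of overlapping k-mers; k=3 gives the triplet counts."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def diversity(counts):
    """D(X) = N log N - sum_i n_i log n_i (assumed definition)."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return n * math.log(n) - sum(c * math.log(c) for c in counts.values() if c > 0)

def increment_of_diversity(x, s):
    """ID(X, S) = D(X + S) - D(X) - D(S); small when X resembles S."""
    return diversity(x + s) - diversity(x) - diversity(s)
```

Under these assumptions, each candidate site would yield a PWM score plus increments of diversity measured against the positive and negative training sources, and that feature vector would then be passed to a support vector machine. Sequences are assumed to contain only A, C, G and T.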