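Authors: Zhang Guangya (张光亚)[1], Li Hongchun (李红春)[1], Gao Jiaqiang (高嘉强)[1], Fang Baishan (方柏山)[1]
Source: Chinese Journal of Biotechnology (《生物工程学报》), 2008, No. 11, pp. 1968-1974 (7 pages)
Funding: Supported by the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20070385001) and the Natural Science Foundation of Fujian Province (No. 2007J0360).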
Abstract: Lipases are widely used enzymes in biotechnology. Although they catalyze the same reaction, their sequences vary. A fast and reliable method that identifies the type of a lipase from its sequence, or even just confirms whether a protein is a lipase at all, is therefore of both theoretical and practical value. Pseudo amino acid composition approaches based on Z scales and T scales are proposed to extract sequence features, and a k-nearest neighbor predictor is used to address both problems. After parameter selection, the three methods, each run at its own optimal parameters, gave the following 10-fold cross-validation results: for distinguishing lipases from non-lipases, the success rates were 92.8%, 91.4% and 91.3%, respectively; for predicting lipase type, the success rates were 92.3%, 90.3% and 89.7%, respectively. The Z-scale-based pseudo amino acid composition performed best and the T-scale-based one second, and both significantly outperformed 6 other commonly used sequence feature extraction methods; possible reasons for this are discussed. The high success rates obtained on such a stringent dataset indicate that predicting lipase types is feasible, and that pseudo amino acid composition built on different scales may be a useful tool for extracting features from protein sequences, or can at least play a complementary role to many existing approaches.
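The abstract gives no formulas, so the following is only a minimal sketch of the general technique it names: a Chou-style pseudo amino acid composition vector (20 normalized residue frequencies plus `lam` sequence-order correlation factors) fed to a k-nearest neighbor vote. The function names, the parameters `lam` and `weight`, and above all `TOY_SCALE` are assumptions made for illustration. `TOY_SCALE` holds placeholder numbers, not the published Z-scale or T-scale descriptors, and the authors' actual parameter choices are not stated in this record. Sequences are assumed to contain only the 20 standard residues.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Placeholder two-component descriptors per residue. These are NOT the
# published Z- or T-scale values; substitute a real table before any real use.
TOY_SCALE = {aa: (i / 19.0, ((i * 7) % 20) / 19.0)
             for i, aa in enumerate(AMINO_ACIDS)}


def pse_aac(seq, scale=TOY_SCALE, lam=3, weight=0.05):
    """Return a (20 + lam)-dimensional pseudo amino acid composition vector."""
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    counts = Counter(seq)
    freqs = [counts[aa] / n for aa in AMINO_ACIDS]
    # theta_j: mean squared descriptor difference between residues j apart.
    thetas = []
    for j in range(1, lam + 1):
        total = 0.0
        for i in range(n - j):
            a, b = scale[seq[i]], scale[seq[i + j]]
            total += sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
        thetas.append(total / (n - j))
    # Joint normalization so that the whole vector sums to 1.
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


def knn_predict(train_vectors, train_labels, query, k=1):
    """Majority vote among the k training vectors closest to query (Euclidean)."""
    ranked = sorted(zip(train_vectors, train_labels),
                    key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

In a 10-fold cross-validation like the one reported, the labeled sequences would be split into ten parts, with each part predicted in turn by `knn_predict` from vectors built on the other nine; `lam`, `weight` and `k` are the kind of parameters such a run would tune.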