Sequence prediction of leucine-rich repeat based on position-related possibility model

Authors: Gong Jing[1], Li Xue[2], Tao Chao[2], Wei Tiandi[3,4]

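Affiliations: [1] Cancer Research Center, School of Medicine, Shandong University, Jinan 250012; [2] Taishan College, Shandong University, Jinan 250100; [3] School of Life Sciences, Shandong University, Jinan 250100; [4] Postdoctoral Workstation, Shandong Qilu Stem Cell Engineering Co., Ltd., Jinan 250101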

Source: China Sciencepaper, 2015, No. 6, pp. 626-628, 637 (4 pages in total)

Funding: Specialized Research Fund for the Doctoral Program of Higher Education (20110131120024; 20110131120045)

Abstract: The leucine-rich repeat (LRR) is a widely distributed protein structural motif that plays key roles in many important biological processes and is closely associated with many human diseases. This study investigated the correlation between the amino acid distributions at different positions within LRRs and built probability models on this correlation for sequence-level LRR prediction, with the aim of improving prediction accuracy. Known LRR protein sequences were extracted from the LRRML database to form the training and test sets. Four different probability models, including position-related and position-unrelated models, were built from the amino acid distributions at each LRR position, and the best model for LRR prediction was selected through machine-learning experiments with k-fold cross-validation. A combined model that sums a position-related and a position-unrelated probability model with different weights achieved the highest accuracy in the LRR prediction experiments. The amino acid distributions at different positions in LRRs are correlated to some degree, and this correlation is strong enough to serve as an important parameter for LRR prediction.
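
The abstract names the ingredients (per-position amino acid distributions, a position-related model, a position-unrelated model, a weighted combination, and k-fold cross-validation) but not their exact form. The Python sketch below is one possible reading under stated assumptions, not the authors' method. It assumes fixed-length, pre-aligned LRR segments containing only the 20 standard residues, a first-order dependency between adjacent positions as the "position-related" model, Laplace pseudocounts, and a uniform amino acid background. All function and constant names (train_independent, train_related, log_odds, kfold_accuracy, select_weight, PSEUDO, BACKGROUND) are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PSEUDO = 1.0  # assumed Laplace pseudocount (not specified in the source)
BACKGROUND = math.log(1 / len(AMINO_ACIDS))  # assumed uniform background, per residue


def train_independent(motifs: Sequence[str]) -> List[Dict[str, float]]:
    """Position-unrelated model: P(a at position i), each position estimated on its own."""
    model = []
    for i in range(len(motifs[0])):
        counts = Counter(m[i] for m in motifs)
        total = len(motifs) + PSEUDO * len(AMINO_ACIDS)
        model.append({a: (counts[a] + PSEUDO) / total for a in AMINO_ACIDS})
    return model


def train_related(motifs: Sequence[str]) -> List[Dict[Tuple[str, str], float]]:
    """Position-related model (assumed form): P(a at i | b at i-1) for adjacent positions."""
    model = []
    for i in range(1, len(motifs[0])):
        pairs = Counter((m[i - 1], m[i]) for m in motifs)
        prev = Counter(m[i - 1] for m in motifs)
        table = {}
        for b in AMINO_ACIDS:
            denom = prev[b] + PSEUDO * len(AMINO_ACIDS)
            for a in AMINO_ACIDS:
                table[(b, a)] = (pairs[(b, a)] + PSEUDO) / denom
        model.append(table)
    return model


def log_odds(seq: str, ind, rel, w: float) -> float:
    """Weighted sum of the two models' log-likelihoods, relative to the uniform background."""
    ind_ll = sum(math.log(ind[i][a]) for i, a in enumerate(seq))
    # The related model scores the first residue with the independent model, then chains conditionals.
    rel_ll = math.log(ind[0][seq[0]]) + sum(
        math.log(rel[i - 1][(seq[i - 1], seq[i])]) for i in range(1, len(seq))
    )
    combined = w * rel_ll + (1 - w) * ind_ll
    return combined - len(seq) * BACKGROUND


def kfold_accuracy(pos: Sequence[str], neg: Sequence[str], w: float, k: int = 5) -> float:
    """k-fold cross-validation: train on k-1 folds of LRR motifs, classify held-out motifs and decoys."""
    correct = total = 0
    for fold in range(k):
        train = [m for j, m in enumerate(pos) if j % k != fold]
        ind, rel = train_independent(train), train_related(train)
        held_out = [(m, True) for j, m in enumerate(pos) if j % k == fold]
        held_out += [(m, False) for j, m in enumerate(neg) if j % k == fold]
        for seq, is_lrr in held_out:
            # Predict "LRR" when the combined score beats the background model.
            correct += (log_odds(seq, ind, rel, w) > 0) == is_lrr
            total += 1
    return correct / total


def select_weight(pos, neg, weights=(0.0, 0.25, 0.5, 0.75, 1.0), k: int = 5) -> float:
    """Pick the mixing weight with the best cross-validated accuracy (ties go to the first)."""
    return max(weights, key=lambda w: kfold_accuracy(pos, neg, w, k))
```

Setting w = 0 reduces the score to the position-unrelated model and w = 1 to the position-related one, so the grid search over w is a simplified stand-in for choosing the best weighted combination by cross-validation. Negative examples (non-LRR segments of the same length) are only used for evaluation here, because the background is assumed uniform.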

Keywords: bioinformatics; leucine-rich repeat; sequence algorithm; position-related probability model

CLC number: Q517 [Biology: Biochemistry]