Authors: Chang LU (路畅); Xia CHEN (陈霞); Jun WANG (王峻)[1]; Guoxian YU (余国先)[1]; Zhiwen YU (余志文) (College of Computer and Information Sciences, Southwest University, Chongqing 400715, China; College of Computer Science and Technology, South China University of Technology, Guangzhou 510006, China)
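Affiliations: [1] College of Computer and Information Sciences, Southwest University, Chongqing 400715 [2] School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006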
Source: Scientia Sinica (Informationis) (《中国科学:信息科学》), 2018, No. 8, pp. 1035-1050 (16 pages)
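Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61402378, 61572199, 61741217) and the Chongqing Basic and Frontier Research Program (Grant Nos. cstc2014jcyj A40031, cstc2016jcyj A0351)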
Abstract: Automatically annotating the functions of proteins is a key task in bioinformatics. Functional annotations of proteins are collected from multiple sources, so noisy annotations are inevitably introduced. However, existing research on protein function prediction focuses mostly on predicting functions for completely unannotated (or partially annotated) proteins and seldom addresses the identification of noisy annotations. This paper proposes a method for identifying noisy functional annotations of proteins using sparse semantic similarity (NFA). NFA first stores the functional annotations in a protein-function association matrix, weights the annotations differently according to their evidence, and then propagates these weights upward to the expanded annotations through the hierarchical structure among the functional labels. Next, NFA computes the semantic similarity between proteins using an l1-norm constrained sparse representation on the weighted association matrix. Finally, it identifies the noisy functions of a protein by voting over the functional annotations of its semantic neighbor proteins. Experimental results on two model organisms, yeast (S. cerevisiae) and Arabidopsis (A. thaliana), show that NFA identifies noisy annotations more accurately than existing algorithms, and that removing the noisy annotations identified by NFA improves the accuracy of existing protein function prediction algorithms.
Keywords: protein function; noisy functional annotation; sparse representation; semantic similarity; label structure
Classification codes: Q51 [Biology: Biochemistry]; TP391.1 [Automation and Computer Technology: Computer Application Technology]
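
The three steps named in the abstract (evidence weighting with upward propagation, l1-norm sparse representation, and neighbor voting) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the max rule for propagation, the ISTA solver, the value of `lam`, the toy weights, and all function names are assumptions chosen only to make the idea concrete.

```python
import numpy as np

def propagate_up(W, parents):
    """Copy each annotation weight to ancestor terms (max rule: an assumption)."""
    W = W.copy()
    changed = True
    while changed:  # repeat until no ancestor weight changes
        changed = False
        for child, pars in parents.items():
            for p in pars:
                new = np.maximum(W[:, p], W[:, child])
                if not np.array_equal(new, W[:, p]):
                    W[:, p] = new
                    changed = True
    return W

def sparse_code(y, D, lam=0.1, n_iter=500):
    """ISTA for min_s 0.5*||y - D s||^2 + lam*||s||_1 (solver choice is an assumption)."""
    L = np.linalg.norm(D, 2) ** 2 + 1e-12  # Lipschitz constant of the gradient
    s = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = s - D.T @ (D @ s - y) / L  # gradient step on the squared error
        s = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return s

def semantic_similarity(W, lam=0.1):
    """Represent each protein's row sparsely by the other proteins' rows."""
    n = W.shape[0]
    S = np.zeros((n, n))
    for i in range(n):
        others = [k for k in range(n) if k != i]
        D = W[others].T  # columns are the other proteins
        S[i, others] = np.abs(sparse_code(W[i], D, lam))
    return S

def neighbor_votes(W, S):
    """Similarity-weighted support for each existing annotation (NaN elsewhere)."""
    norm = S.sum(axis=1, keepdims=True)
    norm[norm == 0] = 1.0  # avoid division by zero for isolated proteins
    votes = (S / norm) @ W
    return np.where(W > 0, votes, np.nan)

# Toy data (hypothetical): 4 proteins, 4 terms; term 0 is the root,
# terms 1 and 2 are its children, term 3 is a child of term 2.
parents = {1: [0], 2: [0], 3: [2]}
W0 = np.array([
    [0.0, 1.0, 0.0, 0.5],  # protein 0: strong term 1, weakly supported term 3
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
W = propagate_up(W0, parents)
votes = neighbor_votes(W, semantic_similarity(W))
# Annotations of a protein with the lowest votes are candidates for noise.
```

In this toy setting, an annotation shared by few of a protein's semantic neighbors receives a lower vote than one shared by most of them, which is the intuition behind the voting step; any threshold or ranking cutoff would have to be chosen separately.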