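Affiliations: [1] School of Mathematics and Statistics, Wuhan University, Wuhan 430072 [2] Shenzhen Research Institute of Wuhan University, Shenzhen 518057
Source: Computer Science (《计算机科学》), 2015, No. 4, pp. 177-180 (4 pages)
Funding: Supported by the National Natural Science Foundation of China (61271337; 61103126); the Doctoral Program Foundation of the Ministry of Education (20100141120049); the Natural Science Foundation of Hubei Province (2011CDB454); and the Shenzhen Special Fund for the Development of Strategic Emerging Industries (JCYJ20130401160028781)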
Abstract: The key to successful new drug development lies in discovering and accurately locating drug targets. Among known drug targets, ion channel proteins are a widely favored class, closely associated with diseases of the immune system, the cardiovascular system, and others. Traditional biological methods for target discovery are costly and time-consuming. This work therefore explores machine-learning-based mining of ion channel protein drug targets, with the aim of speeding up target discovery and reducing cost. Because sequences related to drug targets differ in length, thirteen protein sequence encoding features are considered, each of which converts variable-length protein sequences into fixed-length representations. Numerical experiments are used to select a feature subset that discriminates well between targets and non-targets, and ensemble learning is used to integrate the features into a prediction model. Comparison with existing work shows that the proposed ensemble model achieves relatively high accuracy and has good application prospects.
English abstract: Identifying molecular targets is a critical step in the discovery and development of new drugs. Among the many known drug targets, ion channel proteins are particularly attractive, and they are closely linked to diseases such as those of the cardiovascular and central nervous systems. Traditional biological methods for mining drug targets are costly and time-consuming. This work discusses the mining of potential ion channel drug targets based on random forests, with the aim of speeding up the discovery of drug targets and reducing cost. Because sequences related to drug targets vary in length, thirteen types of protein encoding features were considered, which transform protein sequences of different lengths into representations of equal length. A feature subset that better separates drug targets from non-targets was chosen through numerical experiments, and ensemble learning was introduced to obtain prediction models. Compared with existing methods, the approach attains high accuracy and may play an important role in mining new drug targets.
Classification No.: TP3-05 [Automation and Computer Technology: Computer Science and Technology]
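Note: The abstract states that each encoding feature maps protein sequences of different lengths to representations of equal length, but it does not name the thirteen features. As a minimal sketch of that idea only, the example below uses amino acid composition, a common encoding that is assumed here for illustration and may or may not be among the thirteen. The function name `amino_acid_composition` and the sample sequences are hypothetical.

```python
from collections import Counter
from typing import List

# The 20 standard amino acids in a fixed order, so every vector has the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> List[float]:
    """Return the fraction of each standard amino acid in `sequence`.

    Illustrative encoding only (an assumption, not taken from the paper):
    the output always has 20 entries, whatever the input length.
    """
    # Count only recognised residues; other characters are ignored.
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Two made-up sequences of different lengths yield vectors of the same length.
short_vec = amino_acid_composition("MKV")
long_vec = amino_acid_composition("MKVLAAGIVGLLLAQWERT")
print(len(short_vec), len(long_vec))
```

Fixed-length vectors of this kind are what a classifier such as a random forest would take as input; the feature selection and ensemble steps described in the abstract are not reproduced here.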