A graph-theoretic algorithm for computing representative sequences of protein databases

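Affiliations: [1] Software Engineering Institute, East China Normal University, Shanghai 200062; [2] Institute of Computational Biology (CAS-MPG Partner Institute for Computational Biology), Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031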

Source: Computers and Applied Chemistry, 2008, No. 5, pp. 607-610 (4 pages)

Abstract: Many biological sequence databases contain large numbers of redundant sequences. These sequences hinder statistical analysis of the database and consume additional storage and processing resources. To address this problem, we present a graph-theoretic algorithm for removing redundant protein sequences. Many existing redundancy-removal programs use the algorithm first proposed by Hobohm and Sander, which partitions the sequences into clusters and then selects representative sequences from them; this approach also discards some non-redundant sequences and therefore yields a relatively small non-redundant set. Our algorithm improves on it by using the graph-theoretic concept of a maximum independent set to generate the non-redundant set, which avoids removing non-redundant sequences and produces a larger set of representative sequences. We have implemented the algorithm in a program, FastCluster, which can be used to remove redundant sequences from protein databases.

Keywords: bioinformatics; maximum independent set; representative sequences; redundancy removal

Classification: TP301 [Automation and Computer Technology, Computer System Architecture]
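
The idea in the abstract can be illustrated with a small sketch. Treat each sequence as a vertex and connect two vertices when the sequences are redundant with each other; any independent set of this graph is then a non-redundant set. The abstract does not give the details of FastCluster, so everything below is an assumption made for illustration: the function names are hypothetical, the redundant pairs are supplied directly rather than computed from alignments, the cluster-then-pick pass is only a simplified stand-in for the scheme attributed to Hobohm and Sander, and a minimum-degree greedy heuristic replaces an exact maximum independent set (which is NP-hard to compute in general).

```python
def build_graph(ids, redundant_pairs):
    """Redundancy graph: one vertex per sequence id, one edge per redundant pair."""
    graph = {i: set() for i in ids}
    for a, b in redundant_pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def cluster_representatives(ids, graph):
    """Simplified cluster-then-pick pass (illustrative stand-in only).

    Sequences are visited in input order; each sequence not yet absorbed
    becomes a representative and absorbs all of its neighbours.
    """
    absorbed = set()
    representatives = []
    for i in ids:
        if i in absorbed:
            continue
        representatives.append(i)
        absorbed.add(i)
        absorbed |= graph[i]
    return representatives


def greedy_independent_set(graph):
    """Minimum-degree greedy heuristic for a large independent set.

    This is a heuristic, not an exact maximum independent set solver.
    """
    remaining = {v: set(nb) for v, nb in graph.items()}
    chosen = []
    while remaining:
        # Pick the vertex with the fewest remaining neighbours (ties by id).
        v = min(remaining, key=lambda u: (len(remaining[u]), u))
        chosen.append(v)
        drop = remaining[v] | {v}
        for u in drop:
            remaining.pop(u, None)
        for neighbours in remaining.values():
            neighbours -= drop
    return sorted(chosen)


# Toy input: b is redundant with both a and c, but a and c are not
# redundant with each other.
ids = ["b", "a", "c"]
graph = build_graph(ids, [("a", "b"), ("b", "c")])
by_cluster = cluster_representatives(ids, graph)
by_independent_set = greedy_independent_set(graph)
```

On this toy input the cluster pass keeps only b, because b absorbs both a and c, while the independent-set pass keeps a and c. This mirrors the abstract's point that clustering can discard sequences that are not redundant with each other.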
