Density-based Clustering with Extensible Radius for RNA Secondary Structure Prediction


Authors: Wang Changwu (王常武)[1,2], Wang Xiuqin (王秀芹)[1,2], Wei Zhenzhen (魏真真)[1,2], Wang Baowen (王宝文)[1,2], Liu Wenyuan (刘文远)[1,2], Li Yongqiang (李永强)[3]

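Affiliations: [1] School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004; [2] Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Qinhuangdao 066004; [3] Information Department, First Hospital of Qinhuangdao, Qinhuangdao 066000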

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2015, No. 9, pp. 1968-1972 (5 pages)

Abstract: When RNA secondary structure is predicted with a free energy model, the true structure may lie among the suboptimal structures whose free energy falls within a certain range above the minimum free energy. Clustering the set of suboptimal structures and selecting a representative structure for each cluster can improve the accuracy of RNA secondary structure prediction. This paper presents a density-based clustering algorithm with extensible radius, dubbed 'ER-DBSCAN', for RNA secondary structure datasets of variable density. The method first applies a feature selection algorithm based on the consensus matrix to filter the feature set, keeping the feature subset most relevant to clustering and thereby reducing the dimension of the clustering space. The clustering module then takes the object of maximum density as the initial center of a new cluster and updates the cluster radius according to the density distribution within the cluster and a density variation parameter, until cluster expansion is complete. Experiments indicate that ER-DBSCAN can identify and handle clusters of varying density and can cluster RNA secondary structures effectively.
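The cluster expansion step described in the abstract can be illustrated with a minimal sketch. This is not the authors' ER-DBSCAN: the abstract does not give the update rule, so the function name `extensible_radius_clustering`, the parameters `r0`, `step`, `alpha` and `min_pts`, and the stopping rule (stop when a new ring gains fewer than `alpha` times the points gained in the previous round) are all assumptions made for illustration. Plain Euclidean points stand in for RNA structure feature vectors, and the consensus-matrix feature selection is omitted.

```python
import math


def extensible_radius_clustering(points, r0, step, alpha, min_pts=2):
    """Toy density clustering with a per-cluster growing radius.

    All parameters are illustrative assumptions, not the paper's:
      r0      initial radius, also used to estimate local density
      step    amount by which the radius grows in each expansion round
      alpha   density variation tolerance: stop growing once a new ring
              holds fewer than alpha * (points gained in the previous round)
      min_pts minimum density for a point to seed a cluster
    Returns (labels, radii); label -1 marks noise.
    """
    labels = [-1] * len(points)
    radii = []
    unassigned = set(range(len(points)))

    while unassigned:
        # Local density: number of unassigned points within r0 (self included).
        density = {
            i: sum(1 for j in unassigned
                   if math.dist(points[i], points[j]) <= r0)
            for i in unassigned
        }
        # Seed the cluster at the densest remaining point (lowest index on ties).
        center = max(sorted(unassigned), key=density.get)
        if density[center] < min_pts:
            break  # everything left is treated as noise

        radius = r0
        members = {j for j in unassigned
                   if math.dist(points[center], points[j]) <= radius}
        prev_gain = len(members)

        # Grow the radius ring by ring while the density does not drop too fast.
        while True:
            new_radius = radius + step
            ring = {j for j in unassigned - members
                    if math.dist(points[center], points[j]) <= new_radius}
            if not ring or len(ring) < alpha * prev_gain:
                break
            members |= ring
            radius = new_radius
            prev_gain = len(ring)

        for j in members:
            labels[j] = len(radii)
        radii.append(radius)
        unassigned -= members

    return labels, radii


# A cluster that thins out toward its edge, a compact cluster, and one outlier.
points = [(0.0,), (1.0,), (-1.0,), (2.0,), (-2.0,), (4.0,), (-4.0,), (7.0,),
          (100.0,), (102.0,), (98.0,), (500.0,)]
labels, radii = extensible_radius_clustering(points, r0=2.5, step=2.5, alpha=0.3)
print(labels)  # [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, -1]
print(radii)   # [7.5, 2.5]
```

In this toy run the first cluster extends its radius twice to absorb its sparser fringe, while the second keeps its initial radius. That per-cluster radius is what a single global radius, as in standard DBSCAN, cannot provide.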

Keywords: RNA secondary structure; suboptimal structure; density-based clustering algorithm; feature selection

Classification number: TP301 [Automation and Computer Technology: Computer System Architecture]

 
