基于KST索引的最大连通Steiner分量查询算法  被引量:1

The KST Index Based Querying Algorithm for Steiner Maximum-Connected Components

在线阅读下载全文

作  者:陈子阳 陈伟[2,3] 贾勇[2] 周军锋 CHEN Zi-Yang;CHEN Wei;JIA Yong;ZHOU Jun-Feng(School of Information and Management,Shanghai Lixin University of Accounting and Finance,Shanghai 201620;School of Information Science and Engineering,Yanshan University,Qinhuangdao,Hebei 066004;Department of Information Engineering,Hebei University of Environmental Engineering,Qinhuangdao,Hebei 066102;School of Computer Science and Technology,Donghua University,Shanghai 201620)

机构地区:[1]上海立信会计金融学院信息管理学院,上海201620 [2]燕山大学信息科学与工程学院,河北秦皇岛066004 [3]河北环境工程学院信息工程系,河北秦皇岛066102 [4]东华大学计算机科学与技术学院,上海201620

出  处:《计算机学报》2020年第7期1215-1229,共15页Chinese Journal of Computers

基  金:国家自然科学基金(61472339,61572421,61873337)资助.

摘  要:查找图的连通分量在生物信息学领域有着重要应用价值,其中的关键问题之一是查询最大连通Steiner分量(SMCC).针对已有最大连通Steiner分量查询方法中存在的查询效率低的问题,本文首先提出利用k-edge连通分量与(k+1)-edge连通分量之间的包含关系建立顶点集合的分层索引KST.和现有的专用索引相比,KST索引规模得到了缩减;然后本文提出了基于KST索引的SMCC查询算法以及具有顶点数量限制的SMCC L查询算法.和已有方法中索引的是图中顶点不同,KST索引中维护的是顶点集合的包含关系.其优点在于将已有方法在遍历过程中的一次一顶点的查询方式转换为更高效的一次一集合的查询方式,显著减少了需要访问的索引点数量,极大提升了查询处理的效率;最后,基于15个真实数据集进行实验测试,从不同角度验证了本文所提方法的高效性.Given a graph G,and a set of query nodes q,the Steiner Maximum-Connected Component(SMCC)is a connected subgraph of G with the maximum connectivity and the maximum number nodes which contains q.And SMCC L is the SMCC of G with the constraint of the number of nodes L.Finding SMCC or SMCC L is one of the fundamental operations in graph data processing,and is one of the hot issues,which has attracted much attention in the research field and can be applied in many applications,including bio-informatics,etc.Existing methods for SMCC and SMCC L query processing are mainly classified into the following two categories:(1)Finding SMCC or SMCC L based on existing k-edge connected component algorithms which decrease the value of k from|V|(the number of nodes in graph G)to 1 in turn,calculate all k-edge connected components in G,then the first k-edge connected component containing query q is SMCC of q,and the first k-edge connected component containing q whose number of nodes is greater than or equal to L is SMCC L of q.The shortcomings of such methods are that the search process needs to traverse graph G many times and the computation cost is high when the size of graph is large.(2)Finding SMCC or SMCC L based on special index which constructs the MST(Maximum Spanning Tree)index of G offline.When processing the query,it firstly calculates the connectivity of q,traverses the MST with any node in q as the start node,and only accesses the nodes corresponding to the edges satisfying specific conditions until all nodes in q are covered,then the visited nodes are the nodes in the SMCC of q.The shortcomings of such methods are that querying SMCC needs expensive traversal operations,each step of traversal can obtain at most one useful node(namely one-node-a-step).When the number of nodes in SMCC is large,the number of nodes that need to be accessed in the index tree will increase.Moreover,when query requests are executed frequently,the system load will increase rapidly.Considering that existing approaches on querying SMCC and SMCC L

关 键 词:无向图 k-edge连通分量 最大连通Steiner分量 索引 最大生成树 

分 类 号:TP18[自动化与计算机技术—控制理论与控制工程]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象