机构地区:[1]东华大学计算机科学与技术学院,上海201620 [2]上海立信会计金融学院信息管理学院,上海201620
出 处:《计算机学报》2024年第5期1045-1064,共20页Chinese Journal of Computers
基 金:国家自然科学基金项目(62372101,62272097,61873337)资助。
摘 要:许多实体之间的关系可以建模为时态图,其中每条边都与表示其发生的时间相关联,k-core是捕获密集子图的基本模型,在近些年得到了广泛研究.给定时间区间I=[s,e]和k值,时态图G上的k-core子图查询从区间I对应的快照图GI中返回相应的k-core子图.针对时态图中的k-core子图查询问题,现有方法是基于PHC索引(Pruned Historical Core-Index)的算法.对任意可能的k值,PHC索引维护了所有可能出现在某个时间区间的k-core子图中的顶点集Sk,且为集合中每个顶点存储了一组时间区间,用于判定该点是否属于给定时间区间的k-core子图.基于PHC索引查询k-core子图时,需要访问Sk集合中的所有顶点,并判断每个顶点的可满足性.由于Sk集合对应于最大区间快照图的k-core子图里的所有顶点,且实际中用户查询区间对应的快照图往往比最大区间快照图小得多,基于PHC索引的查询算法存在许多无效判断,需要对大量不在结果集中的顶点进行检测,且无效检测次数随着查询区间的缩短而增多,从而导致算法效率较低.针对该问题,本文提出一种新的索引,即最短区间历史核索引SIHC(Shortest Interval Historical Core Index).SIHC索引的基本思想是通过维护最短k核区间到顶点的倒排表,查询处理时,可基于用户给定的时间区间定位到SIHC索引中满足条件的区间,进而直接得到满足条件的k-core子图中的顶点,从而避免了基于PHC索引进行查询时所需的大量无效判断.我们从理论上证明了基于SIHC索引处理时态图上k-core子图查询的正确性,并设计了高效的索引构建算法.最后,基于真实世界的时态图进行了实验,实验结果表明本文提出的算法比现有算法快1~2个数量级.In real applications,the relationships between entities are usually modeled using a temporal graph,where each edge is associated with a timestamp denoting the time of the interaction between two entities at that moment.The k-core is a fundamental model for capturing dense subgraphs and has been widely studied in recent years.Given a temporal graph G and a time interval I-[s,e],the snapshot graph G of G consists of all vertices of G and edges with timestamps within I,where edges with the same endpoints are merged into a single edge without timestamp.Based on the snapshot graph Gr,the k-core subgraph query on a temporal graph G returns the corresponding k-core subgraph from Gr.For the k-core subgraph query problem in temporal graphs,the state-of-the-art method is based on the PHC index(Pruned Historical Core-Index).For any possible value of k,the PHC index maintains a set S,of vertices that may appear in the k-core subgraph within any time interval,and for each vertex of Ss,PHC stores a set of time intervals to determine if it belongs to the k-core subgraph of a given time interval.When querying the k-core subgraph based on the PHC index,it is necessary to access all vertices of S and determine the satisfiability of each vertex.Since S:corresponds to all vertices in the k-core subgraph of the snapshot graph corresponding to the largest interval,and the snapshot graph of the given query interval is often much smaller than the snapshot graph of the largest interval,the query algorithm based on the PHC index involves many invalid judgments,requiring detection of a large number of vertices that are not in the result set.Even worse,the number of invalid detections increases as the query interval becomes shorter,resulting in low efficiency.We show that it is the adopted interval which is not the shortest one that results in the low efficiency of PHC index.We theoretically prove the correctness of the k-core subgraph query on temporal graphs based on the shortest intervals,based on which we propose a new index called Shor
关 键 词:图数据管理 时态图 密集子图 k-core 最短k核区间
分 类 号:TP311[自动化与计算机技术—计算机软件与理论]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...