基于距离度量的多样性图排序方法  被引量:16

Distance Metric Based Diversified Ranking on Large Graphs

在线阅读下载全文

作  者:李劲[1,2] 岳昆 蔡娇[1] 张志坚 刘惟一[3] 

机构地区:[1]云南大学软件学院,云南昆明650091 [2]云南省软件工程重点实验室,云南昆明650091 [3]云南大学信息学院,云南昆明650091

出  处:《软件学报》2018年第3期599-613,共15页Journal of Software

基  金:国家自然科学基金(61562091;61472345);第二批"云岭学者"培养项目(C6153001);云南省应用基础研究计划(2014FA023;2016FB110);云南大学中青年骨干教师培养计划项目;云南大学青年英才培育计划(WX173602);云南大学数据驱动的软件工程科技创新团队项目(2017HC012)~~

摘  要:有效结合查询相关性和多样性的扩展相关性,是多样性图排序问题的一种优化目标.基于扩展相关性的多样性图排序可建模为一个子模函数优化问题,贪心子模优化算法可近似求解该问题.然而,扩展相关性不能直接度量节点间的不相似性.子模优化算法是串行算法,不能充分利用诸如Spark等集群计算平台有效提高算法效率.针对这些问题,提出一种描述节点间不相似性的距离度量.基于该距离度量,将多样性图排序问题建模为一个在查询相关节点集上构造的带权完全图的最大和k-dispersion优化问题.提出了求解该问题的多项式时间2-近似算法.鉴于不同节点对的距离度量计算是相互独立的,进一步提出了基于MapReduce编程模型的并行化多样性图排序算法.最后,在真实图数据集上验证了所提出算法的高效性和有效性.Expansion relevance which combines both relevance and diversity into a single function is resorted to a submodular optimization objective that can be solved by applying the classic cardinality constrained monotone submodular maximization. However, expansion relevance do not directly capture the dis-similarity over a pair of nodes. Existing submodular algorithms are sequential and not easy to take full advantage of the power of distributed cluster computing platform, such as Spark, to significantly improve the efficiency of algorithm. To tackle this issue, in this paper, a distance metric, which is defined by a sum function of personalized PageRank scores over the symmetry difference of neighbors of a pair of nodes, is first introduced to capture the pairwise dis-similarity over pairs of nodes. Then, the problem of diversified ranking on graphs is formulated as a max-sum k-dispersion problem with metrical edge weight. A polynomial time 2-approximate algorithm is proposed to solve the problem. Considering the computational independence of differem pairs of nodes, a MapReduce algorithm is further developed to boost the efficiency of the process. Finally, extensive experiments are conducted on real network datasets to verify the effectiveness and efficiency of the proposed algorithm.

关 键 词:图数据 个性化PageRank 样性图排序 最大和k-dispersion MAPREDUCE 

分 类 号:TP301.6[自动化与计算机技术—计算机系统结构]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象