Similarity search based on semantic features of bibliographic information network  Cited by: 4


Authors: QIU Qingyu; LI Jing; QUAN Bing; TONG Chao; ZHANG Lijun; ZHANG Haixian[1]

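Affiliations: [1] School of Computer Science, Sichuan University, Chengdu, Sichuan 610065, China; [2] China Mobile (Suzhou) Software Technology Company Limited, Suzhou, Jiangsu 215000, China; [3] Chengdu Ruibeiyingte Information Technology Company Limited, Chengdu, Sichuan 610041, China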

Source: Journal of Computer Applications (《计算机应用》), 2018, No. 5, pp. 1327-1333, 1352 (8 pages)

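Funding: Ministry of Education-China Mobile Research Fund (MCM20160307); Sichuan Province Science and Technology Innovation Seedling Project; International Cooperation Projects of the Chengdu Science and Technology Bureau (2016-GH02-00048-HZ; 2015-GH02-00041-HZ)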

Abstract: Bibliographic information networks are typical heterogeneous information networks, and similarity search on them is a hot topic in graph mining. However, existing methods mainly rely on meta-paths or meta-structures to search for similar objects and do not consider the semantic features of the nodes themselves, which biases the search results. To fill this gap, a vector-based semantic feature extraction method for bibliographic information networks was proposed, and a vector-based node similarity measure called VSim was designed and implemented. In addition, by combining semantic features with meta-paths, a similarity search algorithm called VPSim (Similarity computation Based on Vector and meta Path) was designed. To improve execution efficiency, a pruning strategy tailored to the characteristics of bibliographic network data was designed. Experiments on real-world datasets demonstrate that VSim is applicable to searching for entities with similar semantic features, and that VPSim is effective, efficient, and scalable.
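The abstract does not give the formulas behind VSim, VPSim, or the pruning strategy, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes each node carries a semantic feature vector compared by cosine similarity (a stand-in for VSim), uses PathSim over an Author-Paper-Author meta-path commuting matrix as the meta-path component (a swapped-in, well-known measure), and mixes the two with a hypothetical weight `alpha`. The pruning shown is a generic upper-bound cut-off (cosine never exceeds 1), not the paper's data-specific strategy. All function names and the toy data are hypothetical.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity of two feature vectors (hypothetical stand-in for VSim)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def commuting_matrix(adj):
    """M = A * A^T for an author-by-paper adjacency matrix A (meta-path A-P-A)."""
    n = len(adj)
    return [[sum(x * y for x, y in zip(adj[i], adj[j])) for j in range(n)]
            for i in range(n)]


def pathsim(m, x, y):
    """PathSim score: 2 * M[x][y] / (M[x][x] + M[y][y])."""
    denom = m[x][x] + m[y][y]
    return 2 * m[x][y] / denom if denom else 0.0


def top_k_similar(query, adj, features, k=2, alpha=0.5):
    """Rank nodes by alpha * cosine + (1 - alpha) * PathSim (assumed combination)."""
    m = commuting_matrix(adj)
    # Visit candidates in descending PathSim order so the upper bound only shrinks.
    candidates = sorted(
        ((pathsim(m, query, y), y) for y in range(len(adj)) if y != query),
        reverse=True,
    )
    heap = []  # min-heap of (score, node) holding the current best k
    for ps, y in candidates:
        bound = alpha * 1.0 + (1 - alpha) * ps  # cosine is at most 1
        if len(heap) == k and bound <= heap[0][0]:
            break  # no remaining candidate can beat the current k-th score
        score = alpha * cosine(features[query], features[y]) + (1 - alpha) * ps
        if len(heap) < k:
            heapq.heappush(heap, (score, y))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, y))
    return sorted(heap, reverse=True)


# Toy network: 4 authors (rows) x 3 papers (columns); made-up data.
adj = [
    [1, 1, 0],  # author 0 wrote papers 0 and 1
    [1, 1, 0],  # author 1 wrote papers 0 and 1
    [0, 1, 1],  # author 2 wrote papers 1 and 2
    [0, 0, 1],  # author 3 wrote paper 2
]
# Hypothetical 2-dimensional semantic feature vectors per author.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]

if __name__ == "__main__":
    print(top_k_similar(0, adj, features, k=2, alpha=0.5))
```

With `alpha=0.0` the ranking is pure PathSim and author 1 (identical co-authorship) comes first; with `alpha=0.5` the semantic term moves author 2, whose feature vector matches the query, to the top. This mirrors the abstract's motivation that meta-path scores alone can ignore node semantics.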

Keywords: bibliographic information network; similarity search; graph mining; meta-path; semantic feature

CLC number: TP311 [Automation and Computer Technology: Computer Software and Theory]
