Improving Entity Linking in Chinese Domain by Sense Embedding Based on Graph Clustering  (Cited by: 1)


Authors: 张照博 (Zhao-Bo Zhang); 钟芷漫 (Zhi-Man Zhong); 袁平鹏 (Ping-Peng Yuan); 金海 (Hai Jin)

Affiliations: [1] National Engineering Research Center for Big Data Technology and System, Huazhong University of Science and Technology, Wuhan 430074, China; [2] Service Computing Technology and System Laboratory, Huazhong University of Science and Technology, Wuhan 430074, China; [3] Cluster and Grid Computing Laboratory, Huazhong University of Science and Technology, Wuhan 430074, China; [4] School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China

Source: Journal of Computer Science & Technology (计算机科学技术学报(英文版), English edition), 2023, No. 1, pp. 196-210 (15 pages)

Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61932004 and 62072205.

Abstract: Entity linking refers to linking a string in a text to corresponding entities in a knowledge base through candidate entity generation and candidate entity ranking. It is of great significance to some natural language processing (NLP) tasks, such as question answering. Unlike English entity linking, Chinese entity linking requires more consideration because text sequences lack spacing and capitalization and because characters and words are ambiguous, which is more evident in certain scenarios. In Chinese domains such as industry, the generated candidate entities are usually long strings and are heavily nested. In addition, the meanings of the words that make up industrial entities are sometimes ambiguous. Their semantic space is a subspace of the general word embedding space, and thus each entity word needs to be assigned its exact meaning. Therefore, we propose two schemes to achieve better Chinese entity linking. First, we implement an n-gram-based candidate entity generation method to increase the recall rate and reduce the nesting noise. Then, we enhance the corresponding candidate entity ranking mechanism by introducing sense embedding. Considering the contradiction between the ambiguity of word vectors and the single sense of the industrial domain, we design a sense embedding model based on graph clustering, which adopts an unsupervised approach for word sense induction and learns sense representations in conjunction with context. We test the embedding quality of our approach on classical datasets and demonstrate its disambiguation ability in general scenarios. Through experiments, we confirm that our method can better learn the fundamental laws of candidate entities in the industrial domain and achieves better performance on entity linking.
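
The abstract does not give the details of the n-gram-based candidate generation step, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a hypothetical set of knowledge-base entity names (`kb_names`), enumerates character n-grams of the input text, keeps those that match an entity name, and then drops matches that are fully contained in a longer match as one simple way to reduce nesting noise. The function names, the `max_n` parameter, and the containment rule are all illustrative assumptions.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start index, end index exclusive, matched text)


def ngram_candidates(text: str, kb_names: Set[str], max_n: int = 8) -> List[Span]:
    """Enumerate character n-grams of `text` and keep those found in `kb_names`.

    Working on characters avoids depending on a Chinese word segmenter,
    since Chinese text has no spaces between words.
    """
    spans: List[Span] = []
    for start in range(len(text)):
        for n in range(1, max_n + 1):
            end = start + n
            if end > len(text):
                break
            gram = text[start:end]
            if gram in kb_names:
                spans.append((start, end, gram))
    return spans


def drop_nested(spans: List[Span]) -> List[Span]:
    """Remove spans that lie entirely inside another matched span.

    This is an assumed heuristic for reducing nesting noise: when a long
    entity string matches, its shorter sub-strings are discarded.
    """
    kept: List[Span] = []
    for s in spans:
        nested = any(o != s and o[0] <= s[0] and s[1] <= o[1] for o in spans)
        if not nested:
            kept.append(s)
    return kept


# Hypothetical toy knowledge base and sentence (industrial vocabulary).
kb = {"数控", "机床", "数控机床", "主轴"}
sentence = "数控机床的主轴"

all_spans = ngram_candidates(sentence, kb)
filtered = drop_nested(all_spans)
print(all_spans)
print(filtered)
```

On this toy input, the unfiltered list contains both the long entity 数控机床 and its nested parts 数控 and 机床, while the filtered list keeps only the longest non-nested matches. A real system would pass the surviving candidates to a ranking stage, which the abstract describes as using sense embeddings.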

Keywords: natural language processing (NLP); domain entity linking; computational linguistics; word sense disambiguation; knowledge graph

Classification No.: TP391.41 [Automation and Computer Technology: Computer Application Technology]

 
