EnAli: entity alignment across multiple heterogeneous data sources  Cited by: 2

Authors: Chao KONG, Ming GAO, Chen XU, Yunbin FU, Weining QIAN, Aoying ZHOU

Affiliations: [1] School of Data Science and Engineering, East China Normal University, Shanghai 200062, China; [2] Technische Universität Berlin, Berlin 10623, Germany

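Source: Frontiers of Computer Science, 2019, No. 1, pp. 157-169 (13 pages). Chinese journal title: 中国计算机科学前沿(英文版) [Frontiers of Computer Science in China (English edition)]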

Funding: the National Key Research and Development Program of China (2016YFB1000905); the National Natural Science Foundation of China (Grant Nos. U1401256, 61402177, 61672234, 61402180 and 61232002); NSF of Shanghai (14ZR1412600).

Abstract: Entity alignment is the problem of identifying which entities in one data source refer to the same real-world entity in the others. Identifying entities across heterogeneous data sources is paramount to many research fields, such as data cleaning, data integration, information retrieval and machine learning. The aligning process is not only overwhelmingly expensive for large data sources, since it involves all tuples from two or more data sources, but also needs to handle heterogeneous entity attributes. In this paper, we propose an unsupervised approach, called EnAli, to match entities across two or more heterogeneous data sources. EnAli employs a generative probabilistic model that incorporates the heterogeneous entity attributes via the exponential family, handles missing values, and utilizes a locality sensitive hashing scheme to reduce the candidate tuples and speed up the aligning process. EnAli is highly accurate and efficient even without any ground-truth tuples. We illustrate the performance of EnAli on re-identifying entities from the same data source, as well as on aligning entities across three real data sources. Our experimental results show that the proposed approach outperforms the comparable baseline.
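The abstract mentions locality sensitive hashing as a way to shrink the set of candidate tuples before alignment. The following is a minimal sketch of that general idea only, not the authors' implementation: it uses MinHash signatures with banding over character 3-grams, which is one common LSH scheme and may differ from the one used in EnAli. The function names (`shingles`, `minhash_signature`, `candidate_pairs`), the parameters `num_hashes` and `bands`, and the sample records are all hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of character k-grams of a whitespace/case-normalized string."""
    s = " ".join(text.lower().split())
    return {s[i:i + k] for i in range(max(1, len(s) - k + 1))}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes=32):
    """MinHash signature: the minimum hash value per seeded hash function."""
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(num_hashes))


def candidate_pairs(records, num_hashes=32, bands=16):
    """Return pairs of record ids that share at least one LSH band bucket.

    records: dict mapping a record id to its text (assumed input layout).
    Only pairs returned here would be passed on to a more expensive matcher.
    """
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for rid, text in records.items():
        sig = minhash_signature(shingles(text), num_hashes)
        for b in range(bands):
            band = sig[b * rows:(b + 1) * rows]
            buckets[(b, band)].add(rid)
    pairs = set()
    for ids in buckets.values():
        for a, b in combinations(sorted(ids), 2):
            pairs.add((a, b))
    return pairs


if __name__ == "__main__":
    # Hypothetical records from two sources, distinguished by id prefix.
    records = {
        "src1:1": "Chao Kong",
        "src2:7": "chao  kong",
        "src2:9": "completely different entity",
    }
    print(candidate_pairs(records))
```

Records whose normalized text is identical always land in the same buckets, while dissimilar records collide only with low probability, so the downstream matcher sees far fewer than all possible pairs. In a cross-source setting one would typically also discard pairs whose ids come from the same source.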

Keywords: entity alignment; exponential family; locality sensitive hashing; EM-algorithm

Classification code: TP [Automation and Computer Technology]

 
