Authors: Yue Ding (丁玥), Yu-He Guo (郭雨荷), Wei Lu (卢卫), Hai-Xiang Li (李海翔), Mei-Hui Zhang (张美慧), Hui Li (李晖), An-Qun Pan (潘安群), Xiao-Yong Du (杜小勇)
Affiliations:
[1] Key Laboratory of Data Engineering and Knowledge Engineering of Ministry of Education, Renmin University of China, Beijing 100872, China
[2] School of Information, Renmin University of China, Beijing 100872, China
[3] Tencent (Beijing) Technology Company Limited, Beijing 100080, China
[4] School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
[5] College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
[6] Tencent (Shenzhen) Technology Company Limited, Shenzhen 518057, China
Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition), 2023, Issue 4, pp. 927-946 (20 pages)
Funding: Supported by the National Key Research and Development Program of China under Grant No. 2020YFB2104100; the National Natural Science Foundation of China under Grant Nos. 61972403 and U1711261; the Fundamental Research Funds for the Central Universities of China; the Research Funds of Renmin University of China; and the Tencent Rhino-Bird Joint Research Program.
Abstract: Identifying the semantic types of attributes in relations, known as attribute semantic type (AST) identification, plays an important role in many data analysis tasks, such as data cleaning, schema matching, and keyword search in databases. However, due to the lack of unified naming standards across prevalent information systems (a.k.a. information islands), AST identification remains an open problem. To tackle this problem, we propose a context-aware method to identify the ASTs of relations. We transform AST identification into a multi-class classification problem and propose a schema context aware (SCA) model to learn representations from a collection of relations associated with attribute values and schema context. Based on the learned representation, we predict the AST of a given attribute in an underlying relation, where the predicted AST is mapped to one of the labeled ASTs. To improve AST identification, especially when the semantic types to be predicted are not included in the labeled ASTs, we further introduce knowledge base embeddings (KBVec) to enhance this representation and construct a knowledge-base-enhanced schema context aware model (SCA-KB), which yields a stable and robust model. Extensive experiments on real datasets demonstrate that our context-aware method outperforms state-of-the-art approaches by a large margin: up to 6.14% and 25.17% in macro average F1 score, and up to 0.28% and 9.56% in weighted F1 score, over high-quality and low-quality datasets, respectively.
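The abstract reports results with two aggregate metrics, macro average F1 and weighted F1. Both average per-class F1 scores over the AST labels but weight them differently: macro F1 gives every AST equal weight, while weighted F1 weights each AST by its support (how many attributes truly carry that type), so errors on rare types move macro F1 much more. The sketch below uses the standard textbook definitions (the same ones behind scikit-learn's `f1_score` with `average="macro"` and `average="weighted"`); it is not the authors' evaluation code, and the labels `name`, `city`, and `zip` are hypothetical.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # Harmonic mean of precision and recall; 0 when both are 0.
        scores[c] = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every AST counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Per-class F1 weighted by support (true occurrences of each AST)."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    total = len(y_true)
    return sum(scores[c] * support[c] / total for c in scores)


if __name__ == "__main__":
    # Hypothetical ground-truth and predicted ASTs for eight attributes:
    # the classifier labels everything as the majority type "name".
    y_true = ["name"] * 6 + ["city", "zip"]
    y_pred = ["name"] * 8
    print(round(macro_f1(y_true, y_pred), 4))     # 0.2857
    print(round(weighted_f1(y_true, y_pred), 4))  # 0.6429
```

In this toy case, missing the two rare types costs two thirds of the macro score but only about a quarter of the weighted score, which illustrates why the two metrics can report very different margins on the same predictions.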
Keywords: attribute semantic type (AST) identification; context-aware; semantic embedding; knowledge base embedding
CLC number: TP391.1 [Automation and Computer Technology: Computer Application Technology]