Authors: Xu Zheng (徐峥), Le Xiaoqiu (乐小虬) [1,2]
Affiliations: [1] National Science Library, Chinese Academy of Sciences, Beijing 100190, China; [2] Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
Source: Data Analysis and Knowledge Discovery (《数据分析与知识发现》), 2021, Issue 5, pp. 95-103 (9 pages)
Abstract: [Objective] This paper represents each category unit of a categorical document as an AND-OR logical expression over semantic features, giving the document a semantic representation that supplies data for applications such as category semantic matching and semantic retrieval. [Methods] Starting from category unit description and annotation texts labeled with AND-OR logical semantics, we built a Seq2Seq generation model on UniLM that learns part-of-speech features and explicit AND-OR logical text description features and uses an improved Beam Search ranking strategy, in order to generate AND-OR logical expressions of the semantic features inside a category unit. Integrating contextual hierarchical semantics extends the representation to the semantics outside the category unit. [Results] In experiments on manually annotated International Patent Classification data, the method achieved an evaluation score of 87.2, which is 11.5 points higher than the baseline model (BiLSTM-Attention). [Limitations] The method is tailored to the characteristics of category data in the International Patent Classification, and its generalization to data from other domains remains to be verified. [Conclusions] The proposed semantic representation method performs well on the International Patent Classification and can effectively generate AND-OR logical expressions that integrate the internal semantic features of a category unit with the semantic features of its hierarchical context.
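As an illustration of what an AND-OR logical expression over semantic features might look like as data, the following is a minimal sketch and not the authors' implementation. The class names (`Feature`, `And`, `Or`), the helper functions, and the example category are all hypothetical. In particular, combining a parent category's expression with a child's by conjunction is only one assumed reading of "integrating contextual hierarchical semantics".

```python
from dataclasses import dataclass
from typing import Set, Tuple, Union


@dataclass(frozen=True)
class Feature:
    """A single semantic feature (hypothetical leaf node)."""
    name: str


@dataclass(frozen=True)
class And:
    """All operands must hold."""
    operands: Tuple["Expr", ...]


@dataclass(frozen=True)
class Or:
    """At least one operand must hold."""
    operands: Tuple["Expr", ...]


Expr = Union[Feature, And, Or]


def matches(expr: Expr, features: Set[str]) -> bool:
    """Evaluate an AND-OR expression against a set of feature names."""
    if isinstance(expr, Feature):
        return expr.name in features
    if isinstance(expr, And):
        return all(matches(op, features) for op in expr.operands)
    if isinstance(expr, Or):
        return any(matches(op, features) for op in expr.operands)
    raise TypeError(f"unsupported expression node: {expr!r}")


def render(expr: Expr) -> str:
    """Serialize an expression to a parenthesized AND/OR string."""
    if isinstance(expr, Feature):
        return expr.name
    joiner = " AND " if isinstance(expr, And) else " OR "
    return "(" + joiner.join(render(op) for op in expr.operands) + ")"


def with_context(parent: Expr, child: Expr) -> Expr:
    """Assumption: a child category inherits its parent's semantics by conjunction."""
    return And((parent, child))


# Invented example category, not taken from the International Patent Classification.
unit = And((Feature("cutting tool"), Or((Feature("steel"), Feature("ceramic")))))
full = with_context(Feature("machining"), unit)
```

Under these assumptions, `matches(unit, {"cutting tool", "ceramic"})` holds, while `matches(full, {"cutting tool", "steel"})` does not, because the inherited parent feature is missing. A tree of this kind could serve as the target structure that a sequence generation model emits in serialized form via `render`.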
Keywords: semantic representation; semantic parsing; AND-OR logic; categorical documents
Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]