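Authors: WANG Kai; LIANG Yuteng; ZHANG Yujie[1]; XU Jinan[1]; CHEN Yufeng[1] (Beijing Jiaotong University, Beijing 100044, China)
Affiliation: [1] School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
Source: Technology Intelligence Engineering (《情报工程》), 2022, No. 3, pp. 68-80 (13 pages)
Funding: National Natural Science Foundation of China (61876198, 61976016).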
Abstract: [Objective/Significance] Chinese word segmentation, part-of-speech (POS) tagging, and dependency parsing are the three fundamental tasks of Chinese natural language processing. Transition-based joint models of the three tasks once achieved the best accuracy, but with advances in neural networks and computing power, graph-based models, which can model global information, have surpassed transition-based models on single tasks and on two-task combinations. How to join all three tasks within a graph-based framework and further improve accuracy has become a new challenge. [Methods/Process] This paper proposes a graph-based joint model for Chinese word segmentation, POS tagging, and dependency parsing. The three tasks are joined through a unified set of character-level labels. Characters are represented by context-aware embeddings from a pre-trained language model (e.g. BERT), and the scoring function is based on the biaffine attention mechanism. The paper also designs a decoding algorithm for the joint model that decodes the three tasks. [Results/Conclusions] Experimental results show that the way POS tagging is introduced models the relationship between POS and word segmentation, as well as between POS and dependency parsing, and thereby improves accuracy on the other two tasks. Compared with the work of Yan [1], the most accurate to date, the model achieves the best accuracy on all three tasks.
Classification No.: G35 [Culture and Science: Information Science]; TP391 [Automation and Computer Technology: Computer Application Technology]
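The abstract names a scoring function based on biaffine attention but gives no formula. As a rough illustration only, the sketch below computes a generic biaffine arc score, s(i, j) = d_i^T U h_j + w_dep . d_i + w_head . h_j + b, for every (dependent character, candidate head character) pair. The function name, parameter names, and this exact parameterization are assumptions made for illustration; the paper's actual scorer, label set, and decoding algorithm are not described in this record.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(x: Vector, y: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def biaffine_scores(deps: Matrix, heads: Matrix, U: Matrix,
                    w_dep: Vector, w_head: Vector, b: float) -> Matrix:
    """Generic biaffine scorer (hypothetical parameterization).

    deps[i]  : representation of character i in its role as a dependent
    heads[j] : representation of character j in its role as a head
    Returns scores[i][j], the score of character j heading character i.
    """
    scores = []
    for d in deps:
        row = []
        for h in heads:
            # Bilinear term d^T U h captures the pairwise interaction.
            bilinear = sum(d[a] * U[a][c] * h[c]
                           for a in range(len(d))
                           for c in range(len(h)))
            # Linear terms plus bias capture per-character preferences.
            row.append(bilinear + dot(w_dep, d) + dot(w_head, h) + b)
        scores.append(row)
    return scores


# Toy usage with one-hot representations, so the bilinear term picks out U[i][j].
toy_scores = biaffine_scores(
    deps=[[1.0, 0.0], [0.0, 1.0]],
    heads=[[1.0, 0.0], [0.0, 1.0]],
    U=[[1.0, 2.0], [3.0, 4.0]],
    w_dep=[0.0, 0.0],
    w_head=[0.0, 0.0],
    b=0.0,
)
```

In a trained model the dependent and head representations would come from the pre-trained encoder, typically after separate projection layers, and the score matrix would be passed to a decoder that picks a head for each character; both steps are omitted here.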