Authors: GUO Xiao; CHEN Yanping; TANG Ruixue; HUANG Ruizhang [1,2]; QIN Yongbin [1,2]
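Affiliations: [1] State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China; [2] College of Computer Science and Technology, Guizhou University, Guiyang 550025, China; [3] College of Information, Guizhou University of Finance and Economics, Guiyang 550025, China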
Source: Computer Engineering and Applications (《计算机工程与应用》), 2023, No. 22, pp. 144-150 (7 pages)
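Funding: National Natural Science Foundation of China (62166007); Natural Science Foundation of Guizhou Province (黔科合基础ZK[2022]027); Youth Science and Technology Talent Growth Project of the Guizhou Provincial Department of Education (黔教合KY字[2022]205号).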
Abstract: Identifying the predicate head is key to understanding a sentence and is important for analyzing Chinese sentence structure. Because Chinese is loosely structured, predicate head identification is difficult and remains a hard problem in Chinese information processing. A sentence contains only one predicate head, so enumerating spans produces a large number of negative samples and leaves positive and negative samples imbalanced. In addition, the predicate head and highly overlapping negative spans share the same context and are semantically similar, which easily causes false positives. To address these problems, this paper proposes a predicate head identification method based on boundary regression. The method first identifies the boundaries of the predicate head and then generates spans by combining those boundaries, which reduces the number of negative span samples and the computational cost. A boundary regression module then updates each span's position in the sentence relative to the predicate head, improving the accuracy of the span boundaries. Finally, a constraint strategy ensures that a unique predicate head is output. Experimental results show that the model reaches an F-score of 84.41%, which verifies its effectiveness in identifying predicate heads.
Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]
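The abstract's argument that combining detected boundaries yields far fewer candidate spans than full enumeration can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the 0.5 threshold, the toy per-token boundary scores, and the product-of-scores rule used to pick a single span. The boundary regression step is not shown, because the abstract does not specify how it is computed.

```python
from itertools import product


def enumerate_spans(n_tokens):
    """All contiguous spans (start, end) with inclusive token indices."""
    return [(i, j) for i in range(n_tokens) for j in range(i, n_tokens)]


def boundary_spans(start_scores, end_scores, threshold=0.5):
    """Combine likely start and end boundaries into candidate spans.

    The scores are hypothetical per-token boundary probabilities;
    the threshold value is an assumption, not a value from the paper.
    """
    starts = [i for i, s in enumerate(start_scores) if s >= threshold]
    ends = [j for j, s in enumerate(end_scores) if s >= threshold]
    return [(i, j) for i, j in product(starts, ends) if i <= j]


def pick_unique(spans, start_scores, end_scores):
    """Uniqueness constraint (illustrative): keep the single best span."""
    if not spans:
        return None
    return max(spans, key=lambda s: start_scores[s[0]] * end_scores[s[1]])


# Toy scores for a six-token sentence (made-up numbers).
start_scores = [0.10, 0.20, 0.90, 0.60, 0.10, 0.05]
end_scores = [0.05, 0.10, 0.30, 0.80, 0.55, 0.10]

all_spans = enumerate_spans(len(start_scores))
candidates = boundary_spans(start_scores, end_scores)
best = pick_unique(candidates, start_scores, end_scores)

print(len(all_spans), len(candidates), best)
```

With these toy scores, full enumeration considers every contiguous span of the sentence, while boundary combination keeps only spans whose start and end both pass the threshold, so at most one of the remaining candidates is positive and the rest are a much smaller set of negatives.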