Authors: De Jicuo (德吉措); Anjian Cairang (安见才让) [1,2,3] (School of Computer Science, Qinghai Minzu University, Xining 810007, China; Qinghai Key Laboratory of Tibetan Information Processing and Machine Translation, Xining 810007, China; State Key Laboratory of Intelligent Information Processing and Application of Tibetan Language, jointly established by the province and the ministry, Xining 810007, China)
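Affiliations: [1] School of Computer Science, Qinghai Minzu University, Xining 810007; [2] Qinghai Key Laboratory of Tibetan Information Processing and Machine Translation, Xining 810007; [3] State Key Laboratory of Intelligent Information Processing and Application of Tibetan Language (jointly established by the province and the ministry), Xining 810007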
Source: Qinghai Science and Technology (《青海科技》), 2024, No. 1, pp. 81-86, 107 (7 pages)
Abstract: A high-quality, standardized corpus is an important foundation for entity relation extraction research and can improve the precision and recall of the extraction task. At present, Tibetan relation extraction corpora are mostly built with traditional manual annotation and are limited to specific domains, so annotation is inefficient and person-relation corpora remain relatively scarce. This paper constructs a Tibetan person-name entity recognition corpus. By analyzing the features of person relations, the entity relation categories, and their annotation specifications, it builds a trigger-word dictionary and uses it to back-label the corpus, producing 15,400 annotated entity recognition samples and 8,000 annotated Tibetan person-relation extraction samples. To verify the usability of the corpus, named entity recognition and relation extraction experiments were carried out and analyzed statistically: entity recognition reached an F1 score of 67.2% and relation extraction reached an F1 score of 66.2%. The results indicate that the corpus provides a data foundation for subsequent research on Tibetan person-relation extraction.
Classification: TP391.1 [Automation and Computer Technology - Computer Application Technology]