Authors: 香前 (Xiangqian); 才藏太 (Caizang-Tai) [1,2,3]; 李措 (LI Cuo) (School of Computer Science, Qinghai Normal University, Xining 810016, China; Key Laboratory of Tibetan Information Processing, Ministry of Education, Xining 810008, China; The State Key Laboratory of Tibetan Intelligent Information Processing and Application, Xining 810008, China)
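Affiliations: [1] School of Computer Science, Qinghai Normal University, Xining, Qinghai 810016 [2] Key Laboratory of Tibetan Information Processing, Ministry of Education, Xining, Qinghai 810008 [3] The State Key Laboratory of Tibetan Intelligent Information Processing and Application (jointly established by the province and the ministry), Xining, Qinghai 810008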
Source: Plateau Science Research (《高原科学研究》), 2024, No. 4, pp. 108-114 (7 pages)
Funding: National Social Science Fund of China project (23BYY078).
Abstract: News element recognition is the process of extracting key information entities such as time, location, people, organizations, and events from news texts, and it serves as the foundation for news content analysis. While significant progress has been made on Chinese news element recognition, few studies have addressed Tibetan news, and the existing element classification systems are rather coarse, making it difficult to cover the various kinds of key information in Tibetan news reports comprehensively. Therefore, this paper refines the element classification of Tibetan news into 10 categories. To address the challenges of Tibetan news texts, such as unclear word boundaries, numerous out-of-vocabulary words, and word polysemy, we also propose a Tibetan news element recognition method based on RoBERTa-BiLSTM-CRF. The method first encodes Tibetan news texts with the RoBERTa pre-trained language model, then extracts features through a BiLSTM and a self-attention mechanism, and finally applies a conditional random field for sequence labeling to complete the recognition and classification of news elements. Experiments on our self-built dataset (Tibetan news) demonstrate the effectiveness of the method, which achieves an F1 score of 88.8%.
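The final step named in the abstract, sequence labeling with a conditional random field, can be illustrated with a minimal Viterbi decoding sketch. This is not the authors' implementation: in the described pipeline the per-token emission scores would come from the RoBERTa, BiLSTM, and self-attention layers, whereas here they are hand-written numbers. The tag set (`O`, `B-PER`, `I-PER`), the transition penalty, and the function names `viterbi_decode` and `bio_spans` are all assumptions made for illustration; the record does not list the paper's 10 element categories.

```python
from typing import Dict, List, Tuple


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag sequence for one sentence.

    emissions[t][tag] is the score of `tag` at position t (missing = 0.0);
    transitions[(prev, cur)] is the score of moving from prev to cur.
    """
    if not emissions:
        return []
    # best[t][tag]: best score of any path that ends in `tag` at position t
    best = [{tag: emissions[0].get(tag, 0.0) for tag in tags}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            prev = max(
                tags,
                key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0),
            )
            scores[cur] = (
                best[-1][prev]
                + transitions.get((prev, cur), 0.0)
                + emissions[t].get(cur, 0.0)
            )
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag to the start.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def bio_spans(path: List[str]) -> List[Tuple[str, int, int]]:
    """Collapse BIO tags into (label, start, end_exclusive) spans."""
    spans = []
    start, label = 0, None
    for i, tag in enumerate(path + ["O"]):  # sentinel closes a trailing span
        if label is not None and tag != "I-" + label:
            spans.append((label, start, i))
            label = None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


# Hypothetical tag set and scores, for illustration only.
TAGS = ["O", "B-PER", "I-PER"]
TRANSITIONS = {("O", "I-PER"): -1e4}  # an I- tag may not follow O
EMISSIONS = [
    {"O": 1.0, "B-PER": 0.8},
    {"I-PER": 2.0, "O": 0.0},
]
decoded = viterbi_decode(EMISSIONS, TRANSITIONS, TAGS)
print(decoded)             # ['B-PER', 'I-PER']
print(bio_spans(decoded))  # [('PER', 0, 2)]
```

In this toy input, picking the best tag per token independently would give the invalid sequence `O`, `I-PER`; the transition penalty is what steers the decoder to `B-PER`, `I-PER`, which is the usual motivation for placing a CRF on top of a neural encoder.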
Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]