A comprehensive review of existing corpora and methods for creating annotated corpora for event extraction tasks  


Authors: Mohd Hafizul Afifi Abdullah, Norshakirah Aziz, Said Jadid Abdulkadir, Kashif Hussain, Hitham Alhussian, Noureen Talpur

Affiliations: [1] Centre for Cyber-Physical Systems (C2PS), Universiti Teknologi PETRONAS, 32610 Seri Iskandar, Malaysia; [2] Centre for Research in Data Science (CeRDaS), Universiti Teknologi PETRONAS, 32610 Seri Iskandar, Malaysia; [3] Department of Science and Engineering, Solent University, SO14 0YN, UK

Source: Journal of Data and Information Science (数据与情报科学学报, English edition), 2024, Issue 4, pp. 196-238 (43 pages)

Abstract: Purpose: This study provides a comprehensive review of existing annotated corpora for event extraction, which are limited in number but essential for training and improving event extraction algorithms. In addition to this primary goal, it provides guidelines for preparing an annotated corpus and suggests suitable tools for the annotation task. Design/methodology/approach: This study employs an analytical approach to examine available corpora that are suitable for event extraction tasks. It offers an in-depth analysis of existing event extraction corpora and provides systematic guidelines for researchers to develop accurate, high-quality corpora. This ensures the reliability of the created corpus and its suitability for training machine learning algorithms. Findings: Our exploration reveals a scarcity of annotated corpora for event extraction tasks. In particular, the English corpora focus mainly on the biomedical and general domains. Despite this scarcity, several high-quality corpora are available and widely used as benchmark datasets. However, access to some of these corpora may be limited by closed-access policies, or by discontinued maintenance after the initial release, which leaves them inaccessible behind broken links. Therefore, this study documents the available corpora for event extraction tasks. Research limitations: Our study focuses only on well-known corpora available in English and Chinese. It places a strong emphasis on the English corpora because English is a global lingua franca and is more widely understood than other languages. Practical implications: We believe that this study provides valuable knowledge that can serve as a guiding framework for preparing and accurately annotating events from text corpora. It provides comprehensive guidelines for researchers to improve the quality of corpus annotations, especially for e

Keywords: Information extraction; Event extraction; Text mining; Large language model; Natural language processing

Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
