Authors: 项威 (XIANG Wei), 刘文卓 (LIU Wenzhuo), 王邦 (WANG Bang) (School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China)
Affiliation: [1] School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan 430074
Source: Journal of Computer Applications (《计算机应用》), 2022, Issue S01, pp. 1-6 (6 pages)
Funding: Supported by the National Natural Science Foundation of China (62172167).
Abstract: Existing text annotation tools lack support for complex annotation types and for quality control of crowdsourced work. To address these gaps, a Web-based crowdsourcing text annotation platform was built. On the one hand, the platform adopts a Browser/Server (B/S) architecture with separately developed front end and back end, and it supports complex annotation scenarios, including sequence annotation, single-label annotation, magnitude label annotation, multi-level label annotation and nested text annotation. On the other hand, a consistency detection method based on supervised data and majority voting was proposed: each participant's annotation ability is computed on randomly injected supervised data and used as the weight of that participant's vote, and truth inference over the weighted votes yields the final annotation results. Finally, system functional tests, performance tests and browser compatibility tests were conducted. The results show that the system meets the requirements of complex text annotation and that the proposed consistency detection method can select high-quality annotations to return to users. The platform offers an efficient and convenient way to crowdsource text annotation, helping to build high-quality corpora for Natural Language Processing (NLP) research. It has been deployed on a server, and Internet users can access it directly through major browsers.
Keywords: text annotation; natural language processing; crowdsourcing; Web; consistency detection
CLC number: TP391.1 [Automation and Computer Technology / Computer Application Technology]
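The abstract only outlines the consistency detection method, so the following Python sketch is one plausible reading of it rather than the authors' implementation. It assumes that a participant's "annotation ability" is their accuracy on the injected supervised (gold) items and that this accuracy is used directly as the weight of their vote. The function names, the `(worker, item) -> label` data layout and the fallback weight of 0.5 for participants who saw no gold items are all hypothetical.

```python
from collections import Counter, defaultdict


def annotator_ability(gold_answers, responses):
    """Assumed ability measure: each worker's accuracy on injected gold items."""
    correct = Counter()
    seen = Counter()
    for (worker, item), label in responses.items():
        if item in gold_answers:
            seen[worker] += 1
            if label == gold_answers[item]:
                correct[worker] += 1
    return {w: correct[w] / seen[w] for w in seen}


def weighted_majority_vote(responses, ability, gold_answers, default_weight=0.5):
    """Infer one label per non-gold item; each vote counts with the worker's ability."""
    scores = defaultdict(lambda: defaultdict(float))
    for (worker, item), label in responses.items():
        if item in gold_answers:
            continue  # gold items are only used to estimate ability
        # Hypothetical fallback weight for workers with no gold responses.
        scores[item][label] += ability.get(worker, default_weight)
    # Ties are broken deterministically by picking the alphabetically first label.
    return {item: max(sorted(votes), key=votes.get) for item, votes in scores.items()}


# Usage example with made-up data: g1/g2 are injected gold items, t1/t2 are real tasks.
gold = {"g1": "POS", "g2": "NEG"}
responses = {
    ("A", "g1"): "POS", ("A", "g2"): "NEG", ("A", "t1"): "POS", ("A", "t2"): "NEG",
    ("B", "g1"): "NEG", ("B", "g2"): "NEG", ("B", "t1"): "NEG", ("B", "t2"): "NEG",
    ("C", "g1"): "NEG", ("C", "g2"): "POS", ("C", "t1"): "NEG", ("C", "t2"): "POS",
}
ability = annotator_ability(gold, responses)
labels = weighted_majority_vote(responses, ability, gold)
print(ability)  # {'A': 1.0, 'B': 0.5, 'C': 0.0}
print(labels)   # {'t1': 'POS', 't2': 'NEG'}
```

In this toy data a plain majority vote would label `t1` as `NEG` (two votes to one), but the weighted vote returns `POS` because worker A answered every gold item correctly while C answered none. This illustrates why weighting by measured ability can filter out low-quality contributions.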