Authors: Zhang Yingyi (张颖怡) [1]; Zhang Chengzhi (章成志) [2]
Affiliations: [1] Department of Archives and E-government, School of Social Science, Soochow University, Suzhou 215123; [2] Department of Information Management, School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094
Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), 2024, Issue 6, pp. 712-732 (21 pages)
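Funding: National Natural Science Foundation of China project "Research on Fine-Grained Algorithm Entity Extraction and Evaluation Based on the Full-Text Content of Academic Literature" (72074113).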
Abstract: Research problems and methods are crucial components of scientific papers and play a significant role in the organization, management, and retrieval of papers and in the evaluation of research output. To alleviate the dependence on formulaic expressions and the word boundary errors that affect problem and method recognition, we propose a model that combines formulaic expression desensitization with enhanced boundary recognition. Specifically, formulaic expression desensitization is achieved through data augmentation, whereas boundary recognition is enhanced with a pointer network and a sequence labeling model. With the spread of open access, researchers increasingly use the full text of scientific papers for entity recognition tasks. To demonstrate the need for full text, this paper manually constructs abstract-level and full-text annotated datasets in the field of natural language processing, and designs numerical and content-based metrics to compare the problems, methods, and problem-method relation pairs extracted from the two datasets. Ten-fold cross-validation experiments indicate that the proposed model outperforms the SciBERT-BiLSTM-CRF baseline model by 3.69 percentage points in macro-averaged F1, a statistically significant difference. A comparison of entity recognition and relation pair extraction results between abstracts and full texts shows that problem and method entities in abstracts are semantically broader, whereas full texts contain more entities and relation pairs that describe model design and training details.