Automatic Extraction and Annotation of English Relative Clauses in Multitype Texts


Authors: LI Jinman; ZHANG Tingyu

Affiliation: [1] School of Foreign Studies, Shanghai University of Finance and Economics, Shanghai 200433

Source: Foreign Languages and Their Teaching (《外语与外语教学》), 2023, No. 3, pp. 85-96, 148 (13 pages)

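Funding: Interim result of the National Social Science Fund of China project "Processing Universals of Typologically Distinctive Constructions" (Grant No. 18BYY008).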

Abstract: Relative clauses have attracted considerable attention in theoretical linguistics, psycholinguistics, computational linguistics, and language acquisition and teaching. However, manual extraction and annotation of relative clauses is time-consuming and error-prone, which has severely limited the scale of previous studies. A recently released computer program, AutoSubClause, uses a dependency parser to automatically extract and annotate several types of English subordinate clauses and may help address this problem, but its actual performance remains unknown. In this study, we systematically evaluate how accurately the program extracts English relative clauses from different types of texts, including political and literary, written and spoken texts produced by native speakers, learners, and translators. We also assess how well it annotates features of relative clauses such as accessibility, animacy, and restrictiveness. The results show high recall and precision in extracting relative clauses from native-speaker and translated texts, but extraction from learner texts still needs improvement. In annotating relative clause features, the program shows high overall precision in identifying noun animacy, clause restrictiveness, and the role of the head noun within the relative clause. Limitations of the program are analyzed and suggestions for improvement are provided.
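
For readers unfamiliar with the two evaluation measures named in the abstract, the following is a minimal sketch of how precision and recall could be computed when automatically extracted clauses are compared with a manually annotated gold standard. The function name `precision_recall` and the sample clause data are assumptions made for illustration only; they are not taken from the study or from AutoSubClause.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) for two collections of clause identifiers."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)  # clauses found by both program and annotator
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical data: (sentence index, clause text) pairs invented for this example.
gold = {
    (1, "who lives next door"),
    (2, "that I bought"),
    (3, "which was late"),
    (4, "whose car broke down"),
}
extracted = {
    (1, "who lives next door"),
    (2, "that I bought"),
    (5, "that he left"),  # a false positive: not in the gold standard
}

precision, recall = precision_recall(extracted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case two of the three extracted clauses are correct and two of the four gold clauses are found, so precision is about 0.67 and recall is 0.50. Exact-match comparison of clause spans is only one possible criterion; an evaluation could also credit partial overlaps.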

Keywords: AutoSubClause; relative clauses; automatic extraction and annotation; multitype texts

Classification code: H0 [Language and Writing: Linguistics]

 
