Construction of a Chinese-based Corpus of Asian English Community


Authors: YE Xing-yu (叶星妤), PAN Xiao-xin (潘孝新), QIN Xiao-hui (秦晓惠) [2], WANG Long (王龙), HUANG Chao (黄超), LUO Xiong (罗熊) [1]

Affiliations: [1] School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China; [2] School of Foreign Languages, University of Science and Technology Beijing, Beijing 100083, China

Source: Computer Technology and Development (《计算机技术与发展》), 2024, No. 11, pp. 180-185 (6 pages)

Funding: National Natural Science Foundation of China (62202044).

Abstract: The Chinese-based Asian English community is a carrier of Chinese culture and one of the basic lingua francas of a community with a shared future for mankind. However, the lack of large amounts of authentic, reliable data and of scientific data mining and natural language processing methods has become a key technical problem restricting the development of Chinese-based Asian English research. Building on an analysis of the current state of related research, a big-data-driven corpus of Chinese-based Asian English (Corpus of Chinese-based Asian English, CCbAE) is designed and implemented, and an online retrieval service is provided through Web development. It is a large-scale corpus composed of six Chinese-based English varieties (China's Mainland English, Chinese Hong Kong English, Chinese Taiwan English, Chinese Macao English, Singapore English, and Malaysian English). First, the overall architecture of the system and the construction of its database are briefly described. Second, the six major functions of the corpus are introduced with reference to the Web visual interface: word frequency statistics, feature display, lexical variation, morphological variation, syntactic variation, and semantic variation. The system provides users at different levels with a simple, easy-to-use retrieval service for Chinese-based Asian English corpus data.

Keywords: corpus; Asian English; big data; language retrieval; natural language processing

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
