Authors: SHU Lei (舒蕾); GUO Yiluan (郭懿鸾); WANG Huiping (王慧萍); ZHANG Xuetao (张学涛) [1,2]; HU Renfen (胡韧奋)
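Affiliations: [1] Institute of Chinese Information Processing, Beijing Normal University, Beijing 100875, China; [2] Institute for Advanced Study of the Humanities and Religion, Beijing Normal University, Beijing 100875, China; [3] College of Chinese Language and Culture, Beijing Normal University, Beijing 100875, China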
Source: Journal of Chinese Information Processing (《中文信息学报》), 2022, No. 5, pp. 21-30 (10 pages)
Funding: National Natural Science Foundation of China (62006021); Beijing Social Science Fund, Young Academic Leaders Project (21DTR037).
Abstract: Ancient Chinese is dominated by monosyllabic words, and polysemy is especially prominent, which makes it challenging for modern readers to understand classical texts. To better analyze and discriminate word senses in ancient Chinese, this study designs principles for dividing the senses of polysemous words, based on the linguistic facts reflected in traditional dictionaries and corpora. It then organizes sense-level knowledge for commonly used monosyllabic words and, following these guidelines, annotates word senses in a corpus containing polysemous words. The corpus currently contains 38,700 annotated items totaling more than 1,176,000 Chinese characters, enriching the language resources available for ancient Chinese. Experiments show that a word sense disambiguation model based on this corpus and the BERT language model reaches an accuracy of about 80%. Furthermore, using diachronic analysis of word meaning evolution and the induction of sense families as case studies, the paper makes a preliminary exploration of how the corpus and word sense disambiguation techniques can be applied to research on the language itself and to dictionary compilation.
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]