Research on Mining Citation Relationships between Ancient Chinese and Japanese Texts Based on "DaDi Corpus" Data: Taking Quotations of The Analects of Confucius in Japanese Chinese-Language Texts as an Example


Authors: XIONG Wei (熊伟); WANG Ding (王鼎)

Affiliation: [1] School of Foreign Languages, Soochow University, Suzhou, Jiangsu 215006, China

Source: Foreign Language Research in Northeast Asia (《东北亚外语研究》), 2024, No. 4, pp. 46-62 (17 pages)

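Funding: An interim result of the National Social Science Fund of China Key Project "Construction and Study of a Corpus of Japanese Kanji Words" (19AYY020) and of the Soochow University 2024 "莙政基金" project "Overseas Transmission of Chinese Classics: A Big-Data Study of Citations of Chinese Classics in Japanese Kanshi" (苏大教[2024]53号).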

Abstract: Mining citation relationships between ancient Chinese and Japanese texts is important for studying how vocabulary, ideas, and culture were transmitted between China and Japan. As part of developing the citation mining and retrieval function for ancient Chinese and Japanese texts in the DaDi Corpus, this study uses the ancient Chinese-language texts from China and Japan collected in the corpus to test five string-based algorithms. Of these, the 2-gram overlap algorithm is selected as the best performer and further refined into the 2-gram similarity ratio and related algorithms. The refined algorithm is then applied to a full-text mining of how all ten volumes of The Analects of Confucius are cited in ancient Chinese-language texts written in Japan. The study verifies the usability of DaDi Corpus data for mining citations between ancient Chinese and Japanese texts and the effectiveness of the 2-gram similarity ratio algorithm, and it offers a practical dataset and methodological reference for citation mining of ancient Chinese and Japanese texts in the digital humanities.
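The abstract names the 2-gram overlap and 2-gram similarity ratio algorithms but does not define them. As a rough illustration only, the sketch below shows one plausible reading: count the character 2-grams shared by a source passage and a candidate text, then divide by the number of 2-grams in the source passage. The function names, the punctuation list, and the choice of denominator are all assumptions made for this example, not the authors' definitions.

```python
from collections import Counter

# Assumed punctuation to ignore; the paper's actual preprocessing is not stated.
PUNCTUATION = set("，。、；：？！「」『』,.;:?!")


def bigrams(text: str) -> Counter:
    """Count overlapping character 2-grams, skipping whitespace and punctuation."""
    chars = [c for c in text if not c.isspace() and c not in PUNCTUATION]
    return Counter(a + b for a, b in zip(chars, chars[1:]))


def bigram_overlap(source: str, target: str) -> int:
    """Number of 2-grams shared by source and target (multiset intersection)."""
    return sum((bigrams(source) & bigrams(target)).values())


def bigram_similarity_ratio(source: str, target: str) -> float:
    """Shared 2-grams divided by the 2-grams in source (hypothetical definition)."""
    total = sum(bigrams(source).values())
    if total == 0:
        return 0.0
    return bigram_overlap(source, target) / total


if __name__ == "__main__":
    quote = "學而時習之"
    print(bigram_similarity_ratio(quote, "學而時習之，不亦說乎"))
    print(bigram_similarity_ratio(quote, "學而不思則罔"))
```

Normalizing by the length of the source passage keeps a short quotation embedded in a long text from being penalized for the unrelated material around it; a symmetric measure such as Jaccard similarity would behave differently, and which form the authors use is not stated here.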

Keywords: citation mining; text similarity; ancient Chinese and Japanese texts; DaDi Corpus; 2-gram similarity ratio

Classification: H0 [Language and Writing: Linguistics]

 
