Automatic Clustering and Division of Chinese Dialects and Related Computational Methods (汉语方言自动聚类与分区及相关计算方法)

Author: Jiang Di (江荻) [1,2]

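Affiliations: [1] Institute of Ethnology and Anthropology, Chinese Academy of Social Sciences; [2] School of Linguistic Sciences and Arts, Jiangsu Normal University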

Source: 《复印报刊资料(语言文字学)》 (LINGUISTICS AND PHILOLOGY), 2022, No. 8, pp. 97-110 (14 pages)

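Funding: Major Project of the National Social Science Fund of China, "Research on the Development and Construction of an Online Retrieval System for Large-Scale Grammatically Annotated Texts of China's Ethnic Languages" (21&ZD304).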

Abstract: This paper reviews three quantitative methods that scholars have used to measure the relationships among Chinese dialects: feature statistics, etymological statistics, and lexical similarity measurement. It points out that all three are non-holistic and examine only restricted portions of the phonology and lexicon. The paper then presents a more suitable computational model, the Levenshtein Distance algorithm (also called edit distance), which reconciles phonetic similarity and lexical correspondence between the linear strings of languages or dialects, and which implicitly captures the effects of feature comparison and etymological probability. The automatic classification experiment covers 78 dialects from eight southern groups (Wu, Min, Yue, Xiang, Ke (Hakka), Gan, Hui, and Huai) and 108 Mandarin dialects from the Dongbei, Beijing, Ji-Lu, Jiao-Liao, Zhongyuan, Lan-Yin, Xinan, and Jin divisions, for a total of 186 Chinese dialect points. For each dialect, the 100 basic words of the Swadesh list were collected, and pairwise similarity was computed between dialects. The results are broadly consistent with the traditional classification but more precise.

Keywords: Chinese dialects; clustering algorithm; Levenshtein distance; automatic division

Classification: H17 [Language and Script: Chinese]; TP311.13 [Automation and Computer Technology: Computer Software and Theory]
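
The abstract names Levenshtein (edit) distance as the measure applied to word lists across dialect points. The following is a minimal Python sketch of that general idea, not the paper's actual implementation. The three dialect labels and their transcriptions are hypothetical toy data, and two choices are assumptions made only for illustration: dividing each word distance by the length of the longer form, and averaging over the word list.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-symbol insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer form (an assumption)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def dialect_distance(words_a, words_b) -> float:
    """Mean normalized distance over two word lists in which the same
    index holds the same concept (an assumed aggregation)."""
    if len(words_a) != len(words_b) or not words_a:
        raise ValueError("word lists must be non-empty and aligned")
    total = sum(normalized_distance(x, y) for x, y in zip(words_a, words_b))
    return total / len(words_a)


def distance_matrix(wordlists):
    """Pairwise dialect distances, keyed by sorted pairs of labels."""
    return {
        (p, q): dialect_distance(wordlists[p], wordlists[q])
        for p, q in combinations(sorted(wordlists), 2)
    }


# Hypothetical toy data: three invented dialect points, three concepts each.
WORDLISTS = {
    "A": ["ni", "san", "ren"],
    "B": ["ni", "sam", "jan"],
    "C": ["li", "sa", "lang"],
}

if __name__ == "__main__":
    for pair, dist in sorted(distance_matrix(WORDLISTS).items()):
        print(pair, round(dist, 3))
```

A matrix of this kind is the usual input to a clustering step. Which clustering algorithm the paper applies to its 186 dialect points is not stated in the abstract, so none is shown here.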

 
