Language clustering with word co-occurrence networks based on parallel texts  Cited by: 6


Authors: LIU HaiTao, CONG Jin

Affiliation: [1] School of International Studies, Zhejiang University

Source: Chinese Science Bulletin, 2013, No. 10, pp. 1139-1144 (6 pages)

Funding: Supported by the National Social Science Foundation of China (09BYY024 and 11&ZD188)

Abstract: This study investigates the feasibility of applying complex networks to fine-grained language classification and of employing word co-occurrence networks based on parallel texts as a substitute for syntactic dependency networks in complex-network-based language classification. Fourteen word co-occurrence networks were constructed based on parallel texts of 12 Slavic languages and 2 non-Slavic languages, respectively. With appropriate combinations of major parameters of these networks, cluster analysis was able to distinguish the Slavic languages from the non-Slavic ones and to group the Slavic languages correctly into their respective sub-branches. Moreover, the clustering also captured the genetic relationships of some of these Slavic languages within their sub-branches. The results show that word co-occurrence networks based on parallel texts are applicable to fine-grained language classification and that they constitute a more convenient substitute for syntactic dependency networks in complex-network-based language classification.
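As a rough illustration of the pipeline the abstract outlines (build one network per language, then compare network parameters), the sketch below constructs a word co-occurrence network from tokenized sentences and computes a few parameters. The abstract does not say how edges are defined or which parameters were used, so everything here is an assumption: edges link adjacent words within a sentence, the graph is undirected and unweighted, and the function names (`build_cooccurrence_network`, `network_parameters`) and the chosen parameters are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_network(sentences):
    """Link each pair of adjacent words with an undirected edge (assumed definition)."""
    graph = defaultdict(set)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for token in tokens:
            graph[token]  # touching the key registers the word as a node
        for left, right in zip(tokens, tokens[1:]):
            if left != right:  # skip self-loops from repeated words
                graph[left].add(right)
                graph[right].add(left)
    return dict(graph)


def average_degree(graph):
    """Mean number of distinct neighbours per node."""
    return sum(len(nbrs) for nbrs in graph.values()) / len(graph)


def average_clustering(graph):
    """Mean local clustering coefficient; nodes with fewer than 2 neighbours count as 0."""
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges that exist among this node's neighbours.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph)


def network_parameters(sentences):
    """Return (node count, average degree, average clustering) for one text."""
    graph = build_cooccurrence_network(sentences)
    return (len(graph), average_degree(graph), average_clustering(graph))
```

One parameter tuple per language could then be passed to a hierarchical clustering routine (for example `scipy.cluster.hierarchy.linkage`) to obtain a dendrogram; which parameters and which linkage method suit the task is left open here.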

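Keywords: network construction; text clustering; language; parallel; co-occurrence; cluster analysis; genetic relationships; complex networks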

Classification code: H08 [Language and Writing: Linguistics]

 
