A Linguistic Probe into the Foundation of Distributed Word Representation (词汇分布语义的语言学基础探微)  Cited by: 1


Authors: Pan Jun (潘俊); Wu Zongda (吴宗大) (Department of Big Data Sciences, Zhejiang University of Science and Technology, Hangzhou 310023; Department of Computer Sciences, Shaoxing University, Shaoxing 312000)

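Affiliations: [1] Department of Big Data Sciences, Zhejiang University of Science and Technology, Hangzhou 310023 [2] Shaoxing University [3] School of Information Management, Nanjing University, Shaoxing 312000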

Source: Zhejiang Social Sciences (《浙江社会科学》), 2019, No. 12, pp. 99-104, 158, 159 (8 pages in total)

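Funding: One of the research outcomes of the Ministry of Education Humanities and Social Sciences Research Youth Fund project "Research on Lexical Semantic Representation Based on Knowledge Bases and Large-Scale Text" (18YJCZH137) and a Key Project of the Zhejiang Provincial Natural Science Foundation (LZ18F020001)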

Abstract: Distributed word representation is currently the principal approach to semantic representation in artificial intelligence. By learning the distributional patterns of words in large-scale corpora, it yields words represented as mathematical vectors, which have the important property of being semantically computable and deducible. The philosophical basis of distributed word representation is Wittgenstein's use theory of meaning, which holds that the meaning of a word lies in its use. Wittgenstein's view of language is connected in certain respects with Saussure's. Saussure holds that all elements of a language operate according to two types of relations, syntagmatic and associative, and the context input of distributed word representation models can in fact be assigned to these two types. Bloomfield's structural linguistics was deeply influenced by Saussure and eventually developed into Harris's distributional methodology, which forms the linguistic basis of distributed word representation. Current distributional lexical semantics, represented by neural language models, is rooted in the use theory of meaning and supported theoretically by descriptive linguistics. In essence, it reflects the patterns and preferences of word use in the corpus; it can therefore objectively reveal some characteristics of social and cultural life, while inevitably having its own inherent limitations.

Keywords: word meaning; distributional semantics; natural language understanding; structuralism; descriptive linguistics

Classification code: H31 [Language and Writing: English]

 
