Authors: 刘慧婷 (LIU Huiting) [1,2], 凌超 (LING Chao) [1,2] (1. Key Laboratory of Intelligent Computing and Signal Processing of the Ministry of Education, Anhui University, Hefei 230039, Anhui, China; 2. School of Computer Science and Technology, Anhui University, Hefei 230601, Anhui, China)
Affiliations: [1] Key Laboratory of Intelligent Computing and Signal Processing of the Ministry of Education, Anhui University, Hefei 230039, Anhui, China; [2] School of Computer Science and Technology, Anhui University, Hefei 230601, Anhui, China
Source: Journal of South China University of Technology (Natural Science Edition), 2018, No. 8, pp. 122-129 (8 pages)
Funding: Supported by the National Natural Science Foundation of China (61202227) and the Natural Science Research Project of Anhui Higher Education Institutions (KJ2018A00B)
Abstract: Most current word embedding models are based on the distributional hypothesis: they treat the word as the basic semantic unit and learn word representations from its external context. However, in languages such as Chinese, a word is often composed of several characters that carry rich internal information, and the meaning of the word is closely related to the meanings of those characters. Considering that commonly used word embedding models ignore this character information, this paper takes Chinese as an example and proposes a collaborative learning model for word and character representations. To handle the cases in Chinese where a single character carries multiple meanings and multiple characters share a single meaning, a multiple-prototype collaborative learning model is further proposed. The model is evaluated on word similarity and analogical reasoning tasks, and the results show that the word representations it produces outperform those of other word embedding models.
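To make the idea concrete, the following is a minimal sketch of character-enhanced word representations with multiple-prototype character vectors, in the spirit of the model described in the abstract. The embedding dimension, the number of prototypes per character, the hard prototype-selection rule, and the 50/50 averaging composition are all illustrative assumptions, not the authors' exact formulation.

```python
# Minimal sketch: a word vector is combined with one prototype vector per
# character, where each character keeps several prototypes and the prototype
# is chosen by similarity to the current context. Assumed, illustrative setup.
import numpy as np

rng = np.random.default_rng(0)
DIM, K = 50, 3  # embedding dimension and prototypes per character (assumed)

words = ["智能", "手机", "智慧"]
chars = sorted({c for w in words for c in w})

word_vec = {w: rng.normal(scale=0.1, size=DIM) for w in words}
# Each character keeps K prototype vectors to cover its different senses.
char_vec = {c: rng.normal(scale=0.1, size=(K, DIM)) for c in chars}

def compose(word, context_vec):
    """Combine the word vector with one context-selected prototype
    per character (simple hard selection by dot-product similarity)."""
    picked = []
    for c in word:
        protos = char_vec[c]          # (K, DIM)
        scores = protos @ context_vec  # similarity of each prototype to context
        picked.append(protos[int(np.argmax(scores))])
    return 0.5 * word_vec[word] + 0.5 * np.mean(picked, axis=0)

# Example: represent "智能" given the average vector of its context words.
context = np.mean([word_vec["手机"]], axis=0)
vec = compose("智能", context)
print(vec.shape)  # (50,)
```

In an actual training setup, the composed vector would replace the plain word vector in a CBOW- or skip-gram-style objective, so that word vectors and character prototypes are updated jointly; the sketch above only shows the composition step.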
Classification: TP391 [Automation and Computer Technology - Computer Application Technology]