Authors: SHI Jianjun (施建军); LIU Lei (刘磊); ZHOU Ling (周瓴)
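Affiliations: [1] Institute of Linguistics, Shanghai International Studies University; [2] School of Japanese Culture and Economy, Shanghai International Studies University / Changshu Institute of Technology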
Source: Foreign Language Learning Theory and Practice (《外语教学理论与实践》), 2023, No. 1, pp. 18-36 (19 pages)
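Funding: Interim result of the 2022 Key Project of the National Social Science Fund of China, "A Quantitative Semantic Study of Chinese-Character Vocabulary in Modern Chinese and Japanese Based on Word Embeddings" (Grant No. 22AYY024).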
Abstract: How the senses of Chinese-character words shared by modern Chinese and Japanese are distributed and used in each language has long concerned the academic community and remains a difficult problem in comparative studies of the two languages. Using neural-network-based word embeddings as a tool, this paper selects 10 high-frequency Chinese-Japanese homographs with rich meanings as target words and conducts exploratory research on this issue. The results show that sense classification of the Japanese target words based on BERT word embeddings reaches an average accuracy of 90% and a maximum of 97%; for the Chinese target words, the average accuracy is 88.3%, with a maximum also of 97%. It is therefore feasible to use word embeddings for quantitative research on the semantics of Chinese and Japanese words. The research also reveals that several factors affect embedding-based semantic analysis, including the scientific soundness and rationality of sense division in traditional dictionaries, the length of dictionary example sentences, the standardization of corpus example sentences, the accuracy of example-sentence extraction, and the length of corpus example sentences.
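The abstract does not say how BERT vectors were turned into sense labels and accuracy figures. One common approach, assumed here and not necessarily the authors' procedure, is nearest-centroid classification: average the embeddings of each dictionary sense's example sentences, assign every corpus occurrence of the target word to the sense whose centroid is most cosine-similar, then compare against hand-labelled senses to get accuracy and count predictions to get a sense distribution. The sketch below uses hypothetical 3-dimensional toy vectors in place of real contextual BERT embeddings (a BERT-base model produces 768-dimensional vectors); all function names, sense labels and data are illustrative.

```python
import math

# Hypothetical toy embeddings standing in for BERT vectors of the target word.
DICTIONARY_EXAMPLES = {  # sense label -> embeddings of dictionary example sentences
    "sense_1": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]],
    "sense_2": [[0.0, 1.0, 0.2], [0.1, 0.9, 0.0]],
}
LABELLED_CORPUS = [  # (embedding of a corpus occurrence, hand-annotated sense)
    ([0.8, 0.1, 0.0], "sense_1"),
    ([0.2, 0.7, 0.1], "sense_2"),
    ([0.6, 0.5, 0.0], "sense_2"),
    ([0.0, 0.8, 0.3], "sense_2"),
]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def centroid(vectors):
    """Element-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def build_sense_centroids(dictionary_examples):
    """Map each sense label to the mean embedding of its example sentences."""
    return {sense: centroid(vecs) for sense, vecs in dictionary_examples.items()}


def classify(vector, centroids):
    """Return the sense whose centroid is most cosine-similar to the vector."""
    return max(centroids, key=lambda sense: cosine(vector, centroids[sense]))


def accuracy(labelled_corpus, centroids):
    """Share of corpus occurrences whose predicted sense matches the gold label."""
    hits = sum(classify(vec, centroids) == gold for vec, gold in labelled_corpus)
    return hits / len(labelled_corpus)


def sense_distribution(corpus_vectors, centroids):
    """Relative frequency of each predicted sense across corpus occurrences."""
    counts = {sense: 0 for sense in centroids}
    for vec in corpus_vectors:
        counts[classify(vec, centroids)] += 1
    total = len(corpus_vectors)
    return {sense: c / total for sense, c in counts.items()}


centroids = build_sense_centroids(DICTIONARY_EXAMPLES)
print(accuracy(LABELLED_CORPUS, centroids))
print(sense_distribution([vec for vec, _ in LABELLED_CORPUS], centroids))
```

With this toy data the third occurrence, which lies between the two senses, is misassigned, so accuracy is 0.75. Sentence length and example-sentence quality, which the abstract names as influencing factors, would act on such a scheme through the embeddings that feed the centroids and the corpus vectors.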
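CLC classification numbers: H36 [Language and Writing: Japanese]; H136 [Automation and Computer Technology: Computer Application Technology]; TP391.1 [Automation and Computer Technology: Computer Science and Technology]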