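Authors: 唐善成 [1], 鲁彪, 张雪, 张莹, 梁少君 (TANG Shan-cheng; LU Biao; ZHANG Xue; ZHANG Ying; LIANG Shao-jun; School of Communication and Information Engineering, Xi'an University of Science and Technology, Xi'an 710054, China)
Affiliation: [1] School of Communication and Information Engineering, Xi'an University of Science and Technology, Xi'an 710054
Source: 《科学技术与工程》 (Science Technology and Engineering), 2023, No. 16, pp. 6967-6973 (7 pages)
Funding: National Key Research and Development Program of China (2018YFC0808300); Shaanxi Province Science and Technology Plan, Key Industry Innovation Chain (Cluster) Project (2020ZDLGY15-07); Xi'an Science and Technology Plan, Science and Technology Innovation Guidance Project (201805036YD14CG20(4)).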
Abstract: To address the insufficient use of glyph features in existing Chinese character vector representation methods, a character vector representation method oriented to the vector-graphics features of Chinese characters, scalable vector graphics to vector (SVG2vec), is proposed, exploiting the scale invariance of vector graphics. In the preprocessing stage, pixel images of Chinese characters are converted into vector graphics, generating sequences of glyph vector coordinate pairs. In the feature learning stage, a bidirectional recurrent neural network (RNN) and an autoregressive mixture density RNN are used to build a vector-graphics variational autoencoder model, which learns the structural features of Chinese glyphs. In the vector generation stage, the glyph vector coordinate pair sequence is fed to the encoder, which maps the glyph features into a continuous probability distribution space to yield the SVG2vec character vector. Comparative experiments against existing character vectors were carried out on tasks at different levels. The results show that, in named entity recognition, Chinese word segmentation, and short text similarity calculation, the mean F1 of SVG2vec exceeds that of Word2vec and GloVe, which do not use glyph features, by 1.27 and 0.4, 1.67 and 0.12, and 3.28 and 2.03, respectively, and exceeds that of glyph and meaning to vector (GnM2Vec) and character-enhanced word embedding (CWE), which do use glyph features, by 1.02 and 1.07, 1.69 and 1.34, and 0.04 and 0.31, respectively. SVG2vec therefore makes more effective use of Chinese glyph features.
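The vector generation stage described in the abstract (coordinate pair sequence in, parameters of a continuous distribution out) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the class name `ToyGlyphEncoder`, the plain tanh RNN cell, the layer sizes, the random untrained weights, and the choice of a diagonal Gaussian parameterized by a mean and a log-variance. The decoder (the autoregressive mixture density RNN) and training are omitted.

```python
import math
import random


def rand_matrix(rows, cols, rng, scale=0.5):
    """Hypothetical helper: a rows x cols matrix of small random weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def linear(w, v):
    """Matrix-vector product w @ v on plain Python lists."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in w]


def rnn_pass(seq, w_in, w_rec, hidden):
    """Run a plain tanh RNN over seq and return the final hidden state."""
    h = [0.0] * hidden
    for x in seq:
        pre_in = linear(w_in, x)    # contribution of the current (x, y) pair
        pre_rec = linear(w_rec, h)  # contribution of the previous state
        h = [math.tanh(a + b) for a, b in zip(pre_in, pre_rec)]
    return h


class ToyGlyphEncoder:
    """Untrained bidirectional RNN encoder (illustrative assumption only)."""

    def __init__(self, hidden=8, latent=4, seed=0):
        rng = random.Random(seed)
        self.hidden = hidden
        # Separate weights for the forward and backward directions.
        self.fwd = (rand_matrix(hidden, 2, rng), rand_matrix(hidden, hidden, rng))
        self.bwd = (rand_matrix(hidden, 2, rng), rand_matrix(hidden, hidden, rng))
        # Projections from the concatenated states to the Gaussian parameters.
        self.w_mu = rand_matrix(latent, 2 * hidden, rng)
        self.w_logvar = rand_matrix(latent, 2 * hidden, rng)

    def encode(self, coords):
        """Map a list of (x, y) pairs to (mean, log-variance) lists."""
        h_f = rnn_pass(coords, self.fwd[0], self.fwd[1], self.hidden)
        h_b = rnn_pass(list(reversed(coords)), self.bwd[0], self.bwd[1], self.hidden)
        h = h_f + h_b  # list concatenation of both directions
        return linear(self.w_mu, h), linear(self.w_logvar, h)

    def sample(self, coords, rng):
        """Reparameterized sample z = mu + sigma * eps, eps ~ N(0, 1)."""
        mu, logvar = self.encode(coords)
        return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
                for m, lv in zip(mu, logvar)]


# Hypothetical glyph outline given as three coordinate pairs.
glyph = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
encoder = ToyGlyphEncoder(hidden=8, latent=4, seed=0)
mu, logvar = encoder.encode(glyph)
z = encoder.sample(glyph, random.Random(1))
```

In this sketch the mean `mu` would play the role of the character vector, which is itself an assumption; the abstract does not say whether the mean or a sampled point is used as the SVG2vec vector.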
Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]