Authors: Feiyang PAN, Shuokai LI, Xiang AO, Qing HE
Affiliations: [1] Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China; [2] University of Chinese Academy of Sciences, Beijing 100049, China
Source: Frontiers of Computer Science (中国计算机科学前沿, English edition), 2022, Issue 2, pp. 47-54 (8 pages)
Funding: The research work was supported by the National Key Research and Development Program of China (2017YFB1002104) and the National Natural Science Foundation of China (Grant Nos. 92046003, 61976204, U1811461). Xiang Ao was also supported by the Project of Youth Innovation Promotion Association CAS and the Beijing Nova Program (Z201100006820062).
Abstract: Word embeddings are one of the backbones of modern natural language processing (NLP). Recently, with the need to deploy NLP models on low-resource devices, there has been a surge of interest in compressing word embeddings into hash codes or binary vectors to reduce storage and memory consumption. Typically, existing work learns to encode an embedding into a compressed representation from which the original embedding can be reconstructed. Although these methods aim to preserve most of the information of each individual word, they often fail to retain the relations between words and can therefore incur large losses on certain tasks. To this end, this paper presents Relation Reconstructive Binarization (R2B), which transforms word embeddings into binary codes that preserve the relations between words. At its heart, R2B trains an auto-encoder to generate binary codes that allow the word-by-word relations in the original embedding space to be reconstructed. Experiments showed that the method achieved significant improvements over previous methods on a number of tasks, along with a space saving of up to 98.4%. Specifically, the method achieved even better results on word similarity evaluation than the uncompressed pre-trained embeddings, and was significantly better than previous compression methods that do not consider word relations.
Keywords: embedding compression; variational auto-encoder; binary word embedding
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
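The abstract's central idea, binary codes judged by how well they preserve word-to-word relations rather than individual vectors, can be illustrated with a rough sketch. This is not the paper's R2B model: the learned auto-encoder is replaced here by a fixed random-projection sign hash, the relation is assumed to be cosine similarity, and the loss is assumed to be a mean squared error between relation matrices. The function names, the 300-dimensional float32 embeddings, and the 256-bit code length are all illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def cosine_relations(emb):
    """Pairwise cosine similarity between rows of a real-valued embedding matrix."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return unit @ unit.T


def binarize(emb, projection):
    """Stand-in encoder (assumption): sign of a fixed random projection, as 0/1 bits."""
    return (emb @ projection > 0).astype(np.uint8)


def code_relations(codes):
    """Similarity between binary codes: map bits to -1/+1, then normalised dot product.

    The result lies in [-1, 1] and equals 1 - 2 * (Hamming distance / n_bits).
    """
    signed = codes.astype(np.float64) * 2.0 - 1.0
    return signed @ signed.T / codes.shape[1]


def relation_loss(emb, codes):
    """Assumed objective: mean squared gap between the two relation matrices."""
    return float(np.mean((cosine_relations(emb) - code_relations(codes)) ** 2))


def space_saving(dim, bits_per_float, n_bits):
    """Fraction of storage saved when a dense vector is replaced by n_bits bits."""
    return 1.0 - n_bits / (dim * bits_per_float)


# Toy data: 50 random "word vectors"; real pre-trained embeddings would go here.
rng = np.random.default_rng(0)
emb = rng.normal(size=(50, 300))
projection = rng.normal(size=(300, 256))

codes = binarize(emb, projection)
loss = relation_loss(emb, codes)
saving = space_saving(dim=300, bits_per_float=32, n_bits=256)
print(f"relation loss: {loss:.4f}, space saving: {saving:.1%}")
```

In a trained model, the encoder that produces `codes` would be optimised to drive a relation-level objective of this kind down, whereas the random projection above is fixed and serves only to make the quantities concrete.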