Correlation between phrases in advertisement based on recursive autoencoder  Cited by: 2


Authors: HU Qinghui (胡庆辉) [1,2], WEI Shiwei (魏士伟) [2], XIE Zhongqian (解忠乾), REN Yafeng (任亚峰) [1]

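Affiliations: [1] School of Computer Science, Wuhan University, Wuhan 430072, China; [2] Guangxi Colleges and Universities Key Laboratory Breeding Base of Robot and Welding Technology, Guilin University of Aerospace Technology, Guilin, Guangxi 541004, China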

Source: Journal of Computer Applications (《计算机应用》), 2016, No. 1, pp. 154-157, 187 (5 pages)

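Funding: National Natural Science Foundation of China (11301106); Guangxi Natural Science Foundation (2014GXNSFAA1183105); Guangxi Higher Education Scientific Research Project (ZD2014147; YB2014431)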

Abstract: Most existing work on the correlation between advertising phrases relies on literal matching and ignores the deep semantic information the phrases carry, which limits task performance. To address this, a deep learning method is proposed for measuring the correlation between advertising phrases. A recursive autoencoder (RAE) analyzes the deep structure of each phrase so that the resulting phrase vector carries deep semantic information, and a correlation measure for the advertising context is built on these vectors. Specifically, given a sequence of several words, every pair of adjacent elements is tentatively merged, which yields a reconstruction error for each pair; the pair with the smallest reconstruction error is merged, and the process repeats until a phrase tree resembling a Huffman tree is formed. Gradient descent is used to minimize the reconstruction error of the phrase tree, and cosine distance is used to measure the correlation between phrases. Experimental results show that introducing word weight information increases the contribution of important words to the final phrase vector, which makes the RAE better suited to phrase computation. Compared with the traditional LDA (Latent Dirichlet Allocation) and BM25 algorithms at a recall rate of 50%, the proposed algorithm improves accuracy by 4.59 and 3.21 percentage points respectively, which demonstrates its effectiveness.
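The greedy merging step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the tanh encoder, the linear decoder, the random untrained weights, and every function name here are assumptions made for the example. Training by gradient descent and the word weighting scheme are omitted, and cosine similarity stands in for the cosine measure the abstract mentions.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def reconstruction_error(W_enc, W_dec, left, right):
    """Encode [left; right] into a parent vector, decode it, and return
    (squared reconstruction error, parent vector)."""
    target = left + right                       # concatenated children, length 2d
    parent = [math.tanh(v) for v in matvec(W_enc, target)]
    recon = matvec(W_dec, parent)               # linear decoder (an assumption)
    err = sum((r - t) ** 2 for r, t in zip(recon, target))
    return err, parent


def build_phrase_tree(vectors, W_enc, W_dec):
    """Repeatedly merge the adjacent pair with the smallest reconstruction
    error. Returns (tree as nested tuples of word indices, phrase vector,
    summed reconstruction error of all merges)."""
    nodes = [(i, v) for i, v in enumerate(vectors)]
    total_error = 0.0
    while len(nodes) > 1:
        best = None
        for i in range(len(nodes) - 1):
            err, parent = reconstruction_error(
                W_enc, W_dec, nodes[i][1], nodes[i + 1][1])
            if best is None or err < best[0]:
                best = (err, i, parent)
        err, i, parent = best
        merged = ((nodes[i][0], nodes[i + 1][0]), parent)
        nodes[i:i + 2] = [merged]               # replace the pair with its parent
        total_error += err
    return nodes[0][0], nodes[0][1], total_error


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (0.0 if either is a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def random_matrix(rows, cols, rng):
    """Hypothetical helper: small random weights in place of trained ones."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Toy usage with made-up 4-dimensional word vectors and untrained weights.
rng = random.Random(0)
d = 4
W_enc = random_matrix(d, 2 * d, rng)            # maps 2d -> d
W_dec = random_matrix(2 * d, d, rng)            # maps d -> 2d
phrase_a = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(3)]
phrase_b = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(4)]
tree_a, vec_a, err_a = build_phrase_tree(phrase_a, W_enc, W_dec)
tree_b, vec_b, err_b = build_phrase_tree(phrase_b, W_enc, W_dec)
print(tree_a, tree_b, cosine_similarity(vec_a, vec_b))
```

With untrained weights the trees and the similarity score carry no meaning; in the setting the abstract describes, the encoder and decoder weights would first be fitted by minimizing the summed reconstruction error over a corpus of phrases.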

Keywords: deep learning; recursive autoencoder; word vector; computational advertising; search engine

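Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]; TP181 [Automation and Computer Technology: Computer Science and Technology]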

 
