Text Clustering Algorithm and Its Application Based on Weakly-Supervised Deep Learning  (cited by: 2)

Authors: Tan Min (谭敏); Zhang Hongyuan (张宏源); Zhang Haichao (张海超) (School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, Zhejiang, China)

Affiliation: [1] School of Computer Science, Hangzhou Dianzi University, Hangzhou 310018, Zhejiang, China

Source: Computer Applications and Software (《计算机应用与软件》), 2019, No. 4, pp. 171-177 (7 pages)

Funding: National Natural Science Foundation of China, Young Scientists Fund (61602136)

Abstract: This paper studies text clustering based on user click data. Using click data, each query text is represented as a smooth image-click-graph, and a deep click model is trained on these representations. To cope with heavy noise in the clicked query-text set, a weight vector that measures the reliability of each query text is introduced, and a text clustering algorithm based on weakly-supervised deep learning is proposed that iteratively updates the text weights and the deep model. The algorithm is applied to click-feature-based image recognition: similar query texts are merged so that each image is described by a compact click feature vector over text sets, which enables efficient and accurate image recognition. The method is validated on two public click datasets, Clickture-Dog and Clickture-Bird. The results show that representing each query text as an image-click-graph effectively addresses the sparseness and non-smoothness of the original click feature vectors and helps improve recognition accuracy. The weakly-supervised deep clustering model not only helps learn powerful text representations but also effectively selects high-quality query texts for training, which further improves recognition performance.
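The query-merging step described in the abstract can be illustrated with a minimal sketch. It assumes the cluster label of each query text and its reliability weight are already available (in the paper these come from the weakly-supervised deep clustering model, which is not reproduced here). The function name `merge_click_features`, the argument layout, and the toy numbers are hypothetical and chosen only for illustration.

```python
import numpy as np


def merge_click_features(clicks, labels, n_clusters, weights=None):
    """Collapse per-query click counts into per-cluster click features.

    clicks     : (n_images, n_queries) array of click counts.
    labels     : (n_queries,) cluster index assigned to each query text.
    n_clusters : number of query clusters (length of the compact vector).
    weights    : optional (n_queries,) reliability weights; defaults to 1.

    Returns an (n_images, n_clusters) array in which column k is the
    weighted sum of the click counts of all queries assigned to cluster k.
    """
    clicks = np.asarray(clicks, dtype=float)
    labels = np.asarray(labels)
    n_queries = clicks.shape[1]
    if weights is None:
        weights = np.ones(n_queries)
    weights = np.asarray(weights, dtype=float)

    # Assignment matrix: row q has a single non-zero entry (the weight of
    # query q) in the column of the cluster that query q belongs to.
    assign = np.zeros((n_queries, n_clusters))
    assign[np.arange(n_queries), labels] = weights

    # Summing the columns of each cluster is one matrix product.
    return clicks @ assign


# Toy data (hypothetical): 2 images, 4 query texts, 2 query clusters.
clicks = [[3, 0, 1, 0],
          [0, 2, 0, 5]]
labels = [0, 1, 0, 1]

compact = merge_click_features(clicks, labels, n_clusters=2)
# Down-weight query 2 and discard query 3 as unreliable.
weighted = merge_click_features(clicks, labels, n_clusters=2,
                                weights=[1.0, 1.0, 0.5, 0.0])
```

With uniform weights, the four sparse query columns collapse into two dense cluster columns. Setting a weight to zero removes a noisy query from the feature entirely, which mirrors, in a simplified way, the idea of selecting high-quality query texts.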

Keywords: image recognition; deep clustering; user click data; query merging; weakly-supervised learning

Classification number: TP3 [Automation and Computer Technology / Computer Science and Technology]

 
