Product summarization extraction model with multimodal information fusion

Authors: ZHAO Qiang, WANG Zhongqing[1], WANG Hongling[1] (School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China)

Affiliation: [1] School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China

Source: Journal of Computer Applications (《计算机应用》), 2024, No. 1, pp. 73-78 (6 pages)

Funding: National Natural Science Foundation of China (61976146).

Abstract: On online shopping platforms, concise, truthful, and effective product summaries are crucial to the shopping experience. Because online shoppers cannot handle the physical product, the product image carries important visual information beyond the text description, so product summaries that fuse multimodal information from both product text and product images matter for online shopping. To fuse product text descriptions with product images, a product summarization extraction model with multimodal information fusion is proposed. Unlike the usual product summarization task, whose input contains only the product text description, the proposed model introduces the product image as an additional source of information to make the extracted summary richer. Specifically, pre-trained models are first used to represent the product text description and the product image separately: a text feature representation is extracted for each sentence of the description, and an overall visual feature representation of the product is extracted from the image. Then, a low-rank tensor-based multimodal fusion method fuses each sentence's text features with the overall visual features to obtain a multimodal feature representation for each sentence. Finally, the multimodal feature representations of all sentences are fed into the summary generator to generate the final product summary. Comparative experiments were conducted on the CEPSUM (Chinese E-commerce Product SUMmarization) 2.0 dataset. Across the three subsets of CEPSUM 2.0, the model's average ROUGE-1 (Recall-Oriented Understudy for Gisting Evaluation) is 3.12 percentage points higher than that of TextRank and 1.75 percentage points higher than that of BERTSUMExt (BERT SUMmarization Extractive). The experimental results show that fusing product text and image information is effective for product summarization and that the model performs well on the ROUGE metrics.
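The abstract gives no formulas, so the following is only a minimal sketch of the three-step pipeline it describes, assuming the general form of low-rank tensor fusion: each modality vector is extended with a constant 1, projected by rank-specific factors, and the projections are multiplied elementwise and summed over the rank. Random vectors stand in for the pre-trained text and image encoders, random factors stand in for learned weights, and the sigmoid scorer `select_sentences` is a hypothetical stand-in for the summary generator, whose design the abstract does not specify. All function names and dimensions are illustrative assumptions.

```python
import math
import random


def append_one(vec):
    """Append a constant 1 so unimodal terms survive in the fused product."""
    return list(vec) + [1.0]


def project(vec, weight):
    """Multiply a length-d vector by a d x d_h matrix (list of rows)."""
    d_h = len(weight[0])
    return [sum(v * row[h] for v, row in zip(vec, weight)) for h in range(d_h)]


def low_rank_fusion(text_feat, image_feat, w_text, w_image, bias):
    """Fuse one sentence vector with the product-level image vector.

    w_text and w_image hold one factor matrix per rank; the full
    (d_t+1) x (d_v+1) x d_h weight tensor is never materialized.
    """
    z_t, z_v = append_one(text_feat), append_one(image_feat)
    fused = list(bias)
    for factor_t, factor_v in zip(w_text, w_image):  # one pair per rank
        p_t, p_v = project(z_t, factor_t), project(z_v, factor_v)
        for h in range(len(fused)):
            fused[h] += p_t[h] * p_v[h]
    return fused


def select_sentences(fused_feats, w_score, k):
    """Hypothetical extractive step: score each sentence, keep the top k."""
    scores = [
        1.0 / (1.0 + math.exp(-sum(f * w for f, w in zip(feat, w_score))))
        for feat in fused_feats
    ]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])  # restore document order


def rand_matrix(rows, cols):
    """Random placeholder for encoder outputs or learned weights."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
n_sent, d_t, d_v, d_h, rank = 5, 8, 6, 4, 3          # assumed toy sizes
sentence_feats = rand_matrix(n_sent, d_t)            # per-sentence text features
image_feat = rand_matrix(1, d_v)[0]                  # one vector per product
w_text = [rand_matrix(d_t + 1, d_h) for _ in range(rank)]
w_image = [rand_matrix(d_v + 1, d_h) for _ in range(rank)]
bias = [0.0] * d_h
w_score = rand_matrix(1, d_h)[0]

fused = [low_rank_fusion(s, image_feat, w_text, w_image, bias)
         for s in sentence_feats]
picked = select_sentences(fused, w_score, k=2)
print("indices of extracted sentences:", picked)
```

The same image vector is fused with every sentence, matching the abstract's description of a single overall visual representation per product. The low-rank form gives the same result as multiplying the outer product of the two extended vectors by a full weight tensor built from the factors, at a cost that grows linearly rather than multiplicatively with the feature sizes.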

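Keywords: product summarization; multimodal summarization; extractive summarization; multimodal fusion; automatic summarization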

Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]
