Image Stream From Paragraph Method Based on Scene Graph


Authors: ZHANG Wei-qi; TANG Yi-feng; LI Lin-yan; HU Fu-yuan [1,4]

Affiliations: [1] School of Electronic & Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China [2] Suzhou Key Laboratory for Big Data and Information Service, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China [3] Suzhou Institute of Trade and Commerce, Suzhou, Jiangsu 215009, China [4] Virtual Reality Key Laboratory of Intelligent Interaction and Application Technology of Suzhou, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China

Source: Computer Science, 2022, No. 1, pp. 233-240 (8 pages)

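Funding: National Natural Science Foundation of China (61876121); Key Research and Development Program of Jiangsu Province (BE2017663); Natural Science Research General Project for Higher Education Institutions of the Jiangsu Provincial Department of Education (19KJB520054); Jiangsu Province Postgraduate Practice Innovation Program (SJCX20_1119).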

Abstract: Generative adversarial networks can already produce fairly high-quality images for the task of generating an image sequence from a paragraph. However, when the input text involves multiple objects and relationships, the contextual information of the text sequence is difficult to extract, the object layout of the generated images tends to become disordered, and the generated objects lack detail. To address this problem, this paper builds on StoryGAN and proposes a scene-graph-based method for generating image sequences from paragraphs. First, the paragraph is converted into multiple scene graphs through graph convolution, each containing the object and relationship information of the corresponding text. Then, the bounding boxes and segmentation masks of the objects are predicted and used to compute the scene layout. Finally, image sequences that better match the objects and their relationships are generated from the scene layout and the contextual information. In tests on the CLEVR-SV and CoDraw-SV datasets, the method generates 64×64-pixel image sequences containing multiple objects and their relationships. Experimental results show that on CLEVR-SV the SSIM and FID of the proposed method improve on StoryGAN by 1.34% and 9.49%, respectively, and on CoDraw-SV its ACC is 7.40% higher than that of StoryGAN. The proposed method makes the layout of generated scenes more plausible: it can generate image sequences containing multiple objects and relationships, and the generated images are of higher quality with clearer details.
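The second step of the abstract (turning predicted bounding boxes and segmentation masks into a scene layout) can be illustrated with a minimal sketch. The abstract does not give the paper's implementation, so everything below is an assumption: the function name `compose_layout`, the normalized `(x0, y0, x1, y1)` box format, the soft `M×M` masks, and the choice to sum mask-weighted object embeddings onto a 64×64 grid, which is one common way of building such layouts in scene-graph-to-image pipelines.

```python
import numpy as np

def compose_layout(obj_vecs, boxes, masks, height=64, width=64):
    """Paint each object's embedding into its box, weighted by its mask.

    Hypothetical helper, not the paper's code.
    obj_vecs: (N, D) object embeddings
    boxes:    (N, 4) boxes as (x0, y0, x1, y1), normalized to [0, 1]
    masks:    (N, M, M) soft masks in [0, 1]
    returns:  (D, height, width) layout
    """
    n, d = obj_vecs.shape
    m = masks.shape[1]
    layout = np.zeros((d, height, width), dtype=np.float64)
    # Normalized coordinates of each pixel centre.
    ys = (np.arange(height) + 0.5) / height
    xs = (np.arange(width) + 0.5) / width
    for i in range(n):
        x0, y0, x1, y1 = boxes[i]
        # Pixel position relative to the box: in [0, 1) when inside it.
        u = (xs - x0) / max(x1 - x0, 1e-8)
        v = (ys - y0) / max(y1 - y0, 1e-8)
        in_x = (u >= 0) & (u < 1)
        in_y = (v >= 0) & (v < 1)
        # Nearest-neighbour lookup into the M x M mask.
        cols = np.clip((u * m).astype(int), 0, m - 1)
        rows = np.clip((v * m).astype(int), 0, m - 1)
        weight = masks[i][rows[:, None], cols[None, :]]
        # Zero out every pixel that lies outside the box.
        weight = weight * (in_y[:, None] & in_x[None, :])
        layout += obj_vecs[i][:, None, None] * weight[None, :, :]
    return layout

# Two toy objects: one in the top-left quadrant, one in the bottom-right.
obj_vecs = np.array([[1.0, 0.0], [0.0, 1.0]])
boxes = np.array([[0.0, 0.0, 0.5, 0.5], [0.5, 0.5, 1.0, 1.0]])
masks = np.ones((2, 4, 4))
layout = compose_layout(obj_vecs, boxes, masks)
```

In this sketch the layout keeps one channel per embedding dimension, so a downstream generator could read both where each object sits and what it is. Overlapping objects are simply summed here; a real model may resolve overlaps differently.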

Keywords: generative adversarial networks; graph convolutional neural network; scene layout; text-to-image generation

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
