supported in part by the National Natural Science Foundation of China under Grant No. 62176061; the Science and Technology Commission of Shanghai Municipality under Grant No. 22511105000.
As a Turing test in multimedia, visual question answering (VQA) aims to answer a textual question about a given image. Recently, the “dynamic” property of neural networks has been explored as one of the most promising way...