Supported by the Beijing Natural Science Foundation of China (L201023) and the Natural Science Foundation of China (62076030).
Image captioning refers to the automatic generation of descriptive text from the visual content of images. It is a technique integrating multiple disciplines, including computer vision (CV), natural language proc...
Supported in part by the National Natural Science Foundation of China under Grants 62273272 and 61873277; in part by the Chinese Postdoctoral Science Foundation under Grant 2020M673446; in part by the Key Research and Development Program of Shaanxi Province under Grant 2023-YBGY-243; and in part by the Youth Innovation Team of Shaanxi Universities.
Currently, video captioning models based on an encoder-decoder architecture rely mainly on a single video input source. The content of the generated captions is limited because few studies have employed external corpus information to guide...