Supported in part by the National Basic Research Program of China (No. 2012CB316400); the National Natural Science Foundation of China (Nos. 61472353 and 61572431); the China Knowledge Centre for Engineering Sciences and Technology; the Fundamental Research Funds for the Central Universities; the 2015 Qianjiang Talents Program of Zhejiang Province; and in part by the US NSF (No. CCF1017828).
In this paper, we propose an approach for generating rich, fine-grained textual descriptions of images. In particular, we use an LSTM-in-LSTM (long short-term memory) architecture, which consists of an inner LSTM and an...
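As background for the architecture named above, the following is a minimal pure-Python sketch of one step of a *standard* LSTM cell (input, forget, and output gates plus a tanh candidate), unrolled over a short sequence. It is not the paper's LSTM-in-LSTM model: the inner/outer wiring is not described in this excerpt and is not reproduced here. The names `lstm_step`, `zero_params`, and `run_lstm`, and the `(W, U, b)` parameter layout, are hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
# Each gate is parameterized by (W, U, b): input weights, recurrent weights, bias.
GateParams = Tuple[Matrix, Matrix, Vector]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def affine(W: Matrix, U: Matrix, b: Vector, x: Vector, h: Vector) -> Vector:
    """Return W x + U h + b, one entry per hidden unit."""
    return [
        sum(w * xj for w, xj in zip(w_row, x))
        + sum(u * hj for u, hj in zip(u_row, h))
        + bias
        for w_row, u_row, bias in zip(W, U, b)
    ]


def lstm_step(params: Dict[str, GateParams], x: Vector,
              h_prev: Vector, c_prev: Vector) -> Tuple[Vector, Vector]:
    """One step of a standard LSTM cell; returns (h_t, c_t)."""
    i = [sigmoid(v) for v in affine(*params["i"], x, h_prev)]    # input gate
    f = [sigmoid(v) for v in affine(*params["f"], x, h_prev)]    # forget gate
    o = [sigmoid(v) for v in affine(*params["o"], x, h_prev)]    # output gate
    g = [math.tanh(v) for v in affine(*params["g"], x, h_prev)]  # candidate cell
    # New memory: keep part of the old cell state, add part of the candidate.
    c = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c_prev, i, g)]
    # Hidden state exposes a gated, squashed view of the memory.
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c


def zero_params(input_size: int, hidden_size: int) -> Dict[str, GateParams]:
    """Hypothetical helper: all-zero weights, handy for checking the wiring."""
    def gate() -> GateParams:
        W = [[0.0] * input_size for _ in range(hidden_size)]
        U = [[0.0] * hidden_size for _ in range(hidden_size)]
        return W, U, [0.0] * hidden_size
    return {name: gate() for name in ("i", "f", "o", "g")}


def run_lstm(params: Dict[str, GateParams], xs: List[Vector],
             hidden_size: int) -> List[Vector]:
    """Unroll the cell over a sequence, returning every hidden state."""
    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    states = []
    for x in xs:
        h, c = lstm_step(params, x, h, c)
        states.append(h)
    return states
```

With all-zero weights every gate evaluates to sigmoid(0) = 0.5 and the candidate to tanh(0) = 0, so a previous cell state of 1.0 decays to 0.5. This is a quick way to confirm the gating arithmetic before plugging in trained parameters. How the paper nests an inner LSTM within its larger model is specified in the full text, not here.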