Facing problems creating a one-to-many RNN model

Image caption generator

I want to create a deep learning model that predicts captions from images. I am using this technique in a new project, which is a fairly advanced one.

  1. I used the Flickr8k dataset from Kaggle to build this project.
  2. Preprocessing and feature extraction are both done.
  3. I am now training the model. The training loss decreases, but after training, when I give the model a new image, it either cannot predict a caption or predicts an incorrect one.
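A falling training loss with poor captions on new images often points to a mismatch between how the training pairs are built and how the caption is decoded at inference time. The following is only a minimal sketch of how the two steps usually fit together in a one-to-many setup, not the code from this project. The names `make_training_pairs`, `greedy_caption`, `predict_next`, and the `startseq`/`endseq` marker tokens are all hypothetical, and `toy_predict_next` is a stand-in for a real trained model.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed marker tokens wrapped around every caption during preprocessing.
START, END = "startseq", "endseq"


def make_training_pairs(caption_ids: Sequence[int]) -> List[Tuple[List[int], int]]:
    """Expand one tokenized caption into (prefix, next_word) training pairs."""
    return [(list(caption_ids[:i]), caption_ids[i]) for i in range(1, len(caption_ids))]


def greedy_caption(
    predict_next: Callable[[Sequence[float], List[int]], Sequence[float]],
    image_features: Sequence[float],
    word_to_id: Dict[str, int],
    max_len: int,
) -> str:
    """Generate a caption word by word, feeding each prediction back in."""
    id_to_word = {i: w for w, i in word_to_id.items()}
    sequence = [word_to_id[START]]  # inference starts from the same marker as training
    for _ in range(max_len):
        probs = predict_next(image_features, sequence)
        next_id = max(range(len(probs)), key=probs.__getitem__)  # argmax
        if id_to_word.get(next_id) == END:
            break
        sequence.append(next_id)
    return " ".join(id_to_word[i] for i in sequence[1:])


# Toy vocabulary and predictor, used only to show the control flow.
vocab = {"startseq": 0, "a": 1, "dog": 2, "runs": 3, "endseq": 4}


def toy_predict_next(features: Sequence[float], prefix: List[int]) -> List[float]:
    # Stand-in for a trained model: always proposes the token after the last one.
    probs = [0.0] * len(vocab)
    probs[min(prefix[-1] + 1, len(vocab) - 1)] = 1.0
    return probs


pairs = make_training_pairs([0, 1, 2, 3, 4])
caption = greedy_caption(toy_predict_next, [0.0], vocab, max_len=10)
```

With a real model, `predict_next` would wrap the model's prediction call and apply the same tokenizer, padding length, and image feature extractor that were used during training. It can also help to track the loss on held-out images, since the training loss alone does not show whether the model generalizes.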

Can anyone help with this situation? What should I do?