Image to Bengali Caption Generation Using Deep CNN and Bidirectional Gated Recurrent Unit (original) (raw)

There is not more research on the linguistic characteristics of the Bengali language. Bengali is spoken by about 193 million people globally, and it is one of the top ten spoken languages worldwide. In this paper, a CNN and Bidirectional GRU architecture is proposed for producing a natural language caption from an image in the Bengali language. Bangladeshi people may use this study to grasp one another better and crack language barriers and increase their cultural understanding. This study would immensely help several blind people in their daily lives. The encoder-decoder approach was used in this paper for captioning. We used a pre-trained Deep CNN named InceptionV3 image encoder to interpret, identify, and annotate the dataset’s images and used a Bidirectional GRU architecture as the decoder to produce captions. In order to deliver the finest and subtle Bengali captions from our model, argmax search and beam search are included. We proposed a new dataset named BNATURE that contain...