Takahisa YAMAMOTO Shiki TAKEUCHI Atsushi NAKAZAWA
Visual sentiment analysis has many applications, including image captioning, opinion mining, and advertising; however, it remains a difficult problem, and existing algorithms do not produce satisfactory results. One of the difficulties in classifying images into emotions is that visual sentiments are evoked by two different types of information: visual information, such as colors and textures, and semantic information, such as the types of objects evoking emotions and/or their combinations. In contrast to existing methods that use only visual information, this paper presents a novel algorithm for image emotion recognition that uses both types of information simultaneously. For semantic features, we introduce an object vector and a word vector. The object vector is created by an object detection method and reflects the objects present in an image. The word vector is created by transforming the names of the detected objects through a word embedding model, so that semantically similar objects yield similar vectors. These semantic features are concatenated with a visual feature extracted by a fine-tuned convolutional neural network (CNN), and classification is performed on the concatenated feature vector. Extensive evaluation experiments on emotional image datasets show that our method achieves the best accuracy among existing methods on all but one dataset, with a maximum accuracy improvement of 4.54%.
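The following is a minimal sketch of the pipeline the abstract describes, not the authors' exact implementation. It assumes a pretrained Faster R-CNN detector and ResNet-50 backbone from torchvision, 300-dimensional word embeddings, eight emotion classes, and a user-supplied `embedding_lookup` function; all of these specifics are illustrative assumptions.

```python
# Hedged sketch of the described fusion pipeline; model choices, dimensions,
# and the embedding lookup are assumptions, not the paper's implementation.
import torch
import torch.nn as nn
import torchvision

NUM_CLASSES = 91    # detector category count (COCO; assumption)
EMBED_DIM = 300     # word-embedding dimensionality (assumption)
NUM_EMOTIONS = 8    # number of emotion classes (assumption)

detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
cnn = torchvision.models.resnet50(weights="DEFAULT")
cnn.fc = nn.Identity()  # strip the classifier head to get a 2048-d visual feature
cnn.eval()

def object_vector(detections, score_thresh=0.5):
    """Multi-hot vector over detector categories: reflects objects present in the image."""
    vec = torch.zeros(NUM_CLASSES)
    keep = detections["scores"] > score_thresh
    vec[detections["labels"][keep]] = 1.0
    return vec

def word_vector(object_names, embedding_lookup):
    """Average word embeddings of detected object names, so semantically
    similar objects produce similar vectors."""
    vecs = [embedding_lookup(name) for name in object_names]
    if not vecs:
        return torch.zeros(EMBED_DIM)
    return torch.stack(vecs).mean(dim=0)

# In the paper this classifier would be trained on emotion-labeled images;
# here it is only instantiated to show the fused feature dimensionality.
classifier = nn.Linear(2048 + NUM_CLASSES + EMBED_DIM, NUM_EMOTIONS)

@torch.no_grad()
def classify(image, label_names, embedding_lookup):
    """image: 3xHxW float tensor in [0, 1]; label_names maps detector ids to words."""
    det = detector([image])[0]
    visual = cnn(image.unsqueeze(0)).squeeze(0)        # visual feature (CNN)
    obj = object_vector(det)                           # object vector
    kept_ids = det["labels"][det["scores"] > 0.5].tolist()
    wrd = word_vector([label_names[i] for i in kept_ids], embedding_lookup)  # word vector
    fused = torch.cat([visual, obj, wrd])              # concatenated feature
    return classifier(fused).argmax().item()
```

The design point the abstract emphasizes is the concatenation step: the multi-hot object vector, the averaged word-embedding vector, and the CNN feature are fused into a single vector, so the emotion classifier sees visual and semantic evidence jointly rather than visual cues alone.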