SwiDeN: Convolutional Neural Networks For Depiction Invariant Object Recognition

SwiDeN

Proceedings of the 24th ACM international conference on Multimedia, 2016

Current state of the art object recognition architectures achieve impressive performance but are typically specialized for a single depictive style (e.g. photos only, sketches only). In this paper, we present SwiDeN: our Convolutional Neural Network (CNN) architecture which recognizes objects regardless of how they are visually depicted (line drawing, realistic shaded drawing, photograph etc.). In SwiDeN, we utilize a novel 'deep' depictive style-based switching mechanism which appropriately addresses the depiction-specific and depiction-invariant aspects of the problem. We compare SwiDeN with alternative architectures and prior work on a 50-category Photo-Art dataset containing objects depicted in multiple styles. Experimental results show that SwiDeN outperforms other approaches for the depiction-invariant object recognition problem.

Insights From A Large-Scale Database of Material Depictions In Paintings

arXiv (Cornell University), 2020

Deep learning has paved the way for strong recognition systems which are often both trained on and applied to natural images. In this paper, we examine the give-and-take relationship between such visual recognition systems and the rich information available in the fine arts. First, we find that visual recognition systems designed for natural images can work surprisingly well on paintings. In particular, we find that interactive segmentation tools can be used to cleanly annotate polygonal segments within paintings, a task which is time consuming to undertake by hand. We also find that FasterRCNN, a model which has been designed for object recognition in natural scenes, can be quickly repurposed for detection of materials in paintings. Second, we show that learning from paintings can be beneficial for neural networks that are intended to be used on natural images. We find that training on paintings instead of natural images can improve the quality of learned features and we further find that a large number of paintings can be a valuable source of test data for evaluating domain adaptation algorithms. Our experiments are based on a novel large-scale annotated database of material depictions in paintings which we detail in a separate manuscript.

Deep Learning Approaches to Art Style Recognition in

2017

Foreword: Computation for the work described in this report was supported by the DeiC National HPC Centre, SDU. The authors refer to themselves in the first person plural. Counselling and the original project proposal were provided by Manfred Jaeger.

Freehand Sketch Recognition Using Deep Features

Freehand sketches often contain sparse visual detail. In spite of the sparsity, they are easily and consistently recognized by humans across cultures, languages and age groups. Therefore, analyzing such sparse sketches can aid our understanding of the neurocognitive processes involved in visual representation and recognition. In the recent past, Convolutional Neural Networks (CNNs) have emerged as a powerful framework for feature representation and recognition for a variety of image domains. However, the domain of sketch images has not been explored. This paper introduces a freehand sketch recognition framework based on "deep" features extracted from CNNs. We use two popular CNNs for our experiments: Imagenet CNN and a modified version of LeNet CNN. We evaluate our recognition framework on a publicly available benchmark database containing thousands of freehand sketches depicting everyday objects. Our results are an improvement over the existing state-of-the-art accuracies by 3% to 11%. The effectiveness and relative compactness of our deep features also make them an ideal candidate for related problems such as sketch-based image retrieval. In addition, we provide a preliminary glimpse of how such features can help identify the relative importance of crucial attributes (e.g. object-parts) in the sketched objects.

Fine-tuning Convolutional Neural Networks for fine art classification

Expert Systems with Applications

The increasing availability of large digitized fine art collections opens new research perspectives in the intersection of artificial intelligence and art history. Motivated by the successful performance of Convolutional Neural Networks (CNN) for a wide variety of computer vision tasks, in this paper we explore their applicability for art-related image classification tasks. We perform extensive CNN fine-tuning experiments and consolidate in one place the results for five different art-related classification tasks on three large fine art datasets. Along with addressing the previously explored tasks of artist, genre, style and time period classification, we introduce a novel task of classifying artworks based on their association with a specific national artistic context. We present state-of-the-art classification results of the addressed tasks, signifying the impact of our method on computational analysis of art, as well as other image classification related research areas. Furthermore, in order to question transferability of deep representations across various source and target domains, we systematically compare the effects of domain-specific weight initialization by evaluating networks pre-trained for different tasks, varying from object and scene recognition to sentiment and memorability labelling. We show that fine-tuning networks pre-trained for scene recognition and sentiment prediction yields better results than fine-tuning networks pre-trained for object recognition. This novel outcome of our work suggests that the semantic correlation between different domains could be inherent in the CNN weights. Additionally, we address the practical applicability of our results by analysing different aspects of image similarity. We show that features derived from fine-tuned networks can be employed to retrieve images similar in either style or content, which can be used to enhance capabilities of search systems in different online art collections.

Classification of basic artistic media based on a deep convolutional approach

The Visual Computer, 2019

Artistic media play an important role in recognizing and classifying artworks in many artwork classification studies and public artwork databases. We employ a deep CNN structure to recognize artistic media from artworks and to classify them into predetermined categories. For this purpose, we define the basic artistic media as oilpaint brush, pastel, pencil and watercolor, and build an artwork image dataset by collecting artwork images from various websites. To build our classifier, we implement various recent deep CNN structures and compare their performance. Among them, we select DenseNet, which shows the best performance for recognizing artistic media. Through a human baseline experiment, we show that the performance of our classifier is comparable with that of trained humans. Furthermore, we show that our classifier exhibits recognition and classification patterns similar to those of humans in terms of well-classified media, ill-classified media, confusing pairs and confusing cases. We also collect synthesized oilpaint artwork images from fourteen important papers in the oilpaint literature and apply our classifier to them. Our classifier shows a meaningful performance, which will lead to an evaluation scheme for the artistic media simulation techniques of the non-photorealistic rendering (NPR) community.

Painting Classification using a Pre-trained Convolutional Neural Network

The problem of classifying images into different predefined categories is an important high-level vision problem. In recent years, convolutional neural networks (CNNs) have been the most popular tool for image classification tasks. CNNs are multi-layered neural networks that can handle complex classification tasks if trained properly. However, training a CNN requires a huge number of labeled images that are not always available for all problem domains. A CNN pre-trained on a different image dataset may not be effective for classification across domains. In this paper, we explore the use of a pre-trained CNN not as a classification tool but as a feature extraction tool for painting classification. We run an extensive array of experiments to identify the layers that work best with the problems of artist and style classification, and also discuss several novel representation and classification techniques using these features.

Art Classification with Semi-Supervised Learning and Multimodal Vision Transformers

Art classification is a challenging task that requires understanding various painting characteristics beyond pixel values. This project addresses the challenge of enhancing art classification using deep learning techniques, specifically focusing on the integration of semi-supervised learning and multimodal vision transformers. We leverage the ArtBench-10 dataset, a meticulously curated collection of 60,000 images representing 10 distinctive artistic styles. Our approach combines supervised and semi-supervised deep learning models, including Vision Transformers (ViT), ResNet, and EfficientNet B2, to classify artwork into distinct styles. We employ MixMatch, a semi-supervised learning algorithm, to effectively utilize a combination of labeled and unlabeled data for training. Additionally, we integrate textual and visual information using the BLIP (Bootstrapped Language Image Pretraining) model to capture nuanced artistic styles. Through extensive experimentation, we demonstrate the efficacy of our approach in improving classification accuracy and highlight the potential of multimodal frameworks for artwork analysis. The code and pre-trained files are available at https://drive.google.com/drive/folders/17Ig7JWGMCHp0C6MsKzjmeLUoTyacFZsI?usp=sharing.
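The abstract above names MixMatch without describing it. As a rough illustration only, the sketch below shows two steps commonly associated with MixMatch: guessing a label for an unlabeled image by averaging predictions over several augmentations and sharpening the result, and mixing two examples MixUp-style. The function names, the temperature of 0.5 and the alpha of 0.75 are assumptions chosen for illustration, not values taken from this project.

```python
import random


def sharpen(probs, temperature=0.5):
    """Lower the entropy of a probability vector: p_i^(1/T), renormalized."""
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


def guess_label(predictions, temperature=0.5):
    """Average class predictions over K augmentations, then sharpen."""
    k = len(predictions)
    n_classes = len(predictions[0])
    mean = [sum(pred[c] for pred in predictions) / k for c in range(n_classes)]
    return sharpen(mean, temperature)


def mixup(x1, y1, x2, y2, alpha=0.75, rng=random):
    """Convex combination of two (input, label) pairs.

    lam is drawn from Beta(alpha, alpha) and then replaced by
    max(lam, 1 - lam) so the result stays closer to the first example.
    """
    lam = rng.betavariate(alpha, alpha)
    lam = max(lam, 1.0 - lam)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


if __name__ == "__main__":
    # Hypothetical model outputs for two augmentations of one unlabeled image.
    guessed = guess_label([[0.6, 0.3, 0.1], [0.5, 0.4, 0.1]])
    print("guessed label:", guessed)
    mixed_x, mixed_y, lam = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0])
    print("mixed input:", mixed_x, "mixed label:", mixed_y, "lambda:", lam)
```

In a full training loop the guessed labels would serve as soft targets for the unlabeled batch; that part, and the models themselves, are omitted here.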

Extracting and Analyzing Deep Learning Features for Discriminating Historical Art

PEARC '20: Practice and Experience in Advanced Research Computing, 2020

Art historians are interested in possible methods and visual criteria for determining the style and authorship of artworks. One approach, developed by Giovanni Morelli in the late nineteenth century, focused on abstracting, extracting and comparing details of recognizable human forms, although he never prescribed what exactly to look for. In this work, we asked what a contemporary method like convolutional networks could contribute to or reveal about such a humanistic method, one that is not fully determined but is also clearly aligned with computation. Convolutional networks have become very successful in object recognition because they learn general features to distinguish and classify large sets of objects. Thus, we wanted to explore what features are present in these networks that have some discriminatory power for distinguishing paintings. We input the digitized art into a large-scale convolutional network that was pre-trained for object recognition from naturalistic images. Because we do not have labels, we extracted activations from the network and ran K-means clustering. We contrasted and evaluated discriminatory power between shallow and deeper layers. We also compared predetermined features from standard computer vision techniques of edge detection. It turns out that the deep network's individual feature maps are highly generic and do not easily map onto obvious authorship interpretations, but in the aggregate they can have strong and intuitive discriminating power. Although this does not directly test issues of attribution, the application can inform humanistic perspectives regarding what counts as features that make up visual elements of paintings.
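The unsupervised step mentioned above (extract activations, then run K-means) can be illustrated with a minimal sketch. It assumes each painting has already been reduced to a short activation vector; the toy vectors, the choice of k and the deterministic initialization from the first k points are all assumptions made for illustration and are not taken from the paper.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm.

    Initialization uses the first k points as centroids (an assumption
    made here for reproducibility; random restarts are more usual).
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical 2-D "activations" for four paintings.
    activations = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9]]
    labels, centroids = kmeans(activations, k=2)
    print(labels, centroids)
```

Real activations from a convolutional layer would have thousands of dimensions, so in practice they are usually pooled or reduced before clustering.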

A Deep Approach for Classifying Artistic Media from Artworks

KSII Transactions on Internet and Information Systems, 2019

We present a deep CNN-based approach for classifying artistic media from artwork images. We aim to classify the most frequently used artistic media, including oilpaint brush, watercolor brush, pencil and pastel. For this purpose, we extend VGGNet, one of the most widely used CNN structures, by substituting its last layer with a fully convolutional layer, which reveals the class activation map (CAM), the region of classification. We build two artwork image datasets: YMSet, which collects more than 4K artwork images for the four most frequently used artistic media from various internet websites, and WikiSet, which collects almost 9K artwork images for the ten most frequently used media from WikiArt. We execute a human baseline experiment to compare the classification performance. Through our experiments, we conclude that our classifier is superior to humans in classifying artistic media.
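The abstract does not say how its class activation map is computed. A common formulation, assumed here, takes the final convolutional feature maps and sums them with the weights that connect each map to the class of interest. The sketch below shows that weighted sum on tiny hand-written maps; the function names and the numbers are hypothetical.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of K feature maps (each H x W) for one class.

    cam[y][x] = sum over k of class_weights[k] * feature_maps[k][y][x]
    """
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]
    return cam


def peak_location(cam):
    """Return the (row, column) of the strongest response in the map."""
    cells = ((y, x) for y in range(len(cam)) for x in range(len(cam[0])))
    return max(cells, key=lambda yx: cam[yx[0]][yx[1]])


if __name__ == "__main__":
    # Two hypothetical 2x2 feature maps and the weights for one medium class.
    maps = [
        [[1.0, 0.0], [0.0, 0.0]],
        [[0.0, 0.0], [0.0, 2.0]],
    ]
    cam = class_activation_map(maps, [0.5, 1.0])
    print(cam, peak_location(cam))
```

Upsampling the resulting map to the input resolution gives the region that contributed most to the predicted class.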