Applying Computer Vision Systems to Historical Book Illustrations: Challenges and First Results

Deep learning for historical books: classification of printing technology for digitized images

Multimedia Tools and Applications, 2021

Printing technology has evolved over the past centuries with technological progress. Within the Digital Humanities, images are playing an increasingly prominent role in research. In the mass analysis of digitized historical images, bias can be introduced in various ways; one of them is the printing technology originally used. Classifying images by their printing technology, e.g. woodcut, copper engraving, or lithography, requires highly skilled experts. We have developed a deep learning classification system that achieves very good results. This paper explains the challenges that digitized collections pose for this task. To overcome them and achieve good performance, shallow networks had to be combined with appropriate sampling strategies. We also show how class activation maps (CAM) can be used to analyze the results.
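
The CAM technique the abstract mentions is simple enough to sketch: with a global-average-pooling classifier, the activation map for a class is the channel-wise weighted sum of the last convolutional feature maps. A minimal NumPy sketch follows (the feature maps and weights here are toy stand-ins, not the paper's model):

```python
import numpy as np

def class_activation_map(feature_maps, class_weights):
    """Compute a class activation map (CAM).

    feature_maps : (C, H, W) activations from the last conv layer
    class_weights: (C,) classifier weights for the target class
    (in a GAP-based network, the class score is the dot product of
    the channel-averaged features with these weights)
    """
    cam = np.tensordot(class_weights, feature_maps, axes=([0], [0]))  # (H, W)
    cam -= cam.min()
    if cam.max() > 0:
        cam /= cam.max()  # normalise to [0, 1] for visualisation
    return cam

# Toy example: 4 channels of 8x8 activations, weights selecting channel 2
rng = np.random.default_rng(0)
fmaps = rng.random((4, 8, 8))
weights = np.array([0.0, 0.0, 1.0, 0.0])
cam = class_activation_map(fmaps, weights)
print(cam.shape)  # (8, 8)
```

Overlaying such a normalized map on the input image shows which regions drove the prediction, which is how the authors use CAMs for error analysis.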

Deep Learning Approaches to Classification of Production Technology for 19th Century Books

2018

Cultural research is dedicated to understanding the processes of knowledge dissemination and the social and technological practices of the book industry. Research on children's books in the 19th century can be supported by computer systems. Specifically, advances in digital image processing seem to offer great opportunities for analyzing and quantifying the visual components of these books. The production technology for book illustrations in the 19th century was characterized by a shift from wood or copper engraving to lithography. We report experiments which aim to classify images by their production technology. For a classification task that is difficult even for humans, the classification quality reaches only around 70%. We analyze further error sources and identify reasons for the low performance.

Benchmarking Deep Learning Models for Classification of Book Covers

SN Computer Science

Book covers usually provide a good depiction of a book's content and its central idea. Classifying books into their respective genres usually involves subjectivity and contextuality. Book retrieval systems would greatly benefit from an automated framework able to classify a book's genre based on an image, especially for archival documents, where digitizing a complete book for the purpose of indexing is expensive. While various modalities are available (e.g., cover, title, author, abstract), benchmarking image-based classification systems that use minimal information is a particularly exciting field due to recent advancements in image-based deep learning and its applicability. A natural question therefore arises regarding the plausibility of solving the book classification problem by utilizing only an image of the cover along with current state-of-the-art deep learning models. To answer this question, this paper makes a threefold contribution. First, the publicly available book cover dataset comprising 57k book covers belonging to 30 different categories is thoroughly analyzed and corrected. Second, it benchmarks the performance of a battery of state-of-the-art image classification models on the task of book cover classification. Third, it uses explicit attention mechanisms to identify the regions the network focused on in order to make its prediction. All of our evaluations were performed on a subset of the mentioned public book cover dataset. Analysis of the results revealed the inefficacy of even the most powerful models for this classification task. With the obtained results, it is evident that significant efforts still need to be devoted to solving this image-based classification task to a satisfactory level.

Extracting and Analyzing Deep Learning Features for Discriminating Historical Art

PEARC '20: Practice and Experience in Advanced Research Computing, 2020

Art historians are interested in possible methods and visual criteria for determining the style and authorship of artworks. One approach, developed by Giovanni Morelli in the late nineteenth century, focused on abstracting, extracting and comparing details of recognizable human forms, although he never prescribed what exactly to look for. In this work, we asked what a contemporary method like convolutional networks could contribute to, or reveal about, such a humanistic method that is not fully determined, yet so clearly aligned with computation. Convolutional networks have become very successful in object recognition because they learn general features to distinguish and classify large sets of objects. We therefore wanted to explore what features present in these networks have some discriminatory power for distinguishing paintings. We input the digitized art into a large-scale convolutional network pre-trained for object recognition on naturalistic images. Because we do not have labels, we extracted activations from the network and ran K-means clustering. We contrasted and evaluated the discriminatory power of shallow and deeper layers. We also compared predetermined features from standard computer vision techniques of edge detection. It turns out that the network's individual feature maps are highly generic and do not easily map onto obvious authorship interpretations, but in the aggregate they can have strong discriminating power that is intuitive. Although this does not directly test issues of attribution, the application can inform humanistic perspectives on what counts as the features that make up the visual elements of paintings.

Illumination Detection in IIIF Medieval Manuscripts Using Deep Learning

Digital Medievalist (DM) Open Issue

Illuminated manuscripts are essential iconographic sources for medieval studies. With the massive adoption of IIIF, old and new digital collections of manuscripts are accessible online and provide interoperable image data. However, finding illuminations within the manuscripts’ pages is increasingly time-consuming. This article proposes an approach based on machine learning and transfer learning that browses IIIF manuscript pages and detects the illuminated ones. To evaluate our approach, a group of domain experts created a new dataset of manually annotated IIIF manuscripts. Preliminary results show that our algorithm detects the main illuminated pages in a manuscript, thus reducing experts’ search time.
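
Browsing IIIF pages programmatically relies on the IIIF Image API's fixed URL syntax, {base}/{region}/{size}/{rotation}/{quality}.{format}. A small sketch that builds a request for a downscaled page image suitable for feeding to a classifier (the base URL is a hypothetical example, not one from the article):

```python
def iiif_image_url(base, region="full", size="!512,512",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an image request following the IIIF Image API URL pattern:
    {base}/{region}/{size}/{rotation}/{quality}.{format}
    The size "!512,512" asks the server for a copy fitting inside
    512x512 while preserving aspect ratio -- handy before inference."""
    return f"{base}/{region}/{size}/{rotation}/{quality}.{fmt}"

# Hypothetical manuscript page served over IIIF
base = "https://example.org/iiif/ms-1234/page-007"
print(iiif_image_url(base))
# https://example.org/iiif/ms-1234/page-007/full/!512,512/0/default.jpg
```

Because every IIIF server honours this pattern, a detector can iterate over a manifest's canvases and fetch uniformly sized page images without collection-specific code.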

Recognizing Characters in Art History Using Deep Learning

Proceedings of the 1st Workshop on Structuring and Understanding of Multimedia heritAge Contents - SUMAC '19, 2019

Figure 1: Art historical scene depicting the iconography called Annunciation of the Lord (left [10], right [32]). Mary and Gabriel are the main protagonists. We can clearly see the differences in the background, in the artistic style, in the foreground, in the objects, their properties, and the use of color.

Michela Vignoli, Doris Gruber, Rainer Simon, Axel Weißenfeld: Impact of AI: Game-changer for Image Classification in Historical Research?

Melanie Althage, Martin Dröge, Torsten Hiltmann, Claudia Prinz (eds.), Digitale Methoden in der geschichtswissenschaftlichen Praxis: Fachliche Transformationen und ihre epistemologischen Konsequenzen: Konferenzbeiträge der Digital History 2023, 2023

AI opens new possibilities for processing and analysing large, heterogeneous historical data corpora in a semi-automated way. The Ottoman Nature in Travelogues (ONiT) project develops an interdisciplinary methodological framework for an AI-driven analysis of text-image relations in digitised printed material. In this paper, we discuss our results from the first project year, in which we explored the potential of multi-modal deep learning approaches for the combined analysis of text and image similarity of "nature" representations in historical prints. Our experiments with OpenCLIP for zero-shot classification of prints from the ICONCLASS AI Test Set show the potential, but also the limitations, of using pre-trained contrastive-learning models on historical contents. Based on the results and our learnings, we discuss how computational, quantitative methods affect an underlying epistemology stemming from more traditional "analogue" methods. Our experiences confirm that interdisciplinary collaboration between historians and AI developers is important in order to adapt disciplinary conventions and heuristics for use in applied AI methods. Our main learnings are the necessity of differentiating between distinct visual features in historical images and representations of "nature" that require interpretation, and of developing an understanding of the features an AI algorithm can be retrained to detect.
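
Zero-shot classification with a CLIP-style model reduces, at scoring time, to cosine similarity between one image embedding and one text embedding per candidate label, followed by a softmax. A NumPy sketch on stand-in embeddings (OpenCLIP itself is not loaded here, and the label prompts are illustrative, not the project's ICONCLASS labels):

```python
import numpy as np

def zero_shot_scores(image_emb, text_embs, temperature=0.01):
    """Zero-shot scoring step used by CLIP-style models: cosine
    similarity between one image embedding and one text embedding
    per candidate label, softmax over the similarities."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ img             # one cosine similarity per label
    logits = sims / temperature  # low temperature sharpens the softmax
    p = np.exp(logits - logits.max())
    return p / p.sum()

# Stand-in embeddings: the image is closest to the second label prompt
labels = ["a print of a landscape", "a print of an animal", "a portrait"]
text_embs = np.eye(3)
image_emb = np.array([0.1, 0.9, 0.2])
probs = zero_shot_scores(image_emb, text_embs)
print(labels[int(probs.argmax())])  # a print of an animal
```

No retraining is involved: swapping in a different label set only changes the text embeddings, which is what makes the approach attractive for exploratory work on historical prints, and also why prompt wording matters so much.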

Digital Comics Image Indexing Based on Deep Learning

Journal of Imaging

The digital comic book market is growing every year, mixing digitized and born-digital comics. Digitized comics suffer from limited automatic content understanding, which restricts online content search and reading applications. This study shows how to combine state-of-the-art image analysis methods to encode and index images into an XML-like text file. The content description file can then be used to automatically split comic book images into sub-images corresponding to panels, which are easily indexable with relevant information about their respective content. This enables advanced search for keywords spoken by specific comic characters, as well as action and scene retrieval using natural language processing. We address panel, balloon, text, comic character, and face detection using both traditional approaches and breakthrough deep learning models, as well as text recognition using an LSTM model. Evaluations on a dataset composed of online library content are presented, and a new public dataset is also proposed.
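
The content description file the abstract describes could, for instance, be produced with a standard XML library; the element and attribute names below are illustrative stand-ins, not the paper's actual schema:

```python
import xml.etree.ElementTree as ET

def describe_page(page_id, panels):
    """Encode detection results as an XML content-description file,
    in the spirit of the indexing scheme the paper describes.
    `panels` is a list of dicts holding a bounding box and the text
    recognised inside the panel (all names here are illustrative)."""
    page = ET.Element("page", id=page_id)
    for i, p in enumerate(panels):
        panel = ET.SubElement(page, "panel", id=f"{page_id}-p{i}",
                              bbox=",".join(map(str, p["bbox"])))
        balloon = ET.SubElement(panel, "balloon")
        balloon.text = p["text"]
    return ET.tostring(page, encoding="unicode")

description = describe_page("book1-page3", [
    {"bbox": (0, 0, 400, 300), "text": "Look out!"},
    {"bbox": (0, 310, 400, 600), "text": "Too late..."},
])
print(description)
```

Once detections are serialized this way, keyword search and panel-level retrieval become ordinary queries over the description files rather than image-processing problems.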

Dedicated Texture Based Tools for Characterisation of Old Books

Second International Conference on Document Image Analysis for Libraries (DIAL'06), 2006

This paper lies in the field of the valorization of ancient patrimonial books: it relates to the development of suitable assistance tools that help humanists and historians retrieve information in large corpora of digitized documents. The paper presents one part of this ambitious project: a pixel classification method for ancient printed documents. The approach relies on the construction and analysis of multiresolution maps. For 5 resolutions we construct 5 different characterisation maps, all based on texture information (correlation of pixel orientations, grey-level pixel density…). After merging these 25 maps, each pixel of the original image is described by a vector, which allows a hierarchical classification to be computed. In order to avoid issues linked to the binarization process, all our maps are computed on grey-level images. The system has been tested on a CESR database of ancient printed books of the Renaissance. The classification results are evaluated through several visual classification illustrations.
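
The multiresolution idea can be sketched as follows: block-average the grey-level image at several scales, upsample each map back to full size, and stack the values so every pixel gets one feature per resolution. This simplified NumPy stand-in uses 3 resolutions and plain grey-level density rather than the paper's 25 texture maps:

```python
import numpy as np

def multires_features(img, factors=(1, 2, 4)):
    """Build a per-pixel feature vector from grey-level density maps
    at several resolutions (a simplified stand-in for the paper's
    characterisation maps). For each factor the image is block-averaged
    and upsampled back, so every pixel gets one value per resolution."""
    h, w = img.shape
    maps = []
    for f in factors:
        # block-average over f x f cells (image size assumed divisible by f)
        small = img.reshape(h // f, f, w // f, f).mean(axis=(1, 3))
        maps.append(np.repeat(np.repeat(small, f, axis=0), f, axis=1))
    return np.stack(maps, axis=-1)  # (H, W, n_resolutions)

# Toy 8x8 "page": dark ink block on light paper
img = np.full((8, 8), 0.9)
img[2:6, 2:6] = 0.1
feats = multires_features(img)
print(feats.shape)  # (8, 8, 3)
```

The per-pixel vectors produced this way are what a hierarchical clustering would consume; working directly on grey levels, as the paper stresses, sidesteps binarization artifacts entirely.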