P. Bolettieri - Academia.edu
Papers by P. Bolettieri
The Artificial Intelligence for Media and Humanities laboratory (AIMH) has the mission to investigate and advance the state of the art in the Artificial Intelligence field, specifically addressing applications to digital media and digital humanities, and also taking into account issues related to scalability. This report summarizes the 2020 activities of the research group.
Lecture Notes in Computer Science, 2006
The digital library field has recently been broadening its scope of applicability, and it is continuously adapting to the frequent changes occurring in the internet society. Accordingly, digital libraries are gradually moving from a controlled environment accessible only to professionals and domain experts to environments accessible to casual users who want to exploit the potential offered by digital library technology. These new trends require, for instance, new search paradigms to be offered, new media content to be managed, and new description extraction techniques to be used. Building digital library applications, and effectively adapting them to newly emerging trends, requires a platform that offers standard and powerful building blocks to support application developers. In this paper we discuss our experience of using MILOS, a multimedia content management system oriented to the construction of digital libraries, to build a demanding application dedicated to non-professional users. Specifically, we discuss the design and implementation of an on-line photo album (PhotoBook), a digital library application that allows people to manage their own photos, to share them with friends, and to make them publicly available and searchable. PhotoBook uses a complex internal metadata schema (MPEG-7) and allows users to express complex queries simply (combining similarity search and fielded search), enabling them to retrieve material of interest even if metadata are imprecise or missing.
Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, 2016
Surrogate Text Representation (STR) is a profitable solution for efficient similarity search in metric spaces using conventional text search engines, such as Apache Lucene. This technique is based on comparing the permutations of some reference objects in place of the original metric distance. However, the Achilles heel of the STR approach is the need to reorder the result set of the search according to the metric distance. This forces the use of a support database to store the original objects, which requires efficient random I/O on fast secondary memory (such as flash-based storage). In this paper, we propose to extend the Surrogate Text Representation to specifically address a class of visual metric objects known as Vector of Locally Aggregated Descriptors (VLAD). This approach is based on representing the individual sub-vectors forming the VLAD vector with the STR, providing a finer representation of the vector and enabling us to get rid of the reordering phase. The experiments on a publicly available dataset show that the extended STR outperforms the baseline STR, achieving satisfactory performance close to that obtained with the original VLAD vectors.
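As an illustration of the permutation idea described in this abstract, the following is a minimal sketch and not the authors' implementation. It assumes one commonly described encoding in which the k nearest reference objects become text terms and a nearer reference object is repeated more often, so that term frequency carries the rank. The function names, the `R<i>` term naming, the Euclidean distance, and the toy 2-D points are all hypothetical choices made for the example.

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def permutation(obj, refs, dist=euclidean):
    """Indices of the reference objects, ordered from nearest to farthest.

    Ties are broken by index so the ordering is deterministic.
    """
    return sorted(range(len(refs)), key=lambda i: (dist(obj, refs[i]), i))

def surrogate_text(obj, refs, k, dist=euclidean):
    """Encode obj as a string of terms that a text engine could index.

    Only the k nearest reference objects are kept. The nearest one is
    repeated k times, the next k - 1 times, and so on (assumed encoding).
    """
    nearest = permutation(obj, refs, dist)[:k]
    words = []
    for rank, ref_id in enumerate(nearest):
        words.extend([f"R{ref_id}"] * (k - rank))
    return " ".join(words)

# Toy data: three hypothetical reference objects in the plane.
refs = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
print(surrogate_text((1.0, 2.0), refs, k=2))  # prints "R0 R0 R2"
```

Two objects that are close in the metric space tend to produce similar permutations and therefore similar surrogate texts, which is what lets a text search engine act as an approximate similarity index.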
Lecture Notes in Computer Science, 2015
Lecture Notes in Computer Science, 2007
The objective of this paper is to demonstrate the reuse of digital content, such as video documents or PowerPoint presentations, by exploiting existing technologies for automatic extraction of metadata (OCR, speech recognition, cut detection, MPEG-7 visual descriptors, etc.). The multimedia documents and the extracted metadata are then indexed and managed by the Multimedia Content Management System (MCMS) MILOS, specifically developed to support the design and effective implementation of digital library applications. As a result, the indexed digital material can be retrieved by means of content-based retrieval on the extracted text and on the MPEG-7 visual descriptors (via similarity search), so that users of the e-Learning Library (students or teachers) can retrieve items by more than the basic bibliographic metadata (title, author, etc.).
Proceedings of the 6th International Conference on Mobile Technology, Application and Systems, Mobility '09, 2009
In this paper we present a prototype for parental control that detects images with adult content received on a mobile device. More specifically, the application that we developed is able to intercept images received through various communication channels (Bluetooth, MMS) on mobile devices based on the Symbian™ operating system. Once intercepted, the images are analysed by the component of the system that automatically classifies images with explicit sexual content. At the current stage, the application that intercepts images runs on the mobile device, while the classifier runs on a remote server.
Communications in Computer and Information Science, 2013
In this paper we present the architecture of a Digital Library for enabling the reuse of audiovisual documents in an e-Learning context. The reuse of Learning Objects is based on automatically extracted descriptors carrying a semantic meaning for the professional who uses these Learning Objects to prepare new interactive multimedia lectures. The presented system is based on MILOS, a general-purpose Multimedia Content Management System created to support the design and effective implementation of digital library applications. MILOS supports the storage and content-based retrieval of any multimedia document whose description is provided by using arbitrary metadata models represented in XML. The objective is to demonstrate the reuse of digital content, such as video documents or PowerPoint presentations, by exploiting existing technologies for automatic extraction of metadata (OCR, speech recognition, cut detection, MPEG-7 visual descriptors, etc.). The search interface assists the user of the system in retrieving the multimedia objects in the collection by combining full-text retrieval on the extracted text and metadata with similarity search on the MPEG-7 visual descriptors.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2013
Vector of locally aggregated descriptors (VLAD) is a promising approach for addressing the problem of image search on a very large scale. This representation was proposed to overcome the quantization error problem faced in the Bag-of-Words (BoW) representation. However, text search engines have not yet been used for indexing VLAD, given that it is not a sparse vector of occurrence counts. For this reason, the BoW approach is still the most widely adopted method for finding images that represent the same object or location, given an image as a query and a large set of images as the dataset. In this paper, we propose to enable inverted files of standard text search engines to exploit the VLAD representation to deal with large-scale image search scenarios. We show that the use of inverted files with VLAD significantly outperforms BoW in terms of efficiency and effectiveness on the same hardware and software infrastructure.
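To make concrete why a VLAD vector is not a sparse vector of occurrence counts, here is a minimal sketch of the standard VLAD aggregation as it is usually defined (sum of residuals to the nearest codebook centroid, followed by L2 normalization). It is not taken from the paper above. The function name `vlad`, the two-centroid codebook, and the toy 2-D descriptors are hypothetical.

```python
import math

def vlad(descriptors, centroids):
    """Aggregate local descriptors into one dense vector.

    Each descriptor is assigned to its nearest centroid, and the residual
    (descriptor minus centroid) is accumulated per centroid. The per-centroid
    sums are concatenated and L2-normalized.
    """
    k, d = len(centroids), len(centroids[0])
    acc = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Index of the nearest centroid by squared Euclidean distance.
        nearest = min(
            range(k),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centroids[i])),
        )
        for j in range(d):
            acc[nearest][j] += x[j] - centroids[nearest][j]
    flat = [v for row in acc for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat

# Toy codebook with two centroids and three local descriptors.
centroids = [(0.0, 0.0), (10.0, 10.0)]
descriptors = [(1.0, 0.0), (0.0, 1.0), (9.0, 10.0)]
v = vlad(descriptors, centroids)
print(len(v))  # prints 4, i.e. k * d components
```

The components are real-valued and can be negative, whereas a BoW vector holds non-negative counts that map directly onto term frequencies in an inverted file. That mismatch is the obstacle the abstract refers to.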
Enabling effective and efficient Content-Based Image Retrieval (CBIR) on Very Large Digital Libraries (VLDLs) is today an important research issue. While there exist well-known approaches for information retrieval on textual content for VLDLs, the search for an effective CBIR method that is also able to scale to very large collections is still open. A practical effect of this situation is that most of the image retrieval services currently available for VLDLs are based only on textual metadata. In this paper, we report on our experience in creating a collection of 106 million images, i.e., the CoPhIR collection, the largest currently available to the scientific community for research purposes. We discuss the various issues arising from working with such a large collection and dealing with a complex retrieval model on information-rich features. We present the non-trivial process of image crawling and descriptive feature extraction, using the European EGEE computer GRID. The feature extraction phase is often ignored when discussing the scalability issue while, as we show in this work, it could be one of the toughest issues to be solved in order to make CBIR feasible on VLDLs.
The scalability, as well as the effectiveness, of the different Content-based Image Retrieval (CBIR) approaches proposed in the literature is today an important research issue. Given the wealth of images on the Web, CBIR systems must in fact leap towards Web-scale datasets. In this paper, we report on our experience in building a test collection of 100 million images, with the corresponding descriptive features, to be used in experimenting with new scalable techniques for similarity searching and comparing their results. In the context of the SAPIR (Search on Audiovisual content using Peer-to-peer Information Retrieval) European project, we had to test our distributed similarity searching technology on a realistic data set. Therefore, since no large-scale collection was available for research purposes, we had to tackle the non-trivial process of image crawling and descriptive feature extraction (we used five MPEG-7 features) using the European EGEE computer GRID. The result of this effort is CoPhIR, the first CBIR test collection of such scale. CoPhIR is now open to the research community for experiments and comparisons, and access to the collection has already been granted to more than 50 research groups worldwide.
This report describes the MILOS Multimedia Content Management System: a general-purpose software component tailored to support the design and effective implementation of any digital library application. MILOS supports the storage and content-based retrieval of any multimedia document whose description is provided by using arbitrary metadata models represented in XML. MILOS is flexible in the management of documents containing different types of data and content descriptions; it is efficient and ...
Vector of locally aggregated descriptors (VLAD) is a promising approach for addressing the problem of image search on a very large scale. This representation was proposed to overcome the quantization error problem faced in the Bag-of-Words (BoW) representation. In this paper, we propose to enable inverted files of standard text search engines to exploit the VLAD representation to deal with large-scale image search scenarios. We show that the use of inverted files with VLAD significantly outperforms BoW in terms of efficiency and effectiveness on the same hardware and software infrastructure.
In this paper we present the web user interface of a scalable and distributed system for image retrieval based on visual features and annotated text, developed in the context of the SAPIR project. Its architecture makes use of Peer-to-Peer networks to achieve scalability and efficiency, allowing the management of huge amounts of data and simultaneous access by a large number of users. By describing the SAPIR web user interface, we want to encourage end users to use SAPIR to search by content similarity, together with the usual text search, on a large image collection (100 million images crawled from Flickr) with realistic response times. On the basis of the statistics collected, it will be possible, for the first time, to study user behavior (e.g., the way users combine text and image content search) in this new realistic environment.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2010
Content-based image retrieval is becoming a popular way of searching digital libraries as the amount of available multimedia data increases. However, the cost of developing from scratch a robust and reliable system with content-based image retrieval facilities for large databases is quite prohibitive. In this paper, we propose to exploit an approach to perform approximate similarity search in metric spaces developed by [3, 6]. The idea at the basis of these techniques is that when two objects are very close to each other they 'see' the ...
Proceedings of the 1st ACM International Conference on Multimedia Retrieval, ICMR'11, 2011
We present the VIsual Support to Interactive TOurism in Tuscany (VISITO Tuscany) project, which offers an interactive guide, accessible via smartphones, for tourists visiting cities of art. The peculiarity of the system is that user interaction is mainly obtained through images: to receive information on a particular monument, users just have to take a picture of it. VISITO Tuscany, using techniques of image analysis and content recognition, automatically recognizes the photographed monuments and displays pertinent information to the user. In this paper we illustrate how the use of landmark recognition from mobile devices can provide the tourist with relevant and customized information about various types of objects in cities of art.
Proceedings - International Workshop on Content-Based Multimedia Indexing, 2011
In this paper we propose a novel approach that allows processing content-based image queries expressed as arbitrary combinations of local and global visual features, by using a single index realized as an inverted file. The index was implemented on top of the Lucene retrieval engine. This is particularly useful to allow people to efficiently and interactively check the quality of the retrieval result by exploiting combinations of features ...