ROSSIO Infrastructure: a digital research tool for Social Sciences, Arts and Humanities

Abstract. ROSSIO Infrastructure is building a free, open-access platform that aims to aggregate, organise and connect digital resources related to Social Sciences, Arts and Humanities located in Portuguese educational and cultural institutions. This paper presents ROSSIO Infrastructure, the institutions involved, its main goals and the services it will provide, such as a discovery portal, exhibitions, collections and a virtual research environment. Underlying these services is a metadata aggregation approach that brings into ROSSIO the metadata on digital objects from the providing institutions. The aggregated dataset is transformed into linked data and enriched with entities from controlled vocabularies, which are defined by ROSSIO. We will detail this process, including the applications employed and how they interoperate. Finally, we will reflect on the potential of these services for the public dissemination of science, taking into account the FAIR principles.

Corresponding author: [email protected]

1. Introduction

In 2016, the PARTHENOS Project defined Research Infrastructures as “complex agglomerations of knowledge, data, people, and services that bring together diverse resources for a wide user base and make these resources (re)usable and available for an appropriately long term in order to support research (either individual or collaborative) and share the results of that research”. Although this definition synthesizes some of the general features normally attributed to research infrastructures, the scope and aims of the concept are still evolving, with some organizations valuing certain aspects more than others. For instance, the European Strategy Forum on Research Infrastructures (ESFRI) seems to place more emphasis on data sharing and data preservation during a research infrastructure's lifecycle, as demonstrated by the European Roadmap for Research Infrastructures (2006, 2018) [1-2]. On the other hand, the American Council of Learned Societies Commission on Cyberinfrastructure for the Humanities and Social Sciences highlights the contribution of infrastructures to promoting collaborative research and building networks and communities, as revealed in the report “Our Cultural Commonwealth” (2006) [3]. Nevertheless, the reports from these institutions point out that infrastructures, as well as the contents and services they provide, can vary greatly, depending, among other factors, on the scientific area they serve. According to the aforementioned institutions, we can distinguish three types of infrastructures: I) “single-sited”, with facilities concentrated in the same geographical location; II) “distributed”, with facilities spread across different geographical locations; III) “digital”, with a strong and large technological component.

Since 2006, the European Strategy Forum on Research Infrastructures and its ESFRI Roadmap (e.g. 2016, 2018) have been promoting a strategy to develop and consolidate national and transnational research infrastructures in European Union countries and to foster collaboration between these institutions, which has, in turn, influenced national scientific policies. Aiming to integrate Portuguese institutions within this context, the Foundation for Science and Technology (FCT) created the National Roadmap for Research Infrastructures of Strategic Interest (RNIE 2014-2020) in 2013, with the objective of mapping and evaluating the Portuguese research infrastructures. Initially, it consisted of 40 infrastructures, including ROSSIO Infrastructure: Social Sciences, Arts and Humanities. In 2020, the number expanded to 56 infrastructures, of which only seven belong to Social Sciences, Arts and Humanities (SSAH) [4]. They collaborate with European counterparts, helping to create and develop international networks. For example, Social Sciences DataLab is the Portuguese node of the SHARE ERIC project (Survey of Health, Ageing and Retirement in Europe), and ROSSIO has the same role in the Digital Research Infrastructure for the Arts and Humanities (DARIAH).

ROSSIO Infrastructure is a research infrastructure coordinated by the Nova School of Social Sciences and Humanities (NOVA University Lisbon). It integrates six Portuguese cultural institutions: Arquivo Municipal de Lisboa (Lisbon Municipal Archive), Cinemateca Portuguesa (Portuguese Film Archive), Biblioteca de Arte da Fundação Calouste Gulbenkian (Calouste Gulbenkian Art Library), Teatro Nacional D. Maria II (National Theater D. Maria II), Direção Geral do Património Cultural (Directorate-General for Cultural Heritage) and Direção Geral do Livro, dos Arquivos e das Bibliotecas (Directorate-General for Books, Archives and Libraries). ROSSIO also includes content providers, such as ARQUIVO.pt (Portuguese web archive) and the Diplomatic Institute at the Portuguese Ministry of Foreign Affairs. It is inspired by the best principles applied by other international research infrastructures related to SSAH, such as Digital Public Library of America (DPLA), Historiana, and Trove.

ROSSIO Infrastructure has five major objectives: I) to aggregate, organise, connect, contextualize and provide free and open access to digital resources related to SSAH located in the aforementioned Portuguese educational and cultural institutions (providing, in some cases, the necessary funds for archival treatment and digitization of sources); II) to promote the development of high-quality research on SSAH, stimulating new agendas and debates; III) to generate synergies and connect individuals and institutions in order to promote scientific innovation and cultural heritage dissemination; IV) to contribute to the internationalization of SSAH studies, allowing researchers from all over the world to have more transparent access to content in the Portuguese language, following the best international practices of other research infrastructures and the FAIR data principles (findability, accessibility, interoperability and reusability); V) to build a sustainable network between academic and non-academic communities to better respond to societal challenges.

This paper aims to present ROSSIO and reflect on how its services will contribute to change, promote and develop quality research, collaborative work and dissemination of knowledge. It is divided into two parts. The first will present the metadata aggregation approach and the applications that will be employed by ROSSIO, its potential users, and how the applications work together to support the work done by them. We will then focus on services provided by the platform, such as a discovery portal, exhibitions, collections and a virtual research environment (VRE).

2. Development and implementation process

Digital resources of interest for research in SSAH are dispersed over a large number of academic and cultural heritage institutions, which brings challenges to the discoverability and usage of such resources. An often-used approach, and the one applied by ROSSIO, is metadata aggregation, where a central organisation takes the role of facilitating the discovery and use of the resources by collecting their associated metadata. Based on these aggregated datasets of metadata, ROSSIO is in a position to further promote the usage of the digital resources by means that cannot be efficiently undertaken by each providing institution in isolation.

The technological approach to metadata aggregation applied by ROSSIO is based on the OAI-PMH protocol. This protocol was designed in 1999 [5] and was meant to address shortcomings in scholarly communication by providing a technical interoperability solution for discovery of e-prints, via metadata aggregation. The cultural heritage domain also embraced OAI-PMH, since discovery of cultural heritage digital resources was only feasible if based on metadata instead of full-text [6]. OAI-PMH is nowadays widely deployed in academic and cultural heritage institutions to support cooperative networks such as Europeana and the Digital Public Library of America.
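To make the harvesting step more concrete, the following is a minimal sketch of how an aggregator might issue an OAI-PMH `ListRecords` request and read the response. It is not ROSSIO's implementation: the base URL, the helper names (`list_records_url`, `parse_list_records`) and the sample response are assumptions introduced for illustration. Only the protocol elements (the `ListRecords` verb, `metadataPrefix`, `set`, `resumptionToken`, and the `oai_dc` format) come from the OAI-PMH specification.

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH repository."""
    if resumption_token:
        # resumptionToken is an exclusive argument: only the verb accompanies it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(f"{{{OAI_NS}}}record"):
        identifier = record.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        titles = [el.text for el in record.iter(f"{{{DC_NS}}}title")]
        records.append({"identifier": identifier, "titles": titles})
    token = root.findtext(f"{{{OAI_NS}}}ListRecords/{{{OAI_NS}}}resumptionToken")
    # An empty or absent token means the list is complete.
    return records, ((token or "").strip() or None)


# Hypothetical response from a data provider, shortened to one record.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE_RESPONSE)
next_url = list_records_url("https://example.org/oai", resumption_token=token)
```

In a real harvest, the aggregator would repeat the request with each returned token until the provider sends none, which is how OAI-PMH pages through large collections.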

2.1 Applications architecture

The metadata aggregated by ROSSIO is processed centrally by several systems in order to provide access and search functionalities on the metadata, which is then used by the VRE and by the digital exhibitions and collections applications. ROSSIO’s systems also publish, according to the FAIR principles, these aggregated datasets and other datasets created by the researchers while using the infrastructure. Figure 1 presents the applications that form the ROSSIO Infrastructure, how they are related, which users they interact with, and which applications interoperate with external systems.

This applications architecture considers three general types of users (actors):

The architecture comprises the following applications:

Fig. 1. The application architecture of ROSSIO Infrastructure.

During the initial operation of ROSSIO Infrastructure, the metadata harvested from data providers will follow a simple data model based on the 15 elements of the Dublin Core Metadata Element Set. Nevertheless, ROSSIO’s applications are being implemented to support a richer data model, which consists of a profile of the Europeana Data Model (EDM). This EDM application profile was defined in 2017 by a working group formed by representatives from Portuguese academic and cultural heritage institutions, and was named the EDM-DRD application profile. This data model allows ROSSIO to represent the administrative metadata required for its operation, and also the enriched metadata created during the ingestion process. In the future, we expect that EDM-DRD will be implemented by data providers, allowing ROSSIO to operate with high-quality metadata that will benefit its services for researchers.
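As an illustration of the simple data model and of enrichment during ingestion, the sketch below restricts a harvested record to the 15 Dublin Core elements and then links subject strings to concept URIs. It is a simplified assumption of how such a step could look, not the EDM-DRD profile itself: the `VOCABULARY` index, its `example.org` URIs, the `subject_concepts` field and both function names are hypothetical.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})

# Hypothetical label-to-URI index standing in for a controlled vocabulary.
VOCABULARY = {
    "fotografia": "https://example.org/vocab/concept/photography",
    "photography": "https://example.org/vocab/concept/photography",
}


def normalise_record(raw):
    """Keep only Dublin Core elements; store each value as a list of trimmed strings."""
    record = {}
    for element, values in raw.items():
        if element not in DC_ELEMENTS:
            continue  # drop anything outside the simple data model
        if isinstance(values, str):
            values = [values]
        cleaned = [v.strip() for v in values if v and v.strip()]
        if cleaned:
            record[element] = cleaned
    return record


def enrich_subjects(record, vocabulary):
    """Return a copy of the record with concept URIs matched from subject strings."""
    enriched = dict(record)
    matches = []
    for subject in record.get("subject", []):
        uri = vocabulary.get(subject.casefold())
        if uri and uri not in matches:
            matches.append(uri)
    # Illustrative field name; the original strings are kept untouched.
    enriched["subject_concepts"] = matches
    return enriched


harvested = {
    "title": " Praça do Rossio ",
    "subject": ["Fotografia", "Photography", ""],
    "shelfmark": "X1",  # not a Dublin Core element, so it is dropped
}
record = enrich_subjects(normalise_record(harvested), VOCABULARY)
```

Keeping the provider's original subject strings next to the matched URIs lets the enrichment be revised later without a new harvest.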

2.2 The development of controlled vocabularies

As mentioned above, metadata normalization and enrichment are supported by controlled vocabularies that are published as linked open data by the ROSSIO Infrastructure. At this time, the following vocabularies are being developed:

The ROSSIO vocabularies are being modelled in SKOS [8], a W3C recommendation for thesauri and other knowledge organization systems in the Semantic Web. In addition to SKOS, the vocabularies reuse elements from other widely used ontologies:

The development of the ROSSIO vocabularies leverages existing structured and unstructured vocabulary resources, including lists of index terms provided by members of the ROSSIO consortium and sections of established thesauri in SSAH, such as the Getty’s Art and Architecture Thesaurus. As a minimum requirement, the concepts included in the ROSSIO vocabularies are identified by Portuguese and English labels, whose form generally follows the conventions of thesauri for information retrieval [9].

As linked data resources, it is fundamental for the ROSSIO vocabularies to include links to external resources identified through URIs. This is achieved by declaring mapping properties between concepts in the ROSSIO vocabularies and external knowledge organization systems. Concepts in ROSSIO Thesaurus and ROSSIO Periods are being mapped to Getty’s Art and Architecture Thesaurus, either manually or semi-automatically through alignment tools for linked data resources. The ROSSIO Thesaurus is also aligned with the Backbone Thesaurus, a meta-thesaurus for the humanities published by DARIAH-EU. Finally, ROSSIO Agents is aligned with VIAF (Virtual International Authority File), while ROSSIO Places is aligned both with GeoNames and Getty Thesaurus of Geographic Names.
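The modelling described above can be sketched as a SKOS concept with Portuguese and English preferred labels and one mapping to an external vocabulary, serialized as Turtle. This is only an illustration under stated assumptions: the concept URI under `example.org`, the labels and the function names are hypothetical, and the target AAT URI is shown as an example that should be verified against the Getty vocabulary before any reuse. The SKOS properties (`skos:prefLabel`, `skos:closeMatch` and the other mapping properties) are taken from the SKOS Reference [8].

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
SKOS_MAPPING_PROPERTIES = {
    "exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch",
}


def turtle_literal(text, lang):
    """Format a language-tagged Turtle string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def concept_to_turtle(uri, labels, mappings=()):
    """Serialize one SKOS concept.

    labels:   dict of language code -> preferred label (pt and en required)
    mappings: iterable of (SKOS mapping property, external concept URI)
    """
    missing = {"pt", "en"} - set(labels)
    if missing:
        raise ValueError(f"missing preferred label(s) for: {sorted(missing)}")
    statements = ["a skos:Concept"]
    for lang, label in sorted(labels.items()):
        statements.append(f"skos:prefLabel {turtle_literal(label, lang)}")
    for prop, target in mappings:
        if prop not in SKOS_MAPPING_PROPERTIES:
            raise ValueError(f"not a SKOS mapping property: {prop}")
        statements.append(f"skos:{prop} <{target}>")
    body = " ;\n    ".join(statements)
    return f"@prefix skos: <{SKOS}> .\n\n<{uri}>\n    {body} .\n"


print(concept_to_turtle(
    "https://example.org/rossio/thesaurus/paintings",  # hypothetical concept URI
    {"pt": "pinturas", "en": "paintings"},
    [("closeMatch", "http://vocab.getty.edu/aat/300033618")],  # illustrative target
))
```

Whether a link is declared as `skos:exactMatch` or `skos:closeMatch` is a judgement made during the alignment; the sketch uses `closeMatch` as the more cautious of the two.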

3. ROSSIO Infrastructure services: discovery portal, exhibitions and digital collections, virtual research environment

The metadata aggregation process and the controlled vocabularies developed are the pillars that will allow ROSSIO Infrastructure to create a platform. The platform will employ different information and communication technologies (ICT) tools, commonly defined as devices, applications and systems that allow different agents – such as individuals and organizations – to interact digitally: a discovery portal, a VRE, and digital exhibitions and collections.

The discovery portal will allow users to search the digital resources (e.g. documents, videos, audio, photos, among others) located in the different heritage institutions, providing simple and advanced search options. In the latter case, the results will be more specific and oriented towards controlled vocabularies, with filters that bring users more quickly to the desired result. As with other similar initiatives, such as DPLA, this is going to be particularly important for the research community [10]. On the one hand, researchers are used to building more advanced search queries and need tools to help them refine the results obtained. On the other hand, the search model will allow them to optimise the time spent on searching and increase their research capacity. Furthermore, it will help them to open new lines of reflection and interpretation on the patterns, trends and links between the aggregated resources. In the case of ROSSIO, the discovery portal will allow access, within the same platform, to digital sources dispersed in different heritage institutions, and to scientific outputs produced at NOVA University. The discovery portal is the core of the ROSSIO platform, since all other products and services are highly dependent on its proper implementation. Therefore, the development of simple and advanced search, based on controlled ontologies and vocabularies, is vital for the interoperability between systems and platforms and for the dissemination of archival collections to SSAH experts and the general public.
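The idea of vocabulary-oriented filters can be illustrated with a very small faceted search over in-memory records. This sketch does not describe the portal's actual search engine; the record fields (`type`, `period`), their values and the function names are assumptions chosen for the example.

```python
from collections import Counter


def filter_records(records, facets):
    """Keep records that carry every requested facet value."""
    return [
        record for record in records
        if all(value in record.get(field, []) for field, value in facets.items())
    ]


def facet_counts(records, field):
    """Count how many records carry each value of a facet field."""
    return Counter(value for record in records for value in record.get(field, []))


# Hypothetical aggregated records with values drawn from controlled vocabularies.
RECORDS = [
    {"title": "A", "type": ["photograph"], "period": ["20th century"]},
    {"title": "B", "type": ["film"], "period": ["20th century"]},
    {"title": "C", "type": ["photograph"], "period": ["19th century"]},
]

selected = filter_records(RECORDS, {"type": "photograph", "period": "20th century"})
remaining_types = facet_counts(RECORDS, "type")
```

Because the facet values come from controlled vocabularies rather than free text, the same filter applies to records from different institutions.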

Other services the platform will provide are exhibitions and digital collections. According to authors such as Martin R. Kalfatovic [11], Chee Khoon Leong [12-13], Maria Teresa Natale [14] and Angeliki Antoniou [15], digital exhibitions are activities that use hypermedia (information presented in the form of text, graphics, audio and video) with the objective of developing a given subject, resorting to a diverse set of digital objects arranged according to a predetermined narrative, potentially accessible to a wider and geographically dispersed audience. The inclusion of a contextualizing narrative distinguishes digital exhibitions from other similar initiatives, such as image galleries. The digital collections, not to be confused with digitized sources, are similar to the aforementioned exhibitions, aside from small differences. Considering the work developed by other infrastructures (e.g. DPLA, Europeana, Trove, CulturaItalia) and cultural institutions (e.g. British Library, Gallica), the digital collections that will be made available by ROSSIO are small-sized exhibitions targeted at specific audiences, such as students, teachers or personnel related to tourism or cultural industries. These digital exhibitions and collections will draw on documentation aggregated and connected within the platform and contribute to promoting its intrinsic and extrinsic value.

The VRE is designed to create a web-based working environment to enhance research and facilitate the sharing of ROSSIO digital resources. Although this feature is open to anyone who accesses ROSSIO, it is being developed with specific communities in mind, such as researchers, teachers, and school and university students. Following principles of technical interoperability (with the use of open-source software and the adoption of data standards such as the OAI-PMH protocol), sustainability, security and ease of use, the VRE is an indispensable tool for intuitive Research Infrastructures. Its collaborative character is particularly relevant, enabling dialogue and cooperation between different interlocutors in the scientific community [16, 17, 18].

In 2020, the pandemic context reinforced the importance of e-infrastructures and platforms, as well as the urgency of making content available for free and open access, following internationally defined methodologies (FAIR principles). It revealed their fundamental and indispensable role in overcoming the constraints caused by lockdowns and the growing infodemic, ensuring the community's access to scientific knowledge [19-20]. Considering the strengthening of these realities in the coming years, the ROSSIO platform will take on an important role in the aggregation, dissemination, curation and study of resources related to SSAH and located in different cultural, educational and diplomatic Portuguese institutions, as well as in their integration into international networks.

The discovery portal will make it possible to search several cultural, educational and diplomatic institutions at the same time, as well as to fine-tune the searches carried out and, thus, increase the number of views of the resources available, highlighting both valuable documents of Portuguese History (e.g. Medieval Royal Chancelleries; First Portuguese videos) and collections and sources related to Global History (e.g. UNESCO Memory of the World Programme). The exhibitions and digital collections will provide a way of presenting these digital resources to a wider audience, helping to promote social inclusion in scientific knowledge and to enhance the digital literacy of society. They will encourage users to search for digital resources in the discovery portal to learn more about the objects displayed, and also to become acquainted with related academic research projects in development. These exhibitions and collections can promote the development of collaborative practices that strengthen community participation in the production of scientific knowledge.

The VRE is an additional tool, characterised as a personal web-based workspace, that will provide all necessary information on the digital objects made available by the platform and a means of fostering a community of practice, enabling the development of collaborative initiatives. Although VREs have traditionally been associated with scientific work, the ROSSIO VRE is intended to benefit other target communities, such as the educational community (students and teachers) and the cultural and tourism industries (tourist guides and museums), following the example of other international infrastructures (e.g. Historiana). In the case of teachers and students, this may help to bring them closer to scientific and cultural institutions, promoting dynamic and interactive learning spaces, where hands-on initiatives are encouraged [21-22].

Within ROSSIO, the development of controlled vocabularies is expected to enable the normalization and semantic enrichment of the metadata aggregated and produced in the platform. The publication of the ROSSIO vocabularies as linked open data complies with FAIR principles and is a relevant contribution to the Portuguese section of the linguistic linked open data cloud, which remains underrepresented in terms of number of resources. For example, of the more than 100,000 resources listed in LingHub, a directory for linked data language resources, only 96 are in Portuguese.

Furthermore, the deployment of applications for managing and publishing SKOS vocabularies is expected to facilitate the collaborative development of domain or institutional specific controlled vocabularies by members of the ROSSIO consortium, functioning as a hub for information organization activities in the SSAH and in the Portuguese language.

The platform and its services will, hopefully, contribute to the development and promotion of the best international practices among Portuguese cultural and academic institutions, focusing on state-of-the-art procedures for safeguarding documentation and for its subsequent connection, enrichment and dissemination to the general public through digital platforms and infrastructures. The experience acquired in the building process of the platform will also encourage and facilitate the entry of new content providers in the future, from central to local Portuguese heritage institutions. A good example of this is the recent incorporation of the Diplomatic Institute at the Portuguese Ministry of Foreign Affairs as a content provider (2021). ROSSIO could also be an asset for local and generally small-sized cultural institutions (e.g. municipal historical archives), since many of them do not have the necessary technical skills and financial resources to ensure the aggregation and semantic enrichment of their resources on digital platforms.

Thus, keeping in mind the words of Tim Sherratt regarding platforms, ROSSIO platform intends to be a relevant digital tool for unlocking, sharing and exploring the Portuguese Cultural Heritage.

References

[1] European Strategy Forum on Research Infrastructures (2016). European Roadmap for Research Infrastructures Report 2016. Luxembourg: Office for Official Publications of the European Communities. https://www.esfri.eu/sites/default/files/esfri_roadmap_2006_en.pdf

[2] European Strategy Forum on Research Infrastructures (2018). European Roadmap for Research Infrastructures Report 2018. Luxembourg: Office for Official Publications of the European Communities. http://roadmap2018.esfri.eu/

[3] ACLS Commission on Cyberinfrastructure (2006). Our Cultural Commonwealth: The Report of the American Council of Learned Societies Commission on Cyberinfrastructure for the Humanities and Social Sciences.

[4] Foundation for Science and Technology (2020). Portuguese Roadmap of Research Infrastructures – 2020 Update. Lisbon: FCT. https://www.fct.pt/apoios/equipamento/roteiro/index.phtml.en

[5] Lagoze, C., Van de Sompel, H., Nelson, M. and Warner, S. (2002). The Open Archives Initiative Protocol for Metadata Harvesting, Version 2.0 <http://www.openarchives.org/OAI/2.0/openarchivesprotocol.htm>.

[6] Van de Sompel, H., Nelson, M. (2015). “Reminiscing About 15 Years of Interoperability Efforts”. D-Lib Magazine, 21(11/12). doi:10.1045/november2015-vandesompel <http://www.dlib.org/dlib/november15/vandesompel/11vandesompel.html>

[7] Almeida, B., Freire, N., & Monteiro, D. (2021). “The Development of the ROSSIO Thesaurus: Supporting Content Discovery and Management in a Research Infrastructure”. In D. Dosso, S. Ferilli, P. Manghi, A. Poggi, G. Serra, & G. Silvello (Eds.), Proceedings of the 17th Italian Research Conference on Digital Libraries (pp. 138–146). Aachen: CEUR-WS. http://ceur-ws.org/Vol-2816/

[8] Miles, A., & Bechhofer, S. (2009). SKOS Simple Knowledge Organization System Reference. http://www.w3.org/TR/skos-reference

[9] ISO 25964-1. (2011). Information and documentation—Thesauri and interoperability with other vocabularies—Part 1: Thesauri for information retrieval. Geneva: ISO.

[10] Sherratt, T. (2013). From portals to platforms – building new frameworks for user engagement. 1-9. Paper presented at LIANZA 2013, Hamilton, New Zealand. https://doi.org/10.5281/zenodo.3563238

[11] Kalfatovic, M. R. (2002), Creating a Winning Online Exhibition. A Guide for Libraries, Archives, and Museums. Chicago/London: American Library Association.

[12] Khoon, L. C., Chennupati, R. K., Foo, S. (2003). “The design and development of an online exhibition for heritage information awareness in Singapore”, Program, 37(2), pp. 85-93.

[13] Khoon, L. C., Chennupati, R. K. (2014). “Design and development of Web-based Online Exhibitions”, DESIDOC Journal of Library & Information Technology, 32(2), pp. 97-102.

[14] Natale, M. T., Fernández, S., López, M. (Eds.) (2012). Handbook on virtual exhibitions and virtual performances. Tivoli (Roma): Officine Grafiche Tiburtine.

[15] Antoniou, A., Lepouras, G. L., Vassilakis, C. (2013). “Methodology for Design of Online Exhibitions”, DESIDOC Journal of Library & Information Technology, 33(3), pp. 158-167.

[16] Candela, L., Castelli, D., Pagano, P. (2013). “Virtual Research Environments: an overview and a research agenda”, Data Science Journal, 12, pp. 75-81.

[17] Zhou, J. et al (2020), “Building Science Gateways for Humanities”. In: Practice and Experience in Advanced Research Computing (PEARC’20). New York: Association for Computing Machinery, pp. 327–332. doi: https://doi.org/10.1145/3311790.3396628

[18] Carusi, A., Reimer, T. (2010), Virtual Research Environment Collaborative Landscape Study: A JISC funded project. [Bristol]: JISC.

[19] Osório, A. J. (2020). “Reflexões sobre tecnologia e educação em tempo de pandemia”. In: A Universidade do Minho em tempos de pandemia II. Minho: UMinho Editora, pp. 212-224. https://doi.org/10.21814/uminho.ed.24.9

[20] Rodrigues, E. (2020), “A pandemia e a emergência da ciência aberta”. In: A Universidade do Minho em tempos de pandemia II. Minho: UMinho Editora, pp. 263-294. https://doi.org/10.21814/uminho.ed.24.12

[21] Elrayies, G. M., (2017), “Flipped Learning as a Paradigm Shift in Architectural Education”, International Education Studies, 10 (1), pp. 93-108.

[22] Ahmed, Hanaa Ouda Khadri, (2016) “Flipped Learning As A New Educational Paradigm: An Analytical Critical Study”, European Scientific Journal, 12 (10), pp. 417-444.