Uroš Milošević | Universität Leipzig
Papers by Uroš Milošević
With the emergence of Linked Data, DBpedia has steadily grown to become one of the largest and most important structured knowledge sources we know of today. Adopting Wikipedia’s practice of entrusting the community with most of the work, the DBpedia internationalization committee has made a major step towards the move from unstructured to structured knowledge on the Web. Still, with new languages come new challenges. In this paper, we inspect some common obstacles that need to be tackled in order to add one more language to this popular data hub, but also some that haven’t been encountered before in this domain. More specifically, we explore the digraphic nature of the Serbian language, analyze the state of the DBpedia Extraction Framework with respect to its support for languages that use multiple scripts, and propose solutions towards overcoming this problem. Finally, we deploy the first digraphic DBpedia edition, taking the leading position amongst all DBpedia versions in the per...
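Serbian is written in both Cyrillic and Latin script, and the Cyrillic-to-Latin direction can be done letter by letter. The Python sketch below illustrates only that mapping; it is not the DBpedia Extraction Framework's implementation, and the names `CYR_TO_LAT` and `to_latin` are hypothetical.

```python
# Lowercase Serbian Cyrillic -> Gaj's Latin alphabet (30 letters each).
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}


def to_latin(text: str) -> str:
    """Transliterate Serbian Cyrillic text to Latin (hypothetical helper)."""
    out = []
    for ch in text:
        lower = ch.lower()
        if lower in CYR_TO_LAT:
            lat = CYR_TO_LAT[lower]
            # Uppercase Cyrillic letter: capitalize only the first Latin letter
            # (Љ -> "Lj"), which suits title case but not ALL-CAPS text.
            out.append(lat.capitalize() if ch != lower else lat)
        else:
            out.append(ch)  # digits, punctuation and Latin letters pass through
    return "".join(out)


print(to_latin("Београд"))  # Beograd
print(to_latin("Љубљана"))  # Ljubljana
```

The reverse direction is not a plain table lookup: a Latin digraph such as "nj" usually stands for њ but sometimes for two separate letters (e.g. "injekcija" is "инјекција"), which is one reason handling both scripts is not trivial.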
To be able to cope with all the challenges that are emerging with the rise of the Web of Data, linked data tools must find ways to deal with its heterogeneity with regard to the content being published and the languages that content is being published in. Adequate internationalization support is a must for distributing or consuming multilingual data, but providing it is not straightforward. In this paper, we analyze the problems Semantic Web tools face when data is published in Western Balkan languages. We look at resource presentation mechanisms, show how IRIs can solve many of the common problems related to Serbian alphabets, but also that the Semantic Web is still not ready for a full transition to internationalized identifiers. We also review different serialization options and explain why most XML-based formats are not suitable for the task of internationalization. To see how popular linked data tools deal with such data, we propose an evaluation methodol...
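IRIs allow Cyrillic characters directly, but tools that accept only ASCII URIs need the RFC 3987 mapping, which percent-encodes the UTF-8 bytes of every non-ASCII character. A minimal Python sketch using the standard library's `urllib.parse.quote`; the `example.org` IRI and the function name `iri_to_uri` are hypothetical.

```python
from urllib.parse import quote, unquote

# Keep URI delimiters and existing percent-escapes; encode everything else.
RESERVED = "/:?#[]@!$&'()*+,;=%"


def iri_to_uri(iri: str) -> str:
    """Map an IRI to an ASCII-only URI by percent-encoding UTF-8 bytes."""
    return quote(iri, safe=RESERVED)


iri = "http://example.org/resource/Београд"
uri = iri_to_uri(iri)
print(uri)                  # .../resource/%D0%91%D0%B5%D0%BE%D0%B3%D1%80%D0%B0%D0%B4
print(unquote(uri) == iri)  # True
```

The encoded form round-trips but is unreadable to humans, and the Cyrillic and Latin spellings of the same name still produce two distinct identifiers that have to be linked explicitly.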
As part of harmonisation with EU legislation and practices, the Serbian authorities follow the standards and technologies used in the European Union and adapt their IT infrastructure to ensure quality service to the public, other governmental agencies, and other business partners, both domestic and international. In this paper, we showcase the activities of the Statistical Office of the Republic of Serbia related to the publication of statistical data as Linked Data, which are intended to enable greater transparency and better data interaction with similar institutions in Europe. The example outlines the steps needed for a safe transition from conventional statistical data formats to Linked Data and for aligning practices with international standards, in an effort to establish a high level of semantic interoperability, both locally and globally.
To make the Web of Data a reality, and to push large-scale integration of, and reasoning on, data on the Web, huge amounts of data must be made available in a standard format, reachable and manageable by Semantic Web tools. National statistical offices across the world already possess an abundance of structured data, both in their databases and in files of various formats. We will first consider the reasons for making such data available as Linked Data. Then, some novel data representation methods, compatible with SDMX, an international standard for exchanging statistical data, will be showcased. We will then explain how to move from structured statistical data, represented in XML, to RDF, as well as how to enrich such datasets using standard classification schemas. Finally, we will present a way of increasing data visibility by cataloging newly created linked statistical data at both local and international levels.
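The XML-to-RDF step can be illustrated with the W3C RDF Data Cube vocabulary (`qb:`) and the SDMX-RDF dimension and measure terms. The sketch below is not the method described in the paper: the XML is a hypothetical, heavily simplified stand-in for real SDMX-ML (which is namespaced and far richer), the observation values are placeholders, `example.org` is a made-up namespace, and typing the period as an `xsd:gYear` literal is a simplification.

```python
import xml.etree.ElementTree as ET

QB = "http://purl.org/linked-data/cube#"
SDMX_DIM = "http://purl.org/linked-data/sdmx/2009/dimension#"
SDMX_MEASURE = "http://purl.org/linked-data/sdmx/2009/measure#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"
BASE = "http://example.org/stat/"  # hypothetical namespace for the new dataset

# Hypothetical simplified input; values are placeholders, not real statistics.
SAMPLE = """<DataSet id="ds1">
  <Obs area="A1" period="2020" value="100"/>
  <Obs area="A2" period="2020" value="250"/>
</DataSet>"""


def xml_to_ntriples(xml_text):
    """Emit one qb:DataSet plus one qb:Observation per <Obs> as N-Triples."""
    root = ET.fromstring(xml_text)
    ds_id = root.get("id")
    ds = f"<{BASE}dataset/{ds_id}>"
    lines = [f"{ds} <{RDF_TYPE}> <{QB}DataSet> ."]
    for i, obs in enumerate(root.iter("Obs")):
        o = f"<{BASE}obs/{ds_id}/{i}>"
        lines += [
            f"{o} <{RDF_TYPE}> <{QB}Observation> .",
            f"{o} <{QB}dataSet> {ds} .",
            f"{o} <{SDMX_DIM}refArea> <{BASE}area/{obs.get('area')}> .",
            f'{o} <{SDMX_DIM}refPeriod> "{obs.get("period")}"^^<{XSD}gYear> .',
            f'{o} <{SDMX_MEASURE}obsValue> "{obs.get("value")}"^^<{XSD}integer> .',
        ]
    return lines


for line in xml_to_ntriples(SAMPLE):
    print(line)
```

Plain N-Triples strings keep the sketch dependency-free; a real pipeline would more likely use an RDF library and a data structure definition (`qb:DataStructureDefinition`) describing the dimensions.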
In order to improve efficiency in the provision of public services, increase transparency and interaction with citizens and society as a whole, and also create new businesses and job opportunities, both local and national governments need to find better strategies for delivering their data to the public in a powerful, machine-readable and future-proof format. We examine the case of Serbia, give an insight into the current state of affairs in the country, and show how both governmental agencies and the citizens of Serbia can benefit from Linked Open Data, a state-of-the-art web technology that promises to reshape the Web itself. A working approach to publishing government data as Linked Data is demonstrated on the case of one of the largest data providers in the country, the Statistical Office of the Republic of Serbia. By cataloging the results in a local metadata repository, and scheduling periodical “harvesting” at an international level, both transparency and public ser...
There is growing interest among ecological experts in creating qualitative models of phenomena for which numerical information is sparse or missing. We present a number of successful models in the field of environmental science, namely the domain of global warming. The motivation behind the effort is to enrich the model repository of the DynaLearn interactive learning environment. We first model two of the negative feedback factors (snow and ice albedo, and cooling aerosols), then two of the positive feedback ones (water vapor and warming aerosols). We then combine the two mechanisms in a larger model (low and high clouds), showing that general mechanisms can be extracted and reapplied to other systems with similar characteristics. Two field experts have evaluated the results.
As a .NET C# application, NooJ was originally limited to a single family of platforms: Windows. As many potential NooJ users use other operating systems (e.g. Linux, BSD, Solaris, Mac OS X), a need emerged to support NooJ on these platforms as well. Java is very well supported on many operating systems commonly used on desktop and laptop computers, but also on smartphones and tablets, which could significantly increase the number of NooJ users. Although Java as a programming language is similar to C#, there are also many differences, especially in the implementation of the GUI, which makes porting NooJ to Java a complex task. Furthermore, the NooJ GUI was not clearly separated from the engine in the .NET version, whereas the Java version separates them and defines an API that allows NooJ users to apply NooJ’s engine in their own applications, thus contributing to interoperability. Other activities on porting NooJ to Java also include the...
Lecture Notes in Computer Science, 2014
Recently, a large number of open data repositories, catalogs and portals have been emerging in the scientific and government realms. In this chapter, we characterise this newly emerging class of information systems. We describe the key functionality of open data portals, present a conceptual model and showcase the pan-European data portal PublicData.eu as a prominent example. Using examples from Serbia and Poland, we present an approach for lifting the often semantically shallow datasets registered at such data portals to Linked Data in order to make data portals the backbone of a distributed global data warehouse for our information society on the Web.
Lecture Notes in Computer Science, 2014
This paper contributes to the understanding of challenges related to publishing and consuming public sector information using Linked Data tools. The Linked Data paradigm has opened new possibilities and perspectives for the process of collecting and monitoring socio-economic indicators. Due to the multidimensionality of statistical data, hierarchical data structures are needed for modeling the space and time dimensions in order to ensure efficient exploration and analysis. This paper presents several illustrative examples of modeling, analysis and visualization of Linked Data from Serbian government bodies. The approach utilizes tools from the Linked Data stack, as well as the first prototype of the Exploratory Spatio-Temporal Analysis component that has been developed within the GeoKnow project framework.
This document presents the progress and effort made in the design of a dialog system for the virtual characters in DynaLearn. The main purpose of this system is to provide means by which the virtual characters can present relevant system knowledge to the learners in a pedagogically sound manner.
Linked Open Data (LOD) is a growing movement for organizations to make their existing data available in a machine-readable format. There are two equally important viewpoints on LOD: publishing and consuming. This article analyzes the requirements for both sub-processes and presents an example of publishing statistical data in RDF format and integrating the data into the LOD cloud via the PublicData.eu portal. In particular, it discusses the establishment of the Serbian CKAN metadata repository, which serves both for publishing open governmental data from Serbia and as a source catalogue for the PublicData.eu portal. Furthermore, using an illustrative case study of the Statistical Office of the Republic of Serbia, it elaborates on the adaptation of the LOD2 Stack for the analysis and dissemination of official statistics.
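The CKAN Action API exposes catalogue search through `package_search`, which is one way a harvester could page through a repository such as the Serbian CKAN instance. The Python sketch below only builds request URLs and parses a response, so it runs offline; the base URL, the sample response and the helper names are hypothetical, and the source does not say which harvesting mechanism PublicData.eu actually used.

```python
import json
from urllib.parse import urlencode


def package_search_url(base_url, query="", rows=100, start=0):
    """Build a CKAN Action API (v3) package_search request URL."""
    params = urlencode({"q": query, "rows": rows, "start": start})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_names(response_text):
    """Extract dataset names from a package_search JSON response."""
    body = json.loads(response_text)
    if not body.get("success"):
        raise RuntimeError("CKAN reported an error")
    return [pkg["name"] for pkg in body["result"]["results"]]


# Hypothetical catalogue URL and response, for illustration only.
print(package_search_url("http://ckan.example.org/", "statistics", rows=10))
sample = '{"success": true, "result": {"count": 1, "results": [{"name": "example-dataset"}]}}'
print(dataset_names(sample))  # ['example-dataset']
```

A harvester would repeat the request with `start` increased by `rows` until `result["count"]` datasets have been read.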
To improve transparency and public service delivery, national, regional and local governmental bodies need to consider new strategies for opening up their data. We approach the problem of creating a more scalable and interoperable Open Government Data ecosystem by considering the latest advances in Linked Open Data. More precisely, we showcase how an integrated and coherent collection of aligned state-of-the-art software tools, the LOD2 Stack, can be used to deliver trusted, open and rich collections of interlinked datasets to the public. The use of the stack is demonstrated on the case of one of the largest data providers in the Republic of Serbia, its Statistical Office.
Lecture Notes in Computer Science, 2014
This chapter focuses on data transformation to RDF and Linked Data and, furthermore, on the improvement of existing or extracted data, especially with respect to schema enrichment and ontology repair. Tasks concerning the triplification of data are mainly grounded on existing and well-proven techniques, and were refined during the lifetime of the LOD2 project and integrated into the LOD2 Stack. Triplification of legacy data, i.e. data not yet in RDF, represents the entry point for legacy systems to participate in the LOD cloud. While existing systems are often very useful and successful, there are notable differences between the ways knowledge bases and wikis or databases are created and used. One of the key differences in content is the importance and use of schematic information in knowledge bases. This information is usually absent in the source system and therefore also in many LOD knowledge bases. However, schema information is needed for consistency checking and for finding modelling problems. We will present a combination of enrichment and repair steps to tackle this problem, based on previous research in machine learning and knowledge representation. Overall, the chapter describes how to enable tool-supported creation and publishing of RDF as Linked Data (Sect. 1) and how to increase the quality and value of such large knowledge bases when published on the Web (Sect. 2).