Max Jakob - Academia.edu (original) (raw)
Papers by Max Jakob
3Rd the Sixth International Conference on Knowledge Capture K Cap 2011 3rd the Sixth International Conference on Knowledge Capture K Cap 2011 25 06 2011 29 06 2011 Banff Alberta Canada, Jun 26, 2011
Enriching knowledge bases with multimedia information makes it possible to complement textual des... more Enriching knowledge bases with multimedia information makes it possible to complement textual descriptions with visual and audio information. Such complementary information can help users to understand the meaning of assertions, and in general improve the user experience with the knowledge base. In this paper we address the problem of how to enrich ontology instances with candidate images retrieved from existing Web search engines. DBpedia has evolved into a major hub in the Linked Data cloud, interconnecting millions of entities organized under a consistent ontology. Our approach taps into the Wikipedia corpus to gather context information for DBpedia instances and takes advantage of image tagging information when this is available to calculate semantic relatedness between instances and candidate images. We performed experiments with focus on the particularly challenging problem of highly ambiguous names. Both methods presented in this work outperformed the baseline. Our best method leveraged context words from Wikipedia, tags from Flickr and type information from DBpedia to achieve an average precision of 80%.
Proceedings of the 7th International Conference on Semantic Systems I Semantics 2011 7th International Conference on Semantic Systems I Semantics 2011 07 09 2011 09 09 2011 Graz Austria, 2011
Interlinking text documents with Linked Open Data enables the Web of Data to be used as backgroun... more Interlinking text documents with Linked Open Data enables the Web of Data to be used as background knowledge within document-oriented applications such as search and faceted browsing. As a step towards interconnecting the Web of Documents with the Web of Data, we developed DBpedia Spotlight, a system for automatically annotating text documents with DBpedia URIs. DBpedia Spotlight allows users to configure the annotations to their specific needs through the DBpedia Ontology and quality measures such as prominence, topical pertinence, contextual ambiguity and disambiguation confidence. We compare our approach with the state of the art in disambiguation, and evaluate our results in light of three baselines and six publicly available annotation systems, demonstrating the competitiveness of our system. DBpedia Spotlight is shared as open source and deployed as a Web Service freely available for public use.
This paper investigates the mapping between two semantic formalisms, namely the tectogrammatical ... more This paper investigates the mapping between two semantic formalisms, namely the tectogrammatical layer of the Prague Dependency Treebank 2.0 (PDT) and (Robust) Minimal Recursion Semantics ((R)MRS). It is a first attempt to relate the dependency-based annotation scheme of PDT to a compositional semantics approach like (R)MRS. A mapping algorithm that converts PDT trees to (R)MRS structures is developed, associating (R)MRSs to each node on the dependency tree. Furthermore, composition rules are formulated and the relation between dependency in PDT and semantic heads in (R)MRS is analyzed. It turns out that structure and dependencies, morphological categories and some coreferences can be preserved in the target structures. Moreover, valency and free modifications are distinguished using the valency dictionary of PDT as an additional resource. The validation results show that systematically correct underspecified target representations can be obtained by a rule-based mapping approach, w...
ABSTRACT There has recently been an increased interest in named entity recognition and disambigua... more ABSTRACT There has recently been an increased interest in named entity recognition and disambiguation systems at major conferences such as WWW, SIGIR, ACL, KDD, etc. However, most work has focused on algorithms and evaluations, leaving little space for implementation details. In this paper, we discuss some implementation and data processing challenges we encountered while developing a new multilingual version of DBpedia Spotlight that is faster, more accurate and easier to configure. We compare our solution to the previous system, considering time performance, space requirements and accuracy in the context of the Dutch and English languages. Additionally, we report results for 9 additional languages among the largest Wikipedias. Finally, we present challenges and experiences to foment the discussion with other developers interested in recognition and disambiguation of entities in natural language text.
Proceedings of the sixth international conference on Knowledge capture - K-CAP '11, 2011
Enriching knowledge bases with multimedia information makes it possible to complement textual des... more Enriching knowledge bases with multimedia information makes it possible to complement textual descriptions with visual and audio information. Such complementary information can help users to understand the meaning of assertions, and in general improve the user experience with the knowledge base. In this paper we address the problem of how to enrich ontology instances with candidate images retrieved from existing Web search engines. DBpedia has evolved into a major hub in the Linked Data cloud, interconnecting millions of entities organized under a consistent ontology. Our approach taps into the Wikipedia corpus to gather context information for DBpedia instances and takes advantage of image tagging information when this is available to calculate semantic relatedness between instances and candidate images. We performed experiments with focus on the particularly challenging problem of highly ambiguous names. Both methods presented in this work outperformed the baseline. Our best method leveraged context words from Wikipedia, tags from Flickr and type information from DBpedia to achieve an average precision of 80%.
Proceedings of the 7th International Conference on Semantic Systems - I-Semantics '11, 2011
Interlinking text documents with Linked Open Data enables the Web of Data to be used as backgroun... more Interlinking text documents with Linked Open Data enables the Web of Data to be used as background knowledge within document-oriented applications such as search and faceted browsing. As a step towards interconnecting the Web of Documents with the Web of Data, we developed DBpedia Spotlight, a system for automatically annotating text documents with DBpedia URIs. DBpedia Spotlight allows users to configure the annotations to their specific needs through the DBpedia Ontology and quality measures such as prominence, topical pertinence, contextual ambiguity and disambiguation confidence. We compare our approach with the state of the art in disambiguation, and evaluate our results in light of three baselines and six publicly available annotation systems, demonstrating the competitiveness of our system. DBpedia Spotlight is shared as open source and deployed as a Web Service freely available for public use.
Proceedings of LREC, May 1, 2010
This paper investigates the mapping between two semantic formalisms, namely the tectogrammatical ... more This paper investigates the mapping between two semantic formalisms, namely the tectogrammatical layer of the Prague Dependency Treebank 2.0 (PDT) and (Robust) Minimal Recursion Semantics ((R) MRS). It is a first attempt to relate the dependency-based annotation scheme of PDT to a compositional semantics approach like (R) MRS. A mapping algorithm that converts PDT trees to (R) MRS structures is developed, associating (R) MRSs to each node on the dependency tree. Furthermore, composition rules are formulated and ...
3Rd the Sixth International Conference on Knowledge Capture K Cap 2011 3rd the Sixth International Conference on Knowledge Capture K Cap 2011 25 06 2011 29 06 2011 Banff Alberta Canada, Jun 26, 2011
Enriching knowledge bases with multimedia information makes it possible to complement textual des... more Enriching knowledge bases with multimedia information makes it possible to complement textual descriptions with visual and audio information. Such complementary information can help users to understand the meaning of assertions, and in general improve the user experience with the knowledge base. In this paper we address the problem of how to enrich ontology instances with candidate images retrieved from existing Web search engines. DBpedia has evolved into a major hub in the Linked Data cloud, interconnecting millions of entities organized under a consistent ontology. Our approach taps into the Wikipedia corpus to gather context information for DBpedia instances and takes advantage of image tagging information when this is available to calculate semantic relatedness between instances and candidate images. We performed experiments with focus on the particularly challenging problem of highly ambiguous names. Both methods presented in this work outperformed the baseline. Our best method leveraged context words from Wikipedia, tags from Flickr and type information from DBpedia to achieve an average precision of 80%.
Proceedings of the 7th International Conference on Semantic Systems I Semantics 2011 7th International Conference on Semantic Systems I Semantics 2011 07 09 2011 09 09 2011 Graz Austria, 2011
Interlinking text documents with Linked Open Data enables the Web of Data to be used as backgroun... more Interlinking text documents with Linked Open Data enables the Web of Data to be used as background knowledge within document-oriented applications such as search and faceted browsing. As a step towards interconnecting the Web of Documents with the Web of Data, we developed DBpedia Spotlight, a system for automatically annotating text documents with DBpedia URIs. DBpedia Spotlight allows users to configure the annotations to their specific needs through the DBpedia Ontology and quality measures such as prominence, topical pertinence, contextual ambiguity and disambiguation confidence. We compare our approach with the state of the art in disambiguation, and evaluate our results in light of three baselines and six publicly available annotation systems, demonstrating the competitiveness of our system. DBpedia Spotlight is shared as open source and deployed as a Web Service freely available for public use.
This paper investigates the mapping between two semantic formalisms, namely the tectogrammatical ... more This paper investigates the mapping between two semantic formalisms, namely the tectogrammatical layer of the Prague Dependency Treebank 2.0 (PDT) and (Robust) Minimal Recursion Semantics ((R)MRS). It is a first attempt to relate the dependency-based annotation scheme of PDT to a compositional semantics approach like (R)MRS. A mapping algorithm that converts PDT trees to (R)MRS structures is developed, associating (R)MRSs to each node on the dependency tree. Furthermore, composition rules are formulated and the relation between dependency in PDT and semantic heads in (R)MRS is analyzed. It turns out that structure and dependencies, morphological categories and some coreferences can be preserved in the target structures. Moreover, valency and free modifications are distinguished using the valency dictionary of PDT as an additional resource. The validation results show that systematically correct underspecified target representations can be obtained by a rule-based mapping approach, w...
ABSTRACT There has recently been an increased interest in named entity recognition and disambigua... more ABSTRACT There has recently been an increased interest in named entity recognition and disambiguation systems at major conferences such as WWW, SIGIR, ACL, KDD, etc. However, most work has focused on algorithms and evaluations, leaving little space for implementation details. In this paper, we discuss some implementation and data processing challenges we encountered while developing a new multilingual version of DBpedia Spotlight that is faster, more accurate and easier to configure. We compare our solution to the previous system, considering time performance, space requirements and accuracy in the context of the Dutch and English languages. Additionally, we report results for 9 additional languages among the largest Wikipedias. Finally, we present challenges and experiences to foment the discussion with other developers interested in recognition and disambiguation of entities in natural language text.
Proceedings of the sixth international conference on Knowledge capture - K-CAP '11, 2011
Enriching knowledge bases with multimedia information makes it possible to complement textual des... more Enriching knowledge bases with multimedia information makes it possible to complement textual descriptions with visual and audio information. Such complementary information can help users to understand the meaning of assertions, and in general improve the user experience with the knowledge base. In this paper we address the problem of how to enrich ontology instances with candidate images retrieved from existing Web search engines. DBpedia has evolved into a major hub in the Linked Data cloud, interconnecting millions of entities organized under a consistent ontology. Our approach taps into the Wikipedia corpus to gather context information for DBpedia instances and takes advantage of image tagging information when this is available to calculate semantic relatedness between instances and candidate images. We performed experiments with focus on the particularly challenging problem of highly ambiguous names. Both methods presented in this work outperformed the baseline. Our best method leveraged context words from Wikipedia, tags from Flickr and type information from DBpedia to achieve an average precision of 80%.
Proceedings of the 7th International Conference on Semantic Systems - I-Semantics '11, 2011
Interlinking text documents with Linked Open Data enables the Web of Data to be used as backgroun... more Interlinking text documents with Linked Open Data enables the Web of Data to be used as background knowledge within document-oriented applications such as search and faceted browsing. As a step towards interconnecting the Web of Documents with the Web of Data, we developed DBpedia Spotlight, a system for automatically annotating text documents with DBpedia URIs. DBpedia Spotlight allows users to configure the annotations to their specific needs through the DBpedia Ontology and quality measures such as prominence, topical pertinence, contextual ambiguity and disambiguation confidence. We compare our approach with the state of the art in disambiguation, and evaluate our results in light of three baselines and six publicly available annotation systems, demonstrating the competitiveness of our system. DBpedia Spotlight is shared as open source and deployed as a Web Service freely available for public use.
Proceedings of LREC, May 1, 2010
This paper investigates the mapping between two semantic formalisms, namely the tectogrammatical ... more This paper investigates the mapping between two semantic formalisms, namely the tectogrammatical layer of the Prague Dependency Treebank 2.0 (PDT) and (Robust) Minimal Recursion Semantics ((R) MRS). It is a first attempt to relate the dependency-based annotation scheme of PDT to a compositional semantics approach like (R) MRS. A mapping algorithm that converts PDT trees to (R) MRS structures is developed, associating (R) MRSs to each node on the dependency tree. Furthermore, composition rules are formulated and ...