Marko Grobelnik | Jožef Stefan Institute
Papers by Marko Grobelnik
Information Visualization, Iv 2009, Proceedings, 2009
Lecture Notes in Computer Science, 2009
Lecture Notes in Computer Science, 2009
European Conference on Artificial Intelligence, 1992
Workshop on Evaluation of Ontology-based Tools, 2006
An ontology is an explicit formal conceptualization of some domain of interest. Ontology evaluation is the problem of assessing a given ontology from the point of view of a particular criterion or application, typically in order to determine which of several ontologies would best suit a particular purpose. This paper proposes an ontology evaluation approach based on comparing an ontology
National Conference on Artificial Intelligence, 2005
Automatic document summarization is the problem of creating a document surrogate that adequately represents the full document content. We aim at a summarization system that can replicate the quality of summaries created by humans. In this paper we investigate a machine learning method for extracting full sentences from documents based on the document's semantic graph structure. In particular, we explore
In this paper, we review two techniques for topic discovery in collections of text documents (Latent Semantic Indexing and K-means clustering) and present how we integrated them into a system for semi-automatic topic ontology construction. The system supports the user during the construction process by suggesting topics and analyzing them in real time.
In this paper we present an approach to extracting subject-predicate-object triplets from English sentences. First, four different well-known syntactic parsers for English are used to generate parse trees from the sentences; triplets are then extracted from the parse trees using parser-dependent techniques.
This paper presents a system for visualization of large amounts of news stories. In the first phase, the news stories are preprocessed for the purpose of named-entity extraction. Next, a graph of relationships between the extracted named entities is created, where each named entity represents one vertex in the graph and two named entities are connected if they
Text categorization is the task of classifying natural language documents into a set of predefined categories. Documents are typically represented by sparse vectors under the vector space model, where each word in the vocabulary is mapped to one coordinate axis and its occurrence in the document gives rise to one nonzero component in the vector representing that document. When training
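The sparse vector representation described in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function names `build_vocabulary` and `to_sparse_vector`, the whitespace tokenization, and the use of raw term counts as component values are all assumptions made for illustration.

```python
from collections import Counter


def build_vocabulary(documents):
    """Assign each distinct word one coordinate index (hypothetical helper)."""
    vocab = {}
    for doc in documents:
        for word in doc.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def to_sparse_vector(document, vocab):
    """Return {coordinate index: count}, storing only nonzero components.

    Raw term counts are an assumption; a real system might use
    another weighting for the component values.
    """
    counts = Counter(w for w in document.lower().split() if w in vocab)
    return {vocab[w]: c for w, c in counts.items()}


docs = ["the cat sat", "the dog sat on the mat"]
vocab = build_vocabulary(docs)
vectors = [to_sparse_vector(d, vocab) for d in docs]
```

Words absent from a document simply have no entry in its dictionary, which is what keeps the representation sparse when the vocabulary is large.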
In this paper we present a method for summarizing a document by creating a semantic graph of the original document and identifying the substructure of such a graph that can be used to extract sentences for a document summary. We start with deep syntactic analysis of the text and, for each sentence, extract logical form triples, subject-predicate-object. We then apply cross-sentence
Text categorization is the task of classifying natural language documents into a set of predefined categories. Documents are typically represented by sparse vectors under the vector space model, where each word in the vocabulary is mapped to one coordinate axis and its occurrence in the document gives rise to one nonzero component in the vector representing that document. When training
Information Systems, 2000
In this paper we present a new version of the OntoGen system for semi-automatic data-driven ontology construction. The system is based on a novel ontology learning framework which formalizes and extends the role of machine learning and text mining algorithms used in the previous version. New features include an extended number of supported ontology formats (RDFS and OWL), supervised methods
Artificial Intelligence from the Information Processing Perspective, 1992
Informatica (Slovenia), 2005
From the automated text processing point of view, natural language is very redundant in the sense that many different words share a common or similar meaning. For a computer this can be hard to understand without some background knowledge. Latent Semantic Indexing (LSI) is a technique that helps in extracting some of this background knowledge from a corpus of text documents. This
International Conference on Machine Learning, 2002
In this paper we explore the effects of various feature selection algorithms on document classification performance. We propose to use two, possibly distinct, linear classifiers: one used exclusively for feature selection in order to obtain the feature space for training the second classifier, possibly using a different training set. The resulting classifier is used to classify new documents. Experiments show that
International Conference on Machine Learning, 1999
This paper describes an approach to feature subset selection that takes into account problem specifics and learning algorithm characteristics. It is developed for the Naive Bayesian classifier applied on text data, since it combines well with the addressed learning problems. We ...
An ontology is an explicit formal conceptualization of some domain of interest. Ontologies are increasingly used in various fields such as knowledge management, information extraction, and the semantic web. Ontology evaluation is the problem of assessing a given ontology from the point of view of a particular criterion of application, typically in order to determine which of several ontologies would
Dunja Mladenic and Marko Grobelnik, "Word Sequences as Features in Text-Learning" (BibTeX key: Mladenic98wordsequences).
This paper describes automatic document categorization based on a large text hierarchy. We handle the large number of features and training examples by taking into account the hierarchical structure of examples and using feature selection for large text data. We experimentally evaluate feature subset selection on real-world text data collected from the existing Web hierarchy named Yahoo. In our learning experiments a naive Bayesian classifier was used