Marko Grobelnik | Jožef Stefan Institute

Papers by Marko Grobelnik

Document Visualization Based on Semantic Graphs

Information Visualization, IV 2009, Proceedings, 2009

Demo: HistoryViz – Visualizing Events and Relations Extracted from Wikipedia

Lecture Notes in Computer Science, 2009

Modeling Common Real-Word Relations Using Triples Extracted from n-Grams

Lecture Notes in Computer Science, 2009

Stochastic Search in Inductive Logic Programming

European Conference on Artificial Intelligence, 1992

Gold Standard Based Ontology Evaluation Using Instance Assignment

Workshop on Evaluation of Ontology-based Tools, 2006

An ontology is an explicit formal conceptualization of some domain of interest. Ontology evaluation is the problem of assessing a given ontology from the point of view of a particular criterion or application, typically in order to determine which of several ontologies would best suit a particular purpose. This paper proposes an ontology evaluation approach based on comparing an ontology …

Impact of Linguistic Analysis on the Semantic Graph Coverage and Learning of Document Extracts

National Conference on Artificial Intelligence, 2005

Automatic document summarization is the problem of creating a document surrogate that adequately represents the full document content. We aim at a summarization system that can replicate the quality of summaries created by humans. In this paper we investigate a machine learning method for extracting full sentences from documents based on the document's semantic graph structure. In particular, we explore …

System for semi-automatic ontology construction

In this paper, we review two techniques for topic discovery in collections of text documents (Latent Semantic Indexing and K-Means clustering) and present how we integrated them into a system for semi-automatic topic ontology construction. The system supports the user during the construction process by suggesting topics and analyzing them in real time.
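The abstract names K-Means as one of the two topic-discovery techniques. Below is a minimal spherical K-Means sketch over term-frequency vectors, assuming whitespace tokenization, cosine similarity and farthest-first initialization (the abstract specifies none of these, and the LSI step is omitted). The highest-weighted centroid terms stand in for the topic suggestions the system offers; all function names are illustrative, not the system's actual interface.

```python
import math
from collections import Counter


def normalize(vec):
    """Scale a sparse vector (dict) to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec.values()))
    return {t: x / norm for t, x in vec.items()} if norm else dict(vec)


def dot(a, b):
    """Dot product of two sparse vectors; iterates over the smaller one."""
    if len(a) > len(b):
        a, b = b, a
    return sum(x * b.get(t, 0.0) for t, x in a.items())


def kmeans(docs, k, iterations=10):
    """Spherical K-Means on unit-length term-frequency vectors (cosine similarity)."""
    vecs = [normalize(Counter(d.lower().split())) for d in docs]
    # Farthest-first initialization (an assumption): start from the first document,
    # then repeatedly add the document least similar to every chosen centroid.
    centroids = [vecs[0]]
    while len(centroids) < k:
        centroids.append(min(vecs, key=lambda v: max(dot(v, c) for c in centroids)))
    assign = []
    for _ in range(iterations):
        # Assignment step: each document joins its most similar centroid.
        assign = [max(range(k), key=lambda c: dot(v, centroids[c])) for v in vecs]
        # Update step: each centroid becomes the normalized sum of its members.
        for c in range(k):
            total = Counter()
            for v, a in zip(vecs, assign):
                if a == c:
                    total.update(v)
            if total:
                centroids[c] = normalize(total)
    return assign, centroids


def topic_keywords(centroid, n=3):
    """Highest-weighted terms of a centroid, ties broken alphabetically."""
    ranked = sorted(centroid.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:n]]
```

On four toy documents (two about ontology construction, two about news visualization), the sketch separates the two groups and labels them with their dominant terms.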

Triplet Extraction from Sentences

In this paper we present an approach to extracting subject-predicate-object triplets from English sentences. To begin with, four different well-known syntactic parsers for English are used to generate parse trees from the sentences, followed by extraction of triplets from the parse trees using parser-dependent techniques.
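The abstract leaves the parser-dependent extraction rules unspecified. As an illustration only, here is a hypothetical heuristic over a constituency tree encoded as nested `(label, child, ...)` tuples with Penn Treebank-style tags: the subject is the first noun in the sentence's NP, the predicate is the verb of the deepest VP, and the object is the first noun (or adjective) in a following NP, PP or ADJP. It is not claimed to reproduce any of the four parsers' actual procedures.

```python
# Phrases are (label, child, child, ...); leaves are (POS tag, word).
# Penn Treebank-style tags (NN*, VB*, JJ*) are assumed.


def is_leaf(node):
    return isinstance(node[1], str)


def is_noun(node):
    return is_leaf(node) and node[0].startswith("NN")


def is_verb(node):
    return is_leaf(node) and node[0].startswith("VB")


def is_adjective(node):
    return is_leaf(node) and node[0].startswith("JJ")


def child(node, label):
    """First direct child carrying the given label, or None."""
    return next((c for c in node[1:] if c[0] == label), None)


def bfs_first(node, predicate):
    """Breadth-first search for the first node satisfying predicate."""
    queue = [node]
    while queue:
        current = queue.pop(0)
        if predicate(current):
            return current
        if not is_leaf(current):
            queue.extend(current[1:])
    return None


def extract_triplet(sentence):
    """Hypothetical heuristic returning (subject, predicate, object) words, or None."""
    np_, vp = child(sentence, "NP"), child(sentence, "VP")
    if np_ is None or vp is None:
        return None
    subject = bfs_first(np_, is_noun)
    # Descend through nested VPs ("is used by ...") to the deepest one.
    while child(vp, "VP") is not None:
        vp = child(vp, "VP")
    verb = next((c for c in vp[1:] if is_verb(c)), None)
    obj = None
    for c in vp[1:]:
        if c[0] in ("NP", "PP"):
            obj = bfs_first(c, is_noun)
        elif c[0] == "ADJP":
            obj = bfs_first(c, is_adjective)
        if obj is not None:
            break
    if subject is None or verb is None or obj is None:
        return None
    return subject[1], verb[1], obj[1]
```

Note that for a passive such as "Wikipedia is used by researchers" this heuristic returns the main verb ("used") rather than the full verb group; a real extractor would need to decide how to handle auxiliaries.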

Visualization of News Articles

This paper presents a system for visualization of large amounts of news stories. In the first phase, the news stories are preprocessed for the purpose of named-entity extraction. Next, a graph of relationships between the extracted named entities is created, where each named entity represents one vertex in the graph and two named entities are connected if they …
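The abstract is cut off before it states when two entities are linked. A minimal sketch of the graph-building phase, assuming entity extraction has already produced one collection of names per story and assuming, for illustration only, that two entities are connected when they appear in the same story, with the number of shared stories as the edge weight:

```python
from collections import Counter
from itertools import combinations


def entity_graph(stories, min_weight=1):
    """Weighted co-occurrence graph of named entities.

    stories: iterable of collections of entity names, one per news story
             (named-entity extraction is assumed to be done already).
    Returns {(entity_a, entity_b): weight} with entity_a < entity_b, where the
    weight counts the stories mentioning both (an assumed linking criterion).
    """
    edges = Counter()
    for entities in stories:
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return {pair: w for pair, w in edges.items() if w >= min_weight}


def neighbours(edges, entity):
    """Entities directly connected to `entity` (its adjacent vertices)."""
    return {b if a == entity else a for a, b in edges if entity in (a, b)}
```

Raising `min_weight` is one simple way to thin out a dense graph before drawing it; whether the system does anything similar is not stated in the abstract.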

Feature Selection Using Linear Support Vector Machines

Text categorization is the task of classifying natural language documents into a set of predefined categories. Documents are typically represented by sparse vectors under the vector space model, where each word in the vocabulary is mapped to one coordinate axis and its occurrence in the document gives rise to one nonzero component in the vector representing that document. When training …
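The vector space representation described above can be sketched with plain dictionaries: each vocabulary word gets one axis, and a document stores only the axes of the words it contains, so absent words cost nothing. Raw occurrence counts as component values and lowercase whitespace tokenization are assumptions here; weighting schemes such as TF-IDF are common alternatives.

```python
def build_vocabulary(docs):
    """Map each distinct word to its own coordinate axis (an integer index)."""
    vocab = {}
    for doc in docs:
        for word in doc.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def to_sparse(doc, vocab):
    """Sparse vector {axis: count}; only words that occur get a nonzero entry."""
    vec = {}
    for word in doc.lower().split():
        axis = vocab.get(word)
        if axis is not None:  # words outside the vocabulary are ignored
            vec[axis] = vec.get(axis, 0) + 1
    return vec
```

For example, with the vocabulary built from "Text categorization of text" and "support vector machines", the first document becomes `{0: 2, 1: 1, 2: 1}`: three nonzero components out of six axes.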

Learning sub-structures of document semantic graphs for document summarization

In this paper we present a method for summarizing documents by creating a semantic graph of the original document and identifying the substructure of such a graph that can be used to extract sentences for a document summary. We start with deep syntactic analysis of the text and, for each sentence, extract logical form triples, subject-predicate-object. We then apply cross-sentence …
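The graph-construction step can be sketched as follows, assuming the logical form triples are already extracted and tagged with the index of their source sentence: subjects and objects become nodes, predicates label the edges, and each edge remembers the sentences it came from, so a selected substructure maps back to candidate summary sentences. The cross-sentence processing and the learned selection of the substructure are not shown; the node set in the example is chosen by hand, and the function names are hypothetical.

```python
from collections import defaultdict


def build_semantic_graph(triples):
    """triples: (sentence_id, subject, predicate, object) tuples.

    Returns (adjacency, edge_sentences): an undirected adjacency map over
    subject/object nodes, and {(subject, predicate, object): {sentence ids}}.
    """
    adjacency = defaultdict(set)
    edge_sentences = defaultdict(set)
    for sid, subj, pred, obj in triples:
        adjacency[subj].add(obj)
        adjacency[obj].add(subj)
        edge_sentences[(subj, pred, obj)].add(sid)
    return adjacency, edge_sentences


def sentences_for_subgraph(nodes, edge_sentences):
    """Sorted ids of sentences whose triple lies entirely inside `nodes`."""
    return sorted({
        sid
        for (subj, _pred, obj), sids in edge_sentences.items()
        if subj in nodes and obj in nodes
        for sid in sids
    })
```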

Feature selection using support vector machines

Text categorization is the task of classifying natural language documents into a set of predefined categories. Documents are typically represented by sparse vectors under the vector space model, where each word in the vocabulary is mapped to one coordinate axis and its occurrence in the document gives rise to one nonzero component in the vector representing that document. When training …

Semi-Automatic Data-Driven Ontology Construction System

Information Systems, 2000

In this paper we present a new version of the OntoGen system for semi-automatic data-driven ontology construction. The system is based on a novel ontology learning framework which formalizes and extends the role of the machine learning and text mining algorithms used in the previous version. New features include an extended number of supported ontology formats (RDFS and OWL), supervised methods …

Inductive Learning Applied to Program Construction and Verification

Artificial Intelligence from the Information Processing Perspective, 1992

Visualization of Text Document Corpus

Informatica (Slovenia), 2005

From the automated text processing point of view, natural language is very redundant in the sense that many different words share a common or similar meaning. For a computer this can be hard to understand without some background knowledge. Latent Semantic Indexing (LSI) is a technique that helps in extracting some of this background knowledge from a corpus of text documents. This …
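The abstract stops before describing the computation. LSI is usually described as a truncated singular value decomposition of the term-document matrix A. The dependency-free sketch below assumes raw term counts, whitespace tokens, and power iteration with deflation on A^T A in place of a library SVD; it illustrates the redundancy effect the paragraph describes: two documents with no word in common ("car" and "automobile") end up close in the reduced space because their words co-occur with "engine" elsewhere in the corpus.

```python
import math


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def top_eigenpairs(m, k, iterations=200):
    """Top-k eigenpairs of a symmetric positive semidefinite matrix
    via power iteration with deflation."""
    m = [row[:] for row in m]  # work on a copy
    n = len(m)
    pairs = []
    for _ in range(k):
        v = [1.0 / math.sqrt(n)] * n
        lam = 0.0
        for _ in range(iterations):
            w = matvec(m, v)
            lam = math.sqrt(sum(x * x for x in w))
            if lam == 0.0:
                break
            v = [x / lam for x in w]
        pairs.append((lam, v))
        # Deflation: subtract lam * v v^T so the next pass finds the next pair.
        for i in range(n):
            for j in range(n):
                m[i][j] -= lam * v[i] * v[j]
    return pairs


def lsi_document_vectors(docs, k):
    """Rank-k LSI coordinates (sigma_i * v_i) of each document."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    # Term-document count matrix A: one row per term, one column per document.
    a = [[toks.count(t) for toks in tokenized] for t in vocab]
    n = len(docs)
    # A^T A is docs x docs; its eigenvectors are A's right singular vectors
    # and its eigenvalues are the squared singular values.
    ata = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    pairs = top_eigenpairs(ata, k)
    return [[math.sqrt(lam) * v[i] for lam, v in pairs] for i in range(n)]


def cosine(x, y):
    dot = sum(p * q for p, q in zip(x, y))
    nx = math.sqrt(sum(p * p for p in x))
    ny = math.sqrt(sum(q * q for q in y))
    return dot / (nx * ny) if nx and ny else 0.0
```

Power iteration is used only to keep the sketch self-contained; for realistic corpora a sparse SVD routine from a numerical library would be the usual choice.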

Interaction of feature selection methods and linear classification models

International Conference on Machine Learning, 2002

In this paper we explore the effects of various feature selection algorithms on document classification performance. We propose to use two, possibly distinct, linear classifiers: one used exclusively for feature selection in order to obtain the feature space for training the second classifier, possibly on a different training set. The resulting classifier is used to classify new documents. Experiments show that …
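A minimal sketch of the two-stage idea, assuming a plain perceptron as a stand-in for the linear SVM that the related papers above use (both yield a weight vector w), sparse dictionary vectors, and ranking features by |w_f|. Stage one ranks and selects features; stage two retrains on the reduced feature space. For brevity both stages use the same toy data here, whereas the abstract allows a different training set for the second classifier.

```python
def score(w, b, x):
    """Linear score b + w . x for a sparse dict vector x."""
    return b + sum(w.get(f, 0.0) * v for f, v in x.items())


def train_perceptron(examples, epochs=20):
    """Linear classifier (stand-in for a linear SVM) on sparse dict vectors.

    examples: list of (features: dict, label: +1 or -1).
    """
    w, b = {}, 0.0
    for _ in range(epochs):
        for x, y in examples:
            if y * score(w, b, x) <= 0:  # misclassified or on the boundary
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + y * v
                b += y
    return w, b


def select_features(w, m):
    """Keep the m features with the largest |weight| (ties broken by name)."""
    ranked = sorted(w, key=lambda f: (-abs(w[f]), f))
    return set(ranked[:m])


def project(examples, keep):
    """Restrict every example to the selected feature space."""
    return [({f: v for f, v in x.items() if f in keep}, y) for x, y in examples]


def predict(w, b, x):
    return 1 if score(w, b, x) > 0 else -1
```

In the toy data used for testing, the uninformative word "the" appears in every document, ends up with weight zero after stage one, and is dropped before stage two.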

Feature Selection for Unbalanced Class Distribution and Naive Bayes

International Conference on Machine Learning, 1999

This paper describes an approach to feature subset selection that takes into account problem specifics and learning algorithm characteristics. It is developed for the Naive Bayesian classifier applied to text data, since it combines well with the addressed learning problems. We …
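The abstract is truncated before it names a scoring function. As one possibility, here is a sketch assuming the odds ratio measure, which rewards words that are frequent in the (small) positive class and rare in the negative class, with Laplace-smoothed document-frequency estimates of P(word | class). The smoothing choice and the function names are assumptions.

```python
import math


def odds_ratio_scores(docs, labels):
    """Log odds ratio of each word for the positive class.

    docs: list of sets of words; labels: list of bools (True = positive class).
    P(word | class) is estimated from document frequencies with add-one smoothing.
    """
    pos = [d for d, y in zip(docs, labels) if y]
    neg = [d for d, y in zip(docs, labels) if not y]
    scores = {}
    for word in set().union(*docs):
        p = (sum(word in d for d in pos) + 1) / (len(pos) + 2)
        q = (sum(word in d for d in neg) + 1) / (len(neg) + 2)
        scores[word] = math.log((p * (1 - q)) / ((1 - p) * q))
    return scores


def top_features(scores, m):
    """The m highest-scoring words (ties broken alphabetically)."""
    return sorted(scores, key=lambda w: (-scores[w], w))[:m]
```

With two positive and four negative toy documents, "svm" (in both positives, no negatives) scores log 15 and ranks first, while words typical of the negative class get negative scores.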

A Survey of Ontology Evaluation Techniques

An ontology is an explicit formal conceptualization of some domain of interest. Ontologies are increasingly used in various fields such as knowledge management, information extraction, and the semantic web. Ontology evaluation is the problem of assessing a given ontology from the point of view of a particular criterion or application, typically in order to determine which of several ontologies would …

Word sequences as features in text-learning

@INPROCEEDINGS{Mladenic98wordsequences, author = {Dunja Mladenic and Marko Grobelnik}, title = {Word Sequences as Features in Text-Learning}, booktitle = {In …

Feature Selection for Classification Based on Text Hierarchy

This paper describes automatic document categorization based on a large text hierarchy. We handle the large number of features and training examples by taking into account the hierarchical structure of examples and using feature selection for large text data. We experimentally evaluate feature subset selection on real-world text data collected from the existing Web hierarchy named Yahoo. In our learning experiments a naive Bayesian classifier was used …
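To make the last sentence concrete, here is a minimal multinomial naive Bayesian classifier restricted to a selected feature subset, with Laplace smoothing. The category names, the toy documents and the hand-picked feature set are hypothetical, and the hierarchical structure the paper exploits is not modeled: each category is treated as a flat class.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels, features):
    """Multinomial naive Bayes over a selected feature subset (Laplace smoothing)."""
    doc_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        # Words outside the selected feature subset are simply dropped.
        word_counts[label].update(w for w in doc.lower().split() if w in features)
    model = {}
    for label, n_docs in doc_counts.items():
        total = sum(word_counts[label].values())
        log_prior = math.log(n_docs / len(docs))
        log_lik = {
            w: math.log((word_counts[label][w] + 1) / (total + len(features)))
            for w in features
        }
        model[label] = (log_prior, log_lik)
    return model


def classify(model, doc):
    """Return the category with the highest posterior log-probability."""
    words = doc.lower().split()

    def log_posterior(label):
        log_prior, log_lik = model[label]
        return log_prior + sum(log_lik[w] for w in words if w in log_lik)

    return max(model, key=log_posterior)
```

Restricting the likelihood tables to the selected subset is what keeps the model small when the vocabulary of a large hierarchy runs into many thousands of words.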
