Michał Korzycki | AGH University of Science and Technology

Papers by Michał Korzycki

System for Web information monitoring

2013 International Conference on Computer Applications Technology (ICCAT), 2013

Web monitoring is an important task for crime detection and crime investigation. This paper describes the rationale, functionality and architecture of the MPI system, which was developed for Web information monitoring using advanced semantic scripts based on the Conceptual Dependency model.

Can the human association norm evaluate machine-made association lists

For more than three decades, there has been a commonly shared belief that word occurrences retrieved from a large text collection may define the lexical meaning of a word. Although there are some suggestions that co-occurrences retrieved from texts reflect the text’s contiguities, there are also suggestions that algorithms such as LSA are unable to distinguish between co-occurrences that are corpus-independent semantic dependencies (elements of a semantic prototype) and co-occurrences that are corpus-dependent factual dependencies. We adopt the second view to show that existing statistical algorithms use mechanisms that improperly filter word co-occurrences retrieved from texts. To test this supposition, we compare the human association list with the association lists retrieved from a text by three different algorithms: the Church–Hanks algorithm, Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA).

A Dictionary based Stemming Mechanism for Polish

In this paper we present and evaluate a robust stemming mechanism for Polish. We use the Polish Inflection Dictionary to build a Rule Based Stemmer and a Generative Reversed Rule Stemmer. Combining both stemmers into the described Hybrid Stemmer yields a high-precision stemming mechanism that is able to match human performance. This claim is supported by an experiment, the results of which are presented.

List of Authors

Cognitive Approach to Natural Language Processing

Extracting Semantic Prototypes and Factual Information from a Large Scale Corpus Using Variable Size Window Topic Modelling

Proceedings of the 2014 Federated Conference on Computer Science and Information Systems, 2014

This paper proposes a model of textual events composed of a mixture of semantic stereotypes and factual information. It introduces a method that automatically distinguishes semantic prototypes of a general nature, which describe general categories of events, from factual elements specific to a given event. The paper then presents the results of an experiment in unsupervised topic extraction performed on documents from a large-scale corpus with an additional temporal structure. The experiment compares the nature of the information provided by Latent Dirichlet Allocation with that provided by Vector Space modelling based on Log-Entropy weights. The impact of using different time windows of the corpus on the results of topic modelling is presented. Finally, the paper discusses whether unsupervised topic modelling may reflect deeper semantic information, such as elements describing a given event or its causes and results, and distinguish it from purely factual data.

A Compile-Time Deadlock Detection Pattern

Computer Science, Apr 20, 2013

The paper presents the application of the trait technique in generic programming for compile-time deadlock detection and prevention in multithreaded applications.

Does Topic Modelling Reflect Semantic Prototypes?

Advances in Intelligent Systems and Computing, 2015

Extraction and Application of Geolocalized Dictionaries

14th SGEM GeoConference on INFORMATICS, GEOINFORMATICS AND REMOTE SENSING, 2014

MPI - system for Web information monitoring

Latent Semantic Analysis Evaluation of Conceptual Dependency Driven Focused Crawling

Communications in Computer and Information Science, 2012

In this paper we study a focused crawler driven by deep semantic analysis provided by the Conceptual Dependency (CD) theory. We test in practice the application of CD scripts as an approach to defining topics (queries) in a focused crawler and its robustness in evaluating real text structures extracted from HTML documents. In order to benchmark its efficiency in comparison to classical approaches, apart from human evaluation we also provide an evaluation of the result set based on its internal similarity using Latent Semantic Analysis ( ...
