Michał Korzycki | AGH University of Science and Technology
Papers by Michał Korzycki
2013 International Conference on Computer Applications Technology (ICCAT), 2013
Web monitoring is an important task for crime detection and crime investigation. This paper describes the rationale, functionality and architecture of the MPI system, which was developed for Web information monitoring using advanced semantic scripts grounded in the Conceptual Dependency model.
Abstract: For more than three decades, there has been a commonly shared belief that word occurrences retrieved from a large text collection may define the lexical meaning of a word. Although there are some suggestions that co-occurrences retrieved from texts reflect the text’s contiguities, there also exist suggestions that algorithms such as LSA are unable to distinguish between co-occurrences which are corpus-independent semantic dependencies (elements of a semantic prototype) and co-occurrences which are corpus-dependent factual dependencies. We shall adopt the second view to show that existing statistical algorithms use mechanisms which improperly filter word co-occurrences retrieved from texts. To prove this supposition, we shall compare the human association list to the association list retrieved from a text by three different algorithms, namely the Church–Hanks algorithm, the Latent Semantic Analysis (LSA) algorithm and the Latent Dirichlet Allocation (LDA) algorithm.
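Of the three algorithms compared, the Church–Hanks association ratio is the simplest to illustrate. The sketch below is an illustrative assumption, not the paper's implementation: it scores an ordered pair (x, y) as log2(P(x, y) / (P(x) P(y))), where f(x, y) counts how often y follows x within a window of words. The function name `association_ratios` and the default window of five words are choices made here for illustration.

```python
import math
from collections import Counter


def association_ratios(tokens, window=5):
    """Church-Hanks style association ratio for ordered word pairs.

    Illustrative sketch: f(x, y) counts occurrences of y among the
    `window` words that follow x; probabilities are relative counts
    over the token total n.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    pairs = Counter()
    for i, x in enumerate(tokens):
        # Count every word y that follows x within the window (order matters).
        for y in tokens[i + 1 : i + 1 + window]:
            pairs[(x, y)] += 1
    scores = {}
    for (x, y), f_xy in pairs.items():
        p_xy = f_xy / n
        p_x = unigrams[x] / n
        p_y = unigrams[y] / n
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores
```

Sorting the pairs that start with a given word by score yields a corpus-derived association list of the kind that, per the abstract, is compared against human association lists.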
In this paper we present and evaluate a robust stemming mechanism for Polish. We use the Polish Inflection Dictionary to build a Rule Based Stemmer and a Generative Reversed Rule Stemmer. Combining both stemmers in the form of the Hybrid Stemmer described here provides a high-precision stemming mechanism that is able to match human performance. This claim is supported by an experiment whose results are presented.
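The general idea of combining a dictionary-driven stemmer with a rule-based fallback can be sketched as follows. This is a simplified, hypothetical illustration and not the authors' Hybrid Stemmer: the tiny `INFLECTION_DICT` stands in for the Polish Inflection Dictionary, and the suffix rules in `SUFFIX_RULES` are invented examples rather than rules generated from it.

```python
# Hypothetical stand-in for the Polish Inflection Dictionary (inflected form -> lemma).
INFLECTION_DICT = {"kotami": "kot", "psa": "pies"}

# Hypothetical suffix rules (inflected ending -> lemma ending), used only as a fallback.
SUFFIX_RULES = [("ami", ""), ("ach", ""), ("owi", ""), ("y", "")]


def hybrid_stem(word, dictionary=INFLECTION_DICT, rules=SUFFIX_RULES):
    """Dictionary lookup first; fall back to the longest matching suffix rule."""
    word = word.lower()
    if word in dictionary:
        # Exact dictionary hit: handles stem alternations such as psa -> pies.
        return dictionary[word]
    # Try longer suffixes first and keep at least two characters of stem.
    for suffix, replacement in sorted(rules, key=lambda r: len(r[0]), reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: len(word) - len(suffix)] + replacement
    return word
```

The dictionary-first ordering reflects the intuition that a lexicon is precise but incomplete, while rules cover unseen words at some cost in precision. For example, `hybrid_stem("psa")` returns `"pies"` from the dictionary, while `hybrid_stem("domami")` falls back to the `-ami` rule and returns `"dom"`.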
Cognitive Approach to Natural Language Processing
Proceedings of the 2014 Federated Conference on Computer Science and Information Systems, 2014
In this paper a model of textual events composed of a mixture of semantic stereotypes and factual information is proposed. A method is introduced that automatically distinguishes general semantic prototypes, which describe general categories of events, from factual elements specific to a given event. Next, the paper presents the results of an experiment in unsupervised topic extraction performed on documents from a large-scale corpus with an additional temporal structure. The experiment compares the nature of the information provided by Latent Dirichlet Allocation and by Vector Space modelling based on Log-Entropy weights. The impact of using different time windows of the corpus on the results of topic modelling is presented. Finally, the paper discusses whether unsupervised topic modelling can reflect deeper semantic information, such as elements describing a given event or its causes and results, and discern it from pure factual data.
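For reference, one common formulation of Log-Entropy weighting multiplies a local weight log(1 + tf_ij) by a global weight g_i = 1 + Σ_j p_ij log p_ij / log n, where p_ij = tf_ij / gf_i, gf_i is the total count of term i, and n is the number of documents. The sketch below assumes that formulation and natural logarithms; the paper may use a different variant, and `log_entropy` is a name chosen here for illustration.

```python
import math
from collections import Counter


def log_entropy(docs):
    """Return one {term: weight} dict per document using Log-Entropy weights."""
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]
    # gf: total frequency of each term across the whole collection.
    gf = Counter()
    for c in counts:
        gf.update(c)
    global_w = {}
    for term, total in gf.items():
        # Sum of p * log(p) over the documents containing the term (always <= 0).
        ent = 0.0
        for c in counts:
            tf = c.get(term, 0)
            if tf:
                p = tf / total
                ent += p * math.log(p)
        # Evenly spread terms get a weight near 0, concentrated terms near 1.
        global_w[term] = (1.0 + ent / math.log(n_docs)) if n_docs > 1 else 1.0
    return [{t: math.log(1 + tf) * global_w[t] for t, tf in c.items()} for c in counts]
```

Under this weighting, a term spread evenly across every document receives a global weight of zero, which is one reason such vectors emphasise event-specific vocabulary.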
Computer Science, Apr 20, 2013
The paper presents the application of the trait technique in generic programming to compile-time deadlock detection and prevention in multithreaded applications.
Advances in Intelligent Systems and Computing, 2015
14th SGEM GeoConference on INFORMATICS, GEOINFORMATICS AND REMOTE SENSING, 2014
Communications in Computer and Information Science, 2012
In this paper we study a focused crawler driven by deep semantic analysis provided by the Conceptual Dependency (CD) theory. We test in practice the use of CD scripts as a way of defining topics (queries) in a focused crawler, and their robustness in evaluating real text structures extracted from HTML documents. In order to benchmark its efficiency in comparison to classical approaches, apart from human evaluation we also provide an evaluation of the result set based on its internal similarity using Latent Semantic Analysis ( ...
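One simple way to quantify the internal similarity of a crawled result set is the mean pairwise cosine similarity of its documents. The sketch below is an assumption for illustration: it works on raw term-frequency vectors and omits the LSA step (projecting the vectors into a reduced SVD space) that the abstract mentions. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def internal_similarity(docs):
    """Mean pairwise cosine similarity of tokenised documents (0.0 if < 2 docs)."""
    vecs = [Counter(d) for d in docs]
    pairs = list(combinations(vecs, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)
```

A more topically coherent result set should score closer to 1.0. In a fuller setup, the term vectors would first be reduced with LSA so that documents sharing related but non-identical vocabulary also count as similar.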