Petr Berka - Academia.edu
Papers by Petr Berka
Biomolecules
Acute heart failure (AHF) is a common and severe condition with a poor prognosis. Its course is often complicated by worsening renal function (WRF), exacerbating the outcome. The population of AHF patients experiencing WRF is heterogenous, and some novel possibilities for its analysis have recently emerged. Clustering is a machine learning (ML) technique that divides the population into distinct subgroups based on the similarity of cases (patients). Given that, we decided to use clustering to find subgroups inside the AHF population that differ in terms of WRF occurrence. We evaluated data from the three hundred and twelve AHF patients hospitalized in our institution who had creatinine assessed four times during hospitalization. Eighty-six variables evaluated at admission were included in the analysis. The k-medoids algorithm was used for clustering, and the quality of the procedure was judged by the Davies–Bouldin index. Three clinically and prognostically different clusters were d...
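The clustering step named in this abstract can be illustrated with a small sketch. It is not the authors' pipeline: the alternating update, the farthest-first seeding, the Euclidean distance, the centroid-based form of the Davies-Bouldin index, and the toy points are all assumptions chosen for illustration, and the points are assumed to be distinct.

```python
import math

def euclid(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def farthest_first(points, k):
    """Deterministic seeding (an assumption): start at point 0, then
    repeatedly add the point farthest from the medoids chosen so far."""
    chosen = [0]
    while len(chosen) < k:
        chosen.append(max(
            range(len(points)),
            key=lambda i: min(euclid(points[i], points[m]) for m in chosen),
        ))
    return chosen

def assign(points, medoids):
    """Label each point with the position of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: euclid(p, points[medoids[c]]))
        for p in points
    ]

def k_medoids(points, k, max_iter=100):
    """Alternating k-medoids; returns (medoid indices, labels)."""
    medoids = farthest_first(points, k)
    for _ in range(max_iter):
        labels = assign(points, medoids)
        updated = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            # New medoid: the member with the smallest total distance to the rest.
            updated.append(min(
                members,
                key=lambda i: sum(euclid(points[i], points[j]) for j in members),
            ))
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(points, medoids)

def davies_bouldin(points, labels, k):
    """Davies-Bouldin index (centroid form); lower means better separation."""
    clusters = [[p for p, lab in zip(points, labels) if lab == c] for c in range(k)]
    centroids = [tuple(sum(col) / len(cl) for col in zip(*cl)) for cl in clusters]
    scatter = [
        sum(euclid(p, ctr) for p in cl) / len(cl)
        for cl, ctr in zip(clusters, centroids)
    ]
    worst = [
        max(
            (scatter[i] + scatter[j]) / euclid(centroids[i], centroids[j])
            for j in range(k) if j != i
        )
        for i in range(k)
    ]
    return sum(worst) / k

# Toy data: two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
medoids, labels = k_medoids(points, k=2)
score = davies_bouldin(points, labels, k=2)
print(labels, round(score, 3))
```

One common way to use the index is to run the clustering for several values of k and keep the value with the lowest score; whether the study did exactly that is not stated in the abstract.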
Biomedicines
Acute heart failure (AHF) is a life-threatening, heterogeneous disease requiring urgent diagnosis and treatment. The clinical severity and medical procedures differ according to a complex interplay between the deterioration cause, underlying cardiac substrate, and comorbidities. This study aimed to analyze the natural phenotypic heterogeneity of the AHF population and evaluate the possibilities offered by clustering (unsupervised machine-learning technique) in a medical data assessment. We evaluated data from 381 AHF patients. Sixty-three clinical and biochemical features were assessed at the admission of the patients and were included in the analysis after the preprocessing. The K-medoids algorithm was implemented to create the clusters, and optimization, based on the Davies-Bouldin index, was used. The clustering was performed while blinded to the outcome. The outcome associations were evaluated using the Kaplan-Meier curves and Cox proportional-hazards regressions. The algorithm ...
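For readers unfamiliar with the Kaplan-Meier curves mentioned in this abstract, the following is a minimal sketch of the product-limit estimator. The function name, the input layout, and the five follow-up records are hypothetical; the study's own data and software are not described in the abstract, and the Cox regression step is not covered here.

```python
def kaplan_meier(durations, events):
    """Product-limit estimate of the survival function.

    durations: follow-up time for each patient.
    events: True if the event was observed at that time, False if censored.
    Returns a list of (time, survival probability) at each observed event time.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored cases leave the risk set after time t.
        at_risk -= removed
    return curve

# Hypothetical follow-up records: two patients are censored (False).
curve = kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False])
for time, prob in curve:
    print(time, round(prob, 3))
```

Comparing such curves between clusters is one way to see whether the groups differ in outcome.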
A number of stochastic models for modeling time series data can be found in the literature. Among them, models based on the log-normal distribution are more traditional, while models using the Johnson SB or Johnson SU distributions were introduced more recently. We present basic properties of the above-mentioned distributions and discuss their usability to model economic data. Data concerning the wages of more than two million Czech employees collected for more than twenty years are used for the comparison.
The paper summarizes the conception, data preparation and result evaluation of the LDMC, which has been organized in connection with the DMoLD'13 - Data Mining on Linked Data Workshop, Prague, September 23 (as part of the ECML/PKDD conference program).
We present two different approaches to knowledge acquisition from data, which have been developed and implemented at the Dept. of Information and Knowledge Engineering of the Prague University of Economics. The KEX approach is based on the idea of acquiring a PROSPECTOR-like knowledge base in the form of weighted rules. The RS approach is based on a variation of the rough sets theory. The report describes both algorithms and applications of program packages that have been built upon them. Both systems are then compared using the so-called MONK problems, an artificial task that has been previously used for a comparison of various learning algorithms.
Encyclopedia of Information Science and Technology, Third Edition
The article covers basic principles of data mining, i.e. basic tasks and application areas, and the knowledge discovery life cycle according to the CRISP-DM methodology. It also gives basic information about text mining and web mining.
Encyclopedia of Information Science and Technology, Third Edition
Metadata, Analysis and Interaction, 2011
In this chapter the focus is on demonstrating how different reasoning algorithms can be applied in multimedia analysis. Extracting semantics from images and videos has proved to be a very difficult task. On the other hand, artificial intelligence has made significant progress, especially in the area of knowledge technologies. Knowledge representation and reasoning form a research area that has been chosen by many researchers to enable the interpretation of the content of an image scene or a video shot. The rich theoretical ...
Pozvanie do znalostnej spoločnosti (Invitation to the Knowledge Society). Cyberculture and the internet, anti-utopian visions and critical remarks, the functions of art in the knowledge society, media art: art for the knowledge society?
... 978-80-227-2827-0. FIIT STU Bratislava, Ústav informatiky a softvérového inžinierstva, 2008. ... Goldmann, L., Samour, A., Cobet, A., Sikora, T., Praks, P.: K-Space at TRECVid ... NIST TRECVid 2007 - Text REtrieval Conference TRECVid Workshop, Gaithersburg, MD, 5-6 November ...
HORÁKOVÁ, Jana - KELEMEN, Jozef - BERKA, Petr - BUREŠ, Vladimír - HVORECKÝ, Jozef - MIKULECKÝ, Peter. Pozvanie do znalostnej spoločnosti. Vyd. 1. Bratislava: Iura Edition (Wolters Kluwer), 2007. 265 pp. Iura Edition. ISBN 978-80-8078-149-1. ... Pozvanie do znalostnej spoločnosti. Kyberkultúra a internet, ...
Journal of Intelligent Information Systems
In most biomedical research paper corpora, document classification is a crucial task. Especially given the global epidemic, researchers across a variety of fields need to identify the relevant scientific papers accurately and quickly from a flood of biomedical research papers. Classification can also assist learners or researchers in assigning a research paper to an appropriate category and help them find the relevant research paper within a very short time. A biomedical document classifier needs to be designed differently from a "general" text classifier because it does not depend only on the text itself (i.e. on titles and abstracts) but can also utilize other information, such as entities extracted using medical taxonomies or bibliometric data. The main objective of this research was to find out which types of information or features, and which representation methods, influence the biomedical document classification task. For this reason, we ran several experiments on conventional text classification methods with different kinds of features extracted from the titles, abstracts, and bibliometric data. These procedures include data cleaning, feature engineering, and multi-class classification. Eleven different variants of input data tables were created and analyzed using ten machine learning algorithms. We also evaluate the data efficiency and interpretability of these models as essential features of any biomedical research paper classification system, specifically for handling the COVID-19 related health crisis. Our major findings are that TF-IDF representations outperform the entity extraction methods and that the abstract itself provides sufficient information for correct classification. Of the machine learning algorithms used, the best performance over various forms of document representation was achieved by Random Forest and Neural Network (BERT). Our results lead to a concrete guideline for practitioners on biomedical document classification.
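As a rough illustration of the TF-IDF representation this abstract refers to, the sketch below computes one common weighting variant (relative term frequency times the natural log of N/df). The whitespace tokenizer, the weighting formula, and the three example titles are assumptions; the abstract does not say which variant or tooling the study used.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = term count / number of tokens in the document
    idf = ln(number of documents / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

# Hypothetical paper titles, used only to exercise the function.
titles = [
    "heart failure cohort",
    "heart rhythm study",
    "document classification study",
]
vectors = tf_idf(titles)
print(sorted(vectors[0].items(), key=lambda kv: -kv[1]))
```

Terms that occur in fewer documents receive larger weights, so in the first title "failure" and "cohort" outweigh "heart", which also appears in the second title. The resulting vectors could then be fed to any conventional classifier.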
It has become a good habit to organize a data mining cup, a competition or a challenge at machine learning or data mining conferences. The main idea of the Discovery Challenge, organized at the European Conferences on Principles and Practice of Knowledge Discovery in Databases since 1999, was to encourage a collaborative research effort rather than a competition between data miners. Different data sets have been used for the Discovery Challenge workshops during the seven years. The paper summarizes our experience gained when organizing and evaluating the Discovery Challenge on the atherosclerosis risk factor data. (P. Berka, J. Rauch, M. Tomečková)
Information about the category (type) of a WWW page can be helpful for the user within search, filtering, as well as navigation tasks. We propose a multidimensional categorisation scheme, with the bibliographic dimension as the primary one. We examine the possibilities and limits of performing such categorisation based on information extracted from the URL, which is particularly useful for certain on-line applications such as meta-search or navigation support. In addition, we describe the problem of ambiguity of URL terms, and suggest a method for partially overcoming it by means of machine learning. As a side effect, we show that general purpose WWW search engines can be used for providing input data for both human and computational analysis of the web. 1 Introduction The task of document categorisation is common within web applications, in particular for navigational, search and filtering systems, which give access to large amounts of documents. The aim of the categorisation may be . to e...
In this paper, we propose the application of rule-based reasoning for knowledge-assisted image segmentation and object detection. A region merging approach is proposed based on fuzzy labeling and not on visual descriptors, while reasoning is used in the evaluation of dissimilarity between adjacent regions according to rules applied on local information.
Proceedings of the 22nd International Scientific Conference on Applications of Mathematics and Statistics in Economics (AMSE 2019), 2019
Comput. Informatics, 2016
The paper describes some practical aspects of using LISp-Miner for data mining. LISp-Miner is a software tool that is under development at the University of Economics, Prague. We will review the different types of knowledge patterns discovered by the system, and discuss their applicability for various data mining tasks. We also compare LISp-Miner 18.16 with Weka 3.6.9 and Rapid Miner 5.3.