Peter Piros | Óbuda University
Papers by Peter Piros
Knowledge Based Systems, Sep 1, 2019
The objective of the current study is to compare the relative performance of decision tree, neural network, and logistic regression models for predicting 30-day and 1-year mortality in a real-world, unfiltered dataset (n = 47,391) of patients hospitalized with acute myocardial infarction. Area under the ROC curve (AUC) was used to evaluate the performance of each learning algorithm. For 30-day mortality, we achieved an average AUC of 0.788 for decision tree models, 0.837 for neural network models and 0.836 for regression models on the training set (on validation sets: 0.774, 0.835 and 0.834, respectively). For 1-year mortality, the averages were 0.754 for decision tree models, 0.8194 for neural network models and 0.8191 for regression models (on validation sets: 0.743, 0.8179 and 0.8176, respectively). Differences between neural network and regression models were non-significant, but both significantly outperformed decision trees. The machine learning methods investigated in the present study could not outperform traditional regression modelling for mortality prediction in myocardial infarction.
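As an illustration of the evaluation metric named in this abstract, the following is a minimal sketch of ROC AUC computed from its pairwise-ranking definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are assumptions made for this example; they are not taken from the study, which does not describe its implementation here.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 outcomes (1 = event, e.g. death within the window)
    scores: iterable of predicted risks, same length as labels
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, for illustration only).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))  # 0.75
```

The quadratic loop is kept for readability; on a dataset of the size reported above, a rank-based formulation or a library routine would be the practical choice.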
IEEE Conference Proceedings, 2016
Acta Polytechnica Hungarica, 2023
In the current study, we present a new approach to predicting 30-day and 1-year mortality of patients hospitalized with acute myocardial infarction. The dataset originates from the Hungarian Myocardial Infarction Registry (HUMIR), a full, real-world, unfiltered database of myocardial infarctions from 2014 to 2016 (n = 47,391). The new approach is based on ensembling and combines the predictions of different models (some of which are themselves ensembles), such as Random Forest, General Boosting Machine, Neural Network and Generalized Linear Model. We previously presented several other modelling techniques with the same target on the same dataset, and this new ensemble-based approach proved to be the best among them. In numbers, this means a ROC AUC (area under the receiver operating characteristic curve) of 0.856 for 30-day mortality and 0.839 for 1-year mortality, both measured on validation datasets. We conclude that the combination of machine learning algorithms and regression models yields the best performance in mortality prediction on the HUMIR dataset.
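The abstract does not state how the base models' predictions are combined. Purely as a sketch of the general idea, and assuming the simplest possible rule (an unweighted mean of predicted probabilities), an ensemble step might look like the following. The function name `average_ensemble` and the example numbers are hypothetical and do not reproduce the authors' method.

```python
def average_ensemble(prob_lists):
    """Combine per-model predicted probabilities by unweighted averaging.

    prob_lists: list of lists; prob_lists[m][i] is model m's predicted
    probability of the event for patient i. All lists must be equally long.
    ASSUMPTION: simple mean; the source does not specify the combination rule.
    """
    if not prob_lists:
        raise ValueError("at least one model's predictions are required")
    n = len(prob_lists[0])
    if any(len(p) != n for p in prob_lists):
        raise ValueError("all models must predict for the same patients")
    # zip(*...) groups the predictions for each patient across models.
    return [sum(per_patient) / len(prob_lists) for per_patient in zip(*prob_lists)]


# Hypothetical predictions from two base models for three patients.
model_a = [0.0, 1.0, 0.5]
model_b = [0.5, 0.5, 0.5]
print(average_ensemble([model_a, model_b]))  # [0.25, 0.75, 0.5]
```

Weighted averaging or stacking with a meta-learner are common alternatives; which of these, if any, matches the published approach cannot be determined from the abstract alone.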
New Trends in Software Methodologies, Tools and Techniques, 2017
Nowadays, several databases store information about patients and diseases, but only a few exist that focus directly on myocardial events and treatments. This paper is divided into two parts. In the first part, we list and summarize the ongoing European myocardial projects: the Myocardial Ischaemia National Audit Project (MINAP) in England, the Swedish Web-system for Enhancement and Development of Evidence-based care in Heart disease Evaluated According to Recommended Therapies (SWEDEHEART) in Sweden, and the National Registry of Acute Myocardial Infarction in Switzerland (AMIS Plus). Where possible, we discuss the validity and accuracy of the stored data. In the second part, we introduce the history and legal environment of the Hungarian Myocardial Infarction Registry (HUMIR), and some research results that were achieved with the help of the information in the Hungarian registry.
2019 IEEE 13th International Symposium on Applied Computational Intelligence and Informatics (SACI)
In this comparative study, the authors investigated whether a difference exists in the predictive power of decision tree models tuned with different resampling methods. K-fold cross validation, repeated cross validation and bootstrap were used to find the optimal parameters for each model on the dataset of the Hungarian Myocardial Infarction Registry. The target variable was 1-year mortality, and the differences were measured in 10 different cases with different numbers of records on randomly selected, real-world datasets. Results show that a relatively small difference exists, and an order can be established between the resampling methods: cross validation slightly outperformed repeated cross validation, and both had better results than models trained with bootstrap.
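To make the three resampling schemes compared in this abstract concrete, here is a minimal sketch of how each one generates train/held-out index splits. The function names, the fold counts and the use of out-of-bag records as the bootstrap held-out set are assumptions chosen for illustration; the abstract does not give the study's actual settings.

```python
import random


def k_fold_splits(n, k, rng):
    """Yield (train, held_out) index lists; each record is held out exactly once."""
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k near-equal folds
    for i in range(k):
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, folds[i]


def repeated_k_fold_splits(n, k, repeats, rng):
    """Repeat k-fold with a fresh shuffle each time, giving k * repeats splits."""
    for _ in range(repeats):
        yield from k_fold_splits(n, k, rng)


def bootstrap_splits(n, rounds, rng):
    """Yield (train, held_out): train is n draws with replacement,
    held_out is the out-of-bag records never drawn in that round."""
    for _ in range(rounds):
        train = [rng.randrange(n) for _ in range(n)]
        in_bag = set(train)
        held_out = [j for j in range(n) if j not in in_bag]
        yield train, held_out


rng = random.Random(42)  # fixed seed so the splits are reproducible
print(len(list(k_fold_splits(100, 10, rng))))              # 10 splits
print(len(list(repeated_k_fold_splits(100, 10, 3, rng))))  # 30 splits
print(len(list(bootstrap_splits(100, 25, rng))))           # 25 splits
```

In a tuning loop, each candidate parameter value would be scored on the held-out part of every split and the scores averaged; the schemes differ only in how those splits are drawn.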
Applied Sciences, 2021
Data science and machine learning are buzzwords of the early 21st century. Now pervasive through human civilization, how do these concepts translate to use by researchers and clinicians in the life-science and medical field? Here, we describe a software toolkit, just large enough in scale that it can be maintained and extended by a small team, optimised for problems that arise in small/medium laboratories. In particular, this system may be managed by a single person, from data ingestion, through statistics and preparation, to predictions. At the system's core is a graph-type database, which makes it flexible in handling the irregular, constantly changing data types that are common during explorative research. At the system's outermost shell, the concept of 'user stories' is introduced to help the end-user researchers perform various tasks separated by their expertise: these range from simple data input, data curation, statistics, and finally to predictions via machine learning algorithm...
2020 IEEE 20th International Symposium on Computational Intelligence and Informatics (CINTI), 2020
In this paper, we present new predictive modelling results achieved with Generalized Boosted Models (GBM) on the dataset of the Hungarian Myocardial Infarction Registry (n = 47,391). We investigated patients hospitalized with acute myocardial infarction from two aspects, namely 30-day and 1-year mortality. The ROC AUC values of our new models for predicting 30-day mortality were 0.847 and 0.839 (training and validation set), while for the 1-year models these were 0.828 and 0.821, respectively. These performance values represent a strong and stable learner with almost the same predictive power as our previously published random forest models.
2020 IEEE 15th International Conference of System of Systems Engineering (SoSE), 2020
The objective of the current study is to compare how well two tree-based machine learning algorithms can predict 30-day and 1-year mortality of patients hospitalized with acute myocardial infarction. The two algorithms were decision tree and random forest, and the source of the dataset is the Hungarian Myocardial Infarction Registry (n = 47,391). We found that the ROC AUC values of the Random Forest models for predicting 30-day mortality were 0.843 and 0.847 (training and validation set), while for the 1-year models these were 0.835 and 0.836, respectively. These numbers mean that the Random Forest models were at least 5-6% better than the decision tree models, and in some cases the improvement was above 9%.
2016 IEEE 11th International Symposium on Applied Computational Intelligence and Informatics (SACI), 2016
Publishing information about university buildings in Linked Data format allows the information to be freely consumed and aggregated with other data sources, filtered and delivered via multiple channels and devices to potential users: students, lecturers and visitors of universities. In this paper, the loc ontology for indoor navigation is introduced and its usage is demonstrated by publishing the Óbuda University indoor data as LOD. Best practices for indoor modeling of a building are presented, and the possible use of the data is demonstrated by outlining some SPARQL queries for the navigation features of future applications.
2017 IEEE 15th International Symposium on Applied Machine Intelligence and Informatics (SAMI), 2017
Óbuda University wanted to publish public information related to the university in the form of Linked Open Data. Information about subjects, courses, events and teachers was successfully published using the related standards. We had to collect and categorize the sources where the information comes from; define a way for each category to make this information available; and develop algorithms and software to generate the gathered information in the form of Linked Open Data. Finally, the automation of the generation process was partly achieved.
2016 IEEE 11th International Symposium on Applied Computational Intelligence and Informatics (SACI), 2016