Branko Kavšek | University of Primorska
Papers by Branko Kavšek
Managing Global Transitions, 2017
Stock market analysis is one of the biggest areas of interest for text mining. Many researchers have proposed different approaches that use text information for predicting the movement of stock market indices. Many of these approaches focus either on maximising the predictive accuracy of the model or on devising alternative methods for model evaluation. In this paper, we propose a more descriptive approach focusing on the models themselves, trying to identify the individual words in the text that most affect the movement of stock market indices. We use data from two sources (for the past eight years): the daily data for the Dow Jones Industrial Average index ('open' and 'close' values for each trading day) and the headlines of the 25 most-voted news items on the Reddit WorldNews Channel for the previous 'trading days'. By applying machine learning algorithms to these data and analysing individual words that appear in the final predictive models, we find that the words gay, propaganda and massacre are typically associated with a daily increase of the stock index, while the word iran mostly coincides with its decrease. While this work presents a first step towards qualitative analysis of stock market models, there is still plenty of room for improvement.
Applied Sciences, 2022
Indoor Air Quality monitoring is a major asset to improving quality of life and building management. Today, the evolution of embedded technologies allows the implementation of such monitoring on the edge of the network. However, several concerns need to be addressed related to data security and privacy, routing and sink placement optimization, protection from external monitoring, and distributed data mining. In this paper, we describe an integrated framework that features distributed storage, blockchain-based Role-based Access Control, onion routing, routing and sink placement optimization, and distributed data mining to answer these concerns. We describe the organization of our contribution and show its relevance with simulations and experiments over a set of use cases.
Procedia Computer Science, 2020
Associative classification is a machine learning approach that aims to build accurate, effective and compact classification models (classifiers) by combining paradigms from classification and association rule mining. Research studies show that associative classification approaches can achieve higher accuracy than some of the traditional classification methods. In this paper, we propose a simple and accurate classification method that selects "strong" class association rules which contribute most to improving the overall coverage of the classifier. The advantage of our proposed classifier is that it generates reasonably fewer rules on bigger datasets compared to traditional rule-based classifiers. We also discuss how the overall coverage of such classifiers affects their classification accuracy. We performed experiments on 15 real-life datasets from the UCI Machine Learning Database Repository and compared our proposed associative classifier with 8 other well-known classification algorithms on accuracy and the number of classification rules (all differences were tested for statistical significance). Experimental results show that our proposed method is comparable to other well-known classification algorithms on accuracy: it achieved the fourth-highest average accuracy (82.7%) among all classification methods, and it tends to outperform the other algorithms in terms of the average number of rules (especially on bigger datasets). Although it does not achieve the best results in terms of classification accuracy, our approach is relatively simple and produces a compact and understandable classifier by exhaustively searching the entire example space.
Advances in Methodology and Statistics, 2004
Rule learning is typically used in solving classification and prediction tasks. However, learning of classification rules can also be adapted to subgroup discovery. Such an adaptation has already been done for the CN2 rule learning algorithm. In previous work this new algorithm, called CN2-SD, has been described in detail and applied to the well-known UCI data sets. This paper summarizes the modifications needed for the adaptation of the CN2 rule learner to subgroup discovery and presents its application to a real-life data set - the UK traffic data - confirming its appropriateness for subgroup discovery in real-life applications through experimental comparison with the CN2 rule learning algorithm as well as through the evaluation of an expert. Furthermore, we make the first step towards the comparison of the new CN2-SD algorithm to another state-of-the-art subgroup discovery algorithm, SubgroupMiner, by applying both algorithms to a slightly different data set - the UK traffic challen...
József Balogh: On some geometric applications of the container method
Béla Csaba: Embedding graphs having Ore-degree at most five
Dezső Miklós: On the vertex and edge sign balances of (hyper)graphs
István Miklós: The swap Markov chain is rapidly mixing on the realizations of linearly bounded degree sequences
Coffee break
SESSION 11.00-12.30 Chair: Andrej Brodnik
Rolf Niedermeier: Fixed-parameter tractability inside P
Benedek Nagy: On 5'-3' Watson-Crick finite and pushdown automata
Moritz von Looz: Parallel mesh (re)partitioning with balanced k-means
Sándor Szabó: Cliques and differential equations
Lunch
SESSION 14.00-16.00 Chair: József Békési
Nysret Musliu: Improving the efficiency of dynamic programming on tree decompositions via machine learning
Branko Kavšek: Development and evaluation of different models for predicting tourist category from texts
Csaba Raduly-Baka: Discrete structures in access road design
Journal of Machine Learning Research, 2004
This paper investigates how to adapt standard classification rule learning approaches to subgroup discovery. The goal of subgroup discovery is to find rules describing subsets of the population tha...
Building accurate and compact classifiers in real-world applications is one of the crucial tasks in data mining nowadays. In this paper, we propose a new method that can reduce the number of class association rules produced by classical class association rule classifiers, while maintaining an accurate classification model that is comparable to the ones generated by state-of-the-art classification algorithms. More precisely, we propose a new associative classifier that selects “strong” class association rules based on overall coverage of the learning set. The advantage of the proposed classifier is that it generates significantly fewer rules on bigger datasets compared to traditional classifiers while maintaining the classification accuracy. We also discuss how the overall coverage of such classifiers affects their classification accuracy. Performed experiments measuring classification accuracy, number of classification rules and other relevance measures such as precision, recall a...
Today we are used to being interconnected via our smartphones and having our phone location tracked by different apps. ICT enables real-time monitoring and processing of user location data from the GPS coordinates of a phone. Based on observed user mobility, Artificial Intelligence methods can be used to improve transportation, proactively provide mobility recommendations and acquire knowledge using the user context. This paper describes the application of machine learning algorithms to user mobility data to identify and understand potentially interesting events. The data for this research was collected from a sample of users consenting to be monitored through our in-house developed smartphone app. A pilot study that includes 227 users that were tracked over a period of 7 years yields fairly positive evaluation results in terms of predictive accuracy of identified events but succeeds in identifying exclusively “well-known” events related to users going to or coming fro...
Machine Learning, Optimization, and Data Science, 2019
Existing classification rule learning algorithms mainly use greedy heuristic search to find regularities in datasets for classification. In recent years, extensive research on association rule mining has been performed in the machine learning community on learning rules by using exhaustive search. The main objective is to find all rules in the data that satisfy the user-specified minimum support and minimum confidence constraints. Although the whole set of rules may not be used directly for accurate classification, effective and efficient classifiers have been built using these so-called classification association rules.
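The minimum support and minimum confidence constraints mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `rule_stats` and `filter_rules`, the toy data, and the thresholds are all hypothetical, and it assumes the common definitions in which the support of a rule is the fraction of all examples that match both its antecedent and its class, and the confidence is the fraction of examples matching the antecedent that also have that class.

```python
def rule_stats(examples, antecedent, label):
    """Return (support, confidence) of the rule `antecedent -> label`.

    `examples` is a list of (items, class_label) pairs, where `items`
    is a frozenset of attribute values (a hypothetical encoding).
    """
    antecedent = frozenset(antecedent)
    # Class labels of all examples whose items contain the antecedent.
    covered = [cls for items, cls in examples if antecedent <= items]
    hits = sum(1 for cls in covered if cls == label)
    support = hits / len(examples) if examples else 0.0
    confidence = hits / len(covered) if covered else 0.0
    return support, confidence


def filter_rules(examples, candidates, min_support, min_confidence):
    """Keep only candidate rules that satisfy both user-specified thresholds."""
    kept = []
    for antecedent, label in candidates:
        support, confidence = rule_stats(examples, antecedent, label)
        if support >= min_support and confidence >= min_confidence:
            kept.append((frozenset(antecedent), label, support, confidence))
    return kept


# Toy data, invented purely for illustration.
toy = [
    (frozenset({"sunny", "hot"}), "no"),
    (frozenset({"sunny", "mild"}), "no"),
    (frozenset({"rainy", "mild"}), "yes"),
    (frozenset({"rainy", "hot"}), "yes"),
]
candidates = [({"sunny"}, "no"), ({"hot"}, "yes")]
strong = filter_rules(toy, candidates, min_support=0.3, min_confidence=0.8)
```

With these toy thresholds only the rule `sunny -> no` survives: it matches two of the four examples and is correct for both, whereas `hot -> yes` is correct for only one of the two examples it covers. An exhaustive miner would enumerate the candidate rules itself; here they are supplied by hand to keep the sketch short.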
Computer Science and Information Systems, 2020
Huge amounts of data are being collected and analyzed nowadays. By using the popular rule-learning algorithms, the number of rules discovered on those “big” datasets can easily exceed thousands. To produce compact, understandable and accurate classifiers, such rules have to be grouped and pruned, so that only a reasonable number of them are presented to the end user for inspection and further analysis. In this paper, we propose new methods that are able to reduce the number of class association rules produced by “classical” class association rule classifiers, while maintaining an accurate classification model that is comparable to the ones generated by state-of-the-art classification algorithms. More precisely, we propose new associative classifiers, called DC, DDC and CDC, that use distance-based agglomerative hierarchical clustering as a post-processing step to reduce the number of their rules, and in the rule-selection step, we use different strategies (based on database coverage an...
StuCoSReC. Proceedings of the 2018 5th Student Computer Science Research Conference, Oct 1, 2018
It is no longer necessary to emphasize that computer science is omnipresent today. Terms such as digitalization, Industry 4.0 etc. are frequently heard in the mass media and are becoming a part of our everyday life. The rapid advances in computer science require a large investment in research and development. In particular, it is important to educate young scientists, which is one of the objectives of the StuCoSReC conference. In front of you are the proceedings of the fifth StuCoSReC conference. StuCoSReC is intended for master's and PhD students to show their research achievements. Socializing is also important and the conference is an excellent opportunity for students to get to know each other. The conference connects students of computer science and goes beyond the invisible limits of faculties. The uniqueness of the StuCoSReC conference is that it is organized each year by a different higher education institution. The University of Ljubljana, Faculty of Computer and Information Science, is proud to host the conference this year. Twelve papers were addressed to this conference, covering several topics in computer science. All the papers were reviewed by two international reviewers and accepted for oral presentation. ref. prof. dr. Nikolaj Zimic. Cataloguing-in-publication (CIP) record prepared by the National and University Library in Ljubljana. COBISS.
Geografski vestnik, 2015
Sašo Moškon, Harphasea, d. o. o. Koper, Čevljarska ulica 8, SI-6000 Koper, Slovenia; sasom@harphasea.si; dr. Janez Žibert, University of Primorska, Faculty of Mathematics, Natural Sciences and Information Technologies, Glagoljaška 8, SI-6000 Koper, Slovenia; janez.zi
Lecture Notes in Computer Science, 2015
Terms of service (ToS) are becoming a ubiquitous part of online account creation. There is a general understanding that users rarely read them and do not particularly care about binding themselves into legally enforceable contracts with online service providers. Some services are trying to change this trend by presenting ToS sections as key points on a dedicated ToS page. However, little is known about how such a presentation of key points would affect the continuation of user registration at the time of account creation. This paper provides an exploratory study in this area. We offered users the chance to participate in a prize draw in exchange for their names and email addresses. For this purpose we created three registration forms: a standard form with ToS hidden behind a hyperlink and two with ToS key points presented at the time of account creation with different engagement requirements. Initial results suggest that ToS key points presented just as a list at the time of account creation are no more engaging than a form with ToS hidden behind a link. More text even led several users to complete the registration more quickly than the users with the standard form. Moreover, different designs of the ToS key points list requiring different user engagement affect the interaction with and reading of ToS key points, but the actual time spent on ToS is very low.
Applied Artificial Intelligence
This paper reports on data mining experiences of the 5th Framework project Data Mining and Decision Support for Business Competitiveness: A European Virtual Enterprise (Sol-Eu-Net). The data mining lessons learned are reported from the following perspectives: application results, business, views of Sol-Eu-Net partners acquired by interview technique, and lessons learned in two particular data mining projects: analysis of Web education materials and UK traffic accident data analysis.
Lecture Notes in Computer Science, 2005
This paper proposes a selection of knowledge technologies for health care planning and decision support in regional-level management of Slovenian public health care. Data mining and statistical techniques were used to analyze databases collected by a regional Public Health Institute. Specifically, we addressed the problem of directing patients from primary health care centers to specialists. Decision support tools were used for resource modeling in terms of availability and accessibility of public health services for the population. Specifically, we analyzed organisational aspects of public health resources in one Slovenian region (Celje) with the goal of identifying the areas that are atypical in terms of availability and accessibility of public health services.
2007 29th International Conference on Information Technology Interfaces, 2007
The paper presents a way to overcome the shortcomings of traditional learning by enforcing collaboration between students and introducing self-assessment as part of the process of final grade formation. Treating collaboration and self-assessment as two elements of a modern learning process that are very closely bound together, the authors argue that these elements should by no means replace the traditional (ex-cathedra) way of learning but rather extend it. A specifically designed computer science course is presented as an illustration of how the introduction of self-assessment combined with teacher evaluation can encourage collaboration between students. The benefits and drawbacks of this method are discussed.