E. Menasalvas - Academia.edu
Papers by E. Menasalvas
... and DESC is the set of class descriptions. DESA ??? AK ??? is the smallest set such that V ARA ??? DESA (atomic object descriptions), and if D1,D2 ??? DESA, then D1 ??? D2 ??? DESA. DESE ??? AK ??? is the smallest set such that V ...
Lecture Notes in Computer Science, 2003
Recent work in supervised learning has shown that a surprisingly simple Bayesian classifier that assumes conditional independence among features given the class, called naïve Bayes, is competitive with state-of-the-art classifiers. In this paper a new naïve Bayes classifier called Interval Estimation naïve Bayes is proposed. Interval Estimation naïve Bayes works in two phases. In the first phase, an interval estimate is computed for each probability needed to specify the naïve Bayes classifier. In the second phase, the best combination of values inside these intervals is found with a heuristic search guided by the accuracy of the classifiers. The values found in the search are the new parameters of the naïve Bayes classifier. Our new approach has been shown to be quite competitive with simple naïve Bayes. Experimental tests have been carried out on 21 data sets from the UCI repository.
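The two phases described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which interval estimator or which heuristic search is used, so the sketch assumes a Wald (normal-approximation) interval and a plain random search, searches only the feature likelihoods (class priors stay fixed), and does not renormalize the sampled values. All function names and the toy data are hypothetical.

```python
import math
import random


def wald_interval(count, total, z=1.96):
    """Normal-approximation interval for the proportion count/total, clipped to [0, 1]."""
    p = count / total
    half = z * math.sqrt(p * (1 - p) / total)
    return max(0.0, p - half), min(1.0, p + half)


def fit_intervals(data):
    """Phase 1: point estimates and intervals for P(feature j = v | class c)."""
    classes = sorted({y for _, y in data})
    n_feat = len(data[0][0])
    values = sorted({v for x, _ in data for v in x})
    priors, points, intervals = {}, {}, {}
    for c in classes:
        rows = [x for x, y in data if y == c]
        priors[c] = len(rows) / len(data)
        for j in range(n_feat):
            for v in values:
                count = sum(1 for x in rows if x[j] == v)
                points[(c, j, v)] = count / len(rows)
                intervals[(c, j, v)] = wald_interval(count, len(rows))
    return priors, points, intervals


def predict(priors, likelihoods, x):
    """Naive Bayes decision rule: class maximizing prior times product of likelihoods."""
    def score(c):
        s = priors[c]
        for j, v in enumerate(x):
            s *= likelihoods[(c, j, v)]
        return s
    return max(priors, key=score)


def accuracy(priors, likelihoods, data):
    return sum(predict(priors, likelihoods, x) == y for x, y in data) / len(data)


def search(priors, points, intervals, data, iters=200, seed=0):
    """Phase 2 (assumed random search): sample values inside the intervals, keep the best."""
    rng = random.Random(seed)
    best, best_acc = dict(points), accuracy(priors, points, data)
    for _ in range(iters):
        cand = {k: rng.uniform(lo, hi) for k, (lo, hi) in intervals.items()}
        acc = accuracy(priors, cand, data)
        if acc > best_acc:
            best, best_acc = cand, acc
    return best, best_acc


# Hypothetical toy data: (feature tuple, class label).
toy = [((0, 1), "a"), ((0, 0), "a"), ((1, 1), "b"),
       ((1, 0), "b"), ((0, 1), "b"), ((1, 1), "a")]
priors, points, intervals = fit_intervals(toy)
tuned, tuned_acc = search(priors, points, intervals, toy)
```

Because the search starts from the point estimates and only accepts strict improvements, the tuned accuracy on the evaluation data can never fall below that of the plain estimates. In practice the accuracy guiding the search would be measured on held-out data rather than on the training set used here.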
Lecture Notes in Computer Science, 2004
Web-based commerce systems fail to achieve many of the features that enable small businesses to develop a friendly human relationship with customers. Although many enterprises have turned to user identification to solve the problem, the solution goes far beyond trying to find out what navigators' behavior looks like. Many approaches have recently been proposed to enrich the data in web logs with semantics related to the business so that web mining algorithms can later be applied to discover patterns and trends. In this paper we present an innovative method of log enrichment in which several goals and viewpoints of the organization owning the site are taken into account. By later applying discriminant analysis to the information enriched this way, it is possible to identify the relevant factors that contribute most to the success of a session for each viewpoint under consideration. The method also helps to estimate ongoing session value in terms of how the company's objectives and expectations are being achieved.
Lecture Notes in Computer Science, 2002
... that different domains (i.e. educational, business, administration, government) can be distinguished in the web, the proposed methodology will ... To identify features that increase pages affability will certainly include features usually taken into consideration in web sites design. ...
Proceedings IEEE/WIC International Conference on Web Intelligence (WI 2003), 2003
Personalized recommender systems can be classified into three main categories: content-based systems, mostly used to make suggestions depending on the text of web documents; collaborative filtering systems, which use ratings from many users to suggest a document or an action to a given user; and hybrid solutions. Collaborative filtering algorithms include the naïve Bayes classifier and some of its variants. However, the results of these classifiers can be improved, as we demonstrate through experimental results, with our new semi-naïve Bayes approach based on intervals. In this work we present this new approach.
Information Visualization, 2014
Most visualization techniques have traditionally used two-dimensional, instead of three-dimensional, representations to visualize multidimensional and multivariate data. In this article, a way to demonstrate the underlying superiority of three-dimensional over two-dimensional representation is proposed. Specifically, it is based on the inevitable quality degradation produced when reducing the data dimensionality. The problem is tackled from two different approaches: a visual and an analytical approach. First, a set of statistical tests (point classification, distance perception, and outlier identification) using the two-dimensional and three-dimensional visualizations is carried out on a group of 40 users. The results indicate that the inclusion of a third dimension improves accuracy; however, these results do not allow definitive conclusions to be drawn on the superiority of three-dimensional representation. Therefore, in order to draw ...
Lecture Notes in Computer Science, 2003
Web mining is a broad term that has been used to refer to the process of information discovery from Web sources: content, structure, and usage. Information collected by web servers and kept in the server log is the main source of data for analyzing user navigation patterns. Nevertheless, knowing the most frequent user paths is not enough: it is necessary to integrate web mining with the company's site goals in order to make sites more competitive. The concept of Web Goal Mining is introduced in this paper to refer to the process of discovering information about the relationship between site visitors and sponsor goals.
Proceedings IEEE/WIC International Conference on Web Intelligence (WI 2003), 2003
The amazing evolution of e-commerce and the fierce competitive environment it has produced have encouraged commercial firms to apply intelligent methods to gain an advantage over competitors by gathering and analyzing information collected from consumer Web sessions. Knowledge about user objectives and session goals can be discovered from the information collected regarding user activities, as tracked by Web clicks. Most current approaches to customer behaviour analysis study the user session by examining only Web page accesses. Finding out about navigators' behaviour is crucial to Web site sponsors attempting to evaluate the performance of their sites. Nevertheless, knowing the current navigation patterns is not always enough. Very often it is also necessary to measure session value from the perspective of business goals. We present two different measures to include business goals in click stream analysis. Each of the alternatives is discussed and evaluated in terms of how the company's objectives and expectations are taken into account as well as how this approach could be achieved.
Describing a complex system is in many ways a problem akin to identifying an object, in that it involves defining boundaries, constituent parts and their relationships by the use of grouping laws. Here we propose a novel method which extends the use of complex networks theory to a generalized class of non-Gestaltic systems, taking the form of collections of isolated, possibly heterogeneous, scalars, e.g. sets of biomedical tests. The ability of the method to unveil relevant information is illustrated for the case of gene expression in the response to osmotic stress of Arabidopsis thaliana. The most important genes turn out to be the nodes with highest centrality in appropriately reconstructed networks. The method allows predicting a set of 15 genes whose relationship with such stress was previously unknown in the literature. The validity of such predictions is demonstrated by means of a target experiment, in which the predicted genes are artificially induced one by one, and the growth of the corresponding phenotypes turns out to feature statistically significant differences when compared to that of the wild type.
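The idea of ranking nodes by centrality in a reconstructed network can be illustrated with a short sketch. It assumes the network is already available as an undirected edge list and uses degree centrality, which is only one of several centrality measures; the abstract does not say which measure or which reconstruction procedure the authors use. The node labels and function names are hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Fraction of the other nodes each node is directly connected to (undirected graph)."""
    neighbours = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbours[u].add(v)
            neighbours[v].add(u)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(nb) / (n - 1) for node, nb in neighbours.items()}


def top_nodes(edges, k):
    """The k most central nodes, ties broken alphabetically for reproducibility."""
    c = degree_centrality(edges)
    return sorted(c, key=lambda node: (-c[node], node))[:k]


# Hypothetical toy network; labels are placeholders, not real gene identifiers.
edges = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g2", "g3"), ("g4", "g5")]
ranking = top_nodes(edges, 2)
```

In this toy network `g1` is linked to three of the four other nodes, so it ranks first; the remaining candidates tie and are ordered by name.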
IFIP Advances in Information and Communication Technology, 2014
With the ever-increasing availability of massive data sets describing complex systems, i.e. systems composed of a plethora of elements interacting in a non-linear way, complex networks have emerged as powerful tools for characterizing these structures of interactions in a mathematical way. In this contribution, we explore how different Data Mining techniques can be adapted to improve such characterization. Specifically, we describe novel techniques for optimizing network representations of different data sets, automating the extraction of relevant topological metrics, and using such metrics toward the synthesis of high-level knowledge. The validity and usefulness of such an approach are demonstrated through the analysis of medical data sets describing groups of control subjects and patients. Finally, the application of these techniques to other social and technological problems is discussed.
Diabetes Technology & Therapeutics, 2010
In Latin America, public health systems that manage and warrant the health of the population lack mechanisms and technological capabilities that enable them to accept and adopt initiatives focused on guiding, looking after, and improving the quality of life of millions of patients with diabetes who need attention and special care. However, the proposal presented here for a holistic, interactive, and persuasive model to facilitate self-care of diabetes patients (hiPAPD) is the first proposal in Panama, Central America, and the Caribbean Region to develop and implement information communications technology (ICT) platforms for the care of patients with chronic diseases such as diabetes. The process of experimentation was initiated with an agreement with all the staff of the project to comply with the international stipulations for biomedical studies, taking as reference the Declaration of Helsinki of the World Medical Association (Recommendations to Guide Doctors in Biomedical Research on People). After several months of evaluation and ongoing work the study obtained successful validation of the hiPAPD model. The project had the support of 107 patients with diabetes, their families, friends, doctors, nurses and nursing assistants, and social groups in rural communities. Finally, the project contributed to society with a highly innovative ICT environment that facilitates self-care of diabetes patients without financial resources and health. A timely health treatment at a decisive moment may be the difference in care for patients.
Through the validation process conducted in this research initiative, it was demonstrated that the hiPAPD model, from the perspective of the patient with diabetes, relatives, friends, health workforce (nurses and nursing assistants), doctors, and societal contexts, allowed the improvement of the quality of life of patients with diabetes in poor rural zones of Panama.
Lecture Notes in Computer Science, 2002
When data mining first appeared, several disciplines related to data analysis, like statistics or artificial intelligence, were combined toward a new topic: extracting significant patterns from data. The original data sources were small datasets and, therefore, traditional machine learning techniques were the most common tools for these tasks. As the volume of data grew, these traditional methods were reviewed and extended with the knowledge of experts working in the field of data management and databases. Today's problems are even bigger than before and, once again, a new discipline allows researchers to scale up to these data. This new discipline is distributed and parallel processing. In order to use parallel processing techniques, specific factors about the mining algorithms and the data should be considered. Nowadays, there are several new parallel algorithms, which in most cases are extensions of a traditional centralized algorithm. Many of these algorithms have common core parts and differ only in distribution schema, parallel coordination or load/task balancing methods. We call these groups algorithm families. In this paper we introduce a methodology to implement algorithm families. This methodology is founded on the MOIRAE distributed control architecture. In this work we show how this architecture allows researchers to design parallel processing components that can dynamically change their behavior according to some control policies.
2005 IEEE International Conference on Granular Computing, 2005
... Dr. Santiago Eibe Dr. Pilar ... Virtual Environments and Data Mining) with the background of the rest of the people involved in the group (Data Mining, Data Bases, Data Warehousing, Heuristic Optimization, Intelligent Agents and Multi-Agents Systems and Grid Computing). ...
Drift detection methods in data streams can detect changes in incoming data so that learned models can be used to represent the underlying population. In many real-world scenarios context information is available and could be exploited to improve existing approaches, by detecting or even anticipating recurring concepts in the underlying population. Several applications, among them health-care and recommender systems, lend themselves to using such information, since data from sensors is available but is not being used. Nevertheless, new challenges arise when integrating context with drift detection methods. Modeling and comparing context information, representing the context-concepts history and storing previously learned concepts for reuse are some of the critical problems. In this work, we propose the Context-aware Learning from Data Streams (CALDS) system to improve existing drift detection methods by exploiting available context information. Our enhancement is seamless: we use the association between context information and learned concepts to improve detection and adaptation to drift when concepts reappear. We present and discuss our preliminary experimental results with synthetic and real datasets.
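The association between context and learned concepts can be pictured with a minimal sketch. It is not the CALDS system: the abstract does not specify the drift detector or how contexts are compared, so the sketch assumes a sliding-window error-rate detector and exact matching of context attributes. Class names, parameters and the example context are hypothetical.

```python
from collections import deque


class ErrorRateDriftDetector:
    """Signals drift when the error rate over a full sliding window exceeds a threshold."""

    def __init__(self, window=20, threshold=0.5):
        self.errors = deque(maxlen=window)
        self.threshold = threshold

    def update(self, was_error):
        """Record one prediction outcome; return True if drift is signalled."""
        self.errors.append(1 if was_error else 0)
        full = len(self.errors) == self.errors.maxlen
        return full and sum(self.errors) / len(self.errors) > self.threshold


class ConceptStore:
    """Remembers which learned model was in use under which context."""

    def __init__(self):
        self._by_context = {}

    def remember(self, context, model):
        # Contexts are flat dicts with hashable values; exact match is an assumption.
        self._by_context[frozenset(context.items())] = model

    def recall(self, context):
        """Return the model previously stored for this context, or None."""
        return self._by_context.get(frozenset(context.items()))


# Hypothetical usage: when drift is signalled, try to reuse a stored concept.
store = ConceptStore()
store.remember({"location": "home"}, "model_home")
detector = ErrorRateDriftDetector(window=4, threshold=0.5)
signals = [detector.update(e) for e in [False, False, True, True, True]]
reused = store.recall({"location": "home"}) if signals[-1] else None
```

When the detector fires, a stored model for the current context is reused if one exists; otherwise a new model would have to be learned from the incoming data.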
Abstract. In data stream classification the problem of recurring concepts is a special case of concept drift where the underlying concepts may reappear. Several methods have been proposed to learn in the presence of concept drift, but few consider recurring concepts and context ...
Recent advances in ubiquitous devices open an opportunity to apply new data stream mining techniques to support intelligent decision making in the next generation of ubiquitous applications. This paper motivates and describes a novel Context-aware Collaborative data stream mining system, CC-Stream, that allows intelligent mining and classification of time-changing data streams on board ubiquitous devices. CC-Stream exploits the knowledge available in other ubiquitous devices to improve local classification accuracy. Such knowledge is associated with context information that captures the system state for a particular underlying concept. CC-Stream uses an ensemble method where the classifiers are selected and weighted based on their local accuracy for different partitions of the instance space and their context similarity in relation to the current context.
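A weighted vote of this kind can be sketched as follows. The abstract does not give the weighting formula, so this sketch assumes that each classifier's weight is the product of its local accuracy in the relevant partition and a simple context similarity (the fraction of matching context attributes). It is an illustration only, not CC-Stream itself; all names and values are hypothetical.

```python
from collections import defaultdict


def context_similarity(a, b):
    """Fraction of context attributes on which two contexts agree (assumed measure)."""
    keys = set(a) | set(b)
    if not keys:
        return 1.0
    return sum(1 for k in keys if a.get(k) == b.get(k)) / len(keys)


def weighted_vote(members, x, region, current_context):
    """Each member votes for its prediction with weight local accuracy * context similarity."""
    scores = defaultdict(float)
    for m in members:
        weight = (m["local_accuracy"].get(region, 0.0)
                  * context_similarity(m["context"], current_context))
        scores[m["predict"](x)] += weight
    return max(scores, key=scores.get)


# Hypothetical ensemble members, e.g. models received from other devices.
members = [
    {"predict": lambda x: "walk", "local_accuracy": {"r1": 0.9},
     "context": {"location": "home"}},
    {"predict": lambda x: "run", "local_accuracy": {"r1": 0.6},
     "context": {"location": "gym"}},
    {"predict": lambda x: "run", "local_accuracy": {"r1": 0.7},
     "context": {"location": "gym"}},
]
at_gym = weighted_vote(members, x=None, region="r1", current_context={"location": "gym"})
at_home = weighted_vote(members, x=None, region="r1", current_context={"location": "home"})
```

With the toy members above, the most accurate classifier is ignored at the gym because its context does not match, and it dominates at home for the same reason.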