Antonio Gomariz Peñalver - Academia.edu
Papers by Antonio Gomariz Peñalver
Springer eBooks, 2014
Sequential pattern mining is a popular data mining task with wide applications. However, it may present too many sequential patterns to users, which makes the results difficult to comprehend. As a solution, it was proposed to mine maximal sequential patterns, a compact representation of the set of sequential patterns, which is often several orders of magnitude smaller than the set of all sequential patterns. However, the task of mining maximal patterns remains computationally expensive. To address this problem, we introduce a vertical mining algorithm named VMSP (Vertical mining of Maximal Sequential Patterns). It is, to our knowledge, the first vertical mining algorithm for mining maximal sequential patterns. An experimental study on five real datasets shows that VMSP is up to two orders of magnitude faster than the current state-of-the-art algorithm.
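To make the notion of a maximal sequential pattern concrete, here is a minimal Python sketch. It is not VMSP: it is a naive post-processing filter that keeps only those frequent patterns that are not contained in another frequent pattern. The function names and the toy pattern list are illustrative assumptions, and each sequence element is simplified to a single item rather than an itemset.

```python
from typing import List, Sequence, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(small: Sequence[str], large: Sequence[str]) -> bool:
    """Return True if `small` occurs in `large` in order (gaps allowed)."""
    it = iter(large)
    # `item in it` advances the iterator, so order is respected.
    return all(item in it for item in small)


def maximal_patterns(frequent: List[Pattern]) -> List[Pattern]:
    """Keep patterns that are not a strict subsequence of another frequent pattern."""
    return [
        p for p in frequent
        if not any(p != q and is_subsequence(p, q) for q in frequent)
    ]


# Hypothetical set of frequent patterns, for illustration only.
frequent = [("a",), ("b",), ("c",), ("a", "b"), ("a", "c"), ("a", "b", "c")]
print(maximal_patterns(frequent))
```

In this toy input every pattern is contained in ("a", "b", "c"), so the filter returns that single pattern. This illustrates how much smaller the maximal set can be than the full set.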
Lecture Notes in Computer Science, 2014
Lecture Notes in Computer Science, 2013
In this paper, we propose a new algorithm, called ClaSP, for mining frequent closed sequential patterns in temporal transaction data. Our algorithm uses several efficient search space pruning methods together with a vertical database layout. Experiments on both synthetic and real datasets show that ClaSP outperforms well-known state-of-the-art methods, such as CloSpan.
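The idea of a vertical database layout can be sketched in Python as follows. This is a simplified illustration, not ClaSP itself: each item is mapped to the list of (sequence id, position) pairs where it occurs, and the support of a longer pattern is obtained by joining id-lists instead of rescanning the database. The function names and the toy database are assumptions, and each sequence element is a single item.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

IdList = List[Tuple[int, int]]  # (sequence id, position) pairs


def vertical_layout(db: List[List[str]]) -> Dict[str, IdList]:
    """Map each item to the (sequence id, position) pairs where it occurs."""
    idlists: Dict[str, IdList] = defaultdict(list)
    for sid, seq in enumerate(db):
        for pos, item in enumerate(seq):
            idlists[item].append((sid, pos))
    return dict(idlists)


def support(idlist: IdList) -> int:
    """Number of distinct sequences in which the pattern occurs."""
    return len({sid for sid, _ in idlist})


def join(prefix: IdList, item: IdList) -> IdList:
    """Id-list of the pattern 'prefix followed later by item'."""
    earliest: Dict[int, int] = {}
    for sid, pos in prefix:
        earliest[sid] = min(pos, earliest.get(sid, pos))
    # Keep item occurrences that come after the earliest end of the prefix.
    return [(sid, pos) for sid, pos in item
            if sid in earliest and pos > earliest[sid]]


# Hypothetical toy database of four sequences.
db = [["a", "b", "c"], ["a", "c"], ["b", "a", "c"], ["a", "b"]]
idlists = vertical_layout(db)
print(support(idlists["c"]))                       # support of <c>
print(support(join(idlists["a"], idlists["c"])))   # support of <a, c>
```

In this toy database the pattern <c> and its super-pattern <a, c> have the same support, so <c> is not closed. A closed pattern is one with no super-pattern of equal support.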
JMIR Medical Informatics, 2021
Background: It is important to exploit all available data on patients in settings such as intensive care burn units (ICBUs), where several variables are recorded over time. It is possible to take advantage of the multivariate patterns that model the evolution of patients to predict their survival. However, pattern discovery algorithms generate a large number of patterns, of which only some are relevant for classification. Objective: We propose to use the diagnostic odds ratio (DOR), rather than frequency properties, to select the multivariate sequential patterns used for classification in a clinical domain. Methods: We used data obtained from the ICBU at the University Hospital of Getafe, where 6 temporal variables for 465 patients were recorded daily for 5 days. To model the evolution of these clinical variables, we used multivariate sequential patterns, applying 2 different discretization methods to the continuous attributes. We compared 4 ways in which to emp...
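As an illustration of the selection measure named above, the following Python sketch computes the standard diagnostic odds ratio from a 2x2 contingency table, DOR = (TP x TN) / (FP x FN). The mapping of the cells to pattern presence and patient outcome, the 0.5 continuity correction for empty cells, and the example counts are all assumptions made here for illustration and are not taken from the paper.

```python
def diagnostic_odds_ratio(tp: float, fp: float, fn: float, tn: float,
                          correction: float = 0.5) -> float:
    """Return DOR = (TP * TN) / (FP * FN).

    Assumed cell meaning for a pattern used as a binary test:
      tp: pattern present, outcome positive
      fp: pattern present, outcome negative
      fn: pattern absent,  outcome positive
      tn: pattern absent,  outcome negative
    If any cell is zero, `correction` is added to every cell to avoid
    division by zero (a common convention, assumed here).
    """
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + correction for x in (tp, fp, fn, tn))
    return (tp * tn) / (fp * fn)


# Hypothetical counts for one candidate pattern.
print(diagnostic_odds_ratio(tp=30, fp=10, fn=20, tn=40))
```

A DOR of 1 means the pattern does not discriminate between the two outcomes. Values further from 1 indicate stronger discrimination, which is why the measure can be used to rank candidate patterns.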
We present SPMF, an open-source data mining library offering implementations of more than 55 data mining algorithms. SPMF is a cross-platform library implemented in Java, specialized for discovering patterns in transaction and sequence databases, such as frequent itemsets, association rules and sequential patterns. The source code can be integrated in other Java programs. Moreover, SPMF offers a command line interface and a simple graphical interface for quick testing. The source code is available under the GNU General Public License, version 3. The website of the project offers several resources, such as documentation with examples of how to run each algorithm, a developer’s guide, performance comparisons of algorithms, data sets, an active forum, a FAQ and a mailing list.
SPMF is an open-source data mining library, specialized in pattern mining, offering implementations of more than 120 data mining algorithms. It has been used in more than 310 research papers to solve applied problems in a wide range of domains, from authorship attribution to restaurant recommendation. Its implementations are also commonly used as benchmarks in research papers, and it has been integrated in several data analysis software programs. After three years of development, this paper introduces the second major revision of the library, named SPMF 2, which provides (1) more than 60 new algorithm implementations (including novel algorithms for sequence prediction), (2) an improved user interface with pattern visualization, (3) a novel plug-in system, (4) improved performance, and (5) support for text mining.
Lecture Notes in Computer Science, 2016
Chronobiology is the scientific discipline that studies biological rhythms and their underlying mechanisms. The alteration of biological rhythms, such as blood pressure or temperature, is beginning to be considered a good marker of certain diseases and of senescence. Among these variables, wrist skin temperature has proven to be a good marker of the subject's circadian rhythms. In this paper, we evaluate the wrist temperature of four groups of subjects of different ages in order to gain some knowledge of the evolution of circadian rhythms and its application to age classification.
Machine Learning and Knowledge Discovery in Databases, 2016
SPMF is an open-source data mining library, specialized in pattern mining, offering implementations of more than 120 data mining algorithms. It has been used in more than 310 research papers to solve applied problems in a wide range of domains, from authorship attribution to restaurant recommendation. Its implementations are also commonly used as benchmarks in research papers, and it has been integrated in several data analysis software programs. After three years of development, this paper introduces the second major revision of the library, named SPMF 2, which provides (1) more than 60 new algorithm implementations (including novel algorithms for sequence prediction), (2) an improved user interface with pattern visualization, (3) a novel plug-in system, (4) improved performance, and (5) support for text mining.
2009 21st IEEE International Conference on Tools with Artificial Intelligence, 2009
Early diagnosis and correct therapy for generalized infections are important factors for patient survival in Intensive Care Burn Units (ICBUs). Due to the number of pathologies involved, there is no specific etiology and, therefore, it is difficult for physicians to quantify patient severity in order to state the diagnosis. In this scenario, case-based reasoning (CBR) has difficulty obtaining a reliable solution when the retrieved cases are highly similar. For example, in ICBU patients, slight variations in monitored parameters have a deep impact on the evaluation of the patient's severity. Therefore, it seems necessary to extend the system outcome in order to indicate the reliance of the solution obtained. The main efforts in the literature on CBR evaluation focus on case retrieval (i.e. similarity) or on retrospective analysis. However, these approaches do not seem to suffice when cases are very close. In this work, we propose and implement a CBR system to state a patient's chance of survival. The system has been tested using a database of 89 patients from an ICBU, obtaining about 76% accuracy. Furthermore, in order to evaluate the behaviour of the CBR system in this kind of scenario, we propose three techniques to obtain a reliance solution degree, one based on case retrieval and two based on case reuse.
Lecture Notes in Computer Science, 2014
Sequential pattern mining is a popular data mining task with wide applications. However, the set of all sequential patterns can be very large. To discover fewer but more representative patterns, several compact representations of sequential patterns have been studied. The set of sequential generators is one of the most popular representations. It was shown to provide higher accuracy for classification than using all or only closed sequential patterns. Furthermore, mining generators is a key step in several other data mining tasks, such as sequential rule generation. However, mining generators is computationally expensive. To address this issue, we propose a novel mining algorithm named VGEN (Vertical sequential GENerator miner). An experimental study on five real datasets shows that VGEN is up to two orders of magnitude faster than the state-of-the-art algorithms for sequential generator mining.
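For readers unfamiliar with the term, a sequential generator is commonly defined as a pattern that has no proper sub-pattern with the same support. The Python sketch below applies that definition as a naive filter over an already mined set of patterns. It is not VGEN, the pattern supports are hypothetical, the empty pattern is ignored, and each sequence element is simplified to a single item.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(small: Sequence[str], large: Sequence[str]) -> bool:
    """Return True if `small` occurs in `large` in order (gaps allowed)."""
    it = iter(large)
    return all(item in it for item in small)


def generators(supports: Dict[Pattern, int]) -> List[Pattern]:
    """Keep patterns with no proper sub-pattern of equal support.

    Assumption: `supports` holds every frequent pattern with its support,
    and the empty pattern is not considered.
    """
    return [
        p for p, s in supports.items()
        if not any(
            q != p and supports[q] == s and is_subsequence(q, p)
            for q in supports
        )
    ]


# Hypothetical supports, for illustration only.
supports = {
    ("a",): 4,
    ("b",): 3,
    ("c",): 3,
    ("a", "b"): 2,
    ("a", "c"): 3,
}
print(generators(supports))
```

Here ("a", "c") is dropped because its sub-pattern ("c",) has the same support, while the other four patterns are kept as generators.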
Lecture Notes in Computer Science, 2011
Lecture Notes in Computer Science, 2013
In this paper, we propose a new algorithm, called ClaSP, for mining frequent closed sequential patterns in temporal transaction data. Our algorithm uses several efficient search space pruning methods together with a vertical database layout. Experiments on both synthetic and real datasets show that ClaSP outperforms well-known state-of-the-art methods, such as CloSpan.
Advances in Knowledge Discovery and Data Mining, 2014
Lecture Notes in Computer Science, 2010
Case-based reasoning (CBR) has proven to be a suitable similarity-based approach for developing decision-support systems in different domains. However, in certain scenarios CBR has difficulty obtaining a reliable solution when the retrieved cases are highly similar. For example, patients in an Intensive Care Unit are critical patients for whom slight variations in monitored parameters have a deep impact on the evaluation of patient severity. In this scenario, it seems necessary to extend the system outcome in order to indicate the reliance of the solution obtained. The main efforts in the literature on CBR evaluation focus on case retrieval (i.e. similarity) or on retrospective analysis. However, these approaches do not seem to suffice when cases are very close. To this end, we propose three techniques to obtain a reliance solution degree, one based on case retrieval and two based on case adaptation. We also show the capabilities of this proposal in a medical problem.
Lecture Notes in Computer Science, 2012
The Problem Oriented Medical Record (POMR) is a medical record approach that provides quick and structured acquisition of the patient's history. POMR, unlike classical health records, focuses on the patient's problems, their evolution, and the relations between clinical events. This approach gives the physician a view of the patient's history as an orderly process for solving their problems, with the opportunity to make hypotheses and clinical decisions explicit. Most efforts regarding POMR focus on the implementation of information systems as an alternative to classical health records. Results reveal that POMR information systems provide a better organisation of patients' information but unsuitable mechanisms for other basic tasks (e.g. administrative reports). Due to its features, POMR can help to bridge the gap between the traditional clinical information process and knowledge management. Despite the potential advantages of POMR, few efforts have been made to exploit its capacities as a knowledge representation model and for further automatic reasoning. In this work, we propose the Problem Flow, a computational model based on the POMR. This proposal has a double objective: (1) to make explicit the knowledge included in the POMR for reasoning purposes and (2) to allow classical health records and the POMR to coexist. We also present PLOW, a knowledge acquisition tool that supports the proposed model. We illustrate its application in the Intensive Care Unit domain.
Lecture Notes in Computer Science, 2014
Sequential pattern mining is a popular data mining task with wide applications. However, it may present too many sequential patterns to users, which makes the results difficult to comprehend. As a solution, it was proposed to mine maximal sequential patterns, a compact representation of the set of sequential patterns, which is often several orders of magnitude smaller than the set of all sequential patterns. However, the task of mining maximal patterns remains computationally expensive. To address this problem, we introduce a vertical mining algorithm named VMSP (Vertical mining of Maximal Sequential Patterns). It is, to our knowledge, the first vertical mining algorithm for mining maximal sequential patterns. An experimental study on five real datasets shows that VMSP is up to two orders of magnitude faster than the current state-of-the-art algorithm.
One of the problems that information technologies have had to face in recent years is the analysis of the enormous amount of data generated by the everyday activities of organizations and people. This analysis may consist of searching for models or patterns that help in understanding the data or the behaviour of these organizations or people. An essential component associated with this type of knowledge is the temporal dimension, which, when taken into account in the patterns, not only provides much more information but also makes them more complex. Sequence data mining (SDM) is an area within the field of knowledge discovery in databases (KDD) whose aim is to extract the sets of frequent patterns that occur, ordered in time, in a database. Some SDM techniques have been used in a wide variety of application domains, such as the discovery of ...