Ronaldo Prati - Profile on Academia.edu

Papers by Ronaldo Prati

The Impact of Interdisciplinary, Gender and Geographic Distributions on the Citation Patterns of the Journal of Chemical Information and Modeling

Figshare, 2023

Data-driven water need estimation for IoT-based smart irrigation: A survey

Expert Systems With Applications, Sep 1, 2023

Comparative study of musical timbral variations: Crescendo and Vibrato using FFT-Acoustic Descriptor

Quantitative evaluation of musical timbre and its variations is important for the analysis of audio recordings and for computer-aided music composition. Using FFT acoustic descriptors and their representation in an abstract timbral space, we analyze the variations of a sample of monophonic sounds from chordophones (violin, cello) and aerophones (trumpet, transverse flute, and clarinet). We conclude that the FFT acoustic descriptors make it possible to distinguish the timbral variations of musical dynamics, including crescendo and vibrato. Furthermore, using the Random Forest algorithm, we show that the FFT-Acoustic descriptors provide a statistically significant classification for distinguishing musical instruments, instrument families, and dynamics. We also observed that the FFT-Acoustic descriptors performed better than some of the timbral features of Librosa when classifying pitch.
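The FFT-Acoustic descriptors themselves are not defined in this abstract, so as a stand-in the sketch below computes one widely used FFT-based timbral descriptor, the spectral centroid (the magnitude-weighted mean frequency of a frame). It is not the authors' descriptor set. It uses a naive DFT from the standard library to stay self-contained, where a real pipeline would call an FFT routine; the function names are illustrative.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of a naive DFT for bins 0..N//2 (O(N^2), fine for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency (Hz) of the non-negative-frequency spectrum."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0:
        return 0.0  # silent frame: no meaningful centroid
    bin_hz = sample_rate / len(frame)  # frequency spacing between DFT bins
    return sum(k * bin_hz * m for k, m in enumerate(mags)) / total


if __name__ == "__main__":
    sr, n = 8000, 400  # 20 Hz bin spacing, so 440 Hz falls exactly on bin 22
    tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(n)]
    print(round(spectral_centroid(tone, sr), 1))
```

For a pure tone the centroid sits at the tone's frequency; richer timbres (more energy in upper partials) push it higher, which is why descriptors of this kind can separate instruments and dynamics.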

Comparing Modern and Traditional Modeling Methods for Predicting Soil Moisture in IoT-Based Irrigation Systems

Eng

Musical timbre is a phenomenon of auditory perception that allows the recognition of musical sounds. Recognizing musical timbre is challenging because the timbre of a musical instrument or sound source is a complex, multifaceted phenomenon influenced by many factors, including the physical properties of the instrument or sound source, the way it is played or produced, and the recording and processing techniques used. In this paper, we explore a 7-dimensional abstract space formed by the fundamental frequency and the FFT-Acoustic Descriptors of 240 monophonic sounds from the Tinysol and Good-Sounds databases, corresponding to the fourth octave of the transverse flute and the clarinet. This approach allows us to unequivocally define a collection of points and, therefore, a timbral space (Category Theory) in which different sounds of any type of musical instrument, with their respective dynamics, can be represented as a single characteristic vector. The geom...

Soil moisture forecast for smart irrigation: The primetime for machine learning

Expert Systems with Applications

Receiver Operating Characteristics (ROC) graphs are a popular way of assessing the performance of classification rules. However, because such graphs are based on class-conditional probabilities, they are inappropriate for evaluating the quality of association rules: there is no class in association rule mining, and the consequents of two different association rules may not be correlated at all. This chapter presents an extension of ROC graphs, named QROC (for Quality ROC), which can be used in the association rule context. Furthermore, QROC can help analysts evaluate the relative interestingness of different association rules under different cost scenarios.
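To make the argument concrete, here is a minimal sketch, assuming a rule A → B is summarized by four hypothetical counts (n, n_a, n_b, n_ab), of how the rule would be placed in ordinary ROC space by treating its consequent B as the positive class. It does not implement QROC; it only shows that both coordinates are conditioned on the consequent, so two rules with different consequents end up in different reference frames.

```python
from dataclasses import dataclass


@dataclass
class RuleCounts:
    """Hypothetical container for the counts of a rule A -> B over n transactions."""
    n: int     # total number of transactions
    n_a: int   # transactions containing the antecedent A
    n_b: int   # transactions containing the consequent B
    n_ab: int  # transactions containing both A and B


def roc_point(c: RuleCounts) -> tuple[float, float]:
    """Return (FPR, TPR) of rule A -> B, treating B as the positive 'class'.

    TPR = p(A | B) and FPR = p(A | not B): both depend on which consequent
    defines the positives, so rules with different consequents are not
    placed in a common frame.
    """
    tpr = c.n_ab / c.n_b
    fpr = (c.n_a - c.n_ab) / (c.n - c.n_b)
    return fpr, tpr
```

For example, with n = 100, n_a = 20, n_b = 40 and n_ab = 16, the rule sits at FPR = 4/60 and TPR = 0.4; the same antecedent paired with a different consequent would be measured against a different set of positives.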

Software and Libraries for Imbalanced Classification

Over the years, researchers in imbalanced classification have proposed a large number of approaches to address this issue. To keep developing this area of study, it is extremely important to make these methods available to the research community. This brings a double advantage: (1) the features and capabilities of the algorithms can be analyzed in depth; and (2) any novel proposal can be compared fairly against them. With this in mind, different open source libraries and software packages for imbalanced classification have been built with different tools. In this chapter, we compile the most significant ones, focusing on their main characteristics and included methods, from standard DM to Big Data applications. Our intention is to provide researchers, practitioners and corporations with a non-exhaustive list of alternatives for applying diverse algorithms to their problems, in order to achieve the most accurate results with the least effort. This chapter is organized as follows. First, Sect. 14.1 stresses the significance of software implementations for imbalanced classification. Then, Sect. 14.2 introduces the Java tools, i.e. KEEL [2] and WEKA [17]. Next, Sect. 14.3 focuses on different R packages. The “imbalanced-learn” Python toolbox [29], compatible with “scikit learn” [39], is described in Sect. 14.4. Big Data solutions under Spark [26] are summarized in Sect. 14.5. Finally, Sect. 14.6 provides some concluding remarks.

Data Level Preprocessing Methods

Springer eBooks, 2018

The first mechanism used to address the problem of imbalanced learning was sampling. Sampling methods modify an imbalanced data set using different procedures to provide a balanced, or more adequate, data distribution to the subsequent learning tasks. In the specialized literature, many studies have shown that, for several types of classifiers, rebalancing the data set significantly improves overall classification performance compared to a non-preprocessed data set. Over the years, this procedure has become common, and the use of sampling methods for imbalanced learning has been standardized. Still, classifiers do not always need this kind of preprocessing, because many of them can deal directly with imbalanced data sets. There is no clear rule that tells us which strategy is best: adapting the behavior of learning algorithms or using data preprocessing techniques. Nevertheless, data sampling and preprocessing are standard techniques in imbalanced learning and are widely used in Data Science problems. They are simple, easily configurable, and can be used in synergy with any learning algorithm. This chapter reviews sampling techniques: undersampling (the classical methods in Sect. 5.2 and advanced approaches in Sect. 5.3), oversampling in Sect. 5.4, and the best-known oversampling algorithm, SMOTE, together with its derivatives, in Sect. 5.5. Some hybridizations of undersampling and oversampling are described in Sect. 5.6. Experiments with graphical illustrations are carried out to show the behavior of these techniques.
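A minimal sketch of two of the techniques named above, written from their usual textbook descriptions rather than from the chapter's code: random undersampling of the majority class, and SMOTE-style oversampling, which creates synthetic minority examples by interpolating between a minority example and one of its k nearest minority neighbours. The function names and the seed parameter are illustrative assumptions.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    For each synthetic point: pick a random minority sample x, pick one of its
    k nearest minority neighbours z, and return x + u * (z - x) with u ~ U(0, 1).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Precompute each sample's k nearest minority neighbours (Euclidean distance).
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )
        neighbours.append(others[:k])
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        z = minority[rng.choice(neighbours[i])]
        u = rng.random()  # position along the segment from x to z
        synthetic.append([xi + u * (zi - xi) for xi, zi in zip(x, z)])
    return synthetic


def random_undersample(majority, n_keep, seed=0):
    """Keep a random subset of n_keep majority samples (classical random undersampling)."""
    return random.Random(seed).sample(majority, n_keep)
```

Because every synthetic point lies on a segment between two real minority examples, SMOTE fills in the minority region instead of duplicating points, which is the main difference from plain random oversampling.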

This paper proposes a multilabel fuzzy decision tree classifier named FuzzDTML. The algorithm uses generalized fuzzy entropy, aggregated over all labels, to choose the best attribute for growing the tree. The proposed algorithm can also generate leaves that predict partial label sets, which captures, to some degree, the dependence among labels and produces more interpretable models. An empirical analysis shows that, although the algorithm does not yet incorporate pruning or fuzzy interval adjustment phases, it is competitive with other tree-based approaches for multilabel classification, performing better on data sets with numerical features that can be fuzzified.
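The attribute-selection idea can be sketched as follows. This is an assumption-laden simplification: it sums a plain Shannon-type fuzzy entropy over labels, weighting each example by its membership degree in the node, and is not necessarily the generalized fuzzy entropy used by FuzzDTML. Candidate attributes would be compared by the weighted entropy of their fuzzy partitions, preferring the lowest. All names are hypothetical.

```python
import math


def binary_entropy(p):
    """Shannon entropy (bits) of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def fuzzy_multilabel_entropy(memberships, label_sets):
    """Sum over labels of the entropy of the membership-weighted label frequency.

    memberships: degree (0..1) to which each example belongs to the node.
    label_sets: one 0/1 vector per example (multi-label targets).
    """
    total = sum(memberships)
    if total == 0:
        return 0.0
    h = 0.0
    for l in range(len(label_sets[0])):
        p = sum(m * y[l] for m, y in zip(memberships, label_sets)) / total
        h += binary_entropy(p)
    return h


def split_entropy(partition_memberships, label_sets):
    """Weighted entropy of a fuzzy split; each element holds one fuzzy set's memberships."""
    grand = sum(sum(ms) for ms in partition_memberships)
    return sum(
        sum(ms) / grand * fuzzy_multilabel_entropy(ms, label_sets)
        for ms in partition_memberships
    )
```

A split whose fuzzy sets separate the label sets cleanly scores 0, while a split that leaves every fuzzy set with the same label mix keeps the parent's entropy.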

Exploring Unclassified Texts Using Multiview Semisupervised Learning

IGI Global eBooks, Jan 18, 2011

Knowledge and Information Systems, Jul 6, 2018

Ensemble Learning

Springer eBooks, 2018

Dimensionality Reduction for Imbalanced Learning

Springer eBooks, 2018

One of the most successful data preprocessing techniques is the reduction of data dimensionality by means of feature selection and/or feature extraction. The key idea is to simplify the data either by replacing the original features with newly created ones that capture the main information, or by selecting a subset of the original features. Although this topic has been carefully studied in the specialized literature for classical predictive problems, there are also several approaches specifically devised for imbalanced learning scenarios. Again, their main purpose is to exploit the most informative features so as to preserve, as much as possible, the concept related to the minority class. This chapter describes the best-known feature selection and feature extraction techniques developed to tackle imbalanced data sets. We consider these two main families of techniques separately, and we also present recent advances in feature selection and feature extraction by non-linear methods. In addition, we mention a recently proposed discretization approach that converts numeric features into categories. The chapter is organized as follows. After a short introduction in Sect. 9.1, Sect. 9.2 reviews the straightforward feature selection solutions devised for imbalanced classification. Next, Sect. 9.3 delves deeper into more advanced feature selection techniques. Section 9.4 is devoted to the redefined feature extraction techniques based on linear models. Sections 9.5 and 9.6 outline, respectively, a non-linear feature extraction technique based on autoencoders and a discretization method. Finally, Sect. 9.7 concludes the chapter.
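As one illustration of a feature selection filter that is insensitive to class skew, the sketch below ranks features by how far their individual ROC AUC is from 0.5. AUC compares positives against negatives pairwise, so it does not depend on the class ratio the way accuracy does. This is a generic example, not necessarily one of the techniques reviewed in the chapter; the function names are hypothetical and the minority class is assumed to be labelled 1.

```python
def auc(scores, labels):
    """Rank-based ROC AUC: probability that a random positive outscores a random negative.

    Ties count as 1/2. O(P * N), which is fine for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both classes")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_features_by_auc(X, y):
    """Return feature indices sorted by |AUC - 0.5|, most discriminative first."""
    scored = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        scored.append((abs(auc(column, y) - 0.5), j))
    # sorted() is stable, so tied features keep their original order.
    return [j for _, j in sorted(scored, key=lambda t: -t[0])]
```

Using |AUC - 0.5| keeps features that separate the minority class in either direction (large or small values).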

Algorithm-Level Approaches

Springer eBooks, 2018

Algorithm-level solutions can be seen as an alternative to data preprocessing methods for handling imbalanced data sets. Instead of modifying the training set to combat class skew, this approach modifies the classifier's learning procedure itself. This requires an in-depth understanding of the selected learning approach in order to identify which specific mechanism may be responsible for the bias towards the majority class. Algorithm-level solutions do not shift the data distribution, which makes them more adaptable to various types of imbalanced data sets, at the cost of being specific to a given classifier type. In this chapter we discuss the basics of algorithm-level solutions and review existing skew-insensitive modifications. The background is introduced first in Sect. 6.1. Then, special attention is given to four groups of methods. First, modifications of SVMs are discussed in Sect. 6.2. Section 6.3 focuses on skew-insensitive decision trees. Variants of NN classifiers for imbalanced problems are presented in Sect. 6.4, and skew-insensitive Bayesian classifiers in Sect. 6.5. Finally, one-class classifiers are discussed in Sect. 6.6, while Sect. 6.7 concludes the chapter and presents future challenges in the field of algorithm-level solutions to class imbalance.
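To illustrate what modifying the learning procedure itself can mean, here is a minimal sketch of one common cost-sensitive change to a decision tree's split criterion: each example is weighted inversely to its class frequency before computing Gini impurity, so the minority class is no longer swamped. It is a generic example, not a specific method from the chapter, and the helper names are hypothetical.

```python
from collections import Counter


def balanced_weights(labels):
    """Weight each example by n / (n_classes * class_count), so every class sums to n / n_classes."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [n / (k * counts[y]) for y in labels]


def weighted_gini(labels, weights):
    """Gini impurity where each example contributes its weight instead of 1."""
    total = sum(weights)
    mass = Counter()
    for y, w in zip(labels, weights):
        mass[y] += w
    return 1.0 - sum((m / total) ** 2 for m in mass.values())
```

With nine majority examples and one minority example, the unweighted impurity is 0.18, so the node already looks nearly pure; the balanced weights give 0.5, the value for two equally represented classes, and the tree is pushed to keep splitting toward the minority examples. The training data themselves are left untouched, which is the defining trait of algorithm-level solutions.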

A Multi-Objective Evolutionary Algorithm to Build Knowledge Classification Rules with Specific Properties

International Conference on Hybrid Intelligent Systems, Dec 13, 2006

This work proposes the use of evolutionary algorithms to build individual knowledge rules with specific properties that are usually neglected when rules are induced by traditional supervised learning methods. The proposed evolutionary algorithm uses a rank-based, multi-objective fitness ...
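The abstract is truncated before the fitness function is described, so the following is only a sketch of one standard rank-based multi-objective scheme, Pareto (non-dominated) ranking, applied to hypothetical rule objectives such as support and confidence, all to be maximized. Rank 1 is the set of rules not dominated by any other; a lower rank would then translate into higher fitness. This is an assumption, not the paper's exact fitness.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better on one (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_ranks(objectives):
    """Assign rank 1 to the non-dominated front, rank 2 to the next front, and so on."""
    remaining = set(range(len(objectives)))
    ranks = [0] * len(objectives)
    rank = 1
    while remaining:
        # The current front: rules not dominated by any other rule still unranked.
        front = {
            i for i in remaining
            if not any(dominates(objectives[j], objectives[i]) for j in remaining if j != i)
        }
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks
```

For rules scored as (support, confidence) = (0.2, 0.9), (0.3, 0.7), (0.1, 0.6) and (0.25, 0.8), the first, second and fourth are mutually non-dominated (rank 1) and the third is rank 2.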

Introduction to KDD and Data Science

Springer eBooks, 2018

Nowadays, large volumes of data are widely available, and tools for properly extracting knowledge from them are in widespread use, especially in large corporations. This has transformed data analysis, orienting it towards certain specialized techniques grouped under the umbrella of Data Science. In summary, Data Science can be considered a discipline for discovering new and significant relationships, patterns and trends by examining large amounts of data. Data Science techniques therefore pursue the automatic discovery of the knowledge contained in the information stored in large databases. These techniques aim to uncover patterns, profiles and trends through the analysis of data using recognition technologies such as clustering, classification, predictive analysis and association mining, among others. For this reason, we are witnessing the development of multiple software solutions for data processing that integrate many Data Science algorithms. To better understand the nature of Data Science, this chapter is organized as follows. Sections 1.2 and 1.3 define the term Data Science and its workflow. Then, Sect. 1.4 introduces the standard problems in Data Science. Section 1.5 describes some standard data mining algorithms. Finally, Sect. 1.6 mentions some non-standard problems in Data Science.

Given a set of (labeled) training examples, it is possible to apply several supervised Machine Learning algorithms, obtaining, for each of them, a concept description (classifier) that represents the knowledge implicit in those examples in a more compact form. In general, a classifier obtained from a symbolic learning algorithm, which describes the induced knowledge in a form directly interpretable by humans, can be transformed into a set of rules. These rules can be evaluated as a whole, by comparing the performance of each classifier with other classifiers as a "black box", or each rule can be evaluated individually with respect to quality, interestingness, novelty, and other objective measures. Evaluating the rules requires information from which objective measures can easily be derived. However, only some learning algorithms provide this information, and not in a uniform way. In this work, we propose and implement a library of tools that computes a minimal set of information for each rule, in a standardized form, for a set of supervised machine learning algorithms frequently used by the community. This library is integrated into a larger computational system being developed at our Computational Intelligence Laboratory (LABIC) to perform, among other tasks, automatic knowledge extraction and analysis. The library was designed to allow new inducers to be added in the future.
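Assuming, as is common for symbolic rules, that the per-rule information amounts to the counts of a 2×2 contingency table of body and head, objective measures follow directly from a few numbers. The sketch below is hypothetical and does not reproduce the library described above; it only shows why a standardized set of counts makes such measures easy to derive.

```python
def rule_measures(n, n_body, n_head, n_body_head):
    """Derive common objective measures for a rule body -> head from contingency counts.

    n: total examples; n_body: examples covered by the body;
    n_head: examples with the head (class) value; n_body_head: both.
    """
    support = n_body_head / n                 # p(body, head)
    confidence = n_body_head / n_body         # p(head | body)
    lift = confidence / (n_head / n)          # p(head | body) / p(head)
    leverage = support - (n_body / n) * (n_head / n)  # p(body, head) - p(body) p(head)
    return {"support": support, "confidence": confidence, "lift": lift, "leverage": leverage}
```

With n = 100, n_body = 20, n_head = 40 and n_body_head = 16, the rule has support 0.16, confidence 0.8, lift 2.0 and leverage 0.08; any inducer that exports these four counts gets all measures for free.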

Aquisição automática de conhecimento utilizando a biblioteca de aprendizado de máquina MLC [Automatic knowledge acquisition using the MLC machine learning library]

Anais, 1999

Rotulação Automática de Imagens Utilizando Aprendizado de Máquina Multirrótulo [Automatic Image Labeling Using Multi-Label Machine Learning]

SBBD (Short Papers), 2011

Abstract. The goal of this work is to build an automatic image annotation (labeling) system capable of extracting information (features) from images and of using multi-label Machine Learning (ML) on the extracted information for the labeling task. Multi-label ML is suitable in this context because, in multi-label classification, more than one label can be assigned to each new object to be classified, a common situation in image labeling. Experiments were carried out with neighborhood-based multi-label algorithms and with different feature extractors (and combinations of them) on an image database of the city of Barcelona, and showed promising initial results.
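A minimal sketch of neighborhood-based multi-label prediction, assuming images have already been turned into numeric feature vectors by some extractor: each label is assigned if more than a threshold fraction of the k nearest training images carry it. This simple voting rule is an illustration only; the algorithms used in the paper may differ (ML-kNN, for instance, applies a Bayesian rule to the neighbors' label counts). The function name and parameters are hypothetical.

```python
import math


def knn_multilabel(train_X, train_Y, x, k=3, threshold=0.5):
    """Predict a 0/1 label vector for x by per-label voting among the k nearest neighbors.

    train_X: feature vectors; train_Y: one 0/1 label vector per training example.
    Label l is switched on if more than `threshold` of the neighbors have it.
    """
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    nearest = order[:k]
    n_labels = len(train_Y[0])
    return [
        int(sum(train_Y[i][l] for i in nearest) / len(nearest) > threshold)
        for l in range(n_labels)
    ]
```

Because each label is decided independently, a single image can receive several labels at once (for example "building" and "sky"), which is exactly what distinguishes multi-label from single-label classification.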

The digital social spaces available today are increasingly used as platforms for deliberation and the expression of opinions. This article seeks to identify some of the general features of the citizen participation that took place on Twitter during the elections held in Spain on 24 May 2015, and to estimate the weight or importance of messages with a defined ideological charge, as well as the variables associated with publishing this type of message. This is done through social media mining of 24,900 tweets collected via the hashtags #24M and #Elecciones2015. Among other things, the study shows the role taken on by citizens, who focused their participation in this election on spreading news of the day's events rather than on stating the ideological positions that would reveal their party preferences. Nevertheless, messages with a clearly defined ideological charge were driven more by users with a progressive or left-leaning ideological orientation, and a relationship was found between publishing these messages and the partisan orientation of those who posted them.