Sakinat Tijani-Folorunso | Olabisi Onabanjo University

Papers by Sakinat Tijani-Folorunso

FAIR Guidelines and Data Regulatory Framework for Digital Health in Nigeria

Data Intelligence

Adopting the FAIR Guidelines (that data should be Findable, Accessible, Interoperable and Reusable, or FAIR) in the health data system in Nigeria will help protect data against use by unauthorised parties, while also making data more accessible to legitimate users. However, little is known about the FAIR Guidelines and their compatibility with data and health laws and policies in Nigeria. This study assesses the governance framework for digital and health/eHealth policies in Nigeria and explores the possibility of a policy window opening for the FAIR Guidelines to be adopted and implemented in Nigeria's eHealth sector. Ten Nigerian policy documents were examined for mention of the FAIR Guidelines (or FAIR Equivalent terminology) and the 15 sub-criteria or facets. The analysis found that although the FAIR Guidelines are not explicitly mentioned, 70% of the documents contain FAIR Equivalent terminology. The Nigeria Data Protection Regulation contained the most FAIR Equivalent principl...

Proof of Concept and Horizons on Deployment of FAIR Data Points in the COVID-19 Pandemic

Data Intelligence

Rapid and effective data sharing is necessary to control disease outbreaks, such as the current coronavirus pandemic. Despite the existence of data sharing agreements, data silos, lack of interoperable data infrastructures, and different institutional jurisdictions hinder data sharing and accessibility. To overcome these challenges, the Virus Outbreak Data Network (VODAN)-Africa initiative is championing an approach in which data never leaves the institution where it was generated, but, instead, algorithms can visit the data and query multiple datasets in an automated way. To make this possible, FAIR Data Points, distributed data repositories that host machine-actionable data and metadata that adhere to the FAIR Guidelines (that data should be Findable, Accessible, Interoperable and Reusable), have been deployed in participating institutions using a dockerised bundle of tools called VODAN in a Box (ViB). ViB is a set of multiple FAIR-enabling and open-source services with a single goa...

Expanding Non-Patient COVID-19 Data: Towards the FAIRification of Migrants' Data in Tunisia, Libya and Niger

Data Intelligence, Aug 18, 2022

This article describes the FAIRification process (which involves making data Findable, Accessible, Interoperable and Reusable, or FAIR, for both machines and humans) for data related to the impact of COVID-19 on migrants, refugees and asylum seekers in Tunisia, Libya and Niger, according to the scheme adopted by GO FAIR. This process was divided into three phases: pre-FAIRification, FAIRification and post-FAIRification. Each phase consisted of seven steps. In the first phase, 118 in-depth interviews and 565 press articles and research reports were collected by students and researchers at the University of Sousse in Tunisia and researchers in Niger. These interviews, articles and reports constitute the dataset for this research. In the second phase, the data were sorted and converted into a machine-actionable format and published on a FAIR Data Point hosted at the University of Sousse. In the third phase, an assessment of the implementation of the FAIR Guidelines was undertaken. Certain barriers and challenges were faced in this process and solutions were found. For FAIR data curation, certain changes need to be made to the technical process. People need to be convinced to make these changes and that the implementation of FAIR will generate a long-term return on investment. Although the implementation of FAIR Guidelines is not straightforward, making our resources FAIR is essential to achieving better science together.

FAIR Machine Learning Model Pipeline Implementation of COVID-19 Data

Data Intelligence

Research and development are gradually becoming data-driven and the implementation of the FAIR Guidelines (that data should be Findable, Accessible, Interoperable, and Reusable) for scientific data administration and stewardship has the potential to remarkably enhance the framework for the reuse of research data. In this way, FAIR is aiding digital transformation. The ‘FAIRification’ of data increases the interoperability and (re)usability of data, so that new and robust analytical tools, such as machine learning (ML) models, can access the data to deduce meaningful insights, extract actionable information, and identify hidden patterns. This article aims to build a FAIR ML model pipeline using the generic FAIRification workflow to make the whole ML analytics process FAIR. Accordingly, FAIR input data was modelled using a FAIR ML model. The output data from the FAIR ML model was also made FAIR. For this, a hybrid hierarchical k-means (HHK) clustering ML algorithm was applied to group...

Curriculum Development for FAIR Data Stewardship

Data Intelligence

The FAIR Guidelines attempt to make digital data Findable, Accessible, Interoperable, and Reusable (FAIR). To prepare FAIR data, a new data science discipline known as data stewardship is emerging and, as the FAIR Guidelines gain more acceptance, an increase in the demand for data stewards is expected. Consequently, there is a need to develop curricula to foster professional skills in data stewardship through effective knowledge communication. There have been a number of initiatives aimed at bridging the gap in FAIR data management training through both formal and informal programmes. This article describes the experience of developing a digital initiative for FAIR data management training under the Digital Innovations and Skills Hub (DISH) project. The FAIR Data Management course offers 6 short on-demand certificate modules over 12 weeks. The modules are divided into two sets: FAIR data and data science. The core subjects cover elementary topics in data science, regulatory framewo...

When and where Proactively predicting traffic accident in South Africa: our machine learning competition winning approach

International Journal of Society Systems Science, 2021

South Africa (SA) records high mortality from traffic accidents annually, placing the country among the nations with the highest traffic mortality globally. There is seemingly no study that has attempted to forecast when and where the next accident will occur in SA. This study aims to use machine learning methods to predict traffic accidents in SA for every hour between 1 January and 31 March 2019 at a segment ID. We obtained details of accidents that occurred in Cape Town, SA between 2016 and 2019 from SANRAL, Uber Movement and Cape Town FMS via the Zindi competition platform. This research adopted CatBoost and LightGBM models to predict traffic incident occurrence. Our model achieves an F1 score of 0.11. The results of this research will aid hourly prediction of accident occurrence at a particular road segment.
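For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F1 score is computed for a binary "accident / no accident" prediction. The function name `f1_score` and the label lists are hypothetical illustration data, not the competition data or the authors' code.

```python
def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up hourly labels for one road segment (1 = accident occurred).
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 0]
print(f1_score(y_true, y_pred))
```

Because accidents are rare events per segment and hour, both precision and recall tend to be low in this kind of task, which is one reason F1 is used rather than plain accuracy.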

Evaluation of Multi-Target Regression Models on Africa Soil Properties

AiIoMT: IoMT-Based System-Enabled Artificial Intelligence for Enhanced Smart Healthcare Systems

Machine Learning for Critical Internet of Medical Things, 2022

A Multi-Step Predictive Model for COVID-19 Cases in Nigeria Using Machine Learning

International Series in Operations Research & Management Science

Heart Disease Classification Using Machine Learning Models

Informatics and Intelligent Applications

Author response for "Design of a FAIR digital data health infrastructure in Africa for COVID‐19 reporting and research"

Migrating Business Services and Applications Into the Cloud

Cloud computing has attracted a lot of hyperbole since it became a trendy topic for IT managers to talk about. Companies frequently trumpet their cloud-enabled services but rarely give up details on precisely how they achieved this or how much of their infrastructure has been fully migrated. Security and reliability of cloud services are often raised as concerns. By understanding the basics of cloud computing and knowing how to assess important factors such as security and the identification of systems that are suitable for migration, it becomes much easier to design and implement a cloud strategy. This paper provides the essential facts about cloud computing, lists some factors to prepare for when adopting cloud computing, and offers considerations for managers migrating their services and applications into the cloud. It also discusses the merits of going into the cloud.

Keywords: Cloud Computing, Public Cloud, Service as a Service, Application Migration, Decision Making

A secured transaction based on blockchain architecture in mobile banking platform

International Journal of Internet Technology and Secured Transactions, 2021

Artificial Intelligence and the Control of COVID-19: A Review of Machine and Deep Learning Approaches

Artificial Intelligence for COVID-19, 2021

This study explores the prevalent Machine and Deep Learning approaches for the control of COVID-19. It reveals the impact of Artificial Intelligence on the case prediction, analysis, diagnosis, and treatment of the disease. Apart from discussing four (4) knowledge areas where Machine Learning and Deep Learning approaches were employed in the fight against the pandemic, we proposed a Generalized Artificial Intelligence Response Framework using those areas. We observed that most of the works seeking Artificial Intelligence-based scientific solutions to the pandemic used chest X-ray images and chest computed tomography scans for prognosis and diagnosis, while applying different Machine and Deep Learning approaches using available data dashboards. However, a production-ready landmark contribution towards the control of the disease through Artificial Intelligence is still a work in progress. Hence, the need for a response framework to give researchers and prac...

COVID-19 data by States in Nigeria

This dataset, collected from the official website of the Nigeria Centre for Disease Control (NCDC), provides the daily incidence of COVID-19 from February 23, 2020, to April 10, 2021, organised in a spreadsheet to build a daily time-series database. The dataset also contains the population per state in Nigeria, COVID-19 testing laboratories, etc.

COVID-19, healthcare facilities, and economic related data in Nigeria

The dataset describes the healthcare facilities, state budget, and laboratories in Nigeria in response to the COVID-19 pandemic and the economic situation of the entire population of the country amidst the pandemic. The project GitHub repository can be found via https://bit.ly/COVID-19data_project_repo.

Empirical Study of Enhanced Sampling Schemes with Ensembles to Alleviate the Class Imbalance Problem

Classification of an imbalanced dataset is sub-optimal, as traditional classifiers are biased towards the majority class and completely ignore the minority class. However, this minority class is the class of interest and should not be ignored, as its misclassification cost is higher. This research is aimed at identifying and treating imbalanced datasets. This study proposes five SMOTE (Synthetic Minority Oversampling Technique)-based enhanced data sampling schemes with both homogeneous and heterogeneous ensembles to alleviate the class imbalance problem. The Waikato Environment for Knowledge Analysis (WEKA) filter library was extended with these enhanced sampling schemes for pre-processing of the datasets before their classification. Real-life datasets collected from different domains in Nigeria were used for the implementation, and Receiver Operating Characteristic Area Under Curve (ROC_AUC) and Performance Loss/Gain were used as evaluation metrics for these schemes. SMOTE300ENN, one of t...
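As background for the SMOTE-based schemes mentioned above, here is a minimal sketch of the core SMOTE idea only: a synthetic minority sample is placed at a random point on the line between a minority sample and one of its nearest minority neighbours. This is not any of the five enhanced schemes or the WEKA filter from the study; the function name `smote`, the parameter `k`, and the toy points are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen base point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class: four 2-D feature vectors.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority_points, n_new=5)
print(new_points)
```

Each synthetic point lies between two real minority samples, so the minority region is filled in rather than merely duplicated, which is the property the oversampling step relies on.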

Alleviating Classification Problem of Imbalanced Dataset

The class imbalance problem occurs when there are many more instances of some classes than of others, i.e. a skewed class distribution. In cases like this, a standard classifier tends to be overwhelmed by the majority class and ignores the minority class. It is one of the 10 challenging problems of data mining research and pattern recognition. An imbalanced dataset degrades the performance of the classifier, as accuracy is biased towards the majority class. Several techniques have been proposed to solve this problem. This paper aims to improve the true positive rate (detection) of the minority class (GDM), which is the class of interest. This study proposes the use of two under-sampling techniques reported in the literature. They involve under-sampling the majority class, which balances the dataset before classification. These under-sampling schemes were evaluated on three learning algorithms (decision tree, both pruned and unpruned, and RIPPER) using Matthews Correlation Coefficient (MCC) a...
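The abstract does not name the two under-sampling techniques, so the sketch below uses plain random under-sampling as an assumed stand-in, followed by the standard formula for the Matthews Correlation Coefficient. The function names and the toy labels are hypothetical.

```python
import math
import random


def random_undersample(samples, labels, majority_label, seed=0):
    """Drop random majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    majority = [i for i, y in enumerate(labels) if y == majority_label]
    minority = [i for i, y in enumerate(labels) if y != majority_label]
    kept = rng.sample(majority, min(len(minority), len(majority)))
    index = sorted(kept + minority)
    return [samples[i] for i in index], [labels[i] for i in index]


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy data: 8 majority rows (label 0) and 2 minority rows (label 1).
features = list(range(10))
targets = [0] * 8 + [1] * 2
balanced_x, balanced_y = random_undersample(features, targets, majority_label=0)
print(balanced_y.count(0), balanced_y.count(1))
```

MCC uses all four confusion-matrix cells, so unlike accuracy it does not reward a classifier that simply predicts the majority class for every instance.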

Empirical Comparison of Time Series Data Mining Algorithms

A time series is a sequence of observed data that is usually ordered in time. Time series data mining is the innovative application of the principles and techniques of data mining in the analysis of time series. This research aims to apply data mining techniques to forecasting time series data. The data used are electric power consumption by Nigerians from 2001 to 2017. Experiments are conducted with four data mining techniques: Random Regression Forest (RRF), Linear Regression (LR), Support Vector Regression (SVR) and Artificial Neural Network (ANN), which were evaluated on the Waikato Environment for Knowledge Analysis (WEKA) platform based on their forecasting errors, namely Mean Absolute Error (MAE), Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE), and on prediction accuracy. The combination of parameters that yields the best results in terms of predefined performance criteria was chosen as optimal for each regressor. A comparative analysis of the regressors’ perfo...
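The three error measures named above can be sketched in a few lines. The functions follow the standard textbook definitions; the consumption figures are made-up numbers for illustration and are not taken from the study.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(
        abs((a - p) / a) for a, p in zip(actual, predicted)
    ) / len(actual)


# Hypothetical consumption values and forecasts.
observed = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 380.0]
print(mae(observed, forecast), rmse(observed, forecast), mape(observed, forecast))
```

RMSE penalises large errors more heavily than MAE, while MAPE is scale-free, which is why comparisons of regressors often report all three.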

ORIN: The Nigerian Music Dataset

Music Information Retrieval (MIR) is the task of extracting higher-level information such as genre, artist or instrumentation from music [1]. Music genre classification is an important area of MIR and a rapidly evolving research area. Presently, little research work has been done on automatic music genre classification of Nigerian songs. Hence, this study presents a new music dataset named ORIN, which mainly covers traditional Nigerian songs of four genres (Fuji, Juju, Highlife and Apala). The ORIN dataset consists of 208 Nigerian songs downloaded from the internet. Timbral texture features were then extracted from one or two 30-second segments of each song using the Librosa [2] Python library. Song features were extracted directly from the digital song files. Each song was sampled as 22.5 kHz, 16-bit mono audio. The song signal was then split into frames of 1024 samples with 50% overlap between successive frames. A Hamming window is applied to each frame without pre-emphasis. Then, 29 averaged spectral energies are obtained from a bank of 29 Mel triangular filters, followed by a DCT, yielding 20 Mel Frequency Cepstral Coefficients (MFCC). For each of the spectral features, the mean and standard deviation of the values taken across frames are used as the representative final feature that is fed to the model. These features comprise the time (FFT) and frequency (MFCC) domain feature sets of the dataset.
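The framing and summarising steps described above can be sketched without any audio library. The code below is an illustration under stated assumptions, not the authors' Librosa pipeline: it uses the symmetric textbook Hamming formula, a synthetic sine tone instead of a song, and per-frame energy as a stand-in for a spectral feature. The Mel filter bank and DCT stages are omitted.

```python
import math

FRAME_LEN = 1024          # samples per frame, as stated in the abstract
HOP = FRAME_LEN // 2      # 50% overlap between successive frames


def hamming(n):
    """Symmetric Hamming window of length n (textbook form, an assumption)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def windowed_frames(signal, frame_len=FRAME_LEN, hop=HOP):
    """Split the signal into overlapping frames and apply the Hamming window."""
    window = hamming(frame_len)
    return [
        [s * w for s, w in zip(signal[start:start + frame_len], window)]
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def mean_std(values):
    """Mean and (population) standard deviation across frames."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)


# Synthetic 440 Hz tone; 22500 Hz follows the sampling rate given in the text.
sample_rate = 22500
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(4096)]
frames = windowed_frames(tone)
energies = [sum(x * x for x in frame) for frame in frames]
print(len(frames), mean_std(energies))
```

Summarising each per-frame feature by its mean and standard deviation turns a variable-length song segment into a fixed-length vector, which is what a conventional classifier needs as input.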
