Bhekisipho Twala | Tshwane University of Technology

Papers by Bhekisipho Twala

Research paper thumbnail of A New Imputation Method for Small Software Project Data Sets

Effort prediction is a very important issue for software project management. Historical project data sets are frequently used to support such prediction, but they often contain missing data, which makes prediction more difficult. One common practice is to ignore cases with missing data, but this makes an already small software project database even smaller and can further decrease prediction accuracy. The alternative is missing data imputation, for which many methods exist. However, small size (in terms of the number of cases) is usually an important characteristic of such software data sets, and sophisticated imputation methods favour larger data sets, so we explore simple methods for imputing missing data in small project effort data sets. To this end we propose MINI, a class mean imputation (CMI) based k-NN hot deck imputation method, to impute both continuous and categorical missing data in small data sets. We use an incrementa...
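
The CMI/k-NN hot deck combination lends itself to a compact illustration. Below is a minimal Python sketch, not the authors' implementation: the function name `mini_impute`, the Euclidean distance over observed numeric attributes, and the fall-back rule when no same-class donor exists are all assumptions.

```python
import numpy as np
import pandas as pd

def mini_impute(df, class_col, k=3):
    """Toy CMI/k-NN hot-deck hybrid: impute each missing value from the
    k nearest complete cases that share the recipient's class."""
    out = df.copy()
    num_cols = [c for c in df.columns
                if c != class_col and pd.api.types.is_numeric_dtype(df[c])]
    for idx, row in df[df.isna().any(axis=1)].iterrows():
        # donors: complete cases in the same class (the "CMI" part)
        donors = df[(df[class_col] == row[class_col])
                    & df.drop(columns=class_col).notna().all(axis=1)]
        if donors.empty:
            continue  # assumed fall-back: leave value missing
        # distance on the numeric attributes the recipient does have
        obs = [c for c in num_cols if pd.notna(row[c])]
        d = (((donors[obs] - row[obs]) ** 2).sum(axis=1) ** 0.5
             if obs else pd.Series(0.0, index=donors.index))
        nearest = donors.loc[d.nsmallest(min(k, len(donors))).index]
        for c in df.columns:
            if pd.isna(row[c]):
                # mean of donors for continuous, mode for categorical
                out.at[idx, c] = (nearest[c].mean() if c in num_cols
                                  else nearest[c].mode().iloc[0])
    return out

df = pd.DataFrame({"effort": [10, 12, np.nan, 30, 33],
                   "size":   [1.0, np.nan, 1.2, 3.0, 3.1],
                   "lang":   ["C", "C", None, "Java", "Java"],
                   "class":  ["small", "small", "small", "large", "large"]})
print(mini_impute(df, class_col="class"))
```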

Research paper thumbnail of A Robust Machine Learning Approach to Microprocessor Instructions Identification

Since the first publication, side channel leakage has been widely used for the purposes of extracting secret information, such as cryptographic keys, from embedded devices. However, in a few instances it has been utilised for extracting other information about the internal state of a computing device. In this paper, we show how to create a robust instruction-level side channel leakage profile of an embedded processor. Using the profile, we show how to extract executed instructions from the device's leakage with good accuracy. In addition, we provide a comparison between several performance and recognition enhancement tools.
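
The profiling step can be pictured as ordinary supervised classification over labelled trace windows. The sketch below uses synthetic "leakage" and a k-NN model as stand-ins; neither the data generation nor the model choice comes from the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
instructions = ["MOV", "ADD", "XOR", "LDR"]
# pretend each instruction leaks a slightly shifted mean trace (assumption)
X = np.vstack([rng.normal(loc=i, scale=2.0, size=(200, 50))
               for i in range(len(instructions))])
y = np.repeat(instructions, 200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
profile = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)  # the "profile"
print("recognition rate:", profile.score(X_te, y_te))
```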

Research paper thumbnail of Global optimisation using Pareto cuckoo search algorithm

International Journal of Advanced Computer Research, 2017

It is important to reach an optimal way of doing things in the real world because resources are normally limited. Optimisation is about reaching better results with the resources available. Examples of optimisation include, but are not limited to, electricity network operation, electricity generation, wireless communications routing and minimisation of energy losses during electricity transmission. Proper validation of optimisation algorithms requires assessment of computational time and convergence rate, in addition to the accuracy of the minimum or maximum values found [1-7]. Cuckoo search (CS) algorithms have proved to be more effective than other nature-inspired algorithms in solving complex problems [8, 9]. The original CS algorithm derives its step sizes from the Levy probability distribution function, but some researchers have improved the performance of CS by using different probability distributions to determine step sizes. The first such study (in 2012) was by Zheng and Zhou [10], who used the Gauss distribution instead of the Levy distribution. When applied to find the global minimum values of 6 mathematical test functions, the Gauss CS performed better than the Levy CS in all cases. The Gauss and Levy CS algorithms were also used to solve an engineering design optimisation problem; the results further confirmed that Gauss CS is better than Levy CS in terms of convergence rate, with the average number of generations reduced from 20.15 to 13.95 for Gauss CS. The rapidly growing volume of documents on the internet poses challenges, especially in the document retrieval process. Zaw and Mon [11] addressed web document clustering using a Gauss-based CS algorithm. The algorithm was tested on 3 clusters and 300 documents, and the results confirmed that the Gauss CS algorithm outperformed the Levy CS algorithm: the convergence rates of Gauss CS and Levy CS were 120 and 160 iterations, respectively. The quality of clustering was measured by a combination of precision and recall, called the F-measure, where a high F-measure indicates high accuracy. The Gauss...
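
The contrast the survey draws, Levy-flight step sizes (the original CS rule) versus Gaussian step sizes (the Zheng and Zhou variant), can be sketched directly. Mantegna's algorithm below is a standard way to sample Levy-stable steps; the beta and sigma values are illustrative assumptions.

```python
import math
import numpy as np

def levy_steps(n, beta=1.5, seed=0):
    """Levy-flight steps via Mantegna's algorithm (heavy-tailed)."""
    rng = np.random.default_rng(seed)
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
               ) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, n)
    v = rng.normal(0.0, 1.0, n)
    return u / np.abs(v) ** (1 / beta)   # occasional very long jumps

def gauss_steps(n, sigma=1.0, seed=0):
    """Gaussian steps (light-tailed): steadier, more local moves."""
    return np.random.default_rng(seed).normal(0.0, sigma, n)

print("largest |Levy| step :", np.abs(levy_steps(10_000)).max())
print("largest |Gauss| step:", np.abs(gauss_steps(10_000)).max())
```

The heavy tail of the Levy rule produces rare long jumps that aid global exploration, while Gaussian steps concentrate the search locally; which behaviour converges faster depends on the problem.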

Research paper thumbnail of Practical Techniques for Securing the Internet of Things (IoT) Against Side Channel Attacks

Studies in Big Data, 2017

As a global infrastructure with the aim of enabling objects to communicate with each other, the Internet of Things (IoT) is being widely used and applied to many critical applications. At the same time, the introduction of IoT could also expose Information and Communication Technology (ICT) environments to new security threats, such as side channel attacks, due to increased openness. Side-channel analysis is known to be a serious threat to embedded devices: side-channel (or power) analysis attempts to expose a device's cryptographic keys through the evaluation of leakage information that emanates from a physical implementation. In the work presented herein, it is shown that a skilful attacker can take advantage of side channel analysis to break a 3DES implementation on an FPGA platform. Because of the threats posed by side channel analysis to ICT systems in general and IoT in particular, countermeasures in the form of leakage reduction techniques applicable to CMOS devices are proposed and evaluated. The modelling results revealed that building CMOS devices with high-κ dielectrics or adding strain in silicon during device fabrication could drastically reduce leakages in CMOS devices and therefore assist in designing more effective countermeasures against side channel analysis.

Research paper thumbnail of Unsupervised learning for robust Bitcoin fraud detection

2016 Information Security for South Africa (ISSA), 2016

The rampant adoption of Bitcoin as a cryptographic currency, along with rising cybercrime activities, warrants the use of anomaly detection to identify potential fraud. Anomaly detection plays a pivotal role in data mining since most outlying points contain crucial information for further investigation. In the financial world, of which the Bitcoin network is part by default, anomaly detection amounts to fraud detection. This paper investigates the use of trimmed k-means, which is capable of simultaneous clustering of objects and fraud detection in a multivariate setup, to detect fraudulent activity in Bitcoin transactions. The proposed approach detects more fraudulent transactions than similar studies or reports on the same dataset.
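
Trimmed k-means is simple enough to sketch: a Lloyd-style loop that discards the alpha fraction of points farthest from their nearest centroid before each centroid update, so the discarded points become the anomaly (fraud) candidates. This is a generic illustration on synthetic data, not the paper's exact procedure or dataset.

```python
import numpy as np

def trimmed_kmeans(X, k=3, alpha=0.05, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    X = np.asarray(X, float)
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    keep = np.ones(len(X), bool)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)  # (n, k)
        nearest, dist = d.argmin(1), d.min(1)
        cut = np.quantile(dist, 1 - alpha)   # trim the farthest alpha
        keep = dist <= cut
        for j in range(k):                   # update centroids on kept points
            pts = X[keep & (nearest == j)]
            if len(pts):
                centers[j] = pts.mean(0)
    return centers, ~keep                    # mask of trimmed outliers

X = np.vstack([np.random.default_rng(1).normal(m, 1, (200, 2))
               for m in (0, 5, 10)])
centers, outliers = trimmed_kmeans(X, k=3)
print("flagged as anomalous:", outliers.sum(), "of", len(X))
```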

Research paper thumbnail of An electromagnetic approach to smart card instruction identification using machine learning techniques

2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), 2017

Since the first publication, side channel leakage has been widely used for the purposes of extracting secret information, such as cryptographic keys, from embedded devices. However, in a few instances it has been utilized for extracting other information about the internal state of a computing device. In this paper, we show how to create a robust instruction-level side channel leakage profile of an embedded processor. Using the electromagnetic profile, we show how to extract executed instructions from a smart card's leakage with good accuracy. In addition, we provide a comparison between several performance and recognition enhancement tools.

Research paper thumbnail of Estimating “Teaching Effectiveness” using Classification

In this paper we present the application of three classification methods to an educational database. The measures were collected in an educational context, especially from classroom visits. These data mining techniques process quantified acts and behaviours to estimate a global characteristic of teaching effectiveness: the teacher's "ability to change the course of events". A database collected by the second author (an inspector-researcher) during his professional classroom visits to more than 200 teachers is exploited. An interactive grid gathering 63 educational acts and behaviours was conceived as an observation instrument for those visits. Within the WEKA environment, and with progressive enhancement of our database, the three classification methods provide rates of correctly classified instances that exceed 94%.
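
The study runs its three classifiers inside WEKA; as a hedged analogue, the scikit-learn sketch below cross-validates three common classifiers on a stand-in dataset. The classifier choices and the synthetic data are assumptions, not the paper's actual 63-behaviour grid.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# 63 features mirrors the 63-item observation grid; everything else is synthetic
X, y = make_classification(n_samples=200, n_features=63, random_state=0)
for clf in (GaussianNB(), KNeighborsClassifier(), DecisionTreeClassifier()):
    scores = cross_val_score(clf, X, y, cv=10)
    print(type(clf).__name__, "mean accuracy:", scores.mean().round(3))
```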

Research paper thumbnail of Techniques applied in design optimization of parallel manipulators

There are some key advantages associated with parallel robots, which have warranted their continued research and wide application in both university laboratories and industry. To obtain a parallel manipulator with good properties, as customized by user specifications, the design parameters of a parallel manipulator must be optimized. The optimal design of a general parallel manipulator's kinematic parameters may be decomposed into two processes: structural synthesis and dimensional synthesis. Although both processes are described, the scope of this paper explores the dimensional synthesis aspect. Historical optimization methods adopted by researchers are discussed. This paper presents dimensional synthesis approaches based on performance requirements that have the potential to obtain almost all feasible design solutions satisfying the requirements. The optimal design problem is a constrained nonlinear optimization problem with no explicit analytical expression. This ...
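
Because the objective has no closed form and is only available by evaluation, a black-box constrained solver is the natural tool. A hedged toy sketch with SciPy's SLSQP follows; the "performance" function and the link-length constraint are invented placeholders, not the paper's model.

```python
import numpy as np
from scipy.optimize import minimize

def performance(x):
    """Stand-in for a simulated dexterity/workspace index (assumption)."""
    a, b = x
    return -(np.sin(a) * np.cos(b) + 0.1 * a * b)   # negate to maximise

# hypothetical design constraint: total link length a + b <= 3
cons = ({"type": "ineq", "fun": lambda x: 3.0 - (x[0] + x[1])},)
res = minimize(performance, x0=[1.0, 1.0], method="SLSQP",
               bounds=[(0.1, 2.0), (0.1, 2.0)], constraints=cons)
print("optimal link parameters:", res.x, "| index:", -res.fun)
```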

Research paper thumbnail of Reverse engineering smart card malware using side channel analysis with machine learning techniques

2016 IEEE International Conference on Big Data (Big Data), 2016

From inception, side channel leakage has been widely used for the purposes of extracting secret information, such as cryptographic keys, from embedded devices. However, in a few instances it has been utilized for extracting other information about the internal state of a computing device. In this paper, we exploit side channel information to recover large parts of the Sykipot malware program executed on a smart card. We present the first methodology to recover the program code of a smart card malware by evaluating its power consumption only. Besides well-studied methods from side channel analysis, we apply a combination of dimensionality reduction techniques, in the form of PCA and LDA models, to compress the large amount of data generated while preserving as much of the variance of the original data as possible. Among feature extraction techniques, PCA and LDA are very common dimensionality reduction algorithms that have been applied successfully in many classification problems, such as face recognition, character recognition and speech recognition, with the chief objective being to eliminate insignificant data (without losing too much information) during the pre-processing step. In addition to quantifying the potential of the created side channel based disassembler, we highlight its diverse and unique application scenarios.
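
The PCA-then-LDA compression step described above is easy to sketch: PCA removes low-variance directions from the raw traces, and LDA then projects onto class-discriminative axes before classification. Synthetic traces and the component counts below are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_classes, per_class, trace_len = 8, 100, 500
# synthetic stand-in for power traces, one class per "instruction"
X = np.vstack([rng.normal(i * 0.3, 1.0, (per_class, trace_len))
               for i in range(n_classes)])
y = np.repeat(np.arange(n_classes), per_class)

model = make_pipeline(PCA(n_components=40),          # keep dominant variance
                      LinearDiscriminantAnalysis())  # <= n_classes-1 dims
model.fit(X, y)
print("training accuracy:", model.score(X, y))
```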

Research paper thumbnail of Accessing Imbalance Learning Using Dynamic Selection Approach in Water Quality Anomaly Detection

Symmetry

Automatic anomaly detection monitoring plays a vital role in water utilities' distribution systems to reduce the risk posed by unclean water to consumers. One of the major problems in anomaly detection is imbalanced datasets. Dynamic selection techniques combined with ensemble models have proven to be effective for classification tasks on imbalanced datasets. In this paper, water quality anomaly detection is formulated as a classification problem in the presence of class imbalance. To tackle this problem, considering the asymmetric distribution between the majority and minority classes, the performance of sixteen previously proposed single and static ensemble classification methods embedded with resampling strategies is first optimised and compared. After that, six dynamic selection techniques, namely Modified Class Rank (Rank), Local Class Accuracy (LCA), Overall-Local Accuracy (OLA), K-Nearest Oracles Eliminate (KNORA-E), K-Nearest Oracles Union (KNORA-U) and Meta-Learnin...
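
One of the named schemes, Overall-Local Accuracy (OLA), is simple enough to sketch from scratch: for each query, pick the base classifier that is most accurate on the query's k nearest validation neighbours. The pool composition, k, and the imbalanced synthetic data below are assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1500, weights=[0.9, 0.1],
                           random_state=0)          # imbalanced classes
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.5,
                                              random_state=0)
X_val, X_te, y_val, y_te = train_test_split(X_rest, y_rest, test_size=0.5,
                                            random_state=0)

pool = [DecisionTreeClassifier(max_depth=d, random_state=0).fit(X_tr, y_tr)
        for d in (1, 3, 5, None)]                    # hypothetical pool
nn = NearestNeighbors(n_neighbors=7).fit(X_val)

preds = []
for x in X_te:
    _, idx = nn.kneighbors([x])                      # region of competence
    rX, ry = X_val[idx[0]], y_val[idx[0]]
    best = max(pool, key=lambda c: (c.predict(rX) == ry).mean())
    preds.append(best.predict([x])[0])
print("OLA accuracy:", (np.array(preds) == y_te).mean())
```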

Research paper thumbnail of Assessing the Quality and Cleaning of a Software Project Dataset: An Experience Report

OBJECTIVE: The aim is to report upon an assessment of the impact noise has on predictive accuracy by comparing noise handling techniques. METHOD: We describe the process of cleaning a large software management dataset initially comprising more than 10,000 projects. The data quality is assessed mainly through feedback from the data provider and manual inspection of the data. Three methods of noise correction (polishing, noise elimination and robust algorithms) are compared with each other in terms of their accuracy. Noise detection was undertaken using a regression tree model. RESULTS: The three noise correction methods are compared and differences in their accuracy were noted. CONCLUSIONS: The results demonstrate that polishing improves classification accuracy compared to the noise elimination and robust algorithm approaches.
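
Two of the strategies compared here, elimination (drop suspect cases) and polishing (replace the suspect value with the model's prediction), can be contrasted in a few lines using a regression tree as the noise detector, as the abstract describes. The residual threshold and the synthetic data are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, (300, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 0.3, 300)
y[rng.choice(300, 30, replace=False)] += rng.normal(0, 15, 30)  # inject noise

tree = DecisionTreeRegressor(max_depth=4).fit(X, y)
resid = np.abs(y - tree.predict(X))
suspect = resid > 3 * np.median(resid)      # crude noise detector (assumption)

y_eliminated = y[~suspect]                  # elimination: drop noisy cases
y_polished = y.copy()
y_polished[suspect] = tree.predict(X[suspect])  # polishing: correct them
print("flagged", suspect.sum(), "of", len(y), "cases;",
      "eliminated set size:", y_eliminated.size)
```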

Research paper thumbnail of Impact of noise on credit risk prediction: Does data quality really matter?

Intelligent Data Analysis

Machine learning has been used successfully for credit-evaluation decisions. Most research on machine learning assumes that the attributes of training and test instances are not only completely specified but also free from noise. Real-world data, however, often suffer from corruption or noise that is not always known; this is at the heart of information-based credit risk models. Blindly applying machine learning techniques to noisy financial credit risk evaluation data may therefore fail to produce good predictions. Unfortunately, despite extensive research over the last decades, the impact of poor data quality, especially noise, on the accuracy of credit risk prediction has attracted little attention, even though it remains a significant problem for many. This paper investigates the robustness of five supervised machine learning algorithms in noisy credit risk environments. In particular, we show that when noise is added to four real-world credit risk domains, a significant and disproportionate number of the total errors are contributed by class noise compared to attribute noise; thus, in the presence of noise, it is noise on the class variable that is responsible for the poor predictive accuracy of the learning concept.
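
The class-noise versus attribute-noise comparison can be reproduced in miniature: corrupt the same fraction of labels or of feature values and compare the accuracy drop of one learner. The dataset, the learner, and the 20% corruption rate below are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rng, rate = np.random.default_rng(0), 0.20

def fit_score(Xn, yn):
    return LogisticRegression(max_iter=1000).fit(Xn, yn).score(X_te, y_te)

y_cls = y_tr.copy()                        # class noise: flip labels
flip = rng.random(len(y_cls)) < rate
y_cls[flip] = 1 - y_cls[flip]

X_att = X_tr.copy()                        # attribute noise: perturb features
mask = rng.random(X_att.shape) < rate
X_att[mask] += rng.normal(0, X_tr.std(), mask.sum())

print("clean          :", fit_score(X_tr, y_tr))
print("class noise    :", fit_score(X_tr, y_cls))
print("attribute noise:", fit_score(X_att, y_tr))
```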

Research paper thumbnail of Sources Of Spending Variation In Professional Services Among Texas Hospital Referral Regions: An Analysis Of Private Insurance Population

Research paper thumbnail of Determining Distribution Power System Loading Measurements Accuracy Using Fuzzy Logic

Procedia Manufacturing, 2016

Research paper thumbnail of A New Imputation Method for Small Software Project Data Sets

Effort prediction is a very important issue for software project management. Historical project data sets are frequently used to support such prediction, but missing data are often contained in these data sets and this makes prediction more difficult. One common ...

Research paper thumbnail of Reasoning with Noisy Software Effort Data

Applied Artificial Intelligence, Jul 14, 2014

Constructing an accurate effort prediction model remains a challenge in software engineering. Recently, machine learning classifiers have been used successfully for software effort evaluation decisions. However, the development and validation of these classifiers (and other models) require good quality data. Most research on machine learning assumes that the attributes of training and test instances are not only completely specified but also free from noise. Real-world industrial datasets, however, suffer from corruption or noise that is not always known, and blindly applying machine learning techniques to noisy software effort evaluation data may fail to produce good predictions, resulting in poor decisions and ineffective project management. This article investigates the effect of noisy domains on the learning accuracy of eight machine learning and statistical pattern recognition algorithms. We further derive solutions to the problem of noisy domains in software effort prediction from a probabilistic point of view. Our experimental results show that our algorithm can improve prediction for software effort data corrupted by noise, with reasonable and much improved accuracy.

Research paper thumbnail of First principle leakage current reduction technique for CMOS devices

2015 International Conference on Computing, Communication and Security (ICCCS), 2015

Research paper thumbnail of Leakage current minimisation and power reduction techniques using sub-threshold design

2015 International Conference on Information Society (i-Society), 2015

Research paper thumbnail of Simulation and parameter optimization of polysilicon gate biaxial strained silicon MOSFETs

2015 Fifth International Conference on Digital Information Processing and Communications (ICDIPC), 2015

Research paper thumbnail of Data Mining for Teaching and Learning Effectiveness Enhancement

This is a data mining research project that contributes to improving teaching and learning effectiveness within academic institutions by applying intelligent algorithms to collected databases for educational knowledge extraction. These teaching and learning databases are accumulated from quantitative "measures" made through classroom visits within academic institutions, learners' answers to online questionnaires, analysis of written statements from academic exams in STEM education (science, technology, engineering, and mathematics), and online capture of elementary grades from written traces of learners' performance in STEM exams. Findings from these processes, elaborated by the research team at the University of Johannesburg, are disseminated through diverse publications in Tunisia and South Africa, and are the subject of multiple professional meetings, especially teacher training sessions. The project's data mining strategy in educa...
