Omar Shiba - Academia.edu
Papers by Omar Shiba
Data mining (also known as knowledge discovery from databases) is the process of extracting hidden, previously unknown and potentially useful information from databases. Data mining has become progressively more interesting and popular across virtually all application areas, and it can draw on both large and small data sets to extract knowledge. The main aim of data mining software is to allow users to examine their data. This review paper introduces topics related to the steps of data mining and describes how to use the WEKA tool, which provides various technologies and facilities for classifying data through a range of algorithms.
The International Journal of the Computer, the Internet and Management, 2006
One of the most important tasks that we face in real-world applications is classifying particular situations and/or events as belonging to a certain class. To solve the classification problem, accurate classifier systems or models must be built, and several computational intelligence methodologies have been applied to construct such classifiers from particular cases or data. This paper introduces a new classification method based on slicing techniques originally proposed for procedural programming languages. The paper also discusses two common classification algorithms used in data mining and in general AI: the Induction of Decision Tree Algorithm (ID3) and the Base Learning Algorithm (C4.5). It then compares the proposed method with these two algorithms across several domains.
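As a hedged illustration of the decision-tree baselines named above: ID3 chooses splits by information gain (C4.5 refines this with gain ratio). The toy labels and split below are hypothetical, not taken from the paper's experiments.

```python
# Sketch of the information-gain criterion at the heart of ID3.
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (c / total) * math.log2(c / total) for c in Counter(labels).values()
    )

def information_gain(labels, partitions):
    """Gain from splitting `labels` into the given label partitions."""
    total = len(labels)
    remainder = sum(len(p) / total * entropy(p) for p in partitions)
    return entropy(labels) - remainder

labels = ["yes", "yes", "no", "no"]
# A perfect split separates the classes completely: gain = 1 bit.
print(information_gain(labels, [["yes", "yes"], ["no", "no"]]))  # → 1.0
```

ID3 evaluates this gain for every candidate attribute and splits on the best one, recursing on each partition.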
Data preprocessing is among the most important steps in data mining, and discretization is one of its key tasks. Many discretization methods are available; they reduce the number of values of a given continuous attribute by dividing its range into intervals, which can make learning more accurate and faster. This paper investigates the effect of discretizing continuous attributes in a dataset on the performance of the Case Slicing Technique as a classification method, and compares it with another classification approach, a generic version of the VDM algorithm: the Discretized Value Difference Metric (DVDM).
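Dividing an attribute's range into intervals can be sketched with equal-width binning, one common discretization scheme (the paper does not specify which method it uses; the ages and bin count below are hypothetical examples):

```python
# Illustrative sketch only: equal-width discretization of a continuous
# attribute into a fixed number of intervals.

def equal_width_bins(values, n_bins):
    """Map each continuous value to an interval index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    bins = []
    for v in values:
        # Clamp the maximum value into the last interval.
        idx = min(int((v - lo) / width), n_bins - 1)
        bins.append(idx)
    return bins

ages = [23.0, 31.5, 45.2, 52.0, 67.8, 71.1]
print(equal_width_bins(ages, 3))  # → [0, 0, 1, 1, 2, 2]
```

After this step a learner sees three symbolic interval labels instead of six distinct real values, which is what makes metrics like DVDM applicable to continuous attributes.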
Several classification techniques are designed to discover classifications when they are unknown; the techniques are then tested and evaluated by matching the classifications they recover against the expected ones, and several such techniques may be compared by experimentally evaluating their performance on the same datasets. The goal of this paper is to evaluate the case slicing technique as a new classification technique, in three steps. Firstly, it introduces the case slicing technique as a new approach. Secondly, it presents applications of this technique on several datasets. Lastly, it compares the proposed approach with other selected approaches, K-Nearest Neighbour (K-NN), the Base Learning Algorithm (C4.5) and the Naive Bayes classifier (NB), in solving classification problems. The results obtained show that the proposed approach is a promising method for solving decision-making problems.
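The K-NN comparator mentioned above can be sketched in a few lines, assuming the standard formulation (Euclidean distance, majority vote); the training points and query are hypothetical toy data, not from the paper:

```python
# Minimal sketch of the K-Nearest Neighbour (K-NN) baseline.
import math
from collections import Counter

def knn_classify(train, query, k):
    """train: list of (feature_vector, label) pairs; query: feature vector."""
    # Sort all training points by distance to the query.
    dists = sorted((math.dist(x, query), label) for x, label in train)
    # Majority vote among the k nearest labels.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((5.0, 5.0), "B"), ((5.5, 4.8), "B")]
print(knn_classify(train, (1.1, 0.9), 3))  # → A
```

Because K-NN's distance sums over every feature, irrelevant features dilute it, which is exactly why feature-subset methods like the case slicing technique are compared against it.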
Finding a good classification algorithm is an important component of many data mining projects; data mining researchers often use classifiers to identify important classes of objects within a data repository. The goal of this paper is to improve case classification accuracy in data mining. It achieves this by introducing a new similarity-based retrieval approach built on program slicing techniques, called the Case Slicing Technique (CST). The proposed approach helps identify the subset of features used to compute the similarity measures needed by classification algorithms; the idea is to slice cases with respect to a slicing criterion. The paper presents experimental results for CST on five real-world datasets: Australian Credit Application (AUS), Cleveland Heart Disease (CLEV), Breast Cancer (BCO), German Credit Card (GERM) and Hepatitis Domain (HEPA). The paper compares CST with other selected approaches. The results...
One of the problems addressed by machine learning is data classification. Finding a good classification algorithm is an important component of many data mining projects, and since the 1960s many algorithms for data classification have been proposed; data mining researchers often use classifiers to identify important classes of objects within a data repository. This research undertakes two main tasks. The first is to introduce a slicing technique for feature subset selection. The second is to enhance classification accuracy based on the first task, so that objects or cases can be classified using only selected relevant features. This new approach is called the Case Slicing Technique (CST). Applying this technique to the classification task can further enhance case classification accuracy: CST helps identify the subset of features used to compute the similarity measures needed by classification algorithms. CST was tested on nine datasets from the UCI machine learning repositories and domain theories. The maximum and minimum accuracies obtained were 99% and 96% respectively, based on the evaluation approach: k-fold cross-validation, the most commonly used evaluation technique, with k = 10, has been used in this thesis to evaluate the proposed approach. CST was compared to other selected classification methods based on feature subset selection, such as the Induction of Decision Tree Algorithm (ID3), the Base Learning Algorithm (C4.5), the K-Nearest Neighbour Algorithm (k-NN) and the Naïve Bayes Algorithm (NB). All these approaches are implemented with the RELIEF feature selection approach.
The classification accuracy obtained from the CST method is compared to other selected classification methods such as Value Difference Metric (VDM), Pre-Category Feature Importance (PCF), Cross-Category Feature Importance (CCF), Instance-Based Algorithm (IB4), Decision Tree Algorithms such as Induction of Decision Tree Algorithm (ID3) and Base Learning Algorithm (C4.5), Rough Set methods such as Standard Integer Programming (SIP) and Decision Related Integer Programming (DRIP) and Neural Network methods such as the Multilayer method.
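The k-fold cross-validation protocol used in the thesis (k = 10) partitions the data into k folds, holds out one fold for testing per round, and averages the per-fold accuracy. A minimal index-splitting sketch, with a smaller hypothetical n and k for readability:

```python
# Rough sketch of k-fold cross-validation index splitting.

def k_fold_indices(n, k):
    """Yield (test_indices, train_indices) for each of the k folds."""
    fold_size = n // k
    idx = list(range(n))
    for i in range(k):
        start = i * fold_size
        # The last fold absorbs any remainder when k does not divide n.
        end = start + fold_size if i < k - 1 else n
        test = idx[start:end]
        train = idx[:start] + idx[end:]
        yield test, train

folds = list(k_fold_indices(20, 5))
print(len(folds))       # → 5
print(folds[0][0])      # → [0, 1, 2, 3]
```

Each case serves as test data exactly once, so the averaged accuracy uses every example without ever testing on training data.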
The selection of an optimal feature subset for classification has become an important issue in the data mining field. We propose a feature selection scheme based on a slicing technique originally proposed for programming languages, called the Case Slicing Technique (CST). Slicing means that we are interested in automatically obtaining the portion (features) of a case responsible for specific parts of that case's solution. We show that the goal should be to reduce the number of features by removing irrelevant ones: choosing a subset of the features may increase accuracy and reduce the complexity of the acquired knowledge. Our experimental results indicate that CST performs better as a feature subset selection method than the approaches most commonly used for this task: RELIEF with the Base Learning Algorithm (C4.5), RELIEF with K-Nearest Neighbour (K-NN), RELIEF with the Induction of Decision Tree Algorithm (ID3) and RELIEF with Naïve Bayes (NB).
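The general idea of dropping irrelevant features before computing similarity can be illustrated with a crude filter. This is NOT the CST algorithm (CST slices cases with respect to a slicing criterion); it is a hypothetical count-based stand-in that keeps only features whose value distribution differs between classes:

```python
# Generic illustration of feature subset selection: drop features whose
# values carry no information about the class label.
from collections import defaultdict

def relevant_features(rows, labels):
    """Return indices of features whose value sets differ across classes."""
    n_features = len(rows[0])
    keep = []
    for f in range(n_features):
        by_class = defaultdict(set)
        for row, y in zip(rows, labels):
            by_class[y].add(row[f])
        # A feature is irrelevant here if every class sees the same values.
        value_sets = list(by_class.values())
        if any(vs != value_sets[0] for vs in value_sets[1:]):
            keep.append(f)
    return keep

rows = [(1, 0, 7), (1, 1, 7), (0, 0, 7), (0, 1, 7)]
labels = ["pos", "pos", "neg", "neg"]
print(relevant_features(rows, labels))  # → [0]: only feature 0 separates
```

Feature 1 varies but identically in both classes, and feature 2 is constant; a similarity measure computed over feature 0 alone classifies this toy data perfectly.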
Citeseer
Traditional approaches to classification involve generating a set of rules by induction from training examples, as in decision tree algorithms. These approaches achieve classification by remembering individual instances, as done in case-based systems, or change ...
Proceedings. Student Conference on Research and Development, 2003. SCORED 2003., 2003
Student Conference on Research and Development (SCOReD) 2003 Proceedings, Putrajaya, Malaysia ... Shiba O. A., Saeed W., Sulaiman M. N., Ahmad F. and Mamat A. Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, 43400 UPM ...