Huseyin Seker | Staffordshire University

Research Profile by Huseyin Seker

Research Profile (See http://smartdatacrew.com)

Papers by Huseyin Seker

CnnSound: Convolutional Neural Networks for the Classification of Environmental Sounds

2020 The 4th International Conference on Advances in Artificial Intelligence, 2020

Environmental sound classification (ESC) has been increasingly studied in recent years, mainly because environmental sounds are part of our daily life and associating them with the environment we live in is important in several respects: ESC is used in areas such as smart-city management, location determination from environmental sounds, surveillance systems, machine hearing and environmental monitoring. ESC is, however, more difficult than the classification of other sounds because many factors generate background noise, which makes the sounds harder to model and classify. The main aim of this study is therefore to develop a more robust convolutional neural network (CNN) architecture. For this purpose, 150 different CNN-based models were designed by varying the number of layers and the values of the tuning parameters used in those layers. The accuracy of the models was tested on the UrbanSound8K environmental sound database. The sounds in...
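The abstract does not list the layer counts or tuning values behind the 150 models. As a minimal sketch of how such a family of candidate architectures can be enumerated, the Python below builds a grid over four hypothetical hyperparameters; the values are assumptions chosen only so that the grid contains 150 combinations, and training and evaluating each model (for example on UrbanSound8K) is left out.

```python
from itertools import product

# Hypothetical search space: the abstract only says that the number of layers
# and per-layer tuning parameters were varied; these values are illustrative.
NUM_CONV_LAYERS = [2, 3, 4, 5, 6]
FILTERS = [16, 32, 64]
KERNEL_SIZES = [3, 5]
DROPOUT_RATES = [0.1, 0.2, 0.3, 0.4, 0.5]


def build_configs():
    """Return one dict per candidate CNN architecture in the grid."""
    configs = []
    for layers, filters, kernel, dropout in product(
        NUM_CONV_LAYERS, FILTERS, KERNEL_SIZES, DROPOUT_RATES
    ):
        configs.append(
            {
                "conv_layers": layers,
                "filters": filters,
                "kernel_size": kernel,
                "dropout": dropout,
            }
        )
    return configs


configs = build_configs()
print(len(configs))  # 5 * 3 * 2 * 5 = 150
```

Each dict would then be turned into a model, trained, and scored on held-out folds so that the best-performing architecture can be selected.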

FEPDS: A Proposal for the Extraction of Fuzzy Emerging Patterns in Data Streams

IEEE Transactions on Fuzzy Systems, 2020

Bioinformatics Approach to Classification of Four Classes of Organism in Relation to Their Optimal Growth Temperature

International Journal of Pharma Medicine and Biological Sciences, 2018

Recognition of protozoan parasites from microscopic images: Eimeria species in chickens and rabbits as a case study

Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Jul 1, 2017

Automated diagnosis and identification of diseases and conditions, such as parasites, from microscopic images has mainly been carried out using the morphological characteristics of objects. Extracting morphometric features requires highly complex techniques that demand considerable computational power. To reduce this complexity, this paper presents an automated identification approach based on three groups of pixel-based feature sets: column features (CF), row features (RF), and a third set (CRF) obtained by merging CF and RF. K-Nearest Neighbor (KNN) and Artificial Neural Network (ANN) classifiers were applied for classification, and the results were evaluated using 5-fold cross-validation. Additionally, a robust subset of the features was selected with the ReliefF feature selection method to prevent overfitting, which in turn improved the final results. Two microscopic image slide databases of a type of protozoan...
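One simple reading of the column and row feature sets, assumed here rather than taken from the paper, is the mean pixel intensity of each column (CF) and each row (RF) of a fixed-size grayscale image, with CRF as their concatenation. The sketch below computes these vectors and scores a plain Euclidean k-NN classifier with 5-fold cross-validation; the ANN and ReliefF steps are omitted, and the synthetic images are illustrative only.

```python
import math
from collections import Counter


def column_features(image):
    """CF (assumed form): mean intensity of each column of a 2-D grayscale image."""
    n_rows = len(image)
    return [sum(row[c] for row in image) / n_rows for c in range(len(image[0]))]


def row_features(image):
    """RF (assumed form): mean intensity of each row."""
    return [sum(row) / len(row) for row in image]


def combined_features(image):
    """CRF: CF and RF concatenated into one vector."""
    return column_features(image) + row_features(image)


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    ranked = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def cross_validated_accuracy(xs, ys, folds=5, k=3):
    """k-NN accuracy under interleaved k-fold cross-validation."""
    correct = 0
    for f in range(folds):
        test_idx = [i for i in range(len(xs)) if i % folds == f]
        train_x = [xs[i] for i in range(len(xs)) if i % folds != f]
        train_y = [ys[i] for i in range(len(xs)) if i % folds != f]
        for i in test_idx:
            correct += knn_predict(train_x, train_y, xs[i], k) == ys[i]
    return correct / len(xs)


# Synthetic 4x4 images: bright on the left (class "A") or on the right ("B").
def synthetic_image(left, right, jitter):
    return [[left + jitter, left, right, right + jitter] for _ in range(4)]


images = [synthetic_image(200, 10, j) for j in range(10)] + [
    synthetic_image(10, 200, j) for j in range(10)
]
labels = ["A"] * 10 + ["B"] * 10
features = [combined_features(img) for img in images]
accuracy = cross_validated_accuracy(features, labels, folds=5, k=3)
```

Because CF and RF are just column and row averages, each image is reduced to width + height numbers, which is far cheaper than extracting morphometric descriptors.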

Structural classification of protein sequences based on signal processing and support vector machines

Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Aug 1, 2016

The function of any protein depends directly on its secondary and tertiary structure. Proteins fold into a three-dimensional shape that depends primarily on the arrangement of amino acids in the primary structure. In recent years, with the explosive growth in protein sequencing, detailed experimental studies have become infeasible because these methodologies are very expensive and time-consuming. This leaves the structure of the majority of currently available protein sequences unknown. In this paper, a predictive model is therefore presented for classifying the secondary structures of protein sequences, namely alpha helix and beta sheet. The proteins used throughout this study were collected from the Structural Classification of Proteins, extended (SCOPe) database, which contains manually curated information on proteins of known structure. Two sets of proteins are used for all-alpha and all-beta protein sequences. The first set comprises sequences with less than 40...
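The abstract does not say which numeric encoding or transform the paper uses. A common signal-processing pattern, shown here as an assumption, maps each residue to its Kyte-Doolittle hydropathy value and takes the power spectrum of the resulting signal; amphipathic helices often show a hydrophobicity period of about 3.6 residues and beta strands a period closer to 2, which is why spectral features can help separate all-alpha from all-beta sequences.

```python
import cmath

# Kyte-Doolittle hydropathy scale (assumed encoding, not confirmed by the paper).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_to_signal(seq):
    """Map an amino acid sequence to a numeric hydropathy signal."""
    return [KYTE_DOOLITTLE[aa] for aa in seq.upper()]


def power_spectrum(signal):
    """|X_k|^2 of the discrete Fourier transform for k = 0 .. N//2 (direct O(N^2) DFT)."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        x_k = sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        spectrum.append(abs(x_k) ** 2)
    return spectrum


# Strictly alternating hydrophobic/hydrophilic residues put all power at k = N/2.
spectrum = power_spectrum(sequence_to_signal("IRIR"))
```

Sequences of different lengths give spectra of different lengths, so they would need to be resampled or binned to a common size before being used as SVM inputs.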

Discovering Differences in Gender-related Skeletal Muscle Aging through the Majority Voting-Based Identification of Differently Expressed Genes

International Journal on Bioinformatics & Biosciences, 2016

Reconfiguration-based implementation of SVM classifier on FPGA for Classifying Microarray data

Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2013

Classifying microarray data, which are high-dimensional, requires high computational power. The Support Vector Machine (SVM) classifier is among the most common and successful classifiers used in microarray data analysis, but it also requires high computational power because of its complex mathematical architecture. Implementing the SVM in hardware exploits the parallelism available within the algorithm's kernels to accelerate the classification of microarray data. In this work, a flexible, dynamically and partially reconfigurable implementation of the SVM classifier on a Field Programmable Gate Array (FPGA) is presented. The SVM architecture achieved up to an 85× speed-up over an equivalent general-purpose processor (GPP), showing the capability of FPGAs to enhance the performance of SVM-based analysis of microarray data and of future bioinformatics applications.
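An SVM classifies a sample x with the decision function f(x) = Σ_i α_i y_i K(s_i, x) + b over its support vectors s_i. The kernel evaluations for different support vectors are independent of one another, which is the kind of parallelism a hardware implementation can exploit. The Python below is a sequential reference sketch that assumes an RBF kernel (the abstract does not name the kernel) and hand-set toy support vectors; it is not the FPGA design itself.

```python
import math


def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def svm_decision(x, support_vectors, dual_coefs, bias, gamma):
    """f(x) = sum_i (alpha_i * y_i) * K(s_i, x) + b.

    dual_coefs[i] holds the product alpha_i * y_i for support vector i.
    """
    # Each kernel value depends only on one support vector and x, so these
    # terms can be computed in parallel and summed afterwards.
    kernel_values = [rbf_kernel(sv, x, gamma) for sv in support_vectors]
    return sum(c * k for c, k in zip(dual_coefs, kernel_values)) + bias


def svm_predict(x, support_vectors, dual_coefs, bias, gamma):
    """Class label +1 or -1 from the sign of the decision function."""
    return 1 if svm_decision(x, support_vectors, dual_coefs, bias, gamma) >= 0 else -1


# Toy model with two hypothetical support vectors.
SUPPORT_VECTORS = [[0.0, 0.0], [2.0, 2.0]]
DUAL_COEFS = [1.0, -1.0]
```

For microarray data each support vector has thousands of components, so both the per-vector kernel loop and the inner squared-distance sum are candidates for pipelining in hardware.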

Classification of Influenza Hemagglutinin Protein Sequences using Convolutional Neural Networks

2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), 2021

Overlapping Clusters and Support Vector Machines Based Interval Type-2 Fuzzy System for the Prediction of Peptide Binding Affinity

IEEE Access

In the post-genome era, it is becoming more complex to process biological datasets that are high-dimensional, nonlinear and have few available instances. This paper aims to address these characteristics because they adversely affect the performance of predictive models in bioinformatics. An interval type-2 Takagi-Sugeno fuzzy predictive model is proposed to manage the high dimensionality and nonlinearity that are common features of such datasets. For this purpose, a new clustering framework, based on the overlapping regions between clusters, is proposed to simplify the antecedent operations of an interval type-2 fuzzy system. Cluster analysis of the partitions, together with statistical information derived from them, identifies the upper and lower membership functions that form the premise part. This is further enhanced by adopting the regression version of support vector machines in the consequent part. The proposed method is used in experiments to quantitatively predict the binding affinities of peptides to biomolecules. This case study poses a challenge in post-genome studies and remains an open problem because of the complexity of the biological system, the diversity of peptides, and the curse of dimensionality of the amino acid index representation that characterizes the peptides. On four different peptide binding affinity datasets, the proposed method showed better generalization ability for all of them, yielding improved prediction accuracy of up to 58.2% on unseen peptides in comparison with the predictive methods presented in the literature. Source code of the algorithm is available at https://github.com/sekerbigdatalab. Index Terms: interval type-2 fuzzy systems, support vector regression, overlapping clusters, peptide binding affinity, clustering, high-dimensionality.
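In an interval type-2 fuzzy system each rule fires with an interval [lower, upper] rather than a single strength, produced by its lower and upper membership functions. The sketch below assumes Gaussian antecedents with uncertain width and linear consequents, and reduces the interval output with the Nie-Tan method; the paper derives its membership functions from overlapping clusters and fits consequents by SVR, and its type-reduction method is not stated in the abstract, so the rule parameters here are hypothetical hand-set values.

```python
import math


def gaussian(x, centre, sigma):
    """Type-1 Gaussian membership value."""
    return math.exp(-0.5 * ((x - centre) / sigma) ** 2)


def firing_interval(x, centres, sigma_lower, sigma_upper):
    """Lower and upper firing strengths of one rule (product t-norm).

    Each antecedent is a Gaussian with an uncertain width: the narrower width
    gives the lower membership function, the wider one the upper.
    """
    lower, upper = 1.0, 1.0
    for xi, c, s_lo, s_up in zip(x, centres, sigma_lower, sigma_upper):
        lower *= gaussian(xi, c, s_lo)
        upper *= gaussian(xi, c, s_up)
    return lower, upper


def it2_tsk_predict(x, rules):
    """Interval type-2 TSK output using Nie-Tan type reduction.

    Each rule's linear consequent is weighted by the sum of its lower and
    upper firing strengths, then the weighted average is taken.
    """
    numerator, denominator = 0.0, 0.0
    for rule in rules:
        f_lo, f_up = firing_interval(
            x, rule["centres"], rule["sigma_lower"], rule["sigma_upper"]
        )
        y_rule = sum(a * xi for a, xi in zip(rule["coefs"], x)) + rule["bias"]
        weight = f_lo + f_up
        numerator += weight * y_rule
        denominator += weight
    return numerator / denominator


# Two hypothetical one-input rules with constant consequents 0 and 10.
RULE_A = {"centres": [0.0], "sigma_lower": [0.5], "sigma_upper": [1.0], "coefs": [0.0], "bias": 0.0}
RULE_B = {"centres": [2.0], "sigma_lower": [0.5], "sigma_upper": [1.0], "coefs": [0.0], "bias": 10.0}
```

An input halfway between the two rule centres fires both rules equally, so the output lands midway between their consequents.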

Editorial Message: Special Issue on Efficient Fuzzy Systems for Mining Large Scale, Imprecise, Uncertain and Vague Data

International Journal of Fuzzy Systems

Binding affinity prediction of S. cerevisiae 14-3-3 and GYF peptide-recognition domains using support vector regression

2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2016

Proteins interact with other proteins and biomolecules to carry out biological processes in a cell. Computational models help in understanding the complex biochemical processes that happen throughout the life of a cell. Domain-mediated protein interaction with peptides is one such complex problem in bioinformatics, requiring computational predictive models to identify meaningful bindings. In this study, domain-peptide binding affinity prediction models based on support vector regression are proposed. The proposed models are applied to the yeast bmh 14-3-3 and syh GYF peptide-recognition domains. Cross-validated results on the domain-peptide binding affinity data sets show that the support vector-based models achieve efficient predictive performance.
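Support vector regression fits a function while ignoring errors smaller than a margin ε; the ε-insensitive loss below is the standard data term SVR minimises alongside a weight-norm penalty. The ε value is a hypothetical default, not one reported in the paper.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Mean epsilon-insensitive loss: errors within +/- epsilon cost nothing."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        # Only the part of the error that exceeds the epsilon tube is penalised.
        total += max(0.0, abs(t - p) - epsilon)
    return total / len(y_true)


# Example: the first prediction lies inside the tube, the second 0.4 outside it.
loss = epsilon_insensitive_loss([1.0, 2.0], [1.05, 2.5], epsilon=0.1)
```

A wider ε tolerates more measurement noise in the affinity values but yields a coarser fit, so ε is usually tuned together with the other SVR parameters.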

The quantitative prediction of HLA-B*2705 peptide binding affinities using Support Vector Regression to gain insights into its role for the Spondyloarthropathies

2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2015

Computational methods are increasingly utilised in many immunoinformatics problems, such as the prediction of the binding affinity of peptides. Peptides could provide valuable insight into drug design and development, for example of vaccines, and they can also be used to diagnose diseases. The presence of the human class I MHC allele HLA-B*2705 is strongly hypothesised to lead to spondyloarthropathies. In this paper, Support Vector Regression (SVR) is used to predict the binding affinity of peptides, using experimentally determined peptide-MHC binding affinities of 222 peptides to HLA-B*2705, to gain more insight into this problematic disease. The results yield a correlation coefficient as high as 0.65, and the SVR-based predictive models can be considered a useful tool for predicting the binding affinities of newly discovered peptides.
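The abstract reports a correlation coefficient of up to 0.65 without naming the coefficient. Assuming Pearson's r between measured and predicted affinities, a minimal computation looks like this:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy measured vs. predicted values (illustrative, not the paper's data).
r = pearson_r([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0])
```

Pearson's r rewards predictions that track the ordering and spacing of the measured affinities, regardless of any constant offset or scale.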

Quantitative prediction of peptide binding affinity by using hybrid fuzzy support vector regression

Applied Soft Computing, 2016

Support Vector Machines are widely used for prediction problems in the life sciences and have been shown to offer good generalisation ability in input-output mapping. However, the performance of predictive models is often negatively affected by the complex, high-dimensional and non-linear nature of post-genome data. Soft computing methods can be used to model such non-linear systems. Fuzzy systems are among the most widely used soft computing methods for modelling uncertainty, and they consist of interpretable rules that help one gain insight into the applied model. This study therefore aims to provide a more interpretable and efficient biological model by developing a hybrid method that integrates a fuzzy system with support vector regression. To demonstrate the robustness of this new hybrid method, it is applied to the prediction of peptide binding affinity, one of the most challenging problems in the post-genomic era because of the diversity of peptide families and the complexity and high dimensionality of the peptides' characteristic features. Across four different case studies, the hybrid predictive model yielded the highest predictive power in all four cases and achieved an improvement of as much as 34% compared with the results presented in the literature. Availability: Matlab scripts are available at https://github.com/sekerbigdatalab/tsksvr.
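One plausible way to couple a Takagi-Sugeno-Kang (TSK) fuzzy system with a regressor, shown here as an assumption since the abstract does not describe the coupling, relies on the fact that a first-order TSK output is linear in its consequent parameters once the rule firing strengths are fixed. The sketch builds that linear design row and checks it against direct TSK inference; the rule centres, widths and consequents are hypothetical, and in a hybrid method the consequents would be fitted by a linear regressor such as a linear-kernel SVR rather than set by hand.

```python
import math


def gaussian(x, centre, sigma):
    """Type-1 Gaussian membership value."""
    return math.exp(-0.5 * ((x - centre) / sigma) ** 2)


def normalised_firing(x, antecedents):
    """Normalised firing strength of each rule (product t-norm)."""
    strengths = []
    for centres, sigmas in antecedents:
        w = 1.0
        for xi, c, s in zip(x, centres, sigmas):
            w *= gaussian(xi, c, s)
        strengths.append(w)
    total = sum(strengths)
    return [w / total for w in strengths]


def consequent_design_row(x, antecedents):
    """Row [w_r*x_1, ..., w_r*x_d, w_r] for each rule r, concatenated.

    The TSK output equals the dot product of this row with the flattened
    consequent parameters, so a linear regressor can estimate them.
    """
    row = []
    for w in normalised_firing(x, antecedents):
        row.extend(w * xi for xi in x)
        row.append(w)
    return row


def tsk_output(x, antecedents, consequents):
    """Direct first-order TSK inference; consequents[r] = (coefs, bias)."""
    weights = normalised_firing(x, antecedents)
    return sum(
        w * (sum(a * xi for a, xi in zip(coefs, x)) + bias)
        for w, (coefs, bias) in zip(weights, consequents)
    )


# Two hypothetical one-input rules and their consequents.
ANTECEDENTS = [([0.0], [1.0]), ([2.0], [1.0])]
CONSEQUENTS = [([1.0], 0.5), ([-2.0], 3.0)]
FLAT_PARAMS = [1.0, 0.5, -2.0, 3.0]  # same consequents, flattened rule by rule
```

Keeping the antecedents fixed and learning only the consequents turns the fuzzy model into a linear regression problem, which is what makes an SVR-style fit of the rule outputs possible.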

Investigation into the role of sequence-driven-features for prediction of protein structural classes

2008 8th IEEE International Conference on Bioinformatics and Bioengineering, Oct 1, 2008

A number of techniques have been developed for the prediction of protein structural classes; however, they show varying degrees of accuracy across different assessment procedures and, in particular, the role of sequence-driven features (SDF) has not been rigorously investigated. The aim of this study is therefore to carry out the largest comprehensive and consistent investigation of approximately 1500 protein sequence-driven features, forming 65 subsets, in order to develop a robust predictive model and identify how well these features predict protein structural classes. To evaluate the features, two high-quality datasets with 40% (or less) homology, containing over 7000 protein sequences, were extracted from proteomic databases. As the predictive technique, an optimum K-Nearest Neighbour classifier, namely multiple-K-NN (MKNN), was developed, which records not only the MKNN result but also the predictive accuracy for each K-nearest neighbourhood from K = 1 to 11. To keep the analyses consistent, three cross-validation test procedures, 10-fold, leave-one-out (LOO) and independent set, were used for all data sets and methods implemented. Across more than 5000 individual predictive results, no firm consensus was found on which features are highly associated with protein structural classes. Interestingly, however, the best feature subsets were found to be the traditional amino acid composition (AAC) for 10-fold (48.62%) and LOO (50.09%), and dipeptide composition for the independent set (85.91%). The results appear to suggest that the AAC features are one of the two best of the 65 subsets.
In particular, for pseudo-amino-acid composition (PseAAC), unlike other research results presented in the literature, this investigation finds no statistical improvement from the sequence-order effect component (lambda) of PseAAC, which averaged 39.15%. The results also suggest that most of its predictive power comes from the AAC part, which averaged 46.84%, and the overall average predictive accuracy for PseAAC is 47.86%. This appears to suggest that this feature set, which is claimed to better capture sequence order, yields almost no improvement and can be considered a redundant and noisy feature set. The overall outcome of this comprehensive study sheds light not only on structural class prediction but also on other proteomic studies. I. INTRODUCTION Protein prediction is one of the most difficult and important fields within proteomics, mainly because the thousands of conformational changes in a protein make it difficult to predict how it will fold into its secondary or...
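Two ingredients named in the abstract can be sketched in Python: the 20-dimensional amino acid composition (AAC) of a sequence, and a multiple-K-NN routine that ranks training samples by distance once and reads off a majority-vote prediction for every K from 1 to 11. The Euclidean distance, the tie-breaking rule and the one-dimensional toy data are assumptions; the abstract does not describe how the "optimum" MKNN chooses among the K values.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """20-dim AAC: fraction of each standard residue (non-standard letters ignored)."""
    counts = Counter(seq.upper())
    length = sum(counts[aa] for aa in AMINO_ACIDS)
    return [counts[aa] / length for aa in AMINO_ACIDS]


def multiple_knn(train_x, train_y, query, k_max=11):
    """Predictions for every K = 1..k_max from a single distance ranking."""
    ranked = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    predictions = {}
    votes = Counter()
    for k in range(1, min(k_max, len(ranked)) + 1):
        votes[train_y[ranked[k - 1]]] += 1
        # Ties go to the label that first appeared in the ranking.
        predictions[k] = votes.most_common(1)[0][0]
    return predictions


def accuracy_per_k(train_x, train_y, test_x, test_y, k_max=11):
    """Fraction of correct predictions for each K on a held-out set."""
    k_limit = min(k_max, len(train_x))
    hits = Counter()
    for x, y in zip(test_x, test_y):
        for k, pred in multiple_knn(train_x, train_y, x, k_limit).items():
            hits[k] += pred == y
    return {k: hits[k] / len(test_x) for k in range(1, k_limit + 1)}


# Toy 1-D features standing in for AAC vectors.
TRAIN_X = [[0.0], [0.1], [0.2], [5.0], [5.1]]
TRAIN_Y = ["a", "a", "a", "b", "b"]
per_k = accuracy_per_k(TRAIN_X, TRAIN_Y, [[4.9], [0.01]], ["b", "a"])
```

Sorting the neighbours once and growing the vote incrementally gives all K results for roughly the cost of a single K-NN query, which is what makes recording accuracy for every K cheap.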

Investigation into Effectiveness of Rough Sets in Prediction of Enzyme and Protein Structure Classes

Proceedings of the 2009 International Joint Conference on Neural Networks, 2009

Among the various methods in protein function prediction, rough set theory has recently been applied to the prediction of protein structural classes. However, this was a blind application to a single, small data set of high homology that did not investigate the various parameters of the rough set. The aim of this paper is therefore to study rough sets in the...
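Rough-set classifiers are built on the lower and upper approximations of a target set under the indiscernibility relation induced by a chosen set of condition attributes. The sketch below computes both approximations; the abstract is truncated before it describes which parameters were studied, and the attribute names, objects and target class are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(objects, attributes):
    """Partition object ids into blocks that agree on every chosen attribute."""
    blocks = defaultdict(set)
    for obj_id, values in objects.items():
        blocks[tuple(values[a] for a in attributes)].add(obj_id)
    return list(blocks.values())


def approximations(objects, attributes, target):
    """Lower and upper approximations of the set `target` of object ids."""
    lower, upper = set(), set()
    for block in equivalence_classes(objects, attributes):
        if block <= target:  # block lies entirely inside the target
            lower |= block
        if block & target:  # block overlaps the target
            upper |= block
    return lower, upper


# Hypothetical discretised attributes for four proteins.
PROTEINS = {
    1: {"helix_content": "high", "length": "short"},
    2: {"helix_content": "high", "length": "short"},
    3: {"helix_content": "low", "length": "long"},
    4: {"helix_content": "mid", "length": "long"},
}
TARGET_CLASS = {1, 3}  # hypothetical members of one structural class
lower, upper = approximations(PROTEINS, ["helix_content", "length"], TARGET_CLASS)
```

Objects in the boundary region (upper minus lower, here proteins 1 and 2) cannot be classified with certainty from the chosen attributes, which is why the choice of attributes and discretisation matters for rough-set predictors.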

Comparison of unsupervised feature selection methods for high-dimensional regression problems in prediction of peptide binding affinity

Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Aug 1, 2015

Identifying a robust set of predictive features is one of the most important steps in constructing clustering, classification and regression models from many thousands of features. Although there have been various attempts to select predictive feature sets from high-dimensional data sets for classification and clustering, there have been few attempts to study the problem in regression. As semi-supervised and supervised feature selection methods tend to identify noisy features in addition to discriminative variables, unsupervised feature selection methods (USFSMs) are generally regarded as a less biased approach. Therefore, in this study, four different USFSMs, along with the entire feature set, are considered for the quantitative prediction of peptide binding affinities, one of the most challenging post-genome regression problems owing to its very high dimensionality compared with the extremely small sample size. As USFSMs are independent of any predictive method, support vector regress...
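The abstract does not name the four USFSMs it compares. As a stand-in only, the simplest unsupervised filter ranks features by sample variance without looking at the binding-affinity targets; the retained columns would then be passed to a regressor such as SVR.

```python
def rank_features_by_variance(rows, top_k):
    """Indices of the top_k columns with the highest sample variance.

    Target values are never consulted, so the filter is unsupervised.
    """
    n = len(rows)
    n_features = len(rows[0])
    variances = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        mean = sum(column) / n
        variances.append(sum((v - mean) ** 2 for v in column) / (n - 1))
    order = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return order[:top_k]


# Toy matrix: column 0 is constant, column 1 varies most, column 2 a little.
selected = rank_features_by_variance([[1, 10, 5], [1, 20, 5], [1, 30, 6]], top_k=2)
```

Variance ranking is sensitive to feature scale, so with amino acid index descriptors measured in different units it would normally be applied after standardising or be replaced by a scale-aware criterion.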

A Quantisation of Cognitive Learning Process by Computer Graphics-Games: Towards More Efficient Learning Models

OALib, 2016

With the latest developments in computer technologies and artificial intelligence (AI) techniques, more opportunities for cognitive data acquisition and stimulation via game-based systems have become available to computer scientists and psychologists. This may lead to the development of more efficient cognitive learning models for use in different fields of cognitive psychology than in the past. The increasing popularity of computer games across a broad range of age groups leads scientists and experts to seek game-based solutions to cognitive learning abnormalities, especially for younger age groups and children. One of the major advantages of computer graphics and game-based techniques over traditional face-to-face therapies is that individuals, especially children, become immersed in the game's virtual environment and consequently feel more open to sharing their cognitive behavioural characteristics naturally. The aim of this work is to investigate the effects of graphical agents on cognitive behaviours in order to generate more efficient cognitive models.

Inference of nonlinear gene regulatory networks through optimized ensemble of support vector regression and dynamic Bayesian networks

2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2015

Comprehensive understanding of gene regulatory networks (GRNs) is a major challenge in systems biology. Most methods for modeling and inferring the dynamics of GRNs, such as those based on state space models, vector autoregressive models and the G1DBN algorithm, assume linear dependencies among genes. However, this strong assumption does not truly represent the time-course relationships across genes, which are inherently nonlinear. Nonlinear modeling methods such as S-systems and causal structure identification (CSI) have been proposed, but they are known to be statistically inefficient and analytically intractable in high dimensions. To overcome these limitations, we propose an optimized ensemble approach based on support vector regression (SVR) and dynamic Bayesian networks (DBNs). The method, called SVR-DBN, uses the nonlinear kernels of the SVR to infer the temporal relationships among genes within the DBN framework. The two-stage ensemble is further improved by optimizing the SVR parameters with Particle Swarm Optimization. Results on eight in silico-generated datasets and two real-world datasets of Drosophila melanogaster and Escherichia coli show that our method outperformed the G1DBN algorithm by a total average accuracy of 12%. We further applied our method to model the time-course relationships of ovarian carcinoma, from which four hub genes were discovered. Stratified analysis further showed that the expression levels of the Prostate differentiation factor and BTG family member 2 genes were significantly increased by the platinum drugs cisplatin and oxaliplatin, while the expression levels of the Polo-like kinase and Cyclin B1 genes were both decreased by these drugs. These hub genes might be potential biomarkers for ovarian carcinoma.
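In SVR-DBN, Particle Swarm Optimization tunes the SVR parameters. The sketch below is a generic global-best PSO with commonly used constriction-style coefficients (inertia about 0.7298, acceleration constants about 1.49618); the objective is a hypothetical smooth surface over (log2 C, log2 gamma) standing in for the cross-validated SVR error that a real run would obtain by training models, and the bounds are likewise illustrative.

```python
import random


def pso_minimise(objective, bounds, n_particles=30, n_iters=200, seed=0,
                 inertia=0.7298, c_personal=1.49618, c_global=1.49618):
    """Global-best particle swarm optimisation of `objective` within box `bounds`."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    global_pos, global_val = best_pos[g][:], best_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull each particle towards its own best and the swarm's best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (global_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Clamp the particle to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < global_val:
                    global_pos, global_val = pos[i][:], val
    return global_pos, global_val


def surrogate_error(params):
    """Hypothetical stand-in for cross-validated SVR error over (log2 C, log2 gamma)."""
    log2_c, log2_gamma = params
    return (log2_c - 3.0) ** 2 + (log2_gamma + 2.0) ** 2


best_params, best_error = pso_minimise(surrogate_error, bounds=[(-5.0, 10.0), (-10.0, 5.0)])
```

Searching over log2-scaled parameters is a common design choice because useful values of C and gamma span several orders of magnitude.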

Automated identification of chicken eimeria species from microscopic images

2015 IEEE 15th International Conference on Bioinformatics and Bioengineering (BIBE), 2015

ABSTRACT

Research paper thumbnail of CnnSound: Convolutional Neural Networks for the Classification of Environmental Sounds

2020 The 4th International Conference on Advances in Artificial Intelligence, 2020

The classification of environmental sounds (ESC) has been increasingly studied in recent years. T... more The classification of environmental sounds (ESC) has been increasingly studied in recent years. The main reason is that environmental sounds are part of our daily life, and associating them with our environment that we live in is important in several aspects as ESC is used in areas such as managing smart cities, determining location from environmental sounds, surveillance systems, machine hearing, environment monitoring. The ESC is however more difficult than other sounds because there are too many parameters that generate background noise in the ESC, which makes the sound more difficult to model and classify. The main aim of this study is therefore to develop more robust convolution neural networks architecture (CNN). For this purpose, 150 different CNN-based models were designed by changing the number of layers and values of their tuning parameters used in the layers. In order to test the accuracy of the models, the Urbansound8k environmental sound database was used. The sounds in...

Research paper thumbnail of FEPDS: A Proposal for the Extraction of Fuzzy Emerging Patterns in Data Streams

IEEE Transactions on Fuzzy Systems, 2020

Research paper thumbnail of Bioinformatics Approach to Classification of Four Classes of Organism in Relation to Their Optimal Growth Temperature

International Journal of Pharma Medicine and Biological Sciences, 2018

Research paper thumbnail of Recognition of protozoan parasites from microscopic images: Eimeria species in chickens and rabbits as a case study

Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual Conference, Jul 1, 2017

Automated diagnosis and identification of diseases and conditions such as parasites from microsco... more Automated diagnosis and identification of diseases and conditions such as parasites from microscopic images have been mainly carried out by utilizing the object morphological characteristics. The extraction of morphometric features needs the use of highly complex techniques that require computational power. Therefore, in order to reduce this complexity, this paper presents an automated identification based on analyzing three groups of pixel-based feature sets: column features (CF), row features (RF), and the third one (CRF) obtained by merging CF and RF together. For the classification task, K-Nearest Neighbor (KNN) and Artificial Neural Networks (ANN) have been applied. The classification results have been evaluated by adapting a 5-fold cross validation. Additionally, a robust sub-set of the features has been selected by Relieff feature selection method to prevent overfitting, which in turn has improved the final results. Two microscopic image slide databases of a type of protozoan...

Research paper thumbnail of Structural classification of protein sequences based on signal processing and support vector machines

Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual Conference, Aug 1, 2016

The function of any protein depends directly on its secondary and tertiary structure. Proteins ca... more The function of any protein depends directly on its secondary and tertiary structure. Proteins can fold into a three-dimensional shape, which is primarily depended on the arrangement of amino acids in the primary structure. In recent years, with the explosive sequencing of proteins, it is unfeasible to perform detailed experimental studies, as these methodologies are very expensive and time consuming. This leaves the structure of the majority of currently available protein sequences unknown. In this paper, a predictive model is therefore presented for the classification of protein sequence's secondary structures, namely alpha helix and beta sheet. The proteins used throughout this study were collected from the Structural Classification of Proteinsextended (SCOPe) database, which contains manually curated information from proteins with known structure. Two sets of proteins are used for all alpha and all beta protein sequences. The first set comprise of sequences with less than 40...

Research paper thumbnail of Discovering Differences in Gender-related Skeletal Muscle Aging through the Majority Voting-Based Identification of Differently Expressed Genes

International Journal on Bioinformatics & Biosciences, 2016

Research paper thumbnail of Reconfiguration-based implementation of SVM classifier on FPGA for Classifying Microarray data

Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual Conference, 2013

Classifying Microarray data, which are of high dimensional nature, requires high computational po... more Classifying Microarray data, which are of high dimensional nature, requires high computational power. Support Vector Machines-based classifier (SVM) is among the most common and successful classifiers used in the analysis of Microarray data but also requires high computational power due to its complex mathematical architecture. Implementing SVM on hardware exploits the parallelism available within the algorithm kernels to accelerate the classification of Microarray data. In this work, a flexible, dynamically and partially reconfigurable implementation of the SVM classifier on Field Programmable Gate Array (FPGA) is presented. The SVM architecture achieved up to 85× speed-up over equivalent general purpose processor (GPP) showing the capability of FPGAs in enhancing the performance of SVM-based analysis of Microarray data as well as future bioinformatics applications.

Research paper thumbnail of Classification of Influenza Hemagglutinin Protein Sequences using Convolutional Neural Networks

2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), 2021

Research paper thumbnail of Overlapping Clusters and Support Vector Machines Based Interval Type-2 Fuzzy System for the Prediction of Peptide Binding Affinity

IEEE Access

In the post-genome era, it is becoming more complex to process high dimensional, low-instance ava... more In the post-genome era, it is becoming more complex to process high dimensional, low-instance available, and nonlinear biological datasets. This paper aims to address these characteristics as they have adverse effects on the performance of predictive models in bioinformatics. In this paper, an interval type-2 Takagi Sugeno fuzzy predictive model is proposed in order to manage high-dimensionality and nonlinearity of such datasets which is the common feature in bioinformatics. A new clustering framework is proposed for this purpose to simplify antecedent operations for an interval type-2 fuzzy system. This new clustering framework is based on overlapping regions between the clusters. The cluster analysis of partitions and statistical information derived from them has identified the upper and lower membership functions forming the premise part. This is further enhanced by adapting the regression version of support vector machines in the consequent part. The proposed method is used in experiments to quantitatively predict affinities of peptide bindings to biomolecules. This case study imposes a challenge in post-genome studies and remains an open problem due to the complexity of the biological system, diversity of peptides, and curse of dimensionality of amino acid index representation characterizing the peptides. Utilizing four different peptide binding affinity datasets, the proposed method resulted in better generalization ability for all of them yielding an improved prediction accuracy of up to 58.2% on unseen peptides in comparison with the predictive methods presented in the literature. Source code of the algorithm is available at https://github.com/sekerbigdatalab. INDEX TERMS Interval type-2 fuzzy systems, support vector regression, overlapping clusters, peptide binding affinity, clustering, high-dimensionality.

Research paper thumbnail of Editorial Message: Special Issue on Efficient Fuzzy Systems for Mining Large Scale, Imprecise, Uncertain and Vague Data

International Journal of Fuzzy Systems

Research paper thumbnail of Binding affinity prediction of S. cerevisiae 14-3-3 and GYF peptide-recognition domains using support vector regression

2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2016

Proteins interact with other proteins and bio-molecules to carry out biological processes in a ce... more Proteins interact with other proteins and bio-molecules to carry out biological processes in a cell. Computational models help understanding complex biochemical processes that happens throughout the life of a cell. Domain-mediated protein interaction to peptides one such complex problem in bioinformatics that requires computational predictive models to identify meaningful bindings. In this study, domain-peptide binding affinity prediction models are proposed based on support vector regression. Proposed models are applied to yeast bmh 14-3-3 and syh GYF peptide-recognition domains. The cross validated results of the domain-peptide binding affinity data sets show that predictive performance of the support vector based models are efficient.

Research paper thumbnail of The quantitative prediction of HLA-B*2705 peptide binding affinities using Support Vector Regression to gain insights into its role for the Spondyloarthropathies

2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2015

Computational methods are increasingly utilised in many immunoinformatics problems such as the pr... more Computational methods are increasingly utilised in many immunoinformatics problems such as the prediction of binding affinity of peptides. The peptides could provide valuable insight into the drug design and development such as vaccines. Moreover, they can be used to diagnose diseases. The presence of human class I MHC allele HLA-B*2705 is one of the strong hypothesis that would lead spondyloarthropathies. In this paper, Support Vector Regression is used in order to predict binding affinity of peptides with the aid of experimentally determined peptide-MHC binding affinities of 222 peptides to HLA-B*2705 to get more insight into this problematic disease. The results yield a high correlation coefficient as much as 0.65 and the SVR-based predictive models can be considered as a useful tool in order to predict the binding affinities for newly discovered peptides.

Quantitative prediction of peptide binding affinity by using hybrid fuzzy support vector regression

Applied Soft Computing, 2016

Support Vector Machines (SVMs) are widely used for prediction problems in the life sciences and have been shown to offer better generalisation in input-output mapping. However, the performance of predictive models is often degraded by the complex, high-dimensional and non-linear nature of post-genome data. Soft computing methods can be used to model such non-linear systems. Fuzzy systems, which model uncertainty, are among the most widely used soft computing methods; they are built from interpretable rules that help one gain insight into the applied model. This study therefore aims to provide a more interpretable and efficient biological model by developing a hybrid method that integrates a fuzzy system with support vector regression. To demonstrate the robustness of this new hybrid method, it is applied to the prediction of peptide binding affinity, one of the most challenging problems in the post-genomic era owing to the diversity of peptide families and the complexity and high dimensionality of the peptides' characteristic features. Across four different case studies, the hybrid predictive model yielded the highest predictive power in all four cases and achieved an improvement of as much as 34% over the results presented in the literature. Availability: Matlab scripts are available at https://github.com/sekerbigdatalab/tsksvr.
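The interpretability argument rests on how a rule-based fuzzy system turns an input into an output. The sketch below shows inference in a first-order Takagi-Sugeno-Kang (TSK) fuzzy system, one common rule-based form; the repository name hints at TSK, but how the paper couples the rules with SVR is not described in the abstract, so this is an assumption, and the rule parameters and names (`TSKRule`, `tsk_predict`) are hypothetical. The linked Matlab scripts remain the authoritative reference.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class TSKRule:
    """IF x_1 is G(c_1, s_1) AND ... THEN y = a . x + bias (hypothetical rule)."""
    centers: List[float]
    sigmas: List[float]
    coeffs: List[float]  # linear consequent weights
    bias: float          # consequent intercept

    def firing_strength(self, x: Sequence[float]) -> float:
        # Product t-norm over Gaussian membership degrees of each input.
        strength = 1.0
        for xi, c, s in zip(x, self.centers, self.sigmas):
            strength *= math.exp(-((xi - c) ** 2) / (2 * s ** 2))
        return strength

    def consequent(self, x: Sequence[float]) -> float:
        return sum(a * xi for a, xi in zip(self.coeffs, x)) + self.bias


def tsk_predict(rules: Sequence[TSKRule], x: Sequence[float]) -> float:
    """Output = firing-strength-weighted average of the rule consequents."""
    strengths = [r.firing_strength(x) for r in rules]
    total = sum(strengths)
    if total == 0.0:
        raise ValueError("no rule fires for this input")
    return sum(w * r.consequent(x) for w, r in zip(strengths, rules)) / total
```

Each rule can be read on its own ("if the input is near c, the output behaves like this linear model"), which is the interpretability the abstract refers to.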

Investigation into the role of sequence-driven-features for prediction of protein structural classes

2008 8th IEEE International Conference on Bioinformatics and Bioengineering, Oct 1, 2008

A number of techniques have been developed for predicting protein structural classes; however, they show varying degrees of accuracy across different assessment procedures and, in particular, the role of sequence-driven features (SDFs) has not been rigorously investigated. The aim of this study is therefore to carry out the largest comprehensive and consistent investigation of approximately 1500 protein sequence-driven features, forming 65 subsets, in order to develop a robust predictive model and identify how well these features predict protein structural classes. To evaluate the features, two high-quality datasets of 40% (or less) homology that contain over 7000 protein sequences were extracted from proteomic databases. As the predictive technique, an optimum K-Nearest Neighbour classifier, namely multiple-K-NN (MKNN), was developed; it records not only the MKNN result but also the predictive accuracy for each K-nearest neighbourhood for K = 1 to 11. To keep the analyses consistent, three cross-validation test procedures (10-fold, leave-one-out (LOO) and independent set) were used for all data sets and methods implemented. Across more than 5000 individual predictive results, no firm consensus was found on which features are highly associated with protein structural classes. Interestingly, however, the best feature subsets were found to be traditional amino acid composition (AAC) for 10-fold (48.62%) and LOO (50.09%), and dipeptide composition for the independent set (85.91%). The results appear to suggest that the AAC features are one of the two best subsets among the 65 different subsets.
In particular, for pseudo-amino-acid composition (PseAAC), and unlike other research results presented in the literature, this investigation finds no statistically significant improvement from the sequence-order-effect component (lambda) of PseAAC, which averaged 39.15%. The results also suggest that most of its predictive power comes from the AAC part, which averaged 46.84%, while the overall average predictive accuracy for PseAAC is 47.86%. This suggests that this feature set, which is claimed to better capture sequence order, yields almost no improvement and can be considered a redundant and noisy feature set. It should be noted that the overall outcome of this comprehensive study sheds light not only on structural class prediction but also on other proteomic studies. I. INTRODUCTION: Protein prediction is one of the most difficult and important fields within proteomics, mainly because the thousands of conformational changes in a protein make it difficult to predict how it will fold into its secondary or
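The two best-performing feature subsets and the multiple-K-NN idea can be sketched together. Below, amino acid composition (20 relative frequencies) and dipeptide composition (400 relative frequencies of adjacent residue pairs) are computed from a sequence, and a K-NN classifier reports leave-one-out accuracy for every K from 1 to `k_max` in a single pass, in the spirit of MKNN. Euclidean distance, majority voting with insertion-order tie-breaking, and all function names are assumptions; the paper's exact MKNN procedure is not given in the abstract.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aac(seq: str) -> List[float]:
    """Amino acid composition: fraction of each of the 20 residues."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def dpc(seq: str) -> List[float]:
    """Dipeptide composition: fraction of each of the 400 adjacent pairs."""
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = max(len(seq) - 1, 1)
    return [pairs[d] / total for d in DIPEPTIDES]


def multiple_knn_loo(
    features: Sequence[Sequence[float]],
    labels: Sequence[str],
    k_max: int = 11,
) -> Dict[int, float]:
    """Leave-one-out accuracy of K-NN for every K = 1..k_max, reusing one
    sorted neighbour list per held-out sample (hypothetical MKNN sketch)."""
    n = len(features)
    correct = {k: 0 for k in range(1, k_max + 1)}
    for i in range(n):
        neighbours = sorted(
            (math.dist(features[i], features[j]), labels[j])
            for j in range(n) if j != i
        )
        for k in range(1, k_max + 1):
            votes = Counter(label for _, label in neighbours[:k])
            predicted = votes.most_common(1)[0][0]
            correct[k] += predicted == labels[i]
    return {k: c / n for k, c in correct.items()}
```

Sorting the neighbours once per held-out sample is what makes reporting all eleven K values cheap compared with rerunning K-NN for each K.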

Investigation into Effectiveness of Rough Sets in Prediction of Enzyme and Protein Structure Classes

Proceedings of the 2009 International Joint Conference on Neural Networks, 2009

Among the various methods used in protein function prediction, rough set theory has recently been applied to the prediction of protein structural classes. However, this was a blind application to a single, small data set of high homology, and it did not investigate the various parameters of the rough set. The aim of this paper is therefore to study rough sets in the
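The core rough set construction is the pair of lower and upper approximations of a target class, built from groups of objects that the chosen attributes cannot tell apart. The sketch below computes both for a toy decision table; the table, attribute names and function names are hypothetical, and the parameters the paper actually investigates are not recoverable from the truncated abstract.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Table = Dict[Hashable, Dict[str, str]]  # object id -> {attribute: value}


def indiscernibility_classes(table: Table, attributes: Sequence[str]) -> List[Set[Hashable]]:
    """Group objects that share identical values on the chosen attributes."""
    classes: Dict[Tuple[str, ...], Set[Hashable]] = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attributes)].add(obj)
    return list(classes.values())


def approximations(
    table: Table, attributes: Sequence[str], target: Set[Hashable]
) -> Tuple[Set[Hashable], Set[Hashable]]:
    """Lower approximation: classes entirely inside the target (certain members).
    Upper approximation: classes overlapping the target (possible members)."""
    lower: Set[Hashable] = set()
    upper: Set[Hashable] = set()
    for cls in indiscernibility_classes(table, attributes):
        if cls <= target:
            lower |= cls
        if cls & target:
            upper |= cls
    return lower, upper
```

Objects in the upper but not the lower approximation form the boundary region, where the chosen attributes cannot decide class membership.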

Comparison of unsupervised feature selection methods for high-dimensional regression problems in prediction of peptide binding affinity

Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual Conference, Aug 1, 2015

Identifying a robust set of predictive features is one of the most important steps in constructing clustering, classification and regression models from many thousands of features. Although there have been various attempts to select predictive feature sets from high-dimensional data sets in classification and clustering, there have been few attempts to study the problem in regression. As semi-supervised and supervised feature selection methods tend to identify noisy features in addition to discriminative variables, unsupervised feature selection methods (USFSMs) are generally regarded as a less biased approach. Therefore, in this study, along with the entire feature set, four different USFSMs are considered for the quantitative prediction of peptide binding affinities, one of the most challenging post-genome regression problems, with very high dimensionality compared to an extremely small sample size. As USFSMs are independent of any predictive method, support vector regress...
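What makes a feature selection method "unsupervised" is that it ranks features without looking at the response (here, the binding affinity). The simplest such filter, ranking features by variance and keeping the top k, is sketched below purely as an illustration; it is not claimed to be one of the four USFSMs compared in the paper, whose names are cut off in the abstract, and `variance_rank` is a hypothetical name.

```python
from statistics import pvariance
from typing import List, Sequence


def variance_rank(X: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the (sorted) column indices of the k highest-variance features.
    The response vector is never used, so the filter is unsupervised."""
    n_features = len(X[0])
    variances = [pvariance([row[j] for row in X]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


def select_columns(X: Sequence[Sequence[float]], columns: Sequence[int]) -> List[List[float]]:
    """Project every sample onto the selected feature columns."""
    return [[row[j] for j in columns] for row in X]
```

Because the selection ignores the regressor, the same reduced feature set can be handed to SVR or any other predictive method, which is the independence property the abstract highlights.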

A Quantisation of Cognitive Learning Process by Computer Graphics-Games: Towards More Efficient Learning Models

OALib, 2016

With the latest developments in computer technologies and artificial intelligence (AI) techniques, more opportunities for cognitive data acquisition and stimulation via game-based systems have become available to computer scientists and psychologists. This may lead to the development of more efficient cognitive learning models for use in different fields of cognitive psychology than was possible in the past. The increasing popularity of computer games across a broad range of age groups leads scientists and experts to seek game-domain solutions to cognitive learning abnormalities, especially for younger age groups and children. One of the major advantages of computer graphics and game-based techniques over traditional face-to-face therapies is that individuals, especially children, immerse themselves in the game's virtual environment and consequently feel more open to sharing their cognitive behavioural characteristics naturally. The aim of this work is to investigate the effects of graphical agents on cognitive behaviours in order to generate more efficient cognitive models.

Inference of nonlinear gene regulatory networks through optimized ensemble of support vector regression and dynamic Bayesian networks

2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2015

Comprehensive understanding of gene regulatory networks (GRNs) is a major challenge in systems biology. Most methods for modeling and inferring the dynamics of GRNs, such as those based on state space models, vector autoregressive models and the G1DBN algorithm, assume linear dependencies among genes. However, this strong assumption does not truly represent the time-course relationships among genes, which are inherently nonlinear. Nonlinear modeling methods such as S-systems and causal structure identification (CSI) have been proposed, but they are known to be statistically inefficient and analytically intractable in high dimensions. To overcome these limitations, we propose an optimized ensemble approach based on support vector regression (SVR) and dynamic Bayesian networks (DBNs). The method, called SVR-DBN, uses the nonlinear kernels of the SVR to infer the temporal relationships among genes within the DBN framework. The two-stage ensemble is further improved by optimizing the SVR parameters with Particle Swarm Optimization. Results on eight in silico-generated datasets and two real-world datasets of Drosophila melanogaster and Escherichia coli show that our method outperformed the G1DBN algorithm by 12% in total average accuracy. We further applied our method to model the time-course relationships of ovarian carcinoma, from which four hub genes were discovered. Stratified analysis further showed that the expression levels of the Prostate differentiation factor and BTG family member 2 genes were significantly increased by the platinum drugs cisplatin and oxaliplatin, while the expression levels of the Polo-like kinase and Cyclin B1 genes were both decreased by the platinum drugs. These hub genes might be potential biomarkers for ovarian carcinoma.
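Particle Swarm Optimization (PSO), used here to tune the SVR parameters, can be sketched in a few lines. In the sketch below each particle is a candidate parameter vector (for SVR this might be, say, log C and log gamma) and the swarm minimises an objective; the quadratic objective in the test is a hypothetical stand-in for a cross-validated error, and the inertia and acceleration constants are common textbook defaults, not values taken from the paper.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(
    objective: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_particles: int = 20,
    iterations: int = 100,
    inertia: float = 0.7,
    c1: float = 1.5,  # pull towards each particle's own best position
    c2: float = 1.5,  # pull towards the swarm's best position
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Global-best PSO with positions clamped to the search bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            value = objective(pos[i])
            if value < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], value
                if value < gbest_val:
                    gbest, gbest_val = pos[i][:], value
    return gbest, gbest_val
```

In SVR-DBN the objective would wrap training and validating an SVR for the current parameter vector; that wrapper is not shown because the abstract does not specify the validation scheme.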

Automated identification of chicken eimeria species from microscopic images

2015 IEEE 15th International Conference on Bioinformatics and Bioengineering (BIBE), 2015

Pigment network-based skin cancer detection

2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2015

Diagnosing skin cancer in its early stages is a challenging task for dermatologists; because early diagnosis gives a patient a higher chance of survival, the process of analyzing skin images and making decisions should be time-efficient. Therefore, diagnosing the disease using automated, computerized systems has become essential. This paper proposes an efficient system for skin cancer detection on dermoscopic images. It is shown that the statistical characteristics of the pigment network, extracted from the dermoscopic image, can be used as efficient discriminating features for cancer detection. The proposed system has been assessed on a dataset of 200 dermoscopic images from the Hospital Pedro Hispano [1], and the cross-validation results show high detection accuracy.
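The abstract does not list which statistical characteristics of the pigment network are used, so the sketch below is only a hypothetical illustration of the general idea: given a binary mask in which 1 marks detected pigment-network pixels, it computes a few simple statistics (network density, number of 4-connected network segments, and the mean and standard deviation of segment sizes) that could serve as inputs to a classifier. The feature set and the names `component_sizes` and `network_features` are assumptions.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def component_sizes(mask: Sequence[Sequence[int]]) -> List[int]:
    """Sizes of 4-connected groups of network pixels, found by BFS."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes: List[int] = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                queue = deque([(r, c)])
                seen[r][c] = True
                size = 0
                while queue:
                    y, x = queue.popleft()
                    size += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                sizes.append(size)
    return sizes


def network_features(mask: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Hypothetical statistical descriptors of a pigment-network mask."""
    sizes = component_sizes(mask)
    total_pixels = len(mask) * len(mask[0])
    return {
        "density": sum(sizes) / total_pixels,
        "n_components": float(len(sizes)),
        "mean_size": mean(sizes) if sizes else 0.0,
        "std_size": pstdev(sizes) if sizes else 0.0,
    }
```

A dense, regular network and a sparse, fragmented one would produce clearly different values here, which is the kind of contrast such statistics are meant to capture.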