Nung Kion Lee | Universiti Malaysia Sarawak (UNIMAS)
Papers by Nung Kion Lee
Conference on Information Technology in Asia, 2003
Neural classifiers have been widely used in many application areas. This paper describes a generalized neural classifier based on the radial basis function network. The contributions of this work are: i) an improvement on the standard radial basis function network architecture, ii) a new cost function for classification, iii) a feature subset selection algorithm for hidden units, and iv) optimization of the neural classifier using a genetic algorithm with the new cost function. Comparative studies of the proposed neural classifier on a protein classification problem are given.
Epigenetic signatures such as chromatin and histone modification marks are prominent indicators of enhancer motif regions. While many works have used k-mers as features of epigenetic sequences, no comprehensive study has compared how different choices of the k-mer feature parameter affect the performance of machine learning algorithms. Furthermore, it is not known how effective the k-mer feature is for representing different epigenetic marks: H3K4me1, DHS and p300. In this paper, a comparative study is performed to determine the accuracy, sensitivity and specificity of using the k-mer feature for predicting these marks. Our results show that classifiers perform better when the k-mer length is between 4 and 6. Short k-mers give poor accuracy, sensitivity and specificity. The k-mer feature works best for DHS sequences and has low accuracy for H3K4me1 sequence prediction. The k-mer feature also performs poorly on the specificity of DHS sequences. It can be concluded that there is still much room for improvement in identifying better features for representing epigenetic marks in enhancer prediction.
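To make the k-mer feature concrete, here is a minimal sketch of one common way to turn a DNA sequence into a k-mer frequency vector. It is an illustration only and not necessarily the encoding used in the paper; the function name `kmer_features` and the choice to normalise counts to frequencies are assumptions.

```python
from collections import Counter
from itertools import product


def kmer_features(seq, k):
    """Return the relative frequency of every possible DNA k-mer in seq.

    The vector has 4**k entries, ordered lexicographically over "ACGT".
    Windows containing other symbols (e.g. N) are ignored.
    """
    seq = seq.upper()
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    # Slide a window of width k over the sequence and count each k-mer.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[w] for w in vocab)
    return [counts[w] / total if total else 0.0 for w in vocab]


if __name__ == "__main__":
    vec = kmer_features("ACGTACGTAC", k=2)
    print(len(vec))  # 16 possible 2-mers
```

The vector length grows as 4**k, so k = 4 to 6 corresponds to 256 to 4096 features per sequence, which is one reason the choice of k matters for the classifier.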
Human factor theories are often neglected, especially in the design of biological tools. This problem is found in the sequence logo, which is used to visualize the conservation characteristics of biological sequence motifs. Previous studies have found limitations in the graphical representation that cause bias and misinterpretation of the results in sequence logos. Therefore, the aim of this study is to investigate how well the visual attributes help viewers perceive and interpret the information, based on preattentive theories and Gestalt principles of perception. A survey was carried out to gather users' opinions. The results showed limitations in the use of colour, negative space, size and arrangement of the nucleotides, and a lack of information and interactivity in the sequence logo. Therefore, improvements in standardizing the colour, graphical representation of the nucleotides and interactivity of the tool are needed to solve the problems of...
Journal of Biomolecular Structure and Dynamics, 2022
Human Immunodeficiency Virus (HIV) infection is a global pandemic that has claimed 33 million lives to date. One of the most efficacious treatments for naïve or pretreated HIV patients is the HIV integrase strand transfer inhibitors (INSTIs). However, given that HIV treatment is lifelong, the emergence of HIV strains resistant to INSTIs is an imminent challenge. In this work, we present the two best regression QSAR models, constructed using a boosted Random Forest algorithm (r² = 0.998, q²(10CV) = 0.721, q²(external_test) = 0.754) and a boosted K* algorithm (r² = 0.987, q²(10CV) = 0.721, q²(external_test) = 0.758), to predict the pIC50 values of INSTIs. Subsequently, the regression QSAR models were deployed against the DrugBank database for drug repositioning. The top-ranked compounds were further evaluated for their target engagement activity using molecular docking studies and accelerated molecular dynamics simulation. Lastly, their potential as INSTIs was also evaluated from our literature search. Our study offers the first example of a large-scale regression QSAR modelling effort for discovering highly active INSTIs to combat HIV infection. Communicated by Ramaswamy H. Sarma.
Human Immunodeficiency Virus (HIV) infection is a global pandemic that has claimed 33 million lives to date. One of the most efficacious treatments for naïve or pre-treated HIV patients is the HIV integrase strand transfer inhibitors (INSTIs). However, given that HIV treatment is lifelong, the emergence of HIV-1 strains resistant to INSTIs is an imminent challenge. In this work, we present the two best regression QSAR models, constructed using a boosted Random Forest algorithm (r² = 0.998, q²(10CV) = 0.721, q²(external_test) = 0.754) and a boosted K* algorithm (r² = 0.987, q²(10CV) = 0.721, q²(external_test) = 0.758), to predict the pIC50 values of INSTIs. Subsequently, the regression QSAR models were deployed against the DrugBank database for drug repositioning. The top-ranked compounds were further evaluated for their target engagement activity using molecular docking studies, and their potential as INSTIs was evaluated from our literature search. Our study offers the first example ...
Proceedings of the 7th International Conference on Computational Systems-Biology and Bioinformatics, 2016
Unravelling gene expression has become a critical procedure in bioinformatics today and requires continuous effort to form a complete picture of enhancers. Enhancers are explicit patterns of gene expression that are bound by activators to stimulate transcription. They can reside thousands of base pairs upstream or downstream, without any fixed position. Therefore, identifying enhancers is extremely challenging. The inclusion of gaps in motif identification has improved overall accuracy and sensitivity; however, this feature is not yet fully utilised in deep learning methods. Deep learning is a powerful machine learning technique that has been actively used in image recognition and has begun to shed light in bioinformatics. The expressiveness of deep learning enables higher-level features to be learned from lower-level ones. As a result, an integration of gapped motif feature representation (GMFR) and deep learning approach called deep convolutional neural...
Journal of Telecommunication, Electronic and Computer Engineering, 2017
Job recruitment portals have become the main recruitment channel in most organizations because they offer many advantages to recruiters and job applicants. An outstanding recruitment system should be able to filter and recommend the best potential candidates for a job vacancy, so that it avoids hiring inappropriate individuals or missing good candidates. Nevertheless, most existing job portals do not cover the unskilled job sectors. Matching unskilled jobs to applicants is challenging because the selection criteria can be very subjective and difficult to specify in terms of professional qualifications. In this paper, the Kansei Engineering (KE) model is applied to find the most prominent personality traits preferred by employers in different unskilled job categories in Malaysia. We have identified the 20 most prominent Kansei words related to personality traits that are important to six main industries of unskilled workers. The six unskilled sectors invo...
2016 IEEE 16th International Conference on Bioinformatics and Bioengineering (BIBE), 2016
This paper proposes an ensemble approach based on data partitioning for large-scale DNA motif analysis. Motif prediction using genome-scale datasets is challenging due to high time and space complexity. Existing ensemble approaches, while demonstrating improved performance, are only applicable to small datasets. Our approach, called ENSPART, first partitions the input dataset into non-overlapping subsets, which serve as input to multiple distinct motif prediction tools. It is assumed that the core motifs of a transcription factor protein exist in all data subsets. We employed seven motif prediction tools to obtain initial candidate motifs, which are merged according to their sequence content similarity. An alignment-free method is used to establish motif similarity. A novel motif merging method is proposed to merge similar motifs obtained by tools in different data partitions. Ten genome-wide ChIP datasets are collected for evaluation. We compare our approach with MEME-ChIP and obtai...
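The partitioning step described above can be sketched as follows. This is a minimal illustration assuming a random split into near-equal subsets; the function name `partition_dataset`, the shuffling and the `seed` parameter are assumptions, and the abstract does not say how ENSPART itself assigns sequences to subsets.

```python
import random


def partition_dataset(sequences, n_parts, seed=0):
    """Split sequences into n_parts non-overlapping, near-equal subsets.

    Every input sequence lands in exactly one subset.
    """
    shuffled = list(sequences)
    # Shuffle a copy so each subset is a random sample of the input.
    random.Random(seed).shuffle(shuffled)
    # Striding by n_parts keeps subset sizes within one of each other.
    return [shuffled[i::n_parts] for i in range(n_parts)]


if __name__ == "__main__":
    seqs = [f"seq{i}" for i in range(10)]
    for subset in partition_dataset(seqs, n_parts=3):
        print(len(subset), subset)
```

Each subset would then be passed to the motif prediction tools independently, which is what keeps the per-run time and memory cost bounded.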
The medical information system plays an important role in both the flow of business transactions and the recording of medication histories. In the Department of Nuclear Medicine of Sarawak General Hospital, officers and doctors handle patient information manually during the registration and diagnosis process. Unfortunately, this manual documentation system has led to severe problems such as medication history errors, misplacement of drug particulars, delayed release of reports and insecure patient records. Thus, there is a demand for a web-based information system so that doctors and officers can reap the benefits of accuracy, confidentiality and integrity of the data. The proposed project is to develop the Nuclear Medicine Information System (NMIS) to integrate the spatial information such as the patients' medical reports and their diagnosis scan reports. All the information such as patient, appointment, diagnosis, cancer and user is able to manipulate ...
International Journal of Business and Society, 2017
In our previous work we proposed ENSPART, an ensemble method for DNA motif discovery that partitions the input dataset into several equal-size subsets, which are run by several distinct tools for candidate motif prediction. The candidate motifs obtained from different data subsets are merged to obtain the final motifs. Nevertheless, the original ENSPART has several limitations: (1) the same background sequences are used to calculate the Receiver Operating Characteristic (ROC) of motifs obtained from different datasets. This causes bias because different datasets might have different background distributions; (2) it does not consider the duplication of a motif and its reverse complement. This leaves many redundant motifs in the result set, which then requires filtering. In this work, we extended the original ENSPART to solve those two issues. For the first issue, we employed background sequences that are based on the distribution of bases in the input sequences. As for the second issue, we employ a "trip...
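The two issues can be illustrated with a small sketch. It assumes an independent, per-base (zero-order) background model and uses a generic "canonical strand" key to show why a motif and its reverse complement are redundant. All function names are hypothetical, and the paper's own solution to the second issue is cut off in this listing, so the code below should not be read as that method.

```python
import random
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif):
    """Return the reverse complement of a DNA string."""
    return motif.translate(_COMPLEMENT)[::-1]


def canonical(motif):
    """Map a motif and its reverse complement to the same key."""
    return min(motif, reverse_complement(motif))


def base_distribution(sequences):
    """Estimate A/C/G/T frequencies from the input sequences."""
    counts = Counter("".join(sequences).upper())
    total = sum(counts[b] for b in "ACGT")
    return {b: counts[b] / total for b in "ACGT"}


def sample_background(dist, length, n, seed=0):
    """Draw n background sequences whose bases follow dist."""
    rng = random.Random(seed)
    bases = list(dist)
    weights = [dist[b] for b in bases]
    return ["".join(rng.choices(bases, weights=weights, k=length))
            for _ in range(n)]


if __name__ == "__main__":
    print(canonical("CGTT") == canonical("AACG"))  # same motif, two strands
    dist = base_distribution(["AACCGGTT", "ACGTACGG"])
    print(sample_background(dist, length=12, n=2))
```

Because the background is sampled from each dataset's own base distribution, motifs from datasets with different composition are scored against a matching background rather than a shared one.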
This study examines the efficiency and effectiveness of the procedure for reporting transfusion-related adverse events. Blood transfusion is the process of receiving blood through an intravenous (IV) line into the patient's blood vessels. It is considered a safe and common procedure, as most blood transfusions go well. However, minor or severe problems still develop occasionally. These transfusion-related adverse events need to be investigated to find the cause of the adverse reaction during or after the blood transfusion. In Malaysia, although there is a procedure for reporting adverse events that medical staff must follow strictly, issues such as unattended cases, lack of detail and incomplete reports still occur.
Epigenetic signatures such as chromatin and histone modification marks are prominent indicators of enhancer motif regions. While many works have used k-mers as features of epigenetic sequences, no comprehensive study has compared how different choices of the k-mer feature parameter affect the performance of machine learning algorithms. Furthermore, it is not known how effective the k-mer feature is for representing different epigenetic marks: H3K4me1, DHS and p300. In this paper, a comparative study is performed to determine the accuracy, sensitivity and specificity of using the k-mer feature for predicting these marks. Our results show that classifiers perform better when the k-mer length is between 4 and 6. Short k-mers give poor accuracy, sensitivity and specificity. The k-mer feature works best for DHS sequences and has low accuracy for H3K4me1 sequence prediction. The k-mer feature also performs poorly on the specificity of DHS sequences. It can be conclu...
Pertanika journal of tropical agricultural science, 2019
Enhancers are indispensable elements in various developmental stages, orchestrating numerous biological processes by elevating gene expression with the aid of transcription factors. Enhancer variations have been linked to the onset of various genetic diseases, highlighting that they are as important as the coding regions of the genome. Although the first enhancer, SV40, was discovered four decades ago, progress in enhancer identification and characterization is still in its infancy. As more genome sequences become available, especially those of non-human primates, we can finally study the enhancer landscape of these primates, which differs evolutionarily from that of humans. One interesting genome to investigate is that of the proboscis monkey, as it is deemed one of the most ancient primates alive to date, with unique morphological and dietary characteristics; it is also an IUCN endangered species with strong demands for immediate conservation. In this review...
2021 2nd International Conference on Artificial Intelligence and Data Sciences (AiDAS), 2021
Black pepper (Piper nigrum) diseases and nutrient deficiency can often be observed from the symptoms shown on its leaves. This paper investigates the effectiveness of a deep learning approach for classifying black pepper disease and nutrient deficiency based on leaf images. We constructed a customized convolutional neural network (CNN) to determine how its training parameters affect prediction performance. Two other deep learning networks, VGG16 and Inception V3, are also employed for comparison. We sampled 947 images from farms in Sarawak, comprising 8 classes in total. Image augmentation is performed on the images to produce a total of 9532 images. The results show that the customized CNN performed slightly better than the other two deep learning approaches, at a sensitivity of 0.98. Furthermore, image augmentation improved prediction performance for all the deep learning models. This study demonstrates that deep learning is a feasible approach for classifying black pepper diseases and nutrient deficiency based on leaf images.
Pertanika Journal of Science and Technology, 2021
Automated Essay Scoring (AES) is a service or software that can predictively grade essays based on a pre-trained computational model. It has gained a lot of research interest in educational institutions, as it expedites the process and reduces the effort of human raters in grading essays while staying close to human decisions. Despite the strong appeal, its implementation varies widely according to researchers' preferences. This critical review examines various AES development milestones, specifically the different methodologies and attributes used in deriving essay scores. To generalize existing AES systems according to their constructs, we attempted to fit all of them into three frameworks: content similarity, machine learning and hybrid. In addition, we presented and compared various common evaluation metrics for measuring the efficiency of AES and proposed Quadratic Weighted Kappa (QWK) as the standard evaluation metric since it corrects the agreement purely by chance when estimate t...
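For reference, a minimal sketch of Quadratic Weighted Kappa between two raters, using the standard definition with disagreement weights (i - j)² / (N - 1)². It assumes integer scores in the range 0 to N - 1; the function name and argument layout are assumptions, not taken from the review.

```python
def quadratic_weighted_kappa(rater_a, rater_b, n_ratings):
    """Quadratic Weighted Kappa for two equal-length lists of integer scores.

    Scores must lie in 0..n_ratings-1 and n_ratings must be at least 2.
    Returns 1.0 for perfect agreement and about 0.0 for chance-level agreement.
    """
    n = len(rater_a)
    # Observed confusion matrix between the two raters.
    observed = [[0] * n_ratings for _ in range(n_ratings)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n_ratings))
              for j in range(n_ratings)]

    numerator = 0.0
    denominator = 0.0
    for i in range(n_ratings):
        for j in range(n_ratings):
            weight = (i - j) ** 2 / (n_ratings - 1) ** 2
            # Expected count if the two raters scored independently.
            expected = hist_a[i] * hist_b[j] / n
            numerator += weight * observed[i][j]
            denominator += weight * expected
    if denominator == 0.0:
        # Both raters gave one identical score to every essay.
        return 1.0
    return 1.0 - numerator / denominator


if __name__ == "__main__":
    human = [0, 1, 2, 3, 2, 1]
    machine = [0, 1, 2, 3, 1, 1]
    print(round(quadratic_weighted_kappa(human, machine, 4), 3))
```

The quadratic weight penalises a two-point disagreement four times as heavily as a one-point disagreement, which suits ordinal essay scores better than plain accuracy.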
Malaysian Journal of Microbiology, 2020
Advanced Science Letters, 2018
Organizational culture defines an organization's uniqueness and identity. It is made up of values, beliefs, attitudes, norms, and patterns of behavior that are shared and adopted by individuals in the organization to cope with internal and external pressure. A computerized culture audit system is more cost efficient, saves time and is less prone to error. However, one of the challenges is the difficulty of obtaining accurate employee opinions from free text. Existing sentiment analysis methods cannot be applied effectively to the organizational culture context for employee opinion analysis. Therefore, this study proposes an employee opinion analysis method known as "Opinion Keyword Extraction", which is based on building a customized corpus specific to sentiment analysis in the organizational culture context. Opinion Keyword Extraction combines the rule-based and lexicon approaches using our own corpus datastore. The customized corpus consists of features related to negation detection, detection of special words relevant to organizational culture and detection of emotion symbols. We evaluated our method using primary data collected from 100 participants and found that our Opinion
Biotechnology & Biotechnological Equipment, 2018
We propose an improved solution to the three-stage DNA motif prediction approach. The three-stage approach uses only a subset of input sequences for initial motif prediction, and the initial motifs obtained are employed for site detection in the remaining, non-overlapping input subset. The currently available solution is not robust because motifs obtained from the initial subset are represented as position weight matrices, which results in many false positives. Our approach, called DeepFinder, employs deep learning neural networks with features associated with binding sites to construct a motif model. Furthermore, multiple prediction tools are used in the initial motif prediction process to obtain a higher number of positive hits. Our features are engineered from the context of binding sites, which are assumed to be enriched with specificity information of sites recognized by transcription factor proteins. DeepFinder is evaluated using several performance metrics on ten chromatin immunoprecipitation (ChIP) datasets. The results show marked improvement of our solution in comparison with the existing solution. This indicates the effectiveness and potential of our proposed DeepFinder for large-scale motif analysis.
Journal of IT in Asia, 2017
This paper aims at developing techniques for the design and implementation of neural classifiers. Based on our previous study on a generalized RBF neural network architecture and a learning criterion function for parameter optimization, this work addresses two realization issues, i.e. supervised input feature selection and genetic computation techniques for tuning classifiers. A comparative study of classification performance is carried out on a set of protein sequence data.
Conference on Information Technology in Asia, 2003
Neural classifiers have been widely used in many application areas. This paper describes generali... more Neural classifiers have been widely used in many application areas. This paper describes generalized neural classifier based on the radial basis function network. The contributions of this work are: i) improvement on the standard radial basis function network architecture, ii) proposed a new cost function for classification, iii) hidden units feature subset selection algorithm, and iv) optimizing the neural classifier using the genetic algorithm with a new cost function. Comparative studies on the proposed neural classifier on protein classification problem are given.
Epigenetic signatures such as chromatin and histone modification marks are prominent indicator of... more Epigenetic signatures such as chromatin and histone modification marks are prominent indicator of enhancer motif regions. While many works have been using k-mer as feature of epigenetic sequence, no comprehensive studies has been done to compare and contrast how the different choices of k-mers feature parameter affect machine learning algorithm performances. Furthermore, it is not known how effective is the k-mer feature for representing different epigenetic marks-H3K4me1, DHS and p300. In this paper, a comparative study is performed to determine the accuracy, sensitivity and specificity of using k-mer feature for predicting these marks. Our results found that, classifier perform better when the k-mer length is between 4 to 6. Short k-mer length has poor accuracy, sensitivity and specificity. The k-mer feature works best for DHS sequences and has low accuracy for H3K4me1 sequences prediction. The k-mer feature is also performed poorly on specificity of DHS sequences. It can be concluded that, there are still much room for improvement of identifying better feature for representing epigenetic feature for enhancer prediction.
Human factor theories are always being neglected especially in the design of biological tools. Th... more Human factor theories are always being neglected especially in the design of biological tools. This problem was found in sequence logo which is used to visualize the conservation characteristics of the biological sequence motifs. Previous studies have found some limitations in the graphical representation which cause biasness and misinterpretation of the results in sequence logo. Therefore, the aim of this study is to investigate on the visual attributes performance in helping viewers to perceive and interpret the information based the preattentive theories and Gestalt principles of perception. A survey was carried out to gather user’s opinion. The results showed some limitations in the use of colour, negative space, size and arrangement of the nucleotides and the lack of information and interactivity in the sequence logo. Therefore, improvements in standardizing the colour, graphical representation of the nucleotides and interactivity of the tool are needed to solve the problems of...
Journal of Biomolecular Structure and Dynamics, 2022
The Human Immunodeficiency Virus (HIV) infection is a global pandemic that has claimed 33 million... more The Human Immunodeficiency Virus (HIV) infection is a global pandemic that has claimed 33 million lives to-date. One of the most efficacious treatments for naïve or pretreated HIV patients is the HIV integrase strand transfer inhibitors (INSTIs). However, given that HIV treatment is life-long, the emergence of HIV strains resistant to INSTIs is an imminent challenge. In this work, we showed two best regression QSAR models that were constructed using a boosted Random Forest algorithm (r2 = 0.998, q210CV = 0.721, q2external_test = 0.754) and a boosted K* algorithm (r2 = 0.987, q210CV = 0.721, q2external_test = 0.758) to predict the pIC50 values of INSTIs. Subsequently, the regression QSAR models were deployed against the Drugbank database for drug repositioning. The top-ranked compounds were further evaluated for their target engagement activity using molecular docking studies and accelerated Molecular Dynamics simulation. Lastly, their potential as INSTIs were also evaluated from our literature search. Our study offers the first example of a large-scale regression QSAR modelling effort for discovering highly active INSTIs to combat HIV infection.Communicated by Ramaswamy H. Sarma.
The Human Immunodeficiency Virus (HIV) infection is a global pandemic that has claimed 33 million... more The Human Immunodeficiency Virus (HIV) infection is a global pandemic that has claimed 33 million lives to date. One of the most efficacious treatment for naïve or pre-treated HIV patients is with the HIV integrase strand transfer inhibitors (INSTIs). However, given that HIV treatment is lifelong, the emergence of HIV-1 strains resistant to INSTIs is an imminent challenge. In this work, we showed two best regression QSAR models that were constructed using a boosted Random Forest algorithm (r2 = 0.998, q210CV = 0.721, q2external_test = 0.754) and a boosted K* algorithm (r2 = 0.987, q210CV = 0.721, q2external_test = 0.758) to predict the pIC50 values of INSTIs. Subsequently, the regression QSAR models were deployed against the Drugbank database for drug repositioning. The top ranked compounds were further evaluated for their target engagement activity using molecular docking studies and their potential as INSTIs evaluated from our literature search. Our study offers the first example ...
Proceedings of the 7th International Conference on Computational Systems-Biology and Bioinformatics, 2016
Unravelling gene expression has become a critical procedure in bioinformatics world today and req... more Unravelling gene expression has become a critical procedure in bioinformatics world today and required continuous efforts to form a complete picture of enhancers. Enhancers are explicit patterns of gene expression that bound by activators to stimulate transcription. It could reside in upstream or downstream thousands of base pairs away without any fixed position. Therefore, the identification task of enhancers is extremely challenging. The inclusion of gaps in motif identification improved the overall accuracy and sensitivity, however, this feature is not fully utilised in deep learning method yet. Deep learning, is a powerful machine learning technique that has been actively used in image recognition and this technique has begun to shed light in bioinformatics. The expressiveness of deep learning enables higher feature learning from lower level ones. As a result, an integration of gapped motif feature representation (GMFR) and deep learning approach called deep convolutional neural...
Journal of Telecommunication, Electronic and Computer Engineering, 2017
Job recruitment portals become the main recruitment channel in most of the organizations nowadays... more Job recruitment portals become the main recruitment channel in most of the organizations nowadays because they offer many advantages to recruiters and job applicants. An outstanding recruitment system should be able to filter and recommend the best potential candidates for a job vacancy so that it can avoid hiring of inappropriate individuals or miss out the good candidates. Nevertheless, most of the existing job portals do not cover the unskilled job sectors. Matching unskilled jobs to applicants is challenging because the selection criteria can be very subjective and difficult to specify in terms of professional qualifications. In this paper, Kansei Engineering (KE) Model is applied to find the most prominent personality traits that are preferred by employers in different unskilled job categories in Malaysia. We have identified most prominent 20 Kansei words related to personality traits that are important to six main industries of unskilled workers. The six unskilled sectors invo...
2016 IEEE 16th International Conference on Bioinformatics and Bioengineering (BIBE), 2016
This paper proposes an ensemble approach based on data partitioning for large-scale DNA motif ana... more This paper proposes an ensemble approach based on data partitioning for large-scale DNA motif analysis. Motif prediction using genome-scale dataset is challenging due to high time and space complexity. Existing ensemble approaches, while demonstrated improve performances, are only applicable to small datasets. Our approach called ENSPART first partitions the input dataset into non-overlapping subsets which serve as input to multiple distinct motif prediction tools. It is assumed that the core motifs of a transcription factor protein exists in all data subsets. We employed seven motif prediction tools to obtain initial candidate motifs and they are merged according to their sequence content similarity. An alignment-free method is used to establish motif similarity. A novel motifs merging method is proposed to merge similar motifs obtained by tools in different data partitions. Ten genome-wide ChIP datasets are collected for evaluation. We compare our approach with MEME-ChIP and obtai...
The medical information system plays an important role for either the flow of business transactio... more The medical information system plays an important role for either the flow of business transaction or record medication histories. In the Department of Nuclear Medicine of Sarawak General Hospital, the officers and doctors handle the patient information manually during registration and diagnosis process. Unfortunately, this manual documented system has led to severity of the problems such as medication history error, misplacement of drug particulars, delay release of reports and insecurity to the patient's record. Thus, there is a demand for a web-based information system so that the doctors and officers are able to sieve the benefits of the accuracy confidentiality and integrity of the data. The proposed project is to develop the Nuclear Medicine Information System(NMIS) to integrate the spatial information such as the patients' medical report and their diagnosis scan report. All the information such as patient, appointment, diagnosis, cancer and user is able to manipulate ...
International Journal of Business and Society, 2017
In our previous work we proposed ENSPART-an ensemble method for DNA motif discovery which partiti... more In our previous work we proposed ENSPART-an ensemble method for DNA motif discovery which partitions input dataset into several equal size subsets runs by several distinct tools for candidate motif prediction. The candidate motifs obtained from different data subsets are merged to obtain the final motifs. Nevertheless, the original ENSPART has several limitations: (1) the same background sequences are used for the calculation of Receiver Operating Cost (ROC) of motifs obtained from different datasets. This causes bias because different datasets might have different background distribution; (2) it does not consider the duplication of a motif and its reverse complement. This causes many redundant motifs in the result set which requires filtering. In this work, we extended the original ENSPART to solve those two issues. For the first issue, we employed background sequences that is based on the distribution of bases in the input sequences. As for the second issue, we employ a "trip...
This study examine the efficiency and effectiveness of the procedure of reporting the transfusion... more This study examine the efficiency and effectiveness of the procedure of reporting the transfusion-related adverse events. Blood transfusion is a process of receiving blood through an intravenous (IV) line from patient’s blood vessels. It is considered as a safe and common procedure as most of the blood transfusion goes well. However, there are still some minor or severe problems developed occasionally. These transfusion-related adverse events need to be investigated to find out the cause of adverse reaction during or after the blood transfusion. In Malaysia, although there is a procedure of reporting the adverse event that need to be follow strictly by the medical staff, there are some issues happens such as unattended cases, lack of detail, incomplete report and so on.
Epigenetic signatures such as chromatin and histone modification marks are prominent indicators of enhancer motif regions. While many works have used k-mers as features of epigenetic sequences, no comprehensive study has compared how different choices of the k-mer feature parameter affect the performance of machine learning algorithms. Furthermore, it is not known how effective the k-mer feature is for representing different epigenetic marks: H3K4me1, DHS and p300. In this paper, a comparative study is performed to determine the accuracy, sensitivity and specificity of using the k-mer feature for predicting these marks. Our results show that classifiers perform better when the k-mer length is between 4 and 6. Short k-mer lengths give poor accuracy, sensitivity and specificity. The k-mer feature works best for DHS sequences and has low accuracy for H3K4me1 sequence prediction. The k-mer feature also performs poorly on specificity for DHS sequences. It can be conclu...
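As a rough illustration of the k-mer feature discussed above, the following Python sketch turns a DNA sequence into a fixed-length vector of normalized k-mer frequencies. It is a generic construction under stated assumptions (overlapping windows, alphabet A, C, G, T, windows with other characters skipped), not necessarily the exact encoding used in the paper; `kmer_features` is a hypothetical name.

```python
from itertools import product


def kmer_features(sequence: str, k: int) -> list:
    """Return normalized k-mer frequencies in a fixed lexicographic order.

    The vector has 4**k entries, one per possible k-mer over A, C, G, T.
    Windows containing any other character (for example N) are ignored.
    """
    seq = sequence.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    # Avoid division by zero for sequences shorter than k.
    return [counts[km] / total if total else 0.0 for km in kmers]


if __name__ == "__main__":
    features = kmer_features("ACGTACGTNNACGT", k=4)
    print(len(features), sum(features))
```

The vector length grows as 4^k (256 entries at k = 4, 4096 at k = 6), which is one practical reason the choice of k matters for a downstream classifier.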
Pertanika Journal of Tropical Agricultural Science, 2019
Enhancers are indispensable elements in various developmental stages, orchestrating numerous biological processes by elevating gene expression with the aid of transcription factors. Enhancer variations have been linked to the onset of various genetic diseases, highlighting that they are as important as the coding regions of the genome. Although the first enhancer, SV40, was discovered four decades ago, progress in enhancer identification and characterization is still in its infancy. As more genome sequences are made available, especially those of non-human primates, we can finally study the enhancer landscape of these primates, which differs evolutionarily from that of humans. One interesting genome to investigate is that of the proboscis monkey, as it is deemed one of the most ancient primates alive to date, with unique morphological and dietary characteristics; it is also one of the IUCN endangered species with strong demands for immediate conservation. In this review...
2021 2nd International Conference on Artificial Intelligence and Data Sciences (AiDAS), 2021
Black pepper (Piper nigrum) diseases and nutrient deficiency can often be observed from the symptoms exhibited on its leaves. This paper investigates the effectiveness of a deep learning approach for classifying black pepper disease and nutrient deficiency based on leaf images. We constructed a customized convolutional neural network (CNN) to determine how its training parameters affect prediction performance. Two other deep learning neural networks, VGG16 and Inception V3, are also employed for comparison. We sampled 947 images from farms in Sarawak, comprising 8 classes in total. Image augmentation is performed on the images to produce a total of 9532 images. The results show that the customized CNN performed slightly better than the other two deep learning approaches, at a 0.98 sensitivity rate. Furthermore, image augmentation improved prediction performance for all the deep learning models. This study demonstrates that deep learning is a feasible approach for classifying black pepper diseases and nutrient deficiency based on leaf images.
Pertanika Journal of Science and Technology, 2021
Automated Essay Scoring (AES) is a service or software that can predictively grade essays based on a pre-trained computational model. It has gained a lot of research interest in educational institutions, as it expedites the process and reduces the effort of human raters while producing grades close to human decisions. Despite the strong appeal, its implementation varies widely according to researchers' preferences. This critical review examines various AES development milestones, specifically the different methodologies and attributes used in deriving essay scores. To generalize existing AES systems according to their constructs, we attempted to fit all of them into three frameworks: content similarity, machine learning and hybrid. In addition, we presented and compared various common evaluation metrics for measuring the efficiency of AES and proposed Quadratic Weighted Kappa (QWK) as the standard evaluation metric since it corrects the agreement purely by chance when estimate t...
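For readers unfamiliar with Quadratic Weighted Kappa, the sketch below computes it in plain Python from two lists of integer scores, using the standard definition: one minus the ratio of weighted observed disagreement to weighted chance-expected disagreement, with weights (i - j)^2 / (N - 1)^2. The function name and signature are illustrative choices, not taken from the review.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Quadratic Weighted Kappa between two lists of integer scores.

    Scores must lie in [min_score, max_score]. Returns 1.0 for perfect
    agreement, about 0.0 for chance-level agreement, and negative values
    for agreement worse than chance.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rating lists must be non-empty and equal in length")
    n = max_score - min_score + 1
    if n < 2:
        raise ValueError("need at least two distinct score levels")
    total = len(rater_a)

    # Observed confusion matrix between the two raters.
    observed = [[0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1

    # Marginal score histograms, used for the chance-expected matrix.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2
            expected = hist_a[i] * hist_b[j] / total
            numerator += weight * observed[i][j]
            denominator += weight * expected

    # A zero denominator means both raters gave one identical score throughout.
    return 1.0 - numerator / denominator if denominator else 1.0


if __name__ == "__main__":
    human = [1, 2, 3, 4, 4, 2]
    machine = [1, 2, 3, 3, 4, 1]
    print(quadratic_weighted_kappa(human, machine, 1, 4))
```

Because the weights grow quadratically with the score gap, a machine score that is off by two levels is penalized four times as heavily as one that is off by a single level.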
Malaysian Journal of Microbiology, 2020
Advanced Science Letters, 2018
Organizational culture defines an organization's uniqueness and identity. It is made up of values, beliefs, attitudes, norms, and patterns of behavior that are shared and adopted by individuals in the organization to cope with internal and external pressure. A computerized culture audit system is more cost efficient, saves time and is less prone to error. However, one of the challenges is the difficulty of obtaining accurate employee opinions from free text. Existing sentiment analysis methods cannot be applied effectively and directly to the organizational culture context for employee opinion analysis. Therefore, this study proposes an employee opinion analysis method known as "Opinion Keyword Extraction", which is based on building a customized corpus specific to sentiment analysis in the organizational culture context. Opinion Keyword Extraction combines the rule-based and lexicon approaches using our own corpus datastore. The customized corpus consists of features related to negation detection, detection of special words relevant to organizational culture and detection of emotion symbols. We evaluated our method using primary data collected from 100 participants and found that our Opinion
Biotechnology & Biotechnological Equipment, 2018
We propose an improved solution to the three-stage DNA motif prediction approach. The three-stage approach uses only a subset of input sequences for initial motif prediction, and the initial motifs obtained are employed for site detection in the remaining, non-overlapping subset of the input. The currently available solution is not robust because motifs obtained from the initial subset are represented as position weight matrices, which results in high false positives. Our approach, called DeepFinder, employs deep learning neural networks with features associated with binding sites to construct a motif model. Furthermore, multiple prediction tools are used in the initial motif prediction process to obtain a higher number of positive hits. Our features are engineered from the context of binding sites, which are assumed to be enriched with specificity information of sites recognized by transcription factor proteins. DeepFinder is evaluated using several performance metrics on ten chromatin immunoprecipitation (ChIP) datasets. The results show marked improvement of our solution in comparison with the existing solution. This indicates the effectiveness and potential of our proposed DeepFinder for large-scale motif analysis.
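To make the position weight matrix (PWM) representation mentioned above concrete, here is a minimal Python sketch of how a PWM can be built from aligned sites and used to scan a sequence. It illustrates the generic baseline technique only, not DeepFinder or the existing three-stage tool; the function names, the pseudocount, the uniform background of 0.25 and the score threshold are all assumptions made for the example.

```python
import math


def build_pwm(sites, pseudocount=1.0):
    """Build a log-odds PWM from equal-length aligned sites (A, C, G, T).

    Each column maps a base to log2(p(base) / 0.25), assuming a uniform
    background; the pseudocount avoids taking the log of zero.
    """
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / 0.25)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, score) for every window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows with ambiguous bases
        score = sum(column[base] for column, base in zip(pwm, window))
        if score >= threshold:
            hits.append((start, score))
    return hits


if __name__ == "__main__":
    pwm = build_pwm(["TATA", "TATA", "TATT"])
    print(scan("GGTATAGG", pwm, threshold=2.0))
```

A PWM scores each position independently, so lowering the threshold to catch weaker sites also admits many unrelated windows, which is consistent with the false-positive concern raised in the abstract.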
Journal of IT in Asia, 2017
This paper aims at developing techniques for the design and implementation of neural classifiers. Based on our previous study on a generalized RBF neural network architecture and a learning criterion function for parameter optimization, this work addresses two realization issues: supervised input feature selection and genetic computation techniques for tuning classifiers. A comparative study of classification performance is carried out on a set of protein sequence data.
Pre-print, 2021
Human Immunodeficiency Virus (HIV) infection is a global pandemic that has claimed 33 million lives to date. One of the most efficacious treatments for naïve or pre-treated HIV patients is the class of HIV integrase strand transfer inhibitors (INSTIs). However, given that HIV treatment is lifelong, the emergence of HIV-1 strains resistant to INSTIs is an imminent challenge. In this work, we present the two best regression QSAR models, constructed using a boosted Random Forest algorithm ($r^2 = 0.998$, $q^2_{\mathrm{10CV}} = 0.721$, $q^2_{\mathrm{external\_test}} = 0.754$) and a boosted K* algorithm ($r^2 = 0.987$, $q^2_{\mathrm{10CV}} = 0.721$, $q^2_{\mathrm{external\_test}} = 0.758$), to predict the pIC50 values of INSTIs. Subsequently, the regression QSAR models were deployed against the Drugbank database for drug repositioning. The top-ranked compounds were further evaluated for their target engagement activity using molecular docking studies, and their potential as INSTIs was evaluated through a literature search. Our study offers the first example of a large-scale regression QSAR modelling effort for discovering highly active INSTIs to combat HIV infection.