Walker Land - Academia.edu (original) (raw)

Papers by Walker Land

International Journal of Functional Informatics and Personalised Medicine, 2008

This research describes a non-interactive process that applies several forms of computational int... more This research describes a non-interactive process that applies several forms of computational intelligence to classifying biopsy lung tissue samples. Three types of lung cancer evaluated (squamous cell carcinoma, adenocarcinoma, and bronchioalveolar carcinoma) together account for 65-70% of diagnoses. Accuracy achieved supports hypothesis that an accurate predictive model is generated from training images, and performance achieved is an accurate baseline for the process's potential scaling to larger datasets. Feature vector performance is good or better than Thiran and Macq's in every case. Except bronchioalveolar carcinomas, each individual cancer classification task experienced improvement, with two groupings showing nearly 20% classification accuracy.

Schaffer and Land 14 described a method whereby a machine intelligence (MI) process can “know wha... more Schaffer and Land 14 described a method whereby a machine intelligence (MI) process can “know what it doesn’t know.” In this paper, the concept is illustrated by three examples: the GRNN oracle ensemble method that combines multiple SVM classifiers for detecting Alzheimer’s type dementia using features automatically extracted from a speech sample, an Evolutionary Programming and Adaptive Boosting hybrid and a Generalized Regression Neural Network hybrid for classifying breast cancer. The authors assert it is (1) applicable quite directly to a great many other learning classifier systems, and (2) provides an intuitive approach to comparing the performance of different classifiers on a given task using the size of the “area of uncertainty” as a measure of performance metric. This paper provides support for these assertions by describing the steps needed to apply it to a previously published study of breast cancer benign / malignancy prediction, and then illustrates how this “area of u...

Procedia Computer Science, 2013

Procedia Computer Science, 2016

Bayesian networks (BNs) have classically been designed by two methods: expert approach (ask an ex... more Bayesian networks (BNs) have classically been designed by two methods: expert approach (ask an expert for nodes and links) and data driven approach (infer them from data). An unexpected by-product of previous Alzheimer's / dementia research (presented at CAS2015) was yet another approach where the results of a hybrid design were used to configure a BN. A complex adaptive systems approach, (e.g. GA-SVM-oracle hybrid) can sift through the combinatorics of feature subset selection, yielding a modest set of only the most influential features. Then using known likelihoods of demographics associated to dementia, and assuming direct and independent influence of dementia upon speech features, the BN is specified. The conditional probabilities needed can be estimated with far fewer data than the traditional BN data-driven approach. Although BNs have advantages (intuitive interpretation and graceful handling of missing data) they also have challenges. We report initial implementation results that suggest the need to reduce continuous variables to discrete categories, and the still-remaining need to estimate a substantial number of conditional probabilities, remain challenges for BNs. We suggest some ways forward in the application of BNs with the objective of improving / refining Alzheimer's / dementia detection using speech.

Procedia Computer Science, 2014

Gene expression microarray analysis is a rapid, low cost method of analyzing gene expression prof... more Gene expression microarray analysis is a rapid, low cost method of analyzing gene expression profiles for cancer prognosis/diagnosis. Microarray data generated from oncological studies typically contain thousands of expression values with few cases. Traditional regression and classification methods require first reducing the number of dimensions via statistical or heuristic methods. Partial Least Squares (PLS) is a dimensionality reduction method that builds a least squares regression model in a reduced dimensional space. It is well known that Support Vector Machines (SVM) outperform least squares regression models. In this study, we replace the PLS least squares model with a SVM model in the PLS reduced dimensional space. To verify our method, we build upon our previous work with a publicly available data set from the Gene Expression Omnibus database containing gene expression levels, clinical data, and survival times for patients with non-small cell lung carcinoma. Using 5-fold cross validation, and Receiver Operating Characteristic (ROC) analysis, we show a comparison of classifier performance between the traditional PLS model and the PLS/SVM hybrid. Our results show that replacing least squares regression with SVM, we increase the quality of the model as measured by the area under the ROC curve.

2007 IEEE 7th International Symposium on BioInformatics and BioEngineering, 2007

Page 1. An End-to-End Process for Cancer Identification from Images of Lung Tissue Walker H. Land... more Page 1. An End-to-End Process for Cancer Identification from Images of Lung Tissue Walker H. Land, Jr. Department of Bioengineering Binghamton University Binghamton, USA Dan Mckee Dept. ofMath. and Comp. Science, Mansfield University Mansfield, USA ...

Medical Imaging 2006: Image Processing, 2006

ABSTRACT

Procedia Computer Science, 2012

An accurate prognostic model of a cancer patient after treatment can be useful in deciding the ne... more An accurate prognostic model of a cancer patient after treatment can be useful in deciding the next course of treatment or efficacy of said treatment. Gene expression microarray data has been used to predict survival times [1], or to classify the patient as having a good/poor prognosis [2] by predicting whether the patient belongs to the class that will have a recurrence of cancer before or after a certain period, typically 3 or 5 years. Microarrays typically contain thousands of gene expression probes and a typical study may only contain a few hundred patients or less. Typical regression techniques will fail to generalize, suffering from the 'Curse of Dimensionality', resulting in an over-fitted model that performs very well on the training data, and very poorly or validation data. Various feature selection/reduction methods have been used to reduce the dimensionality of the data and improve or facilitate a solution [3]. Gene expression is known to be modulated by the expression of other genes, forming a so-called gene network or pathway. Furthermore, several networks may affect the aggressiveness of the cancer simultaneously [4]. While past studies have selected features based on statistical methods alone [5] or have simply included 'known cancer genes', none to our knowledge have used classification models based on ensembles of models based on multiple known gene networks. Based on the data presented in Shedden, et. al. [6], this study uses a General Regression Neural Network (GRNN) Oracle ensemble that combines several Partial least squares (PLS) models trained to predict recurrence times from 12 different gene networks. We hypothesize that it is possible to correctly classify recurrence by combining the results based on the gene network models.

Intelligent Engineering Systems through Artificial Neural Networks, Volume 16, 2006

SPIE Proceedings, 2003

The objectives of this paper are to discuss: (1) the development and testing of a new Evolutionar... more The objectives of this paper are to discuss: (1) the development and testing of a new Evolutionary Programming (EP) method to optimally configure Support Vector Machine (SVM) parameters for facilitating the diagnosis of breast cancer; (2) evaluation of EP derived learning machines when the number of BI-RADS and clinical history discriminators are reduced from 16 to 7; (3) establishing system performance for several SVM kernels in addition to the EP/Adaptive Boosting (EP/AB) hybrid using the Digital Database for Screening Mammography, University of South Florida (DDSM USF) and Duke data sets; and (4) obtaining a preliminary evaluation of the measurement of SVM learning machine inter-institutional generalization capability using BI-RADS data. Measuring performance of the SVM designs and EP/AB hybrid against these objectives will provide quantative evidence that the software packages described can generalize to larger patient data sets from different institutions. Most iterative methods currently in use to optimize learning machine parameters are time consuming processes, which sometimes yield sub-optimal values resulting in performance degradation. SVMs are new machine intelligence paradigms, which use the Structural Risk Minimization (SRM) concept to develop learning machines. These learning machines can always be trained to provide global minima, given that the machine parameters are optimally computed. In addition, several system performance studies are described which include EP derived SVM performance as a function of: (a) population and generation size as well as a method for generating initial populations and (b) iteratively derived versus EP derived learning machine parameters. Finally, the authors describe a set of experiments providing preliminary evidence that both the EP/AB hybrid and SVM Computer Aided Diagnostic C++ software packages will work across a large population of patients, based on a data set of approximately 2,500 samples from five different institutions.

Recent Advances in Breast Imaging, Mammography, and Computer-Aided Diagnosis of Breast Cancer

SPIE Proceedings, 2004

This research consisted of evaluating diagnostic performance results using SVM outputs previously... more This research consisted of evaluating diagnostic performance results using SVM outputs previously obtained from an integrated Duke/DDSM USF data set and the GRNN oracle. The SVM kernels used in this research included Additive, Multiplicative, S2000, and Spline kernels. GRNN results are presented for the following combinations of gate variables: age, mass margin (MM), age and MM, and all 6 BIRADS™ indicators plus age. For all experiments, Differential Evolution (DE) was used to train the GRNN. A summary of the DE process is described, independent of the software application. The experiments described in this paper show that the GRNN oracle, with all of the gate variable combinations, performed better than any of the individual SVM kernels alone at or below 98% sensitivity.

Procedia Computer Science, 2011

PLS initially creates uncorrelated latent variables which are linear combinations of the original... more PLS initially creates uncorrelated latent variables which are linear combinations of the original input vectors X i , where weights are used to determine linear combinations, which are proportional to the covariance. Secondly, a least squares regression is then performed on the subset of extracted latent variables that lead to a lower and biased variance on transformed data. This process, leads to a lower variance estimate of the regression coefficients when compared to the Ordinary Least Squares regression approach. Classical Principal Component Analysis (PCA), linear PLS and kernel ridge regression (KRR) techniques are well known shrinkage estimators designed to deal with multicollinearity, which can be a serious problem. That is, multi-collinearity can dramatically influence the effectiveness of a regression model by changing the values and signs of estimated regression coefficients given different but similar data samples, thereby leading to a regression model which represents training data reasonably well, but generalizes poorly to validation and test data. We explain how to address these problems, which is followed by performing a PLS hypotheses driven preliminary research study and sensitivities analysis by not doing a combinatorial analysis as PLS will eliminate the unnecessary variables using a microarray colon cancer data set. Research studies as well as preliminary results are described in the results section.

Procedia Computer Science, 2011

We describe a sequence of experiments in which a robot "brain" was evolved to mimic the behaviour... more We describe a sequence of experiments in which a robot "brain" was evolved to mimic the behaviours captured under control of a heuristic rule program (imitation learning). The task was light-seeking while avoiding obstacles using binocular light sensors and a trio of IR proximity sensors. The "brain" was a spiking neural network simulator whose parameters were tuned by a genetic algorithm, where fitness was assessed by the closeness to target output spike trains. Spike trains were frequency encoded. The network topology was manually designed, and then modified in response to observed difficulties during evolution. We noted that good performance seems best approached by judicious mixing of excitation and inhibition. Besides robotic applications, the domain of "smart" prosthetics also appears promising.

Procedia Computer Science, 2011

New advances in medicine have led to a disparity between the existing information about patients ... more New advances in medicine have led to a disparity between the existing information about patients and the ability of clinicians to utilize it. Lack of training and incompatibility with clinical techniques has made the use of the complex adaptive systems approach difficult. To avoid this, we used statistical learning theory as an inline preprocess between existing data collection methods and clinical analysis of data. Clinicians would be able to use this system without any changes to their techniques, while improving accuracy. We used data from CT scans of patients with metastatic carcinoma to predict prognosis. Specifically, we used the standard for evaluating response to treatment, RECIST, and new qualitative and quantitative features. An Evolutionary Programming trained Support Vector Machine (EP-SVM), was used to preprocess the data for two traditional survival analysis techniques: Cox Proportional Hazard Models and Kaplan Meier curves. This was compared to Logistic Regression (LR) and using cutoff points. Analyses were also done to compare different inputs and different radiologists. The EP-SVM outperformed both LR and the cutoff method significantly and allowed us to both intelligently combine data from multiple sources and identify the most predictive features without necessitating changes in clinical methods.

Procedia Computer Science, 2011

In biomedical science, data mining techniques have been applied to extract statistically signific... more In biomedical science, data mining techniques have been applied to extract statistically significant and clinically useful information from a given dataset. Finding biomarker gene sets for diseases can aid in understanding disease diagnosis, prognosis and therapy response. Gene expression microarrays have played an important role in such studies and yet, there have also been criticisms in their analysis. Analysis of these datasets presents the high risk of over-fitting (discovering spurious patterns) because of their feature-rich but case-poor nature. This paper describes a GA-SVM hybrid along with Gaussian noise perturbation (with a manual noise gain) to combat over-fitting; determine the strongest signal in the dataset; and discover stable biomarker sets. A colon cancer gene expression microarray dataset is used to show that the strongest signal in the data (optimal noise gain where a modest number of similar candidates emerge) can be found by a binary search. The diversity of candidates (measured by cluster analysis) is reduced by the noise perturbation, indicating some of the patterns are being eliminated (we hope mostly spurious ones). Initial biological validated has been tested and genes have different levels of significance to the candidates; although the discovered biomarker sets should be studied further to ascertain their biological significance and clinical utility. Furthermore, statistical validity displays that the strongest signal in the data is spurious and the discovered biomarker sets should be rejected.

Procedia Computer Science, 2012

Breast cancer screening has reference to screening of asymptomatic, generally healthy women for b... more Breast cancer screening has reference to screening of asymptomatic, generally healthy women for breast cancer, to identify those who should receive a follow up check. Early screening can detect non-invasive ductal carcinoma in situ (called "pre breast cancer"), which almost never forms a lump and is generally non-detectible, except by mammography. This paper will describe the design and preliminary evaluation of this PNN/GRNN ensemble pre-screener, in the context of a possible pre-screening protocol, which may, if required, include other data. The results show that using the ensemble technique provides almost a 20% AUC increase over the average standalone PNN and almost 10% over the best performing PNN.

Procedia Computer Science, 2013

Recent outbreaks of listeria, salmonella, and other pathogens have reinforced the need for more r... more Recent outbreaks of listeria, salmonella, and other pathogens have reinforced the need for more rigorous testing of food products. Millions are spent each year testing food. Certifying the safety of the food is a challenging task using traditional testing methods. Current methods require long incubation times before the first results are observed and still only represent a small fraction of the food that is sold. Long analysis methods also lead to loss of consumables. 18.9 billion pounds of produce are lost a year to spoilage. A fast and effective method is needed to decrease the amount of time necessary to test the safety of food. The goal is to provide accurate sample classification as quickly as possible, thus allowing pathogen-free product to be shipped to market with the shortest delay possible. An autonomous electrochemical sensor was combined with a powerful multi-class Probabilistic Neural Network (PNN) system to classify four species of organisms (E. Coli #25922, E. Coli # 11775, S. Epidermis #12228, or C. Albicans #10231). We used an evolutionary based kernel optimization algorithm to optimize the kernel parameters, and trained the system on data sampled from four different organisms. The trained and optimized model was validated on a set containing several samples that were not used to train the network. We showed that the network was able to correctly classify unknown samples in a shorter period than the industry standard of 24 hours, thus providing a potential benefit to the agriculture industry.

Procedia Computer Science, 2013

The objective of this research is to develop a complex adaptive piecewise linear regression / pro... more The objective of this research is to develop a complex adaptive piecewise linear regression / probabilistic neural network (PNN) intelligent system for the rapid detection and classification of Escherichia coli (E.coli). The rapid detection and classification of E.coli is important because current methods require a long period of analysis before a classification can be determined. The objective of this paper is to describe the design and preliminarily evaluate an Intelligent Decision Support System (IDSS) that will validate the following hypotheses: an intelligent decision support system (IDSS) to allow the rapid collection and classification of E.coli can be designed and preliminarily evaluated, which will significantly decrease detection and classification times for E.coli bacteria, thereby addressing the food spoilage problem. The research in this paper provides a preliminary answer to: What performance improvement percentage can be realized against the 16 to 48 hours required for the conventional multistep methods of detection of microorganisms (using E.coli data as a baseline)? For the 16 hour period we have a 6.7% reduction in the time-to-detect period ((16-15)/15 × 100% = 6.7%) and for the 48 hour period we have a 220% reduction in time ((48-15)/15×100% = 220%).

International Journal of Functional Informatics and Personalised Medicine, 2008

Procedia Computer Science, 2013

Procedia Computer Science, 2016

Procedia Computer Science, 2014

2007 IEEE 7th International Symposium on BioInformatics and BioEngineering, 2007

Medical Imaging 2006: Image Processing, 2006

ABSTRACT

Procedia Computer Science, 2012

Intelligent Engineering Systems through Artificial Neural Networks, Volume 16, 2006

SPIE Proceedings, 2003

Recent Advances in Breast Imaging, Mammography, and Computer-Aided Diagnosis of Breast Cancer

SPIE Proceedings, 2004

Procedia Computer Science, 2011

Procedia Computer Science, 2012

Procedia Computer Science, 2013