Murat Dundar - Profile on Academia.edu

Papers by Murat Dundar

Simplicity of Kmeans Versus Deepness of Deep Learning: A Case of Unsupervised Feature Learning with Limited Data

2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA), 2015


Characterization of cardiac motion with spatial relationship

System and Method for Multiple Instance Learning for Computer Aided Detection

Target detection with spatio-spectral data via concordance learning

2009 IEEE International Geoscience and Remote Sensing Symposium, 2009

In challenging environments, multiple representations of a sample may be required to uniquely define it as a target. As a case study, we consider cars in the parking lots of urban imagery as targets. What makes this problem challenging is the co-presence of several parking garages and parking lots in the same imagery: cars in parking lots and cars in parking garages present similar spectral characteristics, so the spectral representation alone is not sufficient to uniquely identify a pixel as a car in a parking lot. Before a pixel is confirmed or rejected as a target, the classifiers corresponding to the spectral and spatial representations of the samples have to concord. This study discusses ways these classifiers can be trained so that the rate of true concordance is maximized. We consider independent training and feature concatenation first, and then propose a joint optimization scheme that optimizes multiple classifiers at once, maximizing concordance among the classifiers while minimizing classification error.
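A minimal sketch of the joint training idea, as a toy construction of my own rather than the paper's formulation (the two-view synthetic data, the logistic losses, and the disagreement weight `lam` are all assumptions): two linear classifiers, one per representation, are trained together with a penalty on the gap between their scores.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-view data: 200 samples with a "spectral" and a "spatial" view.
n = 200
y = rng.choice([-1.0, 1.0], size=n)
X_spec = y[:, None] * 0.8 + rng.normal(size=(n, 5))  # spectral features
X_spat = y[:, None] * 0.8 + rng.normal(size=(n, 3))  # spatial features

w1, w2 = np.zeros(5), np.zeros(3)
lam, lr = 0.5, 0.1  # concordance weight and step size (assumed values)

def grad(X, y, w):
    # Gradient of the mean logistic loss; tanh form keeps the sigmoid stable.
    p = 0.5 * (1.0 + np.tanh(0.5 * y * (X @ w)))
    return X.T @ (-(1.0 - p) * y) / len(y)

for _ in range(300):
    gap = X_spec @ w1 - X_spat @ w2  # disagreement between the two scores
    w1 -= lr * (grad(X_spec, y, w1) + lam * X_spec.T @ gap / n)
    w2 -= lr * (grad(X_spat, y, w2) - lam * X_spat.T @ gap / n)

# Accept a sample as a target only when both classifiers concord.
concord = np.sign(X_spec @ w1) == np.sign(X_spat @ w2)
```

The concordance penalty pulls the two score functions toward agreement while each still fits its own view of the labels.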

Performance adjustments in medical decision support systems

Kernel Fisher's Discriminant with Heterogeneous Kernels

Kernel Methods for Remote Sensing Data Analysis, 2009

Incorporating Spatial Contiguity into the Design of a Support Vector Machine Classifier

2006 IEEE International Symposium on Geoscience and Remote Sensing, 2006

We describe a modification of the standard support vector machine (SVM) classifier that exploits the tendency of spatially contiguous pixels to be similarly classified. A quadratic term characterizing the spatial correlations in a multispectral image is added to the standard SVM optimization criterion. The mathematical structure of the SVM programming problem is retained, and the solution can be expressed in terms of the ordinary SVM solution with a modified dot product. The spatial correlations are characterized by a "contiguity matrix" Ψ whose computation does not require labeled data; thus, the method provides a way to use a mix of labeled and unlabeled data. We present numerical comparisons of classification performance for this contiguity-enhanced SVM against a standard SVM on two multispectral data sets.
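A minimal sketch of the contiguity idea, under assumptions of my own: the 4-neighbour contiguity matrix, the smoothing-based modified dot product, and the toy 8x8 image are illustrative stand-ins, not the paper's exact quadratic formulation. The contiguity matrix is built without any labels, and the standard SVM machinery is reused via a precomputed kernel.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# Toy "image": an 8x8 grid of pixels with 4 spectral bands per pixel.
# Left half is class 0, right half class 1, plus noise.
h, w, bands = 8, 8, 4
labels = np.tile((np.arange(w) >= w // 2).astype(int), (h, 1)).ravel()
X = labels[:, None] * 1.5 + rng.normal(size=(h * w, bands))

# Contiguity matrix Psi: 1 for 4-neighbour pixel pairs (needs no labels).
idx = np.arange(h * w).reshape(h, w)
Psi = np.zeros((h * w, h * w))
for i in range(h):
    for j in range(w):
        for di, dj in ((1, 0), (0, 1)):
            if i + di < h and j + dj < w:
                Psi[idx[i, j], idx[i + di, j + dj]] = 1.0
                Psi[idx[i + di, j + dj], idx[i, j]] = 1.0

# Simplified contiguity-modified dot product: pull each pixel's features
# toward its spatial neighbours, then take the ordinary Gram matrix.
lam = 0.5
S = np.eye(h * w) + lam * Psi / Psi.sum(axis=1, keepdims=True).clip(min=1)
Xs = S @ X / (1 + lam)
K = Xs @ Xs.T

clf = SVC(kernel="precomputed").fit(K, labels)
acc = clf.score(K, labels)
```

Because the spatial structure enters only through the kernel, the ordinary SVM solver is used unchanged, mirroring the paper's observation that the programming problem keeps its standard form.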

Sparse Fisher Discriminant Analysis for Computer Aided Detection

We describe a method for sparse feature selection for a class of problems motivated by our work in Computer-Aided Detection (CAD) systems for identifying structures of interest in medical images. Typical CAD data sets for classification are large (several thousand candidates) and unbalanced (significantly fewer than 1% of the candidates are "positive"). To be accepted by physicians, CAD systems must generalize well with extremely high sensitivity and very few false positives. In order to find the features that can lead to superior generalization, researchers typically generate a large number of experimental features for each candidate. The reason is that there are no definitive methods for capturing the shape and image-based characteristics that correspond to the diagnostic features physicians use to identify structures of interest in the image, for example, cancerous polyps in a CT (computed tomography) volume of a patient's colon. Thus, 100 or more shape, texture, and intensity based features may be generated for each candidate at various levels of resolution. We propose a sparse formulation of the Fisher Linear Discriminant (FLD) that scales well to large datasets; our method inherits all the desirable properties of FLD while improving the handling of large numbers of irrelevant and redundant features. We demonstrate that our sparse FLD formulation outperforms conventional FLD and two other feature selection methods from the literature on both an artificial dataset and a real-world colon CAD dataset.
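One well-known route to a sparse FLD can be sketched as follows, under my own assumptions (the toy unbalanced data, the regularization strength, and the Lasso-on-class-codes construction are illustrative, not the paper's formulation): for two classes, the FLD direction coincides, up to scale, with least-squares regression of suitably coded class labels on the features, so adding an L1 penalty yields a sparse discriminant.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)

# Unbalanced toy CAD-style data: 1000 candidates, ~2% positive,
# 50 features of which only the first 3 carry signal.
n, d = 1000, 50
y = (rng.random(n) < 0.02).astype(float)
X = rng.normal(size=(n, d))
X[:, :3] += y[:, None] * 2.0

# Two-class FLD equals (up to scale) least-squares regression on these
# class codes; the added L1 penalty makes the discriminant sparse.
codes = np.where(y == 1, n / y.sum(), -n / (n - y.sum()))
w = Lasso(alpha=1.0).fit(X, codes).coef_
selected = np.flatnonzero(w)
```

On this toy problem the L1 penalty zeroes out the 47 irrelevant features while retaining the informative ones, which is the behaviour the abstract claims for the sparse formulation.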

Multiple Instance Learning for Computer Aided Diagnosis

Advances in Neural Information Processing Systems

Many computer aided diagnosis (CAD) problems are best modelled as a multiple-instance learning (MIL) problem with unbalanced data: the training data typically consists of a few positive bags and a very large number of negative instances. Existing MIL algorithms are much too computationally expensive for these datasets. We describe CH, a framework for learning a convex-hull representation of multiple instances that is significantly faster than existing MIL algorithms. Our CH framework applies to any standard hyperplane-based learning algorithm and, for some algorithms, is guaranteed to find the globally optimal solution. Experimental studies on two different CAD applications further demonstrate that the proposed algorithm significantly improves diagnostic accuracy compared to both MIL and traditional classifiers. Although not designed for standard MIL problems (which have both positive and negative bags and relatively balanced datasets), comparisons against other MIL methods on benchmark problems also indicate that the proposed method is competitive with the state of the art.
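The convex-hull idea can be caricatured as follows; this sketch is my own simplification, not the CH algorithm itself (the alternating scheme, the softmax re-weighting with temperature 5, the use of LogisticRegression, and the toy bags are all assumptions, and it carries none of the paper's optimality guarantees). Each positive bag is replaced by one point inside its convex hull, and the hull weights and the hyperplane are updated in turn.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

# Toy MIL data: 10 positive bags of 5 instances (one "witness" each),
# plus 200 individual negative instances, in 6 dimensions.
d = 6
neg = rng.normal(size=(200, d))
bags = []
for _ in range(10):
    b = rng.normal(size=(5, d))
    b[0, :2] += 3.0          # the witness instance carries the signal
    bags.append(b)

# Alternate between (1) fitting a classifier on the current convex-hull
# representatives and (2) re-weighting each bag toward the instances the
# classifier scores highest (a softmax keeps weights on the simplex).
alphas = [np.full(5, 0.2) for _ in bags]   # convex weights per bag
clf = LogisticRegression()
for _ in range(5):
    reps = np.array([a @ b for a, b in zip(alphas, bags)])
    X = np.vstack([reps, neg])
    y = np.array([1] * len(bags) + [0] * len(neg))
    clf.fit(X, y)
    for k, b in enumerate(bags):
        s = b @ clf.coef_.ravel()
        e = np.exp(5.0 * (s - s.max()))
        alphas[k] = e / e.sum()
```

After a few rounds the weights concentrate on each bag's witness instance, so the classifier effectively trains on one well-chosen point per positive bag rather than on every instance, which is where the speed advantage of a bag-level representation comes from.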

A Multisite Study to Evaluate Performance of CAD in Polyp Detection

PURPOSE The purpose of this study was to evaluate the sensitivity and generalization capability of a new computer aided detection (CAD) algorithm for CT colonography applied to datasets from multiple sites. METHOD AND MATERIALS The database of high-resolution CT images was obtained from NYU Medical Center (number of patients (n) = 105, positives = 61), Cleveland Clinic Foundation (n = 42, positives = 9), and two EU sites in Vienna and Belgium (n = 16, positives = 15). Sensitivity was established with respect to CTC by comparison to colonoscopy. CT data were acquired using a Volume Zoom CT scanner with a 4x1mm slice detector and a Sensation 16, all at 120 kV, with exposure from 50 to 110 mAs and B10f & B30f kernels. Slice thickness ranged between 1.25 mm, 1.5 mm, and 2.0 mm, with reconstruction intervals between 0.7 mm and 1.2 mm and axial resolution between 0.54 mm and 1.2 mm. The CAD algorithm performed: candidate generation, based on shape characteristics; feature extraction at each candid...

Learning-based Component for Suppression of False Positives Located on the Ileo Cecal Valve: Evaluation of Performance on 802 CTC Volumes

PURPOSE To evaluate the performance of a module for suppression of false positive CAD marks located on the ileo cecal valve (ICV) in a prototype polyp detection system when applied to clean or tagged CTC datasets. METHOD AND MATERIALS The ICV detection component uses 5 steps: a) detecting the ICV orifice by leveraging its distinctive local curvature profile; b) roughly estimating its orientation by aligning it with the local gradients at a given location; then estimating the c) position and d) size of the ICV, and e) refining the orientation, by performing marginal space learning. In each of the 5 steps, all potential orifice candidates are evaluated using different classifiers. The top 100 candidates with maximal orifice probabilities are selected for further parameter estimation and searching; the number of selected candidates is set to maintain a good trade-off between detection accuracy and speed. The positions of 116 manually marked ICV orifices were used to generate a traini...

Automated Polyp Detection: An Evaluation of a CAD System in over 1200 Cases

PURPOSE To evaluate the accuracy of a prototype CAD system for CT Colonography (CTC) on a database of 1245 cases from multiple sites with clean and tagged preparation. METHOD AND MATERIALS The database contains 1245 high-resolution MDCT datasets (545 clean and 700 tagged) partitioned randomly into training (208 clean and 169 tagged) and testing (337 clean and 531 tagged) sets. Data were obtained from 10 sites in the US and Europe, acquired using different CT scanner types from different manufacturers (4, 16, and 64 detectors) with a wide range of acquisition parameters. Axial spacing ranged from 0.48 to 0.97 mm; reconstruction interval from 0.5 to 4 mm; exposure from 10 to 200 mAs. A majority of cases were acquired at 120 or 130 kVp. All available cases, including those with poor preparation, distention, and artifacts, were included. The test set also had data from sites not used in the training set. The CAD system was optimized to provide best sensitivity for polyps >= 6 mm in extent. RESUL...

Joint Optimization of Cascaded Classifiers for Computer Aided Detection

2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007


A two-level approach towards semantic colon segmentation: removing extra-colonic findings

Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2009

Computer aided detection (CAD) of colonic polyps in computed tomographic colonography has tremendously impacted colorectal cancer diagnosis using 3D medical imaging. It is a prerequisite for all CAD systems to extract the air-distended colon segments from 3D abdominal computed tomography scans. In this paper, we present a two-level statistical approach that first separates colon segments from the small intestine, stomach, and other extra-colonic parts by classification on a new geometric feature set, and then evaluates the overall performance confidence using distance and geometry statistics over patients. The proposed method is fully automatic and is validated using both the classification results at the first level and their numerical impact on false positive reduction of extra-colonic findings in a CAD system. It shows performance superior to state-of-the-art knowledge- or anatomy-based colon segmentation algorithms.

An Improved Multi-task Learning Approach with Applications in Medical Diagnosis

Lecture Notes in Computer Science, 2008

We propose a family of multi-task learning algorithms for collaborative computer aided diagnosis, which aims to diagnose multiple clinically related abnormal structures from medical images. Our formulations eliminate features irrelevant to all tasks and identify discriminative features for each of the tasks. A probabilistic model is derived to justify the proposed learning formulations. By an equivalence proof, some existing regularization-based methods can also be interpreted by our probabilistic model as imposing a Wishart hyperprior. Convergence analysis highlights the conditions under which the formulations achieve convexity and global convergence. Two real-world medical problems, lung cancer prognosis and heart wall motion analysis, are used to validate the proposed algorithms.
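The shared feature-elimination behaviour can be illustrated with an off-the-shelf stand-in rather than the paper's own formulations: a multi-task Lasso, whose L21 (group) penalty zeroes whole feature rows across tasks. The toy tasks, the regularization strength, and the regression setting are my assumptions.

```python
import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(4)

# Two related "diagnosis" tasks over a shared pool of 30 features;
# only features 0-2 are relevant to either task.
n, d = 300, 30
X = rng.normal(size=(n, d))
W_true = np.zeros((d, 2))
W_true[:3] = rng.normal(size=(3, 2)) + 2.0
Y = X @ W_true + 0.1 * rng.normal(size=(n, 2))

# The shared L21 penalty zeroes entire feature rows across both tasks,
# eliminating features irrelevant to all tasks at once.
mtl = MultiTaskLasso(alpha=1.0).fit(X, Y)
kept = np.flatnonzero(np.linalg.norm(mtl.coef_, axis=0))
```

The row-wise sparsity pattern is the mechanism the abstract describes: features useless to every task are discarded jointly, while each task keeps its own coefficients on the surviving features.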

Semi-Supervised Mixture of Kernels via LPBoost Methods

Fifth IEEE International Conference on Data Mining (ICDM'05), 2005

We propose an algorithm to construct classification models with a mixture of kernels from labeled and unlabeled data. Unlike traditional kernel methods, which select a kernel according to cross-validation performance, we derive classifiers that are a mixture of models, each based on one kernel choice from a library of kernels. The sparsity-favoring 1-norm regularization method is employed to restrict the complexity of mixture models and to achieve sparse solutions. By modifying the column generation boosting algorithm LPBoost into a more general linear programming formulation, we are able to efficiently solve mixture-of-kernel problems and automatically select kernel basis functions centered at labeled as well as unlabeled data. The effectiveness of the proposed approach is demonstrated by experimental results on benchmark datasets and a real-world lung nodule detection system.
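A simpler stand-in conveys the flavour of the approach without the LPBoost column-generation machinery (which this sketch does not implement): build a library of RBF kernel basis functions of several widths, centred at both labeled and unlabeled points, then let 1-norm regularization pick a sparse mixture. The blob data, the widths, and the regularization strength are my assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(5)

# Labeled and unlabeled 2-D data drawn from two well-separated blobs.
Xl = np.vstack([rng.normal(-2, 1, (40, 2)), rng.normal(2, 1, (40, 2))])
yl = np.array([0] * 40 + [1] * 40)
Xu = np.vstack([rng.normal(-2, 1, (30, 2)), rng.normal(2, 1, (30, 2))])

# Kernel library: RBF bases of three widths, centred at labeled AND
# unlabeled points; 1-norm regularization picks a sparse mixture.
centres = np.vstack([Xl, Xu])
Phi = np.hstack([rbf_kernel(Xl, centres, gamma=g) for g in (0.1, 0.5, 2.0)])
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(Phi, yl)

n_bases = np.count_nonzero(clf.coef_)
acc = clf.score(Phi, yl)
```

Only a small subset of the 420 candidate basis functions survives, and some of the surviving centres can come from the unlabeled pool, which is the semi-supervised aspect the abstract highlights.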

Supplementary Tables A community effort to assess and improve drug sensitivity prediction algorithms nbt.2877-S2

A predefined number of features were selected using Pearson correlation; training and prediction were done using Support Vector Regression (SVR; radial basis kernel). Bidirectional search was used to select features; training and prediction were done using a support vector machine (SVM; radial basis kernel).
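The first pipeline, correlation-based filtering followed by an RBF-kernel SVR, can be sketched as follows; the toy cell-line data, the feature count `k`, and the true weights are my assumptions, not details from the supplement.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(6)

# Toy drug-sensitivity setup: 100 cell lines, 500 genomic features;
# the response depends on the first 5 features with growing weights.
n, d, k = 100, 500, 20
X = rng.normal(size=(n, d))
w_true = np.arange(1.0, 6.0)
y = X[:, :5] @ w_true + 0.1 * rng.normal(size=n)

# Rank features by |Pearson correlation| with the response, keep the
# top k, then fit an RBF-kernel SVR on the reduced feature matrix.
r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(d)])
top = np.argsort(-np.abs(r))[:k]
model = SVR(kernel="rbf").fit(X[:, top], y)
```

Filtering by univariate correlation before the kernel fit keeps the SVR's input dimension small, which matters when features vastly outnumber cell lines.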

Supplementary Software - Top performing team code - A community effort to assess and improve drug sensitivity prediction algorithms nbt.2877-S3

Supplementary figures A community effort to assess and improve drug sensitivity prediction algorithms nbt.2877-S1
