Reproducibility and Prognosis of Quantitative Features Extracted from CT Images

Yoganand Balagurunathan et al. Transl Oncol. 2014.

Abstract

We study the reproducibility of quantitative imaging features that describe tumor shape, size, and texture in computed tomography (CT) scans of non-small cell lung cancer (NSCLC). CT images depend on various scanning factors. We focus on characterizing image features that remain reproducible in the presence of variations due to patient factors and segmentation methods. Thirty-two nonenhanced lung CT scans of NSCLC were obtained from the Reference Image Database to Evaluate Response (RIDER) data set. The tumors were segmented using both manual (radiologist expert) and ensemble (software-automated) methods. A set of features (219 three-dimensional and 110 two-dimensional) was computed, and the quantitative image features were statistically filtered to identify a subset of reproducible and nonredundant features. The variability in the repeated experiment was measured by the test-retest concordance correlation coefficient (CCC_TreT). The natural range in the features, normalized to variance, was measured by the dynamic range (DR). In this study, 29 features across segmentation methods were found with CCC_TreT and DR ≥ 0.9 and R²_Bet ≥ 0.95. These reproducible features were tested for predicting the radiologist prognostic score; some texture features (run-length and Laws kernels) had an area under the curve of 0.9. The representative features were tested for their prognostic capabilities using an independent NSCLC data set (59 lung adenocarcinomas), where one of the texture features, run-length gray-level nonuniformity, was statistically significant in separating the samples into survival groups (P ≤ .046).
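The two reproducibility measures named in the abstract can be sketched in a few lines. The block below is a minimal illustration, not the authors' implementation: `concordance_correlation` follows Lin's standard definition of the concordance correlation coefficient, while `dynamic_range` uses one plausible formulation (one minus the mean absolute test-retest difference divided by the observed range), which is an assumption because the abstract does not give the formula. Both function names are hypothetical.

```python
from statistics import fmean


def concordance_correlation(test, retest):
    """Lin's concordance correlation coefficient for paired measurements."""
    n = len(test)
    if n != len(retest) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    mean_t, mean_r = fmean(test), fmean(retest)
    # Population (1/n) moments, as in Lin's definition.
    var_t = sum((x - mean_t) ** 2 for x in test) / n
    var_r = sum((y - mean_r) ** 2 for y in retest) / n
    cov = sum((x - mean_t) * (y - mean_r) for x, y in zip(test, retest)) / n
    return 2 * cov / (var_t + var_r + (mean_t - mean_r) ** 2)


def dynamic_range(test, retest):
    """ASSUMED formulation: 1 - mean|test - retest| / (max - min)."""
    lo = min(min(test), min(retest))
    hi = max(max(test), max(retest))
    if hi == lo:
        raise ValueError("feature has no spread across samples")
    mean_abs_diff = fmean([abs(x - y) for x, y in zip(test, retest)])
    return 1 - mean_abs_diff / (hi - lo)


def is_reproducible(test, retest, cutoff=0.9):
    """Apply the abstract's thresholds: CCC and DR both >= 0.9."""
    return (concordance_correlation(test, retest) >= cutoff
            and dynamic_range(test, retest) >= cutoff)
```

A feature whose retest values are shifted by a constant offset is penalized by the CCC even though its Pearson correlation is perfect, which is why the CCC is preferred for test-retest agreement.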

Figures

Figure 1

Process flow for finding representative image features using test-retest data set and testing for prognosis using independent data set.

Figure 2

Similarity index (SI) of manual to ensemble segmentation. The average SI is 79% and 78% for the test and retest data sets, respectively.
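As a rough illustration of how an overlap score between two segmentations can be computed, the sketch below uses a Dice-style overlap, 2|A∩B| / (|A| + |B|). That the SI in this figure takes exactly this form is an assumption, and the function name and the voxel-coordinate representation are hypothetical.

```python
def similarity_index(mask_a, mask_b):
    """Dice-style overlap between two segmentations.

    Each mask is an iterable of voxel coordinates, e.g. (z, y, x) tuples,
    listing the voxels labelled as tumor. ASSUMPTION: the SI is of Dice form.
    """
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        raise ValueError("both segmentations are empty")
    return 2 * len(a & b) / (len(a) + len(b))


# Toy example: two 3-voxel segmentations sharing 2 voxels.
manual = [(0, 0, 0), (0, 0, 1), (0, 1, 0)]
ensemble = [(0, 0, 0), (0, 0, 1), (1, 1, 1)]
overlap = similarity_index(manual, ensemble)
```

Identical segmentations score 1 and disjoint ones score 0, so an average near 0.79 indicates substantial but imperfect agreement.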

Figure 3

Bland-Altman plots for test and retest data are shown for conventional univariate, bivariate, and volume features in (A) manual and (B) ensemble segmentations.
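The quantities behind a Bland-Altman plot are simple to compute. The sketch below returns the per-sample means and differences (the plot's x and y coordinates), the bias, and the conventional 95% limits of agreement (bias ± 1.96 standard deviations). The function name is hypothetical, and whether the figure uses exactly these limits is not stated in the caption.

```python
from statistics import fmean, stdev


def bland_altman(test, retest):
    """Return (means, diffs, bias, (lower, upper)) for paired measurements."""
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    diffs = [x - y for x, y in zip(test, retest)]
    means = [(x + y) / 2 for x, y in zip(test, retest)]
    bias = fmean(diffs)          # systematic offset between test and retest
    sd = stdev(diffs)            # sample standard deviation of differences
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return means, diffs, bias, limits


# Toy example with three paired measurements.
means, diffs, bias, limits = bland_altman([10, 12, 14], [9, 13, 14])
```

Plotting `diffs` against `means` with horizontal lines at `bias` and at both limits reproduces the usual layout.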

Figure 4

Example of slice and 3D region for a sample segmented using manual method for test and retest of the patient (top and bottom rows). The subset of slices was arbitrarily selected by increasing slice numbers (matched for test and retest) to approximately cover the entire volume.

Figure 5

Distribution of DR and CCC computed on test/retest data in (A) manual and (B) ensemble segmentations.

Figure 6

Hierarchical clustering of repeatable image features (CCC and DR > 0.9) in test/retest data and across segmentations. The representative features are obtained by removing features with high dependency (R² ≥ 0.95); those that pass the cutoff are outlined (see Table 4C). The feature value was averaged over different segmentations (manual and ensemble) and repeats (test and retest). The features were standardized to the range 0 to 1. The clustering was arbitrarily stopped at seven and four groups on the feature and sample axes, respectively. The F No. indicates the feature's position in the overall set of 219 3D features.
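The redundancy-removal step described in this caption can be approximated with a greedy filter: walk through the features in priority order and keep one only if its squared Pearson correlation with every feature already kept is below 0.95. This is a sketch under assumptions; the caption does not say how the representative of each correlated group is chosen, so the ordering rule and the function names here are hypothetical.

```python
from statistics import fmean


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant feature carries no linear dependency
    return sxy * sxy / (sxx * syy)


def select_representatives(features, threshold=0.95):
    """Greedy filter over a dict {name: values}, in insertion order.

    ASSUMPTION: earlier entries have priority (e.g. sorted by DR beforehand).
    """
    kept = []
    for name, values in features.items():
        if all(r_squared(values, features[k]) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy example: "diameter" is nearly a linear function of "volume".
toy = {
    "volume": [1, 2, 3, 4],
    "diameter": [2, 4, 6, 8.1],
    "texture": [4, 1, 3, 2],
}
representatives = select_representatives(toy)
```

In the toy data, "diameter" is dropped because its R² with "volume" exceeds the threshold, while "texture" (R² = 0.16 with "volume") is retained.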

Figure 7

Discrimination of prognostic score with feature value for size- and texture-based features with optimal threshold. (A) Volume feature, (B) run-length GLN, and (C) Laws feature.

Figure 8

The 2D slice and 3D rendering of three samples selected in the test/retest RIDER data. The samples in A had a better radiologist prognostic score (smaller average run-length GLN), whereas the samples in B had a poor radiologist prognostic score (larger average run-length GLN).

Figure 9

Prognostic test result using the run-length gray-level nonuniformity (GLN) image feature, split at the median value. (A) Kaplan-Meier plot for independent adenocarcinoma cases. (B) Examples of three extreme tumor samples with low and high values of average run-length GLN, shown in 2D and 3D plots.
