An R package for an integrated evaluation of statistical approaches to cancer incidence projection

Maximilian Knoll et al. BMC Med Res Methodol. 2020.

Abstract

Background: Projection of future cancer incidence is an important task in cancer epidemiology. The results are also of interest for biomedical research and public health policy. Age-Period-Cohort (APC) models, usually based on long-term cancer registry data (> 20 years), are established for such projections. In many countries (including Germany), however, nationwide long-term data are not yet available. Giving general guidance on statistical approaches for projections from rather short-term data is challenging, and software that lets researchers easily compare approaches is lacking.

Methods: To enable a comparative analysis of the performance of statistical approaches to cancer incidence projection, we developed an R package (incAnalysis), supporting in particular Bayesian models fitted by Integrated Nested Laplace Approximations (INLA). Its use is demonstrated by an extensive empirical evaluation of operating characteristics (bias, coverage and precision) of potentially applicable models differing in complexity. Observed long-term data from three cancer registries (SEER-9, NORDCAN, Saarland) were used for benchmarking.

Results: Overall, coverage was high (mostly > 90%) for Bayesian APC models (BAPC), whereas less complex models showed differences in coverage depending on the projection period. Intercept-only models yielded coverage below 20%. Bias increased and precision decreased for longer projection periods (> 15 years) for all except intercept-only models. Precision was lowest for complex models such as BAPC models, generalized additive models with multivariate smoothers and generalized linear models with age × period interaction effects.

Conclusion: The incAnalysis R package allows a straightforward comparison of cancer incidence rate projection approaches. Further detailed and targeted investigations into model performance in addition to the presented empirical results are recommended to derive guidance on appropriate statistical projection methods in a given setting.

Keywords: Bayesian model; Cancer epidemiology; Age-period-cohort model; Cancer incidence projection; INLA.
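The operating characteristics named in the abstract can be illustrated with a minimal sketch. It is written in Python for illustration and is not the incAnalysis API. The function names, the toy numbers and the exact bias formula are assumptions: bias is taken here as the relative difference (observed − projected) / observed in percent, chosen so that negative values indicate overestimation as in the Fig. 4 caption, with values below −200 set to −200. Precision is omitted because the transformation applied to the posterior standard deviations is not specified in this summary.

```python
from typing import List, Sequence


def coverage(observed: Sequence[float],
             lower: Sequence[float],
             upper: Sequence[float]) -> float:
    """Percentage of observed rates lying inside their projection interval."""
    hits = sum(lo <= obs <= hi for obs, lo, hi in zip(observed, lower, upper))
    return 100.0 * hits / len(observed)


def relative_bias(observed: Sequence[float],
                  projected: Sequence[float],
                  floor: float = -200.0) -> List[float]:
    """Relative bias in percent for each stratum (assumed definition).

    Negative values mean the projection overestimates the observed rate.
    Values below `floor` are set to `floor`, mirroring the -200 cut-off
    mentioned in the Fig. 4 caption.
    """
    return [max(floor, 100.0 * (obs - proj) / obs)
            for obs, proj in zip(observed, projected)]


# Hypothetical incidence rates for three age groups (toy numbers only).
observed = [50.0, 80.0, 120.0]
projected = [55.0, 78.0, 400.0]
lower = [45.0, 70.0, 300.0]   # lower bounds of the 95% projection intervals
upper = [65.0, 90.0, 500.0]   # upper bounds of the 95% projection intervals

print(coverage(observed, lower, upper))    # two of three intervals cover the observed rate
print(relative_bias(observed, projected))  # the third value is truncated at -200
```

In this toy example the third projection overestimates the observed rate so strongly that its bias falls below the cut-off and is truncated, which is how extreme values would appear in a plot limited to −200.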

Conflict of interest statement

CS is now a full-time employee of Boehringer Ingelheim Pharma GmbH & Co. KG, Ingelheim, Germany. The company had no role in the design, analysis or interpretation of the presented work.

Figures

Fig. 1

Overview of the analyzed cancer registry data, study design, model selection and evaluation metrics

Fig. 2

The R package “incAnalysis”

Fig. 3

Coverage of future projections after 2, 5, 10, 15 and 20 years, based on models with a 15-year observation period. Dashed line: 95% coverage. Int: intercept-only model; lin + interact: linear age, period and interaction effects; age,bs: univariate smoother (B-spline) for age; splineTensor: tensor product smoother (age, period), M-spline basis. GLMs and GAMs: negative binomial distribution.

Fig. 4

Bias of future projections after 2, 5, 10, 15 and 20 years, based on models with a 15-year observation period. Negative values indicate overestimation of cancer incidence. Bias values smaller than −200 were set to −200. Dashed line: no bias (0%). int: intercept-only model; lin + interact: linear age, period and interaction effects; age,bs: univariate smoother (B-spline) for age; splineTensor: tensor product smoother (age, period), M-spline basis. GLMs and GAMs: negative binomial distribution.

Fig. 5

Precision of future projections after 2, 5, 10, 15 and 20 years, based on models with a 15-year observation period. Transformed averaged posterior standard deviations are shown. Int: intercept-only model; lin + interact: linear age, period and interaction effects; age,bs: univariate smoother (B-spline) for age; splineTensor: tensor product smoother (age, period), M-spline basis. GLMs and GAMs: negative binomial distribution.
