Deep learning is combined with massive-scale citizen science to improve large-scale image classification
- Analysis
- Published: 01 October 2018
- Casper F Winsnes (ORCID: orcid.org/0000-0002-0028-5865),
- Lovisa Åkesson,
- Martin Hjelmare,
- Mikaela Wiking,
- Rutger Schutten,
- Linzi Campbell,
- Hjalti Leifsson,
- Scott Rhodes,
- Andie Nordgren,
- Kevin Smith,
- Bernard Revaz,
- Bergur Finnbogason,
- Attila Szantner &
- …
- Emma Lundberg
Nature Biotechnology volume 36, pages 820–828 (2018)
Abstract
Pattern recognition and classification of images are key challenges throughout the life sciences. We combined two approaches for large-scale classification of fluorescence microscopy images. First, using the publicly available data set from the Cell Atlas of the Human Protein Atlas (HPA), we integrated an image-classification task into a mainstream video game (EVE Online) as a mini-game, named Project Discovery. Participation by 322,006 gamers over 1 year provided nearly 33 million classifications of subcellular localization patterns, including patterns that were not previously annotated by the HPA. Second, we used deep learning to build an automated Localization Cellular Annotation Tool (Loc-CAT). This tool classifies proteins into 29 subcellular localization patterns and can deal efficiently with multi-localization proteins, performing robustly across different cell types. Combining the annotations of gamers and deep learning, we applied transfer learning to create a boosted learner that can characterize subcellular protein distribution with an F1 score of 0.72. We found that engaging players of commercial computer games provided data that augmented deep learning and enabled scalable and readily improved image classification.
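The F1 score cited in the abstract is the harmonic mean of precision and recall; for multi-localization (multi-label) annotations it can be computed per image and averaged. A minimal sketch of that computation follows — the class names and annotations are illustrative, not data from the paper:

```python
def f1_multilabel(true_sets, pred_sets):
    """Average per-sample F1 over multi-label annotations.

    Each element of true_sets/pred_sets is a set of class labels
    (e.g. subcellular localizations) for one image.
    """
    scores = []
    for truth, pred in zip(true_sets, pred_sets):
        if not truth and not pred:
            scores.append(1.0)  # trivially correct empty prediction
            continue
        overlap = len(truth & pred)
        precision = overlap / len(pred) if pred else 0.0
        recall = overlap / len(truth) if truth else 0.0
        if precision + recall == 0:
            scores.append(0.0)
        else:
            scores.append(2 * precision * recall / (precision + recall))
    return sum(scores) / len(scores)

# Illustrative multi-localization annotations (not real HPA data)
truth = [{"nucleus"}, {"cytoplasm", "mitochondria"}]
pred = [{"nucleus"}, {"cytoplasm"}]
print(round(f1_multilabel(truth, pred), 2))  # → 0.83
```

This per-sample averaging rewards partially correct multi-label predictions, which matters for multi-localizing proteins where an all-or-nothing accuracy metric would be overly harsh.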
Acknowledgements
We acknowledge the staff of the Human Protein Atlas program for valuable contributions. We acknowledge the EVE Development team, the University of Reykjavik and the University of Iceland for assistance with the game implementation. We acknowledge MMOS Sàrl for serving images and managing response collection, and CCP hf and MMOS Sàrl for financially supporting the image storage and serving throughout Project Discovery. Funding to E.L. was provided by the Knut and Alice Wallenberg Foundation.
Author information
Author notes
- Devin P Sullivan and Casper F Winsnes: These authors contributed equally to this work.
Authors and Affiliations
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH - Royal Institute of Technology, Stockholm, Sweden: Devin P Sullivan, Casper F Winsnes, Lovisa Åkesson, Martin Hjelmare, Mikaela Wiking, Rutger Schutten & Emma Lundberg
- CCP hf, Reykjavik, Iceland: Linzi Campbell, Hjalti Leifsson, Scott Rhodes, Andie Nordgren & Bergur Finnbogason
- Science for Life Laboratory, School of Computer Science and Communication, KTH - Royal Institute of Technology, Stockholm, Sweden: Kevin Smith
- MMOS Sàrl, Monthey, Switzerland: Bernard Revaz & Attila Szantner
- Department of Genetics, Stanford University, Stanford, California, USA: Emma Lundberg
- Chan Zuckerberg Biohub, San Francisco, California, USA: Emma Lundberg
Contributions
A.S., B.R., B.F., A.N. and E.L. conceived the study. M.H., A.S., B.F., E.L., D.P.S. and C.F.W. developed the methodology for the study. A.S. and B.R. developed the citizen science engine. L.C., H.L., S.R. and B.F. developed the game narrative and implementation. Project Discovery was played by thousands of players of EVE Online. D.P.S., L.Å., M.W., R.S. and E.L. provided game support. C.F.W., K.S. and D.P.S. developed the machine learning. D.P.S., C.F.W. and E.L. carried out data analysis and investigation. D.P.S., C.F.W. and E.L. wrote the manuscript. D.P.S. and C.F.W. created the figures. E.L. supervised and administered the project and acquired funding.
Corresponding author
Correspondence to Emma Lundberg.
Ethics declarations
Competing interests
A.S. and B.R. are founders of MMOS Sàrl.
Integrated supplementary information
Supplementary Figure 1 Thirty-day retention for each month of Project Discovery.
Rows represent the month players joined Project Discovery, and columns represent the number of months the corresponding user group has been playing for.
Supplementary Figure 2 Individual player performance in Project Discovery
(a) Individual player accuracies (dots) for players with a minimum of 10 image evaluations show that player accuracy generally increases as players evaluate more samples (contour). Although ~10% of players perform worse than naively guessing the most common class (Cytoplasm, blue dots), the consensus accuracy (black line) remains markedly higher than the player average. Although many poor players drop off after roughly 100 samples, individual player performance improves little with the number of samples analyzed. (b) Player performance versus time spent per task (seconds) shows no discernible trend; this measure is confounded by time players spent on other in-game actions with the interface open.
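The gap between consensus accuracy and average individual accuracy described above is the familiar wisdom-of-crowds effect: aggregating many noisy annotators outperforms a typical single annotator. A toy majority-vote simulation illustrates this — the voter model and parameters are made up, not Project Discovery data:

```python
import random

def majority_vote(votes):
    """Return the most common label among votes (ties broken arbitrarily)."""
    return max(set(votes), key=votes.count)

random.seed(0)
true_label = "nucleus"
labels = ["nucleus", "cytoplasm", "mitochondria"]

def noisy_voter(accuracy):
    """Simulated annotator: correct with probability `accuracy`,
    otherwise a uniformly random wrong label."""
    if random.random() < accuracy:
        return true_label
    return random.choice([l for l in labels if l != true_label])

# 1,000 tasks, 15 voters each, 60% individual accuracy
n_correct = sum(
    majority_vote([noisy_voter(0.6) for _ in range(15)]) == true_label
    for _ in range(1000)
)
print(n_correct / 1000)  # consensus accuracy well above the 0.6 individual rate
```

Even with each simulated voter correct only 60% of the time, the plurality label is correct on nearly every task, mirroring how the black consensus line in the figure sits above the cloud of individual player dots.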
Supplementary Figure 3 Project Discovery performance relative to HPA v14.
(a) Gamer co-annotations significantly over-represented relative to solution classes from the HPA Cell Atlas v14 (p < 1e-2, one-tailed binomial test, Bonferroni corrected by row; sample size indicated in parentheses on each row/column), with gamer-predicted labels in columns (blue) and expected co-localization frequencies from HPA v14 in rows (red). Columns with many significant over-co-annotations represent classes generally over-annotated by the gamers (Nucleus, Cytoplasm, Aggresome, Microtubule ends). (b) Proportion of co-annotation in Project Discovery between gamer labels (columns, blue) and HPA Cell Atlas v14 labels (rows, red). Note in particular that novel classes (e.g. nucleoli rim) are co-annotated with their logical parent class (nucleoli), indicating successful refinement of labels.
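The statistical test named in the caption — a one-tailed binomial test with Bonferroni correction — can be sketched with only the standard library; the counts and expected rate below are invented for illustration, not the paper's values:

```python
from math import comb

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the one-tailed p-value
    for over-representation of an observed co-annotation count."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Illustrative counts: a class pair co-annotated 40 times out of 200
# gamer labels, against an expected HPA co-localization rate of 0.10.
p_value = binom_sf(40, 200, 0.10)

# Bonferroni correction across, say, 29 tests in the row
corrected = min(1.0, p_value * 29)
print(corrected < 1e-2)  # significant at the caption's p < 1e-2 threshold
```

The Bonferroni step simply multiplies each raw p-value by the number of tests performed per row, controlling the family-wise error rate at the cost of some statistical power.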
Supplementary Figure 4 Schematic outline of how the different methods presented in this paper generate their annotations
(a) Project Discovery (PD) lets citizen scientists use a game interface to annotate images from the Human Protein Atlas (HPA) into one or more of 29 classes. (b) The Localization Cellular Annotation Tool (Loc-CAT) is a neural network model that uses image-derived features to annotate HPA images into one or more of 23 classes. (c) Gamer-Augmented Loc-CAT (GA Loc-CAT) uses image-derived features together with player votes from PD to classify HPA images into one or more of 23 classes; the gamer votes are represented as a p-value vector that is concatenated to the image features and fed to the Loc-CAT architecture. (d) Loc-CAT+ uses a separate neural network trained to estimate what PD players would have voted for ("pseudo gamer"), together with the image features, to classify HPA images into one or more of 23 classes; the output of the "pseudo gamer" is concatenated to the feature vector and used as input to the Loc-CAT architecture.
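The augmentation described in (c) and (d) amounts to concatenating a per-class vote vector onto each image's feature vector before classification. A minimal sketch with made-up dimensions (the feature size here is illustrative; only the 29-class vote width comes from the text):

```python
N_FEATURES = 128   # image-derived feature size (illustrative, not the paper's)
N_CLASSES = 29     # localization classes voted on in Project Discovery

def augment(image_features, vote_pvalues):
    """Concatenate per-class gamer vote scores onto each image's
    feature vector, forming a GA Loc-CAT-style classifier input."""
    assert all(len(f) == N_FEATURES for f in image_features)
    assert all(len(v) == N_CLASSES for v in vote_pvalues)
    return [list(f) + list(v) for f, v in zip(image_features, vote_pvalues)]

# Four dummy images with uniform placeholder values
features = [[0.0] * N_FEATURES for _ in range(4)]
votes = [[0.5] * N_CLASSES for _ in range(4)]
augmented = augment(features, votes)
print(len(augmented), len(augmented[0]))  # → 4 157
```

The design choice is that the downstream network sees the crowd signal as just more input dimensions, so the same architecture can be trained with real votes (GA Loc-CAT) or with a learned "pseudo gamer" estimate (Loc-CAT+).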
Supplementary Figure 5 Overrepresented co-annotations in Loc-CAT+
Loc-CAT+ co-annotations significantly over-represented relative to solution classes from the HPA Cell Atlas v14 (p < 1e-2, one-tailed binomial test, Bonferroni corrected by row; sample size indicated in parentheses on each row/column), with Loc-CAT+-predicted labels in columns (blue) and expected co-localization frequencies from HPA v14 in rows (red). Columns with many significant over-co-annotations (n > 5) represent classes generally over-annotated by Loc-CAT+.
About this article
Cite this article
Sullivan, D., Winsnes, C., Åkesson, L. et al. Deep learning is combined with massive-scale citizen science to improve large-scale image classification. Nat Biotechnol 36, 820–828 (2018). https://doi.org/10.1038/nbt.4225
- Received: 24 December 2017
- Accepted: 19 July 2018
- Published: 01 October 2018
- Issue Date: October 2018
- DOI: https://doi.org/10.1038/nbt.4225