Learning high-level visual representations from a child’s perspective without strong inductive biases
References
Bomba, P. & Siqueland, E. The nature and structure of infant form categories. J. Exp. Child Psychol. 35, 294–328 (1983).
Murphy, G. The Big Book of Concepts (MIT, 2002).
Kellman, P. & Spelke, E. Perception of partly occluded objects in infancy. Cogn. Psychol. 15, 483–524 (1983).
Spelke, E., Breinlinger, K., Macomber, J. & Jacobson, K. Origins of knowledge. Psychol. Rev. 99, 605–632 (1992).
Ayzenberg, V. & Lourenco, S. Young children outperform feed-forward and recurrent neural networks on challenging object recognition tasks. J. Vis. 20, 310 (2020).
Huber, L. S., Geirhos, R. & Wichmann, F. A. The developmental trajectory of object recognition robustness: children are like small adults but unlike big deep neural networks. J. Vis.23, 4 (2023).
Locke, J. An Essay Concerning Human Understanding (ed. Fraser, A. C.) (Clarendon Press, 1894).
Leibniz, G. New Essays on Human Understanding 2nd edn (eds Remnant, P. & Bennett, J.) (Cambridge Univ. Press, 1996).
Spelke, E. Initial knowledge: six suggestions. Cognition 50, 431–445 (1994).
Markman, E. Categorization and Naming in Children (MIT, 1989).
Merriman, W., Bowman, L. & MacWhinney, B. The mutual exclusivity bias in children’s word learning. Monogr. Soc. Res. Child Dev. 54, 1–132 (1989).
Elman, J., Bates, E. & Johnson, M. Rethinking Innateness: A Connectionist Perspective on Development (MIT, 1996).
Sullivan, J., Mei, M., Perfors, A., Wojcik, E. & Frank, M. SAYCam: a large, longitudinal audiovisual dataset recorded from the infant’s perspective. Open Mind 5, 20–29 (2022).
Caron, M. et al. Emerging properties in self-supervised vision transformers. In Proc. IEEE/CVF International Conference on Computer Vision 9650–9660 (IEEE, 2021).
He, K. et al. Masked autoencoders are scalable vision learners. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition 15979–15988 (IEEE, 2022).
Dosovitskiy, A. et al. An image is worth 16x16 words: transformers for image recognition at scale. In International Conference on Learning Representations (2020).
Xie, S., Girshick, R., Dollár, P., Tu, Z. & He, K. Aggregated residual transformations for deep neural networks. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 1492–1500 (IEEE, 2017).
Russakovsky, O. et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115, 211–252 (2015).
Smaira, L. et al. A short note on the Kinetics-700-2020 human action dataset. Preprint at https://arxiv.org/abs/2010.10864 (2020).
Grauman, K. et al. Ego4D: around the world in 3,000 hours of egocentric video. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition. 18995–19012 (IEEE, 2022).
Esser, P., Rombach, R. & Ommer, B. Taming transformers for high-resolution image synthesis. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 12873–12883 (IEEE, 2021).
Zhou, B., Khosla, A., Lapedriza, A., Oliva, A. & Torralba, A. Learning deep features for discriminative localization. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 2921–2929 (IEEE, 2016).
van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
Kuznetsova, A. et al. The Open Images Dataset V4. Int. J. Comput. Vis. 128, 1956–1981 (2020).
Smith, L. & Slone, L. A developmental approach to machine learning? Front. Psychol. 8, 2124 (2017).
Bambach, S., Crandall, D., Smith, L. & Yu, C. Toddler-inspired visual object learning. Adv. Neural Inf. Process. Syst. 31, 1209–1218 (2018).
Zaadnoordijk, L., Besold, T. & Cusack, R. Lessons from infant learning for unsupervised machine learning. Nat. Mach. Intell. 4, 510–520 (2022).
Orhan, E., Gupta, V. & Lake, B. Self-supervised learning through the eyes of a child. Adv. Neural Inf. Process. Syst. 33, 9960–9971 (2020).
Lee, D., Gujarathi, P. & Wood, J. Controlled-rearing studies of newborn chicks and deep neural networks. Preprint at https://arxiv.org/abs/2112.06106 (2021).
Zhuang, C. et al. Unsupervised neural network models of the ventral visual stream. Proc. Natl Acad. Sci. USA 118, e2014196118 (2021).
Zhuang, C. et al. How well do unsupervised learning algorithms model human real-time and life-long learning? In Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (2022).
Vong, W. K., Wang, W., Orhan, A. E. & Lake, B. M. Grounded language acquisition through the eyes and ears of a single child. Science 383, 504–511 (2024).
Locatello, F. et al. Object-centric learning with slot attention. Adv. Neural Inf. Process. Syst. 33, 11525–11538 (2020).
Lillicrap, T., Santoro, A., Marris, L., Akerman, C. & Hinton, G. Backpropagation and the brain. Nat. Rev. Neurosci. 21, 335–346 (2020).
Gureckis, T. & Markant, D. Self-directed learning: a cognitive and computational perspective. Perspect. Psychol. Sci. 7, 464–481 (2012).
Long, B. et al. The BabyView camera: designing a new head-mounted camera to capture children’s early social and visual environments. Behav. Res. Methods https://doi.org/10.3758/s13428-023-02206-1 (2023).
Moore, D., Oakes, L., Romero, V. & McCrink, K. Leveraging developmental psychology to evaluate artificial intelligence. In 2022 IEEE International Conference on Development and Learning (ICDL) 36–41 (IEEE, 2022).
Frank, M. C. Bridging the data gap between children and large language models. Trends Cogn. Sci. 27, 990–992 (2023).
Lomonaco, V. & Maltoni, D. CORe50: a new dataset and benchmark for continuous object recognition. In Proc. 1st Annual Conference on Robot Learning (eds Levine, S. et al.) 17–26 (PMLR, 2017).
Mehrer, J., Spoerer, C., Jones, E., Kriegeskorte, N. & Kietzmann, T. An ecologically motivated image dataset for deep learning yields better models of human vision. Proc. Natl Acad. Sci. USA 118, e2011417118 (2021).
Zhou, B., Lapedriza, A., Khosla, A., Oliva, A. & Torralba, A. Places: a 10 million image database for scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 40, 1452–1464 (2017).
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B. & Hochreiter, S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Adv. Neural Inf. Process. Syst. 30, 6629–6640 (2017).