Towards a Common Coordinate Framework for the Human Body

Cell. Author manuscript; available in PMC 2020 Dec 12.

Published in final edited form as:

PMCID: PMC6934046

NIHMSID: NIHMS1545970

Jennifer E. Rood,1 Tim Stuart,*,2 Shila Ghazanfar,*,3 Tommaso Biancalani,*,1 Eyal Fisher,3 Andrew Butler,2,4 Anna Hupalowska,1 Leslie Gaffney,1 William Mauck,2,4 Gökçen Eraslan,1 John C. Marioni,†,3,5,6 Aviv Regev,†,1,7 and Rahul Satija†,2,4

1Klarman Cell Observatory, Broad Institute, Cambridge, MA, USA 02142

2New York Genome Center. New York, NY 10013

3Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, United Kingdom

4New York University, Center for Genomics and Systems Biology. New York, NY 10012.

5European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom

6Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, United Kingdom

7Howard Hughes Medical Institute and Koch Institute for Integrative Cancer Research, Department of Biology, MIT, Cambridge, MA, USA 02142

*These authors contributed equally to this work

Abstract

Understanding the genetic and molecular features of phenotypic heterogeneity across individuals is central to biology. As new technologies enable fine-grained spatially resolved molecular profiling, we need new computational approaches to integrate data from the same organ across different individuals into a consistent reference, and to construct maps of molecular and cellular organization at histological and anatomical scales. Here, we review previous efforts and discuss challenges involved in establishing such a Common Coordinate Framework, the underlying map of tissues and organs. We focus on strategies to handle anatomical variation across individuals and highlight the need for new technologies and analytical methods spanning multiple hierarchical scales of spatial resolution.

eTOC blurb – Regev, Satija & Marioni

Satija, Regev, Marioni & colleagues recommend approaches to create a reference map of the human body down to the single-cell level – a task made challenging by the diverse human form.

Introduction

Recent technological advances in single-cell and spatial genomics have opened the door to profiling tissues in greater depth – both with molecular profiling at the level of single cells, and in more spatial detail. The convergence of these technologies now presents a remarkable opportunity, taken up by a worldwide initiative (Regev et al., 2017): to build an atlas of cells, tissues, and organs throughout the human body that captures their molecular characteristics and their spatial location and organization.

In order to create atlases that integrate high-resolution molecular and spatial information across individuals, however, it will be necessary to have a Common Coordinate Framework (CCF) to reference and address diverse types of biological data (Box 1). This coordinate system would serve as a reference map that can assign a reproducible address to every location in the human body. While an individual coordinate system would be relatively straightforward to construct for one person, a human CCF needs to represent data collected across individuals of different ages, genders, genetic backgrounds, and body sizes. In this way, a CCF will enable the robust comparison of data collected across individuals, while accounting for variability driven by spatial differences. Such a mapping will open the way to high resolution studies across large numbers of individuals, for example in the context of genetic association studies, or for identifying aberrations in disease relative to a healthy reference.

Box 1.

Glossary

Common coordinate framework: An underlying reference map of organs, tissues, or cells onto which new individual samples can be mapped, to determine the relative location of structural regions between samples.
Landmark: A defined anatomical, histological, or molecular feature that is readily identifiable and consistently located across individuals.
Macro-CCF: A CCF describing tissues at the gross anatomical level, at the organ, multi-organ, or whole-body scale.
Meso-scale CCF: A CCF at the scale between gross anatomy and detailed histology.
Micro-scale CCF: A CCF focused on local tissue organization, rather than the shape and structure of an entire organ.
Anatomical plane coordinates: Three-dimensional coordinate systems, defined around an anatomically determined origin and axes, that represent the physical space of a sample.
Landmark-based axial coordinates: The use of consistently located anatomical landmarks, rather than anatomical planes, to define a coordinate system based on the relative distance to landmarks.
Image registration: The transformation of spatially resolved data onto a common coordinate system in a way that maximally preserves the spatial structure within each individual dataset.
Global registration: The application of linear transformations to spatial coordinates to minimize overall positional differences between datasets.
Local transformation: The application of local non-linear transformations to spatial datasets to minimize local differences in spatial structure, without affecting the relative position of more distant regions of a dataset.
Intensity-based image registration: The registration of spatial datasets through the minimization of differences in intensity values (e.g., image intensity or gene expression level) at the same coordinate between the two datasets.
Landmark-based image registration: The registration of spatial datasets through the minimization of the distance between equivalent points (landmarks) between datasets.
Deformation fields: The deformations in shape required for transformation from an initial configuration to a deformed configuration.
Joint non-negative matrix factorization: A matrix factorization method for identifying shared and non-shared factors between a pair of datasets.

Constructing a common reference framework poses a significant methodological challenge, particularly given the inherent anatomical diversity across individuals. Additionally, while certain elements of human anatomy can be uniquely defined, not all body locations are stereotypical, nor can they be defined reproducibly and consistently across all humans. Because the extent of anatomical reproducibility varies across scales, identifying mappable features (‘landmarks’) at multiple levels of resolution represents a key challenge for building a CCF (Box 1). While significant, these challenges also highlight opportunities for a human CCF, in terms of spatially registering individual datasets, anchoring precise comparisons across individuals, and enabling spatial meta-analyses of molecular data. Finally, the required resolution of the CCF itself will depend upon the biological questions of interest, with some studies requiring very fine-scale mapping, while coarser level information will suffice for others.

Here, we consider previous strategies to construct anatomical atlases at the level of individual tissues and organs in both humans and in model systems, as well as approaches for mapping in other fields, with potential relevance in biology. We discuss coordinate systems that can be used to define a human CCF, and review data collection procedures and computational methods motivated by construction and alignment of spatial datasets. Finally, we highlight outstanding challenges for working with human tissues, and the need for data types and analytical tools that can bridge multiple scales of hierarchical resolution.

Organizational differences across tissues and scales pose distinct challenges for CCF construction

The strategy for constructing an anatomical template and downstream coordinate system depends on the nature of the system. To illustrate how the sample type influences the choice of strategy, consider two extremes of tissue organization in the human body. At one extreme, the anatomical organization of the tissue is highly similar across all samples, as in the very first stages of embryogenesis. At the other extreme, cells are organized in a seemingly random and perhaps also dynamic fashion, as in a tumor. Most human tissues fall on a spectrum between these two extremes (Figure 1).

Figure 1.

Types of coordinate systems

Different tissue or organ structures may require different types of coordinate systems in order to adapt to differing levels of inter-individual specimen variability. Highly replicable structured tissues, such as embryos, may be able to employ anatomical plane coordinate systems in a CCF, whereas increasing specimen variability will require higher degrees of flexibility in the coordinate system used. In tissues with greater inter-specimen variability, landmark-based coordinate systems or nonlinear approaches, rather than anatomical plane coordinates, are more robust to the presence of non-conserved spatial structures in different individuals.

Each extreme would be best served by different methods to map specimens to a CCF (Figure 1). In the highly organized extreme, the optimal strategy is to know the physical location of a sample, cell, etc. a priori by registering it to a pre-existing coordinate system. The alternative, better suited to samples lacking a clear organizational structure, is to learn the sample’s location from cellular, histological, and/or anatomical features collected directly as data; that is, the map is a learned function. Because samples will fall on a spectrum between completely reproducible and completely random organization, a successful CCF should integrate these two approaches, combining direct registration with data-driven learning of location. In this way, we can relate new data to prior knowledge and biological concepts, while also discovering novel features in the human CCF.

Similar contrasting options have appeared in prior efforts to build reference maps of biological knowledge in other contexts. For example, when assembling a transcriptome, one possibility is an ab initio approach (Trapnell, Pachter and Salzberg, 2009; Yassour et al., 2009; Guttman et al., 2010), where sequencing reads from RNA transcripts are aligned to a reference template (here, the genome), so that each transcript’s location is known. In contrast, de novo approaches (Grabherr et al., 2011; Xie et al., 2014) assemble a transcriptome without the aid of any reference. A further example comes from the two approaches applied to sequencing the human genome for the Human Genome Project. The first, physical mapping, used restriction enzymes or optical mapping techniques to align large chunks of genome sequence based on small areas of overlap (that is, it is already known where each piece belongs) (Jing et al., 1998). By contrast, shotgun sequencing sheared the DNA into small, random pieces; by sequencing many of these pieces, it was possible to computationally reconstruct the genome, in a process analogous to learning a cell’s or sample’s location without first registering it.

The spectrum of sample reproducibility exists not only across distinct specimens, but even within the same organ when considering multiple scales. In the gastrointestinal tract, for example, the major anatomical structures (stomach, esophagus, small and large intestine, etc.) are invariantly present across individuals and easily identified, though their exact positions or sizes may vary. Focusing within the small intestine, the duodenum, jejunum, and ileum are conserved substructures, but the boundaries separating them are imprecise (San Roman and Shivdasani, 2011), and may be challenging to consistently annotate. Zooming in further, to the histological scale, reveals the presence of repeating intestinal crypts with similar structures but differences in quantity and exact location across individuals. Therefore, even within a single specimen, a combination of direct measurement and data-driven approaches may be needed to position data at different anatomical scales.

This challenge highlights a broader question for efforts to chart atlases: what is the desired anatomical scale for the human CCF? If constructed on a macro-scale, a CCF could be used to locate major organs and body systems, and to study their relative positions across individuals. A macro-CCF, potentially generated from whole-body imaging datasets, would aid in understanding how age, gender, and height affect gross human anatomy. A meso-scale CCF would likely assume a fixed position for these macroscopic structures and then explore variation at a finer scale, for example, differences in the structure and location of branched airways in the lung, or of the renal cortex and renal medulla in the kidney. Extending further, a micro-scale CCF would reference a histological and cellular view of individual samples, and would represent an opportunity to explore fine-scale organization at the cell-neighborhood, cellular, or even sub-cellular level, and to observe how interactions between cells influence their phenotype (Box 1).

The choice of scale therefore has significant impact on both the required underlying data and downstream analysis methods. As we discuss below, prior atlas efforts have typically taken place at meso-scale resolution, with reference imaging data generated at the organ level. A key challenge for future studies is to extend atlas construction to the micro-scale, but with the ability to also relate these coordinates to meso- and macro-scale atlases. This challenge, particularly in the absence of technologies for whole-body imaging at histological resolution, suggests that a single human CCF may not be an optimal aim for current studies. Instead, we argue for the construction of a hierarchical framework, which, to our knowledge, has not been previously implemented, as an achievable and important goal for current studies to build a human atlas.

Reference coordinate systems for a CCF

We first describe previous coordinate systems that can be used for physical registration of specific locations within organs or the human body, and then briefly discuss key areas where methodological developments will facilitate creation of a reference coordinate system for a human body CCF.

Anatomical plane coordinates

Anatomical plane coordinates (Box 1) represent a framework for navigating anatomical atlases based on physical space, where XYZ coordinates represent the right, anterior, and superior (RAS; equivalently latero-lateral, rostro-caudal, and dorso-ventral) axes (Lancaster et al., 1997; Mazziotta et al., 2001; Li et al., 2003). This system typically consists of conventional Cartesian coordinates that have been conveniently oriented with respect to the sagittal, coronal, and axial anatomical planes: X represents the left–right dimension of the specimen, Y the anterior–posterior dimension, and Z the inferior–superior dimension. In the corresponding data matrix, a specific voxel can be indexed as [X, Y, Z], where these three coordinates specify its position along each dimension (Figure 1). These coordinates have the advantage that they are the native space of physical tissues as they are directly imaged (e.g., by scanners), and can thus be applied to any 2D or 3D imaging dataset. As no scaling or nonlinear warping is used in the registration of samples to the anatomical planes, these coordinates can be interpreted as physical distances. However, for samples with significant variation in overall size or shape, simple anatomical plane coordinates will not be sufficient for direct comparison across specimens. Instead, samples must be normalized (or registered) to defined templates to account for inter-specimen variation in absolute distance from the coordinate axes. Similar challenges apply to spherical and cylindrical coordinate systems, which may be better tailored to describe curved anatomical structures, such as the interface between the femur and its surrounding cartilage (Kauffmann et al., 2003), but which also require spatial normalization for inter-sample comparison.
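As a minimal sketch of how anatomical plane coordinates relate to an image's data matrix, the Python snippet below maps a voxel index [X, Y, Z] to a physical RAS position in millimetres and back. It assumes axis-aligned voxels with a known spacing and a known physical position for voxel [0, 0, 0]; the function names and the example spacing and origin values are hypothetical, and real scanner headers may additionally encode rotation or shear.

```python
import numpy as np

def voxel_to_ras(index, spacing_mm, origin_mm):
    """Map a voxel index [X, Y, Z] to physical RAS coordinates in millimetres.

    Assumes axis-aligned voxels (no rotation or shear), so the mapping is a
    per-axis scaling by the voxel spacing followed by a translation.
    """
    index = np.asarray(index, dtype=float)
    return np.asarray(origin_mm, dtype=float) + index * np.asarray(spacing_mm, dtype=float)

def ras_to_voxel(point_mm, spacing_mm, origin_mm):
    """Inverse mapping: physical RAS position (mm) to the nearest voxel index."""
    point_mm = np.asarray(point_mm, dtype=float)
    idx = (point_mm - np.asarray(origin_mm, dtype=float)) / np.asarray(spacing_mm, dtype=float)
    return np.rint(idx).astype(int)

spacing = (1.0, 1.0, 1.0)          # hypothetical 1 mm isotropic voxels
origin = (-90.0, -126.0, -72.0)    # hypothetical physical position of voxel [0, 0, 0]

centre = voxel_to_ras([90, 126, 72], spacing, origin)   # lands on the physical origin

# No scaling or warping is applied, so distances between voxels are physical distances.
a = voxel_to_ras([10, 20, 30], spacing, origin)
b = voxel_to_ras([13, 24, 30], spacing, origin)
print(np.linalg.norm(a - b))       # a 3-4-5 triangle in millimetres
```

Because only a per-axis scale and a translation are involved, the space preserves physical distances, which is exactly why it cannot, on its own, absorb differences in body or organ size between specimens.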

Landmark-based axial coordinates

Landmark-based axial coordinates (Box 1) are a flexible extension of anatomical plane coordinate systems, where the origin point and reference orientation are anchored using conserved anatomical landmarks, rather than anatomical planes (Figure 1). The coordinate system itself can be Cartesian or polar; its unifying feature is not the way distance is defined but rather that key anatomical landmarks are the primary reference point. These systems (typically referred to as stereotaxic coordinates) are widely used to precisely describe actionable locations for diverse surgical procedures, and are thus an attractive concept for human atlas efforts, where tissue samples and biopsies will often be obtained through resections and post-mortem examinations (Aguet et al., 2017; Regev et al., 2017; HuBMAP Consortium, 2019). They have been used to define reference frameworks in mammalian brains, and, in particular, applied in experimental studies with small animals, such as mice (Paxinos and Franklin, 2012), where organism-specific stereotaxic instruments are available, thus enabling precise localization for electrode placement, injection, and stimulation.

A canonical example of a human landmark-based spatial framework is the Talairach & Tournoux coordinate system, which sought to encompass both the remarkable consistency in overall human brain structure and the different shapes and sizes of individual brains (Talairach and Szikla, 1967). Here, the anatomical landmarks are the anterior commissure (AC) and posterior commissure (PC), two of the axon bundles (commissures) that cross the brain midline at different defined points, with the AC representing the origin (‘stereotaxic zero’), and the line connecting the AC and PC representing a principal axis. Once these landmarks and other key areas are manually defined, a simple nine-parameter linear transformation aligns a new volume to the original specimen. As discussed below, this initial ‘spatial normalization’ has served as an invaluable procedure in constructing anatomical template models of the human brain based on multiple samples, which can be iteratively improved upon through additional samples.
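The landmark-anchored part of this idea can be sketched in a few lines of Python. The sketch assumes that the AC, the PC, and one additional point on the midsagittal plane superior to the AC-PC line have already been annotated in scanner coordinates (the function names are hypothetical). It only re-expresses points in an AC-origin, AC-PC-aligned frame and deliberately omits the per-axis scaling that a full nine-parameter Talairach normalization would add.

```python
import numpy as np

def acpc_frame(ac, pc, superior_midline_point):
    """Orthonormal RAS axes anchored on the AC-PC line.

    y points from PC to AC (anterior); z is the part of (superior point - AC)
    orthogonal to y (superior); x = y cross z completes a right-handed frame (right).
    """
    ac, pc, s = (np.asarray(v, dtype=float) for v in (ac, pc, superior_midline_point))
    y = ac - pc
    y = y / np.linalg.norm(y)
    v = s - ac
    z = v - (v @ y) * y
    z = z / np.linalg.norm(z)
    x = np.cross(y, z)
    return np.vstack([x, y, z])  # rows are the x, y, z axes

def to_acpc_coordinates(points, ac, pc, superior_midline_point):
    """Express scanner-space points relative to the AC origin and AC-PC axes."""
    axes = acpc_frame(ac, pc, superior_midline_point)
    d = np.atleast_2d(np.asarray(points, dtype=float)) - np.asarray(ac, dtype=float)
    return d @ axes.T

# Hypothetical head rotated in the scanner so that anterior points along scanner -x.
coords = to_acpc_coordinates([[-10.0, 0.0, 5.0]], ac=(0, 0, 0), pc=(25, 0, 0),
                             superior_midline_point=(0, 0, 40))
# expected: 10 mm anterior and 5 mm superior of the AC, i.e. (0, 10, 5)
```

Anchoring the frame on landmarks rather than on the scanner axes is what makes the resulting coordinates insensitive to how the head happened to be positioned during imaging.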

Complex, nonstandard coordinate systems

The above methods are well suited to cases where anatomical axes are meaningfully preserved across specimens, or when there are clear anatomical landmarks that can serve as origins and axes. However, this is not the case in all organs and systems, thus requiring special adaptations to accommodate unusual shapes or poorly conserved structures. For example, the Normalized Thoracic Coordinate System annotates the geometrical centers of thoracic vertebrae to define principal axes, followed by a spline-based coordinate system to define an origin and axes based on manually annotated polynomial models of the spine curve (Wang, Bai and Zhang, 2008). In another example, a local coordinate system for knee cartilage was constructed by fitting a cylindrical coordinate system to the bone-cartilage interface of the femur (Kauffmann et al., 2003). These complex coordinate systems differ from the landmark-based axial coordinates in the use of local nonlinear warping functions that allow for a greater degree of anatomical variation between samples mapped onto the coordinate system. These nonlinear approaches can tackle tissues where there is little spatial preservation in the conventional sense, as long as there are recurrent features across specimens. For example, an intratumor coordinate system could be constructed using the tumor/immune or tumor/stroma interface as an axis for orientation. Even in the absence of an interpretable physical coordinate system, nonlinear transformations can enable consistent segmentation and categorization of diverse data, as has previously been described for the human heart or lung (Fonseca et al., 2011; Li et al., 2012).
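As an illustration of a local, non-Cartesian frame of the kind used for knee cartilage, the sketch below re-expresses 3D points in cylindrical coordinates (radius, angle, height) about a given axis. It assumes the axis point and direction have already been obtained, for example by fitting to the bone-cartilage interface, and that an in-plane reference direction fixes where the angle is zero; the function name and arguments are hypothetical, and the fitting step itself is not shown.

```python
import numpy as np

def to_cylindrical(points, axis_point, axis_dir, ref_dir):
    """Express 3D points as (r, theta, h) about the axis through axis_point along axis_dir.

    ref_dir defines theta = 0; only its component orthogonal to the axis is used.
    """
    points = np.atleast_2d(np.asarray(points, dtype=float))
    a = np.asarray(axis_dir, dtype=float)
    a = a / np.linalg.norm(a)
    e1 = np.asarray(ref_dir, dtype=float)
    e1 = e1 - (e1 @ a) * a              # project the reference direction onto the plane
    e1 = e1 / np.linalg.norm(e1)
    e2 = np.cross(a, e1)                # completes the in-plane basis
    d = points - np.asarray(axis_point, dtype=float)
    h = d @ a                           # height along the axis
    u, v = d @ e1, d @ e2               # in-plane components
    return np.column_stack([np.hypot(u, v), np.arctan2(v, u), h])

cyl = to_cylindrical([[0.0, 2.0, 5.0], [3.0, 0.0, -1.0]],
                     axis_point=(0, 0, 0), axis_dir=(0, 0, 1), ref_dir=(1, 0, 1))
```

Once every specimen is expressed relative to its own fitted axis, quantities such as cartilage thickness at a given (theta, h) can be compared across individuals even though the raw scanner coordinates differ.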

Key developments needed for a CCF reference system

Despite the broad applicability of these approaches, we anticipate that additional methodological developments will be needed to represent complex structures in the human body, in two key areas. First, at the histological scale, human organs often cannot be characterized by fully stereotypical architectures. However, they may instead exhibit conserved histological modules or repetitive geometrical structures, such as the branching structure of the human lung bronchioles, or the repeating (but not fixed in number) colonic crypts that characterize the intestinal mucosa. Second, existing coordinate systems model datasets at a single scale, such that the coordinate system remains static even when zooming in or out in spatial scale. However, when we aim to construct relevant frameworks that map from cells, to tissues, to organs, to the whole body, the development of hierarchical coordinate systems that can transition between levels of resolution remains a key outstanding challenge (Figure 2). For example, because the resolution of MRI does not allow profiling of individual cells, mapping cells with different morphologies or cell densities to precise locations within an MRI-derived framework will be challenging. Addressing this and similar challenges will call for new mapping and integration approaches that can transition and connect between different coordinate systems and levels of resolution.
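One simple way to let coordinates transition between levels of resolution is to treat each scale as a node in a tree of frames, with each node storing the transform to its parent. The sketch below is a hypothetical illustration under strong simplifying assumptions: 2D frames, and only rotation, isotropic scaling, and translation between levels (a real hierarchical CCF would also need non-linear warps and uncertainty at each level). It maps a point measured in micrometres on a histological section up to organ- and body-level millimetre coordinates by composing the parent transforms; the class and function names are illustrative.

```python
import numpy as np

def similarity2d(scale=1.0, angle_rad=0.0, offset=(0.0, 0.0)):
    """3x3 homogeneous matrix: rotate and scale child coordinates, then translate into the parent."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[scale * c, -scale * s, offset[0]],
                     [scale * s,  scale * c, offset[1]],
                     [0.0,        0.0,       1.0]])

class Frame:
    """A node in a hypothetical hierarchy of coordinate frames (e.g., body > organ > section)."""

    def __init__(self, name, parent=None, to_parent=None):
        self.name = name
        self.parent = parent
        self.to_parent = np.eye(3) if to_parent is None else np.asarray(to_parent, dtype=float)

    def to_root(self):
        """Compose child-to-parent transforms all the way up to the root frame."""
        m = np.eye(3)
        node = self
        while node.parent is not None:
            m = node.to_parent @ m
            node = node.parent
        return m

def convert(point, src, dst):
    """Map a 2D point from frame `src` to frame `dst` via their common root."""
    p = np.append(np.asarray(point, dtype=float), 1.0)
    m = np.linalg.inv(dst.to_root()) @ src.to_root()
    return (m @ p)[:2]

# Hypothetical hierarchy: body (mm) > organ (mm) > histological section (micrometres).
body = Frame("body")
organ = Frame("organ", parent=body, to_parent=similarity2d(offset=(100.0, 200.0)))
section = Frame("section", parent=organ,
                to_parent=similarity2d(scale=1e-3, angle_rad=np.pi / 2, offset=(5.0, 0.0)))

cell_um = (1000.0, 0.0)                      # a cell centroid on the section, in micrometres
print(convert(cell_um, section, organ))      # about (5, 1) mm in organ coordinates
print(convert(cell_um, section, body))       # about (105, 201) mm in body coordinates
```

Storing only parent-relative transforms means a finer layer can be re-registered without touching the coarser layers above it.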

Figure 2.

Hierarchical organization of coordinate systems

A common coordinate framework that encompasses the entire human body will require a hierarchy of coordinate systems covering different scales. From the whole-organ (macro) scale, the CCF will allow a study of the relative differences in size and shape of different body organs between individuals. Zooming in further, additional common coordinate framework layers at progressively finer scales will allow similar analysis of inter-individual anatomical variation at the intra-organ regional (meso), histological (micro), and cellular (fine) scales.

Constructing an anatomical template: diverse strategies for diverse systems

Before creating either an anatomical or landmark-based coordinate system, the first step is to construct an ‘anatomical template’ that represents an initial spatial reference (Figure 3). While this reference may be iteratively improved, it serves as an initial scaffold upon which a downstream coordinate system can be constructed and new data can be mapped. Regardless of the precise context or organ system, a fundamental challenge towards the integrative analysis of multiple individuals and samples is image registration (Box 1) – how to project spatial datasets into a common space where common features overlap across specimens (Zitová and Flusser, 2003).

Figure 3.

Methods for CCF assembly

(A) Methods for constructing a common coordinate framework often begin with the selection of a ‘reference’ template for future downstream alignment. This typically represents a single sample that is most similar to the population average, and will serve as the starting point for construction of the CCF.

(B) Following template selection, all samples in the population can be mapped to the reference template, transforming each individually into the coordinate space of the template. Alternatively, an iterative approach can be used where samples are aligned pairwise to the template, and the template averaged and updated at each iteration until convergence. An iterative approach can be more computationally expensive than pairwise alignment, but helps reduce bias toward any single sample in the final CCF.

(C) Another approach for the construction of a common coordinate framework is to reconstruct a tissue from its own features. In this case, the spatial relationships between cells are not known a priori, but are instead inferred from features measured in the cells.

Image registration approaches typically begin with a global registration (Box 1), followed by a local transformation (Box 1) (Christensen, Joshi and Miller, 1997; Sommer et al., 2013). Global methods look for an affine transformation (e.g., rotation) that maximizes the correspondence between the samples, and operate on every pixel of the image. These methods account for broad variance in anatomical diversity, particularly regarding size and scale differences across samples, as well as for technical sources of variance (for example, microscope alignment). Subsequently, local methods warp the image to account for local distortions, for instance by using radial basis functions (RBF) to define the transformation (Fornefett, Rohr and Stiehl, 2001). Newer approaches based on differential geometry that involve searching for the best diffeomorphic mapping have also been applied (Ashburner, 2007).
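The two-stage idea can be sketched for point landmarks in 2D. The snippet assumes corresponding landmarks have already been identified in both samples; the global step is a least-squares affine fit, and the local step interpolates the remaining landmark residuals with a Gaussian radial basis function. The Gaussian kernel and the `sigma` width are illustrative choices made for brevity (thin-plate splines and compactly supported kernels are common alternatives), and the function names are hypothetical rather than taken from any cited tool.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine transform (global step) mapping src landmarks onto dst."""
    X = np.hstack([src, np.ones((len(src), 1))])          # n x 3 homogeneous coordinates
    params, *_ = np.linalg.lstsq(X, dst, rcond=None)      # 3 x 2 affine parameters
    return params

def apply_affine(params, pts):
    return np.hstack([pts, np.ones((len(pts), 1))]) @ params

def gaussian_kernel(a, b, sigma):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def register_landmarks(src, dst, sigma=1.0):
    """Global affine fit followed by a local Gaussian-RBF correction of the residuals.

    Returns a function mapping arbitrary points from the source to the target space.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    A = fit_affine(src, dst)
    residual = dst - apply_affine(A, src)                 # what the global step could not explain
    W = np.linalg.solve(gaussian_kernel(src, src, sigma), residual)

    def transform(pts):
        pts = np.atleast_2d(np.asarray(pts, dtype=float))
        return apply_affine(A, pts) + gaussian_kernel(pts, src, sigma) @ W

    return transform

# Hypothetical landmarks: a scaled and shifted copy with one locally displaced point.
src = np.array([[0, 0], [4, 0], [0, 4], [4, 4], [2, 2], [2, 0]], dtype=float)
dst = 1.5 * src + np.array([1.0, -1.0])
dst[4] += [0.3, -0.2]
warp = register_landmarks(src, dst, sigma=1.0)
```

Far from any landmark the Gaussian terms vanish and the mapping reduces to the global affine transform, consistent with the Box 1 description of local transformations as leaving the relative position of distant regions unaffected.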

Methods for image registration can be roughly partitioned into two classes: intensity-based and landmark-based (Box 1). Intensity-based registration, such as that applied in the Allen Brain Atlas (Oh et al., 2014; Kuan et al., 2015; Allen Institute, 2017), aligns images based on their intensity with the goal of maximizing the correlation between the intensity matrices. Landmark-based methods, such as the Talairach affine transformation described above, aim to identify direct correspondences on key anatomical features that are present across datasets. These may be manually annotated, but are often detected automatically based on their high information content, such as curvature (Ram, Babu and Sivaswamy, 2009). The precise strategy may combine features of both intensity- and landmark-based approaches and depends on the tissue of interest and available spatial reference data. Here, we consider three broad categories of sample types, which fall along the organizational spectrum (Figure 1), and discuss how strategies for template construction can be tailored to these samples.
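To make the intensity-based idea concrete, the sketch below searches over integer translations for the shift that maximizes the Pearson correlation between two images' intensity matrices. It is a deliberately minimal, hypothetical example: it assumes translation is the only misalignment, uses circular shifts (`np.roll`) so that image borders wrap around, and performs an exhaustive search. Practical registration tools optimize richer transformations and, for multimodal data, often use mutual information instead of correlation.

```python
import numpy as np

def ncc(a, b):
    """Pearson correlation between two equally shaped intensity arrays."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def register_translation(fixed, moving, max_shift=5):
    """Exhaustively find the integer (dy, dx) shift of `moving` that best correlates with `fixed`."""
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = ncc(fixed, np.roll(moving, (dy, dx), axis=(0, 1)))
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift, best_score

# Hypothetical data: a random "image" and a circularly shifted copy of it.
rng = np.random.default_rng(0)
fixed = rng.random((32, 32))
moving = np.roll(fixed, (-3, 2), axis=(0, 1))
shift, score = register_translation(fixed, moving, max_shift=5)   # recovers the shift (3, -2)
```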

A highly stereotypical reference sample

At one end of our spectrum are highly stereotypical systems, where anatomical positions from one sample can be ‘invariantly aligned’ to an independent specimen. This use-case requires both biological and technical reproducibility, which is only feasible in model organisms. For example, when constructing a map of the mouse brain, the Allen Brain Atlas Common Coordinate Framework versions 2 and 3 used 1,675 mouse specimens that were identically aged and isogenic, such that additional similar samples can be aligned to the CCF for analysis (Oh et al., 2014; Kuan et al., 2015; Allen Institute, 2017). The experimental data used to build the third version of the mouse CCF originated from a powerful system that couples two-photon microscopy with automated vibratome sectioning (Ragan et al., 2012), leading to high-quality images with minimal tissue distortions. The resulting images of the brain slices from a single specimen were aligned using a 12-parameter affine transformation, then stacked together to create a 3D reconstructed volume. The 1,675 reconstructed volumes were registered together using a non-linear iterative procedure previously used to construct MRI templates of the human brain from a pediatric population (Fonov et al., 2011). First, 41 volumes were globally aligned and averaged to form an initial seed (Kuan et al., 2015). Then, the 1,675 volumes were registered to the seed by maximizing the mutual information across volumes. The corresponding deformation fields (Box 1) used to register each volume were also averaged, yielding a mean transformation that was inverted and applied to the average volume to transform it back to the original coordinates, thus creating the seed for the next iteration. The iterative procedure was terminated when the mean deformation field was smaller than a chosen threshold.
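The logic of this iterative template construction can be illustrated in one dimension. In the hypothetical sketch below, each "volume" is a 1D intensity profile, registration is restricted to integer circular shifts chosen by maximizing correlation (a stand-in for the mutual-information-driven non-linear registration described above), and the "mean deformation field" is simply the mean shift. Applying the inverse of that mean shift to the averaged profile plays the role of transforming the average volume back toward the population centre, and the loop stops once the mean shift falls below a threshold. Seeding from a single sample rather than an average of many is a further simplification.

```python
import numpy as np

def best_shift(template, signal, max_shift):
    """Integer circular shift of `signal` that maximizes its correlation with `template`."""
    shifts = list(range(-max_shift, max_shift + 1))
    scores = [np.corrcoef(template, np.roll(signal, s))[0, 1] for s in shifts]
    return shifts[int(np.argmax(scores))]

def build_template(samples, seed, max_shift=20, tol=0.5, max_iter=10):
    """Iteratively register samples to a template, average them, and re-centre the template."""
    template = np.asarray(seed, dtype=float).copy()
    for iteration in range(1, max_iter + 1):
        shifts = np.array([best_shift(template, x, max_shift) for x in samples])
        aligned = [np.roll(x, s) for x, s in zip(samples, shifts)]
        template = np.mean(aligned, axis=0)          # average of the registered samples
        mean_shift = shifts.mean()                   # 1D analogue of the mean deformation field
        if abs(mean_shift) < tol:                    # converged: no systematic drift left
            break
        # Apply the inverse of the mean displacement to move toward the population centre.
        template = np.roll(template, -int(round(mean_shift)))
    return template, iteration

# Hypothetical data: one Gaussian bump observed at different offsets in five samples.
x = np.arange(100)
base = np.exp(-((x - 50) ** 2) / (2 * 3.0 ** 2))
offsets = [-4, -2, 0, 3, 8]                          # mean offset is +1
samples = [np.roll(base, o) for o in offsets]
template, n_iter = build_template(samples, seed=samples[0])
```

Although the seed sits at offset -4, the template converges to the population-average position (+1) after the re-centring step, which is the bias-reduction property the iterative procedure is designed to provide.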

The Drosophila BrainAligner project leveraged a similar strategy, first creating a seed based on 295 specimens, followed by registration of more than 2,500 additional samples (Peng et al., 2011). The BrainAligner software used landmark-based alignment, where landmarks were initially designated based on high curvature (‘corners or edge points’) in the data, and automatically detected in target images. Notably, while we focus first on the average template derived from mean-intensities, these efforts construct probabilistic atlases that retain and leverage information on individual variation (discussed below).
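Automatic detection of high-curvature landmarks can be illustrated on a 2D outline. The hypothetical sketch below uses the discrete turning angle at each vertex of a closed polygon as a proxy for curvature and keeps the k vertices where the outline bends most sharply. It is a 2D analogue for illustration only, not the 3D procedure implemented in BrainAligner, and the function names are assumptions.

```python
import numpy as np

def turning_angles(contour):
    """Absolute turning angle (radians) at each vertex of a closed 2D polygon."""
    prev_vec = contour - np.roll(contour, 1, axis=0)
    next_vec = np.roll(contour, -1, axis=0) - contour
    cross = prev_vec[:, 0] * next_vec[:, 1] - prev_vec[:, 1] * next_vec[:, 0]
    dot = (prev_vec * next_vec).sum(axis=1)
    return np.abs(np.arctan2(cross, dot))

def curvature_landmarks(contour, k):
    """Indices (sorted) of the k vertices with the largest turning angle."""
    angles = turning_angles(contour)
    return np.sort(np.argsort(angles)[::-1][:k])

# Hypothetical outline: a 4 x 4 square sampled at unit spacing along its edges.
square = np.array([[0, 0], [1, 0], [2, 0], [3, 0], [4, 0], [4, 1], [4, 2], [4, 3],
                   [4, 4], [3, 4], [2, 4], [1, 4], [0, 4], [0, 3], [0, 2], [0, 1]],
                  dtype=float)
corners = curvature_landmarks(square, 4)   # the four corners: indices 0, 4, 8, 12
```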

Such iterative procedures provide an attractive approach for constructing a population-based template without overly relying on a single volume (Kovacević et al., 2005; Chuang et al., 2011). Inevitably, the choice of the initial samples that will form a reference template will be a key challenge for human atlas efforts, particularly since the limited number of sampled individuals, and the differences between them that cannot be controlled for, will prohibit iterative efforts. In this case, it may be advisable to choose a single shape that minimizes the average mismatch between the samples and the reference after alignment, known as the sample Fréchet or Karcher mean (Figure 3A) (Grove and Karcher, 1973). The actual computation depends on the metric definition and data type. For the Gromov-Wasserstein metric, a fast algorithm for calculating the Fréchet mean has been developed (Peyré, Cuturi and Solomon, 2016), and other statistical approaches for constructing templates from images have also been described (Allassonnière, Amit and Trouvé, 2007; Ma et al., 2008). A variation on this approach was employed by the Edinburgh Mouse Atlas Project (EMAP; eMouse Atlas Project, http://www.emouseatlas.org), which used diverse 3D imaging technologies to construct anatomical models of the developing mouse embryo (Richardson et al., 2014). Within a fixed strain, the spatial structure of the embryo is well understood and reproducible; moreover, specific spatially resolved expression profiles are also well characterized (e.g., gradients of expression induced by signaling molecules). However, anatomy can vary substantially across developmental time. The Edinburgh Mouse Atlas overcomes this heterogeneity by constructing separate anatomical references for each developmental time point (known as ‘Theiler’ stages). The atlas thus consists of more than 50 sequentially distinct models, each constructed with conceptually similar high-resolution imaging techniques (including μMRI, optical projection tomography, and episcopic microscopy). While these models are not connected to each other, generating such connections has recently been achieved in other domains with approaches based on optimal transport theory (Bonneel, 2018).
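One practical way to realize the ‘choose a single shape’ version of the sample Fréchet mean is to restrict the candidates to the observed samples, i.e., to pick the medoid. The sketch below assumes each sample is summarized by corresponding landmarks and uses the residual after rigid (orthogonal Procrustes) alignment as the mismatch; both the metric and the helper names are illustrative assumptions, since the actual computation depends on the metric and data type.

```python
import numpy as np

def rigid_align(source, target):
    """Rotate and translate `source` landmarks (k x d) onto `target` (orthogonal Procrustes)."""
    mu_s, mu_t = source.mean(axis=0), target.mean(axis=0)
    s, t = source - mu_s, target - mu_t
    u, _, vt = np.linalg.svd(s.T @ t)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    flip = np.eye(source.shape[1])
    flip[-1, -1] = np.sign(np.linalg.det(u @ vt))
    return s @ (u @ flip @ vt) + mu_t

def mismatch(a, b):
    """Sum of squared landmark distances after rigidly aligning a onto b."""
    return float(np.sum((rigid_align(a, b) - b) ** 2))

def sample_frechet_medoid(shapes):
    """Index of the observed shape with the lowest mean mismatch to all other shapes."""
    n = len(shapes)
    costs = np.array([
        np.mean([mismatch(shapes[j], shapes[i]) for j in range(n) if j != i])
        for i in range(n)
    ])
    return int(np.argmin(costs)), costs

def rotate(points, degrees):
    th = np.deg2rad(degrees)
    r = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
    return points @ r.T

# Toy cohort: a square, a rotated and shifted copy, a slightly deformed copy, and an outlier.
square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
shapes = [square,
          rotate(square, 30) + [2, 3],
          square + [[0, 0], [0.1, 0], [0, 0], [0, 0]],
          np.array([[0, 0], [3, 0], [0, 3], [0.5, 0.5]])]
best, costs = sample_frechet_medoid(shapes)
```

Swapping `mismatch` for another distance (for example, an image-based or Gromov-Wasserstein discrepancy) changes which template is selected, not the selection logic.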

Similar inter-individual organ structure with differences in cell type location or organ dimensions

Unlike efforts in model organisms, building a human atlas must address the challenge of inter-individual variation, even in organs that have a stereotypical structure (Figure 2). For example, the International Consortium for Brain Mapping (ICBM) is constructing an anatomical reference template of the human brain, based on data from multi-section MRI (typically 1 mm³ resolution) and aims to profile 7,000 human subjects spanning ages 18–90 years, including 342 twin pairs (half monozygotic, half dizygotic) (Mazziotta et al., 2001). However, individual differences in organ size and shape (due to age, gender, and natural variation), as well as technical inconsistencies in MRI imaging, create significant challenges for unsupervised alignment of different specimens.

To address inter-individual variability, the ICBM used a modified process that incorporated supervision to construct an iterative series of templates (ending with the commonly used ICBM-152 template). Each of the initial seed volumes was first registered to the Talairach atlas, through manual landmark annotation followed by an affine transformation, prior to averaging to construct an initial template. Volumes were then re-aligned to this average template using an unsupervised, intensity-based approach, reducing the impact of inaccurate manual annotations. This strategy not only uses supervision to enhance alignment quality, but also results in a coordinate framework that is conceptually linked to an existing system, thus enabling the propagation of prior biological knowledge.
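The landmark step of this supervised procedure can be sketched as a least-squares affine fit between paired landmarks. The sketch below is a simplification under stated assumptions: the landmark coordinates are hypothetical, averaging is done on landmark sets rather than on resampled intensity volumes, and the subsequent unsupervised intensity-based refinement is omitted.

```python
import numpy as np

def fit_affine(moving, fixed):
    """Least-squares affine map (A, b) with moving @ A.T + b ≈ fixed, from paired landmarks (k x d)."""
    k, d = moving.shape
    design = np.hstack([moving, np.ones((k, 1))])            # homogeneous coordinates, (k, d + 1)
    params, *_ = np.linalg.lstsq(design, fixed, rcond=None)  # (d + 1, d)
    return params[:d].T, params[d]

def apply_affine(points, A, b):
    return points @ A.T + b

# Hypothetical landmark coordinates in the reference (Talairach-like) space.
reference = np.array([[0, 0, 0], [10, 0, 0], [0, 10, 0], [0, 0, 10], [5, 5, 5]], dtype=float)

# Two synthetic seed specimens: the same landmarks seen through different affine distortions.
rng = np.random.default_rng(0)
specimens = [apply_affine(reference, np.eye(3) + 0.1 * rng.standard_normal((3, 3)),
                          rng.standard_normal(3)) for _ in range(2)]

# Register each specimen to the reference, then average the registered landmark sets.
registered = [apply_affine(s, *fit_affine(s, reference)) for s in specimens]
initial_template = np.mean(registered, axis=0)
```

In d dimensions the fit needs at least d + 1 affinely independent landmarks (four non-coplanar points in 3D); with more landmarks, the least-squares solution dampens individual annotation errors.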

Such semi-supervised registration followed by population averaging represents a powerful and flexible approach that can be applied whenever manually annotated landmarks can be used to facilitate image registration (Figure 3B). Additional examples include a normative human lung atlas using selected airway and vascular tree landmarks (Li et al., 2003), and a biventricular cardiac atlas constructed using six landmarks spread across both ventricles (Bai et al., 2015). We anticipate that this strategy will be powerful for efforts to build a human atlas, particularly when invariant anatomical features can be easily identified in close collaboration with clinical, surgical, and anatomical experts. For example, the Cardiac Atlas Project and Society for Cardiovascular Magnetic Resonance (SCMR) have provided diverse images to a panel of independent experts, generating a series of ‘ground truth’ annotated datasets that can be used to test landmark-detection methods (Fonseca et al., 2011).
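Ground-truth annotations of this kind are typically used by comparing automatically predicted landmarks with expert-placed ones. A minimal scoring sketch, assuming landmarks are stored as arrays of shape (images, landmarks, dimensions) in millimetres; the metrics and the 5 mm tolerance are illustrative assumptions, not the evaluation protocol of Fonseca et al. (2011).

```python
import numpy as np

def landmark_errors(predicted, ground_truth):
    """Euclidean error for every landmark; inputs have shape (n_images, n_landmarks, d)."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(ground_truth, dtype=float)
    return np.linalg.norm(diff, axis=-1)

def summarize(predicted, ground_truth, tolerance_mm=5.0):
    err = landmark_errors(predicted, ground_truth)
    return {
        "mean_error": float(err.mean()),                      # over all images and landmarks
        "per_landmark_mean": err.mean(axis=0),                # shows which landmarks are hardest
        "success_rate": float((err <= tolerance_mm).mean()),  # fraction within tolerance
    }

ground_truth = np.zeros((2, 2, 3))
predicted = ground_truth + np.array([[[3, 4, 0], [0, 0, 0]],
                                     [[0, 0, 1], [6, 8, 0]]], dtype=float)
report = summarize(predicted, ground_truth)
```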

Highly non-stereotypical samples

At the other extreme of possible tissue organizations lie samples with substantial inter-individual variability, such that a reference three-dimensional coordinate system is harder, or perhaps impossible, to define (Figure 1). Such variability occurs in pathological instances, such as tumors, but it can also arise in healthy settings, for example due to disordered structures such as the microscopic airways or vasculature. However, particular ‘landmark features’, such as the distance of a cell from a blood vessel or from the boundary between tumor and normal tissue, can still be compared across samples. Currently, no general strategies exist for modeling these types of systems; they will require the development of new multi-scale hierarchical approaches.
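Landmark-relative features such as distance to the nearest vessel can be computed without any global coordinate system. A minimal sketch, assuming the vessel (or tumor boundary) has been segmented and sampled as a set of points; the helper names and distance bins are hypothetical. For large samples, a k-d tree would replace the brute-force pairwise distances used here.

```python
import numpy as np

def distance_to_structure(cell_xy, structure_xy):
    """Distance from each cell (n x d) to the nearest point sampled on a structure (m x d)."""
    diff = np.asarray(cell_xy, float)[:, None, :] - np.asarray(structure_xy, float)[None, :, :]
    return np.linalg.norm(diff, axis=-1).min(axis=1)

def distance_profile(distances, bins):
    """Fraction of cells per distance bin; shared bins make profiles comparable across samples."""
    counts, _ = np.histogram(distances, bins=bins)
    return counts / max(counts.sum(), 1)

vessel = np.array([[x, 0.0] for x in range(11)])   # a straight vessel sampled every unit
cells = np.array([[0.0, 3.0], [5.0, -4.0], [2.5, 0.0]])
d = distance_to_structure(cells, vessel)
profile = distance_profile(d, bins=[0, 1, 2, 5])
```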

An alternative approach: Reconstructing an atlas from its own features

A conceptually different approach to constructing both anatomical templates and coordinate systems is to learn their representation, or their salient features, directly from organ, tissue and cell data, either exploiting prior information, or entirely de novo (Figure 3C). In this way, a CCF is constructed ‘bottom-up’, where data collected on individual components of a system enable the identification of features that will form the scaffold for constructing a broader spatial map. These efforts require learning methods tailored to distinct data types, but when feasible, should in principle allow for spatial reconstruction even at single-cell resolution without any prior knowledge.

For example, there has been significant interest in the potential to reconstruct complex tissues using single-cell RNA-seq data from fully dissociated tissue. Such a strategy is possible when information about a cell’s spatial location is represented at least in part by its molecular profile. We first demonstrated the feasibility of this approach in both the zebrafish embryo (Satija et al., 2015) and annelid brain (Achim et al., 2015). In both cases, publicly available in situ gene expression databases could be used to infer each profiled cell’s spatial location, with mapping resolution determined by the number of known in situ patterns, data quality, and tissue structural organization. Similar approaches have since been successfully applied in the mouse blastocyst, mouse brain, Drosophila embryo, and mammalian liver (Habib et al., 2016; Halpern et al., 2017; Karaiskos et al., 2017; Mori et al., 2017; Stuart et al., 2019). Remarkably, assuming that gene expression varies smoothly across physical space, these methods can even be extended to operate fully unsupervised in a de novo setting (that is, without any in situ reference data) (Nitzan et al., 2018). Such approaches extend the concept of ordering cells along a developmental continuum (‘pseudotime’) to ordering cells along a spatial gradient (‘pseudospace’) (Scialdone et al., 2016; Aizarani et al., 2019).
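The core idea behind reference-based mapping can be illustrated with a deliberately simplified scoring scheme: binarize each cell's expression of the landmark genes, count agreements with the on/off in situ calls of every spatial bin, and convert the scores into probabilities. This is a sketch for intuition only, not the statistical model of Satija et al. (2015) or Achim et al. (2015); the threshold and penalty parameters are hypothetical.

```python
import numpy as np

def map_cells_to_bins(cell_expr, insitu_ref, threshold=1.0, mismatch_penalty=2.0):
    """
    cell_expr:  (n_cells, n_genes) expression of the landmark genes in dissociated cells.
    insitu_ref: (n_bins, n_genes) binary on/off calls read from in situ images per spatial bin.
    Returns an (n_cells, n_bins) matrix of mapping probabilities (rows sum to 1).
    """
    on = (np.asarray(cell_expr, dtype=float) >= threshold).astype(float)  # binarize each cell
    ref = np.asarray(insitu_ref, dtype=float)
    agree = on @ ref.T + (1 - on) @ (1 - ref).T   # genes whose on/off state matches the bin
    disagree = ref.shape[1] - agree
    score = agree - mismatch_penalty * disagree    # treated as a log-score
    score -= score.max(axis=1, keepdims=True)      # subtract row max for numerical stability
    prob = np.exp(score)
    return prob / prob.sum(axis=1, keepdims=True)

# Three spatial bins, each marked by one of three landmark genes.
insitu_ref = np.eye(3)
cells = np.array([[5, 0, 0],
                  [0, 3, 0],
                  [0, 0, 2],
                  [5, 5, 0]])   # the last cell expresses the markers of two bins
prob = map_cells_to_bins(cells, insitu_ref)
```

The last cell's probability is split evenly between two bins, illustrating how mapping resolution depends on the number of informative in situ patterns.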

Given the success of these approaches in spatially mapping cells based on their gene expression patterns, it is useful to consider which situations are suitable for applying these methods. We highlight two here: cell types that are spatially restricted to a stereotypical location, and gradients. First, when cell types are both spatially restricted and stereotypically located, annotation of molecular data provides a direct representation of spatial location. For example, in the mouse cortex these strategies should work well for excitatory cell types, which are arranged in layers, but less well for inhibitory cells, which are dispersed through the tissue. The annelid brain also shares many of the same characteristics that enable learning location information. Second, morphogen or signaling gradients are often driven by gene expression and in turn drive downstream gene regulation, such that expression patterns reflect the underlying signaling environment and can be used to connect cells to their spatial location along the gradient. The gradient case is demonstrated in the zebrafish and mouse embryo and liver examples above (Satija et al., 2015; Habib et al., 2016; Halpern et al., 2017). Additionally, new spatial gene expression methods such as MERFISH and Slide-seq (Moffitt et al., 2018; Rodriques et al., 2019) have highlighted cell types that exhibit continuous gradients in the expression of gene modules that are correlated with spatial position in the brain. However, the extent to which single-cell profiles encode spatial position is unknown in many tissues. High-parameter spatial data provide a unique opportunity to discover these patterns. In addition, other molecular modalities, such as scATAC-seq, could also include representations of spatial position, which may be better understood either in the context of in situ chromatin profiling (Thornton et al., 2019), or by harmonizing these with RNA-seq profiles prior to or as part of the mapping process (Stuart et al., 2019).
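For the gradient case, one simple way to place cells along an axis is to compare the summed expression of genes that mark the two poles of the gradient. The sketch below assumes normalized expression and known pole-marker genes (hypothetical indices here); it illustrates the idea and is not the procedure of any specific study cited above.

```python
import numpy as np

def gradient_coordinate(expr, high_end_genes, low_end_genes, eps=1e-9):
    """
    expr: (n_cells, n_genes) normalized expression; the gene index lists mark the two poles.
    Returns a score in [0, 1]: near 1 at the 'high_end' pole, near 0 at the opposite pole.
    """
    expr = np.asarray(expr, dtype=float)
    high = expr[:, high_end_genes].sum(axis=1)
    low = expr[:, low_end_genes].sum(axis=1)
    return high / (high + low + eps)

def assign_layers(coord, n_layers=9):
    """Discretize the continuous coordinate into n_layers equal-width layers (0 .. n_layers - 1)."""
    return np.minimum((np.asarray(coord) * n_layers).astype(int), n_layers - 1)

expr = np.array([[10.0, 0.0], [5.0, 5.0], [0.0, 10.0]])   # gene 0 marks one pole, gene 1 the other
coord = gradient_coordinate(expr, high_end_genes=[0], low_end_genes=[1])
layers = assign_layers(coord)
```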

A similar strategy can be used to map data to a reference set of known locations with image-based, instead of molecular, features. In recent years, many studies in the field of computer vision have suggested that the visual features of an image provide useful information about its location. Such applications have mostly focused on photo geolocalization: for example, predicting that a photo was taken in Paris by recognizing the visual features of the Eiffel Tower. The geolocalization problem was initially formulated as a supervised classification task, where a classifier is provided hand-crafted visual features of images (Hays and Efros, 2008). Recent advances using deep convolutional neural networks allow automatic feature extraction and end-to-end machine learning models (Weyand, Kostrikov and Philbin, 2016; Vo, Jacobs and Hays, 2017; Seo et al., 2018). Although the structure and content of imaging readouts of biological samples differ from photos of geographic locations, the same principle can be leveraged: different regions of organs and tissues have different structural (visual) and molecular characteristics, which can in turn be used to find approximate tissue location prior to image registration. Furthermore, such feature-based geolocation approaches can be instrumental in uncovering similar morphological patterns within or across tissues, by leveraging the uncertainty of the predictions to find several possible mapping locations for a given image. We envisage these approaches having potentially broad applicability in a biological context, particularly if learning algorithms can identify features that bridge multiple anatomical scales, enabling (for example) histological data to be mapped onto a meso-scale atlas.
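The geolocation idea, including the use of prediction uncertainty to propose several candidate locations, can be sketched with a nearest-centroid classifier over image features. The features are assumed to come from some upstream extractor (for example, a convolutional network); the region names and the temperature parameter are hypothetical, and a real system would learn the classifier end to end.

```python
import numpy as np

def fit_centroids(features, labels):
    """Mean feature vector per labelled region."""
    features, labels = np.asarray(features, float), np.asarray(labels)
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict_location(x, classes, centroids, temperature=1.0, top_k=2):
    """Softmax over negative squared distances; returns the top_k (region, probability) pairs."""
    logits = -((centroids - np.asarray(x, float)) ** 2).sum(axis=1) / temperature
    p = np.exp(logits - logits.max())
    p /= p.sum()
    order = np.argsort(p)[::-1][:top_k]
    return [(str(classes[i]), float(p[i])) for i in order]

train_x = [[-0.1, 0], [0.1, 0], [4.9, 5], [5.1, 5], [10, -0.1], [10, 0.1]]
train_y = ["cortex", "cortex", "medulla", "medulla", "pelvis", "pelvis"]
classes, centroids = fit_centroids(train_x, train_y)
confident = predict_location([0.2, 0.1], classes, centroids)
ambiguous = predict_location([2.5, 2.5], classes, centroids)
```

A query lying midway between two regions receives two nearly equal probabilities, flagging a morphologically ambiguous location rather than forcing a single answer.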

Mapping new datasets onto an existing reference

Once a coordinate system has been assembled it has multiple potential uses. Perhaps most obviously, an atlas can be mined for novel features that can explain a unique function of a particular organ. Another key utility is in mapping new datasets onto the reference (Figure 4). Several approaches exist for this purpose, which are conceptually linked to the process of assembling the original reference itself.

Figure 4. Mapping new datasets to a CCF

(A) Mapping within a modality (e.g. transcriptomic data) involves the registration of query datasets to the reference CCF. The transformation used to register the query with the reference can then be studied to learn sources of anatomical variation between individuals.

(B) Cross-modality mapping relies on the identification of corresponding features between separate query datasets and the reference. These can be features that are independently identifiable across the modalities used (e.g. lung branchpoints identified using CT and PET data). They can also represent molecular features, i.e. the expression level of genes or proteins, that can be measured across different assay types and facilitate integration.

Mapping the same data modality to the reference

When the new data are of the same kind as those used to build the original spatial reference, mapping is relatively straightforward (Figure 4A). For example, in the Allen Brain Atlas, gene expression measurements and connectivity data are derived from slices obtained through a process similar to that used for the tomographic slices that were used to construct the map, so measurements can be spatially mapped using methods analogous to those used to construct the original template (global followed by local image registration). Similarly, in the ICBM, new MRI data can be mapped onto the ICBM-152 template in the same way as it was originally constructed. Finally, the Mouse Embryo Atlas developed a framework called ‘constrained distance transformation’ to map new in situ measurements onto existing developmental models. This framework computes radial basis functions as described above, but on geodesic distances (a method that can also be used for graph alignment (Hill and Baldock, 2015)).
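A geodesic variant of radial-basis-function interpolation can be sketched on a 2D grid: distances are path lengths that stay inside the tissue mask, so landmarks on either side of an internal boundary influence each other less than their straight-line distance would suggest. This is an illustrative sketch under assumed simplifications (4-connected grid distances, a Gaussian kernel, a hypothetical `sigma`), not the EMAP implementation of constrained distance transformation.

```python
from collections import deque
import numpy as np

def geodesic_distances(mask, source):
    """Breadth-first (4-connected) path lengths from `source`, never leaving True pixels of `mask`."""
    dist = np.full(mask.shape, np.inf)
    dist[source] = 0.0
    queue = deque([source])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
                    and mask[nr, nc] and np.isinf(dist[nr, nc])):
                dist[nr, nc] = dist[r, c] + 1.0
                queue.append((nr, nc))
    return dist

def geodesic_rbf_field(mask, landmarks, displacements, sigma=5.0):
    """Interpolate landmark displacements over the mask with Gaussian RBFs of geodesic distance."""
    dists = np.stack([geodesic_distances(mask, lm) for lm in landmarks])   # (n, H, W)
    n = len(landmarks)
    # Kernel between landmarks; symmetric because grid path lengths are symmetric.
    K = np.array([[np.exp(-(dists[i][landmarks[j]] / sigma) ** 2) for j in range(n)]
                  for i in range(n)])
    weights = np.linalg.solve(K, np.asarray(displacements, float))        # (n, 2)
    phi = np.exp(-(dists / sigma) ** 2)        # unreachable pixels (inf distance) get weight 0
    return np.einsum("nhw,nd->hwd", phi, weights)                         # (H, W, 2)

mask = np.ones((20, 20), dtype=bool)
mask[5:, 10] = False          # internal wall: the two halves connect only through rows 0-4
landmarks = [(15, 5), (15, 15), (2, 2)]
displacements = [[1.0, 0.0], [-1.0, 0.0], [0.0, 0.5]]
field = geodesic_rbf_field(mask, landmarks, displacements)
```

Because the kernel matrix is built from the same geodesic distances, the interpolated field reproduces the prescribed displacement exactly at each landmark.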

Mapping a different data modality to the reference

When the new data are of a different modality from the original template, mapping requires the identification of correspondences between data types (Figure 4B). For example, we recently introduced the use of canonical correlation analysis (Butler et al., 2018), or the identification of mutual nearest neighbors (Haghverdi et al., 2018), to identify correspondences (‘alignments’) between scRNA-seq datasets produced across different technologies. These methods demonstrate how identifying conserved patterns of covariation enables new data to be mapped (or ‘aligned’) onto an existing reference. Extensions of this approach, such as the use of joint non-negative matrix factorization (Welch et al., 2019), can be applied to identify correspondences between scRNA-seq and in situ transcriptomics datasets (Box 1). These have been successfully applied to mapping cell types and interpolating transcriptome-wide expression patterns in the mammalian cortex. While these approaches rely on shared variables (here, genes), multi-domain translation methods can relate data types even without any shared variables (Yang and Uhler, 2019), based on the assumption that data from different domains are generated from a shared latent representation (here, the same tissue).
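The mutual-nearest-neighbor idea can be sketched in a few lines: two cells, one from each dataset, are paired only if each is among the other's k nearest neighbors in the shared gene space. This sketch covers only the pairing step, under the assumption of cosine-normalized expression over the same genes; the full methods cited above go on to use such anchors to correct or integrate the datasets.

```python
import numpy as np

def mutual_nearest_neighbors(ref, query, k=5):
    """
    ref: (n_ref, n_genes) and query: (n_query, n_genes), measured over the SAME genes.
    Returns (ref_index, query_index) pairs that are within each other's k nearest neighbors.
    Rows must have non-zero norm.
    """
    def cosine_normalize(x):
        x = np.asarray(x, dtype=float)
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    r, q = cosine_normalize(ref), cosine_normalize(query)
    d2 = ((r[:, None, :] - q[None, :, :]) ** 2).sum(axis=-1)        # (n_ref, n_query)
    ref_to_query = np.argsort(d2, axis=1)[:, :min(k, q.shape[0])]   # query neighbors of each ref cell
    query_to_ref = np.argsort(d2, axis=0)[:min(k, r.shape[0]), :]   # ref neighbors of each query cell
    return [(i, int(j)) for i in range(r.shape[0]) for j in ref_to_query[i]
            if i in query_to_ref[:, j]]

ref = np.array([[10, 0, 0], [9, 1, 0], [0, 0, 10], [0, 1, 9]], dtype=float)
query = np.array([[20, 1, 0], [0, 2, 18]], dtype=float)   # same genes, different sequencing depth
pairs = mutual_nearest_neighbors(ref, query, k=1)
```

The query profiles differ in overall depth from the reference, yet cosine normalization lets the matching rely on relative expression, the kind of conserved covariation these methods exploit.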

While these approaches do not assume gene expression ‘smoothness’, they are limited by the resolution with which a cell’s location is encoded in its gene expression. This limitation reveals a key challenge for ongoing human atlas projects, where the spatial mapping of molecular datasets, including single-cell RNA-seq, is a primary goal. However, in most cases, a cell’s precise spatial location may not be uniquely encoded in its molecular make-up. In this setting, approaches that provide a measure of confidence in the mapping will be extremely important. In particular, even without a precise mapping, knowing the probable locations to which a cell maps is still extremely useful.
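Given a vector of mapping probabilities for one cell, two simple confidence summaries are its normalized entropy and the smallest set of locations that together hold most of the probability mass. A minimal sketch; the 90% mass threshold is an illustrative assumption.

```python
import numpy as np

def mapping_confidence(prob, mass=0.9):
    """
    prob: (n_locations,) mapping probabilities for one cell, summing to 1.
    Returns (normalized entropy in [0, 1], smallest list of locations holding >= `mass` probability).
    """
    p = np.asarray(prob, dtype=float)
    nz = p[p > 0]
    entropy = float(-(nz * np.log(nz)).sum() / np.log(len(p))) if len(p) > 1 else 0.0
    order = np.argsort(p)[::-1]                                   # most probable first
    n_keep = int(np.searchsorted(np.cumsum(p[order]), mass)) + 1  # first index reaching `mass`
    return entropy, order[:n_keep].tolist()

uniform = mapping_confidence([0.25, 0.25, 0.25, 0.25])
certain = mapping_confidence([1.0, 0.0, 0.0, 0.0])
two_peaks = mapping_confidence([0.5, 0.45, 0.05, 0.0])
```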

One potential approach to further improve this mapping is for the tissue collector to note the location of the tissue sample with respect to pre-defined landmarks (i.e., in landmark-based coordinates) when the original tissue section is taken. This will enable the dataset to be inherently mapped to a plausible set or range of positions based on the metadata itself (i.e., within a statistical context it will act as prior information). While this will represent a relatively coarse level of resolution (for example, each sequenced cell within a sample will be assigned the same metadata, and therefore the same spatial location), this initial mapping can be performed reliably and consistently across samples. In conjunction with the molecular data, this has the potential to yield an improved mapping. Moreover, in this setting, groups focused on mapping can succeed by defining, in collaboration with tissue collectors, a hierarchical series of stereotactic coordinate systems (landmarks, axes, and transformations).
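Within a statistical framing, the collector's metadata acts as a prior that is multiplied by the molecular likelihood and renormalized (Bayes' rule). A minimal sketch, assuming discrete candidate locations each tagged with a coarse region label; the `leak` parameter, which keeps a small probability outside the recorded region in case of annotation error, is a hypothetical choice.

```python
import numpy as np

def prior_from_metadata(location_regions, sampled_region, leak=0.01):
    """Hypothetical prior: most mass on locations inside the region recorded at collection."""
    inside = np.asarray(location_regions) == sampled_region
    prior = np.where(inside, 1.0, leak)
    return prior / prior.sum()

def posterior_location(likelihood, prior):
    """Bayes' rule: (n_cells, n_locations) molecular likelihoods times a shared prior, renormalized."""
    post = np.asarray(likelihood, dtype=float) * np.asarray(prior, dtype=float)[None, :]
    return post / post.sum(axis=1, keepdims=True)

regions = ["upper", "upper", "lower", "lower"]      # coarse region label of each candidate location
likelihood = np.array([[0.05, 0.45, 0.45, 0.05]])   # molecular data alone: locations 1 and 2 tie
post = posterior_location(likelihood, prior_from_metadata(regions, "upper"))
```

The molecular likelihood alone cannot distinguish locations 1 and 2; the coarse metadata resolves the tie in favor of the region the sample was taken from.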

The promise of a CCF: a unified map for querying data and exploring inter-individual variation

Though previous atlas efforts have not scaled up to an entire organism, they demonstrate how a CCF enables consistent organization and referencing of diverse data types. For example, the Allen Brain Atlas originally represented a comprehensive spatial map of gene expression, but this framework has now been used to reference additional data types, including neural projections and single-cell RNA-seq profiles, with online tools for exploratory visualization and queries. Similarly, the Mouse Embryology Atlas (Armit et al., 2015) now represents a broad resource for developmental biology, with associated browsers enabling 3D visualization of gene expression, anatomical components, and histological features. A human CCF will enable similar resources for diverse data types, ranging across both molecular and histological scales, thereby playing a role as impactful as that of pioneering genomic data browsers.

A CCF can also be used for precise exploration of anatomical variation across individuals. The ‘spatial normalization’ inherent to CCF registration corrects for technical variation in data acquisition, alongside global variation in organ size, and is an essential prerequisite for these comparisons. For example, the Cardiac Atlas Project (Fonseca et al., 2011) has constructed a set of 4D atlases to track inter-individual variation in cardiac morphology over space and time. They apply Principal or Independent Component Analysis to identify the primary modes of variation across samples, and relate these findings to clinical data. In a series of studies (reviewed in Young and Frangi, 2009), these modes of variation have been associated not only with the normal changes in the shape of the heart wall between end-diastole and end-systole, but also with phenotypic metadata, including diastolic dysfunction and abnormal myocardial contraction.
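The modes-of-variation analysis can be sketched as PCA on landmark shapes that have already been spatially normalized into the CCF. A minimal sketch using NumPy's SVD, assuming corresponding landmarks across subjects; the synthetic cohort is hypothetical, and the per-subject scores along each mode are what would then be related to clinical variables.

```python
import numpy as np

def shape_modes(aligned_shapes, n_modes=2):
    """
    aligned_shapes: (n_subjects, n_points, d) landmarks already normalized into the common frame.
    Returns the mean shape (flattened), the first n_modes principal directions,
    the fraction of total variance each explains, and each subject's score on each mode.
    """
    X = np.asarray(aligned_shapes, dtype=float).reshape(len(aligned_shapes), -1)
    mean = X.mean(axis=0)
    centered = X - mean
    _, s, vt = np.linalg.svd(centered, full_matrices=False)   # rows of vt are principal directions
    explained = s ** 2 / np.sum(s ** 2)
    modes = vt[:n_modes]
    return mean, modes, explained[:n_modes], centered @ modes.T

# Synthetic cohort: every subject is the same square stretched by a subject-specific amount.
base = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
stretch = np.array([[0, 0], [0.2, 0], [0.2, 0.1], [0, 0.1]])
t = np.array([-1.0, 0.0, 1.0, 2.0])
mean, modes, explained, scores = shape_modes([base + ti * stretch for ti in t])
```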

Similar efforts have been applied across abdominal tissues (Reyes et al., 2009), multiple brain regions (Mazziotta et al., 2001; Habas et al., 2010; Pauli, Nili and Tyszka, 2018) and the lung (Hame et al., 2014; Yang et al., 2017). Examining these measures of variability within current human atlas efforts provides an opportunity to relate morphological and molecular variation. For example, it has not previously been possible to comprehensively associate changes in tissue and organ structure with quantitative changes in cell type composition or gene expression; paired spatial and molecular data could reveal such associations.

Conclusion

A Common Coordinate Framework is a prerequisite for constructing a detailed molecular atlas of the human body. From a practical perspective, it will be needed to rigorously combine information from multiple samples when constructing the atlas. However, once built, it will allow patterns of heterogeneity within organs to be studied at unprecedented depth. For example, new gradients of gene expression can be identified within cell types and at cell type boundaries, as can similarities in cell type organization across organs. We have highlighted key challenges associated with CCF construction, in particular the extensive inter-individual variation inherent to human anatomy, alongside the challenge of mapping data at multiple anatomical scales. The latter challenge suggests that a single CCF with one strategy and one level of resolution may be insufficient to represent all biomedical data. A hierarchical organization of coordinate systems (an ‘atlas of atlases’) represents an attractive alternative, with each layer generated using different data types and capable of modeling variation at distinct scales. Crucially, a CCF will facilitate mapping between layers. Together, the Common Coordinate Framework and new experimental and computational approaches, including those that learn maps directly from data, have the potential to move us from a census of cell types and imaged spatial patterns to a fully-formed map, with profound implications for our understanding of biology in both normal development and disease.

Acknowledgements

This publication is part of the Human Cell Atlas. We gratefully acknowledge Richard Conroy, Ajay Pillai, Zorina Galis, and Katy Borner for generous feedback and discussion. This work was supported by the Human Biomolecular Atlas Project (NIH 1OT2OD026673-01), NIH New Innovator Award (1DP2HG009623-01), the Chan Zuckerberg Initiative (HCA2-A-1708-02755) and an NSF Graduate Fellowship (DGE1342536; A.B.). AR was additionally supported by the NIH BRAIN Initiative, Howard Hughes Medical Institute, and the Klarman Cell Observatory. S.G. is supported by a Royal Society Newton International Fellowship (NIF\R1\181950); E.F. is supported by the Wellcome Trust Mathematical Genomics and Medicine PhD programme (WT/215183/Z/19/Z). J.C.M. acknowledges core support from EMBL and from Cancer Research UK (C9545/A29580).

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Conflict of Interest

AR is a founder and equity holder of Celsius Therapeutics and an SAB member of ThermoFisher Scientific, Neogene Therapeutics, and Syros Pharmaceuticals.

References