Common genetic variation drives molecular heterogeneity in human iPSCs
Accession codes
Primary accessions
ArrayExpress
European Nucleotide Archive
References
- Sterneckert, J. L., Reinhardt, P. & Schöler, H. R. Investigating human disease using stem cell models. Nat. Rev. Genet. 15, 625–639 (2014)
- Kim, K. et al. Epigenetic memory in induced pluripotent stem cells. Nature 467, 285–290 (2010)
- Kim, K. et al. Donor cell type can influence the epigenome and differentiation potential of human induced pluripotent stem cells. Nat. Biotechnol. 29, 1117–1119 (2011)
- Lister, R. et al. Hotspots of aberrant epigenomic reprogramming in human induced pluripotent stem cells. Nature 471, 68–73 (2011)
- Nazor, K. L. et al. Recurrent variations in DNA methylation in human pluripotent stem cells and their differentiated derivatives. Cell Stem Cell 10, 620–634 (2012)
- Rouhani, F. et al. Genetic background drives transcriptional variation in human induced pluripotent stem cells. PLoS Genet. 10, e1004432 (2014)
- Burrows, C. K. et al. Genetic variation, not cell type of origin, underlies the majority of identifiable regulatory differences in iPSCs. PLoS Genet. 12, e1005793 (2016)
- Vallier, L. et al. Signaling pathways controlling pluripotency and early cell fate decisions of human induced pluripotent stem cells. Stem Cells 27, 2655–2666 (2009)
- Müller, F. J. et al. A bioinformatic assay for pluripotency in human cells. Nat. Methods 8, 315–317 (2011)
- Danecek, P., McCarthy, S. A., HipSci Consortium & Durbin, R. A method for checking genomic integrity in cultured cell lines from SNP genotyping data. PLoS ONE 11, e0155014 (2016)
- Laurent, L. C. et al. Dynamic changes in the copy number of pluripotency and cell proliferation genes in human ESCs and iPSCs during reprogramming and time in culture. Cell Stem Cell 8, 106–118 (2011)
- The International Stem Cell Initiative. Screening ethnically diverse human embryonic stem cells identifies a chromosome 20 minimal amplicon conferring growth advantage. Nat. Biotechnol. 29, 1132–1144 (2011)
- Abyzov, A. et al. Somatic copy number mosaicism in human skin revealed by induced pluripotent stem cells. Nature 492, 438–442 (2012)
- Mayshar, Y. et al. Identification and classification of chromosomal aberrations in human induced pluripotent stem cells. Cell Stem Cell 7, 521–531 (2010)
- Taapken, S. M. et al. Karotypic abnormalities in human induced pluripotent stem cells and embryonic stem cells. Nat. Biotechnol. 29, 313–314 (2011)
- Hussein, S. M. et al. Copy number variation and selection during reprogramming to pluripotency. Nature 471, 58–62 (2011)
- Laurin, M. & Côté, J. F. Insights into the biological functions of Dock family guanine nucleotide exchange factors. Genes Dev. 28, 533–547 (2014)
- Zhang, X. et al. FATS is a transcriptional target of p53 and associated with antitumor activity. Mol. Cancer 9, 244 (2010)
- Lo, J. Y., Chou, Y. T., Lai, F. J. & Hsu, L. J. Regulation of cell signaling and apoptosis by tumor suppressor WWOX. Exp. Biol. Med. 240, 383–391 (2015)
- Futreal, P. A. et al. A census of human cancer genes. Nat. Rev. Cancer 4, 177–183 (2004)
- Duckett, C. S. et al. A conserved family of cellular genes related to the baculovirus iap gene and encoding apoptosis inhibitors. EMBO J. 15, 2685–2694 (1996)
- Chia, N. Y. et al. A genome-wide RNAi screen reveals determinants of human embryonic stem cell identity. Nature 468, 316–320 (2010)
- Belinky, F. et al. PathCards: multi-source consolidation of human biological pathways. Database (Oxford) 2015, bav006 (2015)
- GTEx Consortium. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660 (2015)
- Grundberg, E. et al. Mapping _cis_- and _trans_-regulatory effects across multiple tissues in twins. Nat. Genet. 44, 1084–1089 (2012)
- Kundaje, A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015)
- Xu, H. et al. ESCAPE: database for integrating high-content published data collected from human and mouse embryonic stem cells. Database (Oxford) 2013, bat045 (2013)
- Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014)
- Dubois, P. C. et al. Multiple common variants for celiac disease influencing immune gene expression. Nat. Genet. 42, 295–302 (2010)
- Stranger, B. E. et al. Population genomics of human gene expression. Nat. Genet. 39, 1217–1224 (2007)
- Zeller, T. et al. Genetics and beyond—the transcriptome of human monocytes and disease susceptibility. PLoS ONE 5, e10693 (2010)
- Purrington, K. S. et al. Genome-wide association study identifies 25 known breast cancer susceptibility loci as risk factors for triple-negative breast cancer. Carcinogenesis 35, 1012–1019 (2014)
- Garcia-Closas, M. et al. Genome-wide association studies identify four ER negative-specific breast cancer risk loci. Nat. Genet. 45, 392–398 (2013)
- Wang, Z. et al. Imputation and subset-based association analysis across different cancer types identifies multiple independent risk loci in the _TERT_–_CLPTM1L_ region on chromosome 5p15.33. Hum. Mol. Genet. 23, 6616–6633 (2014)
- Li, Q. et al. Integrative eQTL-based analyses reveal the biology of breast cancer risk loci. Cell 152, 633–641 (2013)
- Ongen, H. et al. Putative _cis_-regulatory drivers in colorectal cancer. Nature 512, 87–90 (2014)
- Chen, Q. R., Hu, Y., Yan, C., Buetow, K. & Meerzaman, D. Systematic genetic analysis identifies _cis_-eQTL target genes associated with glioblastoma patient survival. PLoS ONE 9, e105393 (2014)
- Bojesen, S. E. et al. Multiple independent variants at the TERT locus are associated with telomere length and risks of breast and ovarian cancer. Nat. Genet. 45, 371–384 (2013)
- Chiba, K. et al. Cancer-associated TERT promoter mutations abrogate telomerase silencing. eLife 4, e07918 (2015)
- Kyttälä, A. et al. Genetic Variability Overrides the Impact of Parental Cell Type and Determines iPSC Differentiation Potential. Stem Cell Reports 6, 200–212 (2016)
- Kajiwara, M. et al. Donor-dependent variations in hepatic differentiation from human-induced pluripotent stem cells. Proc. Natl Acad. Sci. USA 109, 12538–12543 (2012)
- Choi, J. et al. A comparison of genetically matched cell lines reveals the equivalence of human iPSCs and ESCs. Nat. Biotechnol. 33, 1173–1181 (2015)
- Gerrits, A. et al. Expression quantitative trait loci are highly sensitive to cellular differentiation state. PLoS Genet. 5, e1000692 (2009)
- Spies, N. et al. Constraint and divergence of global gene expression in the mammalian embryo. eLife 4, e05538 (2015)
- Cannavò, E. et al. Genetic variants regulating expression levels and isoform diversity during embryogenesis. Nature 541, 402–406 (2017)
- Carcamo-Orive, L. et al. Analysis of transcriptional variability in large human iPSC library reveals genetic and non-genetic determinants of heterogeneity. Cell Stem Cell 20, 518–532 (2017)
- DeBoever, C. et al. Large-scale profiling reveals the influence of genetic variation on gene expression in human induced pluripotent stem cells. Cell Stem Cell 20, 533–546 (2017)
- Kim, N. W. et al. Specific association of human telomerase activity with immortal cells and cancer. Science 266, 2011–2015 (1994)
- Kelly, L. M. & Gilliland, D. G. Genetics of myeloid leukemias. Annu. Rev. Genomics Hum. Genet. 3, 179–198 (2002)
- ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012)
- Takahashi, K. & Yamanaka, S. Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell 126, 663–676 (2006)
- The UK10K Consortium. The UK10K project identifies rare variants in health and disease. Nature 526, 82–90 (2015)
- The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015)
- Howie, B. N., Donnelly, P. & Marchini, J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 5, e1000529 (2009)
- Delaneau, O., Marchini, J. & Zagury, J. F. A linear complexity phasing method for thousands of genomes. Nat. Methods 9, 179–181 (2011)
- Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009)
- Harrow, J. et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 22, 1760–1774 (2012)
- Huber, W., von Heydebreck, A., Sültmann, H., Poustka, A. & Vingron, M. Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics 18 (Suppl 1), S96–S104 (2002)
- Vallier, L. et al. Early cell fate decisions of human embryonic stem cells and mouse epiblast stem cells are controlled by the same signalling pathways. PLoS ONE 4, e6082 (2009)
- Ly, T. et al. A proteomic chronology of gene expression through the cell cycle in human myeloid leukemia cells. eLife 3, e01630 (2014)
- Bensaddek, D. et al. Micro-proteomics with iterative data analysis: proteome analysis in C. elegans at the single worm level. Proteomics 16, 381–392 (2016)
- Cox, J. & Mann, M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 26, 1367–1372 (2008)
- Aryee, M. J. et al. Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics 30, 1363–1369 (2014)
- Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013)
- Anders, S., Pyl, P. T. & Huber, W. HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169 (2015)
- Anders, S. & Huber, W. Differential expression analysis for sequence count data. Genome Biol. 11, R106 (2010)
- DeLuca, D. S. et al. RNA-SeQC: RNA-seq metrics for quality control and process optimization. Bioinformatics 28, 1530–1532 (2012)
- Leha, A. et al. A high-content platform to characterise human induced pluripotent stem cell lines. Methods 96, 85–96 (2016)
- Zack, T. I. et al. Pan-cancer patterns of somatic copy number alteration. Nat. Genet. 45, 1134–1140 (2013)
- Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005)
- Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011)
- Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015)
- Stegle, O., Parts, L., Durbin, R. & Winn, J. A Bayesian framework to account for complex non-genetic factors in gene expression levels greatly increases power in eQTL studies. PLOS Comput. Biol. 6, e1000770 (2010)
- Lippert, C., Casale, F. P., Rakitsch, B. & Stegle, O. LIMIX: genetic analysis of multiple traits. Preprint at http://biorxiv.org/content/early/2014/05/21/003905 (2014)
- Casale, F. P., Rakitsch, B., Lippert, C. & Stegle, O. Efficient set tests for the genetic analysis of correlated traits. Nat. Methods 12, 755–758 (2015)
- Storey, J. D. & Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA 100, 9440–9445 (2003)
- Pers, T. H., Timshel, P. & Hirschhorn, J. N. SNPsnap: a web-based tool for identification and annotation of matched SNPs. Bioinformatics 31, 418–420 (2015)
- Lawrence, M., Gentleman, R. & Carey, V. rtracklayer: an R package for interfacing with genome browsers. Bioinformatics 25, 1841–1842 (2009)
- Farh, K. K. et al. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature 518, 337–343 (2015)
- Lambert, J. C. et al. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease. Nat. Genet. 45, 1452–1458 (2013)
- Trynka, G. et al. Dense genotyping identifies and localizes multiple common and rare variant association signals in celiac disease. Nat. Genet. 43, 1193–1201 (2011)
- Liu, J. Z. et al. Association analyses identify 38 susceptibility loci for inflammatory bowel disease and highlight shared genetic risk across populations. Nat. Genet. 47, 979–986 (2015)
- International Multiple Sclerosis Genetics Consortium (IMSGC). Analysis of immune-related loci identifies 48 new susceptibility variants for multiple sclerosis. Nat. Genet. 45, 1353–1360 (2013)
- Faraco, J. et al. ImmunoChip study implicates antigen presentation to T cells in narcolepsy. PLoS Genet. 9, e1003270 (2013)
- Cordell, H. J. et al. International genome-wide meta-analysis identifies new primary biliary cirrhosis risk loci and targetable pathogenic pathways. Nat. Commun. 6, 8019 (2015)
- Tsoi, L. C. et al. Identification of 15 new psoriasis susceptibility loci highlights the role of innate immunity. Nat. Genet. 44, 1341–1348 (2012)
- Okada, Y. et al. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature 506, 376–381 (2014)
- Schizophrenia Working Group of the Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014)
- Bentham, J. et al. Genetic association analyses implicate aberrant regulation of innate and adaptive immunity genes in the pathogenesis of systemic lupus erythematosus. Nat. Genet. 47, 1457–1464 (2015)
- Onengut-Gumuscu, S. et al. Fine mapping of type 1 diabetes susceptibility loci and evidence for colocalization of causal variants with lymphoid gene enhancers. Nat. Genet. 47, 381–386 (2015)
- Morris, A. P. et al. Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nat. Genet. 44, 981–990 (2012)
- Li, Y. I. et al. RNA splicing is a primary link between genetic variation and disease. Science 352, 600–604 (2016)
- Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010)
Acknowledgements
This work was funded with a strategic award from the Wellcome Trust and UK Medical Research Council (WT098503). We thank the staff in the Cellular Genetics and Phenotyping and Sequencing core facilities at the Wellcome Trust Sanger Institute. Work at the Wellcome Trust Sanger Institute was further supported by Wellcome Trust grant WT090851. H.K. is supported by an MRC eMedLab Medical Bioinformatics career development award from the UK Medical Research Council (MR/L016311/1). F.M.W. acknowledges financial support from the Department of Health via the NIHR Biomedical Research Centre award to Guy’s & St Thomas’ National Health Service Foundation Trust in partnership with King’s College London and King’s College Hospital NHS Foundation Trust. We acknowledge the participation of all NIHR Cambridge BioResource volunteers, and thank the NIHR Cambridge BioResource centre staff for their contribution. We thank the National Institute for Health Research and NHS Blood and Transplant. The NIHR/Wellcome Trust Cambridge Clinical Research Facility supported the volunteer recruitment. We acknowledge Life Science Technologies Corporation as the provider of Cytotune. We thank F.-J. Müller for insights regarding the PluriTest method, and the GTEx consortium for making raw data and intermediate results available.
Author information
Author notes
- Helena Kilpinen
Present address: UCL Great Ormond Street Institute of Child Health, University College London, London, WC1N 1EH, UK
- Andreas Leha
Present address: Department of Medical Statistics, University Medical Center Göttingen, Humboldtallee 32, 37073, Göttingen, Germany
- Helena Kilpinen, Angela Goncalves, Oliver Stegle and Daniel J. Gaffney: These authors contributed equally to this work.
- Fiona M. Watt, Richard Durbin, Oliver Stegle and Daniel J. Gaffney: These authors jointly supervised this work.
Authors and Affiliations
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
Helena Kilpinen, Francesco Paolo Casale, Adam Faulconbridge, Peter W. Harrison, Davis McCarthy, Ian Streeter, Laura Clarke, Ewan Birney & Oliver Stegle
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, Cambridge, UK
Angela Goncalves, Andreas Leha, Kaur Alasoo, Sendu Bala, Petr Danecek, Shane A. McCarthy, Yasin Memari, Alice Mann, Chukwuma A. Agu, Alex Alderton, Rachel Nelson, Sarah Harper, Minal Patel, Alistair White, Sharad R. Patel, Reena Halai, Christopher M. Kirton, Anja Kolb-Kokocinski, Willem H. Ouwehand, Ludovic Vallier, Richard Durbin & Daniel J. Gaffney
- Centre for Gene Regulation & Expression, School of Life Sciences, University of Dundee, Dundee, DD1 5EH, UK
Vackar Afzal, Dalila Bensaddek & Angus I. Lamond
- Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge, CB2 0XY, UK
Sofie Ashford & Willem H. Ouwehand
- Centre for Stem Cells & Regenerative Medicine, King’s College London, Tower Wing, Guy’s Hospital, Great Maze Pond, London, SE1 9RT, UK
Oliver J. Culley, Annie Kathuria, Ruta Meleckyte, Nathalie Moens, Davide Danovi & Fiona M. Watt
- St Vincent’s Institute of Medical Research, 41 Victoria Parade, Fitzroy, 3065, Victoria, Australia
Davis McCarthy
- Department of Surgery, Wellcome Trust and MRC Cambridge Stem Cell Institute and Biomedical Research Centre, Anne McLaren Laboratory, University of Cambridge, Cambridge, CB2 0SZ, UK
Filipa Soares & Ludovic Vallier
- UCL Great Ormond Street Institute of Child Health, University College London, London, WC1N 1EH, UK
Philip Beales
- NHS Blood and Transplant, Cambridge Biomedical Campus, Cambridge, CB2 0PT, UK
Willem H. Ouwehand
Authors
- Helena Kilpinen
- Angela Goncalves
- Andreas Leha
- Vackar Afzal
- Kaur Alasoo
- Sofie Ashford
- Sendu Bala
- Dalila Bensaddek
- Francesco Paolo Casale
- Oliver J. Culley
- Petr Danecek
- Adam Faulconbridge
- Peter W. Harrison
- Annie Kathuria
- Davis McCarthy
- Shane A. McCarthy
- Ruta Meleckyte
- Yasin Memari
- Nathalie Moens
- Filipa Soares
- Alice Mann
- Ian Streeter
- Chukwuma A. Agu
- Alex Alderton
- Rachel Nelson
- Sarah Harper
- Minal Patel
- Alistair White
- Sharad R. Patel
- Laura Clarke
- Reena Halai
- Christopher M. Kirton
- Anja Kolb-Kokocinski
- Philip Beales
- Ewan Birney
- Davide Danovi
- Angus I. Lamond
- Willem H. Ouwehand
- Ludovic Vallier
- Fiona M. Watt
- Richard Durbin
- Oliver Stegle
- Daniel J. Gaffney
Contributions
H.K., A.G., O.S. and D.J.G. wrote the paper with input from all authors. H.K., A.G., D.B., Y.M., I.S., P.D., D.M., A.A., M.P., A.M., D.D., A.I.L., O.S. and D.G. contributed to the Supplementary Information. H.K., A.G., A.L., F.P.C., P.D., D.M., K.A. and D.D. analysed the data. S.A. and W.H.O. managed and supervised collection of research volunteer samples. F.S., C.A.A., A.A., R.N., S.H., M.P., S.R.P., A.W. and C.M.K. generated iPS cell lines, tier 1 assay data, cell growth data, RNA-seq and methylation data. V.A. and D.B. generated and processed the proteomics data. A.L., O.J.C., R.M., N.M. and D.D. generated and processed the high-content cellular imaging data. S.A.M., S.B. and Y.M. carried out initial data quality control and bioinformatics processing/pipelines. A.F., P.W.H., I.S. and L.C. curated and managed data and the project website. R.H. and A.K.-K. coordinated the project. D.D., P.B., W.H.O., E.B., L.V., A.I.L., F.M.W., R.D., O.S. and D.G. supervised and designed the research. H.K. and A.G. contributed equally to this work; O.S. and D.J.G. contributed equally to this work.
Corresponding authors
Correspondence to Fiona M. Watt, Richard Durbin, Oliver Stegle or Daniel J. Gaffney.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Additional information
Reviewer Information Nature thanks E. Dermitzakis, S. Montgomery and the other anonymous reviewer(s) for their contribution to the peer review of this work.
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Figure 1 Overview of the cellomics assay.
a, Example plate layout for the cellular differentiation assay. Images are shown for the pluripotency markers (OCT4, SOX2 and NANOG) as they are measured in the Cellomics imaging device. Each line is measured in two rows of the same plate as technical replicates. The secondary antibody used for each marker is shown in parentheses (aG, anti-goat antibody; aM, anti-mouse antibody). Each plate also has measurements for staining with the secondary antibody only, which serves as a means to assess background fluorescence. The red channel shows the signal from the DAPI staining, the green channel the marker signal. As expected, there is only a small signal from the green channel in the wells stained for the secondary antibody only. Image acquisition stops as soon as 10,000 cells have been detected. b, Detailed variance components of the cellomics markers (Methods). Substantial proportions of the marker variance could be attributed to batch factors, including staining, technician effects and antibody lots. These effects mean that the fraction of cells expressing particular markers needs to be interpreted with caution (Fig. 1c, d). c, Pairwise Pearson’s correlation (r) between quantitative expression scores derived from immunostaining for pluripotency and differentiation and the PluriTest score (P values from a Student’s _t_-test).
Extended Data Figure 2 PluriTest scores in the two culture conditions.
a–c, Comparison of PluriTest novelty score versus pluripotency score for the 711 lines generated. Lines grown on feeder-free conditions (E8 medium) scored systematically lower than feeder-dependent lines (P = 1.62 × 10^−43; _t_-test, for pluripotency score). We note that, while we cannot rule out that feeder-free lines are less pluripotent, feeder-free conditions are not well represented in the PluriTest training dataset, which may explain this result (of the 204 ES cell/iPS cell lines in the PluriTest paper that have medium metadata available, none were cultured in E8 and only 37 were cultured in a variety of other feeder-free formulations such as MTSER). d, Despite lower pluripotency scores, lines grown on feeder-free conditions have higher fractions of cells expressing canonical protein markers of pluripotency.
Extended Data Figure 3 Extended CNA analysis.
Relationship between the number of CNAs, using three CNA minimum length thresholds for calling CNAs: 200 kb, 500 kb and 1,000 kb and other experimental factors. Values on the x axis have been ‘jittered’ (that is, small random ‘noise’ has been added to the true values) to enhance the visualization. Data points underlying the box plots are shown as semi-transparent blue dots. a, Number of CNAs per line versus passage number. P values are shown from a generalized linear mixed model (Poisson regression) with donor random effect. b, Box plot of the number of autosomal CNAs per line versus growth medium. P values are for a Poisson regression for culture condition. c, d, Number of autosomal CNAs per line versus PluriTest pluripotency and novelty scores. P values are for a linear mixed model of the number of autosomal CNAs per line with a random effect for donor. e, f, Number of CNA counts per donor versus gender and donor age. CNA counts refer to the total number of unique CNAs across all lines derived from the same donor. CNAs that are shared between lines of the same donor (overlap by at least one base) are counted only once. P values are shown for a Poisson regression for either gender or age.
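The per-donor counting rule in panels e and f (CNAs that overlap by at least one base across lines of the same donor are counted once) can be illustrated with a small sketch. This is not the authors' pipeline: the tuple layout, the 1-based inclusive coordinates, the transitive merging of overlapping calls and the function name `count_unique_cnas` are all assumptions made for illustration, and the sketch does not distinguish gains from losses.

```python
from collections import defaultdict


def count_unique_cnas(cnas):
    """Count unique CNAs per donor, counting overlapping calls only once.

    `cnas` is an iterable of (donor, line, chrom, start, end) tuples with
    1-based inclusive coordinates (a hypothetical layout, assumed here).
    Calls from the same donor and chromosome that share at least one base
    are merged transitively into a single CNA.
    """
    intervals_by_region = defaultdict(list)
    for donor, _line, chrom, start, end in cnas:
        intervals_by_region[(donor, chrom)].append((start, end))

    counts = defaultdict(int)
    for (donor, _chrom), intervals in intervals_by_region.items():
        intervals.sort()
        current_end = None
        for start, end in intervals:
            if current_end is None or start > current_end:
                # No shared base with the running cluster: a new unique CNA.
                counts[donor] += 1
                current_end = end
            else:
                # Overlaps the running cluster: extend it, do not count again.
                current_end = max(current_end, end)
    return dict(counts)


# Toy calls (invented values, not data from the study).
calls = [
    ("donorA", "line1", "chr20", 100, 500),
    ("donorA", "line2", "chr20", 500, 900),    # shares base 500 with line1
    ("donorA", "line2", "chr1", 10, 50),
    ("donorB", "line1", "chr17", 1000, 2000),
    ("donorB", "line2", "chr17", 2001, 3000),  # adjacent, no shared base
]
print(count_unique_cnas(calls))
```

Under these assumptions, the two chr20 calls for donorA collapse into one CNA, whereas the adjacent chr17 calls for donorB stay separate because they share no base.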
Extended Data Figure 4 Location and consequence of the recurrent CNA on chr20.
Related to Fig. 2. Top, genomic location versus number of lines with copy number 3 (grey) and with a CNA (black). Bottom, the _N_AV gene score from ref. 22 and log2 gene expression fold change between the iPS cell lines with copy number 2 and 3a (colour scale), in the region highlighted in red in the top panel. Highlighted genes are upregulated when copy number increases, known oncogenes/tumour-suppressor genes and/or genes with a _N_AV score in the top 2%.
Extended Data Figure 5 Functional assessment of CNAs using growth assays.
a–c, Cell growth rate (a), proliferation (b) and apoptosis (c) in cell lines with copy number 2 (wild type, blue dots) or copy number 3 (mutant, red dots) in a recurrently duplicated region in iPS cells on chromosome 1, 17 or 20. Plot titles show the donor name and the genomic coordinates of the CNA. a, Cell counts taken on successive days in culture, for pairs of lines (one mutant, one wild type) grown on the same 24-well plates are shown. Asterisks denote significance levels for statistical interactions between day and copy number in a linear mixed model, using fixed effects to fit day and copy number, and random effects to account for culture plate effects. EIF4A3 denotes whether a copy number variant overlaps one of the suspected candidate genes on chromosome 17. *P < 0.05; **P < 0.01; ***P < 0.001. b, Protein expression level measured using TMT-based quantification using the Q-exactive plus (labelled QE Plus) orbitrap and a fusion (labelled Fusion) orbitrap mass spectrometry platforms. c, Estimated fraction of fluorescing nuclei following an EdU assay in mutant and wild-type lines, following exposure to mitomycin (Treated) or in a control sample (Untreated). d, Estimated fraction of fluorescing nuclei following a terminal deoxynucleotidyl transferase dUTP nick end labelling (TUNEL) assay in mutant and wild-type lines, following exposure to mitomycin (Treated) or in a control sample (Untreated). Solid trend lines are least squares regression fits. P values in b and c denote the significance of statistical interactions between copy number and mitomycin treatment condition (treated or untreated).
Extended Data Figure 6 Effect of passage number on tier 1 and tier 2 data and overview of iPS cell cis eQTLs mapped with tier 1 gene expression array data.
a, b, Passage number versus PluriTest pluripotency and novelty scores shows no significant association between passage number and pluripotency. Trend lines are fit using a linear regression of the PluriTest scores on passage number (score P = 0.66, novelty P = 0.21). The association was also not significant when including gender and medium as fixed effects and batch variables and donor as random effects (score P = 0.3, novelty P = 0.14). c, Passage number versus log10 RNA-seq expression of pluripotency factors NANOG and OCT4 (POU5F1) shows no significant association between passage number and pluripotency. Trend lines are fit using linear regression of log10 expression on passage number (NANOG, P = 0.5; POU5F1, P = 0.15). The association was also not significant when considering the two genes together and when including gender and medium as fixed effects and batch variables and donor as random effects (passage, P = 0.28; passage–gene interaction, P = 0.96). d, e, Variance component analysis for tier 2 assays, showing that for the majority of genes gender and passage explained little of the total variance. f, g, Comparison of lead variant effect sizes (_β_2) in gexarray-based eQTL maps. The eQTL maps were derived by using mean expression levels per donor (‘main’ map) and with two sets of individual lines (one per donor), drawn randomly (‘replicate’ maps). The effect sizes for all tested genes are shown in black, with FDR < 5% eGenes from the main map indicated in blue. Effect sizes are compared between the two replicate maps (f, Spearman’s rank correlation ρ = 0.47 genome-wide, ρ = 0.80 eGenes only, both P < 2.2 × 10⁻¹⁶) and between the main map and one replicate map (g, Spearman’s rank correlation ρ = 0.57 genome-wide, ρ = 0.88 eGenes only, both P < 2.2 × 10⁻¹⁶). The effect sizes obtained using the mean expression values per donor are higher than when using individual lines.
h, Pairwise correlation between gene expression levels in iPS cells measured with RNA-seq and gexarray. Spearman rank correlation coefficients are shown for RNA-seq read counts based on either gene regions (pink) or gexarray probe regions (blue); the probe-based counts show the higher correlation.
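Panels f to h rely on Spearman's rank correlation. As a reading aid only, the following is a minimal Python sketch of that statistic, computed as the Pearson correlation of tie-averaged ranks. The function names and the example effect sizes are hypothetical and are not taken from the study's analysis code.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical lead-variant effect sizes from two replicate eQTL maps.
replicate_1 = [0.10, 0.45, -0.30, 0.80, 0.05]
replicate_2 = [0.12, 0.40, -0.25, 0.95, 0.20]
rho = spearman_rho(replicate_1, replicate_2)
```

Because only ranks enter the calculation, the statistic is insensitive to the overall scale of the effect sizes, which is convenient when comparing maps built from per-donor means and from individual lines.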
Extended Data Figure 7 Properties of iPS cell cis eQTLs in comparison to somatic eQTLs.
a, b, The power to detect eQTLs is plotted, comparing 44 somatic tissues from GTEx (V6p; ref. 24) and the HipSci RNA-seq-based eQTL map (purple triangle), considering either the absolute (a) or relative (b) number of eQTLs identified (eGenes, FDR < 5%). The major determinant of eQTL detection power is sample size. c, Cumulative fraction of RNA-seq reads relative to the number of protein-coding genes expressed. The mean read count is plotted for 20 iPS cell lines (10 donors, two lines each), five fibroblast lines and two ES cell (ESC) lines. In iPS cells, half of the reads are explained by the expression of 1,071 genes, whereas 75% and 90% of the reads are explained by the expression of 3,159 and 5,814 genes, respectively (total protein-coding genes with non-zero counts n = 17,332). d, Distribution of iPS cell eQTLs around the annotated gene start position. The −log10(eQTL P value) is plotted against the distance (in bp) from the gene start for lead eQTL variants genome-wide, highlighting significant eQTLs (FDR < 5%) in orange. e, Comparison of the magnitude of eQTL effect size (absolute beta; left) and minor allele frequency (MAF; right) between iPS-cell-specific (n = 2,131; labelled as S) and non-specific eQTLs (n = 4,500; labelled as NS), demonstrating that overall, iPS-cell-specific eQTLs have smaller effects on the transcriptome than eQTLs shared among multiple tissues (P = 9.97 × 10⁻¹⁶¹; Wilcoxon rank-sum test) and have a lower minor allele frequency (P = 1.08 × 10⁻³⁵, Wilcoxon rank-sum test).
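The cumulative read fraction in panel c can be illustrated with a short Python sketch. It assumes that genes are ranked from most to least expressed and counts how many are needed to reach a given share of all reads; the function name and the toy read counts are hypothetical.

```python
def genes_needed(read_counts, fraction):
    """Smallest number of top-expressed genes whose reads reach `fraction` of the total."""
    total = sum(read_counts)
    if total <= 0:
        raise ValueError("read_counts must contain at least one positive count")
    running = 0
    # Walk through genes from highest to lowest read count.
    for n_genes, count in enumerate(sorted(read_counts, reverse=True), start=1):
        running += count
        if running >= fraction * total:
            return n_genes
    return len(read_counts)


# Toy mean read counts for five genes (illustrative values only).
toy_counts = [520, 280, 110, 50, 40]
half = genes_needed(toy_counts, 0.5)
three_quarters = genes_needed(toy_counts, 0.75)
ninety_percent = genes_needed(toy_counts, 0.9)
```

Applied to real per-gene mean read counts, the same calculation would yield figures of the kind quoted in the legend (for example, the number of genes explaining half of the reads).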
Extended Data Figure 8 Comparison of eQTL mapping pipelines between HipSci and GTEx (V6p).
a, Proportion of tissue-specific eQTLs as a function of the discovery sample size. For iPS cells, the two sets of tissue-specific eQTLs obtained with the two different mapping pipelines (Methods) are shown, namely the standard HipSci pipeline (iPSC; purple triangle) and the alternative ‘GTEx-like’ pipeline (iPSC2; purple triangle). Points other than iPS cells are from the GTEx Consortium (44 somatic tissues and cell lines; ref. 24). b, Heat map of pairwise _π_1 values (_π_1 = 1 − _π_0) between iPS cells and GTEx tissues, with rows representing the discovery tissue and columns the replication tissue. Clustering of tissues is based on Euclidean distance (R hclust, method = average). c, Effect of eQTL replication threshold on the definition of tissue-specific effects. The replication profile of iPS cell eQTLs across GTEx tissues relative to discovery sample size in each replication tissue is shown. The proportion of lead eQTLs from iPS cells that replicate in each tissue is plotted, with replication defined using two different replication thresholds (TH1: nominal eQTL P < 0.01 / _n_tissues; TH5: P < 0.05 / _n_tissues; plotted as dots and triangles, respectively). d, Enrichment of alternative iPS cell eQTLs (GTEx-like) at proximal and distal (defined as less than or greater than 2 kb from the transcription start site) transcription factor binding sites of promoters in H1 hES cells from the ENCODE Project (ref. 50). Fold enrichments per factor are shown for iPS-cell-specific and non-specific eQTLs (minimum 10 observed overlaps) (Methods). Pluripotency-associated factors are indicated with an asterisk. The profile of enrichments is comparable to that obtained with the standard HipSci pipeline (Fig. 4d).
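The two replication thresholds in panel c divide a nominal significance level by the number of tissues. The Python sketch below shows that arithmetic and the resulting replication proportion for one tissue. The value of 44 tissues is an assumption taken from the GTEx tissue count mentioned in the legend, and the P values and function names are hypothetical.

```python
def replication_threshold(alpha, n_tissues):
    """Nominal significance level divided by the number of replication tissues."""
    return alpha / n_tissues


def replicating_fraction(p_values, alpha, n_tissues):
    """Fraction of lead eQTLs whose nominal P in one tissue falls below the threshold."""
    threshold = replication_threshold(alpha, n_tissues)
    return sum(p < threshold for p in p_values) / len(p_values)


N_TISSUES = 44  # assumption: the 44 GTEx tissues referred to in the legend

# Hypothetical nominal P values of four iPS cell lead eQTLs in one replication tissue.
p_in_tissue = [1e-6, 5e-4, 2e-3, 0.2]

th1 = replicating_fraction(p_in_tissue, 0.01, N_TISSUES)  # TH1: P < 0.01 / n_tissues
th5 = replicating_fraction(p_in_tissue, 0.05, N_TISSUES)  # TH5: P < 0.05 / n_tissues
```

Under these assumptions the stricter threshold (TH1) classifies fewer eQTLs as replicating, and therefore more as tissue-specific, than the more lenient one (TH5).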
Extended Data Figure 9 iPS cell eQTLs and disease.
a, Cumulative number of cancer genes (COSMIC cancer census 27 April 2016; _n_genes = 571; ref. 20) regulated by eQTLs in iPS cells, somatic tissues (GTEx V6p), and three different cancers (ER positive and negative breast cancer, colorectal cancer; refs 32, 33). b, Enrichment of iPS cell and somatic eQTLs (lead variants and their high-linkage-disequilibrium proxies) at disease-associated variants in the NHGRI–EBI GWAS catalogue (10 April 2016). The fold enrichment of eQTLs over 100 random sets of matched variants for each tissue relative to eQTL discovery sample size is shown. The tissues showing the highest fold enrichment are liver and brain (cerebellar hemisphere; BrainCH). c, Somatic eQTL signal for the PTPN2 (protein tyrosine phosphatase, non-receptor type 2) locus on chromosome 18. This locus contains a colocalizing association signal for PTPN2 gene expression in iPS cells and five immunological disease phenotypes (Fig. 5a). d, Somatic eQTL signal for the TERT (telomerase reverse transcriptase) locus on chromosome 5 (Fig. 5b). In both c and d, the lead eQTL variant locations are indicated with red and orange vertical lines for iPS cells and somatic tissues, respectively. The focal gene regions are indicated in solid grey and gene start positions of other protein-coding genes on the same strand with vertical grey lines.
Extended Data Figure 10 Tissue expression and alternative splicing results for the TERT locus.
a, b, Normalized RNA-seq per-base coverage across the TERT locus stratified by rs10069690 genotype. The full locus (a) or zoomed view of the region (b) around the lead eQTL and cancer risk variant rs10069690 are shown. rs10069690 is indicated with a dotted line on each plot. Grey regions indicate annotated exons from Ensembl version 75. Coverage was computed from indexed BAM files using the coverageBed function from bedtools (version 2.25.0; ref. 93). Raw coverage was divided by total library size in millions (total number of mapped reads) per sample to obtain normalized coverage, which was then averaged over samples with the same rs10069690 genotype to obtain mean normalized coverage for each genotype group. c, Profile of TERT expression in iPS cells and across somatic tissues from GTEx. The gene FPKM values obtained with RNA-SeQC (GTEx V6p) are shown. d, Splicing-QTL of TERT. We quantified TERT intron retention rates using Leafcutter (ref. 92) and identified one alternative splicing event associated with rs10069690, the lead iPS cell eQTL variant for TERT (Fig. 5b). The TERT intron 4 retention ratio (PSI, per cent spliced in) is shown in iPS cell lines of all individual donors stratified by their genotype at rs10069690. This variant affects the splicing of the intron where it is located, with the minor allele (T) increasing the fraction of TERT transcripts in which intron 4 is retained (P = 1.7 × 10⁻⁹, Bonferroni-adjusted linear regression).
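The coverage normalization described for panels a and b (raw per-base coverage divided by library size in millions, then averaged within each genotype group) can be sketched in Python as follows. This is an illustration only: the data layout, function names, genotype labels and numbers are hypothetical, and the per-base coverage is assumed to have been extracted beforehand (for example with bedtools).

```python
def normalize_coverage(raw_coverage, mapped_reads):
    """Divide per-base raw coverage by library size expressed in millions of mapped reads."""
    library_size_millions = mapped_reads / 1_000_000
    return [depth / library_size_millions for depth in raw_coverage]


def mean_coverage_by_genotype(samples):
    """Average normalized per-base coverage over samples sharing a genotype.

    `samples` is a list of (genotype, raw_coverage, mapped_reads) tuples,
    where every raw_coverage list covers the same positions.
    """
    groups = {}
    for genotype, raw_coverage, mapped_reads in samples:
        groups.setdefault(genotype, []).append(
            normalize_coverage(raw_coverage, mapped_reads)
        )
    return {
        genotype: [sum(column) / len(column) for column in zip(*profiles)]
        for genotype, profiles in groups.items()
    }


# Hypothetical samples: genotype label, coverage at two positions, total mapped reads.
toy_samples = [
    ("CC", [10, 20], 2_000_000),
    ("CC", [30, 40], 4_000_000),
    ("CT", [5, 15], 1_000_000),
]
mean_profiles = mean_coverage_by_genotype(toy_samples)
```

Normalizing each sample before averaging keeps deeply sequenced libraries from dominating the genotype-group mean.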
Supplementary information
Supplementary Information
This file contains supplementary information and methods. (PDF 337 kb)
Supplementary Table 1
Sample metadata for the HipSci cell lines used in this publication. This is a subset of HipSci's full catalogue of cell lines and data, which can be queried at http://www.hipsci.org/lines. (XLSX 184 kb)
Supplementary Table 2
CNA results. (a) CNA locations. (b) Significance of CNA recurrence over 200 kb genome windows. (c) Properties of the recurrent CNAs, including: peak region, overlap with chromatin fragile sites, cis (same chromosome as the CNA) and trans (different chromosome) regulated genes (i.e. genes differentially expressed between copy-number 2 and 3 lines), and top candidate genes (identified as described in the main text). (d) Genome-wide association of copy numbers at recurrent CNAs with gene expression. (e) Pathway enrichment analysis of genes regulated in trans by the chromosome 17 recurrent CNA region. (XLSX 199 kb)
Supplementary Table 3
Gene expression variance components analysis. Fraction of variance explained by the factors considered for each expression array probe. (XLSX 3388 kb)
Supplementary Table 4
iPSC eQTL results. (a) eGene level summary of the cis-eQTLs discovered with different sample sets in this study. (b) eQTL results for primary and secondary lead eQTL variants of HipSci RNA-seq iPSC eGenes at FDR < 5% (N = 6,631). Primary and secondary eQTLs are defined by the column ‘primary_eQTL’. The column ‘iPSC_specific’ defines whether the eQTL is iPSC-specific. Columns ‘N_proxies_used’ and ‘proxy_positions’ give the total number and positions of proxy variants that were tested in the tissue-specific analysis. Additionally, the column ‘overlaps_CNA’ indicates whether the eQTL lead variant overlaps with a recurrent iPSC CNA. (XLSX 1545 kb)
Supplementary Table 5
Tissue information. (a) Description of the tissue data used in this study to define tissue-specific eQTLs (GTEx V6p, HipSci), including the embryonic origin of each tissue and number of tissue-specific eQTLs identified for each tissue. (b) Summary of iPSC eQTL replication tests in the tissue-specific analysis, showing for each replication tissue how often proxy variants (‘ld_buddy’, ‘best_proxy’) were tested instead of the same lead variant (‘same_as_lead’). (XLSX 17 kb)
Supplementary Table 6
iPSC eQTL overlap with disease-associated variants. (a) All disease-associated variants in the NHGRI-EBI GWAS catalogue (release 2016-04-10) which are tagged by an iPSC eQTL (lead variant or r² > 0.8 proxy). For proxy matches, all eQTLs for which the variant is a proxy (r² > 0.8) are shown. (b) Disease-associated variants in the GWAS catalogue that are lead eQTL variants in iPSCs (subset of (a)). For each variant, the number of high-LD proxies it has is listed (‘N_HIGH_LD_PROXIES’). (c) Individual traits in the GWAS catalogue for which iPSC eQTLs show a significant enrichment (BH-adjusted empirical P < 0.05, derived from 100 random sets of matched variants; Methods). Shown are traits with a minimum of five variants tagged by iPSC eQTLs. (d) Results of the colocalisation analysis for 14 traits. (XLSX 121 kb)
About this article
Cite this article
Kilpinen, H., Goncalves, A., Leha, A. et al. Common genetic variation drives molecular heterogeneity in human iPSCs. Nature 546, 370–375 (2017). https://doi.org/10.1038/nature22403
- Received: 11 May 2016
- Accepted: 27 April 2017
- Published: 10 May 2017
- Issue Date: 15 June 2017
- DOI: https://doi.org/10.1038/nature22403