A promoter-level mammalian expression atlas

Accession codes

Primary accessions

DDBJ/GenBank/EMBL

Data deposits

All CAGE data has been deposited at DDBJ DRA under accession number DRA000991.

Acknowledgements

FANTOM5 was made possible by a Research Grant for RIKEN Omics Science Center from MEXT to Y. Hayashizaki and a grant of the Innovative Cell Biology by Innovative Technology (Cell Innovation Program) from the MEXT, Japan to Y. Hayashizaki. It was also supported by Research Grants for RIKEN Preventive Medicine and Diagnosis Innovation Program (RIKEN PMI) to Y. Hayashizaki and RIKEN Centre for Life Science Technologies, Division of Genomic Technologies (RIKEN CLST (DGT)) from the MEXT, Japan. Extended acknowledgements are provided in the Supplementary Information.

Author information

Author notes

  1. Christian Schmidl, Yulia A. Medvedeva, James Briggs, Carlo V. Cannistraci, Soichi Ogishima, Hiroko Ohmiya & Alka Saxena
    Present addresses: Institute of Predictive and Personalized Medicine of Cancer, Ctra. de Can Roti, cami de les escoles, s/n, 08916 Badalona (Barcelona), Spain (Y.A.M.); Biomedical Cybernetics Group, Biotechnology Center (BIOTEC), Technische Universität Dresden, Dresden, Germany (C.V.C.); Genomics Core Facility, Biomedical Research Centre, Guy’s Hospital, London SE1 9RT, UK (A. Saxena); RIKEN Advanced Center for Computing and Communication (ACCC), 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045 Japan (H. Ohmiya); Research Center for Molecular Medicine of the Austrian Academy of Sciences (CeMM), 1090 Vienna, Austria (C. Schmidl); Department of Biological and Biomedical Sciences, Harvard University, Cambridge, Massachusetts 02138, USA (J.B.); Department of Bioclinical Informatics, Tohoku Medical Megabank Organization, Tohoku University, Sendai 980-8573, Japan (S.O.).
  2. Alistair R. R. Forrest, Hideya Kawaji, Michael Rehli and J. Kenneth Baillie: These authors contributed equally to this work.

Authors and Affiliations

  1. RIKEN Omics Science Center (OSC), 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan.
    Alistair R. R. Forrest, Hideya Kawaji, Michiel J. L. de Hoon, Timo Lassmann, Marina Lizio, Masayoshi Itoh, Nicolas Bertin, Erik Arner, Charles Plessy, Morana Vitezic, Jessica Severin, Yuri Ishizu, Takahiro Arakawa, Alessandro Bonetti, A. Maxwell Burroughs, Shiro Fukuda, Masaaki Furuno, Matthias Harbers, Jayson Harshbarger, Akira Hasegawa, Yuki Hasegawa, Takehiro Hashimoto, Fumi Hori, Bogumil Kaczkowski, Kaoru Kaida, Ai Kaiho, Kazuhiro Kajiyama, Mutsumi Kanamori-Katayama, Shintaro Katayama, Sachi Kato, Tsugumi Kawashima, Miki Kojima, Naoto Kondo, Atsutaka Kubosaki, Andrew T. Kwon, Ri-ichiroh Manabe, Efthymios Motakis, Mitsuyoshi Murata, Sayaka Nagao-Sato, Kenichi Nakazato, Noriko Ninomiya, Hiromi Nishiyori, Shohei Noma, Hiroko Ohmiya, Jordan A. Ramilowski, Sugata Roy, Eri Saijyo, Akiko Saka, Mizuho Sakai, Alka Saxena, Thierry Sengstag, Hisashi Shimoji, Jay W. Shin, Christophe Simon, Masanori Suzuki, Naoko Suzuki, Michihira Tagami, Naoko Takahashi, Shoko Watanabe, Shigehiro Yoshida, Harukazu Suzuki, Carsten O. Daub, Jun Kawai, Piero Carninci & Yoshihide Hayashizaki
  2. RIKEN Center for Life Science Technologies (Division of Genomic Technologies) (CLST (DGT)), 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan.
    Alistair R. R. Forrest, Hideya Kawaji, Michiel J. L. de Hoon, Timo Lassmann, Marina Lizio, Masayoshi Itoh, Nicolas Bertin, Erik Arner, Charles Plessy, Jessica Severin, Yuri Ishizu, Takahiro Arakawa, Alessandro Bonetti, Masaaki Furuno, Jayson Harshbarger, Akira Hasegawa, Yuki Hasegawa, Fumi Hori, Bogumil Kaczkowski, Kaoru Kaida, Kazuhiro Kajiyama, Takeya Kasukawa, Sachi Kato, Tsugumi Kawashima, Miki Kojima, Naoto Kondo, Andrew T. Kwon, Ri-ichiroh Manabe, Efthymios Motakis, Mitsuyoshi Murata, Hiromi Nishiyori, Shohei Noma, Hiroko Ohmiya, Jordan A. Ramilowski, Sugata Roy, Mizuho Sakai, Jay W. Shin, Christophe Simon, Naoko Suzuki, Michihira Tagami, Naoko Takahashi, Shigehiro Yoshida, Harukazu Suzuki & Piero Carninci
  3. RIKEN Preventive Medicine and Diagnosis Innovation Program (PMI), 2-1 Hirosawa, Wako-shi, Saitama 351-0198, Japan.
    Hideya Kawaji, Masayoshi Itoh, Jun Kawai & Yoshihide Hayashizaki
  4. Department of Internal Medicine III, University Hospital Regensburg, F.-J.-Strauss Allee 11, D-93042 Regensburg, Germany.
    Michael Rehli, Christian Schmidl & Matthias Edinger
  5. Regensburg Centre for Interventional Immunology (RCI), D-93042 Regensburg, Germany.
    Michael Rehli & Matthias Edinger
  6. The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Edinburgh, Midlothian EH25 9RG, UK.
    J. Kenneth Baillie, Lynsey Fairbairn, Malcolm E. Fisher, Anagha Joshi, Kim M. Summers, Tom C. Freeman & David A. Hume
  7. Department of Biology, University of Bergen, Thormøhlensgate 53, NO-5006 Bergen, Norway.
    Vanja Haberle
  8. Faculty of Medicine, Institute of Clinical Sciences, MRC Clinical Sciences Centre, Imperial College London, Hammersmith Hospital Campus, London W12 0NN, UK.
    Vanja Haberle & Boris Lenhard
  9. Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilov str. 32, Moscow 119991, Russia.
    Ivan V. Kulakovskiy & Vsevolod J. Makeev
  10. Vavilov Institute of General Genetics, Russian Academy of Sciences, Gubkin str. 3, Moscow 119991, Russia.
    Ivan V. Kulakovskiy, Yulia A. Medvedeva, Alexander V. Favorov, Artem S. Kasianov, Ilya E. Vorontsov & Vsevolod J. Makeev
  11. Department of Biology and BRIC, The Bioinformatics Centre, University of Copenhagen, Ole Maaloes Vej 5, DK 2200 Copenhagen, Denmark.
    Robin Andersson, Mette Jørgensen, Yun Chen, Ilka Hoof, Kang Li, Berit Lilje, Xiaobei Zhao & Albin Sandelin
  12. Genomics Division, Lawrence Berkeley National Laboratory, 84R01, 1 Cyclotron Road, Berkeley, California 94720, USA.
    Christopher J. Mungall
  13. Mouse Informatics, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
    Terrence F. Meehan
  14. Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Ibn Al-Haytham Building -2, Thuwal 23955-6900, Kingdom of Saudi Arabia.
    Sebastian Schmeier, Ulf Schaefer, Yulia A. Medvedeva, Intikhab Alam, John A. C. Archer, Boris R. Jankovic, Benoit Marchand, Arnab Pain, Mamoon Rashid & Vladimir B. Bajic
  15. Institute of Natural and Mathematical Sciences, Massey University, Private Bag 102-904, North Shore Mail Centre, 0745 Auckland, New Zealand.
    Sebastian Schmeier
  16. Department of Biostatistics, Harvard School of Public Health, 655 Huntington Ave, Boston, Massachusetts 02115, USA.
    Emmanuel Dimont, Gabriel M. Altschuler, Shannan J. Ho Sui, Oliver M. Hofmann & Winston Hide
  17. Department of Cell and Molecular Biology, Karolinska Institutet, P.O. Box 285, SE-171 77 Stockholm, Sweden.
    Morana Vitezic & Lukasz Huminiecki
  18. MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine (MRC-IGMM), University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK.
    Colin A. Semple, Robert S. Young, Sarah Rennie, Alison Meynert, James G. D. Prendergast & Martin S. Taylor
  19. Department of Clinical Genetics, VU University Medical Center Amsterdam, Van der Boechorststraat 7, 1081 BT Amsterdam, The Netherlands.
    Margherita Francescatto, Patrizia Rizzu & Peter Heutink
  20. Graduate Program in Areas of Basic and Applied Biology, Abel Salazar Biomedical Sciences Institute, University of Porto, Rua de Jorge Viterbo Ferreira n. 228, 4050-313 Porto, Portugal.
    Margherita Francescatto
  21. Predictive Models for Biomedicine and Environment, Fondazione Bruno Kessler, via Sommarive 18, 38123 Trento, Italy.
    Davide Albanese, Marco Chierici, Cesare Furlanello, Giuseppe Jurman & Marco Roncador
  22. Department of Medicine, Karolinska Institutet at Karolinska University Hospital, Huddinge, SE-141 86 Huddinge, Sweden.
    Peter Arner & Niklas Mejhert
  23. Department of Dermatology and Allergy, Charité Campus Mitte, Universitätsmedizin Berlin, Chariteplatz 1, 10117 Berlin, Germany.
    Magda Babina & Sven Guhl
  24. Biozentrum, University of Basel, Klingelbergstrasse 50-70, 4056 Basel, Switzerland.
    Piotr J. Balwierz & Erik van Nimwegen
  25. Australian Institute for Bioengineering and Nanotechnology (AIBN), University of Queensland, Brisbane St Lucia, Queensland 4072, Australia.
    Anthony G. Beckhouse, James Briggs, Kelly J. Hitchens, Dmitry A. Ovchinnikov, Dipti Vijayan, Christine A. Wells & Ernst Wolvetang
  26. Australian Infectious Diseases Research Centre (AID), University of Queensland, Brisbane St Lucia, Queensland 4072, Australia.
    Anthony G. Beckhouse, Antje Blumenthal, Kelly J. Hitchens, Dipti Vijayan & Christine A. Wells
  27. Department of Biological Sciences, University of Delaware, Newark, Delaware 19713, USA.
    Swati Pradhan-Bhatt
  28. Bioinformatics and Computational Biology, The Jackson Laboratory, 600 Main Street, Bar Harbor, Maine 04609, USA.
    Judith A. Blake
  29. Diamantina Institute, University of Queensland, Brisbane St Lucia, Queensland 4072, Australia.
    Antje Blumenthal & Tony J. Kenna
  30. IRCCS Fondazione Santa Lucia, via del Fosso di Fiorano 64, 00143 Rome, Italy.
    Beatrice Bodega & Valerio Orlando
  31. Immunology and Infectious Disease, International Centre for Genetic Engineering & Biotechnology (ICGEB) Cape Town component, Anzio Road, Observatory 7925, Cape Town, South Africa.
    Frank Brombacher, Reto Guler, Suzana Savvi & Anita Schwegmann
  32. Division of Immunology, Institute of Infectious Diseases and Molecular Medicine (IDM), University of Cape Town, Anzio Road, Observatory 7925, Cape Town, South Africa.
    Frank Brombacher, Reto Guler, Suzana Savvi & Anita Schwegmann
  33. Department of Systems Biology, Columbia University Medical Center, 1130 St. Nicholas Avenue, New York, New York 10032, USA.
    Andrea Califano
  34. Department of Biochemistry and Molecular Biophysics, Columbia University Medical Center, 701 West 168th Street, New York, New York 10032, USA.
    Andrea Califano
  35. Department of Biomedical Informatics, Columbia University Medical Center, 622 West 168th Street, VC5, New York, New York 10032, USA.
    Andrea Califano
  36. Institute of Cancer Genetics, Columbia University Medical Center, Herbert Irving Comprehensive Cancer Center, 1130 St. Nicholas Avenue, New York, New York 10032, USA.
    Andrea Califano & Yishai Shimoni
  37. Biological and Environmental Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Ibn Al-Haytham Building -2, Thuwal 23955-6900, Kingdom of Saudi Arabia.
    Carlo V. Cannistraci, Valerio Orlando, Arnab Pain, Mamoon Rashid & Timothy Ravasi
  38. Applied Mathematics and Computational Science Program, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia.
    Carlo V. Cannistraci & Timothy Ravasi
  39. Department of Systems and Computational Biology, Albert Einstein College of Medicine, The Bronx, New York, New York 10461, USA.
    Daniel Carbajo & Jessica C. Mar
  40. Laboratorio Nazionale del Consorzio Interuniversitario per le Biotecnologie (LNCIB), Padriciano 99, 34149 Trieste, Italy.
    Yari Ciani, Emiliano Dalla, Silvano Piazza, Claudio Schneider & Roberto Verardo
  41. Hubrecht Institute, Uppsalalaan 8, 3584 CT Utrecht, The Netherlands.
    Hans C. Clevers & Marc van de Wetering
  42. The Royal Netherlands Academy of Arts and Sciences, P.O. Box 19121, NL-1000 GC Amsterdam, The Netherlands.
    Hans C. Clevers
  43. University Medical Centre Utrecht, Postbus 85500, 3508 GA Utrecht, The Netherlands.
    Hans C. Clevers
  44. Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, New York 11797, USA.
    Carrie A. Davis & Thomas Gingeras
  45. Institute of Pharmaceutical Sciences, ETH Zurich, Vladimir-Prelog-Weg 3, HCI H 303, 8093 Zurich, Switzerland.
    Michael Detmar & Sarah Krampitz
  46. Department of Neurology, University at Buffalo School of Medicine and Biomedical Sciences, New York State Center of Excellence in Bioinformatics and Life Sciences, 701 Ellicott Street, Buffalo, New York 14203, USA.
    Alexander D. Diehl
  47. Gastroenterology, Research Center for Hepatitis and Immunology Research Institute, National Center for Global Health and Medicine, 1-7-1 Kohnodai, Ichikawa, Chiba 272-8516, Japan.
    Taeko Dohi & Yuki I. Kawamura
  48. Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology (NTNU), P.O. Box 8905, NO-7491 Trondheim, Norway.
    Finn Drabløs & Morten B. Rye
  49. Department of Otology and Laryngology, Harvard Medical School, Massachusetts Eye and Ear Infirmary, Eaton-Peabody Lab, 243 Charles Street, Boston, Massachusetts 02114, USA.
    Albert S. B. Edge & Judith S. Kempfle
  50. Department of Biosciences and Nutrition, Center for Biosciences, Karolinska Institutet, Hälsovägen 7-9, SE-141 83 Huddinge, Sweden.
    Karl Ekwall, Juha Kere, Andreas Lennartsson & Helena Persson
  51. RIKEN Research Center for Allergy and Immunology (RCAI), 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan.
    Mitsuhiro Endoh, Jun-ichi Furusawa, Tomokatsu Ikawa, Hiroshi Kawamoto, Haruhiko Koseki, Shigeo Koyasu, Kazuyo Moro, Hiroshi Ohno & Mariko Okada-Hatakeyama
  52. RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan.
    Mitsuhiro Endoh, Jun-ichi Furusawa, Tomokatsu Ikawa, Haruhiko Koseki, Shigeo Koyasu, Kazuyo Moro, Hiroshi Ohno & Mariko Okada-Hatakeyama
  53. RIKEN Center for Developmental Biology (CDB), 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan.
    Hideki Enomoto, Mitsuru Morimoto, Guojun Sheng & Yohei Yonekura
  54. FM Kirby Neurobiology Center, Children’s Hospital Boston, Harvard Medical School, 300 Longwood Avenue, Boston, Massachusetts 02115, USA.
    Michela Fagiolini
  55. Department of Computer Science, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol BS8 1UB, UK.
    Hai Fang, Julian Gough & Owen J. L. Rackham
  56. Department of Biochemistry and Cell Biology, Rice University, Houston, Texas 77251-1892, USA.
    Mary C. Farach-Carson
  57. Cancer Biology Program, Mater Medical Research Institute, Raymond Terrace, South Brisbane, Queensland 4101, Australia.
    Geoffrey J. Faulkner
  58. Department of Oncology, Division of Oncology, Biostatistics and Bioinformatics, Johns Hopkins University School of Medicine, 550 North Broadway, Baltimore, Maryland 21205, USA.
    Alexander V. Favorov
  59. State Research Institute of Genetics and Selection of Industrial Microorganisms GosNIIgenetika, 1-st Dorozhniy pr., 1, 117545 Moscow, Russia.
    Alexander V. Favorov
  60. Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan.
    Martin C. Frith
  61. Department of Medical Biochemistry, Tohoku University Graduate School of Medicine, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi 980-8575, Japan.
    Rie Fujita, Hironori Satoh, Jun Takai & Masayuki Yamamoto
  62. Department of Microbiology and Immunology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku, Tokyo 160-8582, Japan.
    Jun-ichi Furusawa, Shigeo Koyasu & Kazuyo Moro
  63. Experimental Immunology, Academic Medical Center, University of Amsterdam, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands.
    Teunis B. Geijtenbeek & Linda M. van den Berg
  64. Department of Human Genetics, Leiden University Medical Center, Einthovenweg 20, 2333 ZC Leiden, The Netherlands.
    Andrew P. Gibson, Jeroen F. J. Laros, Erik A. Schultes, Peter A. C. ’t Hoen, Zuotian Tatum & Mark Thompson
  65. Department of Medical Genetics, Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, University of British Columbia, 950 West 28th Avenue, Vancouver, British Columbia V5Z 4H4, Canada.
    Daniel Goldowitz, Thomas J. Ha, Anthony Mathelier, Wyeth W. Wasserman & Peter G. Zhang
  66. Neuroscience, SISSA, via Bonomea 265, 34136 Trieste, Italy.
    Stefano Gustincich & Silvia Zucchelli
  67. Experimental Immunology, Immunology Frontier Research Center, Osaka University, 3-1 Yamadaoka, Suita, Osaka 565-0871, Japan.
    Masahide Hamaguchi, Hiromasa Morikawa, Naganari Ohkura & Shimon Sakaguchi
  68. RIKEN Advanced Science Institute (ASI), 2-1 Hirosawa, Wako, Saitama 351-0198, Japan.
    Mitsuko Hara & Soichi Kojima
  69. Melanoma Research Center, The Wistar Institute, 3601 Spruce Street, Philadelphia, Pennsylvania 19104, USA.
    Meenhard Herlyn, Rolf K. Swoboda & Susan E. Zabierowski
  70. RIKEN Bioinformatics And Systems Engineering Division (BASE), 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa 230-0045, Japan.
    Kei Iida, Shuji Kawaguchi & Tetsuro Toyoda
  71. Center for Molecular Medicine and Genetics, Wayne State University, 3228 Scott Hall, 540 East Canfield Street, Detroit, Michigan 48201-1928, USA.
    Hui Jia, Leonard Lipovich & Emily J. Wood
  72. Laboratory Animal Research Center, Institute of Medical Science, The University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan.
    Chieko Kai, Toshiyuki Nakamura, Hiroki Sato, Takaaki Sugiyama & Misako Yoneda
  73. Science for Life Laboratory, Box 1031, SE-171 21 Solna, Sweden.
    Juha Kere
  74. Centre for Vascular Research, University of New South Wales, Sydney, New South Wales 2052, Australia.
    Levon M. Khachigian & Margaret Patrikakis
  75. Division of Cellular Therapy and Division of Stem Cell Signaling, Institute of Medical Science, University of Tokyo, Tokyo 108-8639, Japan.
    Toshio Kitamura & Fumio Nakahara
  76. Harry Perkins Institute of Medical Research, and the Centre for Medical Research, University of Western Australia, QQ Block, QEII Medical Centre, Nedlands, Perth, Western Australia 6009, Australia.
    S. Peter Klinken & Louise N. Winteringham
  77. Respiratory Medicine, University of Nottingham, Clinical Sciences Building, City Hospital, Hucknall Road, Nottingham NG5 1PB, UK.
    Alan J. Knox
  78. Department of Dermatology, Kyungpook National University School of Medicine, 130 Dongdeok-ro Jung-gu, Daegu 700-721, South Korea.
    Weonju Lee
  79. National Centre for Adult Stem Cell Research, Eskitis Institute for Cell and Molecular Therapies, Griffith University, Brisbane, Queensland 4111, Australia.
    Alan Mackay-sim
  80. Division of Functional Genomics and Systems Medicine, Research Center for Genomic Medicine, Saitama Medical University, 1397-1 Yamane, Hidaka, Saitama 350-1241, Japan.
    Yosuke Mizuno, Yutaka Nakachi & Yasushi Okazaki
  81. Faculty of Engineering, University of Bristol, Merchant Venturers Building, Woodland Road, Clifton BS8 1UB, UK.
    David A. de Lima Morais
  82. PRESTO, Japanese Science and Technology Agency (JST), 7 Gobancho, Chiyodaku, Tokyo 102-0076, Japan.
    Kazuyo Moro
  83. Center for Radioisotope Sciences, Tohoku University Graduate School of Medicine, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi 980-8575, Japan.
    Hozumi Motohashi
  84. Anatomy and Embryology, Leiden University Medical Center, Einthovenweg 20, P.O. Box 9600, 2300 RC Leiden, The Netherlands.
    Christine L. Mummery & Robert Passier
  85. Division of Translational Research, Research Center for Genomic Medicine, Saitama Medical University, 1397-1 Yamane, Hidaka, Saitama 350-1241, Japan.
    Yutaka Nakachi & Yasushi Okazaki
  86. RIKEN BioResource Center (BRC), Koyadai 3-1-1, Tsukuba, Ibaraki 305-0074, Japan.
    Yukio Nakamura
  87. Department of Clinical Molecular Genetics, School of Pharmacy, Tokyo University of Pharmacy and Life Sciences, 1432-1 Horinouchi, Hachioji, Tokyo 192-0392, Japan.
    Tadasuke Nozaki & Hiroo Toyoda
  88. Department of Bioinformatics, Medical Research Institute, Tokyo Medical and Dental University, 1-5-45 Yushima, Bunkyo-ku, Tokyo 113-8510, Japan.
    Soichi Ogishima & Hiroshi Tanaka
  89. Department of Biochemistry, Ohu University School of Pharmaceutical Sciences, Misumido 31-1, Tomitamachi, Koriyama, Fukushima 963-8611, Japan.
    Mitsuhiro Ohshima
  90. Department of Forensic Medicine, Hjelt Institute, University of Helsinki, Kytosuontie 11, 003000 Helsinki, Finland.
    Antti Sajantila
  91. DSMB Dipartimento Scienze Mediche e Biologiche University of Udine, P.le Kolbe 3, 33100 Udine, Italy.
    Claudio Schneider
  92. Department of Orthopedic, Trauma and Reconstructive Surgery, Charité Universitätsmedizin Berlin, Garystrasse 5, 14195 Berlin, Germany.
    Gundula G. Schulze-Tanzil
  93. Center for Clinical and Translational Research, Kyushu University Hospital, Station for Collaborative Research 1 4F, 3-1-1 Maidashi, Higashi-Ku, Fukuoka 812-8582, Japan.
    Daisuke Sugiyama
  94. Graduate School of Pharmaceutical Sciences, Nagoya University, Furo-cho, Chikusa, Nagoya, Aichi 464-8601, Japan.
    Hideki Tatsukawa
  95. Department of Molecular and Cellular Biology, Harvard University, 16 Divinity Avenue, Cambridge, Massachusetts 02138, USA.
    Eivind Valen
  96. Department of Biochemistry, Nihon University School of Dentistry, 1-8-13, Kanda-Surugadai, Chiyoda-ku, Tokyo 101-8310, Japan.,
    Yoko Yamaguchi
  97. Department of Informatics, University of Bergen, Høgteknologisenteret, Thormøhlensgate 53, NO-5008 Bergen, Norway.,
    Boris Lenhard
  98. Department of Biological and Medical Physics, Moscow Institute of Physics and Technology (MIPT) 9, Institutsky Per., Dolgoprudny, Moscow Region 141700, Russia.,
    Vsevolod J. Makeev

Consortia

The FANTOM Consortium and the RIKEN PMI and CLST (DGT)

Contributions

The core members of FANTOM5 phase 1 were Alistair R. R. Forrest, Hideya Kawaji, Michael Rehli, J. Kenneth Baillie, Michiel J. L. de Hoon, Timo Lassmann, Masayoshi Itoh, Kim M. Summers, Harukazu Suzuki, Carsten O. Daub, Jun Kawai, Peter Heutink, Winston Hide, Tom C. Freeman, Boris Lenhard, Vladimir B. Bajic, Martin S. Taylor, Vsevolod J. Makeev, Albin Sandelin, David A. Hume, Piero Carninci and Yoshihide Hayashizaki. Samples were provided by: A. Blumenthal, A. Bonetti, A. Mackay-sim, A. Sajantila, A. Saxena, A. Schwegmann, A.G.B., A.J.K., A.L., A.R.R.F., A.S.B.E., B.B., C. Schmidl, C. Schneider, C.A.D., C.A.W., C.K., C.L.M., D.A.H., D.A.O., D.G., D.S., D.V., E.W., F.B., F.N., G.G.S., G.J.F., G.S., H. Kawamoto, H. Koseki, H. Morikawa, H. Motohashi, H. Ohno, H. Sato, H. Satoh, H. Tanaka, H. Tatsukawa, H. Toyoda, H.C.C., H.E., J. Kere, J.B., J.F., J.K.B., J.S.K., J.T., J.W.S., K.E., K.J.H., K.M., K.M.S., L.F., L.M.K., L.M.vdB., L.N.W., M. Edinger, M. Endoh, M. Fagiolini, M. Hamaguchi, M. Hara, M. Herlyn, M. Morimoto, M. Rehli, M. Yamamoto, M. Yoneda, M.B., M.C.F.C., M.D., M.E.F., M.O., M.O.H., M.P., M.vdW., N.M., N.O., N.T., P.A., P.G.Z., P.H., P.R., R.F., R.G., R.K.S., R.P., R.V., S. Guhl, S. Gustincich, S. Kojima, S. Koyasu, S. Krampitz, S. Sakaguchi, S. Savvi, S.E.Z., S.O., S.P.B., S.P.K., S. Roy., S.Z., T. Kitamura, T. Nakamura, T. Nozaki, T. Sugiyama, T.B.G., T.D., T.G., T.I., T.J.H., T.J.K., V.O., W.L., Y. Hasegawa, Y. Nakachi, Y. Nakamura, Y. Yamaguchi, Y. Yonekura, Y.I., Y.I.K., Y.M. and Y.O. Analyses were carried out by: A. Mathelier, A. Meynert, A. Sandelin, A.C., A.D.D., A.P.G., A.H., A.J., A.M.B., A.P., A.R.R.F., A.S.K., A.T.K., A.V.F., B. Lenhard, B. Lilje, B.D., B.K., B.M., B.R.J., C. Schmidl, C. Schneider, C.A.S., C.F., C.J.M., C.O.D., C.P., C.V.C., D.A., D.A.M., D.C., E. Dalla, E. Dimont, E.A., E.A.S., E.J.W., E.M., E.V., Ev.N., F.D., G.J., G.J.F., G.M.A., H. Kawaji, H. Ohmiya, H. 
Shimoji, H.F., H.J., H.P., I.A., I.E.V., I.H., I.V.K., J.A.B., J.A.C.A., J.A.R., J.C.M., J.F.J.L., J.G., J.G.D.P., J.H., J.K.B., J.S., K. Kajiyama, K.I., K.L., L.H., L.L., M. Francescatto, M. Rashid, M. Rehli, M. Roncador, M. Thompson, M.B.R., M.C., M.C.F., M.J., M.J.L.dH., M.L., M.S.T., M.V., N.B., O.J.L.R., O.M.H., P.A.C.tH., P.J.B., R.A., R.S.Y., S. Katayama, S. Kawaguchi, S. Schmeier, S. Rennie, S.F., S.J.H.S., S.P., T. Sengstag, T.C.F., T.F.M., T.H., T.K., T.L., T.R., T.T., U.S., V.B.B., V.H., V.J.M., W.H., W.W.W., X.Z., Y. Chen, Y. Ciani, Y.A.M., Y.S. and Z.T. Libraries were generated by: A. Kaiho, A. Kubosaki, A. Saka, C. Simon, E.S., F.H., H.N., J. Kawai, K. Kaida, K.N., M. Furuno, M. Murata, M. Sakai, M. Tagami, M.I., M.K., M.K.K., N.K., N.N., N.S., P.C., R.M., S. Kato, S.N., S.N.-S., S.W., S.Y., T.A. and T. Kawashima. The manuscript was written by A.R.R.F. and D.A.H. with help from A. Sandelin, J.K.B., M. Rehli, H.K., M.J.L.dH., V.H., I.V.K., M.T. and K.M.S. with contributions, edits and comments from all authors. The project was managed by Y. Hayashizaki, A.R.R.F., P.C., M.I., M.S., J. Kawai, C.O.D., H. Suzuki, T.L. and N.K. The scientific coordinator was A.R.R.F. and the general organizer was Y. Hayashizaki.

Corresponding authors

Correspondence to Alistair R. R. Forrest, Piero Carninci or Yoshihide Hayashizaki.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Extended data figures and tables

Extended Data Figure 1 Decomposition-based peak identification (DPI).

a, Schematic representation of each step in the peak identification. The procedure starts from CAGE profiles at individual biological states (I), then defines tag clusters (consecutive genomic regions producing CAGE signals) over the accumulated CAGE profiles across all the states (II). Within each tag cluster, it infers up to five underlying signals (independent components) by using independent component analysis (ICA) (III). It smooths each of the independent components and finds peaks where the signal is higher than the median (IV). The peaks along the individual components are finally merged if they overlap each other (V). b, c, Genomic view of actual examples (B4GALT1 locus) for human and mouse. CAGE profiles across the biological states (I) are shown as a greyscale plot, in which the x axis represents the genomic coordinates and individual rows represent individual biological states. Dark (or black) dots indicate frequent observation of transcription initiation (that is, a larger number of CAGE read counts) and light (white) dots indicate lower frequency. The blue histogram on the top indicates the accumulated CAGE read counts, and the entire region shown represents a single tag cluster (II). The histograms below the greyscale plot indicate the independent components of the CAGE signals inferred by ICA (III), and the resulting CAGE peaks are shown as the blue bars closest to the bottom (V). The bottom track indicates a gene model in RefSeq. Overall, the figures indicate that RefSeq gene models define only one TSS in this locus; however, transcription starts from slightly different regions depending on the context, and the DPI method successfully captured the different initiation events. d, Breakdown of singleton and composite transcription initiation regions with homogeneous or heterogeneous expression patterns according to a likelihood ratio test (see Supplementary Methods).
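
Steps II and V of the schematic in panel a can be illustrated with a minimal Python sketch. It assumes that an accumulated CAGE profile is a list of per-position read counts and that peaks are half-open (start, end) intervals; the function names are hypothetical, and the ICA and smoothing steps (III, IV) are not reproduced here.

```python
def tag_clusters(accumulated_counts):
    """Return half-open (start, end) runs of consecutive positions with signal.

    Assumption: a tag cluster is a maximal run of positions whose
    accumulated read count is greater than zero.
    """
    clusters = []
    start = None
    for pos, count in enumerate(accumulated_counts):
        if count > 0 and start is None:
            start = pos                      # a new run begins
        elif count == 0 and start is not None:
            clusters.append((start, pos))    # the run ended at pos - 1
            start = None
    if start is not None:
        clusters.append((start, len(accumulated_counts)))
    return clusters


def merge_overlapping(peaks):
    """Merge half-open intervals that overlap each other (step V)."""
    merged = []
    for start, end in sorted(peaks):
        if merged and start < merged[-1][1]:
            # Overlaps the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Here `merge_overlapping` would be applied to the peaks found along the individual components of one tag cluster; intervals that merely touch are kept separate in this sketch.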

Extended Data Figure 2 Broad and sharp promoters.

DPI peaks from the permissive set were aggregated by grouping neighbouring peaks less than 100 bp apart. Cumulative distribution of CAGE signal along each region was calculated and positions of 10th and 90th percentiles were determined. a, Schematic representation of CAGE signal within promoter region and calculation of interquantile width. Signal from CAGE transcription start sites (CTSS) is shown. Distance between these two positions (interquantile width) was used as a measure of promoter width. b, Distribution of promoter interquantile width across all 988 human samples. Individual grey lines show distribution in each sample and the average distribution is shown in yellow. For each sample only promoters with ≥5 TPM were selected. Distribution of obtained interquantile width was clearly bimodal and allowed us to set the empirical threshold at 10.5 bp, which best separates sharp from broad promoters. c, Distribution of expression specificity. The distribution of log ratios of expression in individual samples against the median expression across all samples is shown separately for sharp and broad promoters. Solid line shows the average distribution for all samples and the semi-transparent band denotes the 99% confidence interval. The dashed line corresponds to an expected log ratio if all samples contributed equally to the total expression. d, Average frequency of AA/AT/TA/TT (WW) dinucleotides around dominant TSS of sharp (red) and broad (blue) promoters across all human samples. Lines show the average signal and semi-transparent bands indicate the 99% confidence interval. Closer view of WW dinucleotide frequency displaying 10 bp periodicity is shown in the inset and indicates the likely position of the +1 nucleosome. For comparison, the signal aligned to randomly chosen TSS in broad promoters is shown in orange. e, As in a but for promoters in CD14+ monocytes. 
H2A.Z signal (subtracted coverage = plus strand coverage – minus strand coverage) around sharp and broad promoters is shown in corresponding semi-transparent colours (data from ref. 51). Transition point in subtracted coverage from positive to negative values indicates the most likely position of the nucleosome (shown as semi-transparent blue circle) centre. f, As in b but for promoters in frontal lobe. H3K4me3 signal (subtracted coverage = plus strand coverage – minus strand coverage) around sharp and broad promoters is shown in corresponding semi-transparent colours (data from ref. 52).
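
The interquantile width described for panel a can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the region is given as a list of per-position CTSS counts, a percentile position is taken as the first position at which the cumulative fraction of signal reaches it, and the function names are hypothetical.

```python
def interquantile_width(ctss_counts, lower=0.1, upper=0.9):
    """Distance (bp) between the 10th and 90th percentile positions of
    the cumulative CAGE signal along one promoter region."""
    total = sum(ctss_counts)
    if total <= 0:
        raise ValueError("region has no CAGE signal")
    cumulative = 0
    lower_pos = None
    upper_pos = None
    for pos, count in enumerate(ctss_counts):
        cumulative += count
        fraction = cumulative / total
        if lower_pos is None and fraction >= lower:
            lower_pos = pos          # first position reaching the lower percentile
        if fraction >= upper:
            upper_pos = pos          # first position reaching the upper percentile
            break
    return upper_pos - lower_pos


def promoter_shape(width, threshold=10.5):
    """Apply the empirical 10.5 bp threshold from panel b."""
    return "sharp" if width < threshold else "broad"
```

A region with almost all signal at one position gives a width of 0 and is called sharp, whereas signal spread evenly over 25 positions gives a width of 20 and is called broad.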

Extended Data Figure 3 Density plots of DPI peaks maximum and median expression.

a, Distribution for all human robust peaks. b, Distribution for all mouse robust peaks. Fraction on left of vertical dashed line corresponds to peaks with non-ubiquitous (cell-type-restricted) expression patterns (median <0.2 TPM). Fraction below the diagonal dashed line corresponds to ubiquitous-uniform (housekeeping) expression profiles (less than tenfold difference between maximum and median). Fraction in top-middle corresponds to ubiquitous-non-uniform expression profiles (maximum > tenfold median). c–e, Distributions based on cell line, primary cell and tissue data, respectively. The mixture of cells in tissues may overestimate the fraction of ubiquitously expressed genes. f, Boxplot showing the number of peaks detected at ≥10 TPM in primary cells, cell lines or tissues. g, As in a but showing transcription factor p1 peaks only. h, Boxplot showing maximum expression of the main promoter for transcription factors or all coding genes. i, Density plots of human robust DPI peaks maximum and median expression for the main promoter of coding genes. j, As in d but showing the main promoter of transcription factors. Fraction on the left of the vertical dashed line corresponds to peaks with non-ubiquitous (cell-type-restricted) expression patterns (median <0.2 TPM). Fraction below the diagonal dashed line corresponds to ubiquitous-uniform (housekeeping) expression profiles (less than tenfold difference between max and median). Fraction above the diagonal and to the right of the vertical dashed lines corresponds to ubiquitous-non-uniform expression profiles (maximum > tenfold median). k, Distribution for peaks with CpG island only (n = 55,897). l, Distribution for peaks with only a TATA motif (n = 3,933). m, Distribution for peaks with both CpG islands and TATA box motifs (n = 834). n, Distribution for DPI peaks with neither a TATA motif nor CpG island (n = 124,152). 
Fraction on the left of the vertical dashed line corresponds to peaks with non-ubiquitous (cell-type-restricted) expression patterns (median <0.2 TPM). Fraction below the diagonal dashed line corresponds to ubiquitous-uniform (housekeeping) expression profiles (less than tenfold difference between max and median). Fraction above diagonal and to right of vertical dashed lines corresponds to ubiquitous-non-uniform expression profiles (maximum > tenfold median).
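
The three expression classes delimited by the dashed lines can be expressed as a short rule. The sketch below assumes a peak is described by its TPM values across samples; the function name is hypothetical, and assigning a maximum of exactly tenfold the median to the non-uniform class is a choice made here, since the legend does not cover that boundary case.

```python
from statistics import median


def classify_peak(tpm_values, median_cutoff=0.2, fold_cutoff=10.0):
    """Classify one peak by its median and maximum expression (TPM)."""
    med = median(tpm_values)
    if med < median_cutoff:
        # Left of the vertical dashed line.
        return "non-ubiquitous"
    if max(tpm_values) < fold_cutoff * med:
        # Below the diagonal dashed line: less than tenfold max/median.
        return "ubiquitous-uniform"
    # Above the diagonal and right of the vertical line.
    return "ubiquitous-non-uniform"
```

For instance, TPM values of 5, 6, 7 and 8 yield the housekeeping-like ubiquitous-uniform class, whereas 1, 1, 1 and 100 yield ubiquitous-non-uniform.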

Extended Data Figure 4 Cross-species projected super-clusters.

a, The number of mouse and human TSSs (both permissive and robust) per projected super-cluster. b, Same data as presented in panel a, with the y axis on a log scale. There is a slight tendency for more human TSSs per super-cluster than mouse TSSs. c, The number of human and mouse TSSs per projected super-cluster, density of data points indicated by log-scaled colour gradient shown on the right. Most super-clusters contain ≤4 DPI-defined TSSs in both species. d, Evaluating the conservation of TSS annotation between species. Projected super-clusters are annotated by the most functional contributing TSS from each species (see Methods). Grey shading in the margins summarizes the proportion of super-clusters with each category of annotation in both mouse (y axis) and human (x axis). Numbers and volumes of circles represent counts of projected super-clusters, for example there are 34,868 super-clusters in which ≥1 human and ≥1 mouse component TSS are annotated as protein coding and 719 super-clusters in which the human TSSs are unannotated and at least one of the mouse TSSs is annotated as the 5′ end of a non-coding transcript.

Extended Data Figure 5 De novo derived, cell-state-specific motif signatures.

a–c, The de novo motif discovery tools DMF, HOMER, ChIPMunk and ScanAll were applied to detect sequence motifs enriched in the vicinity of sample-specific peaks (a), yielding 8,699 de novo motifs (b). The coverage of known motif space by the de novo motifs was evaluated by comparing them to the SWISSREGULON, HOCOMOCO, TRANSFAC, HOMER, JASPAR, and ENCODE LEXICON motif collections. c, The remaining 1,221 de novo motifs that were not similar to known motifs were then clustered using MACRO-APE, resulting in 169 unique novel motifs. d, Known motifs from the HOMER database were annotated and counted around cell-type-specific TSSs (−300 to +50 bp) associated with CpG islands (CGI) or non-CGI regions. e–g, RNA Pol II ChIP-seq signal and motif finding in ‘housekeeping gene’ promoters with different absolute expression levels. Human housekeeping gene promoters were defined as (log10(max + 0.1) − log10(median + 0.1) ≤ 1). The resulting clusters were then extended by −300 and +50 bp. Overlapping extended clusters were removed by only keeping those with the highest expression. e, Extended clusters were then split into 5 equal-sized bins with decreasing absolute expression. f, RNA Pol II occupancy at binned clusters in ENCODE cell lines (highly expressed genes show the highest occupancy, but even bin5 clusters showing very low tag counts are still highly occupied). g, Bubble plot representation comparing known motif enrichments in bin1 (high expression) and bin5 (low expression) extended CAGE clusters. The bubble plots encode two quantitative parameters per motif: difference in motif occurrence between bin1 (x axis) and bin5 (y axis) as well as the adjusted P values for enrichment (bubble diameter). Colouring indicates significantly differentially distributed motifs (5% FDR). The right panel additionally summarizes the fraction of clusters in each bin that contain the indicated motifs along with the Benjamini–Hochberg adjusted hypergeometric P value for differential enrichment.
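
The housekeeping criterion and the binning in panel e can be sketched in a few lines of Python. The pseudocount of 0.1 and the cutoff of 1 come from the legend; the function names, the (name, expression) tuple layout and the choice of expression value used for ranking are assumptions made for illustration.

```python
import math


def is_housekeeping(max_tpm, median_tpm, pseudocount=0.1):
    """log10(max + 0.1) - log10(median + 0.1) <= 1, as given in the legend."""
    return math.log10(max_tpm + pseudocount) - math.log10(median_tpm + pseudocount) <= 1


def equal_bins(promoters, n_bins=5):
    """Split (name, expression) pairs into n_bins bins of near-equal size,
    ordered by decreasing expression (bin 1 first)."""
    ranked = sorted(promoters, key=lambda p: p[1], reverse=True)
    size, extra = divmod(len(ranked), n_bins)
    bins = []
    start = 0
    for i in range(n_bins):
        # The first `extra` bins take one additional promoter.
        end = start + size + (1 if i < extra else 0)
        bins.append(ranked[start:end])
        start = end
    return bins
```

With this rule a promoter with a maximum of 50 TPM and a median of 10 TPM qualifies as housekeeping, while one with a maximum of 200 TPM and a median of 1 TPM does not.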

Extended Data Figure 6 Features of cell-type-specific promoters.

a, The distribution of expression log ratios of all individual samples against the median of all samples is shown separately for CGI-associated and non-CGI-associated CAGE clusters. The dashed line corresponds to an expected log ratio if all samples contribute equally to the total expression. b, Histograms for genomic distance distributions of HepG2 DNase I hypersensitivity, H3K4me3, H2A.Z, POL2, P300, GABP, YY1, HNF4A, FOXA1 and FOXA2 ChIP-seq tag counts centred across CGI-associated and non-CGI-associated CAGE clusters (separated according to expression specificities) across a 2 kilobase (kb) genomic region. Expression specificity bins are colour-coded (as indicated in the DNase I panel) with blue representing the highest degree of specificity. Numbers of regions in bins are given in the GABP panel (CGI no. / nCGI no., colour coding as above). c, Histograms for genomic distance distributions of ChIP-seq-derived sequence motifs for GABP, YY1, HNF4A, FOXA1 and FOXA2 (corresponding to the samples in the lower panel of c) centred across CGI-associated and non-CGI-associated CAGE clusters (separated according to expression specificities) across a 2 kb genomic region. Motifs are shown on top. The percentage of promoters overlapping with ChIP-seq peaks (b) or consensus sequences (c) for transcription factors binding the highest specificity clusters (HNF4A, FOXA2, TCF7L2) is also given in blue. d, Plots showing mean expression specificity (high values indicate more constrained expression over cells; see the accompanying manuscript, ref. 4) in enhancers close to RefSeq promoters as a function of promoter CpG content and three classes of promoter expression specificity.

Extended Data Figure 7 Extended features of cell-type-specific promoters.

a, Distribution of global expression specificity estimated using primary cells, cell lines or tissues only. b, Distribution of expression specificity for HepG2, GM12878, HeLaS3, K562 and CD14+ monocytes (distribution of expression log ratios of all individual samples against the median of all samples is shown separately for CGI-associated and non-CGI-associated CAGE clusters. The dashed line corresponds to an expected log ratio if all samples contribute equally to the total expression). c, Histograms for genomic distance distributions of K562 DNase I hypersensitivity, H3K4me3, H2A.Z, POL2, P300, GATA1 ChIP-seq tag counts centred across CGI-associated and non-CGI-associated CAGE clusters (separated according to expression specificities) across a 2 kb genomic region. Expression specificity bins are colour-coded with blue representing the highest degree of specificity. d, DNase I hypersensitivity, H3K4me3, H2A.Z, POL2, P300 and IRF4 in GM12878. e, DNase I hypersensitivity, H3K4me3, H2A.Z in HeLaS3. f, DNase I hypersensitivity, H3K4me3, H2A.Z, PU.1 and CEBPB in CD14+ monocytes.

Extended Data Figure 8 Transcription factor promoter expression profile clustering.

a, Biolayout visualization of transcription factor coexpression in human primary cells (3,775 nodes, 54,892 edges, r > 0.70, MCL2.2). b, Hierarchical coexpression clustering and heatmap of ETS family transcription factors across the entire human collection (only promoter 1 (p1) data shown).

Extended Data Figure 9 Collapsed coexpression network for mouse coexpression groups.

One node is one group of promoters. Derived from expression profiles of 116,277 promoters across 402 primary cell types, tissues and cell lines (r > 0.75, MCLi = 2.2). For display, each group of promoters is collapsed into a sphere, the radius of which is proportional to the cube root of the number of promoters in that group. Edges indicate r > 0.6 between the average expression profiles of each cluster. Colours indicate loosely-associated collections of coexpression groups (MCLi = 1.2). Labels show representative descriptions of the dominant cell type in coexpression groups in each region of the network, and a selection of highly-enriched pathways (FDR < 10⁻⁴) from KEGG (K), WikiPathways (W), Netpath (N) and Reactome (R).

Extended Data Figure 10 Annotated expression profiles of alternative promoters.

Overlay of coexpression groups enriched for genes involved in the KEGG pathway for influenza A pathogenesis (hsa:05164; FDR < 0.1, n > 2). a, Collapsed coexpression network showing 5 groups enriched for influenza pathogenesis genes: C0 (blue), C26 (purple), C61 (yellow), C187 (green) and C413 (red). b, Excerpt from KEGG pathway diagram showing positions of genes in each coexpression group (background colours as in a). Pathway entities that map to two coexpression groups have the background colour of the smaller group, and the text/border colour of the larger group. Details and promoter-level displays (edges indicate r > 0.75) for two coexpression groups are displayed with transcripts mapping to KEGG pathway highlighted (inset). In this example the KEGG pathway for influenza A pathogenesis (hsa:05164) was strikingly over-represented in one small coexpression group in particular (C413, P value < 10⁻¹¹, FDR = 4.5 × 10⁻¹⁰). Of 19 promoters in coexpression group 413, eight were present in the KEGG pathway, including RIG-I (DDX58), the gene encoding the receptor for the mitochondrial antiviral signalling pathway (ref. 53). Four of the remaining genes (TRIM21, TRIM22, RTP4 and XAF1) were found to be key host determinants of influenza virus replication in a high-throughput short interfering RNA (siRNA) screen (ref. 54), whereas another, PLSCR1, is required for a normal interferon response to influenza A (ref. 55). The top five transcription factor expression profiles most correlated with C413 were IRF7, IRF9, STAT1, SP100 and ZNFX1, and from motif enrichment analysis, the most frequent motifs found in promoters of cluster C413 were potential IRF-binding motifs. c, p1@IRF9 and p2@IRF9 expression ranked by the ubiquitously expressed p1@IRF9 promoter. d, As in a but ranked by expression of p2@IRF9. e, f, Similar to a and b but showing expression of p1@TRMT5 (housekeeping profile) and p2@TRMT5 (expressed in pathogen challenged monocytes). 
g, Histogram showing the number of different coexpression clusters (see Fig. 4) in which named genes with alternative promoters participate. The majority of genes with alternative promoters participate in more than one cluster; 17 genes participate in more than 10 different clusters and are not shown on this graph.

Extended Data Figure 11 Sample ontology enrichment analysis (SOEA).

Expression profile-sample ontology associations were tested by Mann–Whitney rank sum test to identify cell, disease or anatomical ontology terms over-represented in ranked lists of samples expressing each peak. a, p1@CXCL6 enriched in vascular associated smooth muscle cells. b, p5@ST8SIA3 enriched in brain tissues. c, Novel peak enriched in mast cells. d, p1@KIAA0125 enriched in myeloid leukaemia. e, p1@BRI3 enriched in myeloid leukaemia. f, p1@BDNF enriched in fibroblasts. g, Novel peak enriched in leukocytes. h, Novel peak enriched in classical monocytes. i, j, Venn diagrams showing degree of overlap between peaks associated to known genes (blue), cell ontology enriched (yellow), Uberon anatomical ontology enriched (green) and disease ontology (red). i, At a threshold of 10⁻²⁰ (Mann–Whitney rank sum test), 64% (59,835 out of 93,558) of the expression profiles of human known transcripts and 74% (67,810 out of 91,269) of the expression profiles for novel transcripts show enrichment for one or more sample ontologies. j, Mouse sample ontology enrichment at the same 10⁻²⁰ threshold: 30% (18,273 out of 61,134) of known transcripts and 47% (26,176 out of 55,143) of novel transcripts are enriched.
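
As a rough illustration of the rank sum test behind SOEA, the following Python sketch compares the expression of one peak in samples annotated with an ontology term against all other samples. It is not the procedure from the Supplementary Methods: the one-sided normal approximation without tie or continuity correction is a simplification chosen here, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_greater(in_term, out_term):
    """Return (U, p) testing whether in_term expression tends to be higher.

    Uses the normal approximation of U; no tie or continuity correction.
    """
    n1, n2 = len(in_term), len(out_term)
    ranks = average_ranks(list(in_term) + list(out_term))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = 0.5 * math.erfc(z / math.sqrt(2))   # upper-tail probability of z
    return u1, p
```

A peak would be reported as enriched for a term when `p` falls below the chosen threshold, which is 10⁻²⁰ in panels i and j.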

Extended Data Figure 12 Sample-to-sample correlation graph.

821 nodes and 21,821 edges are shown (r > 0.75). a, Samples are coloured by sample type (primary cell, cell line or tissue). Note the separation of cell lines and primary cells. b, As in a, except major subgroups are coloured and labelled separately.
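
A correlation graph of this kind can be built by connecting every pair of samples whose expression profiles correlate above the threshold. The sketch below assumes Pearson correlation on whatever expression values are supplied and a dictionary mapping sample names to profiles; both the data layout and the function names are hypothetical, and the legend does not state how expression was transformed beforehand.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (non-zero variance assumed)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_edges(profiles, threshold=0.75):
    """Return (sample_a, sample_b, r) for every pair with r > threshold."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if r > threshold:
            edges.append((a, b, r))
    return edges
```

Samples are the nodes and the returned tuples are the edges; a layout tool can then place strongly connected samples close together.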

Supplementary information

Supplementary Information

This file contains Acknowledgements, Supplementary Methods, Supplementary Notes 1-7, Supplementary Figures 1-24, and additional references (see page 1 for more details). Supplementary Tables 1-16 are in a separate Excel file. (PDF 5780 kb)

Supplementary Tables

This file contains Supplementary Tables 1-16. Supplementary Table 2 in the original file was truncated and was replaced online on 19 October 2015. (XLSX 19828 kb)

About this article

Cite this article

The FANTOM Consortium and the RIKEN PMI and CLST (DGT). A promoter-level mammalian expression atlas. Nature 507, 462–470 (2014). https://doi.org/10.1038/nature13182
