GENCODE - Human Release 47

Release 47 (GRCh38.p14)

GTF / GFF3 files

| Content | Regions | Description | Download |
|---|---|---|---|
| Comprehensive gene annotation | CHR | It contains the comprehensive gene annotation on the reference chromosomes only. | GTF, GFF3 |
| Comprehensive gene annotation | ALL | It contains the comprehensive gene annotation on the reference chromosomes, scaffolds, assembly patches and alternate loci (haplotypes). | GTF, GFF3 |
| Comprehensive gene annotation | PRI | It contains the comprehensive gene annotation on the primary assembly (chromosomes and scaffolds) sequence regions. | GTF, GFF3 |
| Basic gene annotation | CHR | It contains the basic gene annotation on the reference chromosomes only. This is a subset of the corresponding comprehensive annotation, including only those transcripts tagged as 'basic' in every gene. This is the main annotation file for most users. | GTF, GFF3 |
| Basic gene annotation | ALL | It contains the basic gene annotation on the reference chromosomes, scaffolds, assembly patches and alternate loci (haplotypes). This is a subset of the corresponding comprehensive annotation, including only those transcripts tagged as 'basic' in every gene. This is a superset of the main annotation file. | GTF, GFF3 |
| Basic gene annotation | PRI | It contains the basic gene annotation on the primary assembly (chromosomes and scaffolds) sequence regions. This is a subset of the corresponding comprehensive annotation, including only those transcripts tagged as 'basic' in every gene. | GTF, GFF3 |
| Long non-coding RNA gene annotation | CHR | It contains the comprehensive gene annotation of lncRNA genes on the reference chromosomes. | GTF, GFF3 |
| PolyA feature annotation | CHR | It contains the polyA features (polyA_signal, polyA_site, pseudo_polyA) manually annotated by HAVANA on the reference chromosomes. This dataset does not form part of the main annotation file. | GTF, GFF3 |
| Consensus pseudogenes predicted by the Yale and UCSC pipelines | CHR | 2-way consensus (retrotransposed) pseudogenes predicted by the Yale and UCSC pipelines, but not by HAVANA, on the reference chromosomes. This dataset does not form part of the main annotation file. | GTF, GFF3 |
| Predicted tRNA genes | CHR | tRNA genes predicted by ENSEMBL on the reference chromosomes using tRNAscan-SE. This dataset does not form part of the main annotation file. | GTF, GFF3 |
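
The basic annotation is described above as the subset of the comprehensive annotation whose transcripts are tagged as 'basic'. The Python sketch below shows one way that selection could be reproduced from a comprehensive GTF file. It assumes the usual nine tab-separated GTF columns and that the tag appears in the attribute column as `tag "basic";`, which is not stated on this page. The function names and the sample identifiers are hypothetical.

```python
import re
from typing import Dict, Iterable, Iterator, List

# Matches one GTF attribute, e.g.  tag "basic";  or  level 2;
_ATTR_RE = re.compile(r'\s*(\S+)\s+(?:"([^"]*)"|([^";\s]+))\s*;')


def parse_attributes(field: str) -> Dict[str, List[str]]:
    """Parse the 9th GTF column into a dict of key -> list of values.

    Keys such as `tag` can repeat, so every value is kept in a list.
    """
    attrs: Dict[str, List[str]] = {}
    for match in _ATTR_RE.finditer(field):
        key = match.group(1)
        # Group 2 is a quoted value, group 3 an unquoted one.
        value = match.group(2) if match.group(2) is not None else match.group(3)
        attrs.setdefault(key, []).append(value)
    return attrs


def basic_transcript_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield transcript_id for each transcript record tagged 'basic'.

    ASSUMPTION: the tag is written as `tag "basic";` in the attribute column.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "transcript":
            continue  # only transcript-level records are inspected
        attrs = parse_attributes(cols[8])
        if "basic" in attrs.get("tag", []):
            yield attrs["transcript_id"][0]
```

The function takes any iterable of text lines, so a compressed download can be passed as `gzip.open(path, "rt")` without unpacking it first.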

Fasta files

| Content | Regions | Description | Download |
|---|---|---|---|
| Transcript sequences | CHR | Nucleotide sequences of all transcripts on the reference chromosomes. | Fasta |
| Protein-coding transcript sequences | CHR | Nucleotide sequences of coding transcripts on the reference chromosomes. Transcript biotypes: protein_coding, nonsense_mediated_decay, non_stop_decay, IG_*_gene, TR_*_gene, polymorphic_pseudogene, protein_coding_LoF. | Fasta |
| Protein-coding transcript translation sequences | CHR | Amino acid sequences of coding transcript translations on the reference chromosomes. Transcript biotypes: protein_coding, nonsense_mediated_decay, non_stop_decay, IG_*_gene, TR_*_gene, polymorphic_pseudogene, protein_coding_LoF. | Fasta |
| Long non-coding RNA transcript sequences | CHR | Nucleotide sequences of long non-coding RNA transcripts on the reference chromosomes. | Fasta |
| Genome sequence (GRCh38.p14) | ALL | Nucleotide sequence of the GRCh38.p14 genome assembly version on all regions, including reference chromosomes, scaffolds, assembly patches and haplotypes. The sequence region names are the same as in the GTF/GFF3 files. | Fasta |
| Genome sequence, primary assembly (GRCh38) | PRI | Nucleotide sequence of the GRCh38 primary genome assembly (chromosomes and scaffolds). The sequence region names are the same as in the GTF/GFF3 files. | Fasta |
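
The coding-transcript files above are defined by a list of transcript biotypes, two of which (`IG_*_gene`, `TR_*_gene`) are wildcard patterns. The Python sketch below reads FASTA records and keeps those whose biotype matches that list. It assumes that each header in the full transcript file is pipe-delimited with the biotype in the eighth field. That layout is not documented on this page, so check it against the actual file. All function names and sample identifiers are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Iterator, Optional, Tuple

# Biotype list taken from the table above; '*' is treated as a wildcard.
CODING_BIOTYPES = (
    "protein_coding", "nonsense_mediated_decay", "non_stop_decay",
    "IG_*_gene", "TR_*_gene", "polymorphic_pseudogene", "protein_coding_LoF",
)


def is_coding_biotype(biotype: str) -> bool:
    """Return True if the biotype matches any pattern in CODING_BIOTYPES."""
    return any(fnmatchcase(biotype, pattern) for pattern in CODING_BIOTYPES)


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header_without_'>', sequence) pairs from FASTA text lines."""
    header: Optional[str] = None
    chunks = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def header_biotype(header: str) -> Optional[str]:
    """ASSUMPTION: the biotype is the 8th pipe-delimited header field."""
    fields = header.split("|")
    return fields[7] if len(fields) > 7 else None


def coding_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield only the records whose header biotype is a coding biotype."""
    for header, seq in read_fasta(lines):
        biotype = header_biotype(header)
        if biotype is not None and is_coding_biotype(biotype):
            yield header, seq
```

Matching with `fnmatchcase` keeps the wildcard entries case-sensitive, so a biotype such as `IG_V_pseudogene` is not mistaken for `IG_*_gene`.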

Metadata files

| Content | Regions | Description | Download |
|---|---|---|---|
| Annotation remarks | ALL | Remarks made during the manual annotation of the transcript. | Metadata |
| Entrez gene ids | ALL | Entrez gene ids associated with GENCODE transcripts (from Ensembl xref pipeline). | Metadata |
| Exon annotation evidence | ALL | Piece of evidence used in the annotation of an exon (usually peptides, mRNAs, ESTs). | Metadata |
| Gene source | ALL | Source of the gene annotation (Ensembl, Havana, Ensembl-Havana merged model, or imported in the case of small RNA and mitochondrial genes). | Metadata |
| Gene symbol | ALL | HGNC approved gene symbol (from Ensembl xref pipeline). | Metadata |
| PDB id | ALL | PDB entries associated with the transcript (from Ensembl xref pipeline). | Metadata |
| PolyA features | ALL | Manually annotated polyA features overlapping the transcript 3'-end. | Metadata |
| PubMed id | ALL | PubMed ids of publications associated with the transcript (from HGNC website). | Metadata |
| RefSeq | ALL | RefSeq RNA and/or protein associated with the transcript (from Ensembl xref pipeline). | Metadata |
| Selenocysteine | ALL | Amino acid position of a selenocysteine residue in the transcript. | Metadata |
| SwissProt | ALL | UniProtKB/SwissProt entry associated with the transcript (from Ensembl xref pipeline). | Metadata |
| Transcript source | ALL | Source of the transcript annotation. | Metadata |
| Transcript annotation evidence | ALL | Piece of evidence used in the annotation of the transcript. | Metadata |
| TrEMBL | ALL | UniProtKB/TrEMBL entry associated with the transcript (from Ensembl xref pipeline). | Metadata |
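
Most of the metadata files above associate a transcript with one or more external values (a gene symbol, PubMed ids, a RefSeq entry and so on). The Python sketch below loads such a file into a lookup table. It assumes the file is tab-separated with the transcript identifier in the first column and no header row. This page does not describe the file layout, so treat that as a guess to verify. The function name and sample values are hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def load_metadata(lines: Iterable[str]) -> Dict[str, List[Tuple[str, ...]]]:
    """Group metadata rows by the identifier in their first column.

    ASSUMPTION: tab-separated text, identifier first, remaining columns
    are the associated values. An identifier may appear on several rows
    (for example one row per PubMed id), so values are kept in a list.
    """
    table: Dict[str, List[Tuple[str, ...]]] = defaultdict(list)
    # QUOTE_NONE keeps any quote characters in free-text remarks literal.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        table[row[0]].append(tuple(row[1:]))
    return dict(table)
```

Keeping a list per identifier avoids silently dropping rows when one transcript has several associated entries.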