GENCODE - Human Release 47
Release 47 (GRCh38.p14)
- Statistics of this release
- More information about this assembly (including patches, scaffolds and haplotypes)
- Go to GRCh37 version of this release
GTF / GFF3 files
Content | Regions | Description | Download |
---|---|---|---|
Comprehensive gene annotation | CHR | It contains the comprehensive gene annotation on the reference chromosomes only | GTF GFF3 |
Comprehensive gene annotation | ALL | It contains the comprehensive gene annotation on the reference chromosomes, scaffolds, assembly patches and alternate loci (haplotypes) | GTF GFF3 |
Comprehensive gene annotation | PRI | It contains the comprehensive gene annotation on the primary assembly (chromosomes and scaffolds) sequence regions | GTF GFF3 |
Basic gene annotation | CHR | It contains the basic gene annotation on the reference chromosomes only. This is a subset of the corresponding comprehensive annotation, including only those transcripts tagged as 'basic' in every gene. This is the main annotation file for most users. | GTF GFF3 |
Basic gene annotation | ALL | It contains the basic gene annotation on the reference chromosomes, scaffolds, assembly patches and alternate loci (haplotypes). This is a subset of the corresponding comprehensive annotation, including only those transcripts tagged as 'basic' in every gene. This is a superset of the main annotation file. | GTF GFF3 |
Basic gene annotation | PRI | It contains the basic gene annotation on the primary assembly (chromosomes and scaffolds) sequence regions. This is a subset of the corresponding comprehensive annotation, including only those transcripts tagged as 'basic' in every gene. | GTF GFF3 |
Long non-coding RNA gene annotation | CHR | It contains the comprehensive gene annotation of lncRNA genes on the reference chromosomes | GTF GFF3 |
PolyA feature annotation | CHR | It contains the polyA features (polyA_signal, polyA_site, pseudo_polyA) manually annotated by HAVANA on the reference chromosomes. This dataset does not form part of the main annotation file. | GTF GFF3 |
Consensus pseudogenes predicted by the Yale and UCSC pipelines | CHR | 2-way consensus (retrotransposed) pseudogenes predicted by the Yale and UCSC pipelines, but not by HAVANA, on the reference chromosomes. This dataset does not form part of the main annotation file. | GTF GFF3 |
Predicted tRNA genes | CHR | tRNA genes predicted by ENSEMBL on the reference chromosomes using tRNAscan-SE. This dataset does not form part of the main annotation file. | GTF GFF3 |
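
The basic annotation is described above as the subset of transcripts tagged as 'basic'. As a rough illustration, the same selection could be sketched against a comprehensive GTF file as follows. This is a minimal sketch, assuming the standard nine-column, tab-separated GTF layout with `key "value";` pairs in the last column and a repeatable `tag` key; the function names and the toy records are hypothetical and are not taken from the release files.

```python
import re

# One GTF attribute: a key, a space, a quoted or bare value, then ';'.
_ATTR_RE = re.compile(r'(\S+) (?:"([^"]*)"|([^";\s]+));')


def parse_attributes(field):
    """Parse the 9th GTF column into a dict mapping each key to a list of values."""
    attrs = {}
    for key, quoted, bare in _ATTR_RE.findall(field):
        # Keys such as 'tag' may repeat, so every value is kept in a list.
        attrs.setdefault(key, []).append(quoted or bare)
    return attrs


def basic_transcript_ids(lines):
    """Yield the transcript_id of every 'transcript' line tagged as basic."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "transcript":
            continue
        attrs = parse_attributes(cols[8])
        if "basic" in attrs.get("tag", []):
            yield attrs["transcript_id"][0]


# Toy records for illustration only (not real GENCODE entries).
sample = [
    "##description: toy example\n",
    "\t".join(["chr1", "HAVANA", "transcript", "100", "900", ".", "+", ".",
               'gene_id "G1"; transcript_id "T1"; tag "basic"; tag "CCDS"; level 2;']) + "\n",
    "\t".join(["chr1", "HAVANA", "transcript", "100", "800", ".", "+", ".",
               'gene_id "G1"; transcript_id "T2"; level 2;']) + "\n",
]
print(list(basic_transcript_ids(sample)))  # prints ['T1']
```

If a downloaded file is gzip-compressed, passing `gzip.open(path, "rt")` from the Python standard library in place of `sample` should work, since the function only needs an iterable of text lines.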
Fasta files
Content | Regions | Description | Download |
---|---|---|---|
Transcript sequences | CHR | Nucleotide sequences of all transcripts on the reference chromosomes | Fasta |
Protein-coding transcript sequences | CHR | Nucleotide sequences of coding transcripts on the reference chromosomes. Transcript biotypes: protein_coding, nonsense_mediated_decay, non_stop_decay, IG_*_gene, TR_*_gene, polymorphic_pseudogene, protein_coding_LoF | Fasta |
Protein-coding transcript translation sequences | CHR | Amino acid sequences of coding transcript translations on the reference chromosomes. Transcript biotypes: protein_coding, nonsense_mediated_decay, non_stop_decay, IG_*_gene, TR_*_gene, polymorphic_pseudogene, protein_coding_LoF | Fasta |
Long non-coding RNA transcript sequences | CHR | Nucleotide sequences of long non-coding RNA transcripts on the reference chromosomes | Fasta |
Genome sequence (GRCh38.p14) | ALL | Nucleotide sequence of the GRCh38.p14 genome assembly version on all regions, including reference chromosomes, scaffolds, assembly patches and haplotypes. The sequence region names are the same as in the GTF/GFF3 files. | Fasta |
Genome sequence, primary assembly (GRCh38) | PRI | Nucleotide sequence of the GRCh38 primary genome assembly (chromosomes and scaffolds). The sequence region names are the same as in the GTF/GFF3 files. | Fasta |
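
For working with the sequence files listed above, a small reader is often enough. The following is a minimal sketch, assuming plain FASTA text (a `>` header line followed by one or more sequence lines) and assuming that the transcript identifier is the first `|`-separated field of each transcript header; that header layout is an assumption to check against the downloaded file, and the function names and toy records are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA text lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def transcript_id(header):
    # Assumption: the transcript ID is the first '|'-separated header field.
    return header.split("|")[0]


# Toy records for illustration only (not real GENCODE entries).
fasta = [">T1|G1|NAME-201|NAME|8|protein_coding|", "ACGT", "ACGT", ">T2|G1|", "GG"]
lengths = {transcript_id(h): len(s) for h, s in read_fasta(fasta)}
print(lengths)  # prints {'T1': 8, 'T2': 2}
```

The reader joins wrapped sequence lines, so the reported length does not depend on the line width used in the file.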
Metadata files
Content | Regions | Description | Download |
---|---|---|---|
Annotation remarks | ALL | Remarks made during the manual annotation of the transcript | Metadata |
Entrez gene ids | ALL | Entrez gene ids associated with GENCODE transcripts (from Ensembl xref pipeline) | Metadata |
Exon annotation evidence | ALL | Piece of evidence used in the annotation of an exon (usually peptides, mRNAs, ESTs) | Metadata |
Gene source | ALL | Source of the gene annotation (Ensembl, Havana, Ensembl-Havana merged model or imported in the case of small RNA and mitochondrial genes) | Metadata |
Gene symbol | ALL | HGNC approved gene symbol (from Ensembl xref pipeline) | Metadata |
PDB id | ALL | PDB entries associated with the transcript (from Ensembl xref pipeline) | Metadata |
PolyA features | ALL | Manually annotated polyA features overlapping the transcript 3'-end | Metadata |
PubMed id | ALL | PubMed ids of publications associated with the transcript (from HGNC website) | Metadata |
RefSeq | ALL | RefSeq RNA and/or protein associated with the transcript (from Ensembl xref pipeline) | Metadata |
Selenocysteine | ALL | Amino acid position of a selenocysteine residue in the transcript | Metadata |
SwissProt | ALL | UniProtKB/SwissProt entry associated with the transcript (from Ensembl xref pipeline) | Metadata |
Transcript source | ALL | Source of the transcript annotation | Metadata |
Transcript annotation evidence | ALL | Piece of evidence used in the annotation of the transcript | Metadata |
TrEMBL | ALL | UniProtKB/TrEMBL entry associated with the transcript (from Ensembl xref pipeline) | Metadata |
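
Most of the metadata files above associate a transcript with one or more external values. A lookup table could be built along these lines. This is a minimal sketch, assuming each file is tab-separated with the GENCODE identifier in the first column and the associated value(s) in the remaining columns; that layout is an assumption to verify against each downloaded file, and the function name and toy rows are hypothetical.

```python
from collections import defaultdict


def load_metadata(lines):
    """Map each first-column ID to the list of value tuples found for it."""
    table = defaultdict(list)
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 2:
            continue  # skip blank or malformed lines
        # One ID may appear on several lines, so values accumulate in a list.
        table[cols[0]].append(tuple(cols[1:]))
    return dict(table)


# Toy rows for illustration only (not real GENCODE entries).
rows = ["T1\t111\n", "T1\t222\n", "T2\t333\n", "\n"]
print(load_metadata(rows))
# prints {'T1': [('111',), ('222',)], 'T2': [('333',)]}
```

Keeping every remaining column as a tuple avoids assuming a fixed number of value columns per file.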