Introduction | IlluminaConnectedAnnotations

Illumina Connected Annotations provides translational research-grade annotation of genomic variants (SNVs, MNVs, insertions, deletions, indels, STRs, gene fusions, and SVs, including CNVs). It can be run as a stand-alone package or integrated into larger software tools that require variant annotation.

Illumina Connected Annotations takes VCF files as input and produces a structured JSON representation of all annotation and sample information (as extracted from the VCF). It handles multiple alternate alleles and multiple samples.
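To make "multiple alternate alleles and multiple samples" concrete, here is a minimal sketch of how one VCF data line carries both. It is not the tool's own parser. The `VcfRecord` class, the `parse_vcf_line` function, and the sample names are hypothetical and used only for illustration. The column layout follows the VCF specification.

```python
from dataclasses import dataclass


@dataclass
class VcfRecord:
    chrom: str
    pos: int
    ref: str
    alts: list[str]           # one entry per alternate allele
    samples: dict[str, str]   # sample name -> raw per-sample field


def parse_vcf_line(line: str, sample_names: list[str]) -> VcfRecord:
    """Split one tab-delimited VCF data line into a few core fields."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt = fields[:5]
    # Columns 1-8 are fixed, column 9 is FORMAT, and every column after
    # that holds the data for one sample.
    sample_values = fields[9:]
    return VcfRecord(
        chrom=chrom,
        pos=int(pos),
        ref=ref,
        alts=alt.split(","),  # multiple alternate alleles are comma-separated
        samples=dict(zip(sample_names, sample_values)),
    )


# Hypothetical line with two alternate alleles and two samples.
line = "chr1\t12345\t.\tA\tC,T\t50\tPASS\t.\tGT\t0/1\t1/2"
record = parse_vcf_line(line, ["sampleA", "sampleB"])
```

In this sketch, `record.alts` holds both alternate alleles and `record.samples` maps each sample name to its genotype string.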

The software is developed under a rigorous software development life cycle (SDLC) and testing process to ensure the accuracy of the results and to enable embedding in other software. Illumina Connected Annotations uses a continuous integration pipeline in which millions of variant annotations are checked against baseline values daily.

What does Illumina Connected Annotations annotate?

We use Sequence Ontology consequences to describe how each variant impacts a given transcript.

Reference genome

For GRCh37, our latest reference genome uses version GRCh37.p13. The list of chromosomes and contigs in this version can be checked in this file.

For GRCh38, our latest reference genome uses version GRCh38.p14. The list of chromosomes and contigs in this version can be checked in this file.

caution

There was a bug in our previous genome reference. Previously, we generated the genome reference based on GRCh38.p13 version 109.20190607 for the chromosome and contig names, but used GRCh38.p12 version 109 for the FASTA sequence. This resulted in a mismatch between some alt contig names and their FASTA sequences. The mismatch can cause minor issues in the variant ID and HGVS g. notation for variants located on the affected alt contigs. It does not affect variants on the main chromosomes (chromosomes 1-22, X, and Y).

With the GRCh38.p14 release, we have fixed this issue. The GRCh38.p14 release is also backward compatible with previous releases: older transcript and gene model files and all supplementary annotation data work with the new genome reference version without any issue. We advise updating to the current genome reference version.
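If you have results produced with the older GRCh38 reference, one way to triage them is to separate variants on the main chromosomes (unaffected, per the caution above) from everything else. The following is a minimal sketch under stated assumptions. The `is_main_chromosome` helper is hypothetical and not part of Illumina Connected Annotations, and it assumes contig names appear either with or without a `chr` prefix.

```python
import re

# Matches chromosomes 1-22, X, and Y, with or without a "chr" prefix.
_MAIN_CHROMOSOME = re.compile(r"^(chr)?([1-9]|1[0-9]|2[0-2]|X|Y)$")


def is_main_chromosome(name: str) -> bool:
    """Return True only for chromosomes 1-22, X, and Y."""
    return _MAIN_CHROMOSOME.match(name) is not None


# Hypothetical contig names, used only for illustration.
contigs = ["chr1", "22", "chrX", "chr1_KI270706v1_random", "chr23"]

# Anything that is not a main chromosome is only *possibly* affected:
# the caution states that some, not all, alt contigs had the mismatch.
needs_review = [c for c in contigs if not is_main_chromosome(c)]
```

A `False` result here does not mean a variant is wrong. It only means the contig falls outside the set that the caution explicitly lists as unaffected.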

Transcript and Gene Models

The transcript and gene models are obtained from RefSeq and Ensembl. The current officially supported versions for GRCh38 are:

| Data Source | Version | Release Date |
|---|---|---|
| RefSeq | GCF_000001405.40-RS_2024_08 | 2024-08-26 |
| RefSeq | GCF_000001405.40-RS_2023_10 | 2023-10-07 |
| RefSeq | GCF_000001405.40-RS_2023_03 | 2023-03-21 |
| Ensembl | 113 | 2024-10-18 |
| Ensembl | 112 | 2024-05-14 |
| Ensembl | 110 | 2023-04-27 |
| Ensembl | 108 | 2022-10-20 |

For GRCh37:

| Data Source | Version | Release Date |
|---|---|---|
| RefSeq | 105.20220307 | 2022-03-10 |
| Ensembl | 110 | 2023-02-08 |
| Ensembl | 108 | 2022-10-20 |

note

For the GRCh37 Ensembl release, although the latest supported version is 110 (newer release versions 111 and 112 also exist), the annotation data is effectively the same as version 87, which was released in 2017.

For the GRCh37 RefSeq release, NCBI has stopped releasing new gene annotations, and version 105.20220307 is the last release for GRCh37.

For the gene symbols that we support in the transcript models above, the latest data is from the HGNC website as of 2024-06-03.

More supported versions can be listed and downloaded using our DataManager.

Supplementary Annotation

In addition, Illumina Connected Annotations uses external data sources to provide additional context for each variant. These sources are divided into two tiers: Professional and Basic. The Basic tier can be accessed free of charge. The Professional tier requires a license. Please see Licensed Content for details. For access, please contact annotation_support@illumina.com.

| Data Source | Availability | Latest Supported Version |
|---|---|---|
| COSMIC | Professional | 102 |
| OMIM | Professional | 20250815 |
| Primate AI-3D | Professional | 1.0 |
| Promoter AI | Professional | 1.0 |
| Splice AI | Professional | 1.3 |
| PhyloPPrimate | Basic | 1.0 |
| 1000 Genomes Project | Basic | Phase 3 v3plus |
| Alpha Missense | Basic | 1.0 |
| Cancer Hotspots | Basic | 2017 |
| ClinGen | Basic | 20250815 |
| ClinVar | Basic | 20250806 |
| DANN | Basic | 20200205 |
| dbSNP | Basic | 156 |
| DECIPHER | Basic | 201509 |
| FusionCatcher | Basic | 1.33 |
| GERP | Basic | 20110522 |
| GME Variome | Basic | 20160618 |
| gnomAD (GRCh37) | Basic | 2.1 |
| gnomAD (GRCh38) | Basic | 4.1 |
| MITOMAP | Basic | 20200819 |
| MultiZ 100 way | Basic | 20171006 |
| REVEL | Basic | 20200205 |
| TOPMed | Basic | freeze 5 |
| AlphaMissense | Basic | 1.0 |
| ABRaOM | Basic | SABE-WGS-1171 |

Data Management

To check for the latest version, find available versions, and download data, please use our DataManager utility.

Download

Please visit Illumina Connected Annotations.