Artificial genome sequences and Illumina reads - Nucleotide divergence (0.01% to 80%) from the SARS-CoV-2 reference (MN908947.3)
Published March 1, 2024 | Version v1
Dataset Open
Authors/Creators
- 1. Genomics and Bioinformatics Unit, Department of Infectious Diseases, National Institute of Health Doutor Ricardo Jorge (INSA), Lisbon, Portugal
Description
This dataset comprises artificial genome sequences, and simulated Illumina 150 bp paired-end reads derived from them, with varying percentages of homogeneous nucleotide divergence from the SARS-CoV-2 reference genome (MN908947.3), ranging from 0.01% to 80%. For each sample, a total of 1990 Illumina 150 bp paired-end reads were simulated, targeting a 10-fold depth of coverage.
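The relationship between read count and depth of coverage stated above can be checked with a short calculation. The sketch below is illustrative only: it assumes the reference length of 29,903 bp for MN908947.3 and uses the simple approximation depth = (reads x read length) / genome length. The function and variable names are hypothetical and are not taken from the repository code.

```python
# Illustrative check of the read count against the 10-fold target.
# Assumption: MN908947.3 is 29,903 bp long; names below are hypothetical.

GENOME_LENGTH = 29_903   # bp, SARS-CoV-2 reference MN908947.3
READ_LENGTH = 150        # bp per read
TARGET_DEPTH = 10        # fold coverage
REPORTED_READS = 1990    # reads per sample, as stated in the description


def reads_for_depth(genome_length, read_length, depth):
    """Reads needed so that total sequenced bases equal depth * genome_length."""
    return depth * genome_length / read_length


def depth_from_reads(n_reads, read_length, genome_length):
    """Mean depth of coverage obtained from a given number of reads."""
    return n_reads * read_length / genome_length


ideal_reads = reads_for_depth(GENOME_LENGTH, READ_LENGTH, TARGET_DEPTH)
realized_depth = depth_from_reads(REPORTED_READS, READ_LENGTH, GENOME_LENGTH)

print(f"reads needed for exactly {TARGET_DEPTH}x: {ideal_reads:.1f}")
print(f"depth obtained with {REPORTED_READS} reads: {realized_depth:.2f}x")
```

Under these assumptions, 1990 reads (995 pairs) correspond to a mean depth just under 10-fold, which is consistent with the stated target.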
Reads were simulated using ART (https://doi.org/10.1093/bioinformatics/btr708).
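For orientation, a paired-end simulation with ART's `art_illumina` program might be set up roughly as follows. This is a sketch, not the command used for this dataset: the read length (150) and fold coverage (10) come from the description, whereas the sequencing system profile (`HS25`), the mean fragment size (200), its standard deviation (10), and the file names are assumptions chosen for illustration. The helper `build_art_command` is hypothetical; it only assembles the argument list and does not run ART.

```python
import shlex


def build_art_command(reference_fasta, out_prefix,
                      read_length=150, fold_coverage=10,
                      mean_fragment=200, sd_fragment=10,
                      seq_system="HS25"):
    """Assemble an art_illumina paired-end command as an argument list.

    read_length and fold_coverage follow the dataset description;
    mean_fragment, sd_fragment and seq_system are ASSUMED values.
    """
    return [
        "art_illumina",
        "-ss", seq_system,          # built-in sequencing system profile (assumed)
        "-i", reference_fasta,      # input genome in FASTA format
        "-p",                       # paired-end simulation
        "-l", str(read_length),     # read length in bp
        "-f", str(fold_coverage),   # fold coverage
        "-m", str(mean_fragment),   # mean fragment size (assumed)
        "-s", str(sd_fragment),     # fragment size standard deviation (assumed)
        "-o", out_prefix,           # prefix of the output files
    ]


# Hypothetical file names, for illustration only.
cmd = build_art_command("sample.fasta", "sample_R")
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a system where ART is installed. The code and instructions provided with the dataset remain the reference for the exact parameters.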
The code and instructions to reproduce the artificial sequences and reads are also available in this repository.
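The repository code is the authoritative description of how the artificial sequences were produced. As a rough illustration of what a given percentage of nucleotide divergence means, the sketch below substitutes a fixed fraction of positions, chosen uniformly at random, with a different base. Treating "homogeneous" divergence as uniformly distributed substitutions is an assumption, and the functions `diverge` and `observed_divergence` are hypothetical names, not part of the dataset's code.

```python
import random

BASES = "ACGT"


def diverge(sequence, divergence, seed=0):
    """Return a copy of `sequence` with a fraction `divergence` of its
    A/C/G/T positions substituted by a different base.

    Positions are sampled uniformly without replacement (an assumption
    about what 'homogeneous' divergence means).
    """
    rng = random.Random(seed)
    seq = list(sequence.upper())
    candidates = [i for i, base in enumerate(seq) if base in BASES]
    n_substitutions = round(divergence * len(candidates))
    for i in rng.sample(candidates, n_substitutions):
        # Always pick a base different from the original one.
        seq[i] = rng.choice([b for b in BASES if b != seq[i]])
    return "".join(seq)


def observed_divergence(a, b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Toy 1000 bp sequence standing in for the real reference genome.
reference = "ACGT" * 250
variant = diverge(reference, 0.05)
print(f"observed divergence: {observed_divergence(reference, variant):.2%}")
```

Because each selected position is forced to change, the observed divergence matches the requested fraction up to rounding of the substitution count.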
Files
fasta.zip
Files (3.2 MB)