Artificial genome sequences and Illumina reads - Nucleotide divergence (0.01% to 80%) from the SARS-CoV-2 reference (MN908947.3)

Published March 1, 2024 | Version v1

Dataset Open

Authors/Creators

Description

This dataset comprises artificial genome sequences, and simulated Illumina 150 bp paired-end reads derived from them, with varying percentages of homogeneous nucleotide divergence from the SARS-CoV-2 reference genome (MN908947.3), ranging from 0.01% to 80%. For each sample, a total of 1990 reads were generated, targeting a 10-fold depth of coverage.
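The relationship between read count and depth, and the idea of a fixed percentage of divergence, can be illustrated with a minimal Python sketch. It is not the code distributed with this record. It assumes that "homogeneous" means substitutions are placed uniformly at random along the genome, and it uses the 29,903-base length of MN908947.3; the function names `mean_depth` and `diverge` are hypothetical.

```python
import random

GENOME_LENGTH = 29_903  # length of MN908947.3 in bases
READ_LENGTH = 150       # read length stated in the description


def mean_depth(n_reads, read_length=READ_LENGTH, genome_length=GENOME_LENGTH):
    """Average depth of coverage given a total number of reads."""
    return n_reads * read_length / genome_length


def diverge(sequence, fraction, seed=0):
    """Return a copy of `sequence` with round(fraction * length) substitutions.

    Assumption: positions are drawn uniformly at random without replacement,
    and each selected base is replaced by a different base from A, C, G, T.
    """
    rng = random.Random(seed)
    n_substitutions = round(fraction * len(sequence))
    positions = rng.sample(range(len(sequence)), n_substitutions)
    bases = list(sequence)
    for pos in positions:
        bases[pos] = rng.choice([b for b in "ACGT" if b != bases[pos]])
    return "".join(bases)


if __name__ == "__main__":
    # 1990 reads of 150 bp over 29,903 bases is close to the 10-fold target.
    print(f"mean depth: {mean_depth(1990):.2f}")

    # Toy example on a short placeholder sequence, not the real reference.
    toy = "ACGT" * 250
    mutated = diverge(toy, 0.10)
    observed = sum(a != b for a, b in zip(toy, mutated)) / len(toy)
    print(f"observed divergence: {observed:.2%}")
```

Sampling positions without replacement keeps the realised divergence equal to the requested fraction (up to rounding), which would not hold if each site were mutated independently with a given probability.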

Reads were simulated using ART (https://doi.org/10.1093/bioinformatics/btr708).
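As a rough illustration of what such a simulation call can look like, the sketch below assembles an `art_illumina` command line in Python. Only the read length (150 bp), paired-end mode, and 10-fold coverage come from the description above. The error profile (`HS25`), the fragment size mean and standard deviation, and the file names are assumptions chosen for the example, so the parameters actually used for this dataset may differ.

```python
import shlex


def art_command(reference_fasta, out_prefix, fold_coverage=10, read_length=150,
                mean_fragment=300, sd_fragment=30, profile="HS25"):
    """Build an art_illumina argument list for paired-end simulation.

    Assumed values (not stated in the dataset description): profile,
    mean_fragment, sd_fragment.
    """
    return [
        "art_illumina",
        "-ss", profile,             # built-in sequencing system profile
        "-i", reference_fasta,      # input genome in FASTA format
        "-p",                       # paired-end reads
        "-l", str(read_length),     # read length in bases
        "-f", str(fold_coverage),   # fold depth of coverage
        "-m", str(mean_fragment),   # mean fragment size
        "-s", str(sd_fragment),     # fragment size standard deviation
        "-o", out_prefix,           # prefix for the output files
    ]


if __name__ == "__main__":
    # Hypothetical file names; print the command rather than running it.
    print(shlex.join(art_command("sample.fasta", "sample_R")))
```

The command is only printed here; passing the list to `subprocess.run` would execute it on a system where ART is installed.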

The code and instructions to reproduce the artificial sequences and reads are also available in this repository.

Files (3.2 MB)

fasta.zip