GitHub - INSaFLU/findONTime: The findONTime tool runs concurrently with MinION sequencing and merges (at user-defined time intervals) the FASTQ files that are being generated in real time for each sample. It can also automatically upload the files to the INSaFLU-TELEVIR platform and launch the metagenomics virus detection analysis using the TELEVIR module.


The findONTime tool runs concurrently with MinION sequencing and merges (at user-defined time intervals) the FASTQ files that are being generated in real time for each sample. It can also automatically upload the files to the INSaFLU-TELEVIR platform (https://insaflu.insa.pt/) and launch the metagenomics virus detection analysis using the TELEVIR module.

Motivation

Reducing the time needed for pathogen detection and the sequencing costs per sample is crucial in the context of diagnostics using metagenomics sequencing. In fact, when performing hypothesis-free viral diagnosis by sequencing complex biological samples, the proportion of the virus in a sample is unknown. As such, the amount of sequencing data, and consequently run length, needed to accurately detect the virus cannot be predicted a priori.

findONTime runs concurrently with MinION sequencing and monitors the FASTQ files that are being generated in real time for each sample. It merges the files (at user-defined time intervals), uploads them to the INSaFLU-TELEVIR platform and launches the metagenomics virus detection analysis using the TELEVIR module.

This allows users to detect a virus in a sample as early as possible during the sequencing run, reducing the time gap between obtaining the sample and the diagnosis, and also reducing sequencing costs (as ONT runs can be stopped at any time and the flow cells can be cleaned and reused). findONTime can be used as a “start-to-end” solution or for particular tasks (e.g., merging ONT output files, metadata preparation and upload to INSaFLU-TELEVIR).

Details

Upload reads to INSaFLU-TELEVIR

findONTime can interact with the INSaFLU-TELEVIR platform in two ways:

Note: Automatic upload to INSaFLU-TELEVIR website accounts is not available yet. If you only have an online account (and not a local INSaFLU installation), findONTime can still run concurrently with MinION sequencing to monitor and concatenate the FASTQ files that are being generated in real time for each sample, and to prepare metadata templates ready to be uploaded to INSaFLU-TELEVIR.
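The concatenation step described above can be illustrated with a minimal sketch. This is not findONTime's own code: the function name `merge_fastq_gz` and the file layout are assumptions made for illustration. It relies on the fact that gzip files can be joined byte-for-byte into a valid multi-member gzip file.

```python
import gzip
import shutil
from pathlib import Path


def merge_fastq_gz(sample_dir: Path, merged_path: Path) -> int:
    """Concatenate every *.fastq.gz file in sample_dir into merged_path.

    Hypothetical helper, not part of findONTime. Gzip members can be
    appended to one another without decompression; the result is still a
    valid gzip file. merged_path should live outside sample_dir so that it
    is never picked up as an input on a later call.
    Returns the number of files merged.
    """
    parts = sorted(sample_dir.glob("*.fastq.gz"))
    merged_path.parent.mkdir(parents=True, exist_ok=True)
    with open(merged_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # raw byte copy, no re-compression
    return len(parts)
```

Because the files are copied as raw bytes, the read order in the merged file follows the sorted file names, which does not matter for downstream read classification.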

Launch a virus detection analysis (TELEVIR)

If you have a local INSaFLU-TELEVIR installation (docker or server) and set the "--televir" argument, findONTime can create one INSaFLU-TELEVIR (virus detection) project including the samples under ONT sequencing. The project name is defined by the user (--tag argument) and the sample names are those of the input sub-directories (usually barcode01, barcode02, etc.) with an extra user-defined tag as suffix.
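As a rough illustration of the naming rule, the sketch below derives sample names from the sub-directories of an input directory. The function name `sample_names` and the underscore separator are assumptions; the source only states that the tag is appended as a suffix.

```python
from pathlib import Path


def sample_names(in_dir: Path, tag: str, sep: str = "_") -> dict[str, str]:
    """Map each sample sub-directory name to '<directory name><sep><tag>'.

    Hypothetical helper: the separator used by findONTime is not documented
    here, so '_' is only a placeholder.
    """
    return {
        sub.name: f"{sub.name}{sep}{tag}"
        for sub in sorted(in_dir.iterdir())
        if sub.is_dir()  # ignore loose files such as run reports
    }
```

For an input directory containing barcode01 and barcode02 and the tag "run1", this returns names of the form barcode01_run1 and barcode02_run1.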

Input Files

Config must contain:

see example config.ini

API

usage: findontime [-h] -i IN_DIR -o OUT_DIR [-s SLEEP] [-n TAG] [--config CONFIG] [--max_size MAX_SIZE] [--merge] [--downsize] [--upload {last,all,none}] [--connect {docker,ssh}] [--keep_names] [--monitor] [--televir]

Process fastq files.

optional arguments:
  -h, --help            show this help message and exit
  -i IN_DIR, --in_dir IN_DIR
                        Input directory
  -o OUT_DIR, --out_dir OUT_DIR
                        Output directory
  -s SLEEP, --sleep SLEEP
                        Sleep time between checks in monitor mode (default 600 seconds)
  -n TAG, --tag TAG     name tag, if given, will be added to the output file names
  --config CONFIG       config file
  --max_size MAX_SIZE   max size of the output file, in kilobytes (default 400000 kbytes)
  --merge               merge files
  --downsize            downsize fastq files to max_size
  --upload {last,all,none}
                        file upload strategy (default: last)
  --connect {docker,ssh}
                        file upload strategy (default: docker)
  --keep_names          keep original file names
  --monitor             monitor directory until killed
  --televir             deploy televir pathogen identification on each sample
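To make the interface easier to read, here is a sketch of an `argparse` parser that mirrors the help text above. It is a reconstruction from that help text, not the tool's actual source: the argument types (`int` for `--sleep` and `--max_size`) and the wording of the `--connect` help string are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented findontime options (sketch)."""
    parser = argparse.ArgumentParser(
        prog="findontime", description="Process fastq files."
    )
    parser.add_argument("-i", "--in_dir", required=True, help="Input directory")
    parser.add_argument("-o", "--out_dir", required=True, help="Output directory")
    parser.add_argument("-s", "--sleep", type=int, default=600,
                        help="Sleep time between checks in monitor mode (seconds)")
    parser.add_argument("-n", "--tag",
                        help="name tag added to the output file names")
    parser.add_argument("--config", help="config file")
    parser.add_argument("--max_size", type=int, default=400000,
                        help="max size of the output file, in kilobytes")
    parser.add_argument("--merge", action="store_true", help="merge files")
    parser.add_argument("--downsize", action="store_true",
                        help="downsize fastq files to max_size")
    parser.add_argument("--upload", choices=["last", "all", "none"],
                        default="last", help="file upload strategy")
    # Assumed meaning: how to reach the local INSaFLU-TELEVIR installation.
    parser.add_argument("--connect", choices=["docker", "ssh"],
                        default="docker", help="connection type (assumed)")
    parser.add_argument("--keep_names", action="store_true",
                        help="keep original file names")
    parser.add_argument("--monitor", action="store_true",
                        help="monitor directory until killed")
    parser.add_argument("--televir", action="store_true",
                        help="deploy televir pathogen identification on each sample")
    return parser
```

Calling `build_parser().parse_args(["-i", "in", "-o", "out", "--merge", "--upload", "none"])` yields a namespace with the documented defaults for every option that was not given.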

REQUIREMENTS

INSTALLATION

python -m venv .venv
source .venv/bin/activate
python -m pip install findontime

USAGE

(from simple to more advanced usage situations)

findontime -i input_directory -o output_directory --tag suffix --max_size 100000000 --merge --upload none

NOTE: In this simpler usage case, the fastq.gz files will only be merged, i.e., they will not be automatically uploaded to the INSaFLU-TELEVIR platform. If you want to concatenate all ONT files from the same sample (file_0.fastq.gz, file_1.fastq.gz, etc.), set a "max_size" large enough (e.g., 100000000 kbytes) to ensure that the merged file includes all partial files. findONTime will also prepare a metadata table ready to be uploaded to INSaFLU-TELEVIR.

findontime -i input_directory -o output_directory --tag suffix -s 600 --max_size 100000000 --monitor --merge --upload none

NOTE: In this simpler usage case, the fastq.gz files will only be merged, i.e., they will not be automatically uploaded to the INSaFLU-TELEVIR platform. If you want to concatenate all ONT files from the same sample (file_0.fastq.gz, file_1.fastq.gz, etc.), set a "max_size" large enough (e.g., 100000000 kbytes) to ensure that the merged file includes all partial files. findONTime will also prepare a metadata table ready to be uploaded to INSaFLU-TELEVIR.
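The behaviour of `--monitor` together with `-s` can be pictured as a polling loop. The sketch below is an assumption about the general pattern, not the tool's implementation: `monitor`, `on_new_files` and `max_checks` are hypothetical names, and `max_checks` exists only so the loop can be stopped in a demonstration.

```python
import time
from pathlib import Path
from typing import Callable, Optional


def monitor(
    in_dir: Path,
    on_new_files: Callable[[list[Path]], None],
    sleep: float = 600,
    max_checks: Optional[int] = None,
) -> None:
    """Poll in_dir for new *.fastq.gz files and hand each new batch to a callback.

    Hypothetical sketch. With max_checks=None the loop runs until the
    process is killed, which matches the documented --monitor behaviour.
    """
    seen: set[Path] = set()
    checks = 0
    while max_checks is None or checks < max_checks:
        current = set(in_dir.rglob("*.fastq.gz"))
        new = sorted(current - seen)
        if new:
            on_new_files(new)  # e.g. merge the new files into the sample output
            seen.update(new)
        checks += 1
        if max_checks is None or checks < max_checks:
            time.sleep(sleep)  # wait before the next directory scan
```

A real implementation would also need to guard against files that are still being written when the directory is scanned; that check is left out here for brevity.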

findontime -i input_directory -o output_directory --tag suffix -s 600 --max_size 400000 --monitor --merge --upload none --downsize

NOTE: In this case, the fastq.gz files will only be concatenated (i.e., they will not be automatically uploaded to the INSaFLU-TELEVIR platform), but the merged files will be downsized to the user-defined "max_size" (e.g., 400000 kbytes). This usage is useful to prepare files (reads and metadata) ready to be uploaded to the online INSaFLU-TELEVIR platform, which currently limits uploads to a maximum of 400 MB per file.
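How findONTime downsizes files is not described here, so the following is only one possible approach, shown to make the size arithmetic concrete: compute the fraction of the file that fits the budget (400000 kbytes × 1000 = 400,000,000 bytes, i.e. 400 MB) and randomly keep that fraction of reads. The function names, the 1 kbyte = 1000 bytes convention and the random subsampling strategy are all assumptions.

```python
import gzip
import os
import random
from pathlib import Path


def keep_fraction(path: Path, max_size_kb: int) -> float:
    """Fraction of the file at `path` that fits within max_size_kb (capped at 1.0)."""
    size_bytes = os.path.getsize(path)
    if size_bytes == 0:
        return 1.0
    return min(1.0, max_size_kb * 1000 / size_bytes)  # assumes 1 kbyte = 1000 bytes


def downsize_fastq_gz(src: Path, dst: Path, max_size_kb: int, seed: int = 0) -> int:
    """Randomly subsample 4-line FASTQ records so dst is roughly within budget.

    Hypothetical sketch: the output size is only approximately bounded,
    because reads differ in length and compress differently.
    Returns the number of reads kept.
    """
    fraction = keep_fraction(src, max_size_kb)
    rng = random.Random(seed)  # seeded for reproducible subsampling
    kept = 0
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        while True:
            record = [fin.readline() for _ in range(4)]
            if not record[0]:  # end of file
                break
            if rng.random() < fraction:
                fout.writelines(record)
                kept += 1
    return kept
```

When the source file is already below the budget, the fraction is 1.0 and every read is kept.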

findontime -i input_directory -o output_directory --tag suffix -s 600 --max_size 400000 --monitor --connect docker --merge --upload last

NOTE: In this case, the fastq.gz files will be concatenated and automatically uploaded to the INSaFLU-TELEVIR platform. The merged files will be downsized to the user-defined "max_size" (e.g., 400000 kbytes), which must match the maximum upload file size defined in your local INSaFLU-TELEVIR installation.

findontime -i input_directory -o output_directory --tag suffix -s 600 --max_size 400000 --monitor --connect docker --merge --upload last --televir

NOTE: In this case, the fastq.gz files will be concatenated, automatically uploaded to the INSaFLU-TELEVIR platform and run under a virus detection (TELEVIR) project. The merged files will be downsized to the user-defined "max_size" (e.g., 400000 kbytes), which must match the maximum upload file size defined in your local INSaFLU-TELEVIR installation.

TESTING

Running pytest in the root directory will run all tests that do not interact with INSaFLU-TELEVIR. To test the upload and metagenomics functionalities, you need to provide a valid config file for a local docker installation and pass the --docker flag to pytest:

pytest --docker --config-file config.ini

MAIN OUTPUT

Note: The output directory structure is maintained.

Maintainers

Funding

The development of the findONTime tool was co-funded by the project “Sustainable use and integration of enhanced infrastructure into routine genome-based surveillance and outbreak investigation activities in Portugal” on behalf of the EU4H programme (EU4H-2022-DGA-MS-IBA-1). It was also financed by the DURABLE project. The DURABLE project has been co-funded by the European Union, under the EU4Health Programme (EU4H), Project no. 101102733. Views and opinions expressed are however those of the authors only and do not necessarily reflect those of the European Union or the European Health and Digital Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.

CITATION

If you run findONTime, please cite this GitHub page:

João D. Santos, André Santos, Joana Isidro, Miguel Pinto, João P. Gomes, Daniel Sobral, Vítor Borges (2023). findONTime: A bioinformatics tool for real-time metagenomics virus detection analysis using ONT technology and the INSaFLU-TELEVIR platform. https://github.com/INSaFLU/findONTime