GitHub - SpatialTranscriptomicsResearch/taggd: Genetic barcode demultiplexing

TagGD: Barcode Demultiplexing Utilities for Spatial Transcriptomics Data

TagGD is a Python-based barcode demultiplexer for Spatial Transcriptomics data. It provides a generalized, optimized, and up-to-date version of the original C++ demultiplexer "findIndexes," available here.

For the original peer-reviewed reference to the program, see PLOS ONE.

Overview

The primary goal of TagGD is to extract cDNA barcodes from input files (FASTQ, FASTA, SAM, or BAM) and match them against a list of reference barcodes using a k-mer-based approach. Matched reads are output with barcode and spatial information added to each record.
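The k-mer-based matching described above can be illustrated with a minimal sketch. This is not TagGD's actual implementation: the function names (`build_kmer_index`, `edit_distance`, `match_barcode`), the use of plain Levenshtein distance, and the rule of discarding ties as ambiguous are all assumptions made for illustration. The idea is that a k-mer index narrows the search to reference barcodes sharing at least one k-mer with the read, so the more expensive distance computation runs on only a few candidates.

```python
from collections import defaultdict


def build_kmer_index(barcodes, k):
    """Map every k-mer to the set of reference barcodes that contain it."""
    index = defaultdict(set)
    for barcode in barcodes:
        for i in range(len(barcode) - k + 1):
            index[barcode[i:i + k]].add(barcode)
    return index


def edit_distance(a, b):
    """Plain Levenshtein distance (assumed metric), computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def match_barcode(read_barcode, index, k, max_edit_distance):
    """Return the closest reference barcode, or None if unmatched or tied."""
    # Collect every reference barcode sharing at least one k-mer with the read.
    candidates = set()
    for i in range(len(read_barcode) - k + 1):
        candidates |= index.get(read_barcode[i:i + k], set())

    scored = sorted((edit_distance(read_barcode, c), c) for c in candidates)
    if not scored or scored[0][0] > max_edit_distance:
        return None  # no candidate within the allowed distance
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # two equally close candidates: treated as ambiguous here
    return scored[0][1]
```

For example, with the reference barcodes `ACGTACGT` and `TGCATGCA` and `k = 4`, the read barcode `ACGTACGA` shares the k-mer `ACGT` with the first reference only, so a single distance computation is needed to resolve it.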

TagGD is not limited to spatial data: it can demultiplex any type of index, provided a reference file is supplied. For general-purpose demultiplexing tasks, users can assign placeholder spatial coordinates (X, Y) to each barcode.
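As a sketch of how placeholder coordinates might be generated for a non-spatial index list, the snippet below writes one line per barcode with an increasing X value and a constant Y value. The helper names, the tab separator, and the choice of giving each barcode a distinct coordinate pair are assumptions for illustration, not requirements stated by TagGD.

```python
def dummy_reference_lines(barcodes):
    """Build 'barcode<TAB>x<TAB>y' lines with placeholder coordinates.

    Each barcode gets x = its position in the list and y = 0, so every
    barcode has a distinct (x, y) pair. Both choices are illustrative.
    """
    return [f"{barcode}\t{x}\t0" for x, barcode in enumerate(barcodes)]


def write_dummy_reference(barcodes, path):
    """Write the placeholder reference lines to a text file."""
    with open(path, "w") as handle:
        for line in dummy_reference_lines(barcodes):
            handle.write(line + "\n")
```

Calling `write_dummy_reference(["ACGTACGT", "TGCATGCA"], "barcodes.tsv")` would produce a two-line file that can then be passed as the reference.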

Key Features


Requirements


Installation

From Source

If you are using a virtual environment like Anaconda:

git clone https://github.com/your-repo/taggd.git
cd taggd
python setup.py build
python setup.py install

Or, using pip from the cloned source:

git clone https://github.com/your-repo/taggd.git
cd taggd
pip install .

Using pip

Install directly from PyPI:


Building the Project

If you are contributing, testing or making changes to the code, you may need to build or rebuild the Cython extensions:

python setup.py build_ext --inplace

Testing the Project


Usage

Basic Command

To see all available options, run:

taggd_demultiplex -h

Input Reference File Format

The reference file should contain barcodes and optional spatial coordinates, formatted as follows:

Example:

ACGTACGT 0 0
TGCATGCA 1 1
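To check a reference file before running the demultiplexer, a small validator along the following lines may help. It is a sketch under stated assumptions rather than TagGD's own parser: fields are taken to be whitespace-separated, coordinates are taken to be integers, a line may hold either a barcode alone or a barcode followed by X and Y, and duplicate barcodes are rejected. The function name `parse_reference_lines` is hypothetical.

```python
def parse_reference_lines(lines):
    """Parse reference lines into {barcode: (x, y)} or {barcode: None}.

    Assumes whitespace-separated fields and integer coordinates.
    """
    reference = {}
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) == 1:
            barcode, coords = fields[0], None  # coordinates omitted
        elif len(fields) == 3:
            barcode, coords = fields[0], (int(fields[1]), int(fields[2]))
        else:
            raise ValueError(
                f"line {line_number}: expected 1 or 3 fields, got {len(fields)}"
            )
        if barcode in reference:
            raise ValueError(f"line {line_number}: duplicate barcode {barcode}")
        reference[barcode] = coords
    return reference
```

Because the function accepts any iterable of strings, an open file handle can be passed directly, for instance `parse_reference_lines(open("barcodes.tsv"))`.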


Example Commands

Example

taggd_demultiplex --k 6 --max-edit-distance 3 --overhang 2 --subprocesses 4 --seed randomseed <barcodes.tsv>


Output

TagGD generates the following output files:

Options

Run taggd_demultiplex -h to view all available options and their descriptions.


Contact

For questions, bug reports, or contributions, please contact: