What is a Third Party Annotation (TPA) Sequence?

Important update
Beginning in January 2025, new submissions of the TPA-Exp and TPA-Inf types will no longer be accepted. Please see the INSDC TPA Announcement for more information.

TPA: A database designed to capture experimental or inferential results that support submitter-provided annotation for, or assembly of, sequence data that the submitter did not determine directly but instead derived from primary data in GenBank.

There are several types of TPA records:

TPA database records differ from GenBank and RefSeq records:

A TPA sequence is derived or assembled from primary sequence data currently found in the DDBJ/EMBL/GenBank International Nucleotide Sequence Database. It can be a genomic or mRNA sequence, assembled or derived from primary genomic and/or mRNA sequences. TPA sequences are submitted to DDBJ/EMBL/GenBank as part of the process of publishing biological experiments that include the assembly and/or annotation of existing, primary nucleotide sequences.

Examples of TPA sequences are:

Note: All new annotations must be experimentally shown to exist, either directly or indirectly. Bioinformatic or computational work alone is not sufficient evidence to support new annotation.

What is a primary sequence?

'Primary' sequences used to assemble a TPA sequence are those that have been experimentally determined and are now publicly available in the GenBank/EMBL/DDBJ databases. These include: SRA data (reported with their SRR numbers), Whole Genome Shotgun (WGS) contig sequences (not master records), Transcriptome Shotgun Assembly (TSA) sequences, ESTs, and Trace Archive database sequences. Primary sequences may not come from a proprietary database. Each primary sequence used to assemble a TPA sequence must be identified by a GenBank accession number in the TPA sequence submission.

Reference sequences (RefSeqs) may not be cited as data used to build TPA sequences because RefSeqs are not primary data. For example, sequences with accession numbers such as NT_112233 or NW_123456 represent contig sequences; the sequences used to assemble these contigs, which are listed at the bottom of the contig records, should be cited in a TPA sequence submission instead. Sequences with accession numbers such as XM_345678 or NM_123456 are RefSeq mRNA records; they are not themselves experimentally determined primary data and therefore cannot be cited.
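The accession rules above can be checked mechanically before a submission is assembled. The sketch below is a hypothetical pre-check, not an NCBI tool: it assumes RefSeq accessions always have a two-letter prefix followed by an underscore (NT_, NW_, XM_, NM_, NZ_, ...), and it uses a heuristic for WGS master records (a 4- or 6-letter prefix, a 2-digit version, then a numeric part made only of zeros). The function name `citable_problem` is invented for illustration.

```python
from __future__ import annotations

import re

# RefSeq accessions: two letters, an underscore, optional letters (e.g. NZ_ WGS
# RefSeqs), digits, and an optional .version. Examples: NM_123456, NT_112233.1.
REFSEQ_RE = re.compile(r"^[A-Z]{2}_[A-Z]*\d+(\.\d+)?$")

# Heuristic (assumption): WGS accessions are a 4- or 6-letter prefix, a 2-digit
# version, and a numeric part; a master record's numeric part is all zeros
# (e.g. AAAA01000000), while contigs are non-zero (e.g. AAAA01000001).
WGS_RE = re.compile(r"^([A-Z]{4}|[A-Z]{6})(\d{2})(\d{6,})(\.\d+)?$")


def citable_problem(accession: str) -> str | None:
    """Return why an accession cannot be cited as primary data, or None if no rule fires."""
    acc = accession.strip().upper()
    if REFSEQ_RE.match(acc):
        return "RefSeq record (not primary data); cite its component sequences"
    m = WGS_RE.match(acc)
    if m and set(m.group(3)) == {"0"}:
        return "WGS master record; cite the individual contig accessions"
    return None


if __name__ == "__main__":
    for acc in ["NM_123456", "NW_123456", "AAAA01000000", "AAAA01000001", "SRR000001"]:
        print(acc, "->", citable_problem(acc) or "no rule violated")
```

A `None` result only means neither heuristic fired; it does not confirm that the accession exists or is public.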

How Do TPA Sequence Records Differ from Other GenBank/EMBL/DDBJ Records?

The display of a TPA sequence is similar to other GenBank/INSDC records, but includes the following:

Other Features and References are similar to those displayed in other GenBank/EMBL/DDBJ records.
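In the GenBank flat-file view, a TPA record also carries a PRIMARY block that maps spans of the TPA sequence (TPA_SPAN) onto spans of the cited primary records (PRIMARY_IDENTIFIER, PRIMARY_SPAN, with a "c" in COMP for the complement strand). The sketch below parses such a block and reports TPA positions not covered by any primary span. It assumes whitespace-separated columns with 1-based inclusive ranges written as `start-end`. The sample rows and the XX000001.1/XX000002.1 identifiers are invented for illustration and are not real records; the helper names are hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple


class PrimaryRow(NamedTuple):
    tpa_start: int
    tpa_end: int
    primary_id: str
    primary_start: int
    primary_end: int
    complement: bool


def parse_primary_block(text: str) -> list[PrimaryRow]:
    """Parse PRIMARY block rows (assumed whitespace-separated, 1-based spans)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "PRIMARY":  # skip blanks and the header line
            continue
        tpa_span, primary_id, primary_span = fields[:3]
        comp = len(fields) > 3 and fields[3].lower() == "c"
        ts, te = (int(x) for x in tpa_span.split("-"))
        ps, pe = (int(x) for x in primary_span.split("-"))
        rows.append(PrimaryRow(ts, te, primary_id, ps, pe, comp))
    return rows


def coverage_gaps(rows: list[PrimaryRow], tpa_length: int) -> list[tuple[int, int]]:
    """Return (start, end) ranges of the TPA sequence not covered by any row."""
    gaps = []
    next_pos = 1  # first TPA position not yet known to be covered
    for start, end in sorted((r.tpa_start, r.tpa_end) for r in rows):
        if next_pos > tpa_length:
            break
        if start > next_pos:
            gaps.append((next_pos, min(start - 1, tpa_length)))
        next_pos = max(next_pos, end + 1)
    if next_pos <= tpa_length:
        gaps.append((next_pos, tpa_length))
    return gaps


# Invented example block; identifiers and spans are placeholders.
SAMPLE = """
PRIMARY     TPA_SPAN            PRIMARY_IDENTIFIER PRIMARY_SPAN        COMP
            1-426               XX000001.1         7-432
            427-1542            XX000002.1         1-1116              c
"""

if __name__ == "__main__":
    rows = parse_primary_block(SAMPLE)
    print(coverage_gaps(rows, 1542))
```

An empty gap list means every base of the TPA sequence is traced back to some cited primary span, which is the property the assembly description above relies on.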

An example of a TPA:experimental record is BK000016.

An example of a TPA:inferential record is BK000554.

An example of a TPA:assembly record is BK010317.

TPA sequence records are shared by all three collaborating databases and can be found with the usual search terms in the GenBank Nucleotide and Protein databases (e.g., submitter name, gene/protein name, or accession number).
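The same searches can be run programmatically through NCBI's E-utilities. The sketch below only builds an ESearch URL and does not send the request; it assumes the `nuccore` and `protein` database names and the Entrez field tags `[accn]` (accession), `[auth]` (author), and `[gene]` (gene name). The helper name `esearch_url` is hypothetical.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str, db: str = "nuccore") -> str:
    """Build an E-utilities ESearch URL for an Entrez query (no request is made)."""
    params = {"db": db, "term": term, "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    # Look up one of the example TPA records by accession.
    print(esearch_url("BK000016[accn]"))
    # Search the Protein database by gene name instead.
    print(esearch_url("rbcL[gene]", db="protein"))
```

Fetching the URL (for example with `urllib.request.urlopen`) returns a JSON list of matching UIDs; NCBI's usage guidelines on request rates and API keys apply.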

How to Submit TPA Sequence Data

Sequences can be submitted to the TPA database through BankIt:

When are TPA sequences released?

What should NOT be submitted to TPA