Dbfetch Help

What is Dbfetch?

Dbfetch is an abbreviation for "database fetch". Dbfetch provides an easy way to retrieve entries from various databases at the EMBL-EBI in a consistent manner. It can be used from any browser as well as from within a web-aware scripting tool that uses wget, lynx or similar.

How to use Dbfetch?

  1. Select a database:
    -If you are using the first form to paste your search items: choose a database name from this form.
    -If you are using the second form to upload your search items: the database name is included at the beginning of each line of the upload file, followed by a colon.
  2. Enter search terms:
    These MUST BE in the appropriate database format. Up to 200 search items can be queried in one run.
    -If you are using the first form: separate search items with a comma or a space.
    -If you are using the second form: put each search item on a new line.
  3. Choose an output format:
    Here you can choose the simpler fasta format or the default format for the chosen database.
  4. Style:
    You can get your results as text or html.
  5. Retrieve!
    You are now ready to fetch your results by pressing the Retrieve button.

Search Items

You may enter up to 200 search items for your chosen database. Multiple search terms should be separated by EITHER a space OR a comma.

e.g. ENA Sequence: " AE014292,AE017197,AE017354 " or " AE014292 AE017197 AE017354 "
e.g. UniProtKB: " 1433X_MAIZE,1433T_RAT,ACR2_YEAST " or " 1433X_MAIZE 1433T_RAT ACR2_YEAST "
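
The separator rule above can be checked before submitting a query. The following is a minimal sketch, not part of Dbfetch itself: the function name `split_search_items` is hypothetical, and it is deliberately lenient in that it also accepts a mix of commas and spaces.

```python
import re

MAX_ITEMS = 200  # limit stated in this help text


def split_search_items(text):
    """Split a pasted query on commas and/or whitespace, dropping empty pieces."""
    items = [item for item in re.split(r"[,\s]+", text.strip()) if item]
    if len(items) > MAX_ITEMS:
        raise ValueError(f"{len(items)} items given; the limit is {MAX_ITEMS}")
    return items


# Both example spellings yield a three-item list.
print(split_search_items(" AE014292,AE017197,AE017354 "))
print(split_search_items(" 1433X_MAIZE 1433T_RAT ACR2_YEAST "))
```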

Upload File

Here you may upload a file in the specified format. You may retrieve up to 200 entries, and all entries in the uploaded file must belong to the same database.

Each entry you wish to retrieve MUST be on a new line and in the format: "database name":"id"

e.g. ENA Sequence

 ena_sequence:AE014292
 ena_sequence:AE017197
 ena_sequence:AE017354
 

e.g. UniProt

 uniprot:A1AG1_HUMAN
 uniprot:A1AT_PIG
 uniprot:ACR2_YEAST
 

Output Format

This option sets the sequence/database format of the results. Sequence formats are simply the way in which the amino acid or DNA sequence is recorded in a computer file. Different programs expect different formats, so if you are to submit a job successfully, it is important to understand what the various formats look like. To learn more about sequence formats, please see the EMBOSS documentation at http://emboss.open-bio.org/html/use/apa.html

The default is usually the native format of the specified database, which can also be selected explicitly in the drop-down list (e.g. the default for the UniProtKB database is "uniprot" format). See an example of this format at http://web.expasy.org/docs/userman.html

Fasta format contains a one line header followed by lines of sequence data. Sequences in fasta formatted files are preceded by a line starting with a ">" symbol. The first word on this line is the name of the sequence. The rest of the line is a description of the sequence. Example:

 >uniprot|P13346|FOSB_MOUSE Protein fosB.
 MFQAFPGDYDSGSRCSSSPSAESQYLSSVDSFGSPPTAAASQECAGLGEMPGSFVPTVTA
 ITTSQDLQWLVQPTLISSMAQSQGQPLASQPPAVDPYDMPGTSYSTPGLSAYSTGGASGS
 GGPSTSTTTSGPVSARPARARPRRPREETLTPEEEEKRRVRRERNKLAAAKCRNRRRELT
 DRLQAETDQLEEEKAELESEIAELQKEKERLEFVLVAHKPGCKIPYEEGPGPGPLAEVRD
 LPGSTSAKEDGFGWLLPPPPPPPLPFQSSRDAPPNLTASLFTHSEVQVLGDPFPVVSPSY
 TSSFVLTCPEVSAFAGAQRTSGSEQPSDPLNSPSLLAL

Learn more about this format at http://www.wikipedia.org/wiki/FASTA_format.

If no format is specified, as in an http request, the default format will be used.
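
The fasta layout described in this section (a ">" header line whose first word is the name and whose remainder is the description, followed by sequence lines) can be read with a few lines of code. The sketch below is an illustration only, not a tool provided by Dbfetch; `parse_fasta` is a hypothetical name, and the sample record is shortened to the first two sequence lines of the FOSB_MOUSE example.

```python
def parse_fasta(text):
    """Yield (name, description, sequence) for each record in fasta text."""
    name = description = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, description, "".join(chunks)
            header = line[1:].split(None, 1)
            name = header[0] if header else ""
            description = header[1] if len(header) > 1 else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, description, "".join(chunks)


sample = """>uniprot|P13346|FOSB_MOUSE Protein fosB.
MFQAFPGDYDSGSRCSSSPSAESQYLSSVDSFGSPPTAAASQECAGLGEMPGSFVPTVTA
ITTSQDLQWLVQPTLISSMAQSQGQPLASQPPAVDPYDMPGTSYSTPGLSAYSTGGASGS
"""

for name, description, sequence in parse_fasta(sample):
    print(name, "|", description, "|", len(sequence), "residues")
```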

ENA XML formats: two XML formats are available from the ENA Sequence database.
EMBLXML : XML format for the ENA Sequence nucleotide sequence database, developed internally.
INSDXML : XML format for the ENA Sequence nucleotide sequence database, developed in collaboration with NCBI (GenBank) and DDBJ.
The DTD for INSDXML and the DTD/XML Schema for EMBLXML can be found here.

Style

The results can be delivered either as raw text or as html; specify here which style you prefer. The default style is html. For programmatic access to the Dbfetch functionality, we recommend using our new Web Services version of Dbfetch: WSDbfetch

Alternatively, you can use Dbfetch for direct access. Making scripted http requests to Dbfetch is very simple. The parameters that can be used are db, id, format and style; of these, only db and id are required. If format and/or style are omitted, the defaults for the chosen database are used (the default style is always html).

The URL to Dbfetch is always of this format: dbfetch?db=DB_NAME&id=IDS&format=FORMAT_NAME&style=STYLE_NAME

For details of the available databases, formats and styles see the list of databases. Additional examples are provided in the syntax guide.

Examples:
https://www.ebi.ac.uk/Tools/dbfetch/dbfetch?db=ena_sequence&id=J00231,K00650,D87894,AJ242600

Instead of the default html style, entries can also be retrieved as plain text (raw):
https://www.ebi.ac.uk/Tools/dbfetch/dbfetch?db=ena_sequence&id=J00231,K00650,D87894,AJ242600&style=raw

It is also possible to retrieve Fasta formatted sequences:
https://www.ebi.ac.uk/Tools/dbfetch/dbfetch?db=ena_sequence&id=J00231,K00650,D87894,AJ242600&format=fasta

For backward compatibility, the program can also be called by giving just one or more INSDC accession numbers or entry names:
https://www.ebi.ac.uk/Tools/dbfetch/dbfetch?J00231
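
Request URLs of the db/id/format/style form can also be assembled in a script. The following is a minimal sketch using only the Python standard library; the function name `dbfetch_url` is hypothetical, and the base URL is taken from the examples in this section. Commas are left unescaped so that the result matches the URLs shown here.

```python
from urllib.parse import urlencode

# Base URL as it appears in the examples in this help page.
BASE_URL = "https://www.ebi.ac.uk/Tools/dbfetch/dbfetch"


def dbfetch_url(db, ids, fmt=None, style=None):
    """Build a Dbfetch URL; db and ids are required, fmt and style are optional."""
    params = {"db": db, "id": ",".join(ids)}
    if fmt is not None:
        params["format"] = fmt
    if style is not None:
        params["style"] = style
    # safe="," keeps the comma-separated id list readable.
    return f"{BASE_URL}?{urlencode(params, safe=',')}"


print(dbfetch_url("ena_sequence", ["J00231", "K00650", "D87894", "AJ242600"], style="raw"))
```

The resulting string can be passed to wget, lynx or any http client. Leaving out fmt and style lets the defaults for the chosen database apply.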

How do I cite Dbfetch?

Madeira F., Pearce M., Tivey A.R.N., Basutkar P., Lee J., Edbali O., Madhusoodanan N., Kolesnikov A., Lopez R. (2022) Search and sequence analysis tools services from EMBL-EBI in 2022. Nucleic Acids Research 2022 Apr. PubMed Id: 35412617. DOI: 10.1093/nar/gkac240

How to get support?

First, please see if your issue is covered by the Frequently Asked Questions for Dbfetch. If your issue is not addressed in the FAQ, then please contact us for assistance.