IDconverter and IDClight: conversion and annotation of gene and protein IDs

Andreu Alibés et al. BMC Bioinformatics. 2007.

Abstract

Background: Researchers involved in the annotation of large numbers of gene, clone or protein identifiers are usually required to perform a one-by-one conversion for each identifier. In fields such as microarray experiments, the number of identifiers may be around 30,000.

Results: To help researchers map accession numbers and identifiers among clones, genes, proteins and chromosomal positions, we have designed and developed IDconverter and IDClight, two user-friendly, freely available web server applications. Both also provide additional functional information by mapping the identifiers onto pathways, Gene Ontology terms, and literature references. Both tools are oriented toward high-throughput use and include identifiers from the most common genomic databases. Compared with other similar tools, they are among the fastest and the most up-to-date.

Conclusion: These tools provide a fast and intuitive way of enriching the information coming out of high-throughput experiments like microarrays. They can be valuable both to wet-lab researchers and to bioinformaticians.

Figures

Figure 1

Relationships between the different identifiers. All relationships between identifiers that we have taken into account are displayed. The identifiers are color coded according to the database they are taken from. The path taken from one identifier to another is always the shortest one. Red asterisk: To ensure that pregenerated information is as complete as possible, there are several paths from identifiers in the UniGene database to an Ensembl Gene ID. The script first tries to map the Entrez Gene ID to an Ensembl Gene ID. If this fails, it tries the UniGene Cluster ID, and finally the HUGO name. Red diamond: Gene location is taken from Ensembl, UCSC, or both, as the user chooses.
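
The fallback order described for the red asterisk in Figure 1 (Entrez Gene ID first, then UniGene Cluster ID, then HUGO name) might be sketched as follows. This is only an illustration and not the authors' script: the function name `to_ensembl_gene`, the three lookup tables, and every ID in them are invented placeholders.

```python
from typing import Dict, Optional

# Toy lookup tables with invented placeholder IDs (not real database records).
ENTREZ_TO_ENSEMBL: Dict[str, str] = {"entrez:1": "ensg:A"}
UNIGENE_TO_ENSEMBL: Dict[str, str] = {"ug:2": "ensg:B"}
HUGO_TO_ENSEMBL: Dict[str, str] = {"GENE3": "ensg:C"}


def to_ensembl_gene(entrez: Optional[str] = None,
                    unigene: Optional[str] = None,
                    hugo: Optional[str] = None) -> Optional[str]:
    """Try each route in the order given in the caption; return the first hit."""
    attempts = (
        (ENTREZ_TO_ENSEMBL, entrez),    # 1. Entrez Gene ID
        (UNIGENE_TO_ENSEMBL, unigene),  # 2. UniGene Cluster ID
        (HUGO_TO_ENSEMBL, hugo),        # 3. HUGO name
    )
    for table, key in attempts:
        if key is not None and key in table:
            return table[key]
    return None  # no route produced an Ensembl Gene ID
```

For example, `to_ensembl_gene(entrez="unknown", unigene="ug:2")` falls through the first table and returns the UniGene-based match. Trying the routes in a fixed order keeps the result deterministic when more than one route would succeed.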

Figure 2

Snapshot of IDconverter and IDClight. On the left, IDconverter HTML output for a single human GenBank Accession. On the right, IDClight output for a mouse Ensembl gene.

Figure 3

Input and output possibilities for the four tools compared. Description of the allowed input IDs and the IDs they can be converted to, for MatchMiner (M), SOURCE (S), Onto-Translate (O), and IDconverter (I). Notes: 1 M: cDNA, FISH-mapped BAC; 2 M: Cytogenetic location as input. Cytogenetic location from UCSC, transcription start and end bp; 3 S: Chromosome Location, Cytoband; 4 O: 18 Affymetrix arrays; 5 O: dbEST gi, seq id, protein gi; 6 O: Agilent (5 arrays), Amersham (3), Clontech (22), Operon (3), Perkin Elmer (5), Sigmagenosys (2), Superarray (86), Takara (6); 7 I: Location from Ensembl and UCSC (start bp, end bp, chromosome and strand).

Figure 4

Analysis of the time performance. Time (in seconds) vs. number of input IDs for the twelve tests performed with MatchMiner (black lines), SOURCE (green), Onto-Translate (blue), and IDconverter (red). Abbreviations: Affy: Affymetrix ID; GB: GenBank accession; UG: UniGene cluster; RS_pep: RefSeq_peptide; Entrez: Entrez Gene ID; RS_RNA: RefSeq_RNA.

Figure 5

Analysis of the completeness. The percentage of input IDs that are converted to at least one ID is shown for each type of input and output tested and for the four applications: MatchMiner (black), SOURCE (green), Onto-Translate (blue), and IDconverter (red). Solid colors: The percentage is calculated after running the whole set through the application. Diagonal lines: The application was not able to convert the whole set, so the percentage is taken from a smaller set. Horizontal lines: MatchMiner does not allow the user to specify which Affymetrix array the input IDs belong to. Because the same probeset ID can be present in different Affymetrix arrays, the percentage has to be considered an upper bound.
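
The completeness measure in Figure 5, the percentage of input IDs converted to at least one ID, can be illustrated with a minimal sketch. The function name `completeness` and the dictionary layout (input ID mapped to a list of output IDs) are assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, List


def completeness(conversions: Dict[str, List[str]]) -> float:
    """Percentage of input IDs that were mapped to at least one output ID."""
    if not conversions:
        return 0.0  # avoid dividing by zero for an empty input set
    converted = sum(1 for outputs in conversions.values() if outputs)
    return 100.0 * converted / len(conversions)
```

With four input IDs of which two have a non-empty list of outputs, the function returns 50.0. An input that maps to several output IDs still counts once, which matches the "at least one ID" wording of the caption.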
