Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion

Nucleic Acids Res. 2012 Jul;40(Web Server issue):W281-7.

doi: 10.1093/nar/gks469. Epub 2012 May 25.

Martin Christen Frølund Thomsen et al. Nucleic Acids Res. 2012 Jul.

Abstract

Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA) and provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and a low number of observations. Moreover, most methods available for sequence logo generation focus on displaying the position-specific enrichment of amino acids, discarding the equally valuable information related to amino acid depletion. Seq2Logo addresses these issues by letting the user apply sequence weighting to correct for data redundancy, pseudo counts to correct for a low number of observations, and different logotype representations, each capturing different aspects of amino acid enrichment and depletion. In addition to peptides and MSAs, Seq2Logo accepts Blast sequence profiles as input, giving non-expert end-users an easy way to characterize and identify functionally conserved or variable amino acids in any given protein of interest. The output from the server is a sequence logo and a position-specific scoring matrix (PSSM). Seq2Logo is available at http://www.cbs.dtu.dk/biotools/Seq2Logo (14 May 2012, date last accessed).
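As an illustration of the "information content" that a sequence logo displays, the following minimal sketch computes the Shannon information of one alignment column in bits. It is not taken from Seq2Logo itself: the function names, the toy peptides, and the choice to ignore non-standard characters such as gaps are assumptions made for the example, and no sequence weighting or pseudo count correction is applied.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_frequencies(peptides, position):
    """Observed amino acid frequencies at one alignment position.

    Characters outside the 20 standard amino acids (e.g. gaps) are ignored.
    Assumes the column contains at least one standard residue.
    """
    counts = Counter(p[position] for p in peptides if p[position] in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: counts.get(aa, 0) / total for aa in AMINO_ACIDS}

def shannon_information(freqs):
    """Information content in bits: log2(alphabet size) minus the column entropy."""
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return math.log2(len(freqs)) - entropy

# Hypothetical toy alignment, not data from the paper.
peptides = ["ALAKA", "ALGKV", "AMAKL"]
conserved = shannon_information(column_frequencies(peptides, 0))  # column is all 'A'
variable = shannon_information(column_frequencies(peptides, 4))   # column is A, V, L
```

A fully conserved column reaches the maximum of log2(20) bits, while a column with three equally frequent residues carries log2(20/3) bits. With so few sequences these raw estimates are unreliable, which is the small-sample problem the abstract refers to.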

Figures

Figure 1.

The submission (left) and graphical layout (right) parts of the web interface. In the submission part, the user specifies the input file, the format of the output files, the logotype and the conditions for handling the input data. In the graphical layout part, the user customizes the layout of the logo plot: page size, stacks per line, lines per page, colours, bars, rotation of position numbers and title.

Figure 2.

Output from Seq2Logo. The upper panel shows the sequence logo calculated from a set of 13 artificial peptide sequences using the specification defined in Figure 1 (sequence weighting using clustering, pseudo count with a weight of 200 and logotype as Kullback–Leibler). Enriched amino acids are shown on the positive _y_-axis and depleted amino acids on the negative _y_-axis. The lower panel gives the position-specific (log-odds) scoring matrix (PSSM) calculated by Seq2Logo. Each line corresponds to a position and gives the consensus amino acid and the log-odds scores for the 20 amino acids.
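The PSSM layout described in this caption (one row per position, holding a consensus amino acid and 20 log-odds scores) can be sketched as follows. This is a simplified illustration under stated assumptions, not the Seq2Logo implementation: the background is taken to be uniform, the pseudo count is a plain mix of observed and background frequencies with a weight `beta` on the prior, the weight `alpha` on the observations is assumed to be the number of sequences minus one, and no sequence weighting is performed. All names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: uniform background; a real tool would typically use empirical frequencies.
BACKGROUND = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}

def log_odds_pssm(peptides, beta=200.0):
    """Return one (consensus, scores) pair per position of equal-length peptides.

    scores[aa] = log2(p / q), where p mixes observed frequencies with the
    background q using weight beta on the prior (assumed scheme).
    """
    if beta <= 0:
        raise ValueError("beta must be positive so that every log-odds score is finite")
    alpha = max(len(peptides) - 1, 0)  # assumed weight on the observed frequencies
    rows = []
    for pos in range(len(peptides[0])):
        counts = Counter(p[pos] for p in peptides)
        scores = {}
        for aa in AMINO_ACIDS:
            observed = counts.get(aa, 0) / len(peptides)
            mixed = (alpha * observed + beta * BACKGROUND[aa]) / (alpha + beta)
            scores[aa] = math.log2(mixed / BACKGROUND[aa])
        consensus = max(scores, key=scores.get)  # highest-scoring amino acid
        rows.append((consensus, scores))
    return rows

# Hypothetical toy peptides; a small beta is used so the observations dominate.
rows = log_odds_pssm(["ALAKA", "ALGKV", "AMAKL"], beta=1.0)
```

Positive scores mark amino acids that are more frequent than the background, and negative scores mark depleted ones. Raising `beta` pulls every score toward zero, reflecting lower confidence in a small sample.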

Figure 3.

Sequence logos generated from small sequence samples. All logos except the right logo in the lower row were calculated from a set of 13 artificial peptide sequences proposed to bind HLA-A*02:01 (see Figure 1). The upper row shows logos calculated by Seq2Logo using (i) neither sequence weighting nor pseudo count correction, (ii) sequence weighting by clustering without pseudo count correction and (iii) sequence weighting by clustering and pseudo count correction with a weight on prior of 200. The lower row shows logos calculated using (i) Weblogo with ‘small sample correction’, (ii) EnoLOGOS and (iii) Seq2Logo, the last from a set of 229 HLA-A*02:01 9mer ligands downloaded from the SYFPEITHI database (12), with sequence weighting by clustering and pseudo count correction with a weight on prior of 200.

Figure 4.

The different logotype representations covered by Seq2Logo. Sequence logos generated from a set of 13 artificial peptide sequences proposed to bind HLA-A*02:01 (see Figure 1). All logos were calculated using clustering and pseudo counts with a weight on prior of 200. Upper row, left panel: Shannon; right panel: Kullback–Leibler. Lower row, left panel: weighted Kullback–Leibler; right panel: probability weighted Kullback–Leibler.
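To make the idea of a two-sided logo concrete, the sketch below computes signed letter heights for a single position from the Kullback–Leibler divergence between the observed and background distributions. The scaling rule used here (each letter's magnitude is its probability times the stack's total information, with the sign taken from whether the amino acid is enriched or depleted) is an assumption made for illustration; the exact definitions of the four logotypes are not given in this text. The three-letter alphabet and all names are hypothetical.

```python
import math

def kl_letter_heights(freqs, background):
    """Signed letter heights for one logo position (assumed scheme).

    Returns (info, heights): info is the Kullback-Leibler divergence in bits,
    heights[aa] is positive for enriched and negative for depleted letters.
    """
    info = sum(
        p * math.log2(p / background[aa]) for aa, p in freqs.items() if p > 0
    )
    heights = {}
    for aa, p in freqs.items():
        # Letters with p == 0 get height 0 here, so without pseudo counts
        # a fully depleted amino acid would not be drawn at all.
        sign = 1.0 if p >= background[aa] else -1.0
        heights[aa] = sign * p * info
    return info, heights

# Toy distributions over a three-letter alphabet, chosen only for illustration.
freqs = {"A": 0.7, "L": 0.2, "G": 0.1}
background = {"A": 0.2, "L": 0.2, "G": 0.6}
info, heights = kl_letter_heights(freqs, background)
```

In this toy case A is enriched and lands above the axis, G is depleted and lands below it, and the absolute heights add up to the total information of the position.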

Figure 5.

PSSM-logo for the N-linked glycosylation motif. The motif was calculated from a set of 2128 unique experimentally verified N-glycosylation sites downloaded from the UniProtKB protein database. Only peptide fragments of length 11 (5 residues before and 5 after the N) were included in the analysis.

Figure 6.

Seq2Logo visualization of a Blast sequence profile for 1K7C chain A. The Blast profile was obtained using Blast2logo (www.cbs.dtu.dk/biotools/Blast2logo; 14 May 2012, date last accessed), searching against the nr70 sequence database with default options. The active site of 1K7C:A is defined by the residues S9, G42, N74, D192 and H195 (13). All these residues show up as highly conserved in the sequence logo.

References

1. Schneider TD, Stephens RM. Sequence logos: a new way to display consensus sequences. Nucleic Acids Res. 1990;18:6097–6100.
2. Crooks GE, Hon G, Chandonia JM, Brenner SE. WebLogo: a sequence logo generator. Genome Res. 2004;14:1188–1190.
3. Workman CT, Yin Y, Corcoran DL, Ideker T, Stormo GD, Benos PV. enoLOGOS: a versatile web tool for energy normalized sequence logos. Nucleic Acids Res. 2005;33:W389–W392.
4. Colaert N, Helsens K, Martens L, Vandekerckhove J, Gevaert K. Improved visualization of protein consensus sequences by iceLogo. Nat. Methods. 2009;6:786–787.
5. Vacic V, Iakoucheva LM, Radivojac P. Two Sample Logo: a graphical representation of the differences between two sets of sequence alignments. Bioinformatics. 2006;22:1536–1537.
