Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion
Nucleic Acids Res. 2012 Jul;40(Web Server issue):W281-7.
doi: 10.1093/nar/gks469. Epub 2012 May 25.
- PMID: 22638583
- PMCID: PMC3394285
- DOI: 10.1093/nar/gks469
Martin Christen Frølund Thomsen et al. Nucleic Acids Res. 2012 Jul.
Abstract
Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA) and provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and a low number of observations. Moreover, most methods available for sequence logo generation focus on displaying the position-specific enrichment of amino acids, discarding the equally valuable information related to amino acid depletion. Seq2Logo aims to resolve these issues by allowing the user to include sequence weighting to correct for data redundancy, pseudo counts to correct for a low number of observations, and different logotype representations, each capturing different aspects of amino acid enrichment and depletion. Besides accepting input in the form of peptides and MSAs, Seq2Logo accepts Blast sequence profiles as input, giving non-expert end-users easy access to characterize and identify functionally conserved or variable amino acids in any given protein of interest. The output from the server is a sequence logo and a position-specific scoring matrix (PSSM). Seq2Logo is available at http://www.cbs.dtu.dk/biotools/Seq2Logo (14 May 2012, date last accessed).
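The two corrections named in the abstract can be illustrated with a minimal Python sketch. It is not the Seq2Logo implementation: `duplicate_weights` is a hypothetical stand-in that only down-weights exact duplicates (the server offers clustering-based weighting), and `column_frequencies` blends observed counts with a uniform background prior, which is an assumed simplification of the pseudo count scheme. The function names and the `prior_weight` parameter are illustrative.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def duplicate_weights(peptides):
    """Assumed, simplified redundancy correction: each peptide gets
    weight 1 / (number of identical copies in the input)."""
    copies = Counter(peptides)
    return [1.0 / copies[p] for p in peptides]


def column_frequencies(peptides, weights=None, prior_weight=0.0, background=None):
    """Per-position amino acid frequencies from equal-length peptides.

    Observed (weighted) counts are blended with a background prior;
    prior_weight controls how strongly the prior pulls the estimate.
    """
    if weights is None:
        weights = [1.0] * len(peptides)
    if background is None:
        # Assumption: uniform background over the 20 amino acids.
        background = {a: 1.0 / len(AMINO_ACIDS) for a in AMINO_ACIDS}

    total = sum(weights)  # effective number of observations
    profile = []
    for pos in range(len(peptides[0])):
        counts = Counter()
        for peptide, weight in zip(peptides, weights):
            counts[peptide[pos]] += weight
        column = {
            a: (counts[a] + prior_weight * background[a]) / (total + prior_weight)
            for a in AMINO_ACIDS
        }
        profile.append(column)
    return profile
```

With `prior_weight=0` the function returns plain weighted frequencies; any positive value gives every amino acid a non-zero frequency, which keeps later log-odds calculations finite.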
Figures
Figure 1.
The submission (left) and graphical layout (right) parts of the web interface. In the submission part, the user specifies the input file, the format of output files, the logotype and the conditions for the handling of the input data. In the graphical layout part, the user customizes the layout of the logo plot: page size, stacks per line, lines per page, colours, bars, rotation of position numbers and title.
Figure 2.
Output from Seq2Logo. The upper panel shows the sequence logo calculated from a set of 13 artificial peptide sequences using the specification defined in Figure 1 (sequence weighting using clustering, pseudo count with a weight of 200 and logotype as Kullback–Leibler). Enriched amino acids are shown on the positive y-axis and depleted amino acids on the negative y-axis. The lower panel gives the position-specific (log-odds) scoring matrix (PSSM) calculated by Seq2Logo. Each line corresponds to a position and gives the consensus amino acid and the log-odds scores for the 20 amino acids.
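One row of a log-odds matrix of the kind described in this caption could be computed as in the following Python sketch. It assumes scores of the form log2(p/q), with p the estimated frequency and q the background frequency; the exact scaling and units used by Seq2Logo are not stated here and may differ. The function name `log_odds_row` is hypothetical.

```python
import math


def log_odds_row(frequencies, background):
    """Return (consensus, scores) for one alignment position.

    frequencies and background map each residue to a probability.
    Frequencies must be > 0, which pseudo count correction ensures.
    """
    scores = {
        residue: math.log2(p / background[residue])
        for residue, p in frequencies.items()
    }
    # The consensus is taken as the most frequent residue at this position.
    consensus = max(frequencies, key=frequencies.get)
    return consensus, scores
```

Positive scores mark residues that are more frequent than the background (enriched) and negative scores mark depleted residues.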
Figure 3.
Sequence logos generated from small sequence samples. All logos except the right logo in the lower row were calculated from a set of 13 artificial peptide sequences proposed to bind HLA-A*02:01 (see Figure 1). The upper row shows logos calculated by Seq2Logo using: (i) no sequence weighting and no pseudo count correction, (ii) sequence weighting by clustering and no pseudo count correction and (iii) sequence weighting by clustering and pseudo count correction with a weight on prior of 200. The lower row shows logos calculated using: (i) WebLogo with ‘small sample correction’, (ii) enoLOGOS and (iii) Seq2Logo from a set of 229 HLA-A*02:01 9mer ligands downloaded from the SYFPEITHI database (12) with sequence weighting by clustering and pseudo count correction with a weight on prior of 200.
Figure 4.
The different logotype representations covered by Seq2Logo. Sequence logos generated from a set of 13 artificial peptide sequences proposed to bind HLA-A*02:01 (see Figure 1). All logos were calculated using clustering and pseudo counts with a weight on prior of 200. Upper row, left panel: Shannon; right panel: Kullback–Leibler. Lower row, left panel: weighted Kullback–Leibler; right panel: probability weighted Kullback–Leibler.
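The difference between the Shannon and Kullback–Leibler logotypes can be sketched in Python using the standard definitions of the two information measures. The letter-height rule in `two_sided_heights` (height proportional to frequency, sign set by enrichment relative to background) is an assumption made for illustration, not a statement of how Seq2Logo scales its stacks; the weighted variants are not covered. All function names are hypothetical.

```python
import math


def shannon_information(freqs):
    """Shannon information content in bits: log2(N) minus the entropy,
    where N is the alphabet size (20 for amino acids)."""
    entropy_term = sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return math.log2(len(freqs)) + entropy_term


def kullback_leibler_information(freqs, background):
    """Kullback-Leibler divergence in bits between the observed
    frequencies and the background distribution."""
    return sum(
        p * math.log2(p / background[residue])
        for residue, p in freqs.items()
        if p > 0
    )


def two_sided_heights(freqs, background):
    """Assumed letter heights for a two-sided logo: magnitude p * I,
    positive if the residue is enriched (p >= q), negative if depleted."""
    info = kullback_leibler_information(freqs, background)
    heights = {}
    for residue, p in freqs.items():
        sign = 1.0 if p >= background[residue] else -1.0
        heights[residue] = sign * p * info
    return heights
```

With a uniform background the two measures coincide, so the logotypes differ only when the background distribution is non-uniform and in whether depleted residues are drawn below the axis.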
Figure 5.
PSSM-logo for the N-linked glycosylation motif. The motif was calculated from a set of 2128 unique experimentally verified N-glycosylation sites downloaded from the UniProtKB protein database. Only peptide fragments of length 11 (5 residues before and 5 after the N) were included in the analysis.
Figure 6.
Seq2Logo visualization of a Blast sequence profile for 1K7C chain A. The Blast profile was obtained using Blast2logo (www.cbs.dtu.dk/biotools/Blast2logo (14 May 2012, date last accessed)) searching against the nr70 sequence database with default options. The active site of 1K7C:A is defined by the residues S9, G42, N74, D192 and H195 (13). All these residues show up as highly conserved in the sequence logo.