PLAAC: a web and command-line application to identify proteins with prion-like amino acid composition

Journal Article

1Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142, 2Department of Pathology, Beth Israel Deaconess Medical Center, 3Center for Biomedical Informatics, Harvard Medical School, 10 Shattuck Street, Boston, MA 02115, USA, 4Department of Biology, 5Howard Hughes Medical Institute, MIT, 77 Massachusetts Avenue, Cambridge, MA 02139 and 6Department of Cell and Developmental Biology, University of Massachusetts Medical School, 55 Lake Avenue North, Worcester, MA 01655, USA

*To whom correspondence should be addressed.

Received: 10 January 2014

Revision received: 01 April 2014

Alex K. Lancaster, Andrew Nutter-Upham, Susan Lindquist, Oliver D. King, PLAAC: a web and command-line application to identify proteins with prion-like amino acid composition, Bioinformatics, Volume 30, Issue 17, September 2014, Pages 2501–2502, https://doi.org/10.1093/bioinformatics/btu310

Summary: Prions are self-templating protein aggregates that stably perpetuate distinct biological states and are of keen interest to researchers in both evolutionary and biomedical science. The best understood prions are from yeast and have a prion-forming domain with strongly biased amino acid composition, most notably enriched for Q or N. PLAAC is a web application that scans protein sequences for domains with prion-like amino acid composition. Users can upload sequence files, or paste sequences directly into a textbox. PLAAC ranks the input sequences by several summary scores and allows scores along sequences to be visualized. Text output files can be downloaded for further analyses, and visualizations saved in PDF and PNG formats.

Availability and implementation: http://plaac.wi.mit.edu/. The Ruby-based web framework and the command-line software (implemented in Java, with visualization routines in R) are available at http://github.com/whitehead/plaac under the MIT license. All software can be run under OS X, Windows and Unix.

Contact: oliver.king@umassmed.edu or lindquist_admin@wi.mit.edu

1 INTRODUCTION

Prions are proteins that can switch from non-aggregated states to self-templating highly ordered aggregates. This property allows them to confer stable changes in biological states that are of great interest in molecular and evolutionary biology (Newby and Lindquist, 2013). For example, they create neurodegenerative diseases, perpetuate activity states in neural synapses and provide access to a broad realm of phenotypic diversification in microbes. The ability to identify potential prion-like proteins from sequence data would speed the search for new prions across a wide variety of taxa. We previously developed (Alberti et al., 2009) a hidden Markov model (HMM) to identify candidate prions and parse these candidates into prion-like domains (PrLDs) and non-PrLDs, on the basis of amino acid (AA) composition. Briefly, the HMM has two hidden states, for PrLD and background, and the output symbols are the 20 AAs. The output probabilities for the PrLD state were constructed based on the AA frequencies in the PrLDs of four prions of Saccharomyces cerevisiae that were known at the time. This algorithm and extensions have since been used in several studies to identify prion-like sequences in yeast (Holmes et al., 2013) and also in humans (Kim et al., 2013; King et al., 2012), in which several proteins with PrLDs are associated with ALS and related neurodegenerative disorders. Here we describe a web-based front end to the prion-prediction algorithm, PLAAC, and give an overview of implementation and extensions; further details are provided on the PLAAC Web site.
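The two-state HMM described above can be illustrated with a minimal Viterbi sketch. This is not the PLAAC implementation: the emission tables, the transition probability `stay` and the initial probability `start_prld` are hypothetical values chosen only to make the example run, and the function names are ours.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def make_toy_emissions():
    """Return illustrative emission tables (NOT the frequencies used by PLAAC)."""
    # Toy background: all 20 AAs equally likely.
    background = {aa: 1.0 / 20 for aa in AMINO_ACIDS}
    # Toy PrLD state: Q and N get 0.25 each; the other 18 AAs share the rest.
    prld = {aa: 0.5 / 18 for aa in AMINO_ACIDS}
    prld["Q"] = 0.25
    prld["N"] = 0.25
    return {"background": background, "PrLD": prld}


def viterbi(seq, emissions, stay=0.98, start_prld=0.05):
    """Most probable state path for seq under a two-state HMM (log space)."""
    if not seq:
        return []
    states = ("background", "PrLD")
    log = math.log
    start = {"background": log(1.0 - start_prld), "PrLD": log(start_prld)}
    # Hypothetical symmetric transitions: remain with probability `stay`.
    trans = {s: {t: log(stay if s == t else 1.0 - stay) for t in states}
             for s in states}

    score = [{s: start[s] + log(emissions[s][seq[0]]) for s in states}]
    back = []  # back[i][t] = best predecessor of state t at position i + 1
    for aa in seq[1:]:
        prev = score[-1]
        row, ptr = {}, {}
        for t in states:
            best = max(states, key=lambda s: prev[s] + trans[s][t])
            row[t] = prev[best] + trans[best][t] + log(emissions[t][aa])
            ptr[t] = best
        score.append(row)
        back.append(ptr)

    # Trace back from the best final state.
    state = max(states, key=lambda s: score[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path
```

Calling `viterbi(sequence, make_toy_emissions())` returns one state label per residue; contiguous runs labelled `"PrLD"` correspond to the candidate prion-like domains of the parse. Sequences containing non-standard residue codes would need to be filtered first, as the toy tables cover only the 20 standard AAs.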

2 FEATURES AND METHODS

PLAAC supports the scanning of single protein sequences for potential PrLDs, as well as the scanning of whole proteomes. The user can specify a minimum length for prion domains (set by a textbox, by default _L_core = 60), and can optionally use organism-specific background AA frequencies in the HMM instead of the default S. cerevisiae background frequencies. These frequencies can be computed from the uploaded sequences, or selected from precomputed organism-specific frequencies (set by a dropdown list). A parameter α (set via a slider) allows continuous interpolation between organism-specific background frequencies (α = 0) and S. cerevisiae background frequencies (α = 1). We have used α = 0.5 when scanning other species, reflecting our uncertainty in the degree to which the corresponding PrLD AA frequencies are skewed toward S. cerevisiae background frequencies (as opposed to being species-independent).
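As a sketch of how the α parameter could act, the snippet below mixes two background frequency tables. It assumes that the interpolation is linear, which the text does not state explicitly; the function name and the example frequencies are hypothetical.

```python
def interpolate_background(organism_freqs, yeast_freqs, alpha):
    """Blend two AA frequency tables.

    alpha = 0 returns the organism-specific frequencies and alpha = 1 the
    S. cerevisiae frequencies. Linear mixing is an assumption of this sketch.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if set(organism_freqs) != set(yeast_freqs):
        raise ValueError("both tables must cover the same residues")
    mixed = {aa: alpha * yeast_freqs[aa] + (1.0 - alpha) * organism_freqs[aa]
             for aa in yeast_freqs}
    # Renormalize to guard against rounding in the input tables.
    total = sum(mixed.values())
    return {aa: value / total for aa, value in mixed.items()}
```

With α = 0.5, the value used above when scanning other species, each residue's background frequency under this assumption is the average of its organism-specific and S. cerevisiae frequencies.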

The resulting output, including per-protein summary tables and per-residue tables for selected proteins, can be downloaded as text files. Visualizations can also be downloaded as PNG or PDF files (Fig. 1). The command-line program allows additional control over plots [which tracks to display, and whether to show sliding averages of per-residue scores (Alberti et al., 2009) or sliding averages of these sliding averages (Kim et al., 2013)].

Fig. 1. Visualization outputs from PLAAC. Top: four known yeast prion proteins with each AA color-coded by its enrichment log-likelihood ratio in PrLDs (styled after the Sequence Enrichment Visualization Tool; http://jura.wi.mit.edu/cgi-bin/bio/draw_enrichment.pl), with HMM parse indicated by outer bars. Bottom: detailed visualization of the Sup35 protein, including several prion-prediction scores discussed in the main text.

Single sequence: To search for PrLDs in a single sequence, the user pastes into a textbox or uploads the protein sequence, either in FASTA format or as bare sequence, and may modify the _L_core and α parameters, if desired. After submission, scores (including COREscore, LLR and PAPA scores described below) for the sequence are displayed along with a graphical visualization of the location, if any, of predicted PrLDs.

Multiple sequences and whole proteomes: To scan multiple protein sequences (including whole proteomes), the user again pastes or uploads them in FASTA format. (Upload is recommended for more than a few sequences.) _L_core and α parameters may be adjusted, and background frequencies may be computed directly from the provided sequences (not recommended if uploading or pasting just a few sequences). The user is presented with a summary table with a row for each uploaded protein ranked by COREscore from highest to lowest and then may select candidates in this summary list to generate plots for further visualization.
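Computing background frequencies from the provided sequences could look roughly like the following sketch. The parser and function names are hypothetical and are not taken from the PLAAC code; the sketch accepts FASTA or bare sequence, counts only the 20 standard AAs and ignores any other symbols.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parse_fasta(text):
    """Parse FASTA (or bare sequence) text into a {name: sequence} dict."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            # Use the first header token as the name, or a placeholder.
            name = fields[0] if fields else "seq%d" % (len(records) + 1)
            records[name] = []
        else:
            if name is None:  # bare sequence without a header line
                name = "seq1"
                records[name] = []
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def background_frequencies(sequences):
    """Pooled AA frequencies over an iterable of protein sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(aa for aa in seq if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}
```

Frequencies pooled from only a handful of sequences are noisy estimates, which is consistent with the recommendation above not to compute them from just a few sequences.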

Output ranking and scores: Multiple sequences are ranked for prion-like properties by the COREscore metric (Alberti et al., 2009), which is the maximum sum of per-residue log-likelihood ratios for any subsequence of length _L_core that falls entirely within the PrLD state in the HMM Viterbi parse, provided a sequence of this length exists, and is undefined (NaN) otherwise. In addition, we compute a score LLR (for log-likelihood ratio) that is otherwise identical to COREscore, but without the requirement that the sequence falls entirely within the PrLD state of the HMM parse. Because LLR does not impose a hard cutoff, it can be useful when doing exploratory whole-proteome analyses, e.g. on the overall distribution of (near) PrLDs. However, examining whether the region with the highest LLR score falls entirely within the PrLD state in the HMM parse may be informative, e.g. when selecting domains to clone for studies of candidate PrLDs fused to reporter proteins.
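The two scores defined above can be sketched as follows, given per-residue log-likelihood ratios and a per-residue flag for the PrLD state of the Viterbi parse. The function names are hypothetical, and returning NaN for LLR when the sequence is shorter than the window is an assumption of this sketch rather than documented PLAAC behavior.

```python
import math


def window_sums(values, width):
    """Sums of every contiguous window of the given width (sliding update)."""
    if width <= 0 or width > len(values):
        return []
    total = sum(values[:width])
    sums = [total]
    for i in range(width, len(values)):
        total += values[i] - values[i - width]
        sums.append(total)
    return sums


def llr_score(per_residue_llr, l_core):
    """Maximum windowed sum of per-residue LLRs, ignoring the HMM parse."""
    sums = window_sums(per_residue_llr, l_core)
    return max(sums) if sums else math.nan


def core_score(per_residue_llr, in_prld, l_core):
    """Maximum windowed sum over windows lying entirely in the PrLD state.

    Returns NaN when no window of length l_core fits inside the parse.
    """
    sums = window_sums(per_residue_llr, l_core)
    inside = window_sums([1 if flag else 0 for flag in in_prld], l_core)
    eligible = [s for s, n in zip(sums, inside) if n == l_core]
    return max(eligible) if eligible else math.nan
```

Because every window eligible for `core_score` is also considered by `llr_score`, the LLR value is never smaller than the COREscore whenever the latter is defined.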

Algorithm updates: Since the publication of Alberti et al. (2009), the AA frequencies for the PrLD state of the HMM have been updated based on 28 candidate PrLDs that showed switching behavior or strong amyloid formation experimentally (see source code for details). These updated AA frequencies were used in later publications (Holmes et al., 2013; Kim et al., 2013; King et al., 2012).

A subsequent algorithm called PAPA (Toombs et al., 2010, 2012), which uses AA scores derived from a random mutagenesis screen, can down-weight many of the apparent false positives from Alberti et al. (2009), and can give sharper predictions for the results of point mutations (Kim et al., 2013). It appears that a small number of hydrophobic residues can speed amyloid formation in regions otherwise highly enriched for polar uncharged residues such as Q and N (Toombs et al., 2010). PLAAC and PAPA are complementary, as PLAAC identifies such regions, and PAPA has been validated only on such regions. (Single scores based on local averages of per-residue AA scores do not adequately capture the trade-off between hydrophobic and polar uncharged residues.) We reimplemented PAPA, and included this score in the output and visualizations, along with predictions of intrinsically unfolded protein regions from a reimplementation of FoldIndex (Prilusky et al., 2005). Several known prions are not strongly Q/N-rich (e.g. het-S, PrP, Mod5); because systematic experimental screening for prion-like propagation is lacking for non-Q/N-rich proteins, it is difficult to estimate the false-negative rates of these algorithms.

3 CONCLUSION

PLAAC has been developed as a web application to allow users to scan single protein sequences as well as whole proteomes for the presence of PrLDs. We have also augmented the original algorithm with additional scores, making unified comparisons possible.

ACKNOWLEDGEMENTS

We thank R. Halfmann, S. Alberti, J. Shorter, A. Gitler, J.P. Taylor, the Lindquist Lab, S. McCallum of the Information Technology and F. Lewitter of the Bioinformatics and Research Computing (BaRC) groups at the Whitehead Institute for additional help and advice. R. Latek and K. Walker, former members of BaRC, developed the Sequence Enrichment Visualization Tool.

Funding: This work was supported by the G. Harold and Leila Y. Mathers Foundation [to S.L.]; Howard Hughes Medical Institute [to S.L.]; and the National Institutes of Health [GM025874 to S.L.].

Conflicts of Interest: none declared.

REFERENCES

Alberti S et al. (2009) A systematic survey identifies prions and illuminates sequence features of prionogenic proteins. Cell, 137, 146–158.

Holmes DL et al. (2013) Heritable remodeling of yeast multicellularity by an environmentally responsive prion. Cell, 153, 153–165.

Kim HJ et al. (2013) Mutations in prion-like domains in hnRNPA2B1 and hnRNPA1 cause multisystem proteinopathy and ALS. Nature, 495, 467–473.

King OD et al. (2012) The tip of the iceberg: RNA-binding proteins with prion-like domains in neurodegenerative disease. Brain Res., 1462, 61–80.

Newby GA, Lindquist S (2013) Blessings in disguise: biological benefits of prion-like mechanisms. Trends Cell Biol., 23, 251–259.

Prilusky J et al. (2005) FoldIndex©: a simple tool to predict whether a given protein sequence is intrinsically unfolded. Bioinformatics, 21, 3435–3438.

Toombs JA et al. (2010) Compositional determinants of prion formation in yeast. Mol. Cell. Biol., 30, 319–332.

Toombs JA et al. (2012) De novo design of synthetic prion domains. Proc. Natl Acad. Sci. USA, 109, 6519–6524.

Author notes

Associate Editor: Alfonso Valencia

© The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
