Composability of regulatory sequences controlling transcription and translation in Escherichia coli
Sriram Kosuri et al. Proc Natl Acad Sci U S A. 2013.
Abstract
The inability to predict heterologous gene expression levels precisely hinders our ability to engineer biological systems. Using well-characterized regulatory elements offers a potential solution only if such elements behave predictably when combined. We synthesized 12,563 combinations of common promoters and ribosome binding sites and simultaneously measured DNA, RNA, and protein levels from the entire library. Using a simple model, we found that RNA and protein expression were within twofold of expected levels 80% and 64% of the time, respectively. The large dataset allowed quantitation of global effects, such as translation rate on mRNA stability and mRNA secondary structure on translation rate. However, the worst 5% of constructs deviated from prediction by 13-fold on average, which could hinder large-scale genetic engineering projects. The ease and scale of this approach indicates that rather than relying on prediction or standardization, we can screen synthetic libraries for desired behavior.
Keywords: next-generation sequencing; synthetic biology; systems biology.
Conflict of interest statement
The authors declare no conflict of interest.
Figures
Fig. 1.
Library characterization and workflow. (A) We synthesized all combinations of 114 promoters and 111 ribosome binding sites (RBSs) to create a library containing 12,653 constructs. The library was then cloned into an expression plasmid to express superfolder GFP, and mCherry was also independently expressed from a constitutive promoter to act as an intracellular control. The cell library was harvested for DNASeq, RNASeq, and FlowSeq to quantify DNA, RNA, and protein levels, respectively, for each construct. In FlowSeq, cells were sorted into bins of varying GFP-to-mCherry ratios, barcoded, and sequenced to reconstruct protein levels for each individual construct. (B) GFP expression levels for the library varied over approximately four orders of magnitude compared with relatively constant red fluorescence (Inset). (C) One hundred forty-four sequence-verified clones were individually subjected to flow cytometry analysis to act as controls. Displayed are GFP levels of two representative clones, P007-R065 (Left) and P081-R062 (Right), which show that individual constructs generally fall into 2 to 3 bins. (D, Upper) Library is split into 12 log-spaced bins based on the GFP-to-RFP ratio. (D, Lower) Individual bins have large differences in the number of cells that fall into each one.
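The FlowSeq reconstruction described in this caption can be illustrated with a minimal sketch. It assumes, since the caption does not give these details, that a construct's read count in each bin is rescaled by that bin's share of sorted cells and by its sequencing depth, and that the protein level is taken as the weighted mean of the bin centers in log space. The function names, the bin range, and the normalization are hypothetical choices for illustration and should not be read as the authors' pipeline.

```python
import math

def log_spaced_edges(lo, hi, n_bins):
    """Return n_bins + 1 bin edges spaced evenly in log10 between lo and hi."""
    step = (math.log10(hi) - math.log10(lo)) / n_bins
    return [10 ** (math.log10(lo) + i * step) for i in range(n_bins + 1)]

def estimate_level(reads, bin_total_reads, bin_cell_fraction, edges):
    """Estimate one construct's GFP/RFP ratio from its read counts per bin.

    reads[i]             reads for this construct in bin i
    bin_total_reads[i]   total reads sequenced from bin i (assumed known)
    bin_cell_fraction[i] fraction of all sorted cells in bin i (assumed known)
    """
    # Assumed normalization: reads / depth gives the construct's share of the
    # bin; multiplying by the bin's cell fraction makes bins comparable.
    weights = [
        (r / t) * f if t > 0 else 0.0
        for r, t, f in zip(reads, bin_total_reads, bin_cell_fraction)
    ]
    total = sum(weights)
    if total == 0:
        return None  # no usable reads for this construct
    # Geometric center of each bin, because the bins are log-spaced.
    centers = [math.sqrt(edges[i] * edges[i + 1]) for i in range(len(weights))]
    mean_log = sum(w * math.log10(c) for w, c in zip(weights, centers)) / total
    return 10 ** mean_log

# Hypothetical example: 12 bins spanning four orders of magnitude.
edges = log_spaced_edges(0.001, 10.0, 12)
reads = [0, 0, 0, 0, 40, 55, 5, 0, 0, 0, 0, 0]
level = estimate_level(reads, [1000] * 12, [1 / 12] * 12, edges)
print(level)
```

Because most constructs fall into only two or three bins, the estimate is dominated by a handful of weights, which is why the per-bin normalization matters when bins hold very different numbers of cells.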
Fig. 2.
RNA and protein level grids. The RNA (Left) and protein (Right) levels for all 12,653 constructs are plotted on a grid according to the identity of each construct’s promoter (y axis) and RBS (x axis). Promoters and RBSs are sorted by average RNA and protein abundance, respectively. Gray boxes indicate constructs that were below empirically determined cutoffs. Scale bars for RNA (RNA/DNA ratio) and protein (relative fluorescent units of GFP/RFP ratio) levels are shown to the right.
Fig. 3.
Library measurements vs. individual colony and spike-in controls. (A) Protein levels for 141 sequence-verified constructs characterized by at least two flow cytometry measurements plotted against their FlowSeq-estimated protein levels. One construct of 142 is missing because it had insufficient reads in the FlowSeq analysis. (B) RNA levels for 41 constructs as measured in our library plotted against control constructs spiked into a separate library. One construct of 42 is missing because it had no reads in the spike-in data. (C) Protein levels for 42 control constructs spiked into a separate library plotted against protein levels for those same constructs measured at least twice by flow cytometry. (D) Protein levels for 42 control constructs spiked into a separate library are plotted against protein level measurements as measured in our promoter + RBS library. (All R² values for linear regressions pass an F test with a P value <2.2e-16.) RFU, relative fluorescent units.
Fig. 4.
RNA and protein model deviations. Based on the promoter and RBS strengths, we calculated expected RNA (Left) and protein (Right) levels for each construct. Red and blue denote measured values below and above expectation, respectively, and they are plotted on the same scale for both plots. For constructs where expected protein levels are above or below the empirically determined thresholds, we set the prediction to be at the threshold level.
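One way to picture how an expected level can be derived from promoter and RBS strengths is the sketch below. It assumes an additive model in log space (that is, a multiplicative model of promoter and RBS effects), which is a common choice but is not spelled out in this caption. The function names, the `lo`/`hi` threshold parameters, and the example values are all hypothetical.

```python
import math
from collections import defaultdict

def fit_additive_log_model(levels):
    """Fit expected = promoter effect + RBS effect - grand mean, in log10.

    levels maps (promoter, rbs) -> measured level (> 0).
    Returns a predict(promoter, rbs, lo=None, hi=None) function.
    """
    logs = {key: math.log10(value) for key, value in levels.items()}
    grand = sum(logs.values()) / len(logs)
    by_promoter, by_rbs = defaultdict(list), defaultdict(list)
    for (promoter, rbs), x in logs.items():
        by_promoter[promoter].append(x)
        by_rbs[rbs].append(x)
    p_mean = {p: sum(v) / len(v) for p, v in by_promoter.items()}
    r_mean = {r: sum(v) / len(v) for r, v in by_rbs.items()}

    def predict(promoter, rbs, lo=None, hi=None):
        level = 10 ** (p_mean[promoter] + r_mean[rbs] - grand)
        # Clip predictions to the measurable range, mirroring the caption's
        # note that out-of-range predictions are set to the threshold.
        if lo is not None:
            level = max(level, lo)
        if hi is not None:
            level = min(level, hi)
        return level

    return predict

def fraction_within_fold(levels, predict, fold=2.0):
    """Fraction of constructs whose measured level is within `fold` of expected."""
    limit = math.log10(fold)
    hits = sum(
        1 for (p, r), v in levels.items()
        if abs(math.log10(v / predict(p, r))) <= limit
    )
    return hits / len(levels)

# Hypothetical 2 x 2 library with one construct that breaks composability.
levels = {("P1", "R1"): 1.0, ("P1", "R2"): 100.0,
          ("P2", "R1"): 10.0, ("P2", "R2"): 30.0}
predict = fit_additive_log_model(levels)
print(fraction_within_fold(levels, predict))
```

Working in log space means a twofold deviation is the same distance above or below expectation, which matches how the abstract reports agreement "within twofold".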
Fig. 5.
ANOVA explained variance and composition effects of promoter and RBS pairs. (A) Explained variance (as percentages of the sum of squared deviations) for RNA and protein measurements using ANOVA. One pie chart shows partitioned variance for RNA measurements (Left), whereas the other chart shows partitioned variance for protein measurements (Right). “Residual” indicates the unexplained variance in the model. (B) Deviation from expected RNA level is correlated with RBS strength. RBSs are partitioned into five groups based on increasing average translation strength. (C) Free energy of a transcript’s 5′ secondary structure (transcription start site to +30 of superfolder GFP) is correlated with average deviation from the expected protein level. Average deviations are partitioned into six equal ranges. Brackets at the top indicate two-sample Student t tests with P values <2e-5 (**) and <0.02 (*). The box plot displays the median, with hinges indicating the first and third quartiles. Whiskers extend to the farthest point within 1.5-fold of the interquartile range, with outliers shown as points.
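The variance partition in panel A can be sketched as a two-way ANOVA without replication, in which the total sum of squared deviations is split into a promoter term, an RBS term, and a residual. The sketch assumes a complete grid of log-transformed levels with one value per promoter and RBS pair; the terms actually included in the authors' ANOVA are not listed in this caption, and the function name is hypothetical.

```python
def explained_variance(grid):
    """Partition the total sum of squares of a complete promoter x RBS grid.

    grid[i][j] is the log level for promoter i and RBS j (no missing cells).
    Returns percentages of the total sum of squared deviations.
    """
    n_p, n_r = len(grid), len(grid[0])
    grand = sum(sum(row) for row in grid) / (n_p * n_r)
    p_means = [sum(row) / n_r for row in grid]
    r_means = [sum(grid[i][j] for i in range(n_p)) / n_p for j in range(n_r)]

    ss_total = sum((x - grand) ** 2 for row in grid for x in row)
    if ss_total == 0:
        raise ValueError("all values are identical; nothing to partition")
    # Main-effect sums of squares for a balanced design.
    ss_promoter = n_r * sum((m - grand) ** 2 for m in p_means)
    ss_rbs = n_p * sum((m - grand) ** 2 for m in r_means)
    # Whatever the two main effects leave unexplained.
    ss_residual = ss_total - ss_promoter - ss_rbs

    return {
        "promoter": 100 * ss_promoter / ss_total,
        "rbs": 100 * ss_rbs / ss_total,
        "residual": 100 * ss_residual / ss_total,
    }

# Hypothetical log10 levels: rows are promoters, columns are RBSs.
print(explained_variance([[0.0, 2.0], [1.0, 3.0]]))
```

In this perfectly additive example the residual is zero; any interaction between a promoter and an RBS shows up as a nonzero residual share.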
Similar articles
- Precise and reliable gene expression via standard transcription and translation initiation elements.
Mutalik VK, Guimaraes JC, Cambray G, Lam C, Christoffersen MJ, Mai QA, Tran AB, Paull M, Keasling JD, Arkin AP, Endy D. Nat Methods. 2013 Apr;10(4):354-60. doi: 10.1038/nmeth.2404. Epub 2013 Mar 10. PMID: 23474465
- Sort-Seq Approach to Engineering a Formaldehyde-Inducible Promoter for Dynamically Regulated Escherichia coli Growth on Methanol.
Rohlhill J, Sandoval NR, Papoutsakis ET. ACS Synth Biol. 2017 Aug 18;6(8):1584-1595. doi: 10.1021/acssynbio.7b00114. Epub 2017 May 9. PMID: 28463494 Free PMC article.
- Environmental signal integration by a modular AND gate.
Anderson JC, Voigt CA, Arkin AP. Mol Syst Biol. 2007;3:133. doi: 10.1038/msb4100173. Epub 2007 Aug 14. PMID: 17700541 Free PMC article.
- Expanding the synthetic biology toolbox: engineering orthogonal regulators of gene expression.
Rao CV. Curr Opin Biotechnol. 2012 Oct;23(5):689-94. doi: 10.1016/j.copbio.2011.12.015. Epub 2012 Jan 9. PMID: 22237017 Review.
- Optimizing scaleup yield for protein production: Computationally Optimized DNA Assembly (CODA) and Translation Engineering.
Hatfield GW, Roth DA. Biotechnol Annu Rev. 2007;13:27-42. doi: 10.1016/S1387-2656(07)13002-7. PMID: 17875472 Review.
Cited by
- Engineering Escherichia coli with a symbiotic plasmid for the production of phenylpyruvic acid.
Xiong T, Gao Q, Zhang J, Zhang J, Zhang C, Yue H, Liu J, Bai D, Li J. RSC Adv. 2024 Aug 22;14(36):26580-26584. doi: 10.1039/d4ra03707c. eCollection 2024 Aug 16. PMID: 39175686 Free PMC article.
- A promoter-RBS library for fine-tuning gene expression in Methanosarcina acetivorans.
Zhu P, Molina Resendiz M, von Ossowski I, Scheller S. Appl Environ Microbiol. 2024 Sep 18;90(9):e0109224. doi: 10.1128/aem.01092-24. Epub 2024 Aug 12. PMID: 39132998 Free PMC article.
- Guide RNA structure design enables combinatorial CRISPRa programs for biosynthetic profiling.
Fontana J, Sparkman-Yager D, Faulkner I, Cardiff R, Kiattisewee C, Walls A, Primo TG, Kinnunen PC, Garcia Martin H, Zalatan JG, Carothers JM. Nat Commun. 2024 Jul 27;15(1):6341. doi: 10.1038/s41467-024-50528-1. PMID: 39068154 Free PMC article.
- Transfer learning for cross-context prediction of protein expression from 5'UTR sequence.
Gilliot PA, Gorochowski TE. Nucleic Acids Res. 2024 Jul 22;52(13):e58. doi: 10.1093/nar/gkae491. PMID: 38864396 Free PMC article.
- Automated characterization and analysis of expression compatibility between regulatory sequences and metabolic genes in Escherichia coli.
Wen X, Lin J, Yang C, Li Y, Cheng H, Liu Y, Zhang Y, Ma H, Mao Y, Liao X, Wang M. Synth Syst Biotechnol. 2024 May 17;9(4):647-657. doi: 10.1016/j.synbio.2024.05.010. eCollection 2024 Dec. PMID: 38817827 Free PMC article.