Composability of regulatory sequences controlling transcription and translation in Escherichia coli

Sriram Kosuri et al. Proc Natl Acad Sci U S A. 2013.

Abstract

The inability to predict heterologous gene expression levels precisely hinders our ability to engineer biological systems. Using well-characterized regulatory elements offers a potential solution only if such elements behave predictably when combined. We synthesized 12,653 combinations of common promoters and ribosome binding sites and simultaneously measured DNA, RNA, and protein levels from the entire library. Using a simple model, we found that RNA and protein expression were within twofold of expected levels 80% and 64% of the time, respectively. The large dataset allowed quantitation of global effects, such as the effect of translation rate on mRNA stability and of mRNA secondary structure on translation rate. However, the worst 5% of constructs deviated from prediction by 13-fold on average, which could hinder large-scale genetic engineering projects. The ease and scale of this approach indicate that rather than relying on prediction or standardization, we can screen synthetic libraries for desired behavior.

Keywords: next-generation sequencing; synthetic biology; systems biology.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.

Library characterization and workflow. (A) We synthesized all combinations of 114 promoters and 111 ribosome binding sites (RBSs) to create a library containing 12,653 constructs. The library was then cloned into an expression plasmid to express superfolder GFP, and mCherry was also independently expressed from a constitutive promoter to act as an intracellular control. The cell library was harvested for DNASeq, RNASeq, and FlowSeq to quantify DNA, RNA, and protein levels, respectively, for each construct. In FlowSeq, cells were sorted into bins of varying GFP-to-mCherry ratios, barcoded, and sequenced to reconstruct protein levels for each individual construct. (B) GFP expression levels for the library varied over approximately four orders of magnitude compared with relatively constant red fluorescence (Inset). (C) One hundred forty-four sequence-verified clones were individually subjected to flow cytometry analysis to act as controls. Displayed are GFP levels of two representative clones, P007-R065 (Left) and P081-R062 (Right), which show that individual constructs generally fall into two to three bins. (D, Upper) Library is split into 12 log-spaced bins based on the GFP-to-RFP ratio. (D, Lower) Individual bins have large differences in the number of cells that fall into each one.
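
As a rough illustration of how per-bin sequencing reads could be turned back into a protein level for one construct, the sketch below takes a weighted geometric mean of bin centers. It is not the authors' pipeline: the weighting scheme (a construct's share of reads in a bin, scaled by the fraction of sorted cells in that bin) and the names `flowseq_estimate`, `read_counts`, `bin_totals`, `bin_cell_fraction`, and `bin_centers` are assumptions made for illustration.

```python
import math

def flowseq_estimate(read_counts, bin_totals, bin_cell_fraction, bin_centers):
    """Estimate one construct's GFP/RFP ratio from sorted-bin read counts.

    read_counts       reads for this construct in each bin
    bin_totals        total reads sequenced from each bin
    bin_cell_fraction fraction of all sorted cells that fell into each bin
    bin_centers       representative GFP/RFP ratio of each bin (must be > 0)

    Assumed weighting (hypothetical): share of the bin's reads times the
    bin's share of cells, so deeply sequenced bins are not over-counted.
    """
    weights = [
        (count / total) * fraction if total > 0 else 0.0
        for count, total, fraction in zip(read_counts, bin_totals, bin_cell_fraction)
    ]
    weight_sum = sum(weights)
    if weight_sum == 0:
        return None  # construct not observed in any bin
    # Bins are log-spaced, so average in log space (geometric mean).
    log_mean = sum(w * math.log10(c) for w, c in zip(weights, bin_centers)) / weight_sum
    return 10 ** log_mean

# Toy example: a construct split evenly between two adjacent bins.
estimate = flowseq_estimate(
    read_counts=[0, 10, 10],
    bin_totals=[100, 100, 100],
    bin_cell_fraction=[0.2, 0.4, 0.4],
    bin_centers=[1.0, 10.0, 100.0],
)
```

Averaging in log space matches the log-spaced bins described in (D); a construct that lands in a single bin simply receives that bin's center value.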

Fig. 2.

RNA and protein level grids. The RNA (Left) and protein (Right) levels for all 12,653 constructs are plotted on a grid according to the identity of each construct’s promoter (y axis) and RBS (x axis). Promoters and RBSs are sorted by average RNA and protein abundance, respectively. Gray boxes indicate constructs that were below empirically determined cutoffs. Scale bars for RNA (RNA/DNA ratio) and protein (relative fluorescent units of GFP/RFP ratio) levels are shown to the right.

Fig. 3.

Library measurements vs. individual colony and spike-in controls. (A) Protein levels for 141 sequence-verified constructs characterized by at least two flow cytometry measurements plotted against their FlowSeq-estimated protein levels. One construct of 142 is missing because it had insufficient reads in the FlowSeq analysis. (B) RNA levels for 41 constructs as measured in our library plotted against control constructs spiked into a separate library. One construct of 42 is missing because it had no reads in the spike-in data. (C) Protein levels for 42 control constructs spiked into a separate library plotted against protein levels for those same constructs measured at least twice by flow cytometry. (D) Protein levels for 42 control constructs spiked into a separate library are plotted against protein level measurements as measured in our promoter + RBS library. (All R² values for linear regressions pass an F test with a P value <2.2e-16.) RFU, relative fluorescent units.

Fig. 4.

RNA and protein model deviations. Based on the promoter and RBS strengths, we calculated expected RNA (Left) and protein (Right) levels for each construct. Red and blue denote measured values below and above expectation, respectively, and they are plotted on the same scale for both plots. For constructs where expected protein levels are above or below the empirically determined thresholds, we set the prediction to be at the threshold level.
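
The text does not spell out the form of the simple model, so the following is only a sketch under one plausible assumption: that promoter and RBS contributions multiply, which makes them additive in log space. Expected levels are built from per-promoter and per-RBS averages, optionally clamped to measurement thresholds as the caption describes, and then compared with measurements at a twofold tolerance. The function names and the toy strengths are hypothetical.

```python
import math
from collections import defaultdict

def expected_levels(measured, lower=None, upper=None):
    """measured: {(promoter, rbs): level > 0}. Returns expected levels.

    Assumption: log(level) = promoter effect + RBS effect - grand mean,
    each estimated as a simple average over the measured constructs.
    """
    logs = {key: math.log10(value) for key, value in measured.items()}
    by_promoter, by_rbs = defaultdict(list), defaultdict(list)
    for (promoter, rbs), x in logs.items():
        by_promoter[promoter].append(x)
        by_rbs[rbs].append(x)
    grand = sum(logs.values()) / len(logs)
    p_mean = {p: sum(xs) / len(xs) for p, xs in by_promoter.items()}
    r_mean = {r: sum(xs) / len(xs) for r, xs in by_rbs.items()}
    expected = {}
    for promoter, rbs in logs:
        level = 10 ** (p_mean[promoter] + r_mean[rbs] - grand)
        if lower is not None:
            level = max(level, lower)  # clamp to the lower threshold
        if upper is not None:
            level = min(level, upper)  # clamp to the upper threshold
        expected[(promoter, rbs)] = level
    return expected

def fraction_within_fold(measured, expected, fold=2.0):
    """Fraction of constructs whose measured/expected ratio is within `fold`."""
    hits = sum(1 for k in measured if 1 / fold <= measured[k] / expected[k] <= fold)
    return hits / len(measured)

# Toy 2 x 2 library in which levels are exactly promoter x RBS strength.
toy = {(p, r): ps * rs
       for p, ps in {"P1": 1.0, "P2": 10.0}.items()
       for r, rs in {"R1": 2.0, "R2": 8.0}.items()}
toy_expected = expected_levels(toy)
toy_fraction = fraction_within_fold(toy, toy_expected)
```

On a perfectly composable toy library every construct falls within twofold of expectation; deviations of the kind colored red and blue in the figure would appear as ratios far from 1.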

Fig. 5.

ANOVA explained variance and composition effects of promoter and RBS pairs. (A) Explained variance (as percentages of the sum of squared deviations) for RNA and protein measurements using ANOVA. One pie chart shows partitioned variance for RNA measurements (Left), whereas the other chart shows partitioned variance for protein measurements (Right). “Residual” indicates the unexplained variance in the model. (B) Deviation from expected RNA level is correlated with RBS strength. RBSs are partitioned into five groups based on increasing average translation strength. (C) Free energy of a transcript’s 5′ secondary structure (transcription start site to +30 of superfolder GFP) is correlated with average deviation from the expected protein level. Average deviations are partitioned into six equal ranges. Brackets at the top indicate two-sample Student t tests with P values <2e-5 (**) and <0.02 (*). The box plot displays the median, with hinges indicating the first and third quartiles. Whiskers extend to the farthest point within 1.5-fold of the interquartile range, with outliers shown as points.
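
The variance partition in (A) can be illustrated with a minimal two-way ANOVA without replication on a complete promoter-by-RBS grid of log-transformed levels. This is a sketch of the general technique, not a reproduction of the authors' analysis; the function name `variance_partition` and the toy grid are assumptions.

```python
def variance_partition(grid):
    """grid[i][j]: log-level for promoter i and RBS j (complete grid).

    Returns the percentage of the total sum of squared deviations that is
    attributed to promoter identity, RBS identity, and the residual.
    """
    n_promoters, n_rbs = len(grid), len(grid[0])
    grand = sum(sum(row) for row in grid) / (n_promoters * n_rbs)
    row_means = [sum(row) / n_rbs for row in grid]
    col_means = [sum(grid[i][j] for i in range(n_promoters)) / n_promoters
                 for j in range(n_rbs)]
    ss_total = sum((x - grand) ** 2 for row in grid for x in row)
    ss_promoter = n_rbs * sum((m - grand) ** 2 for m in row_means)
    ss_rbs = n_promoters * sum((m - grand) ** 2 for m in col_means)
    ss_residual = ss_total - ss_promoter - ss_rbs
    return {
        "promoter": 100 * ss_promoter / ss_total,
        "rbs": 100 * ss_rbs / ss_total,
        "residual": 100 * ss_residual / ss_total,
    }

# Toy grid with purely additive promoter and RBS effects (no residual).
shares = variance_partition([[0.0, 1.0], [2.0, 3.0]])
```

In this toy grid the promoter axis spans a wider range than the RBS axis, so it accounts for most of the variance, and the residual share is zero because the effects are exactly additive.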
