Impact of normalization on miRNA microarray expression profiling

Sylvain Pradervand et al. RNA. 2009 Mar.

Abstract

Profiling miRNA levels in cells with miRNA microarrays is becoming a widely used technique. Although normalization methods for mRNA gene expression arrays are well established, miRNA array normalization has not yet been investigated in detail. In this study we investigate the impact of normalization on data generated with the Agilent miRNA array platform. We have developed a method to select nonchanging miRNAs (invariants) and use them to compute linear regression normalization coefficients or variance stabilizing normalization (VSN) parameters. We compared invariants normalization with scaling, quantile normalization, and VSN with default parameters, as well as with no normalization. The comparison used samples with strong differential expression of miRNAs (heart versus brain) and samples in which only a few miRNAs are affected (squamous carcinoma cells overexpressing p53 versus control). All normalization methods performed better than no normalization. Normalization based on the set of invariants and quantile normalization were the most robust across all experimental conditions tested. Our method of invariant selection and normalization is not limited to Agilent miRNA arrays and can be applied to other data sets, including those from one-color miRNA microarray platforms, focused gene expression arrays, and gene expression analysis by quantitative PCR.
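
The normalization methods compared in the abstract can be illustrated with a short NumPy sketch. It is not the authors' code, and several details are assumptions: scaling is implemented as a per-array median shift on the log2 scale, quantile normalization uses the common sort-and-average recipe without tie handling, and the invariant-based linear regression maps each array onto the probe-wise mean of all arrays, fitted on the invariant probes only. The function names are hypothetical.

```python
import numpy as np


def scale_normalize(log2_x):
    """Scaling (assumed here: median shift on log2 data, arrays in columns).

    Shifting by a constant on the log2 scale equals multiplying the
    unlogged signal by a constant factor.
    """
    medians = np.median(log2_x, axis=0)
    return log2_x - medians + medians.mean()


def quantile_normalize(x):
    """Give every array (column) the same distribution: the mean of the sorted columns.

    Ties are not averaged in this sketch.
    """
    order = np.argsort(x, axis=0)
    reference = np.sort(x, axis=0).mean(axis=1)
    out = np.empty_like(x, dtype=float)
    for j in range(x.shape[1]):
        out[order[:, j], j] = reference  # k-th smallest value gets reference[k]
    return out


def invariant_regression_normalize(log2_x, invariant_idx):
    """Per-array straight-line fit on invariant probes, applied to all probes.

    Assumption: the target is the probe-wise mean of all arrays over the
    invariants; the paper may use a different reference.
    """
    reference = log2_x[invariant_idx].mean(axis=1)
    out = np.empty_like(log2_x, dtype=float)
    for j in range(log2_x.shape[1]):
        slope, intercept = np.polyfit(log2_x[invariant_idx, j], reference, deg=1)
        out[:, j] = slope * log2_x[:, j] + intercept
    return out
```

In this sketch, quantile normalization forces every array onto one shared distribution, while the regression version uses only the invariant probes to estimate each array's correction and then applies that line to all probes.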

Figures

FIGURE 1.

Example of invariant probe selection. The trend of standard deviation (SD) versus mean is removed by fitting a loess curve (red line) to the scatter plot of SD versus mean (inset). Invariant probes are then identified from the mean and the trend-corrected SD (main scatter plot). High-mean probes are shown in color and low-mean probes in black. (Green) High-mean probes belonging to the lowest-SD component (“invariants”). (Blue and magenta) High-mean probes belonging to higher-SD components. (Open red circles) Positive control probes. (Open red triangles) Negative control probes. (Dashed lines) Mean and SD cutoffs.
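
The selection shown in Figure 1 can be approximated as follows. This sketch makes several assumptions that the caption does not state: a simple lowess-style smoother (local linear fits with tricube weights, no robustness iterations) stands in for loess, the trend is removed by subtracting the fitted SD from the observed SD, and the mixture-model step implied by "lowest SD component" is replaced by a fixed SD cutoff. `local_linear_smooth`, `select_invariants`, and the cutoff values are hypothetical.

```python
import numpy as np


def local_linear_smooth(x, y, frac=0.5):
    """Lowess-style smoother (assumption: stands in for the loess fit).

    For each point, fit a weighted straight line to its nearest frac*n
    neighbours using tricube weights; no robustness iterations are done.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    k = min(n, max(3, int(np.ceil(frac * n))))
    fitted = np.empty(n)
    for i in range(n):
        d = np.abs(x - x[i])
        h = max(np.sort(d)[k - 1], 1e-12)  # distance to the k-th nearest neighbour
        w = np.clip(1.0 - (d / h) ** 3, 0.0, None) ** 3  # tricube weights
        sw = np.sqrt(w)
        design = np.column_stack([np.ones(n), x]) * sw[:, None]
        beta, *_ = np.linalg.lstsq(design, y * sw, rcond=None)
        fitted[i] = beta[0] + beta[1] * x[i]
    return fitted


def select_invariants(log2_x, mean_cutoff, sd_cutoff, frac=0.5):
    """Indices of probes with a high mean and a low trend-corrected SD.

    log2_x: probes x arrays. The fixed sd_cutoff replaces the mixture-model
    step suggested by the caption (hypothetical simplification).
    """
    mean = log2_x.mean(axis=1)
    sd = log2_x.std(axis=1, ddof=1)
    corrected_sd = sd - local_linear_smooth(mean, sd, frac)  # remove SD-vs-mean trend
    keep = (mean > mean_cutoff) & (corrected_sd < sd_cutoff)
    return np.flatnonzero(keep)
```

The returned indices identify candidate invariants that could then be used to estimate per-array normalization coefficients.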

FIGURE 2.

Effect of normalization on technical reproducibility. Data from brain and heart tissue samples were split into quartiles (Q1, Q2, Q3, Q4) according to the mean unlogged, unnormalized expression signal of the three technical replicates; Q1 is the bottom quartile and Q4 the top quartile. The mean log2 signal intensity (A) and the standard deviation (B) of the technical replicates were calculated for each normalization method, tissue, and quartile. VSN expression signals (in base e) were converted to base 2. (White) No normalization, (red) quantile, (blue) invariants, (cyan) scaling, (green) VSN.
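
The quartile binning and base conversion in Figure 2 amount to a few array operations. In this sketch, quartiles are assigned from the mean unlogged, unnormalized signal, as the caption states, while the mean and SD are computed from whichever normalized log2 values are supplied. Averaging the per-probe values within each quartile is an assumption, since the figure may display full distributions. Dividing by ln 2 rescales natural-log values such as VSN output to base 2. Function names are hypothetical.

```python
import numpy as np


def ln_to_log2(values):
    """Convert natural-log values to log2: log2(x) = ln(x) / ln(2)."""
    return np.asarray(values, dtype=float) / np.log(2.0)


def replicate_summary_by_quartile(raw, log2_values):
    """Per-quartile mean log2 signal and mean replicate SD (Q1 = lowest).

    raw: probes x technical replicates, unlogged and unnormalized; used only
    to assign quartiles. log2_values: the same probes after one normalization
    method, on the log2 scale.
    """
    mean_raw = np.asarray(raw, dtype=float).mean(axis=1)
    edges = np.quantile(mean_raw, [0.25, 0.5, 0.75])
    quartile = np.searchsorted(edges, mean_raw, side="right")  # 0..3 = Q1..Q4
    mean_log2 = log2_values.mean(axis=1)
    sd_log2 = log2_values.std(axis=1, ddof=1)
    means = [mean_log2[quartile == q].mean() for q in range(4)]
    sds = [sd_log2[quartile == q].mean() for q in range(4)]
    return means, sds
```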

FIGURE 3.

Sensitivity and specificity of the normalization methods. (A,B) Fraction of positives recovered plotted against the FDR when a 50% brain–50% heart RNA mixture is compared with a 95% brain–5% heart RNA mixture (A) or with a 75% brain–25% heart RNA mixture (B). Positives were defined as miRNAs that, in the comparison of the pure heart and pure brain RNA samples, were among the 50% most strongly expressed, showed at least a threefold expression difference, and had P < 0.01 with any normalization method. VSN expression signals (in base e) were converted to base 2. (Black) No normalization, (red) quantile, (blue) invariants, (cyan) scaling, (green) VSN, (magenta dashed line) VSN using invariants selected as in our invariants normalization method.
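
One way to compute such a recovery curve is sketched below. The caption does not say which FDR estimate was used, so this example assumes Benjamini-Hochberg adjusted p-values and counts a positive as recovered when its adjusted p-value is at or below the FDR level on the x-axis. `benjamini_hochberg` and `recovery_curve` are hypothetical names.

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values for a 1-D array of p-values."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Cumulative minimum from the largest p-value down keeps the result monotone.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def recovery_curve(pvalues, is_positive, fdr_grid):
    """Fraction of known positives whose adjusted p-value is <= each FDR level."""
    q = benjamini_hochberg(pvalues)
    positive = np.asarray(is_positive, dtype=bool)
    return np.array([(q[positive] <= level).mean() for level in fdr_grid])
```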

FIGURE 4.

Fold-change concordance between QPCR assays and microarrays. Seventeen miRNAs spanning the entire fold-change range between heart and brain samples were selected for validation with TaqMan assays. All expression signals were converted to log2, and the differences in means (M values) measured with each normalization method (y-axis) were plotted against those determined by TaqMan assays (x-axis). Dashed lines represent the 45° lines of complete concordance; solid lines represent the results of the regression analysis. The correlation coefficient (r), slope (a), and intercept (b) of each regression line are indicated, with 95% confidence intervals in square brackets. Confidence intervals for correlation coefficients were calculated using Fisher's transformation.
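
Fisher's transformation maps r to z = artanh(r), whose sampling distribution is approximately normal with standard error 1/sqrt(n - 3); the interval is built on the z scale and mapped back with tanh. For the 17 miRNAs in this figure, the standard error is 1/sqrt(14), about 0.27. The sketch below assumes a normal critical value of 1.96 for 95% coverage and an ordinary least-squares fit for the slope and intercept; the intervals for a and b, which the caption also reports, are not computed here. `concordance` is a hypothetical helper.

```python
import numpy as np


def concordance(m_qpcr, m_array, z_crit=1.959964):
    """Pearson r with a Fisher-z confidence interval, plus an OLS line.

    z_crit = 1.959964 is the 97.5th percentile of the standard normal,
    giving a two-sided 95% interval (assumption about the coverage used).
    """
    x = np.asarray(m_qpcr, dtype=float)
    y = np.asarray(m_array, dtype=float)
    n = x.size
    r = np.corrcoef(x, y)[0, 1]
    z = np.arctanh(r)  # Fisher's transformation
    se = 1.0 / np.sqrt(n - 3)
    r_ci = (np.tanh(z - z_crit * se), np.tanh(z + z_crit * se))
    slope, intercept = np.polyfit(x, y, 1)  # array M values regressed on QPCR M values
    return r, r_ci, slope, intercept
```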

FIGURE 5.

Biological assessment of normalization methods using p53-induced miRNAs in the human squamous carcinoma cell line SCC13. (A) Q-Q plots comparing the t statistics of Ad-p53 versus Ad-GFP samples (Sample Quantiles) against a t distribution with 10 degrees of freedom, t10 (Theoretical Quantiles). T statistics were calculated for the six Ad-p53 versus the six Ad-GFP samples (three biological replicates, each with two technical replicates) using data produced by each normalization method. The number of miRNAs and the P values at an FDR cutoff of 5% are indicated. (B) Mean difference (M value) versus average expression (A value) plots for the different normalization methods. Probes with an FDR < 5% are shown as open circles. (C) TaqMan QPCR validation of miRNAs. SCC13 cells were infected in biological triplicate with either Ad-GFP (white bars) or Ad-p53 (gray bars). Expression was calculated relative to the Z30 reference assay. Means and standard errors of four technical replicates are shown.
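
The t10 reference distribution in panel A is consistent with an ordinary pooled-variance two-sample t test on six versus six arrays, which has 6 + 6 - 2 = 10 degrees of freedom. That reading is an inference from the caption, and it treats technical replicates as independent observations. A minimal NumPy sketch of the row-wise statistic follows; `pooled_t` is a hypothetical name.

```python
import numpy as np


def pooled_t(group_a, group_b):
    """Row-wise two-sample t statistic with pooled variance.

    group_a, group_b: miRNAs x samples (log2). Returns (t, degrees of freedom).
    """
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    na, nb = a.shape[1], b.shape[1]
    df = na + nb - 2
    pooled_var = ((na - 1) * a.var(axis=1, ddof=1) + (nb - 1) * b.var(axis=1, ddof=1)) / df
    se = np.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    t = (a.mean(axis=1) - b.mean(axis=1)) / se
    return t, df
```

Theoretical quantiles for the Q-Q plot could then be taken from `scipy.stats.t.ppf` with `df=10`.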
