Effect of population heterogenization on the reproducibility of mouse behavior: a multi-laboratory study

PLoS One. 2011 Jan 31;6(1):e16461.

doi: 10.1371/journal.pone.0016461.

S Helene Richter, Joseph P Garner, Benjamin Zipser, Lars Lewejohann, Norbert Sachser, Chadi Touma, Britta Schindler, Sabine Chourbaji, Christiane Brandwein, Peter Gass, Niek van Stipdonk, Johanneke van der Harst, Berry Spruijt, Vootele Võikar, David P Wolfer, Hanno Würbel

S Helene Richter et al. PLoS One. 2011.

Abstract

In animal experiments, animals, husbandry and test procedures are traditionally standardized to maximize test sensitivity and minimize animal use, assuming that this will also guarantee reproducibility. However, by reducing within-experiment variation, standardization may limit inference to the specific experimental conditions. Indeed, we have recently shown in mice that standardization may generate spurious results in behavioral tests, accounting for poor reproducibility, and that this can be avoided by population heterogenization through systematic variation of experimental conditions. Here, we examined whether a simple form of heterogenization effectively improves reproducibility of test results in a multi-laboratory situation. Each of six laboratories independently ordered 64 female mice of two inbred strains (C57BL/6NCrl, DBA/2NCrl) and examined them for strain differences in five commonly used behavioral tests under two different experimental designs. In the standardized design, experimental conditions were standardized as much as possible in each laboratory, while they were systematically varied with respect to the animals' test age and cage enrichment in the heterogenized design. Although heterogenization tended to improve reproducibility by increasing within-experiment variation relative to between-experiment variation, the effect was too weak to account for the large variation between laboratories. However, our findings confirm the potential of systematic heterogenization for improving reproducibility of animal experiments and highlight the need for effective and practicable heterogenization strategies.

Conflict of interest statement

Competing Interests: Three authors (Niek van Stipdonk, Johanneke van der Harst and Berry Spruijt) are employed by a commercial company (Delta Phenomics BV), and part of the study was conducted in a lab of this company. However, the company played no other role in this study. The collaboration was based on Berry Spruijt's expertise in behavioural phenotyping and not on anything related to the company. Moreover, the costs for the study (animals, consumables) and the costs for the biotechnician who tested the animals were covered by the grant provided by the German Research Foundation (DFG). The authors confirm that the affiliation to this company does not alter their adherence to all the PLoS ONE policies on sharing data and materials.

Figures

Figure 1. Experimental design.

Each of six laboratories used 64 female mice of two inbred strains (C57BL/6NCrl, DBA/2NCrl) ordered in two consecutive batches (n = 16 per batch and strain), with each batch being allocated to one experimental design. Upon arrival of a batch, the 16 mice per strain were randomly assigned to four cages in groups of four. To test heterogenization against standardization, we selected two experimental factors (test age, cage enrichment) and chose three factor levels A, B and C for each factor. Within each laboratory, the two factors were either standardized to factor level A (standardized design, uniform grey) or systematically varied across B and C using a 2×2 factorial design (heterogenized design, varying grey). According to the 2×2 factorial design of the heterogenized condition, study populations were divided into four blocks that were also characterized by similar microenvironmental differences due to cage position within the rack.
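The allocation described in this caption can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' procedure: the function name `allocate_batch`, the mouse labels and the seed are hypothetical, and the sketch assumes that each block holds one cage per strain and that mice are randomized to cages within their age group.

```python
import random
from itertools import product

STRAINS = ("C57BL/6NCrl", "DBA/2NCrl")

def allocate_batch(design, seed=0):
    """Return one record per mouse for a single batch (16 mice per strain).

    Each cage (block) carries a (test_age, enrichment) pair of factor levels.
    Assumption: one cage per strain per block, four mice per cage.
    """
    rng = random.Random(seed)
    if design == "standardized":
        cages = [("A", "A")] * 4                 # both factors fixed at level A
    elif design == "heterogenized":
        cages = list(product("BC", repeat=2))    # 2x2 factorial over levels B and C
    else:
        raise ValueError(f"unknown design: {design}")

    rows = []
    for strain in STRAINS:
        for age in sorted({a for a, _ in cages}):
            # Cages that share this test-age level.
            blocks = [i for i, (a, _) in enumerate(cages) if a == age]
            # Hypothetical mouse labels; shuffle, then deal four mice per cage.
            mice = [f"{strain}-{age}-{k}" for k in range(4 * len(blocks))]
            rng.shuffle(mice)
            for j, block in enumerate(blocks):
                for mouse in mice[4 * j: 4 * j + 4]:
                    rows.append({
                        "mouse": mouse,
                        "strain": strain,
                        "block": block,
                        "test_age": age,
                        "enrichment": cages[block][1],
                    })
    return rows
```

Calling `allocate_batch("heterogenized")` yields 32 records covering all four combinations of levels B and C, while `allocate_batch("standardized")` yields 32 records that all carry level A for both factors.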

Figure 2. Experimental procedure followed by each laboratory.

The 64 mice per laboratory, aged nine weeks for the standardized design, and five and thirteen weeks for the heterogenized design, were supplied in two independent batches (n = 16/strain). Upon arrival, the mice were group-housed in conventional polycarbonate cages for three weeks. Cages of the standardized design (red) contained two pieces of tissue paper (nesting material), while half of the cages of the heterogenized design (blue) contained a mouse house and the other half a climbing structure and a wooden ladder. Subsequent to the three-week housing phase, mice were subjected to a battery of five behavioral tests. The whole experimental procedure lasted five weeks, including a three-week housing phase, a one-week test phase and a one-week shift between the behavioral tests of the standardized and the heterogenized design. The order was balanced across the six laboratories, with three laboratories starting with the standardized and three laboratories with the heterogenized design.

Figure 3. Number of stretched postures on the elevated zero maze shown by C57BL/6NCrl and DBA/2NCrl mice.

Data are presented as means (+ s.e.m., square-root-transformed, n = 16/strain and laboratory). The example illustrates large effects of the laboratory in the standardized (A) and heterogenized (B) design. Moreover, the direction of strain difference differed between Giessen and Munich in the standardized design.

Figure 4. Object exploration time in the novel object test shown by C57BL/6NCrl and DBA/2NCrl mice.

Data are presented as means (+ s.e.m., square-root-transformed, n = 16/strain and laboratory). The example illustrates large effects of strain and laboratory in the standardized (A) and heterogenized (B) design. Moreover, the direction of strain difference differed between Giessen and Zürich in the heterogenized design.

Figure 5. Variation of strain main effects across the six laboratories in both designs.

For each laboratory and experimental design, the main effect of ‘strain’ was separately calculated and displayed in terms of the mean F-ratio (+ s.e.m., square-root-transformed) across all 29 behavioral measures. Although the strain effect varied considerably among laboratories in the heterogenized design, the standardized design produced even more variable outcomes. Moreover, average F-ratios for ‘strain’ were considerably higher in the standardized design, indicating that treatment effects may be systematically overestimated by standardization.

Figure 6. Variation between laboratories in the standardized and in the heterogenized design.

The variation in strain differences is displayed as mean F-ratios (+ s.e.m.) of the ‘strain-by-laboratory’ interaction term calculated for 29 behavioral measures. F-ratios were determined separately for the two experimental designs, square-root-transformed to meet the assumptions of parametric analysis, and then compared using a GLM blocked by ‘behavioral measure’. F-ratios of the ‘strain-by-laboratory’ interaction terms were significantly lower in the heterogenized design (F(1,28) = 4.222, p = 0.049), indicating lower between-experiment variation.
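The comparison described in this caption can be sketched as follows. With only two designs and one F-ratio per behavioral measure, a linear model blocked by measure reduces to a paired comparison, whose F statistic is the squared paired t with (1, n - 1) degrees of freedom. The function name `paired_block_f` and its inputs are hypothetical; this is an illustration of the general technique, not the authors' analysis code, and it does not compute a p-value.

```python
import math

def paired_block_f(standardized, heterogenized):
    """F statistic for a two-level design factor blocked by measure.

    Inputs are per-measure F-ratios (same measure order in both lists).
    Values are square-root-transformed before comparison, as in the caption.
    Returns (F, (df_numerator, df_denominator)).
    """
    if len(standardized) != len(heterogenized):
        raise ValueError("both designs need one value per behavioral measure")
    n = len(standardized)
    # Paired differences on the square-root scale.
    diffs = [math.sqrt(s) - math.sqrt(h)
             for s, h in zip(standardized, heterogenized)]
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)   # sample variance
    t = mean / math.sqrt(var / n)                         # paired t statistic
    return t * t, (1, n - 1)                              # F = t^2
```

With 29 behavioral measures per design, the returned degrees of freedom are (1, 28), matching the form of the test reported in the caption.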

Figure 7. Variation of mean strain differences in the standardized and heterogenized design across the six laboratories.

Four examples of selected behavioral measures from four of the five behavioral tests are displayed: (A) Latency to fall off the pole in the vertical pole test, (B) number of open segment entries on the elevated zero maze, (C) number of corner entries in the open field test and (D) path travelled within the exploration zone in the novel object test. Strain differences varied considerably between laboratories in both designs, but were somewhat more consistent in the heterogenized design. Each laboratory tested 16 mice per strain for each experimental design.

Figure 8. Between-experiment variation versus within-experiment variation.

To assess the relative weight of between-laboratory variation versus within-laboratory variation, an F-ratio was calculated that reflects the partitioning of the ‘strain-by-block’ variance between all 24 blocks of one experimental design into variance due to variation between laboratories and variance due to variation within laboratories. For this, the mean squares of the ‘strain-by-laboratory’ interaction term were divided by the mean squares of the ‘strain-by-block’ interaction term. Data are displayed as mean F-ratios (+ s.e.m.; square-root-transformed) across all 29 behavioral measures for both conditions. F-ratios were significantly smaller in the heterogenized design (F(1,28) = 4.678, p = 0.039), demonstrating that heterogenization increased within-experiment variation relative to between-experiment variation.
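Written out, the ratio described in this caption takes the following form for each behavioral measure. The symbol names are chosen here for illustration and do not come from the paper.

```latex
% Per behavioral measure m and experimental design d:
% between-laboratory over within-laboratory variation in the strain effect.
F_{m,d} \;=\; \frac{MS_{\text{strain} \times \text{laboratory}}}{MS_{\text{strain} \times \text{block}}},
\qquad
\text{displayed value: } \sqrt{F_{m,d}}
```

A smaller ratio means that variation in the strain effect between laboratories is small relative to the variation between blocks within laboratories.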
