High concentrations of long interspersed nuclear element sequence distinguish monoallelically expressed genes


Elena Allen et al. Proc Natl Acad Sci U S A. 2003.

Abstract

Genes subject to monoallelic expression are expressed from only one of the two alleles either selected at random (random monoallelic genes) or in a parent-of-origin specific manner (imprinted genes). Because high densities of long interspersed nuclear element (LINE)-1 transposon sequence have been implicated in X-inactivation, we asked whether monoallelically expressed autosomal genes are also flanked by high densities of LINE-1 sequence. A statistical analysis of repeat content in the regions surrounding monoallelically and biallelically expressed genes revealed that random monoallelic genes were flanked by significantly higher densities of LINE-1 sequence, evolutionarily more recent and less truncated LINE-1 elements, fewer CpG islands, and fewer base-pairs of short interspersed nuclear elements (SINEs) sequence than biallelically expressed genes. Random monoallelic and imprinted genes were pooled and subjected to a clustering analysis algorithm, which found two clusters on the basis of aforementioned sequence characteristics. Interestingly, these clusters did not follow the random monoallelic vs. imprinted classifications. We infer that chromosomal sequence context plays a role in monoallelic gene expression and may involve the recognition of long repeats or other features. The sequence characteristics that distinguished the high-LINE-1 category were used to identify more than 1,000 additional genes from the human and mouse genomes as candidate genes for monoallelic expression.


Figures

Fig. 1.


LINE-1 and SINE sequence abundance in regions flanking biallelically expressed, random monoallelic, and imprinted genes. Percent LINE-1 and SINE were calculated as the quotient of base pairs of repeats to total base pairs as described (see supporting information). Graphs for percent LINE-1 and SINE sequence are superimposed, with the higher value behind the lower value. Green, SINE sequence content; blue, LINE-1 sequence content; red box, gene; stippled red line, Ig gene region; gray box, region for which sequence is not available. The overall LINE-1 and SINE averages for 200-kb regions flanking all available mouse or human RefSeq genes (see Methods) are marked to the right of each plot. LINE-1 average, blue letters; SINE average, green letters. Biallelically expressed genes: human β-ACTIN (A) and mouse Rras-1 (B). Random monoallelically expressed genes: human XIST (C); mouse Ly49G2 (D); mouse olfactory gene cluster containing Or8, Or10, and Or28 (E); human IL-2 (F); mouse Ig κ gene region (G); mouse Jsap (H). Imprinted genes: mouse p57 (kip2) (I); human H19 (J); mouse Zfp127 (K).
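The percent-repeat quotient described in this legend can be illustrated with a minimal sketch. The function name `percent_repeat`, the half-open `(start, end)` interval format, and the merging of overlapping annotations are assumptions made for illustration; the authors' actual procedure is in the supporting information and may differ.

```python
def percent_repeat(repeats, region_start, region_end):
    """Percent of the half-open region [region_start, region_end) covered by repeats.

    repeats: iterable of (start, end) half-open intervals; they may overlap
    or extend past the region (hypothetical input format).
    """
    length = region_end - region_start
    if length <= 0:
        raise ValueError("region must have positive length")

    # Keep only intervals that touch the region, clipped to its bounds.
    clipped = sorted(
        (max(s, region_start), min(e, region_end))
        for s, e in repeats
        if e > region_start and s < region_end
    )

    covered = 0
    cur_end = region_start  # right edge of the bases counted so far
    for s, e in clipped:
        if e <= cur_end:
            continue  # fully inside an interval that was already counted
        covered += e - max(s, cur_end)  # count only the new bases
        cur_end = e

    return 100.0 * covered / length
```

For example, two overlapping annotations spanning bases 0 to 100 of a 200-bp region give `percent_repeat([(0, 50), (25, 100)], 0, 200) == 50.0`, because the overlap is counted once.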

Fig. 2.


Average LINE-1 and SINE sequence abundance in regions flanking genes that are or are not subject to monoallelic expression. Percent LINE-1 or SINE sequence in sequential 20-kb sequence windows proceeding 100 kb upstream of the 5′ end of the gene, or extending 100 kb downstream from the 3′ end of the gene, are shown. Densities were calculated as described in Methods. (A) LINE-1 sequence averages. (B) SINE sequence averages.
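The windowing scheme in this legend (five sequential 20-kb windows on each side of the gene) might be laid out as in the sketch below. It assumes a gene on the forward strand, so that the 5′ end is the lower coordinate, and repeat intervals that do not overlap one another; `flank_windows` and `window_density` are hypothetical names, not functions from the paper.

```python
def flank_windows(gene_start, gene_end, flank=100_000, window=20_000):
    """Return (upstream, downstream) lists of half-open windows.

    Windows are ordered from nearest to the gene to farthest from it.
    Assumes a forward-strand gene: gene_start is the 5' end.
    Coordinates are not clipped at chromosome ends.
    """
    n = flank // window
    upstream = [
        (gene_start - (i + 1) * window, gene_start - i * window)
        for i in range(n)
    ]
    downstream = [
        (gene_end + i * window, gene_end + (i + 1) * window)
        for i in range(n)
    ]
    return upstream, downstream


def window_density(repeats, start, end):
    """Percent of [start, end) covered by non-overlapping repeat intervals."""
    covered = sum(max(0, min(e, end) - max(s, start)) for s, e in repeats)
    return 100.0 * covered / (end - start)


# Illustrative coordinates only, not taken from the paper.
upstream, downstream = flank_windows(500_000, 520_000)
densities = [window_density([(490_000, 500_000)], s, e) for s, e in upstream]
```

Averaging such per-window densities across all genes in a class would give one point per window, which is the kind of profile the figure plots.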

Fig. 3.


Boxplots of select covariates. B, biallelically expressed genes; I, imprinted genes; RM, random monoallelically expressed genes; S, random sampling of 150 genes. The width of each box plot reflects the sample size. P values are determined by the Kruskal–Wallis test comparing B, I, and RM genes.
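For readers unfamiliar with the Kruskal–Wallis test, the following is a minimal pure-Python sketch of the H statistic with the standard tie correction. It is a textbook formulation, not the authors' code. The P value helper relies on the chi-square approximation with 2 degrees of freedom, which applies only when exactly three groups are compared (as with B, I, and RM); the function names are hypothetical.

```python
import math


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H statistic for non-empty groups."""
    # Pool all observations, remembering which group each came from.
    pooled = sorted((v, gi) for gi, g in enumerate(groups) for v in g)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0

    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values occupy ranks i+1 .. j and share their mean rank.
        avg_rank = (i + 1 + j) / 2
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i
        tie_term += t**3 - t
        i = j

    h = 12.0 / (n * (n + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    correction = 1 - tie_term / (n**3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


def p_value_three_groups(h):
    """Chi-square (df = 2) survival function; valid for three groups only."""
    return math.exp(-h / 2)
```

With many groups or very small samples, an exact or library-based test would be a better choice than this approximation.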

Fig. 4.


Distribution of LINE-1 ages between groups of genes. Plotted are the L1 subfamily mean percentages and their 95% confidence intervals. Below each LINE-1 type are P values across the corresponding values for the biallelic (white circle), imprinted (triangle), and random monoallelic (black circle) genes (Kruskal–Wallis test). Average (see Methods) for a random sample of 150 genes (squares). Mean percentages were calculated as the quotient of subfamily elements and total L1 elements. (A) Five mouse subfamilies are listed from the most ancient (L1M4) to the most recent (Lx). (B) Five human L1 subfamilies listed left to right from the evolutionarily most ancient (L1M4) repeats to the most recent (L1P). See supporting information for more details.

Fig. 5.


Isotonic multidimensional scaling plots (A–C) and prediction strength plots (D–F) of the 39 imprinted and 33 randomly inactivated genes in Table 1. (A) Mouse (green) and human (black) genes. (B) Mouse genes: black, cluster 1; red, cluster 2. (C) Human genes: black, cluster 1; red, cluster 2. We estimated the number of gene clusters using the prediction strength criterion: the number of clusters is estimated as the largest k such that prediction strength (PS) + SE is > 0.8 (dashed line). Prediction strengths plus their standard errors for different choices of the number of clusters (k) are shown. (D) Mouse and human genes. (E) Mouse genes. (F) Human genes.
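The selection rule stated in this legend (largest k with PS + SE above 0.8) can be written as a short sketch. Computing prediction strength itself requires repeated clustering of split data and is not shown; the function name, the dictionary inputs, and the fallback to a single cluster when no k qualifies are assumptions for illustration.

```python
def estimate_num_clusters(ps, se, threshold=0.8):
    """Largest k whose prediction strength plus standard error exceeds threshold.

    ps, se: dicts mapping each candidate k to its prediction strength and
    standard error (hypothetical input format). Falls back to k = 1 if no
    candidate qualifies, which is an assumption of this sketch.
    """
    eligible = [k for k in ps if ps[k] + se[k] > threshold]
    return max(eligible) if eligible else 1


# Invented values for illustration only; they are not the paper's results.
example_ps = {1: 1.0, 2: 0.85, 3: 0.60, 4: 0.45}
example_se = {1: 0.0, 2: 0.03, 3: 0.05, 4: 0.04}
k_hat = estimate_num_clusters(example_ps, example_se)
```

With these invented numbers, k = 1 and k = 2 clear the threshold, so the rule selects two clusters.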


