Distribution of genes in the genome of Arabidopsis thaliana and its implications for the genome organization of plants

Comparative Study

A Barakat et al. Proc Natl Acad Sci U S A. 1998.

Abstract

Previous work has shown that, in the large genomes of three Gramineae [rice, maize, and barley: 415, 2,500, and 5,300 megabases (Mb), respectively], most genes are clustered in long DNA segments (collectively called the "gene space") that represent a small fraction (12-24%) of nuclear DNA, cover a very narrow (0.8-1.6%) GC range, and are separated by vast expanses of gene-empty sequences. In the present work, we have analyzed the small (ca. 120 Mb) nuclear genome of Arabidopsis thaliana and shown that its organization is drastically different from that of the genomes of Gramineae. Indeed, (i) genes are distributed over about 85% of the main band of DNA in CsCl and cover an 8% GC range; (ii) ORFs are fairly evenly distributed in long (>50 kb) sequences from GenBank that amount to about 10 Mb; and (iii) the GC levels of protein-coding sequences (and of their third codon positions) are correlated with the GC levels of their flanking sequences. The pattern of gene distribution in Arabidopsis appears to differ from that in Gramineae because the genomes of the latter comprise (i) many large gene-empty regions separating gene clusters and (ii) abundant transposons in the intergenic sequences of gene clusters. Both kinds of sequence are absent or very scarce in the Arabidopsis genome. These observations provide a comparative view of angiosperm genome organization.

Figures

Figure 1

(A) Absorbance profile of Arabidopsis nuclear DNA as obtained by centrifugation in a CsCl analytical density gradient. The shoulder (s) may correspond to contaminating chloroplast DNA, the following small peaks to contaminating mitochondrial DNA (ρ = 1.706 g/cm³), rDNA (ρ = 1.707 g/cm³), and to three satellite DNAs (see text). The shaded area corresponds to the DNA fractions containing nuclear protein-encoding genes (see legend of Fig. 2). (B) Compositional distribution of large (>50 kb) GenBank DNA sequences from Arabidopsis. (C) Gene distribution obtained by plotting the relative number of Arabidopsis genes against their GC3 values (top scale); 2,490 sequences from GenBank (release 103; October 15, 1997) were used to construct the histogram. In C, the common GC abscissa of the three plots represents the GC values of the DNA fractions containing the genes (as derived from Fig. 2).

Figure 2

Plot of GC3 of genes (circles) versus GC values of DNA fractions corresponding to the hybridization peaks (from the data of Table 1). The solid circles represent the two extreme GC3 values of Arabidopsis genes as found in GenBank. The vertical broken lines indicate the GC range of the DNA fractions containing the genes. This range was used to define, in Fig. 1A (shaded area), the DNA range in which genes are located.
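
GC3 here denotes the GC level at third codon positions of a coding sequence. As an illustration only, and not the authors' own procedure, a minimal Python sketch of how GC and GC3 might be computed for one in-frame coding sequence is given below. The function names and the example sequence are hypothetical, and the sketch assumes the sequence starts at codon position 1 and ignores any base other than A, C, G, or T.

```python
def gc_content(seq: str) -> float:
    """Return the GC level of seq as a percentage of its A/C/G/T bases."""
    bases = [b for b in seq.upper() if b in "ACGT"]  # skip N and other symbols
    if not bases:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def gc3_content(cds: str) -> float:
    """Return the GC level at third codon positions of an in-frame CDS.

    Assumption: cds begins at the first base of a codon; a trailing
    incomplete codon, if any, is ignored.
    """
    usable = len(cds) - len(cds) % 3      # length covered by whole codons
    third_positions = cds[2:usable:3]     # every third base, starting at index 2
    return gc_content(third_positions)


# Hypothetical toy sequence: codons ATG GCC AAA TGA
example_cds = "ATGGCCAAATGA"
example_gc = gc_content(example_cds)
example_gc3 = gc3_content(example_cds)
```

For the toy sequence, the third positions are G, C, A, and A, so GC3 is 50%, while the overall GC level is 5 of 12 bases.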

Figure 3

ORF density (number of ORFs per 100 kb) in large (>50 kb) DNA segments from Arabidopsis (circles). Average values were also estimated for each 1% GC bin (horizontal bars).
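
The two quantities in this legend, ORF density per 100 kb and its average per 1% GC bin, can be sketched in Python as follows. This is a hedged illustration rather than the authors' method: the function names and the segment values are hypothetical, and the sketch assumes each bin spans an integer GC value up to, but not including, the next integer, since the legend does not state how the bin boundaries were drawn.

```python
import math
from collections import defaultdict


def orf_density_per_100kb(n_orfs: int, length_bp: int) -> float:
    """Number of ORFs scaled to a 100 kb stretch of sequence."""
    return n_orfs * 100_000 / length_bp


def mean_density_by_gc_bin(segments):
    """Average ORF density within each 1% GC bin.

    segments: iterable of (gc_percent, n_orfs, length_bp) tuples.
    Assumption: bin k collects segments with k <= GC < k + 1.
    """
    bins = defaultdict(list)
    for gc_percent, n_orfs, length_bp in segments:
        bins[math.floor(gc_percent)].append(
            orf_density_per_100kb(n_orfs, length_bp)
        )
    return {k: sum(v) / len(v) for k, v in sorted(bins.items())}


# Hypothetical segments, not data from the paper.
example_segments = [
    (35.2, 20, 80_000),   # 25 ORFs per 100 kb
    (35.9, 30, 100_000),  # 30 ORFs per 100 kb
    (36.4, 12, 60_000),   # 20 ORFs per 100 kb
]
example_means = mean_density_by_gc_bin(example_segments)
```

Each circle in the figure corresponds to one per-segment density, and each horizontal bar to one per-bin mean.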

Figure 4

A scheme of genome organization and gene distribution in plant genomes. (A) In the large genomes of Gramineae, genes (large vertical boxes) are present in long gene clusters, which are separated from each other by gene-empty regions formed by repeated sequences (thick solid line). The ensemble of gene clusters forms the gene space. The intergenic sequences are compositionally very homogeneous because they are largely formed by transposons (small horizontal boxes in the intergenic sequences). (B) The small genome of Arabidopsis differs from the genomes of Gramineae essentially because of (i) the disappearance (or very strong reduction) of gene-empty regions; (ii) the practical absence of transposons in intergenic sequences; and (iii) the higher gene density.
