The roles of segmental and tandem gene duplication in the evolution of large gene families in Arabidopsis thaliana

Steven B Cannon et al. BMC Plant Biol. 2004.

Abstract

Background: Most genes in Arabidopsis thaliana are members of gene families. How do the members of gene families arise, and how are gene family copy numbers maintained? Some gene families may evolve primarily through tandem duplication and high rates of birth and death in clusters, and others through infrequent polyploidy or large-scale segmental duplications and subsequent losses.

Results: Our approach to understanding the mechanisms of gene family evolution was to construct phylogenies for 50 large gene families in Arabidopsis thaliana, identify large internal segmental duplications in Arabidopsis, map gene duplications onto the segmental duplications, and use this information to identify which nodes in each phylogeny arose due to segmental or tandem duplication. Examples of six gene families exemplifying characteristic modes are described. Distributions of gene family sizes and patterns of duplication by genomic distance are also described in order to characterize patterns of local duplication and copy number for large gene families. Both gene family size and duplication by distance closely follow power-law distributions.

Conclusions: Combining information about genomic segmental duplications, gene family phylogenies, and gene positions provides a method to evaluate the contributions of tandem duplication and segmental genome duplication to the generation and maintenance of gene families. Differences among families in these contributions appear to correspond meaningfully to differences in the functional roles of the members of the gene families.

Figures

Figure 1

Sizes of gene families in A. thaliana. Approximate gene family sizes were calculated using single-linkage clustering of BLASTP similarities below E-value thresholds of 10^-10 (red), 10^-20 (black), and 10^-30 (blue). At the resolution of this graph, these lines follow nearly the same path. The curves follow a power-law distribution. The best-fit power-law equation for the 10^-10 curve is indicated on the graph.
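As an illustration of the single-linkage clustering named in this caption, the following is a minimal sketch and not the authors' actual pipeline. It assumes BLASTP hits are already available as (query, subject, E-value) tuples; the function names `single_linkage_families` and `size_distribution` are hypothetical. Two proteins land in the same family whenever a chain of hits below the threshold connects them.

```python
from collections import Counter


def single_linkage_families(hits, max_evalue=1e-10):
    """Cluster proteins into families by single linkage.

    hits: iterable of (query, subject, evalue) tuples (assumed input format).
    Any hit with evalue below max_evalue merges the two proteins' clusters,
    so families are the connected components of the similarity graph.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, evalue in hits:
        root_q, root_s = find(query), find(subject)  # registers both proteins
        if query != subject and evalue < max_evalue:
            parent[root_q] = root_s

    families = {}
    for protein in list(parent):
        families.setdefault(find(protein), set()).add(protein)
    return list(families.values())


def size_distribution(families):
    """Map family size -> number of families of that size."""
    return Counter(len(family) for family in families)


if __name__ == "__main__":
    # Toy hits, invented for illustration only.
    toy_hits = [
        ("A", "B", 1e-50),
        ("B", "C", 1e-12),
        ("C", "D", 1e-5),
        ("E", "E", 0.0),
    ]
    for threshold in (1e-10, 1e-20, 1e-30):
        families = single_linkage_families(toy_hits, threshold)
        print(threshold, dict(size_distribution(families)))
```

Single linkage tends to chain loosely related proteins together, so tightening the threshold generally splits families; rerunning at several thresholds, as the caption describes, shows how sensitive the size distribution is to that choice.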

Figure 2

Dot plots of similarities in A. thaliana chromosomes 1 and 2. Chromosome 1 is shown at the top and left, chromosome 2 at the bottom and right. Dots represent BLASTP similarities at a bit-score threshold of 500. Synteny blocks identified by DiagHunter [20,21] are shown in black (larger images are available at [24]). Hits of proteins to themselves have been suppressed. A large excess of local duplications is apparent as higher densities near the main diagonal. The average density at any given distance between genes can be calculated from diagonal strips through the dot plot. One such strip is highlighted in chromosome 2 × 2.
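The diagonal-strip idea in this caption can be sketched as a simple binning of homolog pairs by the distance between the two genes. This is a minimal sketch under stated assumptions: gene positions are given as base-pair coordinates on one chromosome, each homologous pair is listed once, and the function name `hits_by_distance` is hypothetical. It returns raw counts per strip and does not reproduce the normalization used for the published plots.

```python
from collections import Counter


def hits_by_distance(positions, homolog_pairs, bin_size=100_000):
    """Count homolog pairs per diagonal strip of a self-comparison dot plot.

    positions: dict mapping gene name -> coordinate in bp (assumed input).
    homolog_pairs: iterable of (gene_a, gene_b), each pair listed once.
    A pair separated by d bp falls in strip number d // bin_size, which is
    the strip running parallel to the main diagonal at that offset.
    """
    counts = Counter()
    for gene_a, gene_b in homolog_pairs:
        if gene_a == gene_b:
            continue  # self hits are suppressed, as in the dot plot
        distance = abs(positions[gene_a] - positions[gene_b])
        counts[distance // bin_size] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Toy coordinates and pairs, invented for illustration only.
    toy_positions = {"g1": 10_000, "g2": 15_000, "g3": 120_000, "g4": 950_000}
    toy_pairs = [("g1", "g2"), ("g1", "g3"), ("g2", "g3"), ("g1", "g4")]
    print(hits_by_distance(toy_positions, toy_pairs))
```

Dividing each count by the length of its strip would turn the counts into densities comparable across distances; that normalization is left out here because the source does not specify it.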

Figure 3

Densities of homologs by genomic distance in A. thaliana chromosome 2 and genome-wide. The graph on the left (3A) shows average densities in 100 kb diagonal strips through the chromosome 2 × 2 dot plot of similarities. The value at any position in the graph represents the number of homologs between 100 kb windows around a query location and a target location. The graph on the right (3B) shows similar density measurements, but within 5 kb windows and spanning up to 200 kb between genes. The x-axis measures the difference between the query and target locations. The thin line shows the density-by-distance plot for chromosome 2 × 2. The bold line shows the comparable plot for the whole genome, with scores averaged across all five A. thaliana chromosome comparisons. The red dotted line shows the best-fit exponential equation to the whole-genome curve, fitted from 5 kb to 100 kb.
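One common way to obtain a best-fit exponential over a restricted distance window is ordinary least squares on the logarithm of the density. The sketch below assumes that approach; the source does not state which fitting procedure was used, and the function name `fit_exponential` and its parameters are hypothetical. It fits y = a * exp(b * x) using only points whose distance lies between 5 kb and 100 kb.

```python
import math


def fit_exponential(distances, densities, lo=5_000, hi=100_000):
    """Fit y = a * exp(b * x) by least squares on log(y).

    Only points with lo <= x <= hi and y > 0 are used (the fitting window
    defaults to the 5 kb to 100 kb range mentioned in the caption).
    Returns the tuple (a, b).
    """
    points = [
        (x, math.log(y))
        for x, y in zip(distances, densities)
        if lo <= x <= hi and y > 0
    ]
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points inside the fitting window")

    mean_x = sum(x for x, _ in points) / n
    mean_log_y = sum(ly for _, ly in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (ly - mean_log_y) for x, ly in points)

    b = sxy / sxx                               # slope of log(y) against x
    a = math.exp(mean_log_y - b * mean_x)       # intercept, back-transformed
    return a, b


if __name__ == "__main__":
    # Synthetic curve, invented for illustration only.
    xs = list(range(5_000, 100_001, 5_000))
    ys = [3.0 * math.exp(-2e-5 * x) for x in xs]
    print(fit_exponential(xs, ys))
```

Fitting in log space weights relative rather than absolute errors, which suits curves that span several orders of magnitude; a direct nonlinear fit would weight the high-density points near the diagonal more heavily.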

Figure 4

Comparison of observed/expected tandem and segmental duplications for 50 large A. thaliana gene families. Ratios of observed to expected tandem duplications in the 50 gene families in the study are shown on the vertical axis, and ratios of observed to expected segmental duplications on the horizontal axis. For purposes of discussion, one and two standard deviations around the means on each axis are shown with a box plot. Among families outside of one standard deviation, families with members that play roles in pathogen defense are indicated in red. Transcription factor families are shown in light green. Several housekeeping genes are shown in dark green. Several broad-function enzyme families are shown in brown. Notice the relative scarcity of gene families that are high in both categories, and the eight families that have no apparent tandem duplications.

Figure 5

Proteasome 20S subunit family: low tandem, high segmental duplications. The phylogeny on the left shows segmental duplications in the A. thaliana proteasome 20S subunit family, which lacks tandem duplications. The phylogeny on the right represents the same A. thaliana sequences but with M. truncatula and tomato EST sequences added to evaluate the degree to which these homologs are conserved. Relationships of clades represented in both phylogenies are in general agreement, with some differences due to instabilities of some deep nodes.

Figure 6

NBS-LRR disease resistance family: moderate tandem, low segmental duplications. The NBS-LRR disease resistance family is divided into two subfamilies: the non-TIR subfamily (top third of the phylogeny) and the TIR subfamily (the bottom two-thirds). Tandem duplications are indicated with "t" and segmental with "S." Other duplications are not classified by our methods. For clarity in the large tree, gene names and positions have been removed. The complete phylogeny, including bootstrap values, is available at [24].

Figure 7

Chlorophyll a/b binding protein family: high tandem, low segmental duplications. The phylogeny on the left shows segmental and tandem duplications in the A. thaliana chlorophyll a/b binding protein family. Gene names used in the photosynthesis literature are included in this tree. The phylogeny on the right shows the same A. thaliana sequences, with M. truncatula and tomato EST sequences added to provide an indication of the degree of conservation of these sequences and lineages. Notice the tandem duplications in the A. thaliana lhc1-3 clade, and the corresponding duplications in Medicago and tomato, many of which appear to have occurred after separation of these plant families.

Figure 8

Major latex protein family: high tandem, low segmental duplications. The phylogeny on the left shows segmental and tandem duplications in the A. thaliana major latex protein family. The phylogeny on the right shows the same A. thaliana sequences with M. truncatula and tomato EST sequences added to provide an indication of the degree of conservation of these sequences and lineages. Clades are generally represented in comparable relationships, with some differences due to instabilities of some deep nodes. Bootstrap values are indicated as follows: *** >90%; ** >=80%; * >=70%. Note the expansion of several clades in each species following separation of these taxa.
