Plant genome size variation: bloating and purging DNA

Plant genome size variation is a dynamic process of bloating and purging DNA. While it was once thought that plants were on a path to obesity through continual DNA bloating, recent research supports the view that most plants actively purge DNA. Plant genome size research has greatly benefited from the cataloguing of genome size estimates at the Kew Plant DNA C-values Database, and the recent availability of over 50 fully sequenced and published plant genomes. The emerging trend is that plant genomes bloat due to the copy-and-paste proliferation of a few long terminal repeat retrotransposons (LTRs) and aggressively purge these proliferating LTRs through several mechanisms including illegitimate and incomplete recombination, and double-strand break repair through non-homologous end joining. However, ultra-small genomes such as Utricularia gibba (Bladderwort), which is 82 megabases (Mb), purge excess DNA through genome fractionation and neofunctionalization during multiple rounds of whole genome duplication (WGD). In contrast, the largest published genome, Picea abies (Norway Spruce) at 19 800 Mb, has no detectable WGD but has bloated with diverse and diverged LTRs, either because these LTRs have evaded purging mechanisms or because the purging mechanisms are absent in gymnosperms. Finally, advances in DNA methylation studies suggest that smaller genomes have a more aggressive epigenomic surveillance system to purge young LTR retrotransposons, which is less active or missing in larger genomes like the bloated gymnosperms. While genome size may not reflect genome complexity, evidence is mounting that genome size may reflect evolutionary status.

‘Thus, unless evidence for a comprehensive mechanism for removing interspersed repetitive DNAs is found, and/or strong selective pressures for reducing genome size can be determined, we must conclude that plants may indeed have a one-way ticket to larger genome sizes.’

–Bennetzen and Kellogg [1]

THE DYNAMIC PLANT GENOME

Plants experience invading DNA and RNA, proliferating transposable elements (TEs), whole genome duplications (WGDs), tandem repeats and polyploidy events, all of which potentially contribute to larger genome sizes. However, plant genomes, specifically those of angiosperms, are highly dynamic and span four orders of magnitude in size, with very small genomes as well as very large genomes ([2], Figure 1). Presumably plant genomes exploit this barrage of genetic information to adapt to new and diverse environments, since plants can only move through reproduction or hitchhiking [3]. Despite this barrage of DNA and RNA from external and internal sources, plant genomes do not simply grow out of control, or bloat. Plants also purge DNA through fractionation after WGD and through removal of proliferating long terminal repeat retrotransposons (LTRs) by unequal and illegitimate recombination. Instead of a ‘one-way ticket to genome obesity,’ [1] plants have a dynamic system that maintains a remarkably constant gene number [4, 5] and chromosome number within species [6].

Figure 1:

Plant genome sizes span four orders of magnitude. Genome size variation across plants, vertebrates and invertebrates was adapted from the Kew database and Gregory et al. (2007) [2, Tables 2 and 3]. Genome sizes were converted from picograms (pg) to megabases (1 pg = 978 Mb, 1 Mb = 1.022 × 10⁻³ pg).

Plant genome size dynamics have been well documented and curated in a remarkable resource, the Kew Plant DNA C-values Database (http://data.kew.org/cvalues/) [7–12]. Plant genome sizes vary by four orders of magnitude, from the carnivorous corkscrew plant Genlisea aurea at 60 megabases (Mb) to the rare Japanese plant Paris japonica with a staggering genome size of 152 000 Mb [7]. Genome sizes have been defined as very small (<1300 Mb), intermediate (>3400 and <13 700 Mb) and large (>34 000 Mb), and the ancestral angiosperm and gymnosperm genomes have been estimated to be small and intermediate, respectively [9]. The most frequently observed genome size in the Kew database is about 500 Mb, suggesting either that smaller genomes have been preferentially measured to date or that small plant genomes, and therefore DNA purging, are in fact the norm (Figure 2A). Indeed, a general conclusion emerging from this database and other focused studies is that genome size is highly dynamic, with both increases and decreases within families and genera [9, 12]. In addition, using the Kew database it was demonstrated that 20% of genome size variation is explained by recombination rate across a broad group of species [13]. In contrast, larger genomes are restricted to species that occupy highly derived positions within clades [14], which is consistent with bloating being more isolated and associated with plants that are specializing.
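The unit conversion and size classes quoted above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the conversion factor is the one given in the Figure 1 legend, and the class thresholds are taken exactly as quoted in the text, so sizes that fall between the quoted ranges are left unlabelled rather than guessed.

```python
# Conversion factor from the Figure 1 legend: 1 pg of DNA = 978 Mb.
MB_PER_PG = 978.0


def pg_to_mb(picograms: float) -> float:
    """Convert a C-value in picograms to megabases."""
    return picograms * MB_PER_PG


def mb_to_pg(megabases: float) -> float:
    """Convert a genome size in megabases to picograms."""
    return megabases / MB_PER_PG


def size_class(megabases: float) -> str:
    """Label a genome using only the thresholds quoted in the text.

    The quoted ranges do not cover every size, so anything that falls
    between them is reported as such instead of being assigned a class.
    """
    if megabases < 1300:
        return "very small"
    if 3400 < megabases < 13700:
        return "intermediate"
    if megabases > 34000:
        return "large"
    return "not covered by the quoted ranges"


if __name__ == "__main__":
    # Genome sizes (Mb) as stated in the text.
    for name, mb in [("Genlisea aurea", 60),
                     ("Picea abies", 19800),
                     ("Paris japonica", 152000)]:
        print(f"{name}: {mb} Mb = {mb_to_pg(mb):.2f} pg, {size_class(mb)}")
```

Note that the Norway spruce genome, at 19 800 Mb, falls between the quoted intermediate and large ranges, which is why the sketch reports it separately.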

Figure 2:

Angiosperm genome size frequency and chromosome number from the Kew Plant DNA C-values Database [7]. (A) The most frequently observed genome size in the Kew database is 500 Mb. However, plant genomes span several orders of magnitude, from G. aurea at 60 Mb to P. japonica at 152 000 Mb [7]. (B) Plotting chromosome number by genome size (Mb) from the Kew database reveals that larger genomes have fewer chromosomes while smaller genomes have more chromosomes.

More than 50 plant genomes have been published over the past 13 years [5], which provides an unprecedented opportunity to explore the mechanisms of plant genome bloating versus purging, and the mechanisms that govern the architecture of plant genomes. In this year alone, the smallest and the largest genomes were published; retrotransposons were re-annotated by leveraging comparative genomics; and new findings in DNA methylation dynamics elucidated the forces driving genome evolution. While several excellent reviews have covered different aspects of genome size variation over the years [15–20], this review will focus on recent whole genome-based research that is shaping our understanding of the forces that determine whether a genome is bloating or purging DNA.

THE FIRST 50 PLANT GENOMES

Over 50 plant genomes have been published, representing 36 dicots, 16 monocots, and one gymnosperm, one lycopod and one bryophyte, which has provided a rich new resource to evaluate genome size from a whole genome comparative perspective [5]. The smallest genome published is the carnivorous Utricularia gibba (bladderwort) at 82 Mb [21], while the largest is Picea abies (Norway Spruce) at 19 800 Mb [22]. The most frequently observed published genome size is 500 Mb, which is similar to the most frequently reported genome size in the Kew database. While high throughput sequencing has accelerated plant genome sequencing [5], large, heterozygous and polyploid genomes still require advanced approaches such as sequencing doubled haploid and monoploid plants as in banana and potato, respectively [23, 24], sub genomes in wheat and cotton [25–29] and gene space scaffolded by physical and genetic maps in barley [30].

Several key features of plant genomes have emerged, such as an average gene count of about 32 thousand (k) protein-coding genes. There are some notable exceptions such as Malus x domestica (apple) with 57 k and Medicago truncatula (burclover) with 62 k predicted protein-coding genes. However, higher protein-coding gene predictions are often revised downward towards 32 k as assemblies improve and empirical gene-prediction data such as RNA-seq become available. This number of protein-coding genes is roughly double the predicted ancestral gene count of 12–14 k [4], most likely reflecting that plants preferentially retain duplicate genes after WGD events through neofunctionalization, as seen in the tomato fruit genes [31]. Whole genome sequence has also made it possible to follow horizontal gene transfer, which accounts for the acquisition of genes essential for plant functions such as xylem formation, plant defense and nitrogen cycling, and may play a role in augmentation of gene count over the ancestral number [32].

Overall, protein-coding gene count is not significantly correlated with genome size in the plants published to date (R-squared: 0.003, P-value: 0.731). However, 73% of the published genomes are crop species, which represent a small slice of the species in the Viridiplantae, and could be biased by the effect that domestication has had on population size and nucleotide variation. For instance, Spirodela polyrhiza (Greater Duckweed), an aquatic non-grass monocot being sequenced by the Joint Genome Institute (http://www.jgi.doe.gov/sequencing/why/duckweed.html), has only 19 623 predicted protein-coding genes (T.P. Michael, unpublished results; spirodelagenome.org), suggesting that our understanding of gene number may change as a broader swath of plant genomes outside of the crops is sequenced.
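For readers who want to repeat this kind of comparison on their own table of genomes, the R-squared of a simple linear regression can be computed as sketched below. The function name is hypothetical and the numbers in the usage example are arbitrary placeholders, not the published dataset behind the values quoted above; the sketch also does not compute the P-value.

```python
from statistics import mean


def r_squared(x, y):
    """Coefficient of determination for a simple linear regression of y on x.

    For one predictor this equals the squared Pearson correlation.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    return (sxy ** 2) / (sxx * syy)


if __name__ == "__main__":
    # Placeholder values for illustration only (NOT real genome data):
    # genome sizes in Mb and gene counts in thousands.
    genome_mb = [100, 500, 1000, 5000, 20000]
    genes_k = [30, 35, 28, 33, 31]
    print(f"R-squared = {r_squared(genome_mb, genes_k):.3f}")
```

An R-squared near zero, as reported in the text, means that genome size explains almost none of the variation in gene count.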

In contrast, repeat sequence, and specifically the proliferation of TEs, is driving genome size variation across the published sequenced genomes. Repeat sequence ranges from 3% in the 82 Mb genome of U. gibba [21] to 85% in the economically important Zea mays (maize) [33], while 57% is the most frequently observed amount of repeat sequence in published plant genomes. There is a positive correlation between genome size and repeat content (Figure 3A), and this is despite the fact that most genome assemblies fail to assemble 15% of the genome due to high copy number repeats that are hard (or impossible) to assemble with current sequencing technologies. Complicating the assembly and annotation of repeat sequence is the fact that 36% (20/55) of the published genomes have been sequenced solely with high throughput short read sequencing technology.

Figure 3:

LTR number and overall repeat content correlate with genome size. (A) Twenty-one genome publications reported the number of full-length LTRs found in the genome. These publications also reported overall repeat content, which was correlated with genome size. (B) In addition, the number of reported full-length LTRs was correlated with genome size in the 21 genomes. (C) In a more recent publication, full-length LTR number was re-estimated across eight published genomes and was still correlated with genome size [35].

In plant genomes, up to 90% of repeat sequence can be dominated by the two types of TEs, class I RNA-based ‘copy-and-paste’ retrotransposons and class II DNA-based ‘cut-and-paste’ transposons (Figure 4). While cut-and-paste DNA transposons move from one position to another during chromosome replication, copy-and-paste retrotransposons are expressed as RNA and reverse-transcribed into a new DNA element that can be inserted every replication cycle [34]. Therefore, the expression of retrotransposons, and specifically LTR retrotransposons, leads to a rapid amplification that in turn drives the bloating of DNA in plant genomes. Indeed, several plant genome publications report the estimated number of full-length LTR retrotransposons, and there is a positive correlation with genome size (Figure 3B). However, comparative genomic approaches are needed to refine the annotation of full-length LTR retrotransposons.
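The difference between the two transposition modes can be made concrete with a deliberately simplified toy model. The sketch below assumes that every element is active in every replication cycle and that nothing is silenced or deleted, which is not true of real genomes; the function names are hypothetical.

```python
def copy_and_paste(copies: int, cycles: int) -> int:
    """Class I (retrotransposon) toy model.

    Each existing element stays in place and contributes one new
    reverse-transcribed copy per replication cycle.
    """
    for _ in range(cycles):
        copies += copies  # one new insertion per existing element
    return copies


def cut_and_paste(copies: int, cycles: int) -> int:
    """Class II (DNA transposon) toy model.

    Each element is excised and reinserted elsewhere, so positions
    change but the copy number does not in this simplified model.
    """
    for _ in range(cycles):
        excised = copies
        copies -= excised  # elements leave their old sites
        copies += excised  # and land at new sites
    return copies


if __name__ == "__main__":
    for n in (1, 5, 10):
        print(f"after {n} cycles: copy-and-paste = {copy_and_paste(1, n)}, "
              f"cut-and-paste = {cut_and_paste(1, n)}")
```

Under these assumptions a single retrotransposon doubles every cycle while a DNA transposon remains a single copy, which illustrates why unchecked copy-and-paste activity can bloat a genome so quickly.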

Figure 4:

TEs amplify by either the class I copy-and-paste or class II cut-and-paste mechanisms. Class II DNA transposons move through a cut-and-paste mechanism where a TE is excised and inserted into a new genomic location each replication cycle. In contrast, class I LTR retrotransposons move through a copy-and-paste mechanism that involves a transcription and reverse-transcription step that leads to a new LTR copy each replication cycle.

GENOME BLOATING IS DRIVEN BY RECENT TRANSPOSITION AND ANCIENT RETENTION

While most of what is known about genome size variation in plants concerns how plant genomes increase in size, or bloat, several studies utilizing whole genome sequence have provided new perspectives on the process. The proliferation of TEs, and specifically LTRs, is the primary driver of genome size differences in plants [19], yet our understanding of LTR proliferation has been focused on expansion in one or a few organisms due to experimental constraints and the lack of high quality whole genome sequence. A large comparative study looking at eight high quality genomes [Arabidopsis thaliana, Arabidopsis lyrata, Vitis vinifera (grape), Glycine max (soy), Oryza sativa (rice), Brachypodium distachyon, Sorghum bicolor (sorghum) and Zea mays (maize)] using a robust and automated annotation and classification process also found a correlation between full-length LTR number and genome size (Figure 3C) [35]. Utilizing this comparative genomics dataset, the study identified several unifying principles governing LTR expansions across these species: (1) LTRs accumulate in bursts of only one or a few families; (2) LTRs are rapidly removed, since few were older than 3 million years; and (3) LTR bursts and removal were independent of lineage [35]. Therefore, genome bloating is highly active, recent and dynamic in genomes sequenced to date, although it remains unclear what triggers a proliferation event.

A contrasting view of genome bloating was provided by the Norway Spruce genome, which is the largest genome sequenced to date at 19 800 Mb and represents the first and only high quality gymnosperm genome sequence [22]. In contrast to most of the crop species sequenced to date, which have undergone several rounds of WGD or remain polyploid, no evidence was detected that the spruce genome has undergone a WGD since the divergence from angiosperms 350 million years ago. In addition, spruce has the longest mean intron length of sequenced plant genomes, and while there is not a statistically significant correlation between intron length and genome size, the largest genomes have longer introns, and the smallest genomes have shorter introns [22].

The prime driver of genome size in spruce is that its 70% repetitive fraction is dominated by diverse and low copy number LTRs, with more than 86% of LTRs identified as singletons. These diverse LTRs are shared across draft genomes of several other gymnosperms, consistent with the slow and steady accumulation of these elements since the split with angiosperms. A similar pattern of low copy number LTRs was also found in sequencing of 10 pine BACs [36], suggesting that gymnosperm genomes commonly accumulate LTRs but seem to lack mechanisms to remove repeat elements. Large bloated genomes like gymnosperms could provide clues as to which mechanisms are missing for effective genome purging.

GENOME PURGING THROUGH HYPER WGD AND SELECTION ON MANY SMALL DELETIONS

The mechanisms governing genome purging have gained some attention recently because genomes of species closely related to high quality plant models, as well as very large and very small genomes, have been published. Genome purging is thought to involve illegitimate or incomplete recombination, or other types of deletions. LTRs are generally purged through two different mechanisms: homologous recombination and deletion. While the former results in solo-LTRs, the latter gradually eliminates LTRs, leaving partial LTRs [22]. Elegant work based on the idea that the presence of solo-LTRs is evidence of illegitimate recombination provided early insight into the mechanisms responsible for LTR removal, confirming that LTRs have been purged from the small genomes of A. thaliana and rice [37, 38]. A similar study in cotton also showed that LTR bursts correlate with genome size and that smaller cotton genomes have a faster rate of LTR purging [39].

Another way of approaching the question of genome purging is to look at genomes that purge ultra-aggressively; presumably these genomes are ultra-small. Such a group of genomes was identified in the carnivorous Lentibulariaceae family, which includes Genlisea and Utricularia with genome sizes of 63 Mb and 88 Mb, respectively [40]. It has been hypothesized that these genomes are ultra-small due to the purging of DNA damaged by oxidative stress associated with a carnivorous habit [41]. Sequencing revealed that both genomes have smaller introns, reduced intergenic sequence and the smallest amount of reported repeat sequence at 3%, and that U. gibba has only 95 predicted full-length LTRs [21, 42]. While there was no evidence of increased genome evolution in U. gibba as hypothesized, at least three WGDs were identified since common ancestry with tomato, and these were resolved through genome fractionation and reduction to the current ultra-small genome [21]. It is interesting to speculate that the U. gibba and G. aurea genomes are in a state of hyper-purging where new traits are actively acquired through WGD, and purging is just a consequence of fractionation to manage gene dosage. However, the converse could also be true: these ultra-small genomes are hyperactively purging LTRs, and individuals that undergo WGD are selected for since they may duplicate essential genes that are lost in purging.

However, questions remain as to whether the processes that drive genome purging are under selection. Sequencing of A. lyrata, a close relative of the gold standard model plant genome A. thaliana, provided what was needed for a detailed comparative analysis of genome purging [43]. At 207 Mb, the A. lyrata genome is 1.6 times larger than that of A. thaliana, although A. lyrata only has 1.2 times more genes (32 670 versus 27 025), and while there are some larger rearrangements, hundreds of thousands of small deletions account for the major genome size difference. Using 95 re-sequenced A. thaliana accessions, it was found that the deletions are approaching fixation, consistent with selection and not mutational bias acting on these many deletions. These results provide compelling evidence that smaller A. thaliana genomes are being selected for, but the mechanism is still not clear. This could be explained by the mutational-hazard hypothesis, which proposes that non-coding DNA is more likely to accumulate deleterious mutations and be purged [44]. Consistent with this, it was also found that in A. thaliana intron loss is correlated with a higher mutation rate, and that compared to A. lyrata intron loss is associated with selection for genome size reduction [45, 46].
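The ratios quoted in this comparison can be checked with a short calculation. In the sketch below the A. thaliana genome size is not a measured value; it is only what the stated 1.6-fold ratio implies, and the variable names are illustrative.

```python
# Values stated in the text.
A_LYRATA_MB = 207          # A. lyrata genome size in Mb
SIZE_RATIO = 1.6           # A. lyrata is 1.6 times larger than A. thaliana
A_LYRATA_GENES = 32670
A_THALIANA_GENES = 27025

# Derived quantities (the A. thaliana size is implied by the ratio,
# not taken from a measurement).
gene_ratio = A_LYRATA_GENES / A_THALIANA_GENES
implied_thaliana_mb = A_LYRATA_MB / SIZE_RATIO
density_lyrata = A_LYRATA_GENES / A_LYRATA_MB
density_thaliana = A_THALIANA_GENES / implied_thaliana_mb

if __name__ == "__main__":
    print(f"gene count ratio: {gene_ratio:.2f}")
    print(f"implied A. thaliana size: {implied_thaliana_mb:.1f} Mb")
    print(f"genes per Mb, A. lyrata: {density_lyrata:.0f}")
    print(f"genes per Mb, A. thaliana: {density_thaliana:.0f}")
```

The genome grows by a factor of 1.6 while the gene count grows by only about 1.2, so gene density is higher in the smaller genome, consistent with most of the size difference lying outside protein-coding genes.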

Another study looked at a close relative of rice, which provided a similar scenario in which LTRs are purged in smaller genomes, but also shed light on the effect that double-strand break (DSB) repair has on the process of genome purging. Sequencing of the 261 Mb wild rice genome of Oryza brachyantha, which is 68% the size of its cultivated relative O. sativa, showed that 50% of the size difference was due to amplification of recent LTRs [47]. However, it was found that 30% of the genomes were not collinear, and that non-homologous end joining (NHEJ) after DSB accounted for a majority of gene movements between the two genomes. Shuffling genes at that rate would have a significant impact on gene function, as well as possibly leading to reproductive barriers and speciation. It was recently shown that 21 nt small RNAs, termed diRNAs, are released at DSBs, presumably to initiate repair [48], which suggests that the small RNA and epigenetic machinery play a role in DSB identification and repair. This study and others discussed below suggest that the epigenome plays a major role in regulating the dynamics of genome size variation.

THE EPIGENOME DRIVES GENOME SIZE

Central to genome size regulation is the ability of a plant to preserve ‘self,’ while exploiting recombination, transposon proliferation and gene duplication for genome innovation. In a seminal AAAS Presidential address, Nina Fedoroff wove together disparate findings relating to genome size variation into a coherent plant evolution hypothesis: ‘I argue that transposable elements accumulate in eukaryotic genomes because of, not despite, epigenetic silencing mechanisms.’ [49]. While understanding of epigenetic mechanisms is in its infancy in plants, key discoveries harnessing the power of high throughput sequencing to survey the heritable epigenetic methylation of cytosine nucleotides (DNA methylation) at single nucleotide resolution have provided some new clues to how the epigenome is molding or even driving genome size variation in plants.

In plants, DNA methylation predominantly occurs at repetitive sequence and TEs [50]. However, in plants tested to date, DNA methylation is detected across the whole genome, with higher levels in gene bodies and lower levels closer to transcriptional start sites. All three contexts of DNA methylation, CG, CHG and CHH (H = A, T or C), are abundant in TEs, while gene body methylation is dominated by CG methylation (CHG and CHH are almost absent). Global CG methylation levels are the highest in B. distachyon and rice at 56% and 59%, respectively, while A. thaliana is less than half that at 22% [51, 52]. DNA methylation of repetitive sequence and TEs acts to silence their transcription, and in turn limits their ability to proliferate through the copy-and-paste mechanism. The function of gene body methylation is not known, but it is evolutionarily conserved across plants, and it has been suggested that it plays a role in suppressing intragenic promoters or enhancing the accuracy of splicing [51, 53].
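The three sequence contexts can be stated precisely as a small classifier. The sketch below is a minimal illustration with a hypothetical function name; it reads the two bases downstream of a cytosine on the strand supplied and ignores the reverse strand, which a real methylation caller would also have to handle.

```python
def methylation_context(seq: str, pos: int) -> str:
    """Classify the cytosine at 0-based `pos` as 'CG', 'CHG' or 'CHH'.

    H stands for A, T or C. Returns 'unknown' when there is not enough
    downstream sequence or an ambiguous base (e.g. N) is encountered.
    """
    seq = seq.upper()
    if seq[pos] != "C":
        raise ValueError("position does not hold a cytosine")
    downstream = seq[pos + 1:pos + 3]
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"
    if len(downstream) < 2:
        return "unknown"
    first_is_h = downstream[0] in ("A", "T", "C")
    if first_is_h and downstream[1] == "G":
        return "CHG"
    if first_is_h and downstream[1] in ("A", "T", "C"):
        return "CHH"
    return "unknown"


if __name__ == "__main__":
    example = "ACGTCAGTCTTA"
    for i, base in enumerate(example):
        if base == "C":
            print(i, methylation_context(example, i))
```

Counting the cytosines in each class across TEs and gene bodies is the first step in the kind of genome-wide comparison described above.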

Using a whole genome approach, it was shown in A. thaliana that body methylated genes were longer, evolving more slowly and more likely to exhibit phenotypic effects when knocked out [54]. Furthermore, a comparative study between B. distachyon and rice demonstrated that gene body methylation was conserved between orthologs, and as seen in A. thaliana these genes were longer and slow evolving [52]. Interestingly, body methylated genes tend to be moderately expressed, whereas highly or lowly expressed genes tend not to be body methylated [53]. Therefore, despite several rounds of WGD, moderately expressed body methylated genes are preferentially retained, possibly because they are body methylated or through some unknown epigenetic mechanism. Understanding the mechanisms controlling how plants recognize self, or protect essential aspects of their genome, will be an important step in understanding genome size variation.

Heavily methylated LTRs generally mark gene-poor heterochromatin, centromeres and telomeres, while gene-rich regions are usually devoid of LTRs or have few. These findings have led to several studies asking whether this relationship between genes and LTRs plays a role in the evolution of plant genomes. Evolutionarily young LTR insertions that fall close to genes are rapidly targeted for DNA methylation, possibly as a result of an ‘epigenomic surveillance system’ that establishes a more stable association between Pol V and the promoter of the gene close to the LTR insertion [55] (Figure 5). The rapid DNA methylation of young LTRs is guided by 24 nt small interfering RNAs (siRNAs), which results in the silencing of transcription of that LTR but also negatively impacts the expression of local genes through siRNA spreading [56–58]. In contrast, LTRs further from genes are older and less methylated, suggesting that LTR insertions close to genes are more likely to be methylated and purged [56, 58, 59]. Smaller genomes like A. thaliana possibly purge more effectively because their siRNAs are more specific to (uniquely match) LTRs. This finding is consistent with the observation that LTRs are further from genes in A. thaliana, presumably because they have been purged more effectively, compared to its larger and LTR-laden relative A. lyrata, whose siRNAs are less specific to its LTRs [43, 57].

Figure 5:

An epigenetic surveillance system targets young transposons for removal, while transposons further from genes remain and lead to heterochromatic regions. As transposons proliferate, the chance of them landing near genes is a function of the current genome size and chromatin structure. If an LTR lands close to a gene, Pol V forms a less transient relationship with the promoter of that gene, which in turn generates siRNAs that target the locus for silencing through DNA methylation.

However, it is also possible that an LTR is simply more likely to fall near a gene in smaller genomes, which in turn results in more active siRNA. Therefore, genome bloating or purging becomes an overall balance of current chromosome size, gene density and LTR activity [60]. As chromosomes bloat and gene density decreases, the epigenomic surveillance system is less likely to engage, resulting in the presence of many old LTRs as seen in the bloated Norway Spruce genome [22]. In smaller genomes where gene density is high, the impact of LTR insertions could be much more severe, as in the Genlisea and Utricularia genomes where chromosomes are estimated to be around 1 Mb in size [40]. In addition to genome size, the Kew database also tracks chromosome number, and a broad relationship emerges that smaller genomes have more chromosomes and larger genomes have fewer chromosomes (Figure 2B). A similar negative correlation between chromosome number and genome size has been observed in the sedge genus Carex, which is characterized by its holocentric chromosomes and variability in chromosome number and genome size [61]. While not all genomes have a ‘one-way ticket to genome obesity,’ [1] or bloat to a few very large chromosomes, it appears there are several mechanisms, possibly driven by an epigenomic surveillance system, that maintain dynamic, small genomes with more chromosomes. The C-value paradox questioned why genome size does not correlate with biological complexity [1]; based on these recent findings, an updated question might be whether genome size is negatively correlated with evolutionary dynamism.
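The argument that an LTR is more likely to land near a gene in a small genome can be illustrated with a back-of-the-envelope model. Everything in this sketch beyond the genome sizes is an assumption made for illustration: insertions are treated as uniformly random (the text notes that chromatin structure also matters), "near" is defined by a hypothetical 1 kb flank on each side of a gene, flanks are assumed not to overlap, and the same round gene count of 32 000 is used for both genomes rather than their actual annotations.

```python
def near_gene_probability(genome_mb: float, n_genes: int,
                          flank_kb: float = 1.0) -> float:
    """Chance that a uniformly random insertion lands within `flank_kb`
    of a gene, assuming non-overlapping flanks on both sides of each gene.
    """
    covered_mb = n_genes * 2 * flank_kb / 1000.0
    return min(1.0, covered_mb / genome_mb)


if __name__ == "__main__":
    # Genome sizes from the text; gene count is an illustrative round number.
    for name, size_mb in [("U. gibba", 82), ("P. abies", 19800)]:
        p = near_gene_probability(size_mb, 32000)
        print(f"{name} ({size_mb} Mb): P(near gene) = {p:.4f}")
```

Under these assumptions most insertions in an 82 Mb genome would fall near a gene, compared with well under 1% in a 19 800 Mb genome, which is the intuition behind a surveillance system that engages less often as genomes bloat.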

CONCLUSION AND PERSPECTIVES

Over the last 2 years, 60% of the sequenced plant genomes were published. We are in the early days of understanding plant genome architecture, and the next 10 years promise to unlock new features of plant genomes that will provide clues as to why some genomes bloat while others purge their DNA. High quality finished plant genomes and surveys across plant populations will provide the material needed to understand which parts of genomes are susceptible to genome purging, the general population level dynamics needed to sustain genome purging and the overall consequences for a species that is in a cycle of purging or bloating. Whether it is due to millions of small deletions in the genome, wholesale loss of chromosomes or the selective loss and recombination of chromosomal regions due to novel epigenomic configurations, the diversity of plants will reveal that all forces play a role.

FUNDING

This work was funded by a grant from the DOE-Plant Feedstock Genomics for Bioenergy Program (DE-FG02-08ER6430).

References

1

Bennetzen

JL

Kellogg

EA

Do plants have a one-way ticket to genomic obesity?

Plant Cell

1997

9

1509

14

2

Gregory

TR

Nicol

JA

Tamm

H

et al.

Eukaryotic genome size databases

Nucleic Acids Res

2007

35

D332

8

3

Kellogg

EA

Bennetzen

JL

The evolution of nuclear genome structure in seed plants

Am J Bot

2004

91

1709

25

4

Proost

S

Pattyn

P

Gerats

T

et al.

Journey through the past: 150 million years of plant genome evolution

Plant J

2011

66

58

65

5

Michael

TP

Jackson

S

The first 50 plant genomes

Plant Genome

2013

6

1

7

6

Tang

H

Bowers

JE

Wang

X

et al.

Synteny and collinearity in plant genomes

Science

2008

320

486

8

8

Bennett

MD

Leitch

IJ

Nuclear DNA amounts in angiosperms: targets, trends and tomorrow

Ann Bot

2011

107

467

590

9

Leitch

IJ

Soltis

DE

Soltis

PS

et al.

Evolution of DNA amounts across land plants (embryophyta)

Ann Bot

2005

95

207

17

10

Bennett

MD

Leitch

IJ

Nuclear DNA amounts in angiosperms: progress, problems and prospects

Ann Bot

2005

95

45

90

11

Bennett

MD

Leitch

IJ

Plant genome size research: a field in focus

Ann Bot

2005

95

1

6

12

Wendel

JF

Cronn

RC

Johnston

JS

et al.

Feast and famine in plant genomes

Genetica

2002

115

37

47

13

Ross-Ibarra

J

Genome size and recombination in angiosperms: a second look

J Evol Biol

2007

20

800

6

14

Soltis

DE

Soltis

PS

Bennett

MD

et al.

Evolution of genome size in the angiosperms

Am J Bot

2003

90

1596

603

15

Bennetzen

JL

Ma

J

Devos

KM

Mechanisms of recent genome size variation in flowering plants

Ann Bot

2005

95

127

32

16

Gaut

BS

Wright

SI

Rizzon

C

et al.

Recombination: an underappreciated factor in the evolution of plant genomes

Nat Rev Genet

2007

8

77

84

17

Gaut

BS

Ross-Ibarra

J

Selection on major components of angiosperm genomes

Science

2008

320

5875

484

6

18

Flowers

JM

Purugganan

MD

The evolution of plant genomes: scaling up from a population perspective

Curr Opin Genet Dev

2008

18

565

70

19

Tenaillon

MI

Hollister

JD

Gaut

BS

A triptych of the evolution of plant transposable elements

Trends Plant Sci

2010

15

471

8

20

Grover

CE

Wendel

JF

Recent insights into mechanisms of genome size change in plants

J Bot

2010

2010

382732

21

Ibarra-Laclette

E

Lyons

E

Hernández-Guzmán

G

et al.

Architecture and evolution of a minute plant genome

Nature

2013

498

94

8

22

Nystedt

B

Street

NR

Wetterbom

A

et al.

The Norway spruce genome sequence and conifer genome evolution

Nature

2013

497

579

84

23

D'Hont

A

Denoeud

F

Aury

JM

et al.

The banana (Musa acuminata) genome and the evolution of monocotyledonous plants

Nature

2012

488

213

7

24

Potato Genome Sequencing Consortium

Xu

X

Pan

S

Cheng

S

et al.

Genome sequence and analysis of the tuber crop potato

Nature

2011

475

189

95

25. Brenchley R, Spannagl M, Pfeifer M, et al. Analysis of the bread wheat genome using whole-genome shotgun sequencing. Nature 2012;491:705–10.

26. Jia J, Zhao S, Kong X, et al. Aegilops tauschii draft genome sequence reveals a gene repertoire for wheat adaptation. Nature 2013;496:91–5.

27. Ling HQ, Zhao S, Liu D, et al. Draft genome of the wheat A-genome progenitor Triticum urartu. Nature 2013;496:87–90.

28. Paterson AH, Wendel JF, Gundlach H, et al. Repeated polyploidization of Gossypium genomes and the evolution of spinnable cotton fibres. Nature 2012;492:423–7.

29. Wang K, Wang Z, Li F, et al. The draft genome of a diploid cotton Gossypium raimondii. Nat Genet 2012;44:1098–103.

30. International Barley Genome Sequencing Consortium, Mayer KF, Waugh R, et al. A physical, genetic and functional sequence assembly of the barley genome. Nature 2012;491:711–6.

31. Tomato Genome Consortium. The tomato genome sequence provides insights into fleshy fruit evolution. Nature 2012;485:635–41.

32. Yue J, Hu X, Sun H, et al. Widespread impact of horizontal gene transfer on plant colonization of land. Nat Commun 2012;3:1152.

33. Schnable PS, Ware D, Fulton RS, et al. The B73 maize genome: complexity, diversity, and dynamics. Science 2009;326:1112–5.

34. Wicker T, Sabot F, Hua-Van A, et al. A unified classification system for eukaryotic transposable elements. Nat Rev Genet 2007;8:973–82.

35. El Baidouri M, Panaud O. Comparative genomic paleontology across plant kingdom reveals the dynamics of TE-driven genome evolution. Genome Biol Evol 2013;5:954–65.

36. Kovach A, Wegrzyn JL, Parra G, et al. The Pinus taeda genome is characterized by diverse and highly diverged repetitive sequences. BMC Genomics 2010;11:420.

37. Ma J, Devos KM, Bennetzen JL. Analyses of LTR-retrotransposon structures reveal recent and rapid genomic DNA loss in rice. Genome Res 2004;14:860–9.

38. Devos KM, Brown JK, Bennetzen JL. Genome size reduction through illegitimate recombination counteracts genome expansion in Arabidopsis. Genome Res 2002;12:1075–9.

39. Hawkins JS, Proulx SR, Rapp RA, et al. Rapid DNA loss as a counterbalance to genome expansion through retrotransposon proliferation in plants. Proc Natl Acad Sci USA 2009;106:17811–6.

40. Greilhuber J, Borsch T, Müller K, et al. Smallest angiosperm genomes found in Lentibulariaceae, with chromosomes of bacterial size. Plant Biol (Stuttg) 2006;8:770–7.

41. Albert VA, Jobson RW, Michael TP, et al. The carnivorous bladderwort (Utricularia, Lentibulariaceae): a system inflates. J Exp Bot 2010;61:5–9.

42. Leushkin EV, Sutormin RA, Nabieva ER, et al. The miniature genome of a carnivorous plant Genlisea aurea contains a low number of genes and short non-coding sequences. BMC Genomics 2013;14:476.

43. Hu TT, Pattyn P, Bakker EG, et al. The Arabidopsis lyrata genome sequence and the basis of rapid genome size change. Nat Genet 2011;43:476–81.

44. Lynch M, Koskella B, Schaack S. Mutation pressure and the evolution of organelle genomic architecture. Science 2006;311:1727–30.

45. Fawcett JA, Rouzé P, Van de Peer Y. Higher intron loss rate in Arabidopsis thaliana than A. lyrata is consistent with stronger selection for a smaller genome. Mol Biol Evol 2012;29:849–59.

46. Yang YF, Zhu T, Niu DK. Association of intron loss with high mutation rate in Arabidopsis: implications for genome size evolution. Genome Biol Evol 2013;5:723–33.

47. Chen J, Huang Q, Gao D, et al. Whole-genome sequencing of Oryza brachyantha reveals mechanisms underlying Oryza genome evolution. Nat Commun 2013;4:1595.

48. Wei W, Ba Z, Gao M, et al. A role for small RNAs in DNA double-strand break repair. Cell 2012;149:101–12.

49. Fedoroff NV. Presidential address. Transposable elements, epigenetics, and genome evolution. Science 2012;338:758–67.

50. Law JA, Jacobsen SE. Establishing, maintaining and modifying DNA methylation patterns in plants and animals. Nat Rev Genet 2010;11:204–20.

51. Feng S, Cokus SJ, Zhang X, et al. Conservation and divergence of methylation patterning in plants and animals. Proc Natl Acad Sci USA 2010;107:8689–94.

52. Takuno S, Gaut BS. Gene body methylation is conserved between plant orthologs and is of evolutionary consequence. Proc Natl Acad Sci USA 2013;110:1797–802.

53. Zemach A, McDaniel IE, Silva P, et al. Genome-wide evolutionary analysis of eukaryotic DNA methylation. Science 2010;328:916–9.

54. Takuno S, Gaut BS. Body-methylated genes in Arabidopsis thaliana are functionally important and evolve slowly. Mol Biol Evol 2012;29:219–27.

55. Zhong X, Hale CJ, Law JA, et al. DDR complex facilitates global association of RNA polymerase V to promoters and evolutionarily young transposons. Nat Struct Mol Biol 2012;19:870–5.

56. Hollister JD, Gaut BS. Epigenetic silencing of transposable elements: a trade-off between reduced transposition and deleterious effects on neighboring gene expression. Genome Res 2009;19:1419–28.

57. Hollister JD, Smith LM, Guo YL, et al. Transposable elements and small RNAs contribute to gene expression divergence between Arabidopsis thaliana and Arabidopsis lyrata. Proc Natl Acad Sci USA 2011;108:2322–7.

58. Ahmed I, Sarazin A, Bowler C, et al. Genome-wide evidence for local DNA methylation spreading from small RNA-targeted sequences in Arabidopsis. Nucleic Acids Res 2011;39:6919–31.

59. Vonholdt BM, Takuno S, Gaut BS. Recent retrotransposon insertions are methylated and phylogenetically clustered in japonica rice (Oryza sativa spp. japonica). Mol Biol Evol 2012;29:3193–203.

60. Tian Z, Rizzon C, Du J, et al. Do genetic recombination and gene density shape the pattern of DNA elimination in rice long terminal repeat retrotransposons? Genome Res 2009;19:2221–30.

61. Lipnerová I, Bures P, Horová L, et al. Evolution of genome size in Carex (Cyperaceae) in relation to chromosome number and genomic base composition. Ann Bot 2013;111:79–94.

© The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com