Diversification of Genes for Carotenoid Biosynthesis in Aphids following an Ancient Transfer from a Fungus

Abstract

The pea aphid genome was recently found to harbor genes for carotenoid biosynthesis, reflecting an ancestral transfer from a fungus. To explore the evolution of the carotene desaturase gene family within aphids, sequences were retrieved from a set of 34 aphid species representing numerous deeply diverging lineages of aphids and analyzed together with fungal sequences retrieved from databases. All aphids have at least one copy of this gene and some aphid species have up to seven, whereas fungal genomes consistently have a single copy. The closest relatives of aphids, adelgids, also have carotene desaturase; these sequences are most closely related to those from aphids, supporting a shared origin from a fungal-to-insect transfer predating the divergence of adelgids and aphids. Likewise, all aphids and adelgids have carotenoid profiles that are consistent with their biosynthesis using the acquired genes of fungal origin rather than derivation from food plants. The carotene desaturase was acquired from a fungal species outside of Ascomycota or Basidiomycota and closest to Mucoromycotina among sequences available in databases. In aphids, an ongoing pattern of gene duplication is indicated by the presence of both anciently and recently diverged paralogs within genomes and by a high frequency of pseudogenes that appear to be recently inactivated. Recombination among paralogs is evident, making analyses of patterns of selection difficult, but tests of selection for a nonrecombining region indicate that duplications tend to be followed by bouts of positive selection. Species of Macrosiphini, which often show color polymorphisms, typically have a larger number of desaturase copies relative to other species sampled in the study. These results indicate that aphid evolution has been accompanied by ongoing evolution of carotenogenic genes, which have undergone duplication, recombination, and occasional positive selection to yield a wide variety of carotenoid profiles in different aphid species.

Introduction

Carotenoids form a large class of compounds that are present in many organisms and that can confer a variety of benefits, including protection from oxidative damage, light detection, photoprotection, and display coloration (Britton et al. 2004, 2006). Although carotenoids are diverse and ubiquitous in organisms, their production depends on a limited set of enzymes that likely originated early in the evolution of cellular life (Klassen 2010). These include phytoene synthase (CrtB family, which produces the initial 40-carbon backbone of the carotenoid hydrocarbon), carotene desaturase (CrtI family, which introduces double bonds between neighboring carbon residues), and carotene cyclase (CrtL family, which introduces six-carbon rings at the termini of the C40 backbone). Additional enzymes, present in particular groups of organisms, can introduce further modifications, such as the addition of hydroxyl groups or shortening or lengthening of the carbon backbone. The core enzymatic machinery for carotenoid production is encoded in the genomes of many Bacteria, Archaea, unicellular eukaryotes, fungi, and plants. Phylogenetic analyses indicate an ancient origin of carotenoid biosynthesis, followed by limited amounts of horizontal transfer and gene duplication in particular lineages of prokaryotes (Klassen 2010).

Animals require carotenoids for vision, display coloration, and other cellular functions (Britton et al. 2006), but most animals lack the genes encoding carotenoid biosynthetic machinery and thus must obtain carotenoids from food. Unexpectedly, Acyrthosiphon pisum (pea aphid) possesses functional carotenoid biosynthetic genes, providing the first instance of carotenoid production by animals, as well as an unusual case of an animal genome acquiring foreign genes of known function (Moran and Jarvik 2010). Phylogenetic analyses revealed that aphid carotenogenic genes are derived from an ancestral transfer of DNA from a fungal genome to an insect genome.

In both A. pisum and certain fungi, phytoene synthase and carotene cyclase are fused and are encoded in the same chromosomal region as carotene desaturase, in a distinctive bidirectionally transcribed arrangement not known from other carotenogenic organisms. In A. pisum, this unit has undergone several duplication events subsequent to its acquisition, resulting in four copies of the genes for carotene desaturase and three for the fused phytoene synthase/carotene cyclase. Such duplication of carotenoid biosynthetic genes is unusual in bacteria, fungi, or plants and potentially reflects subfunctionalization (e.g., Gallagher et al. 2004). Indeed, in A. pisum, only one of the four copies of the desaturase is responsible for the production of the red carotenoid torulene (Moran and Jarvik 2010). This observation raises the possibility that carotenogenic genes have undergone duplication and subfunctionalization through positive selection for the production of specific compounds.

Here, we address the evolution of horizontally transferred carotenoid biosynthetic genes within aphids. We use newly obtained sequences and analyses to address the following questions: At what point in arthropod evolution did the transfer from fungal to animal genome occur, and how many descendant lineages maintain the capability for carotenoid biosynthesis? What group of fungi donated the genes, and does this give clues to the ecological association that may have facilitated the gene transfer? What is the extent of duplication of these genes in different aphid lineages, and does the extent of duplication relate to the profile of carotenoids present? Do duplication events coincide with positive selection in descendant copies, as might be expected if paralogs are being selected for distinct activities? We address these questions by analyzing carotene desaturase sequences newly isolated from a set of species representing major aphid groups and derived from publicly available fungal genome sequences.

Materials and Methods

Study Species

We surveyed 38 insect species including 34 aphid species (Aphididae) plus four species from related insect groups (Adelgidae, Psyllidae, Aleyrodidae) (table 1). The aphids included species from subfamilies and tribes corresponding to several deeply branching lineages, based on von Dohlen and Moran (2000). Among these were representatives of several aphid species that are nearly white in life (Aspidophorodon longicaudus, Hyperomyzus pallidus, Macrosiphum diervillae, Eucallipterus tiliae, Monelliopsis caryae). For all species, DNA was extracted and used as template in efforts to amplify a region of the gene encoding carotene desaturase. DNA from lab cultures of several species was also used for Southern blot experiments aimed at detecting copies of carotene desaturase. Finally, tissues from some species were assayed for their carotenoid profiles.

Table 1.

Insect Species for Which Carotenoid Biosynthetic Genes Were Studied.

| Taxonomic Group | Species Name | PCR (a) | Carotenoid Assay (d) | Southern Blot (e) |
| --- | --- | --- | --- | --- |
| Aphidinae: Aphidini | Aphis craccivora | 26/1 | + | + |
| | Aphis nerii | 16/2 | + | + |
| | Rhopalosiphum padi | 12/5 | | |
| | Schizaphis graminum | 36/4 | + | |
| Aphidinae: Macrosiphini | Acyrthosiphon pisum | 4 (c) | (c) | + |
| | Aspidophorodon longicaudus | 6/4 | | |
| | Brevicoryne brassicae | 6/3 | | |
| | Diuraphis noxia | 6/2 | + | |
| | Hyperomyzus pallidus | 4/1 | | |
| | Macrosiphum diervillae | 5/4 | | |
| | Macrosiphum euphorbiae | 1/1 | + | |
| | Macrosiphum gaurae | 19/7 | + | |
| | Myzus persicae | 24/6 | | + |
| | Sitobion avenae | 15/7 | + | + |
| | Uroleucon ambrosiae | 7/2 | | |
| | Wahlgreniella nervata red | 19/7 | + | |
| | Wahlgreniella nervata green | 35/7 | + | |
| Calaphidinae: Calaphidini | Calaphis sp. | 2/1 (b) | | |
| Calaphidinae: Panaphidini | Myzocallis agrifolicola | 5/3 | + | |
| | Eucallipterus tiliae | 5/1 | + | |
| | Monelliopsis caryae | 5/1 | | |
| Drepanosiphinae | Drepanaphis utahensis | 1/1 | + | |
| Eriosomatinae: Eriosomatini | Eriosoma lanigerum | 9/2 | + | |
| Eriosomatinae: Fordini | Geopemphigus floccosus | 7/3 | + | |
| | Melaphis rhois | 8/1 | | |
| | Schlechtendalia chinensis | 9/1 | | |
| Eriosomatinae: Pemphigini | Pemphigus populiramulorum | 4/1 | | |
| Eriosomatinae: Prociphilini | Prociphilus sp. | 4/1 | | |
| Hormaphidinae: Hormaphidini | Hamamelistes spinosus | 5/3 | | |
| | Hormaphis hamamelidis | 5/3 | + | |
| Chaitophorinae: Chaitophorini | Chaitophorus populifolii | 8/2 | | |
| | Chaitophorus stevensis | 8/3 | | |
| Lachninae: Cinarini | Cinara sp. | 3/2 | + | |
| | Cinara pinea group | 4/1 | | |
| | Cinara ponderosae | 4/2 | | |
| | Tuberolachnus salignus | 1/1 (b) | | |
| Adelgidae | Adelges cooleyi | 1/1 (b) | + | |
| | Adelges laricis | 1/1 (b) | | |
| Aleyrodidae | Bemisia tabaci | 0/0 | + | |
| Psyllidae | Pachypsylla venusta | 0/0 | + | |

(b) PCR product amplified with primers PD-D3F/PD-D7R.

(c) Moran and Jarvik (2010).

Obtaining Sequences of Genes for Carotene Desaturase

Genomic DNA was isolated from fresh or frozen aphids using Qiagen Blood and Tissue kits. The resulting genomic DNA templates were polymerase chain reaction (PCR) amplified with primers complementary to a region of the A. pisum carotene desaturase genes. The chosen amplicon is approximately 1,400 bp in length and spans one 71-bp intron in A. pisum. The primary pair of primers (TGGAGTTGGTGGTACAGCAG and AGATAATCCTAGTATAGAMCCTTTCCA) corresponds to regions that were highly conserved between all three full-length copies of A. pisum carotene desaturase genes, under the assumption that these regions would be most likely to be conserved in other species. In cases in which the initial primer pair failed to produce an amplicon, alternative primers were used (table 2) to maximize retrieval of any carotene desaturase genes present.

Table 2.

List of Oligonucleotides Used for PCR and Southern Hybridization (a).

| Name | Forward Sequence (5′–3′) | Name | Reverse Sequence (5′–3′) | Amplicon Length (bp) |
| --- | --- | --- | --- | --- |
| torF19584 | TGGAGTTGGTGGTACAGCAG | torR20949 | AGATAATCCTAGTATAGAMCCTTTCCA | 1365 |
| torF19584 | TGGAGTTGGTGGTACAGCAG | torR20693 | CGATGYGRCTRGGWACGT | 1109 |
| torF20391 | GAYGACAAMGGWGTGGCGA | torR20684 | CGRCTRGGWACGTTMACRTAAA | 293 |
| PD-D3F | CCNAGDATNGANCCYYTCCA | PD-D7R | GCNGARGGNATHTGGTAYCC | 689 |
| Probe100F | TTYGATCAAGGHCCATCATT | Probe601R | CCTCCTTTYGGRTACCADAT | 499 |

(a) Amplicon length corresponds to the gene region for carotene desaturase from Acyrthosiphon pisum genomic contigs.
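Several of the primers in table 2 contain IUPAC ambiguity codes (for example M, Y, R, W, D, H, N), so each such oligonucleotide is a pool of concrete sequences. The following minimal sketch, which is illustrative and not part of the original methods, counts and enumerates the sequences in a degenerate primer. The function names `degeneracy` and `expand` are hypothetical; the ambiguity table follows the standard IUPAC nucleotide codes.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct concrete sequences encoded by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    # Primer sequences copied from table 2.
    for name, seq in [
        ("torF19584", "TGGAGTTGGTGGTACAGCAG"),
        ("torR20949", "AGATAATCCTAGTATAGAMCCTTTCCA"),
        ("PD-D3F", "CCNAGDATNGANCCYYTCCA"),
    ]:
        print(name, degeneracy(seq))
```

Under these definitions, the primary primer pair is nearly nondegenerate, whereas the alternative PD-D3F/PD-D7R pair is far more degenerate, which fits its use as a fallback for divergent templates.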

To recover sequences from multiple loci that may have amplified with these primers, products of successful PCRs were cloned into Promega pGEM-T Easy vectors, and on average, 20 transformant colonies from each species were picked and their inserts amplified by colony PCR using T7 and SP6 primers. These products were then Sanger sequenced on an ABI3700 sequencer using services at the University of Arizona or at Yale University. Resulting reads for each colony were assembled into a single sequence using Sequencher and manually curated to remove obvious base-calling errors. Subsequently, all sequences with identities ≥ 99% were assembled into consensus sequences. In all cases, this process resulted only in the collapse of sequences from the same species. Divergence less than 1% may reflect sequencing error, cloning artifacts, allelic variation, or a combination. In the A. pisum genome, carotenoid biosynthetic genes show pairwise divergence of alleles of about 0.13%, whereas paralogous copies are >10% divergent at the nucleotide sequence level (Moran and Jarvik 2010). Thus, although we cannot definitively discriminate allelic variation from divergence between duplicate loci in the newly determined sequences, our cutoff would correctly assign these in A. pisum. The number of sequenced PCR products, the number of resulting consensus sequences for each species, and the accession numbers are shown in table 1 and Supplementary Data (Supplementary Material online).
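The ≥ 99% identity rule can be illustrated with a small sketch. This is not the Sequencher workflow used in the study; it is a stand-in that assumes the clone sequences are already aligned to equal length, ignores gapped columns when computing identity, and groups clones by single linkage. The names `identity` and `collapse` are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        matches += x == y
    return matches / compared if compared else 0.0


def collapse(seqs, threshold=0.99):
    """Group clone names whose aligned sequences are linked by identity >= threshold."""
    names = list(seqs)
    parent = {name: name for name in names}

    def find(name):
        # Union-find lookup with path compression.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if identity(seqs[a], seqs[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())
```

Each resulting group would correspond to one consensus sequence. Single linkage is one possible design choice; a stricter rule (every pair within a group at ≥ 99%) could split groups that this sketch merges.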

Fungal Sequence

We used available fungal sequences of carotene desaturase in phylogenetic analyses aimed at improving resolution of the source of the transferred genes. Fungal sequences were obtained using blastp with A. pisum sequences as queries to retrieve all fungal homologs of carotene desaturase from GenBank, the Joint Genome Institute Fungal Portal (Mycocosm at http://genome.jgi-psf.org/programs/fungi/index.jsf), and the Fungal Genome Initiative at the Broad Institute (http://www.broadinstitute.org/scientific-community/science/projects/fungal-genome-initiative/fungal-genome-initiative). Most of the fungal sequences obtained were from recently sequenced fungal genomes. We retrieved a total of 52 desaturase sequences derived from 52 fungal species. None of the fungal genome sequences contained multiple loci encoding carotene desaturase.

Analyses to Place the Aphid Sequences among Available Sequences from Fungal Species

Fungal protein sequences were combined with a subset of ten translated aphid desaturase sequences. These ten taxa were selected to represent the full diversity of the aphid sequences, based on preliminary analyses of the aphid sequences. A data set consisting of 63 taxa, including one desaturase sequence of bacterial origin as an outgroup, was aligned at the protein level with the server-based program MAFFT (Multiple Alignment using Fast Fourier Transformation, http://mafft.cbrc.jp/alignment/server/index.html), using the E-INS-i algorithm with default parameters. The raw alignment was manually corrected in the program BioEdit (Hall 1999) and further processed in the GBlocks application (Castresana 2000) in order to remove unreliably aligned regions containing gap positions. The resulting alignment was analyzed using maximum likelihood (ML) and Bayesian inference (BI). ML-based analyses and 100 nonparametric bootstrap replicates were performed in the PhyML program (Guindon and Gascuel 2003) with the best fitting model LG + Γ selected in ProtTest 3 (Darriba et al. 2011) and parameters estimated from the data. Bayesian analysis was performed with the Whelan and Goldman model of protein evolution implemented in MrBayes version 3.1.2 (Ronquist and Huelsenbeck 2003) and the following parameter settings: rates = gamma, ngen = 4,000,000, samplefreq = 100, and printfreq = 100. Other Markov chain Monte Carlo settings and prior distributions were set at the default values.

Analyses to Detect Recombination of the Gene Copies

The intron sequence was manually spliced out, and the set of 102 sequences was aligned as described above. Potentially pseudogenized sequences containing frameshifts or premature stop codons were temporarily removed from the matrix and a reduced alignment was tested for the presence of recombination breakpoints using Single Breakpoint Recombination and Genetic Algorithms for Recombination Detection (Kosakovsky Pond et al. 2006) algorithms as implemented in the program DataMonkey (Delport et al. 2010).
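The screen for potentially pseudogenized sequences can be sketched as follows. This is an illustrative check, not the authors' procedure. It assumes the intron has already been removed, that the reading frame of the partial sequence is known, and that an intact sequence of known length from the same primer pair is available for comparison. The names `first_inframe_stop` and `looks_pseudogenized` are hypothetical.

```python
# Stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_inframe_stop(cds, frame=0):
    """Return the 0-based index of the first in-frame stop codon, or None."""
    seq = cds.upper().replace("-", "")
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


def looks_pseudogenized(cds, reference_length, frame=0):
    """Flag a partial coding sequence with a likely frameshift or premature stop.

    A length difference from the intact reference that is not a multiple of
    three is treated as a frameshift (an assumption of this sketch).
    """
    seq = cds.upper().replace("-", "")
    frameshift = (len(seq) - reference_length) % 3 != 0
    return frameshift or first_inframe_stop(seq, frame) is not None
```

Because the amplicon covers only part of the gene, any in-frame stop codon within it is premature by construction, so no terminal stop needs to be exempted.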

Tests of Selection for the Clade of Carotene Desaturases within Aphids

As the first nonrecombinant region (627 bp) spanned only 36 bp of several shorter sequences, including outgroup sequences of Adelges laricis and Adelges cooleyi, only the second matrix consisting of 413 nt positions was further analyzed. Due to the short length of this matrix, data were analyzed at the nucleotide level only. Analyses were performed as for the initial phylogeny described above with the exception of a few steps concerning the use of nucleotide characters. General time reversible (GTR) + Γ was determined as the best fitting model of molecular evolution in the program jModelTest (Posada 2008) and used in ML-based analyses. Bayesian probability was inferred using the same model implemented in MrBayes version 3.1.2 (Ronquist and Huelsenbeck 2003) with the following parameter settings: nst = 6, rates = gamma, ngen = 20,000,000, samplefreq = 100, and printfreq = 100.
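For readers who wish to rerun a comparable analysis, the reported settings could be written as a MrBayes command block. The sketch below generates such a block as text. The helper name `mrbayes_block` is hypothetical, and the exact command file used in the study is not given in the text, so this is only one plausible rendering of the stated parameters using the standard `prset`, `lset`, and `mcmc` commands.

```python
def mrbayes_block(ngen, samplefreq=100, printfreq=100, nst=None, aamodel=None):
    """Build a MrBayes command block from the settings reported in the text."""
    lines = ["begin mrbayes;"]
    if aamodel is not None:
        # Fix the amino acid model for protein data (e.g., "wag").
        lines.append(f"  prset aamodelpr=fixed({aamodel});")
    lset = ["rates=gamma"]
    if nst is not None:
        # nst=6 corresponds to the GTR substitution model for nucleotides.
        lset.insert(0, f"nst={nst}")
    lines.append("  lset " + " ".join(lset) + ";")
    lines.append(
        f"  mcmc ngen={ngen} samplefreq={samplefreq} printfreq={printfreq};"
    )
    lines.append("end;")
    return "\n".join(lines)


if __name__ == "__main__":
    # Nucleotide analysis of the 413-nt nonrecombinant region.
    print(mrbayes_block(ngen=20_000_000, nst=6))
    # Protein analysis of the aphid plus fungal data set.
    print(mrbayes_block(ngen=4_000_000, aamodel="wag"))
```

All other options are left at MrBayes defaults, matching the statement that remaining settings and priors were not changed.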

To test for positive selection, we analyzed a set of intact partial sequences from the second nonrecombinant matrix region in order to detect changes in selective forces over time. Programs CodeML from the PAML software package (Yang 2007) and DataMonkey (Delport et al. 2010) were used to estimate omega, the ratio of nonsynonymous changes per nonsynonymous site to synonymous changes per synonymous site, or dN/dS, for particular nodes on phylogenetic trees for the gene family. In CodeML, the unconstrained free-ratio model allowing independent omega values for every branch was used in attempts to estimate variation in omega across the phylogeny. To further investigate when and how selection pressure varied over the evolutionary history, we applied the GA-Branch (Genetic Algorithm-Branch) method (Kosakovsky Pond and Frost 2005a) implemented in the program DataMonkey (Delport et al. 2010). Because the full data set with 80 sequences exceeded capabilities of the program, we used a reduced set of 8 taxa and 29 copies in our tests. To exclude the possibility that choice of copies or taxa was biasing results, we performed several analyses with different sets of sequences to determine if the same branches consistently gave similar estimates of omega. We generated these data sets with the aim of attaining an optimum sequence divergence as suggested by Yang (2002).
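For reference, omega as defined above, together with its conventional interpretation, can be written compactly. The interpretation thresholds are the standard convention and are not stated explicitly in the text.

```latex
\omega = \frac{d_N}{d_S}, \qquad
\begin{cases}
\omega < 1 & \text{purifying (negative) selection} \\
\omega = 1 & \text{neutral evolution} \\
\omega > 1 & \text{positive (diversifying) selection}
\end{cases}
```

Under the free-ratio model, a separate value of omega is estimated for each branch, so elevated values on branches that follow duplication nodes are the signal of interest.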

Residues likely to change the substrate specificity or site of desaturation are not defined for carotene desaturase. The NADH-binding motif is known (Pecker et al. 1992) and is a nonrecombinant region in our data set. As a second test for potentially detecting positive selection, the region containing the NADH-binding motif was analyzed using site-specific tests SLAC and FEL (Kosakovsky Pond and Frost 2005b) implemented in DataMonkey (Delport et al. 2010).

Southern Blots

Genomic DNA of five aphid species (table 1) was digested overnight with HincIII (New England Biolabs), separated in a 1% agarose gel, and transferred onto a nylon membrane (Roche Diagnostics) using a rapid alkaline transfer protocol (Reed and Mann 1985). Preparation of labeled probes targeting a conserved gene region (table 2), Southern hybridization, and detection of hybridization signals using an ImageQuant LAS-4000 system (GE Healthcare) were carried out with a DIG DNA labeling and detection kit (Roche Diagnostics).

Carotenoid Profiles

Whole bodies of a subset of species were analyzed for carotenoid profiles using high-performance liquid chromatography at Craft Technologies, as previously described (Moran and Jarvik 2010).

Results

Retrieval of Carotene Desaturase Gene Sequences from Insect Samples

All 34 species of Aphididae yielded PCR products. Following cloning, each species yielded from one to seven distinct DNA sequences encoding carotene desaturase. A total of 98 distinct sequences were obtained, based on collapsing sequences with <1% divergence and starting with a set of 336 sequences. In most cases, a PCR product corresponding to about 1,100 nt and encoding 330 amino acids was retrieved. A few species, including the adelgids, gave products only for alternative primers that amplified a shorter region and that did not span an intron in the A. pisum sequences. This could reflect sequence divergence at primer sites or intron expansion, as seen in E. tiliae and M. caryae, in which intron lengths exceed 600 bp. Sequences obtained using all primer pairs showed unambiguous homology to carotene desaturase and were most similar to those from A. pisum, based on blastx searches of existing protein databases. Thus, aphids from diverse lineages corresponding to different subfamilies and tribes possess carotene desaturase.

Adelgids and phylloxerids are the closest relatives of Aphididae (von Dohlen and Moran 1995, 2000). Our two adelgid samples yielded a product only for the shorter PCR amplicon of about 700 bp, and the sequence showed clear homology to carotene desaturase. We could not obtain amplicons from samples from Aleyrodidae or Psyllidae, groups more distantly related to aphids within the suborder Sternorrhyncha. Failure to obtain a PCR product is not strong evidence of absence of these genes, so the presence of carotene desaturase genes in other Sternorrhyncha cannot be excluded. However, blast searches of 19,598 expressed sequence tags (ESTs) available in GenBank (12 May 2011) for another psyllid species, Diaphorina citri, did not yield any significant hits for carotene desaturase genes. Based on frequency of carotene desaturase transcripts in the A. pisum EST set (118 of 108,686 ESTs), we would expect about 20 D. citri ESTs if the genes were expressed at the same level, suggesting that they are absent from this psyllid species.
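The expected EST count follows from simple proportional scaling, which the sketch below reproduces from the numbers in the text. The Poisson probability of observing zero hits is an added illustration under the assumption that ESTs are sampled independently at the A. pisum rate; it is not a calculation reported in the study. The function names are hypothetical.

```python
import math


def expected_hits(reference_hits, reference_library_size, query_library_size):
    """Scale the hit frequency in a reference EST library to a query library."""
    rate = reference_hits / reference_library_size
    return rate * query_library_size


def prob_zero_hits(expected):
    """Poisson probability of zero hits given the expected count (an assumption)."""
    return math.exp(-expected)


if __name__ == "__main__":
    # 118 desaturase ESTs among 108,686 A. pisum ESTs; 19,598 D. citri ESTs.
    expected = expected_hits(118, 108_686, 19_598)
    print(f"expected D. citri hits: {expected:.1f}")
    print(f"P(zero hits | same expression level): {prob_zero_hits(expected):.2e}")
```

The expected value is slightly above 21, consistent with the "about 20" stated above, and under the Poisson assumption a complete absence of hits would be very unlikely if expression were comparable.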

Each aphid species yielded from one to seven distinct carotene desaturase genes, under our criterion of >1% divergence for distinct copies. Although we sequenced multiple clones in an effort to sample the diversity of copies from each species, it is likely that some copies did not amplify, did not clone, or were not selected for sequencing. Thus, our data may underestimate the diversity of copies present within some aphid species. However, in some cases, we invested considerable effort into obtaining sequences for all paralogs, through the use of alternative primers and through extensive sequencing of cloned copies. For example, for Aphis craccivora, 26 clones were sequenced. These were nearly identical and collapsed into a single sequence under our criterion of >99% identity. In some cases, such as Aphis nerii, A. longicaudus, and Macrosiphum gaurae, two very similar copies (with identity only slightly <99%) were retrieved; possibly these represent alleles.

Assessing Number of Carotene Desaturase Genes

To further determine whether different aphid species contain different numbers of carotene desaturase genes, we performed Southern hybridizations with five species that differed in number of copies detected by PCR (supplementary fig. S1, Supplementary Material online). We obtained evidence for four copies in A. pisum clone LSR1, corresponding to the known number of copies in the completely sequenced genome (Moran and Jarvik 2010). For A. craccivora and A. nerii, one and two copies were found, respectively, corresponding to the results from cloning and sequencing. We obtained four to five bands for both Myzus persicae and Sitobion avenae, for which PCR and sequencing yielded six distinct sequences. Thus, differences in copy number among species are supported in cases in which more extensive sequencing of clones was carried out. Furthermore, the results suggest a pronounced difference in the extent of duplication between Aphidini and Macrosiphini: Macrosiphini species, including A. pisum, M. gaurae, Wahlgreniella nervata, and M. persicae, contain many copies.

Certain aphids are very pale in life, suggesting the absence of carotenoids; however, some of the more saturated carotenoids, such as phytoene or zeta-carotene, have almost no color (Britton et al. 2004). Even very pale aphid species contained at least one apparently intact copy of carotene desaturase, based on our amplification and sequencing of most of the coding region.

Detection of Pseudogenes for Carotene Desaturase

Pseudogenes were detected in numerous species, based on the presence of obvious deletions or base changes that interrupted the reading frame and implied an inactivated gene. Detected pseudogenes were always very closely related to sequences that appeared to be from intact genes, indicating recent inactivation events. For example, all nine pseudogene sequences of A. nerii collapsed to A. nerii copy A according to our <1% divergence criterion, and all contained the same base substitution resulting in an early stop codon. Because we sequenced only part of the gene and did not measure expression, we might not have detected all cases of gene inactivation.

Phylogeny of Carotene Desaturases from Aphids and Fungi

Following preliminary analyses of the aphid-derived desaturase sequences, a set of ten aphid sequences representing the diversity of the larger set was used in an analysis that included all available fungal sequences for carotene desaturase (fig. 1). In topologies inferred by both ML (fig. 1) and BI (supplementary fig. S2, Supplementary Material online), the aphid sequences form a strongly supported clade, which branches outside of the Basidiomycota or Ascomycota. Among available sequences, the aphid sequences are closest to those from the Mucoromycotina. However, many basal fungal lineages are not represented among sequenced genomes, and the branch leading to the aphid clade is long. Thus, the closest relatives of the aphid sequences may fall within a fungal group not represented in our tree, possibly within the Entomophthorales which contains species parasitic in arthropods. The adelgid sequence belonged to the aphid clade with strong support in all analyses but was short and thus not used in most analyses.

Fig. 1.

ML phylogenetic tree for carotene desaturase proteins from fungi and from representative aphids. Accession numbers other than GenBank GI are designated as follows: GenBank WGS (open circles), the Fungal Genome Initiative at the Broad Institute protein ID (^), the Joint Genome Institute Fungal Portal locus ID (*). Solid circles indicate bootstrap values above 50 and are scaled accordingly. Accession numbers for aphid taxa from top to bottom: JN022723, JN022746, JN022711, JN022731, JN022727, JN022738, JN022781, JN022728, XM_001946654, and JN022748.

These observations are most consistent with the hypothesis of a single acquisition of this gene in an ancestor of extant Aphidoidea and Adelgidae from a fungus outside of the Basidiomycota or Ascomycota.

Ongoing Duplications of Carotene Desaturase Genes within Aphids

For the full set of sequences from aphids and adelgids, analyses based on ML and BI (Materials and Methods) gave nearly identical results. However, even after 20,000,000 generations, the BI-derived topology was not fully resolved due to the limited amount of information retained. Thus, we present only the ML phylogeny (fig. 2). The only substantial difference in topology in the BI tree involves the position of two Chaitophorus species, which cluster at the base of Eriosomatinae taxa rather than at the base of the whole Aphididae as in the ML tree. This difference reflects the occurrence of a long branch for these sequences and does not affect general conclusions for evolution of the carotene desaturase genes. As noted, the tree may not contain all copies present in all species included in the analysis, due to failure of some copies to amplify or to be sequenced. Nonetheless, several conclusions are possible.

Fig. 2.

ML phylogenetic tree for carotene desaturase genes from aphids. Black bars represent the total number of sequences collapsed according to the <1% divergence rule. White bars stand for the proportion of sequences that are pseudogenized. Solid circles indicate bootstrap values above 50 and are scaled accordingly. The ambiguous position of Chaitophorus sequences is depicted in gray. Dark dashed lines highlight the position of the four carotene desaturase copies from the Acyrthosiphon pisum genome.

First, gene duplications date to a deep node in the aphid desaturase tree, implying that these genes have undergone duplication since the origin of Aphidinae or earlier. For example, A. pisum copy A corresponds to a cluster within the Macrosiphini that diverges from other Macrosiphini copies at a deep node. However, the most basal node in the tree separates aphids belonging to different subfamilies and thus does not appear to correspond to a duplication event. We note that the rooting is based on the shorter sequences obtained from the adelgid samples and is thus not certain. Fungal sequences were too divergent to be used for rooting.

Second, duplications are ongoing in many lineages, and many species contain closely related sister sequences present within the same sample. Although some of these could be allelic differences, the divergence is always >1% (since sequences closer than this were collapsed) and sometimes much more. Allelic differences are expected to be less than 1%, at least based on data from A. pisum (Moran and Jarvik 2010). Other evidence that at least some close copies represent duplicated loci rather than alleles comes from the retrieval of more than two close variants from samples originating from a single aphid clone grown in the lab. For example, Brevicoryne brassicae, M. persicae, Rhopalosiphum padi, and Schizaphis graminum were all reared as diploid asexual clones derived from a single female and thus can possess at most two alleles per locus; yet each contains three to five close copies. In some lineages, several unique duplications have occurred; examples include A. longicaudus, B. brassicae, M. persicae, R. padi, and S. graminum. In certain deeper clades of aphids, unique duplications have given rise to radiations confined to particular aphid groups. This can be observed most dramatically in the Macrosiphini.
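The reasoning that separates alleles from duplicated loci can be illustrated with a minimal Python sketch. It assumes pre-aligned sequences, an uncorrected p-distance, and a greedy collapse at the 1% threshold; the function names and the distance measure are illustrative assumptions, not the procedure reported in Materials and Methods.

```python
from math import ceil

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap in either sequence are ignored (an assumption).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def collapse_variants(seqs, threshold=0.01):
    """Greedy collapse: keep a sequence only if it diverges by at least
    `threshold` from every sequence already kept."""
    kept = []
    for s in seqs:
        if all(p_distance(s, k) >= threshold for k in kept):
            kept.append(s)
    return kept

def min_loci(n_variants, ploidy=2):
    """Smallest number of loci that can explain n distinct variants
    in a single clone carrying at most `ploidy` alleles per locus."""
    return ceil(n_variants / ploidy)
```

Under these assumptions, a diploid clone yielding three to five distinct variants requires at least `min_loci(3) == 2` to `min_loci(5) == 3` loci, so the variants cannot all be alleles of a single locus.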

Thus, duplication of genes for carotene desaturase has been ongoing in aphids. Many detected duplicates are closely related, but others are ancient. Persistence of ancient duplicates is more evident in some lineages than in others, based on the evidence for only one or few loci in certain species such as A. craccivora and A. nerii. Together, these observations suggest a high rate of duplication and a high rate of inactivation of recently duplicated copies. At the same time, certain lineages show retention of more ancient duplicates, as in the case of the divergent copies present within genomes of several species of Macrosiphini (A. pisum, M. persicae, M. gaurae, S. avenae, and W. nervata).

Occurrence of Recombination among Carotene Desaturase Families

According to the position of the identified breakpoint, the initial alignment of all 102 sequences was split into two matrices, of 627 and 413 bp in length, respectively. The first matrix contained fewer taxa since not all species were successfully amplified for this region. The 82 nonpseudogenized sequences from the second matrix were further tested for positive selection in alternate reduced data sets.
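The partitioning step can be sketched as follows. This is a hypothetical illustration that assumes the alignment is held as a dictionary of equal-length strings and that the breakpoint falls after column 627 (inferred from the stated matrix lengths, 627 + 413 = 1,040 columns); sequences with no data in a partition are dropped, which mirrors the smaller taxon set of the first matrix.

```python
MISSING = set("-?Nn")  # characters treated as missing data (an assumption)

def split_alignment(alignment, breakpoint):
    """Split an alignment {name: sequence} into left and right matrices.

    A sequence is excluded from a matrix when that region holds only
    missing-data characters, e.g. because the region failed to amplify.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    left, right = {}, {}
    for name, seq in alignment.items():
        head, tail = seq[:breakpoint], seq[breakpoint:]
        if set(head) - MISSING:
            left[name] = head
        if set(tail) - MISSING:
            right[name] = tail
    return left, right
```

For example, a toy sequence that is entirely gaps over the first 627 columns would appear only in the second (413 bp) matrix.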

Assessing Positive and Purifying Selection on Aphid Carotene Desaturase Genes

We performed several analyses designed to detect variation in the average selective forces, measured as omega, over the branches on phylogenies for distinct reduced sets of carotene desaturase genes. All of our analyses supported variation in selection among branches. Several independent GA-Branch analyses assigned four to five omega categories to branches of our phylogenies (results not shown). Omega values above 1 were consistently retrieved for three percent of the branches, indicating that sporadic positive selection has acted on certain copies of this gene. Figure 3 presents the consensus of results from CodeML and GA-Branch analyses, where three nodes are followed by branches having omega estimates statistically greater than 1; all of these follow duplication events. Positive selection is supported for both deep and recent nodes. For example, the branch leading to node 48, a clade consisting of sequences from Macrosiphini (A. pisum copy C + M. gaurae copy A), shows a strong signature of positive selection. Examples of recent duplications followed by elevated omega values include several cases in which duplicates are confined to one species within our sample (Cinara ponderosae, S. graminum, M. gaurae, and W. nervata).
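One common way to ask whether omega on a branch is statistically greater than 1 is a likelihood ratio test between nested models, for instance a model with the branch omega fixed at 1 against one where it is estimated freely. The sketch below assumes that setup with one degree of freedom; it is not confirmed to be the exact test applied here, and the log-likelihood values in the usage note are invented for illustration.

```python
from math import erfc, sqrt

def classify_omega(omega):
    """Conventional reading of omega = dN/dS."""
    if omega > 1:
        return "positive selection"
    if omega < 1:
        return "purifying selection"
    return "neutral"

def lrt_pvalue_df1(lnl_null, lnl_alt):
    """P value of a likelihood ratio test with one degree of freedom.

    The statistic 2 * (lnL_alt - lnL_null) is compared with a chi-square
    distribution (df = 1), whose survival function is erfc(sqrt(x / 2)).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return erfc(sqrt(stat / 2.0))
```

With hypothetical log-likelihoods of -1000.0 (omega fixed at 1) and -995.0 (omega free), the statistic is 10 and the P value falls well below 0.01, so the null of omega = 1 would be rejected for that branch.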

Fig. 3.

ML-based topology for aphid carotene desaturases highlighting branches with omega (dN/dS) > 1, indicating positive selection. Red color indicates branches selected in several independent GA-Branch analyses as well as CodeML analysis. Orange highlights branches with omega > 1, calculated with the free-ratio model in CodeML (Materials and Methods). Solid circles indicate bootstrap values above 50 and are scaled accordingly. Internal nodes are numbered.

Site-specific tests performed on the nonrecombinant region containing the NADH-binding motif did not yield statistically significant evidence of positive selection. Because our tests were limited to a somewhat short alignment, and because our sampling of taxa and gene copies is incomplete, these tests should be regarded as preliminary, but they suggest repeated subfunctionalization associated with adaptive evolution during the evolution of this gene family.

Carotenoid Composition of Aphid Species

Few aphid species have been assayed for carotenoid contents, raising the question of whether these genes continue to function in carotenoid production in diverse aphid lineages and whether the presence of multiple copies correlates with production of diverse carotenoid compounds. To address this, we obtained carotenoid profiles for a subset of species for which sufficient material could be obtained (fig. 4). All species of Aphididae, including very pale aphids, contain carotenoids of the C40 type. This observation strongly supports the view that, in aphids, carotene desaturase as well as phytoene synthase/carotene cyclase continue to function in the biosynthesis of carotenoids of the same types observed in fungi. In pale species, such as E. tiliae and Pemphigus betae, carotenoid profiles are dominated by relatively colorless C40 carotenoids.

Fig. 4.

Profiles of carotenoids obtained for different aphid species. Width of the bar indicates proportional representation among carotenoids detected in samples. Approximate color in life for each species is presented on the right. Xanthins, which are of likely plant origin, are presented as green colors on the right. Brackets and letters along the left side indicate the higher taxonomic groupings of the hosts and correspond to the taxonomic information presented in table 1.

The adelgid sample also contained C40 carotenoids, as did the whitefly and psyllid samples. Xanthins, which require a carotene hydroxylase step and which are expected as dominant carotenoids in most plant tissues, most likely represent carotenoids ingested with food. They are largely absent from species of Aphididae (fig. 4), supporting the hypothesis that aphids produce their own carotenoids, using their own enzymatic machinery. Small amounts of xanthins are found in Cinara cupressi, in the whitefly and psyllid samples, and in Diuraphis noxia, a species that induces degradation of host plant cells, possibly releasing carotenoids into the ingested plant sap.

In the case of one copy of carotene desaturase in A. pisum, a distinct role in the production of torulene has been demonstrated (Moran and Jarvik 2010). For other cases, specific compounds cannot be linked to particular gene copies, but the overall picture is that aphids produce a variety of C40 compounds and that C40 compounds dominate among carotenoids present in aphid tissues. Red body color in different aphid species can reflect the presence of torulene (A. pisum, W. nervata), lycopene (M. gaurae), or both of these red compounds. The presence of C40 carotenoids in psyllids and whiteflies is unexplained and raises the possibility that undetected carotenogenic genes are present in these species. We failed to recover carotene desaturase sequences with degenerate PCR of psyllid and whitefly DNA samples, but these efforts do not provide definitive evidence that such sequences are absent.

Discussion

The presence of multiple loci for carotenoid biosynthesis is highly unusual within sequenced genomes of plants, fungi, and prokaryotes, yet most species of aphids appear to have multiple copies of carotene desaturase genes. These findings on carotene desaturase possibly are illustrative of the evolutionary processes dominating throughout the aphid genome: the A. pisum genome sequence revealed that aphid evolution has involved an exceptionally high level of gene family expansion through duplication (International Aphid Genomics Consortium 2010; Ollivier et al. 2010). For the genome overall, divergences of paralogs are as large as those of orthologs from distantly related aphid species, indicating that duplication resulting in expansion of gene families has been ongoing since near the time of origin of extant aphids or even longer (International Aphid Genomics Consortium 2010). Our results reveal a similar history for carotene desaturase, as indicated by the occurrence of duplication events deep in the clade of aphid gene copies (fig. 2). A high rate of duplications in many lineages is consistent with the finding of many pseudogene copies and suggests that duplication is often soon followed by nonfunctionalization.

However, some paralogs are retained within genomes for long periods; for example, the basal divergence of the paralogs present in A. pisum would predate the divergence of Eriosomatinae, Drepanosiphinae, Lachninae, and Aphidinae, which form basal lineages among extant aphids and which are estimated to have diverged at least 80–150 Ma, based on fossil and molecular evidence (von Dohlen and Moran 2000). Furthermore, paralogs retained in genomes often undergo positive selection for amino acid replacements (fig. 3). Thus, it appears that different copies have specialized to particular functions. Since expression confined to different developmental stages or tissues does not appear likely, functional differences between copies are more likely to involve different substrate specificity and production of specific carotenoid types. In the case of A. pisum, one desaturase copy was shown to be necessary for production of torulene, the basis of red body color (Moran and Jarvik 2010). We note that the other red/green color polymorphic species included in our samples, M. persicae, W. nervata, S. avenae, and M. gaurae, also have large numbers of desaturase copies. This suggests that the multiple copies are linked to the capacity for evolving novel carotenoid profiles, potentially linked to particular ecological circumstances. Carotene desaturase of bacteria has been used in experimental evolution studies, which have shown that minor changes in the enzyme can lead to the production of novel carotenoids including torulene (Schmidt-Dannert et al. 2000).

The relationship between the phylogenies for the desaturase gene family and for aphid lineages is complex, and our sampling of aphid species is limited. The cluster containing desaturase copies from species in Eriosomatinae, Lachninae, and Aphidinae: Macrosiphini (fig. 2) spans diverse aphid lineages that diverged over a relatively short period, estimated to have occurred at least 100 Ma (von Dohlen and Moran 2000; von Dohlen et al. 2006). We note that the Eriosomatinae does not form a clade in our trees and that this is similar to results in other molecular phylogenetic analyses based on single copy aphid genes (e.g., Martinez-Torres et al. 2001; Ortiz-Rivas et al. 2004; Zhang and Qiao 2008).

Our analyses provide a picture of the evolution of carotene desaturase genes in aphids following their acquisition from a fungus. They reveal that these genes have a single origin in a shared ancestor of aphids and adelgids, are ubiquitous among living aphids, and have diversified through repeated bouts of duplication and selection. This suggests an important role of carotenoids in aphid biology and diversification.

We thank Tyler Jarvik for laboratory work on initial stages of this project, Carol von Dohlen, Robert Foottit, and Eric Maw for providing aphid samples and identifications, Kim Hammond for her help with live aphid colonies and sampling, and Neal Craft and Craft Technologies for the carotenoid assays. This work was supported by the United States National Science Foundation (DEB-0723472, DEB-1106195 to N.A.M.). E.N. was supported by a Fulbright fellowship and by the Grant Agency of the University of South Bohemia (135/2010/P (a)) and the Grant Agency of the Czech Republic (206/09/H026).

References

Carotenoids handbook, 2004, Basel (Switzerland), Birkhäuser-Verlag

Carotenoids: volume 4: natural functions, 2006, Basel (Switzerland), Birkhäuser-Verlag

Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis, Mol Biol Evol, 2000, vol. 17 (pg. 540-552)

ProtTest 3: fast selection of best-fit models of protein evolution, Bioinformatics, 2011, vol. 27 (pg. 1164-1165)

Datamonkey 2010: a suite of phylogenetic analysis tools for evolutionary biology, Bioinformatics, 2010, vol. 26 (pg. 2455-2457)

Gene duplication in the carotenoid biosynthetic pathway preceded evolution of the grasses, Plant Physiol, 2004, vol. 135 (pg. 1776-1783)

A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood, Syst Biol, 2003, vol. 52 (pg. 696-704)

BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT, Nucleic Acids Symp Ser, 1999, vol. 41 (pg. 95-98)

International Aphid Genomics Consortium. Genome sequence of the pea aphid Acyrthosiphon pisum, PLoS Biol, 2010, vol. 8 (pg. e1000313)

Phylogenetic and evolutionary patterns in microbial carotenoid biosynthesis are revealed by comparative genomics, PLoS One, 2010, vol. 5 (pg. e11257)

A genetic algorithm approach to detecting lineage-specific variation in selection pressure, Mol Biol Evol, 2005a, vol. 22 (pg. 478-485)

Not so different after all: a comparison of methods for detecting amino acid sites under selection, Mol Biol Evol, 2005b, vol. 22 (pg. 1208-1222)

Automated phylogenetic detection of recombination using a genetic algorithm, Mol Biol Evol, 2006, vol. 23 (pg. 1891-1901)

Molecular systematics of aphids and their primary endosymbionts, Mol Phylogenet Evol, 2001, vol. 20 (pg. 437-449)

Lateral transfer of genes from fungi underlies carotenoid production in aphids, Science, 2010, vol. 328 (pg. 624-627)

Comparative analysis of the Acyrthosiphon pisum genome and expressed sequence tag-based gene sets from other aphid species, Insect Mol Biol, 2010, vol. 19 (pg. 33-45)

Molecular systematics of aphids (Homoptera: Aphididae): new insights from the long-wavelength opsin gene, Mol Phylogenet Evol, 2004, vol. 30 (pg. 24-37)

A single polypeptide catalyzing the conversion of phytoene to zeta-carotene is transcriptionally regulated during tomato fruit ripening, Proc Natl Acad Sci U S A, 1992, vol. 89 (pg. 4962-4966)

jModelTest: phylogenetic model averaging, Mol Biol Evol, 2008, vol. 25 (pg. 1253-1256)

Rapid transfer of DNA from agarose gels to nylon membranes, Nucleic Acids Res, 1985, vol. 13 (pg. 7207-7221)

MRBAYES 3: Bayesian phylogenetic inference under mixed models, Bioinformatics, 2003, vol. 19 (pg. 1572-1574)

Molecular breeding of carotenoid biosynthetic pathways, Nat Biotechnol, 2000, vol. 18 (pg. 750-753)

Molecular phylogeny of the Homoptera: a paraphyletic taxon, J Mol Evol, 1995, vol. 41 (pg. 211-223)

Molecular data support a rapid radiation of aphids in the Cretaceous and multiple origins of host alternation, Biol J Linn Soc, 2000, vol. 71 (pg. 689-717)

A test of morphological hypotheses for tribal and subtribal relationships of Aphidinae (Insecta: Hemiptera: Aphididae) using DNA sequences, Mol Phylogenet Evol, 2006, vol. 38 (pg. 316-329)

Inference of selection from multiple species alignments, Curr Opin Genet Dev, 2002, vol. 12 (pg. 688-694)

PAML 4: phylogenetic analysis by maximum likelihood, Mol Biol Evol, 2007, vol. 24 (pg. 1586-1591)

Molecular phylogeny of Pemphiginae (Hemiptera: Aphididae) inferred from nuclear gene EF-1α sequences, Bull Entomol Res, 2008, vol. 98 (pg. 499-507)

Author notes

Present address: Institute of Parasitology, Faculty of Science, University of South Bohemia, Ceske Budejovice, Czech Republic.

Associate editor: Jennifer Wernegreen

© The Author 2011. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com