Comparative Study

Common origin of four diverse families of large eukaryotic DNA viruses

L M Iyer et al. J Virol. 2001 Dec.

Abstract

Comparative analysis of the protein sequences encoded in the genomes of three families of large DNA viruses that replicate, completely or partly, in the cytoplasm of eukaryotic cells (poxviruses, asfarviruses, and iridoviruses) and phycodnaviruses that replicate in the nucleus reveals 9 genes that are shared by all of these viruses and 22 more genes that are present in at least three of the four compared viral families. Although orthologous proteins from different viral families typically show weak sequence similarity, because of which some of them have not been identified previously, at least five of the conserved genes appear to be synapomorphies (shared derived characters) that unite these four viral families, to the exclusion of all other known viruses and cellular life forms. Cladistic analysis with the genes shared by at least two viral families as evolutionary characters supports the monophyly of poxviruses, asfarviruses, iridoviruses, and phycodnaviruses. The results of genome comparison allow a tentative reconstruction of the ancestral viral genome and suggest that the common ancestor of all of these viral families was a nucleocytoplasmic virus with an icosahedral capsid, which encoded complex systems for DNA replication and transcription, a redox protein involved in disulfide bond formation in virion membrane proteins, and probably inhibitors of apoptosis. The conservation of the disulfide-oxidoreductase, a major capsid protein, and two virion membrane proteins indicates that the odd-shaped virions of poxviruses have evolved from the more common icosahedral virion seen in asfarviruses, iridoviruses, and phycodnaviruses.
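
The counts in the abstract (genes shared by all four families, and further genes present in at least three of them) amount to a tally over a gene presence/absence table. The sketch below illustrates that bookkeeping only; it is not the authors' pipeline. The function name `tally_shared` and the gene labels `gene_a`, `gene_b`, and `gene_c` are hypothetical placeholders, and the toy presence patterns are invented for illustration rather than taken from the paper.

```python
from typing import Dict, Set

# The four viral families compared in the abstract.
FAMILIES = frozenset(
    {"poxviruses", "asfarviruses", "iridoviruses", "phycodnaviruses"}
)


def tally_shared(presence: Dict[str, Set[str]], min_families: int) -> Set[str]:
    """Return the genes found in at least `min_families` of the four families."""
    return {
        gene
        for gene, families in presence.items()
        if len(families & FAMILIES) >= min_families
    }


# Hypothetical toy table: gene label -> families that encode an ortholog.
toy_presence = {
    "gene_a": {"poxviruses", "asfarviruses", "iridoviruses", "phycodnaviruses"},
    "gene_b": {"poxviruses", "asfarviruses", "iridoviruses"},
    "gene_c": {"iridoviruses", "phycodnaviruses"},
}

# Genes shared by all four families.
core = tally_shared(toy_presence, 4)
# Additional genes present in at least three (but not all four) families.
three_of_four = tally_shared(toy_presence, 3) - core

print(sorted(core), sorted(three_of_four))
```

Applied to the real ortholog table, the same two tallies would correspond to the 9 universally shared genes and the 22 additional genes reported in the abstract.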

Figures

FIG. 1

Multiple alignments of conserved proteins that define the cytoplasmic DNA virus clade. (A) D5R-like helicases. With the PBCV ATPase as the seed, the ESV ortholog and many phage primases were recovered with highly significant expectation (E) values in the first iteration. Proteins from the other NCLDV and the distantly related papillomavirus, parvovirus, and positive-strand RNA viruses were recovered in the second and third iterations with E-values of <10^-3. For example, ASFV C962R was recovered with an E-value of 10^-8 in the third iteration. Further transitive searches identified all of the members of superfamily III helicase. (B) A32L-like ATPases. With the PBCV ATPase as the seed, iridoviral orthologs were recovered in the first iteration with an E-value of <10^-5. Orthologs from all other NCLDV were recovered by the third iteration with significant E-values, such as 3 × 10^-19 for MCV and 2 × 10^-4 for ASFV orthologs. (C) A1L-like transcription factors. A profile made with previously detected FCS domains from the polyhomeotic and FIM families of proteins, when run against the NCLDV protein sets with an inclusion cutoff of 0.01, recovered all members of this family; VV A1L, for example, was recovered with an E-value of 10^-4. (D) D13L-like capsid proteins. With p50 of the Spodoptera exigua ascovirus as the seed, the PBCV and other iridoviral capsid proteins were recovered with E-values of <2 × 10^-8. The ASFV ortholog was detected in the third iteration with an E-value of 3 × 10^-3, and the poxviral D13L-like proteins were recovered at borderline E-values (0.14) in the fourth iteration. When a profile made from the alignment of the PBCV, iridovirus, and ASFV sequences was run against a database of all NCLDV proteins, the poxviral orthologs were detected as top hits, with E-values of <10^-5. The probability that the conserved motifs shown here occur in these proteins by chance was <10^-15, as computed by using the MACAW program (49).
(E) L1R/F9L-like virion membrane proteins. With CIV 048L as the seed, the ASFV and PBCV orthologs were recovered in the second iteration, with E-values of 8 × 10^-4 and 10^-3, respectively. The entomopoxviral orthologs were detected in the third iteration with an E-value of 2 × 10^-4. A transitive search with the entomopoxviral proteins recovered the other poxviral proteins with E-values of <10^-3. Each protein is denoted by the corresponding gene name followed by species abbreviation and the GenBank Identifier (GI) number. The numbers preceding and following the alignments indicate the positions of the first and last residues of the aligned regions in the corresponding protein sequences. The numbers between aligned blocks indicate the number of inserted residues that were omitted from the figure. The coloring reflects the conservation of amino acid residues at 85% consensus. The coloring scheme and the consensus abbreviations are as follows: hydrophobic residues (LIYFMWACV) are designated “h” in the consensus line, aliphatic (LIAV) residues are also shaded yellow and designated “l,” alcohol (S,T) is blue and designated “o,” charged (KERDH) residues are purple and designated “c,” polar (STEDRKHNQ) residues are purple and designated “p,” small (SACGDNPVT) residues are green and designated “s,” big (LIFMWYERKQ) residues are shaded gray and designated “b.” Conserved cysteines predicted to form a Zn-finger structure (C) or a disulfide bond (E) are indicated by white letters against a red background. Secondary structure elements predicted by using the PHD program are indicated in panels C and D, where “E” indicates extended conformation (β-strand) and “H” indicates the α-helix. The abbreviations for the NCLDV are defined in Materials and Methods.
Additional abbreviations: AAV, adeno-associated virus 5; AcNPV, Autographa californica nucleopolyhedrovirus; Bf, Bacteroides fragilis; Ce, Caenorhabditis elegans; Cglu, Corynebacterium glutamicum; Cpf, Clostridium perfringens; Dm, Drosophila melanogaster; DpAV4, Diadromus pulchellus ascovirus; Ec, Escherichia coli; HPV08, human papillomavirus type 8; Hs, Homo sapiens; LcbA2, Lactobacillus casei bacteriophage A2; Mace, Methanosarcina acetivorans; MStV, maize streak virus; phi-105, Bacteriophage phi-105; phiC31, Bacteriophage phiC31; Polio, human poliovirus 1; SacV, Spodoptera exigua ascovirus; Si, Sulfolobus islandicus; SV40, Simian virus 40; Xf, Xylella fastidiosa.
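
The 85% consensus line described in the caption can be illustrated with a short sketch. It uses the residue classes and one-letter class symbols given in the caption; everything else is an assumption made for illustration and is not the program the authors used: the function names `column_consensus` and `consensus_line` are hypothetical, a fully conserved residue is reported as itself, classes are tried from smallest to largest so that the most specific class wins, gaps count against the threshold, and "." marks columns with no consensus.

```python
from typing import Sequence

# Residue classes and symbols from the figure caption, ordered from the
# smallest class to the largest (this priority order is an assumption).
RESIDUE_CLASSES = [
    ("o", set("ST")),          # alcohol
    ("l", set("LIAV")),        # aliphatic
    ("c", set("KERDH")),       # charged
    ("h", set("LIYFMWACV")),   # hydrophobic
    ("s", set("SACGDNPVT")),   # small
    ("p", set("STEDRKHNQ")),   # polar
    ("b", set("LIFMWYERKQ")),  # big
]


def column_consensus(column: str, threshold: float = 0.85) -> str:
    """Return the consensus symbol for one alignment column."""
    column = column.upper()
    total = len(column)  # gaps ("-") stay in the denominator (assumption)
    if total == 0:
        return "."
    # A single residue that meets the threshold is reported as itself.
    for residue in sorted(set(column) - {"-"}):
        if column.count(residue) / total >= threshold:
            return residue
    # Otherwise report the first (most specific) class that meets it.
    for symbol, members in RESIDUE_CLASSES:
        if sum(r in members for r in column) / total >= threshold:
            return symbol
    return "."


def consensus_line(alignment: Sequence[str], threshold: float = 0.85) -> str:
    """Build the consensus line for equal-length aligned sequences."""
    return "".join(
        column_consensus("".join(col), threshold) for col in zip(*alignment)
    )


# Toy alignment (hypothetical sequences, not taken from the figure).
print(consensus_line(["LSK", "ITK", "VSK", "ASK"]))
```

In the toy alignment, the first column contains only aliphatic residues, the second only alcohol residues, and the third is an invariant lysine, so the three columns map to an aliphatic symbol, an alcohol symbol, and the residue itself.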

FIG. 2

Consensus cladogram of cytoplasmic DNA viruses. The cladistic analysis was performed as described in the text. The proteins that were probably present in the common ancestor of the universally supported NCLDV clade are superimposed on the consensus tree. Also shown on the consensus tree are the state changes in each of the terminal lineages and the strictly supported clades. The plus sign indicates a character that is most parsimoniously explained as an independent gain, most likely acquired through horizontal transfer between viral genomes or through transfer from the host genome. The minus sign denotes the loss of an ancestral character in a particular lineage.

References

    1. Afonso C L, Tulman E R, Lu Z, Oma E, Kutish G F, Rock D L. The genome of Melanoplus sanguinipes entomopoxvirus. J Virol. 1999;73:533–552.
    2. Afonso C L, Tulman E R, Lu Z, Zsak L, Kutish G F, Rock D L. The genome of fowlpox virus. J Virol. 2000;74:3815–3831.
    3. Alba M M, Das R, Orengo C A, Kellam P. Genomewide function conservation and phylogeny in the Herpesviridae. Genome Res. 2001;11:43–54.
    4. Altschul S F, Koonin E V. PSI-BLAST - a tool for making discoveries in sequence databases. Trends Biochem Sci. 1998;23:444–447.
    5. Altschul S F, Madden T L, Schaffer A A, Zhang J, Zhang Z, Miller W, Lipman D J. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402.
