Discovery of the principal specific transcription factors of Apicomplexa and their implication for the evolution of the AP2-integrase DNA binding domains

S Balaji et al. Nucleic Acids Res. 2005.

Abstract

The comparative genomics of apicomplexans, such as the malarial parasite Plasmodium, the cattle parasite Theileria and the emerging human parasite Cryptosporidium, have suggested an unexpected paucity of specific transcription factors (TFs) with DNA binding domains that are closely related to those found in the major families of TFs from other eukaryotes. This apparent lack of specific TFs is paradoxical, given that the apicomplexans show a complex developmental cycle in one or more hosts and a reproducible pattern of differential gene expression in the course of this cycle. Using sensitive sequence profile searches, we show that the apicomplexans possess a lineage-specific expansion of a novel family of proteins with a version of the AP2 (Apetala2)-integrase DNA binding domain, which is present in numerous plant TFs. About 20-27 members of this apicomplexan AP2 (ApiAP2) family are encoded in different apicomplexan genomes, with each protein containing one to four copies of the AP2 DNA binding domain. Using gene expression data from Plasmodium falciparum, we show that guilds of ApiAP2 genes are expressed in different stages of intraerythrocytic development. By analogy to the plant AP2 proteins and based on the expression patterns, we predict that the ApiAP2 proteins are likely to function as previously unknown specific TFs in the apicomplexans and regulate the progression of their developmental cycle. In addition to the ApiAP2 family, we also identified two other novel families of AP2 DNA binding domains in bacteria and transposons. Using structure similarity searches, we also identified divergent versions of the AP2-integrase DNA binding domain fold in the DNA binding region of the PI-SceI homing endonuclease and the C-terminal domain of the pleckstrin homology (PH) domain-like modules of eukaryotes. Integrating these findings, we present a reconstruction of the evolutionary scenario of the AP2-integrase DNA binding domain fold, which suggests that it underwent multiple independent combinations with different types of mobile endonucleases or recombinases. It appears that the eukaryotic versions have emerged from versions of the domain associated with mobile elements, followed by independent lineage-specific expansions, which accompanied their recruitment to transcription regulation functions.

Figures

Figure 1

Alignment of AP2 domains. Proteins are denoted by their gene names, species abbreviations and GenBank identifier (gi) numbers. The number of AP2 domains in a polypeptide is shown to the right of the alignment. Residues involved in contacting DNA in the solution structure of the AP2 domain (PDB ID: 1GCC) are shown below the alignment. The secondary structure was derived from the same solution structure. E represents a β strand; H, helix. The coloring reflects the conservation profile at 80% consensus. The coloring scheme and consensus abbreviations are as follows: h, hydrophobic (h: ACFILMVWY) and a, aromatic (a: FWY) residues shaded yellow; b, big (LIYERFQKMW) residues shaded gray; s, small (AGSVCDN) residues colored green; and p, polar (STEDKRNQHC) residues colored magenta. Species abbreviations are as follows: APMV: Acanthamoeba polyphaga mimivirus; Atha: A.thaliana; Atum: Agrobacterium tumefaciens; BP01: Bacteriophage Felix 01; BPCorn: Mycobacteriophage Corndog; BPHK022: Enterobacteria phage HK022; BPRB49: Enterobacteria phage RB49; BPST3: Streptococcus thermophilus bacteriophage ST3; BPT1: Enterobacteria phage T1; BPT5: Bacteriophage T5; BPT7: Enterobacteria phage T7; BPXp10: X.oryzae bacteriophage Xp10; BPphig1e: Bacteriophage phig1e; Caur: Chloroflexus aurantiacus; Chom: Cryptosporidium hominis; Cpar: C.parvum; Dpsy: D.psychrophila; Ecol: Escherichia coli; Efae: Enterococcus faecalis; Ghir: Gossypium hirsutum; Lesc: Lycopersicon esculentum; Lmon: Listeria monocytogenes; Lpla: Lactobacillus plantarum; Nsyl: Nicotiana sylvestris; Pfa: Plasmodium falciparum; Rbal: Rhodopirellula baltica; Spyo: Streptococcus pyogenes; Taes: Triticum aestivum; Theileria annulata; Tery: Trichodesmium erythraeum; Tfus: Thermobifida fusca; Tthe: Tetrahymena thermophila; Vvul: Vibrio vulnificus.
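
The 80% consensus line described in this caption can be illustrated with a minimal sketch. It uses the residue classes exactly as listed in the caption, but the toy alignment, the function name `consensus`, the handling of gaps and the order in which classes are tried (most specific first) are assumptions made for illustration; the caption does not state how the authors computed the consensus.

```python
# Residue classes as listed in the Figure 1 caption.
CLASSES = {
    "a": set("FWY"),         # aromatic
    "s": set("AGSVCDN"),     # small
    "h": set("ACFILMVWY"),   # hydrophobic
    "b": set("LIYERFQKMW"),  # big
    "p": set("STEDKRNQHC"),  # polar
}

# Assumption: smaller (more specific) classes are tried before larger ones.
PRIORITY = ["a", "s", "h", "b", "p"]


def consensus(alignment, threshold=0.8):
    """Return one consensus symbol per alignment column.

    A column gets an upper-case residue if that residue occurs in at least
    `threshold` of the sequences, otherwise the first class in PRIORITY that
    covers at least `threshold` of the sequences, otherwise '.'.
    Gaps ('-') count against every residue and every class.
    """
    n = len(alignment)
    symbols = []
    for column in zip(*alignment):
        label = "."
        for residue in set(column) - {"-"}:
            if column.count(residue) / n >= threshold:
                label = residue
                break
        else:
            # No single residue reached the threshold; fall back to classes.
            for name in PRIORITY:
                covered = sum(r in CLASSES[name] for r in column)
                if covered / n >= threshold:
                    label = name
                    break
        symbols.append(label)
    return "".join(symbols)


# Hypothetical toy alignment (not taken from the paper).
toy_alignment = ["AFKG", "AWRG", "AYKA", "AFES", "VFQ-"]
toy_consensus = consensus(toy_alignment)
```

With the toy alignment, the first column is dominated by a single residue, the second is uniformly aromatic, the third contains only big residues and the fourth is mostly small residues with one gap.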

Figure 2

Structures of different domains of the AP2-IDBD fold. Strands and helices of the AP2-IDBD fold are colored green and pink, respectively. PDB IDs for the displayed structures are as follows: 1GCC: GCC-box binding domain; 1BB8: Tn916 integrase DNA binding domain; 1KJK: lambda integrase N-terminal domain; 1QQG: insulin receptor substrate 1 (IRS-1); 1LWT: PI-SceI homing endonuclease DNA binding domain.

Figure 3

DNA interactions of the AP2 domain. The solution structure of the A.thaliana GCC-box binding domain in complex with DNA (PDB ID: 1GCC) is shown. Strands are colored green and the helix is colored pink. Complementary DNA strands are labeled I and II and colored orange and yellow, respectively. The side chains of DNA-contacting residues are displayed in ball-and-stick format. Residues that interact with DNA bases are colored pink and those that predominantly interact with the DNA backbone are colored blue. Red arrows indicate positions that are well conserved in the ApiAP2 family (see Figure 1 and Table 1 for the equivalent residues in the ApiAP2 proteins).

Figure 4

Domain architectures of AP2 domain proteins. Domains are represented by their standard notations. ATH represents the AT-hook. The protein naming scheme and species abbreviations are as in Figure 1.

Figure 5

Expression patterns of AP2 proteins. Stage-specific expression of the ApiAP2 TFs and their potential target genes during the intraerythrocytic developmental cycle (IDC). Microarray gene expression data were available for 46 timepoints as shown (26). Using K-means clustering, the predicted ApiAP2 TFs were grouped into five clusters. The first four clusters correspond to the four major developmental stages: (a) ring, (b) trophozoite, (c) early schizont and (d) schizont, whereas the fifth cluster (e) consists of genes that show expression at two discontinuous developmental stages. Gene names for the ApiAP2 domain-containing proteins are given at the sides, and an arrow next to the gene name indicates the presence of an ortholog in Cryptosporidium. Note that there is at least one TF from each stage that has an ortholog in Cryptosporidium. The graphs on the right represent the average expression profile of non-ApiAP2 genes that show a high correlation in their expression profile with the ApiAP2 genes. The stage-specific expression of such genes suggests that they could be potential targets of the predicted TFs.
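
The clustering and correlation steps described in this caption can be sketched as follows. This is an illustration only, not the authors' pipeline: the profiles are synthetic bumps rather than the microarray data, the helper names (`kmeans`, `pearson`, `bump`) are hypothetical, and the deterministic farthest-point initialisation is an assumption chosen so that the example is reproducible. Only the number of timepoints (46) and the number of clusters (five) come from the caption.

```python
import math


def sq_dist(u, v):
    """Squared Euclidean distance between two expression profiles."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(profiles, k, iterations=50):
    """Plain K-means; returns (labels, centers)."""
    # Assumption: farthest-point initialisation instead of random seeding.
    centers = [list(profiles[0])]
    while len(centers) < k:
        farthest = max(profiles,
                       key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(farthest))
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in profiles]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def pearson(u, v):
    """Pearson correlation coefficient of two equally long profiles."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def bump(peak, amplitude, timepoints=46, width=3.0):
    """Synthetic profile with a single expression peak (toy data)."""
    return [amplitude * math.exp(-((t - peak) ** 2) / (2 * width ** 2))
            for t in range(timepoints)]


# Toy stand-ins for TF profiles: two genes per hypothetical peak time.
PEAKS = [5, 14, 23, 32, 41]
tf_profiles = [bump(p, amp) for p in PEAKS for amp in (1.0, 1.2)]
labels, centers = kmeans(tf_profiles, k=5)

# A hypothetical non-TF gene is matched to the cluster whose average
# profile it correlates with most strongly.
candidate = bump(14, 0.8)
best_cluster = max(range(5), key=lambda j: pearson(candidate, centers[j]))
```

In this toy setting each pair of profiles sharing a peak time ends up in its own cluster, and the candidate gene is assigned to the cluster whose members peak at the same time. Real microarray profiles would typically need normalisation and a check of how sensitive the clusters are to the initialisation.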
