Abhiman Saraswathi | National Institutes of Health

Papers by Abhiman Saraswathi

Natural History of the Eukaryotic Chromatin Protein Methylation System

Progress in Molecular Biology and Translational Science, 2011

In eukaryotes, methylation of nucleosomal histones and other nuclear proteins is a central aspect of chromatin structure and dynamics. The past 15 years have seen an enormous advance in our understanding of the biochemistry of these modifications, and of their role in establishing the epigenetic code. We provide a synthetic overview, from an evolutionary perspective, of the main players in the eukaryotic chromatin protein methylation system, with an emphasis on catalytic domains. Several components of the eukaryotic protein methylation system had their origins in bacteria. In particular, the Rossmann fold protein methylases (PRMTs and DOT1), and the LSD1 and jumonji-related demethylases and oxidases, appear to have emerged in the context of bacterial peptide methylation and hydroxylation systems. These systems were originally involved in synthesis of peptide secondary metabolites, such as antibiotics, toxins, and siderophores. The peptidylarginine deiminases appear to have been acquired by animals from bacterial enzymes that modify cell-surface proteins. SET domain methylases, which display the β-clip fold, apparently first emerged in prokaryotes from the SAF superfamily of carbohydrate-binding domains. However, even in bacteria, a subset of the SET domains might have evolved a chromatin-related role in conjunction with a BAF60a/b-like SWIB domain protein and topoisomerases. By the time of the last eukaryotic common ancestor, multiple SET and PRMT methylases were already in place and are likely to have mediated methylation at the H3K4, H3K9, H3K36, and H4K20 positions, and carried out both asymmetric and symmetric arginine dimethylation. Inference of H3K27 methylation in the ancestral eukaryote appears uncertain, though it was certainly in place a little later in eukaryotic evolution.
Current data suggest that unlike SET methylases, which are universally present in eukaryotes, demethylases are not. They appear to be absent in the earliest-branching eukaryotic lineages, and emerged later along with several other chromatin proteins, such as the Dot1-methylase, prior to divergence of the kinetoplastid-heterolobosean lineage from the remaining eukaryotes. This period also corresponds to the point of origin of DNA cytosine methylation by DNMT1. Origin of major lineages of SET domains such as the Trithorax, Su(var)3-9, Ash1, SMYD, TTLL12, and E(Z) might have played the initial role in the establishment of multiple distinct heterochromatic and euchromatic states that are likely to have been present, in some form, through much of eukaryotic evolution. Elaboration of these chromatin states might have gone hand-in-hand with acquisition of multiple jumonji-related and LSD1-like demethylases, and functional linkages with the DNA methylation and RNAi systems. Throughout eukaryotic evolution, there were several lineage-specific expansions of SET domain proteins, which might be related to a special transcription regulation process in trypanosomes, acquisition of new meiotic recombination hotspots in animals, and methylation and associated modifications of the diatom silaffin proteins involved in silica biomineralization. The use of specific domains to "read" the methylation marks appears to have been present in the ancestral eukaryote itself. Of these, the chromo-like domains appear to have been acquired from bacterial secreted proteins that might have a role in binding cell-surface peptides or peptidoglycan. Domain architectures of the primary enzymes involved in the eukaryotic protein methylation system indicate key features relating to interactions with each other and other modifications in chromatin, such as acetylation.
They also emphasize the profound functional distinction between the role of demethylation and deacetylation in regulation of chromatin dynamics.

Natural History of Eukaryotic DNA Methylation Systems

Progress in Molecular Biology and Translational Science, 2011

Methylation of cytosines and adenines in DNA is a widespread epigenetic mark in both prokaryotes and eukaryotes. In eukaryotes, it has a profound influence on chromatin structure and dynamics. Recent advances in genomics and biochemistry have considerably elucidated the functions and provenance of these DNA modifications. DNA methylases appear to have emerged first in bacterial restriction-modification (R-M) systems from ancient RNA-modifying enzymes, in transitions that involved acquisition of novel catalytic residues and DNA-recognition features. DNA adenine methylases appear to have been acquired by ciliates, heterolobosean amoeboflagellates, and certain chlorophyte algae. Six distinct clades of cytosine methylases, including the DNMT1, DNMT2, and DNMT3 clades, were acquired by eukaryotes through independent lateral transfer of their precursors from bacteria or bacteriophages. In addition to these, multiple adenine and cytosine methylases were acquired by several families of eukaryotic transposons. In eukaryotes, the DNA-methylase module was often combined with distinct modified and unmodified peptide recognition domains and other modules mediating specialized interactions, for example, the RFD module of DNMT1 which contains a permuted Sm domain linked to a helix-turn-helix domain. In eukaryotes, the evolution of DNA methylases appears to have proceeded in parallel to the elaboration of histone-modifying enzymes and the RNAi system, with functions related to counter-viral and counter-transposon defense, and regulation of DNA repair and differential gene expression being their primary ancestral functions. Diverse DNA demethylation systems that utilize base-excision repair via DNA glycosylases and cytosine deaminases appear to have emerged in multiple eukaryotic lineages.
Comparative genomics suggests that the link between cytosine methylation and DNA glycosylases probably emerged first in a novel R-M system in bacteria. Recent studies suggest that 5-methylcytosine (5mC) is not a terminal DNA modification, with enzymes of the Tet/JBP family of 2-oxoglutarate- and iron-dependent dioxygenases further hydroxylating it to form 5-hydroxymethylcytosine (5hmC). These enzymes emerged first in bacteriophages and appear to have been transferred to eukaryotes on one or more occasions. Eukaryotes appear to have recruited three major types of DNA-binding domains (SRA/SAD, TAM/MBD, and CXXC) in discriminating DNA with methylated or unmethylated cytosines. Analysis of the domain architectures of these domains and the DNA methylases suggests that early in eukaryotic evolution they developed a close functional link with SET-domain methylases and Jumonji-related demethylases that operate on peptides in chromatin proteins. In several eukaryotes, other functional connections were elaborated in the form of various combinations between domains related to DNA methylation and those involved in ATP-dependent chromatin remodeling and RNAi. In certain eukaryotes, such as mammals and angiosperms, novel dependencies on the DNA methylation system emerged, which resulted in it affecting unexpected aspects of the biology of these organisms such as parent-offspring interactions. In genomic terms, this was reflected in the emergence of new proteins related to methylation, such as Stella. The well-developed methylation systems of certain heteroloboseans, stramenopiles, chlorophytes, and haptophytes indicate that these might be new model systems to explore the relevance of DNA modifications in eukaryotes.

Evolution of Eukaryotic Chromatin Proteins and Transcription Factors

Protein Families, 2013

Comparative genomics of eukaryotes has profoundly impacted our understanding of the regulatory systems involved in transcription and chromatin dynamics. The absolute numbers of specific transcription factors (TFs) and chromatin proteins (CPs) are positively correlated with proteome size in eukaryotes. Comparative analysis of known and predicted CPs allows reconstruction of the early evolutionary history of histone and DNA modification, nucleosome assembly, and chromatin-remodeling systems. Eukaryotic DNA methylases in particular appear to have emerged via multiple independent transfers from bacteria. Even though key histone modifications are universal to eukaryotes, domain architectures of proteins binding posttranslationally modified histones vary considerably across eukaryotes. This indicates that any epigenetic information stored in them might be “interpreted” differently in different lineages. The complexity of domain architectures of CPs appears to have increased in several lineages in the course of eukaryotic evolution and may have had a role in the origin of multicellularity and cell differentiation.

Large-scale prediction of function shift in protein families with a focus on enzymatic function

Proteins: Structure, Function, and Bioinformatics, 2005

Protein function shift can be predicted from sequence comparisons, either using positive selection signals or evolutionary rate estimation. None of the methods have been validated on large datasets, however. Here we investigate existing and novel methods for protein function shift prediction, and benchmark the accuracy against a large dataset of proteins with known enzymatic functions. Function change was predicted between subfamilies by identifying two kinds of sites in a multiple sequence alignment: Conservation-Shifting Sites (CSS), which are conserved in two subfamilies using two different amino acid types, and Rate-Shifting Sites (RSS), which have different evolutionary rates in two subfamilies. CSS were predicted by a new entropy-based method, and RSS using the Rate-Shift program. In principle, the more CSS and RSS between two subfamilies, the more likely a function shift between them. A test dataset was built by extracting subfamilies from Pfam with different EC numbers that belong to the same domain family. Subfamilies were generated automatically using a phylogenetic tree-based program, BETE. The dataset comprised 997 subfamily pairs with four or more members per subfamily. We observed a significant increase in CSS and RSS for subfamily comparisons with different EC numbers compared to cases with same EC numbers. The discrimination was better using RSS than CSS, and was more pronounced for larger families. Combining RSS and CSS by discriminant analysis improved classification accuracy to 71%. The method was applied to the Pfam database and the results are available at http://FunShift.cgb.ki.se. A closer examination of some superfamily comparisons showed that single EC numbers sometimes embody distinct functional classes. Hence, the measured accuracy of function shift is underestimated.
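To make the definition of a Conservation-Shifting Site concrete, here is a minimal Python sketch. It is not the entropy-based method from the paper, whose details the abstract does not give. It assumes that "conserved" means a per-column Shannon entropy at or below a cutoff, and that the two subfamilies "use different amino acid types" when their most frequent residues differ. The function names, the `max_entropy` cutoff of 0.5 bits, and the toy alignment are all hypothetical.

```python
import math
from collections import Counter


def column_entropy(residues):
    """Shannon entropy (in bits) of the residues in one alignment column."""
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conservation_shifting_sites(sub_a, sub_b, max_entropy=0.5):
    """Return 0-based columns that are conserved within each subfamily
    but dominated by a different residue in each.

    sub_a, sub_b: lists of equal-length aligned sequences (one per member).
    max_entropy: assumed conservation cutoff in bits (hypothetical value).
    """
    sites = []
    for col in range(len(sub_a[0])):
        col_a = [seq[col] for seq in sub_a]
        col_b = [seq[col] for seq in sub_b]
        # Skip columns containing gaps in either subfamily.
        if "-" in col_a or "-" in col_b:
            continue
        # Both subfamilies must be conserved at this column.
        if column_entropy(col_a) > max_entropy or column_entropy(col_b) > max_entropy:
            continue
        top_a = Counter(col_a).most_common(1)[0][0]
        top_b = Counter(col_b).most_common(1)[0][0]
        if top_a != top_b:
            sites.append(col)
    return sites


# Toy alignment (invented for illustration only).
subfamily_a = ["MKDLH", "MKDLH", "MKDIH", "MKDLH"]
subfamily_b = ["MRELH", "MRELH", "MREVH", "MRELH"]
css = conservation_shifting_sites(subfamily_a, subfamily_b)
print(css)
```

In this toy case columns 1 and 2 are reported: each is invariant within a subfamily but differs between them (K versus R, D versus E). Column 3 is variable within each subfamily and is excluded by the entropy cutoff.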

FunShift: a database of function shift analysis on protein subfamilies

Nucleic Acids Research, 2004

Members of a protein family normally have a general biochemical function in common, but frequently one or more subgroups have evolved a slightly different function, such as different substrate specificity. It is important to detect such function shifts for a more accurate functional annotation. The FunShift database described here is a compilation of function shift analysis performed between subfamilies in protein families. It consists of two main components: (i) subfamilies derived from protein domain families and (ii) pairwise subfamily comparisons analyzed for function shift. The present release, FunShift 12, was derived from Pfam 12 and consists of 151 934 subfamilies derived from 7300 families. We carried out function shift analysis by two complementary methods on families with up to 500 members. From a total of 179 210 subfamily pairs, 62 384 were predicted to be functionally shifted in 2881 families. Each subfamily pair is provided with a markup of probable functional specificity-determining sites. Tools for searching and exploring the data are provided to make this database a valuable resource for protein function annotation. Knowledge of these functionally important sites will be useful for experimental biologists performing functional mutation studies. FunShift is available at http://FunShift.cgb.ki.se.

SUPFAM--a database of potential protein superfamily relationships derived by comparing sequence-based and structure-based families: implications for structural genomics and function annotation in genomes

Nucleic Acids Research, 2002

Members of a superfamily of proteins could result from divergent evolution of homologues with insignificant similarity in the amino acid sequences. A superfamily relationship is detected commonly after the three-dimensional structures of the proteins are determined using X-ray analysis or NMR. The SUPFAM database described here relates two homologous protein families in a multiple sequence alignment database of either known or unknown structure. The present release (1.1), which is the first version of the SUPFAM database, has been derived by analysing Pfam, which is one of the commonly used databases of multiple sequence alignments of homologous proteins. The first step in establishing SUPFAM is to relate Pfam families with the families in PALI, which is an alignment database of homologous proteins of known structure that is derived largely from SCOP. The second step involves relating Pfam families which could not be associated reliably with a protein superfamily of known structure. The profile matching procedure, IMPALA, has been used in these steps. The first step resulted in identification of 1280 Pfam families (out of 2697, i.e. 47%) which are related, either by close homologous connection to a SCOP family or by distant relationship to a SCOP family, potentially forming new superfamily connections. Using the profiles of 1417 Pfam families with apparently no structural information, an all-against-all comparison involving a sequence-profile match using IMPALA resulted in clustering of 67 homologous protein families of Pfam into 28 potential new superfamilies. Expansion of groups of related proteins of yet unknown structural information, as proposed in SUPFAM, should help in identifying 'priority proteins' for structure determination in structural genomics initiatives to expand the coverage of structural information in the protein sequence space.
For example, we could assign 858 distinct Pfam domains in 2203 of the gene products in the genome of Mycobacterium tuberculosis. Fifty-one of these Pfam families of unknown structure could be clustered into 17 potentially new superfamilies forming good targets for structural genomics. The SUPFAM database can be accessed at http://pauling.mbu.iisc.ernet.in/~supfam.

Origin and evolution of peptide-modifying dioxygenases and identification of the wybutosine hydroxylase/hydroperoxidase

Nucleic Acids Research, 2010

Unlike classical 2-oxoglutarate and iron-dependent dioxygenases, which include several nucleic acid modifiers, the structurally similar jumonji-related dioxygenase superfamily was only known to catalyze peptide modifications. Using comparative genomics methods, we predict that a family of jumonji-related enzymes catalyzes wybutosine hydroxylation/peroxidation at position 37 of eukaryotic tRNAPhe. Identification of this enzyme raised questions regarding the emergence of protein- and nucleic acid-modifying activities among jumonji-related domains. We addressed these with a natural classification of DSBH domains and reconstructed the precursor of the dioxygenases as a sugar-binding domain. This precursor gave rise to sugar epimerases and metal-binding sugar isomerases. The sugar isomerase active site was exapted for catalysis of oxygenation, with a radiation of these enzymes in bacteria, probably due to impetus from the primary oxygenation event in Earth's history. 2-Oxoglutarate-dependent versions appear to have further expanded with rise of the tricarboxylic acid cycle. We identify previously under-appreciated aspects of their active site and multiple independent innovations of 2-oxoacid-binding basic residues among these superfamilies. We show that double-stranded β-helix dioxygenases diversified extensively in biosynthesis and modification of halogenated siderophores, antibiotics, peptide secondary metabolites and glycine-rich collagen-like proteins in bacteria. Jumonji-related domains diversified into three distinct lineages in bacterial secondary metabolism systems and these were precursors of the three major clades of eukaryotic enzymes. The specificity of wybutosine hydroxylase/peroxidase probably relates to the structural similarity of the modified moiety to the ancestral amino acid substrate of this superfamily.

PRODOC: a resource for the comparison of tethered protein domain architectures with in-built information on remotely related domain families

Nucleic Acids Research, 2005

PROtein Domain Organization and Comparison (PRODOC) comprises several programs that enable convenient comparison of proteins as a sequence of domains. The in-built dataset currently consists of ~698 000 proteins from 192 organisms with complete genomic data, and all the SWISSPROT proteins obtained from the Pfam database. All the entries in PRODOC are represented as a sequence of functional domains, assigned using hidden Markov models, instead of as a sequence of amino acids. On average 69% of the proteins in the proteomes and 49% of the residues are covered by functional domain assignments. Software tools allow the user to query the dataset with a sequence of domains and identify proteins with the same or a jumbled or circularly permuted arrangement of domains. As it is proposed that proteins with jumbled or the same domain sequences have similar functions, this search tool is useful in assigning the overall function of a multi-domain protein. Unique features of PRODOC include the generation of alignments between multi-domain proteins on the basis of the sequence of domains and in-built information on distantly related domain families forming superfamilies. It is also possible using PRODOC to identify domain sharing and gene fusion events across organisms. An exhaustive genome-genome comparison tool in PRODOC also enables the detection of successive domain sharing and domain fusion events across two organisms. The tool permits the identification of gene clusters involved in similar biological processes in two closely related organisms. The URL for PRODOC is http://hodgkin.mbu.iisc.ernet.in/~prodoc.
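The three kinds of domain-order match described above (same, circularly permuted, jumbled) can be illustrated with a short Python sketch. This is not PRODOC's code or interface. It assumes a protein is represented as a list of domain names in N- to C-terminal order, treats a circular permutation as a rotation of that list, and calls any other reordering of the same domains "jumbled". The function name `compare_architectures` and the example domain names are hypothetical.

```python
from collections import Counter


def compare_architectures(query, target):
    """Classify how two domain architectures relate.

    query, target: lists of domain names in N- to C-terminal order.
    Returns "same", "circularly permuted", "jumbled", or "different".
    """
    if query == target:
        return "same"
    # Different domain content (or copy number) means no order-based match.
    if Counter(query) != Counter(target):
        return "different"
    # A circular permutation is modeled here as a rotation of the query.
    n = len(query)
    if any(query[i:] + query[:i] == target for i in range(1, n)):
        return "circularly permuted"
    # Same domains in some other order.
    return "jumbled"


# Illustrative architectures (invented for this sketch).
query = ["SH3", "SH2", "Kinase"]
print(compare_architectures(query, ["SH3", "SH2", "Kinase"]))
print(compare_architectures(query, ["SH2", "Kinase", "SH3"]))
print(compare_architectures(query, ["SH2", "SH3", "Kinase"]))
print(compare_architectures(query, ["SH3", "SH2"]))
```

Comparing domain multisets first keeps the rotation check cheap and separates proteins that merely share some domains from those with the full complement in a different order.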

Amidoligases with ATP-grasp, glutamine synthetase-like and acetyltransferase-like domains: synthesis of novel metabolites and peptide modifications of proteins

Molecular BioSystems, 2009

Recent studies have shown that the ubiquitin system had its origins in ancient cofactor/amino acid biosynthesis pathways. Preliminary studies also indicated that conjugation systems for other peptide tags on proteins, such as pupylation, have evolutionary links to cofactor/amino acid biosynthesis pathways. Following up on these observations, we systematically investigated the non-ribosomal amidoligases of the ATP-grasp, glutamine synthetase-like and acetyltransferase folds by classifying the known members and identifying novel versions. We then established their contextual connections using information from domain architectures and conserved gene neighborhoods. This showed remarkable, previously uncharacterized functional links between diverse peptide ligases, several peptidases of unrelated folds and enzymes involved in synthesis of modified amino acids. Using the network of contextual connections we were able to predict numerous novel pathways for peptide synthesis and modification, amine-utilization, secondary metabolite synthesis and potential peptide-tagging systems. One potential peptide-tagging system, which is widely distributed in bacteria, involves an ATP-grasp domain and a glutamine synthetase-like ligase, both of which are circularly permuted, an NTN-hydrolase fold peptidase and a novel alpha helical domain. Our analysis also elucidates key steps in the biosynthesis of antibiotics such as friulimicin, butirosin and bacilysin and cell surface structures such as capsular polymers and teichuronopeptides. We also report the discovery of several novel ribosomally synthesized bacterial peptide metabolites that are cyclized via amide and lactone linkages formed by ATP-grasp enzymes.
We present an evolutionary scenario for the multiple convergent origins of peptide ligases in various folds and clarify the bacterial origin of eukaryotic peptide-tagging enzymes of the TTL family.

Prediction of Function Divergence in Protein Families Using the Substitution Rate Variation Parameter Alpha

Molecular Biology and Evolution, 2006

Protein families typically embody a range of related functions and may thus be decomposed into subfamilies with, for example, distinct substrate specificities. Detection of functionally divergent subfamilies is possible by methods for recognizing branches of adaptive evolution in a gene tree. As the number of genome sequences is growing rapidly, it is highly desirable to automatically detect subfamily function divergence. To this end, we here introduce a method for large-scale prediction of function divergence within protein families. It is called the alpha shift measure (ASM) as it is based on detecting a shift in the shape parameter (alpha [α]) of the substitution rate gamma distribution. Four different methods for estimating α were investigated. We benchmarked the accuracy of ASM using function annotation from Enzyme Commission numbers within Pfam protein families divided into subfamilies by the automatic tree-based method BETE. In a test using 563 subfamily pairs in 162 families, ASM outperformed functional site-based methods using rate or conservation shifting (rate shift measure [RSM] and conservation shift measure [CSM]). The best results were obtained using the "GZ-Gamma" method for estimating α. By combining ASM with RSM and CSM using linear discriminant analysis, the prediction accuracy was further improved.

Characterization of a Trypanosoma cruzi acetyltransferase: cellular location, activity and structure

Molecular and Biochemical Parasitology, 2007

Trypanosoma cruzi and Trypanosoma brucei are flagellated protozoan parasites that cause Chagas disease and African trypanosomiasis in Latin American and African countries, respectively. Currently, over 8 million people are infected with T. cruzi and about 25 million more are at risk. About half a million people are affected by T. brucei. Trypanosome species share many peculiar biological and biochemical features, such as RNA editing. In contrast, they exhibit profound differences at the level of host-parasite interaction and disease pathology. The two parasites are transmitted to their hosts by different insect vectors. There are no available vaccines, and the current treatments have severe adverse effects. We were involved in sequencing the T. cruzi genome, an initiative launched by the WHO to increase our knowledge of the molecular basis of the parasite. The aim of this thesis was to participate in the sequencing and analysis of the T. cruzi genome, and use the data to investigate acetyltransferase enzymes, presumably linked to important metabolic pathways, as possible drug targets. In Papers I and II, we describe genome sequencing and analysis of two distinct T. cruzi strains. One of the selected strains, CL Brener, was found to be a genetic hybrid of two divergent strains; and it contains about 22 000 genes, encoded on 700 scaffolds with a total genome size of 110 Mb. About 50% of the genes are of unknown function, and lack homology to other sequenced eukaryotes. Large numbers of members of surface molecule gene families, such as trans-sialidase, mucin, mucin-associated protein, and GP63 were found. Comparative analyses revealed that TcI had a smaller genome by up to about 11 Mb. The genome size difference was linked to genes encoding surface molecules and to other repeats and repeated genes.
Additionally, six reading frames present in TcVI were not detected in TcI. Genetic polymorphisms such as indels, microsatellites and SNPs were identified and analyzed. Many genes were found to be under different selective pressures in T. cruzi, indicating differential evolutionary rates, signifying their importance to parasite biology. Within syntenic regions, the two genomes have the same gene complement. Identified features warrant sequencing of further T. cruzi strains, and findings from our studies offer opportunities for more targeted functional studies as well as tools for epidemiology. In the second part of this thesis, Papers III to V, a Trypanosoma cruzi acetyltransferase gene family, identified in the genome project, was chosen for functional characterization as a first step to evaluate its potential as drug target. Acetyltransferases are responsible for protein acetylation, where an acetyl group is transferred from acetyl-Coenzyme A to lysine residues in a protein sequence (N-epsilon acetylation) or to the N-termini of proteins or peptides (N-alpha acetylation). N-alpha acetylation is linked to many metabolic pathways, influences protein stability, protein-protein interaction and localization to organelles, and can act as a degradation signal. The impact of this posttranslational modification in the parasite is not known. We have identified T. cruzi NatC and NatA, and show that they are expressed in the three life cycle stages (epimastigote, trypomastigote, and amastigote). The catalytic and auxiliary subunits form a complex in vivo. Additionally, they partially co-sediment with the ribosome and may have both co-translational and post-translational protein acetylation functions. In epimastigotes, the catalytic subunit of T. cruzi NatA was localized both in the nuclear periphery and in cytoplasm, whereas NatC was predominantly assigned to the cytoplasm.
The auxiliary subunit of NatA was mainly confined to the cytoplasm with cytoskeletal-like labelling, whereas NatC showed a punctate profile. Interestingly, the staining patterns of the different subunits analysed for NatA and NatC differ between the life cycle stages, which suggests differential regulation and expression. The native substrates for NatC and predicted NatA are similar to those described in yeast and humans, suggesting evolutionarily conserved functions. The proteins appear to acetylate a large number of proteins N-terminally, suggesting that manipulation of the enzymes may simultaneously affect many cellular functions and thereby could interfere with or abolish infection. Additionally, our data indicate that NatC and NatA may have both N-alpha and N-epsilon acetylation potential. Collectively, the genome analyses presented here have provided more molecular insights into the parasite's biology, and have narrowed the gaps between scientific communities working on parasite research. The identification of Nats and native substrates has hopefully laid a solid foundation for future study of Nats, which could provide chemotherapeutic targets for parasitic diseases. List of publications: this thesis is based on the following publications, referred to in the text by their Roman numerals: I. The genome sequence of Trypanosoma cruzi, etiologic agent of Chagas disease.

Comparative genomics uncovers novel structural and functional features of the heterotrimeric GTPase signaling system

Gene, 2011

A new family of polymerases related to superfamily A DNA polymerases and T7-like DNA-dependent RNA polymerases

Biology Direct, 2008

Using sequence profile methods and structural comparisons we characterize a previously unknown family of nucleic acid polymerases in a group of mobile elements from genomes of diverse bacteria, an algal plastid and certain DNA viruses, including the recently reported Sputnik virus. Using contextual information from domain architectures and gene-neighborhoods we present evidence that they are likely to possess both primase and DNA polymerase activity, comparable to the previously reported prim-pol proteins. These newly identified polymerases help in defining the minimal functional core of superfamily A DNA polymerases and related RNA polymerases. Thus, they provide a framework to understand the emergence of both DNA and RNA polymerization activity in this class of enzymes. They also provide evidence that enigmatic DNA viruses, such as Sputnik, might have emerged from mobile elements coding these polymerases. Reviewers: This article was reviewed by Eugene Koonin and Mark Ragan.

MutL homologs in restriction-modification systems and the origin of eukaryotic MORC ATPases

Biology Direct, 2008

The provenance and biochemical roles of eukaryotic MORC proteins have remained poorly understood since the discovery of their prototype MORC1, which is required for meiotic nuclear division in animals. The MORC family contains a combination of a gyrase, histidine kinase, and MutL (GHKL) and S5 domains that together constitute a catalytically active ATPase module. We identify the prokaryotic MORCs and establish that the MORC family belongs to a larger radiation of several families of GHKL proteins (paraMORCs) in prokaryotes. Using contextual information from conserved gene neighborhoods we show that these proteins primarily function in restriction-modification systems, in conjunction with diverse superfamily II DNA helicases and endonucleases. The common ancestor of these GHKL proteins, MutL and topoisomerase ATPase modules appears to have catalyzed structural reorganization of protein complexes and concomitant DNA-superstructure manipulations along with fused or standalone nuclease ...

Large-scale evaluation of function shift in protein families

Research paper thumbnail of BEN: a novel domain in chromatin factors and DNA viral proteins

Bioinformatics, 2008

We report a previously uncharacterized α-helical module, the BEN domain, in diverse animal protei... more We report a previously uncharacterized α-helical module, the BEN domain, in diverse animal proteins such as BANP/SMAR1, NAC1 and the Drosophila mod(mdg4) isoform C, in the chordopoxvirus virosomal protein E5R and in several proteins of polydnaviruses. Contextual analysis suggests that the BEN domain mediates protein–DNA and protein–protein interactions during chromatin organization and transcription. The presence of BEN domains in a poxviral early virosomal protein and in polydnaviral proteins also suggests a possible role for them in organization of viral DNA during replication or transcription. Contact: aravind@ncbi.nlm.nih.gov Supplementary information: Supplementary data for this study can also be accessed at http://www.ncbi.nlm.nih.gov/CBBresearch/Lakshmin/BEN/

Characterization of a Trypanosoma cruzi acetyltransferase: cellular location, activity and structure

Molecular and Biochemical Parasitology, 2007

Trypanosomatids are widespread parasites that cause three major tropical diseases. In trypanosomatids, as in most other organisms, acetylation is a common protein modification that is important in multiple, diverse processes. This paper describes a new member of the Trypanosoma cruzi acetyltransferase family. The gene is single copy and orthologs are also present in the other two sequenced trypanosomatids, Trypanosoma brucei and Leishmania major. This protein (TcAT-1) has the essential motifs present in members of the GCN5-related acetyltransferase (GNAT) family, as well as an additional motif also found in some enzymes from plant and animal species. The protein is evolutionarily more closely related to this group of enzymes than to histone acetyltransferases. The native protein has a cytosolic cellular location and is present in all three life-cycle stages of the parasite. The recombinant protein was shown to have autoacetylation enzymatic activity.

Natural History of Eukaryotic DNA Methylation Systems

Progress in Molecular Biology and Translational Science, 2011

Methylation of cytosines and adenines in DNA is a widespread epigenetic mark in both prokaryotes and eukaryotes. In eukaryotes, it has a profound influence on chromatin structure and dynamics. Recent advances in genomics and biochemistry have considerably elucidated the functions and provenance of these DNA modifications. DNA methylases appear to have emerged first in bacterial restriction-modification (R-M) systems from ancient RNA-modifying enzymes, in transitions that involved acquisition of novel catalytic residues and DNA-recognition features. DNA adenine methylases appear to have been acquired by ciliates, heterolobosean amoeboflagellates, and certain chlorophyte algae. Six distinct clades of cytosine methylases, including the DNMT1, DNMT2, and DNMT3 clades, were acquired by eukaryotes through independent lateral transfer of their precursors from bacteria or bacteriophages. In addition to these, multiple adenine and cytosine methylases were acquired by several families of eukaryotic transposons. In eukaryotes, the DNA-methylase module was often combined with distinct modified and unmodified peptide recognition domains and other modules mediating specialized interactions, for example, the RFD module of DNMT1 which contains a permuted Sm domain linked to a helix-turn-helix domain. In eukaryotes, the evolution of DNA methylases appears to have proceeded in parallel to the elaboration of histone-modifying enzymes and the RNAi system, with functions related to counter-viral and counter-transposon defense, and regulation of DNA repair and differential gene expression being their primary ancestral functions. Diverse DNA demethylation systems that utilize base-excision repair via DNA glycosylases and cytosine deaminases appear to have emerged in multiple eukaryotic lineages.
Comparative genomics suggests that the link between cytosine methylation and DNA glycosylases probably emerged first in a novel R-M system in bacteria. Recent studies suggest that 5mC is not a terminal DNA modification, with enzymes of the Tet/JBP family of 2-oxoglutarate- and iron-dependent dioxygenases further hydroxylating it to form 5-hydroxymethylcytosine (5hmC). These enzymes emerged first in bacteriophages and appear to have been transferred to eukaryotes on one or more occasions. Eukaryotes appear to have recruited three major types of DNA-binding domains (SRA/SAD, TAM/MBD, and CXXC) in discriminating DNA with methylated or unmethylated cytosines. Analysis of the domain architectures of these domains and the DNA methylases suggests that early in eukaryotic evolution they developed a close functional link with SET-domain methylases and Jumonji-related demethylases that operate on peptides in chromatin proteins. In several eukaryotes, other functional connections were elaborated in the form of various combinations between domains related to DNA methylation and those involved in ATP-dependent chromatin remodeling and RNAi. In certain eukaryotes, such as mammals and angiosperms, novel dependencies on the DNA methylation system emerged, which resulted in it affecting unexpected aspects of the biology of these organisms such as parent-offspring interactions. In genomic terms, this was reflected in the emergence of new proteins related to methylation, such as Stella. The well-developed methylation systems of certain heteroloboseans, stramenopiles, chlorophytes, and haptophytes indicate that these might be new model systems to explore the relevance of DNA modifications in eukaryotes.

Evolution of Eukaryotic Chromatin Proteins and Transcription Factors

Protein Families, 2013

Comparative genomics of eukaryotes has profoundly impacted our understanding of the regulatory systems involved in transcription and chromatin dynamics. The absolute numbers of specific transcription factors (TFs) and chromatin proteins (CPs) are positively correlated with proteome size in eukaryotes. Comparative analysis of known and predicted CPs allows reconstruction of the early evolutionary history of histone and DNA modification, nucleosome assembly, and chromatin-remodeling systems. Eukaryotic DNA methylases in particular appear to have emerged via multiple independent transfers from bacteria. Even though key histone modifications are universal to eukaryotes, domain architectures of proteins binding posttranslationally modified histones vary considerably across eukaryotes. This indicates that any epigenetic information stored in them might be “interpreted” differently in different lineages. The complexity of domain architectures of CPs appears to have increased in several lineages in the course of eukaryotic evolution and may have had a role in the origin of multicellularity and cell differentiation.

Large-scale prediction of function shift in protein families with a focus on enzymatic function

Proteins: Structure, Function, and Bioinformatics, 2005

Protein function shift can be predicted from sequence comparisons, either using positive selection signals or evolutionary rate estimation. None of the methods have been validated on large datasets, however. Here we investigate existing and novel methods for protein function shift prediction, and benchmark the accuracy against a large dataset of proteins with known enzymatic functions. Function change was predicted between subfamilies by identifying two kinds of sites in a multiple sequence alignment: Conservation-Shifting Sites (CSS), which are conserved in two subfamilies using two different amino acid types, and Rate-Shifting Sites (RSS), which have different evolutionary rates in two subfamilies. CSS were predicted by a new entropy-based method, and RSS using the Rate-Shift program. In principle, the more CSS and RSS between two subfamilies, the more likely a function shift between them. A test dataset was built by extracting subfamilies from Pfam with different EC numbers that belong to the same domain family. Subfamilies were generated automatically using a phylogenetic tree-based program, BETE. The dataset comprised 997 subfamily pairs with four or more members per subfamily. We observed a significant increase in CSS and RSS for subfamily comparisons with different EC numbers compared to cases with same EC numbers. The discrimination was better using RSS than CSS, and was more pronounced for larger families. Combining RSS and CSS by discriminant analysis improved classification accuracy to 71%. The method was applied to the Pfam database and the results are available at http://FunShift.cgb.ki.se. A closer examination of some superfamily comparisons showed that single EC numbers sometimes embody distinct functional classes. Hence, the measured accuracy of function shift is underestimated.
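As a rough illustration of the Conservation-Shifting Site idea in the abstract above, the following minimal sketch flags alignment columns that are well conserved within each of two subfamilies but dominated by different residues. It is not the published entropy-based method, whose details the abstract does not give: the Shannon-entropy cutoff `max_entropy`, the function names and the toy sequences are all assumptions made for illustration.

```python
import math
from collections import Counter


def column_entropy(residues):
    """Shannon entropy (in bits) of the residues observed in one alignment column."""
    counts = Counter(residues)
    total = len(residues)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conservation_shifting_sites(sub_a, sub_b, max_entropy=0.5):
    """Return 0-based columns conserved in both subfamilies with different residues.

    sub_a and sub_b are lists of equal-length aligned sequences (hypothetical input
    format). max_entropy is an illustrative cutoff, not a published parameter.
    """
    sites = []
    for pos in range(len(sub_a[0])):
        col_a = [seq[pos] for seq in sub_a]
        col_b = [seq[pos] for seq in sub_b]
        conserved = (column_entropy(col_a) <= max_entropy
                     and column_entropy(col_b) <= max_entropy)
        if conserved:
            top_a = Counter(col_a).most_common(1)[0][0]
            top_b = Counter(col_b).most_common(1)[0][0]
            if top_a != top_b:
                sites.append(pos)
    return sites


# Toy subfamilies: only column 1 is conserved in both with a different residue (C vs G).
subfamily_a = ["ACDK", "ACDR", "ACDK", "ACEK"]
subfamily_b = ["AGDK", "AGDL", "AGDK", "AGDK"]
print(conservation_shifting_sites(subfamily_a, subfamily_b))
```

A real analysis would also need to handle gaps and sequence weighting, which this sketch ignores.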

FunShift: a database of function shift analysis on protein subfamilies

Nucleic Acids Research, 2004

Members of a protein family normally have a general biochemical function in common, but frequently one or more subgroups have evolved a slightly different function, such as different substrate specificity. It is important to detect such function shifts for a more accurate functional annotation. The FunShift database described here is a compilation of function shift analysis performed between subfamilies in protein families. It consists of two main components: (i) subfamilies derived from protein domain families and (ii) pairwise subfamily comparisons analyzed for function shift. The present release, FunShift 12, was derived from Pfam 12 and consists of 151 934 subfamilies derived from 7300 families. We carried out function shift analysis by two complementary methods on families with up to 500 members. From a total of 179 210 subfamily pairs, 62 384 were predicted to be functionally shifted in 2881 families. Each subfamily pair is provided with a markup of probable functional specificity-determining sites. Tools for searching and exploring the data are provided to make this database a valuable resource for protein function annotation. Knowledge of these functionally important sites will be useful for experimental biologists performing functional mutation studies. FunShift is available at http://FunShift.cgb.ki.se.

SUPFAM--a database of potential protein superfamily relationships derived by comparing sequence-based and structure-based families: implications for structural genomics and function annotation in genomes

Nucleic Acids Research, 2002

Members of a superfamily of proteins could result from divergent evolution of homologues with insignificant similarity in the amino acid sequences. A superfamily relationship is detected commonly after the three-dimensional structures of the proteins are determined using X-ray analysis or NMR. The SUPFAM database described here relates two homologous protein families in a multiple sequence alignment database of either known or unknown structure. The present release (1.1), which is the first version of the SUPFAM database, has been derived by analysing Pfam, which is one of the commonly used databases of multiple sequence alignments of homologous proteins. The first step in establishing SUPFAM is to relate Pfam families with the families in PALI, which is an alignment database of homologous proteins of known structure that is derived largely from SCOP. The second step involves relating Pfam families which could not be associated reliably with a protein superfamily of known structure. The profile matching procedure, IMPALA, has been used in these steps. The first step resulted in identification of 1280 Pfam families (out of 2697, i.e. 47%) which are related, either by close homologous connection to a SCOP family or by distant relationship to a SCOP family, potentially forming new superfamily connections. Using the profiles of 1417 Pfam families with apparently no structural information, an all-against-all comparison involving a sequence-profile match using IMPALA resulted in clustering of 67 homologous protein families of Pfam into 28 potential new superfamilies. Expansion of groups of related proteins of yet unknown structural information, as proposed in SUPFAM, should help in identifying 'priority proteins' for structure determination in structural genomics initiatives to expand the coverage of structural information in the protein sequence space.
For example, we could assign 858 distinct Pfam domains in 2203 of the gene products in the genome of Mycobacterium tuberculosis. Fifty-one of these Pfam families of unknown structure could be clustered into 17 potentially new superfamilies forming good targets for structural genomics. The SUPFAM database can be accessed at http://pauling.mbu.iisc.ernet.in/~supfam.

Origin and evolution of peptide-modifying dioxygenases and identification of the wybutosine hydroxylase/hydroperoxidase

Nucleic Acids Research, 2010

Unlike classical 2-oxoglutarate and iron-dependent dioxygenases, which include several nucleic acid modifiers, the structurally similar jumonji-related dioxygenase superfamily was only known to catalyze peptide modifications. Using comparative genomics methods, we predict that a family of jumonji-related enzymes catalyzes wybutosine hydroxylation/peroxidation at position 37 of eukaryotic tRNAPhe. Identification of this enzyme raised questions regarding the emergence of protein- and nucleic acid-modifying activities among jumonji-related domains. We addressed these with a natural classification of DSBH domains and reconstructed the precursor of the dioxygenases as a sugar-binding domain. This precursor gave rise to sugar epimerases and metal-binding sugar isomerases. The sugar isomerase active site was exapted for catalysis of oxygenation, with a radiation of these enzymes in bacteria, probably due to impetus from the primary oxygenation event in Earth's history. 2-Oxoglutarate-dependent versions appear to have further expanded with rise of the tricarboxylic acid cycle. We identify previously under-appreciated aspects of their active site and multiple independent innovations of 2-oxoacid-binding basic residues among these superfamilies. We show that double-stranded β-helix dioxygenases diversified extensively in biosynthesis and modification of halogenated siderophores, antibiotics, peptide secondary metabolites and glycine-rich collagen-like proteins in bacteria. Jumonji-related domains diversified into three distinct lineages in bacterial secondary metabolism systems and these were precursors of the three major clades of eukaryotic enzymes. The specificity of wybutosine hydroxylase/peroxidase probably relates to the structural similarity of the modified moiety to the ancestral amino acid substrate of this superfamily.

PRODOC: a resource for the comparison of tethered protein domain architectures with in-built information on remotely related domain families

Nucleic Acids Research, 2005

PROtein Domain Organization and Comparison (PRODOC) comprises several programs that enable convenient comparison of proteins as a sequence of domains. The in-built dataset currently consists of ~698 000 proteins from 192 organisms with complete genomic data, and all the SWISSPROT proteins obtained from the Pfam database. All the entries in PRODOC are represented as a sequence of functional domains, assigned using hidden Markov models, instead of as a sequence of amino acids. On average 69% of the proteins in the proteomes and 49% of the residues are covered by functional domain assignments. Software tools allow the user to query the dataset with a sequence of domains and identify proteins with the same or a jumbled or circularly permuted arrangement of domains. As it is proposed that proteins with jumbled or the same domain sequences have similar functions, this search tool is useful in assigning the overall function of a multi-domain protein. Unique features of PRODOC include the generation of alignments between multi-domain proteins on the basis of the sequence of domains and in-built information on distantly related domain families forming superfamilies. It is also possible using PRODOC to identify domain sharing and gene fusion events across organisms. An exhaustive genome-genome comparison tool in PRODOC also enables the detection of successive domain sharing and domain fusion events across two organisms. The tool permits the identification of gene clusters involved in similar biological processes in two closely related organisms. The URL for PRODOC is http://hodgkin.mbu.iisc.ernet.in/~prodoc.
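The three kinds of match mentioned in the abstract above (same, jumbled, or circularly permuted arrangement of domains) can be sketched on plain lists of domain names. This is only an illustrative sketch and not PRODOC's actual implementation: the function name, the return labels and the example domain names are hypothetical.

```python
from collections import Counter


def classify_architecture(query, target):
    """Compare two domain architectures given as ordered lists of domain names."""
    if query == target:
        return "same"
    if len(query) == len(target) and Counter(query) == Counter(target):
        # Same domain content in a different order. A circular permutation is a
        # rotation of the query, so look for the target among the rotations.
        n = len(query)
        doubled = query + query
        if any(doubled[i:i + n] == target for i in range(1, n)):
            return "circularly permuted"
        return "jumbled"
    return "different"


# Hypothetical three-domain protein used as the query.
query = ["SH3", "SH2", "Pkinase"]
print(classify_architecture(query, ["SH3", "SH2", "Pkinase"]))
print(classify_architecture(query, ["SH2", "Pkinase", "SH3"]))
print(classify_architecture(query, ["SH2", "SH3", "Pkinase"]))
```

Note that with only two domains every reordering is also a rotation, so the sketch reports such a pair as circularly permuted rather than jumbled.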

Amidoligases with ATP-grasp, glutamine synthetase-like and acetyltransferase-like domains: synthesis of novel metabolites and peptide modifications of proteins

Molecular BioSystems, 2009

Recent studies have shown that the ubiquitin system had its origins in ancient cofactor/amino acid biosynthesis pathways. Preliminary studies also indicated that conjugation systems for other peptide tags on proteins, such as pupylation, have evolutionary links to cofactor/amino acid biosynthesis pathways. Following up on these observations, we systematically investigated the non-ribosomal amidoligases of the ATP-grasp, glutamine synthetase-like and acetyltransferase folds by classifying the known members and identifying novel versions. We then established their contextual connections using information from domain architectures and conserved gene neighborhoods. This showed remarkable, previously uncharacterized functional links between diverse peptide ligases, several peptidases of unrelated folds and enzymes involved in synthesis of modified amino acids. Using the network of contextual connections we were able to predict numerous novel pathways for peptide synthesis and modification, amine-utilization, secondary metabolite synthesis and potential peptide-tagging systems. One potential peptide-tagging system, which is widely distributed in bacteria, involves an ATP-grasp domain and a glutamine synthetase-like ligase, both of which are circularly permuted, an NTN-hydrolase fold peptidase and a novel alpha helical domain. Our analysis also elucidates key steps in the biosynthesis of antibiotics such as friulimicin, butirosin and bacilysin and cell surface structures such as capsular polymers and teichuronopeptides. We also report the discovery of several novel ribosomally synthesized bacterial peptide metabolites that are cyclized via amide and lactone linkages formed by ATP-grasp enzymes.
We present an evolutionary scenario for the multiple convergent origins of peptide ligases in various folds and clarify the bacterial origin of eukaryotic peptide-tagging enzymes of the TTL family.

Prediction of Function Divergence in Protein Families Using the Substitution Rate Variation Parameter Alpha

Molecular Biology and Evolution, 2006

Protein families typically embody a range of related functions and may thus be decomposed into subfamilies with, for example, distinct substrate specificities. Detection of functionally divergent subfamilies is possible by methods for recognizing branches of adaptive evolution in a gene tree. As the number of genome sequences is growing rapidly, it is highly desirable to automatically detect subfamily function divergence. To this end, we here introduce a method for large-scale prediction of function divergence within protein families. It is called the alpha shift measure (ASM) as it is based on detecting a shift in the shape parameter (alpha [α]) of the substitution rate gamma distribution. Four different methods for estimating α were investigated. We benchmarked the accuracy of ASM using function annotation from Enzyme Commission numbers within Pfam protein families divided into subfamilies by the automatic tree-based method BETE. In a test using 563 subfamily pairs in 162 families, ASM outperformed functional site-based methods using rate or conservation shifting (rate shift measure [RSM] and conservation shift measure [CSM]). The best results were obtained using the "GZ-Gamma" method for estimating α. By combining ASM with RSM and CSM using linear discriminant analysis, the prediction accuracy was further improved.
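The shape parameter α of a gamma distribution can be estimated in several ways, and the abstract above does not describe the four estimators that were compared. Purely as an illustration of what a shift in α between two subfamilies means, the sketch below uses the textbook method-of-moments estimator (mean squared divided by variance). The function names, the difference score and the per-site rate lists are assumptions, not the published ASM.

```python
from statistics import mean, pvariance


def gamma_shape_moments(rates):
    """Method-of-moments estimate of the gamma shape parameter: mean^2 / variance."""
    m = mean(rates)
    v = pvariance(rates)
    if v == 0:
        raise ValueError("rates have zero variance; the shape parameter is undefined")
    return m * m / v


def alpha_shift(rates_a, rates_b):
    """Absolute difference between the shape estimates of two subfamilies.

    This is an illustrative score only, not the alpha shift measure as published.
    """
    return abs(gamma_shape_moments(rates_a) - gamma_shape_moments(rates_b))


# Hypothetical per-site substitution rates for two subfamilies.
rates_a = [0.5, 1.0, 1.5]        # fairly even rates across sites
rates_b = [0.1, 0.1, 0.2, 3.6]   # a few fast sites among many slow ones
print(gamma_shape_moments(rates_a), gamma_shape_moments(rates_b))
print(alpha_shift(rates_a, rates_b))
```

A small α corresponds to strong rate heterogeneity among sites, so a large difference between the two estimates suggests that the constraints on the subfamilies differ.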

Characterization of a Trypanosoma cruzi acetyltransferase: cellular location, activity and structure

Molecular and Biochemical Parasitology, 2007

Trypanosoma cruzi and Trypanosoma brucei are flagellated protozoan parasites that cause Chagas disease and African trypanosomiasis in Latin American and African countries, respectively. Currently, over 8 million people are infected with T. cruzi and about 25 million more are at risk. About half a million people are affected by T. brucei. Trypanosome species share many peculiar biological and biochemical features, such as RNA editing. In contrast, they exhibit profound differences at the level of host-parasite interaction and disease pathology. Both parasites are transmitted to their host via different insect vectors. There are no available vaccines, and the current treatments have severe adverse effects. We were involved in sequencing the T. cruzi genome, an initiative launched by the WHO to increase our knowledge of the molecular basis of the parasite. The aim of this thesis was to participate in the sequencing and analysis of the T. cruzi genome, and use the data to investigate acetyltransferase enzymes, presumably linked to important metabolic pathways, as possible drug targets. In Papers I and II, we describe genome sequencing and analysis of two distinct T. cruzi strains. One of the selected strains, CL Brener, was found to be a genetic hybrid of two divergent strains; and it contains about 22 000 genes, encoded on 700 scaffolds with a total genome size of 110 Mb. About 50% of the genes are of unknown function, and lack homology to other sequenced eukaryotes. Large numbers of members of surface molecule gene families, such as trans-sialidase, mucin, mucin-associated protein, and GP63 were found. Comparative analyses revealed that TcI had a smaller genome by up to about 11 Mb. The genome size difference was linked to genes encoding surface molecules and to other repeats and repeated genes.
Additionally, six reading frames present in TcVI were not detected in TcI. Genetic polymorphisms such as indels, microsatellites and SNPs were identified and analyzed. Many genes were found to be under different selective pressures in T. cruzi, indicating differential evolutionary rates, signifying their importance to parasite biology. Within syntenic regions, the two genomes have the same gene complement. Identified features warrant sequencing of further T. cruzi strains, and findings from our studies offer opportunities for more targeted functional studies as well as tools for epidemiology. In the second part of this thesis, Papers III to V, a Trypanosoma cruzi acetyltransferase gene family, identified in the genome project, was chosen for functional characterization as a first step to evaluate its potential as drug target. Acetyltransferases are responsible for protein acetylation, where an acetyl molecule is transferred from acetyl-Coenzyme A to lysine residues in a protein sequence, N-epsilon acetylation, and to N-termini of proteins or peptides, protein N-alpha acetylation. N-alpha acetylation is linked to many metabolic pathways, influences protein stability, protein-protein interaction and localization to organelles, and acts as a degradation signal. The impact of this posttranslational modification in the parasite is not known. We have identified T. cruzi NatC and NatA, and show that they are expressed in the three life cycle stages (epimastigote, trypomastigote, and amastigote). The catalytic and auxiliary subunits form a complex in vivo. Additionally, they partially co-sediment with the ribosome and may have both co-translational and post-translational protein acetylation functions. In epimastigotes, the catalytic subunit of T. cruzi NatA was localized both in the nuclear periphery and in cytoplasm, whereas NatC was predominantly assigned to the cytoplasm.
The auxiliary subunit of NatA was mainly confined to the cytoplasm with cytoskeletal-like labelling, whereas NatC showed a punctate profile. Interestingly, the staining patterns of the different subunits analysed for NatA and NatC differ between the life cycle stages, which suggests differential regulation and expression. The native substrates for NatC and predicted NatA are similar to those described in yeast and humans, suggesting evolutionarily conserved functions. The proteins appear to acetylate a large number of proteins N-terminally, suggesting that manipulation of the enzymes may simultaneously affect many cellular functions and thereby could interfere with or abolish infection. Additionally, our data indicate that NatC and NatA may have both N-alpha and N-epsilon acetylation potential. Collectively, the genome analyses presented here have provided more molecular insights into the parasite's biology, and have narrowed the gaps between scientific communities working on parasite research. The identification of Nats and native substrates has hopefully laid a solid foundation for future study of Nats, which could provide chemotherapeutic targets for parasitic diseases. List of publications: This thesis is based on the following list of publications, referred to in the text by their Roman numerals: I. The genome sequence of Trypanosoma cruzi, etiologic agent of Chagas disease.

Comparative genomics uncovers novel structural and functional features of the heterotrimeric GTPase signaling system

Gene, 2011


Research paper thumbnail of A new family of polymerases related to superfamily A DNA polymerases and T7-like DNA-dependent RNA polymerases

Biology Direct, 2008

Using sequence profile methods and structural comparisons we characterize a previously unknown fa... more Using sequence profile methods and structural comparisons we characterize a previously unknown family of nucleic acid polymerases in a group of mobile elements from genomes of diverse bacteria, an algal plastid and certain DNA viruses, including the recently reported Sputnik virus. Using contextual information from domain architectures and gene-neighborhoods we present evidence that they are likely to possess both primase and DNA polymerase activity, comparable to the previously reported prim-pol proteins. These newly identified polymerases help in defining the minimal functional core of superfamily A DNA polymerases and related RNA polymerases. Thus, they provide a framework to understand the emergence of both DNA and RNA polymerization activity in this class of enzymes. They also provide evidence that enigmatic DNA viruses, such as Sputnik, might have emerged from mobile elements coding these polymerases. Reviewers This article was reviewed by Eugene Koonin and Mark Ragan.

MutL homologs in restriction-modification systems and the origin of eukaryotic MORC ATPases

Biology Direct, 2008

The provenance and biochemical roles of eukaryotic MORC proteins have remained poorly understood since the discovery of their prototype MORC1, which is required for meiotic nuclear division in animals. The MORC family contains a combination of a gyrase, histidine kinase, and MutL (GHKL) and S5 domains that together constitute a catalytically active ATPase module. We identify the prokaryotic MORCs and establish that the MORC family belongs to a larger radiation of several families of GHKL proteins (paraMORCs) in prokaryotes. Using contextual information from conserved gene neighborhoods, we show that these proteins primarily function in restriction-modification systems, in conjunction with diverse superfamily II DNA helicases and endonucleases. The common ancestor of these GHKL proteins, MutL and topoisomerase ATPase modules appears to have catalyzed structural reorganization of protein complexes and concomitant DNA-superstructure manipulations along with fused or standalone nuclease ...

Large-scale evaluation of function shift in protein families

BEN: a novel domain in chromatin factors and DNA viral proteins

Bioinformatics, 2008

We report a previously uncharacterized α-helical module, the BEN domain, in diverse animal proteins such as BANP/SMAR1, NAC1 and the Drosophila mod(mdg4) isoform C, in the chordopoxvirus virosomal protein E5R and in several proteins of polydnaviruses. Contextual analysis suggests that the BEN domain mediates protein–DNA and protein–protein interactions during chromatin organization and transcription. The presence of BEN domains in a poxviral early virosomal protein and in polydnaviral proteins also suggests a possible role for them in organization of viral DNA during replication or transcription. Contact: aravind@ncbi.nlm.nih.gov Supplementary information: Supplementary data for this study can also be accessed at http://www.ncbi.nlm.nih.gov/CBBresearch/Lakshmin/BEN/

Characterization of a Trypanosoma cruzi acetyltransferase: cellular location, activity and structure

Molecular and Biochemical Parasitology, 2007

Trypanosomatids are widespread parasites that cause three major tropical diseases. In trypanosomatids, as in most other organisms, acetylation is a common protein modification that is important in multiple, diverse processes. This paper describes a new member of the Trypanosoma cruzi acetyltransferase family. The gene is single-copy, and orthologs are also present in the other two sequenced trypanosomatids, Trypanosoma brucei and Leishmania major. This protein (TcAT-1) has the essential motifs present in members of the GCN5-related acetyltransferase (GNAT) family, as well as an additional motif also found in some enzymes from plant and animal species. The protein is evolutionarily more closely related to this group of enzymes than to histone acetyltransferases. The native protein has a cytosolic cellular location and is present in all three life-cycle stages of the parasite. The recombinant protein was shown to have autoacetylation activity.