The APOBEC Protein Family: United by Structure, Divergent in Function

Author manuscript; available in PMC: 2017 Jul 1.

Published in final edited form as: Trends Biochem Sci. 2016 Jun 6;41(7):578–594. doi: 10.1016/j.tibs.2016.05.001

Summary

The APOBEC (Apolipoprotein B mRNA Editing Catalytic Polypeptide-like) family of proteins has diverse and important functions in human health and disease. These proteins have an intrinsic ability to bind to both RNA and single-stranded (ss)DNA. Both function and tissue-specific expression vary widely for each APOBEC protein. We are beginning to understand that the activity of APOBEC proteins is regulated through genetic alterations, changes in their transcription and mRNA processing, and through their interactions with other macromolecules in the cell. Loss of cellular control of APOBEC activities leads to DNA hypermutation and promiscuous RNA editing associated with the development of cancer or viral drug resistance, underscoring the importance of understanding how APOBEC proteins are regulated.

Keywords: APOBEC, cytidine deaminase, disease mechanisms, DNA mutation, epigenetics, RNA editing

The diversity of the APOBEC family exceeded expectations

The mid-1980s to early 1990s was a dynamic period of realization during which RNA editing in chloroplasts, mitochondria and the cell nucleus was discovered [1]. Despite the multitude of epigenetic mechanisms becoming apparent, no one predicted the number of APOBEC enzymes that were soon to be discovered or the magnitude of their impact on biological systems. The APOBEC field began with the discovery that apolipoprotein B (apoB) mRNA contained a C to U base-modification that was not genomically encoded [2, 3] [Text Box 1] (Table 1). The APOBEC family in humans consists of 11 primary gene products and alternatively spliced variants that include A1, Activation Induced Deaminase (AID), APOBEC2 (A2), APOBEC3A-H (A3A-H) and APOBEC4 (A4) proteins (Figure 1A). The family was suggested to originate from ancestral AID genes (PmCDA1 & 2) expressed in the lymphocytes of jawless fish over 500 million years ago (Figure 1B). The ancestral AID genes were proposed to have induced up to 10^14 somatic recombination events in antigen receptor genes used in adaptive immunity [4]. Gene duplication from AID and A2 and diversification of AID structure and function in bony fish were hypothesized as contributing to the evolutionary process leading to genes that encoded the APOBEC proteins found in amphibians, birds and tetrapods including primates [5, 6].

Text box 1. APOBEC1 function and regulation, the founder of the field.

Apolipoprotein B (ApoB) mRNA editing of cytidine nucleotides to uridines occurs at positions 6666 and 6802 and only in two tissues, the small intestine of all mammals and in the liver of some species (e.g. rodents, dogs and horses) [7, 8]. A long and a short protein variant of apoB carry lipid and cholesterol in circulation from the digestive tract to the rest of the tissues in the body. The short form of apoB is rapidly cleared from the blood by tissues whereas cholesterol associated with the long form of apoB has a longer blood half-life during which it can become oxidized to the 'bad' form of cholesterol which is associated with an elevated risk of atherosclerosis. Protein expressed from C6666-edited mRNA is 48% of full-length apoB protein because a CAA codon is edited to a UAA stop codon. The potential that apoB mRNA editing might mitigate a risk factor for atherogenic disease is why the single nucleotide polymorphisms in ApoB mRNA became of intense interest. The catalytic subunit responsible for deamination of C6666 was discovered in 1993 [9] and named as Apolipoprotein B Editing Catalytic Subunit 1 (APOBEC1 or A1) in 1995 [10] even though no other APOBEC paralogs were known at that time.

A1 alone had no activity on apoB mRNA. Critical to its discovery was the identification of a cis-acting 11 nucleotide sequence required for editing site recognition (the mooring sequence, Table 1) and an RNA binding protein cofactor (A1CF) that binds to A1 and the mooring sequence, thereby targeting A1 to the editing site (reviewed in [11]). A1 complementation can also be provided by RBM47 [12]. Regulation of APOBEC activity through interaction with other proteins or cofactors is an important theme in this field. Potentially complicating the characterization of APOBEC cofactors is the possibility that they may have functions beyond RNA editing. The finding that gene knockouts of A1CF and RBM47 disrupted mouse embryological development [12–14] whereas A1 gene knockouts were not lethal [14] supports this possibility.

C to U editing of cytidines proximal to the mooring sequence has been an essential criterion used in the identification of editing targets and editing sites, such as the editing site in neurofibromin (NF1) mRNA [15] and the involvement of A1 in the multiple C to U transitions found in the 3' UTRs of mRNAs [16] (Table 1). The biological significance of 3' UTR editing of mRNA has not been determined but may be relevant to mRNA stability [16, 17]. Editing of NF1 mRNA has been proposed to lead to deregulation of Ras signaling in solid tumors of neurological origin [18, 19] due to the loss of the C-terminal Ras GTPase regulator domain in NF1 through the introduction of a premature stop codon in NF1 mRNA.
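The consequence of apoB C to U editing described in Text box 1, a CAA codon becoming a UAA stop codon, can be illustrated with a minimal Python sketch. The toy mRNA, the function names and the 0-based position are assumptions made for illustration only; they are not the real apoB sequence or coordinates.

```python
# Minimal sketch: a single C-to-U deamination turns a CAA codon into a UAA
# stop codon and truncates the open reading frame.
# The toy mRNA below is hypothetical; it is NOT the apoB sequence.

STOP_CODONS = {"UAA", "UAG", "UGA"}


def deaminate(rna: str, pos: int) -> str:
    """Return a copy of rna with the cytidine at 0-based index pos changed to uridine."""
    if rna[pos] != "C":
        raise ValueError(f"position {pos} is {rna[pos]!r}, not a cytidine")
    return rna[:pos] + "U" + rna[pos + 1:]


def orf_length(rna: str) -> int:
    """Count codons read from index 0 up to, but not including, the first in-frame stop."""
    codons = 0
    for i in range(0, len(rna) - 2, 3):
        if rna[i:i + 3] in STOP_CODONS:
            break
        codons += 1
    return codons


toy_mrna = "AUGGCUCAAGGUUCUUAA"      # AUG GCU CAA GGU UCU UAA
edited = deaminate(toy_mrna, 6)      # edit the C of the CAA codon

print(edited[6:9])                   # UAA
print(orf_length(toy_mrna), orf_length(edited))  # 5 2
```

In this toy case the edited transcript encodes 2 of the original 5 codons, the same kind of truncation that yields the short form of apoB.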

Table 1.

Genetic, Cellular and Functional Attributes of APOBEC Proteins

| Name | Genomic info: location | Genomic info: exons | Well-defined tissue expression | Active deaminase domains | §Cellular localization | Cofactors/interactions | Targets: binding activity | Targets: deaminase substrate | Targets: sequence (C = edit site) | Targets: genetic elements |
|---|---|---|---|---|---|---|---|---|---|---|
| APOBEC1 | 12p13.1 | 5 | GI tract | 1 | n/C | A1CF, RBM47, CRM1 | ssDNA, RNA | ssDNA, RNA | 5’-A C(n4-6)UGAUnnGnnnn-3’ (for n, A and U preferred) | apoB & NF1 mRNA; 3’ UTRs |
| AID | 12p13 | 5 | activated B cells | 1 | n/C | eEF1A, HSP90, CRM1 | ssDNA, RNA | ssDNA, RNA? | 5’-WRC-3’ (W = A or T; R = A or G) | immunoglobulin genes |
| APOBEC2 | 6p21 | 3 | heart, skeletal muscle, TNFα activated liver cells | ? | N/C | ? | ? | ? | ? | Eif4g2 & Pten mRNA |
| APOBEC3A | 22q13.1 | 5 | monocytes/macrophages, non-progenitor cells | 1 | N/C (C in monocytes) | ? | ssDNA, RNA | ssDNA, RNA | 5’-TC-3’ (preferred) | AAV-2, retroviruses, retroelements; WT1, SDHB & SIN3A mRNAs |
| APOBEC3B | 22q13.1 | 8 | PKC induced cancer cells, INFα activated liver cells | 1 (C-terminal) | N | ? | ssDNA, RNA | ssDNA | 5’-TC-3’ (preferred) | retroelements, SIV, HBV, HPV proto-oncogenes |
| APOBEC3C | 22q13.1 | 4 | immune centers, peripheral-blood cells | 1 | N/C | HIV Vif, HIV Gag (MA) | ssDNA, RNA | ssDNA | 5’-TC-3’ (preferred) | SIV, retroelements |
| APOBEC3Dϕ | 22q13.1 | 7 | immune centers, peripheral-blood cells | 2 (both weak) | C | HIV Vif, HIV Gag (NC) | ssDNA, RNA | ssDNA | 5’-TC-3’ (preferred) | HIV, retroelements |
| APOBEC3F | 22q13.1 | 8 | immune centers, peripheral-blood, INFα activated liver cells | 1 (C-terminal) | C | HIV Vif, HIV Gag (NC) | ssDNA, RNA | ssDNA | 5’-TC-3’ (preferred) | many retroviruses, retroelements, HBV |
| APOBEC3G | 22q13.1 | 8 | immune centers, peripheral-blood, INFα activated liver cells | 1 (C-terminal) | C (restricted) | HIV Vif, HIV Gag (NC) | ssDNA, RNA | ssDNA | 5’-CC-3’ (preferred); 5’-TC-3’ (less abundant) | many retroviruses, retroelements, HBV |
| APOBEC3H | 22q13.1 | 5 | immune centers, peripheral-blood cells | 1 | N/C (C for hap II) | HIV Vif, HIV Gag (NC) | ssDNA, RNA | ssDNA | 5’-TC-3’ (preferred) | many retroviruses, retroelements |
| APOBEC4 | 1q25.3 | 2 | ? | 0 | ? | ? | ? | ? | ? | ? |
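The A1 target motif listed in Table 1, an edited C followed by a 4 to 6 nucleotide spacer and the mooring sequence UGAUnnGnnnn, lends itself to a simple pattern search. The following is a rough Python sketch under stated assumptions: the function name and the toy RNA are hypothetical, the 5’ A shown in the table entry is ignored, and the stated preference for A and U at the n positions is not scored. A match only marks a candidate cytidine; it does not imply that the site is edited in cells.

```python
import re

# Candidate A1 sites: a C followed by a 4-6 nt spacer and a mooring-like
# element UGAUnnGnnnn (motif as listed in Table 1). The lookahead keeps the
# match anchored on the C itself, so overlapping candidates are all reported.
MOORING_SITE = re.compile(r"C(?=[ACGU]{4,6}UGAU[ACGU]{2}G[ACGU]{4})")


def find_candidate_sites(rna: str) -> list[int]:
    """Return 0-based indices of cytidines that sit upstream of a mooring-like motif."""
    return [m.start() for m in MOORING_SITE.finditer(rna.upper())]


# Illustrative toy RNA (an assumption, not a sequence given in the text).
toy_rna = "GGAUACAAUUUGAUCAGUAUAUUA"
print(find_candidate_sites(toy_rna))  # [5]
```

Only the cytidine at index 5 is reported: it is followed by a 4 nt spacer and a mooring-like element, whereas the second cytidine has no such element downstream.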

Figure 1. Alignment of members of the APOBEC family and their predicted phylogeny.

(A) The APOBEC proteins expressed in humans are AID, A2, A1, A3A through A3H and A4. Cartoons of the proteins are shown emphasizing gene duplication and likely divergence from the perspective of cytidine deaminase domain paralogs [Z1 (green), Z2 (orange) and Z3 (blue)] of the ZDD motif. APOBEC protein lengths (amino acid) are depicted to scale with the total length shown at the C terminus and with proteins aligned to the (first) ZDD motif. The canonical amino acid motif of the ZDD and of Z1-, Z2- and Z3-type cytidine deaminase domains are shown at the bottom of the panel. (B) A cartoon depicting the hypothesis for APOBEC paralog evolution. Phylogenetic relationships between several vertebrates and the emergence of APOBEC family members are shown on a timeline in millions of years that is not drawn to scale. AID and A2 are predicted to have emerged in jawless fish whereas A1 may have first appeared in ancestors of reptiles and amphibians. The timing of gene duplications and diversification events that led to the A3 subgroup (brown to green transition) is unclear, but likely occurred around 200 MYA. Humans and other primates express the largest diversity of A3 proteins. The phylogenetic relationship of A4 to other APOBECs is unclear. Though A4 is present in mammals, chickens and frogs, it is absent in fish [35].

A zinc-dependent cytidine deaminase domain of A1 is responsible for apoB mRNA C to U editing [9, 20] (Figure 1). The identification of a homologous cytidine deaminase domain motif within the AID primary amino acid sequence led to the prediction that AID [21] might function as the deoxycytidine deaminase required for the multiple mutations required for immunoglobulin gene diversification and recombination (known as somatic hypermutation (SHM) and class switch recombination (CSR)) [22–25]. It also led to the prediction that A3G [26] was responsible for host cell restriction of HIV infection through extensive dC to dU hypermutation of HIV proviral DNA during reverse transcription [26–29] (Table 1).

Recombination of the V(D)J region within immunoglobulin genes gives rise to almost a billion different antibody proteins in a person's lifetime. However, it wasn’t until the discovery of AID and its hypermutational activity that we understood the mechanism for fine tuning that enables specific binding of antibody variable regions to new antigens. Point mutations in the immunoglobulin variable gene region (i.e. SHM) in activated B lymphocytes are now understood to be catalyzed by AID [21]. Moreover, the loss of AID function gives rise to an autosomal recessive, immune compromising disease known as hyper-IgM syndrome type 2 (HIGM2) wherein IgM accumulates due to an inability to perform CSR [30], which otherwise would confer Ig functionality throughout the body by recombining gene segments encoding different constant regions with a given variable region of the heavy chain immunoglobulin gene.

Likewise, A3G was discovered to be the causal factor for the dG to dA hypermutations in HIV proviral DNA that reduced the production of infectious virus [26, 28]. A3F, A3D and A3H likely support A3G through their ability to deaminate cytidines at additional sites in a concerted effort that brings about a catastrophic level of hypermutation of viral genomes; together, they are effectively an armada of host defense [31, 32]. Studies in which sufficient quantities of A3G, A3D, A3F or A3H were stably expressed in T cells demonstrated that each enzyme alone had the ability to mutate HIV during viral replication and inhibited infection in the absence of HIV Vif, the protein expressed by HIV to counteract these host defense factors [Text box 2] [32].
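The dG to dA hypermutation signature can be related to the sequence preferences in Table 1 with a small tally. Because the dC to dU deamination occurs on minus-strand ssDNA, a minus-strand 5’-CC-3’ target reads as a G followed by G on the plus strand, and a 5’-TC-3’ target reads as a G followed by A. The sketch below is a minimal illustration of that bookkeeping. The function name and both toy sequences are assumptions, and a real analysis would need alignment, indel handling and statistical testing.

```python
from collections import Counter


def g_to_a_contexts(reference: str, variant: str) -> Counter:
    """Tally plus-strand G->A changes by the reference base immediately 3' of the G.

    'GG' corresponds to a minus-strand 5'-CC-3' target and 'GA' to a
    minus-strand 5'-TC-3' target. Sequences must be pre-aligned and of
    equal length; a G at the last position has no 3' neighbor and is skipped.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    counts: Counter = Counter()
    for i, (ref, var) in enumerate(zip(reference, variant)):
        if ref == "G" and var == "A" and i + 1 < len(reference):
            counts["G" + reference[i + 1]] += 1
    return counts


# Hypothetical plus-strand toy sequences, not data from the source.
reference = "ATGGCAGATTGGA"
variant = "ATAGCAAATTAGA"
counts = g_to_a_contexts(reference, variant)
print(dict(counts))  # {'GG': 2, 'GA': 1}
```

The context is read from the reference rather than the variant so that a neighboring mutation does not change how a site is classified.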

Text box 2. A critical viral defense mechanism is achieved by Vif-binding to A3s.

During HIV-1 infection, A3s can cause viral genomic mutations in ssDNA during reverse transcription that would ablate infection. However, the HIV-1 accessory protein Vif binds with A3C, D, F, G and H and recruits these antiretroviral proteins to an E3 ubiquitin ligase for polyubiquitination and subsequent proteasomal degradation. Vif binding to A3G is localized to the 128-DPD-130 motif at the junction of α4 helix and loop 7, while that for A3F, A3D, and A3C is over a broader conserved region localized mainly to a groove between α2 and α3 helices, opposite the ends that bind zinc and form the catalytic site. The Vif binding interface of A3H is less well understood but involves residues scattered across α3 and α4 helices. These interactions have been reviewed elsewhere [31, 33]. Lentiviral Vif-mediated degradation of A3 proteins is species specific. HIV-1 Vif binds and targets for degradation human A3D, A3F, A3G and to a lesser extent A3H, but not APOBEC of African green monkey (AGM) or rhesus macaque, while AGM simian immunodeficiency virus (SIV) Vif binds both A3G from AGM and rhesus macaque (rh), each coding K128, as opposed to human A3G, which codes for D128 [32, 34]. Likewise, rhA3D, rhA3F, rhA3G and rhA3H are degraded by rhSIV Vif, but only rhA3D is partially sensitive to HIV Vif, while rhA3F, rhA3G, and rhA3H are not [32].

A2 and A4 are the least studied of the APOBECs mainly due to the lack of specific phenotypes linked to either protein’s expression [35, 36]. Emerging evidence suggests that increased expression of A2 was linked to liver cancer and mutations in the genes encoding eukaryotic translation initiation factor 4 gamma 2 (Eif4g2) and phosphatase and tensin homolog (Pten) [37]. A2 may be essential for proper muscle differentiation and development [38], suggesting an as-of-yet unknown function of A2 may exist and may require the binding of a cofactor to regulate activity in certain tissue types.

While APOBEC proteins appear at a cursory level to be very similar because of their common evolutionary origin and a shared basic structural homology, they are very diverse in their impact on human health and disease. This review will consider recent understanding of APOBEC protein expression and its impact on the biology of cells relative to what we have begun to learn of shared structures as well as subtle structural differences that may explain unique functional characteristics. The reader will appreciate through this review that APOBEC genes have diversified greatly throughout evolution and that we are just beginning to contemplate that they have multifaceted roles in biology.

The A3 subfamily targets both foreign and self RNA and DNA

The duplication and divergence of A3 genes leading to expression of seven major homologs in humans (Figure 1A) may have both been driven by and contributed to mutations that promoted the evolution of endogenous and exogenous retroviruses/retroelements that bind to A3 proteins as substrates for editing [Text box 3] [39–42]. In this regard, the primate A3 subfamily of proteins has distinct but fairly lax nucleotide sequence preferences adjacent to edited Cs (Table 1) that may have been selected by A3 interactions that better controlled invasion by diverse retroviruses. Supporting this possibility are the expansion of A3 paralogs from one A3 gene in mice to seven in humans and the comparative decreased presence of active retroelements in humans relative to mice [Text box 3] [43]. Present-day differentiation of human myeloid cells and their sensitivity to infection by both RNA and DNA viruses appear to be closely tied to the transcriptional regulation of A3 genes [44–50].

Text box 3. A3 proteins limit endogenous retroelement activity.

There are seven A3 proteins in humans (Figure 1A) active against endogenous retroelements (Table 1). In humans there are non-functional endogenous retrovirus (ERV) sequences that utilize long-terminal repeats (LTR) for genomic insertion similar to exogenous retroviruses. A3F and A3G hypermutation footprints have been identified in human ERV-K and ERV-1 elements as imprints of A3s being active against functional ERVs in our ancestral past [51]. There are also both functional and non-functional retroelements in humans that are non-LTR based retroelements (i.e. autonomous retroelements, called long interspersed nuclear elements (LINE) and non-autonomous retroelements that have trans-dependency on LINE replication machinery, called short interspersed nuclear elements (SINE)). To counter this threat to genomic integrity, human A3A-H have been implicated in the inhibition of LINE and/or SINE retrotransposition [52]. Most A3 proteins are capable of directly hypermutating retroelement ssDNA and/or inhibiting reverse transcription of the retroelements [52–56]. A3G also may inhibit SINE activity in a deaminase-independent mechanism by binding to SINE RNA (e.g. Alu) and sequestering these RNAs as ribonucleoprotein complexes in the cytoplasm [57, 58]. Conversely, mice have only one A3 gene and their genomes contain active LTR-based and non-LTR based retroelements, with 50–60 times more active retroelements in their genomes than humans, and the proportion of retroelements causing disease is 35% greater in mice compared to humans [43, 59]. Overall, the variety within A3 genes in humans is a key determinant in combating the genotoxic threat posed by endogenous retroelements. The battle between A3 proteins and endogenous retroelements has clear selective pressure for both A3 genes and retroelement activity. The details of A3 proteins’ specific links to genome evolution through retroelements are a key open question ideally suited for future bioinformatics studies.

RNA viruses

A3G is naturally expressed to significantly higher levels than other A3 family members and contributes most of the APOBEC mutagenic, anti-HIV activity in T cells; A3F and A3H contribute secondarily, while A3D contributes even less [60]. Human A3H is expressed as variants in different populations (haplotypes I-IV) that have a remarkable diversity in DNA mutagenic and antiviral attributes [61–63]. A3H haplotype (hap) II is the only variant with a stable expression profile in humans and its expression correlates with increased anti-HIV activity [64]. Along with A3G and A3F, A3C and A3B are proposed to prevent simian immunodeficiency virus (SIV) from infecting humans, though A3B is weakly expressed in normal T cells [65]. A3C is moderately expressed in T cells, so it is a more likely host defense against SIV, but it lacks anti-HIV activity compared to A3G. This may be determined by its binding to the matrix (MA) region of HIV Gag rather than to the nucleocapsid (NC) region where A3G binds for viral particle encapsidation [66].

DNA viruses

A3B, A3F and A3G are up-regulated by interferon alpha (INFα) in liver cells and this expression has been suggested to protect these cells against hepatitis B virus (HBV) infection [67]. A3A is a potent inhibitor of a parvovirus, adeno-associated virus type 2 (AAV-2), and its activity seems to be related to the presumed DNA binding loops that differ from those found in A3G (Figure 2B, Key Figure) [68]. Interferons also upregulate the expression of two enzymatically active A3A isoforms in monocytes and macrophages, suggesting that A3A may protect monocytes from 'parasitic' DNA (e.g. viruses and retroelements) [69].

Figure 2, Key Figure. Structural similarities and differences of the APOBEC family of Proteins.

(A) Cartoon representation of the solution NMR structures of A2Δ40 (PDB 2RPZ) [96], A3A (PDB 2M65) [100], and A3G (N-terminal half, PDB 2MZZ [101]) and X-ray crystal structures of A3B (C-terminal half, PDB 5CQI [106]), A3C (PDB 3VOW) [102], A3F (C-terminal half, PDB 4J4J [108]) and A3G (C-terminal half, PDB 3IR2 [104]) are depicted in the same orientation as that shown in [Text Box 4]. A3 Z1-type CDA domains are green and Z2-types are orange. The catalytic zinc ion (or pseudo-catalytic for A3G(N)) is shown as a purple sphere while the blue sphere represents an intermolecular structural zinc ion that is coordinated by two residues from each of the A3G proteins in the asymmetric unit of the crystal mediating a potential zinc-dependent A3G C-terminal oligomeric interface involving loop 3, α2 helix and the β2-bulge-β2’ feature. (B) Alignment of APOBEC structures shown in (A) with a perspective that shows the structural differences among loops 1, 3, and 7 (red), which are predicted to be important for nucleic acid binding, substrate sequence preference and to control access of substrates to the catalytic pocket containing zinc (purple). Variations in loop sequence and structure have been proposed as determinants of ssDNA binding, catalysis and sequence specificity differences among APOBECs. The L3s of A3A and A3G C-term are both significantly longer than those of A2 and the Z2 A3C, A3F C-term and A3G N-term and both bind a secondary zinc ion (shown for A3G C-term structure as a blue sphere) that may allosterically regulate catalysis [114]. L3 of A3B C-term, the other Z1-type CDA domain, was deleted from the construct to facilitate crystallization [104]. L1 and L7 of A3B C-term are significantly shifted toward the catalytic zinc ion, a shift that is predicted to block access to the catalytic pocket, suggesting that this structure is a closed conformation (catalytically inactive) of what is known to be a catalytically active CDA domain.
(C) Structural alignment of the β2 strands from APOBEC structures in (A) depicting the β2-bulge presence in Z1-type A3s (A3A, A3B C-term and A3G C-term) or absence in A2 and Z2-type A3s (A3C, A3F and A3G N-term).

RNA editing of cellular transcripts

Whereas most A3 proteins only use ssDNA as substrates, A3A (in monocytes), like A1 (in liver and intestine), has the ability to target specific RNAs for C to U editing (Table 1) [70]. The mechanism and phenotypes linked to A3A site-selective mRNA editing have yet to be determined. Whether APOBEC editing of other RNAs produces a phenotype as clear as that seen with A1 editing of apoB mRNA [Text Box 1] is a general open question in the field. An open controversy has been whether AID edits mRNA as part of the mechanism for SHM and CSR [71, 72].

Nine APOBEC proteins with verified activity and a generally lax sequence preference (Table 1) enable the targeting of multiple RNA and DNA substrates by this family. Therefore, the overall number of APOBEC proteins in humans and targets speaks to the necessity of the APOBECs to both protect cells from foreign invaders and to influence cellular gene expression. However, there is an equally crucial need to regulate these proteins in order to prevent possible mutagenic capabilities of APOBEC deaminase activity on off-target cellular RNA and DNA.

Cells regulate APOBEC activity through localization

A1 and AID are regulated by nucleo-cytoplasmic trafficking

A crucial factor regulating APOBEC function is cellular localization and regulation of nucleo-cytoplasmic trafficking. APOBEC proteins that mutate ssDNA may affect genomic integrity and this is likely the reason why their access to the cell nucleus often is regulated. The activities of both A1 on mRNA [Text box 1] and AID on the ssDNA of the immunoglobulin locus take place within the cell nucleus (Table 1) [11]. Both enzymes contain nuclear localization signals (NLS) and CRM1-dependent nuclear export signals (NES) [73, 74]. A1 is maintained in the cytoplasm as an enzymatically inactive 60S complex along with A1CF [Text box 1]. A1 and A1CF then translocate to the cell nucleus and form a 27S active editosome on mRNA following RNA splicing [75]. In addition to NLS and NES trafficking signals, AID is actively retained in the cytoplasm through interactions with eukaryotic Elongation Factor 1-alpha (eEF1A), heat shock protein 90 (HSP90) [76] and possibly RNA [72] (Table 1).

Anti-HIV A3s avoid genomic DNA by staying in the cytoplasm

A3G is restricted to the cytoplasm through a strong cytoplasmic retention signal (CRS) [77]. Even during mitosis, A3G does not come in contact with chromosomal DNA [11]. The A3G CRS region contains an RNA binding domain [78] and RNA binding is involved in A3G association with cytoplasmic RNA processing centers, stress granules and p-bodies [79, 80]. RNA binding to A3G inhibits its ability to hypermutate ssDNA [81]. However, A3G deletion mutations that abrogated RNA binding remained in the cytoplasm, suggesting that RNA binding may only be part of the essential interactions for retention of A3G in the cytoplasm [77]. Interestingly, A3H hap II is the only A3H haplotype that has a predominantly cytoplasmic localization, consistent with other anti-HIV paralogs, A3D, A3F and A3G (Table 1). The localization of A3H in the cytoplasm may also be linked to its RNA binding activity, as nuclear localized haplotypes I, III, and IV did not support RNA binding [78].

Misregulation of APOBECs causes cancer

AID misregulation

Alternatively spliced variants of AID lacking exon 4 retain binding to trafficking and localization cofactors CRM1 and eEF1A, yet they have been specifically linked to misregulation of AID activity and cancer [76, 82]. Tight regulation of AID activity was shown to be important for regulating immunoglobulin gene expression and cMyc oncogenic translocation in B cell tumor development [83, 84]. In these studies off-target translocations were highly sensitive to AID expression, yet on-target SHM and CSR were only mildly affected by AID expression levels [83, 84]. This suggests that there is a regulatory mechanism (e.g. cofactor) that specifically regulates AID on-target hypermutation of the immunoglobulin gene region.

A1 misregulation

Attempts at gene therapy with A1 have been largely abandoned due to studies showing aberrant hypermutations even with regulated expression of A1 in mouse liver [85]. Furthermore, over-expression of A1 has been linked to oncogenesis due to increased editing of the novel A1 target mRNA (NAT1), which encodes a widely expressed translational repressor unless the mRNA is edited to contain a premature translation stop codon [86]. A1 has also been shown to have genomic DNA mutagenic activity when expressed in E. coli, and an indirect correlation between A1 up-regulation in cells that are precursors to esophageal adenocarcinomas and DNA mutations has been reported, although DNA mutation by A3B, which is also expressed highly in cancerous cells, was not ruled out [87].

A3B is predominantly nuclear due to a NLS that is similar to that of AID [88]. A3B expression is kept at low levels in most normal tissues; however, whether this is controlled through transcription, RNA silencing or post-translational regulation has yet to be determined. By contrast, A3B expression is elevated in diverse forms of cancers [89]. Whether A3B mutagenic activity is a cancer driver or a downstream effector remains an open question and the mechanism for A3B up-regulation in cancer cells is not known [90]. A driver of A3B transcriptional up-regulation may be viral infection, as human papillomavirus (HPV) infection in head/neck and cervical cancers increased A3B expression through the viral oncoprotein E6 [91]. That access to genomic DNA seems to be regulated only by limiting the expression of A3B (and not other APOBEC proteins) may explain why A3B expression (among APOBEC proteins) is more commonly associated with cancer.

A3A is cytoplasmic and non-genotoxic in monocytes; however, when A3A was transfected into other cell types it localized throughout the cell and became genotoxic [92]. These findings suggest that either A3A interacts with a nuclear import factor in a foreign cell context or a cytoplasmic retention cofactor for A3A is expressed in greater abundance in monocytes. It is curious that a large proportion of Asian (37%), Amerindian (58%) and Oceania (93%) populations have a deletion in the A3B gene, which is associated with a 20-fold increase in the expression of A3A from an mRNA variant containing the 3’UTR of A3B. This deletion is associated with an increased risk for breast cancer and hepatocellular carcinoma (HCC) [90]. It is not clear whether the increased risk for cancer is related to the 20-fold increase in expression of the A3A variant, which may be responsible for genomic mutations [93]. However, an analysis of A3A versus A3B signature mutations in cancer cells suggested that perhaps A3A mutations are generally more prevalent than those from A3B in cancer [94].

Overall, the data suggest that regulation of expression and cytoplasmic localization are critical regulatory mechanisms to maintain genomic integrity, but the mechanisms involved appear to differ for each paralog. Localization of APOBECs likely requires binding to a cell type-specific protein or RNA cofactor(s). It is increasingly apparent that misregulation of these processes is associated with cancer [90]. Thus, identifying and characterizing APOBEC cofactors is a key open question in the field.

The structures of APOBEC proteins suggest variations around a common core

The common core of APOBEC structures is the cytidine deaminase domain

Structures of APOBEC proteins have been difficult to obtain. The first APOBEC crystal structure obtained was for A2 with a 40-residue N-terminal deletion [95]. Subsequently, NMR solution structures of full-length A2 [96], the A3G C-terminal half [97–99], A3A [100], and the A3G N-terminal half [101] were reported, as well as X-ray structures of A3C [102], the A3G C-terminal half [103–105], the A3B C-terminal half [106], and the A3F C-terminal half [107–109]. A low-resolution small-angle X-ray scattering model of recombinant full-length A3G also has been reported [110]. Representative structures for each APOBEC are depicted as cartoons in Figure 2A.

All 11 APOBECs share a conserved zinc-dependent deaminase sequence motif (ZDD, Figure 1A) within an α-β-α super secondary structural element that forms the core catalytic site of a cytidine deaminase (CDA) domain [Text Box 4]. Variations in the length, composition and spatial location of conserved secondary structural features, and in particular the loops adjacent to the catalytic site, likely dictate function, substrate selection and regulation of catalytic deamination of (deoxy)cytidine to (deoxy)uridine and oligomerization. The mechanism of deamination is believed to be conserved with that of other CDA superfamily enzymes that act on cytidine, cytosine, or deoxycytidylate substrates [111].

Text Box 4. The APOBEC canonical cytidine deaminase (CDA) fold.

APOBECs share a conserved CDA fold, comprising a five-stranded mixed β-sheet that is stabilized by alpha helices packing against either face and with the secondary structural elements sequentially arranged as α1-β1-β2-α2-β3-α3-β4-α4-β5-α5-α6 (Figure I). The catalytic residues of the ZDD motif (Figure 1A) are found at the N-terminal ends of helices α2 and α3 (Figure I). A zinc ion is coordinated by the conserved Cys and His residues of the ZDD and by a water molecule that is activated by the zinc ion for nucleophilic attack of the C4 atom in the cytidine ring. The conserved Glu residue of the ZDD acts as a proton shuttle to aid the transfer of a proton from water to N3 of the cytidine ring and donates a proton to the leaving ammonia group.
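The net chemistry of the mechanism described above, attack by zinc-activated water at C4 and release of ammonia, can be summarized as a single overall reaction. The block below is only a schematic summary written in LaTeX (it assumes the amsmath package for \text and \xrightarrow); it omits the tetrahedral intermediate and the proton transfers mediated by the conserved Glu residue.

```latex
\[
\text{cytidine} + \mathrm{H_2O}
\;\xrightarrow{\;\mathrm{Zn^{2+}}\text{-dependent deaminase}\;}\;
\text{uridine} + \mathrm{NH_3}
\]
```

The same overall reaction applies to deoxycytidine in ssDNA, which is converted to deoxyuridine.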

Figure I legend, included with Figure I in Text Box 4.

Ribbon Diagram of an APOBEC Canonical Cytidine Deaminase (CDA) Fold.

A cartoon depiction of the A3F (C-terminal half, PDB 4J4J [108]) crystal structure with a 90 degree counter clockwise rotation illustrating the major secondary structural elements and the canonical CDA fold, that is evolutionarily conserved in the cytidine deaminase super family [36]. The CDA fold comprises a five-stranded mixed β-sheet that is flanked by six α-helices; the conserved catalytic Glu (blue) and zinc binding residues (His and two Cys, green) of the ZDD-motif form the core of the active site and are located at the N-terminal end of α2 and α3 helices. APOBEC proteins comprise either one CDA domain or two CDA domains covalently linked on the same peptide. Many single- and dual-deaminase domain APOBECs oligomerize through an unknown molecular mechanism. Variations in the length, composition and spatial location of conserved secondary structural features and loops dictate cellular localization, function, substrate selection and regulation of catalytic deamination of (deoxy)cytidine to (deoxy)uridine.

The β2 strand bulge is seen in some but not all A3 structures

A3A and the A3G and A3B C-terminal CDAs each possess an interrupted β2 strand (Figure 2A & C). The normally contiguous center of the long β2 strand is replaced with a short, bulging loop or helix (termed the β2 bulge) resulting in a β2-bulge-β2’ topology. This feature is not present in the Z2-type structures of A3C, A3F C-terminal CDA, A3G N-terminal CDA or in A2, as these structures feature continuous β2 strands. While the bulge is missing in the A3G N-terminal structure, the β2 strand is shorter and the preceding loop 2 is longer and less compact, perhaps mimicking the function of the bulge (Figure 2C). The physical proximity of the β2 strand with the beginning of the C-terminal CDA suggests that the β2 strand may be poised to interact with the adjacent N-terminal CDA (for the dual-deaminase A3B and A3G) in a bulge-dependent manner. Structural variability of β2 strands in the APOBEC family may play a role in the quaternary organization of single domain A3A. Verification awaits a full-length dual-deaminase domain structure in which the position and interaction of the N-terminal CDA with the C-terminal CDA of dual domain APOBECs are restrained through short and covalent linkages.

Structural foundation of ssDNA and RNA binding by APOBECs

Structural features that regulate APOBEC interactions with nucleic acids

Nucleic acid binding is common to all APOBECs, though in the dual deaminase domains of A3B, A3F, and A3G the N-terminal domains have lost the ability to catalyze cytidine deamination but retained RNA binding activity (Table 1) and, at least for A3G, DNA binding activity as well [112]. The interaction of ssDNA with APOBECs is generally mediated through shallow grooves on the protein surface that lead to the catalytic site and are lined with patches of positively charged (basic) and aromatic (hydrophobic) residues that stabilize interactions with the negatively charged nucleic acid backbone and stack with nucleic acid bases, respectively [31, 99, 104]. While there is strong structural conservation of the CDA architecture among the APOBEC family [Text Box 4], examination of available APOBEC structures reveals a significant amount of structural plasticity and sequence variability that is localized to loops between conserved secondary structural elements (Figure 2B). Functional differences in substrate recognition and deamination activity among APOBECs have been mapped largely to variations in amino acid composition, length and spatial conformation of loops L1, L3 and L7 in particular.

L7 has conserved sequence motifs that determine the nearest-neighbor editing sequence preference among the APOBECs [107] (Table 1), whereas L1 together with L7 controls substrate access to the catalytic pocket. In most APOBEC structures (the exception being the current A3B C-terminal structure [106]), L1 and L7 are splayed apart enough to grant the target deoxycytidine access to an open catalytic pocket [106] (Figure 2B). L3 is also involved in ssDNA substrate binding and has the greatest length and sequence variation among APOBECs [100, 103, 113]. L3 of A3G and A3A contributes a secondary zinc binding site that is proposed to allosterically regulate dC deaminase activity [114]. Concordantly, a zinc ion with intermolecular coordination is observed between residues in adjacent L3 loops of the asymmetric unit in the A3G(C) crystal structure [104]. Zinc coordination by this loop has also been observed at a dimeric interface in an A3A crystal structure [115].
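The nearest-neighbor preference described above can be illustrated with a short sketch that scans an ssDNA sequence for deoxycytidines preceded by a given 5' base. The function names, the made-up sequence and the two example contexts (5'-CC, commonly reported for A3G, and 5'-TC, commonly reported for several other A3s) are illustrative assumptions and are not taken from Table 1. The sketch treats the preference as a plain dinucleotide match and ignores loop conformation, nucleic acid secondary structure and processivity.

```python
def find_target_cytidines(ssdna, five_prime_base):
    """Return 0-based positions of each C that is preceded by `five_prime_base`.

    Hypothetical helper for illustration only: the nearest-neighbor
    preference is modeled as a simple dinucleotide string match.
    """
    seq = ssdna.upper()
    neighbor = five_prime_base.upper()
    return [i for i in range(1, len(seq))
            if seq[i] == "C" and seq[i - 1] == neighbor]


def deaminate(ssdna, positions):
    """Return the sequence with C replaced by U at the given positions."""
    bases = list(ssdna.upper())
    for i in positions:
        if bases[i] != "C":
            raise ValueError(f"position {i} is not a cytidine")
        bases[i] = "U"
    return "".join(bases)


example = "ATCCCAGTCA"  # made-up sequence, not from the review
cc_sites = find_target_cytidines(example, "C")  # assumed 5'-CC context
tc_sites = find_target_cytidines(example, "T")  # assumed 5'-TC context
edited = deaminate(example, cc_sites)
```

Positions are collected before any base is changed, so editing one site does not alter the context used to select its neighbor; a model of sequential, processive deamination would need to re-scan after each event.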

Models of single-stranded DNA binding to APOBEC3s

Numerous models of ssDNA binding to the C-terminal CDA of A3G have been proposed by a variety of methods. NMR DNA-titration experiments predict a ‘brim’ model involving a ring of basic, positively charged residues surrounding a concave active site. In this model, a putative ssDNA path navigates between L1, L3 and L7 within extended grooves of the A3G C-terminal surface. These loops are clustered at the confluence of the N-terminal ends of the α2 and α3 helices. DNA contacts with residues of L5 and the α4 and α6 helices are also predicted [97] (Figure 2B). A structure-guided mutational assay [103] predicted a similar ‘straight’ path for ssDNA on A3G, while real-time monitoring of the deamination reaction [98] suggested that the ssDNA path is ‘kinked’ and runs through a different surface groove that is perpendicular to the path in the ‘brim’ and ‘straight’ models. Mass spectrometry (MS) analysis of UV-crosslinked ssDNA in complex with full-length A3G suggested that ssDNA bound to both the catalytically active C-terminal CDA (L7 and a region comprising α6) and the catalytically inactive N-terminal CDA, in a region comprising its α6 [112]. Modeling based on this analysis suggested that a large, contiguous surface from both A3G domains engages ssDNA and provides a rationale for why the catalytically active C-terminal CDA of dual deaminase A3s alone is not sufficient for high affinity ssDNA binding [112].

DNA binding models for other A3s have also been proposed. A model of ssDNA binding to the A3F C-terminal half crystal structure [108] based on structure guided mutagenesis predicted a path of ssDNA that is similar to the ‘straight’ and ‘brim’ A3G models. The crystal structure of the A3B C-terminus was solved with a dCMP bound to residues of the α4 and α6 helices that are away from the active site [106]. Modeling of polynucleotide binding based on this structure predicted a similar interaction of ssDNA with A3B as suggested for A3G [106]. In contrast to the aforementioned models, a binding model for A3A suggested ssDNA bends sharply to insert the target dC into a deeper catalytic pocket with fewer localized contacts [100]. Differences in binding configurations may explain why, unlike in A3G and A3F, deamination by A3A is not predicted to be a processive event [116]. Overall, modeling has suggested somewhat similar paths for ssDNA binding to the C-terminal CDA of A3G, A3F, and A3B and the noted exception currently is A3A.

The covalent linkage of tandem CDAs that is inherent to A3B, A3D, A3F, and A3G, and the oligomerization of some single and dual deaminase domain APOBECs, may enable more robust APOBEC interactions with ssDNA. These features also may accommodate additional regulatory mechanisms for ssDNA binding. The dual-domain binding model spanning the N- and C-terminus of A3G predicted by UV-crosslinking and MS [112] is supported by: (i) single point mutations in the N-terminal domain ablating nucleic acid binding by full-length A3G [117, 118], (ii) the observation that full-length A3G has ~1000-fold stronger affinity for ssDNA than the C-terminal CDA alone [97, 98, 118], and (iii) results from atomic force microscopy experiments interrogating APOBEC binding to ssDNA, which revealed a bimodal distribution of contour lengths for the dual-domain A3G but only a monomodal distribution for the single-domain A3A [119]. In addition, full-length A3G forms dimers [120–122] that bind to ssDNA substrates to form catalytically active homotetramers [121]; thus, the full extent of an ssDNA interaction with native APOBECs may be underestimated by modeling with single CDA domains.
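To put the ~1000-fold affinity difference in energetic terms, the sketch below converts a fold difference in affinity into a difference in binding free energy using ΔΔG = RT ln(fold). It assumes that the fold difference reflects a ratio of equilibrium dissociation constants and that the temperature is 298.15 K; neither assumption comes from the studies cited above, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal / (mol K)


def ddg_from_fold(fold, temperature_k=298.15):
    """Free energy difference (kcal/mol) for a fold change in affinity.

    Assumes `fold` is a ratio of equilibrium dissociation constants,
    e.g. Kd(C-terminal CDA alone) / Kd(full-length protein).
    """
    if fold <= 0:
        raise ValueError("fold must be positive")
    return R_KCAL * temperature_k * math.log(fold)


ddg = ddg_from_fold(1000.0)
print(f"{ddg:.1f} kcal/mol")  # prints "4.1 kcal/mol"
```

Under these assumptions, the N-terminal CDA and any additional contacts in the full-length protein would account for roughly 4 kcal/mol of binding free energy, comparable to a small number of favorable electrostatic or stacking contacts.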

APOBEC and RNA interactions

A1 requires A1CF for site-specific editing of apoB mRNA, but A1 itself has low RNA binding sequence specificity [123]. APOBECs are RNA binding proteins, and A3 proteins bind non-selectively to cellular and viral RNAs [124]. Dual deaminase domain APOBECs bind to RNA predominantly through their non-catalytic N-terminal CDA [117, 118, 125, 126] and, at least for A3G, through the C-terminal domain as well [112]. RNA-bound A3s are catalytically inactive on ssDNA [110, 118]. MS analysis of nucleic acid cross-linked to A3G revealed RNA bound to the same sites that DNA bound, which suggests a competitive mechanism for RNA-mediated inhibition of ssDNA deamination [112]. However, RNA binding sites in the N-terminal half of A3G are also proximal to a DNA-binding site, which raises the question of whether RNA binding might allosterically regulate ssDNA binding. It is currently unclear whether RNA and ssDNA can bind A3G simultaneously. RNA binding to N-terminal and C-terminal CDAs may recruit APOBECs for alternate functions [81, 112, 124].

Catalytic activity, nucleic acid interactions and cellular abundance may be determined by APOBEC homo and hetero protein-protein interactions

A1 and AID

Some members of the APOBEC family form functionally relevant homo or hetero complexes. A1 dimerization [20, 127] and complex formation with A1CF are required for editing activity on apoB mRNA [128, 129]. The dimerization interface utilizes a conserved hydrophobic patch near the C terminus and, though the A1 structure is not known, this patch maps to the α6 helix in the CDA structure [130–132]. AID has been reported to form functional homodimers [132, 133] as well as functional monomers [134].

A2

The crystal packing of the initial A2 structure with a 40 residue N-terminal deletion (A2Δ40) [95] revealed a long β2-β2 interface between adjacent A2 molecules, suggesting a potential mechanism for oligomerization of single domain APOBEC proteins such as AID. However, the basis for this mode of oligomerization was refuted by the monomeric NMR solution structure of mouse A2Δ40 (Figure 2A, PDB ID 2RPZ), which demonstrated that native A2 may be a monomer. Further NMR analyses of full-length human A2 showed that the N-terminal 40 residue extension forms 3 dynamic α-helices that pack against the β2 strand, preventing a β2-β2 mediated dimeric interaction [96].

A3 family

The oligomeric state of A3 proteins has effects on catalytic function, nucleic acid binding and cellular distribution, though there is no consensus on the oligomeric state of catalytically active A3s. The study of APOBEC oligomerization has been confounded, and a clear understanding of mechanism hindered, because oligomerization occurs both through direct protein-protein interactions and through more complicated, indirect, nucleic acid-bridged interactions. Fluorescence fluctuation spectroscopy demonstrated that A2, A3A, and A3C were monomeric in cells while A3B, A3D, A3F, A3G, and A3H were multimeric [135]. Interestingly, the ability to form multimeric complexes was correlated with sequestration of A3s and AID in cytoplasmic complexes [75, 76, 136] and P-bodies [79, 80] as well as with packaging in HIV virions in the cytoplasm of cells [78, 137–139]. This suggested that oligomerization, whether through protein-protein interactions or RNA-protein interactions, may contribute to A3 localization and function. Biophysical analyses of purified A3G suggested multiple oligomeric states for catalytically active full-length A3G [110, 120, 140, 141], modeled alternatively as mediated by either the N-terminal [125, 139, 142] or C-terminal [77] CDA. It is likely that the oligomeric state influences the orientation of loops involved in nucleic acid binding and catalytic site access; this could affect and even regulate the kinetics of deamination by providing larger binding surfaces for nucleic acid.

Some members of the A3 family (A3C, D, F, G and H) are bound by HIV-1 Vif protein and subsequently recruited to an E3 ubiquitin ligase complex for polyubiquitination and proteasomal degradation [Text Box 2]. Vif mediated degradation of A3s is essential for a sustainable HIV infection and the Vif-binding A3 surfaces vary among those A3s that are bound by Vif (reviewed in [31, 33]). Recent computational studies suggest how an ancestral lentiviral Vif may have evolved to bind multiple and distinct A3 surfaces from an original host organism with only a single A3 protein [143].

Concluding remarks

Our knowledge of APOBEC proteins, their functions and structures is growing rapidly. In recent years, A4 was added to the list of family members and the diversity of A3H haplotypes with differences in antiviral functions was described. A3A recently joined A1 as a deaminase that has mRNA editing activity. In addition, an expanded functional significance for A1 mRNA editing has been suggested by the discovery of 3' UTR mRNA editing and the discovery that the RNA binding protein RBM47, in addition to A1CF, is a complementation factor, opening the possibility of yet-to-be identified RNAs that are C to U edited through different auxiliary protein targeting of A1.

Today the role of genetic mutations in A3B and A3A and elevated expression of AID, A3B and A3A leading to a cancer genotype and phenotype is all but certain. We are just beginning to appreciate the extent to which AID, or other APOBECs, contribute to chromosomal translocations in health and disease. We understand that cells have, as part of their innate immunity, an ability to neutralize infection by HIV through A3 proteins and their mutagenic activities on HIV. As the field has made progress in structural characterization of APOBECs and their interactions with substrates and other macromolecules, an overarching focus has become the regulatory processes that limit or enhance APOBEC expression and functions in promotion of healthy cell responses to diseases. Although the timing and reasons why APOBEC proteins evolved may be difficult to prove, a priority will be to understand the function of the present day APOBEC family. If we do come to appreciate the strengths and potential limitations of this family of proteins, there may be opportunities to use these enzymes in cell engineering and for the development of therapeutics for disease intervention.

Outstanding Questions.

  1. What are the native structures for APOBECs? Notably absent are structures of A1, AID, a full-length dual deaminase domain A3 and structures of substrate- and cofactor-bound APOBECs.
  2. What are the native functions for APOBECs? The relationships among APOBEC protein expression, cellular localization and function have been driven by native cell-type specific selection pressures. The functions of APOBECs that are inferred by experimentally overexpressing APOBECs in various cell lines have to be evaluated cautiously as cellular regulatory mechanisms may not function under these conditions.
  3. How do APOBECs bind to ssDNA and RNA and what are the interactions required for deoxycytidine/cytidine deamination?
  4. What nucleic acid binding or protein cofactor interactions affect APOBEC subcellular compartmentalization, interactions with substrates and regulation of deaminase activity?
  5. Are the DNA mutagenic activities of proto-oncogenic APOBECs that localize to the cell nucleus, like AID, A1, A3A and A3B, involved in the initiation or progression of cancers? The beneficial roles of APOBECs clearly can be compromised through over-expression, production of APOBEC variants and inappropriate retention in the cell nucleus. We do not know what leads to mutation of off-target nucleic acids and which mutations can lead to the induction and progression of cancer.
  6. Does RNA editing by APOBEC affect the diversity of the protein or RNA function? The technology for transcriptome analysis has enabled the search for and identification of modified RNA bases, including presumed APOBEC editing sites. However, the percentage of RNAs modified, and therefore the impact that RNAs with modified bases have on cell phenotype, has not been addressed. Without quantitative analysis, edited mRNAs identified by high resolution RNA sequencing methods that are the result of over-expressing APOBEC in heterologous cell types raise questions concerning their occurrence in nature.
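The quantitative gap raised in question 6 can be framed as a per-site editing fraction: the share of sequencing reads covering a site that carry the edited base. The sketch below is a minimal illustration under stated assumptions: the function name, the read counts and the minimum-coverage cutoff of 20 reads are hypothetical, and edited uridines are assumed to appear as T in cDNA-based reads. It does not correct for sequencing error, genomic SNPs or mapping artifacts.

```python
def editing_fraction(c_reads, t_reads, min_coverage=20):
    """Fraction of reads at a site that support C-to-U editing.

    `c_reads` counts unedited reads (C) and `t_reads` counts edited reads
    (U, read as T in cDNA). Returns None when coverage is below
    `min_coverage`, an arbitrary illustrative cutoff.
    """
    if c_reads < 0 or t_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = c_reads + t_reads
    if total < min_coverage:
        return None  # too few reads to estimate a fraction
    return t_reads / total


# Made-up counts for two sites: one well covered, one poorly covered.
well_covered = editing_fraction(c_reads=150, t_reads=50)  # 0.25
low_coverage = editing_fraction(c_reads=5, t_reads=5)     # None
```

Reporting a fraction per site, rather than only the presence of an edited read, is what allows editing observed under over-expression to be compared with editing in native cell types.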

The APOBEC family of proteins does not have a common RNA or DNA substrate for cytidine/deoxycytidine deamination and, in fact, it appears that not all of the predicted cytidine deaminase domains are catalytically active.

A consensus is emerging that nucleic acid substrates are coordinated for deamination using amino acid residues flanking the catalytic domain.

APOBECs are not essential proteins for development and growth but several play critical roles in the maintenance of long-term health such as in metabolism and immunity (innate and acquired).

APOBECs are controlled through expression of wild type and alternative spliced mRNAs as well as localization, post-translational modification, turnover, and interactions with other proteins or RNAs.

Overexpression of APOBECs and APOBEC variants has been observed in several malignancies suggesting an association with cancer initiation and/or progression.

Acknowledgments

The authors thank Jenny M.L. Smith for the preparation of Figure 1. All authors contributed equally to the planning and writing of this review. The authors wish to acknowledge the numerous contributions to the advancement of science in the APOBEC field made by investigators across the globe whose work may not have been adequately referenced here due to the limitations of space. Preparation of this review was supported in part by a Public Health Services Grant GM110568 and GM110038 awarded to HCS.

Glossary

C to U base-modification

also referred to as cytidine to uridine deamination or editing, this is a reaction in which a coordinated zinc ion in an enzyme active site acts as a Lewis acid to activate a water molecule for hydrolytic, nucleophilic attack at the C4 position of cytidine, which carries the exocyclic amino group, and a conserved glutamic acid acts as a proton shuttle to convert a cytidine base to a uridine in either RNA or DNA with an ammonium leaving group.

Class Switch Recombination (CSR)

an immunoglobulin-specific function in activated B cells and dependent on AID mutagenic activity where non homologous recombination within the Ig gene sequence encoding the immunoglobulin constant region is induced to switch from IgM to new functional classes of antibodies (IgG, IgA or IgE).

Cofactor

a protein/peptide, nucleic acid, lipid or organic small molecule that binds to an enzyme to regulate a specific structure, activity or subcellular distribution.

Hypermutation

when a multitude of nucleotides on the same strand of DNA or RNA are edited.

Interferons

members of the cytokine family of signaling proteins that are synthesized and secreted by cells in response to pathogens (e.g. viruses and bacteria) and that act on immune cells to activate their responses.

Retroelements

the two major classes of autonomous retroelements are long terminal repeat (LTR) based endogenous retroviruses and non-LTR based long interspersed nuclear elements (LINE) that both encode everything needed to reverse transcribe and re-insert their sequence into another location within the cell’s genome. There are also non-autonomous short interspersed nuclear elements (SINE) that are trans-dependent on the LINE encoded machinery for reverse transcription and genomic re-insertion. Endogenous retroelements’ ability to copy themselves into random locations in the genome leads to genomic instability and disease.

Somatic Hypermutation (SHM)

a process specific to immunoglobulin (Ig) genes and dependent on AID mutational activity in activated B-cells in which the gene sequence encoding the Ig variable region is hypermutated to produce Ig gene variants that will encode IgM with the ability to bind to antigens with high affinity.

Footnotes

Conflict of interest statement: Dr. HC Smith is a full-time tenured professor at the University of Rochester School of Medicine and Dentistry. He is also the founder and CEO of the University of Rochester spinout company OyaGen, Inc. The company has a financial interest in the development of antiviral and anti-cancer drugs based on APOBEC technology. Drs. JD Salter and RP Bennett are employees of OyaGen, Inc.

References