The Human Whey-acidic-protein Four-Disulfide Core domain (WFDC) cluster on 20q13 region: evolutionary history and role in human health and disease
Molecular evolution of monotreme and marsupial whey acidic protein genes
Evolution & Development, 2007
SUMMARY Whey acidic protein (WAP), a major whey protein present in milk of a number of mammalian species, has characteristic cysteine‐rich domains known as four‐disulfide cores (4‐DSC). Eutherian WAP, expressed in the mammary gland throughout lactation, has two 4‐DSC domains (DI–DII), whereas marsupial WAP, expressed only during mid‐late lactation, contains an additional 4‐DSC (DIII), and has a DIII–DI–DII configuration. We report the expression and evolution of echidna (Tachyglossus aculeatus) and platypus (Ornithorhynchus anatinus) WAP cDNAs. Predicted translation of monotreme cDNAs showed echidna WAP contains two 4‐DSC domains corresponding to DIII–DII, whereas platypus WAP contains an additional domain at the C‐terminus with homology to DII and has the configuration DIII–DII–DII. Both monotreme WAPs represent new WAP protein configurations. We propose models for evolution of the WAP gene in the mammalian lineage either through exon loss from an ancient ancestor or by rapid evoluti...
Pig whey acidic protein gene is surrounded by two ubiquitously expressed genes
Biochimica et Biophysica Acta (BBA) - Gene Structure and Expression, 2003
A 140-kb pig DNA fragment containing the whey acidic protein (WAP) gene cloned in a bacterial artificial chromosome (BAC344H5) has been shown to contain all of the cis-elements necessary for position-independent, copy-dependent and tissue-specific expression in transgenic mice. The insert from this BAC was sequenced. This revealed the presence of two other genes with quite different expression patterns in pig tissues and in transfected HC11 mouse mammary cells. The RAMP3 gene is located 15 kb upstream of the WAP gene in reverse orientation. The CPR2 gene is located 5 kb downstream of the WAP gene in the same orientation. The same locus organization was found in the human genome. The region between RAMP3 and CPR2 in the human genome contains a WAP gene-like sequence with several point mutations, which may account for the absence of WAP from human milk.
Ruminants genome no longer contains Whey Acidic Protein gene but only a pseudogene
Gene, 2006
Whey Acidic Protein (WAP) has been identified in the milk of only a few species, including mouse, rat, rabbit, camel, pig, tammar wallaby, brushtail possum, echidna and platypus. Despite intensive studies, it has not yet been found in the milk of Ruminants. We have isolated and characterized genomic WAP clones from ewe, goat and cow, identified their chromosomal localization and examined the expression of the endogenous WAP sequence in the mammary glands of all three species. The WAP sequences were localized on chromosome 4 (4q26) as expected from comparative mapping data. The three ruminant WAP sequences reveal the same deletion of a nucleotide at the end of the first exon when compared with the pig sequence. Due to this frameshift mutation, the putative proteins encoded by these sequences do not harbor the features of a usual WAP protein with two four-disulfide core domains. Moreover, RT-PCR experiments have shown that these sequences are not transcribed and are, thus, pseudogenes. This loss of functionality of the gene in Ruminants raises the question of the biological role of the WAP. Some putative roles previously suggested for WAP are discussed.
BMC Genetics, 2013
The pH is an important parameter influencing the technological quality of pig meat, a trait affected by environmental and genetic factors. Several quantitative trait loci associated with meat pH are described in the PigQTL database, but only two genes influencing this parameter have been detected so far: Ryanodine receptor 1 and Protein kinase, AMP-activated, gamma 3 non-catalytic subunit. To search for genes influencing meat pH we analyzed genomic regions with quantitative effect on this trait in order to detect SNPs to use for an association study. Results: The expressed sequences mapping on porcine chromosomes 1, 2 and 3 in regions associated with pork pH were searched in silico to find SNPs. 356 out of 617 detected SNPs were used to genotype Italian Large White pigs and to perform an association analysis with meat pH values recorded in semimembranosus muscle at about 1 hour (pH1) and 24 hours (pHu) post mortem. The results of the analysis showed that 5 markers mapping on chromosomes 1 or 3 were associated with pH1 and 10 markers mapping on chromosomes 1 or 2 were associated with pHu. After False Discovery Rate correction only one SNP mapping on chromosome 2 was confirmed to be associated with pHu. This polymorphism was located in the 3'UTR of two partly overlapping genes, Deoxyhypusine synthase (DHPS) and WD repeat domain 83 (WDR83). The overlap of the 3'UTRs allows co-regulation of mRNA stability through a cis-natural antisense transcript mechanism. DHPS catalyzes the first step in the formation of hypusine, a unique amino acid formed by the posttranslational modification of a specific lysine residue of the protein eukaryotic translation initiation factor 5A. WDR83 has an important role in the modulation of a cascade of genes involved in cellular hypoxia defense by intensifying the glycolytic pathway and, theoretically, the meat pH value.
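The abstract reports that only one SNP remained significant after False Discovery Rate correction but does not name the procedure. As a minimal sketch, assuming the widely used Benjamini-Hochberg step-up rule, the correction could look like the following. The function name and the p-values are hypothetical illustrations, not data from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if rejected at FDR level alpha."""
    m = len(p_values)
    # Rank the tests from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    largest_k = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * alpha:
            largest_k = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        rejected[index] = rank <= largest_k
    return rejected


# Hypothetical p-values for five markers (not data from the study).
example_p_values = [0.0001, 0.04, 0.03, 0.2, 0.5]
significant = benjamini_hochberg(example_p_values)
```

With these made-up values only the first marker passes, which mirrors the situation described in the abstract where several nominal associations shrink to a single confirmed one.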
Evolution & Development, 2009
Whey acidic protein (WAP) belongs to a family of four disulfide core (4-DSC) proteins rich in cysteine residues and is the principal whey protein found in milk of a number of mammalian species. Eutherian WAPs have two 4-DSC domains, whereas marsupial WAPs are characterized by the presence of an additional domain at the amino terminus. Structural and expression differences between marsupial and eutherian WAPs have presented challenges to identifying physiological functions of the WAP protein. We have characterized the genomic structure of tammar WAP (tWAP) gene, identified its chromosomal localization and investigated the potential function of tWAP. We have demonstrated that tWAP and domain III (DIII) of the protein alone stimulate proliferation of a mouse mammary epithelial cell line (HC11) and primary cultures of tammar mammary epithelial cells (Wall-MEC), whereas deletion of DIII from tWAP abolishes this proliferative effect. However, tWAP does not induce proliferation of human embryonic kidney (HEK293) cells. DNA synthesis and expression of cyclin D1 and cyclin-dependent kinase-4 genes were significantly up-regulated when Wall-MEC and HC11 cells were grown in the presence of either tWAP or DIII. These data suggest that DIII is the functional domain of the tWAP protein and that evolutionary pressure has led to the loss of this domain in eutherians, most likely as a consequence of adopting a reproductive strategy that relies on greater investment in development of the newborn during pregnancy.
Aminode: Identification of Evolutionary Constraints in the Human Proteome
Evolutionarily constrained regions (ECRs) are a hallmark for sites of critical importance for a protein's structure or function. ECRs can be inferred by comparing the amino acid sequences from multiple protein homologs in the context of the evolutionary relationships that link the analyzed proteins. The compilation and analysis of the datasets required to infer ECRs, however, are time consuming and require skills in coding and bioinformatics, which can limit the use of ECR analysis in the biomedical community. Here, we developed Aminode, a user-friendly webtool for the routine and rapid inference of ECRs. Aminode is pre-loaded with the results of the analysis of the whole human proteome compared with proteomes from 62 additional vertebrate species. Profiles of the relative rates of amino acid substitution and ECR maps of human proteins are available for immediate search and download on the Aminode website. Aminode can also be used for custom analyses of protein families of interest. Interestingly, mapping of known missense variants shows great enrichment of pathogenic variants and depletion of non-pathogenic variants in Aminode-generated ECRs, suggesting that ECR analysis may help evaluate the potential pathogenicity of variants of unknown significance. Aminode is freely available at http://www.aminode.org.
Evolutionary changes along a protein sequence occur at rates that are inversely correlated with the strength of specific constraints at each site. Constrained regions are considered to be under functional constraint owing to a role in protein stability, post-translational modifications, subcellular localization, interaction with other molecules, or enzymatic function 1–4. Because constraint can vary widely along a given protein sequence, profiling the rates of evolutionary changes can provide information useful to identify the key residues or domains of the protein.
Several studies have shown that evolutionarily constrained regions (ECRs) can pinpoint the position of residues that are relevant for the function of enzymes or other protein types and can even provide significant information to predict the effects of specific mutations 5–11. Therefore, the identification of ECRs may help inform investigation and experimental design of protein studies. For example, profiling evolutionary constraint can indicate regions to avoid or to target for protein tagging when the function or interactions of the protein must be preserved. Conversely, highly constrained regions might be an excellent choice for functional studies based on mutagenesis analysis 7,8,12. In the absence of prior experimental data, the identification of ECRs may indeed point towards candidate positions in a protein that, if mutated, may have a deleterious effect on the protein function. The underlying reasoning is that if a site has been refractory to changes over long periods of evolutionary time, as inferred from a comparison of numerous and distantly related taxa, any change at that site is likely deleterious 13,14. Effective methods of profiling a set of homologous proteins to determine ECRs require the simultaneous analysis of amino acid sequences and phylogenetic relationships of the proteins under examination 15,16. A general approach to identify ECRs consists of a multi-step procedure 15: First, orthologs of the protein of interest are selected and a multiple alignment is generated to allow the measurement of the relative rate of substitution at each protein position. Depending on the analysis to be performed, paralogs may also be included: closely related paralogs if the analysis is focused on specific structural features of the protein under examination, or both close and distant paralogs if the analysis is aimed at identifying general constraints of the protein family 5,15.
Next, the number of substitutions that have occurred at each protein position is computed based on the phylogenetic
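The early steps of the procedure outlined above (start from a multiple alignment of homologs, then score how much each position has changed) can be sketched in Python. This is a deliberately simplified illustration and not Aminode's method: it ignores the phylogeny and uses "distinct residues minus one" as a crude lower bound on the substitutions per column. All function names, the window size, the threshold and the toy alignment are assumptions made for the example.

```python
def column_min_substitutions(alignment):
    """Per column, return (number of distinct residues - 1), ignoring gaps."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    counts = []
    for col in range(length):
        residues = {seq[col] for seq in alignment if seq[col] != "-"}
        counts.append(max(len(residues) - 1, 0))
    return counts


def relative_rates(counts):
    """Scale per-column counts by their mean so that the average rate is 1."""
    mean = sum(counts) / len(counts)
    if mean == 0:
        return [0.0] * len(counts)
    return [c / mean for c in counts]


def constrained_windows(rates, window=2, threshold=0.5):
    """Return start indices of windows whose mean relative rate is below threshold."""
    hits = []
    for start in range(len(rates) - window + 1):
        chunk = rates[start:start + window]
        if sum(chunk) / window < threshold:
            hits.append(start)
    return hits


# Hypothetical toy alignment of four homologs (not real sequences).
toy_alignment = ["MKCCPA", "MKCCSA", "MRCCTA", "MKCC-G"]
toy_counts = column_min_substitutions(toy_alignment)
toy_hits = constrained_windows(relative_rates(toy_counts), window=2, threshold=0.5)
```

In this toy case the only window flagged as constrained starts at the invariant cysteine pair. A realistic analysis would replace the column count with substitution numbers inferred on a phylogenetic tree, as the text describes.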
Genome Research, 2007
The initial comparison of the human and chimpanzee genome sequences revealed 16 genomic regions with an unusually high density of rapidly evolving genes. One such region is the whey acidic protein (WAP) four-disulfide core domain locus (or WFDC locus), which contains 14 WFDC genes organized in two subloci on human chromosome 20q13. WAP protease inhibitors have roles in innate immunity and/or the regulation of a group of endogenous proteolytic enzymes called kallikreins. In human, the centromeric WFDC sublocus also contains the rapidly evolving seminal genes, semenogelin 1 and 2 (SEMG1 and SEMG2). The rate of SEMG2 evolution in primates has been proposed to correlate with female promiscuity and semen coagulation, perhaps related to post-copulatory sperm competition. We mapped and sequenced the centromeric WFDC sublocus in 12 primate species that collectively represent four different mating systems. Our analyses reveal a 130-kb region with a notably complex evolutionary history that has included nested duplications, deletions, and significant interspecies divergence of both coding and noncoding sequences; together, this has led to striking differences of this region among primates and between primates and rodents. Further, this region contains six closely linked genes (WFDC12, PI3, SEMG1, SEMG2, SLPI, and MATN4) that show strong patterns of adaptive selection, although an unambiguous correlation between gene mutation rates and mating systems could not be established.
A classification of disulfide patterns and its relationship to protein structure and function
Protein Science, 2004
We report a detailed classification of disulfide patterns to further understand the role of disulfides in protein structure and function. The classification is applied to a unique searchable database of disulfide patterns derived from the SwissProt and Pfam databases. The disulfide database contains seven times the number of publicly available disulfide annotations. Each disulfide pattern in the database captures the topology and cysteine spacing of a protein domain. We have clustered the domains by their disulfide patterns and visualized the results using a novel representation termed the “classification wheel.” The classification is applied to 40,620 protein domains with 2–10 disulfides. The effectiveness of the classification is evaluated by determining the extent to which proteins of similar structure and function are grouped together through comparison with the SCOP and Pfam databases, respectively. In general, proteins with similar disulfide patterns have similar structure and function, even in cases of low sequence similarity, and we illustrate this with specific examples. Using a measure of disulfide topology complexity, we find that there is a predominance of less complex topologies. We also explored the importance of loss or addition of disulfides to protein structure and function by linking classification wheels through disulfide subpattern comparisons. This classification, when coupled with our disulfide database, will serve as a useful resource for searching and comparing disulfide patterns, and understanding their role in protein structure, folding, and stability. Proteins in the disulfide clusters that do not contain structural information are prime candidates for structural genomics initiatives, because they may correspond to novel structures.
Comparative sequence analysis of the mRNAs coding for mouse and rat whey protein
Nucleic Acids Research, 1982
Whey acidic protein (WAP) is a major milk protein found in mouse and rat. Cloned WAP cDNAs from both species have been sequenced and the respective protein sequences have been deduced. Mouse and rat WAP (134 and 137 amino acids respectively) are acidic, cysteine-rich proteins which contain an N-terminal signal peptide of 19 amino acids. Most of the cysteines are located in two clusters containing six cysteine residues each, arranged in an identical pattern. Comparison of the mouse and rat WAPs shows that the signal peptide and the first cysteine domain are conserved to a greater extent than the rest of the protein. This result is reflected in the nucleotide sequence homology, where the regions coding for the signal peptide and cysteine domain I are the only regions where the rate of replacement substitution is lower than the rate of silent substitution. The 3' non-coding regions show 91% conservation, a substitution rate half that of the coding region. This low rate of sequence divergence in the 3' non-translated region of the mRNA may indicate a functional importance for this region.
Journal of Molecular Biology, 2004
Disulfide bonds are conserved strongly among proteins of related structure and function. Despite the explosive growth of protein sequence databases and the vast numbers of sequence search tools, no tool exists to draw relations between the disulfide patterns of homologous proteins. We present a comprehensive database of disulfide bonding patterns and a search method to find proteins with similar disulfide patterns. The disulfide database was constructed using disulfide annotations extracted from SwissProt, and was expanded significantly from 16,736 to 94,499 disulfide-containing domains by an inference method that combines SwissProt annotations with Pfam multiple alignments. To search the database, we define a disulfide description, called the disulfide signature, which encodes both spacings between cysteine residues and cysteine connectivity. A web tool was developed that allows users to search for related disulfide patterns and for subpatterns resulting from the removal of one or more disulfides from the pattern. We explore the possibility of using disulfide pattern conservation to identify protein homologs that are undetectable by PSI-BLAST. Examples include the homology between a sea anemone antihypertensive/antiviral protein and a sea anemone neurotoxin, and the homology between tick anticoagulant peptide and bovine trypsin inhibitor. In both examples, there is a clear structural similarity and a functional relationship. We used the database to find structural homologs for the Cripto CFC domain. The identification of a von Willebrand Factor C (VWFC)-like domain agrees with its functional role and explains mutation data. We believe that the rapid increase in structure determinations arising from structural genomics efforts and advances in mass spectrometry techniques will greatly increase the number of disulfide annotations. This information will become a valuable resource for structural and functional annotations of proteins. 
The availability of a searchable disulfide pattern database will thus provide a powerful new addition to existing homolog discovery methods.
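The abstract describes a disulfide signature that encodes both the spacings between cysteine residues and their connectivity, but does not give the exact encoding. The sketch below shows one plausible representation under that description. The function name and the tuple layout are hypothetical, and the example uses the commonly cited disulfide pairing of bovine pancreatic trypsin inhibitor (Cys5-Cys55, Cys14-Cys38, Cys30-Cys51).

```python
def disulfide_signature(bonds):
    """Build a (spacings, connectivity) pair from bonded cysteine positions.

    bonds: list of (pos_a, pos_b) residue numbers, one tuple per disulfide.
    spacings: sequence distances between consecutive cysteines.
    connectivity: bonded pairs expressed as 1-based cysteine ranks.
    """
    positions = sorted(p for bond in bonds for p in bond)
    if len(set(positions)) != len(positions):
        raise ValueError("each cysteine may take part in only one disulfide")
    spacings = tuple(b - a for a, b in zip(positions, positions[1:]))
    rank = {pos: i + 1 for i, pos in enumerate(positions)}
    connectivity = tuple(
        sorted(tuple(sorted((rank[a], rank[b]))) for a, b in bonds)
    )
    return spacings, connectivity


# Disulfides of bovine pancreatic trypsin inhibitor (residue numbering).
bpti_bonds = [(5, 55), (14, 38), (30, 51)]
bpti_spacings, bpti_connectivity = disulfide_signature(bpti_bonds)
```

Because the connectivity is expressed in cysteine ranks rather than residue numbers, two domains with different lengths can still share the same connectivity tuple, and the spacings can then be compared with whatever tolerance a search requires.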
Biochemical Journal, 2001
The aim of the present study was to identify the functional domains of the upstream region of the rabbit whey acidic protein (WAP) gene, which has been used with considerable efficacy to target the expression of several foreign genes to the mammary gland. We have shown that this region exhibits three sites hypersensitive to DNase I digestion in the lactating mammary gland, and that all three sites harbour elements which can bind to Stat5 in vitro in bandshift assays. However, not all hypersensitive regions are detected at all stages from pregnancy to weaning, and the level of activated Stat5 detected in the rabbit mammary gland is low except during lactation. We have studied the role of the distal site, which is only detected during lactation, in further detail. It is located within an 849 bp region that is
Structural Classification of Small, Disulfide-rich Protein Domains
Journal of Molecular Biology, 2006
Disulfide-rich domains are small protein domains whose global folds are stabilized primarily by the formation of disulfide bonds and, to a much lesser extent, by secondary structure and hydrophobic interactions. Disulfide-rich domains perform a wide variety of roles functioning as growth factors, toxins, enzyme inhibitors, hormones, pheromones, allergens, etc. These domains are commonly found both as independent (single-domain) proteins and as domains within larger polypeptides. Here, we present a comprehensive structural classification of approximately 3000 small, disulfide-rich protein domains. We find that these domains can be arranged into 41 fold groups on the basis of structural similarity. Our fold groups, which describe broader structural relationships than existing groupings of these domains, bring together representatives with previously unacknowledged similarities; 18 of the 41 fold groups include domains from several SCOP folds. Within the fold groups, the domains are assembled into families of homologs. We define 98 families of disulfide-rich domains, some of which include newly detected homologs, particularly among knottin-like domains. On the basis of this classification, we have examined cases of convergent and divergent evolution of functions performed by disulfide-rich proteins. Disulfide bonding patterns in these domains are also evaluated. Reducible disulfide bonding patterns are much less frequent, while symmetric disulfide bonding patterns are more common than expected from random considerations. Examples of variations in disulfide bonding patterns found within families and fold groups are discussed.
The complete complement of C1q-domain-containing proteins in Homo sapiens
Genomics, 2005
The C-terminal domains of the A, B, C chains of C1q subcomponent of C1 complex represent a common structural motif, the C1q domain, that is found in a diverse range of proteins. We analyzed the human genome for the complete complement of this family and have identified a total of 31 independent gene sequences. The predominant organization of C1q-domain-containing (C1qDC) proteins includes a leading signal peptide, a collagen-like region of variable length, and a C-terminal C1q domain. There are 15 highly conserved residues within the C1q domain, among which 8 are invariant within the human gene set and these are predicted to cluster within the hydrophobic core of the protein. We suggest a 3-subfamily classification based on sequence homology. For some C1qDC-encoding genes, strict orthology has been retained throughout vertebrate evolution and these examples suggest a highly specific functional role for C1qDC proteins that has been under significant selective pressure. Alternatively, individual species have co-opted C1qDC proteins for roles that are highly specific to their biology, suggesting an evolutionary strategy of gene duplication and functional diversification. A more extensive analysis of the evolutionary relationship of C1qDC proteins reveals an ancient rooting, with clear members found in eubacterial species. Curiously, we have been unable to identify C1qDC-encoding genes in many eukaryotic genomes, such as Saccharomyces cerevisiae and C. elegans, suggesting that the retention or loss of this gene family throughout evolution has been sporadic.
Analytical Chemistry, 2008
Cross-linking can be used to identify spatial relationships between amino acids in proteins or protein complexes. A rapid and sensitive method for identifying the site of protein cross-linking using dithiobis(sulfosuccinimidyl propionate) (DTSSP) is presented and illustrated with experiments using murine cortactin, actin and acyl-CoA thioesterase. A characteristic 66 Da doublet, which arises from the asymmetric fragmentation of the disulfide of DTSSP-modified peptides, is observed in the mass spectra obtained under MALDI-TOF/TOF-MS conditions and allows rapid assignment of cross-links in modified proteins. This doublet is observed not only for linear cross-linked peptides but also in the mass spectra of cyclic cross-linked peptides when simultaneous fragmentation of the disulfide and the peptide backbone occurs. We suggest a likely mechanism for this fragmentation. We use guanidinylation of the cross-linked peptides with O-methyl isourea to extend the coverage of cross-linked peptides observed in this MALDI-MS technique. The methodology we report is robust and amenable to automation, and permits the analysis of native cystines along with those introduced by disulfide-containing cross-linkers.
Proteins associated with diseases show enhanced sequence correlation between charged residues
Bioinformatics, 2004
Motivation: Function of proteins or a network of interacting proteins often involves communication between residues that are well separated in sequence. The classic example is the participation of distant residues in allosteric regulation. Bioinformatic and structural analysis methods have been introduced to infer residues that are correlated. Recently, increasing attention has been paid to obtaining the sequence properties that determine the tendency of disease-related proteins (Aβ peptides, prion proteins, transthyretin, etc.) to aggregate and form fibrils. Motivated in part by the need to identify sequence characteristics that indicate a tendency to aggregate, we introduce a general method that probes covariations in charged residues along the sequence in a given protein family. The method, which involves computing the sequence correlation entropy (SCE) using the quenched probability $P_{s_k}(i, j)$ of finding a residue pair at a given sequence separation, $s_k$, allows us to classify protein families in terms of their SCE. Our general approach may be a useful way of obtaining evolutionary covariations of amino acid residues on a genome-wide level. Results: We use a combination of SCE and clustering based on principal component analysis to classify the protein families. From an analysis of 839 families, covering ∼500 000 sequences, we find that proteins with relatively low values of SCE are predominantly associated with various diseases. In several families, residues that give rise to peaks in $P_{s_k}(i, j)$ are clustered in the three-dimensional structure. For the class of proteins with low SCE values, there are significant numbers of mixed charged-hydrophobic (CH) and charged-polar (CP) runs. Our findings suggest that the low values of SCE and the presence of (CH) and/or (CP) may be indicative of disease association or tendency to aggregate. Our results led to the hypothesis that functions of proteins with similar SCE values may be linked.
The hypothesis is validated with a few anecdotal examples.
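The abstract does not give the formula for the sequence correlation entropy, so the following is only a rough sketch of the general idea: estimate the probability of each charged residue pair at a fixed sequence separation across a family, then take the Shannon entropy of that distribution. The charge table (K and R positive, D and E negative, histidine ignored), the function names and the toy sequences are all assumptions for illustration and should not be read as the authors' definition of SCE.

```python
import math
from collections import Counter

# Assumed charge classes; histidine is left out for simplicity.
CHARGE = {"K": "+", "R": "+", "D": "-", "E": "-"}


def charged_pair_probabilities(sequences, separation):
    """Estimate the probability of each charge pair at a given separation."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - separation):
            first = CHARGE.get(seq[i])
            second = CHARGE.get(seq[i + separation])
            if first and second:
                counts[(first, second)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}


def shannon_entropy(probabilities):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probabilities.values() if p > 0)


# Hypothetical toy family of two short sequences.
toy_family = ["KAEK", "DAKE"]
toy_probs = charged_pair_probabilities(toy_family, separation=2)
toy_entropy = shannon_entropy(toy_probs)
```

A low entropy in this sketch means that a few charge pairings dominate at that separation, which is the kind of covariation signal the abstract associates with low SCE values.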
Genes
Prediabetes is a reversible, intermediate stage of type 2 diabetes mellitus (T2DM). Lifestyle changes that include healthy diet and exercise can substantially reduce progression to T2DM. The present study explored the association of 37 T2DM- and obesity-linked single nucleotide polymorphisms (SNPs) with prediabetes risk in a homogenous Saudi Arabian population. A total of 1129 Saudi adults [332 with prediabetes (29%) and 797 normoglycemic controls] were randomly selected and genotyped using the KASPar SNP genotyping method. Anthropometric and various serological parameters were measured following standard procedures. Heterozygous GA of HNF4A-rs4812829 (0.64; 95% CI 0.47–0.86; p < 0.01), heterozygous TC of WFS1-rs1801214 (0.60; 95% confidence interval (CI) 0.44–0.80; p < 0.01), heterozygous GA of DUSP9-rs5945326 (0.60; 95% CI 0.39–0.92; p = 0.01), heterozygous GA of ZFAND6-rs11634397 (0.75; 95% CI 0.56–1.01; p = 0.05), and homozygous AA of FTO-rs11642841 (1.50; 95% CI 0.8–1.45;...
Reproduction and immunity-driven natural selection in the human WFDC locus.
The whey acidic protein (WAP) four-disulfide core domain (WFDC) locus located on human chromosome 20q13 spans 19 genes with WAP and/or Kunitz domains. These genes participate in antimicrobial, immune, and tissue homoeostasis activities. Neighboring SEMG genes encode seminal proteins Semenogelin 1 and 2 (SEMG1 and SEMG2). WFDC and SEMG genes have a strikingly high rate of amino acid replacement (dN/dS), indicative of responses to adaptive pressures during vertebrate evolution. To better understand the selection pressures acting on WFDC genes in human populations, we resequenced 18 genes and 54 noncoding segments in 71 European (CEU), African (YRI), and Asian (CHB + JPT) individuals. Overall, we identified 484 single-nucleotide polymorphisms (SNPs), including 65 coding variants (of which 49 are nonsynonymous differences). Using classic neutrality tests, we confirmed the signature of short-term balancing selection on WFDC8 in Europeans and a signature of positive selection spanning genes PI3, SEMG1, SEMG2, and SLPI. Associated with the latter signal, we identified an unusually homogeneous-derived 100-kb haplotype with a frequency of 88% in Asian populations. A putative candidate variant targeted by selection is Thr56Ser in SEMG1, which may alter the proteolytic profile of SEMG1 and antimicrobial activities of semen. All the well-characterized genes residing in the WFDC locus encode proteins that appear to have a role in immunity and/or fertility, two processes that are often associated with adaptive evolution. This study provides further evidence that the WFDC and SEMG loci have been under strong adaptive pressure within the short timescale of modern humans.
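The abstract refers to classic neutrality tests without listing them. Tajima's D is one widely used example and is shown here purely as an illustrative assumption about what such a test computes: it contrasts the mean number of pairwise differences with the number of segregating sites. The haplotype strings below are hypothetical toy data, not variants from the study.

```python
import math
from itertools import combinations


def tajimas_d(haplotypes):
    """Tajima's D for equal-length haplotype strings (one character per site)."""
    n = len(haplotypes)
    if n < 4:
        # The variance estimate vanishes for fewer than four sequences.
        raise ValueError("need at least 4 sequences")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("all haplotypes must have the same length")
    # Segregating sites: columns where more than one allele is observed.
    seg_sites = [c for c in range(length) if len({h[c] for h in haplotypes}) > 1]
    s = len(seg_sites)
    if s == 0:
        raise ValueError("no segregating sites; D is undefined")
    # pi: mean number of differences over all pairs of haplotypes.
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(x[c] != y[c] for c in seg_sites) for x, y in pairs) / len(pairs)
    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical toy samples: rare variants only vs. intermediate-frequency variants.
d_rare = tajimas_d(["000", "100", "010", "001"])
d_balanced = tajimas_d(["000", "000", "111", "111"])
```

An excess of rare variants gives a negative D, as often seen after a selective sweep or population expansion, while an excess of intermediate-frequency variants gives a positive D, the pattern usually associated with balancing selection.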
Characterization of Five Novel Human Genes in the 11q13-q22 Region
Biochemical and Biophysical Research Communications, 2000
The redundancy of sequences in dbEST has approached a level where contiguous cDNA sequences of genes can be assembled without the need to physically handle the clones from which the ESTs are derived. This is termed EST-based in silico gene cloning. With the availability of sequence chromatogram files for a subset of ESTs, the quality of EST sequences can be ascertained accurately and used in contig assembly. In this report, we performed a study using this approach and isolated five novel human genes, C11orf1-C11orf5, in the 11q13-q22 region. The full open reading frames of these genes were determined by comparison with their orthologs, of which four mouse orthologs were isolated (c11orf1, c11orf2, c11orf3 and c11orf5). These genes were then analyzed using several proteomics tools. Both C11orf1 and C11orf2 are nuclear proteins with no other distinguishing features. C11orf3 is a cytoplasmic protein containing an ATP/GTP binding site, a signal peptide located in the N-terminus and a similarity to the C. elegans protein "Probable ARP 2/3 complex 20kD subunit." C11orf4 is a peptide which displays four putative transmembrane domains and is predicted to have a cytoplasmic localization. It contains signal peptides at the N- and C-termini. C11orf5 is a putative nuclear protein displaying a central coiled coil domain. Here, we propose that this purely EST-based cloning approach can be used by modestly sized laboratories to rapidly and accurately characterize and map a significant number of human genes without the need for further sequencing.