The Human Whey-acidic-protein Four-Disulfide Core domain (WFDC) cluster on 20q13 region: evolutionary history and role in human health and disease
Related papers
Molecular evolution of monotreme and marsupial whey acidic protein genes
Evolution & Development, 2007
SUMMARY Whey acidic protein (WAP), a major whey protein present in milk of a number of mammalian species, has characteristic cysteine‐rich domains known as four‐disulfide cores (4‐DSC). Eutherian WAP, expressed in the mammary gland throughout lactation, has two 4‐DSC domains (DI–DII), whereas marsupial WAP, expressed only during mid‐late lactation, contains an additional 4‐DSC (DIII) and has a DIII–DI–DII configuration. We report the expression and evolution of echidna (Tachyglossus aculeatus) and platypus (Ornithorhynchus anatinus) WAP cDNAs. Predicted translation of monotreme cDNAs showed echidna WAP contains two 4‐DSC domains corresponding to DIII–DII, whereas platypus WAP contains an additional domain at the C‐terminus with homology to DII and has the configuration DIII–DII–DII. Both monotreme WAPs represent new WAP protein configurations. We propose models for evolution of the WAP gene in the mammalian lineage either through exon loss from an ancient ancestor or by rapid evoluti...
Pig whey acidic protein gene is surrounded by two ubiquitously expressed genes
Biochimica et Biophysica Acta (BBA) - Gene Structure and Expression, 2003
A 140-kb pig DNA fragment containing the whey acidic protein (WAP) gene cloned in a bacterial artificial chromosome (BAC344H5) has been shown to contain all of the cis-elements necessary for position-independent, copy-dependent and tissue-specific expression in transgenic mice. The insert from this BAC was sequenced. This revealed the presence of two other genes with quite different expression patterns in pig tissues and in transfected HC11 mouse mammary cells. The RAMP3 gene is located 15 kb upstream of the WAP gene in reverse orientation. The CPR2 gene is located 5 kb downstream of the WAP gene in the same orientation. The same locus organization was found in the human genome. The region between RAMP3 and CPR2 in the human genome contains a WAP gene-like sequence with several points of mutation which may account for the absence of WAP from human milk.
Ruminants genome no longer contains Whey Acidic Protein gene but only a pseudogene
Gene, 2006
Whey Acidic Protein (WAP) has been identified in the milk of only a few species, including mouse, rat, rabbit, camel, pig, tammar wallaby, brushtail possum, echidna and platypus. Despite intensive studies, it has not yet been found in the milk of Ruminants. We have isolated and characterized genomic WAP clones from ewe, goat and cow, identified their chromosomal localization and examined the expression of the endogenous WAP sequence in the mammary glands of all three species. The WAP sequences were localized on chromosome 4 (4q26) as expected from comparative mapping data. The three ruminant WAP sequences reveal the same deletion of a nucleotide at the end of the first exon when compared with the pig sequence. Due to this frameshift mutation, the putative proteins encoded by these sequences do not harbor the features of a usual WAP protein with two four-disulfide core domains. Moreover, RT-PCR experiments have shown that these sequences are not transcribed and are, thus, pseudogenes. This loss of functionality of the gene in Ruminants raises the question of the biological role of the WAP. Some putative roles previously suggested for WAP are discussed.
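The frameshift mechanism mentioned in this abstract can be illustrated with a minimal Python sketch. The sequence below is a made-up toy string, not the ruminant or pig WAP sequence, and the helper names `codons` and `delete_base` are hypothetical. The sketch only shows how deleting a single nucleotide changes every downstream codon.

```python
def codons(seq):
    """Split a nucleotide string into complete codons, dropping any leftover bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def delete_base(seq, index):
    """Return a copy of seq with the single base at `index` removed."""
    return seq[:index] + seq[index + 1:]


# Toy sequence (hypothetical, for illustration only).
reference = "ATGTGCCCGTGCAAA"
mutant = delete_base(reference, 3)  # remove the first base of the second codon

print(codons(reference))  # reading frame before the deletion
print(codons(mutant))     # every codon after the deletion site is shifted
```

Because all codons downstream of the deletion are read in a different frame, the predicted protein no longer matches the reference, which is consistent with the abstract's statement that the putative ruminant proteins lack the usual WAP features.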
Evolution & Development, 2009
Whey acidic protein (WAP) belongs to a family of four disulfide core (4-DSC) proteins rich in cysteine residues and is the principal whey protein found in milk of a number of mammalian species. Eutherian WAPs have two 4-DSC domains, whereas marsupial WAPs are characterized by the presence of an additional domain at the amino terminus. Structural and expression differences between marsupial and eutherian WAPs have presented challenges to identifying physiological functions of the WAP protein. We have characterized the genomic structure of tammar WAP (tWAP) gene, identified its chromosomal localization and investigated the potential function of tWAP. We have demonstrated that tWAP and domain III (DIII) of the protein alone stimulate proliferation of a mouse mammary epithelial cell line (HC11) and primary cultures of tammar mammary epithelial cells (Wall-MEC), whereas deletion of DIII from tWAP abolishes this proliferative effect. However, tWAP does not induce proliferation of human embryonic kidney (HEK293) cells. DNA synthesis and expression of cyclin D1 and cyclin-dependent kinase-4 genes were significantly up-regulated when Wall-MEC and HC11 cells were grown in the presence of either tWAP or DIII. These data suggest that DIII is the functional domain of the tWAP protein and that evolutionary pressure has led to the loss of this domain in eutherians, most likely as a consequence of adopting a reproductive strategy that relies on greater investment in development of the newborn during pregnancy.
BMC Genetics, 2013
The pH is an important parameter influencing the technological quality of pig meat, a trait affected by environmental and genetic factors. Several quantitative trait loci associated with meat pH are described in the PigQTL database, but only two genes influencing this parameter have so far been detected: Ryanodine receptor 1 and Protein kinase, AMP-activated, gamma 3 non-catalytic subunit. To search for genes influencing meat pH we analyzed genomic regions with quantitative effect on this trait in order to detect SNPs to use for an association study. Results: The expressed sequences mapping on porcine chromosomes 1, 2, 3 in regions associated with pork pH were searched in silico to find SNPs. 356 out of 617 detected SNPs were used to genotype Italian Large White pigs and to perform an association analysis with meat pH values recorded in semimembranosus muscle at about 1 hour (pH1) and 24 hours (pHu) post mortem. The results of the analysis showed that 5 markers mapping on chromosomes 1 or 3 were associated with pH1 and 10 markers mapping on chromosomes 1 or 2 were associated with pHu. After False Discovery Rate correction only one SNP mapping on chromosome 2 was confirmed to be associated with pHu. This polymorphism was located in the 3'UTR of two partly overlapping genes, Deoxyhypusine synthase (DHPS) and WD repeat domain 83 (WDR83). The overlapping of the 3'UTRs allows the co-regulation of mRNA stability by a cis-natural antisense transcript method of regulation. DHPS catalyzes the first step in the formation of hypusine, a unique amino acid formed by the posttranslational modification of a specific lysine residue in the protein eukaryotic translation initiation factor 5A. WDR83 has an important role in the modulation of a cascade of genes involved in cellular hypoxia defense by intensifying the glycolytic pathway and, theoretically, the meat pH value.
Aminode: Identification of Evolutionary Constraints in the Human Proteome
Evolutionarily constrained regions (ECRs) are a hallmark for sites of critical importance for a protein's structure or function. ECRs can be inferred by comparing the amino acid sequences from multiple protein homologs in the context of the evolutionary relationships that link the analyzed proteins. The compilation and analysis of the datasets required to infer ECRs, however, are time consuming and require skills in coding and bioinformatics, which can limit the use of ECR analysis in the biomedical community. Here, we developed Aminode, a user-friendly webtool for the routine and rapid inference of ECRs. Aminode is pre-loaded with the results of the analysis of the whole human proteome compared with proteomes from 62 additional vertebrate species. Profiles of the relative rates of amino acid substitution and ECR maps of human proteins are available for immediate search and download on the Aminode website. Aminode can also be used for custom analyses of protein families of interest. Interestingly, mapping of known missense variants shows great enrichment of pathogenic variants and depletion of non-pathogenic variants in Aminode-generated ECRs, suggesting that ECR analysis may help evaluate the potential pathogenicity of variants of unknown significance. Aminode is freely available at http://www.aminode.org.
Evolutionary changes along a protein sequence occur at rates that are inversely correlated with the strength of specific constraints at each site. Constrained regions are considered to be under functional constraint owing to a role in protein stability, post-translational modifications, subcellular localization, interaction with other molecules, or enzymatic function [1–4]. Because constraint can vary widely along a given protein sequence, profiling the rates of evolutionary changes can provide information useful to identify the key residues or domains of the protein.
Several studies have shown that evolutionarily constrained regions (ECRs) can pinpoint the position of residues that are relevant for the function of enzymes or other protein types and can even provide significant information to predict the effects of specific mutations [5–11]. Therefore, the identification of ECRs may help inform investigation and experimental design of protein studies. For example, profiling evolutionary constraint can indicate regions to avoid or to target for protein tagging when the function or interactions of the protein must be preserved. Conversely, highly constrained regions might be an excellent choice for functional studies based on mutagenesis analysis [7,8,12]. In the absence of prior experimental data, the identification of ECRs may indeed point towards candidate positions in a protein that, if mutated, may have a deleterious effect on the protein function. The underlying reasoning is that if a site has been refractory to changes over long periods of evolutionary time, as inferred from a comparison of numerous and distantly related taxa, any change at that site is likely deleterious [13,14]. Effective methods of profiling a set of homologous proteins to determine ECRs require the simultaneous analysis of amino acid sequences and phylogenetic relationships of the proteins under examination [15,16]. A general approach to identify ECRs consists of a multi-step procedure [15]: First, orthologs of the protein of interest are selected and a multiple alignment is generated to allow the measurement of the relative rate of substitution at each protein position. Depending on the analysis to be performed, paralogs may also be included: closely related paralogs if the analysis is focused on specific structural features of the protein under examination, or both close and distant paralogs if the analysis is aimed at identifying general constraints of the protein family [5,15].
Next, the number of substitutions that have occurred at each protein position is computed based on the phylogenetic
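The idea of scoring each alignment position and then looking for stretches of low variability can be sketched in Python. This is not Aminode's method: the text describes substitution counts computed along a phylogeny, whereas the sketch below swaps in a much cruder proxy (the number of distinct residues per alignment column, ignoring the tree). The toy alignment, the function names, and the `window` and `max_mean` parameters are all assumptions made for illustration.

```python
from typing import List


def column_variability(alignment: List[str]) -> List[int]:
    """For each column, count distinct residues (gaps ignored) minus one.

    A score of 0 means the column is invariant in this alignment.
    This is a tree-free proxy, not a phylogeny-based substitution count.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for col in range(length):
        residues = {seq[col] for seq in alignment if seq[col] != "-"}
        scores.append(max(len(residues) - 1, 0))
    return scores


def constrained_windows(scores: List[int], window: int = 2,
                        max_mean: float = 0.0) -> List[int]:
    """Return start indices of windows whose mean variability is <= max_mean."""
    starts = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean <= max_mean:
            starts.append(start)
    return starts


# Hypothetical toy alignment of four homologs (not real protein data).
toy_alignment = [
    "MKCPAGLT",
    "MKCPSGLS",
    "MRCPTGLA",
    "MKCPAGIT",
]

scores = column_variability(toy_alignment)
print(scores)
print(constrained_windows(scores, window=2))
```

In this toy case the only fully invariant two-residue window starts at the third column. A real analysis would weight changes by the evolutionary relationships between the sequences, as the text states.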
Genome Research, 2007
The initial comparison of the human and chimpanzee genome sequences revealed 16 genomic regions with an unusually high density of rapidly evolving genes. One such region is the whey acidic protein (WAP) four-disulfide core domain locus (or WFDC locus), which contains 14 WFDC genes organized in two subloci on human chromosome 20q13. WAP protease inhibitors have roles in innate immunity and/or the regulation of a group of endogenous proteolytic enzymes called kallikreins. In human, the centromeric WFDC sublocus also contains the rapidly evolving seminal genes, semenogelin 1 and 2 (SEMG1 and SEMG2). The rate of SEMG2 evolution in primates has been proposed to correlate with female promiscuity and semen coagulation, perhaps related to post-copulatory sperm competition. We mapped and sequenced the centromeric WFDC sublocus in 12 primate species that collectively represent four different mating systems. Our analyses reveal a 130-kb region with a notably complex evolutionary history that has included nested duplications, deletions, and significant interspecies divergence of both coding and noncoding sequences; together, this has led to striking differences of this region among primates and between primates and rodents. Further, this region contains six closely linked genes (WFDC12, PI3, SEMG1, SEMG2, SLPI, and MATN4) that show strong patterns of adaptive selection, although an unambiguous correlation between gene mutation rates and mating systems could not be established.
Analysis of splice variants of the human protein disulfide isomerase (P4HB) gene
BMC Genomics, 2020
Background: Protein Disulfide Isomerases are thiol oxidoreductase chaperones from thioredoxin superfamily with crucial roles in endoplasmic reticulum proteostasis, implicated in many diseases. The family prototype PDIA1 is also involved in vascular redox cell signaling. PDIA1 is coded by the P4HB gene. While forced changes in P4HB gene expression promote physiological effects, little is known about endogenous P4HB gene regulation and, in particular, gene modulation by alternative splicing. This study addressed the P4HB splice variant landscape. Results: Ten protein coding sequences (Ensembl) of the P4HB gene originating from alternative splicing were characterized. Structural features suggest that except for P4HB-021, other splice variants are unlikely to exert thiol isomerase activity at the endoplasmic reticulum. Extensive analyses using FANTOM5, ENCODE Consortium and GTEx project databases as RNA-seq data sources were performed. These indicated widespread expression but significant variability i...
A classification of disulfide patterns and its relationship to protein structure and function
Protein Science, 2004
We report a detailed classification of disulfide patterns to further understand the role of disulfides in protein structure and function. The classification is applied to a unique searchable database of disulfide patterns derived from the SwissProt and Pfam databases. The disulfide database contains seven times the number of publicly available disulfide annotations. Each disulfide pattern in the database captures the topology and cysteine spacing of a protein domain. We have clustered the domains by their disulfide patterns and visualized the results using a novel representation termed the “classification wheel.” The classification is applied to 40,620 protein domains with 2–10 disulfides. The effectiveness of the classification is evaluated by determining the extent to which proteins of similar structure and function are grouped together through comparison with the SCOP and Pfam databases, respectively. In general, proteins with similar disulfide patterns have similar structure and function, even in cases of low sequence similarity, and we illustrate this with specific examples. Using a measure of disulfide topology complexity, we find that there is a predominance of less complex topologies. We also explored the importance of loss or addition of disulfides to protein structure and function by linking classification wheels through disulfide subpattern comparisons. This classification, when coupled with our disulfide database, will serve as a useful resource for searching and comparing disulfide patterns, and understanding their role in protein structure, folding, and stability. Proteins in the disulfide clusters that do not contain structural information are prime candidates for structural genomics initiatives, because they may correspond to novel structures.
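The statement that a disulfide pattern captures both topology and cysteine spacing can be made concrete with a short Python sketch. The encoding below is one plausible representation chosen for illustration, not the schema of the database described in the abstract. The function name `disulfide_pattern` and the toy sequence are hypothetical.

```python
def disulfide_pattern(sequence, bonds):
    """Summarize disulfide bonds as (topology, spacing).

    bonds: list of (i, j) pairs of 0-based residue indices of bonded cysteines.
    topology: bonds expressed by cysteine rank along the chain, e.g. (1, 3).
    spacing: number of residues between consecutive bonded cysteines.
    """
    bonded = sorted({pos for pair in bonds for pos in pair})
    if len(bonded) != 2 * len(bonds):
        raise ValueError("each cysteine may take part in only one bond")
    for pos in bonded:
        if sequence[pos] != "C":
            raise ValueError(f"residue {pos} is not a cysteine")
    rank = {pos: n + 1 for n, pos in enumerate(bonded)}
    topology = sorted(tuple(sorted((rank[i], rank[j]))) for i, j in bonds)
    spacing = [b - a - 1 for a, b in zip(bonded, bonded[1:])]
    return topology, spacing


# Hypothetical toy peptide with two disulfide bonds (not a real protein).
toy_sequence = "ACGGCAAACKKC"
toy_bonds = [(1, 8), (4, 11)]

topology, spacing = disulfide_pattern(toy_sequence, toy_bonds)
print(topology)
print(spacing)
```

Two domains with the same topology and similar spacing would be grouped together under such an encoding even if their other residues differ, which is in the spirit of the abstract's observation about low sequence similarity.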