Complete Haplotype Sequence of the Human Immunoglobulin Heavy-Chain Variable, Diversity, and Joining Genes and Characterization of Allelic and Copy-Number Variation
Related papers
BMC Genomics, 2011
Background: Segmental duplication and deletion were implicated for a region containing the human immunoglobulin heavy chain variable (IGHV) gene segments 1.9III/hv3005 (possible allelic variants of IGHV3-30) and hv3019b9 (a possible allelic variant of IGHV3-33). However, very little is known about the extent of the duplication and of the polymorphic region. This is mainly because it is difficult to distinguish between allelic and paralogous sequences in the IGHV region, which contains extensive repetitive sequences. The inability to separate the two parental haploid genomes in the subjects is another serious barrier. To address these issues, unique DNA sequence tags evenly distributed within and flanking the duplicated region implicated by the previous studies were selected. The selected tags in single sperm from six unrelated healthy donors were amplified by multiplex PCR followed by microarray detection. In this way, individual haplotypes of different parental origins in the sperm donors could be analyzed separately and precisely. The identified polymorphic region was further analyzed at the nucleotide sequence level using sequences from the three human genomic sequence assemblies in the database.
Results: A large polymorphic region was identified using the selected sequence tags. Four of the 12 haplotypes were shown to contain consecutive undetectable tags spanning a variable range. Detailed analysis of sequences from the genomic sequence assemblies revealed two large duplicate sequence blocks of 24,696 bp and 24,387 bp, respectively, and an incomplete copy of 961 bp in this region. It contains up to 13 IGHV gene segments depending on haplotype. A polymorphic region was found to be located within the duplicated blocks. The variants of this polymorphism diverged unusually at the nucleotide sequence level and in IGHV gene segment number, composition and organization, indicating a limited selection pressure in general. However, the divergence level within the gene segments is significantly different from that in the intergenic regions, indicating that these regions may have been subject to different selection pressures and that the IGHV gene segments in this region are functionally important.
Conclusions: Non-reciprocal genetic rearrangements associated with large duplicate sequence blocks could substantially contribute to the IGHV region diversity. Since the resulting polymorphisms may affect the number, composition and organization of the gene segments in this region, they may have a significant impact on the function of the IGHV gene segment repertoire, antibody diversity, and therefore the immune system. Because one of the gene segments, 3-30 (1.9III), is associated with autoimmune diseases, it could be of diagnostic significance to learn about the variants in the haplotypes by using the multiplex haplotype analysis system used in the present study with DNA sequence tags specific for the variants of all gene segments in this region.
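The observation that some haplotypes contain consecutive undetectable tags can be illustrated with a minimal sketch. It assumes each haplotype is represented as a list of booleans, one per sequence tag, ordered along the chromosome; the function name `undetected_runs` and the example data are hypothetical and are not taken from the study.

```python
def undetected_runs(detected):
    """Return (start, end) index pairs, inclusive, for each run of
    consecutive undetected tags in one haplotype.

    `detected` is a list of booleans ordered by tag position
    (an assumed representation, not the study's data format).
    """
    runs = []
    start = None
    for i, seen in enumerate(detected):
        if not seen and start is None:
            start = i                      # a run of missing tags begins
        elif seen and start is not None:
            runs.append((start, i - 1))    # the run ended at the previous tag
            start = None
    if start is not None:                  # run extends to the last tag
        runs.append((start, len(detected) - 1))
    return runs


# Illustrative haplotype: tags 1-2 and tag 4 were not detected.
example = [True, False, False, True, False]
print(undetected_runs(example))
```

A run of several adjacent missing tags is consistent with a deletion spanning those positions, whereas isolated missing tags could also reflect amplification failure, so a real analysis would need additional checks.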
An incomplete ascertainment of genetic variation within the highly polymorphic immunoglobulin heavy chain locus (IGH) has hindered our ability to define genetic factors that influence antibody and B cell mediated processes. To date, methods for locus-wide genotyping of all IGH variant types do not exist. Here, we combine targeted long-read sequencing with a novel bioinformatics tool, IGenotyper, to fully characterize genetic variation within IGH in a haplotype-specific manner. We apply this approach to eight human samples, including a haploid cell line and two mother-father-child trios, and demonstrate the ability to generate high-quality assemblies (>98% complete and >99% accurate), genotypes, and gene annotations, including 2 novel structural variants and 16 novel gene alleles. We show that multiplexing allows for scaling of the approach without impacting data quality, and that our genotype call sets are more accurate than short-read (>35% increase in true positives and ...
Genetic analysis of eight linked polymorphisms within the human immunoglobulin heavy-chain region
American journal of human genetics, 1985
Genetic analyses of multiple restriction fragment length polymorphisms, revealed by a single DNA probe containing the switch region of the immunoglobulin constant heavy-chain (IgCH) mu gene, are presented here in detail. Five of the polymorphic loci segregate in complete linkage with IgCH allotypic markers, while one appears to be located at more than 10 centimorgans from the IgCH region. A study of over 100 random haplotypes typed at eight linked loci, including the Ig switch polymorphisms and the classical Gm-Am allotypes, allowed us to construct an evolutionary tree by which each haplotypic variant can be derived one from the other either by single-step mutation or by recombination. A few of the recombinant haplotypes appeared to carry large DNA duplications that could be explained by unequal crossing over; others might postulate gene-conversion events. Linkage disequilibria observed between the IgCH-linked loci were compared with expected ones. A heterogeneous distribution of re...
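The comparison of observed linkage disequilibrium with the value expected under independence can be sketched for a single pair of biallelic loci. This is a minimal illustration using the standard coefficient D and r², not the analysis used in the paper; the function name and the haplotype counts are hypothetical.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r_squared) for two biallelic loci from haplotype counts.

    D = p(AB) - p(A) * p(B), i.e. the observed haplotype frequency minus
    the frequency expected if the two loci were independent.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total            # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / total            # frequency of allele B at locus 2
    d = p_AB - p_A * p_B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r_squared = d * d / denom if denom > 0 else float("nan")
    return d, r_squared


# Made-up counts: the two loci are perfectly associated.
d, r2 = linkage_disequilibrium(50, 0, 0, 50)
print(d, r2)
```

With counts of 25 for every haplotype class, D and r² are both zero, which corresponds to the expectation under linkage equilibrium.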
Polymorphisms in immunoglobulin heavy chain variable genes and their upstream regions
Germline variations in immunoglobulin genes influence the repertoire of B cell receptors and antibodies, and such polymorphisms may impact disease susceptibility. However, the knowledge of the genomic variation of the immunoglobulin loci is scarce. Here, we report 25 novel germline IGHV alleles as inferred from rearranged naïve B cell cDNA repertoires of 98 individuals. Thirteen novel alleles were selected for validation, out of which ten were successfully confirmed by targeted amplification and Sanger sequencing of non-B cell DNA. Moreover, we detected a high degree of variability upstream of the V-region in the 5′ UTR, leader 1, and leader 2 sequences, and found that identical V-region alleles can differ in upstream sequences. Thus, we have identified a large genetic variation not only in the V-region but also in the upstream sequences of IGHV genes. Our findings challenge current approaches used for annotating immunoglobulin repertoire sequencing data.
Reconsidering the human immunoglobulin heavy-chain locus
Immunogenetics, 2006
We have used a bioinformatics approach to evaluate the completeness and functionality of the reported human immunoglobulin heavy-chain IGHD gene repertoire. Using the hidden Markov-model-based iHMMune-align program, 1,080 relatively unmutated heavy-chain sequences were aligned against the reported repertoire. These alignments were compared with alignments to 1,639 more highly mutated sequences. Comparisons of the frequencies of gene utilization in the two databases, and analysis of features of aligned IGHD gene segments, including their length, the frequency with which they appear to mutate, and the frequency with which specific mutations were seen, were used to determine the reliability of alignments to the less commonly seen IGHD genes. Analysis demonstrates that IGHD4-23 and IGHD5-24, which have been reported to be open reading frames of uncertain functionality, are represented in the expressed gene repertoire; however, the functionality of IGHD6-25 must be questioned. Sequence similarities make the unequivocal identification of members of the IGHD1 gene family problematic, although all genes except IGHD1-14*01 appear to be functional. On the other hand, reported allelic variants of IGHD2-2 and of the IGHD3 gene family appear to be nonfunctional, very rare, or nonexistent. Analysis also suggests that the reported repertoire is relatively complete, although one new putative polymorphism (IGHD3-10*p03) was identified. This study therefore confirms a surprising lack of diversity in the available IGHD gene repertoire, and restriction of the germline sequence databases to the functional set described here will substantially improve the accuracy of IGHD gene alignments and therefore the accuracy of analysis of the V-D-J junction.
Immunoglobulins (IGs), critical components of the human immune system, are composed of heavy and light protein chains encoded at three genomic loci. The IG Kappa (IGK) chain locus consists of two large, inverted segmental duplications. The complexity of IG loci has hindered effective use of standard high-throughput methods for characterizing genetic variation within these regions. To overcome these limitations, we leverage long-read sequencing to create haplotype-resolved IGK assemblies in an ancestrally diverse cohort (n=36), representing the first comprehensive description of IGK haplotype variation at population-scale. We identify extensive locus polymorphism, including novel single nucleotide variants (SNVs) and a common novel ∼24.7 Kbp structural variant harboring a functional IGKV gene. Among 47 functional IGKV genes, we identify 141 alleles, 64 (45.4%) of which were not previously curated. We report inter-population differences in allele frequencies for 14 of the IGKV genes,...
Population-Genetic Properties of Differentiated Human Copy-Number Polymorphisms
The American Journal of Human Genetics, 2011
Copy-number variants (CNVs) can reach appreciable frequencies in the human population, and recent discoveries have shown that several of these copy-number polymorphisms (CNPs) are associated with human diseases, including lupus, psoriasis, Crohn disease, and obesity. Despite new advances, significant biases remain in terms of CNP discovery and genotyping. We developed a method based on single-channel intensity data and benchmarked against copy numbers determined from sequencing read depth to successfully obtain CNP genotypes for 1495 CNPs from 487 human DNA samples of diverse ethnic backgrounds. This microarray contained CNPs in segmental duplication-rich regions and insertions of sequences not represented in the reference genome assembly or on standard SNP microarray platforms. We observe that CNPs in segmental duplications are more likely to be population differentiated than CNPs in unique regions (p = 0.015) and that biallelic CNPs show greater stratification when compared to frequency-matched SNPs (p = 0.0026). Although biallelic CNPs show a strong correlation of copy number with flanking SNP genotypes, the majority of multicopy CNPs do not (40% with r > 0.8). We selected a subset of CNPs for further characterization in 1876 additional samples from 62 populations; this revealed striking population-differentiated structural variants in genes of clinical significance such as OCLN, a tight junction protein involved in hepatitis C viral entry. Our microarray design allows these variants to be rapidly tested for disease association and our results suggest that CNPs (especially those that cannot be imputed from SNP genotypes) might have contributed disproportionately to human diversity and selection.
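The correlation between copy number and a flanking SNP genotype mentioned above can be sketched as a plain Pearson correlation. This is a minimal illustration, assuming integer copy numbers and SNP genotypes coded as allele dosages (0, 1, 2); the function names, the use of the absolute value (since allele coding is arbitrary), and the example data are assumptions rather than the paper's method. The 0.8 threshold is the one quoted in the abstract.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / sqrt(sxx * syy)


def is_tagged_by_snp(copy_numbers, snp_dosages, threshold=0.8):
    """True if |r| between copy number and SNP dosage exceeds the threshold."""
    return abs(pearson_r(copy_numbers, snp_dosages)) > threshold


# Made-up samples: a multicopy CNP that a flanking SNP tracks only weakly.
copy_numbers = [2, 3, 4, 2, 4, 3]
snp_dosages = [0, 0, 1, 1, 2, 2]
print(pearson_r(copy_numbers, snp_dosages))
print(is_tagged_by_snp(copy_numbers, snp_dosages))
```

A CNP that fails this kind of check would not be well captured by imputation from the SNP alone, which is the situation the abstract describes for most multicopy CNPs.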
Genetic variation in the immunoglobulin heavy chain locus shapes the human antibody repertoire
Variation in the antibody response has been linked to differential outcomes in disease, and suboptimal vaccine and therapeutic responsiveness, the determinants of which have not been fully elucidated. Countering models that presume antibodies are generated largely by stochastic processes, we demonstrate that polymorphisms within the immunoglobulin heavy chain locus (IGH) significantly impact the naive and antigen-experienced antibody repertoire, indicating that genetics predisposes individuals to mount qualitatively and quantitatively different antibody responses. We pair recently developed long-read genomic sequencing methods with antibody repertoire profiling to comprehensively resolve IGH genetic variation, including novel structural variants, single nucleotide variants, and genes and alleles. We show that IGH germline variants determine the presence and frequency of antibody genes in the expressed repertoire, including those enriched in functional elements linked to V(D)J recomb...
Polymorphisms of Immunologically Relevant Loci in Human Disease
Annals of the New York Academy of Sciences, 1988
In higher vertebrates there are three major groups of polymorphic molecules that are central to immunologic specificity. The genes of the major histocompatibility complex were demonstrated to be polymorphic and associated with a number of human diseases over two decades ago, and "HLA typing" is now routinely performed for several human diseases. Polymorphisms of the human T-cell receptor complex were more recently documented, and as of this writing allelic variants of T alpha, T beta, and T gamma have been described. In several instances these variations have been associated with human disease. Immunoglobulins are composed of heavy and light polypeptide chains; in man, allotypic forms are known within the kappa (Km) and heavy (Gm) chain families, the multiple forms of lambda being isotypic. The Gm and Km allotypes were first detected serologically, and over the years evidence has accumulated that these allotypic forms are associated with certain human diseases. As data from other species (particularly rabbit and mouse) accumulated in which serologic markers for the variable regions could be followed in pedigree analyses with allotypes in the constant region, a paradox emerged. Although there were certain preferences for certain V regions to be associated with certain C regions, by and large there was no evidence of significant linkage disequilibrium between the variable and constant region genes. The hypothesis developed that the described disease associations were largely related to limited amino acid variations in the constant regions of immunoglobulin molecules that somehow affected immunoglobulin function (for example, complement fixation or opsonizing capabilities). Because serologic markers in the variable region have been difficult to define in man, several investigators have turned to molecular genetic studies.
Historically, human immunoglobulin VH structures have been divided into three ...
This research was supported by the National Institutes of Health (1-R03-DK39800-01