Genomic screening by 454 pyrosequencing identifies a new human IGHV gene and sixteen other new IGHV allelic variants
Related papers
Polymorphisms in immunoglobulin heavy chain variable genes and their upstream regions
Germline variations in immunoglobulin genes influence the repertoire of B cell receptors and antibodies, and such polymorphisms may impact disease susceptibility. However, the knowledge of the genomic variation of the immunoglobulin loci is scarce. Here, we report 25 novel germline IGHV alleles as inferred from rearranged naïve B cell cDNA repertoires of 98 individuals. Thirteen novel alleles were selected for validation, out of which ten were successfully confirmed by targeted amplification and Sanger sequencing of non-B cell DNA. Moreover, we detected a high degree of variability upstream of the V-region in the 5’UTR, leader 1, and leader 2 sequences, and found that identical V-region alleles can differ in upstream sequences. Thus, we have identified a large genetic variation not only in the V-region but also in the upstream sequences of IGHV genes. Our findings challenge current approaches used for annotating immunoglobulin repertoire sequencing data.
Discovery of 10,828 new putative human immunoglobulin heavy chain IGHV variants
2021
The correct identification of immunoglobulin alleles in genome sequences is a challenge. Nevertheless, it can assist in the study of several human diseases associated with the antibody repertoire and in the development of new therapies using antibody engineering techniques. The advent of next-generation sequencing of human genomes and antibody repertoires enabled the development of several tools for the mapping and identification of new immunoglobulin (Ig) alleles. Some of these tools use 1,000 Genomes (G1K) data for new Ig allele discovery. However, genome data from G1K present low coverage and variant call problems. Here, a computational screen of immunoglobulin alleles was carried out in the Genome Aggregation Database (gnomAD), the largest high-quality catalogue of variation from 125,748 exomes and 15,708 human genomes. A total of 10,909 putative IGHV alleles were identified, of which 10,828 are new and 2,024 appear at least in 6 different alleles from genomes/exomes. Th...
An incomplete ascertainment of genetic variation within the highly polymorphic immunoglobulin heavy chain locus (IGH) has hindered our ability to define genetic factors that influence antibody and B cell mediated processes. To date, methods for locus-wide genotyping of all IGH variant types do not exist. Here, we combine targeted long-read sequencing with a novel bioinformatics tool, IGenotyper, to fully characterize genetic variation within IGH in a haplotype-specific manner. We apply this approach to eight human samples, including a haploid cell line and two mother-father-child trios, and demonstrate the ability to generate high-quality assemblies (>98% complete and >99% accurate), genotypes, and gene annotations, including 2 novel structural variants and 16 novel gene alleles. We show that multiplexing allows for scaling of the approach without impacting data quality, and that our genotype call sets are more accurate than short-read (>35% increase in true positives and &...
The American Journal of Human Genetics, 2013
The immunoglobulin heavy-chain locus (IGH) encodes variable (IGHV), diversity (IGHD), joining (IGHJ), and constant (IGHC) genes and is responsible for antibody heavy-chain biosynthesis, which is vital to the adaptive immune response. Programmed V-(D)-J somatic rearrangement and the complex duplicated nature of the locus have impeded attempts to reconcile its genomic organization based on traditional B-lymphocyte derived genetic material. As a result, sequence descriptions of germline variation within IGHV are lacking, haplotype inference using traditional linkage disequilibrium methods has been difficult, and the human genome reference assembly is missing several expressed IGHV genes. By using a hydatidiform mole BAC clone resource, we present the most complete haplotype of IGHV, IGHD, and IGHJ gene regions derived from a single chromosome, representing an alternate assembly of ~1 Mbp of high-quality finished sequence. From this we add 101 kbp of previously uncharacterized sequence, including functional IGHV genes, and characterize four large germline copy-number variants (CNVs). In addition to this germline reference, we identify and characterize eight CNV-containing haplotypes from a panel of nine diploid genomes of diverse ethnic origin, discovering previously unmapped IGHV genes and an additional 121 kbp of insertion sequence. We genotype four of these CNVs by using PCR in 425 individuals from nine human populations. We find that all four are highly polymorphic and show considerable evidence of stratification (F_ST = 0.3-0.5), with the greatest differences observed between African and Asian populations. These CNVs exhibit weak linkage disequilibrium with SNPs from two commercial arrays in most of the populations tested.
Reconsidering the human immunoglobulin heavy-chain locus
Immunogenetics, 2006
We have used a bioinformatics approach to evaluate the completeness and functionality of the reported human immunoglobulin heavy-chain IGHD gene repertoire. Using the hidden Markov-model-based iHMMune-align program, 1,080 relatively unmutated heavy-chain sequences were aligned against the reported repertoire. These alignments were compared with alignments to 1,639 more highly mutated sequences. Comparisons of the frequencies of gene utilization in the two databases, and analysis of features of aligned IGHD gene segments, including their length, the frequency with which they appear to mutate, and the frequency with which specific mutations were seen, were used to determine the reliability of alignments to the less commonly seen IGHD genes. Analysis demonstrates that IGHD4-23 and IGHD5-24, which have been reported to be open reading frames of uncertain functionality, are represented in the expressed gene repertoire; however, the functionality of IGHD6-25 must be questioned. Sequence similarities make the unequivocal identification of members of the IGHD1 gene family problematic, although all genes except IGHD1-14*01 appear to be functional. On the other hand, reported allelic variants of IGHD2-2 and of the IGHD3 gene family appear to be nonfunctional, very rare, or nonexistent. Analysis also suggests that the reported repertoire is relatively complete, although one new putative polymorphism (IGHD3-10*p03) was identified. This study therefore confirms a surprising lack of diversity in the available IGHD gene repertoire, and restriction of the germline sequence databases to the functional set described here will substantially improve the accuracy of IGHD gene alignments and therefore the accuracy of analysis of the V-D-J junction.
PNAS
Individual variation in germline and expressed B-cell immunoglobulin (Ig) repertoires has been associated with aging, disease susceptibility, and differential response to infection and vaccination. Repertoire properties can now be studied at large-scale through next-generation sequencing of rearranged Ig genes. Accurate analysis of these repertoire-sequencing (Rep-Seq) data requires identifying the germline variable (V), diversity (D), and joining (J) gene segments used by each Ig sequence. Current V(D)J assignment methods work by aligning sequences to a database of known germline V(D)J segment alleles. However, existing databases are likely to be incomplete and novel polymorphisms are hard to differentiate from the frequent occurrence of somatic hypermutations in Ig sequences. Here we develop a Tool for Ig Genotype Elucidation via Rep-Seq (TIgGER). TIgGER analyzes mutation patterns in Rep-Seq data to identify novel V segment alleles, and also constructs a personalized germline database containing the specific set of alleles carried by a subject. This information is then used to improve the initial V segment assignments from existing tools, like IMGT/HighV-QUEST. The application of TIgGER to Rep-Seq data from seven subjects identified 11 novel V segment alleles, including at least one in every subject examined. These novel alleles constituted 13% of the total number of unique alleles in these subjects, and impacted 3% of V(D)J segment assignments. These results reinforce the highly polymorphic nature of human Ig V genes, and suggest that many novel alleles remain to be discovered. The integration of TIgGER into Rep-Seq processing pipelines will increase the accuracy of V segment assignments, thus improving B-cell repertoire analyses.
Genetic variation in the immunoglobulin heavy chain locus shapes the human antibody repertoire
Variation in the antibody response has been linked to differential outcomes in disease, and suboptimal vaccine and therapeutic responsiveness, the determinants of which have not been fully elucidated. Countering models that presume antibodies are generated largely by stochastic processes, we demonstrate that polymorphisms within the immunoglobulin heavy chain locus (IGH) significantly impact the naive and antigen-experienced antibody repertoire, indicating that genetics predisposes individuals to mount qualitatively and quantitatively different antibody responses. We pair recently developed long-read genomic sequencing methods with antibody repertoire profiling to comprehensively resolve IGH genetic variation, including novel structural variants, single nucleotide variants, and genes and alleles. We show that IGH germline variants determine the presence and frequency of antibody genes in the expressed repertoire, including those enriched in functional elements linked to V(D)J recomb...