Richeng Gao - Academia.edu
Papers by Richeng Gao
The Genetics of Male Infertility, 2007
Genetic studies in humans have been limited by various factors, including small family size and diploidy of the human genome. The ability to use individual spermatozoa as subjects has significantly facilitated these studies. However, because each sperm usually contains ...
Comparative Genomics, 2007
The ability to analyze a large number of genetic markers consisting of single nucleotide polymorphisms (SNPs) may bring about significant advances in understanding human biology. Recent development of several high-throughput genotyping approaches has significantly facilitated large-scale SNP analysis. However, because of their relatively low sensitivity, application of these approaches, especially in studies involving a small amount of material, has been limited. In this chapter, detailed experimental procedures for a high-throughput and highly sensitive genotyping system are described. The system uses primers selected by a computer program so that they are not expected to generate a significant amount of nonspecific products during PCR amplification. After PCR, a small aliquot of the PCR product is used as template to generate single-stranded DNA (ssDNA). ssDNA sequences from different SNP loci are then resolved by hybridizing these sequences to probes arrayed on a glass surface. The probes are designed in such a way that hybridizing to the ssDNA templates places their 3'-ends next to the polymorphic sites. Therefore, the probes can be labeled in an allele-specific way using fluorescently labeled dye terminators. The allelic states of the SNPs can then be determined by analyzing the amounts of different fluorescent colors incorporated into the corresponding probes. The genotyping system is highly accurate and capable of analyzing >1000 SNPs in individual haploid cells.
PLoS ONE, 2009
Background: Copy number variants (CNVs) occupy a significant portion of the human genome and may have important roles in meiotic recombination, human genome evolution and gene expression. Many genetic diseases may be underlain by CNVs. However, because of the presence of their multiple copies, variability in copy numbers and the diploidy of the human genome, detailed genetic structure of CNVs cannot be readily studied by available techniques.
Nucleic Acids Research, 2006
Microarray-based analysis of single nucleotide polymorphisms (SNPs) has many applications in large-scale genetic studies. To minimize the influence of experimental variation, microarray data usually need to be processed in different aspects including background subtraction, normalization and low-signal filtering before genotype determination. Although many algorithms are sophisticated for these purposes, biases are still present. In the present paper, new algorithms for SNP microarray data analysis and the software, AccuTyping, developed based on these algorithms are described. The algorithms take advantage of a large number of SNPs included in each assay, and the fact that the top and bottom 20% of SNPs can be safely treated as homozygous after sorting based on their ratios between the signal intensities. These SNPs are then used as controls for color channel normalization and background subtraction. Genotype calls are made based on the logarithms of signal intensity ratios using two cutoff values, which were determined after training the program with a dataset of ~160 000 genotypes and validated by nonmicroarray methods. AccuTyping was used to determine >300 000 genotypes of DNA and sperm samples. The accuracy was shown to be >99%. AccuTyping can be downloaded from
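The calling scheme summarized in this abstract can be illustrated with a small sketch. This is not AccuTyping's code: the abstract does not give the normalization formula or the trained cutoff values, so the function name `call_genotypes`, the symmetric `cutoff` of 1.0, and the centering step (shifting log ratios so the top and bottom 20% control groups sit symmetrically around zero) are all assumptions made for illustration. Background subtraction is omitted.

```python
import math


def call_genotypes(signals, control_fraction=0.2, cutoff=1.0):
    """Call genotypes from two-color intensities (hypothetical sketch).

    signals: list of (intensity_a, intensity_b) pairs, all values > 0.
    control_fraction: share of SNPs at each end of the sorted ratios
        treated as homozygous controls (20% in the abstract).
    cutoff: assumed symmetric threshold on the centered log2 ratio;
        the real cutoffs were obtained by training and are not given.
    """
    log_ratios = [math.log2(a / b) for a, b in signals]

    # Sort by ratio and take the extremes as presumed homozygotes.
    ranked = sorted(log_ratios)
    k = max(1, int(len(ranked) * control_fraction))
    low_controls = ranked[:k]
    high_controls = ranked[-k:]

    # Assumed channel normalization: remove the offset that makes the
    # two control groups asymmetric around zero.
    offset = (sum(low_controls) / k + sum(high_controls) / k) / 2

    calls = []
    for ratio in log_ratios:
        centered = ratio - offset
        if centered >= cutoff:
            calls.append("AA")  # first-channel allele only
        elif centered <= -cutoff:
            calls.append("BB")  # second-channel allele only
        else:
            calls.append("AB")  # both alleles present
    return calls
```

Because the offset is estimated from the data themselves, doubling every first-channel intensity leaves the calls unchanged in this sketch, which is the kind of channel bias the control SNPs are meant to absorb.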
Genome Research, 2005
Although the haplotype structure of the human genome has been studied in great detail, very little is known about the mechanisms underlying its formation. To investigate the role of meiotic recombination in haplotype block formation, single nucleotide polymorphisms were selected at a high density from a 2.5-Mb region of human chromosome 21. Direct analysis of meiotic recombination by high-throughput multiplex genotyping of 662 single sperm identified 41 recombinants. The crossovers were nonrandomly distributed within 16 small areas. All but one of these crossovers fall in areas where the haplotype structure exhibits breakdown, displaying a strong, statistically significant positive association between crossovers and haplotype block breaks. The data also indicate a particular clustered distribution of recombination hotspots within the region. This finding supports the hypothesis that meiotic recombination makes a primary contribution to haplotype block formation in the human genome.
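The general idea behind scoring recombinants in single sperm can be sketched as follows. The abstract does not describe the scoring procedure, so this is only an assumed, simplified version: the function name `find_crossovers` and the input layout (ordered marker alleles for the sperm and for the donor's two phased haplotypes, with `None` for undetected markers) are hypothetical. A crossover is reported wherever the parental origin switches between two consecutive informative markers.

```python
def find_crossovers(sperm, hap_a, hap_b):
    """Locate crossover intervals in one sperm (hypothetical sketch).

    sperm: alleles observed in the sperm, ordered along the chromosome;
        None marks a marker that was not detected.
    hap_a, hap_b: the donor's two phased haplotypes at the same markers.
    Returns a list of (left_index, right_index) marker pairs between
    which the parental origin changes.
    """
    origins = []
    for i, allele in enumerate(sperm):
        if allele is None or hap_a[i] == hap_b[i]:
            continue  # undetected or uninformative (homozygous) marker
        if allele == hap_a[i]:
            origins.append((i, "A"))
        elif allele == hap_b[i]:
            origins.append((i, "B"))
        # Alleles matching neither haplotype are ignored as likely errors.

    intervals = []
    for (left, prev), (right, cur) in zip(origins, origins[1:]):
        if prev != cur:
            intervals.append((left, right))
    return intervals
```

A sperm with an empty result is a non-recombinant for the region; the width of each returned interval depends on how densely informative markers were detected around the switch.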
Genome Research, 2005
A high-throughput genotyping system for scoring single nucleotide polymorphisms (SNPs) has been developed. With this system, >1000 SNPs can be analyzed in a single assay, with a sensitivity that allows the use of single haploid cells as starting material. In the multiplex polymorphic sequence amplification step, instead of attaching universal sequences to the amplicons, primers that are unlikely to have nonspecific and productive interactions are used. Genotypes of SNPs are then determined by using the widely accessible microarray technology and the simple single-base extension assay. Three SNP panels, each consisting of >1000 SNPs, were incorporated into this system. The system was used to analyze 24 human genomic DNA samples. With 5 ng of human genomic DNA, the average detection rate was 98.22% when single probes were used, and 96.71% could be detected by dual probes in different directions. When single sperm cells were used, 91.88% of the SNPs were detectable, which is comparable to the level that was reached when very few genetic markers were used. By using a dual-probe assay, the average genotyping accuracy was 99.96% for 5 ng of human genomic DNA and 99.95% for single sperm. This system may be used to significantly facilitate large-scale genetic analysis even if the amount of DNA template is very limited or even highly degraded, such as that obtained from paraffin-embedded cancer specimens, and to make many impractical research projects highly realistic and affordable.
BMC Genomics, 2011
Background: Segmental duplication and deletion were implicated for a region containing the human immunoglobulin heavy chain variable (IGHV) gene segments, 1.9III/hv3005 (possible allelic variants of IGHV3-30) and hv3019b9 (a possible allelic variant of IGHV3-33). However, very little is known about the ranges of the duplication and the polymorphic region. This is mainly because of the difficulty associated with distinguishing between allelic and paralogous sequences in the IGHV region containing extensive repetitive sequences. Inability to separate the two parental haploid genomes in the subjects is another serious barrier. To address these issues, unique DNA sequence tags evenly distributed within and flanking the duplicated region implicated by the previous studies were selected. The selected tags in single sperm from six unrelated healthy donors were amplified by multiplex PCR followed by microarray detection. In this way, individual haplotypes of different parental origins in the sperm donors could be analyzed separately and precisely. The identified polymorphic region was further analyzed at the nucleotide sequence level using sequences from the three human genomic sequence assemblies in the database. Results: A large polymorphic region was identified using the selected sequence tags. Four of the 12 haplotypes were shown to contain consecutively undetectable tags spanning in a variable range. Detailed analysis of sequences from the genomic sequence assemblies revealed two large duplicate sequence blocks of 24,696 bp and 24,387 bp, respectively, and an incomplete copy of 961 bp in this region. It contains up to 13 IGHV gene segments depending on haplotypes. A polymorphic region was found to be located within the duplicated blocks.
The variants of this polymorphism unusually diverged at the nucleotide sequence level and in IGHV gene segment number, composition and organization, indicating a limited selection pressure in general. However, the divergence level within the gene segments is significantly different from that in the intergenic regions indicating that these regions may have been subject to different selection pressures and that the IGHV gene segments in this region are functionally important.
Genes and Immunity, 2005
Organization of the IGHV genes (n = 108) on single human chromosomes has been determined by detecting these sequences in single sperm using multiplex PCR amplification followed by microarray detection. A total of 374 single sperm samples from five Caucasian males were studied. Three deletion/insertion polymorphisms (Del I-Del III) with deletion allele frequencies ranging from 0.1 to 0.3 were identified. Del I is a previously reported polymorphism affecting three IGHV genes (IGHV1-8, IGHV3-9, and IGHV2-10). Del II affects a region of 2-18 kb containing two pseudogenes, IGHV(II)-28.1 and IGHV3-29, and Del III spans ~21-53 kb involving genes IGHV4-39, IGHV7-40, IGHV(II)-40-1, and IGHV3-41. Deletion alleles of both Dels II and III were found in a heterozygous state, and therefore, could not be easily detected if haploid samples were not used in the study. Results of the present study indicate that deletions/insertions together with other possible chromosomal rearrangements may play an important role in forming the genetic structure of the IGHV region, and may significantly contribute to antibody diversity. Since these three polymorphisms are located within or next to the 3' half of the IGHV region, they may have an important role in the expressed IGHV gene repertoire during immune response.