The diploid genome sequence of an Asian individual - PubMed (original) (raw)

Comparative Study

. 2008 Nov 6;456(7218):60-5.

doi: 10.1038/nature07484.

Wei Wang, Ruiqiang Li, Yingrui Li, Geng Tian, Laurie Goodman, Wei Fan, Junqing Zhang, Jun Li, Juanbin Zhang, Yiran Guo, Binxiao Feng, Heng Li, Yao Lu, Xiaodong Fang, Huiqing Liang, Zhenglin Du, Dong Li, Yiqing Zhao, Yujie Hu, Zhenzhen Yang, Hancheng Zheng, Ines Hellmann, Michael Inouye, John Pool, Xin Yi, Jing Zhao, Jinjie Duan, Yan Zhou, Junjie Qin, Lijia Ma, Guoqing Li, Zhentao Yang, Guojie Zhang, Bin Yang, Chang Yu, Fang Liang, Wenjie Li, Shaochuan Li, Dawei Li, Peixiang Ni, Jue Ruan, Qibin Li, Hongmei Zhu, Dongyuan Liu, Zhike Lu, Ning Li, Guangwu Guo, Jianguo Zhang, Jia Ye, Lin Fang, Qin Hao, Quan Chen, Yu Liang, Yeyang Su, A San, Cuo Ping, Shuang Yang, Fang Chen, Li Li, Ke Zhou, Hongkun Zheng, Yuanyuan Ren, Ling Yang, Yang Gao, Guohua Yang, Zhuo Li, Xiaoli Feng, Karsten Kristiansen, Gane Ka-Shu Wong, Rasmus Nielsen, Richard Durbin, Lars Bolund, Xiuqing Zhang, Songgang Li, Huanming Yang, Jian Wang

Affiliations

Comparative Study

The diploid genome sequence of an Asian individual

Jun Wang et al. Nature. 2008.

Abstract

Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we used uniquely mapped reads to assemble a high-quality consensus sequence for 92% of the Asian individual's genome. We identified approximately 3 million single-nucleotide polymorphisms (SNPs) inside this region, of which 13.6% were not in the dbSNP database. Genotyping analysis showed that SNP identification had high accuracy and consistency, indicating the high sequence quality of this assembly. We also carried out heterozygote phasing and haplotype prediction against HapMap CHB and JPT haplotypes (Chinese and Japanese, respectively), sequence comparison with the two available individual genomes (J. D. Watson and J. C. Venter), and structural variation identification. These variations were considered for their potential biological impact. Our sequence data and analyses demonstrate the potential usefulness of next-generation sequencing technologies for personal genomics.

PubMed Disclaimer

Figures

Figure 1

Figure 1. The percentage of detected SNPs (a) and small indels (b) that overlap with SNPs and small indels in the dbSNP database (http://www.ncbi.nlm.nih.gov/SNP/, build 128)

The dbSNP alleles were separated into validated and non-validated SNPs, and the detected SNPs that were not present in dbSNP were classified as novel.

Figure 2

Figure 2. Genome coverage of the assembled consensus sequence and the accuracy of SNP detection as a function of sequencing depth

Analyses were carried out on human chromosome 12, and subsets of reads from all mapped 22.5× single-end and 13.5× paired-end reads were randomly extracted from areas of different average depth. The same method and filtering threshold (Q20) was used for SNP detection over different sequencing depths. The error rate for SNP calling—the sum of ‘over call’, ‘under call’ and ‘misses’ rate (see Supplementary Information)—was separated into heterozygotes (HET) and homozygotes (HOM), and was validated against the Illumina 1M genotyping alleles.

Figure 3

Figure 3. Summary of structural variations

a, Abundance of each class of structural variation. The overlap with known structural variations in the DGV (

http://projects.tcag.ca/variation/

) and with transposons (transposable elements, TEs) was calculated. About 34% of our identified structural variations are novel (having less than 10% of a portion of the YH structural variations overlapping with structural variations in the DGV). Transposable elements are a major component of the identified deletions, with Alus and LINEs involved in 49% and 34% of the deletions, respectively. b, An example of a deletion of a transposon complex on YH chromosome 1. The sequencing depth by both single-end and paired-end reads are shown. Normally aligned paired-end reads are shown in green, whereas abnormally aligned paired-end reads, which have unexpected long insert sizes or an incorrect orientation relationship, are shown in red. c, An example of an inversion on YH chromosome 19. Local assembly showed that a 102,405-bp fragment was inverted and reinserted in the genome. There are three genes in this sequence fragment, and the last exon of gene CYP4F12 was destroyed by this inversion event.

Figure 4

Figure 4. Size distribution of predicted haplotype blocks of autosomes

Haplotypes were constructed using PHASE software with the 700,300 autosomal heterozygous SNPs that overlapped with the CHB/JPT genotypes from the HapMap phase II data.

Comment in

Similar articles

Cited by

References

    1. International Human Genome Sequencing Consortium Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921. - PubMed
    1. Venter JC, et al. The sequence of the human genome. Science. 2001;291:1304–1351. - PubMed
    1. Levy S, et al. The diploid genome sequence of an individual human. PLoS Biol. 2007;5:e254. - PMC - PubMed
    1. Wheeler DA, et al. The complete genome of an individual by massively parallel DNA sequencing. Nature. 2008;452:872–876. - PubMed
    1. Church GM. The personal genome project. Mol. Syst. Biol. 2005;1 doi:10.1038/msb4100040. - PMC - PubMed

Publication types

MeSH terms

LinkOut - more resources