Whole-genome and whole-exome sequencing of bladder cancer identifies frequent alterations in genes involved in sister chromatid cohesion and segregation

Author manuscript; available in PMC: 2020 Sep 24.

Published in final edited form as: Nat Genet. 2013 Oct 13;45(12):1459–1463. doi: 10.1038/ng.2798

Abstract

Bladder cancer is one of the most common cancers worldwide, with transitional cell carcinoma (TCC) being the predominant form. Here we report a genomic analysis of TCC by both whole-genome and whole-exome sequencing of 99 individuals with TCC. Beyond confirming recurrent mutations in genes previously identified as being mutated in TCC, we identified additional altered genes and pathways that were implicated in TCC. Notably, we discovered frequent alterations in STAG2 and ESPL1, two genes involved in the sister chromatid cohesion and segregation (SCCS) process. Furthermore, we also detected a recurrent fusion involving FGFR3 and TACC3, another component of SCCS, by transcriptome sequencing of 42 DNA-sequenced tumors. Overall, 32 of the 99 tumors (32%) harbored genetic alterations in the SCCS process. Our analysis provides evidence that genetic alterations affecting the SCCS process may be involved in bladder tumorigenesis and identifies a new therapeutic possibility for bladder cancer.


Bladder cancer is one of the most common genitourinary malignancies in the world, with an estimated 386,300 new cases and 150,200 deaths in 2008 alone1. Previous studies showed that bladder cancer represents a heterogeneous disease, with two distinct subtypes—superficial and invasive—showing variable clinical presentation and genetic background2. Recently, we reported frequent mutations in eight chromatin-remodeling genes (UTX, MLL-MLL3, CREBBP-EP300, NCOR1, ARID1A and CHD6) in TCC of the urinary bladder3. Nevertheless, a comprehensive catalog of genetic alterations is far from complete, and the key drivers of TCC tumorigenesis remain poorly understood.

To improve understanding of the genetic basis of TCC, we performed whole-exome sequencing of tumor and matched peripheral blood samples from 99 individuals with TCC (Supplementary Fig. 1 and Supplementary Tables 1 and 2). After several rigorous bioinformatic analysis steps (Online Methods), we identified 11,240 candidate somatic mutations, and validation experiments on 1,119 predicted somatic substitutions and 91 indels confirmed 1,023 (91%) and 67 (74%), respectively, by Sanger sequencing (Supplementary Tables 3 and 4). We also analyzed DNA from the same cohort by whole-genome sequencing to detect copy number alterations (CNAs) and obtained fourfold mean haploid coverage for each sample (Supplementary Fig. 2). In addition, we performed transcriptome sequencing (RNA-seq) on a subset of 42 DNA-sequenced tumors, including 16 for which matched, morphologically normal bladder tissue was available (Supplementary Table 5).

To identify genes whose mutations were associated with TCC, we assessed the statistical significance of the observed mutation prevalence for each gene as described previously3. In total, we identified 37 significantly mutated genes (Fig. 1 and Supplementary Table 6), which included 7 well-known bladder cancer genes (TP53 (ref. 4), HRAS5, FGFR3 (ref. 6), PIK3CA7, RB1 (ref. 8), KRAS5 and TSC1 (ref. 9)). Consistent with our previous study of TCC3, eight chromatin-remodeling genes (UTX, ARID1A, MLL-MLL3, CREBBP-EP300, NCOR1 and CHD6) were significantly mutated in the cohort (Supplementary Fig. 3). In addition, we examined mutation frequencies for genes and gene families (Supplementary Table 4) and observed frequent mutations in multiple tumors in other chromatin-remodeling genes, including the histone demethylase genes KDM6A (UTX) and UTY (30%), the chromatin-remodeler genes ARID1A and ARID4A (17%), the histone lysine methyltransferase genes KMT2A (MLL), KMT2C (MLL3) and KMT2E (MLL5) (16%), the histone acetyltransferase genes EP300 and EP400 (15%), the SWI/SNF complex–related genes SMARCA4 and SMARAC1 (7%) and the histone demethylase genes KDM5A (JARID1A) and KDM5B (JARID1B) (6%). In total, at least 57 of the 99 cases (58%) harbored somatic mutations in chromatin-remodeling genes, further indicating that altered epigenetic regulation of chromatin and chromatin post-translational modifications may be a major driver mechanism in TCC3.

Figure 1.

Significantly mutated genes in TCC as determined by exome sequencing. Significantly mutated genes are listed on the right. The percentage of bladder tumors with mutations detected by automated calling is noted on the left. The upper histogram shows the somatic mutation rate in each of the 99 tumors. The central heatmap shows the distribution of mutations across the sequenced samples. All the mutations shown were confirmed by Sanger sequencing. Genes with asterisks had mutations newly observed in TCC in this study.

We also found 13 new significantly mutated genes that have not previously been reported in TCC (Fig. 1). STAG2 (encoding stromal antigen 2) was particularly notable, ranking first in mutational significance among the 13 newly identified mutated genes and harboring a greater number of nonsynonymous mutations (P = 8 × 10⁻¹¹) and a higher ratio of nonsynonymous to synonymous mutations (P = 0.02) than expected by chance. STAG2, a gene on the X chromosome, encodes a component of the cohesin complex associated with the SCCS process of the cell cycle, which regulates the separation of sister chromatids during cell division. We found that 11 cases (11%) harbored mutations in STAG2, and 9 of the 11 mutations were truncating (3 small frameshifting indels, 4 nonsense mutations and 2 splice-site mutations) (Fig. 2a and Supplementary Fig. 4). In addition, STAG2 genomic deletions were observed in 5 of the 99 tumors (Fig. 2b and Supplementary Fig. 5). By screening all exons of STAG2 in an additional 50 tumor-normal pairs from individuals with TCC by Sanger sequencing, we identified 5 somatic mutations in 4 tumors (Supplementary Table 7). We also examined the methylation status of the STAG2 promoter in 30 TCCs by bisulfite sequencing and found STAG2 promoter hypermethylation in 7 tumors (23%) relative to matched normal samples (Supplementary Table 8).

Figure 2.

STAG2 somatic mutations and copy number changes in TCC. (a) Somatic alterations overlaid on the STAG2 protein with the conserved protein domains highlighted. STAG, STAG domain; SCD, stromalin conservative domain. (b) Five tumors harboring genomic deletions of STAG2 on the X chromosome (ideogram shown on the left). (c) Kaplan-Meier survival analysis of individuals with TCC (n = 99) shows that the survival rate for individuals with somatic STAG2 alterations (n = 16) is significantly lower (log-rank test P < 0.001) than for individuals with wild-type STAG2 (n = 83).

To explore the association between STAG2 alterations and individual survival, we performed Kaplan-Meier survival analysis on our data and found that individuals with STAG2 somatic alterations had a much worse prognosis compared to individuals with wild-type STAG2 (Fig. 2c). Similarly, we also observed significant differences (P < 0.001) in survival between cases with and without STAG2 somatic alterations in the superficial and invasive subtypes of TCC (Supplementary Fig. 6), suggesting that STAG2 may be an independent predictor of worse outcome in TCC. A previous report showed that inactivation of STAG2 causes chromatid cohesion defects and aneuploidy in Ewing sarcoma, glioblastoma and melanoma10. In our study, we evaluated aneuploidy by quantifying the extent of copy number changes on chromosome arms and found that tumors with genetic alterations in STAG2 had more aneuploidy (P = 0.01) than tumors with wild-type SCCS genes (Supplementary Fig. 7), consistent with that report10. In contrast with the observation in solid tumors, many STAG2-mutated primary leukemias have completely normal karyotypes11,12. These contradictory observations in different types of cancer suggest that further comprehensive investigation of STAG2 is warranted to establish its exact role in causing aneuploidy and in TCC tumorigenesis. In addition to aberrations in well-known and emerging cancer genes, our study also identified frequent mutations in several genes not yet associated with TCC, such as ERCC2 (Supplementary Fig. 8), TRRAP and FAT4 (Supplementary Note), and further experiments are needed to investigate the functions of these genes in TCC.

We also profiled the 99 tumors for CNAs using whole-genome sequencing (Supplementary Fig. 9) and found abnormalities of chromosomal arms or entire chromosomes that predominantly involved gain of 5p, 8q, 13p, 20p and 20q and loss of 8p, 9p, 9q, 11p, 14p, 15p, 17p and 21p (Supplementary Fig. 10). The patterns of broad cytogenetic gain and loss were consistent with previous studies13, and no significantly disparate pattern was observed across the different subgroups of TCC. Profiling of CNAs identified many putative cancer driver genes that may be implicated in TCC tumorigenesis (Supplementary Fig. 11). We applied GISTIC14 analysis to whole-genome sequencing data to identify recurrent focal CNAs, and this analysis yielded 84 regions of focal amplification (Supplementary Fig. 12 and Supplementary Table 9), several of which included CNAs previously detected in bladder cancer13, which encompassed genes such as TRIO, MDM2, MYC, E2F3, CCND1 and ERBB2 (Supplementary Figs. 13-15). Other amplifications defined regions of CNAs reported for the first time, to our knowledge, in bladder cancer, including CCNE1, CEBPA, E2F1 and MUC1 (Supplementary Fig. 12). An interesting finding was frequent amplification of DHFR, encoding dihydrofolate reductase, which is a target of many anticancer agents15, at 5q (frequently lost in our cohort) in 14 tumors (14%) (Supplementary Figs. 16 and 17 and Supplementary Note). We also identified 80 recurrent focal deletion regions, including RB1 and CREBBP, which were also frequently truncated in TCCs. One of the most common focal deletions, detected in 50 tumors (50%), was a region at 9p21 containing CDKN2A and CDKN2B (Supplementary Fig. 18), a widely reported genomic alteration in bladder cancer16.

We next searched for genome rearrangements using the RNA-seq data set instead of the whole-genome sequencing data set owing to the low sequence coverage achieved in whole-genome sequencing. Overall, we detected 32 candidate rearrangements that resulted in gene fusions (Supplementary Table 10). The only recurrent fusion event was an in-frame fusion of FGFR3 with TACC3 in cases B59-3 and B100 (2/42, 5%; Fig. 3a and Supplementary Figs. 19 and 20). Analysis of junction-spanning and mate-pair reads derived from the whole-genome sequencing data confirmed the presence of the FGFR3-TACC3 fusion in both tumors. TACC3, which is located 70 kb away from FGFR3, encodes a microtubule-associated protein that is important for spindle stability and organization in the SCCS process and has been mapped to a chromosomal region that is disrupted in some cancers17. In expression analysis from RNA-seq coverage, the B59-3 and B100 tumors had markedly higher, outlier expression of TACC3 compared to normal bladder tissues and other tumors without the FGFR3-TACC3 fusion (Fig. 3b), and TACC3 mRNA was predominantly present in fused transcripts in both tumors (Fig. 3c and Supplementary Fig. 20c). Although FGFR3 and TACC3 are located in a focal region that was frequently amplified in our cohort (Supplementary Fig. 12), the low level of amplification of this region in the B100 tumor and its apparent deletion in the B59-3 tumor (Supplementary Fig. 21) indicate that the high expression of TACC3 was unlikely to be caused by TACC3 amplification but is likely mediated by transcriptional regulatory elements in the promoter of its fusion partner, FGFR3. A previous study reported a similar FGFR3-TACC3 fusion in glioblastoma multiforme and found that the fusion protein could induce mitotic and chromosomal segregation defects and trigger aneuploidy18. A low frequency of FGFR3-TACC3 fusions (9%) was also reported in a recent study using bladder cancer cell lines19. In addition to mutations in FGFR3, frequent amplification of the locus with TACC3 and FGFR3 in our cohort and recurrent FGFR3-TACC3 fusion events suggest a close relationship between this locus and bladder tumorigenesis. This idea is further supported by the discovery of an association between a SNP in this locus and increased risk of bladder cancer20.

Figure 3.

FGFR3-TACC3 fusion was identified in TCC. (a) Genomic fusion of intron 17 of FGFR3 with intron 10 of TACC3 resulting in exon 17 of FGFR3 being spliced 5’ to exon 11 of TACC3 in the fused mRNA. Triangles indicate the genomic positions of the breakpoints. Detailed information on the positions and sequences of primers P1 and P3 is provided in Supplementary Table 14. (b) Outlier high expression of TACC3 in TCCs harboring FGFR3-TACC3 gene fusions. RPKM, reads per kilobase of exon region in a gene per million mapped reads. (c) RNA-seq coverage analysis of FGFR3 (top) and TACC3 (bottom) in the tumor and matched normal bladder tissue from B59-3. Three transcripts of FGFR3 and one transcript of TACC3 are shown. Black dotted lines indicate breakpoints. E, exon.

To construct a comprehensive view of the common genetic alterations underlying the TCC genome, we performed integrative pathway analysis of the somatic mutations and CNAs. In addition to chromatin remodeling, we also discovered a number of pathways that may be implicated in TCC, including the Ras–mitogen-activated protein kinase (MAPK) and phosphoinositide 3-kinase (PI3K)-AKT signaling pathways that were commonly altered in bladder cancer2,13,21 (Supplementary Fig. 22 and Supplementary Table 11). The cell cycle was the second most frequently altered pathway, with alterations in 86 of the tumors (86%), including alterations in genes with known roles in the G1/S checkpoint (loss-of-function mutations in RB1, amplifications of MYC, CCND1, CCNE1, E2F1 and E2F3, and deletions of CDKN2A and CDKN2B) and the G2/M checkpoint (inactivating mutations in ATM, ATR, CREBBP, EP300, BRCA1, BRCA2 and TP53 and amplifications of MDM2). Specifically, genes involved in the SCCS process were frequently altered in TCCs (Fig. 4). STAG2, NIPBL, SMC1A and SMC3, four genes with important roles in sister chromatid cohesion, were altered in 16%, 4%, 3% and 2% of the tumors, respectively. We also observed frequent mutations in ESPL1 (6% of tumors), a gene encoding a protein with a central role in chromosome segregation through its cleavage of the cohesin complexes that hold sister chromatids together. Moreover, TACC3, also critical for the SCCS process, was altered by genome rearrangement in 5% of the tumors. Rare mutations in the other SCCS genes BUB1, BUB1B, BUB3, MAD1L1 (MAD1), MAD2 (MAD2L1) and CENPE have been reported in several types of cancer, including bladder cancer22, but no genomic alterations in these genes were observed in our cohort. In total, we identified genetic aberrations in the SCCS process in 32 of our 99 subjects with TCC (32%). The SCCS process, a major cell cycle control mechanism during mitosis, prevents chromosome missegregation if spindle assembly is perturbed, and malfunction of the SCCS process can generate chromosomal instability and aneuploidy, the most common characteristic of human solid tumors. As expected, we found that tumors with alterations in SCCS process genes had significantly higher aneuploidy (P = 0.01) than tumors without alterations in these genes (Supplementary Fig. 23). Up to now, only a few SCCS process genes have been shown to be altered at an appreciable frequency10,23. Discovery of high-frequency alterations in the SCCS process shows that genetic disruption of the genes directly controlling chromosome cohesion and segregation might have an important role in bladder tumorigenesis.

Figure 4.

Frequent genetic alterations in genes from the cell cycle pathway in TCC. Alterations are defined as somatic mutations, focal amplifications and deletions, and, in some cases, as gene fusion events. Alteration frequencies are expressed as a percentage of all cases.

In the present study, we performed genomic analysis of 99 TCCs by whole-genome sequencing, whole-exome sequencing and RNA-seq. Although neither whole-genome sequencing nor whole-exome sequencing in our study achieved high coverage and more sequencing data are needed to identify additional rare mutations, our analysis provides a more comprehensive catalog of genomic alterations in TCC. Notably, we found highly frequent genomic alterations in genes involved in the SCCS process, including alterations in STAG2, ESPL1 and NIPBL, and a recurrent FGFR3-TACC3 fusion in TCC. In contrast to other cancers in which only rare or low-frequency alterations in SCCS genes have been observed10,22,23, bladder cancer is the first type of cancer, to our knowledge, with predominant genetic lesions in genes involved in the SCCS process. Although we do not yet understand the detailed mechanisms by which STAG2 may contribute to TCC pathogenesis, the discovery of highly recurrent alterations in the genes involved in the SCCS process in TCC identifies yet another pathway that is undoubtedly relevant for TCC. Our analysis suggests the necessity of comprehensive investigation of SCCS genes to elucidate their exact roles in causing aneuploidy or bladder tumorigenesis.

ONLINE METHODS

Sample description and preparation.

Tumor samples with matched peripheral blood or normal controls (morphologically adjacent normal bladder tissues) were obtained from individuals newly diagnosed with TCC at the member institutions of the Urinogenital Cancer Genomics Consortium (UCGC) in China. Each subject was properly informed before recruitment for the study, according to the regulations of the institutional ethics review boards. Detailed clinical information for the subjects is summarized in Supplementary Table 1. All specimens were snap frozen in liquid nitrogen upon collection and were immediately stored at −80 °C for further study. Sections stained with hematoxylin and eosin, prepared using cancerous tissues, were microscopically evaluated by two independent pathologists. In the present study, only TCCs with malignant cell purities over 85% were selected for DNA extraction and subsequent sequencing.

Genomic DNA extraction and Illumina-based whole-genome and whole-exome sequencing.

Genomic DNA from tumor and matched peripheral blood samples for the 99 individuals with TCC was isolated using QIAamp DNA Mini kits (Qiagen), and DNA libraries were constructed according to the protocol provided by the manufacturer (Illumina). Whole-genome sequencing data were generated using the Illumina HiSeq 2000 platform in 2 × 100-bp paired-end reads.

For whole-exome sequencing, genomic DNA from the same 99 tumor-blood pairs was fragmented and subjected to whole-exome capture with the SureSelect Human All Exon 50Mb kit (Agilent Technologies) following the manufacturer’s protocols. Exome capture libraries were then sequenced on the HiSeq 2000 platform according to the manufacturer’s instructions, and 2 × 100-bp paired-end reads were generated.

Total RNA extraction and Illumina-based RNA-seq.

RNA from 42 bladder tumors and 16 matched normal bladder tissues was used to generate mRNA-seq libraries using the TruSeq RNA Sample Preparation kit (Illumina). mRNA-seq libraries were then sequenced on the HiSeq 2000 platform according to the manufacturer’s recommendations, and 2 × 90-bp paired-end reads were generated. After removing reads containing sequencing adaptors and reads of low quality, we aligned reads to the human genome (hg18) and Ensembl annotated genes (release 64) using SOAP2 (ref. 24), allowing up to two mismatches. Gene expression levels based on RNA-seq data were measured in RPKM25.
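The RPKM measure mentioned above can be illustrated with a minimal sketch. The function name and the example counts are hypothetical and are not values from this study; the formula follows the definition given in the Figure 3 legend (reads per kilobase of exon region in a gene per million mapped reads).

```python
def rpkm(read_count: int, exon_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of exon region per million mapped reads."""
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases -> kilobases) * 1e6 (reads -> millions of reads)
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads on 2,000 bp of exons, 20 million mapped reads.
example_rpkm = rpkm(500, 2000, 20_000_000)
```

Normalizing by both exon length and library size is what makes values comparable between genes and between samples, as in the TACC3 outlier comparison.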

Whole-exome sequencing read mapping and detection of somatic mutations.

After removing whole-exome sequencing reads containing sequencing adaptors and low-quality reads with more than five unknown bases, high-quality paired-end reads were gap aligned to the NCBI human reference genome (hg18) using Burrows-Wheeler Aligner (BWA)26. We then performed local realignment of the BWA-aligned reads using the Genome Analysis Toolkit (GATK)27. Raw lists of potential somatic substitutions were called by VarScan (v2.2)28 on the basis of BWA alignments. In this process, several heuristic rules were applied: (i) both tumor and matched blood samples should be covered sufficiently (≥8×) at the genomic positions being compared; (ii) average base quality for a given genomic position should be no less than 15 in both the tumor and blood samples; (iii) variants should be supported by at least 20% of the total reads from tumors, and no high-quality variant-supporting reads were allowed in blood samples; and (iv) variants should be supported by at least 3 reads in tumors. Using these same criteria, the preliminary list of somatic indels was called by GATK on the basis of local realignment results. After these two steps, germline variants could be effectively removed. To further reduce the number of false positive calls, variations in tumors, including single-nucleotide variants (SNVs) and indels, were called with the SAMtools software package. We eliminated all somatic variants that fulfilled any one of the following filtering criteria: (i) variants with Phred-like scaled consensus scores or SNP qualities of <20; (ii) variants with mapping qualities of <30; (iii) indels represented by only one DNA strand; and (iv) substitutions located within 30 bp of a dubious indel. To eliminate previously described germline variants, somatic mutations were cross-referenced against the dbSNP (version 132) database and SNP data sets from the 1000 Genomes Project. Any mutations present in these data sets were filtered out, and the remaining mutations were subjected to subsequent analyses.
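The four heuristic rules applied during somatic substitution calling can be sketched as a simple predicate. This is an illustrative reading of the rules, not the VarScan implementation; the class and field names are hypothetical, and the per-site summaries are assumed to have been computed upstream.

```python
from dataclasses import dataclass


@dataclass
class CandidateSite:
    """Per-site summary for one tumor/blood pair (field names are hypothetical)."""
    tumor_depth: int
    blood_depth: int
    tumor_mean_base_quality: float
    blood_mean_base_quality: float
    tumor_variant_reads: int
    blood_high_quality_variant_reads: int


def passes_somatic_heuristics(site: CandidateSite) -> bool:
    """Apply heuristic rules (i)-(iv) from the text to one candidate substitution."""
    # (i) both samples covered at >= 8x at this position
    if site.tumor_depth < 8 or site.blood_depth < 8:
        return False
    # (ii) average base quality of at least 15 in both samples
    if site.tumor_mean_base_quality < 15 or site.blood_mean_base_quality < 15:
        return False
    # (iii) variant supported by >= 20% of tumor reads, and no high-quality
    # variant-supporting reads in the matched blood sample
    if site.tumor_variant_reads / site.tumor_depth < 0.20:
        return False
    if site.blood_high_quality_variant_reads > 0:
        return False
    # (iv) at least 3 variant-supporting reads in the tumor
    return site.tumor_variant_reads >= 3
```

Rule (iv) matters mainly at low depth: at exactly 8× to 14× coverage, a 20% allele fraction can be reached with only 2 reads, which rule (iv) rejects.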

Validation of somatic substitutions and indels by Sanger sequencing.

Non-silent somatic substitutions and indels were validated by Sanger sequencing based on PCR amplification. PCR primers for putative somatic variants were designed by Primer3 in silico and initially used to amplify source DNA from tumors. PCR was performed on the Dual 96-Well GeneAmp PCR System 9700 (Applied Biosystems), and 20 ng of template DNA from each sample was used per reaction. Products were sequenced with the 3730xl DNA Analyzer (Applied Biosystems). All sequences were analyzed with Sequencing Analysis Software Version 5.2 (Applied Biosystems). If mutations were successfully confirmed in tumors, the same primer pairs were used to amplify DNA from the blood of the same subjects to determine the somatic status of the observed mutations.

Analysis of significantly mutated genes.

The background mutation rate was estimated on the basis of the number of synonymous mutations identified by whole-exome sequencing in the 99 tumors and was defined as the product of the synonymous mutation rate and the ratio of nonsynonymous to synonymous mutations (1.4) observed in the HapMap database. Briefly, synonymous somatic mutations were classified into seven categories according to their sequence contexts and mutation types, as described in our previous study29. For each mutation category, $i$, we let the observed number of mutations for this category be $m_i$ and the total number of successfully sequenced nucleotides (≥8×) for this category in the 99 tumors be $n_i$. The background mutation rate for this category, $b_i$, was calculated as $1.4 \times m_i/n_i$. To test whether the rate of non-silent mutations for a given gene was significantly higher than the background rate, confirmed mutation data for the gene were obtained from whole-exome sequencing. We then estimated the passenger probability for each gene in turn as described by Sjöblom et al.30. Specifically, the probability ($p_{gi}$) of obtaining the observed number of mutations for each category ($i$) in gene $g$ was estimated from a binomial distribution, with $b_i$ being the success probability. The number of available nucleotides for each category was the total number of sufficiently covered (≥8×) bases for that particular category in all 99 tumors. The passenger probability ($p_g$) for gene $g$ was calculated as the product of the seven category-specific probabilities:

$$p_g = \prod_{i=1}^{7} p_{gi}$$

We then determined the P value for each gene with the likelihood-ratio test as described by Getz et al.31. We considered the genes showing significantly (P < 0.01) higher mutation rates than the background rate and harboring non-silent mutations in at least 5 of the 99 tumors to be significantly mutated.
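The background rate and passenger probability described above can be sketched as follows. This is a minimal illustration under stated assumptions: the category-specific probability is interpreted here as the binomial point probability of the observed count (a tail probability would be another plausible reading), the function names are hypothetical, and the likelihood-ratio test of Getz et al. is not reproduced.

```python
import math
from typing import Iterable, Tuple


def background_rate(m_i: int, n_i: int, ns_ratio: float = 1.4) -> float:
    """b_i = 1.4 * m_i / n_i for one mutation category."""
    return ns_ratio * m_i / n_i


def binomial_log_pmf(k: int, n: int, p: float) -> float:
    """log P(X = k) for X ~ Binomial(n, p); requires 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    # lgamma keeps the binomial coefficient in log space for large n
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_comb + k * math.log(p) + (n - k) * math.log1p(-p)


def passenger_probability(categories: Iterable[Tuple[int, int, float]]) -> float:
    """Product of category-specific probabilities for one gene.

    Each item is (observed mutations, covered bases, b_i) for one category;
    the paper uses seven categories, but any number is accepted here.
    """
    log_p = sum(binomial_log_pmf(k, n, b) for k, n, b in categories)
    return math.exp(log_p)
```

Working in log space avoids underflow when the number of covered bases per category is large and the per-base rate is small.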

Screen of STAG2 mutations in additional individuals with TCC.

We designed 33 pairs of primers that uniquely amplify exons 3 to 35 of the STAG2 gene (Supplementary Table 12) using Primer5.0 (Premier Biosoft). PCR was performed on the Dual 96-Well GeneAmp PCR System 9700 using DNA from tumor and matched blood samples for an additional 50 individuals with TCC. Products were sequenced with a 3730xl DNA Analyzer. All sequences were analyzed with Sequencing Analysis Software Version 5.2.

STAG2 promoter methylation analysis by bisulfite sequencing.

Genomic DNA (0.5–2 μg) from tumor and normal bladder tissues from 30 cases was submitted to bisulfite modification using the EpiTect Bisulfite kit (Qiagen) according to the manufacturer’s instructions. The methylation status of the putative STAG2 promoter, which was predicted by Proscan32, was tested by bisulfite sequencing PCR. Bisulfite PCR primers (Supplementary Table 12) were designed using the online program MethPrimer33. Bisulfite-treated DNA (2 μl) was amplified in a 50-μl reaction using HotStarTaq DNA polymerase (Qiagen) with the following cycling conditions: 10 min at 95 °C, 45 cycles of 1 min at 94 °C, 45 s at 61 °C and 1 min at 72 °C, and a final extension step at 72 °C for 10 min. PCR products were purified and ligated into the pMD-18 T vector. Ten white clones for each sample were randomly selected for sequencing to determine the methylation status of each CpG site. BISMA (Bisulfite Sequencing DNA Methylation Analysis) software34 was used to determine the methylation status for each CpG site and to present the methylation pattern. The log2 ratio of fold change (tumor versus normal) in the methylation level at the STAG2 promoter for each of the 30 subjects was calculated. The χ² test was performed to calculate P values and identify differentially methylated samples. A STAG2 promoter with ≥10% methylation, a log2 ratio of ≥1 and a calculated P value of ≤0.001 was defined as hypermethylated.
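The three-part hypermethylation definition can be sketched as below. Several details are assumptions made for illustration only: the χ² test is taken to be a 2×2 test (1 degree of freedom, no continuity correction) on methylated versus unmethylated CpG observations pooled over the sequenced clones, the ≥10% threshold is applied to the tumor sample, and all function names are hypothetical.

```python
import math


def chi2_2x2_p_value(a: int, b: int, c: int, d: int) -> float:
    """P value of a 2x2 chi-square test (1 d.f.) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # degenerate table: no evidence of a difference
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, the upper tail is erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def is_hypermethylated(tumor_meth: int, tumor_unmeth: int,
                       normal_meth: int, normal_unmeth: int) -> bool:
    """>=10% methylation in tumor, log2(tumor/normal) >= 1 and P <= 0.001."""
    tumor_level = tumor_meth / (tumor_meth + tumor_unmeth)
    normal_level = normal_meth / (normal_meth + normal_unmeth)
    if tumor_level == 0.0:
        return False
    # An unmethylated normal sample gives an unbounded ratio.
    log2_ratio = math.inf if normal_level == 0.0 else math.log2(tumor_level / normal_level)
    p_value = chi2_2x2_p_value(tumor_meth, tumor_unmeth, normal_meth, normal_unmeth)
    return tumor_level >= 0.10 and log2_ratio >= 1.0 and p_value <= 0.001
```

Requiring all three conditions guards against calling a sample hypermethylated on the basis of a large fold change between two very low methylation levels.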

CNA analysis.

We used SegSeq35 to infer somatic CNAs in TCC genomes using whole-genome sequencing reads. Resulting copy number segments were mapped to individual genes to determine gene-level copy numbers and copy gain or loss status using thresholds of ≥2.5 copies for gain and ≤1.5 copies for loss. To infer recurrently amplified or deleted genomic regions, we reimplemented the GISTIC algorithm14, using copy numbers in 100-kb windows instead of SNP array probes as markers. G scores were calculated for genomic regions on the basis of the frequency and amplitude of amplifications or deletions. A significant CNA region was defined as having amplification G score > 0.08 or deletion G score < 0.08, corresponding to a P-value threshold of 0.01 from the permutation-derived null distribution14.
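The gene-level gain and loss calls can be sketched as follows. The thresholds (≥2.5 copies for gain, ≤1.5 copies for loss) come from the text; the way segments are mapped to a gene is not specified there, so the overlap-weighted mean used here is an assumption, as are the function names and the half-open coordinate convention.

```python
from typing import List, Optional, Tuple


def gene_copy_number(gene_start: int, gene_end: int,
                     segments: List[Tuple[int, int, float]]) -> Optional[float]:
    """Overlap-weighted mean copy number of the segments covering a gene.

    segments: (start, end, copy_number) on the gene's chromosome, half-open.
    Returns None when no segment overlaps the gene.
    """
    weighted_sum = 0.0
    covered = 0
    for seg_start, seg_end, copy_number in segments:
        overlap = min(gene_end, seg_end) - max(gene_start, seg_start)
        if overlap > 0:
            weighted_sum += overlap * copy_number
            covered += overlap
    return weighted_sum / covered if covered else None


def copy_status(copy_number: float) -> str:
    """Classify a gene-level copy number using the thresholds in the text."""
    if copy_number >= 2.5:
        return "gain"
    if copy_number <= 1.5:
        return "loss"
    return "neutral"
```

For example, a gene spanning positions 100 to 200 that is covered half by a 2-copy segment and half by a 4-copy segment would receive a gene-level value of 3.0 and be called a gain.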

Validation of CNAs by real-time quantitative PCR.

Validation of CNAs was performed by real-time quantitative PCR using SYBR Premix Ex Taq II (TAKARA) on an ABI StepOne instrument (Applied Biosystems) as previously described36. CNA primers were designed using Primer3 to amplify a 100- to 200-bp fragment within each CNA using sequences from the UCSC Genome Browser. Five endogenous control primers were also designed using ultraconserved sequences (Supplementary Table 13). First, a set of gradient dilutions (1:10; 30 ng–0.003 ng) of genomic DNA from a healthy individual was used for real-time PCR amplification to generate a standard curve for each pair of primers, and each pair of CNA primers was then matched with a pair of endogenous control primers with similar amplification efficiency. Second, each pair of CNA and endogenous control primers was used for real-time PCR amplification for each sample in parallel. ΔCT was calculated by subtracting the endogenous control CT value from the CNA CT value. ΔΔCT was then determined by subtracting the reference sample ΔCT value from the test ΔCT value. The log2 ratio, as expressed by ΔΔCT, for each CNA was then compared with the log2 ratio of the copy number changes detected in whole-genome sequencing data.
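The ΔΔCT arithmetic described above can be written out as a short sketch. The function names and CT values are hypothetical. The conversion to a log2 copy number ratio uses the common convention that, with matched and near-100% amplification efficiencies, relative copy number is 2 raised to the power of minus ΔΔCT; that sign convention is an assumption here and is not stated in the text.

```python
def delta_delta_ct(test_cna_ct: float, test_control_ct: float,
                   ref_cna_ct: float, ref_control_ct: float) -> float:
    """ddCT = (CNA CT - control CT) in test minus the same difference in reference."""
    test_delta = test_cna_ct - test_control_ct
    ref_delta = ref_cna_ct - ref_control_ct
    return test_delta - ref_delta


def log2_copy_ratio(ddct: float) -> float:
    """log2 of relative copy number, assuming ratio = 2 ** (-ddCT)."""
    # Fewer cycles to threshold (lower CT) means more template, hence the sign flip.
    return -ddct


# Hypothetical tumor that crosses threshold one cycle earlier than the reference.
ddct = delta_delta_ct(24.0, 25.0, 26.0, 26.0)
ratio_log2 = log2_copy_ratio(ddct)
```

Matching each CNA primer pair to a control pair of similar efficiency is what justifies the fixed base of 2 in this conversion.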

Validation of DHFR amplifications by FISH.

DHFR gene amplifications were further validated by FISH using formalin-fixed, paraffin-embedded slides of seven TCC cases that were also measured by real-time quantitative PCR as described above. All FISH assays were performed using DHFR SpectrumOrange/CEP 5 Spectrum Green oligonucleotide FISH probes (Agilent Technologies)37 according to the manufacturer’s protocol. Briefly, 4-μm paraffin-embedded tissue sections were deparaffinized and digested with proteinase, and antigen retrieval was performed. Sections were then hybridized with the DHFR and CEP 5 probes at 37 °C for 18–24 h using an Abbott Molecular Thermobrite. After slides were washed and counterstained with DAPI, signal analysis was performed on an IX71 fluorescence microscope (Olympus). The gene amplification ratio (the gene per chromosome per cell ratio) was evaluated according to the manufacturer’s protocol (Agilent Technologies).

Detection and validation of gene fusions.

We used SOAPfuse38 to detect gene fusion events from RNA-seq data using the default parameters. After mapping RNA-seq reads against the human reference genome sequence (hg18) and Ensembl annotated genes (release 64), SOAPfuse seeks two types of reads to support fusion detection: discordant mapping paired-end reads that connect candidate gene pairs and junction reads that confirm the exact junction sites. SOAPfuse can detect fusion events generated by genome rearrangements with breakpoints in intron and exon regions. Finally, we identified a recurrent gene fusion, FGFR3-TACC3, in B59-3 and B100 tumors.

To validate gene fusions on the RNA and DNA levels, PCR primers (Supplementary Table 14) were designed using flanking sequences from both sides of the predicted breakpoints by Primer5.0, and PCR products across the breakpoints ranged from 100 bp to 600 bp in length. PCR was performed on a GeneAmp PCR System 9700 thermal cycler using template cDNA or DNA from both tumors and matched normal controls. Products were gel selected when there were non-specific bands and were sequenced with a 3730xl DNA Analyzer. Sequences were analyzed with Sequencing Analysis Software Version 5.2 and compared by BLAST to the reference genome (hg18) to validate rearrangement events and exact breakpoints.

Pathway enrichment analysis.

We performed pathway enrichment analysis using WebGestalt39 by examining the distribution of genes with somatic mutations or copy number changes within the KEGG database. The significance of mutation enrichment was determined by a hypergeometric test and was adjusted for multiple testing with the Benjamini-Hochberg false discovery rate (FDR).
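The statistical steps named above, a hypergeometric enrichment test followed by Benjamini-Hochberg adjustment, can be sketched with the standard library alone. This is not the WebGestalt implementation; the function and parameter names are hypothetical, and the gene counts in any call are the user's own.

```python
import math
from typing import List


def hypergeom_enrichment_p(k: int, population: int,
                           pathway_genes: int, altered_genes: int) -> float:
    """P(X >= k): chance of drawing at least k pathway genes among the altered genes."""
    upper = min(pathway_genes, altered_genes)
    total = math.comb(population, altered_genes)
    # math.comb returns 0 when the lower index exceeds the upper one.
    favorable = sum(
        math.comb(pathway_genes, x)
        * math.comb(population - pathway_genes, altered_genes - x)
        for x in range(k, upper + 1)
    )
    return favorable / total


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A pathway would then be reported as enriched when its adjusted P value falls below the chosen FDR threshold, which the text does not state.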

Supplementary Material

Supplemental table 4

Supplemental table 3

Supplemental figures tables

ACKNOWLEDGMENTS

This work was supported by grants from the Chinese High-Tech (863) Program (2012AA02A201 and 2012AA02A208), the Guangdong Innovative Research Team Program (2009010016), the State Key Development Program for Basic Research of China-973 Program (2011CB809203 and 2014CB745200) and the Shenzhen Municipal Government of China (JC201005260191A, CXB201108250096A, ZDSY20120615154448514 and BGI20100001).

Footnotes

METHODS

Methods and any associated references are available in the online version of the paper.

Accession codes. All sequencing data from this study have been deposited in the Sequence Read Archive (SRA) under accession SRA063495.

Note: Any Supplementary Information and Source Data files are available in the online version of the paper.

COMPETING FINANCIAL INTERESTS

The authors declare no competing financial interests.
