A survey of homozygous deletions in human cancer genomes

Abstract

Homozygous deletions of recessive cancer genes and fragile sites are known to occur in human cancers. We identified 281 homozygous deletions in 636 cancer cell lines. Of these deletions, 86 were homozygous deletions of known recessive cancer genes, 17 were of sequenced common fragile sites, and 178 were in genomic regions that do not overlap known recessive cancer genes or fragile sites (“unexplained” homozygous deletions). Some cancer cell lines have multiple homozygous deletions whereas others have none, suggesting intrinsic variation in the tendency to develop this type of genetic abnormality (P < 0.001). The 178 unexplained homozygous deletions clustered into 131 genomic regions, 27 of which exhibit homozygous deletions in more than one cancer cell line. This degree of clustering indicates that the genomic positions of the unexplained homozygous deletions are not randomly determined (P < 0.001). Many homozygous deletions, including those that are in multiple clusters, do not overlap known genes and appear to be in intergenic DNA. Therefore, to elucidate further the pathogenesis of homozygous deletions in cancer, we investigated the genome landscape within unexplained homozygous deletions. The gene count within homozygous deletions is low compared with the rest of the genome. There are also fewer short interspersed nuclear elements (SINEs), long interspersed nuclear elements (LINEs), and low-copy-number repeats (LCRs). However, DNA within homozygous deletions has higher flexibility. These features may signal the presence of currently unrecognized zones of susceptibility to DNA rearrangement. They may also reflect a tendency to reduce the adverse effects of homozygous deletions by minimizing the number of genes removed.


Somatically acquired homozygous deletion is one of several mutational mechanisms through which the proteins encoded by recessive cancer genes (tumor-suppressor genes) are inactivated. Positional cloning of many recessive cancer genes, for example, CDKN2A/p16 (1), PTEN (2), hMAD4 (3), SMARCB1 (4), and RB1 (5) has critically depended on fine mapping of homozygous deletions. Homozygous deletions in human cancers have also been reported over common fragile sites, genomic regions that manifest high breakage frequencies in normal cells after exposure to agents such as aphidicolin that interfere with DNA replication (6, 7). The elevated frequency of breakage in cancer cells probably reflects the presence of DNA-repair defects that result in expression of the intrinsic structural fragility of these genomic regions, although it is possible that deletion of genes in the vicinity of fragile sites also confers clonal selective advantage.

Because homozygous deletions have been instrumental in the identification of recessive cancer genes, searches for additional homozygous deletions in cancer genomes have been conducted. In many instances, however, unambiguous identification of the genes that are the putative targets of these homozygous deletions has not been reported. This inability may be attributable to insufficiently exhaustive screening for other types of mutation, such as base substitutions or small frameshifting insertions/deletions that induce premature translational termination, the hallmarks of most recognized recessive cancer genes. It is also conceivable that some recessive cancer genes are inactivated exclusively by large homozygous deletions or by this mechanism together with aberrant methylation of promoter regions and consequent transcriptional repression. Alternatively, some homozygous deletions may not confer any clonal growth advantage and may occur in regions of the genome with architectural features that result in susceptibility to DNA rearrangements. Finally, a subset of homozygous deletions may simply be random events occurring as a result of abnormal DNA-repair processes in cancer cells but are unrelated to either genome structure or the causation of oncogenesis.

The advent of the finished human genome sequence provides new avenues through which we can investigate the development of genetic disease. In this study, we have identified a large series of homozygous deletions in human cancers and examined the structure of the genome within them to characterize their patterns and elucidate their pathogenesis.

Materials and Methods

Detection of Homozygous Deletions. The studies were reviewed by the Cambridge (Addenbrookes) Local Research Ethics committee. Cancer cell lines were obtained from a number of public resources and were cultured under recommended conditions. DNAs from cancer cell lines were first analyzed by using the set of Applied Biosystems LMS-MD10 400 polymorphic microsatellite markers on ABI 3100 DNA sequencers to determine regions of loss of heterozygosity (LOH) and to detect duplicates. All duplicates of cell lines were removed from the analyses. For the identification of homozygous deletions, we divided the genome into 36 regions. Cell lines showing LOH in a particular genomic region were examined for homozygous deletions by using markers within the region that amplify cytosine-adenine (CA) dinucleotide (microsatellite) repeats. These markers were derived from the set previously reported by the Genethon group (8) and are spaced, on average, at ≈500-kb intervals. Markers were analyzed on each cell line in duplicate in a PCR-based TaqMan assay by using a CA11 TaqMan probe oligonucleotide labeled with 6-carboxy-fluorescein (FAM) and tetramethylrhodamine (TAMRA). End-point analysis was performed on a Spectrofluor Plus microplate reader (Tecan, Reading, U.K.) by recording FAM emissions (excite 495 nm/absorb 535 nm) and TAMRA emissions (excite 495 nm/absorb 585 nm). Degradation of the TaqMan probe by a successful PCR results in an increase of the FAM emission signal, which is suppressed in the intact TaqMan probe by the TAMRA. Failure to degrade the TaqMan probe because of the presence of a homozygous deletion results in an altered FAM-to-TAMRA ratio. If the TaqMan assay suggested the presence of a homozygous deletion, PCRs were repeated and examined on agarose gels.
To exclude the possibility that nucleotide polymorphisms in a primer site might be responsible for failure to amplify, putative homozygous deletions were, in every case, confirmed by analysis of additional PCR markers by using nonoverlapping primers located within 100 bp of the original marker. Several confirmed homozygous deletions were investigated with a denser local-marker map to refine the position of boundaries and in additional cancer cell lines to detect other small homozygous deletions. To estimate the proportion of homozygous deletions that are potentially due to copy-number polymorphisms, Epstein–Barr-virus-transformed lymphoblastoid cell lines, derived from the same individuals as a subset of cancer lines with homozygous deletions, were analyzed for copy number and heterozygosity by using a GeneChip human mapping 10K array (Affymetrix, Santa Clara, CA) as described in ref. 9.

Statistical Methods. All statistics were calculated with the aid of matlab (The MathWorks, Natick, MA). All categorical comparisons were made with the χ² test. Where a more general comparison of distributions was required, a Wilcoxon rank-sum test was calculated. Several comparisons relied on detailed information regarding the genome local to markers, which was extracted from ensembl (10).

A threshold of five contiguous markers with homozygous genotypes was used to identify areas of LOH. To ensure that this cutoff gives a low false-positive rate of LOH detection, the genotyping results of 25 normal samples were analyzed with the following bootstrapping method. For each marker, two alleles were randomly selected from all the normal samples at that marker. This procedure was repeated 25 times to give a new set of alleles at that marker and was then repeated across all 400 markers. The number of times that five or more contiguous homozygous markers occurred was totaled. This simulation was repeated 3,000 times, and the results were averaged to obtain the mean false-positive rate for 25 samples. This estimate was scaled to give the false-positive rate per sample.
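The bootstrap described above can be sketched in Python as follows. This is an illustration only, not the authors' matlab code: the function names and the data layout (one pool of alleles per marker) are assumptions, alleles are assumed to be drawn with replacement, and the sketch treats the 400 markers as a single ordered list, so it does not stop runs at chromosome boundaries.

```python
import random

def count_homozygous_runs(genotypes, min_run=5):
    """Count maximal runs of at least min_run contiguous homozygous markers."""
    runs, length = 0, 0
    for allele_a, allele_b in genotypes:
        if allele_a == allele_b:
            length += 1
        else:
            if length >= min_run:
                runs += 1
            length = 0
    if length >= min_run:  # a run that reaches the last marker
        runs += 1
    return runs

def bootstrap_false_positive_rate(normal_alleles, n_samples=25, n_sims=3000,
                                  min_run=5, seed=0):
    """Mean number of apparent LOH regions per simulated normal sample.

    normal_alleles[m] is the pool of all alleles observed at marker m in the
    normal samples (hypothetical layout chosen for this sketch).
    """
    rng = random.Random(seed)
    total_runs = 0
    for _ in range(n_sims):
        for _ in range(n_samples):
            # Draw two alleles per marker from the pooled normal alleles.
            genotype = [(rng.choice(pool), rng.choice(pool))
                        for pool in normal_alleles]
            total_runs += count_homozygous_runs(genotype, min_run)
    # Scale the total to a per-sample rate.
    return total_runs / (n_sims * n_samples)
```

The default arguments mirror the 25 samples and 3,000 repetitions given in the text; smaller values are advisable for a quick trial.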

To examine whether cell lines exhibit different rates of homozygous deletion, the deletion–variation statistic was introduced. This statistic is defined as the standard deviation of the number of homozygous deletions across all cell lines. The significance of the observed statistic was then determined by Monte Carlo simulation as follows. All the observed homozygous deletions were reassigned to randomly selected cell lines. Because different cell lines have different amounts of LOH and, hence, were examined with different numbers of TaqMan markers, the probability of reassignment of each homozygous deletion to each cell line was proportional to the number of markers examined in that cell line. The total number of homozygous deletions reassigned to each cell line was then calculated. The standard deviation of these counts across all the cell lines was then derived to obtain the deletion–variation statistic of the simulated counts. This was repeated 10,000 times to obtain the expected distribution of the deletion–variation statistic. The deletion–variation statistic for the observed homozygous deletions was then compared with this expected distribution to provide a P value. Because sets of cell lines with a greater range of homozygous deletion rates will increase this statistic, the comparison of the observed statistic to the expected distribution was one-tailed. Identifying small P values with precision by using this method can be achieved only by increasing the number of Monte Carlo runs. Limits on computing power and time restrict this precision; so, as a compromise, an upper bound for small P values has been provided.
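A minimal Python sketch of this Monte Carlo test is given below. The function names are hypothetical, the use of the population (rather than sample) standard deviation is an assumption, and the (hits + 1)/(runs + 1) estimator is one common way of reporting an upper bound for small P values, not necessarily the one used in the study.

```python
import random
import statistics

def deletion_variation(counts):
    """Standard deviation of per-cell-line deletion counts (population SD assumed)."""
    return statistics.pstdev(counts)

def deletion_variation_p_value(observed_counts, markers_examined,
                               n_runs=10000, seed=0):
    """One-tailed Monte Carlo P value for the deletion-variation statistic.

    observed_counts[i]  : homozygous deletions found in cell line i
    markers_examined[i] : number of TaqMan markers examined in cell line i
    """
    rng = random.Random(seed)
    n_lines = len(observed_counts)
    total_deletions = sum(observed_counts)
    observed = deletion_variation(observed_counts)
    hits = 0
    for _ in range(n_runs):
        simulated = [0] * n_lines
        # Reassign every deletion to a cell line with probability
        # proportional to the number of markers examined in that line.
        for line in rng.choices(range(n_lines), weights=markers_examined,
                                k=total_deletions):
            simulated[line] += 1
        if deletion_variation(simulated) >= observed:
            hits += 1
    # Conservative estimate: never smaller than 1 / (n_runs + 1).
    return (hits + 1) / (n_runs + 1)
```

With 10,000 runs the smallest value this function can return is roughly 0.0001, which is why only an upper bound can be quoted for very small P values.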

To examine the degree of positional clustering of homozygous deletions in the genome, the deleted-marker count and maximum-overlap statistics were introduced. The deleted-marker count is defined as the number of markers deleted in at least one cell line. The maximum-overlap statistic is defined as the maximum number of cell lines deleting any single marker. Both statistics were calculated for each chromosome. In addition, the results were summed across all chromosomes to obtain genome-wide statistics. Monte Carlo simulation was performed to assess the significance of the observed statistics as follows. For each cell line, the deletions on each chromosome arm were cycled to simultaneously reposition the deletions at randomly selected TaqMan-marker positions. This process randomly repositions all deletions while preserving the level of deletion by both cell line and chromosome arm. Using the simulated data, we calculated both statistics for each chromosome and obtained a pair of genome-wide statistics by adding the statistics across all the chromosomes. This procedure was repeated 10,000 times to derive the expected distribution of both statistics. By comparing the statistics of the observed data to these expected distributions, P values were derived. Because a higher degree of positional clustering of homozygous deletions results in a smaller deleted-marker count and a greater maximum-overlap statistic, these tests are both one-tailed in opposing directions. As before, only an upper bound for small P values was derived.
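The two clustering statistics and the cyclic repositioning step might be sketched as follows. All names and the data layout (one Boolean row per cell line for each chromosome arm) are assumptions made for illustration; for brevity the sketch computes and sums the statistics per arm, whereas the text computes them per chromosome.

```python
import random

def clustering_stats(arm_matrix):
    """Return (deleted-marker count, maximum overlap) for one arm.

    arm_matrix[line][marker] is True where that cell line has the marker deleted.
    """
    per_marker = [sum(column) for column in zip(*arm_matrix)]
    deleted_markers = sum(1 for count in per_marker if count > 0)
    return deleted_markers, max(per_marker, default=0)

def rotate(row, offset):
    """Cycle a row of marker states by offset positions."""
    return row[-offset:] + row[:-offset] if offset else list(row)

def genome_stats(genome):
    """Sum both statistics over all arms; genome maps arm name -> matrix."""
    deleted_total = overlap_total = 0
    for matrix in genome.values():
        deleted, overlap = clustering_stats(matrix)
        deleted_total += deleted
        overlap_total += overlap
    return deleted_total, overlap_total

def clustering_p_values(genome, n_runs=10000, seed=0):
    """One-tailed Monte Carlo P values (deleted-marker count, maximum overlap)."""
    rng = random.Random(seed)
    obs_deleted, obs_overlap = genome_stats(genome)
    low_deleted = high_overlap = 0
    for _ in range(n_runs):
        # Cycle each cell line's deletions independently on each arm, which
        # preserves the amount of deletion per cell line and per arm.
        shuffled = {
            arm: [rotate(row, rng.randrange(len(row))) for row in matrix]
            for arm, matrix in genome.items()
        }
        sim_deleted, sim_overlap = genome_stats(shuffled)
        low_deleted += sim_deleted <= obs_deleted    # clustering lowers this
        high_overlap += sim_overlap >= obs_overlap   # clustering raises this
    return (low_deleted + 1) / (n_runs + 1), (high_overlap + 1) / (n_runs + 1)
```

The two tails are taken in opposite directions, as described above: clustering reduces the deleted-marker count and increases the maximum overlap.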

The DNA-flexibility and -stability analyses were conducted by using the flexstab algorithm (http://leonardo.ls.huji.ac.il/departments/genesite/faculty/bkerem.htm) (11, 12). To investigate the possibility that differences in GC content between deleted and nondeleted markers could fully account for observed differences in flexibility/stability scores, logistic regression was applied. The deletion status of each marker analyzed provided a series of binary responses for the model (1 = deleted, and 0 = not deleted). GC content and flexibility scores provided the two explanatory factors. The significance of each factor was then derived by using standard techniques.
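For readers unfamiliar with the approach, a logistic regression with two explanatory factors can be sketched as below. The text does not specify the "standard techniques" used to derive significance; this sketch assumes Newton–Raphson fitting and Wald tests, and the function names are hypothetical.

```python
import math

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [value / scale for value in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def fit_logistic(rows, y, n_iter=25):
    """Fit P(deleted) = 1 / (1 + exp(-(b0 + b1*x1 + b2*x2))).

    rows : list of (gc_content, flexibility) tuples, one per marker
    y    : list of 0/1 deletion statuses
    Returns (coefficients, Wald P values), intercept first.
    """
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    beta = [0.0] * k
    cov = None
    for _ in range(n_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, target in zip(design, y):
            z = sum(b * v for b, v in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-z))
            w = p * (1.0 - p)
            for a in range(k):
                grad[a] += (target - p) * x[a]
                for b in range(k):
                    hess[a][b] += w * x[a] * x[b]
        cov = invert(hess)  # approximate covariance of the estimates
        step = [sum(cov[a][b] * grad[b] for b in range(k)) for a in range(k)]
        beta = [b + s for b, s in zip(beta, step)]
    # Two-sided Wald test: z = estimate / standard error.
    p_values = [math.erfc(abs(b) / math.sqrt(cov[i][i]) / math.sqrt(2))
                for i, b in enumerate(beta)]
    return beta, p_values
```

A factor whose P value remains small when the other factor is in the model contributes information beyond that other factor, which is the question asked here of flexibility relative to GC content.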

Results

Cancer cell lines (n = 636) were screened for LOH by using 400 polymorphic microsatellite repeats distributed, on average, at 10-cM intervals through the genome. LOH was determined by the presence of at least five consecutive polymorphic markers showing homozygous genotypes. To assess this criterion, 25 Epstein–Barr-virus-transformed lymphoblastoid-cell-line DNAs were genotyped by using the 400-marker panel. Simulations based on the resultant data by using bootstrapping methods established a false-positive rate of approximately one region of LOH in 16 cell lines (see Materials and Methods). We also directly evaluated the presence of runs of five or more contiguous homozygous microsatellite markers in the 25 lymphoblastoid cell lines. Seven such runs were found. This number is significantly more than would be expected from the simulations and may reflect LOH in lymphoblastoid lines during in vitro passage. However, several thousand regions of LOH were detected in the 636 cancer cell lines. Therefore, both analyses demonstrate that the overwhelming majority of putative LOH regions detected in cancer cell lines by using this approach represent somatic events.

Cancer cell lines (n = 505) exhibiting LOH in at least one genomic region were screened for homozygous deletions by using a TaqMan assay applied to a map of 5,321 markers positioned, on average, at ≈500-kb intervals through the genome. Homozygous deletions (n = 281) were detected in 174 cell lines (see www.sanger.ac.uk/genetics/CGP/deletionmapping or Table 3, which is published as supporting information on the PNAS web site, for details of the homozygous deletions and the cell lines in which they occur). These deletions ranged in size from 64.7 kb to 21.2 Mb.

A subset of cancer cell lines (n = 55) had multiple homozygous deletions, with a maximum of 11 observed in BxPC3 (Table 3). The deletion–variation statistic described in Materials and Methods provided strong evidence that these differences in the number of homozygous deletions between cell lines are for intrinsic biological reasons rather than random variation or differences in the number of markers applied to detect homozygous deletions (P < 0.001). This biological effect appears to be associated with certain cancer types. For example, adrenocortical carcinomas, brain tumors, and pancreatic carcinomas exhibit higher rates of homozygous deletion compared with neuroblastomas, cervical carcinomas, and hepatocellular carcinomas (even when corrected for the number of TaqMan analyses conducted) (see Table 4, which is published as supporting information on the PNAS web site).

Some homozygous deletions (86 of 281) resulted in the complete or partial deletion of known recessive cancer genes (see Table 5, which is published as supporting information on the PNAS web site). These include CDKN2A/p16 (69 homozygous deletions), PTEN (7), hMAD4 (6), RB1 (3), and SMARCB1 (1). An additional 17 homozygous deletions overlapped or bordered seven of the nine currently sequenced common fragile sites, FRA2G (1), FRA3B (9), FRA6E (1), FRA6F (1), FRA7G (0), FRA7H (0), FRA9E (1), FRA16D (3), and FRAXB (1). Together, these results validate the sensitivity of our strategy with respect to the detection of biologically important homozygous deletions in human cancer.

The remaining homozygous deletions (n = 178) did not overlap known recessive cancer genes or fragile sites. The unexplained homozygous deletions grouped into 131 clusters of homozygous deletions. Each of these clusters was defined as one or more homozygous deletions from different cancer cell lines that overlapped or were contiguous.
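The cluster definition above amounts to merging overlapping or adjacent intervals. A minimal sketch follows, assuming each deletion on a given chromosome is recorded as a range of consecutive marker indices and that "contiguous" means the ranges touch at neighbouring markers; the function name and cell line labels are hypothetical.

```python
def cluster_deletions(deletions):
    """Group deletions on one chromosome into clusters.

    deletions: list of (cell_line, first_marker_index, last_marker_index).
    Deletions that overlap, or whose marker ranges are adjacent, share a cluster.
    """
    clusters = []
    for line, start, end in sorted(deletions, key=lambda d: (d[1], d[2])):
        if clusters and start <= clusters[-1]["end"] + 1:
            # Overlaps or abuts the current cluster: extend it.
            clusters[-1]["end"] = max(clusters[-1]["end"], end)
            clusters[-1]["lines"].add(line)
        else:
            clusters.append({"start": start, "end": end, "lines": {line}})
    return clusters

# Hypothetical example: lines A and B form one cluster, line C another.
example = cluster_deletions([("A", 0, 2), ("B", 3, 4), ("C", 10, 11)])
multi_line = [c for c in example if len(c["lines"]) > 1]
```

Counting clusters with more than one cell line, as in `multi_line`, corresponds to the distinction drawn later between clusters from multiple cell lines and those from a single line.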

For most cancer cell lines available from public repositories, it is not possible to evaluate directly whether either or both of the two deletions that constitute a homozygous deletion is a constitutional polymorphism (13, 14) or is somatically acquired. However, we were able to analyze, by using Affymetrix GeneChip human mapping 10K arrays, lymphoblastoid cell lines derived from the same individuals as 11 of the cancer cell lines containing 17 unexplained homozygous deletions. The presence of heterozygous SNPs and diploid copy number in the matched lymphoblastoid cell line within the region of homozygous deletion indicates that there is no constitutional hemizygous or homozygous deletion and, therefore, that the homozygous deletion is somatic. Sixteen of 17 unexplained homozygous deletions evaluated in this way were shown to be somatic. The single homozygous deletion, which proved to be due to a hemizygous constitutional deletion (close to the 8q telomere at D8S272) coupled with LOH in the cancer, formed part of a cluster of homozygous deletions in multiple cell lines that had identical boundaries on fine mapping. We therefore excluded from subsequent analyses this cluster and all others (at D6S1586, D7S520, and D9S1815) in which there were identical deletion boundaries in all of the cell lines. In all remaining clusters of homozygous deletions from multiple cell lines, differences in the boundaries of constituent homozygous deletions were observed.

A subset of clusters (n = 27) included unexplained homozygous deletions from more than one cancer cell line (15 clusters included deletions from two cancer cell lines, seven clusters included deletions from three cancer cell lines, two clusters included deletions from four cancer cell lines, and three clusters included deletions from five cancer cell lines), and 104 were constituted by one homozygous deletion from a single cell line. To evaluate whether this degree of clustering could have occurred by chance, we performed Monte Carlo analyses to derive the distribution of clustering produced by randomly positioned deletions. We then compared this distribution to the level of observed clustering. For both the deleted-marker count and maximum-overlap statistics described in Materials and Methods, the observed statistic was significantly different from the expected distribution (P < 0.001), indicating that there was more genomic clustering of homozygous deletions than would be expected by chance. Therefore, the clustering of unexplained homozygous deletions is likely to have a biological explanation. Analyses by chromosome flagged clusters of potential interest. Specifically, chromosomes 2, 8–10, 16, 18, 19, and 21 all exhibited significant levels of clustering of homozygous deletions (Table 5).

One cluster of homozygous deletions in multiple cell lines lies in a gene-poor region on chromosome 9p, ≈700 kb centromeric to CDKN2A/p16. Many of the cell lines with a homozygous deletion in this region do not have a homozygous deletion in CDKN2A/p16 that is detectable by our screen. Other cell lines, however, do also have a deletion in CDKN2A/p16, with a region of retained DNA between the two deletions (Table 3). Moreover, some homozygous deletions of CDKN2A/p16 do extend to this region. Because of the proximity of this locus to CDKN2A/p16, we cannot exclude the possibility that all deletions at this locus depend, in some way, on deletions at CDKN2A/p16. We therefore conducted all subsequent analyses without this cluster.

Clusters of unexplained homozygous deletions that illustrate some frequently observed features are shown in Fig. 1. First, there is often no single common region of overlap of all of the homozygous deletions in a cluster. Second, there is frequently no gene that has components of its transcript deleted by all of the constituent homozygous deletions of a cluster. Third, many of the homozygous deletions in these clusters appear to be in regions without any annotated coding genes. Fourth, some of the deletions appear to be fragmented, with small areas retained between two regions of homozygous deletion. Finally, application of an additional, denser set of markers within an unexplained cluster to additional cancer cell lines (as in the four examples illustrated in Fig. 1) revealed previously undetected, smaller homozygous deletions. The occurrence of these additional homozygous deletions confirms the tendency to accumulate homozygous deletions in these regions. Although we cannot completely exclude the possibility that a recessive cancer gene is being inactivated by the homozygous deletions in these regions, these patterns of homozygous deletion are not typical of those found over recessive cancer genes.

Fig. 1.

Clusters of homozygous deletions in human cancer cell lines. Black rectangles denote the markers used in the analyses. Red rectangles denote markers showing a homozygous deletion. Green rectangles denote markers that are retained. Marker names are indicated at the bottom. Cell line names are indicated at the left-hand side. The clusters of homozygous deletions are from chromosome 3 (A), chromosome 4 (B), chromosome 9 (C), and chromosome 13 (D). Deletion patterns identical to that observed at SNP tsc0055722 in B were observed at SNPs tsc0055721 and tsc0055723, which lie too close together to be differentiated at the scale of B.

We therefore investigated the hypothesis that aspects of genome structure may determine the positions and clustering of homozygous deletions in cancer. To sample the genome, 40- and 200-kb genomic “windows” were created, each centered on a marker from the set of 5,321 used to detect homozygous deletions. We then compared features of the genome within windows centered on markers in unexplained homozygous deletions to windows centered on markers showing no homozygous deletions in cancer cell lines (Tables 1 and 2; see also Table 6, which is published as supporting information on the PNAS web site).
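The window-based sampling can be illustrated with a short Python sketch. The gene coordinates, names, and category labels below are invented for the example, and the rule that any overlap counts a gene as present in a window is an assumption.

```python
def genes_in_window(marker_pos, genes, window_size):
    """Names of genes overlapping a window centered on marker_pos.

    genes: iterable of (name, start, end) in bp on the marker's chromosome.
    """
    half = window_size // 2
    low, high = marker_pos - half, marker_pos + half
    return [name for name, start, end in genes if start <= high and end >= low]

def summarize_windows(markers, genes, window_size):
    """Per marker category: (windows, mean gene count, fraction gene-free).

    markers: list of (position, category) pairs.
    """
    tallies = {}
    for position, category in markers:
        count = len(genes_in_window(position, genes, window_size))
        n, gene_sum, empty = tallies.get(category, (0, 0, 0))
        tallies[category] = (n + 1, gene_sum + count, empty + (count == 0))
    return {cat: (n, s / n, e / n) for cat, (n, s, e) in tallies.items()}

# Hypothetical genes and markers on a single chromosome.
genes = [("G1", 100_000, 150_000), ("G2", 400_000, 900_000)]
markers = [(120_000, "deleted"), (300_000, "deleted"), (500_000, "none")]
summary_40kb = summarize_windows(markers, genes, 40_000)
summary_200kb = summarize_windows(markers, genes, 200_000)
```

Running the same markers at both window sizes shows how a marker that is gene-free at 40 kb can acquire a gene at 200 kb, which is why both scales are reported.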

Table 1. Gene presence in homozygous deletions.

| Marker category | Marker count | 40-kb windows: genes present (%) | 40-kb windows: genes absent (%) | 200-kb windows: genes present (%) | 200-kb windows: genes absent (%) |
|---|---|---|---|---|---|
| 2+ deletions | 44 | 9 (20.4) | 35 (79.5) | 21 (47.7) | 23 (52.2) |
| One deletion | 293 | 119 (40.6) | 174 (59.3) | 186 (63.4) | 107 (36.5) |
| No deletions | 4,908 | 2,572 (52.4) | 2,336 (47.5) | 3,679 (74.9) | 1,229 (25.0) |

Table 2. Genomic indicators in homozygously deleted regions.

| Genome descriptor | Marker category | 40-kb windows, mean (SD) | 200-kb windows, mean (SD) |
|---|---|---|---|
| No. of LINEs per window | 2+ deletions | 17.55 (6.17) | 84.25 (17.30) |
| | One deletion | 17.47 (6.89) | 90.53 (20.93) |
| | No deletions | 18.87 (7.33) | 94.39 (21.94) |
| No. of SINEs per window | 2+ deletions | 19.07 (15.33) | 93.64 (54.47) |
| | One deletion | 18.36 (11.55) | 91.49 (51.17) |
| | No deletions | 26.13 (15.86) | 128.8 (71.33) |
| GC, % | 2+ deletions | 38.77 (4.00) | 38.95 (3.91) |
| | One deletion | 39.07 (5.07) | 39.08 (4.88) |
| | No deletions | 41.92 (5.17) | 41.77 (4.70) |
| No. of genes per window | 2+ deletions | 0.25 (0.53) | 0.95 (1.58) |
| | One deletion | 0.48 (0.63) | 1.29 (1.53) |
| | No deletions | 0.67 (0.78) | 2.02 (2.22) |
| Average gene footprint | 2+ deletions | 3.95 × 10⁵ (5.16 × 10⁵) | 1.44 × 10⁵ (3.34 × 10⁵) |
| | One deletion | 1.91 × 10⁵ (2.57 × 10⁵) | 1.17 × 10⁵ (2.11 × 10⁵) |
| | No deletions | 1.39 × 10⁵ (2.28 × 10⁵) | 0.84 × 10⁵ (1.88 × 10⁵) |
| Flexibility scores | 2+ deletions | 10.94 (0.22) | 10.92 (0.20) |
| | One deletion | 10.91 (0.27) | 10.91 (0.25) |
| | No deletions | 10.75 (0.27) | 10.75 (0.24) |
| No. of regions of high flexibility | 2+ deletions | 1.40 (3.39) | 8.88 (10.91) |
| | One deletion | 1.09 (2.61) | 5.23 (5.34) |
| | No deletions | 0.82 (1.90) | 4.06 (4.88) |

Windows surrounding homozygously deleted markers showed a lower gene count compared with windows without homozygous deletions (Tables 2 and 6, Wilcoxon rank-sum tests, P = 2.3 × 10⁻⁷ and 2.6 × 10⁻¹² for 40- and 200-kb windows, respectively). The difference in gene count could, in principle, be related to a lower count of genes with normal footprint size and/or to an increased number of genes with large footprints. There is evidence for both explanations. There is clearly a higher incidence of gene-free windows among the set showing homozygous deletions (Table 1, χ² tests, P = 3.0 × 10⁻⁷ and 4.1 × 10⁻⁸ for 40- and 200-kb windows, respectively), and there is some evidence that the average genomic footprint of genes found in homozygous deletions is larger than that of genes in nondeleted regions (Table 2, Wilcoxon rank-sum test, P = 0.0075 for 40-kb windows and P = 0.25 for 200-kb windows). The short interspersed nuclear element (SINE) and, to a lesser extent, the long interspersed nuclear element (LINE) contents were lower in regions showing homozygous deletions (Wilcoxon rank-sum test, SINEs P = 1.0 × 10⁻²³ and 7.9 × 10⁻²⁷, LINEs P = 0.0010 and 5.5 × 10⁻⁵ for 40- and 200-kb windows, respectively). GC content was also lower in regions showing homozygous deletions (Wilcoxon rank-sum test, P = 1.4 × 10⁻²⁹ and 9.2 × 10⁻³² for 40- and 200-kb windows, respectively). Low gene number is correlated with low SINE count and GC content in the genome as a whole.
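As a worked check, the χ² comparison of gene-free windows can be recomputed from the counts in Table 1. The sketch below assumes that the two deleted categories were pooled and compared with the nondeleted markers in a 2 × 2 table without continuity correction; this is an inference, not a stated detail, although under that assumption the results come out at about 3.0 × 10⁻⁷ and 4.1 × 10⁻⁸, in line with the reported values. The function and variable names are hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# (genes present, genes absent) per marker category, taken from Table 1.
TABLE_1 = {
    "40-kb": {"2+ deletions": (9, 35), "One deletion": (119, 174),
              "No deletions": (2572, 2336)},
    "200-kb": {"2+ deletions": (21, 23), "One deletion": (186, 107),
               "No deletions": (3679, 1229)},
}

def deleted_vs_not(window):
    """Pool both deleted categories and compare with nondeleted markers."""
    rows = TABLE_1[window]
    present = rows["2+ deletions"][0] + rows["One deletion"][0]
    absent = rows["2+ deletions"][1] + rows["One deletion"][1]
    return chi_square_2x2(present, absent, *rows["No deletions"])

chi2_40, p_40 = deleted_vs_not("40-kb")
chi2_200, p_200 = deleted_vs_not("200-kb")
```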

Previous studies have suggested an association between fragile sites and sequences predicted to show high flexibility and low stability (6, 7). To test the significance of these indices in the genesis of homozygous deletions, an average flexibility was derived for each 40- and 200-kb window (Tables 2 and 6). Regions showing homozygous deletions have higher mean flexibility than those that are not deleted (Wilcoxon rank-sum test, P < 1.0 × 10⁻⁵⁰ for 40- and 200-kb windows). We also examined 100-bp areas across each window to detect localized areas of high flexibility. Windows around markers showing homozygous deletions contained more 100-bp high-flexibility areas than did windows around markers showing no deletion. Similar results were observed for stability, with a trend toward lower stability in homozygously deleted regions (data not shown). Because GC content correlates negatively with DNA flexibility, these results are potentially explicable purely by the aforementioned differences in GC content between homozygously deleted and deletion-free regions. However, the application of logistic regression (see Materials and Methods) with these predictors [GC content (P = 0.3480) and flexibility score (P = 0.0004)] suggests that the likelihood of homozygous deletion is more dependent on flexibility score than on GC content.

The low density or absence of coding genes within regions of homozygous deletion raises the possibility that other types of transcript may be the target for homozygous deletion. One recent proposal is that miRNAs are more commonly encountered within certain homozygously deleted regions and in fragile sites (16, 17). We examined the distribution of 212 miRNAs (12) with respect to homozygous deletions: 93.4% (198) of the miRNAs were closest to a marker showing no homozygous deletion, 4.2% (9) were closest to a marker showing one homozygous deletion, and 2.4% (5) were closest to a marker showing more than one homozygous deletion. The respective proportions for the markers themselves were 92.3% (4,911) showing no homozygous deletion, 6.2% (330) showing a homozygous deletion in a single cell line, and 1.5% (82) showing homozygous deletions in more than one cell line. There was no evidence of significant association of miRNAs with homozygous deletions (P = 0.3383).
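The miRNA comparison can be checked in the same spirit. The sketch below assumes a χ² test of independence on the 2 × 3 table of miRNA and marker counts quoted above; the text does not name the test, but under this assumption the P value is approximately 0.338, consistent with the reported figure. The function name is hypothetical.

```python
import math

def chi_square_independence(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# Columns: no deletion, one deletion, more than one deletion (counts from the text).
mirnas = [198, 9, 5]
markers = [4911, 330, 82]

chi2 = chi_square_independence([mirnas, markers])
# With 2 degrees of freedom the chi-square survival function is exp(-x / 2).
p_value = math.exp(-chi2 / 2)
```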

A further possibility is that homozygous deletions target regulatory regions. The full extent of regulatory regions genome-wide is currently poorly defined. However, these regions are likely to be conserved through evolution. We therefore compared the locations of a set of recently identified, noncoding, highly conserved genomic regions with the locations of homozygous deletions and found no significant correlation (P = 0.8417 and 0.2125 for intergenic and intragenic highly conserved regions, respectively).

Approximately 5% of the genome is low-copy-number repeat (LCR), also known as segmental duplications (15). An association between the locations of LCRs and constitutional copy-number polymorphisms is noted in refs. 16 and 17. We therefore investigated the potential role of LCRs in the pathogenesis of homozygous deletions in cancer. Many LCRs are located in the vicinity of centromeres. When centromeres are excluded, 3.3% of the genome is LCR. By contrast, in the portion of the genome defined by placing 2-Mb windows around markers showing homozygous deletions in cancer, only 1.4% is LCR. We confirmed that this analysis detects the previously reported association between LCRs and constitutional copy-number polymorphisms. Placing 2-Mb windows around the copy-number-polymorphic regions specified in refs. 16 and 17 showed that 6.9% and 5.9%, for copy-number reductions and increases, respectively (16), and 8.6% and 27.3%, for copy-number reductions and increases, respectively (17), of the genome defined in this way was LCR. Thus, homozygous deletions in cancer are negatively associated with LCRs, whereas constitutional copy-number polymorphisms are positively associated with LCRs. This analysis indicates that, in general, LCRs are not likely to be implicated in the genesis of somatic homozygous deletions in cancer and further confirms that copy-number polymorphisms are unlikely to account for a significant proportion of the homozygous deletions we have detected.

Clusters of homozygous deletions from several independent cancer samples have previously been of particular significance in the search for previously undiscovered recessive cancer genes. We therefore compared windows around markers showing homozygous deletions in one cancer cell line to markers showing homozygous deletions in multiple cancer cell lines. The number of markers showing homozygous deletions in multiple cancer cell lines is relatively small and limits the power of these comparisons. Nevertheless, windows around markers showing homozygous deletions in two or more cell lines exhibited significantly lower gene counts and more gene-free regions than did windows around markers deleted in a single cell line (Tables 2 and 6). Therefore, structural features associated with homozygous deletions overall are more marked in regions showing clustering of multiple homozygous deletions.

Discussion

We have carried out a genome-wide survey of homozygous deletions in human cancer that has yielded a larger set of this type of abnormality than has, to our knowledge, previously been reported. The results show that most homozygous deletions in cancer genomes are not explained by known recessive cancer genes, sequenced fragile sites, or copy-number polymorphisms. The patterns of overlap of unexplained homozygous deletions within individual clusters indicate that there is often no single coding gene that is the target for the deletions and that homozygous deletions often fall within intergenic DNA. Nevertheless, the unexplained homozygous deletions cluster more than would be expected by chance, and their positions are strongly associated with structural features of the genome, including low gene density, low repeat frequency, and high DNA flexibility.

These structural features could conceivably flag currently unsequenced fragile sites. Unfortunately, there are no clearly defined sequence characteristics of common fragile sites. Moreover, the number of deleted markers due to fragile sites in this study is small, and, hence, reliable comparisons are difficult to make. Nevertheless, in contrast to unexplained homozygous deletions, genome windows around deleted markers in the vicinity of the nine sequenced fragile sites do not show a consistent increase in gene-free regions and show no differences in SINE, LINE, GC content, and flexibility analyses from the rest of the genome (data not shown). There are weak similarities with unexplained homozygous deletions in a trend toward larger gene footprint and an associated reduction in number of genes per window. However, the evidence in favor of currently unsequenced fragile sites being responsible for the locations of unexplained homozygous deletions in human cancers is, at best, equivocal. Current mapping imprecision of fragile sites at the resolution of chromosomal bands makes it difficult to determine whether unexplained homozygous deletions are positionally related to common fragile sites.

One further possibility is that homozygous deletions in regions of low gene count may simply entail fewer adverse consequences for the cell. Thus, if large numbers of homozygous deletions are generated in random genomic locations over a short period in a population of cancer cells (such that almost every cell has several homozygous deletions), the emergent dominant clone may be the one with homozygous deletions that confer the least negative selection. The positioning of homozygous deletions in gene-poor regions may minimize negative-selection pressure and could account for many of the differences in genomic indices observed between homozygously deleted and retained regions. It may, conceivably, also account for some of the clustering of homozygous deletions. However, it is not clear that the clustering seen in Fig. 1 is attributable simply to the absence of genes, because the gene-free regions in which these clusters occur are often more extensive than the clusters themselves. Therefore, there may be other, currently obscure, factors that determine the positions of homozygous deletions. For example, it is conceivable that homozygous deletions in gene-free regions confer positive selective advantage by deleting regulatory regions that are necessary for control of the expression of recessive or dominant cancer genes.
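
The negative-selection scenario described above can be made concrete with a toy simulation. This is a minimal sketch under stated assumptions, not a model fitted to the data in this study: the genome is divided into hypothetical equal-sized bins with made-up gene counts, each clone acquires a few deletions at random bins, and the "dominant" clone is simply the one that loses the fewest genes. The function name `simulate_selection` and all parameter values are illustrative.

```python
import random


def simulate_selection(n_bins=1000, n_clones=200, dels_per_clone=3, seed=1):
    """Return (mean genes per bin genome-wide, mean genes per deleted bin in the winner)."""
    rng = random.Random(seed)
    # Assumption: hypothetical gene counts per bin, with many gene-poor bins.
    genes = [rng.choice([0, 0, 1, 2, 5, 10]) for _ in range(n_bins)]
    # Each clone deletes a few bins chosen uniformly at random.
    clones = [rng.sample(range(n_bins), dels_per_clone) for _ in range(n_clones)]

    def cost(clone):
        # Assumption: selective cost is the number of genes removed.
        return sum(genes[b] for b in clone)

    winner = min(clones, key=cost)  # clone under the least negative selection
    genome_mean = sum(genes) / n_bins
    winner_mean = cost(winner) / dels_per_clone
    return genome_mean, winner_mean


genome_mean, winner_mean = simulate_selection()
print(f"genes per bin, whole genome:       {genome_mean:.2f}")
print(f"genes per deleted bin, top clone:  {winner_mean:.2f}")
```

Even though every deletion is placed at random, the deletions carried by the surviving clone fall in bins with fewer genes than the genome average. Selection alone can therefore produce an apparent preference for gene-poor regions, which is the point of the argument above; the sketch does not address why clusters are narrower than the gene-free regions that contain them.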

It is likely that three forces act in concert to determine the overall patterns of homozygous deletion in cancer cells: positive selection conferred by homozygous deletions of recessive cancer genes, the increased rate of formation of homozygous deletions in mutable (i.e., fragile) regions, and the reduction of selective disadvantage through location of homozygous deletions within gene-poor regions. One example of this may occur at the CDKN2A/p16 gene. Homozygous deletions of CDKN2A/p16 and other recessive cancer genes confer positive selective advantage on cancer cells. It is notable, however, that homozygous deletion of CDKN2A/p16 in cancer cell lines is much more common than homozygous deletion of other recessive cancer genes and that homozygous deletions over CDKN2A/p16 are larger than those over the other four recessive cancer genes showing homozygous deletions. We detected 69 homozygous deletions that include CDKN2A/p16, whereas there are, in total, 17 over the other four recessive cancer genes that show at least one homozygous deletion. The largest homozygous deletion that includes CDKN2A/p16 is 9.2 Mb, whereas the largest over the other four recessive cancer genes is 1.3 Mb. CDKN2A/p16 is located just telomeric to one of the largest gene-poor regions in the human genome. Moreover, homozygous deletions of CDKN2A/p16 extend centromerically into the gene-poor region more frequently than telomerically into a region of relatively high gene density (Table 3). Thus, it may be that CDKN2A/p16 is subject more commonly to homozygous deletions than are other recessive cancer genes because of relatively limited collateral negative-selection pressures associated with deletions of adjacent genes.

This comparison of genome landscape has not been exhaustive, and there may be other sequence parameters that merit investigation. Moreover, we have not investigated deletion breakpoints, and the analyses have been conducted on cancer cell lines rather than on primary tumors. It is therefore possible that some homozygous deletions were generated during in vitro passage, and this may generate a distorted view of processes occurring in vivo. Nevertheless, the results of these studies illustrate how the human genome sequence applied to large data sets can be used to elucidate the structural factors that are implicated in determining broad patterns of somatic genetic abnormality seen in cancer cells.

Supplementary Material

Supporting Tables

Acknowledgments

We thank all contributors to the collection of cell lines; Wendy Haynes for formatting the manuscript; Nazneen Rahman, James Lupski, Charles Lee, and Nigel Carter for helpful comments; and the Wellcome Trust and the Institute of Cancer Research for their support.

This paper was submitted directly (Track II) to the PNAS office.

Abbreviations: LCR, low-copy-number repeat; LOH, loss of heterozygosity; LINE, long interspersed nuclear element; SINE, short interspersed nuclear element.

References
