Exon Array Profiling Detects EML4-ALK Fusion in Breast, Colorectal, and Non–Small Cell Lung Cancers

Cancer Genes and Genomics | September 17, 2009

Eva Lin;

Departments of ¹Molecular Biology, ²Bioinformatics, and ³Research Oncology Diagnostics, Genentech, Inc., South San Francisco, California

Li Li;

Yinghui Guan;

Robert Soriano;

Celina Sanchez Rivers;

Sankar Mohan;

Ajay Pandita;

Jerry Tang;

Zora Modrusan

Zora Modrusan, Department of Molecular Biology, Genentech, Inc., 1 DNA Way, MS#413a, South San Francisco, CA 94080. Phone: 650-225-6101; Fax: 650-467-5482. E-mail: modrusan.zora@gene.com

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

E. Lin and L. Li contributed equally to this work.

Received: November 07 2008

Revision Received: June 11 2009

Accepted: June 23 2009

Online ISSN: 1557-3125

Print ISSN: 1541-7786

© 2009 American Association for Cancer Research.

Mol Cancer Res (2009) 7 (9): 1466–1476.

Citation

Eva Lin, Li Li, Yinghui Guan, Robert Soriano, Celina Sanchez Rivers, Sankar Mohan, Ajay Pandita, Jerry Tang, Zora Modrusan; Exon Array Profiling Detects EML4-ALK Fusion in Breast, Colorectal, and Non–Small Cell Lung Cancers. Mol Cancer Res 1 September 2009; 7 (9): 1466–1476. https://doi.org/10.1158/1541-7786.MCR-08-0522

Abstract

The echinoderm microtubule-associated protein-like 4–anaplastic lymphoma kinase (EML4-ALK) fusion gene has been identified as an oncogene in a subset of non–small cell lung cancers (NSCLC). We used profiling of cancer genomes on an exon array to develop a novel computational method for the global search of gene rearrangements. This approach led to the detection of EML4-ALK fusion in breast and colorectal carcinomas in addition to NSCLC. Screening of a large collection of patient tumor samples showed the presence of EML4-ALK fusion in 2.4% of breast (5 of 209), 2.4% of colorectal (2 of 83), and 11.3% of NSCLC (12 of 106) samples. Besides the previously known EML4-ALK variants 1 (E13; A20) and 2 (E20; A20), a novel variant E21; A20 was found in colorectal carcinoma. The presence of an EML4-ALK rearrangement was verified by identifying genomic fusion points in tumor samples representative of breast, colon, and NSCLC. EML4-ALK translocation was also confirmed by fluorescence in situ hybridization assay, which revealed its substantial heterogeneity in both primary tumors and tumor-derived cell lines. To elucidate the functional significance of EML4-ALK, we examined the growth of cell lines harboring the fusion following EML4 and ALK silencing by small interfering RNA. Significant growth inhibition was observed in some but not all cell lines, suggesting their variable dependence on ALK-mediated cell survival signaling. Collectively, these findings show the recurrence of EML4-ALK fusion in multiple solid tumors and further substantiate its role in tumorigenesis. (Mol Cancer Res 2009;7(9):1466–76)

Introduction

Chromosomal translocations and their corresponding gene fusions play an important role in the initiation of tumorigenesis and have been strongly associated with distinct tumor subtypes (1). The first cancer-causing translocation, t(9;22)(q34;q11), was identified in chronic myeloid leukemia, resulting in the discovery of the Philadelphia chromosome (2). This translocation juxtaposes the 5′ portion of the BCR gene with the 3′ portion of the tyrosine kinase ABL1, generating the BCR-ABL1 fusion gene with constitutive kinase activity. Similar to ABL1, a substantial number of genes involved in translocations have been causally implicated in carcinogenesis (3). However, the biological and clinical significance of translocations in solid tumors has been less appreciated compared with hematologic malignancies, mainly due to their karyotypic complexity (4, 5).

An increasing number of translocations have been identified in the past two decades using technologies based on fluorescence in situ hybridization (FISH; ref. 6). The latest advances in microarray and sequencing technologies have also provided new tools for interrogating recurrent genomic aberrations at the whole genome level. For example, gene expression profiling of tumor samples coupled with a novel bioinformatics approach resulted in identification of a gene fusion between TMPRSS2 and ERG or ETV1 in the majority of prostate cancer samples (7). The presence of TMPRSS2-ERG fusion in prostate was further confirmed by profiling patient tumor samples on exon arrays (8). Although the feasibility of using exon array data for the detection of gene fusions has been shown, a more systematic approach is required when performing a global search for novel gene rearrangements across multiple cancer types.

Anaplastic lymphoma kinase (ALK) was first identified as a fusion partner of nucleophosmin in anaplastic large-cell lymphoma (ALCL) with the t(2;5)(p23;q35) chromosomal translocation (9, 10). Translocations linking ALK to multiple fusion partners were subsequently identified in ALCL as well as in other malignancies, including neuroblastomas and myofibroblastic tumors (11). A novel gene fusion of ALK and the echinoderm microtubule-associated protein-like 4 (EML4) was recently identified in non–small cell lung cancer (NSCLC; ref. 12, 13). The EML4-ALK fusion was shown to result from a small inversion within chromosome 2p and was detected in 7% of the NSCLC patient population. The presence of EML4-ALK fusion in NSCLC was confirmed in a number of subsequent studies (14-18); however, it has not yet been detected in other carcinomas, including breast and colorectal (19, 20). A number of EML4-ALK fusion variants have been identified to date (12, 18, 20-23); all of them involve an almost identical portion of ALK (exons 20-29) fused to EML4 exons 1 to 13 (E13; A20 or variant 1; ref. 12), 1 to 20 (E20; A20 or variant 2; ref. 12), 1 to 6 (E6a/b; A20 or variant 3; refs. 21, 22), 1 to 14 (E14; A20 or variant 4; ref. 20), 1 and 2 (E2; A20 or variant 5; ref. 20), and 1 to 18 (E18; A20 variant; ref. 18). Similar to other ALK gene fusions, all of the EML4-ALK transcript variants contain the cytoplasmic portion of ALK with the entire kinase domain.

The constitutive kinase activity of ALK is essential for proliferation of ALCL cells and its inactivation represents a feasible therapeutic approach for the treatment of ALCL (24). The intact ALK kinase domain within EML4-ALK possesses marked transforming as well as oncogenic activity in vitro and in vivo, respectively (12, 21, 25). Therefore, tumors harboring EML4-ALK fusion seem to represent a distinct subtype that might be responsive to ALK kinase inhibitors. Several studies showed that ALCL, NSCLC, and neuroblastoma cell lines harboring ALK rearrangements underwent cell cycle arrest and apoptosis when treated with ALK-selective inhibitors (26-29). Administration of two ALK inhibitors, TAE684 and PF-2341066, in ALK-positive tumor xenograft models caused significant tumor growth regression (22, 27, 28). Moreover, treatment with another small-molecule ALK inhibitor resulted in a disappearance of adenocarcinoma nodules in lungs of EML4-ALK transgenic mice (25). Thus, ALK kinase inhibitors alone or in conjunction with other chemotherapeutic agents may represent an effective treatment for patients whose tumors contain the EML4-ALK fusion.

Here, we used exon array profiling to develop a novel computational method for the identification of gene rearrangements in solid tumors. This approach revealed the presence of EML4-ALK fusion in breast, colorectal, and NSCLC samples. We used reverse transcription PCR (RT-PCR) to screen a large collection of breast, colorectal, and NSCLC patient samples for EML4-ALK fusion. The EML4-ALK rearrangement was found in multiple tumors and its underlying structure was deciphered by genomic PCR and FISH. In addition, we identified some tumor-derived cell lines positive for EML4-ALK and used them to investigate the functional significance of the fusion in cell growth and proliferation. Together, these data show the recurrence of EML4-ALK translocation in solid tumors and further substantiate its role in tumorigenesis.

Results

Computational Identification of ALK Rearrangement

One hundred fifty-three samples, including 84 breast (34 HER2 positive, 26 luminal, and 24 basal), 26 colorectal adenocarcinoma, and 43 NSCLC (21 adenocarcinoma and 22 squamous), were profiled on Affymetrix human Exon 1.0 arrays. As chromosomal translocations often lead to up-regulation of the fusion gene at the 3′ end, we developed a whole genome search algorithm to recognize potential gene fusion candidates with a discordant expression between 5′ and 3′ exons in one or more samples (Fig. 1). Normalized expression values of all exon probe sets for a given gene were extracted into a matrix, with rows representing samples and columns representing exon probe sets. Each site between exon probe sets was examined and a putative breakpoint was predicted as a position that shows the maximum difference in expression between 5′ and 3′ exons in a subset of samples using the t statistic. If multiple breakpoints met the threshold in different samples, the best candidate breakpoint was selected for each gene by weighted majority voting, which took into consideration the magnitude of expression difference between 5′ and 3′ exons as well as the number of occurrences among all samples.
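The per-sample breakpoint scan described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a Welch-type t statistic comparing the 3′ probe sets against the 5′ probe sets within one sample, and the function name `scan_breakpoint` and the parameter `min_side` are hypothetical.

```python
import numpy as np

def scan_breakpoint(expr, min_side=2):
    """Scan one sample's exon profile for a putative breakpoint.

    expr: normalized probe-set values for one gene, ordered 5' to 3'.
    min_side: hypothetical minimum number of probe sets on each side.
    Returns (site, t), where site is the index of the first 3' probe
    set at the best split, or (None, 0.0) if no split has 3' > 5'.
    """
    expr = np.asarray(expr, dtype=float)
    best_site, best_t = None, 0.0
    for site in range(min_side, len(expr) - min_side + 1):
        five, three = expr[:site], expr[site:]
        # Welch-type standard error of the difference in means (assumption).
        se = np.sqrt(five.var(ddof=1) / len(five) + three.var(ddof=1) / len(three))
        if se == 0:
            continue
        t = (three.mean() - five.mean()) / se
        # Keep the split with the largest 3'-over-5' up-regulation.
        if t > best_t:
            best_site, best_t = site, t
    return best_site, best_t

# Toy profile: five low 5' probe sets followed by four high 3' probe sets.
profile = [1.0, 1.2, 0.9, 1.1, 1.0, 5.0, 5.2, 4.9, 5.1]
site, t = scan_breakpoint(profile)  # site is 5 for this toy profile
```

Only splits where the 3′ side is higher are retained, which mirrors the stated rationale that fusions often up-regulate the 3′ end of the gene.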

FIGURE 1. Computational method used to identify the ALK breakpoint. Normalized exon expression values for a given gene are represented by a heatmap: red, higher values; blue, lower values. Columns, exon probe sets ordered from 5′ to 3′; rows, tumor samples (top left). Each sample was examined for putative breakpoint, which resulted in a T score and a predicted breakpoint (top right). Multiple samples were summarized for the gene by selecting the best candidate breakpoint, resulting in an M score (bottom right). The statistical significance of the M score was evaluated using empirically derived null distribution through permutation (bottom left).

Following the whole genome search, ranking of genes with putative breakpoints was done using a summary score (M score), which was the median t statistic of samples containing the predicted breakpoint. The probability density distribution of the M scores for all examined genes was similar across colorectal, breast, and NSCLC samples showing the consistency of the scoring metric for different datasets (Supplementary Fig. S1A). The majority of genes had an M score of 0 as they did not meet the threshold for having a putative breakpoint. The M scores of all other genes had median and interquartile range of 2.2 and 1.15 for colorectal, 2 and 0.8 for breast, and 2.1 and 0.9 for NSCLC, respectively. A small fraction of genes were located at the tail of the density distribution, suggesting that they have remarkably larger M scores than the rest of the genome.
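As a rough sketch of how per-sample calls might be summarized into a gene-level M score: the code below picks one breakpoint per gene by a weighted vote and then takes the median t statistic of the samples supporting it. The vote weight (sum of t statistics per site) and the `threshold` value are assumptions made for illustration; the text states only that the vote considered both the magnitude of the expression difference and the number of occurrences.

```python
from collections import defaultdict
import numpy as np

def m_score(calls, threshold=4.0):
    """Summarize per-sample breakpoint calls for one gene.

    calls: iterable of (site, t) pairs, one per sample; site may be None.
    threshold: hypothetical minimum t for a sample to count as a call.
    Returns (best_site, M), with M = 0.0 when no sample passes.
    """
    votes = defaultdict(list)
    for site, t in calls:
        if site is not None and t >= threshold:
            votes[site].append(t)
    if not votes:
        return None, 0.0
    # Weighted vote: summing t rewards both large differences and
    # repeated occurrences across samples (assumed weighting).
    best_site = max(votes, key=lambda s: sum(votes[s]))
    # M score: median t statistic of samples carrying that breakpoint.
    return best_site, float(np.median(votes[best_site]))

calls = [(19, 12.0), (19, 10.0), (7, 5.0), (None, 0.0), (19, 14.0), (3, 1.0)]
best_site, m = m_score(calls)
```

A gene with no qualifying sample receives an M score of 0, consistent with the observation that most genes score 0.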

Because observed breakpoints may be present due to reasons other than gene fusions, we applied additional criteria to select biologically relevant genes. The predicted breakpoints between exons were compared with protein domain boundaries, and the exons 3′ of the predicted breakpoint were required to contain intact functional domains. Comparison to the Cancer Gene Census database was further used to prioritize genes previously known to be implicated in cancer (3). As a result, we identified ALK as a top candidate having the same breakpoint located between probe sets 2546434 (exon 19) and 2546433 (exon 20) in three types of tumor samples. Based on the Protein Family Database, we determined that the portion of ALK protein located downstream from the breakpoint contains an intact tyrosine kinase domain (30). A heatmap of ALK gene expression in colorectal, breast, and NSCLC samples is shown in Fig. 2. A significant difference between 5′ and 3′ expression was observed for one colorectal (HF-18092), two breast (HF-20579 and HF-21749), and three NSCLC samples (HF-5158, HF-11756, HF-15224). The M scores of ALK in all three datasets were located at the tail of the density distribution, with values of 12.3 (ranked 6th) in colorectal, 11.2 (ranked 9th) in breast, and 9.9 (ranked 26th) in NSCLC. The M scores of ALK were highly significant with estimated q values of 0 (Supplementary Fig. S1B).
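To illustrate how an observed M score can be compared against a permutation-derived null distribution, the sketch below computes an empirical p value. It is only an illustration under stated assumptions: how the null scores are generated is not specified in this section, the q values reported in the text additionally involve a multiple-testing adjustment that is not shown, and the name `empirical_p` is hypothetical.

```python
import numpy as np

def empirical_p(observed, null_scores):
    """Empirical p value of an observed M score against a null sample.

    null_scores: M scores recomputed on permuted data (assumed to be
    supplied by the caller). The +1 terms keep the estimate above zero.
    """
    null_scores = np.asarray(null_scores, dtype=float)
    exceed = int(np.sum(null_scores >= observed))
    return (exceed + 1) / (null_scores.size + 1)

# Toy null distribution; real analyses would use many more permutations.
null = [0.0, 2.2, 2.0, 3.1]
p_tail = empirical_p(12.3, null)  # no null score reaches 12.3
p_bulk = empirical_p(2.0, null)   # several null scores reach 2.0
```

With an estimator that omits the +1 correction, a score exceeding every permuted value would yield an estimate of exactly 0.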

FIGURE 2. Detection of putative ALK breakpoints in tumor samples. Heatmaps of normalized exon array expression values for ALK in colorectal (A), breast (B), and NSCLC (C) tumors. The color scale illustrating level of expression is shown at bottom right: red, higher values; blue, lower values. The columns represent exon probe sets ordered from 5′ to 3′; each probe set is listed at the bottom. The rows represent tumor samples and the ones containing predicted breakpoint are listed on the right side. The ALK breakpoint (arrow) was predicted between probe sets 2546434 (exon 19) and 2546433 (exon 20) in all three tumor types.

RT-PCR Detection of EML4-ALK Fusion

The sequence located 5′ of the computationally predicted breakpoint in the ALK transcript was determined in breast tumor sample HF-21749 using RNA ligase–mediated rapid amplification of cDNA ends. Sequence analysis of the RNA ligase–mediated rapid amplification of cDNA ends products revealed that the first 13 exons of EML4 were located upstream of ALK exon 20, generating the EML4-ALK fusion variant E13; A20 or variant 1 (12). The presence of EML4-ALK fusion in the remaining five tumor samples predicted to have a breakpoint in ALK was subsequently detected by RT-PCR.

RT-PCR assay was used to screen a collection of tumor samples, including 209 breast, 83 colorectal, and 106 NSCLC, for the presence of the EML4-ALK fusion transcript. PCR products were run on an agarose gel and fragments representative of variants E13; A20 (247 bp) and E20; A20 (1,000 bp) were cloned and sequenced (see Materials and Methods). The EML4-ALK fusion was detected in five breast tumor samples (5 of 209, 2.4%); both transcript variants E13; A20 and E20; A20 were present (Fig. 3; Table 1). The tumors positive for EML4-ALK included HER2+ (HF-21744), luminal (HF-20260), and basal (HF-20579, HF-21749, HF-21788), suggesting that the presence of the fusion is not specific to a particular breast cancer subtype. Similar to breast cancer, a low incidence of EML4-ALK fusion was found in colorectal carcinoma, where only two adenocarcinoma samples were positive (2 of 83, 2.4%; Fig. 3; Table 1). Sequencing analysis of PCR products showed that the HF-18138 sample contained the previously known E20; A20 variant, whereas HF-18092 harbored a novel E21; A20 variant in which EML4 exons 1 to 21 were fused to ALK exon 20. Although the E21; A20 variant was predicted as a possible in-frame fusion of EML4 and ALK (20), it has not yet been found in NSCLC (16, 18). Compared with the breast and colorectal carcinomas, a higher frequency of the EML4-ALK fusion transcript was found in NSCLC (12 of 106, 11.3%; Fig. 3; Table 1). The EML4-ALK fusion was detected only in adenocarcinomas, and a majority of the samples contained variant E13; A20. The HF-15224 sample harbored yet another EML4-ALK variant in which EML4 exon 17 was fused with a 75-bp sequence from EML4 intron 17 followed by a truncated ALK exon 20 with a 62-bp deletion at the 5′ end (E17; ΔA20); it has not been determined whether this transcript gives rise to a functional protein. Overall, RT-PCR experiments showed some inconsistencies; in half of the NSCLC samples, only two of three independent PCR reactions verified the presence of the EML4-ALK fusion. A similar inconsistency of EML4-ALK transcript detection was reported recently (17), suggesting low levels of fusion transcript expression in NSCLC.
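The reported frequencies follow directly from the screening counts. As a small worked check (the dictionary layout and the helper name `percent_positive` are illustrative only):

```python
# Positive and total sample counts from the RT-PCR screen reported above.
screen = {
    "breast": (5, 209),
    "colorectal": (2, 83),
    "NSCLC": (12, 106),
}

def percent_positive(positive, total):
    """Percentage of fusion-positive samples, rounded to one decimal."""
    return round(100.0 * positive / total, 1)

frequencies = {tissue: percent_positive(*counts) for tissue, counts in screen.items()}
# frequencies == {"breast": 2.4, "colorectal": 2.4, "NSCLC": 11.3}
```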

FIGURE 3. RT-PCR detection of EML4-ALK fusion in breast, colorectal, and NSCLC tumor samples. Top gel images show PCR-amplified fragments representative of different EML4-ALK transcript variants, including E13; A20 (247 bp), E17; ΔA20 (739 bp), E20; A20 (1,000 bp), and E21; A20 (1,099 bp), identified in breast, colorectal, and NSCLC samples; other tumors lacking EML4-ALK were included as negative controls. The bottom gel images represent PCR-amplified fragments for a control gene GAPDH. No-template control (NTC) was included and a 1-kb Plus DNA ladder was used as a marker.

Table 1.

Summary Information on the Tumor Samples, Including Name, Tissue Origin, Histologic Subtype, and Type of EML4-ALK Fusion Variant

No.  Sample Name  Tissue  Subtype         EML4-ALK Variant
1    HF-20260     Breast  Luminal         E20; A20
2    HF-20579     Breast  Basal           E20; A20
3    HF-21744     Breast  HER2+           E13; A20
4    HF-21749     Breast  Basal           E13; A20
5    HF-21788     Breast  Basal           E13; A20
6    HF-20603     Breast  HER2+           ND
7    HF-21781     Breast  HER2+           ND
8    HF-21753     Breast  Basal           ND
1    HF-18092     Colon   Adenocarcinoma  E21; A20
2    HF-18138     Colon   Adenocarcinoma  E20; A20
3    HF-18430     Colon   Adenocarcinoma  ND
4    HF-18032     Colon   Adenocarcinoma  ND
5    HF-17994     Colon   Adenocarcinoma  ND
1    HF-11739     NSCLC   Adenocarcinoma  E13; A20
2    HF-11769     NSCLC   Adenocarcinoma  E13; A20
3    HF-15512     NSCLC   Adenocarcinoma  E13; A20
4    HF-15519     NSCLC   Adenocarcinoma  E13; A20
5    HF-15523     NSCLC   Adenocarcinoma  E13; A20
6    HF-15559     NSCLC   Adenocarcinoma  E13; A20
7    HF-15560     NSCLC   Adenocarcinoma  E13; A20
8    HF-15563     NSCLC   Adenocarcinoma  E13; A20
9    HF-11766     NSCLC   Adenocarcinoma  E13; A20
10   HF-11756     NSCLC   Adenocarcinoma  E13; A20
11   HF-5158      NSCLC   Adenocarcinoma  E13; A20
12   HF-15224     NSCLC   Adenocarcinoma  E17; ΔA20
13   HF-11765     NSCLC   Adenocarcinoma  ND
14   HF-11779     NSCLC   Adenocarcinoma  ND
15   HF-4810      NSCLC   Adenocarcinoma  ND

Abbreviation: ND, not detected (i.e., EML4-ALK fusion was not found).

RT-PCR assay was subsequently used to examine a panel of 46 breast, 28 colorectal, and 50 NSCLC tumor-derived cell lines for the presence of EML4-ALK. Based on the sequencing of PCR products, two breast (HCC1500 and ZR75-1), one colorectal (SW1417), and five NSCLC cell lines (H460, H1975, HOP18, RERF-LC-KJ, VWRC-LCD) were found to harbor the EML4-ALK fusion variant E13; A20 (Supplementary Fig. S2). NSCLC cell line H2228 harboring EML4-ALK variant E6a/b; A20 was used as a positive control (22). H2228 cells were shown to express a much higher level of the fusion transcript than most NSCLC tumors (17). Compared with H2228, all of the identified cell lines showed lower levels of EML4-ALK transcript expression. Similar to NSCLC tumors, inconsistent RT-PCR detection of EML4-ALK fusion was observed in cell lines. Two of the identified EML4-ALK–positive NSCLC cell lines, H460 and H1975, were previously shown to be negative for the expression of EML4-ALK protein (29). Similarly, our effort to detect the fusion protein in EML4-ALK–positive cell lines was unsuccessful, except in H2228 cells (data not shown). The discrepancy in detection of EML4-ALK transcripts and the fusion protein was also observed in NSCLC tumor samples (17).

Genomic Confirmation of EML4-ALK Translocation

The genomic structure of the EML4-ALK rearrangement was identified in tumor samples representative of NSCLC and breast and colorectal carcinomas. For long-range genomic PCR, we used PCR primers residing in EML4 introns 13 and 20 and in ALK intron 19 (see Materials and Methods). Genomic EML4-ALK fusion points were identified in two NSCLC samples harboring variant E13; A20 (Fig. 4A). The breakpoints in HF-15512 were located 4,726 bp downstream of EML4 exon 13 and 1,021 bp upstream of ALK exon 20, whereas the ones in HF-15560 were located 3,116 bp downstream of EML4 exon 13 and 523 bp upstream of ALK exon 20. These fusion points were distinct and differed from those reported previously (12, 16, 18, 19). The genomic structure of the EML4-ALK fusion was also characterized in breast tumor samples harboring variants E13; A20 and E20; A20 (Fig. 4A). The breakpoints in HF-21749 were located 385 bp downstream of EML4 exon 13 and 1,066 bp upstream of ALK exon 20; these breakpoints were different from the ones identified in NSCLC samples. In the HF-20579 sample, the breakpoints were located 370 bp downstream of EML4 exon 20 and 949 bp upstream of ALK exon 20 (Fig. 4B). Genomic EML4-ALK fusion points were also identified in colorectal tumor sample HF-18092 harboring the E21; A20 variant. Sequence analysis revealed that EML4 was disrupted at 1,831 bp downstream of EML4 exon 21 and was ligated to a position 726 bp upstream of ALK exon 20 (Fig. 4C).

FIGURE 4. Schematic representation of genomic EML4-ALK fusion points. The genomic structure of EML4-ALK fusion was characterized in NSCLC, breast, and colorectal tumor samples harboring transcript variant E13; A20 (A), E20; A20 (B), and E21; A20 (C). The schematic illustration above electropherograms shows an approximate location of breakpoint relative to EML4 and ALK exons. The precise EML4-ALK breakpoint in each tumor sample is indicated by the number of base pairs upstream and downstream of representative EML4 and ALK exons, respectively. The exact EML4-ALK breakpoint is marked on each electropherogram by a dotted line. The electropherograms for two NSCLC samples are shown in the reverse direction.

FISH assay was used to confirm the EML4-ALK translocation in tumor samples and tumor-derived cell lines positive for the fusion transcript. Instead of using a separate break-apart probe for each of the EML4 and ALK genes, we devised a three-color assay that detects the fusion and associated copy number changes within the same cell in a single experiment. Based on our assay design (Fig. 5A), we expected to see an array of red-blue-green fluorescence in cells lacking EML4-ALK (Fig. 5B) and a red-green fusion signal accompanied by a separate blue signal in cells harboring the fusion (Fig. 5C). The presence of EML4-ALK translocation was confirmed by FISH in all of the examined tumor samples and cell lines. FISH analysis revealed substantial tumor heterogeneity, with only a subset of cells, ranging from 41.5% to 52.5%, harboring the EML4-ALK translocation. Compared with the tumors, cell lines showed even more heterogeneity, with the EML4-ALK fusion signal detected in fewer than 33% of examined cells (Supplementary Table S1). Such vast heterogeneity of EML4-ALK rearrangement in tumors and cell lines was unexpected and has not been reported previously. In addition, a low-level copy number gain of the whole 2p21-23 region was frequently seen in cell lines as well as in tumors (Fig. 5D). Two or more EML4-ALK fusion signals per cell were also observed in several cell lines. There was no deletion of the sequences centromeric to the ALK and telomeric to the EML4 locus, suggesting that a small inversion is the underlying mechanism of the EML4-ALK translocation.

FIGURE 5. Schematic representation of the FISH assay and examples of observed alterations. A. The design of FISH assay including representative BACs and fluorescent colors used for detecting ALK (red), region in between ALK and EML4 (IB, blue) and EML4 (green). B. An example of a normal cell where ALK, IB, and EML4 are shown as red-blue-green array of fluorescence. C. An example of a tumor cell with EML4-ALK translocation showing a red-green fusion signal accompanied by a separate blue IB signal. D. An example of a tumor cell with a low copy number gain of the whole 2p21-23 region.


Small Interfering RNA Silencing of EML4-ALK Fusion

To elucidate the functional significance of EML4-ALK in cell growth and proliferation, we examined the number of viable cells following small interfering RNA (siRNA)–mediated silencing of the EML4 and ALK genes. Survival of cells treated with siRNAs targeting the EML4-ALK fusion (5′ EML4 and 3′ ALK siRNA pools) was compared with that of cells treated with siRNAs targeting endogenous EML4 and ALK (3′ EML4 and 5′ ALK siRNA pools, respectively). Upon silencing of 5′ EML4 and 3′ ALK, more than 50% growth inhibition was seen in three cell lines: H2228, H460, and HCC1500 (Fig. 6A). The growth inhibition was first observed at 48 hours posttransfection and remained significant at the 72- and 96-hour time points (P ≤ 0.01 from one-tail t test). The breast cell line ZR75-1 showed growth inhibition of less than 20%, and only at the 96-hour time point. A similar response was observed in two other cell lines, VWRC-LCD and RERF-LC-KJ (data not shown). The remaining three EML4-ALK cell lines, H1975, HOP18, and SW1417, showed no change in cell viability under any condition (Fig. 6A). The differing growth responses among the cell lines suggested variable dependence on ALK-mediated cell survival signaling. To verify that the observed growth inhibition was specific to cells harboring the EML4-ALK fusion, we also measured cell viability in several cell lines lacking the fusion (A549, MDA-MB-231, and H838). All of them consistently showed no change in cell viability upon silencing of EML4 and ALK (Fig. 6B), suggesting that the EML4-ALK fusion is important for the growth of cells expressing this oncokinase.

FIGURE 6. siRNA silencing of EML4 and ALK in cell lines. Cell viability was measured upon transfection of each cell line with control TOX siRNA pool (siTOX) and four siRNA pools (5′ EML4, 3′ EML4, 5′ ALK, 3′ ALK). Cell viability (Y axis) for each sample and condition was calculated as a ratio of the number of viable cells divided by that observed upon transfection with nontargeting siRNA pool. SD for each sample was calculated from three independent replicates and is indicated by error bars. *, statistically significant decrease of cell viability (P ≤ 0.01 from one-tail t test). Cell viability response in cell lines harboring EML4-ALK (A) and deficient for the fusion (B). C. The level of EML4 and ALK transcripts was measured by quantitative PCR in cells before and after transfection with 5′ EML4 and 3′ ALK siRNA pools. Percentage of EML4 (top) or ALK (bottom) expression in transfected cells (black bars) is shown relative to that in nontransfected cells (white bars).


To verify sufficient siRNA-mediated silencing of EML4-ALK, we measured the level of EML4 and ALK transcripts in several cell lines before and after transfection. As with the siRNA pools, pairs of gene-specific primers and probes located within the 5′ and 3′ regions of EML4 and of ALK were used (see Materials and Methods). A significant decrease of EML4 transcript, ranging from 45% to 80%, was observed upon silencing of 5′ EML4 in all cell lines, both those with the fusion (H2228, H460, HCC1500) and one without it (A549; Fig. 6C). Silencing of 3′ EML4 was equally efficient, resulting in a similar decrease of endogenous EML4 transcript (data not shown). A substantial decrease of ALK transcript, ranging from 30% to 55%, was detected upon silencing of 3′ ALK in all of the cell lines (Fig. 6C). However, the efficiency of silencing 5′ ALK could not be determined because the level of the endogenous ALK transcript was below the detectable range (high Ct values). Collectively, these data showed that silencing of the EML4-ALK fusion was sufficient and, moreover, responsible for the observed cell growth inhibition in H2228, H460, and HCC1500 cells.

Discussion

The discovery of translocations and their corresponding gene fusion products in solid tumors could potentially increase with the use of innovative approaches that enable their detection. Although one study showed the feasibility of using exon array data for the detection of the TMPRSS2-ERG fusion (8), a systematic approach for the identification of gene rearrangements in multiple carcinomas has not yet been reported. Here, we describe for the first time one such approach, in which exon array profiling of tumor samples in combination with a novel computational method has resulted in the detection of ALK gene fusion in three widespread carcinomas: breast, colorectal, and the previously known NSCLC. The algorithm we developed searches for abrupt changes in the level of expression between two stretches of consecutive exons across multiple samples. From a global search of breast, colorectal, and NSCLC datasets, a putative breakpoint between exons 19 and 20 of ALK with an extremely high M score suggested the presence of the same underlying rearrangement in all three tumor types. Because our approach infers genomic aberrations from expression change at the transcript level, gene rearrangements without a significant change in expression cannot be detected. In addition, many other events, including alternative splicing, may contribute to expression differences between exon probe sets. Therefore, it is important to incorporate other information, such as protein domain composition, when prioritizing novel, biologically relevant genomic aberrations. One obstacle to using exon array data was the poor performance of numerous probe sets. We sought to mitigate the effect of poor probes by excluding those that cross-hybridize to different genomic locations. Another limitation of identifying putative breakpoints from exon array data is that the predicted location depends on the genomic position of the probe sets; thus, breakpoints located within intergenic regions cannot be detected.

Although the presence of EML4-ALK fusion in NSCLC has been well documented (12, 14-18), our study is the first to report its occurrence in breast and colorectal carcinomas. Based on RT-PCR screening of patient samples, we detected the EML4-ALK fusion in 2.4% of breast, 2.4% of colorectal, and 11.3% of NSCLC samples. Although others have searched for EML4-ALK fusion in breast and colorectal tumor samples (19, 20), it was not found, likely because of its low frequency. Despite the low frequency, the recurrence of EML4-ALK fusion in breast and colorectal carcinoma represents a significant increase in the number of affected patients (∼5,000 per year in the United States), which exceeds the number of ALCL patients, in whom ALK translocation is present at a much higher frequency. The frequency of EML4-ALK fusion in NSCLC in our study is slightly higher than those reported previously (3-7%; refs. 12, 14-18). Patient samples from Asian and Caucasian populations were part of our collection; however, no other information on patient history was available. Factors such as ethnicity, age, gender, tumor histology, mutations in epidermal growth factor receptor (EGFR), KRAS, and TP53, and tobacco exposure may have contributed to the observed difference in EML4-ALK frequency in NSCLC. For example, a recent study reported that 4.9% of Chinese NSCLC patients carried the EML4-ALK fusion; however, its frequency was much higher in lung adenocarcinomas from nonsmoking women that were wild-type for EGFR and KRAS (29%; ref. 18). Similar to other studies (14, 16, 18, 31), the EML4-ALK fusion was detected only in adenocarcinomas. All of our EML4-ALK–positive NSCLC samples were wild-type for EGFR and KRAS (data not shown), confirming that the fusion and EGFR/KRAS mutations are mutually exclusive (12, 14, 16, 18). In summary, the presence of EML4-ALK fusion in multiple carcinomas, including breast, colorectal, and NSCLC, shows that its occurrence is broader than previously thought and, furthermore, not specific to NSCLC. Similarly, a recent study reported the presence of EML4-ALK in nonneoplastic lung tissue, further questioning its specificity to NSCLC (17).

A number of different EML4-ALK transcript variants have been reported to date (12, 18, 20-23). The most frequent EML4-ALK variants in NSCLC are E13; A20 (variant 1; ref. 12), E20; A20 (variant 2; ref. 12), and E6a/b; A20 (variant 3; refs. 21, 22). A multiplex RT-PCR assay was recently developed for screening all possible in-frame EML4-ALK variants, including the one in which EML4 exon 21 is fused to ALK exon 20 (20). However, variant E21; A20 has not yet been found, suggesting that it is rare or absent in NSCLC (16, 18). Here, we have shown that the E21; A20 variant occurs not in NSCLC but in colorectal cancer. We have also characterized the genomic structure underlying the EML4-ALK variant E21; A20 by identifying the precise fusion points in the colorectal tumor sample HF-18092. Similarly, the genomic structure of the EML4-ALK fusion was identified in breast and NSCLC samples harboring the E13; A20 and E20; A20 variants. All of the genomic breakpoints were diverse and different from the ones reported previously (12, 16, 18, 19), confirming that multiple genomic EML4-ALK rearrangements result in the production of the same transcript variant.

The FISH assay was used to confirm the presence of the EML4-ALK translocation in tumor samples and cell lines harboring the EML4-ALK fusion. FISH data verified that a simple inversion, rather than a deletion, is the underlying mechanism of the EML4-ALK translocation. Similar to a recent study (16), we did not observe genomic rearrangements supporting the possibility of other fusion partners for ALK or EML4. FISH analysis revealed substantial heterogeneity of the EML4-ALK translocation in both primary tumors and cell lines. For example, only up to 53% of the tumor cells were found to harbor the EML4-ALK translocation. Moderate heterogeneity of EML4-ALK in NSCLC tumors (50-100% of cells) was previously reported in a study that used tissue microarrays (15). In contrast to tissue microarrays, which provide information only for a small portion of the tumor, the whole tissue sections used here offered an opportunity to examine a large number of cells throughout the tumor, resulting in a better assessment of sample heterogeneity. Interestingly, a recent study showed an even lower percentage of EML4-ALK–positive cells (∼2%) in nine NSCLC tumor samples harboring the fusion transcripts (17). Such extensive heterogeneity should be carefully examined because it has significant consequences for future diagnostic and therapeutic approaches designed for patient populations harboring the EML4-ALK rearrangement.

The functional role of the EML4-ALK fusion in cell growth and proliferation was assessed by measuring cell viability following siRNA-mediated silencing of EML4 and ALK. Cell growth inhibition was never observed in cell lines lacking the EML4-ALK fusion. In contrast, cell lines harboring the EML4-ALK fusion showed a variable growth response following siRNA silencing. Consistent and significant growth inhibition was observed in three cell lines: lung H2228 and H460 and breast HCC1500. In contrast to our data, decreased cell growth of H2228 cells was not previously observed upon siRNA silencing of ALK (32). Similarly, H2228 cells exhibited drug sensitivity as well as resistance when treated with the ALK-selective inhibitor TAE684 (22, 29). In our study, the magnitude of growth inhibition did not seem to correlate with the number of cells harboring the EML4-ALK translocation. For example, all three cell lines showed a similar growth inhibition upon silencing of EML4-ALK, whereas the number of cells harboring the translocation was 2-fold higher in H2228 (25%) and H460 (29%) than in HCC1500 (11.2%). Although the mechanism of growth inhibition is not understood, this finding, together with the absence of growth response in some EML4-ALK–positive cell lines (H1975, HOP18, SW1417), suggests that other signaling mechanisms independent of ALK may regulate cell growth and proliferation. One such mechanism involves coactivation of other receptor tyrosine kinases and was reported for the EML4-ALK–positive cell line DFCI032 (22). DFCI032 cells were found to be resistant to the ALK inhibitor TAE684 due to coactivation of EGFR and ERBB2, and only a combination of TAE684 and an EGFR/ERBB2 inhibitor was effective in inhibiting their growth. Thus, it appears that the functional role of EML4-ALK in tumorigenesis could vary with the level of tumor dependence on ALK signaling as well as with the presence of coexisting oncogenic events. A better understanding of both phenomena will enable further progress in designing effective treatments for the patient population harboring the EML4-ALK gene fusion.

Materials and Methods

Samples

Patient tumor samples representative of breast cancer, colorectal cancer, and NSCLC were acquired from commercial sources and managed by Genentech's Human Tissue Bank. All tumor samples were classified by the Human Tissue Bank into the following subtypes (number of samples in parentheses): colorectal adenocarcinoma (83); breast HER2 positive (72), luminal (73), and basal (64); NSCLC adenocarcinoma (57), squamous (46), and small cell (3). Tumor-derived cell lines were obtained from the American Type Culture Collection.

Exon Array

RNA from tumor samples was extracted using AllPrep (Qiagen) following the manufacturer's instructions. The RNA quantity was measured using Nanodrop ND-1000 UV-spectrophotometer (NanoDrop Technologies) and RNA quality was assessed using Agilent 2100 Bioanalyzer (Agilent Technologies). rRNA was first removed with RiboMinus Human Transcriptome Isolation (Invitrogen) and cDNA synthesis was done with the Whole Transcript Sense Target Labeling (Affymetrix). The cDNA was fragmented and biotin-labeled using Whole Transcript Terminal Labeling (Affymetrix). Biotinylated targets were hybridized onto Affymetrix human Exon 1.0 ST arrays following the manufacturer's protocol. The arrays were washed in the Fluidics Station 450 and scanned on the GeneChip scanner 3000 7G. Microarray data generated by profiling samples on exon array are deposited in the National Center for Biotechnology Information GEO database under accession number GSE16534.

Computational Algorithm

Expression intensities for the “core” probe sets were calculated using quantile normalization and the RMA-Sketch method from Affymetrix's Power Tools package. Probe sets having inconsistent cross-hybridization properties were excluded from the analysis. Expression intensities for each exon probe set were normalized across all samples by subtracting the mean and dividing by the SD. Normalized expression values of all probe sets for a given gene were extracted into a matrix with rows representing samples and columns representing exon probe sets. To identify putative breakpoints in a given sample, each position between exon probe sets was examined for its ability to separate the probe sets into two distinct groups. Student's t statistic was calculated, comparing the expression values between the two groups of probe sets for each possible position. A T score was calculated as the maximum t statistic among all possible positions for the sample. If the T score was above a given threshold t0 (we used t0 > 1), the position that gave the T score was identified as a putative breakpoint for the given sample. If the maximum t statistic from all possible breakpoints did not pass the threshold, no breakpoint was predicted and the T score was set to 0 for the given sample. When multiple breakpoints for a given gene were predicted in different samples, the best breakpoint was determined by weighted majority voting. For each putative breakpoint, every sample was labeled as positive or negative according to whether the sample was predicted to have the respective breakpoint. The breakpoint with the highest sum of T scores from positive samples was identified as the best candidate for the given gene in the whole sample set. To prioritize gene fusion candidates in the whole genome, we sought to design a ranking metric without bias toward gene length or breakpoint frequency. An M score was calculated as the median of T scores among the positive samples for the candidate breakpoint for each gene. The statistical significance of the M score was evaluated using a null distribution empirically derived by permuting the positions of exon probe sets in each sample 1,000 times. To correct for multiple hypothesis testing, we applied the Benjamini-Hochberg FDR procedure to obtain q values (33).
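The scoring steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the use of the absolute value of the pooled-variance t statistic, and the requirement of at least two probe sets on each side of a candidate position are choices made here for the example. Permutation testing and FDR correction are omitted.

```python
import math
from collections import defaultdict
from statistics import mean, median, stdev, variance


def zscore_columns(matrix):
    """Normalize each probe set (column) across samples (rows)."""
    columns = []
    for col in zip(*matrix):
        mu, sd = mean(col), stdev(col)
        columns.append([(v - mu) / sd if sd > 0 else 0.0 for v in col])
    return [list(row) for row in zip(*columns)]


def student_t(left, right):
    """Pooled-variance t statistic comparing two groups of probe sets."""
    n1, n2 = len(left), len(right)
    pooled = ((n1 - 1) * variance(left) + (n2 - 1) * variance(right)) / (n1 + n2 - 2)
    diff = mean(right) - mean(left)
    if pooled == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(pooled * (1 / n1 + 1 / n2))


def sample_breakpoint(row, t0=1.0, min_side=2):
    """Return (position, T score) for one sample, or (None, 0.0) below threshold.

    A position k means the split falls between probe sets k and k + 1 (1-based).
    Taking abs() and requiring min_side probe sets per group are assumptions.
    """
    best_pos, best_t = None, 0.0
    for k in range(min_side, len(row) - min_side + 1):
        t = abs(student_t(row[:k], row[k:]))
        if t > best_t:
            best_pos, best_t = k, t
    return (best_pos, best_t) if best_t > t0 else (None, 0.0)


def gene_breakpoint(matrix, t0=1.0):
    """Weighted majority vote over samples; returns (best position, M score)."""
    votes = defaultdict(list)
    for row in matrix:
        pos, t_score = sample_breakpoint(row, t0)
        if pos is not None:
            votes[pos].append(t_score)
    if not votes:
        return None, 0.0
    best = max(votes, key=lambda p: sum(votes[p]))  # highest sum of T scores
    return best, median(votes[best])                # M score: median T of positives


# Hypothetical normalized values: 3 samples x 6 exon probe sets.
demo = [
    [0.0, 0.1, -0.1, 2.0, 2.1, 1.9],    # step up after the third probe set
    [0.1, 0.0, -0.1, 1.0, 1.1, 0.9],    # same position, smaller step
    [0.1, -0.1, 0.1, -0.1, 0.1, -0.1],  # no consistent step
]
position, m_score = gene_breakpoint(demo)
print(position, round(m_score, 2))
```

In this toy input the first two samples vote for the same position and the third falls below the threshold, so it does not contribute to the M score. In practice `zscore_columns` would be applied to the raw expression matrix before scoring.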

Detection of EML4-ALK Fusion

A fusion partner of ALK was determined by performing RNA ligase–mediated rapid amplification of 5′ and 3′ cDNA ends with the GeneRacer kit (Invitrogen). First-strand cDNA was amplified with Advantage HD DNA polymerase mix (Clontech) using the GeneRacer 5′ primer and the ALK-6R primer (CATGAGGAAATCCAGTTCGTCCTG). Subsequent nested PCR was done using the GeneRacer 5′ nested primer and the ALK-2R primer (GAGGTCTTGCCAGCAAAGCAGTAG). Amplification products were gel purified with QIAquick gel extraction (Qiagen) and cloned using pCR4-TOPO TA Cloning (Invitrogen). Sequencing was done in the Genentech Sequencing laboratory using a 3730xl DNA Analyzer (Applied Biosystems). Sequencing products were analyzed with Sequencher software (Gene Codes). A Basic Local Alignment Search Tool search against the BLAT database (footnote 4) was used to determine the identity of unknown sequences.

RT-PCR Screening for EML4-ALK Transcripts

RT-PCR was carried out using One Step RT-PCR (Qiagen) and primers previously described for the detection of EML4-ALK variants 1, 2 (12), and 3 (21). PCR conditions for the detection of EML4-ALK fusion transcript included cDNA synthesis at 50°C for 30 min, denaturation at 95°C for 15 min, 40 cycles consisting of denaturation at 95°C for 30 s, annealing at 60°C for 30 s, and strand elongation at 72°C for 1 min and a final elongation step at 72°C for 10 min. As an internal control, we used primers for the glyceraldehyde-3-phosphate dehydrogenase (GAPDH; CAACGACCACTTTGTCAAGCTC and CTCTCTTCCTCTTGTGCTCTTGC) and performed 20 cycles of amplification. PCR products were resolved on agarose gel and their sizes were determined by using Trackit 1 kb Plus DNA ladder (Invitrogen). Fragments representing EML4-ALK fusion product were excised, gel purified, cloned, and sequenced as described above.
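As a quick check on the cycling conditions above, the thermal program can be written down as data and its total hold time summed. This is only an illustrative sketch: the `Step` and `Stage` names are hypothetical, ramp times between temperatures are ignored, and reusing the same program with 20 cycles for the GAPDH control is an assumption, since the text states only the cycle number for that reaction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: int
    seconds: int


@dataclass(frozen=True)
class Stage:
    steps: tuple
    cycles: int = 1


def rt_pcr_program(cycles):
    """One-step RT-PCR program as stated in the text, with a variable cycle count."""
    return (
        Stage((Step("cDNA synthesis", 50, 30 * 60),)),
        Stage((Step("initial denaturation", 95, 15 * 60),)),
        Stage(
            (
                Step("denaturation", 95, 30),
                Step("annealing", 60, 30),
                Step("elongation", 72, 60),
            ),
            cycles=cycles,
        ),
        Stage((Step("final elongation", 72, 10 * 60),)),
    )


def hold_minutes(program):
    """Sum of programmed hold times in minutes (ramping excluded)."""
    seconds = sum(
        stage.cycles * sum(step.seconds for step in stage.steps) for stage in program
    )
    return seconds / 60


fusion_minutes = hold_minutes(rt_pcr_program(40))  # EML4-ALK detection
gapdh_minutes = hold_minutes(rt_pcr_program(20))   # assumed control program
print(fusion_minutes, gapdh_minutes)
```

The 40-cycle program comes to 30 + 15 + 40 × 2 + 10 minutes of hold time; actual run time on an instrument would be longer because of ramping.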

Identification of EML4-ALK Genomic Fusion Points

Genomic PCR was done with 50 to 100 ng of DNA in a 25 μL reaction containing LongAmp Taq DNA polymerase (New England Biolabs) under the following conditions: 3 min at 95°C followed by 30 cycles of 10 s at 95°C, 1 min at 55°C, and 10 min at 68°C plus a final extension for 20 min at 68°C. The genomic fusion points for the E13; A20 variant were identified using forward PCR primer Fusion-genome-S (12) or a primer residing in EML4 intron 13 (AGGAGAGAAAGAGCTGCAGTG) and reverse primer Fusion-genome-AS (12) or a primer located in ALK intron 19 (GCTCTGAACCTTTCCATCATACTT). For the detection of the E20; A20 and E21; A20 variants, the forward primers were placed within EML4 exon 20 (ACTGGTCCCCAGACAACAAG) or intron 20 (TTACTCTGTCAAATTGATGCTGCT), whereas the reverse primer was Fusion-genome-AS (12). The PCR products were resolved on agarose gel; if they appeared specific, the original PCR product was used for direct sequencing. However, if additional nonspecific fragments were present, the desired fragments were excised, gel purified, cloned, and sequenced as described above.

FISH

The FISH assay was done on all tumor samples for which formalin-fixed, paraffin-embedded sections were available and on tumor-derived cell lines identified as EML4-ALK positive by RT-PCR. All the locus-specific probes used for the FISH experiments were developed using bacterial artificial chromosomes (BAC) based on the UCSC Genome Browser March 2006 assembly. The FISH probe for the ALK locus was composed of two overlapping BACs (RP11-328L16 and RP11-701P18). A single BAC (RP11-299C5) was used for EML4, and three overlapping BACs (RP11-77G15, RP11-257N21, and CTD-786A2) were used to develop a probe for the region between ALK and EML4 (Fig. 5A). The probes were designed so that nuclei harboring EML4-ALK exhibit a red-green fusion signal, whereas normal cells show an array of red, blue, and green signals. A commercially available probe for CEP2 (Abbott Laboratories) was used to confirm the localization of the above-described probes to chromosome 2. The hybridization efficiency of the FISH probes was >95%. At least 100 nuclei per sample were analyzed for the EML4-ALK rearrangement. Probe preparation and FISH on cytogenetic and formalin-fixed, paraffin-embedded samples were done as described previously (34). The slides were visualized using an Olympus BX61 microscope and analyzed using FISHView software (Applied Spectral Imaging).

siRNA Silencing

Cells were plated in triplicate onto a 96-well plate, and each siRNA experiment was done at least three times. Four distinct siRNA pools targeting 5′ EML4, 3′ EML4, 5′ ALK, and 3′ ALK were used (Supplementary Fig. S3). Experimental controls included nontransfected cells and cells transfected with Lipofectamine 2000 (Invitrogen), TOX siRNA pool (Dharmacon), and nontargeting siRNA pool (Dharmacon). The CellTiter-Glo Luminescent Cell Viability assay (Promega) was used to determine the number of viable cells at 48, 72, and 96 h posttransfection. A relative decrease in cell growth was determined by comparing the number of cells upon silencing of 5′ EML4 and 3′ ALK (EML4-ALK fusion) with the number upon silencing of 3′ EML4 and 5′ ALK (endogenous EML4 and ALK, respectively). The P value for each comparison at a given time point was calculated using a one-tail t test.
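The comparison described above can be illustrated with a short Python sketch. The luminescence readings are invented for the example, the function names are hypothetical, and the pooled-variance form of the t statistic is an assumption, because the text does not say which variant of the t test was used.

```python
import math
from statistics import mean, variance


def growth_inhibition(fusion_reads, endogenous_reads):
    """Fractional decrease in viable-cell signal for fusion-targeting pools
    relative to pools targeting endogenous EML4 and ALK."""
    return 1.0 - mean(fusion_reads) / mean(endogenous_reads)


def pooled_t(a, b):
    """Pooled-variance t statistic; negative when group a has the lower mean."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / n1 + 1 / n2))


# Hypothetical triplicate luminescence readings at one time point.
fusion = [400.0, 420.0, 380.0]         # 5' EML4 and 3' ALK pools
endogenous = [1000.0, 950.0, 1050.0]   # 3' EML4 and 5' ALK pools

inhibition = growth_inhibition(fusion, endogenous)
t_value = pooled_t(fusion, endogenous)
print(round(inhibition, 2), round(t_value, 2))
```

A one-tail P value would then be read from a t distribution with n1 + n2 - 2 degrees of freedom. The Python standard library does not provide that distribution, so a statistics package would be needed for the final step.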

The efficacy of siRNA silencing was determined by measuring the relative quantity of EML4 and ALK transcripts in both untransfected and transfected cells. Cell pellets were collected 30 h posttransfection and RNA was prepared using the RNeasy Mini Kit (Qiagen). The TaqMan assay was done in a two-step process using the High-Capacity cDNA Reverse Transcription kit and TaqMan Gene Expression Master Mix (Applied Biosystems). Primers and probes targeting 5′ EML4 (Hs01040675_m1), 3′ EML4 (Hs00219420_m1), 5′ ALK (Hs01058321_m1), 3′ ALK (Hs00608289_m1), and GAPDH (4333764F) were purchased from Applied Biosystems, and the relative quantity of transcripts was determined following the manufacturer's protocol.
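The text does not spell out how relative quantity was computed. One common approach for TaqMan data is the comparative Ct method, sketched below as an assumption rather than as the protocol actually followed. The function name and Ct values are hypothetical, and the calculation assumes roughly 100% amplification efficiency for both the target and the GAPDH reference.

```python
def relative_quantity(ct_target_treated, ct_ref_treated,
                      ct_target_control, ct_ref_control):
    """Comparative Ct (2 ** -ddCt) estimate of target expression in
    transfected cells relative to untransfected cells."""
    delta_treated = ct_target_treated - ct_ref_treated  # normalize to reference
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_treated - delta_control)


# Hypothetical Ct values: the target crosses threshold 2 cycles later after
# silencing, while the GAPDH reference is unchanged.
remaining = relative_quantity(
    ct_target_treated=26.0, ct_ref_treated=18.0,
    ct_target_control=24.0, ct_ref_control=18.0,
)
knockdown_percent = (1.0 - remaining) * 100
print(remaining, knockdown_percent)
```

Each additional cycle needed to reach threshold corresponds to about a 2-fold drop in starting template, which is why a 2-cycle shift maps to one quarter of the original transcript level in this example.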

Disclosure of Potential Conflicts of Interest

No potential conflicts of interest were disclosed.

Acknowledgments

We thank the Genentech Human Tissue Bank group for providing tumor samples, the Sequencing laboratory for completing our sequencing requests, James Lee for his invaluable help with siRNA experiments, David Davis and Richard Neve for critical review of the manuscript, and Fred de Sauvage for guidance and support.

References

1. Rabbitts TH. Chromosomal translocations in human cancer. Nature 1994;372:143-9.

2. Rowley JD. A new consistent chromosomal abnormality in chronic myelogenous leukemia identified by quinacrine fluorescence and Giemsa staining. Nature 1973;243:290-3.

3. Futreal PA, Coin L, Marshall M, et al. A census of human cancer genes. Nat Rev Cancer 2004;4:177-83.

4. Mitelman F, Johansson B, Mertens F. Fusion genes and rearranged genes as a linear function of chromosome aberrations in cancer. Nat Genet 2004;36:331-4.

5. Mitelman F, Johansson B, Mertens F. The impact of translocations and gene fusions on cancer causation. Nat Rev Cancer 2007;7:233-45.

6. Speicher MR, Carter NP. The new cytogenetics: blurring the boundaries with molecular biology. Nat Rev Genet 2005;6:782-92.

7. Tomlins SA, Rhodes DR, Perner S, et al. Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science 2005;310:644-8.

8. Jhavar S, Reid A, Clark J, et al. Detection of TMPRSS2-ERG translocations in human prostate cancer by expression profiling using GeneChip human Exon 1.0 ST arrays. J Mol Diagn 2008;10:50-7.

9. Morris SW, Kirstein MN, Valentine MB, et al. Fusion of a kinase gene, ALK, to a nucleolar protein gene, NPM, in non-Hodgkin's lymphoma. Science 1994;263:1281-4.

10. Shiota M, Fujimoto J, Takenaga M, et al. Diagnosis of t(2;5)(p23;q35)-associated Ki-1 lymphoma with immunohistochemistry. Blood 1994;84:3648-52.

11. Pulford K, Morris SW, Turturro F, et al. Anaplastic lymphoma kinase proteins in growth control and cancer. J Cell Physiol 2004;199:330-58.

12. Soda M, Choi YL, Enomoto M, et al. Identification of the transforming EML4-ALK fusion gene in non-small-cell lung cancer. Nature 2007;448:561-6.

13. Mano H. Non-solid oncogenes in solid tumors: EML4-ALK fusion genes in lung cancer. Cancer Sci 2008;99:2349-55.

14. Inamura K, Takeuchi K, Togashi Y, et al. EML4-ALK fusion is linked to histological characteristics in a subset of lung cancers. J Thorac Oncol 2008;3:13-7.

15. Perner S, Wagner PL, Demichelis F, et al. EML4-ALK fusion lung cancer: a rare acquired event. Neoplasia 2008;10:298-302.

16. Shinmura K, Kageyama S, Tao H, et al. EML4-ALK fusion transcripts, but no NPM-, TPM3-, CLTC-, ATIC-, or TFG-ALK fusion transcripts, in non-small cell lung carcinomas. Lung Cancer 2008;61:163-9.

17. Martelli MP, Sozzi G, Hernandez L, et al. EML4-ALK rearrangement in non-small cell lung cancer and non-tumor lung tissues. Am J Pathol 2009;174:661-70.

18. Wong DW, Leung EL, So KK, et al. The EML4-ALK fusion gene is involved in various histologic types of lung cancers from nonsmokers with wild-type EGFR and KRAS. Cancer 2009;115:1723-33.

19. Fukuyoshi Y, Inoue H, Kita Y, et al. EML4-ALK fusion transcript is not found in gastrointestinal and breast cancers. Br J Cancer 2008;98:1536-9.

20. Takeuchi K, Choi YL, Soda M, et al. Multiplex reverse transcription-PCR screening for EML4-ALK fusion transcripts. Clin Cancer Res 2008;14:6618-24.

21. Choi YL, Takeuchi K, Soda M, et al. Identification of novel isoforms of the EML4-ALK transforming gene in non-small cell lung cancer. Cancer Res 2008;68:4971-6.

22. Koivunen JP, Mermel C, Zejnullahu K, et al. EML4-ALK fusion gene and efficacy of an ALK kinase inhibitor in lung cancer. Clin Cancer Res 2008;14:4275-83.

23. Takeuchi K, Choi YL, Togashi Y, et al. KIF5B-ALK, a novel fusion oncokinase identified by an immunohistochemistry-based diagnostic system for ALK-positive lung cancer. Clin Cancer Res 2009;15:3143–49.

24. Piva R, Chiarle R, Manazza AD, et al. Ablation of oncogenic ALK is a viable therapeutic approach for anaplastic large-cell lymphomas. Blood 2006;107:689–97.

25. Soda M, Takada S, Takeuchi K, et al. A mouse model for EML4-ALK-positive lung cancer. Proc Natl Acad Sci U S A 2008;105:19893–7.

26. Wan W, Albom MS, Lu L, et al. Anaplastic lymphoma kinase activity is essential for the proliferation and survival of anaplastic large-cell lymphoma cells. Blood 2006;107:1617–23.

27. Christensen JG, Zou HY, Arango ME, et al. Cytoreductive antitumor activity of PF-2341066, a novel inhibitor of anaplastic lymphoma kinase and c-Met, in experimental models of anaplastic large-cell lymphoma. Mol Cancer Ther 2007;6:3314–22.

28. Galkin AV, Melnick JS, Kim S, et al. Identification of NVP-TAE684, a potent, selective and efficacious inhibitor of NPM-ALK. Proc Natl Acad Sci U S A 2007;104:270–5.

29. McDermott U, Iafrate AJ, Gray NS, et al. Genomic alterations of anaplastic lymphoma kinase may sensitize tumors to anaplastic lymphoma kinase inhibitors. Cancer Res 2008;68:3389–95.

30. Finn RD, Mistry J, Schuster-Böckler B, et al. Pfam: clans, web tools and services. Nucleic Acids Res 2006;34:D247–51.

31. Inamura K, Takeuchi K, Togashi Y, et al. EML4-ALK lung cancers are characterized by rare other mutations, a TTF-1 cell lineage, an acinar histology and young onset. Mod Pathol 2009;22:508–15.

32. Rikova K, Guo A, Zeng Q, et al. Global survey of phosphotyrosine signaling identifies oncogenic kinases in lung cancer. Cell 2007;131:1190–203.

33. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc 1995;57:289–300.

34. O'Brien C, Cavet G, Pandita A, et al. Functional genomics identifies ABCC3 as a mediator of taxane resistance in HER2-amplified breast cancer. Cancer Res 2008;68:5380–9.

Competing Interests

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

© 2009 American Association for Cancer Research.


Supplementary data