Alternative Splice Variants, a New Class of Protein Cancer Biomarker Candidates: Findings in Pancreatic Cancer and Breast Cancer with Systems Biology Implications

Pan-cancer repository of validated natural and cryptic mRNA splicing mutations [version 3; peer review: 2 approved, 1 approved with reservations]

We present a major public resource of mRNA splicing mutations validated according to multiple lines of evidence of abnormal gene expression. Likely mutations present in all tumor types reported in the Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) were identified based on the comparative strengths of splice sites in tumor versus normal genomes, and then validated by respectively comparing counts of splice junction spanning and abundance of transcript reads in RNA-Seq data from matched tissues and tumors lacking these mutations. The comprehensive resource features 341,486 of these validated mutations, the majority of which (69.9%) are not present in the Single Nucleotide Polymorphism Database (dbSNP 150). There are 131,347 unique mutations which weaken or abolish natural splice sites, and 222,071 mutations which strengthen cryptic splice sites (11,932 affect both simultaneously). 28,812 novel or rare flagged variants (with <1% population frequency in dbSNP) were observed in multiple tumor tissue types. An algorithm was developed to classify variants into splicing molecular phenotypes that integrates germline heterozygosity, degree of information change and impact on expression. The classification thresholds were calibrated against the ClinVar clinical database phenotypic assignments. Variants are partitioned into allele-specific alternative splicing, likely aberrant and aberrant splicing phenotypes. Single variants or chromosome ranges can be queried using a Global Alliance for Genomics and Health (GA4GH)-compliant, web-based Beacon "Validated Splicing Mutations" either separately or in aggregate alongside other Beacons through the public Beacon Network, as well as through our website. The website provides additional information, such as a visual representation of supporting RNAseq results, gene expression in the corresponding normal tissues, and splicing molecular phenotypes.

Transcriptome-wide identification and study of cancer-specific splicing events across multiple tumors

Oncotarget, 2015

Dysregulation of alternative splicing (AS) is one of the molecular hallmarks of cancer, with splicing alteration of numerous genes in cancer patients. However, studying splicing mis-regulation in cancer is complicated by the large noise generated from tissue-specific splicing. To obtain a global picture of cancer-specific splicing, we analyzed transcriptome sequencing data from 1149 patients in The Cancer Genome Atlas project, producing a core set of AS events significantly altered across multiple cancer types. These cancer-specific AS events are highly conserved, are more likely to maintain protein reading frame, and mainly function in cell cycle, cell adhesion/migration, and insulin signaling pathways. Furthermore, these events can serve as new molecular biomarkers to distinguish cancer from normal tissues, to separate cancer subtypes, and to predict patient survival. We also found that most genes whose expression is closely associated with cancer-specific splicing are key regulat...

Detection of recurrent alternative splicing switches in tumor samples reveals novel signatures of cancer

Nucleic Acids Research, 2015

The determination of the alternative splicing isoforms expressed in cancer is fundamental for the development of tumor-specific molecular targets for prognosis and therapy, but it is hindered by the heterogeneity of tumors and the variability across patients. We developed a new computational method, robust to biological and technical variability, which identifies significant transcript isoform changes across multiple samples. We applied this method to more than 4000 samples from The Cancer Genome Atlas project to obtain novel splicing signatures that are predictive for nine different cancer types, and find a specific signature for basal-like breast tumors involving the tumor-driver CTNND1. Additionally, our method identifies 244 isoform switches, for which the change occurs in the most abundant transcript. Some of these switches occur in known tumor drivers, including PPARG, CCND3, RALGDS, MITF, PRDM1, ABI1 and MYH11, for which the switch implies a change in the protein product. Moreover, some of the switches cannot be described with simple splicing events. Surprisingly, isoform switches are independent of somatic mutations, except for the tumor-suppressor FBLN2 and the oncogene MYH11. Our method reveals novel signatures of cancer in terms of transcript isoforms specifically expressed in tumors, providing novel potential molecular targets for prognosis and therapy.

Identification of Alternative Splicing Markers for Breast Cancer

Cancer Research, 2008

Breast cancer is the most common cause of cancer death among women under age 50 years, so it is imperative to identify molecular markers to improve diagnosis and prognosis of this disease. Here, we present a new approach for the identification of breast cancer markers that does not measure gene expression but instead uses the ratio of alternatively spliced mRNAs as its indicator. Using a high-throughput reverse transcription-PCR-based system for splicing annotation, we monitored the alternative splicing profiles of 600 cancer-associated genes in a panel of 21 normal and 26 cancerous breast tissues. We validated 41 alternative splicing events that significantly differed in breast tumors relative to normal breast tissues. Most cancer-specific changes in splicing that disrupt known protein domains support an increase in cell proliferation or survival consistent with a functional role for alternative splicing in cancer. In a blind screen, a classifier based on the 12 best cancer-associated splicing events correctly identified cancer tissues with 96% accuracy. Moreover, a subset of these alternative splicing events could order tissues according to histopathologic grade, and 5 markers were validated in a further blind set of 19 grade 1 and 19 grade 3 tumor samples. These results provide a simple alternative for the classification of normal and cancerous breast tumor tissues and underscore the putative role of alternative splicing in the biology of cancer. [Cancer Res 2008;68(22):9525-31]
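The idea of using a ratio of alternatively spliced mRNAs as the indicator, rather than gene expression, can be illustrated with a minimal sketch. It assumes the ratio is the fraction of signal from the long (exon-included) isoform out of both isoforms; the function names and the input layout (pairs of long and short isoform signal per tissue) are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def splicing_ratio(long_form: float, short_form: float) -> float:
    """Fraction of total signal contributed by the long isoform (0 to 1)."""
    total = long_form + short_form
    if total <= 0:
        raise ValueError("no signal for either isoform")
    return long_form / total


def mean_ratio_shift(
    normal: Sequence[Tuple[float, float]],
    tumor: Sequence[Tuple[float, float]],
) -> float:
    """Mean splicing ratio in tumor tissues minus the mean in normal tissues.

    Each element is a (long_form, short_form) pair for one tissue sample.
    """
    def mean_ratio(samples: Sequence[Tuple[float, float]]) -> float:
        ratios = [splicing_ratio(long_f, short_f) for long_f, short_f in samples]
        return sum(ratios) / len(ratios)

    return mean_ratio(tumor) - mean_ratio(normal)
```

Because the ratio is normalized within each sample, it is insensitive to changes in the overall expression level of the gene, which is the property the abstract emphasizes.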

Alternative splicing as a biomarker and potential target for drug discovery

Acta pharmacologica Sinica, 2015

Alternative splicing is a key process of multi-exonic gene expression during pre-mRNA maturation. In this process, particular exons of a gene will be included within or excluded from the final matured mRNA, and the resulting transcripts generate diverse protein isoforms. Recent evidence demonstrates that approximately 95% of human genes with multiple exons undergo alternative splicing during pre-mRNA maturation. Thus, alternative splicing plays a critical role in physiological processes and cell development programs, and dysregulation of alternative splicing is highly associated with human diseases, such as cancer, diabetes and neurodegenerative diseases. In this review, we discuss the regulation of alternative splicing, examine the relationship between alternative splicing and human diseases, and describe several approaches that modify alternative splicing, which could aid in human disease diagnosis and therapy.

ASPicDB: a database of annotated transcript and protein variants generated by alternative splicing

Nucleic acids …, 2011

Alternative splicing is emerging as a major mechanism for the expansion of the transcriptome and proteome diversity, particularly in humans and other vertebrates. However, the proportion of alternative transcripts and proteins actually endowed with functional activity is currently highly debated. We present here a new release of ASPicDB which now provides a unique annotation resource of human protein variants generated by alternative splicing. A total of 256,939 protein variants from 17,191 multi-exon genes have been extensively annotated through state-of-the-art machine learning tools providing information on the protein type (globular and transmembrane), localization, presence of PFAM domains, signal peptides, GPI-anchor propeptides, transmembrane and coiled-coil segments. Furthermore, full-length variants can now be specifically selected based on the annotation of CAGE-tags and polyA signal and/or polyA sites, marking transcription initiation and termination sites, respectively. The retrieval can be carried out at gene, transcript, exon, protein or splice site level, allowing the selection of data sets fulfilling one or more features set by the user. The retrieval interface also enables the selection of protein variants showing specific differences in the annotated features. ASPicDB is available at http://www.caspur.it/ASPicDB/.

RNA sequencing of cancer reveals novel splicing alterations

Nature Scientific Reports, 2013

The breast cancer transcriptome acquires a myriad of regulation changes, and splicing is critical for the cell to “tailor-make” specific functional transcripts. We systematically revealed splicing signatures of the three most common types of breast tumors using RNA sequencing: TNBC, non-TNBC and HER2-positive breast cancer. We discovered subtype-specific differentially spliced genes and splice isoforms not previously recognized in the human transcriptome. Further, we showed that exon skipping and intron retention are the predominant splice events in breast cancer. In addition, we found that differential expression of primary transcripts and promoter switching are significantly deregulated in breast cancer compared to normal breast. We validated the presence of novel hybrid isoforms of critical molecules like CDK4, LARP1, ADD3, and PHLPP2. Our study provides the first comprehensive portrait of transcriptional and splicing signatures specific to breast cancer sub-types, as well as previously unknown transcripts that prompt the need for complete annotation of tissue- and disease-specific transcriptomes.

An analysis of the sensitivity of proteogenomic mapping of somatic mutations and novel splicing events in cancer

Molecular & cellular proteomics : MCP, 2015

Improvements in mass spectrometry (MS)-based peptide sequencing provide a new opportunity to determine whether polymorphisms, mutations and splice variants identified in cancer cells are translated. Herein we apply a proteogenomic data integration tool (QUILTS) to illustrate protein variant discovery using whole genome, whole transcriptome and global proteome datasets generated from a pair of luminal and basal-like breast cancer patient derived xenografts (PDX). The sensitivity of proteogenomic analysis for single nucleotide variant (SNV) expression and novel splice junction (NSJ) detection was probed using multiple MS/MS sample process replicates defined here as an independent tandem MS experiment using identical sample material. Despite analysis of over thirty sample process replicates, only about 10% of SNVs (somatic and germline) detected by both DNA and RNA sequencing were observed as peptides. An even smaller proportion of peptides corresponding to NSJ observed by RNA sequencin...

Validation of predicted mRNA splicing mutations using high-throughput transcriptome data

F1000Research, 2014

Interpretation of variants present in complete genomes or exomes reveals numerous sequence changes, only a fraction of which are likely to be pathogenic. Mutations have been traditionally inferred from allele frequencies and inheritance patterns in such data. Variants predicted to alter mRNA splicing can be validated by manual inspection of transcriptome sequencing data; however, this approach is intractable for large datasets. These abnormal mRNA splicing patterns are characterized by reads demonstrating exon skipping, cryptic splice site use, high levels of intron inclusion, or combinations of these properties. We present Veridical, a method for the automatic in silico validation of DNA sequencing variants that alter mRNA splicing. Veridical performs statistically valid comparisons of the normalized read counts of abnormal RNA species in mutant versus non-mutant tissues. This leverages large numbers of control samples to corroborate the consequences of predicted splicing variants in complete genomes and exomes.
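The comparison of normalized read counts between a mutant sample and many non-mutant controls can be sketched as follows. This is an illustrative simplification, not Veridical's actual statistic: it assumes counts are normalized to reads per million mapped reads and uses a one-sided empirical p-value with an add-one correction. All function and parameter names are hypothetical.

```python
from typing import Sequence


def normalize(count: int, total_mapped_reads: int, per: int = 1_000_000) -> float:
    """Scale a raw read count to reads per `per` mapped reads."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return count * per / total_mapped_reads


def empirical_p_value(mutant_value: float, control_values: Sequence[float]) -> float:
    """One-sided empirical p-value for an excess of abnormal reads.

    Counts how many control samples show a normalized abnormal-read count
    at least as large as the mutant sample; the add-one correction keeps
    the estimate above zero for a finite number of controls.
    """
    if not control_values:
        raise ValueError("at least one control sample is required")
    at_least_as_large = sum(1 for v in control_values if v >= mutant_value)
    return (at_least_as_large + 1) / (len(control_values) + 1)
```

With this scheme the smallest attainable p-value is 1 / (number of controls + 1), which is one way to see why a large pool of control samples makes the corroboration more convincing.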

Proteomic Characterization of Novel Alternative Splice Variant Proteins in Human Epidermal Growth Factor Receptor 2/neu-Induced Breast Cancers

Cancer Research, 2010

Multifaceted alternative splicing in cancer cells greatly diversifies protein structure independently of genome changes, but characterization of cancer-associated splice variants is quite limited. In this study, we used mass spectrometric data to interrogate a custom-built database created with three-frame translations of mRNA sequences from Ensembl and ECgene to find alternative splice variant proteins. In mass spectrometric files from LC-MS/MS analyses of normal mouse mammary glands or mammary tumors derived from MMTV-Her-2/neu transgenic mice, we identified a total of 608 alternative splice variants, of which peptides from 216 proteins were found only in the tumor sample. Among the 608 splice variants were 68 novel proteins that were not completely matched to any known protein sequence in mice, for which we found known functional motifs. Biological process enrichment analysis of the splice variants identified suggested involvement of these proteins especially in cell motility and translation initiation. The cancer-associated differentially-expressed splice variant proteins offer novel biomarker candidates that may function in breast cancer progression or metastasis.
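A three-frame translation of an mRNA sequence, the operation used to build the search database described above, can be sketched as below. This is a minimal illustration that assumes the standard genetic code and the forward strand only; the function names are hypothetical, and the authors' actual pipeline (for example, how stop codons split entries) is not specified here.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(seq: str, frame: int = 0) -> str:
    """Translate one forward reading frame; '*' marks a stop codon.

    Codons containing characters other than A, C, G, T/U become 'X'.
    A trailing partial codon is ignored.
    """
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frame_translation(seq: str) -> list:
    """Return the translations of the three forward reading frames."""
    return [translate(seq, frame) for frame in range(3)]
```

For example, `three_frame_translation("ATGGCCTAA")` yields one peptide string per frame, and only the first frame reads the sequence as a start codon, an alanine codon, and a stop codon.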