RNA seq Research Papers - Academia.edu

Benign fibrous histiocytoma (BFH) is a mesenchymal tumor that most often occurs in the skin (so-called dermatofibroma), but may also appear in soft tissues (so-called deep BFH) and in the skeleton (so-called non-ossifying fibroma). The origin of BFH is unknown, and it has been questioned whether it is a true neoplasm. Chromosome banding, fluorescence in situ hybridization, single nucleotide polymorphism arrays, RNA sequencing, RT-PCR and quantitative real-time PCR were used to search for recurrent somatic mutations in a series of BFH. BFHs were found to harbor recurrent fusions of genes encoding membrane-associated proteins (podoplanin, CD63 and LAMTOR1) with genes encoding protein kinase C (PKC) isoforms PRKCB and PRKCD. PKCs are serine-threonine kinases that through their many phosphorylation targets are implicated in a variety of cellular processes, as well as tumor development. When inactive, the amino-terminal, regulatory domain of PKCs suppresses the activity of their catalytic domain. Upon activation, which requires several steps, they typically translocate to cell membranes, where they interact with different signaling pathways. The detected PDPN-PRKCB, CD63-PRKCD and LAMTOR1-PRKCD gene fusions are all predicted to result in chimeric proteins consisting of the membrane-binding part of PDPN, CD63 or LAMTOR1 and the entire catalytic domain of the PKC. This novel pathogenetic mechanism should result in constitutive kinase activity at an ectopic location. The results show that BFH indeed is a true neoplasm, and that distorted PKC activity is essential for tumorigenesis. The findings also provide means to differentiate BFH from other skin and soft tissue tumors. This article is part of a Directed Issue entitled: Rare cancers.

Chronic administration of lysergic acid diethylamide (LSD) every other day to rats results in a variety of abnormal behaviors. These build over the 90 day course of treatment and can persist at full strength for at least several months after cessation of treatment. The behaviors are consistent with those observed in animal models of schizophrenia and include hyperactivity, reduced sucrose-preference, and decreased social interaction. In order to elucidate molecular changes that underlie these aberrant behaviors, we chronically treated rats with LSD and performed RNA-Sequencing on the medial prefrontal cortex (mPFC), an area highly associated with both the actions of LSD and the pathophysiology of schizophrenia and other psychiatric illnesses. We observed widespread changes in the neurogenetic state of treated animals four weeks after cessation of LSD treatment. QPCR was used to validate a subset of gene expression changes observed with RNA-Seq, and confirmed a significant correlation between the two methods. Functional clustering analysis indicates differentially expressed genes are enriched in pathways involving neurotransmission (Drd2, Gabrb1), synaptic plasticity (Nr2a, Krox20), energy metabolism (Atp5d, Ndufa1) and neuropeptide signaling (Npy, Bdnf), among others. Many processes identified as altered by chronic LSD are also implicated in the pathogenesis of schizophrenia, and genes affected by LSD are enriched with putative schizophrenia genes. Our results provide a relatively comprehensive analysis of mPFC transcriptional regulation in response to chronic LSD, and indicate that the long-term effects of LSD may bear relevance to psychiatric illnesses, including schizophrenia.

Background: Molecular hydrogen, given its pollution-free combustion, has great potential to replace fossil fuels in future transportation and energy production. However, current industrial hydrogen production processes, such as steam reforming of methane, contribute significantly to the greenhouse effect. Therefore, alternative methods, in particular the use of fermentative microorganisms, have attracted scientific interest in recent years. However, the low overall yield obtained is a major challenge in biological H2 production. Thus, a thorough and detailed understanding of the relationships between genome content, gene expression patterns, pathway utilisation and metabolite synthesis is required to optimise the yield of biohydrogen production pathways. Results: In this study, transcriptomic and proteomic analyses of the hydrogen-producing bacterium Clostridium butyricum CWBI 1009 were carried out to provide a biomolecular overview of the changes that occur when the metabolism shifts to H2 production. The growth, H2-production, and glucose-fermentation profiles were monitored in 20 L batch bioreactors under unregulated-pH and fixed-pH conditions (pH 7.3 and 5.2). Conspicuous differences were observed in the bioreactor performances and cellular metabolisms for all the tested metabolites, and they were pH dependent. During unregulated-pH glucose fermentation, increased H2 production was associated with concurrent strong up-regulation of the nitrogenase coding genes. However, no such concurrent up-regulation of the [FeFe] hydrogenase genes was observed. During the fixed pH 5.2 fermentation, by contrast, the expression levels for the [FeFe] hydrogenase coding genes were higher than during the unregulated-pH fermentation, while the nitrogenase transcripts were less abundant. The overall results suggest, for the first time, that environmental factors may determine whether H2 production in C. butyricum CWBI 1009 is mediated by the hydrogenases and/or the nitrogenase.
Conclusions: This work, contributing to the field of dark fermentative hydrogen production, provides a multidisciplinary approach for the investigation of the processes involved in the molecular H2 metabolism of clostridia. In addition, it lays the groundwork for further optimisation of biohydrogen production pathways based on genetic engineering techniques.

Although a number of genes that play key roles during the meiotic process have been characterized in great detail, the whole process of meiosis is still not completely unraveled. To gain insight into the bigger picture, large-scale approaches like RNA-seq and microarray can help to elucidate the transcriptome landscape during plant meiosis, discover co-regulated genes, enriched processes, and highly expressed known and unknown genes which might be important for meiosis. These high-throughput studies are gaining more and more popularity, but their beginnings in plant systems reach back as far as the 1960s. Frequently, whole anthers or post-meiotic pollen were investigated, while less data is available on isolated cells during meiosis, and only a few studies addressed the transcriptome of female meiosis. For this review, we compiled meiotic transcriptome studies covering different plant species, and summarized and compared their key findings. Besides pointing to consistent as well ...

The Parasitic Plant Genome Project has sequenced transcripts from three parasitic species and a nonparasitic relative in the Orobanchaceae with the goal of understanding genetic changes associated with parasitism. The species studied span the trophic spectrum from free-living nonparasite to obligate holoparasite. Parasitic species used were Triphysaria versicolor, a photosynthetically competent species that opportunistically parasitizes roots of neighboring plants; Striga hermonthica, a hemiparasite that has an obligate need for a host; and Orobanche aegyptiaca, a holoparasite with absolute nutritional dependence on a host. Lindenbergia philippensis represents the closest nonparasite sister group to the parasitic Orobanchaceae and was included for comparative purposes. Tissues for transcriptome sequencing from each plant were gathered to identify expressed genes for key life stages from seed conditioning through anthesis. Two of the species studied, S. hermonthica and O. aegyptiaca, are economically important weeds and the data generated by this project are expected to aid in research and control of these species and their relatives. The sequences generated through this project will provide an abundant resource of molecular markers for understanding population dynamics, as well as provide insight into the biology of parasitism and advance progress toward understanding parasite virulence and host resistance mechanisms. In addition, the sequences provide important information on target sites for herbicide action or other novel control strategies such as trans-specific gene silencing. Nomenclature: Egyptian broomrape, Orobanche aegyptiaca (Pers.) (Syn. Phelipanche aegyptiaca) ORAAE; Lindenbergia philippensis (Cham. & Schltdl.) Benth. LINPH; yellowbeak owl's-clover, Triphysaria versicolor (Fisch. & C.A. Mey) TRVEV; purple witchweed, Striga hermonthica, (Del.) Benth. STRHE.

Background & Aims: β-Catenin is an oncogene frequently mutated in hepatocellular carcinoma. In this study, we investigated target genes of β-catenin signaling in hepatocyte proliferation. Methods: We studied transgenic mice displaying either inactivation or activation of the β-catenin pathway, focusing on analysis of liver proliferation due to aberrant β-catenin activation, and on the regeneration process during which β-catenin signaling is transiently activated. We localized in situ the various partners involved in proliferation or identified as targets of β-catenin in these transgenic and regenerating livers. We also performed comparative transcriptome analyses, using microarrays. Finally, we extracted, from deep-sequencing data, both the DNA regulatory elements bound to the β-catenin/Tcf nuclear complex and the expression levels of critical targets identified in microarrays. Results: β-Catenin activation during liver regeneration occurred during G1/S cell cycle progression and allowed zonal extension of the normal territory of active β-catenin and panlobular proliferation. We found that β-catenin controlled both cell-autonomous and non-cell-autonomous hepatocyte proliferation, through direct transcriptional and complex control of cyclin D1 gene expression and of the expression of a new target gene, Tgfa. Conclusions: We propose that β-catenin controls panlobular hepatocyte proliferation partly by controlling, together with its Tcf4 nuclear partner, expression of the pro-proliferation cyclin D1 and Tgfa genes. This study constitutes a first step toward understanding the oncogenic properties of this prominent signaling pathway in the liver.

Understanding the molecular mechanisms underlying insect compensatory responses to plant defenses could lead to improved plant resistance to herbivores. The Mp708 inbred line of maize produces the maize insect resistant 1-cysteine protease (Mir1-CP) toxin. Reduced feeding and growth of fall armyworm larvae fed on Mp708 was previously linked to impairment of nutrient utilization and degradation of the midgut (MG) peritrophic matrix (PM) by Mir1-CP. Here we examine the biochemical and transcriptional responses of fall armyworm larvae to Mir1-CP. Insect Intestinal Mucin (IIM) was severely depleted from pure PMs treated in vitro with recombinant Mir1-CP. Larvae fed on Mp708 midwhorls excrete frass largely depleted of IIM. Cracks, fissures and increased porosity previously observed in the PM of larvae fed on Mp708 midwhorls could ensue when Mir1-CP degrades the IIM that cross-links chitin fibrils in the PM. Both targeted and global transcriptome analyses were performed to determine how complete dissolution of the structure and function of the PM is prevented, enabling larvae to continue growing in the presence of Mir1-CP. The MGs from fall armyworm fed on Mp708 upregulate expression of genes encoding proteins involved in PM production as an apparent compensation to replace the disrupted PM structure and restore appropriate counter-current MG gradients. Also, several families of digestive enzymes (endopeptidases, aminopeptidases, lipases, amylase) were more highly expressed in MGs from larvae fed on Mp708 than MGs from larvae fed on diets lacking Mir1-CP (artificial diet, midwhorls from Tx601 or B73 maize). Impaired growth of larvae fed on Mp708 probably results from metabolic costs associated with higher production of PM constituents and digestive enzymes in a compensatory attempt to maintain MG function.

Understanding brain function involves improved knowledge about how the genome specifies such a large diversity of neuronal types. Transcriptome analysis of single neurons has been previously described using gene expression microarrays. Using high-throughput transcriptome sequencing (RNA-Seq), we have developed a method to perform single-neuron RNA-Seq. Following electrophysiology recording from an individual neuron, total RNA was extracted by aspirating the cellular contents into a fine glass electrode tip. The mRNAs were reverse transcribed and amplified to construct a single-neuron cDNA library, and subsequently subjected to high-throughput sequencing. This approach was applied to both individual neurons cultured from embryonic mouse hippocampus, as well as neocortical neurons from live brain slices. We found that the average pairwise Spearman's rank correlation coefficient of gene expression level expressed as RPKM (reads per kilobase of transcript per million mapped reads) w...
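
The RPKM unit named in the abstract above (reads per kilobase of transcript per million mapped reads) can be illustrated with a minimal Python sketch. The function name `rpkm`, the dictionary inputs and the toy gene names are assumptions made for illustration; this is not the authors' pipeline, and it treats the sum of the supplied counts as the total number of mapped reads.

```python
def rpkm(read_counts, transcript_lengths_bp):
    """Convert raw read counts to RPKM.

    read_counts: dict mapping gene name -> number of mapped reads (hypothetical input).
    transcript_lengths_bp: dict mapping gene name -> transcript length in base pairs.
    Assumption: the total mapped reads equals the sum of the supplied counts.
    """
    total_mapped = sum(read_counts.values())
    if total_mapped == 0:
        raise ValueError("no mapped reads; RPKM is undefined")
    # RPKM = reads * 1e9 / (total mapped reads * transcript length in bp)
    # (1e3 converts bp to kilobases, 1e6 converts reads to millions of reads)
    return {
        gene: count * 1e9 / (total_mapped * transcript_lengths_bp[gene])
        for gene, count in read_counts.items()
    }


# Toy example with two hypothetical genes of different length
counts = {"geneA": 500, "geneB": 500}
lengths = {"geneA": 1000, "geneB": 2000}
values = rpkm(counts, lengths)
```

With equal read counts, the longer transcript receives the lower RPKM, which is the length correction the unit is meant to provide.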

Technical advances such as the development of molecular cloning, Sanger sequencing, PCR and oligonucleotide microarrays are key to our current capacity to sequence, annotate and study complete organismal genomes. Recent years have seen the development of a variety of so-called 'next-generation' sequencing platforms, with several others anticipated to become available shortly. The previously unimaginable scale and economy of these methods, coupled with their enthusiastic uptake by the scientific community and the potential for further improvements in accuracy and read length, suggest that these technologies are destined to make a huge and ongoing impact upon genomic and post-genomic biology. However, like the analysis of microarray data and the assembly and annotation of complete genome sequences from conventional sequencing data, the management and analysis of next-generation sequencing data requires (and indeed has already driven) the development of informatics tools able to assemble, map, and interpret huge quantities of relatively or extremely short nucleotide sequence data. Here we provide a broad overview of bioinformatics approaches that have been introduced for several genomics and functional genomics applications of next-generation sequencing.

Background Transcriptome sequencing (RNA-Seq) has become the assay of choice for high-throughput studies of gene expression. However, as is the case with microarrays, major technology-related artifacts and biases affect the resulting expression measures. Normalization is therefore essential to ensure accurate inference of expression levels and subsequent analyses thereof. Results We focus on biases related to GC-content and demonstrate the existence of strong sample-specific GC-content effects on RNA-Seq read counts, which can substantially bias differential expression analysis. We propose three simple within-lane gene-level GC-content normalization approaches and assess their performance on two different RNA-Seq datasets, involving different species and experimental designs. Our methods are compared to state-of-the-art normalization procedures in terms of bias and mean squared error for expression fold-change estimation and in terms of Type I error and p-value distributions for tes...

Transcriptome analysis through next-generation sequencing technologies allows the generation of detailed gene catalogs for non-model species, at the cost of new challenges with regards to computational requirements and bioinformatics expertise. Here, we present TRAPID, an online tool for the fast and efficient processing of assembled RNA-Seq transcriptome data, developed to mitigate these challenges. TRAPID offers high-throughput open reading frame detection, frameshift correction and includes a functional, comparative and phylogenetic toolbox, making use of 175 reference proteomes. Benchmarking and comparison against state-of-the-art transcript analysis tools reveals the efficiency and unique features of the TRAPID system. TRAPID is freely available at

Larvae of the tobacco budworm are major polyphagous pests throughout the Americas. Development of effective microbial biopesticides for this and related noctuid pests has been stymied by the natural resistance mediated by the innate immune response. Hemocytes play an early and central role in activating and coordinating immune responses to entomopathogens. To approach this problem, we completed RNA-seq expression profiling of hemocytes collected from larvae following an in vivo challenge with bacterial and fungal cell wall components to elicit an immune response. A de novo exome assembly was constructed by combination of sequence tags from all treatments. Sequence tags from each treatment were aligned separately with the assembly to measure expression. The resulting table of differential expression had >22,000 assemblies, each with a distinct combination of annotation and expression. Within these assemblies, >1,400 were upregulated and >1,500 downregulated by immune activation with bacteria or fungi. Orthologs to innate immune components of other insects were identified, including pattern recognition, signal transduction pathways, antimicrobial peptides and enzymes, melanization and coagulation. Additionally, orthologs of components regulating hemocytic functions such as autophagy, apoptosis, phagocytosis and nodulation were identified. Associated cellular oxidative defenses and detoxification responses were identified, providing a comprehensive snapshot of the early response to elicitation.

Although Manduca sexta has significantly contributed to our knowledge on a variety of insect physiological processes, the lack of its genome sequence hampers the large-scale gene discovery, transcript profiling, and proteomic analysis in this biochemical model species. Here we report our implementation of the RNA-Seq cDNA sequencing approach based on massively parallel pyrosequencing, which allows us to categorize transcripts based on their relative abundances and to discover process- or tissue-specifically regulated genes simultaneously. We obtained 1,821,652 reads with an average length of 289 bp per read from fat body and hemocytes of naïve and microbe-injected M. sexta larvae. After almost all (92.1%) of these reads were assembled into 19,020 contigs, we identified 528 contigs whose relative abundances increased at least 5- and 8-fold in fat body and hemocytes, respectively, after the microbial challenge. Polypeptides encoded by these contigs include pathogen recognition receptors, extracellular and intracellular signal mediators and regulators, antimicrobial peptides, and proteins with no known sequence but likely participating in defense in novel ways. We also found 250 and 161 contigs that were preferentially expressed in fat body and hemocytes, respectively. Furthermore, we integrated data from our previous study and generated a sequence database to support future gene annotation and proteomic analysis in M. sexta. In summary, we have successfully established a combined approach for gene discovery and expression profiling in organisms lacking known genome sequences.

Listeria monocytogenes is a virulent food-borne pathogen most often associated with the consumption of "ready-to-eat" foods. The organism is a common contaminant of food processing plants where it may persist for extended periods of time. A commonly used approach for the control of Listeria monocytogenes in the processing environment is the application of biocides such as quaternary ammonium compounds. In this study, the transcriptomic response of a persistent strain of L. monocytogenes (strain 6179) on exposure to a sub-lethal concentration of the quaternary ammonium compound benzethonium chloride (BZT) was assessed. Using RNA-Seq, gene expression levels were quantified by sequencing the transcriptome of L. monocytogenes 6179 in the presence (4 ppm) and absence of BZT, and mapping each data set to the sequenced genome of strain 6179. Hundreds of differentially expressed genes were identified, and subsequent analysis suggested that many biological processes such as peptidoglycan biosynthesis, bacterial chemotaxis and motility, and carbohydrate uptake, were involved in the response of L. monocytogenes to the presence of BZT. The information generated in this study further contributes to our understanding of the response of bacteria to environmental stress. In addition, this study demonstrates the importance of using the bacterium's own genome as a reference when analysing RNA-Seq data. (2014) Transcriptome analysis of Listeria monocytogenes exposed to biocide stress reveals a multi-system response involving cell wall synthesis, sugar uptake, and motility. Front. Microbiol. 5:68.

Full-length cDNA encoding two leptin sequences (tLepA and tLepB) and one leptin receptor sequence (tLepR) were identified in tilapia (Oreochromis niloticus). The full-length cDNA of tLepR was 3423 bp, encoding a protein of 1140 amino acid (aa) which contained all functionally important domains conserved among vertebrate leptin receptors. The cDNAs of tLepA and tLepB were 486 bp and 459 bp in length, encoding proteins of 161 aa and 152 aa, respectively. Modeling the three-dimensional structures of tLepA and tLepB predicted strong conservation of tertiary structure with that of human leptin, comprised of four helixes. Using synteny, the tLeps were found near common genes, such as IMPDH1 and LLRC4. The cDNA for tLepA and tLepB was cloned and synthetic cDNA optimized for expression in Escherichia coli was prepared according to the cloned sequence. The tLepA- and tLepB-expressing plasmids were transformed into E. coli and expressed as recombinant proteins upon induction with nalidixic acid, found almost entirely in insoluble inclusion bodies (IBs). The proteins were solubilized, refolded and purified to homogeneity by anion-exchange chromatography. In the case of tLepA, the fraction eluted contained a mixture of monomers and dimers. The purified tLepA and tLepB monomers and tLepA dimer showed a single band of 15 kDa on an SDS-polyacrylamide gel in the presence of reducing agent, whereas the tLepA dimer showed one band of 30 kDa in the absence of reducing agent, indicating its formation by S-S bonds. The three tLeps were biologically active in promoting proliferation of BAF/3 cells stably transfected with the long form of human leptin receptor (hLepR), but their activity was four orders of magnitude lower than that of mammalian leptin.
Furthermore, the three tLeps were biologically active in promoting STAT-LUC activation in COS7 cells transfected with the identified tLepR but not in cells transfected with hLepR. tLepA was more active than tLepB. Low or no activity likely resulted from low identity (9-22%) to mammalian leptins. In an in vivo experiment in which tilapia were fed ad libitum or fasted, there was no significant difference in the expressions of tLepA, tLepB or tLepR in the brain between the two groups examined both by real-time PCR and RNA next generation sequencing. In conclusion, in the present report we show novel, previously unknown sequences of tilapia leptin receptor and two leptins and prepare two biologically active recombinant leptin proteins.

RNA-Seq has become a widely used method to study transcriptomes, and it is now possible to perform RNA-Seq on almost any sample. Nevertheless, samples obtained from small cell populations are particularly challenging, as biases associated with low amounts of input RNA can have strong and detrimental effects on downstream analyses. Here we compare different methods to normalize RNA-Seq data obtained from minimal input material. Using RNA from isolated medaka pituitary cells, we have amplified material from six samples before sequencing. Both synthetic and real data are used to evaluate different normalization methods to obtain a robust and reliable pipeline for analysis of RNA-Seq data from samples with very limited input material. The analysis outlined here shows that quantile normalization outperforms other more commonly used normalization procedures when using amplified RNA as input and will benefit researchers employing low amounts of RNA in similar experiments.
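
Quantile normalization, the procedure the abstract above reports as performing best on amplified RNA, can be sketched in a few lines of Python. This is a minimal illustration of the general technique, not the authors' pipeline: the function name `quantile_normalize`, the list-of-lists input and the toy values are assumptions, and ties are broken by position rather than averaged, a simplification that dedicated packages often handle differently.

```python
def quantile_normalize(samples):
    """Quantile-normalize expression values across samples.

    samples: list of equal-length lists, one list of gene-level values per
    sample (hypothetical input layout). Returns new lists in which every
    sample shares the same distribution of values.
    Simplification: tied values are ranked by position, not averaged.
    """
    n_genes = len(samples[0])
    if any(len(s) != n_genes for s in samples):
        raise ValueError("all samples must cover the same genes")

    # Mean of the values found at each rank across all samples
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(s[rank] for s in sorted_samples) / len(samples)
        for rank in range(n_genes)
    ]

    normalized = []
    for s in samples:
        # Gene indices ordered from lowest to highest value in this sample
        order = sorted(range(n_genes), key=lambda i: s[i])
        out = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            out[gene_index] = rank_means[rank]
        normalized.append(out)
    return normalized


# Toy example: two samples, three genes
result = quantile_normalize([[5, 2, 3], [4, 1, 6]])
```

After normalization each sample contains the same set of values (the per-rank means), while the within-sample ordering of genes is preserved.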

The advent of Next Generation Sequencing (NGS) technologies has opened new possibilities for researchers. However, the more biology becomes a data-intensive field, the more biologists have to learn how to process and analyze NGS data with complex computational tools. Even with the availability of common pipeline specifications, it is often a time-consuming and cumbersome task for a bench scientist to install and configure the pipeline tools. We believe that a unified, desktop and biologist-friendly front end to NGS data analysis tools will substantially improve productivity in this field. Here we present NGS pipelines "Variant Calling with SAMtools", "Tuxedo Pipeline for RNA-seq Data Analysis" and "Cistrome Pipeline for ChIP-seq Data Analysis" integrated into the Unipro UGENE desktop toolkit. We describe the available UGENE infrastructure that helps researchers run these pipelines on different datasets, store and investigate the results and re-run the pipelines with the same parameters. These pipeline tools are included in the UGENE NGS package. Individual blocks of these pipelines are also available for expert users to create their own advanced workflows.

The technological advances of RNA-seq and de novo transcriptome assembly have enabled genome annotation and transcriptome profiling in highly heterozygous species such as grapevine (Vitis vinifera L.). This work is an attempt to utilize a de novo-assembled transcriptome of the V. vinifera cultivar 'Riesling' to improve annotation of the grapevine reference genome sequence. Here we show that the transcriptome assembly of a single V. vinifera cultivar is insufficient for a complete genome annotation of the grapevine reference genome constructed from V. vinifera PN40024. Further, we provide evidence that the gene models we identified cannot be completely anchored to the previously published V. vinifera PN40024 gene models. In addition to these findings, we present a computational pipeline for the de novo identification of lncRNAs. Our results demonstrate that, in grapevine, lncRNAs are significantly different from protein coding transcripts in such metrics as length, GC-content...

Intrahepatic cholangiocarcinoma (ICC) is a highly aggressive tumor of the bile duct, and a significant public health problem in East Asia, where it is associated with infection by the parasite Opisthorchis viverrini. ICC is often detected at an advanced stage and with a poor prognosis, making a biomarker for early detection a priority.

Chromatin immunoprecipitation and DNase I hypersensitivity assays with high-throughput sequencing have greatly accelerated the understanding of transcriptional and epigenetic regulation, although data reuse for the community of experimental biologists has been challenging. We created a data portal, CistromeFinder, that can help query, evaluate and visualize publicly available chromatin immunoprecipitation and DNase I hypersensitivity assays with high-throughput sequencing data in human and mouse. The database currently contains 6378 samples over 4391 datasets, 313 factors and 102 cell lines or cell populations. Each dataset has gone through a consistent analysis and quality control pipeline; therefore, users can evaluate the overall quality of each dataset before examining binding sites near their genes of interest. CistromeFinder is integrated with the UCSC genome browser for visualization, Primer3Plus for ChIP-qPCR primer design and CistromeMap for submitting newly available datasets. It also allows users to leave comments to facilitate data evaluation and update.

Plants display sophisticated mechanisms to tolerate challenging environmental conditions and need to manage their ontogenesis in parallel. Here, we set out to generate an RNA-Seq time series dataset throughout grapevine (Vitis vinifera) early bud development. The expression of the developmental regulator VviAP1 served as an indicator of the progression of development. We investigated the impact of changing temperatures on gene expression levels during the time series and detected a correlation between increased temperatures and a high expression level of genes encoding heat-shock proteins. The dataset also allowed the exemplary investigation of expression patterns of genes from three transcription factor (TF) gene families, namely MADS-box, WRKY, and R2R3-MYB genes. Inspection of the expression profiles from all three TF gene families indicated that a switch in the developmental program takes place in July which coincides with increased expression of the bud dormancy marker gene Vvi...

Using our previously established xmrk transgenic zebrafish, hepatocellular carcinoma (HCC) was generated by induced expression of xmrk, which encodes a hyperactive epidermal growth factor receptor (EGFR) homolog, and regressed by suppression of xmrk expression. To investigate molecular changes in liver tumour progression and regression, RNA-Seq was performed for induced HCC and early and late stages of liver tissues during tumour regression. We found that Xmrk-induced zebrafish HCC shared strong molecular characteristics with a human HCC subtype (S2), which shows activated Myc signalling, upregulated phospho-S6 and epithelial cell adhesion molecule. In the HCC stage, there were enhanced proteasome, antigen processing and presentation, aminosugar metabolism, p53 and cell cycle pathways. During tumour regression, the transcriptomic profile showed a reversed trend of molecular changes compared with human HCC progression. Interestingly, distinct immune responses in tumour progression and regression were observed, including increased major histocompatibility complex class I (MHCI) at the HCC stage, enriched immune cell trafficking signals and inflammation in early regression and enhanced MHCII in late regression. Both neutrophils and macrophages were enriched during tumour progression and regression; however, the distribution of neutrophils and macrophages in HCC was relatively uniform, whereas both types of immune cells were regionally clustered during tumour regression, especially with dominant blood vessel association of macrophages in late regression, suggesting differential functions of these immune cells in tumour progression and regression. As tumour regression in our model resembles the targeted inhibition of EGFR in cancer therapy, our observations may provide molecular insights into the targeted inhibition and highlight the importance of immune response in tumour regression.

KRAS mutations are highly prevalent in non-small cell lung cancer (NSCLC), and tumors harboring these mutations tend to be aggressive and resistant to chemotherapy. We used next-generation sequencing technology to identify pathways that are specifically altered in lung tumors harboring a KRAS mutation. Paired-end RNA-sequencing of 15 primary lung adenocarcinoma tumors (8 harboring mutant KRAS and 7 with wild-type KRAS) was performed. Sequences were mapped to the human genome, and genomic features, including differentially expressed genes, alternate splicing isoforms and single nucleotide variants (SNVs), were determined for tumors with and without KRAS mutation using a variety of computational methods. Network analysis was carried out on genes showing differential expression (374 genes), alternate splicing (259 genes), and SNV-related changes (65 genes) in NSCLC tumors harboring a KRAS mutation. Genes exhibiting two or more connections from the lung adenocarcinoma network were used to car...

The diversity of the installed sequencing and microarray equipment makes it increasingly difficult to compare and analyze the gene expression datasets obtained using the different methods. Many applications requiring high quality and low error rates cannot make use of available data using traditional analytical approaches. Recently, we proposed a new concept of signalome-wide analysis of functional changes in the intracellular pathways termed OncoFinder, a bioinformatic tool for quantitative estimation of the signaling pathway activation (SPA). We also developed methods to compare the gene expression data obtained using multiple platforms and to minimize the error rates by mapping the gene expression data onto the known and custom signaling pathways. This technique for the first time makes it possible to analyze the functional features of intracellular regulation on a mathematical basis. In this study we show that the OncoFinder method significantly reduces the errors introduced by transcriptome-wide experimental techniques. We compared the gene expression data for the same biological samples obtained by both the next generation sequencing (NGS) and microarray methods. For these different techniques we demonstrate that there is virtually no correlation between the gene expression values for all datasets analyzed (R² < 0.1). In contrast, when the OncoFinder algorithm is applied to the data we observed clear-cut correlations between the NGS and microarray gene expression datasets. The SPA profiles obtained using NGS and microarray techniques were almost identical for the same biological samples, allowing for platform-agnostic analytical applications. We conclude that this feature of OncoFinder makes it possible to characterize the functional states of the transcriptomes and interactomes more accurately than before, which makes OncoFinder a method of choice for many applications including genetics, physiology, biomedicine, and molecular diagnostics.
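The cross-platform comparison summarized by R² above can be illustrated with a short sketch. It computes the squared Pearson correlation between per-gene values measured on two platforms. The abstract does not say which correlation measure or preprocessing the authors used, so Pearson on the values as given is an assumption, and the example numbers are invented purely for illustration.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Covariance and variances are left unnormalized; the 1/n factors cancel.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov * cov / (var_x * var_y)


# Hypothetical per-gene values for one sample on two platforms (not real data).
ngs_values = [1.0, 2.0, 3.0, 4.0, 5.0]
array_values = [2.1, 3.9, 6.2, 8.0, 9.8]
print(round(r_squared(ngs_values, array_values), 3))
```

A value close to 1 indicates that the two platforms rank and scale genes similarly, while a value below 0.1, as reported in the abstract for raw expression values, indicates almost no linear relationship.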

Molecular mechanisms guiding naïve T helper cell differentiation into functionally specified effector cells are intensively studied. The rapidly growing knowledge is mainly achieved by using mouse cells or disease models. Comparatively exiguous data is gathered from human primary cells although they provide the "ultimate model" for immunology in man, have been exploited in many original studies paving the way for the field, and can be analyzed more easily than ever with the help of modern technology and methods. As usage of mouse models is unavoidable in translational research, parallel human and mouse studies should be performed to assure the relevancy of the hypotheses created during basic research. In this review, we give an overview of the status of the studies conducted with human primary cells aiming at elucidating the mechanisms instructing the priming of T helper cell subtypes. Special emphasis is given to the recent high-throughput studies. In addition, by comparing the human and mouse studies we intend to point out the regulatory mechanisms and questions which are lacking examination with human primary cells.

Technical advances such as the development of molecular cloning, Sanger sequencing, PCR and oligonucleotide microarrays are key to our current capacity to sequence, annotate and study complete organismal genomes. Recent years have seen the development of a variety of so-called 'next-generation' sequencing platforms, with several others anticipated to become available shortly. The previously unimaginable scale and economy of these methods, coupled with their enthusiastic uptake by the scientific community and the potential for further improvements in accuracy and read length, suggest that these technologies are destined to make a huge and ongoing impact upon genomic and post-genomic biology. However, like the analysis of microarray data and the assembly and annotation of complete genome sequences from conventional sequencing data, the management and analysis of next-generation sequencing data requires (and indeed has already driven) the development of informatics tools able to assemble, map, and interpret huge quantities of relatively or extremely short nucleotide sequence data. Here we provide a broad overview of bioinformatics approaches that have been introduced for several genomics and functional genomics applications of next-generation sequencing.

We examine the contribution of next generation sequencing (NGS) to our understanding of the interaction between the fungal pathogen Zymoseptoria tritici and its wheat host. Recent interspecific whole genome comparisons between Z. tritici and its close relatives provide evidence that Z. tritici has undergone strong adaptive evolution, which is attributed to specialization by Z. tritici on wheat. We also assess the contribution of recent RNA sequencing datasets toward identifying pathogen genes and mechanisms critical for disease. While these studies have yet to report a major effector gene, they illustrate that assembling reads to the reference genome is a robust method to identify fungal transcripts from in planta infections. They also highlight the strong influence that the wheat cultivar has on effector gene expression. Lastly, we suggest future directions for NGS-guided approaches to address largely unanswered questions related to cultivar and lifecycle dependent gene expression and propose that future experiments with Z. tritici be conducted on a single wheat cultivar to enable comparisons across experiments.

Please cite this article in press as: Morton ML, et al. Identification of mRNAs and lincRNAs associated with lung cancer progression using next-generation RNA sequencing from laser micro-dissected archival FFPE tissue specimens. Lung Cancer (2014), http://dx.
Keywords: Adenocarcinoma in situ of the lung; Invasive lung adenocarcinoma; Formalin-fixed paraffin embedded (FFPE); lincRNA; Gene expression; Laser capture microdissection (LCM)
Objectives: Adenocarcinoma in situ (AIS) is an intermediate step in the progression of normal lung tissue to invasive adenocarcinoma. However, molecular mechanisms underlying this progression remain to be fully elucidated due to challenges in obtaining fresh clinical samples for downstream analyses. Formalin fixation and paraffin embedding (FFPE) is a tissue preservation system widely used for long-term storage. Until recently, challenges in working with FFPE precluded using new RNA sequencing technologies (RNAseq), which would help clarify key pathways in cancer progression. Also, isolation techniques including laser-capture micro-dissection provide the ability to select histopathologically distinct tissues, allowing researchers to study transcriptional variations between tightly juxtaposed cell and tissue types. Materials and methods: Utilizing these technologies and new alignment tools we examined differential expression of long intergenic non-coding RNAs (lincRNAs) and mRNAs across normal, AIS and invasive adenocarcinoma samples from six patients to identify possible markers of lung cancer progression. Results: RNA extracted and sequenced from these 18 samples generated an average of 198 million reads per sample. After alignment and filtering, uniquely aligned reads represented an average 35% of the total reads. We detected differential expression of a number of lincRNAs and mRNAs when comparing normal to AIS, or AIS to invasive adenocarcinoma. Of these, 5 lincRNAs and 31 mRNAs were consistently up- or down-regulated from normal to AIS and more so to invasive carcinoma. We validated the up-regulation of two mRNAs and one lincRNA by RT-qPCR as proof of principle. Conclusion: Our findings indicate a potential role of not only mRNAs, but also lincRNAs in the progression to invasive adenocarcinoma. We anticipate that these findings will lay the groundwork for future experimental studies of candidate RNAs from FFPE to identify their functional roles in lung cancer.

Ingenuity pathway analysis (IPA) in The Cancer Genome Atlas (TCGA) union of RNA-Seq and Agilent data. Top canonical pathways detected by IPA using the union of differentially expressed genes in Agilent and RNA-Seq expression platforms in TCGA. Four columns correspond to Ingenuity canonical pathway names, −log(p value), percentage of genes detected in this pathway, and molecules in this pathway. (XLS 15 kb)

Summary: With the development of transcriptomic technologies, we are able to quantify precise changes in gene expression profiles from astronauts and other organisms exposed to spaceflight. Members of NASA GeneLab and GeneLab-associated analysis working groups (AWGs) have developed a consensus pipeline for analyzing short-read RNA-sequencing data from spaceflight-associated experiments. The pipeline includes quality control, read trimming, mapping, and gene quantification steps, culminating in the detection of differentially expressed genes. This data analysis pipeline and the results of its execution using data submitted to GeneLab are now all publicly available through the GeneLab database. We present here the full details and rationale for the construction of this pipeline in order to promote transparency, reproducibility and reusability of pipeline data, to provide a template for data processing of future spaceflight-relevant datasets, and to encourage cross-analysis of data from ...

Maternal mRNA transcripts deposited in growing oocytes regulate early development and are under intensive investigation as determinants of egg quality. The research has evolved from single gene studies to microarray and now RNA-Seq analyses in which mRNA expression by virtually every gene can be assessed and related to gamete quality. Such studies have mainly focused on genes changing two- to several-fold in expression between biological states, and have identified scores of candidate genes and a few gene networks whose functioning is related to successful development. However, ever-increasing yields of information from high throughput methods for detecting transcript abundance have far outpaced progress in methods for analyzing the massive quantities of gene expression data, and especially for meaningful relation of whole transcriptome profiles to gamete quality. We have developed a new approach to this problem employing artificial neural networks and supervised machine learning wi...

RNA-Seq has provided valuable insights into global gene expression in a wide variety of organisms. Using a modified RNA-Seq approach and Illumina's high-throughput sequencing technology, we globally identified 5′-ends of transcripts for the plant pathogen Pseudomonas syringae pv. tomato str. DC3000. A substantial fraction of 5′-ends obtained by this method were consistent with results obtained using global RNA-Seq and 5′RACE. As expected, many 5′-ends were positioned a short distance upstream of annotated genes. We also captured 5′-ends within intergenic regions, providing evidence for the expression of un-annotated genes and non-coding RNAs, and detected numerous examples of antisense transcription, suggesting additional levels of complexity in gene regulation in DC3000. Importantly, targeted searches for sequence patterns in the vicinity of 5′-ends revealed over 1200 putative promoters and other regulatory motifs, establishing a broad foundation for future investigations of regulation at the genomic and single gene levels.

DNA sequencing is a powerful approach for decoding a number of human diseases, including cancers. The advent of next-generation sequencing (NGS) technologies has reduced sequencing cost by orders of magnitude and significantly increased the throughput, making whole-genome sequencing a possible way of obtaining global genomic information about patients on whom clinical actions may be taken. However, the benefits offered by NGS technologies come with a number of challenges that must be adequately addressed before they can be transformed from research tools to routine clinical practices. This article provides an overview of four commonly used NGS technologies from Roche Applied Science/454 Life Sciences, Illumina, Life Technologies and Helicos Biosciences. The challenges in the analysis of NGS data and their potential applications in clinical diagnosis are also discussed.

Due to the advent of high-throughput DNA sequencing technology, the sequence of an entire plant genome of Arabidopsis became available [1]. Subsequently, microarray platforms were developed for transcriptome analysis of Arabidopsis [2, 3]. Also, the first transcriptome report by next generation sequencing (NGS) was released in 2007 [4]. The progression of technology utilization, from genomics to transcriptomics, has played a crucial role in understanding the complex regulation of the eukaryotic transcriptome landscape. For the model plants of dicots and monocots, genome statistics data suggests that 135 Mbp of Arabidopsis genome codes for about 33 602 genes and 385 Mbp of rice genome codes for about 55 986 genes. This number includes genes coding for proteins, pseudogenes, non-coding transcripts as well as transposable elements. In Arabidopsis, about 18% of the genes have annotated splice variants and in rice 66 338 transcripts are present for a total of 55 986 genes (TAIR10,

Advancing the production efficiency and profitability of aquaculture is dependent upon the ability to utilize a diverse array of genetic resources. The ultimate goals of aquaculture genomics, genetics and breeding research are to enhance aquaculture production efficiency, sustainability, product quality, and profitability in support of the commercial sector and for the benefit of consumers. In order to achieve these goals, it is important to understand the genomic structure and organization of aquaculture species, and their genomic and phenomic variations, as well as the genetic basis of traits and their interrelationships. In addition, it is also important to understand the mechanisms of regulation and evolutionary conservation at the levels of genome, transcriptome, proteome, epigenome, and systems biology. With genomic information and information between the genomes and phenomes, technologies for marker/causal mutation-assisted selection, genome selection, and genome editing can ...

The quantification of transcriptomic features is the basis of the analysis of RNA-seq data. We present an integrated alignment workflow and a simple counting-based approach to derive estimates for gene, exon and exon-exon junction expression. In contrast to previous counting-based approaches, EQP takes into account only reads whose alignment pattern agrees with the splicing pattern of the features of interest. This leads to improved gene expression estimates as well as to the generation of exon counts that allow disambiguating reads between overlapping exons. Unlike other methods that quantify skipped introns, EQP offers a novel way to compute junction counts based on the agreement of the read alignments with the exons on both sides of the junction, thus providing a uniformly derived set of counts. We evaluated the performance of EQP on both simulated and real Illumina RNAseq data and compared it with other quantification tools. Our results suggest that EQP provides superior gene expression estimates and we illustrate the advantages of EQP's exon and junction counts. The provision of uniformly derived high-quality counts makes EQP an ideal quantification tool for differential expression and differential splicing studies. EQP
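The idea of counting only reads whose alignment pattern agrees with the splicing pattern of a feature can be sketched as follows. This is not EQP's implementation, which the abstract does not describe in detail. It is a simplified illustration under stated assumptions: exons and aligned read blocks are half-open `(start, end)` intervals on one strand, a read is accepted only if every block lies inside an exon and every gap between blocks runs exactly from an exon end to the start of a later exon, and exon skipping is allowed. All function names and coordinates are hypothetical.

```python
def exon_index(block, exons):
    """Return the index of the exon that fully contains the block, or None."""
    start, end = block
    for i, (ex_start, ex_end) in enumerate(exons):
        if ex_start <= start and end <= ex_end:
            return i
    return None


def is_splice_compatible(blocks, exons):
    """Check that a read's aligned blocks agree with the exon structure."""
    indices = [exon_index(block, exons) for block in blocks]
    if any(i is None for i in indices):
        return False  # some block overlaps intronic or intergenic sequence
    for k in range(len(blocks) - 1):
        left, right = indices[k], indices[k + 1]
        if right <= left:
            return False  # a gap must jump forward to a later exon
        # The gap has to start at the exon's end and stop at an exon's start.
        if blocks[k][1] != exons[left][1] or blocks[k + 1][0] != exons[right][0]:
            return False
    return True


def count_compatible(reads, exons):
    """Count reads whose alignment agrees with the exon structure."""
    return sum(1 for blocks in reads if is_splice_compatible(blocks, exons))


# Hypothetical three-exon gene and five aligned reads.
gene_exons = [(100, 200), (300, 400), (500, 600)]
example_reads = [
    [(150, 200), (300, 350)],  # spliced at annotated boundaries
    [(150, 180)],              # unspliced, inside one exon
    [(150, 220)],              # runs into the intron
    [(150, 190), (300, 350)],  # gap does not start at the exon end
    [(150, 200), (500, 550)],  # skips the middle exon
]
print(count_compatible(example_reads, gene_exons))
```

A plain overlap-based counter would accept all five reads; the compatibility check rejects the two whose alignments contradict the annotated exon boundaries.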

Background: Human malaria is transmitted by mosquitoes of the genus Anopheles. Transmission is a complex phenomenon involving biological and environmental factors of humans, parasites and mosquitoes. Among more than 500 anopheline species, only a few species from different branches of the mosquito evolutionary tree transmit malaria, suggesting that their vectorial capacity has evolved independently. Anopheles albimanus (subgenus Nyssorhynchus) is an important malaria vector in the Americas. The divergence time between Anopheles gambiae, the main malaria vector in Africa, and the Neotropical vectors has been estimated to be 100 My. To better understand the biological basis of malaria transmission and to develop novel and effective means of vector control, there is a need to explore the mosquito biology beyond the An. gambiae complex. Results: We sequenced the transcriptome of the An. albimanus adult female. By combining Sanger, 454 and Illumina sequences from cDNA libraries derived from the midgut, cuticular fat body, dorsal vessel, salivary gland and whole body, we generated a single, high-quality assembly containing 16,669 transcripts, 92% of which mapped to the An. darlingi genome and covered 90% of the core eukaryotic genome. Bidirectional comparisons between the An. gambiae, An. darlingi and An. albimanus predicted proteomes allowed the identification of 3,772 putative orthologs. More than half of the transcripts had a match to proteins in other insect vectors and had an InterPro annotation. We identified several protein families that may be relevant to the study of Plasmodium-mosquito interaction. An open source transcript annotation browser called GDAV (Genome-Delinked Annotation Viewer) was developed to facilitate public access to the data generated by this and future transcriptome projects. Conclusions: We have explored the adult female transcriptome of one important New World malaria vector, An. albimanus. We identified protein-coding transcripts involved in biological processes that may be relevant to the Plasmodium lifecycle and can serve as the starting point for searching targets for novel control strategies. Our data increase the available genomic information regarding An. albimanus several hundred-fold, and will facilitate molecular research in medical entomology, evolutionary biology, genomics and proteomics of anopheline mosquito vectors.

In females, estrogens have two main modes of action relating to gonadotropin secretion: positive feedback and negative feedback. Estrogen positive and negative feedback are controlled by different regions of the hypothalamus: the preoptic area/anterior portion (mainly the anteroventral periventricular nucleus, AVPV) of the hypothalamus is associated with estrogen positive feedback while the mediobasal hypothalamus (mainly the arcuate nucleus of the hypothalamus, ARH), is associated with estrogen negative feedback. In this study, we examined the temporal pattern of gene transcription in these two regions following estrogen treatment. Adult, ovariectomized, Long Evans rats received doses of estradiol benzoate (EB) or oil every 4 days for 3 cycles. On the last EB priming cycle, hypothalamic tissues were dissected into the AVPV+ and ARH+ at 0 hrs (baseline/oil control), 6 hrs, or 24 hrs after EB treatment. RNA was extracted and sequenced using bulk RNA sequencing. Differential gene anal...

Alternative σ factors are important transcriptional regulators in bacteria. While σ(B) has been shown to control a large regulon and play important roles in stress response and virulence in the pathogen Listeria monocytogenes, the function of σ(H) has not yet been well defined in Listeria, even though σ(H) controls a large regulon in the closely related non-pathogenic Bacillus subtilis. Using RNA-seq characterization of an L. monocytogenes strain with deletions of all 4 genes encoding alternative σ factors (ΔBCHL), which was further modified to overexpress sigH (ΔBCHL::Prha-sigH), we identified 6 transcription units (TUs) that are transcribed from σ(H)-dependent promoters. Five of these TUs had not been previously identified. Identification of these promoters was facilitated by use of a bioinformatics approach that compared normalized RNA-seq coverage (NRC) between ΔBCHL::Prha-sigH and a ΔBCHL control, using sliding windows of 51 nt along the whole genome rather than comparing ...
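The sliding-window comparison of normalized coverage mentioned above can be sketched in a few lines. The abstract does not give the normalization or the decision rule the authors used, so everything beyond the 51-nt window width is an assumption: coverage is scaled to a common total, averaged over each window, and compared as a ratio with a pseudocount to avoid division by zero. Function names and the toy coverage vectors are hypothetical.

```python
def normalize(coverage, scale=1_000_000):
    """Scale per-base coverage so that it sums to `scale` (assumed scheme)."""
    total = sum(coverage)
    if total == 0:
        raise ValueError("coverage track is empty")
    return [c * scale / total for c in coverage]


def window_means(values, window=51):
    """Mean of every full window of the given width, stepping by 1 nt."""
    if len(values) < window:
        return []
    running = sum(values[:window])
    means = [running / window]
    for i in range(window, len(values)):
        # Slide by one position: add the new base, drop the oldest one.
        running += values[i] - values[i - window]
        means.append(running / window)
    return means


def window_ratios(test_cov, control_cov, window=51, pseudocount=1.0):
    """Per-window ratio of normalized coverage, test strain over control."""
    test = window_means(normalize(test_cov), window)
    control = window_means(normalize(control_cov), window)
    return [(t + pseudocount) / (c + pseudocount) for t, c in zip(test, control)]


# Toy 100-nt tracks: flat control, test strain with a local coverage peak.
control_track = [1] * 100
test_track = [1] * 40 + [11] * 20 + [1] * 40
ratios = window_ratios(test_track, control_track)
best_start = max(range(len(ratios)), key=ratios.__getitem__)
print(best_start, round(ratios[best_start], 2))
```

Windows with a ratio well above 1 mark regions that are more highly covered in the test strain, which is the kind of signal one would inspect when looking for promoters that depend on the overexpressed factor.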

Rainbow trout, Oncorhynchus mykiss, is an important aquaculture species worldwide and, in addition to being of commercial interest, it is also a research model organism of considerable scientific importance. Because of the lack of a whole genome sequence in that species, transcriptomic analyses of this species have often been hindered. Using next-generation sequencing (NGS) technologies, we sought to fill these informational gaps. Here, using Roche 454-Titanium technology, we provide new tissue-specific cDNA repertoires from several rainbow trout tissues. Non-normalized cDNA libraries were constructed from testis, ovary, brain and gill rainbow trout tissue samples, and these different libraries were sequenced in 10 separate half-runs of 454-Titanium. Overall, we produced a total of 3 million quality sequences with an average size of 328 bp, representing more than 1 Gb of expressed sequence information. These sequences have been combined with all publicly available rainbow trout sequences, resulting in a total of 242,187 clusters of putative transcript groups and 22,373 singletons. To identify the predominantly expressed genes in different tissues of interest, we developed a Digital Differential Display (DDD) approach. This approach allowed us to characterize the genes that are predominantly expressed within each tissue of interest. Of these genes, some were already known to be tissue-specific, thereby validating our approach. Many others, however, were novel candidates, demonstrating the usefulness of our strategy and of such tissue-specific resources. This new sequence information, acquired using NGS 454-Titanium technology, deeply enriched our current knowledge of the expressed genes in rainbow trout through the identification of an increased number of tissue-specific sequences. This identification allowed a precise cDNA tissue repertoire to be characterized in several important rainbow trout tissues. 
The rainbow trout contig browser can be accessed at the following publicly available web site (http://www.sigenae.org/).