RNAseq analysis in R
General Information
In this workshop, you will learn how to analyse RNA-seq count data using R. This will include reading the data into R, performing quality control, and carrying out differential expression analysis and gene set testing, with a focus on the limma-voom analysis workflow. You will learn how to generate common plots for the analysis and visualisation of gene expression data, such as boxplots and heatmaps. You will also learn how alignment and counting of raw RNA-seq data can be performed in R. In addition, you will be introduced to Degust, a user-friendly interactive tool for RNA-seq analysis. This workshop is aimed at biologists interested in learning how to perform differential expression analysis of RNA-seq data when reference genomes are available.
With special guest speakers **Alicia Oshlack** and **Wei Shi**.
Who: The course is aimed at graduate students and other researchers. Some basic R knowledge is assumed; this is not an introduction to R course. If you are not familiar with the R statistical programming language, we strongly encourage you to work through an introductory R course before you attend this workshop. We recommend the Software Carpentry R for Reproducible Scientific Analysis lessons up to and including vectorisation (topic 9).
Where: LAB-14, 700 Swanston Street, Carlton VICTORIA 3053. Get directions with [OpenStreetMap](//www.openstreetmap.org/?mlat=-37.800229&mlon=144.964256&zoom=16) or [Google Maps](//maps.google.com/maps?q=-37.800229,144.964256).
Thank you to VLSCI for hosting this workshop.
Requirements: Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) on which they have administrative privileges. They should have a few specific software packages installed (listed below). If you choose to bring your own RNAseq data, it must be count data (because we don't have the time or computational resources to map large data sets during the workshop). If you're not sure, here is an example of count data format.
Accessibility: We are committed to making this workshop accessible to everybody. The workshop organisers have checked that:
- The room is wheelchair / scooter accessible.
- Accessible restrooms are available.
Contact: Please email combine@combine.org.au for more information.
Schedule
Note: this is a preliminary schedule. There may be changes to the timing and content.
If you have any trouble installing the software or packages, please arrive at 8:45am on the first day so we can help before the workshop starts.
In the first session of Day 1 we will cover some R skills that are particularly useful for RNAseq analysis, for example data frame manipulation, factors and subsetting using logical statements. If you are a more advanced R user, you may not need to attend this session.
Day 1
Time | Session |
---|---|
09:00 | R for RNAseq |
11:00 | Morning tea |
11:30 | Introduction to RNA-seq theory |
12:00 | Differential expression |
13:00 | Lunch |
13:45 | Differential expression continued |
15:00 | Afternoon tea |
15:30 | Differential expression continued |
16:30 | Wrap-up |
Day 2
Time | Session |
---|---|
09:30 | Alignment and feature counting |
11:00 | Morning tea |
11:30 | Gene set testing |
13:00 | Lunch |
13:45 | Applying RNAseq (or BYO data) |
15:00 | Afternoon tea |
15:30 | Applying RNAseq (or BYO data) continued |
16:30 | Wrap-up |
Etherpad: http://pad.software-carpentry.org/2016-05-11-RNAseq.
We will use this Etherpad for chatting, taking notes, and sharing URLs and bits of code.
Syllabus
You can find all of the lesson notes for this workshop here.
Setup
To participate in this workshop, you will need access to the software described below. In addition, you will need an up-to-date web browser.
Even if you already have R installed, it is important that you have the latest version, because some of the packages we will be using will not work with earlier versions of R. The latest version (available through the links below) is 3.2.5 for Windows and 3.2.4 for Mac 10.9 and above. For Mac 10.6-10.8 the latest version is 3.2.1 (available from here), and for that version you will also need to download this gfortran package.
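If you are unsure which version of R you have, you can check from the RStudio console. The sketch below is only an illustration: `r_is_recent_enough()` is a hypothetical helper written for this page, and `required_version` should be set by hand to the version listed above for your platform.

```r
# Print the version of R that RStudio is currently using
print(R.version.string)

# Hypothetical helper: TRUE if the running R is at least version `required`
r_is_recent_enough <- function(required) {
  getRversion() >= required
}

# Set this to the version listed above for your platform:
# "3.2.5" on Windows, "3.2.4" on Mac 10.9+, "3.2.1" on Mac 10.6-10.8
required_version <- "3.2.4"

if (!r_is_recent_enough(required_version)) {
  message("Please update R to at least ", required_version,
          " before the workshop.")
}
```

`getRversion()` returns a version object, so comparing it with a string such as `"3.2.4"` compares version components rather than text.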
Software Carpentry maintains a list of common installation issues on the Configuration Problems and Solutions wiki page. It was written as a reference for instructors, but you may also find it useful.
R
R is a programming language that is especially powerful for data exploration, visualization, and statistical analysis. To interact with R, we use RStudio.
Linux
You can download the binary files for your distribution from CRAN, or you can use your package manager (e.g. for Debian/Ubuntu run `sudo apt-get install r-base` and for Fedora run `sudo yum install R`). Also, please install the RStudio IDE.
R packages
You will also need to install the following R packages: limma, edgeR, gplots, org.Mm.eg.db, EDASeq, RColorBrewer, GO.db, BiasedUrn, DESeq2, Glimma, Rsubread.
These can all be installed with Bioconductor's installer, biocLite (which also installs CRAN packages such as gplots, RColorBrewer and BiasedUrn), except for Glimma (see below).
Open RStudio and run the following commands to install packages from Bioconductor:
> source("http://bioconductor.org/biocLite.R")
> biocLite("limma")
Repeat this for each package.
> biocLite("edgeR")
…
> biocLite("RColorBrewer")
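Instead of repeating the command for each package, `biocLite()` also accepts a character vector of package names. The sketch below assumes the https address of `biocLite.R` (see Common problems) and skips packages that are already installed. It leaves out Glimma, which is installed from a package file, and leaves out Rsubread on Windows, where a GVL instance is used instead.

```r
# Packages for the workshop (Glimma is installed separately from a package file)
workshop_pkgs <- c("limma", "edgeR", "gplots", "org.Mm.eg.db", "EDASeq",
                   "RColorBrewer", "GO.db", "BiasedUrn", "DESeq2", "Rsubread")

# Rsubread is not installed this way on Windows
if (.Platform$OS.type == "windows") {
  workshop_pkgs <- setdiff(workshop_pkgs, "Rsubread")
}

# Skip packages that are already installed
to_install <- setdiff(workshop_pkgs, rownames(installed.packages()))

# Only download and install in an interactive session (e.g. the RStudio console)
if (interactive() && length(to_install) > 0) {
  source("https://bioconductor.org/biocLite.R")
  biocLite(to_install)
}
```

The `interactive()` guard is a design choice for this sketch: the installation step runs when you paste the code into the RStudio console, but not when the file is run as a batch script.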
We will install Glimma from a downloaded package file (Mac users: Glimma_0.99.5.tgz, Windows: Glimma_0.99.5.zip). We can help with this during the workshop.
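A package file can be installed with base R's `install.packages()` by passing the file path and `repos = NULL`. The following is a minimal sketch. It assumes the file has already been downloaded into your current working directory (check with `getwd()`). No Linux file is listed above, so the sketch only reports that case.

```r
# Pick the Glimma file named above for this operating system
sysname <- Sys.info()[["sysname"]]
glimma_file <- switch(sysname,
  Darwin  = "Glimma_0.99.5.tgz",  # Mac
  Windows = "Glimma_0.99.5.zip",  # Windows
  NA_character_)                  # no file listed for other systems

if (is.na(glimma_file)) {
  message("No Glimma package file is listed for ", sysname,
          "; please ask for help at the workshop.")
} else if (!file.exists(glimma_file)) {
  message("Could not find ", glimma_file, " in ", getwd())
} else {
  # repos = NULL installs from the local file; "binary" means the
  # binary package type for the current platform
  install.packages(glimma_file, repos = NULL, type = "binary")
}
```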
Rsubread
Mac and Linux users can install Rsubread using
> biocLite("Rsubread")
Windows users will need to set up a GVL instance. Instructions to follow.
Common problems:
- If the source command doesn't work, try https instead of http.
- Make sure you are directly connected to the internet and not using a proxy.
- Make sure the package name is in quotes.
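After working through the problems above, you can check which packages are still missing. This sketch uses base R's `requireNamespace()`, which returns `FALSE` rather than raising an error when a package cannot be loaded. The package list is copied from the list above, and Rsubread is skipped on Windows, where a GVL instance is used instead.

```r
# All packages listed above, including Glimma
workshop_pkgs <- c("limma", "edgeR", "gplots", "org.Mm.eg.db", "EDASeq",
                   "RColorBrewer", "GO.db", "BiasedUrn", "DESeq2", "Glimma",
                   "Rsubread")

# Rsubread is not installed locally on Windows
if (.Platform$OS.type == "windows") {
  workshop_pkgs <- setdiff(workshop_pkgs, "Rsubread")
}

# TRUE if the package can be loaded, FALSE otherwise (named by package)
can_load <- vapply(workshop_pkgs, requireNamespace, logical(1), quietly = TRUE)
print(can_load)

if (all(can_load)) {
  message("All workshop packages are installed.")
} else {
  message("Not yet installed: ",
          paste(names(can_load)[!can_load], collapse = ", "))
}
```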
If you have any trouble, please see the Instructions for installing Bioconductor packages. You can also arrive early on the first day of the workshop for help with setup.