GitHub - LieberInstitute/qsva_brain: qSVA adapted for 2 brain regions as part of the BrainSeq Phase II project (original) (raw)
qsva_brain
This repository contains the code for the SCZD case vs control differential expression analysis for the BrainSeq Phase II project (see the brainseq_phase2 repo). It was carried out by Amy Peterson as part of her JHSPH MPH project.
License
Attribution-NonCommercial: CC BY-NC
This license lets others remix, tweak, and build upon our work non-commercially as long as they acknowledge our work.
View License Deed | View Legal Code
Citation
If you use anything in this repository please cite the BrainSeq Phase II project.
Scripts
The main script directories follow this order:
expr_data
: scripts for integrating the degradation datasets from multiple brain regions.means
: calculate the base-pair coverage mean files for the degradation data.ERs
: identify expressed regions usingderfinder
on the degradation data. Also identify which ERs are strongly associated with degradation time.brainseq_phase2_qsv
: compute the coverage matrix for the BrainSeq Phase 2 data, identify the quality surrogate variables (qSVs) and then perform the differential expression analysis between schizophrenia cases and non-psychiatric controls.
Files in this project
- README.md: this file
- qsva_brain.Rproj: RStudio file for organizing the project
expr_data
- merge_data.R: R script merging the different degradation datasets.
- run_merge_data.sh: bash shell script for running the R script at JHPCE.
- logs: directory with log files
- pdf: directory with image files
means
- qsva_bws.R: R script for computing the mean base-pair coverage and saving them as BigWig files. Uses the
recount.bwtool
R package. - qsva_bws.sh: bash shell script for running the previous R script at JHPCE.
ERs
- make_ERs_stranded.R: R script for identifying the expressed regions, then identifying which ERs are strongly associated with degradation signal and saving the results for later use.
- make_ERs_stranded.sh: bash shell script for running the previous R script at JHPCE.
brainseq_phase2_qsv
- quantify_top1000.R and quantify_top1000.sh: R and bash script for computing the coverage matrix.
- explore_replicates.R: R script for exploring the coverage matrix results.
- make_qSVs.R: identify the quality surrogate variables following different options. That is, with all 900 samples, with 712 after dropping potential confounding samples, and then for each brain region individually.
- explore_qsvs.R: perform PCA on the BrainSeq gene expression data and explore that data. Also explore the qSVs and their associations with measured covariates.
- casectrl_HIPPO.R and casectrl_DLPFC.R: perform the case-control differential expression analysis for each brain region at the gene expression level.
- casectrl_HIPPO_allFeatures.R and casectrl_DLPFC_allFeatures.R: similar to the above, but for exons, exon-exon junctions and transcript expression levels.
- casectrl_HIPPO_plots.R and casectrl_DLPFC_plots.R: R scripts for making the DEqual plots that assess the performance of the qSVA framework.
- explore_case_control.R: R script for exploring the case-control results, performing gene ontology enrichment analyses, comparing results against BrainSeq Phase I (polyA) and visualizing the top results.
- pdf: directory with image files
Length of scripts
Number of lines in each script.
## R scripts
$ for i in */*.R; do echo <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>i</mi><mo separator="true">;</mo><mi>w</mi><mi>c</mi><mo>−</mo><mi>l</mi></mrow><annotation encoding="application/x-tex">i; wc -l </annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.854em;vertical-align:-0.1944em;"></span><span class="mord mathnormal">i</span><span class="mpunct">;</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord mathnormal" style="margin-right:0.02691em;">w</span><span class="mord mathnormal">c</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:0.6944em;"></span><span class="mord mathnormal" style="margin-right:0.01968em;">l</span></span></span></span>i; done
ERs/make_ERs_stranded.R
258 ERs/make_ERs_stranded.R
brainseq_phase2_qsv/casectrl_DLPFC.R
263 brainseq_phase2_qsv/casectrl_DLPFC.R
brainseq_phase2_qsv/casectrl_DLPFC_allFeatures.R
164 brainseq_phase2_qsv/casectrl_DLPFC_allFeatures.R
brainseq_phase2_qsv/casectrl_DLPFC_plots.R
56 brainseq_phase2_qsv/casectrl_DLPFC_plots.R
brainseq_phase2_qsv/casectrl_HIPPO.R
261 brainseq_phase2_qsv/casectrl_HIPPO.R
brainseq_phase2_qsv/casectrl_HIPPO_allFeatures.R
165 brainseq_phase2_qsv/casectrl_HIPPO_allFeatures.R
brainseq_phase2_qsv/casectrl_HIPPO_plots.R
55 brainseq_phase2_qsv/casectrl_HIPPO_plots.R
brainseq_phase2_qsv/explore_case_control.R
690 brainseq_phase2_qsv/explore_case_control.R
brainseq_phase2_qsv/explore_qsvs.R
1510 brainseq_phase2_qsv/explore_qsvs.R
brainseq_phase2_qsv/explore_replicates.R
137 brainseq_phase2_qsv/explore_replicates.R
brainseq_phase2_qsv/make_qSVs.R
318 brainseq_phase2_qsv/make_qSVs.R
brainseq_phase2_qsv/quantify_top1000.R
97 brainseq_phase2_qsv/quantify_top1000.R
expr_data/merge_data.R
264 expr_data/merge_data.R
means/qsva_bws.R
29 means/qsva_bws.R
## bash scripts
$ for i in */*.sh; do echo <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>i</mi><mo separator="true">;</mo><mi>w</mi><mi>c</mi><mo>−</mo><mi>l</mi></mrow><annotation encoding="application/x-tex">i; wc -l </annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.854em;vertical-align:-0.1944em;"></span><span class="mord mathnormal">i</span><span class="mpunct">;</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord mathnormal" style="margin-right:0.02691em;">w</span><span class="mord mathnormal">c</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:0.6944em;"></span><span class="mord mathnormal" style="margin-right:0.01968em;">l</span></span></span></span>i; done
ERs/make_ERs_stranded.sh
21 ERs/make_ERs_stranded.sh
brainseq_phase2_qsv/quantify_top1000.sh
22 brainseq_phase2_qsv/quantify_top1000.sh
expr_data/run_merge_data.sh
23 expr_data/run_merge_data.sh
means/qsva_bws.sh
19 means/qsva_bws.sh
LIBD internal
JHPCE location: /dcs04/lieber/lcolladotor/qSVA_LIBD3080/qsva_brain