GitHub - marcottelab/MSblender: MSblender is a statistical tool for merging database search results from multiple database search engines for peptide identification based on a multivariate modeling approach. (original) (raw)
MSblender
MSblender: a probabilistic approach for integrating peptide identifications from multiple database search engines
See http://www.marcottelab.org/index.php/MSblender for more (somewhat outdated) information. Citation:
T. Kwon*, H. Choi*, C. Vogel, A.I. Nesvizhskii, and E.M. Marcotte, MSblender: a probabilistic approach for integrating peptide identifications from multiple database search engines. J. Proteome Research, 10(7): 2949–2958 (2011) Link
About
This version is modified from JRH MS1-Quant-Pipeline but minus any MS1 quantification
It has been further modified to make parameters more consistent across search algorithms. MS-GF+ options and PTMs are now defined in dedicated parameter files.
This repo contains:
- msblender MS2 analysis
- helper scripts
- accessory files and parameters used MS intepretation programs
Available search engines:
- X!Tandem
- Comet
- MS-GF+
All programs are external from SearchGUI except X!tandem
Quick start
set up directories (replace "proj" with your project name)
mkdir -p proj/{mzXML,db,working,output,logs}
symlink raw data to mzxml directory
ln -s /path/to/mzxmls/*mzXML proj/mzXML
make fasta database (replace "proteome" with your fasta name)
there is a contam.fasta here: example/fastas/contam.fasta
cat /path/to/proteome.fasta /path/to/contam.fasta > proj/db/proteome_contam.combined.fasta
template command
/path/to/runMSblender.sh /path/to/mzXML/file /path/to/database/file /path/to/working/dir/ /path/to/output/dir /path/to/logs/dir
for many mzXMLs (e.g., CFMS data); parallel: "-j2" = 2 commands at a time, "-j4" = 4 commands at a time, etc
for x in mzXML/*mzXML; do echo "/path/to/runMSblender.sh ${x} /path/to/db/proteome_contam.combined.fasta /path/to/working/dir/ /path/to/output/dir/ /path/to/logs/dir/"; done > proj.msblender.cmds cat proj.msblender.cmds | parallel -j4
combine .group file for each fraction into one tab-separated output
python /path/to/msblender-scripts/msblender2elution.py
--prot_count_files /path/to/output/dir/*.group
--output_filename proj_output_name.prot_count_mFDRpsm001.unique.elut
--fraction_name_from_filename
--parse_uniprot_id --remove_zero_unique
Running the example
This repo contains an "example" folder with the recommended directory structure provided above.
get repo
git clone https://github.com/marcottelab/MSblender.git
switch to the example directory
cd MSblender/example/
create directory skeleton (mzXML and db already exist)
mkdir {working,output,logs}
make the database
cat db/caeel.fasta db/contam.fasta > db/caeel.contam.fasta
generate commands
for x in mzXML/*mzXML; do echo "../runMSblender.sh ${x} db/caeel.contam.fasta working output logs"; done > example.msblender.cmds
run commands in parallel ("-j2" = 2 commands at a time, "-j4" = 4 commands at a time, etc)
cat example.msblender.cmds | parallel -j4
combine results into a table
python ../msblender-scripts/msblender2elution.py
--prot_count_files output/*.group
--output_filename example.prot_count_mFDRpsm001.unique.elut
--fraction_name_from_filename
--parse_uniprot_id --remove_zero_unique
Search parameter configuration
Search engine parameter docs: X!Tandem, Comet-2013020, and MS-GF+.
Search parameters can be modified as necessary, but try to keep parameters consistent across search algorithms.
The default were selected with our standard MS experiments in mind:
- high-res MS (10ppm precursor tolerance)
- low-res MS/MS (ion trap) *
- tryptic digestion, no non-enzymatic termini
- fixed cysteine carbamidomethylation (+57.021464, from iodoacetamide alkylation)
- optional methionine oxidation (+15.9949)
* X!Tandem purportedly ignores fragment mass tolerance settings when using k-scoring and/or no "spectrum conditioning". (And it recommends turning conditioning off when using k-score.)
Tips on changing search parameters
Comet parameter and MS-GF+ param and modification files are found in ./params
Comments within each should offer sufficient documentation.
X!Tandem parameters are found in ./search/tmpl/tandemK.high.xml
MSBlender Docker
A full docker image with MSblender installed is available here: https://hub.docker.com/r/kdrew/msblender
To run:
docker pull kdrew/msblender
docker run -v /test_data/:/data msblender /data/xl_animalcaps_SEC_Control_20a_20181121.mzXML /data/combined_contam_rev_file.fasta /data/working /data/output /searchgui
To do list
- Myrimatch currently not working b/c of library issue
- Separate Xtandem from this repo
- Add on MS1 quantification