PSMatch: Handling and Managing Peptide Spectrum Matches
The PSMatch package offers functionality to load, manage and analyse Peptide Spectrum Matches as generated in mass spectrometry-based proteomics. The four main objects and concepts proposed in this package are described below. They are aimed at proteomics practitioners who want to explore and better understand their identification data.
PSM objects
As mentioned in the [PSM()](PSM.html) manual page, the PSM class is a simple class to store and manipulate peptide-spectrum matches. The class encapsulates PSM data as a DataFrame (or more specifically a DFrame) with additional lightweight metadata annotation. PSM objects are typically created from XML-based mzID files or data.frames imported from spreadsheets. It is then possible to apply widely used filters (such as removal of decoy hits, PSMs of rank > 1, ...) as described in [filterPSMs()](filterPSMs.html).
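To make the idea of these filters concrete, here is a minimal base-R sketch on a toy table. It does not call the package itself: the data, and the column names `spectrumID`, `sequence`, `protein`, `rank` and `isDecoy`, are assumptions made for illustration only. On real data, [filterPSMs()](filterPSMs.html) is the function to use.

```r
# Toy PSM table; all values and column names are made up for illustration.
psms <- data.frame(
    spectrumID = c("s1", "s1", "s2", "s3", "s4"),
    sequence   = c("PEPTIDEK", "PEPTLDEK", "AAAGK", "LLNNR", "QQWWK"),
    protein    = c("ProtA", "ProtB", "ProtA", "DECOY_ProtC", "ProtB"),
    rank       = c(1L, 2L, 1L, 1L, 1L),
    isDecoy    = c(FALSE, FALSE, FALSE, TRUE, FALSE),
    stringsAsFactors = FALSE
)

# Keep target (non-decoy) hits that are also the best-ranked match
# for their spectrum.
keep <- !psms$isDecoy & psms$rank == 1L
filtered <- psms[keep, ]
filtered
```

The rank-2 match for spectrum `s1` and the decoy hit for `s3` are dropped, and the remaining rows are kept unchanged.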
Adjacency matrices
PSM data, as produced by all proteomics search engines, is exported as a table-like structure where PSMs are documented along the rows by variables such as identification scores, peptide sequences, modifications and the proteins from which the peptides originate. There is always a level of ambiguity in such data, as peptides can be mapped to multiple proteins; they are then called shared peptides, as opposed to unique peptides.
One convenient way to store the relation between peptides and proteins is as a peptide-by-protein adjacency matrix. Such matrices can be generated from PSM objects or vectors using the [makeAdjacencyMatrix()](adjacencyMatrix.html) function.
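As an illustration of what such a matrix contains, the following base-R sketch builds one by hand from a small named vector. It is not the package's implementation: the peptide and protein names, and the use of `;` to separate the proteins of a shared peptide, are assumptions made for this example.

```r
# Toy data: one entry per peptide, naming the protein(s) it maps to.
# Shared peptides list several proteins separated by ";".
prots <- c(pep1 = "ProtA", pep2 = "ProtB",
           pep3 = "ProtA;ProtB", pep4 = "ProtC")

# Split each entry into its individual proteins (names are preserved).
prot_list <- strsplit(prots, ";", fixed = TRUE)
all_prots <- sort(unique(unlist(prot_list)))

# Peptides along the rows, proteins along the columns; a 1 marks a match.
adj <- matrix(0L, nrow = length(prots), ncol = length(all_prots),
              dimnames = list(names(prots), all_prots))
for (pep in names(prot_list)) {
    adj[pep, prot_list[[pep]]] <- 1L
}
adj
```

Rows with a single 1 correspond to unique peptides, while the row for `pep3` has two non-zero entries and therefore represents a shared peptide.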
The [describePeptides()](describeProteins.html) and [describeProteins()](describeProteins.html) functions are also helpful to tally the number of unique and shared peptides and the number of proteins composed of unique or shared peptides, or a combination thereof.
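The kind of tally described here can be sketched in base R from a small adjacency matrix. This is only an illustration of the counting logic, not the package's code; the matrix, the helper `classify()` and the labels it returns are hypothetical.

```r
# Toy peptide-by-protein adjacency matrix (filled column by column).
adj <- matrix(c(1L, 0L, 1L, 0L,    # ProtA: pep1, pep3
                0L, 1L, 1L, 0L,    # ProtB: pep2, pep3
                0L, 0L, 0L, 1L),   # ProtC: pep4
              nrow = 4,
              dimnames = list(paste0("pep", 1:4),
                              c("ProtA", "ProtB", "ProtC")))

# A peptide is unique if it maps to exactly one protein, shared otherwise.
is_unique <- rowSums(adj) == 1
table(ifelse(is_unique, "unique", "shared"))

# Classify each protein by the kind of peptides that support it.
classify <- function(col) {
    peps <- col == 1
    has_unique <- any(peps & is_unique)
    has_shared <- any(peps & !is_unique)
    if (has_unique && has_shared) "both"
    else if (has_unique) "unique only"
    else "shared only"
}
prot_class <- apply(adj, 2, classify)
prot_class
```

In this toy example, `ProtA` and `ProtB` are each supported by one unique and one shared peptide, while `ProtC` is supported by a unique peptide only.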
Connected components
Once we model the peptide-to-protein relations explicitly using an adjacency matrix, it becomes possible to perform computations on the proteins that are grouped by the peptides they share. These groups are mathematically defined as connected components, which are implemented as [ConnectedComponents()](ConnectedComponents.html) objects.
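To show what a connected component means in this setting, the sketch below derives protein groups from a toy adjacency matrix using base R only. It is an assumption-laden illustration rather than the package's algorithm: the data and the helper `find_components()` are hypothetical, and a plain graph traversal stands in for whatever method the package uses internally.

```r
# Toy peptide-by-protein adjacency matrix (filled row by row).
adj <- matrix(c(1, 0, 0, 0,    # pep1: ProtA
                1, 1, 0, 0,    # pep2: ProtA and ProtB (shared)
                0, 1, 0, 0,    # pep3: ProtB
                0, 0, 1, 0,    # pep4: ProtC
                0, 0, 1, 1),   # pep5: ProtC and ProtD (shared)
              nrow = 5, byrow = TRUE,
              dimnames = list(paste0("pep", 1:5),
                              paste0("Prot", LETTERS[1:4])))

# Protein-by-protein matrix: entry [i, j] counts the peptides that
# proteins i and j have in common.
pp <- crossprod(adj)

# Label each protein with the index of its connected component by
# traversing the graph whose edges are the non-zero entries of 'm'.
find_components <- function(m) {
    comp <- rep(NA_integer_, ncol(m))
    names(comp) <- colnames(m)
    current <- 0L
    for (start in seq_len(ncol(m))) {
        if (!is.na(comp[start])) next
        current <- current + 1L
        queue <- start
        while (length(queue) > 0) {
            node <- queue[1]
            queue <- queue[-1]
            if (!is.na(comp[node])) next
            comp[node] <- current
            # Enqueue neighbours that have not been labelled yet.
            queue <- c(queue, which(m[node, ] > 0 & is.na(comp)))
        }
    }
    comp
}

cc <- find_components(pp)
split(names(cc), cc)
```

Here `ProtA` and `ProtB` end up in one group because they share `pep2`, and `ProtC` and `ProtD` in another because they share `pep5`; no peptide links the two groups.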
Vignettes
A couple of vignettes describe how to use several of these concepts through illustrative use cases. Use vignette(package = "PSMatch") to get a list and open them directly in R, or read them online on the package's webpage.