Protein Interactions And Network Analysis

Attention! PIANA is no longer supported. Check the BIANA project instead.

  1. What is PIANA?
  2. How does PIANA work?
  3. Main features
  4. Documentation
  5. PIANA databases and parsers
  6. Examples
  7. Download code and database (Jul 08 2007 - New! PIANA v1.4 is available!)
  8. References
  9. Authors and acknowledgements

PIANA (Protein Interactions And Network Analysis) is a software framework that facilitates working with protein interaction networks by 1) integrating data from multiple sources in a centralized database, 2) providing a library that handles all operations related to the network and 3) automating the analysis of protein-protein interaction networks.

PIANA can also be used as a stand-alone application to create protein interaction networks and perform analyses on them.

The PIANA architecture is described in this image. PIANA is implemented as a collection of Python modules that can be used separately as libraries or together as a stand-alone application through a user interface. A PIANA user does not need to know Python or programming to perform analyses of protein interaction networks: all PIANA parameters and commands are set through a simple configuration file.

Currently, PIANA can integrate into a single protein interaction network the data extracted from DIP, MIPS, HPRD, BioGrid, IntAct, MINT, BIND, STRING and any data that follows the HUPO PSI standard. PIANA can also use interactions specified by the user via simple flat text files. Therefore, analyses can be performed on a single network regardless of the sources from which the interactions were extracted. Moreover, PIANA integrates proteins coming from UniProt and NCBI GenBank and contains coreferences between different types of protein identifiers.

This version of PIANA has been designed with the following people in mind:

PIANA is not a network visualizer and doesn't (currently) have a nice graphical user interface. However, PIANA is a powerful tool for performing analyses of protein interaction networks and integrates most repositories of protein interaction data into a single network. Therefore, PIANA will be of high interest to bioinformaticians already used to computers.

This is the standard procedure of use for PIANA:

All README files linked from this page can be found in the main directory of PIANA once you have installed it on your machine.

In addition to providing a framework for working with protein interaction networks, PIANA can also be used as a stand-alone application to create and analyze protein interaction networks.

Data Integration

PIANA accepts most types of protein codes and contains coreferences between the different types. Therefore, PIANA accepts data from most external databases, and interactions from different sources are integrated into a single network. Moreover, the list of input proteins provided by the user can be in any of the protein code types accepted by PIANA: UniProt entry names and accession numbers, gene names, NCBI GenBank gi, geneID, unigene, FlyBase, ENSEMBL, PDB, PIR and the protein sequence in FASTA format. PIANA transforms these codes into its internal identifiers, processes the data and returns the results using the type of protein code chosen by the user. PIANA contains a very extensive mapping between protein identifiers that has been created by parsing multiple databases and applying our own algorithms for creating coreferences.

Consequently, PIANA users can work with all interactions from all databases integrated into a single network, allowing them to perform more comprehensive studies of protein interaction networks.

PIANA can also be used as a translator between different code types.
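The translation through internal identifiers described above can be pictured with a small sketch. This is not PIANA's API: the class `CodeTranslator`, its methods and the code type labels are hypothetical names chosen for illustration, and the sketch assumes that each external code maps to exactly one internal identifier, which a real coreference table may not guarantee.

```python
from collections import defaultdict


class CodeTranslator:
    """Toy cross-reference table (hypothetical, not the PIANA implementation).

    External codes are mapped to an internal identifier, and each internal
    identifier is mapped back to all of its known external codes.
    """

    def __init__(self):
        # (code_type, code) -> internal identifier
        self._to_internal = {}
        # internal identifier -> code_type -> set of codes
        self._from_internal = defaultdict(lambda: defaultdict(set))

    def add(self, internal_id, code_type, code):
        """Register one external code for an internal identifier."""
        self._to_internal[(code_type, code)] = internal_id
        self._from_internal[internal_id][code_type].add(code)

    def translate(self, code, from_type, to_type):
        """Return all codes of `to_type` equivalent to `code`, sorted."""
        internal_id = self._to_internal.get((from_type, code))
        if internal_id is None:
            return []  # unknown code: nothing to translate
        return sorted(self._from_internal[internal_id][to_type])


translator = CodeTranslator()
translator.add(1, "uniprot_accession", "P04637")
translator.add(1, "uniprot_entry", "P53_HUMAN")
translator.add(1, "gene_name", "TP53")
print(translator.translate("P04637", "uniprot_accession", "gene_name"))
```

Routing every translation through the internal identifier means that a new code type only has to be linked to the internal identifiers, not to every other code type.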

Creation of protein-protein interaction networks

The user can choose to retrieve the interactions from the PIANA MySQL database, add their own interactions or use a combination of both. Usually, a list of proteins of interest is given as input (referred to hereafter as "root proteins") and PIANA adds interactions extracted from the database for these proteins until the depth (number of interaction steps from a root protein) chosen by the user is reached. The user can also restrict the network to proteins and interactions that meet different criteria: the species of the proteins, the source interaction databases and the method used to determine the interaction (e.g., build a network of human protein interactions detected by means of two-hybrid experiments).
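As a rough illustration of expanding a network from root proteins up to a chosen depth, here is a minimal breadth-first sketch. It does not use PIANA or its database: the function `build_network` and the in-memory list of interaction pairs are assumptions made for the example, and filtering by species, source database or detection method is left out.

```python
from collections import deque


def build_network(interactions, root_proteins, depth):
    """Return the interactions found within `depth` steps of any root protein.

    `interactions` is an iterable of (protein_a, protein_b) pairs.
    Each returned edge is a frozenset so that (a, b) and (b, a) are the same.
    """
    neighbours = {}
    for a, b in interactions:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    distance = {root: 0 for root in root_proteins}
    queue = deque(distance)
    edges = set()
    while queue:
        protein = queue.popleft()
        if distance[protein] >= depth:
            continue  # do not expand proteins already at the requested depth
        for partner in neighbours.get(protein, ()):
            edges.add(frozenset((protein, partner)))
            if partner not in distance:
                distance[partner] = distance[protein] + 1
                queue.append(partner)
    return edges


known = [("A", "B"), ("B", "C"), ("C", "D")]
print(build_network(known, ["A"], depth=2))
```

With this convention, depth 1 yields the root proteins and their direct partners, and each further level adds the partners of the proteins reached at the previous level.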

Interpreting the protein interaction network

The network and analysis results can be printed out as a detailed table of protein interactions or as a file for graphical visualization. PIANA can also present only the interactions that appear in the intersection of the databases set by the user. Furthermore, PIANA can identify proteins that act as "linkers" between root proteins. A protein that connects two root proteins is likely to be important in the pathways in which the root proteins are involved.
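One simple way to think about linkers is sketched below. The definition used here, a non-root protein that interacts directly with at least two root proteins, is an assumption made for the example; the function `find_linkers` is hypothetical and PIANA's own criterion may differ.

```python
def find_linkers(interactions, root_proteins, min_roots=2):
    """Return non-root proteins that interact with at least `min_roots` roots.

    The result maps each linker to the sorted list of root proteins it touches.
    """
    roots = set(root_proteins)
    roots_touched = {}
    for a, b in interactions:
        # Check the pair in both directions, since interactions are undirected.
        for protein, partner in ((a, b), (b, a)):
            if protein not in roots and partner in roots:
                roots_touched.setdefault(protein, set()).add(partner)
    return {
        protein: sorted(touched)
        for protein, touched in roots_touched.items()
        if len(touched) >= min_roots
    }


pairs = [("R1", "X"), ("X", "R2"), ("R1", "Y")]
print(find_linkers(pairs, ["R1", "R2"]))
```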

New! PIANA now outputs results in Cytoscape format. You can combine the data integration and analysis of PIANA with the visualization capabilities of Cytoscape.
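The page does not say which Cytoscape format PIANA writes. As an illustration only, the sketch below serializes interactions in Cytoscape's Simple Interaction Format (SIF), where each line holds a source node, an interaction type and a target node. The function `to_sif` is hypothetical, and the interaction type label "pp" is an assumed convention for protein-protein interactions.

```python
def to_sif(interactions, interaction_type="pp"):
    """Serialize undirected interactions as tab-separated SIF lines.

    Each pair is written once, with the two proteins in alphabetical order,
    so that (a, b) and (b, a) do not produce duplicate lines.
    """
    lines = sorted({
        "\t".join((min(a, b), interaction_type, max(a, b)))
        for a, b in interactions
    })
    return "".join(line + "\n" for line in lines)


print(to_sif([("B", "A"), ("A", "B"), ("A", "C")]), end="")
```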

When visualizing the network, the user can ask PIANA to highlight proteins that have specific keywords in their function or description. PIANA also accepts files with over/under-expressed genes in order to indicate in its output which of the proteins in the network also appear as "relevant" in a microarray experiment.

Checking pathways related to your PPI network

Given a list of proteins that are known to belong to a specific pathway, PIANA checks which proteins of the network appear in that pathway. Furthermore, if you have different PPI networks, you can ask PIANA to compare them in terms of the pathways that are 'affected' by each network.
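At its core, the pathway check is a set intersection between the proteins of a network and the members of each pathway. The following sketch shows that idea; the function `pathway_coverage` and the pathway names in the example are placeholders, not part of PIANA.

```python
def pathway_coverage(network_proteins, pathways):
    """Map each pathway name to the network proteins that belong to it.

    `pathways` maps a pathway name to an iterable of member proteins.
    """
    network = set(network_proteins)
    return {
        name: sorted(network & set(members))
        for name, members in pathways.items()
    }


pathways = {"pathway_1": ["A", "B", "C"], "pathway_2": ["D", "E"]}
print(pathway_coverage(["A", "C", "D", "Z"], pathways))
```

Running the same function on two networks and comparing the resulting dictionaries gives a simple view of which pathways each network touches.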

Predicting new interactions

PIANA can predict protein interactions by transferring interactions between proteins that share a given property. For example, PIANA predicts interactions using interologs (i.e., orthologous proteins interact with the same proteins) by means of COG codes. In a similar way, SCOP codes can be used to transfer interactions between proteins that share a similar type of domain family.
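The transfer of interactions between proteins that share a code can be sketched as follows. This is a simplified illustration rather than PIANA's algorithm: the function `predict_by_shared_code` is hypothetical, each protein is assumed to carry a single code, and the code labels in the example are placeholders rather than real COG identifiers.

```python
from collections import defaultdict


def predict_by_shared_code(interactions, codes):
    """Predict interactions by transferring them between same-code proteins.

    If A interacts with B, and A2 has the same code as A (for example a COG
    or SCOP code), then A2-B is predicted unless it is already known.
    `codes` maps a protein to its code.
    """
    members = defaultdict(set)
    for protein, code in codes.items():
        members[code].add(protein)

    known = {frozenset(pair) for pair in interactions}
    predicted = set()
    for a, b in interactions:
        for source, partner in ((a, b), (b, a)):
            code = codes.get(source)
            if code is None:
                continue  # no code available: nothing to transfer from
            for relative in members[code]:
                candidate = frozenset((relative, partner))
                if relative != partner and candidate not in known:
                    predicted.add(candidate)
    return predicted


codes = {"A": "code_1", "A2": "code_1", "B": "code_2"}
print(predict_by_shared_code([("A", "B")], codes))
```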

Finding "interaction distance" between proteins

Obtaining lists of proteins that are at a certain interaction distance (i.e., the minimum number of interaction steps that have to be taken between two proteins) from another protein can be useful for tasks such as searching for remote similarities between proteins. PIANA integrates algorithms such as Dijkstra's algorithm for efficiently finding the interaction distance and the set of proteins that are at a given distance from a root protein.
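A minimal sketch of Dijkstra's algorithm applied to interaction distance follows, assuming that every interaction counts as one step. The functions `interaction_distances` and `proteins_at_distance` and the adjacency dictionary are hypothetical names for this example and do not reflect PIANA's own classes.

```python
import heapq


def interaction_distances(neighbours, root):
    """Return the minimum number of interaction steps from `root` to each protein.

    `neighbours` maps a protein to the set of proteins it interacts with.
    Proteins that cannot be reached from `root` are absent from the result.
    """
    distances = {root: 0}
    heap = [(0, root)]
    while heap:
        dist, protein = heapq.heappop(heap)
        if dist > distances[protein]:
            continue  # stale heap entry: a shorter path was already found
        for partner in neighbours.get(protein, ()):
            new_dist = dist + 1  # every interaction counts as one step
            if new_dist < distances.get(partner, float("inf")):
                distances[partner] = new_dist
                heapq.heappush(heap, (new_dist, partner))
    return distances


def proteins_at_distance(neighbours, root, distance):
    """Return the sorted proteins exactly `distance` steps away from `root`."""
    found = interaction_distances(neighbours, root)
    return sorted(p for p, d in found.items() if d == distance)


network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
print(proteins_at_distance(network, "A", 2))
```

When all steps have the same cost, a plain breadth-first search gives the same distances. The priority queue becomes useful if interactions are given different weights, for example by reliability.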

Identifying spots in 2D gels from electrophoresis experiments

In combination with the results of an electrophoresis experiment, PIANA can be used to accurately identify spots in a 2D gel. By comparing the molecular weight and isoelectric point of the proteins in the network with the features of the spots in the 2D gel, spots that could not be identified by mass spectrometry can be assigned to proteins in the network.
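The comparison of molecular weight and isoelectric point can be illustrated with a simple tolerance match. Everything in this sketch is an assumption made for the example: the function `match_spots`, the units (kDa for molecular weight) and the tolerance values, which would have to be tuned to the accuracy of the actual experiment.

```python
def match_spots(spots, proteins, mw_tolerance=0.10, pi_tolerance=0.5):
    """Return, for each gel spot, the network proteins compatible with it.

    `spots` and `proteins` map a name to (molecular weight, isoelectric point).
    A protein matches a spot when its molecular weight is within a relative
    tolerance of the spot's weight and its pI is within an absolute tolerance.
    """
    matches = {}
    for spot, (spot_mw, spot_pi) in spots.items():
        matches[spot] = sorted(
            name
            for name, (mw, pi) in proteins.items()
            if abs(mw - spot_mw) <= mw_tolerance * spot_mw
            and abs(pi - spot_pi) <= pi_tolerance
        )
    return matches


spots = {"spot_1": (50.0, 6.0)}
proteins = {"P1": (52.0, 6.2), "P2": (80.0, 6.1)}
print(match_spots(spots, proteins))
```

Restricting the candidates to proteins in the network is what narrows the search: only proteins already linked to the root proteins are considered for each unidentified spot.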

Clustering proteins using their GO terms

Networks can become very complex, and clustering methods are therefore needed to ease their interpretation. PIANA provides a clustering library for protein interaction networks and, specifically, methods for clustering proteins by their GO terms.
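The page does not describe the clustering method PIANA uses. The sketch below shows one possible approach as an assumption: proteins are joined into the same cluster when the Jaccard similarity of their GO term sets reaches a threshold (single linkage). The function `cluster_by_go_terms` and the default threshold are hypothetical.

```python
def cluster_by_go_terms(go_terms, min_similarity=0.5):
    """Group proteins whose GO term sets are similar enough (single linkage).

    `go_terms` maps a protein to a set of GO term identifiers.
    Returns a sorted list of clusters, each a sorted list of proteins.
    """
    proteins = sorted(go_terms)
    parent = {p: p for p in proteins}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for i, a in enumerate(proteins):
        for b in proteins[i + 1:]:
            shared = go_terms[a] & go_terms[b]
            union = go_terms[a] | go_terms[b]
            # Jaccard similarity: shared terms divided by all terms of the pair.
            if union and len(shared) / len(union) >= min_similarity:
                parent[find(a)] = find(b)

    clusters = {}
    for p in proteins:
        clusters.setdefault(find(p), []).append(p)
    return sorted(clusters.values())


annotations = {
    "A": {"GO:0005515", "GO:0006915"},
    "B": {"GO:0005515", "GO:0006915"},
    "C": {"GO:0003677"},
}
print(cluster_by_go_terms(annotations))
```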

Extending PIANA

PIANA is designed so that new and independent modules can be easily added. Moreover, PIANA libraries can be used to work with protein interaction networks in external Python programs, sparing their authors the low-level operations related to graphs, databases and protein interaction networks.

For more PIANA capabilities, please refer to the documentation provided along with the code. Moreover, all PIANA commands are described in this file: general_template.piana_conf

PIANA can be used as a stand-alone application or as a library for working with graphs, protein-protein interaction networks and protein data.

These are some of the README files provided along with the code. You can read them before deciding whether it is worth it for you to download PIANA or not:

If you wish to use PIANA for implementing your own classes, programs or scripts, you can use its classes and methods. Here, you can read the documentation related to the four fundamental classes contained in PIANA:

Full HTML documentation of all classes and methods is provided along with the code (piana/docs/documentation/piana_documentation.html).

If you are just interested in using the parsers provided by PIANA, you should read README.populate_piana_db

Subscribe to the PIANA discussion list! Open mailing list for discussing PIANA, asking questions or reporting bugs (only members can post).

PIANA requires a MySQL database for creating the protein-protein interaction networks. Currently, we do not provide a web server, so PIANA users must use the tools we provide to create their own database. On the one hand, this increases the difficulty of installing PIANA. On the other hand, this gives PIANA users more control over the data they use for their analyses.

There are two options for the PIANA user:

PIANA is normally used to perform the analysis of protein interaction networks built from a group of "interesting proteins", where 'interesting' can refer to different concepts: their genes were found over/under expressed in a microarray experiment, proteins known to be involved in a pathway being studied, proteins identified by mass spectrometry, etc.

A standard use of PIANA would be:

To illustrate the analyses performed by PIANA, we have used genes that mediate breast cancer metastasis to the lung, extracted from an article published by the Massagué group at Memorial Sloan Kettering: click here to see the analysis we have performed for these genes

This file describes step by step how the above analysis was performed using PIANA.

For a complete listing of all PIANA options, this file describes in detail all input and output parameters of PIANA, as well as the commands that PIANA can execute.

The PIANA distribution contains several directories and files compressed in a tar.gz file. Source code and documentation files are included in the distribution. If you want to use the limited PIANA database we provide, you will also need to download the MySQL dump of pianaDB_limited.

PIANA code is under the GNU General Public License. By downloading the code you agree to the terms of that license.

The latest versions of the PIANA code and the database pianaDB_limited can be downloaded here:

(for a complete list of PIANA downloads and versions descriptions click here)
(if you get a broken link, it might be due to the fact that you are not seeing the latest version of the web page: do a (shift) reload and try downloading again)

To install PIANA on your computer, download the tar.gz file, uncompress it (using tar -xvzf) in the directory where you prefer PIANA to be located and follow the instructions in the file README.piana_installation

We recommend using wget to download this file (size is 632MB!): $> wget http://sbi.imim.es/piana/pianaDB_limited.v1.4.mysqldump.gz
This will allow you to resume the download (i.e., wget -c) if there are any problems with the connection.

To use this database as your PIANA database, download this file and follow the instructions in README.pianaDB_limited

Subscribe to the PIANA Announce list! Low-volume mailing list used to announce new PIANA versions and updates.

If you use this software to either build your own code or perform biological analyses, please do not forget to make reference to this article:

If you use the interaction predictions by distant structure/sequence patterns provided with pianaDB_limited, please do not forget to make reference to this article:

Don't forget that if you use PIANA for your analyses, apart from citing PIANA you must make reference to the databases that contain the information that allowed you to reach your results. For example, if PIANA finds interactions extracted from DIP for your proteins of interest, you must also make reference to DIP in your articles.

The current version of PIANA has been written by Ramon Aragues and Javier García-García
with contributions by D. Jaeggi, P. Boixeda, J. Planas and B. Gregori


If you encounter problems using PIANA, or have suggestions on how to improve it, send an e-mail to boliva at imim.es


Copyright © 2007. PIANA is under the GNU General Public License.