Protein Interactions And Network Analysis

Attention! PIANA is no longer supported. See the BIANA project instead.

What is PIANA?
How does PIANA work?
Main features
Documentation
PIANA databases and parsers
Examples
Download code and database (Jul 08 2007 - New! PIANA v1.4 is available!)
References
Authors and acknowledgements

PIANA (Protein Interactions And Network Analysis) is a software framework that facilitates working with protein interaction networks by 1) integrating data from multiple sources in a centralized database, 2) providing a library that handles all operations related to the network and 3) automating the analysis of protein-protein interaction networks.

PIANA can also be used as a stand-alone application to create protein interaction networks and perform analyses on them.

The PIANA architecture is described in this image. PIANA is implemented as a collection of Python modules that can be used separately as libraries or as a stand-alone application through a user interface. A PIANA user does not need to know Python or programming to perform analyses of protein interaction networks: all PIANA parameters and commands are set through a simple configuration file.

Currently, PIANA can integrate data extracted from DIP, MIPS, HPRD, BioGrid, IntAct, MINT, BIND, STRING and any data that follows the HUPO PSI standard into a single protein interaction network. PIANA can also use interactions specified by the user via simple flat text files. Therefore, analyses can be performed on a single network regardless of the sources from which the interactions were extracted. Moreover, PIANA integrates proteins coming from UniProt and NCBI GenBank and contains cross-references between different types of protein identifiers.

This version of PIANA has been designed with the following people in mind:

the bioinformatician who needs to perform analyses of protein interaction networks
the bioinformatician who needs to integrate many interaction repositories into a single network
the bioinformatician who is helping experimentalists with their proteins of interest
the bioinformatician who is working on protein interaction networks and needs an easy-to-use platform that takes care of all the low level details

PIANA is not a network visualizer and doesn't (currently) have a nice graphical user interface. However, PIANA is a powerful tool for performing analyses of protein interaction networks and integrates most repositories of protein interaction data into a single network. Therefore, PIANA will be of great interest to bioinformaticians already used to computers.

This is the standard procedure of use for PIANA:

Create a configuration file as described in the template for PIANA configuration files
Run piana: $> python piana.py --configuration-file=your_configuration_file
Analyze your results

This file describes in detail examples on how to use PIANA in practice. An illustrated example of a PIANA experiment is provided here.

All README files linked from this page can be found in the main directory of PIANA once you have installed it on your machine.

In addition to providing a framework for working with protein interaction networks, PIANA can also be used as a stand-alone application to create and analyze protein interaction networks.

Data Integration

PIANA accepts most types of protein codes and contains coreferences between the different types. Therefore, PIANA accepts data from most external databases, and interactions from different sources are integrated into a single network. Moreover, the list of input proteins provided by the user can be in any of the protein code types accepted by PIANA: uniprot entry names and accession numbers, gene names, NCBI GenBank gi, geneID, unigene, FlyBase, ENSEMBL, PDB, PIR and the protein sequence in fasta format. PIANA transforms these codes into its internal identifiers, processes the data and returns the results using the type of protein code chosen by the user. PIANA contains a very extensive mapping between protein identifiers that has been created by parsing multiple databases and applying our own algorithms for creating coreferences.

Consequently, PIANA users can work with all interactions from all databases integrated into a single network, allowing them to perform more comprehensive studies of protein interaction networks.

PIANA can also be used as a translator between different code types.
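The integration idea described in this section can be sketched in a few lines of Python. This is only an illustration, not PIANA's actual code: the cross-reference table, the internal ids and the function names (to_internal, translate, integrate) are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical cross-reference table: (code type, external code) -> internal id.
# The internal ids are arbitrary numbers chosen for this example.
XREFS = {
    ("uniprot_acc", "P04637"): 1,
    ("gene_name", "TP53"): 1,
    ("uniprot_acc", "Q00987"): 2,
    ("gene_name", "MDM2"): 2,
}


def to_internal(code_type, code):
    """Return the internal id for an external code, or None if it is unknown."""
    return XREFS.get((code_type, code))


def translate(code_type, code, target_type):
    """Translate a code into every code of target_type with the same internal id."""
    internal = to_internal(code_type, code)
    if internal is None:
        return []
    return sorted(c for (t, c), i in XREFS.items()
                  if t == target_type and i == internal)


def integrate(records):
    """Merge (source, code_type, code_a, code_b) records into one network.

    Returns a dict mapping an unordered pair of internal ids to the set of
    source databases that report that interaction.
    """
    network = defaultdict(set)
    for source, code_type, code_a, code_b in records:
        a = to_internal(code_type, code_a)
        b = to_internal(code_type, code_b)
        if a is None or b is None:
            continue  # skip interactions that involve an unmapped protein
        network[frozenset((a, b))].add(source)
    return dict(network)


records = [
    ("dip", "uniprot_acc", "P04637", "Q00987"),
    ("hprd", "gene_name", "MDM2", "TP53"),
]
merged = integrate(records)
```

Because both records resolve to the same pair of internal ids, the merged network holds a single interaction labelled with both sources, whichever code type each source used.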

Creation of protein-protein interaction networks

The user can choose to retrieve the interactions from the PIANA MySQL database, add their own interactions, or use a combination of both. Usually, a list of proteins of interest is given as input (referred to hereafter as "root proteins") and PIANA adds interactions extracted from the database for these proteins until the depth (number of interaction steps from a root protein) chosen by the user is reached. The user can also restrict the network to proteins and interactions that meet different criteria: the species of the proteins, the source interaction databases and the method used to determine the interaction (e.g. build a network of human protein interactions detected by means of two-hybrid experiments).
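The depth-limited expansion from root proteins can be pictured with the following minimal sketch. It assumes the interactions are already available as an in-memory list of pairs rather than being read from the MySQL database, and build_network is a hypothetical name, not a PIANA function.

```python
def build_network(interactions, roots, depth):
    """Collect the edges found within `depth` interaction steps of any root."""
    # Build an adjacency map from the list of (protein_a, protein_b) pairs.
    neighbours = {}
    for a, b in interactions:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    edges = set()
    visited = set(roots)
    frontier = set(roots)
    for _ in range(depth):
        next_frontier = set()
        for protein in frontier:
            for partner in neighbours.get(protein, ()):
                edges.add(frozenset((protein, partner)))
                if partner not in visited:
                    visited.add(partner)
                    next_frontier.add(partner)
        frontier = next_frontier  # expand one more step on the next pass
    return edges


interactions = [("A", "B"), ("B", "C"), ("C", "D")]
depth_one = build_network(interactions, ["A"], 1)
depth_two = build_network(interactions, ["A"], 2)
```

In this sketch, restrictions by species, source database or detection method would be applied by filtering the list of interactions before calling build_network.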

Interpreting the protein interaction network

The network and analysis results can be printed out as a detailed table of protein interactions or as a file for graphical visualization. PIANA can also present only the interactions that appear in the intersection of the databases set by the user. Furthermore, PIANA can identify proteins that act as "linkers" between root proteins. A protein that connects two root proteins is likely to be important in the pathways in which the root proteins are involved.
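One simple way to picture linker detection is shown below. The definition used here (a non-root protein that interacts directly with at least two different root proteins) is an assumption made for the sketch, and find_linkers is a hypothetical name; PIANA's own criteria may differ.

```python
def find_linkers(interactions, roots, min_roots=2):
    """Return non-root proteins that interact with at least `min_roots` roots.

    The result maps each linker to the set of root proteins it connects.
    """
    roots = set(roots)
    root_partners = {}
    for a, b in interactions:
        # Check the pair in both directions, since interactions are undirected.
        for protein, partner in ((a, b), (b, a)):
            if protein not in roots and partner in roots:
                root_partners.setdefault(protein, set()).add(partner)
    return {p: r for p, r in root_partners.items() if len(r) >= min_roots}


interactions = [("R1", "X"), ("X", "R2"), ("R1", "Y"), ("R1", "R2")]
linkers = find_linkers(interactions, ["R1", "R2"])
```

Here X is reported as a linker because it touches both root proteins, while Y is not because it only interacts with R1.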

--> New!: PIANA now outputs results in Cytoscape format. You can combine the data integration and analysis of PIANA with the visualization capabilities of Cytoscape.

When visualizing the network, the user can ask PIANA to highlight proteins that have specific keywords in their function or description. PIANA also accepts files with over- or under-expressed genes to indicate in its output which of the proteins in the network also appear as being "relevant" in a microarray experiment.

Checking pathways related to your PPI network

Given a list of proteins that are known to belong to a specific pathway, PIANA checks which proteins of the network appear on those pathways. Furthermore, if you have different PPI networks you can ask PIANA to compare them in terms of the pathways that are 'affected' by each network.
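As a rough illustration, the pathway check amounts to intersecting sets of proteins. The sketch below assumes pathways are given as a mapping from a pathway name to its member proteins; the names pathway_hits and compare_networks, and the example data, are hypothetical.

```python
def pathway_hits(network_proteins, pathways):
    """Map each pathway name to the network proteins that belong to it."""
    network_proteins = set(network_proteins)
    hits = {}
    for name, members in pathways.items():
        shared = network_proteins & set(members)
        if shared:
            hits[name] = shared
    return hits


def compare_networks(networks, pathways):
    """Map each network name to the set of pathways it touches."""
    return {name: set(pathway_hits(proteins, pathways))
            for name, proteins in networks.items()}


# Made-up pathway membership lists, for illustration only.
pathways = {"pathway_1": ["A", "B"], "pathway_2": ["C"], "pathway_3": ["Z"]}
networks = {"net_1": ["A", "C", "D"], "net_2": ["B"]}
hits = pathway_hits(networks["net_1"], pathways)
comparison = compare_networks(networks, pathways)
```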

Predicting new interactions

PIANA can predict protein interactions by transferring interactions between proteins that share a given property. For example, PIANA predicts interactions using interologs (i.e. orthologous proteins interact with the same proteins) by means of COG codes. In a similar way, SCOP codes can be used to transfer interactions between proteins that share a similar type of domain family.
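The transfer of interactions between proteins that share a property can be sketched as follows. This is a simplified illustration: the property map stands in for COG or SCOP assignments, each protein is assumed to carry a single code, and predict_by_shared_property is a hypothetical name rather than a PIANA function.

```python
from collections import defaultdict


def predict_by_shared_property(interactions, prop):
    """Transfer interaction partners between proteins with the same code.

    `prop` maps a protein to one property code (for example a COG code).
    If P interacts with Q and P2 has the same code as P, then P2-Q is
    predicted, unless that interaction is already known.
    """
    groups = defaultdict(set)
    for protein, code in prop.items():
        groups[code].add(protein)

    known = {frozenset(pair) for pair in interactions}
    predicted = set()
    for a, b in interactions:
        for source, partner in ((a, b), (b, a)):
            code = prop.get(source)
            if code is None:
                continue  # no property assigned, nothing to transfer
            for other in groups[code]:
                pair = frozenset((other, partner))
                if other not in (source, partner) and pair not in known:
                    predicted.add(pair)
    return predicted


# Made-up proteins and codes: yA and hA share the same COG-like code.
interactions = [("yA", "yB")]
cog = {"yA": "COG_x", "hA": "COG_x"}
predictions = predict_by_shared_property(interactions, cog)
```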

Finding "interaction distance" between proteins

Obtaining lists of proteins that are at a certain interaction distance (i.e. the minimum number of interaction steps that have to be taken between two proteins) from another protein can be useful for tasks such as searching for remote similarities between proteins. PIANA integrates algorithms such as Dijkstra's algorithm for efficiently finding the interaction distance and the set of proteins that are at a given distance from a root protein.
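For an unweighted network, a breadth-first search yields the same distances as Dijkstra's algorithm with unit edge weights, so the sketch below uses breadth-first search for brevity. The function names are hypothetical and the code is not taken from PIANA.

```python
from collections import deque


def interaction_distances(interactions, root):
    """Return the minimum number of interaction steps from root to each protein."""
    neighbours = {}
    for a, b in interactions:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    distances = {root: 0}
    queue = deque([root])
    while queue:
        protein = queue.popleft()
        for partner in neighbours.get(protein, ()):
            if partner not in distances:  # first visit is the shortest path
                distances[partner] = distances[protein] + 1
                queue.append(partner)
    return distances


def proteins_at_distance(interactions, root, distance):
    """Return the set of proteins exactly `distance` steps away from root."""
    found = interaction_distances(interactions, root)
    return {p for p, d in found.items() if d == distance}


interactions = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("E", "F")]
distances = interaction_distances(interactions, "A")
second_shell = proteins_at_distance(interactions, "A", 2)
```

Proteins that cannot be reached from the root (E and F in this example) simply do not appear in the result.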

Identifying spots in 2D gels from electrophoresis experiments

In combination with the results of an electrophoresis experiment, PIANA can be used to accurately identify spots in a 2D gel. By comparing the molecular weight and isoelectric point of the proteins in the network with the features of the spots in the 2D gel, spots that could not be identified by mass spectrometry can be assigned to proteins in the network.
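The comparison of molecular weight and isoelectric point can be illustrated as follows. The tolerances, units and the match_spots name are assumptions made for this sketch; the matching criteria actually used by PIANA are described in its documentation.

```python
def match_spots(spots, proteins, mw_tol=0.1, pi_tol=0.5):
    """Suggest candidate proteins for each 2D gel spot.

    `spots` and `proteins` both map an identifier to a pair
    (molecular weight in kDa, isoelectric point). A protein is a candidate
    when its weight is within a relative tolerance `mw_tol` of the spot and
    its isoelectric point is within an absolute tolerance `pi_tol`.
    """
    matches = {}
    for spot_id, (spot_mw, spot_pi) in spots.items():
        candidates = [
            name
            for name, (mw, pi) in proteins.items()
            if abs(mw - spot_mw) <= mw_tol * spot_mw
            and abs(pi - spot_pi) <= pi_tol
        ]
        matches[spot_id] = sorted(candidates)
    return matches


# Made-up values, for illustration only.
spots = {"spot_1": (50.0, 6.0), "spot_2": (20.0, 9.0)}
proteins = {"P1": (52.0, 6.2), "P2": (80.0, 6.1)}
candidates = match_spots(spots, proteins)
```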

Clustering proteins using their GO terms

Networks can become very complex, and clustering methods are therefore needed to ease their interpretation. PIANA provides a clustering library for protein interaction networks and, specifically, methods for clustering proteins by their GO terms.
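As a very simple illustration of clustering by GO terms, the sketch below places two proteins in the same cluster whenever they share at least one GO term, directly or through a chain of other proteins. This grouping rule is an assumption chosen for brevity and is not necessarily the method implemented in the PIANA clustering library; the GO identifiers are placeholders.

```python
def cluster_by_go(go_terms):
    """Group proteins that are linked by shared GO terms.

    `go_terms` maps a protein to its set of GO term identifiers.
    Returns a list of clusters (sets of proteins) in a stable order.
    """
    parent = {protein: protein for protein in go_terms}

    def find(protein):
        # Follow parent links to the cluster representative, shortening the path.
        while parent[protein] != protein:
            parent[protein] = parent[parent[protein]]
            protein = parent[protein]
        return protein

    first_with_term = {}
    for protein, terms in go_terms.items():
        for term in terms:
            if term in first_with_term:
                # Merge this protein's cluster with the one that has the term.
                parent[find(protein)] = find(first_with_term[term])
            else:
                first_with_term[term] = protein

    clusters = {}
    for protein in go_terms:
        clusters.setdefault(find(protein), set()).add(protein)
    return sorted(clusters.values(), key=sorted)


# Placeholder GO identifiers, for illustration only.
annotations = {
    "A": {"GO:1"},
    "B": {"GO:1", "GO:2"},
    "C": {"GO:2"},
    "D": {"GO:3"},
}
clusters = cluster_by_go(annotations)
```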

Extending PIANA

PIANA is designed so that new and independent modules can be easily added. Moreover, PIANA libraries can be used to work with protein interaction networks in external python programs that do not want to take care of the low level operations related to graphs, databases and protein interaction networks.

For more PIANA capabilities, please refer to the documentation provided along with the code. Moreover, all PIANA commands are described in this file: general_template.piana_conf

PIANA can be used as a stand-alone application or as a library for working with graphs, protein-protein interaction networks and protein data.

These are some of the README files provided along with the code. You can read them before deciding whether it is worth it for you to download PIANA or not:

PIANA requirements

PIANA installation

PIANA tutorial

PIANA configuration file: this file describes in detail the PIANA input parameters, output parameters and commands

If you wish to use PIANA for implementing your own classes, programs or scripts, you can use its classes and methods. Here, you can read the documentation related to the four fundamental classes contained in PIANA:

Graph: to create, manage and work with general purpose graphs

PianaGraph: to create, manage and work with protein-protein interaction networks

PianaDBaccess: to access and insert information into piana databases

PianaApi: to execute commands provided by PIANA

Full HTML documentation of all classes and methods is provided along with the code (piana/docs/documentation/piana_documentation.html).

If you are just interested in using the parsers provided by PIANA, you should read README.populate_piana_db

**Subscribe to the PIANA discussion list!** Open mailing list for discussing PIANA, asking questions or reporting bugs (only members can post)

PIANA requires a MySQL database for creating the protein-protein interaction networks. Currently, we do not provide a web server, so PIANA users must use the tools we provide to create their own database. On the one hand, this makes PIANA harder to install. On the other hand, it gives PIANA users more control over the data they use for their analysis.

There are two options for the PIANA user:

Use the database we provide along with the code: pianaDB_limited
You can download the copy of our database that we provide as a MySQL dump. This database contains interactions predicted from sequence/structure distant patterns(2) and from the Database of Interacting Proteins (DIP). For copyright reasons, this database does not contain interactions from other databases compatible with PIANA. If you wish to add more interactions to this database, you can use the parsers provided by PIANA to populate it with interactions from HPRD, MIPS, BIND and STRING. PIANA can also parse any interaction data in flat text files or files following the HUPO PSI format for protein interaction data.
These are the databases currently contained in pianaDB_limited:
- uniprot (swissprot and trembl) Release 6.9
- genpept release 151
- ncbi nr January, 2006
- ncbi taxonomy 25/01/2006
- pdbsprotec February 2nd 2006
- COG January 2006
- SCOP release 1.69
- Gene Ontology January 2006
- Interactions from (2)
- Interactions from DIP January 16th 2006
  - pianaDB_limited is a limited version of the database we use at our lab. Apart from containing interactions from only the two sources described above, the main difference with respect to the complete version is that DIP interactions do not contain information about the method that was used to detect them. This has been done to comply with the redistribution license that governs DIP. For a complete description of how this database has been generated, read README.pianaDB_limited
  - All information in PIANA databases is labelled with the source database, for example, interactions that come from DIP are labelled in the database as 'dip' interactions. When visualizing the network, each database has a different color code. For further details on the data contained in PIANA databases one must refer to the original sources.
Create and populate your own piana database
You can create your own copy of a piana database using the scripts and parsers provided by PIANA. This README file explains step by step the procedure of creating your own copy of a piana database. Even though it takes time to populate a new piana database, it gives you more control on which data you insert into the database. Moreover, you can create a PIANA database that is not limited by copyright issues.

PIANA is normally used to perform the analysis of protein interaction networks built from a group of "interesting proteins", where 'interesting' can refer to different concepts: their genes were found over/under expressed in a microarray experiment, proteins known to be involved in a pathway being studied, proteins identified by mass spectrometry, etc.

A standard use of PIANA would be:

Build the interaction network for those interesting proteins
Visualize the network to detect interesting features
Identify proteins that connect the initial proteins
Highlight proteins of the network with specific functions or keywords defined by user
Highlight proteins in the network that are over/under expressed
Predict novel interactions for the proteins of interest using interologs
Cluster the network by molecular function, biological process or cellular component

To illustrate the analyses performed by PIANA, we have used genes that mediate breast cancer metastasis to lung, extracted from an article published by the Massague group at Memorial Sloan Kettering: click here to see the analysis we have performed for these genes

This file describes step by step how the above analysis was performed using PIANA.

For a complete listing of all PIANA options, this file describes in detail all input and output parameters to PIANA, as well as commands that PIANA can execute

The PIANA distribution contains several directories and files compressed in a tar.gz file. Source code and documentation files are included in the distribution. If you want to use the limited piana database we provide, you will also need to download the MySQL dump of pianaDB_limited.

PIANA code is released under the GNU General Public License. By downloading the code you agree to the terms of that license.

The latest versions of the PIANA code and the database pianaDB_limited can be downloaded here:

(for a complete list of PIANA downloads and versions descriptions click here)
(if you get a broken link, it might be due to the fact that you are not seeing the latest version of the web page: do a (shift) reload and try downloading again)

PIANA v1.4 full distribution (Jul 02 2007): source code and documentation -- [DOWNLOAD] (click here for details)
(Read this to learn more about PIANA v1.4)

To install PIANA on your computer, download the tar.gz file, uncompress it (using tar -xvzf) in the directory where you prefer PIANA to be located and follow the instructions in the file README.piana_installation

pianaDB_limited : piana database as described above (version 1.4: pianaDB_limited.v1.4.mysqldump.gz) -- [DOWNLOAD] (please read below about downloading with wget instead of doing it with your browser)

We recommend using wget to download this file (size is 632MB!): $> wget http://sbi.imim.es/piana/pianaDB_limited.v1.4.mysqldump.gz
This will allow you to resume the download (i.e. wget -c) in case there are any problems during the connection.

To use this database as your piana database, download this file and follow the instructions in README.pianaDB_limited

**Subscribe to the PIANA Announce list!** Low-volume mailing list used to announce new PIANA versions and updates

If you use this software to either build your own code or perform biological analyses, please do not forget to make reference to this article:

(1) R. Aragues, D. Jaeggi and B. Oliva
"PIANA: Protein Interactions and Network Analysis"
Bioinformatics. 2006 Apr 15;22(8):1015-7 [PubMed] [Full Text]

If you use the interaction predictions by distant structure/sequence patterns provided with pianaDB_limited, please do not forget to make reference to this article:

(2) Espadaler, J., O. Romero-Isart, et al. (2005). "Prediction of protein-protein interactions using distant conservation of sequence patterns and structure relationships." Bioinformatics 21(16): 3360-8. [ PubMed ] [ Full Text ]

Don't forget that if you use PIANA for your analyses, apart from citing PIANA you must make reference to the databases that contain the information that allowed you to reach your results. For example, if PIANA finds interactions extracted from DIP for your proteins of interest, you must also make reference to DIP in your articles.

The current version of PIANA has been written by Ramon Aragues and Javier García-García
with contributions by D. Jaeggi, P. Boixeda, J. Planas and B. Gregori

If you encounter problems using PIANA, or have suggestions on how to improve it, send an e-mail to boliva at imim.es