
Feature Set Downloading:


[new] Since the related positive reference sets and feature sources have been updated rapidly over the years, sharing only the extracted feature files or partial prediction scores is no longer good enough. Thus, here I share the code and related files used to generate our feature set. Download (both summary and detailed!)

The general framework and the code should still be quite useful. You could try to find more recent versions of the related evidence sets to improve on them, though.

> Feature Details in the data set

| Group Index | # of features | Dataset | Attribute Property | Data Position in the set |
|---|---|---|---|---|
| 1 | 20 | Gene Expression | Real value: [-1, 1] | 1 - 20 |
| 2 | 21 | GO Molecular Function | {1, 0} | 21 - 41 |
| 3 | 33 | GO Biological Process | {1, 0} | 42 - 74 |
| 4 | 23 | GO Component | {1, 0} | 75 - 97 |
| 5 | 1 | Protein Expression | Real value, non-negative | 98 |
| 6 | 1 | Essentiality | {2, 1, 0} | 99 |
| 7 | 1 | HMS_PCI Mass * | {1, 0} | 100 |
| 8 | 1 | TAP Mass * | {1, 0} | 101 |
| 9 | 1 | Y2H | {1, 0} | 102 |
| 10 | 1 | Synthetic Lethal | {1, 0} | 103 |
| 11 | 1 | GeneNeighborhood / Gene Fusion / Gene Co-occur | {1, 0} | 104 |
| 12 | 1 | Sequence Similarity | Real value, non-negative | 105 |
| 13 | 4 | Homology based PPI | Discrete, non-negative (mostly 0 or 1) | 106 - 109 |
| 14 | 1 | Domain-Domain Interaction | Real value in [0, 1] | 110 |
| 15 | 16 | Protein-DNA TF group binding | Discrete, non-negative (mostly 0) | 111 - 126 |
| 16 | 25 | MIPS Protein Class | {1, 0} | 127 - 151 |
| 17 | 11 | MIPS Mutant Phenotype | {1, 0} | 152 - 162 |

* Matrix model for co-complex and co-pathway prediction. Spoke model for direct PPI prediction.
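The positions in the table follow directly from the per-group feature counts, so they can be recomputed rather than hard-coded. The sketch below is only an illustration: it assumes each protein pair is already loaded as a flat sequence of 162 values in the order of the table (the on-disk file format is not described here), and the names `FEATURE_GROUPS`, `group_positions` and `slice_group` are hypothetical helpers, not part of the shared code. Note that summing the counts puts group 17 at positions 152 to 162.

```python
# (group index, dataset name, number of features), copied from the feature table.
FEATURE_GROUPS = [
    (1, "Gene Expression", 20),
    (2, "GO Molecular Function", 21),
    (3, "GO Biological Process", 33),
    (4, "GO Component", 23),
    (5, "Protein Expression", 1),
    (6, "Essentiality", 1),
    (7, "HMS_PCI Mass", 1),
    (8, "TAP Mass", 1),
    (9, "Y2H", 1),
    (10, "Synthetic Lethal", 1),
    (11, "GeneNeighborhood / Gene Fusion / Gene Co-occur", 1),
    (12, "Sequence Similarity", 1),
    (13, "Homology based PPI", 4),
    (14, "Domain-Domain Interaction", 1),
    (15, "Protein-DNA TF group binding", 16),
    (16, "MIPS Protein Class", 25),
    (17, "MIPS Mutant Phenotype", 11),
]


def group_positions(groups=FEATURE_GROUPS):
    """Return {group index: (first, last)} using 1-based, inclusive positions."""
    positions = {}
    start = 1
    for index, _name, count in groups:
        positions[index] = (start, start + count - 1)
        start += count
    return positions


def slice_group(vector, group_index, positions=None):
    """Return the values of one feature group from a full feature vector.

    Assumes `vector` holds all features in table order (hypothetical layout).
    """
    if positions is None:
        positions = group_positions()
    first, last = positions[group_index]
    # Convert the 1-based inclusive range into a 0-based Python slice.
    return vector[first - 1:last]
```

For example, `slice_group(vector, 13)` would return the four homology-based PPI features at positions 106 to 109.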

> Shared data sets

> Note

• "-100" in the feature sets means a missing value in that position!

• For details about the gold standard positive sets shared above, please check the "Gold Standard datasets" section in the paper.

• The negative data sets I put here are just a random subset containing ~230,000 yeast protein-protein pairs that are not in the positive PPI set of each specific task.

• In the paper, we assume the size ratio between the positive examples and the negative examples is roughly 1:600 (estimated based on experimental data) when building the train-test sets.

• This ratio is still questionable and needs further discussion.

• If you happen to know a better answer than the strategy I used above, it would be greatly appreciated if you could contact me.

• If you notice any mistakes in the data, please contact me as soon as possible. Thanks in advance!
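Two of the notes above translate naturally into small preprocessing steps: treating -100 as a missing value rather than a measurement, and subsampling negatives to a chosen positive-to-negative ratio. The following is a minimal sketch under stated assumptions, not the procedure used in the paper: it assumes feature vectors and pair lists are already in memory as Python lists, and `mask_missing` and `sample_negatives` are hypothetical names. The default ratio of 600 mirrors the roughly 1:600 assumption mentioned above.

```python
import random

MISSING = -100  # sentinel that marks a missing value in the feature sets


def mask_missing(vector, sentinel=MISSING):
    """Replace the missing-value sentinel with None.

    This keeps -100 from being read as a real measurement; how missing
    values are handled afterwards is left to the learner (an assumption).
    """
    return [None if value == sentinel else value for value in vector]


def sample_negatives(negatives, n_positives, ratio=600, seed=0):
    """Randomly draw negatives so that positives:negatives is about 1:ratio.

    If fewer negatives are available than requested, all of them are returned
    (in random order). The seed only makes the draw reproducible.
    """
    rng = random.Random(seed)
    wanted = min(len(negatives), n_positives * ratio)
    return rng.sample(negatives, wanted)
```

Because the shared negative sets hold only about 230,000 pairs, a 1:600 ratio can be reached from them only when the positive set has a few hundred pairs or fewer; otherwise the function simply returns every available negative.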

FAQ page