MengLiuPurdue/Graph-Topological-Data-Analysis


This repository contains the code for the paper "Topological structure of complex predictions".

package requirements

install using conda

The easiest way to set up the environment required by GTDA is to use the included GTDA.yml file. Create the environment from this file with the following command:

conda env create --file GTDA.yml

This creates a virtual environment under the current working directory. Installation can take one to two hours, depending on network speed. Once it finishes, activate the conda environment before running the code:

conda activate GTDA
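After activation, a quick sanity check can catch the two most common setup slips: running Python outside the GTDA environment, or starting it from a directory where the GTDA package cannot be imported. The sketch below is a hypothetical helper (`check_gtda_setup` is not part of the repository). It relies on `conda activate` setting the `CONDA_DEFAULT_ENV` variable, and it assumes the package is importable as `GTDA` when Python is started from the repository root, as the `from GTDA.GTDA_utils import ...` lines suggest.

```python
import importlib.util
import os


def check_gtda_setup(environ=None, env_name="GTDA", package="GTDA"):
    """Return a list of setup problems; an empty list means nothing was detected.

    Hypothetical helper: assumes the environment is named GTDA and that the
    GTDA package is importable from the current working directory.
    """
    environ = os.environ if environ is None else environ
    problems = []
    # conda activate sets CONDA_DEFAULT_ENV to the environment's name,
    # or to its full path when the environment was activated by prefix.
    active = environ.get("CONDA_DEFAULT_ENV", "")
    if os.path.basename(active.rstrip("/\\")) != env_name:
        problems.append(f"active conda env is {active!r}, expected {env_name!r}")
    # find_spec returns None when the package cannot be found on sys.path
    # (for example, when Python was not started from the repository root).
    if importlib.util.find_spec(package) is None:
        problems.append(f"cannot import {package!r}; start Python from the repository root")
    return problems


if __name__ == "__main__":
    for problem in check_gtda_setup():
        print("warning:", problem)
```

Running the file with no warnings printed means neither problem was detected; it does not verify that every dependency in GTDA.yml installed correctly.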

create a Reeb network

```python
from GTDA.GTDA_utils import compute_reeb, NN_model
from GTDA.GTDA import GTDA

nn_model = NN_model()
nn_model.A = G                    # graph to analyze, in scipy csr format
nn_model.preds = preds            # prediction matrix, samples-by-class
nn_model.labels = labels          # integer class assignments starting from 0
nn_model.train_mask = train_mask  # training mask in bool
nn_model.val_mask = val_mask      # validation mask in bool
nn_model.test_mask = test_mask    # testing mask in bool
smallest_component = 20
overlap = 0.1
labels_to_eval = list(range(preds.shape[1]))
GTDA_record = compute_reeb(GTDA, nn_model, labels_to_eval, smallest_component, overlap,
                           node_size_thd=5, reeb_component_thd=5, nprocs=10, device='cuda')
```

analyze a Reeb network

```python
g_reeb = GTDA_record['g_reeb']  # Reeb network in csr format
gtda = GTDA_record['gtda']      # an instance of GTDA class
# map a Reeb node (reeb_node_index, a node index in g_reeb) back to the original component
gtda.final_components_filtered[gtda.filtered_nodes[reeb_node_index]]
gtda.A_reeb                # projected Reeb network with the same set of nodes as the original graph
gtda.sample_colors_mixing  # GTDA estimated errors for each sample
```
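The Reeb-node lookup in the snippet above is a two-step indirection: `filtered_nodes` appears to map a Reeb node to a component index, and `final_components_filtered` maps that index to nodes of the original graph. The sketch below mimics this with plain Python containers and adds a hypothetical helper, `most_suspicious`, that ranks samples by estimated error, for example to choose which test samples to inspect first. It assumes the error estimates can be read as one scalar per sample; `sample_colors_mixing` may have a different shape, so check it before adapting this code. All data here are toy stand-ins.

```python
def reeb_node_members(final_components_filtered, filtered_nodes, reeb_node_index):
    """Original-graph nodes behind one Reeb node (same two-step lookup as above)."""
    component_index = filtered_nodes[reeb_node_index]
    return final_components_filtered[component_index]


def most_suspicious(estimated_errors, k, mask=None):
    """Indices of the k samples with the largest estimated error.

    Assumes one scalar score per sample. If `mask` is given (e.g. test_mask),
    only samples where it is True are ranked. Ties keep their original order.
    """
    candidates = [i for i in range(len(estimated_errors)) if mask is None or mask[i]]
    return sorted(candidates, key=lambda i: estimated_errors[i], reverse=True)[:k]


# Toy stand-ins for gtda.filtered_nodes, gtda.final_components_filtered and
# per-sample error scores; the real objects come from GTDA_record['gtda'].
filtered_nodes = [3, 0]
final_components_filtered = {0: [0, 1, 2], 3: [4, 5]}
errors = [0.1, 0.9, 0.4, 0.7, 0.2, 0.05]
test_mask = [True, False, True, True, False, False]
```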

Swiss Roll experiment (demo)

Prerequisites:

None; the demo is self-contained.

Files:

Expected running time:

Amazon Electronics experiment

Prerequisites:

Download "All products under Electronics" from the 2014 version of the Amazon review data at http://jmcauley.ucsd.edu/data/amazon/index_2014.html and place the files in the dataset/electronics folder.
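To peek at the downloaded files before running the experiment, a streaming reader is enough. This is a minimal sketch, assuming gzip-compressed files with one record per line where some lines may use Python dict syntax (single-quoted keys) instead of strict JSON; `iter_records` is a hypothetical helper, and the experiment scripts may read the data differently.

```python
import ast
import gzip
import json


def iter_records(path):
    """Yield one dict per non-empty line of a gzip-compressed, line-delimited file.

    Assumption: each line is either strict JSON or a Python dict literal.
    """
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                # Single-quoted Python literals are not valid JSON; literal_eval
                # parses them without executing arbitrary code (unlike eval).
                yield ast.literal_eval(line)
```

Because it is a generator, `next(iter_records(path))` inspects the first record without loading the whole file into memory.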

Files:

Imagenette experiment

Prerequisites:

Download the Imagenette dataset from https://github.com/fastai/imagenette and place it in the dataset/imagenette folder.

Files:

Gene mutation experiment

Prerequisites:

Download variant_summary.txt from https://ftp.ncbi.nlm.nih.gov/pub/clinvar/tab_delimited/ and place it in the dataset/variants folder.
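variant_summary.txt is a tab-delimited table whose header line starts with `#AlleleID`, and each variant appears once per genome assembly (GRCh37 corresponds to hg19, GRCh38 to hg38). The sketch below streams the file and keeps single nucleotide variants for one assembly. It is a hypothetical helper, not the repository's loader; the column names used (`Type`, `GeneSymbol`, `ClinicalSignificance`, `Assembly`, `Chromosome`, `Start`) reflect the ClinVar layout at the time of writing and should be checked against the header of the downloaded release.

```python
import csv

# ClinVar assembly names for the two UCSC reference genomes used here.
ASSEMBLY_FOR_UCSC = {"hg19": "GRCh37", "hg38": "GRCh38"}


def read_variants(lines, genome="hg38", variant_type="single nucleotide variant"):
    """Yield simplified records for one assembly from variant_summary.txt lines.

    Assumption: column names as listed above; verify them against the file header.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    if reader.fieldnames is None:
        return  # empty input
    # The first header field is "#AlleleID"; strip the leading '#'.
    reader.fieldnames = [name.lstrip("#") for name in reader.fieldnames]
    assembly = ASSEMBLY_FOR_UCSC[genome]
    for row in reader:
        if row.get("Assembly") != assembly or row.get("Type") != variant_type:
            continue
        if not row["Start"].isdigit():
            continue  # skip rows without a usable position
        chrom = row["Chromosome"]
        yield {
            "allele_id": row["AlleleID"],
            "gene": row.get("GeneSymbol"),
            # UCSC names mitochondrial DNA chrM, while ClinVar uses MT.
            "chrom": "chrM" if chrom == "MT" else "chr" + chrom,
            "start": int(row["Start"]),  # 1-based coordinate
            "significance": row.get("ClinicalSignificance"),
        }
```

Usage: `with open("dataset/variants/variant_summary.txt", newline="") as fh: variants = list(read_variants(fh, genome="hg38"))`.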

Download the human reference genomes hg19.fa and hg38.fa and place them in the dataset/variants folder.
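To relate a ClinVar position to the reference genome, a sequence window around the variant is needed. The following is a minimal, memory-hungry sketch (it loads one whole chromosome into a string); `read_chromosome` and `centered_window` are hypothetical helpers, the window length is left to the caller, and an indexed FASTA reader is preferable for large-scale use. Positions are taken as 1-based, matching ClinVar's `Start` column.

```python
def read_chromosome(lines, name):
    """Return the sequence of one FASTA record (e.g. 'chr7'), uppercased."""
    chunks, inside = [], False
    for line in lines:
        line = line.rstrip()
        if line.startswith(">"):
            if inside:
                break  # the requested record has ended
            # The record name is the first word after '>'.
            inside = line[1:].split()[:1] == [name]
        elif inside:
            # UCSC soft-masks repeats in lowercase; uppercase for a uniform alphabet.
            chunks.append(line.upper())
    return "".join(chunks)


def centered_window(seq, position, length):
    """`length` bases centred on a 1-based `position`, padded with 'N' past either end."""
    if not 1 <= position <= len(seq):
        raise ValueError("position lies outside the sequence")
    center = position - 1  # convert to a 0-based index
    start = center - length // 2
    end = start + length
    left_pad = "N" * max(0, -start)
    right_pad = "N" * max(0, end - len(seq))
    return left_pad + seq[max(0, start):min(end, len(seq))] + right_pad
```

The variant base ends up at index `length // 2` of the returned window, which is the offset to use when substituting the alternate allele.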

Install TensorFlow and set up Enformer following the instructions at https://github.com/deepmind/deepmind-research/tree/master/enformer. A GPU with CUDA support is highly recommended.
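Enformer consumes one-hot encoded DNA, and comparing a reference window with its mutated copy is the usual way to score a variant. The sketch below shows a one-hot encoder in A, C, G, T column order with unknown bases (such as N) mapped to all-zero rows, plus a hypothetical `apply_snv` helper that checks the reference base before substituting. The encoding convention follows the public Enformer usage example as far as we know; confirm it against the Enformer instructions before relying on it.

```python
import numpy as np

ALPHABET = "ACGT"  # column order of the one-hot encoding (assumed to match Enformer)


def one_hot_encode(sequence, dtype=np.float32):
    """Encode DNA as a (length, 4) array; bases outside ACGT become all-zero rows."""
    lookup = {base: i for i, base in enumerate(ALPHABET)}
    encoded = np.zeros((len(sequence), len(ALPHABET)), dtype=dtype)
    for row, base in enumerate(sequence.upper()):
        col = lookup.get(base)
        if col is not None:
            encoded[row, col] = 1
    return encoded


def apply_snv(window, offset, ref, alt):
    """Copy of `window` with the base at 0-based `offset` changed from `ref` to `alt`."""
    if window[offset].upper() != ref.upper():
        raise ValueError(f"reference mismatch at {offset}: {window[offset]!r} != {ref!r}")
    return window[:offset] + alt + window[offset + 1:]
```

A batch of one sequence is `one_hot_encode(window)[np.newaxis]`, with shape `(1, length, 4)`; the reference and alternate windows can be encoded the same way and passed to the model separately.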

Files:

Additional experiments in supplement

Prerequisites:

To run the CNN model comparison experiment, download all training and validation images of the ImageNet-1k dataset from https://www.image-net.org/ and place them in dataset/imagenet_1k. The folder should be organized as follows:

dataset/imagenet_1k/
    train/
        class1/
            img1
            img2
            ...
        class2/
        ...
    val/
        class1/
            img1
            img2
            ...
        class2/
        ...
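The official ImageNet validation images unpack into a single flat folder, so they have to be sorted into per-class folders before they match the layout above. A quick way to check the result is to count files per class folder and compare the class sets of the two splits. This is a hypothetical helper sketch; it counts every file regardless of extension and only checks the layout, not the image contents.

```python
from pathlib import Path


def count_images(root):
    """Map split -> {class folder name -> number of files} for root/<split>/<class>/<file>."""
    root = Path(root)
    counts = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing folder: {split_dir}")
        counts[split] = {
            class_dir.name: sum(1 for f in class_dir.iterdir() if f.is_file())
            for class_dir in sorted(split_dir.iterdir())
            if class_dir.is_dir()
        }
    return counts


def class_mismatch(counts):
    """Class folders present in only one of the two splits (empty set if consistent)."""
    return set(counts["train"]) ^ set(counts["val"])
```

For the full dataset, `len(count_images("dataset/imagenet_1k")["train"])` should be 1000, and `class_mismatch` should return an empty set.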

This experiment also requires the timm package, which provides the pretrained VOLO model.

To run the chest X-ray experiment, download all X-ray images and expert labels from https://cloud.google.com/healthcare-api/docs/resources/public-datasets/nih-chest. Place the images in dataset/chest_xray/images and the expert labels in dataset/chest_xray. We use the implementation at https://github.com/zoogzog/chexnet to train a DenseNet-121 model.
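A DenseNet-121 trained CheXNet-style is a multi-label classifier, so each image's findings become a 0/1 vector with one entry per finding. The sketch below assumes labels stored as pipe-separated strings (for example `Effusion|Mass` or `No Finding`), as in the "Finding Labels" column of NIH's Data_Entry_2017.csv; the expert label files may use a different layout. The class order is an assumption and must match the training code.

```python
# The 14 ChestX-ray14 findings; this order is an assumption and must match
# the class order used by the training code.
CLASS_NAMES = [
    "Atelectasis", "Cardiomegaly", "Effusion", "Infiltration", "Mass",
    "Nodule", "Pneumonia", "Pneumothorax", "Consolidation", "Edema",
    "Emphysema", "Fibrosis", "Pleural_Thickening", "Hernia",
]


def multi_hot(finding_labels, class_names=CLASS_NAMES):
    """Turn a pipe-separated label string such as 'Effusion|Mass' into a 0/1 list.

    'No Finding' yields an all-zero vector; unrecognized names raise ValueError.
    """
    findings = set(finding_labels.split("|"))
    unknown = findings - set(class_names) - {"No Finding"}
    if unknown:
        raise ValueError(f"unknown findings: {sorted(unknown)}")
    return [1 if name in findings else 0 for name in class_names]
```

Raising on unknown names, rather than silently ignoring them, surfaces spelling differences (such as "Pleural Thickening" versus "Pleural_Thickening") between label sources early.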

Files:

Interactive web interface for better exploration

Once the Reeb network has been computed, the results can be examined not only in traditional figures but also in an interactive web interface we built with the D3 library, which runs in a web browser. To convert the GTDA results into the format the interface expects, call the following function:

```python
from GTDA.GTDA_utils import save_to_json

savepath = "web/"
save_to_json(GTDA_record, nn_model, savepath)
```

This function saves the results to web/reeb_net.js. The interface currently does not support other filenames, because reeb_net.js is hard-coded; this makes it easy to run the interface locally. After calling the function, open web/index.html in a browser to explore the results.
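Writing a `.js` file rather than plain JSON is a common trick for pages opened straight from disk: browsers typically allow a local `<script src>` but block `fetch` of local JSON files. The sketch below illustrates the idea with D3's usual nodes/links graph shape. It is not the actual content of reeb_net.js: `to_d3_graph`, `write_js_payload`, the variable name `reebNet`, and the node fields are all hypothetical.

```python
import json
from pathlib import Path


def to_d3_graph(edges, node_sizes):
    """Hypothetical conversion of an edge list and node sizes to D3's nodes/links shape."""
    nodes = [{"id": i, "size": size} for i, size in enumerate(node_sizes)]
    links = [{"source": u, "target": v} for u, v in edges]
    return {"nodes": nodes, "links": links}


def write_js_payload(data, path, var_name="reebNet"):
    """Write `data` as a JavaScript file assigning it to a global variable.

    The variable name is a placeholder; a page would load the file with a
    <script src="..."> tag and then read the variable.
    """
    text = f"var {var_name} = {json.dumps(data)};\n"
    Path(path).write_text(text, encoding="utf-8")
    return text
```

Since the filename is fixed, one way to keep results from several runs is to copy the whole web/ folder for each run before calling `save_to_json` on the copy.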

Demos:

https://mengliupurdue.github.io/Graph-Topological-Data-Analysis/

Supported operations: