Compressed sparse graph routines (scipy.sparse.csgraph) — SciPy v1.15.2 Manual (original) (raw)

Fast graph algorithms based on sparse matrix representations.

Contents#

connected_components(csgraph[, directed, ...]) Analyze the connected components of a sparse graph
laplacian(csgraph[, normed, return_diag, ...]) Return the Laplacian of a directed graph.
shortest_path(csgraph[, method, directed, ...]) Perform a shortest-path graph search on a positive directed or undirected graph.
dijkstra(csgraph[, directed, indices, ...]) Dijkstra algorithm using Fibonacci Heaps
floyd_warshall(csgraph[, directed, ...]) Compute the shortest path lengths using the Floyd-Warshall algorithm
bellman_ford(csgraph[, directed, indices, ...]) Compute the shortest path lengths using the Bellman-Ford algorithm.
johnson(csgraph[, directed, indices, ...]) Compute the shortest path lengths using Johnson's algorithm.
yen(csgraph, source, sink, K, *[, directed, ...]) Yen's K-Shortest Paths algorithm on a directed or undirected graph.
breadth_first_order(csgraph, i_start[, ...]) Return a breadth-first ordering starting with specified node.
depth_first_order(csgraph, i_start[, ...]) Return a depth-first ordering starting with specified node.
breadth_first_tree(csgraph, i_start[, directed]) Return the tree generated by a breadth-first search
depth_first_tree(csgraph, i_start[, directed]) Return a tree generated by a depth-first search.
minimum_spanning_tree(csgraph[, overwrite]) Return a minimum spanning tree of an undirected graph
reverse_cuthill_mckee(graph[, symmetric_mode]) Returns the permutation array that orders a sparse CSR or CSC matrix in Reverse-Cuthill McKee ordering.
maximum_flow(csgraph, source, sink) Maximize the flow between two vertices in a graph.
maximum_bipartite_matching(graph[, perm_type]) Returns a matching of a bipartite graph whose cardinality is at least that of any given matching of the graph.
min_weight_full_bipartite_matching(biadjacency) Returns the minimum weight full matching of a bipartite graph.
structural_rank(graph) Compute the structural rank of a graph (matrix) with a given sparsity pattern.
NegativeCycleError([message])
construct_dist_matrix(graph, predecessors[, ...]) Construct distance matrix from a predecessor matrix
csgraph_from_dense(graph[, null_value, ...]) Construct a CSR-format sparse graph from a dense matrix.
csgraph_from_masked(graph) Construct a CSR-format graph from a masked array.
csgraph_masked_from_dense(graph[, ...]) Construct a masked array graph representation from a dense matrix.
csgraph_to_dense(csgraph[, null_value]) Convert a sparse graph representation to a dense representation
csgraph_to_masked(csgraph) Convert a sparse graph representation to a masked array representation
reconstruct_path(csgraph, predecessors[, ...]) Construct a tree from a graph and a predecessor list.

Graph Representations#

This module uses graphs which are stored in a matrix format. A graph with N nodes can be represented by an (N x N) adjacency matrix G. If there is a connection from node i to node j, then G[i, j] = w, where w is the weight of the connection. For nodes i and j which are not connected, the value depends on the representation:

As a concrete example, imagine that you would like to represent the following undirected graph:

  G

 (0)
/   \

1 2 /
(2) (1)

This graph has three nodes, where node 0 and 1 are connected by an edge of weight 2, and nodes 0 and 2 are connected by an edge of weight 1. We can construct the dense, masked, and sparse representations as follows, keeping in mind that an undirected graph is represented by a symmetric matrix:

import numpy as np G_dense = np.array([[0, 2, 1], ... [2, 0, 0], ... [1, 0, 0]]) G_masked = np.ma.masked_values(G_dense, 0) from scipy.sparse import csr_array G_sparse = csr_array(G_dense)

This becomes more difficult when zero edges are significant. For example, consider the situation when we slightly modify the above graph:

 G2

 (0)
/   \

0 2 /
(2) (1)

This is identical to the previous graph, except nodes 0 and 2 are connected by an edge of zero weight. In this case, the dense representation above leads to ambiguities: how can non-edges be represented if zero is a meaningful value? In this case, either a masked or sparse representation must be used to eliminate the ambiguity:

import numpy as np G2_data = np.array([[np.inf, 2, 0 ], ... [2, np.inf, np.inf], ... [0, np.inf, np.inf]]) G2_masked = np.ma.masked_invalid(G2_data) from scipy.sparse.csgraph import csgraph_from_dense

G2_sparse = csr_array(G2_data) would give the wrong result

G2_sparse = csgraph_from_dense(G2_data, null_value=np.inf) G2_sparse.data array([ 2., 0., 2., 0.])

Here we have used a utility routine from the csgraph submodule in order to convert the dense representation to a sparse representation which can be understood by the algorithms in submodule. By viewing the data array, we can see that the zero values are explicitly encoded in the graph.

Directed vs. undirected#

Matrices may represent either directed or undirected graphs. This is specified throughout the csgraph module by a boolean keyword. Graphs are assumed to be directed by default. In a directed graph, traversal from node i to node j can be accomplished over the edge G[i, j], but not the edge G[j, i]. Consider the following dense graph:

import numpy as np G_dense = np.array([[0, 1, 0], ... [2, 0, 3], ... [0, 4, 0]])

When directed=True we get the graph:

---1--> ---3--> (0) (1) (2) <--2--- <--4---

In a non-directed graph, traversal from node i to node j can be accomplished over either G[i, j] or G[j, i]. If both edges are not null, and the two have unequal weights, then the smaller of the two is used.

So for the same graph, when directed=False we get the graph:

Note that a symmetric matrix will represent an undirected graph, regardless of whether the ‘directed’ keyword is set to True or False. In this case, using directed=True generally leads to more efficient computation.

The routines in this module accept as input either scipy.sparse representations (csr, csc, or lil format), masked representations, or dense representations with non-edges indicated by zeros, infinities, and NaN entries.