15-750 Grad Algorithms                                        02/13/01

          Global min cuts - Karger's alg - Karger-Stein
          =============================================

Today: Global min cut problem. (Was originally scheduled for later,
but is related to the alg from last time so moved up to now.) Not as
complicated as the alg from last time, but conceptually tricky.

THE PROBLEM:
-----------
Given a connected graph G, we want to find the smallest set of edges
whose removal separates the graph into at least two pieces. For
instance, say this graph represents a network, and we want to know how
``robust'' it is in the sense of the minimum number of links whose
failure can cause the network to become disconnected. [We won't be
worried about the weighted version.] E.g.,

         *---------*
        / |       | \
       *  |       |  *
        \ |       | /
         *---------*

Easy fact: the minimum cut has size at most the minimum degree of any
node.

Later on we'll be talking about network flows and the ``s-t minimum
cut'' problem, where you want a cut that separates two specific
vertices s and t. In fact, for a while, the best algorithm for finding
the global minimum cut was to solve n max-flow problems. (Pick an
arbitrary start node s and try all different t's.) People then worked
on ways to do these n computations together faster than doing them
separately, and things like that. But, more recently, Karger (and then
Karger and Stein) gave a neat randomized alg that finds the global min
cut faster than anyone knows how to find an s-t cut. And the basic
idea is really really simple.

Before giving the algorithm, an important thing to point out: this
will be what's called a "MONTE-CARLO ALGORITHM". On any given trial,
it might succeed (finding the min cut) or might fail (finding a cut
that's too big). We'll prove that Pr(success) > p, for some value p.
We can improve the chance of success by running r times and taking the
best one, to get Pr(success) > 1 - (1-p)^r. For instance, if
r = 5*ln(n)/p, then this is > 1 - 1/n^5. But, we'll never really know
if we failed. In complexity-theoretic terms, this is an RP algorithm
for the question "is there a cut of size <= k?" This is unlike all our
previous randomized algorithms, which succeeded for sure but whose
running time was a random variable.

KARGER'S MIN CUT ALGORITHM (version 1: not fastest)

  1. Start with each vertex in its own component.
  2. Pick a random edge out of all those that go between two different
     components (so now we have one fewer component).
  3. When only two components are left, output (the edges between)
     them as the cut.

Easy algorithm. The plan is to show it has at least a 1/n^2 chance of
working. (Can think of it as: give the edges random lengths and then
run Kruskal until only two trees are left.) A code sketch of this
version appears below, after the analysis.

ANALYSIS
--------
1) At any point in the algorithm, say a cut is ALIVE if it doesn't
split any component chosen so far. So, at the end there is only one
live cut. The size of the minimum live cut can only go UP as the
algorithm progresses, since any cut alive at time i is also alive at
every earlier time j < i.

2) Fix some minimum cut C, and say it has k edges. Every vertex has
degree at least k (else the edges out of that vertex would be a
smaller cut), so the graph has at least nk/2 edges. The probability
that the first edge we pick is in C is therefore at most
k/(nk/2) = 2/n, so the probability that C survives the first step is
at least

   1 - k/(nk/2) = 1 - 2/n = (n-2)/n.

Suppose it's survived so far, and there are now n-i components. Each
component has at least k edges leaving it (else that component would
give a smaller cut), so the prob of survival in this step is at least:

   1 - 2/(n-i) = (n-i-2)/(n-i)

So, the prob that C survives overall is at least:

   Prod_{i=0}^{n-3} (n-i-2)/(n-i)

   = (n-2)/n * (n-3)/(n-1) * (n-4)/(n-2) * ... * 1/3

   = 2/(n(n-1)).

======================================================================

Neat implication: How many minimum cuts are there? A: at most
n(n-1)/2. (Why? Each minimum cut is the one output with probability
at least 2/(n(n-1)), and these events are disjoint for distinct cuts,
so their probabilities can sum to at most 1.)

Running time: can implement in O(m alpha(m)) time, or in terms of n
only, time O(n^2).
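For concreteness, here is a minimal Python sketch of version 1. This
is illustrative code, not the O(m alpha(m)) implementation mentioned
above; the function names and the edge-list input format are my own.
It picks a uniformly random edge and rejects it if both endpoints are
already in the same component, which gives the same distribution as
picking uniformly among the edges between components.

    import random

    def karger_one_trial(n, edges):
        """One run of the contraction algorithm on a connected graph with
        vertices 0..n-1 and edge list `edges`.  Returns the cut it finds."""
        comp = list(range(n))               # comp[v]: parent pointer for v

        def find(v):                        # union-find with path halving
            while comp[v] != v:
                comp[v] = comp[comp[v]]
                v = comp[v]
            return v

        components = n
        while components > 2:
            u, v = random.choice(edges)     # random edge (rejection sampling)
            ru, rv = find(u), find(v)
            if ru != rv:                    # crosses two components: contract
                comp[ru] = rv
                components -= 1
        # the cut = all edges still crossing between the last two components
        return [(u, v) for (u, v) in edges if find(u) != find(v)]

    def karger_min_cut(n, edges, trials):
        """Monte Carlo: repeat the trial and keep the smallest cut seen."""
        return min((karger_one_trial(n, edges) for _ in range(trials)), key=len)

    # E.g., two triangles joined by a single bridge edge (2,3); with a few
    # hundred trials this should almost always print the min cut [(2, 3)].
    print(karger_min_cut(6, [(0,1),(1,2),(0,2),(3,4),(4,5),(3,5),(2,3)], 200))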
But, if we want to have only an epsilon chance of failure, we need to
run it O(n^2 log(1/epsilon)) times. So, if we set epsilon to a
constant, this is O(m n^2 alpha(m)), or O(n^4) in terms of n only.
So, it's simple, but not all THAT fast. How can we speed all this up?

======================================================================

SPEEDING UP THE ALGORITHM [Due to D. Karger and C. Stein]
-------------------------
Claim: earlier steps are less risky than later steps. Why? At what
point is our chance of having lost C about 50/50? Ans: when we have
about n/sqrt(2) components left. By that time, our success
probability is about:

   (n/sqrt(2))^2 / n^2 = 1/2.

So, this suggests a speedup: instead of repeating from the start,
let's repeat from only partway down. E.g., from when we had just
n/sqrt(2) components left.

Can think of this as building a binary tree (where the root has degree
1) of depth 2lg(n), with about n^2 leaves, where each edge is
independently destroyed with prob 1/2. An edge being destroyed
corresponds to losing the minimum cut. What we want is for some
root-to-leaf path to survive. [We proved that the prob of losing the
cut in any one of these contraction phases, given that we hadn't lost
it yet, was at most 1/2, so if we can prove good things for this
idealized process, we will have what we want.]

Q: What's the probability some path to a leaf survives? (Can think of
the strategy of doing n^2 independent repetitions as a root with n^2
branches, each leading, at depth 2lg(n), to a leaf.)

First, what is the running time? Let's just do it as a function of n.
In order to make this efficient, after going partway down, let's
contract components into vertices and keep multi-edges as a single
edge with a weight (and then pick randomly from this weighted
distribution). The point is, this allows us to view the second part
as two recursive calls on a smaller graph. So, our running time is:

   T(n) <= O(n^2) + 2T(n/sqrt(2)) = O(n^2 log n)

Now: we want to show that a path survives with probability at least
about 1/log(n). So, we can just run our algorithm log(n) times to get
a constant prob of success, or log^2(n) times to get high probability.
Then the total time is O(n^2 log^3(n)).

Proof: For a tree of depth d, let P_d be the probability a path
survives. Looking recursively, we get:

   P_d = (1/2)(Prob at least one of the two subtrees has a surviving path)
       = (1/2)(2P_{d-1} - (P_{d-1})^2)
                  (using Pr(A or B) = Pr(A) + Pr(B) - Pr(A and B))
       = P_{d-1} - (1/2)(P_{d-1})^2.

We can now verify by induction that P_d >= 1/(d+1): we have P_0 = 1,
and if P_{d-1} >= 1/d then (since x - x^2/2 is increasing on [0,1]):

   P_d >= 1/d - 1/(2d^2) >= 1/d - 1/(d(d+1)) = 1/(d+1).

One good way of looking at what we've shown is this: we have shown
that we can replace n^2 independent runs of the algorithm by n^2
DEPENDENT runs of the algorithm in a way that drastically reduces the
running time (n^2 * cost-of-one-run versus log(n) * cost-of-one-run),
but only slightly reduces the chance of success (constant versus
1/log(n)).
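To make the recursive structure concrete, here is a rough Python
sketch in the same spirit as the earlier one. The representation (an
adjacency dict with weights standing in for multi-edges), the helper
name contract_to, and the small base case are my own illustrative
choices, and the sketch is not tuned to actually achieve the
O(n^2 log n) bound.

    import math, random

    def contract_to(graph, target):
        """graph: dict v -> {u: weight} (undirected, integer-labeled vertices,
        no self-loops).  Contract randomly chosen edges (picked with prob
        proportional to weight) until `target` components remain."""
        g = {v: dict(nbrs) for v, nbrs in graph.items()}       # work on a copy
        while len(g) > target:
            # pick a weighted random edge (u, v)
            edges = [(u, v, w) for u in g for v, w in g[u].items() if u < v]
            r = random.uniform(0, sum(w for _, _, w in edges))
            for u, v, w in edges:
                r -= w
                if r <= 0:
                    break
            # merge v into u, adding up parallel edges, dropping self-loops
            for x, w in list(g[v].items()):
                g[x].pop(v, None)
                if x != u:
                    g[u][x] = g[u].get(x, 0) + w
                    g[x][u] = g[x].get(u, 0) + w
            del g[v]
        return g

    def karger_stein(graph):
        """One run of the recursion; returns the weight of the cut it finds."""
        n = len(graph)
        if n <= 6:
            # tiny base case: just finish contracting down to two components
            g = contract_to(graph, 2)
            return sum(next(iter(g.values())).values())
        t = math.ceil(1 + n / math.sqrt(2))    # contract down to ~n/sqrt(2)
        # both recursive calls restart from the same partially contracted
        # graph h -- this sharing is the "dependent runs" idea from above
        h = contract_to(graph, t)
        return min(karger_stein(h) for _ in range(2))

As discussed above, running this log^2(n) times and taking the best
answer gives the high-probability guarantee; to recover the cut itself
rather than just its size, one would also track which original
vertices each contracted vertex contains.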