proof of Birkhoff-von Neumann theorem (original) (raw)

First, we prove the following lemma:
Lemma:
A convex combination of doubly stochastic matrices is doubly stochastic.Proof:
Let {Ai}i=1m be a collection of n×n doubly-stochastic matrices, and suppose {λi}i=1m is a collection of scalars satisfying∑i=1mλi=1 and λi≥0 for each i=1,…,m. We claim that A=∑i=1mλi⁢Ai is doubly stochastic.
Take any i∈{1,…,m}. Since Ai is doubly stochastic, each of its rows and columns sum to 1. Thus each of the rows and columns of λi⁢Ai sum to λi.
By the definition of elementwise summation, given matrices N=M1+M2, the sum of the entries in the ith column of N is clearly the sum of the sums of entries of the ith columns of M1 and M2 respectively. A similar result holds for the jth row.
Hence the sum of the entries in the ith column of A is the sum of the sums of entries of the ith columns of λk⁢Ak for each i, that is, ∑k=1mλk=1. The sum of entries of the jth row of A is the same. Hence A is doubly stochastic.
Observe that since a permutation matrix has a single nonzero entry, equal to 1, in each row and column, so the sum of entries in any row or column must be 1. So a permutation matrix is doubly stochastic, and on applying the lemma, we see that a convex combination of permutation matrices is doubly stochastic. This provides one direction of our proof; we now prove the more difficult direction: suppose B is doubly-stochastic. Define a weighted graph G=(V,E) with vertex set V={r1,…,rn,c1,…,cn}, edge set E, where ei⁢j=(ri,ci)∈E if Bi⁢j≠0, and edge weight ω, where ω⁢(ei⁢j)=Bi⁢j. Clearly G is a bipartite graph , with partitions R={r1,…,rn} and C={c1,…,cn}, since the only edges in E are between ri and cj for some i,j∈{1,…,n}. Furthermore, since Bi⁢j≥0, then ω⁢(e)>0 for every e∈E. For any A⊂V define N⁢(A), the neighbourhood of A, to be the set of vertices u∈V such that there is some v∈A such that (u,v)∈E. We claim that, for any v∈V, ∑u∈N⁢({v})ω⁢(u,v)=1. Take any v∈V; either v∈R or v∈C. Since G is bipartite, v∈R implies N⁢({v})⊂C, and v∈C implies N⁢({v})⊂R. Now,

v=ri⟹∑u∈N⁢(ri)ω⁢(ri,u)=∑ei⁢j∈Ej=1,nω⁢(ri,cj)=∑Bi⁢j≠0j=1,nBi⁢j=∑j=1nBi⁢j=1v=cj⟹∑u∈N⁢(cj)ω⁢(u,cj)=∑ei⁢j∈Ei=1,nω⁢(ri,cj)=∑Bi⁢j≠0i=1,nBi⁢j=∑i=1nBi⁢j=1

since B is doubly stochastic. Now, take any A⊂R. We have

| ∑w∈N⁢(A)v∈Aω⁢(v,w)=∑v∈A∑w∈N⁢({v})ω⁢(v,w)=∑v∈A1=|A| | | ------------------------------------------------------ |

Let B=N⁢(A). But then clearly A⊂N⁢(B), by definition of neighbourhood. So

| |N⁢(A)|=|B|=∑w∈N⁢(B)v∈Bω⁢(v,w)≥∑w∈Av∈Bω⁢(v,w)=∑v∈N⁢(A)w∈Aω⁢(v,w)=|A| | | ------------------------------------------------------------------------- |

So |N⁢(A)|≥|A|. We may therefore apply the graph-theoretic version of Hall’s marriage theorem to G to conclude that G has a perfect matching. So let M⊂E be a perfect matching for G. Define an n×n matrix P by

Pi⁢j={1if ⁢ei⁢j∈M0otherwise

Note that Bi⁢j=0 implies Pi⁢j=0: if Bi⁢j=0, then(ri,cj)∉E, so (ri,cj)∉M, which implies Pi⁢j=0. Further, we claim that P is a permutation matrix:
Let i be any row of P. Since M is a perfect matching of G, there exists e0∈M such that ri is an end of e0. Let the other end be cj for some i; then Pi⁢j=1.
Suppose ii,i2∈{1,…,n} with i1≠i2 and Pi1,j=Pi2,j=1 for some j. This implies (ri1,cj),(ri2,cj)∈M, but this implies the vertex i is the end of two distinct edges, which contradicts the fact that M is a matching.
Hence, for each row and column of P, there is exactly one nonzero entry, whose value is 1. So P is a permutation matrix. Define λ=mini,j∈{1,…,n}⁡{Bi⁢j∣Pi⁢j≠0}. We see that λ>0 since Bi⁢j≥0, and Pi⁢j≠0⟹Bi⁢j≠0. Further, λ=Bp⁢q for some p,q. Let D=B-λ⁢P. If D=0, then λ=1 and B is a permutation matrix, so we are done. Otherwise, note that D is nonnegative; this is clear since λ⁢Pi⁢j≤λ≤Bi⁢j for any Bi⁢j≠0. Notice that Dp⁢q=Bp⁢q-λ⁢Pp⁢q=λ-λ⋅1=0. Note that since every row and column of B and P sums to 1, that every row and column of D=B-λ⁢P sums to 1-λ. Define B′=11-λ⁢D. Then every row and column of B′ sums to 1, so B′ is doubly stochastic. Rearranging, we have B=λ⁢P+(1-λ)⁢B′. Clearly Bi⁢j=0 implies that Pi⁢j=0 which implies that Bi⁢j′=0, so the zero entries of B′ are a superset of those of B′. But notice that Bp⁢q′=11-λ⁢Dp⁢q=0, so the zero entries of B′ are a strict superset of those of B. We have decomposed B into a convex combination of a permutation matrix and another doubly stochastic matrix with strictly more zero entries than B. Thus we may apply this procedure repeatedly on the doubly stochastic matrix obtained from the previous step, and the number of zero entries will increase with each step. Since B has at most n2 nonzero entries, we will obtain a convex combination of permutation matrices in at most n2 steps. Thus B is indeed expressible as a convex combination of permutation matrices.