Large Very Dense Subgraphs in a Stream of Edges

Subgraphs of Dense Random Graphs with Specified Degrees

Combinatorics, Probability and Computing, 2011

Let d = (d_1, d_2, ..., d_n) be a vector of non-negative integers with even sum. We prove some basic facts about the structure of a random graph with degree sequence d, including the probability of a given subgraph or induced subgraph.

Concentration and regularization of random graphs

Random Structures & Algorithms, 2017

This paper studies how close random graphs are typically to their expectations. We interpret this question through the concentration of the adjacency and Laplacian matrices in the spectral norm. We study inhomogeneous Erdős-Rényi random graphs on n vertices, where edges form independently and possibly with different probabilities p_ij. Sparse random graphs whose expected degrees are o(log n) fail to concentrate; the obstruction is caused by vertices with abnormally high and low degrees. We show that concentration can be restored if we regularize the degrees of such vertices, and one can do this in various ways. As an example, let us reweight or remove enough edges to make all degrees bounded above by O(d) where d = max n p_ij. Then we show that the resulting adjacency matrix A concentrates with the optimal rate: ‖A − E A‖ = O(√d). Similarly, if we make all degrees bounded below by d by adding weight d/n to all edges, then the resulting Laplacian concentrates with the optimal rate: ‖L(A) − L(E A)‖ = O(1/√d). Our approach is based on Grothendieck-Pietsch factorization, using which we construct a new decomposition of random graphs. We illustrate the concentration results with an application to the community detection problem in the analysis of networks.
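The degree-capping step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's construction: the function names (`sample_inhomogeneous_graph`, `degrees`, `cap_degrees`) are hypothetical, and the greedy rule for choosing which edges to remove is a simplification that may remove more edges than strictly needed.

```python
import random


def sample_inhomogeneous_graph(probs, rng):
    """Sample an undirected graph where edge {i, j} appears
    independently with probability probs[i][j] (for i < j)."""
    n = len(probs)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < probs[i][j]:
                edges.add((i, j))
    return edges


def degrees(n, edges):
    """Return the degree of every vertex 0..n-1."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg


def cap_degrees(n, edges, cap):
    """Greedy regularization (an assumption of this sketch): scan the
    edges once and drop any edge that still touches a vertex whose
    current degree exceeds `cap`. Afterwards every degree is <= cap."""
    deg = degrees(n, edges)
    kept = set()
    for i, j in sorted(edges):
        if deg[i] > cap or deg[j] > cap:
            deg[i] -= 1
            deg[j] -= 1
        else:
            kept.add((i, j))
    return kept


if __name__ == "__main__":
    rng = random.Random(0)
    n = 300
    d = 3.0  # d = max over i, j of n * p_ij
    probs = [[d / n] * n for _ in range(n)]
    edges = sample_inhomogeneous_graph(probs, rng)
    kept = cap_degrees(n, edges, cap=int(2 * d))
    print("max degree before:", max(degrees(n, edges)))
    print("max degree after: ", max(degrees(n, kept)))
```

The cap of 2d in the usage example is an arbitrary stand-in for the O(d) bound; measuring the spectral norm of the regularized matrix would additionally need a linear-algebra library.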

Identifying sparse and dense sub-graphs in large graphs with a fast algorithm

EPL (Europhysics Letters), 2014

Identifying the nodes of small sub-graphs with no a priori information is a hard problem. In this work, we want to find each node of a sparse sub-graph embedded in both dynamic and static background graphs of larger average degree. We show that, by exploiting the summability over several background realizations of the Estrada-Benzi communicability and the Krylov approximation of the matrix exponential, it is possible to recover the sub-graph with a fast algorithm of computational complexity O(N n). Relaxing the problem to complete sub-graphs, the same performance is obtained with a single background. The worst-case complexity for the single background is O(n log(n)).

Subgraphs in random networks

Physical review. E, Statistical, nonlinear, and soft matter physics, 2003

Understanding the subgraph distribution in random networks is important for modeling complex systems. In classic Erdős networks, which exhibit a Poissonian degree distribution, the number of appearances of a subgraph G with n nodes and g edges scales with network size as ⟨G⟩ ∼ N^(n−g). However, many natural networks have a non-Poissonian degree distribution. Here we present approximate equations for the average number of subgraphs in an ensemble of random sparse directed networks, characterized by an arbitrary degree sequence. We find scaling rules for the commonly occurring case of directed scale-free networks, in which the outgoing degree distribution scales as P(k) ∼ k^(−γ). Considering the power exponent of the degree distribution, γ, as a control parameter, we show that random networks exhibit transitions between three regimes. In each regime, the subgraph number of appearances follows a different scaling law, ⟨G⟩ ∼ N^α, ...
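The Erdős-network scaling quoted above can be recovered by a short counting argument. The following is a sketch, assuming a sparse network with fixed mean degree ⟨k⟩ so that each edge is present with probability p ≈ ⟨k⟩/N; the symbols p and ⟨k⟩ are introduced here for illustration and do not appear in the abstract.

```latex
% Choose the n nodes of the subgraph (about N^n ways), then require
% each of its g edges to be present (probability p each):
\langle G \rangle \;\sim\; N^{n}\, p^{g}
  \;=\; N^{n} \left( \frac{\langle k \rangle}{N} \right)^{g}
  \;=\; \langle k \rangle^{g}\, N^{\,n-g}.

% Two instances:
%   triangle (n = 3, g = 3):      \langle G \rangle \sim N^{0}  (constant in N)
%   two-edge path (n = 3, g = 2): \langle G \rangle \sim N^{1}  (linear in N)
```

Under this assumption, subgraphs with as many edges as nodes stay at a constant count as the network grows, while tree-like subgraphs grow linearly in N.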

Random graphs with arbitrary degree distributions and their applications

Physical review. E, Statistical, nonlinear, and soft matter physics, 2001

Recent work on the structure of social networks and the internet has focused attention on graphs with distributions of vertex degree that are significantly different from the Poisson degree distributions that have been widely studied in the past. In this paper we develop in detail the theory of random graphs with arbitrary degree distributions. In addition to simple undirected, unipartite graphs, we examine the properties of directed and bipartite graphs. Among other results, we derive exact expressions for the position of the phase transition at which a giant component first forms, the mean component size, the size of the giant component if there is one, the mean number of vertices a certain distance away from a randomly chosen vertex, and the average vertex-vertex distance within a graph. We apply our theory to some real-world graphs, including the world-wide web and collaboration graphs of scientists and Fortune 1000 company directors. We demonstrate that in some cases random gra...

Largest sparse subgraphs of random graphs

Electronic Notes in Discrete Mathematics, 2011

For the Erdős-Rényi random graph G_{n,p}, we give a precise asymptotic formula for the size α_t(G_{n,p}) of a largest vertex subset in G_{n,p} that induces a subgraph with average degree at most t, provided that p = p(n) is not too small and t = t(n) is not too large. In the case of fixed t and p, we find that this value is asymptotically almost surely concentrated on at most two explicitly given points. This generalises a result on the independence number of random graphs. For both the upper and lower bounds, we rely on large deviations inequalities for the binomial distribution.

Densification arising from sampling fixed graphs

2008

During the past decade, a number of different studies have identified several peculiar properties of networks that arise from a diverse universe, ranging from social to computer networks. A recently observed feature is known as network densification, which occurs when the number of edges grows much faster than the number of nodes, as the network evolves over time. This surprising phenomenon has been empirically validated in a variety of networks that emerge in the real world and mathematical models have been recently proposed to explain it. Leveraging on how real data is usually gathered and used, we propose a new model called Edge Sampling to explain how densification can arise. Our model is innovative, as we consider a fixed underlying graph and a process that discovers this graph by probabilistically sampling its edges. We show that this model possesses several interesting features, in particular, that edges and nodes discovered can exhibit densification. Moreover, when the node degree of the fixed underlying graph follows a heavy-tailed distribution, we show that the Edge Sampling model can yield power law densification, establishing an approximate relationship between the degree exponent and the densification exponent. The theoretical findings are supported by numerical evaluations of the model. Finally, we apply our model to real network data to evaluate its performance on capturing the previously observed densification. Our results indicate that edge sampling is indeed a plausible alternative explanation for the densification phenomenon that has been recently observed.
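The discovery process described in this abstract can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' exact model: edges of the fixed graph are drawn uniformly at random with replacement, and the names `edge_sampling_trace` and `hub_and_ring` are hypothetical.

```python
import random


def hub_and_ring(num_nodes):
    """Build a small fixed graph: a ring on all nodes plus a hub
    (node 0) linked to every other node. Purely illustrative."""
    edges = set()
    for v in range(num_nodes):
        w = (v + 1) % num_nodes
        edges.add((min(v, w), max(v, w)))
    for v in range(1, num_nodes):
        edges.add((0, v))
    return sorted(edges)


def edge_sampling_trace(edges, num_samples, rng):
    """Sample edges uniformly with replacement (an assumption of this
    sketch) and record, after each draw, how many distinct nodes and
    distinct edges have been discovered so far."""
    seen_nodes, seen_edges = set(), set()
    trace = []
    for _ in range(num_samples):
        u, v = rng.choice(edges)
        seen_edges.add((u, v))
        seen_nodes.update((u, v))
        trace.append((len(seen_nodes), len(seen_edges)))
    return trace


if __name__ == "__main__":
    rng = random.Random(42)
    graph = hub_and_ring(200)
    trace = edge_sampling_trace(graph, 400, rng)
    for step in (10, 50, 100, 200, 400):
        nodes, found = trace[step - 1]
        print(f"after {step:3d} samples: {nodes:3d} nodes, {found:3d} edges")
```

Plotting discovered edges against discovered nodes on log-log axes is one way to look for the densification behaviour the abstract describes; the shape of that curve depends on the degree distribution of the underlying graph.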

Random subgraphs of finite graphs: I. The scaling window under the triangle condition

Random Structures and Algorithms, 2005

We study random subgraphs of an arbitrary finite connected transitive graph G obtained by independently deleting edges with probability 1 − p. Let V be the number of vertices in G, and let Ω be their degree. We define the critical threshold p_c = p_c(G, λ) to be the value of p for which the expected cluster size of a fixed vertex attains the value λV^(1/3), where λ is fixed and positive. We show that for any such model, there is a phase transition at p_c analogous to the phase transition for the random graph, provided that a quantity called the triangle diagram is sufficiently small at the threshold p_c. In particular, we show that the largest cluster inside a scaling window of size |p − p_c| = Θ(Ω^(−1) V^(−1/3)) is of size Θ(V^(2/3)), while below this scaling window, it is much smaller, of order O(ε^(−2) log(V ε^3)), with ε = Ω(p_c − p). We also obtain an upper bound O(Ω(p − p_c)V) for the expected size of the largest cluster above the window. In addition, we define and analyze the percolation probability above the window and show that it is of order Θ(Ω(p − p_c)). Among the models for which the triangle diagram is small enough to allow us to draw these conclusions are the random graph, the n-cube and certain Hamming cubes, as well as the spread-out n-dimensional torus for n > 6.
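The random-subgraph model in this abstract (keep each edge independently with probability p, then look at the largest cluster) is easy to simulate. Below is a minimal sketch on the n-cube, one of the transitive graphs the abstract names; the function names and the union-find approach are choices made for this illustration, and the values of p tried in the usage example are arbitrary rather than derived from the paper's p_c.

```python
import random


def hypercube_edges(m):
    """Edges of the m-dimensional cube on vertices 0 .. 2**m - 1.
    Two vertices are adjacent when they differ in exactly one bit."""
    return [
        (v, v ^ (1 << b))
        for v in range(1 << m)
        for b in range(m)
        if v < (v ^ (1 << b))
    ]


def largest_cluster(num_vertices, edges, p, rng):
    """Keep each edge independently with probability p and return the
    size of the largest connected cluster, using union-find."""
    parent = list(range(num_vertices))
    size = [1] * num_vertices

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        if rng.random() < p:
            ru, rv = find(u), find(v)
            if ru != rv:
                if size[ru] < size[rv]:
                    ru, rv = rv, ru
                parent[rv] = ru
                size[ru] += size[rv]
    return max(size[find(x)] for x in range(num_vertices))


if __name__ == "__main__":
    rng = random.Random(0)
    m = 12  # degree (Omega) is m, number of vertices V is 2**m
    edges = hypercube_edges(m)
    for p in (0.5 / m, 1.0 / m, 2.0 / m):
        biggest = largest_cluster(1 << m, edges, p, rng)
        print(f"p = {p:.4f}: largest cluster has {biggest} of {1 << m} vertices")
```

Running the loop over a finer grid of p values near 1/Ω gives a rough picture of how the largest cluster grows across the transition.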

The Probability of Non-Existence of a Subgraph in a Moderately Sparse Random Graph

Combinatorics, Probability and Computing

We develop a general procedure that finds recursions for statistics counting isomorphic copies of a graph G_0 in the common random graph models 𝒢(n,m) and 𝒢(n,p). Our results apply when the average degrees of the random graphs are below the threshold at which each edge is included in a copy of G_0. This extends an argument given earlier by the second author for G_0 = K_3 with a more restricted range of average degree. For all strictly balanced subgraphs G_0, our results give much information on the distribution of the number of copies of G_0 that are not in large ‘clusters’ of copies. The probability that a random graph in 𝒢(n,p) has no copies of G_0 is shown to be given asymptotically by the exponential of a power series in n and p, over a fairly wide range of p. A corresponding result is also given for 𝒢(n,m), which gives an asymptotic formula for the number of graphs with n vertices, m edges and no copies of G_0, for the applicable range of m. An example...

Provable and Practical Approximations for the Degree Distribution using Sublinear Graph Samples

Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW '18, 2018

The degree distribution is one of the most fundamental properties used in the analysis of massive graphs. There is a large literature on graph sampling, where the goal is to estimate properties (especially the degree distribution) of a large graph through a small, random sample. Estimating the degree distribution of real-world graphs poses a significant challenge, due to their heavy-tailed nature and the large variance in degrees. We design a new algorithm, SADDLES, for this problem, using recent mathematical techniques from the field of sublinear algorithms. The SADDLES algorithm gives provably accurate outputs for all values of the degree distribution. For the analysis, we define two fatness measures of the degree distribution, called the h-index and the z-index. We prove that SADDLES is sublinear in the graph size when these indices are large. A corollary of this result is a provably sublinear algorithm for any degree distribution bounded below by a power law. We deploy our new algorithm on a variety of real datasets and demonstrate its excellent empirical behavior. In all instances, we get extremely accurate approximations for all values in the degree distribution by observing at most 1% of the vertices. This is a major improvement over the state-of-the-art sampling algorithms, which typically sample more than 10% of the vertices to give comparable results. We also observe that the h and z-indices of real graphs are large, validating our theoretical analysis.