OXFORD MASTER SERIES IN STATISTICAL,
COMPUTATIONAL, AND THEORETICAL PHYSICS
OXFORD MASTER SERIES IN PHYSICS

The Oxford Master Series is designed for final year undergraduate and beginning graduate students in physics and
related disciplines. It has been driven by a perceived gap in the literature today. While basic undergraduate physics
texts often show little or no connection with the huge explosion of research over the last two decades, more advanced
and specialized texts tend to be rather daunting for students. In this series, all topics and their consequences are
treated at a simple level, while pointers to recent developments are provided at various stages. The emphasis is on
clear physical principles like symmetry, quantum mechanics, and electromagnetism which underlie the whole of
physics. At the same time, the subjects are related to real measurements and to the experimental techniques and
devices currently used by physicists in academe and industry. Books in this series are written as course books, and
include ample tutorial material, examples, illustrations, revision points, and problem sets. They can likewise be used
as preparation for students starting a doctorate in physics and related fields, or for recent graduates starting research
in one of these fields in industry.

CONDENSED MATTER PHYSICS


1. M.T. Dove: Structure and dynamics: an atomic view of materials
2. J. Singleton: Band theory and electronic properties of solids
3. A.M. Fox: Optical properties of solids, second edition
4. S.J. Blundell: Magnetism in condensed matter
5. J.F. Annett: Superconductivity, superfluids, and condensates
6. R.A.L. Jones: Soft condensed matter
17. S. Tautz: Surfaces of condensed matter
18. H. Bruus: Theoretical microfluidics
19. C.L. Dennis, J.F. Gregg: The art of spintronics: an introduction
21. T.T. Heikkilä: The physics of nanoelectronics: transport and fluctuation phenomena at low temperatures
22. M. Geoghegan, G. Hadziioannou: Polymer electronics

ATOMIC, OPTICAL, AND LASER PHYSICS


7. C.J. Foot: Atomic physics
8. G.A. Brooker: Modern classical optics
9. S.M. Hooker, C.E. Webb: Laser physics
15. A.M. Fox: Quantum optics: an introduction
16. S.M. Barnett: Quantum information

PARTICLE PHYSICS, ASTROPHYSICS, AND COSMOLOGY


10. D.H. Perkins: Particle astrophysics, second edition
11. Ta-Pei Cheng: Relativity, gravitation and cosmology, second edition

STATISTICAL, COMPUTATIONAL, AND THEORETICAL PHYSICS


12. M. Maggiore: A modern introduction to quantum field theory
13. W. Krauth: Statistical mechanics: algorithms and computations
14. J.P. Sethna: Statistical mechanics: entropy, order parameters, and complexity
20. S.N. Dorogovtsev: Lectures on complex networks
Lectures on Complex Networks

S. N. Dorogovtsev

University of Aveiro
and
Ioffe Institute, St Petersburg

CLARENDON PRESS • OXFORD

2010
Preface

This text is a very concise modern introduction to the science of networks, based on lectures which I gave at several universities to students and non-specialists. My aim is to introduce a reader without serious background in mathematics or physics to the world of networks.
The term 'complex networks' is young. It came into use in the late 1990s
when researchers from very distinct sciences—computer scientists, biol-
ogists, sociologists, physicists, and mathematicians—started to inten-
sively study diverse real-world networks and their models. This notion
refers to networks with more complex architectures than, say, a uni-
formly random graph with given numbers of nodes and links. Usually,
in these complex architectures, hubs—strongly connected nodes—play
a pivotal role. In this sense, the great majority of real-world networks
are complex.
The field of complex networks is currently a very hot and attractive
research area. The reader may ask: why all the fuss around networks in
fundamental sciences like physics? I prefer the question: why are net-
works so interesting? The answer is not only the tremendous importance
of the Internet and cellular networks. The point is that the geometry
and structural organization of these and many other networks are very
different from those of other well-studied objects—lattices. Networks
and their function cannot be understood based on theories developed
for finite-dimensional lattices, and a new vision is needed.
On the other hand, random networks are objects of statistical me-
chanics. So the course is essentially based on the standard apparatus of
classical statistical physics. There are already several excellent popular
science books and serious reference volumes on complex networks, in-
cluding books on particular types of networks. The introductory lectures
for beginners fill the existing gap between these two kinds of literature.
The intended audience is mostly undergraduate and postgraduate stu-
dents in physics and other natural science disciplines. There is some risk
that inevitable oversimplification will only create an illusion of under-
standing. I believe however that this illusion is not too dangerous and
may even be stimulating. Moreover, I suggest that the strict selection of
material and discussion of recent results and fresh ideas will make this
thin book useful, even for many specialists in networks. The reader who
needs more detailed information and rigorous derivations can afterwards
refer to more difficult reference books and original papers.
I am deeply indebted to my friends and colleagues in Portugal for
their encouragement and advice, first and foremost to Anna Rozhnova,
Alexander Goltsev, Alexander Povolotsky, Alexander Samukhin, Anto-
nio Luis Ferreira, Fernao Vistulo de Abreu, Joao Gamma Oliveira, Jose
Fernando Mendes, Gareth Baxter, Massimo Ostilli, Rui Americo da
Costa, Zhang Peng, and Sooyeon Yoon. I would like to warmly thank
Adilson Motter, Agata Fronczak, Alessandro Vespignani, Andre Krzy-
wicki, Ayşe Erzan, Bartlomiej Waclaw, Bela Bollobas, Bo Soderberg,
Bosiljka Tadic, Bruno Goncalves, Byungnam Kahng, David Mukamel,
Des Johnston, Dietrich Stauffer, Dmitri Krioukov, Dmitri Volchenkov,
Doochul Kim, Florent Krzakala, Geoff Rodgers, Geoffrey Canright,
Ginestra Bianconi, Hildegard Meyer-Ortmanns, Hyunggyu Park, Jae
Dong Noh, Janos Kertesz, Jorge Pacheco, Jose Ramasco, Kwang-Il Goh,
Laszlo Barabasi, Marián Boguñá, Mark Newman, Maksim Kitsak, Mar-
tin Rosvall, Masayuki Hase, Matteo Marsili, Matti Peltomaki, Michael
Fisher, Michel Bauer, Mikko Alava, Kim Sneppen, Kimmo Kaski, Oliver
Riordan, Olivier Benichou, Pavel Krapivsky, Peter Grassberger, Piotr
Fronczak, Romualdo Pastor-Satorras, Santo Fortunato, Sergey Maslov,
Shlomo Havlin, Sid Redner, Stefan Bornholdt, Stephane Coulomb, Ta-
mas Vicsek, Zdzislaw Burda, and Zoltan Toroczkai for numerous helpful
and stimulating discussions and communications. Finally, my warmest
thanks to the superb editorial and production staff at Oxford University
Press for their invaluable guidance, encouragement, and patience.

Aveiro S.N.D.
July 2009
Contents

1 First steps towards networks 1


1.1 Euler's graph 1
1.2 Examples of graphs 2
1.3 Shortest path length 3
1.4 Lattices and fractals 3
1.5 Milgram's experiment 4
1.6 Directed networks 4
1.7 What are random networks? 5
1.8 Degree distribution 6
1.9 Clustering 7
1.10 Adjacency matrix 8

2 Classical random graphs 9


2.1 Two classical models 9
2.2 Loops in classical random graphs 11
2.3 Diameter of classical random graphs 12
2.4 The birth of a giant component 13
2.5 Finite components 15

3 Small and large worlds 17


3.1 The world of Paul Erdos 17
3.2 Diameter of the Web 18
3.3 Small-world networks 18
3.4 Equilibrium versus growing trees 20
3.5 Giant connected component at birth is fractal 22
3.6 Dimensionality of a brush 23

4 From the Internet to cellular nets 25


4.1 Levels of the Internet 25
4.2 The WWW 28
4.3 Cellular networks 30
4.4 Co-occurrence networks 31

5 Uncorrelated networks 33
5.1 The configuration model 33
5.2 Hidden variables 34
5.3 Neighbour degree distribution 35
5.4 Loops in uncorrelated networks 35
5.5 Statistics of shortest paths 37
5.6 Uncorrelated bipartite networks 38

6 Percolation and epidemics 41


6.1 Connected components in uncorrelated networks 41
6.2 Ultra-resilience phenomenon 43
6.3 Finite-size effects 45
6.4 k-cores 46
6.5 Epidemics in networks 48

7 Self-organization of networks 51
7.1 Random recursive trees 51
7.2 The Barabasi-Albert model 52
7.3 General preferential attachment 53
7.4 Condensation phenomena 57
7.5 Accelerated growth 58
7.6 The BKT transition 59
7.7 Deterministic graphs 60

8 Correlations in networks 61
8.1 Degree-degree correlations 61
8.2 How to measure correlations 62
8.3 Assortative and disassortative mixing 62
8.4 Why are networks correlated? 64
8.5 Degree correlations and clustering 66

9 Weighted networks 67
9.1 The strength of weak ties 67
9.2 World-wide airport network 69
9.3 Modelling weighted networks 70

10 Motifs, cliques, communities 73


10.1 Cliques in networks 73
10.2 Statistics of motifs 74
10.3 Modularity 76
10.4 Detecting communities 78
10.5 Hierarchical architectures 82

11 Navigation and search 83


11.1 Random walks on networks 83
11.2 Biased random walks 85
11.3 Kleinberg's problem 86
11.4 Navigability 88
11.5 Google PageRank 90

12 Traffic 93
12.1 Traffic in the Internet 93
12.2 Congestion 95
12.3 Cascading failures 97
13 Interacting systems on networks 99
13.1 The Ising model on networks 99
13.2 Critical phenomena 101
13.3 Synchronization 102
13.4 Games on networks 108
13.5 Avalanches as branching processes 109

14 Optimization 113
14.1 Critique of preferential attachment 113
14.2 Optimized trade-offs 114
14.3 The power of choice 115

15 Outlook 119

Further reading 121

References 123

Index 133
1 First steps towards networks

A network (or a graph) is a set of nodes connected by links.¹ In principle, any system with coupled elements can be represented as a network, so that our world is full of networks. Specific networks—regular and disordered lattices—were the main objects of study in physics and other natural sciences up to the end of the 20th century. It is already clear, however, that most natural and artificial networks, from the Internet to biological and social nets, by no means resemble lattices. The path to understanding these networks began in St Petersburg in 1735 with a mathematical problem formulated on a very small graph.

¹ The terms 'vertices' and 'edges' are more standard in graph theory.
1.1 Euler's graph

This small undirected graph (Fig. 1.1) with multiple links was considered by the legendary Swiss-born mathematician Leonhard Euler (1707–1783). Young Euler was invited to St Petersburg in 1727 and worked there until his death, with a 25-year break (1741–1766) in Berlin. In 1735 Euler made what is now regarded as the birth point of graph theory: he solved the Königsberg bridge problem. The structure of all possible paths within Königsberg in Euler's time is represented in the form of a graph in Fig. 1.1. The nodes of the graph are separate land masses in old Königsberg, and its links are the bridges between these pieces of land. Could a pedestrian walk around Königsberg, crossing each bridge only once? In other words, is it possible to walk this graph passing through each link only once? Euler proved that such a walk is impossible.²

In graph theory the total number of connections of a node is called its degree (it is sometimes called connectivity in physics). Consequently, Euler's graph has three nodes of degree 3 and one of degree 5. According to another definition, a simple graph does not have multiple links and loops of length 1. Otherwise, the graph is a multi-graph. Thus Euler's graph is a multi-graph. Degree is a local characteristic. Any description of the structure of an entire network or of its parts is essentially based on two notions: a path and a loop. A path is an alternating sequence of adjacent nodes and links with no repeated nodes. A cycle (in graph theory) or a loop (in physics) is a closed path where only the start and end nodes coincide. Note that it is the presence of loops in Euler's graph that makes the Königsberg bridge problem fascinating and profound.

² Much later, in 1873, Carl Hierholzer proved that a walk of this kind exists if and only if every node in a graph has an even number of links.

Fig. 1.1 Euler's graph for the Königsberg bridge problem. The undirected links of this graph are the seven bridges of old Königsberg connecting four separate land masses—nodes: Kneiphof island and the banks of the Pregel river. As Euler proved in 1735, there is no walk on this graph that passes through each link only once.
Fig. 1.2 (a) Fully connected or complete graph. (b) Star. (c) Comb graph. (d) Brush. (e) Petersen graph, which is the (3, 5) cage graph, where 3 is the degree of its nodes and 5 is the length of the shortest cycle. (f) The simplest hypergraph: three nodes interconnected by a single hyperedge.

Graphs without loops—trees—are usually simpler to analyse. For example, a one-dimensional chain is a tree. The numbers of nodes N (which we call the 'network size') and links L of a tree satisfy a simple relation, L = N − 1.

1.2 Examples of graphs

A few simple graphs are shown in Fig. 1.2. A complete graph, Fig. 1.2(a), which is widely used in exactly solvable models in physics, has all nodes interconnected. In a star graph (which is the most compact tree), Fig. 1.2(b), the maximum separation between nodes is 2. Combs and brushes, containing numerous chains, are shown in Figs. 1.2(c) and (d), respectively. Due to the chains, random walks on these graphs essentially differ from those on lattices. The next example—the Petersen graph in Fig. 1.2(e)—is one of the so-called cage graphs. These graphs are regular in the sense that each node in the graph has the same number of connections. A (q, g)-cage is a graph with the minimum possible number of nodes for a given node degree q and a given length g of the shortest cycle. In synchronized systems the cage graph architectures provide optimal synchronization. The notion of a graph can be generalized. In one of the direct generalizations—hypergraphs—generalized links ('hyperedges') connect triples, quadruples, etc. of nodes, see Fig. 1.2(f).

The next two important regular graphs, a Cayley tree and a Bethe lattice, shown in Fig. 1.3, will be extensively discussed in these lectures. These are very different networks. A Cayley tree has a boundary, which contains a finite fraction of all nodes—dead ends—and a centre (a root). A Bethe lattice is obtained from an infinite Cayley tree by formal exclusion of dead ends. As a result, all nodes in a Bethe lattice are equivalent, so there is neither a boundary nor a centre. To get rid of boundaries, physicists often treat these graphs as containing infinite loops.

Fig. 1.3 (a) 3-regular Cayley tree. (b) Bethe lattice. Notice the absence of a border.

Collaboration and many other networks may have not one but two types of nodes. For example, a network of scientific coauthorships contains nodes-authors and nodes-papers. Each scientific paper in this graph is linked to all of its authors. As a result we have a bipartite graph, shown in Fig. 1.4. These networks are actually hypergraphs, where a 'node-paper' together with its connections plays the role of a 'hyperedge'. A one-mode projection of a bipartite graph, explained in Fig. 1.4, is less informative. Many empirical maps of networks are only one-mode projections of as yet unexplored real multi-partite networks.

Fig. 1.4 A bipartite graph (a) and one of its one-mode projections (b).
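As an illustration of how a one-mode projection is built, and of what it discards, here is a minimal sketch; the author and paper names are invented for the example. Two authors become linked in the projection whenever they co-authored at least one paper, and the identity of that paper is lost.

```python
from itertools import combinations

# Toy bipartite data: paper -> list of authors (a hypothetical example).
papers = {
    "paper1": ["Alice", "Bob"],
    "paper2": ["Bob", "Carol", "Dave"],
    "paper3": ["Alice", "Dave"],
}

# One-mode projection onto authors: link every pair of co-authors of the same paper.
projection = set()
for authors in papers.values():
    for a, b in combinations(sorted(authors), 2):
        projection.add((a, b))

print(sorted(projection))
# [('Alice', 'Bob'), ('Alice', 'Dave'), ('Bob', 'Carol'), ('Bob', 'Dave'), ('Carol', 'Dave')]
```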
1.3 Shortest path length

A distance ℓ_ij between two nodes i and j in a network is the length of the shortest path between them through the network. Two characteristics describe the separation between nodes in an entire network: the mean internode distance in the network and its diameter. The mean internode distance ℓ̄ (also the mean geodesic distance) is the average of ℓ_ij over all those pairs of nodes (i, j) between which there is at least one connecting path. (Note that, in general, networks will contain disconnected parts.) The diameter ℓ_d is the maximum internode distance in a network. We will demonstrate that in many large networks, there is no great difference between these two quantities.

It is the dependence of ℓ̄ or ℓ_d on the network size N that is particularly important for characterization of network architectures. In networks with a compact structure, which we discuss, ℓ̄(N) grows with N more slowly than in looser structures—lattices.
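These definitions are straightforward to compute on a small graph by breadth-first search from every node. The following sketch (the toy adjacency list is my own example) averages ℓ_ij over connected pairs and takes the maximum as the diameter ℓ_d.

```python
from collections import deque

def distances_from(source, adj):
    """BFS distances from `source` to all reachable nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Toy undirected graph as an adjacency list (illustrative only).
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3, 5], 5: [4]}

pair_distances = []
for i in adj:
    d = distances_from(i, adj)
    pair_distances += [d[j] for j in d if j > i]   # each connected pair counted once

mean_distance = sum(pair_distances) / len(pair_distances)
diameter = max(pair_distances)
print(mean_distance, diameter)    # 1.7 and 3 for this toy graph
```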

1.4 Lattices and fractals

In finite-dimensional regular and disordered lattices (these lattices are supposed to have no long-range bonds), the size dependence ℓ̄(N) is power-law,

    ℓ̄ ∼ N^{1/d} .                                  (1.1)

Here d is the dimensionality of a lattice—an integer number. In contrast, fractals (Fig. 1.5 explains this notion) may have non-integer dimensionalities. In fractals, ℓ̄ ∼ N^{1/d_f}, where d_f is called a fractal or Hausdorff dimension. Note that there is no great difference between finite-dimensional lattices and fractals in respect of the dependence ℓ̄(N): they are both 'large worlds'. For example, for a two-dimensional lattice of 10^12 nodes, ℓ̄ ∼ 10^6. Only when d or d_f tend to infinity does this dependence become non-power-law. Note that the fractal dimension can be found even if (large) N is fixed. Simply count the number of nodes n(ℓ) within a distance ℓ from a given node. In fractals (and lattices) this number is n ∼ ℓ^{d_f}.

For the sake of comparison, let us estimate n(ℓ) for the q-regular Bethe lattice and Cayley tree, Fig. 1.3. As is common, instead of the node degree q, we use another number—the branching b = q − 1. Then n = 1 + q(1 + b + b² + … + b^{ℓ−1}) = 1 + q(b^ℓ − 1)/(b − 1), where we assume that n is large. Thus, n ∼ b^ℓ, in contrast to lattices and fractals, and so, for a Cayley tree, we have

    ℓ̄ ≅ ln N / ln b ,                              (1.2)

which grows with N much more slowly than for any finite-dimensional lattice. In this respect, one may say that Cayley trees and Bethe lattices are infinite-dimensional objects—'small worlds'. If, for example, a Cayley tree has 10^12 nodes of degree, say, 5, then ℓ̄ ∼ 10, which is dramatically smaller than in the previous example for a two-dimensional lattice.

Fig. 1.5 This transformation generates a fractal of dimension d_f = ln 6 / ln 2 ≈ 2.585. At each step, every link is substituted by a cluster of six links. So, while the diameter of the graph (here it is the distance from its left node to the right one) is doubled, the total number L of links increases by a factor of six. If i is a step number, then ℓ ∼ ℓ_d = 2^i and L = 6^i, and consequently N ∼ L = ℓ_d^{ln 6 / ln 2}.
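A quick numerical illustration of the contrast between eqn (1.1) and eqn (1.2), using the same numbers as in the text (N = 10^12, d = 2, and a Cayley tree of degree 5):

```python
import math

N = 1e12
print(N ** (1 / 2))                    # two-dimensional lattice, eqn (1.1): ~1e6, a 'large world'
print(math.log(N) / math.log(5 - 1))   # Cayley tree, b = q - 1 = 4, eqn (1.2): ~20, of order 10, a 'small world'
```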
1.5 Milgram's experiment

The small-world phenomenon was first observed in a social network. In 1967 the prominent social psychologist Stanley Milgram (1933–1984) performed a seminal experiment for measuring distances in a network of acquaintances in the United States.³ The question was: how many intermediate social links separate two randomly selected (and geographically separated) individuals?

The idea of the experiment was elegant, see Fig. 1.6. Milgram chose two locations: Omaha, Nebraska and Boston.⁴ A target person was chosen at random in Boston. A large enough number of randomly selected residents of Omaha received a letter with the following instructions:

(i) If you know the target person 'on a personal basis' (his/her name and address were enclosed), send the letter directly to him/her.

(ii) Otherwise mail a copy of this instruction to your 'personal' acquaintance (someone you know on a first name basis) who is more likely than you to know the target person.

An essential fraction of letters approached the target, after passing through only, on average, 5.5 social links, which is a surprisingly small number. This is what is known as the 'six degrees of separation'. One may think that the real shortest path length should be even smaller, since the experiment revealed only a small fraction of all possible chains between starting persons and the target.

It is dangerous, however, to believe sociologists too much: (i) they have to work with poorly defined and subjective material, (ii) they have to use poor statistics. The details of the experiment and the resulting number, the 'six degrees', were criticised,⁵ but nobody denies the essence of Milgram's observation—the impressive smallness of the world of social relations.⁶

³ Milgram's paper in Psychology Today with the results of his experiment was entitled 'The small world problem' [125].

⁴ In fact, Milgram made two attempts. The first one, with starting points in Wichita, Kansas and a target person in Sharon, Massachusetts, resulted in only three finished chains, but the second attempt turned out to be more successful.

⁵ For an intelligent critique of the results of Milgram's experiment see Kleinfeld [113]. Note one of Kleinfeld's arguments: our desire to believe we live in a 'small, small world'.

⁶ 36 years after Milgram, his experiment was repeated on a greater scale by using the modern opportunity of email (Dodds, Muhamad, and Watts 2003) [70]. Volunteers started 24,163 chains aimed at reaching 18 target persons in 13 countries. Only 384 (!) of the chains were completed, which indicates that the global social world is rather disconnected. On the other hand, the successful chains turned out to be an average of about 4 links, i.e. even less than 'six degrees'.

Fig. 1.6 How Stanley Milgram scanned a net of acquaintances in the United States. Notice that some chains of acquaintances were broken off.

1.6 Directed networks

In directed networks, at least some fraction of connections are directed. It seems that the first extensively studied nets of this type were networks of citations in scientific papers.⁷ The nodes of a citation network are scientific papers, and the directed links are citations of one paper within another (Fig. 1.7). New links in the citation networks emerge only between new nodes and already existing ones; new connections between existing nodes are impossible (one cannot update an already published paper). In graph theory networks of this kind are called recursive graphs.

⁷ Price (1965) [152].
Furthermore, all links in a network of citations have the same direction—to older papers. This is valid, of course, for publications in paper form, that is in printed journals and in books. In contrast, papers in many electronic archives may be updated. I can update my old works in the http://arXiv.org electronic archive and change their lists of references to cite more recent papers.⁸ So, some links in the citation networks of these electronic archives may be oppositely directed.

⁸ The http://arXiv.org is one of the largest electronic archives, used mostly by physicists but also mathematicians and computer scientists. Most papers on the statistical mechanics of networks can be found in the cond-mat and physics sections of this archive.

1.7 What are random networks?

Even if we ignore the directedness of connections, the apparently random network in Fig. 1.7 differs from the graphs shown in Figs. 1.2 and 1.3. But then, what is a random network from the point of view of a physicist or a mathematician? Note that, strictly speaking, the notion of randomness is not applicable to a single finite graph. Indeed, by inspecting this finite graph, one cannot find whether it was generated by a deterministic algorithm or by a non-deterministic one. In the spirit of statistical physics, a random network is not a single graph but a statistical ensemble. This ensemble is defined as a set of its members—particular graphs—where each member has its own given probability of realization, that is its statistical weight.⁹ By this definition, a given random network is some graph with one probability, another graph with another probability, and so on. To obtain some quantity characterizing a random network, in principle we should collect the full statistics for all members of the statistical ensemble. To obtain the mean value of some quantity for a random network, we average this quantity over all members of the ensemble—over all realizations—taking into account their statistical weights.¹⁰

Fig. 1.7 Network of citations in scientific papers.

⁹ More rigorously, the statistical weights are proportional to the realization probabilities. However, the proportionality coefficient is an arbitrary constant.

¹⁰ Note the difference between the three kinds of scientist. As a rule, empirical researchers and experimenters collect statistics for a single realization of a random network. Scientists using numerical simulations (computer experiments) investigate a few or a relatively small number of realizations. Theorists consider all, or at least all essential, members of the statistical ensemble of a random network.

The first example of a random graph is a classical random graph model, shown in Fig. 1.8. This is the G_{N,p} or Gilbert model, defined as follows. Take a given number N of labelled nodes, say i = 1, 2, 3, ..., N, and interlink each pair of nodes with a given probability p. If N = 3, this gives eight possible configurations with the realization probabilities shown in the figure. Note that this graph is 'labelled' (has labelled nodes). As in classical statistical mechanics, where particles are distinguishable (i.e., can be labelled), networks are usually considered to be labelled, which is important for the resulting ensemble.

Fig. 1.8 The Gilbert model of a random graph (the G_{N,p} model) for N = 3 with realization probabilities represented for all configurations: (1−p)³ for the empty graph, p(1−p)² for each of the three single-link graphs, p²(1−p) for each of the three two-link graphs, and p³ for the triangle. All graphs in each column are isomorphic, that is they can be transformed into each other by relabelling their nodes.

Physicists divide statistical ensembles into two classes—equilibrium and non-equilibrium—which correspond to equilibrium and non-equilibrium systems. This division is also relevant for random networks. For example, the ensemble presented in Fig. 1.8 is equilibrium—its statistical weights do not evolve. In non-equilibrium (evolving) ensembles, statistical weights of configurations vary with time, and the set of configurations may also vary. Growing networks are obviously non-equilibrium. However, even among networks with a fixed number of nodes, one can find non-equilibrium nets.

Suppose now that the number of nodes in a random network approaches infinity. Then, as a rule, the statistics collected for one member of the ensemble almost surely coincides with the statistics for the entire ensemble—self-averaging takes place. In other words, the relative number of ensemble members with non-typical properties is negligibly small. It turns out that the self-averaging property is very common in disordered systems. So the features of many large, but finite individual graphs can be accurately described in terms of statistical ensembles. It is technically easier for a theoretical physicist to analyse a statistical ensemble than a single graph, and so self-averaging is really useful.
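The ensemble view can be made concrete for N = 3 by brute force. The sketch below is written directly from the definition above (not from any library): it enumerates all 2³ labelled graphs on three nodes, attaches the weight p^L (1 − p)^{3−L} to a member with L links, and averages the number of links over the ensemble.

```python
from itertools import combinations, product

p = 0.5
nodes = [1, 2, 3]
possible_links = list(combinations(nodes, 2))      # the 3 possible links

total_weight = 0.0
mean_links = 0.0
for present in product([0, 1], repeat=len(possible_links)):
    L = sum(present)                               # number of links in this member
    weight = p**L * (1 - p)**(len(possible_links) - L)
    total_weight += weight
    mean_links += weight * L

print(total_weight)   # 1.0: the statistical weights are normalized
print(mean_links)     # 1.5 = 3p: the ensemble-average number of links
```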

1.8 Degree distribution

The degree distribution P(q) is the probability that a randomly chosen node in a random network has degree q:

    P(q) = ⟨N(q)⟩ / N .                            (1.3)

Here ⟨N(q)⟩ is the average number of nodes of degree q in the network, where the averaging is over the entire statistical ensemble. We assume that the total number of nodes in each member of the ensemble is the same, N = Σ_q ⟨N(q)⟩. An empirical researcher, who studies a single graph, say graph g, measures the frequency of occurrence of nodes with degree q in this graph: P_g(q) = N_g(q)/N. Here N_g(q) is the number of nodes of degree q in graph g. This quantity is also usually called a degree distribution. P_g(q) approaches P(q) in the infinite network limit.

The degree distribution is the simplest statistical characteristic of a random network, and it is usually only the first step towards the description of a net. Remarkably, in many situations knowledge of the degree distribution is sufficient for the understanding of a network and the processes taking place on it. In principle, the entire degree distribution is significant: its low- and high-degree parts are important for different network properties and functions. In classical random graphs such as shown in Fig. 1.8, degree distributions decay quite rapidly, P(q) ∼ 1/q! for large q (see the next lecture). All their moments Σ_q qⁿP(q) are finite even as the network size approaches infinity, and so the mean degree ⟨q⟩ = Σ_q qP(q) is a typical scale for degrees. There are practically no strongly connected hubs in these networks.

In contrast, numerous real-world networks, from the Internet to cellular nets, have slowly decaying degree distributions, where hubs occur with noticeable probability and play essential roles. Higher moments of the degree distributions of these networks diverge if we tend the size of the network to infinity. A dependence with power-law asymptotics P(q) ∼ q^{−γ} at large q gives a standard example of a slowly decaying degree distribution.¹¹ The power-law distributions are also called scale-free, and networks with these distributions are called scale-free networks. This term implies the absence of a typical node degree in the network.¹²

¹¹ The value of the moment Σ_q qⁿP(q) here is determined by the upper limit of the sum. In an infinite network, this limit approaches infinity. So, if the exponent γ < n + 1, then the nth and higher moments of the distribution diverge.

¹² More strictly, the term 'scale-free' refers to the following property of a power-law distribution q^{−γ}. A rescaling of q by a constant, q → cq, only has the effect of multiplication by a constant factor: (cq)^{−γ} = c^{−γ} q^{−γ}.
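As a concrete illustration of P_g(q) = N_g(q)/N, the following sketch tabulates the empirical degree distribution of a single graph; the edge list is an arbitrary toy example.

```python
from collections import Counter

# Toy undirected simple graph given as an edge list (illustrative only).
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (3, 5)]
nodes = {u for e in edges for u in e}

degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

N = len(nodes)
N_g = Counter(degree.values())            # N_g(q): number of nodes of degree q
P_g = {q: N_g[q] / N for q in sorted(N_g)}
print(P_g)                                # {2: 0.8, 4: 0.2}
```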

1.9 Clustering

Clustering is about how the nearest neighbours of a node in a network are interconnected, so it is a non-local characteristic of a node. In this respect clustering goes one step further than degree. The clustering coefficient of a node is the probability that two nearest neighbours of a node are themselves nearest neighbours. In other words, if node j has q_j nearest neighbours with t_j connections between them, the local clustering coefficient is

    C_j = t_j / [q_j(q_j − 1)/2] ,                 (1.4)

see Fig. 1.9. When all the nearest neighbours of node j are interconnected, C_j = 1; when there are no connections between them, as in trees, C_j = 0. The number t_j is the total number of triangles—loops of length 3—attached to the node, and so the clustering refers to the statistics of small loops—triangles—in a network. Importantly, most real-world networks have strong clustering.

Fig. 1.9 The clustering coefficient of the central node equals 2/3.

In general, the clustering coefficient of a node depends on its degree. Empirical researchers often present their data on degree-dependent clustering by using an averaged quantity—the mean clustering coefficient of a node of degree q—that is C(q) = ⟨C_j(q)⟩. Two different, less informative integral characteristics of network clustering are traditionally used. The first is the mean clustering C̄ of a network, which is the average of the local clustering coefficient, eqn (1.4), over all nodes, C̄ = ⟨t_j/[q_j(q_j − 1)/2]⟩ = Σ_q P(q)C(q). The second characteristic—the clustering coefficient C of a network, or transitivity—allows one to find the total number of loops of length 3 in the network.¹³ The clustering coefficient of a network is defined as

    C = 3 × (the number of loops of length 3 in a network) / (the number of connected triples of nodes) .   (1.5)

A triple here is a node and two of its nearest neighbours.¹⁴ A 3-loop consists of three triples, which explains the coefficient 3. The denominator gives three times the maximum possible number of loops of length 3. One can easily see that C is also the ratio of the average numerator of expression (1.4) to its average denominator, C ≡ ⟨t_j⟩/⟨q_j(q_j − 1)/2⟩. Compare this with the definition of mean clustering. If C(q) is independent of degree q, then the mean clustering and the clustering coefficient coincide, C̄ = C.

¹³ The notion of clustering was adapted from sociology, where it is usually called transitivity.

¹⁴ One can easily find that the number of connected triples of nodes equals Σ_j q_j(q_j − 1)/2 = N(⟨q²⟩ − ⟨q⟩)/2.
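The definitions (1.4) and (1.5) translate directly into code. The sketch below uses an arbitrary toy graph of my own; it computes the local coefficients C_j, their average (the mean clustering), and the transitivity.

```python
from itertools import combinations

# Toy undirected graph as an adjacency set (illustrative only).
adj = {
    1: {2, 3},
    2: {1, 3, 4},
    3: {1, 2, 4},
    4: {2, 3, 5},
    5: {4},
}

def local_clustering(j):
    """C_j = t_j / [q_j (q_j - 1) / 2], eqn (1.4)."""
    q = len(adj[j])
    if q < 2:
        return 0.0
    t = sum(1 for a, b in combinations(adj[j], 2) if b in adj[a])
    return t / (q * (q - 1) / 2)

# Mean clustering: the average of C_j over all nodes.
mean_clustering = sum(local_clustering(j) for j in adj) / len(adj)

# Transitivity, eqn (1.5): 3 * (# of triangles) / (# of connected triples).
triangles = sum(1 for a, b, c in combinations(adj, 3)
                if b in adj[a] and c in adj[b] and c in adj[a])
triples = sum(len(adj[j]) * (len(adj[j]) - 1) // 2 for j in adj)
transitivity = 3 * triangles / triples

print(mean_clustering, transitivity)   # ~0.533 and 0.6 for this toy graph
```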

1.10 Adjacency matrix

Networks are naturally represented in matrix form. A graph of N nodes is described by an N × N adjacency matrix a whose non-zero elements indicate connections between nodes.¹⁵ For undirected networks, a non-diagonal element a_ij of an adjacency matrix is equal to the number of links between nodes i and j, and so the matrix is symmetric. A diagonal element a_ii is twice the number of loops of length 1 attached to node i. The factor 2 here is clear: each 1-loop plays the role of a double connection for a node. As a result, the degree of node i is q_i = Σ_j a_ij.

Any structural characteristic of a network can be expressed in terms of the adjacency matrix. See, for example, the expression for the total number T of triangles in a graph without 1-loops:

    T = (1/6) Σ_i (a³)_ii = (1/6) Tr a³ .          (1.6)

Here Tr denotes the trace of a matrix—the sum of its diagonal elements.¹⁶ This formula leads to a compact expression for the clustering coefficient.

Numerical calculations with adjacency matrices of large networks require huge memory resources. Fortunately, one can often avoid using adjacency matrices. The point is that real-world networks and their models are typically sparse. That is, the numbers of connections in these networks are much smaller than in complete graphs: L ≪ N², i.e. ⟨q⟩ ≪ N. In 1999, in the WWW, for example, the average number of outgoing and incoming hyperlinks per web page was about eight. Therefore the great majority of matrix elements in the adjacency matrices of these networks are zeros. So, instead of an N × N adjacency matrix, it is better to use a set of N vectors, i = 1, 2, ..., N, where the components of vector i are the labels of the nearest neighbours of node i. This takes up much less memory, ⟨q⟩N ≪ N².

¹⁵ In a random network, each of the members of the statistical ensemble is represented by its own adjacency matrix.

¹⁶ Check that the total number of links in this graph, L = (1/2) Σ_i q_i, equals (1/2) Σ_{i,j} a_ij = (1/2) Tr a².
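A small numerical check of eqn (1.6), together with the sparse neighbour-list representation mentioned above; the toy graph (a triangle plus a pendant node) is my own example.

```python
import numpy as np

# Toy simple graph: a triangle 0-1-2 plus a pendant node 3 (illustrative only).
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
N = 4

# Dense adjacency matrix.
a = np.zeros((N, N), dtype=int)
for i, j in edges:
    a[i, j] = a[j, i] = 1

# Total number of triangles, eqn (1.6): T = Tr(a^3) / 6.
T = np.trace(np.linalg.matrix_power(a, 3)) // 6
print(T)              # 1

# Sparse alternative: one neighbour list per node, ~<q>N entries instead of N^2.
neighbours = [[] for _ in range(N)]
for i, j in edges:
    neighbours[i].append(j)
    neighbours[j].append(i)
print(neighbours)     # [[1, 2], [0, 2], [0, 1, 3], [2]]
```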
2 Classical random graphs

In this lecture we give an insight into the simplest and most studied random networks—classical random graphs.

2.1 Two classical models

In 1951–1952, applied mathematicians Ray J. Solomonoff and Anatol Rapoport published a series of papers in the Bulletin of Mathematical Biophysics, which did not attract much attention at the time [165, 166]. It is in these papers that the G_{N,p} or Gilbert model, as it is now known, of a random graph (see Fig. 1.8) was introduced.¹ Later this basic model was rediscovered by the mathematician E. N. Gilbert (1959). The notation G_{N,p} indicates that this is a statistical ensemble of networks, G, with two fixed parameters: a given number of nodes N in each ensemble member and a given probability p that two nodes have an interconnecting link.

In the second half of the 1950s the outstanding Paul Erdős and Alfréd Rényi introduced another random graph model and actually established random graph theory as a field of mathematics [87, 88]. The Erdős–Rényi random graph is a statistical ensemble whose members are all possible labelled graphs of given numbers of nodes N and links L, and all these members have equal statistical weight. This random graph is also called the G_{N,L} model—a statistical ensemble G of graphs with two fixed parameters for each member of the ensemble: (i) a given number of nodes N and (ii) a given number of links L.² This is a special case of a general construction extensively exploited in the science of networks. The idea of this basic construction is to build an ensemble whose members are all possible graphs satisfying given restrictions, and all these members are realized with equal probability—uniformly randomly. One can say that this is the maximally random network that is possible under the given restrictions. One can also say that a network of this kind satisfies a given constraint but is otherwise random. Figure 2.1 shows an example—a small Erdős–Rényi network of 3 nodes and 1 link. Compare this small ensemble with that of the G_{N=3,p} model in Fig. 1.8 and note a clear difference between these two ensembles. Remarkably, this difference turns out to be negligibly small in sufficiently large sparse networks.

¹ The terms Bernoulli or binomial random graph are also relevant. The term 'binomial' is explained by the binomial form of statistical weights in this model, see Fig. 1.8.

² Mathematicians usually use the notations G_{n,p} and G_{n,M} for the ensembles G_{N,p} and G_{N,L}, respectively.

Fig. 2.1 The Erdős–Rényi random graph of 3 nodes and 1 link, which is G_{N=3,L=1}. All six graphs have equal statistical weight. Draw all configurations of the G_{N=3,L=2} random graph.

Let us discuss this point in more detail. In simple terms, statistical mechanics is based on two kinds of ensembles—canonical and grand canonical ensembles. In all members of the first ensemble, the number of particles is equal and fixed. In the second, grand canonical ensemble, the chemical potential is fixed, and the numbers of particles are different in different ensemble members. In statistical physics, these two ensembles—in our case two ways to define a random system—usually become equivalent as the number of particles approaches infinity. In random graphs, links actually play the role of particles. So the G_{N,L} and G_{N,p} models correspond to the canonical and grand canonical ensembles, respectively. Suppose that N → ∞, and the networks are sparse. Then one can show that these two random networks approach each other if p = L/[N(N − 1)/2].³ In that sense, these two models are so close that they are called together 'classical random graphs' or even simply 'random graphs'.⁴ Moreover, the term 'Erdős–Rényi model' sometimes refers to both of these ensembles. The reader may be surprised that, in contrast to the Gilbert model, the Erdős–Rényi graph contains multiple connections and 1-loops, see Fig. 2.1. Then how can these models be equivalent? The explanation is that multiple connections and 1-loops in a large sparse Gilbert graph are not important—we will see that there are few of them.

³ More precisely, the statistical characteristics of typical members of these two ensembles converge. In particular, a large majority of members in the Gilbert ensemble have only relatively small deviations in the numbers of links from L = pN(N − 1)/2. Note that N(N − 1)/2 is the total number of links in the complete graph of N nodes.

⁴ One can even say that classical random graphs are maximally random networks with a given mean degree ⟨q⟩ of a node.

Physicists know that analysis of the grand canonical ensembles is technically easier than for the canonical ones. In this respect the Gilbert model has an advantage. Let us obtain the degree distribution of this network using intuitive arguments. A node in this random graph can be connected to each of the other N − 1 nodes with probability p. Then combinatorics immediately results in the binomial form of the probability that q of these N − 1 links are present:

    P(q) = C^q_{N−1} p^q (1 − p)^{N−1−q} ,         (2.1)

which is the degree distribution of the finite graph. Here C^q_n = n!/[q!(n − q)!] is the binomial coefficient. One can obtain this exact formula strictly, averaging over the ensemble. The resulting mean degree of a node is ⟨q⟩ = p(N − 1). When the network is large (N → ∞) while ⟨q⟩ is finite (i.e. p → const/N), the binomial distribution (2.1) approaches the Poisson one:

    P(q) = e^{−⟨q⟩} ⟨q⟩^q / q! .                    (2.2)

In this limit the Gilbert and Erdős–Rényi models are equivalent, and so this Poisson distribution is valid for all classical random graphs. The extremely rapid decay of the distribution is determined by the factorial denominator. We have mentioned that the degree distributions of practically all interesting real-world networks decay much more slowly.

Importantly, the degree distributions in these random graphs completely describe their architectures. Indeed, links in these networks are distributed independently. The only restriction is the fixed mean degree of a node. Therefore a node in a classical random graph 'does not know' the statistics of connections of any other node. In that sense, even connected nodes are statistically independent. Random graphs of this kind are called uncorrelated networks, which is one of the basic notions in this field.⁵ We will consider general uncorrelated networks in detail in the following lectures.

⁵ In particular, the absence of correlation means factorization of the joint distributions of random variables. For example, let P(q, q′) be the probability that one node has degree q and another one has degree q′. Then in an uncorrelated network, P(q, q′) = f(q)f(q′), where the function f(q) is completely determined by the degree distribution, see Lecture 5.
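The convergence of the binomial distribution (2.1) to the Poisson distribution (2.2) is easy to see numerically. The sketch below samples one member of the Gilbert ensemble directly from its definition (the size and mean degree are arbitrary choices of mine) and compares the empirical degree frequencies with eqn (2.2).

```python
import math
import random
from collections import Counter

random.seed(1)

N, mean_q = 2000, 5.0
p = mean_q / (N - 1)

# One member of the Gilbert ensemble G_{N,p}: link each pair of nodes with probability p.
degree = [0] * N
for i in range(N):
    for j in range(i + 1, N):
        if random.random() < p:
            degree[i] += 1
            degree[j] += 1

counts = Counter(degree)
print(" q  empirical  Poisson (2.2)")
for q in range(11):
    empirical = counts[q] / N
    poisson = math.exp(-mean_q) * mean_q**q / math.factorial(q)
    print(f"{q:2d}  {empirical:9.4f}  {poisson:13.4f}")
```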
2.2 Loops in classical random graphs

As was mentioned, large sparse classical random graphs have few loops. What does this mean? Let us first discuss small loops. It is easy to find the clustering coefficient and so the total number of loops of length 3 in a large classical random graph. Let the graph be of N vertices with a mean degree ⟨q⟩. Recall that the clustering coefficient of a node is the probability that two nearest neighbours of the node are themselves nearest neighbours. In our case this probability is ⟨q⟩/(N − 1) ≅ ⟨q⟩/N. So the clustering is

    C̄ = C = ⟨q⟩/N .                                (2.3)

Note the equality C̄ = C and the independence of the clustering coefficient of a node from its degree. This is, of course, the case for any uncorrelated network. So the clustering coefficient for an infinite sparse classical random graph approaches zero. That is, clustering in these networks is only a finite-size effect. Let us compare the clustering coefficients of real-world networks with those of the classical random graphs with the same numbers of nodes and links. For example, these days (2009) the map of routers in the Internet contains about half a million nodes—routers—or even more. The Internet grows exponentially, and so this number changes rapidly. The mean number of connections of a node in this network is about 10. The clustering coefficient is about 0.1. For a classical random graph of the same size, with the same mean degree, formula (2.3) gives C ∼ 10⁻⁵, which is four orders of magnitude lower than in the Internet! In the 1990s, when exploration of real-world networks was in its early stages, many empirical studies highlighted this tremendous difference, thus showing how far classical random graph models stray from reality.

Formula (2.3) allows us to find the total number of triangles in a classical random graph:⁶

    N₃ = ⟨q⟩³/6 .                                  (2.4)

That is, the number of triangles in a sparse classical random graph does not depend on its size. This number is finite even if these graphs are infinite. Similarly, one can find the number of loops of an arbitrary length L. This number also does not depend on N: N_L ≅ ⟨q⟩^L/(2L) if L is smaller than the diameter of the network, which we expect to be of the order of ln N. Thus any finite neighbourhood of a node in these random graphs almost certainly does not contain loops. In that sense, these networks are locally tree-like, which is a standard term. We will use this convenient feature extensively. On the other hand, there may be plenty of long loops of length exceeding ln N; namely, ln N_L ∼ N if L ≫ ln N. Obviously, these loops cannot spoil the local tree-like character.

⁶ This expression is obtained in the following way. N₃ = (1/3) C × (number of connected triples), see Section 1.9. The total number of connected triples of nodes is N(⟨q²⟩ − ⟨q⟩)/2. Finally, we use the fact that for the Poisson distribution the second and first moments are related: ⟨q²⟩ − ⟨q⟩ = ⟨q⟩².

One may observe another important object, cliques. A clique is a fully connected subgraph. For example, a loop of length 3 provides us with a 3-clique. Since there are so few loops in these networks, one can see that 3-cliques are the maximum possible. Bigger cliques in large sparse classical random graphs are almost entirely absent.

There is a similar random graph construction, even simpler than the Erdős–Rényi model. This is the random regular graph. All N vertices of this graph have equal degrees. The construction is similar to the Erdős–Rényi model. The random regular graph is a statistical ensemble of all possible graphs with N vertices of degree q, where all the members are realized with equal probability. In other words, this is a maximally random regular graph.⁷ This graph also has loops of various lengths, including loops of length 1, as in the Erdős–Rényi graph. Their number is N_L ≅ (q − 1)^L/(2L), and so these networks also have a locally tree-like structure. In that sense, an infinite random regular graph approaches the Bethe lattice with the same node degree.

⁷ Note that a random regular graph does not belong to the category of classical random graphs.
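A rough numerical illustration of eqn (2.4): the sketch below samples sparse Gilbert graphs of increasing size at fixed mean degree (the sizes are arbitrary choices) and counts their triangles, which stay near ⟨q⟩³/6 rather than growing with N.

```python
import random
from itertools import combinations

random.seed(2)

def sample_gnp(N, mean_q):
    """One member of the Gilbert ensemble with p = <q>/(N-1), as an adjacency set."""
    p = mean_q / (N - 1)
    adj = [set() for _ in range(N)]
    for i, j in combinations(range(N), 2):
        if random.random() < p:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def count_triangles(adj):
    """Each triangle is counted once per member node, hence the division by 3."""
    per_node = sum(sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
                   for nbrs in adj)
    return per_node // 3

mean_q = 4.0
print(mean_q**3 / 6)                           # eqn (2.4) predicts ~10.7 triangles
for N in (500, 1000, 2000):
    print(N, count_triangles(sample_gnp(N, mean_q)))   # fluctuates around ~10.7, roughly independent of N
```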

2.3 Diameter of classical random graphs

Now we can immediately exploit the local tree-like character of simple random networks. We essentially repeat our derivation of the diameter of a Bethe lattice from Section 1.4, the only difference being that, in a random tree, one should substitute the branching b in the derivation by the mean branching b̄ of a link. So the number z_n of the nth nearest neighbours of a node grows as b̄^{n−1}. This exponential growth guarantees that the number of nodes S_n which are not farther than distance n from a given node is of the same order, b̄^n. Of course, this tree ansatz fails when S_n is already close to the size of the network, that is when n is close to the diameter. It turns out, however, that if we ignore the presence of loops even in this range of n and estimate b̄^ℓ̄ ∼ N, we get a non-essential error in the limit of large N. So the resulting mean internode distance is ℓ̄ ≅ ln N / ln b̄ at large N. This result is valid for all uncorrelated networks.

The exponential growth, z_n ∼ b̄^n, has another remarkable consequence. Since z_ℓ̄ ∼ N, a finite fraction of nodes in these networks are at distance ℓ̄ from each other. That is, the width of the distribution of internode distances P(ℓ) in these networks is finite, even if these nets are infinitely large. In other words, nodes in infinite networks are almost surely mutually equidistant. The width δℓ of the distribution P(ℓ) is finite only in specific network models. However, relatively narrow internode distance distributions, δℓ ≪ ℓ̄, are typical for a wide range of small worlds.⁸ (For a rare counterexample, a specific small world without this property, see Fig. 2.2.) This is why the mean internode distance in large networks is close to the diameter (the maximum separation between nodes). Note that mutual equidistance is realized only in very large networks, where the mean separation of nodes is much greater than 1. Even the largest real-world nets (in the WWW, for example, ℓ̄ ∼ 10, which is not that great) still have rather wide internode distance distributions.

Fig. 2.2 A network consisting of two large modules—small worlds—interconnected by a single link has a wide distribution P(ℓ) of internode distances. Indeed, let the mean separations between nodes inside modules 1 and 2 be ℓ̄₁ and ℓ̄₂, respectively. Then two nodes in different modules are separated by approximately ℓ̄₁ + ℓ̄₂ links. So the distribution has three peaks: at ℓ̄₁, ℓ̄₂, and ℓ̄₁ + ℓ̄₂. If, however, the number of connections between the modules is any finite (nonzero) fraction of the total number of links, then all three peaks merge into a single narrow peak.

⁸ For comparison, one can easily estimate that for d-dimensional lattices, δℓ/ℓ̄ ∼ 1 − 2^{−d}, and so this distribution is wide if d is finite.

For a random q-regular graph, the branching equals q − 1, and so readily ℓ̄ ≅ ln N / ln(q − 1). To obtain the diameter of a classical random graph, we must first find the average branching of its links. For this purpose, we will use a remarkably general relation for networks. Consider an arbitrary undirected graph of N nodes—a single realization, not an ensemble. This graph may include bare nodes and may be simple or a multigraph. Let the numbers of nodes of degree q = 0, 1, 2, ..., be N(q). So N = Σ_q N(q), and the frequency of occurrence of nodes of degree q is the ratio N(q)/N. The total degree of the graph (double the number of links) equals Σ_q qN(q) = N⟨q⟩. Clearly, N(q) nodes of degree q attract qN(q) stubs ('halves of links') out of the total number N⟨q⟩. Then the frequency with which end nodes of a randomly chosen link have degree q is qN(q)/(N⟨q⟩). In other words, select a link at random and choose at random one of its end nodes; then the probability that it has q connections equals qN(q)/(N⟨q⟩).

For random networks, this important statement is formulated as follows. In a random network with a degree distribution P(q), the degree distribution of an end node of a randomly chosen link is qP(q)/⟨q⟩, see Fig. 2.3. Therefore the connections of end nodes of links are organized in a different way from those of randomly chosen nodes. We will use this fact extensively in the following sections. In particular, we have ⟨q²⟩/⟨q⟩ for the average degree of an end of a randomly chosen link, which is greater than the mean degree of a node ⟨q⟩ in the network. The mean branching is the average value of the number of connections of an end of a randomly chosen link, minus one—the link itself. So it is b̄ = ⟨q²⟩/⟨q⟩ − 1. Recall that for the Poisson distribution, ⟨q²⟩ − ⟨q⟩ = ⟨q⟩². Therefore in classical random graphs, the average branching is equal to the mean degree, b̄ = ⟨q⟩. Finally we arrive at the famous formula

    ℓ̄ ≅ ln N / ln⟨q⟩                                (2.5)

for the mean separation between nodes in a classical random graph and for its diameter. Interestingly, this formula provides a reasonably good estimate for the mean internode distance in numerous real-world networks, even if they differ strongly from classical random graphs. For example, for the map of routers, with N ∼ 2 × 10⁵ and ⟨q⟩ ∼ 3, formula (2.5) gives ℓ̄ ≈ 11, which is close to the empirical value.

Fig. 2.3 End nodes of a randomly chosen link in a network have different statistics of connections from the degree distribution of this network.
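The estimate ℓ̄ ≅ ln N / ln b̄ with b̄ = ⟨q²⟩/⟨q⟩ − 1 is easy to apply to any degree sequence. In the sketch below, the router-map numbers reproduce the estimate quoted above, and the short degree list is an invented example standing in for empirical data.

```python
import math

def mean_path_estimate(N, degrees):
    """ell ~ ln N / ln(mean branching), with the branching b = <q^2>/<q> - 1."""
    q1 = sum(degrees) / len(degrees)                  # <q>
    q2 = sum(q * q for q in degrees) / len(degrees)   # <q^2>
    branching = q2 / q1 - 1
    return math.log(N) / math.log(branching)

# For a classical random graph the branching reduces to <q>, so eqn (2.5) gives:
print(math.log(2e5) / math.log(3))        # ~11.1, the router-map estimate in the text

# The same estimate applied to an explicit (toy) degree sequence:
toy_degrees = [1, 2, 2, 3, 3, 3, 4, 4, 5, 8]
print(mean_path_estimate(10_000, toy_degrees))
```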

2.4 The birth of a giant component


In our discussion of diameters we missed one thing. In general, a graph
may consist of different, separate parts—clusters. All nodes inside each
of these parts—connected components—have connecting paths running
within them.9 Deriving formula (2.5) for a diameter we actually sup- 9
We assume here that a network is
posed that a random graph consists of a single connected component. undirected.
This is certainly not true when a mean degree hqi approaches zero and
the random graph consists of bare nodes. Already in 1951 Ray Solo-
monoff and Anatol Rapoport had discovered that when hqi exceeds 1,
14 Classical random graphs

infinite classical random graphs contain a large connected component


including a finite fraction of nodes. This is a so-called giant connected
component. Physicists, working in the field of condensed matter, are
familiar with a close analogy—a percolation cluster [169]. Remove at
random a fraction of nodes from an infinite conducting lattice, so that a
fraction p of nodes are retained. Then below some critical value of p the
lattice is split into a set of finite disconnected clusters, and the system is
isolating. On the other hand, a current can flow—‘percolate’—from one
<q> > 1 border of the lattice to another if p is above this critical concentration—
a percolation threshold pc . The current flows on an infinite percolation
cluster of connected nodes, which is quite similar to a giant connected
component. The situation in classical random graphs is as follows, see
Fig. 2.4. When a classical random graph has many links, hqi > 1, it con-
sists of a single giant connected component and, also, numerous ‘finite
<q> < 1 connected components’. On the other hand, if hqi < 1, the giant con-
nected component is absent and there are only plenty of finite connected
components.
This qualitative picture is generic for networks. The general properties of a network are primarily determined by whether or not a giant connected component is present. So the first question about any network should be about the presence of this component. Strictly speaking, a giant connected component is well defined for infinite networks. What about finite nets? We must inspect the dependence on the network size N. If the number of nodes in a connected component grows proportionally to N, then it is treated as 'giant'. In contrast, finite connected components practically do not grow with N.

Fig. 2.4 The organization of a classical random graph. The filled circles show connected components. When the mean degree ⟨q⟩ is above 1, the graph contains a giant connected component, which is absent if ⟨q⟩ < 1.

Solomonoff and Rapoport found that a giant connected component in classical random graphs emerges exactly when the mean degree ⟨q⟩ surpasses 1. This happens without a jump, see Fig. 2.5. In that sense the birth of a giant connected component is a continuous phase transition, where ⟨q⟩ = 1 is the critical point.¹⁰ This is the main structural transition in a network, where network characteristics change dramatically. Note that all these changes take place in the regime of a sparse network, in which the number of connections is low compared to the maximal possible number. Furthermore, Fig. 2.5 demonstrates that a giant connected component approaches the size of a classical random graph while still being in the sparse regime. In particular, the relative size of a giant connected component is already above 99% at ⟨q⟩ = 5. Thus the main qualitative changes in the architecture of networks occur in the narrow region ⟨q⟩ ≪ N. Remarkably, most studied real-world networks are sparse.

¹⁰ In physics, if there is no jump of an order parameter (in our case, the relative size of a giant component) at the critical point, then it is a continuous phase transition.

Fig. 2.5 The relative size S of a giant connected component in a classical random graph versus the mean degree of its nodes. Near the birth point, S ≅ 2(⟨q⟩ − 1).

Returning to formula (2.5) for the diameter, we may conclude that it certainly fails close to the birth point of a giant connected component. We will discuss this special point in detail in the following lectures. On the other hand, when a giant component contains a reasonably large fraction of nodes in a classical random graph, deviations from relation (2.5) are negligible.
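The curve sketched in Fig. 2.5 can be reproduced numerically. The standard self-consistency relation S = 1 − exp(−⟨q⟩S) for the relative size of the giant component of a classical random graph is not derived in the text above, so treat it here as an assumption; the sketch below simply iterates it.

    import math

    def giant_component_size(mean_q, tol=1e-12):
        """Relative size S of the giant connected component of a classical
        random graph, from the self-consistency relation S = 1 - exp(-<q> S).
        Converges to 0 below the critical point <q> = 1."""
        s = 1.0  # start from the fully-connected guess and iterate
        for _ in range(10_000):
            s_new = 1.0 - math.exp(-mean_q * s)
            if abs(s_new - s) < tol:
                break
            s = s_new
        return s_new

    for q in (0.5, 1.1, 2.0, 5.0):
        print(q, round(giant_component_size(q), 4))
    # near <q> = 1 the result follows S ≈ 2(<q> - 1); at <q> = 5, S is above 0.99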

2.5 Finite components


After Erdős and Rényi, mathematicians spent a few decades studying the statistics and structure of connected components in the classical random graphs. In simple terms, the overall picture looks as follows. Let the size N of a classical random graph tend to infinity. We already know about a giant component. What about 'finite' ones? Note these inverted commas. The point is that this standard term may be confusing. It turns out that some 'finite connected components' still grow with N, but extremely slowly, much more slowly than a giant connected component (whose size is proportional to N). Let us first stay away from the critical point ⟨q⟩ = 1, in either of the two phases: in the 'normal phase', without a giant component, or in the phase with a giant component. Then the biggest 'finite' connected component, the second biggest, the third, the i-th biggest, with i being any finite number, all of these components have sizes of the order of ln N. The total number of components grows with N, and so most of the components are really finite as N → ∞. In any case, the sizes of all of these components are much smaller than that of the giant one.

Let us now move to the critical point, where a giant component is still absent. At this point, the biggest connected component, the second, the third, the i-th biggest, with i being any finite number, all of these components are of the order of N^{2/3}.¹¹ It is important that this size, N^{2/3}, is much smaller than N but much bigger than ln N.

¹¹ Rigorously speaking, this is valid within a so-called scaling window, where the deviation from the critical point, |⟨q⟩ − 1|, is less than, say, N^{−1/3}, Bollobás (1984) [39].

The statistics of connected components are remarkably different at the critical point and away from it. In the normal phase and in the phase with a giant component, these distributions have a rapid exponential decay. In contrast, at the critical point, the size distribution of components P(s) decays slowly, as a power law:

  P(s) ∼ s^{−5/2} .   (2.6)

The sum Σ_s sP(s) ∼ Σ_s s·s^{−5/2} converges. Thus the mean size ⟨s⟩ of a finite connected component is finite at any value of ⟨q⟩, including the birth point of a giant connected component. Figure 2.6 (see the lower curve) shows that the finite components are mostly very small—one or two nodes.

We see from Fig. 2.6 that the dependence of ⟨s⟩ on ⟨q⟩ has only a not particularly impressive cusp at the critical point. On the other hand, another, related average demonstrates a real critical singularity. Choose at random a node in a network. What is the probability that this node is in a connected component of s nodes? Clearly, this probability P′(s) is proportional to the product sP(s). At the birth point of a giant connected component, P′(s) ∼ s^{−3/2}, so it has an infinite first moment. Thus the average size ⟨s⟩′ of a finite connected component to which a (randomly chosen) node belongs diverges at the critical point. The upper curve in Fig. 2.6 shows the full dependence of ⟨s⟩′ on ⟨q⟩. In the critical region, ⟨s⟩′ ≅ 1/|⟨q⟩ − 1|, which strongly resembles the famous Curie–Weiss law for susceptibility in physics.¹²

Fig. 2.6 The lower curve: the mean size (number of nodes) ⟨s⟩ of a finite component in a classical random graph versus the mean degree of its nodes. The upper curve: the mean size ⟨s⟩′ of a finite component to which a randomly chosen node belongs as a function of mean degree. ⟨s⟩′ diverges at the critical point as 1/|⟨q⟩ − 1|, while ⟨s⟩ is finite.

¹² The Curie–Weiss law works for interacting systems with a second-order phase transition provided that the mean-field theory is valid. For example, near a critical temperature in ferromagnets the magnetization M(T, H = 0) ∝ √(T_c − T) in a zero applied field. A susceptibility χ(T, H) characterizes the response of the magnetization to a small addition of an applied field: H → H + δH. That is, χ(T, H) ≡ ∂M(T, H)/∂H. According to the Curie–Weiss law, the susceptibility at a zero applied field is χ(T, H = 0) ∝ 1/|T − T_c|.

One may even say that the average ⟨s⟩′ plays the role of 'susceptibility' in graph theory and in percolation problems. Figure 2.7 explains this analogy. Let us add a node to an arbitrary network and link it to a number n of randomly chosen nodes. Suppose first that this number is a finite fraction of the total number of nodes N. Then, thanks to the connections of the 'external' node, we obtain a giant connected component even if this component is absent in the original network. Clearly, this specific attachment plays the role of an applied field which increases the size M = SN of a giant connected component—similarly to the magnetic field changing magnetization in a ferromagnet. We intentionally use the same notation for the size of a giant connected component as for magnetization in physics. Similarly to magnetization (the ferromagnetic order parameter), the size of a giant connected component plays the role of an order parameter in networks. The 'susceptibility' may now be introduced as an increase in the size of a giant component in response to a small addition to the number of 'external' links n → n + δn. The 'zero field susceptibility' is then ∂M(n)/∂n|_{n=0}. One can easily see that this is exactly the average size ⟨s⟩′, since the attachments are to randomly selected nodes.

Fig. 2.7 An additional node attached to randomly chosen nodes in a network increases the giant connected component and so plays the role of an external field. The black spots are connected components in a network.
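Both averages, ⟨s⟩ and ⟨s⟩′, are easy to measure in a simulated classical random graph. A minimal sketch (Python with networkx; the graph size, the ⟨q⟩ values, and the crude way of discarding the giant component are illustrative choices, not the author's procedure):

    import networkx as nx

    def finite_component_averages(n, mean_q, seed=0):
        """Return (<s>, <s>') for the finite components of G(N, p):
        <s>  -- average over components,
        <s>' -- average over nodes (size-weighted), the 'susceptibility'."""
        G = nx.fast_gnp_random_graph(n, mean_q / (n - 1), seed=seed)
        sizes = sorted((len(c) for c in nx.connected_components(G)), reverse=True)
        # Crudely drop the giant component above the transition.
        finite = sizes[1:] if mean_q > 1 else sizes
        mean_s = sum(finite) / len(finite)
        mean_s_prime = sum(s * s for s in finite) / sum(finite)
        return mean_s, mean_s_prime

    for q in (0.5, 0.9, 1.0, 1.1, 1.5):
        print(q, finite_component_averages(100_000, q))
    # <s> stays small for all <q>, while <s>' grows sharply around <q> = 1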
To conclude this lecture, we emphasize that the qualitative picture
described here for classical random graphs is of a surprisingly general
nature going even beyond graph theory and the science of networks. In
this respect, this is a zero model of a random network, but it is not a
toy model.
3 Small and large worlds

3.1 The world of Paul Erdős

The fantastic productivity of Paul Erdős and his travelling life (Erdős had a reputation as a mathematician-pilgrim) resulted, in particular, in an incredible number of coauthors. Over 500 mathematicians had the privilege of being coauthors with Erdős. So Erdős plays the role of a hub in the network of collaborations between mathematicians. Nodes in this net are the mathematicians (authors), and undirected connections are coauthorships in at least one publication. We have already mentioned that this is actually a one-mode projection of a bipartite collaboration network. In total, the network of collaborations between mathematicians includes about 337 000 authors, connected by 496 000 links.

It is a great honour to be Erdős' coauthor, but it is also an honour to be a coauthor of Erdős' coauthor (in reality, this honour is shared by a lot of mathematicians), and so on. These grades of 'closeness' to Erdős are classified by Erdős numbers. The Erdős number of Erdős is 0, that of his coauthors is 1, of coauthors of Erdős' coauthors is 2, etc. In short, the Erdős number is the shortest path length between a mathematician and Erdős. Figure 3.1 shows the numbers of mathematicians with various Erdős numbers. One can easily see that this is actually the distribution of internode distances in this network. Note that this world of Erdős has a crucially different structure from the Erdős–Rényi model. Indeed, a rapidly decreasing Poisson degree distribution does not admit hubs with 500 connections in graphs with such a small number of connections. By using the formula for the Poisson distribution, the reader can find the probability that in a classical random graph with the same numbers of nodes and links as in the mathematics collaboration graph, at least one node has 500 connections. This probability is of the order of 10^{−934}.

Fig. 3.1 The number of mathematicians n(d) with Erdős number d in 2001. This plot was made by using data from the web page of the Erdős Number Project, http://www.oakland.edu/grossman/erdoshp.html.

One can see from Fig. 3.1 that the average distance of a mathematician from Erdős is about 5—only five steps/coauthorships from greatness! For comparison, formula (2.5) gives the mean internode distance 12 for a classical random graph with the same number of nodes and links. Thus this network is even more compact than the classical random graph, though the difference is not dramatic. Note that the distribution in this figure is not narrow. The diameter equals 15, which is three times bigger than the mean distance between nodes. This is in contrast to the mutual equidistance of nodes in infinite small worlds, which we discussed. This is not that surprising, since the mean internode distance is not much greater than 1.
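The spirit of this estimate is easy to reproduce with a direct evaluation of the Poisson tail. A minimal sketch in Python; the union bound over nodes and the exact threshold of 500 are my simplifications, so the exponent comes out near, but not exactly at, the value quoted above.

    import math

    N, L = 337_000, 496_000
    lam = 2 * L / N          # mean degree of the matching classical random graph
    k = 500

    # log10 of the Poisson probability that a given node has exactly k links,
    # then a union bound over all N nodes.
    log10_p_node = (-lam + k * math.log(lam) - math.lgamma(k + 1)) / math.log(10)
    log10_p_any = log10_p_node + math.log10(N)
    print(log10_p_node, log10_p_any)   # both around -900: essentially impossible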

One should note that the described small world of Paul Erdős is a
very typical collaboration network. All these networks are small worlds.
Now, what about the largest directed net—the WWW?

3.2 Diameter of the Web


In 1999 Réka Albert, Hawoong Jeong, and Albert-László Barabási, physicists from the University of Notre Dame, measured what they called 'the diameter' of the WWW [5]. In fact, this was the average length of, what is important, the directed path between two pages of the WWW. Albert, Jeong, and Barabási succeeded in collecting data from a small part of the WWW, namely the nd.edu domain of the University of Notre Dame. Then, how could such limited data be used for obtaining the mean internode distance of the large Web?

The researchers used the following approach. Although the complete map of the nd.edu domain contained only 325 729 pages and 1 469 680 hyperlinks, these numbers were large enough to expect that the organization of connections in the domain is close to that in the entire Web. The architecture of the nd.edu domain was approximately reproduced in a set of even smaller model networks of different sizes. These small networks could easily be generated and studied numerically. The only features of the real WWW they reproduced were the distributions of incoming and outgoing connections of nodes, but this was sufficient.¹ It was easy to measure the mean shortest-directed-path length for each of these mini-Webs. The resulting size dependence 0.35 + 0.89 ln N was extrapolated to the estimated size of the Web in 1999, N ≈ 800 000 000, which gave the mean internode distance 19. In the spirit of our calculations of ℓ in classical random graphs, we can easily estimate the average length of the shortest directed path in a directed network: ℓ_d ≈ ln N / ln⟨q_o⟩. Here ⟨q_o⟩ is the mean out-degree, that is, the mean number of outgoing links of a node.² Substituting the above numbers into this estimate we obtain ℓ_d ≈ 14 for the network of 800 000 000 nodes, which is not very far from the result of Albert, Jeong, and Barabási.

¹ As a model of the WWW these authors used a maximally random network with a given distribution of incoming and outgoing links of nodes.

² By definition, the in-degree q_i of a node is the number of its incoming links, the out-degree q_o is the number of outgoing connections.
Later direct measurements made by a large group of computer scien-
tists from AltaVista, IBM, and Compaq resulted in a close value, 16, for
the part of the WWW (about 200 000 000 pages) observed by the search
engine AltaVista [44]. In other words, a web navigator would have to
make only about 20 ‘clicks’ on average to reach a web page, following
hyperlinks. Thus, in respect of internode distances, the large Web is a
very small object—a small world.
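Both numbers quoted in this section follow from one-line formulas, so they are easy to recompute from the counts given above; a small check in Python (purely illustrative arithmetic):

    import math

    # Numbers quoted in the text for the nd.edu domain and the 1999 Web.
    pages, hyperlinks = 325_729, 1_469_680
    N_web = 800_000_000

    mean_out_degree = hyperlinks / pages                      # <q_o> of the crawled domain
    l_extrapolated = 0.35 + 0.89 * math.log(N_web)            # Albert-Jeong-Barabási fit
    l_random_graph = math.log(N_web) / math.log(mean_out_degree)   # ln N / ln<q_o>

    print(round(mean_out_degree, 2))   # about 4.5
    print(round(l_extrapolated, 1))    # about 19
    print(round(l_random_graph, 1))    # about 14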

3.3 Small-world networks


The collaboration networks and the WWW are only two important examples among an infinity of real networks showing the small-world feature. We also mentioned that the great majority of real-world networks have strong clustering. The values 0.1, 0.2, 0.3, etc. for a clustering coefficient are very typical. This combination—the small-world phenomenon and the strong clustering—seemed to be completely incomprehensible in
the late 1990s, when the only reference model widely used for comparison
was a classical random graph. Unfortunately, classical random graphs
have very weak clustering. Ten years later, it is clear that there exist
plenty of easy ways to get a strongly clustered small world. The nature of
strong clustering (and, in general, of numerous loops) is not considered
to be a serious problem now. Nonetheless, it was the desire to under-
stand the high values of clustering coefficient in the empirical data that
inspired sociologist Duncan Watts and applied mathematician Steven
Strogatz (1998) to propose an original model, interconnecting themes of
networks and lattices [181]. This very popular model has significantly
influenced the development of the field.
The basic idea of Watts and Strogatz was as follows. Suppose somebody, who knows only lattices and the Erdős–Rényi model, wants to construct a network with the small-world feature and numerous triangles. The classical random graphs demonstrate the small-world phenomenon but have few triangles. On the other hand, many lattices have numerous triangles (if their unit cells include triangles of bonds) but the lattices have no small-world feature. Then let us combine a lattice with many triangles and a classical random graph with the small-world feature. The combination of the 'geographically short-range' connections and long-range shortcuts has clear roots in real life: find examples from communications, sociology, etc. Technically, Watts and Strogatz connected pairs of randomly chosen nodes in a lattice by links—'shortcuts', see Fig. 3.2. Thanks to these long-range shortcuts, even nodes widely separated geographically within the mother lattice have a chance to become nearest neighbours. Clearly, the shortcuts make the resulting networks more compact than the original lattice. Watts and Strogatz called networks of this kind small-world networks. This term is used even if a mother lattice is disordered, or it has no triangles. The clustering here is of secondary importance. Indeed, what is so special about loops of length 3? Loops of length 4, 5, etc. are no less important.

Fig. 3.2 The idea of a small-world network. Long-range shortcuts—links connecting randomly chosen nodes—are added to a d-dimensional lattice. Together with the nodes and connections within the mother lattice (which are not shown) these links form a small-world network.

Figure 3.3 (a) shows the original Watts–Strogatz network where randomly chosen links of a mother lattice were rewired to randomly selected nodes. The rewiring produces the same effect as the added long-range shortcuts, Fig. 3.3 (b). Figure 3.3 (b) also explains why we have a superposition of a lattice and a classical random graph. Indeed, if we let the connections within the mother lattice be absent, then the shortcuts and the nodes form a classical random graph. The resulting small-world network is already 'a complex one'.

Fig. 3.3 The original Watts–Strogatz model (a) with rewiring of links [181], and its variation (b) with addition of shortcuts [132]. Notice numerous 3-loops within the mother lattice.

The problem is: what happens when the number of shortcuts increases? It is easy to see that even a single long-range shortcut sharply diminishes the average internode distance ℓ. For example, the first shortcut added to a one-dimensional lattice (as in Fig. 3.3) may reduce ℓ by half! So, the influence of shortcuts on the shortest path lengths in a small-world network is dramatic. Let p be the relative number of shortcuts in a small-world network (with respect to the overall number of connections). Watts and Strogatz observed that even at a very low p, where the clustering is still nearly the same as in the mother lattice, the mean internode distance is tremendously reduced. Figure 3.4 shows schematically these typical dependences of clustering and the average distance between nodes on p in a finite small-world network. Thus even at small (but non-zero) p the network shows the small-world phenomenon.

Fig. 3.4 A sketch of two typical dependences for small-world networks. Mean clustering C versus the relative number of shortcuts p and average internode distance ℓ versus p, in a small-world network with a fixed finite number of nodes. C_L and ℓ_L are the corresponding values for the mother lattice without shortcuts.

One should stress that in these networks there is no sharp transition between two regimes: from large to small worlds. Rather, a smooth crossover is realized as the number of shortcuts grows. In particular, in the limit of an infinite number of nodes N, we obtain a 'large world' if the number of shortcuts N_s is finite, and a small world if the relative number of shortcuts p is finite. Let us estimate the average distance between nodes, ℓ(p). We assume that N → ∞ and p is finite, that is, the number N_s ∼ pN of shortcuts is large enough—the small-world regime. In this case the shortcuts determine the global architecture of the network, and the network resembles a classical random graph with N_s links and of the order of N_s 'supernodes'. These 'supernodes' are regions of the mother lattice which contain neighbouring ends of shortcuts. To this graph we can apply the classical formula (2.5), that is ℓ ∼ ln N, with two changes. In our estimation we ignore constant factors and for the sake of simplicity suppose that the network is based on a one-dimensional mother lattice—a chain. In the resulting network, the average distance between ends of different shortcuts is d ∼ 1/p ∼ N/N_s. Then, (i) replace N by N_s; (ii) take into account that the shortest paths in the network pass along the mother lattice between ends of different shortcuts, so multiply the classical expression by d. This gives the result:

  ℓ(p) ∼ d ln N_s ∼ (1/p) ln(N p) .   (3.1)

Thus the diameter of this small-world network is about N_s / ln N_s ≫ 1 times smaller than that of the mother lattice.
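The crossover summarized by eqn (3.1) is easy to see in a simulation of the 'added shortcuts' variant of Fig. 3.3 (b). A minimal sketch (Python with networkx; the ring size, the number of nearest neighbours, and the p values are arbitrary choices):

    import random
    import networkx as nx

    def small_world_from_chain(n, p, seed=0):
        """Ring lattice (each node linked to its 2 nearest neighbours on each side)
        plus roughly p * (number of lattice links) random long-range shortcuts."""
        rng = random.Random(seed)
        G = nx.circulant_graph(n, [1, 2])          # the bare mother lattice
        n_shortcuts = int(p * G.number_of_edges())
        for _ in range(n_shortcuts):
            u, v = rng.sample(range(n), 2)
            G.add_edge(u, v)                        # add a shortcut
        return G

    n = 2000
    for p in (0.0, 0.001, 0.01, 0.1):
        G = small_world_from_chain(n, p)
        print(p, round(nx.average_shortest_path_length(G), 1),
              round(nx.average_clustering(G), 2))
    # ℓ drops from roughly n/8 to a few tens already at p ~ 0.01,
    # while the clustering stays close to that of the mother lattice (0.5)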
We can obtain relation (3.1) in a slightly different way. The point is that often, large small-world networks may be treated as locally tree-like. Look at Fig. 3.5 which shows a small-world net based on a simple chain. One can see that as N → ∞, any finite environment of an arbitrary node is tree-like. The figure also shows this local structure. The reader can apply the arguments which we used in Section 2.3 to this tree-like structure, estimate the mean branching, and rederive result (3.1).

Fig. 3.5 This small-world network has locally a structure of a tree.

3.4 Equilibrium versus growing trees
Up to now we have discussed equilibrium random networks although
most real networks are non-equilibrium. The difference between net-
works from these two classes is sometimes great. For demonstration
purposes we compare here two important random trees—equilibrium
and growing. These two models are selected to be as close to each other as possible, the only difference being that one tree is equilibrium and the second is growing.

Let us first introduce recursive networks. A recursive network grows in the following way. Add a new node and attach it to a number of already existing nodes. Then repeat again, and again, and again. Figure 3.6 (a) explains this growth process (the bubble denotes an existing network). The nodes for attachment are selected by rules defined in specific models. Notice that in this growth, new links cannot emerge between already existing nodes. This is, of course, a great simplification, which makes these networks easy for analysis. Most growth models that we will discuss are of this type. Let us now suppose that in the recursive process each new node has only one link, and the growth starts from a single node, see Fig. 3.6 (b). Then this network is a recursive tree (it has no loops). Here the attachment is also a sequential birth of new nodes by existing ones. One should specify how a node for attachment is selected. The simplest rule is to choose it from among the existing nodes with equal probability, that is uniformly at random. This gives a random recursive tree as it is known in graph theory. This construction provides us with a maximally random tree with a given number of nodes under only one restriction: this tree is obtained by sequential addition of labelled nodes—the causality restriction. The random recursive tree is used as a reference model for growing networks. We will consider this important random graph in more detail in Section 7.1. Note that its degree distribution significantly differs from that of classical random graphs. For large recursive trees, the degree distribution approaches the exponential form: P(q) = 2^{−q}.

Fig. 3.6 How recursive networks grow. (a) The growth of a loopy recursive network. Each new node is attached to a few chosen existing nodes. Note that new connections are only between an added node and already existing nodes. (b) The growth of a recursive tree. (We assume that the growth starts from a single node.)
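A random recursive tree takes only a few lines to grow, so the exponential degree distribution P(q) = 2^{−q} can be checked directly, together with the mean internode distance discussed just below. A minimal sketch (Python with networkx; the tree size and the number of sampled pairs are arbitrary choices):

    import math
    import random
    from collections import Counter
    import networkx as nx

    def random_recursive_tree(n, seed=0):
        """Grow a random recursive tree: node t attaches to a uniformly chosen earlier node."""
        rng = random.Random(seed)
        T = nx.Graph()
        T.add_node(0)
        for t in range(1, n):
            T.add_edge(t, rng.randrange(t))
        return T

    n = 20_000
    T = random_recursive_tree(n)

    # Degree distribution: should approach P(q) = 2**(-q).
    counts = Counter(d for _, d in T.degree())
    for q in range(1, 6):
        print(q, round(counts[q] / n, 3), round(2.0**(-q), 3))

    # Mean internode distance over sampled pairs, to compare with ~2 ln N.
    rng = random.Random(1)
    pairs = [(rng.randrange(n), rng.randrange(n)) for _ in range(200)]
    mean_l = sum(nx.shortest_path_length(T, u, v) for u, v in pairs) / len(pairs)
    print(round(mean_l, 1), round(2 * math.log(n), 1))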
It is an easy exercise to find that the mean internode distance in the tree is ℓ(N) ≅ 2 ln N, if the number N of nodes is large.³ Thus this random tree is a small world. Let us compare the random recursive tree with an equilibrium tree. By definition, this is a maximally random tree with a given number of nodes N. Compare with the definition of the Erdős–Rényi model. In other terms, this statistical ensemble contains all possible trees of N nodes taken with equal statistical weight. Figure 3.7 shows the members of both of these ensembles for a few small values of N.

³ Compare the total lengths of the shortest paths between all pairs of nodes in this network at 'times' N and N + 1. These are ℓ(N)N(N − 1)/2 and ℓ(N + 1)(N + 1)N/2, respectively. The difference due to the new attached node is 1 + [ℓ(N) + 1](N − 1). Together these three terms give an equation for ℓ(N), which can easily be solved.

Fig. 3.7 The random recursive tree versus the equilibrium tree. The upper and lower rows show the members of these two statistical ensembles at N = 1, 2, 3, 4. Each of them has a unit statistical weight. At N = 4 we indicate the number of configurations which differ from each other only by labels.

One can see that in the random recursive tree some configurations are absent. For example, there is no configuration 2—3—1 (equivalent to 1—3—2). Already at N = 4, the equilibrium tree has a larger mean internode distance than that of the growing one (compare the weights of more and less compact configurations in the figure). As N increases, this difference becomes vast. One can show that the diameter of the equilibrium random tree scales as √N at large N. Therefore it is not a small world but rather a random fractal with Hausdorff dimension 2. Interestingly, the degree distribution of this network is close to that of classical random graphs; in this case it is simply factorial: P(q) ∝ 1/(q − 1)!. Nonetheless, one of these networks is a small world and the second a 'large' one.

In Fig. 3.8, the readers can see how different are the neighbourhoods of a randomly chosen node in a random recursive tree (a) and in an equilibrium tree (b). The figure shows the dependences of the average degree of the n-th nearest neighbour of a node on n. Clearly, ⟨q⟩(n = 0) coincides with the average degree ⟨q⟩ of nodes in a network. This average degree approaches 2 in any infinite tree. A significant difference is seen for n ≥ 1. In this range, for the small-world phenomenon the mean branching must be greater than 1. So in the recursive random trees ⟨q⟩(n) > 2. In fact, the dependence, schematically presented in Fig. 3.8 (a), is valid for any small world. Of course, ⟨q⟩ in loopy networks will be greater than 2. As for the equilibrium trees, the mean branching is close to 1 even in a very far neighbourhood of a node.

Fig. 3.8 The average degree of the n-th nearest neighbour of a randomly chosen node as a function of n for two networks: for a particular small world—a recursive tree (a), and for an equilibrium tree—a fractal (b).

3.5 Giant connected component at birth is fractal
This ‘fractal appearance’ of the equilibrium trees has remarkable con-
sequences even for loopy networks with the small-world property. The
point is that we can find equilibrium trees within loopy small worlds.
For an arbitrary connected graph, one can construct a tree which spans
over the entire graph. This subgraph, the spanning tree, by definition,
consists of all nodes of a given graph and of some of its links. Clearly, if a
given graph is not a tree, then it has a number of different spanning trees.
In principle, this number may be very large. A uniform (or random)
spanning tree of a given graph is a statistical ensemble whose members
are all spanning trees of this graph, taken with equal probability. In sim-
ple terms, this is the maximally random spanning tree of a given graph.
One may suppose that this random tree is equilibrium. Furthermore,
it turns out that the uniform spanning trees of classical random graphs
are fractals of Hausdorff dimension 2 [99]. In the preceding section we
explained how to make a small world from a ‘large one’ by using long-
range shortcuts. Contrastingly, the uniform spanning trees enable us to
make a large world from a small one.
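Uniform spanning trees can be sampled with Wilson's loop-erased random walk algorithm (a standard method, not discussed in the text), which makes the 'large world inside a small world' easy to see numerically. A minimal sketch in Python with networkx; the graph and its parameters are arbitrary choices:

    import random
    import networkx as nx

    def wilson_uniform_spanning_tree(G, seed=0):
        """Sample a uniform spanning tree of a connected graph G by Wilson's
        loop-erased random walk algorithm."""
        rng = random.Random(seed)
        nodes = list(G)
        in_tree = {nodes[0]}
        T = nx.Graph()
        T.add_node(nodes[0])
        for start in nodes[1:]:
            if start in in_tree:
                continue
            # Random walk from `start` until the tree is hit, recording successors.
            nxt = {}
            u = start
            while u not in in_tree:
                v = rng.choice(list(G[u]))
                nxt[u] = v           # later visits overwrite earlier ones: loop erasure
                u = v
            # Add the loop-erased path to the tree.
            u = start
            while u not in in_tree:
                in_tree.add(u)
                T.add_edge(u, nxt[u])
                u = nxt[u]
        return T

    G = nx.gnp_random_graph(3000, 4.0 / 2999, seed=2)
    G = G.subgraph(max(nx.connected_components(G), key=len)).copy()
    T = wilson_uniform_spanning_tree(G)
    print(nx.average_shortest_path_length(G))   # small world: a few hops
    print(nx.average_shortest_path_length(T))   # its uniform spanning tree: much larger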
Even though the classical random graphs are loopy networks, they
contain plenty of trees. Note that classical random graphs are only
locally tree-like. So their finite components are (almost surely) trees,

but a giant connected component is loopy—it has long loops. Recall our
estimation of the diameter of a classical random graph in Section 2.3.
We emphasize here that the result ℓ ≈ ln N/ lnhqi was obtained for an
equilibrium, locally tree-like network which, however, has many long
loops. We mentioned that this estimate is valid only sufficiently far
from the birth point of a giant connected component. Suppose that
we are approaching the birth point hqi = 1 from above, and so a giant
component increasingly resembles finite ones. So, it has fewer and fewer
loops. In the limit hqi → 1, the giant component becomes an equilibrium
tree. One can show that it has a fractal architecture, and its Hausdorff
dimension equals 2, as is natural for random trees of this kind. Thus in
the limit hqi → 1, the classical random graphs are fractals!
We can easily reach this fractal state by removing at random nodes
or links in a graph, until a giant connected component disappears. In a
large but still finite random graph, one can also make the following op-
eration. Choose at random a node within a giant connected component.
Then remove this node, and all of its neighbours closer than distance n
from it. When n is sufficiently small, we will get a hole in the original
giant connected component. With growing n, this hole increases, and
at some critical value, it splits a giant component into a set of finite
ones, see Fig. 3.9. Close to this point, the vanishing giant component
again turns out to be a two-dimensional fractal [161]. In that sense, the
remote part of a classical random graph with a removed ‘centre’ has a
fractal structure.

Fig. 3.9 The remnant of a random graph after the removal of a large environment of a node (open dot). A giant connected component on the verge of vanishing has a fractal architecture.

3.6 Dimensionality of a brush

We introduced small worlds as objects with an infinite fractal or Hausdorff dimension. Recall how the Hausdorff dimension was defined in terms of the size dependence of the mean internode distance: ℓ(N) ∼ N^{1/d}, that is d = ln N / ln ℓ. One should note that this definition of the dimensionality of a system is not unique. Let us discuss an alternative way using a random walk on a network. A random walk here plays the role of a useful instrument which allows us to characterize the structure of a network. Consider a particle which at each time step, with equal probability, moves from a node to one of its nearest neighbours. After t steps of a random walk on a regular d-dimensional lattice, the particle will typically be found at a distance of the order of √t from the starting point. So the 'area' of the region where the particle may be found at time t is about t^{d/2}. This is simply the number of nodes in the hyperball of radius √t. Then the probability that the particle will be found at the starting point after t steps is p₀(t) ∼ t^{−d/2}. This can be used as another definition for the dimensionality. Remarkably, for some objects, this definition results in a dimension which differs from the Hausdorff one. So it is natural to introduce a special number, a spectral dimension d_s, defined by the relation p₀(t) ∼ t^{−d_s/2}.⁴ Actually, d_s shows that random walks in a given network are organized similarly to those on d_s-dimensional lattices.

⁴ We will explain why it is called 'spectral' in future lectures.
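The spectral dimension can be estimated from the decay of the return probability of simulated random walks. A minimal sketch (Python with networkx); the lattice, the number of walks, and the crude two-point fit are illustrative choices, and on a two-dimensional lattice the result should come out near d_s = 2, up to statistical noise:

    import math
    import random
    import networkx as nx

    def return_probability(G, t_max, walks=20_000, seed=0):
        """Estimate p0(t): the probability that a simple random walk is back at
        its starting node after t steps."""
        rng = random.Random(seed)
        nodes = list(G)
        neigh = {u: list(G[u]) for u in nodes}
        counts = [0] * (t_max + 1)
        for _ in range(walks):
            start = u = rng.choice(nodes)
            for t in range(1, t_max + 1):
                u = rng.choice(neigh[u])
                if u == start:
                    counts[t] += 1
        return [c / walks for c in counts]

    # Periodic 2D lattice: the spectral dimension should come out close to 2.
    G = nx.grid_2d_graph(60, 60, periodic=True)
    p0 = return_probability(G, t_max=200)

    # The lattice is bipartite, so returns happen only at even t; fit p0(t) ~ t^(-ds/2)
    # from two well-separated even times (a crude two-point estimate).
    t1, t2 = 20, 200
    ds = -2 * math.log(p0[t2] / p0[t1]) / math.log(t2 / t1)
    print(round(ds, 2))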

A difference between spectral and Hausdorff dimensions may be illus-


trated by combs and brushes, see examples in Fig. 1.2 (c) and (d). Let a
brush be based on a large d-dimensional lattice with long hairs—linear
chains—growing from some of the nodes of a lattice. The Hausdorff di-
mension of this brush is d+1. As for its spectral dimension, it was found
that ds = 1+d/2 when d is not higher than 4, and ds = 3 for d ≥ 4 [107].
One can even use a network with the small-world feature and attach long
linear chains to its nodes, and again, the spectral dimension is only 3.
4 From the Internet to cellular nets

A wide range of real-world networks from diverse areas have surprisingly similar architectures. In this lecture we discuss and compare structural properties of a few basic man-made and natural networks. Before starting, we emphasize the principal difference between two global networks—the Internet and the World Wide Web (WWW).

(i) The Internet is a global technological network of connected comput-


ers through which users can access data and programs from other
computers. Links in this network can be wired or wireless.
(ii) The WWW is a global information network, an array of web doc-
uments (files of various formats), connected by hyperlinks. The
hyperlinks are mutual references in web documents.

Even though most impressive, the WWW is only one of many Internet
applications. Another one, for example, is email.

4.1 Levels of the Internet


The first distributed computer network, ARPANET, was constructed at the end of 1969.¹ Originally ARPANET linked only four nodes: the University of California at Los Angeles, Stanford Research Institute, the University of California at Santa Barbara, and the University of Utah. This pioneering, US national net afterwards grew into the Internet by interconnecting with other networks [53]. The key idea was to build the Internet as a federation of interconnected autonomous (independently managed), peer networks of very different types and architectures. The routing of data packets within the peer networks is maintained by their individual internal rules—protocols, while routing between these networks is performed by common internetwork routing protocols.² Thus, based on 'hierarchical routing', Internet technology was substantially formed by the middle of the 1980s. By 1991, the Internet included 700 000 host computers, and, in principle, approached the modern form.

¹ The US Defense Advanced Research Projects Agency (DARPA) played a great role in the history of networking and initiated the 'Internetting' program in 1972. The organization of ARPA (1958, renamed DARPA in 1972) was initiated in response to the launch of Sputnik on October 4, 1957 by the Soviet Union.

² A protocol is an algorithm, a standard, a set of formal instructions and rules; see in more detail in Lecture 12.

The specific organization of the Internet as a network of numerous autonomous networks, without a central authority, is apparently optimal and inevitable. 'The Internet is the first computational artifact that was not designed by one economic agent, but emerged from the distributed, uncoordinated, spontaneous interaction (and selfish pursuits) of many.
Today's Internet consists of over 12,000 subnetworks ("autonomous systems"), of different sizes, engaged in various, and varying over time, degrees of competition and collaboration.'³ This long quotation touches upon several key aspects of the evolving architecture of the Internet. Let us consider this multilayer architecture in more detail. The Internet includes hosts (computers of users), servers (computers or programs providing a network service), and routers, arranging the traffic. The total number of hosts (including handheld devices) in the Internet was about 570 million in July 2008 and will probably reach 3 billion (3 000 000 000) by 2011 [1]. Figure 4.1 shows schematically the multilayer architecture of the Internet. The complete structure of the Internet including all host computers has never been investigated. Routers with their undirected interconnections form the router-level network in the Internet. The autonomously administered subnetworks (autonomous systems, AS) in this network are the nodes of the second, AS-level graph. Routing between autonomous systems is maintained by the common Border Gateway Protocol. (A gateway is a system (software or a device) that joins two networks together.) In the AS graph, two autonomous systems are connected by a single undirected link if at least two of their routers are directly connected. Because of the exponentially rapid growth, it is even hard to estimate the present sizes of these two networks. Very approximately, the Internet contained 40–70 thousand autonomous systems in 2009. The number of routers was higher, say, by 15–25 times.

³ From the paper 'On a network creation game' by Fabrikant et al. (2003) [90].

Fig. 4.1 Multilayer organization of the Internet. Open dots show host computers of users, LAN is a local area network, filled dots are routers, and ASs are autonomous systems.

The routers in the Internet have geographical locations. It turns out, however, that usually it is hard to find their precise coordinates. As for the autonomous systems, some of them contain routers distributed all over the world, and any geographical mapping is impossible. This is why the majority of studies have to ignore the geographical factor in Internet architecture. The statistics of connections in the Internet remained unstudied until the end of the 1990s. In 1999, three scientists, Faloutsos, Faloutsos, and Faloutsos, analysed the partial maps of the AS and router networks in the Internet and discovered that both of them had not classical, but scale-free architectures. They found that the degree distributions of these networks could be approximately fitted by power laws [91]. The networks that they investigated were small, so in Fig. 4.2 we reproduce empirical cumulative degree distributions obtained later for larger AS and router maps by Pastor-Satorras, Vázquez, and Vespignani [148, 177].⁴ Both these empirical cumulative distributions were fitted by power-law dependences with exponent 1.1, and therefore the degree distribution exponent γ = 2.1.

⁴ The cumulative degree distribution is defined as P_cum(q) = Σ_{q′≥q} P(q′). If P(q) ∼ q^{−γ}, then P_cum(q) ∼ q^{1−γ}. Researchers use these distributions to diminish inevitably strong fluctuations in the region of large degrees. Another method to smooth fluctuations is binning—accumulation or averaging of data within some degree intervals.

Fig. 4.2 The cumulative degree distributions of the two networks: the AS-level map measured in October/November 1999 and the Internet Router (IR) map of May 2001. Adapted from the paper of Vázquez, Pastor-Satorras, and Vespignani [177].

The empirical distributions in Fig. 4.2 are very typical [10]. Note that the 'quality' of these 'power laws' is rather poor, especially for the router network, which is also a very typical situation. The AS network consisted of about 11 000 autonomous systems with, on average, 4.2 connections for each AS. The network of routers had about 230 000 nodes and the router's mean degree ⟨q⟩ ≈ 2.8. For smaller networks and higher values of exponent γ, empirical curves usually resemble a power law even less. Importantly, the Internet maps which empirical researchers analyse are very approximate and always incomplete. These maps were obtained by sending packet probes over the network from one or a few sources. Unfortunately, this technique may seriously distort the empirical degree distributions. Nodes with few connections have a higher chance of escaping the probe, and the measured degree distributions are more skewed than the real ones [151, 56]. Nonetheless, despite all these difficulties and restrictions, the empirical data for the Internet show remarkably skewed degree distributions. The biggest autonomous systems have a few thousand connections to other ASs.

Both the networks have small diameters. The average internode distance in the AS network is only 3–4 hops. In the router network this distance was found to be about 9. These were data obtained in 1999–2001 for Internet maps. The Internet grows exponentially with time, and both the AS and router networks evolve rapidly. It was theoretically suggested that these and many other networks become more densely connected with time. In other words, the growth is 'accelerated' in the sense that the number of links in a network grows more rapidly than the number of nodes [78]. Empirical observations have confirmed that this is indeed the case [120]. The growing average degree results in increasing clustering. Furthermore, this densification leads to shrinking diameters of the networks. The networks grow exponentially but their diameters are constant or even decrease with time.⁵

⁵ Interestingly, with time, many routers and autonomous systems disappear.

One can show that the heavy-tailed degree distributions make strong clustering inevitable. The measured values were C̄ ≈ 0.03 and C̄ ≈ 0.3 for the mean clustering in the router and AS networks, respectively [177]. Moreover, in contrast to classical random graphs, the clustering of a node in the router and AS networks was observed to depend strongly on the degree of this node, see Fig. 4.3. We explained that in this situation the clustering coefficient C can essentially differ from C̄. In reality, C in these networks is much smaller than C̄.

Fig. 4.3 The mean clustering C̄(q) of a node versus the degree q of this node for the same two networks as above. Adapted from the paper of Vázquez, Pastor-Satorras, and Vespignani [177].

Pastor-Satorras, Vázquez, and Vespignani also measured a quantity which became one of the standard structural characteristics of complex networks. They investigated the average degree q̄_nn(q) of the nearest neighbour of a node with q connections. This quantity is independent of q in the uniformly random, that is, uncorrelated networks; see the next lecture. In these uncorrelated networks, nodes 'know' nothing about the degrees of their neighbours. In the Internet, this is certainly not the case. Figure 4.4 demonstrates the sharp difference between Internet networks and uniform ones. Therefore, it is not only the skewed degree distribution that distinguishes these complex networks from classical random graphs. Subsequent studies have shown that the Internet is not an exception. On the contrary, very few real-world networks have no structural correlations.

Fig. 4.4 The average degree q̄_nn(q) of the nearest neighbour of a node of degree q for the same two networks as above. Adapted from the paper of Vázquez, Pastor-Satorras, and Vespignani [177].
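The three quantities discussed in this section (the cumulative degree distribution, the degree-dependent mean clustering C̄(q), and the mean neighbour degree q̄_nn(q)) each take a few lines to measure on any graph. A minimal sketch (Python with networkx; the Barabási–Albert graph is only a stand-in for a real Internet map):

    from collections import defaultdict
    import networkx as nx

    def cumulative_degree_distribution(G):
        """P_cum(q): fraction of nodes with degree at least q."""
        degrees = [d for _, d in G.degree()]
        n = G.number_of_nodes()
        return {q: sum(1 for d in degrees if d >= q) / n
                for q in sorted(set(degrees))}

    def mean_by_degree(G, per_node_values):
        """Average a per-node quantity over nodes of the same degree."""
        buckets = defaultdict(list)
        for u, d in G.degree():
            buckets[d].append(per_node_values[u])
        return {q: sum(v) / len(v) for q, v in sorted(buckets.items())}

    G = nx.barabasi_albert_graph(10_000, 3, seed=0)   # stand-in scale-free graph

    p_cum = cumulative_degree_distribution(G)
    c_of_q = mean_by_degree(G, nx.clustering(G))       # mean clustering vs degree, cf. Fig. 4.3
    qnn_of_q = nx.average_degree_connectivity(G)       # mean neighbour degree vs degree, cf. Fig. 4.4

    for q in list(p_cum)[:5]:
        print(q, round(p_cum[q], 4), round(c_of_q[q], 4), round(qnn_of_q[q], 1))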

4.2 The WWW


In essence, the WWW is simply a system for automated retrieving of
information in the form of electronic documents (files of various for-
mats). The original idea was to organize links between documents in
a way, convenient for users. In the WWW, the linking is based on hy-
pertext. The hypertext contains highlighted parts which cover the link
to other documents. Clicking on highlighted text causes the fulfilment
of the underlying link to the corresponding document and downloads
this file to the user’s computer. In 1989, Tim Berners-Lee proposed
a hypertext system for CERN, the European Laboratory for Particle
Physics in Geneva, Switzerland [25, 65]. In 1990, he wrote a program
‘WorldWideWeb’, which is a web browser editor, ‘a program which pro-
vides access to the hypertext world’, and the first web page was placed
on the first web server in CERN. Hypertext documents in the WWW
are written by using a special programming language, Hypertext Markup
Language (HTML) (look at any web page by using the option View the
Page Source of your browser). The functioning of the WWW is based on
the Hypertext Transfer Protocol (HTTP), which enables the flow and
processing of web documents and requests in the Web. Thus, in 1990,
the four required components—(i) the protocol of the Web, HTTP, (ii)
the language of the Web, HTML (which are the two main standards of
the Web), (iii) a web browser, and (iv) web servers—were created. The
WWW was born.
On 25 July 2008 the official Google blog announced that the Google index 'hit a milestone: 1 trillion (as in 1 000 000 000 000) unique URLs (Uniform Resource Locators) on the web at once!'⁶ This number—a trillion pages in the Google index, however impressive, does not allow us to estimate the size of the exponentially growing WWW. The difficulty is that only a very small fraction of pages are accessible by search engines. These public pages, sufficiently static to be scanned, form the so-called Surface Web. The huge part of the WWW (the Deep Web) consists of electronic documents in databases and archives with restricted public access and, also, rapidly varying pages (timetables, web calendars, and so on) with all their hyperlinks. A clear border between the Surface and Deep Webs is absent, and it is barely possible to reliably evaluate their sizes. What empirical researchers can study are sufficiently large pieces of the WWW. The analysis of these parts allows us to understand the architecture of the WWW.

⁶ See http://googleblog.blogspot.com/2008/07/we-knew-web-was-big.html.
Figure 4.5 shows very schematically how hyperlinks connect web documents. Compare this schematic view of directed connections in the WWW with the simpler picture of connections in the network of citations in scientific papers, see Fig. 1.7. Notice reciprocal hyperlinks in the Web graph. Two Web documents can cite each other while scientific papers cannot. It turned out that the number of reciprocal hyperlinks in the WWW is surprisingly large, up to 60% of all connections [85, 138].

Fig. 4.5 Connections in the WWW. Notice reciprocal hyperlinks.

The abundance of reciprocal hyperlinks in the WWW becomes clearer if we look at the scheme of connections of a typical home page,
see Fig. 4.6. These figures also show longer loops. The directedness of the hyperlinks makes these loops more diverse than in undirected networks. For example, there are six different loops of length 3 if we also take into account reciprocal links. In principle, the definition of clustering must be modified to account for this diversity. Usually, however, the clustering of the WWW network is measured ignoring the directedness of connections, which typically gives C ≈ 0.1 and C̄ ≈ 0.3.⁷

Fig. 4.6 Structure of a typical home page. The dashed arrows show hyperlinks coming from external web documents.

⁷ These values are for the same nd.edu domain of the WWW, which we discussed in the previous lecture [5]. Surprisingly, taking into account the hyperlink directedness leads to even smaller numbers of triangles than one could expect [31].

The directedness of hyperlinks determines a richer and more interesting organization of finite and giant components in the WWW than in undirected networks. According to a standard definition, a giant component obtained ignoring the directedness of connections is a giant weakly connected component. For a general directed network, this giant component is organized as is shown in Fig. 4.7. There is a giant strongly connected component, which consists of the nodes mutually reachable by directed paths. The nodes of a giant out-component are reachable from the strongly connected component by directed paths. A giant in-component contains all the nodes from which the strongly connected component is reachable. By this definition, the giant strongly connected component is the intersection of the giant in- and out-components. The remaining part of the giant weakly connected component is the mess of so-called tendrils. Apparently, the presence of a giant strongly connected component is vitally important for the function of the WWW. In 1999, Broder and coauthors measured the sizes of these components using a map of a sufficiently large part of the WWW (about 200 million pages) [44]. They found that the giant strongly connected component contained about 30% of the pages in the weakly connected component, while the tendrils contain about 25%. The average length of the shortest directed path between two web pages in these measurements was about 16. Remarkably, the maximum separation of nodes observed in this network was very large, namely, about 1000 clicks!

Fig. 4.7 Organization of a giant weakly connected component in a directed network. SCC, IN, and OUT are the giant strongly connected component and the giant in- and out-components. For more detail, see [80]. Compare with Broder et al. [44].

Similarly to the Internet, the density of connections in the WWW grows with time. Broder and coauthors first made their measurements in May 1999 and found that the average in- and out-degrees of a node are equal at 7.22. When they repeated the measurements in October 1999, the average in- and out-degrees were already 7.85.

The scale-free in- and out-degree distributions of the WWW, P_i(q_i) ∼ q_i^{−γ_i} and P_o(q_o) ∼ q_o^{−γ_o}, respectively, were observed in 1999 by Albert, Jeong, and Barabási, who studied a relatively small nd.edu domain. Broder and coauthors (2000) obtained the degree distributions for a much larger network [44]. They fitted the empirical in- and out-degree distributions by power-law dependences, with exponents γ_i = 2.1 and γ_o = 2.7, respectively. The power law for P_i(q_i) was observed in a wide range of in-degrees (about three orders of magnitude), and so the obtained value 2.1 is reliable. This is not the case for the out-degree distribution. The range of out-degrees in the WWW is much narrower than that of in-degrees, so there is even a doubt that a power law for out-degrees exists at all [71].⁸ In any case, in respect of at least the in-degree distribution, the WWW is a scale-free network.

⁸ It is impossible to put on a page, say, one million links to other web documents.
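The bow-tie decomposition of Fig. 4.7 can be computed for any directed graph in a few lines. A minimal sketch (Python with networkx; the sparse random directed graph is only a toy stand-in for a Web crawl):

    import networkx as nx

    def bow_tie_sizes(D):
        """Sizes of the giant weakly/strongly connected components and of the
        IN and OUT sets of a directed graph D (IN and OUT exclude the SCC)."""
        gwcc = max(nx.weakly_connected_components(D), key=len)
        gscc = max(nx.strongly_connected_components(D), key=len)
        seed = next(iter(gscc))
        out_set = nx.descendants(D, seed) | {seed}    # reachable from the SCC
        in_set = nx.ancestors(D, seed) | {seed}       # can reach the SCC
        return {
            "GWCC": len(gwcc),
            "GSCC": len(gscc),
            "IN": len(in_set - gscc),
            "OUT": len(out_set - gscc),
            "TENDRILS+": len(gwcc) - len(gscc | in_set | out_set),
        }

    # A sparse random directed graph as a toy 'Web'.
    D = nx.fast_gnp_random_graph(50_000, 2.5 / 50_000, seed=0, directed=True)
    print(bow_tie_sizes(D))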

4.3 Cellular networks


The Internet and WWW networks are sufficiently large to allow reliable statistical analysis of their architectures. Even on the AS level, the Internet network has many thousand nodes. In this section we will touch upon a few really small cellular and genetic networks, whose specific structures could be revealed despite their tiny size.⁹ In 2000, Jeong, Tombor, Albert, Oltvai, and Barabási made a thorough study of the networks of the metabolic reactions of 43 simple organisms belonging to different domains of life [106]. All these networks were very small, from 200 to 800 nodes. The network of metabolic reactions, in essence, is a typical chemical reaction graph. Its nodes are molecular compounds participating in various metabolic reactions as educts or products. The same compound can be an educt in one reaction and a product in another. If two compounds participate in some reaction as an educt and a product, connect them by a directed link going from educt to product. Thus Jeong and coauthors had 43 sparse directed graphs to analyse. The remarkable conclusion was that the in- and out-degree distributions of all 43 networks are approximately scale-free with exponent equal to 2.2. The definite conclusion for such small networks was possible due to the low value, 2.2, of the degree distribution exponent. The lower this exponent, the wider the range of degrees over which a power law can be observed. This scale-free architecture has a direct consequence for the distributions of metabolic reaction fluxes in these networks. These distributions were also found to be skewed [86, 8].

⁹ For a detailed discussion of complex network concepts in cellular biology, see the review of Barabási and Oltvai [15].
The second example of a cellular network, which we touch upon, is
the network of physical protein interactions of the yeast Saccharomyces
cerevisiae [105]. This network of 1870 nodes and 2240 undirected links
is bigger than the metabolic networks that we have discussed in this
section, nonetheless, the statistical data are less conclusive. The nodes of
this network are different proteins and the undirected links are physical
interactions (direct contacts) between them. It was impossible to fit
the empirical degree distribution by a power-law function. Instead, the
distribution was fitted by a power law with an exponential cut-off. The
power-law exponent can be roughly estimated as 2.5, which, we believe,
is already too large to reliably observe scale-free distributions in networks
of this size. Still, the general architecture of this network is rather
similar to that of, say, the AS-level Internet network. In particular, the
empirical dependencies C(q) and q nn (q) look similar to those in Figs. 4.3
and 4.4 [122, 123].
In both these examples, the networks are clearly defined, and a graph
for empirical study can be obtained in a quite strict way. For many other
systems, an underlying network structure is not that obvious. Using
genetic networks as an illustration [156], let us demonstrate how the
network structure can be unveiled.
Without going into detail, a genome is a large set of interacting genes which encode the genetic information of an organism. A given living organism is characterized by a set of features, the so-called gene expressions. The expression of each gene is quantitatively described by its level—an expression level. The term 'interaction between genes' is used here in the sense that genes function not independently, but in cooperation: the expression of a gene depends on other genes. Consequently, expression levels are not independent. These correlations between expression levels of genes are treated as the result of the cooperative function of genes, that is, of their 'interaction'.

The procedure for obtaining correlations between gene expression levels is routine in genetic research. Suppose that the expression levels of N genes in a genome are e_i, i = 1, . . . , N. The original ('wild-type') cell culture is 'perturbed' M times. For example, the culture is exposed to radiation and mutant strains with distorted genomes are produced. The full set of gene expression levels, {e_i^{(s)}}, s = 1, . . . , M, of these mutant strains is measured (see Fig. 4.8). The correlation c_ij between the expression levels of two genes, i and j, may be easily obtained by averaging over all the mutant strains:¹⁰

  c_ij = ⟨e_i e_j⟩ − ⟨e_i⟩⟨e_j⟩ ,   (4.1)

where the average ⟨·⟩ = M^{−1} Σ_s. Correlations between genes i and j are absent when c_ij = 0. The resulting numbers c_ij are the elements of a large N × N matrix which shows how different genes interact with each other. However, most of the matrix elements are small and inessential, and this matrix, in its original form, provides superfluous information. For unveiling the basic structure of gene interactions, one must take into account only the important matrix elements. So, genes i and j are believed to be interacting (to be connected) only if the element c_ij is greater than some threshold value. This value is chosen such that the resulting network is sufficiently sparse, and the structure of essential pairwise interactions is clearly visible.

¹⁰ This is a simplified form of the expression. Moreover, instead of the expression levels e_i, the logarithms of the ratios of expression levels of mutant strains and those of the wild-type cell culture are actually used.

Fig. 4.8 Genes in a genome (N genes) are expressed in terms of characteristic features—traits—of the organism (N gene expression levels). These features are not independent. Exposure to radiation (M times) leads to mutant strains with M sets of gene expression levels. The statistical analysis of these sets allows one to find correlations between different gene expressions and to unveil the cooperative function of the corresponding genes.
The same approach may be applied to many other situations. In
general, network constructions of such a kind use two basic restrictions:
(i) Take into account only ‘important’ pairwise interactions or correla-
tions between ‘nodes’.
(ii) Ignore the difference between the magnitudes of these interactions.
In principle, one can present the complete information in the form of a
weighted network, where each link has its real number value cik , taken,
for example, from eqn (4.1). Usually, however, these weighted networks
are too informative for a direct analysis.
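The whole construction, eqn (4.1) followed by thresholding, fits in a few lines. A minimal sketch (Python with NumPy and networkx; the random expression matrix and the target mean degree are placeholders, not real data or the authors' settings):

    import numpy as np
    import networkx as nx

    def coexpression_network(expr, target_mean_degree=3.0):
        """Build a thresholded correlation network from an M x N matrix of
        expression levels (rows: perturbed strains, columns: genes), following
        eqn (4.1): keep only the largest |c_ij| so that the graph stays sparse."""
        M, N = expr.shape
        means = expr.mean(axis=0)
        c = expr.T @ expr / M - np.outer(means, means)   # c_ij = <e_i e_j> - <e_i><e_j>
        iu = np.triu_indices(N, k=1)
        strengths = np.abs(c[iu])
        # Threshold chosen so that the mean degree is roughly the target value.
        n_links = int(target_mean_degree * N / 2)
        threshold = np.sort(strengths)[-n_links]
        G = nx.Graph()
        G.add_nodes_from(range(N))
        for i, j, s in zip(iu[0], iu[1], strengths):
            if s >= threshold:
                G.add_edge(i, j, weight=c[i, j])
        return G

    rng = np.random.default_rng(0)
    expr = rng.normal(size=(200, 500))    # placeholder: M = 200 strains, N = 500 genes
    G = coexpression_network(expr)
    print(G.number_of_nodes(), G.number_of_edges())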

4.4 Co-occurrence networks


In many systems allowing network representation, groups of elements co-occur in various associations. That is, a given set of elements each have something in common, or they participate together in some action, or they are present in the same list, and so on. If these associations are pairwise, then we have a simple network (nodes connected by ordinary single links). On the other hand, if the associations are more compli-
cated, then we arrive at multi-partite graphs and hypergraphs. Let us
touch upon these more interesting constructions.
The standard example of these networks is the bipartite network of
directors sitting on many boards. We will consider this in the next
lecture. Here we mention another typical example, namely the human
disease network constructed by Goh and coauthors in 2007 [96]. The
network was based on a long list of human genetic disorders and all
known disease genes in the human genome. The disorders and disease
genes are nodes of two types. If mutations in a gene are involved in
some disorder, connect these two nodes by an undirected link. Since the
same gene may be implicated in various disorders, the network has a rich
structure of connections. The researchers hope that analysing the global
organization and statistics of connections in this bipartite network will
allow them to find as yet unknown relations between genetic disorders
and disease genes. The point is that the complete system of disorders
and disease genes is too large to uncover all the relations of this kind
without using network representation.
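A minimal sketch of this two-type construction is given below, assuming the networkx library and a few invented disorder and gene records; the real data set of [96] is, of course, far larger.

    # Sketch of the bipartite disorder-gene construction described above.
    # The records below are invented placeholders, not data from [96].
    import networkx as nx
    from networkx.algorithms import bipartite

    associations = [                       # (disorder, implicated gene)
        ("disorder_A", "gene_1"),
        ("disorder_A", "gene_2"),
        ("disorder_B", "gene_2"),
        ("disorder_C", "gene_3"),
    ]

    g = nx.Graph()
    for disorder, gene in associations:
        g.add_node(disorder, kind="disorder")
        g.add_node(gene, kind="gene")
        g.add_edge(disorder, gene)         # undirected link: gene implicated in disorder

    # A gene shared by two disorders links those disorders in the
    # one-mode projection (the 'disease network').
    diseases = [n for n, data in g.nodes(data=True) if data["kind"] == "disorder"]
    projection = bipartite.projected_graph(g, diseases)
    print(sorted(projection.edges()))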
In principle, the human disease network is a quite typical bipartite
graph. Zlatić, Ghoshal, and Caldarelli (2009) studied more exotic net-
works [186] based on file-sharing databases Flickr and CiteULike. In
these databases, users upload photos (Flickr) or put links to scientific
papers (CiteULike) with short text descriptions. All registered users can
supply these photos and papers with keywords—descriptive tags. The resulting network has three sorts of nodes: (i) users, (ii) photos (papers), and (iii) tags. When a user uploads a photo with a tag or assigns a tag to a photo uploaded by another user, he creates a new hyperedge interconnecting three nodes: this user, the photo (paper), and the tag; see Fig. 4.9. So these networks are tripartite hypergraphs, in which every hyperedge interconnects three nodes of different types. In these tagged social networks, there are three 'hyperedge distributions', one for each kind of node. The researchers found that these distributions differ significantly from each other. The degree distribution for photos (papers) decays most rapidly of the three, the distribution for users decays the most slowly (apparently, there exist 'crazy taggers'), and the degree distribution for tags demonstrates an intermediate rate of decay. Thus, interestingly, there are no really 'superpopular' photos and papers either in Flickr or in CiteULike. Instead, there are plenty of people interested in tagging!

[Fig. 4.9 Hyperedges in Flickr and CiteULike: each hyperedge joins a user, a photo (paper), and a tag.]
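A tripartite hypergraph of this kind can be stored simply as a list of (user, item, tag) triples. The following plain-Python sketch, with invented entries, shows how the three 'hyperedge degrees' are then counted; the histograms of these counts are the three distributions studied in [186].

    # Sketch: a tripartite hypergraph as a list of (user, item, tag)
    # hyperedges, with one 'hyperedge degree' per node type. Toy data only.
    from collections import Counter

    hyperedges = [
        ("alice", "photo_1", "sunset"),
        ("alice", "photo_2", "sunset"),
        ("bob",   "photo_1", "beach"),
        ("bob",   "photo_3", "sunset"),
    ]

    user_degree = Counter(u for u, _, _ in hyperedges)
    item_degree = Counter(p for _, p, _ in hyperedges)
    tag_degree  = Counter(t for _, _, t in hyperedges)

    print(user_degree, item_degree, tag_degree, sep="\n")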
5 Uncorrelated networks

Most real-world networks are very far from the classical random graph models. In this lecture we show that these models can be greatly improved.

5.1 The configuration model

By the late 1970s, the theory of classical random graphs was well developed, and mathematicians started to search for more general network
constructions. In 1978, Edward A. Bender and E. Rodney Canfield pub-
lished a paper entitled ‘The asymptotic number of labelled graphs with
given degree sequences’ [22], where they described random networks with
essentially richer architectures than the Erdős–Rényi graph. Béla Bol-
lobás strictly formulated this generalization of the Erdős–Rényi model
in his paper ‘A probabilistic proof of an asymptotic formula for the num-
ber of labelled random graphs’ (1980) and named it the configuration
model [38]. This generalization turned out to be a major step toward
real networks in the post-Erdős epoch.
Before introducing the configuration model, we have to recall the idea
of classical random graphs. A classical random graph is the maximally
random network that is possible for a given mean degree of a node, ⟨q⟩.
The Erdős–Rényi model is one of the two versions of classical random
graphs: a maximally random network under two restrictions: (i) the to-
tal number of nodes N is fixed and (ii) the total number of links is fixed.
These constraints result in the Poisson form of the degree distribution
which differs from the degree distributions of real-world networks. To
make a step towards real networks, one should be able to construct a
network with, at least, a real degree distribution P (q), and not only a
real mean degree. The idea was to build the maximally random network
for a given degree distribution. The configuration model provides a way
(more precisely, one of the ways) to achieve this goal by directly general-
izing the Erdős–Rényi construction. In graph theory, the term ‘sequence
of degrees' usually means the set of numbers N(q) of nodes of degree q in a graph, Σ_q N(q) = N [128]. Let this sequence be given. The config-
uration model is a statistical ensemble whose members are all possible
labelled graphs, each with the same given sequence of degrees. All these
members are realized with equal probability. We have explained that
this corresponds to the maximum possible randomness—uniform
randomness.
The same can be done in the following way. Consider a set of N nodes: N(1) nodes of degree 1, N(2) nodes of degree 2, and so on; supply each of the nodes with q stubs of links; choose stubs in pairs at random and join each pair together into a link (see Fig. 5.1).¹ As a result we have a network with a given sequence of degrees, but otherwise random. Clearly, if every node has the same number of connections, then this network is reduced to a random regular graph. On the other hand, if the sequence of degrees is drawn from a Poisson distribution, then (in the infinite network limit) we get a classical random graph.

The heterogeneity of these networks is completely determined by degree distributions; correlations are absent, in contrast to real-world networks. Fortunately, the majority of phenomena in complex networks can be explained qualitatively based only on the form of degree distribution, without accounting for correlations.

¹ Note that the term 'at random' without further specification usually means 'uniformly at random'. To build a particular realization of this network, do the following: (i) create the full set of nodes with a given sequence of stub bunches; (ii) from all these stubs, choose at random a pair of stubs and join them together; (iii) from the rest of the stubs, choose at random a pair of stubs and join them together, and so on until no stubs remain.

[Fig. 5.1 A random network ensemble in the configuration model. In this example, N(1) = 2 and N(2) = 1. The two members of the ensemble have equal probability of realization, 1/2.]
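A minimal Python sketch of the stub-matching recipe in the note above follows; it returns the links of the resulting multigraph as a plain edge list (multiple links and self-loops may appear, as discussed in the text). The networkx library provides a ready-made generator, nx.configuration_model, for the same purpose.

    # Sketch of stub matching: give node i q_i stubs, shuffle all stubs,
    # and join them in consecutive pairs.
    import random

    def configuration_model(degrees, seed=None):
        rng = random.Random(seed)
        stubs = [node for node, q in enumerate(degrees) for _ in range(q)]
        if len(stubs) % 2:
            raise ValueError("the sum of degrees must be even")
        rng.shuffle(stubs)
        return list(zip(stubs[::2], stubs[1::2]))   # links of a multigraph

    # Example: two nodes of degree 1 and one node of degree 2, as in Fig. 5.1.
    print(configuration_model([1, 1, 2], seed=1))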
5.2 Hidden variables

The second way to build an uncorrelated network with an arbitrary degree distribution was found quite recently, in 2001–2002. This construction [97, 55, 48] directly generalizes the Gilbert (that is, the G_{N,p}) model and is essentially based on the notion of 'hidden variables'. The general idea of the algorithm used is very simple. (i) To each of the nodes, i = 1, 2, . . . , N, ascribe a number—a hidden variable, d_i. (ii) Then connect nodes in pairs with probabilities depending on the hidden variables of these nodes, so that the probability that there is a link between nodes i and j is p_ij = f(d_i, d_j). The architecture of the resulting random network is determined by the statistics of the hidden variables and the form of a given function f(d, d′). In particular, if p_ij = p is a constant, we arrive at the Gilbert model.
Suppose we aim to get an uncorrelated network with a degree dis-
tribution P (q). Then use the ‘desired degrees’ as the hidden variables.
Namely, (i) ascribe ‘desired degrees’ di —random numbers drawn from
the distribution P (d)—to N nodes, and (ii) connect nodes in pairs (i, j)
with probabilities p_ij proportional to the products d_i d_j. Normalization gives p_ij = d_i d_j/(N⟨d⟩) if we assume that the network is sparse. One can
prove that the degree distribution of the resulting network approaches
the desired form P (q) at large degrees. One can also show that this net-
work is uncorrelated. Chung and Lu (2002) called this construction ‘the
random graphs with a given desired sequence of degrees’ [55]. This ran-
dom network and the configuration model approach each other when the
networks are infinite. However, networks built by using hidden variables
are more convenient for practical purposes than the configuration model.
In modern numerical experiments (simulations), researchers usually use
this easy algorithm or its variations to build uncorrelated networks, and
not the configuration model. Furthermore, these networks are easily
treatable analytically.
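A minimal sketch of this 'desired degree' recipe in plain Python is given below. The Pareto-distributed hidden variables are chosen only for illustration, and the connection probabilities are clipped at 1 for safety; see the discussion of exactly this point that follows.

    # Sketch of the hidden-variable construction: connect i, j with
    # probability d_i d_j / (N <d>), here clipped at 1.
    import random

    def hidden_variable_graph(desired_degrees, seed=None):
        rng = random.Random(seed)
        n = len(desired_degrees)
        mean_d = sum(desired_degrees) / n
        edges = []
        for i in range(n):
            for j in range(i + 1, n):
                p_ij = min(1.0, desired_degrees[i] * desired_degrees[j] / (n * mean_d))
                if rng.random() < p_ij:
                    edges.append((i, j))
        return edges

    rng = random.Random(0)
    desired = [rng.paretovariate(2.5) for _ in range(1000)]   # heavy-tailed 'desired degrees'
    links = hidden_variable_graph(desired, seed=1)
    print(len(links), "links; expected about", round(sum(desired) / 2))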
Perhaps the reader has already noticed a serious problem with this construction. If the product d_i d_j exceeds N⟨d⟩, the probability p_ij, defined as above, becomes greater than 1, which is pure nonsense. This problem certainly arises for slowly decaying distributions P(d). A simple patch to remedy this flaw was proposed in the very first work introducing a construction with hidden variables (2001) [97]. Its authors—Goh, Kahng, and Kim—proposed to restrict the probability p_ij from above by choosing this probability in the form p_ij ∝ 1 − exp(−d_i d_j/(N⟨d⟩)).² The resulting construction is called the static model. The flaw has been fixed but, in return, another problem has emerged. For slowly decaying distributions P(d), a network built with this p_ij appears correlated. Without going into detail, these networks are uncorrelated only if their degree distributions decay sufficiently rapidly. For more detail, see Lecture 8.

² This is, of course, only one of the possible forms.
5.3 Neighbour degree distribution

In Section 2.3, we have shown that if the degree distribution of an arbitrary network is P(q), then the degree distribution of any of the end nodes of a randomly chosen link is equal to qP(q)/⟨q⟩. We stress that this is the case for any network. Let us introduce a joint distribution P(q, q′) of the degrees q and q′ of the end nodes of a randomly chosen link. In uncorrelated networks, these degrees, q and q′, are independent. So for these networks, a joint degree–degree distribution takes the following factorised form:

$$P(q, q') = \frac{qP(q)}{\langle q \rangle}\,\frac{q'P(q')}{\langle q \rangle}\,. \qquad (5.1)$$
This also means that the branching of a link does not depend on the degree of its second end. The mean degree of an end of a randomly chosen link is ⟨q²⟩/⟨q⟩. This is also the mean degree of a nearest neighbour of a randomly chosen node. The mean branching is smaller by 1, and so, as we already know, b = ⟨q²⟩/⟨q⟩ − 1. Note that this is also the ratio of z_2 (the average number of the second nearest neighbours of a node) and z_1 (the average number of the nearest neighbours of a node, ⟨q⟩).
It is important that almost always ⟨q²⟩/⟨q⟩ is greater than the mean degree ⟨q⟩ in the network.³ If a degree distribution decays slowly, the difference may be great. For various properties of networks, it is the organization of connections of the nearest neighbours of a node that matters. In respect of these properties, a network seems more dense than it really is. This basic observation explains a great number of phenomena in complex networks. We will exploit this extensively.

³ Check that the equality ⟨q²⟩/⟨q⟩ = ⟨q⟩ takes place only if a network has no dead ends.
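A few lines of arithmetic make this concrete; the degree distribution below is invented purely for illustration.

    # Numeric illustration of <q^2>/<q> versus <q> for a toy distribution P(q).
    P = {1: 0.6, 2: 0.2, 10: 0.2}          # a few hubs among many low-degree nodes

    mean_q  = sum(q * p for q, p in P.items())
    mean_q2 = sum(q * q * p for q, p in P.items())

    print("<q>        =", mean_q)                 # 3.0
    print("<q^2>/<q>  =", mean_q2 / mean_q)       # about 7.13: a neighbour's mean degree
    print("branching b =", mean_q2 / mean_q - 1)  # about 6.13, equal to z_2 / z_1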
5.4 Loops in uncorrelated networks
The first thing we should do is to find whether these networks are loopy
or not. Actually, we could expect that they have very few loops by
analogy with classical random graphs, but we must check. Let us, to
be concrete, find the clustering coefficient C in the configuration model.
Let a network be large, N ≫ 1. We should find the probability that two nearest neighbours of a randomly chosen node have at least one link between them (multiple connections are allowed). Suppose that these two nodes, i and j, have q_i and q_j connections, respectively, see Fig. 5.2. Then we have q_i − 1 stubs at the first node and q_j − 1 stubs at the second, and also nearly N⟨q⟩ stubs at the other nodes. We use the fact that in the configuration model these stubs are connected in pairs at random. Then a stub at node i is connected to one of the stubs at node j with probability (q_j − 1)/(N⟨q⟩). So for all q_i − 1 stubs together we get the total probability (q_i − 1)(q_j − 1)/(N⟨q⟩). Now average this expression over degrees q_i and q_j, taking into account that these degrees are distributed as qP(q)/⟨q⟩. This readily gives the clustering coefficient of an uncorrelated network [133]:

$$C = \bar{C} = \frac{1}{N\langle q \rangle}\left(\frac{\langle q^2 \rangle - \langle q \rangle}{\langle q \rangle}\right)^{2} = \frac{b^2}{N\langle q \rangle}\,. \qquad (5.2)$$

For a Poisson degree distribution, this result is reduced to the expression C = ⟨q⟩/N for classical random graphs.

[Fig. 5.2 Calculating the clustering coefficient in the configuration model. All the stubs in a network are connected in pairs at random, with equal probability.]
At first sight, formula (5.2) does not look radically different from that for classical random graphs. The clustering vanishes in both cases as N approaches infinity. In other words, the clustering is only a finite-size effect in these models. Nonetheless, formally substituting empirical data (⟨q²⟩, ⟨q⟩, and N) for real networks into eqn (5.2) usually provides far more reasonable values than the classical random graph formula. Heavy tails in degree distributions of real networks lead to large ⟨q²⟩, and this
in turn results in a sufficiently high value of the calculated clustering
coefficient at finite N . Typically, the classical formula underestimates
C by several orders of magnitude—three, four, five orders, while with
eqn (5.2) we often underestimate it by ‘only’ several times. Roughly
speaking, the configuration model provides the smallest clustering that
is possible in a random network of a given size with a given degree
distribution, and ignores other details. Thus, clustering of many real-
world networks turns out to be not that far from these ‘minimum possible
values’. To explain real values and size-independent contribution to the
clustering coefficient, we should go beyond the configuration model and
its variations.
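As an order-of-magnitude check of eqn (5.2), one can generate a configuration-model network with networkx and compare the formula with a direct measurement. The degree sequence below (a truncated zipf sample) is an arbitrary choice, and close agreement should not be expected once multiple links and self-loops are discarded.

    # Sketch: eqn (5.2) versus the measured mean clustering of a simulated
    # configuration-model network (approximate at this modest size).
    import networkx as nx
    import numpy as np

    rng = np.random.default_rng(2)
    degrees = rng.zipf(2.8, size=20000)          # heavy-tailed degree sequence
    degrees = np.minimum(degrees, 100)           # crude cut-off to tame the hubs
    if degrees.sum() % 2:                        # total number of stubs must be even
        degrees[0] += 1

    g = nx.configuration_model([int(d) for d in degrees], seed=3)
    g = nx.Graph(g)                              # drop multiple links
    g.remove_edges_from(list(nx.selfloop_edges(g)))

    q = np.array([d for _, d in g.degree()])
    b = (np.mean(q**2) - np.mean(q)) / np.mean(q)
    prediction = b**2 / (len(q) * np.mean(q))    # eqn (5.2)

    print("eqn (5.2):", prediction)
    print("measured :", nx.average_clustering(g))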
Similarly, one can find the number N_L of loops of length L in an uncorrelated network. For sufficiently short (e.g., finite) loops, N_L ∼ b^L/(2L) [30, 32]. On the other hand, the number of loops longer than the diameter of a network is extremely large, ln N_L ∝ N. In this respect, the situation is very similar to that for classical random graphs—few short loops and many long loops, which corresponds to a locally tree-like architecture.⁴ This sea of long loops is a necessary feature of uncorrelated networks and, generally, of constructions of this kind. Thanks to the long loops, these networks have no boundaries, no centres. Their nodes can be distinguished only by their degrees; otherwise they are 'statistically equivalent', as physicists often say. We will show that this absence of boundaries is critically important for cooperative effects in complex networks.

⁴ More rigorously, if the second moment of a degree distribution diverges, then the tree-like character disappears. This takes place in infinite networks with scale-free degree distributions if exponent γ is less than or equal to 3. Surprisingly, even in this difficult situation, the tree ansatz sometimes works.
5.5 Statistics of shortest paths

Using the tree ansatz we readily arrive at the asymptotic formula (large N) for the mean internode distance (and diameter) of an uncorrelated network:

$$\bar{\ell} \cong \frac{\ln N}{\ln b}\,. \qquad (5.3)$$
We have obtained this relation discussing classical random graphs. This result, typical for small worlds, is valid only if the second moment of a degree distribution is finite in the infinite network limit.⁵ Otherwise, ℓ̄ can grow even slower than ln N—ultra-small worlds. The resulting form ℓ̄(N) depends on how ⟨q²⟩ approaches infinity with growing N. It is impossible to show a general formula, since this approach differs between different versions of uncorrelated network models. In particular, it may be ℓ̄ ∼ ln N/ln ln N for networks with exponent γ = 3 and ℓ̄ ∼ ln ln N if exponent γ is smaller than 3 [60]. Furthermore, in some situations, ℓ̄(N) even approaches a constant as N → ∞. This is the case, for example, if a single node in a network attracts a finite fraction of all connections, as in Fig. 2.7. Unfortunately, it is hardly possible to distinguish between various slowly varying dependences on N in finite real-world networks.

⁵ Recall that b = (⟨q²⟩ − ⟨q⟩)/⟨q⟩.
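A back-of-envelope use of eqn (5.3), with invented moments, shows how short these distances are.

    # Eqn (5.3) with made-up moments: a million-node network with <q> = 3
    # and <q^2> = 20 is only about 8 steps across.
    import math

    N, mean_q, mean_q2 = 1_000_000, 3.0, 20.0
    b = (mean_q2 - mean_q) / mean_q              # mean branching (see footnote 5)
    print(math.log(N) / math.log(b))             # roughly 8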
Not everything can be done in the framework of the convenient tree
ansatz. Here we have indicated only one problem which cannot be solved
within this approximation. Let us introduce an important notion which
helps to characterize the distribution of shortest paths over a network.
The betweenness centrality (physicists also call it load) shows how often shortest paths in a network pass through a given node.⁶ The betweenness centrality of a given node is proportional to the relative number of shortest paths between other nodes which run through this node.⁷ More rigorously, the betweenness centrality of a node is defined as follows. Let s(i, j) > 0 be the number of shortest paths between nodes i and j, while s(i, v, j) is the number of these paths passing through node v. Then the betweenness centrality B(v) of node v is

$$B(v) \equiv \sum_{i, j \neq v} \frac{s(i, v, j)}{s(i, j)}\,, \qquad (5.4)$$

where the sum is over all nodes other than node v. Similarly to the degree distribution, for a random network, one should introduce the betweenness centrality distribution P(B).

⁶ The notion of betweenness centrality was first proposed in sociology.

⁷ Similarly, one can introduce the betweenness centrality of a link.
[Fig. 5.3 Possible configurations of the shortest paths connecting nodes i, j, and v. The dotted lines show the shortest paths between nodes i and v, and between j and v.]

Figure 5.3 shows schematically all possible configurations of shortest paths between three nodes in the same connected component. Note configuration (c), which implies that the presence of loops in a network certainly changes the value of betweenness centrality. Therefore, to calculate the betweenness centrality distribution, one has to take into account loops. It turns out that this is a difficult task, and so the problem for loopy networks is still open. As far as we know (2008), even for classical random graphs, an exact betweenness centrality distribution has not been found. Nonetheless, this distribution has been investigated in numerous empirical works and numerical simulations. In a wide range
of networks, including scale-free nets, the observed distribution is rather close to P(B) ∼ B⁻² [97].
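The distribution is easy to explore numerically. The following sketch uses networkx with a Barabási–Albert graph as an arbitrary stand-in for a scale-free network; it illustrates the measurement itself, not the B⁻² law reported in [97]. Note that networkx normalizes B by default, so normalized=False is used to stay closer to the raw path counts of eqn (5.4).

    # Sketch: sample the betweenness centrality distribution of a modest
    # scale-free-like network and bin the values logarithmically.
    import networkx as nx
    from collections import Counter

    g = nx.barabasi_albert_graph(2000, 3, seed=4)
    B = nx.betweenness_centrality(g, normalized=False)

    bins = Counter()
    for value in B.values():
        if value > 0:
            bins[int(value).bit_length()] += 1   # bin k holds B in [2^(k-1), 2^k)

    for k in sorted(bins):
        lo, hi = 2 ** (k - 1), 2 ** k
        print(f"B in [{lo}, {hi}): {bins[k]} nodes")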
In uncorrelated networks, the width of the distribution of internode distances was found to be independent of N. If a network is sufficiently small, then a noticeable fraction of nodes may be separated by distances essentially greater than ℓ̄. The statistics for this remote part of a network is of particular interest. The relevant quantity is the distribution of the number of the ℓ-th nearest neighbours of a randomly chosen node if ℓ ≫ ℓ̄. It is even more convenient to use the very close distribution of the number of nodes z_{>ℓ} at a distance greater than ℓ from a node. Interestingly, the form of these distributions is essentially determined by the presence (or absence) of dead ends in a network. If there are no nodes with a single link in a network, then these distributions rapidly decay. If a finite fraction of all nodes are dead ends, then these distributions decay quite slowly, as z_{>ℓ}^{−2}.
5.6 Uncorrelated bipartite networks

We have demonstrated how to build an uncorrelated undirected one-partite network. Similarly, one can define uncorrelated models of other networks: directed networks, multi-partite networks, etc. A number of examples were described in a seminal paper by Newman, Strogatz and Watts (2001) [140]. For demonstration purposes, let us focus on undirected bipartite networks, see Fig. 1.4 (a). If the reader needs a real-world example, it can be a bipartite collaboration network of movie actors or one of the networks of the members of boards of directors. In the movie actor network, one type of node is actors and the other is movies where they played. In the network of directors, one kind of node is directors and the other is boards on which they sit. An uncorrelated bipartite network is a network with (i) given numbers of nodes of each sort, N_1 and N_2, and (ii) given degree distributions P_1(q) and P_2(q′) for each kind of node; otherwise the network is uniformly random. As usual, we can also say that this is the maximally random bipartite network with given N_1, N_2, P_1(q), and P_2(q′). Similarly to the one-partite uncorrelated networks, this network has few short loops. Remarkably, the one-mode projection of an uncorrelated bipartite network, Fig. 1.4 (b), is already correlated. Furthermore, this projection is already not tree-like—it has many short loops. In particular, the clustering coefficient C of the one-mode projection does not vanish as the size of this network approaches infinity. This shows how one can easily get large clustering starting from an uncorrelated network.

How far is this model from real bipartite networks? It turns out that it sometimes describes a real situation surprisingly well. In their work, Newman, Strogatz and Watts inspected the Fortune 1000 graph, namely a bipartite network of the members of the boards of directors of the US companies with the highest revenues. They obtained three empirical degree distributions: two degree distributions, one for each of the two kinds of nodes, Fig. 5.4, and the degree distribution of the one-mode projection of this graph to the set of director nodes, the dots in Fig. 5.5. Then they built an uncorrelated bipartite network with the same distributions P_1(q) and P_2(q′) as in the real network. It was easy to compute the degree distribution of the one-mode projection of this model network, the solid line in Fig. 5.5. The reader can see that the result is very close to the real-world data. The calculated clustering coefficient of the one-mode projection, C = 0.59, practically coincided with the measured one. Note the large value of the clustering coefficient. One has to admit that such close agreement is the exception rather than the rule.

[Fig. 5.4 Empirical statistics of the bipartite Fortune 1000 graph. The frequency with which a director sits on q boards is on the left panel; the frequency with which q′ directors sit on a board is on the right panel. Adapted from Newman, Strogatz and Watts [140].]

[Fig. 5.5 Statistics of the one-mode projection (to directors) of the bipartite Fortune 1000 graph. Dots are the empirical data: the probability that a director has in total z co-directors on all his boards. The solid line was calculated for the one-mode projection of an uncorrelated bipartite graph with the same statistics as in Fig. 5.4. Adapted from [140].]
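The effect is easy to reproduce with networkx. In the following sketch the two degree sequences are invented, not the Fortune 1000 statistics, so only the qualitative point, a large clustering of the one-mode projection, should be read from the output.

    # Sketch: clustering of the one-mode projection of a random bipartite
    # graph with given degree sequences (toy sequences, not real data).
    import random
    import networkx as nx
    from networkx.algorithms import bipartite

    rng = random.Random(5)
    director_deg = [rng.choice([1, 1, 2, 3]) for _ in range(3000)]   # boards per director
    total = sum(director_deg)
    board_deg = []
    while total > 0:                       # boards of roughly 5 to 12 members
        size = min(total, rng.randint(5, 12))
        board_deg.append(size)
        total -= size

    b = bipartite.configuration_model(director_deg, board_deg, seed=6)
    b = nx.Graph(b)                        # drop multiple links
    directors = range(len(director_deg))   # the first degree sequence is labelled first
    projection = bipartite.projected_graph(b, directors)
    print("clustering of the projection:", nx.average_clustering(projection))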