Clustering coefficient


In graph theory, a clustering coefficient is a measure of the degree to which nodes in a graph tend to cluster together. Evidence suggests that in most real-world networks, and in particular social networks, nodes tend to create tightly knit groups characterised by a relatively high density of ties; this likelihood tends to be greater than the average probability of a tie randomly established between two nodes.
Two versions of this measure exist: the global and the local. The global version was designed to give an overall indication of the clustering in the network, whereas the local gives an indication of the embeddedness of single nodes.

Global clustering coefficient

The global clustering coefficient is based on triplets of nodes. A triplet is three nodes that are connected by either two or three undirected ties. A triangle graph therefore includes three closed triplets, one centered on each of the nodes. The global clustering coefficient is the number of closed triplets over the total number of triplets. The first attempt to measure it was made by Luce and Perry. This measure gives an indication of the clustering in the whole network, and can be applied to both undirected and directed networks.
The global clustering coefficient is defined as:
The number of closed triplets has also been referred to as 3 × triangles in the literature, so:
A generalisation to weighted networks was proposed by Opsahl and Panzarasa, and a redefinition to two-mode networks by Opsahl.

Local clustering coefficient

The local clustering coefficient of a vertex in a graph quantifies how close its neighbours are to being a clique. Duncan J. Watts and Steven Strogatz introduced the measure in 1998 to determine whether a graph is a small-world network.
A graph formally consists of a set of vertices and a set of edges between them. An edge connects vertex with vertex.
The neighbourhood for a vertex is defined as its immediately connected neighbours as follows:
We define as the number of vertices,, in the neighbourhood,, of a vertex.
The local clustering coefficient for a vertex is then given by the proportion of links between the vertices within its neighbourhood divided by the number of links that could possibly exist between them. For a directed graph, is distinct from, and therefore for each neighbourhood there are links that could exist among the vertices within the neighbourhood. Thus, the local clustering coefficient for directed graphs is given as
An undirected graph has the property that and are considered identical. Therefore, if a vertex has neighbours, edges could exist among the vertices within the neighbourhood. Thus, the local clustering coefficient for undirected graphs can be defined as
Let be the number of triangles on for undirected graph. That is, is the number of subgraphs of with 3 edges and 3 vertices, one of which is. Let be the number of triples on. That is, is the number of subgraphs with 2 edges and 3 vertices, one of which is and such that is incident to both edges. Then we can also define the clustering coefficient as
It is simple to show that the two preceding definitions are the same, since
These measures are 1 if every neighbour connected to is also connected to every other vertex within the neighbourhood, and 0 if no vertex that is connected to connects to any other vertex that is connected to.

Network average clustering coefficient

As an alternative to the global clustering coefficient, the overall level of clustering in a network is measured by Watts and Strogatz as the average of the local clustering coefficients of all the vertices :
It is worth noting that this metric places more weight on the low degree nodes, while the transitivity ratio places more weight on the high degree nodes. In fact, a weighted average where each local clustering score is weighted by is identical to the global clustering coefficient.
A graph is considered small-world, if the graph has a small mean-shortest path length that scales with the natural log of the number of nodes in the network,
. For example, a random graph is small-world, while a lattice is not, and scale-free graphs are often considered ultra-small world, as their mean-shortest path length scales with.
A generalisation to weighted networks was proposed by Barrat et al., and a redefinition to bipartite graphs by Latapy et al. and Opsahl.
Alternative generalisations to weighted and directed graphs have been provided by Fagiolo and Clemente and Grassi.
This formula is not, by default, defined for graphs with isolated vertices; see Kaiser and Barmpoutis et al. The networks with the largest possible average clustering coefficient are found to have a modular structure, and at the same time, they have the smallest possible average distance among the different nodes.