leiden_communities#
- leiden_communities(G, *, weight='weight', resolution=1.0, max_level=None, seed=None, metric='cpm', theta=0.01)[source]#
Return best node communities of
G.This function uses the Leiden Community Detection algorithm [1] to estimate the community structure based on metric optimization. The metric can be modularity or Constant Potts Model (CPM). Leiden ensures that the communities are well connected whereas Louvain does not. See
louvain_communities. The functions which compute those two metrics aremodularityandconstant_potts_modelThe algorithm works in 3 phases. On the first phase, starting with a partition of all singleton nodes, and a randomly shuffled queue of all nodes, each node in the queue is considered in order and moved to the current community which most increases the metric value. The node can stay in its current community. If it moves, all neighbors of the moved node which are not currently in the queue are added to the end of the queue to be considered for moving in turn. The first phase continues until the queue is empty, resulting in a node partition \(P\).
The second phase refines \(P\) to obtain \(P_{refined}\). Each community in \(P\) is considered on its own and split into subcommunities using a method closely resembling the first phase. Starting from a subpartition of singleton nodes, and a randomly ordered queue, subcommunities are merged only if both are sufficiently well connected. This is determined randomly using a distribution based on edge weights and controlled by the parameter
theta. The merging results in a splitting or refinement of the communities in \(P\) such that each set in \(P_{refined}\) is a subset of one community in \(P\). Thus each community in \(P\) also represents a community of subcommunities from \(P_{refined}\).The third phase involves creating an aggregate network of community connections. The communities in \(P_{refined}\) act as nodes in the new network. Edges exist between communities that have underlying nodes connected to nodes in the other community. The weight of each such edge is the sum of the original graph edge weights for edges between nodes in the two partitions. An initial partition of the new network is provided via \(P\) by joining the subcommunities of each community in \(P\) into an aggregated community of the new network. Each subcommunity from \(P_{refined}\) is a subset of one community in \(P\) and thus belongs to one community in the new network.
The result of the 3 phases is a new network (level of aggregation) which encodes a new partition of the original network. We can reapply the 3 phases to create bigger aggregations with increased metric value. At each level the network is smaller with fewer nodes and edges, so convergence is guaranteed. The three phases are repeatedly applied until no gain is achieved or
max_levelapplications have been performed.- Parameters:
- GNetworkX graph
- weightstring or None, optional (default=”weight”)
The name of an edge attribute that holds the numerical value used as a weight. If None then each edge has weight 1.
- metricstr (default=”cpm”)
The name of the partition quality metric that the algorithm optimises. Allowed names are “cpm” and “modularity” for constant potts model and modularity respectively. See functions
modularityandconstant_potts_modelfor more info.- resolutionfloat, optional (default=1)
Resolution should be a positive number indicating the coarseness of the communities produced. With a lower resolution, larger communities are produced. To see the smaller sub-communities, use a higher resolution.
- max_levelint or None, optional (default=None)
The maximum number of levels (steps of the algorithm) to compute. Must be a positive integer or None indicating the process is reapplied until no increase in metric is obtained.
- seedinteger, random_state, or None (default)
Indicator of random number generation state. See Randomness.
- thetafloat (default=0.01)
Parameter that determines the degree of randomness in the second phase _refine_partition step of the algorithm,
- Returns:
- list
A partition of
Gas a list of disjoint sets of nodes. Each set represents one community of nodes and each node ofGappears in exactly one set.
Notes
The order in which the nodes are considered can affect the final output. In the algorithm the ordering happens using a random shuffle controlled by
seedwhich also controls the randomness involved in selecting communities to merge during the refining (2nd) stage.References
[1]Traag, V.A., Waltman, L. & van Eck, N.J. From Louvain to Leiden: guaranteeing well-connected communities. Sci Rep 9, 5233 (2019). https://doi.org/10.1038/s41598-019-41695-z
Examples
>>> G = nx.barbell_graph(3, 4) >>> cpm_comm = nx.community.leiden_communities(G, resolution=0.2, seed=62537129) >>> len(cpm_comm) 3 >>> sorted(sorted(c) for c in cpm_comm) [[0, 1, 2], [3, 4, 5, 6], [7, 8, 9]]
Higher resolution produces smaller sub-communities:
>>> cpm_comm = nx.community.leiden_communities(G, resolution=0.4, seed=62537129) >>> len(cpm_comm) 4 >>> sorted(sorted(c) for c in cpm_comm) [[0, 1, 2], [3, 4], [5, 6], [7, 8, 9]] ----
Additional backends implement this function
- cugraphGPU-accelerated backend.
- Additional parameters:
- dtypedtype or None, optional
The data type (np.float32, np.float64, or None) to use for the edge weights in the algorithm. If None, then dtype is determined by the edge values.