Source code for networkx.algorithms.bipartite.community

"""Community detection and quality functions for bipartite graphs."""

import networkx as nx
from networkx.algorithms.community.community_utils import is_partition
from networkx.algorithms.community.quality import NotAPartition
from networkx.utils.decorators import not_implemented_for

__all__ = ["modularity"]


[docs] @not_implemented_for("directed") @nx._dispatchable(name="bipartite_modularity", edge_attrs="weight") def modularity(G, communities, nodes, *, weight="weight", resolution=1): r"""Returns Barber's bipartite modularity of the given partition. Bipartite modularity [1]_ adapts Newman's modularity to bipartite networks by replacing the configuration-model null model with one that respects the bipartite structure: expected edges only occur between the two node sets. For a bipartite graph with "red" nodes (one set) and "blue" nodes (the other), it is defined as .. math:: Q_B = \frac{1}{m} \sum_{v=1}^{r} \sum_{w=1}^{c} \left( \tilde{A}_{vw} - \gamma\frac{k_v d_w}{m} \right) \delta(c_v, c_w) where $m$ is the (weighted) number of edges, $\tilde{A}$ is the bipartite adjacency matrix, $k_v$ is the (weighted) degree of red node $v$, $d_w$ is the (weighted) degree of blue node $w$, $\gamma$ is the resolution parameter, and $\delta(c_v, c_w)$ is 1 if $v$ and $w$ are in the same community, else 0. Following Clauset, Newman and Moore [2]_, this can be rewritten as a sum over communities: .. math:: Q_B = \sum_{c=1}^{n} \left[ \frac{L_c}{m} - \gamma\,\frac{k_c d_c}{m^2} \right] where $L_c$ is the (weighted) number of intra-community edges for community $c$, $k_c$ is the sum of degrees of the red nodes in $c$, and $d_c$ is the sum of degrees of the blue nodes in $c$. This is the form used in the implementation. Note the structural analogy with the standard (unipartite) sum formulation `L_c/m - gamma * (k_c / 2m)^2`: the null-model term is a product of two degree sums, and in the undirected unipartite case those two sums coincide so the product becomes a square. In the bipartite case the sums differ (red vs. blue), so the null model term remains a product of two distinct terms. Communities in a bipartite modularity partition may contain nodes from both bipartite sets; the bipartite structure enters only through the null model. Parameters ---------- G : NetworkX Graph An undirected bipartite graph. communities : list or iterable of sets of nodes These node sets must represent a partition of `G`'s nodes. Communities may contain nodes from both bipartite sets. nodes : container of nodes A container with all nodes in **one** bipartite node set (the "red" nodes). The other set is inferred as ``set(G) - set(nodes)``. This follows the convention used throughout the bipartite subpackage. weight : string or None, optional (default="weight") The edge attribute that holds the numerical value used as the edge weight. If None or if an edge does not have the attribute, that edge is treated as having weight 1. resolution : float (default=1) Resolution parameter $\gamma$. With the default ``resolution=1``, $Q_B$ reduces to Barber's original definition. Values smaller than 1 favor larger communities; values greater than 1 favor smaller communities. The resolution parameter is not part of Barber's original definition, but a standard extension consistent with :func:`networkx.community.modularity`. Returns ------- Q_B : float The bipartite modularity of the partition. Raises ------ NotAPartition If `communities` is not a valid partition of the nodes of `G`. NetworkXNotImplemented If `G` is directed. Barber's formulation is for undirected bipartite graphs. Examples -------- Two disconnected :math:`K_{2,2}` components form a strong bipartite community structure: >>> G = nx.Graph([(0, 2), (1, 3)]) >>> red = {0, 1} >>> nx.bipartite.modularity(G, [{0, 2}, {1, 3}], red) 0.5 Notes ----- The functions in the bipartite package do not check that `nodes` is actually one side of a bipartition of `G`. It is the caller's responsibility to provide a valid bipartite graph and a valid bipartite node set. See Also -------- modularity References ---------- .. [1] Barber, M. J. (2007). "Modularity and community detection in bipartite networks." Physical Review E, 76(6), 066102. https://doi.org/10.1103/PhysRevE.76.066102 .. [2] Clauset, A., Newman, M. E. J., and Moore, C. (2004). "Finding community structure in very large networks." Physical Review E, 70(6), 066111. """ if not isinstance(communities, list): communities = list(communities) if not is_partition(G, communities): raise NotAPartition(G, communities) red = set(nodes) blue = set(G) - red degree = dict(G.degree(weight=weight)) # Every edge has exactly one red and one blue endpoint, so the sum of # red degrees equals the total (weighted) number of edges m. m = sum(degree[v] for v in red) if m == 0: return 0.0 norm = 1 / m**2 def community_contribution(community): comm = set(community) # L_c: (weighted) internal edges. In a bipartite graph edges only # exist between the two node sets, so iterating over all edges # with both endpoints in comm counts each bipartite edge exactly # once. L_c = sum(wt for _, v, wt in G.edges(comm, data=weight, default=1) if v in comm) k_c = sum(degree[u] for u in comm & red) d_c = sum(degree[u] for u in comm & blue) return L_c / m - resolution * k_c * d_c * norm return sum(community_contribution(c) for c in communities)
def _bipartite_modularity_merge_delta( G, community_a, community_b, resolution, weight="weight", *, red_degree_attr="red_degree", blue_degree_attr="blue_degree", m, ): r"""Change in bipartite modularity from merging two disjoint communities. Computes ``Q_B(P') - Q_B(P)`` where P contains `community_a` and `community_b` as separate communities and P' is obtained by replacing them with their union ``community_a U community_b``. For bipartite modularity ``Q_B = sum_c [L_c/m - gamma k_c d_c / m^2]``, the merge delta has the closed form merge_delta = E(A, B) / m - gamma * (k_A d_B + k_B d_A) / m^2 where ``E(A, B)`` is the sum of edge weights between A and B, ``k_C`` is the sum of the "red" degrees of nodes in C, and ``d_C`` is the sum of the "blue" degrees of nodes in C. **Attribute-based interface.** Rather than identifying each node with a single bipartite side (which breaks once Louvain/Leiden has aggregated communities into super-nodes that mix both colors), this helper reads per-node ``red_degree_attr`` and ``blue_degree_attr`` attributes. On the original bipartite graph a red node has ``red_degree = G.degree(v)`` and ``blue_degree = 0`` (and vice versa). On an aggregated graph each super-node carries the sums of both attributes over its constituent nodes; both may be nonzero. This is the same pattern used for directed modularity (in-degree / out-degree tracked as separate attributes). `community_a` and `community_b` must be disjoint and non-empty. Self-loops are tolerated: on the original graph they are absent by the bipartite convention, and on aggregated graphs they represent intra-community edges which are not part of ``E(A, B)`` and are correctly excluded by the ``v in community_b`` filter below. Parameters ---------- G : NetworkX Graph A graph whose nodes carry the red/blue degree attributes. May be an original bipartite graph or an aggregated graph produced by a multilevel optimizer. community_a, community_b : set Two disjoint communities to merge. resolution : float Resolution parameter gamma. weight : str or None Edge weight attribute. red_degree_attr : str Keyword-only. Name of the node attribute holding the "red-side" contribution to the node's degree. blue_degree_attr : str Keyword-only. Name of the node attribute holding the "blue-side" contribution to the node's degree. m : float Keyword-only. Total (weighted) edge count, a graph-level invariant preserved across aggregation. """ # Iterate over community members rather than all of G.nodes nodes = G.nodes k_a = sum(nodes[v].get(red_degree_attr, 0) for v in community_a) d_a = sum(nodes[v].get(blue_degree_attr, 0) for v in community_a) k_b = sum(nodes[v].get(red_degree_attr, 0) for v in community_b) d_b = sum(nodes[v].get(blue_degree_attr, 0) for v in community_b) # E(A, B): edges with one endpoint in A and the other in B. Iterate # edges incident to community_a and keep those landing in community_b. # Self-loops (intra-community edges on aggregated super-nodes) are # filtered out because they have both endpoints in community_a. E_cross = sum( wt for _, v, wt in G.edges(community_a, data=weight, default=1) if v in community_b ) return E_cross / m - resolution * (k_a * d_b + k_b * d_a) / m**2