butterflies#

butterflies(G, nodes=None)[source]#

Count the number of butterflies for each node in a bipartite graph.

A butterfly is a complete bipartite subgraph K_{2,2} — four nodes (two from each partition) with all four cross-edges present. It is the bipartite analogue of a triangle in unipartite graphs.

A1   A2
| \ / |
|  X  |
| / \ |
B1   B2

Equivalently, a butterfly is a 4-cycle (C_4) in which alternating nodes belong to different partitions of the bipartite graph. This structure is also called a square in the physics and complex-networks literature [3], and a four-cycle in the sociology literature [4]. The name butterfly is standard in the data-mining and bipartite-network literature [1] [2].

Parameters:
GNetworkX graph

An undirected bipartite graph.

nodesiterable of nodes, optional (default: all nodes)

Return butterfly counts only for these nodes. The computation always uses the full graph; nodes only filters the returned dictionary (same convention as triangles()). When None (default), counts for all nodes are returned. Nodes not present in G are silently ignored.

Returns:
butterfliesdict

A dictionary keyed by node to the number of butterflies that node participates in. Each butterfly is counted once per participating node, so:

sum(butterflies(G).values()) == 4 * total_butterfly_count

See also

robins_alexander_clustering

graph-level bipartite clustering coefficient whose numerator is 4 * total butterfly count.

networkx.algorithms.cluster.square_clustering

per-node square clustering coefficient for general graphs.

Notes

This algorithm uses wedge enumerate to count butterflies. The wedge enumeration method uses the vertex-priority of BFC-VP [2] : nodes are ranked by degree, neighbor lists are pre-sorted by ascending rank, and for each start node u only neighbors with lower rank are processed as middle and end nodes, with early termination on the sorted lists. This gives time complexity \(O\!\bigl(\sum_{(u,v)\in E} \min(d(u), d(v))\bigr) = O(\alpha m)\) where \(\alpha\) is the arboricity, i.e. the minimum number of forests into which \(E\) can be partitioned, and \(m\) the number of edges.

The per-node attribution (distributing butterfly credits to start, middle, and end nodes) is an extension of BFC-VP not present in the original paper, which computes only a global count.

References

[1]

Sanei-Mehri, S. V., Sariyuce, A. E., & Tirthapura, S. (2018). Butterfly counting in bipartite networks. Proceedings of the 24th ACM SIGKDD, 2150–2159. https://doi.org/10.1145/3219819.3220097

[2] (1,2)

Wang, K., Lin, X., Qin, L., Zhang, W., & Zhang, Y. (2023). Accelerated butterfly counting with vertex priority on bipartite graphs. The VLDB Journal, 32, 257–281. https://doi.org/10.1007/s00778-022-00746-0

[3]

Lind, P. G., Gonzalez, M. C., & Herrmann, H. J. (2005). Cycles and clustering in bipartite networks. Physical Review E, 72, 056127. https://doi.org/10.1103/PhysRevE.72.056127

[4]

Robins, G. and M. Alexander (2004). Small worlds among interlocking directors: Network structure and distance in bipartite graphs. Computational & Mathematical Organization Theory 10(1), 69–94. https://doi.org/10.1023/B:CMOT.0000032580.12184.c0

Examples

A single K_{2,2} contains exactly one butterfly, and each of its four nodes participates in that butterfly:

>>> G = nx.complete_bipartite_graph(2, 2)
>>> nx.bipartite.butterflies(G)
{0: 1, 1: 1, 2: 1, 3: 1}

The total number of butterflies is the sum divided by 4:

>>> bt = nx.bipartite.butterflies(G)
>>> sum(bt.values()) // 4
1

K_{3,3} contains nine butterflies; every node participates in six:

>>> G2 = nx.complete_bipartite_graph(3, 3)
>>> bt2 = nx.bipartite.butterflies(G2)
>>> sum(bt2.values()) // 4
9

Nodes not in any butterfly receive count zero:

>>> G = nx.complete_bipartite_graph(2, 2)
>>> G.add_edge(0, 5)
>>> nx.bipartite.butterflies(G)
{0: 1, 1: 1, 2: 1, 3: 1, 5: 0}