NXEP 3 — Graph Builders#

Author:

Dan Schult

Author:

Kelly Boothby

Status:

Draft

Type:

Standards Track

Created:

2020-11-27

Revision:

Spring 2023

Abstract#

Graph generators in NetworkX create a Graph starting from an object specified in the create_using argument. Many of these generators do no more than create edges to add to the graph. Sometimes all we want the graph for is to generate those edges. It might be better to allow the graph generator functions to return either the edgelist or a graph object as specified by create_using. This NXEP proposes a framework for graph builders which allows a user friendly interface to this feature and decorators to make it easy for developers to provide this feature whether the graph builder algorithm requires a graph, or just edges.

Motivation and Scope#

Consider, for example, the function nx.path_graph(nodes, create_using). It creates the edges for the indicated path and adds them to a graph data structure created using the type create_using. path_graph does not use the graph structure to create the edges being generated and could arguably just yield the edges without involving the data structure at all. This NXEP proposes the syntax nx.path_graph_generator(nodes) to obtain a generator of the graph. That is, the result iterates over the edge-list for the graph without constructing the graph itself. Note that we are adding a new function for each of the existing graph building functions. The API is *_graph() for the graph builder, and *_graph_generator() for the generator of edges.

An edge-list is actually not sufficient to represent a graph because of node attributes, graph attributes and isolated nodes. To handle these more exotic elements, we introduce the Graph Sequence as a way to specify all nodes, edges and attributes for a graph. The generator yields this sequence.

Separating edge generation from graph data structure creation arguably makes a cleaner interface where independent tools can be put together in creative ways. To the extent that users need to generate edges rather than graphs, having an edge generator that doesn’t store the graph is an advantage. It’s not exactly clear how much demand there is for this feature. But as an example of the flexibility of this approach, the nx.utils.pairwise(nbunch) call could be replaced via nx.path_graph_generator(nbunch).

The create_using parameter is a mechanism to tell the builder function what class of graph data structure to start with. Separating edge generation from graph construction would mean the edge generator function would no longer need a type for the graph data structure when there isn’t one. This NXEP proposes one way to provide an interface that separates edge generation from graph data structure creation when desired, while leaving the familiar create_using syntax for graph type selection when desired.

Graph Sequences#

Edgelists that only contain pairs of nodes indicating an edge are restrictive. Some graphs have isolated nodes which would not appear in any node-pair. Some graphs have node or edge attributes associated with the node or edge. Multigraphs have edge keys associated with each edge, often as a 3-tuple (u, v, ekey). This proposal suggests that we adopt the following Graph Sequence protocol for describing a graph.

A Graph Sequence is a sequence-of-sequences, usually a list-of-tuples. The length of each tuple along with the hashable nature of the tuple’s final element determine the type of information included in the tuple. All networkx graph information can be stored in such a sequence. The logic is as follows where S denotes the inner sequence:

len(S)

hashable(S[-1])

Graph attributes:

1

False

Node without attributes:

1

True

Node with attributes:

2

False

Edge without attributes:

2

True

Edge with attributes:

3

False

Multiedge without attributes:

3

True

Multiedge with attributes:

4

False

Here is some code to process such a sequence and construct the graph starting from an empty graph G:

for S in graphsequence:
    N = len(S)
    last_entry = S[-1]
    attrs = not hashable(last_entry)
    if N == 1:
        if attrs:
            G.graph.update(last_entry)  # graph attributes
        else:
            G.add_node(last_entry)  # node without attributes
    elif N == 2:
        if attrs:
            G.add_node(S[0], **last_entry)  # (node, attrdict)
        else:
            G.add_edge(*S)  # (u, v)
    elif N == 3:
        if attrs:
            G.add_edge(*S[:2], **last_entry)  # (u, v, attrdict)
        else:
            G.add_edge(*S)  # (u, v, edge_key)
    elif N == 4:
        if attrs:
            G.add_edge(*S)  # (u, v, edge_key, attrdict)
        else:
            raise NetworkXInvalidEdgelist(
                "Sequence element has 4 items and last is not hashable: {S}"
            )
    else:
        raise NetworkXInvalidEdgelist(
            "Graph Sequence element has more than 4 items: {S}"
        )

Note that order does not affect the network structure, but the reporting order of nodes can be retained if the nodes are added in the desired order before the edges.

Usage and Impact#

Each graph builder function (formerly called graph generators) will allow return of graph structures with the same syntax as before. For example, create a wheel graph with 9 spokes (10 nodes):

>>> G = nx.wheel_graph(9)  # same as current code

To iterate over a Graph Sequence without creating a graph:

>>> for u, v in nx.binomial_graph.edges(9):
>>>     process(u, v)

Add 10 new nodes with random edges (maybe including isolated nodes) to an existing graph G:

>>> G.update(nx.binomial_graph_generator(range(9, 19))

Construct a path graph using a MultiDiGraph data structure (two methods):

>>> MDG = nx.path_graph([3, 4, 2, 5, 7, 6], create_using=MultiDiGraph)
>>> MDG = nx.MultiDiGraph(nx.path_graph_generator([3, 4, 2, 5, 7, 6])

The code to read in an edgelist upon instantiation, or via the update method will change to allow Graph Sequences in addition to edgelists. An additional base class method G.as_sequence() will yield the Graph Sequence for the graph.

Developers will use a decorator to indicate whether their graph builder has underlying code that yields from an edgelist, or returns a graph.

@graph_builder
@py_random_state(4)
def extended_barabasi_albert_graph(n, m, p, q, seed=None):
    # some fancy code that requires we construct G to use graph properties
    # while we decide what edges to add next.
    return G

The @graph_builder decorator adds code to enable e.g. nx.extended_barabasi_albert_graph_generator.

Another decorator provides code to handle the create_using argument for developers that write code which simply yields an edgelist.

@node_and_edge_builder
def ladder_graph_generator(n):
    yield from pairwise(range(n))
    yield from pairwise(range(n, 2 * n))
    yield from ((v, v + n) for v in range(n))

The @node_and_edge_builder decorator adds code to enable e.g. nx.ladder_graph(6, create_using=MultiGraph). Note that nx.ladder_graph(6) would still return an nx.Graph as it currently does. To make use of the edgelist functionality, the syntax would be nx.ladder_graph.edges(6).

It would be ideal to have the doc_strings show up on a single webpage for both the function and generator. We are exploring this possibility.

Backward compatibility#

To reduce backward incompatibility, the base calling structure nx.path_graph(9) works as it currently does. The create_using parameter behaves as usual. So, no existing code should break.

To reduce developer impact, upon inception, we could reuse all current graph generators as graph builders by attaching the @graph_builder decorator. Presumably for efficiency many of them should be rewritten to yield edgelists rather than returning graphs. But this could be done gradually and along with switching the decorator to @node_and_edge_builder. Both sets of code should return equivalent graph builder objects.

Detailed description#

This can be accomplished through a couple decorators, which could be adopted gradually – a big patch initially decorating all existing generators with @graph_builder would immediately support the notation nx.complete_graph_generator(...) without impacting existing code. Later generators could use @node_and_edge_builder.

** NEEDS UPDATING **

def node_and_edge_builder(f):
    @wraps(f)
    def graph(*args, create_using=None, **kwargs):
        G = nx.empty_graph(0, create_using=create_using)
        G.update(f(*args, **kwargs))
        return G

    graph.edges = f
    graph.edges_plus = f
    return graph


def graph_builder(f):
    @wraps(f)
    def edgelist(*args, **kwargs):
        G = f(*args, **kwargs)
        return itertools.ichain(map(tuple, G.nodes.data()), map(tuple, G.edges.data()))

    def edges(*args, **kwargs):
        G = f(*args, **kwargs)
        return map(tuple, G.edges.data())

    f.edges_plus = edgelist
    f.edges = edges

    return f

Note: the graph_builder underlying code should accept a create_using parameter for this implementation to work. We need to think if this is universally applicable and how to handle builders that shouldn’t work with all four of the major NetworkX graph classes.

Graph.update will need to handle graph sequence input. It currently handles node-pairs and node-pair-with-edge-key triples for multigraphs. Code like that shown above in the description of Graph Sequences should be used.

Example developer usage:

@node_and_edge_builder
def path_graph(n):
    """an overly simplified path graph implementation"""
    return pairwise(range(n))


@graph_builder
def complete_graph(n, create_using=None):
    """an overly simplified complete graph implementation"""
    if create_using is None:
        create_using = nx.Graph
    g = empty_graph(0, create_using)
    g.update(itertools.combinations(range(n), 2))
    return g

Implementation#

The first major step is to implement the two builder decorators. Next we need to change the Graph update methods, convert functions, etc. to process graph sequences that contain isolated nodes and data attributes. Third we should identify any functions that build graphs or edgelists and decorate them to make them Graph Builders. And we should take care that code which handles edgelists and are not able to handle Graph Sequences are appropriately protected.

Special care should be made to ensure only desired graph types are accepted and appropriate errors raised when not.

Later steps include going through the existing generator code and switching that code to yield edgelists instead of returning graphs (where appropriate).

Alternatives#

#) We can just leave the generators as they are and deal with the cost of creating a graph when one only needs the edgelist. It’s not a huge cost most of the time.

#) We can create an attribute-function on the graph builder function to provide the generator functionality. Like: nx.path_graph.edges_plus() along with nx.path_graph.edges().

#) We can provide nx.path_graph.edges_plus without providing the edges attribute. The simpler interface (by one attribute function) costs us making sure that easy tools for creating, consuming and handling Graph Sequences are available.

#) We can provide the decorators to use an attribute syntax for graph type instead of the argument create_using. Thus nx.path_graph.MultiGraph(9) would be the same as nx.path_graph(9, create_using=nx.MultiGraph). Similarly for Graph, DiGraph, MultiDiGraph and perhaps CustomGraph with a kwarg create_using.

An earlier version of this proposal included this attribute-style alternative as a replacement of the create_using argument. Developers would still write code to either 1) yield edges, or 2) construct a graph from an input graph parameter. Two decorators would then add the extra code needed to construct a single object so users would use the same interface no matter which style of underlying code was used. The user facing interface would allow the user to specify a graph data structure by type, or request an edgelist. One syntax proposal was:

G = nx.path_graph(9)
DG = nx.path_graph.DiGraph(9)
MG = nx.path_graph.MultiGraph(9)
MDG = nx.path_graph.MultiDiGraph(9)
CG = nx.path_graph.CustomGraph(9, create_using)
elist = nx.path_graph.edgelist(9)

This can be accomplished through decorators named as above, and coded similar to these examples.

def node_and_edge_builder(f):
    @wraps(f)
    def graph(*args, **kwargs):
        return nx.Graph(f(*args, **kwargs))

    def digraph(*args, **kwargs):
        return nx.DiGraph(f(*args, **kwargs))

    def multigraph(*args, **kwargs):
        return nx.MultiGraph(f(*args, **kwargs))

    def multidigraph(*args, **kwargs):
        return nx.MultiDiGraph(f(*args, **kwargs))

    def custom_graph(*args, create_using=None, **kwargs):
        g = create_using()
        g.update(f(*args, **kwargs))
        return g

    graph.Graph = graph
    graph.DiGraph = digraph
    graph.MultiGraph = multigraph
    graph.MultiDiGraph = multidigraph
    graph.CustomGraph = custom_graph
    graph.edgelist = f
    return graph


def graph_builder(f):
    @wraps(f)
    def edgelist(*args, **kwargs):
        g = f(*args, **kwargs)
        return itertools.ichain(map(tuple, G.nodes.data()), map(tuple, G.edges.data()))

    f.edgelist = edgelist
    f.CustomGraph = f

    def graph(*args, **kwargs):
        return f(*args, create_using=nx.Graph, **kwargs)

    def digraph(*args, **kwargs):
        return f(*args, create_using=nx.DiGraph, **kwargs)

    def multigraph(*args, **kwargs):
        return f(*args, create_using=nx.MultiGraph, **kwargs)

    def multidigraph(*args, **kwargs):
        return f(*args, create_using=nx.MultiDiGraph, **kwargs)

    f.Graph = graph
    f.DiGraph = digraph
    f.MultiGraph = multigraph
    f.MultiDiGraph = multidigraph
    return f

#) We might be able to avoid the function attribute syntax altogether if we can construct a Graph Sequence generator object (of class EdgesPlus?) that can be provided as the create_using argument. Graph builder code would treat it like a graph class, but the object would magically handle all the add_node and add_edge calls in the style of an iterator, yielding graph information as construction progresses. The user could halt construction if the structure showed early signs of not being useful. Graph Sequences could then be generated using nx.path_graph, create_using=EdgesPlus) This could perhaps be built with some creative coroutine magic.

Discussion#

Most of the ideas here are from - [#3036] which built on discussion from - [#1393]

Over a year of occasional thought and more occasional mentioning-in-passing, most core developers feel that the create_using parameter should be retained. The proposal was rewritten to retain that feature and not develop the attribute syntax seen in the Alternatives section. The attribute syntax continued to be used for the edges and edges_plus generators, with edges for an edge-list and edges_plus for the full Graph Sequence.

More discussion led to the proposal to make two functions for each graph type. That is, nx.path_graph(9) for a graph and nx.path_graph_generator(9) for the generator. This gained enough support to be included. No attribute support is needed.

Remaining questions: - how to handle docs - how to have a decorator add two functions to the namespace.