NXEP 3 — Graph Builders

Author

Dan Schult

Author

Kelly Boothby

Status

Draft

Type

Standards Track

Created

2020-11-27

Abstract

Graph generators in NetworkX create a Graph starting from an object specified in the create_using argument. Many of these generators do no more than create edges to add to the graph. Sometimes all we want the graph for is to generate those edges. It might be better to allow generators to report either the edgelist or any of the standard graph classes or a custom class. This NXEP proposes a framework for graph builders which allows a user friendly interface to these features and decorators to make it easy for developers to provide these features whether the graph builder algorithm requires a graph, or just edges.

Motivation and Scope

Consider, for example, the function nx.path_graph(nodes, create_using). It creates the edges for the indicated path and adds them to an empty graph data structure created using the type create_using. path_graph does not use the graph structure to create the edges being generated and could arguably just yield the edges without involving the data structure at all. The parameter create_using is used to indicate the type of graph data structure to use. This could be indicated by passing an edge generator to the type constructor e.g. nx.MultiDiGraph(nx.path_edges(n)) instead of nx.path_graph(n, create_using=nx.MultiDiGraph). The former style allows the edges to be used without creating any graph data structure if that is desired. The latter is preferred stylistically because the main idea of the code phrase is to create a path graph and that comes first. Details such as what graph type to use are specified later in the phrase.

Separating edge generation from graph data structure creation arguably makes a cleaner interface where independent tools can be put together in creative ways. To the extent that users need to generate edges rather than graphs, having an edge generator that doesn’t store the graph is an advantage. It’s not exactly clear how much demand there is for this feature. But e.g. nx.utils.pairwise would no longer be needed if we had nx.path_edges(node_iterable).

The create_using parameter is a mechanism to tell the function what class of graph data structure to start with. Separating edge generation from graph construction would mean the edge generator function would no longer need a type for the graph data structure since there isn’t one. This NXEP proposes one way to provide an interface that separates edge generation from graph data structure creation when desired, while leaving a user-friendly mechanism for selection the graph type when desired.

The changes proposed involve any nx function that creates a graph or an edgelist. The proposal is to make these functions return a graph of any type or an edgelist based on the choice indicated by the user. Developers choose whether it is more effective to yield edges or to return a graph. Decorators are used to construct the surrounding code to enable the other output styles.

The proposed solution is to provide the user with graph builders that return either a graph or an edgelist while minimizing the code needed for developers to support both. The underlying code could choose to either 1) yield edges, or 2) construct a graph from an input graph parameter. Two decorators would then add the extra code needed to construct a single object so users would use the same interface no matter which style of underlying code was used. The user facing interface would allow the user to specify a graph data structure by type, or request an edgelist. One syntax proposal is:

G = nx.path_graph(9)
DG = nx.path_graph.DiGraph(9)
MG = nx.path_graph.MultiGraph(9)
MDG = nx.path_graph.MultiDiGraph(9)
CG = nx.path_graph.CustomGraph(9, create_using)
elist = nx.path_graph.edgelist(9)

Edgelists that only contain pairs of nodes indicating an edge are restrictive. Some graphs have isolated nodes which would not appear in any node-pair. Some graphs have node or edge attributes associated with the node or edge. Multigraphs have edge keys associated with each edge, often as a 3-tuple (u, v, ekey). This proposal suggests that we adopt the following edgelist protocol for describing a graph (perhaps there is a better name than edgelist).

An edgelist is a sequence of containers. The length of the container along with the hashable nature of it’s last element determined the type of information included in the container. All currently used graph information can be stored in such a sequence. The logic is as follows where C denotes the container:

len(C)

hashable(C[-1])

Graph attributes:

1

False

Node without attributes:

1

True

Node with attributes:

2

False

Edge without attributes:

2

True

Edge with attributes:

3

False

Multiedge without attributes:

3

True

Multiedge with attributes:

4

False

Here is some code to process such an edgelist and construct the graph starting from an empty graph G:

for C in edgelist:
 if len(C) == 1:
   if not hashable(C[-1]):
     G.graph.update(C[-1])  # C[-1] is a dict of graph attributes
   else:
     G.add_node(C[-1])  # C[-1] is a node
 elif len(C) == 2:
   if not hashable(C[-1]):
     G.add_node(C[0], **C[-1])  # C[-1] is a dict of node attributes
   else:
     G.add_edge(*C)  # C is a node-pair indicating an edge
 elif len(C) == 3:
   if not hashable(C[-1]):
     G.add_edge(*C[:2], **C[-1])  # C -> (u, v, attrdict)
   else:
     G.add_edge(*C)  # C -> (u, v, edge_key)
 elif len(C) == 4:
     assert not hashable(C[-1])
     G.add_edge(*C)  # C -> (u, v, edge_key, attr_dict)
 else:
     raise NetworkXInvalidEdgelist(
         "no container in an edgelist should be larger than 4 objects."
     )

Usage and Impact

Users will build graphs using similar syntax as before with added flexibility.

Create a wheel graph with 9 spokes (10 nodes):

>>> G = nx.wheel_graph(9)  # same as current code

Construct a path graph using a MultiDiGraph data structure:

>>> MDG = nx.path_graph.MultiDiGraph([3, 4, 2, 5, 7, 6])
>>> # current code:
>>> MDG = nx.path_graph([3, 4, 2, 5, 7, 6], create_using=MultiDiGraph)

Construct a star graph using a CustomGraph subclass of a NetworkX graph class.

>>> G = nx.star_graph.CustomGraph(9, MyCustomGraph)
>>> # current code:
>>> G = nx.star_graph(9, create_using=MyCustomGraph)

Add a complete graph to an existing graph G:

>>> G.update(nx.complete_graph.edgelist(range(len(G) - 10, 20))

Iterate over the edges of a randomly generated graph without storing it.

>>> for u, v in nx.configuration_model_graph.edgelist(deg_sequence):
>>>     process(u, v)

Developers will use a decorator to indicate whether their graph builder has underlying code that yields from an edgelist, or returns a graph.

@graph_builder
@py_random_state(4)
def extended_barabasi_albert_graph(n, m, p, q, seed=None)
    # some fancy code that requires we construct G to use graph properties
    # while we decide what edges to add next.
    return G

The @graph_builder decorator adds code to enable e.g. nx.extended_barabasi_albert_graph.edgelist.

For most graph builders we simply yield from an edgelist.

@node_and_edge_builder
def ladder_graph(n):
    yield from pairwise(range(n))
    yield from pairwise(range(n, 2 * n))
    yield from ((v, v + n) for v in range(n))

The @node_and_edge_builder decorator adds code to enable e.g. nx.ladder_graph.MultiGraph(6). Note that nx.ladder_graph(6) would still return an nx.Graph as it currently does. To make use of the edgelist functionality yielding edge without graph constructing, the syntax would be nx.ladder_graph.edgelist(6).

Backward compatibility

To reduce backward incompatibility, the base calling structure nx.path_graph(9) works as it currently does. The create_using parameter is removed and replaced by an attribute of the calling function. So nx.path_graph(9, nx.DiGraph) becomes nx.path_graph.DiGraph(9). The create_using parameter could also be retained providing more backward compatibility at the potential cost of providing at least 2 ways to create the same graph: nx.path_graph(9, create_using=nx.DiGraph) and nx.path_graph.DiGraph(9). See the Alternatives section.

Due to the renaming of graph generators as graph builders (to avoid confusion with Python’s generator functions) anyone using full-path calling syntax e.g., nx.generators.path_graph(9) will need to change to nx.path_graph(9) or nx.builders.path_graph(9) though the latter is discouraged. This change of name is independent of the main thrust of this proposal. But it seems a reasonable time to make such a change.

To reduce developer impact, upon inception, we could use all current graph generators as graph builders by attaching the @graph_builder decorator. Presumably for efficiency many of them should be rewritten to yield edgelists rather than returning graphs. But this could be done gradually and when done switch the decorator to @node_and_edge_builder. Both should return equivalent graph builder objects.

Detailed description

This can be accomplished through a couple decorators, which could be adopted gradually – a big patch initially decorating all existing generators with @graph_builder would immediately support the notation nx.complete_graph.edgelist(...) without impacting existing code. Later generators could use @node_and_edge_builder.

def node_and_edge_builder(f):
    @wraps(f)
    def graph(*args, **kwargs):
        return nx.Graph(f(*args, **kwargs))
    def digraph(*args, **kwargs):
        return nx.DiGraph(f(*args, **kwargs))
    def multigraph(*args, **kwargs):
        return nx.MultiGraph(f(*args, **kwargs))
    def multidigraph(*args, **kwargs):
        return nx.MultiDiGraph(f(*args, **kwargs))
    def custom_graph(*args, create_using=None, **kwargs):
        g = create_using()
        g.update(f(*args, **kwargs))
        return g
    graph.Graph = graph
    graph.DiGraph = digraph
    graph.MultiGraph = multigraph
    graph.MultiDiGraph = multidigraph
    graph.CustomGraph = custom_graph
    graph.edgelist = f
    return graph

def graph_builder(f):
    @wraps(f)
    def edgelist(*args, **kwargs):
        g = f(*args, **kwargs)
        return itertools.ichain(
            map(tuple, G.nodes.data()), map(tuple, G.edges.data())
        )
    f.edgelist = edgelist
    f.CustomGraph = f
    def graph(*args, **kwargs):
        return f(*args, create_using=nx.Graph, **kwargs))
    def digraph(*args, **kwargs):
        return f(*args, create_using=nx.DiGraph, **kwargs))
    def multigraph(*args, **kwargs):
        return f(*args, create_using=nx.MultiGraph, **kwargs))
    def multidigraph(*args, **kwargs):
        return f(*args, create_using=nx.MultiDiGraph, **kwargs))
    f.Graph = graph
    f.DiGraph = digraph
    f.MultiGraph = multigraph
    f.MultiDiGraph = multidigraph
    return f

Note: the graph_builder underlying code should accept a create_using parameter for this implementation to work. We need to think if this is universally applicable and how to handle builders that shouldn’t work with all four of the major NetworkX graph classes.

Graph.update will need to handle an edgelist input. It currently handles node-pairs and node-pair with edge key triples for multigraphs. Code like that shown above in the description of Edgelist should be used.

Example developer usage:

@node_and_edge_builder
def path_graph(n):
    """an overly simplified path graph implementation"""
    return pairwise(range(n))

@graph_builder
def complete_graph(n, create_using=None):
    """an overly simplified complete graph implementation"""
    if create_using is None:
        create_using = nx.Graph
    g = empty_graph(0, create_using)
    g.update(itertools.combinations(range(n), 2))
    return g

Implementation

The first major step is to implement the two builder decorators. Next we need to change the Graph update methods, convert functions, etc. to process edgelists that contain isolated nodes and data attributes. Third we should identify any functions that build graphs or edgelists and decorate them to make them Graph Builders.

Special care should be made to ensure only desired graph types are accepted and appropriate errors raised when not.

We should rename the generators directory as builders and adjust documentation where needed appropriately (including old documentation getting the correct canonical url).

Later steps include going through the existing generator code and switching that code to yield edgelists instead of returning graphs (where appropriate).

Alternatives

We can just leave the generators as they are and deal with the cost of creating a graph when one only needs the edgelist. It’s not a huge cost most of the time.

We can split the edge generation from graph creation using nx.DiGraph(nx.path_edgelist(9)) and disallowing create_using.

We can implement the proposal retaining the create_using parameter for backward compatibility.

Discussion

Most of the ideas here are from - [#3036] which built on discussion from - [#1393]