---
jupytext:
notebook_metadata_filter: all
text_representation:
extension: .md
format_name: myst
format_version: 0.13
jupytext_version: 1.13.8
kernelspec:
display_name: Python 3
language: python
name: python3
language_info:
codemirror_mode:
name: ipython
version: 3
file_extension: .py
mimetype: text/x-python
name: python
nbconvert_exporter: python
pygments_lexer: ipython3
version: 3.9.2
---
# Directed Acyclic Graphs & Topological Sort
In this tutorial, we will explore the algorithms related to a directed acyclic graph
(or a "dag" as it is sometimes called) implemented in networkx under `networkx/algorithms/dag.py`.
First of all, we need to understand what a directed graph is.
## Directed Graph
### Example
```{code-cell} ipython3
%matplotlib inline
import networkx as nx
import matplotlib.pyplot as plt
```
```{code-cell} ipython3
triangle_graph = nx.from_edgelist([(1, 2), (2, 3), (3, 1)], create_using=nx.DiGraph)
```
```{code-cell} ipython3
nx.draw_planar(
triangle_graph,
with_labels=True,
node_size=1000,
node_color="#ffff8f",
width=0.8,
font_size=14,
)
```
### Definition
In mathematics, and more specifically in graph theory,
a directed graph (or DiGraph) is a graph that is made up of a set of vertices
connected by directed edges often called arcs.
Edges here have _directionality_, which stands in contrast to undirected graphs
where, semantically, edges have no notion of a direction to them.
Directed acyclic graphs take this idea further;
by being _acyclic_, they have no _cycles_ in them.
You will see this idea in action in the examples below.
## Directed Acyclic Graph
### Example
```{code-cell} ipython3
clothing_graph = nx.read_graphml(f"data/clothing_graph.graphml")
```
```{code-cell} ipython3
plt.figure(figsize=(12, 12), dpi=150)
nx.draw_planar(
clothing_graph,
arrowsize=12,
with_labels=True,
node_size=8000,
node_color="#ffff8f",
linewidths=2.0,
width=1.5,
font_size=14,
)
```
Here is a fun example of Professor Bumstead,
who has a routine for getting dressed in the morning.
By habit, the professor dons certain garments before others (e.g., socks before shoes).
Other items may be put on in any order (e.g., socks and pants).
A directed edge $(u, v)$ in the example indicates that garment $u$
must be donned before garment $v$.
In this example, the `clothing_graph` is a DAG.
```{code-cell} ipython3
nx.is_directed_acyclic_graph(clothing_graph)
```
By contrast, the `triangle_graph` is not a DAG.
```{code-cell} ipython3
nx.is_directed_acyclic_graph(triangle_graph)
```
This is because the `triangle_graph` has a cycle:
```{code-cell} ipython3
nx.find_cycle(triangle_graph)
```
### Applications
Directed acyclic graphs representations of partial orderings have many applications in scheduling
of systems of tasks with ordering constraints.
An important class of problems of this type concern collections of objects that need to be updated,
for example, calculating the order of cells of a spreadsheet to update after one of the cells has been changed,
or identifying which object files of software to update after its source code has been changed.
In these contexts, we use a dependency graph, which is a graph that has a vertex for each object to be updated,
and an edge connecting two objects whenever one of them needs to be updated earlier than the other.
A cycle in this graph is called a circular dependency, and is generally not allowed,
because there would be no way to consistently schedule the tasks involved in the cycle.
Dependency graphs without circular dependencies form DAGs.
A directed acyclic graph may also be used to represent a network of processing elements.
In this representation, data enters a processing element through its incoming edges
and leaves the element through its outgoing edges.
For instance, in electronic circuit design, static combinational logic blocks
can be represented as an acyclic system of logic gates that computes a function of an input,
where the input and output of the function are represented as individual bits.
### Definition
A directed acyclic graph ("DAG" or "dag") is a directed graph with no directed cycles.
That is, it consists of vertices and edges (also called arcs), with each edge directed from one vertex to another,
such that following those directions will never form a closed loop.
A directed graph is a DAG if and only if it can be topologically ordered
by arranging the vertices as a linear ordering that is consistent with all edge directions.
## Topological sort
Let's now introduce what the topological sort is.
### Example
```{code-cell} ipython3
list(nx.topological_sort(clothing_graph))
```
### Applications
The canonical application of topological sorting is in scheduling a sequence of jobs
or tasks based on their dependencies.
The jobs are represented by vertices, and there is an edge from $u$ to $v$
if job $u$ must be completed before job $v$ can be started
(for example, when washing clothes, the washing machine must finish before we put the clothes in the dryer).
Then, a topological sort gives an order in which to perform the jobs.
A closely related application of topological sorting algorithms
was first studied in the early 1960s in the context of the
[PERT technique](https://en.wikipedia.org/wiki/Program_evaluation_and_review_technique)
for scheduling in project management.
In this application, the vertices of a graph represent the milestones of a project,
and the edges represent tasks that must be performed between one milestone and another.
Topological sorting forms the basis of linear-time algorithms for finding
the critical path of the project, a sequence of milestones and tasks that controls
the length of the overall project schedule.
In computer science, applications of this type arise in instruction scheduling,
ordering of formula cell evaluation when recomputing formula values in spreadsheets,
logic synthesis, determining the order of compilation tasks to perform in makefiles,
data serialization, and resolving symbol dependencies in linkers.
It is also used to decide in which order to load tables with foreign keys in databases.
### Definition
A topological sort of a directed acyclic graph $G = (V, E)$ is a linear ordering of all its vertices
such that if $G$ contains an edge $(u, v)$, then $u$ appears before $v$ in the ordering.
It is worth noting that if the graph contains a cycle, then no linear ordering is possible.
It is useful to view a topological sort of a graph as an ordering of its vertices
along a horizontal line so that all directed edges go from left to right.
### Kahn's algorithm
NetworkX uses Kahn's algorithm to perform topological sorting.
We will introduce it briefly here.
First, find a list of "start nodes" which have no incoming edges and insert them into a set S;
at least one such node must exist in a non-empty acyclic graph. Then:
```
L <- Empty list that will contain the sorted elements
S <- Set of all nodes with no incoming edge
while S is not empty do
remove a node N from S
add N to L
for each node M with an edge E from N to M do
remove edge E from the graph
if M has no other incoming edges then
insert M into S
if graph has edges then
return error # graph has at least one cycle
else
return L # a topologically sorted order
```
### NetworkX implementation
Finally, let's take a look at how the topological sorting is implemented in NetworkX.
We can see that Kahn's algorithm _stratifies_ the graph such that each level contains all the nodes
whose dependencies have been satisfied by the nodes in a previous level.
In other words, Kahn's algorithm does something like:
- Take all the nodes in the DAG that don't have any dependencies and put them in list.
- "Remove" those nodes from the DAG.
- Repeat the process, creating a new list at each step.
Thus, topological sorting is reduced to correctly stratifying the graph in this way.
This procedure is implemented in the `topological_generations()` function, on which the `topological_sort()` function is based.
Let's see how the `topological_generations()` function is implemented in NetworkX step by step.
#### Step 1. Initialize indegrees.
Since in Kahn's algorithm we are only interested in the indegrees of the vertices,
in order to preserve the structure of the graph as it is passed in,
instead of removing the edges, we will decrease the indegree of the corresponding vertex.
Therefore, we will save these values in a separate _dictionary_ `indegree_map`.
```
indegree_map = {v: d for v, d in G.in_degree() if d > 0}
```
#### Step 2. Initialize first level.
At each step of Kahn's algorithm, we seek out vertices with an in-degree of zero.
In preparation for the first loop iteration of the algorithm,
we can initialize a list called `zero_indegree` that houses these nodes:
```
zero_indegree = [v for v, d in G.in_degree() if d == 0]
```
#### Step 3. Move from one level to the next.
Now, we will show how the algorithm moves from one level to the next.
Inside the loop, the first generation to be considered (`this_generation`)
is the collection of nodes that have zero in-degrees.
We process all the vertices of the current level in variable `this_generation`
and we store the next level in variable `zero_degree`.
For each vertex inside `this_generation`,
we remove all of its outgoing edges.
Then, if the input degree of some vertex is zeroed as a result,
then we add it to the next level `zero_indegree`
and remove it from the `indegree_map` dictionary.
After we have processed all of the nodes inside `this_generation`, we can yield it.
```
while zero_indegree:
this_generation = zero_indegree
zero_indegree = []
for node in this_generation:
for child in G.neighbors(node):
indegree_map[child] -= 1
if indegree_map[child] == 0:
zero_indegree.append(child)
del indegree_map[child]
yield this_generation
```
#### Step 4. Check if there is a cycle in the graph.
If, after completing the loop there are still vertices in the graph,
then there is a cycle in it and the graph is not a DAG.
```
if indegree_map:
raise nx.NetworkXUnfeasible(
"Graph contains a cycle or graph changed during iteration"
)
```
#### Addendum: Topological sort works on multigraphs as well.
This is possible to do by slightly modifying the algorithm above.
* Firstly, check if `G` is a multigraph
```
multigraph = G.is_multigraph()
```
* Then, replace
```
indegree_map[child] -= 1
```
with
```
indegree_map[child] -= len(G[node][child]) if multigraph else 1
```
#### Addendum: The graph may have changed during the iteration.
Between passing different levels in a topological sort, the graph could change.
We need to check this while the `while` loop is running.
* To do this, just replace
```
for node in this_generation:
for child in G.neighbors(node):
indegree_map[child] -= 1
```
with
```
for node in this_generation:
if node not in G:
raise RuntimeError("Graph changed during iteration")
for child in G.neighbors(node):
try:
indegree_map[child] -= 1
except KeyError as e:
raise RuntimeError("Graph changed during iteration") from e
```
#### Combine all steps.
Combining all of the above gives the current implementation of the `topological_generations()` function in NetworkX.
```{code-cell} ipython3
import inspect
print(inspect.getsource(nx.topological_generations))
```
Let's finally see what the result will be on the `clothing_graph`.
```{code-cell} ipython3
list(nx.topological_generations(clothing_graph))
```