D-Separation#
Algorithm for testing d-separation in DAGs.
d-separation is a test for conditional independence in probability distributions that can be factorized using DAGs. It is a purely graphical test that uses the underlying graph and makes no reference to the actual distribution parameters. See [1] for a formal definition.
The implementation is based on the conceptually simple linear time algorithm presented in [2]. Refer to [3], [4] for a couple of alternative algorithms.
The functional interface in NetworkX consists of three functions:
find_minimal_d_separator
returns a minimal d-separator setz
. That is, removing any node or nodes from it makes it no longer a d-separator.is_d_separator
checks if a given set is a d-separator.is_minimal_d_separator
checks if a given set is a minimal d-separator.
D-separators#
Here, we provide a brief overview of d-separation and related concepts that are relevant for understanding it:
The ideas of d-separation and d-connection relate to paths being open or blocked.
A “path” is a sequence of nodes connected in order by edges. Unlike for most graph theory analysis, the direction of the edges is ignored. Thus the path can be thought of as a traditional path on the undirected version of the graph.
A “candidate d-separator”
z
is a set of nodes being considered as possibly blocking all paths between two prescribed setsx
andy
of nodes. We refer to each node in the candidate d-separator as “known”.A “collider” node on a path is a node that is a successor of its two neighbor nodes on the path. That is,
c
is a collider if the edge directions along the path look like... u -> c <- v ...
.If a collider node or any of its descendants are “known”, the collider is called an “open collider”. Otherwise it is a “blocking collider”.
Any path can be “blocked” in two ways. If the path contains a “known” node that is not a collider, the path is blocked. Also, if the path contains a collider that is not a “known” node, the path is blocked.
A path is “open” if it is not blocked. That is, it is open if every node is either an open collider or not a “known”. Said another way, every “known” in the path is a collider and every collider is open (has a “known” as a inclusive descendant). The concept of “open path” is meant to demonstrate a probabilistic conditional dependence between two nodes given prescribed knowledge (“known” nodes).
Two sets
x
andy
of nodes are “d-separated” by a set of nodesz
if all paths between nodes inx
and nodes iny
are blocked. That is, if there are no open paths from any node inx
to any node iny
. Such a setz
is a “d-separator” ofx
andy
.A “minimal d-separator” is a d-separator
z
for which no node or subset of nodes can be removed with it still being a d-separator.
The d-separator blocks some paths between x
and y
but opens others.
Nodes in the d-separator block paths if the nodes are not colliders.
But if a collider or its descendant nodes are in the d-separation set, the
colliders are open, allowing a path through that collider.
Illustration of D-separation with examples#
A pair of two nodes, u
and v
, are d-connected if there is a path
from u
to v
that is not blocked. That means, there is an open
path from u
to v
.
For example, if the d-separating set is the empty set, then the following paths are
open between u
and v
:
u <- n -> v
u -> w -> … -> n -> v
If on the other hand, n
is in the d-separating set, then n
blocks
those paths between u
and v
.
Colliders block a path if they and their descendants are not included in the d-separating set. An example of a path that is blocked when the d-separating set is empty is:
u -> w -> … -> n <- v
The node n
is a collider in this path and is not in the d-separating set.
So n
blocks this path. However, if n
or a descendant of n
is
included in the d-separating set, then the path through the collider
at n
(… -> n <- …) is “open”.
D-separation is concerned with blocking all paths between nodes from x
to y
.
A d-separating set between x
and y
is one where all paths are blocked.
D-separation and its applications in probability#
D-separation is commonly used in probabilistic causal-graph models. D-separation connects the idea of probabilistic “dependence” with separation in a graph. If one assumes the causal Markov condition [5], (every node is conditionally independent of its non-descendants, given its parents) then d-separation implies conditional independence in probability distributions. Symmetrically, d-connection implies dependence.
The intuition is as follows. The edges on a causal graph indicate which nodes
influence the outcome of other nodes directly. An edge from u to v
implies that the outcome of event u
influences the probabilities for
the outcome of event v
. Certainly knowing u
changes predictions for v
.
But also knowing v
changes predictions for u
. The outcomes are dependent.
Furthermore, an edge from v
to w
would mean that w
and v
are dependent
and thus that u
could indirectly influence w
.
Without any knowledge about the system (candidate d-separating set is empty)
a causal graph u -> v -> w
allows all three nodes to be dependent. But
if we know the outcome of v
, the conditional probabilities of outcomes for
u
and w
are independent of each other. That is, once we know the outcome
for `v`, the probabilities for ``w
do not depend on the outcome for u
.
This is the idea behind v
blocking the path if it is “known” (in the candidate
d-separating set).
The same argument works whether the direction of the edges are both left-going and when both arrows head out from the middle. Having a “known” node on a path blocks the collider-free path because those relationships make the conditional probabilities independent.
The direction of the causal edges does impact dependence precisely in the
case of a collider e.g. u -> v <- w
. In that situation, both u
and w
influence v`
. But they do not directly influence each other. So without any
knowledge of any outcomes, u
and w
are independent. That is the idea behind
colliders blocking the path. But, if v
is known, the conditional probabilities
of u
and w
can be dependent. This is the heart of Berkson’s Paradox [6].
For example, suppose u
and w
are boolean events (they either happen or do not)
and v
represents the outcome “at least one of u
and w
occur”. Then knowing
v
is true makes the conditional probabilities of u
and w
dependent.
Essentially, knowing that at least one of them is true raises the probability of
each. But further knowledge that w
is true (or false) change the conditional
probability of u
to either the original value or 1. So the conditional
probability of u
depends on the outcome of w
even though there is no
causal relationship between them. When a collider is known, dependence can
occur across paths through that collider. This is the reason open colliders
do not block paths.
Furthermore, even if v
is not “known”, if one of its descendants is “known”
we can use that information to know more about v
which again makes
u
and w
potentially dependent. Suppose the chance of n
occurring
is much higher when v
occurs (“at least one of u
and w
occur”).
Then if we know n
occurred, it is more likely that v
occurred and that
makes the chance of u
and w
dependent. This is the idea behind why
a collider does no block a path if any descendant of the collider is “known”.
When two sets of nodes x
and y
are d-separated by a set z
,
it means that given the outcomes of the nodes in z
, the probabilities
of outcomes of the nodes in x
are independent of the outcomes of the
nodes in y
and vice versa.
Examples#
A Hidden Markov Model with 5 observed states and 5 hidden states where the hidden states have causal relationships resulting in a path results in the following causal network. We check that early states along the path are separated from late state in the path by the d-separator of the middle hidden state. Thus if we condition on the middle hidden state, the early state probabilities are independent of the late state outcomes.
>>> G = nx.DiGraph()
>>> G.add_edges_from(
... [
... ("H1", "H2"),
... ("H2", "H3"),
... ("H3", "H4"),
... ("H4", "H5"),
... ("H1", "O1"),
... ("H2", "O2"),
... ("H3", "O3"),
... ("H4", "O4"),
... ("H5", "O5"),
... ]
... )
>>> x, y, z = ({"H1", "O1"}, {"H5", "O5"}, {"H3"})
>>> nx.is_d_separator(G, x, y, z)
True
>>> nx.is_minimal_d_separator(G, x, y, z)
True
>>> nx.is_minimal_d_separator(G, x, y, z | {"O3"})
False
>>> z = nx.find_minimal_d_separator(G, x | y, {"O2", "O3", "O4"})
>>> z == {"H2", "H4"}
True
If no minimal_d_separator exists, None
is returned
>>> other_z = nx.find_minimal_d_separator(G, x | y, {"H2", "H3"})
>>> other_z is None
True
References#
Pearl, J. (2009). Causality. Cambridge: Cambridge University Press.
Darwiche, A. (2009). Modeling and reasoning with Bayesian networks. Cambridge: Cambridge University Press.
Shachter, Ross D. “Bayes-ball: The rational pastime (for determining irrelevance and requisite information in belief networks and influence diagrams).” In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI), (pp. 480–487). 1998.
Koller, D., & Friedman, N. (2009). Probabilistic graphical models: principles and techniques. The MIT Press.
|
Return whether node sets |
|
Determine if |
|
Returns a minimal d-separating set between |