NXEP 4 — Adopting numpy.random.Generator
as default random interface
- Author: Ross Barnowski (rossbar@berkeley.edu)
- Status: Draft
- Type: Standards Track
- Created: 2022-02-24
Abstract
Pseudo-random numbers play an important role in many graph and network analysis
algorithms in NetworkX.
NetworkX provides a standard interface to random number generators
that includes support for numpy.random and the Python built-in random module.
numpy.random is used extensively within NetworkX and in several cases is the
preferred package for random number generation.
NumPy introduced a new interface in the numpy.random package in NumPy version 1.17.
According to NEP19, the new interface based on numpy.random.Generator
is recommended over the legacy numpy.random.RandomState, as the former has
better statistical properties, more features, and improved performance.
This NXEP proposes a strategy for adopting numpy.random.Generator as the
default interface for random number generation within NetworkX.
Motivation and Scope
The primary motivation for adopting numpy.random.Generator as the default
random number generation engine in NetworkX is to allow users to benefit from
the improvements in numpy.random.Generator, including:
- Advances in the statistical quality of modern PRNGs
- Improved performance
- Additional features
The numpy.random.Generator API is very similar to the numpy.random.RandomState
API, so users can benefit from these improvements without any additional changes
[1] to their existing NetworkX code.
In principle this change would impact NetworkX users that use any of the
functions decorated by np_random_state or py_random_state (when the random_state
argument involves numpy).
See the next section for details.
Usage and Impact
In NetworkX, random number generators are typically created via a decorator:
from networkx.utils import np_random_state

@np_random_state("seed")  # Or could be the arg position, i.e. 0
def foo(seed=None):
    return seed
The decorator is responsible for mapping the supported inputs (e.g. None, an
integer, or an existing rng instance) to an instance of a random number
generator within the function.
Currently, the random number generator instance that is returned is a
numpy.random.RandomState object:
>>> type(foo(None))
numpy.random.mtrand.RandomState
>>> type(foo(12345))
numpy.random.mtrand.RandomState
The only way to get a numpy.random.Generator instance from the random state
decorators is to pass the instance in directly:
>>> import numpy as np
>>> rng = np.random.default_rng()
>>> type(foo(rng))
numpy.random._generator.Generator
This NXEP proposes to change the behavior so that when e.g. an integer or
None is given for the seed parameter, a numpy.random.Generator instance
is returned instead, i.e.:
>>> type(foo(None))
numpy.random._generator.Generator
>>> type(foo(12345))
numpy.random._generator.Generator
numpy.random.RandomState instances can still be used as seed, but they
must be explicitly passed in:
>>> rs = np.random.RandomState(12345)
>>> type(foo(rs))
numpy.random.mtrand.RandomState
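To make the mapping concrete, here is a minimal sketch of a decorator that behaves like the examples in this section. It is an illustrative stand-in and not the NetworkX implementation: the names np_random_state_sketch and seed_to_rng are hypothetical, and the mapping mirrors the current (RandomState-by-default) behavior.

```python
import functools
import inspect

import numpy as np


def seed_to_rng(seed):
    """Hypothetical helper: map a seed-like input to an rng (current behavior)."""
    if seed is None or seed is np.random:
        return np.random.mtrand._rand  # shared global RandomState
    if isinstance(seed, (np.random.RandomState, np.random.Generator)):
        return seed  # explicit instances pass through untouched
    if isinstance(seed, int):
        return np.random.RandomState(seed)
    raise ValueError(f"{seed} cannot be used to create an rng instance")


def np_random_state_sketch(arg_name):
    """Hypothetical decorator: replace the named argument with an rng instance."""

    def decorator(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Bind the call so the argument is found whether it was passed
            # positionally, by keyword, or left at its default.
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            bound.arguments[arg_name] = seed_to_rng(bound.arguments[arg_name])
            return func(*bound.args, **bound.kwargs)

        return wrapper

    return decorator


@np_random_state_sketch("seed")
def foo(seed=None):
    return seed
```

Under this sketch, the proposed change would amount to returning np.random.default_rng() and np.random.default_rng(seed) from the None and integer branches of seed_to_rng, while leaving the pass-through branch as it is.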
Backward compatibility
There are three main concerns:
1. The Generator interface is not stream-compatible with RandomState, thus the results of the Generator methods will not be exactly the same as the corresponding RandomState methods.
2. There are a few slight differences in method names and availability between the RandomState and Generator APIs.
3. There is no global Generator instance internal to numpy.random as is the case for numpy.random.RandomState.
The numpy.random.Generator interface does not uphold the stream-compatibility
guarantee of numpy.random.RandomState, i.e. exact reproducibility of values.
Switching the default random number generator from RandomState to
Generator would mean functions decorated with np_random_state would
produce different results when a value other than an instantiated rng is used
as the seed.
For example, let’s take the following function:
@np_random_state("seed")
def bar(num, seed=None):
    """Return an array of `num` uniform random numbers."""
    return seed.random(num)
With the current implementation of np_random_state, a user can pass in an
integer value to seed which will be used to seed a new RandomState instance.
Using the same seed value guarantees the output is always exactly reproducible:
>>> bar(10, seed=12345)
array([0.92961609, 0.31637555, 0.18391881, 0.20456028, 0.56772503,
0.5955447 , 0.96451452, 0.6531771 , 0.74890664, 0.65356987])
>>> bar(10, seed=12345)
array([0.92961609, 0.31637555, 0.18391881, 0.20456028, 0.56772503,
0.5955447 , 0.96451452, 0.6531771 , 0.74890664, 0.65356987])
However, after changing the default rng returned by np_random_state to
a Generator instance, the values produced by the decorated bar function
for integer seeds would no longer match the RandomState output above:
>>> bar(10, seed=12345)
array([0.22733602, 0.31675834, 0.79736546, 0.67625467, 0.39110955,
0.33281393, 0.59830875, 0.18673419, 0.67275604, 0.94180287])
In order to recover exact reproducibility of the original results, a seeded
RandomState instance would need to be explicitly created and passed in
via seed:
>>> import numpy as np
>>> rng = np.random.RandomState(12345)
>>> bar(10, seed=rng)
array([0.92961609, 0.31637555, 0.18391881, 0.20456028, 0.56772503,
0.5955447 , 0.96451452, 0.6531771 , 0.74890664, 0.65356987])
Because the streams would no longer be compatible, it is proposed in this NXEP that switching the default random number generator only be considered for a major release, e.g. the transition from NetworkX 2.X to NetworkX 3.0.
The second point is only a concern for users who are using
create_random_state and the corresponding decorator np_random_state
in their own libraries.
For example, the numpy.random.RandomState.randint method has been replaced
by numpy.random.Generator.integers.
Thus any code that uses create_random_state or create_py_random_state and
relies on the randint method of the returned rng would raise an
AttributeError.
This can be addressed with a compatibility class similar to the
networkx.utils.misc.PythonRandomInterface class, which provides a compatibility
layer between random and numpy.random.RandomState.
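As a rough illustration of such a compatibility layer, the sketch below wraps a Generator and exposes a RandomState-style randint method. The class name RandIntCompat is hypothetical and does not exist in NetworkX; the sketch only relies on Generator.integers, which, like RandomState.randint, excludes the upper bound by default.

```python
import numpy as np


class RandIntCompat:
    """Hypothetical wrapper exposing RandomState-style ``randint`` on a Generator."""

    def __init__(self, rng=None):
        # Accept an existing Generator, or build one from a seed (or None).
        if isinstance(rng, np.random.Generator):
            self._rng = rng
        else:
            self._rng = np.random.default_rng(rng)

    def randint(self, low, high=None, size=None):
        # Same half-open interval [low, high) as RandomState.randint.
        return self._rng.integers(low, high, size=size)

    def __getattr__(self, name):
        # Only called when normal lookup fails: delegate to the wrapped Generator.
        if name == "_rng":
            raise AttributeError(name)
        return getattr(self._rng, name)


rng = RandIntCompat(12345)
values = rng.randint(0, 10, size=5)  # legacy method name, Generator stream
floats = rng.random(3)               # delegated to the wrapped Generator
```

Note that the wrapper restores the method name only. The values still come from the Generator stream, so they will not match what RandomState.randint produces for the same seed.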
create_random_state currently returns the global numpy.random.mtrand._rand
RandomState instance when the input is None or the numpy.random module.
After switching to numpy.random.Generator, this will no longer be possible as
there is no global, internal Generator instance in the numpy.random module.
This should have no effect on users, as seed=None currently does not
guarantee reproducible results.
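The difference between the shared global instance and freshly created generators can be seen in a short sketch. The helper names legacy_none and proposed_none are hypothetical and stand in for the None branch of create_random_state before and after the change.

```python
import numpy as np


def legacy_none():
    # Current behavior for seed=None: hand back the module-level RandomState.
    return np.random.mtrand._rand


def proposed_none():
    # Proposed behavior for seed=None: build a new Generator seeded from OS entropy.
    return np.random.default_rng()


# Every call returns the very same shared object.
print(legacy_none() is legacy_none())      # True

# Every call returns a distinct, independently seeded object.
print(proposed_none() is proposed_none())  # False
```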
Detailed description
This NXEP proposes to change the default random number generator produced by
the create_random_state function (and the related decorator np_random_state)
from a numpy.random.RandomState instance to a numpy.random.Generator
instance when the input to the function is either an integer or None.
Implementation
The implementation itself is quite simple. The logic that determines how
inputs are mapped to random number generators is encapsulated in the
create_random_state function (and the related create_py_random_state).
Currently (i.e. NetworkX <= 2.X), this function maps inputs like None,
numpy.random, and integers to RandomState instances:
import numpy as np

def create_random_state(random_state=None):
    # None or the numpy.random module: use the shared global RandomState
    if random_state is None or random_state is np.random:
        return np.random.mtrand._rand
    if isinstance(random_state, np.random.RandomState):
        return random_state
    # Integer seed: create a new, seeded RandomState
    if isinstance(random_state, int):
        return np.random.RandomState(random_state)
    if isinstance(random_state, np.random.Generator):
        return random_state
    msg = (
        f"{random_state} cannot be used to create a numpy.random.RandomState or\n"
        "numpy.random.Generator instance"
    )
    raise ValueError(msg)
This NXEP proposes to modify the function to produce Generator instances
for these inputs. An example implementation might look something like:
import numpy as np

def create_random_state(random_state=None):
    # None or the numpy.random module: create a new, unseeded Generator
    if random_state is None or random_state is np.random:
        return np.random.default_rng()
    # Explicit rng instances of either kind are returned unchanged
    if isinstance(random_state, (np.random.RandomState, np.random.Generator)):
        return random_state
    # Integer seed: create a new, seeded Generator
    if isinstance(random_state, int):
        return np.random.default_rng(random_state)
    msg = (
        f"{random_state} cannot be used to create a numpy.random.RandomState or\n"
        "numpy.random.Generator instance"
    )
    raise ValueError(msg)
The above captures the essential change in logic, though implementation details may differ. Most of the work related to implementing this change will be associated with improved/reorganized tests, including adding tests of rng-stream reproducibility.
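A reproducibility test of the kind mentioned here might look like the following sketch. It repeats the proposed create_random_state mapping locally (with a shortened error message) so that it runs on its own, and the test function names are hypothetical rather than taken from the NetworkX test suite.

```python
import numpy as np


def create_random_state(random_state=None):
    # Local copy of the proposed mapping, repeated so the sketch is self-contained.
    if random_state is None or random_state is np.random:
        return np.random.default_rng()
    if isinstance(random_state, (np.random.RandomState, np.random.Generator)):
        return random_state
    if isinstance(random_state, int):
        return np.random.default_rng(random_state)
    raise ValueError(f"{random_state} cannot be used to create an rng instance")


def test_integer_seed_gives_reproducible_generator():
    a = create_random_state(12345)
    b = create_random_state(12345)
    assert isinstance(a, np.random.Generator)
    # Two generators built from the same integer seed yield the same stream.
    np.testing.assert_array_equal(a.random(10), b.random(10))


def test_explicit_instances_pass_through():
    rs = np.random.RandomState(12345)
    gen = np.random.default_rng(12345)
    assert create_random_state(rs) is rs
    assert create_random_state(gen) is gen


def test_legacy_stream_differs_from_generator_stream():
    # Same seed, different interface: the streams are not compatible.
    legacy = np.random.RandomState(12345).random(10)
    new = create_random_state(12345).random(10)
    assert not np.array_equal(legacy, new)
```

Written as plain functions with bare assertions, these can be collected by pytest or called directly.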
Alternatives
The status quo, i.e. using RandomState by default, is a completely
acceptable alternative.
RandomState is not deprecated, and is expected to maintain its stream-compatibility
guarantee in perpetuity.
Another possible alternative would be to provide a package-level toggle that
users could use to switch the behavior of the seed kwarg for all functions
decorated by np_random_state or py_random_state.
To illustrate (ignoring implementation details):
>>> import networkx as nx
>>> from networkx.utils.misc import create_random_state
# NetworkX 2.X behavior: RandomState by default
>>> type(create_random_state(12345))
numpy.random.mtrand.RandomState
# Change random backend by setting pkg attr
>>> nx._random_backend = "Generator"
>>> type(create_random_state(12345))
numpy.random._generator.Generator
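One way such a toggle could be wired up is sketched below. This is an assumption-laden illustration, not a proposed design: the module-level variable mirrors the nx._random_backend attribute from the example above, while the set_random_backend helper is hypothetical and added only to validate the value.

```python
import numpy as np

# Hypothetical module-level switch, mirroring `nx._random_backend` above.
_random_backend = "RandomState"


def set_random_backend(name):
    """Hypothetical helper that validates and stores the backend choice."""
    global _random_backend
    if name not in ("RandomState", "Generator"):
        raise ValueError(f"unknown random backend: {name!r}")
    _random_backend = name


def create_random_state(random_state=None):
    # Explicit instances are returned unchanged, whatever the toggle says.
    if isinstance(random_state, (np.random.RandomState, np.random.Generator)):
        return random_state
    if random_state is None or random_state is np.random:
        if _random_backend == "Generator":
            return np.random.default_rng()
        return np.random.mtrand._rand  # shared global RandomState
    if isinstance(random_state, int):
        if _random_backend == "Generator":
            return np.random.default_rng(random_state)
        return np.random.RandomState(random_state)
    raise ValueError(f"{random_state} cannot be used to create an rng instance")
```

A design consequence worth noting is that the toggle is global state: flipping it changes the stream of every decorated function called afterwards with an integer or None seed, including calls made by third-party code.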
Discussion
This section may just be a bullet list including links to any discussions regarding the NXEP:
This includes links to mailing list threads or relevant GitHub issues.