NXEP 4 — Adopting numpy.random.Generator as default random interface#

Author:

Ross Barnowski (rossbar@berkeley.edu)

Status:

Draft

Type:

Standards Track

Created:

2022-02-24

Abstract#

Pseudo-random numbers play an important role in many graph and network analysis algorithms in NetworkX. NetworkX provides a standard interface to random number generators that includes support for numpy.random and the Python built-in random module. numpy.random is used extensively within NetworkX and in several cases is the preferred package for random number generation. NumPy introduced a new interface in the numpy.random package in NumPy version 1.17. According to NEP19, the new interface based on numpy.random.Generator is recommended over the legacy numpy.random.RandomState as the former has better statistical properties, more features, and improved performance. This NXEP proposes a strategy for adopting numpy.random.Generator as the default interface for random number generation within NetworkX.

Motivation and Scope#

The primary motivation for adopting numpy.random.Generator as the default random number generation engine in NetworkX is to allow users to benefit from the improvements in numpy.random.Generator, including: - Advances in statistical quality of modern pRNG’s - Improved performance - Additional features

The numpy.random.Generator API is very similar to the numpy.random.RandomState API, so users can benefit from these improvements without any additional changes [1] to their existing NetworkX code.

In principle this change would impact NetworkX users that use any of the functions decorated by np_random_state or py_random_state (when the random_state argument involves numpy). See the next section for details.

Usage and Impact#

In NetworkX, random number generators are typically created via a decorator:

from networkx.utils import np_random_state

@np_random_state("seed")  # Or could be the arg position, i.e. 0
def foo(seed=None):
    return seed

The decorator is responsible for mapping various different inputs into an instance of a random number generator within the function. Currently, the random number generator instance that is returned is a numpy.random.RandomState object:

>>> type(foo(None))
numpy.random.mtrand.RandomState
>>> type(foo(12345))
numpy.random.mtrand.RandomState

The only way to get a numpy.random.Generator instance from the random state decorators is to pass the instance in directly:

>>> import numpy as np
>>> rng = np.random.default_rng()
>>> type(foo(rng))
numpy.random._generator.Generator

This NXEP proposes to change the behavior so that when e.g. and integer or None is given for the seed parameter, a numpy.random.Generator instance is returned instead, i.e.:

>>> type(foo(None))
numpy.random._generator.Generator
>>> type(foo(12345))
numpy.random._generator.Generator

numpy.random.RandomState instances can still be used as seed, but they must be explicitly passed in:

>>> rs = np.random.RandomState(12345)
>>> type(foo(rs))
numpy.random.mtrand.RandomState

Backward compatibility#

There are three main concerns:

  1. The Generator interface is not stream-compatible with RandomState, thus the results of the Generator methods will not be exactly the same as the corresponding RandomState methods.

  2. There are a few slight differences in method names and availability between the RandomState and Generator APIs.

  3. There is no global Generator instance internal to numpy.random as is the case for numpy.random.RandomState.

The numpy.random.Generator interface breaks the stream-compatibility guarantee that numpy.random.RandomState upheld of exact reproducibility of values. Switching the default random number generator from RandomState to Generator would mean functions decorated with np_random_state would produce different results when a value other than an instantiated rng is used as the seed. For example, let’s take the following function:

@np_random_state("seed")
def bar(num, seed=None):
    """Return an array of `num` uniform random numbers."""
    return seed.random(num)

With the current implementation of np_random_state, a user can pass in an integer value to seed which will be used to seed a new RandomState instance. Using the same seed value guarantees the output is always exactly reproducible:

>>> bar(10, seed=12345)
array([0.92961609, 0.31637555, 0.18391881, 0.20456028, 0.56772503,
       0.5955447 , 0.96451452, 0.6531771 , 0.74890664, 0.65356987])
>>> bar(10, seed=12345)
array([0.92961609, 0.31637555, 0.18391881, 0.20456028, 0.56772503,
       0.5955447 , 0.96451452, 0.6531771 , 0.74890664, 0.65356987])

However, after changing the default rng returned by np_random_state to a Generator instance, the values produced by the decorated bar function for integer seeds would no longer be identical:

>>> bar(10, seed=12345)
array([0.22733602, 0.31675834, 0.79736546, 0.67625467, 0.39110955,
       0.33281393, 0.59830875, 0.18673419, 0.67275604, 0.94180287])

In order to recover exact reproducibility of the original results, a seeded RandomState instance would need to be explicitly created and passed in via seed:

>>> import numpy as np
>>> rng = np.random.RandomState(12345)
>>> bar(10, seed=rng)
array([0.92961609, 0.31637555, 0.18391881, 0.20456028, 0.56772503,
       0.5955447 , 0.96451452, 0.6531771 , 0.74890664, 0.65356987])

Because the streams would no longer be compatible, it is proposed in this NXEP that switching the default random number generator only be considered for a major release, e.g. the transition from NetworkX 2.X to NetworkX 3.0.

The second point is only a concern for users who are using create_random_state and the corresponding decorator np_random_state in their own libraries. For example, the numpy.random.RandomState.randint method has been replaced by numpy.random.Generator.integers. Thus any code that uses create_random_state or create_py_random_state and relies on the randint method of the returned rng would result in an AttributeError. This can be addressed with a compatiblity class similar to the networkx.utils.misc.PythonRandomInterface class, which provides a compatibility layer between random and numpy.random.RandomState.

create_random_state currently returns the global numpy.random.mtrand._rand RandomState instance when the input is None or the numpy.random module. By switching to numpy.random.Generator, this will no longer be possible as there is no global, internal Generator instance in the numpy.random module. This should have no effect on users, as seed=None currently does not guarantee reproducible results.

Detailed description#

This NXEP proposes to change the default random number generator produced by the create_random_state function (and the related decorator np_random_state) from a numpy.random.RandomState instance to a numpy.random.Generator instance when the input to the function is either an integer or None.

Implementation#

The implementation itself is quite simple. The logic that determines how inputs are mapped to random number generators is encapsulated in the create_random_state function (and the related create_py_random_state). Currently (i.e. NetworkX <= 2.X), this function maps inputs like None, numpy.random, and integers to RandomState instances:

def create_random_state(random_state=None):
    if random_state is None or random_state is np.random:
        return np.random.mtrand._rand
    if isinstance(random_state, np.random.RandomState):
        return random_state
    if isinstance(random_state, int):
        return np.random.RandomState(random_state)
    if isinstance(random_state, np.random.Generator):
        return random_state
    msg = (
        f"{random_state} cannot be used to create a numpy.random.RandomState or\n"
        "numpy.random.Generator instance"
    )
    raise ValueError(msg)

This NXEP proposes to modify the function to produce Generator instances for these inputs. An example implementation might look something like:

def create_random_state(random_state=None):
    if random_state is None or random_state is np.random:
        return np.random.default_rng()
    if isinstance(random_state, (np.random.RandomState, np.random.Generator)):
        return random_state
    if isinstance(random_state, int):
        return np.random.default_rng(random_state)
    msg = (
        f"{random_state} cannot be used to create a numpy.random.RandomState or\n"
        "numpy.random.Generator instance"
    )
    raise ValueError(msg)

The above captures the essential change in logic, though implementation details may differ. Most of the work related implementing this change will be associated with improved/reorganized tests; including adding tests rng-stream reproducibility.

Alternatives#

The status quo, i.e. using RandomState by default, is a completely acceptable alternative. RandomState is not deprecated, and is expected to maintain its stream-compatibility guarantee in perpetuity.

Another possible alternative would be to provide a package-level toggle that users could use to switch the behavior the seed kwarg for all functions decorated by np_random_state or py_random_state. To illustrate (ignoring implementation details):

>>> import networkx as nx
>>> from networkx.utils.misc import create_random_state

# NetworkX 2.X behavior: RandomState by default

>>> type(create_random_state(12345))
numpy.random.mtrand.RandomState

# Change random backend by setting pkg attr

>>> nx._random_backend = "Generator"

>>> type(create_random_state(12345))
numpy.random._generator.Generator

Discussion#

This section may just be a bullet list including links to any discussions regarding the NXEP:

  • This includes links to mailing list threads or relevant GitHub issues.