Questions & Answers

What is `np.ndarray[Any, np.dtype[np.float64]]` and why does `np.typing.NDArray[np.float64]` alias it?

The documentation for np.typing.NDArray says that it is "a generic version of np.ndarray[Any, np.dtype[+ScalarType]]". Where is the generalization in "generic" happening?

And in the documentation for numpy.ndarray.__class_getitem__ we have this example np.ndarray[Any, np.dtype[Any]] with no explanation as to what the two arguments are.

And why can I do np.ndarray[float], i.e. use just one argument? What does that mean?

Answers (1):

"Generic" in this context means "generic type": a typing-related object that can be subscripted to produce a more specific type "instance" (apologies for the sloppy jargon, I'm not well-versed in typing talk). Think of typing.List, which lets you write List[int] to denote a homogeneous list of ints.

As of Python 3.9 (PEP 585), most standard-library collections can be used as generic types themselves. Since tuple[foo] was invalid before 3.9, it was safe to let tuple[int, int] mean the same thing that typing.Tuple[int, int] used to mean: a tuple of two integers.
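As a small illustration of the point above, the following sketch compares the older typing spellings with the builtin ones. It assumes Python 3.9 or later; the alias names (LegacyInts, Ints, and so on) are made up for the example.

```python
import types
from typing import List, Tuple, get_args, get_origin

# Pre-3.9 spelling: subscript the helpers from the typing module.
LegacyInts = List[int]
LegacyPair = Tuple[int, int]

# Python 3.9+ spelling: subscript the builtins directly.
Ints = list[int]
Pair = tuple[int, int]

# Subscripting a builtin yields a types.GenericAlias object.
is_generic_alias = isinstance(Pair, types.GenericAlias)

# Both spellings carry the same origin class and the same type arguments.
same_list = (get_origin(Ints), get_args(Ints)) == (
    get_origin(LegacyInts),
    get_args(LegacyInts),
)
same_pair = (get_origin(Pair), get_args(Pair)) == (
    get_origin(LegacyPair),
    get_args(LegacyPair),
)

print(is_generic_alias, same_list, same_pair)
```

All three flags should come out True on Python 3.9 or later.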

So as of Python 3.9, NumPy also allows using the np.ndarray type as a generic; this is what np.ndarray[Any, np.dtype[Any]] does. This "signature" mirrors the leading parameters of np.ndarray.__init__() (__new__() if we want to be correct):

class numpy.ndarray(shape, dtype=float, ...)

So what np.ndarray[foo, bar] does is create a type for type hinting that means "a NumPy array of shape type foo and dtype bar". People normally don't call np.ndarray() directly anyway (rather using helpers such as np.array() or np.full_like() and the like), so this is doubly fine in NumPy.
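A minimal sketch of what this looks like in an annotation, assuming Python 3.9+ and a NumPy version that supports subscripting np.ndarray. The alias FloatArray and the helper scale are hypothetical names used only for illustration.

```python
from typing import Any, get_args, get_origin

import numpy as np

# First parameter: the shape type (left as Any); second: the dtype.
FloatArray = np.ndarray[Any, np.dtype[np.float64]]


def scale(a: FloatArray, factor: float) -> FloatArray:
    """Multiply a float64 array by a scalar (hypothetical helper)."""
    return a * factor


# The alias keeps its two "arguments" around for introspection.
shape_param, dtype_param = get_args(FloatArray)

result = scale(np.array([1.0, 2.0]), 2.0)
print(get_origin(FloatArray), shape_param, dtype_param, result)
```

Note that these annotations are only hints: nothing checks the shape or dtype at runtime, so they matter to static type checkers, not to the interpreter.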

Now, since most code has to work with arrays of varying dimensionality, it would be a pain to spell out a shape type (the first "argument" of np.ndarray as a generic type) every time. I assume this was the motivation for defining a type alias that fixes the shape to Any but is still a generic in the second "argument". This is np.typing.NDArray.

It lets you easily type hint something as an array of a given dtype without having to say anything about the shape, covering a vast subset of use cases (which would otherwise use np.ndarray[typing.Any, ...]). And this is still a generic, since you can parameterise it with a scalar type. To quote the docs:

>>> print(npt.NDArray)
numpy.ndarray[typing.Any, numpy.dtype[+ScalarType]]

>>> print(npt.NDArray[np.float64])
numpy.ndarray[typing.Any, numpy.dtype[numpy.float64]]
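For a sense of how the alias reads in ordinary code, here is a short sketch of a function annotated with npt.NDArray[np.float64]. The function normalize is a hypothetical helper, not something from NumPy.

```python
import numpy as np
import numpy.typing as npt


def normalize(x: npt.NDArray[np.float64]) -> npt.NDArray[np.float64]:
    """Scale a vector to unit Euclidean length (hypothetical helper)."""
    return x / np.linalg.norm(x)


# The hint says "float64 array of any shape"; a 1-D vector qualifies.
unit = normalize(np.array([3.0, 4.0]))
print(unit)
```

The annotation says nothing about the number of dimensions, which is exactly the convenience the alias is meant to provide.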

As usual with generics, you're allowed to specify an argument to the generic type, but you're not required to. ScalarType is a type variable bound to np.generic, the base class of NumPy's scalar types. The library code that defines NDArray is here, and is fairly transparent: it boils down to calling the helper _GenericAlias (apparently a backport of types.GenericAlias). What you end up with is a type alias that is still generic in one variable.
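One way to see the "generic in one variable" part for yourself is to look at the alias's __parameters__ attribute, which generic aliases use to list their unfilled type variables. This is a sketch assuming Python 3.9+ and a reasonably recent NumPy; the exact name of the type variable may differ between NumPy versions.

```python
import numpy as np
import numpy.typing as npt

# The unparameterised alias still has one free type variable (the scalar type).
free_before = npt.NDArray.__parameters__

# Supplying a scalar type fills that variable, so nothing is left to specify.
free_after = npt.NDArray[np.float64].__parameters__

print(free_before, free_after)
```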