Thursday 14:05 in room 1.38 (ground floor)

Understanding Dispatching Approaches in the Scientific Python Ecosystem

Aditi Juneja, Sebastian Berg

Description

In the first few minutes we will go over what dispatching is, why it's needed, what kind of projects may benefit from it, and the kinds of dispatching i.e the user decides which backend to use explicitly or the library implicitly dispatch to the "right" implementation for the user or implicitly dispatch based on the input type(s).

Array API standard and NumPy predecessors (10-12 mins)

In this section we will start with NumPy's old dispatching, briefly explaining how it works, and also showing what it can do, such as allowing to use a NumPy function with cupy.

We will then continue with the Array API which has momentum as it is being used by libraries such as SciPy and sklearn and with existing support for numpy, pytorch, JAX, CuPy, Xarray, etc. We will highlight it's use by briefly show-casing the speed-up when a user switches e.g. from NumPy to pytorch.

These examples will help us expose key difference in type dispatching approaches both for it's users and for the implementation. A main difference being that the Array API is library orientated while the NumPy dispatching was user-orientated.

Backend-library based dispatching (10-12 mins)

We will then continue with the more general backend-library based dispatching approach that is implemented in libraries like NetworkX and scikit-image.

This approach is more general and opens up additional possibilities as it is based on Python entry_point, which are generally used to extend the functionality of a package. We will understand this entry-point based dispatching through a quick demo.

Then we will see a demo of how NetworkX speeds-up with nx-parallel and nx-cugraph backends, and the different user APIs in NetworkX to dispatch a call: based on backend-specific graph's type, backend= kwarg, environment variables, global configurations, context manager.

Then we will go over some of the learnings gathered while integrating this mechanism in scikit-image and challenges faced with dispatching arrays inputs.

Summary and Comparison

We will end with a summary of how dispatching approaches differ and when to use which ones, both as a developer and a user of a project. In this summary we wish to expand beyond the previous examples and also mention for example DataFrame API standards and narwhals.

Intended audience:

Some basic knowledge of Python is expected; should know what objects and classes are.

Thank you :)

Aditi Juneja

Hi, I am Aditi. I mostly work around API dispatching things in the scientific python ecosystem, mostly in NetworkX, nx-parallel backend and scikit-image.

GitHub: https://github.com/Schefflera-Arboricola Previous talks: https://github.com/Schefflera-Arboricola/blogs/tree/main/archive

Sebastian Berg

Sebastian has been a NumPy developer for about 10 years now. After a PhD in phsyics he worked at as a postdoc at the Berkeley Institute for Datascience on NumPy as grants byt the Alfred P. Sloan Foundation and the Gordon and Betty Moore Foundation. Since 2022 he has been a software engineer at NVIDIA where he continues to contribute to NumPy.