Wednesday 14:05 in room 1.19 (ground floor)

PyPI in the face: running jokes that PyPI download stats can play on you

Loïc Estève

We all love to tell stories with data and we all love to listen to them. Wouldn't it be great if we could also draw actionable insights from these nice stories?

As scikit-learn maintainers, we would love to use PyPI download stats and other proxy metrics (website analytics, GitHub repository statistics, etc.) to help inform some of our decisions like:

In the context of scikit-learn, we will present the kind of surprises and caveats we discovered when trying to make sense of the PyPI download stats.

Highlights include:

We will then zoom out a bit and talk about other metrics we looked at, for example scikit-learn.org website analytics, GitHub stars and "Used by" stats. After presenting the inherent biases of these data, we will present the kind of insights we gained by combining them.

During the presentation, we will also highlight a few tools and websites we used along the way to make it easier to look at PyPI download stats in more detail.

We will conclude with some thoughts about how to use these kinds of metrics to inform some of our decisions, while at the same time not falling too much in love with the stories we tell with them.

Loïc Estève

Loïc has a Particle Physics background, which is how he discovered Python towards the end of his PhD.

He is a scikit-learn and joblib core contributor and has been involved in a number of Python open-source projects over the past 10 years, including Pyodide, dask-jobqueue, sphinx-gallery and nilearn.