Wednesday 13:30 in room 1.38 (ground floor)

Recent Developments in Pytensor, the Successor Package to Theano

Jesse Grabowski, Ricardo Vieira

After MILA officially stopped development of Theano in 2018, the PyMC project forked it and continued developing the package under a series of names: Theano-PyMC, Aesara, and now Pytensor. The fork was motivated by Theano's design decision to put static computational graphs directly in front of the user. That design turned out to be ideal for Bayesian inference, because it allows for powerful graph-to-graph transformations. For example, a graph that defines a data generating process via draws from random variables can be automatically translated into a backward process describing the log probability of observed data, conditioned on draws from prior distributions. This is precisely the same logic that underpins reverse-mode automatic differentiation. This idea of graph transformation is extremely powerful, and Pytensor development has focused on fully leveraging it.
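
As a concrete illustration of this forward-to-logp transformation, here is a minimal sketch using PyMC, which builds its log-probability machinery on top of Pytensor graphs; pm.logp is the user-facing entry point in current PyMC releases:

    import pymc as pm
    import pytensor
    import pytensor.tensor as pt

    # Forward graph: a data generating process, here a single draw
    # from a normal random variable with unknown mean `mu`
    mu = pt.scalar("mu")
    x_rv = pm.Normal.dist(mu=mu, sigma=1.0)

    # Graph-to-graph transformation: derive a new graph that computes
    # log p(value | mu) from the forward (sampling) graph
    value = pt.scalar("value")
    logp_graph = pm.logp(x_rv, value)

    # Compile and evaluate the derived graph
    logp_fn = pytensor.function([mu, value], logp_graph)
    print(logp_fn(0.0, 1.0))  # log N(1.0 | mu=0, sigma=1), about -1.4189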

New features have recently been added to Pytensor, making it a powerful tool for workflows in statistics, machine learning, and Bayesian modeling. One of the most valuable features of Theano was its use of graph rewrites to optimize computation beyond what a compiler is willing or able to do. A canonical example is rewriting the expression log(1 + x) as log1p(x), a form that remains numerically stable for small values of x (see the sketch below). Pytensor has fully embraced this system of graph rewrites and expanded it to cover many new cases, including:
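
A minimal sketch of the log1p rewrite in action (the exact printed graph varies across Pytensor versions and backends):

    import pytensor
    import pytensor.tensor as pt

    x = pt.vector("x")
    y = pt.log(1 + x)  # numerically unstable for small x

    # Compilation applies the stabilization rewrites; the compiled
    # graph should show a log1p Op in place of log(add(1, x))
    f = pytensor.function([x], y)
    pytensor.dprint(f)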

These rewrites are possible because Pytensor constructs a static graph as the user writes code. At any time, the user can inspect the graph or directly intervene on it. These interventions include extracting sub-graphs, replacing or removing inputs or functions, and applying entire graph-to-graph transformations. The majority of our talk will focus on this last operation, which is one of the most distinctive and powerful features of Pytensor. This capability was first used in Theano to perform automatic differentiation: given a forward graph of a scalar loss function, a gradient graph can be created by walking backward from the loss in reverse topological order, applying the chain rule (see the sketch below). Graph-to-graph transformations can go far beyond this application. We present the following examples:
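
The gradient transformation itself, with the standard Pytensor API:

    import pytensor
    import pytensor.tensor as pt

    x = pt.vector("x")
    loss = (x ** 2).sum()  # forward graph of a scalar loss

    # Graph-to-graph transformation: build a new graph for d(loss)/dx
    # by walking backward from `loss` and applying the chain rule
    dloss_dx = pytensor.grad(loss, x)

    f = pytensor.function([x], dloss_dx)
    print(f([1.0, 2.0, 3.0]))  # [2. 4. 6.]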

We also give several examples where Pytensor offers advantages in deep learning contexts, including:

- Op Specialization: Certain extremely expensive layers used in deep learning, including convolutions and transformer attention, have specialized forms that can be used when the right conditions are met. Examples include FFT convolution and FlashAttention. Pytensor can recognize Op sequences of the form matmul-mask-softmax-dropout-matmul and replace them with a single FlashAttention Op. Similarly, convolution Ops can be replaced with FFT convolutions when the kernels are sufficiently large. Both of these optimizations happen automatically, without users needing to know the intricacies of when and how the specialized versions should be used. A sketch of the user-facing rewrite API behind such specializations follows below.
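
The same rewrite machinery that powers these specializations is exposed to users. The sketch below registers a toy specialization rewrite, replacing log(exp(x)) with x; the module paths and decorators reflect our understanding of the current Pytensor API, and the toy pattern stands in for a more elaborate Op-sequence match like the FlashAttention fusion:

    import pytensor.tensor as pt
    from pytensor.graph.rewriting.basic import node_rewriter
    from pytensor.tensor.rewriting.basic import register_specialize

    # Toy rewrite: replace log(exp(x)) with x. Real specializations
    # match longer Op sequences and return a fused replacement Op.
    @register_specialize
    @node_rewriter([pt.log])
    def local_log_exp(fgraph, node):
        (arg,) = node.inputs
        if arg.owner is not None and arg.owner.op == pt.exp:
            return [arg.owner.inputs[0]]
        return None  # no match; leave the node unchanged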

We conclude with a short description of planned features for our upcoming Pytensor 3.0 release, and an invitation for interested audience members to try the package and submit issues/PRs.

Jesse Grabowski

Jesse Grabowski is a PhD candidate at Paris 1 Pantheon-Sorbonne. He is also a principal data scientist at PyMC Labs and a core developer of PyMC, Pytensor, and related packages. His research interests include time series modeling, macroeconomics, and finance.

Ricardo Vieira