Python-Blosc2: Compress Better, Compute Bigger!
Francesc Alted, Luke Shaw
Blosc and Blosc2 are well-known and widely used libraries for high-performance data compression. They are particularly effective for compressing large datasets, such as those encountered in data science and high-performance computing. The Blosc library has been around for over a decade, and its design has always prioritized speed, aiming for compression and decompression speeds that approach or even exceed memory bandwidth limits.
With the introduction of a new compute engine in Python-Blosc2 3.0, the guiding principle has evolved to "Compress Better, Compute Bigger." This enhancement enables computations on datasets that are over 100 times larger than the available RAM, all while maintaining high performance.
During our talk, we will delve into the latest features of Python-Blosc2, including:
- Seamless integration with NumPy and the Python Data ecosystem
- High-performance compression and decompression
- The new compute engine and its capabilities
- A JIT (Just-In-Time) compiler for Python functions that supports almost all NumPy functions
- The ability to perform computations on datasets that exceed available RAM
To illustrate this, we will present an example of using Python-Blosc2 to analyze a dataset that far exceeds the capacity of the available RAM. We will demonstrate how to leverage the new compute engine to perform computations efficiently, without the need for specialized hardware or infrastructure.
By the end of this talk, attendees will understand how Python-Blosc2 can help overcome memory constraints in their data workflows. Whether you're working with medium-sized datasets on modest hardware or large datasets on high-performance systems, you'll learn practical techniques to compress data while maintaining computational efficiency.
Join us to explore how this powerful library can expand your capabilities for scientific computing and data analysis while reducing memory footprint and improving processing speed.