Spotlight and Poster Session
On Wednesday 28th, a poster session will be held from 4:30 pm to 6:00 pm. It will be preceded by a spotlight session taking place at 3:30 pm in room 7.
You can find the list of the posters below.
Validation of Association: implementation in a Python library
Mateusz, Data Science and Visualisation
Abstract: During our talk, we would like to present a Python library that implements the ideas proposed in the paper “Validation of Association” by Bogdan Ćmiel and Teresa Ledwina. The authors introduce a novel function-valued measure of dependence known as the quantile dependence function. This measure plays a pivotal role in constructing tests for independence and allows for easily interpretable diagnostic plots that highlight deviations from the null model. The quantile dependence function is specifically designed to identify general dependence structures between variables within different quantiles of their joint distribution. The authors develop new estimators for the dependence function and use them to devise novel tests for independence.
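As a rough illustration of the underlying idea (this is a generic stand-in, not the authors' estimator from the paper), one can compare an empirical copula against its value under independence at chosen quantile pairs; a ratio far from 1 signals dependence in that region of the joint distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 0.5 * x + rng.normal(size=500)       # a positively dependent pair

# rank-based pseudo-observations in (0, 1]
u_obs = (np.argsort(np.argsort(x)) + 1) / len(x)
v_obs = (np.argsort(np.argsort(y)) + 1) / len(y)

def q_hat(u, v):
    """Ratio of the empirical copula C_n(u, v) to u*v, its value under
    independence; values far from 1 signal dependence at that quantile pair."""
    c = np.mean((u_obs <= u) & (v_obs <= v))
    return c / (u * v)

for u in (0.1, 0.5, 0.9):
    # ratios above 1 indicate positive association in that region
    print(u, round(q_hat(u, u), 3))
```

Plotting such ratios over a grid of quantile pairs gives the kind of interpretable diagnostic picture the abstract describes.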
Revolutionizing Enterprise Operations: Innovative Uses of AI Agents
Adwaith T A, Machine and Deep Learning
Abstract: Unlock the transformative power of AI agents in enterprise environments! This session explores how AI can revolutionize business operations by automating complex tasks, enhancing decision-making, and personalizing customer interactions. Through real-world case studies and practical insights, attendees will learn advanced techniques for developing and integrating AI agents to drive efficiency and innovation. Whether you’re a technical expert or a business leader, gain the tools and knowledge to harness AI for significant operational improvements.
Visualizing and debugging tensors with aesthetic-tensor
Iliya Zhechev, Community, Education, and Outreach
Abstract: Researchers and ML engineers working hands-on with tensors frequently have to inspect and visualize their contents in order to understand what they hold. aesthetic-tensor is a Python library that works with both PyTorch and NumPy and abstracts away tensor visualization behind an intuitive, easy-to-learn API. In this tutorial, we will explain the core concepts of the API and walk through a few practical examples of how to use the library.
Using time series to detect anomalies in a wide area network environment. Is it possible to solve such a problem using Python?
Paweł Żal, Scientific Applications
Abstract: Time series problems may seem destined only for a narrow group of data scientists, but in fact they can make the work of administrators, support staff, researchers, and developers easier.
In the era of ubiquitous IoT and the consequent intensive use of telemetry, time series make it possible to analyze the timing of events, correlate the dynamics of their occurrence, and ultimately facilitate the understanding of these phenomena.
Using Python, time series can be analyzed successfully, and when the amount of data exceeds the available RAM, Python can work with time series databases.
My experience allows me to present practical cases of using time series to detect anomalies in the operation of equipment, anomalies that often precede failures.
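One simple technique in this family (a minimal sketch, not the speaker's method) is a rolling z-score: flag any sample that deviates strongly from the recent history of the series. The latency trace and threshold below are illustrative:

```python
import numpy as np

def rolling_zscore_anomalies(values, window=50, threshold=3.0):
    """Flag samples deviating more than `threshold` standard deviations
    from the mean of the preceding `window` samples."""
    values = np.asarray(values, dtype=float)
    flags = np.zeros(len(values), dtype=bool)
    for i in range(window, len(values)):
        past = values[i - window:i]
        mu, sigma = past.mean(), past.std()
        if sigma > 0 and abs(values[i] - mu) > threshold * sigma:
            flags[i] = True
    return flags

# synthetic link-latency trace (ms) with one injected spike
rng = np.random.default_rng(1)
latency = rng.normal(20.0, 1.0, size=300)
latency[200] = 45.0                       # the anomaly
print(np.flatnonzero(rolling_zscore_anomalies(latency)))
```

When the data no longer fits in RAM, the same windowed logic can be pushed down into a time series database as a continuous query.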
🧪 classy-bench: a low-code library for quickly training and evaluating model baselines for Multi-Label Classification applications
Edoardo Abati, Machine and Deep Learning
Abstract: classy-bench is a low-code Python library that simplifies the process of training and evaluating baseline models for real-world Multi-Label Classification applications. Simply provide your datasets, and quickly get a benchmark of multiple models tailored to your specific use case. This talk will introduce the library and demonstrate its ease of use through examples.
pycodehash: boost your pipeline by skipping all unchanged steps!
Simon Brugman, Ralph, Data Science and Visualisation
Abstract: Data pipelines are of paramount importance in data science, engineering, and analysis. Often, parts of the pipeline have not changed between runs, and recomputing these nodes is wasteful, especially for larger datasets. PyCodeHash is a novel generic data and Python code hashing library that facilitates downstream caching.
Mastering Python Performance: Advanced Techniques for Efficiency
Adwaith T A, High Performance Computing
Abstract: Delve into the nuanced realm of Python optimization in this comprehensive session, where we uncover advanced strategies to maximize code performance. From profiling tools like cProfile and Memory Profiler to sophisticated techniques for data handling and parallel processing, we’ll explore how to minimize resource consumption and boost execution speed. Geared towards both seasoned developers and newcomers, this talk promises to equip you with the skills to transform your Python projects into models of efficiency and high performance.
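Optimization work of this kind typically starts with measurement. As a minimal example of that workflow, the standard-library cProfile can pinpoint where time is spent (the quadratic string concatenation below is a deliberately slow toy):

```python
import cProfile
import io
import pstats

def slow_concat(n):
    """Quadratic string building: a classic profiling target."""
    s = ""
    for i in range(n):
        s += str(i)          # each += may copy the whole string
    return s

profiler = cProfile.Profile()
profiler.enable()
slow_concat(10_000)
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())     # top five entries by cumulative time
```

Once the hot spot is identified, the fix here would be `"".join(...)`; profiling first keeps the effort focused on code that actually matters.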
Gatherer: Insight Revelation from Diplomatic Archives
Egemen Bezci, Scientific Applications
Abstract: Gatherer is an open-source Python tool designed to improve the research efficiency of political scientists and historians conducting extensive archival research on historical diplomatic records. The tool helps digitize physical records, converts unstructured data into structured tabular formats, generates summaries, extracts metadata, and identifies key named entities. These capabilities streamline primary source collection and synthesis, significantly improving the research workflow.
SpatialData: a FAIR framework for multimodal spatial omics
Wouter-Michiel Vierdag, Luca Marconato, High Performance Computing
Abstract: The generation of spatial omics data, which maps DNA, RNA, and protein within their spatial context, has increased tremendously over the last couple of years. This has created challenges for bioinformaticians tasked with analysing the data, due to, among other factors, data size and the plethora of different formats used by different researchers. Hence the need for a highly performant, findable, accessible, interoperable, and reusable (FAIR) representation of this bioimaging data. To this end we developed the SpatialData framework, a solution that combines an on-disk format, the SpatialData format, with a set of Python libraries for accessing and operating on spatial omics data, along with tools for interactive data annotation and visualization. The SpatialData library integrates seamlessly with the existing Python ecosystem by building upon standard scientific Python data types such as xarray, dask, geopandas, and anndata. It thereby provides a flexible, community-standards-based, open framework to store, process, and annotate data from virtually any spatial omics technology available to date. With its simplified and interoperable data representation, the ability to easily create unified coordinate systems, and numerous downstream analysis capabilities, it can facilitate the development, reproducibility, and reuse of analysis pipelines, and ultimately unlock new approaches to scientific questions.
Streamlining Strain-Stress Analysis with Pydidas for XRD experiments
Gudrun Lotze, Scientific Applications
Abstract: X-ray diffraction (XRD) reveals atomic structures in a wide variety of materials, from chocolate and biomaterials such as bone to hard coatings for CNC machining tools. Synchrotron radiation facilities constantly strive to engage new scientific communities. However, attracting new user groups can be challenging, as many are not familiar with XRD. Pydidas bridges this gap by offering a comprehensive framework for XRD analysis, featuring data processing, analysis, and visualization tools. It efficiently handles complex HDF5 files and provides near-real-time feedback. We introduce a new Pydidas workflow for X-ray diffraction-based strain-stress analysis, combining crystallographic data with mechanical properties. This integration helps optimize material design by clarifying how microstructure influences macroscopic behaviour, advancing materials development in mechanical and aerospace engineering.
A modular interface for visualization and pre-processing for multi-channel signals
Anais Monteils, Data Science and Visualisation
Abstract: Filtering and gaining an overview of the data are essential steps before feature extraction or any analytical process. This interface can provide a robust foundation to support this processing stage. The key word: modularity.
Built in the context of High-Density Electromyography processing, this interface, developed with PyQt, pyqtgraph, xarray/datatree, and Jinja, offers multi-channel visualizations in the temporal and frequency domains. It enables users to generate reports with specific metrics and to create and customize processing pipelines. Designed to be as modular as possible, the interface allows users to incorporate their own features, adapted to their datasets, by following a standardized code architecture.
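A typical pre-processing step in such a pipeline is channel-wise band-pass filtering. The sketch below is a generic, NumPy-only illustration (a crude brick-wall filter via the FFT, not the interface's actual pipeline code; real tools would usually use a proper IIR/FIR design), with synthetic data standing in for EMG channels:

```python
import numpy as np

def fft_bandpass(data, fs, low=20.0, high=450.0):
    """Crude brick-wall band-pass applied channel-wise to an array of
    shape (channels, samples); 20-450 Hz is a common surface-EMG band."""
    spec = np.fft.rfft(data, axis=-1)
    freqs = np.fft.rfftfreq(data.shape[-1], d=1.0 / fs)
    spec[..., (freqs < low) | (freqs > high)] = 0.0  # zero out-of-band bins
    return np.fft.irfft(spec, n=data.shape[-1], axis=-1)

fs = 2000                                # Hz, illustrative sampling rate
t = np.arange(0, 1, 1 / fs)
burst = np.sin(2 * np.pi * 100 * t)      # in-band "muscle" activity
drift = np.sin(2 * np.pi * 2 * t)        # baseline wander, out of band
raw = np.vstack([burst + drift, burst])  # two synthetic channels
clean = fft_bandpass(raw, fs)
print(clean.shape)                       # (2, 2000)
```

In a modular architecture, each such step would be one interchangeable node of the user-defined processing pipeline.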
Project-Based Python Training Resources for AI Scientists, Engineers, and Developers
Anuradha Kar, PhD, Community, Education, and Outreach
Abstract: In this talk, I will discuss the significance of hands-on, project-based training resources for Python learners of all levels who aspire to work in the latest domains of machine learning, deep learning, and generative AI. I will present the steps involved in developing hands-on training modules for Python programming and AI enthusiasts. These resources aim to help learners master key concepts while developing the applied technical skills that are essential for undertaking current AI-based projects in industry and academia. I will also discuss how Python programmers and AI developers at all knowledge levels can use and benefit from such hands-on training approaches, and why there is currently a strong need for project-based learning modules in Python programming, machine learning, generative AI, and data science. As a creator of several such practical learning modules, I will walk through the procedural steps involved in building these kinds of curricula and highlight the need for, and benefits of, such practical, hands-on training methodologies.
From Logs to Insights: An Exploration of Infrastructure Logging and Clustering
Arkadiusz Trawiński, Data Science and Visualisation
Abstract: Analysing logging messages is a big challenge because of their massive volume, varied origins, and unspecified formats. These challenges can be partially addressed with NLP techniques, ultimately making it possible to detect, predict, or perhaps even avoid incidents. We demonstrate a complete monitoring solution that includes clustering and uncovering warning-incident correlations with a Hawkes model. This model has previously been applied successfully to earthquake prediction based on aftershocks. The Hawkes process is mathematically well defined and can process large volumes of data.
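The core object of a Hawkes model is its conditional intensity: a baseline rate plus a self-exciting term in which each past event (here, a warning) temporarily raises the rate of further events. A minimal sketch with an exponential kernel follows; the parameter values are illustrative, not those of the talk's model:

```python
import numpy as np

def hawkes_intensity(t, events, mu=0.2, alpha=0.8, beta=1.5):
    """Conditional intensity of a Hawkes process with exponential kernel:
    lambda(t) = mu + sum over past events t_i < t of alpha * exp(-beta * (t - t_i)).
    Each past event temporarily raises the rate of new events."""
    past = np.asarray([e for e in events if e < t])
    return mu + np.sum(alpha * np.exp(-beta * (t - past)))

warnings_ts = [1.0, 1.2, 1.3]                 # a burst of warning timestamps
print(hawkes_intensity(1.4, warnings_ts))     # elevated just after the burst
print(hawkes_intensity(9.0, warnings_ts))     # decayed back toward mu
```

Fitting mu, alpha, and beta to observed warning and incident streams is what lets the model quantify how strongly one type of event excites the other.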