Monday 10:30
in room 1.19 (ground floor)
Introduction to NumPy and DataFrames
Nefta Kanilmaz
Title: Introduction to NumPy and DataFrames (Pandas & Polars)
This tutorial is aimed at beginners with basic Python knowledge. It gives an understanding of the basics of NumPy arrays and DataFrames and shows how to perform simple data analysis tasks with them.
Welcome and Setup (~10 min)
- Quick introduction to the topic and objectives
- Ensure environments are set up
- Overview of what NumPy and DataFrames are used for
Introduction to NumPy (~25 min)
- What is NumPy and why use it?
- Creating arrays: np.array, np.zeros, np.ones, np.arange, np.linspace
- Array shapes and reshaping: .shape, .reshape()
- Indexing and slicing
- Vectorized operations vs Python loops (brief performance motivation)
- Basic operations: arithmetic, broadcasting, .mean(), .sum(), and the axis argument
- Hands-on exercises:
- Create a 2D array and compute row-wise and column-wise means
- Element-wise multiplication of arrays
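
As a rough preview of the NumPy part, the topics and exercises above could look like the following. This is a minimal sketch, not the tutorial's official material; the array values and variable names are made up for illustration.

```python
import numpy as np

# Array creation routines named in the outline
a = np.array([[1, 2, 3], [4, 5, 6]])   # from a nested list, shape (2, 3)
zeros = np.zeros((2, 3))               # 2x3 array of 0.0
ones = np.ones((2, 3))                 # 2x3 array of 1.0
steps = np.arange(0, 6)                # integers 0..5
grid = np.linspace(0.0, 1.0, 5)        # 5 evenly spaced points, endpoints included

# Shapes and reshaping
reshaped = steps.reshape(2, 3)         # [[0 1 2], [3 4 5]]

# Indexing and slicing
first_row = a[0]                       # [1 2 3]
last_col = a[:, -1]                    # [3 6]

# Vectorized operation vs. an explicit Python loop (same result)
loop_total = sum(int(x) * int(x) for x in steps)
vector_total = int((steps * steps).sum())

# Exercise: row-wise and column-wise means of a 2D array
row_means = a.mean(axis=1)             # one value per row:    [2. 5.]
col_means = a.mean(axis=0)             # one value per column: [2.5 3.5 4.5]

# Exercise: element-wise multiplication of two arrays of equal shape
product = a * reshaped                 # [[0 2 6], [12 20 30]]

# Broadcasting: the (3,) vector of column means is stretched across both rows
centered = a - col_means
```

The axis argument names the dimension that is collapsed: axis=0 reduces over rows and leaves one value per column, while axis=1 reduces over columns and leaves one value per row.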
Introduction to Pandas DataFrames (~25 min)
- What is a DataFrame?
- Creating a DataFrame (from dicts, CSV, etc.)
- Exploring data: .head(), .info(), .describe()
- Accessing columns and rows: df['col'], .loc, .iloc
- Filtering and boolean indexing
- Common operations: sorting (.sort_values()), grouping (.groupby()), aggregation
- Handling missing values: .isna(), .fillna(), .dropna()
- Simple data visualization with .plot() (optional, if time allows)
- Hands-on exercises:
- Load a small CSV
- Filter rows by condition
- Group by a column and compute summary stats
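
The Pandas exercises above might be sketched as follows. The CSV content, column names (city, temp, year), and variable names are invented for this example; an in-memory string stands in for a small CSV file so the snippet runs without any external data.

```python
import io

import pandas as pd

# Made-up data standing in for a small CSV file; one temperature is missing
csv_text = """city,temp,year
Berlin,10.5,2023
Berlin,11.0,2024
Munich,9.0,2023
Munich,,2024
"""

# Exercise: load a small CSV
df = pd.read_csv(io.StringIO(csv_text))

# Exploring data
preview = df.head()        # first rows
stats = df.describe()      # summary statistics for numeric columns

# Accessing columns and rows
temps = df["temp"]                 # a single column as a Series
first_row = df.iloc[0]             # by integer position
berlin_2024 = df.loc[1, "temp"]    # by index label and column name

# Handling missing values
n_missing = int(df["temp"].isna().sum())
filled = df["temp"].fillna(df["temp"].mean())
complete = df.dropna()

# Exercise: filter rows by condition (boolean indexing)
warm = df[df["temp"] > 10]

# Sorting
by_temp = df.sort_values("temp", ascending=False)

# Exercise: group by a column and compute summary stats
summary = df.groupby("city")["temp"].agg(["mean", "count"])
```

Note that count ignores missing values, so the group with the empty temperature field reports one observation rather than two.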
Polars (~25 min)
- Why Polars? Performance and parallelism
- Quick comparison with Pandas (syntax similarities/differences)
- Lazy vs eager evaluation
- Basic usage: pl.read_csv, df.select, df.filter, df.group_by (named df.groupby in older Polars releases)
- Hands-on mini demo (load and filter data)
Recap, Tips & Q&A (~5 min)
- Summary of key concepts
- When to use what (NumPy vs Pandas vs Polars)
- Tips for continued learning
- Q&A