Part 3: Patterns for Data Science & ML

DS-MLOps Python Foundations

Python 3.12+ | Author: Anthony Faustine

Before you begin

This notebook assumes you have completed Part 1 (01-python-core.ipynb) and Part 2 (02-control-flow.ipynb). If you have not, start there. The concepts here build directly on both.

Part 3 introduces professional coding patterns: the habits and structures that separate a working script from maintainable, production-grade code. These patterns are used every day in real data science and ML engineering work.

Pattern	Why it matters
Functions	Reuse logic without copying code; make code testable
Lambda	Write concise callbacks for `sorted()`, `map()`, pandas `.apply()`
*args / **kwargs	Handle flexible inputs like scikit-learn and PyTorch do
Dataclasses	Typed, structured containers for configs and pipeline state
Modules	Organise code into files; use the standard library
Exceptions	Handle errors gracefully instead of crashing
pathlib	Read and write files safely, cross-platform

The running example is the same university analytics platform from Part 1.

Callout markers used throughout this notebook are explained on the book cover page.

Learning Objectives

#	Skill	Covered in
1	Write type-annotated functions with Google-style docstrings	Sec. 1
2	Use lambda, `*args`, and `**kwargs`	Sec. 2, 3
3	Define structured data with `@dataclass`	Sec. 4
4	Import and use the standard library	Sec. 5
5	Handle exceptions with `try/except/else/finally`	Sec. 6
6	Read and write files with `pathlib.Path`	Sec. 7
7	Recognise and avoid the most common Python gotchas	Sec. 8

1. Functions

A function is a named, reusable block of code. You define it once and call it as many times as you need, with different inputs each time.

# Define once:
def greet(name):
    print(f'Hello, {name}!')

# Call many times:
greet('Alice')   # Hello, Alice!
greet('Bob')     # Hello, Bob!

Without functions, any repeated logic must be copy-pasted, and copy-pasted code means bugs fixed in one place but not the other. Functions are the foundation of all reusable, testable code.

Key Concept: Type Hints + Google Docstrings

Every function you write for production should have:

Type annotations on all parameters and the return value: mypy and your IDE use them to catch bugs. This is the same `name: type` annotation syntax from Part 1, Sec. 1, applied to function signatures.
A docstring that explains what the function does, its arguments, and what it returns. This project uses Google style.

The default parameter rule: defaults are evaluated once at definition time. Never use a mutable object (list, dict) as a default; see the Common Mistake callout in Sec. 8.

def weighted_grade(
    midterm: float,
    final: float,
    project: float,
    weights: tuple[float, float, float] = (0.30, 0.50, 0.20),
) -> float:
    """Compute a weighted final grade from three components.

    Args:
        midterm: Midterm exam score (0-100).
        final: Final exam score (0-100).
        project: Project score (0-100).
        weights: (midterm_w, final_w, project_w); must sum to 1.0.

    Returns:
        Weighted average on a 0-100 scale.

    Raises:
        ValueError: If weights do not sum to 1.0 within tolerance.
    """
    w_mid, w_fin, w_proj = weights
    if abs(w_mid + w_fin + w_proj - 1.0) > 1e-9:
        raise ValueError(f"Weights must sum to 1.0; got {w_mid + w_fin + w_proj}")
    return midterm * w_mid + final * w_fin + project * w_proj

Call the function with default weights, then with custom weights. Keyword arguments make the call self-documenting:

# Default weights (0.30 mid / 0.50 final / 0.20 project)
grade = weighted_grade(midterm=82.0, final=91.0, project=88.0)
print(f"Default weights : {grade:.1f}")

# Override weights - the tuple must still sum to 1.0
custom = weighted_grade(82.0, 91.0, 88.0, weights=(0.20, 0.60, 0.20))
print(f"Custom weights  : {custom:.1f}")

Default weights : 87.7
Custom weights  : 88.6

Functions can return multiple values packed as a tuple. Callers unpack them directly into named variables with a, b, c = func():

def score_summary(scores: list[float]) -> tuple[float, float, float, float]:
    """Return descriptive statistics for a score list.

    Args:
        scores: Non-empty list of numeric scores.

    Returns:
        Tuple of (mean, minimum, maximum, std_dev).

    Raises:
        ValueError: If scores is empty.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((s - mean) ** 2 for s in scores) / n
    return mean, min(scores), max(scores), variance**0.5

Python packs the four return values into a tuple. Unpack them in one line with tuple unpacking:

exam_scores: list[float] = [78.0, 85.5, 92.0, 88.5, 95.0, 67.0, 81.0]
mean, lo, hi, std = score_summary(exam_scores)
print(f"mean={mean:.1f}  min={lo}  max={hi}  std={std:.1f}")

mean=83.9  min=67.0  max=95.0  std=8.8
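
Note that score_summary divides by n, which is the population formula. The standard library's statistics module offers both variants, and the sketch below compares them on the same scores. It is an illustration only; the variable names are chosen for this example.

```python
import statistics

exam_scores = [78.0, 85.5, 92.0, 88.5, 95.0, 67.0, 81.0]
n = len(exam_scores)
mean = sum(exam_scores) / n

# Population formula (divide by n), the same arithmetic score_summary uses
manual_population_std = (sum((s - mean) ** 2 for s in exam_scores) / n) ** 0.5

population_std = statistics.pstdev(exam_scores)  # divides by n
sample_std = statistics.stdev(exam_scores)  # divides by n - 1

print(f"manual : {manual_population_std:.3f}")
print(f"pstdev : {population_std:.3f}")
print(f"stdev  : {sample_std:.3f}")
```

The sample version is always slightly larger; the gap shrinks as the list grows.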

Another essential DS function: z-score normalisation. It rescales a value to “how many standard deviations from the mean”, a common preprocessing step for models that are sensitive to feature scale:

import statistics


def normalize(value: float, mean: float, std: float) -> float:
    """Compute the z-score of a single value.

    Args:
        value: The raw data point.
        mean: Population or sample mean.
        std:  Standard deviation (must be non-zero).

    Returns:
        Z-score: positive = above average, negative = below average.

    Raises:
        ValueError: If std is zero (all values identical).
    """
    if std == 0.0:
        raise ValueError("std must be non-zero")
    return (value - mean) / std

Apply it to an exam score list. The z-scores make it immediately clear who is above/below the class mean:

exam_scores: list[float] = [72.0, 85.0, 91.0, 68.0, 88.0, 77.0, 94.0, 63.0]
mu = statistics.mean(exam_scores)
sig = statistics.stdev(exam_scores)

print(f"mean={mu:.1f}  std={sig:.1f}\n")
for score in exam_scores:
    z = normalize(score, mu, sig)
    label = "above avg" if z > 0 else "below avg"
    print(f"  {score:5.1f}  z={z:+.2f}  {label}")

mean=79.8  std=11.4

   72.0  z=-0.68  below avg
   85.0  z=+0.46  above avg
   91.0  z=+0.99  above avg
   68.0  z=-1.03  below avg
   88.0  z=+0.72  above avg
   77.0  z=-0.24  below avg
   94.0  z=+1.25  above avg
   63.0  z=-1.47  below avg

Activity 1 - Write a Grade Classifier Function

Goal: Write a fully annotated function classify_cohort that takes a list of scores and returns a dict mapping each grade letter to its count.

classify_cohort([95, 83, 71, 62, 45, 88, 76])
# -> {'A': 1, 'B': 2, 'C': 2, 'D': 1, 'F': 1}

Hint: Use a helper function grade_letter(score) -> str and Counter.

from collections import Counter


def grade_letter(score: float) -> str:
    """Return the letter grade for a numeric score."""
    ...  # TODO


def classify_cohort(scores: list[float]) -> dict[str, int]:
    """Return grade-letter frequency counts for a cohort.

    Args:
        scores: List of numeric scores.

    Returns:
        Dict mapping each letter grade to its count.
    """
    ...  # TODO


print(classify_cohort([95.0, 83.0, 71.0, 62.0, 45.0, 88.0, 76.0]))

None

Activity 2 - Login Validator

Goal: Write accept_login(users, username, password) that takes a dict[str, str] of username→password pairs and returns True if the username exists and the password matches, False otherwise.

users = {"alice": "ds2024", "bob": "ml#secure"}

accept_login(users, "alice", "ds2024")   # True
accept_login(users, "alice", "wrong")    # False  (bad password)
accept_login(users, "carol", "any")      # False  (user not found)

Hint: Use dict.get() to avoid a KeyError on missing usernames.

def accept_login(users: dict[str, str], username: str, password: str) -> bool:
    """Return True if username exists and password matches."""
    # TODO: implement


users: dict[str, str] = {"alice": "ds2024", "bob": "ml#secure"}
print(accept_login(users, "alice", "ds2024"))  # True
print(accept_login(users, "alice", "wrong"))  # False
print(accept_login(users, "carol", "any"))  # False

None
None
None

2. Lambda Functions

A lambda is an anonymous (nameless) function defined in a single expression. It takes inputs on the left of : and produces a result on the right:

double = lambda x: x * 2
double(5)   # -> 10

Lambdas are most useful as short callbacks: a function you pass to another function rather than call yourself. You will use them constantly with sorted(), map(), filter(), and pandas .apply().

Key Concept: Anonymous Single-Expression Function

A lambda is a function with no name, a single expression, and an implicit return. Use it as a short callback for sorted(), map(), filter(), and especially pandas .apply().
If the body needs more than one expression, write a named function instead; lambdas must stay simple.

# Lambda syntax: lambda params: expression
square = lambda x: x**2  # noqa: E731
clamp = lambda x, lo, hi: max(lo, min(x, hi))  # noqa: E731

print(f"square(5)          = {square(5)}")
print(f"clamp(150, 0, 100) = {clamp(150, 0, 100)}")

square(5)          = 25
clamp(150, 0, 100) = 100

The most common real-world use for lambdas is as a sort key: a function that extracts the comparison value from each element:

# Most common use: a sort key, which avoids writing a throwaway named function
students: list[dict[str, object]] = [
    {"name": "Alice", "gpa": 3.95, "major": "CS"},
    {"name": "Bob", "gpa": 3.45, "major": "Math"},
    {"name": "Carol", "gpa": 3.88, "major": "CS"},
    {"name": "Dan", "gpa": 3.72, "major": "Math"},
]

by_gpa = sorted(students, key=lambda s: s["gpa"], reverse=True)
by_major = sorted(students, key=lambda s: (s["major"], s["gpa"]))

print("By GPA (desc):")
for s in by_gpa:
    print(f"  {s['name']:<8} GPA={s['gpa']}")

print("\nBy major then GPA:")
for s in by_major:
    print(f"  {s['major']:<6}  {s['name']:<8} GPA={s['gpa']}")

By GPA (desc):
  Alice    GPA=3.95
  Carol    GPA=3.88
  Dan      GPA=3.72
  Bob      GPA=3.45

By major then GPA:
  CS      Carol    GPA=3.88
  CS      Alice    GPA=3.95
  Math    Bob      GPA=3.45
  Math    Dan      GPA=3.72
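
When the key is a plain dictionary lookup, the standard library's operator.itemgetter is an alternative to a lambda. The sketch below repeats the two sorts above with it; this is one possible style, not a replacement the notebook prescribes.

```python
from operator import itemgetter

students = [
    {"name": "Alice", "gpa": 3.95, "major": "CS"},
    {"name": "Bob", "gpa": 3.45, "major": "Math"},
    {"name": "Carol", "gpa": 3.88, "major": "CS"},
    {"name": "Dan", "gpa": 3.72, "major": "Math"},
]

# itemgetter("gpa") behaves like lambda s: s["gpa"]
by_gpa = sorted(students, key=itemgetter("gpa"), reverse=True)

# With several keys it returns a tuple, like lambda s: (s["major"], s["gpa"])
by_major = sorted(students, key=itemgetter("major", "gpa"))

names_by_gpa = [s["name"] for s in by_gpa]
names_by_major = [s["name"] for s in by_major]
print(names_by_gpa)  # ['Alice', 'Carol', 'Dan', 'Bob']
print(names_by_major)  # ['Carol', 'Alice', 'Bob', 'Dan']
```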

map() and filter()

map(func, iterable) applies func to every element. Like filter(), it returns a lazy iterator; wrap it with list() to materialise the result:

# map() applies a function to every element: returns a lazy iterator
raw_scores: list[str] = ["78.5", "85.0", "92.3", "61.0", "88.7"]

# Convert strings to floats
scores: list[float] = list(map(float, raw_scores))
print(f"Converted : {scores}")

# Normalise each score to 0-1
lo, hi = min(scores), max(scores)
normed: list[float] = list(  # noqa: C417
    map(lambda s: round((s - lo) / (hi - lo), 3), scores)
)
print(f"Normalised: {normed}")

Converted : [78.5, 85.0, 92.3, 61.0, 88.7]
Normalised: [0.559, 0.767, 1.0, 0.0, 0.885]

filter(func, iterable) keeps only the elements for which func returns a truthy value. It is equivalent to [x for x in iterable if func(x)] and reads best when you already have a named predicate:

# filter() keeps elements where the function returns True
passing: list[float] = list(filter(lambda s: s >= 70, scores))
print(f"Passing: {passing}")

# Preview: lambda is the backbone of pandas .apply()
# df['grade'] = df['score'].apply(lambda s: 'pass' if s >= 70 else 'fail')
# You will see this pattern in the Data Analysis module.

Passing: [78.5, 85.0, 92.3, 88.7]
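
The sketch below puts map() and filter() next to their list-comprehension equivalents and shows what "lazy" means in practice: an iterator can be consumed only once. The variable names are illustrative.

```python
scores = [78.5, 85.0, 92.3, 61.0, 88.7]

# filter() vs. a comprehension with a condition
passing_filter = list(filter(lambda s: s >= 70, scores))
passing_comp = [s for s in scores if s >= 70]
print(passing_filter == passing_comp)  # True

# map() vs. a comprehension with an expression
fractions_map = list(map(lambda s: s / 100, scores))
fractions_comp = [s / 100 for s in scores]
print(fractions_map == fractions_comp)  # True

# A lazy iterator is exhausted after one pass
lazy_scores = map(lambda s: s / 100, scores)
first_pass = list(lazy_scores)
second_pass = list(lazy_scores)
print(len(first_pass), second_pass)  # 5 []
```

If you need the values more than once, materialise them with list() first or use a comprehension.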

3. *args and **kwargs

What are *args and **kwargs?

Sometimes you want a function to accept any number of arguments without listing them all:

def add(*numbers):        # *numbers collects all positional args into a tuple
    return sum(numbers)

add(1, 2)       # -> 3
add(1, 2, 3, 4) # -> 10   (same function, different number of args)

This pattern is used by virtually every ML library: nn.Sequential(*layers), model.fit(X, y, **config), pd.concat([df1, df2], **options).

Key Concept: Variable Positional and Keyword Arguments

*args collects any number of positional arguments into a tuple.
**kwargs collects any number of keyword arguments into a dict.

You will encounter both constantly in scikit-learn, PyTorch, and FastAPI APIs: model.fit(X, y, **config), nn.Sequential(*layers).

# *args collects all positional arguments into a tuple
def ensemble_predict(*predictions: float) -> float:
    """Return the mean of any number of model predictions.

    Args:
        *predictions: Floats from individual model predictions.

    Returns:
        Mean of all predictions.

    Raises:
        ValueError: If no predictions are provided.
    """
    if not predictions:
        raise ValueError("At least one prediction required")
    return sum(predictions) / len(predictions)

Call with one value, three values, or unpack a list with *. The function signature stays the same in all three cases:

print(ensemble_predict(0.82))  # single model
print(ensemble_predict(0.82, 0.91, 0.87))  # three models

# * unpacks a list into positional arguments
model_preds: list[float] = [0.82, 0.91, 0.87, 0.79]
print(ensemble_predict(*model_preds))  # four models

0.82
0.8666666666666667
0.8475

**kwargs collects any number of keyword arguments into a dict. Unpacking a dict with ** passes its contents as keyword arguments to another function:

# **kwargs collects all keyword arguments into a dict
def build_config(model: str, **hyperparams: object) -> dict[str, object]:
    """Assemble a model config dict from keyword arguments.

    Args:
        model: Model name identifier.
        **hyperparams: Any additional hyperparameter key/value pairs.

    Returns:
        Config dict with model name plus all hyperparameters.
    """
    return {"model": model} | dict(hyperparams)

Pass any combination of keyword arguments. ** also unpacks a dict into keyword arguments at the call site:

cfg1 = build_config("xgboost", n_estimators=200, max_depth=6, learning_rate=0.05)
cfg2 = build_config("linear", C=1.0, penalty="l2")
print("Config 1:", cfg1)
print("Config 2:", cfg2)

# ** unpacks a dict into keyword arguments
base_params: dict[str, object] = {"n_estimators": 100, "max_depth": 4}
cfg3 = build_config("xgboost", **base_params, learning_rate=0.01)
print("Config 3:", cfg3)

Config 1: {'model': 'xgboost', 'n_estimators': 200, 'max_depth': 6, 'learning_rate': 0.05}
Config 2: {'model': 'linear', 'C': 1.0, 'penalty': 'l2'}
Config 3: {'model': 'xgboost', 'n_estimators': 100, 'max_depth': 4, 'learning_rate': 0.01}

You can combine all four argument forms in one signature: fixed positional, *args, keyword-only (after *), and **kwargs. This is the pattern used by scikit-learn, PyTorch, and FastAPI:

# All four kinds of argument in one signature:
#   positional  →  run_id
#   *args       →  tags (zero or more strings)
#   keyword-only → verbose (must be named at the call site)
#   **kwargs    →  metrics (any float metrics)
def log_run(
    run_id: str,
    *tags: str,
    verbose: bool = False,
    **metrics: float,
) -> None:
    """Log a training run with optional tags and metrics."""
    tag_str = ", ".join(tags) if tags else "none"
    metric_str = "  ".join(f"{k}={v:.3f}" for k, v in metrics.items())
    print(f"[{run_id}] tags=[{tag_str}]  {metric_str}")
    if verbose:
        print(f"  (full metrics: {metrics})")

Call it with positional tags and keyword metric pairs. verbose is keyword-only because it follows *tags: an extra positional value would be collected into tags rather than bound to verbose:

log_run("run-001", "baseline", "v1", accuracy=0.923, loss=0.218)
log_run("run-002", verbose=True, accuracy=0.934, precision=0.918)

[run-001] tags=[baseline, v1]  accuracy=0.923  loss=0.218
[run-002] tags=[none]  accuracy=0.934  precision=0.918
  (full metrics: {'accuracy': 0.934, 'precision': 0.918})
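
A bare * in a signature gives you keyword-only parameters without accepting extra positional arguments. The following sketch uses a hypothetical helper, split_sizes, to show the error Python raises when a keyword-only argument is passed positionally.

```python
def split_sizes(n_rows: int, *, test_fraction: float = 0.2) -> tuple[int, int]:
    """Return (train_rows, test_rows) for a dataset with n_rows rows."""
    test_rows = round(n_rows * test_fraction)
    return n_rows - test_rows, test_rows


print(split_sizes(1000, test_fraction=0.25))  # (750, 250)

error_message = ""
try:
    split_sizes(1000, 0.25)  # type: ignore[misc]  # positional: not allowed
except TypeError as exc:
    error_message = str(exc)
    print(f"TypeError: {error_message}")
```

Forcing the name at the call site keeps calls readable when a function has several numeric options.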

Activity 2 - Flexible Metric Logger

Goal: Write log_metrics(epoch, **metrics) that prints a formatted line and returns a dict.

log_metrics(5, loss=0.312, accuracy=0.901, val_loss=0.334)
# prints:
# Epoch 05 | loss=0.3120  accuracy=0.9010  val_loss=0.3340
# returns:
# {'epoch': 5, 'loss': 0.312, 'accuracy': 0.901, 'val_loss': 0.334}

def log_metrics(epoch: int, **metrics: float) -> dict[str, object]:
    """Print and return a training metrics snapshot.

    Args:
        epoch: Current training epoch.
        **metrics: Metric name -> value pairs.

    Returns:
        Dict containing epoch and all metrics.
    """
    ...  # TODO


result = log_metrics(5, loss=0.312, accuracy=0.901, val_loss=0.334)
print(result)

None

4. Dataclasses

A dataclass is a class (a blueprint for creating objects) where Python automatically generates the __init__, __repr__, and __eq__ methods from your field annotations. It is the modern replacement for the plain dict records from Part 1, Sec. 5, once the structure of your data is fixed and known ahead of time.

from dataclasses import dataclass

# Instead of this:
config = {'model': 'xgboost', 'lr': 0.001, 'epochs': 50}

# Use this - self-documenting and typed (annotations are hints, not runtime checks):
@dataclass
class Config:
    model: str
    lr: float
    epochs: int

If you have not used classes before, think of a class as a custom type you define. Dataclasses are the gentlest introduction. They need no understanding of inheritance or self beyond what is shown here.

Key Concept: Typed Structured Data

A @dataclass (Python 3.7+) generates __init__, __repr__, and __eq__ from field annotations automatically. It is the modern replacement for plain dicts when the shape of your data is known and fixed.

When to use what:

dict: flexible, arbitrary keys, JSON-friendly
NamedTuple: immutable record, tuple-compatible
@dataclass: mutable typed object with methods; default for ML configs and pipeline state
@dataclass(frozen=True): immutable dataclass; hashable, usable as dict key
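
The sketch below shows three of these choices side by side. The class names (ScoreRange, RunState, ModelKey) and the path string are hypothetical and serve only to contrast the behaviours.

```python
from dataclasses import dataclass
from typing import NamedTuple


class ScoreRange(NamedTuple):
    """Immutable record that still behaves like a tuple."""

    low: float
    high: float


@dataclass
class RunState:
    """Mutable state that changes during training."""

    epoch: int
    best_loss: float


@dataclass(frozen=True)
class ModelKey:
    """Immutable and hashable, so it can serve as a dict key."""

    name: str
    version: int


score_range = ScoreRange(0.0, 100.0)
low, high = score_range  # tuple unpacking works

namedtuple_is_immutable = False
try:
    score_range.low = 10.0  # type: ignore[misc]
except AttributeError:
    namedtuple_is_immutable = True

state = RunState(epoch=1, best_loss=0.5)
state.epoch += 1  # plain dataclasses are mutable

registry = {ModelKey("xgboost", 2): "models/xgb-v2"}  # hypothetical path
print(low, high, namedtuple_is_immutable, state.epoch)
print(registry[ModelKey("xgboost", 2)])
```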

from dataclasses import dataclass, field


@dataclass
class TrainingConfig:
    """Configuration for a single training run."""

    model_name: str
    learning_rate: float
    epochs: int
    batch_size: int = 32  # field with a default value
    optimizer: str = "adam"
    tags: list[str] = field(default_factory=list)  # mutable default: use field()!

    def is_fast_run(self) -> bool:
        """Return True if this is a quick smoke-test run (≤ 5 epochs)."""
        return self.epochs <= 5

@dataclass generates __init__, __repr__, and __eq__ automatically. Create an instance exactly like calling a function:

cfg = TrainingConfig(
    model_name="xgboost-v2",
    learning_rate=0.001,
    epochs=50,
    tags=["baseline", "production"],
)

print(cfg)  # __repr__ generated: no boilerplate needed
print(f"Fast run : {cfg.is_fast_run()}")
print(f"Tags     : {cfg.tags}")

TrainingConfig(model_name='xgboost-v2', learning_rate=0.001, epochs=50, batch_size=32, optimizer='adam', tags=['baseline', 'production'])
Fast run : False
Tags     : ['baseline', 'production']

Dataclass fields are mutable by default. __eq__ compares field values, not object identity: two separately created instances with the same fields are equal:

# Mutation: dataclass fields are mutable by default
cfg.epochs = 100
cfg.tags.append("extended")
print(f"Updated epochs: {cfg.epochs}  tags: {cfg.tags}")

# Equality: __eq__ is generated automatically from field values
cfg2 = TrainingConfig("xgboost-v2", 0.001, 100, tags=["baseline", "production", "extended"])
print(f"cfg == cfg2: {cfg == cfg2}")

Updated epochs: 100  tags: ['baseline', 'production', 'extended']
cfg == cfg2: True
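
One caveat is worth sketching: dataclass annotations are not checked at runtime, so a wrongly typed value is accepted silently unless you validate it yourself. The example below uses two hypothetical classes and the __post_init__ hook, which the generated __init__ calls after assigning the fields.

```python
from dataclasses import dataclass


@dataclass
class UncheckedConfig:
    lr: float
    epochs: int


@dataclass
class CheckedConfig:
    lr: float
    epochs: int

    def __post_init__(self) -> None:
        # Explicit runtime check; the annotation alone does nothing here
        if not isinstance(self.epochs, int):
            raise TypeError(f"epochs must be int; got {type(self.epochs).__name__}")


wrong = UncheckedConfig(lr=0.001, epochs="fifty")  # type: ignore[arg-type]
print(wrong)  # accepted: epochs holds a string

checked_error = ""
try:
    CheckedConfig(lr=0.001, epochs="fifty")  # type: ignore[arg-type]
except TypeError as exc:
    checked_error = str(exc)
    print(f"Rejected: {checked_error}")
```

A static checker such as mypy would flag both calls before the code runs; the runtime check protects against values that arrive from files or user input.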

frozen=True: Immutable, Hashable Dataclass

frozen=True prevents field mutation after creation and makes the object hashable – it can then be used as a dict key or placed in a set:

from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetSplit:
    """Immutable description of a train/val/test split."""

    train_size: float
    val_size: float
    test_size: float
    random_seed: int = 42

    def __post_init__(self) -> None:
        total = self.train_size + self.val_size + self.test_size
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"Splits must sum to 1.0; got {total}")

__post_init__ runs automatically after __init__. Any invalid split ratio is caught immediately on construction:

split = DatasetSplit(train_size=0.70, val_size=0.15, test_size=0.15)
print(split)

# Invalid split: __post_init__ raises ValueError
try:
    bad = DatasetSplit(train_size=0.80, val_size=0.15, test_size=0.15)
except ValueError as exc:
    print(f"Caught: {exc}")

DatasetSplit(train_size=0.7, val_size=0.15, test_size=0.15, random_seed=42)
Caught: Splits must sum to 1.0; got 1.1

A frozen dataclass can be used as a dict key for result caching. Attempting to set a field after construction raises FrozenInstanceError:

# frozen=True means the object is hashable: usable as a dict key or set element
cache: dict[DatasetSplit, float] = {split: 0.923}
print(f"Cached accuracy: {cache[split]}")

# Attempting mutation raises FrozenInstanceError
try:
    split.train_size = 0.80  # type: ignore[misc]
except Exception as exc:
    print(f"Immutable: {exc}")

Cached accuracy: 0.923
Immutable: cannot assign to field 'train_size'
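
Because a frozen instance cannot be modified, the usual way to "change" one is to build a modified copy with dataclasses.replace. The sketch below redefines a reduced DatasetSplit (without the validation hook) so that it runs on its own.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DatasetSplit:
    train_size: float
    val_size: float
    test_size: float
    random_seed: int = 42


split = DatasetSplit(0.70, 0.15, 0.15)

# replace() builds a new instance; the original is untouched
reseeded = replace(split, random_seed=7)
print(split.random_seed, reseeded.random_seed)  # 42 7

# Equal field values hash alike, so a set keeps one copy of each distinct split
unique_splits = {split, reseeded, DatasetSplit(0.70, 0.15, 0.15)}
print(len(unique_splits))  # 2
```

replace() goes through __init__, so any __post_init__ validation runs again on the new instance.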

5. Modules & the Standard Library

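A module is a Python file. When you write import math, Python finds the math module in the standard library and makes its contents available under the name math. (math itself is compiled C code; most standard-library modules, such as statistics and random, are ordinary .py files.)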

import math
math.sqrt(9)   # -> 3.0

You can import your own code files the same way. Splitting code into modules is how real projects stay organised as they grow.

Key Concept: Import Patterns

import math	import the whole module; access with `math.sqrt()`
from math import sqrt, pi	import specific names; use directly
import numpy as np	alias (conventional for large packages)

Prefer import module over from module import *. The star import pollutes the namespace and hides where names come from.

import math

# math: precise numeric operations
print(f"pi             = {math.pi:.6f}")
print(f"sqrt(2)        = {math.sqrt(2):.6f}")
print(f"log base 2(8)  = {math.log2(8)}")
print(f"ceil(4.2)      = {math.ceil(4.2)}")
print(f"floor(4.9)     = {math.floor(4.9)}")

pi             = 3.141593
sqrt(2)        = 1.414214
log base 2(8)  = 3.0
ceil(4.2)      = 5
floor(4.9)     = 4

random: Sampling, Shuffling, and Simulation

random provides pseudo-random number generation for sampling, shuffling, and Monte Carlo simulation. Always call random.seed(n) at the start of any script that uses randomness. It makes results reproducible:

Function	What it does
`random.seed(n)`	Fix the random state for reproducibility
`random.uniform(a, b)`	Float in `[a, b]`
`random.randint(a, b)`	Integer in `[a, b]` (both inclusive)
`random.choice(seq)`	One element from a sequence
`random.sample(seq, k)`	`k` unique elements (no replacement)
`random.shuffle(lst)`	Shuffle a list in place

import random

# Always set a seed before any random operation for reproducibility
random.seed(42)

scores: list[float] = [72.0, 85.0, 91.0, 68.0, 88.0, 77.0, 94.0, 63.0]

# shuffle: randomise in place (S311: not for crypto -- pedagogical demo)
shuffled = scores.copy()
random.shuffle(shuffled)  # noqa: S311
print(f"Shuffled : {shuffled}")

# choice: pick one element at random
winner = random.choice(scores)  # noqa: S311
print(f"Random winner: {winner}")

# sample: pick k unique elements (without replacement)
batch = random.sample(scores, k=3)  # noqa: S311
print(f"Random batch : {batch}")

# uniform: float in [a, b]
points: list[float] = [random.uniform(0.0, 1.0) for _ in range(5)]  # noqa: S311
print(f"Uniform [0,1]: {[round(p, 3) for p in points]}")

# randint: integer in [a, b] (inclusive both ends)
dice_rolls: list[int] = [random.randint(1, 6) for _ in range(10)]  # noqa: S311
print(f"Dice rolls   : {dice_rolls}")

Shuffled : [68.0, 88.0, 94.0, 63.0, 91.0, 77.0, 72.0, 85.0]
Random winner: 85.0
Random batch : [85.0, 88.0, 68.0]
Uniform [0,1]: [0.032, 0.094, 0.233, 0.602, 0.561]
Dice rolls   : [6, 6, 6, 5, 4, 2, 4, 5, 3, 1]
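
random.seed() sets one global state shared by every caller. A separate random.Random instance carries its own state, which can be easier to reason about in larger programs. The sketch below demonstrates both forms of reproducibility; the variable names are illustrative.

```python
import random

# Two independent generators with the same seed produce the same sequence
rng_a = random.Random(42)
rng_b = random.Random(42)
draws_a = [rng_a.randint(1, 6) for _ in range(5)]
draws_b = [rng_b.randint(1, 6) for _ in range(5)]
print(draws_a == draws_b)  # True

# Re-seeding the global generator replays its sequence from the start
random.seed(42)
first = random.random()
random.seed(42)
second = random.random()
print(first == second)  # True
```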

json: Serialise Python Objects

json converts Python dicts, lists, strings, numbers, and booleans to a JSON string and back. It is the standard format for saving model configs and experiment results:

import json

# json: serialise / deserialise Python objects to JSON strings
run_result: dict[str, object] = {
    "run_id": "exp-2024-001",
    "model": "xgboost",
    "accuracy": 0.923,
    "loss": 0.218,
    "tags": ["baseline", "production"],
}

json_str: str = json.dumps(run_result, indent=2)
print("JSON string:")
print(json_str)

loaded: dict[str, object] = json.loads(json_str)
print(f"Round-trip accuracy: {loaded['accuracy']}")

JSON string:
{
  "run_id": "exp-2024-001",
  "model": "xgboost",
  "accuracy": 0.923,
  "loss": 0.218,
  "tags": [
    "baseline",
    "production"
  ]
}
Round-trip accuracy: 0.923
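
json handles only the basic types listed above, so a dataclass instance has to be converted first. A minimal sketch using dataclasses.asdict follows; RunRecord is a hypothetical class for this example.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RunRecord:
    run_id: str
    accuracy: float
    tags: list[str] = field(default_factory=list)


record = RunRecord("exp-2024-001", 0.923, ["baseline"])

# asdict() converts the dataclass to a plain dict that json can serialise
payload = json.dumps(asdict(record))
print(payload)

# ** unpacks the loaded dict back into the constructor
restored = RunRecord(**json.loads(payload))
print(restored == record)  # True

# JSON has no tuple type: tuples come back as lists
shape_round_trip = json.loads(json.dumps({"shape": (3, 2)}))
print(shape_round_trip)  # {'shape': [3, 2]}
```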

datetime represents a point in time. Always attach a time zone such as UTC (datetime.UTC, an alias for timezone.utc added in Python 3.11) to avoid ambiguous “naive” datetime objects that can silently shift across time zones:

from datetime import UTC, datetime

# datetime: timestamp experiments and logs
now = datetime.now(tz=UTC)
print(f"Timestamp : {now.isoformat()}")
print(f"Date part : {now.strftime('%Y-%m-%d')}")

Timestamp : 2026-06-18T09:54:16.238386+00:00
Date part : 2026-06-18
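
The difference between aware and naive values shows up as soon as you compare them. The sketch below uses a fixed timestamp so the result is repeatable, and timezone.utc so it also runs on Python versions older than 3.11.

```python
from datetime import datetime, timedelta, timezone

aware = datetime(2026, 6, 18, 9, 54, 16, tzinfo=timezone.utc)
naive = datetime(2026, 6, 18, 9, 54, 16)  # no tzinfo attached

# Ordering an aware value against a naive one is an error, not a guess
comparison_failed = False
try:
    aware < naive  # noqa: B015
except TypeError:
    comparison_failed = True
print(f"Comparison failed: {comparison_failed}")  # True

# ISO 8601 strings round-trip and keep the offset
stamp = aware.isoformat()
restored = datetime.fromisoformat(stamp)
print(stamp)  # 2026-06-18T09:54:16+00:00
print(restored == aware)  # True

# Subtracting datetimes gives a timedelta
elapsed = (aware + timedelta(minutes=90)) - aware
print(elapsed.total_seconds())  # 5400.0
```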

6. Exception Handling

An exception is an error that occurs while your program is running. By default, Python stops immediately and prints a traceback. Exception handling lets you catch the error, respond to it gracefully, and keep the program running.

# Without handling - program crashes:
int("abc")   # ValueError: invalid literal for int() with base 10: 'abc'

# With handling - program continues:
try:
    int("abc")
except ValueError:
    print("That was not a number, skipping")

In data pipelines and ML training loops, unhandled exceptions can discard hours of computation. Always handle errors at system boundaries (user input, file I/O, APIs).

Key Concept: try / except / else / finally

try: code that might raise an exception
except ExcType as e: handle a specific exception
else: runs only if NO exception was raised in try
finally: always runs, even if an exception propagates (use for cleanup)

Catch the most specific exception you can. A bare except: catches everything, including KeyboardInterrupt and SystemExit, so Ctrl+C stops working; except Exception: lets those two through but still hides genuine bugs.
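
The built-in exception hierarchy explains which handlers catch what. The sketch below checks a few relationships and shows a narrow handler letting an unexpected error surface; safe_int is a hypothetical helper.

```python
# KeyboardInterrupt and SystemExit derive from BaseException, not Exception
interrupt_is_exception = issubclass(KeyboardInterrupt, Exception)
interrupt_is_base = issubclass(KeyboardInterrupt, BaseException)
exit_is_exception = issubclass(SystemExit, Exception)
print(interrupt_is_exception, interrupt_is_base, exit_is_exception)  # False True False


def safe_int(raw: str, default: int = 0) -> int:
    """Convert raw to int, falling back to default for non-numeric text."""
    try:
        return int(raw)
    except ValueError:  # only the failure we expect
        return default


print(safe_int("42"), safe_int("abc"))  # 42 0

# A wrong argument type is a bug, and the narrow handler lets it surface
type_error_surfaced = False
try:
    safe_int(None)  # type: ignore[arg-type]
except TypeError:
    type_error_surfaced = True
print(type_error_surfaced)  # True
```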

def parse_score(raw: str) -> float:
    """Parse a score string and validate it is in [0, 100].

    Args:
        raw: String representation of a numeric score.

    Returns:
        Validated float score.

    Raises:
        ValueError: If raw is not numeric or out of range.
    """
    try:
        value = float(raw)
    except ValueError:
        raise ValueError(f"{raw!r} is not a valid number") from None

    if not 0 <= value <= 100:
        raise ValueError(f"Score {value} is out of range [0, 100]")

    return value

Test parse_score() against a range of inputs: valid numbers, out-of-range values, and non-numeric strings. The else clause runs only on the success path:

# Test parse_score against valid and invalid inputs
test_inputs: list[str] = ["87.5", "105", "abc", "-3", "72"]

for raw in test_inputs:
    try:
        score = parse_score(raw)
    except ValueError as exc:
        print(f"  {raw!r:<8} -> ERROR: {exc}")
    else:
        print(f"  {raw!r:<8} -> OK: {score}")

  '87.5'   -> OK: 87.5
  '105'    -> ERROR: Score 105.0 is out of range [0, 100]
  'abc'    -> ERROR: 'abc' is not a valid number
  '-3'     -> ERROR: Score -3.0 is out of range [0, 100]
  '72'     -> OK: 72.0

finally runs regardless of whether an exception occurred or was handled. Use it for cleanup code (closing files, releasing connections) that must execute either way. This example illustrates the pattern using explicit open/close; in practice, always use with open(...) as fh instead (shown in Sec. 7):

# else runs ONLY when try succeeds; finally ALWAYS runs (cleanup guarantee)
def load_scores(filepath: str) -> list[float]:
    """Load numeric scores from a text file, one per line."""
    fh = None
    try:
        fh = open(filepath, encoding="utf-8")  # noqa: SIM115, PTH123
        lines = fh.readlines()
    except FileNotFoundError:
        print(f"File not found: {filepath!r}")
        return []
    except PermissionError as exc:
        print(f"Permission denied: {exc}")
        return []
    else:
        print(f"Loaded {len(lines)} lines successfully")
        return [float(line.strip()) for line in lines if line.strip()]
    finally:
        if fh is not None:
            fh.close()
            print("File handle closed")

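finally runs on every path, including the early return inside an except branch. In the call below open() itself fails, so fh is still None and there is nothing to close; had the file opened and readlines() then raised, finally would still close the handle: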

# NOTE: prefer `with open(...) as fh` in practice (shown in Sec. 7).
# This example uses explicit open/close to make else/finally visible.
result = load_scores("nonexistent.txt")
print(f"Result: {result}")

File not found: 'nonexistent.txt'
Result: []
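
To see the order in which the four blocks run, the following sketch records each step in a list. Both functions are hypothetical and exist only to trace control flow.

```python
def trace_blocks(fail: bool) -> list[str]:
    """Return the names of the blocks that ran, in order."""
    events: list[str] = []
    try:
        events.append("try")
        if fail:
            raise ValueError("simulated failure")
    except ValueError:
        events.append("except")
    else:
        events.append("else")
    finally:
        events.append("finally")
    return events


print(trace_blocks(fail=False))  # ['try', 'else', 'finally']
print(trace_blocks(fail=True))  # ['try', 'except', 'finally']


def return_inside_try(log: list[str]) -> str:
    """Show that finally runs even when try returns early."""
    try:
        return "value from try"
    finally:
        log.append("finally ran")


log: list[str] = []
result = return_inside_try(log)
print(result, log)  # value from try ['finally ran']
```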

Custom Exception Classes

Subclass a built-in exception to give callers a specific type to catch. Store the structured context as instance attributes for programmatic access:

# Custom exception classes give callers something specific to catch
class DataValidationError(ValueError):
    """Raised when a data record fails validation."""

    def __init__(self, field: str, value: object, reason: str) -> None:
        self.field = field
        self.value = value
        self.reason = reason
        super().__init__(f"Validation failed for {field!r}={value!r}: {reason}")

Define a validator that raises DataValidationError with field-level context, then test it. The except DataValidationError clause catches only your custom type – not any accidental ValueError from elsewhere in the code:

def validate_student(record: dict[str, object]) -> None:
    """Validate a student record dict against required field constraints."""
    gpa = record.get("gpa")
    if not isinstance(gpa, int | float):
        raise DataValidationError("gpa", gpa, "must be numeric")
    if not 0.0 <= float(gpa) <= 4.0:
        raise DataValidationError("gpa", gpa, "must be in [0.0, 4.0]")

    name = record.get("name", "")
    if not isinstance(name, str) or not name.strip():
        raise DataValidationError("name", name, "must be a non-empty string")

Test against valid and invalid records. The custom exception prints exactly which field failed and why:

test_records: list[dict[str, object]] = [
    {"name": "Alice", "gpa": 3.95},  # valid
    {"name": "Bob", "gpa": 5.0},  # GPA out of range
    {"name": "", "gpa": 3.5},  # empty name
]

for rec in test_records:
    try:
        validate_student(rec)
        print(f"  {rec['name']!r:<10} -> valid")
    except DataValidationError as exc:
        print(f"  {rec.get('name')!r:<10} -> {exc}")

  'Alice'    -> valid
  'Bob'      -> Validation failed for 'gpa'=5.0: must be in [0.0, 4.0]
  ''         -> Validation failed for 'name'='': must be a non-empty string

Activity 3 - Safe Batch Parser

Goal: Write parse_batch(rows) that returns (valid, errors): a list of successfully parsed floats and a list of error messages.

rows = ['85.0', '92', 'n/a', '-5', '78.5', '110', '63']

valid, errors = parse_batch(rows)
# valid  = [85.0, 92.0, 78.5, 63.0]
# errors = ["'n/a' is not a valid number",
#           "Score -5.0 is out of range [0, 100]",
#           "Score 110.0 is out of range [0, 100]"]

def parse_batch(rows: list[str]) -> tuple[list[float], list[str]]:
    """Parse a batch of score strings, separating valid from invalid.

    Args:
        rows: List of raw score strings.

    Returns:
        Tuple of (valid_scores, error_messages).
    """
    valid: list[float] = []
    errors: list[str] = []
    # TODO: iterate rows, use parse_score from above, collect results
    return valid, errors


rows: list[str] = ["85.0", "92", "n/a", "-5", "78.5", "110", "63"]
valid, errors = parse_batch(rows)
print(f"valid  = {valid}")
print(f"errors = {errors}")

valid  = []
errors = []

7. File I/O with pathlib

File I/O (Input/Output) means reading data from files on disk and writing results back. Almost every data science workflow starts by loading a CSV, JSON, or Parquet file and ends by saving results somewhere.

pathlib.Path is the modern Python way to work with file paths. It is cross-platform (works on Windows, macOS, and Linux without changes) and composable:

from pathlib import Path

data_dir  = Path('tutorials') / 'data'        # / joins path parts
csv_file  = data_dir / 'students.csv'
print(csv_file)   # tutorials/data/students.csv

Key Concept: pathlib.Path, the Modern Way to Handle Paths

Since Python 3.4, pathlib.Path is the standard for file-system work. It is cross-platform, composable with /, and carries methods for existence checks, directory creation, and reading/writing, all in one object.

Always use with open(…) as fh (context manager) so the file is closed automatically, even if an exception occurs.

Common Mistake: Bare String Paths

open('data/file.csv') works but gives you no path-manipulation methods, and hand-built path strings with hard-coded separators are fragile across Windows and macOS/Linux. Use Path('data') / 'file.csv' instead.

from pathlib import Path

# Path composition: / operator joins parts, cross-platform
project_root: Path = Path()
data_dir: Path = project_root / "data"
output_file: Path = data_dir / "results" / "run_001.json"

print(f"data_dir    : {data_dir}")
print(f"output_file : {output_file}")

data_dir    : data
output_file : data/results/run_001.json

Every Path object knows its own parts: no string slicing to extract a filename or extension. mkdir(exist_ok=True) is the safest way to create a directory (no error if it already exists):

from pathlib import Path

# Path properties: inspect parts of a path without string slicing
p = Path("tutorials/03-data-analysis/data/primary.csv")
print(f"p.name   : {p.name}")  # 'primary.csv'
print(f"p.stem   : {p.stem}")  # 'primary'
print(f"p.suffix : {p.suffix}")  # '.csv'
print(f"p.parent : {p.parent}")  # 'tutorials/03-data-analysis/data'
print(f"p.parts  : {p.parts}")

# Safe directory creation: no error if it already exists
tmp_dir = Path("tmp_activity")
tmp_dir.mkdir(exist_ok=True)
print(f"\ntmp_dir exists: {tmp_dir.exists()}")

p.name   : primary.csv
p.stem   : primary
p.suffix : .csv
p.parent : tutorials/03-data-analysis/data
p.parts  : ('tutorials', '03-data-analysis', 'data', 'primary.csv')

tmp_dir exists: True
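
The same properties make it easy to derive new paths from an existing one. The sketch below is illustrative only: it reuses the example path from above and the output filenames are assumptions, not files in this project. `with_suffix()` and `with_name()` return new `Path` objects and never touch the disk.

```python
from pathlib import Path

# Hypothetical input path, reused from the example above; nothing is read from disk.
src = Path("tutorials/03-data-analysis/data/primary.csv")

# with_suffix swaps the extension; with_name swaps the whole filename.
as_parquet = src.with_suffix(".parquet")
summary = src.with_name(f"{src.stem}_summary.json")

print(as_parquet.name)  # primary.parquet
print(summary.name)     # primary_summary.json
print(summary.parent == src.parent)  # True: only the filename changed
```

Paths are immutable, so `src` itself is unchanged after both calls.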

Reading & Writing Files

Always use the with statement. It closes the file automatically, even if an exception occurs. DictWriter writes rows as dicts keyed by column name:

import csv
from pathlib import Path

tmp = Path("tmp_activity")
tmp.mkdir(exist_ok=True)

csv_path = tmp / "students.csv"
rows: list[dict[str, object]] = [
    {"name": "Alice Kamau", "gpa": 3.95, "major": "CS"},
    {"name": "Bob Mwangi", "gpa": 3.45, "major": "Math"},
    {"name": "Carol Osei", "gpa": 3.88, "major": "CS"},
]

with csv_path.open("w", newline="", encoding="utf-8") as fh:
    writer = csv.DictWriter(fh, fieldnames=["name", "gpa", "major"])
    writer.writeheader()
    writer.writerows(rows)

print(f"Wrote: {csv_path}")

Wrote: tmp_activity/students.csv

DictReader reads each row back as a dict keyed by header names, with no positional index access needed:

import csv
from pathlib import Path

csv_path = Path("tmp_activity") / "students.csv"

with csv_path.open(encoding="utf-8") as fh:
    reader = csv.DictReader(fh)
    loaded: list[dict[str, str]] = list(reader)

for row in loaded:
    print(f"  {row['name']:<15} GPA={row['gpa']}  {row['major']}")

  Alice Kamau     GPA=3.95  CS
  Bob Mwangi      GPA=3.45  Math
  Carol Osei      GPA=3.88  CS
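
Note the type annotation above: DictReader returns every value as a str, including the GPA. A minimal sketch of the conversion step, using an in-memory stand-in for students.csv (an assumption made so the cell runs without the file on disk):

```python
import csv
import io

# In-memory stand-in for students.csv so the sketch runs without touching disk.
raw = "name,gpa,major\nAlice Kamau,3.95,CS\nBob Mwangi,3.45,Math\n"

rows = list(csv.DictReader(io.StringIO(raw)))
print(type(rows[0]["gpa"]))  # <class 'str'>

# Convert explicitly before doing arithmetic on the values.
gpas = [float(row["gpa"]) for row in rows]
mean_gpa = sum(gpas) / len(gpas)
print(f"mean GPA = {mean_gpa:.2f}")  # mean GPA = 3.70
```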

For single-document JSON files, pairing Path.write_text() with json.dumps() and Path.read_text() with json.loads() gives the most concise round-trip:

import json
from pathlib import Path

tmp = Path("tmp_activity")
json_path = tmp / "run_result.json"
run_data = {"run_id": "exp-001", "accuracy": 0.923, "tags": ["baseline"]}

# Write: write_text is the cleanest one-liner for JSON
json_path.write_text(json.dumps(run_data, indent=2), encoding="utf-8")
print(f"Wrote: {json_path}")

# Read: read_text + json.loads
reloaded: dict[str, object] = json.loads(json_path.read_text(encoding="utf-8"))
print(f"Read back: {reloaded}")

Wrote: tmp_activity/run_result.json
Read back: {'run_id': 'exp-001', 'accuracy': 0.923, 'tags': ['baseline']}
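
One caveat worth knowing: the round-trip preserves values, not always Python types. The sketch below uses a hypothetical record and a temporary directory (both assumptions for illustration) to show a tuple coming back as a list, because JSON has only arrays.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical run record; the tuple is there to show what JSON does to it.
record = {"run_id": "exp-001", "split": (0.8, 0.2)}

with tempfile.TemporaryDirectory() as tmp_name:
    path = Path(tmp_name) / "record.json"
    path.write_text(json.dumps(record), encoding="utf-8")
    restored = json.loads(path.read_text(encoding="utf-8"))

print(restored)            # {'run_id': 'exp-001', 'split': [0.8, 0.2]}
print(restored == record)  # False: the tuple came back as a list
```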

Finding Files

Path.iterdir() yields the immediate children of a directory. Path.rglob(pattern) searches the entire subtree recursively:

from pathlib import Path

tmp = Path("tmp_activity")
print("Files in tmp_activity:")
for f in sorted(tmp.iterdir()):
    size = f.stat().st_size
    print(f"  {f.name:<30} {size:>6} bytes")

Files in tmp_activity:
  run_result.json                    78 bytes
  students.csv                       79 bytes

rglob('*.ipynb') finds all matching files at any depth. After exploring, clean up the temporary directory with shutil.rmtree():

from pathlib import Path
import shutil

# rglob: recursive search by pattern
notebooks = list(Path("tutorials").rglob("*.ipynb"))
print(f"Notebooks found: {len(notebooks)}")
for nb in sorted(notebooks)[:5]:
    print(f"  {nb}")

# Clean up tmp directory
tmp = Path("tmp_activity")
shutil.rmtree(tmp)
print(f"\nCleaned up: {tmp} exists = {tmp.exists()}")

Notebooks found: 0

Cleaned up: tmp_activity exists = False
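
To see the difference in search depth on a known layout, the sketch below builds a small throwaway tree in a temporary directory (the file and folder names are hypothetical). It also uses Path.glob(), the non-recursive sibling of rglob(), which is not covered above.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_name:
    root = Path(tmp_name)
    # Hypothetical layout: one notebook at the top level, one nested deeper.
    (root / "a.ipynb").write_text("{}", encoding="utf-8")
    nested = root / "part1" / "drafts"
    nested.mkdir(parents=True)
    (nested / "b.ipynb").write_text("{}", encoding="utf-8")

    children = sorted(p.name for p in root.iterdir())
    top_level = sorted(p.name for p in root.glob("*.ipynb"))
    any_depth = sorted(p.name for p in root.rglob("*.ipynb"))

print(children)   # ['a.ipynb', 'part1']
print(top_level)  # ['a.ipynb']
print(any_depth)  # ['a.ipynb', 'b.ipynb']
```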

Creating & Checking Directories

Path.mkdir() creates directories; Path.exists() and Path.is_dir() check state without raising an error. Always prefer mkdir(parents=True, exist_ok=True) over conditionally calling os.makedirs():

from pathlib import Path

results_dir = Path("results") / "experiment_001"
print(f"Exists before : {results_dir.exists()}")

# parents=True creates any missing parent directories
# exist_ok=True is silent if the directory already exists
results_dir.mkdir(parents=True, exist_ok=True)
print(f"Exists after  : {results_dir.exists()}")
print(f"Is directory  : {results_dir.is_dir()}")

# Write a file into the new directory
log_file = results_dir / "metrics.txt"
log_file.write_text("accuracy=0.923\nval_loss=0.218\n", encoding="utf-8")
print(f"Log file size : {log_file.stat().st_size} bytes")

# Clean up
log_file.unlink()
results_dir.rmdir()
results_dir.parent.rmdir()
print("Cleaned up")

Exists before : False
Exists after  : True
Is directory  : True
Log file size : 30 bytes
Cleaned up
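
It helps to see what each flag protects against. The following sketch, run inside a temporary directory so it leaves nothing behind, triggers both failure modes on purpose; the variable names are illustrative.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_name:
    base = Path(tmp_name)
    target = base / "results" / "experiment_001"

    # Without parents=True, the missing "results" directory is an error.
    try:
        target.mkdir()
        missing_parent_error = None
    except FileNotFoundError as exc:
        missing_parent_error = type(exc).__name__

    target.mkdir(parents=True)

    # Without exist_ok=True, creating it a second time is an error.
    try:
        target.mkdir()
        already_exists_error = None
    except FileExistsError as exc:
        already_exists_error = type(exc).__name__

    # With both flags the call is safe to repeat.
    target.mkdir(parents=True, exist_ok=True)
    still_there = target.is_dir()

print(missing_parent_error)  # FileNotFoundError
print(already_exists_error)  # FileExistsError
print(still_there)           # True
```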

Activity 4 - Experiment Logger

Goal: Write a function that appends an experiment result as a JSON line to a log file, then reads and prints all logged runs.

log_experiment(Path('runs.jsonl'), run_id='run-001', accuracy=0.901, loss=0.312)
log_experiment(Path('runs.jsonl'), run_id='run-002', accuracy=0.923, loss=0.218)

# runs.jsonl contents:
# {"run_id": "run-001", "accuracy": 0.901, "loss": 0.312}
# {"run_id": "run-002", "accuracy": 0.923, "loss": 0.218}

Hint: JSONL (JSON Lines), one JSON object per line, is the standard format for streaming experiment logs. Use mode='a' to append.
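
If append mode is new to you, this small sketch shows its behaviour with plain text lines rather than JSON, so it does not give away the activity. The file name is hypothetical and lives in a temporary directory.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_name:
    notes = Path(tmp_name) / "notes.log"  # hypothetical file name

    # "a" creates the file on first use and adds to the end afterwards.
    for message in ["first", "second"]:
        with notes.open("a", encoding="utf-8") as fh:
            fh.write(message + "\n")

    # "w" would have truncated the file each time, leaving only "second".
    lines = notes.read_text(encoding="utf-8").splitlines()

print(lines)  # ['first', 'second']
```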

import json
from pathlib import Path


def log_experiment(log_path: Path, **metrics: object) -> None:
    """Append an experiment result as a JSON line to log_path.

    Args:
        log_path: Path to the .jsonl log file (created if absent).
        **metrics: Any metric name/value pairs to record.
    """
    ...  # TODO


log_path = Path("runs.jsonl")
if log_path.exists():
    log_path.unlink()  # start fresh for this activity

log_experiment(log_path, run_id="run-001", accuracy=0.901, loss=0.312)
log_experiment(log_path, run_id="run-002", accuracy=0.923, loss=0.218)

# Read back and print all runs (prints nothing until log_experiment is implemented)
print("Logged runs:")
if log_path.exists():
    for line in log_path.read_text(encoding="utf-8").splitlines():
        run = json.loads(line)
        print(f"  {run}")

log_path.unlink(missing_ok=True)  # clean up

Logged runs:

Why study gotchas?

The bugs in this section are silent: they do not raise an error. Python happily runs the code and produces the wrong answer. These patterns appear in real data pipelines and ML training scripts and can waste hours of debugging time.

Read through them now and you will recognise them instantly in the wild.

8. Common Gotchas

Key Concept: Bugs That Are Hard to See

None of the patterns below raises an exception; each one silently produces a wrong or surprising result.

# GOTCHA 1: Mutable default argument
# The default [] is created ONCE at function definition time - shared across all calls!
def append_score_bad(score: float, history: list[float] = []) -> list[float]:  # noqa: B006
    history.append(score)
    return history


print("Bad default: the list leaks between calls:")
print(append_score_bad(82.0))  # [82.0]          expected
print(append_score_bad(91.0))  # [82.0, 91.0]    WRONG: previous call leaked in!

Bad default: the list leaks between calls:
[82.0]
[82.0, 91.0]

The fix: use None as the default sentinel and create a fresh list inside the function body on each call:

def append_score(score: float, history: list[float] | None = None) -> list[float]:
    if history is None:
        history = []  # new list created on every call where history is not provided
    history.append(score)
    return history


print("Fixed: independent list each time:")
print(append_score(82.0))  # [82.0]
print(append_score(91.0))  # [91.0]   fresh list

# Rule: never use a mutable object (list, dict, set) as a default argument value.
# With @dataclass use field(default_factory=list) instead (shown in Sec. 4).

Fixed: independent list each time:
[82.0]
[91.0]
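
You can watch the shared default change by inspecting the function object. This is a sketch with a hypothetical function, collect_bad, that has the same bug as append_score_bad; `__defaults__` is the tuple where Python stores default values at definition time.

```python
def collect_bad(item: int, bucket: list[int] = []) -> list[int]:  # noqa: B006
    """Hypothetical function with the same bug as append_score_bad."""
    bucket.append(item)
    return bucket


# The default list lives on the function object itself.
print(collect_bad.__defaults__)  # ([],)

first = collect_bad(1)
second = collect_bad(2)

print(collect_bad.__defaults__)  # ([1, 2],)
print(first is second)           # True: both calls returned the same list
```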

Gotcha 2: assignment is not a copy. b = a creates a second name for the same list. Shallow .copy() creates a new outer container but inner objects are still shared. Use copy.deepcopy() for fully independent nested structures:

# GOTCHA 2: Assignment is NOT a copy
# For nested structures, .copy() is a SHALLOW copy: inner objects are still shared.

import copy

original: list[list[int]] = [[1, 2], [3, 4]]

ref = original  # same object
shallow_copy = original.copy()  # new outer list, shared inner lists
deep_copy = copy.deepcopy(original)  # completely independent

original[0].append(99)
print(f"original     : {original}")  # [[1, 2, 99], [3, 4]]
print(f"ref          : {ref}")  # [[1, 2, 99], [3, 4]] : same object
print(f"shallow_copy : {shallow_copy}")  # [[1, 2, 99], [3, 4]] : inner list shared!
print(f"deep_copy    : {deep_copy}")  # [[1, 2], [3, 4]]     : fully independent

original     : [[1, 2, 99], [3, 4]]
ref          : [[1, 2, 99], [3, 4]]
shallow_copy : [[1, 2, 99], [3, 4]]
deep_copy    : [[1, 2], [3, 4]]
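
The same rules apply to nested dicts, which is where this tends to bite in configuration code. The sketch below uses a hypothetical config and the `is` operator to show exactly which objects are shared.

```python
import copy

# Hypothetical nested config, the kind of structure where this bites in practice.
base = {"model": "xgboost", "params": {"max_depth": 6}}

alias = base
shallow = base.copy()
deep = copy.deepcopy(base)

print(alias is base)                        # True: same object
print(shallow is base)                      # False: new outer dict
print(shallow["params"] is base["params"])  # True: inner dict still shared
print(deep["params"] is base["params"])     # False: fully independent

# Changing the shallow copy's nested value also changes the original.
shallow["params"]["max_depth"] = 3
print(base["params"]["max_depth"])  # 3
print(deep["params"]["max_depth"])  # 6
```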

Gotcha 3: / vs //. Both divide, but / always returns a float, while // rounds down toward negative infinity rather than truncating toward zero. Note that // on floats returns a float, not an int:

# GOTCHA 3: / vs //: easy to confuse
print(f"7 / 2   = {7 / 2}")  # 3.5  : true division, always float
print(f"7 // 2  = {7 // 2}")  # 3    : floor, NOT truncate
print(f"-7 // 2 = {-7 // 2}")  # -4   : floors toward negative infinity
print(f"7.5//2  = {7.5 // 2}")  # 3.0  : floor of float is still float

7 / 2   = 3.5
7 // 2  = 3
-7 // 2 = -4
7.5//2  = 3.0
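
The difference between flooring and truncating only shows up with negative numbers. A short side-by-side sketch using only built-ins and the math module:

```python
import math

# Truncation drops the fractional part (toward zero); floor always goes down.
print(int(-7 / 2))       # -3
print(math.trunc(-3.5))  # -3
print(-7 // 2)           # -4
print(math.floor(-3.5))  # -4

# // and % are defined together: a == (a // b) * b + (a % b)
quotient, remainder = divmod(-7, 2)
print(quotient, remainder)  # -4 1
```

If you need truncation toward zero, use int() or math.trunc() on the result of true division rather than //.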

Gotchas 4 & 5: {} is an empty dict (not a set), and truthiness is not None-ness. Python treats 0, 0.0, None, [], and '' all as falsy, which silently breaks the common value or default pattern when 0.0 is a legitimate result:

# GOTCHA 4: {} creates a dict, not a set
empty1 = {}
empty2 = set()
print(f"type({{}})   : {type(empty1)}")
print(f"type(set()) : {type(empty2)}")

# GOTCHA 5: Boolean short-circuit: 0.0, None, '', [] are all falsy
score: float | None = None
result = score or 0.0  # 0.0 here, but "or" cannot tell a missing score from a legitimate 0.0
print(f'\n0.0 or "default" : {0.0 or "default"}')  # noqa: SIM222

# Prefer an explicit None check
result2 = score if score is not None else 0.0
print(f"Explicit check   : {result2}")

type({})   : <class 'dict'>
type(set()) : <class 'set'>

0.0 or "default" : default
Explicit check   : 0.0

9. Capstone Exercises

Apply everything from Parts 1-3. Each exercise is self-contained. Attempt them without looking at previous sections first.

Exercise 1 - Student Report Generator

Goal: Given a list of student dicts, produce a formatted text report.

students = [
    {'name': 'Alice', 'scores': [88, 92, 85], 'major': 'CS'},
    {'name': 'Bob',   'scores': [62, 70, 58], 'major': 'Math'},
    {'name': 'Carol', 'scores': [91, 95, 89], 'major': 'CS'},
]

# Expected output:
# Name      Major    Avg    Grade
# Carol     CS       91.7   A
# Alice     CS       88.3   B
# Bob       Math     63.3   D
# (sorted by average score, descending)

students: list[dict[str, object]] = [
    {"name": "Alice", "scores": [88, 92, 85], "major": "CS"},
    {"name": "Bob", "scores": [62, 70, 58], "major": "Math"},
    {"name": "Carol", "scores": [91, 95, 89], "major": "CS"},
]

# TODO: produce the formatted report
...

Ellipsis

Exercise 2 - Experiment Tracker Dataclass

Goal: Define an Experiment dataclass, populate a list of runs, then print the best run by validation accuracy.

@dataclass
class Experiment:
    run_id: str
    model: str
    val_accuracy: float
    config: TrainingConfig   # from Sec. 4

# Expected output:
# Best run: Experiment(run_id='run-002', model='xgboost', val_accuracy=0.934, ...)

from dataclasses import dataclass


@dataclass
class Experiment:
    """Record for a single training experiment."""

    run_id: str
    model: str
    val_accuracy: float
    # TODO: add more fields as needed


runs: list[Experiment] = [
    Experiment("run-001", "xgboost", 0.901),
    Experiment("run-002", "xgboost", 0.934),
    Experiment("run-003", "linear", 0.881),
]

# TODO: find and print the best run
best: Experiment = ...
print(f"Best run: {best}")

Best run: Ellipsis

Exercise 3 - Rolling Anomaly Detector

Goal: Using a deque of size 5, flag any reading that deviates more than 2 standard deviations from the window mean.

readings = [36.5, 36.7, 36.8, 36.6, 36.9, 39.5, 36.7, 36.8]

# Expected: reading 39.5 flagged as anomaly

Hint: Use score_summary() from Sec. 1 (or inline the calculation).
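
As a reminder of how a bounded deque behaves, here is a small sketch with a hypothetical window of size 3. It shows only the sliding-window mechanics, not the anomaly check itself.

```python
from collections import deque

recent: deque[int] = deque(maxlen=3)  # hypothetical window of size 3

for value in [1, 2, 3, 4, 5]:
    recent.append(value)
    print(list(recent))

# [1]
# [1, 2]
# [1, 2, 3]
# [2, 3, 4]   oldest value dropped automatically
# [3, 4, 5]
```

Whether a new reading is compared against the window before or after it is appended is a design choice the exercise leaves open; comparing first keeps an outlier from inflating its own baseline.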

from collections import deque

readings: list[float] = [36.5, 36.7, 36.8, 36.6, 36.9, 39.5, 36.7, 36.8]
WINDOW_SIZE: int = 5
THRESHOLD_STD: float = 2.0

window: deque[float] = deque(maxlen=WINDOW_SIZE)

# TODO: detect and print anomalies
for _reading in readings:
    ...

Exercise 4 - Moving Window Average

A raw metric stream (sensor readings, training loss per step) can be noisy. A moving window average smooths it by replacing each value with the average of its neighbours within a fixed window.

Task: Implement moving_window_average(x, n_neighbors):

For each element, average the n_neighbors elements on each side plus the element itself
At the edges, pad with the boundary value (repeat the first/last element as needed)
Return a list the same length as the input

losses = [0.95, 0.82, 0.91, 0.78, 0.65, 0.70, 0.60, 0.55]
moving_window_average(losses, n_neighbors=1)
# window of 3: each value = mean of itself + 1 left + 1 right
# → [0.907, 0.893, 0.837, 0.780, 0.710, 0.650, 0.617, 0.567]

Then: compute the average for n_neighbors in 1–4 and print the range (max − min) of each smoothed list. Does the range shrink as the window grows? Why?

def moving_window_average(x: list[float], n_neighbors: int = 1) -> list[float]:
    """Replace each value with the mean of itself and n_neighbors values on each side.

    Args:
        x:           Input list of floats.
        n_neighbors: Number of neighbours on each side to include.

    Returns:
        Smoothed list of the same length as x.
    """
    if not x:
        return []  # nothing to smooth; also avoids indexing x[0] below
    n = len(x)
    width = n_neighbors * 2 + 1
    # Pad: repeat first/last element for missing neighbours at edges
    padded = [x[0]] * n_neighbors + x + [x[-1]] * n_neighbors
    # Each window starts at index i of the padded list and is centred on x[i]
    return [sum(padded[i : i + width]) / width for i in range(n)]


training_loss: list[float] = [0.95, 0.82, 0.91, 0.78, 0.65, 0.70, 0.60, 0.55]
print("Original :", training_loss)

for k in range(1, 5):
    smoothed = moving_window_average(training_loss, n_neighbors=k)
    rng = round(max(smoothed) - min(smoothed), 4)
    print(f"  n={k}: range={rng}  {[round(v, 3) for v in smoothed]}")

Original : [0.95, 0.82, 0.91, 0.78, 0.65, 0.7, 0.6, 0.55]
  n=1: range=0.34  [0.907, 0.893, 0.837, 0.78, 0.71, 0.65, 0.617, 0.567]
  n=2: range=0.326  [0.916, 0.882, 0.822, 0.772, 0.728, 0.656, 0.61, 0.59]
  n=3: range=0.3086  [0.901, 0.859, 0.823, 0.773, 0.716, 0.677, 0.626, 0.593]
  n=4: range=0.27  [0.879, 0.851, 0.812, 0.768, 0.723, 0.679, 0.649, 0.609]

Resource	Why it matters
PEP 484 — Type Hints	The original proposal; reading it explains why certain type annotation rules exist
Google Python Style Guide	The docstring format used throughout this notebook comes from here
Van Rossum, G., Warsaw, B. & Coghlan, N. (2001). PEP 8 — Style Guide for Python Code.	The canonical style reference for all Python code
Hunt, A. & Thomas, D. (1999). The Pragmatic Programmer.	Chapters on DRY, orthogonality, and design-by-contract translate directly to the patterns in this notebook

Summary

Concept	Key rule
Functions	Annotate all params and return types; use Google-style docstrings
Defaults	Never use a mutable object as a default; use `None` and check inside
Lambda	Use for short key functions: `sorted(items, key=lambda x: x['score'])`
`*args`	Collect variable positional args into a tuple; unpack a list with `*list`
`**kwargs`	Collect variable keyword args into a dict; unpack a dict with `**dict`
`@dataclass`	Generated `__init__`, `__repr__`, `__eq__`; use `frozen=True` for immutability
`field(default_factory=list)`	The correct way to give a dataclass a mutable default
Imports	`import module` over `from module import *`; use conventional aliases
Exceptions	`except SpecificError` not bare `except`; use `else` for success path, `finally` for cleanup
`pathlib.Path`	Cross-platform paths; compose with `/`; read with `.read_text()`, write with `.write_text()`
Context manager	Always use `with open(...) as fh`; the file closes automatically
Gotchas	Mutable defaults, `=` is not copy, `//` floors, `{}` is dict

Next: 04-numpy.ipynb, covering arrays, broadcasting, and vectorised operations with NumPy.