Part 14: Code Quality with ruff


DS-MLOps Dev Tools

Python 3.12+ | Author: Anthony Faustine

Before you begin

This chapter assumes you have completed Part 13: Project Setup with uv. The grade-predictor project you built there is the codebase ruff will run on. If you are starting here, create the project with uv init grade-predictor --package and add pandas>=2.1 as a dependency before continuing.

Topics covered

Topic	Why it matters
Linting vs formatting	Two separate concerns: catching bugs vs enforcing style
`ruff check`	Over 800 built-in rules; typically 10-100x faster than Flake8
`ruff format`	Opinionated formatter compatible with Black
`pyproject.toml` configuration	Rule selection and per-file ignores without a separate config file
Noqa comments	Suppress a specific rule on one line without disabling it globally
CI and pre-commit integration	Fail the build on lint errors; run on every commit automatically

Callout markers used throughout this chapter are explained on the book cover page.

Learning Objectives

By the end of Part 14 you will be able to:

#	Skill	Covered in
1	Explain the difference between a linter and a formatter	Sec. 1
2	Run `ruff check` and interpret its output line by line	Sec. 2
3	Run `ruff format` and understand when to run it	Sec. 3
4	Configure ruff in `pyproject.toml` with a ruleset appropriate for DS code	Sec. 4
5	Identify the lint rules that catch the most common silent bugs in DS code	Sec. 5
6	Write Google-style docstrings and enforce them with ruff’s D ruleset	Sec. 6
7	Use `# noqa` and per-file ignores correctly	Sec. 7

1. Two Different Jobs

Ruff does two separate things, and it is worth being clear about which one you are running.

A formatter rewrites the layout of your code: indentation, line length, quote style, trailing commas. It makes no judgements about correctness. It does not care whether a variable is used or a loop is redundant. It just makes the code look consistent.

A linter reads the code and flags patterns that are wrong, suspicious, or likely to cause bugs. It cannot safely fix everything it finds, because many fixes require understanding intent. It tells you “this variable is never used” and lets you decide whether to delete it or wire it into the code that was meant to use it.

Ruff handles both. They are separate commands with separate jobs: ruff format for layout, ruff check for correctness. Running format first is the usual order: get the code looking clean, then read what the linter flags.

flowchart LR
    A[source code] --> B["ruff format\nindentation, quotes,\nline length"]
    B --> C[formatted code]
    C --> D["ruff check\nAST rules: unused imports,\nmutable defaults, etc."]
    D --> E{findings?}
    E -->|"--fix: auto-fixable"| F["ruff applies fix\nre-stage the file"]
    F --> D
    E -->|"none"| G["ready to commit"]

    style G fill:#EBF5F0,stroke:#059669,color:#065F46
    style B fill:#F5F3FF,stroke:#7C3AED,color:#3B0764
    style D fill:#EAF3FA,stroke:#0369A1,color:#0C4A6E

Key Concept: Formatting is about style. Linting is about correctness.

A formatter will never notice a mutable default argument or a bare except: that swallows every error. A linter will. Use both: format automatically on every save, run the linter before every commit.
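A minimal sketch of that split, using a hypothetical mean_score helper that is not part of grade-predictor: the code below is laid out exactly as a formatter would leave it, yet the bare except hides a misspelled key. Only a lint rule such as E722 points at the problem.

```python
def mean_score(records):
    """Return the mean of the "score" field, or None if it cannot be computed."""
    try:
        # Typo: the key is "score", not "scroe". The bare except hides the KeyError.
        return sum(r["scroe"] for r in records) / len(records)
    except:  # noqa: E722  (kept on purpose for this demonstration)
        return None


records = [{"score": 70}, {"score": 90}]
print(mean_score(records))  # prints None instead of raising KeyError
```

Formatting this file changes nothing; the silent None only goes away once the except clause names the exception it actually expects.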

2. Running `ruff check`

Start with a deliberately messy core.py that contains several common issues:

# src/grade_predictor/core.py  (intentionally broken version for this section)
import os
import sys
from pathlib import Path

def compute_grade(midterm, final, project, weights=[0.30, 0.45, 0.25]):
    total = midterm * weights[0] + final * weights[1] + project * weights[2]
    unused_var = "this is never read"
    return total

def flag_at_risk(df, threshold=50.0):
    try:
        result = df["average_marks"] < threshold
    except:
        result = None
    return result

Run ruff check on it:

uv run ruff check src/

The output looks like:

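src/grade_predictor/core.py:1:8: F401 [*] `os` imported but unused
src/grade_predictor/core.py:2:8: F401 [*] `sys` imported but unused
src/grade_predictor/core.py:3:21: F401 [*] `pathlib.Path` imported but unused
src/grade_predictor/core.py:5:52: B006 Do not use mutable data structures for argument defaults
src/grade_predictor/core.py:7:5: F841 [*] Local variable `unused_var` is assigned to but never used
src/grade_predictor/core.py:13:5: E722 Do not use bare `except`
Found 6 errors.
[*] 4 fixable with the `--fix` option.

Reading the output: each line is file:line:col: CODE message, with line numbers counted from import os (the path comment above the listing is not part of the file). The [*] marker means ruff can fix the finding automatically. F401 and F841 come from Pyflakes, B006 from flake8-bugbear, and E722 from pycodestyle. Two caveats: B006 is reported only once the B group is selected (Section 4), and recent ruff releases print a source snippet under each finding unless you pass --output-format concise. Some fixes, F841 among them, may be classified as unsafe and applied only with --unsafe-fixes, so the fixable count on your machine can differ.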

Auto-fix what can be fixed automatically, then inspect the rest:

uv run ruff check src/ --fix
uv run ruff check src/

B006 (mutable default argument) and E722 (bare except) need manual fixes. The version below also adds type hints and drops the unused variable:

def compute_grade(
    midterm: float,
    final: float,
    project: float,
    weights: tuple[float, float, float] = (0.30, 0.45, 0.25),  # tuple, not list
) -> float:
    return midterm * weights[0] + final * weights[1] + project * weights[2]

def flag_at_risk(df, threshold: float = 50.0):
    try:
        return df["average_marks"] < threshold
    except KeyError:                 # specific exception, not bare except
        return None

Activity 1 - Read and Fix ruff Output

Goal: Add an unused import, a bare except:, and a variable assigned but never used to your core.py. Run ruff check src/ and read every line of output. Fix each one manually (do not use --fix for this exercise). Confirm ruff check src/ returns zero findings.

uv run ruff check src/
# Fix each finding, then re-run
uv run ruff check src/

3. Running `ruff format`

ruff format rewrites the layout of every Python file it touches. It is opinionated by design and exposes only a handful of settings, such as line length, quote style, and indent style. The result is that two developers with different editor preferences produce identically formatted files, so diffs contain only real changes.

uv run ruff format src/              # format in place
uv run ruff format src/ --check      # exit 1 if anything would change (CI mode)
uv run ruff format src/ --diff       # show what would change without writing it

The --check flag is what CI uses. If any file needs reformatting, the command exits with a non-zero status and the build fails. The developer runs ruff format . locally, stages the changes, and pushes again.
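The CI behaviour described above can be sketched as a tiny gate script. This is one possible shape, not a prescribed setup: GATE_COMMANDS, run_command, and quality_gate are hypothetical names, and the runner is injectable so the logic can be exercised without ruff installed.

```python
import subprocess
from collections.abc import Callable, Sequence

# The two commands a CI job would typically run; the "." path is an assumption.
GATE_COMMANDS: tuple[tuple[str, ...], ...] = (
    ("ruff", "format", ".", "--check"),
    ("ruff", "check", "."),
)


def run_command(command: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(command), check=False).returncode


def quality_gate(
    commands: Sequence[Sequence[str]] = GATE_COMMANDS,
    runner: Callable[[Sequence[str]], int] = run_command,
) -> int:
    """Run every command (no short-circuit) and return 1 if any of them failed."""
    exit_codes = [runner(command) for command in commands]
    return 1 if any(code != 0 for code in exit_codes) else 0
```

A CI step could end with raise SystemExit(quality_gate()). Running both commands before deciding means one push reveals formatting and lint failures together.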

Before format:

def compute_grade(midterm,final,project,weights=(0.30,0.45,0.25)):
    return midterm*weights[0]+final*weights[1]+project*weights[2]

After format:

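def compute_grade(midterm, final, project, weights=(0.30, 0.45, 0.25)):
    return midterm * weights[0] + final * weights[1] + project * weights[2]

Both lines fit within the line length, so ruff only normalizes the spacing. It splits arguments one per line when the signature is too long to fit or when the source already ends the argument list with a trailing comma.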

Pro Tip: Set your editor to run ruff format on save

In VS Code, add "editor.formatOnSave": true and "[python]": {"editor.defaultFormatter": "charliermarsh.ruff"} to settings.json. After that, every save produces a correctly formatted file. CI’s ruff format --check becomes a safety net for the rare case where the editor did not run it, not a first-pass fix.

4. Configuring ruff in `pyproject.toml`

Without configuration, ruff enables only the F rules (Pyflakes) and a subset of the E rules (pycodestyle). For DS work, several more rule groups pay for themselves:

[tool.ruff]
target-version = "py312"
line-length = 100

[tool.ruff.lint]
select = [
    "E",   # pycodestyle errors
    "F",   # pyflakes (unused imports, undefined names)
    "I",   # isort (import order)
    "B",   # bugbear (real bugs, not just style)
    "N",   # pep8 naming
    "UP",  # pyupgrade (modernize syntax automatically)
    "SIM", # simplify (cleaner expressions)
]
ignore = [
    "E402",  # module-level import not at top: notebooks need this
]

[tool.ruff.lint.per-file-ignores]
"tests/**" = ["S101"]          # assert is expected in tests
"**/*.ipynb" = ["E501", "F811"] # notebooks: long lines and re-defined names are ok

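The per-file-ignores section is where you handle legitimate exceptions. Notebooks have E501 suppressed because code cells often contain long one-off lines, and F811 because names are routinely redefined across cells. Tests have S101 suppressed because assert is the whole point; that entry only takes effect once the S (flake8-bandit) group is selected, so it is harmless here.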

Activity 2 - Configure and Run

Goal: Add the configuration above to your grade-predictor/pyproject.toml. Run ruff check src/ and fix every finding without using # noqa. Then run ruff format src/ and commit the clean state.

uv run ruff check src/ --fix
uv run ruff check src/          # should be zero findings
uv run ruff format src/

5. The Rules That Catch Real DS Bugs

The E and F rules catch syntax and unused-import issues. The rules that catch silent data bugs in DS code come from bugbear (B) and simplify (SIM).

B006: Mutable default argument

This is the most common silent bug in DS code:

# Wrong: the same list is shared across every call
def aggregate(df, group_cols=["program", "semester"]):
    return df.groupby(group_cols).mean()

# Right: use a tuple (immutable) or None
def aggregate(df, group_cols: tuple[str, ...] = ("program", "semester")):
    return df.groupby(list(group_cols)).mean()

The wrong version works fine until someone calls aggregate(df) and the default list gets mutated somewhere in the call chain. The bug is silent and the output is wrong.

B023: Function definition does not bind loop variable

A common pattern in ML training callbacks:

# Wrong: all callbacks capture the same (final) value of threshold
callbacks = []
for threshold in [0.5, 0.6, 0.7]:
    callbacks.append(lambda df: df[df["score"] > threshold])  # B023

# Right: capture the value at definition time
callbacks = [lambda df, t=t: df[df["score"] > t] for t in [0.5, 0.6, 0.7]]
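The difference is easy to observe without pandas. This sketch swaps the DataFrame for a plain list of scores (an assumption made to keep it self-contained) and counts how many values each callback keeps.

```python
scores = [0.45, 0.55, 0.65, 0.75]
thresholds = [0.5, 0.6, 0.7]

# Late binding: every lambda looks up `threshold` when called, after the loop ended.
late = []
for threshold in thresholds:
    late.append(lambda values: [v for v in values if v > threshold])  # noqa: B023

# Early binding: the default argument freezes the value at definition time.
early = [lambda values, t=t: [v for v in values if v > t] for t in thresholds]

print([len(f(scores)) for f in late])   # [1, 1, 1]
print([len(f(scores)) for f in early])  # [3, 2, 1]
```

All three late-bound callbacks filter with 0.7, the last value the loop variable held, which is why they return identical results.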

F841: Assigned but never used

A dead intermediate variable in a pipeline is often a sign of a forgotten transformation. Ruff reports F841 for local variables, so the pattern is caught inside functions:

def course_means(df):
    filtered = df[df["passed"]]         # F841: assigned but never used
    return df.groupby("course").mean()  # bug: groups df, not filtered

SIM108: Use ternary expression

# Before
if score >= 50:
    label = "Pass"
else:
    label = "Fail"

# After (ruff can apply this rewrite; depending on the version it may need --unsafe-fixes)
label = "Pass" if score >= 50 else "Fail"

Common Mistake: Mutable default argument

def split(df, cols=[]) shares the same list object across every call. The second call gets whatever the first call left in the list. This exact pattern appears in DS code regularly because it looks like a harmless convenience. Use an empty tuple () or None with a guard inside the function.
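A short sketch of both the leak and the None guard, using hypothetical add_id_column helpers rather than the split function named above:

```python
def add_id_column_buggy(cols=[]):  # noqa: B006  (bug kept on purpose)
    """Append "student_id" to cols; the default list is shared across calls."""
    cols.append("student_id")
    return cols


def add_id_column(cols=None):
    """Same behavior, but a fresh list is created on every call."""
    cols = [] if cols is None else list(cols)  # copy so the caller's list is untouched
    cols.append("student_id")
    return cols


print(add_id_column_buggy())  # ['student_id']
print(add_id_column_buggy())  # ['student_id', 'student_id']  <- state leaked
print(add_id_column())        # ['student_id']
print(add_id_column())        # ['student_id']
```

The default list is created once, when the def statement runs, so every call without an argument appends to the same object. The guarded version builds its list inside the function body instead.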

6. Docstrings: Style, Content, and Enforcement

A docstring is the first string literal inside a function, class, or module. It is not a comment; it is data that Python exposes through help(), IDEs, and documentation generators. The rule for when to write one: if a function is called from outside the file it lives in, it needs a docstring.
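The "data, not a comment" point can be checked directly. The is_at_risk function below is a simplified, hypothetical stand-in for the chapter's flag_at_risk, working on a single number instead of a DataFrame.

```python
import inspect


def is_at_risk(average_marks: float, threshold: float = 50.0) -> bool:
    """Return True when a student's average is below the threshold."""
    # This comment is discarded at compile time; the docstring above is kept.
    return average_marks < threshold


print(is_at_risk.__doc__)          # the raw string stored on the function object
print(inspect.getdoc(is_at_risk))  # same text with indentation normalized
```

help(), IDE tooltips, and documentation generators all read this same attribute, which is why a docstring is worth more care than an inline comment.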

Google style is a common choice for DS projects (NumPy style is the main alternative) and the one this chapter enforces:

def compute_grade(
    midterm: float,
    final: float,
    project: float,
    weights: tuple[float, float, float] = (0.30, 0.45, 0.25),
) -> float:
    """Compute a weighted average grade from three component scores.

    Args:
        midterm: Score from 0 to 100.
        final: Score from 0 to 100.
        project: Score from 0 to 100.
        weights: Three weights that must sum to 1.0. Order is (midterm, final, project).

    Returns:
        Weighted average score from 0 to 100.

    Raises:
        ValueError: If weights do not sum to 1.0 within a tolerance of 1e-6.

    Example:
        >>> compute_grade(80, 90, 75)
        83.25
    """
    if abs(sum(weights) - 1.0) > 1e-6:
        raise ValueError(f"Weights must sum to 1.0, got {sum(weights)}")
    return midterm * weights[0] + final * weights[1] + project * weights[2]

The structure: one-line summary, blank line, then named sections. The sections you actually need depend on the function: Args and Returns for anything non-trivial, Raises when exceptions are part of the contract, Example for functions that need a concrete demo to be understood.
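An Example section written in doctest form can be executed, which keeps the documented value honest. The sketch below uses a trimmed copy of compute_grade (Args and Raises omitted for brevity) and a hypothetical failed_examples helper built on the standard doctest module. With the default weights, 80, 90, and 75 give 24 + 40.5 + 18.75.

```python
import doctest


def compute_grade(midterm, final, project, weights=(0.30, 0.45, 0.25)):
    """Compute a weighted average grade from three component scores.

    Example:
        >>> compute_grade(80, 90, 75)
        83.25
    """
    return midterm * weights[0] + final * weights[1] + project * weights[2]


def failed_examples(func) -> int:
    """Run the docstring examples of one function and return the failure count."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    for test in finder.find(func, name=func.__name__, globs={func.__name__: func}):
        runner.run(test)
    return runner.summarize(verbose=False).failed


print(failed_examples(compute_grade))  # 0 when every example matches
```

In a project the same check usually runs through python -m doctest or a test runner's doctest integration rather than a hand-written helper.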

Key Concept: Write for the caller, not the implementer

A docstring answers: “what does this function do from the outside?” It does not explain how it works internally. """Compute weighted average grade""" is useful. """Multiply midterm by weights[0] and add to final times weights[1]""" is not: it describes the implementation, which the caller can already read. Write what a caller needs to know to use the function correctly: inputs, outputs, edge cases.

What counts as a module-level docstring?

A module docstring sits at the very top of the file, before any imports:

"""Grade predictor: core grade computation functions.

This module provides the core logic for computing student grades,
assigning letter grades, and flagging at-risk students.
"""
import pandas as pd

Enforcing docstrings with ruff

Ruff’s D rule group (pydocstyle) checks docstring conventions. Add it to pyproject.toml:

[tool.ruff.lint]
select = [
    "E", "F", "I", "B", "N", "UP", "SIM",
    "D",   # pydocstyle: docstring conventions
]

[tool.ruff.lint.pydocstyle]
convention = "google"   # enforce Google-style structure

[tool.ruff.lint.per-file-ignores]
"tests/**"    = ["S101", "D"]         # tests do not need docstrings
"**/*.ipynb"  = ["E501", "F811", "D"] # notebooks: docstrings not required

The key rules:

- D100: missing module docstring
- D101/D102/D103: missing class/method/function docstring
- D205: blank line required after first-line summary
- D417: missing argument descriptions in the docstring

Pro Tip: Enforce docstrings only on public functions

Functions prefixed with _ are internal by convention. Ruff’s missing-docstring rules (D100 to D107) apply only to public modules, classes, and functions, so underscore-prefixed helpers are skipped whichever convention is set. This means you only have to document the API surface: the functions a caller is expected to use. Internal helpers stay undocumented until they grow complex enough to warrant it.
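The public-versus-private distinction can be illustrated with the standard ast module. This is a rough approximation of the idea behind D103, not ruff's implementation; the sample source, including the bodies of grade_to_letter and _clip, is made up for the demonstration.

```python
import ast

SOURCE = '''
def compute_grade(midterm, final, project):
    """Compute a weighted average grade."""
    return 0.30 * midterm + 0.45 * final + 0.25 * project


def grade_to_letter(score):
    return "Pass" if score >= 50 else "Fail"


def _clip(score):
    return max(0.0, min(100.0, score))
'''


def public_functions_missing_docstrings(source: str) -> list[str]:
    """Return names of top-level public functions that have no docstring."""
    tree = ast.parse(source)
    return [
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and not node.name.startswith("_")
        and ast.get_docstring(node) is None
    ]


print(public_functions_missing_docstrings(SOURCE))  # ['grade_to_letter']
```

_clip has no docstring either, but the leading underscore keeps it out of the report, which mirrors the behaviour described above.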

Enforcing spell-checking with codespell

Docstrings and comments accumulate typos. codespell catches them before they reach the repository. Add it to .pre-commit-config.yaml:

- repo: https://github.com/codespell-project/codespell
  rev: v2.3.0
  hooks:
    - id: codespell
      args: [--write-changes]
      additional_dependencies: [tomli]

With --write-changes, codespell fixes unambiguous typos automatically. Ambiguous corrections are left for you to resolve. For project-specific words that codespell does not recognise (library names, abbreviations), add them to pyproject.toml:

[tool.codespell]
skip = "*.ipynb,uv.lock,_freeze"
ignore-words-list = "caste,midterms"

Activity 3 - Write and Enforce Docstrings

Goal: Write Google-style docstrings for all three public functions in grade-predictor/src/grade_predictor/core.py: compute_grade, grade_to_letter, and flag_at_risk. Add the D rule group to pyproject.toml with convention = "google". Run ruff check src/ and fix every D-rule finding. Confirm zero findings.

uv run ruff check src/ --select D
# fix each finding, then re-run
uv run ruff check src/ --select D

7. `# noqa` and When to Use It

# noqa: E501 suppresses one rule on one line. Use it when suppression is the right answer, not when you want to avoid fixing the real problem.

Legitimate uses:

# A URL that cannot be shortened
# See https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html  # noqa: E501

# A regex that must stay on one line for readability
pattern = r"^S\d{4}$"  # noqa: E501

# A ruff false positive (see notebooks for S106 on pass_rate= kwargs)
.cols_label(pass_rate="Pass Rate")  # noqa: S106

Not legitimate:

result = df.groupby("program").agg({"score": "mean", "grade": "first", "cohort": "nunique", "semester": "count", "attendance": "mean"})  # noqa: E501
# Fix: break the line instead of suppressing

per-file-ignores is better than per-line # noqa for patterns that apply to all files of a type, such as E501 in all notebooks.
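A quick way to audit suppressions is to look for # noqa comments that name no rule code. The sketch below is a hypothetical regex-based scan, not a ruff feature; ruff's own rule set appears to include comparable checks (PGH004 for blanket noqa, RUF100 for unused noqa), so treat this as an illustration and confirm the codes in the rule reference.

```python
import re

# "# noqa" not followed by ": CODE" is a blanket suppression.
BLANKET_NOQA = re.compile(r"#\s*noqa(?!\s*:\s*[A-Z]+\d+)", re.IGNORECASE)


def blanket_noqa_lines(source: str) -> list[int]:
    """Return 1-based line numbers whose noqa comment names no rule code."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if BLANKET_NOQA.search(line)
    ]


sample = """\
pattern = r"^S\\d{4}$"  # noqa: E501
total = compute()  # noqa
label = build_label()  # noqa: S106
"""
print(blanket_noqa_lines(sample))  # [2]
```

Only the second line is reported: it silences every rule at once, so a later, unrelated finding on that line would be hidden too.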

Activity 4 - Legitimate Suppression

Goal: Find one place in your core.py where a # noqa comment is the right choice (a long docstring URL, a regex, or a ruff false positive). Add the comment with the specific rule code. Then find one place where you were tempted to use # noqa but fixed the code instead. Explain in a comment why you chose each path.

Capstone - Apply ruff to grade-predictor

Apply the full ruff workflow to the grade-predictor project you have built so far.

Capstone - A Clean Codebase

Add the DS ruleset (E, F, I, B, N, UP, SIM) to pyproject.toml
Add at least one intentional version of each of these bugs to core.py: unused import, mutable default, bare except, assigned-but-never-used variable
Run ruff check src/ --fix. Fix the remaining findings manually
Run ruff format src/
Confirm ruff check src/ returns zero findings and ruff format src/ --check exits 0
Commit the clean state: git commit -m "style: apply ruff to grade-predictor"

Resource	Why it matters
ruff documentation	Complete rule reference and configuration guide
Bugbear rules	The rules most likely to catch real DS bugs
ruff vs Black/Flake8	Why ruff replaced a two-tool workflow
pyupgrade rules	Automatic modernization of Python syntax
isort rules	Why import order matters and how ruff enforces it

Concept	Key rule
`ruff check`	Finds bugs and style issues. Does not rewrite code unless `--fix` is passed.
`ruff format`	Rewrites layout only. Cannot introduce bugs. Run on save locally.
Rule groups	`E/F` always. `B` for DS. `I` for import order. `D` for docstrings. Add others incrementally.
`per-file-ignores`	Use for notebooks and tests, not for hiding real problems.
CI check	`ruff format . --check` exits 1 if anything would change. Always include in CI.
Google docstring	One-line summary, blank line, `Args:`, `Returns:`, `Raises:`, `Example:` sections as needed.
`D` rule group	`convention = "google"` in `[tool.ruff.lint.pydocstyle]` enforces Google style.
codespell	Catches typos in docstrings and comments; `--write-changes` fixes unambiguous ones automatically.
`# noqa`	A deliberate decision, not a shortcut. Include the specific rule code.
