Part 13: Project Setup with uv

DS-MLOps Dev Tools

Python 3.12+ | Author: Anthony Faustine

Before you begin

This chapter assumes you have completed Parts 1 and 2. No prior experience with virtual environments is required. If you have used pip and venv before, every concept here maps directly onto what you already know.

You will build a project called grade-predictor, a small Python package that computes and analyses student grades from university_analytics.csv. This project thread continues through all six Dev Tools chapters, giving every tool a real codebase to act on.

Topic | Why it matters
uv vs pip/venv/Poetry | Faster installs, built-in lockfiles, one tool for everything
pyproject.toml | The single source of truth for dependencies and project metadata
uv init and uv add | Create and extend projects without manually editing TOML
Lockfile (uv.lock) | Pins every transitive dependency for reproducible environments
Extras and dev dependencies | Separate production, test, and development dependencies cleanly
uv run | Execute scripts in the project environment without activating it

Callout markers used throughout this notebook are explained on the book cover page.

By the end of Part 13 you will be able to:

# | Skill | Covered in
0 | Explain what uv is, how it compares to pip and Poetry, and install it | Sec. 0
1 | Explain why isolated environments and lockfiles matter for reproducible DS work | Sec. 1
2 | Initialize a packaged Python project with uv init --package | Sec. 2
3 | Read and write the [project] and [project.optional-dependencies] sections of pyproject.toml | Sec. 3
4 | Add, remove, and sync dependencies with uv add, uv remove, and uv sync | Sec. 4
5 | Separate heavy ML dependencies into optional groups so CI stays fast | Sec. 5
6 | Manage secrets with a .env file and keep them out of version control | Sec. 6
7 | Run any command inside the project environment with uv run | Sec. 7

0. What is uv and Why Use It

uv is a Python package and project manager written in Rust, built by Astral, the same team behind ruff. It replaces a stack of tools that most Python developers use separately: pip, venv, pip-tools, and parts of Poetry or conda. One binary, one configuration file, one command prefix (uv run).

flowchart LR
    A["uv init --package\nproject-name"] --> B["pyproject.toml\n[project] metadata"]
    B --> C["uv add pandas\nuv add --optional dev ruff"]
    C --> D["uv.lock\nfull dependency tree\npinned & reproducible"]
    D --> E["uv sync\n.venv created"]
    E --> F["uv run script.py\nno activation needed"]

    style D fill:#EBF5F0,stroke:#059669,color:#065F46
    style E fill:#EAF3FA,stroke:#0369A1,color:#0C4A6E
    style F fill:#EAF3FA,stroke:#0369A1,color:#0C4A6E

How uv compares to the alternatives

Tool | Virtual env | Dependency resolution | Lockfile | Speed | pyproject.toml native
pip + venv | Manual | Backtracking resolver (pip 20.3+) | Only with pip-tools | Slow | No
pip-tools | Manual | Full resolution (via pip) | Yes (compiled requirements.txt) | Slow | No
Poetry | Built-in | Full resolution | Yes (poetry.lock) | Medium | Yes
conda | Built-in | Full (cross-language) | Via conda-lock or explicit exports | Slow | No
uv | Built-in | Full (PubGrub resolver, written in Rust) | Yes (uv.lock) | 10-100x faster than pip | Yes

The decisive advantages for DS work: uv is fast enough that uv sync after a fresh clone typically takes seconds rather than minutes, and uv.lock pins every transitive dependency, so two machines resolve to the same package versions.

Installing uv

uv is a single binary with no Python dependency. Install it once per machine:

# macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

# If you already have Python and prefer pipx
pipx install uv

# Verify
uv --version

After installation, uv is available system-wide. You do not need to activate any environment to use it.

Pro Tip: Keep uv up to date

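uv releases frequently. If you installed it with the standalone installer, run uv self update to upgrade to the latest version; if you installed it with pipx, run pipx upgrade uv instead. Updates often include resolver and performance improvements.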

1. The Reproducibility Problem

Two colleagues share the same script. One gets a clean result; the other gets a FutureWarning and a different output. The script has not changed. The problem is the environment: pandas 1.5 and pandas 2.0 handle missing values and certain default arguments differently. Without a record of exact package versions, “it works on my machine” is the only guarantee on offer.

The standard Python answer has been to pair a virtual environment (an isolated Python interpreter with its own packages) with a hand-written requirements.txt file pinning direct dependencies. This solves part of the problem. It does not pin transitive dependencies: the packages that your packages depend on. A project pinned at pandas==2.1.0 will still install whichever version of pytz pip resolves on the day of installation, and that version may differ between colleagues or CI runs.

A lockfile solves this. It records the exact version of every package in the dependency tree, direct and transitive, so that two machines installing from the same lockfile get the same package versions instead of whatever happened to be newest on install day. uv provides both the virtual environment and the lockfile in a single fast tool.

Key Concept: A lockfile is a snapshot of the environment that worked

uv.lock pins the exact version of every dependency, including transitive ones. Commit it alongside your code. Anyone who clones the repo and runs uv sync gets the same package versions on any supported platform, whenever they install.

2. Initializing a Project

Create the grade-predictor project with the --package flag, which generates the src/ layout:

uv init grade-predictor --package
cd grade-predictor

The generated structure:

grade-predictor/
├── src/
│   └── grade_predictor/
│       └── __init__.py
├── .python-version
├── pyproject.toml
└── README.md

The --package flag matters. Without it, uv init creates an application project that is not installed as a package. With it, uv init creates a proper Python package with the src/ layout and a [build-system] table. In the src/ layout, the package lives inside src/ rather than at the project root, so running Python from the project root does not put the package on the import path by accident. The import only works once the project is installed into the environment (uv sync installs it in editable mode), which means your scripts, your tests, and CI all import the package the same way.
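A minimal sketch of why the src/ layout hides the package from the project root: it recreates the generated structure in a temporary directory and asks the standard-library PathFinder whether grade_predictor can be found from each location. This only models Python's lookup; in a real project, the editable install that uv sync performs is what (roughly) makes src/ reachable. The helper names are illustrative.

```python
import tempfile
from importlib.machinery import PathFinder
from pathlib import Path


def importable_from(search_dir: Path, package: str) -> bool:
    """True if `package` is found when `search_dir` is the only entry on the import path."""
    return PathFinder.find_spec(package, [str(search_dir)]) is not None


def src_layout_demo() -> tuple[bool, bool]:
    """Recreate the generated src/ layout in a temp dir and probe both locations."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        pkg_dir = root / "src" / "grade_predictor"
        pkg_dir.mkdir(parents=True)
        (pkg_dir / "__init__.py").write_text("")
        # Project root: only src/ lives here, so the package is not found.
        at_root = importable_from(root, "grade_predictor")
        # src/: the directory an editable install exposes on the import path.
        via_src = importable_from(root / "src", "grade_predictor")
        return at_root, via_src


at_root, via_src = src_layout_demo()
print(f"found from project root: {at_root}")  # False
print(f"found via src/: {via_src}")  # True
```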

Run uv sync to create the virtual environment and generate uv.lock:

uv sync

The .venv/ directory appears at the project root. You do not need to activate it: uv run executes commands inside it for you.

Activity 1 - Initialize and Inspect

Goal: Run uv init grade-predictor --package in a temporary directory, then open pyproject.toml and identify the [project] section. List any generated files whose purpose you cannot immediately guess and look each one up.

3. pyproject.toml Anatomy

pyproject.toml is the single source of truth for a Python project. It replaces configuration that older projects spread across four files: setup.py, setup.cfg, requirements.txt, and pytest.ini. Most modern Python tools, including uv, pytest, ruff, and hatchling, read it.

Here is a complete pyproject.toml for grade-predictor with annotations:

[project]
name = "grade-predictor"
version = "0.1.0"
description = "Grade computation and risk analysis for university_analytics.csv"
authors = [{ name = "Anthony Faustine", email = "sambaiga@gmail.com" }]
requires-python = ">=3.12"               # minimum Python version; 3.12 adds better error messages
dependencies = [
    "pandas>=2.1",
    "numpy>=1.26",
]

[project.optional-dependencies]          # groups; installed only when requested
modelling = [
    "scikit-learn>=1.7",
    "xgboost>=3.0",
]
test = [
    "pytest>=8.0",
    "pytest-cov>=6.0",
]
dev = [
    "ruff>=0.5",
    "pre-commit>=4.0",
    "python-dotenv>=1.0",
]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build.targets.wheel]
packages = ["src/grade_predictor"]

[tool.pytest.ini_options]
testpaths = ["tests"]
addopts = "--cov=grade_predictor --cov-report=term-missing --cov-fail-under=80"

[tool.ruff]
target-version = "py312"
line-length = 100

Key Concept: pyproject.toml is the single source of truth

It replaces setup.py, requirements.txt, setup.cfg, and pytest.ini. Reading it tells you what a project does, what it depends on, how to test it, and how to build it.

Activity 2 - Edit pyproject.toml

Goal: Open the pyproject.toml from Activity 1. Change requires-python to >=3.12, add polars>=1.0 to dependencies, and add a dev optional group containing jupyter>=1.0. Then run uv sync and confirm no errors.

4. Adding and Syncing Dependencies

uv add is the recommended way to add a dependency. In one step it updates pyproject.toml, resolves the full dependency graph, writes a new uv.lock, and installs the result into .venv:

uv add pandas numpy
uv add --optional dev ruff pytest
uv sync --extra dev      # a plain uv sync would uninstall the dev extras again

Compare this to pip install pandas: pip installs the package into whatever environment is currently active but records nothing in your project files. Next week, on a different machine, there is no record of which version was installed.

Useful commands:

uv add <package>                      # add to core dependencies
uv add --optional dev <package>       # add to the dev group
uv remove <package>                   # remove from dependencies
uv sync                               # install core dependencies only; removes extras not requested
uv sync --extra test                  # install core + test group
uv sync --all-extras                  # install all groups
uv lock --upgrade-package pandas      # upgrade one package in uv.lock (then run uv sync)

After uv sync, confirm the environment works:

uv run python -c "import pandas; print(pandas.__version__)"

Pro Tip: Commit uv.lock, not requirements.txt

uv.lock records every transitive dependency. A hand-maintained requirements.txt usually records only what you asked for, not what was actually resolved, so a colleague installing a week later may get different transitive versions. Commit uv.lock and drop the hand-maintained requirements.txt workflow. If another tool still needs a requirements.txt, generate it from the lock with uv export --format requirements-txt rather than editing it by hand.

Activity 3 - Add and Verify a Dependency

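Goal: Add polars>=1.0 as a core dependency with uv add, quoting the specifier so the shell does not read > as a redirect: uv add "polars>=1.0". Then add great-tables>=0.20 to the dev group. Run uv sync --extra dev (a plain uv sync would leave the dev group out) and confirm both packages import: uv run python -c "import polars, great_tables; print('ok')".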

5. Optional Dependency Groups for ML Projects

Heavy ML dependencies are one of the most common causes of slow CI pipelines. A cold install of PyTorch can take three to four minutes; a cold install of scikit-learn and pytest can finish in under 20 seconds. Separating them into optional groups lets each CI job install only what it needs.

[project.optional-dependencies]
modelling = [
    "scikit-learn>=1.7",
    "xgboost>=3.0",
    "torch>=2.0",          # ~800MB; only needed for deep learning experiments
]
test = [
    "pytest>=8.0",
    "pytest-cov>=6.0",
]
dev = [
    "ruff>=0.5",
    "pre-commit>=4.0",
    "jupyter>=1.0",
]

Three installation profiles:

Command | Who uses it | What gets installed
uv sync | CI (fast path), Docker prod image | Core dependencies only
uv sync --extra test | CI (test path) | Core + test group
uv sync --all-extras | Local development | Everything

The GitHub Actions workflow from Ch04 uses uv sync --extra test. Your local environment uses uv sync --all-extras. The production Docker image uses uv sync with no extras.

Pro Tip: Keep torch in an optional group

A CI run that installs PyTorch can take 3 to 4 minutes on a cold cache; one that installs only pandas and pytest can take under 30 seconds. At 20 pushes per day, saving about 3 minutes per run adds up to roughly an hour of CI time per day.
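To check the estimate with your own numbers, here is the arithmetic as a small sketch. ci_minutes_saved is an illustrative helper, and the inputs (3.5 minutes as the midpoint of "3 to 4 minutes", 0.5 minutes for "under 30 seconds", 20 pushes a day) are the tip's rough figures, not measurements.

```python
def ci_minutes_saved(
    heavy_minutes: float,
    light_minutes: float,
    pushes_per_day: int,
    days: int = 1,
) -> float:
    """CI minutes saved by running the light install instead of the heavy one."""
    return (heavy_minutes - light_minutes) * pushes_per_day * days


# Rough figures from the tip: ~3.5 min with torch, ~0.5 min without, 20 pushes a day.
per_day = ci_minutes_saved(3.5, 0.5, pushes_per_day=20)
per_work_week = ci_minutes_saved(3.5, 0.5, pushes_per_day=20, days=5)
print(f"{per_day:.0f} min per day, {per_work_week / 60:.1f} h per five-day week")
# prints: 60 min per day, 5.0 h per five-day week
```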

Activity 4 - Verify Group Isolation

Goal: Add scikit-learn>=1.7 to a modelling group. Run uv sync (no extras) and confirm scikit-learn is NOT importable: uv run python -c "import sklearn" should raise ModuleNotFoundError. Then run uv sync --extra modelling and confirm it imports.
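If you prefer a yes/no answer to a traceback, this stdlib-only sketch uses importlib.util.find_spec, which locates a top-level module without importing it. The helper is_importable is illustrative; run the script with uv run python after each sync.

```python
import importlib.util


def is_importable(module: str) -> bool:
    """True if the top-level `module` can be found in the current environment."""
    return importlib.util.find_spec(module) is not None


if __name__ == "__main__":
    # sklearn is the import name of the scikit-learn distribution.
    print("sklearn importable:", is_importable("sklearn"))
```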

6. Secret Management with .env

A database password or API key committed to a public GitHub repository is searchable, effectively permanent (git rm removes the file from future commits but not from history, and anyone may already have cloned or forked the repository), and a real security incident. The correct pattern is to never put secrets in version-controlled files.

Create .env at the project root:

# .env  -- never commit this file
DATABASE_URL=postgresql://user:password@localhost/grades
OPENAI_API_KEY=sk-...
REPORT_TITLE=Grade Predictor Report

Add to .gitignore immediately:

echo ".env" >> .gitignore
echo "*.env" >> .gitignore

Load in Python with python-dotenv:

from dotenv import load_dotenv
import os

load_dotenv()                              # reads .env into os.environ
db_url = os.getenv("DATABASE_URL")
report_title = os.getenv("REPORT_TITLE", "Default Title")  # second arg is fallback

Commit an .env.example with placeholder values so collaborators know which variables to set:

# .env.example  -- commit this file
DATABASE_URL=postgresql://user:password@localhost/grades
OPENAI_API_KEY=sk-your-key-here
REPORT_TITLE=My Report

Common Mistake: Defaults that are real values

Never put a real secret as a default value in code: os.getenv("API_KEY", "sk-prod-abc123"). The moment that file is committed, the key is in git history permanently. A placeholder default like "your-key-here" is safe; a real credential is not. If a key is committed by accident, rotate it immediately.
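One cheap safeguard is a check for this exact pattern before you commit. A minimal sketch using the standard-library ast module: it flags os.getenv and os.environ.get calls whose variable name contains KEY, SECRET, TOKEN, or PASSWORD (an assumed heuristic) and that pass a positional string default. It cannot tell a placeholder from a real credential, so it flags both for a human to review; dedicated secret scanners are far more thorough. risky_getenv_defaults is an illustrative name.

```python
import ast

# Assumed heuristic: variable names that usually hold credentials.
SENSITIVE_HINTS = ("KEY", "SECRET", "TOKEN", "PASSWORD")


def risky_getenv_defaults(source: str) -> list[tuple[int, str]]:
    """(line, variable) pairs where a secret-looking variable gets a string default."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Only positional defaults are checked: getenv(NAME, DEFAULT).
        if not (isinstance(node, ast.Call) and len(node.args) >= 2):
            continue
        if ast.unparse(node.func) not in ("os.getenv", "os.environ.get"):
            continue
        name, default = node.args[0], node.args[1]
        if (
            isinstance(name, ast.Constant) and isinstance(name.value, str)
            and isinstance(default, ast.Constant) and isinstance(default.value, str)
            and any(hint in name.value.upper() for hint in SENSITIVE_HINTS)
        ):
            findings.append((node.lineno, name.value))
    return findings
```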

Activity 5 - Load a Secret from .env

Goal: Create .env with REPORT_TITLE="Grade Predictor". Add python-dotenv to the dev group. Write a script scripts/report.py that loads the title and prints it. Before committing anything, confirm .env is listed in .gitignore and that git status does not show it.

7. uv run: The Universal Entry Point

uv run executes any command inside the project environment without requiring manual activation of .venv:

uv run python scripts/report.py           # run a script
uv run pytest                              # run tests
uv run ruff check .                        # lint
uv run jupyter lab                         # open Jupyter
uv run --with httpx python -c "import httpx; print(httpx.__version__)"  # one-off tool

The --with flag installs a package into the run environment for that command only, without adding it to pyproject.toml. This is a clean way to use tools you reach for occasionally (nbmake, httpx, a formatter you are evaluating) without polluting the project's dependency list.

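uv run always uses the project environment defined by pyproject.toml and uv.lock: before running the command, it checks that the lockfile is up to date and syncs .venv if needed. Whichever conda environment is active in your shell, or whether you ran source .venv/bin/activate in another terminal, does not change the interpreter or packages it uses.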

Key Concept: uv run replaces manual venv activation

source .venv/bin/activate changes your shell state. Forget to run it and you install into the wrong Python. uv run needs no shell state: it always resolves the project environment from pyproject.toml and uv.lock. Use it for every command in a project, and this whole class of activation mistakes goes away.
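A quick way to see this for yourself: a stdlib-only sketch (the file name scripts/which_python.py is hypothetical) that prints which interpreter is running and whether it belongs to a virtual environment. Run it once with plain python and once with uv run python; under uv run, sys.executable should point into the project's .venv.

```python
import sys


def in_virtualenv() -> bool:
    """True when running inside a virtual environment such as the project's .venv."""
    # In a venv, sys.prefix points at the venv while sys.base_prefix
    # still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a virtual environment:", in_virtualenv())
```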

Capstone - Build grade-predictor from Scratch

Build the complete initial structure of the grade-predictor project. By the end of this exercise, you will have a runnable Python package with a proper environment.

Capstone: grade-predictor initial setup

  1. Run uv init grade-predictor --package and cd grade-predictor
  2. Add core dependencies: uv add "pandas>=2.1" "numpy>=1.26" (quote the specifiers so the shell does not read > as a redirect)
  3. Add optional groups: test with pytest>=8.0 and pytest-cov>=6.0; dev with ruff>=0.5 and python-dotenv>=1.0
  4. Write src/grade_predictor/core.py with this function:
    def compute_grade(midterm: float, final: float, project: float) -> float:
        return midterm * 0.30 + final * 0.45 + project * 0.25
    
  5. Confirm it runs: uv run python -c "from grade_predictor.core import compute_grade; print(compute_grade(80, 85, 90))"
  6. Create .env with REPORT_TITLE=Grade Predictor and add .env to .gitignore
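The capstone only checks that compute_grade runs. Here is a sketch of two sanity checks you could later move into a tests/ directory, where pytest collects functions named test_*. It repeats the function so the block runs on its own (in the project you would import it from grade_predictor.core), and it uses math.isclose because the weights are floating-point numbers.

```python
import math


def compute_grade(midterm: float, final: float, project: float) -> float:
    # Same weights as the capstone: 30% midterm, 45% final, 25% project.
    return midterm * 0.30 + final * 0.45 + project * 0.25


def test_weights_sum_to_one() -> None:
    # Equal scores in every component should give back that score.
    assert math.isclose(compute_grade(70, 70, 70), 70)


def test_example_scores() -> None:
    # 80*0.30 + 85*0.45 + 90*0.25 = 24 + 38.25 + 22.5 = 84.75
    assert math.isclose(compute_grade(80, 85, 90), 84.75)


if __name__ == "__main__":
    test_weights_sum_to_one()
    test_example_scores()
    print("ok")
```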

Resource | Why it matters
uv documentation: Projects | Authoritative reference for every uv command and concept
fmind MLOps course, 1.2 uv | MLOps-specific framing of uv in CI and Docker
pybit.es, Developing packages with uv | Src layout and editable installs explained
python-dotenv documentation | Reference for load_dotenv and the .env file format
PEP 518, pyproject.toml specification | Introduced pyproject.toml and the [build-system] table (the [project] fields come from PEP 621)

Concept | Key rule
Virtual environment | One per project. Never install into the system Python.
uv init --package | Src layout prevents accidental imports from the wrong place.
uv.lock | Commit it. It pins the exact environment that worked.
Optional dependency groups | Heavy ML deps in modelling or gpu groups. CI syncs core and test only.
.env | Never commit it. Commit .env.example instead.
uv run | Use it for everything. No manual venv activation required.

Next: Part 14, Code Quality with ruff, runs ruff check and ruff format on the grade-predictor codebase you just created.