Why Another ML Book?

The work was technically solid. The model trained cleanly, the metrics looked good, and the deployment went smoothly. Six months into the project, someone asked the question that should have come first: what decision does this output actually change? The answer was uncomfortable. We had spent half a year building the right system for the wrong problem.

You might recognise yourself in one of these patterns.

If you come from a research background, you understand the mathematics, and you can read a paper and implement it from scratch. But your code is fragile, hard to read, and nearly impossible for anyone else to maintain. A notebook that runs once on your laptop is not a pipeline a team can build on.

If you are a strong engineer, you write clean functions, use version control, and test your work. But you treat a model as a black box that returns a number. You do not fully understand why it works, when it will fail, or how to explain its limits to a stakeholder.

The third pattern is the one I described above, and it is the most common across industry and applied research. Your work is technically solid. The model trains. The metrics look good. The code is clean. But the question nobody stopped to ask was: what problem does this actually solve, and for whom?

All three patterns represent real data science work. All three are missing something that would make the work significantly more effective.

Most learning resources reflect the same division. There are courses that teach you to implement a neural network from scratch, and courses that teach you to write better Python, but rarely does the same resource treat both with equal rigour. When you try to combine them on your own, you spend most of your time resolving conflicts between conventions, not actually learning.

The recent wave of LLMs has changed the speed at which people can start writing code, but not the depth at which they understand it. Generating a working script is easier than ever. Knowing why it works, how to adapt it when something breaks, and how to build something that holds up under real conditions – that still requires a structured foundation.

This book is my attempt to put all three sides together. The technical depth to understand what you are building. The engineering discipline to build it properly. The tooling and thinking to make it matter in practice.

It is written for data scientists who want to close one of these gaps, and for people starting out who do not want to acquire one side without the others.

Anthony Faustine
Principal ML Engineer · sambaiga.github.io