Why Another ML Book?

Over years of working as a data scientist across academia and industry, I have seen the same gap show up in three different forms.

Some people come in with strong research foundations. They understand the mathematics and can read a paper and implement it from scratch. But their code is fragile, hard to read, impossible to test, and nearly impossible for anyone else to maintain. A notebook that runs once on their laptop is not a pipeline anyone else can build on.

Others are strong engineers. They write clean functions, use version control, and test their work. But they treat a model as a black box that returns a number, without understanding why it works, when it will fail, or how to communicate its limits to a stakeholder.

The third pattern is subtler, and in my experience the most common in both industry and applied research. The work is technically solid: the model trains, the metrics look good, and the code is clean. But the question nobody stopped to ask was: what problem does this actually solve, and for whom? I have seen sophisticated models built on the wrong objective, deployed to users who never needed them, or evaluated on benchmarks that had nothing to do with the decision they were meant to support.

All three groups are doing real data science. All three are missing something that would make their work significantly more effective.

Most learning resources reflect the same division. There are courses that teach you to implement a neural network from scratch, and courses that teach you to write better Python, but rarely does the same resource treat both with equal rigour. When you try to combine them on your own, you spend most of your time resolving conflicts between conventions, not actually learning.

The recent wave of LLMs has changed the speed at which people can start writing code, but not the depth at which they understand it. Generating a working script is easier than ever. Knowing why it works, how to adapt it when something breaks, and how to build something that holds up under real conditions still requires a structured foundation.

This book is my attempt to put all three sides together. The technical depth to understand what you are building. The engineering discipline to build it properly. The tooling and thinking to make it matter in practice.

It is written for data scientists who want to close one of these gaps, and for people starting out who do not want to acquire one side without the others.

Anthony Faustine
Principal ML Engineer  ·  sambaiga.github.io