MLOps in 2025: The Stack I Actually Use (and Why)

By Amulya Gupta · 6 min read

I counted once — there are something like 47 MLOps tools out there, and every single one claims to be the only thing you'll ever need. I spent way too long evaluating options before I realised the best way to pick a stack is to just start building and swap things out when they hurt. So here's what I actually use, after going through that process.

My One Rule: Don't Add Things Until They Hurt

I've watched teams adopt Airflow for a three-step pipeline, or set up elaborate monitoring that nobody ever checks. Every tool on this list earned its spot by solving a problem I was already having. If I hadn't felt the pain yet, it didn't make the cut. That keeps things from getting bloated.

Experiment Tracking: MLflow

I tried Weights & Biases and Neptune, and they're both good products. But I kept coming back to MLflow because it's free and self-hosted. For personal projects, paying per-seat for experiment tracking felt wrong. MLflow covers tracking, the model registry, artifact storage, and gives you a decent UI for comparing runs. Nothing flashy, but it gets out of your way.

One area where it falls short: LLM work. I've been poking at LangSmith for prompt versioning and tracing, because MLflow doesn't handle that well yet.

Workflow Orchestration: Prefect

I used Airflow at a previous gig and it was fine for large, complex DAGs. But for the kind of ML pipelines I'm building now — maybe five to ten steps, all Python — it's way more infrastructure than you need. Prefect lets me slap a decorator on my existing functions and get scheduling, retries, and observability without rewriting anything. The local dev experience is night and day compared to Airflow.

Serving: FastAPI

This one isn't even a close call for me anymore. I started with Flask years ago, and it's fine for quick prototypes. But the moment you need input validation, proper error responses, and auto-generated docs, you end up bolting on three more libraries anyway. FastAPI gives you all of that out of the box, plus async support. I haven't looked back.

Containers: Docker

I learned this one the hard way. Shipped a model that worked perfectly on my laptop, failed silently on the deployment server because of a numpy version mismatch. Now everything goes in a container, no exceptions. Multi-stage builds keep the images small, and pinning every dependency version means I'm not debugging phantom issues at 2am.
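For concreteness, here's roughly what that looks like for a FastAPI service. This is a sketch assuming a `requirements.txt` with fully pinned versions and an `app/main.py` entrypoint, both hypothetical:

```dockerfile
# Build stage: install pinned dependencies into an isolated prefix
FROM python:3.12-slim AS build
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# Runtime stage: copy only the installed packages and app code
FROM python:3.12-slim
WORKDIR /app
COPY --from=build /install /usr/local
COPY app/ ./app/
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

The build stage's compilers and pip cache never make it into the final image, which is what keeps it small.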

Container Orchestration: Kubernetes

Look, the learning curve is steep. I won't pretend otherwise. But once you get past the initial YAML headaches, Kubernetes gives you rolling deploys, auto-scaling, health checks baked into the scheduler, and the same interface whether you're running locally on Minikube or on a cloud cluster. I validate all my manifests locally first, and they just work when I move to a real environment.
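To make "health checks baked into the scheduler" concrete, here's a sketch of a Deployment manifest; the image name and `/health` path are hypothetical, and the resource numbers are just plausible starting points:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: model-api
  template:
    metadata:
      labels:
        app: model-api
    spec:
      containers:
        - name: model-api
          image: registry.example.com/model-api:1.0.0  # hypothetical image
          ports:
            - containerPort: 8000
          readinessProbe:        # traffic only routes to pods that pass this
            httpGet:
              path: /health
              port: 8000
          livenessProbe:         # failing pods get restarted automatically
            httpGet:
              path: /health
              port: 8000
          resources:
            requests: {cpu: 250m, memory: 512Mi}
            limits: {cpu: "1", memory: 1Gi}
```

The same manifest applies unchanged on Minikube and on a managed cloud cluster, which is the portability point above.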

Monitoring: Prometheus + Grafana

These two have been around forever, and for good reason — they just work. My FastAPI services expose a /metrics endpoint, Prometheus scrapes it, and Grafana turns it into dashboards I can actually read. I track p50/p95/p99 latency, error rates, request volume, and prediction distribution over time to catch drift. The whole setup is versioned as code, so spinning it up for a new project takes minutes.

CI/CD: GitHub Actions

It's free, it lives right next to my code, and I don't need to maintain a Jenkins server. The YAML syntax can be annoying, but the ecosystem of pre-built actions covers almost everything I need: linting, tests, Docker builds, registry pushes, and Kubernetes deployments. For personal projects and small teams, it's hard to justify anything else.
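A typical workflow covering the lint/test/build steps above might look like this. A sketch only: the ruff linter, the requirements file, and the image name are assumptions about the project, not prescriptions:

```yaml
name: ci
on: [push]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - run: ruff check .   # lint
      - run: pytest         # tests
      # Tag the image with the commit SHA for traceable deploys
      - run: docker build -t model-api:${{ github.sha }} .
```

Registry pushes and `kubectl` deploy steps bolt onto the end of the same job, usually gated on the branch.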

What I'm Poking At Next

Ray Serve keeps coming up when I talk to people doing high-throughput inference. BentoML looks interesting for a higher-level serving abstraction. LangSmith is already in my toolkit for LLM tracing. And I'm watching OpenTelemetry closely — standardised observability across services would clean up a lot of my monitoring setup.

If I Had to Summarise

Start with MLflow, Prefect, FastAPI, Docker, and GitHub Actions. Add Prometheus and Grafana when you actually deploy something. Bring in Kubernetes when you need it, not before. The goal is to solve the problems you have today, not the ones you might have next year.
