MLOps in 2025: The Stack I Actually Use (and Why)

By Amulya Gupta · 6 min read

There are 47 MLOps tools. Each one has a marketing page telling you it's the only one you'll ever need. Here's my honest take on the stack I've actually settled on — and why each tool earned its place after doing the work.

The Principle: Start Simple, Add Complexity When You Feel the Pain

The worst mistake I see is teams adopting complex orchestration before they need it, or adding observability tools they never query. Every tool in this list is here because I felt a real pain that it solved. Not because a blog post told me it was "best practice."

Experiment Tracking: MLflow

I chose MLflow over Weights & Biases and Neptune for one main reason: it's self-hosted and free at any scale. For personal projects and smaller teams, paying per-seat for experiment tracking doesn't make sense. MLflow gives you experiment tracking, model registry, artifact storage, and a decent UI. It integrates with everything. The learning curve is low. It does the job.

What I'm watching: for LLM projects, I'm starting to look at LangSmith specifically for prompt versioning and trace-level debugging. MLflow isn't quite there for that use case yet.

Orchestration: Prefect

I chose Prefect over Apache Airflow for Python-first teams. Airflow's DAG model is powerful, but the overhead for small-to-medium pipelines is painful. Prefect lets you decorate your existing Python functions and get scheduling, retries, observability, and deployment without rewriting your code. The local dev experience is significantly better. For teams that are mostly Python engineers building ML pipelines, Prefect is the right default.

Serving: FastAPI

FastAPI is non-negotiable for me. Async by default, automatic OpenAPI documentation, Pydantic validation, and a clean developer experience. I've tried Flask — it's fine for quick prototypes, but once you need validation, middleware, and structured response models, FastAPI is just faster to work with correctly.

Containers: Docker

If you're not containerising your ML services, you're creating environment dependency problems that will bite you in production. Docker is the baseline. Learn multi-stage builds, learn how to keep images lean, learn health checks. These aren't optional skills for production ML.
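To make the multi-stage point concrete, here's the shape of a lean image for a FastAPI service. This is a sketch, not a drop-in file: it assumes a `requirements.txt`, an app entrypoint at `main:app`, and a `/health` endpoint, all of which are placeholders for your project's layout:

```dockerfile
# Stage 1: install dependencies into a virtualenv
FROM python:3.12-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN python -m venv /opt/venv && \
    /opt/venv/bin/pip install --no-cache-dir -r requirements.txt

# Stage 2: lean runtime image with no build tooling
FROM python:3.12-slim
WORKDIR /app
COPY --from=builder /opt/venv /opt/venv
COPY . .
ENV PATH="/opt/venv/bin:$PATH"
HEALTHCHECK --interval=30s --timeout=3s \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

The builder stage and its compilers never ship in the final image, which is the single biggest lever for keeping images small.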

Container Orchestration: Kubernetes

The learning curve is real. But it's worth it. Kubernetes gives you rolling deployments with zero downtime, horizontal scaling, health checks built into the scheduler, declarative infrastructure, and a common interface across all cloud providers. I use Minikube locally to validate manifests before deploying to a real cluster. The manifests work the same everywhere.
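The features above all live in one Deployment manifest. This is a trimmed sketch with hypothetical names (the image reference, ports, and resource numbers are placeholders), but it's the same shape you'd validate on Minikube:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: model-service
  strategy:
    type: RollingUpdate          # zero-downtime rollouts
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: model-service
    spec:
      containers:
        - name: model-service
          image: registry.example.com/model-service:1.0.0   # hypothetical image
          ports:
            - containerPort: 8000
          readinessProbe:        # health check wired into the scheduler
            httpGet:
              path: /health
              port: 8000
          resources:
            requests: {cpu: 250m, memory: 512Mi}
            limits: {cpu: "1", memory: 1Gi}
```

`kubectl apply -f` on this file is the same operation against Minikube and against a managed cluster, which is what "declarative, works everywhere" actually buys you.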

Monitoring: Prometheus + Grafana

The industry standard for a reason. Prometheus handles metric collection and storage. Grafana handles visualisation and alerting. The FastAPI service exposes a /metrics endpoint that Prometheus scrapes. The setup is repeatable and the dashboards are versioned as code. I track inference latency at p50/p95/p99, error rate, request volume, and prediction distribution over time (for drift detection).
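Instrumenting those metrics from Python is a few lines with the official `prometheus_client` library. A sketch, assuming that library is installed; the metric names and the stand-in model call are illustrative:

```python
import time
from prometheus_client import Counter, Histogram, generate_latest

# Counter for request volume and error rate; histogram for latency percentiles.
REQUESTS = Counter("inference_requests_total", "Total inference requests", ["status"])
LATENCY = Histogram("inference_latency_seconds", "Inference latency in seconds")

def predict(features: list[float]) -> float:
    # Wrap the model call so every request is timed and counted.
    start = time.perf_counter()
    try:
        score = sum(features) / len(features)  # stand-in for the real model
        REQUESTS.labels(status="ok").inc()
        return score
    except Exception:
        REQUESTS.labels(status="error").inc()
        raise
    finally:
        LATENCY.observe(time.perf_counter() - start)

predict([1.0, 2.0, 3.0])
```

Exposing `generate_latest()` on a `/metrics` route gives Prometheus everything it needs to scrape, and Grafana computes p50/p95/p99 from the histogram buckets.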

CI/CD: GitHub Actions

Free, sufficient, and integrates directly with your repository. For personal projects and small teams, GitHub Actions handles everything: linting, testing, Docker build, push to registry, and Kubernetes deployment. No separate CI server to maintain. The YAML syntax isn't perfect, but the ecosystem of pre-built actions covers almost every use case.
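For a sense of scale, the whole lint-test-build chain fits in one short workflow. This is a sketch with assumed tooling (ruff, pytest, a `requirements.txt`, GHCR as the registry); adapt the names to your repo:

```yaml
name: ci
on:
  push:
    branches: [main]

jobs:
  test-and-build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - run: ruff check .            # lint
      - run: pytest                  # tests
      - run: docker build -t ghcr.io/${{ github.repository }}:${{ github.sha }} .
      # Registry push and Kubernetes deploy steps would authenticate here.
```

The pre-built `actions/*` steps are the ecosystem advantage: checkout, toolchain setup, and registry login are all one-liners.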

What I'm Exploring Next

Ray Serve for high-throughput inference requirements. BentoML for a slightly higher-level serving abstraction. LangSmith specifically for LLM-based applications where you need trace-level visibility into prompt chains. OpenTelemetry for standardised observability across services.

The Bottom Line

Start simple. MLflow for tracking, Prefect for orchestration, FastAPI for serving, Docker for packaging, GitHub Actions for CI/CD, Prometheus and Grafana for monitoring. Add Kubernetes when you need it. Add complexity when you feel the pain of not having it — not before.
