How I Built an End-to-End MLOps Pipeline: MLflow + FastAPI + Kubernetes

By Amulya Gupta · 8 min read

Most ML engineers stop at the notebook. They train a model, check the accuracy, and call it done. But shipping a model to production — in a way that's reliable, reproducible, and observable — is a completely different engineering challenge. Here's exactly how I approached it.

Why Most ML Portfolios Miss the Point

The notebook is where the idea lives. Production is where it has to survive. The gap between the two involves: versioning experiments so they can be reproduced, serving predictions reliably at scale, packaging the service so it runs anywhere, orchestrating deployment without manual steps, and knowing when something is going wrong in production. My goal with this project was to build every layer.

Section 1: The Architecture

The full pipeline is five layers, each with a clear responsibility:

Data & Training: Preprocessing → MLflow experiment tracking → artifact storage
Inference: FastAPI service → Pydantic validation → /predict, /health, /metrics
Infrastructure: Docker image → Kubernetes manifests (Minikube for local, cloud-ready)
Observability: Prometheus scrape → Grafana dashboards (latency, error rates, throughput)
CI/CD: GitHub Actions → automated build/test/deploy on every push
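One way these five layers can map onto a repository (directory names here are illustrative, not the project's actual structure):

```text
.
├── training/            # preprocessing + MLflow-tracked training scripts
├── app/                 # FastAPI inference service
├── Dockerfile
├── k8s/                 # Deployment, Service, and probe manifests
├── monitoring/          # Prometheus scrape config, Grafana dashboards
└── .github/workflows/   # CI/CD pipeline definitions
```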

Section 2: Data & Training with MLflow

MLflow handles experiment tracking and artifact management. Every training run logs its parameters, metrics, and model artifact. This means any experiment can be reproduced exactly — something that matters enormously when you need to audit or re-deploy a specific model version.

```python
import mlflow
import mlflow.sklearn

# model, X_train, y_train, X_test, y_test are assumed to come from the
# preprocessing step (e.g. a scikit-learn estimator and a train/test split)
with mlflow.start_run():
    mlflow.log_param("n_estimators", 100)
    mlflow.log_param("max_depth", 5)
    model.fit(X_train, y_train)
    accuracy = model.score(X_test, y_test)
    mlflow.log_metric("accuracy", accuracy)
    mlflow.sklearn.log_model(model, "model")
```

Section 3: Building the FastAPI Inference Service

FastAPI was the right choice here for three reasons: native async support, automatic OpenAPI documentation generation, and Pydantic validation built in. The service exposes three key endpoints: /predict for inference, /health for Kubernetes liveness/readiness probes, and /metrics for Prometheus scraping.

```python
from fastapi import FastAPI
from pydantic import BaseModel
import mlflow.sklearn

app = FastAPI()

# Load the model artifact logged during training; the URI here is
# illustrative and depends on where MLflow stored the run's artifacts
model = mlflow.sklearn.load_model("model")

class PredictRequest(BaseModel):
    age: int
    cholesterol: float
    blood_pressure: float

@app.post("/predict")
async def predict(request: PredictRequest):
    features = [[request.age, request.cholesterol, request.blood_pressure]]
    prediction = model.predict(features)
    return {
        "prediction": int(prediction[0]),
        "confidence": float(model.predict_proba(features).max()),
    }

@app.get("/health")
async def health():
    return {"status": "healthy"}
```

Section 4: Containerising with Docker

The Dockerfile follows production best practices: multi-stage build to keep the image lean, non-root user for security, health check instruction for orchestrator integration, and pinned dependency versions for reproducibility.
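A minimal sketch of that shape (file names like app.py and requirements.txt, and the Python version, are illustrative assumptions, not the project's exact Dockerfile):

```dockerfile
# Build stage: install pinned dependencies into an isolated prefix
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# Runtime stage: copy only what is needed, run as non-root
FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /install /usr/local
COPY app.py .
RUN useradd --create-home appuser
USER appuser
EXPOSE 8000
# Probe the /health endpoint without relying on curl being in the image
HEALTHCHECK --interval=30s --timeout=3s \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```

The two-stage split keeps build tools out of the final image, and the HEALTHCHECK reuses the same /health endpoint the Kubernetes probes hit.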

Section 5: Kubernetes Deployment

Even on Minikube, using Kubernetes manifests forces you to think in production terms: resource limits, readiness probes, rolling updates, and service exposure. The manifests work on any cluster — cloud or local.
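A Deployment manifest covering those concerns might look like this (the name ml-inference, image tag, and resource numbers are illustrative placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-inference
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ml-inference
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # keep serving during rollouts
  template:
    metadata:
      labels:
        app: ml-inference
    spec:
      containers:
        - name: api
          image: ml-inference:latest   # illustrative image name
          ports:
            - containerPort: 8000
          resources:
            requests:
              cpu: 250m
              memory: 256Mi
            limits:
              cpu: "1"
              memory: 512Mi
          readinessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            periodSeconds: 15
```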

Section 6: Observability with Prometheus & Grafana

Observability isn't optional in production. If you don't know your model's inference latency, error rate, and request volume, you're flying blind. Prometheus scrapes metrics from the /metrics endpoint every 15 seconds. Grafana visualises them. I track p50/p95/p99 inference latency, prediction distribution drift, error rate, and requests per second.
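To make the latency percentiles concrete, here is a small stdlib-only sketch of how p50/p95/p99 are derived from a window of observed latencies. In the actual pipeline Prometheus computes these from histogram buckets; this is just the underlying idea.

```python
import random
import statistics

def latency_percentiles(samples_ms):
    """Return (p50, p95, p99) from a list of latency samples in milliseconds."""
    # quantiles(n=100) yields the 99 cut points p1..p99
    cuts = statistics.quantiles(samples_ms, n=100)
    return cuts[49], cuts[94], cuts[98]

# Simulate a window of request latencies with a long tail
random.seed(0)
window = [random.expovariate(1 / 20) for _ in range(10_000)]  # mean ~20 ms
p50, p95, p99 = latency_percentiles(window)
print(f"p50={p50:.1f}ms  p95={p95:.1f}ms  p99={p99:.1f}ms")
```

The long-tailed distribution is why p99 matters: the mean hides the slow requests that users actually feel.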

Section 7: CI/CD with GitHub Actions

Every push to main triggers the pipeline: lint → test → build Docker image → push to registry → deploy to Kubernetes. No manual steps. If a test fails, the deployment doesn't happen. This is table stakes for production engineering.
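A workflow with that gating behaviour could be sketched like this (the tool choices here, ruff and pytest, plus the image and deployment names, are assumptions for illustration; registry and cluster credentials are omitted):

```yaml
name: ci-cd
on:
  push:
    branches: [main]

jobs:
  build-test-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Lint
        run: pip install ruff && ruff check .
      - name: Test
        run: pip install -r requirements.txt pytest && pytest
      # Steps below only run if lint and tests pass
      - name: Build image
        run: docker build -t ml-inference:${{ github.sha }} .
      - name: Push and deploy
        run: |
          docker push ml-inference:${{ github.sha }}
          kubectl set image deployment/ml-inference api=ml-inference:${{ github.sha }}
```

Because steps in a job run sequentially and stop on failure, a failing test blocks the build and deploy steps automatically.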

What I Learned

The hardest part wasn't any individual component — it was wiring them together reliably. The most important lesson: observability is a first-class engineering concern, not an afterthought. And the second: reproducibility in experiment tracking saves enormous debugging time when a deployed model starts behaving unexpectedly.
