⚡Model Drift & Monitoring — Catching AI Failures Before Users Do (EP6)

MLOps Series — How Real Companies Watch Their Models in Production

🎯 Why This Episode Matters
In software, when something breaks… it usually breaks loudly.

In machine learning, models often fail quietly:

  • No errors

  • No exceptions

  • No crashes

Just silently wrong predictions.

The model you deployed last month was great.
Today, users are typing new slang, new patterns, new behaviors…
and your model has no idea what they’re talking about.

That’s data drift and concept drift.
If you’re not monitoring it, your “AI system” slowly becomes useless while dashboards stay green.

Episode 6 is all about the missing layer:

👉 Monitoring & Drift Detection for ML models in production.

We’ll build a real setup that:

  • tracks requests and predictions

  • detects out-of-distribution inputs (slang / new patterns)

  • shows live metrics in Prometheus

  • visualizes drift in Grafana dashboards

This is how real companies keep ML systems trustworthy after deployment.

📌 What We Build in Episode 6

Our repo now has a proper monitoring stack:

mlops_ep6_monitoring/

artifacts_prod/              # Production model
    pipeline.pkl

model/
    train_good_model.py      # Train stable production model
    api_monitoring.py        # FastAPI with metrics + drift detection

data/
    data.csv                 # Sample training/serving data

prometheus/
    prometheus.yml           # Scrape config for FastAPI metrics

grafana/
    provisioning/            # Auto-configure Prometheus datasource

Dockerfile                   # Build monitored API image
docker-compose.yml           # API + Prometheus + Grafana stack
requirements.txt
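
That requirements.txt stays small; roughly something like this, with the exact packages and pins depending on your setup (this list is an assumption, not copied from the repo):

fastapi
uvicorn
scikit-learn
pandas
joblib
prometheus-client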

In this episode, we:

  • train a GOOD production model

  • wrap it in a FastAPI microservice

  • instrument it with Prometheus metrics

  • visualize everything in Grafana

  • simulate drift using unseen slang and new patterns

By the end, you’ll have a production-style ML monitoring setup running on your machine.

🟢 Training the Stable Production Model

We start with a clean, reliable text-classification pipeline.

The script:

python model/train_good_model.py

Trains a model and saves it to:

artifacts_prod/pipeline.pkl

This is our trusted production model.
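
For context, here is a minimal sketch of what a script like train_good_model.py could look like, assuming a scikit-learn TF-IDF + LogisticRegression pipeline and a data.csv with text and label columns (those details are assumptions, not confirmed by the repo listing):

# train_good_model.py (minimal sketch, assumed structure)
import joblib
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

df = pd.read_csv("data/data.csv")            # assumed columns: text, label
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(df["text"], df["label"])

# save the trusted production model where the API expects it
joblib.dump(pipeline, "artifacts_prod/pipeline.pkl")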

  • The FastAPI service always loads this file

  • Prometheus + Grafana observe everything it does

  • Any future model must beat or at least match this one

Think of it as:
🧠 “the brain currently running in production.”

📊 Turning FastAPI into a Monitored ML Microservice

Next, we upgrade our API into a fully observable ML service.

api_monitoring.py exposes:

  • /predict — for real predictions

  • /metrics — for Prometheus scraping

Inside, we track:

  • Total requests (how much traffic your model receives)

  • Predictions per class (are we suddenly predicting one class 90% of the time?)

  • Input text length histograms (user behavior changing?)

  • Model version as a metric (which model is live)

  • Out-of-Distribution (OOD) inputs based on slang / unseen patterns

Example OOD idea:

def is_out_of_distribution(text: str) -> bool:
    # flag anything containing slang the model never saw in training
    slang = ["scene out", "wifi kaput", "5g gone", "rip net"]
    return any(phrase in text.lower() for phrase in slang)

It’s intentionally simple — but it mimics how real teams add signals for new behavior.

This is not just “serving a model”.
This is instrumenting a model.
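
To make this concrete, here is a minimal sketch of an instrumented api_monitoring.py. It uses prometheus_client and a single labeled counter for per-class predictions (the episode exposes separate counters such as ml_pred_network_total), and it assumes a {"text": ...} request body; treat the exact names and schema as assumptions:

# api_monitoring.py (minimal sketch, assumed structure)
import joblib
from fastapi import FastAPI, Response
from pydantic import BaseModel
from prometheus_client import (
    Counter, Gauge, Histogram, generate_latest, CONTENT_TYPE_LATEST,
)

app = FastAPI()
pipeline = joblib.load("artifacts_prod/pipeline.pkl")

REQUESTS = Counter("ml_requests", "Total prediction requests")   # exposed as ml_requests_total
PREDICTIONS = Counter("ml_pred", "Predictions per class", ["label"])
IN_DIST = Counter("ml_input_in_distribution", "Inputs that look like training data")
OOD = Counter("ml_input_out_of_distribution", "Inputs flagged as out-of-distribution")
TEXT_LENGTH = Histogram("ml_text_length", "Characters per request",
                        buckets=(10, 20, 50, 100, 200, 500))
MODEL_VERSION = Gauge("ml_model_version", "Version of the model currently serving")
MODEL_VERSION.set(1)

class PredictRequest(BaseModel):
    text: str

def is_out_of_distribution(text: str) -> bool:
    # same slang-based heuristic shown above
    slang = ["scene out", "wifi kaput", "5g gone", "rip net"]
    return any(phrase in text.lower() for phrase in slang)

@app.post("/predict")
def predict(req: PredictRequest):
    REQUESTS.inc()
    TEXT_LENGTH.observe(len(req.text))
    (OOD if is_out_of_distribution(req.text) else IN_DIST).inc()
    label = str(pipeline.predict([req.text])[0])
    PREDICTIONS.labels(label=label).inc()
    return {"label": label}

@app.get("/metrics")
def metrics():
    # Prometheus scrapes this endpoint
    return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)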

🐳 Running the Full Stack: API + Prometheus + Grafana

We don’t run services manually one by one.
We run them like a real platform would:

docker compose up --build

This spins up:

  • FastAPI (monitored ML microservice)

  • Prometheus (metrics database + query engine)

  • Grafana (dashboards)

One command → complete MLOps monitoring environment.
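
A docker-compose.yml for this stack looks roughly like the following. The service names and the API port 8000 are assumptions; adjust them to match your repo:

services:
  api:
    build: .
    ports:
      - "8000:8000"
  prometheus:
    image: prom/prometheus
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
  grafana:
    image: grafana/grafana
    volumes:
      - ./grafana/provisioning:/etc/grafana/provisioning
    ports:
      - "3000:3000"

And prometheus/prometheus.yml only needs a scrape job pointing at the API's /metrics endpoint (target name and port taken from the compose sketch above):

scrape_configs:
  - job_name: "ml-api"
    scrape_interval: 5s
    metrics_path: /metrics
    static_configs:
      - targets: ["api:8000"]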

📡 Prometheus: Watching Metrics in Real Time

Open:

http://localhost:9090

Query metrics like:

  • ml_requests_total — overall traffic

  • ml_pred_network_total, ml_pred_billing_total, etc.

  • ml_input_in_distribution_total

  • ml_input_out_of_distribution_total

  • ml_text_length_bucket

Then:

  1. Send normal inputs to the API

  2. Send slang / weird inputs

  3. Refresh your queries

You’ll see the OOD counters jump.
That’s live drift detection.
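
For example, assuming the API listens on port 8000 and accepts a {"text": ...} payload (as in the sketch earlier), the whole drift experiment fits in a few commands:

# 1. a "normal" request
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "my internet bill looks wrong this month"}'

# 2. a slang / out-of-distribution request
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "wifi kaput since morning, rip net"}'

# 3. then in Prometheus, watch the counter climb:
#    rate(ml_input_out_of_distribution_total[5m])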

📈 Grafana: Visualizing Drift & Behavior

Next, open Grafana:

http://localhost:3000
login: admin

We auto-provision Prometheus as a datasource, so you can start creating dashboards immediately.

Typical panels we build:

  1. Out-of-Distribution Rate

  2. Prediction Distribution

  3. Request Traffic

  4. Input Length Behavior

  5. Model Version
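
For reference, those panels map to PromQL expressions along these lines (metric names come from the Prometheus section above; the exact queries in the episode may differ):

# Out-of-Distribution Rate (share of OOD traffic)
rate(ml_input_out_of_distribution_total[5m]) / rate(ml_requests_total[5m])

# Prediction Distribution (one series per class)
rate(ml_pred_network_total[5m])

# Request Traffic
rate(ml_requests_total[5m])

# Input Length Behavior (95th percentile of text length)
histogram_quantile(0.95, rate(ml_text_length_bucket[5m]))

# Model Version
ml_model_version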

With just a few panels, you can answer:

  • “Are users behaving differently than last week?”

  • “Did predictions shift heavily toward one class?”

  • “Are we getting more OOD traffic?”

  • “Which model version is currently live?”

This is real observability for ML, not just logging.

🧠 What EP6 Teaches You

Key idea:

CI/CD protects deployments.
Monitoring protects everything after deployment.

Episode 6 gives you:

  • the difference between “serving a model” and monitoring a model

  • how to expose ML metrics from FastAPI

  • how to design OOD / drift signals

  • how to connect FastAPI → Prometheus → Grafana

  • how to build a drift dashboard in under 20 minutes

  • how real teams notice model failures before customers do

If you want to call yourself an MLOps Engineer, this is a core skill set.

🚀 Coming Up in Episode 7

Episode 7 connects all the pieces:

  • monitoring detects drift

  • drift triggers retraining

  • CI/CD evaluates & auto-rejects bad models

  • only better models get promoted

End goal:

👉 A self-updating ML system that:

  • watches itself

  • retrains when needed

  • tests new models

  • auto-promotes only when safe

This is what “real-world MLOps” looks like.

🔗 Full Video + Code Access

🎥 Watch Episode 6:
https://youtu.be/GQj0S2bHc68

Subscribers get:

  • full FastAPI + Prometheus + Grafana code

  • monitoring & drift detection labs

  • CI/CD + governance examples from EP5

  • “real incident” simulation scripts

  • interview questions for MLOps & DevOps roles

  • all episode bundles in one place

💼 Need DevOps or MLOps Help?

If you’re building:

  • CI/CD pipelines for ML or microservices

  • Docker + Jenkins / GitHub Actions setups

  • MLflow / experiment tracking

  • FastAPI model deployments

  • monitoring + alerting (Prometheus / Grafana)

  • Kubernetes or scalable infra for ML

  • cost-optimized cloud environments

You can reach out and work with me directly.

Reply to this email or message me on YouTube / Instagram.

Arbaz
📺 YouTube: Learn with DevOps Engineer
📬 Newsletter: learnwithdevopsengineer.beehiiv.com/subscribe
📸 Instagram: instagram.com/learnwithdevopsengineer