⚡Model Drift & Monitoring — Catching AI Failures Before Users Do (EP6)

MLOps Series — How Real Companies Watch Their Models in Production

🎯 Why This Episode Matters
In software, when something breaks… it usually breaks loudly.

In machine learning, models often fail quietly:

  • No errors

  • No exceptions

  • No crashes

Just silently wrong predictions.

The model you deployed last month was great.
Today, users are typing new slang, new patterns, new behaviors…
and your model has no idea what they’re talking about.

That’s data drift and concept drift.
If you’re not monitoring it, your “AI system” slowly becomes useless while dashboards stay green.

Episode 6 is all about the missing layer:

👉 Monitoring & Drift Detection for ML models in production.

We’ll build a real setup that:

  • tracks requests and predictions

  • detects out-of-distribution inputs (slang / new patterns)

  • shows live metrics in Prometheus

  • visualizes drift in Grafana dashboards

This is how real companies keep ML systems trustworthy after deployment.

📌 What We Build in Episode 6

Our repo now has a proper monitoring stack:

mlops_ep6_monitoring/

artifacts_prod/              # Production model
    pipeline.pkl

model/
    train_good_model.py      # Train stable production model
    api_monitoring.py        # FastAPI with metrics + drift detection

data/
    data.csv                 # Sample training/serving data

prometheus/
    prometheus.yml           # Scrape config for FastAPI metrics

grafana/
    provisioning/            # Auto-configure Prometheus datasource

Dockerfile                   # Build monitored API image
docker-compose.yml           # API + Prometheus + Grafana stack
requirements.txt
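
That requirements.txt stays small; roughly something like this, with the exact packages and pins depending on your setup (this list is an assumption, not copied from the repo):

fastapi
uvicorn
scikit-learn
pandas
joblib
prometheus-client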

In this episode, we:

  • train a GOOD production model

  • wrap it in a FastAPI microservice

  • instrument it with Prometheus metrics

  • visualize everything in Grafana

  • simulate drift using unseen slang and new patterns

By the end, you’ll have a production-style ML monitoring setup running on your machine.

🟢 Training the Stable Production Model

We start with a clean, reliable text-classification pipeline.

The script:

python model/train_good_model.py

Trains a model and saves it to:

artifacts_prod/pipeline.pkl

This is our trusted production model.
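
For context, here is a minimal sketch of what a script like train_good_model.py could look like, assuming a scikit-learn TF-IDF + LogisticRegression pipeline and a data.csv with text and label columns (those details are assumptions, not confirmed by the repo listing):

# train_good_model.py (minimal sketch, assumed structure)
import joblib
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

df = pd.read_csv("data/data.csv")            # assumed columns: text, label
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(df["text"], df["label"])

# save the trusted production model where the API expects it
joblib.dump(pipeline, "artifacts_prod/pipeline.pkl")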

  • The FastAPI service always loads this file

  • Prometheus + Grafana observe everything it does

  • Any future model must beat or at least match this one

Think of it as:
🧠 “the brain currently running in production.”

📊 Turning FastAPI into a Monitored ML Microservice

Next, we upgrade our API into a fully observable ML service.

api_monitoring.py exposes:

  • /predict — for real predictions

  • /metrics — for Prometheus scraping

Inside, we track:

  • Total requests (how much traffic your model receives)

  • Predictions per class (are we suddenly predicting one class 90% of the time?)

  • Input text length histograms (user behavior changing?)

  • Model version as a metric (which model is live)

  • Out-of-Distribution (OOD) inputs based on slang / unseen patterns

Example OOD idea:

def is_out_of_distribution(text: str) -> bool:
    # flag anything containing slang the model never saw in training
    slang = ["scene out", "wifi kaput", "5g gone", "rip net"]
    return any(phrase in text.lower() for phrase in slang)

It’s intentionally simple — but it mimics how real teams add signals for new behavior.

This is not just “serving a model”.
This is instrumenting a model.
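
To make this concrete, here is a minimal sketch of an instrumented api_monitoring.py. It uses prometheus_client and a single labeled counter for per-class predictions (the episode exposes separate counters such as ml_pred_network_total), and it assumes a {"text": ...} request body; treat the exact names and schema as assumptions:

# api_monitoring.py (minimal sketch, assumed structure)
import joblib
from fastapi import FastAPI, Response
from pydantic import BaseModel
from prometheus_client import (
    Counter, Gauge, Histogram, generate_latest, CONTENT_TYPE_LATEST,
)

app = FastAPI()
pipeline = joblib.load("artifacts_prod/pipeline.pkl")

REQUESTS = Counter("ml_requests", "Total prediction requests")   # exposed as ml_requests_total
PREDICTIONS = Counter("ml_pred", "Predictions per class", ["label"])
IN_DIST = Counter("ml_input_in_distribution", "Inputs that look like training data")
OOD = Counter("ml_input_out_of_distribution", "Inputs flagged as out-of-distribution")
TEXT_LENGTH = Histogram("ml_text_length", "Characters per request",
                        buckets=(10, 20, 50, 100, 200, 500))
MODEL_VERSION = Gauge("ml_model_version", "Version of the model currently serving")
MODEL_VERSION.set(1)

class PredictRequest(BaseModel):
    text: str

def is_out_of_distribution(text: str) -> bool:
    # same slang-based heuristic shown above
    slang = ["scene out", "wifi kaput", "5g gone", "rip net"]
    return any(phrase in text.lower() for phrase in slang)

@app.post("/predict")
def predict(req: PredictRequest):
    REQUESTS.inc()
    TEXT_LENGTH.observe(len(req.text))
    (OOD if is_out_of_distribution(req.text) else IN_DIST).inc()
    label = str(pipeline.predict([req.text])[0])
    PREDICTIONS.labels(label=label).inc()
    return {"label": label}

@app.get("/metrics")
def metrics():
    # Prometheus scrapes this endpoint
    return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)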

🐳 Running the Full Stack: API + Prometheus + Grafana

We don’t run services manually one by one.
We run them like a real platform would:

docker compose up --build

This spins up:

  • FastAPI (monitored ML microservice)

  • Prometheus (metrics database + query engine)

  • Grafana (dashboards)

One command → complete MLOps monitoring environment.
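
A docker-compose.yml for this stack looks roughly like the following. The service names and the API port 8000 are assumptions; adjust them to match your repo:

services:
  api:
    build: .
    ports:
      - "8000:8000"
  prometheus:
    image: prom/prometheus
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
  grafana:
    image: grafana/grafana
    volumes:
      - ./grafana/provisioning:/etc/grafana/provisioning
    ports:
      - "3000:3000"

And prometheus/prometheus.yml only needs a scrape job pointing at the API's /metrics endpoint (target name and port taken from the compose sketch above):

scrape_configs:
  - job_name: "ml-api"
    scrape_interval: 5s
    metrics_path: /metrics
    static_configs:
      - targets: ["api:8000"]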

📡 Prometheus: Watching Metrics in Real Time

Open:

http://localhost:9090

Query metrics like:

  • ml_requests_total — overall traffic

  • ml_pred_network_total, ml_pred_billing_total, etc.

  • ml_input_in_distribution_total

  • ml_input_out_of_distribution_total

  • ml_text_length_bucket

Then:

  1. Send normal inputs to the API

  2. Send slang / weird inputs

  3. Refresh your queries

You’ll see the OOD counters jump.
That’s live drift detection.
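
For example, assuming the API listens on port 8000 and accepts a {"text": ...} payload (as in the sketch earlier), the whole drift experiment fits in a few commands:

# 1. a "normal" request
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "my internet bill looks wrong this month"}'

# 2. a slang / out-of-distribution request
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "wifi kaput since morning, rip net"}'

# 3. then in Prometheus, watch the counter climb:
#    rate(ml_input_out_of_distribution_total[5m])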

📈 Grafana: Visualizing Drift & Behavior

Next, open Grafana:

http://localhost:3000
login: admin

We auto-provision Prometheus as a datasource, so you can start creating dashboards immediately.

Typical panels we build:

  1. Out-of-Distribution Rate

  2. Prediction Distribution

  3. Request Traffic

  4. Input Length Behavior

  5. Model Version
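
For reference, those panels map to PromQL expressions along these lines (metric names come from the Prometheus section above; the exact queries in the episode may differ):

# Out-of-Distribution Rate (share of OOD traffic)
rate(ml_input_out_of_distribution_total[5m]) / rate(ml_requests_total[5m])

# Prediction Distribution (one series per class)
rate(ml_pred_network_total[5m])

# Request Traffic
rate(ml_requests_total[5m])

# Input Length Behavior (95th percentile of text length)
histogram_quantile(0.95, rate(ml_text_length_bucket[5m]))

# Model Version
ml_model_version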

With just a few panels, you can answer:

  • “Are users behaving differently than last week?”

  • “Did predictions shift heavily toward one class?”

  • “Are we getting more OOD traffic?”

  • “Which model version is currently live?”

This is real observability for ML, not just logging.

🧠 What EP6 Teaches You

Key idea:

CI/CD protects deployments.
Monitoring protects everything after deployment.

Episode 6 gives you:

  • the difference between “serving a model” and monitoring a model

  • how to expose ML metrics from FastAPI

  • how to design OOD / drift signals

  • how to connect FastAPI → Prometheus → Grafana

  • how to build a drift dashboard in under 20 minutes

  • how real teams notice model failures before customers do

If you want to call yourself an MLOps Engineer, this is a core skill set.

🚀 Coming Up in Episode 7

Episode 7 connects all the pieces:

  • monitoring detects drift

  • drift triggers retraining

  • CI/CD evaluates & auto-rejects bad models

  • only better models get promoted

End goal:

👉 A self-updating ML system that:

  • watches itself

  • retrains when needed

  • tests new models

  • auto-promotes only when safe

This is what “real-world MLOps” looks like.

🔗 Full Video + Code Access

🎥 Watch Episode 6:
https://youtu.be/GQj0S2bHc68

Subscribers get:

  • full FastAPI + Prometheus + Grafana code

  • monitoring & drift detection labs

  • CI/CD + governance examples from EP5

  • “real incident” simulation scripts

  • interview questions for MLOps & DevOps roles

  • all episode bundles in one place

💼 Need DevOps or MLOps Help?

If you’re building:

  • CI/CD pipelines for ML or microservices

  • Docker + Jenkins / GitHub Actions setups

  • MLflow / experiment tracking

  • FastAPI model deployments

  • monitoring + alerting (Prometheus / Grafana)

  • Kubernetes or scalable infra for ML

  • cost-optimized cloud environments

You can reach out and work with me directly.

Reply to this email or message me on YouTube / Instagram.

Arbaz
📺 YouTube: Learn with DevOps Engineer
📬 Newsletter: learnwithdevopsengineer.beehiiv.com/subscribe
📸 Instagram: instagram.com/learnwithdevopsengineer