⚡ Fixing the Model — Building the FIRST Real MLOps Pipeline (EP4)

MLOps Series — Preprocessing + Packaging Done the Right Way

🎯 Why This Episode Matters

In Episode 3, we deployed our model behind FastAPI + Docker and watched it fail instantly on real-world inputs — even though the API, container, and server were all healthy.

Episode 4 is where we FIX the root cause.

This is the first episode where we move from “quick ML experiments” to actual MLOps engineering:

  • proper preprocessing

  • proper packaging

  • consistent training + inference

  • versioning

  • reproducible behavior

This episode marks the moment the model becomes production-ready.

📌 Why Episode 3 Failed

This is the most common real-world ML failure.

Our model wasn’t wrong.
Our preprocessing was wrong.

Here’s what caused the failure:

  • No preprocessing pipeline

  • Vectorizer not saved or versioned correctly

  • Input cleaning mismatch

  • Tokenization mismatch

  • Model and vectorizer stored separately

  • Training logic ≠ inference logic

This is EXACTLY what happens in companies:

ML engineer saves model.pkl
Forgets the vectorizer
DevOps deploys it
Vectorizer mismatch
Model fails silently
End users get nonsense predictions

Episode 4 is the fix for this entire class of failures.

🧱 The Fix — Build a REAL Machine Learning Pipeline

We replace the broken approach with a proper scikit-learn Pipeline:

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

pipeline = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("model", LogisticRegression()),
])

One object.
One file.
One save.
One load.

No mismatches.
No separate artifacts.
No confusion between training and serving.

This is how ML models SHOULD be packaged.

🧪 Training With the New Pipeline

We rewrite the training script (train_v2.py) to:

  • use the pipeline

  • train end-to-end preprocessing + model

  • evaluate correctly

  • save a single pipeline.pkl

  • log everything in MLflow
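Putting those steps together, train_v2.py can be sketched roughly like this. The dataset below is a tiny illustrative stand-in, and the MLflow calls are shown as comments so the sketch runs without a tracking server — treat the file names and data as assumptions, not the exact episode code:

```python
# train_v2.py -- a minimal sketch; the dataset is a tiny illustrative stand-in
import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# toy training data (replace with the real labeled dataset)
texts = [
    "WINNER!! Claim your FREE prize now",
    "URGENT: verify your account immediately",
    "are we still on for lunch tomorrow?",
    "can you send me the meeting notes?",
]
labels = ["spam", "spam", "ham", "ham"]

# preprocessing + model live in ONE object
pipeline = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("model", LogisticRegression()),
])
pipeline.fit(texts, labels)

# evaluate on a held-out split in the real script; here we just sanity-check
train_acc = pipeline.score(texts, labels)
print(f"train accuracy: {train_acc:.2f}")

# ONE artifact: vectorizer + model saved together
joblib.dump(pipeline, "pipeline.pkl")

# with MLflow (exact calls depend on your tracking setup):
#   mlflow.log_metric("train_acc", train_acc)
#   mlflow.sklearn.log_model(pipeline, "model")
```

Because preprocessing is fitted inside the pipeline, the vectorizer vocabulary can never drift apart from the model it was trained with.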

This version is:

  • cleaner

  • reproducible

  • usable in any environment

  • ready for real-world deployment

It is already far more production-ready than our Episode 2 model.

📁 Inference With the Correct Pipeline

We also build a new inference script (infer_v2.py) that loads the full pipeline and tests it on real-world messages.
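A minimal sketch of what infer_v2.py might look like, assuming train_v2.py saved the artifact as pipeline.pkl (the sample messages are illustrative):

```python
# infer_v2.py -- a minimal sketch; file name and messages are illustrative
import joblib

MESSAGES = [
    "WINNER!! Claim ur FREE prize now!!!",
    "hey, are we still on for lunch tomorrow?",
]

def run_inference(path="pipeline.pkl", messages=MESSAGES):
    # one load brings back preprocessing AND the model together
    pipeline = joblib.load(path)
    preds = pipeline.predict(messages)
    for text, label in zip(messages, preds):
        print(f"{label}  <-  {text!r}")
    return preds

if __name__ == "__main__":
    run_inference()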

You’ll immediately notice:

  • noisy messages are classified more reliably

  • predictions stabilize

  • behavior is more consistent

  • accuracy is more realistic

This is the power of combining preprocessing + model into one pipeline.

🏃 Updating the FastAPI Service

We update the API:

  • load only pipeline.pkl

  • remove manual vectorizer logic

  • remove mismatched transformations

  • simplify inference

  • reduce risk

The new API is cleaner, safer, and much closer to how real companies deploy ML today.

🐳 Dockerfile v2 — Production-Friendly

The updated Dockerfile includes the new pipeline and the v2 API.
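A Dockerfile along these lines would do it — the base image and file names are assumptions, not the exact v2 file:

```dockerfile
FROM python:3.11-slim
WORKDIR /app

# install pinned dependencies first to leverage layer caching
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# ship the single pipeline artifact and the v2 API together
COPY pipeline.pkl api_v2.py ./

EXPOSE 8000
CMD ["uvicorn", "api_v2:app", "--host", "0.0.0.0", "--port", "8000"]
```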

We build and run:

docker build -t ml-api-v2 .
docker run -p 8000:8000 ml-api-v2

Same deployment process — but with a far more stable model inside.

🧠 What EP4 Teaches You

Episode 4 shows the core idea of MLOps:

ML models do not fail because of accuracy.
They fail because training ≠ serving.

This episode fixes:

  • vectorizer mismatch

  • preprocessing mismatch

  • dependency mismatch

  • versioning issues

  • inconsistent logic

This is the first time in the series where our model becomes truly production-ready.

🚀 Coming Up in Episode 5

Next episode takes the next big step:

  • automated retraining

  • validation checks

  • CI/CD for model updates

  • batch evaluation

  • data drift strategies

  • MLflow Model Registry

  • proper versioning

  • improved deployment

This is where the pipeline becomes a real system.

🔗 Full Video + Code Access

Subscribers get:

  • full project code

  • all pipelines

  • Dockerfiles

  • model registry examples

  • drift testing scripts

  • real incident simulations

  • interview prep

This entire series will accelerate your MLOps career.

💼 Need DevOps/MLOps Help?

If you’re building:

  • CI/CD pipelines

  • Docker or Jenkins setups

  • MLflow tracking

  • FastAPI deployments

  • production ML pipelines

  • monitoring and alerting

  • Kubernetes workloads

  • cloud cost optimization

You can consult me directly.

Reply to this email or message me on YouTube/Instagram.