⚡Fixing the Model — Building the FIRST Real MLOps Pipeline (EP4)
MLOps Series — Preprocessing + Packaging Done the Right Way
🎯 Why This Episode Matters
In Episode 3, we deployed our model behind FastAPI + Docker and watched it fail instantly on real-world inputs — even though the API, container, and server were all healthy.
Episode 4 is where we FIX the root cause.
This is the first episode where we move from “quick ML experiments” to actual MLOps engineering:
proper preprocessing
proper packaging
consistent training + inference
versioning
reproducible behavior
This episode marks the moment the model becomes production-ready.
📌 Why Episode 3 Failed
This is one of the most common real-world ML failures.
Our model wasn’t wrong.
Our preprocessing was wrong.
Here’s what caused the failure:
No preprocessing pipeline
Vectorizer not saved or versioned correctly
Input cleaning mismatch
Tokenization mismatch
Model and vectorizer stored separately
Training logic ≠ inference logic
This is EXACTLY what happens in companies:
ML engineer saves model.pkl
Forgets the vectorizer
DevOps deploys it
Vectorizer mismatch
Model fails silently
End users get nonsense predictions
Episode 4 is the fix for this entire class of failures.
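To make the mismatch concrete, here is a minimal sketch. The toy texts and variable names are illustrative assumptions, not the series' dataset or code. It fits one CountVectorizer at training time and a second one at serving time, then checks where the same word lands in each.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy training corpus (illustrative only, not the series' dataset)
train_texts = [
    "win a free prize now",
    "free cash offer",
    "meeting at noon",
    "see you at lunch",
]

# Training side: this vectorizer defines the columns the model learns on
train_vec = CountVectorizer().fit(train_texts)

# Broken serving side: the training vectorizer was never shipped,
# so a fresh one gets fitted on whatever text happens to be available
serve_vec = CountVectorizer().fit(["free lunch at noon", "cash prize meeting"])

message = ["free cash prize"]
train_features = train_vec.transform(message)
serve_features = serve_vec.transform(message)

# Same word, but each vectorizer assigns it its own column index
prize_col_train = train_vec.vocabulary_["prize"]
prize_col_serve = serve_vec.vocabulary_["prize"]

print("training columns:", train_features.shape[1])
print("serving columns: ", serve_features.shape[1])
print("'prize' column:", prize_col_train, "vs", prize_col_serve)
```

When the column counts differ, scikit-learn will normally raise an error at predict time. The dangerous case is when the counts happen to match but the columns stand for different words: nothing crashes, and the predictions are quietly wrong.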
🧱 The Fix — Build a REAL Machine Learning Pipeline
We replace the broken approach with a proper scikit-learn Pipeline:
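from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
# Preprocessing and model travel together as one object
pipeline = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("model", LogisticRegression()),
])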
One object.
One file.
One save.
One load.
No mismatches.
No separate artifacts.
No confusion between training and serving.
This is how ML models SHOULD be packaged.
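As a rough sketch of what "one save, one load" can look like: the snippet below fits the pipeline on a few toy messages, writes it to pipeline.pkl, and reloads it. Using joblib, the temp directory, and the sample data are assumptions for illustration; the episode's own script may do this differently.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy data (illustrative only)
texts = ["win a free prize now", "free cash offer", "meeting at noon", "see you at lunch"]
labels = ["spam", "spam", "ham", "ham"]

pipeline = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("model", LogisticRegression()),
]).fit(texts, labels)

# One save: the fitted vectorizer and the model go into the same file
path = os.path.join(tempfile.mkdtemp(), "pipeline.pkl")
joblib.dump(pipeline, path)

# One load: the restored object accepts raw text directly
restored = joblib.load(path)

raw_messages = ["FREE prize, claim now!!!", "lunch at noon?"]
original_preds = list(pipeline.predict(raw_messages))
restored_preds = list(restored.predict(raw_messages))

print(original_preds == restored_preds)
```

Because the fitted vocabulary is stored inside the same object as the model, there is no second artifact to forget or to get out of sync.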
🧪 Training With the New Pipeline
We rewrite the training script (train_v2.py) to:
use the pipeline
train end-to-end preprocessing + model
evaluate correctly
save a single pipeline.pkl
log everything in MLflow
This version is:
cleaner
reproducible
usable in any environment
ready for real-world deployment
It is already far closer to production-ready than our Episode 2 model.
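The steps above could be sketched roughly like this. It is not the actual train_v2.py: the function name train, the log_to_mlflow flag, the toy dataset, and the choice of what gets logged are all assumptions. MLflow logging is off by default so the sketch runs without an MLflow setup.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline


def train(texts, labels, out_path="pipeline.pkl", log_to_mlflow=False):
    """Fit preprocessing + model end to end, evaluate, and save one artifact."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.25, random_state=42, stratify=labels
    )

    pipeline = Pipeline([
        ("vectorizer", CountVectorizer()),
        ("model", LogisticRegression()),
    ])
    # The vectorizer is fitted on the training split only, inside the pipeline
    pipeline.fit(X_train, y_train)

    # Evaluation goes through the same pipeline, so raw text is all it needs
    accuracy = accuracy_score(y_test, pipeline.predict(X_test))
    joblib.dump(pipeline, out_path)

    if log_to_mlflow:
        import mlflow  # optional dependency, only needed when logging is on

        with mlflow.start_run():
            mlflow.log_param("vectorizer", "CountVectorizer")
            mlflow.log_metric("accuracy", accuracy)
            mlflow.log_artifact(out_path)

    return pipeline, accuracy


# Toy dataset (illustrative only)
texts = [
    "win a free prize now", "free cash offer",
    "claim your free prize", "cheap loans approved fast",
    "meeting at noon", "see you at lunch",
    "can we talk tomorrow", "notes from the call attached",
]
labels = ["spam"] * 4 + ["ham"] * 4

out_path = os.path.join(tempfile.mkdtemp(), "pipeline.pkl")
pipeline, accuracy = train(texts, labels, out_path)
print("saved:", out_path, "accuracy:", accuracy)
```

With a dataset this small the accuracy number means nothing; the point is that training, evaluation, and saving all go through the same single object.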
📁 Inference With the Correct Pipeline
We also build a new inference script (infer_v2.py) that loads the full pipeline and tests it on real-world messages.
You’ll immediately notice:
noisy messages are handled better
predictions stabilize
behavior is more consistent
accuracy is more realistic
This is the power of combining preprocessing + model into one pipeline.
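Here is a minimal sketch of that inference step, under a few assumptions: the helper name classify, the toy training data, and the sample messages are made up for illustration and are not taken from infer_v2.py. The messages go in raw, with no manual cleaning or vectorizing.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def classify(pipeline, messages):
    """Return (message, label, confidence) for raw, uncleaned messages."""
    probabilities = pipeline.predict_proba(messages)
    classes = pipeline.classes_
    results = []
    for message, row in zip(messages, probabilities):
        best = row.argmax()  # index of the most likely class
        results.append((message, str(classes[best]), float(row[best])))
    return results


# Stand-in for loading pipeline.pkl: a tiny pipeline fitted on toy data
pipeline = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("model", LogisticRegression()),
]).fit(
    ["win a free prize now", "free cash offer", "claim your free prize",
     "meeting at noon", "see you at lunch", "can we talk tomorrow"],
    ["spam", "spam", "spam", "ham", "ham", "ham"],
)

# Noisy, real-world style inputs passed straight through
noisy_messages = ["FREE PRIZE!!! claim NOW", "Can we talk at lunch tomorrow??"]
results = classify(pipeline, noisy_messages)

for message, label, confidence in results:
    print(f"{label:>4}  {confidence:.2f}  {message}")
```

The lowercasing and tokenization applied here are exactly the ones used during training, because they live inside the pipeline rather than in the inference script.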
🏃 Updating the FastAPI Service
We update the API:
load only pipeline.pkl
remove manual vectorizer logic
remove mismatched transformations
simplify inference
reduce risk
The new API is cleaner, safer, and much closer to how production teams deploy ML today.
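A simplified service along those lines might look like the sketch below. The /predict route, the Message schema with a text field, and the create_app factory are assumptions for illustration, not the episode's actual API code. The FastAPI imports sit inside the factory so the prediction helper can be tried without FastAPI installed.

```python
import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def predict_label(pipeline, text: str) -> str:
    """Send raw text through the full pipeline; no manual vectorizer step."""
    return str(pipeline.predict([text])[0])


def create_app(pipeline_path: str = "pipeline.pkl"):
    """Build the FastAPI app around a single loaded artifact."""
    from fastapi import FastAPI
    from pydantic import BaseModel

    class Message(BaseModel):
        text: str  # hypothetical request schema

    app = FastAPI()
    pipeline = joblib.load(pipeline_path)  # the only file the service needs

    @app.post("/predict")  # hypothetical route name
    def predict(message: Message):
        return {"label": predict_label(pipeline, message.text)}

    return app


# Quick local check of the helper with a toy pipeline (illustrative data)
demo_pipeline = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("model", LogisticRegression()),
]).fit(
    ["win a free prize now", "free cash offer", "meeting at noon", "see you at lunch"],
    ["spam", "spam", "ham", "ham"],
)
demo_label = predict_label(demo_pipeline, "Claim your FREE prize")
print(demo_label)
```

If this were saved as api_v2.py (a hypothetical filename), it could be served with uvicorn api_v2:create_app --factory. The handler never touches a vectorizer, which removes the spot where the Episode 3 mismatch crept in.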
🐳 Dockerfile v2 — Production-Friendly
The updated Dockerfile includes the new pipeline and the v2 API.
We build and run:
docker build -t ml-api-v2 .
docker run -p 8000:8000 ml-api-v2
Same deployment process — but with a far more stable model inside.
🧠 What EP4 Teaches You
Episode 4 shows the core idea of MLOps:
ML models do not fail because of accuracy.
They fail because training ≠ serving.
This episode fixes:
vectorizer mismatch
preprocessing mismatch
dependency mismatch
versioning issues
inconsistent logic
This is the first time in the series where our model becomes truly production-ready.
🚀 Coming Up in Episode 5
Next episode takes the next big step:
automated retraining
validation checks
CI/CD for model updates
batch evaluation
data drift strategies
MLflow Model Registry
proper versioning
improved deployment
This is where the pipeline becomes a real system.
🔗 Full Video + Code Access
🎥 Watch Episode 4: https://youtube.com/@learnwithdevopsengineer
📬 Get code, labs, and exercises:
https://learnwithdevopsengineer.beehiiv.com/subscribe
Subscribers get:
full project code
all pipelines
Dockerfiles
model registry examples
drift testing scripts
real incident simulations
interview prep
This entire series will accelerate your MLOps career.
💼 Need DevOps/MLOps Help?
If you’re building:
CI/CD pipelines
Docker or Jenkins setups
MLflow tracking
FastAPI deployments
production ML pipelines
monitoring and alerting
Kubernetes workloads
cloud cost optimization
You can consult me directly.
Reply to this email or message me on YouTube/Instagram.
— Arbaz
📺 YouTube: Learn with DevOps Engineer
📬 Newsletter: learnwithdevopsengineer.beehiiv.com/subscribe
📸 Instagram: instagram.com/learnwithdevopsengineer