How to implement CI/CD/CT for Machine Learning?
If you work on Machine Learning projects, chances are you already rely on some form
of Continuous Integration / Continuous Deployment (CI/CD). Adding Continuous Training
(CT) on top represents a high level of MLOps maturity. This degree of automation lets
ML engineers focus on experimenting with new ideas while repetitive tasks are
delegated to engineering pipelines, which also reduces human error.
There are many ways to implement CI/CD/CT for Machine Learning, but here is a typical
process:
The experimental phase: The ML engineer wants to test a new idea (let's say a new
feature transformation). They modify the code base to implement the new
transformation, train a model, and validate that the new transformation indeed
yields better performance. The outcome at this point is just a piece of code that
needs to be merged into the master repo.
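As a rough illustration of the validation step above, here is a minimal sketch that compares a baseline feature against a new transformation on held-out data. Everything in it is an assumption made for the example: the synthetic data, the single-feature least-squares model, and the helper names `fit_line` and `validation_mse`. A real experiment would use the project's own model and metric.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    return a, mean_y - a * mean_x

def validation_mse(transform, train, valid):
    """Fit on the transformed training split, score on the validation split."""
    a, b = fit_line([transform(x) for x, _ in train], [y for _, y in train])
    errors = [(a * transform(x) + b - y) ** 2 for x, y in valid]
    return sum(errors) / len(errors)

# Synthetic data (assumption): the target happens to be logarithmic in x.
data = [(float(x), 2.0 * math.log(x) + 1.0) for x in range(1, 41)]
train, valid = data[0::2], data[1::2]

baseline_mse = validation_mse(lambda x: x, train, valid)  # current feature
candidate_mse = validation_mse(math.log, train, valid)    # new transformation

# Only open a PR if the new transformation improves held-out error.
keep_transformation = candidate_mse < baseline_mse
```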
Continuous integration: The engineer then creates a Pull Request (PR) that
automatically triggers unit testing (like a typical CI process) but also triggers
the automated training pipeline, which retrains the model, potentially tests it
through integration tests or test cases, and pushes it to a model registry. A second
engineer then manually reviews the PR and the performance metrics of the new model.
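The PR-triggered steps above could be chained roughly as follows. This is a sketch under stated assumptions, not a reference implementation: `ModelRegistry` is an in-memory stand-in for a real registry, `run_ci` and the `min_accuracy` threshold are hypothetical, and the `approved` flag stands for the manual review that a second engineer performs afterwards.

```python
from dataclasses import dataclass, field

@dataclass
class ModelRegistry:
    """In-memory stand-in for a real model registry (hypothetical)."""
    entries: list = field(default_factory=list)

    def register(self, name, metrics):
        # New entries start unapproved: a human still has to review them.
        entry = {"name": name, "metrics": metrics, "approved": False}
        self.entries.append(entry)
        return entry

def run_ci(unit_tests, train, evaluate, registry, name, min_accuracy):
    """Run the PR-triggered steps in order and stop at the first failure."""
    if not all(test() for test in unit_tests):
        return None  # unit tests failed, nothing is trained
    model = train()
    metrics = evaluate(model)
    if metrics["accuracy"] < min_accuracy:
        return None  # the retrained model is not good enough to register
    return registry.register(name, metrics)

# Stubbed usage: real pipelines would pass actual test, train, and eval steps.
registry = ModelRegistry()
entry = run_ci(
    unit_tests=[lambda: True],
    train=lambda: "trained-model",
    evaluate=lambda model: {"accuracy": 0.93},
    registry=registry,
    name="pr-candidate",
    min_accuracy=0.90,
)
```

Keeping `approved` false on registration mirrors the manual review step: automation produces the candidate and its metrics, and a person decides whether it moves forward.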
Continuous deployment: Activating a deployment triggers a canary deployment to make
sure the model fits in the serving pipeline, followed by an A/B test experiment
against the production model. If the results are satisfactory, the new model can be
proposed as a replacement for the production one.
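One way to picture the canary and A/B steps is a deterministic traffic split plus a simple statistical check. The sketch below assumes a binary success metric (for example, a click) and uses a one-sided two-proportion z-test; the function names, the hash-based bucketing, and the 1.645 threshold (roughly a 5% one-sided significance level) are illustrative choices, not something the process above prescribes.

```python
import hashlib
import math

def assign_variant(user_id: str, candidate_share: float) -> str:
    """Deterministically route a fixed share of users to the candidate model."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "candidate" if bucket < candidate_share else "production"

def z_score(success_a, total_a, success_b, total_b):
    """Two-proportion z statistic for B (candidate) versus A (production)."""
    p_a, p_b = success_a / total_a, success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        return 0.0  # no variation in outcomes, so no evidence either way
    return (p_b - p_a) / std_err

def should_promote(success_a, total_a, success_b, total_b, threshold=1.645):
    """Propose the candidate only if it beats production beyond the threshold."""
    return z_score(success_a, total_a, success_b, total_b) > threshold

# Canary: start with a small share, e.g. 5% of users see the candidate.
variant = assign_variant("user-123", candidate_share=0.05)

# A/B readout with made-up counts: 10.0% vs 11.0% success on 10,000 users each.
promote = should_promote(1000, 10000, 1100, 10000)
```

Hashing the user ID keeps each user on the same variant across requests, which avoids mixing the two models' effects for a single user.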
Continuous training: As soon as the model enters the model registry, it starts to
deteriorate, because the data it was trained on grows stale, so you might want to
activate recurring training right away. For example, each day the model can be
fine-tuned further on that day's new training data and deployed, with the serving
pipeline rerouted to the updated model.
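The daily loop described above might look like the following toy sketch. The "model" here is just a dictionary with a single weight, `fine_tune` nudges that weight toward the mean of the day's batch, and the `is_acceptable` check before rerouting is an added assumption (the process above does not specify one). In practice a scheduler or orchestrator would trigger each iteration rather than a `for` loop.

```python
def fine_tune(model, batch, learning_rate=0.5):
    """Toy update: move the weight toward the mean of the day's data."""
    target = sum(batch) / len(batch)
    return {
        "version": model["version"] + 1,
        "weight": model["weight"] + learning_rate * (target - model["weight"]),
    }

def run_daily(model, daily_batches, is_acceptable):
    """Fine-tune once per day and reroute serving only to accepted models."""
    for batch in daily_batches:
        candidate = fine_tune(model, batch)
        if is_acceptable(candidate):
            model = candidate  # serving now points at the updated model
        # otherwise the previous model keeps serving traffic
    return model

# Three days of made-up training data.
batches = [[1.0, 1.0], [1.0], [1.0]]
start = {"version": 0, "weight": 0.0}

live = run_daily(start, batches, is_acceptable=lambda m: True)
```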