Machine learning model deployment types
What are the four types of machine learning model deployments?
Even if you won't work with all of them daily, here are four ways to deploy ML models that you should know and understand as an MLOps/ML engineer.
Batch:
1. You apply your trained models as part of an ETL/ELT process on a given schedule.
2. You load the required features from batch storage, run inference, and save the results back to batch storage.
3. It is a common misconception that this method can't be used for real-time predictions.
4. In fact, inference results can be loaded into real-time storage (e.g., a key-value store) and served to real-time applications, as sketched below.
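A minimal sketch of such a batch job, assuming a scikit-learn model serialized with joblib and Parquet files as batch storage; the file names, feature columns and entity_id key are hypothetical, and in production this would typically run as a scheduled task in an orchestrator such as Airflow:

```python
import joblib
import pandas as pd

def run_batch_inference() -> None:
    # Load the trained model artifact produced by the training pipeline.
    model = joblib.load("model.joblib")

    # Load the required features from batch storage (in practice often
    # Parquet files in object storage such as S3).
    features = pd.read_parquet("daily_features.parquet")

    # Run inference over the whole batch at once.
    features["prediction"] = model.predict(features.drop(columns=["entity_id"]))

    # Save results back to batch storage; from here they can also be
    # loaded into a real-time key-value store for low-latency serving.
    features[["entity_id", "prediction"]].to_parquet("daily_predictions.parquet")

if __name__ == "__main__":
    run_batch_inference()
```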
Embedded in a stream application:
1. You apply your trained models as part of a stream processing pipeline.
2. As data is continuously piped through your streaming data pipelines, an application with a loaded model continuously runs inference on it and returns the results to the system - most likely to another streaming storage.
3. This deployment type is likely to involve a real-time Feature Store Serving API to retrieve additional static features for inference.
4. Predictions can be consumed by multiple applications subscribing to the inference stream (see the sketch after this list).
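A minimal sketch of such an application, assuming Kafka as the streaming storage (via kafka-python) and a joblib-serialized model; the topic names, event schema and feature-store lookup are all hypothetical:

```python
import json

import joblib
from kafka import KafkaConsumer, KafkaProducer

# The trained model is baked into the streaming application.
model = joblib.load("model.joblib")

consumer = KafkaConsumer(
    "input-events",                      # hypothetical input topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def get_static_features(entity_id: str) -> dict:
    # Placeholder for a call to a real-time Feature Store Serving API
    # that enriches the event with additional static features.
    return {"account_age_days": 120}     # hypothetical feature

for message in consumer:
    event = message.value
    features = {**event["features"], **get_static_features(event["entity_id"])}
    prediction = model.predict([list(features.values())])[0]
    # Publish to the inference stream so any number of downstream
    # applications can subscribe to the predictions.
    producer.send(
        "inference-results",
        {"entity_id": event["entity_id"], "prediction": float(prediction)},
    )
```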
Real time:
1. You expose your model as a backend service (REST or gRPC).
2. This ML service retrieves the features needed for inference from a real-time Feature Store Serving API.
3. Inference can be requested by any application in real time, as long as it is able to form a correct request that conforms to the API contract (see the sketch after this list).
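A minimal sketch of such a service using FastAPI, assuming a joblib-serialized model; the request/response schemas define the API contract, and the feature-store lookup is a hypothetical placeholder:

```python
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.joblib")  # loaded once at startup

class PredictionRequest(BaseModel):
    entity_id: str

class PredictionResponse(BaseModel):
    entity_id: str
    prediction: float

def fetch_features(entity_id: str) -> list[float]:
    # Placeholder for a real-time Feature Store Serving API lookup.
    return [0.1, 0.2, 0.3]  # hypothetical feature vector

@app.post("/predict", response_model=PredictionResponse)
def predict(request: PredictionRequest) -> PredictionResponse:
    features = fetch_features(request.entity_id)
    prediction = model.predict([features])[0]
    return PredictionResponse(
        entity_id=request.entity_id, prediction=float(prediction)
    )
```

If this file is saved as service.py, you can run it with `uvicorn service:app` and request predictions by sending a POST to /predict with an entity_id in the JSON body.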
Edge:
1. You embed your trained model directly into the application that runs on a user's device.
2. This method provides the lowest latency and improves privacy.
3. In most cases the data is generated and stays on the device, significantly improving security (see the sketch below).
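A minimal sketch of on-device inference with ONNX Runtime, assuming the model ships as a model.onnx file inside the application bundle; on mobile the equivalent would use ONNX Runtime Mobile, Core ML or TensorFlow Lite:

```python
import numpy as np
import onnxruntime as ort

# The model artifact is bundled with the application itself.
session = ort.InferenceSession("model.onnx")
input_name = session.get_inputs()[0].name

def predict_on_device(features: np.ndarray) -> np.ndarray:
    # Data never leaves the device: the features are produced locally
    # and inference runs in-process, with no network round trip.
    return session.run(None, {input_name: features.astype(np.float32)})[0]

# Hypothetical locally generated feature vector.
print(predict_on_device(np.array([[0.1, 0.2, 0.3]])))
```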