Machine learning model deployment types

What are the four types of machine learning model deployments?

Even if you won't be working with them on a daily basis, here are four ways to deploy ML models that you should know and understand as an MLOps/ML engineer.


Batch:

1. You apply your trained models as part of an ETL/ELT process on a given schedule.

2. You load the required Features from batch storage, apply inference, and save the results back to batch storage.

3. Contrary to a common misconception, this method can also power Real Time Predictions.

4. Inference results can be loaded into a real time storage and served to real time applications.
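The batch flow above can be sketched in a few lines. This is a minimal, hypothetical example: the `score` function stands in for a trained model (in practice you would load a serialized model artifact), and CSV files stand in for batch storage.

```python
import csv

def score(features):
    # Hypothetical "model": mean of the feature values (assumption,
    # standing in for a real trained model loaded from an artifact store).
    return sum(features) / len(features)

def run_batch_job(in_path, out_path):
    """Load features from batch storage, run inference, save results back."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.writer(dst)
        writer.writerow(["entity_id", "prediction"])
        for row in reader:
            features = [float(v) for k, v in row.items() if k != "entity_id"]
            writer.writerow([row["entity_id"], score(features)])
```

A scheduler (cron, Airflow, etc.) would invoke `run_batch_job` on the configured cadence, and the output table could then be synced into a real time store for low-latency reads.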

Embedded in a stream application:

1. You apply your trained models as a part of Stream Processing Pipeline.

2. As Data is continuously piped through your Streaming Data Pipelines, an application with a loaded model applies inference on each record and returns the results to the system, most likely to another Streaming Storage.

3. This deployment type is likely to involve a real time Feature Store Serving API to retrieve additional Static Features for inference purposes.

4. Predictions can be consumed by multiple applications subscribing to the Inference Stream.
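A minimal sketch of this embedded-in-a-stream pattern, with the assumption that a `queue.Queue` stands in for streaming storage (e.g. a Kafka topic) and `score` stands in for the loaded model:

```python
import queue

def score(features):
    # Hypothetical trained model embedded in the stream application.
    return sum(features)

def stream_worker(input_stream, output_stream):
    """Continuously consume events, apply inference, publish predictions."""
    while True:
        event = input_stream.get()
        if event is None:  # sentinel used here to shut the worker down
            break
        prediction = {"entity_id": event["entity_id"],
                      "prediction": score(event["features"])}
        # Downstream applications subscribe to this inference stream.
        output_stream.put(prediction)
```

In a real pipeline the worker would also call a Feature Store Serving API per event to fetch the additional static features before scoring.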

Real time:

1. You expose your model as a Backend Service (REST or gRPC).

2. This ML Service retrieves Features needed for inference from a Real Time Feature Store Serving API.

3. Inference can be requested by any application in real time, as long as it can form a correct request that conforms to the API Contract.
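A stdlib-only sketch of such a Backend Service over REST. Everything here is illustrative: `predict` is a hypothetical model (the weights are made up), and the API contract is simply "POST a JSON body with a `features` list".

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    # Hypothetical model: weighted sum with a 0.5 decision threshold
    # (assumption; a real service would load a trained model artifact and
    # fetch extra features from a Feature Store Serving API).
    weights = [0.4, 0.3, 0.3]
    score = sum(w * f for w, f in zip(weights, features))
    return {"score": score, "label": int(score > 0.5)}

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        # API contract: the request body must contain a "features" list.
        result = predict(payload["features"])
        body = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# To serve: HTTPServer(("0.0.0.0", 8080), InferenceHandler).serve_forever()
```

Any client able to produce that JSON shape can request predictions; gRPC would replace the JSON contract with a protobuf schema but follow the same request/response pattern.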


On the edge:

1. You embed your trained model directly into the application that runs on a user device.

2. This method provides the lowest latency and improves privacy.

3. In most cases the data is generated and lives on the device itself, significantly improving security.
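The on-device pattern can be sketched as a model exported to plain weights and bundled with the app, so inference needs no network round trip and raw data never leaves the process. The weights below are hypothetical stand-ins for an exported artifact.

```python
# Hypothetical exported weights for a tiny linear model, bundled with the app
# (assumption; real edge deployments ship formats like TFLite or Core ML).
WEIGHTS = [0.8, -0.5, 0.2]
BIAS = 0.1

def predict_on_device(features):
    """Run inference entirely on-device; the raw data never leaves it."""
    return BIAS + sum(w * x for w, x in zip(WEIGHTS, features))

# A sensor reading generated on the device itself is scored locally,
# with no server call, hence the minimal latency and improved privacy.
reading = [0.3, 0.1, 0.9]
local_prediction = predict_on_device(reading)
```

Only the prediction (or nothing at all) ever needs to leave the device, which is where the latency and privacy benefits come from.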