This document summarizes and compares several machine learning deployment tools: Seldon, Clipper, MLflow, and MLeap. For each tool, it outlines key features such as supported frameworks, Kubernetes integration, and serialization method, along with pros and cons. It also reports findings on challenges such as enabling Spark and resolving Kubernetes pod issues. Finally, it includes additional references on machine learning model serialization and deployment.
4. ▶ Python pickle-based code serialization
▶ sklearn.externals.joblib
▶ Spark provides an API to save a model/pipeline as a file
▶ Tensorflow provides tf.train.Saver, which persists the tensor graph
▶ It is pickle + metadata + checkpoint
Python Sklearn / Spark / Tensorflow
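A minimal sketch of the pickle round trip that the Python-side tools build on. The model here is a hypothetical dictionary of parameters, not a real Sklearn estimator, and `predict` is an illustrative helper.

```python
import pickle

# Hypothetical "trained model": plain parameters standing in for an estimator.
model = {"weights": [2.0, -1.0], "bias": 0.5}


def predict(params, features):
    """Linear prediction from a parameter dictionary."""
    return sum(w * x for w, x in zip(params["weights"], features)) + params["bias"]


# Serialize to bytes (this is what would be written to a model file) ...
blob = pickle.dumps(model)

# ... and restore it, possibly in another process with a compatible Python.
restored = pickle.loads(blob)
prediction = predict(restored, [1.0, 3.0])
```

The same dump/load pattern applies when the object is a fitted estimator, provided the loading side has the same classes importable.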
6. ▶ Models from different tools are not compatible
▶ Code serialization depends on the Python version
▶ Code serialization has potential security concerns
▶ For a tf model, the tensor names are required (need to check whether they are in the metadata)
▶ A tf model depends on the custom code that defines its custom operations
Issues and Limitations
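The security concern with code serialization can be illustrated with a deliberately harmless sketch: unpickling may invoke a callable chosen by whoever produced the bytes. The class name `Payload` and the expression are hypothetical.

```python
import pickle


class Payload:
    """Hypothetical object whose pickle asks the loader to call eval()."""

    def __reduce__(self):
        # pickle stores "call eval('2 + 3')" instead of the object's state.
        return (eval, ("2 + 3",))


untrusted_bytes = pickle.dumps(Payload())

# Loading runs the embedded call; the result is not a Payload instance at all.
result = pickle.loads(untrusted_bytes)
```

A real attacker would substitute a harmful expression, which is why pickled models should only be loaded from trusted sources.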
8. ▶ Enable a wide range of ML modeling tools: Python, R, Tensorflow, Spark
▶ Scale up and down
▶ Performance, Latency optimization
▶ Accessing model, API
▶ Audit and Versioning
▶ CI/CD
▶ Metrics and Monitoring
▶ Optimization, A/B tests
ML Deployment Challenges
10. ▶ Seldon, a London company, focuses on providing control over machine learning based on open source software
▶ Seldon Core is an open source platform for deploying machine learning models on Kubernetes
• Python/Spark/H2O/R model support
• REST and gRPC API
• Deploys an inference graph of Models/Routers/Combiners/Transformers as microservices
• Leverages K8s to provide scale, security, monitoring, etc.
Seldon
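A rough sketch of the kind of Python class that gets wrapped as a Seldon microservice. It assumes the wrapper convention of a class exposing `predict(X, features_names)`; the class name and the offset logic are hypothetical, and the exact signature should be checked against the Seldon Core version in use.

```python
class OffsetModel:
    """Hypothetical model class in the shape a Python model wrapper might expect."""

    def __init__(self):
        # A real model would load its serialized artifact here.
        self.offset = 1.0

    def predict(self, X, features_names):
        # X is treated as a list of rows; returns one output row per input row.
        return [[value - self.offset for value in row] for row in X]


output = OffsetModel().predict([[1.0, 2.0]], ["a", "b"])
```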
15. Pros
▶ Seamless K8s integration
▶ Graph definition to support A/B tests and ensembling
Cons
▶ No Scala support for Spark
▶ Needs a custom image for PySpark
▶ No customization support for liveness/readiness checks due to the CRD
Summary
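To make the A/B test idea concrete, here is a conceptual sketch of a router node that splits traffic between two models. It is plain Python, not Seldon's graph definition or API; `make_ab_router` and its parameters are hypothetical.

```python
import random


def make_ab_router(model_a, model_b, traffic_to_a, seed=None):
    """Return a function that sends roughly traffic_to_a of requests to model_a."""
    rng = random.Random(seed)

    def route(features):
        # rng.random() is in [0, 1), so 1.0 always picks A and 0.0 always picks B.
        if rng.random() < traffic_to_a:
            return "a", model_a(features)
        return "b", model_b(features)

    return route


route = make_ab_router(lambda x: x + 1, lambda x: x * 2, traffic_to_a=0.5, seed=7)
branch, prediction = route(10)
```

In a serving graph, the router would be its own microservice and the two models its children; the branch label is what an experiment would log.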
17. ▶ Clipper.ai is a system developed by the UC Berkeley RISE Lab.
▶ Clipper is a prediction serving system that sits between user-facing applications and a wide range of commonly used machine learning models and frameworks.
Clipper
20. Pros
▶ Easy-to-use interactive model deployment
▶ Supports Docker and K8s
▶ Query latency objective support
▶ Model version management
• Update and rollback
Cons
▶ cloudpickle version issues
▶ Python only
▶ Few examples and little documentation
▶ Not friendly to AWS
• use_internal_ip does not work well
• Need to manually create a repo for the model
• Failed to pull image from ECR
▶ Cluster creation is not stable
▶ Tensorflow failed to pickle
Summary
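The update-and-rollback behaviour listed under model version management can be pictured with a small conceptual sketch. This is not Clipper's API; the `ModelVersions` class and its methods are hypothetical.

```python
class ModelVersions:
    """Hypothetical registry that tracks deployed versions of one model."""

    def __init__(self):
        self._versions = {}   # version label -> prediction function
        self._history = []    # deployment order; last entry is active

    def deploy(self, version, predict_fn):
        """Update: register a new version and make it active."""
        self._versions[version] = predict_fn
        self._history.append(version)

    def rollback(self):
        """Reactivate the previously deployed version."""
        if len(self._history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self._history.pop()

    @property
    def active(self):
        return self._history[-1]

    def predict(self, x):
        return self._versions[self.active](x)


registry = ModelVersions()
registry.deploy("1", lambda x: x + 1)
registry.deploy("2", lambda x: x * 2)
after_update = registry.predict(3)
registry.rollback()
after_rollback = registry.predict(3)
```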
22. ▶ MLflow is an open source platform for managing the end-to-end machine learning lifecycle.
▶ MLflow is developed by Databricks
MLflow
25. Pros
▶ Flexible
▶ Easy to use with Sklearn
▶ Cloud integration supporting SageMaker and Azure
Cons
▶ No K8s integration
▶ Spark/Tensorflow support is based on Python
▶ Projects are better managed by container
Summary
27. ▶ MLeap allows data scientists and engineers to deploy machine learning pipelines from Spark and Scikit-learn to a portable format and execution engine.
• JSON-based serialization
• A runtime execution engine
• Benchmarks
▶ http://mleap-docs.combust.ml/core-concepts/transformers/support.html
MLeap
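The combination of a JSON-based format and a separate runtime engine can be sketched as follows. This is only an illustration of the idea; the field names (`op`, `weights`, `bias`) and both functions are hypothetical and do not reflect MLeap's actual bundle format.

```python
import json


def export_linear_model(weights, bias):
    """Write model parameters as human-readable JSON instead of pickled code."""
    return json.dumps(
        {"op": "linear_regression", "weights": list(weights), "bias": bias},
        indent=2,
    )


def run_bundle(bundle_json, features):
    """Tiny 'runtime': interprets the JSON without needing the training library."""
    spec = json.loads(bundle_json)
    if spec["op"] != "linear_regression":
        raise ValueError("unsupported op: " + spec["op"])
    return sum(w * x for w, x in zip(spec["weights"], features)) + spec["bias"]


bundle = export_linear_model([2.0, 3.0], 1.0)
score = run_bundle(bundle, [1.0, 1.0])
```

Because the runtime must understand every `op`, each estimator or transformer needs its own handler, which is the extensibility cost noted for this approach.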
31. Pros
▶ Portable models between Spark and Sklearn
▶ Human-readable model
▶ Easy model serving
Cons
▶ Support matrix is incomplete
▶ Extensibility
• Need to write code for each estimator/transformer
▶ Tensorflow support needs a custom-built tf-java binding and is experimental
Summary
33. ▶ Seldon integrates tightly with K8s to support scalable model serving, and its graph function is powerful.
▶ Clipper provides good interaction, but its code is not stable enough
▶ MLflow's model serving is simple, with fewer features
▶ MLeap aims to provide interoperability between different tools, which is very nice, but there is still a long way to go to support all the features.
• PMML is not covered
▶ Some other tools are not touched
• MXNet Model Server
• Oracle GraphPipe
Wrap up
34. Tool | Model Persistence | ML Tools | K8s Integration | Version | License | Implementation
Seldon Core | S2i + Pickle | Tensorflow, SKlearn, Keras, R, H2O, Nodejs, PMML | Yes | 0.3.2 | Apache | Docker + K8s CRD
Clipper | Pickle | Python, PySpark, PyTorch, Tensorflow, MXnet, Custom Container | Yes | 0.3.0 | Apache | CPP / Python
MLflow | Directory + Metadata | Python, H2O, Keras, MLeap, PyTorch, Sklearn, Spark, Tensorflow, R | No | Alpha | Apache | Python
MLeap | | Spark, Sklearn, Tensorflow | No | 0.12.0 | Apache | Scala/Java
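The "Directory + Metadata" persistence style in the comparison can be pictured with a small sketch: the model artifact and a metadata file describing how to load it live side by side in one folder. The file names and metadata keys are hypothetical and are not MLflow's actual layout.

```python
import json
import pickle
import tempfile
from pathlib import Path


def save_model_dir(params, directory, flavor):
    """Write the artifact plus a metadata file that says how to load it."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    (directory / "model.pkl").write_bytes(pickle.dumps(params))
    metadata = {"flavor": flavor, "artifact": "model.pkl"}
    (directory / "metadata.json").write_text(json.dumps(metadata))


def load_model_dir(directory):
    """Read the metadata first, then load the artifact it points to."""
    directory = Path(directory)
    metadata = json.loads((directory / "metadata.json").read_text())
    params = pickle.loads((directory / metadata["artifact"]).read_bytes())
    return metadata, params


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "my_model"
    save_model_dir({"weights": [0.4, -0.2]}, target, flavor="python")
    loaded_metadata, loaded_params = load_model_dir(target)
```

Keeping the loader choice in metadata is what lets one serving layer handle several frameworks.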
36. ▶ Enabling Spark is not easy
• Version matching: PySpark version, Java version
• Build the Spark image with glibc support
• Error: "Java gateway process exited before sending its port number"
• Accessing Spark from K8s is not easy
▶ Some K8s pods are stuck pending with Unknown status
• kubectl delete pod {} --grace-period=0 --force
▶ Building your own ML image from python is not easy; using continuumio/miniconda may save you some time
▶ Use a batch command to clean up Docker images
• docker images | grep "something_to_search" | awk '{print $1 ":" $2}' |xargs docker rmi -f
• docker system prune
Some other findings
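The grep/awk step of the image cleanup pipeline can also be expressed in Python, which makes it easier to review the list before deleting anything. This is a sketch that works on captured `docker images` text; the function name and the sample output are hypothetical, and unlike the shell pipeline it skips the header row.

```python
def matching_image_refs(docker_images_output, needle):
    """Return 'repository:tag' for each listed image whose line contains needle."""
    refs = []
    # First line is the column header (REPOSITORY TAG IMAGE ID ...), so skip it.
    for line in docker_images_output.splitlines()[1:]:
        if needle not in line:
            continue
        columns = line.split()
        if len(columns) >= 2:
            refs.append(f"{columns[0]}:{columns[1]}")
    return refs


sample_output = (
    "REPOSITORY   TAG      IMAGE ID   CREATED      SIZE\n"
    "my/model     v1       abc123     2 days ago   1GB\n"
    "other/app    latest   def456     3 days ago   2GB\n"
)
to_remove = matching_image_refs(sample_output, "model")
```

Each returned reference could then be passed to `docker rmi -f`, as in the one-liner above.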