Deploy Scikit-learn models with InferenceService¶
This example walks you through how to deploy a scikit-learn model leveraging the v1beta1 version of the InferenceService CRD.
Note that, by default, the v1beta1 version exposes your model through an API compatible with the existing V1 Dataplane. This example shows you how to serve a model through the Open Inference Protocol instead.
Train the Model¶
The first step is to train a sample scikit-learn model. Note that this model will then be saved as model.joblib.
from sklearn import svm
from sklearn import datasets
from joblib import dump

# Train an SVM classifier on the Iris dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target
clf = svm.SVC(gamma='scale')
clf.fit(X, y)

# Serialise the trained model to model.joblib
dump(clf, 'model.joblib')
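As a quick sanity check, you can load the serialised model back and run a local prediction before serving it. This is a minimal sketch; the two sample rows are the same ones used in the inference requests later in this example.
from joblib import load

# Load the serialised model back from disk and predict on two sample rows.
clf = load('model.joblib')
print(clf.predict([[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]]))
# Expected output: [1 1]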
Test the Model locally¶
Once you've got your model serialised as model.joblib, you can use the KServe Sklearn Server to spin up a local server.
Note
This step is optional and just meant for testing, feel free to jump straight to deploying with InferenceService.
Using KServe SklearnServer¶
Pre-requisites¶
To use the KServe sklearn server locally, you will first need to install the sklearnserver runtime package in your local environment.
- Clone the KServe repository and navigate into the directory.
git clone https://github.com/kserve/kserve
- Install the sklearnserver runtime. KServe uses Poetry as the dependency management tool; make sure you have Poetry installed.
cd python/sklearnserver
poetry install
Serving model locally¶
The sklearnserver package takes two arguments.
- --model_dir: The model directory path where the model is stored.
- --model_name: The name of the model deployed in the model server. This argument is optional; the default value is model.
With the sklearnserver runtime package installed locally, you should now be ready to start the server as:
python3 sklearnserver --model_dir /path/to/model_dir --model_name sklearn-v2-iris
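Once the server is up, you can send it a quick test request. The sketch below assumes the server listens on port 8080 (the model server default) and exposes the Open Inference Protocol REST endpoint; adjust the host and port if your local setup differs.
import requests

# Open Inference Protocol (V2) request body, using the same sample rows
# as the payloads later in this example.
payload = {
    "inputs": [
        {
            "name": "input-0",
            "shape": [2, 4],
            "datatype": "FP32",
            "data": [[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]],
        }
    ]
}

# Assumes the local server is reachable on port 8080.
res = requests.post("http://localhost:8080/v2/models/sklearn-v2-iris/infer", json=payload)
print(res.json())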
Deploy the Model with REST endpoint through InferenceService¶
Lastly, you will use KServe to deploy the trained model onto Kubernetes.
For this, you will just need to use version v1beta1 of the InferenceService CRD and set the protocolVersion field to v2.
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-v2-iris"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      protocolVersion: v2
      runtime: kserve-sklearnserver
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
Note
For the V2 protocol (Open Inference Protocol), if the runtime field is not provided, the mlserver runtime is used by default.
Note that this makes the following assumptions:
- Your model weights (i.e. your model.joblib file) have already been uploaded to a "model repository" (GCS in this example) and can be accessed as gs://kfserving-examples/models/sklearn/1.0/model.
- There is a K8s cluster available, accessible through kubectl.
- KServe has already been installed in your cluster.
Save the InferenceService manifest above as sklearn.yaml and apply it:
kubectl apply -f sklearn.yaml
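Before sending traffic, you can wait for the InferenceService to report Ready, for example with kubectl get inferenceservice sklearn-v2-iris. The sketch below does the same check with the Kubernetes Python client; it assumes the kubernetes package is installed, a local kubeconfig, and that the resource lives in the default namespace.
from kubernetes import client, config

# Load credentials from the local kubeconfig (assumes kubectl access).
config.load_kube_config()
api = client.CustomObjectsApi()

# Fetch the InferenceService custom resource (namespace "default" is an assumption).
isvc = api.get_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="default",
    plural="inferenceservices",
    name="sklearn-v2-iris",
)

# Print the Ready condition and the external URL once available.
for cond in isvc.get("status", {}).get("conditions", []):
    if cond.get("type") == "Ready":
        print("Ready:", cond.get("status"))
print("URL:", isvc.get("status", {}).get("url"))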
Test the Deployed Model¶
You can now test your deployed model by sending a sample request.
Note that this request needs to follow the Open Inference Protocol. You can see an example payload below. Create a file named iris-input-v2.json with the sample input.
{
"inputs": [
{
"name": "input-0",
"shape": [2, 4],
"datatype": "FP32",
"data": [
[6.8, 2.8, 4.8, 1.4],
[6.0, 3.4, 4.5, 1.6]
]
}
]
}
Then, determine the ingress IP and port and set INGRESS_HOST and INGRESS_PORT.
Now, you can use curl to send the inference request as:
SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-v2-iris -o jsonpath='{.status.url}' | cut -d "/" -f 3)
curl -XPOST -v \
-H "Host: ${SERVICE_HOSTNAME}" \
-H "Content-Type: application/json" \
-d @./iris-input-v2.json \
http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/sklearn-v2-iris/infer
Expected Output
{
"id": "823248cc-d770-4a51-9606-16803395569c",
"model_name": "sklearn-v2-iris",
"model_version": "v1.0.0",
"outputs": [
{
"data": [1, 1],
"datatype": "INT64",
"name": "predict",
"parameters": null,
"shape": [2]
}
]
}
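The same request can also be sent from Python instead of curl. This is a minimal sketch, assuming the requests package is installed and that INGRESS_HOST, INGRESS_PORT, and SERVICE_HOSTNAME are exported in your shell environment as above.
import json
import os

import requests

# Ingress address and service hostname, as set in the shell steps above.
ingress_host = os.environ["INGRESS_HOST"]
ingress_port = os.environ["INGRESS_PORT"]
service_hostname = os.environ["SERVICE_HOSTNAME"]

# Reuse the request body written to iris-input-v2.json.
with open("iris-input-v2.json") as f:
    payload = json.load(f)

# The Host header lets the ingress route the request to the right InferenceService.
res = requests.post(
    f"http://{ingress_host}:{ingress_port}/v2/models/sklearn-v2-iris/infer",
    headers={"Host": service_hostname},
    json=payload,
)
print(res.json())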
Deploy the Model with gRPC endpoint through InferenceService¶
Create the InferenceService resource and expose the gRPC port using the YAML below.
Note
Currently, KServe only supports exposing either the HTTP or the gRPC port. By default, the HTTP port is exposed.
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-v2-iris-grpc"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      protocolVersion: v2
      runtime: kserve-sklearnserver
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
      ports:
        - name: h2c # Knative expects the gRPC port name to be 'h2c'
          protocol: TCP
          containerPort: 8081
Alternatively, if traffic is routed through Istio rather than Knative, use an Istio-compatible port name:
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-v2-iris-grpc"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      protocolVersion: v2
      runtime: kserve-sklearnserver
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
      ports:
        - name: grpc-port # Istio requires the port name to be in the format <protocol>[-<suffix>]
          protocol: TCP
          containerPort: 8081
Note
For the V2 protocol (Open Inference Protocol), if the runtime field is not provided, the mlserver runtime is used by default.
Apply the InferenceService yaml to get the gRPC endpoint
kubectl apply -f sklearn-v2-grpc.yaml
Test the deployed model with grpcurl¶
After the gRPC InferenceService becomes ready, grpcurl can be used to send gRPC requests to the InferenceService.
# download the proto file
curl -O https://raw.githubusercontent.com/kserve/open-inference-protocol/main/specification/protocol/open_inference_grpc.proto
INPUT_PATH=iris-input-v2-grpc.json
PROTO_FILE=open_inference_grpc.proto
SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-v2-iris-grpc -o jsonpath='{.status.url}' | cut -d "/" -f 3)
Determine the ingress IP and port and set INGRESS_HOST and INGRESS_PORT. Now, you can use grpcurl to send the inference requests.
The gRPC APIs follow the KServe prediction V2 protocol / Open Inference Protocol.
For example, the ServerReady API can be used to check if the server is ready:
grpcurl \
-plaintext \
-proto ${PROTO_FILE} \
-authority ${SERVICE_HOSTNAME} \
${INGRESS_HOST}:${INGRESS_PORT} \
inference.GRPCInferenceService.ServerReady
Expected Output
{
"ready": true
}
You can test the deployed model by sending a sample request with the below payload.
Notice that the input format differs from that in the previous REST endpoint example.
Prepare the inference input inside the file named iris-input-v2-grpc.json.
{
"model_name": "sklearn-v2-iris-grpc",
"inputs": [
{
"name": "input-0",
"shape": [2, 4],
"datatype": "FP32",
"contents": {
"fp32_contents": [6.8, 2.8, 4.8, 1.4, 6.0, 3.4, 4.5, 1.6]
}
}
]
}
The ModelInfer API takes input following the ModelInferRequest schema defined in the open_inference_grpc.proto file.
grpcurl \
-vv \
-plaintext \
-proto ${PROTO_FILE} \
-authority ${SERVICE_HOSTNAME} \
-d @ \
${INGRESS_HOST}:${INGRESS_PORT} \
inference.GRPCInferenceService.ModelInfer \
<<< $(cat "$INPUT_PATH")
Expected Output
Resolved method descriptor:
// The ModelInfer API performs inference using the specified model. Errors are
// indicated by the google.rpc.Status returned for the request. The OK code
// indicates success and other codes indicate failure.
rpc ModelInfer ( .inference.ModelInferRequest ) returns ( .inference.ModelInferResponse );
Request metadata to send:
(empty)
Response headers received:
content-type: application/grpc
date: Mon, 09 Oct 2023 11:07:26 GMT
grpc-accept-encoding: identity, deflate, gzip
server: istio-envoy
x-envoy-upstream-service-time: 16
Estimated response size: 83 bytes
Response contents:
{
"modelName": "sklearn-v2-iris-grpc",
"id": "41738561-7219-4e4a-984d-5fe19bed6298",
"outputs": [
{
"name": "output-0",
"datatype": "INT32",
"shape": [
"2"
],
"contents": {
"intContents": [
1,
1
]
}
}
]
}
Response trailers received:
(empty)
Sent 1 request and received 1 response
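If you would rather call the gRPC endpoint from Python than from grpcurl, a sketch along the following lines can work. It assumes you have generated Python stubs from open_inference_grpc.proto with grpcio-tools (the open_inference_grpc_pb2 module names below follow protoc's default naming and are an assumption, not something shipped with this example), and that INGRESS_HOST, INGRESS_PORT, and SERVICE_HOSTNAME are set as above.
import os

import grpc

# Stubs assumed to be generated beforehand, e.g.:
#   python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. open_inference_grpc.proto
import open_inference_grpc_pb2 as pb2
import open_inference_grpc_pb2_grpc as pb2_grpc

target = f"{os.environ['INGRESS_HOST']}:{os.environ['INGRESS_PORT']}"
authority = os.environ["SERVICE_HOSTNAME"]

# Plaintext channel; override the :authority header so the ingress routes
# the call to the right InferenceService (mirrors grpcurl's -authority flag).
channel = grpc.insecure_channel(target, options=[("grpc.default_authority", authority)])
stub = pb2_grpc.GRPCInferenceServiceStub(channel)

# Build the same ModelInferRequest as in iris-input-v2-grpc.json.
request = pb2.ModelInferRequest(model_name="sklearn-v2-iris-grpc")
tensor = request.inputs.add()
tensor.name = "input-0"
tensor.datatype = "FP32"
tensor.shape.extend([2, 4])
tensor.contents.fp32_contents.extend([6.8, 2.8, 4.8, 1.4, 6.0, 3.4, 4.5, 1.6])

response = stub.ModelInfer(request)
print(response.outputs[0].contents.int_contents)  # e.g. [1, 1]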