Open Inference Protocol API Specification

gRPC

ServerLive

The ServerLive API indicates if the inference server is able to receive and respond to metadata and inference requests.

rpc inference.GRPCInferenceService/ServerLive(ServerLiveRequest) returns ServerLiveResponse

ServerReady

The ServerReady API indicates if the server is ready for inferencing.

rpc inference.GRPCInferenceService/ServerReady(ServerReadyRequest) returns ServerReadyResponse

ModelReady

The ModelReady API indicates if a specific model is ready for inferencing.

rpc inference.GRPCInferenceService/ModelReady(ModelReadyRequest) returns ModelReadyResponse
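
The three health calls above are typically polled before sending inference traffic. The following is a minimal sketch of such a polling loop, not part of the protocol itself. The helper name `wait_until` and its parameters are hypothetical, and the check is passed in as a plain callable so the sketch does not depend on any particular generated gRPC stub.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool],
               timeout_s: float = 30.0,
               interval_s: float = 0.5) -> bool:
    """Poll `check` until it returns True or `timeout_s` elapses.

    `check` is any zero-argument callable, for example a wrapper that
    calls ServerLive, ServerReady, or ModelReady and returns the
    `live` / `ready` flag of the response.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if check():
                return True
        except Exception:
            # Assumption: a failed call (e.g. server not reachable yet)
            # is treated the same as "not ready" and retried.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

With a generated client, the callable might look like `lambda: stub.ModelReady(request).ready`, where `stub` and `request` come from whatever code the protobuf compiler produced for your project.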

ServerMetadata

The ServerMetadata API provides information about the server. Errors are indicated by the google.rpc.Status returned for the request. The OK code indicates success and other codes indicate failure.

rpc inference.GRPCInferenceService/ServerMetadata(ServerMetadataRequest) returns ServerMetadataResponse

ModelMetadata

The ModelMetadata API provides information about a model. Errors are indicated by the google.rpc.Status returned for the request. The OK code indicates success and other codes indicate failure.

rpc inference.GRPCInferenceService/ModelMetadata(ModelMetadataRequest) returns ModelMetadataResponse

ModelInfer

The ModelInfer API performs inference using the specified model. Errors are indicated by the google.rpc.Status returned for the request. The OK code indicates success and other codes indicate failure.

rpc inference.GRPCInferenceService/ModelInfer(ModelInferRequest) returns ModelInferResponse


Messages

InferParameter

An inference parameter value. Parameters are "name"/"value" pairs: the name is the key of the enclosing parameters map, and the InferParameter message holds the value, which is a boolean, integer, or string.

| Field | Type | Description |
| --- | --- | --- |
| oneof parameter_choice.bool_param | bool | A boolean parameter value. |
| oneof parameter_choice.int64_param | int64 | An int64 parameter value. |
| oneof parameter_choice.string_param | string | A string parameter value. |
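
Because `parameter_choice` is a oneof, exactly one of the three fields is set for any given parameter. As a rough illustration, the sketch below picks the field for a native Python value. The function name `to_infer_parameter` and the use of a plain dict in place of the generated message class are assumptions made for the example.

```python
from typing import Dict, Union

ParamValue = Union[bool, int, str]

INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def to_infer_parameter(value: ParamValue) -> Dict[str, ParamValue]:
    """Return a one-entry dict naming the oneof field to set."""
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return {"bool_param": value}
    if isinstance(value, int):
        if not INT64_MIN <= value <= INT64_MAX:
            raise ValueError("value does not fit in int64_param")
        return {"int64_param": value}
    if isinstance(value, str):
        return {"string_param": value}
    raise TypeError(f"unsupported parameter type: {type(value).__name__}")
```

Checking `bool` first matters: otherwise `True` would be sent as `int64_param` with value 1.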

InferTensorContents

The data contained in a tensor represented by the repeated type that matches the tensor's data type. Protobuf oneof is not used because oneofs cannot contain repeated fields.

| Field | Type | Description |
| --- | --- | --- |
| bool_contents | repeated bool | Representation for BOOL data type. The size must match what is expected by the tensor's shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements. |
| int_contents | repeated int32 | Representation for INT8, INT16, and INT32 data types. The size must match what is expected by the tensor's shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements. |
| int64_contents | repeated int64 | Representation for INT64 data type. The size must match what is expected by the tensor's shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements. |
| uint_contents | repeated uint32 | Representation for UINT8, UINT16, and UINT32 data types. The size must match what is expected by the tensor's shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements. |
| uint64_contents | repeated uint64 | Representation for UINT64 data type. The size must match what is expected by the tensor's shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements. |
| fp32_contents | repeated float | Representation for FP32 data type. The size must match what is expected by the tensor's shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements. |
| fp64_contents | repeated double | Representation for FP64 data type. The size must match what is expected by the tensor's shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements. |
| bytes_contents | repeated bytes | Representation for BYTES data type. The size must match what is expected by the tensor's shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements. |
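
The table above implies two client-side steps: choose the contents field for the tensor's data type, then supply the elements flattened in row-major order with a count that matches the shape. Here is a small sketch of both, using nested Python lists as the tensor. The helper names `flatten_row_major` and `contents_for` are hypothetical; the field mapping is taken directly from the table.

```python
from math import prod
from typing import Any, Dict, List, Sequence

# Data type -> InferTensorContents field, as listed in the table.
CONTENTS_FIELD = {
    "BOOL": "bool_contents",
    "INT8": "int_contents", "INT16": "int_contents", "INT32": "int_contents",
    "INT64": "int64_contents",
    "UINT8": "uint_contents", "UINT16": "uint_contents", "UINT32": "uint_contents",
    "UINT64": "uint64_contents",
    "FP32": "fp32_contents",
    "FP64": "fp64_contents",
    "BYTES": "bytes_contents",
}


def flatten_row_major(nested: Any) -> List[Any]:
    """Flatten nested lists/tuples so the last dimension varies fastest."""
    if isinstance(nested, (list, tuple)):
        flat: List[Any] = []
        for item in nested:
            flat.extend(flatten_row_major(item))
        return flat
    return [nested]


def contents_for(datatype: str, shape: Sequence[int], nested: Any) -> Dict[str, List[Any]]:
    """Build a dict with the single contents field for this tensor."""
    flat = flatten_row_major(nested)
    expected = prod(shape)  # number of elements implied by the shape
    if len(flat) != expected:
        raise ValueError(f"shape {list(shape)} expects {expected} elements, got {len(flat)}")
    return {CONTENTS_FIELD[datatype]: flat}
```

For example, a 2x3 INT32 tensor `[[1, 2, 3], [4, 5, 6]]` is sent as `int_contents` holding `[1, 2, 3, 4, 5, 6]`.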

ModelInferRequest

| Field | Type | Description |
| --- | --- | --- |
| model_name | string | The name of the model to use for inferencing. |
| model_version | string | The version of the model to use for inference. If not given, the server chooses a version based on the model and internal policy. |
| id | string | Optional identifier for the request. If specified, it is returned in the response. |
| parameters | map ModelInferRequest.ParametersEntry | Optional inference parameters. |
| inputs | repeated ModelInferRequest.InferInputTensor | The input tensors for the inference. |
| outputs | repeated ModelInferRequest.InferRequestedOutputTensor | The requested output tensors for the inference. Optional; if not specified, all outputs produced by the model are returned. |
| raw_input_contents | repeated bytes | The data contained in an input tensor can be represented in "raw" bytes form or in the repeated type that matches the tensor's data type. To use the raw representation, 'raw_input_contents' must be initialized with data for each tensor in the same order as 'inputs'. For each tensor, the size of this content must match what is expected by the tensor's shape and data type. The raw data must be the flattened, one-dimensional, row-major order of the tensor elements without any stride or padding between the elements. Note that the FP16 and BF16 data types must be represented as raw content because InferTensorContents has no field for a 16-bit float type. If this field is specified, then InferInputTensor::contents must not be specified for any input tensor. |
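
To make the raw representation concrete, the sketch below packs a flattened tensor into one `raw_input_contents` entry with Python's `struct` module. Two things here are assumptions rather than statements from this page: the byte order (little-endian is used) and the helper name `pack_raw`. BF16 is left out because `struct` has no format character for it.

```python
import struct
from math import prod
from typing import Sequence

# struct format character per data type (standard sizes, no padding).
STRUCT_FORMAT = {
    "BOOL": "?",
    "INT8": "b", "INT16": "h", "INT32": "i", "INT64": "q",
    "UINT8": "B", "UINT16": "H", "UINT32": "I", "UINT64": "Q",
    "FP16": "e", "FP32": "f", "FP64": "d",
}


def pack_raw(datatype: str, shape: Sequence[int], flat_values: Sequence) -> bytes:
    """Pack row-major elements into contiguous bytes for one tensor."""
    count = prod(shape)
    if len(flat_values) != count:
        raise ValueError(f"shape {list(shape)} expects {count} elements, got {len(flat_values)}")
    # "<" selects little-endian with no alignment padding (assumed byte order).
    return struct.pack(f"<{count}{STRUCT_FORMAT[datatype]}", *flat_values)
```

The resulting byte string has exactly `prod(shape)` times the element size in bytes, which is the size check the field description asks for. One such entry is appended per input, in the same order as `inputs`.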

ModelInferRequest.InferInputTensor

An input tensor for an inference request.

| Field | Type | Description |
| --- | --- | --- |
| name | string | The tensor name. |
| datatype | string | The tensor data type. |
| shape | repeated int64 | The tensor shape. |
| parameters | map ModelInferRequest.InferInputTensor.ParametersEntry | Optional inference input tensor parameters. |
| contents | InferTensorContents | The tensor contents using a data-type format. This field must not be specified if "raw" tensor contents are being used for the inference request. |
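
The `contents` field and `raw_input_contents` are mutually exclusive across the whole request. A client could check this before sending, roughly as sketched below. The request is modelled with plain dicts and lists rather than generated message classes, and the function name `check_input_representation` is hypothetical.

```python
from typing import Sequence


def check_input_representation(inputs: Sequence[dict],
                               raw_input_contents: Sequence[bytes]) -> str:
    """Return "raw" or "typed" after checking the two rules from the tables.

    Rule 1: if raw_input_contents is used, it needs one entry per input,
            in the same order as `inputs`.
    Rule 2: if raw_input_contents is used, no input may set `contents`.
    """
    if not raw_input_contents:
        return "typed"
    if len(raw_input_contents) != len(inputs):
        raise ValueError(
            f"{len(inputs)} inputs but {len(raw_input_contents)} raw entries")
    for tensor in inputs:
        if tensor.get("contents"):
            raise ValueError(
                f"input '{tensor['name']}' sets contents while raw bytes are in use")
    return "raw"
```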

ModelInferRequest.InferInputTensor.ParametersEntry

| Field | Type | Description |
| --- | --- | --- |
| key | string | N/A |
| value | InferParameter | N/A |

ModelInferRequest.InferRequestedOutputTensor

An output tensor requested for an inference request.

| Field | Type | Description |
| --- | --- | --- |
| name | string | The tensor name. |
| parameters | map ModelInferRequest.InferRequestedOutputTensor.ParametersEntry | Optional requested output tensor parameters. |

ModelInferRequest.InferRequestedOutputTensor.ParametersEntry

| Field | Type | Description |
| --- | --- | --- |
| key | string | N/A |
| value | InferParameter | N/A |

ModelInferRequest.ParametersEntry

| Field | Type | Description |
| --- | --- | --- |
| key | string | N/A |
| value | InferParameter | N/A |

ModelInferResponse

| Field | Type | Description |
| --- | --- | --- |
| model_name | string | The name of the model used for inference. |
| model_version | string | The version of the model used for inference. |
| id | string | The id of the inference request if one was specified. |
| parameters | map ModelInferResponse.ParametersEntry | Optional inference response parameters. |
| outputs | repeated ModelInferResponse.InferOutputTensor | The output tensors holding inference results. |
| raw_output_contents | repeated bytes | The data contained in an output tensor can be represented in "raw" bytes form or in the repeated type that matches the tensor's data type. To use the raw representation, 'raw_output_contents' must be initialized with data for each tensor in the same order as 'outputs'. For each tensor, the size of this content must match what is expected by the tensor's shape and data type. The raw data must be the flattened, one-dimensional, row-major order of the tensor elements without any stride or padding between the elements. Note that the FP16 and BF16 data types must be represented as raw content because InferTensorContents has no field for a 16-bit float type. If this field is specified, then InferOutputTensor::contents must not be specified for any output tensor. |
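
On the receiving side, each `raw_output_contents` entry is matched to the output tensor at the same index and decoded using that tensor's data type and shape. The following sketch shows one way to do that with `struct`. As assumptions for the example, the response is modelled with plain dicts, little-endian byte order is used, only a subset of data types is covered, and the name `decode_raw_outputs` is hypothetical.

```python
import struct
from math import prod
from typing import Dict, List, Sequence

# Illustrative subset of data types and their struct format characters.
STRUCT_FORMAT = {"INT32": "i", "INT64": "q", "FP16": "e", "FP32": "f", "FP64": "d"}


def decode_raw_outputs(outputs: Sequence[dict],
                       raw_output_contents: Sequence[bytes]) -> Dict[str, List]:
    """Map each output tensor name to its flat, row-major element list."""
    if len(outputs) != len(raw_output_contents):
        raise ValueError("raw_output_contents must have one entry per output")
    decoded: Dict[str, List] = {}
    for tensor, raw in zip(outputs, raw_output_contents):
        fmt = STRUCT_FORMAT[tensor["datatype"]]
        count = prod(tensor["shape"])
        expected = count * struct.calcsize("<" + fmt)
        if len(raw) != expected:
            raise ValueError(
                f"output '{tensor['name']}': expected {expected} bytes, got {len(raw)}")
        # "<" = little-endian, no padding (assumed byte order).
        decoded[tensor["name"]] = list(struct.unpack(f"<{count}{fmt}", raw))
    return decoded
```

The returned lists are still flat; reshaping them to `shape` is left to the caller.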

ModelInferResponse.InferOutputTensor

An output tensor returned for an inference request.

| Field | Type | Description |
| --- | --- | --- |
| name | string | The tensor name. |
| datatype | string | The tensor data type. |
| shape | repeated int64 | The tensor shape. |
| parameters | map ModelInferResponse.InferOutputTensor.ParametersEntry | Optional output tensor parameters. |
| contents | InferTensorContents | The tensor contents using a data-type format. This field must not be specified if "raw" tensor contents are being used for the inference response. |

ModelInferResponse.InferOutputTensor.ParametersEntry

| Field | Type | Description |
| --- | --- | --- |
| key | string | N/A |
| value | InferParameter | N/A |

ModelInferResponse.ParametersEntry

| Field | Type | Description |
| --- | --- | --- |
| key | string | N/A |
| value | InferParameter | N/A |

ModelMetadataRequest

| Field | Type | Description |
| --- | --- | --- |
| name | string | The name of the model. |
| version | string | The version of the model. If not given, the server chooses a version based on the model and internal policy. |

ModelMetadataResponse

| Field | Type | Description |
| --- | --- | --- |
| name | string | The model name. |
| versions | repeated string | The versions of the model available on the server. |
| platform | string | The model's platform. See Platforms. |
| inputs | repeated ModelMetadataResponse.TensorMetadata | The model's inputs. |
| outputs | repeated ModelMetadataResponse.TensorMetadata | The model's outputs. |

ModelMetadataResponse.TensorMetadata

Metadata for a tensor.

| Field | Type | Description |
| --- | --- | --- |
| name | string | The tensor name. |
| datatype | string | The tensor data type. |
| shape | repeated int64 | The tensor shape. A variable-size dimension is represented by a -1 value. |
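
Since -1 marks a variable-size dimension, a client can compare a concrete tensor shape against the metadata shape before sending a request. A minimal sketch follows; the function name `shape_matches` is hypothetical, and rejecting a differing rank is an assumption about how strict the check should be.

```python
from typing import Sequence


def shape_matches(metadata_shape: Sequence[int], actual_shape: Sequence[int]) -> bool:
    """True if `actual_shape` fits `metadata_shape`, where -1 matches any size."""
    if len(metadata_shape) != len(actual_shape):
        return False  # assumption: the rank must be identical
    if any(dim < 0 for dim in actual_shape):
        return False  # a concrete tensor cannot have a negative dimension
    return all(m == -1 or m == a for m, a in zip(metadata_shape, actual_shape))
```

For instance, a metadata shape of `[-1, 3, 224, 224]` accepts `[8, 3, 224, 224]` but not `[8, 1, 224, 224]`.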

ModelReadyRequest

| Field | Type | Description |
| --- | --- | --- |
| name | string | The name of the model to check for readiness. |
| version | string | The version of the model to check for readiness. If not given, the server chooses a version based on the model and internal policy. |

ModelReadyResponse

| Field | Type | Description |
| --- | --- | --- |
| ready | bool | True if the model is ready, false if not ready. |

ServerLiveRequest

ServerLiveResponse

| Field | Type | Description |
| --- | --- | --- |
| live | bool | True if the inference server is live, false if not live. |

ServerMetadataRequest

ServerMetadataResponse

| Field | Type | Description |
| --- | --- | --- |
| name | string | The server name. |
| version | string | The server version. |
| extensions | repeated string | The extensions supported by the server. |

ServerReadyRequest

ServerReadyResponse

| Field | Type | Description |
| --- | --- | --- |
| ready | bool | True if the inference server is ready, false if not ready. |