This is the second post in a series about our ML serving platform CLOps, where we will take a detailed look at its core components.

In the last post, we mentioned that serving a model requires five elements: core components, computing resource management, model extensions, model operation management, and interfaces. Core components serve as the basis for connecting and extending the remaining four elements, and they are the topic of this post.

The role of core components

Core components take on the important role of 'automatically extending, deploying, and controlling various features without modifying code'. To reiterate, the main points of this role are as follows.

Does not modify code.
Can automatically extend various features without much manual control.
Can automatically deploy and control applications.

First, "Does not modify code". Core components must be able to deploy model applications without modifying the implementation that was planned for deployment. In other words, the modeler must be able to focus on the model application's logic without additional work.

Second, "Can automatically extend various features without much manual control". Core components must make it easy to operate models and extend their features. This includes adding model features such as synchronous or asynchronous model serving, collecting logs and visualizing indicators for operations, and running automated health checks that don't require the attention of a human operator.

Third, "Can automatically deploy and control applications". Core components must be able to easily deploy and control both of the above, the model application and its extended features, through a deployment manifest file. In other words, core components must simplify how deployment and control are done and make them reproducible with a declarative manifest.

To achieve the three goals above, we used the operator pattern to extend Kubernetes with our own controller, and an executor based on the sidecar pattern to extend and control additional application features. With the operator pattern, you can deploy and control applications through declarative interfaces such as YAML, just as you would with Kubernetes itself. Because the sidecar-based executor is injected before deployment, it can sit in front of the model application and receive model traffic first, which makes it easy to add middleware logic and keeps the implementation extensible.

Operator

An operator is in charge of core tasks such as deploying model applications through CLOps and automating feature extensions.

The operator pattern is one of the basic Kubernetes mechanisms, and anyone can use it to extend Kubernetes features. This is possible because Kubernetes specifications are based on declarative objects. With a declarative approach, the desired deployment status of an application is submitted to the operator as a declared specification. By comparing the current status with the desired status, the operator can run its logic toward the desired status while staying idempotent.
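The comparison between desired and current status can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CLOps operator: the `reconcile_state` function, the dictionary-based state, and the action names are all hypothetical, and a real operator would read and write resources through the Kubernetes API.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory model: resource name -> spec.
State = Dict[str, dict]
Action = Tuple[str, str]  # (verb, resource name)


def reconcile_state(desired: State, current: State) -> List[Action]:
    """Move `current` toward `desired` and return the actions taken."""
    actions: List[Action] = []
    for name, spec in desired.items():
        if name not in current:
            current[name] = dict(spec)
            actions.append(("create", name))
        elif current[name] != spec:
            current[name] = dict(spec)
            actions.append(("update", name))
    for name in list(current):
        if name not in desired:
            del current[name]
            actions.append(("delete", name))
    return actions


desired = {"model-a": {"replicas": 2}, "model-b": {"replicas": 1}}
current = {"model-a": {"replicas": 1}, "model-c": {"replicas": 1}}

first = reconcile_state(desired, current)   # brings current in line with desired
second = reconcile_state(desired, current)  # nothing left to do
print(first)
print(second)
```

Running the function a second time with the same input produces no actions, which is the idempotency property described above.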

Operators normally define new specifications and then use them to perform their logic. The defined specification is called a CRD (Custom Resource Definition), while a specification submitted according to that definition is called a CR (Custom Resource).

Figure 2. Operator validation process

Operators perform logic in four steps. As you can see in [Figure 2], the initial steps are as follows: 1) Fill the CR default value or perform modification (Kubernetes Change Manager), 2) Validate CR using Open API specifications (Open API Validator), 3) Collect and validate fields based on additional CR information (Kubernetes Validation Manager). Once the initial three steps are complete, the data is sent to the operator as seen in [Figure 1], and then the logic performs step 4) Repeatedly reconcile the validated CR (Reconcile discrepancies).
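The first three steps can be pictured as a small admission pipeline that a CR must pass before it is reconciled. The sketch below is only an approximation of that flow: the function names, the `image` and `replicas` fields, and the default value are assumptions made for illustration, and the hand-written checks stand in for real OpenAPI schema validation.

```python
from typing import Any, Dict, List

CR = Dict[str, Any]


def apply_defaults(cr: CR) -> CR:
    """Step 1: fill in default values (the modification step)."""
    spec = dict(cr.get("spec", {}))
    spec.setdefault("replicas", 1)  # hypothetical default
    return {**cr, "spec": spec}


def validate_schema(cr: CR) -> List[str]:
    """Step 2: structural checks, standing in for OpenAPI schema validation."""
    errors = []
    spec = cr.get("spec", {})
    if not isinstance(spec.get("image"), str):
        errors.append("spec.image must be a string")
    if not isinstance(spec.get("replicas"), int):
        errors.append("spec.replicas must be an integer")
    return errors


def validate_semantics(cr: CR) -> List[str]:
    """Step 3: checks that need more than the schema, such as value rules."""
    errors = []
    if cr["spec"]["replicas"] < 1:
        errors.append("spec.replicas must be at least 1")
    return errors


def admit(cr: CR) -> CR:
    """Run steps 1 to 3; only an admitted CR is handed to reconciliation (step 4)."""
    cr = apply_defaults(cr)
    # Semantic checks run only when the structure is already valid.
    errors = validate_schema(cr) or validate_semantics(cr)
    if errors:
        raise ValueError("; ".join(errors))
    return cr


admitted = admit({"kind": "CLOpsDeployment", "spec": {"image": "model:1"}})
print(admitted["spec"])
```

Keeping defaulting and validation outside the reconcile loop means the reconciliation logic only ever sees complete, valid CRs.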

CLOps uses this structure to modify or validate and finally deploy a modeler's model application CR.

To deploy model applications, modelers must submit a CR that follows the CLOpsDeployment CRD. This single CR was designed to cover everything needed for deployment, from model registry information and model deployment information to extension configuration and many other features.

In order to use various configurations and features in a simple but extendable manner, we structured our hierarchy by placing the CLOpsDeployment CR at the top, with ModelDeployment below it, and AsyncAPI below that. As a result, we can use the topmost CR to easily define requirements, flexibly scale CRs up or down internally, and simplify the reconciliation logic for each feature.
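As a rough picture of this hierarchy, the following Python sketch derives the lower-level resources from a single top-level object. Only the three resource names come from the text above; every field (`image`, `replicas`, `async_queue`, `queue`) and the `expand` method are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AsyncAPI:
    name: str
    queue: str


@dataclass
class ModelDeployment:
    name: str
    image: str
    replicas: int
    async_api: Optional[AsyncAPI] = None


@dataclass
class CLOpsDeployment:
    name: str
    image: str
    replicas: int = 1
    async_queue: Optional[str] = None  # hypothetical switch for async serving

    def expand(self) -> ModelDeployment:
        """Derive the child resources from the single top-level CR."""
        child = ModelDeployment(f"{self.name}-model", self.image, self.replicas)
        if self.async_queue is not None:
            child.async_api = AsyncAPI(f"{self.name}-async", self.async_queue)
        return child


sync_child = CLOpsDeployment("demo", "img:1").expand()
async_child = CLOpsDeployment("demo", "img:1", 2, "requests").expand()
print(sync_child)
print(async_child)
```

Because each level owns only its own fields, the logic for one feature (here, asynchronous serving) can be reconciled without touching the others.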

When modelers use the CLOpsDeployment CR to submit requirements such as the model registry and model code, we reconcile by modifying and validating each field through the aforementioned process. During this process, operators inject init containers that automatically download model parameters from the model registry. Once the init containers finish the initialization process, we inject executors as sidecars and generate Service objects that direct all traffic to the executor first.
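The injection described above amounts to editing the pod specification before it is created. The sketch below works on plain dictionaries shaped like Kubernetes manifests; the image names, the port number, the container names, and the downloader arguments are all assumptions, and details such as the shared volume for downloaded parameters are left out.

```python
import copy
from typing import Any, Dict

# Hypothetical image names and port, for illustration only.
DOWNLOADER_IMAGE = "example/model-downloader:latest"
EXECUTOR_IMAGE = "example/executor:latest"
EXECUTOR_PORT = 8000


def inject(pod_spec: Dict[str, Any], model_uri: str) -> Dict[str, Any]:
    """Return a copy of pod_spec with the init container and executor added."""
    spec = copy.deepcopy(pod_spec)
    spec.setdefault("initContainers", []).append({
        "name": "model-downloader",
        "image": DOWNLOADER_IMAGE,
        "args": ["--source", model_uri, "--dest", "/models"],  # hypothetical flags
    })
    spec.setdefault("containers", []).append({
        "name": "executor",
        "image": EXECUTOR_IMAGE,
        "ports": [{"containerPort": EXECUTOR_PORT}],
    })
    return spec


def build_service(name: str) -> Dict[str, Any]:
    """A Service whose targetPort is the executor, so traffic reaches it first."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "selector": {"app": name},
            "ports": [{"port": 80, "targetPort": EXECUTOR_PORT}],
        },
    }


original = {"containers": [{"name": "model", "image": "m:1"}]}
injected = inject(original, "registry://demo/1")
service = build_service("demo")
print([c["name"] for c in injected["containers"]])
```

The modeler's own container definition is left untouched; the executor and the downloader are added around it.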

Executor

As explained above, executors are deployed automatically alongside the model application as sidecars, and model traffic reaches these executors first. This structure allows modelers to add middleware in front of the model without having to modify source code. In that sense, executors are second only to operators in importance.

When operators inject executors, the information required from the model CR is injected through a ConfigMap. The ConfigMap contains information such as which features of the model application should be extended and which ports should go through a proxy. Executors parse this information to initialize their configuration and then extend features. For example, you could add middleware that provides model indicators such as request and response latency, or you could change the model serving method to asynchronous by adding a queueing middleware.
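To make the middleware idea concrete, here is a small Python sketch of an executor that reads its configuration and wraps the model handler with a latency recorder. It is an illustration under assumptions, not the CLOps executor: the JSON layout, the `metrics` flag, and all function names are hypothetical, and a plain function call stands in for proxied network traffic.

```python
import json
import time
from typing import Callable, List

Handler = Callable[[dict], dict]


def latency_middleware(next_handler: Handler, samples: List[float]) -> Handler:
    """Record how long each call to the wrapped handler takes."""
    def handler(request: dict) -> dict:
        start = time.perf_counter()
        try:
            return next_handler(request)
        finally:
            samples.append(time.perf_counter() - start)
    return handler


def build_executor(config_json: str, model: Handler, samples: List[float]) -> Handler:
    """Parse the injected configuration and wrap the model accordingly."""
    config = json.loads(config_json)
    handler = model
    if config.get("metrics", False):  # hypothetical feature flag
        handler = latency_middleware(handler, samples)
    return handler


def model(request: dict) -> dict:
    # Stand-in for the modeler's unmodified application.
    return {"prediction": request["x"] * 2}


samples: List[float] = []
executor = build_executor('{"metrics": true}', model, samples)
response = executor({"x": 21})
print(response, len(samples))
```

The `model` function never changes; enabling or disabling a feature is a matter of configuration alone.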

Even with middleware added like this, traffic still reaches the model application as before, so modelers can focus on model logic without having to consider which features are being added to the model application.

Considerations for backwards compatibility and serving stability

Our structure of automating the model application serving process using a single CR and extending features using an executor was magical at first. However, we had a lot to consider when it came to backwards compatibility and serving stability.

Backwards compatibility

Operators are greatly affected by the side effects of changing CRDs. Therefore, it's crucial to change the API version whenever the specifications or reconciliation logic of an existing CRD change. This is similar to how you ensure backwards compatibility for a web API by splitting the endpoint into api/v1 and api/v2.

Kubernetes allows you to split API versions for every API group on a CRD. However, splitting versions also increases the number of maintenance points. This is why Kubernetes also provides the "hub and spoke" method: you designate one version as the hub and convert CRs written in other versions to the hub's specifications, so that everything can be processed with a single reconciliation logic.
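The hub and spoke idea can be shown with a toy conversion in Python. Everything specific here is assumed for the sake of the example: the version names, the field rename from `spec.image` to `spec.model.image`, and the function names. In practice the conversion runs through the Kubernetes conversion mechanism rather than a lookup table like this one.

```python
from typing import Any, Callable, Dict

CR = Dict[str, Any]
HUB_VERSION = "v2"  # hypothetical: the only version the reconciler understands


def v1_to_hub(cr: CR) -> CR:
    """Hypothetical change: v1 `spec.image` became `spec.model.image` in v2."""
    spec = dict(cr["spec"])
    image = spec.pop("image")
    spec["model"] = {"image": image}
    return {**cr, "apiVersion": HUB_VERSION, "spec": spec}


# One converter per spoke version; the hub itself needs none.
TO_HUB: Dict[str, Callable[[CR], CR]] = {"v1": v1_to_hub}


def to_hub(cr: CR) -> CR:
    version = cr["apiVersion"]
    if version == HUB_VERSION:
        return cr
    return TO_HUB[version](cr)


def reconcile_hub(cr: CR) -> str:
    """Single reconciliation path, written against the hub version only."""
    hub = to_hub(cr)
    return hub["spec"]["model"]["image"]


old_cr = {"apiVersion": "v1", "spec": {"image": "m:1"}}
new_cr = {"apiVersion": "v2", "spec": {"model": {"image": "m:2"}}}
print(reconcile_hub(old_cr), reconcile_hub(new_cr))
```

Adding a third version then means writing one more converter to the hub, not another branch in the reconciliation logic.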

We are constantly adding and removing specifications on CLOps. Backwards compatibility is crucial when changes to CRD are frequent.

Serving stability

As mentioned above, operator logic tries to restore the desired status whenever the desired status and current status differ. Most of the time this reconciliation is triggered by a modeler changing the desired status, but it can also be triggered by an upgrade to the operator itself, for example a patch that fixes an error in operator logic or deploys additional logic. The new version of the operator can create a discrepancy between the desired and current status of an already deployed model application, which then causes an undesired redeployment of that model application.

We separated changes that could cause an undesired redeployment from those that won't. In cases where change was inevitable, we carefully considered the side effects and minimized the impact these side effects would have on the service. Securing serving stability is also one of the main points of a core component.
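One possible way to tell the two kinds of change apart, offered here as an assumption rather than a description of what CLOps does, is to render the same CR with the old and the new operator version and compare a stable hash of the resulting pod templates. The function names and the sample templates below are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict


def template_hash(pod_template: Dict[str, Any]) -> str:
    """Stable hash of a rendered pod template (key order does not matter)."""
    canonical = json.dumps(pod_template, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def would_redeploy(old_render: Dict[str, Any], new_render: Dict[str, Any]) -> bool:
    """True if the new operator version renders a different pod template."""
    return template_hash(old_render) != template_hash(new_render)


old_render = {"containers": [{"name": "model", "image": "m:1"}]}
same_render = {"containers": [{"image": "m:1", "name": "model"}]}  # keys reordered
changed_render = {
    "containers": [{"name": "model", "image": "m:1", "env": [{"name": "A", "value": "1"}]}]
}

print(would_redeploy(old_render, same_render))
print(would_redeploy(old_render, changed_render))
```

A check like this could run before an operator upgrade to list which deployed model applications would be restarted by it.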

How shall we extend based on our core components?

In this post, we looked at the core components of CLOps and the main considerations behind them. Although core components may operate in simple ways, they are important components and mechanisms that can be extended to add numerous features. Are you curious about how we used this basic mechanism to extend CLOps? In our next post, we will take a look at how you can efficiently manage CRs and what kind of challenges we had to overcome to do so.

CLOVA Engineering blog

CLOps - The heart of the platform