Safer Rolling Update for Docker Applications with Kubernetes

Introduction

Docker containers have enabled applications to be developed and deployed faster because of their portability between development and production environments and across multiple clouds. Enterprises have started adopting Docker with a microservices architecture for their new applications. One requirement emerging among these early adopters is the need to update versions of microservices in production seamlessly. To address this need, container orchestration systems such as Kubernetes, Docker Swarm, Mesos, and Nomad have introduced support for rolling updates. In a rolling update (Figure 1), an application is updated with zero downtime by incrementally replacing pods or container instances with the new version.

Figure 1: Kubernetes Rolling Update (Source: Kubernetes.io)
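
To make the incremental replacement concrete, here is a minimal simulation sketch in Python. It is not Kubernetes code: it only mimics the two limits that a Kubernetes Deployment exposes as maxSurge and maxUnavailable, and it assumes (unrealistically) that every new pod becomes ready immediately. The function name rolling_update_steps is our own.

```python
def rolling_update_steps(replicas, max_surge=1, max_unavailable=0):
    """Return the (old_pods, new_pods) counts after each step of a simulated rollout.

    Simplifying assumption: every new pod is ready as soon as it is started.
    """
    if replicas < 1 or max_surge < 0 or max_unavailable < 0:
        raise ValueError("replicas must be >= 1 and limits must be >= 0")
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("max_surge and max_unavailable cannot both be 0")

    old, new = replicas, 0
    steps = [(old, new)]
    while old > 0 or new < replicas:
        # Start new pods without exceeding replicas + max_surge pods in total.
        new += min(replicas - new, replicas + max_surge - (old + new))
        # Stop old pods without dropping below replicas - max_unavailable pods.
        old -= min(old, (old + new) - (replicas - max_unavailable))
        steps.append((old, new))
    return steps


if __name__ == "__main__":
    for old, new in rolling_update_steps(replicas=3, max_surge=1, max_unavailable=0):
        print(f"old={old} new={new}")
```

With three replicas, a surge of one, and no unavailable pods allowed, the simulated rollout replaces one pod per step while three pods keep serving at the end of every step.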

In this blog, we will discuss how to achieve safer rolling updates of Docker applications in a Kubernetes environment through automated, in-depth analysis of microservice updates.

Need for Automated Analysis of Rolling Updates

Continuous delivery tools such as Spinnaker have shortened delivery times by automating orchestration of the release pipeline. Deploying containerized applications on a Kubernetes cluster is one of the most common deployment scenarios. The built-in rolling update feature is the preferred microservice update method for Kubernetes-based deployments. However, beyond basic liveness and readiness checks of the container itself, the Kubernetes rolling update feature has no robust built-in mechanism to prevent bad deployments caused by application software issues. The additional validation responsibility falls on the DevOps team.
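
For readers who have not configured these checks, the sketch below builds a Deployment manifest as a Python dictionary and prints it as JSON, which kubectl apply -f accepts alongside YAML. The field names (strategy, rollingUpdate, readinessProbe, livenessProbe, and so on) are standard Kubernetes Deployment fields; the service name, image, port, probe paths, and timing values are placeholders we made up for illustration.

```python
import json


def deployment_manifest(name, image, replicas=4):
    """Build a Deployment with a rolling update strategy and basic probes."""
    container = {
        "name": name,
        "image": image,
        "ports": [{"containerPort": 8080}],
        # Readiness: the pod receives traffic only while this check passes.
        "readinessProbe": {
            "httpGet": {"path": "/ready", "port": 8080},
            "initialDelaySeconds": 5,
            "periodSeconds": 10,
        },
        # Liveness: the container is restarted when this check keeps failing.
        "livenessProbe": {
            "httpGet": {"path": "/healthz", "port": 8080},
            "initialDelaySeconds": 15,
            "periodSeconds": 20,
        },
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "strategy": {
                "type": "RollingUpdate",
                "rollingUpdate": {"maxSurge": 1, "maxUnavailable": 0},
            },
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {"containers": [container]},
            },
        },
    }


if __name__ == "__main__":
    # "orders" and the image tag are hypothetical examples.
    print(json.dumps(deployment_manifest("orders", "example.com/orders:2.0"), indent=2))
```

Note that both probes only ask whether the container responds. Neither says anything about whether the new version is slower or produces more errors than the version it replaces, which is the gap discussed here.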

While Kubernetes allows rollback to the prior version if failures happen, such broad exposure of production traffic to a faulty release could be disruptive to the business. This strategy is akin to driving a fast car with no brakes but with insurance in case of an accident. Manual analysis of updated services after a rolling update in production is error-prone and does not scale, especially when there are many microservices with many interactions and frequent updates.

Hence, it is critical to analyze the new version automatically and in real time during the rolling update, to ensure that the safety requirements of the deployment are met while maintaining the benefits of rolling updates.

Achieving Safer Rolling Update with Kubernetes

Enterprises need a real-time automated system that analyzes the new service version relative to the current production version as Kubernetes rolls out the latest version of the service. If there is enough negative deviation along critical metrics, the system needs to stop the rollout and restore the old version's replica set.
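
As a rough illustration of such a check, the Python sketch below compares a few metrics of the new version against the production baseline and, when any of them regresses beyond a tolerance, returns the kubectl command that rolls a Deployment back to its previous revision. The metric names, tolerances, and function names are hypothetical, every metric is assumed to be "lower is better", and the command is only built, not executed.

```python
def find_regressions(baseline, candidate, tolerances):
    """Return {metric: relative_increase} for metrics that exceed their tolerance.

    Assumes lower values are better for every metric listed in tolerances.
    """
    regressions = {}
    for metric, allowed in tolerances.items():
        base, cand = baseline[metric], candidate[metric]
        if base == 0:
            relative = 0.0 if cand == 0 else float("inf")
        else:
            relative = (cand - base) / base
        if relative > allowed:
            regressions[metric] = relative
    return regressions


def rollback_command(deployment, regressions):
    """Return the kubectl argv that restores the previous revision, or None."""
    if not regressions:
        return None
    return ["kubectl", "rollout", "undo", f"deployment/{deployment}"]


if __name__ == "__main__":
    baseline = {"error_rate": 0.010, "p99_latency_ms": 200.0}
    candidate = {"error_rate": 0.030, "p99_latency_ms": 210.0}
    tolerances = {"error_rate": 0.5, "p99_latency_ms": 0.2}  # allowed relative increase
    found = find_regressions(baseline, candidate, tolerances)
    print(sorted(found), rollback_command("orders", found))
```

In this made-up example the error rate triples while latency rises by only 5 percent, so only the error rate is flagged and a rollback is proposed. A real system would need statistically sounder comparisons than a single relative difference.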

OpsMx is a CI/CD analytics platform that performs automated, real-time, in-depth analysis of the new update across various risk dimensions such as architecture regressions, performance and scalability deviations, and security and business SLA violations.

OpsMx integrates with the Kubernetes cluster (Figure 2) to trigger automated analysis of a new rolling update, and communicates back to the Kubernetes master through an extensive health check to stop the rollout as necessary. OpsMx continuously compares and analyzes the new-version pods across the entire cluster until the service is either successfully rolled out or rolled back. The OpsMx container agent collects the data needed for analysis. OpsMx also integrates with existing monitoring and logging tools (e.g., Datadog, Stackdriver, and Splunk).

Figure 2: OpsMx for Kubernetes Rolling Update
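
This post does not describe how the health check is wired, so the following Python sketch shows only one plausible pattern, not the OpsMx implementation: a small HTTP endpoint, used as a readiness probe target, that starts failing once an analysis verdict turns negative. Kubernetes treats an HTTP probe response with a status from 200 to 399 as success and anything else as failure. The ANALYSIS dictionary, the /ready path, and the verdict values are all hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical shared state; a separate analysis process would update it.
ANALYSIS = {"verdict": "pending"}  # "pending", "pass", or "fail"


def probe_status(verdict):
    """Map an analysis verdict to the HTTP status returned to the probe.

    Pods stay ready while analysis is pending so that they receive the
    traffic the analysis needs; only an explicit failure marks them unready.
    """
    return 503 if verdict == "fail" else 200


class ReadinessHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/ready":
            self.send_error(404)
            return
        self.send_response(probe_status(ANALYSIS["verdict"]))
        self.end_headers()


def serve(port=8080):
    """Run the probe endpoint until interrupted."""
    HTTPServer(("", port), ReadinessHandler).serve_forever()
```

Once new-version pods report unready, they stop counting as available, so the rollout cannot keep replacing old pods while the failure persists. Restoring the previous version is still a separate action.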

OpsMx provides release safety scores indicating the health of the new version, as well as detailed diagnostics of the issues found during the rolling update analysis, as shown in Figure 3.

Figure 3: OpsMx Risk Assessment Report and Diagnostics for Kubernetes Rolling Update

OpsMx Solution Benefits

  • Validate and roll out service updates with low risk to production: With OpsMx real-time analysis, you can validate the service update beyond the basic container health check. OpsMx compares the new version of the service to the production baseline characteristics during the rolling update to calculate a safety score that accurately reflects the risks of the latest version. OpsMx can be configured to let Kubernetes complete the rollout with more certainty if the safety score is above the pass threshold, or to terminate the rollout if the release fails the analysis.
  • Identify the root cause of issues with the microservice update: The OpsMx microservice update risk assessment report provides a detailed sub-score for the components of each microservice across various metric groups. If significant deviations or issues are found between the latest version and the baseline version, OpsMx can be configured to stop the rollout. OpsMx automatically provides root cause analysis, including the offending code commit. Its in-depth analysis covers interactions between microservices and transactions to narrow down the problematic microservice. OpsMx thus saves the DevOps team time with fully automated issue and root cause identification.
  • Automated, scalable, and less error-prone: OpsMx microservice rolling update risk assessment is fully automated and can analyze thousands of metrics for every update through integration with existing data monitoring and collection tools. OpsMx can analyze known services or new, previously unseen services. OpsMx uses machine learning to learn the characteristics of a microservice and evaluate the latest version of that service. The OpsMx solution is scalable, consistent, and less error-prone, giving DevOps engineers a reliable method to perform rolling updates of Docker applications.
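
To show how sub-scores and a pass threshold could fit together, here is a toy Python sketch. It is not the OpsMx scoring algorithm, which this post does not describe: the group names, weights, the 0 to 100 scale, and the threshold of 80 are all assumptions made for illustration.

```python
def group_scores(checks):
    """Turn {group: {metric: passed}} into {group: score between 0 and 100}.

    Each group is assumed to contain at least one metric.
    """
    return {
        group: 100.0 * sum(results.values()) / len(results)
        for group, results in checks.items()
    }


def safety_score(scores, weights):
    """Weighted average of the group sub-scores."""
    total_weight = sum(weights[group] for group in scores)
    return sum(scores[group] * weights[group] for group in scores) / total_weight


def rollout_decision(score, pass_threshold=80.0):
    """Continue only when the score is above the pass threshold."""
    return "continue" if score > pass_threshold else "terminate"


if __name__ == "__main__":
    checks = {
        "performance": {"p99_latency": True, "cpu": True},
        "errors": {"http_5xx": False, "exceptions": True},
    }
    subs = group_scores(checks)
    overall = safety_score(subs, {"performance": 1.0, "errors": 1.0})
    print(subs, overall, rollout_decision(overall))
```

Keeping the per-group sub-scores alongside the overall score is what lets a report point at the failing area (here, the hypothetical "errors" group) instead of returning only a pass or fail.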

Summary

OpsMx provides an effective, data-driven, automated, real-time solution for validating rolling updates of Docker applications on Kubernetes. The OpsMx solution integrates with Kubernetes to analyze the latest version during its rollout. With the OpsMx solution, a DevOps team can reliably validate and roll out microservice updates with low deployment risk, scale to validate multiple deployments a day, reduce the time spent analyzing and debugging issues, and reduce human error in release decisions for applications built from multiple microservices. Overall, the OpsMx solution lowers the business risk of bad deployments and makes rolling updates safer.

For more information or a free trial of the OpsMx solution for rolling updates of Docker microservices using Kubernetes, Docker Swarm, Mesos, or Nomad, fill out the form below or email us at info@opsmx.com.


