How to Enable Prometheus Monitoring for a Kubernetes Cluster


Prometheus, an open-source monitoring and alerting toolkit, is becoming the most popular solution for monitoring Kubernetes clusters, for Spinnaker users and others alike. To perform manual or automated Red/Black or Canary analysis, a useful monitoring solution is essential.

In this blog, we will provide instructions on how to enable Prometheus monitoring for your Kubernetes cluster.

Background:

Prometheus is by default a “pull-based” metrics system. Each monitored node needs to expose its metrics on a “/metrics” endpoint (http://IP:9090/metrics), and the Prometheus server scrapes that endpoint at regular intervals, as defined by its scrape jobs, to collect the metrics. The nodes’ endpoint URLs must be configured on the Prometheus server so that it can reach them.
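To make the pull model concrete, here is a minimal Python sketch of what a scraper does with the text a “/metrics” endpoint returns. It handles only a simplified subset of the Prometheus text format, and the SAMPLE payload is made up for illustration (it is not real cAdvisor output).

```python
from typing import Dict

# Illustrative payload only; a real /metrics response is much longer.
SAMPLE = """\
# HELP container_cpu_usage_seconds_total Cumulative cpu time consumed in seconds.
# TYPE container_cpu_usage_seconds_total counter
container_cpu_usage_seconds_total{id="/",cpu="cpu00"} 1234.5
container_memory_usage_bytes{id="/"} 1048576 1510000000000
process_open_fds 12
"""


def parse_metrics(text: str) -> Dict[str, float]:
    """Map each series (name plus label block, as written) to its value."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Label values may contain spaces, so split after the closing brace.
            idx = line.rindex("}") + 1
            series, rest = line[:idx], line[idx:]
        else:
            series, _, rest = line.partition(" ")
        # The first field is the value; an optional timestamp may follow.
        samples[series] = float(rest.split()[0])
    return samples


if __name__ == "__main__":
    for series, value in parse_metrics(SAMPLE).items():
        print(series, value)
```

The Prometheus server does this parsing for you; the sketch is only meant to show that the endpoint serves plain text, one sample per line.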

Enabling Prometheus Exporter on the Worker Nodes

To enable a Prometheus exporter, launch a cAdvisor container on every worker node in the cluster. For more information on cAdvisor, check out https://github.com/google/cadvisor

sudo docker run \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:rw \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --volume=/dev/disk/:/dev/disk:ro \
  --publish=9090:8080 \
  --detach=true \
  --name=cadvisor \
  --restart always \
  google/cadvisor:latest

The above command launches a cAdvisor container on the node and exposes its metrics at http://<NODEIP>:9090/metrics. Run it on all worker nodes.

The next step is to configure the node IPs in the Prometheus server so that it can pull the pod metrics from the nodes.

Enabling Prometheus Server

Step 1: Download Prometheus server from https://prometheus.io/download/

Step 2: Extract the .tar file and edit “prometheus.yml”. Add the configuration below under the “scrape_configs:” section, using the IPs of the nodes where the cAdvisor containers were started in the previous section.

scrape_configs:
  - job_name: 'K8sCluster'
    honor_labels: true
    static_configs:
      - targets: ['<NODE1_IP>:9090']
      - targets: ['<NODE2_IP>:9090']
      - targets: ['<NODE3_IP>:9090']
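For clusters with more than a few nodes, the scrape section above could be generated rather than typed by hand. The Python sketch below is one way to do that, assuming you already have the list of node IPs. The function name render_scrape_config and the 192.0.2.x addresses are hypothetical, and the sketch puts all nodes into a single targets list, which Prometheus treats the same way as one entry per node.

```python
from typing import Iterable


def render_scrape_config(node_ips: Iterable[str], port: int = 9090,
                         job_name: str = "K8sCluster") -> str:
    """Render a scrape_configs section with one static target per node."""
    # All nodes go into one targets list instead of one entry per node.
    targets = ", ".join(f"'{ip}:{port}'" for ip in node_ips)
    lines = [
        "scrape_configs:",
        f"  - job_name: '{job_name}'",
        "    honor_labels: true",
        "    static_configs:",
        f"      - targets: [{targets}]",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder addresses; replace with your worker node IPs.
    print(render_scrape_config(["192.0.2.11", "192.0.2.12", "192.0.2.13"]))
```

Paste the printed section into “prometheus.yml” in place of a hand-written one.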

Step 3: Start Prometheus server

nohup ./prometheus --config.file=prometheus.yml &

Step 4: Check the status of the exporters (the monitored nodes) on the “/targets” page of the Prometheus server (http://<PROMETHEUS_SERVER_IP>:9090/targets)

Prometheus Targets
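The same target status can also be checked from a script. The sketch below assumes a Prometheus version that provides the /api/v1/targets HTTP API and that the server is reachable at the base URL you pass in. The helper names fetch_targets and unhealthy_targets are illustrative and not part of any Prometheus client library.

```python
import json
from typing import List
from urllib.request import urlopen


def fetch_targets(base_url: str) -> dict:
    """GET <base_url>/api/v1/targets and return the decoded JSON body."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


def unhealthy_targets(payload: dict) -> List[str]:
    """Return the scrape URLs of active targets whose health is not 'up'."""
    active = payload.get("data", {}).get("activeTargets", [])
    return [t.get("scrapeUrl", "") for t in active if t.get("health") != "up"]


if __name__ == "__main__":
    # Assumes the Prometheus server runs locally on its default port.
    down = unhealthy_targets(fetch_targets("http://localhost:9090"))
    print("all targets up" if not down else f"down: {down}")
```

An empty list means every configured cAdvisor endpoint was scraped successfully on the last attempt.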

Step 5:  Check relevant metrics data

Prometheus Metric Graph
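Metric data can be read programmatically as well as through the graph page. The following sketch builds an instant-query URL for the Prometheus HTTP API (/api/v1/query) and pulls the values out of a vector result. The helper names are hypothetical, and the example expression “up” is just a convenient built-in series to start with.

```python
import json
from typing import Dict
from urllib.parse import urlencode
from urllib.request import urlopen


def query_url(base_url: str, expr: str) -> str:
    """Build the instant-query URL for a PromQL expression."""
    return f"{base_url}/api/v1/query?{urlencode({'query': expr})}"


def instant_values(payload: dict) -> Dict[str, float]:
    """Map each series' 'instance' label to its current value."""
    # Each vector sample carries "value": [timestamp, "value-as-string"].
    return {
        series["metric"].get("instance", ""): float(series["value"][1])
        for series in payload["data"]["result"]
    }


if __name__ == "__main__":
    # Assumes the Prometheus server runs locally on its default port.
    with urlopen(query_url("http://localhost:9090", "up"), timeout=5) as resp:
        print(instant_values(json.load(resp)))
```

For the “up” series, a value of 1.0 for an instance means its last scrape succeeded and 0.0 means it failed.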

That’s it, you are done.  For more advanced configurations, check out the complete documentation at https://prometheus.io/docs/introduction/overview/

You can connect this Prometheus server to your Automated Canary Analysis system to be able to perform automated release analysis. If you need further assistance, email info@opsmx.com

Spinnaker Use Cases at Cisco, Intuit and Pure Storage

OpsMx, along with the Netflix Spinnaker team, organized the second Spinnaker meetup in the South Bay. More than 100 Spinnaker users attended the event. Three Spinnaker users (Cisco, Intuit and Pure Storage) shared their use cases, learnings and best practices. Andy Glover from the Netflix Spinnaker team also took questions from the audience.

If you missed the event, here are the videos (Note: the Intuit Use Case video is no longer available). If you don’t want to miss future events, sign up for the meetup group.

Cisco Use Case:

Cisco Spinnaker Use Case

Intuit Use Case:

Intuit Spinnaker Use Case

Pure Storage Use Case:

Pure Storage Spinnaker Use Case

Community Q&A – Andy Glover

Spinnaker Community Q&A - Andy Glover

Automated Canary Analysis Featured in Spinnaker Summit 2017

250+ Spinnaker community members recently attended the 1st Spinnaker Summit at the Netflix campus in Los Gatos.   Enthusiasm for Spinnaker is high as enterprises look to achieve continuous delivery with safety across multiple clouds.   Users from Netflix, Schibsted, Cloudera, Under Armour,  Capital One, Cisco, Gogo Air, Target,  Scopely, and Cerner shared their use cases and best practices for running Spinnaker in production.

Andy Glover from Netflix started off the event by encouraging the community to use Spinnaker as more than a multi-cloud deployment tool. One of the key strategies he asked enterprises to consider is Automated Canary Analysis (ACA) to achieve safety in their continuous deployment. Sure enough, a lot of time at the summit was dedicated to discussing the benefits of ACA and its real-world use cases. ACA allows for reliable, less error-prone and safe continuous deployment.

In this blog, I have captured three separate sessions that you should watch to better understand ACA for your application deployments. If you are interested in evaluating ACA, please email us at info@opsmx.com and we will be glad to provide guidance and offer a pilot of our OpsMx ACA platform.

1.  Canary Analysis at Netflix

2.  Automated Canary Analysis Panel with participation from OpsMx, Google, Netflix, and Armory.

3.  Kayenta: Automated Canary Analysis from Google & Netflix

If you are interested in watching all the summit sessions, check out the complete playlist.

 

Spinnaker MeetUp Video – Essence of and Safety for Continuous Delivery

The total solar eclipse was obviously the biggest news on 8/21/17, but in the Spinnaker world the day marked the first meetup of the South Bay Spinnaker Meetup at the Netflix campus.

We had 50+ people show up including operators from Cloudera, Cisco, Symantec, etc. which was a great start for our first meetup.

We had two great presentations on the day. In case you missed the event, here are the videos of the talks.

1.  Essence of Continuous Delivery

Andy Glover – Manager, Netflix Engineering Team – Spinnaker Creators

2.  Safety for Continuous Delivery

Gopal Dommety, CEO, OpsMx

For more information about the OpsMx solution, check out http://blog.opsmx.com

If you would like to be notified of our future events, make sure you sign up for the Meetup. If you would like to speak at one of the future Meetups, please contact the organizers through the Meetup page.

Exciting Spinnaker Meetup on August 21st at Netflix

If you are an existing Spinnaker user or someone looking to enable Continuous Delivery (CD) in your organization,  don’t miss out on an exciting meetup event happening on August 21st at the Netflix campus in Los Gatos, CA.

We have the Spinnaker founding team and the “Father of Spinnaker”, Andy Glover of the Netflix Engineering team, presenting on the topic of Essence of Continuous Delivery.

If that’s not enough of a motivation, we also have a session discussing use cases of Automated Canary Analysis (ACA) for Spinnaker. As you may recall, Canary/ACA stages were introduced recently in the Spinnaker 1.1 release. You may also be interested in the pilot of our automated canary judge.

More information about the meetup can be found here:

Essence of Continuous Delivery/Automated Canary Analysis Use Cases

If you can’t make it on the day, don’t worry: we will record it and post it. If you want to get notified of future events from the Meetup group, make sure to join the group. We hope to schedule more events in the upcoming months, and if you are interested in being a speaker, drop a note to info@opsmx.com.

 

Participate in Pilot of Automated Canary Analysis Platform

OpsMx is looking for Spinnaker deployers interested in implementing automated deployment analysis for Canary/Red-Black deployments. We are looking for a high-touch pilot engagement in which you help us design our product and, in return, get to use it at no cost during the pilot.

Automated Red/Black Analysis in Spinnaker Pipeline

For more information about the benefits and features of our automated release analysis for Spinnaker,  check out the following blog.

Improve Release Safety and Diagnostics Through Automated Canary Analysis for Spinnaker

The pilot is a full-featured SaaS offer of our platform for an unlimited number of pipelines in Spinnaker.  If you are interested in learning more or starting the pilot, submit the form below or e-mail us at info@opsmx.com and we will get you started.

 


Validate Software Releases Through Application Behavior Analysis

Introduction

Enterprises are moving to a DevOps model and adopting agile software development methodologies to accelerate software delivery. As the velocity of software development increases, so does the need to speed up testing and validation. Enterprises usually use some combination of the following to validate new releases: functional testing, integration testing, end-to-end system testing, user acceptance testing, accessibility testing, load or stress testing, security testing and staging environment validation. These organizations also need to find issues earlier in the testing cycle, as the cost of finding issues later or in production is much higher (Figure 1).

Figure 1: Cost Rises Exponentially for Defects Found in Later Stages/Production

In this blog, we will look at ways enterprises can use automated, in-depth analysis of application behavior to validate releases more quickly, find defects early and spend less time diagnosing issues.


Safer Rolling Update for Docker Applications with Kubernetes

Introduction

Docker containers have enabled applications to be developed and deployed faster due to their superior portability between development and production environments and across multiple clouds. Enterprises have started adopting Docker with a microservices architecture for their new applications. One of the emerging requirements among these early adopters is the need to seamlessly update versions of the microservices in production. To address this need, container orchestration systems like Kubernetes, Docker Swarm, Mesos, and Nomad have introduced support for rolling updates. In a rolling update (Figure 1), applications are updated with zero downtime by incrementally updating pods/container instances with the new version.

Figure 1: Kubernetes Rolling Update (Source: Kubernetes.io)

In this blog, we will discuss how to achieve safer rolling updates of Docker applications in a Kubernetes environment through automated in-depth analysis of the microservice updates.

Improve Release Safety and Diagnostics Through Automated Canary Analysis for Spinnaker

Introduction

Spinnaker is a continuous delivery platform that is pioneering the ability to release software faster. It is allowing thousands of enterprises to achieve release velocity never seen before.  The key to increasing the velocity is to have the ability to determine with confidence that the new release can be promoted across different testing stages and eventually to production through Canary or Red/Black (aka blue-green) or Rolling update.  

Leading enterprises (like Netflix, which deploys more than 4000 updates a day) have proprietary decision engines that allow them to promote builds to production with confidence. However, most enterprises still depend on manual analysis and judgment to promote builds. Manual judgment is error-prone, as decisions are based on incomplete analysis, and time-consuming, as the analysis is laborious. Bad builds in production introduce significant risks due to business disruptions and brand damage.

OpsMx is a real-time analytics platform for CI/CD pipelines that is designed to aid manual decisions in promoting builds across test stages and deployment to production. The OpsMx solution helps reduce errors and diagnostics time through complete, consistent, real-time automated analysis for Spinnaker.

How to Set Up Automated Release Analysis in Spinnaker Deployments

Updated 11/16/2017

Spinnaker, a continuous delivery platform, is becoming popular among software deployment teams for its superior ability to orchestrate software delivery faster, more safely and across multiple clouds. With the 1.1 release, Spinnaker introduced the ability to do even safer release deployment with the Canary deploy option (using the canary feature flag). Before this release, Red/Black deployment offered safer deployment with the ability to roll back to the prior release if the new release had issues. Rolling Red/Black is also planned for a future release of Spinnaker. To truly benefit from these safe deployment strategies, it is critical to enable automated canary analysis for the deployments.

In this blog, we will review how to set up automated analysis for Canary and Red/Black deployment strategies instead of manual judgment or verification.

Safe Deployment Options with Spinnaker
