Getting Started with Argo Rollouts and AWS ALB
This post will get you started using Argo Rollouts with an AWS ALB, including Terraform code to install Rollouts and an example app that demonstrates a canary deployment. All the referenced Terraform code can be obtained here.
These are the providers we'll be using in this environment. You may need to adjust how the helm and kubectl providers obtain the cluster name and token for your environment.
Providers/Versions
providers.tf
locals {
  env    = "sandbox"
  region = "us-east-1"
}

provider "aws" {
  region = local.region
  default_tags {
    tags = {
      env       = local.env
      terraform = true
    }
  }
}

provider "helm" {
  kubernetes {
    host                   = module.eks-cluster.endpoint
    cluster_ca_certificate = base64decode(module.eks-cluster.certificate)
    exec {
      api_version = "client.authentication.k8s.io/v1beta1"
      # This requires the awscli to be installed locally where Terraform is executed
      args    = ["eks", "get-token", "--cluster-name", module.eks-cluster.name]
      command = "aws"
    }
  }
}
versions.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.11.0"
    }
  }
  required_version = "~> 1.5.7"
}
Module
Initialize the module where needed. Here we're installing Argo Rollouts to your K8s cluster through Helm and providing a values file through the templatefile function so we can have variable substitution. In this demo I'm using a public LB; in practice, you should put the dashboard behind an internal LB that is only reachable over VPN.
module "argo_rollouts" {
  source                = "../../modules/argo_rollouts"
  name                  = "argo-rollouts"
  env                   = local.env
  region                = local.region
  argo_rollouts_version = "2.32.7"
  loadbalancer_dns      = module.public_loadbalancer.dns_name
  fqdn                  = "argorollouts.sandbox.demo"
}
Module files
main.tf
resource "helm_release" "argo_rollouts" {
  namespace        = "argo-rollouts"
  create_namespace = true
  name             = "argo-rollouts"
  repository       = "https://argoproj.github.io/argo-helm"
  chart            = "argo-rollouts"
  version          = var.argo_rollouts_version
  values = [templatefile("../../modules/argo_rollouts/files/values.yaml", {
    ENV     = var.env
    FQDN    = var.fqdn
    LB_NAME = "${var.env}-public-application"
  })]
}
In this values file, we're enabling the dashboard and using the ALB controller for the ingress. This example uses a shared LB by setting the "group.name" annotation. Also take note of the node affinity to my core node group, since we don't want these pods shifted to nodes managed by Karpenter.
values.yaml
dashboard:
  enabled: true
  ingress:
    enabled: true
    ingressClassName: alb
    hosts:
      - ${FQDN}
    annotations:
      alb.ingress.kubernetes.io/backend-protocol: HTTP
      alb.ingress.kubernetes.io/group.name: ${LB_NAME}
      alb.ingress.kubernetes.io/healthcheck-interval-seconds: "30"
      alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}]'
      alb.ingress.kubernetes.io/load-balancer-attributes: routing.http2.enabled=true
      alb.ingress.kubernetes.io/load-balancer-name: ${LB_NAME}
      alb.ingress.kubernetes.io/scheme: internet-facing
      alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-FS-1-2-2019-08
      alb.ingress.kubernetes.io/tags: "env=${ENV},terraform=true"
      alb.ingress.kubernetes.io/target-type: ip
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: role
                operator: In
                values:
                  - core
controller:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: role
                operator: In
                values:
                  - core
variables.tf
variable "argo_rollouts_version" {
  type = string
}

variable "env" {
  type = string
}

variable "fqdn" {
  type = string
}

variable "loadbalancer_dns" {
  type = string
}

variable "name" {
  type = string
}

variable "region" {
  type = string
}
Adjust this for your DNS provider.
dns.tf
resource "cloudflare_record" "argo_rollouts" {
  zone_id         = "your_zone_id"
  name            = "argorollouts.${var.env}"
  value           = var.loadbalancer_dns
  type            = "CNAME"
  ttl             = 3600
  allow_overwrite = true
}
Demo App
Once it's installed to your K8s cluster, you should be able to reach the Argo Rollouts dashboard. You won't see anything yet until we deploy an app with the Rollout CRD. For this demo, I'm going to use Argo's demo app, since it has a nifty UI that shows the canary deployment steps in real time.
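A quick way to confirm the install is healthy, or to reach the dashboard without the ingress. This assumes you have the kubectl-argo-rollouts plugin installed locally; the namespace matches the Helm release above:

```shell
# Check that the controller and dashboard pods are running
kubectl -n argo-rollouts get pods

# Alternatively, serve the dashboard locally (defaults to http://localhost:3100)
kubectl argo rollouts dashboard
```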
The Rollout CRD replaces our Deployment manifest, and most of the structure under "template" is the same. There are a few things I want to point out:
- We’re specifying two services that you will create in the next steps.
- The rollbackWindow setting specifies how many revisions we can instantly roll back to.
- The steps section can have more or fewer steps depending on your needs. The durations are short here for demo purposes.
- We're setting ALB as our traffic routing mechanism. This will modify our LB rule with two weights, one for the canary target group and one for the stable (original) target group.
rollout.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollouts-demo
  namespace: demo
spec:
  strategy:
    canary:
      canaryService: rollouts-demo-canary
      stableService: rollouts-demo-stable
      maxSurge: "25%"
      maxUnavailable: 0
      dynamicStableScale: true
      trafficRouting:
        alb:
          ingress: rollouts-demo-ingress
          servicePort: 80
      steps:
        - setWeight: 5
        - pause: { duration: 30s }
        - setWeight: 10
        - pause: { duration: 30s }
        - setWeight: 15
        - pause: { duration: 30s }
        - setWeight: 25
        - pause: { duration: 30s }
  rollbackWindow:
    revisions: 3
  revisionHistoryLimit: 5
  selector:
    matchLabels:
      app: rollouts-demo
  template:
    metadata:
      labels:
        app: rollouts-demo
    spec:
      containers:
        - name: rollouts-demo
          image: argoproj/rollouts-demo:blue
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          resources:
            requests:
              memory: 32Mi
              cpu: 5m
You only need this if you already use HPA. The key part is the scaleTargetRef, which tells HPA to scale the Rollout CRD instead of a Deployment.
hpa.yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: rollouts-demo
  namespace: demo
spec:
  maxReplicas: 6
  minReplicas: 2
  scaleTargetRef:
    apiVersion: argoproj.io/v1alpha1
    kind: Rollout
    name: rollouts-demo
  targetCPUUtilizationPercentage: 80
Here we're creating two services to differentiate between releases. This is important for the ALB controller, which needs a separate target group for each service.
services.yaml
apiVersion: v1
kind: Service
metadata:
  name: rollouts-demo-canary
  namespace: demo
spec:
  type: ClusterIP
  ports:
    - port: 80
      targetPort: http
      protocol: TCP
      name: http
  selector:
    app: rollouts-demo
---
apiVersion: v1
kind: Service
metadata:
  name: rollouts-demo-stable
  namespace: demo
spec:
  type: ClusterIP
  ports:
    - port: 80
      targetPort: http
      protocol: TCP
      name: http
  selector:
    app: rollouts-demo
The hostname will need to be modified for your environment. In this example, I'm using an existing public ALB with SSL. My post on setting up an ALB can be seen here.
ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: rollouts-demo-ingress
  namespace: demo
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/backend-protocol: HTTP
    alb.ingress.kubernetes.io/group.name: sandbox-public-application
    alb.ingress.kubernetes.io/healthcheck-interval-seconds: "30"
    alb.ingress.kubernetes.io/healthcheck-path: /
    alb.ingress.kubernetes.io/healthcheck-port: "8080"
    alb.ingress.kubernetes.io/healthcheck-protocol: HTTP
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}]'
    alb.ingress.kubernetes.io/load-balancer-name: sandbox-public-application
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-FS-1-2-2019-08
    alb.ingress.kubernetes.io/tags: environment=sandbox,service=networking
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/ssl-redirect: '443'
spec:
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: rollouts-demo-stable
                port:
                  name: use-annotation
      host: demo.sandbox.demo
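With all four manifests in place, deploying the demo app looks roughly like this. The file names here are assumptions based on the sections above:

```shell
# Create the namespace and apply the demo manifests
kubectl create namespace demo
kubectl apply -f rollout.yaml -f services.yaml -f ingress.yaml -f hpa.yaml

# Watch the Rollout's status and canary steps from the terminal
kubectl argo rollouts get rollout rollouts-demo -n demo --watch
```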
Once the demo app is up and running, you should see a very cool UI that continuously sends traffic and shows which app version is active. Also, if you check your ALB ruleset, you should see it pointing to two target groups: one with 100% of the traffic and the other with 0%.
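If you prefer the CLI over the console, you can inspect the weighted forward action on the ALB's HTTPS listener. This sketch assumes the LB name from the group.name annotation and that your AWS credentials are configured:

```shell
# Look up the ALB and its HTTPS listener, then dump the rules;
# the forward config lists both target groups with their weights
LB_ARN=$(aws elbv2 describe-load-balancers --names sandbox-public-application \
  --query 'LoadBalancers[0].LoadBalancerArn' --output text)
LISTENER_ARN=$(aws elbv2 describe-listeners --load-balancer-arn "$LB_ARN" \
  --query 'Listeners[?Port==`443`].ListenerArn' --output text)
aws elbv2 describe-rules --listener-arn "$LISTENER_ARN" \
  --query 'Rules[].Actions[].ForwardConfig.TargetGroups'
```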
The Argo Rollouts dashboard should also show the app under the namespace it was deployed to. There you can see the list of steps, and once we start the deployment, it will show the progress in real time.
Now the fun begins with seeing it in action. Change the "argoproj/rollouts-demo" image tag in the Rollout manifest from "blue" to "yellow" and save. You will see the demo app start to send traffic to the yellow version. If you look at your ALB ruleset, you will see the weights change at each step. After all the steps complete, 100% of traffic will be sent to the latest version.
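Instead of editing the manifest by hand, the kubectl plugin can trigger the same update, and it also gives you manual controls mid-rollout. The container name here matches the Rollout spec above:

```shell
# Trigger a canary by updating the image tag
kubectl argo rollouts set image rollouts-demo \
  rollouts-demo=argoproj/rollouts-demo:yellow -n demo

# Skip the remaining pause steps and promote straight to 100%
kubectl argo rollouts promote rollouts-demo -n demo

# Or bail out and shift all traffic back to the stable version
kubectl argo rollouts abort rollouts-demo -n demo
```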
In a future post, I will demonstrate the analysis feature, which can query Prometheus metrics for 500s and roll back automatically if the error threshold is reached.