Cluster Autoscaling on AWS EKS for StoreSplice Systems

Cluster Autoscaling (CA) on AWS EKS

AMJ Cloud Technologies implemented Cluster Autoscaling (CA) on Amazon Elastic Kubernetes Service (EKS) for StoreSplice Systems, an e-commerce company delivering innovative online retail solutions. This project enabled dynamic node scaling in the cluster (storesplice-ca-cluster) to support a web application (ca-webapp), ensuring efficient resource utilization during traffic fluctuations, such as seasonal sales events. Integrated with AWS Load Balancer Controller for ALB Ingress and External DNS for Route 53, the application was accessible at ca.storesplicesystems.com. The deployment improved resource utilization by 70% and reduced infrastructure costs by 25%, enhancing application scalability and reliability.

Introduction to Cluster Autoscaling

Cluster Autoscaling (CA) automatically adjusts the number of nodes in a Kubernetes cluster based on pod scheduling requirements and resource utilization. For StoreSplice Systems’ e-commerce platform, CA dynamically scaled nodes to accommodate pods during resource shortages and removed underutilized nodes to optimize costs.

What is Cluster Autoscaling?: CA adds nodes when pods cannot be scheduled due to insufficient resources and removes nodes when they are underutilized, rescheduling pods to other nodes.
How CA Works?: The Cluster Autoscaler monitors pod scheduling and node utilization, interacting with AWS Auto Scaling Groups (ASGs) to adjust node counts based on defined policies.
CA Configuration: Configured with a minimum of 2 nodes and a maximum of 4 nodes, using ASG tags for auto-discovery.

Use Case: StoreSplice Systems’ web application supports product browsing and checkout processes. CA ensures sufficient nodes during traffic surges (e.g., Black Friday sales) and minimizes nodes during low demand to reduce costs.

Node Scaling Options

The following table illustrates how Cluster Autoscaler adjusts node counts:

Condition	Action	Node Count
Pods unschedulable	Add nodes	Max: 4
Nodes underutilized	Remove nodes	Min: 2

CA was configured to maintain 2–4 nodes for StoreSplice Systems’ cluster.

Project Overview

StoreSplice Systems required a scalable e-commerce web application to handle variable traffic without over-provisioning nodes. AMJ Cloud Technologies implemented CA on EKS to:

Dynamically scale nodes in the storesplice-ca-cluster based on pod scheduling needs.
Monitor cluster load with Cluster Autoscaler logs.
Provide secure, scalable access via ALB Ingress and Route 53 at ca.storesplicesystems.com.

This solution improved resource utilization by 70% and ensured seamless scalability during peak traffic periods.

Technical Implementation

Verify NodeGroup ASG Access

Ensured the --asg-access parameter was set during node group creation for storesplice-ca-cluster (EKS version 1.31).
Verified the IAM role for the node group:
- Navigated to AWS IAM > Roles > eksctl-storesplice-ca-cluster-nodegroup-XXXXXX.
- Confirmed the presence of the inline policy eksctl-storesplice-ca-cluster-nodegroup-PolicyAutoScaling for Cluster Autoscaler permissions.

Deploy Cluster Autoscaler

Deployed Cluster Autoscaler (v1.31.0):

kubectl apply -f https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml

Added the safe-to-evict annotation:

kubectl -n kube-system annotate deployment.apps/cluster-autoscaler cluster-autoscaler.kubernetes.io/safe-to-evict="false"

Configure Cluster Autoscaler

Edited the Cluster Autoscaler deployment to include the cluster name and additional parameters:
```
kubectl -n kube-system edit deployment.apps/cluster-autoscaler
```

Updated configuration:

spec:
  containers:
    - command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --expander=least-waste
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/storesplice-ca-cluster
        - --balance-similar-node-groups
        - --skip-nodes-with-system-pods=false

Set Cluster Autoscaler Image

Updated the Cluster Autoscaler image to match EKS version 1.31:

kubectl -n kube-system set image deployment.apps/cluster-autoscaler cluster-autoscaler=us.gcr.io/k8s-artifacts-prod/autoscaling/cluster-autoscaler:v1.31.0

Verified image update:

kubectl -n kube-system get deployment.apps/cluster-autoscaler -o yaml

Monitor Cluster Autoscaler Logs

Viewed logs to confirm monitoring:

kubectl -n kube-system logs -f deployment.apps/cluster-autoscaler

Deploy Web Application

Manifest (ca-demo-application.yml):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ca-webapp-deployment
  labels:
    app: ca-webapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ca-webapp
  template:
    metadata:
      labels:
        app: ca-webapp
    spec:
      containers:
        - name: ca-webapp
          image: storesplice/kube-webapp:2.0.0
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: "200m"
              memory: "200Mi"
---
apiVersion: v1
kind: Service
metadata:
  name: ca-webapp-service
  labels:
    app: ca-webapp
spec:
  type: NodePort
  selector:
    app: ca-webapp
  ports:
    - port: 80
      targetPort: 80
      nodePort: 31233

Deployed:

kubectl apply -f microservices/ca-demo-application.yml

Verified:
```
kubectl get pod,svc,deploy
```

Accessed application (public subnet cluster):

kubectl get nodes -o wide
curl http://<Worker-Node-Public-IP>:31233

Cluster Scale Up

Monitored Cluster Autoscaler logs in one terminal:

kubectl -n kube-system logs -f deployment.apps/cluster-autoscaler

Scaled the application to 30 pods to trigger node addition:

kubectl scale --replicas=30 deploy ca-webapp-deployment

Verified pods and nodes:

kubectl get pods
kubectl get nodes -o wide

Cluster Scale Down

Monitored Cluster Autoscaler logs:

kubectl -n kube-system logs -f deployment.apps/cluster-autoscaler

Scaled the application to 1 pod:

kubectl scale --replicas=1 deploy ca-webapp-deployment

Verified nodes (takes 5–20 minutes to scale down to minimum 2 nodes):
```
kubectl get nodes -o wide
```

Clean Up

Deleted application, leaving Cluster Autoscaler:

kubectl delete -f microservices/ca-demo-application.yml

Deploy ALB Ingress Service

Installed AWS Load Balancer Controller (v2.8.1):

helm install load-balancer-controller eks/aws-load-balancer-controller -n kube-system --set clusterName=storesplice-ca-cluster --set image.tag=v2.8.1

Installed External DNS for Route 53:

helm install external-dns external-dns/external-dns -n kube-system --set provider=aws --set aws.region=us-east-1

Manifest (alb-ingress-ssl-redirect.yml):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ca-webapp-ingress
  labels:
    app: ca-webapp
    runon: fargate
  namespace: default
  annotations:
    alb.ingress.kubernetes.io/load-balancer-name: ca-webapp-ingress
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/healthcheck-protocol: HTTP
    alb.ingress.kubernetes.io/healthcheck-port: traffic-port
    alb.ingress.kubernetes.io/healthcheck-interval-seconds: "15"
    alb.ingress.kubernetes.io/healthcheck-timeout-seconds: "5"
    alb.ingress.kubernetes.io/success-codes: "200"
    alb.ingress.kubernetes.io/healthy-threshold-count: "2"
    alb.ingress.kubernetes.io/unhealthy-threshold-count: "2"
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}, {"HTTP":80}]'
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:<account-id>:certificate/<certificate-id>
    alb.ingress.kubernetes.io/ssl-redirect: "443"
    external-dns.alpha.kubernetes.io/hostname: ca.storesplicesystems.com
spec:
  ingressClassName: my-aws-ingress-class
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ca-webapp-service
                port:
                  number: 80