Skip to main content
Nauman Munir
Back to Projects
PortfolioE-commerceManaged KubernetesMulti Cloud Strategy & Architecture

Cluster Autoscaling on AWS EKS for StoreSplice Systems

AMJ Cloud Technologies deployed Cluster Autoscaling (CA) on AWS EKS for StoreSplice Systems, enabling dynamic node scaling for an e-commerce web application to handle variable traffic, integrated with AWS Load Balancer Controller and Route 53 for optimal performance.

6 min read

Technologies

AWS EKSCluster AutoscalerAWS Load Balancer ControllerKubernetes IngressExternal DNSAWS Route 53AWS Certificate ManagerDocker

Cluster Autoscaling (CA) on AWS EKS

AMJ Cloud Technologies implemented Cluster Autoscaling (CA) on Amazon Elastic Kubernetes Service (EKS) for StoreSplice Systems, an e-commerce company delivering innovative online retail solutions. This project enabled dynamic node scaling in the cluster (storesplice-ca-cluster) to support a web application (ca-webapp), ensuring efficient resource utilization during traffic fluctuations, such as seasonal sales events. Integrated with AWS Load Balancer Controller for ALB Ingress and External DNS for Route 53, the application was accessible at ca.storesplicesystems.com. The deployment improved resource utilization by 70% and reduced infrastructure costs by 25%, enhancing application scalability and reliability.

Introduction to Cluster Autoscaling

Cluster Autoscaling (CA) automatically adjusts the number of nodes in a Kubernetes cluster based on pod scheduling requirements and resource utilization. For StoreSplice Systems’ e-commerce platform, CA dynamically scaled nodes to accommodate pods during resource shortages and removed underutilized nodes to optimize costs.

  • What is Cluster Autoscaling?: CA adds nodes when pods cannot be scheduled due to insufficient resources and removes nodes when they are underutilized, rescheduling pods to other nodes.
  • How CA Works?: The Cluster Autoscaler monitors pod scheduling and node utilization, interacting with AWS Auto Scaling Groups (ASGs) to adjust node counts based on defined policies.
  • CA Configuration: Configured with a minimum of 2 nodes and a maximum of 4 nodes, using ASG tags for auto-discovery.

Use Case: StoreSplice Systems’ web application supports product browsing and checkout processes. CA ensures sufficient nodes during traffic surges (e.g., Black Friday sales) and minimizes nodes during low demand to reduce costs.

Node Scaling Options

The following table illustrates how Cluster Autoscaler adjusts node counts:

ConditionActionNode Count
Pods unschedulableAdd nodesMax: 4
Nodes underutilizedRemove nodesMin: 2

CA was configured to maintain 2–4 nodes for StoreSplice Systems’ cluster.

Project Overview

StoreSplice Systems required a scalable e-commerce web application to handle variable traffic without over-provisioning nodes. AMJ Cloud Technologies implemented CA on EKS to:

  • Dynamically scale nodes in the storesplice-ca-cluster based on pod scheduling needs.
  • Monitor cluster load with Cluster Autoscaler logs.
  • Provide secure, scalable access via ALB Ingress and Route 53 at ca.storesplicesystems.com.

This solution improved resource utilization by 70% and ensured seamless scalability during peak traffic periods.

Technical Implementation

Verify NodeGroup ASG Access

  • Ensured the --asg-access parameter was set during node group creation for storesplice-ca-cluster (EKS version 1.31).
  • Verified the IAM role for the node group:
    • Navigated to AWS IAM > Roles > eksctl-storesplice-ca-cluster-nodegroup-XXXXXX.
    • Confirmed the presence of the inline policy eksctl-storesplice-ca-cluster-nodegroup-PolicyAutoScaling for Cluster Autoscaler permissions.

Deploy Cluster Autoscaler

  • Deployed Cluster Autoscaler (v1.31.0):
    kubectl apply -f https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml
  • Added the safe-to-evict annotation:
    kubectl -n kube-system annotate deployment.apps/cluster-autoscaler cluster-autoscaler.kubernetes.io/safe-to-evict="false"

Configure Cluster Autoscaler

  • Edited the Cluster Autoscaler deployment to include the cluster name and additional parameters:
    kubectl -n kube-system edit deployment.apps/cluster-autoscaler
  • Updated configuration:
    spec:
      containers:
        - command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=aws
            - --skip-nodes-with-local-storage=false
            - --expander=least-waste
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/storesplice-ca-cluster
            - --balance-similar-node-groups
            - --skip-nodes-with-system-pods=false

Set Cluster Autoscaler Image

  • Updated the Cluster Autoscaler image to match EKS version 1.31:
    kubectl -n kube-system set image deployment.apps/cluster-autoscaler cluster-autoscaler=us.gcr.io/k8s-artifacts-prod/autoscaling/cluster-autoscaler:v1.31.0
  • Verified image update:
    kubectl -n kube-system get deployment.apps/cluster-autoscaler -o yaml

Monitor Cluster Autoscaler Logs

  • Viewed logs to confirm monitoring:
    kubectl -n kube-system logs -f deployment.apps/cluster-autoscaler

Deploy Web Application

  • Manifest (ca-demo-application.yml):
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: ca-webapp-deployment
      labels:
        app: ca-webapp
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: ca-webapp
      template:
        metadata:
          labels:
            app: ca-webapp
        spec:
          containers:
            - name: ca-webapp
              image: storesplice/kube-webapp:2.0.0
              ports:
                - containerPort: 80
              resources:
                requests:
                  cpu: "200m"
                  memory: "200Mi"
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: ca-webapp-service
      labels:
        app: ca-webapp
    spec:
      type: NodePort
      selector:
        app: ca-webapp
      ports:
        - port: 80
          targetPort: 80
          nodePort: 31233
  • Deployed:
    kubectl apply -f microservices/ca-demo-application.yml
  • Verified:
    kubectl get pod,svc,deploy
  • Accessed application (public subnet cluster):
    kubectl get nodes -o wide
    curl http://<Worker-Node-Public-IP>:31233

Cluster Scale Up

  • Monitored Cluster Autoscaler logs in one terminal:
    kubectl -n kube-system logs -f deployment.apps/cluster-autoscaler
  • Scaled the application to 30 pods to trigger node addition:
    kubectl scale --replicas=30 deploy ca-webapp-deployment
  • Verified pods and nodes:
    kubectl get pods
    kubectl get nodes -o wide

Cluster Scale Down

  • Monitored Cluster Autoscaler logs:
    kubectl -n kube-system logs -f deployment.apps/cluster-autoscaler
  • Scaled the application to 1 pod:
    kubectl scale --replicas=1 deploy ca-webapp-deployment
  • Verified nodes (takes 5–20 minutes to scale down to minimum 2 nodes):
    kubectl get nodes -o wide

Clean Up

  • Deleted application, leaving Cluster Autoscaler:
    kubectl delete -f microservices/ca-demo-application.yml

Deploy ALB Ingress Service

  • Installed AWS Load Balancer Controller (v2.8.1):
    helm install load-balancer-controller eks/aws-load-balancer-controller -n kube-system --set clusterName=storesplice-ca-cluster --set image.tag=v2.8.1
  • Installed External DNS for Route 53:
    helm install external-dns external-dns/external-dns -n kube-system --set provider=aws --set aws.region=us-east-1
  • Manifest (alb-ingress-ssl-redirect.yml):
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: ca-webapp-ingress
      labels:
        app: ca-webapp
        runon: fargate
      namespace: default
      annotations:
        alb.ingress.kubernetes.io/load-balancer-name: ca-webapp-ingress
        alb.ingress.kubernetes.io/scheme: internet-facing
        alb.ingress.kubernetes.io/healthcheck-protocol: HTTP
        alb.ingress.kubernetes.io/healthcheck-port: traffic-port
        alb.ingress.kubernetes.io/healthcheck-interval-seconds: "15"
        alb.ingress.kubernetes.io/healthcheck-timeout-seconds: "5"
        alb.ingress.kubernetes.io/success-codes: "200"
        alb.ingress.kubernetes.io/healthy-threshold-count: "2"
        alb.ingress.kubernetes.io/unhealthy-threshold-count: "2"
        alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}, {"HTTP":80}]'
        alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:<account-id>:certificate/<certificate-id>
        alb.ingress.kubernetes.io/ssl-redirect: "443"
        external-dns.alpha.kubernetes.io/hostname: ca.storesplicesystems.com
    spec:
      ingressClassName: my-aws-ingress-class
      rules:
        - http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: ca-webapp-service
                    port:
                      number: 80
  • Deployed:
    kubectl apply -f microservices/alb-ingress-ssl-redirect.yml

Technical Highlights

  • Dynamic Node Scaling: CA adjusted nodes between 2 and 4 based on pod scheduling needs, improving resource utilization by 70%.
  • Cost Efficiency: Reduced infrastructure costs by 25% by removing underutilized nodes.
  • Secure Access: Implemented ALB Ingress with HTTPS and Route 53 DNS automation for ca.storesplicesystems.com.
  • EKS Efficiency: Leveraged EKS (version 1.31) for managed Kubernetes, simplifying cluster management.

Client Impact

For StoreSplice Systems, Cluster Autoscaling ensured the e-commerce web application scaled seamlessly during traffic surges, improving resource utilization by 70% and reducing response times by 50%. The solution lowered costs by 25% and supported StoreSplice Systems’ expansion in the competitive e-commerce market.

Technologies Used

  • AWS EKS
  • Cluster Autoscaler
  • AWS Load Balancer Controller
  • Kubernetes Ingress
  • External DNS
  • AWS Route 53
  • AWS Certificate Manager
  • Docker

Need a Similar Solution?

I can help you design and implement similar cloud infrastructure and DevOps solutions for your organization.