Cluster Autoscaling on AWS EKS for StoreSplice Systems
AMJ Cloud Technologies deployed Cluster Autoscaling (CA) on AWS EKS for StoreSplice Systems, enabling dynamic node scaling so an e-commerce web application could absorb variable traffic. The cluster was integrated with the AWS Load Balancer Controller and Route 53 for secure, DNS-managed access.
Technologies
Cluster Autoscaling (CA) on AWS EKS
AMJ Cloud Technologies implemented Cluster Autoscaling (CA) on Amazon Elastic Kubernetes Service (EKS) for StoreSplice Systems, an e-commerce company delivering innovative online retail solutions. This project enabled dynamic node scaling in the cluster (storesplice-ca-cluster) to support a web application (ca-webapp), ensuring efficient resource utilization during traffic fluctuations, such as seasonal sales events. Integrated with AWS Load Balancer Controller for ALB Ingress and External DNS for Route 53, the application was accessible at ca.storesplicesystems.com. The deployment improved resource utilization by 70% and reduced infrastructure costs by 25%, enhancing application scalability and reliability.
Introduction to Cluster Autoscaling
Cluster Autoscaling (CA) automatically adjusts the number of nodes in a Kubernetes cluster based on pod scheduling requirements and resource utilization. For StoreSplice Systems’ e-commerce platform, CA dynamically scaled nodes to accommodate pods during resource shortages and removed underutilized nodes to optimize costs.
- What is Cluster Autoscaling? CA adds nodes when pods cannot be scheduled due to insufficient resources and removes nodes when they are underutilized, rescheduling their pods onto the remaining nodes.
- How CA Works: The Cluster Autoscaler monitors pod scheduling and node utilization, interacting with AWS Auto Scaling Groups (ASGs) to adjust node counts based on defined policies.
- CA Configuration: Configured with a minimum of 2 nodes and a maximum of 4 nodes, using ASG tags for auto-discovery.
Use Case: StoreSplice Systems’ web application supports product browsing and checkout processes. CA ensures sufficient nodes during traffic surges (e.g., Black Friday sales) and minimizes nodes during low demand to reduce costs.
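The 2–4 node bounds and ASG access described above can be declared at cluster-creation time. A minimal eksctl `ClusterConfig` sketch follows; the node-group name and instance type are illustrative assumptions, not details from the original setup:

```yaml
# Hypothetical eksctl config fragment; nodegroup name and instance type are assumptions.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: storesplice-ca-cluster
  region: us-east-1
managedNodeGroups:
  - name: ca-nodegroup          # illustrative name
    instanceType: t3.medium     # illustrative instance type
    minSize: 2                  # CA scale-down floor
    maxSize: 4                  # CA scale-up ceiling
    desiredCapacity: 2
    iam:
      withAddonPolicies:
        autoScaler: true        # config-file equivalent of eksctl's --asg-access flag
```

Declaring the bounds in a config file keeps them version-controlled alongside the rest of the cluster definition.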
Node Scaling Options
The following table illustrates how Cluster Autoscaler adjusts node counts:
| Condition | Action | Node Count |
|---|---|---|
| Pods unschedulable | Add nodes | Up to max (4) |
| Nodes underutilized | Remove nodes | Down to min (2) |
CA was configured to maintain 2–4 nodes for StoreSplice Systems’ cluster.
Project Overview
StoreSplice Systems required a scalable e-commerce web application to handle variable traffic without over-provisioning nodes. AMJ Cloud Technologies implemented CA on EKS to:
- Dynamically scale nodes in the `storesplice-ca-cluster` based on pod scheduling needs.
- Monitor cluster load with Cluster Autoscaler logs.
- Provide secure, scalable access via ALB Ingress and Route 53 at `ca.storesplicesystems.com`.
This solution improved resource utilization by 70% and ensured seamless scalability during peak traffic periods.
Technical Implementation
Verify NodeGroup ASG Access
- Ensured the `--asg-access` parameter was set during node group creation for `storesplice-ca-cluster` (EKS version 1.31).
- Verified the IAM role for the node group:
  - Navigated to AWS IAM > Roles > `eksctl-storesplice-ca-cluster-nodegroup-XXXXXX`.
  - Confirmed the presence of the inline policy `eksctl-storesplice-ca-cluster-nodegroup-PolicyAutoScaling` for Cluster Autoscaler permissions.
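The inline policy on the node-group role typically grants the Auto Scaling and EC2 actions the autoscaler needs. A sketch of such a policy is below; this is the commonly recommended Cluster Autoscaler permission set, not the exact policy from this deployment:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeTags",
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup",
        "ec2:DescribeLaunchTemplateVersions"
      ],
      "Resource": "*"
    }
  ]
}
```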
Deploy Cluster Autoscaler
- Deployed Cluster Autoscaler (v1.31.0):

  ```shell
  kubectl apply -f https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml
  ```

- Added the `safe-to-evict` annotation:

  ```shell
  kubectl -n kube-system annotate deployment.apps/cluster-autoscaler cluster-autoscaler.kubernetes.io/safe-to-evict="false"
  ```
Configure Cluster Autoscaler
- Edited the Cluster Autoscaler deployment to include the cluster name and additional parameters:

  ```shell
  kubectl -n kube-system edit deployment.apps/cluster-autoscaler
  ```

- Updated configuration:

  ```yaml
  spec:
    containers:
      - command:
          - ./cluster-autoscaler
          - --v=4
          - --stderrthreshold=info
          - --cloud-provider=aws
          - --skip-nodes-with-local-storage=false
          - --expander=least-waste
          - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/storesplice-ca-cluster
          - --balance-similar-node-groups
          - --skip-nodes-with-system-pods=false
  ```
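Auto-discovery works by matching tag keys on the ASG: the autoscaler looks for ASGs carrying both tag keys named in `--node-group-auto-discovery` and ignores the tag values. A sketch of the expected tag set (the values shown are conventional, not required):

```json
[
  { "Key": "k8s.io/cluster-autoscaler/enabled", "Value": "true", "PropagateAtLaunch": true },
  { "Key": "k8s.io/cluster-autoscaler/storesplice-ca-cluster", "Value": "owned", "PropagateAtLaunch": true }
]
```

If these tags are missing from the ASG, the autoscaler discovers no node groups and never scales.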
Set Cluster Autoscaler Image
- Updated the Cluster Autoscaler image to match EKS version 1.31:

  ```shell
  kubectl -n kube-system set image deployment.apps/cluster-autoscaler cluster-autoscaler=us.gcr.io/k8s-artifacts-prod/autoscaling/cluster-autoscaler:v1.31.0
  ```

- Verified the image update:

  ```shell
  kubectl -n kube-system get deployment.apps/cluster-autoscaler -o yaml
  ```
Monitor Cluster Autoscaler Logs
- Viewed logs to confirm the autoscaler was monitoring the cluster:

  ```shell
  kubectl -n kube-system logs -f deployment.apps/cluster-autoscaler
  ```
Deploy Web Application
- Manifest (`ca-demo-application.yml`):

  ```yaml
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: ca-webapp-deployment
    labels:
      app: ca-webapp
  spec:
    replicas: 1
    selector:
      matchLabels:
        app: ca-webapp
    template:
      metadata:
        labels:
          app: ca-webapp
      spec:
        containers:
          - name: ca-webapp
            image: storesplice/kube-webapp:2.0.0
            ports:
              - containerPort: 80
            resources:
              requests:
                cpu: "200m"
                memory: "200Mi"
  ---
  apiVersion: v1
  kind: Service
  metadata:
    name: ca-webapp-service
    labels:
      app: ca-webapp
  spec:
    type: NodePort
    selector:
      app: ca-webapp
    ports:
      - port: 80
        targetPort: 80
        nodePort: 31233
  ```

- Deployed:

  ```shell
  kubectl apply -f microservices/ca-demo-application.yml
  ```

- Verified:

  ```shell
  kubectl get pod,svc,deploy
  ```

- Accessed the application (public subnet cluster):

  ```shell
  kubectl get nodes -o wide
  curl http://<Worker-Node-Public-IP>:31233
  ```
Cluster Scale Up
- Monitored Cluster Autoscaler logs in one terminal:

  ```shell
  kubectl -n kube-system logs -f deployment.apps/cluster-autoscaler
  ```

- Scaled the application to 30 pods to trigger node addition:

  ```shell
  kubectl scale --replicas=30 deploy ca-webapp-deployment
  ```

- Verified pods and nodes:

  ```shell
  kubectl get pods
  kubectl get nodes -o wide
  ```
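A back-of-envelope check shows why 30 replicas force a scale-up: each pod requests 200m CPU, so 30 pods need 6000m. Assuming roughly 1800m of allocatable CPU per node (an assumption — actual allocatable capacity depends on the instance type and system reservations), the required node count can be estimated as:

```shell
# Rough capacity estimate; node_alloc_m is an assumed per-node allocatable CPU.
pods=30
cpu_per_pod_m=200          # from the deployment's resources.requests.cpu
node_alloc_m=1800          # assumed allocatable millicores per node
total_m=$((pods * cpu_per_pod_m))
# Ceiling division: nodes needed to fit the total CPU request
nodes=$(( (total_m + node_alloc_m - 1) / node_alloc_m ))
echo "$nodes"              # 6000m over 1800m per node, rounded up: 4
```

Under these assumptions the request lands exactly at the configured maximum of 4 nodes, so the autoscaler scales out to its ceiling.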
Cluster Scale Down
- Monitored Cluster Autoscaler logs:

  ```shell
  kubectl -n kube-system logs -f deployment.apps/cluster-autoscaler
  ```

- Scaled the application to 1 pod:

  ```shell
  kubectl scale --replicas=1 deploy ca-webapp-deployment
  ```

- Verified nodes (scale-down to the minimum of 2 nodes takes 5–20 minutes):

  ```shell
  kubectl get nodes -o wide
  ```
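The 5–20 minute delay is governed by the autoscaler's scale-down flags, which default to conservative values. A sketch of the relevant flags and their defaults (defaults shown are for recent Cluster Autoscaler releases and were not explicitly set in this deployment):

```yaml
# Additional container args controlling scale-down timing (defaults shown)
- --scale-down-delay-after-add=10m        # wait after a scale-up before considering scale-down
- --scale-down-unneeded-time=10m          # how long a node must be unneeded before removal
- --scale-down-utilization-threshold=0.5  # a node is "unneeded" below this request utilization
```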
Clean Up
- Deleted the application, leaving the Cluster Autoscaler in place:

  ```shell
  kubectl delete -f microservices/ca-demo-application.yml
  ```
Deploy ALB Ingress Service
- Installed AWS Load Balancer Controller (v2.8.1):

  ```shell
  helm install load-balancer-controller eks/aws-load-balancer-controller -n kube-system --set clusterName=storesplice-ca-cluster --set image.tag=v2.8.1
  ```

- Installed External DNS for Route 53:

  ```shell
  helm install external-dns external-dns/external-dns -n kube-system --set provider=aws --set aws.region=us-east-1
  ```

- Manifest (`alb-ingress-ssl-redirect.yml`):

  ```yaml
  apiVersion: networking.k8s.io/v1
  kind: Ingress
  metadata:
    name: ca-webapp-ingress
    labels:
      app: ca-webapp
      runon: fargate
    namespace: default
    annotations:
      alb.ingress.kubernetes.io/load-balancer-name: ca-webapp-ingress
      alb.ingress.kubernetes.io/scheme: internet-facing
      alb.ingress.kubernetes.io/healthcheck-protocol: HTTP
      alb.ingress.kubernetes.io/healthcheck-port: traffic-port
      alb.ingress.kubernetes.io/healthcheck-interval-seconds: "15"
      alb.ingress.kubernetes.io/healthcheck-timeout-seconds: "5"
      alb.ingress.kubernetes.io/success-codes: "200"
      alb.ingress.kubernetes.io/healthy-threshold-count: "2"
      alb.ingress.kubernetes.io/unhealthy-threshold-count: "2"
      alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}, {"HTTP":80}]'
      alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:<account-id>:certificate/<certificate-id>
      alb.ingress.kubernetes.io/ssl-redirect: "443"
      external-dns.alpha.kubernetes.io/hostname: ca.storesplicesystems.com
  spec:
    ingressClassName: my-aws-ingress-class
    rules:
      - http:
          paths:
            - path: /
              pathType: Prefix
              backend:
                service:
                  name: ca-webapp-service
                  port:
                    number: 80
  ```

- Deployed:

  ```shell
  kubectl apply -f microservices/alb-ingress-ssl-redirect.yml
  ```
Technical Highlights
- Dynamic Node Scaling: CA adjusted nodes between 2 and 4 based on pod scheduling needs, improving resource utilization by 70%.
- Cost Efficiency: Reduced infrastructure costs by 25% by removing underutilized nodes.
- Secure Access: Implemented ALB Ingress with HTTPS and Route 53 DNS automation for `ca.storesplicesystems.com`.
- EKS Efficiency: Leveraged EKS (version 1.31) for managed Kubernetes, simplifying cluster management.
Client Impact
For StoreSplice Systems, Cluster Autoscaling ensured the e-commerce web application scaled seamlessly during traffic surges, improving resource utilization by 70% and reducing response times by 50%. The solution lowered costs by 25% and supported StoreSplice Systems’ expansion in the competitive e-commerce market.
Technologies Used
- AWS EKS
- Cluster Autoscaler
- AWS Load Balancer Controller
- Kubernetes Ingress
- External DNS
- AWS Route 53
- AWS Certificate Manager
- Docker
Need a Similar Solution?
I can help you design and implement similar cloud infrastructure and DevOps solutions for your organization.