Monitoring EKS using CloudWatch Container Insights

Monitoring EKS using CloudWatch Container Insights for a Client

AMJ Cloud deployed CloudWatch Container Insights on Amazon Elastic Kubernetes Service (EKS) for an e-commerce client, enabling real-time monitoring of a web application (sample-nginx). The solution tracked performance metrics and logs to keep the application stable under variable traffic, such as flash sales. CloudWatch Agent and Fluentd ran as DaemonSets, AWS Load Balancer Controller provided ALB Ingress, and External DNS managed the Route 53 record that made the application accessible at app.clienteks.com. The implementation reduced issue detection time by 70% and infrastructure costs by 15%.

Introduction to CloudWatch Container Insights

CloudWatch Container Insights provides automated dashboards and logs for monitoring Kubernetes clusters, offering insights into performance and application health.

What is CloudWatch?: An AWS service for collecting metrics, logs, and events to monitor resource performance.
What is CloudWatch Container Insights?: A CloudWatch feature that aggregates and visualizes EKS metrics and logs, including CPU, memory, and container restarts.
What are CloudWatch Agent and Fluentd?: CloudWatch Agent collects performance metrics, while Fluentd forwards container logs to CloudWatch for analysis.

Use Case: The client’s web application supports product browsing and transactions. Container Insights ensures real-time visibility into performance and errors during traffic spikes.

Monitored Metrics

The following table summarizes key metrics tracked in the CloudWatch dashboard:

Metric	Type	Description
Node CPU Utilization	Bar	Average CPU usage by node
Container Restarts	Table	Average restarts by pod
Cluster Node Failures	Table	Count of failed nodes
CPU Usage by Container	Bar	Median CPU usage by container
Pods Requested vs Running	Bar	Difference between requested and running pods
Application Log Errors	Bar	Error counts by container

Project Overview

The client required robust monitoring for its e-commerce web application to ensure performance and reliability. AMJ Cloud implemented CloudWatch Container Insights on EKS to:

Monitor CPU, memory, and container health for the sample-nginx deployment.
Collect and analyze logs using CloudWatch Agent and Fluentd.
Provide secure access via ALB Ingress and Route 53 at app.clienteks.com.

The solution enabled proactive issue resolution and cost optimization through detailed performance insights.

Technical Implementation

Associate CloudWatch Policy

Navigated to EC2 -> Worker Node EC2 Instance -> IAM Role.
Sample Role ARN: arn:aws:iam::<account-id>:role/client-eks-nodegroup-NodeInstanceRole.
Associated policy: CloudWatchAgentServerPolicy.
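
The same association can also be made from the command line. The following is a minimal sketch, assuming the AWS CLI is installed and that the role name is the last segment of the sample ARN above. The helper name attach_agent_policy and the DRY_RUN switch are illustrative and not part of the original setup; by default the command is only printed.

```shell
# Role name assumed to be the last segment of the sample ARN above.
ROLE_NAME="client-eks-nodegroup-NodeInstanceRole"
# ARN of the AWS managed policy CloudWatchAgentServerPolicy.
POLICY_ARN="arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"

# Hypothetical helper: prints the command unless DRY_RUN=0 is set.
attach_agent_policy() {
  set -- aws iam attach-role-policy \
    --role-name "$ROLE_NAME" --policy-arn "$POLICY_ARN"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

attach_agent_policy
```

Running with DRY_RUN=0 performs the attachment; the default only shows what would be executed.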

Install Container Insights

Deployed CloudWatch Agent and Fluentd as DaemonSets:

curl -s https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluentd-quickstart.yaml | sed "s/{{cluster_name}}/client-eks-cluster/;s/{{region_name}}/us-east-1/" | kubectl apply -f -
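
To make the sed step in the command above concrete, here is a small sketch that applies the same two substitutions to a stand-in line. The sample line and the render function are illustrative assumptions, not content copied from the quickstart manifest.

```shell
# Stand-in for one templated line (illustrative, not taken from the manifest).
sample_line='cluster={{cluster_name}} region={{region_name}}'

# Same substitutions as the install command: render <cluster> <region>
render() {
  sed "s/{{cluster_name}}/$1/;s/{{region_name}}/$2/"
}

printf '%s\n' "$sample_line" | render client-eks-cluster us-east-1
# prints: cluster=client-eks-cluster region=us-east-1
```

Without the g flag, each expression replaces only the first match on a line, which is sufficient as long as a placeholder appears at most once per line.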

Verified DaemonSets:

kubectl -n amazon-cloudwatch get daemonsets
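
Beyond listing the DaemonSets, readiness can be checked by comparing ready and desired pod counts. The sketch below assumes the quickstart manifest creates DaemonSets named cloudwatch-agent and fluentd-cloudwatch; the helper functions are hypothetical, and the kubectl calls run only when kubectl is on the PATH.

```shell
# Succeeds when a "ready/desired" pair shows every scheduled pod is ready.
is_fully_ready() {
  case "$1" in */*) ;; *) return 1 ;; esac
  ready=${1%/*}
  desired=${1#*/}
  [ -n "$desired" ] && [ "$desired" -gt 0 ] && [ "$ready" = "$desired" ]
}

# Hypothetical helper: check_daemonset <name>
check_daemonset() {
  status=$(kubectl -n amazon-cloudwatch get daemonset "$1" \
    -o jsonpath='{.status.numberReady}/{.status.desiredNumberScheduled}')
  if is_fully_ready "$status"; then
    echo "$1: ready ($status)"
  else
    echo "$1: NOT ready ($status)"
  fi
}

# DaemonSet names are assumptions; adjust if the manifest differs.
if command -v kubectl >/dev/null 2>&1; then
  check_daemonset cloudwatch-agent
  check_daemonset fluentd-cloudwatch
fi
```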

Deploy Web Application

Manifest (sample-nginx-app.yml):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-nginx-deployment
  labels:
    app: sample-nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sample-nginx
  template:
    metadata:
      labels:
        app: sample-nginx
    spec:
      containers:
        - name: sample-nginx
          image: client/kube-webapp:2.0.0
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: "5m"
              memory: "5Mi"
            limits:
              cpu: "10m"
              memory: "10Mi"
---
apiVersion: v1
kind: Service
metadata:
  name: sample-nginx-service
  labels:
    app: sample-nginx
spec:
  selector:
    app: sample-nginx
  ports:
    - port: 80
      targetPort: 80

Deployed:

kubectl apply -f microservices/sample-nginx-app.yml

Generate Load

Generated load using Apache Bench:

kubectl run apache-bench -i --tty --rm --image=httpd --restart=Never -- ab -n 500000 -c 1000 http://sample-nginx-service.default.svc.cluster.local/

Deploy ALB Ingress Service

Installed AWS Load Balancer Controller (v2.8.1):

helm install load-balancer-controller eks/aws-load-balancer-controller -n kube-system --set clusterName=client-eks-cluster --set image.tag=v2.8.1

Installed External DNS for Route 53:

helm install external-dns external-dns/external-dns -n kube-system --set provider=aws --set aws.region=us-east-1

Manifest (alb-ingress-ssl-redirect.yml):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: sample-nginx-ingress
  labels:
    app: sample-nginx
    runon: fargate
  namespace: default
  annotations:
    alb.ingress.kubernetes.io/load-balancer-name: sample-nginx-ingress
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/healthcheck-protocol: HTTP
    alb.ingress.kubernetes.io/healthcheck-port: traffic-port
    alb.ingress.kubernetes.io/healthcheck-interval-seconds: "15"
    alb.ingress.kubernetes.io/healthcheck-timeout-seconds: "5"
    alb.ingress.kubernetes.io/success-codes: "200"
    alb.ingress.kubernetes.io/healthy-threshold-count: "2"
    alb.ingress.kubernetes.io/unhealthy-threshold-count: "2"
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}, {"HTTP":80}]'
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:<account-id>:certificate/<certificate-id>
    alb.ingress.kubernetes.io/ssl-redirect: "443"
    external-dns.alpha.kubernetes.io/hostname: app.clienteks.com
spec:
  ingressClassName: my-aws-ingress-class
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: sample-nginx-service
                port:
                  number: 80

Deployed:

kubectl apply -f microservices/alb-ingress-ssl-redirect.yml
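
Once the Ingress is applied, the ALB and the HTTPS redirect can be spot-checked from a shell. This is a sketch under the assumption that the ssl-redirect annotation makes the ALB answer plain HTTP with a 301; the helper is_https_redirect is a hypothetical name, and the network calls run only when kubectl and curl are available.

```shell
APP_HOST="app.clienteks.com"

# With ssl-redirect set, plain HTTP is expected to return a 301.
is_https_redirect() {
  [ "$1" = "301" ]
}

if command -v kubectl >/dev/null 2>&1 && command -v curl >/dev/null 2>&1; then
  # DNS name of the ALB provisioned for the Ingress.
  kubectl -n default get ingress sample-nginx-ingress \
    -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
  echo
  # Status code returned for a plain-HTTP request to the public hostname.
  code=$(curl -s -o /dev/null -w '%{http_code}' "http://$APP_HOST/")
  if is_https_redirect "$code"; then
    echo "HTTP -> HTTPS redirect in place ($code)"
  else
    echo "unexpected status for http://$APP_HOST/: $code"
  fi
fi
```

An empty ALB hostname usually means the controller has not finished provisioning, so the check may need to be repeated after a few minutes.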

Access CloudWatch Dashboard

Navigated to AWS CloudWatch -> Container Insights to view performance dashboards for client-eks-cluster.

CloudWatch Logs Insights

Viewed container logs in CloudWatch -> Log Groups -> /aws/containerinsights/client-eks-cluster/application.
Viewed performance logs in CloudWatch -> Log Groups -> /aws/containerinsights/client-eks-cluster/performance.
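
The same log groups can be queried from the AWS CLI instead of the console. The sketch below assumes a one-hour look-back window and a fixed five-second wait, both arbitrary choices; window_start is a hypothetical helper. The aws calls run only when the CLI is installed.

```shell
LOG_GROUP="/aws/containerinsights/client-eks-cluster/performance"
QUERY='stats avg(node_cpu_utilization) as avg_node_cpu_utilization by NodeName | sort avg_node_cpu_utilization desc'

# Start of a look-back window in epoch seconds: window_start <end> <minutes>
window_start() {
  echo $(( $1 - $2 * 60 ))
}

if command -v aws >/dev/null 2>&1; then
  END=$(date +%s)
  START=$(window_start "$END" 60)
  # start-query is asynchronous and returns a query ID.
  QUERY_ID=$(aws logs start-query --log-group-name "$LOG_GROUP" \
    --start-time "$START" --end-time "$END" \
    --query-string "$QUERY" --query 'queryId' --output text)
  # Assumed wait; larger windows may need polling on the query status.
  sleep 5
  aws logs get-query-results --query-id "$QUERY_ID"
fi
```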

Create CloudWatch Dashboard

Created dashboard Client-EKS-Performance with the following widgets:

Average Node CPU Utilization:

Type: Bar
Log Group: /aws/containerinsights/client-eks-cluster/performance

Query:

STATS avg(node_cpu_utilization) as avg_node_cpu_utilization by NodeName
| SORT avg_node_cpu_utilization DESC

Container Restarts:

Type: Table
Log Group: /aws/containerinsights/client-eks-cluster/performance

Query:

STATS avg(number_of_container_restarts) as avg_number_of_container_restarts by PodName
| SORT avg_number_of_container_restarts DESC

Cluster Node Failures:

Type: Table
Log Group: /aws/containerinsights/client-eks-cluster/performance

Query:

stats avg(cluster_failed_node_count) as CountOfNodeFailures
| filter Type="Cluster"
| sort @timestamp desc

CPU Usage by Container:

Type: Bar
Log Group: /aws/containerinsights/client-eks-cluster/performance

Query:

stats pct(container_cpu_usage_total, 50) as CPUPercMedian by kubernetes.container_name
| filter Type="Container"

Pods Requested vs Running:

Type: Bar
Log Group: /aws/containerinsights/client-eks-cluster/performance

Query:

fields @timestamp, @message
| sort @timestamp desc
| filter Type="Pod"
| stats min(pod_number_of_containers) as requested, min(pod_number_of_running_containers) as running, ceil(avg(pod_number_of_containers-pod_number_of_running_containers)) as pods_missing by kubernetes.pod_name
| sort pods_missing desc

Application Log Errors by Container:

Type: Bar
Log Group: /aws/containerinsights/client-eks-cluster/application

Query:

stats count() as countoferrors by kubernetes.container_name
| filter stream="stderr"
| sort countoferrors desc
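
A dashboard like this can also be defined as JSON and pushed with the AWS CLI. The sketch below builds a body containing only the first widget; the widget layout values and the dashboard_body helper are assumptions for illustration. Note that put-dashboard replaces the whole dashboard body, so a real definition would need all six widgets.

```shell
REGION="us-east-1"
LOG_GROUP="/aws/containerinsights/client-eks-cluster/performance"

# Emits a dashboard body with a single Logs Insights bar widget.
dashboard_body() {
  cat <<EOF
{
  "widgets": [
    {
      "type": "log",
      "x": 0, "y": 0, "width": 12, "height": 6,
      "properties": {
        "region": "$REGION",
        "title": "Average Node CPU Utilization",
        "view": "bar",
        "query": "SOURCE '$LOG_GROUP' | stats avg(node_cpu_utilization) as avg_node_cpu_utilization by NodeName | sort avg_node_cpu_utilization desc"
      }
    }
  ]
}
EOF
}

if command -v aws >/dev/null 2>&1; then
  # Overwrites any existing dashboard with the same name.
  aws cloudwatch put-dashboard --dashboard-name Client-EKS-Performance \
    --dashboard-body "$(dashboard_body)"
else
  dashboard_body
fi
```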

Create CloudWatch Alarm

Created alarm for node CPU usage:

Metric: Container Insights -> ClusterName -> node_cpu_utilization
Metric Name: client-eks-cluster_node_cpu_utilization
Threshold: 4% (for testing; production should use 80-90%)
Action: Notify SNS topic eks-alerts with email <your-email>
Name: EKS-Nodes-CPU-Alert
Description: EKS Nodes CPU alert notification

Added alarm to Client-EKS-Performance dashboard.
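
An equivalent alarm could be created with the AWS CLI. In this sketch the statistic (Average), period (300 seconds), and single evaluation period are assumptions, since the text above does not state them; create_cpu_alarm and DRY_RUN are hypothetical names, and the account ID placeholder must be replaced before a real run. By default the command is only printed.

```shell
CLUSTER="client-eks-cluster"
# Placeholder account ID kept from the text above; replace before running.
SNS_TOPIC_ARN="arn:aws:sns:us-east-1:<account-id>:eks-alerts"

# Hypothetical helper: create_cpu_alarm <threshold-percent>
create_cpu_alarm() {
  set -- aws cloudwatch put-metric-alarm \
    --alarm-name EKS-Nodes-CPU-Alert \
    --alarm-description "EKS Nodes CPU alert notification" \
    --namespace ContainerInsights \
    --metric-name node_cpu_utilization \
    --dimensions "Name=ClusterName,Value=$CLUSTER" \
    --statistic Average --period 300 --evaluation-periods 1 \
    --threshold "$1" \
    --comparison-operator GreaterThanThreshold \
    --alarm-actions "$SNS_TOPIC_ARN"
  # Print only, unless DRY_RUN=0 is set.
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# 4 matches the test threshold above; production would use 80-90.
create_cpu_alarm 4
```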

Generated load to verify alarm:

kubectl run apache-bench -i --tty --rm --image=httpd --restart=Never -- ab -n 500000 -c 1000 http://sample-nginx-service.default.svc.cluster.local/

Clean Up Container Insights

Deleted Container Insights resources:

curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluentd-quickstart.yaml | sed "s/{{cluster_name}}/client-eks-cluster/;s/{{region_name}}/us-east-1/" | kubectl delete -f -

Clean Up Application

Deleted application:

kubectl delete -f microservices/sample-nginx-app.yml

Technical Highlights

Real-Time Monitoring: CloudWatch Container Insights provided dashboards for CPU, memory, and container health, reducing issue detection time by 70%.
Log Analysis: Fluentd and CloudWatch Agent enabled detailed log insights for errors and performance.
Cost Efficiency: Reduced infrastructure costs by 15% through proactive resource management.
Secure Access: ALB Ingress with HTTPS and Route 53 ensured secure access at app.clienteks.com.
EKS Efficiency: Leveraged EKS (version 1.31) for managed Kubernetes.

Client Impact

For the client, CloudWatch Container Insights provided real-time visibility into the e-commerce web application’s performance, reducing issue detection time by 70% and improving customer experience during peak traffic. The solution also reduced infrastructure costs by 15% and left the platform able to scale with demand.

Technologies Used

Amazon EKS
CloudWatch Container Insights
CloudWatch Agent
Fluentd
AWS Load Balancer Controller
Kubernetes Ingress
External DNS
Amazon Route 53
AWS Certificate Manager
Docker
