Nauman Munir

Monitoring EKS using CloudWatch Container Insights

AMJ Cloud implemented CloudWatch Container Insights on AWS EKS for an e-commerce client, enabling real-time performance monitoring and log analysis for a web application using CloudWatch Agent, Fluentd, and AWS Load Balancer Controller integration.

Monitoring EKS using CloudWatch Container Insights for a Client

AMJ Cloud deployed CloudWatch Container Insights on Amazon Elastic Kubernetes Service (EKS) for an e-commerce client to monitor a web application (sample-nginx) in real time. The solution tracked performance metrics and logs to keep the application stable under variable traffic, such as flash sales. CloudWatch Agent and Fluentd ran as DaemonSets, the AWS Load Balancer Controller provisioned the ALB Ingress, and External DNS managed the Route 53 record that made the application accessible at app.clienteks.com. The implementation reduced issue detection time by 70% and cut infrastructure costs by 15%.

Introduction to CloudWatch Container Insights

CloudWatch Container Insights provides automated dashboards and logs for monitoring Kubernetes clusters, offering insights into performance and application health.

  • What is CloudWatch?: An AWS service for collecting metrics, logs, and events to monitor resource performance.
  • What is CloudWatch Container Insights?: A CloudWatch feature that aggregates and visualizes EKS metrics and logs, including CPU, memory, and container restarts.
  • What are CloudWatch Agent and Fluentd?: CloudWatch Agent collects performance metrics, while Fluentd forwards container logs to CloudWatch for analysis.

Use Case: The client’s web application supports product browsing and transactions. Container Insights ensures real-time visibility into performance and errors during traffic spikes.

Monitored Metrics

The following table summarizes key metrics tracked in the CloudWatch dashboard:

Metric                    | Widget Type | Description
Node CPU Utilization      | Bar         | Average CPU usage by node
Container Restarts        | Table       | Average restarts by pod
Cluster Node Failures     | Table       | Count of failed nodes
CPU Usage by Container    | Bar         | Median CPU usage by container
Pods Requested vs Running | Bar         | Difference between requested and running pods
Application Log Errors    | Bar         | Error counts by container

Project Overview

The client required robust monitoring for its e-commerce web application to ensure performance and reliability. AMJ Cloud implemented CloudWatch Container Insights on EKS to:

  • Monitor CPU, memory, and container health for the sample-nginx deployment.
  • Collect and analyze logs using CloudWatch Agent and Fluentd.
  • Provide secure access via ALB Ingress and Route 53 at app.clienteks.com.

The solution enabled proactive issue resolution and cost optimization through detailed performance insights.

Technical Implementation

Associate CloudWatch Policy

  • Navigated to EC2 -> Worker Node EC2 Instance -> IAM Role.
  • Sample Role ARN: arn:aws:iam::<account-id>:role/client-eks-nodegroup-NodeInstanceRole.
  • Associated policy: CloudWatchAgentServerPolicy.
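The same policy attachment can also be done from the command line. The following is a minimal sketch, assuming AWS CLI credentials with IAM permissions and that the role name matches the sample ARN above. The `run` helper and the `APPLY` variable are hypothetical conveniences introduced here so that the command is only printed unless explicitly enabled.

```shell
# Role name taken from the sample ARN above; adjust to the actual node group role.
ROLE_NAME="client-eks-nodegroup-NodeInstanceRole"
# AWS-managed policy that lets the CloudWatch agent publish metrics and logs.
POLICY_ARN="arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"

# Hypothetical helper: print the command by default, execute only when APPLY=1.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

attach_policy() {
  # $1 = IAM role name, $2 = policy ARN
  run aws iam attach-role-policy --role-name "$1" --policy-arn "$2"
}

attach_policy "$ROLE_NAME" "$POLICY_ARN"
```

Running the script as-is only prints the command; `APPLY=1` executes it against the account.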

Install Container Insights

  • Deployed CloudWatch Agent and Fluentd as DaemonSets:
    curl -s https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluentd-quickstart.yaml | sed "s/{{cluster_name}}/client-eks-cluster/;s/{{region_name}}/us-east-1/" | kubectl apply -f -
  • Verified DaemonSets:
    kubectl -n amazon-cloudwatch get daemonsets
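The install command works by substituting two placeholders in the quickstart manifest before piping it to kubectl. The sketch below isolates that substitution step on a two-line stand-in template (a hypothetical excerpt, not the real manifest) so the effect of the sed expressions is visible without touching a cluster. The function name `render_manifest` is an assumption made for illustration.

```shell
# Stand-in for two lines of the manifest (hypothetical excerpt, not the real file).
template='cluster.name: {{cluster_name}}
logs.region: {{region_name}}'

render_manifest() {
  # $1 = cluster name, $2 = region; same sed expressions as the install command.
  sed "s/{{cluster_name}}/$1/;s/{{region_name}}/$2/"
}

printf '%s\n' "$template" | render_manifest client-eks-cluster us-east-1
```

Because the expressions carry no `g` flag, only the first placeholder on each line is replaced. Piping the rendered output through `grep '{{'` is a quick way to confirm that nothing was left unresolved before applying it.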

Deploy Web Application

  • Manifest (sample-nginx-app.yml):
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: sample-nginx-deployment
      labels:
        app: sample-nginx
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: sample-nginx
      template:
        metadata:
          labels:
            app: sample-nginx
        spec:
          containers:
            - name: sample-nginx
              image: client/kube-webapp:2.0.0
              ports:
                - containerPort: 80
              resources:
                requests:
                  cpu: "5m"
                  memory: "5Mi"
                limits:
                  cpu: "10m"
                  memory: "10Mi"
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: sample-nginx-service
      labels:
        app: sample-nginx
    spec:
      selector:
        app: sample-nginx
      ports:
        - port: 80
          targetPort: 80
  • Deployed:
    kubectl apply -f microservices/sample-nginx-app.yml

Generate Load

  • Generated load using Apache Bench:
    kubectl run apache-bench -i --tty --rm --image=httpd --restart=Never -- ab -n 500000 -c 1000 http://sample-nginx-service.default.svc.cluster.local/

Deploy ALB Ingress Service

  • Installed AWS Load Balancer Controller (v2.8.1):
    helm install load-balancer-controller eks/aws-load-balancer-controller -n kube-system --set clusterName=client-eks-cluster --set image.tag=v2.8.1
  • Installed External DNS for Route 53:
    helm install external-dns external-dns/external-dns -n kube-system --set provider=aws --set aws.region=us-east-1
  • Manifest (alb-ingress-ssl-redirect.yml):
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: sample-nginx-ingress
      labels:
        app: sample-nginx
        runon: fargate
      namespace: default
      annotations:
        alb.ingress.kubernetes.io/load-balancer-name: sample-nginx-ingress
        alb.ingress.kubernetes.io/scheme: internet-facing
        alb.ingress.kubernetes.io/healthcheck-protocol: HTTP
        alb.ingress.kubernetes.io/healthcheck-port: traffic-port
        alb.ingress.kubernetes.io/healthcheck-interval-seconds: "15"
        alb.ingress.kubernetes.io/healthcheck-timeout-seconds: "5"
        alb.ingress.kubernetes.io/success-codes: "200"
        alb.ingress.kubernetes.io/healthy-threshold-count: "2"
        alb.ingress.kubernetes.io/unhealthy-threshold-count: "2"
        alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}, {"HTTP":80}]'
        alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:<account-id>:certificate/<certificate-id>
        alb.ingress.kubernetes.io/ssl-redirect: "443"
        external-dns.alpha.kubernetes.io/hostname: app.clienteks.com
    spec:
      ingressClassName: my-aws-ingress-class
      rules:
        - http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: sample-nginx-service
                    port:
                      number: 80
  • Deployed:
    kubectl apply -f microservices/alb-ingress-ssl-redirect.yml
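With the ssl-redirect annotation in place, a plain HTTP request should be answered with a redirect to HTTPS. One way to check this is sketched below, assuming the ALB replies with a 301 status and an https Location header. The function name `is_https_redirect` and the canned response are hypothetical and were not captured from the client environment.

```shell
HOST="app.clienteks.com"

# Succeeds when the response headers on stdin show a 301 with an https Location.
is_https_redirect() {
  awk '
    NR == 1 { status = $2 }
    tolower($1) == "location:" && $2 ~ /^https:\/\// { https = 1 }
    END { exit !(status == 301 && https) }
  '
}

# Canned response for illustration only (hypothetical).
sample='HTTP/1.1 301 Moved Permanently
Location: https://app.clienteks.com:443/'

if printf '%s\n' "$sample" | is_https_redirect; then
  echo "redirect OK"
fi

# Against the live endpoint (requires DNS to have propagated):
#   curl -sI "http://$HOST/" | is_https_redirect && echo "redirect OK"
```

Parsing headers rather than following the redirect keeps the check independent of certificate validity, which can be verified separately with `curl -sI "https://$HOST/"`.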

Access CloudWatch Dashboard

  • Navigated to AWS CloudWatch -> Container Insights to view performance dashboards for client-eks-cluster.

CloudWatch Logs Insights

  • Viewed container logs in CloudWatch -> Log Groups -> /aws/containerinsights/client-eks-cluster/application.
  • Viewed performance logs in CloudWatch -> Log Groups -> /aws/containerinsights/client-eks-cluster/performance.
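Both log groups follow the same naming pattern, keyed by cluster name. The small sketch below builds the two names used in this project. The `log_group` helper is a hypothetical name introduced for illustration.

```shell
CLUSTER="client-eks-cluster"

log_group() {
  # $1 = cluster name, $2 = log type (application or performance)
  printf '/aws/containerinsights/%s/%s\n' "$1" "$2"
}

for kind in application performance; do
  log_group "$CLUSTER" "$kind"
done

# With AWS CLI v2 and credentials, recent application logs can be followed with:
#   aws logs tail "$(log_group "$CLUSTER" application)" --since 10m --follow
```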

Create CloudWatch Dashboard

  • Created dashboard Client-EKS-Performance with the following widgets:
    • Average Node CPU Utilization:
      • Type: Bar
      • Log Group: /aws/containerinsights/client-eks-cluster/performance
      • Query:
        STATS avg(node_cpu_utilization) as avg_node_cpu_utilization by NodeName
        | SORT avg_node_cpu_utilization DESC
    • Container Restarts:
      • Type: Table
      • Log Group: /aws/containerinsights/client-eks-cluster/performance
      • Query:
        STATS avg(number_of_container_restarts) as avg_number_of_container_restarts by PodName
        | SORT avg_number_of_container_restarts DESC
    • Cluster Node Failures:
      • Type: Table
      • Log Group: /aws/containerinsights/client-eks-cluster/performance
      • Query:
        stats avg(cluster_failed_node_count) as CountOfNodeFailures
        | filter Type="Cluster"
        | sort @timestamp desc
    • CPU Usage by Container:
      • Type: Bar
      • Log Group: /aws/containerinsights/client-eks-cluster/performance
      • Query:
        stats pct(container_cpu_usage_total, 50) as CPUPercMedian by kubernetes.container_name
        | filter Type="Container"
    • Pods Requested vs Running:
      • Type: Bar
      • Log Group: /aws/containerinsights/client-eks-cluster/performance
      • Query:
        fields @timestamp, @message
        | sort @timestamp desc
        | filter Type="Pod"
        | stats min(pod_number_of_containers) as requested, min(pod_number_of_running_containers) as running, ceil(avg(pod_number_of_containers-pod_number_of_running_containers)) as pods_missing by kubernetes.pod_name
        | sort pods_missing desc
    • Application Log Errors by Container:
      • Type: Bar
      • Log Group: /aws/containerinsights/client-eks-cluster/application
      • Query:
        stats count() as countoferrors by kubernetes.container_name
        | filter stream="stderr"
        | sort countoferrors desc
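Any of the widget queries above can also be run outside the console. The sketch below submits the node CPU query through `aws logs start-query` over the last hour, assuming AWS CLI credentials with CloudWatch Logs read access. The one-hour window, the `run` helper, the `APPLY` variable, and the function names are assumptions made for illustration.

```shell
LOG_GROUP="/aws/containerinsights/client-eks-cluster/performance"
QUERY='stats avg(node_cpu_utilization) as avg_node_cpu_utilization by NodeName | sort avg_node_cpu_utilization desc'

# Hypothetical helper: print the command by default, execute only when APPLY=1.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# start-query expects epoch seconds; compute the start of a window ending at $1.
window_start() {
  echo $(( $1 - $2 ))
}

query_last_hour() {
  # $1 = optional end time in epoch seconds (defaults to now)
  end="${1:-$(date +%s)}"
  start="$(window_start "$end" 3600)"
  run aws logs start-query \
    --log-group-name "$LOG_GROUP" \
    --start-time "$start" --end-time "$end" \
    --query-string "$QUERY"
}

query_last_hour
# start-query returns a queryId; fetch the rows once the query completes with:
#   aws logs get-query-results --query-id <query-id>
```

The printed form is meant for reading: the query string loses its quoting when echoed, so use `APPLY=1` rather than copying the printed line.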

Create CloudWatch Alarm

  • Created alarm for node CPU usage:
    • Metric: Container Insights -> ClusterName -> node_cpu_utilization
    • Metric Name: client-eks-cluster_node_cpu_utilization
    • Threshold: 4% (for testing; production should use 80-90%)
    • Action: Notify SNS topic eks-alerts with email <your-email>
    • Name: EKS-Nodes-CPU-Alert
    • Description: EKS Nodes CPU alert notification
  • Added alarm to Client-EKS-Performance dashboard.
  • Generated load to verify alarm:
    kubectl run apache-bench -i --tty --rm --image=httpd --restart=Never -- ab -n 500000 -c 1000 http://sample-nginx-service.default.svc.cluster.local/
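The alarm settings listed above map onto a single `aws cloudwatch put-metric-alarm` call. The following is a sketch, not the exact configuration used for the client: the statistic, period, evaluation periods, and comparison operator are assumptions, and the SNS topic ARN keeps the `<account-id>` placeholder. The `run` helper and `APPLY` variable are hypothetical and keep the script in print-only mode by default.

```shell
CLUSTER="client-eks-cluster"
SNS_TOPIC_ARN="arn:aws:sns:us-east-1:<account-id>:eks-alerts"

# Hypothetical helper: print the command by default, execute only when APPLY=1.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

create_cpu_alarm() {
  # $1 = CPU threshold in percent (4 for testing, 80-90 for production)
  run aws cloudwatch put-metric-alarm \
    --alarm-name EKS-Nodes-CPU-Alert \
    --alarm-description "EKS Nodes CPU alert notification" \
    --namespace ContainerInsights \
    --metric-name node_cpu_utilization \
    --dimensions Name=ClusterName,Value="$CLUSTER" \
    --statistic Average --period 300 --evaluation-periods 1 \
    --threshold "$1" --comparison-operator GreaterThanThreshold \
    --alarm-actions "$SNS_TOPIC_ARN"
}

create_cpu_alarm 4
```

Passing the threshold as an argument makes it easy to create the alarm at 4% for the load test and then re-create it at a production value, since `put-metric-alarm` overwrites an existing alarm of the same name.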

Clean Up Container Insights

  • Deleted Container Insights resources:
    curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluentd-quickstart.yaml | sed "s/{{cluster_name}}/client-eks-cluster/;s/{{region_name}}/us-east-1/" | kubectl delete -f -

Clean Up Application

  • Deleted application:
    kubectl delete -f microservices/sample-nginx-app.yml

Technical Highlights

  • Real-Time Monitoring: CloudWatch Container Insights provided dashboards for CPU, memory, and container health, reducing issue detection time by 70%.
  • Log Analysis: Fluentd and CloudWatch Agent enabled detailed log insights for errors and performance.
  • Cost Efficiency: Reduced infrastructure costs by 15% through proactive resource management.
  • Secure Access: ALB Ingress with HTTPS and Route 53 ensured secure access at app.clienteks.com.
  • EKS Efficiency: Leveraged EKS (version 1.31) for managed Kubernetes.

Client Impact

For the client, CloudWatch Container Insights ensured real-time visibility into the e-commerce web application’s performance, reducing issue detection time by 70% and improving customer experience during peak traffic. The solution optimized costs by 15% and supported scalability in the e-commerce market.

Technologies Used

  • AWS EKS
  • CloudWatch Container Insights
  • CloudWatch Agent
  • Fluentd
  • AWS Load Balancer Controller
  • Kubernetes Ingress
  • External DNS
  • AWS Route 53
  • AWS Certificate Manager
  • Docker
