How to Monitor Kubernetes Clusters with Prometheus and Grafana on AWS EKS: Complete Tutorial
By Braincuber Team
Published on March 11, 2026
We got called in at 2 AM because a D2C brand's payment service crashed and nobody knew for 47 minutes. Their EKS cluster was running 14 microservices across 3 node groups with zero monitoring. A memory leak in one pod cascaded across 4 services before anyone noticed. $8,300 in lost orders. The fix was not a code patch — it was visibility. Prometheus scraping metrics every 15 seconds and Grafana dashboards showing CPU/memory in real time would have caught the spike in under 60 seconds. Here is how to set it up on AWS EKS.
What You'll Learn:
- How to install AWS CLI, eksctl, kubectl, and Helm on your server
- How to create an EKS cluster and install the Kubernetes Metrics Server
- How to configure IAM OIDC and the EBS CSI Driver for persistent storage
- How to deploy Prometheus and Grafana using the kube-prometheus-stack Helm chart
- How to create Grafana dashboards and monitor a deployed NGINX application
Monitoring vs. Observability: The Difference That Costs You Money
Most teams use these words interchangeably. They are not the same thing. Monitoring tells you what is happening (CPU at 92%, response time at 3.4s). Observability tells you why it is happening (a specific pod's garbage collector is thrashing because a memory limit is set 128MB too low).
Monitoring (Known Unknowns)
Tracks predefined metrics in real time: CPU usage, memory consumption, request counts, error rates. Those colorful dashboards on the wall of the IT department. Answers: "Is the system healthy right now?"
Observability (Unknown Unknowns)
Goes deeper using the three pillars: Metrics (time-series CPU/memory data), Logs (historical event records for root cause analysis), and Traces (request flow through microservices for latency debugging).
The Tools: Prometheus + Grafana
Prometheus (Data Collector)
Open-source metrics scraper. Pulls time-series data from your pods every 15-30 seconds. Includes AlertManager for firing alerts, PushGateway for short-lived jobs, and exporters for third-party services. Zero licensing cost.
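The pull model is driven by per-job scrape configuration. A minimal sketch of a prometheus.yml, assuming a hypothetical app named my-app exposing metrics on port 8080 (inside Kubernetes, the kube-prometheus-stack chart generates the real configuration for you from ServiceMonitor resources):

```yaml
# Illustrative Prometheus scrape config; the Helm chart generates
# the real one from ServiceMonitor resources
global:
  scrape_interval: 15s      # how often Prometheus pulls metrics
  evaluation_interval: 15s  # how often alerting rules are evaluated

scrape_configs:
  - job_name: "my-app"              # hypothetical job name
    static_configs:
      - targets: ["my-app:8080"]    # endpoint serving /metrics
```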
Grafana (Visualizer)
Transforms raw Prometheus metrics into live dashboards. Pre-built templates (like the Node Exporter Full dashboard, ID 1860) give you CPU, memory, network, and disk views within minutes. Also open-source and free.
Step by Step Guide: Deploy Prometheus and Grafana on EKS
Prerequisites
You need an AWS account with access keys configured and an EC2 instance running Ubuntu 22.04 (or any Linux/Mac environment). We will install all CLI tools on this server.
Install AWS CLI, eksctl, kubectl, and Helm
Install all four CLI tools on your server. AWS CLI authenticates with AWS. eksctl creates and manages EKS clusters. kubectl interacts with Kubernetes. Helm is the package manager that deploys Prometheus and Grafana via charts.
# AWS CLI
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
sudo apt install unzip && unzip awscliv2.zip
sudo ./aws/install
aws configure # Enter Access Key, Secret Key, Region
# eksctl
ARCH=amd64
PLATFORM=$(uname -s)_$ARCH
curl -sLO "https://github.com/eksctl-io/eksctl/releases/latest/download/eksctl_$PLATFORM.tar.gz"
tar -xzf eksctl_$PLATFORM.tar.gz -C /tmp && rm eksctl_$PLATFORM.tar.gz
sudo mv /tmp/eksctl /usr/local/bin
# kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x ./kubectl && sudo mv ./kubectl /usr/local/bin
# Helm
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3
chmod 700 get_helm.sh && ./get_helm.sh
Create the EKS Cluster
Use eksctl to spin up a 2-node EKS cluster. This provisions the VPC, subnets, IAM roles, and node group. Takes about 15-20 minutes. Then verify with kubectl get nodes.
eksctl create cluster \
--name my-monitoring-cluster \
--version 1.30 \
--region us-east-1 \
--nodegroup-name worker-nodes \
--node-type t2.medium \
--nodes 2 \
--nodes-min 2 \
--nodes-max 3
# Verify nodes are ready
kubectl get nodes
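The same flags can also be captured in a declarative config file, which is easier to version-control and review. A sketch equivalent to the command above (run it with eksctl create cluster -f cluster.yaml):

```yaml
# cluster.yaml — declarative equivalent of the eksctl flags above
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-monitoring-cluster
  region: us-east-1
  version: "1.30"
managedNodeGroups:
  - name: worker-nodes
    instanceType: t2.medium
    desiredCapacity: 2
    minSize: 2
    maxSize: 3
```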
Install the Metrics Server
The Metrics Server aggregates CPU and memory usage from the Kubelet on each node and exposes it through the Kubernetes Metrics API. It is what powers kubectl top and the Horizontal Pod Autoscaler; without it, kubectl top pods returns nothing. (Prometheus gathers its own resource metrics separately, via the node-exporter and cAdvisor endpoints that the Helm chart wires up later.)
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Verify deployment
kubectl get deployment metrics-server -n kube-system
Configure IAM OIDC and EBS CSI Driver
Prometheus needs persistent storage to retain metrics data across pod restarts. The IAM OIDC provider lets Kubernetes pods assume AWS IAM roles. The EBS CSI Driver dynamically creates EBS volumes as persistent storage for Prometheus pods. Skip this and your Prometheus data disappears every time the pod restarts.
# Associate IAM OIDC provider
eksctl utils associate-iam-oidc-provider \
--cluster my-monitoring-cluster --approve
# Create EBS CSI Driver IAM role
eksctl create iamserviceaccount \
--name ebs-csi-controller-sa \
--namespace kube-system \
--cluster my-monitoring-cluster \
--role-name AmazonEKS_EBS_CSI_DriverRole \
--role-only \
--attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
--approve
# Add the EBS CSI Driver addon (replace AWS_ACCOUNT_ID)
eksctl create addon \
--name aws-ebs-csi-driver \
--cluster my-monitoring-cluster \
--service-account-role-arn arn:aws:iam::AWS_ACCOUNT_ID:role/AmazonEKS_EBS_CSI_DriverRole \
--force
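With the CSI driver in place, you can ask the chart to back Prometheus with an EBS volume. A hedged sketch of a Helm values override (field paths follow the kube-prometheus-stack chart's values layout; pass it with -f values.yaml when installing in the next step):

```yaml
# values.yaml — request a persistent EBS-backed volume for Prometheus
prometheus:
  prometheusSpec:
    retention: 15d
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: gp2          # default EBS StorageClass on EKS
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 30Gi
```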
Install Prometheus and Grafana via Helm
Add the Helm repos, create a prometheus namespace, and install the kube-prometheus-stack chart. This single chart deploys Prometheus server, Grafana, AlertManager, and all required exporters. Then change both Prometheus and Grafana services from ClusterIP to LoadBalancer to access their dashboards from your browser.
# Add Helm repos
helm repo add stable https://charts.helm.sh/stable
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
# Create namespace and install
kubectl create namespace prometheus
helm install stable prometheus-community/kube-prometheus-stack -n prometheus
# Verify everything is running
kubectl get all -n prometheus
# Expose Prometheus dashboard (change ClusterIP to LoadBalancer)
kubectl edit svc stable-kube-prometheus-sta-prometheus -n prometheus
# Expose Grafana dashboard (change ClusterIP to LoadBalancer)
kubectl edit svc stable-grafana -n prometheus
# Get Grafana admin password
kubectl get secret --namespace prometheus stable-grafana \
-o jsonpath="{.data.admin-password}" | base64 --decode ; echo
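The secret is plain base64-encoded text, so the pipeline above simply decodes it. For example, decoding a sample value (an assumed example, not necessarily the password your install generated):

```shell
# Decoding a base64-encoded secret value, as the kubectl pipeline above does.
# "cHJvbS1vcGVyYXRvcg==" is an example value for illustration.
encoded="cHJvbS1vcGVyYXRvcg=="
echo "$encoded" | base64 --decode
# prints: prom-operator
```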
Grafana Login Credentials
Default username is admin. The initial password is set by the chart and stored as a Kubernetes secret; use the kubectl get secret command above to retrieve it. Change it immediately after first login, especially if your LoadBalancer is internet-facing.
Configure Grafana Dashboards
Open Grafana via the LoadBalancer URL and log in. The kube-prometheus-stack chart usually pre-configures Prometheus as the default data source; if it is missing, go to Add your first data source, choose Prometheus, enter the Prometheus service URL (e.g. http://stable-kube-prometheus-sta-prometheus:9090), and click "Save and Test." Then go to Dashboards, click Import, enter dashboard ID 1860 (Node Exporter Full), select your Prometheus data source, and click Load. You now have real-time CPU, RAM, network, and disk metrics for every node.
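Beyond imported dashboards, you can graph ad-hoc queries in Grafana's Explore view. Two illustrative PromQL sketches (the metric names come from cAdvisor, which the stack scrapes; adjust the namespace label to match your workloads):

```promql
# CPU usage (cores) per pod in the default namespace, averaged over 5m
sum by (pod) (rate(container_cpu_usage_seconds_total{namespace="default"}[5m]))

# Current memory working set per pod
sum by (pod) (container_memory_working_set_bytes{namespace="default"})
```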
Deploy a Test Application and Monitor It
Deploy an NGINX application with 2 replicas to see monitoring in action. Apply the YAML below, verify pods are running, then refresh Grafana to see the new pod metrics appear in your dashboard. This confirms end-to-end monitoring is working.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-app
  template:
    metadata:
      labels:
        app: nginx-app
    spec:
      containers:
        - name: nginx-app
          image: nginx:latest
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-app
spec:
  type: LoadBalancer
  ports:
    - port: 80
      targetPort: 80
  selector:
    app: nginx-app
# Save the YAML above as deployment.yml, then apply it
kubectl apply -f deployment.yml
kubectl get deployment
kubectl get pods
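The intro's memory-leak story is exactly why resource requests and limits matter: Grafana can only show usage as a percentage of a limit if a limit exists. A hedged sketch of the container spec above with requests and limits added (the specific values are illustrative starting points, not tuned recommendations):

```yaml
# nginx-app container spec with resource requests/limits added, so
# dashboards can chart usage against the limit
containers:
  - name: nginx-app
    image: nginx:latest
    ports:
      - containerPort: 80
    resources:
      requests:
        cpu: 100m
        memory: 64Mi
      limits:
        cpu: 250m
        memory: 128Mi
```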
Delete the Cluster When Done
An EKS cluster with 2x t2.medium nodes costs roughly $4.63/day ($0.10/hr for the EKS control plane + $0.0464/hr per node). Always delete the cluster when you are finished testing. Forgetting to do this is a roughly $140/month mistake we have seen dozens of times.
# Clean up when done (avoid AWS charges!)
eksctl delete cluster --name my-monitoring-cluster --region us-east-1
Key Components at a Glance
| Component | Purpose | Why You Need It |
|---|---|---|
| Metrics Server | Collects CPU/memory from Kubelets | Without it, kubectl top returns nothing |
| IAM OIDC Provider | Lets pods assume IAM roles | Required for EBS CSI Driver access |
| EBS CSI Driver | Creates persistent EBS volumes | Prometheus needs persistent storage |
| kube-prometheus-stack | Helm chart for the full monitoring stack | Installs Prometheus, Grafana, AlertManager in one command |
| Dashboard ID 1860 | Pre-built Node Exporter Full dashboard | Instant CPU, RAM, network, disk metrics visualization |
Frequently Asked Questions
Can I use Prometheus and Grafana outside of Kubernetes?
Yes. Both tools work with any infrastructure. You can monitor standalone EC2 instances, Docker containers, or bare-metal servers. Kubernetes is just the most common use case because of its dynamic pod scheduling.
Why use the kube-prometheus-stack instead of installing separately?
The stack bundles Prometheus, Grafana, AlertManager, node-exporter, and kube-state-metrics in one Helm chart. Installing them separately means managing 5+ deployments and their configurations individually. The stack handles all of that in one command.
How much storage does Prometheus need for metrics retention?
Prometheus's own default retention is 15 days. A small cluster generates roughly 1-2 GB of metrics per day, so plan for 20-30 GB of EBS storage. Retention is controlled by Prometheus's --storage.tsdb.retention.time flag; with the kube-prometheus-stack chart, set it through the prometheus.prometheusSpec.retention Helm value rather than editing the flag directly.
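The 20-30 GB figure can be sanity-checked with Prometheus's rough sizing formula: retention_seconds × ingested_samples_per_second × bytes_per_sample. A sketch assuming a small cluster ingests ~10,000 samples/s at ~2 bytes per sample after compression (both numbers are assumptions; check your own TSDB metrics for real rates):

```shell
# Rough Prometheus disk sizing: retention * ingestion rate * bytes/sample
retention_days=15
samples_per_sec=10000   # assumed ingestion rate for a small cluster
bytes_per_sample=2      # typical post-compression figure

bytes=$((retention_days * 86400 * samples_per_sec * bytes_per_sample))
echo "$((bytes / 1000000000)) GB"
# prints: 25 GB
```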
Is exposing Grafana via LoadBalancer safe for production?
Not without additional security. For production, use an Ingress controller with TLS termination, restrict access via security groups, and enable Grafana's built-in authentication with SSO or LDAP integration.
What Grafana dashboard ID should I use for Kubernetes monitoring?
Dashboard 1860 (Node Exporter Full) is the most popular for node-level metrics. For pod-level Kubernetes views, try 6417 (Kubernetes Cluster (Prometheus)) or 315 (Kubernetes cluster monitoring via Prometheus).
Running EKS Without Monitoring Dashboards?
We have diagnosed $8,300 outages caused by invisible memory leaks in unmonitored clusters. Whether you need Prometheus alerting pipelines, Grafana dashboard architecture, or full Odoo ERP infrastructure on AWS with production-grade observability — we build the DevOps stack so your team stops firefighting at 2 AM.
