Good. Now we’re talking about something that actually breaks in production if you don’t understand it properly.
You’re on EKS, so this is AWS EBS-backed storage. Let’s go layer by layer, then we’ll apply it to real scenarios like MySQL, Kafka, Redis.
🔥 First: Core Concepts (Stop Memorizing, Understand the Flow)
When a Pod needs storage:
- Pod → requests PVC
- PVC → uses StorageClass
- StorageClass → provisions EBS volume
- EBS → becomes PV
- PV → attaches to node in specific AZ
- Pod → must run on a node in that same AZ
That AZ part is where most people screw up.
🧱 Components Explained (EKS Context)
1️⃣ StorageClass
Defines:
- EBS type (gp3, io2, etc.)
- Reclaim policy
- VolumeBindingMode
- Parameters like IOPS
Example (EKS default gp3):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-sc
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3
reclaimPolicy: Delete
```

Key setting:
🔥 volumeBindingMode
Two options:
Immediate (BAD for multi-AZ clusters)
Volume created immediately when PVC is created. Problem:
- It picks a random AZ.
- Pod might schedule in different AZ.
- Pod stuck in Pending forever.
WaitForFirstConsumer (Correct for EKS)
Volume is created only after Pod is scheduled. This ensures:
- Pod scheduled to a node in AZ X
- EBS created in AZ X
- Everything aligns
In EKS multi-AZ cluster → ALWAYS use WaitForFirstConsumer.
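With WaitForFirstConsumer, a freshly created PVC sits in Pending until a pod actually consumes it — that's expected behavior, not an error. Roughly what you'd see (illustrative output; the PVC name is hypothetical):

```
$ kubectl get pvc mysql-data
NAME         STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
mysql-data   Pending                                      gp3-sc         10s

$ kubectl describe pvc mysql-data
...
Events:
  Type    Reason                Message
  ----    ------                -------
  Normal  WaitForFirstConsumer  waiting for first consumer to be created before binding
```

Don't "fix" this Pending state — it resolves the moment a pod referencing the PVC gets scheduled.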
2️⃣ PVC (PersistentVolumeClaim)
App asks for storage:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp3-sc
  resources:
    requests:
      storage: 20Gi
```

ReadWriteOnce = can attach to only one node at a time (EBS limitation).
3️⃣ PV (PersistentVolume)
You usually don’t create manually. EBS CSI dynamically provisions it.
Represents actual EBS volume.
⚠️ CRITICAL: EBS Is AZ Bound
EBS volumes:
- Exist in ONE AZ
- Can attach to ONE node at a time
- Cannot move across AZs
If your pod moves to another node in the same AZ → the volume must detach and re-attach. It cannot follow the pod to a different AZ. If the AZ dies → your pod is dead.
This matters massively for databases.
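You can see this binding on the PV object itself: the EBS CSI driver stamps a node affinity that pins the volume to one zone. A dynamically provisioned PV looks roughly like this (a sketch — the name, volume ID, and zone are placeholder values):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-1234abcd                  # hypothetical auto-generated name
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteOnce
  csi:
    driver: ebs.csi.aws.com
    volumeHandle: vol-0abc123def456   # the underlying EBS volume ID
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.ebs.csi.aws.com/zone
              operator: In
              values:
                - ap-south-1a         # pods using this PV must land in this AZ
```

That `nodeAffinity` block is why the scheduler refuses to place the pod anywhere except the volume's AZ.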
🔥 Scenario 1 — MySQL Primary + 3 Read Replicas (EKS StatefulSet)
Let’s design this properly.
Architecture
- StatefulSet: mysql
- Replicas: 4 (1 primary + 3 replicas)
- Each pod needs its own volume
- Each volume AZ-bound
Why StatefulSet?
Because:
- Stable network identity
- Stable PVC per pod
- Ordered startup
StatefulSet VolumeClaimTemplate
```yaml
volumeClaimTemplates:
  - metadata:
      name: mysql-data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: gp3-sc
      resources:
        requests:
          storage: 50Gi
```

This creates:
| Pod | PVC | EBS Volume | AZ |
|---|---|---|---|
| mysql-0 | mysql-data-mysql-0 | vol-xxx | ap-south-1a |
| mysql-1 | mysql-data-mysql-1 | vol-yyy | ap-south-1b |
| mysql-2 | mysql-data-mysql-2 | vol-zzz | ap-south-1c |
| mysql-3 | mysql-data-mysql-3 | vol-abc | ap-south-1a |
Scheduler spreads pods across AZs.
Each pod gets its own EBS.
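If you want that AZ spread guaranteed rather than best-effort, you can add a topology spread constraint to the StatefulSet's pod template — a sketch (the `app: mysql` label is an assumption; it must match your pod labels):

```yaml
# In the StatefulSet's spec.template.spec:
topologySpreadConstraints:
  - maxSkew: 1                              # AZs may differ by at most 1 pod
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule        # hard requirement, not a preference
    labelSelector:
      matchLabels:
        app: mysql                          # must match this StatefulSet's pod labels
```

With 4 replicas across 3 AZs, this forces a 2/1/1 split instead of letting the scheduler pile pods into one zone.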
What Happens If Node Dies?
If the node in 1a dies:
- Kubernetes reschedules mysql-0
- It must schedule in 1a
- Because the volume exists in 1a

If no node is available in 1a → the pod stays Pending forever.
This is why:
- You MUST have worker nodes in all AZs
- You MUST use WaitForFirstConsumer
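When this goes wrong, the scheduler tells you exactly why. The symptom is a Pending pod with a FailedScheduling event roughly like this (illustrative output; node counts will vary):

```
$ kubectl describe pod mysql-0
...
Events:
  Warning  FailedScheduling  0/6 nodes are available:
  2 node(s) had volume node affinity conflict, ...
```

"Volume node affinity conflict" almost always means the EBS volume lives in an AZ where no schedulable node exists.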
⚠️ Hard Truth: EKS + EBS Is NOT HA Across AZ
EBS does NOT replicate across AZ.
Your HA depends on:
- MySQL replication
- Not EBS replication
If an AZ dies:
- The primary in that AZ dies with it
- You must promote a replica in another AZ, manually or via an operator
🔥 Scenario 2 — 2 MySQL Masters
Now it gets tricky.
EBS still:
- One volume per pod
- One AZ per volume
Each master:
- Has independent EBS
- Replicates via MySQL clustering
Storage is NOT shared. It’s replicated at database layer.
Never try to share EBS between pods. It won’t work.
🔥 Scenario 3 — Kafka Cluster (3 Brokers)
Kafka absolutely requires:
- One volume per broker
- High IOPS
- Consistent latency
You'd define, in each broker's PVC:

```yaml
storageClassName: gp3-sc
```

and in the StorageClass:

```yaml
parameters:
  type: gp3
```

Better option:
- Use io2 for production
- Or gp3 with provisioned IOPS
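With the EBS CSI driver, gp3 IOPS and throughput can be set directly as StorageClass parameters — a sketch (the class name and numbers are illustrative; size them for your broker workload):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: kafka-gp3               # hypothetical name
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3
  iops: "6000"                  # gp3 baseline is 3000; provision more for brokers
  throughput: "250"             # MiB/s, above the gp3 baseline of 125
```

For io2, you'd set `type: io2` and an `iops` value instead; gp3 with provisioned IOPS is usually the cheaper starting point.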
Each broker:
- Own EBS
- AZ bound
- Kafka handles replication
Again: Storage is local per broker. Replication happens at Kafka level.
If broker in AZ A dies:
- Partition replicas in B and C take over
But: If entire AZ dies → you lose that broker’s data copy
That’s why Kafka replication factor = 3 minimum.
🔥 Scenario 4 — Redis
If it’s:
Standalone Redis:
- Single pod + one EBS volume
- Bad HA story
Redis Sentinel / Cluster:
Each pod:
- Own PVC
- Own EBS
- Replication at Redis level
Same AZ constraints apply.
🔥 Very Important: Pod Rescheduling Behavior
When a pod restarts:
- Kubernetes finds existing PVC
- PVC bound to specific PV
- PV bound to EBS
- EBS tied to AZ
- Pod must land in same AZ
Scheduler enforces topology constraint automatically.
If you don’t have node capacity in that AZ: Pod stuck forever.
That’s why: AutoScaling Groups must exist in all AZs.
🔥 Reclaim Policy
In StorageClass:
Delete
When PVC deleted → EBS deleted
Dangerous for production DB.
Retain
PVC deleted → EBS remains
Better for production databases.
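In manifest form, it's one field on the StorageClass — a sketch (the name is hypothetical):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-retain              # hypothetical name
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain           # PV and EBS volume survive PVC deletion
parameters:
  type: gp3
```

Note that the reclaim policy is copied onto each PV at provisioning time; you can also patch an already-provisioned PV's `persistentVolumeReclaimPolicy` to `Retain` before touching its PVC.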
🔥 When EBS Is the Wrong Choice
EBS:
- Single AZ
- RWO only
- Block storage
If you need:
- Shared filesystem
- Multi-pod write
- Cross-AZ mounting
Use:
- EFS (RWX)
- FSx
- S3 (object storage)
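For comparison, an EFS-backed class supporting ReadWriteMany looks roughly like this (requires the EFS CSI driver to be installed; the filesystem ID is a placeholder you'd replace with your own):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc                          # hypothetical name
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap              # dynamic provisioning via EFS access points
  fileSystemId: fs-0123456789abcdef0    # placeholder — your EFS filesystem ID
  directoryPerms: "700"
```

PVCs against this class can use `accessModes: ["ReadWriteMany"]`, and because EFS is a regional filesystem, pods in any AZ can mount it — the AZ-pinning problem disappears, at the cost of NFS semantics and latency.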
🔥 Production Advice (No Sugarcoating)
If you're running:
- Production MySQL
- Production Kafka
- Production Redis
On EKS with raw StatefulSets…
You’re reinventing the wheel.
Better:
- Use RDS for MySQL
- Use MSK for Kafka
- Use ElastiCache for Redis
Unless:
- You absolutely need Kubernetes-native control
- You accept operational complexity
Running stateful workloads in EKS with EBS is doable. But you need to understand failure domains deeply.
🎯 Summary
| Component | Purpose |
|---|---|
| StorageClass | Defines how volume is created |
| PVC | App request for storage |
| PV | Actual volume object |
| EBS | Real AWS disk |
| WaitForFirstConsumer | Prevents AZ mismatch |
| RWO | Only 1 node attach |
If you want next level:
I can explain:
- What happens during node drain
- How volume detach/attach timing works
- How topologySpreadConstraints interact with storage
- How to design truly HA MySQL on EKS
- Or how to debug “pod stuck in Pending due to volume node affinity conflict”
Tell me which direction you want.