Good. Now we’re talking about something that actually breaks in production if you don’t understand it properly.
You’re on EKS, so this is AWS EBS-backed storage. Let’s go layer by layer, then we’ll apply it to real scenarios like MySQL, Kafka, Redis.
🔥 First: Core Concepts (Stop Memorizing, Understand the Flow)
When a Pod needs storage:
- Pod → requests PVC
- PVC → uses StorageClass
- StorageClass → provisions EBS volume
- EBS → becomes PV
- PV → attaches to node in specific AZ
- Pod → must run on a node in that same AZ
That AZ part is where most people screw up.
🧱 Components Explained (EKS Context)
1️⃣ StorageClass
Defines:
- EBS type (gp3, io2, etc.)
- Reclaim policy
- VolumeBindingMode
- Parameters like IOPS
Example (EKS default gp3):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-sc
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3
reclaimPolicy: Delete
```

Key setting:
🔥 volumeBindingMode
Two options:
Immediate (BAD for multi-AZ clusters)
Volume created immediately when PVC is created. Problem:
- It picks a random AZ.
- Pod might schedule in different AZ.
- Pod stuck in Pending forever.
WaitForFirstConsumer (Correct for EKS)
Volume is created only after Pod is scheduled. This ensures:
- Pod scheduled to a node in AZ X
- EBS created in AZ X
- Everything aligns
In EKS multi-AZ cluster → ALWAYS use WaitForFirstConsumer.
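With WaitForFirstConsumer, a freshly created PVC sits in Pending until a pod actually consumes it — that's expected behavior, not an error. Roughly what you'd see (illustrative output; the PVC name is hypothetical):

```
$ kubectl get pvc mysql-data
NAME         STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
mysql-data   Pending                                      gp3-sc         10s

$ kubectl describe pvc mysql-data
...
Events:
  Type    Reason                Message
  ----    ------                -------
  Normal  WaitForFirstConsumer  waiting for first consumer to be created before binding
```

Don't "fix" this Pending state — it resolves the moment a pod referencing the PVC gets scheduled.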
2️⃣ PVC (PersistentVolumeClaim)
App asks for storage:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp3-sc
  resources:
    requests:
      storage: 20Gi
```

ReadWriteOnce = can attach to only one node at a time (EBS limitation).
3️⃣ PV (PersistentVolume)
You usually don’t create manually. EBS CSI dynamically provisions it.
Represents actual EBS volume.
⚠️ CRITICAL: EBS Is AZ Bound
EBS volumes:
- Exist in ONE AZ
- Can attach to ONE node at a time
- Cannot move across AZs
If your pod moves to another node in the same AZ → the volume must detach and re-attach. It cannot follow the pod to a different AZ. If the AZ dies → your pod is dead.
This matters massively for databases.
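You can see this binding on the PV object itself: the EBS CSI driver stamps a node affinity that pins the volume to one zone. A dynamically provisioned PV looks roughly like this (a sketch — the name, volume ID, and zone are placeholder values):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-1234abcd                  # hypothetical auto-generated name
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteOnce
  csi:
    driver: ebs.csi.aws.com
    volumeHandle: vol-0abc123def456   # the underlying EBS volume ID
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.ebs.csi.aws.com/zone
              operator: In
              values:
                - ap-south-1a         # pods using this PV must land in this AZ
```

That `nodeAffinity` block is why the scheduler refuses to place the pod anywhere except the volume's AZ.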
🔥 Scenario 1 — MySQL Primary + 3 Read Replicas (EKS StatefulSet)
Let’s design this properly.
Architecture
- StatefulSet: mysql
- Replicas: 4 (1 primary + 3 replicas)
- Each pod needs its own volume
- Each volume AZ-bound
Why StatefulSet?
Because:
- Stable network identity
- Stable PVC per pod
- Ordered startup
StatefulSet VolumeClaimTemplate
```yaml
volumeClaimTemplates:
  - metadata:
      name: mysql-data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: gp3-sc
      resources:
        requests:
          storage: 50Gi
```

This creates:
| Pod | PVC | EBS Volume | AZ |
|---|---|---|---|
| mysql-0 | mysql-data-mysql-0 | vol-xxx | ap-south-1a |
| mysql-1 | mysql-data-mysql-1 | vol-yyy | ap-south-1b |
| mysql-2 | mysql-data-mysql-2 | vol-zzz | ap-south-1c |
| mysql-3 | mysql-data-mysql-3 | vol-abc | ap-south-1a |
Scheduler spreads pods across AZs.
Each pod gets its own EBS.
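If you want that AZ spread guaranteed rather than best-effort, you can add a topology spread constraint to the StatefulSet's pod template — a sketch (the `app: mysql` label is an assumption; it must match your pod labels):

```yaml
# In the StatefulSet's spec.template.spec:
topologySpreadConstraints:
  - maxSkew: 1                              # AZs may differ by at most 1 pod
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule        # hard requirement, not a preference
    labelSelector:
      matchLabels:
        app: mysql                          # must match this StatefulSet's pod labels
```

With 4 replicas across 3 AZs, this forces a 2/1/1 split instead of letting the scheduler pile pods into one zone.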
What Happens If Node Dies?
If the node in 1a dies:
- Kubernetes reschedules mysql-0
- It must schedule in 1a
- Because the volume exists in 1a

If no node is available in 1a → the pod stays Pending forever.
This is why:
- You MUST have worker nodes in all AZs
- You MUST use WaitForFirstConsumer
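When this goes wrong, the scheduler tells you exactly why. The symptom is a Pending pod with a FailedScheduling event roughly like this (illustrative output; node counts will vary):

```
$ kubectl describe pod mysql-0
...
Events:
  Warning  FailedScheduling  0/6 nodes are available:
  2 node(s) had volume node affinity conflict, ...
```

"Volume node affinity conflict" almost always means the EBS volume lives in an AZ where no schedulable node exists.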
⚠️ Hard Truth: EKS + EBS Is NOT HA Across AZ
EBS does NOT replicate across AZ.
Your HA depends on:
- MySQL replication
- Not EBS replication
If an AZ dies:
- The primary in that AZ dies with it
- You must promote a replica in another AZ, manually or via an operator
🔥 Scenario 2 — 2 MySQL Masters
Now it gets tricky.
EBS still:
- One volume per pod
- One AZ per volume
Each master:
- Has independent EBS
- Replicates via MySQL clustering
Storage is NOT shared. It’s replicated at database layer.
Never try to share EBS between pods. It won’t work.
🔥 Scenario 3 — Kafka Cluster (3 Brokers)
Kafka absolutely requires:
- One volume per broker
- High IOPS
- Consistent latency
You'd define, in each broker's PVC:

```yaml
storageClassName: gp3-sc
```

and in the StorageClass:

```yaml
parameters:
  type: gp3
```

Better option:
- Use io2 for production
- Or gp3 with provisioned IOPS
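With the EBS CSI driver, gp3 IOPS and throughput can be set directly as StorageClass parameters — a sketch (the class name and numbers are illustrative; size them for your broker workload):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: kafka-gp3               # hypothetical name
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3
  iops: "6000"                  # gp3 baseline is 3000; provision more for brokers
  throughput: "250"             # MiB/s, above the gp3 baseline of 125
```

For io2, you'd set `type: io2` and an `iops` value instead; gp3 with provisioned IOPS is usually the cheaper starting point.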
Each broker:
- Own EBS
- AZ bound
- Kafka handles replication
Again: Storage is local per broker. Replication happens at Kafka level.
If broker in AZ A dies:
- Partition replicas in B and C take over
But: If entire AZ dies → you lose that broker’s data copy
That’s why Kafka replication factor = 3 minimum.
🔥 Scenario 4 — Redis
If it’s:
Standalone Redis:
- Single pod + one EBS volume
- Bad HA story
Redis Sentinel / Cluster:
Each pod:
- Own PVC
- Own EBS
- Replication at Redis level
Same AZ constraints apply.
🔥 Very Important: Pod Rescheduling Behavior
When a pod restarts:
- Kubernetes finds existing PVC
- PVC bound to specific PV
- PV bound to EBS
- EBS tied to AZ
- Pod must land in same AZ
Scheduler enforces topology constraint automatically.
If you don’t have node capacity in that AZ: Pod stuck forever.
That’s why: AutoScaling Groups must exist in all AZs.
🔥 Reclaim Policy
In StorageClass:
Delete
When PVC deleted → EBS deleted
Dangerous for production DB.
Retain
PVC deleted → EBS remains
Better for production databases.
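In manifest form, it's one field on the StorageClass — a sketch (the name is hypothetical):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-retain              # hypothetical name
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain           # PV and EBS volume survive PVC deletion
parameters:
  type: gp3
```

Note that the reclaim policy is copied onto each PV at provisioning time; you can also patch an already-provisioned PV's `persistentVolumeReclaimPolicy` to `Retain` before touching its PVC.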
🔥 When EBS Is the Wrong Choice
EBS:
- Single AZ
- RWO only
- Block storage
If you need:
- Shared filesystem
- Multi-pod write
- Cross-AZ mounting
Use:
- EFS (RWX)
- FSx
- S3 (object storage)
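For comparison, an EFS-backed class supporting ReadWriteMany looks roughly like this (requires the EFS CSI driver to be installed; the filesystem ID is a placeholder you'd replace with your own):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc                          # hypothetical name
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap              # dynamic provisioning via EFS access points
  fileSystemId: fs-0123456789abcdef0    # placeholder — your EFS filesystem ID
  directoryPerms: "700"
```

PVCs against this class can use `accessModes: ["ReadWriteMany"]`, and because EFS is a regional filesystem, pods in any AZ can mount it — the AZ-pinning problem disappears, at the cost of NFS semantics and latency.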
🔥 Production Advice (No Sugarcoating)
If you're running:
- Production MySQL
- Production Kafka
- Production Redis
On EKS with raw StatefulSets…
You’re reinventing the wheel.
Better:
- Use RDS for MySQL
- Use MSK for Kafka
- Use ElastiCache for Redis
Unless:
- You absolutely need Kubernetes-native control
- You accept operational complexity
Running stateful workloads in EKS with EBS is doable. But you need to understand failure domains deeply.
🎯 Summary
| Component | Purpose |
|---|---|
| StorageClass | Defines how volume is created |
| PVC | App request for storage |
| PV | Actual volume object |
| EBS | Real AWS disk |
| WaitForFirstConsumer | Prevents AZ mismatch |
| RWO | Only 1 node attach |
If you want next level:
I can explain:
- What happens during node drain
- How volume detach/attach timing works
- How topologySpreadConstraints interact with storage
- How to design truly HA MySQL on EKS
- Or how to debug “pod stuck in Pending due to volume node affinity conflict”
Tell me which direction you want.