🚀 Part 1 — EKS Upgrade Process & Strategy (Production Grade)

Interviewers don’t want “click upgrade.” They want a risk-managed upgrade strategy.

Start your answer like this:

I treat EKS upgrades as a staged, low-risk rollout across control plane, node groups, and add-ons — never a one-step upgrade.

Strong opener.

✅ EKS Upgrade Has 3 Layers (Say This Clearly)

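EKS upgrades are done in order:

1️⃣ Control Plane
2️⃣ Cluster Add-ons
3️⃣ Worker Nodes

If you mix the order, you risk breakage. The control plane always goes first, because kubelet and kube-proxy must never be newer than the API server.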

🧠 Step 1 — Pre-Upgrade Assessment

Before upgrading:

check Kubernetes version skew policy
read EKS release notes
check deprecated APIs (very important)
scan manifests for removed APIs
check CRDs compatibility
check add-on compatibility matrix
verify CNI / CoreDNS / kube-proxy versions
check Helm charts compatibility

Senior signal: mention API deprecation scan.

Tools you can mention:

kubent (the CLI from the kube-no-trouble project; they are the same tool)
pluto
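
To make the API deprecation scan concrete, here is a minimal sketch of what such a tool does under the hood. The REMOVED_IN table is a small illustrative subset, not a complete list, and the function names are hypothetical. In practice you would run kubent or pluto rather than maintain this yourself.

```python
# Illustrative subset only: (apiVersion, kind) -> first Kubernetes version
# that no longer serves it. Real scanners ship a much fuller table.
REMOVED_IN = {
    ("extensions/v1beta1", "Ingress"): (1, 22),
    ("networking.k8s.io/v1beta1", "Ingress"): (1, 22),
    ("apiextensions.k8s.io/v1beta1", "CustomResourceDefinition"): (1, 22),
    ("batch/v1beta1", "CronJob"): (1, 25),
    ("policy/v1beta1", "PodDisruptionBudget"): (1, 25),
    ("policy/v1beta1", "PodSecurityPolicy"): (1, 25),
    ("autoscaling/v2beta2", "HorizontalPodAutoscaler"): (1, 26),
}


def parse_minor(version):
    """Turn '1.25' or 'v1.25.3' into the tuple (1, 25)."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def find_removed_apis(manifests, target_version):
    """Return one finding per manifest whose API is gone in target_version.

    `manifests` is a list of already-parsed manifest dicts (hypothetical
    input shape; load them from YAML or Helm output however you prefer).
    """
    target = parse_minor(target_version)
    findings = []
    for manifest in manifests:
        key = (manifest.get("apiVersion"), manifest.get("kind"))
        removed = REMOVED_IN.get(key)
        if removed is not None and target >= removed:
            name = manifest.get("metadata", {}).get("name", "<unnamed>")
            findings.append(
                f"{key[1]}/{name} uses {key[0]}, removed in {removed[0]}.{removed[1]}"
            )
    return findings
```

Run it against the rendered manifests for the target version before touching the control plane; an empty list is the gate for moving on.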

🛠 Step 2 — Upgrade Control Plane

EKS control plane upgrade is managed by AWS.

Process:

upgrade via console/CLI
no node restart yet
control plane becomes new version
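worker nodes can stay behind temporarily (skew allowed): kubelet may be older than the API server, never newer. Upstream policy allows up to three minor versions of lag from 1.28 onward, two before that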
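
The skew rule is easy to state as code. This is a sketch based on my reading of the upstream Kubernetes version skew policy (kubelet never newer than the API server; up to three minor versions older when the control plane is 1.28 or later, two before that). The function names are hypothetical.

```python
def minor(version):
    """Extract the minor number from '1.30' or 'v1.30.2'."""
    return int(version.lstrip("v").split(".")[1])


def kubelet_skew_ok(control_plane, kubelet):
    """Return True if this kubelet version may run against this control plane."""
    cp, node = minor(control_plane), minor(kubelet)
    if node > cp:
        # kubelet must never be newer than the API server
        return False
    # Assumption: the wider n-3 window applies from control plane 1.28 onward.
    allowed_lag = 3 if cp >= 28 else 2
    return cp - node <= allowed_lag
```

Staying inside the window is a safety net for the upgrade itself, not a place to park nodes long term.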

Risk is low but:

webhook / admission controllers can break
API removal can break controllers

🔌 Step 3 — Upgrade Add-ons

Critical add-ons:

VPC CNI
CoreDNS
kube-proxy
EBS/EFS CSI drivers
Load balancer controller

Each add-on has its own version line, and each must be on a release that is compatible with the cluster's Kubernetes version.

Many outages happen here — not control plane.

🖥 Step 4 — Upgrade Node Groups (Safest Pattern)

Best practice = blue/green node group upgrade

Pattern:

Create new node group with:

new AMI
new kubelet version
the already-upgraded VPC CNI (it runs as the aws-node DaemonSet, so new nodes pick it up automatically)

Then:

cordon old nodes
drain pods respecting PDB
shift workloads
delete old node group
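
The ordering in the pattern above matters, and a toy simulation shows why. This is a sketch with hypothetical names (Node, blue_green_migrate); it models scheduling only and ignores PDBs, which the real drain would honor.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    pods: list = field(default_factory=list)
    schedulable: bool = True


def blue_green_migrate(old_nodes, new_nodes, max_pods_per_node):
    """Move every pod from the old node group to the new one.

    Returns the names of old nodes that are empty and safe to delete.
    """
    # 1. Cordon ALL old nodes first, so an evicted pod can never be
    #    rescheduled onto another old node and evicted a second time.
    for node in old_nodes:
        node.schedulable = False

    # 2. Drain old nodes one at a time.
    for node in old_nodes:
        while node.pods:
            target = next(
                (n for n in new_nodes
                 if n.schedulable and len(n.pods) < max_pods_per_node),
                None,
            )
            if target is None:
                # Stop rather than leave pods pending with nowhere to go.
                raise RuntimeError("new node group is full; scale it up first")
            target.pods.append(node.pods.pop(0))

    # 3. Only empty old nodes are candidates for deletion.
    return [n.name for n in old_nodes if not n.pods]
```

The point to say out loud in an interview: cordon everything old before draining anything, and check capacity on the new group before each eviction.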

Never in-place patch all nodes blindly.

🔄 Step 5 — Workload Safety Controls

Must mention:

PodDisruptionBudgets
readiness probes
rolling deployments
maxUnavailable tuning
surge capacity available

Without a PDB, a node drain can evict every replica of a service at once and cause an outage.
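
It helps to know how a PDB turns into a number of allowed evictions. The sketch below mirrors the documented behavior as I understand it (percentages are rounded up for both minAvailable and maxUnavailable); the function names are hypothetical.

```python
def resolve(value, total):
    """Turn an int or a percentage string like '50%' into a pod count.

    Percentages are rounded up, using integer math to avoid float drift.
    """
    if isinstance(value, str) and value.endswith("%"):
        pct = int(value[:-1])
        return (total * pct + 99) // 100
    return int(value)


def allowed_disruptions(expected_pods, healthy_pods,
                        min_available=None, max_unavailable=None):
    """How many pods a drain may evict right now under this budget."""
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        desired_healthy = resolve(min_available, expected_pods)
    else:
        desired_healthy = expected_pods - resolve(max_unavailable, expected_pods)
    return max(0, healthy_pods - desired_healthy)
```

Two cases worth mentioning: with 7 replicas and minAvailable "50%", 4 must stay up, so 3 may be evicted. With a single replica and minAvailable 1, the answer is 0 and the drain blocks until someone intervenes.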

🧪 Step 6 — Pre-Prod Upgrade First

Senior answer must include:

I always upgrade staging cluster first and run smoke + load tests before production.

⏱ Upgrade Frequency Strategy

Good interview answer:

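stay within 1–2 versions behind
avoid big jumps: EKS upgrades the control plane one minor version at a time, so a large gap means several back-to-back upgrades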
schedule quarterly upgrades
treat as routine — not rare event
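
Because the control plane moves one minor version per upgrade, the cost of falling behind is easy to compute. A minimal sketch (upgrade_path is a hypothetical helper):

```python
def upgrade_path(current, target):
    """List every control plane version you must pass through, in order."""
    major, cur = (int(p) for p in current.split(".")[:2])
    target_major, tgt = (int(p) for p in target.split(".")[:2])
    if major != target_major or tgt < cur:
        raise ValueError("target must be the same major and not older than current")
    return [f"{major}.{m}" for m in range(cur + 1, tgt + 1)]
```

Going from 1.27 to 1.30 is three full upgrade cycles (1.28, 1.29, 1.30), each with its own pre-checks, add-on updates, and node rollout. That is the argument for a quarterly cadence.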

🌐 Part 2 — EKS Networking & CNI Models (Deep Interview Topic)

This is where many candidates get confused. Let’s make it clean.

🧩 First — What CNI Does

CNI decides:

pod IP allocation
pod routing
pod ↔ pod communication
pod ↔ VPC communication
network policy support
IP scaling limits

🟢 AWS VPC CNI (Default)

✅ How AWS CNI Works

Pods get real VPC IPs from subnet.

Pod = VPC-native IP.
No overlay network.

✅ Strengths

native VPC routing
no encapsulation overhead
high performance
security groups integration
works well with AWS LBs
simplest for AWS-native workloads

⚠️ Weakness

Consumes VPC IPs fast. Subnet exhaustion is common at scale.

🚀 Prefix Delegation (AWS CNI Enhancement)

✅ What It Does

Instead of attaching many individual secondary IPs, the CNI attaches /28 IPv4 prefixes to each ENI (requires Nitro-based instances).

One prefix = a block of 16 pod IPs.

✅ Benefits

massive pod density increase
fewer ENIs needed
faster pod startup
reduces IP exhaustion pressure
best for high-scale clusters
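
The density gain is simple arithmetic. This sketch uses the max-pods formula from the AWS documentation as I understand it; the ENI and IP limits per instance type, and any cap you apply, are inputs you should look up rather than trust from here.

```python
def max_pods(enis, ips_per_eni, prefix_delegation=False, cap=None):
    """Estimate the pod ceiling for one node under the AWS VPC CNI.

    Each ENI keeps its first IP for itself. Every remaining slot holds
    either one pod IP or, with prefix delegation, one /28 (16 IPs).
    The +2 covers host-network pods such as aws-node and kube-proxy.
    """
    ips_per_slot = 16 if prefix_delegation else 1
    pods = enis * (ips_per_eni - 1) * ips_per_slot + 2
    # Assumption: `cap` is whatever kubelet max-pods limit you choose to set.
    return min(pods, cap) if cap is not None else pods
```

For an m5.large (3 ENIs, 10 IPs per ENI) this gives 29 pods without prefix delegation and 434 with it. At that point IP addresses stop being the bottleneck and CPU, memory, and the kubelet max-pods setting take over, which is why a cap such as 110 is normally applied.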

🧠 Interview Position

Say:

Prefix delegation is the preferred scaling model for high pod density on AWS CNI.

🌉 VPC CNI Custom Networking

✅ What It Solves

Use separate subnets for pod IPs instead of node subnets.

Node subnet ≠ pod subnet.

✅ When Used

node subnet IP exhausted
want pod IP segmentation
network isolation
multi-subnet strategy

⚠️ Tradeoff

More routing complexity. Harder troubleshooting. Must design route tables correctly.
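
Two numbers are worth being able to work out for custom networking: how many pod IPs a dedicated pod subnet really gives you, and how the per-node pod ceiling changes. A sketch with hypothetical helper names; the CIDR in the usage note is only an example.

```python
import ipaddress

# AWS reserves 5 addresses in every subnet.
AWS_RESERVED_PER_SUBNET = 5


def usable_ips(cidr):
    """Usable addresses in a pod subnet after AWS reservations."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def max_pods_custom_networking(enis, ips_per_eni):
    """Pod ceiling per node when custom networking is enabled.

    The primary ENI stays in the node subnet and carries no pod IPs,
    so one ENI drops out of the usual formula.
    """
    return (enis - 1) * (ips_per_eni - 1) + 2
```

A /19 pod subnet such as 100.64.0.0/19 yields 8187 usable pod IPs. The tradeoff shows up per node: an m5.large drops from 29 pods to 20 without prefix delegation, which is one reason the two features are often combined.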

🛡 Calico

✅ What Calico Adds

strong NetworkPolicy engine
fine-grained policy control
can run with AWS CNI (policy-only mode)
can run full overlay mode

✅ When Choose Calico

strict microsegmentation needed
fintech compliance network isolation
zero-trust east-west controls
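
Zero-trust east-west usually starts with a default-deny policy per namespace, then explicit allows on top. The sketch below builds that baseline using the standard networking.k8s.io/v1 NetworkPolicy API, so it is not Calico-specific. The policy name and the namespace are placeholders.

```python
import json


def default_deny(namespace):
    """Build a NetworkPolicy that blocks all ingress and egress in a namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            # An empty podSelector selects every pod in the namespace.
            "podSelector": {},
            # Listing both types with no rules denies traffic in both directions.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML, so this output can be applied as-is.
    print(json.dumps(default_deny("payments"), indent=2))
```

Remember that denying all egress also blocks DNS, so the first allow rule you add is normally one for CoreDNS.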

⚠️ Tradeoff

More operational complexity.

🕸 Weave Net

Characteristics

overlay network
simple setup
not AWS-native
more latency
less used in EKS now

Interview answer: rarely chosen today for EKS.

🧵 Flannel

Characteristics

simple overlay CNI
basic networking
no strong policy engine
good for small clusters
not common in EKS production

🧠 Interview Decision Matrix — Say This

✅ Default Production EKS

AWS VPC CNI + Prefix Delegation

✅ Need Network Policy

AWS CNI + Calico policy mode

✅ Extreme Pod Density

AWS CNI + Prefix Delegation + custom networking

✅ Hybrid / On-prem style cluster

Calico full mode

❌ Avoid for EKS Prod

Weave / Flannel (unless special case)

🧨 Interview Trap — Network Policy Support

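The AWS VPC CNI alone historically had no NetworkPolicy enforcement, so clusters added Calico for it. Newer VPC CNI releases can enforce standard Kubernetes NetworkPolicy natively, but Calico still offers a richer policy model (for example, cluster-wide policies).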

Mention this nuance = senior signal.

🧠 Senior One-Shot Summary Answer

If asked:

“What networking model would you choose for large fintech EKS?”

Answer:

I’d use AWS VPC CNI with prefix delegation for scale and native VPC routing, add Calico for network policy enforcement, and use custom networking only if subnet IP pressure requires separation. Overlay CNIs like Weave or Flannel I avoid for production EKS due to performance and operability tradeoffs.
