Part 1: EKS Upgrade Process & Strategy (Production Grade)
Interviewers don't want "click upgrade." They want a risk-managed upgrade strategy.
Start your answer like this:
I treat EKS upgrades as a staged, low-risk rollout across control plane, node groups, and add-ons, never a one-step upgrade.
Strong opener.
EKS Upgrade Has 3 Layers (Say This Clearly)
EKS upgrades are done in order:
1. Control Plane → 2. Cluster Add-ons → 3. Worker Nodes
If you mix the order → breakage risk.
Step 1: Pre-Upgrade Assessment
Before upgrading:
- check Kubernetes version skew policy
- read EKS release notes
- check deprecated APIs (very important)
- scan manifests for removed APIs
- check CRD compatibility
- check add-on compatibility matrix
- verify CNI / CoreDNS / kube-proxy versions
- check Helm chart compatibility
Senior signal: mention API deprecation scan.
Tools you can mention:
- kubent (kube-no-trouble)
- pluto
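A minimal sketch of what such a scan might look like, assuming kubent and pluto are installed locally; the target version and manifest path are placeholders:

```bash
# kubent: scan live cluster objects for APIs removed in the target version
kubent --target-version 1.29.0

# pluto: scan Helm releases already deployed in the cluster
pluto detect-helm -o wide

# pluto: scan manifests in the repo before they ever reach the cluster
pluto detect-files -d ./manifests -o wide --target-versions k8s=v1.29.0
```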
Step 2: Upgrade the Control Plane
EKS control plane upgrade is managed by AWS.
Process:
- upgrade via console/CLI
- no node restart yet
- control plane moves to the new version
- worker nodes can temporarily stay on the older version (kubelet version skew is allowed)
Risk is low but:
- webhook / admission controllers can break
- API removal can break controllers
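As a rough sketch (cluster name and version are placeholders), the control plane bump is one managed operation that you then poll until it completes:

```bash
# Kick off the managed control plane upgrade
aws eks update-cluster-version \
  --name prod-cluster \
  --kubernetes-version 1.29

# The call returns an update ID; poll it until the status is Successful
aws eks describe-update \
  --name prod-cluster \
  --update-id <update-id-from-previous-output>
```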
Step 3: Upgrade Add-ons
Critical add-ons:
- VPC CNI
- CoreDNS
- kube-proxy
- EBS/EFS CSI drivers
- Load balancer controller
These must be compatible with the cluster version.
Many outages happen here, not in the control plane.
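A hedged sketch of the add-on flow, assuming managed EKS add-ons and placeholder cluster name and version strings: check compatibility first, then bump.

```bash
# List add-on versions compatible with the new cluster version
aws eks describe-addon-versions \
  --addon-name vpc-cni \
  --kubernetes-version 1.29 \
  --query 'addons[].addonVersions[].addonVersion'

# Upgrade the managed add-on to a compatible version
aws eks update-addon \
  --cluster-name prod-cluster \
  --addon-name vpc-cni \
  --addon-version v1.18.0-eksbuild.1 \
  --resolve-conflicts PRESERVE
```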
Step 4: Upgrade Node Groups (Safest Pattern)
Best practice = blue/green node group upgrade
Pattern:
Create new node group with:
- new AMI
- new kubelet version
- new CNI version
Then:
- cordon old nodes
- drain pods respecting PDB
- shift workloads
- delete old node group
Never blindly patch all nodes in place.
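A minimal sketch of the cutover, assuming managed node groups labelled old-ng / new-ng and an eksctl-managed cluster (all names are placeholders):

```bash
# 1. Stop scheduling onto the old node group
kubectl cordon -l eks.amazonaws.com/nodegroup=old-ng

# 2. Evict pods node by node, respecting PodDisruptionBudgets
for node in $(kubectl get nodes -l eks.amazonaws.com/nodegroup=old-ng -o name); do
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data --grace-period=120
done

# 3. Once workloads are healthy on the new node group, delete the old one
eksctl delete nodegroup --cluster prod-cluster --name old-ng
```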
Step 5: Workload Safety Controls
Must mention:
- PodDisruptionBudgets
- readiness probes
- rolling deployments
- maxUnavailable tuning
- surge capacity available
Without PDBs → an upgrade can cause an outage.
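For example, a minimal PodDisruptionBudget sketch for a hypothetical payments-api deployment: node drains can never take the service below two ready pods.

```bash
kubectl apply -f - <<'EOF'
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payments-api-pdb
  namespace: payments
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: payments-api
EOF
```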
Step 6: Pre-Prod Upgrade First
Senior answer must include:
I always upgrade the staging cluster first and run smoke + load tests before production.
Upgrade Frequency Strategy
Good interview answer:
- stay no more than 1-2 versions behind latest
- avoid big version jumps
- schedule quarterly upgrades
- treat upgrades as routine, not a rare event
Part 2: EKS Networking & CNI Models (Deep Interview Topic)
This is where many candidates get confused. Let's make it clean.
First: What a CNI Does
A CNI decides:
- pod IP allocation
- pod routing
- pod → pod communication
- pod → VPC communication
- network policy support
- IP scaling limits
AWS VPC CNI (Default)
How the AWS CNI Works
Pods get real VPC IPs from the subnet.
Pod = VPC-native IP. No overlay network.
Strengths
- native VPC routing
- no encapsulation overhead
- high performance
- security group integration
- works well with AWS LBs
- simplest for AWS-native workloads
Weakness
Consumes VPC IPs fast. Subnet exhaustion is common at scale.
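A quick way to keep an eye on that pressure (the VPC ID is a placeholder):

```bash
# Show remaining free IPs per subnet in the cluster VPC
aws ec2 describe-subnets \
  --filters Name=vpc-id,Values=vpc-0123456789abcdef0 \
  --query 'Subnets[].{Subnet:SubnetId,AZ:AvailabilityZone,FreeIPs:AvailableIpAddressCount}' \
  --output table
```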
Prefix Delegation (AWS CNI Enhancement)
What It Does
Instead of attaching many individual secondary IPs, attach IP prefixes per ENI.
One prefix = a block of pod IPs.
Benefits
- massive pod density increase
- fewer ENIs needed
- faster pod startup
- reduces IP exhaustion pressure
- best for high-scale clusters
Interview Position
Say:
Prefix delegation is the preferred scaling model for high pod density on AWS CNI.
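A hedged sketch of how it is typically switched on, via env vars on the aws-node (VPC CNI) DaemonSet; the warm target is a placeholder, and only Nitro-based instance types support prefix assignment:

```bash
kubectl set env daemonset aws-node -n kube-system \
  ENABLE_PREFIX_DELEGATION=true \
  WARM_PREFIX_TARGET=1
```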
VPC CNI Custom Networking
What It Solves
Use separate subnets for pod IPs instead of node subnets.
Node subnet ≠ pod subnet.
When It's Used
- node subnet IP exhausted
- want pod IP segmentation
- network isolation
- multi-subnet strategy
Tradeoff
More routing complexity. Harder troubleshooting. Must design route tables correctly.
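A rough sketch of the moving parts, with placeholder subnet and security group IDs: flip the custom-networking flag on aws-node, then define one ENIConfig per availability zone.

```bash
# Tell the CNI to take pod ENI config from per-AZ ENIConfig objects
kubectl set env daemonset aws-node -n kube-system \
  AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG=true \
  ENI_CONFIG_LABEL_DEF=topology.kubernetes.io/zone

# One ENIConfig per AZ, named after the zone label value
kubectl apply -f - <<'EOF'
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: us-east-1a
spec:
  subnet: subnet-0aaa1111bbb2222cc
  securityGroups:
    - sg-0123456789abcdef0
EOF
```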
Calico
What Calico Adds
- strong NetworkPolicy engine
- fine-grained policy control
- can run with AWS CNI (policy-only mode)
- can run full overlay mode
When to Choose Calico
- strict microsegmentation needed
- fintech compliance network isolation
- zero-trust east-west controls
Tradeoff
More operational complexity.
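For example, a standard Kubernetes NetworkPolicy (which Calico in policy mode enforces) for a hypothetical payments namespace: default-deny ingress plus one explicit allow.

```bash
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: payments
spec:
  podSelector: {}
  policyTypes: ["Ingress"]
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-gateway-to-payments
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: payments-api
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: api-gateway
      ports:
        - protocol: TCP
          port: 8080
EOF
```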
Weave Net
Characteristics
- overlay network
- simple setup
- not AWS-native
- more latency
- less used in EKS now
Interview answer: rarely chosen today for EKS.
Flannel
Characteristics
- simple overlay CNI
- basic networking
- no strong policy engine
- good for small clusters
- not common in EKS production
Interview Decision Matrix: Say This
- Default production EKS → AWS VPC CNI + Prefix Delegation
- Need network policy → AWS CNI + Calico policy mode
- Extreme pod density → AWS CNI + Prefix Delegation + custom networking
- Hybrid / on-prem style cluster → Calico full mode
- Avoid for EKS prod → Weave / Flannel (unless special case)
Interview Trap: Network Policy Support
The AWS CNI alone historically lacked NetworkPolicy enforcement, so Calico was needed. Newer AWS-native network policy support now exists in the VPC CNI, but Calico's policy engine is still richer.
Mention this nuance = senior signal.
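For completeness, a hedged sketch of the AWS-native option: network policy enforcement is enabled through the managed VPC CNI add-on configuration (cluster name is a placeholder; verify the configuration key against the current add-on schema).

```bash
aws eks update-addon \
  --cluster-name prod-cluster \
  --addon-name vpc-cni \
  --configuration-values '{"enableNetworkPolicy": "true"}' \
  --resolve-conflicts PRESERVE
```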
Senior One-Shot Summary Answer
If asked:
"What networking model would you choose for a large fintech EKS?"
Answer:
I'd use the AWS VPC CNI with prefix delegation for scale and native VPC routing, add Calico for network policy enforcement, and use custom networking only if subnet IP pressure requires separation. I avoid overlay CNIs like Weave and Flannel for production EKS due to performance and operability tradeoffs.