DEVOPS INTERVIEW PREP GUIDE
Cluster Architecture & Core Concepts
- Explain the full Kubernetes control plane architecture and the request flow from kubectl to pod creation.
- What happens internally when you create a Deployment?
- Difference between Deployment, StatefulSet, and DaemonSet, with production use cases.
- When should you use a StatefulSet over a Deployment, and why not always?
- How does kube-scheduler make scheduling decisions?
- What are scheduler predicates and priorities (or scheduling framework plugins)?
- How does kube-controller-manager work? Name key controllers.
- What happens if kube-controller-manager goes down?
- How does etcd store data, and why does quorum matter?
- How do you design an HA control plane?
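For the etcd quorum question above, the arithmetic can be sketched as follows. This is a minimal illustration of majority quorum, not etcd code; the function names are hypothetical.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    if members < 1:
        raise ValueError("cluster needs at least one member")
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can be lost while a majority is still reachable."""
    return members - quorum(members)


# Columns: cluster size, quorum, failures tolerated.
for n in (1, 2, 3, 4, 5):
    print(n, quorum(n), fault_tolerance(n))
```

Note that 4 members tolerate the same single failure as 3, which is why odd cluster sizes are the usual recommendation.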
Networking & CNI (Very Frequently Asked)
- How does pod-to-pod communication work across nodes?
- What is CNI, and what breaks if the CNI plugin fails?
- Difference between ClusterIP, NodePort, and LoadBalancer in real usage.
- How does kube-proxy work (iptables vs IPVS modes)?
- What is a headless Service, and when is it used?
- How does DNS resolution work inside the cluster?
- How would you debug a pod that cannot reach another pod?
- NetworkPolicy: how it is enforced, and common mistakes.
- Difference between Ingress and Gateway API.
- How does TLS termination work with an Ingress controller?
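For the in-cluster DNS question above, here is a rough sketch of how a short name expands through the pod's resolver search path. It assumes the default ClusterFirst DNS policy, `ndots:5`, the `cluster.local` cluster domain, and glibc-style ordering; real pods usually also inherit the node's own search domains, which are omitted here. The function name is hypothetical.

```python
def candidate_names(name, namespace, cluster_domain="cluster.local", ndots=5):
    """Return the fully qualified names a resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as absolute: no search expansion.
        return [name]
    search = [
        f"{namespace}.svc.{cluster_domain}",
        f"svc.{cluster_domain}",
        cluster_domain,
    ]
    expanded = [f"{name}.{suffix}." for suffix in search]
    as_is = name + "."
    if name.count(".") >= ndots:
        # Enough dots: try the name as given first, then the search list.
        return [as_is] + expanded
    # Too few dots: walk the search list first, then try the name as given.
    return expanded + [as_is]


print(candidate_names("db", "prod"))
print(candidate_names("example.com", "prod"))
```

Under these assumptions, an external name such as `example.com` is tried against three cluster suffixes before it is queried as-is, which is a common source of extra DNS load.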
Scheduling, Resources & Scaling
- Difference between requests and limits, and their real impact on scheduling.
- What happens if limits are not defined?
- What is OOMKilled, and how do you prevent it?
- How does the HPA actually calculate scaling decisions?
- Metrics Server vs Prometheus for HPA: what is the difference?
- Difference between HPA, VPA, and Cluster Autoscaler.
- When the HPA fails to scale: debugging steps.
- PodDisruptionBudget: a real production use case.
- Taints and tolerations: when have you used them?
- Node affinity vs pod affinity vs anti-affinity: real scenario usage.
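For the HPA calculation question above, the core formula from the Kubernetes documentation is desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The sketch below is a simplified illustration only: it assumes the default 10% tolerance and ignores unready pods, missing metrics, and stabilization windows. The function and parameter names are hypothetical.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified HPA replica calculation for a single metric."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: no scaling action.
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(4, 90, 60))   # 4 pods at 90% CPU against a 60% target
print(desired_replicas(4, 63, 60))   # within tolerance, stays at 4
```

With 4 pods averaging 90% against a 60% target, the ratio is 1.5 and the result is 6 replicas.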
Deployments & Release Strategies
- Rolling update: which parameters control its behavior?
- How do maxUnavailable and maxSurge affect a rollout?
- How to implement Blue-Green in Kubernetes?
- How to implement Canary in Kubernetes?
- How to roll back a bad deployment safely?
- How does a readiness probe affect a rollout?
- Liveness vs readiness vs startup probe: failure impact.
- How to achieve zero-downtime deploys?
- What breaks zero downtime deploys most often?
- How do you manage config changes without rebuilding the image?
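For the maxSurge and maxUnavailable question above, the sketch below shows how percentage values turn into pod counts during a rolling update: maxSurge rounds up and maxUnavailable rounds down, and both default to 25%. It is an illustration, not the Deployment controller's code; the function names are hypothetical, and it assumes the two values are not both zero.

```python
def resolve(value, replicas, round_up):
    """Turn an int or an 'N%' string into an absolute pod count."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = replicas * int(value[:-1])
        # Integer arithmetic: ceiling for surge, floor for unavailable.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(replicas, max_surge="25%", max_unavailable="25%"):
    """Return (max total pods, min available pods) during the rollout."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    return replicas + surge, replicas - unavailable


print(rollout_bounds(10))                                  # (13, 8)
print(rollout_bounds(3, max_surge=1, max_unavailable=0))   # (4, 3)
```

Setting maxUnavailable to 0 with a positive maxSurge keeps full capacity throughout the rollout, at the cost of temporarily running extra pods.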
Storage
- PV vs PVC vs StorageClass: full lifecycle.
- Static vs dynamic provisioning.
- How volume binding works.
- When a PVC stays Pending: root causes.
- Stateful app storage best practices.
- RWX vs RWO: production implications.
Security & Access Control
- RBAC: how do you design least-privilege roles?
- Difference between Role and ClusterRole, with an example.
- ServiceAccount: how it is used by pods.
- How Secrets are stored, and why base64 is not encryption.
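For the base64 question above, a short demonstration of why the encoding offers no confidentiality: anyone who can read the encoded value can reverse it without a key. The helper names and the sample value are hypothetical.

```python
import base64


def encode_secret_value(plaintext: str) -> str:
    """Encode a value the way it appears under `data:` in a Secret manifest."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Reverse the encoding; no key or password is involved."""
    return base64.b64decode(encoded).decode("utf-8")


encoded = encode_secret_value("admin")
print(encoded)                        # YWRtaW4=
print(decode_secret_value(encoded))   # admin
```

Protecting the value therefore depends on other controls, such as RBAC on Secret objects and encryption at rest for etcd.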
Bonus Scenario Questions (Interview Killers)
- Pod stuck in Pending: walk me through debugging.
- Traffic suddenly drops after a deploy: what do you check?
- One node shows high CPU but its pods look fine: why?
- Cluster Autoscaler is not scaling: why?
- How to upgrade an EKS cluster safely?
- How to rotate certificates in a cluster?
- How to debug CrashLoopBackOff step by step?
- How to reduce Kubernetes cost?
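For the CrashLoopBackOff question above, it helps to know the restart timing. The Kubernetes documentation describes an exponential back-off (10s, 20s, 40s, and so on) capped at 5 minutes. The sketch below only reproduces that schedule; it assumes those documented defaults, which may differ by version or kubelet configuration, and the function name is hypothetical.

```python
def backoff_delays(restarts, base_seconds=10, cap_seconds=300):
    """Delay in seconds before each of the first `restarts` restarts."""
    return [min(cap_seconds, base_seconds * 2 ** i) for i in range(restarts)]


print(backoff_delays(7))  # [10, 20, 40, 80, 160, 300, 300]
```

Once the cap is reached, a crashing container is retried roughly every 5 minutes, so a fix may take that long to show up unless the pod is recreated.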
Brutal Self-Test Rule
Don't just "know answers".
You should be able to say:
- what you did
- what broke
- what you fixed
- what you learned