Introduction

DEVOPS INTERVIEW PREP GUIDE


🧠 Cluster Architecture & Core Concepts

  1. Explain the full Kubernetes control plane architecture and the request flow from kubectl to pod creation.
  2. What happens internally when you create a Deployment?
  3. Difference between Deployment, StatefulSet, and DaemonSet, with production use cases.
  4. When should you use a StatefulSet over a Deployment, and why not always?
  5. How does kube-scheduler make scheduling decisions?
  6. What are scheduler predicates and priorities (or scheduling framework plugins)?
  7. How does kube-controller-manager work? Name key controllers.
  8. What happens if kube-controller-manager goes down?
  9. How does etcd store data, and why does quorum matter?
  10. How do you design an HA control plane?
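
For the etcd quorum question, the arithmetic is worth having at your fingertips. The sketch below is a minimal illustration, assuming the usual Raft majority rule (quorum is more than half the members); the function names are hypothetical and not part of any etcd API.

```python
def quorum(members: int) -> int:
    """Smallest number of members that forms a majority."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can fail while a majority still exists."""
    return members - quorum(members)


if __name__ == "__main__":
    # Compare common cluster sizes side by side.
    for n in (1, 2, 3, 4, 5):
        print(f"members={n} quorum={quorum(n)} tolerated_failures={fault_tolerance(n)}")
```

Note that going from 3 to 4 members raises the quorum without tolerating any extra failure, which is why odd cluster sizes are the usual recommendation.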

🌐 Networking & CNI (Very Frequently Asked)

  1. How does pod-to-pod communication work across nodes?
  2. What is CNI, and what breaks if the CNI plugin fails?
  3. Difference between ClusterIP, NodePort, and LoadBalancer in real usage.
  4. How does kube-proxy work (iptables vs IPVS modes)?
  5. What is a headless Service, and when is it used?
  6. How does DNS resolution work inside the cluster?
  7. How would you debug a pod that cannot reach another pod?
  8. NetworkPolicy: how it is enforced, and common mistakes.
  9. Difference between Ingress and Gateway API.
  10. How does TLS termination work with an Ingress controller?

βš™οΈ Scheduling, Resources & Scaling

  1. Difference between requests and limits, and their real impact on scheduling.
  2. What happens if limits are not defined?
  3. What is OOMKilled, and how do you prevent it?
  4. How does HPA actually calculate scaling decisions?
  5. Metrics Server vs Prometheus for HPA: what is the difference?
  6. Difference between HPA, VPA, and Cluster Autoscaler.
  7. When HPA fails to scale: debugging steps.
  8. PodDisruptionBudget: a real production use case.
  9. Taints and tolerations: when have you used them?
  10. Node affinity vs pod affinity vs anti-affinity: real scenario usage.
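
For the HPA calculation question, the core formula documented for the HorizontalPodAutoscaler is desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The sketch below is a simplified illustration only: the function and parameter names are hypothetical, the 0.1 tolerance is the documented default, and it ignores min/max replica bounds, unready pods, missing metrics, and stabilization windows.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA replica calculation for a single metric."""
    ratio = current_metric / target_metric
    # Within the tolerance band the autoscaler leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    # 4 pods averaging 200m CPU against a 100m target: scale out.
    print(desired_replicas(4, 200, 100))
    # 4 pods averaging 105m CPU against a 100m target: inside tolerance, no change.
    print(desired_replicas(4, 105, 100))
```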

🚀 Deployments & Release Strategies

  1. Rolling update: which parameters control its behavior?
  2. How do maxUnavailable and maxSurge affect a rollout?
  3. How do you implement Blue-Green in Kubernetes?
  4. How do you implement Canary in Kubernetes?
  5. How do you roll back a bad deployment safely?
  6. How does a readiness probe affect a rollout?
  7. Liveness vs readiness vs startup probe: failure impact.
  8. How do you achieve zero-downtime deploys?
  9. What breaks zero-downtime deploys most often?
  10. How do you manage config changes without an image rebuild?
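
For the maxUnavailable and maxSurge question, it helps to work the numbers for a concrete Deployment. The sketch below is a minimal illustration, assuming percentage values and the documented rounding (maxSurge rounds up, maxUnavailable rounds down); the function name is hypothetical and edge cases such as both values resolving to zero are not handled.

```python
def rollout_bounds(replicas: int,
                   max_surge_pct: int,
                   max_unavailable_pct: int) -> tuple[int, int]:
    """Return (max total pods, min available pods) during a rolling update."""
    # Integer arithmetic avoids float rounding surprises.
    surge = -(-replicas * max_surge_pct // 100)           # ceiling division
    unavailable = replicas * max_unavailable_pct // 100   # floor division
    return replicas + surge, replicas - unavailable


if __name__ == "__main__":
    max_total, min_available = rollout_bounds(10, 25, 25)
    print(f"max pods during rollout: {max_total}, min available: {min_available}")
```

With 10 replicas and both settings at 25%, the rollout may run up to 13 pods and must keep at least 8 available.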

💾 Storage

  1. PV vs PVC vs StorageClass: full lifecycle.
  2. Static vs dynamic provisioning.
  3. How volume binding works.
  4. When a PVC stays Pending: root causes.
  5. Stateful app storage best practices.
  6. RWX vs RWO: production implications.

πŸ” Security & Access Control

  1. RBAC: how do you design least-privilege roles?
  2. Difference between Role and ClusterRole, with an example.
  3. ServiceAccount: how it is used by pods.
  4. How Secrets are stored, and why base64 is not encryption.
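
For the base64 question, a short demonstration makes the point better than a definition. The sketch below is a minimal illustration using only the Python standard library; the secret value is made up. It shows that anyone who can read the encoded string can recover the original with no key.

```python
import base64

# Hypothetical secret value, as it might be written before encoding.
plaintext = b"s3cr3t-password"

# Secret manifests carry values base64-encoded in the data field.
encoded = base64.b64encode(plaintext).decode("ascii")

# Decoding requires no key, so this is an encoding, not encryption.
decoded = base64.b64decode(encoded)

if __name__ == "__main__":
    print("encoded:", encoded)
    print("decoded:", decoded.decode("ascii"))
```

Real protection comes from controls such as encryption at rest for etcd and tight RBAC on Secret objects, not from the encoding itself.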

💥 Bonus Scenario Questions (Interview Killers)

  • Pod stuck in Pending: walk me through debugging.
  • Traffic suddenly drops after a deploy: what do you check?
  • One node shows high CPU but its pods look fine: why?
  • Cluster Autoscaler is not scaling: why?
  • How do you upgrade an EKS cluster safely?
  • How do you rotate certificates in a cluster?
  • How do you debug CrashLoopBackOff step by step?
  • How do you reduce Kubernetes cost?

⚠️ Brutal Self-Test Rule

Don’t just “know the answers”.

You should be able to say:

  • what you did
  • what broke
  • what you fixed
  • what you learned
