DevOps Networking: 20 Interview Questions (Cloud + K8s + Production)
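Q1: Explain the full request flow from a user to an app in Kubernetes behind an ALB.
User hits the domain → DNS resolves to the ALB → an ALB listener rule picks a target group → traffic goes to a node's NodePort (instance targets) or straight to the pod IP (IP targets) → Ingress controller and/or Service → pod container port. TLS may terminate at the ALB or at the ingress controller. Every hop must be allowed by security groups, and by NetworkPolicy inside the cluster. Most failures happen at the security group or health check layer.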
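Q2: What is the difference between ALB and NLB in real use?
ALB is Layer 7: it supports HTTP host/path routing rules and TLS offload. NLB is Layer 4: TCP/UDP, lower latency, and a static IP per AZ. Use ALB for web apps and NLB for non-HTTP TCP/UDP services, or when you need static IPs or the client source IP preserved at the packet level (ALB only passes it in the X-Forwarded-For header). gRPC works with either, since ALB supports gRPC target groups.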
Q3: What is east-west vs north-south traffic?
North-south is traffic between external clients and the cluster or its services. East-west is internal service-to-service traffic. North-south is controlled at the edge (load balancers, security groups); east-west is handled by cluster networking (Services, NetworkPolicy) or a service mesh. Each direction needs its own security rules.
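A small way to make the distinction concrete is to classify a flow by whether both ends sit inside internal ranges. This is a sketch under assumptions: the CIDRs below are hypothetical placeholders, and real clusters often have more ranges (pod overlay CIDRs, peered VPCs) to include.

```python
import ipaddress

# Hypothetical internal ranges; substitute your own VPC and Service CIDRs.
INTERNAL_CIDRS = [
    ipaddress.ip_network("10.0.0.0/16"),    # VPC range (pod IPs with the VPC CNI)
    ipaddress.ip_network("172.20.0.0/16"),  # Service ClusterIP range
]


def is_internal(ip: str) -> bool:
    """True if the address falls inside any internal CIDR."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in INTERNAL_CIDRS)


def classify_flow(src: str, dst: str) -> str:
    """East-west if both ends are internal, otherwise north-south."""
    if is_internal(src) and is_internal(dst):
        return "east-west"
    return "north-south"


print(classify_flow("10.0.1.15", "172.20.4.10"))  # east-west (pod -> Service)
print(classify_flow("203.0.113.7", "10.0.1.15"))  # north-south (client -> pod)
```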
Q4: What breaks if DNS fails inside Kubernetes?
Service discovery fails: pods cannot resolve Service names, so apps fail even while the pods themselves are healthy. A CoreDNS outage causes cascading failures across every service that calls another by name. That is why CoreDNS needs multiple replicas (HA), spread across nodes.
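To see why so much depends on CoreDNS, it helps to look at the names a pod actually queries. The sketch below assumes the usual pod resolv.conf for ClusterFirst pods (search list `<namespace>.svc.cluster.local svc.cluster.local cluster.local`, `ndots:5`), ignores any node-level search domains, and models glibc-style ordering; other resolvers (e.g. musl) differ in details.

```python
def candidate_queries(name: str, namespace: str = "default",
                      cluster_domain: str = "cluster.local",
                      ndots: int = 5) -> list[str]:
    """Order in which a glibc-style resolver tries names from inside a pod."""
    if name.endswith("."):
        return [name]  # fully qualified: no search list
    search = [f"{namespace}.svc.{cluster_domain}",
              f"svc.{cluster_domain}",
              cluster_domain]
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = f"{name}."
    # Names with at least `ndots` dots are tried as-is first.
    if name.count(".") >= ndots:
        return [absolute] + expanded
    return expanded + [absolute]


for q in candidate_queries("orders", namespace="shop"):
    print(q)
# orders.shop.svc.cluster.local.
# orders.svc.cluster.local.
# orders.cluster.local.
# orders.
```

Under these assumptions, even an external name like `api.example.com` (two dots, fewer than five) walks the three search domains before the absolute query. Every one of those lookups goes through CoreDNS.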
Q5: How does Kubernetes Service routing work technically?
A Service gets a virtual ClusterIP. kube-proxy on every node programs iptables or IPVS rules for it. Traffic sent to ClusterIP:port is DNATed to one of the Service's ready pod endpoints. It is NAT-based routing done on each node, not a real proxy hop.
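In iptables mode, kube-proxy spreads traffic by chaining one rule per endpoint using the `statistic` match in random mode. Rule i of n matches with probability 1/(n - i), and the last rule always matches. The simulation below is only a model of that rule chain, not kube-proxy code. IPVS mode uses its own schedulers instead, round-robin by default.

```python
import random
from collections import Counter


def iptables_probabilities(n: int) -> list[float]:
    """Match probability of each per-endpoint rule, as kube-proxy chains them."""
    # Rule i matches with probability 1/(n - i); the last one is 1/1, i.e. always.
    return [1.0 / (n - i) for i in range(n)]


def pick_endpoint(endpoints: list[str], rng: random.Random) -> str:
    """Walk the rules in order, like a packet traversing the Service chain."""
    for endpoint, p in zip(endpoints, iptables_probabilities(len(endpoints))):
        if rng.random() < p:
            return endpoint
    return endpoints[-1]  # not reached: the last probability is 1.0


endpoints = ["10.0.1.5:8080", "10.0.2.7:8080", "10.0.3.9:8080"]
rng = random.Random(42)
counts = Counter(pick_endpoint(endpoints, rng) for _ in range(30_000))
print(iptables_probabilities(3))  # [0.3333333333333333, 0.5, 1.0]
print(counts)                      # roughly 10,000 hits per endpoint
```

The chained probabilities work out evenly: the second endpoint is chosen with (2/3)(1/2) = 1/3, and the third with (2/3)(1/2)(1) = 1/3.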
Q6: NodePort vs LoadBalancer vs Ingress: when do you use which?
NodePort exposes the Service on the same port (30000-32767 by default) on every node; mostly for internal use or testing. LoadBalancer creates one cloud load balancer per Service. Ingress routes many services through a single load balancer, which makes it the preferred option for HTTP apps.
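Q7: What is SNAT and where do you see it in Kubernetes/cloud?
SNAT (source NAT) rewrites the source IP of outbound traffic. It happens when pods reach the internet through their node's IP, and again at a NAT gateway. In EKS with the VPC CNI, traffic leaving the VPC is SNATed to the node's primary IP by default; setting AWS_VPC_K8S_CNI_EXTERNALSNAT=true leaves the translation to the NAT gateway instead. It affects logging and firewall rules, because the destination sees the node or NAT IP, not the pod IP.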
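A toy translation table shows why the remote side only ever sees the node IP. In this sketch, outbound packets get the node IP and a fresh port, and replies are mapped back through the stored entry. It is a conceptual model under simplifying assumptions: real conntrack/NAT tries to keep the original source port and also tracks protocol and timeouts. All IPs are hypothetical.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class SnatTable:
    """Toy SNAT: rewrite pod source IP:port to node IP:new port, remember it."""

    def __init__(self, node_ip: str, first_port: int = 32768):
        self.node_ip = node_ip
        self._ports = itertools.count(first_port)  # simplified port allocation
        self._out = {}   # (pod_ip, pod_port, dst_ip, dst_port) -> node_port
        self._back = {}  # (node_port, dst_ip, dst_port) -> (pod_ip, pod_port)

    def outbound(self, pkt: Packet) -> Packet:
        key = (pkt.src_ip, pkt.src_port, pkt.dst_ip, pkt.dst_port)
        if key not in self._out:
            port = next(self._ports)
            self._out[key] = port
            self._back[(port, pkt.dst_ip, pkt.dst_port)] = (pkt.src_ip, pkt.src_port)
        return Packet(self.node_ip, self._out[key], pkt.dst_ip, pkt.dst_port)

    def inbound(self, reply: Packet) -> Packet:
        """Reply arrives addressed to node_ip:node_port; restore the pod address."""
        pod_ip, pod_port = self._back[(reply.dst_port, reply.src_ip, reply.src_port)]
        return Packet(reply.src_ip, reply.src_port, pod_ip, pod_port)


nat = SnatTable(node_ip="10.0.1.20")
out = nat.outbound(Packet("10.0.1.57", 44321, "203.0.113.10", 443))
print(out)   # Packet(src_ip='10.0.1.20', src_port=32768, dst_ip='203.0.113.10', dst_port=443)
back = nat.inbound(Packet("203.0.113.10", 443, "10.0.1.20", out.src_port))
print(back)  # Packet(src_ip='203.0.113.10', src_port=443, dst_ip='10.0.1.57', dst_port=44321)
```

The server at 203.0.113.10 logs 10.0.1.20, which is why pod-level firewall allowlists and logs break once SNAT is involved.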
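Q8: How do security groups interact with EKS pod traffic?
By default, security groups apply at the node ENI level, so all pods on a node share the node's security groups. With the Security Groups for Pods feature, a security group can be attached to a pod's own (branch) ENI. That allows pod-level network control, which is useful for isolating database access.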
Q9: Why do readiness probes affect load balancer routing?
If a readiness probe fails, the pod is removed from the Service's ready endpoints, and the load balancer stops sending it traffic. Without a readiness probe, a pod counts as ready as soon as its containers start, so traffic may hit pods that cannot serve yet. That causes partial outages.
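The effect can be reduced to a filter: only ready pods become routable endpoints. This is a simplified model of what the endpoints controller and kube-proxy do together; real EndpointSlices keep unready pods listed with a `ready: false` condition rather than dropping them. Pod names and IPs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    ip: str
    ready: bool  # result of the latest readiness probe


def service_endpoints(pods: list[Pod], port: int) -> list[str]:
    """Only ready pods receive traffic; unready pods are skipped."""
    return [f"{p.ip}:{port}" for p in pods if p.ready]


pods = [
    Pod("web-1", "10.0.1.5", ready=True),
    Pod("web-2", "10.0.2.7", ready=False),  # failing readiness: gets no traffic
    Pod("web-3", "10.0.3.9", ready=True),
]
print(service_endpoints(pods, 8080))  # ['10.0.1.5:8080', '10.0.3.9:8080']
```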
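Q10: What is MTU and why can it break containers?
MTU is the maximum packet size a link can carry without fragmentation. Overlay networks add encapsulation headers, which reduce the MTU available to pods. If MTUs are mismatched, packets get fragmented or dropped, which shows up as strange connection hangs and resets, often only for large payloads. It is common when VPNs are combined with Kubernetes overlays.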
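The arithmetic behind a mismatch is plain subtraction. Each encapsulation layer eats header bytes out of the link MTU. The overhead figures below assume IPv4 outer headers. VXLAN = 50 covers the inner Ethernet header plus outer IPv4, UDP and VXLAN. IP-in-IP = 20. WireGuard = 60 is used as an example VPN; other VPNs differ.

```python
# Per-layer overhead in bytes, assuming IPv4 outer headers.
OVERHEAD = {
    "vxlan": 14 + 20 + 8 + 8,  # inner Ethernet + outer IPv4 + UDP + VXLAN = 50
    "ipip": 20,                # one extra outer IPv4 header
    "wireguard": 20 + 8 + 32,  # outer IPv4 + UDP + WireGuard header/tag = 60
}


def inner_mtu(link_mtu: int, *encaps: str) -> int:
    """Largest inner IP packet that still fits after stacking the encapsulations."""
    return link_mtu - sum(OVERHEAD[e] for e in encaps)


def fits(packet_size: int, link_mtu: int, *encaps: str) -> bool:
    """False means the packet must be fragmented, or is dropped if DF is set."""
    return packet_size <= inner_mtu(link_mtu, *encaps)


print(inner_mtu(1500, "vxlan"))               # 1450
print(inner_mtu(1500, "wireguard", "vxlan"))  # 1390: VPN + overlay
print(fits(1500, 1500, "vxlan"))              # False: pod MTU left at 1500
```

If the pod interface still advertises 1500 while only 1450 fits, small requests work and large responses stall. This matches the "weird resets" symptom.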
Q11: How do you debug a pod that cannot reach an external API?
Exec into the pod → curl the API → check DNS resolution → check NetworkPolicy egress rules → check the NAT gateway → check the route table. Work from inside outward. Never guess; test the path.
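The "inside outward" order can be scripted as a layered check that stops at the first failing layer: resolve first, then open a TCP connection. This is a sketch that assumes the pod image has Python; otherwise `nslookup`, `curl` and `nc` cover the same layers. The host in the usage line is hypothetical. A timeout usually points at a silent drop (NetworkPolicy, security group, NAT, routes). "Connection refused" means something answered.

```python
import socket


def check_path(host: str, port: int, timeout: float = 3.0) -> str:
    """Test each layer in order and report the first one that fails."""
    try:
        addrs = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return f"dns: cannot resolve {host} ({exc})"
    family, socktype, proto, _, sockaddr = addrs[0]
    try:
        with socket.socket(family, socktype, proto) as s:
            s.settimeout(timeout)
            s.connect(sockaddr)
    except socket.timeout:
        # Packets silently dropped somewhere on the path.
        return f"tcp: timeout to {sockaddr[0]}:{port} (check NetworkPolicy, SG, NAT, routes)"
    except OSError as exc:
        # e.g. connection refused: the host answered, but nothing listens.
        return f"tcp: {exc} ({sockaddr[0]}:{port})"
    return f"ok: {host} -> {sockaddr[0]}:{port} reachable"


if __name__ == "__main__":
    print(check_path("api.example.com", 443))  # hypothetical external API
```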
Q12: What is the difference between a proxy and a reverse proxy?
A (forward) proxy acts on behalf of clients for their outbound requests. A reverse proxy sits in front of servers and routes inbound traffic to them. Ingress controllers are reverse proxies.
Q13: What problem is a service mesh solving in networking?
A service mesh handles east-west traffic control: retries, mTLS, routing, and observability. It moves networking logic out of application code and into proxies that run alongside the app. Examples: Istio, Linkerd.
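Retries are a good example of the logic a mesh lifts out of app code. Without a mesh, each service ends up carrying something like the helper below. This is an illustrative Python version only: it is not how Istio or Envoy implement retries, and the parameters (attempts, base delay, which errors to retry) are assumptions. In a mesh they become proxy configuration instead.

```python
import random
import time


def call_with_retries(call, attempts=3, base_delay=0.1,
                      retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Retry `call` on transient errors with exponential backoff and full jitter."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Wait a random time in [0, base_delay * 2^(attempt-1)].
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
```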
Q14: How does mTLS work in a service mesh?
Both client and server present certificates, and each side verifies the other's against the mesh's certificate authority. Traffic is encrypted and identity-verified in both directions, which prevents spoofing inside the cluster. The mesh control plane issues and rotates certificates automatically.
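What the mesh proxies configure for you boils down to two TLS contexts. The server requires a client certificate, and the client verifies the server while presenting its own certificate. The sketch uses Python's standard `ssl` module. The file paths are hypothetical stand-ins for mesh-issued certs. Meshes typically encode workload identity as SPIFFE IDs rather than the DNS hostnames that `check_hostname` verifies here.

```python
import ssl
from typing import Optional


def server_context(cert_file: Optional[str] = None, key_file: Optional[str] = None,
                   ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Server side: present our cert and REQUIRE a client cert signed by the CA."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH, cafile=ca_file)
    ctx.verify_mode = ssl.CERT_REQUIRED  # handshake fails without a valid client cert
    if cert_file:
        ctx.load_cert_chain(cert_file, key_file)
    return ctx


def client_context(cert_file: Optional[str] = None, key_file: Optional[str] = None,
                   ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Client side: verify the server against the CA and present our own cert."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    if cert_file:
        ctx.load_cert_chain(cert_file, key_file)
    return ctx
```

In use, both sides would pass the same mesh CA, e.g. `server_context("/certs/svc.pem", "/certs/svc.key", "/certs/ca.pem")` (hypothetical paths). The rotation mentioned above amounts to the control plane replacing those files before they expire.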
Q15: Why do we use private subnets for nodes?
Nodes don't need public IPs. Keeping them private reduces the attack surface. Outbound access goes through a NAT gateway, and only the load balancers are public.
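Q16: What is the role of a NAT gateway in cloud networking?
It lets resources in private subnets open outbound connections to the internet. Connections initiated from the internet cannot come in through it. It is needed for pulling images, installing updates, and calling external APIs, unless VPC endpoints cover those services.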
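Whether a packet from a private subnet reaches the NAT gateway is decided by the subnet's route table, using longest-prefix match. The sketch assumes a typical private-subnet table: a local VPC route plus a default route to a NAT gateway. The CIDR and gateway ID are hypothetical placeholders.

```python
import ipaddress

# Hypothetical private-subnet route table: destination CIDR -> target.
PRIVATE_ROUTES = {
    "10.0.0.0/16": "local",                # traffic inside the VPC
    "0.0.0.0/0": "nat-0123456789abcdef0",  # everything else: outbound via NAT
}


def next_hop(routes: dict[str, str], dst: str) -> str:
    """Pick the most specific (longest-prefix) route that contains dst."""
    addr = ipaddress.ip_address(dst)
    matches = [ipaddress.ip_network(cidr) for cidr in routes
               if addr in ipaddress.ip_network(cidr)]
    if not matches:
        return "no route (dropped)"
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[str(best)]


print(next_hop(PRIVATE_ROUTES, "10.0.3.4"))      # local
print(next_hop(PRIVATE_ROUTES, "203.0.113.10"))  # nat-0123456789abcdef0
```

A missing `0.0.0.0/0` entry is a common reason for "image pull timeout" on private nodes. This is also the "check the route table" step from Q11.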
Q17: Why can health checks pass while the app still fails?
The health check endpoint may be shallow: it only checks that the process is alive, not that its dependencies are healthy. A real readiness check should test the critical dependencies.
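A readiness handler that actually checks dependencies can aggregate individual checks into one status code. This is a framework-agnostic sketch: the dependency names and check functions are hypothetical, and a real version should give each check a short timeout. Liveness probes usually stay shallow, since failing them restarts the container.

```python
from typing import Callable


def readiness(checks: dict[str, Callable[[], bool]]) -> tuple[int, dict[str, str]]:
    """Return (HTTP status, per-dependency detail) for a readiness endpoint."""
    detail = {}
    for name, check in checks.items():
        try:
            detail[name] = "ok" if check() else "fail"
        except Exception as exc:  # a check that raises counts as failed
            detail[name] = f"error: {exc}"
    status = 200 if all(v == "ok" for v in detail.values()) else 503
    return status, detail


# Hypothetical checks: the process is up, but the database is unreachable.
print(readiness({"process": lambda: True, "database": lambda: False}))
# (503, {'process': 'ok', 'database': 'fail'})
```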
Q18: What is connection draining / deregistration delay?
When a target is deregistered, the load balancer stops sending it new requests but waits (up to the configured delay) before removing it, so in-flight requests can finish. This prevents dropped requests during rollouts and is important for zero-downtime deploys.
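The draining behavior can be modeled as a target that refuses new requests once deregistration starts, then waits up to the delay for in-flight requests. This is a toy model under that assumption, not load balancer code. On AWS target groups the delay is the `deregistration_delay.timeout_seconds` attribute, 300 seconds by default.

```python
import threading


class DrainingTarget:
    """Toy model of a target behind a load balancer during deregistration."""

    def __init__(self):
        self._lock = threading.Lock()
        self._idle = threading.Condition(self._lock)
        self._in_flight = 0
        self.draining = False

    def try_start_request(self) -> bool:
        with self._lock:
            if self.draining:
                return False  # LB routes new requests to other targets
            self._in_flight += 1
            return True

    def finish_request(self) -> None:
        with self._lock:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._idle.notify_all()

    def deregister(self, delay_seconds: float) -> int:
        """Stop new requests, wait up to delay_seconds for in-flight ones.

        Returns how many requests were still running when the wait ended
        (those would be cut off)."""
        with self._lock:
            self.draining = True
            self._idle.wait_for(lambda: self._in_flight == 0, timeout=delay_seconds)
            return self._in_flight
```

If the delay is shorter than your slowest requests, `deregister` returns a non-zero count. That is the "dropped requests during rollout" case.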
Q19: Why does DNS TTL matter in failover?
TTL controls how long resolvers cache a record. A high TTL delays failover, because clients keep using the old address until their cached answer expires. A low TTL increases DNS query load. TTL tuning is part of DR planning.
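The trade-off can be put into numbers with a rough back-of-the-envelope model. Worst-case failover is the health-check detection time plus one full TTL. Query load is roughly one query per caching resolver per TTL. The check values (30 s interval, 3 failures) and the 10,000-resolver figure are illustrative assumptions. Clients that ignore TTL make real failover slower.

```python
def worst_case_failover_s(check_interval_s: float, failure_threshold: int,
                          ttl_s: float) -> float:
    """Time until every caching client follows the DNS change."""
    detection = check_interval_s * failure_threshold  # health check marks endpoint down
    return detection + ttl_s                          # then old cached answers expire


def queries_per_second(caching_resolvers: int, ttl_s: float) -> float:
    """Each resolver re-asks roughly once per TTL while the name stays in use."""
    return caching_resolvers / ttl_s


for ttl in (300, 60):
    print(ttl, worst_case_failover_s(30, 3, ttl), round(queries_per_second(10_000, ttl), 1))
# 300 390 33.3
# 60 150 166.7
```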
Q20: What is the most common real-world networking misconfiguration in DevOps?
A security group or NetworkPolicy silently dropping traffic. People debug the app layer first, but the packet never arrives. Always verify the network path early.