ELK – ahmadrazalab

✅ Q1 — Explain the ELK data flow in a real production setup.

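Logs are generated by applications or nodes → shipped by agents such as Filebeat or Fluent Bit → sent to Logstash (or directly to Elasticsearch) → parsed and enriched → stored in Elasticsearch indices → searched and visualized in Kibana. In Kubernetes, the shipper usually runs as a DaemonSet, so one agent on every node collects that node's container logs. At scale, a buffering layer such as Kafka is added between the shippers and Logstash to absorb traffic spikes and downstream slowdowns.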
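To make the stage boundaries concrete, here is a toy, in-memory Python sketch of the flow above. It is not how any of these tools work internally: the function names, the default host name, and the single "logs-app" index are hypothetical, and each function only plays the role of one component.

```python
import json
from datetime import datetime, timezone

# Toy model of the ELK stages. Every name here is hypothetical; each function
# only plays the role of the component named in its docstring.

def ship(lines, host="node-1"):
    """Shipper role (Filebeat/Fluent Bit): wrap each raw line with metadata."""
    for line in lines:
        yield {"message": line, "host": host}

def parse_and_enrich(events):
    """Logstash role: parse JSON messages, tag failures, add an ingest time."""
    for event in events:
        try:
            parsed = json.loads(event["message"])
        except json.JSONDecodeError:
            parsed = None
        if isinstance(parsed, dict):
            event.update(parsed)
        else:
            event["tags"] = ["_jsonparsefailure"]  # tag name used by Logstash's json filter
        event["ingested_at"] = datetime.now(timezone.utc).isoformat()
        yield event

def index(events, store, index_name="logs-app"):
    """Elasticsearch role: append documents to a named 'index' (a list here)."""
    for event in events:
        store.setdefault(index_name, []).append(event)

def search(store, index_name="logs-app", **filters):
    """Kibana role: return documents whose fields equal every filter value."""
    return [doc for doc in store.get(index_name, [])
            if all(doc.get(k) == v for k, v in filters.items())]

store = {}
raw = ['{"level": "error", "msg": "db timeout"}', "plain text line"]
index(parse_and_enrich(ship(raw)), store)
print(len(search(store, level="error")))  # 1
```

The generators pass events one at a time; putting a queue between ship() and parse_and_enrich() is, loosely, the role Kafka plays at scale.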

✅ Q2 — When do you use Logstash vs sending Beats directly to Elasticsearch?

Use Beats → Elasticsearch directly when logs are already structured and simple. Use Logstash when you need heavy parsing, enrichment, grok filters, or routing logic. Logstash adds flexibility but also latency and resource cost.

✅ Q3 — Log ingestion delay is high — how do you debug?

Check the shipper's backlog and CPU usage first. Then check Logstash pipeline throughput and queue size. Look for filter bottlenecks such as heavy grok patterns. Also check Elasticsearch indexing pressure: slow indexing pushes back on Logstash and backs up the whole pipeline.
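For the filter-bottleneck step, Logstash's monitoring API (GET /_node/stats/pipelines, on port 9600 by default) reports per-plugin event counts and time spent. The sketch below ranks filters by milliseconds per event. It assumes the response has the shape shown in the sample dict; the plugin IDs and numbers are made up.

```python
# Sketch: rank Logstash filter plugins by average processing cost per event.
# The input mimics the shape of a Logstash node stats response; values are invented.

def slowest_filters(stats, pipeline="main"):
    """Return (plugin_id, plugin_name, ms_per_event) tuples, slowest first."""
    filters = stats["pipelines"][pipeline]["plugins"]["filters"]
    ranked = []
    for plugin in filters:
        events = plugin.get("events", {})
        out = events.get("out", 0)
        if out == 0:
            continue  # plugin has not emitted any events yet
        ms_per_event = events.get("duration_in_millis", 0) / out
        ranked.append((plugin["id"], plugin["name"], ms_per_event))
    return sorted(ranked, key=lambda r: r[2], reverse=True)

sample = {
    "pipelines": {
        "main": {
            "plugins": {
                "filters": [
                    {"id": "parse_access", "name": "grok",
                     "events": {"in": 10000, "out": 10000, "duration_in_millis": 9000}},
                    {"id": "add_geo", "name": "geoip",
                     "events": {"in": 10000, "out": 10000, "duration_in_millis": 1500}},
                    {"id": "set_ts", "name": "date",
                     "events": {"in": 10000, "out": 10000, "duration_in_millis": 300}},
                ]
            }
        }
    }
}

for plugin_id, name, cost in slowest_filters(sample):
    print(f"{plugin_id} ({name}): {cost:.3f} ms/event")
```

With these invented numbers the grok filter costs 0.900 ms per event, six times the next filter, which is where tuning would start.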

✅ Q4 — What is a grok filter, and why is it risky at scale?

Grok parses unstructured logs by matching them against named regex patterns. It is powerful but CPU-expensive: complex patterns, especially ones that fail to match and backtrack, can choke Logstash on high-volume logs. Prefer structured JSON logs whenever possible.
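To show what grok does under the hood, here is a Python stand-in: grok patterns compile to regular expressions with named captures, so every event costs a regex match, while a structured log costs a single JSON decode. The regex is a simplified, hypothetical equivalent of the grok pattern in the comment, not the exact expansion Logstash uses.

```python
import json
import re

# Simplified stand-in for a grok pattern such as
#   %{IP:client} %{WORD:method} %{NOTSPACE:path} %{NUMBER:status}
ACCESS_RE = re.compile(
    r"^(?P<client>\d{1,3}(?:\.\d{1,3}){3}) "
    r"(?P<method>[A-Z]+) "
    r"(?P<path>\S+) "
    r"(?P<status>\d{3})$"
)

def parse_with_regex(line):
    """Grok-style parse: one regex match per event, tagged on failure."""
    match = ACCESS_RE.match(line)
    if match is None:
        return {"message": line, "tags": ["_grokparsefailure"]}
    fields = match.groupdict()
    fields["status"] = int(fields["status"])  # captures are strings unless converted
    return fields

def parse_json(line):
    """Structured log: one JSON decode; field names and types come from the app."""
    return json.loads(line)

unstructured = "10.0.0.7 GET /api/orders 503"
structured = '{"client": "10.0.0.7", "method": "GET", "path": "/api/orders", "status": 503}'

print(parse_with_regex(unstructured) == parse_json(structured))  # True
print(parse_with_regex("GET /api/orders failed"))  # tagged as a parse failure
```

Both paths yield the same fields, but only the regex path can silently fail when the log format drifts.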

✅ Q5 — Elasticsearch cluster is yellow — what does it mean?

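Yellow means all primary shards are allocated but at least one replica shard is not. Data is available but not fully redundant. It is usually caused by too few nodes (a replica is never placed on the same node as its primary) or by shard allocation rules. It is a warning, not an outage, but risky: losing the node that holds an unreplicated primary turns the cluster red.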
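The "too few nodes" case is simple arithmetic: Elasticsearch never puts two copies of the same shard on one node, so every shard needs 1 + number_of_replicas data nodes. This minimal sketch models only that rule; allocation filters, disk watermarks, and awareness settings, which can also leave replicas unassigned, are deliberately ignored.

```python
# Sketch: predict index health from the same-node rule alone.

def expected_health(data_nodes, number_of_replicas):
    """Health an index would reach if only the same-node rule applied."""
    if data_nodes < 1:
        return "red"  # nowhere to allocate even the primaries
    copies_per_shard = 1 + number_of_replicas  # one primary plus its replicas
    return "green" if copies_per_shard <= data_nodes else "yellow"

def unassigned_replicas(data_nodes, primaries, number_of_replicas):
    """Replica copies left unassigned because no other node is free for them."""
    placeable = max(0, min(number_of_replicas, data_nodes - 1))
    return primaries * (number_of_replicas - placeable)

print(expected_health(1, 1))         # yellow: the classic single-node cluster
print(unassigned_replicas(1, 3, 1))  # 3
print(expected_health(3, 1))         # green
print(unassigned_replicas(2, 3, 2))  # 3: only one replica per shard fits on 2 nodes
```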

✅ Q6 — Cluster is red — immediate meaning?

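Red means at least one primary shard is unassigned, so the data in that shard is unavailable. Searches against the affected indices return partial results or fail, and writes to those shards fail. I check shard allocation (for example with the cluster allocation explain API), node health, and disk space immediately. This is incident-level.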
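A first-response check can be scripted against the cluster health API. This minimal sketch assumes the GET /_cluster/health JSON has already been fetched into a dict; the field names (status, unassigned_shards) come from that API, while the messages and the sample values are illustrative.

```python
# Sketch: turn a cluster health response into a first triage message.

def triage(health):
    """Map a cluster health response dict to a first-response message."""
    status = health["status"]
    unassigned = health.get("unassigned_shards", 0)
    if status == "red":
        return (f"INCIDENT: at least one primary is unassigned "
                f"({unassigned} unassigned shards in total). "
                "Run GET /_cluster/allocation/explain, check node health and disk usage.")
    if status == "yellow":
        # in yellow every primary is allocated, so all unassigned shards are replicas
        return f"WARNING: {unassigned} replica shard(s) unassigned, redundancy reduced."
    return "OK: all primary and replica shards are allocated."

sample = {"status": "red", "number_of_data_nodes": 2, "unassigned_shards": 4}
print(triage(sample))
```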

✅ Q7 — Elasticsearch disk is filling fast — what controls retention?

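Index Lifecycle Management (ILM) policies control rollover and deletion. You define phases such as hot → warm → delete, each entered once an index reaches a given age. Without ILM (or another cleanup job), indices accumulate until the disk fills. Retention must be enforced automatically.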
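An ILM policy implementing hot → warm → delete could look like the body below, built as a Python dict and printed as the JSON you would send with PUT _ilm/policy/<name>. The policy name and every threshold are assumptions for illustration, not recommendations; max_primary_shard_size requires Elasticsearch 7.13 or later.

```python
import json

# Sketch of an ILM policy body; name and thresholds are illustrative only.
policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    # start a new backing index when either limit is reached
                    "rollover": {"max_primary_shard_size": "50gb", "max_age": "1d"}
                }
            },
            "warm": {
                "min_age": "7d",  # counted from rollover when rollover is used
                # on clusters with data tiers, ILM also moves the index to warm nodes here
                "actions": {"forcemerge": {"max_num_segments": 1}},
            },
            "delete": {
                "min_age": "30d",
                "actions": {"delete": {}},
            },
        }
    }
}

print("PUT _ilm/policy/logs-retention-30d")  # hypothetical policy name
print(json.dumps(policy, indent=2))
```

The policy only takes effect once an index template attaches it through the index.lifecycle.name setting.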

✅ Q8 — Too many small indices — why is it bad?

Each index and shard has overhead in heap memory and file handles. Many small indices waste resources and slow down cluster state updates. It is better to use a rollover strategy with a controlled shard count.
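The overhead adds up quickly when every service gets its own daily index. This back-of-the-envelope sketch counts shards (primaries plus replicas) for two hypothetical layouts; the service count, retention period, and shard settings are assumptions.

```python
# Sketch: total shards held by a time-based index layout (inputs are hypothetical).

def total_shards(indices_per_day, retention_days, primaries, replicas):
    """Every index contributes primaries * (1 + replicas) shard copies."""
    return indices_per_day * retention_days * primaries * (1 + replicas)

# 40 services, one daily index each, kept 30 days, 1 primary + 1 replica
per_service_daily = total_shards(40, 30, 1, 1)
# the same data in one shared index rolled over daily, with 4 primaries
shared_rollover = total_shards(1, 30, 4, 1)

print(per_service_daily, shared_rollover)  # 2400 240
```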

✅ Q9 — What is the best practice for shard sizing?

Avoid very small or very large shards. A common target is ~20–50 GB per shard, depending on the workload. Oversharding wastes heap and adds cluster-state overhead. Undersharding limits parallelism and makes each shard slow to recover or relocate.
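One simple way to apply the size target when creating an index is to divide the expected index size by the target shard size and round up. The 40 GB target below is an assumption picked from within the range above; with ILM rollover, max_primary_shard_size achieves the same goal without predicting the final size.

```python
import math

# Sketch: choose a primary shard count so each shard lands near a target size.

def primaries_for(index_size_gb, target_shard_gb=40):
    """At least one primary; otherwise round up so no shard exceeds the target."""
    return max(1, math.ceil(index_size_gb / target_shard_gb))

print(primaries_for(300))  # 8 -> about 37.5 GB per shard
print(primaries_for(5))    # 1 -> one small shard is fine
```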

✅ Q10 — Logs missing in Kibana but present on server — checks?

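Check the shipper logs first (Filebeat/Fluent Bit). Verify the output destination and index name. Then check Logstash for pipeline errors. Finally, query Elasticsearch directly with a wildcard index pattern and no time filter, to see whether the documents landed in an unexpected index or carry a timestamp outside the Kibana time range.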
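For the last step, a search that spans all candidate indices without a time filter shows where a document actually ended up. The sketch builds such a request body and summarizes the _index and @timestamp of each hit. The index pattern, the message text, and the sample response (whose 1970 timestamp mimics epoch seconds parsed as milliseconds) are hypothetical.

```python
import json

# Sketch: locate a known log line across indices (all names are hypothetical).

def find_line_body(text, size=5):
    """Search body for an exact phrase in `message`, with no time filter."""
    return {
        "size": size,
        "query": {"match_phrase": {"message": text}},
        "_source": ["@timestamp", "message"],
    }

def where_did_it_land(response):
    """List (index, @timestamp) for each hit of a _search response."""
    return [(hit["_index"], hit["_source"].get("@timestamp"))
            for hit in response["hits"]["hits"]]

print("GET logs-*/_search")  # hypothetical wildcard index pattern
print(json.dumps(find_line_body("payment failed for order"), indent=2))

sample_response = {  # invented response for illustration
    "hits": {"hits": [{
        "_index": "logs-app-1970.01.20",
        "_source": {"@timestamp": "1970-01-20T16:13:20Z",
                    "message": "payment failed for order"},
    }]}
}
print(where_did_it_land(sample_response))
```

A hit with a decades-old timestamp or an unexpected index name explains why the log never appears in the Kibana view.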

✅ Q11 — How do you debug a crashing Logstash pipeline?

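Check the Logstash logs and the pipeline config syntax. Run config test mode (bin/logstash --config.test_and_exit) before deploying. Look for grok failures or mapping conflicts. Enable the dead letter queue if needed, so events that Elasticsearch rejects (for example, because of a mapping conflict) are kept for inspection instead of being dropped.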

✅ Q12 — What is a mapping conflict in Elasticsearch?

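The same field has different data types across documents. Example: status is a number in one log and a string such as "OK" in another. Once the field is mapped as a number, documents whose value cannot be parsed as a number are rejected; across indices matched by one index pattern, Kibana flags the field as a conflict. Fix it with index templates that pin the mapping and a consistent log structure.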
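Conflicts can be caught before indexing by comparing value types across sample documents. This sketch checks top-level fields only and is stricter than Elasticsearch, which coerces some values (for example, numbers into a keyword field); the sample documents are hypothetical.

```python
from collections import defaultdict

# Sketch: report top-level fields whose JSON value type differs across documents.

def type_conflicts(docs):
    seen = defaultdict(set)
    for doc in docs:
        for field, value in doc.items():
            if value is None:
                continue  # nulls do not affect the mapping
            seen[field].add(type(value).__name__)
    return {field: sorted(types) for field, types in seen.items() if len(types) > 1}

docs = [
    {"status": 200, "path": "/health"},
    {"status": "OK", "path": "/orders"},
]
print(type_conflicts(docs))  # {'status': ['int', 'str']}
```

Here the fix would be an index template that maps status as keyword, which accepts both values.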

✅ Q13 — Structured vs unstructured logs — which is better?

Structured JSON logs are far better: no grok needed, faster parsing, and fewer errors. Querying is easier and more reliable because fields already have names and types. Modern best practice is structured logging at the application level.
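At the application level, structured logging can be done with Python's standard logging module and a small custom formatter that writes one JSON object per line. The field names and the extra={"fields": ...} convention are assumptions for illustration; many teams align field names with a shared schema such as Elastic Common Schema (ECS).

```python
import json
import logging
import sys
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record):
        payload = {
            "@timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # structured fields passed as extra={"fields": {...}} (convention assumed here)
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
log = logging.getLogger("orders")  # hypothetical logger name
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("order created", extra={"fields": {"order_id": 42, "status": 201}})
```

Each line can then be decoded by the shipper or Logstash with a plain JSON parser.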

✅ Q14 — How do you scale an Elasticsearch cluster?

Add data nodes for storage and query load. Run dedicated master-eligible nodes (typically three) for cluster stability. Use dedicated ingest nodes if you run heavy ingest pipelines. Scale based on shard count, indexing rate, and query pressure.

✅ Q15 — Master node overloaded — symptoms?

Cluster state updates are slow, shard allocation is delayed, and master elections are unstable. Kibana may report cluster errors or time out. Fix it by running dedicated master nodes with enough memory and low heap pressure.

✅ Q16 — Heap sizing rule for Elasticsearch nodes?

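Set the heap to about 50% of RAM, but keep it below ~32 GB. Above that threshold the JVM can no longer use compressed object pointers (compressed OOPs), so object references grow from 4 to 8 bytes and much of the extra heap is wasted. Leave the rest of the memory to the OS: Elasticsearch relies heavily on the filesystem cache.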
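The rule fits in a few lines. This sketch takes half of RAM and caps it at 31 GB, a common safety margin below the compressed-OOPs cutoff (the exact cutoff depends on the JVM, and the Elasticsearch startup log reports whether compressed ordinary object pointers are in use). Xms and Xmx are set to the same value, as Elasticsearch recommends.

```python
# Sketch of the heap rule: half of RAM, capped below the compressed-OOPs cutoff.

COMPRESSED_OOPS_CAP_GB = 31  # safety margin; verify against the startup log

def heap_gb(ram_gb):
    return min(ram_gb // 2, COMPRESSED_OOPS_CAP_GB)

for ram in (16, 64, 128):
    heap = heap_gb(ram)
    print(f"{ram} GB RAM -> -Xms{heap}g -Xmx{heap}g, ~{ram - heap} GB left for the OS cache")
```

On the 128 GB node, most memory deliberately goes to the filesystem cache rather than the heap.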

✅ Q17 — How do you secure the ELK stack?

Enable TLS between all components. Use authentication and role-based access control. Restrict permissions at the index level. Never expose Elasticsearch directly to the public internet.

✅ Q18 — High query latency in Kibana — where to look?

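Check Elasticsearch query time (for example, with the search slow log or the Profile API) and the number of shards each query hits. Slow queries often fan out across too many shards. Narrow the index pattern and time filter, and use ILM rollover so shards stay at a sensible size.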

✅ Q19 — What is a hot-warm architecture?

Hot nodes hold recent, frequently queried data on fast disks (typically SSDs). Warm nodes hold older, less frequently queried data on cheaper storage. ILM moves indices between tiers automatically as they age. This saves significant cost at scale.

✅ Q20 — When do you choose ELK vs Loki?

ELK indexes the full text of every log, which gives powerful search but is heavy and costly. Loki indexes only labels and stores compressed log chunks, so it is cheaper and simpler for Kubernetes logs. Choose ELK when you need deep search and analytics; choose Loki for lightweight cluster logging.
