ELK Stack: Elasticsearch, Logstash, Kibana, and Beats
Complete guide to the ELK Stack for log aggregation and analysis. Learn Elasticsearch indexing, Logstash pipelines, Kibana visualizations, and Beats shippers.
ELK Stack Deep Dive: Elasticsearch, Logstash, Kibana, and Beats
The ELK Stack is a popular open-source solution for centralized logging. It lets you collect logs from multiple sources, transform them into a structured format, store them efficiently, and query them interactively.
This guide covers each component in depth. If you are new to logging concepts, start with our Logging Best Practices guide first.
ELK Stack Architecture
graph LR
A[Log Sources] -->|Shippers| B[Beats]
B --> C[Logstash]
C --> D[Elasticsearch]
D --> E[Kibana]
A -->|Direct| D
The ELK Stack has four main components:
- Beats: Lightweight shippers that collect data from various sources
- Logstash: Transforms and enriches data during transit
- Elasticsearch: Stores and indexes data for fast search
- Kibana: Visualizes and explores data
Elasticsearch
Elasticsearch is a distributed search and analytics engine built on Apache Lucene. It stores documents in JSON format and provides powerful query capabilities.
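Elasticsearch's full-text speed comes from Lucene's inverted index: a map from each term to the documents that contain it, so a term lookup never scans every document. A toy Python sketch of the idea (illustrative only, nothing like Lucene's actual on-disk format):

```python
from collections import defaultdict

# Three tiny "documents" keyed by id.
docs = {
    1: "connection timeout to payment service",
    2: "payment declined",
    3: "connection reset by peer",
}

# Build a toy inverted index: term -> set of matching document ids.
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        inverted[term].add(doc_id)

# A term lookup is now a dictionary access, not a scan over all documents.
print(sorted(inverted["connection"]))  # [1, 3]
print(sorted(inverted["payment"]))     # [1, 2]
```

Lucene adds analysis (tokenization, lowercasing), positions, and compressed on-disk segments on top of this basic structure, but the lookup principle is the same.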
Core Concepts
| Concept | Description |
|---|---|
| Index | Collection of documents, similar to a database |
| Document | A single JSON record, similar to a row |
| Shard | A partition of an index for horizontal scaling |
| Replica | A copy of a shard for high availability |
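A document's home shard is chosen by hashing its routing value (the `_id` by default) modulo the primary shard count, which is why the number of primary shards is fixed at index creation. A rough sketch of the scheme (Elasticsearch actually uses murmur3; MD5 stands in here purely for illustration):

```python
import hashlib

def route_to_shard(routing_key: str, number_of_shards: int) -> int:
    # Hash the routing value and map it onto a primary shard.
    # Elasticsearch uses murmur3; MD5 is only a stand-in for the idea.
    digest = hashlib.md5(routing_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % number_of_shards

# The same id always routes to the same shard; changing the shard
# count would re-home documents, which is why it requires a reindex.
shard = route_to_shard("log-0001", 3)
print(shard)
```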
Index Lifecycle Management
Define policies to manage index data from creation to deletion:
PUT _ilm/policy/logs-policy
{
"policy": {
"phases": {
"hot": {
"actions": {
"rollover": {
"max_age": "7d",
"max_primary_shard_size": "50gb"
},
"set_priority": 100
}
},
"warm": {
"min_age": "7d",
"actions": {
"shrink": {
"number_of_shards": 1
},
"forcemerge": {
"max_num_segments": 1
},
"set_priority": 50
}
},
"cold": {
"min_age": "30d",
"actions": {
"freeze": {},
"set_priority": 0
}
},
"delete": {
"min_age": "365d",
"actions": {
"delete": {}
}
}
}
}
}
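Once the policy is attached to an index, it is worth verifying which phase each index is actually in; the ILM explain API reports the current phase, action, and any errors per index:

```
GET logs-*/_ilm/explain
```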
Mapping and Index Templates
Index templates define mappings and settings for new indices:
PUT _index_template/logs-template
{
"index_patterns": ["logs-*"],
"template": {
"settings": {
"number_of_shards": 3,
"number_of_replicas": 1,
"index.lifecycle.name": "logs-policy"
},
"mappings": {
"properties": {
"@timestamp": {
"type": "date"
},
"level": {
"type": "keyword"
},
"message": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"service": {
"type": "keyword"
},
"trace_id": {
"type": "keyword"
},
"user_id": {
"type": "keyword"
},
"duration_ms": {
"type": "long"
},
"host": {
"properties": {
"name": { "type": "keyword" },
"ip": { "type": "ip" }
}
}
}
}
}
}
Querying Elasticsearch
GET logs-2026.03.22/_search
{
"query": {
"bool": {
"must": [
{ "match": { "service": "api-gateway" } },
{ "range": { "@timestamp": { "gte": "now-1h" } } }
],
"filter": [
{ "term": { "level": "ERROR" } }
]
}
},
"sort": [
{ "@timestamp": "desc" }
],
"aggs": {
"error_by_service": {
"terms": { "field": "service" },
"aggs": {
"error_rate": {
"avg": { "field": "error_count" }
}
}
}
}
}
Logstash
Logstash processes and transforms data before it reaches Elasticsearch. It handles complex parsing, enrichment, and filtering.
Logstash Pipeline
graph TB
A[Input] --> B[Filter]
B --> C[Output]
A pipeline has three sections: input, filter, and output.
Input Plugins
# Receive logs from Beats
input {
beats {
port => 5044
ssl => true
ssl_certificate => "/etc/ssl/certs/logstash.crt"
ssl_key => "/etc/ssl/private/logstash.key"
}
# Alternative: direct HTTP
http {
port => 8080
content_type => "application/json"
}
}
Filter Plugins
Filters transform and enrich data:
filter {
# Parse JSON logs
json {
source => "message"
target => "parsed"
}
# Parse timestamp
date {
match => ["[parsed][timestamp]", "ISO8601"]
target => "@timestamp"
}
# Extract fields from message
grok {
match => {
"[parsed][message]" => "%{LOGLEVEL:level}\s+%{DATA:logger}\s+%{GREEDYDATA:log_message}"
}
overwrite => ["message"]
}
# Add computed fields
mutate {
add_field => {
"environment" => "%{[parsed][env]}"
"[@metadata][index_prefix]" => "logs-%{[parsed][service]}"
}
}
# Enrich with GeoIP
geoip {
source => "[parsed][client_ip]"
target => "[parsed][geoip]"
database => "/etc/logstash/GeoLite2-City.mmdb"
}
# Parse query string
kv {
source => "[parsed][request_params]"
field_split => "&"
prefix => "param_"
}
}
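Grok patterns are essentially named regular expressions: each `%{PATTERN:field}` capture becomes a named group on the resulting event. A rough Python equivalent of a three-field capture (hypothetical log line; `\S+` stands in for grok's lazy `DATA` pattern to keep the example unambiguous):

```python
import re

# Named groups play the role of grok's %{PATTERN:field} captures.
GROK_LIKE = re.compile(r"(?P<level>\S+)\s+(?P<logger>\S+)\s+(?P<log_message>.*)")

m = GROK_LIKE.match("ERROR PaymentService card declined for order 42")
assert m is not None
print(m.groupdict())
# {'level': 'ERROR', 'logger': 'PaymentService', 'log_message': 'card declined for order 42'}
```

Grok's advantage over raw regex is its library of prebuilt, named patterns (`IPORHOST`, `HTTPDATE`, `LOGLEVEL`, and so on), which the access-log example below uses heavily.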
Output Plugins
output {
elasticsearch {
hosts => ["https://elasticsearch:9200"]
manage_template => false
index => "%{[@metadata][index_prefix]}-%{+YYYY.MM.dd}"
ssl => true
cacert => "/etc/ssl/certs/ca.crt"
user => "${ELASTICSEARCH_USER}"
password => "${ELASTICSEARCH_PASSWORD}"
}
# Also send to stdout for debugging
stdout {
codec => rubydebug
}
}
Complete Pipeline Example
input {
beats {
port => 5044
}
}
filter {
if [fields][log_type] == "application" {
json {
source => "message"
target => "parsed"
}
date {
match => ["[parsed][timestamp]", "ISO8601"]
target => "@timestamp"
}
if [parsed][level] {
mutate {
add_field => { "level" => "%{[parsed][level]}" }
}
}
if [parsed][exception] {
mutate {
add_tag => ["error"]
}
}
}
if [fields][log_type] == "access" {
grok {
match => {
"message" => '%{IPORHOST:client_ip} %{DATA:ident} %{DATA:auth} \[%{HTTPDATE:timestamp}\] "%{WORD:method} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:status:int} %{NUMBER:bytes:int} "%{DATA:referrer}" "%{DATA:user_agent}"'
}
}
date {
match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
target => "@timestamp"
}
geoip {
source => "client_ip"
target => "geoip"
}
useragent {
source => "user_agent"
target => "ua"
}
}
}
output {
if "error" in [tags] {
elasticsearch {
hosts => ["https://elasticsearch:9200"]
index => "logs-error-%{+YYYY.MM.dd}"
}
} else {
elasticsearch {
hosts => ["https://elasticsearch:9200"]
index => "logs-%{[fields][log_type]}-%{+YYYY.MM.dd}"
}
}
}
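Before deploying a pipeline like this, Logstash can syntax-check the configuration without starting ingestion:

```
# Validate pipeline syntax, then exit (also available as -t)
bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/pipeline.conf
```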
Beats
Beats are lightweight data shippers that send data from servers to Logstash or Elasticsearch.
Filebeat
Filebeat tails log files and ships them:
# filebeat.yml
filebeat.inputs:
- type: log
enabled: true
paths:
- /var/log/containers/*.log
json:
keys_under_root: true
add_error_key: true
message_key: log
fields:
log_type: container
processors:
- add_kubernetes_metadata:
host: ${NODE_NAME}
matchers:
- logs_path:
logs_path: "/var/log/containers/"
- type: log
enabled: true
paths:
- /var/log/nginx/*.log
fields:
log_type: nginx
processors:
- add_locale: ~
processors:
- add_host_metadata:
when.not.contains.tags: forwarded
- add_cloud_metadata: ~
- add_docker_metadata: ~
output.logstash:
hosts: ["logstash:5044"]
ssl.enabled: true
ssl.certificate_authorities: ["/etc/filebeat/ca.crt"]
ssl.certificate: "/etc/filebeat/filebeat.crt"
ssl.key: "/etc/filebeat/filebeat.key"
logging.level: info
logging.to_files: true
logging.files:
path: /var/log/filebeat
name: filebeat
keepfiles: 7
permissions: 0644
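Filebeat ships with built-in self-tests that are worth running after any change to this file:

```
# Verify the config parses, then verify connectivity to the configured output
filebeat test config -c /etc/filebeat/filebeat.yml
filebeat test output -c /etc/filebeat/filebeat.yml
```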
Metricbeat
Metricbeat collects system and service metrics:
metricbeat.modules:
- module: system
metricsets:
- cpu
- memory
- network
- process
- diskio
period: 10s
processes: [".*"]
- module: docker
metricsets:
- container
- cpu
- diskio
- healthcheck
- info
- memory
- network
hosts: ["unix:///var/run/docker.sock"]
period: 10s
- module: nginx
metricsets:
- stubstatus
hosts: ["http://nginx:8080/nginx_status"]
period: 10s
output.elasticsearch:
hosts: ["https://elasticsearch:9200"]
ssl.enabled: true
ssl.certificate_authorities: ["/etc/metricbeat/ca.crt"]
Heartbeat
Heartbeat monitors service availability with synthetic checks:
heartbeat.monitors:
- type: http
name: api-health-check
schedule: "@every 30s"
urls:
- https://api.example.com/health
check.response:
status: 200
fields:
service: api-gateway
- type: tcp
name: redis-connectivity
schedule: "@every 60s"
hosts: ["redis:6379"]
timeout: 5s
- type: icmp
name: host-ping
schedule: "@every 5m"
hosts: ["elasticsearch"]
output.elasticsearch:
hosts: ["https://elasticsearch:9200"]
Kibana
Kibana provides the visualization and exploration interface for your Elasticsearch data.
Index Pattern Setup
Before exploring data, create an index pattern in Kibana:
- Navigate to Management > Stack Management > Index Patterns
- Click “Create index pattern”
- Enter `logs-*` as the pattern
- Select `@timestamp` as the time field
Building Visualizations
Error Rate Over Time
{
"title": "Error Rate",
"type": "line",
"params": {
"type": "line",
"grid": { "categoryLines": false },
"categoryAxes": [
{
"id": "CategoryAxis-1",
"type": "category",
"position": "bottom"
}
],
"valueAxes": [
{
"id": "ValueAxis-1",
"name": "LeftAxis-1",
"type": "value",
"position": "left",
"scale": {
"type": "linear",
"mode": "normal"
}
}
]
},
"aggs": [
{
"id": "1",
"type": "avg",
"schema": "metric",
"params": {
"field": "error_rate"
}
},
{
"id": "2",
"type": "date_histogram",
"schema": "segment",
"params": {
"field": "@timestamp",
"interval": "auto"
}
}
]
}
Service Error Distribution
{
"title": "Errors by Service",
"type": "pie",
"aggs": [
{
"id": "1",
"type": "count",
"schema": "metric"
},
{
"id": "2",
"type": "terms",
"schema": "segment",
"params": {
"field": "service.keyword",
"size": 10
}
}
]
}
Kibana Discover
Discover provides ad-hoc search and exploration:
// Sample Discover query
{
"query": {
"bool": {
"must": [
{ "match": { "level": "ERROR" } },
{ "range": { "@timestamp": { "gte": "now-24h" } } }
]
}
},
"sort": [{ "@timestamp": "desc" }],
"fields": ["@timestamp", "level", "message", "service", "trace_id"],
"filter": [
{
"meta": {
"index": "logs-*",
"negate": false,
"params": {},
"type": "phrase"
},
"query": {
"match_phrase": {
"service": "api-gateway"
}
}
}
]
}
Kibana Dashboard Example
A complete dashboard might include:
- Time series of log volume by level
- Pie chart of error distribution by service
- Table of recent errors with context
- Heat map of errors over time by host
- Metric visualization of error rate and latency percentiles
Deployment Considerations
Hardware Requirements
| Component | CPU | RAM | Disk |
|---|---|---|---|
| Elasticsearch (per node) | 4+ cores | 8GB+ | SSD, 500GB+ |
| Logstash | 2+ cores | 4GB+ | Minimal |
| Kibana | 2 cores | 2GB+ | Minimal |
| Beats | 1 core | 512MB+ | Minimal |
Elasticsearch is I/O intensive. Use SSDs and ensure adequate disk throughput.
Security
# Enable security in elasticsearch.yml
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.http.ssl.enabled: true
# API key authentication
xpack.security.api.key.enabled: true
# Role-based access control: file-based roles are defined in
# $ES_PATH_CONF/roles.yml; most deployments manage roles via the
# security API or the Kibana UI instead
Scaling
Scale Elasticsearch horizontally by adding nodes. The cluster automatically rebalances shards.
# Elasticsearch 6.x and earlier: minimum master nodes for cluster stability
discovery.zen.minimum_master_nodes: 2 # for a 3-node cluster
# Elasticsearch 7+ removed this setting; quorum is managed automatically,
# and you list the initial masters once at cluster bootstrap:
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]
# Adjust shard allocation
PUT _cluster/settings
{
"transient": {
"cluster.routing.allocation.enable": "all",
"cluster.routing.allocation.cluster_concurrent_rebalance": 2
}
}
When to Use the ELK Stack
Use the ELK Stack when:
- You need centralized logging from multiple services and environments
- You need full-text search across log entries and application data
- You need log analysis and pattern detection with Kibana
- You need security analytics and threat detection
- You need compliance audit logging and archival
- You need infrastructure log aggregation (syslog, nginx, apache)
Don’t use the ELK Stack when:
- You have simple applications with minimal logging needs
- You only need metrics and dashboards (use Prometheus + Grafana instead)
- You have high-volume streaming use cases (Kafka is better suited)
- You need real-time alerting on log data (use dedicated alerting tools)
- You need large-scale time-series metrics (Elasticsearch is not optimized for pure metrics)
ELK Stack vs Alternatives
| Aspect | ELK Stack | Loki | Splunk |
|---|---|---|---|
| Cost | Open source (self-hosted) | Open source (self-hosted) | Commercial (expensive) |
| Storage efficiency | Medium (indexed) | High (log-structured) | Medium |
| Query language | Query DSL, KQL (Kibana) | LogQL (Prometheus-style) | SPL |
| Scalability | Excellent (horizontal) | Excellent | Excellent |
| Ease of setup | Moderate | Easy | Easy |
| Full-text search | Excellent | Limited | Excellent |
| Metrics integration | Via Metricbeat | Native Prometheus | Native |
| Best for | Complex log analysis, security analytics | High-volume Kubernetes logs | Enterprise compliance, security |
Production Failure Scenarios
| Failure | Impact | Mitigation |
|---|---|---|
| Elasticsearch cluster red/yellow | Logs not indexing; search degraded | Monitor cluster health; provision more shards; adjust replica settings |
| Logstash pipeline errors | Logs stuck in queue; processing backlog | Monitor pipeline errors; implement dead-letter queues; alert on queue depth |
| Hot tier disk saturation | New indices cannot be created; ingestion fails | Monitor disk usage; implement ILM rollover; add nodes |
| Kibana performance degradation | Slow searches; dashboards timeout | Optimize queries; use filter context; limit time ranges |
| Beats shipper failure | Logs not forwarded; blind spots in coverage | Monitor Beats health; implement local buffering; alert on forward failures |
| Index template mismatch | Fields not indexed correctly; search failures | Version index templates; validate mappings; test before deployment |
Observability Checklist
Infrastructure Monitoring
- Elasticsearch cluster health (green/yellow/red)
- Primary shard and replica distribution
- Index count and size per index
- Node resource utilization (CPU, heap, disk)
- Search and indexing latency percentiles
- JVM heap usage and GC frequency
- Segment count and merge queue depth
Log Pipeline Monitoring
- Beats shipper metrics (bytes sent, errors, lag)
- Logstash pipeline throughput and latency
- Logstash queue depth and worker utilization
- Dead-letter queue size and age
- Log parsing error rate
Kibana Monitoring
- Search response time (p95, p99)
- Dashboard load time
- Visualization render time
- Active users and session count
Data Management
- Index count within expected bounds
- Document count growth rate
- Disk usage trend and forecasting
- ILM policy execution success/failure
- Archive tier accessibility
Security Checklist
- Elasticsearch security enabled (XPack Security)
- User authentication configured (LDAP, SAML, or built-in)
- Role-based access control for indices and spaces
- TLS encryption for all network traffic
- API keys rotated regularly
- Kibana spaces isolation (dev/staging/prod separation)
- Audit logging enabled for security events
- No sensitive data in index names or field names
- Snapshot repositories secured and access logged
- Cross-cluster search secured if used
Common Pitfalls / Anti-Patterns
1. Too Many Indices with Few Documents
Each index has overhead. Too many small indices overwhelm the cluster:
// Bad: Index per day per service creates thousands of indices
PUT logs-service-a-2026.03.22
PUT logs-service-b-2026.03.22
// ... thousands more
// Good: Use rollover with larger time intervals
PUT logs-service-a
{
"aliases": {
"logs-service-a": { "is_write_index": true }
}
}
2. Dynamic Field Mapping Without Controls
Dynamic mapping can create unexpected field types and blow up cardinality:
// Bad: Unrestricted dynamic mapping
{
"mappings": {
"dynamic": "true" // Creates any field
}
}
// Good: Strict dynamic mapping or disabled
{
"mappings": {
"dynamic": "strict",
"properties": {
"@timestamp": { "type": "date" },
"level": { "type": "keyword" },
"message": { "type": "text" }
}
}
}
3. Not Using Filter Context for Simple Queries
Filter context is faster because it does not score:
// Bad: Query context for term filter
{
"query": {
"match": { "level": "ERROR" } // Scores, slower
}
}
// Good: Filter context for exact match
{
"query": {
"bool": {
"filter": [
{ "term": { "level": "ERROR" } } // No scoring, faster
]
}
}
}
4. Ignoring Index Lifecycle Management
Without ILM, indices grow unbounded and performance degrades:
// Good: ILM with hot/warm/cold/delete
{
"policy": {
"phases": {
"hot": {
"min_age": "0ms",
"actions": { "rollover": { "max_age": "7d" } }
},
"warm": { "min_age": "7d", "actions": { "shrink": 1, "forcemerge": 1 } },
"cold": { "min_age": "30d", "actions": { "freeze": {} } },
"delete": { "min_age": "365d", "actions": { "delete": {} } }
}
}
}
5. Loading Too Much Data into Memory
Kibana visualizations on large time ranges cause OOM:
// Bad: Visualize 90 days of minute-level data
{
"query": { "range": { "@timestamp": { "gte": "now-90d" } } }
}
// Good: Use date histogram with appropriate interval
{
"aggs": {
"over_time": {
"date_histogram": {
"field": "@timestamp",
"fixed_interval": "1h" // Or auto with proper configuration
}
}
}
}
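The difference is easy to quantify: the number of date_histogram buckets a panel asks Elasticsearch to build is the time range divided by the interval, per series. A quick sanity check in Python:

```python
def bucket_count(days: float, interval_minutes: float) -> int:
    # date_histogram buckets = time range / bucket interval (per series)
    return int(days * 24 * 60 / interval_minutes)

print(bucket_count(90, 1))   # 129600 buckets at 1-minute resolution
print(bucket_count(90, 60))  # 2160 buckets at 1-hour resolution
```

Multiply that by the cardinality of any terms sub-aggregation and minute-level buckets over long ranges quickly become millions of buckets per request.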
Quick Recap
Key Takeaways:
- Beats collect, Logstash transforms, Elasticsearch stores, Kibana visualizes
- Index lifecycle management prevents unbounded growth
- Use filter context for exact matches; query context only when scoring needed
- Monitor cluster health and pipeline metrics proactively
- Implement security early: authentication, TLS, RBAC
- Design index templates carefully to control field mapping
Copy/Paste Checklist:
# Check cluster health
GET _cluster/health?pretty
# Monitor index size and document count
GET _cat/indices?v&s=store.size:desc
# Check Logstash pipeline status (Logstash monitoring API, default port 9600)
curl -s localhost:9600/_node/stats/pipelines
# ILM policy check
GET _ilm/policy/logs-policy?pretty
# Dead letter queue inspection (the Logstash DLQ lives on disk, not in Elasticsearch)
ls /var/lib/logstash/dead_letter_queue/main
# Index template validation
GET _index_template/logs-template?pretty
# Create a Kibana admin user (note: the API does not expand ${...}; substitute a real secret)
PUT _security/user/kibana_admin
{
"password": "${KIBANA_PASSWORD}",
"roles": ["kibana_admin"]
}
Conclusion
The ELK Stack provides a powerful platform for centralized logging and analysis. Beats collect data efficiently, Logstash transforms it into a structured format, Elasticsearch stores and indexes it, and Kibana makes it explorable.
Start with Filebeat shipping container logs to Elasticsearch, and build from there. Add Logstash for complex parsing, Kibana for visualizations, and ILM policies for efficient data retention.
For monitoring beyond logs, see our Prometheus & Grafana guide for metrics visualization. For distributed tracing, see the Jaeger and Distributed Tracing guides for correlating logs with request traces.