Elasticsearch: Full-Text Search at Scale
Learn how Elasticsearch powers search at scale with inverted indexes, sharding, replicas, and its powerful Query DSL for modern applications.
Elasticsearch is a distributed, RESTful search and analytics engine built on Apache Lucene. It handles billions of events per day, from log analysis to full-text search, and is the backbone of the Elastic Stack. If you need to search, analyze, or visualize data at scale, Elasticsearch is the standard choice for modern search infrastructure.
This post covers the core concepts that make Elasticsearch work: the inverted index, sharding and replication, and the Query DSL. You will learn how to design indices, write efficient queries, and scale your search cluster horizontally.
The Inverted Index: How Search Works Under the Hood
Traditional databases store data row by row. When you search for “Elasticsearch tutorial,” the database scans every row, checking if the text contains your query. This works for small datasets but becomes prohibitively slow as data grows.
Elasticsearch flips this model with an inverted index. Instead of mapping documents to words, it maps words to documents.
{
"inverted_index": {
"elasticsearch": [{ "doc_id": 1, "positions": [0, 3] }],
"tutorial": [{ "doc_id": 1, "positions": [1] }],
"full-text": [{ "doc_id": 2, "positions": [0] }]
}
}
When you search for “elasticsearch tutorial,” Elasticsearch looks up both terms in the inverted index and finds matching documents instantly. No full table scan required.
The index also stores metadata about each term: document frequency, term frequency, positions for phrase queries, and norms for field-length normalization. This metadata is what makes relevance scoring, phrase matching, and fuzzy search possible.
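The structure above can be sketched in a few lines of Python. This is a toy model, not how Lucene stores postings on disk, but it shows the core idea: a term-to-postings map with positions, and an AND query implemented as a set intersection over postings lists.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the documents and positions where it appears."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        postings = defaultdict(list)  # term -> positions within this doc
        for pos, term in enumerate(text.lower().split()):
            postings[term].append(pos)
        for term, positions in postings.items():
            index[term].append({"doc_id": doc_id, "positions": positions})
    return index

def search(index, query):
    """AND query: intersect the doc-id sets of each term's postings list."""
    term_docs = [{p["doc_id"] for p in index.get(t, [])}
                 for t in query.lower().split()]
    return set.intersection(*term_docs) if term_docs else set()

docs = {1: "Elasticsearch tutorial basics Elasticsearch", 2: "full-text search"}
index = build_inverted_index(docs)
```

Looking up `search(index, "elasticsearch tutorial")` touches only two postings lists, no matter how many documents exist, which is the whole point of inverting the mapping.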
Analyzer Pipeline
Before terms enter the inverted index, they pass through an analyzer consisting of three stages:
- Character filters remove HTML tags, convert characters, or apply language-specific normalizations.
- Tokenizer splits text into individual terms (tokens).
- Token filters lowercase terms, remove stop words, apply synonyms, and perform stemming.
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": ["lowercase", "snowball", "asciifolding"]
}
}
}
}
}
Choosing the right analyzer matters. The standard analyzer works fine for most English text. For domain-specific vocabularies, you might need custom analyzers with synonym filters or language-specific stemmers.
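The three-stage pipeline can be mimicked in plain Python. This is a deliberately simplified stand-in for the real analyzers: the regex character filter plays the role of `html_strip`, the split approximates the `standard` tokenizer, and the stop-word set is an arbitrary example.

```python
import re

STOP_WORDS = {"the", "a", "an", "of"}  # illustrative subset only

def analyze(text):
    # 1. Character filter: strip HTML tags (stand-in for html_strip)
    text = re.sub(r"<[^>]+>", " ", text)
    # 2. Tokenizer: split on non-word characters (roughly the standard tokenizer)
    tokens = [t for t in re.split(r"\W+", text) if t]
    # 3. Token filters: lowercase, then drop stop words
    tokens = [t.lower() for t in tokens]
    return [t for t in tokens if t not in STOP_WORDS]

print(analyze("<p>The Quick Brown Fox</p>"))  # ['quick', 'brown', 'fox']
```

In a real cluster you would verify behavior like this with the `_analyze` API rather than reimplementing it.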
Sharding: Distributing Data Across Nodes
A single Elasticsearch node can store millions of documents, but eventually you will need more storage, CPU, or memory than one machine provides. Elasticsearch solves this with shards: horizontal partitions of your index.
When you create an index, you specify the number of primary shards. Each primary shard is an independent Lucene index that stores a subset of your documents.
{
"settings": {
"number_of_shards": 3,
"number_of_replicas": 1
}
}
With three primary shards, Elasticsearch distributes documents roughly evenly across them. A document with ID doc123 is routed to a specific shard using a hash of the ID: shard = hash(_id) % num_primary_shards.
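The routing rule can be sketched as follows. Elasticsearch actually hashes the routing value (the `_id` by default) with murmur3; `md5` here is just a deterministic stand-in for illustration.

```python
import hashlib

def route(doc_id, num_primary_shards=3):
    """Pick a shard from a stable hash of the document ID.
    Elasticsearch uses murmur3; md5 is a deterministic stand-in."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_primary_shards

shard = route("doc123")  # same ID always lands on the same shard
```

Because the shard count appears in the modulus, changing it would reroute almost every existing document. That is why the number of primary shards is fixed at index creation and changing it requires a reindex.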
Shard Routing Explained
graph TD
Client[Client] -->|"search request"| LB[Load Balancer]
LB --> Node1[Node 1]
Node1 -->|"coordination"| Coordinator[Coordinator]
Coordinator -->|"scatter"| Shard1[Primary Shard 1]
Coordinator -->|"scatter"| Shard2[Primary Shard 2]
Coordinator -->|"scatter"| Shard3[Primary Shard 3]
Shard1 -->|"gather"| Coordinator
Shard2 -->|"gather"| Coordinator
Shard3 -->|"gather"| Coordinator
Coordinator -->|"reduce"| Client
The node receiving the search request becomes the coordinator. It broadcasts the query to all relevant shards, collects results, and merges them into a single response. The scatter-gather pattern here is what lets you search across all shards in parallel.
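The gather/reduce step amounts to a k-way merge of per-shard hit lists, each already sorted by descending score. A minimal sketch with hypothetical document IDs and scores:

```python
import heapq

def gather_top_k(shard_results, k=3):
    """Merge per-shard hit lists (each sorted by descending score) into a
    single global top-k, the way a coordinating node reduces results."""
    merged = heapq.merge(*shard_results, key=lambda hit: -hit[1])
    return list(merged)[:k]

shard1 = [("doc_a", 9.1), ("doc_d", 2.0)]
shard2 = [("doc_b", 7.5)]
shard3 = [("doc_c", 8.8), ("doc_e", 1.1)]
top = gather_top_k([shard1, shard2, shard3])
# [('doc_a', 9.1), ('doc_c', 8.8), ('doc_b', 7.5)]
```

In the real query-then-fetch flow, each shard first returns only IDs and scores; the coordinator merges them and then fetches full documents for the global top hits.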
Replica Shards
Every primary shard can have replica shards for fault tolerance and read scalability. Replicas are never allocated on the same node as their primary. If a node fails, Elasticsearch automatically promotes replicas to primary shards.
Replica configuration is dynamic. You can increase replicas after index creation:
PUT /my-index/_settings
{
"number_of_replicas": 2
}
For read-heavy workloads, adding replicas scales search throughput roughly linearly. Three primary shards with one replica each give you six shards in total to serve search requests.
The Query DSL: Expressing Search Logic
Elasticsearch’s Query DSL is a JSON-based language for expressing complex search queries. It separates queries into two categories:
- Leaf queries match specific fields (match, term, range, geo queries)
- Compound queries combine leaf queries (bool, constant_score, function_score)
The Bool Query
The bool query is the workhorse of Elasticsearch search. It supports four clauses:
- must: Documents must match (AND logic)
- should: Documents should match (OR logic)
- filter: Same as must but without scoring (faster)
- must_not: Documents must not match (NOT logic)
{
"query": {
"bool": {
"must": [{ "match": { "title": "elasticsearch" } }],
"filter": [
{ "range": { "publish_date": { "gte": "2024-01-01" } } },
{ "term": { "status": "published" } }
],
"should": [
{ "match": { "tags": "search" } },
{ "match": { "tags": "database" } }
],
"minimum_should_match": 1
}
}
}
The filter context is particularly important. Queries in filter context bypass scoring entirely, and Elasticsearch caches filter bitsets for reuse. If you filter by a static field like status, subsequent queries become faster.
Relevance Scoring
Elasticsearch uses BM25 (Okapi BM25) as its default similarity algorithm. BM25 considers term frequency saturation and field length normalization. A term appearing 10 times in a 100-word field scores roughly the same as appearing 5 times in a 50-word field.
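A minimal sketch of the textbook Okapi BM25 term score illustrates both effects (Lucene's implementation differs in small constants, e.g. newer versions drop the constant `(k1 + 1)` factor):

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, df, num_docs, k1=1.2, b=0.75):
    """Textbook BM25 score of one term in one field."""
    idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + length_norm)

# Term frequency saturates: doubling tf from 10 to 20 adds little
s10 = bm25_term_score(tf=10, doc_len=100, avg_doc_len=100, df=5, num_docs=1000)
s20 = bm25_term_score(tf=20, doc_len=100, avg_doc_len=100, df=5, num_docs=1000)
# Length normalization: 5 hits in 50 words scores close to 10 hits in 100 words
s_short = bm25_term_score(tf=5, doc_len=50, avg_doc_len=100, df=5, num_docs=1000)
```

With these defaults, `s20` exceeds `s10` by only a few percent despite twice the term frequency, and `s_short` lands within a few percent of `s10`: same term density, similar score.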
You can debug relevance with the explain parameter:
GET /my-index/_search
{
"explain": true,
"query": {
"match": { "content": "elasticsearch tutorial" }
}
}
The response shows exactly how each document scored, including term frequencies, inverse document frequencies, and field norms.
Designing Indices for Performance
Index design profoundly impacts search performance. A few principles:
Size your shards wisely. Shards between 10GB and 50GB work well. Too many small shards cause overhead; too few large shards cause memory pressure.
Use aliases for flexibility. Index aliases let you reindex without downtime:
POST /_aliases
{
"actions": [
{ "remove": { "index": "my-index-v1", "alias": "my-index" } },
{ "add": { "index": "my-index-v2", "alias": "my-index" } }
]
}
Denormalize for read performance. Elasticsearch does not support joins like SQL. If you frequently query documents with nested objects, consider flattening the structure or using denormalization.
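Denormalization can be as simple as copying parent fields onto every child document at index time. A sketch with hypothetical post and author records:

```python
author = {"id": 7, "name": "Ada", "team": "search"}
posts = [
    {"id": 1, "title": "Intro to ES", "author_id": 7},
    {"id": 2, "title": "Query DSL", "author_id": 7},
]

def denormalize(posts, author):
    """Copy author fields into each post so a single query can answer
    'posts by team search' without a join at search time."""
    return [{**p, "author_name": author["name"], "author_team": author["team"]}
            for p in posts]

flat_docs = denormalize(posts, author)
```

The tradeoff is write amplification: if the author's team changes, you must update every one of their posts, so denormalize fields that are read often and updated rarely.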
Capacity Estimation
Sizing an Elasticsearch cluster means estimating heap and document counts.
Heap allocation per node:
heap_per_node ≈
(shards_per_node × segment_overhead)
+ indexing buffer (~10% of heap)
+ query cache (~10% of heap)
+ fielddata (~10-20% of heap)
+ OS reserve (1GB)
Elasticsearch heap is shared across all shards on a node. Segment overhead alone is roughly 25MB per shard. With 30 shards on one node, that is 750MB before you even touch indexing buffers or caches.
Docs per shard guidelines:
| Shard Size Target | Docs per Shard (approx) |
|---|---|
| 10GB shard | 10-30M docs (variable by doc size) |
| 30GB shard | 30-100M docs |
| 50GB shard | 50-150M docs |
Doc count is harder to predict than shard size. A 1KB doc and a 50KB doc land at very different doc counts even at the same shard size. Watch both.
Example: 500GB index with 3 primary shards and 1 replica each:
shards_per_node = 3 primaries + 3 replicas = 6 shards
at 30GB/shard = 180GB per node
heap needed:
segment overhead: 6 × 25MB = 150MB
indexing buffer: 512MB (default, scales with heap)
query cache: 276MB (10% of ~2.7GB heap)
fielddata: 276MB
OS reserve: 1GB
total ≈ 2.5GB minimum, 4GB recommended
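The arithmetic above can be wrapped in a reusable helper. The 25MB-per-shard and 10% cache figures are the planning assumptions from this section, not hard limits, and `estimate_heap_mb` is a name chosen here for illustration:

```python
def estimate_heap_mb(shards_per_node, heap_mb,
                     segment_overhead_mb=25, indexing_buffer_mb=512,
                     os_reserve_mb=1024):
    """Back-of-envelope heap budget per node, following the model above."""
    segment = shards_per_node * segment_overhead_mb   # per-shard overhead
    query_cache = 0.10 * heap_mb                      # ~10% of heap
    fielddata = 0.10 * heap_mb                        # ~10% of heap (low end)
    return segment + indexing_buffer_mb + query_cache + fielddata + os_reserve_mb

total = estimate_heap_mb(shards_per_node=6, heap_mb=2700)  # the worked example
```

Plugging in the example's numbers gives roughly 2.2GB before rounding up, which is why the section recommends provisioning 4GB rather than running at the bare minimum.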
The sweet spot is 30GB-32GB heap per node. Above roughly 32GB, the JVM can no longer use compressed object pointers (compressed oops), so a larger heap wastes memory on pointer overhead. Most production clusters run 31GB heap on 64GB machines; the remaining RAM goes to the OS page cache, which Lucene relies on heavily for file system caching.
When to Use / When Not to Use
When to Use Elasticsearch
- Log and event analysis at scale, especially with the ELK stack
- Full-text search with complex relevance tuning and fuzzy matching
- Real-time analytics on time-series data with aggregations
- Distributed search where horizontal scalability is a requirement
- Autocomplete and type-ahead features via completion suggester
When Not to Use Elasticsearch
- Primary data store requiring ACID transactions (use a proper database)
- Simple key-value lookups where a document store suffices
- Heavy join operations across multiple entity types
- Systems requiring strong consistency (Elasticsearch is eventually consistent by default)
- Small datasets where a single PostgreSQL tsvector or SQLite FTS5 index would suffice
Production Failure Scenarios
| Failure | Impact | Mitigation |
|---|---|---|
| Primary shard allocation fails | Index unavailable for writes | Set index.number_of_replicas >= 1 and use automatic retry |
| Coordinating node OOM | Search queries fail across cluster | Limit search.max_buckets, add circuit breakers, increase heap |
| Split-brain during network partition | Duplicate data or conflicting primaries | Rely on quorum-based master election (automatic in ES 7+; set minimum_master_nodes to a quorum on 6.x), run an odd number of master-eligible nodes |
| Bulk indexing queue overflow | Documents rejected, indexing lag | Size queue with thread_pool.write.queue_size, implement backpressure |
| Hot/Warm node imbalance | Some nodes run out of disk or CPU | Use Index Lifecycle Management (ILM), allocate shards manually |
| Incorrect analyzer causing data loss | Documents return no results | Test analyzers with _analyze API before applying to production indices |
Observability Checklist
Metrics to Monitor
{
"cluster_metrics": {
"cluster_health": "green/yellow/red status",
"number_of_pending_tasks": "< 100 typically",
"task_duration_avg": "< 1s for search, < 5s for bulk"
},
"node_metrics": {
"heap_used_percent": "< 85% sustained",
"cpu_usage_percent": "< 70% sustained",
"disk_io_read/write": "baseline + anomaly detection",
"open_file_handles": "< 80% of ulimit"
},
"index_metrics": {
"indexing_rate": "documents/second",
"search_latency_p99": "< 500ms for interactive, < 2s for batch",
"refresh_latency": "< 1s per segment",
"segments_count": "< 50 per shard (force merge if higher)"
}
}
Key Logs to Capture
- Error logs: logs/elasticsearch.log captures shard failures, OOM events, and circuit breaker trips
- Deprecation logs: deprecated API usage and settings scheduled for removal
- Slow search logs: configure index.search.slowlog.threshold to capture queries exceeding latency thresholds
- Indexing slow logs: index.indexing.slowlog.threshold for bulk insert issues
# Enable slow logs via API (persistent across restarts)
PUT /my-index/_settings
{
"index.search.slowlog.threshold.query.warn": "10s",
"index.search.slowlog.threshold.fetch.warn": "1s",
"index.indexing.slowlog.threshold.index.warn": "10s"
}
Alerts to Configure
| Alert | Condition | Severity |
|---|---|---|
| Cluster health | status == red | Critical |
| Node heap usage | heap_used_percent > 90% for > 5 min | Warning |
| Search latency | search_latency_p99 > 2s | Warning |
| Unassigned shards | unassigned_shards > 0 for > 5 min | Critical |
| Bulk queue rejection | bulk_queue_rejections > 0 | Warning |
| Disk watermark | disk.watermark.low exceeded | Warning |
Security Checklist
- Enable X-Pack Security (or OpenSearch Security if using the OpenSearch distro)
- Use role-based access control (RBAC) — define roles with least-privilege principles
- Encrypt node-to-node communication with TLS certificates
- Restrict JMX/REST endpoints to internal networks; do not expose publicly
- Validate input to prevent injection via query strings and aggregations
- Audit access logs — log all write operations to sensitive indices
- Rotate credentials regularly; use a secrets manager for API keys
- Configure field-level security if certain fields must be hidden from some roles
- Enable audit logging to track unauthorized access attempts
# Example: Create a read-only role for an index
POST /_security/role/read_only_blogs
{
"indices": [
{
"names": ["blogs-*"],
"privileges": ["read"]
}
]
}
Common Pitfalls / Anti-Patterns
Over-Sharding
Creating too many shards wastes memory and increases cluster metadata overhead. Each shard maintains its own segment files, segment metadata, and caches. A rule of thumb: keep shard size between 10GB and 50GB. If you have 1TB of data, twenty 50GB shards are better than a thousand 1GB shards.
Fix: Plan shard count based on expected data volume. Use index templates with ILM to auto-delete old indices.
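A tiny planning helper makes the rule of thumb concrete. The 30GB default is just a mid-range target from this post, and `plan_shards` is a name invented here for illustration:

```python
import math

def plan_shards(total_data_gb, target_shard_gb=30):
    """Primary shard count that keeps each shard near the 10-50GB sweet spot."""
    return max(1, math.ceil(total_data_gb / target_shard_gb))

print(plan_shards(1024))  # 1TB of data at ~30GB per shard -> 35 primaries
```

Remember to plan for growth: the primary shard count is fixed at creation, so size against the data volume you expect at rollover or deletion time, not today's volume.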
Using Query Context for Filters
Queries in must context score every document, which is expensive when you only need filtering. Move static filters to filter context.
Before (slow):
{
"query": {
"bool": {
"must": [
{ "match": { "content": "search term" } },
{ "term": { "status": "published" } }
]
}
}
}
After (faster):
{
"query": {
"bool": {
"must": [{ "match": { "content": "search term" } }],
"filter": [{ "term": { "status": "published" } }]
}
}
}
Ignoring Refresh Interval
New documents are not searchable until the next refresh (default 1 second). For bulk indexing, this is fine, but for near-real-time requirements, understand the tradeoff. Setting refresh_interval: -1 disables auto-refresh and dramatically speeds up bulk ingestion.
Fix: Adjust refresh_interval based on write vs. read latency requirements.
Not Using Aliases for Zero-Downtime Reindexing
Reindexing directly into an existing index causes downtime and potential data inconsistency.
Fix: Use index aliases with swap operations:
POST /_aliases
{
"actions": [
{ "remove": { "index": "my-index-v1", "alias": "my-index" } },
{ "add": { "index": "my-index-v2", "alias": "my-index" } }
]
}
Quick Recap
Key Bullets
- Elasticsearch stores data in inverted indexes, enabling millisecond full-text queries
- Shard count is fixed at index creation — plan wisely (10GB-50GB per shard target)
- Use filter context for non-scoring queries to leverage caching
- Replicas provide fault tolerance and read scaling; they do not help write throughput
- The Query DSL separates leaf queries (match, term) from compound queries (bool)
- Index aliases enable zero-downtime reindexing and blue-green deployments
- BM25 is the default similarity algorithm; it handles term frequency saturation
Copy/Paste Checklist
# Check cluster health
GET /_cluster/health
# View shard allocation
GET /_cat/shards?v
# Monitor search latency
GET /_nodes/stats/indices/search?filter_path=**.*.query_total&timeout=30s
# Force merge to reduce segments
POST /my-index/_forcemerge?max_num_segments=1
# Update replica count
PUT /my-index/_settings
{
"number_of_replicas": 2
}
# Check for failed indexing operations
GET /_nodes/stats/indices/indexing?filter_path=**.*.indexing.index.failed
# Set ILM policy
PUT /_ilm/policy/my-policy
{
"policy": {
"phases": {
"hot": {
"actions": {
"rollover": { "max_age": "7d" }
}
}
}
}
}
Connecting the Dots
Elasticsearch rarely stands alone. It often sits alongside message queues for indexing streams of data, databases for persistent storage, and caching layers for hot data. If you are building search infrastructure, understanding event-driven architecture helps when designing data pipelines that feed Elasticsearch.
For high-read scenarios, consider adding a caching layer. Our post on distributed caching covers strategies for caching search results and filter facets.
Conclusion
The inverted index is what makes Elasticsearch fast. Queries that would scan a relational database for minutes return in milliseconds. Sharding and replicas give you horizontal scalability and fault tolerance. The Query DSL handles everything from simple keyword matching to complex boolean logic.
Start with an index design that matches your access patterns. Pick analyzers appropriate for your data. Use the Query DSL’s filter context wherever scoring is unnecessary. These fundamentals will get you productive with Elasticsearch quickly.
If you are evaluating search solutions, also look at Apache Solr, which offers similar capabilities with a different operational model. For scaling strategies beyond a single cluster, see our search scaling deep dive.
Related Posts
Apache Solr: Enterprise Search Platform
Explore Apache Solr's powerful search capabilities including faceted search, relevance tuning, indexing strategies, and how it compares to Elasticsearch.
Search Scaling: Sharding, Routing, and Horizontal Growth
Learn how to scale search systems horizontally with index sharding strategies, query routing, replication patterns, and cluster management techniques.
Data Migration: Strategies and Patterns for Moving Data
Learn proven strategies for migrating data between systems with minimal downtime. Covers bulk migration, CDC patterns, validation, and rollback.