Elasticsearch: Full-Text Search at Scale

Learn how Elasticsearch powers search at scale with inverted indexes, sharding, replicas, and its powerful Query DSL for modern applications.



Elasticsearch is a distributed, RESTful search and analytics engine built on Apache Lucene. It handles billions of events per day, from log analysis to full-text search, and is the backbone of the Elastic Stack. If you need to search, analyze, or visualize data at scale, Elasticsearch is the standard choice for modern search infrastructure.

This post covers the core concepts that make Elasticsearch work: the inverted index, sharding and replication, and the Query DSL. You will learn how to design indices, write efficient queries, and scale your search cluster horizontally.

The Inverted Index: How Search Works Under the Hood

Traditional databases store data row by row. When you search for “Elasticsearch tutorial,” the database scans every row, checking if the text contains your query. This works for small datasets but becomes prohibitively slow as data grows.

Elasticsearch flips this model with an inverted index. Instead of mapping documents to words, it maps words to documents.

{
  "inverted_index": {
    "elasticsearch": [{ "doc_id": 1, "positions": [0, 3] }],
    "tutorial": [{ "doc_id": 1, "positions": [1] }],
    "full-text": [{ "doc_id": 2, "positions": [0] }]
  }
}

When you search for “elasticsearch tutorial,” Elasticsearch looks up both terms in the inverted index and finds matching documents instantly. No full table scan required.

The index also stores metadata about each term: document frequency, term frequency, positions for phrase queries, and norms for field-length normalization. This metadata is what makes relevance scoring, phrase matching, and fuzzy search possible.
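The core structure can be sketched in a few lines of Python (a toy model for intuition, not Lucene's actual segment format):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the documents and positions where it occurs."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        # Naive analysis: lowercase and split on whitespace.
        for position, term in enumerate(text.lower().split()):
            index[term].append({"doc_id": doc_id, "position": position})
    return dict(index)

docs = {
    1: "Elasticsearch tutorial basics Elasticsearch",
    2: "full-text search",
}
index = build_inverted_index(docs)
# A search for "elasticsearch" is now a dictionary lookup, not a table scan:
print(index["elasticsearch"])
# [{'doc_id': 1, 'position': 0}, {'doc_id': 1, 'position': 3}]
```

The stored positions are what make phrase queries possible: "elasticsearch tutorial" matches only when the two terms appear at adjacent positions.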

Analyzer Pipeline

Before terms enter the inverted index, they pass through an analyzer consisting of three stages:

  1. Character filters remove HTML tags, convert characters, or apply language-specific normalizations.
  2. Tokenizer splits text into individual terms (tokens).
  3. Token filters lowercase terms, remove stop words, apply synonyms, and perform stemming.
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "snowball", "asciifolding"]
        }
      }
    }
  }
}

Choosing the right analyzer matters. The standard analyzer works fine for most English text. For domain-specific vocabularies, you might need custom analyzers with synonym filters or language-specific stemmers.
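The three stages can be mimicked with a toy analyzer in Python. The regexes and stop-word list here are illustrative assumptions, not what the standard analyzer actually does, and stemming is omitted:

```python
import re

STOP_WORDS = {"the", "a", "an", "and", "or", "of"}

def analyze(text):
    """Toy analyzer: character filter -> tokenizer -> token filters."""
    # 1. Character filter: strip HTML tags.
    text = re.sub(r"<[^>]+>", " ", text)
    # 2. Tokenizer: split into word-like tokens.
    tokens = re.findall(r"[\w'-]+", text)
    # 3. Token filters: lowercase, then drop stop words (no stemming here).
    return [t.lower() for t in tokens if t.lower() not in STOP_WORDS]

print(analyze("<p>The Quick Brown Fox</p>"))  # ['quick', 'brown', 'fox']
```

The key point: the same pipeline runs at index time and at query time, so "Fox" in a query matches "fox" in the index.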

Sharding: Distributing Data Across Nodes

A single Elasticsearch node can store millions of documents, but eventually you will need more storage, CPU, or memory than one machine provides. Elasticsearch solves this with shards: horizontal partitions of your index.

When you create an index, you specify the number of primary shards. Each primary shard is an independent Lucene index that stores a subset of your documents.

{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}

With three primary shards, Elasticsearch distributes documents roughly evenly across them. A document with ID doc123 is routed to a specific shard by hashing its routing value, which defaults to the document ID: shard = hash(_routing) % num_primary_shards.
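A minimal sketch of this routing in Python, using zlib.crc32 as a stand-in for the murmur3 hash Elasticsearch actually applies:

```python
import zlib

def route_to_shard(routing_value: str, num_primary_shards: int) -> int:
    """Deterministically pick a shard from the routing value (default: _id).

    Elasticsearch uses murmur3; crc32 here is just a stable stand-in.
    """
    return zlib.crc32(routing_value.encode()) % num_primary_shards

shard = route_to_shard("doc123", 3)
# Same ID, same shard, every time -- which is also why primary shard count
# is fixed at index creation: changing it would re-route existing documents.
assert route_to_shard("doc123", 3) == shard
print(f"doc123 -> shard {shard}")
```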

Shard Routing Explained

graph TD
    Client[Client] -->|"search request"| LB[Load Balancer]
    LB --> Node1[Node 1]
    Node1 -->|"coordination"| Coordinator[Coordinator]
    Coordinator -->|"scatter"| Shard1[Primary Shard 1]
    Coordinator -->|"scatter"| Shard2[Primary Shard 2]
    Coordinator -->|"scatter"| Shard3[Primary Shard 3]
    Shard1 -->|"gather"| Coordinator
    Shard2 -->|"gather"| Coordinator
    Shard3 -->|"gather"| Coordinator
    Coordinator -->|"reduce"| Client

The node receiving the search request becomes the coordinator. It broadcasts the query to all relevant shards, collects results, and merges them into a single response. The scatter-gather pattern here is what lets you search across all shards in parallel.

Replica Shards

Every primary shard can have replica shards for fault tolerance and read scalability. Replicas are never allocated on the same node as their primary. If a node fails, Elasticsearch automatically promotes replicas to primary shards.

Replica configuration is dynamic. You can increase replicas after index creation:

PUT /my-index/_settings
{
  "number_of_replicas": 2
}

For read-heavy workloads, adding replicas scales search throughput roughly linearly, up to the number of nodes available to host them. One replica of three primary shards gives you six total shards to handle search requests.

The Query DSL: Expressing Search Logic

Elasticsearch’s Query DSL is a JSON-based language for expressing complex search queries. It separates queries into two categories:

  • Leaf queries match specific fields (match, term, range, geo queries)
  • Compound queries combine leaf queries (bool, constant_score, function_score)

The Bool Query

The bool query is the workhorse of Elasticsearch search. It supports four clauses:

  • must: Documents must match (AND logic)
  • should: Documents should match (OR logic)
  • filter: Same as must but without scoring (faster)
  • must_not: Documents must not match (NOT logic)
{
  "query": {
    "bool": {
      "must": [{ "match": { "title": "elasticsearch" } }],
      "filter": [
        { "range": { "publish_date": { "gte": "2024-01-01" } } },
        { "term": { "status": "published" } }
      ],
      "should": [
        { "match": { "tags": "search" } },
        { "match": { "tags": "database" } }
      ],
      "minimum_should_match": 1
    }
  }
}

The filter context is particularly important. Queries in filter context bypass scoring entirely, and Elasticsearch caches filter bitsets for reuse. If you filter by a static field like status, subsequent queries become faster.

Relevance Scoring

Elasticsearch uses BM25 (Okapi BM25) as its default similarity algorithm. BM25 considers term frequency saturation and field length normalization. A term appearing 10 times in a 100-word field scores roughly the same as appearing 5 times in a 50-word field.
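A sketch of the single-term BM25 formula in Python, using Lucene's default parameters (k1 = 1.2, b = 0.75), illustrates the saturation claim:

```python
import math

def bm25_term_score(tf, doc_len, avg_len, df, num_docs, k1=1.2, b=0.75):
    """Single-term BM25 score with Lucene's default k1 and b."""
    idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
    norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))
    return idf * norm

# Term frequency saturates under length normalization: 10 occurrences in a
# 100-word field scores within a few percent of 5 occurrences in a 50-word field.
long_field = bm25_term_score(tf=10, doc_len=100, avg_len=100, df=5, num_docs=1000)
short_field = bm25_term_score(tf=5, doc_len=50, avg_len=100, df=5, num_docs=1000)
print(round(long_field, 2), round(short_field, 2))
```

The idf factor is also why rare terms dominate: a term matching 2 of 1,000 documents scores far higher per occurrence than one matching 800.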

You can debug relevance with the explain parameter:

GET /my-index/_search
{
  "explain": true,
  "query": {
    "match": { "content": "elasticsearch tutorial" }
  }
}

The response shows exactly how each document scored, including term frequencies, inverse document frequencies, and field norms.

Designing Indices for Performance

Index design profoundly impacts search performance. A few principles:

Size your shards wisely. Shards between 10GB and 50GB work well. Too many small shards cause overhead; too few large shards cause memory pressure.

Use aliases for flexibility. Index aliases let you reindex without downtime:

POST /_aliases
{
  "actions": [
    { "remove": { "index": "my-index-v1", "alias": "my-index" } },
    { "add": { "index": "my-index-v2", "alias": "my-index" } }
  ]
}

Denormalize for read performance. Elasticsearch does not support joins like SQL. If you frequently query documents with nested objects, consider flattening the structure or using denormalization.

Capacity Estimation

Sizing an Elasticsearch cluster means estimating heap and document counts.

Heap allocation per node:

heap_per_node ≈
    (shards_per_node × segment_overhead)
  + indexing buffer (≈10% of heap by default)
  + query cache (≈10% of heap by default)
  + fielddata (10-20% of heap, workload-dependent)
  + OS reserve (~1GB)

Elasticsearch heap is shared across all shards on a node. Segment overhead alone is roughly 25MB per shard. With 30 shards on one node, that is 750MB before you even touch indexing buffers or caches.

Docs per shard guidelines:

Shard Size Target | Docs per Shard (approx)
--- | ---
10GB shard | 10-30M docs (varies with doc size)
30GB shard | 30-100M docs
50GB shard | 50-150M docs

Doc count is harder to predict than shard size. A 1KB doc and a 50KB doc land at very different doc counts even at the same shard size. Watch both.

Example: a 500GB index at a 30GB/shard target needs roughly 17 primary shards; with 1 replica per primary, that is 34 shards (about 1TB including replicas). Spread across 6 data nodes:

shards_per_node ≈ 34 shards / 6 nodes ≈ 6 shards
at 30GB/shard ≈ 180GB per node

heap needed:
  segment overhead: 6 × 25MB = 150MB
  indexing buffer: 512MB (default, scales with heap)
  query cache: 276MB (10% of ~2.7GB heap)
  fielddata: 276MB
  OS reserve: 1GB
  total ≈ 2.2GB minimum, 4GB recommended
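The same back-of-envelope arithmetic in a small Python helper. The 25MB segment overhead and the buffer/cache sizes are ballpark assumptions, not guaranteed Elasticsearch defaults:

```python
def estimate_min_heap_mb(num_shards, segment_overhead_mb=25,
                         indexing_buffer_mb=512, query_cache_mb=276,
                         fielddata_mb=276, os_reserve_mb=1024):
    """Back-of-envelope minimum heap per node; all figures are rough."""
    return (num_shards * segment_overhead_mb + indexing_buffer_mb
            + query_cache_mb + fielddata_mb + os_reserve_mb)

print(estimate_min_heap_mb(6))   # 2238 MB, roughly 2.2GB
print(estimate_min_heap_mb(30))  # more shards, more baseline heap
```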

The sweet spot is a heap of around 30GB per node. Above roughly 32GB, the JVM can no longer use compressed object pointers (compressed oops), so a 33GB heap can actually hold fewer objects than a 31GB one. Most production clusters run a 31GB heap on 64GB machines; the remaining RAM goes to the OS page cache, which Lucene relies on heavily for file system access.

When to Use / When Not to Use

When to Use Elasticsearch

  • Log and event analysis at scale, especially with the ELK stack
  • Full-text search with complex relevance tuning and fuzzy matching
  • Real-time analytics on time-series data with aggregations
  • Distributed search where horizontal scalability is a requirement
  • Autocomplete and type-ahead features via completion suggester

When Not to Use Elasticsearch

  • Primary data store requiring ACID transactions (use a proper database)
  • Simple key-value lookups where a document store suffices
  • Heavy join operations across multiple entity types
  • Systems requiring strong consistency (Elasticsearch is eventually consistent by default)
  • Small datasets where a single PostgreSQL tsvector or SQLite FTS5 would suffice

Production Failure Scenarios

Failure | Impact | Mitigation
--- | --- | ---
Primary shard allocation fails | Index unavailable for writes | Set index.number_of_replicas >= 1 and rely on automatic retry
Coordinating node OOM | Search queries fail across the cluster | Limit search.max_buckets, tune circuit breakers, increase heap
Split-brain during network partition | Duplicate data or conflicting primaries | Require a quorum of master-eligible nodes (minimum_master_nodes pre-7.x; automatic since 7.0) and run an odd number of them
Bulk indexing queue overflow | Documents rejected, indexing lag | Size the queue with thread_pool.write.queue_size, implement backpressure
Hot/warm node imbalance | Some nodes run out of disk or CPU | Use Index Lifecycle Management (ILM) and shard allocation filtering
Incorrect analyzer causing data loss | Documents return no results | Test analyzers with the _analyze API before applying them to production indices

Observability Checklist

Metrics to Monitor

{
  "cluster_metrics": {
    "cluster_health": "green/yellow/red status",
    "number_of_pending_tasks": "< 100 typically",
    "task_duration_avg": "< 1s for search, < 5s for bulk"
  },
  "node_metrics": {
    "heap_used_percent": "< 85% sustained",
    "cpu_usage_percent": "< 70% sustained",
    "disk_io_read/write": "baseline + anomaly detection",
    "open_file_handles": "< 80% of ulimit"
  },
  "index_metrics": {
    "indexing_rate": "documents/second",
    "search_latency_p99": "< 500ms for interactive, < 2s for batch",
    "refresh_latency": "< 1s per segment",
    "segments_count": "< 50 per shard (force merge if higher)"
  }
}

Key Logs to Capture

  • Error logs: logs/elasticsearch.log — shard failures, OOM events, circuit breaker trips
  • Deprecation logs: deprecated API usage, settings scheduled for removal
  • Slow search logs: configure index.search.slowlog.threshold to capture queries exceeding latency thresholds
  • Indexing slow logs: index.indexing.slowlog.threshold for bulk insert issues
# Enable slow logs via API (persistent across restarts)
PUT /my-index/_settings
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.fetch.warn": "1s",
  "index.indexing.slowlog.threshold.index.warn": "10s"
}

Alerts to Configure

Alert | Condition | Severity
--- | --- | ---
Cluster health | status == red | Critical
Node heap usage | heap_used_percent > 90% for > 5 min | Warning
Search latency | search_latency_p99 > 2s | Warning
Unassigned shards | unassigned_shards > 0 for > 5 min | Critical
Bulk queue rejections | bulk_queue_rejections > 0 | Warning
Disk watermark | disk.watermark.low exceeded | Warning

Security Checklist

  • Enable Elasticsearch security features (X-Pack Security, or OpenSearch Security on the OpenSearch distribution)
  • Use role-based access control (RBAC) — define roles with least-privilege principles
  • Encrypt node-to-node communication with TLS certificates
  • Restrict the REST API (port 9200) and transport (port 9300) to internal networks; never expose them publicly
  • Validate input to prevent injection via query strings and aggregations
  • Audit access logs — log all write operations to sensitive indices
  • Rotate credentials regularly; use a secrets manager for API keys
  • Configure field-level security if certain fields must be hidden from some roles
  • Enable audit logging to track unauthorized access attempts
# Example: Create a read-only role for an index
POST /_security/role/read_only_blogs
{
  "indices": [
    {
      "names": ["blogs-*"],
      "privileges": ["read"]
    }
  ]
}

Common Pitfalls / Anti-Patterns

Over-Sharding

Creating too many shards wastes memory and increases cluster metadata overhead. Each shard maintains its own segment files, segment metadata, and caches. A rule of thumb: keep shard size between 10GB and 50GB. If you have 1TB of data, twenty-five 40GB shards beats both a thousand 1GB shards and five 200GB shards.

Fix: Plan shard count based on expected data volume. Use index templates with ILM to auto-delete old indices.

Using Query Context for Filters

Queries in must context score every document, which is expensive when you only need filtering. Move static filters to filter context.

Before (slow):

{
  "query": {
    "bool": {
      "must": [
        { "match": { "content": "search term" } },
        { "term": { "status": "published" } }
      ]
    }
  }
}

After (faster):

{
  "query": {
    "bool": {
      "must": [{ "match": { "content": "search term" } }],
      "filter": [{ "term": { "status": "published" } }]
    }
  }
}
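When queries are built in application code, this refactor can even be automated. A hypothetical Python helper (assuming queries are plain dicts of bool clauses) that moves term and range clauses out of must:

```python
def move_static_filters(bool_query, filter_types=("term", "range")):
    """Hypothetical helper: move non-scoring clauses into filter context.

    Assumes a plain dict holding the bool query's clause lists.
    """
    filters = bool_query.setdefault("filter", [])
    scored = []
    for clause in bool_query.get("must", []):
        # term/range clauses don't need relevance scores; filter them instead.
        if any(t in clause for t in filter_types):
            filters.append(clause)
        else:
            scored.append(clause)
    bool_query["must"] = scored
    return bool_query

query = {"must": [{"match": {"content": "search term"}},
                  {"term": {"status": "published"}}]}
print(move_static_filters(query))
```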

Ignoring Refresh Interval

New documents are not searchable until the next refresh (default 1 second). For bulk indexing, this is fine, but for near-real-time requirements, understand the tradeoff. Setting refresh_interval: -1 disables auto-refresh and dramatically speeds up bulk ingestion.

Fix: Adjust refresh_interval based on write vs. read latency requirements.

Not Using Aliases for Zero-Downtime Reindexing

Reindexing directly into an existing index causes downtime and potential data inconsistency.

Fix: Use index aliases with swap operations:

POST /_aliases
{
  "actions": [
    { "remove": { "index": "my-index-v1", "alias": "my-index" } },
    { "add": { "index": "my-index-v2", "alias": "my-index" } }
  ]
}

Quick Recap

Key Bullets

  • Elasticsearch stores data in inverted indexes, enabling millisecond full-text queries
  • Primary shard count is fixed at index creation — plan wisely (10GB-50GB per shard target)
  • Use filter context for non-scoring queries to leverage caching
  • Replicas provide fault tolerance and read scaling; they do not help write throughput
  • The Query DSL separates leaf queries (match, term) from compound queries (bool)
  • Index aliases enable zero-downtime reindexing and blue-green deployments
  • BM25 is the default similarity algorithm; it handles term frequency saturation

Copy/Paste Checklist

# Check cluster health
GET /_cluster/health

# View shard allocation
GET /_cat/shards?v

# Monitor search totals and latency
GET /_nodes/stats/indices/search?filter_path=nodes.*.indices.search

# Force merge to reduce segments
POST /my-index/_forcemerge?max_num_segments=1

# Update replica count
PUT /my-index/_settings
{
  "number_of_replicas": 2
}

# Check failed indexing operations
GET /_nodes/stats/indices/indexing?filter_path=nodes.*.indices.indexing.index_failed

# Set ILM policy
PUT /_ilm/policy/my-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "7d" }
        }
      }
    }
  }
}

Connecting the Dots

Elasticsearch rarely stands alone. It often sits alongside message queues for indexing streams of data, databases for persistent storage, and caching layers for hot data. If you are building search infrastructure, understanding event-driven architecture helps when designing data pipelines that feed Elasticsearch.

For high-read scenarios, consider adding a caching layer. Our post on distributed caching covers strategies for caching search results and filter facets.

Conclusion

The inverted index is what makes Elasticsearch fast. Queries that would scan a relational database for minutes return in milliseconds. Sharding and replicas give you horizontal scalability and fault tolerance. The Query DSL handles everything from simple keyword matching to complex boolean logic.

Start with an index design that matches your access patterns. Pick analyzers appropriate for your data. Use the Query DSL’s filter context wherever scoring is unnecessary. These fundamentals will get you productive with Elasticsearch quickly.

If you are evaluating search solutions, also look at Apache Solr, which offers similar capabilities with a different operational model. For scaling strategies beyond a single cluster, see our search scaling deep dive.


Related Posts

Apache Solr: Enterprise Search Platform

Explore Apache Solr's powerful search capabilities including faceted search, relevance tuning, indexing strategies, and how it compares to Elasticsearch.

#search #solr #apache

Search Scaling: Sharding, Routing, and Horizontal Growth

Learn how to scale search systems horizontally with index sharding strategies, query routing, replication patterns, and cluster management techniques.

#search #scaling #elasticsearch

Data Migration: Strategies and Patterns for Moving Data

Learn proven strategies for migrating data between systems with minimal downtime. Covers bulk migration, CDC patterns, validation, and rollback.

#data-engineering #data-migration #cdc