Elasticsearch: Full-Text Search at Scale

Learn how Elasticsearch powers search at scale with inverted indexes, sharding, replicas, and its powerful Query DSL for modern applications.



Elasticsearch is a distributed, RESTful search and analytics engine built on Apache Lucene. It handles billions of events per day, from log analysis to full-text search, and is the backbone of the Elastic Stack. If you need to search, analyze, or visualize data at scale, Elasticsearch is the standard choice for modern search infrastructure.

This post covers the core concepts that make Elasticsearch work: the inverted index, sharding and replication, and the Query DSL. You will learn how to design indices, write efficient queries, and scale your search cluster horizontally.

The Inverted Index: How Search Works Under the Hood

Traditional databases store data row by row. When you search for “Elasticsearch tutorial,” the database scans every row, checking if the text contains your query. This works for small datasets but becomes prohibitively slow as data grows.

Elasticsearch flips this model with an inverted index. Instead of mapping documents to words, it maps words to documents.

{
  "inverted_index": {
    "elasticsearch": [{ "doc_id": 1, "positions": [0, 3] }],
    "tutorial": [{ "doc_id": 1, "positions": [1] }],
    "full-text": [{ "doc_id": 2, "positions": [0] }]
  }
}

When you search for “elasticsearch tutorial,” Elasticsearch looks up both terms in the inverted index and finds matching documents instantly. No full table scan required.

The index also stores metadata about each term: document frequency, term frequency, positions for phrase queries, and norms for field-length normalization. This metadata is what makes relevance scoring, phrase matching, and fuzzy search possible.
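The core structure can be sketched in a few lines of Python (a toy model for intuition, not Lucene's actual segment format):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the documents and positions where it occurs."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        # Naive analysis: lowercase and split on whitespace.
        for position, term in enumerate(text.lower().split()):
            index[term].append({"doc_id": doc_id, "position": position})
    return dict(index)

docs = {
    1: "Elasticsearch tutorial basics Elasticsearch",
    2: "full-text search",
}
index = build_inverted_index(docs)
# A search for "elasticsearch" is now a dictionary lookup, not a table scan:
print(index["elasticsearch"])
# [{'doc_id': 1, 'position': 0}, {'doc_id': 1, 'position': 3}]
```

The stored positions are what make phrase queries possible: "elasticsearch tutorial" matches only when the two terms appear at adjacent positions.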

Analyzer Pipeline

Before terms enter the inverted index, they pass through an analyzer consisting of three stages:

  1. Character filters remove HTML tags, convert characters, or apply language-specific normalizations.
  2. Tokenizer splits text into individual terms (tokens).
  3. Token filters lowercase terms, remove stop words, apply synonyms, and perform stemming.
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "snowball", "asciifolding"]
        }
      }
    }
  }
}

Choosing the right analyzer matters. The standard analyzer works fine for most English text. For domain-specific vocabularies, you might need custom analyzers with synonym filters or language-specific stemmers.
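The three stages can be mimicked with a toy analyzer in Python. The regexes and stop-word list here are illustrative assumptions, not what the standard analyzer actually does, and stemming is omitted:

```python
import re

STOP_WORDS = {"the", "a", "an", "and", "or", "of"}

def analyze(text):
    """Toy analyzer: character filter -> tokenizer -> token filters."""
    # 1. Character filter: strip HTML tags.
    text = re.sub(r"<[^>]+>", " ", text)
    # 2. Tokenizer: split into word-like tokens.
    tokens = re.findall(r"[\w'-]+", text)
    # 3. Token filters: lowercase, then drop stop words (no stemming here).
    return [t.lower() for t in tokens if t.lower() not in STOP_WORDS]

print(analyze("<p>The Quick Brown Fox</p>"))  # ['quick', 'brown', 'fox']
```

The key point: the same pipeline runs at index time and at query time, so "Fox" in a query matches "fox" in the index.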

Sharding: Distributing Data Across Nodes

A single Elasticsearch node can store millions of documents, but eventually you will need more storage, CPU, or memory than one machine provides. Elasticsearch solves this with shards: horizontal partitions of your index.

When you create an index, you specify the number of primary shards. Each primary shard is an independent Lucene index that stores a subset of your documents.

{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}

With three primary shards, Elasticsearch distributes documents roughly evenly across them. A document with ID doc123 is routed to a specific shard by hashing its routing value, which defaults to the document ID: shard = hash(_routing) % num_primary_shards.
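A minimal sketch of this routing in Python, using zlib.crc32 as a stand-in for the murmur3 hash Elasticsearch actually applies:

```python
import zlib

def route_to_shard(routing_value: str, num_primary_shards: int) -> int:
    """Deterministically pick a shard from the routing value (default: _id).

    Elasticsearch uses murmur3; crc32 here is just a stable stand-in.
    """
    return zlib.crc32(routing_value.encode()) % num_primary_shards

shard = route_to_shard("doc123", 3)
# Same ID, same shard, every time -- which is also why primary shard count
# is fixed at index creation: changing it would re-route existing documents.
assert route_to_shard("doc123", 3) == shard
print(f"doc123 -> shard {shard}")
```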

Shard Routing Explained

graph TD
    Client[Client] -->|"search request"| LB[Load Balancer]
    LB --> Node1[Node 1]
    Node1 -->|"coordination"| Coordinator[Coordinator]
    Coordinator -->|"scatter"| Shard1[Primary Shard 1]
    Coordinator -->|"scatter"| Shard2[Primary Shard 2]
    Coordinator -->|"scatter"| Shard3[Primary Shard 3]
    Shard1 -->|"gather"| Coordinator
    Shard2 -->|"gather"| Coordinator
    Shard3 -->|"gather"| Coordinator
    Coordinator -->|"reduce"| Client

The node receiving the search request becomes the coordinator. It broadcasts the query to all relevant shards, collects results, and merges them into a single response. The scatter-gather pattern here is what lets you search across all shards in parallel.

Replica Shards

Every primary shard can have replica shards for fault tolerance and read scalability. Replicas are never allocated on the same node as their primary. If a node fails, Elasticsearch automatically promotes replicas to primary shards.

Replica configuration is dynamic. You can increase replicas after index creation:

PUT /my-index/_settings
{
  "number_of_replicas": 2
}

For read-heavy workloads, adding replicas scales search throughput roughly linearly, up to the number of nodes available to host them. One replica of three primary shards gives you six total shards to handle search requests.

The Query DSL: Expressing Search Logic

Elasticsearch’s Query DSL is a JSON-based language for expressing complex search queries. It separates queries into two categories:

  • Leaf queries match specific fields (match, term, range, geo queries)
  • Compound queries combine leaf queries (bool, constant_score, function_score)

The Bool Query

The bool query is the workhorse of Elasticsearch search. It supports four clauses:

  • must: Documents must match (AND logic)
  • should: Documents should match (OR logic)
  • filter: Same as must but without scoring (faster)
  • must_not: Documents must not match (NOT logic)
{
  "query": {
    "bool": {
      "must": [{ "match": { "title": "elasticsearch" } }],
      "filter": [
        { "range": { "publish_date": { "gte": "2024-01-01" } } },
        { "term": { "status": "published" } }
      ],
      "should": [
        { "match": { "tags": "search" } },
        { "match": { "tags": "database" } }
      ],
      "minimum_should_match": 1
    }
  }
}

The filter context is particularly important. Queries in filter context bypass scoring entirely, and Elasticsearch caches filter bitsets for reuse. If you filter by a static field like status, subsequent queries become faster.

Relevance Scoring

Elasticsearch uses BM25 (Okapi BM25) as its default similarity algorithm. BM25 considers term frequency saturation and field length normalization. A term appearing 10 times in a 100-word field scores roughly the same as appearing 5 times in a 50-word field.
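A sketch of the single-term BM25 formula in Python, using Lucene's default parameters (k1 = 1.2, b = 0.75), illustrates the saturation claim:

```python
import math

def bm25_term_score(tf, doc_len, avg_len, df, num_docs, k1=1.2, b=0.75):
    """Single-term BM25 score with Lucene's default k1 and b."""
    idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
    norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))
    return idf * norm

# Term frequency saturates under length normalization: 10 occurrences in a
# 100-word field scores within a few percent of 5 occurrences in a 50-word field.
long_field = bm25_term_score(tf=10, doc_len=100, avg_len=100, df=5, num_docs=1000)
short_field = bm25_term_score(tf=5, doc_len=50, avg_len=100, df=5, num_docs=1000)
print(round(long_field, 2), round(short_field, 2))
```

The idf factor is also why rare terms dominate: a term matching 2 of 1,000 documents scores far higher per occurrence than one matching 800.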

You can debug relevance with the explain parameter:

GET /my-index/_search
{
  "explain": true,
  "query": {
    "match": { "content": "elasticsearch tutorial" }
  }
}

The response shows exactly how each document scored, including term frequencies, inverse document frequencies, and field norms.

Designing Indices for Performance

Index design profoundly impacts search performance. A few principles:

Size your shards wisely. Shards between 10GB and 50GB work well. Too many small shards cause overhead; too few large shards cause memory pressure.

Use aliases for flexibility. Index aliases let you reindex without downtime:

POST /_aliases
{
  "actions": [
    { "remove": { "index": "my-index-v1", "alias": "my-index" } },
    { "add": { "index": "my-index-v2", "alias": "my-index" } }
  ]
}

Denormalize for read performance. Elasticsearch does not support joins like SQL. If you frequently query documents with nested objects, consider flattening the structure or using denormalization.

Capacity Estimation

Sizing an Elasticsearch cluster means estimating heap and document counts.

Heap allocation per node:

heap_per_node ≈
    (shards_per_node × segment_overhead)
  + indexing buffer (≈10% of heap by default)
  + query cache (≈10% of heap by default)
  + fielddata (10-20% of heap, workload-dependent)
  + OS reserve (~1GB)

Elasticsearch heap is shared across all shards on a node. Segment overhead alone is roughly 25MB per shard. With 30 shards on one node, that is 750MB before you even touch indexing buffers or caches.

Docs per shard guidelines:

Shard Size Target | Docs per Shard (approx)
--- | ---
10GB shard | 10-30M docs (varies with doc size)
30GB shard | 30-100M docs
50GB shard | 50-150M docs

Doc count is harder to predict than shard size. A 1KB doc and a 50KB doc land at very different doc counts even at the same shard size. Watch both.

Example: a 500GB index at a 30GB/shard target needs roughly 17 primary shards; with 1 replica per primary, that is 34 shards (about 1TB including replicas). Spread across 6 data nodes:

shards_per_node ≈ 34 shards / 6 nodes ≈ 6 shards
at 30GB/shard ≈ 180GB per node

heap needed:
  segment overhead: 6 × 25MB = 150MB
  indexing buffer: 512MB (default, scales with heap)
  query cache: 276MB (10% of ~2.7GB heap)
  fielddata: 276MB
  OS reserve: 1GB
  total ≈ 2.2GB minimum, 4GB recommended
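The same back-of-envelope arithmetic in a small Python helper. The 25MB segment overhead and the buffer/cache sizes are ballpark assumptions, not guaranteed Elasticsearch defaults:

```python
def estimate_min_heap_mb(num_shards, segment_overhead_mb=25,
                         indexing_buffer_mb=512, query_cache_mb=276,
                         fielddata_mb=276, os_reserve_mb=1024):
    """Back-of-envelope minimum heap per node; all figures are rough."""
    return (num_shards * segment_overhead_mb + indexing_buffer_mb
            + query_cache_mb + fielddata_mb + os_reserve_mb)

print(estimate_min_heap_mb(6))   # 2238 MB, roughly 2.2GB
print(estimate_min_heap_mb(30))  # more shards, more baseline heap
```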

The sweet spot is a heap of around 30GB per node. Above roughly 32GB, the JVM can no longer use compressed object pointers (compressed oops), so a 33GB heap can actually hold fewer objects than a 31GB one. Most production clusters run a 31GB heap on 64GB machines; the remaining RAM goes to the OS page cache, which Lucene relies on heavily for file system access.

When to Use / When Not to Use

When to Use Elasticsearch

  • Log and event analysis at scale, especially with the ELK stack
  • Full-text search with complex relevance tuning and fuzzy matching
  • Real-time analytics on time-series data with aggregations
  • Distributed search where horizontal scalability is a requirement
  • Autocomplete and type-ahead features via completion suggester

When Not to Use Elasticsearch

  • Primary data store requiring ACID transactions (use a proper database)
  • Simple key-value lookups where a document store suffices
  • Heavy join operations across multiple entity types
  • Systems requiring strong consistency (Elasticsearch is eventually consistent by default)
  • Small datasets where a single PostgreSQL tsvector or SQLite FTS5 would suffice

Production Failure Scenarios

Failure | Impact | Mitigation
--- | --- | ---
Primary shard allocation fails | Index unavailable for writes | Set index.number_of_replicas >= 1 and rely on automatic retry
Coordinating node OOM | Search queries fail across the cluster | Limit search.max_buckets, tune circuit breakers, increase heap
Split-brain during network partition | Duplicate data or conflicting primaries | Require a quorum of master-eligible nodes (minimum_master_nodes pre-7.x; automatic since 7.0) and run an odd number of them
Bulk indexing queue overflow | Documents rejected, indexing lag | Size the queue with thread_pool.write.queue_size, implement backpressure
Hot/warm node imbalance | Some nodes run out of disk or CPU | Use Index Lifecycle Management (ILM) and shard allocation filtering
Incorrect analyzer causing data loss | Documents return no results | Test analyzers with the _analyze API before applying them to production indices

Observability Checklist

Metrics to Monitor

{
  "cluster_metrics": {
    "cluster_health": "green/yellow/red status",
    "number_of_pending_tasks": "< 100 typically",
    "task_duration_avg": "< 1s for search, < 5s for bulk"
  },
  "node_metrics": {
    "heap_used_percent": "< 85% sustained",
    "cpu_usage_percent": "< 70% sustained",
    "disk_io_read/write": "baseline + anomaly detection",
    "open_file_handles": "< 80% of ulimit"
  },
  "index_metrics": {
    "indexing_rate": "documents/second",
    "search_latency_p99": "< 500ms for interactive, < 2s for batch",
    "refresh_latency": "< 1s per segment",
    "segments_count": "< 50 per shard (force merge if higher)"
  }
}

Key Logs to Capture

  • Error logs: logs/elasticsearch.log — shard failures, OOM events, circuit breaker trips
  • Deprecation logs: deprecated API usage, settings scheduled for removal
  • Slow search logs: configure index.search.slowlog.threshold to capture queries exceeding latency thresholds
  • Indexing slow logs: index.indexing.slowlog.threshold for bulk insert issues
# Enable slow logs via API (persistent across restarts)
PUT /my-index/_settings
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.fetch.warn": "1s",
  "index.indexing.slowlog.threshold.index.warn": "10s"
}

Alerts to Configure

Alert | Condition | Severity
--- | --- | ---
Cluster health | status == red | Critical
Node heap usage | heap_used_percent > 90% for > 5 min | Warning
Search latency | search_latency_p99 > 2s | Warning
Unassigned shards | unassigned_shards > 0 for > 5 min | Critical
Bulk queue rejections | bulk_queue_rejections > 0 | Warning
Disk watermark | disk.watermark.low exceeded | Warning

Security Checklist

  • Enable Elasticsearch security features (X-Pack Security, or OpenSearch Security on the OpenSearch distribution)
  • Use role-based access control (RBAC) — define roles with least-privilege principles
  • Encrypt node-to-node communication with TLS certificates
  • Restrict the REST API (port 9200) and transport (port 9300) to internal networks; never expose them publicly
  • Validate input to prevent injection via query strings and aggregations
  • Audit access logs — log all write operations to sensitive indices
  • Rotate credentials regularly; use a secrets manager for API keys
  • Configure field-level security if certain fields must be hidden from some roles
  • Enable audit logging to track unauthorized access attempts
# Example: Create a read-only role for an index
POST /_security/role/read_only_blogs
{
  "indices": [
    {
      "names": ["blogs-*"],
      "privileges": ["read"]
    }
  ]
}

Common Pitfalls / Anti-Patterns

Over-Sharding

Creating too many shards wastes memory and increases cluster metadata overhead. Each shard maintains its own segment files, segment metadata, and caches. A rule of thumb: keep shard size between 10GB and 50GB. If you have 1TB of data, twenty-five 40GB shards beats both a thousand 1GB shards and five 200GB shards.

Fix: Plan shard count based on expected data volume. Use index templates with ILM to auto-delete old indices.

Using Query Context for Filters

Queries in must context score every document, which is expensive when you only need filtering. Move static filters to filter context.

Before (slow):

{
  "query": {
    "bool": {
      "must": [
        { "match": { "content": "search term" } },
        { "term": { "status": "published" } }
      ]
    }
  }
}

After (faster):

{
  "query": {
    "bool": {
      "must": [{ "match": { "content": "search term" } }],
      "filter": [{ "term": { "status": "published" } }]
    }
  }
}
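When queries are built in application code, this refactor can even be automated. A hypothetical Python helper (assuming queries are plain dicts of bool clauses) that moves term and range clauses out of must:

```python
def move_static_filters(bool_query, filter_types=("term", "range")):
    """Hypothetical helper: move non-scoring clauses into filter context.

    Assumes a plain dict holding the bool query's clause lists.
    """
    filters = bool_query.setdefault("filter", [])
    scored = []
    for clause in bool_query.get("must", []):
        # term/range clauses don't need relevance scores; filter them instead.
        if any(t in clause for t in filter_types):
            filters.append(clause)
        else:
            scored.append(clause)
    bool_query["must"] = scored
    return bool_query

query = {"must": [{"match": {"content": "search term"}},
                  {"term": {"status": "published"}}]}
print(move_static_filters(query))
```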

Ignoring Refresh Interval

New documents are not searchable until the next refresh (default 1 second). For bulk indexing, this is fine, but for near-real-time requirements, understand the tradeoff. Setting refresh_interval: -1 disables auto-refresh and dramatically speeds up bulk ingestion.

Fix: Adjust refresh_interval based on write vs. read latency requirements.

Not Using Aliases for Zero-Downtime Reindexing

Reindexing directly into an existing index causes downtime and potential data inconsistency.

Fix: Use index aliases with swap operations:

POST /_aliases
{
  "actions": [
    { "remove": { "index": "my-index-v1", "alias": "my-index" } },
    { "add": { "index": "my-index-v2", "alias": "my-index" } }
  ]
}

Quick Recap

Key Bullets

  • Elasticsearch stores data in inverted indexes, enabling millisecond full-text queries
  • Primary shard count is fixed at index creation — plan wisely (10GB-50GB per shard target)
  • Use filter context for non-scoring queries to leverage caching
  • Replicas provide fault tolerance and read scaling; they do not help write throughput
  • The Query DSL separates leaf queries (match, term) from compound queries (bool)
  • Index aliases enable zero-downtime reindexing and blue-green deployments
  • BM25 is the default similarity algorithm; it handles term frequency saturation

Copy/Paste Checklist

# Check cluster health
GET /_cluster/health

# View shard allocation
GET /_cat/shards?v

# Monitor search totals and latency
GET /_nodes/stats/indices/search?filter_path=nodes.*.indices.search

# Force merge to reduce segments
POST /my-index/_forcemerge?max_num_segments=1

# Update replica count
PUT /my-index/_settings
{
  "number_of_replicas": 2
}

# Check failed indexing operations
GET /_nodes/stats/indices/indexing?filter_path=nodes.*.indices.indexing.index_failed

# Set ILM policy
PUT /_ilm/policy/my-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "7d" }
        }
      }
    }
  }
}

Connecting the Dots

Elasticsearch rarely stands alone. It often sits alongside message queues for indexing streams of data, databases for persistent storage, and caching layers for hot data. If you are building search infrastructure, understanding event-driven architecture helps when designing data pipelines that feed Elasticsearch.

For high-read scenarios, consider adding a caching layer. Our post on distributed caching covers strategies for caching search results and filter facets.

Conclusion

The inverted index is what makes Elasticsearch fast. Queries that would scan a relational database for minutes return in milliseconds. Sharding and replicas give you horizontal scalability and fault tolerance. The Query DSL handles everything from simple keyword matching to complex boolean logic.

Start with an index design that matches your access patterns. Pick analyzers appropriate for your data. Use the Query DSL’s filter context wherever scoring is unnecessary. These fundamentals will get you productive with Elasticsearch quickly.

If you are evaluating search solutions, also look at Apache Solr, which offers similar capabilities with a different operational model. For scaling strategies beyond a single cluster, see our search scaling deep dive.


Related Posts

Apache Solr: Enterprise Search Platform

Explore Apache Solr's powerful search capabilities including faceted search, relevance tuning, indexing strategies, and how it compares to Elasticsearch.

#search #solr #apache

Search Scaling: Sharding, Routing, and Horizontal Growth

Learn how to scale search systems horizontally with index sharding strategies, query routing, replication patterns, and cluster management techniques.

#search #scaling #elasticsearch

Data Migration: Strategies and Patterns for Moving Data

Learn proven strategies for migrating data between systems with minimal downtime. Covers bulk migration, CDC patterns, validation, and rollback.

#data-engineering #data-migration #cdc