HashMap in Java
Understand HashMap: key-value storage with O(1) average lookups, hash collisions, rehashing, and performance characteristics.
HashMap in Java
HashMap is Java’s primary key-value mapping implementation. It stores entries in an internal array of buckets, using each key’s hashCode() to determine the bucket and equals() to distinguish between keys in the same bucket. This combination provides O(1) average-case put and get operations.
Introduction
HashMap is the go-to data structure when you need to store and retrieve values by a key — fast. Under the hood, it turns each key into a bucket index using hashCode(), then uses equals() to tell apart the different keys that land in the same bucket. Done correctly, this gives you O(1) average-case insertion and lookup, regardless of how many entries you store. Done wrong — a broken hashCode() or a mutable key — and you get O(n) performance that degrades silently as the map grows.
The critical trap with HashMap is treating it as a simple dictionary when it is actually a carefully tuned data structure with real constraints. Keys must not change their hashCode() after insertion (or the entry becomes permanently unfindable). The map is not thread-safe — concurrent modification from multiple threads can corrupt internal state or throw ConcurrentModificationException. And because the backing array resizes when the load factor threshold is crossed, a burst of insertions can trigger an O(n) rehashing operation at the worst possible moment.
This post covers HashMap internals — bucket arrays, collision handling via chaining and treeification, the hash() spreading function — along with the failure scenarios that cause real production incidents: hash flooding attacks, unbounded growth, and the subtle difference between get() returning null because the key is absent versus because the value was null.
When to Use HashMap
Use HashMap when:
- You need to store and retrieve values by a unique key
- O(1) lookup performance is critical
- Keys are not ordered (if ordering matters, use
TreeMaporLinkedHashMap) - You need a mutable map; for immutable snapshots, consider
Map.of()orCollections.unmodifiableMap()
Do not use HashMap when:
- You need sorted or insertion-order iteration (use
TreeMaporLinkedHashMap) - You need thread-safe operations (use
ConcurrentHashMap) - Keys may be
nullin a concurrent context (singlenullkey is allowed but not thread-safe) - You need primitive-key maps without boxing (use Trove or other specialized libraries)
Internal Structure
HashMap uses an array of Node<K,V> buckets. Each Node stores the key, value, hash, and a pointer to the next node in the bucket (for collision handling via chaining):
transient Node<K,V>[] table;
transient int size;
// When a bucket has > 8 entries, it converts to a TreeNode (Red-Black tree)
static final int TREEIFY_THRESHOLD = 8;
Mermaid Diagram: HashMap Structure
graph TD
A["HashMap<K,V>"] --> B["table: Node[16]"]
B --> C["bucket[0] → Node<K,V> → null"]
B --> D["bucket[1] → Node → Node → null"]
B --> E["bucket[2] null"]
B --> F["bucket[3] → TreeNode (8+ entries)"]
D --> D1["hash: 17<br/>key<br/>value<br/>next: Node"]
D1 --> D2["hash: 17<br/>key<br/>value<br/>next: null"]
F --> F1["TreeNode<br/>Red-Black tree"]
Hash Collisions
When two different keys produce the same hash code (or their hashes map to the same bucket index), a collision occurs. HashMap handles this via chaining — multiple nodes in the same bucket form a linked list (or tree if the chain gets long).
// Simplified collision handling
if (node.next == null) {
node.next = new Node(hash, key, value, null);
} else {
// Treeify if chain exceeds TREEIFY_THRESHOLD
}
Failure Scenarios
| Scenario | Cause | Result |
|---|---|---|
NullPointerException | null key and key-based operations | Runtime crash |
ConcurrentModificationException | Modifying map while iterating | Runtime crash |
| Performance degradation | Many hash collisions (poor hashCode) | O(n) instead of O(1) per operation |
OutOfMemoryError | Unbounded growth with collisions | HashMap never shrinks |
Trade-Off Table
| Aspect | HashMap | TreeMap | LinkedHashMap |
|---|---|---|---|
| Ordering | None | Sorted by key (Red-Black) | Insertion or access order |
| Get/Put complexity | O(1) avg, O(n) worst | O(log n) | O(1) avg |
null key allowed | Yes (one) | No | Yes (one) |
| Thread safety | None | None | None |
| Memory overhead | Lower | Higher (tree structure) | Higher (linked list) |
Code Snippets
Basic Operations
Map<String, Integer> ages = new HashMap<>();
ages.put("Alice", 30);
ages.put("Bob", 25);
System.out.println(ages.get("Alice")); // 30
System.out.println(ages.getOrDefault("Eve", 0)); // 0
ages.remove("Bob");
System.out.println(ages.containsKey("Bob")); // false
System.out.println(ages.size()); // 1
Custom Key with Proper HashCode
class Employee {
private final int id;
private final String name;
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Employee that = (Employee) o;
return id == that.id;
}
@Override
public int hashCode() {
return Objects.hash(id); // Use only immutable field(s)
}
}
Iterating
for (Map.Entry<String, Integer> entry : ages.entrySet()) {
System.out.println(entry.getKey() + " -> " + entry.getValue());
}
// KeySet iteration
for (String key : ages.keySet()) { }
// Value iteration
for (Integer val : ages.values()) { }
Observability Checklist
- Monitor
hashCode()distribution — poor hash functions cause bucket clustering - Track collision counts via
Map.Entrytraversal on keys with similar hashes - Log
size()over time to detect unbounded map growth - Profile for
ConcurrentModificationExceptionin high-write scenarios - Monitor
TREEIFY_THRESHOLDconversions — tree nodes are slower than linked-list nodes
Security Notes
HashMapis not thread-safe; useConcurrentHashMapfor concurrent access- When using user-supplied keys, a malicious
hashCode()implementation could trigger DoS attacks (hash flooding) - Avoid serializing maps containing sensitive data; use encrypted serialization or secure alternatives
Map.of()andCollections.unmodifiableMap()create immutable views that prevent accidental mutation
Common Pitfalls / Anti-Patterns
- Mutable keys: If a key’s
hashCode()changes after insertion, the entry becomes unretrievable — the map cannot find it - Poor
hashCode(): Returning a constant (e.g.,return 42) fromhashCode()degrades all operations to O(n) - Confusing
put()return value:put()returns the previous value for the key, ornullif none existed — this is indistinguishable from an actualnullvalue; useputIfAbsent()when distinction matters nullkey: Only onenullkey is allowed; subsequentput(null, value)overwrites the first- Ignoring rehashing: Resizing doubles the bucket count and rehashes every entry — O(n) operation; avoid by estimating capacity upfront
Quick Recap
HashMapprovides O(1) average-case put/get by hashing keys to bucket indices- Collisions are handled via chaining (linked list or tree for long chains)
- Keys must have stable
hashCode()and correctequals()implementations nullkey andnullvalues are supported (singlenullkey)- Thread-unsafe; use
ConcurrentHashMaporCollections.synchronizedMap()for thread safety
Interview Questions
::: info
The following .qa-card components contain typical interview questions you may encounter. Reviewing these will help reinforce key concepts.
:::
Model Answer: "A hash collision occurs when two different keys produce the same hash code or map to the same bucket index. `HashMap` handles this via **chaining** — multiple entries in the same bucket form a linked list. When the chain length exceeds 8, it converts to a **Red-Black tree** for O(log n) worst-case lookups instead of O(n)."
Model Answer: "The bucket index is computed as `hash(key) & (capacity - 1)`. The key's `hashCode()` is passed through a mixing function (`hash()` in the source) to distribute bits, then bitwise AND with `capacity - 1` limits the result to a valid bucket index. This is equivalent to `hash % capacity` but faster."
Model Answer: "When the number of entries exceeds `capacity * loadFactor` (default: 16 \* 0.75 = 12), `HashMap` **doubles** the bucket array and **rehashes** every entry into the new array. This is an O(n) operation. You can avoid this cost by setting initial capacity upfront: `new HashMap<>(initialCapacity)`."
Model Answer: "`HashMap` uses `hashCode()` to find the bucket and `equals()` to identify the correct entry within that bucket. If you override only one, the contract breaks: two objects that are `equals()` must have the same `hashCode()`, but objects with the same `hashCode()` do not need to be `equals()`. Overriding both ensures consistent behavior. If only `equals()` is overridden, the object may be stored in the wrong bucket; if only `hashCode()` is overridden, it may not be found even when it exists."
Model Answer: "`ConcurrentHashMap` is **thread-safe** and supports concurrent reads and writes without locking the entire map. It divides the map into segments (or in Java 8+, uses fine-grained bucket locking). `HashMap` is unsynchronized — concurrent modification from multiple threads can cause corruption, infinite loops, or `ConcurrentModificationException`."
Model Answer: "The `hash()` method (also called "spread" or "shuffle") recomputes the hash code to reduce correlation between bits. It applies `hashCode() ^ (hashCode() >>> 16)` — XORing the upper 16 bits with the lower 16 bits. This ensures that bit patterns from poor hash functions (like sequential integers) are better distributed across bucket indices."
Model Answer: "When a single bucket accumulates **8 or more entries** (`TREEIFY_THRESHOLD = 8`), it converts from a linked list to a **Red-Black tree** for O(log n) worst-case lookups instead of O(n). It converts back to a list when the bucket shrinks below **6** (`UNTREEIFY_THRESHOLD = 6`). The intermediate values (6 and 7) prevent oscillation between structures."
Model Answer: "`get(key)` returns the value if the key exists, or `null` if the key is absent or maps to `null`. `getOrDefault(key, defaultValue)` returns the value if present, or the provided `defaultValue` if absent. Use `getOrDefault()` when `null` is a valid stored value and you need to distinguish "not found" from "value is null"."
Model Answer: "Hash flooding exploits poor hash functions by feeding inputs that all hash to the same bucket, degrading performance to O(n). Java 8+ mitigates this with treeification (converting long chains to Red-Black trees). Additionally, `Map.of()`, `Set.of()`, and `Collections` utility methods use randomized hashing to resist intentional attacks."
Model Answer: "If a key's fields used in `hashCode()` or `equals()` change after insertion, the key's hash code changes, making it effectively **unretrievable** — it stays in its original bucket but the map cannot find it. Using immutable types (String, Integer, etc.) or treating custom keys as immutable ensures hash code stability throughout the key's lifetime in the map."
Model Answer: "`put()` always overwrites the existing value and returns the previous value (or null if none). `putIfAbsent()` only inserts if the key is **not already present** — it returns the existing value if found (including when the existing value is `null`), and returns `null` only when the key was actually added."
Model Answer: "`Hashtable` is the legacy synchronized implementation (Java 1.0). It does not allow `null` keys or values, and all methods are synchronized. `HashMap` is newer, unsynchronized, and allows one `null` key and multiple `null` values. For concurrent access, use `ConcurrentHashMap` instead of `Hashtable`."
Model Answer: "When the number of entries exceeds `capacity * loadFactor` (default threshold: 16 * 0.75 = 12), `HashMap` doubles the bucket array and rehashes every entry. This is an O(n) operation done infrequently but can cause latency spikes under heavy write loads."
Model Answer: "The `hash()` method (also called "spread" or "shuffle") recomputes the hash code to reduce correlation between bits. It applies `hashCode() ^ (hashCode() >>> 16)` — XORing the upper 16 bits with the lower 16 bits. This ensures that bit patterns from poor hash functions (like sequential integers) are better distributed across bucket indices."
Model Answer: "Initial capacity sets the starting bucket array size (default 16). Load factor (default 0.75) determines when to resize. For known size requirements, set initial capacity to avoid intermediate resizes: `new HashMap<>(expectedSize / 0.75 + 1)`. This prevents O(n) rehashing during growth."
Model Answer: "Yes, different keys can have the same hash code (collision). HashMap uses both `hashCode()` to find the bucket and `equals()` to find the correct entry within the bucket. Two keys with the same hash but different `equals()` results are stored separately in the same bucket as distinct entries."
Model Answer: "`computeIfAbsent(key, mappingFunction)` returns the existing value if present. If absent, it computes a new value using the mapping function, stores it, and returns it. This is atomic and useful for lazy initialization and cache patterns. The mapping function is invoked at most once per key."
Model Answer: "A collision occurs when two different keys hash to the same bucket. HashMap handles this via chaining (linked list or tree). With many collisions in one bucket, lookups degrade from O(1) to O(n) for linked lists or O(log n) for trees. Java 8+ converts chains of 8+ nodes to a Red-Black tree for better worst-case performance."
Model Answer: "Linked lists are efficient for few collisions but degrade to O(n) with many. When a bucket has 8+ entries (TREEIFY_THRESHOLD), Java converts to a Red-Black tree for O(log n) worst-case. This hybrid approach balances memory overhead (trees need more nodes) against lookup performance under heavy collision scenarios."
Model Answer: "`clear()` removes all entries (sets size to 0) but keeps the bucket array. `remove(key)` removes a single entry by key and decrements size. After `clear()`, the capacity remains the same; after many `remove()` calls, capacity is unchanged (HashMap never shrinks automatically)."
Further Reading
- Oracle HashMap Documentation — Official API specification
- Baeldung: HashMap Internals — Deep dive into hash function, collisions, and treeification
- How HashMap Works in Java — Step-by-step explanation of put/get operations and bucket handling
- Java HashMap: Performance Optimization — Capacity, load factor, and sizing strategies
Conclusion
HashMap is the go-to key-value store in Java, delivering O(1) average-case get and put through hash-based bucketing. The critical contract is that keys provide stable hashCode() and correct equals() implementations — violating either degrades performance unpredictably.
HashMap works best when keys are simple, immutable types (String, Integer, etc.). When using custom objects as keys, treat hashCode() and equals() as a permanent commitment tied to the object’s identity. HashMap is unordered; if insertion order or sorted-key iteration matters, look at TreeMap or LinkedHashMap.
HashSet is the set equivalent of HashMap — it uses the same internal mechanics with a dummy value. For unique-element collections with O(1) lookups, HashSet is the direct counterpart to explore after HashMap.
Category
Related Posts
Abstract Classes in Java
Learn about partially implemented classes that define contracts for subclasses using abstract methods and concrete implementations.
Arithmetic Operators in Java
Master Java arithmetic operators: addition, subtraction, multiplication, division, and modulo with integer division gotchas and operator precedence explained.
Array Basics in Java
Learn Java array fundamentals: declaration, initialization, element access, and the length property explained simply.