HashSet in Java
Learn HashSet: unique element collections backed by HashMap, contains and add operations, and the internal ELEMENT object pattern.
HashSet in Java
HashSet is Java’s implementation of a set — a collection that stores only unique elements. Under the hood, it is a thin wrapper around a HashMap that uses a shared sentinel object as the value for every key (the element itself). This design gives HashSet all the O(1) performance characteristics of HashMap.
Introduction
HashSet solves a simple problem: you need a collection of unique elements with fast membership checks. Behind the scenes, it is a clever hack — a HashMap where the element becomes the key and a shared dummy object (PRESENT) becomes the value. Since map keys are unique by definition, you get deduplication for free. The O(1) add() and contains() operations come directly from HashMap’s implementation.
The implication is important: HashSet inherits every behavioral quirk of HashMap. The element’s hashCode() must be stable after insertion or the element becomes unfindable. The set is not thread-safe. Iteration order is undefined — HashSet makes no promises about the sequence in which elements are visited, unlike LinkedHashSet which preserves insertion order. And because HashSet is just a thin wrapper, mutating an element in place can silently break the set’s internal consistency without any warning.
This post covers how HashSet delegates to HashMap, the PRESENT sentinel pattern that makes this work, the scenarios where HashSet is the right tool (deduplication, fast lookups, set algebra operations), and when to choose TreeSet (sorted) or LinkedHashSet (insertion order) instead.
When to Use HashSet
Use HashSet when:
- You need to store a collection of unique elements
- You need O(1) membership checks (
contains()) - You do not need to store duplicates or maintain order
- You want fast add, remove, and clear operations
Do not use HashSet when:
- You need sorted elements (use
TreeSet) - You need insertion-order iteration (use
LinkedHashSet) - You need to maintain counts for duplicate elements (use
Map<E, Integer>orMultiset) - You need thread-safe unique collections (use
ConcurrentHashSetorCollections.synchronizedSet())
Internal Structure
HashSet internally uses a HashMap<E, Object> where the element is stored as the key and a shared PRESENT object is used as the value:
private transient HashMap<E, Object> map;
private static final Object PRESENT = new Object();
// add() calls map.put(element, PRESENT) — PRESENT is never used
public boolean add(E e) {
return map.put(e, PRESENT) == null; // null means key was absent (new element)
}
Mermaid Diagram: HashSet Internals
graph TD
A["HashSet<E>"] --> B["HashMap<E, Object>"]
B --> C["map.put(element, PRESENT)"]
C --> D["PRESENT = new Object()"]
D --> E["Used as dummy value for all entries"]
Failure Scenarios
| Scenario | Cause | Result |
|---|---|---|
NullPointerException | Adding null when set does not permit nulls | Runtime crash |
ClassCastException | Storing incompatible types | Runtime crash on retrieval |
ConcurrentModificationException | Modifying during iteration | Runtime crash |
Duplicate add() returns | Adding an element already present | Returns false, no exception |
Trade-Off Table
| Aspect | HashSet | TreeSet | LinkedHashSet |
|---|---|---|---|
| Ordering | None | Sorted (Red-Black tree) | Insertion order |
contains() / add() | O(1) average | O(log n) | O(1) average |
| Memory overhead | Lowest | Tree node overhead | Linked list overhead |
null element | One allowed | Not allowed | One allowed |
| Sorted operations | N/A | first(), last(), subSet() | N/A |
Code Snippets
Basic Operations
Set<String> languages = new HashSet<>();
languages.add("Java");
languages.add("Python");
languages.add("Java"); // Duplicate — ignored, returns false
System.out.println(languages.size()); // 2
System.out.println(languages.contains("Java")); // true
languages.remove("Python");
Set Algebra Operations
Set<String> a = Set.of("Java", "Python", "Go");
Set<String> b = Set.of("Python", "Rust", "Go");
Set<String> intersection = new HashSet<>(a);
intersection.retainAll(b); // ["Python", "Go"]
Set<String> union = new HashSet<>(a);
union.addAll(b); // ["Java", "Python", "Go", "Rust"]
Set<String> difference = new HashSet<>(a);
difference.removeAll(b); // ["Java"]
Converting Between List and Set
List<String> withDuplicates = List.of("a", "b", "a", "c", "b");
// Deduplicate
Set<String> unique = new HashSet<>(withDuplicates);
// Back to list preserving order
List<String> deduped = new ArrayList<>(unique);
Observability Checklist
- Monitor set size distributions — detect unbounded growth
- Track
contains()call frequency in hot paths — these are O(1) but often assumed O(n) - Profile hash function quality if set operations appear slow despite correct implementation
- Log
ConcurrentModificationExceptionto identify iteration conflicts - Use
Set.of()for small, immutable sets — avoids defensive copying overhead
Security Notes
HashSetis not thread-safe; concurrent modification can cause data corruption- Avoid storing sensitive elements (passwords, tokens) in sets that may be serialized or logged
- Consider
Collections.unmodifiableSet()when returning sets to external callers - When using custom objects as set elements, ensure
hashCode()andequals()are consistent and do not leak sensitive state
Common Pitfalls / Anti-Patterns
- Mutable elements as set members: If an element’s
hashCode()changes after insertion, the element becomes unretrievable and orphaned in the set - Confusing
add()return value:add()returnstrueif the element was added,falseif it was already present — this is not an error condition nullelement: MostHashSetimplementations allow onenullelement; attempting to add a second throwsNullPointerException- Assuming set preserves insertion order: It does not; use
LinkedHashSetif order matters - Using
contains()on collections with customequals(): The operation is O(n) forList(linear scan) but O(1) forHashSet— choose wisely
Quick Recap
HashSetwraps aHashMapinternally, using a dummyPRESENTobject as all values- Element uniqueness is enforced via the map’s key-based deduplication
- O(1) average
add(),contains(), andremove()operations - No ordering guarantees; use
TreeSetfor sorted orLinkedHashSetfor insertion-order - Thread-unsafe; synchronize externally or use concurrent alternatives
Interview Questions
::: info
The following .qa-card components contain typical interview questions you may encounter. Reviewing these will help reinforce key concepts.
:::
Model Answer: "`HashSet` internally uses a `HashMap
Model Answer: "**O(1) average-case**, O(n) worst-case. The method computes `hashCode()` of the element, finds the bucket, then traverses the chain (or tree) calling `equals()` on each candidate. With a good hash function and reasonable load factor, collisions are rare, making lookups constant time."
Model Answer: "Yes, most implementations allow exactly **one `null` element**. Adding a second `null` throws a `NullPointerException`. However, attempting to add `null` to a `Collections.checkedSet()` or certain synchronized wrappers may throw an exception even for the first `null`."
Model Answer: "`HashSet` is implemented as a **thin wrapper around `HashMap`**. The set's elements become map keys, and a constant dummy object (`PRESENT`) is used as all map values. All set operations (`add()`, `remove()`, `contains()`) delegate directly to the corresponding map operations. This is a classic GoF **Adapter Pattern** application."
Model Answer: "You can iterate via `iterator()`, `for-each` loop (which uses the iterator under the hood), or `forEach()`. All three are equivalent in performance — O(n) total, O(1) per element. The iteration visits every bucket, so even empty buckets contribute to iteration time. There is no guaranteed order unless using `LinkedHashSet`."
Model Answer: "`add()` calls `map.put(element, PRESENT)`. `HashMap.put()` returns the **previous value** associated with the key, or `null` if the key was absent. Since `PRESENT` is never `null`, `add()` returns `true` when `put()` returns `null` (new element) and `false` when `put()` returns `PRESENT` (duplicate, already present)."
Model Answer: "Yes. HashMap uses both `hashCode()` (to find the bucket) and `equals()` (to find the element within the bucket). If two objects have the same hash code but are not `equals()` to each other, they are stored in the same bucket as separate entries — typically as a linked list or tree structure. Having many such objects degrades performance to O(n) for that bucket."
Model Answer: "Default initial capacity is **16** and default load factor is **0.75**. When the number of entries exceeds `capacity * loadFactor`, the map doubles in size and rehashes all entries. You can override defaults: `new HashSet<>(initialCapacity, loadFactor)`. Setting initial capacity avoids intermediate resizes for known-size sets."
Model Answer: "Use `Set.of("a", "b", "c")` (Java 9+) which creates an immutable set with O(1) operations. Alternatively, `Collections.unmodifiableSet(set)` wraps an existing set. Both prevent modifications but with different performance characteristics — `Set.of()` is typically more efficient for small, fixed sets."
Model Answer: "`LinkedHashSet` extends `HashSet` but additionally maintains a **doubly-linked list** across entries in insertion order. This provides O(1) operations (inherited from HashSet) plus predictable iteration order (insertion order). The tradeoff is slightly higher memory overhead (~8 bytes per entry for the linked list pointers)."
Model Answer: "HashSet uses `HashMap` as its backing store, where elements become keys. `HashMap` keys are unique by definition — when you call `put(key, value)` with a key that already exists, it overwrites and returns the old value. `HashSet.add()` checks this return value and returns `false` for duplicates, silently ignoring the second insert."
Model Answer: "`IdentityHashMap` uses reference equality (`==`) instead of `equals()` for key comparison. This means two distinct objects that are content-equal can both exist as separate keys. `IdentityHashMap` is rarely used — primarily for implementing object identity semantics like deep clones or custom memoization."
Model Answer: "If a mutable object's `hashCode()` changes after insertion into a HashSet, the object becomes effectively **unretrievable**. The set stored it in the bucket corresponding to the original hash code, but now computes a different bucket index. The object is still in the set but cannot be found through normal `contains()` or `get()` operations."
Model Answer: "`HashSet.contains()` is O(1) average-case (hash-based lookup). `List.contains()` is O(n) (linear scan). For frequent membership checks on large collections, `HashSet` is dramatically faster. Use `HashSet` when you need fast lookups without order requirements."
Model Answer: "`HashSet` itself does not support size-based eviction. Use `LinkedHashSet` with `removeEldestEntry()` for LRU cache implementation, or use `WeakHashMap`/`WeakHashSet` for reference-based cleanup. For bounded caches with eviction policies, consider `com.google.common.cache.Cache` from Guava."
Model Answer: "The second `add()` call returns `false` (element was already present) and the set remains unchanged. No exception is thrown. The element's position in the internal HashMap is not modified — only the mapping's value (PRESENT) is re-set to itself."
Model Answer: "`clear()` calls `HashMap.clear()` which iterates through all buckets and sets each entry to `null`, resetting `size` to 0. It does not shrink the bucket array (capacity stays the same). For memory optimization after bulk removal, create a new smaller HashSet instead."
Model Answer: "`HashSet.of("a", "b", "c")` (Java 9+) creates an **immutable** set — modifications throw `UnsupportedOperationException`. It uses less memory and is faster for small fixed sets. `new HashSet<>()` creates a mutable set suitable for dynamic collections. Choose based on whether the set needs to change after creation."
Model Answer: "HashSet uses `hashCode()` to find the bucket and `equals()` to identify the correct entry within the bucket. If two objects are `equals()` but have different `hashCode()` values, they would be stored in different buckets and treated as separate elements — violating the set contract. Both methods must be consistent for proper HashSet behavior."
Model Answer: "`LinkedHashSet` extends `HashSet` and adds a doubly-linked list that maintains **insertion order** iteration. `HashSet` has no ordering guarantee. `LinkedHashSet` provides O(1) operations with predictable iteration, at the cost of ~8 bytes extra per entry for the linked list pointers."
Further Reading
- Oracle HashSet Documentation — Official API specification
- Baeldung: HashSet Internals — How HashSet uses HashMap internally and performance characteristics
- HashSet vs TreeSet vs LinkedHashSet — Comparison of Set implementations and when to use each
- Understanding HashSet Performance — Internal workings and time complexity analysis
Conclusion
HashSet provides the simplest path to unique-element storage with O(1) membership checks. It is backed by a HashMap, using the element as the key and a shared dummy object as the value — all the O(1) behavior comes from that underlying map. The main constraint is no ordering guarantees and no duplicates.
The most common mistake is using mutable objects as set elements, which can silently break retrieval when hashCode() changes after insertion. Stick to immutable keys for any set that will be queried repeatedly. HashSet shines when deduplication, fast contains(), and raw uniqueness are what you need.
HashSet pairs naturally with HashMap — understanding one makes the other obvious since the mechanics are nearly identical. For cases requiring sorted unique elements, TreeSet offers O(log n) operations with full ordering at the cost of some performance.
Category
Related Posts
Abstract Classes in Java
Learn about partially implemented classes that define contracts for subclasses using abstract methods and concrete implementations.
Arithmetic Operators in Java
Master Java arithmetic operators: addition, subtraction, multiplication, division, and modulo with integer division gotchas and operator precedence explained.
Array Basics in Java
Learn Java array fundamentals: declaration, initialization, element access, and the length property explained simply.