The String Class

Master Java's String class: immutability, concatenation, interning, and key methods like substring, split, and format for text processing.

Reading time: 20 min · Author: Geek Workbench

The String class is one of the most fundamental and frequently used types in Java. Representing sequences of Unicode characters, Strings are immutable objects that provide rich functionality for text manipulation, searching, and formatting.

Introduction

The String class is the most frequently used reference type in Java — virtually every program works with text, and Strings appear in everything from configuration to user input to network data. Strings in Java are immutable objects: once created, the character sequence cannot change. This immutability is a deliberate design decision that enables thread safety, secure hashing, and the String Pool memory optimization — but it also means that naive string concatenation in loops creates many intermediate objects, degrading performance.

Understanding String interning — the String Pool mechanism where identical literals share memory — is essential for writing memory-efficient code. Knowing when to use StringBuilder instead of + concatenation prevents O(n^2) performance pitfalls in loops. The wide range of built-in methods (indexOf, substring, split, replace, format) covers most text manipulation needs without external libraries.

This post covers String immutability and its implications, the String Pool and when to use intern(), the critical performance difference between concatenation and StringBuilder, essential method patterns with production code examples, and security considerations like using char[] for passwords instead of Strings. You will also learn how to handle Unicode correctly and avoid common pitfalls with split() and substring().

When to Use / When Not to Use

Use String when:

  • Working with text data of any kind
  • You need immutable text storage (keys, constants, etc.)
  • Building up strings through concatenation in non-critical paths
  • Pattern matching with split() or regex

Consider alternatives when:

  • Building strings in loops (use StringBuilder)
  • Frequent string modification (StringBuilder or StringBuffer)
  • Case-insensitive or locale-sensitive comparisons (use equalsIgnoreCase() or a Collator rather than equals())
  • Large text manipulation (consider CharSequence or specialized libraries)

String Memory Architecture

graph TD
    A["String Literal Pool<br/>(on the heap since Java 7)"] --> B["\"hello\"<br/>Reference: s1"]
    A --> C["\"hello\"<br/>Reference: s2<br/>Same object as s1"]

    D["Heap Memory"] --> E["new String(\"hello\")<br/>New object, different from pool"]

    F["Reference Variables"] --> A
    F --> D

    style A stroke:#00ff00,color:#00ff00
    style D stroke:#ff00ff,color:#ff00ff

Production Failure Scenarios

| Scenario | Cause | Mitigation |
| --- | --- | --- |
| String concatenation in loops | O(n²) performance with + operator | Use StringBuilder |
| Memory from intern() misuse | Over-interning grows the string pool (PermGen before Java 7, the heap since) | Intern only when needed |
| Case-sensitive comparison bugs | "ABC".equals("abc") returns false | Use equalsIgnoreCase(), or a Collator for locale-aware text |
| Split creating empty tokens | split("X") creates empty strings for leading or adjacent delimiters and drops trailing ones | Use limit parameter or trim |
| Substring memory leak | JDKs before 7u6 kept a reference to the parent char[] | Use substring with care in old versions |

// Performance pitfall: concatenation in a loop
String joined = "";
for (String item : items) {
    joined += item + ",";  // Copies everything built so far on every iteration
}
// Better: use StringBuilder
StringBuilder sb = new StringBuilder();
for (String item : items) {
    sb.append(item).append(",");
}
String result = sb.toString();

// Split gotcha: trailing empty strings are dropped by default
String input = "a,b,c,";
String[] parts = input.split(",");  // [a, b, c] - no trailing empty
// Empty strings in the middle are kept
String[] parts2 = "a,,c".split(",");  // [a, "", c]

// Pass a negative limit to keep trailing empty strings
String[] limited = input.split(",", -1);  // [a, b, c, ""]

// Substring before Java 7u6: memory leak (shared char[])
String big = "large string...";
String small = big.substring(0, 5); // In old JDKs, 'small' kept the whole backing array of 'big' alive
// Since 7u6, substring copies only the requested characters

Trade-off Table

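| Operation | Method | Performance | Notes |
| --- | --- | --- | --- |
| Concatenation | + | O(n²) in loops | Compiler uses StringBuilder (or invokedynamic since Java 9) within one expression, not across loop iterations |
| Building | StringBuilder.append() | O(n) | Preferred for multiple appends |
| Thread-safe building | StringBuffer | Slower than StringBuilder | Use only when needed |
| Search | indexOf() | O(n) | Linear scan |
| Immutable key | String in HashMap | O(1) average | Good for caching |
| Pattern split | split() or regex | O(n) + regex overhead | Consider an indexOf() loop for simple delimiters; StringTokenizer is a legacy class |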

Implementation Snippets

String Creation and Interning

public class StringCreation {
    public static void main(String[] args) {
        // String literals go to the pool (interned)
        String a = "hello";
        String b = "hello";
        System.out.println(a == b);  // true - same pool reference

        // new creates heap object
        String c = new String("hello");
        System.out.println(a == c);  // false - different objects

        // Explicit interning
        String d = c.intern();  // Returns pool reference
        System.out.println(a == d);  // true

        // String constants are compile-time merged
        String e = "hel" + "lo";  // Compiler optimizes to "hello"
        System.out.println(a == e);  // true
    }
}

Essential String Methods

public class StringMethods {
    public static void main(String[] args) {
        String text = "Hello, World!";

        // Searching
        int idx = text.indexOf("World");     // 7
        int last = text.lastIndexOf("o");    // 8
        boolean has = text.contains("ell"); // true
        boolean starts = text.startsWith("Hell"); // true
        boolean ends = text.endsWith("!");   // true

        // Extraction
        String sub = text.substring(7);      // "World!"
        String sub2 = text.substring(7, 12); // "World"
        char ch = text.charAt(0);           // 'H'
        String trimmed = text.trim();        // Removes leading/trailing whitespace

        // Transformation
        String upper = text.toUpperCase();   // "HELLO, WORLD!"
        String lower = text.toLowerCase();   // "hello, world!"
        String replaced = text.replace("World", "Java"); // "Hello, Java!"
        String[] words = text.split(", ");   // ["Hello", "World!"]

        // Formatting
        String formatted = String.format("Name: %s, Age: %d", "Alice", 30);

        // Null-safe operations
        String nullStr = null;
        String safe = String.valueOf(nullStr); // "null" not NPE
        // Or use Objects.toString()
    }
}

StringBuilder Patterns

import java.util.List;

public class StringBuilderDemo {
    // Efficient string building
    public static String buildCsv(List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) sb.append(",");
            sb.append(escapeCsv(values.get(i)));
        }
        return sb.toString();
    }

    // Reverse string
    public static String reverse(String input) {
        return new StringBuilder(input).reverse().toString();
    }

    // Check palindrome
    public static boolean isPalindrome(String s) {
        String clean = s.replaceAll("[^a-zA-Z0-9]", "").toLowerCase();
        return clean.equals(new StringBuilder(clean).reverse().toString());
    }

    private static String escapeCsv(String value) {
        if (value.contains(",") || value.contains("\"") || value.contains("\n")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }
}

Observability Checklist

  • Monitor string memory usage (String pool size in heap)
  • Track StringBuilder usage vs concatenation in performance profiling
  • Alert on excessive String.split() operations in hot paths
  • Log when interning is called (can indicate memory pressure)
  • Measure substring creation overhead in old JVM versions
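// Observability for string operations
// Assumes SLF4J is on the classpath; the {} placeholders follow its message format
import java.util.regex.Pattern;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class StringMetrics {
    private static final Logger log = LoggerFactory.getLogger(StringMetrics.class);

    public static void trackBuildSize(StringBuilder sb, String context) {
        if (sb.length() > 10000) {
            log.info("Large StringBuilder [{}]: {} chars", context, sb.length());
        }
    }

    public static void trackSplit(String input, String delimiter) {
        // Pattern.quote treats the delimiter literally; split() would otherwise parse it as a regex
        String[] parts = input.split(Pattern.quote(delimiter));
        if (parts.length > 100) {
            // Log the input length rather than the input itself to keep log lines small
            log.warn("Large split result [{} chars]: {} parts", input.length(), parts.length);
        }
    }
}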

Security Pitfalls

  • Password handling: Strings are immutable, so a password held in a String cannot be wiped and stays in memory until it is garbage collected. Use char[] for sensitive data and clear it after use.
  • Log injection: User input written to logs can forge log entries; sanitize newlines and other control characters.
  • String comparison timing attacks: String.equals() is not constant-time; compare secrets with MessageDigest.isEqual().
  • Encoding issues: Always specify the charset when converting bytes to String.
// Security: avoid storing passwords in Strings
import java.util.Arrays;

public class SecurePassword {
    public boolean verify(char[] input, char[] stored) {
        try {
            if (input.length != stored.length) return false;

            // Constant-time comparison: scan every position, never exit early
            int diff = 0;
            for (int i = 0; i < input.length; i++) {
                diff |= input[i] ^ stored[i];
            }
            return diff == 0;
        } finally {
            // Wipe the caller's input on every path; the owner of 'stored'
            // clears it when the reference value is no longer needed
            Arrays.fill(input, '\0');
        }
    }
}

// Security: specify charset to avoid garbling
byte[] bytes = getData();
String safe = new String(bytes, StandardCharsets.UTF_8); // Always specify
String unsafe = new String(bytes); // Default charset: platform-dependent before Java 18, UTF-8 since

Common Pitfalls / Anti-patterns

  1. Using == to compare string values

    // BAD - compares references, not content
    String a = new String("hello");
    String b = new String("hello");
    if (a == b) { } // false
    
    // GOOD - compares values
    if (a.equals(b)) { } // true
  2. Concatenating in loops

    // BAD - creates many intermediate String objects
    String result = "";
    for (String s : list) {
        result += s;
    }
    
    // GOOD - StringBuilder
    StringBuilder sb = new StringBuilder();
    for (String s : list) {
        sb.append(s);
    }
    String result = sb.toString();
  3. Ignoring empty string vs null

    // BAD - NPE if str is null
    boolean empty = str.isEmpty();
    
    // GOOD - handles null
    boolean empty = (str == null) || str.isEmpty();
    // or use Apache Commons StringUtils.isEmpty()
  4. Case conversion without an explicit Locale

    // BAD - toUpperCase() uses the default locale; under Turkish, "title" becomes "TİTLE"
    if (str.toUpperCase().equals("TITLE")) { }
    
    // GOOD - equalsIgnoreCase() is locale-independent
    if (str.equalsIgnoreCase("title")) { }
    // GOOD - or convert with an explicit locale
    if (str.toUpperCase(Locale.ROOT).equals("TITLE")) { }

Quick Recap Checklist

  • String is immutable: methods such as replace() or toUpperCase() never modify the original, they return a new object
  • String literals are interned (stored in String pool for reuse)
  • Use new String() only when you need a distinct heap object
  • intern() adds a string to the pool and returns the pooled reference
  • Use StringBuilder for building strings in loops or multiple operations
  • split() with no limit silently drops trailing empty tokens; pass a negative limit to keep them
  • Use char[] instead of String for passwords (immutability prevents clearing)
  • Always specify charset when creating strings from bytes

Interview Questions

1. Why are Strings immutable in Java?

Model Answer: String immutability provides several benefits: 1) Security: strings are used as class names, file paths, and network URLs, so mutation after validation could corrupt them. 2) Thread safety: immutable strings need no synchronization. 3) String pooling: immutability allows interning and memory sharing without fear of modification. 4) HashMap/HashSet keys: immutability keeps the hash code stable. 5) Performance: the hash code can be cached and String instances can be shared safely. Once a String is created, its character sequence cannot change.

2. What is the difference between String, StringBuilder, and StringBuffer?

Model Answer: String is immutable: every modification creates a new object. StringBuilder is mutable and designed for single-threaded string building; append, insert, and reverse modify the internal buffer in place, which performs better. StringBuffer is the thread-safe counterpart of StringBuilder: all methods are synchronized, making it safe for multi-threaded use but slower. Use StringBuilder for most new code; use StringBuffer only when sharing the buffer across threads.

3. What is String interning and when should you use it?

Model Answer: String interning keeps one canonical copy of each distinct string in a shared pool. Literals are interned automatically; String.intern() does it explicitly. When you call intern(), if an equal string already exists in the pool, the JVM returns that reference; otherwise the string is added and its own reference is returned. Use intern() when you hold many long-lived duplicate strings and want to save memory, or when you need == comparison over a known set of canonical strings. Avoid intern() when the strings are numerous and short-lived (they add pool pressure) or when few of them are duplicates. Since Java 7 the pool lives in the main heap rather than PermGen, regardless of the garbage collector, so interned strings can be collected like other objects. G1's string deduplication is a separate feature that shares backing arrays without interning.

4. What does the split() method return for trailing empty strings?

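Model Answer: String.split() with no limit (equivalent to a limit of 0) discards trailing empty strings. For example, "a,b,c,".split(",") returns ["a", "b", "c"]; the trailing empty string is dropped. To preserve trailing empties, use a negative limit: "a,b,c,".split(",", -1) returns ["a", "b", "c", ""]. A positive limit caps the array at that many elements and leaves the unsplit remainder, including any trailing delimiters, in the last element. A leading empty string produced by a delimiter at the start, and empty strings in the middle, are kept. The default is convenient, but it means empty fields at the end of a line are silently lost unless you ask for them.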
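The three limit modes are easy to confuse, so a small sketch may help. The class and method names here (SplitLimitDemo, show) are illustrative only; the input uses two trailing delimiters so the difference is visible.

```java
import java.util.Arrays;

final class SplitLimitDemo {
    // Renders the tokens with Arrays.toString so empty tokens show up as gaps.
    static String show(String input, int limit) {
        return Arrays.toString(input.split(",", limit));
    }

    public static void main(String[] args) {
        String input = "a,b,c,,";
        System.out.println(show(input, 0));   // [a, b, c]      trailing empties dropped
        System.out.println(show(input, -1));  // [a, b, c, , ]  trailing empties kept
        System.out.println(show(input, 2));   // [a, b,c,,]     at most 2 elements
    }
}
```

When parsing fixed-width records such as CSV rows, the negative limit is usually the safer default because the column count stays stable.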

5. How does substring() work and what was the memory leak in older JDKs?

Model Answer: In modern JDKs (7u6+), substring() copies only the requested characters into a new String, so there is no leak. Before this fix, substring() in JDK 6 and earlier created a new String object that shared the parent String's underlying char array; the new String's offset and length pointed into the parent's array. Holding onto a small substring therefore kept the entire large char array alive in memory, a common cause of OutOfMemoryErrors when processing large strings and keeping small substrings. The usual workaround on those versions was new String(big.substring(0, 5)), which forced a copy. The trade-off in modern JVMs is that substring() costs time and memory proportional to the length of the result instead of being O(1).

6. What is the difference between indexOf() and lastIndexOf()?

Model Answer: indexOf() searches from the beginning of the string toward the end, returning the first index where the substring is found, or -1 if not found. lastIndexOf() searches from the end toward the beginning, returning the last (rightmost) index of the substring. Both have overloads that take a fromIndex start position. Use indexOf() when you want the first occurrence; use lastIndexOf() when you want the last occurrence, such as finding the final path separator in a file path (path.lastIndexOf('/')). Searching for a single character is a linear O(n) scan; searching for a substring of length m is O(n·m) in the worst case, though typically close to linear.

7. How do you efficiently check if a String contains only digits?

Model Answer: Use matches() with the regex "\\d+" for simple cases. In performance-sensitive code, a manual loop avoids recompiling the regex on every call: public static boolean isNumeric(String s) { for (int i = 0; i < s.length(); i++) { if (!Character.isDigit(s.charAt(i))) return false; } return true; }. The loop exits at the first non-digit and is usually noticeably faster for repeated checks; measure in your own workload. Two details matter. For empty strings the loop returns true (no character contradicts it), so add a length check if you need false. And Character.isDigit() accepts any Unicode decimal digit (for example Arabic-Indic digits), whereas \d matches only 0-9 by default; compare against '0' and '9' if you need ASCII only. Apache Commons Lang StringUtils.isNumeric() also accepts Unicode digits and returns false for null and empty input.
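As a sketch of the two variants discussed above, the following separates the ASCII-only check from the Unicode-aware one and rejects empty input in both. DigitCheck and its method names are hypothetical, chosen for illustration.

```java
final class DigitCheck {
    // True only for non-empty strings made entirely of ASCII digits 0-9.
    static boolean isAsciiDigits(String s) {
        if (s == null || s.isEmpty()) return false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') return false; // early exit on first non-digit
        }
        return true;
    }

    // Accepts any Unicode decimal digit, still rejecting null and empty input.
    static boolean isUnicodeDigits(String s) {
        if (s == null || s.isEmpty()) return false;
        return s.codePoints().allMatch(Character::isDigit);
    }

    public static void main(String[] args) {
        System.out.println(isAsciiDigits("12345"));          // true
        System.out.println(isAsciiDigits("12a45"));          // false
        System.out.println(isAsciiDigits(""));               // false
        System.out.println(isAsciiDigits("\u0663\u0664"));   // false (Arabic-Indic digits)
        System.out.println(isUnicodeDigits("\u0663\u0664")); // true
    }
}
```

Which variant is right depends on what consumes the value: Integer.parseInt() accepts Unicode digits, but many downstream systems expect ASCII only.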

8. What is the behavior of String's replace() methods?

Model Answer: String has four replace methods. replace(char, char) replaces all occurrences of a character. replace(CharSequence, CharSequence) replaces all occurrences of a literal substring. replaceFirst(String regex, String replacement) replaces only the first match of a regex pattern. replaceAll(String regex, String replacement) replaces all matches of a regex. All return a new String (String is immutable), and the literal variants scan in linear time. In the regex variants, characters such as . and | are pattern syntax, and $ and \ are special in the replacement; use Pattern.quote() and Matcher.quoteReplacement() when both should be literal. Regex compilation happens on each call, so in loops compile the Pattern once and reuse it.
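The literal-versus-regex distinction is the part that most often bites, so here is a minimal sketch. ReplaceDemo, maskDigits, and literalReplaceAll are hypothetical names used only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReplaceDemo {
    // Compiled once and reused, instead of recompiling inside replaceAll().
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    static String maskDigits(String s) {
        return DIGITS.matcher(s).replaceAll("#");
    }

    // Regex replaceAll with both arguments forced to be literal text.
    static String literalReplaceAll(String s, String target, String replacement) {
        return s.replaceAll(Pattern.quote(target), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println("a.b.c".replace(".", "-"));         // a-b-c  (literal)
        System.out.println("a.b.c".replaceAll(".", "-"));      // ----- ('.' matches any char)
        System.out.println("a.b.c".replaceFirst("\\.", "-"));  // a-b.c
        System.out.println(maskDigits("order 42, item 7"));    // order #, item #
        System.out.println(literalReplaceAll("cost: x", "x", "$5")); // cost: $5
    }
}
```

For plain substring replacement, replace(CharSequence, CharSequence) is the simpler choice; the quoting helpers matter only when a regex method is required for other reasons.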

9. How does String handle Unicode characters beyond BMP?

Model Answer: The String API exposes text as UTF-16, where each char is a 16-bit code unit. (Since Java 9 the internal storage may be a compact Latin-1 byte array, but the API behaves the same.) For characters in the Basic Multilingual Plane (BMP, U+0000 to U+FFFF), one char is sufficient. For supplementary characters (code points beyond U+FFFF, like many emoji), two chars are needed: a surrogate pair. Methods like length() return the number of chars, not code points. For correct character counting, use codePointCount(0, length()). Similarly, charAt() returns a single char, which may be half of a supplementary character. Use codePointAt() or the codePoints() stream for proper supplementary character handling.
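A short sketch of the difference between chars and code points, using U+1F600 written as its surrogate pair so the source stays ASCII. CodePointDemo and its helper are illustrative names.

```java
final class CodePointDemo {
    // Number of Unicode code points, as opposed to UTF-16 code units.
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // 'a', then U+1F600 encoded as the pair D83D DE00, then 'b'
        String s = "a\uD83D\uDE00b";

        System.out.println(s.length());                              // 4 chars
        System.out.println(codePointLength(s));                      // 3 code points
        System.out.println(Character.isHighSurrogate(s.charAt(1)));  // true: half a character
        System.out.println(Integer.toHexString(s.codePointAt(1)));   // 1f600

        // Iterating by code point never splits a surrogate pair.
        s.codePoints().forEach(cp -> System.out.println(Integer.toHexString(cp)));
    }
}
```

The same concern applies to truncation: cutting a string at an arbitrary char index can leave a lone surrogate at the end.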

10. What is String.trim() and what are its limitations?

Model Answer: trim() removes leading and trailing characters whose code is U+0020 or below (space, tab, newline, and the other ASCII control characters) and returns the result. Its main limitation is that it ignores whitespace above U+0020, such as EM SPACE (U+2003) or IDEOGRAPHIC SPACE (U+3000). For Unicode-aware trimming, use strip() (Java 11+), which relies on Character.isWhitespace(); stripLeading() and stripTrailing() remove from one side only. Note that neither trim() nor strip() removes the no-break space (U+00A0) or the BOM (U+FEFF), because Character.isWhitespace() excludes both; handle those with a regex or an explicit check. For HTML/entity whitespace, use a regex or a library like Apache Commons Text.
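The following sketch contrasts trim() and strip() (Java 11+) and adds one possible helper for the no-break-space case. stripAllSpace is a hypothetical helper, not a JDK method; it simply widens the definition of "space" using Character.isSpaceChar().

```java
final class TrimVsStrip {
    // Hypothetical helper: trims anything flagged by Character.isWhitespace
    // or Character.isSpaceChar, which also covers NO-BREAK SPACE (U+00A0).
    static String stripAllSpace(String s) {
        int start = 0;
        int end = s.length();
        while (start < end) {
            int cp = s.codePointAt(start);
            if (!isSpace(cp)) break;
            start += Character.charCount(cp);
        }
        while (end > start) {
            int cp = s.codePointBefore(end);
            if (!isSpace(cp)) break;
            end -= Character.charCount(cp);
        }
        return s.substring(start, end);
    }

    private static boolean isSpace(int cp) {
        return Character.isWhitespace(cp) || Character.isSpaceChar(cp);
    }

    public static void main(String[] args) {
        String emSpaced = "\u2003hello\u2003"; // EM SPACE on both sides
        String nbspaced = "\u00A0hello\u00A0"; // NO-BREAK SPACE on both sides

        System.out.println(emSpaced.trim().length());          // 7: trim() leaves U+2003
        System.out.println(emSpaced.strip().length());         // 5: strip() removes it
        System.out.println(nbspaced.strip().length());         // 7: strip() leaves U+00A0
        System.out.println(stripAllSpace(nbspaced).length());  // 5
    }
}
```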

11. How do you convert a String to uppercase/lowercase for different locales?

Model Answer: Pass an explicit Locale. Use toLowerCase(Locale.ROOT) or toUpperCase(Locale.ROOT) for locale-neutral, machine-readable text such as keys, identifiers, and protocol tokens, and the user's locale for text shown to that user. The no-argument toLowerCase() and toUpperCase() use the JVM's default locale, which can change between environments and cause subtle bugs in internationalized applications. The classic case is Turkish: under a Turkish default locale, "i".toUpperCase() produces "İ" (dotted capital I, U+0130), not "I", so "title".toUpperCase().equals("TITLE") fails. For ASCII-only strings, Locale.ROOT and Locale.US give the same result.
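The Turkish case can be reproduced without changing the JVM default by passing the locale explicitly. A minimal sketch, with LocaleCaseDemo and its methods as illustrative names:

```java
import java.util.Locale;

final class LocaleCaseDemo {
    static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    // What the no-argument toUpperCase() would do on a Turkish-locale JVM.
    static String upperTurkish(String s) {
        return s.toUpperCase(TURKISH);
    }

    // Locale-neutral conversion for identifiers and protocol tokens.
    static String upperRoot(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(upperRoot("title"));                     // TITLE
        System.out.println(upperTurkish("title").equals("TITLE"));  // false
        System.out.println(upperTurkish("title").charAt(1) == '\u0130'); // true: dotted capital I
    }
}
```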

12. What is the difference between String.split() and StringTokenizer?

Model Answer: String.split() takes a regex, returns a String[], and keeps empty tokens between adjacent delimiters. General patterns carry regex compilation overhead on each call, although OpenJDK has a fast path for plain single-character delimiters. StringTokenizer splits on a set of delimiter characters with no regex, hands tokens out one at a time, and never returns empty tokens: adjacent delimiters collapse into one. Its Javadoc describes it as a legacy class whose use is discouraged in new code. For simple single-character delimiters in hot paths, String.indexOf() in a loop avoids both the regex and the intermediate array. For regex-based splitting (multiple delimiters, patterns), split() is more convenient. The performance difference is negligible for most applications; choose based on how you need empty tokens handled and which reads best.
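How each approach treats an empty field is easiest to see side by side. The sketch below runs the same input through all three; TokenizerVsSplit and its method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

final class TokenizerVsSplit {
    // split() with a negative limit keeps every empty token.
    static List<String> viaSplit(String s) {
        return Arrays.asList(s.split(",", -1));
    }

    // StringTokenizer skips empty tokens entirely.
    static List<String> viaTokenizer(String s) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(s, ",");
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    // Manual indexOf loop: no regex, empty tokens preserved.
    static List<String> viaIndexOf(String s, char delimiter) {
        List<String> out = new ArrayList<>();
        int start = 0;
        int idx;
        while ((idx = s.indexOf(delimiter, start)) >= 0) {
            out.add(s.substring(start, idx));
            start = idx + 1;
        }
        out.add(s.substring(start)); // remainder after the last delimiter
        return out;
    }

    public static void main(String[] args) {
        String row = "a,,b";
        System.out.println(viaSplit(row).size());        // 3
        System.out.println(viaTokenizer(row).size());    // 2
        System.out.println(viaIndexOf(row, ',').size()); // 3
    }
}
```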

13. How does String's hashCode() work?

Model Answer: String's hashCode is computed as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] using int arithmetic (overflow wraps), where s[i] is the char value and n is the string length. The constant 31 is an odd prime that gives a reasonable spread, and multiplication by 31 can be optimized to (i << 5) - i. HashMap and HashSet call hashCode() to choose a bucket for String keys. Because Strings are immutable, the hash is cached after first computation and subsequent calls return the cached value. This makes String a good HashMap key: the hash is stable and caching avoids repeated computation. Distinct strings can still collide, for example "Aa" and "BB".
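The polynomial above can be evaluated with Horner's rule in a few lines. This is a sketch of the documented formula, not a copy of the JDK source; HashCodeDemo and manualHash are illustrative names.

```java
final class HashCodeDemo {
    // Horner's-rule evaluation of s[0]*31^(n-1) + ... + s[n-1],
    // relying on int overflow exactly as String.hashCode() is specified to.
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(manualHash("hello"));                        // 99162322
        System.out.println(manualHash("hello") == "hello".hashCode());  // true
        // Different strings, same hash: a collision
        System.out.println("Aa".hashCode() == "BB".hashCode());         // true
    }
}
```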

14. How do you handle null strings safely in Java?

Model Answer: Null strings require defensive handling: 1) Null check before use: if (str != null) { ... }. 2) Use String.valueOf(obj), which returns "null" for a null reference rather than throwing NPE. 3) Use Objects.requireNonNull() to fail fast on null input. 4) Use Objects.toString() with a default: Objects.toString(str, "default"). 5) In equals comparisons, put the literal first: "literal".equals(str) is null-safe because the receiver is never null; Objects.equals(a, b) covers the case where either side may be null. 6) Apache Commons Lang StringUtils.defaultString() and related utilities handle null gracefully.
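A compact sketch of the JDK-only options from this answer. NullSafeDemo and its method names are hypothetical wrappers added for illustration.

```java
import java.util.Objects;

final class NullSafeDemo {
    // Literal-first equals never throws, even when s is null.
    static boolean isYes(String s) {
        return "yes".equals(s);
    }

    // Falls back to a default instead of the text "null".
    static String orDefault(String s) {
        return Objects.toString(s, "default");
    }

    // String.valueOf(Object) turns a null reference into the text "null".
    static String describe(Object o) {
        return String.valueOf(o);
    }

    public static void main(String[] args) {
        String missing = null;
        System.out.println(isYes(missing));                 // false
        System.out.println(orDefault(missing));             // default
        System.out.println(describe(missing));              // null
        System.out.println(Objects.equals(missing, null));  // true
    }
}
```

Note that String.valueOf(null) with a bare null literal resolves to the char[] overload and throws; passing a typed variable, as above, avoids that.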

15. What is String.format() and when should you use it versus concatenation?

Model Answer: String.format() takes a format string with printf-style specifiers (%s, %d, %.2f) and returns the formatted result. Use it for complex formatted output (tables, aligned columns, fixed decimal places) and when one format is defined once and applied to many values; for translated messages with reorderable arguments, MessageFormat is the usual companion. Avoid it in tight loops, since the format string is parsed on every call. For simple cases like "Hello " + name, concatenation is clearer, and javac (not the JIT) compiles a single concatenation expression into StringBuilder calls, or an invokedynamic call since Java 9. Profile before replacing concatenation with format calls in hot paths.

16. How does the String constructor handle charset encoding?

Model Answer: The String(byte[], Charset) constructor decodes the byte array using the given charset and throws no checked exception; malformed input is replaced with the charset's default replacement string. The older String(byte[], String charsetName) overload is the one that throws UnsupportedEncodingException when the named charset is not supported. Always specify a charset explicitly: new String(bytes, StandardCharsets.UTF_8). Without one, new String(bytes) uses the JVM's default charset, which before Java 18 varied by platform and locale (UTF-8 is the default since Java 18), a common source of encoding bugs. Similarly, getBytes(charset) should be used instead of getBytes() for consistent encoding. For binary data in strings (Base64, etc.), use proper encoding utilities rather than this constructor.
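To make the encoding-bug scenario concrete, this sketch encodes a string as UTF-8 and then decodes it once correctly and once with the wrong charset. CharsetRoundTrip and its methods are illustrative names.

```java
import java.nio.charset.StandardCharsets;

final class CharsetRoundTrip {
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Deliberately wrong: reads UTF-8 bytes as ISO-8859-1.
    static String misdecode(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "caf\u00E9";        // "café"
        byte[] bytes = encode(original);

        System.out.println(bytes.length);                    // 5: the accented e takes two bytes
        System.out.println(decode(bytes).equals(original));  // true
        System.out.println(misdecode(bytes).length());       // 5: one char per byte, text is garbled
    }
}
```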

17. What is the performance impact of comparing Strings with equals() versus ==?

Model Answer: == compares references in O(1) but says nothing reliable about content. equals() first checks reference identity, then length, and only then compares contents, so it is O(n) in the worst case and O(1) when both references point to the same object or the lengths differ. For interned strings from the pool, == returns true for equal literals because they share a reference. For Strings created with new String(), == always returns false for separate objects even with identical content. Always use equals() for value comparison; reference equality is only meaningful when every string involved is known to be interned.

18. How do you efficiently concatenate many Strings?

Model Answer: For a known small number of strings, + is readable, and javac compiles the expression to StringBuilder calls (or invokedynamic since Java 9). For unknown or large numbers, explicitly use StringBuilder: StringBuilder sb = new StringBuilder(initialCapacity); for (String s : strings) { sb.append(s); } return sb.toString();. Set the initial capacity to avoid resizing, for example to the sum of the lengths. With a delimiter, String.join() is the shortest option; for streams, use Collectors.joining(). Avoid StringBuffer unless you need thread safety (synchronized). For CSV or other character-separated output, appending to a StringBuilder directly avoids building intermediate collections.
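The three options mentioned above can be sketched as follows. JoinDemo and its method names are hypothetical; List.of requires Java 9 or later.

```java
import java.util.List;
import java.util.stream.Collectors;

final class JoinDemo {
    // Pre-sizes the builder to the exact total length to avoid resizing.
    static String withBuilder(List<String> parts) {
        int capacity = 0;
        for (String p : parts) {
            capacity += p.length();
        }
        StringBuilder sb = new StringBuilder(capacity);
        for (String p : parts) {
            sb.append(p);
        }
        return sb.toString();
    }

    // Delimiter-separated output in one call.
    static String withJoin(List<String> parts) {
        return String.join(",", parts);
    }

    // Stream variant with delimiter, prefix, and suffix.
    static String withCollector(List<String> parts) {
        return parts.stream().collect(Collectors.joining(",", "[", "]"));
    }

    public static void main(String[] args) {
        List<String> parts = List.of("a", "b", "c");
        System.out.println(withBuilder(parts));    // abc
        System.out.println(withJoin(parts));       // a,b,c
        System.out.println(withCollector(parts));  // [a,b,c]
    }
}
```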

19. What is the difference between isEmpty() and isBlank()?

Model Answer: isEmpty() (Java 6+) returns true only if length() == 0; it does not check for whitespace. isBlank() (Java 11+) returns true if the string is empty or contains only whitespace characters (according to Character.isWhitespace()). Example: " ".isEmpty() is false, but " ".isBlank() is true. "".isBlank() is true (empty is blank by definition). For input validation, isBlank() is usually what you want, since it treats spaces, tabs, and other whitespace as empty. To check that a string has actual content, use !isBlank() on Java 11+, or !trim().isEmpty() on older versions (which only considers characters up to U+0020). Both methods throw NPE on a null reference.
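A minimal sketch of a null-safe "has content" check built on isBlank() (Java 11+). BlankCheck and hasText are hypothetical names.

```java
final class BlankCheck {
    // True when the string is non-null and has at least one non-whitespace character.
    static boolean hasText(String s) {
        return s != null && !s.isBlank();
    }

    public static void main(String[] args) {
        System.out.println(" ".isEmpty());      // false
        System.out.println(" ".isBlank());      // true
        System.out.println("".isBlank());       // true
        System.out.println(hasText(" \t\n"));   // false
        System.out.println(hasText(" x "));     // true
        System.out.println(hasText(null));      // false
    }
}
```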

20. How does String implement Comparable?

Model Answer: String implements Comparable<String> with lexicographic (dictionary) comparison of UTF-16 char values. compareTo() compares character by character, stopping at the first difference or when one string is exhausted. A shorter string compares as "less than" a longer string that starts with it: "ab".compareTo("abc") returns a negative number. This is NOT case-insensitive: 'A' (65) < 'a' (97), so all uppercase ASCII letters sort before lowercase ones. Because the comparison works on chars, supplementary characters (surrogate pairs) do not always sort in code point order. For case-insensitive or locale-aware comparison, use String.CASE_INSENSITIVE_ORDER or Collator.getInstance(). String comparisons drive sorting and TreeSet/TreeMap ordering.
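The effect on sorting is easy to demonstrate. In this sketch, CompareDemo and its methods are illustrative names; both helpers copy the input so the original list is left untouched.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class CompareDemo {
    // Natural order: compareTo(), so uppercase ASCII sorts before lowercase.
    static List<String> sortedNatural(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        Collections.sort(copy);
        return copy;
    }

    // Case-insensitive order; ties keep their input order because the sort is stable.
    static List<String> sortedIgnoreCase(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(String.CASE_INSENSITIVE_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = List.of("banana", "Apple", "apple", "Banana");
        System.out.println(sortedNatural(words));     // [Apple, Banana, apple, banana]
        System.out.println(sortedIgnoreCase(words));  // [Apple, apple, banana, Banana]
        System.out.println("ab".compareTo("abc") < 0); // true
    }
}
```

For text shown to users, a Collator for the user's locale is generally a better fit than either ordering, since it also handles accents and language-specific rules.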

Conclusion

The String class in Java is an immutable object representing a sequence of Unicode characters. Because strings are so fundamental, Java optimizes them heavily — literal strings are interned in the String Pool for memory sharing, and the class provides rich methods for searching, extraction, transformation, and formatting.

Key takeaways: immutability makes String thread-safe and suitable for keys in HashMap/HashSet, but also means every concatenation in loops creates new objects — use StringBuilder instead. The String Pool stores literal strings for reuse, but new String() always creates a separate heap object. split() discards trailing empty strings unless you use a negative limit parameter.

Strings are the most common reference type in Java and appear in virtually every program. For understanding how wrapper classes like Integer and Double handle the boundary between primitives and objects, see Java Wrapper Classes.
