Logical vs Physical Address Space
Understanding address translation, segmentation, and how the operating system constructs a virtual memory view that differs from the actual physical memory layout.
When your program reads from memory address 0x7fff5fbff58c, it is not accessing the physical memory chips at that physical location. That address is a logical address — a construct that the CPU resolves through the memory management unit (MMU) before any byte hits the actual DRAM. The OS and hardware collaborate to give each process the illusion of a contiguous, exclusive address space, even when physical memory is fragmented, shared, and smaller than the logical address space your code uses.
Understanding the distinction between logical and physical addresses is foundational to reasoning about memory protection, virtual memory, and the security boundaries that keep your processes from crashing — or spying on — each other.
When to Use / When Not to Use
The logical vs physical address distinction is always in effect: you cannot disable it on modern hardware. How much you need to think about it depends on the work you do.
When you benefit from knowing it:
- Debugging segmentation faults (was the fault caused by an invalid logical address or a protection violation?)
- Writing low-level systems code that interacts with the kernel (mmap, mprotect, shmget)
- Optimizing for TLB behavior (understanding which logical pages map to which physical frames)
- Understanding ASLR (Address Space Layout Randomization) and why it mitigates exploits
When you do not need to think about it:
- Writing application code in high-level languages (Python, JavaScript, Go)
- Web services and API handlers
- Standard CRUD operations where the runtime handles memory
The distinction becomes unavoidable when you write C/C++ with manual memory management, or when you work with kernel-level code, device drivers, or any system that calls mmap() or mprotect() directly.
Architecture or Flow Diagram
The address translation path flows from your program’s logical address through the MMU’s translation process, consulting the TLB and page tables before reaching physical memory.
flowchart TD
CPU["CPU Issues<br/>Logical Address"]
MMU["MMU<br/>Memory Management Unit"]
TLB["TLB<br/>Translation Lookaside Buffer"]
PT["Page Tables<br/>in Physical Memory"]
DRAM["Physical DRAM"]
PGFAULT["OS Page Fault<br/>Handler"]
SEGV["SIGSEGV<br/>Delivered to Process"]
CPU --> MMU
MMU --> TLB
TLB -->|"Hit"| DRAM
TLB -->|"Miss"| PT
PT -->|"Valid entry"| DRAM
PT -->|"Invalid entry"| PGFAULT
PGFAULT -->|"Page loaded"| DRAM
PGFAULT -->|"Protection fault"| SEGV
style PGFAULT stroke:#ff6b6b
style SEGV stroke:#ff6b6b
The MMU first checks the TLB for a cached virtual-to-physical translation. On a TLB miss, it walks the page tables (stored in physical memory) to find the mapping. If the page is not in physical memory, the OS loads it from disk and updates the page tables. If the access is invalid (protection violation), the OS delivers a segmentation fault to the process.
Core Concepts
Logical (Virtual) Address Space
The logical address space is what the CPU generates and what your program operates within. On a 64-bit system, the theoretical logical address space is 2^64 bytes — far larger than any physical memory installed. In practice, hardware and OS constraints limit this:
- Linux/x86-64: Uses 48-bit virtual addresses (256 TB address space), with the upper half reserved for the kernel
- Windows/x64: Uses 48-bit addresses with a different split between user and kernel halves
- ARM64: Supports 48-bit or 52-bit virtual addresses depending on the hardware implementation
The kernel sets up each process with its own isolated virtual address space. When process A writes to logical address 0x1000, it modifies whatever physical page that address maps to — completely independent of what process B has at its own 0x1000.
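To make the 48-bit figure concrete, here is a minimal sketch that computes the size of an n-bit address space and checks the x86-64 canonical-address rule (bits 63 to 48 must be copies of bit 47, which is what splits the space into a lower user half and an upper kernel half). The function names are illustrative, not part of any API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Size in bytes of an address space with the given number of address bits.
 * Only meaningful for bits < 64, so the shift cannot overflow. */
uint64_t address_space_bytes(unsigned bits) {
    return (uint64_t)1 << bits;
}

/* With 48-bit virtual addresses, bits 63..48 must be copies of bit 47.
 * Addresses that satisfy this rule are called canonical. */
bool is_canonical_48(uint64_t vaddr) {
    uint64_t upper = vaddr >> 47;            /* bits 63..47 (17 bits) */
    return upper == 0 || upper == 0x1FFFF;   /* all zeros or all ones */
}
```

An address such as 0x0000800000000000 falls in the gap between the two halves and faults on use, regardless of what the page tables contain.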
Physical Address Space
Physical addresses refer to actual hardware: primarily the DRAM modules on your motherboard, plus memory-mapped device regions. The amount of physical memory is bounded by installed RAM. A machine with 16 GB of DRAM has 16 GB of physical memory to hand out, regardless of how many processes are running or how large their combined virtual address spaces are.
Consecutive virtual pages need not map to consecutive physical frames: virtual pages 0 and 1 might map to physical frames at 0x1A000 and 0x34000. The OS maintains the mapping; the CPU enforces it.
Segmentation (Historical and Modern)
Segmentation predates paging as a memory management scheme. In pure segmentation, a logical address consists of a segment selector and an offset. The selector indexes a segment descriptor stored in the GDT (Global Descriptor Table) or LDT (Local Descriptor Table), which contains the base physical address and limit (size) of the segment. The CPU adds the offset to the base to produce the physical address.
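The base-plus-limit translation described above can be sketched in a few lines. This is a simplified model, not the x86 descriptor format: the struct layout and function name are hypothetical, and real descriptors also carry access-rights bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified descriptor. Real x86 descriptors pack these
 * fields together with type and privilege bits. */
struct segment_descriptor {
    uint32_t base;   /* address where the segment starts */
    uint32_t limit;  /* highest valid offset within the segment */
};

/* Translate (segment, offset) to an address. Returns false when the
 * offset exceeds the limit, which hardware reports as a fault. */
bool segment_translate(const struct segment_descriptor *seg,
                       uint32_t offset, uint32_t *out) {
    if (offset > seg->limit) {
        return false;
    }
    *out = seg->base + offset;
    return true;
}
```

The limit check is what gives segmentation its protection property: an offset past the end of the segment never produces an address.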
x86 supports segmentation natively via the CS, DS, SS, ES, FS, GS segment registers. However, modern OSes (Linux, Windows) use flat segmentation — all segment bases are set to zero and limits are set to the maximum, effectively neutralizing segmentation and relying entirely on paging for memory protection and virtualization.
Linux does still use segmentation for specific purposes:
- The %fs segment register holds per-thread data (thread-local storage) in x86-64 user space
- The %gs register is used for stack protector canaries and other thread-specific data on 32-bit x86; on x86-64 the kernel uses it for per-CPU data
The MMU and Address Translation
The Memory Management Unit is hardware dedicated to translating logical addresses to physical addresses on every memory access. On x86, the MMU is integrated into the CPU die. It operates in parallel with the cache hierarchy: L1 caches are typically virtually indexed and physically tagged, so the TLB lookup runs alongside cache set selection and must deliver the physical address in time for the tag comparison.
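The translation uses a multi-level structure. Classic 32-bit x86 (without PAE) uses two levels:
- Page Directory (the CR3 register holds its physical address)
- Page Tables, one pointed to by each page directory entry
The 32-bit logical address is split into:
- Page directory index (bits 31-22)
- Page table index (bits 21-12)
- Page offset (bits 11-0)
The MMU walks these levels, checks protection bits, and produces a 20-bit physical frame number, then appends the 12-bit offset to produce a 32-bit physical address. x86-64 extends the same scheme to four levels (PML4, PDPT, PD, PT), each indexed by 9 bits of a 48-bit virtual address; its page table entries hold a frame number of up to 40 bits, giving a physical address of up to 52 bits.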
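The bit ranges listed above translate directly into shifts and masks. The sketch below covers the two-level 32-bit layout (10 + 10 + 12 bits); the struct and function names are illustrative.

```c
#include <stdint.h>

struct va32_parts {
    uint32_t dir_index;    /* bits 31-22: index into the page directory */
    uint32_t table_index;  /* bits 21-12: index into the page table */
    uint32_t offset;       /* bits 11-0: byte offset within the 4 KB page */
};

/* Split a 32-bit virtual address into its three fields. */
struct va32_parts split_va32(uint32_t vaddr) {
    struct va32_parts p;
    p.dir_index   = (vaddr >> 22) & 0x3FF;
    p.table_index = (vaddr >> 12) & 0x3FF;
    p.offset      = vaddr & 0xFFF;
    return p;
}

/* Combine the frame number found in the page table entry with the
 * page offset to form the physical address. */
uint32_t make_pa32(uint32_t frame_number, uint32_t offset) {
    return (frame_number << 12) | (offset & 0xFFF);
}
```

For example, address 0x00403004 selects page directory entry 1, page table entry 3, and byte 4 within the page.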
Production Failure Scenarios + Mitigations
Page Table Bloat in Large Mappings
Failure: Applications that memory-map large files (databases, machine learning models) can create page tables with millions of entries. Each page table in the hierarchy occupies a full 4 KB page holding 512 eight-byte entries. For a 100 GB memory-mapped file with 4 KB pages, fully populated leaf page tables alone consume ~200 MB of physical memory per process that maps it, memory that could otherwise serve as page cache for other workloads.
Mitigation: Use huge pages (2 MB or 1 GB pages on x86) for large mappings. PostgreSQL’s huge_pages setting, the Linux THP (transparent huge pages) feature, and mmap(MAP_HUGETLB) all reduce page table overhead. The tradeoff is increased internal fragmentation (wasted space within each huge page).
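A back-of-the-envelope calculation shows why page size matters here. The sketch assumes 8-byte entries (as on x86-64) and counts only the lowest level of the hierarchy for a fully populated mapping; the function name is illustrative.

```c
#include <stdint.h>

#define PTE_SIZE 8u  /* bytes per page table entry on x86-64 */

/* Bytes of lowest-level page table entries needed to map `mapping_bytes`
 * using pages of `page_bytes`. Upper levels add well under 1% more. */
uint64_t leaf_page_table_bytes(uint64_t mapping_bytes, uint64_t page_bytes) {
    uint64_t pages = (mapping_bytes + page_bytes - 1) / page_bytes; /* round up */
    return pages * PTE_SIZE;
}
```

Calling it with the same mapping size and a 2 MB page size shrinks the result by a factor of 512, which is the saving huge pages buy.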
TLB Shootdowns Under Heavy Context Switching
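Failure: Without tagged TLBs, every context switch flushes the outgoing process's TLB entries on that core, and the incoming process pays for the refill. Separately, when the kernel changes or removes a mapping (munmap, mprotect, copy-on-write after fork), it must invalidate stale entries on every core running that address space. This TLB shootdown is delivered via IPIs (Inter-Processor Interrupts) and causes latency spikes on systems with many CPU cores. Under heavy fork/exec activity, TLB shootdown cost can consume 10-20% of CPU time.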
Mitigation: Linux includes ASID (Address Space ID) support on supported hardware, which tags TLB entries with a process identifier. This allows TLB entries from different processes to coexist without invalidation. Not all hardware supports ASID; on those that do not, every context switch invalidates the TLB. PCID (Process Context ID) on x86 implements ASID-like tagging.
Address Space Layout Randomization Collisions
Failure: ASLR randomizes where the stack, heap, libraries, and mmap regions load in the virtual address space. But entropy is limited — on a 64-bit system with 48-bit addresses, some regions have less randomization than expected due to alignment requirements. In certain conditions, a crafted ROP (Return-Oriented Programming) chain can bypass ASLR despite the randomization.
Mitigation: Build position-independent executables (PIE) (-fPIE -pie on GCC/Clang) so the loader can randomize the base of the main executable along with the rest of the address space. Combine with stack canaries, NX (no-execute) pages, and control flow integrity for layered exploit mitigation. On Linux, verify ASLR is active with cat /proc/sys/kernel/randomize_va_space (0=off, 1=stack, mmap base, vDSO, and shared libraries, 2=additionally the heap).
Trade-off Table
| Aspect | Pure Segmentation | Pure Paging | Hybrid Segmentation + Paging |
|---|---|---|---|
| External fragmentation | High (segments must be contiguous in physical memory) | None (pages are uniform size) | Minimal |
| Internal fragmentation | None | ~2 KB average per mapped region (half of the last 4 KB page) | Same as paging |
| Memory protection granularity | Segment-level (coarse) | Page-level (4 KB, fine-grained) | Segment + page |
| Page table overhead | None (segment descriptors are small) | Multi-level page tables for large address spaces | Reduced (segment base + page offset) |
| TLB efficiency | High (one TLB entry per segment covers all addresses) | One TLB entry per page; large working sets exhaust TLB | Best of both (one entry per segment for commonly-used pages) |
| Complexity | Simple hardware | Moderate (page table walks) | High |
| Modern OS support | Neglected (flat segmentation only) | Universal (Linux, Windows, macOS all use paging) | Historical (32-bit x86 from the 80386 onward); not used in modern general-purpose OSes |
Implementation Snippets
Examining Your Virtual Address Space (C)
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Read and print the maps file that shows this process's
 * virtual address space layout. Each line describes a mapped
 * region: start-end perms offset device inode pathname */
void print_virtual_mappings(void) {
    FILE *maps = fopen("/proc/self/maps", "r");
    if (!maps) {
        perror("fopen /proc/self/maps");
        return;
    }
    char line[512];
    /* Column legend for the raw lines printed below */
    printf("%-25s %-5s %-8s %-5s %-8s %s\n",
           "Address range", "Perms", "Offset", "Dev", "Inode", "Pathname");
    while (fgets(line, sizeof(line), maps)) {
        // Example line:
        // 00400000-0040c000 r-xp 00000000 fd:00 123456 /bin/bash
        printf("%s", line);
    }
    fclose(maps);
}

/* Show the logical vs physical address relationship.
 * The page tables live in physical memory and are reachable only
 * from kernel mode (CR3 holds their root), so user space inspects
 * the layout through /proc instead. */
int main(void) {
    printf("Process ID: %d\n", getpid());
    printf("Logical address space on x86-64 is 48 bits (256 TB)\n");
    printf("Physical memory installed: ");
    FILE *meminfo = fopen("/proc/meminfo", "r");
    if (!meminfo) {
        perror("fopen /proc/meminfo");
        return 1;
    }
    char line[256];
    while (fgets(line, sizeof(line), meminfo)) {
        if (strncmp(line, "MemTotal:", 9) == 0) {
            printf("%s", line);
            break;
        }
    }
    fclose(meminfo);
    print_virtual_mappings();
    return 0;
}
Translating Virtual to Physical Addresses (bash + ping)
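#!/bin/bash
# Find the virtual-to-physical memory mapping for a target process.
# Requires root: without CAP_SYS_ADMIN the kernel reports frame numbers as zero.
PID=$(pgrep -n ping)
if [ -z "$PID" ]; then
    echo "No ping process found"
    exit 1
fi
echo "Examining virtual address space for PID $PID:"
echo "---"
# /proc/PID/pagemap holds one 8-byte entry per virtual page.
# It is a huge sparse file, so test readability instead of reading it whole.
if [ ! -r "/proc/$PID/pagemap" ]; then
    echo "Cannot read pagemap (need root or pagemap capability)"
    exit 1
fi
# Show the memory maps for context
echo "Memory mappings for PID $PID:"
head -20 "/proc/$PID/maps"
# Translate the first mapped page: entry index = virtual address / page size
START=$(head -1 "/proc/$PID/maps" | cut -d- -f1)
PAGE_SIZE=$(getconf PAGESIZE)
INDEX=$(( 0x$START / PAGE_SIZE ))
ENTRY=$(dd if="/proc/$PID/pagemap" bs=8 skip="$INDEX" count=1 2>/dev/null | od -An -tx8 | tr -d ' ')
echo "Pagemap entry for 0x$START: 0x$ENTRY (bit 63 = present, bits 0-54 = frame number)"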
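Once an 8-byte pagemap entry has been read, turning it into a physical address is a matter of bit extraction. The sketch below follows the entry layout in the Linux kernel's pagemap documentation (bit 63 = page present, bits 0-54 = page frame number when present); the function names are illustrative, and the frame number is only meaningful when the entry was read with sufficient privileges.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGEMAP_PRESENT  (1ULL << 63)          /* page is resident in RAM */
#define PAGEMAP_PFN_MASK ((1ULL << 55) - 1)    /* bits 0-54: frame number */

/* Byte offset of the entry for `vaddr` inside /proc/PID/pagemap. */
uint64_t pagemap_offset(uint64_t vaddr, uint64_t page_size) {
    return (vaddr / page_size) * 8;
}

/* Convert one pagemap entry plus the original virtual address into a
 * physical address. Returns false if the page is not resident in RAM. */
bool pagemap_to_phys(uint64_t entry, uint64_t vaddr, uint64_t page_size,
                     uint64_t *phys) {
    if (!(entry & PAGEMAP_PRESENT)) {
        return false;
    }
    uint64_t pfn = entry & PAGEMAP_PFN_MASK;
    *phys = pfn * page_size + (vaddr % page_size);
    return true;
}
```

The page offset is carried over unchanged from the virtual address; only the page number is translated.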
Observability Checklist
- TLB hit/miss rates: perf stat -e dTLB-loads,dTLB-load-misses,dTLB-stores,dTLB-store-misses ./program
- Page fault frequency and type: ps -o majflt,minflt -p PID shows major vs minor faults; perf stat -e minor-faults,major-faults ./program counts them for a single run
- Page table walker events: hardware-specific perf events (list them with perf list; for example dtlb_load_misses.walk_completed on recent Intel CPUs) show hardware page table walker activity
- Virtual memory map size: ps -o vsz,rss (virtual size vs resident set size); a large gap indicates memory-mapped files, untouched reservations, or swapped regions
- ASLR setting: cat /proc/sys/kernel/randomize_va_space; should be 2 in production
- Large page usage: grep -i huge /proc/meminfo shows transparent huge page and explicit huge page statistics
- OOM (Out of Memory) killer: dmesg | grep -i "out of memory" or journalctl -k | grep -i oom shows when the OOM killer was invoked
- Swapped pages: vmstat 1; the si and so columns show swap-in and swap-out rates
Security/Compliance Notes
Address Space Layout Randomization (ASLR): ASLR randomizes the base addresses of the stack, heap, libraries, vDSO, and mmap regions every time a process starts. This forces attackers to guess the addresses of usable code gadgets (for ROP attacks) instead of relying on fixed, known locations. ASLR is effective against exploits that require knowing a memory address, but it is defeated by information leaks that disclose addresses (e.g., Spectre-style timing attacks).
Kernel Address Space Layout Randomization (KASLR): KASLR randomizes the kernel’s load address at boot. It has been available since Linux 3.14 and enabled by default on x86 since 4.12. Without KASLR, a kernel stack overflow exploit could target known kernel symbol addresses. KASLR makes the kernel’s text segment base unpredictable, adding another layer to kernel exploit mitigation.
SMEP (Supervisor Mode Execution Prevention) and SMAP (Supervisor Mode Access Prevention): These CPU features (SMEP since Intel Ivy Bridge, SMAP since Intel Broadwell, both on AMD Zen) prevent the kernel from executing user-space code (SMEP) or accessing user-space data (SMAP). Even if an exploit hijacks kernel control flow, SMEP blocks jumping to user-space shellcode, and SMAP blocks the kernel from dereferencing pointers to user-space buffers outside explicitly marked copy routines (preventing data-only attacks via kernel data structure corruption).
Meltdown/Spectre Impact on Address Translation: The Meltdown vulnerability (CVE-2017-5754) exploited the fact that out-of-order speculative execution could perform memory accesses that should have faulted, populating the cache with data from the target address. The primary mitigation, KPTI (Kernel Page Table Isolation), removes most kernel mappings from the user page tables, so every system call and interrupt switches CR3, which adds measurable latency (reduced where PCID is available). The Spectre v2 mitigations (IBRS/STIBP microcode, retpolines) target branch prediction rather than address translation.
Common Pitfalls / Anti-patterns
Pitfall: Assuming virtual address spaces are not fragmented. The virtual address space can become fragmented through repeated mmap/munmap cycles, leaving gaps between mapped regions. When a large contiguous allocation is requested (e.g., a 10 MB mmap), the allocator may fail with ENOMEM even though the total free bytes exceeds 10 MB — because the free memory is not contiguous.
Pitfall: Writing to memory-mapped files without understanding write-back behavior.
mmap with MAP_SHARED writes go to the page cache backing the underlying file. The write may not reach persistent storage immediately, because the OS batches writes for performance. For durability-critical data, use msync() to force writes to storage. Note that MAP_POPULATE only pre-faults pages at mapping time; it has no effect on durability.
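A minimal sketch of the durable-write pattern, assuming a POSIX system: write through a MAP_SHARED mapping, then call msync() with MS_SYNC before treating the data as stored. The helper name write_durably is hypothetical.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Write `len` bytes of `data` to `path` through a shared mapping and block
 * until the kernel has written them back. Returns 0 on success, -1 on error.
 * `len` must be greater than zero. */
int write_durably(const char *path, const char *data, size_t len) {
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return -1;
    if (ftruncate(fd, (off_t)len) != 0) { close(fd); return -1; }

    char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { close(fd); return -1; }

    memcpy(map, data, len);             /* lands in the page cache only */
    int rc = msync(map, len, MS_SYNC);  /* starts write-back and waits for it */

    munmap(map, len);
    close(fd);
    return rc;
}
```

The file is sized with ftruncate() before mapping because touching a mapped page beyond the end of the file raises SIGBUS.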
Pitfall: Confusing virtual memory with swap space. Virtual memory is the logical address space your OS presents. Swap is the backing store for anonymous pages not present in physical memory. A system can have abundant virtual address space but no free physical memory or swap, in which case the OOM killer is invoked. Virtual address space exhaustion is different from physical memory exhaustion.
Anti-pattern: Pointer arithmetic assuming 32-bit addressing on a 64-bit system.
On a 64-bit system, sizeof(void*) is 8 bytes. If you store a pointer in a 32-bit integer (uint32_t), you truncate the upper bits, silently corrupting the address. GCC’s -Wpointer-to-int-cast and -Wint-to-pointer-cast warnings catch casts between pointers and integers of a different size. When a pointer must be held in an integer, use uintptr_t from <stdint.h>.
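The truncation is easy to demonstrate. This sketch, which assumes a 64-bit target, compares a round trip through uint32_t with one through uintptr_t; the function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if storing `p` in a uint32_t and converting back would lose bits. */
bool truncates_in_32_bits(const void *p) {
    uintptr_t full = (uintptr_t)p;      /* wide enough for any object pointer */
    uint32_t narrow = (uint32_t)full;   /* keeps only the low 32 bits */
    return (uintptr_t)narrow != full;
}

/* Safe round trip: uintptr_t preserves every bit of the pointer. */
void *round_trip(void *p) {
    uintptr_t as_int = (uintptr_t)p;
    return (void *)as_int;
}
```

Typical stack and mmap addresses on x86-64 Linux sit far above 4 GB, so the truncating version fails for most real pointers rather than only in rare cases.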
Quick Recap Checklist
- A logical (virtual) address is what the CPU generates; a physical address is what DRAM accepts
- The MMU translates logical to physical addresses on every memory access, with TLB as the translation cache
- Page tables stored in physical memory map virtual pages to physical frames; the MMU walks them on TLB misses
- Segmentation (base+limit) is obsolete in modern OSes; paging provides finer-grained protection and eliminates external fragmentation
- ASLR randomizes the virtual address space layout to prevent exploits from hardcoding memory addresses
- TLB flushes on context switches (on systems without PCID/ASID support) and cross-core TLB shootdowns cause measurable latency spikes
- Page table bloat for large memory mappings can consume significant physical memory — huge pages mitigate this
- Virtual address space exhaustion (ENOMEM) is different from physical memory exhaustion (OOM killer invocation)
- Tools like /proc/PID/maps, /proc/PID/pagemap, and perf stat with TLB events reveal address translation behavior
- SMEP/SMAP prevent the kernel from executing or accessing user-space memory, closing exploit vectors
Interview Questions
The CPU's memory management unit raises a page fault exception (vector 0xE on x86). The OS page fault handler then determines whether the access targets a valid mapping whose page is simply not present (resolve it by zero-filling a new frame, mapping a page already in the page cache, or loading it from disk) or whether the access is invalid (an unmapped address or a protection violation, so the process receives SIGSEGV). For a valid access, the OS allocates a physical frame if needed, updates the page table entry to point to it, and resumes the interrupted instruction. If the access is invalid, the process receives a segmentation fault. The key point is that the faulting instruction is restartable: the CPU saves the faulting address in the CR2 register and preserves the instruction pointer so the handler can resolve the fault and retry.
The Translation Lookaside Buffer is a hardware cache of recent virtual-to-physical address translations. Since every memory access requires an address translation, and page table walks require accessing physical memory themselves, the TLB is critical for performance. A TLB hit resolves in a single cycle; a TLB miss requires the MMU to walk the multi-level page tables in physical memory — adding 10-100 cycles of stall time depending on whether the page table pages are cached in L1/L2 or must be fetched from DRAM. TLB misses are particularly expensive when they occur frequently during a critical loop — the page table walker consumes memory bandwidth that the program itself could use, creating pathological performance.
ASLR randomizes the base addresses of memory regions (stack, heap, libraries, mmap, VDSO) each time a process starts or the system boots. An attacker who previously could jump to a known address like 0x08048xyz (return-to-libc) or chain together known gadgets (ROP) now must discover the addresses at runtime — dramatically increasing exploit complexity. Limitations include: limited entropy on 32-bit systems (fewer bits to randomize), information leaks that reveal addresses despite ASLR (e.g., /proc filesystem disclosures, timing oracles), and attacks that don't need to know addresses (e.g., data-only attacks corrupting function pointers already in the heap). Combining ASLR with stack canaries, NX (no-execute) bits, and control flow integrity creates defense in depth.
A minor page fault occurs when a process accesses a valid virtual page that has no present page table entry, but the fault can be resolved without disk I/O: the data is already in the page cache, the page only needs to be zero-filled, or a copy-on-write copy must be made. A major page fault occurs when the page contents must be read from storage, either from swap or from the backing file; this requires actual disk I/O, adding millisecond-scale latency. The minflt column in ps output counts minor faults (e.g., when copy-on-write pages are faulted in after a fork), while majflt counts major faults (e.g., when a process accesses a page that was swapped out). A process with a persistently high majflt rate is usually under severe memory pressure.
In a fork(), the child process receives its own virtual address space whose page tables point at the parent's physical frames. Writable private pages are marked read-only in both processes, so both read the same physical pages. No page contents are copied at fork time; the kernel copies only the page tables and bookkeeping, so fork() is far cheaper than duplicating the parent's memory, although its cost still grows with the size of the parent's page tables. Physical memory is duplicated only when either process writes to a shared page: this is the copy-on-write mechanism. The write faults, the OS copies that page to a new frame, and it maps the new frame writable in the page table of the process that performed the write; the other process keeps the original frame. Until writes happen, the additional physical memory footprint of fork() is minimal. This optimization is critical for applications that fork() to run background tasks or implement pre-fork workers without paying for a full memory copy.
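The observable effect of copy-on-write is that a write in one process never shows up in the other. The sketch below demonstrates that isolation on a POSIX system; it cannot show the deferred copy itself, which happens inside the kernel. The function name is illustrative.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork, let the child overwrite `*value`, and return what the parent
 * sees afterwards. Returns -1 if fork or waitpid fails. */
int parent_value_after_child_write(int *value, int child_writes) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        *value = child_writes;  /* write fault: the child gets a private copy */
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return *value;              /* the parent's page was never modified */
}
```

Both processes use the same logical address for the variable; after the child's write, that address maps to two different physical frames.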
Segmentation divides memory into variable-sized segments based on program structure (code, stack, heap), each with a base and limit. Paging divides memory into fixed-size pages (typically 4 KB). Segmentation provides coarse-grained protection and a logical memory abstraction; paging provides fine-grained protection and eliminates external fragmentation. Pure segmentation suffers from external fragmentation (segments must be contiguous in physical memory). Pure paging eliminates external fragmentation but has internal fragmentation: on average about half a page (~2 KB with 4 KB pages) is wasted at the end of each mapped region. Modern OSes use flat segmentation (all segment bases=0, limits=max), which neutralizes segmentation as a translation mechanism while keeping segment registers such as %fs and %gs as base pointers for thread-local storage. Linux uses paging exclusively for virtual memory; segmentation is used only for specific purposes like %fs for thread-local storage.
The MMU first checks the TLB for a cached virtual-to-physical translation. On a TLB hit (the common case for hot data), the translation adds little or no latency. On a TLB miss, the MMU walks the multi-level page tables in physical memory. It reads the PML4 entry (via CR3), then the PDP entry, then the PD entry, then the PT entry, with each step requiring a physical memory access. If any entry along the way is not present, a page fault occurs. If all levels are present and the final PTE shows the page is in memory, the MMU constructs the physical address from the frame number and offset. The translation is then cached in the TLB for subsequent accesses. On x86-64 with 4-level paging, a TLB miss costs up to 4 additional memory accesses before the data can be fetched (fewer when upper-level entries are cached, more under nested virtualization).
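Each step of the four-level walk uses a different 9-bit slice of the virtual address as its table index. A sketch of that slicing for 48-bit addresses with 4 KB pages follows; the struct and function names are illustrative.

```c
#include <stdint.h>

struct va48_parts {
    unsigned pml4;    /* bits 47-39: index into the PML4 table */
    unsigned pdpt;    /* bits 38-30: index into the PDP table */
    unsigned pd;      /* bits 29-21: index into the page directory */
    unsigned pt;      /* bits 20-12: index into the page table */
    unsigned offset;  /* bits 11-0: byte offset within the page */
};

/* Split a 48-bit virtual address into the four table indices and offset. */
struct va48_parts split_va48(uint64_t vaddr) {
    struct va48_parts p;
    p.pml4   = (unsigned)((vaddr >> 39) & 0x1FF);
    p.pdpt   = (unsigned)((vaddr >> 30) & 0x1FF);
    p.pd     = (unsigned)((vaddr >> 21) & 0x1FF);
    p.pt     = (unsigned)((vaddr >> 12) & 0x1FF);
    p.offset = (unsigned)(vaddr & 0xFFF);
    return p;
}
```

Nine bits per level means 512 entries per table, which at 8 bytes each fills exactly one 4 KB page.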
vmalloc is a kernel API that allocates virtually contiguous but physically scattered memory from the vmalloc area (32 TB on x86-64 with 4-level paging). It is suitable for large kernel buffers (multi-MB) that do not require physically contiguous memory. mmap with MAP_ANONYMOUS is the user-space counterpart: it reserves a virtually contiguous range in the process's mmap region. Both return virtually contiguous memory. The key difference: vmalloc pages are not part of the kernel's direct-mapped physical range, so the kernel must set up page table entries for them at allocation time, whereas anonymous mmap pages are allocated lazily, one frame at a time from the buddy system, when the process first touches them. For DMA or I/O buffer allocations requiring physical contiguity, neither is suitable; you need alloc_pages() (buddy system) with a high-order allocation. For very large user-space allocations that do not need contiguity, mmap with MAP_ANONYMOUS is typically preferred for its simplicity.
A page table entry (PTE) on x86-64 contains: the Present bit (P, bit 0), set when the page is in physical memory; the Read/Write bit (R/W, bit 1), 0=read-only, 1=read-write; the User/Supervisor bit (U/S, bit 2), 0=kernel only, 1=user accessible; the Page Write-Through bit (PWT, bit 3) and Page Cache Disable bit (PCD, bit 4) for cache policy; the Accessed bit (A, bit 5), set by hardware on read or write; the Dirty bit (D, bit 6), set by hardware on write; the Page Size bit (PS, bit 7), set in page directory and PDPT entries that map 2 MB or 1 GB pages (in a 4 KB PTE the same bit position is PAT); the Global bit (G, bit 8), not flushed on context switch if CR4.PGE=1; the Physical Frame Number (PFN, bits 12-51), the upper bits of the physical address; and the No-Execute bit (NX, bit 63). The offset (bits 0-11) comes from the original virtual address and is not stored in the PTE.
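The bit positions above can be captured as masks. The sketch below decodes a raw 64-bit entry; the macro and function names are illustrative (the Linux kernel uses its own _PAGE_* definitions), and the NX bit at position 63 is included for completeness.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT  (1ULL << 0)
#define PTE_WRITABLE (1ULL << 1)
#define PTE_USER     (1ULL << 2)
#define PTE_ACCESSED (1ULL << 5)
#define PTE_DIRTY    (1ULL << 6)
#define PTE_NX       (1ULL << 63)
#define PTE_PFN_MASK 0x000FFFFFFFFFF000ULL  /* bits 12-51 */

/* Extract the physical frame number stored in the entry. */
uint64_t pte_frame_number(uint64_t pte) {
    return (pte & PTE_PFN_MASK) >> 12;
}

/* Would a user-mode write through this entry be permitted? */
bool pte_allows_user_write(uint64_t pte) {
    uint64_t need = PTE_PRESENT | PTE_WRITABLE | PTE_USER;
    return (pte & need) == need;
}

/* Physical address = frame bits from the entry + offset from the vaddr. */
uint64_t pte_physical_address(uint64_t pte, uint64_t vaddr) {
    return (pte & PTE_PFN_MASK) | (vaddr & 0xFFF);
}
```

Masking with PTE_PFN_MASK before combining matters because the flag bits, including NX at the top, would otherwise leak into the address.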
A page fault occurs when the accessed virtual address is valid but the page is not resident in physical memory (not present, swapped out, or needs to be loaded). The OS can resolve a page fault by allocating a frame and resuming the process. A segmentation fault (SIGSEGV) occurs when the address is not within any valid virtual memory area (VMA), i.e., the address was never mapped at all, or when the access type violates the page's protection bits (e.g., writing to a read-only page). At the hardware level both begin as the same page fault exception; the OS handler makes the distinction. The key difference: a page fault is recoverable, because the OS can load the missing page and the process continues. A segmentation fault is not: the address is invalid or the access is forbidden, and the only options are to handle the signal or terminate the process. Dereferencing a NULL pointer, writing to read-only memory, or accessing an unmapped region through a stale pointer typically causes SIGSEGV.
Virtual memory is the logical address space presented to each process; on Linux/x86-64 with 48-bit addresses this is 256 TB. Swap space is a disk partition or file that backs anonymous pages that have been evicted from physical memory. When physical memory fills up, the OS runs a page replacement algorithm (in Linux, an LRU approximation built on active and inactive lists) to select victim pages. If a victim page is clean and file-backed (matches its on-disk copy), it is simply dropped. A dirty file-backed page is written back to its file, and an anonymous page is written to the swap device, before the frame can be reclaimed. For a swapped-out page, the kernel stores the swap location in the now non-present page table entry, reusing the bits that normally hold the frame number. When the process later accesses that page, a page fault triggers, the OS reads the page back from swap, and the process resumes transparently. Virtual memory can exceed physical memory because not all virtual pages are resident simultaneously: only the active working set needs to be in RAM.
KASLR (Kernel Address Space Layout Randomization) randomizes the base address of the kernel text segment at boot time. Without KASLR, a kernel stack overflow exploit could jump to known addresses like 0xffffffff81000000 (the fixed kernel base on x86_64). With KASLR, the kernel base is offset by a random value each boot, making such hardcoded jumps unreliable. KASLR breaks ROP (Return-Oriented Programming) attacks that rely on known code gadget addresses. However, KASLR has limited entropy (on x86-64 the kernel text base is chosen from a few hundred 2 MB-aligned slots) and can be bypassed via information leaks (e.g., /proc/kallsyms disclosure, timing oracles, or Spectre-style side channels). KASLR is complementary to other protections: SMEP prevents executing user code from kernel mode, SMAP prevents the kernel from accessing user data, and KPTI (Kernel Page Table Isolation) separates user and kernel page tables.
A TLB shootdown occurs when the OS changes or removes a mapping (munmap, mprotect, page migration, copy-on-write) and must invalidate stale TLB entries on every core that may have cached them, which requires IPIs (Inter-Processor Interrupts) to those cores. Context switches are a related but separate cost: without PCID/ASID support (older hardware), loading a new page table root into CR3 flushes all non-global entries, and when the new process runs, every memory access causes a TLB miss until its working set is re-populated. On large working sets, this refill cost can be significant. With PCID, entries are tagged with an address space identifier, so entries from different processes can coexist and a context switch no longer needs a full flush. Shootdowns for mapping changes are still required. Under heavy fork/exec activity, the aggregate TLB shootdown cost can consume 10-20% of CPU time on many-core systems.
A heap is the dynamic memory area used for runtime allocations. In user space, the classic heap grows upward from the end of the BSS segment via brk(). The kernel has no brk()-style heap: it layers slab caches (used by kmalloc) on top of the page-level buddy allocator, with vmalloc providing virtually contiguous ranges. A general-purpose allocator manages free blocks using data structures (free lists, bitmaps, or trees) and uses placement strategies such as first-fit, best-fit, or worst-fit to select a block. Fragmentation occurs when free blocks are scattered: enough memory is free in total, but no single free block is large enough for the request. Slab allocators address this for objects of fixed size (like inodes or task structs) by maintaining per-size caches, which largely avoids external fragmentation within a cache. Buddy allocation handles page-level requests by splitting larger blocks in half until the requested size is met, keeping free blocks in power-of-two sizes that can be efficiently merged (coalesced) on freeing.
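The power-of-two bookkeeping behind buddy allocation comes down to two small calculations: which order satisfies a request, and where a block's buddy lives. This is a sketch of that arithmetic only, not the kernel's allocator; the function names are illustrative, and block positions are expressed as page indices aligned to the block size.

```c
/* Smallest order such that a block of 2^order pages holds `pages` pages. */
unsigned buddy_order(unsigned long pages) {
    unsigned order = 0;
    unsigned long block = 1;
    while (block < pages) {
        block <<= 1;
        order++;
    }
    return order;
}

/* Page index of the buddy of the block starting at `page_index`.
 * The two halves of a split block differ only in bit `order`. */
unsigned long buddy_of(unsigned long page_index, unsigned order) {
    return page_index ^ (1UL << order);
}
```

Because a buddy's position is a single XOR away, checking whether two free blocks can be merged does not require searching the free lists.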
kmalloc allocates from the kernel's direct-mapped linear address range (the Normal zone on x86). The returned addresses are virtually and physically contiguous, which matters for DMA to devices that require physically contiguous buffers. Small requests are served from SLUB slab caches; the maximum size depends on kernel configuration and is bounded by the largest buddy allocator order (a few MB on typical x86-64 builds).
vmalloc allocates from the vmalloc area (VMALLOC_START to VMALLOC_END, a 32 TB range on x86-64 with 4-level paging). The returned addresses are contiguous in virtual address space but the underlying physical pages are non-contiguous. This makes vmalloc suitable for large buffers (multi-MB) that don't require physical contiguity, such as module code and large I/O buffers. However, vmalloc has higher overhead: it must set up page tables for each page in the allocation, and because it may sleep it cannot be called from atomic context. kmalloc with GFP_ATOMIC can be called from interrupt context; vmalloc cannot.
Using 2 MB huge pages for a database buffer pool reduces TLB pressure dramatically: a 16 GB buffer pool needs 8,192 translations with 2 MB pages vs 4,194,304 with 4 KB pages, so a far larger fraction of the pool is covered by the few thousand entries of a typical server TLB. It also reduces page table memory overhead. However, huge pages increase internal fragmentation: each region is rounded up to a whole number of huge pages, so the waste is up to 2 MB per region with 2 MB pages, and with 1 GB pages a 1.1 GB region occupies 2 GB (about 0.9 GB wasted). With 4 KB pages, waste is ~2 KB on average per region. Additionally, explicitly reserved huge pages are locked in memory: they cannot be swapped out under memory pressure, which can cause OOM issues for other workloads. PostgreSQL's huge_pages setting defaults to try, falling back to 4 KB pages if huge page allocation fails (common on fragmented systems). The tradeoff is measured: 5-20% throughput improvement vs flexibility and fragmentation control.
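The translation counts and rounding waste discussed above follow from simple integer arithmetic. A minimal sketch, with illustrative function names:

```c
#include <stdint.h>

/* Number of pages (and therefore translations) needed to map `bytes`. */
uint64_t pages_needed(uint64_t bytes, uint64_t page_bytes) {
    return (bytes + page_bytes - 1) / page_bytes;  /* round up */
}

/* Bytes lost to rounding the region up to a whole number of pages. */
uint64_t rounding_waste(uint64_t bytes, uint64_t page_bytes) {
    return pages_needed(bytes, page_bytes) * page_bytes - bytes;
}
```

Running both functions with the candidate page sizes for a given region makes the tradeoff explicit: the larger page size cuts the translation count by the ratio of the page sizes, while the worst-case waste grows to just under one large page.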
In ring 0 (kernel mode), the CPU can access all pages, including those with the User/Supervisor (U/S) bit set to 0 (kernel-only pages), subject to SMEP and SMAP restrictions on user pages. In ring 3 (user mode), the CPU respects the U/S bit: attempting to access a kernel-only page triggers a page fault, which the OS delivers to the process as SIGSEGV. The privilege level also determines which page table root is loaded in CR3 when the OS uses KPTI (Kernel Page Table Isolation, enabled by default on affected CPUs since the Meltdown mitigations). With KPTI, user processes run on page tables that map only a minimal part of the kernel; entering kernel mode switches CR3. Without KPTI, both share the same page tables and the U/S bit is the only protection. The NX (no-execute) bit adds execution control on top of the R/W protection.
When the combined working set of all runnable processes exceeds physical memory, the OS enters a thrashing state. Each page fault triggers disk I/O to load a page from swap, which slows all processes: more page faults generate more I/O, which generates more latency for everyone. The OS page replacement algorithm (an LRU approximation in Linux) scans pages to find victims, writing dirty anonymous pages to swap and dropping clean file pages. If a process's working set is larger than physical memory, it will spend most of its time waiting for page-in operations. The solution is to add physical RAM, reduce the number of runnable processes (cgroup limits, container memory limits), tune swappiness to prefer file cache eviction over anonymous page eviction, or use huge pages to reduce the page fault rate per unit of working set.
The randomize_va_space sysctl controls ASLR strength: 0 = off, 1 = stack, mmap base, vDSO, and shared libraries randomized, 2 = additionally the heap (brk area) randomized. You verify it with cat /proc/sys/kernel/randomize_va_space (should be 2 in production). To verify ASLR is actually working, you can run a program multiple times and compare addresses: cat /proc/self/maps or ldd (for library addresses) will show different base addresses on each run if ASLR is active. You can also use readelf -h to check whether a binary is compiled as PIE (a Type of DYN indicates a position-independent executable), which is required for full address space randomization including the main executable. Tools like checksec.sh report PIE, stack canaries, and NX bit status for binaries. ASLR is most effective on 64-bit systems where entropy is high; on 32-bit, entropy is limited to ~16 bits for stack and ~8 bits for library base.
Memory-mapped I/O devices (e.g., NICs, GPUs) are assigned ranges in the physical address space by the firmware (ACPI, device tree). The OS maps these device regions into the kernel page tables (not user page tables) using ioremap or equivalent APIs. Unlike DRAM, these regions are normally mapped uncached, and accesses have side effects, so drivers use dedicated accessors (readl()/writel() in Linux) and explicit memory barriers (mb(), wmb()) instead of plain pointer dereferences. The OS marks these pages as uncacheable (PCD bit in the PTE) or uses write-combining (WC) to improve throughput. User-space programs cannot directly access device memory by default: they go through the kernel via read/write syscalls or mmap on a device file (/dev/mem is restricted). The key constraint is that device regions are not swappable and cannot be moved; they are fixed physical addresses reserved by the hardware.
Conclusion
The distinction between logical and physical addresses is fundamental to how modern operating systems provide memory isolation, virtual memory, and security boundaries. The MMU translates every logical address through the TLB and page tables before any physical memory access occurs. This translation is invisible to programs but critical to system stability and security.
Logical address spaces on 64-bit systems are vast (256 TB on Linux/x86-64), but physical memory is finite and shared across all processes. The OS manages this illusion through multi-level page tables, which map virtual pages to physical frames only when needed. Copy-on-write optimizes fork() by deferring physical memory copies until writes occur. ASLR randomizes the layout of memory regions to prevent exploits from hardcoding addresses.
Understanding address translation helps when debugging crashes, optimizing for cache and TLB behavior, and reasoning about security boundaries. For your next step, explore paging and page tables to understand the data structures that make this translation possible, or virtual memory to see how the OS uses disk as a backing store when physical memory is exhausted.