Distributed Operating Systems

Explore distributed file systems, RPC mechanisms, cluster scheduling, and the fundamental concepts behind modern distributed operating systems.

published: reading time: 20 min read author: GeekWorkBench

Introduction

Distributed operating systems represent a fundamental shift from single machine systems: instead of one computer doing all the work, multiple independent machines coordinate to provide unified services. The operating system abstracts away the network boundary, making distributed resources appear local to applications. This is conceptually elegant but implementationally challenging—network delays, partial failures, and consistency trade-offs create complexity that doesn’t exist in single machine systems.

Modern data centers run distributed systems at every layer: HDFS and Ceph for storage, Kubernetes for orchestration, ZooKeeper for coordination, and custom RPC frameworks for service communication. Understanding distributed OS concepts is essential for anyone building scalable, reliable infrastructure.

When to Use / When Not to Use

Distributed systems are appropriate when:

  • Horizontal scaling is required — Workload exceeds single-machine capacity
  • High availability is mandatory — Applications must survive machine failures
  • Geographic distribution is needed — Users span multiple regions with latency requirements
  • Different components have different requirements — Specialized machines for compute vs storage

Distributed systems are NOT appropriate when:

  • Simple CRUD operations dominate — A single PostgreSQL instance handles millions of queries
  • ACID transactions are critical — Distributed transactions are slow; consider data locality first
  • Team lacks operational expertise — Distributed systems fail in spectacular ways; operators must understand them
  • Latency is ultra-critical — Network round-trips add milliseconds that may be unacceptable

Architecture or Flow Diagram

flowchart TB
    subgraph "Client Layer"
        APP1[Application]
        APP2[Application]
        APP3[Application]
    end

    subgraph "RPC / API Gateway"
        LB[Load Balancer]
        GW[API Gateway]
    end

    subgraph "Service Mesh"
        S1[Service A<br/>Replica 1]
        S2[Service A<br/>Replica 2]
        S3[Service B<br/>Replica 1]
        S4[Service B<br/>Replica 2]
    end

    subgraph "Storage Layer"
        NFS[NFS Server]
        GFS[GFS2 / Ceph Cluster]
        ZK[ZooKeeper<br/>Quorum]
    end

    subgraph "Scheduling"
        K8S[Kubernetes<br/>Control Plane]
        MESOS[Mesos<br/>Master]
    end

    APP1 --> LB
    APP2 --> LB
    APP3 --> GW
    LB --> S1
    LB --> S2
    GW --> S3
    GW --> S4
    S1 --> NFS
    S3 --> GFS
    S2 --> ZK
    S4 --> ZK
    K8S -.-> S1
    K8S -.-> S2
    K8S -.-> S3
    K8S -.-> S4

    style GFS stroke:#ff6b6b,stroke-width:3px
    style ZK stroke:#ffa94d,stroke-width:3px

Core Concepts

Distributed File Systems

NFS (Network File System)

NFS is the classic distributed file system, dating to 1984. It provides transparent file access over the network:

# Mount NFS share on client
mount -t nfs4 -o rw,sync server.example.com:/shared /mnt/nfs

# /etc/exports on server
# /shared *(rw,sync,no_subtree_check,no_root_squash)
# Root squashing maps root user to anonymous UID for security

# Automount with autofs
# /etc/auto.master
# /mnt/nfs /etc/auto.nfs --timeout=60

# /etc/auto.nfs
# shared -rw,sync server.example.com:/shared

NFSv4 uses a stateful protocol with better performance but requires careful lock management. The sync mount option writes data to disk before responding; async is faster but risks data loss on server crash.

GFS2 (Global File System 2)

GFS2 is a clustered file system for shared storage:

# Create GFS2 filesystem
mkfs.gfs2 -p lock_dlm -j 3 -t cluster_name:vol_name /dev/sdb

# Mount GFS2 filesystem (on each node)
mount -t gfs2 -o noatime /dev/sdb /mnt/gfs2

# Check lock status
gfs_tool lockdump /mnt/gfs2
gfs_tool gettune /mnt/gfs2

GFS2 requires shared storage (SAN) and a working cluster manager (Corosync + Pacemaker). It provides coherent file system access across all nodes—writes are immediately visible to all cluster members.

Remote Procedure Call (RPC)

RPC abstracts network communication into function calls:

// Protocol buffer definition (gRPC style)
syntax = "proto3";

package calculator;

service Calculator {
    rpc Add(AddRequest) returns (AddResponse);
    rpc StreamSum(StreamRequest) returns (stream StreamResponse);
}

message AddRequest {
    int32 a = 1;
    int32 b = 2;
}

message AddResponse {
    int32 result = 1;
}

// Generated server stub (Python example)
import grpc
from concurrent import futures
import calculator_pb2 as pb2
import calculator_pb2_grpc as pb2_grpc

class CalculatorServicer(pb2_grpc.CalculatorServicer):
    def Add(self, request, context):
        result = request.a + request.b
        return pb2.AddResponse(result=result)

    def StreamSum(self, request_iterator, context):
        total = 0
        for req in request_iterator:
            total += req.value
            yield pb2.StreamResponse(running_total=total)

# Server setup
server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
pb2_grpc.add_CalculatorServicer_to_server(CalculatorServicer(), server)
server.add_insecure_port('[::]:50051')
server.start()

Cluster Scheduling

Kubernetes Scheduling

Kubernetes schedules pods across nodes based on resources and constraints:

apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
                - key: app
                  operator: In
                  values:
                    - my-app
            topologyKey: kubernetes.io/hostname
  containers:
    - name: my-container
      image: my-image:latest
      resources:
        requests:
          memory: "128Mi"
          cpu: "250m"
        limits:
          memory: "256Mi"
          cpu: "500m"
      nodeSelector:
        disktype: ssd
      tolerations:
        - key: "node.kubernetes.io/not-ready"
          operator: "Exists"
          effect: "NoSchedule"

Mesos Architecture

Mesos uses a two-level scheduling architecture:

# Mesos executor framework (Python example)
from mesos.interface import mesos_pb2
import mesos.native as mesos

class MyTask:
    def __init__(self, task_id, command):
        self.task_id = task_id
        self.command = command

def reregistered(driver, frameworkId):
    print(f"Re-registered: {frameworkId.value}")

def resourceOffers(driver, offers):
    for offer in offers:
        if len(tasks) > 0:
            task = tasks.pop(0)
            task.mesos_task_id.value = str(task_id_counter)
            driver.launchTasks(offer.id, [task])
        else:
            driver.declineOffer(offer.id)

driver = mesos.MesosExecutorDriver(
    mesos.Executor(),
    framework=mesos_pb2.FrameworkInfo()
)
driver.run()

Production Failure Scenarios

Scenario 1: Split-Brain in Clustered Storage

Problem: Network partition causes both sides of partition to think they have primary storage access, leading to data corruption.

Mitigation:

  • Use quorum-based fencing (STONITH—Shoot The Other Node In The Head)
  • Configure watchdog timers appropriately
  • Test network partitions regularly withchaos engineering
  • Use journaling file systems that detect split-brain

Scenario 2: NFS Server Becoming Unavailable

Problem: Applications hang when NFS server fails, causing cascading timeouts.

Mitigation:

# Use soft mounts with timeout for less critical data
mount -t nfs4 -o soft,timeo=30,retrans=3 server:/share /mnt/nfs

# Use hard,intr for critical data (can be interrupted)
mount -t nfs4 -o hard,intr,timeo=600 server:/share /mnt/nfs

# Enable automatic unmount on server loss
umount -l /mnt/nfs  # Lazy unmount

Scenario 3: Clock Skew in Distributed Systems

Problem: Nodes disagree on time due to NTP misconfiguration, causing certificate validation failures and cache inconsistencies.

Mitigation:

  • Use NTP with multiple upstream sources
  • Configure acceptable drift thresholds
  • Use logical timestamps (Lamport clocks or vector clocks) for event ordering
  • In critical systems, use PTP (Precision Time Protocol) for sub-millisecond accuracy

Trade-off Table

AspectNFSGFS2HDFSCephFS
Consistency ModelWrite-throughCluster-coherentWrite-onceStrong eventual
Setup ComplexityLowMediumHighHigh
Storage TypeAny sharedShared SAN onlyDistributedAny
Max ScaleSingle serverFew dozen nodesThousandsThousands
Use CaseGeneral purposeHA workloadsBig data analyticsCloud native

Implementation Snippet: Raft-based RPC Framework

Building a fault-tolerant RPC layer:

import hashlib
import time
from dataclasses import dataclass
from typing import Optional, Dict, Any

@dataclass
class LogEntry:
    term: int
    index: int
    command: Dict[str, Any]

class RaftNode:
    def __init__(self, node_id: str, peers: list):
        self.node_id = node_id
        self.peers = peers
        self.current_term = 0
        self.voted_for: Optional[str] = None
        self.log: list[LogEntry] = []
        self.commit_index = 0

    def request_vote(self, candidate_id: str, last_log_index: int,
                     last_log_term: int) -> bool:
        if self.current_term > candidate_id.term:
            return False
        if self.voted_for is None or self.voted_for == candidate_id:
            if last_log_term > self.get_last_log_term():
                self.voted_for = candidate_id
                return True
        return False

    def append_entries(self, leader_id: str, entries: list[LogEntry],
                       prev_log_index: int, prev_log_term: int) -> bool:
        if prev_log_index > len(self.log):
            return False
        # Consistency check
        if prev_log_index > 0 and self.log[prev_log_index - 1].term != prev_log_term:
            self.log = self.log[:prev_log_index - 1]
            return False
        self.log.extend(entries)
        return True

    def get_last_log_term(self) -> int:
        if self.log:
            return self.log[-1].term
        return 0

Observability Checklist

For distributed systems, monitor:

  • End-to-end latency — Distributed tracing (Jaeger, Zipkin)
  • Service mesh metrics — Envoy proxy metrics, circuit breakers
  • Storage latency — NFS mount latency, Ceph OSD throughput
  • Quorum health — ZooKeeper znode count, Raft leader elections
  • Network partitioning — Detector mechanisms, split-brain events
  • Clock skew — NTP offset measurements across nodes

Common Pitfalls / Anti-Patterns

  • Kerberos for NFSv4 — Prevents unauthorized mount without proper ticket
  • mTLS in service mesh — Istio/Linkerd provide automatic encryption and authentication
  • RBAC for Kubernetes — Principle of least privilege for service accounts
  • Secrets management — Use Vault or Kubernetes secrets, not environment variables
  • Audit logging — Distributed traces should include authentication context

Common Pitfalls / Anti-patterns

  1. Ignoring network partitions — Assuming the network is reliable is the most common distributed systems mistake
  2. Not planning for partial failure — Every remote call can fail; timeout and retry logic is mandatory
  3. Over-engineering consistency — Using distributed transactions when simpler eventual consistency would work
  4. Forgetting about operator complexity — Running a distributed system requires expertise that small teams may lack
  5. Treating debugging as impossible — Distributed tracing makes debugging tractable; instrument early

Quick Recap Checklist

  • Distributed systems trade single-system simplicity for scale and fault tolerance
  • CAP theorem forces explicit trade-offs between consistency and availability
  • NFS provides simple networked file access; GFS2 provides cluster-coherent storage
  • RPC frameworks abstract network communication but not its failure modes
  • Cluster schedulers (Kubernetes, Mesos) provide resource abstraction across machines
  • Monitor end-to-end latency, quorum health, and clock synchronization
  • Design assuming partial failure—every remote call can fail

Real-World Case Study: Netflix’s Distributed System Architecture

Netflix operates one of the largest distributed systems in the world, serving 200+ million subscribers. Their architecture demonstrates key distributed OS concepts:

  1. Service Discovery: Eureka provides dynamic service registration and location, replacing static configurations
  2. Data Storage: Cassandra handles global data replication across regions with tunable consistency
  3. Coordination: ZooKeeper (now etcd) manages configuration and leader election for critical services
  4. Traffic Management: Zuul provides dynamic routing, load balancing, and circuit breaking
  5. Asynchronous Communication: Kafka provides durable, ordered message streams between services

Their approach to distributed systems embraces eventual consistency where appropriate (user preferences can be slightly stale) while enforcing strong consistency for billing and playback authorization.

Advanced Topic: Sharding Strategies and Trade-offs

Sharding (horizontal partitioning) distributes data across multiple nodes but introduces complexity:

Hash-based sharding: Route by hash(key) % num_nodes

  • Even distribution, but adding nodes requires remapping all data
  • Use consistent hashing to minimize remapping

Range-based sharding: Route by key ranges (a-m, n-z)

  • Natural for time-series data, but can create hotspots
  • Example: logs organized by date ranges

Geo-based sharding: Route by user region (US, EU, APAC)

  • Low latency for local access, but cross-region queries are expensive
  • Use replication for global reads

Trade-offs: Sharding improves write scalability but complicates queries that span shards (joins, aggregates). Consider whether your workload actually needs sharding—many workloads that scale to a single PostgreSQL instance with read replicas avoid sharding complexity entirely.

Interview Questions

1. What is the difference between Lamport clocks and vector clocks?

Lamport clocks are single integer counters incremented on each event; they establish partial ordering but cannot determine causality. If L(a) < L(b), we know a happened before b, but not vice versa. Vector clocks are arrays of counters, one per node; they can detect causality (if all entries in VC(a) are ≤ VC(b), then a could have caused b). Vector clocks enable better conflict detection in eventually consistent systems like DynamoDB or Cassandra.

2. Explain how two-phase commit (2PC) works and why it is problematic in distributed systems.

In 2PC, a coordinator asks all participants to "prepare" (vote yes/no). If all vote yes, coordinator sends "commit." If any votes no, coordinator sends "rollback." Problems: (1) The coordinator is a single point of failure—if it crashes after "prepare" but before "commit," participants block indefinitely. (2) Network partitions can leave the system in an ambiguous state. (3) Holding locks during the "prepare" phase blocks other transactions. Alternatives include SAGA pattern (for long transactions) or accepting eventual consistency.

3. What is fencing in distributed storage and why is it necessary?

Fencing prevents a node from accessing shared storage after it has been declared "dead" by the cluster manager. Without fencing, a supposedly-dead node might still have I/O queued and corrupt data when it recovers. Fencing techniques include: persistent reservation (SCSI persistent reservations), power fencing (IPMI/PDU), and network fencing (disabling switch ports). STONITH ("Shoot The Other Node In The Head") is the most common fencing strategy in pacemaker/corosync clusters.

4. How does leader election work in ZooKeeper?

ZooKeeper uses ZAB (ZooKeeper Atomic Broadcast) for leader election and consensus. Nodes start as LOOKING state, request votes from peers, and whoever gets a quorum (majority) becomes leader. The leader then broadcasts new epochs; followers sync with the leader before becoming operational. If the leader fails, followers timeout and restart election. This requires a minimum of 3 nodes (for majority quorum) and odd numbers to avoid split-brain.

5. What are the trade-offs between synchronous and asynchronous replication?

Synchronous replication waits for all replicas to acknowledge writes before returning success—guaranteeing durability but adding latency equal to the slowest replica's round-trip. Asynchronous replication acknowledges writes immediately after local storage, accepting potential data loss if the primary fails before replication completes—much lower latency but weaker durability guarantees. Semi-synchronous waits for at least one replica, balancing both concerns. MongoDB's majority read concern and PostgreSQL's synchronous commit are examples of these different approaches.

6. What is the CAP theorem and how does it affect distributed system design?

CAP theorem states you can have at most two of three: Consistency (all nodes see the same data), Availability (every request gets a response), and Partition tolerance (system continues despite network partitions). Since partitions will happen, you must choose: CP systems sacrifice availability (may return errors during partitions)—etcd, ZooKeeper. AP systems sacrifice consistency (return stale data during partitions)—Cassandra, DynamoDB. Understanding which trade-off your workload tolerates guides architectural decisions.

7. What is a CRDT and when would you use it?

CRDT (Conflict-free Replicated Data Type) is a data structure that can be merged automatically without coordination between replicas. Examples include G-Counter (grow-only counter), LWW-Register (last-write-wins), and OR-Set (observed-remove set). Use CRDTs when: (1) multiple nodes can make concurrent updates without coordination, (2) eventual consistency is acceptable, (3) you want to avoid distributed transactions. CRDTs trade semantic richness for automatic convergence—they work for counters and registers but not for complex transactional operations.

8. How does Raft consensus algorithm handle node failures?

Raft handles failures through: (1) Leader election—if a follower doesn't receive heartbeats from the leader within election timeout, it becomes candidate and requests votes. The candidate with quorum becomes leader. (2) Log replication—the leader appends entries to its log and replicates to followers via AppendEntries RPCs. Entries are committed once majority of nodes have persisted them. (3) Membership changes—joint consensus allows adding/removing nodes safely without downtime. If a leader fails, followers timeout and restart election cycle.

9. What is the difference between eventual consistency and strong consistency?

Eventual consistency guarantees that if no new updates are made, all replicas will eventually return the same value—reads may return stale data during the convergence window. Strong consistency guarantees that every read returns the most recent write or an error—reads always see the latest committed data. Examples: DynamoDB (eventual), Google Spanner (strong). Trade-offs: eventual consistency offers lower latency and higher availability; strong consistency provides predictable behavior at the cost of higher latency and potential unavailability during partitions.

10. What is consistent hashing and why is it used in distributed systems?

Consistent hashing maps both data keys and server nodes onto a hash ring (e.g., 0 to 2^32). Each key is stored on the first node clockwise from its hash position. When a node is added/removed, only keys mapped to nearby nodes move—not all keys like traditional hashing. This provides: (1) minimal data movement during scaling, (2) uniform load distribution, (3) simplified cache distribution (CDNs). Variants with virtual nodes improve distribution evenness by mapping each physical node to multiple positions on the ring.

11. How does the Gossip protocol work for state dissemination in distributed systems?

Gossip protocols spread information peer-to-peer with epidemic-style dissemination. Each node periodically picks random peers to exchange state with, and corrupted state converges exponentially fast. Unlike consensus protocols, gossip works async and eventual-consistent, making it resilient to network partitions and node failures. Cassandra uses gossip for membership (each node periodically exchanges cluster state with 3-5 peers), and Consul uses it for service discovery catalog replication. Gossip is eventually consistent but not linearizable.

12. What is the difference between a hash ring and virtual nodes in consistent hashing?

A basic hash ring maps each physical node to one position on the ring. Problems: (1) Uneven load distribution if nodes have different capacities, (2) Large data movement when adding/removing nodes. Virtual nodes map each physical node to multiple positions (e.g., 100-200 hash values), giving finer-grained distribution. If one physical node has 2x capacity, assign it 2x virtual nodes. When a node fails, its virtual nodes are distributed across multiple successors, and adding a node only requires remapping a fraction of keys. This approach balances load and minimizes reorganization.

13. What is MapReduce and how does it relate to distributed OS concepts?

MapReduce is a programming model for processing large datasets across distributed nodes. The map phase filters and transforms input data, producing intermediate key/value pairs. The reduce phase merges all values associated with the same key. Under the hood, MapReduce relies on distributed OS concepts: (1) Task scheduling across nodes (like YARN), (2) Data locality awareness to schedule compute near stored data, (3) Fault tolerance through task retry and speculative execution, (4) Network serialization of intermediate results. Hadoop MapReduce demonstrates how distributed frameworks abstract away the complexity of coordinating hundreds of nodes.

14. How does DynamoDB achieve both availability and eventual consistency?

DynamoDB uses a synthesis of techniques: (1) Quorum reads/writes — R + W > N for strong consistency, R + W <= N for eventual. Typical config: N=3, R=2, W=2. (2) Vector clocks for causality — tracks which version came from which write, enabling merge during reads. (3) Hinted handoff — if a replica is temporarily down, another accepts writes and later hands them off. (4) Merkle tree synchronization — background reconciliation detects divergence. This design deliberately sacrifices linearizability to achieve availability during partitions — the CAP trade-off made explicit.

15. What is the role of Chubby in Google's distributed architecture?

Chubby is a distributed lock service providing coarse-grained advisory locks and reliable storage. It uses Paxos consensus for consistency and stores lock ownership and small data files in a replicated BerkeleyDB. Use cases: (1) Leader election for Bigtable, (2) Namespace locking for GFS, (3) Service discovery via DNS-like name lookups, (4) Configuration storage. Chubby's design philosophy: prefer reliability and simplicity over performance — don't use it for frequently updated data. It's been largely replaced by Zookeeper in open-source stacks, but the design influenced etcd and Consul.

16. What are Bloom filters and how are they used in distributed systems?

A Bloom filter is a probabilistic data structure that answers "is this item in the set?" with possible false positives but no false negatives. It uses k hash functions mapping to a bit array of size m. Adding an item sets k bits; checking queries whether all k bits are set. False positive rate: (1 - e^(-kn/m))^k. In distributed systems: (1) Membership checks — Cassandra uses Bloom filters to check if SSTable might contain a key before accessing disk. (2) Routing optimization — Google uses Bloom filters for approximating set membership in query servers. (3) Cache filtering — avoid caching negative results. Trade-off: memory-efficient but introduces small error rate acceptable for many use cases.

17. How does Merkle tree synchronization work in distributed storage?

A Merkle tree is a hash tree where each leaf is the hash of a data block, and each internal node is the hash of its children. Comparison: compute root hashes of two replicas, if different, traverse to find the subtrees that differ — minimizing data transfer. Amazon DynamoDB uses Merkle trees for anti-entropy (background reconciliation) between replicas. When synchronizing large datasets, Merkle trees reduce comparison complexity from O(n) to O(log n) storage nodes. Cassandra repairs using Merkle trees via nodetool repair. The tree structure enables efficient identification of divergent ranges without comparing every record.

18. What is the difference between optimistic and pessimistic concurrency control in distributed systems?

Pessimistic control assumes conflicts will happen and prevents them by holding locks or serializing access — uses two-phase locking (2PL), where locks are held until transaction commits. Pros: no aborts, predictable latency. Cons: high contention reduces throughput, deadlocks require detection/rollback.

Optimistic control assumes conflicts are rare, allows concurrent execution, and validates at commit time (OCC — optimistic concurrency control). If validation fails (another transaction modified the same data), abort and retry. Pros: high throughput under low contention. Cons: wasted work on aborts, latency from retry. DynamoDB uses optimistic locking via version numbers; Percolator (Google's transaction system) uses timestamp ordering for distributed transactions. Choose based on conflict rates: high contention favors pessimistic; low contention favors optimistic.

19. How does vector clock truncation work and what are the trade-offs?

Vector clocks grow unboundedly — each node's array expands with every causal event. Truncation strategies: (1) Timestamp-based — drop entries older than a threshold (e.g., 2 weeks). (2) Version-based — keep only last N entries per node. (3) Epoch expiration — partition history by epochs, only keep last epoch. Trade-offs: truncation loses ability to determine causality for older events (may create false causality), but unbounded growth would eventually exhaust storage. Cassandra uses version-based truncation (configurable per table). Some systems switch to hybrid approaches (vector clocks for recent events, hash-based for older) after a threshold.

20. What is the difference between eventual linearizability and eventual consistency?

Eventual consistency: if no new updates, all replicas converge to same value eventually. No ordering guarantee — reads may see values out of chronological order.

Eventual linearizability: each operation appears atomic and takes effect at some point between invocation and response, and all clients see operations in the same total order. It's linearizability applied eventually — operations are eventually consistent but maintain the illusion of sequential consistency when observed. Google Spanner achieves this with globally synchronized commit timestamps via TrueTime (GPS + atomic clocks). This matters for operations like compare-and-swap where the "sequence" of operations affects correctness — eventual linearizability guarantees that if A completes before B starts, B sees A's result.

Further Reading


Conclusion

Distributed operating systems coordinate multiple independent machines to provide unified services, trading single-system simplicity for scale and fault tolerance. The CAP theorem forces explicit trade-offs between consistency and availability—understanding your workload’s tolerance for staleness guides architectural decisions.

Key components include distributed file systems (NFS for general purpose, GFS2 for cluster-coherent storage), RPC frameworks that abstract network communication, and cluster schedulers like Kubernetes and Mesos that provide resource abstraction across machines. Design assuming partial failure—every remote call can fail, network partitions can occur, and clocks can skew.

For continued learning, explore distributed consensus algorithms (Raft, Paxos), eventual consistency patterns (CRDTs, conflict-free data structures), and advanced topics like distributed transactions, shadow leader patterns, and multi-region database architectures.


Category

Related Posts

CPU Affinity & Real-Time Operating Systems

CPU affinity binds processes to specific cores for cache warmth and latency control. RTOS adds deterministic scheduling with bounded latency for industrial, medical, and automotive systems.

#operating-systems #cpu-affinity #scheduling

Fork & Exec System Calls

fork() duplicates a running process, then exec() replaces it with a new program. Together they power every shell, web server, and daemon on Unix-like systems.

#operating-systems #fork #exec

System Calls Interface

System calls are the boundary between user programs and the kernel. They are the mechanism by which user-space applications request services from the operating system — opening files, creating processes, allocating memory, and more. Understanding syscalls reveals how the OS enforces isolation and provides safe access to hardware.

#operating-systems #system-calls #kernel