TCP, IP, and UDP: Understanding Internet Transport Protocols
Compare TCP vs UDP, learn the three-way handshake, flow control, congestion control, when to use each protocol, and how QUIC changes things.
TCP, IP, and UDP are the foundational protocols that move data across the internet. TCP handles reliable, ordered delivery with connection setup and flow control, while UDP trades reliability for speed with its connectionless, fire-and-forget model. Understanding when each protocol is appropriate — and how QUIC is changing the equation — is essential for anyone building networked systems. This post covers the layered model, how the three-way handshake works, flow and congestion control, and practical guidance on choosing between TCP and UDP for your application.
Introduction
Networking uses a layered model. Each layer handles specific responsibilities:
graph TB
A[Application Layer<br/>HTTP, DNS, SMTP] --> B[Transport Layer<br/>TCP, UDP]
B --> C[Internet Layer<br/>IP]
C --> D[Link Layer<br/>Ethernet, WiFi]
IP handles addressing and routing. TCP and UDP sit on top of IP, adding their own features. You rarely choose between TCP and UDP directly; you choose an application protocol that uses one or the other.
TCP Protocol
TCP: Transmission Control Protocol
TCP is connection-oriented. Before sending data, the client and server establish a connection. This connection stays open throughout the conversation and closes when done.
The Three-Way Handshake
TCP uses a three-way handshake to establish connections:
sequenceDiagram
participant Client
participant Server
Client->>Server: SYN (seq=x)
Server->>Client: SYN-ACK (seq=y, ack=x+1)
Client->>Server: ACK (ack=y+1)
Note over Client,Server: Connection established!
- Client sends SYN with a sequence number
- Server responds with SYN-ACK, acknowledging the client’s sequence number and sending its own sequence number
- Client sends ACK, acknowledging the server’s sequence number
This takes a full round trip before any data can be sent. HTTPS adds TLS on top, requiring more round trips.
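The sequence and acknowledgment numbers in the exchange above can be traced with a toy model. This is only a sketch of the arithmetic, not a network implementation; the function name `threeWayHandshake` and the message shape are made up for illustration.

```javascript
// Toy model of the three-way handshake: no sockets, only the numbers exchanged.
// threeWayHandshake and the message fields are hypothetical names.
function threeWayHandshake(clientIsn, serverIsn) {
  const MOD = 2 ** 32; // sequence numbers are 32-bit and wrap around
  const syn = { from: "client", flags: ["SYN"], seq: clientIsn };
  const synAck = {
    from: "server",
    flags: ["SYN", "ACK"],
    seq: serverIsn,
    ack: (syn.seq + 1) % MOD, // the SYN consumes one sequence number
  };
  const ack = {
    from: "client",
    flags: ["ACK"],
    seq: synAck.ack,
    ack: (synAck.seq + 1) % MOD,
  };
  return [syn, synAck, ack];
}

const messages = threeWayHandshake(1000, 5000);
for (const m of messages) {
  console.log(m.from, m.flags.join("+"), "seq=" + m.seq, "ack=" + (m.ack ?? "-"));
}
```

With initial sequence numbers 1000 and 5000, the server acknowledges 1001 and the client acknowledges 5001, matching `ack=x+1` and `ack=y+1` in the diagram.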
Reliable Data Transfer
TCP guarantees that data arrives intact and in order. It does this through:
- Acknowledgments (ACKs) - Receiver confirms receipt of data
- Sequence numbers - Data is numbered so the receiver can reorder out-of-order packets
- Retransmission - If data is not acknowledged, it is retransmitted
// TCP guarantees (simplified)
const sender = {
sequence: 0,
send(data) {
const packet = { data, sequence: this.sequence };
this.sequence += data.length;
return packet;
},
};
const receiver = {
expectedSequence: 0,
receive(packet) {
if (packet.sequence === this.expectedSequence) {
this.expectedSequence += packet.data.length;
return { ack: packet.sequence + packet.data.length };
}
// Out of order: repeat the ACK for the byte still expected (a duplicate ACK)
return { ack: this.expectedSequence };
},
};
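The simplified receiver above ignores any segment that arrives early. Real stacks usually hold early segments and deliver them once the gap is filled. A minimal sketch of that idea follows; `createReorderingReceiver` is a hypothetical helper, and it assumes segments never overlap.

```javascript
// Hypothetical receiver that buffers out-of-order segments and delivers bytes in order.
// Assumes segments do not overlap; a real TCP stack handles overlap and trimming.
function createReorderingReceiver() {
  const pending = new Map(); // sequence number -> data waiting for a gap to fill
  let expected = 0; // next byte the application is waiting for
  let delivered = "";
  return {
    receive(packet) {
      if (packet.sequence >= expected) pending.set(packet.sequence, packet.data);
      // Drain every segment that is now contiguous with what was already delivered.
      while (pending.has(expected)) {
        const data = pending.get(expected);
        pending.delete(expected);
        delivered += data;
        expected += data.length;
      }
      return { ack: expected }; // cumulative ACK: the next byte still missing
    },
    delivered: () => delivered,
  };
}

const rx = createReorderingReceiver();
rx.receive({ sequence: 5, data: "world" }); // arrives early, ACK stays at 0
rx.receive({ sequence: 0, data: "hello" }); // fills the gap, ACK jumps to 10
console.log(rx.delivered());
```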
Flow Control
TCP prevents the sender from overwhelming the receiver. The receiver advertises a window size indicating how much buffer space it has. The sender cannot send more than this window without receiving acknowledgments.
// Flow control window
const receiver = {
bufferSize: 65535,
usedBuffer: 0,
windowSize() {
return this.bufferSize - this.usedBuffer;
},
};
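On the sending side, the advertised window caps the amount of unacknowledged data. A small sketch of that bookkeeping, with the hypothetical helper name `bytesAllowed`:

```javascript
// Hypothetical helper: how many bytes may the sender put on the wire right now?
function bytesAllowed(advertisedWindow, lastByteSent, lastByteAcked) {
  const inFlight = lastByteSent - lastByteAcked; // sent but not yet acknowledged
  return Math.max(0, advertisedWindow - inFlight);
}

console.log(bytesAllowed(65535, 40000, 10000)); // 35535
console.log(bytesAllowed(65535, 80000, 10000)); // 0 (window is full, sender must wait)
```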
Congestion Control
TCP also prevents overwhelming the network. It uses algorithms like slow start, congestion avoidance, and fast retransmit to dynamically adjust sending rate.
graph LR
A[Slow Start] --> B[Congestion<br/>Avoidance]
A -->|"packet loss"| C[Reduce Rate]
B -->|"packet loss"| C
C --> A
Slow start begins with a small window and doubles it every round trip until it reaches the slow-start threshold (ssthresh) or packets are lost. This probing helps TCP find the available bandwidth without flooding the network from the first packet.
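The window growth can be sketched as a round-by-round simulation. This is a simplified Reno-style model measured in segments, not any kernel's actual algorithm; `simulateCwnd` and its parameters are hypothetical. After a loss it resumes from the halved window (fast-recovery style) rather than restarting slow start from one segment.

```javascript
// Simplified Reno-style model; the window is measured in segments (MSS units).
// simulateCwnd and its parameters are hypothetical names for illustration.
function simulateCwnd({ rounds, ssthresh, lossAtRounds = [] }) {
  let cwnd = 1;
  const history = [];
  for (let round = 1; round <= rounds; round++) {
    history.push(cwnd);
    if (lossAtRounds.includes(round)) {
      ssthresh = Math.max(2, Math.floor(cwnd / 2)); // multiplicative decrease
      cwnd = ssthresh; // resume from the halved window
    } else if (cwnd < ssthresh) {
      cwnd = Math.min(cwnd * 2, ssthresh); // slow start: double per round trip
    } else {
      cwnd += 1; // congestion avoidance: one segment per round trip
    }
  }
  return history;
}

console.log(simulateCwnd({ rounds: 10, ssthresh: 16, lossAtRounds: [7] }).join(" "));
// 1 2 4 8 16 17 18 9 10 11
```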
UDP Protocol
UDP: User Datagram Protocol
UDP is simpler than TCP. It is connectionless and does not provide reliability, ordering, or flow control. Data is sent as datagrams, and the sender does not wait for acknowledgment.
UDP Characteristics
- No connection establishment (no handshake latency)
- No ordering or sequencing
- No retransmission of lost packets
- Small header overhead (8 bytes vs TCP’s 20+ bytes)
// UDP is simple
const sender = {
send(data, address) {
const datagram = { data, destination: address };
// Send and forget - no acknowledgment
return datagram;
},
};
const receiver = {
receive(datagram) {
// Handle datagram - might be duplicate, might be missing
return datagram.data;
},
};
UDP Header
UDP has a minimal header:
+----------------+----------------+----------------+----------------+
| Source Port | Dest Port | Length | Checksum |
+----------------+----------------+----------------+----------------+
Four 16-bit fields. The source port is optional (set to 0 if not used). Length covers the header plus the data. The checksum provides error detection; it is optional over IPv4 and mandatory over IPv6.
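The four fields map directly onto eight bytes. The sketch below builds and parses the header with Node's `Buffer`; `buildUdpDatagram` and `parseUdpHeader` are hypothetical helper names, and the checksum is left at 0 rather than computed.

```javascript
// Build and parse the 8-byte UDP header with Node's Buffer (fields are big-endian).
// The checksum is left at 0 ("not computed", which IPv4 permits); a real stack
// computes it over a pseudo-header plus the payload.
function buildUdpDatagram(sourcePort, destPort, payload) {
  const header = Buffer.alloc(8);
  header.writeUInt16BE(sourcePort, 0);
  header.writeUInt16BE(destPort, 2);
  header.writeUInt16BE(8 + payload.length, 4); // length covers header + data
  header.writeUInt16BE(0, 6); // checksum placeholder
  return Buffer.concat([header, payload]);
}

function parseUdpHeader(datagram) {
  return {
    sourcePort: datagram.readUInt16BE(0),
    destPort: datagram.readUInt16BE(2),
    length: datagram.readUInt16BE(4),
    checksum: datagram.readUInt16BE(6),
  };
}

const datagram = buildUdpDatagram(5000, 53, Buffer.from("hello"));
console.log(parseUdpHeader(datagram));
```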
TCP vs UDP Comparison
Feature Comparison Table
| Feature | TCP | UDP |
|---|---|---|
| Connection | Connection-oriented | Connectionless |
| Reliability | Guaranteed delivery | Best effort |
| Ordering | In-order delivery | No ordering |
| Speed | Slower (handshake, ACKs) | Faster (no overhead) |
| Header size | 20+ bytes | 8 bytes |
| Flow control | Yes | No |
| Congestion control | Yes | No |
When to Use TCP
TCP is the right choice when:
- You need all data to arrive intact
- Order matters (files, messages, documents)
- You can tolerate some latency
- You are building HTTP servers, email, file transfer
Most web traffic uses TCP. The reliability guarantees mean you do not have to handle missing or duplicate data yourself.
When to Use UDP
UDP works well when:
- Speed matters more than reliability
- You are building real-time applications (voice, video, gaming)
- You want minimal overhead
- Application-level error handling is sufficient
- Multicast or broadcast is needed
// Good UDP use cases
const videoStream = {
protocol: "UDP",
// Missing frames are less noticeable than delay
// Accept some packet loss for real-time playback
};
const voiceCall = {
protocol: "UDP",
// Prefer hearing the other person with small gaps
// Over hearing them perfectly but delayed
};
const dnsQuery = {
protocol: "UDP",
// Fast lookup matters more than perfect reliability
// The resolver (client side) retries if no response arrives
};
Port Numbers
Both TCP and UDP use port numbers to multiplex connections. Ports range from 0 to 65535. Well-known ports (0-1023) are reserved for common services:
| Port | Service | Protocol |
|---|---|---|
| 80 | HTTP | TCP |
| 443 | HTTPS | TCP |
| 53 | DNS | UDP (also TCP) |
| 22 | SSH | TCP |
| 25 | SMTP | TCP |
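Your application can use any port from 1024 upward without elevated privileges. Node.js examples conventionally listen on port 3000, but that is a convention rather than a default: http.createServer() does not bind a port until you call server.listen(port).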
Topic-Specific Deep Dives
Common Misconceptions
“UDP is always faster”
UDP avoids TCP overhead, but speed depends on the network. On a reliable local network, UDP can transmit faster. On the public internet with packet loss, TCP’s congestion control actually helps it perform well.
“TCP is for files, UDP is for video”
Many video streaming platforms use TCP. The overhead is acceptable, and TCP’s reliability ensures frames are not dropped. Real-time video calls often use UDP for lower latency, but they build their own reliability layer for important data.
“You always choose between TCP and UDP”
Usually your application protocol decides this for you. HTTP/1.1 and HTTP/2 use TCP. DNS usually uses UDP but switches to TCP for large responses. WebRTC uses UDP for media, while its signaling typically runs over TCP-based channels such as WebSocket.
Connecting It All Together
The layers build on each other:
graph TB
A[Your Application] --> B[HTTP over TCP]
B --> C[TCP over IP]
C --> D[IP over Ethernet]
D --> E[Physical Network]
Each layer encapsulates the data handed down from the layer above it. Your HTTP request becomes the payload of a TCP segment, which is wrapped in an IP packet, which is wrapped in an Ethernet frame.
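The wrapping can be seen in the byte counts. The sketch below assumes the minimum header sizes (no TCP or IP options, untagged Ethernet, frame check sequence not counted); `encapsulate` is a hypothetical helper.

```javascript
// Header sizes in bytes, assuming no IP or TCP options and untagged Ethernet.
const HEADER_BYTES = { tcp: 20, ipv4: 20, ethernet: 14 };

function encapsulate(httpBytes) {
  const segment = httpBytes + HEADER_BYTES.tcp; // TCP segment
  const packet = segment + HEADER_BYTES.ipv4; // IP packet
  const frame = packet + HEADER_BYTES.ethernet; // Ethernet frame (FCS not counted)
  return { segment, packet, frame };
}

console.log(encapsulate(500)); // { segment: 520, packet: 540, frame: 554 }
```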
QUIC Protocol
QUIC (RFC 9000) runs over UDP and layers TCP’s reliability on top of UDP’s speed. Google built it to get around TCP’s worst inefficiencies, and it’s now the transport behind HTTP/3.
Why QUIC Exists
TCP has three annoying problems QUIC actually solves:
- Head-of-line blocking: Lose one TCP packet and everything behind it waits—even if other streams could use the bandwidth
- Handshake latency: TCP needs 1 RTT before you even start TLS, then TLS needs another 1-2 RTTs
- Congestion control rigidity: TCP’s algorithms live in the OS kernel—you can’t ship a new one without an OS update
QUIC Handshake vs TCP+TLS
sequenceDiagram
participant Client
participant Server
Note over Client,Server: TCP+TLS 1.3 (~2 RTT before data)
Client->>Server: TCP SYN
Server->>Client: SYN-ACK
Client->>Server: TCP ACK
Note over Client,Server: TLS Handshake starts...
Client->>Server: ClientHello
Server->>Client: ServerHello
Client->>Server: Finished
Note over Client,Server: Data ready at ~2 RTT
Note over Client,Server: QUIC (1 RTT, crypto integrated into transport)
Client->>Server: QUIC Initial (crypto handshake)
Server->>Client: QUIC Initial (crypto + data)
Client->>Server: QUIC Handshake Finished
Note over Client,Server: Data ready at ~1 RTT
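QUIC folds the crypto handshake into the transport layer. The client’s first packet already carries the TLS ClientHello, so transport and encryption are set up in a single round trip instead of separate TCP and TLS exchanges. Application data can travel in that first packet only on 0-RTT resumption.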
QUIC Multi-Stream Advantage
HTTP/2 multiplexes multiple streams over a single TCP connection (HTTP/1.1 does not multiplex; it opens several connections instead). That works until one packet drops; then TCP holds back everything behind it until that packet is retransmitted. QUIC tracks ordering per stream instead of per connection:
graph TB
subgraph "TCP (HTTP/2)"
A[Stream 1]:::blocked
B[Stream 2]:::blocked
C[Stream 3]:::blocked
end
subgraph "QUIC (HTTP/3)"
D[Stream 1]
E[Stream 2]
F[Stream 3]
end
D --> E --> F
When a QUIC packet drops, only the stream that owns that packet stalls. Every other stream keeps running.
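The difference can be sketched with a tiny model: four packets, one of them lost, and the question of which can be handed to the application before the retransmission arrives. The function names and packet shape are hypothetical, and each packet is assumed to carry data for exactly one stream.

```javascript
// Each packet carries one chunk of one stream. Packet 1 is lost on its first trip.
const packets = [
  { id: 0, stream: "A" },
  { id: 1, stream: "B" }, // lost
  { id: 2, stream: "A" },
  { id: 3, stream: "C" },
];
const lostId = 1;

// TCP model: one ordered byte stream, so nothing after the gap can be delivered.
function deliverableOverTcp(packets, lostId) {
  return packets.filter((p) => p.id < lostId).map((p) => p.id);
}

// QUIC model: ordering is per stream, so only the stream with the gap waits.
function deliverableOverQuic(packets, lostId) {
  const stalled = packets.find((p) => p.id === lostId).stream;
  return packets
    .filter((p) => p.id !== lostId && (p.stream !== stalled || p.id < lostId))
    .map((p) => p.id);
}

console.log(deliverableOverTcp(packets, lostId)); // [ 0 ]
console.log(deliverableOverQuic(packets, lostId)); // [ 0, 2, 3 ]
```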
0-RTT Resumption
If you’ve connected to a server before, QUIC can send application data without waiting for the handshake to finish:
- Client attaches Early Data, protected with keys derived from a session ticket from the previous visit
- Server decrypts and accepts the data immediately, with no extra round trip
- You start sending application data in the very first packet
QUIC vs TCP Comparison
Comparison Table
| Feature | TCP+TLS 1.3 | QUIC (RFC 9000) |
|---|---|---|
| Handshake RTT | 2 RTTs (1 TCP + 1 TLS 1.3) | 1 RTT (0-RTT on resumption) |
| Head-of-line blocking | Yes (TCP) | No (stream-level) |
| Connection migration | Breaks—IP change kills it | Survives address change |
| Protocol evolution | Kernel-level (slow) | Userspace (ship anytime) |
| Encryption | TLS (application) | Built into transport |
| Flow control | Per-connection | Per-stream |
When to Choose QUIC
- Mobile clients switching between WiFi and cellular (connection sticks around)
- Applications that care about latency (HTTP/3, for example)
- High packet loss environments (stream isolation prevents a single loss from cascading)
- If you’re deploying HTTP/3 via CDN, QUIC comes with it automatically
TCP Header Deep Dive
The TCP header is at least 20 bytes long and can grow with options:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Source Port | Destination Port |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Sequence Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Acknowledgment Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Offset | Flags | Window Size |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Checksum | Urgent Pointer |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Options (variable length) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Key Fields
Sequence Number: Byte position of the first data octet in this segment. Enables in-order reconstruction and retransmission identification.
Acknowledgment Number: Next expected sequence number. All data up to this number has been received.
Flags: Eight control bits (as defined in RFC 9293) including SYN (connection establishment), ACK (acknowledgment valid), FIN (graceful termination), RST (abort), PSH (push data to application).
Window Size: How many bytes the receiver accepts. The 16-bit field gets multiplied by the window scaling factor (up to 2^14x) for high-BDP links—critical for 10 Gbps networks with 100ms RTT.
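To see where these fields sit, the sketch below decodes the fixed 20-byte part of a header from a `Buffer`. `parseTcpHeader` is a hypothetical helper that reads the eight flag bits in byte 13 and ignores options; the SYN-ACK bytes are hand-built for the example.

```javascript
// Decode the fixed 20-byte part of a TCP header from a Buffer.
const FLAG_BITS = { FIN: 0x01, SYN: 0x02, RST: 0x04, PSH: 0x08, ACK: 0x10, URG: 0x20, ECE: 0x40, CWR: 0x80 };

function parseTcpHeader(buf) {
  const flagsByte = buf.readUInt8(13);
  return {
    sourcePort: buf.readUInt16BE(0),
    destPort: buf.readUInt16BE(2),
    seq: buf.readUInt32BE(4),
    ack: buf.readUInt32BE(8),
    headerBytes: (buf.readUInt8(12) >> 4) * 4, // data offset counts 32-bit words
    flags: Object.keys(FLAG_BITS).filter((name) => flagsByte & FLAG_BITS[name]),
    window: buf.readUInt16BE(14),
  };
}

// Hand-built SYN-ACK: ports 443 -> 51000, seq 5000, ack 1001, window 65535.
const synAck = Buffer.alloc(20);
synAck.writeUInt16BE(443, 0);
synAck.writeUInt16BE(51000, 2);
synAck.writeUInt32BE(5000, 4);
synAck.writeUInt32BE(1001, 8);
synAck.writeUInt8(5 << 4, 12); // 5 words = 20 bytes, no options
synAck.writeUInt8(FLAG_BITS.SYN | FLAG_BITS.ACK, 13);
synAck.writeUInt16BE(65535, 14);
console.log(parseTcpHeader(synAck));
```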
TCP Options (Partial List)
| Kind | Length | Name | Purpose |
|---|---|---|---|
| 0 | 1 | EOL | End of option list |
| 1 | 1 | NOP | Padding |
| 2 | 4 | MSS | Max segment size |
| 3 | 3 | Window Scaling | Shift count (0-14, scale factor up to 2^14) |
| 4 | 2 | SACK Permitted | Selective acknowledgment allowed |
| 5 | Variable | SACK Block | Selective ACK ranges (if negotiated) |
| 8 | 10 | Timestamps | RTTM and PAWS |
Window Scaling (RFC 7323)
On high-bandwidth, high-delay networks, a 64 KB window is insufficient:
Link: 10 Gbps, 100ms RTT
Required window: 10 Gbps × 0.1s = 1 Gbit = 125 MB
Max window with scaling: 65535 × 2^14 ≈ 1 GiB
The window scaling option shifts the 16-bit window left by the scale factor (0-14).
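The arithmetic is worth doing in bytes, since link speeds are quoted in bits. The sketch below computes the bandwidth-delay product and the smallest shift that covers it; both helper names are hypothetical.

```javascript
// Bandwidth-delay product in bytes, and the smallest window-scale shift that covers it.
function bdpBytes(bitsPerSecond, rttMs) {
  return (bitsPerSecond * rttMs) / 1000 / 8;
}

function requiredWindowShift(windowBytes) {
  for (let shift = 0; shift <= 14; shift++) {
    if (65535 * 2 ** shift >= windowBytes) return shift;
  }
  return null; // not reachable even with the maximum shift of 14
}

const bdp = bdpBytes(10e9, 100);
console.log(bdp); // 125000000 bytes (125 MB)
console.log(requiredWindowShift(bdp)); // 11
```

For a 10 Gbps link with a 100 ms RTT, a shift of 11 (a maximum window of about 134 MB) is already enough, well inside the limit of 14.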
NAT Traversal Guide
Network Address Translation (NAT) enables multiple devices to share one public IP. But NAT breaks end-to-end connectivity, which matters for UDP-based protocols like QUIC and WebRTC.
How NAT Works
Private Network (192.168.1.x) NAT Device Internet
203.0.113.5 (public)
+----------+ +-----------+ +--------+
| Client | 192.168.1.100:5000 --> | PAT entry | --> | Server |
| | <-- 203.0.113.5:5001 | (port map)| <-- | |
+----------+ +-----------+ +--------+
The NAT device tracks source IP, source port, destination IP, destination port. When the response returns, it reverses the mapping.
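A minimal sketch of such a translation table follows. It is a simplification: `createNat` is a hypothetical helper, the mapping is keyed on the private address and port only (endpoint-independent, like a full cone NAT), and entries never expire.

```javascript
// Minimal port-address-translation table. Names and the starting port are illustrative.
// Simplification: mappings ignore the destination and never time out.
function createNat(publicIp, firstPort = 5001) {
  const outbound = new Map(); // "privateIp:port" -> public port
  const inbound = new Map(); // public port -> { ip, port }
  let nextPort = firstPort;
  return {
    translateOut(privateIp, privatePort) {
      const key = `${privateIp}:${privatePort}`;
      if (!outbound.has(key)) {
        outbound.set(key, nextPort);
        inbound.set(nextPort, { ip: privateIp, port: privatePort });
        nextPort += 1;
      }
      return { ip: publicIp, port: outbound.get(key) };
    },
    translateIn(publicPort) {
      return inbound.get(publicPort) ?? null; // null: no mapping, packet is dropped
    },
  };
}

const nat = createNat("203.0.113.5");
console.log(nat.translateOut("192.168.1.100", 5000)); // { ip: '203.0.113.5', port: 5001 }
console.log(nat.translateIn(5001)); // { ip: '192.168.1.100', port: 5000 }
console.log(nat.translateIn(6000)); // null
```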
NAT Types and Traversal Difficulty
| NAT Type | Behavior | Traversal Difficulty |
|---|---|---|
| Full Cone | Any external host can connect | Easiest |
| Restricted | Only contacted external hosts | Moderate |
| Port Restricted | Only contacted ext host+port | Hard |
| Symmetric | Different mapping per destination | Hardest |
STUN and TURN
STUN (RFC 8489): Lets clients discover their public IP:port mappings. Works with Full Cone and some Restricted NATs.
// STUN request (simplified)
const stunServer = "stun:stun.l.google.com:19302";
// Client sends binding request
// Server responds with mapped address
TURN (RFC 8656): Relay server for symmetric NATs. All traffic routes through the TURN server, which adds latency and server cost but works when direct paths fail.
sequenceDiagram
participant Client
participant TURN
participant Target
Client->>TURN: Allocate request
TURN->>Client: Allocate success (XOR-RELAYED-ADDRESS)
Client->>TURN: Send to Target
TURN->>Target: Data from the relayed address
Target->>TURN: Response
TURN->>Client: Data
NAT Keepalive
NAT mappings expire when idle. Endpoints send keepalive packets to maintain sessions:
# TCP keepalive (Linux)
echo 60 > /proc/sys/net/ipv4/tcp_keepalive_time # Start after 60s idle
echo 10 > /proc/sys/net/ipv4/tcp_keepalive_intvl # Send every 10s
echo 3 > /proc/sys/net/ipv4/tcp_keepalive_probes # Drop after 3 failures
# Application-level UDP keepalive for QUIC (pseudocode: sendHeartbeat is a placeholder;
# real QUIC stacks keep a path alive with PING frames)
quic_conn.sendHeartbeat() # Every 30 seconds typically
QUIC Connection Migration
QUIC handles NAT issues elegantly via connection migration. When your mobile switches from WiFi to cellular, QUIC can continue on the new path because the connection is identified by connection IDs rather than by the IP address and port:
// QUIC connection migration (conceptual; quicConn.migrate is a placeholder, not a real API)
if (networkChanged) {
  const newPath = { address: newIP, port: newPort };
  quicConn.migrate(newPath); // Same connection, new path
}
TCP Performance Tuning
Stock Linux settings often limit performance on modern networks. Tuning can dramatically improve throughput.
Socket Buffer Sizes
Default buffers are too small for high-BDP networks:
# View current TCP buffer settings
# min / default / max (bytes)
cat /proc/sys/net/ipv4/tcp_rmem # Receive buffer
cat /proc/sys/net/ipv4/tcp_wmem # Send buffer
# Example output:
# 4096 16384 6291456
# Raise the autotuning maximums (keep min and default small: they apply to every socket)
echo "4096 131072 134217728" > /proc/sys/net/ipv4/tcp_rmem
echo "4096 16384 67108864" > /proc/sys/net/ipv4/tcp_wmem
TCP Congestion Control
Linux supports multiple congestion control algorithms:
# List available algorithms
sysctl net.ipv4.tcp_available_congestion_control
# Example output: reno cubic bbr
| Algorithm | Best For | Characteristics |
|---|---|---|
| cubic | General purpose (default) | Loss-based; cubic window growth, largely RTT-independent |
| bbr | High-BDP, variable networks | Model-based (bandwidth and RTT); can be unfair to loss-based flows |
| reno | Legacy, simple networks | Loss-based AIMD; halves cwnd on loss |
| vegas | Low-latency preference | Delay-based; detects queue buildup before loss |
BBR (Bottleneck Bandwidth and RTT)
BBR models the network instead of reacting to loss:
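// BBR conceptually targets bottleneck bandwidth + RTT
const bbrState = {
  bw: 0, // Bottleneck bandwidth estimate (bytes/s)
  rtprop: Infinity, // Round trip propagation time estimate (s)
  pacingGain: 1.25, // Gain used while probing for more bandwidth
  get pacingRate() {
    return this.pacingGain * this.bw; // Derived from the current estimate
  },
  cwnd: 12 * 1500, // In bytes (12 packets)
};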
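To make the modelling idea concrete, here is a toy estimator that tracks the highest recent delivery rate and the lowest RTT seen, then derives a pacing rate and window from them. It is a rough sketch, not the BBR algorithm: `createBbrEstimator` is hypothetical, the sample window is a fixed ten entries, and the minimum RTT never expires. The gains of 1.25 and 2 are passed as defaults for illustration.

```javascript
// Toy BBR-style estimator: max recent delivery rate and min observed RTT.
// Real BBR uses time-based filter windows and a state machine; this keeps neither.
function createBbrEstimator() {
  const rateSamples = []; // bytes per second
  let rtprop = Infinity; // seconds
  return {
    onAck(deliveredBytes, intervalSeconds, rttSeconds) {
      rateSamples.push(deliveredBytes / intervalSeconds);
      if (rateSamples.length > 10) rateSamples.shift(); // keep a short sample window
      rtprop = Math.min(rtprop, rttSeconds);
    },
    estimate(pacingGain = 1.25, cwndGain = 2) {
      const bw = Math.max(...rateSamples);
      const bdp = bw * rtprop; // bytes that fit "in the pipe"
      return { bw, rtprop, pacingRate: pacingGain * bw, cwnd: cwndGain * bdp };
    },
  };
}

const estimator = createBbrEstimator();
estimator.onAck(50000, 0.5, 0.25); // 100 kB/s sample, 250 ms RTT
estimator.onAck(100000, 0.5, 0.5); // 200 kB/s sample, 500 ms RTT
console.log(estimator.estimate());
// { bw: 200000, rtprop: 0.25, pacingRate: 250000, cwnd: 100000 }
```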
TIME_WAIT and Port Reuse
Connections in TIME_WAIT linger for 2×MSL (60 seconds on Linux by default). High-traffic servers can run out of source ports:
# Enable TIME_WAIT reuse for outgoing connections (reuse, don't recycle)
echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse
# Shorten the FIN_WAIT_2 timeout for orphaned sockets
# (this does not change TIME_WAIT, which is fixed at 60s in the Linux kernel)
echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout
# Increase ephemeral port range (default is 32768 60999)
echo "1024 65535" > /proc/sys/net/ipv4/ip_local_port_range
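A back-of-the-envelope calculation shows why the port range matters. Assuming every closed connection parks its source port in TIME_WAIT for 60 seconds, the sustainable rate of new outbound connections to a single destination address and port is bounded by the number of ephemeral ports. `maxConnectionRate` is a hypothetical helper, and the bound ignores `tcp_tw_reuse`.

```javascript
// Upper bound on new outbound connections per second to ONE destination ip:port,
// when every closed connection holds its source port in TIME_WAIT.
function maxConnectionRate(firstPort, lastPort, timeWaitSeconds = 60) {
  const ports = lastPort - firstPort + 1;
  return ports / timeWaitSeconds;
}

console.log(Math.floor(maxConnectionRate(32768, 60999))); // 470 (typical Linux default range)
console.log(Math.floor(maxConnectionRate(1024, 65535))); // 1075 (widened range)
```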
Keepalive Tuning
Keepalives detect dead connections but waste bandwidth:
# Application-agnostic keepalive (system-wide)
net.ipv4.tcp_keepalive_time = 7200 # Default: 7200s (2 hours!)
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
// Better: per-socket keepalive set by the application (Node.js net.Socket)
socket.setKeepAlive(true, 30000); // enable, first probe after 30 s idle (value in milliseconds)
Quick Recap Checklist
Before diving into implementation, ensure you understand:
- TCP header fields and their purposes (sequence, ACK, window, flags)
- Window scaling for high-BDP networks (2^14 multiplier)
- TCP options: MSS, SACK, timestamps, window scaling
- NAT types and traversal techniques (STUN/TURN/ICE)
- QUIC connection migration for mobile networks
- BBR vs cubic congestion control characteristics
- TIME_WAIT reuse and port exhaustion prevention
- Keepalive tuning for production systems
Trade-off Analysis
| Factor | TCP | UDP | QUIC | Matters most when |
|---|---|---|---|---|
| Reliability | Reliable, ordered | None | Reliable, ordered per stream | Ordered delivery required |
| Overhead | 20+ byte header | 8-byte header | UDP header plus QUIC header | Low overhead preferred |
| Speed | Moderate | Fastest | Faster setup than TCP+TLS | Latency-sensitive |
| Congestion Control | Built-in | None | Built-in | Streaming/bulk transfer |
| Connection | Stateful (3-way handshake) | Stateless | Stateful (1-RTT, 0-RTT on resumption) | Connectionless preferred |
| TLS | Separate handshake | N/A | Built-in | Security + speed |
| NAT Traversal | Hard for P2P | STUN/TURN | Survives rebinding via connection IDs | P2P/multiplayer |
When to Use Each Protocol
- TCP: Web browsing, email, file transfers, any bulk/reliable data
- UDP: VoIP, live video, gaming, DNS, IoT
- QUIC/HTTP/3: Modern web apps prioritising low latency
Production Failure Scenarios
| Failure | Impact | Mitigation |
|---|---|---|
| TCP connection timeout | Requests hang; poor user experience | Set appropriate connection timeouts; implement retry logic |
| UDP packet loss | Missing data; application-level failures | Implement application-level acknowledgment and retransmission |
| Port exhaustion | Cannot establish new connections; service unavailable | Monitor connection counts; implement connection pooling; increase port range |
| SYN flood attack | Server overwhelmed with half-open connections | Use SYN cookies; implement rate limiting; use DDoS protection |
| NAT timeout | Long-lived connections break; clients appear disconnected | Send keepalive packets; use connection-oriented protocols when possible |
| MTU mismatch | Packets dropped; connectivity issues | Use Path MTU Discovery; set conservative MTU values |
| TCP congestion collapse | Network throughput drops dramatically | Use proper congestion control algorithms; implement traffic shaping |
| UDP amplification attack | Your servers used to attack others | Validate source addresses; restrict UDP responses; implement rate limiting |
Observability Checklist
Metrics
- TCP connection rate (new connections per second)
- Active TCP connections (concurrent connections)
- TCP connection failures (connection refused, timeout)
- Segment retransmission rate
- TCP buffer utilization (bytes in send/receive buffers)
- UDP packet rate (packets sent/received per second)
- UDP error rate (checksum failures, buffer overflows)
- Round-trip time (RTT) for TCP connections
- Throughput (bytes sent/received per second)
Logs
- Connection failures with source IP and port
- TCP reset packets (RST) received
- UDP packet checksum failures
- Connection timeouts
- Port exhaustion warnings
- Network interface errors
Alerts
- Connection failure rate exceeds 5%
- Retransmission rate exceeds 10%
- Active connections approach limits
- UDP error rate increases
- TCP RST rate spikes (potential attack)
- Network latency anomalies
Security Checklist
- Use TLS over TCP when encryption is needed (not raw TCP)
- Implement connection timeouts to prevent resource exhaustion
- Monitor for SYN flood attacks; enable SYN cookies
- Use firewall rules to restrict exposed ports
- Implement rate limiting on TCP/UDP services
- Validate UDP source addresses to prevent spoofing
- Use IPsec for network-level encryption when needed
- Monitor for unusual traffic patterns indicating attacks
- Implement connection tracking for stateful firewall rules
- Restrict broadcast and multicast traffic where not needed
Common Pitfalls / Anti-Patterns
Assuming UDP is Always Faster
UDP avoids TCP overhead but does not guarantee delivery or ordering.
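const dgram = require("node:dgram");
const socket = dgram.createSocket("udp4");
const data = Buffer.from("ping"); // example payload
const port = 41234; // example destination
const host = "127.0.0.1";

// Problem: UDP with no reliability
socket.send(data, port, host, (err) => {
  // err only reports local send failures; you simply do not know if the data arrived
});

// Better: register a handler for an application-level acknowledgment, then send
socket.on("message", (msg) => {
  if (msg.toString() === "ACK") {
    // Confirmed delivery
  }
});
socket.send(data, port, host);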
Ignoring TCP Connection Limits
Each TCP connection consumes file descriptors and memory.
# Check current connection limits
cat /proc/sys/net/core/somaxconn # Max pending connections
cat /proc/sys/fs/file-max # System-wide file descriptors
# Monitor active connections
ss -s
Not Handling Connection Termination Properly
Abruptly closing connections can cause data loss.
// Graceful close - ensure data is sent
socket.end(); // Send FIN after remaining data is sent
socket.on("close", () => {
// Connection fully closed
});
Building Custom Reliability on UDP When TCP Would Work
If you need reliable, ordered delivery, just use TCP.
// Problem: Building reliability on UDP
socket.on("message", (data) => {
// Must implement sequence numbers, acknowledgments, retransmission
// This is essentially reimplementing TCP
});
// Better: Just use TCP
const net = require("node:net");
const server = net.createServer((socket) => {
// Reliability built in
});
Interview Questions
Expected answer points:
- Client sends SYN with sequence number x
- Server responds with SYN-ACK (seq=y, ack=x+1)
- Client sends ACK (ack=y+1), connection established
- Necessary to synchronize sequence numbers and confirm both parties can send/receive
Expected answer points:
- Flow control prevents overwhelming the receiver (advertising window size)
- Congestion control prevents overwhelming the network (slow start, avoidance, fast retransmit)
- Flow control is about the receiver's buffer; congestion control is about network capacity
Expected answer points:
- No connection establishment (zero RTT overhead)
- No acknowledgments, retransmissions, or ordering overhead
- Smaller header (8 bytes vs 20+ bytes)
- Choose UDP for real-time apps (video, voice, gaming) where speed > reliability
- Choose UDP when application-level error handling is sufficient
Expected answer points:
- TCP HOL blocking: losing one packet blocks all subsequent packets until retransmission
- QUIC gives each stream its own stream ID and sequence space
- When a QUIC packet is lost, only that stream stalls—other streams continue
Expected answer points:
- States: CLOSED, LISTEN, SYN_SENT, SYN_RECEIVED, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, CLOSE_WAIT, CLOSING, LAST_ACK, TIME_WAIT
- Server (passive close): CLOSED → LISTEN → SYN_RECEIVED → ESTABLISHED → CLOSE_WAIT → LAST_ACK → CLOSED
- Client (active close): CLOSED → SYN_SENT → ESTABLISHED → FIN_WAIT_1 → FIN_WAIT_2 → TIME_WAIT → CLOSED
Expected answer points:
- TIME_WAIT lasts 2×MSL (60 seconds on Linux) after graceful close
- Ensures delayed packets from old connections are discarded
- Allows proper termination acknowledgment processing
- Mitigation: tcp_tw_reuse, connection pooling, larger ephemeral port range
Expected answer points:
- Slow Start: cwnd starts small, exponentially increases until loss
- Congestion Avoidance: linear increase after threshold reached
- Fast Retransmit: on 3 duplicate ACKs, retransmit without waiting for timeout
- Fast Recovery: after retransmit, try to continue without full slow start
- Algorithms: CUBIC (default), BBR (throughput-focused), Reno, Vegas
Expected answer points:
- Cumulative ACK: acknowledges only the last in-order byte received
- SACK: allows receiver to acknowledge non-contiguous blocks of data
- SACK reduces unnecessary retransmissions when multiple segments are lost
- SACK must be negotiated during TCP handshake (option kind 4 and 5)
Expected answer points:
- NAT maps private IPs/ports to public ones for multiple devices sharing one public IP
- STUN: lets clients discover their public mapping; works with full cone and some restricted NATs
- TURN: relay server for symmetric NATs; all traffic routes through TURN server
- ICE: combines STUN/TURN for optimal path selection
Expected answer points:
- TCP window field is 16 bits, max 65535 bytes—insufficient for high-BDP links
- Window scaling (RFC 7323) shifts window left by scale factor (0-14)
- Enables windows up to 65535 × 2^14 = 1 GB
- Necessary for 10 Gbps+ links with 100ms+ RTT
- Scale factor negotiated during handshake via TCP option
Expected answer points:
- Slow Start: cwnd starts small (1-4 MSS), doubles each RTT until ssthresh is reached
- Congestion Avoidance: cwnd increases linearly (1 MSS per RTT) after ssthresh
- Fast Retransmit: 3 duplicate ACKs trigger immediate retransmission without waiting for timeout
- Fast Recovery: after retransmit, cwnd is partially inflated to continue without full slow start
- CUBIC (Linux default): uses cubic function for window growth, RTT-independent, aggressive on high-BDP links
- BBR (Google): models bandwidth and RTT to find optimal throughput, outperforms CUBIC on lossy networks
Expected answer points:
- Cumulative ACK: acknowledges all bytes up to a sequence number; cannot recover multiple lost segments in one RTT
- SACK: receiver explicitly lists non-contiguous blocks received correctly (RFC 2018)
- SACK enables sender to retransmit only specific missing segments, reducing unnecessary retransmissions
- SACK is negotiated at connection setup via the SACK-permitted TCP option
- Particularly valuable on high-latency satellite and intercontinental links
Expected answer points:
- STUN (Session Traversal Utilities for NAT): external server reveals the public IP:port mapping; works for open and cone NATs but fails on symmetric NATs
- TURN (Traversal Using Relays around NAT): relay server forwards traffic when direct P2P fails; adds latency and server cost
- ICE (Interactive Connectivity Establishment): tries all candidate pairs (host → STUN → TURN) in priority order and picks the best working path
- WebRTC uses this stack for peer-to-peer audio/video connections
- STUN discovers addresses, TURN provides fallback, ICE orchestrates selection
Expected answer points:
- TCP HOL blocking: a lost segment blocks all subsequent segments until retransmission, even if they belong to different HTTP/2 streams
- QUIC multiplexes streams independently — each stream has its own sequence space; loss only blocks that stream
- QUIC also integrates TLS handshake into connection establishment (0-RTT or 1-RTT vs TCP+TLS 2-3 RTTs)
- Connection migration: QUIC uses connection IDs so sessions survive IP address changes (mobile network handover); TCP would need a new connection
Expected answer points:
- Real-time media (VoIP, video conferencing): late packets are useless; TCP retransmission adds unacceptable delay
- Online gaming: 100ms latency tolerance; occasional dropped frame is preferable to high jitter from retransmission
- DNS queries: small request/response fits in one packet; UDP avoids TCP handshake overhead
- IoT/sensor networks: devices send infrequent small packets; 8-byte UDP overhead vs 20+ bytes for TCP matters at scale
- Broadcast/multicast: TCP is point-to-point only; UDP natively supports one-to-many delivery
Expected answer points:
- After a connection closes, the endpoint lingers in TIME_WAIT for 2×MSL (typically 60 seconds on Linux)
- Purpose: absorb delayed segments from old connections that might arrive during a new connection with the same 4-tuple
- Port exhaustion: each TIME_WAIT socket holds a local port, limiting new connections on that port
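- Mitigations: tcp_tw_reuse (allow TIME_WAIT sockets to be reused for new outgoing connections); SO_LINGER with timeout=0 (abortive close, which skips TIME_WAIT but discards unsent data); increase ephemeral port range (ip_local_port_range). Note that tcp_fin_timeout controls FIN_WAIT_2, not TIME_WAIT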
Expected answer points:
- Classic handshake costs 1 RTT before the client can send application data (SYN → SYN-ACK → ACK, with data allowed alongside the final ACK)
- TCP Fast Open (TFO, RFC 7413): on a repeat connection, the SYN packet can carry a TFO cookie and the first request data, so the server can start responding without waiting for the handshake to finish
- TFO requires both client and server to support it; the first connection still needs a full handshake to obtain the TFO cookie
- QUIC goes further: 1-RTT for a new connection including encryption, and 0-RTT data on resumption
Expected answer points:
- Maintain a pool of pre-established TCP connections to amortise the handshake cost (TCP, plus TLS where used) across many requests
- Database clients (PostgreSQL, Redis), HTTP/1.1 clients, and gRPC use connection pools
- Trade-offs: memory overhead for idle connections; risk of stale connections closed by NAT or load balancer; pool sizing is critical (too small = contention, too large = resource waste)
- HTTP/2 multiplexing reduces per-host connection count, but connection pools remain essential for non-HTTP protocols
Expected answer points:
- When a UDP datagram exceeds the path MTU, IP fragments it (at the sender, or at IPv4 routers when DF is not set); only the destination reassembles
- Any lost fragment invalidates the entire datagram (all-or-nothing); makes UDP worse on high-loss links
- Path MTU Discovery (PMTUD): sets the DF (Don't Fragment) bit; ICMP "packet too big" informs sender of max MTU
- Some networks block all ICMP, silently breaking PMTUD (ICMP black hole); fallback to a conservative MTU (576 bytes) is often needed
- TCP handles this transparently: it probes the path and adapts MSS to fit
Expected answer points:
- TCP is harder to spoof: sequence numbers and ACK tracking make random packet injection difficult without an existing connection
- UDP is easy to spoof; stateless nature makes it trivial to forge source addresses
- QUIC combines UDP simplicity with built-in TLS 1.3 encryption, achieving TCP+TLS security with less overhead
- DDoS amplification is more practical with UDP (DNS, NTP, memcached) because source addresses can be spoofed and small requests can trigger much larger responses
- Both benefit from IPsec (network-layer encryption); application-layer TLS can be used on either protocol
Further Reading
TCP and UDP serve different needs. TCP provides reliability, ordering, and flow control at the cost of latency. UDP provides speed and simplicity at the cost of reliability guarantees.
For application-layer protocols, see the HTTP/HTTPS post. For DNS specifically, the DNS & Domain Management post covers name resolution in detail.
Conclusion
Key Bullets
- TCP provides reliable, ordered, connection-oriented delivery with flow and congestion control
- UDP provides fast, unreliable, connectionless delivery with minimal overhead
- TCP three-way handshake establishes connections; UDP sends immediately
- TCP uses acknowledgments, sequence numbers, and retransmission for reliability
- UDP header is 8 bytes; TCP header is 20+ bytes minimum
- Choose TCP when correctness matters; choose UDP when speed and low latency matter more
- Common TCP ports: 80 (HTTP), 443 (HTTPS), 22 (SSH), 25 (SMTP), 53 (DNS)
- DNS uses UDP port 53 for queries, TCP for zone transfers and large responses
Copy/Paste Checklist
# Count TCP connections by state
ss -tan | awk 'NR>1 {print $1}' | sort | uniq -c
# Dump the kernel's raw TCP socket tables
cat /proc/net/tcp
cat /proc/net/tcp6
# Dump the raw UDP socket tables
cat /proc/net/udp
cat /proc/net/udp6
# Test TCP connection with netcat
nc -zv host.example.com 443
# Test UDP connectivity (limited)
nc -zvu host.example.com 53
# Check for open ports
ss -tunapl | grep LISTEN
# View TCP window sizes
cat /proc/sys/net/ipv4/tcp_rmem
cat /proc/sys/net/ipv4/tcp_wmem
# Test MTU
ping -M do -s 1472 example.com