Amazon DynamoDB: Scalable NoSQL with Predictable Performance
Deep dive into Amazon DynamoDB architecture, partitioned tables, eventual consistency, on-demand capacity, and the single-digit millisecond SLA.
Amazon DynamoDB: Scalable NoSQL with Predictable Performance
Introduction
DynamoDB traces back to an internal Amazon project tackling the shopping cart’s scalability problems in the mid-2000s. The 2007 Dynamo paper introduced consistent hashing, vector clocks, and eventual consistency, ideas that shifted how engineers approached distributed storage.
Amazon launched DynamoDB as a managed service in 2012. Today it powers massive-scale applications at companies like Airbnb, Dropbox, and Netflix. The core trade-off from the original paper: relax strong consistency, gain scalability and availability.
Core Concepts
DynamoDB organizes data in tables. Each table holds items, and each item has attributes. DynamoDB is schemaless except for the primary key, so items can have different attributes.
Primary keys come in two flavors:
- Simple: single attribute (partition key only)
- Composite: two attributes (partition key + sort key)
# DynamoDB item example
{
"UserId": "user-12345", # Partition key
"OrderId": "order-9876", # Sort key (if composite)
"Status": "shipped",
"TotalAmount": 129.99,
"Items": ["SKU-001", "SKU-042"], # List attribute
"Metadata": { # Map attribute
"ShippingAddress": "123 Main St",
"Carrier": "UPS"
}
}
DynamoDB spreads data across storage nodes called partitions. The partition key determines which partition holds your data, and DynamoDB routes requests accordingly.
Data Distribution: Consistent Hashing
DynamoDB uses consistent hashing for data distribution. Hash the partition key to find the partition. Add or remove nodes, and only a fraction of the data needs to move. This is the core mechanism behind DynamoDB’s horizontal scaling.
graph TD
A[Key Space 0-2^32] --> B[Partition 1]
A --> C[Partition 2]
A --> D[Partition 3]
A --> E[Partition N]
B --> F[Replica 1]
C --> G[Replica 2]
D --> H[Replica 3]
F --> I[us-east-1a]
G --> J[us-east-1b]
H --> K[us-east-1c]
Each partition replicates across three availability zones automatically. AWS uses a Paxos variant for partition leader election, though the details are internal.
Partitions split when they exceed capacity (roughly 10GB or 3,000 write capacity units). As your table grows, DynamoDB handles the redistribution.
Hot Partitions: Problems and Mitigation
A hot partition occurs when one partition key receives disproportionately high traffic. Since DynamoDB routes all requests for that key to a single partition, it becomes a bottleneck while other partitions sit idle.
Common causes:
- Using a low-cardinality attribute as partition key (e.g., “status” with values like “active”, “pending”, “completed”)
- Viral content where one item gets massively more reads than others
- Time-based keys causing all writes during peak hours to hit one partition
Mitigation strategies:
- Randomized partition keys: Add a random suffix to distribute writes across partitions
import random
def generate_partition_key(user_id):
# Spread writes across 10 virtual partitions
random_suffix = random.randint(0, 9)
return f"{user_id}#{random_suffix}"
# Trade-off: now you must query all 10 partitions to find a user
-
Write sharding with high-cardinality salts: Use multiple distinct values that map to the same underlying entity
-
Read centralization: For read-heavy hot items, use DAX (DynamoDB Accelerator) to cache aggressively
-
Adaptive capacity: DynamoDB now automatically distributes traffic for eventually consistent reads, but writes still hit the partition directly
Warning signs of hot partitions:
ProvisionedThroughputExceededExceptionerrors affecting only certain keys- ConsumedWriteCapacityUnits showing one partition at 100% while others are under 10%
- Latency spikes on specific keys during traffic bursts
# Monitoring hot partition with CloudWatch
import boto3
cloudwatch = boto3.client('cloudwatch')
# Get partition-level metrics
response = cloudwatch.get_metric_statistics(
Namespace='AWS/DynamoDB',
MetricName='ConsumedWriteCapacityUnits',
Dimensions=[
{'Name': 'TableName', 'Value': 'Orders'}
],
StartTime='2024-01-01T00:00:00Z',
EndTime='2024-01-02T00:00:00Z',
Period=3600,
Statistics=['Sum']
)
When hot partitions are unavoidable: Consider whether DynamoDB is the right fit, or split the entity into multiple tables with different partition schemes.
Consistency Models
DynamoDB lets you pick consistency per request:
Eventually Consistent Reads (default):
- Returns data within milliseconds
- Might occasionally return stale data
- Highest throughput, lowest latency
- Costs 0.5 RCU
Strongly Consistent Reads:
- Always returns the most recent write
- Higher latency due to synchronous replication
- Costs 1 RCU
import boto3
dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
table = dynamodb.Table('Orders')
# Eventually consistent read (default)
response = table.get_item(Key={'OrderId': '123'})
# Strongly consistent read
response = table.get_item(
Key={'OrderId': '123'},
ConsistentRead=True
)
For read-heavy applications, eventual consistency is the obvious choice. For inventory checks or anything needing fresh data, strong consistency is worth the extra RCU.
Capacity Management
DynamoDB offers two capacity modes, switchable once per day:
Provisioned Mode
You declare expected reads and writes per second. DynamoDB reserves that capacity.
# Provisioned capacity specification
{
'TableName': 'Orders',
'ProvisionedThroughput': {
'ReadCapacityUnits': 100, # 100 strongly consistent reads/sec
'WriteCapacityUnits': 50 # 50 writes/sec
}
}
Auto-scaling can adjust capacity based on utilization, scaling up during spikes and down during quiet periods.
On-Demand Mode
Pay per request. No capacity planning. More expensive at sustained high throughput, but simpler for unpredictable workloads.
| Mode | Best For | Cost Model |
|---|---|---|
| Provisioned | Steady workloads | Fixed hourly rate + scaling |
| On-Demand | Variable or spiky workloads | Per request pricing |
A table with 1M reads and 100k writes daily might cost $50/month provisioned versus $200/month on-demand. But for wildly varying traffic, on-demand sidesteps the need to provision for peak.
On-Demand vs Provisioned Break-Even Calculator
Use this framework to decide which capacity mode makes financial sense for your workload:
def calculate_monthly_cost_dynamodb(mode, rcu_per_second, wcu_per_second):
"""
Calculate monthly DynamoDB cost for a given mode and capacity.
Pricing as of 2024 (us-east-1):
- Provisioned RCU: $0.00013 per RCU-hour
- Provisioned WCU: $0.000065 per WCU-hour
- On-demand: $0.25 per million read units
- On-demand: $1.25 per million write units
Each RCU supports 1 strongly consistent read/sec or 2 eventually consistent reads/sec
Each WCU supports 1 write/sec
"""
if mode == 'provisioned':
hours_per_month = 730 # Average month
rcu_cost = rcu_per_second * hours_per_month * 0.00013
wcu_cost = wcu_per_second * hours_per_month * 0.000065
return rcu_cost + wcu_cost
elif mode == 'ondemand':
# Convert per-second capacity to monthly request units
reads_per_month = rcu_per_second * 3600 * 24 * 30
writes_per_month = wcu_per_second * 3600 * 24 * 30
# On-demand pricing (reads are per RCU, not per read unit)
# 1 RCU = 1 strongly consistent read/sec = 2 eventually consistent reads/sec
read_units = reads_per_month # Assuming strongly consistent
write_units = writes_per_month
return (read_units / 1_000_000) * 0.25 + (write_units / 1_000_000) * 1.25
# Example: 100 RCU, 50 WCU sustained
rcu = 100
wcu = 50
provisioned_cost = calculate_monthly_cost_dynamodb('provisioned', rcu, wcu)
ondemand_cost = calculate_monthly_cost_dynamodb('ondemand', rcu, wcu)
print(f"Provisioned: ${provisioned_cost:.2f}/month")
print(f"On-demand: ${ondemand_cost:.2f}/month")
print(f"On-demand is {ondemand_cost/provisioned_cost:.1f}x more expensive at sustained load")
Break-even calculation:
The break-even point occurs when on-demand pricing equals provisioned pricing:
provisioned_monthly = RCU * 730 * $0.00013 + WCU * 730 * $0.000065
ondemand_monthly = (RCU * 2592000 / 1M) * $0.25 + (WCU * 2592000 / 1M) * $1.25
For 100 RCU and 50 WCU sustained: provisioned is ~$12.47/month while on-demand is ~$52.19/month at the same throughput. The break-even multiplier is roughly 4-5x for typical workloads.
Practical decision matrix:
| Scenario | Recommended Mode | Reason |
|---|---|---|
| Steady 24/7 traffic | Provisioned | 4-5x cheaper at sustained load |
| Predictable daily peaks | Provisioned + auto-scaling | Base capacity + elastic peak handling |
| Unpredictable traffic | On-demand | No throttling risk, pay per use |
| New table / unknown load | On-demand | Avoid over-provisioning |
| Migration with known load | Provisioned | More cost-effective once load known |
| Burst-heavy (night/day) | Provisioned + auto-scaling | Reserve for base, scale for bursts |
| < 25 RCU or < 25 WCU average | On-demand minimum | Below provisioned floor pricing |
Auto-scaling as a hybrid approach:
For workloads with a predictable baseline but occasional spikes, provisioned with auto-scaling captures most savings while handling bursts:
# Configure auto-scaling for a table
appautoscaling = boto3.client('application-autoscaling')
# Register scalable target
appautoscaling.register_scalable_target(
ServiceNamespace='dynamodb',
ResourceId='table/Orders',
ScalableDimension='dynamodb:table:WriteCapacityUnits',
MinCapacity=10, # Never below 10 WCU
MaxCapacity=1000 # Can scale to 1000 WCU
)
# Define scaling policy
appautoscaling.put_scaling_policy(
ServiceNamespace='dynamodb',
ResourceId='table/Orders',
ScalableDimension='dynamodb:table:WriteCapacityUnits',
PolicyName='OrdersWriteCapacityScalingPolicy',
TargetTrackingConfiguration={
'TargetValue': 70.0, # Keep utilization at 70%
'PredefinedMetricSpecification': {
'PredefinedMetricType': 'DynamoDBWriteCapacityUtilization'
}
}
)
Auto-scaling has a 4-5 minute adjustment period. For very spiky workloads that exceed auto-scaling response time, on-demand or a manual capacity increase handles the spike.
Primary Operations
Core DynamoDB operations:
PutItem / BatchWriteItem: Write items, with optional conditional writes
table.put_item(
Item={'UserId': '123', 'Email': 'user@example.com'},
ConditionExpression='attribute_not_exists(UserId)' # Prevent overwrites
)
GetItem / BatchGetItem: Retrieve items by primary key
# Single item retrieval
response = table.get_item(Key={'UserId': '123'})
# Batch retrieval (up to 100 items)
response = table.batch_get_item(
RequestItems={
'Orders': {'Keys': [{'OrderId': '1'}, {'OrderId': '2'}]}
}
)
Query: Retrieve items by partition key and optional sort key range
# Query all orders for user-123 placed in 2024
response = table.query(
KeyConditionExpression='UserId = :uid AND begins_with(OrderId, :year)',
ExpressionAttributeValues={
':uid': 'user-123',
':year': '2024'
}
)
Scan: Full table scan — expensive, avoid in production
# Scan reads every item - use sparingly
response = table.scan()
Error Handling and Retry Logic
DynamoDB throws specific exceptions that require different handling strategies. The most common is ProvisionedThroughputExceededException, which occurs when you exceed your reserved read or write capacity.
import boto3
from botocore.config import Config
from botocore.exceptions import ClientError
import time
import random
dynamodb = boto3.resource('dynamodb')
# Configure retries with exponential backoff and jitter
config = Config(
retries={
'max_attempts': 10,
'mode': 'adaptive' # Adaptive mode adds rate-limiting awareness
}
)
dynamodb = boto3.resource('dynamodb', config=config)
def put_item_with_backoff(table_name, item, max_retries=8):
"""Put item with exponential backoff and jitter."""
table = dynamodb.Table(table_name)
base_delay = 0.025 # 25ms base (DynamoDB SDK default)
for attempt in range(max_retries):
try:
table.put_item(Item=item)
return True
except ClientError as e:
error_code = e.response['Error']['Code']
if error_code == 'ProvisionedThroughputExceededException':
# Exponential backoff with full jitter
delay = random.uniform(0, base_delay * (2 ** attempt))
time.sleep(delay)
base_delay = min(base_delay * 2, 5.0) # Cap at 5 seconds
elif error_code == 'ThrottlingException':
# Same handling as ProvisionedThroughputExceeded
delay = random.uniform(0, base_delay * (2 ** attempt))
time.sleep(delay)
elif error_code == 'InternalServerError':
# Transient server-side error - retry
delay = random.uniform(0, base_delay * (2 ** attempt))
time.sleep(delay)
else:
# Non-retryable error
raise
raise Exception(f"Failed after {max_retries} attempts")
Common DynamoDB exceptions and their handling:
| Exception | Cause | Retry Strategy |
|---|---|---|
ProvisionedThroughputExceededException | RCU/WCU exceeded | Exponential backoff with jitter, 25ms base |
ThrottlingException | Request rate too high | Same as above |
InternalServerError | DynamoDB internal error | Retry with backoff |
RequestLimitExceeded | Too many requests simultaneously | Backoff and reduce concurrency |
ResourceNotFoundException | Table or GSI does not exist | Do not retry - fix the code |
ConditionalCheckFailedException | Condition expression not met | Do not retry - business logic issue |
TransactionCanceledException | Transaction conflict | Retry with backoff after resolution |
Exponential backoff with jitter formula:
delay = random(0, min(cap, base * 2^attempt))
| Parameter | Value | Notes |
|---|---|---|
| base_delay | 25ms | SDK default, adjust based on table size |
| max_delay | 5000ms | Cap to avoid long waits |
| max_attempts | 10 | After 10 failures, something is wrong |
| jitter | full | Random value between 0 and calculated delay |
For provisioned capacity with auto-scaling, the SDK’s built-in retry handles most throttling automatically. For on-demand, throttling is more frequent during traffic spikes since DynamoDB adapts capacity in 4-minute windows.
Secondary Indexes
Primary key lookups are efficient, but access patterns often need other attributes. DynamoDB provides two index types:
Global Secondary Index (GSI):
- Own partition key and optional sort key
- Queries across the entire table
- eventual or strong consistency
- Projects attributes from main table
Local Secondary Index (LSI):
- Shares partition key with main table
- Different sort key
- Strongly consistent reads only
- Must be created with table, cannot be modified later
# Creating a GSI for email lookups
table.update(
GlobalSecondaryIndexUpdates=[{
'Create': {
'IndexName': 'EmailIndex',
'KeySchema': [{'AttributeName': 'Email', 'KeyType': 'HASH'}],
'Projection': {'ProjectionType': 'ALL'},
'ProvisionedThroughput': {
'ReadCapacityUnits': 10,
'WriteCapacityUnits': 10
}
}
}]
)
GSIs handle most flexible access patterns. LSI only matters when you need strongly consistent queries within a partition key.
Sparse Index Tricks for GSIs
GSIs only index items that have the GSI key attribute. This behavior enables powerful patterns where you selectively include items in an index.
Pattern: Sparse GSI for Rarely-Accessed Conditional Data. Only items with the GSI key attribute get indexed. Items without that attribute are invisible to the GSI. This lets you maintain a sparse index containing only a subset of items.
# Example: GSI for "orders with issues only"
# Items without IssueFlag are not indexed, keeping GSI small
# Order item with no issue - not indexed
table.put_item(Item={
'OrderId': 'order-001',
'CustomerId': 'cust-123',
'Status': 'delivered',
'Total': 99.99
# No IssueFlag attribute - invisible to IssueIndex GSI
})
# Order item with issue - indexed
table.put_item(Item={
'OrderId': 'order-002',
'CustomerId': 'cust-123',
'Status': 'delivered',
'Total': 99.99,
'IssueFlag': 'REFUND_REQUESTED',
'IssueReason': 'damaged_in_transit'
# IssueFlag present - appears in IssueIndex GSI
})
Pattern: Sparse GSI for Soft-Deleted Items. Instead of deleting items immediately, set a Deleted attribute. Query the GSI for items without the deleted flag to find active records. This pattern is useful when you need to maintain deleted item history in DynamoDB Streams but want efficient queries for active items.
# Soft delete - add Deleted attribute instead of deleting
table.update_item(
Key={'OrderId': 'order-001'},
UpdateExpression='SET Deleted = :del, DeletedAt = :now',
ExpressionAttributeValues={':del': True, ':now': int(time.time())}
)
# Query GSI for non-deleted orders only
response = table.query(
IndexName='StatusIndex',
KeyConditionExpression='CustomerId = :cid AND #status = :status',
FilterExpression='attribute_not_exists(Deleted)',
ExpressionAttributeNames={'#status': 'Status'},
ExpressionAttributeValues={':cid': 'cust-123', ':status': 'pending'}
)
Pattern: Versioned Data with Sparse GSI. Store multiple versions of an item using a version key. Only the current version has the CurrentVersion flag, making the GSI index small and efficient for “current” queries.
# Historical versions without CurrentVersion
table.put_item(Item={
'EntityId': 'user-profile-123',
'VersionId': 'v1',
'Data': {'name': 'Alice', 'email': 'alice@v1.com'},
'UpdatedAt': 1704067200
})
# Current version with CurrentVersion attribute
table.put_item(Item={
'EntityId': 'user-profile-123',
'VersionId': 'v2',
'Data': {'name': 'Alice', 'email': 'alice@v2.com'},
'UpdatedAt': 1704153600,
'CurrentVersion': True # Sparse index only includes this
})
# Query GSI to get current version only
response = table.query(
IndexName='CurrentVersionIndex',
KeyConditionExpression='EntityId = :eid',
FilterExpression='attribute_exists(CurrentVersion)'
)
Sparse GSI trade-offs:
| Aspect | Consideration |
|---|---|
| GSI size | GSI contains only items with the key attribute - can be much smaller than main table |
| Write amplification | Every write with the GSI key attribute consumes GSI write capacity |
| Null handling | Items without GSI key are invisible - cannot query for “missing” items |
| TTL interaction | TTL-deleted items remain in GSI until TTL processes them |
| Update behavior | Updating an item to remove the GSI key removes it from the GSI |
Common sparse GSI use cases:
- Orders with issues (separate index from all orders)
- Active subscriptions vs expired
- Featured products flag
- User notifications that need efficient unread queries
- Entities pending approval vs approved
DAX: DynamoDB Accelerator In-Memory Cache
DAX is a fully managed, in-memory cache for DynamoDB. It sits in front of DynamoDB and caches read results, dramatically reducing read latency for frequently accessed items.
When DAX Makes Sense
- Read-heavy workloads with hot data that does not change frequently
- Microsecond latency requirements that DynamoDB’s single-digit milliseconds cannot meet
- Burst read patterns where you need to handle traffic spikes without throttling
When DAX is Not the Answer
- Write-heavy workloads - DAX does not cache writes
- Real-time data requirements - DAX can serve stale data (though it invalidates on writes)
- Strongly consistent reads only - DAX returns eventually consistent data
DAX Architecture
graph LR
A[Application] --> B[DAX Cluster]
B --> C[Node 1 Cache]
B --> D[Node 2 Cache]
B --> E[Node 3 Cache]
C --> F[DynamoDB]
D --> F
E --> F
DAX clusters run across multiple nodes for HA. You get automatic failover if a node goes down. The cluster manages cache invalidation when items change in DynamoDB.
Code Example
import boto3
from amazon.dax.runtime import Cluster
# DAX client - same API as DynamoDB
dax = boto3.resource('dynamodb', endpoint_url='https://dax.us-east-1.amazonaws.com')
# Same get_item call, but now goes through DAX cache
table = dax.Table('Orders')
response = table.get_item(Key={'OrderId': '123'})
# DAX automatically handles:
# - Cache hit: returns from memory (< 1ms)
# - Cache miss: fetches from DynamoDB, caches result, returns
# - Item updated in DynamoDB: DAX invalidates that item
DAX vs ElastiCache Comparison
| Aspect | DAX | ElastiCache (Redis) |
|---|---|---|
| Integration | Native DynamoDB API | Requires code changes |
| Cache invalidation | Automatic on DynamoDB writes | Manual invalidation logic |
| Strong consistency | Not supported (eventual only) | Possible with write-through |
| Item-level caching | Yes | Yes |
| Query caching | No | Yes |
| Cluster management | Fully managed | Requires cluster management |
Bottom line: DAX is simpler if you only need item-level caching with DynamoDB. Use ElastiCache if you need query result caching or cross-table data caching.
- NoSQL Databases covers the NoSQL landscape DynamoDB sits in
- CAP Theorem explains why DynamoDB chose eventual consistency
- Consistent Hashing describes DynamoDB’s partitioning strategy
- Database Replication explains the cross-AZ replication model
Streams and Triggers
DynamoDB Streams captures a time-ordered sequence of item changes:
- INSERT: New item added
- MODIFY: Item updated
- REMOVE: Item deleted
DynamoDB Streams Limitations
Streams are powerful but come with constraints that matter at scale:
Ordered per-partition only:
DynamoDB Streams maintains arrival order only within a single partition key. If you have events for UserId=A and UserId=B interleaved in time, the stream guarantees ordering for A’s events among themselves and B’s events among themselves, but not across A and B. This matters for event processing where causality across partition keys matters.
# Stream record example - note the partition key determines ordering scope
for record in stream_records:
event_source_arn = record['eventSourceARN'] # Table and stream ARN
event_name = record['eventName'] # INSERT/MODIFY/REMOVE
partition_key = record['dynamodb']['Keys']['UserId']['S'] # Ordering scope
sequence_number = record['dynamodb']['SequenceNumber']
# Events for same UserId arrive in sequence; cross-partition ordering is not guaranteed
24-hour retention cap:
Streams retain only 24 hours of data. For longer retention, you must pipe events to Kinesis Data Streams, Kafka, or a custom consumer that writes to persistent storage.
# Kinesis Data Streams as a longer-retention alternative
import boto3
kinesis = boto3.client('kinesis')
dynamodb = boto3.client('dynamodb')
# Enable DynamoDB Streams to Kinesis integration
dynamodb.enable_kinesis_streaming_destination(
TableName='Users',
StreamArn='arn:aws:kinesis:us-east-1:123456789:stream/user-events'
)
Shard consumption parallelism:
A stream’s shards determine your Lambda concurrency. A single shard processes roughly 1MB/second and supports one concurrent Lambda invocation. If your table has 100 active partitions, you need at least 100 shards for full parallel processing. You can pre-split shards using UpdateTable with ProvisionedThroughput.
| Limitation | Impact | Mitigation |
|---|---|---|
| Per-partition ordering only | Cross-partition event ordering not guaranteed | Design partition keys around event co-ordering needs |
| 24-hour retention | Long-term event history unavailable | Pipe to Kinesis/Kafka for archival |
| Shard-level parallelism | High-partition tables need corresponding shards | Pre-split shards before enabling streams |
| No dead-letter queue | Failed Lambda events lost after retry exhaustion | Configure SQS DLQ on Lambda async invocation |
| GSI updates not tracked | GSI changes do not generate stream events | Query GSI separately if GSI change capture needed |
Build trigger-like behavior with Lambda:
# Lambda function triggered by DynamoDB stream
def lambda_handler(event, context):
for record in event['Records']:
if record['eventName'] == 'INSERT':
new_item = dynamodb_types.unmarshal(record['dynamodb']['NewImage'])
send_welcome_email(new_item['Email'])
Streams retain 24 hours of changes, fine for most use cases. For longer retention, pipe events to Kinesis or a custom consumer.
TTL Behavior and Tombstone Garbage Collection
DynamoDB TTL automatically deletes items after a specified timestamp, but the deletion process works differently than a direct DeleteItem call.
How TTL deletion works:
- When an item’s TTL attribute passes the current time, DynamoDB marks it as expired
- The expired item appears in your table but becomes invisible to queries
- A background process removes the item within 48 hours (typically minutes)
- The deletion generates a
REMOVEevent in DynamoDB Streams
# Setting TTL on an item - TTL attribute must be a Unix timestamp in seconds
import time
table.put_item(
Item={
'UserId': 'user-123',
'Email': 'user@example.com',
'SessionData': '...',
'TTL': int(time.time()) + 86400 * 30 # Expire in 30 days
}
)
# Check if TTL is enabled on the table
table.meta.client.describe_time_to_live(TableName='Users')
Tombstone behavior:
Unlike Cassandra where deletions create tombstones that persist until compaction, DynamoDB tombstones exist only during the 48-hour deletion window. After that, the item is permanently removed.
Garbage collection impact:
- TTL deletions do not consume write capacity
- Items deleted by TTL do not appear in on-demand backup exports
- Point-in-time recovery includes TTL-deleted items within the retention window
- Global Tables replicate TTL deletions across regions
Common TTL pitfalls:
| Pitfall | Impact | Mitigation |
|---|---|---|
| TTL attribute in past | Item immediately invisible | Always set TTL to future timestamps |
| TTL on GSI partition key | GSI updates queued, delayed deletion | Avoid TTL on frequently indexed attributes |
| TTL during backups | Backup may include soon-to-expire items | Check TTL values after restore |
| Timezone confusion | TTL is Unix epoch seconds, not local time | Use int(time.time()) or equivalent |
TTL is approximate: DynamoDB guarantees items expire within 48 hours of the TTL timestamp, though in practice it usually happens within minutes. Do not use TTL for time-sensitive deletions like session expiry where exact timing matters.
Global Tables
Global Tables replicate across AWS regions:
- Active-active replication for low-latency global access
- Automatic conflict resolution (last-writer-wins by default)
- Sub-second replication between regions
# Create global table in two regions
dynamodb_client.create_table(
TableName='Users',
# ... table definition ...
StreamSpecification={'StreamViewType': 'NEW_AND_OLD_IMAGES'}
)
# Enable multi-region replication
dynamodb_client.create_table(
TableName='Users',
# Specify replicas in us-west-2 and eu-west-1
)
Global Tables use all-or-nothing replication: either all replicas accept a write or none do. This gives eventual consistency with conflict resolution.
Backup and Restore
DynamoDB provides three backup mechanisms with different trade-offs: on-demand backups, point-in-time recovery, and cross-region backup replication.
On-Demand Backups
On-demand backups create a full snapshot of the table at a point in time. They do not affect read or write performance and retain until you delete them.
import boto3
dynamodb = boto3.client('dynamodb')
# Create on-demand backup
response = dynamodb.create_backup(
TableName='Orders',
BackupName='Orders-backup-2024-01-15'
)
backup_arn = response['BackupDetails']['BackupArn']
backup_creation_date = response['BackupDetails']['BackupCreationDateTime']
# Restore from backup
dynamodb.restore_table_from_backup(
TargetTableName='Orders-Restored',
BackupARN=backup_arn
)
# List all backups
backups = dynamodb.list_backups(TableName='Orders')
for backup in backups['BackupSummaries']:
print(f"{backup['BackupName']}: {backup['BackupStatus']}")
On-demand backups capture the entire table including TTL-deleted items still within the 48-hour window.
Point-in-Time Recovery
Point-in-time recovery (PITR) continuously backs up your table with second-level granularity for the last 35 days. Enable it per table:
# Enable point-in-time recovery
dynamodb.update_continuous_backups(
TableName='Orders',
PointInTimeRecoverySpecification={
'PointInTimeRecoveryEnabled': True
}
)
# Check PITR status
status = dynamodb.describe_continuous_backups(TableName='Orders')
pitr_status = status['ContinuousBackupsDescription']['PointInTimeRecoveryDescription']['PointInTimeRecoveryStatus']
print(f"PITR status: {pitr_status}")
# Restore to specific time (must be within last 35 days)
dynamodb.restore_table_to_point_in_time(
SourceTableName='Orders',
TargetTableName='Orders-PITR-Restore',
RestoreDateTime=datetime(2024, 1, 10, 12, 0, 0)
)
PITR has no performance impact and does not consume write capacity. Restoration typically takes minutes to hours depending on table size.
Cross-Region Backup with AWS Backup
For disaster recovery across regions, AWS Backup manages cross-region backup copies:
# Create cross-region backup plan with AWS Backup
backup = boto3.client('backup')
# Define backup rule with copy to another region
backup_plan = {
'BackupPlanName': 'DynamoDB-CrossRegion-Backup',
'Rules': [{
'RuleName': 'CopyToDrRegion',
'TargetBackupVaultName': 'backup-vault-us-east-1',
'CopyActions': [{
'DestinationBackupVaultArn': 'arn:aws:backup:us-west-2:123456789:backup-vault/dr-vault',
'Lifecycle': {'DeleteAfterDays': 30}
}],
'Lifecycle': {'DeleteAfterDays': 7}
}]
}
backup.create_backup_plan(BackupPlan=backup_plan)
Backup Comparison Table
| Feature | On-Demand Backup | Point-in-Time Recovery | AWS Backup (Cross-Region) |
|---|---|---|---|
| Retention | Until deleted | 35 days | Configurable (e.g., 30 days) |
| Granularity | Full table snapshot | Second-level within window | Configurable schedule |
| Performance impact | None | None | None |
| Cost | Per-backup storage | Continuous backup storage | Per-copy storage + transfer |
| Cross-region | Manual export to S3 | Not native | Native managed copies |
| Restore time | Minutes to hours | Minutes to hours | Minutes to hours |
Export to S3 for Long-Term Archival
For compliance or analytics needs, export table data to S3:
# Export to S3 (using Data Pipeline historically, now S3 Export directly)
dynamodb.export_table_to_point_in_time(
TableArn='arn:aws:dynamodb:us-east-1:123456789:table/Orders',
ExportTime=datetime(2024, 1, 15, 0, 0, 0),
S3Bucket='my-dynamodb-exports',
S3Prefix='exports/orders-2024-01-15/',
ExportFormat='DYNAMODB_JSON'
)
Exports do not impact production performance and read from backup storage, not the live table.
Production Failure Scenarios
| Failure | Impact | Mitigation |
|---|---|---|
| Hot partition key | One partition absorbs disproportionate traffic; throttles while other partitions are underutilized | Use partition key design that distributes traffic (add random suffix, use composite keys); split high-traffic items across multiple keys |
| Provisioned capacity exceeded | Requests throttled with ProvisionedThroughputExceededException | Enable auto-scaling; set appropriate RCU/WCU; use exponential backoff with jitter on retries |
| On-demand mode cost spike | Unpredictable traffic patterns can cause unexpectedly high bills | Set billing alerts; use provisioned capacity for predictable baseline + on-demand for spikes |
| DynamoDB outage in single region | Applications depending on that region fail until DNS failover | Use Global Tables for multi-region active-active; implement cross-region read replicas if active-active not needed |
| Cache invalidation race | DAX returns stale data after DynamoDB write due to eventual consistency | DAX is not strong-consistency compatible; use short TTL; invalidate DAX entry explicitly on writes |
| Transaction conflict | Concurrent transactions modifying same item in opposite directions both fail | Design item access patterns to minimize conflicts; use conditional writes with exponential backoff |
| TTL expired but not deleted immediately | Items remain viewable briefly after TTL expires | TTL is approximate (within 48 hours of actual expiry); if strict expiry needed, handle deletion in application logic |
Common Pitfalls / Anti-Patterns
When to Use DynamoDB
DynamoDB makes sense when you need predictable performance at any scale. Single-digit millisecond latency with consistent throughput comes standard, whether you use provisioned capacity or on-demand mode. Serverless is a natural fit — DynamoDB scales to zero and you pay only for what you use, so variable workloads do not require capacity planning. High write throughput plays to DynamoDB’s strengths too; it handles massive write loads from IoT telemetry, event streams, and gaming leaderboards without breaking a sweat. Multi-region replication via Global Tables gives you active-active replication across regions with automatic conflict resolution. And when your access patterns are simple — key-value lookups or document access by partition key and sort key range — DynamoDB is straightforward and efficient.
When Not to Use DynamoDB
DynamoDB is the wrong choice when you need complex queries across multiple entities. There are no JOINs, no aggregations, and no ad-hoc filtering across tables — PostgreSQL or Elasticsearch handle those better. Strong consistency across multiple tables also does not work well since transactions can span multiple items but not multiple tables; cross-table consistency demands application-level coordination. Full-text search is not available natively — you would need OpenSearch or Elasticsearch as a sidecar. The 400KB item size limit rules out large objects, so media and big blobs belong in S3 instead. And if SQL compatibility matters, DynamoDB has none — migrating from relational databases is painful.
Quick Recap Checklist
Run through this before your next DynamoDB interview or when auditing a live table. Each item catches a real mistake engineers make in production.
- Partition key design distributes traffic evenly (no low-cardinality keys)
- Composite partition + sort key used when accessing items by range
- Eventually consistent reads used where fresh data is not required (halves RCU cost)
- Strongly consistent reads used for inventory, payments, and critical writes
- Provisioned capacity with auto-scaling configured for steady workloads
- On-demand mode used for spiky/unpredictable workloads
- GSI created for alternate access patterns beyond primary key
- Sparse GSI patterns understood and applied for filtered queries
- Hot partition mitigation in place (random suffix, write sharding)
- Exponential backoff with jitter implemented for retry logic
- Conditional writes used to prevent race conditions
- TTL set for items that naturally expire (sessions, temp data)
- DAX evaluated for read-heavy microsecond-latency requirements
- Global Tables configured for multi-region active-active requirements
- PITR enabled for any table requiring point-in-time recovery
- On-demand backups scheduled for critical tables
- Export to S3 configured for compliance/archival needs
- CloudWatch alarms set for throttling and consumed capacity
- Batch operations used (BatchGetItem, BatchWriteItem) to reduce round trips
- Scan avoided in production; Query and GetItem used instead
Interview Questions
Expected answer points:
- DynamoDB hashes the partition key to determine which partition stores the item; requests for the same partition key always route to the same partition
- A hot partition occurs when one key receives excessive traffic relative to other partitions, causing throttling while other partitions remain underutilized
- Mitigation strategies include adding random suffixes to partition keys, write sharding with high-cardinality salts, using DAX for read-heavy hot items, and adaptive capacity for eventually consistent reads
Expected answer points:
- Eventually consistent reads return data within milliseconds and might occasionally return stale data; they cost 0.5 RCU per read
- Strongly consistent reads always return the most recent write due to synchronous replication across AZs; they cost 1 RCU and have higher latency
- Choose eventually consistent reads for read-heavy applications like feeds, timelines, or dashboards where slightly stale data is acceptable
- Choose strongly consistent reads for inventory checks, payment processing, or any operation requiring fresh data accuracy
Expected answer points:
- Consistent hashing maps partition keys to a key space (0–2^32); each partition owns a range of that space
- When partitions are added or removed, only a fraction of keys remap to different partitions, minimizing data movement
- Partitions split when they exceed approximately 10GB or 3,000 write capacity units; DynamoDB automatically redistributes data
- This mechanism allows DynamoDB to scale horizontally without a central bottleneck or coordinator
Expected answer points:
- Provisioned mode suits steady, predictable workloads where you can forecast RCU/WCU needs; 4–5x cheaper at sustained throughput
- On-demand mode suits variable, spiky, or unpredictable workloads where you pay per request without capacity planning
- Use provisioned with auto-scaling for a predictable baseline plus elastic peak handling
- On-demand is ideal for new tables with unknown load and migration workloads with temporary high capacity needs
- Break-even is approximately 4–5x: on-demand costs 4–5x more than provisioned for the same sustained throughput
Expected answer points:
- GSI has its own partition key and optional sort key; queries span the entire table; supports eventual or strong consistency; can be created/modified after table creation
- LSI shares the same partition key as the main table but has a different sort key; only supports strongly consistent reads; must be created with the table and cannot be modified later
- GSIs are more flexible and commonly used for alternate access patterns; LSIs are only needed when you require strongly consistent queries within a partition key
Expected answer points:
- GSIs only index items that have the GSI key attribute; items without that attribute are invisible to the index, keeping the GSI small
- Pattern 1 – Conditional indexing: Create a GSI for "orders with issues only" by including an IssueFlag attribute selectively; orders without issues are not indexed
- Pattern 2 – Soft deletes: Use a Deleted attribute; items without Deleted appear in the "active" GSI; querying for items without Deleted efficiently returns only active records
- Trade-off: write amplification (every item with the GSI key consumes GSI write capacity) and null handling (items without the key are invisible)
Expected answer points:
- Streams capture a time-ordered sequence of item-level changes: INSERT, MODIFY, REMOVE events with before/after images
- Limitation 1 – Ordered per-partition only: cross-partition event ordering is not guaranteed; design partition keys around event co-ordering needs
- Limitation 2 – 24-hour retention: long-term event history requires piping to Kinesis Data Streams, Kafka, or custom persistent storage
- Limitation 3 – Shard-level parallelism: full parallel processing requires corresponding number of shards (1MB/sec per shard); pre-split shards for high-partition tables
- Limitation 4 – No native dead-letter queue: failed Lambda events lost after retry exhaustion; configure SQS DLQ on Lambda async invocation
- Limitation 5 – GSI updates not tracked: GSI changes do not generate stream events
Expected answer points:
- TTL deletes items automatically after a specified timestamp; DynamoDB marks items as expired when TTL attribute passes current time, then removes them within 48 hours (typically minutes)
- TTL generates REMOVE events in DynamoDB Streams so consumers can react to expirations
- Pitfall 1 – TTL attribute in the past: item immediately becomes invisible; always set TTL to future timestamps
- Pitfall 2 – TTL on GSI partition key: GSI updates are queued, causing delayed deletions
- Pitfall 3 – Timezone confusion: TTL is Unix epoch seconds, not local time; always use `int(time.time())` or equivalent
- TTL does not consume write capacity; items deleted by TTL do not appear in on-demand backup exports; PITR includes TTL-deleted items within the retention window
Expected answer points:
- DAX is a fully managed, in-memory cache that sits in front of DynamoDB; cache hits return in microseconds vs. single-digit milliseconds from DynamoDB
- DAX is the right choice for read-heavy workloads with hot data, microsecond latency requirements, and burst read patterns where you need to handle traffic spikes without throttling
- DAX is NOT suitable for write-heavy workloads (does not cache writes), real-time data requirements, or when you need strongly consistent reads
- Choose DAX over ElastiCache when you need native DynamoDB API compatibility and item-level caching without code changes
- Choose ElastiCache when you need query result caching, cross-table data caching, or write-through caching for strong consistency
Expected answer points:
- Use exponential backoff with jitter: delay = random(0, min(cap, base × 2^attempt)) where base is 25ms and cap is 5000ms
- Set max_attempts to 10; after 10 failures something is wrong and you should investigate capacity design
- Adaptive retry mode in the SDK adds rate-limiting awareness, adjusting retry behavior based on throttling signals from DynamoDB
- For provisioned capacity with auto-scaling, the SDK's built-in retry handles most throttling automatically
- For on-demand, throttling is more frequent during traffic spikes since DynamoDB adapts capacity in 4-minute windows; exponential backoff bridges those windows
Expected answer points:
- Global Tables provide active-active replication across AWS regions, enabling low-latency global access without a custom replication layer
- They use all-or-nothing replication: either all replicas accept a write or none do
- Automatic conflict resolution uses last-writer-wins by default; applications can implement custom conflict resolution if needed
- Sub-second replication between regions is guaranteed by the managed service
- Global Tables replicate TTL deletions across regions
Expected answer points:
- On-demand backups: Full table snapshots retained until deleted; no performance impact; good for scheduled weekly/monthly snapshots and migrations
- Point-in-time recovery (PITR): Continuous second-level backup for last 35 days; no performance impact; good for any table requiring recent point-in-time restore capability
- AWS Backup cross-region: Managed service for disaster recovery with configurable retention and lifecycle; good for compliance requiring cross-region backup copies
- Export to S3: For long-term archival and compliance; reads from backup storage, not live table; no performance impact
- TTL deletion is NOT captured in on-demand backup exports but IS included in PITR within the retention window
Expected answer points:
- Query retrieves items by partition key and optional sort key range; efficient because it targets specific partitions
- Scan reads every item in a table or GSI; highly inefficient as it reads unrelated data and consumes large amounts of RCU
- Use Query for all production access patterns; it should be your primary operation
- Use Scan only for one-off administrative tasks, data exports, or when you genuinely need to process all items; never in hot production paths
- BatchGetItem can retrieve up to 100 items by primary key in a single call, reducing round trips compared to multiple GetItem calls
Expected answer points:
- DynamoDB transactions (TransactWriteItems, TransactGetItems) operate within a single table or across tables in the same AWS account and region
- For true cross-table consistency, use application-level coordination: write to both tables, use conditional writes to verify state, and implement compensating transactions for rollback
- Alternatively, use DynamoDB Streams to trigger a Lambda that updates secondary tables and handles failures with retry logic or dead-letter queues
- For strong cross-region consistency, DynamoDB transactions are not supported across regions; Global Tables provide eventual consistency across regions only
Expected answer points:
- Use tenant ID as the leading element of the partition key (e.g., TenantID or TenantID#EntityType) to ensure data isolation and even distribution
- If a single tenant's data exceeds partition limits, add a sub-entity ID as a sort key or use hierarchical keys (TenantID#UserID as PK, OrderID as SK)
- GSIs should also start with TenantID to maintain tenant isolation in alternate access patterns
- Implement row-level security at the application layer by always including TenantID in queries
- Consider separate tables per tenant for extreme isolation requirements, but manage operational complexity
Expected answer points:
- Conditional writes: Use `ConditionExpression='attribute_not_exists(pk)'` or `attribute_exists(pk)'` to prevent overwrites or ensure presence
- TransactWriteItems: Atomically write or delete up to 100 items across tables; fails if any condition fails
- Optimistic locking: Store a version attribute; update only if version matches expected value using conditional expressions
- Exponential backoff with jitter: Handle `TransactionCanceledException` due to conflicts between concurrent transactions modifying the same items
- Design item access patterns to minimize conflicts: avoid having multiple concurrent processes modifying the same item in opposite directions
Expected answer points:
- DynamoDB has a 400KB maximum item size limit (attributes plus key names and values)
- For large objects, store the data in S3 and save the S3 object key as a DynamoDB attribute; never store media or large blobs directly in DynamoDB
- For documents exceeding 400KB, split them into chunks (chunk 1, chunk 2, etc.) with a composite sort key, then reassemble in the application
- Store metadata in DynamoDB for querying while the actual content lives in S3
Expected answer points:
- DynamoDB is a CP (Consistent and Partition-tolerant) system under the CAP theorem during network partitions
- When a network partition occurs, DynamoDB sacrifices availability and returns errors for strongly consistent reads/writes until the partition heals
- Eventually consistent reads may still be served during a partition (reducing RCU cost by half)
- The 2017 DynamoDB paper clarified that DynamoDB chose to give up strong consistency for scalability and availability; you can achieve strong consistency at the cost of higher latency and RCU
- DynamoDB's single-digit millisecond SLA is a deliberate trade-off: predictable performance over absolute consistency
Expected answer points:
- CloudWatch metrics to monitor: ConsumedReadCapacityUnits, ConsumedWriteCapacityUnits, ThrottledRequests, UserErrors, SystemErrors
- Partition-level metrics via CloudWatch Contributor Insights to identify hot partition keys
- Billing alerts: set CloudWatch alarms for unusual spend spikes, especially in on-demand mode
- PITR status monitoring: ensure point-in-time recovery remains enabled
- DAX cache hit/miss ratio if DAX is in use; high miss rate indicates cache sizing issues
- Stream metrics: Lambda invocation errors and iterator age for stream consumers
Expected answer points:
- Choosing DynamoDB for complex relational data: no JOINs, no ad-hoc queries, no aggregations; teams end up building workarounds that negate DynamoDB's benefits
- Poor partition key design: hot partitions cause throttling and unpredictable costs; invest upfront in partition key strategy
- Underestimating GSI costs: each GSI has its own provisioned throughput; excessive GSIs multiply both cost and operational complexity
- Vendor lock-in concerns: DynamoDB is a proprietary managed service with no SQL compatibility; ensure the team accepts this trade-off before committing
- Over-engineering early: start simple with on-demand mode and evolve capacity mode as load becomes predictable; avoid premature optimization
Further Reading
- NoSQL Databases — the NoSQL landscape DynamoDB sits in
- CAP Theorem — why DynamoDB chose eventual consistency
- Consistent Hashing — DynamoDB’s partitioning strategy
- Database Replication — cross-AZ replication model
- Amazon DynamoDB: A Scalable, Predictably Performant, and Fully Managed NoSQL Database Service — 2018 ATC paper with production telemetry data
- Amazon DynamoDB Storage Frontend — AWS blog on the storage layer architecture
- DynamoDB Global Tables — official docs for multi-region replication
- DAX Developer Guide — official docs for the in-memory cache
Conclusion
DynamoDB embraces eventual consistency to achieve scale and availability. Predictable performance, managed operations, and flexible data modeling make it a solid choice for many modern applications.
Key points:
- Partition key design matters for performance and cost
- Eventually consistent reads halve RCU costs
- GSIs enable query flexibility at the cost of additional throughput
- On-demand mode simplifies capacity planning for variable workloads
- Global Tables give active-active replication across regions
DynamoDB is not the right fit for complex queries (look at PostgreSQL), multi-item transactions (Aurora), or teams who cannot accept vendor lock-in. For high-throughput, access-pattern-driven data with predictable scaling needs, DynamoDB performs where few alternatives can.
The 2017 DynamoDB paper clarified the hard choices: always writable, scalable, and simple came at the cost of strong consistency. The managed service keeps those trade-offs while adding operational simplicity the original Dynamo lacked.
Category
Related Posts
Apache Cassandra: Distributed Column Store Built for Scale
Explore Apache Cassandra's peer-to-peer architecture, CQL query language, tunable consistency, compaction strategies, and use cases at scale.
Google Spanner: Globally Distributed SQL at Scale
Google Spanner architecture combining relational model with horizontal scalability, TrueTime API for global consistency, and F1 database implementation.
Column-Family Databases: Cassandra and HBase Architecture
Cassandra and HBase data storage explained. Learn partition key design, column families, time-series modeling, and consistency tradeoffs.