AWS SQS and SNS: Cloud Messaging Services

Learn AWS SQS for point-to-point queues and SNS for pub/sub notifications, including FIFO ordering, message filtering, and common use cases.



Amazon Web Services offers two managed messaging services: SQS (Simple Queue Service) and SNS (Simple Notification Service). They address different needs. SQS is a fully managed point-to-point queue. SNS is a managed pub/sub service. Both integrate tightly with other AWS services.

This post covers both services, their differences, and when to use each.

AWS SQS: Point-to-Point Queues

SQS provides managed message queues without the operational overhead of running your own broker. You create a queue, send messages, and consume them.

Queue Types

SQS offers two types:

Standard queues offer nearly unlimited throughput and at-least-once delivery with best-effort ordering. A message may occasionally be delivered more than once or out of order, so your consumer must be idempotent and tolerate reordering.

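FIFO queues preserve order within each message group and provide exactly-once processing: a duplicate sent within the 5-minute deduplication interval is discarded. Throughput is lower: 300 API calls per second per action (send, receive, or delete) without batching, or up to 3,000 messages per second with batches of 10. High-throughput mode for FIFO queues raises these limits. FIFO is appropriate when order matters or duplicates are unacceptable.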

Working with SQS

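import json

import boto3

sqs = boto3.client('sqs')


def process(task):
    """Placeholder for your task handler (hypothetical)."""
    print('processing', task)


# Create a FIFO queue (FIFO queue names must end in ".fifo")
queue_url = sqs.create_queue(
    QueueName='tasks.fifo',
    Attributes={'FifoQueue': 'true', 'ContentBasedDeduplication': 'true'}
)['QueueUrl']

# Send a message; MessageGroupId is required for FIFO queues
value = 42  # example payload
sqs.send_message(
    QueueUrl=queue_url,
    MessageBody=json.dumps({'task': 'process', 'data': value}),
    MessageGroupId='task-group'
)

# Receive up to 10 messages, long polling for up to 20 seconds
response = sqs.receive_message(
    QueueUrl=queue_url,
    MaxNumberOfMessages=10,
    WaitTimeSeconds=20
)

# 'Messages' is absent from the response when nothing was received
for msg in response.get('Messages', []):
    process(json.loads(msg['Body']))
    # Delete only after successful processing; otherwise the message
    # reappears when its visibility timeout expires
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg['ReceiptHandle'])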

Key SQS Features

Visibility timeout controls how long a message is hidden after being received. If a consumer crashes, the message reappears after the timeout, available for another consumer to pick up.

Dead letter queues capture messages that fail repeatedly. Configure a redrive policy to send messages to a DLQ after N failed processing attempts.

Message retention is configurable from 60 seconds to 14 days (default 4 days). This buffer lets messages survive consumer downtime without loss.

Long polling reduces empty responses by waiting up to 20 seconds for messages to arrive. This reduces API calls and latency.
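These features map onto queue attributes. The sketch below builds the Attributes dictionary that boto3's set_queue_attributes (or create_queue) accepts, using a hypothetical helper named queue_attributes. The default values are illustrative, not recommendations. SQS expects every attribute value as a string, and the RedrivePolicy is itself a JSON string.

```python
import json


def queue_attributes(dlq_arn: str,
                     visibility_timeout_s: int = 60,
                     retention_s: int = 4 * 24 * 3600,
                     wait_time_s: int = 20,
                     max_receive_count: int = 5) -> dict:
    """Build an SQS Attributes map (hypothetical helper; values are examples)."""
    # Validate against the documented SQS limits before calling AWS
    if not 0 <= visibility_timeout_s <= 12 * 3600:
        raise ValueError("visibility timeout must be between 0 s and 12 h")
    if not 60 <= retention_s <= 14 * 24 * 3600:
        raise ValueError("retention must be between 60 s and 14 days")
    if not 0 <= wait_time_s <= 20:
        raise ValueError("long polling wait must be between 0 and 20 s")
    return {
        'VisibilityTimeout': str(visibility_timeout_s),
        'MessageRetentionPeriod': str(retention_s),
        # Queue-level default for long polling
        'ReceiveMessageWaitTimeSeconds': str(wait_time_s),
        # After max_receive_count receives without a delete, move to the DLQ
        'RedrivePolicy': json.dumps({
            'deadLetterTargetArn': dlq_arn,
            'maxReceiveCount': str(max_receive_count),
        }),
    }
```

You would apply it with sqs.set_queue_attributes(QueueUrl=queue_url, Attributes=queue_attributes(dlq_arn)), where dlq_arn is the ARN of a queue you created to serve as the DLQ. For a FIFO source queue, the DLQ must also be a FIFO queue.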

AWS SNS: Pub/Sub Notifications

SNS is a managed pub/sub service. You create topics, subscribe endpoints (email, SMS, HTTP, Lambda, SQS, mobile push), and publish messages.

SNS Topics

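import json

import boto3

sns = boto3.client('sns')

# Create topic (returns the existing ARN if the topic already exists)
topic_arn = sns.create_topic(Name='order-events')['TopicArn']

# Subscribe endpoints. The Lambda function needs a resource-based permission
# for sns.amazonaws.com, and the queue needs a policy allowing the topic to send.
sns.subscribe(TopicArn=topic_arn, Protocol='lambda', Endpoint='arn:aws:lambda:...')
sns.subscribe(TopicArn=topic_arn, Protocol='sqs', Endpoint='arn:aws:sqs:...')

# Publish; Subject is used for email endpoints and included in the JSON envelope
sns.publish(
    TopicArn=topic_arn,
    Message=json.dumps({'event': 'order.placed', 'order_id': '12345'}),
    Subject='Order Notification'
)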

Message Filtering

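Subscribers can apply filter policies to receive only relevant messages. By default, a filter policy is evaluated against message attributes, not the message body. The publisher must therefore send event and region as MessageAttributes. Alternatively, set the subscription's FilterPolicyScope attribute to MessageBody to filter on the JSON body: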

sns.subscribe(
    TopicArn=topic_arn,
    Protocol='sqs',
    Endpoint='sqs-arn',
    Attributes={'FilterPolicy': json.dumps({
        'event': ['order.placed', 'order.cancelled'],
        'region': ['us-west', 'us-east']
    })}
)

Messages not matching the filter policy are not delivered to that subscriber.
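Because the default filter scope is message attributes, the publisher has to attach matching attributes. This sketch shows the MessageAttributes shape that sns.publish expects. It also includes a hypothetical matches function that mimics only exact string matching, for local testing: policy keys are ANDed, and the values listed for a key are ORed. Real filter policies also support prefix, numeric, anything-but, and exists operators, so verify the actual policy against SNS.

```python
def string_attributes(**attrs: str) -> dict:
    """Build the MessageAttributes map that sns.publish() expects."""
    return {name: {'DataType': 'String', 'StringValue': value}
            for name, value in attrs.items()}


def matches(filter_policy: dict, message_attributes: dict) -> bool:
    """Simplified local check (hypothetical): exact string matches only.

    Every key in the policy must be present (AND), and its value must
    equal one of the listed strings (OR).
    """
    for key, allowed in filter_policy.items():
        attr = message_attributes.get(key)
        if attr is None or attr.get('StringValue') not in allowed:
            return False
    return True


policy = {'event': ['order.placed', 'order.cancelled'],
          'region': ['us-west', 'us-east']}
print(matches(policy, string_attributes(event='order.placed', region='us-east')))  # True
print(matches(policy, string_attributes(event='order.placed')))  # False: no region
```

The publisher would then call sns.publish(TopicArn=topic_arn, Message=..., MessageAttributes=string_attributes(event='order.placed', region='us-east')).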

SNS Message Batching

SNS supports message batching to lower costs and handle more throughput. The PublishBatch API allows up to 10 messages per batch, reducing the number of API calls.

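# Send a batch of messages (up to 10 per batch; Ids must be unique within a batch)
entries = [
    {'Id': '1', 'Message': json.dumps({'event': 'order.placed', 'order_id': '1001'})},
    {'Id': '2', 'Message': json.dumps({'event': 'order.placed', 'order_id': '1002'})},
    {'Id': '3', 'Message': json.dumps({'event': 'order.placed', 'order_id': '1003'})},
]
response = sns.publish_batch(TopicArn=topic_arn, PublishBatchRequestEntries=entries)

# A batch can partially fail; inspect 'Failed' and retry those entries
for failure in response.get('Failed', []):
    print(failure['Id'], failure['Code'])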

Batching reduces costs at scale: instead of 100 API calls for 100 messages, you make only 10 calls.
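When the message count is not a multiple of 10, the list has to be split across several batch calls. The following is a minimal sketch, assuming a hypothetical publish_batches generator. It yields PublishBatchRequestEntries lists of at most 10 entries, with Ids unique within each batch. It does not check the 256 KB aggregate payload limit per request, which a production version should enforce.

```python
import json
from typing import Iterator

MAX_BATCH = 10  # PublishBatch accepts at most 10 entries per call


def publish_batches(messages: list[dict]) -> Iterator[list[dict]]:
    """Yield PublishBatchRequestEntries lists of at most MAX_BATCH entries."""
    for start in range(0, len(messages), MAX_BATCH):
        chunk = messages[start:start + MAX_BATCH]
        # Ids only need to be unique within a single batch request
        yield [{'Id': str(i), 'Message': json.dumps(m)} for i, m in enumerate(chunk)]


orders = [{'event': 'order.placed', 'order_id': str(1000 + n)} for n in range(25)]
batches = list(publish_batches(orders))
print([len(b) for b in batches])  # [10, 10, 5]
```

Each yielded list would be passed to sns.publish_batch, and the Failed list in every response should be checked.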

SNS FIFO Topics

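SNS supports FIFO (First-In-First-Out) topics. They provide strict ordering within a message group and deduplicate messages published within a 5-minute window. FIFO topics deliver only to SQS queues, not to Lambda, HTTP, email, or SMS endpoints. They are designed for scenarios where message order matters, such as financial transactions or inventory updates.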

# Create FIFO topic
fifo_topic_arn = sns.create_topic(
    Name='order-events.fifo',
    Attributes={'FifoTopic': 'true', 'ContentBasedDeduplication': 'true'}
)['TopicArn']

# Publish with message group ID for ordering
sns.publish(
    TopicArn=fifo_topic_arn,
    Message=json.dumps({'event': 'order.placed', 'order_id': '12345'}),
    MessageGroupId='order-processing'  # Ensures ordering within group
)

| Feature | SNS Standard | SNS FIFO |
|---|---|---|
| Ordering | No guarantee | Per message group |
| Deduplication | None | 5-minute window |
| Throughput | Unlimited | 300 messages/sec per topic |
| Message group | N/A | Groups messages for ordering |
| Cost | Per message + delivery | Higher (per message) |

SNS FIFO is ideal when you need to ensure that messages for the same entity (same order, same user) are processed in order.
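The example above uses a single MessageGroupId ('order-processing'), which serializes every message on the topic. A per-entity group ID keeps ordering where it matters while letting different orders flow in parallel. The sketch below uses a hypothetical helper, fifo_publish_kwargs, and assumes every event carries an order_id key. It derives the group ID from order_id and sets an explicit MessageDeduplicationId from a SHA-256 hash of the canonical JSON body. An explicit ID takes precedence over content-based deduplication, and identical bodies published within the 5-minute window are dropped.

```python
import hashlib
import json


def fifo_publish_kwargs(topic_arn: str, event: dict) -> dict:
    """Build sns.publish() arguments for a FIFO topic (hypothetical helper)."""
    # sort_keys gives a canonical body, so equal events hash identically
    body = json.dumps(event, sort_keys=True)
    return {
        'TopicArn': topic_arn,
        'Message': body,
        # One group per order: ordered within an order, parallel across orders
        'MessageGroupId': f"order-{event['order_id']}",
        # 64 hex characters, within the 128-character limit
        'MessageDeduplicationId': hashlib.sha256(body.encode()).hexdigest(),
    }
```

A publisher would call sns.publish(**fifo_publish_kwargs(fifo_topic_arn, event)).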

Fan-Out to SQS

A common pattern is SNS fan-out to multiple SQS queues:

graph LR
    Publisher -->|publish| SNS[SNS Topic]
    SNS -->|deliver| Q1[SQS Queue: Analytics]
    SNS -->|deliver| Q2[SQS Queue: Notifications]
    SNS -->|deliver| Q3[SQS Queue: Audit]

This combines SNS’s pub/sub with SQS’s queuing. Each consumer gets its own queue, enabling independent processing, retries, and parallel consumption.
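For fan-out to work, each queue must allow the topic to send messages to it. Below is a minimal sketch of that queue (resource-based) policy, built by a hypothetical helper, allow_sns_to_send. The aws:SourceArn condition limits the grant to one topic rather than to the SNS service as a whole.

```python
import json


def allow_sns_to_send(queue_arn: str, topic_arn: str) -> str:
    """Queue policy letting one SNS topic deliver to this queue (hypothetical helper)."""
    policy = {
        'Version': '2012-10-17',
        'Statement': [{
            'Effect': 'Allow',
            'Principal': {'Service': 'sns.amazonaws.com'},
            'Action': 'sqs:SendMessage',
            'Resource': queue_arn,
            # Only this topic may send, not any topic using the SNS principal
            'Condition': {'ArnEquals': {'aws:SourceArn': topic_arn}},
        }],
    }
    return json.dumps(policy)
```

You would apply it with sqs.set_queue_attributes(QueueUrl=..., Attributes={'Policy': allow_sns_to_send(queue_arn, topic_arn)}). If the queue uses SSE-KMS with a customer managed key, the key policy must also let SNS use that key.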

SQS vs SNS: When to Use Which

| Aspect | SQS | SNS |
|---|---|---|
| Pattern | Point-to-point queue | Pub/sub |
| Delivery | Pull (consumers poll) | Push (to subscribers) |
| Multiple consumers | Each message processed by one consumer | Every subscriber receives a copy |
| Ordering | FIFO queues available | FIFO topics available |
| Throughput | Nearly unlimited (standard), 3,000 messages/s with batching (FIFO) | Nearly unlimited (standard) |
| Cost | Per API request | Per publish request + per delivery |

Use SQS when:

  • You need work distribution across consumers
  • Each message should be processed by only one consumer
  • You need buffering for burst traffic
  • You want visibility timeout and redrive capabilities

Use SNS when:

  • Multiple consumers need the same message
  • You want push-based delivery
  • You are broadcasting events to many subscribers
  • You want simple pub/sub without managing infrastructure

When Not to Use SQS

  • When you need push-based delivery: SQS is pull-based; consumers must poll
  • When you need strict ordering from a standard queue: Standard queues provide only best-effort ordering (use a FIFO queue instead)
  • When you need exactly-once delivery without deduplication logic: Standard queues deliver at-least-once
  • When multiple consumers each need every message: Each message is processed by only one consumer; fan out through SNS to one queue per consumer

When Not to Use SNS

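  • When you need durable message storage: Standard SNS topics do not store messages for later retrieval; a message that exhausts its delivery retries is dropped unless the subscription has a dead letter queue
  • When you need strict ordering on a standard topic: Standard topics do not guarantee ordering; FIFO topics order messages only within a message group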
  • When you need exactly-once without client deduplication: SNS delivers at-least-once
  • When you have many small subscribers: Each subscription incurs delivery costs

SNS vs EventBridge

EventBridge is a serverless event bus. Compared with SNS, it adds content-based routing rules, schema discovery, and SaaS integrations. The choice depends on your use case.

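| Aspect | SNS | EventBridge |
|---|---|---|
| Architecture | Pub/sub topic | Event bus with routing rules |
| Schema registry | None | Built-in schema registry and discovery |
| SaaS integrations | None | SaaS partner event sources |
| Event routing | Topic subscriptions with filter policies | Rules matching event patterns |
| Archive and replay | No (standard topics) | Yes (archives with configurable retention) |
| Cost | Per publish request + delivery | Per event published (AWS service events are free) |
| Dead letter handling | SQS DLQ per subscription | SQS DLQ per rule target |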

EventBridge excels when you have SaaS integrations (Salesforce, Zendesk, third-party webhooks) or need schema validation and discovery. SNS is simpler and cheaper for straightforward fan-out within AWS.

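# EventBridge rule-based routing example
import json

import boto3

events = boto3.client('events')

# Create a rule matching order events from a custom source
# ('com.example.orders' and 'OrderPlaced' are example values)
events.put_rule(
    Name='order-events',
    EventPattern=json.dumps({
        'source': ['com.example.orders'],
        'detail-type': ['OrderPlaced']
    }),
    State='ENABLED'
)

# Route matching events to several targets. The Lambda target needs a
# resource-based permission for events.amazonaws.com; the SQS target needs
# a queue policy allowing the rule to send.
events.put_targets(
    Rule='order-events',
    Targets=[
        {'Id': '1', 'Arn': 'lambda-arn'},
        {'Id': '2', 'Arn': 'sqs-arn'}
    ]
)

# Publish an event that matches the rule (default event bus)
events.put_events(Entries=[{
    'Source': 'com.example.orders',
    'DetailType': 'OrderPlaced',
    'Detail': json.dumps({'order_id': '12345'})
}])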

Within your application stack, SNS handles most messaging well. When you need event routing, schema management, or SaaS ingestion, EventBridge costs more but adds value.

Cost Optimization

SQS and SNS costs scale with API calls and message delivery. Optimizing both reduces your bill.

SQS Cost Factors

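SQS charges per API request. Each 64 KB chunk of a payload counts as one request, and the first 1 million requests per month are free. Approximate us-east-1 list prices:

| Request type | Standard queue | FIFO queue |
|---|---|---|
| All actions (Send, Receive, Delete, and others) | $0.40 per million | $0.50 per million |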

Reducing SQS Costs

Long polling is one of the most effective ways to reduce costs. Short polling (the default) returns immediately, and each request is billed whether or not it returns a message. Long polling holds the request open for up to 20 seconds until a message arrives, so a single billable request replaces many empty ones.

# Enable long polling to reduce costs
sqs.receive_message(
    QueueUrl=queue_url,
    MaxNumberOfMessages=10,  # up to 10 messages per billable request
    WaitTimeSeconds=20  # long polling: wait up to 20 s for a message
)

For high-throughput queues, batch actions cut the number of billable requests by up to 10x. ReceiveMessage returns up to 10 messages per call, and SendMessageBatch and DeleteMessageBatch handle up to 10 entries each.
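A back-of-the-envelope comparison makes the polling difference concrete. This sketch assumes a single consumer polling an empty queue for a 30-day month. The short-polling loop is assumed to call ReceiveMessage once per second, and the price is assumed to be $0.40 per million standard-queue requests. It ignores the free tier and network latency.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600
PRICE_PER_MILLION = 0.40  # assumed standard-queue list price, USD


def idle_receive_requests(poll_interval_s: float, wait_time_s: int) -> int:
    """ReceiveMessage calls per month for one consumer on an empty queue.

    Short polling (wait_time_s == 0) returns at once, so the loop polls every
    poll_interval_s; long polling holds each empty call open for wait_time_s.
    """
    seconds_per_call = wait_time_s if wait_time_s > 0 else poll_interval_s
    return int(SECONDS_PER_MONTH / seconds_per_call)


def monthly_cost(requests: int) -> float:
    return requests / 1_000_000 * PRICE_PER_MILLION


short = idle_receive_requests(poll_interval_s=1, wait_time_s=0)   # 2,592,000 calls
long_ = idle_receive_requests(poll_interval_s=1, wait_time_s=20)  # 129,600 calls
print(f"${monthly_cost(short):.2f} vs ${monthly_cost(long_):.2f}")  # $1.04 vs $0.05
```

The absolute amounts are small per consumer, but they scale linearly with the number of consumers and queues polling around the clock.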

SNS Cost Factors

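SNS charges per publish request plus per delivery, and delivery rates depend on the endpoint type. Approximate us-east-1 list prices for standard topics:

| Operation | Cost |
|---|---|
| Publish and other API requests | $0.50 per million (first 1 million per month free) |
| Delivery to SQS or Lambda | No SNS delivery charge (SQS and Lambda charges apply) |
| Delivery to HTTP/S | $0.60 per million |
| Delivery to mobile push | $0.50 per million |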

Reducing SNS Costs

Message batching with PublishBatch reduces publish costs. For delivery, use filter policies to avoid delivering messages to subscribers who will discard them.

# Batch publish to reduce costs
entries = [
    {'Id': str(i), 'Message': json.dumps({'event': f'event-{i}'})}
    for i in range(10)
]
sns.publish_batch(TopicArn=topic_arn, PublishBatchRequestEntries=entries)
# 10 messages for the price of 1 publish request, plus 10 deliveries per subscription

Cross-region SNS subscriptions add data transfer costs. Keep subscribers in the same region when possible.

AWS PrivateLink/VPC Endpoint Configuration

PrivateLink keeps SQS and SNS traffic within the AWS network, avoiding internet traversal and providing private connectivity from within a VPC.

SQS VPC Endpoints

# Create VPC endpoint for SQS
aws ec2 create-vpc-endpoint \
    --vpc-id vpc-012345678 \
    --service-name com.amazonaws.us-east-1.sqs \
    --vpc-endpoint-type Interface \
    --subnet-ids subnet-012345678 subnet-876543210 \
    --security-group-ids sg-012345678

Configure the security group to allow inbound HTTPS (port 443) from your application servers. Interface endpoints place an elastic network interface (ENI) in each subnet you specify.

SNS VPC Endpoints

# Create VPC endpoint for SNS
aws ec2 create-vpc-endpoint \
    --vpc-id vpc-012345678 \
    --service-name com.amazonaws.us-east-1.sns \
    --vpc-endpoint-type Interface \
    --subnet-ids subnet-012345678 subnet-876543210 \
    --security-group-ids sg-012345678

IAM Policies for VPC Endpoints

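An endpoint alone does not block access over the public internet. To restrict a queue to your VPC endpoint, add a Deny statement to the queue's resource-based policy. The statement rejects requests that do not arrive through that endpoint:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyAccessOutsideVpce",
      "Effect": "Deny",
      "Principal": "*",
      "Action": ["sqs:ReceiveMessage", "sqs:DeleteMessage"],
      "Resource": "arn:aws:sqs:us-east-1:123456789012:my-queue",
      "Condition": {
        "StringNotEquals": {
          "aws:sourceVpce": "vpce-012345678"
        }
      }
    }
  ]
}

A Deny overrides any Allow, so receive and delete calls that do not come through your VPC endpoint are rejected. Callers still need an Allow from their IAM policies. An Allow statement with the same condition would not be enough on its own, because identity policies could still grant access from elsewhere.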

SNS to Lambda Failure Modes

When SNS delivers to Lambda, failures can occur at several points. Understanding the failure modes helps you design DLQ strategies.

Lambda Invocation Failure Modes

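| Failure | What happens | Mitigation |
|---|---|---|
| Lambda throttling | The invocation is retried: by SNS if the Invoke call is rejected, otherwise by Lambda's asynchronous event queue | Increase concurrency; configure DLQs at both layers |
| Function timeout or unhandled error | Lambda retries the asynchronous event (up to 2 retries by default), then sends it to the on-failure destination or function DLQ | Set an appropriate timeout; configure an OnFailure destination |
| Invalid payload | The function raises while parsing the event; this is handled as a function error | Validate input; let poison messages reach the DLQ |
| Permission denied | SNS cannot invoke the function; it retries, then moves the message to the subscription DLQ if one is configured | Grant sns.amazonaws.com lambda:InvokeFunction in the function's resource-based policy |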

Configuring Lambda DLQ

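# SNS-to-Lambda: attach a subscription-level DLQ (must be an SQS queue whose
# policy allows SNS to send). SNS moves a message there when it cannot invoke
# the function at all, for example because of missing permissions, after its retries.
# (A custom DeliveryPolicy retry schedule applies only to HTTP/S subscriptions.)
sns.subscribe(
    TopicArn=topic_arn,
    Protocol='lambda',
    Endpoint=lambda_arn,
    Attributes={
        'RedrivePolicy': json.dumps({
            # subscription_dlq_arn: ARN of an SQS queue (hypothetical name)
            'deadLetterTargetArn': subscription_dlq_arn
        })
    }
)

# In Lambda: SNS invokes the function asynchronously. Once Lambda accepts
# the event, retries and failure routing are handled by Lambda, not SNS.
def handler(event, context):
    for record in event['Records']:
        message = json.loads(record['Sns']['Message'])
        process_event(message)
    # No try/except: an unhandled exception makes Lambda retry the event,
    # then send it to the function's on-failure destination or DLQ.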

Async Invocation Configuration

Lambda async invocation has its own retry behavior separate from SNS:

# Configure Lambda async settings via boto3
import boto3

lambda_client = boto3.client('lambda')

lambda_client.put_function_event_invoke_config(
    FunctionName='my-function',
    MaximumRetryAttempts=2,  # 0-2 retries after the first failed attempt
    MaximumEventAgeInSeconds=3600,  # discard events older than 1 hour
    DestinationConfig={
        'OnFailure': {
            'Destination': 'arn:aws:sqs:us-east-1:123456789012:my-dlq'
        }
    }
)

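SNS-to-Lambda involves two retry layers. SNS retries until Lambda accepts the asynchronous invocation, and Lambda then retries function errors according to its own settings. Configure a failure destination at each layer so messages are not silently lost. Make processing idempotent, because retries can cause the same message to be processed more than once.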
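Because both retry layers can deliver the same notification more than once, the handler should be idempotent. The sketch below uses a hypothetical IdempotentProcessor class and keys on the SNS MessageId, which stays the same when one notification is retried. The in-memory set only illustrates the logic. Lambda may run several execution environments concurrently, so a real deployment would need a shared store, such as a DynamoDB table written with a conditional put.

```python
import json


class IdempotentProcessor:
    """Skip SNS records whose MessageId was already processed (hypothetical).

    In-memory state is illustrative only; use a shared store in production.
    """

    def __init__(self, process):
        self.process = process
        self.seen: set[str] = set()

    def handle(self, event: dict) -> int:
        processed = 0
        for record in event['Records']:
            message_id = record['Sns']['MessageId']
            if message_id in self.seen:
                continue  # duplicate delivery or retry: already done
            self.process(json.loads(record['Sns']['Message']))
            self.seen.add(message_id)  # mark done only after success
            processed += 1
        return processed
```

Inside a Lambda function, you might create processor = IdempotentProcessor(process_event) at module level and call processor.handle(event) from the handler. An exception from process_event propagates without marking the message, so Lambda's retry can process it again.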

Combining SQS and SNS

The most common pattern is SNS fan-out to SQS queues:

  1. Publisher sends to SNS topic
  2. SNS pushes to multiple SQS queues (one per consumer group)
  3. Each consumer group processes from its own queue

This gives you:

  • SNS’s topic-based routing
  • SQS’s per-consumer queuing and retry
  • Independent scaling per consumer group
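
# SNS publishes to multiple SQS queues (configured via topic subscription)
sns.publish(TopicArn=topic_arn, Message=json.dumps(event))

# Each consumer group has its own queue. Unless raw message delivery is
# enabled on the subscription, each SQS body is an SNS JSON envelope whose
# 'Message' field holds the published payload.
def consume(queue_url, handle):
    """Hypothetical helper: poll one queue and pass each payload to handle()."""
    response = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20)
    for msg in response.get('Messages', []):
        envelope = json.loads(msg['Body'])
        handle(json.loads(envelope['Message']))
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg['ReceiptHandle'])

consume(analytics_queue_url, run_analytics)          # analytics consumer
consume(notifications_queue_url, send_notification)  # notifications consumer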
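By default, SQS consumers in a fan-out setup receive the SNS JSON envelope, with the published payload stored as a string in its Message field. Enabling RawMessageDelivery on the subscription delivers the payload as-is instead. Below is a minimal sketch of a hypothetical unwrap_sns_body helper that accepts either form, assuming payloads are JSON. It recognizes envelopes by Type == 'Notification', so a raw payload that happens to use those same keys would be misread.

```python
import json


def unwrap_sns_body(body: str) -> dict:
    """Return the publisher's payload from an SQS message body (hypothetical helper)."""
    parsed = json.loads(body)
    # SNS envelope: {"Type": "Notification", "Message": "<payload string>", ...}
    if isinstance(parsed, dict) and parsed.get('Type') == 'Notification' and 'Message' in parsed:
        return json.loads(parsed['Message'])
    # Raw message delivery: the body is already the payload
    return parsed
```

Using a helper like this lets a consumer keep working if raw delivery is turned on or off for its subscription.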

Comparison to Self-Managed Solutions

Managed services like SQS and SNS remove operational burden. You do not provision servers, manage replication, or tune performance. AWS handles availability and durability.

The tradeoff is vendor lock-in. Your code depends on AWS APIs, and migrating to another platform requires rewriting the messaging layer. Self-managed solutions (Kafka, RabbitMQ) give you portability but require operational investment.

For understanding messaging patterns that apply regardless of platform, see message queue types and pub/sub patterns.

Production Failure Scenarios

| Failure | Impact | Mitigation |
|---|---|---|
| SQS service disruption | Queue temporarily unavailable; sends and receives fail | SQS replicates across Availability Zones automatically; retry with exponential backoff |
| SNS service disruption | Publishes fail or deliveries are delayed | Retry Publish with exponential backoff; configure subscription dead letter queues |
| Consumer crash mid-processing | Message becomes visible again after the visibility timeout | Set the visibility timeout appropriately; implement idempotent processing |
| SNS subscription deleted | Messages silently stop reaching that subscriber | Alert on Unsubscribe calls in CloudTrail; manage subscriptions as infrastructure as code |
| SQS queue deletion | All messages in the queue are permanently lost | Restrict sqs:DeleteQueue with IAM; keep critical data in a durable store |
| Throughput limit exceeded | Requests throttled | Request a service quota increase; use exponential backoff |
| FIFO ordering violation | Messages processed out of order | Choose message group IDs per entity; do not process one group's messages in parallel inside the consumer |
| SNS filter policy misconfiguration | Subscribers receive no messages or the wrong ones | Test filter policies; monitor NumberOfNotificationsFilteredOut |

Observability Checklist

Metrics to Monitor

  • SQS queue depth: ApproximateNumberOfMessagesVisible
  • SQS oldest message age: ApproximateAgeOfOldestMessage (critical for backlog and latency)
  • SNS delivery rate: NumberOfMessagesPublished, NumberOfNotificationsDelivered
  • SNS delivery success/failure: NumberOfNotificationsFailed
  • SNS filter policy matches: NumberOfNotificationsFilteredOut
  • SQS long polling effectiveness: NumberOfEmptyReceives
  • FIFO group ordering lag: no built-in per-group metric; publish a custom metric per message group

Logs to Capture

  • SQS sendMessage, receiveMessage, and deleteMessage events
  • SNS publish and delivery events
  • SNS subscription creation and deletion
  • Redeliveries after visibility timeout expiry (SQS does not log these; record ApproximateReceiveCount in consumer logs)
  • Dead letter queue arrivals (via DLQ subscription)
  • CloudTrail API calls for administrative actions

Alerts to Configure

  • SQS queue depth exceeds threshold
  • Oldest message age exceeds SLA threshold
  • SNS delivery failure rate exceeds threshold
  • SNS filtered-out message rate is abnormal
  • SQS long polling not effective (high NumberOfEmptyReceives)
  • FIFO message group lag for critical groups
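As one example of these alerts, the sketch below builds arguments for CloudWatch's put_metric_alarm on ApproximateAgeOfOldestMessage. The helper name, the 60-second period, the five evaluation periods, and the alarm topic are illustrative assumptions.

```python
def oldest_message_alarm(queue_name: str, sla_seconds: int, alarm_topic_arn: str) -> dict:
    """Arguments for cloudwatch.put_metric_alarm() (hypothetical helper)."""
    return {
        'AlarmName': f'{queue_name}-oldest-message-age',
        'Namespace': 'AWS/SQS',
        'MetricName': 'ApproximateAgeOfOldestMessage',
        'Dimensions': [{'Name': 'QueueName', 'Value': queue_name}],
        'Statistic': 'Maximum',
        'Period': 60,  # SQS publishes metrics at one-minute intervals
        'EvaluationPeriods': 5,  # breach must persist for 5 consecutive periods
        'Threshold': float(sla_seconds),
        'ComparisonOperator': 'GreaterThanThreshold',
        'AlarmActions': [alarm_topic_arn],  # e.g. an SNS topic that pages on-call
    }
```

It would be used as boto3.client('cloudwatch').put_metric_alarm(**oldest_message_alarm('tasks', 300, alarm_topic_arn)).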

Security Checklist

  • Authentication: Use IAM roles for AWS SDK clients; avoid long-term access keys
  • Authorization: Use IAM policies for SQS/SNS access; principle of least privilege
  • Encryption in transit: SQS and SNS endpoints use HTTPS; deny non-TLS requests with an aws:SecureTransport condition and use VPC endpoints for private access
  • Encryption at rest: Enable server-side encryption for SQS queues (SSE-SQS or SSE-KMS) and SNS topics (SSE-KMS)
  • VPC endpoints: Use AWS PrivateLink to keep traffic within the AWS network
  • Message content: Do not send sensitive data unencrypted; use SNS server-side encryption or application-level encryption
  • Cross-account access: Use resource policies for cross-account SNS subscriptions
  • Audit logging: Enable CloudTrail for all SQS and SNS API operations
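For the transit-encryption item, a common approach is a Deny statement keyed on aws:SecureTransport. Below is a minimal sketch using hypothetical helpers, deny_insecure_transport and policy_document. In practice you would merge the statement with the queue's existing statements rather than replace them.

```python
import json


def deny_insecure_transport(queue_arn: str) -> dict:
    """Queue policy statement rejecting any request not sent over TLS."""
    return {
        'Sid': 'DenyInsecureTransport',
        'Effect': 'Deny',
        'Principal': '*',
        'Action': 'sqs:*',
        'Resource': queue_arn,
        # aws:SecureTransport is 'false' for plain-HTTP requests
        'Condition': {'Bool': {'aws:SecureTransport': 'false'}},
    }


def policy_document(*statements: dict) -> str:
    """Wrap statements into a policy JSON string for the queue's Policy attribute."""
    return json.dumps({'Version': '2012-10-17', 'Statement': list(statements)})
```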

Common Pitfalls / Anti-Patterns

Pitfall 1: Not Setting Visibility Timeout Correctly

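If the visibility timeout is too short, messages become visible again and are reprocessed before the first consumer finishes. If it is too long, a message whose consumer crashed stays hidden for the full timeout before it can be retried. In a FIFO queue, that hidden message also holds up the rest of its message group. Set the timeout based on expected processing time plus a buffer.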
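A sketch of that sizing rule, assuming you know the 99th-percentile processing time. The 1.5x factor and 10-second buffer are arbitrary example values, not AWS guidance. The 12-hour cap is the SQS maximum. The six-times rule for queues that feed a Lambda event source mapping follows AWS's documented recommendation.

```python
import math

MAX_VISIBILITY_TIMEOUT_S = 12 * 60 * 60  # SQS upper limit: 12 hours


def visibility_timeout(p99_processing_s: float, safety_factor: float = 1.5,
                       buffer_s: int = 10) -> int:
    """Heuristic (assumed values): cover slow-but-healthy runs, capped at the SQS limit."""
    value = math.ceil(p99_processing_s * safety_factor) + buffer_s
    return min(value, MAX_VISIBILITY_TIMEOUT_S)


def lambda_source_visibility_timeout(function_timeout_s: int) -> int:
    """For a queue feeding a Lambda event source mapping: 6x the function timeout."""
    return min(6 * function_timeout_s, MAX_VISIBILITY_TIMEOUT_S)


print(visibility_timeout(20))                # 40
print(lambda_source_visibility_timeout(30))  # 180
```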

Pitfall 2: Using Standard Queues When FIFO Is Needed

Standard queues offer best-effort ordering. If your business requires ordering, use FIFO queues with message group IDs.

Pitfall 3: Not Polling Efficiently

Short polling (default) wastes API calls. Use long polling (WaitTimeSeconds > 0) to reduce costs and latency.

Pitfall 4: Forgetting to Delete Messages After Processing

SQS does not auto-delete. Always call DeleteMessage after successful processing or messages will be reprocessed.

Pitfall 5: Mixing Message Types in One Queue

Different consumers processing different message types in one queue leads to coupling and processing errors. Use separate queues per message type.

Pitfall 6: Not Handling SNS Delivery Failures

If a Lambda subscriber throws an error or an HTTP endpoint is unreachable, SNS retries. But without a DLQ, failed messages are lost after retries. Always configure dead letter queues for failed deliveries.

Quick Recap

Key Points

  • SQS is pull-based point-to-point queuing; SNS is push-based pub/sub
  • Standard queues offer nearly unlimited throughput with at-least-once delivery; FIFO queues offer exactly-once processing with per-group ordering
  • SNS fan-out to multiple SQS queues combines pub/sub flexibility with queuing durability
  • Visibility timeout controls when messages become visible again if they are not deleted
  • Dead letter queues capture failed messages for debugging
  • Long polling reduces empty responses and costs
  • Server-side encryption with KMS protects messages at rest

Pre-Deployment Checklist

- [ ] SQS visibility timeout set based on expected processing time
- [ ] SQS long polling enabled (WaitTimeSeconds > 0)
- [ ] Dead letter queue configured for failed message handling
- [ ] FIFO queues used when ordering is required
- [ ] Message group IDs set correctly for FIFO ordering
- [ ] Idempotent message processing implemented
- [ ] SSE-KMS encryption enabled for SQS queues
- [ ] IAM policies scoped to minimum required permissions
- [ ] VPC endpoints configured for private network access
- [ ] CloudWatch alarms set for queue depth and message age
- [ ] SNS filter policies tested before deployment
- [ ] SNS dead letter queue configured for failed deliveries
- [ ] DeleteMessage called after successful processing
- [ ] SNS subscription permissions reviewed for cross-account access

Conclusion

SQS and SNS are complementary. SQS handles work queues with pull-based consumption. SNS handles broadcast notifications with push-based delivery. Together, they cover most asynchronous communication needs in AWS.

FIFO queues give you ordering guarantees when needed. Message filtering reduces unnecessary processing. Dead letter queues handle failures. The managed nature means you focus on application logic rather than infrastructure.

If you are already on AWS, these services work well with Lambda, EC2, and other AWS compute options.
