AWS Core Services for DevOps: EC2, ECS, EKS, S3, Lambda
Navigate essential AWS services for DevOps workloads—compute (EC2, ECS, EKS), storage (S3), serverless (Lambda), and foundational networking.
AWS forms the backbone of many enterprise cloud strategies. Understanding its core services means understanding how compute, storage, networking, and serverless components fit together. This post covers the essential services for deploying and operating applications on AWS.
The services covered here appear in virtually every AWS architecture. Even if you plan to use higher-level services, knowing how the underlying components work helps you make better architectural decisions and debug problems when they arise.
Introduction
EC2 vs. ECS vs. EKS vs. Lambda
Choose EC2 when you need full control over the operating system, require specific hardware configurations, or run legacy applications that cannot be containerized. EC2 gives you the most flexibility at the cost of the most operational overhead.
Choose ECS when you want container orchestration without the complexity of Kubernetes. ECS integrates tightly with AWS services like ALB, CloudWatch, and IAM, making it a natural fit for teams already invested in AWS. Use Fargate launch type when you want serverless containers—AWS manages the EC2 fleet for you.
Choose EKS when your team knows Kubernetes and wants portability across cloud providers, or when you need Kubernetes-specific features like custom controllers, complex pod scheduling, or a broad ecosystem of third-party tools.
Choose Lambda when your workload is event-driven, short-running, or bursty. Lambda handles scaling automatically and charges only for execution time. Each invocation is capped at 15 minutes, so work that runs continuously for hours belongs on EC2 or ECS, which is also likely cheaper at sustained load.
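The four rules above can be sketched as a small decision helper. This is a heuristic illustration only: the field names (`needsOsControl`, `needsKubernetes`, `eventDriven`, `maxRuntimeSeconds`, `containerized`) and the order of the checks are assumptions, not an AWS-defined procedure.

```javascript
// Lambda's per-invocation limit (15 minutes).
const LAMBDA_MAX_SECONDS = 900;

// Returns a suggested compute service for a workload description.
// All fields are hypothetical inputs chosen for this sketch.
function chooseCompute({
  needsOsControl = false,
  needsKubernetes = false,
  eventDriven = false,
  maxRuntimeSeconds = Infinity,
  containerized = false,
} = {}) {
  if (needsOsControl) return "EC2"; // full OS or hardware control
  if (needsKubernetes) return "EKS"; // Kubernetes-specific features
  if (eventDriven && maxRuntimeSeconds <= LAMBDA_MAX_SECONDS) return "Lambda";
  return containerized ? "ECS" : "EC2"; // containers without Kubernetes, else VMs
}
```

For example, `chooseCompute({ eventDriven: true, maxRuntimeSeconds: 30 })` suggests Lambda, while the same workload with a one-hour runtime and `containerized: true` falls through to ECS.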
S3 Storage Class Selection
Use S3 Standard for frequently accessed data—hot storage for active workloads. Use S3 IA (Infrequent Access) for data that is accessed less than once per month but needs rapid access when needed. Use S3 Glacier for archival data that you need to retain but rarely access, with retrieval times of minutes to hours depending on the tier.
AWS Multi-Account Architecture
AWS resources are deployed to specific geographic regions, and regions are independent of each other. Each region has multiple availability zones (AZs), each made up of one or more physically separate data centers with independent power, networking, and cooling. Deploying across multiple AZs protects against the failure of a single AZ.
# List available regions
aws ec2 describe-regions --output table
# Get current region
aws configure get region
Account structure shapes your AWS environment. Organizations use consolidated billing to manage multiple accounts under a single payer. Common patterns include separate accounts per environment (dev, staging, production), per team, or per application domain.
Organization
├── Management Account (billing, SCPs)
├── Security Account (GuardDuty, Security Hub)
├── Dev Account
├── Staging Account
└── Production Account
Service Control Policies (SCPs) at the organization level restrict what can be done in member accounts. This enforces guardrails without managing IAM in every account.
flowchart TD
A[AWS Organization] --> B[Management Account]
A --> C[Security Account]
A --> D[Dev Account]
A --> E[Staging Account]
A --> F[Production Account]
C --> G[GuardDuty]
C --> H[Security Hub]
D --> I[Dev VPC]
E --> J[Staging VPC]
F --> K[Production VPC]
K --> L[ALB]
L --> M[EKS Cluster]
L --> N[ECS Tasks]
N --> O[S3 Artifacts]
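As an illustration of an SCP guardrail, the sketch below builds a policy document that denies requests outside an approved set of regions. The statement ID, the region list, and the set of exempted global services are assumptions made for this example; adjust them to your organization before use.

```javascript
// Builds an SCP document that denies API calls outside the allowed regions.
// The NotAction list exempts a few global services (an illustrative choice).
function buildRegionGuardrail(allowedRegions) {
  if (!Array.isArray(allowedRegions) || allowedRegions.length === 0) {
    throw new Error("allowedRegions must be a non-empty array");
  }
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Sid: "DenyOutsideApprovedRegions",
        Effect: "Deny",
        NotAction: ["iam:*", "organizations:*", "sts:*", "support:*"],
        Resource: "*",
        Condition: {
          // aws:RequestedRegion is the region the API call targets.
          StringNotEquals: { "aws:RequestedRegion": allowedRegions },
        },
      },
    ],
  };
}
```

Serializing the result with `JSON.stringify(buildRegionGuardrail(["us-east-1"]), null, 2)` gives a document you could attach from the management account, for instance through `aws organizations create-policy --type SERVICE_CONTROL_POLICY`.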
EC2 Instance Types and ASGs
EC2 provides virtual machines in the cloud. Instance types determine the CPU, memory, storage, and networking capacity. The naming pattern is family, generation, and size—for example, t3.micro is a burstable general purpose instance, third generation, micro size.
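The naming pattern can be made concrete with a small parser. This is a sketch that covers common names such as `t3.micro` and `m6i.32xlarge`; the function name and the returned field names are assumptions, and unusual names (for example those containing a hyphen) simply return `null`.

```javascript
// Splits an instance type into family letters, generation number,
// optional attribute letters (such as "i" or "gd"), and size.
function parseInstanceType(instanceType) {
  const match = /^([a-z]+)(\d+)([a-z]*)\.([a-z0-9]+)$/.exec(instanceType);
  if (match === null) return null; // name does not follow the simple pattern
  const [, family, generation, attributes, size] = match;
  return { family, generation: Number(generation), attributes, size };
}
```

Calling `parseInstanceType("m6i.32xlarge")` yields family `m`, generation `6`, attributes `i`, and size `32xlarge`.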
# Launch an EC2 instance
aws ec2 run-instances \
--image-id ami-0c55b159cbfafe1f0 \
--instance-type t3.micro \
--key-name my-key-pair \
--security-group-ids sg-0123456789abcdef0 \
--subnet-id subnet-0123456789abcdef0
# Describe instance status
aws ec2 describe-instance-status --instance-ids i-0abcdef1234567890
Auto Scaling Groups (ASGs) automatically adjust capacity based on demand. You define minimum, maximum, and desired capacity, along with scaling policies that trigger adjustments based on metrics like CPU utilization or request count.
# ASG CloudFormation snippet
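AutoScalingGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    MinSize: 2
    MaxSize: 10
    DesiredCapacity: 2
    VPCZoneIdentifier:
      - !Ref PrivateSubnet1
      - !Ref PrivateSubnet2
    # Launch templates replace the deprecated launch configurations;
    # "LaunchTemplate" is assumed to be defined elsewhere in the stack
    LaunchTemplate:
      LaunchTemplateId: !Ref LaunchTemplate
      Version: !GetAtt LaunchTemplate.LatestVersionNumber
    TargetGroupARNs:
      - !Ref TargetGroup
    HealthCheckType: ELB
    HealthCheckGracePeriod: 300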
ASGs work with Elastic Load Balancers to distribute traffic across healthy instances. The load balancer performs health checks and removes unhealthy instances from the rotation automatically.
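To build intuition for how a target tracking policy moves capacity, the sketch below scales the current instance count in proportion to how far the metric sits from its target, then clamps the result to the group's bounds. This is a simplified approximation for reasoning about capacity, not the exact algorithm the Auto Scaling service runs; the function and parameter names are assumptions.

```javascript
// Proportional estimate of the capacity a target tracking policy aims for.
function desiredCapacity({ current, metricValue, targetValue, minSize, maxSize }) {
  if (targetValue <= 0) throw new Error("targetValue must be positive");
  // Example: 4 instances at 90% CPU with a 60% target need about 6 instances.
  const proportional = Math.ceil(current * (metricValue / targetValue));
  // Never go below MinSize or above MaxSize.
  return Math.min(maxSize, Math.max(minSize, proportional));
}
```

With the group above (minimum 2, maximum 10), four instances at 90% CPU against a 60% target would grow toward six, and a quiet period never shrinks the group below two.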
ECS Task Definitions and Services
Amazon Elastic Container Service (ECS) manages Docker containers on a cluster of EC2 instances or on AWS Fargate serverless compute. Task definitions describe which containers to run and how many resources they need.
{
  "family": "webapp",
  "containerDefinitions": [
    {
      "name": "webapp",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/webapp:latest",
      "memory": 512,
      "cpu": 256,
      "essential": true,
      "portMappings": [
        {
          "containerPort": 8080,
          "protocol": "tcp"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/webapp",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ]
}
An ECS service maintains a desired count of task instances and automatically replaces failed tasks. It integrates with Application Load Balancers for traffic distribution and Auto Scaling for dynamic capacity adjustment.
# Register a new task definition revision
aws ecs register-task-definition --cli-input-json file://task-definition.json
# Update service to use new revision
aws ecs update-service \
--cluster production \
--service webapp \
--task-definition webapp:2
Fargate removes the need to manage EC2 instances for container workloads. You specify CPU and memory requirements, and AWS handles the underlying infrastructure. This simplifies operations at the cost of less granular control over the compute environment.
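Fargate only accepts certain CPU and memory pairings, so it can help to validate a task size before registering it. The sketch below encodes the commonly documented combinations up to 4 vCPU (CPU in units, memory in MiB); larger sizes are omitted, and the table should be checked against the current Fargate documentation before you rely on it.

```javascript
// Inclusive numeric range helper.
function range(min, max, step) {
  const values = [];
  for (let v = min; v <= max; v += step) values.push(v);
  return values;
}

// CPU units -> allowed memory values in MiB (sizes above 4 vCPU omitted).
const FARGATE_MEMORY_BY_CPU = {
  256: [512, 1024, 2048],
  512: range(1024, 4096, 1024),
  1024: range(2048, 8192, 1024),
  2048: range(4096, 16384, 1024),
  4096: range(8192, 30720, 1024),
};

function isValidFargateSize(cpu, memory) {
  const allowed = FARGATE_MEMORY_BY_CPU[cpu];
  return Array.isArray(allowed) && allowed.includes(memory);
}
```

The `webapp` task definition shown earlier (256 CPU units, 512 MiB) passes this check; a pairing such as 256 CPU units with 1536 MiB does not.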
EKS Cluster Management Basics
Amazon Elastic Kubernetes Service (EKS) provides a managed Kubernetes control plane. AWS operates the control plane nodes; you manage the worker nodes and workloads.
# Create an EKS cluster
aws eks create-cluster \
--name production \
--role-arn arn:aws:iam::123456789012:role/eks-cluster-role \
--resources-vpc-config subnetIds=subnet-0123456789abcdef0,subnet-0123456789abcdef1,securityGroupIds=sg-0123456789abcdef0
# Update kubeconfig
aws eks update-kubeconfig --name production
# Verify cluster access
kubectl get svc
EKS manages the Kubernetes control plane across multiple AZs for high availability. Worker nodes join the cluster via a node group, which can be managed by AWS (EKS Managed Node Groups) or self-managed.
# Node group configuration
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: production
  region: us-east-1
managedNodeGroups:
  - name: compute
    instanceType: t3.medium
    desiredCapacity: 3
    minSize: 2
    maxSize: 10
    volumeSize: 50
    ssh:
      allow: true
Kubernetes deployments, services, and ingresses work the same on EKS as on any Kubernetes cluster. The main difference is how you configure IAM roles for service accounts (IRSA) for workload authentication to AWS services.
S3 for Artifact Storage
Amazon S3 stores objects in buckets. For DevOps, S3 typically holds build artifacts, deployment packages, and infrastructure state. S3 integrates with everything on AWS through IAM policies and resource-based bucket policies.
# Create a bucket for artifacts
aws s3 mb s3://my-app-artifacts --region us-east-1
# Upload a build artifact
aws s3 cp ./dist/app.tar.gz s3://my-app-artifacts/prod/
# List bucket contents
aws s3 ls s3://my-app-artifacts/prod/
# Enable versioning for artifact history
aws s3api put-bucket-versioning \
--bucket my-app-artifacts \
--versioning-configuration Status=Enabled
Lifecycle policies automate archival and deletion. Move old artifacts to cheaper storage classes automatically, or delete artifacts older than a retention period.
{
  "Rules": [
    {
      "ID": "ArchiveOldArtifacts",
      "Status": "Enabled",
      "Filter": {
        "Prefix": "prod/"
      },
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "GLACIER"
        }
      ],
      "Expiration": {
        "Days": 365
      }
    }
  ]
}
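To reason about what a rule like this does to an artifact over time, the sketch below evaluates a single rule against an object's age. It assumes the object was uploaded to S3 Standard and ignores prefix filters and versioning; S3 also applies lifecycle actions asynchronously, so real transitions can lag the configured day. The function name is an assumption.

```javascript
// Returns the storage class (or "EXPIRED") a rule implies for an object
// of the given age in days.
function storageStateAt(ageDays, rule) {
  if (rule.Expiration && ageDays >= rule.Expiration.Days) return "EXPIRED";
  // Pick the latest transition whose day threshold has been reached.
  const applied = (rule.Transitions || [])
    .filter((transition) => ageDays >= transition.Days)
    .sort((a, b) => b.Days - a.Days)[0];
  return applied ? applied.StorageClass : "STANDARD";
}
```

Against the rule above, an artifact stays in Standard until day 30, sits in Glacier from day 30, and is deleted at day 365.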
Lambda for Serverless Workloads
AWS Lambda runs code in response to events without provisioning servers. You pay only for the compute time consumed—billed in milliseconds. Lambda is ideal for event-driven tasks, API backends, and background processing.
// Lambda handler for processing S3 uploads
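exports.handler = async (event) => {
  const s3Event = event.Records[0].s3;
  const bucket = s3Event.bucket.name;
  // S3 delivers URL-encoded keys, with spaces encoded as "+"
  const key = decodeURIComponent(s3Event.object.key.replace(/\+/g, " "));
  console.log(`Processing file: ${bucket}/${key}`);
  // Process the file...
  const result = await processUpload(bucket, key);
  return {
    statusCode: 200,
    body: JSON.stringify({ result }),
  };
};
// Hypothetical stub so the example runs; replace with real processing logic
async function processUpload(bucket, key) {
  return { bucket, key, status: "processed" };
}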
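By default, Lambda functions run outside your VPC, on a network managed by the Lambda service, with access to public AWS service endpoints and the internet. To reach VPC resources like RDS databases, attach the function to VPC subnets and security groups. A VPC-attached function has no direct internet access unless its subnets route through a NAT gateway.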
# Create a Lambda function
aws lambda create-function \
--function-name my-processor \
--runtime nodejs20.x \
--role arn:aws:iam::123456789012:role/lambda-execution-role \
--handler index.handler \
--zip-file fileb://function.zip \
--vpc-config SubnetIds=subnet-0123456789abcdef0,SecurityGroupIds=sg-0123456789abcdef0
For more on managing AWS costs, see our post on Cost Optimization which covers EC2, Lambda, and S3 cost optimization strategies.
For more on securing AWS workloads, see Cloud Security for IAM best practices, VPC design, and encryption patterns, and Network Security for security groups, NACLs, and VPC endpoint configuration.
Trade-off Analysis
| Scenario | EC2 | ECS/Fargate | EKS | Lambda |
|---|---|---|---|---|
| Full OS control needed | Yes | EC2 launch type only | Self-managed nodes only | No |
| Serverless containers | No | Fargate launch type | Fargate profiles | No |
| Kubernetes ecosystem | No | No | Yes | No |
| Pay-per-second billing | Yes (60-second minimum) | Yes (Fargate) | Nodes per second; control plane hourly | Yes (1 ms granularity) |
| Cold start latency | None once running | Seconds to minutes | Seconds to minutes | Milliseconds to seconds |
| Long-running workloads | Best choice | Good | Good | Poor (15 min max) |
| Stateful workloads | Best choice | Limited | Good | No |
Production Failure Scenarios
| Failure | Impact | Mitigation |
|---|---|---|
| ASG fails to scale due to ELB health check misconfiguration | Traffic routed to unhealthy instances, requests fail | Use ELB health check type, test scale-in manually |
| ECS task stuck in PENDING due to insufficient resources | Service capacity drops, requests queued or dropped | Set task completion timeouts, monitor pending count |
| EKS node group upgrade fails midway | Pods evicted before new nodes ready, service disruption | Limit max unavailable nodes in the update config, set PodDisruptionBudgets, upgrade one node at a time |
| S3 bucket policy denies access unexpectedly | Application cannot read/write artifacts, deployments fail | Use IAM Access Analyzer, test bucket policies in dev first |
| Lambda VPC config causes cold start timeouts | Requests time out during scale-up | Initialize connections outside the handler, use provisioned concurrency |
AWS Observability Hooks
EC2 and ASG monitoring:
# Get EC2 instance metrics
aws cloudwatch get-metric-statistics \
--namespace AWS/EC2 \
--metric-name CPUUtilization \
--dimensions Name=InstanceId,Value=i-0abcdef1234567890 \
--start-time 2026-03-24T00:00:00 \
--end-time 2026-03-25T00:00:00 \
--period 3600 \
--statistics Average
# Check ASG health status
aws autoscaling describe-auto-scaling-groups \
--auto-scaling-group-names my-asg \
--query 'AutoScalingGroups[0].Instances[*].[InstanceId,HealthStatus,LifecycleState]'
ECS monitoring:
# Check service health and running task count
aws ecs describe-services \
--cluster production \
--services webapp \
--query 'services[0].{runningCount:runningCount,desiredCount:desiredCount,pendingCount:pendingCount}'
EKS monitoring:
# Check node health and pod distribution
kubectl get nodes -o wide
kubectl get pods -o wide --all-namespaces | grep -v Running
# Get cluster control plane health
aws eks describe-cluster \
--name production \
--query 'cluster.{status:status,version:version,endpoint:endpoint}'
Key CloudWatch metrics to alert on:
| Service | Metric | Alert Threshold |
|---|---|---|
| EC2 | CPUUtilization | > 80% for 5 minutes |
| EC2 | StatusCheckFailed | any for 2 minutes |
| ASG | CPUUtilization | > 75% for 3 minutes |
| ECS | CPUUtilization | > 85% for 3 minutes |
| ECS | RunningTaskCount | < desired for 2 minutes |
| Lambda | Errors | > 0 for 5 minutes |
| Lambda | Duration | > 3000ms p99 |
| S3 | BucketSizeBytes | unexpected change |
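Thresholds such as "> 80% for 5 minutes" mean the metric must breach for several consecutive evaluation periods. The sketch below shows that logic on a plain array of datapoints (one value per period, oldest first). It is a simplification: CloudWatch also supports "M out of N" evaluation and configurable handling of missing data, which are left out here, and the function name is an assumption.

```javascript
// True when the most recent `periods` datapoints all exceed the threshold.
function isAlarming(datapoints, threshold, periods) {
  if (datapoints.length < periods) return false; // not enough data yet
  return datapoints.slice(-periods).every((value) => value > threshold);
}
```

With one-minute datapoints, `isAlarming(cpuSamples, 80, 5)` corresponds to the EC2 CPU row in the table; a single dip below the threshold resets the condition.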
Common Pitfalls / Anti-Patterns
Using the default VPC. Every account gets its own default VPC in each region, and its subnets are public: instances launched there receive public IP addresses automatically. Production workloads should use dedicated VPCs with explicit networking controls.
Attaching IAM policies directly to users instead of roles. Direct user policies create credential management nightmares and are harder to audit. Always use IAM roles for EC2, Lambda, and other compute services.
Not configuring ASG health checks properly. If your health check is too lenient, unhealthy instances stay in service. If it is too strict, instances get replaced during legitimate load spikes. Match health check type to your application needs.
Storing secrets in S3 object metadata or user data. S3 object metadata is not encrypted by default and appears in CloudTrail logs. Use AWS Secrets Manager or Systems Manager Parameter Store instead.
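Using Lambda VPC config without considering cold starts. VPC-attached Lambda functions depend on an elastic network interface (ENI). Before AWS changed Lambda's VPC networking in 2019, the ENI was attached during the cold start and could add around 10 seconds; it is now created when the function is configured, so the penalty is far smaller, but the function also loses direct internet access. For latency-sensitive APIs, initialize connections outside the handler, use provisioned concurrency, and attach a VPC only when the function needs private resources.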
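One common way to soften cold starts is to create expensive clients once, in module scope, so warm invocations reuse them. The sketch below shows the pattern with a stand-in `openConnection` function; it and the `connectionsOpened` counter are hypothetical placeholders for a real database or HTTP client.

```javascript
let connectionsOpened = 0;

// Hypothetical stand-in for a real client factory (database, cache, HTTP).
function openConnection() {
  connectionsOpened += 1;
  return { query: (sql) => `result of ${sql}` };
}

// Module scope survives between warm invocations of the same environment.
let cachedConnection = null;

function getConnection() {
  if (cachedConnection === null) cachedConnection = openConnection();
  return cachedConnection;
}

// The handler reuses the cached connection instead of opening a new one.
const handler = async (event) => {
  const connection = getConnection();
  return connection.query(event.sql);
};
```

Only the first invocation in an execution environment pays the connection cost. In a real function you would export `handler` and add handling for connections that have gone stale between invocations.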
Capacity Estimation and Benchmark Data
Use these numbers for initial capacity planning. Actual performance varies by workload characteristics.
EC2 Instance Type Families
| Family | Best For | Instance Types | Network Performance |
|---|---|---|---|
| t3 | Burstable workloads, dev/test | t3.nano → t3.2xlarge | Up to 5 Gbps |
| m5 | General purpose, web servers | m5.large → m5.24xlarge | Up to 25 Gbps |
| m6i | General purpose (newer generation) | m6i.large → m6i.32xlarge | Up to 50 Gbps |
| c5 | Compute optimized, batch processing | c5.large → c5.24xlarge | Up to 25 Gbps |
| c6i | Compute optimized (newer generation) | c6i.large → c6i.32xlarge | Up to 50 Gbps |
| r5 | Memory optimized, databases | r5.large → r5.24xlarge | Up to 25 Gbps |
| r6i | Memory optimized (newer generation) | r6i.large → r6i.32xlarge | Up to 50 Gbps |
Lambda Performance Parameters
| Parameter | Value | Notes |
|---|---|---|
| Cold start (no VPC) | 100-500ms | Depends on runtime and package size |
| Cold start (with VPC) | Close to non-VPC since 2019 (previously 1-10 seconds) | ENIs are now created at configuration time, not per cold start |
| Provisioned concurrency | ~50ms | Eliminates cold starts for warmed functions |
| Max execution duration | 900 seconds (15 min) | Configure timeout based on workload |
| Default memory | 128 MB | Increase memory to boost CPU proportionally |
| Max concurrent executions | 1,000 per region | Request limit increase if needed |
S3 Performance Targets
| Metric | Value |
|---|---|
| PUT/LIST/DELETE latency | 100-200ms (p99) |
| GET latency | 60-100ms (p99) |
| Max request rate per prefix | 3,500 PUT/COPY/POST/DELETE, 5,500 GET/HEAD per second |
| Multi-object delete | Up to 1,000 keys per request |
| Transfer acceleration | Roughly 20-30% faster uploads for distant regions (varies by network path) |
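Because the request-rate figures apply per prefix, a quick calculation shows how many prefixes a key layout needs for a target load. The sketch below uses the 5,500 read and 3,500 write figures from the table; the function name and input shape are assumptions, and S3 partitions prefixes gradually, so treat the result as a planning estimate.

```javascript
// Per-prefix request rates taken from the table above (requests per second).
const S3_LIMITS_PER_PREFIX = { read: 5500, write: 3500 };

// Minimum number of key prefixes needed to spread the given load.
function prefixesNeeded({ readsPerSecond = 0, writesPerSecond = 0 } = {}) {
  const forReads = Math.ceil(readsPerSecond / S3_LIMITS_PER_PREFIX.read);
  const forWrites = Math.ceil(writesPerSecond / S3_LIMITS_PER_PREFIX.write);
  return Math.max(1, forReads, forWrites);
}
```

A workload with 20,000 reads and 7,000 writes per second would need its keys spread across at least four prefixes.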
Service Limits for Planning
| Service | Default Limit | Typical Increase Request |
|---|---|---|
| Lambda concurrent executions | 1,000 per region | AWS Support |
| API Gateway requests per second | 10,000 per region | AWS Support |
| EBS volumes per account | 5,000 | AWS Support |
| VPCs per region | 5 | AWS Support |
| ENIs per instance (varies by type) | 3-15 | Instance type dependent |
Additional References
- Cost Optimization - EC2, Lambda, and S3 cost optimization strategies
- Cloud Security - IAM best practices, VPC design, and encryption patterns
- Network Security - Security groups, NACLs, and VPC endpoint configuration
- AWS Whitepapers - Official architecture guides
- AWS Well-Architected Framework - Best practices for cloud workloads
Interview Questions
What to cover:
- ASGs respond to CloudWatch metrics: CPU utilization, request count, and custom metrics such as memory (published by the CloudWatch agent)
- Define min/max/desired capacity; ASG adjusts between min and max based on policies
- Scaling policies: step scaling (add/remove instances in steps), target tracking (keep metric at value)
- Health checks: ELB health checks mark unhealthy instances; ASG replaces them
- Cooldown period prevents flapping; wait period before next scaling action
What to cover:
- EC2 launch type: you manage the EC2 fleet; more control over instance type and cost
- Fargate: serverless; AWS manages the underlying nodes; you pay per task resource
- Fargate removes SSH access and node-level customization
- Fargate good for variable workloads; EC2 better for consistent high-throughput with reserved instances
- Both use same task definitions and service scheduler; migration is straightforward
What to cover:
- Associate an IAM OIDC identity provider with the EKS cluster
- Create an IAM role whose trust policy allows that provider to assume it for one specific service account (or let `eksctl create iamserviceaccount` create both)
- Annotate the Kubernetes service account with the IAM role ARN
- Pod automatically gets temporary credentials via OIDC token exchange
- No need to store keys in secrets; credentials rotate automatically
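The OIDC token exchange in the list above relies on a trust policy attached to the IAM role. The sketch below builds one for a single service account; the function name and parameter names are assumptions, and the provider string in the usage note is a made-up placeholder for your cluster's OIDC issuer without the `https://` prefix.

```javascript
// Builds an IAM trust policy that lets one Kubernetes service account
// assume the role through the cluster's OIDC provider.
function buildIrsaTrustPolicy({ accountId, oidcProvider, namespace, serviceAccount }) {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Principal: {
          Federated: `arn:aws:iam::${accountId}:oidc-provider/${oidcProvider}`,
        },
        Action: "sts:AssumeRoleWithWebIdentity",
        Condition: {
          StringEquals: {
            // Restrict the role to this exact service account and audience.
            [`${oidcProvider}:aud`]: "sts.amazonaws.com",
            [`${oidcProvider}:sub`]: `system:serviceaccount:${namespace}:${serviceAccount}`,
          },
        },
      },
    ],
  };
}
```

For example, `buildIrsaTrustPolicy({ accountId: "123456789012", oidcProvider: "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE", namespace: "default", serviceAccount: "webapp" })` produces a policy scoped to the `webapp` service account. The service account then carries the role ARN in its `eks.amazonaws.com/role-arn` annotation.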
What to cover:
- S3 Standard: frequently accessed (> once per month), immediate retrieval, highest storage cost
- S3 IA: infrequent access (< once per month), lower storage cost, retrieval fees apply
- S3 Glacier: archival, retrieval in minutes to hours, cheapest storage, access cost higher
- Use lifecycle policies: move to IA after 30 days, Glacier after 90 days, delete after 365
- Versioning + lifecycle = artifact history retained economically
What to cover:
- Lambda outside VPC: cold start 100-500ms, full AWS service access, internet access
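- Lambda inside VPC: no direct internet access without a NAT gateway or VPC endpoints; ENI setup added 1-10 seconds to cold starts before the 2019 networking change and far less since
- VPC needed for: RDS, ElastiCache, private API Gateways, internal services
- Mitigation: initialize connections outside the handler so warm invocations reuse them, and use provisioned concurrency for latency-sensitive paths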
- Consider: do you actually need VPC access or can you use AWS services directly?
What to cover:
- SCP (Service Control Policies) enforce guardrails at organization level across all accounts
- Separate accounts per environment: dev/staging/prod isolate blast radius
- Separate accounts per team or application domain for clean IAM boundaries
- Consolidated billing: one payer account, track costs by account/tag
- Security account aggregates GuardDuty, Security Hub findings centrally
What to cover:
- Managed node groups: AWS handles node provisioning, updates, and termination
- Managed: you specify instance type and count; AWS handles lifecycle
- Self-managed: you create AMIs, manage kubelet, handle upgrades manually
- Managed node groups support SSH with key pair if needed
- Use managed for baseline; use self-managed when you need custom AMIs or specific kernel versions
What to cover:
- Block public access: bucket settings override bucket policies
- IAM policies: grant access to specific buckets/prefixes per role
- Bucket policies: JSON policies attached to bucket, can grant cross-account access
- Access Analyzer: checks bucket policy for external access risks
- VPC endpoints: access from within VPC without internet
- Encrypt: SSE-KMS with CMK for audit trail of encryption key usage
What to cover:
- ECS service scheduler marks task as unhealthy after grace period
- Unhealthy task is stopped and replaced; new task launches if capacity allows
- Health check grace period gives time for application to initialize
- If task is stuck in PENDING: not enough resources, image pull failures, or health check misconfiguration
- Check: task definition health check, container port mappings, startup time
What to cover:
- Billed per invocation and per GB-second of execution time
- Duration: billed in 1 ms increments; optimize memory/CPU tradeoffs
- Data transfer: VPC egress charges apply; provisioned concurrency has hourly cost
- Cold starts: initialization time may be billed depending on runtime, packaging, and current pricing rules; keep init code lean
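- Estimate: 1M requests × 500ms × 512MB = 250,000 GB-seconds, roughly $4.17 of compute plus $0.20 in request charges per month before the free tier (x86 pricing, very rough)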
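The per-invocation and per GB-second charges can be combined into a rough estimator. The prices below are assumed x86 on-demand rates for us-east-1 and exclude the free tier, data transfer, and provisioned concurrency; check the current pricing page before relying on the numbers.

```javascript
// Assumed on-demand prices (x86, us-east-1); verify before use.
const PRICE_PER_GB_SECOND = 0.0000166667;
const PRICE_PER_MILLION_REQUESTS = 0.2;

// Rough monthly cost from request count, average duration, and memory size.
function estimateLambdaCost({ requests, avgDurationMs, memoryMb }) {
  const gbSeconds = requests * (avgDurationMs / 1000) * (memoryMb / 1024);
  const computeCost = gbSeconds * PRICE_PER_GB_SECOND;
  const requestCost = (requests / 1_000_000) * PRICE_PER_MILLION_REQUESTS;
  return { gbSeconds, computeCost, requestCost, total: computeCost + requestCost };
}
```

For 1M requests averaging 500 ms at 512 MB, this gives 250,000 GB-seconds and a total of roughly $4.37, most of it compute rather than request charges.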
What to cover:
- ALB operates at layer 7 (HTTP/HTTPS), NLB operates at layer 4 (TCP/UDP)
- ALB supports path-based routing, host-based routing, and content-based routing
- ALB terminates TLS and forwards traffic to targets; NLB can pass encrypted traffic through or terminate TLS with a TLS listener
- NLB handles millions of requests per second with lower latency; ALB adds ~1-2ms latency
- ALB integrates with ECS services for dynamic port mapping; NLB for high-throughput non-HTTP workloads
- Both run health checks: ALB uses HTTP/HTTPS checks; NLB supports TCP, HTTP, and HTTPS checks
What to cover:
- ECR stores container images in a managed registry backed by S3 for durability
- ECS task definitions reference ECR image URLs: `123456789012.dkr.ecr.us-east-1.amazonaws.com/webapp:latest`
- IAM policies control who can pull images from which repositories
- Image scanning on push detects CVEs; blocking vulnerable images from deploying requires a gate in your pipeline that acts on the findings
- Lifecycle policies auto-expire old image versions to reduce storage costs
- ECR works with both ECS and EKS—same registry, different pull authentication
What to cover:
- Users have permanent access keys (long-term credentials); roles provide temporary credentials
- Roles are assumed by identities (users, services, applications) for specific tasks
- For EC2, Lambda, ECS: use instance profiles or task roles—no need to store keys
- IAM users are for human access; service roles are for machine-to-machine access
- Roles prevent credential leakage—keys cannot be stolen if keys do not exist
- Use IAM roles for federation: users assume a role to get temporary elevated access
What to cover:
- EC2 health check: marks instance unhealthy if the instance status or system status becomes impaired
- ELB health check: marks instance unhealthy if the ELB reports the instance as failed via its health check
- ELB health check is more application-aware—checks if your service responds, not just if EC2 is running
- Using EC2 health check when the application can be unhealthy but EC2 is fine leads to traffic to bad instances
- Using ELB health check when the app is fine but the ELB health check endpoint is wrong leads to unnecessary replacements
What to cover:
- S3 Standard: highest storage cost, immediate access, no retrieval fees
- S3 Intelligent-Tiering: monitors access patterns, auto-moves objects to lower-cost tiers after 30 days of no access
- Intelligent-Tiering charges a small monthly per-object monitoring fee but no retrieval fees; objects under 128 KB are not monitored and stay in the frequent access tier
- Best for: unpredictable access patterns, applications where you do not know access frequency in advance
- Not best for: predictable hot data (Standard is cheaper), data accessed very frequently
What to cover:
- Lambda scales automatically up to 1000 concurrent executions per region by default
- Reserved concurrency: guarantees a set number of executions for a function, isolates it from others
- When reserved concurrency is exhausted, new invocations get throttled (429 Too Many Requests)
- Provisioned concurrency: pre-warms instances to eliminate cold starts for a reserved allocation
- Use reserved concurrency to prevent one function from consuming all regional capacity
- Throttled invocations can be retried or routed to a dead-letter queue
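Retrying throttled invocations works best with exponential backoff and jitter, so that many clients do not retry in lockstep. The sketch below computes the delay schedule only; the base delay, cap, and function names are assumptions, and the actual retry loop around your invoke call is left out.

```javascript
// Delay before each retry: base * 2^attempt, capped at capMs.
function backoffDelays(attempts, baseMs = 100, capMs = 5000) {
  return Array.from({ length: attempts }, (_, i) => Math.min(capMs, baseMs * 2 ** i));
}

// "Full jitter": pick a random delay between 0 and the computed backoff.
// The random source is injectable so the function can be tested.
function withJitter(delayMs, random = Math.random) {
  return Math.floor(random() * delayMs);
}
```

`backoffDelays(5)` yields delays of 100, 200, 400, 800, and 1600 ms; passing each through `withJitter` spreads retries from different callers across the window.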
What to cover:
- Family: the name of the task definition, like a versioned template
- Revision: a specific version of the family (webapp:1, webapp:2, webapp:3)
- When you register a new task definition, you specify family and get a new revision number
- ECS service references a specific revision (webapp:2); updating the service picks up new revisions
- Family groups related task definitions—webapp-service and webapp-worker might be separate families
What to cover:
- VPC endpoint creates a private connection from your VPC to S3 without internet traversal
- Without VPC endpoint, traffic to S3 goes through NAT gateway or internet gateway
- Gateway endpoints for S3 and DynamoDB are free; interface endpoints and NAT gateways have an hourly cost plus data processing charges
- Endpoint policy controls which S3 buckets can be accessed from the endpoint
- Use VPC endpoints for: improved security (no internet exposure), cost reduction, lower latency
- VPC endpoint for DynamoDB is separate from S3—create both for complete private AWS access
What to cover:
- Public endpoint: kubectl access from anywhere with authentication via AWS IAM
- Private endpoint: kubectl access only from within the VPC—more secure for private clusters
- Best practice: enable both, and restrict the public endpoint to known CIDR blocks
- Private endpoint uses VPC internal DNS to resolve the cluster endpoint address
- For hybrid scenarios, public endpoint with restricted CIDR blocks is a common pattern
What to cover:
- S3 has no provisioned throughput mode; request capacity scales automatically
- Baseline: at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix
- Scale beyond that by spreading keys across more prefixes; a bucket can hold any number of prefixes
- Expect 503 Slow Down responses while S3 repartitions under a sudden ramp; retry with exponential backoff
- Cost: you pay per request and per GB stored or transferred, not for reserved throughput
Conclusion
Key Takeaways
- EC2 gives full control but maximum operational burden; use for legacy workloads and specific hardware needs
- ECS with Fargate removes EC2 management; best for teams wanting containers without Kubernetes complexity
- EKS provides Kubernetes portability; best for multi-cloud strategies and teams with Kubernetes expertise
- Lambda is ideal for event-driven, short-running workloads; not suitable for long processes or stateful operations
- Multi-account AWS organizations enforce guardrails via SCPs and simplify billing tracking
AWS Onboarding Checklist
# 1. Create organization and enable SCPs
aws organizations create-organization --feature-set ALL
aws organizations enable-policy-type --root-id r-xxx --policy-type SERVICE_CONTROL_POLICY
# 2. Set up VPC for production
aws ec2 create-vpc --cidr-block 10.0.0.0/16
aws ec2 create-subnet --vpc-id vpc-xxx --cidr-block 10.0.1.0/24
# 3. Create ECS cluster with Fargate
aws ecs create-cluster --cluster-name production --capacity-providers FARGATE
# 4. Set up CloudWatch alarms for critical metrics
aws cloudwatch put-metric-alarm \
--alarm-name EC2-High-CPU \
--metric-name CPUUtilization \
--namespace AWS/EC2 \
--statistic Average \
--comparison-operator GreaterThanThreshold \
--threshold 80 \
--period 300 \
--evaluation-periods 1
# 5. Enable S3 versioning on artifact bucket
aws s3api put-bucket-versioning \
--bucket my-artifacts \
--versioning-configuration Status=Enabled