AWS Core Services for DevOps: EC2, ECS, EKS, S3, Lambda
Navigate essential AWS services for DevOps workloads—compute (EC2, ECS, EKS), storage (S3), serverless (Lambda), and foundational networking.
AWS forms the backbone of many enterprise cloud strategies. Understanding its core services means understanding how compute, storage, networking, and serverless components fit together. This post covers the essential services for deploying and operating applications on AWS.
The services covered here appear in virtually every AWS architecture. Even if you plan to use higher-level services, knowing how the underlying components work helps you make better architectural decisions and debug problems when they arise.
When to Use
EC2 vs. ECS vs. EKS vs. Lambda
Choose EC2 when you need full control over the operating system, require specific hardware configurations, or run legacy applications that cannot be containerized. EC2 gives you the most flexibility at the cost of the most operational overhead.
Choose ECS when you want container orchestration without the complexity of Kubernetes. ECS integrates tightly with AWS services like ALB, CloudWatch, and IAM, making it a natural fit for teams already invested in AWS. Use Fargate launch type when you want serverless containers—AWS manages the EC2 fleet for you.
Choose EKS when your team knows Kubernetes and wants portability across cloud providers, or when you need Kubernetes-specific features like custom controllers, complex pod scheduling, or a broad ecosystem of third-party tools.
Choose Lambda when your workload is event-driven, short-running, or bursty. Lambda handles scaling automatically and charges only for execution time. If your function runs for hours continuously, EC2 or ECS is likely cheaper.
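The break-even point between Lambda and always-on compute can be estimated with simple arithmetic. The sketch below uses illustrative us-east-1 on-demand prices (assumptions; check current AWS pricing for your region and tier):

```python
# Illustrative prices (assumptions -- verify against current AWS pricing):
LAMBDA_GB_SECOND = 0.0000166667    # per GB-second of execution
LAMBDA_REQUEST = 0.20 / 1_000_000  # per invocation
EC2_T3_MICRO_HOURLY = 0.0104       # on-demand hourly rate

def lambda_monthly_cost(invocations, avg_ms, memory_mb):
    """Estimated monthly Lambda cost for a steady workload."""
    gb_seconds = invocations * (avg_ms / 1000) * (memory_mb / 1024)
    return gb_seconds * LAMBDA_GB_SECOND + invocations * LAMBDA_REQUEST

def ec2_monthly_cost(hourly=EC2_T3_MICRO_HOURLY, hours=730):
    """Always-on EC2 cost for one instance over a 730-hour month."""
    return hourly * hours

# A bursty workload: 1M requests/month, 200 ms each, 256 MB memory
burst = lambda_monthly_cost(1_000_000, 200, 256)
# A sustained workload: 100M requests/month at the same profile
sustained = lambda_monthly_cost(100_000_000, 200, 256)

print(f"Lambda (1M req):   ${burst:.2f}/month")
print(f"Lambda (100M req): ${sustained:.2f}/month")
print(f"EC2 t3.micro:      ${ec2_monthly_cost():.2f}/month")
```

At low volume Lambda costs roughly a dollar a month; at sustained high volume the same function costs far more than a small always-on instance, which is the crossover the paragraph above describes.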
S3 Storage Class Selection
Use S3 Standard for frequently accessed data: hot storage for active workloads. Use S3 Standard-IA (Infrequent Access) for data accessed less than about once a month that still needs millisecond retrieval; storage is cheaper, but you pay a per-GB retrieval fee. Use S3 Glacier for archival data you must retain but rarely access, with retrieval times of minutes to hours depending on the tier.
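The guidance above can be captured as a small decision helper. The access-frequency threshold is an illustrative assumption, not an AWS rule:

```python
def pick_storage_class(accesses_per_month: float, needs_ms_retrieval: bool) -> str:
    """Rough storage-class heuristic following the guidance above.

    The once-per-month threshold is an illustrative assumption.
    """
    if accesses_per_month >= 1:
        return "STANDARD"        # hot data, frequent access
    if needs_ms_retrieval:
        return "STANDARD_IA"     # infrequent access, still millisecond retrieval
    return "GLACIER"             # archival; retrieval takes minutes to hours

print(pick_storage_class(30, True))     # active workload
print(pick_storage_class(0.5, True))    # occasional access, needs fast reads
print(pick_storage_class(0.01, False))  # compliance archive
```

In practice, S3 Intelligent-Tiering can make this decision automatically per object when access patterns are unpredictable.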
Regions, Availability Zones, and Multi-Account Architecture
AWS resources are deployed to specific geographic regions, and regions are independent of each other. Each region has multiple availability zones (AZs)—physically separate data centers with independent power, networking, and cooling. Deploying across multiple AZs protects against single-datacenter failures.
# List available regions
aws ec2 describe-regions --output table
# Get current region
aws configure get region
Account structure shapes your AWS environment. Organizations use consolidated billing to manage multiple accounts under a single payer. Common patterns include separate accounts per environment (dev, staging, production), per team, or per application domain.
Organization
├── Management Account (billing, SCPs)
├── Security Account (GuardDuty, Security Hub)
├── Dev Account
├── Staging Account
└── Production Account
Service Control Policies (SCPs) at the organization level restrict what can be done in member accounts. This enforces guardrails without managing IAM in every account.
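A common SCP guardrail denies activity outside approved regions. The sketch below builds one as a Python dict; the allowed regions and exempted global services are assumptions to adapt for your organization:

```python
import json

# Sketch of a region-restriction SCP (illustrative -- adapt the region
# list and the global-service exemptions to your organization).
scp = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyOutsideAllowedRegions",
            "Effect": "Deny",
            # Global services must be exempted or they break everywhere
            "NotAction": ["iam:*", "organizations:*", "route53:*", "support:*"],
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {
                    "aws:RequestedRegion": ["us-east-1", "us-west-2"]
                }
            },
        }
    ],
}

print(json.dumps(scp, indent=2))
```

Attach the resulting policy to an OU or account with `aws organizations attach-policy`; member-account administrators cannot override it.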
flowchart TD
  A[AWS Organization] --> B[Management Account]
  A --> C[Security Account]
  A --> D[Dev Account]
  A --> E[Staging Account]
  A --> F[Production Account]
  C --> G[GuardDuty]
  C --> H[Security Hub]
  D --> I[Dev VPC]
  E --> J[Staging VPC]
  F --> K[Production VPC]
  K --> L[ALB]
  L --> M[EKS Cluster]
  M --> N[Application Pods]
  N --> O[S3 Artifacts]
EC2 Instance Types and ASGs
EC2 provides virtual machines in the cloud. Instance types determine the CPU, memory, storage, and networking capacity. The naming pattern is family, generation, and size—for example, t3.micro is a burstable general purpose instance, third generation, micro size.
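The naming pattern is regular enough to parse mechanically, which is handy for tooling and cost reports. A minimal sketch (the attribute letters, such as the `i` in m6i, denote processor or feature variants):

```python
import re

def parse_instance_type(name: str) -> dict:
    """Split an EC2 instance type into family, generation, attributes, size.

    e.g. 't3.micro' -> family 't', generation 3; 'm6i.large' carries the
    'i' attribute suffix after the generation digit.
    """
    m = re.fullmatch(r"([a-z]+)(\d+)([a-z-]*)\.([a-z0-9]+)", name)
    if not m:
        raise ValueError(f"unrecognized instance type: {name}")
    family, gen, attrs, size = m.groups()
    return {"family": family, "generation": int(gen),
            "attributes": attrs, "size": size}

print(parse_instance_type("t3.micro"))
print(parse_instance_type("m6i.large"))
```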
# Launch an EC2 instance
aws ec2 run-instances \
--image-id ami-0c55b159cbfafe1f0 \
--instance-type t3.micro \
--key-name my-key-pair \
--security-group-ids sg-0123456789abcdef0 \
--subnet-id subnet-0123456789abcdef0
# Describe instance status
aws ec2 describe-instance-status --instance-ids i-0abcdef1234567890
Auto Scaling Groups (ASGs) automatically adjust capacity based on demand. You define minimum, maximum, and desired capacity, along with scaling policies that trigger adjustments based on metrics like CPU utilization or request count.
# ASG CloudFormation snippet (launch configurations are deprecated;
# use a launch template)
AutoScalingGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    MinSize: 2
    MaxSize: 10
    DesiredCapacity: 2
    VPCZoneIdentifier:
      - !Ref PrivateSubnet1
      - !Ref PrivateSubnet2
    LaunchTemplate:
      LaunchTemplateId: !Ref LaunchTemplate
      Version: !GetAtt LaunchTemplate.LatestVersionNumber
    TargetGroupARNs:
      - !Ref TargetGroup
    HealthCheckType: ELB
    HealthCheckGracePeriod: 300
ASGs work with Elastic Load Balancers to distribute traffic across healthy instances. The load balancer performs health checks and removes unhealthy instances from the rotation automatically.
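A target-tracking scaling policy converges toward capacity proportional to the metric. The math can be sketched in a few lines; this ignores cooldowns and instance warm-up, which the real policy also applies:

```python
import math

def target_tracking_desired(current_capacity, current_metric, target,
                            min_size, max_size):
    """Approximate the capacity a target-tracking policy converges toward.

    Simplified sketch: real policies also honor cooldowns and warm-up.
    """
    raw = math.ceil(current_capacity * current_metric / target)
    # Clamp to the ASG's configured bounds
    return max(min_size, min(max_size, raw))

# 4 instances at 90% CPU with a 60% target -> scale out to 6
print(target_tracking_desired(4, 90, 60, min_size=2, max_size=10))
# 4 instances at 10% CPU -> scale in, but never below MinSize
print(target_tracking_desired(4, 10, 60, min_size=2, max_size=10))
```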
ECS Task Definitions and Services
Amazon Elastic Container Service (ECS) runs Docker containers on a cluster of EC2 instances or on AWS Fargate serverless compute. Task definitions describe which containers to run and what CPU and memory they need.
{
  "family": "webapp",
  "containerDefinitions": [
    {
      "name": "webapp",
      "image": "123456789.dkr.ecr.us-east-1.amazonaws.com/webapp:latest",
      "memory": 512,
      "cpu": 256,
      "essential": true,
      "portMappings": [
        {
          "containerPort": 8080,
          "protocol": "tcp"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/webapp",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ]
}
An ECS service maintains a desired count of task instances and automatically replaces failed tasks. It integrates with Application Load Balancers for traffic distribution and Auto Scaling for dynamic capacity adjustment.
# Register a new task definition revision
aws ecs register-task-definition --cli-input-json file://task-definition.json
# Update service to use new revision
aws ecs update-service \
--cluster production \
--service webapp \
--task-definition webapp:2
Fargate removes the need to manage EC2 instances for container workloads. You specify CPU and memory requirements, and AWS handles the underlying infrastructure. This simplifies operations at the cost of less granular control over the compute environment.
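Fargate only accepts certain CPU/memory pairings, so task definitions should be validated before deployment. The table below covers the commonly documented combinations; larger sizes exist, so treat this as a planning sketch rather than an exhaustive list:

```python
# Commonly documented Fargate CPU (units) to memory (MB) pairings.
# Larger sizes exist; this is a planning sketch, not an exhaustive list.
FARGATE_COMBOS = {
    256:  [512, 1024, 2048],
    512:  list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
}

def valid_fargate_combo(cpu: int, memory_mb: int) -> bool:
    """Check whether a task's CPU/memory request is a valid Fargate size."""
    return memory_mb in FARGATE_COMBOS.get(cpu, [])

print(valid_fargate_combo(256, 512))    # minimal task
print(valid_fargate_combo(256, 4096))   # too much memory for 0.25 vCPU
```

Catching an invalid pairing in CI is cheaper than watching a service fail to place tasks at deploy time.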
EKS Cluster Management Basics
Amazon Elastic Kubernetes Service (EKS) provides a managed Kubernetes control plane. AWS runs the control plane nodes; you manage the worker nodes and workloads.
# Create an EKS cluster
aws eks create-cluster \
--name production \
--role-arn arn:aws:iam::123456789:role/eks-cluster-role \
--resources-vpc-config subnetIds=subnet-0123456789abcdef0,subnet-0123456789abcdef1,securityGroupIds=sg-0123456789abcdef0
# Update kubeconfig
aws eks update-kubeconfig --name production
# Verify cluster access
kubectl get svc
EKS manages the Kubernetes control plane across multiple AZs for high availability. Worker nodes join the cluster via a node group, which can be managed by AWS (EKS Managed Node Groups) or self-managed.
# Node group configuration
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: production
  region: us-east-1
managedNodeGroups:
  - name: compute
    instanceType: t3.medium
    desiredCapacity: 3
    minSize: 2
    maxSize: 10
    volumeSize: 50
    ssh:
      allow: true
Kubernetes deployments, services, and ingresses work the same on EKS as on any Kubernetes cluster. The main difference is how you configure IAM roles for service accounts (IRSA) for workload authentication to AWS services.
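An IRSA role's trust policy binds an IAM role to one Kubernetes service account via the cluster's OIDC provider. A sketch of the policy shape (the account ID, provider URL, and service account names are placeholders; the real provider URL comes from `aws eks describe-cluster`):

```python
import json

def irsa_trust_policy(account_id, oidc_provider, namespace, service_account):
    """Build the IAM trust policy an IRSA role needs.

    Sketch with placeholder values; fetch the real oidc_provider from
    `aws eks describe-cluster --query cluster.identity.oidc.issuer`.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    # Restricts the role to exactly one service account
                    f"{oidc_provider}:sub":
                        f"system:serviceaccount:{namespace}:{service_account}"
                }
            },
        }],
    }

policy = irsa_trust_policy(
    "123456789", "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE",
    "default", "webapp")
print(json.dumps(policy, indent=2))
```

Annotating the service account with the role ARN (`eks.amazonaws.com/role-arn`) completes the binding; pods using that service account then receive temporary credentials for the role.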
S3 for Artifact Storage
Amazon S3 stores objects in buckets. For DevOps, S3 typically holds build artifacts, deployment packages, and infrastructure state. S3 integrates with everything on AWS through IAM policies and resource-based bucket policies.
# Create a bucket for artifacts
aws s3 mb s3://my-app-artifacts --region us-east-1
# Upload a build artifact
aws s3 cp ./dist/app.tar.gz s3://my-app-artifacts/prod/
# List bucket contents
aws s3 ls s3://my-app-artifacts/prod/
# Enable versioning for artifact history
aws s3api put-bucket-versioning \
--bucket my-app-artifacts \
--versioning-configuration Status=Enabled
Lifecycle policies automate archival and deletion. Move old artifacts to cheaper storage classes automatically, or delete artifacts older than a retention period.
{
  "Rules": [
    {
      "ID": "ArchiveOldArtifacts",
      "Status": "Enabled",
      "Filter": {
        "Prefix": "prod/"
      },
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "GLACIER"
        }
      ],
      "Expiration": {
        "Days": 365
      }
    }
  ]
}
Lambda for Serverless Workloads
AWS Lambda runs code in response to events without provisioning servers. You pay only for the compute time consumed—billed in milliseconds. Lambda is ideal for event-driven tasks, API backends, and background processing.
// Lambda handler for processing S3 uploads
exports.handler = async (event) => {
  const s3Event = event.Records[0].s3;
  const bucket = s3Event.bucket.name;
  const key = decodeURIComponent(s3Event.object.key.replace(/\+/g, " "));
  console.log(`Processing file: ${bucket}/${key}`);
  // Process the file...
  const result = await processUpload(bucket, key);
  return {
    statusCode: 200,
    body: JSON.stringify({ result }),
  };
};
By default, Lambda functions run in an AWS-managed VPC with access to AWS services and the internet, but not to resources inside your own VPC. To reach VPC resources like RDS databases, configure the function with subnet and security group attachments.
# Create a Lambda function
aws lambda create-function \
--function-name my-processor \
--runtime nodejs20.x \
--role arn:aws:iam::123456789:role/lambda-execution-role \
--handler index.handler \
--zip-file fileb://function.zip \
--vpc-config SubnetIds=subnet-0123456789abcdef0,SecurityGroupIds=sg-0123456789abcdef0
For more on managing AWS costs, see our post on Cost Optimization, which covers cost strategies for EC2, Lambda, and S3.
For more on securing AWS workloads, see Cloud Security for IAM best practices, VPC design, and encryption patterns, and Network Security for security groups, NACLs, and VPC endpoint configuration.
Compute Service Trade-offs
| Scenario | EC2 | ECS/Fargate | EKS | Lambda |
|---|---|---|---|---|
| Full OS control needed | Yes | No | No | No |
| Serverless containers | No | Fargate launch type | Fargate profiles | No |
| Kubernetes ecosystem | No | No | Yes | No |
| Billing granularity | Per-second (Linux) | Per-second | Per-second (worker nodes) | Per-millisecond |
| Cold start latency | Minutes (instance boot) | Seconds | Seconds | Milliseconds to seconds |
| Long-running workloads | Best choice | Good | Good | Poor (15 min max) |
| Stateful workloads | Best choice | Limited | Good | No |
Production Failure Scenarios
| Failure | Impact | Mitigation |
|---|---|---|
| ASG fails to scale due to ELB health check misconfiguration | Traffic routed to unhealthy instances, requests fail | Use ELB health check type, test scale-in manually |
| ECS task stuck in PENDING due to insufficient resources | Service capacity drops, requests queued or dropped | Monitor pending task count, add cluster capacity or use Fargate |
| EKS node group upgrade fails midway | Pods evicted before new nodes ready, service disruption | Configure surge/max-unavailable settings, upgrade one node group at a time |
| S3 bucket policy denies access unexpectedly | Application cannot read/write artifacts, deployments fail | Use IAM access analyzer, test bucket policies in dev first |
| Lambda VPC config causes cold start timeouts | Requests time out during scale-up | Pre-provision connections outside VPC handler, use provisioned concurrency |
AWS Observability Hooks
EC2 and ASG monitoring:
# Get EC2 instance metrics
aws cloudwatch get-metric-statistics \
--namespace AWS/EC2 \
--metric-name CPUUtilization \
--dimensions Name=InstanceId,Value=i-0abcdef1234567890 \
--start-time 2026-03-24T00:00:00 \
--end-time 2026-03-25T00:00:00 \
--period 3600 \
--statistics Average
# Check ASG health status
aws autoscaling describe-auto-scaling-groups \
--auto-scaling-group-names my-asg \
--query 'AutoScalingGroups[0].Instances[*].[InstanceId,HealthStatus,LifecycleState]'
ECS monitoring:
# Check service health and running task count
aws ecs describe-services \
--cluster production \
--services webapp \
--query 'services[0].{runningCount:runningCount,desiredCount:desiredCount,pendingCount:pendingCount}'
EKS monitoring:
# Check node health and pod distribution
kubectl get nodes -o wide
kubectl get pods -o wide --all-namespaces | grep -v Running
# Get cluster control plane health
aws eks describe-cluster \
--name production \
--query 'cluster.{status:status,version:version,endpoint:endpoint}'
Key CloudWatch metrics to alert on:
| Service | Metric | Alert Threshold |
|---|---|---|
| EC2 | CPUUtilization | > 80% for 5 minutes |
| EC2 | StatusCheckFailed | any for 2 minutes |
| ASG | CPUUtilization | > 75% for 3 minutes |
| ECS | CPUUtilization | > 85% for 3 minutes |
| ECS | RunningTaskCount | < desired for 2 minutes |
| Lambda | Errors | > 0 for 5 minutes |
| Lambda | Duration | > 3000ms p99 |
| S3 | BucketSizeBytes | unexpected change |
Common Anti-Patterns
Using the default VPC. The default VPC exists in every region of every account and ships with permissive defaults: public subnets, auto-assigned public IPs, and a wide-open default security group. Production workloads should use dedicated VPCs with explicit networking controls.
Attaching IAM policies directly to users instead of roles. Direct user policies create credential management nightmares and are harder to audit. Always use IAM roles for EC2, Lambda, and other compute services.
Not configuring ASG health checks properly. If your health check is too lenient, unhealthy instances stay in service. If it is too strict, instances get replaced during legitimate load spikes. Match health check type to your application needs.
Storing secrets in S3 object metadata or user data. S3 object metadata is not encrypted by default and appears in CloudTrail logs. Use AWS Secrets Manager or Systems Manager Parameter Store instead.
Using Lambda VPC config without considering cold starts. VPC-enabled Lambda functions must attach an ENI before their first execution, which can add seconds to cold starts (see the benchmark table below). For latency-sensitive APIs, initialize connections outside the handler and consider provisioned concurrency.
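The connection-reuse pattern is simple: do expensive setup at module scope so it runs once per execution environment rather than once per invocation. A sketch (the client here is a hypothetical stand-in for a boto3 client or database pool):

```python
# Sketch of the connection-reuse pattern: expensive setup happens once
# per execution environment (at import time), not on every invocation.
INIT_COUNT = 0

def _create_client():
    """Stand-in for expensive setup (e.g. a boto3 client or a DB
    connection pool) -- hypothetical, for illustration only."""
    global INIT_COUNT
    INIT_COUNT += 1
    return {"connected": True}

CLIENT = _create_client()  # module scope: reused across warm invocations

def handler(event, context):
    # CLIENT is already initialized; no per-invocation setup cost
    return {"statusCode": 200, "reused": CLIENT["connected"]}

# Simulate three warm invocations in one execution environment:
for _ in range(3):
    handler({}, None)
print(INIT_COUNT)  # setup ran exactly once
```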
Capacity Estimation and Benchmark Data
Use these numbers for initial capacity planning. Actual performance varies by workload characteristics.
EC2 Instance Type Families
| Family | Best For | Instance Types | Network Performance |
|---|---|---|---|
| t3 | Burstable workloads, dev/test | t3.micro → t3.2xlarge | Up to 5 Gbps |
| m5 | General purpose, web servers | m5.large → m5.24xlarge | Up to 100 Gbps (xlarge+) |
| m6i | General purpose (latest gen) | m6i.large → m6i.32xlarge | Up to 100 Gbps |
| c5 | Compute optimized, batch processing | c5.large → c5.24xlarge | Up to 100 Gbps |
| c6i | Compute optimized (latest gen) | c6i.large → c6i.32xlarge | Up to 100 Gbps |
| r5 | Memory optimized, databases | r5.large → r5.24xlarge | Up to 100 Gbps |
| r6i | Memory optimized (latest gen) | r6i.large → r6i.32xlarge | Up to 100 Gbps |
Lambda Performance Parameters
| Parameter | Value | Notes |
|---|---|---|
| Cold start (no VPC) | 100-500ms | Depends on runtime and package size |
| Cold start (with VPC) | 1-10 seconds | ENI attachment is the bottleneck |
| Provisioned concurrency | ~50ms | Eliminates cold starts for warmed functions |
| Max execution duration | 900 seconds (15 min) | Configure timeout based on workload |
| Default memory | 128 MB | Increase memory to boost CPU proportionally |
| Max concurrent executions | 1,000 per region | Request limit increase if needed |
S3 Performance Targets
| Metric | Value |
|---|---|
| PUT/LIST/DELETE latency | 100-200ms (p99) |
| GET latency | 60-100ms (p99) |
| Max request rate per prefix | 3,500 PUT/COPY/POST/DELETE, 5,500 GET/HEAD per second |
| Multi-object delete | Up to 1,000 keys per request |
| Transfer acceleration | Can substantially improve upload throughput for geographically distant clients |
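Because S3 scales per prefix, sustained throughput above the per-prefix limits requires spreading keys across multiple prefixes. A quick planning calculation:

```python
import math

# Per-prefix request rates from the table above
WRITE_RPS_PER_PREFIX = 3_500   # PUT/COPY/POST/DELETE
READ_RPS_PER_PREFIX = 5_500    # GET/HEAD

def prefixes_needed(write_rps: int, read_rps: int) -> int:
    """Minimum number of key prefixes to sustain the target request rates."""
    return max(
        math.ceil(write_rps / WRITE_RPS_PER_PREFIX),
        math.ceil(read_rps / READ_RPS_PER_PREFIX),
    )

# Target: 20k writes/s and 40k reads/s
print(prefixes_needed(20_000, 40_000))
```

In practice this means sharding keys by a hash or date component (e.g. `prod/0/…` through `prod/7/…`) rather than writing everything under a single prefix.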
Service Limits for Planning
| Service | Default Limit | Typical Increase Request |
|---|---|---|
| Lambda concurrent executions | 1,000 per region | AWS Support |
| API Gateway requests per second | 10,000 per region | AWS Support |
| EBS volumes per account | 5,000 | AWS Support |
| VPCs per region | 5 | AWS Support |
| ENIs per instance (varies by type) | 3-15 | Instance type dependent |
Quick Recap
Key Takeaways
- EC2 gives full control but maximum operational burden; use for legacy workloads and specific hardware needs
- ECS with Fargate removes EC2 management; best for teams wanting containers without Kubernetes complexity
- EKS provides Kubernetes portability; best for multi-cloud strategies and teams with Kubernetes expertise
- Lambda is ideal for event-driven, short-running workloads; not suitable for long processes or stateful operations
- Multi-account AWS organizations enforce guardrails via SCPs and simplify billing tracking
AWS Onboarding Checklist
# 1. Create organization and enable SCPs
aws organizations create-organization
aws organizations enable-policy-type --root-id r-xxxx --policy-type SERVICE_CONTROL_POLICY
# 2. Set up VPC for production
aws ec2 create-vpc --cidr-block 10.0.0.0/16
aws ec2 create-subnet --vpc-id vpc-xxx --cidr-block 10.0.1.0/24
# 3. Create ECS cluster with Fargate
aws ecs create-cluster --cluster-name production --capacity-providers FARGATE
# 4. Set up CloudWatch alarms for critical metrics
aws cloudwatch put-metric-alarm \
--alarm-name EC2-High-CPU \
--metric-name CPUUtilization \
--namespace AWS/EC2 \
--statistic Average \
--comparison-operator GreaterThanThreshold \
--threshold 80 \
--period 300 \
--evaluation-periods 1
# 5. Enable S3 versioning on artifact bucket
aws s3api put-bucket-versioning \
--bucket my-artifacts \
--versioning-configuration Status=Enabled
Conclusion
AWS offers a comprehensive set of services that cover virtually any infrastructure need. EC2 provides raw compute with full control. ECS and EKS serve container workloads at different abstraction levels. S3 handles artifact and data storage. Lambda runs event-driven code without servers.
Understanding these core services gives you the foundation to build anything on AWS. Start with the service that matches your workload type, then expand to other services as your needs evolve.
Related Posts
AWS Data Services: Kinesis, Glue, Redshift, and S3
Guide to AWS data services for building data pipelines. Compare Kinesis vs Kafka, use Glue for ETL, query with Athena, and design S3 data lakes.
Data Migration: Strategies and Patterns for Moving Data
Learn proven strategies for migrating data between systems with minimal downtime. Covers bulk migration, CDC patterns, validation, and rollback.
Serverless Data Processing: Building Elastic Pipelines
Build scalable data pipelines using serverless services. Learn how AWS Lambda, Azure Functions, and Cloud Functions integrate for cost-effective processing.