AWS Core Services for DevOps: EC2, ECS, EKS, S3, Lambda

Navigate essential AWS services for DevOps workloads—compute (EC2, ECS, EKS), storage (S3), serverless (Lambda), and foundational networking.



AWS forms the backbone of many enterprise cloud strategies. Understanding its core services means understanding how compute, storage, networking, and serverless components fit together. This post covers the essential services for deploying and operating applications on AWS.

The services covered here appear in virtually every AWS architecture. Even if you plan to use higher-level services, knowing how the underlying components work helps you make better architectural decisions and debug problems when they arise.

When to Use

EC2 vs. ECS vs. EKS vs. Lambda

Choose EC2 when you need full control over the operating system, require specific hardware configurations, or run legacy applications that cannot be containerized. EC2 gives you the most flexibility at the cost of the most operational overhead.

Choose ECS when you want container orchestration without the complexity of Kubernetes. ECS integrates tightly with AWS services like ALB, CloudWatch, and IAM, making it a natural fit for teams already invested in AWS. Use the Fargate launch type when you want serverless containers—AWS manages the underlying compute so you never operate an EC2 fleet.

Choose EKS when your team knows Kubernetes and wants portability across cloud providers, or when you need Kubernetes-specific features like custom controllers, complex pod scheduling, or a broad ecosystem of third-party tools.

Choose Lambda when your workload is event-driven, short-running, or bursty. Lambda handles scaling automatically and charges only for execution time. If your function runs for hours continuously, EC2 or ECS is likely cheaper.
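That break-even point is easy to sanity-check with arithmetic. The sketch below compares Lambda's per-request pricing against an always-on instance; all prices are illustrative assumptions, not current AWS rates:

```javascript
// Back-of-envelope comparison: Lambda vs. an always-on instance.
// All prices below are assumptions for illustration only.
const LAMBDA_PER_GB_SECOND = 0.0000166667; // assumed $/GB-second
const LAMBDA_PER_REQUEST = 0.0000002;      // assumed $/request
const INSTANCE_HOURLY = 0.0208;            // assumed $/hour (t3.small-class)

function lambdaMonthlyCost(requests, avgDurationMs, memoryGb) {
  const gbSeconds = requests * (avgDurationMs / 1000) * memoryGb;
  return gbSeconds * LAMBDA_PER_GB_SECOND + requests * LAMBDA_PER_REQUEST;
}

const instanceMonthly = INSTANCE_HOURLY * 730; // ~730 hours in a month

// Bursty: 1M requests/month at 200 ms / 512 MB — Lambda is cheaper
console.log(lambdaMonthlyCost(1_000_000, 200, 0.5) < instanceMonthly);   // true
// Sustained: 100M requests/month — the always-on instance is cheaper
console.log(lambdaMonthlyCost(100_000_000, 200, 0.5) > instanceMonthly); // true
```

With these assumed rates the crossover sits somewhere in the tens of millions of short invocations per month—worth recomputing with your real traffic profile before committing either way.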

S3 Storage Class Selection

Use S3 Standard for frequently accessed data—hot storage for active workloads. Use S3 Standard-IA (Infrequent Access) for data accessed roughly once a month or less that still needs millisecond access when requested. Use S3 Glacier for archival data you must retain but rarely access, with retrieval times of minutes to hours depending on the tier.
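Storage class is set per object at upload time, so artifacts can skip Standard entirely if you already know the access pattern. A sketch using the CLI (bucket and key names are placeholders):

```shell
# Upload directly to Standard-IA instead of transitioning later
aws s3 cp ./backup.tar.gz s3://my-app-artifacts/backups/ \
  --storage-class STANDARD_IA

# Objects in Glacier must be restored before reading (minutes to hours)
aws s3api restore-object \
  --bucket my-app-artifacts \
  --key backups/backup.tar.gz \
  --restore-request 'Days=7,GlacierJobParameters={Tier=Standard}'
```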

Regions, Availability Zones, and Multi-Account Architecture

AWS resources are deployed to specific geographic regions, and regions are independent of each other. Each region has multiple availability zones (AZs)—physically separate data centers with independent power, networking, and cooling. Deploying across multiple AZs protects against single-datacenter failures.

# List available regions
aws ec2 describe-regions --output table

# Get current region
aws configure get region

Account structure shapes your AWS environment. Organizations use consolidated billing to manage multiple accounts under a single payer. Common patterns include separate accounts per environment (dev, staging, production), per team, or per application domain.

Organization
├── Management Account (billing, SCPs)
├── Security Account (GuardDuty, Security Hub)
├── Dev Account
├── Staging Account
└── Production Account

Service Control Policies (SCPs) at the organization level restrict what can be done in member accounts. This enforces guardrails without managing IAM in every account.
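As a sketch of what such a guardrail looks like, here is the common region-restriction SCP pattern; the allowed region list is illustrative, and the `NotAction` list exempts global services that have no region:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyOutsideApprovedRegions",
      "Effect": "Deny",
      "NotAction": ["iam:*", "organizations:*", "sts:*", "support:*"],
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": ["us-east-1", "eu-west-1"]
        }
      }
    }
  ]
}
```

Remember that SCPs never grant permissions—they only set the outer boundary inside which IAM policies operate.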

flowchart TD
    A[AWS Organization] --> B[Management Account]
    A --> C[Security Account]
    A --> D[Dev Account]
    A --> E[Staging Account]
    A --> F[Production Account]
    C --> G[GuardDuty]
    C --> H[Security Hub]
    D --> I[Dev VPC]
    E --> J[Staging VPC]
    F --> K[Production VPC]
    K --> L[ALB]
    L --> M[EKS Cluster]
    M --> N[Pods]
    N --> O[S3 Artifacts]

EC2 Instance Types and ASGs

EC2 provides virtual machines in the cloud. Instance types determine the CPU, memory, storage, and networking capacity. The naming pattern is family, generation, and size—for example, t3.micro is a burstable general purpose instance, third generation, micro size.

# Launch an EC2 instance
aws ec2 run-instances \
  --image-id ami-0c55b159cbfafe1f0 \
  --instance-type t3.micro \
  --key-name my-key-pair \
  --security-group-ids sg-0123456789abcdef0 \
  --subnet-id subnet-0123456789abcdef0

# Describe instance status
aws ec2 describe-instance-status --instance-ids i-0abcdef1234567890

Auto Scaling Groups (ASGs) automatically adjust capacity based on demand. You define minimum, maximum, and desired capacity, along with scaling policies that trigger adjustments based on metrics like CPU utilization or request count.

# ASG CloudFormation snippet
AutoScalingGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    MinSize: 2
    MaxSize: 10
    DesiredCapacity: 2
    VPCZoneIdentifier:
      - !Ref PrivateSubnet1
      - !Ref PrivateSubnet2
    # Launch configurations are deprecated; prefer a launch template
    LaunchTemplate:
      LaunchTemplateId: !Ref LaunchTemplate
      Version: !GetAtt LaunchTemplate.LatestVersionNumber
    TargetGroupARNs:
      - !Ref TargetGroup
    HealthCheckType: ELB
    HealthCheckGracePeriod: 300

ASGs work with Elastic Load Balancers to distribute traffic across healthy instances. The load balancer performs health checks and removes unhealthy instances from the rotation automatically.
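The simplest scaling policy to operate is target tracking, where AWS computes the scale-in and scale-out steps for you. A hedged CloudFormation sketch, assuming the ASG resource name from the snippet above:

```yaml
ScalingPolicy:
  Type: AWS::AutoScaling::ScalingPolicy
  Properties:
    AutoScalingGroupName: !Ref AutoScalingGroup
    PolicyType: TargetTrackingScaling
    TargetTrackingConfiguration:
      PredefinedMetricSpecification:
        # Keep average CPU across the group near the target
        PredefinedMetricType: ASGAverageCPUUtilization
      TargetValue: 60.0
```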

ECS Task Definitions and Services

Amazon Elastic Container Service (ECS) manages Docker containers on a cluster of EC2 instances or on AWS Fargate serverless compute. Task definitions describe which containers to run and how much CPU and memory they need.

{
  "family": "webapp",
  "containerDefinitions": [
    {
      "name": "webapp",
      "image": "123456789.dkr.ecr.us-east-1.amazonaws.com/webapp:latest",
      "memory": 512,
      "cpu": 256,
      "essential": true,
      "portMappings": [
        {
          "containerPort": 8080,
          "protocol": "tcp"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/webapp",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ]
}

An ECS service maintains a desired count of task instances and automatically replaces failed tasks. It integrates with Application Load Balancers for traffic distribution and Auto Scaling for dynamic capacity adjustment.

# Register a new task definition revision
aws ecs register-task-definition --cli-input-json file://task-definition.json

# Update service to use new revision
aws ecs update-service \
  --cluster production \
  --service webapp \
  --task-definition webapp:2

Fargate removes the need to manage EC2 instances for container workloads. You specify CPU and memory requirements, and AWS handles the underlying infrastructure. This simplifies operations at the cost of less granular control over the compute environment.
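Note that Fargate tasks additionally require `"networkMode": "awsvpc"` and task-level `cpu`/`memory` in the task definition. Creating a Fargate service might then look like this sketch, where the subnet and security group IDs are placeholders:

```shell
# Run the webapp task definition on Fargate behind awsvpc networking
aws ecs create-service \
  --cluster production \
  --service-name webapp \
  --task-definition webapp:2 \
  --desired-count 2 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-0123456789abcdef0],securityGroups=[sg-0123456789abcdef0],assignPublicIp=DISABLED}"
```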

EKS Cluster Management Basics

Amazon Elastic Kubernetes Service (EKS) provides a managed Kubernetes control plane. AWS handles the master nodes; you manage the worker nodes and workloads.

# Create an EKS cluster
aws eks create-cluster \
  --name production \
  --role-arn arn:aws:iam::123456789:role/eks-cluster-role \
  --resources-vpc-config subnetIds=subnet-0123456789abcdef0,subnet-0123456789abcdef1,securityGroupIds=sg-0123456789abcdef0

# Update kubeconfig
aws eks update-kubeconfig --name production

# Verify cluster access
kubectl get svc

EKS manages the Kubernetes control plane across multiple AZs for high availability. Worker nodes join the cluster via a node group, which can be managed by AWS (EKS Managed Node Groups) or self-managed.

# Node group configuration
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: production
  region: us-east-1

managedNodeGroups:
  - name: compute
    instanceType: t3.medium
    desiredCapacity: 3
    minSize: 2
    maxSize: 10
    volumeSize: 50
    ssh:
      allow: true

Kubernetes deployments, services, and ingresses work the same on EKS as on any Kubernetes cluster. The main difference is how you configure IAM roles for service accounts (IRSA) for workload authentication to AWS services.
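With IRSA, a pod assumes an IAM role through its Kubernetes service account instead of node-level credentials. A minimal sketch—the role name is hypothetical, and the role's trust policy must reference the cluster's OIDC provider:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: webapp
  namespace: default
  annotations:
    # Hypothetical role granting this workload's S3 access
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789:role/webapp-s3-reader
```

Pods that set `serviceAccountName: webapp` then receive temporary credentials for that role via the injected web identity token.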

S3 for Artifact Storage

Amazon S3 stores objects in buckets. For DevOps, S3 typically holds build artifacts, deployment packages, and infrastructure state. S3 integrates with everything on AWS through IAM policies and resource-based bucket policies.

# Create a bucket for artifacts
aws s3 mb s3://my-app-artifacts --region us-east-1

# Upload a build artifact
aws s3 cp ./dist/app.tar.gz s3://my-app-artifacts/prod/

# List bucket contents
aws s3 ls s3://my-app-artifacts/prod/

# Enable versioning for artifact history
aws s3api put-bucket-versioning \
  --bucket my-app-artifacts \
  --versioning-configuration Status=Enabled

Lifecycle policies automate archival and deletion. Move old artifacts to cheaper storage classes automatically, or delete artifacts older than a retention period.

{
  "Rules": [
    {
      "ID": "ArchiveOldArtifacts",
      "Status": "Enabled",
      "Filter": {
        "Prefix": "prod/"
      },
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "GLACIER"
        }
      ],
      "Expiration": {
        "Days": 365
      }
    }
  ]
}
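Assuming the rules above are saved as lifecycle.json, they can be applied to the bucket with:

```shell
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-app-artifacts \
  --lifecycle-configuration file://lifecycle.json
```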

Lambda for Serverless Workloads

AWS Lambda runs code in response to events without provisioning servers. You pay only for the compute time consumed—billed in milliseconds. Lambda is ideal for event-driven tasks, API backends, and background processing.

// Lambda handler for processing S3 uploads
exports.handler = async (event) => {
  const s3Event = event.Records[0].s3;
  const bucket = s3Event.bucket.name;
  const key = decodeURIComponent(s3Event.object.key.replace(/\+/g, " "));

  console.log(`Processing file: ${bucket}/${key}`);

  // Process the file...
  const result = await processUpload(bucket, key);

  return {
    statusCode: 200,
    body: JSON.stringify({ result }),
  };
};
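A handler like this only fires once the bucket is actually wired to the function. A hedged CLI sketch of that wiring—the ARNs and names are placeholders:

```shell
# Allow S3 to invoke the function
aws lambda add-permission \
  --function-name my-processor \
  --statement-id s3-invoke \
  --action lambda:InvokeFunction \
  --principal s3.amazonaws.com \
  --source-arn arn:aws:s3:::my-app-artifacts

# Point bucket notifications at the function for new objects
aws s3api put-bucket-notification-configuration \
  --bucket my-app-artifacts \
  --notification-configuration '{
    "LambdaFunctionConfigurations": [{
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789:function:my-processor",
      "Events": ["s3:ObjectCreated:*"]
    }]
  }'
```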

By default, Lambda functions run in an AWS-managed VPC with access to AWS services and the internet. To reach resources inside your own VPC, such as RDS databases, configure the function with subnet and security group attachments.

# Create a Lambda function
aws lambda create-function \
  --function-name my-processor \
  --runtime nodejs20.x \
  --role arn:aws:iam::123456789:role/lambda-execution-role \
  --handler index.handler \
  --zip-file fileb://function.zip \
  --vpc-config SubnetIds=subnet-0123456789abcdef0,SecurityGroupIds=sg-0123456789abcdef0

For more on managing AWS costs, see our post on Cost Optimization which covers EC2, Lambda, and S3 cost optimization strategies.

For more on securing AWS workloads, see Cloud Security for IAM best practices, VPC design, and encryption patterns, and Network Security for security groups, NACLs, and VPC endpoint configuration.

Compute Service Trade-offs

| Scenario | EC2 | ECS/Fargate | EKS | Lambda |
| --- | --- | --- | --- | --- |
| Full OS control needed | Yes | No | No | No |
| Serverless containers | No | Fargate launch type | Fargate profiles | No |
| Kubernetes ecosystem | No | No | Yes | No |
| Per-second billing | Yes (Linux, per second) | Yes | Yes (worker nodes) | Yes (per 1 ms) |
| Cold start latency | None once running | Seconds to ~1 minute | Seconds | Milliseconds to seconds |
| Long-running workloads | Best choice | Good | Good | Poor (15 min max) |
| Stateful workloads | Best choice | Limited | Good | No |

Production Failure Scenarios

| Failure | Impact | Mitigation |
| --- | --- | --- |
| ASG fails to scale due to ELB health check misconfiguration | Traffic routed to unhealthy instances, requests fail | Use the ELB health check type, test scale-in manually |
| ECS task stuck in PENDING due to insufficient resources | Service capacity drops, requests queued or dropped | Alarm on pending task count, right-size cluster capacity |
| EKS node group upgrade fails midway | Pods evicted before new nodes are ready, service disruption | Tune maxUnavailable/surge settings, upgrade one node group at a time |
| S3 bucket policy denies access unexpectedly | Application cannot read/write artifacts, deployments fail | Use IAM Access Analyzer, test bucket policies in dev first |
| Lambda VPC config causes cold start timeouts | Requests time out during scale-up | Initialize connections outside the handler, use provisioned concurrency |

AWS Observability Hooks

EC2 and ASG monitoring:

# Get EC2 instance metrics
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0abcdef1234567890 \
  --start-time 2026-03-24T00:00:00 \
  --end-time 2026-03-25T00:00:00 \
  --period 3600 \
  --statistics Average

# Check ASG health status
aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names my-asg \
  --query 'AutoScalingGroups[0].Instances[*].[InstanceId,HealthStatus,LifecycleState]'

ECS monitoring:

# Check service health and running task count
aws ecs describe-services \
  --cluster production \
  --services webapp \
  --query 'services[0].{runningCount:runningCount,desiredCount:desiredCount,pendingCount:pendingCount}'

EKS monitoring:

# Check node health and pod distribution
kubectl get nodes -o wide
kubectl get pods -o wide --all-namespaces | grep -v Running

# Get cluster control plane health
aws eks describe-cluster \
  --name production \
  --query 'cluster.{status:status,version:version,endpoint:endpoint}'

Key CloudWatch metrics to alert on:

| Service | Metric | Alert Threshold |
| --- | --- | --- |
| EC2 | CPUUtilization | > 80% for 5 minutes |
| EC2 | StatusCheckFailed | Any for 2 minutes |
| ASG | CPUUtilization | > 75% for 3 minutes |
| ECS | CPUUtilization | > 85% for 3 minutes |
| ECS | RunningTaskCount | < desired for 2 minutes |
| Lambda | Errors | > 0 for 5 minutes |
| Lambda | Duration | > 3000 ms at p99 |
| S3 | BucketSizeBytes | Unexpected change |

Common Anti-Patterns

Using the default VPC. Every AWS account gets a default VPC in each region with permissive defaults—public subnets and an open egress security group. Production workloads should use dedicated VPCs with explicit networking controls.

Attaching IAM policies directly to users instead of roles. Direct user policies create credential management nightmares and are harder to audit. Always use IAM roles for EC2, Lambda, and other compute services.

Not configuring ASG health checks properly. If your health check is too lenient, unhealthy instances stay in service. If it is too strict, instances get replaced during legitimate load spikes. Match health check type to your application needs.

Storing secrets in S3 object metadata or user data. S3 object metadata is not encrypted by default and appears in CloudTrail logs. Use AWS Secrets Manager or Systems Manager Parameter Store instead.
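Parameter Store makes the replacement straightforward. A sketch with an illustrative parameter path:

```shell
# Store a secret encrypted as a SecureString
aws ssm put-parameter \
  --name /prod/webapp/db-password \
  --type SecureString \
  --value "example-password"

# Read it back, decrypted, at deploy or boot time
aws ssm get-parameter \
  --name /prod/webapp/db-password \
  --with-decryption \
  --query 'Parameter.Value' \
  --output text
```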

Using Lambda VPC config without considering cold starts. VPC-enabled Lambda functions historically paid a 10-30 second ENI attachment penalty on cold start; since AWS moved to shared Hyperplane ENIs in 2019 that penalty is far smaller, but connection setup to VPC resources still adds latency. For latency-sensitive APIs, initialize connections outside the handler and consider provisioned concurrency.
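A minimal sketch of that connection-reuse pattern—`openConnection` here is a stand-in for an expensive client or pool constructor, not a real AWS SDK call:

```javascript
// Module scope runs once per execution environment (i.e., per cold start);
// warm invocations reuse whatever was built here.
let setupRuns = 0;
function openConnection() {
  setupRuns += 1; // counts how many times the expensive setup actually ran
  return { connected: true };
}

const connection = openConnection(); // paid once, on cold start

const handler = async () => {
  // every invocation reuses `connection` instead of reconnecting
  return { setupRuns, connected: connection.connected };
};

module.exports = { handler };
```

However many times the handler is invoked in the same container, `setupRuns` stays at 1.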

Capacity Estimation and Benchmark Data

Use these numbers for initial capacity planning. Actual performance varies by workload characteristics.

EC2 Instance Type Families

| Family | Best For | Instance Types | Network Performance |
| --- | --- | --- | --- |
| t3 | Burstable workloads, dev/test | t3.micro → t3.2xlarge | Up to 5 Gbps |
| m5 | General purpose, web servers | m5.large → m5.24xlarge | Up to 25 Gbps |
| m6i | General purpose (latest gen) | m6i.large → m6i.32xlarge | Up to 50 Gbps |
| c5 | Compute optimized, batch processing | c5.large → c5.24xlarge | Up to 25 Gbps (c5n: 100 Gbps) |
| c6i | Compute optimized (latest gen) | c6i.large → c6i.32xlarge | Up to 50 Gbps |
| r5 | Memory optimized, databases | r5.large → r5.24xlarge | Up to 25 Gbps |
| r6i | Memory optimized (latest gen) | r6i.large → r6i.32xlarge | Up to 50 Gbps |

Lambda Performance Parameters

| Parameter | Value | Notes |
| --- | --- | --- |
| Cold start (no VPC) | 100-500 ms | Depends on runtime and package size |
| Cold start (with VPC) | Comparable to non-VPC since 2019 | Shared Hyperplane ENIs removed the old per-invocation ENI-attach penalty |
| Provisioned concurrency | ~50 ms | Eliminates cold starts for warmed functions |
| Max execution duration | 900 seconds (15 min) | Configure timeout based on workload |
| Default memory | 128 MB | Increasing memory raises CPU proportionally |
| Max concurrent executions | 1,000 per region (default) | Request a limit increase if needed |

S3 Performance Targets

| Metric | Value |
| --- | --- |
| PUT/LIST/DELETE latency | 100-200 ms (p99) |
| GET latency | 60-100 ms (p99) |
| Max request rate per prefix | 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD per second |
| Multi-object delete | Up to 1,000 keys per request |
| Transfer acceleration | Can significantly improve upload speed for distant clients |

Service Limits for Planning

| Service | Default Limit | Typical Increase Path |
| --- | --- | --- |
| Lambda concurrent executions | 1,000 per region | AWS Support |
| API Gateway requests per second | 10,000 per region | AWS Support |
| EBS volumes per account | 5,000 per region | AWS Support |
| VPCs per region | 5 | AWS Support |
| ENIs per instance | 3-15 (varies by type) | Choose a larger instance type |

Quick Recap

Key Takeaways

  • EC2 gives full control but maximum operational burden; use for legacy workloads and specific hardware needs
  • ECS with Fargate removes EC2 management; best for teams wanting containers without Kubernetes complexity
  • EKS provides Kubernetes portability; best for multi-cloud strategies and teams with Kubernetes expertise
  • Lambda is ideal for event-driven, short-running workloads; not suitable for long processes or stateful operations
  • Multi-account AWS organizations enforce guardrails via SCPs and simplify billing tracking

AWS Onboarding Checklist

# 1. Create organization (the ALL feature set enables SCPs)
aws organizations create-organization --feature-set ALL
aws organizations enable-policy-type --root-id r-xxxx --policy-type SERVICE_CONTROL_POLICY

# 2. Set up VPC for production
aws ec2 create-vpc --cidr-block 10.0.0.0/16
aws ec2 create-subnet --vpc-id vpc-xxx --cidr-block 10.0.1.0/24

# 3. Create ECS cluster with Fargate
aws ecs create-cluster --cluster-name production --capacity-providers FARGATE

# 4. Set up CloudWatch alarms for critical metrics
aws cloudwatch put-metric-alarm \
  --alarm-name EC2-High-CPU \
  --metric-name CPUUtilization \
  --namespace AWS/EC2 \
  --statistic Average \
  --comparison-operator GreaterThanThreshold \
  --threshold 80 \
  --period 300 \
  --evaluation-periods 1

# 5. Enable S3 versioning on artifact bucket
aws s3api put-bucket-versioning \
  --bucket my-artifacts \
  --versioning-configuration Status=Enabled

Conclusion

AWS offers a comprehensive set of services that cover virtually any infrastructure need. EC2 provides raw compute with full control. ECS and EKS serve container workloads at different abstraction levels. S3 handles artifact and data storage. Lambda runs event-driven code without servers.

Understanding these core services gives you the foundation to build anything on AWS. Start with the service that matches your workload type, then expand to other services as your needs evolve.


Related Posts

AWS Data Services: Kinesis, Glue, Redshift, and S3

Guide to AWS data services for building data pipelines. Compare Kinesis vs Kafka, use Glue for ETL, query with Athena, and design S3 data lakes.

#data-engineering #aws #kinesis

Data Migration: Strategies and Patterns for Moving Data

Learn proven strategies for migrating data between systems with minimal downtime. Covers bulk migration, CDC patterns, validation, and rollback.

#data-engineering #data-migration #cdc

Serverless Data Processing: Building Elastic Pipelines

Build scalable data pipelines using serverless services. Learn how AWS Lambda, Azure Functions, and Cloud Functions integrate for cost-effective processing.

#data-engineering #serverless #lambda