AWS Core Services for DevOps: EC2, ECS, EKS, S3, Lambda
Navigate essential AWS services for DevOps workloads—compute (EC2, ECS, EKS), storage (S3), serverless (Lambda), and foundational networking.
AWS forms the backbone of many enterprise cloud strategies. Understanding its core services means understanding how compute, storage, networking, and serverless components fit together. This post covers the essential services for deploying and operating applications on AWS.
The services covered here appear in virtually every AWS architecture. Even if you plan to use higher-level services, knowing how the underlying components work helps you make better architectural decisions and debug problems when they arise.
When to Use
EC2 vs. ECS vs. EKS vs. Lambda
Choose EC2 when you need full control over the operating system, require specific hardware configurations, or run legacy applications that cannot be containerized. EC2 gives you the most flexibility at the cost of the most operational overhead.
Choose ECS when you want container orchestration without the complexity of Kubernetes. ECS integrates tightly with AWS services like ALB, CloudWatch, and IAM, making it a natural fit for teams already invested in AWS. Use Fargate launch type when you want serverless containers—AWS manages the EC2 fleet for you.
Choose EKS when your team knows Kubernetes and wants portability across cloud providers, or when you need Kubernetes-specific features like custom controllers, complex pod scheduling, or a broad ecosystem of third-party tools.
Choose Lambda when your workload is event-driven, short-running, or bursty. Lambda handles scaling automatically and charges only for execution time. If your function runs for hours continuously, EC2 or ECS is likely cheaper.
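The break-even point between Lambda and always-on compute can be estimated with simple arithmetic. The sketch below uses illustrative us-east-1 on-demand prices (assumptions; check current AWS pricing for your region and tier):

```python
# Illustrative prices (assumptions -- verify against current AWS pricing):
LAMBDA_GB_SECOND = 0.0000166667    # per GB-second of execution
LAMBDA_REQUEST = 0.20 / 1_000_000  # per invocation
EC2_T3_MICRO_HOURLY = 0.0104       # on-demand hourly rate

def lambda_monthly_cost(invocations, avg_ms, memory_mb):
    """Estimated monthly Lambda cost for a steady workload."""
    gb_seconds = invocations * (avg_ms / 1000) * (memory_mb / 1024)
    return gb_seconds * LAMBDA_GB_SECOND + invocations * LAMBDA_REQUEST

def ec2_monthly_cost(hourly=EC2_T3_MICRO_HOURLY, hours=730):
    """Always-on EC2 cost for one instance over a 730-hour month."""
    return hourly * hours

# A bursty workload: 1M requests/month, 200 ms each, 256 MB memory
burst = lambda_monthly_cost(1_000_000, 200, 256)
# A sustained workload: 100M requests/month at the same profile
sustained = lambda_monthly_cost(100_000_000, 200, 256)

print(f"Lambda (1M req):   ${burst:.2f}/month")
print(f"Lambda (100M req): ${sustained:.2f}/month")
print(f"EC2 t3.micro:      ${ec2_monthly_cost():.2f}/month")
```

At low volume Lambda costs roughly a dollar a month; at sustained high volume the same function costs far more than a small always-on instance, which is the crossover the paragraph above describes.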
S3 Storage Class Selection
Use S3 Standard for frequently accessed data: hot storage for active workloads. Use S3 Standard-IA (Infrequent Access) for data accessed less than about once a month that still needs millisecond retrieval; storage is cheaper, but you pay a per-GB retrieval fee. Use S3 Glacier for archival data you must retain but rarely access, with retrieval times of minutes to hours depending on the tier.
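The guidance above can be captured as a small decision helper. The access-frequency threshold is an illustrative assumption, not an AWS rule:

```python
def pick_storage_class(accesses_per_month: float, needs_ms_retrieval: bool) -> str:
    """Rough storage-class heuristic following the guidance above.

    The once-per-month threshold is an illustrative assumption.
    """
    if accesses_per_month >= 1:
        return "STANDARD"        # hot data, frequent access
    if needs_ms_retrieval:
        return "STANDARD_IA"     # infrequent access, still millisecond retrieval
    return "GLACIER"             # archival; retrieval takes minutes to hours

print(pick_storage_class(30, True))     # active workload
print(pick_storage_class(0.5, True))    # occasional access, needs fast reads
print(pick_storage_class(0.01, False))  # compliance archive
```

In practice, S3 Intelligent-Tiering can make this decision automatically per object when access patterns are unpredictable.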
Regions, Availability Zones, and Multi-Account Architecture
AWS resources are deployed to specific geographic regions, and regions are independent of each other. Each region has multiple availability zones (AZs)—physically separate data centers with independent power, networking, and cooling. Deploying across multiple AZs protects against single-datacenter failures.
# List available regions
aws ec2 describe-regions --output table
# Get current region
aws configure get region
Account structure shapes your AWS environment. Organizations use consolidated billing to manage multiple accounts under a single payer. Common patterns include separate accounts per environment (dev, staging, production), per team, or per application domain.
Organization
├── Management Account (billing, SCPs)
├── Security Account (GuardDuty, Security Hub)
├── Dev Account
├── Staging Account
└── Production Account
Service Control Policies (SCPs) at the organization level restrict what can be done in member accounts. This enforces guardrails without managing IAM in every account.
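A common SCP guardrail denies activity outside approved regions. The sketch below builds one as a Python dict; the allowed regions and exempted global services are assumptions to adapt for your organization:

```python
import json

# Sketch of a region-restriction SCP (illustrative -- adapt the region
# list and the global-service exemptions to your organization).
scp = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyOutsideAllowedRegions",
            "Effect": "Deny",
            # Global services must be exempted or they break everywhere
            "NotAction": ["iam:*", "organizations:*", "route53:*", "support:*"],
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {
                    "aws:RequestedRegion": ["us-east-1", "us-west-2"]
                }
            },
        }
    ],
}

print(json.dumps(scp, indent=2))
```

Attach the resulting policy to an OU or account with `aws organizations attach-policy`; member-account administrators cannot override it.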
flowchart TD
  A[AWS Organization] --> B[Management Account]
  A --> C[Security Account]
  A --> D[Dev Account]
  A --> E[Staging Account]
  A --> F[Production Account]
  C --> G[GuardDuty]
  C --> H[Security Hub]
  D --> I[Dev VPC]
  E --> J[Staging VPC]
  F --> K[Production VPC]
  K --> L[ALB]
  L --> M[EKS Cluster]
  M --> N[Application Pods]
  N --> O[S3 Artifacts]
EC2 Instance Types and ASGs
EC2 provides virtual machines in the cloud. Instance types determine the CPU, memory, storage, and networking capacity. The naming pattern is family, generation, and size—for example, t3.micro is a burstable general purpose instance, third generation, micro size.
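The naming pattern is regular enough to parse mechanically, which is handy for tooling and cost reports. A minimal sketch (the attribute letters, such as the `i` in m6i, denote processor or feature variants):

```python
import re

def parse_instance_type(name: str) -> dict:
    """Split an EC2 instance type into family, generation, attributes, size.

    e.g. 't3.micro' -> family 't', generation 3; 'm6i.large' carries the
    'i' attribute suffix after the generation digit.
    """
    m = re.fullmatch(r"([a-z]+)(\d+)([a-z-]*)\.([a-z0-9]+)", name)
    if not m:
        raise ValueError(f"unrecognized instance type: {name}")
    family, gen, attrs, size = m.groups()
    return {"family": family, "generation": int(gen),
            "attributes": attrs, "size": size}

print(parse_instance_type("t3.micro"))
print(parse_instance_type("m6i.large"))
```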
# Launch an EC2 instance
aws ec2 run-instances \
--image-id ami-0c55b159cbfafe1f0 \
--instance-type t3.micro \
--key-name my-key-pair \
--security-group-ids sg-0123456789abcdef0 \
--subnet-id subnet-0123456789abcdef0
# Describe instance status
aws ec2 describe-instance-status --instance-ids i-0abcdef1234567890
Auto Scaling Groups (ASGs) automatically adjust capacity based on demand. You define minimum, maximum, and desired capacity, along with scaling policies that trigger adjustments based on metrics like CPU utilization or request count.
# ASG CloudFormation snippet (launch configurations are deprecated;
# use a launch template)
AutoScalingGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    MinSize: 2
    MaxSize: 10
    DesiredCapacity: 2
    VPCZoneIdentifier:
      - !Ref PrivateSubnet1
      - !Ref PrivateSubnet2
    LaunchTemplate:
      LaunchTemplateId: !Ref LaunchTemplate
      Version: !GetAtt LaunchTemplate.LatestVersionNumber
    TargetGroupARNs:
      - !Ref TargetGroup
    HealthCheckType: ELB
    HealthCheckGracePeriod: 300
ASGs work with Elastic Load Balancers to distribute traffic across healthy instances. The load balancer performs health checks and removes unhealthy instances from the rotation automatically.
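A target-tracking scaling policy converges toward capacity proportional to the metric. The math can be sketched in a few lines; this ignores cooldowns and instance warm-up, which the real policy also applies:

```python
import math

def target_tracking_desired(current_capacity, current_metric, target,
                            min_size, max_size):
    """Approximate the capacity a target-tracking policy converges toward.

    Simplified sketch: real policies also honor cooldowns and warm-up.
    """
    raw = math.ceil(current_capacity * current_metric / target)
    # Clamp to the ASG's configured bounds
    return max(min_size, min(max_size, raw))

# 4 instances at 90% CPU with a 60% target -> scale out to 6
print(target_tracking_desired(4, 90, 60, min_size=2, max_size=10))
# 4 instances at 10% CPU -> scale in, but never below MinSize
print(target_tracking_desired(4, 10, 60, min_size=2, max_size=10))
```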
ECS Task Definitions and Services
Amazon Elastic Container Service (ECS) runs Docker containers on a cluster of EC2 instances or on AWS Fargate serverless compute. Task definitions describe which containers to run and what CPU and memory they need.
{
  "family": "webapp",
  "containerDefinitions": [
    {
      "name": "webapp",
      "image": "123456789.dkr.ecr.us-east-1.amazonaws.com/webapp:latest",
      "memory": 512,
      "cpu": 256,
      "essential": true,
      "portMappings": [
        {
          "containerPort": 8080,
          "protocol": "tcp"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/webapp",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ]
}
An ECS service maintains a desired count of task instances and automatically replaces failed tasks. It integrates with Application Load Balancers for traffic distribution and Auto Scaling for dynamic capacity adjustment.
# Register a new task definition revision
aws ecs register-task-definition --cli-input-json file://task-definition.json
# Update service to use new revision
aws ecs update-service \
--cluster production \
--service webapp \
--task-definition webapp:2
Fargate removes the need to manage EC2 instances for container workloads. You specify CPU and memory requirements, and AWS handles the underlying infrastructure. This simplifies operations at the cost of less granular control over the compute environment.
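Fargate only accepts certain CPU/memory pairings, so task definitions should be validated before deployment. The table below covers the commonly documented combinations; larger sizes exist, so treat this as a planning sketch rather than an exhaustive list:

```python
# Commonly documented Fargate CPU (units) to memory (MB) pairings.
# Larger sizes exist; this is a planning sketch, not an exhaustive list.
FARGATE_COMBOS = {
    256:  [512, 1024, 2048],
    512:  list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
}

def valid_fargate_combo(cpu: int, memory_mb: int) -> bool:
    """Check whether a task's CPU/memory request is a valid Fargate size."""
    return memory_mb in FARGATE_COMBOS.get(cpu, [])

print(valid_fargate_combo(256, 512))    # minimal task
print(valid_fargate_combo(256, 4096))   # too much memory for 0.25 vCPU
```

Catching an invalid pairing in CI is cheaper than watching a service fail to place tasks at deploy time.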
EKS Cluster Management Basics
Amazon Elastic Kubernetes Service (EKS) provides a managed Kubernetes control plane. AWS runs the control plane nodes; you manage the worker nodes and workloads.
# Create an EKS cluster
aws eks create-cluster \
--name production \
--role-arn arn:aws:iam::123456789:role/eks-cluster-role \
--resources-vpc-config subnetIds=subnet-0123456789abcdef0,subnet-0123456789abcdef1,securityGroupIds=sg-0123456789abcdef0
# Update kubeconfig
aws eks update-kubeconfig --name production
# Verify cluster access
kubectl get svc
EKS manages the Kubernetes control plane across multiple AZs for high availability. Worker nodes join the cluster via a node group, which can be managed by AWS (EKS Managed Node Groups) or self-managed.
# Node group configuration
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: production
  region: us-east-1
managedNodeGroups:
  - name: compute
    instanceType: t3.medium
    desiredCapacity: 3
    minSize: 2
    maxSize: 10
    volumeSize: 50
    ssh:
      allow: true
Kubernetes deployments, services, and ingresses work the same on EKS as on any Kubernetes cluster. The main difference is how you configure IAM roles for service accounts (IRSA) for workload authentication to AWS services.
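An IRSA role's trust policy binds an IAM role to one Kubernetes service account via the cluster's OIDC provider. A sketch of the policy shape (the account ID, provider URL, and service account names are placeholders; the real provider URL comes from `aws eks describe-cluster`):

```python
import json

def irsa_trust_policy(account_id, oidc_provider, namespace, service_account):
    """Build the IAM trust policy an IRSA role needs.

    Sketch with placeholder values; fetch the real oidc_provider from
    `aws eks describe-cluster --query cluster.identity.oidc.issuer`.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    # Restricts the role to exactly one service account
                    f"{oidc_provider}:sub":
                        f"system:serviceaccount:{namespace}:{service_account}"
                }
            },
        }],
    }

policy = irsa_trust_policy(
    "123456789", "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE",
    "default", "webapp")
print(json.dumps(policy, indent=2))
```

Annotating the service account with the role ARN (`eks.amazonaws.com/role-arn`) completes the binding; pods using that service account then receive temporary credentials for the role.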
S3 for Artifact Storage
Amazon S3 stores objects in buckets. For DevOps, S3 typically holds build artifacts, deployment packages, and infrastructure state. S3 integrates with everything on AWS through IAM policies and resource-based bucket policies.
# Create a bucket for artifacts
aws s3 mb s3://my-app-artifacts --region us-east-1
# Upload a build artifact
aws s3 cp ./dist/app.tar.gz s3://my-app-artifacts/prod/
# List bucket contents
aws s3 ls s3://my-app-artifacts/prod/
# Enable versioning for artifact history
aws s3api put-bucket-versioning \
--bucket my-app-artifacts \
--versioning-configuration Status=Enabled
Lifecycle policies automate archival and deletion. Move old artifacts to cheaper storage classes automatically, or delete artifacts older than a retention period.
{
  "Rules": [
    {
      "ID": "ArchiveOldArtifacts",
      "Status": "Enabled",
      "Filter": {
        "Prefix": "prod/"
      },
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "GLACIER"
        }
      ],
      "Expiration": {
        "Days": 365
      }
    }
  ]
}
Lambda for Serverless Workloads
AWS Lambda runs code in response to events without provisioning servers. You pay only for the compute time consumed—billed in milliseconds. Lambda is ideal for event-driven tasks, API backends, and background processing.
// Lambda handler for processing S3 uploads
exports.handler = async (event) => {
  const s3Event = event.Records[0].s3;
  const bucket = s3Event.bucket.name;
  const key = decodeURIComponent(s3Event.object.key.replace(/\+/g, " "));
  console.log(`Processing file: ${bucket}/${key}`);
  // Process the file...
  const result = await processUpload(bucket, key);
  return {
    statusCode: 200,
    body: JSON.stringify({ result }),
  };
};
By default, Lambda functions run in an AWS-managed VPC with access to AWS services and the internet, but not to resources inside your own VPC. To reach VPC resources like RDS databases, configure the function with subnet and security group attachments.
# Create a Lambda function
aws lambda create-function \
--function-name my-processor \
--runtime nodejs20.x \
--role arn:aws:iam::123456789:role/lambda-execution-role \
--handler index.handler \
--zip-file fileb://function.zip \
--vpc-config SubnetIds=subnet-0123456789abcdef0,SecurityGroupIds=sg-0123456789abcdef0
For more on managing AWS costs, see our post on Cost Optimization, which covers cost strategies for EC2, Lambda, and S3.
For more on securing AWS workloads, see Cloud Security for IAM best practices, VPC design, and encryption patterns, and Network Security for security groups, NACLs, and VPC endpoint configuration.
Compute Service Trade-offs
| Scenario | EC2 | ECS/Fargate | EKS | Lambda |
|---|---|---|---|---|
| Full OS control needed | Yes | No | No | No |
| Serverless containers | No | Fargate launch type | Fargate profiles | No |
| Kubernetes ecosystem | No | No | Yes | No |
| Billing granularity | Per-second (Linux) | Per-second | Per-second (worker nodes) | Per-millisecond |
| Cold start latency | Minutes (instance boot) | Seconds | Seconds | Milliseconds to seconds |
| Long-running workloads | Best choice | Good | Good | Poor (15 min max) |
| Stateful workloads | Best choice | Limited | Good | No |
Production Failure Scenarios
| Failure | Impact | Mitigation |
|---|---|---|
| ASG fails to scale due to ELB health check misconfiguration | Traffic routed to unhealthy instances, requests fail | Use ELB health check type, test scale-in manually |
| ECS task stuck in PENDING due to insufficient resources | Service capacity drops, requests queued or dropped | Monitor pending task count, add cluster capacity or use Fargate |
| EKS node group upgrade fails midway | Pods evicted before new nodes ready, service disruption | Configure surge/max-unavailable settings, upgrade one node group at a time |
| S3 bucket policy denies access unexpectedly | Application cannot read/write artifacts, deployments fail | Use IAM access analyzer, test bucket policies in dev first |
| Lambda VPC config causes cold start timeouts | Requests time out during scale-up | Pre-provision connections outside VPC handler, use provisioned concurrency |
AWS Observability Hooks
EC2 and ASG monitoring:
# Get EC2 instance metrics
aws cloudwatch get-metric-statistics \
--namespace AWS/EC2 \
--metric-name CPUUtilization \
--dimensions Name=InstanceId,Value=i-0abcdef1234567890 \
--start-time 2026-03-24T00:00:00 \
--end-time 2026-03-25T00:00:00 \
--period 3600 \
--statistics Average
# Check ASG health status
aws autoscaling describe-auto-scaling-groups \
--auto-scaling-group-names my-asg \
--query 'AutoScalingGroups[0].Instances[*].[InstanceId,HealthStatus,LifecycleState]'
ECS monitoring:
# Check service health and running task count
aws ecs describe-services \
--cluster production \
--services webapp \
--query 'services[0].{runningCount:runningCount,desiredCount:desiredCount,pendingCount:pendingCount}'
EKS monitoring:
# Check node health and pod distribution
kubectl get nodes -o wide
kubectl get pods -o wide --all-namespaces | grep -v Running
# Get cluster control plane health
aws eks describe-cluster \
--name production \
--query 'cluster.{status:status,version:version,endpoint:endpoint}'
Key CloudWatch metrics to alert on:
| Service | Metric | Alert Threshold |
|---|---|---|
| EC2 | CPUUtilization | > 80% for 5 minutes |
| EC2 | StatusCheckFailed | any for 2 minutes |
| ASG | CPUUtilization | > 75% for 3 minutes |
| ECS | CPUUtilization | > 85% for 3 minutes |
| ECS | RunningTaskCount | < desired for 2 minutes |
| Lambda | Errors | > 0 for 5 minutes |
| Lambda | Duration | > 3000ms p99 |
| S3 | BucketSizeBytes | unexpected change |
Common Anti-Patterns
Using the default VPC. The default VPC exists in every region of every account and ships with permissive defaults: public subnets, auto-assigned public IPs, and a wide-open default security group. Production workloads should use dedicated VPCs with explicit networking controls.
Attaching IAM policies directly to users instead of roles. Direct user policies create credential management nightmares and are harder to audit. Always use IAM roles for EC2, Lambda, and other compute services.
Not configuring ASG health checks properly. If your health check is too lenient, unhealthy instances stay in service. If it is too strict, instances get replaced during legitimate load spikes. Match health check type to your application needs.
Storing secrets in S3 object metadata or user data. S3 object metadata is not encrypted by default and appears in CloudTrail logs. Use AWS Secrets Manager or Systems Manager Parameter Store instead.
Using Lambda VPC config without considering cold starts. VPC-enabled Lambda functions must attach an ENI before their first execution, which can add seconds to cold starts (see the benchmark table below). For latency-sensitive APIs, initialize connections outside the handler and consider provisioned concurrency.
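The connection-reuse pattern is simple: do expensive setup at module scope so it runs once per execution environment rather than once per invocation. A sketch (the client here is a hypothetical stand-in for a boto3 client or database pool):

```python
# Sketch of the connection-reuse pattern: expensive setup happens once
# per execution environment (at import time), not on every invocation.
INIT_COUNT = 0

def _create_client():
    """Stand-in for expensive setup (e.g. a boto3 client or a DB
    connection pool) -- hypothetical, for illustration only."""
    global INIT_COUNT
    INIT_COUNT += 1
    return {"connected": True}

CLIENT = _create_client()  # module scope: reused across warm invocations

def handler(event, context):
    # CLIENT is already initialized; no per-invocation setup cost
    return {"statusCode": 200, "reused": CLIENT["connected"]}

# Simulate three warm invocations in one execution environment:
for _ in range(3):
    handler({}, None)
print(INIT_COUNT)  # setup ran exactly once
```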
Capacity Estimation and Benchmark Data
Use these numbers for initial capacity planning. Actual performance varies by workload characteristics.
EC2 Instance Type Families
| Family | Best For | Instance Types | Network Performance |
|---|---|---|---|
| t3 | Burstable workloads, dev/test | t3.micro → t3.2xlarge | Up to 5 Gbps |
| m5 | General purpose, web servers | m5.large → m5.24xlarge | Up to 100 Gbps (xlarge+) |
| m6i | General purpose (latest gen) | m6i.large → m6i.32xlarge | Up to 100 Gbps |
| c5 | Compute optimized, batch processing | c5.large → c5.24xlarge | Up to 100 Gbps |
| c6i | Compute optimized (latest gen) | c6i.large → c6i.32xlarge | Up to 100 Gbps |
| r5 | Memory optimized, databases | r5.large → r5.24xlarge | Up to 100 Gbps |
| r6i | Memory optimized (latest gen) | r6i.large → r6i.32xlarge | Up to 100 Gbps |
Lambda Performance Parameters
| Parameter | Value | Notes |
|---|---|---|
| Cold start (no VPC) | 100-500ms | Depends on runtime and package size |
| Cold start (with VPC) | 1-10 seconds | ENI attachment is the bottleneck |
| Provisioned concurrency | ~50ms | Eliminates cold starts for warmed functions |
| Max execution duration | 900 seconds (15 min) | Configure timeout based on workload |
| Default memory | 128 MB | Increase memory to boost CPU proportionally |
| Max concurrent executions | 1,000 per region | Request limit increase if needed |
S3 Performance Targets
| Metric | Value |
|---|---|
| PUT/LIST/DELETE latency | 100-200ms (p99) |
| GET latency | 60-100ms (p99) |
| Max request rate per prefix | 3,500 PUT/COPY/POST/DELETE, 5,500 GET/HEAD per second |
| Multi-object delete | Up to 1,000 keys per request |
| Transfer acceleration | Can substantially improve upload throughput for geographically distant clients |
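Because S3 scales per prefix, sustained throughput above the per-prefix limits requires spreading keys across multiple prefixes. A quick planning calculation:

```python
import math

# Per-prefix request rates from the table above
WRITE_RPS_PER_PREFIX = 3_500   # PUT/COPY/POST/DELETE
READ_RPS_PER_PREFIX = 5_500    # GET/HEAD

def prefixes_needed(write_rps: int, read_rps: int) -> int:
    """Minimum number of key prefixes to sustain the target request rates."""
    return max(
        math.ceil(write_rps / WRITE_RPS_PER_PREFIX),
        math.ceil(read_rps / READ_RPS_PER_PREFIX),
    )

# Target: 20k writes/s and 40k reads/s
print(prefixes_needed(20_000, 40_000))
```

In practice this means sharding keys by a hash or date component (e.g. `prod/0/…` through `prod/7/…`) rather than writing everything under a single prefix.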
Service Limits for Planning
| Service | Default Limit | Typical Increase Request |
|---|---|---|
| Lambda concurrent executions | 1,000 per region | AWS Support |
| API Gateway requests per second | 10,000 per region | AWS Support |
| EBS volumes per account | 5,000 | AWS Support |
| VPCs per region | 5 | AWS Support |
| ENIs per instance (varies by type) | 3-15 | Instance type dependent |
Quick Recap
Key Takeaways
- EC2 gives full control but maximum operational burden; use for legacy workloads and specific hardware needs
- ECS with Fargate removes EC2 management; best for teams wanting containers without Kubernetes complexity
- EKS provides Kubernetes portability; best for multi-cloud strategies and teams with Kubernetes expertise
- Lambda is ideal for event-driven, short-running workloads; not suitable for long processes or stateful operations
- Multi-account AWS organizations enforce guardrails via SCPs and simplify billing tracking
AWS Onboarding Checklist
# 1. Create organization and enable SCPs
aws organizations create-organization
aws organizations enable-policy-type --root-id r-xxxx --policy-type SERVICE_CONTROL_POLICY
# 2. Set up VPC for production
aws ec2 create-vpc --cidr-block 10.0.0.0/16
aws ec2 create-subnet --vpc-id vpc-xxx --cidr-block 10.0.1.0/24
# 3. Create ECS cluster with Fargate
aws ecs create-cluster --cluster-name production --capacity-providers FARGATE
# 4. Set up CloudWatch alarms for critical metrics
aws cloudwatch put-metric-alarm \
--alarm-name EC2-High-CPU \
--metric-name CPUUtilization \
--namespace AWS/EC2 \
--statistic Average \
--comparison-operator GreaterThanThreshold \
--threshold 80 \
--period 300 \
--evaluation-periods 1
# 5. Enable S3 versioning on artifact bucket
aws s3api put-bucket-versioning \
--bucket my-artifacts \
--versioning-configuration Status=Enabled
Conclusion
AWS offers a comprehensive set of services that cover virtually any infrastructure need. EC2 provides raw compute with full control. ECS and EKS serve container workloads at different abstraction levels. S3 handles artifact and data storage. Lambda runs event-driven code without servers.
Understanding these core services gives you the foundation to build anything on AWS. Start with the service that matches your workload type, then expand to other services as your needs evolve.
Related Posts
AWS Data Services: Kinesis, Glue, Redshift, and S3
Guide to AWS data services for building data pipelines. Compare Kinesis vs Kafka, use Glue for ETL, query with Athena, and design S3 data lakes.
Data Migration: Strategies and Patterns for Moving Data
Learn proven strategies for migrating data between systems with minimal downtime. Covers bulk migration, CDC patterns, validation, and rollback.
Serverless Data Processing: Building Elastic Pipelines
Build scalable data pipelines using serverless services. Learn how AWS Lambda, Azure Functions, and Cloud Functions integrate for cost-effective processing.