Terraform: Declarative Infrastructure Provisioning
Learn Terraform from the ground up—state management, providers, modules, and production-ready patterns for managing cloud infrastructure as code.
Terraform is the de facto standard for declarative infrastructure provisioning. It lets you define cloud resources in configuration files that you can version control, reuse, and share. Whether you are spinning up a single EC2 instance or orchestrating a multi-cloud Kubernetes cluster, Terraform handles the lifecycle of your infrastructure with a consistent workflow.
The core idea is simple: you declare what you want, and Terraform figures out how to make it happen. No manual console clicks, no forgotten steps when recreating environments. Your infrastructure becomes code reviewable, testable, and repeatable.
Terraform Basics and HCL Syntax
HashiCorp Configuration Language (HCL) is Terraform’s domain-specific language. It is designed to be human-readable while being machine-parseable. The syntax uses blocks, attributes, and expressions.
```hcl
# Define the required providers
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

# Configure the AWS provider
provider "aws" {
  region = "us-west-2"
}

# Create an S3 bucket
resource "aws_s3_bucket" "app_bucket" {
  bucket = "my-unique-app-bucket-${var.environment}"

  tags = {
    Name        = "App bucket"
    Environment = var.environment
  }
}

# Define a variable
variable "environment" {
  description = "Deployment environment"
  type        = string
  default     = "dev"
}
```
A block's type declares the kind of infrastructure object you are describing. Attributes assign values to specific properties. Variables and outputs let you parameterize configurations and expose their results. The terraform block configures the Terraform runtime itself, including provider version constraints.
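Outputs complete the picture. A minimal sketch, assuming the aws_s3_bucket.app_bucket resource defined above:

```hcl
# Expose values for other configurations or for the CLI
output "bucket_name" {
  description = "Name of the application bucket"
  value       = aws_s3_bucket.app_bucket.bucket
}

output "bucket_arn" {
  description = "ARN for IAM policy references"
  value       = aws_s3_bucket.app_bucket.arn
}
```

After an apply, running terraform output bucket_name prints the value.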
You run terraform init to initialize the working directory, downloading the required providers. Then terraform plan shows you what Terraform intends to do, and terraform apply executes those changes. terraform destroy tears everything down when you no longer need it.
State Management and Backends
Terraform uses state to track the real-world resources it manages. The state file is a JSON snapshot of all managed resources and their current attributes. Every time you run terraform plan or apply, Terraform compares your configuration against the state and determines what changes are necessary.
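To make that concrete, here is a simplified, illustrative excerpt of a state entry (real state files contain more fields, such as serial numbers, lineage, and provider references):

```json
{
  "version": 4,
  "resources": [
    {
      "type": "aws_s3_bucket",
      "name": "app_bucket",
      "instances": [
        {
          "attributes": {
            "bucket": "my-unique-app-bucket-dev",
            "arn": "arn:aws:s3:::my-unique-app-bucket-dev"
          }
        }
      ]
    }
  ]
}
```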
By default, Terraform stores state locally in a file named terraform.tfstate. This works for solo development, but it breaks down in team environments. Local state files get overwritten, cause merge conflicts, and leak secrets if committed to version control.
Remote backends solve these problems. They store state in a shared location—typically an S3 bucket, Google Cloud Storage bucket, or HashiCorp Cloud—that supports state locking to prevent concurrent modifications.
```hcl
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "prod/us-east-1/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}
```
The dynamodb_table setting enables state locking, which prevents two team members from running terraform apply simultaneously. If one person holds the lock, the other gets an error message rather than a corrupted state file.
State file security matters. State can contain sensitive values like database passwords or API keys in plain text. Marking an output with sensitive = true only redacts it from CLI output; the value is still stored in state. Always enable encryption at rest, restrict access with IAM policies, and never commit state files to version control.
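A minimal sketch of how the sensitive flag behaves in practice:

```hcl
variable "db_password" {
  type      = string
  sensitive = true # redacted in plan and apply output
}

output "db_password" {
  value     = var.db_password
  sensitive = true # hidden from CLI output, but still in state as plain text
}
```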
Providers and Resource Types
Providers are plugins that Terraform uses to interact with cloud platforms, SaaS services, and other APIs. HashiCorp maintains official providers for AWS, Azure, GCP, and Kubernetes. The community contributes hundreds more for services like Datadog, GitHub, Stripe, and Cloudflare.
Each provider exposes resource types that map to infrastructure objects. The AWS provider includes resources like aws_instance, aws_vpc, aws_rds_instance, and aws_iam_role. You can mix resources from multiple providers in the same configuration to define your entire stack.
```hcl
# Create a VPC
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "Main VPC"
  }
}

# Create a subnet
resource "aws_subnet" "private" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.1.0/24"
  availability_zone       = "us-east-1a"
  map_public_ip_on_launch = false

  tags = {
    Name = "Private Subnet"
    Type = "Private"
  }
}

# Create an IAM role for EC2
resource "aws_iam_role" "ec2_role" {
  name = "ec2-app-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ec2.amazonaws.com"
      }
    }]
  })
}
```
Provider configuration happens in the provider block. You can define multiple configurations for the same provider using the alias argument, which is useful when managing resources across multiple AWS regions or accounts.
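Multiple configurations of one provider are distinguished with the alias argument; a sketch assuming a primary and a secondary region (the bucket name is illustrative):

```hcl
provider "aws" {
  region = "us-east-1" # default configuration
}

provider "aws" {
  alias  = "west"
  region = "us-west-2" # secondary region
}

# Resources use the default provider unless told otherwise
resource "aws_s3_bucket" "replica" {
  provider = aws.west
  bucket   = "my-app-replica-bucket"
}
```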
Writing Reusable Modules
Modules are containers for related resources. They let you package infrastructure patterns and reuse them across projects. A well-designed module accepts input variables and returns output values, abstracting away the implementation details.
```hcl
# modules/networking/main.tf
variable "environment" {
  type = string
}

variable "cidr_block" {
  type = string
}

# Look up the AZs available in the current region
data "aws_availability_zones" "available" {
  state = "available"
}

resource "aws_vpc" "main" {
  cidr_block           = var.cidr_block
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Environment = var.environment
  }
}

resource "aws_subnet" "public" {
  count                   = 3
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.cidr_block, 8, count.index)
  availability_zone       = data.aws_availability_zones.available.names[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name = "${var.environment}-public-${count.index + 1}"
    Type = "Public"
  }
}

output "vpc_id" {
  value = aws_vpc.main.id
}

output "subnet_ids" {
  value = aws_subnet.public[*].id
}
```
Using a module looks like calling a function:
```hcl
module "networking" {
  source      = "./modules/networking"
  environment = "production"
  cidr_block  = "10.1.0.0/16"
}

# Use the module outputs
module "eks" {
  source     = "./modules/eks"
  vpc_id     = module.networking.vpc_id
  subnet_ids = module.networking.subnet_ids
  # ... other arguments
}
```
Modules promote consistency. Instead of copying and pasting resource definitions, teams maintain a library of modules that encode best practices. When someone fixes a security issue in a module, every project using that module gets the fix by updating the version.
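Version pinning is what makes that update deliberate. A sketch using a Git source (the repository URL and tag are hypothetical):

```hcl
module "networking" {
  # Pin to a tagged release; bumping ref picks up module fixes
  source = "git::https://github.com/example-org/terraform-modules.git//networking?ref=v1.4.0"

  environment = "production"
  cidr_block  = "10.1.0.0/16"
}
```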
Workspace Strategies
Terraform workspaces let you manage multiple environments from the same configuration. Each workspace maintains its own state file, so you can use one set of configuration files to provision dev, staging, and production environments.
```hcl
terraform {
  backend "s3" {
    bucket = "my-terraform-state"
    # Backend blocks cannot use interpolation. The S3 backend
    # automatically stores non-default workspace state under an
    # env:/<workspace>/ prefix, so one key covers every workspace.
    key            = "terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

resource "aws_instance" "server" {
  instance_type = terraform.workspace == "production" ? "t3.large" : "t3.micro"
  ami           = var.ami_id

  tags = {
    Name = "Server-${terraform.workspace}"
  }
}
```
The terraform.workspace expression returns the current workspace name. You can use it to vary resource configuration per environment, such as using larger instance types in production or enabling extra logging there.
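When per-environment differences grow beyond a single ternary, a map keyed by workspace name keeps the logic readable. A sketch:

```hcl
locals {
  instance_types = {
    default    = "t3.micro"
    staging    = "t3.small"
    production = "t3.large"
  }
}

resource "aws_instance" "server" {
  # Fall back to t3.micro for workspaces not listed in the map
  instance_type = lookup(local.instance_types, terraform.workspace, "t3.micro")
  ami           = var.ami_id

  tags = {
    Name = "Server-${terraform.workspace}"
  }
}
```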
Some teams prefer directory-based environments instead, keeping separate folders for each environment with a shared module library. This approach makes environment boundaries clearer and prevents accidental cross-environment changes. The tradeoff is maintaining consistency across duplicate directory structures.
Workspaces work well for environments that differ mostly in sizing and scaling, while directory-based approaches suit environments with structural differences.
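A directory-based layout might look like this (names illustrative):

```
infrastructure/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   └── backend.tf
│   └── prod/
│       ├── main.tf
│       └── backend.tf
└── modules/
    ├── networking/
    └── eks/
```

Each environment directory gets its own backend configuration and state, while both consume the shared modules.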
Tofu and OpenTofu Ecosystem
After HashiCorp moved Terraform to the Business Source License (BSL) in 2023, the open-source community rallied around OpenTofu, a Linux Foundation project that maintains a drop-in fork of Terraform. OpenTofu is backward-compatible with existing Terraform configurations and providers.
```bash
# Install OpenTofu
curl --proto '=https' --tlsv1.2 -fsSL https://get.opentofu.org/install.sh | sh

# Initialize with OpenTofu instead of Terraform
tofu init
tofu plan
tofu apply
```
OpenTofu is committed to remaining open source under Linux Foundation governance, removing the licensing uncertainty that HashiCorp's change introduced. The project has already added features such as state encryption and enhanced templating functions that were missing from the last open-source Terraform release.
If you are starting a new project, OpenTofu is worth considering for its community-driven development model. Existing Terraform users can migrate gradually since both tools share the same configuration syntax and state format. The OpenTofu website provides migration guides for users concerned about the licensing direction.
For teams already using Terraform with HashiCorp Cloud or Terraform Enterprise, the change may not affect your daily workflow. But monitoring the ecosystem for provider compatibility and feature parity between Terraform and OpenTofu makes sense as the fork matures.
For more on managing infrastructure costs, check out our post on Cost Optimization.
When to Use / When Not to Use
When Terraform makes sense
Terraform is the right tool when you need to manage infrastructure across multiple cloud providers from a single workflow. If you are running AWS, GCP, and Azure—or even just multiple AWS accounts—Terraform’s provider model gives you a consistent interface for all of it.
Use Terraform when your infrastructure team and application team are separate. Terraform modules let you hide implementation details behind interfaces. The app team gets a VPC module without needing to understand CIDR blocks or routing tables. That separation of concerns matters at scale.
Terraform is also the obvious choice when compliance requires documented, version-controlled infrastructure. If an auditor needs to see what existed in production on a given date, Git history plus Terraform state gives you that. Manual console clicks do not.
When to use something else
If your infrastructure is mostly ephemeral—serverless functions, containerized workloads managed by Kubernetes—Terraform adds overhead without much value. The cloud provider’s own tooling may be faster and more integrated.
If your team consists of software engineers who do not want to learn HCL, Pulumi or AWS CDK let them manage infrastructure in languages they already know.
For simple, stable infrastructure that rarely changes, the investment in Terraform (state management, module maintenance, workflow overhead) may not pay off. Sometimes a CloudFormation template or even manual provisioning is fine.
Terraform Architecture Flow
```mermaid
flowchart TD
    A[Config Files .tf] --> B[terraform init]
    B --> C[Provider Plugins Downloaded]
    C --> D[terraform plan]
    D --> E[Execution Plan]
    E --> F[terraform apply]
    F --> G[Real Infrastructure]
    F --> H[State File Updated]
    G --> D
    G --> I[terraform destroy]
    I --> J[Resources Torn Down]
```
Production Failure Scenarios
Common Terraform Failures
| Failure | Impact | Mitigation |
|---|---|---|
| State drift | Actual infrastructure diverges from Terraform’s view | Run terraform plan regularly, use import for drift correction |
| Lock timeout | Team member blocked from applying changes | Check for hung apply, increase lock timeout, break locks manually if needed |
| Plan/apply mismatch | State corrupted after manual changes outside Terraform | Enforce policy: all changes via Terraform, no console edits |
| Provider version conflict | Resources fail to create or update | Pin provider versions, test upgrades in staging first |
| Sensitive data in state | Passwords and keys stored in plain text | Use encrypted backends, suppress sensitive outputs, use vault provider |
| Circular dependencies | apply hangs indefinitely | Break cycles by extracting shared resources |
| Destroying shared resources | Production resource deleted accidentally | Use protection flags, review destroy plans carefully |
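The protection flags in the last row refer to Terraform's lifecycle block; a sketch:

```hcl
resource "aws_s3_bucket" "state" {
  bucket = "my-terraform-state"

  lifecycle {
    # Any plan that would destroy this resource fails with an error
    prevent_destroy = true
  }
}
```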
State Lock Timeout Recovery
```mermaid
flowchart TD
    A[terraform apply blocked] --> B{Lock expired?}
    B -->|No| C[Wait for lock holder]
    B -->|Yes| D[Check lock metadata]
    D --> E{Valid lock?}
    E -->|Yes| F[Force unlock after timeout]
    E -->|No| G[Remove stale lock manually]
    F --> H[Re-run apply]
    G --> H
```
Observability Hooks
Track Terraform operations to catch problems early and maintain audit trails.
What to monitor:
- Plan duration and size (large plans mean many pending changes)
- Apply success/failure rate per workspace
- State file size growth over time
- Resource drift detection frequency
- Lock contention events
```bash
# Enable verbose Terraform logging
export TF_LOG=TRACE
export TF_LOG_PATH=terraform.log

# Force-release a stuck state lock (use with care)
terraform force-unlock LOCK_ID

# Audit state changes
terraform state list | wc -l   # Count managed resources
terraform state pull > state-backup.tfstate

# Monitor drift
terraform plan -detailed-exitcode
# Exit code 2 means drift detected
```
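In CI, the exit code from -detailed-exitcode can drive alerting. A small helper sketch in plain shell (the function name is our own, not a Terraform feature):

```shell
#!/bin/sh
# Map terraform plan -detailed-exitcode results to a status string:
# 0 = no changes, 2 = drift detected, anything else = error.
interpret_plan_exit() {
  case "$1" in
    0) echo "clean" ;;
    2) echo "drift" ;;
    *) echo "error" ;;
  esac
}

# Example usage; in CI you would pass $? from the plan step
interpret_plan_exit 2   # prints "drift"
```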
Common Pitfalls / Anti-Patterns
Storing state locally
Local state files work for learning, but in teams they cause merge conflicts, secret leaks, and no locking. Always use a remote backend for anything beyond solo experiments.
Ignoring destroy plans
Running terraform destroy without reviewing the plan first can take down production. A misplaced filter in your targeting can delete resources across all environments. Always read the destroy plan before confirming.
Using terraform apply without terraform plan
Skipping plan means you approve changes without knowing what they are. CI pipelines that auto-apply without plan review are a recipe for surprises.
Not using version pinning
Provider version drift causes “it worked yesterday” problems. Always pin versions in the required_providers block and test upgrades in staging before production.
Treating modules as macros
Modules that just wrap single resources without adding meaningful abstraction are cargo cult IaC. A module for an EC2 instance that does nothing but pass through arguments adds indirection without value.
Quick Recap
Key Takeaways
- Terraform uses HCL for declarative infrastructure definition with state management
- Remote backends with state locking are mandatory for team environments
- Modules encode best practices and enable separation of concerns between teams
- OpenTofu offers a community-driven open-source alternative to Terraform
- Plan always before apply—never skip the review step
Terraform Health Checklist
```bash
# Verify Terraform is initialized
terraform init

# Check for configuration drift
terraform plan -detailed-exitcode

# List all managed resources
terraform state list

# Validate configuration syntax
terraform validate

# Review destroy plan before execution
terraform plan -destroy

# Check which workspaces exist
terraform workspace list

# Back up the state file
terraform state pull > backup.tfstate
```
Trade-off Summary
| Aspect | Terraform | Pulumi | Cloud-native IaC (CDKTF) |
|---|---|---|---|
| Language | HCL (DSL) | TypeScript, Python, Go, C# | TypeScript, Python, Go, C# |
| State management | Self-managed or Cloud | Pulumi Cloud or self-hosted | Self-managed or Cloud |
| Provider ecosystem | Largest (thousands of providers) | Large (200+) | Growing |
| Learning curve | Moderate (HCL) | Steeper (programming) | Steep (CDK concepts) |
| Testing | Terratest (external) | Native unit tests | Native unit tests |
| Debugging | Limited tooling | Full IDE support | Full IDE support |
| Governance | HashiCorp/OpenTofu | Pulumi (open core) | HashiCorp (open core) |
Conclusion
Terraform excels at managing infrastructure across multiple cloud providers with a consistent workflow. The declarative approach means you define your desired state and let Terraform handle the rest. Start with local state for learning, migrate to remote backends for collaboration, build modules for reusability, and consider OpenTofu if open-source governance matters to your organization.
For securing your infrastructure, see Cloud Security for IAM policies, encryption, and access controls. For monitoring infrastructure changes and drift detection, see Observability Engineering.
The skills you develop writing Terraform configurations transfer directly to other IaC tools. The concepts of state management, provider abstraction, and idempotent resource provisioning appear across the entire infrastructure-as-code landscape.