Terraform: Declarative Infrastructure Provisioning
Learn Terraform from the ground up—state management, providers, modules, and production-ready patterns for managing cloud infrastructure as code.
Terraform is the de facto standard for declarative infrastructure provisioning. It lets you define cloud resources in configuration files that you can version control, reuse, and share. Whether you are spinning up a single EC2 instance or orchestrating a multi-cloud Kubernetes cluster, Terraform handles the lifecycle of your infrastructure with a consistent workflow.
The core idea is simple: you declare what you want, and Terraform figures out how to make it happen. No manual console clicks, no forgotten steps when recreating environments. Your infrastructure becomes code reviewable, testable, and repeatable.
Terraform Basics and HCL Syntax
HashiCorp Configuration Language (HCL) is Terraform’s domain-specific language. It is designed to be human-readable while being machine-parseable. The syntax uses blocks, attributes, and expressions.
```hcl
# Define the required providers
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

# Configure the AWS provider
provider "aws" {
  region = "us-west-2"
}

# Create an S3 bucket
resource "aws_s3_bucket" "app_bucket" {
  bucket = "my-unique-app-bucket-${var.environment}"

  tags = {
    Name        = "App bucket"
    Environment = var.environment
  }
}

# Define a variable
variable "environment" {
  description = "Deployment environment"
  type        = string
  default     = "dev"
}
```
A block's type declares the kind of infrastructure object you are describing. Attributes assign values to specific properties. Variables and outputs let you parameterize configurations and expose their results. The terraform block configures the Terraform runtime itself, including provider version constraints.
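Outputs complete the picture. A minimal sketch, assuming the aws_s3_bucket.app_bucket resource defined above:

```hcl
# Expose values for other configurations or for the CLI
output "bucket_name" {
  description = "Name of the application bucket"
  value       = aws_s3_bucket.app_bucket.bucket
}

output "bucket_arn" {
  description = "ARN for IAM policy references"
  value       = aws_s3_bucket.app_bucket.arn
}
```

After an apply, running terraform output bucket_name prints the value.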
You run terraform init to initialize the working directory, downloading the required providers. Then terraform plan shows you what Terraform intends to do, and terraform apply executes those changes. terraform destroy tears everything down when you no longer need it.
State Management and Backends
Terraform uses state to track the real-world resources it manages. The state file is a JSON snapshot of all managed resources and their current attributes. Every time you run terraform plan or apply, Terraform compares your configuration against the state and determines what changes are necessary.
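To make that concrete, here is a simplified, illustrative excerpt of a state entry (real state files contain more fields, such as serial numbers, lineage, and provider references):

```json
{
  "version": 4,
  "resources": [
    {
      "type": "aws_s3_bucket",
      "name": "app_bucket",
      "instances": [
        {
          "attributes": {
            "bucket": "my-unique-app-bucket-dev",
            "arn": "arn:aws:s3:::my-unique-app-bucket-dev"
          }
        }
      ]
    }
  ]
}
```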
By default, Terraform stores state locally in a file named terraform.tfstate. This works for solo development, but it breaks down in team environments. Local state files get overwritten, cause merge conflicts, and leak secrets if committed to version control.
Remote backends solve these problems. They store state in a shared location—typically an S3 bucket, Google Cloud Storage bucket, or HashiCorp Cloud—that supports state locking to prevent concurrent modifications.
```hcl
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "prod/us-east-1/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}
```
The dynamodb_table setting enables state locking, which prevents two team members from running terraform apply simultaneously. If one person holds the lock, the other gets an error message rather than a corrupted state file.
State file security matters. State can contain sensitive values like database passwords or API keys in plain text. Marking an output with sensitive = true only redacts it from CLI output; the value is still stored in state. Always enable encryption at rest, restrict access with IAM policies, and never commit state files to version control.
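A minimal sketch of how the sensitive flag behaves in practice:

```hcl
variable "db_password" {
  type      = string
  sensitive = true # redacted in plan and apply output
}

output "db_password" {
  value     = var.db_password
  sensitive = true # hidden from CLI output, but still in state as plain text
}
```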
Providers and Resource Types
Providers are plugins that Terraform uses to interact with cloud platforms, SaaS services, and other APIs. HashiCorp maintains official providers for AWS, Azure, GCP, and Kubernetes. The community contributes hundreds more for services like Datadog, GitHub, Stripe, and Cloudflare.
Each provider exposes resource types that map to infrastructure objects. The AWS provider includes resources like aws_instance, aws_vpc, aws_rds_instance, and aws_iam_role. You can mix resources from multiple providers in the same configuration to define your entire stack.
```hcl
# Create a VPC
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "Main VPC"
  }
}

# Create a subnet
resource "aws_subnet" "private" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.1.0/24"
  availability_zone       = "us-east-1a"
  map_public_ip_on_launch = false

  tags = {
    Name = "Private Subnet"
    Type = "Private"
  }
}

# Create an IAM role for EC2
resource "aws_iam_role" "ec2_role" {
  name = "ec2-app-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ec2.amazonaws.com"
      }
    }]
  })
}
```
Provider configuration happens in the provider block. You can define multiple configurations for the same provider using the alias argument, which is useful when managing resources across multiple AWS regions or accounts.
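Multiple configurations of one provider are distinguished with the alias argument; a sketch assuming a primary and a secondary region (the bucket name is illustrative):

```hcl
provider "aws" {
  region = "us-east-1" # default configuration
}

provider "aws" {
  alias  = "west"
  region = "us-west-2" # secondary region
}

# Resources use the default provider unless told otherwise
resource "aws_s3_bucket" "replica" {
  provider = aws.west
  bucket   = "my-app-replica-bucket"
}
```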
Writing Reusable Modules
Modules are containers for related resources. They let you package infrastructure patterns and reuse them across projects. A well-designed module accepts input variables and returns output values, abstracting away the implementation details.
```hcl
# modules/networking/main.tf
variable "environment" {
  type = string
}

variable "cidr_block" {
  type = string
}

# Look up the AZs available in the current region
data "aws_availability_zones" "available" {
  state = "available"
}

resource "aws_vpc" "main" {
  cidr_block           = var.cidr_block
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Environment = var.environment
  }
}

resource "aws_subnet" "public" {
  count                   = 3
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.cidr_block, 8, count.index)
  availability_zone       = data.aws_availability_zones.available.names[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name = "${var.environment}-public-${count.index + 1}"
    Type = "Public"
  }
}

output "vpc_id" {
  value = aws_vpc.main.id
}

output "subnet_ids" {
  value = aws_subnet.public[*].id
}
```
Using a module looks like calling a function:
```hcl
module "networking" {
  source      = "./modules/networking"
  environment = "production"
  cidr_block  = "10.1.0.0/16"
}

# Use the module outputs
module "eks" {
  source     = "./modules/eks"
  vpc_id     = module.networking.vpc_id
  subnet_ids = module.networking.subnet_ids
  # ... other arguments
}
```
Modules promote consistency. Instead of copying and pasting resource definitions, teams maintain a library of modules that encode best practices. When someone fixes a security issue in a module, every project using that module gets the fix by updating the version.
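Version pinning is what makes that update deliberate. A sketch using a Git source (the repository URL and tag are hypothetical):

```hcl
module "networking" {
  # Pin to a tagged release; bumping ref picks up module fixes
  source = "git::https://github.com/example-org/terraform-modules.git//networking?ref=v1.4.0"

  environment = "production"
  cidr_block  = "10.1.0.0/16"
}
```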
Workspace Strategies
Terraform workspaces let you manage multiple environments from the same configuration. Each workspace maintains its own state file, so you can use one set of configuration files to provision dev, staging, and production environments.
```hcl
terraform {
  backend "s3" {
    bucket = "my-terraform-state"
    # Backend blocks cannot use interpolation. The S3 backend
    # automatically stores non-default workspace state under an
    # env:/<workspace>/ prefix, so one key covers every workspace.
    key            = "terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

resource "aws_instance" "server" {
  instance_type = terraform.workspace == "production" ? "t3.large" : "t3.micro"
  ami           = var.ami_id

  tags = {
    Name = "Server-${terraform.workspace}"
  }
}
```
The terraform.workspace expression returns the current workspace name. You can use it to vary resource configuration per environment, such as using larger instance types in production or enabling extra logging there.
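When per-environment differences grow beyond a single ternary, a map keyed by workspace name keeps the logic readable. A sketch:

```hcl
locals {
  instance_types = {
    default    = "t3.micro"
    staging    = "t3.small"
    production = "t3.large"
  }
}

resource "aws_instance" "server" {
  # Fall back to t3.micro for workspaces not listed in the map
  instance_type = lookup(local.instance_types, terraform.workspace, "t3.micro")
  ami           = var.ami_id

  tags = {
    Name = "Server-${terraform.workspace}"
  }
}
```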
Some teams prefer directory-based environments instead, keeping separate folders for each environment with a shared module library. This approach makes environment boundaries clearer and prevents accidental cross-environment changes. The tradeoff is maintaining consistency across duplicate directory structures.
Workspaces work well for environments that differ mostly in sizing and scaling, while directory-based approaches suit environments with structural differences.
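A directory-based layout might look like this (names illustrative):

```
infrastructure/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   └── backend.tf
│   └── prod/
│       ├── main.tf
│       └── backend.tf
└── modules/
    ├── networking/
    └── eks/
```

Each environment directory gets its own backend configuration and state, while both consume the shared modules.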
Tofu and OpenTofu Ecosystem
After HashiCorp moved Terraform to the Business Source License (BSL) in 2023, the open-source community rallied around OpenTofu, a Linux Foundation project that maintains a drop-in fork of Terraform. OpenTofu is backward-compatible with existing Terraform configurations and providers.
```bash
# Install OpenTofu
curl --proto '=https' --tlsv1.2 -fsSL https://get.opentofu.org/install.sh | sh

# Initialize with OpenTofu instead of Terraform
tofu init
tofu plan
tofu apply
```
OpenTofu is committed to remaining open source under Linux Foundation governance, removing the licensing uncertainty that HashiCorp's change introduced. The project has already added features such as state encryption and enhanced templating functions that were missing from the last open-source Terraform release.
If you are starting a new project, OpenTofu is worth considering for its community-driven development model. Existing Terraform users can migrate gradually since both tools share the same configuration syntax and state format. The OpenTofu website provides migration guides for users concerned about the licensing direction.
For teams already using Terraform with HashiCorp Cloud or Terraform Enterprise, the change may not affect your daily workflow. But monitoring the ecosystem for provider compatibility and feature parity between Terraform and OpenTofu makes sense as the fork matures.
For more on managing infrastructure costs, check out our post on Cost Optimization.
When to Use / When Not to Use
When Terraform makes sense
Terraform is the right tool when you need to manage infrastructure across multiple cloud providers from a single workflow. If you are running AWS, GCP, and Azure—or even just multiple AWS accounts—Terraform’s provider model gives you a consistent interface for all of it.
Use Terraform when your infrastructure team and application team are separate. Terraform modules let you hide implementation details behind interfaces. The app team gets a VPC module without needing to understand CIDR blocks or routing tables. That separation of concerns matters at scale.
Terraform is also the obvious choice when compliance requires documented, version-controlled infrastructure. If an auditor needs to see what existed in production on a given date, Git history plus Terraform state gives you that. Manual console clicks do not.
When to use something else
If your infrastructure is mostly ephemeral—serverless functions, containerized workloads managed by Kubernetes—Terraform adds overhead without much value. The cloud provider’s own tooling may be faster and more integrated.
If your team consists of software engineers who do not want to learn HCL, Pulumi or AWS CDK let them manage infrastructure in languages they already know.
For simple, stable infrastructure that rarely changes, the investment in Terraform (state management, module maintenance, workflow overhead) may not pay off. Sometimes a CloudFormation template or even manual provisioning is fine.
Terraform Architecture Flow
```mermaid
flowchart TD
    A[Config Files .tf] --> B[terraform init]
    B --> C[Provider Plugins Downloaded]
    C --> D[terraform plan]
    D --> E[Execution Plan]
    E --> F[terraform apply]
    F --> G[Real Infrastructure]
    F --> H[State File Updated]
    G --> D
    G --> I[terraform destroy]
    I --> J[Resources Torn Down]
```
Production Failure Scenarios
Common Terraform Failures
| Failure | Impact | Mitigation |
|---|---|---|
| State drift | Actual infrastructure diverges from Terraform’s view | Run terraform plan regularly, use import for drift correction |
| Lock timeout | Team member blocked from applying changes | Check for hung apply, increase lock timeout, break locks manually if needed |
| Plan/apply mismatch | State corrupted after manual changes outside Terraform | Enforce policy: all changes via Terraform, no console edits |
| Provider version conflict | Resources fail to create or update | Pin provider versions, test upgrades in staging first |
| Sensitive data in state | Passwords and keys stored in plain text | Use encrypted backends, suppress sensitive outputs, use vault provider |
| Circular dependencies | apply hangs indefinitely | Break cycles by extracting shared resources |
| Destroying shared resources | Production resource deleted accidentally | Use protection flags, review destroy plans carefully |
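The protection flags in the last row refer to Terraform's lifecycle block; a sketch:

```hcl
resource "aws_s3_bucket" "state" {
  bucket = "my-terraform-state"

  lifecycle {
    # Any plan that would destroy this resource fails with an error
    prevent_destroy = true
  }
}
```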
State Lock Timeout Recovery
```mermaid
flowchart TD
    A[terraform apply blocked] --> B{Lock expired?}
    B -->|No| C[Wait for lock holder]
    B -->|Yes| D[Check lock metadata]
    D --> E{Valid lock?}
    E -->|Yes| F[Force unlock after timeout]
    E -->|No| G[Remove stale lock manually]
    F --> H[Re-run apply]
    G --> H
```
Observability Hooks
Track Terraform operations to catch problems early and maintain audit trails.
What to monitor:
- Plan duration and size (large plans mean many pending changes)
- Apply success/failure rate per workspace
- State file size growth over time
- Resource drift detection frequency
- Lock contention events
```bash
# Enable verbose Terraform logging
export TF_LOG=TRACE
export TF_LOG_PATH=terraform.log

# Force-release a stuck state lock (use with care)
terraform force-unlock LOCK_ID

# Audit state changes
terraform state list | wc -l   # Count managed resources
terraform state pull > state-backup.tfstate

# Monitor drift
terraform plan -detailed-exitcode
# Exit code 2 means drift detected
```
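In CI, the exit code from -detailed-exitcode can drive alerting. A small helper sketch in plain shell (the function name is our own, not a Terraform feature):

```shell
#!/bin/sh
# Map terraform plan -detailed-exitcode results to a status string:
# 0 = no changes, 2 = drift detected, anything else = error.
interpret_plan_exit() {
  case "$1" in
    0) echo "clean" ;;
    2) echo "drift" ;;
    *) echo "error" ;;
  esac
}

# Example usage; in CI you would pass $? from the plan step
interpret_plan_exit 2   # prints "drift"
```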
Common Pitfalls / Anti-Patterns
Storing state locally
Local state files work for learning, but in teams they cause merge conflicts, secret leaks, and no locking. Always use a remote backend for anything beyond solo experiments.
Ignoring destroy plans
Running terraform destroy without reviewing the plan first can take down production. A misplaced filter in your targeting can delete resources across all environments. Always read the destroy plan before confirming.
Using terraform apply without terraform plan
Skipping plan means you approve changes without knowing what they are. CI pipelines that auto-apply without plan review are a recipe for surprises.
Not using version pinning
Provider version drift causes “it worked yesterday” problems. Always pin versions in the required_providers block and test upgrades in staging before production.
Treating modules as macros
Modules that just wrap single resources without adding meaningful abstraction are cargo cult IaC. A module for an EC2 instance that does nothing but pass through arguments adds indirection without value.
Quick Recap
Key Takeaways
- Terraform uses HCL for declarative infrastructure definition with state management
- Remote backends with state locking are mandatory for team environments
- Modules encode best practices and enable separation of concerns between teams
- OpenTofu offers a community-driven open-source alternative to Terraform
- Plan always before apply—never skip the review step
Terraform Health Checklist
```bash
# Verify Terraform is initialized
terraform init

# Check for configuration drift
terraform plan -detailed-exitcode

# List all managed resources
terraform state list

# Validate configuration syntax
terraform validate

# Review destroy plan before execution
terraform plan -destroy

# Check which workspaces exist
terraform workspace list

# Back up the state file
terraform state pull > backup.tfstate
```
Trade-off Summary
| Aspect | Terraform | Pulumi | Cloud-native IaC (CDKTF) |
|---|---|---|---|
| Language | HCL (DSL) | TypeScript, Python, Go, C# | TypeScript, Python, Go, C# |
| State management | Self-managed or Cloud | Pulumi Cloud or self-hosted | Self-managed or Cloud |
| Provider ecosystem | Largest (thousands of providers) | Large (200+) | Growing |
| Learning curve | Moderate (HCL) | Steeper (programming) | Steep (CDK concepts) |
| Testing | Terratest (external) | Native unit tests | Native unit tests |
| Debugging | Limited tooling | Full IDE support | Full IDE support |
| Governance | HashiCorp/OpenTofu | Pulumi (open core) | HashiCorp (open core) |
Conclusion
Terraform excels at managing infrastructure across multiple cloud providers with a consistent workflow. The declarative approach means you define your desired state and let Terraform handle the rest. Start with local state for learning, migrate to remote backends for collaboration, build modules for reusability, and consider OpenTofu if open-source governance matters to your organization.
For securing your infrastructure, see Cloud Security for IAM policies, encryption, and access controls. For monitoring infrastructure changes and drift detection, see Observability Engineering.
The skills you develop writing Terraform configurations transfer directly to other IaC tools. The concepts of state management, provider abstraction, and idempotent resource provisioning appear across the entire infrastructure-as-code landscape.