Nick’s Guide to Terraform

For the last 5 years, I’ve been working heavily in cloud and have used Terraform as a common tool for managing AWS, GCP and Azure environments. Infrastructure as Code (IaC) has been an extremely helpful methodology for automating resource deployments as well as providing a sort of inventory for all of the various cloud resources used and their dependencies. Having done multiple introductory knowledge sharing sessions to help get team members across companies up to speed on Terraform and IaC, it seemed like a good time to write a blog post intro to a tool I’ve used daily for years.

The problem Infrastructure as Code solves

If you’ve worked through any labs or deployed any solution manually on one of the various cloud providers, you’ve likely found that you need to deploy more resources and configurations than what may have been in your head. For instance, when creating an EC2 instance, you’ll also find yourself creating a Security Group and an IAM role consisting of IAM policies to interact with other services. That’s a relatively common example and most companies will have much more complex Cloud solutions making it easy to lose track of resources. Steps to redeploy everything would need to be captured in overly long runbooks and custom scripts would need to change and handle challenges such as idempotency for each resource.

To help combat this issue and make working in Cloud easier and more organized, various IaC tools have been created. These tools work by having users write declarative template code specifying the desired state of their resources (VMs, VPCs, Load Balancers, etc.) and passing it to the tool, which handles the steps of aligning actual resources with that state. Terraform is a popular (previously) open source tool that works with any of the major Cloud providers and integrates with any service willing to add a provider. In contrast, its main competitors have been tools like AWS’s CloudFormation, Microsoft’s Azure Resource Manager and Google’s Deployment Manager, which provide a similar service for their respective Cloud provider.

The Basics of Terraform

To get started, you’ll need to install Terraform on a machine with permissions set to access your resources. Terraform can generally pick up Cloud environment variables and credentials.

With Terraform installed, you’ll need to define your resources. As Terraform picks up all .tf files in the folder it runs in, it’s best to create a folder to manage all of the resources related to a project. While any resource can go in any file, it’s common to put providers in main.tf, variable requirements in variables.tf, values to be output after a run in outputs.tf and other resources in .tf files named for their use. You can pass variables via .tfvars files as well. I’ll go into each of these now.

Providers

Before declaring any resources, you’ll need to set a provider for Terraform to use which can interpret the resources and take actions as needed. For instance, the AWS provider allows access to resources in AWS while Google and Azure have their own providers.

Here’s an example from the AWS provider’s documentation showing how to add the provider. You’ll need both required_providers as well as a provider block in newer versions of Terraform.

# main.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

Resources

After configuring the AWS provider, you can access and deploy various AWS resource types. For instance, here’s an example of creating a VPC, subnet and EC2 instance.

# aws_resources.tf
resource "aws_vpc" "example" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "example" {
  vpc_id     = aws_vpc.example.id
  cidr_block = "10.0.1.0/24"
}

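resource "aws_instance" "example" {
  # Placeholder AMI ID (not a real image); replace with one valid in your region
  ami           = "ami-0123456789abcdef0"
  instance_type = "t3.small"
  subnet_id     = aws_subnet.example.id
}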

The resource types “aws_vpc”, “aws_subnet” and “aws_instance” are all made available by the AWS provider, and one resource can reference another through its properties. In this case, the “id” property of the example VPC is used by the subnet, and the id of the subnet is used to place the EC2 instance in that subnet. Each resource type for each provider will have its own documentation page, and you’ll often need to reference the documentation for the Cloud provider as well to understand the resource itself.

With those two files in the directory structure below, we’re ready to use Terraform to deploy our first AWS resources.

example_module/
– main.tf
– aws_resources.tf

Deploying with Terraform

The Terraform deployment process consists of four main commands: terraform init, plan, apply and destroy. Before running any commands, make sure you’re in the directory with your Terraform resources (example_module in this case).

Terraform Init

The first step of the deployment process is to run “terraform init” which will download any providers and modules (covered soon) needed to make sure Terraform is ready for a plan. It will create a .terraform folder in the current directory with the downloaded resources.

Terraform Plan

Next, we generate and review a plan with the “terraform plan” command, which reads the current state file (Terraform’s view of your resources), the state of your actual cloud resources, and your current Terraform code, and determines what it needs to change to make the Cloud and its state file match the resources defined in your Terraform code. It displays the planned changes, which can also be saved to a plan file with the -out option.

Terraform Apply

Finally, we apply the changes with “terraform apply” which will generate a plan and apply it after receiving approval from the user. Terraform will handle the API commands needed to make your cloud resources and Terraform state match what’s defined in your code. This command can also take a plan file as an argument which can be useful with automations.

Terraform Destroy

Lastly, we have “terraform destroy” which will remove all of the resources in Terraform’s state from the Cloud. While not useful for production as it destroys your cloud resources, this is commonly used when working with sandbox resources or testing with temporary resources and state files. Be careful with using this.

Running all four of the commands above would handle deploying and then removing all of the resources defined in your Terraform files.

Modularity with Terraform

While the steps above are enough to deploy resources to AWS, Terraform also provides tools for code reuse and customization: modules, which group combinations of resources; variables, which pass allowed options into a module; and outputs, which expose custom properties that calling modules can reference.

Modules

Each folder containing Terraform code can be considered its own module with inputs and outputs. Common use cases for creating modules are when you want to standardize an architecture or create a common pattern for using a cloud service, fixing the settings you want to enforce and exposing variables for the options which can be customized. For instance, you might want to have standardized modules for a single Cloud service such as a Lambda or EC2 instance.

As an example, let’s take the above code and use it as a module. We’ll convert the CIDR blocks and instance type to variables.

# aws_resources.tf
resource "aws_vpc" "example" {
  cidr_block = var.vpc_cidr
}

resource "aws_subnet" "example" {
  vpc_id     = aws_vpc.example.id
  cidr_block = var.subnet_cidr
}

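resource "aws_instance" "example" {
  # Placeholder AMI ID (not a real image); replace with one valid in your region
  ami           = "ami-0123456789abcdef0"
  instance_type = var.instance_type
  subnet_id     = aws_subnet.example.id
}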

Then add a variables.tf file declaring those variables:

# variables.tf
variable "vpc_cidr" {
  type = string
}

variable "subnet_cidr" {
  type = string
}

variable "instance_type" {
  type = string
}

Now let’s add another folder with a main.tf which calls this module.

example_module/
– main.tf
– aws_resources.tf
– variables.tf
calling_module/
– main.tf

Our new module’s main.tf (or any other .tf file of your choosing) can call the submodule like so.

# calling_module/main.tf

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

module "example" {
  source = "../example_module"
  
  vpc_cidr      = "10.0.0.0/16"
  subnet_cidr   = "10.0.1.0/24"
  instance_type = "t3.small"
}

You can also set defaults for the variables which will be assumed if no arguments are passed.

# example_module/variables.tf
variable "vpc_cidr" {
  type    = string
  default = "10.0.0.0/16"
}

variable "subnet_cidr" {
  type    = string
  default = "10.0.1.0/24"
}

variable "instance_type" {
  type    = string
  default = "t3.small"
}

# calling_module/main.tf
...
module "example" {
  source = "../example_module"
}
...
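
Since modules are meant to fix the settings you want to enforce, a variable can also restrict which values it accepts. The following is a minimal sketch using a validation block; the list of allowed instance types and the description text are assumptions chosen for illustration, not part of the original module.

```hcl
# example_module/variables.tf
variable "instance_type" {
  type        = string
  default     = "t3.small"
  description = "EC2 instance type for the example instance."

  # Reject any value outside this (hypothetical) allow-list at plan time
  validation {
    condition     = contains(["t3.small", "t3.medium", "t3.large"], var.instance_type)
    error_message = "The instance_type must be one of t3.small, t3.medium or t3.large."
  }
}
```

With this in place, a plan that passes a value outside the list should fail with the error message above before anything is changed.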

Variables of the root module (the directory you run Terraform from) can also be passed directly via the command line

terraform plan -var instance_type=t3.large

or by creating a .tfvars file

# example.tfvars
vpc_cidr      = "10.0.0.0/16"
subnet_cidr   = "10.0.1.0/24"
instance_type = "t3.large"

and passing it to terraform via

terraform plan -var-file example.tfvars

By reusing a single module with sensible defaults for common values, you can call the same Terraform code in different ways, which saves a lot of duplication as your codebase gets bigger.
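
Outputs work in the opposite direction to variables: they expose values from a module to whatever calls it. Here is a small sketch, assuming the module still declares a single aws_instance.example (no count); the output names instance_id and example_instance_id are hypothetical and not part of the original example.

```hcl
# example_module/outputs.tf
# Expose the instance ID so that calling modules can reference it
output "instance_id" {
  description = "ID of the example EC2 instance."
  value       = aws_instance.example.id
}

# calling_module/main.tf
# Read the child module's output as module.<module name>.<output name>
# and re-export it so it is printed after an apply
output "example_instance_id" {
  value = module.example.instance_id
}
```

Only values declared as outputs are visible to the caller; a module’s resources cannot be referenced directly from outside it.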

Conditionals / Tips and Tricks

Modules and resources can also be made conditional on a variable, or created once per entry in a map, using the count and for_each meta-arguments respectively. Count can create multiple copies of the same configuration or be set to 0 to avoid creating a resource at all. For example, we could add an “instance_count” variable and create our instance as follows.

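resource "aws_instance" "example" {
  count = var.instance_count

  # Placeholder AMI ID (not a real image); replace with one valid in your region
  ami           = "ami-0123456789abcdef0"
  instance_type = var.instance_type
  subnet_id     = aws_subnet.example.id
}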
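
To round out the count example, here is a sketch of the variable declaration it relies on, along with the conditional form of count. The create_instance variable, the aws_instance.optional resource, the output name and the AMI ID are all hypothetical names introduced for illustration.

```hcl
# variables.tf
variable "instance_count" {
  type    = number
  default = 1
}

# Hypothetical on/off switch for an optional resource
variable "create_instance" {
  type    = bool
  default = true
}

# aws_resources.tf
# Created once when create_instance is true, not at all when it is false
resource "aws_instance" "optional" {
  count = var.create_instance ? 1 : 0

  # Placeholder AMI ID (not a real image); replace with one valid in your region
  ami           = "ami-0123456789abcdef0"
  instance_type = var.instance_type
  subnet_id     = aws_subnet.example.id
}

# outputs.tf
# A resource with count becomes a list, so the splat expression collects every ID
output "instance_ids" {
  value = aws_instance.example[*].id
}
```

Once count is set, individual instances are addressed by index, such as aws_instance.example[0], rather than by the bare resource name.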

We could also set a local variable containing a map of different configurations for the child module, like so:

# main.tf
...
locals {
  vpc_configs = {
    "main" = {
       vpc_cidr      = "10.0.0.0/16"
       subnet_cidr   = "10.0.1.0/24"
       instance_type = "t3.large"
    }
    "product_x" = {
       ...
    }
    "product_y" = {
       ...
    }
  }
}

module "example" {
  for_each = local.vpc_configs
  source = "../example_module"

  vpc_cidr      = each.value.vpc_cidr
  subnet_cidr   = each.value.subnet_cidr
  instance_type = each.value.instance_type
}
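
The for_each meta-argument works on resources as well as modules. As a rough sketch, the following creates one subnet per entry in a map; the subnet_cidrs variable, the per_key resource name, the tag values and the CIDR ranges are assumptions for illustration, and aws_vpc.example is the VPC from the earlier examples.

```hcl
# variables.tf
variable "subnet_cidrs" {
  type = map(string)
  default = {
    a = "10.0.1.0/24"
    b = "10.0.2.0/24"
  }
}

# aws_resources.tf
# One subnet per map entry; each.key is "a" or "b", each.value is the CIDR
resource "aws_subnet" "per_key" {
  for_each = var.subnet_cidrs

  vpc_id     = aws_vpc.example.id
  cidr_block = each.value

  tags = {
    Name = "example-${each.key}"
  }
}

# outputs.tf
# Resources using for_each become a map keyed by each.key
output "subnet_ids" {
  value = { for key, subnet in aws_subnet.per_key : key => subnet.id }
}
```

Because instances are keyed by name rather than by position, removing one entry from the map only affects that one subnet, which is one reason to prefer for_each over count for collections.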

The last meta-argument worth touching on is the depends_on argument. Generally, Terraform is quite good at ordering the creation of resources based on what’s needed. For instance, the example EC2 instance references the subnet and the subnet references the VPC via properties on the created resources.

# aws_resources.tf
resource "aws_vpc" "example" {
  cidr_block = var.vpc_cidr
}

resource "aws_subnet" "example" {
  vpc_id     = aws_vpc.example.id
  cidr_block = var.subnet_cidr
}

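resource "aws_instance" "example" {
  # Placeholder AMI ID (not a real image); replace with one valid in your region
  ami           = "ami-0123456789abcdef0"
  instance_type = var.instance_type
  subnet_id     = aws_subnet.example.id
}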

Because of these references, Terraform knows to create the VPC, then the subnet and then the instance, as it won’t have the values to pass to the referencing resource until the one it depends on is created. For the most part Terraform is able to handle ordering fine, but every once in a while you’ll run into a situation where it doesn’t. For those times, you can use depends_on, which takes a list of resources or modules, to control the order in which Terraform creates its resources.

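resource "aws_instance" "example" {
  # Placeholder AMI ID (not a real image); replace with one valid in your region
  ami           = "ami-0123456789abcdef0"
  instance_type = var.instance_type
  subnet_id     = aws_subnet.example.id

  # depends_on takes a list of resource or module references
  depends_on = [aws_subnet.example]
}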
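
A case where depends_on is more likely to be needed is a hidden dependency, where one resource relies on another at runtime without referencing any of its properties. The sketch below assumes an instance whose software needs an IAM policy to be attached before it boots; all resource names, the policy contents and the AMI ID are hypothetical.

```hcl
resource "aws_iam_role" "example" {
  name = "example-instance-role"

  # Allow EC2 to assume this role
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

resource "aws_iam_instance_profile" "example" {
  name = "example-instance-profile"
  role = aws_iam_role.example.name
}

resource "aws_iam_role_policy" "example" {
  name = "example-s3-read"
  role = aws_iam_role.example.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["s3:GetObject"]
      Resource = "*"
    }]
  })
}

resource "aws_instance" "with_role" {
  # Placeholder AMI ID (not a real image); replace with one valid in your region
  ami                  = "ami-0123456789abcdef0"
  instance_type        = var.instance_type
  subnet_id            = aws_subnet.example.id
  iam_instance_profile = aws_iam_instance_profile.example.name

  # The instance never references the policy, so Terraform cannot infer
  # that the policy should exist before the instance starts
  depends_on = [aws_iam_role_policy.example]
}
```

Here the instance references the instance profile, so that ordering is inferred, but nothing ties it to the role policy except the explicit depends_on.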

Working with the Terraform State

Finally, I wanted to touch on Terraform’s state, which is a critical component that you may not have noticed so far. After deploying your Terraform resources, you’ll find a terraform.tfstate file in your directory which contains Terraform’s view of the world. It contains references to and details about all resources Terraform manages. When a plan is run, Terraform compares your cloud resources with its state file to determine if anything has been changed outside Terraform, along with what needs to be deployed. After changes are deployed, it updates the state file with the new state of the system.

Remote Backends

While Terraform will store the state file locally by default, a good practice (and a necessity when working with a team) is to store the state file remotely. This can be handled by adding a backend block to the Terraform configuration in main.tf, and Terraform supports backends for the various Cloud providers’ storage services. For instance, to store the state in S3, use the following.

# main.tf

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }

  # The backend block needs a type label; S3 bucket names cannot contain underscores
  backend "s3" {
    bucket = "example-bucket"
    key    = "key path"
    region = "us-east-1"
  }  
}
...

State Manipulation

Lastly, Terraform provides commands for interacting with the state itself which a developer using Terraform will find themselves needing to use at one point or another. Most of these commands start with “terraform state.” For informational purposes, “terraform state list” will show all resources Terraform has in its state while “terraform state show” takes a resource as an argument and describes Terraform’s view of the resource. Commands like “terraform import”, “terraform state mv” and “terraform state rm” actually manipulate the state.

A common concern that will come up when a company decides to start managing its cloud infrastructure with Terraform is what to do with existing manually created resources. Recreating the resource via Terraform may not be ideal, but leaving it unmanaged isn’t either. “Terraform import” can be used to import an existing cloud resource into a Terraform module to solve this problem. Here’s an example for an EC2 instance, but you’ll need to check the resource documentation for an imported resource to see what to provide from the Cloud side (i.e. instance_id here).

terraform import module.example.aws_instance.example $instance_id
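
Newer versions of Terraform (1.5 and later, as far as I’m aware) also allow the import to be declared in code, so it shows up in a plan and can be reviewed before it touches the state. A minimal sketch, assuming the block is placed in the root module and using a placeholder instance ID:

```hcl
# calling_module/main.tf
import {
  # Address of the resource in your configuration to import into
  to = module.example.aws_instance.example

  # Placeholder ID of the existing EC2 instance; replace with the real one
  id = "i-0123456789abcdef0"
}
```

The import then happens as part of the next apply, and the block can be removed afterwards.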

Cleaning up your Terraform code and finding that a plan wants to recreate resources because they’re under a different resource name? “Terraform state mv” moves a resource from its old path to the new one.

terraform state mv module.example.aws_instance.example module.new_example.aws_instance.example
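
For simple renames, Terraform 1.1 and later also support recording the move in code with a moved block, so everyone who applies the configuration gets the same state change. This sketch assumes a rename within a single module; the new resource name web is hypothetical.

```hcl
# example_module/aws_resources.tf
# Tell Terraform that aws_instance.example is now called aws_instance.web,
# so the existing instance is kept instead of being destroyed and recreated
moved {
  from = aws_instance.example
  to   = aws_instance.web
}
```

The resource block itself needs to be renamed to match the “to” address for the plan to come out clean.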

“Terraform state rm” removes a resource from Terraform’s state without destroying it in the Cloud, and may be handy to know for some tough scenarios.
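
As with imports and moves, recent Terraform versions (1.7 and later, to my knowledge) offer a declarative equivalent in the form of a removed block. A rough sketch, assuming the matching resource block has already been deleted from the code:

```hcl
# Stop managing the instance without destroying the real resource
removed {
  from = aws_instance.example

  lifecycle {
    destroy = false
  }
}
```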

Conclusion

Hopefully this serves as a good introduction to working with Terraform. I’ve found it very helpful for organizing and working with Cloud Infrastructure. It serves as both automation and documentation; without it, recovering from issues and deploying resources would likely lead to a lot more issues. Whenever I’m introducing someone to Terraform, these are the key concepts I like to teach, and I hope they’ve served anyone reading in their process of learning and working with Terraform.

NikCreate