Terraform Operational Best Practices 2019 ~Things Like Ditching workspace and More~
Update (2020-05-05): I have published an updated set of best practices for spring 2020.
I previously wrote about how to manage tfstate per workspace with Terraform, but I ran into several problems in actual operation.
The short version: I have stopped using workspace-based operation.
Example of workspace-based operation
First, let me show an actual example of operation.
I'm sure there are people who say "I do it in a much smarter way!", but for now let me introduce a common case.
Example) Creating a security group
Let's say we want to create a security group that satisfies the following requirements.
Requirements
- In stg, access is allowed only from the in-office Wifi IP
- In prd, access is allowed without IP restrictions
Sample code
- variables.tf
variable "ips" {
  type = "map"

  default = {
    # Keys follow the "<workspace>.cidrs" pattern; values are comma-separated CIDRs.
    "stg.cidrs" = "12.145.67.89/32,22.145.67.89/32"
    "prd.cidrs" = "0.0.0.0/0"
  }
}
- security_group.tf
resource "aws_security_group" "hoge" {
  name   = "${terraform.workspace}-hoge-sg"
  vpc_id = "${aws_vpc.vpc_main.id}"
}

# Inbound HTTPS, restricted to the CIDRs registered for the current workspace.
resource "aws_security_group_rule" "https" {
  security_group_id = "${aws_security_group.hoge.id}"
  type              = "ingress"
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  cidr_blocks       = ["${split(",", lookup(var.ips, "${terraform.workspace}.cidrs"))}"]
}

# Outbound: allow everything. Renamed from "https" because two resources
# of the same type cannot share a name.
resource "aws_security_group_rule" "egress_all" {
  security_group_id = "${aws_security_group.hoge.id}"
  type              = "egress"
  from_port         = 0
  to_port           = 0
  protocol          = "-1"
  cidr_blocks       = ["0.0.0.0/0"]
}
Before actually running terraform plan/apply, you first need to define a terraform workspace.
terraform workspace new stg     # This errors out if the workspace has already been created.
terraform workspace select stg  # Switch to the stg workspace; the selection is tracked locally.
terraform init
Only after the processing above can `stg.cidrs` and `prd.cidrs` of `variable "ips"` finally be used.
When I tried to operate this way, I ran into the following kinds of problems.
It doesn’t fit well with real-world operations
How do you operate when you want to apply changes only to staging?
Suppose you write settings for both staging and production, the pull request is approved, and the sample code is merged into master.
At that point it looks as if the code is ready to deploy to production as well.
If anything, leaving merged code unapplied in production would be the confusing state.
Later, when you merge other code into master that you do want to apply to production, the sample code gets applied too, even if you did not want it applied.
On the other hand, inserting code like the following into multiple resources increases the number of unnecessary steps and consumes mental bandwidth. It's also painful to review.
count = "${terraform.workspace == "stg" ? 1: 0}"
Well then, you might say "just don't configure the production one!", but if you don't configure it, the production side starts throwing errors, and you become unable to apply anything else at all.
This happens because both staging and production reference the same file.
Also, when using workspace, there were problems like the following.
When you want to add a new workspace other than stg and prd
What about when you get the following requests?
- "Please prepare an environment identical to production for load testing."
- "I want to do an integration test with an external API, so I'd like you to spin up a separate environment!"
For example, if you try to prepare a load testing environment and create a workspace called loadtst, you need to modify variables.tf as follows.
variable "ips" {
  type = "map"

  default = {
    "loadtst.cidrs" = "12.145.67.89/32,22.145.67.89/32" // added
    "stg.cidrs"     = "12.145.67.89/32,22.145.67.89/32"
    "prd.cidrs"     = "0.0.0.0/0"
  }
}
In the example above, you only need to add one line to variable "ips", but in reality you need to add code like `loadtst.*** = ***` to every single variable.
Each time a workspace is added, the number of steps grows and the file becomes harder to follow.
Also, when you have code like the following, it likewise consumes mental bandwidth and wears you down.
lookup(var.ips, "${terraform.workspace}.cidrs")
"${terraform.workspace == "stg" ? hoge: moge}"
Summarizing workspace-based operation
Workspace-based operation assumes that multiple environments share the same resource definitions, which hurt readability and diverged from how we actually operate.
- When adding a new workspace, you have to add it to every variable map.
  → The code becomes harder to follow.
  → Building a new environment becomes more difficult.
- Real-world operation is difficult when you want to apply to staging only.
  → Because both staging and production reference the same files, you end up branching inside them on "what if this is staging?".
- It's hard to tell which workspace you're currently in, so you hesitate before running terraform apply.
  → Even if you check with `terraform workspace show` before running `terraform apply`, a little later you get anxious ("wait, which one was it again?"), and there were times I had to scroll back through the terminal to check.
So what’s the better approach?
Thoroughly abandon workspace.
= Let's go with a DRY design!
That's the bottom line.
Here is a summary of what I actually did.
I structured the directory layout as follows.
modules/common ... Resources created with the same configuration in both the stg and prd environments.
modules/stg, modules/prd ... Resources whose configuration differs per environment.
.
├── README.md
├── envs/
│   ├── prd/
│   │   ├── backend.tf
│   │   ├── main.tf
│   │   ├── provider.tf
│   │   ├── region.tf
│   │   ├── templates/
│   │   │   └── user-data.tpl
│   │   └── variable.tf
│   └── stg/
│       ├── backend.tf
│       ├── main.tf
│       ├── provider.tf
│       ├── region.tf
│       ├── templates/
│       │   └── user-data.tpl
│       └── variable.tf
└── modules/
    ├── common/
    │   ├── bastion.tf
    │   ├── bucket_logs.tf
    │   ├── bucket_static.tf
    │   ├── certificate.tf
    │   ├── cloudfront.tf
    │   ├── cloudwatch.tf
    │   ├── codebuild.tf
    │   ├── codepipeline.tf
    │   ├── network.tf
    │   ├── output.tf
    │   ├── rds.tf
    │   ├── redis.tf
    │   ├── security_group.tf
    │   └── variable.tf
    ├── prd/
    │   ├── admin.tf
    │   ├── admin_autoscaling_policy.tf
    │   ├── api.tf
    │   ├── app.tf
    │   ├── ecr.tf
    │   ├── iam_ecs.tf
    │   ├── output.tf
    │   ├── variable.tf
    │   └── waf.tf
    └── stg/
        ├── admin.tf
        ├── api.tf
        ├── app.tf
        ├── ecr.tf
        ├── iam_ecs.tf
        ├── output.tf
        ├── variable.tf
        └── waf.tf
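Because each environment directory carries its own `backend.tf`, each one can keep a separate tfstate without relying on workspaces. The following is a minimal sketch, assuming an S3 backend. The bucket name and key are hypothetical; the actual backend settings are not shown in this article.

```hcl
# envs/stg/backend.tf (assumed contents, for illustration only)
terraform {
  backend "s3" {
    bucket = "example-terraform-state" # hypothetical bucket name
    key    = "stg/terraform.tfstate"   # envs/prd would use "prd/terraform.tfstate"
    region = "ap-northeast-1"
  }
}
```

With a layout like this, the state you operate on is decided by the directory you are in rather than by the currently selected workspace.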
Applying this to the earlier security group example
It looks like the following.
- envs/prd/variable.tf
variable "cidrs" {
  default = [
    "0.0.0.0/0",
  ]
}
- envs/stg/variable.tf
variable "cidrs" {
  default = [
    "12.145.67.89/32",
    "22.145.67.89/32",
  ]
}
- modules/common/security_group.tf
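resource "aws_security_group" "hoge" {
  # Without workspaces, terraform.workspace is always "default",
  # so the environment name comes from a variable instead.
  # var.env is assumed to be set per environment (e.g. "stg" or "prd").
  name   = "${var.env}-hoge-sg"
  vpc_id = "${aws_vpc.vpc_main.id}"
}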
# Inbound HTTPS, restricted to the CIDRs defined for this environment.
resource "aws_security_group_rule" "https" {
  security_group_id = "${aws_security_group.hoge.id}"
  type              = "ingress"
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  cidr_blocks       = ["${var.cidrs}"]
}

# Outbound: allow everything. Renamed from "https" because two resources
# of the same type cannot share a name.
resource "aws_security_group_rule" "egress_all" {
  security_group_id = "${aws_security_group.hoge.id}"
  type              = "egress"
  from_port         = 0
  to_port           = 0
  protocol          = "-1"
  cidr_blocks       = ["0.0.0.0/0"]
}
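To make the wiring explicit, here is a minimal sketch of how `cidrs` could reach the shared module. It keeps the Terraform 0.11 syntax used throughout this article. The module name `common`, the `env` input, and the file contents are assumptions for illustration, not taken from the actual repository.

```hcl
# envs/stg/main.tf (assumed contents)
module "common" {
  source = "../../modules/common"

  env   = "stg"          # hypothetical input naming the environment
  cidrs = "${var.cidrs}" # defined in this environment's variable file
}

# modules/common/variable.tf (assumed contents)
# A module only sees the values passed to it, so inputs are declared here.
variable "env" {}

variable "cidrs" {
  type = "list"
}
```

Under these assumptions, `envs/prd/main.tf` would differ only in `env = "prd"`, and the allowed CIDRs come from each environment's own `cidrs` default.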
If a security group should exist only in stg, define it in `envs/stg/security_group.tf`.
This covers the real-world case of applying to stg only.
Likewise, to prepare a load testing environment (`loadtst`), you just copy the directories as follows and modify the variables.
- `envs/prd` → `envs/loadtst`
- `modules/prd` → `modules/loadtst`
Even with some configuration changes, the loadtst resources are isolated and never affect prd or stg.
terraform coding rules
The rule is not to use code that relies on workspace switching like the following.
lookup(var.ips, "${terraform.workspace}.cidrs")
"${terraform.workspace == "stg" ? hoge: moge}"
The following is also disallowed. If only stg differs, you should split it into modules/stg,prd.
"${var.env == "stg" ? hoge: moge}"
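As a sketch of what the rule looks like in practice: a resource that should exist only in stg is placed in `modules/stg` with no conditional at all. The file name, the resource, and the two input variables below are hypothetical examples, not the author's code.

```hcl
# modules/stg/office_access.tf (hypothetical file name)
# This file exists only in the stg module, so no count or ternary is needed.
resource "aws_security_group_rule" "office_ssh" {
  security_group_id = "${var.security_group_id}" # assumed module input
  type              = "ingress"
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  cidr_blocks       = ["${var.office_cidrs}"]    # assumed module input
}

variable "security_group_id" {}

variable "office_cidrs" {
  type = "list"
}
```

The prd module simply has no such file, so a reviewer can see what differs between environments from the directory listing alone.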
terraform execution procedure
To build each of the stg and prd environments, move into the `envs/stg` or `envs/prd` directory and run the following.
terraform init
terraform get -update
terraform plan
terraform apply
Handling AWS credentials
When using the same AWS Account for stg and prd, I think it's best to place a `.envrc` (e.g. with direnv) at the project root and operate from there.
When using different AWS Accounts for stg and prd, place a `.envrc` under each `envs/(stg,prd)` and run the `terraform execution procedure` above.
Dealing with differences in terraform versions per project
Handle this with tfenv.
On macOS:
$ brew install tfenv
In my previous article, I ran terraform in a one-off container to absorb version differences, but the commands got long and management became cumbersome, so tfenv is preferable.
This too is my honest impression after operating it.
Other
This one is more of a mild recommendation, but removing the version pin on the provider worked out better.
provider "aws" {
  version = "1.54.0"
  region  = "ap-northeast-1"
}
When the version is pinned like this, there are times you can't use the latest resources.
Rather than pinning the version, it's better to keep the provider updated so you can follow the latest release.
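If dropping the constraint entirely feels too loose, one possible middle ground is a lower bound instead of an exact pin. This is only a sketch of that option and not the configuration used in this project; the bound `1.54.0` is reused from the example above.

```hcl
provider "aws" {
  # A lower bound rather than an exact pin (an assumption for illustration):
  # newer provider releases are still picked up by `terraform init -upgrade`.
  version = ">= 1.54.0"
  region  = "ap-northeast-1"
}
```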
Overall assessment
This is a summary of what I felt after operating Terraform in practice: it may be better to avoid workspace.
Of course, I also think this opinion comes from not fully knowing the merits of workspace, so I have no intention of denying it outright.
Once I get the repository organized, I plan to publish what I can at this stage!
That's all. I hope this becomes useful knowledge for those operating Terraform.

https://kenzo0107.github.io/en/2019/04/17/terraform-2019-workspace/
