
terragrunt
by ionfury
Tom's Homelab mono repository
SKILL.md
name: terragrunt
description: |
  Homelab infrastructure management with Terragrunt, OpenTofu, and Terraform patterns.
  Use when: (1) Planning or applying infrastructure changes to dev/integration/live clusters, (2) Adding/modifying machines in inventory.hcl, (3) Creating or updating units and stacks, (4) Working with feature flags, (5) Running validation (fmt, validate, test, plan), (6) Understanding the units→stacks→modules architecture, (7) Working with HCL configuration files, (8) Bare-metal Kubernetes provisioning or Talos configuration.
  Triggers: "terragrunt", "terraform", "opentofu", "tofu", "infrastructure code", "IaC", "inventory.hcl", "networking.hcl", "HCL files", "add machine", "add node", "cluster provisioning", "bare metal", "talos config", "task tg:", "infrastructure plan", "infrastructure apply", "stacks", "units", "modules architecture"
  Always use task commands (task tg:*) instead of running terragrunt directly.
Terragrunt Infrastructure Skill
Manage bare-metal Kubernetes infrastructure from PXE boot to running clusters.
Architecture Overview
stacks/ → Cluster deployments (dev, integration, live)
└── terragrunt.stack.hcl → Defines units and passes values
units/ → Reusable Terragrunt wrappers
└── terragrunt.hcl → Declares dependencies, passes inputs to modules
modules/ → Pure Terraform/OpenTofu code
└── *.tf → Resources, variables, outputs
Dependency chain: config → unifi / talos → bootstrap / aws-set-params
The config unit is the brain: it reads all .hcl config files and outputs structured data that the other units consume.
Task Commands (Always Use These)
# Validation (run in order)
task tg:fmt # Format HCL files
task tg:test-<module> # Test specific module (e.g., task tg:test-config)
task tg:validate-<stack> # Validate stack (e.g., task tg:validate-integration)
# Operations
task tg:list # List available stacks
task tg:plan-<stack> # Plan (e.g., task tg:plan-integration)
task tg:apply-<stack> # Apply (REQUIRES HUMAN APPROVAL)
task tg:gen-<stack> # Generate stack files
task tg:clean-<stack> # Clean generated files
NEVER run terragrunt or tofu directly—always use task commands.
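The validation order above can be sketched as a small wrapper that stops at the first failing step. This is only an illustration: the function name run_validation and the TASK_BIN override are hypothetical and not part of the repository; the task names are the ones listed above.

```shell
# Sketch: run the validation tasks in the documented order, stopping at the first failure.
# TASK_BIN is a hypothetical override (an assumption, not a repository setting) so the
# sequence can be dry-run with `echo`; by default the real `task` binary is used.
run_validation() {
  # $1 = module name (e.g. config), $2 = stack name (e.g. integration)
  "${TASK_BIN:-task}" tg:fmt &&
    "${TASK_BIN:-task}" "tg:test-$1" &&
    "${TASK_BIN:-task}" "tg:validate-$2"
}
```

Calling run_validation config integration runs the three tasks in order; with TASK_BIN=echo set beforehand it only prints the task names, which is a cheap way to check the sequence before touching anything.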
Stack Definition (terragrunt.stack.hcl)
locals {
  name     = "${basename(get_terragrunt_dir())}" # "integration"
  features = ["gateway-api", "longhorn", "prometheus", "spegel"]
}

unit "config" {
  source = "../../units/config"
  path   = "config"
  values = {
    name     = local.name
    features = local.features
  }
}

unit "talos" {
  source = "../../units/talos"
  path   = "talos"
}
- source: Path to unit directory
- path: Output path in .terragrunt-stack/
- values: Data passed to unit's values.* references
Unit Definition (terragrunt.hcl)
locals {
  networking_vars = read_terragrunt_config(find_in_parent_folders("networking.hcl"))
  inventory_vars  = read_terragrunt_config(find_in_parent_folders("inventory.hcl"))
}

include "root" {
  path = find_in_parent_folders("root.hcl")
}

terraform {
  source = "../../../.././/modules/config"
}

dependency "config" {
  config_path  = "../config"
  mock_outputs = { ... }
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}

inputs = {
  name       = values.name # From stack's values block
  networking = local.networking_vars.locals.clusters[values.name]
}
Key patterns:
- read_terragrunt_config() reads sibling .hcl files
- values.* accesses data from stack's values = { } block
- dependency.* accesses outputs from prerequisite units
- mock_outputs enables planning without applied dependencies
Configuration Files (Source of Truth)
| File | Purpose | Example Data |
|---|---|---|
| inventory.hcl | Hardware (nodes, MACs, IPs, disks) | node41 = { cluster = "live", type = "controlplane", ... } |
| networking.hcl | Network topology per cluster | live = { vip = "192.168.10.20", pod_subnet = "172.18.0.0/16" } |
| versions.hcl | Pinned software versions | talos = "v1.12.1", kubernetes = "1.34.0" |
| accounts.hcl | External service credentials | SSM paths for secrets, not values |
NEVER hardcode values that exist in these files—use read_terragrunt_config().
Common Tasks
Add a Machine
- Edit inventory.hcl:
node50 = {
  cluster = "live"
  type    = "worker"
  install = {
    selector     = "disk.model == 'Samsung'"
    architecture = "amd64"
  }
  interfaces = [{
    id           = "eth0"
    hardwareAddr = "aa:bb:cc:dd:ee:ff" # VERIFY correct
    addresses    = [{ ip = "192.168.10.50" }] # VERIFY available
  }]
}
- Run task tg:plan-live
- Review plan: the config module auto-includes machines where cluster == "live"
- Request human approval before apply
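The two VERIFY comments in the machine entry can be partly backed by a quick pre-check before planning. The sketch below is an assumption-laden helper, not something the repository provides: is_valid_mac and value_in_use are hypothetical names, and the search is a plain fixed-string grep that does not parse HCL, so it only catches exact duplicates of a quoted value.

```shell
# Return success when the argument looks like a colon-separated 48-bit MAC address.
is_valid_mac() {
  printf '%s\n' "$1" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'
}

# Return success when the quoted value already appears in the given file.
# $1 = file to search (e.g. inventory.hcl), $2 = value such as an IP or MAC.
# Fixed-string match including the surrounding double quotes; no HCL parsing.
value_in_use() {
  grep -qF "\"$2\"" "$1"
}
```

For example, value_in_use inventory.hcl 192.168.10.50 succeeding before you add node50 would mean the address is already assigned to another entry. A passing check does not prove the address is free on the network; it only rules out a duplicate in the file.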
Add a Feature Flag
- Add version to versions.hcl if needed
- Add feature detection in modules/config/main.tf:
locals {
  new_feature_enabled = contains(var.features, "new-feature")
}
- Enable in stack's features list:
features = ["gateway-api", "longhorn", "new-feature"]
Create a New Unit
- Create units/new-unit/terragrunt.hcl:
include "root" {
  path = find_in_parent_folders("root.hcl")
}

terraform {
  source = "../../../.././/modules/new-unit"
}

dependency "config" {
  config_path  = "../config"
  mock_outputs = { new_unit = {} }
}

inputs = dependency.config.outputs.new_unit
- Create corresponding modules/new-unit/ with variables.tf, main.tf, outputs.tf, versions.tf
- Add output from config module
- Add unit block to stacks that need it
Module Testing
Tests use OpenTofu native testing in modules/<name>/tests/*.tftest.hcl:
# Top-level variables set defaults for ALL run blocks
variables {
  name     = "test-cluster"
  features = ["gateway-api"]
  machines = {
    node1 = {
      cluster = "test-cluster"
      type    = "controlplane"
      # ... complete machine definition
    }
  }
}

run "feature_enabled" {
  command = plan

  variables {
    features = ["prometheus"] # Only override what differs
  }

  assert {
    condition     = output.prometheus_enabled == true
    error_message = "Prometheus should be enabled"
  }
}
Run with task tg:test-config or task tg:test for all modules.
Safety Rules
- NEVER run apply without explicit human approval
- NEVER use --auto-approve flags
- NEVER guess MAC addresses or IPs; verify against inventory.hcl
- NEVER commit .terragrunt-cache/ or .terragrunt-stack/
- NEVER manually edit Terraform state
State Operations
When removing state entries for indexed resources (e.g., this["rpi4"]), xargs strips the quotes from the resource address, so the command fails. Use a while loop instead:
# WRONG - xargs mangles quotes in resource names
terragrunt state list | xargs -n 1 terragrunt state rm
# CORRECT - while loop preserves quotes
terragrunt state list | while read -r resource; do terragrunt state rm "$resource"; done
This applies to any state operation on resources with map keys like data.talos_machine_configuration.this["rpi4"].
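The difference is easy to see without touching real state. The sketch below swaps terragrunt state rm for a harmless echo and feeds both pipelines the example address from above; the variable names are illustrative only.

```shell
# Demonstration only: echo/printf stand in for `terragrunt state rm`, so nothing is modified.
addr='data.talos_machine_configuration.this["rpi4"]'

# xargs treats double quotes as its own quoting syntax and removes them
# before passing the argument to the command.
via_xargs="$(printf '%s\n' "$addr" | xargs -n 1 echo)"

# `read -r` hands the line over unchanged, quotes included.
via_while="$(printf '%s\n' "$addr" | while IFS= read -r resource; do printf '%s\n' "$resource"; done)"

echo "xargs: $via_xargs"
echo "while: $via_while"
```

The xargs line comes out as data.talos_machine_configuration.this[rpi4], which is not a valid resource address, while the while loop prints the address exactly as it was listed.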
Validation Checklist
Before requesting apply approval:
- task tg:fmt passes
- task tg:test passes (if module tests exist)
- task tg:validate passes for ALL stacks
- task tg:plan-<stack> reviewed
- No unexpected destroys in plan
- Network changes won't break connectivity
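The "no unexpected destroys" item can be given a first-pass mechanical check. The sketch below assumes the plan output was saved as plain text without color codes, and that each unit ends with the usual summary line of the form "Plan: N to add, N to change, N to destroy." at the start of a line. The function name count_destroys and the file name plan.log are hypothetical.

```shell
# Sum the "N to destroy" counts across every "Plan:" summary line read from stdin.
# Assumes uncolored output where the summary starts at the beginning of a line.
count_destroys() {
  awk '
    /^Plan: / {
      # Walk the fields and pick the number that precedes "to destroy".
      for (i = 2; i < NF; i++)
        if ($i == "to" && $(i + 1) ~ /^destroy/) total += $(i - 1)
    }
    END { print total + 0 }
  '
}
```

With the output captured in a file, count_destroys < plan.log prints a single total. A non-zero result is a prompt to read the plan closely, not a substitute for reviewing it: replacements and prefixed log lines may not show up in this count.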


