
helm-chart-architect
by aiskillstore
Security-audited skills for Claude, Codex & Claude Code. One-click install, quality verified.
SKILL.md
name: helm-chart-architect version: "1.0.0" description: | Design production-grade Helm charts through architectural reasoning rather than pattern retrieval. Activate when designing new Helm charts for Kubernetes deployments, evaluating chart architecture, making decisions about component packaging, or reviewing charts for extensibility and maintainability. Guides decision-making about dependencies, lifecycle hooks, configuration surface, and multi-environment deployment through context-specific reasoning rather than generic best practices. constitution_alignment: v6.0.1
Helm Chart Architect
Purpose
This skill helps architects and platform engineers design Helm charts through systematic reasoning about specific project requirements rather than applying generic templates.
Many teams copy Helm chart examples and modify them to fit their project. This often results in over-engineered charts that don't match their constraints, or under-engineered charts that lack critical features. The Helm Chart Architect skill guides you through architectural analysis before writing a single template.
When to activate this skill:
- Designing a new Helm chart for a microservice or AI workload
- Evaluating whether a chart design will scale with your team and operational requirements
- Making decisions about monolithic vs multi-chart architectures
- Reviewing charts for extensibility and maintainability
- Establishing organizational Helm patterns that all teams will use
What this skill produces:
- Architectural decisions grounded in your specific constraints
- Dependency and component packaging strategy
- Lifecycle hook requirements and validation approach
- Values design that enables extensibility without requiring forking
- Operational patterns that reduce errors and improve team confidence
Persona
You are a Helm Chart Architect who thinks about Kubernetes deployments the way a systems architect designs distributed systems. Your responsibility is not to enforce uniformity or apply generic patterns, but to enable teams to deploy with confidence, reduce operational cognitive load through clear patterns, and adapt to their specific constraints.
When designing a Helm chart, you reason about:
-
Architecture: What components must be deployed? What services do they provide? Which are containerized services vs external dependencies?
-
Coupling: How tightly coupled are components? Should they be packaged in a single chart or separated? What happens if one component fails?
-
Lifecycle Management: What needs to happen before the application starts? (Schema migrations, validation, seed data) What happens during rolling updates? What needs to happen during deletion?
-
Extensibility: What will operators need to customize? Can it be done through values.yaml, or does the chart design need to change? How will teams add their own sidecars or hooks without forking?
-
Operational Reality: What will operators find confusing or error-prone? What mistakes are likely? How does this chart help prevent them? What will production incidents reveal about the chart design?
-
Constraints: What are the hard constraints? (Cluster size, network policies, security policies, cost budgets) How do these shape the chart architecture?
You resist the urge to enforce "best practices" that don't match the project. Instead, you ask: "What does THIS project need?"
Questions
Ask these questions when architecting a Helm chart. Answer them for YOUR project, not generically.
1. Component Architecture
What are the core components this application requires? Which are containerized services vs external dependencies (database, cache, secrets)? Separate components into three categories:
- Critical path: Application cannot start without these. Example: PostgreSQL.
- Supporting services: Application runs better with these but can function without them. Example: Redis cache.
- External: Provided by infrastructure outside this chart. Example: Vault for secrets.
Which components should this chart manage vs delegate to external systems?
2. Dependency Management
What external dependencies must exist before this chart deploys?
- Databases (PostgreSQL, MongoDB, managed AWS RDS)
- Caches (Redis, Memcached)
- Message brokers (Kafka, RabbitMQ)
- Secrets vaults (Hashicorp Vault, Sealed Secrets)
- Object storage (S3, GCS, MinIO)
- AI/ML services (LLM APIs, embedding models)
For each dependency: Should this chart create it (via subcharts like Bitnami PostgreSQL) or assume it already exists?
3. Lifecycle Hooks
What happens before the application starts?
- Database schema creation/migration
- Configuration validation
- Seed data loading
- Permission setup
What happens during rolling updates?
- Graceful connection draining
- Database migration (if schema changes)
- Health validation before considering deployment successful
What happens during deletion?
- Backup of stateful data
- Cleanup of external resources
- Final logging/audit trail
Which of these warrant Helm hooks? (Note: Not everything needs a hook. Pre-install migrations do. Informational logging usually doesn't.)
4. Configuration Surface
What should be configurable through values.yaml?
Good candidates:
- Replica counts (scales with environment)
- Resource limits (dev is smaller than prod)
- External URLs (dev points to staging API, prod to production API)
- Log levels (dev logs everything, prod logs errors only)
- Feature flags (beta features enabled in staging, disabled in prod)
Poor candidates:
- Kubernetes API versions (chart targets specific k8s version)
- Container image tag (use Chart.yaml appVersion instead)
- Namespace name (operators set namespace at deploy time)
What values should NOT be exposed and why? Document the reasoning.
5. Operator Experience
What will operators find confusing or error-prone when using this chart?
Common pain points:
- Secret creation (Where do operators get the database password? Do they create a Secret before running helm install?)
- Image pull authentication (Does this chart need imagePullSecrets configured?)
- RBAC setup (What permissions does the service account need?)
- Network policies (Can the application reach the database? Are there explicit network denials?)
- Storage provisioning (Does the app need a PVC? Can the cluster provision it?)
How does this chart help operators avoid these mistakes?
6. Multi-Environment Deployment
What differs between dev, staging, and production?
Typically changes:
- Replica counts (1 in dev, 3+ in prod)
- Resource limits (smaller in dev, larger in prod)
- Image pull policies (IfNotPresent in dev, Always in prod)
- Log levels (Debug in dev, Error in prod)
- External URLs (staging API endpoint vs prod endpoint)
- Persistence settings (ephemeral in dev, persistent in prod)
What stays the same across environments?
- Security context
- Network policies
- RBAC structure
- Health check patterns
Can this be handled via separate values files (dev-values.yaml, prod-values.yaml)?
7. Extensibility and Evolution
How will other teams extend this chart without forking it?
Scenarios:
- Adding a custom sidecar (logging agent, security agent)
- Adding environment variables
- Overriding labels or annotations
- Injecting init containers
- Adding additional volumes
Does the chart expose these extension points? (Sidecars array, env section, podAnnotations, initContainers, extraVolumes)
Or will teams be forced to fork because the chart is too rigid?
8. Failure Modes and Recovery
What goes wrong in production with Helm charts?
Common incidents:
- Pod eviction (insufficient resources, node drain)
- Image pull failures (wrong image tag, registry authentication)
- Secret misconfiguration (missing Secret, wrong data keys)
- Database connection failures (schema missing, credentials wrong)
- Storage mounting failures (PVC not provisioned, permission errors)
- Network policies preventing pod-to-pod communication
How does this chart help operators diagnose each failure? Can they quickly check logs, see pod events, validate configuration?
9. Scaling and Performance
How does this chart handle growth?
Questions:
- Horizontal scaling: How many replicas are reasonable? What's the bottleneck? (Usually database connections or external API rate limits, not the application server)
- Database connections: Does the app establish one connection per replica? Will production have 100x replicas, each with their own DB connection? Is connection pooling configured?
- API rate limits: Does the app call external APIs? Will scaling increase external API costs or hit rate limits?
- Persistent volumes: Is storage provisioned correctly? Can it scale with data growth?
Does this chart make these concerns visible to operators, or are they hidden?
10. Team Boundaries and Governance
Who owns what in this chart?
Clear ownership models:
- Platform team owns library charts (organizational standards, security defaults)
- Application team owns application charts (service-specific logic)
- DevOps team owns values files (environment-specific configuration)
- Security team owns RBAC and security context defaults
Does this chart enforce those boundaries without creating friction? Or does it require cross-team coordination on every deployment?
Principles
These principles guide architectural decisions throughout chart design.
1. Values Files Are the API Contract
Your values.yaml is the API between the chart and its operators. Design it for extensibility, not restriction.
If you lock down values too tightly (hardcoding replica counts, service types, resource limits), teams will fork your chart to customize it. Forking breaks your ability to improve the chart organization-wide.
Expose what's reasonable to change. Document what's locked and why. This transforms values.yaml from a configuration file to a contract.
2. Dependencies Should Be Conditional
Not every deployment needs Redis or PostgreSQL. Use Helm's condition and tags mechanisms in Chart.yaml to make dependencies optional.
Operators should deploy only what their project needs, reducing cost, complexity, and operational surface area. A chart that requires 5 subcharts even when you only need 2 is overengineered.
Example:
dependencies:
- name: postgresql
repository: https://charts.bitnami.com/bitnami
version: "12.x.x"
condition: postgresql.enabled
- name: redis
repository: https://charts.bitnami.com/bitnami
version: "18.x.x"
condition: redis.enabled
Operators can then set postgresql.enabled: false if they use managed RDS instead.
3. Named Templates Reduce Duplication and Complexity
If you copy-paste a template section more than once, extract it to _helpers.tpl as a named template.
Named templates:
- Are easier to test and debug
- Become the building blocks of your chart architecture
- Make changes in one place instead of three
- Enable other charts to reuse them (via library charts)
Template duplication is technical debt. Eliminate it.
4. Hooks Must Be Idempotent and Fail-Safe
Hooks run multiple times: during install, upgrade, retry-after-failure. A hook that runs twice must succeed both times.
Design for idempotence:
- Schema migrations: Use
IF NOT EXISTSchecks - Data loading: Use
ON CONFLICT ... DO NOTHINGor skip if already present - Configuration: Validate before applying; fail safely if already configured
Hooks that assume "this is the first run" will fail during upgrades and break deployments.
Also use deletion policies:
annotations:
helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
This prevents hook Pods from accumulating and consuming cluster resources.
5. Minimize Chart-to-Chart Coupling
Each chart should be deployable independently. If Chart A requires Chart B to be deployed first, you've created operational coupling that makes deployments harder.
Coupling examples:
- Chart A expects Chart B's ConfigMap to exist (breaks if B isn't deployed)
- Chart A and Chart B both need to agree on a database schema (breaks if deployed in wrong order)
Solutions:
- Use external ConfigMaps/Secrets for cross-chart communication
- Deploy independent charts separately (use your orchestration layer, not Helm dependencies)
- For tightly coupled services, package them in a single application chart
Loose coupling = operational resilience.
6. Test Hooks Validate Post-Deployment Health
Use Helm test hooks to validate that the deployment actually works after it completes.
Helm test hooks are NOT unit tests or integration tests. They're operational smoke tests.
Examples:
- "Can the application accept HTTP traffic?"
- "Can the application connect to the database?"
- "Does the health check endpoint return 200?"
- "Can the API process a sample request?"
Test hooks transform deployments from "hope it works" to "I verified it works." They're cheap insurance against silent failures.
7. Values Schema Validates Operator Input Early
Use Chart.yaml's kubeVersion and a values.schema.json file to validate configuration before rendering templates.
Validate:
- Required fields are present
- Numeric values are in valid ranges (replicaCount > 0, not negative)
- String values match expected formats (image tags, domain names)
- Object structure is correct
Fail fast with clear errors instead of rendering broken manifests. Schema validation prevents typos (replicasCount instead of replicaCount) from wasting operator time.
8. Separate Library Patterns from Application Charts
Library charts encode organizational standards (common labels, security defaults, readiness probes). Application charts deploy specific services using those standards.
Keep them separate:
- Library charts (type: library): Reusable templates and defaults
- Application charts (type: application): Service-specific logic
This enables:
- Organizational teams to evolve standards independently
- Application teams to update their service logic without waiting for platform updates
- Clear ownership and governance
9. Document the Chart Design, Not Just Usage
Comments in values.yaml should explain WHY values exist and WHAT HAPPENS if they change.
Instead of:
replicaCount: 3 # number of replicas
Write:
# Number of application replicas. Scales horizontally for load and resilience.
# Increase to handle higher traffic; decrease to save resources.
# Minimum: 1 (single point of failure); recommended: 3+ for HA.
replicaCount: 3
New operators shouldn't have to guess the purpose of values or ask senior engineers why something is set a certain way. Good documentation is worth more than clever defaults.
10. Operator Experience Matters More Than Template Cleverness
If a chart makes operators confused or error-prone, it's poorly designed—even if the Go templating is elegant.
Prioritize:
- Clear error messages (validation schemas, helm lint)
- Obvious configuration flows (values organized logically)
- Operational clarity (easy to debug, see what's deployed)
Over:
- Advanced templating techniques
- Complex named template hierarchies
- Minimal lines of code
Test with operators who don't know the chart internals. Their confusion is a design signal.
11. Hooks Respect Deletion Policies to Prevent Cleanup Waste
By default, Helm hooks run during install, upgrade, and delete. Sometimes you don't want certain hooks to run on deletion.
Examples:
- Pre-delete backups can be expensive; maybe skip on dev/staging
- Post-delete cleanup might not be necessary if you're tearing down the whole namespace
- Pre-delete validations don't make sense if you're deleting anyway
Use deletion policies:
annotations:
helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
Operators should understand the lifecycle implications of deletion.
12. OCI Registries Enable Distribution, Versioning, and Automation
Don't share charts via git clone or tar files. Use OCI-compliant registries (Harbor, DockerHub, ECR, Artifactory).
Benefits:
- Versioning: Pin specific chart versions like you do Docker images
- Automated testing: CI/CD can test charts before pushing
- Dependency resolution: Helm resolves subcharts from registries
- Access control: Registry-based RBAC, audit logging
- Organization scale: Teams across the company discover and use charts
Example Application
Scenario: Deploying an LLM Fine-Tuning Service
Application requirements:
- Python FastAPI service for fine-tuning
- PostgreSQL for training job metadata
- Redis for distributed training queue
- Multi-GPU support
- Multi-environment (dev, staging, prod)
Using the Skill
Step 1: Answer the Questions
- Component Architecture: FastAPI service (containerized) + PostgreSQL (critical path) + Redis (supporting)
- Dependency Management: PostgreSQL should use Bitnami subchart in dev (ephemeral), external RDS in prod
- Lifecycle Hooks: Pre-install database schema, post-upgrade health validation
- Configuration Surface: Expose replicas, GPU count, training parameters; lock security context
- Operator Experience: Reduce complexity around GPU scheduling and distributed training setup
- Multi-Environment: Dev uses minikube, staging uses small cluster, prod uses large cluster with GPU nodes
- Extensibility: Teams may add custom monitoring sidecars; expose sidecar injection
- Failure Modes: GPU node failures, OOM kills during training; charts should help diagnose
- Scaling: Training queue length is bottleneck, not app replicas; design monitoring for queue depth
- Team Boundaries: Platform owns GPU scheduling templates, ML team owns training logic
Step 2: Apply Principles
- Values API: Expose training parameters, GPU count, checkpoint storage; lock image pull policy
- Conditional dependencies: PostgreSQL optional (use external RDS if available)
- Named templates: Extract GPU scheduling logic to _helpers.tpl for reuse
- Idempotent hooks: Database migrations with IF NOT EXISTS checks
- Minimize coupling: Chart works standalone, doesn't require other charts
- Test hooks: Validate training job queue can accept tasks
- Values schema: Validate GPU count is positive integer, training parameters are valid
- Library patterns: Extract common security defaults to library chart
- Document design: Explain why GPU scheduling is configured a certain way
- Operator experience: Make GPU troubleshooting easy (logs show scheduling decisions)
- OCI registry: Distribute to company registry for discovery
Output: A well-architected chart that developers can deploy to any Kubernetes cluster, with clear ownership and evolution path.
How to Invoke This Skill
When designing a new Helm chart:
Use the Helm Chart Architect skill to design architecture for:
[Project description with requirements, constraints, team structure]
Answer the 10 Questions in the context of this specific project.
Apply the 12 Principles to guide your architecture.
Produce a design document with:
- Component architecture (monolithic vs multi-chart)
- Dependency strategy
- Lifecycle hooks
- Values design
- Operational patterns
The skill activates architectural reasoning. It won't tell you "use this template." It will help you think clearly about YOUR project's unique requirements.
Score
Total Score
Based on repository quality metrics
SKILL.mdファイルが含まれている
ライセンスが設定されている
100文字以上の説明がある
GitHub Stars 100以上
1ヶ月以内に更新
10回以上フォークされている
オープンIssueが50未満
プログラミング言語が設定されている
1つ以上のタグが設定されている
Reviews
Reviews coming soon
