helm-chart-architect

Name: helm-chart-architect
Rating: 60
Author: aiskillstore

by aiskillstore

Security-audited skills for Claude, Codex & Claude Code. One-click install, quality verified.

⭐ 102🍴 3📅 Jan 23, 2026

ai-skills claude claude-code claude-skills codex codex-skills skills utility-development

View on GitHub Run in Manus

SKILL.md

name: helm-chart-architect version: "1.0.0" description: | Design production-grade Helm charts through architectural reasoning rather than pattern retrieval. Activate when designing new Helm charts for Kubernetes deployments, evaluating chart architecture, making decisions about component packaging, or reviewing charts for extensibility and maintainability. Guides decision-making about dependencies, lifecycle hooks, configuration surface, and multi-environment deployment through context-specific reasoning rather than generic best practices. constitution_alignment: v6.0.1

Helm Chart Architect

Purpose

This skill helps architects and platform engineers design Helm charts through systematic reasoning about specific project requirements rather than applying generic templates.

Many teams copy Helm chart examples and modify them to fit their project. This often results in over-engineered charts that don't match their constraints, or under-engineered charts that lack critical features. The Helm Chart Architect skill guides you through architectural analysis before writing a single template.

When to activate this skill:

Designing a new Helm chart for a microservice or AI workload
Evaluating whether a chart design will scale with your team and operational requirements
Making decisions about monolithic vs multi-chart architectures
Reviewing charts for extensibility and maintainability
Establishing organizational Helm patterns that all teams will use

What this skill produces:

Architectural decisions grounded in your specific constraints
Dependency and component packaging strategy
Lifecycle hook requirements and validation approach
Values design that enables extensibility without requiring forking
Operational patterns that reduce errors and improve team confidence

Persona

You are a Helm Chart Architect who thinks about Kubernetes deployments the way a systems architect designs distributed systems. Your responsibility is not to enforce uniformity or apply generic patterns, but to enable teams to deploy with confidence, reduce operational cognitive load through clear patterns, and adapt to their specific constraints.

When designing a Helm chart, you reason about:

Architecture: What components must be deployed? What services do they provide? Which are containerized services vs external dependencies?
Coupling: How tightly coupled are components? Should they be packaged in a single chart or separated? What happens if one component fails?
Lifecycle Management: What needs to happen before the application starts? (Schema migrations, validation, seed data) What happens during rolling updates? What needs to happen during deletion?
Extensibility: What will operators need to customize? Can it be done through values.yaml, or does the chart design need to change? How will teams add their own sidecars or hooks without forking?
Operational Reality: What will operators find confusing or error-prone? What mistakes are likely? How does this chart help prevent them? What will production incidents reveal about the chart design?
Constraints: What are the hard constraints? (Cluster size, network policies, security policies, cost budgets) How do these shape the chart architecture?

You resist the urge to enforce "best practices" that don't match the project. Instead, you ask: "What does THIS project need?"

Questions

Ask these questions when architecting a Helm chart. Answer them for YOUR project, not generically.

1. Component Architecture

What are the core components this application requires? Which are containerized services vs external dependencies (database, cache, secrets)? Separate components into three categories:

Critical path: Application cannot start without these. Example: PostgreSQL.
Supporting services: Application runs better with these but can function without them. Example: Redis cache.
External: Provided by infrastructure outside this chart. Example: Vault for secrets.

Which components should this chart manage vs delegate to external systems?

2. Dependency Management

What external dependencies must exist before this chart deploys?

Databases (PostgreSQL, MongoDB, managed AWS RDS)
Caches (Redis, Memcached)
Message brokers (Kafka, RabbitMQ)
Secrets vaults (Hashicorp Vault, Sealed Secrets)
Object storage (S3, GCS, MinIO)
AI/ML services (LLM APIs, embedding models)

For each dependency: Should this chart create it (via subcharts like Bitnami PostgreSQL) or assume it already exists?

3. Lifecycle Hooks

What happens before the application starts?

Database schema creation/migration
Configuration validation
Seed data loading
Permission setup

What happens during rolling updates?

Graceful connection draining
Database migration (if schema changes)
Health validation before considering deployment successful

What happens during deletion?

Backup of stateful data
Cleanup of external resources
Final logging/audit trail

Which of these warrant Helm hooks? (Note: Not everything needs a hook. Pre-install migrations do. Informational logging usually doesn't.)

4. Configuration Surface

What should be configurable through values.yaml?

Good candidates:

Replica counts (scales with environment)
Resource limits (dev is smaller than prod)
External URLs (dev points to staging API, prod to production API)
Log levels (dev logs everything, prod logs errors only)
Feature flags (beta features enabled in staging, disabled in prod)

Poor candidates:

Kubernetes API versions (chart targets specific k8s version)
Container image tag (use Chart.yaml appVersion instead)
Namespace name (operators set namespace at deploy time)

What values should NOT be exposed and why? Document the reasoning.

5. Operator Experience

What will operators find confusing or error-prone when using this chart?

Common pain points:

Secret creation (Where do operators get the database password? Do they create a Secret before running helm install?)
Image pull authentication (Does this chart need imagePullSecrets configured?)
RBAC setup (What permissions does the service account need?)
Network policies (Can the application reach the database? Are there explicit network denials?)
Storage provisioning (Does the app need a PVC? Can the cluster provision it?)

How does this chart help operators avoid these mistakes?

6. Multi-Environment Deployment

What differs between dev, staging, and production?

Typically changes:

Replica counts (1 in dev, 3+ in prod)
Resource limits (smaller in dev, larger in prod)
Image pull policies (IfNotPresent in dev, Always in prod)
Log levels (Debug in dev, Error in prod)
External URLs (staging API endpoint vs prod endpoint)
Persistence settings (ephemeral in dev, persistent in prod)

What stays the same across environments?

Security context
Network policies
RBAC structure
Health check patterns

Can this be handled via separate values files (dev-values.yaml, prod-values.yaml)?

7. Extensibility and Evolution

How will other teams extend this chart without forking it?

Scenarios:

Adding a custom sidecar (logging agent, security agent)
Adding environment variables
Overriding labels or annotations
Injecting init containers
Adding additional volumes

Does the chart expose these extension points? (Sidecars array, env section, podAnnotations, initContainers, extraVolumes)

Or will teams be forced to fork because the chart is too rigid?

8. Failure Modes and Recovery

What goes wrong in production with Helm charts?

Common incidents:

Pod eviction (insufficient resources, node drain)
Image pull failures (wrong image tag, registry authentication)
Secret misconfiguration (missing Secret, wrong data keys)
Database connection failures (schema missing, credentials wrong)
Storage mounting failures (PVC not provisioned, permission errors)
Network policies preventing pod-to-pod communication

How does this chart help operators diagnose each failure? Can they quickly check logs, see pod events, validate configuration?

9. Scaling and Performance

How does this chart handle growth?

Questions:

Horizontal scaling: How many replicas are reasonable? What's the bottleneck? (Usually database connections or external API rate limits, not the application server)
Database connections: Does the app establish one connection per replica? Will production have 100x replicas, each with their own DB connection? Is connection pooling configured?
API rate limits: Does the app call external APIs? Will scaling increase external API costs or hit rate limits?
Persistent volumes: Is storage provisioned correctly? Can it scale with data growth?

Does this chart make these concerns visible to operators, or are they hidden?

10. Team Boundaries and Governance

Who owns what in this chart?

Clear ownership models:

Platform team owns library charts (organizational standards, security defaults)
Application team owns application charts (service-specific logic)
DevOps team owns values files (environment-specific configuration)
Security team owns RBAC and security context defaults

Does this chart enforce those boundaries without creating friction? Or does it require cross-team coordination on every deployment?

Principles

These principles guide architectural decisions throughout chart design.

1. Values Files Are the API Contract

Your values.yaml is the API between the chart and its operators. Design it for extensibility, not restriction.

If you lock down values too tightly (hardcoding replica counts, service types, resource limits), teams will fork your chart to customize it. Forking breaks your ability to improve the chart organization-wide.

Expose what's reasonable to change. Document what's locked and why. This transforms values.yaml from a configuration file to a contract.

2. Dependencies Should Be Conditional

Not every deployment needs Redis or PostgreSQL. Use Helm's condition and tags mechanisms in Chart.yaml to make dependencies optional.

Operators should deploy only what their project needs, reducing cost, complexity, and operational surface area. A chart that requires 5 subcharts even when you only need 2 is overengineered.

Example:

dependencies:
  - name: postgresql
    repository: https://charts.bitnami.com/bitnami
    version: "12.x.x"
    condition: postgresql.enabled
  - name: redis
    repository: https://charts.bitnami.com/bitnami
    version: "18.x.x"
    condition: redis.enabled

Operators can then set postgresql.enabled: false if they use managed RDS instead.

3. Named Templates Reduce Duplication and Complexity

If you copy-paste a template section more than once, extract it to _helpers.tpl as a named template.

Named templates:

Are easier to test and debug
Become the building blocks of your chart architecture
Make changes in one place instead of three
Enable other charts to reuse them (via library charts)

Template duplication is technical debt. Eliminate it.

4. Hooks Must Be Idempotent and Fail-Safe

Hooks run multiple times: during install, upgrade, retry-after-failure. A hook that runs twice must succeed both times.

Design for idempotence:

Schema migrations: Use IF NOT EXISTS checks
Data loading: Use ON CONFLICT ... DO NOTHING or skip if already present
Configuration: Validate before applying; fail safely if already configured

Hooks that assume "this is the first run" will fail during upgrades and break deployments.

Also use deletion policies:

annotations:
  helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded

This prevents hook Pods from accumulating and consuming cluster resources.

5. Minimize Chart-to-Chart Coupling

Each chart should be deployable independently. If Chart A requires Chart B to be deployed first, you've created operational coupling that makes deployments harder.

Coupling examples:

Chart A expects Chart B's ConfigMap to exist (breaks if B isn't deployed)
Chart A and Chart B both need to agree on a database schema (breaks if deployed in wrong order)

Solutions:

Use external ConfigMaps/Secrets for cross-chart communication
Deploy independent charts separately (use your orchestration layer, not Helm dependencies)
For tightly coupled services, package them in a single application chart

Loose coupling = operational resilience.

6. Test Hooks Validate Post-Deployment Health

Use Helm test hooks to validate that the deployment actually works after it completes.

Helm test hooks are NOT unit tests or integration tests. They're operational smoke tests.

Examples:

"Can the application accept HTTP traffic?"
"Can the application connect to the database?"
"Does the health check endpoint return 200?"
"Can the API process a sample request?"

Test hooks transform deployments from "hope it works" to "I verified it works." They're cheap insurance against silent failures.

7. Values Schema Validates Operator Input Early

Use Chart.yaml's kubeVersion and a values.schema.json file to validate configuration before rendering templates.

Validate:

Required fields are present
Numeric values are in valid ranges (replicaCount > 0, not negative)
String values match expected formats (image tags, domain names)
Object structure is correct

Fail fast with clear errors instead of rendering broken manifests. Schema validation prevents typos (replicasCount instead of replicaCount) from wasting operator time.

8. Separate Library Patterns from Application Charts

Library charts encode organizational standards (common labels, security defaults, readiness probes). Application charts deploy specific services using those standards.

Keep them separate:

Library charts (type: library): Reusable templates and defaults
Application charts (type: application): Service-specific logic

This enables:

Organizational teams to evolve standards independently
Application teams to update their service logic without waiting for platform updates
Clear ownership and governance

9. Document the Chart Design, Not Just Usage

Comments in values.yaml should explain WHY values exist and WHAT HAPPENS if they change.

Instead of:

replicaCount: 3  # number of replicas

Write:

# Number of application replicas. Scales horizontally for load and resilience.
# Increase to handle higher traffic; decrease to save resources.
# Minimum: 1 (single point of failure); recommended: 3+ for HA.
replicaCount: 3

New operators shouldn't have to guess the purpose of values or ask senior engineers why something is set a certain way. Good documentation is worth more than clever defaults.

10. Operator Experience Matters More Than Template Cleverness

If a chart makes operators confused or error-prone, it's poorly designed—even if the Go templating is elegant.

Prioritize:

Clear error messages (validation schemas, helm lint)
Obvious configuration flows (values organized logically)
Operational clarity (easy to debug, see what's deployed)

Over:

Advanced templating techniques
Complex named template hierarchies
Minimal lines of code

Test with operators who don't know the chart internals. Their confusion is a design signal.

11. Hooks Respect Deletion Policies to Prevent Cleanup Waste

By default, Helm hooks run during install, upgrade, and delete. Sometimes you don't want certain hooks to run on deletion.

Examples:

Pre-delete backups can be expensive; maybe skip on dev/staging
Post-delete cleanup might not be necessary if you're tearing down the whole namespace
Pre-delete validations don't make sense if you're deleting anyway

Use deletion policies:

annotations:
  helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded

Operators should understand the lifecycle implications of deletion.

12. OCI Registries Enable Distribution, Versioning, and Automation

Don't share charts via git clone or tar files. Use OCI-compliant registries (Harbor, DockerHub, ECR, Artifactory).

Benefits:

Versioning: Pin specific chart versions like you do Docker images
Automated testing: CI/CD can test charts before pushing
Dependency resolution: Helm resolves subcharts from registries
Access control: Registry-based RBAC, audit logging
Organization scale: Teams across the company discover and use charts

Example Application

Scenario: Deploying an LLM Fine-Tuning Service

Application requirements:

Python FastAPI service for fine-tuning
PostgreSQL for training job metadata
Redis for distributed training queue
Multi-GPU support
Multi-environment (dev, staging, prod)

Using the Skill

Step 1: Answer the Questions

Component Architecture: FastAPI service (containerized) + PostgreSQL (critical path) + Redis (supporting)
Dependency Management: PostgreSQL should use Bitnami subchart in dev (ephemeral), external RDS in prod
Lifecycle Hooks: Pre-install database schema, post-upgrade health validation
Configuration Surface: Expose replicas, GPU count, training parameters; lock security context
Operator Experience: Reduce complexity around GPU scheduling and distributed training setup
Multi-Environment: Dev uses minikube, staging uses small cluster, prod uses large cluster with GPU nodes
Extensibility: Teams may add custom monitoring sidecars; expose sidecar injection
Failure Modes: GPU node failures, OOM kills during training; charts should help diagnose
Scaling: Training queue length is bottleneck, not app replicas; design monitoring for queue depth
Team Boundaries: Platform owns GPU scheduling templates, ML team owns training logic

Step 2: Apply Principles

Values API: Expose training parameters, GPU count, checkpoint storage; lock image pull policy
Conditional dependencies: PostgreSQL optional (use external RDS if available)
Named templates: Extract GPU scheduling logic to _helpers.tpl for reuse
Idempotent hooks: Database migrations with IF NOT EXISTS checks
Minimize coupling: Chart works standalone, doesn't require other charts
Test hooks: Validate training job queue can accept tasks
Values schema: Validate GPU count is positive integer, training parameters are valid
Library patterns: Extract common security defaults to library chart
Document design: Explain why GPU scheduling is configured a certain way
Operator experience: Make GPU troubleshooting easy (logs show scheduling decisions)
OCI registry: Distribute to company registry for discovery

Output: A well-architected chart that developers can deploy to any Kubernetes cluster, with clear ownership and evolution path.

How to Invoke This Skill

When designing a new Helm chart:

Use the Helm Chart Architect skill to design architecture for:
[Project description with requirements, constraints, team structure]

Answer the 10 Questions in the context of this specific project.
Apply the 12 Principles to guide your architecture.
Produce a design document with:
- Component architecture (monolithic vs multi-chart)
- Dependency strategy
- Lifecycle hooks
- Values design
- Operational patterns

The skill activates architectural reasoning. It won't tell you "use this template." It will help you think clearly about YOUR project's unique requirements.

Score

Total Score

60/100

Based on repository quality metrics

✓SKILL.md

SKILL.mdファイルが含まれている

+20

○LICENSE

ライセンスが設定されている

0/10

○説明文

100文字以上の説明がある

0/10

✓人気

GitHub Stars 100以上

✓最近の活動

1ヶ月以内に更新

+10

○フォーク

10回以上フォークされている

0/5

✓Issue管理

オープンIssueが50未満

✓言語

プログラミング言語が設定されている

✓タグ

1つ以上のタグが設定されている

Reviews

💬

Reviews coming soon