Helm Versioning and Rollback: Managing Application Releases
Master Helm release management—revision history, automated rollbacks, rollback strategies, and handling failed releases gracefully.
Introduction
Every Helm upgrade creates a new numbered revision. If the upgrade breaks something, you roll back to a previous revision with a single command. This revision history is stored as Kubernetes secrets in the release namespace and survives across cluster restarts, making rollback a reliable recovery mechanism when deployments go wrong.
Rollback matters in production because not every failed deployment is caught by pre-deployment testing. A config change that seemed safe in staging may interact unexpectedly with production data. A new image tag may have a subtle bug that only manifests under production load. When this happens, you need to restore the previous working state quickly, and Helm rollback lets you do that without manually reverting changes in Git or applying old manifests.
This guide covers Helm release revision history, manual rollback procedures, automated rollback in CI/CD pipelines, and hooks for pre and post-upgrade tasks. You will learn when rollback is the right tool and when database migrations or forward fixes are required instead. By the end, you will be able to inspect release history, execute rollbacks safely, implement automated rollback with the atomic flag, and design upgrade processes that survive real-world failures.
When to Use / When Not to Use
When Helm rollback makes sense
Helm rollback is the right tool when you have an intact release history and the previous state is known and safe. Config changes that broke behavior, a bad image tag, and canary deployments that revealed problems are all good rollback candidates. The --atomic flag makes automated rollback in CI/CD practical for these cases.
Helm rollback also works well when your application is stateless or when storage resources are not affected by the change. A bad Deployment image tag or a misconfigured ConfigMap rolls back cleanly.
When rollback is not enough
If your upgrade included database migrations that modified schema, rollback does not undo those changes. PostgreSQL migrations that added columns with non-null constraints, MongoDB schema changes, and any irreversible data transformation cannot be fixed by rolling back the Kubernetes resources.
In these cases, you need forward-fix migrations, not rollback. Design your upgrade process to handle this before deploying.
Rollback Decision Flow
flowchart TD
A[Deployment fails<br/>or degraded] --> B{Upgrade or<br/>Config change?}
B -->|Config| C{Rollback<br/>fixes it?}
B -->|Migration| D[Forward-fix<br/>required]
C -->|Yes| E[helm rollback<br/>to previous revision]
C -->|No| D
E --> F{Stateful<br/>resources affected?}
F -->|Yes| G[Backup data<br/>then rollback]
F -->|No| H[Rollback safe<br/>proceed]
Release Revision History
Helm stores release history in Kubernetes secrets within the release namespace. Each time you install, upgrade, or roll back, Helm creates a new revision.
# List releases
helm list
# Show the revision history of a release
helm history myapp
REVISION UPDATED STATUS CHART DESCRIPTION
1 2026-03-20 10:30:00 superseded myapp-1.0.0 Install complete
2 2026-03-21 14:15:00 superseded myapp-1.1.0 Upgrade complete
3 2026-03-22 09:00:00 deployed myapp-1.2.0 Upgrade complete
# Get detailed release status
helm status myapp
# Show release values at any revision
helm get values myapp --revision 2
# Show everything stored for a revision: values, hooks, manifest, and notes
helm get all myapp --revision 3
History is stored as Kubernetes secrets:
kubectl get secrets -n mynamespace -l "owner=helm"
NAME TYPE DATA AGE
sh.helm.release.v1.myapp.v1 helm.sh/release.v1 1 5d
sh.helm.release.v1.myapp.v2 helm.sh/release.v1 1 4d
sh.helm.release.v1.myapp.v3 helm.sh/release.v1 1 3d
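The secret names encode both the release name and the revision number. As a minimal sketch, assuming bash and the naming pattern shown in the listing above, a small helper can split a secret name into its parts, for example when auditing which revisions still exist. The function name `parse_release_secret` is hypothetical, not part of Helm.

```shell
#!/usr/bin/env bash
# parse_release_secret NAME
# Prints "<release> <revision>" for a Helm 3 release secret name,
# or returns 1 if NAME does not follow the sh.helm.release.v1.* pattern.
parse_release_secret() {
  local name="$1"
  local prefix="sh.helm.release.v1."

  case "$name" in
    "$prefix"*.v[0-9]*) ;;   # looks like a release secret
    *) return 1 ;;           # anything else is rejected
  esac

  local rest="${name#"$prefix"}"   # e.g. "myapp.v3"
  local revision="${rest##*.v}"    # text after the last ".v"
  local release="${rest%.v*}"      # text before the last ".v"
  printf '%s %s\n' "$release" "$revision"
}
```

To look inside a stored revision, the `release` field of the secret can be decoded with `kubectl get secret sh.helm.release.v1.myapp.v3 -n mynamespace -o jsonpath='{.data.release}' | base64 -d | base64 -d | gunzip`, since Helm 3 stores the release record as base64-encoded, gzipped JSON.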
Manual Rollback Procedures
Rolling back to a previous revision takes a single command:
# Rollback to previous revision
helm rollback myapp
# Rollback to specific revision
helm rollback myapp 2
# Rollback with timeout
helm rollback myapp 1 --timeout 5m
# Dry-run rollback
helm rollback myapp 1 --dry-run
Helm performs the rollback by re-applying the manifests stored with that revision and recording the result as a new revision. This is not a reverse diff but a return to the full recorded state.
# Block until the rolled-back resources are ready
helm rollback myapp 1 --wait
# Rollback with cleanup on failure
helm rollback myapp 1 --cleanup-on-fail
The --cleanup-on-fail flag allows Helm to delete new resources created during the rollback if the rollback itself fails.
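The flags above can be combined into one guarded procedure. The following is a sketch only, assuming bash; `safe_rollback` and the `HELM_BIN` variable are hypothetical names introduced here so that a wrapper or stub can stand in for the real helm binary. It runs a dry run first and only performs the real rollback if the dry run succeeds.

```shell
#!/usr/bin/env bash
# HELM_BIN is an assumption of this sketch: it defaults to the real helm
# binary but can be pointed at a wrapper or a stub for testing.
HELM_BIN="${HELM_BIN:-helm}"

# safe_rollback RELEASE REVISION [NAMESPACE]
safe_rollback() {
  local release="$1" revision="$2" namespace="${3:-default}"

  # Step 1: dry run, to catch an invalid revision or release name early.
  "$HELM_BIN" rollback "$release" "$revision" --namespace "$namespace" --dry-run || {
    echo "dry run failed; not rolling back $release" >&2
    return 1
  }

  # Step 2: real rollback; wait for readiness and clean up if it fails.
  "$HELM_BIN" rollback "$release" "$revision" --namespace "$namespace" \
    --wait --timeout 5m --cleanup-on-fail
}
```

A call such as `safe_rollback myapp 2 production` would then replace the separate commands above.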
Automated Rollback in CI/CD
In automated pipelines, you need criteria for when to trigger a rollback:
# GitHub Actions workflow excerpt
- name: Deploy to production
  run: |
    helm upgrade --install myapp ./charts/myapp \
      --namespace production \
      --values ./config/production.yaml \
      --wait --timeout 10m \
      --atomic
  env:
    KUBECONFIG: /tmp/kubeconfig
- name: Verify deployment
  run: |
    # Check rollout status
    kubectl rollout status deployment/myapp -n production
    # Run smoke tests
    curl -f https://myapp.example.com/health || exit 1
    # Check that the metrics query succeeds (the Prometheus URL is a placeholder)
    curl -sf --get http://prometheus:9090/api/v1/query \
      --data-urlencode 'query=app:http_requests_total{app="myapp"}' > /dev/null
The --atomic flag automatically rolls back on failure:
helm upgrade --install myapp ./charts/myapp \
--atomic \
--timeout 5m
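If the upgrade fails or resources do not become ready within the timeout, Helm automatically rolls back to the previous revision. The --atomic flag implies --wait, so specifying both is redundant but harmless. If the failed operation was the initial install, Helm uninstalls the release instead, because there is no earlier revision to return to.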
Prometheus-based rollback:
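# deploy-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: deploy-myapp
  namespace: cicd
spec:
  template:
    spec:
      # Jobs must set restartPolicy to Never or OnFailure
      restartPolicy: Never
      containers:
        - name: deploy
          image: helm/helm:3.14
          command:
            - /bin/bash
            - -c
            - |
              # Deploy
              helm upgrade --install myapp ./charts/myapp \
                --wait --timeout 10m
              # Wait for metrics
              sleep 60
              # Check error rate. The Prometheus URL is a placeholder, and the
              # image is assumed to provide curl and jq.
              ERROR_RATE=$(curl -s http://prometheus:9090/api/v1/query \
                --data-urlencode "query=sum(rate(http_errors_total{app='myapp'}[5m]))" \
                | jq -r '.data.result[0].value[1] // "0"')
              # Compare numerically; [ "$a" > "$b" ] would redirect to a file
              if awk -v r="$ERROR_RATE" 'BEGIN { exit !(r > 0.01) }'; then
                echo "Error rate too high: $ERROR_RATE"
                helm rollback myapp
                exit 1
              fi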
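The threshold comparison is the step most easily broken in shell, because `[` and `test` compare strings or integers, and an unescaped `>` inside them is a redirection rather than a comparison. A minimal sketch of a numeric check, assuming bash and awk are available; the function name `exceeds_threshold` is hypothetical:

```shell
#!/usr/bin/env bash
# exceeds_threshold VALUE THRESHOLD
# Exits 0 only when VALUE is a number strictly greater than THRESHOLD.
# Exits 2 when VALUE is empty or not a plain number (for example "NaN"),
# so a failed metrics query is never mistaken for a healthy reading.
exceeds_threshold() {
  awk -v value="$1" -v threshold="$2" 'BEGIN {
    if (value !~ /^-?[0-9]+(\.[0-9]+)?([eE][-+]?[0-9]+)?$/) exit 2
    exit !(value + 0 > threshold + 0)
  }'
}
```

In a deploy script this could be used as `if exceeds_threshold "$ERROR_RATE" 0.01; then helm rollback myapp; fi`. How to treat a non-numeric reading (exit status 2) is a design choice; this sketch treats it as "not above the threshold", and a stricter pipeline might fail the deploy instead.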
Hooks for Pre/Post Upgrade Tasks
Helm hooks run arbitrary Kubernetes jobs at specific points in the release lifecycle:
# templates/backup-hook.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ .Release.Name }}-pre-upgrade-backup"
  annotations:
    "helm.sh/hook": pre-upgrade
    "helm.sh/hook-weight": "-5"
    "helm.sh/hook-delete-policy": hook-succeeded,hook-failed
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: backup
          image: postgres-client:15
          command:
            - /bin/bash
            - -c
            - |
              pg_dump -h postgres -U app "$DATABASE" > /backups/pre-upgrade.sql

# templates/migration-hook.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ .Release.Name }}-db-migration"
  annotations:
    "helm.sh/hook": post-upgrade
    "helm.sh/hook-weight": "5"
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: myapp/migrator:1.5
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: myapp-db-credentials
                  key: connection-string
      restartPolicy: OnFailure
Hook weights control execution order. Within each hook type, Helm runs hooks in ascending weight order, so lower (including negative) weights run first for both pre-upgrade and post-upgrade hooks.
Delete policies:
- hook-succeeded: Delete the job after successful completion
- hook-failed: Delete the job after failed execution
- before-hook-creation: Delete the existing job before creating a new one
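To make the ordering rule concrete: Helm sorts the hooks of one type by weight in ascending order, then by resource kind, then by name. The sketch below, assuming bash, reproduces a simplified version of that ordering (weight, then name, ignoring kind) for a list of `WEIGHT NAME` pairs. The function name `order_hooks` and the hook names in the usage note are illustrative only.

```shell
#!/usr/bin/env bash
# order_hooks
# Reads "WEIGHT NAME" lines on stdin and prints them in the order the hooks
# would run: ascending numeric weight first, then name as a tie-breaker.
# Simplification: Helm also sorts by resource kind between weight and name.
order_hooks() {
  LC_ALL=C sort -k1,1n -k2,2
}
```

Feeding it `5 db-migration`, `-5 backup`, and `0 notify` puts `backup` first and `db-migration` last, which is how the two weights in the manifests above would order if both hooks were of the same type.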
Rollback Failure Scenarios
Some situations prevent straightforward rollbacks:
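Resource was removed in a later revision:
If the previous version created resources that were deleted in a later version, rollback recreates them from the stored manifest, but any data they held (for example in a deleted PersistentVolumeClaim) is not restored.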
Storage resource changes:
Storage classes and persistent volumes may not be rollback-able depending on their provisioner.
API version deprecations:
If the target revision's manifests use an API version that the cluster no longer serves, for example after a Kubernetes upgrade removed a deprecated API, rollback cannot proceed.
Manual edits:
If someone manually edited Kubernetes resources managed by Helm, rollback may conflict with those changes.
# Force rollback (may leave resources in unexpected state)
helm rollback myapp 1 --force
# Check what will happen before rolling back:
# fetch the manifest stored for revision 1 and diff it against the live cluster
helm get manifest myapp --revision 1 > /tmp/rollback.yaml
kubectl diff -f /tmp/rollback.yaml
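Besides diffing against the live cluster, it can help to compare the stored manifests of two revisions directly, which shows what the rollback is expected to change regardless of manual edits. A minimal sketch, assuming bash (for process substitution) and the standard `diff` tool; `diff_revisions` and `HELM_BIN` are hypothetical names introduced for this example:

```shell
#!/usr/bin/env bash
# HELM_BIN is an assumption of this sketch so a stub can replace helm in tests.
HELM_BIN="${HELM_BIN:-helm}"

# diff_revisions RELEASE FROM TO
# Prints a unified diff between the manifests stored for two revisions.
# Exit status follows diff: 0 when identical, 1 when they differ.
diff_revisions() {
  local release="$1" from="$2" to="$3"
  diff -u \
    <("$HELM_BIN" get manifest "$release" --revision "$from") \
    <("$HELM_BIN" get manifest "$release" --revision "$to")
}
```

For a rollback from revision 3 to revision 1, `diff_revisions myapp 3 1` would show removed lines for what revision 3 introduced and added lines for what revision 1 restores.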
Best Practices for Production
Always use --wait and --timeout:
helm upgrade myapp ./charts/myapp \
--wait \
--timeout 10m \
--cleanup-on-fail
Keep reasonable history:
# Limit the number of revisions Helm keeps per release (the default is 10)
helm upgrade myapp ./charts/myapp --history-max 10
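Before changing the history limit with Helm's `--history-max` flag, it can be useful to see which stored revisions would fall outside the new limit. The sketch below, assuming bash and awk, works purely on a list of secret names and deletes nothing. The function name `revisions_beyond_limit` is hypothetical.

```shell
#!/usr/bin/env bash
# revisions_beyond_limit RELEASE KEEP
# Reads secret names (one per line) on stdin and prints the release secrets
# of RELEASE that are older than its newest KEEP revisions, newest first.
revisions_beyond_limit() {
  local release="$1" keep="$2"
  local prefix="sh.helm.release.v1.${release}.v"

  awk -v prefix="$prefix" '
    index($0, prefix) == 1 {
      rev = substr($0, length(prefix) + 1)     # text after the prefix
      if (rev ~ /^[0-9]+$/) print rev, $0      # keep only numeric revisions
    }' \
    | sort -k1,1nr \
    | awk -v keep="$keep" 'NR > keep { print $2 }'
}
```

The input could come from `kubectl get secrets -n mynamespace -l owner=helm -o custom-columns=NAME:.metadata.name --no-headers`. In normal operation Helm prunes old revisions itself once the limit is set, so this is only an inspection aid.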
Test rollbacks in staging:
# In your staging pipeline
- name: Rollback test
  run: |
    # Deploy current version
    helm upgrade --install myapp ./charts/myapp --namespace staging
    # Rollback immediately
    helm rollback myapp --namespace staging
    # Verify rollback completed
    helm history myapp --namespace staging
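Monitor rollback events. Helm itself does not export Prometheus metrics, so the rule below assumes an exporter or pipeline instrumentation that publishes a helm_release_rollback_total counter:
# Alerting rule
- alert: HelmRollback
  expr: |
    increase(helm_release_rollback_total[5m]) > 0
  labels:
    severity: warning
  annotations:
    summary: "Helm rollback detected"
    description: "Release {{ $labels.name }} rolled back in namespace {{ $labels.namespace }}"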
Rollback Trade-offs
Helm rollback is not always the right tool. Here is how it compares to other recovery strategies.
| Scenario | Helm Rollback | Forward Fix | Blue-Green Deploy |
|---|---|---|---|
| Config change broke things | Fast (single command) | Takes longer | Fast (switch traffic) |
| Bad deployment image tag | Fast | Rebuild and redeploy | Fast switch back |
| Database migration failure | Dangerous (schema already changed) | Correct approach | Roll traffic, fix migration |
| CRD changes | May not work | Often required | Depends on change type |
| Data loss risk | High if storage affected | Lower with proper backup | Low |
The key question: does the previous revision actually represent a safe state? If your upgrade modified persistent data, rolling back may not undo those changes.
Observability Hooks
Track rollback health and deployment reliability with these monitoring practices.
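Key metrics to track (Helm does not expose these natively; the metric names assume an exporter or pipeline instrumentation that publishes them):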
# Rollback frequency by release
sum(rate(helm_release_rollback_total[5m])) by (name, namespace)
# Failed upgrades that triggered atomic rollback
sum(rate(helm_upgrade_failed_total[5m])) by (name, namespace)
# Release revision count (high count = unstable releases)
count(helm_release_info{owner="helm"}) by (name, namespace)
# Time since last successful deployment
time() - helm_release_last_deployed_timestamp
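Because Helm does not publish these metrics on its own, something has to produce them. One possible approach, sketched here under stated assumptions, is to derive a rollback count from the release history and write it in the Prometheus text exposition format, for example for a textfile collector. The function names are hypothetical, and the metric name simply mirrors the one used in the queries above. The sketch relies on Helm recording the description `Rollback to N` for rollback revisions.

```shell
#!/usr/bin/env bash
# count_rollbacks
# Reads revision descriptions (one per line) on stdin and prints how many
# of them were rollbacks, identified by the "Rollback to N" description.
count_rollbacks() {
  awk '/^Rollback to [0-9]+/ { n++ } END { print n + 0 }'
}

# rollback_metric_line RELEASE NAMESPACE COUNT
# Prints one sample line in the Prometheus text exposition format.
rollback_metric_line() {
  local release="$1" namespace="$2" count="$3"
  printf 'helm_release_rollback_total{name="%s",namespace="%s"} %s\n' \
    "$release" "$namespace" "$count"
}
```

The descriptions could be extracted with `helm history myapp -o json | jq -r '.[].description'`, assuming jq is installed. Since history is capped by the history limit, a value derived this way can decrease over time, so it approximates a counter rather than being a true one.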
Alert rules for Helm releases:
# Alert when a rollback occurs
- alert: HelmRollbackExecuted
  expr: increase(helm_release_rollback_total[5m]) > 0
  labels:
    severity: warning
  annotations:
    summary: "Helm rollback executed for {{ $labels.name }}"
    description: "Release {{ $labels.name }} in {{ $labels.namespace }} was rolled back. Investigate the cause."
# Alert when release is in failed state
- alert: HelmReleaseFailed
  expr: helm_release_info{status="failed"} == 1
  labels:
    severity: critical
  annotations:
    summary: "Helm release {{ $labels.name }} is in failed state"
    description: "Release has been failing. Manual intervention may be required."
# Alert on high revision count (unstable release)
- alert: HelmReleaseUnstable
  expr: count(helm_release_info) by (name, namespace) > 10
  labels:
    severity: warning
  annotations:
    summary: "Release {{ $labels.name }} has {{ $value }} revisions"
    description: "High revision count indicates frequent changes or rollbacks. Review release stability."
Debugging commands:
# Get the release history (helm history shows up to 256 revisions by default)
helm history myapp
# See what changed between revisions (requires the helm-diff plugin)
helm diff revision myapp 2 3
# Check current release status
helm status myapp
# Get all values at a specific revision
helm get values myapp --revision 3
# View the manifest that was applied at revision 1
helm get manifest myapp --revision 1 > /tmp/rev1.yaml
Common Pitfalls / Anti-Patterns
Relying on rollback for database migrations
Rollback does not undo database migrations. If your upgrade runs SQL migrations and then the application fails to start, rolling back the Kubernetes Deployment does not roll back the database schema. This leaves your app in a broken state against a migrated schema.
Always design database migrations to be reversible, or design your upgrade process so that a failed migration aborts the Helm upgrade before the new application code is deployed.
Not using --atomic in automated deployments
Without --atomic, a failed upgrade leaves the release in a pending-upgrade or failed state. The next deployment attempt may behave unexpectedly. With --atomic, Helm rolls back automatically, so the pipeline normally ends with either the new release or the previous one deployed. The rollback itself can still fail, so check the release status afterwards.
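When a pipeline does not use --atomic, it has to decide what to do based on the status the release was left in. The following sketch, assuming bash, maps Helm's release statuses to a suggested next step. The function name `recovery_action` and the action labels it prints are hypothetical; the status names are the ones Helm reports.

```shell
#!/usr/bin/env bash
# recovery_action STATUS
# Prints a suggested next step for a release in the given Helm status.
recovery_action() {
  case "$1" in
    deployed)
      echo "none" ;;                 # healthy, nothing to recover
    failed)
      echo "rollback-or-upgrade" ;;  # roll back, or fix the cause and upgrade again
    pending-upgrade|pending-rollback)
      echo "rollback" ;;             # interrupted operation: return to last good revision
    pending-install)
      echo "uninstall" ;;            # interrupted first install: no revision to return to
    *)
      echo "investigate" ;;          # unknown, uninstalling, superseded, and so on
  esac
}
```

The current status can be read with `helm status myapp -o json | jq -r '.info.status'`, assuming jq is installed, and passed to the function.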
Manual edits between deploy and rollback
If someone uses kubectl edit to modify a resource managed by Helm after an upgrade, that change is not recorded in any revision. The next helm upgrade can silently overwrite it wherever it conflicts with the chart, and it complicates rollback since the cluster state no longer matches the stored revision.
If a setting needs to change, change it in the chart or its values and run an upgrade. The helm.sh/resource-policy: keep annotation does not help here: it only stops Helm from deleting a resource, not from updating it.
Not limiting release history
Helm stores every revision as a Kubernetes secret. Helm 3 keeps 10 revisions per release by default, but with --history-max 0 (no limit) hundreds of revisions accumulate over time, bloating the namespace and slowing down helm list and helm history. Set a history limit that covers how far back you realistically need to roll back.
Forgetting to test rollback in staging
A rollback procedure that has never been tested is unreliable. If your first rollback attempt happens during a production incident at 3am, you do not want to discover that your rollback hook has a bug.
Test rollback in staging on every significant release.
Interview Questions
Expected answer points:
- Helm stores every release as a numbered revision in Kubernetes secrets within the release namespace
- Secrets have the name format `sh.helm.release.v1.<release>.v<revision>` and type `helm.sh/release.v1`
- Each install, upgrade, or rollback creates a new revision secret
- You can view history with `helm history` and inspect specific revisions with `helm get values --revision N`
Expected answer points:
- `--atomic` automatically rolls back to the previous successful revision if the upgrade fails or times out
- Ensures you always end up with a working release—either the new one or the rolled-back previous one
- Essential for CI/CD pipelines where manual intervention during failures is impractical
- If the upgrade fails and `--atomic` is set, Helm attempts the rollback before returning the error
Expected answer points:
- Helm rollback only re-applies Kubernetes manifests—it cannot undo database schema changes
- If your upgrade runs SQL migrations that modify schema (e.g., adding columns with non-null constraints), those changes persist after rollback
- Application code may fail against the migrated schema if it was designed for the old schema
- Design migrations to be reversible, or build forward-fix procedures instead of relying on rollback
Expected answer points:
- Hooks run Kubernetes jobs at specific points in release lifecycle: pre-install, post-install, pre-upgrade, post-upgrade, pre-rollback, post-rollback
- Hook weight controls execution order (hooks run in ascending weight order for every hook type)
- Delete policies: `hook-succeeded`, `hook-failed`, `before-hook-creation`
- Pre-upgrade hooks with backup logic run before migration, enabling data preservation before risky changes
Expected answer points:
- Storage classes and persistent volumes may not be rollback-able depending on their provisioner
- PVCs that were deleted in a later version are recreated by rollback as new claims; the data they held is not restored
- Some storage provisioners allow volume expansion but not contraction
- Always check storage state before assuming rollback fully restores previous state
Expected answer points:
- Deploy using `helm upgrade --install` with `--wait --timeout`
- After deployment, wait for metrics to stabilize (sleep N seconds)
- Query Prometheus for error rate or latency thresholds
- If thresholds exceeded, run `helm rollback` and exit with failure
- Without `--atomic`, you must explicitly check and rollback—otherwise release stays in failed state
Expected answer points:
- `helm rollback` actually applies changes to the cluster
- `helm template` renders the chart locally without connecting to the cluster; useful for seeing what a chart would produce
- `helm diff revision A B` (from the helm-diff plugin) shows what changed between two revisions
- Use `helm get manifest --revision N` to preview the rollback state before executing
Expected answer points:
- API version deprecation—if upgraded to a newer API version no longer available in old revision, rollback cannot proceed
- Resources manually edited after upgrade—rollback conflicts with those changes
- Stateful workload storage changes that persist despite rollback
- `--force` flag can override but may leave resources in unexpected state
Expected answer points:
- Add rollback test step in staging pipeline: deploy, immediately rollback, verify rollback completed
- Test with every significant release, not just once during initial setup
- Verify hooks run correctly (backup jobs succeed, migration jobs complete)
- A rollback procedure that has never been tested is unreliable when you need it at 3am
Expected answer points:
- Rollback frequency by release (increase in helm_release_rollback_total)
- Failed upgrades that triggered atomic rollback (helm_upgrade_failed_total)
- Release revision count (high count indicates unstable releases)
- Time since last successful deployment
- Alert when rollback occurs, when release is in failed state, or when revision count exceeds threshold
Expected answer points:
- Helm rollback does not fetch charts from the repository — it uses the manifests stored in release secrets
- Release secrets (sh.helm.release.v1.*) in Kubernetes contain the full rendered manifests
- Even if the chart version is no longer in the repository, rollback can apply the stored manifests
- Problems occur if the release secrets themselves have been deleted or pruned by the history limit, because the target revision then no longer exists
- Hooks are stored with the release record too, so rollback hooks do not need the chart tarball
- Best practice: still keep chart versions available in the repository, so you can redeploy an old version with `helm upgrade --version` when a plain rollback is not possible
Expected answer points:
- `helm.sh/resource-policy: keep` prevents Helm from deleting specific resources during upgrade or rollback
- Use it for resources in the chart that must outlive an upgrade or uninstall, such as PVCs or secrets that other systems depend on
- `helm.sh/hook-delete-policy` controls when hook resources (jobs, pods) are deleted after hook execution
- Hook delete policies: hook-succeeded (delete if hook completed successfully), hook-failed (delete if failed), before-hook-creation (delete existing before creating new)
- Resource policy keep is for persistent resources; hook delete policy is for temporary hook workloads
- Both are annotations on resources — they serve different purposes and can be used together
Expected answer points:
- Design upgrade strategy: migrate first, then deploy new application version
- Pre-upgrade hook runs database migration with retry logic before new pods start
- If migration fails: hook fails, Helm upgrade stops, previous release remains deployed
- Add init container to application that waits for migration completion before starting app
- Blue-green deployment: deploy new version alongside old, run migration, switch traffic, delete old
- Backward-compatible database migrations: add columns with defaults, never remove columns in same release as code change
Expected answer points:
- Helm tests are hook resources (typically Pods or Jobs) that you run on demand to verify the release is working
- Test resources carry the `helm.sh/hook: test` annotation (`test-success` is the older name, still accepted for backward compatibility)
- Tests should verify: service is reachable, health endpoint returns 200, application can read/write data
- Run tests with `helm test myapp`; add `--logs` to print the test pod logs
- Tests do not run automatically on install or upgrade; call `helm test` as a pipeline step once the release is ready
- Write meaningful tests: smoke tests that catch real failures, not just "pod is running"
Expected answer points:
- Use `helm upgrade --reuse-values` with `--set` to update secret values without changing other config
- For External Secrets Operator: secrets auto-refresh from secret store — no Helm upgrade needed
- For sealed secrets: deploy new sealed secret, then delete old secret after verification
- Strategy: deploy new secret alongside old, verify new secret works, remove old secret
- Avoid `helm.sh/resource-policy: keep` on secrets — old secret references may break if recreated
- Test rollback procedure for secret rotation — verify you can roll back to previous secret if needed
Expected answer points:
- Use release name that indicates purpose: `myapp-prod`, not `myapp`
- Install each application to its own namespace — don't mix unrelated applications
- One release per application per namespace; multiple releases of the same chart in one namespace conflict unless every resource name includes the release name
- Use `--generate-name` only for temporary test installs, never in production
- Track releases in documentation — which release is in which namespace and what it does
- Use labels and annotations on releases to track ownership, contact, and purpose
Expected answer points:
- Helm does not queue operations: while the latest revision is in a pending state, it rejects any new install, upgrade, or rollback
- Second upgrade attempt receives error: "another operation (install/upgrade/rollback) is in progress"
- Use `--wait` so each pipeline step blocks until its own operation has finished
- In CI/CD: implement locking mechanism (flock, database mutex) to prevent concurrent deploys
- If a deploy dies mid-operation and leaves the release in pending-upgrade state: use `helm rollback` to return to the last deployed revision
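The flock approach mentioned in the list above can be sketched as follows, assuming bash, the `flock` utility from util-linux, and a lock file on a filesystem shared by all deploy jobs on the same runner. The function name `with_deploy_lock` and the exit status 75 are choices made for this example, not a Helm convention.

```shell
#!/usr/bin/env bash
# with_deploy_lock LOCKFILE COMMAND [ARGS...]
# Runs COMMAND while holding an exclusive lock on LOCKFILE.
# If another process already holds the lock, returns 75 without running it.
with_deploy_lock() {
  local lockfile="$1"
  shift
  (
    # File descriptor 9 is opened on the lock file by the redirection below.
    flock -n 9 || {
      echo "another deploy holds $lockfile" >&2
      exit 75
    }
    "$@"
  ) 9>"$lockfile"
}
```

A pipeline step could then run `with_deploy_lock /var/lock/myapp-deploy.lock helm upgrade --install myapp ./charts/myapp --atomic`. This only serializes deploys that share the lock file; runners on different machines would need a distributed lock instead.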
Expected answer points:
- Helm native support for canary is limited — use Helmfile or additional tools for complex strategies
- Simple canary: `helm upgrade --set replicaCount=1` for canary, then gradually increase
- Use Argo Rollouts or Flagger for production-grade canary with traffic splitting
- Argo Rollouts defines rollout strategy in CRD — separate from Helm values
- Promote canary: update Helm release with canary values; rollback if metrics degrade
- Monitor error rate and latency during canary period before full promotion
Expected answer points:
- `helm install` creates a new release; fails if release with same name already exists
- `helm upgrade --install` creates release if it doesn't exist, or upgrades if it does — idempotent
- Use `--install` in CI/CD for repeatable deployments — works for both initial install and updates
- `helm install` is useful for testing with temporary random names via `--generate-name`
- `--install` combined with `--atomic` ensures clean state: a failed upgrade is rolled back to the previous release, and a failed first install is uninstalled
Expected answer points:
- Release is left in `pending-upgrade` or `failed` state if upgrade fails without `--atomic`
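- Recovery from `failed`: fix the cause and run `helm upgrade myapp mychart --atomic` again; recovery from `pending-upgrade`: run `helm rollback myapp` first, because upgrades are rejected while an operation is pending
- If rollback also fails: use `helm rollback myapp --force`, which replaces resources through delete and recreate where needed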
- Check what went wrong: `helm status myapp`, `kubectl get events`, application logs
- For persistent failures: use `helm template` to render manifests, apply manually with `kubectl` for debugging
- Delete release in failed state: `helm uninstall myapp --keep-history` then manually clean up orphaned resources
Further Reading
- Helm Charts - Comprehensive Helm chart development guide
- Helm Repository Management - ChartMuseum, Harbor, and repository strategies
- OCI Artifacts Distribution - Chart distribution via OCI registries
- CI/CD Pipelines - Deployment pipeline patterns
- GitOps with ArgoCD and Flux - GitOps deployment strategies
- Helm Documentation - Official Helm documentation
Conclusion
Key Takeaways
- Helm stores every release as a numbered revision; rollback replays the stored state of that revision
- Use `--atomic` for automated pipelines to guarantee clean state on failure
- Design migrations to be reversible or build forward-fix procedures instead of relying on rollback
- Test rollback in staging before relying on it in production
- Monitor rollback events with Prometheus alerts to catch problems early
Rollback Checklist
# Check release history
helm history myapp
# See what values were deployed at each revision
helm get values myapp --revision 2
# Rollback to previous working revision
helm rollback myapp --wait --timeout 5m
# Rollback to specific revision
helm rollback myapp 1 --wait
# Check rollback succeeded
helm status myapp
helm history myapp
Helm revision history makes rollback straightforward in most cases. Use --atomic for automatic rollback in CI/CD, implement proper hooks for data migration, and test rollback procedures regularly in non-production environments. For more on Helm, see our Helm Charts overview, and for deployment strategies that reduce rollback need, see our CI/CD Pipelines guide. For GitOps patterns with ArgoCD, see our GitOps with ArgoCD and Flux post.