Helm Versioning and Rollback: Managing Application Releases
Master Helm release management: revision history, automated rollbacks, rollback strategies, and handling failed releases gracefully.
Helm tracks every release installation and upgrade as a numbered revision. Understanding this history enables confident rollbacks when things go wrong.
When to Use / When Not to Use
When Helm rollback makes sense
Helm rollback is the right tool when you have an intact release history and the previous state is known and safe. Database schema changes that went wrong, config changes that broke behavior, and canary deployments that revealed problems are all good rollback candidates. The --atomic flag makes automated rollback in CI/CD practical for these cases.
Helm rollback also works well when your application is stateless or when storage resources are not affected by the change. A bad Deployment image tag or a misconfigured ConfigMap rolls back cleanly.
When rollback is not enough
If your upgrade included database migrations that modified schema, rollback does not undo those changes. PostgreSQL migrations that added columns with non-null constraints, MongoDB schema changes, and any irreversible data transformation cannot be fixed by rolling back the Kubernetes resources.
In these cases, you need forward-fix migrations, not rollback. Design your upgrade process to handle this before deploying.
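As a rough sketch of that decision, a deploy script can branch on whether the failed upgrade's migration hook already ran. Everything here is illustrative: the `MIGRATIONS_RAN` flag, chart path, and hotfix version are assumptions, not Helm conventions.

```shell
# Hypothetical sketch: choose rollback vs forward-fix after a failed upgrade.
# MIGRATIONS_RAN would be set by your pipeline (e.g. by the migration hook job).
recovery_action() {
  # $1 = "yes" if schema migrations completed, "no" otherwise
  if [ "$1" = "yes" ]; then
    echo "forward-fix"   # schema already changed: ship a corrective release
  else
    echo "rollback"      # cluster-only change: helm rollback is safe
  fi
}

# Only attempt the real commands when helm and the chart are available.
if command -v helm >/dev/null 2>&1 && [ -d ./charts/myapp ]; then
  case "$(recovery_action "${MIGRATIONS_RAN:-no}")" in
    forward-fix)
      # Hypothetical hotfix chart version containing the corrective migration
      helm upgrade myapp ./charts/myapp --version 1.2.1 --wait --timeout 10m ;;
    rollback)
      helm rollback myapp --wait ;;
  esac
fi
```

The point is that the branch is decided *before* touching the cluster, based on information the pipeline already has.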
Rollback Decision Flow
flowchart TD
A[Deployment fails<br/>or degraded] --> B{Upgrade or<br/>Config change?}
B -->|Config| C{Rollback<br/>fixes it?}
B -->|Migration| D[Forward-fix<br/>required]
C -->|Yes| E[helm rollback<br/>to previous revision]
C -->|No| D
E --> F{Stateful<br/>resources affected?}
F -->|Yes| G[Backup data<br/>then rollback]
F -->|No| H[Rollback safe<br/>proceed]
Release Revision History
Helm stores release history in Kubernetes secrets within the release namespace. Each time you install, upgrade, or roll back, Helm creates a new revision.
# List releases
helm list
# List releases with history
helm history myapp
REVISION  UPDATED              STATUS      CHART        DESCRIPTION
1         2026-03-20 10:30:00  superseded  myapp-1.0.0  Install complete
2         2026-03-21 14:15:00  superseded  myapp-1.1.0  Upgrade complete
3         2026-03-22 09:00:00  deployed    myapp-1.2.0  Upgrade complete
# Get detailed release status
helm status myapp
# Show release values at any revision
helm get values myapp --revision 2
# Show all hooks and values
helm get all myapp --revision 3
History is stored as Kubernetes secrets:
kubectl get secrets -n mynamespace -l "owner=helm"
NAME                          TYPE                DATA  AGE
sh.helm.release.v1.myapp.v1   helm.sh/release.v1  1     5d
sh.helm.release.v1.myapp.v2   helm.sh/release.v1  1     4d
sh.helm.release.v1.myapp.v3   helm.sh/release.v1  1     3d
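If you are curious what those secrets contain: Helm stores the release record gzipped and base64-encoded inside the Secret's data field, and kubectl's jsonpath output base64-encodes it again, so reading it takes two decodes and a gunzip. A small helper (the secret name and namespace below are from the listing above):

```shell
# Decode a Helm release secret's payload (double base64, then gzip).
decode_release() {
  base64 -d | base64 -d | gunzip -c
}

# Usage against the secrets listed above (requires cluster access):
# kubectl get secret sh.helm.release.v1.myapp.v3 -n mynamespace \
#   -o jsonpath='{.data.release}' | decode_release
```

The decoded payload is the JSON release record Helm replays on rollback.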
Manual Rollback Procedures
Rolling back to a previous revision takes a single command:
# Rollback to previous revision
helm rollback myapp
# Rollback to specific revision
helm rollback myapp 2
# Rollback with timeout
helm rollback myapp 1 --timeout 5m
# Dry-run rollback
helm rollback myapp 1 --dry-run
Helm performs the rollback by applying the stored manifests from that revision; this is not a reverse diff but a full reapplication of the recorded state. A rollback also creates a new revision rather than rewinding history: rolling back from revision 3 to revision 1 produces revision 4, with the description "Rollback to 1".
# Watch rollback progress
helm rollback myapp 1 --wait
# Rollback with cleanup on failure
helm rollback myapp 1 --cleanup-on-fail
The --cleanup-on-fail flag causes Helm to delete new resources that were created by the failed upgrade but are not present in the rollback target revision.
Automated Rollback in CI/CD
In automated pipelines, you need criteria for when to trigger a rollback:
# GitHub Actions workflow excerpt
- name: Deploy to production
run: |
helm upgrade --install myapp ./charts/myapp \
--namespace production \
--values ./config/production.yaml \
--wait --timeout 10m \
--atomic
env:
KUBECONFIG: /tmp/kubeconfig
- name: Verify deployment
run: |
# Check rollout status
kubectl rollout status deployment/myapp -n production
# Run smoke tests
curl -f https://myapp.example.com/health || exit 1
# Check metrics ("prometheus query" is a placeholder for your metrics CLI, e.g. promtool)
prometheus query "app:http_requests_total{app='myapp'}" > /dev/null
The --atomic flag automatically rolls back on failure:
helm upgrade --install myapp ./charts/myapp \
--atomic \
--timeout 5m
If the upgrade fails or pods do not become ready within the timeout, Helm automatically rolls back to the previous successful revision. Note that --atomic implies --wait, so readiness is checked even if you do not pass --wait explicitly.
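One way to make the pipeline aware that an automatic rollback happened is to inspect the newest history entry: a rolled-back release's description starts with "Rollback to". A sketch of that check (the jq-based usage line is illustrative):

```shell
# Return success if the newest revision's DESCRIPTION indicates a rollback.
was_rolled_back() {
  # $1 = DESCRIPTION field of the newest `helm history` entry
  case "$1" in
    "Rollback to"*) return 0 ;;
    *)              return 1 ;;
  esac
}

# In CI (illustrative): fail the job loudly if --atomic rolled us back.
# DESC=$(helm history myapp -o json | jq -r '.[-1].description')
# if was_rolled_back "$DESC"; then echo "atomic rollback occurred"; exit 1; fi
```

Failing the job even though the cluster is healthy keeps the rollback visible instead of letting --atomic quietly mask a bad release.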
Prometheus-based rollback:
# deploy-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
name: deploy-myapp
namespace: cicd
spec:
template:
spec:
containers:
- name: deploy
image: alpine/helm:3.14.0  # a community helm CLI image; substitute your own
command:
- /bin/bash
- -c
- |
# Deploy
helm upgrade --install myapp ./charts/myapp \
--wait --timeout 10m
# Wait for metrics
sleep 60
# Check error rate ("prometheus query" is a placeholder for your metrics CLI)
ERROR_RATE=$(prometheus query "rate(http_errors_total{app='myapp'}[5m])")
# [ "$ERROR_RATE" > "0.01" ] would be a string redirect, not a numeric
# comparison, so compare floats with awk instead
if awk -v r="$ERROR_RATE" 'BEGIN { exit !((r + 0) > 0.01) }'; then
  echo "Error rate too high: $ERROR_RATE"
  helm rollback myapp
  exit 1
fi
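The error-rate gate in the job above needs a floating-point comparison, which POSIX `[` cannot do (`[ "$a" > "$b" ]` is parsed as a string test plus an output redirect). A small awk-based helper, shown here as a sketch, keeps that check correct and reusable:

```shell
# Exit 0 when VALUE exceeds THRESHOLD, comparing as floats via awk.
# The +0 forces numeric comparison (otherwise awk may compare as strings).
rate_exceeds() {
  # usage: rate_exceeds VALUE THRESHOLD
  awk -v v="$1" -v t="$2" 'BEGIN { exit !((v + 0) > (t + 0)) }'
}

# e.g. in the deploy job:
# if rate_exceeds "$ERROR_RATE" 0.01; then helm rollback myapp; exit 1; fi
```

Forcing numeric context matters: a string comparison would claim "9" is greater than "10".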
Hooks for Pre/Post Upgrade Tasks
Helm hooks run arbitrary Kubernetes jobs at specific points in the release lifecycle:
# templates/backup-hook.yaml
apiVersion: batch/v1
kind: Job
metadata:
name: "{{ .Release.Name }}-pre-upgrade-backup"
annotations:
"helm.sh/hook": pre-upgrade
"helm.sh/hook-weight": "-5"
"helm.sh/hook-delete-policy": hook-succeeded,hook-failed
spec:
template:
spec:
restartPolicy: OnFailure
containers:
- name: backup
image: postgres-client:15
command:
- /bin/bash
- -c
- |
pg_dump -h postgres -U app $DATABASE > /backups/pre-upgrade.sql
# templates/migration-hook.yaml
apiVersion: batch/v1
kind: Job
metadata:
name: "{{ .Release.Name }}-db-migration"
annotations:
"helm.sh/hook": post-upgrade
"helm.sh/hook-weight": "5"
"helm.sh/hook-delete-policy": hook-succeeded
spec:
template:
spec:
containers:
- name: migrate
image: myapp/migrator:1.5
env:
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: myapp-db-credentials
key: connection-string
restartPolicy: OnFailure
Hook weights control execution order: hooks of the same type are sorted by weight in ascending order, so lower weights always run first. Weights may be negative, as in the -5 backup hook above.
Delete policies:
- hook-succeeded: delete the job after it completes successfully
- hook-failed: delete the job after it fails
- before-hook-creation: delete any existing hook resource before creating the new one (the default)
Rollback Failure Scenarios
Some situations prevent straightforward rollbacks:
Resource no longer exists in old revision:
If the target revision included resources that were deleted in a later version, rollback recreates them from the stored manifests, but any data they held (for example, the contents of a deleted PVC) is not restored.
Storage resource changes:
StorageClasses and PersistentVolumes may not be possible to roll back, depending on the provisioner; volume expansion, for instance, is typically one-way.
API version deprecations:
If the target revision's manifests use an API version the cluster no longer serves (for example, after a Kubernetes upgrade removed a deprecated API), rollback cannot apply them. The mapkubeapis Helm plugin can rewrite stored release records to supported API versions.
Manual edits:
If someone manually edited Kubernetes resources managed by Helm, rollback may conflict with those changes.
# Force rollback (may leave resources in unexpected state)
helm rollback myapp 1 --force
# Check what will happen before rolling back
# (helm template cannot render a past revision; helm get manifest reads the stored release)
helm get manifest myapp --revision 1 > /tmp/rollback.yaml
kubectl diff -f /tmp/rollback.yaml
Best Practices for Production
Always use --wait and --timeout:
helm upgrade myapp ./charts/myapp \
--wait \
--timeout 10m \
--cleanup-on-fail
Keep reasonable history:
# Cap stored revisions per release (Helm 3 defaults to 10)
helm upgrade myapp ./charts/myapp --history-max 5
Test rollbacks in staging:
# In your staging pipeline
- name: Rollback test
run: |
# Deploy current version
helm upgrade --install myapp ./charts/myapp --namespace staging
# Rollback immediately (requires at least two revisions in history)
helm rollback myapp --namespace staging
# Verify rollback completed
helm history myapp --namespace staging
Monitor rollback events:
# Alerting rule
- alert: HelmRollback
expr: |
increase(helm_release_rollback_total[5m]) > 0
labels:
severity: warning
annotations:
summary: "Helm rollback detected"
description: "Release {{ $labels.name }} rolled back in namespace {{ $labels.namespace }}"
Rollback Trade-offs
Helm rollback is not always the right tool. Here is how it compares to other recovery strategies.
| Scenario | Helm Rollback | Forward Fix | Blue-Green Deploy |
|---|---|---|---|
| Config change broke things | Fast (single command) | Takes longer | Fast (switch traffic) |
| Bad deployment image tag | Fast | Rebuild and redeploy | Fast switch back |
| Database migration failure | Dangerous (schema already changed) | Correct approach | Roll traffic, fix migration |
| CRD changes | May not work | Often required | Depends on change type |
| Data loss risk | High if storage affected | Lower with proper backup | Low |
The key question: does the previous revision actually represent a safe state? If your upgrade modified persistent data, rolling back may not undo those changes.
Observability Hooks
Track rollback health and deployment reliability with these monitoring practices.
Key metrics to track:
# Rollback frequency by release
sum(rate(helm_release_rollback_total[5m])) by (name, namespace)
# Failed upgrades that triggered atomic rollback
sum(rate(helm_upgrade_failed_total[5m])) by (name, namespace)
# Release revision count (high count = unstable releases)
count(helm_release_info{owner="helm"}) by (name, namespace)
# Time since last successful deployment
time() - helm_release_last_deployed_timestamp
Alert rules for Helm releases:
# Alert when a rollback occurs
- alert: HelmRollbackExecuted
expr: increase(helm_release_rollback_total[5m]) > 0
labels:
severity: warning
annotations:
summary: "Helm rollback executed for {{ $labels.name }}"
description: "Release {{ $labels.name }} in {{ $labels.namespace }} was rolled back. Investigate the cause."
# Alert when release is in failed state
- alert: HelmReleaseFailed
expr: helm_release_info{status="failed"} == 1
labels:
severity: critical
annotations:
summary: "Helm release {{ $labels.name }} is in failed state"
description: "Release has been failing. Manual intervention may be required."
# Alert on high revision count (unstable release)
- alert: HelmReleaseUnstable
expr: count(helm_release_info) by (name, namespace) > 10
labels:
severity: warning
annotations:
summary: "Release {{ $labels.name }} has {{ $value }} revisions"
description: "High revision count indicates frequent changes or rollbacks. Review release stability."
Debugging commands:
# Get release history (raise --max if you keep many revisions)
helm history myapp --max 50
# See what changed between revisions (requires the helm-diff plugin)
helm diff revision myapp 2 3
# Check current release status
helm status myapp
# Get all values at a specific revision
helm get values myapp --revision 3
# View the manifest that was applied at revision 1
helm get manifest myapp --revision 1 > /tmp/rev1.yaml
Common Pitfalls / Anti-Patterns
Relying on rollback for database migrations
Rollback does not undo database migrations. If your upgrade runs SQL migrations and then the application fails to start, rolling back the Kubernetes Deployment does not roll back the database schema. This leaves your app in a broken state against a migrated schema.
Always design database migrations to be reversible, or design your upgrade process so that a failed migration aborts the Helm upgrade before the new application code is deployed.
Not using --atomic in automated deployments
Without --atomic, a failed upgrade leaves the release in a pending-upgrade or failed state. The next deployment attempt may behave unexpectedly. --atomic guarantees you always end up with a working release, either the new one or the rolled-back previous one.
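When a release does get stuck in pending-upgrade (for example, a CI job killed mid-deploy), the usual recovery is to roll back to the last revision whose status is deployed. A parsing sketch, assuming history output shaped like the example earlier in this post (in production, prefer `helm history -o json` since real date formats shift the columns):

```shell
# Print the highest revision whose STATUS column is "deployed".
# Assumes rows shaped like "REVISION DATE TIME STATUS CHART DESCRIPTION",
# as in the sample output earlier, so STATUS is field 4.
last_deployed() {
  awk '$4 == "deployed" { rev = $1 } END { if (rev != "") print rev }'
}

# Recovery (illustrative):
# helm rollback myapp "$(helm history myapp | last_deployed)" --wait
```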
Manual edits between deploy and rollback
If someone uses kubectl edit to modify a resource managed by Helm, the live cluster state no longer matches any revision Helm has recorded. The next helm upgrade may silently revert the edit, since Helm reapplies chart-managed fields, and rollback becomes harder to reason about because the stored revisions never described what was actually running.
Note that the helm.sh/resource-policy: keep annotation only prevents Helm from deleting a resource on uninstall or when it is removed from the chart; it does not exclude the resource from upgrades. If a resource genuinely needs manual management, move it out of the chart rather than editing it in place.
Not limiting release history
Helm stores every revision as a Kubernetes secret. Without limits, hundreds of revisions accumulate over time, bloating the namespace and slowing down helm list and helm history. Set reasonable history limits and prune old revisions.
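The durable fix is --history-max at upgrade time; a quick revision count can also feed a CI warning before a namespace fills with release secrets. The parsing sketch below assumes the `helm history` format shown earlier:

```shell
# Count revisions in `helm history` output (skip the header line).
revision_count() {
  awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }'
}

# Usage (illustrative):
#   helm history myapp | revision_count
# And the durable fix, set on every deploy (Helm 3 defaults to keeping 10):
#   helm upgrade myapp ./charts/myapp --history-max 5
```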
Forgetting to test rollback in staging
A rollback procedure that has never been tested is unreliable. If your first rollback attempt happens during a production incident at 3am, you do not want to discover that your rollback hook has a bug.
Test rollback in staging on every significant release.
Quick Recap
Key Takeaways
- Helm stores every release as a numbered revision; rollback replays the stored state of that revision
- Use --atomic for automated pipelines to guarantee clean state on failure
- Design migrations to be reversible or build forward-fix procedures instead of relying on rollback
- Test rollback in staging before relying on it in production
- Monitor rollback events with Prometheus alerts to catch problems early
Rollback Checklist
# Check release history
helm history myapp
# See what values were deployed at each revision
helm get values myapp --revision 2
# Rollback to previous working revision
helm rollback myapp --wait --timeout 5m
# Rollback to specific revision
helm rollback myapp 1 --wait
# Check rollback succeeded
helm status myapp
helm history myapp
Conclusion
Helm revision history makes rollback straightforward in most cases. Use --atomic for automatic rollback in CI/CD, implement proper hooks for data migration, and test rollback procedures regularly in non-production environments. For more on Helm, see our Helm Charts overview, and for deployment strategies that reduce rollback need, see our CI/CD Pipelines guide. For GitOps patterns with ArgoCD, see our GitOps with ArgoCD and Flux post.