Helm Versioning and Rollback: Managing Application Releases

Master Helm release management: revision history, automated rollbacks, rollback strategies, and handling failed releases gracefully.


Helm tracks every release installation and upgrade as a numbered revision. Understanding this history enables confident rollbacks when things go wrong.

When to Use / When Not to Use

When Helm rollback makes sense

Helm rollback is the right tool when you have an intact release history and the previous state is known and safe. Database schema changes that went wrong, config changes that broke behavior, and canary deployments that revealed problems are all good rollback candidates. The --atomic flag makes automated rollback in CI/CD practical for these cases.

Helm rollback also works well when your application is stateless or when storage resources are not affected by the change. A bad Deployment image tag or a misconfigured ConfigMap rolls back cleanly.

When rollback is not enough

If your upgrade included database migrations that modified schema, rollback does not undo those changes. PostgreSQL migrations that added columns with non-null constraints, MongoDB schema changes, and any irreversible data transformation cannot be fixed by rolling back the Kubernetes resources.

In these cases, you need forward-fix migrations, not rollback. Design your upgrade process to handle this before deploying.

Rollback Decision Flow

flowchart TD
    A[Deployment fails<br/>or degraded] --> B{Upgrade or<br/>Config change?}
    B -->|Config| C{Rollback<br/>fixes it?}
    B -->|Migration| D[Forward-fix<br/>required]
    C -->|Yes| E[helm rollback<br/>to previous revision]
    C -->|No| D
    E --> F{Stateful<br/>resources affected?}
    F -->|Yes| G[Backup data<br/>then rollback]
    F -->|No| H[Rollback safe<br/>proceed]

Release Revision History

Helm stores release history in Kubernetes secrets within the release namespace. Each time you install, upgrade, or roll back, Helm creates a new revision.

# List releases
helm list

# List releases with history
helm history myapp

REVISION  UPDATED              STATUS      CHART        DESCRIPTION
1         2026-03-20 10:30:00  superseded  myapp-1.0.0  Install complete
2         2026-03-21 14:15:00  superseded  myapp-1.1.0  Upgrade complete
3         2026-03-22 09:00:00  deployed    myapp-1.2.0  Upgrade complete

# Get detailed release status
helm status myapp

# Show release values at any revision
helm get values myapp --revision 2

# Show all hooks and values
helm get all myapp --revision 3
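Scripting against the table output is brittle but common. A minimal sketch that pulls the currently deployed revision number with awk; the sample output below stands in for the live command:

```shell
# Sample stands in for: helm history myapp
history_output='REVISION  UPDATED              STATUS      CHART        DESCRIPTION
1         2026-03-20 10:30:00  superseded  myapp-1.0.0  Install complete
2         2026-03-21 14:15:00  superseded  myapp-1.1.0  Upgrade complete
3         2026-03-22 09:00:00  deployed    myapp-1.2.0  Upgrade complete'

# UPDATED spans two whitespace-separated fields, so STATUS is field 4
current=$(echo "$history_output" | awk '$4 == "deployed" {print $1}')
echo "currently deployed revision: $current"
```

For real automation, prefer `helm history myapp -o json` and parse it with jq instead of scraping the table.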

History is stored as Kubernetes secrets:

kubectl get secrets -n mynamespace -l "owner=helm"

NAME                     TYPE       DATA  AGE
sh.helm.release.v1.myapp.v1   helm.sh/release.v1   1   5d
sh.helm.release.v1.myapp.v2   helm.sh/release.v1   1   4d
sh.helm.release.v1.myapp.v3   helm.sh/release.v1   1   3d
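Each of these secrets holds the complete release payload, gzip-compressed and base64-encoded by Helm, with Kubernetes adding a second base64 layer on top. A sketch of the decode chain, simulated locally so it runs without a cluster; the commented command shows the equivalent against a real (hypothetical) release:

```shell
# Helm stores each revision gzipped and base64-encoded in the Secret's
# 'release' key; the Kubernetes API base64-encodes that value once more.
# Simulate the double encoding locally, then decode it back:
payload=$(printf '{"name":"myapp","version":2}' | gzip -c | base64 -w0 | base64 -w0)
echo "$payload" | base64 -d | base64 -d | gunzip

# Against a real cluster the same chain decodes a stored revision:
# kubectl get secret sh.helm.release.v1.myapp.v2 -n mynamespace \
#   -o jsonpath='{.data.release}' | base64 -d | base64 -d | gunzip
```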

Manual Rollback Procedures

Rolling back to a previous revision takes a single command:

# Rollback to previous revision
helm rollback myapp

# Rollback to specific revision
helm rollback myapp 2

# Rollback with timeout
helm rollback myapp 1 --timeout 5m

# Dry-run rollback
helm rollback myapp 1 --dry-run

Helm performs the rollback by re-applying the stored manifests from the target revision. This is not a reverse diff but a full re-application of that state, and the rollback itself is recorded as a new revision (rolling back from revision 3 to revision 2 creates revision 4).

# Watch rollback progress
helm rollback myapp 1 --wait

# Rollback with cleanup on failure
helm rollback myapp 1 --cleanup-on-fail

The --cleanup-on-fail flag causes Helm to delete new resources that were created by the failed upgrade but are not present in the rollback target revision.
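In automation you usually want to roll back to the last known-good revision rather than blindly to the previous one, since the previous revision may itself be a failed upgrade. A sketch that picks the newest superseded revision from `helm history`'s JSON output (jq assumed available; the sample JSON stands in for the live command):

```shell
# Sample stands in for: helm history myapp -o json
history='[{"revision":1,"status":"superseded"},
          {"revision":2,"status":"superseded"},
          {"revision":3,"status":"failed"}]'

# Newest revision that completed successfully = safest rollback target
target=$(echo "$history" \
  | jq '[.[] | select(.status=="superseded")] | max_by(.revision) | .revision')
echo "rollback target: $target"
# helm rollback myapp "$target" --wait
```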

Automated Rollback in CI/CD

In automated pipelines, you need criteria for when to trigger a rollback:

# GitHub Actions workflow excerpt
- name: Deploy to production
  run: |
    helm upgrade --install myapp ./charts/myapp \
      --namespace production \
      --values ./config/production.yaml \
      --wait --timeout 10m \
      --atomic

  env:
    KUBECONFIG: /tmp/kubeconfig

- name: Verify deployment
  run: |
    # Check rollout status
    kubectl rollout status deployment/myapp -n production

    # Run smoke tests
    curl -f https://myapp.example.com/health || exit 1

    # Check metrics ('prometheus query' is a placeholder; substitute your
    # Prometheus client or a call to the Prometheus HTTP API)
    prometheus query "app:http_requests_total{app='myapp'}" > /dev/null

The --atomic flag automatically rolls back on failure:

helm upgrade --install myapp ./charts/myapp \
  --atomic \
  --timeout 5m

If the upgrade fails or pods do not become ready within the timeout, Helm automatically rolls back to the previous successful revision.

Prometheus-based rollback:

# deploy-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: deploy-myapp
  namespace: cicd
spec:
  template:
    spec:
      containers:
        - name: deploy
          image: helm/helm:3.14
          command:
            - /bin/bash
            - -c
            - |
              # Deploy
              helm upgrade --install myapp ./charts/myapp \
                --wait --timeout 10m

              # Wait for metrics
              sleep 60

              # Check error rate ('prometheus query' is a placeholder CLI;
              # substitute a call to your Prometheus HTTP API)
              ERROR_RATE=$(prometheus query "rate(http_errors_total{app='myapp'}[5m])")
              # [ "$a" > "$b" ] compares strings, so use awk for the float test
              if awk -v r="$ERROR_RATE" 'BEGIN { exit !(r > 0.01) }'; then
                echo "Error rate too high: $ERROR_RATE"
                helm rollback myapp
                exit 1
              fi

Hooks for Pre/Post Upgrade Tasks

Helm hooks run arbitrary Kubernetes jobs at specific points in the release lifecycle:

# templates/backup-hook.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ .Release.Name }}-pre-upgrade-backup"
  annotations:
    "helm.sh/hook": pre-upgrade
    "helm.sh/hook-weight": "-5"
    "helm.sh/hook-delete-policy": hook-succeeded,hook-failed
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: backup
          image: postgres-client:15
          command:
            - /bin/bash
            - -c
            - |
              pg_dump -h postgres -U app $DATABASE > /backups/pre-upgrade.sql

# templates/migration-hook.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ .Release.Name }}-db-migration"
  annotations:
    "helm.sh/hook": post-upgrade
    "helm.sh/hook-weight": "5"
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: myapp/migrator:1.5
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: myapp-db-credentials
                  key: connection-string
      restartPolicy: OnFailure

Hook weights control execution order. Within each hook event, Helm sorts hooks by weight in ascending order, so lower weights always run first.

Delete policies:

  • hook-succeeded: Delete job after successful completion
  • hook-failed: Delete job after failed execution
  • before-hook-creation: Delete existing job before creating new one

Rollback Failure Scenarios

Some situations prevent straightforward rollbacks:

Resources from the old revision no longer exist:

If the rollback target revision defines resources that a later release deleted, rollback recreates them. The recreated objects start fresh, so a recreated PVC or Job will not contain the data or state it once held.

Storage resource changes:

Storage classes and persistent volumes may not be rollback-able depending on their provisioner.

API version deprecations:

If you upgraded to a resource using a newer API version that is no longer available in the old version, rollback cannot proceed.

Manual edits:

If someone manually edited Kubernetes resources managed by Helm, rollback may conflict with those changes.

# Force rollback (may leave resources in unexpected state)
helm rollback myapp 1 --force

# Check what will happen before rolling back
helm get manifest myapp --revision 1 > /tmp/rollback.yaml
kubectl diff -f /tmp/rollback.yaml

Best Practices for Production

Always use --wait and --timeout:

helm upgrade myapp ./charts/myapp \
  --wait \
  --timeout 10m \
  --cleanup-on-fail

Keep reasonable history:

# Limit stored revisions per release (Helm 3 defaults to 10)
helm upgrade myapp ./charts/myapp --history-max 10

Test rollbacks in staging:

# In your staging pipeline
- name: Rollback test
  run: |
    # Deploy current version
    helm upgrade --install myapp ./charts/myapp --namespace staging

    # Rollback immediately
    helm rollback myapp --namespace staging

    # Verify rollback completed
    helm history myapp --namespace staging

Monitor rollback events:

# Alerting rule (assumes a Helm exporter publishes this metric)
- alert: HelmRollback
  expr: |
    increase(helm_release_rollback_total[5m]) > 0
  labels:
    severity: warning
  annotations:
    summary: "Helm rollback detected"
    description: "Release {{ $labels.name }} rolled back in namespace {{ $labels.namespace }}"

Rollback Trade-offs

Helm rollback is not always the right tool. Here is how it compares to other recovery strategies.

| Scenario | Helm Rollback | Forward Fix | Blue-Green Deploy |
| --- | --- | --- | --- |
| Config change broke things | Fast (single command) | Takes longer | Fast (switch traffic) |
| Bad deployment image tag | Fast | Rebuild and redeploy | Fast switch back |
| Database migration failure | Dangerous (schema already changed) | Correct approach | Roll traffic, fix migration |
| CRD changes | May not work | Often required | Depends on change type |
| Data loss risk | High if storage affected | Lower with proper backup | Low |

The key question: does the previous revision actually represent a safe state? If your upgrade modified persistent data, rolling back may not undo those changes.

Observability Hooks

Track rollback health and deployment reliability with these monitoring practices.

Key metrics to track (the helm_release_* metric names below assume a Helm exporter is publishing release state to Prometheus):

# Rollback frequency by release
sum(rate(helm_release_rollback_total[5m])) by (name, namespace)

# Failed upgrades that triggered atomic rollback
sum(rate(helm_upgrade_failed_total[5m])) by (name, namespace)

# Release revision count (high count = unstable releases)
count(helm_release_info{owner="helm"}) by (name, namespace)

# Time since last successful deployment
time() - helm_release_last_deployed_timestamp

Alert rules for Helm releases:

# Alert when a rollback occurs
- alert: HelmRollbackExecuted
  expr: increase(helm_release_rollback_total[5m]) > 0
  labels:
    severity: warning
  annotations:
    summary: "Helm rollback executed for {{ $labels.name }}"
    description: "Release {{ $labels.name }} in {{ $labels.namespace }} was rolled back. Investigate the cause."

# Alert when release is in failed state
- alert: HelmReleaseFailed
  expr: helm_release_info{status="failed"} == 1
  labels:
    severity: critical
  annotations:
    summary: "Helm release {{ $labels.name }} is in failed state"
    description: "Release has been failing. Manual intervention may be required."

# Alert on high revision count (unstable release)
- alert: HelmReleaseUnstable
  expr: count(helm_release_info) by (name, namespace) > 10
  labels:
    severity: warning
  annotations:
    summary: "Release {{ $labels.name }} has {{ $value }} revisions"
    description: "High revision count indicates frequent changes or rollbacks. Review release stability."

Debugging commands:

# Get full release history with details
helm history myapp --max 100

# See what changed between revisions (requires the helm-diff plugin)
helm diff revision myapp 2 3

# Check current release status
helm status myapp

# Get all values at a specific revision
helm get values myapp --revision 3

# View the manifest that was applied at revision 1
helm get manifest myapp --revision 1 > /tmp/rev1.yaml

Common Pitfalls / Anti-Patterns

Relying on rollback for database migrations

Rollback does not undo database migrations. If your upgrade runs SQL migrations and then the application fails to start, rolling back the Kubernetes Deployment does not roll back the database schema. This leaves your app in a broken state against a migrated schema.

Always design database migrations to be reversible, or design your upgrade process so that a failed migration aborts the Helm upgrade before the new application code is deployed.

Not using --atomic in automated deployments

Without --atomic, a failed upgrade leaves the release in a pending-upgrade or failed state. The next deployment attempt may behave unexpectedly. --atomic guarantees you always end up with a working release, either the new one or the rolled-back previous one.
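A release stuck in pending-upgrade (for example, after a CI job was killed mid-upgrade) blocks further upgrades; rolling back to the previous revision usually clears it. A sketch that checks the status before acting; the sample JSON stands in for the live `helm status` call, and the rollback itself is left commented since it needs a cluster:

```shell
# Sample stands in for: helm status myapp -o json
status_json='{"info":{"status":"pending-upgrade"}}'

status=$(echo "$status_json" | jq -r '.info.status')
if [ "$status" = "pending-upgrade" ] || [ "$status" = "failed" ]; then
  echo "release stuck in state: $status"
  # helm rollback myapp --wait --timeout 5m
fi
```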

Manual edits between deploy and rollback

If someone uses kubectl edit to modify a resource managed by Helm after an upgrade, any field defined in the chart is reverted on the next helm upgrade, because Helm applies its rendered manifests over the live state (Helm 3 uses a three-way merge against the stored manifest). Manual edits also complicate rollback, since the cluster state no longer matches the stored revision.

The helm.sh/resource-policy: keep annotation only prevents Helm from deleting a resource on uninstall or upgrade; it does not exclude the resource from Helm management. When manual edits are genuinely necessary, move the resource out of the chart instead.

Not limiting release history

Helm stores every revision as a Kubernetes secret. Without limits, hundreds of revisions accumulate over time, bloating the namespace and slowing down helm list and helm history. Set reasonable history limits and prune old revisions.
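--history-max is the supported way to cap history going forward; for existing bloat, old release secrets can be pruned by hand. A sketch of selecting which secrets to delete, with sample names standing in for kubectl output (the keep count and namespace are illustrative):

```shell
keep=2
# Sample stands in for: kubectl get secrets -n mynamespace -l owner=helm -o name
secrets='sh.helm.release.v1.myapp.v1
sh.helm.release.v1.myapp.v2
sh.helm.release.v1.myapp.v3'

# Version-sort newest first, then print everything past the keep window
echo "$secrets" | sort -rV | tail -n +$((keep + 1))
# each printed name would go to: kubectl delete secret <name> -n mynamespace
```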

Forgetting to test rollback in staging

A rollback procedure that has never been tested is unreliable. If your first rollback attempt happens during a production incident at 3am, you do not want to discover that your rollback hook has a bug.

Test rollback in staging on every significant release.

Quick Recap

Key Takeaways

  • Helm stores every release as a numbered revision; rollback replays the stored state of that revision
  • Use --atomic for automated pipelines to guarantee clean state on failure
  • Design migrations to be reversible or build forward-fix procedures instead of relying on rollback
  • Test rollback in staging before relying on it in production
  • Monitor rollback events with Prometheus alerts to catch problems early

Rollback Checklist

# Check release history
helm history myapp

# See what values were deployed at each revision
helm get values myapp --revision 2

# Rollback to previous working revision
helm rollback myapp --wait --timeout 5m

# Rollback to specific revision
helm rollback myapp 1 --wait

# Check rollback succeeded
helm status myapp
helm history myapp

Conclusion

Helm revision history makes rollback straightforward in most cases. Use --atomic for automatic rollback in CI/CD, implement proper hooks for data migration, and test rollback procedures regularly in non-production environments. For more on Helm, see our Helm Charts overview, and for deployment strategies that reduce rollback need, see our CI/CD Pipelines guide. For GitOps patterns with ArgoCD, see our GitOps with ArgoCD and Flux post.
