Artifact Management: Build Caching, Provenance, and Retention
Manage CI/CD artifacts effectively—build caching for speed, provenance tracking for security, and retention policies for cost control.
Effective artifact management improves pipeline speed, ensures security, and controls costs. This guide covers caching strategies, provenance tracking, and retention policies.
Artifact Types and Storage Backends
CI/CD pipelines produce various artifact types requiring different storage characteristics.
Common artifact types:
| Type | Examples | Size | Retention |
|---|---|---|---|
| Build outputs | JAR, DLL, binary | Medium | Long |
| Container images | Docker, OCI | Large | Medium |
| Test reports | JUnit, cobertura | Small | Short |
| Dependencies | npm packages, Maven | Large | Medium |
| Deployment manifests | Helm charts, K8s YAML | Small | Medium |
Storage backend options:
| Backend | Best For | Cost | Features |
|---|---|---|---|
| S3/GCS | Any artifact type | Pay per use | Versioning, lifecycle |
| Azure Blob | Cross-cloud | Competitive | Immutable blobs |
| Artifactory | Package management | Enterprise | Universal format |
| GitHub Actions cache | Build cache | Limited free | Built-in |
| GitLab CI artifacts | Native integration | Included | Simple |
GitHub Actions artifact configuration:
```yaml
- name: Upload build artifacts
  uses: actions/upload-artifact@v4
  with:
    name: build-${{ matrix.node-version }}
    path: |
      dist/
      coverage/
      *.nupkg
    retention-days: 30
    compression-level: 9

- name: Download artifacts
  uses: actions/download-artifact@v4
  with:
    pattern: build-*
    path: ./combined
    merge-multiple: true
```
GitLab CI artifacts:
```yaml
build:
  stage: build
  script:
    - npm run build
  artifacts:
    name: "build-$CI_COMMIT_SHORT_SHA"
    paths:
      - dist/
      - coverage/
    expire_in: 1 week
    reports:
      junit: junit.xml
      coverage_report:
        coverage_format: cobertura
        path: cobertura.xml
```
Build Cache Strategies
Caching dependencies and intermediate build outputs dramatically reduces pipeline time.
Dependency cache patterns:
```yaml
# npm with cache
- uses: actions/setup-node@v4
  with:
    node-version: "20"
    cache: "npm"
    cache-dependency-path: "package-lock.json"

# Maven with cache
- uses: actions/cache@v4
  with:
    path: |
      ~/.m2/repository
      build/
    key: maven-${{ runner.os }}-${{ hashFiles('**/pom.xml') }}
    restore-keys: |
      maven-${{ runner.os }}-
```
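Conceptually, `hashFiles('**/pom.xml')` produces a content digest that namespaces the cache: same lock files, same key; any change, new key. A minimal Python sketch of that derivation idea (the file contents and key prefix are hypothetical):

```python
import hashlib

def cache_key(prefix: str, os_name: str, lockfile_contents: list[bytes]) -> str:
    """Derive a cache key that changes whenever any lock file changes."""
    digest = hashlib.sha256()
    for content in lockfile_contents:
        digest.update(content)
    return f"{prefix}-{os_name}-{digest.hexdigest()[:16]}"

key_a = cache_key("maven", "Linux", [b"<project>v1</project>"])
key_b = cache_key("maven", "Linux", [b"<project>v2</project>"])
assert key_a != key_b  # any pom.xml change invalidates the cache
```

The OS component in the key matters as much as the hash: a cache built on Linux may contain native modules that will not load on macOS runners.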
Layer-based Docker caching:
```yaml
# GitHub Actions with BuildKit
- uses: docker/setup-buildx-action@v3

- uses: docker/build-push-action@v5
  with:
    push: true
    tags: myregistry.azurecr.io/myapp:${{ github.sha }}
    cache-from: type=gha
    cache-to: type=gha,mode=max
```
GitLab Docker layer caching:
```yaml
build:docker:
  stage: build
  image: docker:24
  services:
    - docker:dind
  variables:
    DOCKER_BUILDKIT: "1"
  script:
    - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
    - docker build --cache-from $CI_REGISTRY_IMAGE:previous -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
```
Distributed cache for self-hosted runners:
```hcl
# Terraform for an S3 cache bucket (AWS provider v4+ resource layout)
resource "aws_s3_bucket" "cache" {
  bucket = "myci-cache-bucket"
}

resource "aws_s3_bucket_lifecycle_configuration" "cache" {
  bucket = aws_s3_bucket.cache.id

  rule {
    id     = "expire-cache"
    status = "Enabled"

    expiration {
      days = 14
    }
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "cache" {
  bucket = aws_s3_bucket.cache.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}
```
SBOM and Artifact Provenance
Software Bills of Materials and provenance tracking improve supply chain security.
Generate SBOM with Syft:
```yaml
- name: Generate SBOM
  uses: anchore/sbom-action@v0
  with:
    image: myregistry.azurecr.io/myapp:${{ github.sha }}
    format: spdx-json
    output-file: sbom.spdx.json
```
Attach provenance with GitHub Actions:
```yaml
- name: Generate provenance
  uses: actions/attest-build-provenance@v1
  with:
    subject-name: myregistry.azurecr.io/myapp
    subject-digest: ${{ env.IMAGE_SHA }}
    push-to-registry: true
```
GitLab CI SBOM generation:
```yaml
sbom:generation:
  stage: analyze
  image:
    name: anchore/syft:latest
    entrypoint: [""]
  script:
    - syft myregistry.azurecr.io/myapp:${CI_COMMIT_SHA} -o spdx-json > sbom.spdx.json
  artifacts:
    paths:
      - sbom.spdx.json
    expire_in: 1 week
```
Verify provenance at deployment:
```yaml
# Illustrative OPA policy sketch for provenance. Production Gatekeeper setups
# use ConstraintTemplates, and signature checks need an external data provider.
apiVersion: v1
kind: ConfigMap
metadata:
  name: provenance-policy
  namespace: gatekeeper-system
data:
  policy: |
    package kubernetes.admission

    deny[msg] {
      image := input.request.object.spec.containers[_].image
      not provenance_verified(image)  # placeholder for the verification call
      msg := "Image provenance verification failed"
    }
```
Retention Policies and Cleanup
Automatic cleanup prevents storage costs from growing unbounded.
GitHub Actions retention:
```yaml
# Set globally or per artifact
- name: Upload artifact
  uses: actions/upload-artifact@v4
  with:
    name: logs
    path: logs/
    retention-days: 7  # Override default 90 days
```
S3 lifecycle policies:
```hcl
resource "aws_s3_bucket_lifecycle_configuration" "artifacts" {
  bucket = aws_s3_bucket.artifacts.id

  rule {
    id     = "cleanup-old-artifacts"
    status = "Enabled"

    filter {
      prefix = "builds/"
    }

    expiration {
      days = 30
    }

    noncurrent_version_expiration {
      noncurrent_days = 7
    }
  }

  rule {
    id     = "cleanup-failed-builds"
    status = "Enabled"

    filter {
      prefix = "builds/failed/"
    }

    expiration {
      days = 7
    }
  }
}
```
GitLab artifact expiry:
```yaml
# In .gitlab-ci.yml
job:
  artifacts:
    expire_in: 1 week  # or "3 days", "2 weeks", etc.
    when: always       # keep artifacts even when the job fails
```
Automated cleanup scripts:
```bash
#!/bin/bash
# cleanup-artifacts.sh - delete ACR images older than RETENTION_DAYS
set -euo pipefail

REGISTRY_NAME="myregistry"
REPOSITORY="myapp"
RETENTION_DAYS=30

# Delete old images not in use (digests already include the sha256: prefix)
az acr repository show-manifests \
  --name "$REGISTRY_NAME" \
  --repository "$REPOSITORY" \
  --orderby time_asc \
  --query "[?timestamp<'$(date -d "$RETENTION_DAYS days ago" -I)'].digest" \
  --output tsv | while read -r digest; do
    echo "Deleting digest: $digest"
    az acr repository delete \
      --yes \
      --name "$REGISTRY_NAME" \
      --image "$REPOSITORY@$digest"
done
```
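The retention filter in that script reduces to a timestamp comparison against a cutoff. A small Python sketch of the same logic over a hypothetical manifest list (the digests and dates are made up):

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 30

# Hypothetical stand-in for registry manifest metadata
manifests = [
    {"digest": "sha256:aaa", "timestamp": "2024-01-01T00:00:00Z"},
    {"digest": "sha256:bbb", "timestamp": "2024-06-01T00:00:00Z"},
]

def stale_digests(manifests, now, retention_days):
    """Return digests older than the retention window."""
    cutoff = now - timedelta(days=retention_days)
    return [
        m["digest"]
        for m in manifests
        if datetime.fromisoformat(m["timestamp"].replace("Z", "+00:00")) < cutoff
    ]

now = datetime(2024, 6, 15, tzinfo=timezone.utc)
print(stale_digests(manifests, now, RETENTION_DAYS))  # ['sha256:aaa']
```

Whatever tool runs the deletion, keep the cutoff computation in one place so a dry-run mode can print the same list the destructive pass would delete.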
Artifact Signing and Verification
Sign artifacts to ensure integrity and authenticity.
Cosign for signing:
```bash
# Install cosign
brew install cosign

# Keyless signing (Sigstore OIDC)
cosign sign --yes myregistry.azurecr.io/myapp:v1.0.0

# Sign with a key pair (production)
cosign sign --key cosign.key myregistry.azurecr.io/myapp:v1.0.0

# Verify
cosign verify --key cosign.pub myregistry.azurecr.io/myapp:v1.0.0
```
GitHub Actions signing:
```yaml
- name: Install Cosign
  uses: sigstore/cosign-installer@v3

- name: Sign container image
  env:
    COSIGN_YES: "true"
    COSIGN_KEY: ${{ secrets.COSIGN_KEY }}
  run: |
    cosign sign \
      --key env://COSIGN_KEY \
      ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}@${{ env.DIGEST }}
```
Verification in Kubernetes admission:
```yaml
# Kyverno policy for signed images. The referenced secret (names are examples)
# holds the cosign public key used for verification.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-signed-images
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-signature
      match:
        any:
          - resources:
              kinds:
                - Pod
      verifyImages:
        - imageReferences:
            - "myregistry.azurecr.io/*"
          attestors:
            - entries:
                - keys:
                    secret:
                      name: cosign-public-key
                      namespace: kyverno
```
Cost Optimization for Artifact Storage
Tiered storage strategies:
```hcl
# Terraform: transition artifacts to cheaper storage classes over time
resource "aws_s3_bucket" "artifacts" {
  bucket = "myci-artifacts"
}

resource "aws_s3_bucket_lifecycle_configuration" "artifacts_tiering" {
  bucket = aws_s3_bucket.artifacts.id

  rule {
    id     = "standard-ia-transition"
    status = "Enabled"

    transition {
      days          = 30
      storage_class = "STANDARD_IA"
    }

    transition {
      days          = 90
      storage_class = "GLACIER"
    }

    transition {
      days          = 365
      storage_class = "DEEP_ARCHIVE"
    }
  }
}
```
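To see why tiering matters, here is a rough cost model for rules like those above. The per-GB prices are illustrative placeholders, not quoted AWS rates; substitute current regional pricing before drawing conclusions:

```python
# Assumed per-GB-month prices in USD (placeholders, not real rates)
PRICE_PER_GB_MONTH = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "GLACIER": 0.004,
}

def monthly_cost(gb_by_tier):
    """Sum storage cost across tiers for a given GB distribution."""
    return sum(gb * PRICE_PER_GB_MONTH[tier] for tier, gb in gb_by_tier.items())

flat = monthly_cost({"STANDARD": 1000})
tiered = monthly_cost({"STANDARD": 200, "STANDARD_IA": 300, "GLACIER": 500})
print(f"flat: ${flat:.2f}/mo, tiered: ${tiered:.2f}/mo")
# flat: $23.00/mo, tiered: $10.35/mo
```

The model ignores retrieval and transition request fees, which matter if cold artifacts are fetched often; tiering pays off only for data that is genuinely rarely read.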
Cache optimization:
```yaml
# Prefer cache over artifact storage for dependencies
- name: Install dependencies
  run: |
    if [ -d node_modules ]; then
      echo "Using cached dependencies"
    else
      npm ci
    fi
```
Monitor storage usage:
```yaml
# GitLab CI storage report
storage:
  stage: report
  script:
    - echo "Checking artifact storage usage..."
    - |
      curl -s --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
        "https://gitlab.com/api/v4/projects/$CI_PROJECT_ID?statistics=true" | jq '.statistics'
```
When to Use / When Not to Use
When artifact management pays off
Artifact management matters when your pipelines build things more than once. If you are compiling code, packaging containers, or generating reports across dozens of commits per day, managing what gets stored and for how long directly affects your pipeline speed and cloud bill.
Use explicit artifact management when you need to pass outputs between pipeline stages. Downloading a fresh copy of node_modules on every job is wasteful when you could cache it. Same goes for build outputs that other jobs need later.
SBOM generation and provenance tracking make sense for any organization subject to supply chain security requirements. If your customers or compliance team ask “what went into this build?”, artifact management practices give you an answer.
When to keep it simple
For small projects with fast builds and infrequent deployments, aggressive artifact management adds complexity without much return. A project that builds in 30 seconds does not need layer caching or distributed cache infrastructure.
If you are just starting out with CI/CD, set basic retention policies first. You can always add SBOM generation and artifact signing later when the workflow stabilizes.
Artifact Management Decision Flow
```mermaid
flowchart TD
    A[Pipeline Build Completes] --> B{Artifact needed later?}
    B -->|No| C[Skip artifact storage]
    B -->|Yes| D{Shared across jobs?}
    D -->|Yes| E[Upload to shared storage]
    D -->|No| F{Cache dependency?}
    F -->|Yes| G[Use build cache]
    F -->|No| C
    E --> H{Security required?}
    H -->|SBOM/provenance| I[Generate SBOM + sign]
    H -->|No| J[Apply retention policy]
```
Production Failure Scenarios
Common Artifact Failures
| Failure | Impact | Mitigation |
|---|---|---|
| Cache key collision | Wrong cached artifact used for different code | Include file hash in cache key |
| Retention too short | Artifact deleted before debugging complete | Set minimum 7-day retention for test artifacts |
| Storage quota exceeded | Pipeline fails uploading artifacts | Monitor usage, set lifecycle policies |
| Unsigned artifact deployed | Security policy blocks deployment | Require signed artifacts in admission controller |
| SBOM stale after build | SBOM does not match deployed image | Generate SBOM after final image build, not intermediate |
Cache Corruption Recovery
```mermaid
flowchart TD
    A[Build Fails with Cache] --> B{Cache suspected?}
    B -->|Yes| C[Clear cache]
    B -->|No| D[Investigate build script]
    C --> E[Retry build]
    E --> F{Succeeds?}
    F -->|Yes| D
    F -->|No| G[Disable cache, retry]
    D --> H[Fix root cause]
```
Artifact Verification Checklist
```bash
# Verify artifact integrity
sha256sum myapp-1.0.0.jar

# Check Cosign signature
cosign verify --key cosign.pub myregistry.azurecr.io/myapp:v1.0.0

# Validate SBOM against image
syft myregistry.azurecr.io/myapp:v1.0.0 -o spdx-json | jq '.packages[].name'
```
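The `jq` one-liner above pulls package names out of the SPDX document. The same extraction in Python, over a minimal hand-written SPDX 2.x fragment (real SBOMs carry many more fields):

```python
import json

# Minimal hand-written SPDX 2.x fragment for illustration
sbom_json = """
{"spdxVersion": "SPDX-2.3",
 "packages": [{"name": "express", "versionInfo": "4.18.2"},
              {"name": "lodash", "versionInfo": "4.17.21"}]}
"""

sbom = json.loads(sbom_json)
names = [pkg["name"] for pkg in sbom.get("packages", [])]
print(names)  # ['express', 'lodash']
```

Diffing this list against the SBOM stored at build time is a quick way to confirm the deployed image matches what was scanned.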
Observability Hooks
Track artifact health to catch problems before they affect builds.
What to monitor:
- Artifact upload success/failure rate
- Cache hit ratio (cache vs fresh download)
- Storage consumption growth per pipeline
- Artifact age distribution (are retention policies working?)
- SBOM generation success rate
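The cache hit ratio from the list above is simple to compute once you can pull per-job metadata from your CI provider's API. A sketch over hypothetical job records:

```python
# Hypothetical job records pulled from a CI provider's API
jobs = [
    {"id": 1, "cache_hit": True},
    {"id": 2, "cache_hit": True},
    {"id": 3, "cache_hit": False},
    {"id": 4, "cache_hit": True},
]

hit_ratio = sum(j["cache_hit"] for j in jobs) / len(jobs)
print(f"cache hit ratio: {hit_ratio:.0%}")  # cache hit ratio: 75%
```

Track the ratio over time rather than as a point value; a sudden drop usually means a cache key changed or the cache store was evicted.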
```yaml
# GitHub Actions - surface the cache hit result from an actions/cache step
# (assumes an earlier step with `id: cache`)
- name: Build
  run: npm ci && npm test
  env:
    CACHE_HIT: ${{ steps.cache.outputs.cache-hit }}
```

```bash
# GitLab CI - project storage statistics
curl --header "PRIVATE-TOKEN: $TOKEN" \
  "https://gitlab.com/api/v4/projects/$PROJECT_ID?statistics=true" | jq '.statistics'

# AWS S3 - total bucket size in bytes
aws s3api list-objects-v2 \
  --bucket myci-artifacts \
  --query 'sum(Contents[].Size)' \
  --output text
```
Common Pitfalls / Anti-Patterns
Using the same cache for unrelated builds
If your cache key is just node-modules without a hash of lock files, a React project and a Vue project can end up sharing a cache with incompatible dependency versions. Always include a content hash in cache keys.
Forgetting to expire artifacts
Without retention policies, artifacts accumulate forever. A busy CI system can accumulate terabytes in months. Set lifecycle rules on S3 buckets and retention-days on CI artifacts from day one.
Storing credentials in artifacts
Never upload artifacts containing secrets, API keys, or credentials. Even private repositories are not immune to accidental exposure. Audit what goes into your artifacts before uploading.
Not validating cached outputs
A cache hit does not mean the cached artifact is correct. A build that succeeds once with wrong configuration will keep succeeding from cache until someone clears it. Validate cache integrity when possible.
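One way to make that validation concrete: write a checksum manifest at cache-save time and verify it on restore. A minimal sketch with hypothetical file names and in-memory contents:

```python
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# At cache-save time: record checksums alongside the cached files
cached_output = b"compiled output"
manifest = {"dist/app.js": checksum(cached_output)}

# At cache-restore time: recompute and compare before trusting the cache
restored_output = b"compiled output"
cache_valid = manifest["dist/app.js"] == checksum(restored_output)
print(cache_valid)  # True
```

If verification fails, fall through to a clean build rather than failing the pipeline; a corrupted cache should cost time, not a release.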
Over-engineering the first pass
Starting with distributed caches, SBOMs, and artifact signing before your pipeline is stable is backwards. Get the basic artifact storage working first, add caching, then layer on security features as the project matures.
Quick Recap
Key Takeaways
- Cache dependency downloads to cut pipeline time significantly
- SBOMs and provenance are essential for supply chain security compliance
- Retention policies prevent storage costs from growing unbounded
- Sign artifacts with Cosign and verify in Kubernetes admission
- Cache keys must include content hashes to avoid collisions
Artifact Health Checklist
```bash
# Check cache hit rate in GitHub Actions
# (enable the Actions step summary for this)

# Verify artifact retention is set
grep -r "retention-days" .github/workflows/

# Test Cosign verification locally
cosign verify --key cosign.pub myregistry.azurecr.io/myapp:v1.0.0

# Check S3 lifecycle rules
aws s3api get-bucket-lifecycle-configuration --bucket myci-artifacts

# Scan for credentials before uploading artifacts (exits 0 when clean)
git secrets --scan && echo "No secrets found"

# List the ten largest artifacts for cleanup
aws s3api list-objects-v2 \
  --bucket myci-artifacts \
  --query 'sort_by(Contents, &Size)[-10:]'
```
Trade-off Summary
| Artifact Storage | Latency | Cost | Security | Best For |
|---|---|---|---|---|
| S3 / GCS / Blob | Low | Pay per use | IAM + bucket policies | Most CI/CD workloads |
| Artifactory | Medium | License + infra | Rich access control | Enterprise package management |
| GitHub Packages | Medium | Storage + egress | GitHub token | GitHub-native workflows |
| GitLab Container Registry | Medium | Storage + egress | GitLab token | GitLab-native workflows |
| Harbor | Medium | Infrastructure | RBAC + OPA | Enterprise / air-gapped |
| Amazon ECR / GCR / ACR | Low | Pay per storage + egress | IAM | Cloud-native workloads |
Conclusion
Artifact management balances speed, security, and cost. Use aggressive caching for dependencies, generate SBOMs for supply chain security, and enforce retention policies to control storage. Sign artifacts with Cosign for verification in production clusters. For more on CI/CD practices, see our CI/CD Pipelines overview, and for deployment patterns, see our Deployment Strategies guide.