GCP Core Services: GCE, GKE, Cloud Storage, BigQuery
Master essential Google Cloud Platform services for containerized and serverless workloads—GCE, GKE, Cloud Storage, and serverless options.
Google Cloud Platform shares many concepts with other cloud providers but has its own service names, resource hierarchy, and operational patterns. If you are new to GCP, this post covers the essential services and how they fit together for DevOps workloads.
GCP organizes resources around projects rather than accounts. A project groups related resources, has its own IAM policies, and accumulates its own billing. This structure makes it easy to isolate workloads and manage access control across teams.
When to Use
GKE Autopilot vs Standard
Choose GKE Autopilot when you want Google to manage node provisioning, scaling, and upgrades. Autopilot works well for teams that want Kubernetes without the operational overhead, and for variable workloads where per-pod pricing beats paying for idle nodes.
Choose GKE Standard when you need SSH access to nodes, specific node configurations, daemon sets on dedicated infrastructure, or visibility into node-level resource allocation. Fixed, predictable workloads or compliance requirements around node access also point toward Standard.
Compute Engine vs Cloud Run vs Cloud Functions
Choose Compute Engine for long-running VMs requiring persistent state, specific hardware configurations, or legacy workloads that do not fit the container model.
Choose Cloud Run for containerized HTTP services that need automatic scaling from zero to thousands of instances. Cloud Run supports request timeouts up to 60 minutes (the default is 5 minutes), and most web services and APIs fit comfortably within the default.
Choose Cloud Functions for single-purpose event-driven tasks—processing a Cloud Storage upload, responding to a Pub/Sub message, or a lightweight ETL step. For anything more substantial, Cloud Run is the better fit.
GCS Storage Class Selection
Use Standard Storage for active hot data accessed daily. Use Nearline Storage for data accessed less than once per month—backups, archival logs, or monthly reports. Use Coldline Storage for data accessed less than once per quarter. Use Archive Storage for compliance archives accessed less than once a year.
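As a rough sanity check when choosing a class, at-rest cost scales linearly with the per-GB rate. A quick sketch (the per-GB prices below are illustrative ballpark figures for a single US region, not current list prices—always check the GCS pricing page):

```shell
# Approximate monthly at-rest cost for 1 TiB in each storage class.
# Prices here are illustrative placeholders, not current list prices.
monthly_cost() {
  # $1 = price per GB-month in USD, $2 = size in GiB
  awk -v p="$1" -v g="$2" 'BEGIN { printf "%.2f\n", p * g }'
}

monthly_cost 0.020  1024   # Standard  -> 20.48
monthly_cost 0.010  1024   # Nearline  -> 10.24
monthly_cost 0.004  1024   # Coldline  -> 4.10
monthly_cost 0.0012 1024   # Archive   -> 1.23
```

Retrieval fees and minimum storage durations make the colder classes more expensive for frequently read data, so access frequency matters more than the at-rest rate alone.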
When Not to Use GCP
Avoid GKE Autopilot when you need specific kernel modules, custom node images, or daemon sets that require direct node access. Autopilot restricts SSH access and node-level customization.
Avoid Compute Engine when you do not need persistent VMs—containerized workloads on Cloud Run scale faster and cost less for event-driven or HTTP-based services.
Avoid single-region GCS buckets when you need global low-latency access. Regional buckets serve local traffic efficiently but add latency and egress costs for cross-region downloads; multi-region or dual-region buckets, or Cloud CDN in front, serve geographically distributed readers better.
Avoid BigQuery as a primary application database. BigQuery is an analytics warehouse, not a transactional database. It handles analytical queries over large datasets well but lacks the random-access read/write patterns that operational databases provide.
GCP Project and Resource Hierarchy
GCP resources follow a four-level hierarchy: Organization, Folder, Project, and Resource. The organization sits at the top, followed by folders that can contain other folders or projects, then individual resources within projects.
# Set your project
gcloud config set project my-project-123
# List available projects
gcloud projects list
# Set compute zone
gcloud config set compute/zone us-central1-a
# Set compute region
gcloud config set compute/region us-central1
Folders let you group projects by team, environment, or department. Organizational policies applied at the folder level cascade down to all contained projects. This simplifies governance for large organizations.
IAM roles control access at the project or resource level. GCP distinguishes basic roles (formerly called primitive roles: owner, editor, viewer), which apply across all resources, from predefined roles that grant specific permissions for specific services.
flowchart TD
A[Organization] --> B[Folder: Team A]
A --> C[Folder: Team B]
B --> D[Project: prod-api]
B --> E[Project: prod-frontend]
C --> F[Project: dev-services]
D --> G[GKE Autopilot Cluster]
D --> H[Cloud Run Service]
D --> I[Cloud Storage Bucket]
E --> J[Cloud Run Service]
F --> K[Compute Engine MIG]
G --> L[Workload Identity]
L --> M[GCP Service Account]
Compute Engine Fundamentals
Google Compute Engine (GCE) provides virtual machines similar to EC2. Instance templates define the machine type, image, disk, and network configuration for launching instances.
# Create an instance from a template
gcloud compute instances create web-server-1 \
--source-instance-template=web-server-template \
--zone=us-central1-a
# List running instances
gcloud compute instances list
# Connect via SSH
gcloud compute ssh web-server-1 --zone=us-central1-a
Managed instance groups (MIGs) maintain a fleet of instances across zones. Like AWS ASGs, MIGs automatically heal failed instances and scale based on load.
# Resize a managed instance group
gcloud compute instance-groups managed set-size web-server-mig \
--size=5 \
--zone=us-central1-a
# Update instance template for rolling updates
gcloud compute instance-groups managed rolling-action start-update web-server-mig \
--zone=us-central1-a \
--version=template=web-server-template-v2
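Target-based autoscaling on a MIG follows a simple proportional rule: grow the group until average utilization falls back to the target. A sketch of that sizing arithmetic (simplified—the real autoscaler also applies stabilization windows and scale-in controls):

```shell
# desired = ceil(current_size * current_utilization / target_utilization)
desired_size() {
  # $1 = current instances, $2 = current CPU %, $3 = target CPU %
  awk -v n="$1" -v u="$2" -v t="$3" \
    'BEGIN { d = n * u / t; if (d != int(d)) d = int(d) + 1; print d }'
}

desired_size 3 90 60   # 3 instances at 90% CPU, 60% target -> 5
desired_size 5 30 60   # overprovisioned group scales in    -> 3
```

This is why a group running well above target can appear to "jump" several instances at once: the autoscaler sizes for the observed load in one step rather than adding a node at a time.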
Preemptible VMs cost less than regular instances but can be terminated by GCP at any time. They work well for batch jobs and fault-tolerant workloads. Spot VMs are the successor to preemptible VMs with similar pricing dynamics.
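Creating a Spot VM differs from a regular instance only in the provisioning flags. A sketch (instance name, zone, and machine type are placeholders; `--provisioning-model=SPOT` and `--instance-termination-action` are the flags current gcloud releases document—verify against your SDK version; the `DRY_RUN` wrapper just prints the command instead of calling GCP):

```shell
# Build the Spot VM creation command; echo it instead of running when DRY_RUN=1.
create_spot_vm() {
  local cmd=(gcloud compute instances create batch-worker-1
    --zone=us-central1-a
    --machine-type=e2-medium
    --provisioning-model=SPOT
    --instance-termination-action=DELETE)
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"       # print the assembled command for inspection
  else
    "${cmd[@]}"            # actually create the instance
  fi
}

DRY_RUN=1 create_spot_vm
```

Because Spot instances can be reclaimed with a 30-second notice, pair them with a MIG so replacements are created automatically, and keep workloads checkpointable.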
GKE Operating Modes
Google Kubernetes Engine (GKE) offers two operating modes. Standard mode gives you control over node provisioning, scaling, and upgrades. Autopilot mode offloads node management to Google, provisioning and scaling nodes automatically as your workloads demand them.
# Create a Standard GKE cluster
gcloud container clusters create standard-cluster \
--zone=us-central1-a \
--num-nodes=3 \
--machine-type=e2-medium
# Create an Autopilot GKE cluster
gcloud container clusters create autopilot-cluster \
--region=us-central1 \
--enable-autopilot
Autopilot clusters provision nodes when pods schedule and remove them when workloads complete. You pay per pod rather than per node, which can reduce costs for variable workloads. The tradeoff is less control over node configuration and the inability to SSH to nodes directly.
Standard clusters give you full control. You choose instance types, manage node pools explicitly, and handle upgrades yourself. This works better when you have specific infrastructure requirements or need to run daemon sets and system workloads on dedicated nodes.
# node-pool configuration for standard cluster (Config Connector)
apiVersion: container.cnrm.cloud.google.com/v1beta1
kind: ContainerNodePool
metadata:
  name: compute-nodepool
spec:
  clusterRef:
    name: standard-cluster
  location: us-central1
  nodeConfig:
    machineType: e2-medium
    diskSizeGb: 50
  nodeCount: 3
GKE uses Kubernetes natively, so kubectl commands, Helm charts, and GitOps workflows work the same as any Kubernetes cluster. The main GCP-specific integrations are workload identity for service account authentication and Anthos for hybrid/multi-cluster management.
Cloud Storage for Artifacts
Google Cloud Storage (GCS) uses buckets to store objects. Buckets live in projects and have globally unique names. GCS supports multiple storage classes that trade off cost against access latency.
# Create a bucket
gcloud storage buckets create gs://my-artifacts-bucket \
--location=US \
--default-storage-class=STANDARD
# Upload artifacts
gcloud storage cp ./dist/app.tar.gz gs://my-artifacts-bucket/prod/
# List bucket contents
gcloud storage ls gs://my-artifacts-bucket/prod/
# Set lifecycle policy
gcloud storage buckets update gs://my-artifacts-bucket \
--set-lifecycle-file=lifecycle-policy.json
Lifecycle policies automate object transitions between storage classes and deletions.
{
  "rule": [
    {
      "action": {
        "type": "SetStorageClass",
        "storageClass": "NEARLINE"
      },
      "condition": {
        "age": 30
      }
    },
    {
      "action": {
        "type": "Delete"
      },
      "condition": {
        "age": 365
      }
    }
  ]
}
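The policy can be written out and validated locally before running the `--set-lifecycle-file` update shown earlier (a sketch; `python3 -m json.tool` is just a convenient local JSON syntax check):

```shell
# Write the lifecycle policy and check that it parses as valid JSON.
cat > lifecycle-policy.json <<'EOF'
{
  "rule": [
    { "action": { "type": "SetStorageClass", "storageClass": "NEARLINE" },
      "condition": { "age": 30 } },
    { "action": { "type": "Delete" },
      "condition": { "age": 365 } }
  ]
}
EOF

python3 -m json.tool lifecycle-policy.json > /dev/null && echo "policy OK"
```

A malformed policy file fails the `gcloud storage buckets update` call, so catching syntax errors locally keeps the feedback loop short.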
GCS integrates with Cloud CDN for content delivery, with IAM for access control, and with Pub/Sub for triggering functions on object changes.
Cloud Run for Serverless Containers
Cloud Run runs containerized applications without managing infrastructure. It scales from zero to thousands of instances automatically based on incoming requests. The service is always fully managed; attaching a Serverless VPC Access connector lets it reach private resources inside your VPC.
# Deploy a container to Cloud Run
gcloud run deploy webapp \
--image=gcr.io/my-project/webapp:v1 \
--platform=managed \
--region=us-central1 \
--allow-unauthenticated
# Check service status
gcloud run services describe webapp --region=us-central1
# View logs
gcloud run services logs read webapp --region=us-central1
Cloud Run bills for CPU and memory only while a request is being handled, with durations rounded up to the nearest 100ms. This makes it economical for sporadic workloads that would otherwise pay for idle capacity.
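The 100ms rounding is easy to see with a quick calculation (illustrative arithmetic only, not a pricing calculator):

```shell
# Billable time = request duration rounded up to the next 100ms increment.
billable_ms() {
  awk -v ms="$1" 'BEGIN { print int((ms + 99) / 100) * 100 }'
}

billable_ms 123   # -> 200
billable_ms 800   # -> 800
billable_ms 801   # -> 900
```

For very short requests the rounding overhead is proportionally largest, which is one reason batching many tiny calls into fewer requests can lower the bill.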
# service.yaml for Cloud Run
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: webapp
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "0"
        autoscaling.knative.dev/maxScale: "100"
    spec:
      containers:
        - image: gcr.io/my-project/webapp:v1
          ports:
            - containerPort: 8080
          env:
            - name: NODE_ENV
              value: production
          resources:
            limits:
              cpu: 1000m
              memory: 512Mi
Cloud Functions is GCP’s original serverless offering, running individual functions in response to events. Cloud Run handles longer-running services and containers, while Cloud Functions handles quick event-driven tasks. Both are serverless options with different use cases.
GCP IAM for Service Accounts
GCP IAM distinguishes user accounts for humans from service accounts for workloads. Service accounts are identities that workloads use to authenticate to GCP APIs. Workload Identity lets Kubernetes service accounts act as GCP service accounts without managing key files.
# Create a service account
gcloud iam service-accounts create build-bot \
--display-name="CI/CD Build Bot"
# Grant roles to the service account
gcloud projects add-iam-policy-binding my-project-123 \
--member="serviceAccount:build-bot@my-project-123.iam.gserviceaccount.com" \
--role="roles/storage.objectAdmin"
# Create a key for external use (CI/CD outside GCP)
gcloud iam service-accounts keys create key.json \
--iam-account=build-bot@my-project-123.iam.gserviceaccount.com
Workload identity is the preferred approach for GKE workloads. It binds a Kubernetes service account to a GCP service account, and short-lived tokens replace key files.
# Enable workload identity on existing cluster
gcloud container clusters update standard-cluster \
--region=us-central1 \
--workload-pool=my-project-123.svc.id.goog
# Create IAM binding for the KSA
gcloud iam service-accounts add-iam-policy-binding \
--role=roles/iam.workloadIdentityUser \
--member="serviceAccount:my-project-123.svc.id.goog[default/my-k8s-service-account]" \
my-gcp-sa@my-project-123.iam.gserviceaccount.com
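One step the commands above do not show: the Kubernetes service account must also be annotated so GKE knows which GCP service account it maps to. A sketch of that manifest (names match the binding above; the `iam.gke.io/gcp-service-account` annotation key is what Workload Identity reads):

```shell
# ServiceAccount manifest carrying the Workload Identity annotation.
cat > ksa.yaml <<'EOF'
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-k8s-service-account
  namespace: default
  annotations:
    iam.gke.io/gcp-service-account: my-gcp-sa@my-project-123.iam.gserviceaccount.com
EOF

# Apply with: kubectl apply -f ksa.yaml
grep -c "iam.gke.io/gcp-service-account" ksa.yaml
```

Without this annotation the IAM binding alone does nothing: pods using the KSA still receive the node's default identity rather than the intended GCP service account.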
For more on managing cloud costs across providers, see our post on Cost Optimization.
Compute Service Trade-offs
| Scenario | Compute Engine | GKE Standard | GKE Autopilot | Cloud Run | Cloud Functions |
|---|---|---|---|---|---|
| Full node access | Yes | SSH to nodes | No | No | No |
| Serverless containers | No | No | No | Yes | No |
| Kubernetes ecosystem | No | Yes | Yes | No | No |
| Pay-per-second billing | Yes (1-minute minimum) | Per node | Per pod | Yes | Yes |
| Scale from zero | No | No | No | Yes | Yes |
| Max request timeout | N/A (always on) | N/A (always on) | N/A (always on) | 60 minutes | 60 minutes (2nd gen) |
| Daemon sets | Yes | Yes | No | No | No |
Production Failure Scenarios
| Failure | Impact | Mitigation |
|---|---|---|
| GKE Autopilot pod scheduling failures due to quota | Pods pending indefinitely, deployments time out | Pre-check quota in the region, request quota increases via support |
| MIG instance health check failures | Instances marked unhealthy and replaced, service disruption | Verify firewall rules allow health check IPs, check instance startup scripts |
| Cloud Run cold start affecting latency SLOs | First request after idle period times out | Set min instances to keep warm, use pre-warming pings |
| GCS bucket without lifecycle policy accumulating costs | Storage costs grow unbounded for old artifact versions | Apply lifecycle rules immediately on bucket creation, audit bucket sizes quarterly |
| Workload identity misconfiguration locking out GKE pods | Pods cannot authenticate to GCP APIs, services fail | Test workload identity bindings before deploying, keep a fallback key rotation window |
GCP Observability Hooks
GCE and MIG monitoring:
# List MIG instances and their status
gcloud compute instance-groups managed list-instances web-server-mig \
--zone=us-central1-a
# Instance metrics (CPU, disk, network) live in Cloud Monitoring; explore them
# in Metrics Explorer, or manage dashboards from the CLI
gcloud monitoring dashboards list
GKE monitoring:
# Get cluster component status
gcloud container clusters describe standard-cluster \
--zone=us-central1-a \
--format="table(name,status,currentMasterVersion)"
# Check pod status across namespaces
kubectl get pods -A -o wide
# Get node pool sizes
gcloud container node-pools list --cluster=standard-cluster --zone=us-central1-a
Cloud Run monitoring:
# Check service revisions and traffic
gcloud run services describe webapp --region=us-central1
# View recent logs
gcloud logging read "resource.type=cloud_run_revision" --limit=50
Key Cloud Monitoring metrics to alert on:
| Service | Metric | Alert Threshold |
|---|---|---|
| Compute Engine | CPU utilization | > 80% for 5 minutes |
| Compute Engine | Instance uptime | < 99.9% monthly |
| GKE | Pod pending time | > 2 minutes |
| GKE | Node CPU allocation | > 85% |
| Cloud Run | Request latency p99 | > 2000ms |
| Cloud Run | Container instance count | > max - 2 |
| Cloud Storage | Object count | unexpected growth |
| Cloud Storage | Monthly storage | > budget |
Common Anti-Patterns
Using the default network. The auto-created default network ships with permissive firewall rules in every new project. Create a dedicated network with explicit firewall rules for each environment.
Not using Workload Identity. Creating and managing service account key files is a security risk and an operational burden. Workload Identity eliminates key files entirely for GKE workloads.
Leaving service account keys in source code or CI/CD systems. Service account keys committed to git or stored in CI/CD variables are a common compromise vector. Use Workload Identity for GKE or short-lived credentials for CI/CD systems.
Not setting up GCS bucket policies. Buckets are private by default, but misconfigured uniform bucket-level access can accidentally expose data. Use IAM conditions and test bucket ACLs in a non-production environment first.
Mixing prod and non-prod resources in the same project. Using a single project for all environments defeats GCP’s natural isolation. Separate projects per environment make IAM governance simpler and contain blast radius.
Capacity Estimation and Benchmark Data
Use these numbers for initial capacity planning. Actual performance varies by workload characteristics.
GCE Machine Types
| Series | Best For | Machine Types | Network Performance |
|---|---|---|---|
| e2 | Cost-effective general purpose | e2-medium → e2-standard-32 | Up to 16 Gbps |
| n2 | General purpose (standard workloads) | n2-standard-2 → n2-standard-80 | Up to 100 Gbps |
| n1 | General purpose (previous generation) | n1-standard-1 → n1-standard-96 | Up to 32 Gbps |
| c2 | Compute optimized (high CPU) | c2-standard-4 → c2-standard-60 | Up to 100 Gbps |
| m2 | Memory optimized | m2-megamem-416 → m2-ultramem-416 | Up to 100 Gbps |
| a2 | GPU optimized | a2-highgpu-1g → a2-megagpu-16g | Up to 100 Gbps |
Cloud Run Performance Targets
| Metric | Value | Notes |
|---|---|---|
| Cold start (container instance) | 100ms-2s | Depends on image size and initialization time |
| Cold start (large image / heavy initialization) | 2-5 seconds | First request to a new instance |
| Min instances = 0 latency | Cold starts apply when at zero | Set min instances for latency-sensitive services |
| Max requests per instance | 80 concurrent (default) | Configure based on memory/CPU needs |
| Request timeout | 300 seconds (5 min) default | Configurable up to 60 minutes |
| Throughput per instance | ~1,000 RPS (simple HTTP) | Varies significantly with workload type |
GCS Storage Performance
| Metric | Value |
|---|---|
| Single object GET latency | 5-20ms (p50), 100-200ms (p99) |
| Single object PUT latency | 20-50ms (p50) |
| Recommended initial request rate per bucket | ~1,000 writes/s and ~5,000 reads/s (scales higher automatically) |
| Typical throughput per bucket | 1-5 Gbps for large objects |
| Consistency | Strong consistency for all operations |
Cloud SQL Instance Tiers
| Tier | vCPUs | Memory | Max Connections | Typical Use |
|---|---|---|---|---|
| db-f1-micro | 1 | 0.6 GB | 45 | Dev/test |
| db-g1-small | 1 | 1.7 GB | 200 | Small production |
| db-n1-standard-1 | 1 | 3.75 GB | 400 | Entry production |
| db-n1-standard-4 | 4 | 15 GB | 1,000 | Medium production |
| db-n1-standard-8 | 8 | 30 GB | 2,000 | Large production |
| db-n1-highmem-4 | 4 | 26 GB | 1,000 | Memory-intensive |
Quick Recap
Key Takeaways
- GCP uses projects as the core organizational unit, not accounts like AWS
- GKE Autopilot removes node management entirely—Google handles provisioning and scaling
- GKE Standard keeps full control over nodes and is better for daemon sets and custom configurations
- Cloud Run handles containerized HTTP services with true serverless scaling from zero
- Workload Identity is the recommended way to give GKE workloads access to GCP APIs
GCP Onboarding Checklist
# 1. Set up a new project
gcloud projects create my-project-123 --name="My Project"
# 2. Enable required APIs
gcloud services enable container.googleapis.com compute.googleapis.com storage.googleapis.com
# 3. Create a GKE Autopilot cluster
gcloud container clusters create autopilot-cluster \
--region=us-central1 \
--enable-autopilot
# 4. Configure Workload Identity for a namespace
kubectl create serviceaccount my-sa -n default
gcloud iam service-accounts add-iam-policy-binding \
--role=roles/iam.workloadIdentityUser \
--member="serviceAccount:my-project-123.svc.id.goog[default/my-sa]" \
my-gcp-sa@my-project-123.iam.gserviceaccount.com
# 5. Create a Cloud Storage bucket and attach a lifecycle policy
gcloud storage buckets create gs://my-artifacts-bucket --location=US
gcloud storage buckets update gs://my-artifacts-bucket \
--set-lifecycle-file=lifecycle-policy.json
Conclusion
GCP provides robust alternatives to the services you would use in any cloud environment. Compute Engine handles virtual machines, GKE manages Kubernetes clusters in both standard and autopilot modes, Cloud Storage provides object storage with lifecycle management, and Cloud Run offers serverless container hosting.
The GCP project-based organization and strong IAM integration make it straightforward to isolate workloads and enforce least-privilege access. If your team is comfortable with Kubernetes, GKE Autopilot removes the operational overhead while Standard mode keeps the door open for custom configurations.