Authenticating

Create a new ED25519 key and add it to your Google Cloud project's metadata:

ssh-keygen -t ed25519

# NB: the file needs entries in the form USERNAME:<public key>, and this replaces
# any existing project-wide ssh-keys value, so include existing keys in the file too
gcloud compute project-info add-metadata --metadata-from-file ssh-keys=~/.ssh/id_ed25519.pub

# Add the key to a Compute Instance
gcloud compute instances add-metadata my-instance --metadata-from-file ssh-keys=~/.ssh/id_ed25519.pub
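
Once the key is in the metadata you should be able to SSH straight in. The instance name, zone and username below are placeholders; the username is whatever prefixes the key in the ssh-keys entry:

# Look up the instance's external IP
gcloud compute instances describe my-instance --zone europe-west1-b \
  --format='get(networkInterfaces[0].accessConfigs[0].natIP)'

# SSH in with the new key
ssh -i ~/.ssh/id_ed25519 YOUR_USERNAME@<external-ip>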

Application Default Credentials (ADC)

ADC is a mechanism for applications to automatically obtain credentials to call Google APIs.

Google’s own client libraries look for credentials in this order:

  • the GOOGLE_APPLICATION_CREDENTIALS environment variable (pointing at a service account key or credential configuration file)
  • the well-known gcloud credentials file created by gcloud auth application-default login (e.g. ~/.config/gcloud/application_default_credentials.json)
  • the service account attached to the resource the code is running on (Compute Engine, GKE, Cloud Run, etc.), via the metadata server
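
For local development you can create ADC with gcloud and then quickly sanity-check them:

# Create local Application Default Credentials (stored under ~/.config/gcloud/)
gcloud auth application-default login

# Sanity check: mint an access token using those credentials
gcloud auth application-default print-access-token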

Workload Identity

Workload Identity is a GKE feature that lets you associate a Kubernetes Service Account with a Google Cloud Service Account, so that Pods running as that Kubernetes Service Account can authenticate to Google Cloud APIs as the Google Cloud Service Account, without having to mount exported service account keys.
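
Before any of the below works, the cluster and its node pools need Workload Identity enabled. A rough sketch for an existing zonal cluster (cluster, zone and node pool names are placeholders):

# Enable the Workload Identity pool on the cluster
gcloud container clusters update my-cluster \
  --zone us-central1-c \
  --workload-pool=my-project.svc.id.goog

# Make an existing node pool use the GKE metadata server
gcloud container node-pools update default-pool \
  --cluster my-cluster \
  --zone us-central1-c \
  --workload-metadata=GKE_METADATA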

Example using imperative gcloud commands

Here’s an example that configures a Kubernetes Service Account with Workload Identity so that a Pod can access a Google Cloud Storage bucket:

export KUBE_SA_NAME=myapp-sa
export KUBE_NAMESPACE=mynamespace
export GCP_SA_NAME=mycluster-workload-identity
export GCP_PROJECT=your-google-cloud-project
export GCP_BUCKET_NAME_DATA=your-loki-data-bucket

kubectl create serviceaccount ${KUBE_SA_NAME} --namespace ${KUBE_NAMESPACE}

gcloud iam service-accounts create ${GCP_SA_NAME} \
    --project=${GCP_PROJECT}

# Grant admin access to the bucket to the Google Cloud service account
gsutil iam ch serviceAccount:${GCP_SA_NAME}@${GCP_PROJECT}.iam.gserviceaccount.com:objectAdmin gs://${GCP_BUCKET_NAME_DATA}

# Allow the Kubernetes Service Account to impersonate the Google Cloud service account
gcloud iam service-accounts add-iam-policy-binding ${GCP_SA_NAME}@${GCP_PROJECT}.iam.gserviceaccount.com \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:${GCP_PROJECT}.svc.id.goog[${KUBE_NAMESPACE}/${KUBE_SA_NAME}]"

# Annotate the Kubernetes Service Account so GKE knows which Google service account to use
kubectl annotate serviceaccount ${KUBE_SA_NAME} \
    --namespace ${KUBE_NAMESPACE} \
    iam.gke.io/gcp-service-account=${GCP_SA_NAME}@${GCP_PROJECT}.iam.gserviceaccount.com

# Point the app's Deployment at the new Kubernetes Service Account
kubectl -n ${KUBE_NAMESPACE} set sa deploy/myapp ${KUBE_SA_NAME}
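
To sanity-check the wiring (same variables as above): the Kubernetes Service Account should carry the annotation, and the Google Cloud service account should show the workloadIdentityUser binding:

# Check the annotation on the Kubernetes Service Account
kubectl -n ${KUBE_NAMESPACE} get sa ${KUBE_SA_NAME} -o yaml

# Check the IAM policy on the Google Cloud service account
gcloud iam service-accounts get-iam-policy ${GCP_SA_NAME}@${GCP_PROJECT}.iam.gserviceaccount.com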

Adding IAM role using Terraform

This resource links an IAM service account with a Kubernetes service account so that a Pod can access a Google Cloud Storage bucket:

resource "google_service_account_iam_member" "workload_identity" {
  service_account_id = "projects/${var.project_id}/serviceAccounts/${var.service_account}"
  role               = "roles/iam.workloadIdentityUser"
  member             = "serviceAccount:${var.project_id}.svc.id.goog[${var.namespace}/${var.kubernetes_service_account}]"
}

Verify that the IAM policy binding on the target IAM service account is correct:

$ gcloud iam service-accounts get-iam-policy my-service-account-h5cp@my-google-project.iam.gserviceaccount.com
bindings:
- members:
  - serviceAccount:my-google-project.svc.id.goog[default/webterminal]
  role: roles/iam.workloadIdentityUser
etag: abcdefabcdef
version: 1

You might also need a google_storage_bucket_iam_member resource to grant the bucket-level IAM role to the service account:

resource "google_storage_bucket_iam_member" "bucket_admin" {
  bucket = "my-bucket-of-files"
  role   = "roles/storage.objectAdmin"
  member = "serviceAccount:${var.service_account}"
}
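
To double-check the bucket-level binding afterwards (using the bucket name from the Terraform example above):

gsutil iam get gs://my-bucket-of-files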

You will also need to add the annotation to the Kubernetes service account used by your app, e.g.:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: webterminal
  annotations:
    iam.gke.io/gcp-service-account: my-service-account-h5cp@my-google-project.iam.gserviceaccount.com

Troubleshoot Workload Identity

You can deploy the Google Cloud CLI in the same namespace as your application, and with the same service account, which should allow you to test the Google Cloud APIs “as if you were the same user”:

kubectl -n default apply -f - <<API
apiVersion: apps/v1
kind: Deployment
metadata:
  name: workload-identity-test
spec:
  selector:
    matchLabels:
      app: workload-identity-test
  template:
    metadata:
      labels:
        app: workload-identity-test
    spec:
      serviceAccountName: webterminal
      containers:
      - name: workload-identity-test
        image: gcr.io/google.com/cloudsdktool/google-cloud-cli:latest
        command: ["sleep","infinity"]
API

kubectl -n default exec -it deploy/workload-identity-test -- sh

Now have a play around with gcloud and see if you can access the resources you’re expecting, e.g.:

gcloud artifacts repositories list

You can also see your current user identity:

$ gcloud info | grep 'Account:'
Account: [myapp-runner@mycompany-project.iam.gserviceaccount.com]

Using OAuth/JWT and Grafana Infinity plugin

If you want to access Google Cloud APIs from the Infinity plugin in Grafana, you can use the following details:

  • Authentication type = OAuth2
  • Grant Type = JWT
  • Email = your-username@your-project.iam.gserviceaccount.com
  • Private key = the private key from a service account JSON key file (see the example below)
  • Token URL = https://oauth2.googleapis.com/token
  • Scopes = https://www.googleapis.com/auth/cloud-platform
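
The private key comes from a service account JSON key file. A rough sketch of creating one and pulling out the fields the plugin wants (key.json is just a local file name of my choosing; Email maps to client_email and Private key to private_key):

gcloud iam service-accounts keys create key.json \
  --iam-account=your-username@your-project.iam.gserviceaccount.com

# The values to paste into the Grafana Infinity datasource config
jq -r .client_email key.json
jq -r .private_key key.json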

Cookbook

Create a simple utility VM

To create a simple utility VM (e.g. for testing out an installation, or doing some research):

export VM_NAME=myvm123

gcloud config get-value project  # check which Project we're using

gcloud compute instances create $VM_NAME \
  --machine-type "e2-standard-2" \
  --image-project "debian-cloud" \
  --image-family "debian-11" \
  --subnet "default" \
  --zone europe-west1-b

gcloud compute ssh $VM_NAME --zone europe-west1-b

# Or, if you don't know the zone, look it up automatically:
gcloud compute ssh $VM_NAME --zone $(gcloud compute instances list --filter="name=$VM_NAME" --format "get(zone)" | awk -F/ '{print $NF}')
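
And tidy up when you're done:

gcloud compute instances delete $VM_NAME --zone europe-west1-b --quiet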

Projects and zones

List all zones

gcloud compute zones list

List all Projects

gcloud projects list

Networking

Delete all Network Endpoint Groups in a VPC

Because sometimes you might have Network Endpoint Groups hanging around after you’ve deleted a Kubernetes cluster:

export VPC_NAME=your-vpc-name

# NB: the `echo` makes this a dry run that just prints the delete commands;
# remove it to actually delete the Network Endpoint Groups.
gcloud compute network-endpoint-groups list \
  --filter="network:($VPC_NAME)" \
  --format="csv[no-heading](name,zone)" \
  | while IFS=, read -r name zone ; do
  echo gcloud compute network-endpoint-groups delete $name --zone $zone --quiet;
  done

IAM

  • A principal is a user (e.g. a Google Account) (user:), a service account (serviceAccount:), a group (group:), or a Google Workspace account or Cloud Identity domain (domain:). Each principal has a unique identifier, which is typically an email address.
  • A role is a collection of permissions, which define what can be done on a resource.
  • A resource is a Google Cloud resource, such as a project, folder, or organisation.
  • A role binding is a combination of one or more principals and a role; it grants that role to those principals.
  • An allow policy is a collection of role bindings.

It is also absolutely bonkers. I mean it’s fairly powerful, but I feel like I could still be learning this in 100 years’ time.

List all roles

This lists all of the roles that you can assign to a user or service account:

gcloud iam roles list

List the roles held by a principal/service account (at a Project level)

This needlessly-complex command lists all of the roles that a principal (here, a service account) has been granted at the project level:

gcloud projects get-iam-policy my-project \
  --flatten="bindings[].members" \
  --format='table(bindings.role)' \
  --filter="bindings.members:theserviceaccount@my-project.iam.gserviceaccount.com"

Should return something like this:

ROLE
roles/artifactregistry.reader
roles/artifactregistry.writer

Add a role to a principal/service account (at a Project level)

If you want to give a role in a project to a service account, you can do this:

gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:my-service-account@my-project.iam.gserviceaccount.com" \
  --role="roles/artifactregistry.reader"

You can list the roles available with gcloud iam roles list --format="value(name)".

Grant a role to a principal

# Placeholder user and role:
gcloud projects add-iam-policy-binding my-project \
  --member="user:someone@example.com" \
  --role="roles/viewer"

Google Kubernetes Engine (GKE)

List all clusters

gcloud container clusters list --project my-corporate-department

Log on to a cluster

gcloud container clusters get-credentials my-pet-cluster --zone us-central1-c --project my-corporate-department

Artifact Registry

Authenticate to the container registry with podman

From [1]:

gcloud auth print-access-token | podman login -u oauth2accesstoken --password-stdin XX.gcr.io
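
Note that XX.gcr.io is a Container Registry hostname; for Artifact Registry the registry host is REGION-docker.pkg.dev, but the same access-token login should work, e.g. (region is a placeholder):

gcloud auth print-access-token | podman login -u oauth2accesstoken --password-stdin europe-west1-docker.pkg.dev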

Google Cloud SQL

Connect ad-hoc to Google Cloud SQL instance with a private IP address

Google Cloud SQL is a PITA to connect to, if you’ve chosen to put your database on a private IP address.

Here’s one way to do it - assuming you’ve already got a GKE Kubernetes cluster running in GCP:

export DATABASE_USER=myapp
export DATABASE_IP=$(gcloud sql instances describe YOURDBINSTANCENAME --format 'value(ipAddresses.ipAddress)')
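
Note that ipAddresses.ipAddress may return more than one address if the instance also has a public IP. A sketch that picks out just the private address, assuming jq is installed:

export DATABASE_IP=$(gcloud sql instances describe YOURDBINSTANCENAME --format=json \
  | jq -r '.ipAddresses[] | select(.type=="PRIVATE") | .ipAddress')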

kubectl -n default apply -f - <<API
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pg-util
spec:
  selector:
    matchLabels:
      app: pg-util
  template:
    metadata:
      labels:
        app: pg-util
    spec:
      containers:
      - name: pg
        image: docker.io/library/postgres:14-alpine
        command: ["sleep","infinity"]
API

kubectl -n default exec -it $(kubectl -n default get pod -l app=pg-util -o name | cut --delimiter="/" --fields=2) -- psql --host ${DATABASE_IP} --user ${DATABASE_USER}

# Run your SQL here

kubectl -n default delete deploy/pg-util

Troubleshooting

This error is seen in kubectl get events: “Failed to Attach 1 network endpoint(s) (NEG “k8s1-4362fb64-default-myapp-4000-7414754d” in zone “us-central1-c”): googleapi: Error 400: Invalid value for field ‘resource.ipAddress’: ‘10.32.1.5’. Specified IP address 10.32.1.5 doesn’t belong to the (sub)network default or to the instance gke-mycluster-w-default-pool-fff0000-zzzz., invalid”

  • If you visit the Google Cloud web console, browse to your Cluster → Ingress → Backend services → (Service for your app) → Backends, you will see that there are 0 of 0 healthy services. Your Network Endpoint Group (NEG) is empty.
  • Cause: You are trying to expose a service outside the cluster using container-native load balancing, but your Kubernetes cluster is not “VPC-native” (see the note below).
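
If recreating the cluster is an option, making it VPC-native is an --enable-ip-alias flag at creation time (cluster name and zone below are placeholders; newer GKE versions create VPC-native clusters by default):

gcloud container clusters create my-vpc-native-cluster \
  --zone us-central1-c \
  --enable-ip-alias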

Cannot deploy an image from a private Artifact Registry onto a GKE cluster - constant “ImagePullBackOff” error:

  • You need to grant the following OAuth scope to the nodes (in the cluster’s node pool) when you create the cluster: https://www.googleapis.com/auth/devstorage.read_only
  • If your cluster has node pools which use a custom Service Account, then you will need to grant the Role roles/artifactregistry.reader to the custom Service Account.
    • Get the Service Account ID: gcloud container node-pools describe default-pool --zone us-central1-c --cluster my-gke-cluster --project my-gcp-project --format="value(config.serviceAccount)"
    • Grant the Role: gcloud artifacts repositories add-iam-policy-binding my-artifact-repo --location=us --member "serviceAccount:$SERVICE_ACCOUNT" --role roles/artifactregistry.reader --project my-gcp-project

When trying to add an Ingress with SSL to a VPC-native GKE cluster, with Google-managed SSL, the endpoint cannot be reached in a browser:

  • If the Ingress has been created with annotation kubernetes.io/ingress.allow-http: "false" then the SSL configuration must be successful, otherwise the Ingress will be inaccessible, even when trying to directly access the IP of the Ingress. In other words: check that you’ve configured SSL correctly.
  • Diagnosis:
    • Check the status of the Ingress object - it might say something like “error running load balancer syncing routine: loadbalancer xxxxx does not exist: invalid configuration: both HTTP and HTTPS are disabled (kubernetes.io/ingress.allow-http is false and there is no valid TLS configuration); your Ingress will not be able to serve any traffic”
    • Check the status of the ManagedCertificate - kubectl -n default describe managedcertificate mycert - the status field should show the reason why the certificate failed to be provisioned. For example: FailedNotVisible means that the domain’s DNS records don’t yet point at the load balancer’s IP, so Google can’t verify the domain.
  • Solution:
    • If you don’t have a ManagedCertificate, create one (a minimal example is shown after this list).
    • Check that you can reach the hostname given in your ManagedCertificate. If it’s configured through Google DNS, then make sure an appropriate A-record exists for the hostname.
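
A minimal ManagedCertificate sketch, assuming the hostname myapp.example.com and that your Ingress references the certificate via the networking.gke.io/managed-certificates: mycert annotation:

kubectl -n default apply -f - <<API
apiVersion: networking.gke.io/v1
kind: ManagedCertificate
metadata:
  name: mycert
spec:
  domains:
    - myapp.example.com
API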

1. https://stackoverflow.com/questions/63790529/authenticate-to-google-container-registry-with-podman