Google Cloud
Authenticating
Create a new ED25519 key and add it to your GCP account
ssh-keygen -t ed25519
gcloud compute project-info add-metadata --metadata-from-file ssh-keys=~/.ssh/id_ed25519.pub
# Add the key to a Compute Instance
gcloud compute instances add-metadata my-instance --metadata-from-file ssh-keys=~/.ssh/id_ed25519.pub
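One thing to watch with the commands above: as far as I know, the ssh-keys metadata value expects each line in the form USERNAME:KEY, and add-metadata replaces whatever ssh-keys value was there before. Here's a minimal sketch of a helper that builds such a line. The name ssh_keys_entry and the /tmp path are made up for illustration.

```shell
# ssh_keys_entry is a hypothetical helper that prints one USERNAME:KEY line.
# $1 = Linux username to create on the VM, $2 = path to the public key file
ssh_keys_entry() {
  printf '%s:%s\n' "$1" "$(cat "$2")"
}

# Example (not run here): write the entry to a file, then pass that file
# to --metadata-from-file ssh-keys=/tmp/ssh-keys.txt
# ssh_keys_entry "$USER" ~/.ssh/id_ed25519.pub > /tmp/ssh-keys.txt
```

If the project or instance already has keys, the existing lines would need to go into the same file first, otherwise they are likely to be lost.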
Application Default Credentials (ADC)
ADC is a mechanism for applications to automatically obtain credentials to call Google APIs.
Google’s own client libraries look for credentials in:
- The GOOGLE_APPLICATION_CREDENTIALS environment variable
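A quick way to see what the client libraries will find in that variable is to check it by hand. This is only a sketch: adc_env_status is a made-up helper name, and it looks at the environment variable only, not at any other place ADC might search.

```shell
# adc_env_status is a hypothetical helper. It reports whether
# GOOGLE_APPLICATION_CREDENTIALS is set and points at a readable file.
adc_env_status() {
  if [ -z "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
    echo "unset"
  elif [ -r "${GOOGLE_APPLICATION_CREDENTIALS}" ]; then
    echo "file-found"
  else
    echo "file-missing"
  fi
}

adc_env_status
```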
Workload Identity
Workload Identity is a feature of GKE that allows you to associate a Kubernetes Service Account with a Google Cloud Service Account. This allows you to use the Google Cloud Service Account to authenticate to Google Cloud APIs from within a Kubernetes cluster.
Example using imperative gcloud commands
Here’s an example that configures a Kubernetes Service Account with Workload Identity so that a Pod can access a Google Cloud Storage bucket:
export KUBE_SA_NAME=myapp-sa
export KUBE_NAMESPACE=mynamespace
export GCP_SA_NAME=mycluster-workload-identity
export GCP_PROJECT=your-google-cloud-project
export GCP_BUCKET_NAME_DATA=your-loki-data-bucket
kubectl create serviceaccount ${KUBE_SA_NAME} --namespace ${KUBE_NAMESPACE}
gcloud iam service-accounts create ${GCP_SA_NAME} \
--project=${GCP_PROJECT}
# Grant admin access to the bucket to the Google Cloud service account
gsutil iam ch serviceAccount:${GCP_SA_NAME}@${GCP_PROJECT}.iam.gserviceaccount.com:objectAdmin gs://${GCP_BUCKET_NAME_DATA}
gcloud iam service-accounts add-iam-policy-binding ${GCP_SA_NAME}@${GCP_PROJECT}.iam.gserviceaccount.com \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:${GCP_PROJECT}.svc.id.goog[${KUBE_NAMESPACE}/${KUBE_SA_NAME}]"
kubectl annotate serviceaccount ${KUBE_SA_NAME} \
--namespace ${KUBE_NAMESPACE} \
iam.gke.io/gcp-service-account=${GCP_SA_NAME}@${GCP_PROJECT}.iam.gserviceaccount.com
kubectl -n ${KUBE_NAMESPACE} set sa deploy/myapp ${KUBE_SA_NAME}
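The two strings that are easiest to get wrong in the commands above are the Workload Identity member and the service account email. Here's a small sketch that builds both from their parts, so you can echo them before running anything. The helper names wi_member and gsa_email are made up for illustration.

```shell
# wi_member is a hypothetical helper. It builds the IAM member string that
# stands for a Kubernetes service account under Workload Identity.
# $1 = GCP project, $2 = Kubernetes namespace, $3 = Kubernetes service account
wi_member() {
  printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' "$1" "$2" "$3"
}

# gsa_email is a hypothetical helper. It builds the email address of a
# Google Cloud service account.
# $1 = service account name, $2 = GCP project
gsa_email() {
  printf '%s@%s.iam.gserviceaccount.com\n' "$1" "$2"
}

wi_member your-google-cloud-project mynamespace myapp-sa
gsa_email mycluster-workload-identity your-google-cloud-project
```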
Adding IAM role using Terraform
This resource links an IAM service account with a Kubernetes service account so that a Pod can access a Google Cloud Storage bucket:
resource "google_service_account_iam_member" "workload_identity" {
service_account_id = "projects/${var.project_id}/serviceAccounts/${var.service_account}"
role = "roles/iam.workloadIdentityUser"
member = "serviceAccount:${var.project_id}.svc.id.goog[${var.namespace}/${var.kubernetes_service_account}]"
}
Verify that the IAM policy binding on the target IAM service account is correct:
$ gcloud iam service-accounts get-iam-policy my-service-account-h5cp@my-google-project.iam.gserviceaccount.com
bindings:
- members:
- serviceAccount:my-google-project.svc.id.goog[default/webterminal]
role: roles/iam.workloadIdentityUser
etag: abcdefabcdef
version: 1
You might also need the google_storage_bucket_iam_member resource to grant the bucket's IAM role to the service account:
resource "google_storage_bucket_iam_member" "bucket_admin" {
bucket = "my-bucket-of-files"
role = "roles/storage.objectAdmin"
member = "serviceAccount:${var.service_account}"
}
You will also need to add the annotation to the Kubernetes service account used by your app, e.g.:
apiVersion: v1
kind: ServiceAccount
metadata:
name: webterminal
annotations:
iam.gke.io/gcp-service-account: my-service-account-h5cp@my-google-project.iam.gserviceaccount.com
Troubleshoot Workload Identity
You can deploy the Google Cloud CLI in the same namespace as your application, and with the same service account, which should allow you to test the Google Cloud APIs “as if you were the same user”:
kubectl -n default apply -f - <<API
apiVersion: apps/v1
kind: Deployment
metadata:
name: workload-identity-test
spec:
selector:
matchLabels:
app: workload-identity-test
template:
metadata:
labels:
app: workload-identity-test
spec:
serviceAccountName: webterminal
containers:
- name: workload-identity-test
image: gcr.io/google.com/cloudsdktool/google-cloud-cli:latest
command: ["sleep","infinity"]
API
kubectl exec -it workload-identity-test-pod-xsjjqskqk -- sh
Now have a play around with gcloud and see if you can access the resources you’re expecting, e.g.:
gcloud artifacts repositories list
You can also see your current user identity:
$ gcloud info | grep 'Account:'
Account: [myapp-runner@mycompany-project.iam.gserviceaccount.com]
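The pod name in the kubectl exec command above is generated, so it will be different every time. Here's a sketch that avoids looking it up by pointing kubectl at the Deployment instead. wi_test_shell is a made-up helper name, and the 120s timeout is an arbitrary choice.

```shell
# wi_test_shell is a hypothetical helper. It opens a shell in the test
# Deployment from this section without needing the generated pod name.
# $1 = namespace (defaults to "default")
wi_test_shell() {
  wi_ns="${1:-default}"
  # Wait for the Deployment to finish rolling out before exec'ing into it.
  kubectl -n "$wi_ns" rollout status deploy/workload-identity-test --timeout=120s &&
    kubectl -n "$wi_ns" exec -it deploy/workload-identity-test -- sh
}

# Usage (not run here): wi_test_shell default
```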
Using OAuth/JWT and Grafana Infinity plugin
If you want to access Google Cloud APIs from the Infinity plugin in Grafana, you can use the following details:
- Authentication type = OAuth2
- Grant Type = JWT
- Email = your-username@your-project.iam.gserviceaccount.com
- Private key = (JWT)
- Token URL = https://oauth2.googleapis.com/token
- Scopes = https://www.googleapis.com/auth/cloud-platform
Cookbook
Create a simple utility VM
To create a simple utility VM (e.g. for testing out an installation, or doing some research):
export VM_NAME=myvm123
gcloud config get-value project # check which Project we're using
gcloud compute instances create $VM_NAME \
--machine-type "e2-standard-2" \
--image-project "debian-cloud" \
--image-family "debian-11" \
--subnet "default" \
--zone europe-west1-b
gcloud compute ssh $VM_NAME --zone europe-west1-b
# Or, if you don't remember the zone, look it up from the instance name:
gcloud compute ssh $VM_NAME --zone $(gcloud compute instances list --filter="name=$VM_NAME" --format "get(zone)" | awk -F/ '{print $NF}')
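The awk part of the zone lookup just keeps whatever comes after the last slash. Here's that step on its own, assuming the zone comes back as a full resource URL. The sample URL and the name zone_basename are made up for illustration.

```shell
# zone_basename is a hypothetical helper. It reads a zone URL (or a plain
# zone name) on stdin and prints only the part after the last "/".
zone_basename() {
  awk -F/ '{print $NF}'
}

echo "https://www.googleapis.com/compute/v1/projects/my-project/zones/europe-west1-b" | zone_basename
# prints: europe-west1-b
```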
Projects and zones
List all zones:
gcloud compute zones list
List all Projects:
gcloud projects list
Networking
Delete all Network Endpoint Groups in a VPC
Because sometimes you might have Network Endpoint Groups hanging around after you’ve deleted a Kubernetes cluster:
gcloud compute network-endpoint-groups list \
--filter="network:($VPC_NAME)" \
--format="csv[no-heading](name,zone)" \
| while IFS=, read -r name zone ; do
# Dry run: this only prints the delete commands. Remove "echo" to actually run them.
echo gcloud compute network-endpoint-groups delete "$name" --zone "$zone" --quiet;
done
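If you want to try the loop without touching a real project, the parsing half can be pulled out and fed sample data. This is a sketch only: neg_delete_commands is a made-up helper name, the sample row is invented, and the basename call is my own addition in case the zone comes back as a full URL.

```shell
# neg_delete_commands is a hypothetical helper. It reads "name,zone" CSV rows
# on stdin and prints one delete command per row. It never deletes anything.
neg_delete_commands() {
  while IFS=, read -r name zone; do
    # Skip blank lines so an empty listing produces no commands.
    [ -n "$name" ] || continue
    # basename copes with a zone given as a short name or as a full URL.
    echo "gcloud compute network-endpoint-groups delete $name --zone $(basename "$zone") --quiet"
  done
}

# Invented sample row standing in for the output of the list command.
printf '%s\n' 'k8s1-abc-default-myapp-80-1234,us-central1-c' | neg_delete_commands
```

Once the printed commands look right, piping the output to sh would run them.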
IAM
- A principal is a user (e.g. a Google Account) (user:), a service account (serviceAccount:), a group (group:), or a Workspace account or domain (domain:). Each principal has a unique identifier, which is typically an email address.
- A role is a collection of permissions, which define what can be done on a resource.
- A resource is a Google Cloud resource, such as a project, folder, or organisation.
- A role binding is a combination of a principal(s) and a role, which grants the role to the principal(s).
- An allow policy is a collection of role bindings.
It is also absolutely bonkers. I mean it’s fairly powerful, but I feel like I could still be learning this in 100 years’ time.
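To make the terms in the list above a bit more concrete, here's a sketch of what an allow policy looks like when written out as JSON: one role binding, with one role and two principals. Every name in it is invented, and sample_allow_policy is a made-up helper that only prints the text.

```shell
# sample_allow_policy is a hypothetical helper. It prints an example allow
# policy; all principals and the project name are invented.
sample_allow_policy() {
  cat <<'EOF'
{
  "bindings": [
    {
      "role": "roles/artifactregistry.reader",
      "members": [
        "user:alice@example.com",
        "serviceAccount:my-service-account@my-project.iam.gserviceaccount.com"
      ]
    }
  ]
}
EOF
}

sample_allow_policy
```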
List all roles
This lists all of the roles that you can assign to a user or service account:
gcloud iam roles list
List the roles held by a principal/service account (at a Project level)
This needlessly complex command lists all of the roles that a principal holds at the project level:
gcloud projects get-iam-policy my-project \
--flatten="bindings[].members" \
--format='table(bindings.role)' \
--filter="bindings.members:theserviceaccount@my-project.iam.gserviceaccount.com"
Should return something like this:
ROLE
roles/artifactregistry.reader
roles/artifactregistry.writer
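Since that command is a lot to remember, here's a sketch that wraps it in a function. roles_for_member is a made-up name, and I've swapped the table format for value(...) so the output has no ROLE header and is easier to pipe into other commands.

```shell
# roles_for_member is a hypothetical wrapper around the command above.
# $1 = project ID, $2 = member email (without the "serviceAccount:" prefix)
roles_for_member() {
  gcloud projects get-iam-policy "$1" \
    --flatten="bindings[].members" \
    --format="value(bindings.role)" \
    --filter="bindings.members:$2"
}

# Usage (not run here):
# roles_for_member my-project theserviceaccount@my-project.iam.gserviceaccount.com
```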
Add a role to a principal/service account (at a Project level)
If you want to give a role in a project to a service account, you can do this:
gcloud projects add-iam-policy-binding my-project \
--member="serviceAccount:my-service-account@my-project.iam.gserviceaccount.com" \
--role="roles/artifactregistry.reader"
You can list the roles available with gcloud iam roles list --format="value(name)".
Grant a role to a principal
gcloud projects add-iam-policy-binding my-project --member user:
Google Kubernetes Engine (GKE)
List all clusters
gcloud container clusters list --project my-corporate-department
Log on to a cluster
gcloud container clusters get-credentials my-pet-cluster --zone us-central1-c --project my-corporate-department
Artifact Registry
Authenticate to the container registry with podman
From footnote 1:
gcloud auth print-access-token | podman login -u oauth2accesstoken --password-stdin XX.gcr.io
Google Cloud SQL
Connect ad-hoc to Google Cloud SQL instance with a private IP address
Google Cloud SQL is a PITA to connect to, if you’ve chosen to put your database on a private IP address.
Here’s one way to do it - assuming you’ve already got a GKE Kubernetes cluster running in GCP:
export DATABASE_USER=myapp
export DATABASE_IP=$(gcloud sql instances describe YOURDBINSTANCENAME --format 'value(ipAddresses.ipAddress)')
kubectl -n default apply -f - <<API
apiVersion: apps/v1
kind: Deployment
metadata:
name: pg-util
spec:
selector:
matchLabels:
app: pg-util
template:
metadata:
labels:
app: pg-util
spec:
containers:
- name: pg
image: docker.io/library/postgres:14-alpine
command: ["sleep","infinity"]
API
kubectl -n default exec -it $(kubectl -n default get pod -l app=pg-util -o name | cut --delimiter="/" --fields=2) -- psql --host ${DATABASE_IP} --user ${DATABASE_USER}
# Run your SQL here
kubectl -n default delete deploy/pg-util
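It's easy to forget the final delete if psql exits with an error. Here's a sketch that wraps the connect and clean-up steps so the utility Deployment is removed either way. with_pg_util is a made-up name, the 120s timeout is arbitrary, and it assumes the pg-util Deployment from above has already been applied.

```shell
# with_pg_util is a hypothetical helper. It waits for the pg-util Deployment,
# opens psql inside it, and always deletes the Deployment afterwards.
# $1 = database host/IP, $2 = database user
with_pg_util() {
  pg_status=0
  {
    kubectl -n default rollout status deploy/pg-util --timeout=120s &&
      kubectl -n default exec -it deploy/pg-util -- psql --host "$1" --username "$2"
  } || pg_status=$?
  # Clean up even when the rollout or psql failed.
  kubectl -n default delete deploy/pg-util
  return "$pg_status"
}

# Usage (not run here): with_pg_util "${DATABASE_IP}" "${DATABASE_USER}"
```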
Troubleshooting
This error is seen in kubectl get events: “Failed to Attach 1 network endpoint(s) (NEG “k8s1-4362fb64-default-myapp-4000-7414754d” in zone “us-central1-c”): googleapi: Error 400: Invalid value for field ‘resource.ipAddress’: ‘10.32.1.5’. Specified IP address 10.32.1.5 doesn’t belong to the (sub)network default or to the instance gke-mycluster-w-default-pool-fff0000-zzzz., invalid”
- If you visit the Google Cloud web console and browse to your Cluster → Ingress → Backend services → (Service for your app) → Backends, you will see that there are 0 of 0 healthy services. Your Network Endpoint Group (NEG) is empty.
- Cause: You are trying to expose a service outside the cluster using container-native load balancing, but your Kubernetes cluster is not “VPC-native”.
  - Container-native load balancing is enabled when you add an annotation cloud.google.com/neg: '{"ingress": true}' to a Service.
- Solution: Create a VPC-native cluster.
- Thanks to this awesome GitHub issue.
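To check whether an existing cluster is VPC-native before digging further, something like the following sketch might help. It assumes that the useIpAliases field under ipAllocationPolicy is what marks a cluster as VPC-native, and is_vpc_native is a made-up helper name.

```shell
# is_vpc_native is a hypothetical helper. It prints the cluster's
# ipAllocationPolicy.useIpAliases value (assumed here to indicate VPC-native).
# $1 = cluster name, $2 = zone, $3 = project
is_vpc_native() {
  gcloud container clusters describe "$1" --zone "$2" --project "$3" \
    --format="value(ipAllocationPolicy.useIpAliases)"
}

# Usage (not run here):
# is_vpc_native my-pet-cluster us-central1-c my-corporate-department
```

An empty result would suggest that the cluster is routes-based rather than VPC-native.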
Cannot deploy an image from a private Artifact Registry onto a GKE cluster - constant “ImagePullBackOff” error:
- You need to grant the following OAuth scope to the nodes (in the cluster’s node pool) when you create the cluster: https://www.googleapis.com/auth/devstorage.read_only
- If your cluster has node pools which use a custom Service Account, then you will need to grant the Role roles/artifactregistry.reader to the custom Service Account.
  - Get the Service Account ID:
    gcloud container node-pools describe default-pool --zone us-central1 --cluster my-gke-cluster --project my-gcp-project --format="value(config.serviceAccount)"
  - Grant the Role (where $SERVICE_ACCOUNT holds the email address returned by the previous command):
    gcloud artifacts repositories add-iam-policy-binding my-artifact-repo --location=us --member "serviceAccount:$SERVICE_ACCOUNT" --role roles/artifactregistry.reader --project my-gcp-project
When trying to add an Ingress with SSL to a VPC-native GKE cluster, with Google-managed SSL, the endpoint cannot be reached in a browser:
- If the Ingress has been created with annotation kubernetes.io/ingress.allow-http: "false" then the SSL configuration must be successful, otherwise the Ingress will be inaccessible, even when trying to directly access the IP of the Ingress. In other words: check that you’ve configured SSL correctly.
- Diagnosis:
  - Check the status of the Ingress object. It might say something like “error running load balancer syncing routine: loadbalancer xxxxx does not exist: invalid configuration: both HTTP and HTTPS are disabled (kubernetes.io/ingress.allow-http is false and there is no valid TLS configuration); your Ingress will not be able to serve any traffic”
  - Check the status of the ManagedCertificate with kubectl -n default describe managedcertificate mycert. The status field should show the reason why the certificate failed to be provisioned. For example: FailedNotVisible means that the DNS entry couldn’t be reached.
- Solution:
  - If you don’t have a ManagedCertificate, create one.
  - Check that you can reach the hostname given in your ManagedCertificate. If it’s configured through Google DNS, then make sure an appropriate A-record exists for the hostname.
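If the describe output is too noisy, a jsonpath query can pull out just the per-domain status. This is a sketch under the assumption that the ManagedCertificate status carries a domainStatus list with domain and status fields. managed_cert_status is a made-up helper name.

```shell
# managed_cert_status is a hypothetical helper. It prints one line per domain
# with the domain name and its provisioning status, separated by a tab.
# $1 = namespace, $2 = ManagedCertificate name
managed_cert_status() {
  kubectl -n "$1" get managedcertificate "$2" \
    -o jsonpath='{range .status.domainStatus[*]}{.domain}{"\t"}{.status}{"\n"}{end}'
}

# Usage (not run here): managed_cert_status default mycert
```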
1. https://stackoverflow.com/questions/63790529/authenticate-to-google-container-registry-with-podman