Plausible Analytics
Backing up
User database (PostgreSQL) backup
# Backup the database (users, etc.)
podman exec plausible-db pg_dumpall -p 5432 -U plausible -l plausible > postgres12dump.sql
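Restoring should just be a case of piping the dump back into psql. A rough sketch, untested, assuming the same container name, user and database as above (the Migrating/updating section below does the same thing against Kubernetes):
podman exec -i plausible-db psql -U plausible -d plausible < postgres12dump.sql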
Analytics database (Clickhouse) backup
Half-arsed backup
How to do a quick and dirty backup of the Clickhouse database by backing up the underlying Persistent Volume.
This assumes that your Persistent Volume is actually somewhere on the host:
# Find the host path behind the Clickhouse PVC...
LOCAL_PV_PATH=$(kubectl get pvc -n plausible data-plausible-events-db-0 -o template='{{.spec.volumeName}}' | xargs kubectl get pv -o template='{{.spec.hostPath.path}}')
# ...then tar up the whole volume
tar -cf clickhouse-pv-$(date +%F).tar -C "$LOCAL_PV_PATH" .
(How much use would this be in a real DR situation? Anyone’s guess…)
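A restore would presumably be the reverse: scale Clickhouse down, untar over the volume, scale back up. A rough sketch, also untested, assuming the statefulset is named plausible-events-db:
kubectl -n plausible scale statefulset plausible-events-db --replicas=0
tar -xf clickhouse-pv-<date>.tar -C "$LOCAL_PV_PATH"
kubectl -n plausible scale statefulset plausible-events-db --replicas=1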
Disk backup, shipped manually to AWS
NB: I think that this needs Clickhouse 22 or later, because it uses the BACKUP command.
This hasn’t been tested, so don’t use this for anything important ;-)
First, configure a backup destination in Clickhouse.
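For reference, a minimal sketch of what that configuration might look like. The disk name backups and the path /tmp/backups/ are assumptions chosen to match the commands below; this would go in a file under /etc/clickhouse-server/config.d/:
<clickhouse>
    <storage_configuration>
        <disks>
            <backups>
                <type>local</type>
                <path>/tmp/backups/</path>
            </backups>
        </disks>
    </storage_configuration>
    <!-- Backups must be explicitly allowed to use this disk and path -->
    <backups>
        <allowed_disk>backups</allowed_disk>
        <allowed_path>/tmp/backups/</allowed_path>
    </backups>
</clickhouse>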
Restart Clickhouse to pick up the changes.
Next, set up a user in AWS:
# Create a user group
aws iam create-group --group-name backup-agents
# Create a custom policy that allows read/write access to the backup bucket
aws iam create-policy --policy-name backup-agent-write \
--description "Allows read/write access to the backup bucket" \
--policy-document \
'{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "RegistryWrite",
"Effect": "Allow",
"Action": [
"s3:GetBucketAcl",
"s3:ListBucket",
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject"
],
"Resource": [
"arn:aws:s3:::backups.monodot",
"arn:aws:s3:::backups.monodot/*"
]
}
]
}'
# Attach the policy to the group
aws iam attach-group-policy --group-name backup-agents --policy-arn arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):policy/backup-agent-write
# Create the backup user
aws iam create-user --user-name vinson-backup-robot
# Add the user to the group
aws iam add-user-to-group --user-name vinson-backup-robot --group-name backup-agents
# Create an access key for the user
aws iam create-access-key --user-name vinson-backup-robot
# Create a bucket for backups (the name must match the Resource in the policy above)
aws s3 mb s3://backups.xxx
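At this point it’s worth a quick sanity check that the new credentials can actually see the bucket; something like this, with the key and secret from create-access-key above:
AWS_ACCESS_KEY_ID=xxxx AWS_SECRET_ACCESS_KEY=xxxx aws s3 ls s3://backups.xxx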
Then, drop to a shell with access to the Kubernetes cluster where Plausible and Clickhouse are running:
# Run the backup inside the Clickhouse pod, writing to the 'backups' disk
kubectl -n plausible exec plausible-events-db-0 -- clickhouse-client --query "BACKUP DATABASE plausible_dev TO Disk('backups', 'clickhouse-backup.zip')"
# Copy the zip file to local disk
kubectl -n plausible cp plausible-events-db-0:/tmp/backups/clickhouse-backup.zip ./clickhouse-backup.zip
# Ship it off to AWS, using the access key created earlier
# (export the vars so the aws CLI can see them)
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
aws s3 cp ./clickhouse-backup.zip s3://backups.xxxxxx/plausible/clickhouse-backup-$(date +%F).zip
# Remove the backup file from the Pod
kubectl -n plausible exec plausible-events-db-0 -- rm /tmp/backups/clickhouse-backup.zip
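Restoring should, in theory, be the same journey in reverse. A sketch of the final step, also untested, assuming the zip has been copied back onto the 'backups' disk in the pod:
kubectl -n plausible exec plausible-events-db-0 -- clickhouse-client --query "RESTORE DATABASE plausible_dev FROM Disk('backups', 'clickhouse-backup.zip')"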
Native S3 backup in Clickhouse
Clickhouse now has a native S3 backup target. But this feature doesn’t seem to be available in my version of Clickhouse, so this is my future plan for backing up:
BACKUP DATABASE plausible_dev TO S3('https://backups.xxxx.s3.amazonaws.com/backup-S3/clickhouse_backup', 'xxxxxxxxx', 'xxxxxxx')
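And presumably the matching restore would be the following (same caveats; the endpoint, key and secret here are redacted placeholders):
RESTORE DATABASE plausible_dev FROM S3('https://backups.xxxx.s3.amazonaws.com/backup-S3/clickhouse_backup', 'xxxxxxxxx', 'xxxxxxx')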
Operations
Viewing the latest analytics events in the DB
clickhouse-client
USE plausible_dev;
SELECT COUNT(*) FROM events;
SELECT name, user_id, session_id, hostname, pathname FROM events ORDER BY timestamp DESC LIMIT 100;
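To get a feel for whether events are still flowing in, a daily count also helps. This assumes the same events table and timestamp column as above:
SELECT toDate(timestamp) AS day, COUNT(*) AS events FROM events GROUP BY day ORDER BY day DESC LIMIT 14;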
Migrating/updating
# Start a container running the old DB, in the background
# (NB: the official postgres image expects POSTGRES_DB, not POSTGRES_DATABASE)
podman run -d --name plausible-db -e POSTGRES_PASSWORD=xxxxxxx -e POSTGRES_USER=plausible -e POSTGRES_DB=plausible -v /opt/plausible/db-data:/var/lib/postgresql/data postgres:12
# Dump all of the DB contents
podman exec plausible-db pg_dumpall -p 5432 -U plausible -l plausible > postgres12dump.sql
# Launch a postgres 14 instance
# (apply some kube yaml here)
# Import the data from 12
kubectl exec -i plausible-db-0 -n plausible -- psql -d plausible -U postgres < postgres12dump.sql
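It’s worth sanity-checking the import afterwards, e.g. by counting rows in one of Plausible’s tables (users here is my assumption of a table that should be non-empty):
kubectl exec -i plausible-db-0 -n plausible -- psql -d plausible -U postgres -c "SELECT COUNT(*) FROM users;"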
Troubleshooting
Empty analytics data
- Check that there is data in Clickhouse: go to http://clickhouse:8123/play (or expose it somehow; see the port-forward sketch below). Use the top-right boxes to provide the username and password for Clickhouse. Then try a query like:
SHOW DATABASES
or:
SELECT COUNT(*) FROM SCHEMA_NAME.events
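For the “expose it somehow” part, a kubectl port-forward is the least-effort option; the service name here is a guess based on the pod name used above:
kubectl -n plausible port-forward svc/plausible-events-db 8123:8123
# then browse to http://localhost:8123/play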
Visitor numbers are tiny compared to page visits
- Plausible calculates visitors by hashing IP addresses. If it can’t determine the visitor’s IP address correctly, or if it’s obfuscated by another network component (e.g. a load balancer), then it will be unable to correctly identify the visitor.
- I run Plausible on k3s, which includes the Traefik load balancer. In its default configuration, Traefik obfuscates the IP address of the client. To change this, set hostNetwork: true in the spec for the traefik deployment (see the patch sketch below). This will expose the client’s IP address to Plausible. (This is a bit of a hack, but it works.)
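One way to apply that, assuming the default k3s setup where the traefik deployment lives in the kube-system namespace:
kubectl -n kube-system patch deployment traefik -p '{"spec":{"template":{"spec":{"hostNetwork":true}}}}'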