
Using etcd as Patroni DCS with the Zalando Postgres Operator

Patroni is a reliable high-availability cluster manager for Postgres. When it runs on Kubernetes, it can use the Kubernetes API itself as its DCS (Distributed Configuration Store). This is convenient, but on Kubernetes clusters with an unreliable API server / control plane it can lead to the demotion of your cluster leader. If Patroni cannot renew its leader lock in the DCS in time, the leader steps down, and your users are unhappy. Fortunately, Patroni supports other DCS backends, such as etcd.

Etcd is a lightweight and resilient key/value store. Patroni stores its cluster state, current leader and cluster members purely as key/value pairs, so etcd fits its needs exactly. Of course, this first requires an etcd deployment (in this case, etcd itself runs on a Kubernetes cluster). There are plenty of ready-to-go solutions out there. I decided on a slightly different approach: instead of the widespread Helm deployments, I created the etcd cluster deployment as a Kustomize overlay. You can find the deployment in my GitHub repository. It relies on a Kustomize plugin called KHelm, which first renders the Helm chart and then applies patches or additional resource definitions using Kustomize.
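To make the "only key/value pairs" point more concrete, here is a small Python sketch of the key layout Patroni keeps per cluster in its DCS. It assumes Patroni's default namespace /service. It lists only the most common keys, and the exact set depends on the Patroni version and enabled features. The helper patroni_keys and the scope my-cluster are hypothetical names used purely for illustration.

```python
from posixpath import join

# Patroni's default DCS namespace; configurable via its `namespace` setting.
DEFAULT_NAMESPACE = "/service"


def patroni_keys(scope: str, namespace: str = DEFAULT_NAMESPACE) -> dict[str, str]:
    """Hypothetical helper: main DCS keys Patroni uses for one cluster.

    Every Patroni cluster ("scope") lives below <namespace>/<scope>/.
    """
    base = join(namespace, scope)
    return {
        "config": join(base, "config"),          # dynamic cluster configuration
        "initialize": join(base, "initialize"),  # system identifier, written once at bootstrap
        "leader": join(base, "leader"),          # member name of the current leader (TTL-bound lock)
        "members": join(base, "members") + "/",  # one key per member below this prefix
        "history": join(base, "history"),        # timeline history
        "failover": join(base, "failover"),      # pending manual failover/switchover request
    }


if __name__ == "__main__":
    for name, key in patroni_keys("my-cluster").items():
        print(f"{name:<10} {key}")
```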

As you may know from my other articles, I rely heavily on the Zalando Postgres Operator and, with it, the Zalando Spilo container images. Spilo has Patroni built in, and it all plays nicely together (at least most of the time 🙂).

Let’s apply the etcd deployment linked above:

kustomize build ./base --enable-alpha-plugins | kubectl apply -f -

This creates everything you need, from the namespace to the actual deployment. You will probably need to change the storageClass parameter in values.yaml before applying it. With a bit of patience, you should see three etcd pods coming up. After a few more minutes, the etcd members will have elected a leader. Congratulations, you have created your first etcd cluster with three replicas. The deployment also creates two services, one of which becomes important later on.
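Three replicas is not an arbitrary number. etcd uses the Raft consensus algorithm, so writes (such as Patroni renewing its leader key) only succeed while a majority of members, the quorum, is available. This plain-arithmetic Python sketch shows how many member failures a cluster of a given size tolerates. It is not an etcd API, and the function names are hypothetical.

```python
def quorum(members: int) -> int:
    """Smallest majority of `members` voting etcd members."""
    if members < 1:
        raise ValueError("an etcd cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members may fail while the cluster still has a quorum."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} members: quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

For the three-replica cluster deployed above, the quorum is 2, so one etcd pod may fail without Patroni losing its DCS. A fourth member would raise the quorum to 3 and still tolerate only one failure, which is why odd cluster sizes are the usual choice.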

Be aware that I set the parameter ETCD_ENABLE_V2. It enables etcd’s v2 API, which is deprecated and will no longer be available in the upcoming minor version of etcd (v3.6). I’ll explain in a moment why this matters.

Patroni itself is highly configurable. It can use either etcd v2 or v3, talk to etcd over TLS and authenticate against it. So why did I enable the deprecated v2 API? However future-proof Patroni might be, at the time of writing the Zalando Operator sadly cannot handle all of this configuration. A pull request for it has been open in the Zalando Postgres Operator repository for almost a year. As soon as it gets merged, feel free to use etcd v3 with whatever configuration you like. Until then, the only thing we can configure is an Operator parameter called etcd_host, which gives us a plain connection to etcd's v2 API.

From a security point of view, I therefore advise you to run the etcd cluster only on the same Kubernetes cluster as your Postgres clusters, because the traffic is not encrypted whatsoever. I haven’t tried it, but it may be possible to work around this limitation by setting additional environment variables and adjusting the Patroni configuration within the postgresql CRD.
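For comparison, Patroni’s own YAML configuration distinguishes the two APIs by section name. The etcd section is for v2 and the etcd3 section is for v3. Both accept the same options, such as hosts, protocol, cacert, cert, key, username and password. The Python sketch below builds such a section as a dictionary to illustrate what would have to reach Patroni. The function name and parameters are hypothetical, and any certificate paths you pass are placeholders. This is not what the Operator generates today.

```python
from typing import Optional


def patroni_etcd_section(
    hosts: list[str],
    v3: bool = True,
    cacert: Optional[str] = None,
    cert: Optional[str] = None,
    key: Optional[str] = None,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> dict:
    """Hypothetical helper: build the DCS part of a Patroni config.

    Patroni reads the `etcd` section for the v2 API and `etcd3` for the v3 API.
    """
    options: dict = {"hosts": hosts}
    if cacert:
        # TLS: talk https and trust the given CA certificate.
        options["protocol"] = "https"
        options["cacert"] = cacert
    if cert and key:
        # Client certificate authentication.
        options["cert"] = cert
        options["key"] = key
    if username and password:
        # etcd user/password authentication.
        options["username"] = username
        options["password"] = password
    return {("etcd3" if v3 else "etcd"): options}


if __name__ == "__main__":
    print(patroni_etcd_section(["etcd.etcd.svc.cluster.local:2379"], v3=False))
```

Called as shown, it returns {"etcd": {"hosts": ["etcd.etcd.svc.cluster.local:2379"]}}. That roughly mirrors the plain, unencrypted v2 setup used in this post.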

This assumes you already have the Zalando Postgres Operator running (if not, have a look at my other articles). To use etcd as DCS for all Patroni clusters managed by the Operator, you only need to add the etcd_host parameter to the postgres-operator configmap. As its value, we use the DNS name of the service created earlier, followed by the etcd client port 2379. The relevant part should look like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-operator
  namespace: postgres-operator
data:
  api_port: "8080"
  aws_region: eu-central-1
  etcd_host: "etcd.etcd.svc.cluster.local:2379"
  external_traffic_policy: Cluster
  # further existing options stay unchanged

Add this line to the existing postgres-operator configmap, either by editing it in place or by adjusting your Postgres Operator deployment manifests and reapplying them. You will probably need to restart the Postgres Operator pod afterwards, unless you rely on something like a reloader.
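If you prefer a non-interactive change over editing the configmap by hand, kubectl patch with a JSON merge patch adds just this one key and leaves the others untouched. The Python sketch below builds that command line and checks that the value has the host:port form the Operator expects for etcd_host. The helper name is hypothetical. The sketch only assembles the arguments and does not contact the cluster.

```python
import json
import shlex


def etcd_host_patch_cmd(
    etcd_host: str,
    configmap: str = "postgres-operator",
    namespace: str = "postgres-operator",
) -> list[str]:
    """Hypothetical helper: kubectl command that sets etcd_host in the operator configmap."""
    host, sep, port = etcd_host.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"etcd_host must look like host:port, got {etcd_host!r}")
    # JSON merge patch: only data.etcd_host is added or overwritten.
    patch = json.dumps({"data": {"etcd_host": etcd_host}})
    return ["kubectl", "-n", namespace, "patch", "configmap", configmap,
            "--type", "merge", "-p", patch]


if __name__ == "__main__":
    cmd = etcd_host_patch_cmd("etcd.etcd.svc.cluster.local:2379")
    print(shlex.join(cmd))
    # To apply it for real: subprocess.run(cmd, check=True)
```

Afterwards, kubectl -n postgres-operator rollout restart deployment/postgres-operator restarts the Operator, assuming its Deployment is also named postgres-operator.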

After the Operator restarts, its log output should show that it recreates the database pods, now using etcd as DCS.

The Operator restarts the database pods, which applies the new configuration. Once they are back up, you can open a shell in one of the etcd containers and query the key/value store with etcdctl. Patroni wrote its data through the v2 API, and recent etcdctl versions default to v3, so etcdctl has to be switched to v2 mode first. Then we list the keys Patroni created and read the current leader pod with etcdctl get:

# inside the etcd-0 pod, e.g. after: kubectl -n etcd exec -it etcd-0 -- bash
export ETCDCTL_API=2
etcdctl ls /service/
etcdctl get /service/postgres-a905452c-3e03-4e46-9989-142b3af70e29/leader
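Because the v2 API is plain HTTP/JSON, the same lookup also works without etcdctl. The sketch below reads a key through etcd’s v2 keys endpoint (GET /v2/keys/<key>) using only the Python standard library. It assumes the service DNS name from the etcd_host setting above and that ETCD_ENABLE_V2 is set. That name only resolves inside the Kubernetes cluster, and the function names are hypothetical. It also illustrates the security caveat from above: anything that can reach port 2379 sees this traffic unencrypted.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: etcd service DNS name from the etcd_host setting (in-cluster only).
ETCD_URL = "http://etcd.etcd.svc.cluster.local:2379"


def v2_key_url(key: str, base_url: str = ETCD_URL) -> str:
    """URL of a key in etcd's v2 keys API."""
    return f"{base_url}/v2/keys/{quote(key.lstrip('/'))}"


def node_value(response_body: str) -> str:
    """Extract the stored value from a v2 'get' response."""
    return json.loads(response_body)["node"]["value"]


def get_v2_key(key: str, base_url: str = ETCD_URL, timeout: float = 5.0) -> str:
    """Read a single key via the v2 API (requires ETCD_ENABLE_V2 on the server)."""
    with urllib.request.urlopen(v2_key_url(key, base_url), timeout=timeout) as resp:
        return node_value(resp.read().decode("utf-8"))


if __name__ == "__main__":
    scope = "postgres-a905452c-3e03-4e46-9989-142b3af70e29"
    # Only prints the URL; get_v2_key(...) performs the real request from inside the cluster.
    print(v2_key_url(f"/service/{scope}/leader"))
```

For a Spilo cluster, the value of the leader key is the name of the current leader pod.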

