Demystifying StatefulSets

You may have heard of StatefulSets in Kubernetes. But if you’ve never had the chance to work with them, it might be hard to understand what purpose they serve.

In this article we will review when to use StatefulSets and provide a quick walkthrough of how to create one.

Why StatefulSets

StatefulSet is the workload API object used to manage stateful applications

Kubernetes documentation

A stateful application remembers specific details of a request, either in memory or in persistent storage. Future requests from the same client require the “processor” (e.g., the database or web app) to have access to that same data. A user profile or a shopping cart are examples of stateful features. Databases and caching software such as MySQL, Memcached, Redis, and Cassandra, as well as stateful messaging systems like RabbitMQ, are also examples of stateful applications.

Let’s review some of the key features of StatefulSets. Keep in mind, however, that an application does not need to use all of these features.

Features of StatefulSets

Stable Network Identities

Pods in a StatefulSet are assigned unique, predictable hostnames based on their ordinal index, using the formula $(statefulset name)-$(ordinal) (db-0, db-1, db-2). This gives applications stable hostnames with which to discover each other.

Ordered Deployment / Scaling

Let’s say we want to deploy a MySQL database cluster to Kubernetes consisting of a primary node and several replicas. If we were to use a standard Deployment for the MySQL nodes, pods would be deployed and scaled up/down in arbitrary order. If the primary node were removed, we would break replication in the MySQL cluster.

If we instead use a StatefulSet to deploy the MySQL cluster, we can ensure the first pod (mysql-0) is the primary MySQL node while all subsequent pods (mysql-n) are read replicas. On scale-down, Kubernetes removes the highest-numbered pods first, in the reverse of creation order.

As long as we don’t scale the StatefulSet down to zero pods, we won’t lose the MySQL primary node.
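Ordered startup and teardown is the default behavior. If an application has no ordering requirements, a StatefulSet can be told to launch and terminate its pods in parallel instead. A minimal sketch of the relevant field (the default, OrderedReady, can be omitted):

```yaml
# Excerpt from a StatefulSet spec.
# OrderedReady (the default) creates/removes pods one at a time, in order;
# Parallel launches and terminates all pods at once.
spec:
  podManagementPolicy: OrderedReady   # or: Parallel
```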

Stable Storage

Stateful applications often require data storage that persists beyond the lifecycle of an individual pod. In Kubernetes, this is accomplished with persistent volumes (PVs). When a pod is replaced, its storage volume can be reattached, and the unique naming of pods makes it easy to match volumes to the pods that replace them.

Rolling Updates

StatefulSets support rolling updates, allowing you to update pods one at a time. Rolling updates proceed in the same order as pod termination (from the largest ordinal to the smallest). This “reverse” order was designed to accommodate the common pattern in which the first pod (pod 0) is the “primary” node.
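Update behavior is controlled by .spec.updateStrategy. One useful option is a partitioned rolling update, which only touches pods with an ordinal greater than or equal to the partition value, so the highest-numbered replicas can be canaried first. A sketch (the partition value here is illustrative):

```yaml
# Excerpt from a StatefulSet spec.
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 2   # only pods with ordinal >= 2 are updated
```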

Headless Service

A StatefulSet can use a Headless Service to control the domain of its Pods. The domain managed by this Service takes the form:

$(service name).$(namespace).svc.cluster.local

Each pod belonging to the service also gets a consistent DNS entry of the form:

$(pod-name).$(service name).$(namespace).svc.cluster.local

By the way, a headless service is created by explicitly setting the cluster IP address (.spec.clusterIP) to None.

With headless Services, no cluster IP is allocated, kube-proxy does not handle these Services, and the platform does no load balancing or proxying for them. This will become clearer when we create a headless service later in the example.
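As a sketch, a headless Service for the pods used later in this article (labeled app: httpd) could be declared like this; the key line is clusterIP: None:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: httpd
spec:
  clusterIP: None     # headless: no virtual IP, no proxying, per-pod DNS
  selector:
    app: httpd
  ports:
  - name: web
    port: 80
```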


Visualization of a StatefulSet

Playing Around with StatefulSets

In this section we will walk through creating a StatefulSet running Apache HTTP Server. A StatefulSet is not something you would normally use for a simple web server, but it is easy to set up and lets us serve different pages from different pods to demonstrate stable storage.

I’m using minikube with podman on my laptop, which creates “hostPath” style persistent volumes (i.e., data stored in a directory on the host VM). In a real Kubernetes production environment, network-attached storage volumes are normally used.

We create a StatefulSet with 1 initial replica and a volume claim of 512Mi. We will mount the volume on Apache’s default document root, /usr/local/apache2/htdocs/:

kubectl create -f statefulset.yaml
# statefulset.yaml
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: "httpd"
  replicas: 1
  selector:
    matchLabels:
      app: httpd
  template:
    metadata:
      labels:
        app: httpd
    spec:
      containers:
      - name: httpd
        image: docker.io/library/httpd:2-alpine
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/local/apache2/htdocs/
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 512Mi

We have 1 pod created from the StatefulSet:

kubectl get all
NAME        READY   STATUS    RESTARTS   AGE
pod/web-0   1/1     Running   0          3h10m

NAME                   READY   AGE
statefulset.apps/web   1/1     3h38m

Let’s verify the volumes that were created:

kubectl get persistentvolumeclaims
NAME        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
www-web-0   Bound    pvc-f0adab17-4bdf-4070-9a50-c9a9120fab33   512Mi      RWO            standard       6m34s

kubectl get persistentvolumes
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM               STORAGECLASS   REASON   AGE
pvc-f0adab17-4bdf-4070-9a50-c9a9120fab33   512Mi      RWO            Delete           Bound    sandbox/www-web-0   standard                6m1s

Let’s scale up to 2 pods:

kubectl scale statefulset.apps/web --replicas 2
statefulset.apps/web scaled

kubectl get pods
NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          13m
web-1   1/1     Running   0          14s

kubectl get persistentvolumeclaims,persistentvolumes
NAME                              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/www-web-0   Bound    pvc-f0adab17-4bdf-4070-9a50-c9a9120fab33   512Mi      RWO            standard       14m
persistentvolumeclaim/www-web-1   Bound    pvc-5961d007-bd46-4561-a0e1-168a9d9dbcef   512Mi      RWO            standard       53s

NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM               STORAGECLASS   REASON   AGE
persistentvolume/pvc-5961d007-bd46-4561-a0e1-168a9d9dbcef   512Mi      RWO            Delete           Bound    sandbox/www-web-1   standard                53s
persistentvolume/pvc-f0adab17-4bdf-4070-9a50-c9a9120fab33   512Mi      RWO            Delete           Bound    sandbox/www-web-0   standard                14m

Now let’s create a headless service, so that each pod gets a DNS subdomain associated with the service name:

kubectl create service clusterip --clusterip='None' httpd
service/httpd created

kubectl get services
NAME    TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
httpd   ClusterIP   None         <none>        <none>    76s

Let’s now run a test pod and install some basic DNS tools:

kubectl run --rm -it --image=alpine test -- /bin/sh
If you don't see a command prompt, try pressing enter.

# apk update && apk add busybox-extras bind-tools

Now we can verify the DNS names created by the headless service. Querying the service will return the IPs of each pod.

$ host httpd.sandbox.svc.cluster.local
httpd.sandbox.svc.cluster.local has address 10.244.0.6
httpd.sandbox.svc.cluster.local has address 10.244.0.7

We can also query the DNS name assigned to each pod:

$ host web-0.httpd.sandbox.svc.cluster.local
web-0.httpd.sandbox.svc.cluster.local has address 10.244.0.6
$ host web-1.httpd.sandbox.svc.cluster.local
web-1.httpd.sandbox.svc.cluster.local has address 10.244.0.7

Let’s now test that separate, stable storage is assigned to each pod. We will create a file containing the name of the pod:

kubectl exec web-0 -- /bin/sh -c 'echo "hello from $(hostname)" > /usr/local/apache2/htdocs/index.html'

kubectl exec web-1 -- /bin/sh -c 'echo "hello from $(hostname)" > /usr/local/apache2/htdocs/index.html'

When we port-forward to each pod, we can see the unique file created on each pod:

kubectl port-forward pod/web-0 8080:80

kubectl port-forward pod/web-1 8081:80
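With the port-forwards running, fetching each local port should return the file we wrote on the corresponding pod (this assumes the running cluster and the kubectl exec commands from this walkthrough):

```shell
curl http://localhost:8080/
# hello from web-0
curl http://localhost:8081/
# hello from web-1
```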

Finally, let’s scale down the StatefulSet to zero, so that all pods are stopped:

kubectl scale statefulset.apps/web --replicas 0

kubectl get pods
No resources found in sandbox namespace.

However, our volumes are still there:

kubectl get persistentvolumeclaims,persistentvolumes
NAME                              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/www-web-0   Bound    pvc-f0adab17-4bdf-4070-9a50-c9a9120fab33   512Mi      RWO            standard       99m
persistentvolumeclaim/www-web-1   Bound    pvc-5961d007-bd46-4561-a0e1-168a9d9dbcef   512Mi      RWO            standard       86m

NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM               STORAGECLASS   REASON   AGE
persistentvolume/pvc-5961d007-bd46-4561-a0e1-168a9d9dbcef   512Mi      RWO            Delete           Bound    sandbox/www-web-1   standard                86m
persistentvolume/pvc-f0adab17-4bdf-4070-9a50-c9a9120fab33   512Mi      RWO            Delete           Bound    sandbox/www-web-0   standard                99m
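Retaining PVCs on scale-down is the default behavior. On newer Kubernetes versions (the field graduated over several releases, so check what your cluster supports), you can ask the StatefulSet controller to clean them up automatically; a sketch:

```yaml
# Excerpt from a StatefulSet spec.
spec:
  persistentVolumeClaimRetentionPolicy:
    whenScaled: Delete    # delete the PVCs of pods removed by scale-down
    whenDeleted: Retain   # keep PVCs if the StatefulSet itself is deleted
```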

We’ll scale the StatefulSet back to 2 pods:

kubectl scale statefulset.apps/web --replicas 2
statefulset.apps/web scaled

Our Apache Web server pods should have the same volumes reattached:

kubectl describe pod/web-0
...
    Mounts:
      /usr/local/apache2/htdocs/ from www (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-d7r4l (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  www:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  www-web-0
    ReadOnly:   false
...

kubectl describe pod/web-1
...
    Mounts:
      /usr/local/apache2/htdocs/ from www (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mgnxw (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  www:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  www-web-1
    ReadOnly:   false
...

Summary

This was a quick introduction to StatefulSets. I hope it’s a bit clearer why they are useful and when you should use them. As mentioned previously, if you were running StatefulSets in a production Kubernetes cluster with important data, you would use a persistent storage offering such as Persistent Disk on Google’s GKE or EBS on AWS’s EKS, rather than local disk on the Kubernetes worker node.
