You may have heard of StatefulSets in Kubernetes. But if you’ve never had the chance to work with them, it might be hard to understand what purpose they serve.
In this article we will review when to use StatefulSets and provide a quick walkthrough of how to create one.
Why StatefulSets
“StatefulSet is the workload API object used to manage stateful applications.”
(Kubernetes documentation)
A stateful application remembers specific details of a request, either in memory or in persistent storage. Future requests from the same client require the “processor” (e.g., the database or web app) to have access to that same data. A user profile or shopping cart are examples of stateful features. Databases and caching software such as MySQL, Memcached, Redis, and Cassandra, as well as stateful messaging systems like RabbitMQ, are also examples of stateful applications.
Let’s review some of the key features of StatefulSets. Keep in mind, however, that an application does not need to use all of these features.
Features of StatefulSets
Stable Network Identities
Pods in a StatefulSet are assigned unique and predictable hostnames that are based on their ordinal index using the formula $(statefulset name)-$(ordinal) (db-0, db-1, db-2). This provides predictable hostnames for applications to discover each other.
Ordered Deployment / Scaling
Let’s say we want to deploy a MySQL database cluster to Kubernetes consisting of a primary node and several replicas. If we were to use a standard Deployment for the MySQL nodes, pods would be deployed and scaled up/down in arbitrary order. If the primary node were removed, we would break replication in the MySQL cluster.
If we instead use a StatefulSet to deploy the MySQL cluster, we can ensure the first pod (mysql-0) is the primary MySQL node while all subsequent pods (mysql-n) are read replicas. On scale-down, Kubernetes removes the highest-numbered pods first – the reverse of creation order.
As long as we don’t scale the StatefulSet down to zero pods, we won’t lose the MySQL primary node.
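To make this concrete, here is a sketch of how an entrypoint or init script inside such a pod might derive its role from the ordinal in its stable hostname. The "mysql" name and the primary/replica logic are illustrative assumptions, not part of this article's manifests:

```shell
# Sketch: derive a pod's role from its StatefulSet ordinal suffix.
hostname="mysql-2"            # in a real pod this would be: hostname="$(hostname)"
ordinal="${hostname##*-}"     # strip everything up to the last "-"
if [ "$ordinal" = "0" ]; then
  role="primary"              # mysql-0 acts as the primary
else
  role="replica"              # mysql-1, mysql-2, ... act as read replicas
fi
echo "$hostname runs as $role"
```

Because the hostnames are stable and predictable, this role assignment survives pod restarts.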
Stable Storage
Stateful applications often require data storage that persists beyond the lifecycle of an individual pod. In Kubernetes, this is accomplished with persistent volumes (PVs). When a pod is replaced, its storage volume can be reattached, and the unique, stable naming of pods makes it easy to match storage volumes to their replacement pods.
Rolling Updates
StatefulSets support rolling updates, allowing you to update pods one at a time. Rolling updates proceed in the same order as pod termination (from the largest ordinal to the smallest). This “reverse” order was designed to accommodate the common convention that the first pod (pod 0) is the “primary” node.
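As a sketch, update behavior is tuned via the StatefulSet's `.spec.updateStrategy`; the `partition` value below is an illustrative choice, staging an update so that only pods with an ordinal at or above the partition receive the new template:

```yaml
# Sketch: staged rolling update. With partition: 1, only pods with
# ordinal >= 1 are updated; pod 0 keeps the old revision until the
# partition is lowered to 0.
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 1
```

This is a common way to canary a change on the replicas before touching the primary.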
Headless Service
A StatefulSet can use a Headless Service to control the domain of its Pods. The domain managed by this Service takes the form:
$(service name).$(namespace).svc.cluster.local
Each pod belonging to the service also gets a consistent DNS entry of the form:
$(pod-name).$(service name).$(namespace).svc.cluster.local
By the way, a headless service is created by explicitly specifying “None” for the cluster IP address (.spec.clusterIP).
With headless Services, a cluster IP is not allocated, kube-proxy does not handle these Services, and there is no load balancing or proxying done by the platform for them. This will become clearer when we create a headless service later in the example.
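For reference, a minimal manifest for such a headless Service might look like the sketch below (it matches the `serviceName: "httpd"` and `app: httpd` label used in this article's StatefulSet; later in the example we create the equivalent Service with kubectl instead):

```yaml
# headless-service.yaml (sketch)
apiVersion: v1
kind: Service
metadata:
  name: httpd
spec:
  clusterIP: None        # "None" makes the Service headless
  selector:
    app: httpd
  ports:
    - port: 80
      name: web
```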
Playing Around with StatefulSets
In this section we will walk through creating a StatefulSet running Apache HTTP Server. A StatefulSet is not something you would normally use for a simple web server, but it is easy to set up and lets us display different pages from different pods to demonstrate stable storage.
I’m using minikube with podman on my laptop, which creates “hostPath”-style persistent volumes (i.e., data stored in a directory on the host VM). In a real Kubernetes production environment, network-attached storage volumes are normally used.
We create a StatefulSet with 1 initial replica and a volume claim of 512Mi. We will mount the volume on the default www directory, /usr/local/apache2/htdocs/:
kubectl create -f statefulset.yaml
# statefulset.yaml
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: "httpd"
  replicas: 1
  selector:
    matchLabels:
      app: httpd
  template:
    metadata:
      labels:
        app: httpd
    spec:
      containers:
        - name: httpd
          image: docker.io/library/httpd:2-alpine
          ports:
            - containerPort: 80
              name: web
          volumeMounts:
            - name: www
              mountPath: /usr/local/apache2/htdocs/
  volumeClaimTemplates:
    - metadata:
        name: www
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 512Mi
We have 1 pod created from the StatefulSet:
kubectl get all
NAME READY STATUS RESTARTS AGE
pod/web-0 1/1 Running 0 3h10m
NAME READY AGE
statefulset.apps/web 1/1 3h38m
Let’s verify the volumes that were created:
kubectl get persistentvolumeclaims
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
www-web-0 Bound pvc-f0adab17-4bdf-4070-9a50-c9a9120fab33 512Mi RWO standard 6m34s
kubectl get persistentvolumes
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-f0adab17-4bdf-4070-9a50-c9a9120fab33 512Mi RWO Delete Bound sandbox/www-web-0 standard 6m1s
Let’s scale up to 2 pods:
kubectl scale statefulset.apps/web --replicas 2
statefulset.apps/web scaled
kubectl get pods
NAME READY STATUS RESTARTS AGE
web-0 1/1 Running 0 13m
web-1 1/1 Running 0 14s
kubectl get persistentvolumeclaims,persistentvolumes
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/www-web-0 Bound pvc-f0adab17-4bdf-4070-9a50-c9a9120fab33 512Mi RWO standard 14m
persistentvolumeclaim/www-web-1 Bound pvc-5961d007-bd46-4561-a0e1-168a9d9dbcef 512Mi RWO standard 53s
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
persistentvolume/pvc-5961d007-bd46-4561-a0e1-168a9d9dbcef 512Mi RWO Delete Bound sandbox/www-web-1 standard 53s
persistentvolume/pvc-f0adab17-4bdf-4070-9a50-c9a9120fab33 512Mi RWO Delete Bound sandbox/www-web-0 standard 14m
Now let’s create a headless service, so that each pod gets a DNS subdomain associated with the service name:
kubectl create service clusterip --clusterip='None' httpd
service/httpd created
kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
httpd ClusterIP None <none> <none> 76s
Let’s now run a test pod and install some basic DNS tools:
kubectl run --rm -it --image=alpine test -- /bin/sh
If you don't see a command prompt, try pressing enter.
# apk update && apk add busybox-extras bind-tools
Now we can verify the DNS names created by the headless service. Querying the service will return the IPs of each pod.
$ host httpd.sandbox.svc.cluster.local
httpd.sandbox.svc.cluster.local has address 10.244.0.6
httpd.sandbox.svc.cluster.local has address 10.244.0.7
We can also query the DNS name assigned to each pod:
$ host web-0.httpd.sandbox.svc.cluster.local
web-0.httpd.sandbox.svc.cluster.local has address 10.244.0.6
$ host web-1.httpd.sandbox.svc.cluster.local
web-1.httpd.sandbox.svc.cluster.local has address 10.244.0.7
Let’s now test that separate, stable storage is assigned to each pod. We will create a file containing the name of the pod:
kubectl exec web-0 -- /bin/sh -c 'echo "hello from $(hostname)" > /usr/local/apache2/htdocs/index.html'
kubectl exec web-1 -- /bin/sh -c 'echo "hello from $(hostname)" > /usr/local/apache2/htdocs/index.html'
When we port-forward to each pod, we can see the unique file served by each one:
kubectl port-forward pod/web-0 8080:80
kubectl port-forward pod/web-1 8081:80
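With those two port-forwards running (in separate terminals), fetching each local port should return the pod-specific page written above; a sketch:

```shell
# Assumes the port-forwards above are active. Each local port maps to one
# pod, and each pod serves index.html from its own persistent volume, so
# the two responses differ.
curl -s http://localhost:8080/   # page written on web-0
curl -s http://localhost:8081/   # page written on web-1
```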
Finally, let’s scale down the StatefulSet to zero, so that all pods are stopped:
kubectl scale statefulset.apps/web --replicas 0
kubectl get pods
No resources found in sandbox namespace.
However, our volumes are still there:
kubectl get persistentvolumeclaims,persistentvolumes
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/www-web-0 Bound pvc-f0adab17-4bdf-4070-9a50-c9a9120fab33 512Mi RWO standard 99m
persistentvolumeclaim/www-web-1 Bound pvc-5961d007-bd46-4561-a0e1-168a9d9dbcef 512Mi RWO standard 86m
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
persistentvolume/pvc-5961d007-bd46-4561-a0e1-168a9d9dbcef 512Mi RWO Delete Bound sandbox/www-web-1 standard 86m
persistentvolume/pvc-f0adab17-4bdf-4070-9a50-c9a9120fab33 512Mi RWO Delete Bound sandbox/www-web-0 standard 99m
We’ll scale the StatefulSet back to 2 pods:
kubectl scale statefulset.apps/web --replicas 2
statefulset.apps/web scaled
Our Apache Web server pods should have the same volumes reattached:
kubectl describe pod/web-0
...
Mounts:
/usr/local/apache2/htdocs/ from www (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-d7r4l (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
www:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: www-web-0
ReadOnly: false
...
kubectl describe pod/web-1
...
Mounts:
/usr/local/apache2/htdocs/ from www (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mgnxw (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
www:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: www-web-1
ReadOnly: false
...
Summary
This was a quick introduction to StatefulSets. I hope it’s a bit clearer why they are useful and when you should use them. As mentioned previously, if you were running StatefulSets in a production Kubernetes cluster with important data, you would use a persistent storage backend such as Persistent Disk on Google’s GKE or EBS on AWS’s EKS, rather than local disk on the Kubernetes worker node.