If you search for information about setting CPU requests and limits for containers, you might end up becoming even more confused from all the conflicting advice on the topic. This article is my attempt to understand how CPU requests and limits work.
In Kubernetes, CPU behaves differently from memory. CPU is a compressible resource: you can allocate more than is physically available, and when the CPU is fully used the kernel throttles processes rather than killing them.
Memory, on the other hand, is incompressible and strictly enforced: a container that exceeds its memory limit is terminated rather than slowed down. This distinction makes understanding CPU requests and limits a bit more complex.
In this article, I’ll provide a brief overview of CPU requests and limits, explain Linux cgroups, and delve into how CPU throttling can occur when you define CPU limits.
Reasons Why CPU Requests and Limits Exist
Before we get started, it’s worth reviewing why CPU limits and requests exist in the first place.
Scheduling Pods
The primary purpose of CPU requests is to help the Kubernetes scheduler distribute pods fairly across the nodes in the cluster. To do this, the scheduler must be given “hints” about how much CPU each pod is expected to consume. If CPU requests are not provided, multiple pods could end up competing for CPU on the same node.
On the other hand, requesting more CPU than is actually used results in node underutilization. Even worse, it can lead to node over-provisioning – where more nodes than needed are added to the cluster (when autoscaling is used to manage nodes).
Furthermore, if you set a CPU request higher than any node in the cluster can provide, the pod will be stuck in the “Pending” state. Dynamic node autoscaling solutions such as Karpenter can launch cloud nodes that match the CPU requested.
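For example, the Events section of kubectl describe pod for such a pod will typically contain a FailedScheduling message along these lines (the pod name and node count here are hypothetical, and the exact wording varies by Kubernetes version):
kubectl describe pod my-app
...
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  15s   default-scheduler  0/3 nodes are available: 3 Insufficient cpu.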
You are ultimately responsible for setting CPU requests that meet the minimum CPU requirements of your application while avoiding excessive over-provisioning.
Resource Isolation
Kubernetes provides CPU limits to “limit” the amount of CPU a pod has access to. This can help prevent a single pod from monopolizing all of the CPU on a node. However, before implementing CPU limits, it is crucial to understand how this mechanism works. CPU throttling and increased application latency can occur if limits are incorrectly set (more on that later).
Efficient Cluster Utilization

When CPU and memory requests and limits are set properly, the overall operating efficiency of the cluster is optimized. There is very little idle capacity and no pods are fighting for resources.
Checking CPU Utilization on a Node
We can see usable and allocated resources with kubectl describe node. Here is the relevant output from a node in a production cluster:
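(A condensed sketch of the relevant lines; pod names and non-CPU resources are omitted, and the figures are the ones discussed in the takeaways below.)
Capacity:
  cpu:                16
  ...
Allocatable:
  cpu:                15600m
  ...
Non-terminated Pods:  (11 in total)
  ...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource   Requests       Limits
  --------   --------       ------
  cpu        12935m (82%)   ...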

Some takeaways:
- In the Capacity section, we see this node has 16 CPUs available (16 * 1000 millicores/CPU = 16,000 millicores) **
- Allocatable CPU is 15600 m (millicores). Allocatable is the actual amount of CPU resources that can be allocated to pods running on the node – after accounting for system-level processes. Notice that 400 m is reserved for the system in this example.
- There are 11 running pods, with different configurations for CPU requests and limits. One pod is requesting 12370 m of CPU (79%)
- The total Allocated Resources for CPU is the sum of all requests: 12370m + 200m + 10m + 100m + 100m + 30m + 25m + 100m = 12935m
** A millicore or millicpu is one thousandth of a CPU
Under “Allocated resources”, we see the notice:
Total limits may be over 100 percent, i.e., overcommitted
This means that the combined resource limits of all pods can be greater than the available capacity for a given resource on the node.
For example, on a node with 8 CPUs, we could have 10 pods running on it, each with a CPU limit of 1, for a total limit of 10 CPUs. Kubernetes will allow this, the idea being that different workloads can “burst” up to this limit. However, if all pods begin to use their max limit of 1 CPU at the same time, it could result in node instability or pod evictions.
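Spelled out, the arithmetic for that example is:
total CPU limits   = 10 pods * 1000m = 10000m
node capacity      = 8 CPUs = 8000m
limits vs capacity = 10000m / 8000m = 125% (overcommitted)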
Resource overcommitment is a strategy to optimize cluster utilization at the expense of stability, and should be used with care.
Understanding CPU Cgroups
Kubernetes uses the Linux kernel control groups (cgroups) feature to manage pod resource constraints.
CPU metrics for a cgroup can be found under /sys/fs/cgroup/cpu,cpuacct/ (this is the cgroup v1 layout). On Kubernetes nodes, there is a dedicated cgroup for pods, in addition to the other system cgroups:
/sys/fs/cgroup/cpu,cpuacct/
├── kubepods.slice
├── runtime.slice
├── system.slice
└── user.slice
The CPU cgroups are modeled as a hierarchy, with each node dividing its CPU shares among its child nodes.
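Based on the directory listings that follow, the Kubernetes-managed part of that hierarchy looks roughly like this (pod UIDs and container IDs abbreviated):
/sys/fs/cgroup/cpu,cpuacct/                  (root: all CPU shares on the node)
└── kubepods.slice                           (all pod workloads)
    ├── kubepods-besteffort.slice            (BestEffort QoS pods)
    │   └── kubepods-besteffort-pod<uid>.slice
    │       └── cri-containerd-<id>.scope    (one scope per container)
    └── kubepods-burstable.slice             (Burstable QoS pods)
        └── kubepods-burstable-pod<uid>.slice
            └── cri-containerd-<id>.scope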

Let’s explore the files under /sys/fs/cgroup/cpu,cpuacct/:
ls -1 /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/
cgroup.clone_children
cgroup.procs
cpuacct.stat
cpuacct.usage
cpuacct.usage_all
cpuacct.usage_percpu
cpuacct.usage_percpu_sys
cpuacct.usage_percpu_user
cpuacct.usage_sys
cpuacct.usage_user
cpu.cfs_period_us
cpu.cfs_quota_us
cpu.rt_period_us
cpu.rt_runtime_us
cpu.shares
cpu.stat
kubepods-besteffort.slice
kubepods-burstable.slice
ls -1 /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-besteffort.slice/
cgroup.clone_children
cgroup.procs
cpuacct.stat
cpuacct.usage
cpuacct.usage_all
cpuacct.usage_percpu
cpuacct.usage_percpu_sys
cpuacct.usage_percpu_user
cpuacct.usage_sys
cpuacct.usage_user
cpu.cfs_period_us
cpu.cfs_quota_us
cpu.rt_period_us
cpu.rt_runtime_us
cpu.shares
cpu.stat
kubepods-besteffort-pod23c6b220_7509_4962_aea1_8fdd38bda6cb.slice
kubepods-besteffort-pod4db42943_96e5_4994_8386_fde43c39a27d.slice
kubepods-besteffort-pod9526f82b_da65_4f1a_bdf8_2ec539abfb26.slice
notify_on_release
tasks
cat /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-burstable.slice/cpu.shares
13245
cat /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod0231e885_14b7_4d0f_aeae_7ab16a92fd9c.slice/cri-containerd-02cbda6318520d18acc91b17d31371c3bd222fefdea390528fa2f1385a5ccb36.scope/cpu.shares
2
cat /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podfdbe3961_88ab_41cd_8935_f3afcf975c00.slice/cpu.shares
12666
A few observations:
- there is a dedicated cgroup for Burstable QoS pods (at least one container in the pod has a memory or CPU request or limit)
- there is a dedicated cgroup for BestEffort QoS pods (no container in the pod has any requests or limits set)
- cpu.shares – this is where CPU is allocated in “shares” ( 1 core = 1024 shares ); a worked example follows below
- cpu.cfs_quota_us – the amount of CPU time that a process can consume over a specific time period
- cpu.cfs_period_us – the time window in which CPU quota is enforced, measured in microseconds (default 100,000)
- cpu.stat – contains throttling metrics
When requests and limits are specified for pods, the container runtime (e.g., Docker, containerd) configures the corresponding cgroup values for each pod and container: cpu.shares is derived from the CPU request, and cpu.cfs_quota_us (covered next) from the CPU limit.
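As a rough cross-check against the values we read above: CPU requests map to cpu.shares at 1024 shares per core, and a container with no CPU request gets the minimum of 2 shares. The numbers line up approximately as follows:
12370m pod request            → 12.37 * 1024 ≈ 12666 shares  (the pod slice value above)
container with no CPU request → 2 shares (the minimum)
kubepods-burstable.slice      → 13245 shares ≈ 12.935 * 1024, which lines up with the node’s 12935m of total CPU requests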
Understanding CPU Throttling

When CPU limits are used, the kernel uses the Completely Fair Scheduler (CFS) cgroup bandwidth control to enforce those limits.
When containers have CPU limits defined, those limits are converted to a cgroup CPU “quota”: the amount of CPU time a process may consume within a defined time period. When a process has used up its allotted quota for the given period, it gets throttled until the next period.
Let’s try to further understand the effect of CPU throttling and the calculations involved.
When resources.limits.cpu is set for a container, it gets translated to the cgroup cpu.cfs_quota_us using the formula:
cpu.cfs_quota_us = (cpu limit in CPU units) * cpu.cfs_period_us
So if we set resources.limits.cpu to 0.5 (half a CPU core), the value of cpu.cfs_quota_us will be:
cpu.cfs_quota_us = 0.5 * 100000 = 50000 microseconds (50 milliseconds)
If you have a CPU limit specified in millicores, simply divide by 1000 to get CPU units. For example, 300m would be 0.3 CPU units.
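Plugging that in:
cpu.cfs_quota_us = 0.3 * 100000 = 30000 microseconds (30 milliseconds)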
But how does the CFS bandwidth control determine whether the pod is exceeding its quota?
Going back to our example, the container is allowed to use up to 50 milliseconds of CPU time within each 100 millisecond window (50% of the cfs_period_us window). This is its CPU quota. If it tries to use more CPU time than that, it will be throttled.
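Throttling is also visible directly in the cgroup filesystem: the cpu.stat file for a container’s cgroup contains counters like these (the path is abbreviated and the numbers are illustrative):
cat /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-burstable.slice/<pod-slice>/<container-scope>/cpu.stat
nr_periods 1620
nr_throttled 310
throttled_time 23400000000
nr_periods is the number of enforcement periods that have elapsed, nr_throttled is how many of those periods the cgroup ran out of quota, and throttled_time is the total time spent throttled, in nanoseconds.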
Next, we will examine the effects of CPU throttling on an application. All units of time will be in milliseconds.
Example scenario: We set resources.limits.cpu = "250m" (millicores) for an application container. The application requires 300 milliseconds of CPU time to complete a single request. How will the CPU limit affect the time it takes to complete the request?
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - name: my-app
    image: app:latest
    resources:
      limits:
        cpu: "250m"
      requests:
        cpu: "100m"
...
Due to the CPU limit, the application will get a CPU quota of 25 ms. That is, 25 ms of CPU time for every 100 ms period (100,000 us = 100 ms). Since our example application needs 300 ms of CPU time to complete a single request, it will get throttled.
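If you exec onto the node running this pod, you can confirm the quota that was configured in the pod’s cgroup (the pod slice name here is abbreviated):
cat /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod<uid>.slice/cpu.cfs_quota_us
25000
cat /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod<uid>.slice/cpu.cfs_period_us
100000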
The application will be allowed to run for 25 ms, then its CPU access will be “paused” for 75 ms. This results in a longer overall processing time for the request … but how much longer, you ask?
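A rough timeline, where each line is one 100 ms CFS period:
period  1 : 25 ms running | 75 ms throttled    (CPU time used:  25 ms)
period  2 : 25 ms running | 75 ms throttled    (CPU time used:  50 ms)
...
period 11 : 25 ms running | 75 ms throttled    (CPU time used: 275 ms)
period 12 : 25 ms running, request completes   (CPU time used: 300 ms)
total wall-clock time = 11 * 100 ms + 25 ms = 1125 ms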

Our request now takes a total of 1125 ms. This is more than three times as long as it would take without any CPU limits!
Calculations:
- The request spans 12 quota periods of cpu.cfs_period_us ( 300 ms / 25 ms = 12 )
- The application was throttled 11 times (it finished during the 12th period, so the last run was not throttled)
- The application spent 825 ms throttled while processing this request ( 11 * 75 ms )
- Total request time is 1125 ms ( 11 full periods * 100 ms + 25 ms in the final period )
CPU throttling can have a serious and noticeable effect on application latency and should be avoided.
Furthermore, if our example application was multi-threaded, CPU time used by each thread counts toward the quota. This means if our application runs in 10 parallel threads, it will exhaust its quota in only 2.5 ms (25 ms / 10 ). This is something to keep in mind when using CPU limits with multi-threaded applications.
Also, remember that a pod’s resource request/limit is the sum of the resource requests/limits for each container in the Pod. This should also be factored in when setting requests/limits.
Should I Stop Setting CPU Limits for Containers?
It would be misleading to say there is never a good reason to set CPU limits. Limits were designed to prevent a pod from monopolizing resources on a node, causing resource starvation of other pods on the same node.
What I feel is not made clear enough in the documentation is that pods are guaranteed to get the amount of CPU they request. And pods without limits can use excess CPU when available.
There is even a growing school of thought recommending not to set CPU limits at all:
- https://home.robusta.dev/blog/stop-using-cpu-limits
- https://twitter.com/thockin/status/1134193838841401345
- https://medium.com/omio-engineering/cpu-limits-and-aggressive-throttling-in-kubernetes-c5b20bd8a718
Given that many customer-facing applications are sensitive to latency, it is your responsibility to decide whether setting CPU limits is worth the risk of the application being throttled when it tries to use more CPU than its limit allows.
However, I believe avoiding CPU limits should not be a universal rule for all Kubernetes clusters. As with most things, it depends.
My recommendation would be to monitor your applications carefully, especially new applications with unknown resource utilization patterns, then adjust requests (and limits if set) based on observed CPU usage patterns.
Open source monitoring tools such as Prometheus can expose CPU related pod metrics, which can be used in combination with Grafana for visualization and Alertmanager to trigger alarms.
Some example PromQL queries related to CPU (metric and label names may vary depending on your Prometheus configuration):
# pods in a namespace that are currently being CPU throttled (per-pod rate of throttled time)
sum(rate(container_cpu_cfs_throttled_seconds_total{namespace="my-namespace"}[5m])) by (pod) > 0
# top 10 pods by total number of throttled CFS periods
topk(10, sum(container_cpu_cfs_throttled_periods_total) by (pod, namespace))
# CPU utilization by a container as a percentage of the CPU requested
100 * sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{namespace="my-namespace"}) by (container) / sum(kube_pod_container_resource_requests{job="kube-state-metrics", namespace="my-namespace", resource="cpu"}) by (container)


Closing Thoughts
Understanding CPU requests/limits in Kubernetes can be tricky. There have also been kernel bugs in the CFS bandwidth control that caused unexpected CPU throttling, so make sure you are running an up-to-date Linux kernel.
It is vital to monitor the CPU utilization across the cluster. If you see pods using unusually high amounts of CPU, consider Horizontal Pod Autoscaling (HPA), refactoring the application, or providing a worker node with sufficient CPU.
Use CPU requests to ensure pods receive the minimum CPU they are expected to use (they can use more than requested).
Use CPU limits carefully, if required, but only once you understand how limits and throttling work.
Some other good articles I’ve found on the subject:
- https://medium.com/@betz.mark/understanding-resource-limits-in-kubernetes-cpu-time-9eff74d3161b
- https://john-tucker.medium.com/kubernetes-cpu-resource-requests-at-runtime-c4df668d1c5c
- https://www.linuxjournal.com/content/everything-you-need-know-about-linux-containers-part-i-linux-control-groups-and-process