System administrators often use load averages to quickly assess whether a server is overloaded. These values are easily accessible in tools like uptime and top. However, load averages alone do not provide enough detail about which resource—CPU or I/O—may be causing the bottleneck, and they lack granularity in time and cgroup-specific data.
According to man uptime:
System load averages is the average number of processes that are either in a runnable or uninterruptable state. A process in a runnable state is either using the CPU or waiting to use the CPU. A process in uninterruptable state is waiting for some I/O access, eg waiting for disk. The averages are taken over the three time intervals. Load averages are not normalized for the number of CPUs in a system, so a load average of 1 means a single CPU system is loaded all the time while on a 4 CPU system it means it was idle 75% of the time.
As one can see from the description, high load averages can mean CPU contention but can also mean I/O contention and it is not possible to distinguish the two from this metric alone. Additionally, load averages represent system-wide metrics, meaning they are not cgroup-aware, and they only provide data for specific time windows (1, 5, and 15 minutes).
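The normalization described in the man page excerpt can be sketched in a few lines of Python. This is a minimal illustration, not part of uptime itself: the helper names normalized_load and current_normalized_load are hypothetical, and the live reading relies on os.getloadavg(), which is only available on Unix-like systems.

```python
import os


def normalized_load(load_average: float, cpu_count: int) -> float:
    """Return the load average per CPU; 1.0 means every CPU was busy on average."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_average / cpu_count


def current_normalized_load() -> tuple:
    """Normalize the live 1, 5 and 15 minute load averages (Unix only)."""
    cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    return tuple(normalized_load(value, cpus) for value in os.getloadavg())


# The man page example: a load average of 1 on a 1 CPU and on a 4 CPU system.
print(normalized_load(1.0, 1))  # 1.0  -> loaded all the time
print(normalized_load(1.0, 4))  # 0.25 -> idle 75% of the time
```

Even after normalization, the number still mixes runnable and uninterruptible tasks, so it cannot say whether the CPU or I/O is the bottleneck.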
Enter Pressure Stall Information
To address these limitations, Linux 4.20 (released in December 2018) introduced Pressure Stall Information (PSI). PSI provides a more precise view of contention by measuring how much time tasks spend stalled waiting for CPU, memory, or I/O.
PSI is available to userspace on the proc filesystem, under /proc/pressure/<resource>, such as /proc/pressure/cpu:
$ cat /proc/pressure/cpu
some avg10=0.00 avg60=0.00 avg300=0.00 total=3490428
full avg10=0.00 avg60=0.00 avg300=0.00 total=0
The values avg10, avg60, and avg300 give the percentage of time tasks were stalled, averaged over the past 10 seconds, 60 seconds, and 300 seconds, respectively.
- The “some” line indicates the share of time in which at least some tasks are stalled on a given resource.
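- The “full” line indicates the share of time in which all non-idle tasks are stalled on a given resource simultaneously. In this state, actual CPU time is going to waste because of the contention. For the CPU resource, “full” is undefined at the system level, and kernels since 5.13 report it as zeros for backward compatibility.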
In addition to the ratios, the last column, total, gives the cumulative stall time in microseconds.
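To make the file format concrete, here is a minimal Python sketch that parses PSI output into a dictionary and derives a stall percentage over a custom interval from two readings of the total column. The function names parse_psi and stall_percent are hypothetical, and the sample text is the output shown above rather than a live reading.

```python
def parse_psi(text: str) -> dict:
    """Parse PSI text into {"some": {"avg10": 0.0, ..., "total": 3490428}, ...}."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]
        entry = {}
        for pair in pairs:
            key, _, value = pair.partition("=")
            # total is an integer number of microseconds; avg* are percentages
            entry[key] = int(value) if key == "total" else float(value)
        result[kind] = entry
    return result


def stall_percent(total_before_us: int, total_after_us: int, elapsed_seconds: float) -> float:
    """Share of an interval spent stalled, from two readings of total."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return 100.0 * (total_after_us - total_before_us) / (elapsed_seconds * 1_000_000)


SAMPLE = """some avg10=0.00 avg60=0.00 avg300=0.00 total=3490428
full avg10=0.00 avg60=0.00 avg300=0.00 total=0"""

psi = parse_psi(SAMPLE)
print(psi["some"]["total"])  # 3490428
# Hypothetical readings: 0.5 s of additional stall time over a 5 s interval.
print(stall_percent(1_000_000, 1_500_000, 5.0))  # 10.0
```

On a live system, the same parser could be fed the contents of /proc/pressure/cpu, /proc/pressure/memory, or /proc/pressure/io. Sampling total twice is one way to get a finer time resolution than the built-in 10 second average.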
Lastly, PSI information is also available for individual cgroups under their own cgroup path. With cgroup v2, each cgroup directory contains cpu.pressure, memory.pressure, and io.pressure files in the same format. This is extremely valuable, because limits set on specific groups may cause contention on their resources.
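As a rough sketch of where the per-cgroup files live, the snippet below builds the path to a cgroup's pressure file. It assumes cgroup v2 mounted at /sys/fs/cgroup, which is common but not guaranteed. The helper name cgroup_pressure_path and the cgroup system.slice/example.service are hypothetical.

```python
from pathlib import PurePosixPath

# Assumption: cgroup v2 is mounted at the usual location.
CGROUP_ROOT = PurePosixPath("/sys/fs/cgroup")

RESOURCES = ("cpu", "memory", "io")


def cgroup_pressure_path(cgroup: str, resource: str) -> PurePosixPath:
    """Return the path of the <resource>.pressure file for a cgroup."""
    if resource not in RESOURCES:
        raise ValueError(f"resource must be one of {RESOURCES}")
    # Strip slashes so both "a/b" and "/a/b/" resolve under CGROUP_ROOT.
    return CGROUP_ROOT / cgroup.strip("/") / f"{resource}.pressure"


# Hypothetical cgroup name, used only for illustration.
print(cgroup_pressure_path("system.slice/example.service", "memory"))
# /sys/fs/cgroup/system.slice/example.service/memory.pressure
```

The file at that path has the same some/full layout as /proc/pressure/<resource>, so it can be read with cat or parsed the same way.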
In summary, PSI provides deeper insights into system performance by offering resource-specific and cgroup-aware data. By leveraging PSI, system administrators can better pinpoint and resolve issues related to resource contention, improving the overall efficiency of their Linux environments.