I’ve used plenty of performance monitoring tools and systems over the years. When you need an answer right now, tools like btop and glances are great for quick overviews. Historical data is fairly easy to pick through with sysstat.
However, when you want a comprehensive view of system performance over time, especially with GPU metrics for machine learning workloads, Performance Co-Pilot (PCP) is an excellent choice. It has some handy integrations with Cockpit for web-based monitoring, but I prefer using the command line tools directly.
This post explains how to set up PCP on Fedora and enable some very basic GPU monitoring for both NVIDIA and AMD GPUs.
Installing Performance Co-Pilot
Install the core packages and command line tools:
sudo dnf install pcp pcp-system-tools
Enable and start the PCP services:
sudo systemctl enable --now pmcd pmlogger
sudo systemctl status pmcd
These two services work together like a team:
- pmcd (Performance Metrics Collection Daemon) gathers real-time metrics from various sources on your system when you request them.
- pmlogger records these metrics to log files for historical analysis.
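If you'd rather script that check than read the status output, here's a minimal sketch. The function name `inactive_units` and the `SYSTEMCTL` override are my own inventions for illustration (the override only exists so the function can be dry-run); the one real call is `systemctl is-active --quiet`, which exits non-zero when a unit isn't running.

```shell
# Print the name of every unit that is NOT currently active.
# SYSTEMCTL is a hypothetical override used for dry runs; it defaults to systemctl.
inactive_units() {
    for unit in "$@"; do
        # is-active --quiet prints nothing and only sets the exit status
        "${SYSTEMCTL:-systemctl}" is-active --quiet "$unit" || printf '%s\n' "$unit"
    done
}

# Example:
#   inactive_units pmcd pmlogger
```

No output means both daemons are up; any name printed is a service to look at with `systemctl status`.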
You can verify that the services are working as expected:
# Check available metrics
pminfo | head -20
# View current CPU utilization
pmval kernel.all.cpu.user
# Show a vmstat-style system summary (load, memory, swap, I/O, CPU) for 5 samples
pmstat -s 5
Adding GPU metrics collection
I do a lot of LLM work locally and I’d like to keep track of my GPU usage over time. Fortunately, PCP supports popular GPUs through something called a PMDA (Performance Metrics Domain Agent). These are packaged in Fedora, but installing the package isn’t enough on its own: each PMDA also has to be registered with pmcd by running the Install script in its directory.
NVIDIA GPUs
[!NOTE] Unverified instructions: I only have an AMD GPU, but I pulled this NVIDIA information from various places on the internet. Please let me know if you find any issues and I’ll update the post!
For NVIDIA GPUs, ensure you have the NVIDIA drivers and nvidia-ml library:
# Check if nvidia-smi works
nvidia-smi
# Install the NVIDIA management library if needed
sudo dnf install nvidia-driver-cuda-libs
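Now install the NVIDIA PMDA. If /var/lib/pcp/pmdas/nvidia doesn’t exist, the PMDA package (pcp-pmda-nvidia-gpu on Fedora, as far as I know) is probably missing and needs to be installed with dnf first: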
cd /var/lib/pcp/pmdas/nvidia
sudo ./Install
The installer will prompt you for configuration options. Accept the defaults unless you have specific requirements.
[!NOTE] Thanks to Will Cohen for helping me get these NVIDIA steps corrected! 👏
After installation, verify GPU metrics are available:
# List all NVIDIA metrics
pminfo nvidia
# Check GPU utilization
pmval nvidia.gpuactive
# Monitor GPU memory usage
pmval nvidia.memused
AMD GPUs
For AMD GPUs, PCP provides the amdgpu PMDA that works with the ROCm stack:
# Ensure rocm-smi is installed and working
rocm-smi
# Install the AMD GPU PMDA package
sudo dnf install pcp-pmda-amdgpu
# Register the PMDA with pmcd
cd /var/lib/pcp/pmdas/amdgpu
sudo ./Install
After installation, verify AMD GPU metrics:
# List all AMD GPU metrics
pminfo amdgpu
# Check GPU utilization
pmval amdgpu.gpu.load
# Monitor GPU memory usage
pmval amdgpu.memory.used
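To confirm in one pass that the metrics you care about actually showed up, something like the sketch below can help. `missing_metrics` is a name I made up; it only assumes that `pminfo amdgpu` (or `pminfo nvidia`) prints one metric name per line.

```shell
# Print every expected metric that is absent from the list on stdin.
# stdin: one metric name per line, e.g. the output of `pminfo amdgpu`
# args:  the metric names you expect to find
missing_metrics() {
    available=$(cat)
    for metric in "$@"; do
        # -x matches the whole line, -F treats the dots literally
        printf '%s\n' "$available" | grep -qxF "$metric" || printf '%s\n' "$metric"
    done
}

# Example (needs PCP and the amdgpu PMDA installed):
#   pminfo amdgpu | missing_metrics amdgpu.gpu.load amdgpu.memory.used
```

An empty result means every metric you listed is being exported.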
Querying performance data
There are lots of handy tools for querying PCP data depending on whether you need information about something happening now or want to analyze historical trends.
Real-time monitoring with pmrep
The pmrep tool provides formatted output perfect for dashboards or scripts.
It’s great for situations where you need to see what’s happening right now.
It’s much like iostat or vmstat from the sysstat package, but you get a lot more flexibility.
# System overview with 1-second updates
pmrep --space-scale=MB -t 1 kernel.all.load kernel.all.cpu.user mem.util.used
# GPU metrics for LLM monitoring (NVIDIA)
pmrep --space-scale=MB -t 1 nvidia.gpuactive nvidia.memused nvidia.temperature
# GPU metrics for LLM monitoring (AMD)
pmrep --space-scale=MB -t 1 amdgpu.gpu.load amdgpu.memory.used amdgpu.gpu.temperature
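pmrep can also write CSV, which is handy when you want to capture a single LLM run and graph it later. Here's a rough sketch: `record_gpu` and the `PMREP` override are hypothetical names of mine (the override just lets you dry-run with `echo`), while `-o csv`, `-F`, `-t`, and `-s` are regular pmrep options for output format, output file, interval, and sample count.

```shell
# Sample the given metrics once a second and write them to a CSV file.
# $1: output file, $2: number of samples, remaining args: metric names
# PMREP is a hypothetical override for dry runs; it defaults to pmrep.
record_gpu() {
    outfile=$1
    samples=$2
    shift 2
    "${PMREP:-pmrep}" -o csv -F "$outfile" -t 1 -s "$samples" "$@"
}

# Example (five minutes of AMD GPU data):
#   record_gpu gpu.csv 300 amdgpu.gpu.load amdgpu.memory.used
```

Swap in the nvidia.* metric names from above if that's the card you have.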
Historical analysis with pmlogsummary
If you’re used to running sar commands from the sysstat package, you’ll find pmlogsummary very familiar.
Again, you can do a lot more with pmlogsummary than with sar, but the basic concepts are similar.
# Summarize yesterday's GPU utilization (NVIDIA)
pmlogsummary -S @yesterday -T @today /var/log/pcp/pmlogger/$(hostname)/$(date -d yesterday +%Y%m%d) nvidia.gpuactive
# Summarize yesterday's GPU utilization (AMD)
pmlogsummary -S @yesterday -T @today /var/log/pcp/pmlogger/$(hostname)/$(date -d yesterday +%Y%m%d) amdgpu.gpu.load
# Find peak memory usage over the last hour (-M adds the maximum to the default average)
pmlogsummary -M -S -1hour /var/log/pcp/pmlogger/$(hostname)/$(date +%Y%m%d) mem.util.used
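Those archive paths get repetitive, so a small helper can build them. This is only a sketch: `pcp_archive_for` and the `PCP_HOST` override are names I invented, it relies on GNU `date -d`, and it assumes the same /var/log/pcp/pmlogger/HOSTNAME/YYYYMMDD layout used in the commands above. If your archives are named differently, adjust the format string.

```shell
# Print the archive path for a given day (default: yesterday).
# $1: anything GNU date -d understands, e.g. "yesterday" or "2024-01-15"
# PCP_HOST is a hypothetical override; it defaults to the local hostname.
pcp_archive_for() {
    day=${1:-yesterday}
    host=${PCP_HOST:-$(hostname)}
    printf '/var/log/pcp/pmlogger/%s/%s\n' "$host" "$(date -d "$day" +%Y%m%d)"
}

# Example:
#   pmlogsummary "$(pcp_archive_for yesterday)" amdgpu.gpu.load
```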
Troubleshooting tips
If GPU metrics aren’t showing up:
# Check if the PMDA is properly installed
pminfo -f pmcd.agent | grep -E "amdgpu|nvidia"
# Restart PMCD to reload PMDAs
sudo systemctl restart pmcd
# Check the pmcd journal for errors (each PMDA also writes its own log under /var/log/pcp/pmcd/)
sudo journalctl -u pmcd -n 50
# Verify GPU drivers are working
rocm-smi # for AMD
nvidia-smi # for NVIDIA
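If you want a yes/no answer for one agent instead of eyeballing the grep output, a tiny parser like this can pull the value out. `agent_status` is a hypothetical helper, and it assumes `pminfo -f` prints instance lines shaped like `inst [122 or "amdgpu"] value 0`, which is how I remember the output looking; check it against your own system. As I understand it, 0 means the agent is healthy, and `pminfo -T pmcd.agent.status` explains the other values.

```shell
# Print the status value pmcd reports for one agent.
# stdin: output of `pminfo -f pmcd.agent.status`
# $1:    agent name, e.g. amdgpu or nvidia
agent_status() {
    # Instance lines split into: inst [N or "name"] value V
    awk -v name="\"$1\"]" '$1 == "inst" && $4 == name { print $6 }'
}

# Example:
#   pminfo -f pmcd.agent.status | agent_status amdgpu
```

No output at all means pmcd doesn't know about that agent, which usually points back at the Install step.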
Further reading
- Performance Co-Pilot documentation - Official PCP documentation and quick reference guides
- Red Hat’s PCP guide - Enterprise deployment patterns and best practices
- PMAPI - Performance metrics API
