Ultimate Monitoring Setup: Prometheus, Grafana, Node Exporter and NVIDIA GPU Utilization

In this step-by-step guide, we will create a full monitoring solution for both system metrics and NVIDIA GPU metrics using Prometheus, Grafana, and Node Exporter. This tutorial is designed to be as simple as copying and pasting commands; the only prerequisites are Docker and, for the GPU metrics, the NVIDIA driver with the NVIDIA Container Toolkit. By the end, you’ll have a working monitoring stack visualizing GPU and system data.
Step 1: Create the Monitoring Directory
First, create a directory to organize all your monitoring-related files.
- Open your terminal and run the following commands to create the directory and move into it:
mkdir -p ~/Docker/monitoring
cd ~/Docker/monitoring
This Docker/monitoring directory will contain the necessary configuration files and the Docker Compose file for the monitoring stack.
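Once Steps 2 and 3 are complete, the directory should contain exactly two files, which you can confirm with ls (file names as used throughout this guide):
ls ~/Docker/monitoring
# expected output: docker-compose.yml  prometheus.yml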
Step 2: Create the Docker Compose File
This file defines your entire stack: Prometheus, Grafana, Node Exporter, and the NVIDIA GPU Exporter, each running in its own Docker container. It lets you collect system and GPU metrics and visualize them in Grafana.
- Create the docker-compose.yml file inside the ~/Docker/monitoring directory:
nano docker-compose.yml
- Copy and paste the following content into the docker-compose.yml file:
version: '3.8'

services:
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    restart: unless-stopped
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
    networks:
      - monitoring

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: unless-stopped
    ports:
      - "3000:3000"
    volumes:
      - grafana-storage:/var/lib/grafana
    networks:
      - monitoring

  node_exporter:
    image: prom/node-exporter:latest
    container_name: node_exporter
    ports:
      - "9100:9100"
    networks:
      - monitoring

  nvidia_smi_exporter:
    image: docker.io/utkuozdemir/nvidia_gpu_exporter:1.2.1
    container_name: nvidia_smi_exporter
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=all
    ports:
      - "9835:9835"
    volumes:
      - /usr/bin/nvidia-smi:/usr/bin/nvidia-smi
      - /usr/lib/x86_64-linux-gnu/libnvidia-ml.so:/usr/lib/x86_64-linux-gnu/libnvidia-ml.so
      - /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1:/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
    restart: unless-stopped
    networks:
      - monitoring

networks:
  monitoring:
    driver: bridge

volumes:
  prometheus-data:
  grafana-storage:
- Prometheus: Scrapes and stores metrics, and exposes them for querying (port 9090).
- Grafana: Visualizes data from Prometheus (port 3000).
- Node Exporter: Collects system metrics such as CPU and memory usage (port 9100).
- NVIDIA GPU Exporter: Collects GPU metrics such as utilization and temperature (port 9835).
Note: the nvidia_smi_exporter service uses runtime: nvidia, so the NVIDIA driver and the NVIDIA Container Toolkit must already be installed on the host.
Save and close the file (CTRL + O, ENTER, CTRL + X).
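Optionally, you can ask Docker Compose to validate the file before starting anything. This is just a sanity check using the Docker Compose v2 plugin assumed throughout this guide:
docker compose config --quiet
# prints nothing if the file is valid; otherwise it reports the syntax error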
Step 3: Configure Prometheus to Scrape GPU Metrics
Now we need to configure Prometheus to scrape both system and GPU metrics.
- In the same directory (~/Docker/monitoring), create the Prometheus configuration file:
nano prometheus.yml
- Paste the following configuration into the file:
global:
  scrape_interval: 10s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['prometheus:9090']

  - job_name: 'node_exporter'
    static_configs:
      - targets: ['node_exporter:9100']

  - job_name: 'nvidia_smi_exporter'
    static_configs:
      - targets: ['nvidia_smi_exporter:9835']
This configuration tells Prometheus to scrape metrics every 10 seconds from itself, from Node Exporter (for system metrics), and from the NVIDIA GPU Exporter (for GPU metrics). The target addresses use the Compose service names, which are resolvable on the shared monitoring network.
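If you want to verify the syntax before starting the stack, the prom/prometheus image ships with promtool, which can check the file. This is an optional sanity check, not required by the setup:
docker run --rm -v "$(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml" --entrypoint promtool prom/prometheus check config /etc/prometheus/prometheus.yml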
Step 4: Start the Monitoring Stack
With everything in place, we can now use Docker Compose to start the monitoring stack.
- In the ~/Docker/monitoring directory, run the following command to start all services:
docker compose up -d
- Verify that all containers are running:
docker ps
You should see four containers running: prometheus, grafana, node_exporter, and nvidia_smi_exporter.
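You can also confirm that the exporters are serving metrics before wiring up Grafana. Assuming you kept the default port mappings from the Compose file, each exporter answers on its /metrics endpoint:
curl -s http://localhost:9100/metrics | head   # Node Exporter system metrics
curl -s http://localhost:9835/metrics | head   # NVIDIA GPU Exporter metrics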
Step 5: Access Prometheus and Grafana
Access Prometheus
Open your browser and go to:
http://<server-ip>:9090
This is the Prometheus web interface, where you can query metrics. To check that all three scrape targets are up, open the following endpoint:
http://<server-ip>:9090/targets
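The same data is also available through Prometheus’ HTTP API, which is handy for quick checks from the terminal. For example, the built-in up metric reports 1 for every target Prometheus can reach:
curl -s 'http://<server-ip>:9090/api/v1/query?query=up'
# each of the three jobs (prometheus, node_exporter, nvidia_smi_exporter) should report the value "1"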

Access Grafana
Open Grafana in your browser:
http://<server-ip>:3000
Log in with the default credentials (admin/admin). Grafana will then ask you to set a new password; you can do so or skip that step.
Step 6: Add Prometheus as a Data Source in Grafana
- In Grafana, navigate to Connections > Data sources > Add data source.
- Select Prometheus.
- Set the URL to http://prometheus:9090 and click Save & Test.
- Grafana should successfully connect to Prometheus.
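If you prefer to script this step instead of clicking through the UI, the same data source can be created through Grafana’s HTTP API. The sketch below assumes the default admin/admin credentials from Step 5:
curl -s -u admin:admin -X POST http://<server-ip>:3000/api/datasources -H 'Content-Type: application/json' -d '{"name": "Prometheus", "type": "prometheus", "url": "http://prometheus:9090", "access": "proxy"}'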
Step 7: Import Dashboards in Grafana
Import System Metrics Dashboard (Node Exporter)
- Go to Dashboards > New > Import.
- In the Find and import dashboards for common applications field, enter 1860 (this is the Node Exporter Full Dashboard ID) and click Load.
- Select Prometheus as the data source and click Import.
Import NVIDIA GPU Metrics Dashboard
- Go to Dashboards > New > Import.
- In the Find and import dashboards for common applications field, enter 14574 (this is the Nvidia GPU Metrics Dashboard ID) and click Load.
- Select Prometheus as the data source and click Import.
Now you should have two dashboards:
- Node Exporter Full Dashboard: Shows CPU, memory, and disk metrics.
- Nvidia GPU Metrics Dashboard: Shows GPU utilization, temperature, and other GPU metrics.
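To sanity-check the data behind the GPU dashboard, you can query one of the exporter’s metrics directly in the Prometheus UI or via its API. The metric name below follows the exporter’s nvidia_smi_ naming scheme and may vary between exporter versions, so treat it as an example rather than a guaranteed name:
curl -s 'http://<server-ip>:9090/api/v1/query?query=nvidia_smi_temperature_gpu'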
Step 8: [Optional] Clean-Up
When you are done with monitoring and want to remove the entire stack set up above, stop the containers and remove the associated volumes:
docker compose down --volumes
This command stops and removes all containers and associated volumes.
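If you only want to stop the stack but keep the collected metrics and Grafana settings, omit the --volumes flag:
docker compose down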
Conclusion
By following this guide, you’ve set up a full monitoring solution that tracks both system and GPU metrics using Prometheus, Grafana, and Node Exporter. All metrics are collected and visualized in Grafana, providing insights into your system’s performance and GPU utilization.
This setup is designed for easy deployment with minimal configuration. Just copy and paste the provided commands, and you’ll have your monitoring stack up and running in no time!
References
Nvidia GPU Exporter for Prometheus Using Nvidia-SMI Binary:
- GitHub Repository: https://github.com/utkuozdemir/nvidia_gpu_exporter
- This repository contains the NVIDIA GPU Exporter tool, which leverages the nvidia-smi binary to expose GPU metrics for Prometheus.
Christian Lempa — Docker Compose Boilerplate for Nvidia SMI Exporter:
- GitHub Compose YAML: https://github.com/ChristianLempa/boilerplates/blob/08ce2b48300776e32390c102b3cd58f69b5ab354/docker-compose/nvidiasmi/compose.yaml
- This boilerplate provides a Docker Compose setup for the NVIDIA SMI Exporter.
Node Exporter Full Dashboard for Grafana:
- Grafana Dashboard ID: 1860 — Node Exporter Full
- A comprehensive Grafana dashboard for monitoring system metrics collected by Node Exporter, such as CPU, memory, disk, and network usage.
Nvidia GPU Metrics Dashboard:
- Grafana Dashboard ID: 14574 — Nvidia GPU Metrics
- This dashboard provides detailed NVIDIA GPU metrics based on the data exposed by the NVIDIA GPU Exporter, allowing you to monitor GPU usage, memory, power, and more.