This guide shows how to install, configure, and use a simple Prometheus instance.
You will download and run Prometheus locally, configure it to scrape itself and an example application, and then work with queries, rules, and graphs to use the collected time series data.
Using pre-compiled binaries
Pre-compiled binaries are available for most official Prometheus components.
Download the latest release of Prometheus for your platform, then extract and run it:
tar xvfz prometheus-*.tar.gz
All Prometheus services are available as Docker images on Quay.io or Docker Hub.
Running Prometheus on Docker is as simple as:
docker run -p 9090:9090 prom/prometheus
This starts Prometheus with a sample configuration and exposes it on port 9090.
The Prometheus image uses a volume to store the actual metrics.
For production deployments, it is highly recommended to use a named volume to ease managing the data on Prometheus upgrades.
To provide your configuration, there are several options. Here are two examples.
Bind-mount your prometheus.yml from the host by running:
docker run \
    -p 9090:9090 \
    -v /path/to/prometheus.yml:/etc/prometheus/prometheus.yml \
    prom/prometheus
Or bind-mount the directory containing prometheus.yml onto /etc/prometheus by running:
docker run \
    -p 9090:9090 \
    -v /path/to/config:/etc/prometheus \
    prom/prometheus
Prometheus data is stored in the /prometheus directory inside the container, so the data is cleared every time the container is recreated. To save your data, you need to set up persistent storage (or bind mounts) for your container.
Run the Prometheus container with persistent storage:
# Create a persistent volume for your data
docker volume create prometheus-data
# Start Prometheus container
docker run \
    -p 9090:9090 \
    -v /path/to/prometheus.yml:/etc/prometheus/prometheus.yml \
    -v prometheus-data:/prometheus \
    prom/prometheus
To avoid managing a file on the host and bind-mounting it, the configuration can be baked into the image. This works well if the configuration itself is rather static and the same across all environments.
For this, create a new directory with a Prometheus configuration and a Dockerfile like this:
FROM prom/prometheus
ADD prometheus.yml /etc/prometheus/
Now build and run it:
docker build -t my-prometheus .
docker run -p 9090:9090 my-prometheus
A more advanced option is to render the configuration dynamically on start with some tooling or even have a daemon update it periodically.
To start Prometheus with your newly created configuration file, change to the directory containing the Prometheus binary and run:
# Start Prometheus.
# By default, Prometheus stores its database in ./data (flag --storage.tsdb.path).
./prometheus --config.file=prometheus.yml
Prometheus should start up. You should also be able to browse a status page about itself at http://localhost:9090.
Give it a couple of seconds to collect data about itself from its own HTTP metrics endpoint.
You can also verify that Prometheus is serving metrics about itself by navigating to its metrics endpoint: http://localhost:9090/metrics
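The metrics endpoint returns plain text, one sample per line, with comment lines starting with `#`. As a rough illustration of that format, the sketch below parses a few such lines. The sample text and its values are invented for the example, `parse_samples` is a hypothetical helper, and the parser only handles the simple `name{labels} value` form (no timestamps, no escaped quotes).

```python
import re

# Invented sample of what the metrics endpoint might return (illustrative values only).
SAMPLE = '''\
# HELP prometheus_target_interval_length_seconds Actual intervals between scrapes.
# TYPE prometheus_target_interval_length_seconds summary
prometheus_target_interval_length_seconds{interval="15s",quantile="0.5"} 15.0
prometheus_target_interval_length_seconds{interval="15s",quantile="0.99"} 15.002
prometheus_target_interval_length_seconds_count{interval="15s"} 42
'''

# metric name, optional {label="value",...} block, then the sample value
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore anything this simplified parser does not understand
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ''))
        samples.append((name, labels, float(value)))
    return samples


for name, labels, value in parse_samples(SAMPLE):
    print(name, labels, value)
```

Each parsed tuple corresponds to one time series sample: the metric name, its label set, and the latest value.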
Explore data that Prometheus has collected about itself.
To use Prometheus's built-in expression browser, navigate to http://localhost:9090/graph and choose the "Table" view within the "Graph" tab.
As you can gather from localhost:9090/metrics, one metric that Prometheus exports about itself is named prometheus_target_interval_length_seconds (the actual amount of time between target scrapes).
Enter the following into the expression console and then click "Execute":
prometheus_target_interval_length_seconds
This should return several different time series (along with the latest value recorded for each), each with the metric name prometheus_target_interval_length_seconds, but with different labels. These labels designate different latency percentiles and target group intervals.
If we are interested only in 99th-percentile latencies, we could use this query:
prometheus_target_interval_length_seconds{quantile="0.99"}
To count the number of returned time series, you could write:
count(prometheus_target_interval_length_seconds)
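Expressions entered in the expression browser can also be evaluated from a script through the Prometheus HTTP API, whose instant-query endpoint is /api/v1/query. The following is a minimal sketch, assuming Prometheus is reachable at http://localhost:9090; the helper names `build_query_url` and `run_query` are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(expr, base_url="http://localhost:9090"):
    """Build an instant-query URL for a PromQL expression (URL-encoded)."""
    return base_url + "/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def run_query(expr, base_url="http://localhost:9090"):
    """Send the query and return the list of results from the JSON response.

    Requires a running Prometheus at base_url (an assumption of this sketch).
    """
    with urllib.request.urlopen(build_query_url(expr, base_url), timeout=5) as resp:
        payload = json.load(resp)
    return payload["data"]["result"]


# Building the URL needs no running server:
print(build_query_url("count(prometheus_target_interval_length_seconds)"))
```

With Prometheus running, calling `run_query("prometheus_target_interval_length_seconds")` should return one entry per time series, each carrying its labels and latest value.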
To graph expressions, navigate to http://localhost:9090/graph and use the "Graph" tab.
Enter the following expression to graph the per-second rate of chunks being created in the self-scraped Prometheus:
rate(prometheus_tsdb_head_chunks_created_total[1m])
Experiment with the graph range parameters and other settings.
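To make "per-second rate" concrete, here is a simplified sketch of how such a rate can be derived from raw counter samples inside a time window. It is an approximation for illustration only: the function name `simple_rate` is hypothetical, the sample data is invented, and the real rate() function additionally extrapolates to the window boundaries, which this sketch omits.

```python
def simple_rate(samples):
    """Approximate per-second rate of a counter.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    Returns None when there are fewer than two samples or no elapsed time.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    previous = samples[0][1]
    for _, value in samples[1:]:
        if value < previous:
            # The counter went backwards, so treat it as a reset:
            # everything counted since the restart is new increase.
            increase += value
        else:
            increase += value - previous
        previous = value
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


# Invented samples: a counter scraped every 15 seconds for one minute.
window = [(0, 10), (15, 25), (30, 40), (45, 55), (60, 70)]
print(simple_rate(window))
```

The counter grows by 60 over 60 seconds in this invented window, so the computed rate is one unit per second.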
Add additional targets for Prometheus to scrape.
The Node Exporter is used as an example target. For more information on using it, see the Node Exporter documentation.
tar -xzvf node_exporter-*.*.tar.gz
# Start 3 example targets in separate terminals:
./node_exporter --web.listen-address 127.0.0.1:8080
./node_exporter --web.listen-address 127.0.0.1:8081
./node_exporter --web.listen-address 127.0.0.1:8082
You should now have example targets listening on http://localhost:8080/metrics, http://localhost:8081/metrics, and http://localhost:8082/metrics.
Now we will configure Prometheus to scrape these new targets. Let's group all three endpoints into one job called node. We will imagine that the first two endpoints are prod targets, while the third one represents a dev instance. To model this in Prometheus, we can add several groups of endpoints to a single job, adding extra labels to each group of targets. In this example, we will add the group="suyash" label to the first group of targets, while adding group="suyi" to the second.
Add the following job definition to the scrape_configs section in your prometheus.yml and restart your Prometheus instance:
scrape_configs:
  - job_name: 'node'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:8080', 'localhost:8081']
        labels:
          group: 'suyash'

      - targets: ['localhost:8082']
        labels:
          group: 'suyi'
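To see how the groups and labels in this job definition map onto individual targets, the sketch below expands the same structure into one label set per target. It models the default behavior in which each target gets a job label from job_name and an instance label from its address; the dictionary mirrors the configuration above, and the helper name `expand_targets` is hypothetical.

```python
# The 'node' job from the configuration, written as a Python dict.
SCRAPE_JOB = {
    "job_name": "node",
    "static_configs": [
        {"targets": ["localhost:8080", "localhost:8081"], "labels": {"group": "suyash"}},
        {"targets": ["localhost:8082"], "labels": {"group": "suyi"}},
    ],
}


def expand_targets(job):
    """Return one label dict per target: job, instance, plus the group's extra labels."""
    expanded = []
    for group in job["static_configs"]:
        for address in group["targets"]:
            labels = {"job": job["job_name"], "instance": address}
            labels.update(group.get("labels", {}))
            expanded.append(labels)
    return expanded


for labels in expand_targets(SCRAPE_JOB):
    print(labels)
```

Every time series scraped from a target carries these labels, which is what makes it possible to select or aggregate by group later.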
Go to the expression browser and verify that Prometheus now has information about the time series that these example endpoints expose, such as node_cpu_seconds_total.
Configure rules for aggregating scraped data into new time series.
Though not a problem in our example, queries that aggregate over thousands of time series can get slow when computed ad hoc. To make this more efficient, Prometheus can prerecord expressions into new persisted time series via configured recording rules. Let's say we are interested in recording the per-second rate of CPU time (node_cpu_seconds_total) averaged over all CPUs per instance, as measured over a window of 5 minutes. We could write this as:
avg by (job, instance, mode) (rate(node_cpu_seconds_total[5m]))
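The aggregation step in this expression groups series that share the same job, instance, and mode labels and averages their values, which drops the remaining label (here, the per-CPU one). A minimal sketch of that grouping follows; the input rates are invented, and `avg_by` is a hypothetical helper rather than anything Prometheus exposes.

```python
from collections import defaultdict


def avg_by(series, keys):
    """Average values of series that share the same values for the given label keys.

    series: list of (labels_dict, value) pairs.
    Returns a dict mapping a tuple of (label, value) pairs to the group's average.
    """
    groups = defaultdict(list)
    for labels, value in series:
        group_key = tuple((key, labels.get(key, "")) for key in keys)
        groups[group_key].append(value)
    return {key: sum(values) / len(values) for key, values in groups.items()}


# Invented per-CPU rates for one instance with two CPUs.
RATES = [
    ({"job": "node", "instance": "localhost:8080", "mode": "idle", "cpu": "0"}, 0.5),
    ({"job": "node", "instance": "localhost:8080", "mode": "idle", "cpu": "1"}, 0.75),
    ({"job": "node", "instance": "localhost:8080", "mode": "user", "cpu": "0"}, 0.25),
    ({"job": "node", "instance": "localhost:8080", "mode": "user", "cpu": "1"}, 0.125),
]

for group, average in avg_by(RATES, ["job", "instance", "mode"]).items():
    print(dict(group), average)
```

Four per-CPU series collapse into two results, one per mode, each holding the average across both CPUs.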
To record the time series resulting from this expression into a new metric called job_instance_mode:node_cpu_seconds:avg_rate5m, create a file with the following recording rule and save it as prometheus.rules.yml:
groups:
  - name: cpu-node
    rules:
      - record: job_instance_mode:node_cpu_seconds:avg_rate5m
        expr: avg by (job, instance, mode) (rate(node_cpu_seconds_total[5m]))
To make Prometheus pick up this new rule, add a rule_files statement in your prometheus.yml. The config should now look like this:
global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # Evaluate rules every 15 seconds.

  # Attach these extra labels to all time-series collected by this Prometheus instance.
  external_labels:
    monitor: 'codelab-monitor'

rule_files:
  - 'prometheus.rules.yml'

scrape_configs:
  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:8080', 'localhost:8081']
        labels:
          group: 'suyash'

      - targets: ['localhost:8082']
        labels:
          group: 'suyi'
Restart Prometheus with the new configuration and verify that a new time series with the metric name job_instance_mode:node_cpu_seconds:avg_rate5m is now available by querying it through the expression browser or graphing it.
As mentioned in the configuration documentation, a Prometheus instance can have its configuration reloaded without restarting the process by using the SIGHUP signal. If you're running on Linux, this can be performed by using kill -s SIGHUP 11232, where 11232 is your Prometheus PID.
While Prometheus does have recovery mechanisms in the case of an abrupt process failure, it is recommended to use the SIGTERM signal to cleanly shut down a Prometheus instance. If you're running on Linux, this can be performed by using kill -s SIGTERM <PID>, replacing <PID> with your Prometheus process ID.
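The same two signals can be sent from a script instead of the kill command. This is a small sketch for Linux, assuming you already know the Prometheus process ID; the function names are hypothetical and the code does nothing Prometheus-specific beyond delivering the signal.

```python
import os
import signal


def reload_prometheus(pid):
    """Ask the process to reload its configuration (equivalent to kill -s SIGHUP)."""
    os.kill(pid, signal.SIGHUP)


def stop_prometheus(pid):
    """Ask the process to shut down cleanly (equivalent to kill -s SIGTERM)."""
    os.kill(pid, signal.SIGTERM)
```

For example, `reload_prometheus(11232)` would have the same effect as the kill -s SIGHUP command, with 11232 standing in for your own Prometheus PID.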