Collecting and inspecting metrics dumps

What is a metrics dump?

Metrics dumps are a snapshot of the entire Prometheus database. They include detailed information from the past 7 days about the health of Sourcegraph: which alerts were firing, resource utilization, request performance, and more. All of it is aggregate, statistical information; a dump contains no code, personal information, or similar data.

Metrics dumps are very large (often around 8GB uncompressed and 3GB compressed) and take about 10 minutes of a site admin's time to collect, plus the time needed to upload the file somewhere. They are most useful when debugging performance problems, but should be treated as something of a last resort; alerts are the first thing to check.
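Because an extracted dump can take around 8GB, it may be worth checking free disk space before extracting one. The sketch below is a hypothetical helper (`has_free_space` is not a Sourcegraph tool); it assumes a POSIX `df` and treats 1GB as 1024 * 1024 KiB.

```shell
# Hypothetical helper: succeed if the filesystem holding DIR has at least NEED_GB gigabytes free.
# Usage: has_free_space <dir> <need_gb>
# POSIX `df -P -k` prints a header line, then a line whose 4th column is available 1024-byte blocks.
has_free_space() {
  dir="$1"
  need_gb="$2"
  avail_kb=$(df -P -k "$dir" | awk 'NR == 2 { print $4 }')
  # Fail if df produced no usable output (for example, DIR does not exist).
  [ -n "$avail_kb" ] || return 1
  # Compare available KiB against the requested size, with 1GB = 1024 * 1024 KiB.
  [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}

# Example: has_free_space "$HOME" 8 || echo "less than ~8GB free under \$HOME" >&2
```

The directory passed in must already exist, which is why the example checks `$HOME` rather than a data directory that may not have been created yet.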

How to ask a site admin for a metrics dump

To ask a site admin for a metrics dump, create a shared Google Drive folder where they can upload the dump, and ask them to follow these instructions to create and upload their sourcegraph-metrics-dump.tgz file: https://docs.sourcegraph.com/admin/troubleshooting#submitting-a-metrics-dump

How to inspect a metrics dump

To inspect a dump, extract it into the directory set by Prometheus's --storage.tsdb.path flag in any Sourcegraph deployment running the same version as the one the dump was taken from.
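The extraction command in this guide ends with `mv */* .`, which relies on the archive unpacking into a single top-level snapshot directory. One way to confirm that before extracting is to list the archive's top-level entries. This is a sketch: `top_level_dirs` is a hypothetical helper name, and it assumes a gzip-compressed tar archive.

```shell
# Hypothetical helper: print the distinct top-level entries of a .tgz archive.
# Usage: top_level_dirs <archive.tgz>
# A leading "./" is stripped so that "./snapshot/x" and "snapshot/x" both report "snapshot".
top_level_dirs() {
  tar -tzf "$1" | sed 's#^\./##' | cut -d/ -f1 | grep -v '^$' | sort -u
}
```

For example, `top_level_dirs ~/Downloads/sourcegraph-metrics-dump.tgz` should print exactly one name. Listing has to read through the whole compressed file, so it may take a little while on a multi-GB dump.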

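For example, if the snapshot was created with Sourcegraph 3.17.1 and saved to ~/Downloads/sourcegraph-metrics-dump.tgz, extract it to ~/.sourcegraph/data/prometheus. First wipe out any existing local Sourcegraph data (note that this deletes everything under ~/.sourcegraph, not only the Prometheus directory):

rm -rf "$HOME/.sourcegraph"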

Then extract the snapshot. The final mv moves the contents of the archive's top-level directory up into $DATA_DIR itself:

export DATA_DIR="$HOME/.sourcegraph/data/prometheus"; rm -rf "$DATA_DIR" && mkdir -p "$DATA_DIR" && cd "$DATA_DIR" && tar -xzf ~/Downloads/sourcegraph-metrics-dump.tgz && mv */* .
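The same steps can be wrapped in a small function that checks the dump exists before deleting anything, and that removes the empty wrapper directory the one-liner leaves behind. This is a hypothetical sketch (`restore_metrics_dump` is not part of Sourcegraph) and, like the one-liner, it assumes the archive contains exactly one top-level snapshot directory.

```shell
# Hypothetical helper: restore a metrics dump into a Prometheus data directory.
# Usage: restore_metrics_dump <dump.tgz> <data-dir>
# Assumes the archive holds exactly one top-level snapshot directory.
restore_metrics_dump() {
  dump="$1"
  data_dir="$2"
  # Validate inputs before wiping anything, so a mistyped path does not delete data.
  if [ -z "$data_dir" ] || [ ! -f "$dump" ]; then
    echo "usage: restore_metrics_dump <existing dump.tgz> <data-dir>" >&2
    return 1
  fi
  # Start from an empty data directory.
  rm -rf "$data_dir" && mkdir -p "$data_dir" || return 1
  tar -C "$data_dir" -xzf "$dump" || return 1
  # Collect the top-level directories that were just extracted.
  set -- "$data_dir"/*/
  if [ "$#" -ne 1 ] || [ ! -d "$1" ]; then
    echo "expected exactly one top-level directory in $dump" >&2
    return 1
  fi
  # Move the snapshot contents up one level, then remove the now-empty wrapper.
  mv "$1"* "$data_dir"/ && rmdir "$1"
}
```

For example: `restore_metrics_dump ~/Downloads/sourcegraph-metrics-dump.tgz "$HOME/.sourcegraph/data/prometheus"`. Unlike the one-liner, this does not `cd` into the data directory, so the calling shell's working directory is left unchanged.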

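Now launch a 3.17.1 server by following the quickstart guide, then open Grafana at http://localhost:7080/-/debug/grafana to begin exploring the data. Because the data is historical, you may need to set Grafana's time range to the period when the dump was collected.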
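Since a dump holds historical data, panels can look empty under Grafana's default relative time range. One way to find the window to select is to read the `minTime` and `maxTime` fields (Unix epoch milliseconds) from each Prometheus TSDB block's `meta.json`. The sketch below is a hypothetical helper that assumes the extracted layout described above, with block directories directly under the data directory; directories without a `meta.json` (such as `wal`) are skipped.

```shell
# Hypothetical helper: print the earliest minTime and latest maxTime (epoch ms)
# across all TSDB blocks in a Prometheus data directory.
# Usage: dump_time_range <data-dir>
dump_time_range() {
  # -h omits file names, -o prints only the matching "key": value fragments.
  grep -E -ho '"(minTime|maxTime)": *[0-9]+' "$1"/*/meta.json 2>/dev/null |
    tr -d '" ' |
    awk -F: '
      $1 == "minTime" && (min == "" || $2 + 0 < min) { min = $2 + 0 }
      $1 == "maxTime" && (max == "" || $2 + 0 > max) { max = $2 + 0 }
      # %.0f avoids exponent notation for 13-digit millisecond timestamps.
      END { if (min == "" || max == "") exit 1; printf "%.0f %.0f\n", min, max }
    '
}
```

For example, `dump_time_range ~/.sourcegraph/data/prometheus` prints two numbers that can be supplied as the `from` and `to` query parameters of a Grafana dashboard URL, which accept epoch milliseconds.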