# Configure Prometheus metrics for self-hosted Lightdash

<Note>
  🛠 This page is for engineering teams self-hosting their own Lightdash instance. If you want to monitor usage and analytics, go to the [Usage analytics](/references/workspace/usage-analytics) guide.
</Note>

Lightdash can expose Prometheus metrics to help you monitor the performance and health of your Lightdash instance. This guide explains how to enable and configure Prometheus metrics for your self-hosted Lightdash deployment.

## Enabling Prometheus metrics

By default, Prometheus metrics are disabled in Lightdash. To enable them, set the following environment variable:

```bash theme={null}
LIGHTDASH_PROMETHEUS_ENABLED=true
```

## Configuration options

You can customize the Prometheus metrics endpoint using the following environment variables:

| Variable                                    | Description                                                                                                | Required? |           Default           |
| :------------------------------------------ | :--------------------------------------------------------------------------------------------------------- | :-------: | :-------------------------: |
| `LIGHTDASH_PROMETHEUS_ENABLED`              | Enables/Disables Prometheus metrics endpoint                                                               |           |           `false`           |
| `LIGHTDASH_PROMETHEUS_PORT`                 | Port for Prometheus metrics endpoint                                                                       |           |            `9090`           |
| `LIGHTDASH_PROMETHEUS_PATH`                 | Path for Prometheus metrics endpoint                                                                       |           |          `/metrics`         |
| `LIGHTDASH_PROMETHEUS_PREFIX`               | Prefix for metric names                                                                                    |           |                             |
| `LIGHTDASH_GC_DURATION_BUCKETS`             | Buckets for the garbage collection duration histogram, in seconds                                          |           | `0.001, 0.01, 0.1, 1, 2, 5` |
| `LIGHTDASH_EVENT_LOOP_MONITORING_PRECISION` | Precision for event loop monitoring in milliseconds. Must be greater than zero.                            |           |             `10`            |
| `LIGHTDASH_PROMETHEUS_LABELS`               | Labels to add to all metrics. Must be valid JSON                                                           |           |                             |
| `LIGHTDASH_CUSTOM_METRICS_CONFIG_PATH`      | Path to a JSON config file for custom event-driven counter metrics                                         |           |                             |
| `LIGHTDASH_PROMETHEUS_HTTP_METRICS_ENABLED` | Enables the OpenTelemetry `http.server.request.duration` histogram with semconv labels and route templates |           |           `false`           |
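
As an illustration, the variables above could be set in a Docker Compose file. This is a minimal sketch rather than an official deployment file: the service name `lightdash`, the image tag, and the label values are assumptions, and the other settings a Lightdash instance needs are omitted.

```yaml theme={null}
services:
  lightdash: # assumed service name
    image: lightdash/lightdash:latest # assumed tag; pin the version you actually deploy
    environment:
      LIGHTDASH_PROMETHEUS_ENABLED: "true"
      # Defaults written out explicitly so the scraper configuration is easy to match
      LIGHTDASH_PROMETHEUS_PORT: "9090"
      LIGHTDASH_PROMETHEUS_PATH: "/metrics"
      # Must be valid JSON; single-quoted so YAML passes it through as a plain string
      LIGHTDASH_PROMETHEUS_LABELS: '{"env": "production", "team": "data"}'
    expose:
      - "9090" # makes the metrics port reachable by other containers on the Compose network
```

Quoting every value keeps YAML from converting `true` or `9090` into non-string types before they reach the container.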

## Available metrics

Lightdash exposes the following metrics:

### Process metrics

These metrics provide information about the Node.js process running Lightdash:

| Metric                             | Type    | Description                                           |
| :--------------------------------- | :------ | :---------------------------------------------------- |
| `process_cpu_user_seconds_total`   | counter | Total user CPU time spent in seconds                  |
| `process_cpu_system_seconds_total` | counter | Total system CPU time spent in seconds                |
| `process_cpu_seconds_total`        | counter | Total user and system CPU time spent in seconds       |
| `process_start_time_seconds`       | gauge   | Start time of the process since unix epoch in seconds |
| `process_resident_memory_bytes`    | gauge   | Resident memory size in bytes                         |
| `process_virtual_memory_bytes`     | gauge   | Virtual memory size in bytes                          |
| `process_heap_bytes`               | gauge   | Process heap size in bytes                            |
| `process_open_fds`                 | gauge   | Number of open file descriptors                       |
| `process_max_fds`                  | gauge   | Maximum number of open file descriptors               |

### Node.js metrics

These metrics provide information about the Node.js runtime:

| Metric                                   | Type      | Description                                                                                                |
| :--------------------------------------- | :-------- | :--------------------------------------------------------------------------------------------------------- |
| `nodejs_eventloop_lag_seconds`           | gauge     | Lag of event loop in seconds                                                                               |
| `nodejs_eventloop_lag_min_seconds`       | gauge     | The minimum recorded event loop delay                                                                      |
| `nodejs_eventloop_lag_max_seconds`       | gauge     | The maximum recorded event loop delay                                                                      |
| `nodejs_eventloop_lag_mean_seconds`      | gauge     | The mean of the recorded event loop delays                                                                 |
| `nodejs_eventloop_lag_stddev_seconds`    | gauge     | The standard deviation of the recorded event loop delays                                                   |
| `nodejs_eventloop_lag_p50_seconds`       | gauge     | The 50th percentile of the recorded event loop delays                                                      |
| `nodejs_eventloop_lag_p90_seconds`       | gauge     | The 90th percentile of the recorded event loop delays                                                      |
| `nodejs_eventloop_lag_p99_seconds`       | gauge     | The 99th percentile of the recorded event loop delays                                                      |
| `nodejs_active_resources`                | gauge     | Number of active resources that are currently keeping the event loop alive, grouped by async resource type |
| `nodejs_active_resources_total`          | gauge     | Total number of active resources                                                                           |
| `nodejs_active_handles`                  | gauge     | Number of active libuv handles grouped by handle type                                                      |
| `nodejs_active_handles_total`            | gauge     | Total number of active handles                                                                             |
| `nodejs_active_requests`                 | gauge     | Number of active libuv requests grouped by request type                                                    |
| `nodejs_active_requests_total`           | gauge     | Total number of active requests                                                                            |
| `nodejs_heap_size_total_bytes`           | gauge     | Process heap size from Node.js in bytes                                                                    |
| `nodejs_heap_size_used_bytes`            | gauge     | Process heap size used from Node.js in bytes                                                               |
| `nodejs_external_memory_bytes`           | gauge     | Node.js external memory size in bytes                                                                      |
| `nodejs_heap_space_size_total_bytes`     | gauge     | Process heap space size total from Node.js in bytes                                                        |
| `nodejs_heap_space_size_used_bytes`      | gauge     | Process heap space size used from Node.js in bytes                                                         |
| `nodejs_heap_space_size_available_bytes` | gauge     | Process heap space size available from Node.js in bytes                                                    |
| `nodejs_version_info`                    | gauge     | Node.js version info                                                                                       |
| `nodejs_gc_duration_seconds`             | histogram | Garbage collection duration by kind                                                                        |
| `nodejs_eventloop_utilization`           | gauge     | The calculated Event Loop Utilization (ELU) as a percentage                                                |

### PostgreSQL metrics

These metrics provide information about the PostgreSQL connection pool:

| Metric                       | Type      | Description                                                   | Labels |
| :--------------------------- | :-------- | :------------------------------------------------------------ | :----- |
| `pg_pool_max_size`           | gauge     | Max size of the PG pool                                       |        |
| `pg_pool_size`               | gauge     | Current size of the PG pool                                   |        |
| `pg_active_connections`      | gauge     | Number of active connections in the PG pool                   |        |
| `pg_idle_connections`        | gauge     | Number of idle connections in the PG pool                     |        |
| `pg_queued_queries`          | gauge     | Number of queries waiting in the PG pool queue                |        |
| `pg_connection_acquire_time` | histogram | Time to acquire a connection from the PG pool in milliseconds |        |
| `pg_query_duration`          | histogram | Histogram of PG query execution time in milliseconds          |        |

### Queue metrics

| Metric       | Type  | Description                 |
| :----------- | :---- | :-------------------------- |
| `queue_size` | gauge | Number of jobs in the queue |

### Query metrics

These metrics track query execution performance. The `context` label is either `scheduled` or `interactive` based on the execution context.

| Metric                                        | Type      | Description                                                       | Labels                                         |
| :-------------------------------------------- | :-------- | :---------------------------------------------------------------- | :--------------------------------------------- |
| `lightdash_query_status_total`                | counter   | Total number of queries by terminal status                        | `status`, `context`                            |
| `lightdash_query_state_transitions_total`     | counter   | Query state transitions                                           | `from`, `to`, `context`                        |
| `lightdash_query_queue_wait_duration_seconds` | histogram | Time spent waiting in queue before execution                      | `context`                                      |
| `lightdash_query_total_duration_seconds`      | histogram | Total query duration from creation to results ready               | `context`                                      |
| `lightdash_query_warehouse_duration_seconds`  | histogram | Warehouse query execution duration                                | `warehouse_type`, `context`                    |
| `lightdash_query_overhead_duration_seconds`   | histogram | Lightdash overhead: total duration minus warehouse execution time | `context`                                      |
| `lightdash_query_cache_hit_total`             | counter   | Total number of query cache hits and misses                       | `result`, `context`, `has_pre_aggregate_match` |
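
The histograms above can be summarized with Prometheus recording rules. The following is a sketch under assumptions: the group name, rule names, and 5-minute window are illustrative, and the metric names assume `LIGHTDASH_PROMETHEUS_PREFIX` is not changing them. It relies only on the standard `_bucket`, `_sum`, and `_count` series that Prometheus histograms expose.

```yaml theme={null}
groups:
  - name: lightdash_query_latency # hypothetical group name
    rules:
      # 95th percentile of end-to-end query duration, split by execution context
      - record: context:lightdash_query_total_duration_seconds:p95_5m
        expr: |
          histogram_quantile(
            0.95,
            sum by (le, context) (
              rate(lightdash_query_total_duration_seconds_bucket[5m])
            )
          )
      # Average time a query waits in the queue, split by execution context
      - record: context:lightdash_query_queue_wait_duration_seconds:avg_5m
        expr: |
          sum by (context) (rate(lightdash_query_queue_wait_duration_seconds_sum[5m]))
          /
          sum by (context) (rate(lightdash_query_queue_wait_duration_seconds_count[5m]))
```

Keeping `context` in the aggregation lets you compare `scheduled` and `interactive` queries on the same dashboard panel.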

### Pre-aggregate metrics

These metrics track the pre-aggregate system, including materialization, DuckDB resolution, and file management:

| Metric                                                               | Type      | Description                                                              | Labels                            |
| :------------------------------------------------------------------- | :-------- | :----------------------------------------------------------------------- | :-------------------------------- |
| `lightdash_pre_aggregate_match_total`                                | counter   | Total number of pre-aggregate match attempts                             | `result`, `miss_reason`, `format` |
| `lightdash_pre_aggregate_materialization_total`                      | counter   | Total number of pre-aggregate materializations by outcome                | `status`, `trigger`               |
| `lightdash_pre_aggregate_active_materializations`                    | gauge     | Current number of active pre-aggregate materializations                  |                                   |
| `lightdash_pre_aggregate_materialization_duration_seconds`           | histogram | Pre-aggregate materialization duration                                   | `status`, `trigger`               |
| `lightdash_pre_aggregate_materialization_poll_duration_seconds`      | histogram | Time spent polling for materialization query completion in seconds       | `status`, `trigger`               |
| `lightdash_pre_aggregate_materialization_warehouse_duration_seconds` | histogram | Warehouse execution time during materialization in seconds               | `status`, `trigger`               |
| `lightdash_pre_aggregate_materialization_promote_duration_seconds`   | histogram | Time to check file size and promote materialization to active in seconds | `status`, `trigger`               |
| `lightdash_pre_aggregate_materialization_file_size_bytes`            | histogram | File size of pre-aggregate materialization in bytes                      | `format`                          |
| `lightdash_pre_aggregate_parquet_conversion_duration_seconds`        | histogram | Duration of JSONL to Parquet conversion                                  | `status`                          |
| `lightdash_pre_aggregate_duckdb_resolution_total`                    | counter   | Total number of DuckDB pre-aggregate resolution attempts                 | `status`, `reason`                |
| `lightdash_pre_aggregate_duckdb_resolution_duration_seconds`         | histogram | DuckDB pre-aggregate resolution duration                                 | `status`                          |
| `lightdash_pre_aggregate_duckdb_query_latency_seconds`               | histogram | Total DuckDB query latency in seconds                                    |                                   |
| `lightdash_pre_aggregate_duckdb_parquet_read_duration_seconds`       | histogram | Time spent in `READ_PARQUET` operators in seconds                        |                                   |
| `lightdash_pre_aggregate_duckdb_bytes_read`                          | histogram | Bytes read from S3/parquet by DuckDB queries                             |                                   |
| `lightdash_pre_aggregate_duckdb_scan_amplification`                  | histogram | Ratio of rows scanned to rows returned in DuckDB queries                 |                                   |
| `lightdash_pre_aggregate_fallback_total`                             | counter   | Total number of opportunistic pre-aggregate fallbacks to warehouse       | `reason`                          |

### AI agent metrics

These metrics track the performance of the AI agent:

| Metric                                    | Type      | Description                                                                                   | Labels                                  |
| :---------------------------------------- | :-------- | :-------------------------------------------------------------------------------------------- | :-------------------------------------- |
| `ai_agent_generate_response_duration_ms`  | histogram | AI agent generate response time in milliseconds                                               |                                         |
| `ai_agent_stream_response_duration_ms`    | histogram | AI agent stream response time in milliseconds                                                 |                                         |
| `ai_agent_stream_first_chunk_ms`          | histogram | AI agent time to first chunk (any type) in milliseconds                                       |                                         |
| `ai_agent_ttft_ms`                        | histogram | AI agent time to first token (TTFT) in milliseconds                                           | `model`, `mode`                         |
| `ai_repofs_github_tree_duration_ms`       | histogram | Duration of the repoShell GitHub Git Trees fetch (repo listing, once per run) in milliseconds |                                         |
| `ai_repofs_github_file_duration_ms`       | histogram | Duration of a per-file GitHub Contents read in milliseconds                                   | `outcome` (`found`, `missing`, `error`) |
| `ai_writeback_sandbox_create_duration_ms` | histogram | Duration of AI writeback E2B sandbox create/resume in milliseconds                            |                                         |
| `ai_writeback_compile_duration_ms`        | histogram | Duration of in-sandbox `lightdash compile` during AI writeback in milliseconds                | `status`                                |
| `ai_writeback_run_duration_ms`            | histogram | End-to-end AI writeback run duration in milliseconds                                          | `status`                                |

### S3 metrics

| Metric                                         | Type      | Description                | Labels   |
| :--------------------------------------------- | :-------- | :------------------------- | :------- |
| `lightdash_s3_results_upload_duration_seconds` | histogram | S3 results upload duration | `source` |

### HTTP server metrics

When `LIGHTDASH_PROMETHEUS_HTTP_METRICS_ENABLED` is set to `true`, Lightdash exposes a standardized [OpenTelemetry HTTP server semantic convention](https://opentelemetry.io/docs/specs/semconv/http/http-metrics/) histogram. Buckets are in seconds and labels use route templates (not raw URLs) to keep cardinality bounded.

| Metric                                 | Type      | Description                                  | Labels                                                                                                     |
| :------------------------------------- | :-------- | :------------------------------------------- | :--------------------------------------------------------------------------------------------------------- |
| `http_server_request_duration_seconds` | histogram | Duration of inbound HTTP requests in seconds | `http_request_method`, `http_response_status_code`, `http_route`, `url_scheme`, `network_protocol_version` |
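
Because the labels follow the OpenTelemetry semantic conventions, per-route latency and a server error ratio can be derived from this single histogram. The recording rules below are a sketch: the group name, rule names, and 5-minute window are assumptions, and the metric name assumes no custom prefix is applied.

```yaml theme={null}
groups:
  - name: lightdash_http # hypothetical group name
    rules:
      # 99th percentile request duration per route template
      - record: http_route:http_server_request_duration_seconds:p99_5m
        expr: |
          histogram_quantile(
            0.99,
            sum by (le, http_route) (
              rate(http_server_request_duration_seconds_bucket[5m])
            )
          )
      # Share of requests that ended with a 5xx status code
      - record: lightdash:http_server_requests:error_ratio_5m
        expr: |
          sum(rate(http_server_request_duration_seconds_count{http_response_status_code=~"5.."}[5m]))
          /
          sum(rate(http_server_request_duration_seconds_count[5m]))
```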

### Custom event metrics

Lightdash supports operator-configurable Prometheus counter metrics that are driven by application events. These are defined via a JSON configuration file specified by the `LIGHTDASH_CUSTOM_METRICS_CONFIG_PATH` environment variable.

Each entry in the config file creates a counter metric that increments when a matching application event fires. This allows you to track custom business-level metrics such as user logins or query executions without modifying the application code.

## Using metrics for monitoring and alerting

You can use these metrics to create dashboards and alerts in your monitoring system. Some common use cases include:

* Monitoring memory usage and setting alerts for potential memory leaks
* Tracking PostgreSQL connection pool utilization
* Monitoring event loop lag to detect performance issues
* Setting up alerts for high CPU usage

For example, you might want to create alerts for:

* High memory usage: `process_resident_memory_bytes > threshold`
* Event loop lag: `nodejs_eventloop_lag_p99_seconds > threshold`
* Database connection pool saturation: `pg_active_connections / pg_pool_max_size > 0.8`
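
The three expressions above could be written as a Prometheus alerting rules file along these lines. Treat it as a starting point only: the alert names, the 2 GB and 0.5 second thresholds, the `for` durations, and the `severity` labels are assumptions to tune for your deployment, and the metric names assume no custom prefix.

```yaml theme={null}
groups:
  - name: lightdash_alerts # hypothetical group name
    rules:
      - alert: LightdashHighMemoryUsage
        # Assumed threshold of 2 GB resident memory
        expr: process_resident_memory_bytes > 2e9
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Lightdash resident memory has stayed above 2 GB for 10 minutes"

      - alert: LightdashEventLoopLag
        # Assumed threshold of 0.5 seconds for p99 event loop delay
        expr: nodejs_eventloop_lag_p99_seconds > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Lightdash p99 event loop lag has stayed above 500 ms for 5 minutes"

      - alert: LightdashPgPoolSaturated
        # More than 80% of the pool's maximum size is in active use
        expr: pg_active_connections / pg_pool_max_size > 0.8
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Lightdash PostgreSQL connection pool is over 80% utilized"
```

The `for` clause delays firing until the condition has held continuously, which filters out short spikes such as a single garbage collection pause.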

## OpenTelemetry support

Lightdash metrics are also compatible with OpenTelemetry. You can use the [OpenTelemetry Collector](https://opentelemetry.io/docs/collector) with the [Prometheus receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/prometheusreceiver) to scrape Lightdash's Prometheus metrics endpoint and export them to any OpenTelemetry-compatible backend.

Example OpenTelemetry Collector configuration:

```yaml theme={null}
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'lightdash'
          scrape_interval: 15s
          static_configs:
            - targets: ['lightdash:9090']

exporters:
  # Configure your preferred metrics exporter (e.g., OTLP or Prometheus remote write)
  otlp:
    endpoint: "your-otlp-endpoint:4317"

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [otlp]
```

## Setting up a Prometheus server

If you don't already have a Prometheus server set up, here are some resources to help you get started:

### General Prometheus setup

* [Prometheus Getting Started Guide](https://prometheus.io/docs/prometheus/latest/getting_started) - Official documentation on how to install and configure Prometheus
* [Prometheus Installation](https://prometheus.io/docs/prometheus/latest/installation) - Different ways to install Prometheus
* [Prometheus Configuration](https://prometheus.io/docs/prometheus/latest/configuration/configuration) - Detailed configuration options for Prometheus
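
Once Prometheus is installed, a scrape job for Lightdash might look like the following fragment of `prometheus.yml`. It is a minimal sketch: the hostname `lightdash` and the 15-second interval are assumptions, while the port and path match the defaults of `LIGHTDASH_PROMETHEUS_PORT` and `LIGHTDASH_PROMETHEUS_PATH`.

```yaml theme={null}
scrape_configs:
  - job_name: "lightdash"
    scrape_interval: 15s # assumed interval
    metrics_path: /metrics # change this if LIGHTDASH_PROMETHEUS_PATH is customized
    static_configs:
      # Assumed hostname; 9090 is the default LIGHTDASH_PROMETHEUS_PORT
      - targets: ["lightdash:9090"]
```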

### Setting up Prometheus in Google Cloud Platform (GCP)

* [Google Cloud Managed Service for Prometheus](https://cloud.google.com/stackdriver/docs/managed-prometheus) - Google Cloud's managed Prometheus service
* [Installing Prometheus on GKE](https://cloud.google.com/stackdriver/docs/managed-prometheus/setup-managed) - Setting up Prometheus on Google Kubernetes Engine
* [Google Cloud Operations Suite Integration](https://cloud.google.com/stackdriver/docs/solutions/gke/prometheus) - Integrating Prometheus with Google Cloud Operations Suite

### Setting up Prometheus in Amazon Web Services (AWS)

* [Amazon Managed Service for Prometheus](https://aws.amazon.com/prometheus) - AWS managed Prometheus service
* [Getting Started with Amazon Managed Service for Prometheus](https://docs.aws.amazon.com/prometheus/latest/userguide/what-is-Amazon-Managed-Service-Prometheus.html) - Official AWS documentation
* [Setting up Prometheus on Amazon EKS](https://docs.aws.amazon.com/eks/latest/userguide/prometheus.html) - Deploying Prometheus on Amazon Elastic Kubernetes Service
