
Monitoring

Set up monitoring for your saga-based applications.

Key Metrics

Monitor these essential metrics for saga health:

| Metric | Description | Alert Threshold |
| --- | --- | --- |
| saga_messages_processed_total | Total messages processed | N/A (use rate) |
| saga_message_processing_duration_seconds | Handler execution time | p99 > 5s |
| saga_messages_failed_total | Failed message count | > 0/min |
| saga_active_instances | Currently running sagas | Depends on load |
| saga_dlq_messages_total | Dead letter queue size | > 0 |

Prometheus Setup

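// Assumption: createBus is exported from '@saga-bus/core'; adjust to your setup.
import { createBus } from '@saga-bus/core';
import { createMetricsMiddleware } from '@saga-bus/middleware-metrics';

// transport, store, and sagas are assumed to be defined elsewhere in your application.
const bus = createBus({
  transport,
  store,
  sagas,
  middleware: [
    createMetricsMiddleware({
      // Metric name prefix. The table and alert rules on this page use names
      // starting with `saga_`; make sure the prefix matches what your queries expect.
      prefix: 'saga_bus',
      // Labels attached to each metric, so rates can be broken down per saga and message type.
      labels: ['saga_name', 'message_type'],
    }),
  ],
});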
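To make it concrete what a metrics middleware records, here is a minimal, self-contained sketch. It is not the `@saga-bus/middleware-metrics` implementation: the `Context`, `Next`, and `Middleware` shapes, the `Recorder` structure, and `createSketchMetricsMiddleware` are all hypothetical names chosen for illustration. It also assumes that "processed" counts only messages whose handler completed without throwing.

```typescript
// Hypothetical shapes for illustration only; not the @saga-bus API.
type Context = { sagaName: string; messageType: string };
type Next = () => Promise<void>;
type Middleware = (ctx: Context, next: Next) => Promise<void>;

// In-memory stand-in for Prometheus counters and a duration histogram.
interface Recorder {
  processed: Map<string, number>;
  failed: Map<string, number>;
  durationsSeconds: Map<string, number[]>;
}

function createRecorder(): Recorder {
  return { processed: new Map(), failed: new Map(), durationsSeconds: new Map() };
}

function increment(counter: Map<string, number>, key: string): void {
  counter.set(key, (counter.get(key) ?? 0) + 1);
}

// `now` returns milliseconds; it is injectable so the timing can be tested.
function createSketchMetricsMiddleware(
  recorder: Recorder,
  now: () => number = Date.now,
): Middleware {
  return async (ctx, next) => {
    // One series per (saga_name, message_type) label pair.
    const key = `${ctx.sagaName}|${ctx.messageType}`;
    const startedAt = now();
    try {
      await next();
      increment(recorder.processed, key);
    } catch (err) {
      increment(recorder.failed, key);
      throw err; // rethrow so retry and dead-letter handling still see the failure
    } finally {
      // Record the handler duration in seconds for both outcomes.
      const samples = recorder.durationsSeconds.get(key) ?? [];
      samples.push((now() - startedAt) / 1000);
      recorder.durationsSeconds.set(key, samples);
    }
  };
}
```

The design point is that the middleware wraps the handler call, so it observes every message without the sagas themselves knowing about metrics, and it rethrows errors rather than swallowing them.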

Grafana Dashboard

Key panels to include:

  1. Message Throughput - Messages processed per second
  2. Processing Latency - p50, p95, p99 duration histograms
  3. Error Rate - Failed messages over time
  4. Active Sagas - Current saga instance count
  5. DLQ Depth - Dead letter queue size
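The panels above could be backed by queries along the following lines. This is a sketch under stated assumptions: the metric names are taken from the Key Metrics table, the 5-minute rate window is an arbitrary choice, and the duration metric is assumed to be a Prometheus histogram that exposes `_bucket` series. The `Panel` shape is a simplified subset of Grafana's dashboard JSON (title, type, targets), and `buildPanels` is a hypothetical helper, not part of any library.

```typescript
// PromQL per panel, keyed by the panel titles from the list above.
const panelQueries: Record<string, string[]> = {
  'Message Throughput': ['sum(rate(saga_messages_processed_total[5m]))'],
  // One query per quantile; assumes a histogram with `_bucket` series.
  'Processing Latency': [0.5, 0.95, 0.99].map(
    (q) =>
      `histogram_quantile(${q}, sum by (le) (rate(saga_message_processing_duration_seconds_bucket[5m])))`,
  ),
  'Error Rate': ['sum(rate(saga_messages_failed_total[5m]))'],
  'Active Sagas': ['sum(saga_active_instances)'],
  'DLQ Depth': ['sum(saga_dlq_messages_total)'],
};

// Simplified subset of a Grafana panel definition.
interface PanelTarget {
  refId: string;
  expr: string;
}

interface Panel {
  title: string;
  type: string;
  targets: PanelTarget[];
}

// Hypothetical helper: turns the query map into panel objects,
// assigning refIds A, B, C, ... to the queries of each panel.
function buildPanels(queries: Record<string, string[]>): Panel[] {
  return Object.entries(queries).map(([title, exprs]) => ({
    title,
    type: 'timeseries',
    targets: exprs.map((expr, i) => ({
      refId: String.fromCharCode(65 + i),
      expr,
    })),
  }));
}

const panels = buildPanels(panelQueries);
```

Keeping the queries in one map makes it easier to keep dashboard panels and alert expressions in sync when a metric name or prefix changes.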

Alerting Rules

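groups:
  - name: saga-bus
    rules:
      # Sustained failures: 0.01 failures/s is about 0.6 per minute, per label set.
      - alert: HighErrorRate
        expr: rate(saga_messages_failed_total[5m]) > 0.01
        for: 5m
        labels:
          severity: warning

      # p99 handler duration above 5s. histogram_quantile operates on per-bucket
      # rates aggregated by the `le` label, so this assumes the duration metric
      # is exported as a Prometheus histogram with `_bucket` series.
      - alert: SlowProcessing
        expr: histogram_quantile(0.99, sum by (le) (rate(saga_message_processing_duration_seconds_bucket[5m]))) > 5
        for: 10m
        labels:
          severity: warning

      # Any message in the dead letter queue. If this metric is a counter rather
      # than a gauge, alert on increase(saga_dlq_messages_total[5m]) > 0 instead.
      - alert: DLQNotEmpty
        expr: saga_dlq_messages_total > 0
        for: 1m
        labels:
          severity: critical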
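The SlowProcessing rule relies on `histogram_quantile`, which estimates a quantile from cumulative histogram buckets rather than from raw samples. The sketch below mirrors the basic linear interpolation Prometheus documents for classic histograms, to show what the p99 figure means. It is a simplified illustration, not Prometheus's code: `Bucket` and `histogramQuantile` are hypothetical names, the buckets are assumed to be sorted by upper bound with a final `+Inf` bucket, and `q` is assumed to lie in (0, 1].

```typescript
// A cumulative bucket: `count` observations were <= `le` seconds.
interface Bucket {
  le: number;
  count: number;
}

// Estimates the q-quantile from cumulative buckets sorted by `le`,
// where the last bucket is the +Inf bucket.
function histogramQuantile(q: number, buckets: Bucket[]): number {
  if (buckets.length < 2) return NaN;
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;

  // Rank of the observation we are looking for.
  const rank = q * total;
  // First bucket whose cumulative count reaches that rank.
  const i = buckets.findIndex((b) => b.count >= rank);

  // If the rank falls in the +Inf bucket, the best available answer is
  // the upper bound of the highest finite bucket.
  if (i === buckets.length - 1) return buckets[buckets.length - 2].le;

  // Interpolate linearly inside the bucket; the first bucket starts at 0.
  const lower = i === 0 ? 0 : buckets[i - 1].le;
  const below = i === 0 ? 0 : buckets[i - 1].count;
  const inBucket = buckets[i].count - below;
  return lower + (buckets[i].le - lower) * ((rank - below) / inBucket);
}

// Example: 100 handler runs, 98 of them finished within 5s.
const exampleBuckets: Bucket[] = [
  { le: 1, count: 90 },
  { le: 5, count: 98 },
  { le: 10, count: 100 },
  { le: Infinity, count: 100 },
];

// Lands halfway into the 5s to 10s bucket, so the estimate is about 7.5s.
const p99 = histogramQuantile(0.99, exampleBuckets);
```

Because the result is interpolated, its precision depends on the bucket boundaries: with a 5s threshold it helps to have a bucket boundary at or near 5 seconds.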

See Also