
Capture and visualize metrics using Prometheus and Grafana

Component: Metrics
NuGet Package: NServiceBus.Metrics (4.x)
Target Version: NServiceBus 8.x

NServiceBus version 8 and above can export metric data to Prometheus and Grafana via OpenTelemetry without the metrics package. See the Using Open Telemetry with Prometheus and Grafana sample for more details.

Introduction

Prometheus is a monitoring solution for storing time series data such as metrics. Grafana visualizes the data stored in Prometheus (and other sources). This sample demonstrates how to capture NServiceBus metrics, store them in Prometheus, and visualize them using Grafana.

Grafana NServiceBus processing time

This sample reports the following metrics to Prometheus:

  • Fetched messages per second
  • Failed messages per second
  • Successful messages per second
  • Critical time in seconds
  • Processing time in seconds

For a detailed explanation of these metrics refer to the metrics captured section in the metrics documentation.

Prerequisites

To run this sample, download and run both Prometheus and Grafana.

Code overview

The sample simulates message load with a random 10% failure rate using the LoadSimulator class:

var simulator = new LoadSimulator(endpointInstance, TimeSpan.Zero, TimeSpan.FromSeconds(10));
simulator.Start();

Capturing metric values

A Prometheus service is hosted inside an endpoint via the NuGet package prometheus-net. The service enables Prometheus to scrape data gathered by the metrics package. In the sample, the service that exposes the data to scrape is hosted on http://localhost:3030. The service is started and stopped inside a feature startup task, as shown below:

class MetricServerTask : FeatureStartupTask
{
    MetricServer metricServer = new MetricServer(port: 3030);

    protected override Task OnStart(IMessageSession session, CancellationToken cancellationToken = default)
    {
        metricServer.Start();
        return Task.CompletedTask;
    }

    protected override Task OnStop(IMessageSession session, CancellationToken cancellationToken = default)
    {
        metricServer.Stop();
        return Task.CompletedTask;
    }
}

Custom observers need to be registered for the metric probes provided via NServiceBus.Metrics. This is all set up in the PrometheusFeature:

endpointConfiguration.EnableMetrics();

The names provided by the NServiceBus.Metrics probes are not compatible with Prometheus. They need to be aligned with the naming conventions defined by Prometheus by mapping them as follows:

Counters: nservicebus_{counter-name}_total

Summaries: nservicebus_{summary-name}_seconds

Dictionary<string, string> nameMapping = new Dictionary<string, string>
{
    // https://prometheus.io/docs/practices/naming/
    {"# of msgs successfully processed / sec", "nservicebus_success_total"},
    {"# of msgs pulled from the input queue /sec", "nservicebus_fetched_total"},
    {"# of msgs failures / sec", "nservicebus_failure_total"},
    {"Critical Time", "nservicebus_criticaltime_seconds"},
    {"Processing Time", "nservicebus_processingtime_seconds"},
    {"Retries", "nservicebus_retries_total"},
};

The registered observers convert NServiceBus.Metrics signals to Prometheus counters and NServiceBus.Metrics durations to Prometheus summaries. Additionally, labels are added that identify the endpoint, the endpoint queue, and more within Prometheus. With these labels, it is possible to filter and group metric values.

var instanceQueueAddress = context.InstanceSpecificQueueAddress();
var labelValues = new[]
{
    settings.EndpointName(),
    Environment.MachineName,
    Dns.GetHostName(),
    context.LocalQueueAddress().ToString(),
    instanceQueueAddress != null ? instanceQueueAddress.Discriminator : null,
};

var metricsOptions = settings.Get<MetricsOptions>();

metricsOptions.RegisterObservers(
    register: probeContext =>
    {
        RegisterProbes(probeContext, labelValues);
    });

Labels should be chosen thoughtfully, since each unique combination of label key-value pairs creates a new time series, which can dramatically increase the amount of data stored. The labels used here are for demonstration purposes only.

During the registration the following steps are required:

  • Map metric names
  • Register observer callbacks
  • Create summaries and counters with corresponding labels
  • Invoke the summaries and counters in the observer callback

foreach (var duration in context.Durations)
{
    if (!nameMapping.ContainsKey(duration.Name))
    {
        log.WarnFormat("Unsupported duration probe {0}", duration.Name);
        continue;
    }
    var prometheusName = nameMapping[duration.Name];
    var summary = Metrics.CreateSummary(prometheusName, duration.Description,
        new SummaryConfiguration
        {
            Objectives = new[]
            {
                new QuantileEpsilonPair(0.5, 0.05),
                new QuantileEpsilonPair(0.9, 0.01),
                new QuantileEpsilonPair(0.99, 0.001)
            },
            LabelNames = Labels
        });
    duration.Register((ref DurationEvent @event) => summary.Labels(labelValues).Observe(@event.Duration.TotalSeconds));
}

foreach (var signal in context.Signals)
{
    if (!nameMapping.ContainsKey(signal.Name))
    {
        log.WarnFormat("Unsupported signal probe {0}", signal.Name);
        continue;
    }
    var prometheusName = nameMapping[signal.Name];
    var counter = Metrics.CreateCounter(prometheusName, signal.Description, Labels);
    signal.Register((ref SignalEvent @event) => counter.Labels(labelValues).Inc());
}
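
The Labels array passed as label names in the code above is not shown in this walkthrough. Prometheus matches label values to label names by position, so both arrays need the same length and order. The following is a minimal sketch of one way to keep them aligned. The PrometheusLabels helper and the label names are hypothetical, and replacing a missing discriminator with an empty string is a defensive assumption rather than something the sample is known to do.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of the sample.
public static class PrometheusLabels
{
    // Assumed label names; the order must match the order of the values below.
    public static readonly string[] Names =
    {
        "endpoint",
        "machinename",
        "hostname",
        "endpoint_queue",
        "endpoint_discriminator"
    };

    // Builds the label values in the same order as Names.
    public static string[] Values(string endpointName, string queueAddress, string discriminator)
    {
        return new[]
        {
            endpointName,
            Environment.MachineName,
            Dns.GetHostName(),
            queueAddress,
            // Assumption: use an empty value when the endpoint has no instance discriminator.
            discriminator ?? string.Empty
        };
    }
}
```

With a helper like this, the names would be passed when creating a counter or summary and the values when recording an observation.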

Prometheus

Prometheus needs to be configured to pull data from the endpoint. For more information on how to set up Prometheus, refer to the getting started guide.

Guided configuration

Copy the following files into the root folder of the Prometheus installation.

Overwrite the existing prometheus.yml in the Prometheus demo installation, or proceed with the manual configuration if desired.

Manual configuration

Add a target

Edit prometheus.yml and add a new scrape target under scrape_configs, similar to:

  - job_name: 'nservicebus'

    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:3030']

Define rules

Queries can be expensive operations. Prometheus allows defining pre-calculated queries by configuring rules that calculate rates based on the counters.

groups:
- name: NServiceBus
  rules:
  - record: nservicebus_success_total:avg_rate5m
    expr: avg(rate(nservicebus_success_total[5m]))
  - record: nservicebus_failure_total:avg_rate5m
    expr: avg(rate(nservicebus_failure_total[5m]))
  - record: nservicebus_fetched_total:avg_rate5m
    expr: avg(rate(nservicebus_fetched_total[5m]))

The pre-calculated query can then be used by referencing the recorded name:

nservicebus_success_total:avg_rate5m

For efficiency reasons the sample dashboard shown later requires three queries defined in a rules file. Create nservicebus.rules.txt in the root folder of the Prometheus installation and add the three rules as defined above.

To enable the rules edit prometheus.yml and add:

rule_files:
  - 'nservicebus.rules.txt'
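
Put together, a prometheus.yml covering the scrape target and the rules file might look like the following sketch. The global scrape_interval of 15s is an assumed value, not taken from the sample; the job name, target, and rules file name come from the fragments above.

```yaml
global:
  scrape_interval: 15s        # assumed default for jobs that do not override it

rule_files:
  - 'nservicebus.rules.txt'   # recording rules for the NServiceBus counters

scrape_configs:
  - job_name: 'nservicebus'
    scrape_interval: 5s       # per-job override
    static_configs:
      - targets: ['localhost:3030']   # metric server hosted inside the endpoint
```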

Show a graph

Start Prometheus and open http://localhost:9090 in a web browser.

NServiceBus pushes events for success, failure, and fetched. These events need to be converted to rates by a query:

avg(rate(nservicebus_success_total[5m])) 

Prometheus graphs based on query
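
A few further queries may be useful when exploring the data. These are sketches that assume the metric names from the mapping shown earlier and the usual Prometheus exposition of summaries, where each summary also publishes _sum and _count series and one series per configured quantile. Each query is entered separately.

```promql
# Average processing time over the last 5 minutes
rate(nservicebus_processingtime_seconds_sum[5m]) / rate(nservicebus_processingtime_seconds_count[5m])

# Fraction of fetched messages that failed over the last 5 minutes
sum(rate(nservicebus_failure_total[5m])) / sum(rate(nservicebus_fetched_total[5m]))

# 99th percentile of processing time, one of the quantiles configured in the summary
nservicebus_processingtime_seconds{quantile="0.99"}
```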

Example configuration

Prometheus configuration files demonstrating the concepts from this sample:

Grafana

Grafana needs to be installed and configured to display the data available in Prometheus. For more information on how to install Grafana, refer to the Installation Guide.

Guided configuration

Execute setup.grafana.ps1 in a PowerShell session with elevated permissions and provide the username and password to authenticate with Grafana. This script will:

  • Create a data source called PrometheusNServiceBusDemo
  • Import the sample dashboard and connect it to the data source

Manual configuration

Datasource

Create a new data source called PrometheusNServiceBusDemo. For more information on how to define a Prometheus data source, refer to Using Prometheus in Grafana.

Dashboard

To graph the Prometheus rule nservicebus_failure_total:avg_rate5m, perform the following steps:

  • Add a new dashboard
  • Add a graph
  • Click its title to edit
  • Click the Metric tab

Grafana metric using Prometheus as datasource

Dashboard

Grafana dashboard with NServiceBus metrics

The sample includes an export of the Grafana dashboard, which can be imported as a reference.

