Azure Cosmos DB Persistence

Target Version: NServiceBus 9.x

The Azure Cosmos DB persister uses the Azure Cosmos DB NoSQL database service for storage.

Persistence at a glance

For a description of each feature, see the persistence at a glance legend.

Feature                 | Details
Supported storage types | Sagas, Outbox
Transactions            | Using TransactionalBatch, with caveats
Concurrency control     | Optimistic concurrency, optional pessimistic concurrency
Scripted deployment     | Not supported
Installers              | Container is created by installers.

Usage

Add a NuGet package reference to NServiceBus.Persistence.CosmosDB. Configure the endpoint to use the persister through the following configuration API:

endpointConfiguration.UsePersistence<CosmosPersistence>()
    .CosmosClient(new CosmosClient("ConnectionString"));

Token credentials

Using a TokenCredential enables Microsoft Entra ID authentication, such as managed identities for Azure resources, instead of requiring a shared secret in the connection string.

A TokenCredential can be provided by using the corresponding CosmosClient constructor overload when creating the client passed to the persister.
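For example, a minimal sketch using DefaultAzureCredential (this assumes the Azure.Identity package is referenced; the account endpoint shown is a placeholder):

var credential = new DefaultAzureCredential();
var cosmosClient = new CosmosClient(
    accountEndpoint: "https://<account>.documents.azure.com:443/", // placeholder endpoint
    tokenCredential: credential);

endpointConfiguration.UsePersistence<CosmosPersistence>()
    .CosmosClient(cosmosClient);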

Customizing the database used

By default, the persister stores records in a database named NServiceBus. This can be overridden using the following configuration API:

endpointConfiguration.UsePersistence<CosmosPersistence>()
    .CosmosClient(new CosmosClient("ConnectionString"))
    .DatabaseName("DatabaseName");

Customizing the container used

The container that is used by default for all incoming messages is specified via the DefaultContainer(..) configuration API:

endpointConfiguration.UsePersistence<CosmosPersistence>()
    .CosmosClient(new CosmosClient("ConnectionString"))
    .DefaultContainer(
        containerName: "ContainerName",
        partitionKeyPath: "/partition/key/path");

Added in version 3.2.1: By default, message container extractors cannot override the configured default container. To allow extractors to override the default container, enable the EnableContainerFromMessageExtractor flag:

config.UsePersistence<CosmosPersistence>()
    .EnableContainerFromMessageExtractor();

When this flag is enabled and multiple extractors are configured, the last extractor in the pipeline determines the final container. For example, if both a Header Extractor (physical stage) and a Message Extractor (logical stage) are configured, the Message Extractor takes precedence.


When installers are enabled, this (default) container will be created if it doesn't exist. To opt-out of creating the default container, either disable the installers or use DisableContainerCreation():

endpointConfiguration.UsePersistence<CosmosPersistence>()
    .CosmosClient(new CosmosClient("ConnectionString"))
    .DefaultContainer(
        containerName: "ContainerName",
        partitionKeyPath: "/partition/key/path")
    .DisableContainerCreation();

Any other containers that are resolved by extracting partition information from incoming messages need to be manually created in Azure.

Customizing the CosmosClient provider

When the CosmosClient is configured and used via dependency injection, a custom provider can be implemented:

class CustomCosmosClientProvider
    : IProvideCosmosClient
{
    // get fully configured via DI
    public CustomCosmosClientProvider(CosmosClient cosmosClient)
    {
        Client = cosmosClient;
    }
    public CosmosClient Client { get; }
}

and registered on the container:

endpointConfiguration.RegisterComponents(c => c.AddTransient<IProvideCosmosClient, CustomCosmosClientProvider>());
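As a sketch, assuming the endpoint owns the client registration (the connection string is a placeholder), the client and the provider can also be registered together:

endpointConfiguration.RegisterComponents(services =>
{
    // hypothetical registration; the CosmosClient may instead be registered by the host
    services.AddSingleton(new CosmosClient("ConnectionString"));
    services.AddTransient<IProvideCosmosClient, CustomCosmosClientProvider>();
});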

Capacity planning using request units (RU)

Understanding Request Units (RUs) is essential for effective capacity planning in Azure Cosmos DB. RUs represent the cost of database operations in terms of system resources. Knowing how your workload consumes them helps you avoid throttling, control costs, and size your setup appropriately, especially when using Provisioned Throughput or Serverless accounts.

Using the Microsoft Cosmos DB Capacity Planner

Microsoft provides a Cosmos DB capacity calculator that can be used to model the throughput costs of a solution. The calculator uses several parameters, but only the following are directly affected by the Azure Cosmos DB persistence.

Capacity Calculator Parameter | Persistence Operation | Cosmos DB Operation
Point reads | Logical/physical outbox read, Outbox partition key fallback read, Saga read | ReadItemStreamAsync
Creates | New outbox record, New saga record | CreateItemStream
Updates | Saga update, Saga acquire lease, Saga release lock, Outbox dispatched, Outbox delete (updates TTL) | ReplaceItemStream, UpsertItemStream, PatchItemStreamAsync, PatchItem
Deletes | Saga complete, Outbox TTL background cleanup | DeleteItem
Queries | Saga migration mode | GetItemQueryStreamIterator

Document size also affects RU usage: as the size of an item increases, the number of RUs consumed to read or write it also increases. The table below provides an estimate of the persistence cost that should be considered per message when modeling throughput requirements.

Record Type | Estimated Size
Outbox | ~630 bytes + message body
Saga | ~300 bytes + saga data

The tables below give an indication of which Cosmos DB operations occur for every processed message in different NServiceBus endpoint configurations. This can be used with the Cosmos DB Capacity Planner, along with other factors that affect pricing (such as the selected Cosmos DB API, number of regions, etc.) and the total message throughput, to produce an estimated RU capacity requirement.

No Outbox

Incoming message scenario | Point Reads | Creates | Updates | Deletes | Queries | Persistence Requirements*
No Saga | 0 | 0 | 0 | 0 | 0 | 0 bytes
Saga (new) | 1 | 1 | 0 | 0 | 0 | 300 bytes
Saga (new) + Migration Mode | 1 | 1 | 0 | 0 | 1 | 300 bytes
Saga (update) | 1 | 0 | 1 | 0 | 0 | 300 bytes
Saga (complete) | 1 | 0 | 0 | 1 | 0 | 300 bytes

With Outbox

Incoming message scenario | Point Reads | Creates | Updates | Deletes | Queries | Persister Requirements*
No Saga | 1 | 1 | 1 | 1 (delayed) | 0 | 630 bytes (1 msg sent)
No Saga + Partition Key Fallback Read | 2 | 1 | 1 | 1 (delayed) | 0 | 630 bytes (1 msg sent)
Saga (new) | 2 | 2 | 1 | 1 (delayed) | 0 | 630 + 300 = 930 bytes
Saga (update) | 2 | 1 | 2 | 1 (delayed) | 0 | 630 + 300 = 930 bytes
Saga (complete) | 2 | 1 | 1 | 2 (delayed) | 0 | 630 + 300 = 930 bytes
Saga + Pessimistic Locking (no contention) | 1 | 1-2 | 3-4 | 1-2 (delayed) | 0 | 630 + 360 = 990 bytes
Saga + Pessimistic Locking (3 retries) | 1 | 1-2 | 9-10 | 1-2 (delayed) | 0 | 630 + 360 = 990 bytes

*Persister requirements exclude message bodies and saga data and assume one handler sends one outgoing message.

Additional operations (conditional):

  • Multiple Partition Keys: Separate operations per partition key
  • More outgoing messages: +400 bytes overhead per additional message sent

Example

  • Outbox: Enabled
  • Sagas: Order saga (average 3 KB)
  • Locking: Optimistic (default)
  • Message rate: 500 messages/second peak
  • Each handler sends average 2 outgoing messages (1 KB each)

Calculator Inputs

Operation Type | Calculation | Result
Point Reads | 500 msg/sec × 2 reads | 1,000/sec
Creates | 500 msg/sec × 1 create (outbox) | 500/sec
Updates | 500 msg/sec × 2 updates (saga + outbox) | 1,000/sec
Deletes | 500 msg/sec avg over 24h (steady state) | 500/sec
Queries | 0 | 0/sec
Outbox record size | 200 bytes + (2 × 1000 bytes) | 2.2 KB
Saga size | 3000 bytes + 300 bytes metadata | 3.3 KB

Using Code

A more direct approach to RU capacity planning is to attach a Cosmos DB RequestHandler to a customized CosmosClient used by the NServiceBus endpoint in a development environment. The request handler can log every Cosmos DB request and response together with its associated RU charge. This makes it possible to measure exactly which operations are performed against the database for each message the endpoint processes, and what each operation costs. Those figures can then be multiplied by the estimated production throughput of the endpoint.

//...
var endpointConfiguration = new EndpointConfiguration("Name");

// attach a custom handler that logs the RU charge of every request
var builder = new CosmosClientBuilder(cosmosConnection);
builder.AddCustomHandlers(new LoggingHandler());
CosmosClient cosmosClient = builder.Build();

endpointConfiguration.UsePersistence<CosmosPersistence>()
    .CosmosClient(cosmosClient);
//...

class LoggingHandler : RequestHandler
{
    static readonly ILog logger = LogManager.GetLogger<LoggingHandler>();

    public override async Task<ResponseMessage> SendAsync(RequestMessage request, CancellationToken cancellationToken = default)
    {
        ResponseMessage response = await base.SendAsync(request, cancellationToken).ConfigureAwait(false);

        // the diagnostics JSON contains the operation name, e.g. ReadItemStreamAsync;
        // use it to map the Cosmos DB operation to the capacity planner using the table above
        CosmosDiagnostics diagnostics = response.Diagnostics;

        // the request charge header reports the RUs consumed by this operation
        string requestChargeRU = response.Headers["x-ms-request-charge"];
        logger.InfoFormat("{0} consumed {1} RU", diagnostics, requestChargeRU);

        if ((int)response.StatusCode == 429)
        {
            logger.Warn("Request throttled");
        }

        return response;
    }
}

Using Azure

Alternatively, Azure Cosmos DB diagnostic settings can be configured to route the diagnostic logs to an Azure Log Analytics workspace, where they can be queried for the same data used for RU capacity planning. This method is not recommended for live monitoring of RU usage, as diagnostic logs are typically delayed by a few minutes, and the cost and retention limits of Log Analytics can be a constraint.

For real time monitoring, the metrics pane in the Cosmos DB account can be used.

Provisioned throughput rate-limiting

When using provisioned throughput, the Cosmos DB service may rate-limit usage, resulting in "request rate too large" exceptions indicated by a 429 status code.

The Cosmos DB SDK provides a mechanism to automatically retry operations when rate-limiting occurs. Besides changing the provisioned RUs or switching to the serverless tier, these retry settings can be adjusted to help prevent messages from failing during spikes in message volume.

These settings may be set when initializing the CosmosClient via the CosmosClientOptions MaxRetryAttemptsOnRateLimitedRequests and MaxRetryWaitTimeOnRateLimitedRequests properties:

endpointConfiguration.UsePersistence<CosmosPersistence>()
    .CosmosClient(new CosmosClient("ConnectionString", new CosmosClientOptions
    {
        MaxRetryWaitTimeOnRateLimitedRequests = TimeSpan.FromSeconds(30),
        MaxRetryAttemptsOnRateLimitedRequests = 9
    }));

They may also be set when using a CosmosClientBuilder via the WithThrottlingRetryOptions method:

var cosmosClientBuilder = new CosmosClientBuilder("ConnectionString")
   .WithThrottlingRetryOptions(
       maxRetryWaitTimeOnThrottledRequests: TimeSpan.FromSeconds(30),
       maxRetryAttemptsOnThrottledRequests: 9
   );

endpointConfiguration.UsePersistence<CosmosPersistence>()
    .CosmosClient(cosmosClientBuilder.Build());

Transactions

The Cosmos DB persister supports using the Cosmos DB transactional batch API. However, Cosmos DB only allows operations to be batched if all operations are performed within the same logical partition key. This is due to the distributed nature of the Cosmos DB service, which does not support distributed transactions.

The transactions documentation provides additional details on how to configure NServiceBus to resolve the incoming message to a specific partition key to take advantage of this Cosmos DB feature.
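As a rough sketch, assuming the transaction information API described in the transactions documentation, a partition key could be extracted from an incoming message header (the Sales.CustomerId header name is only an illustration):

var persistence = endpointConfiguration.UsePersistence<CosmosPersistence>();
persistence.CosmosClient(new CosmosClient("ConnectionString"));

// map each incoming message to a logical partition key so that the saga and outbox
// operations for that message can share a single TransactionalBatch
var transactionInformation = persistence.TransactionInformation();
transactionInformation.ExtractPartitionKeyFromHeader("Sales.CustomerId");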

Outbox

Storage format

Version 3.2.1 and up

A default synthetic partition key in the format {endpointName}-{messageId} is used for all incoming messages, unless explicitly overridden at runtime.

To support backward compatibility of control messages during migration, the persistence includes a fallback mechanism. When enabled (default), and if a record is not found using the synthetic key format, the system falls back to the legacy {messageId} format. Since the fallback mechanism involves an additional read operation on the Outbox container, it is recommended to turn it off once all legacy records have expired.

endpointConfiguration
    .EnableOutbox()
    .DisableReadFallback();

Version 3.1 and under

In these versions, outbox records use the legacy {messageId} partition key format, without the endpoint name prefix. This is the format targeted by the read fallback described above.

Outbox cleanup

When the outbox is enabled, the deduplication data is kept for seven days by default. To customize this time frame, use the following API:

var outbox = endpointConfiguration.EnableOutbox();
outbox.TimeToKeepOutboxDeduplicationData(TimeSpan.FromDays(7));

Outbox cleanup depends on the Cosmos DB time-to-live feature. If expired outbox records are not removed, the most likely cause is a misconfigured container with time-to-live disabled. Refer to the Cosmos DB documentation to configure the container correctly.
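As a minimal sketch using the Cosmos DB SDK (the database and container names are placeholders), time-to-live can be enabled on an existing container by setting DefaultTimeToLive, where -1 turns TTL on without a default expiration:

Container container = cosmosClient.GetContainer("DatabaseName", "ContainerName");

// read the current container definition, enable TTL, and write it back;
// with -1, items expire only when they carry a per-item ttl value
ContainerProperties properties = (await container.ReadContainerAsync()).Resource;
properties.DefaultTimeToLive = -1;
await container.ReplaceContainerAsync(properties);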
