The Azure Cosmos DB persister uses the Azure Cosmos DB NoSQL database service for storage.
It is important to read and understand partitioning in Azure Cosmos DB before using this persistence.
Persistence at a glance
For a description of each feature, see the persistence at a glance legend.
| Feature | |
|---|---|
| Supported storage types | Sagas, Outbox |
| Transactions | Using TransactionalBatch, with caveats |
| Concurrency control | Optimistic concurrency, optional pessimistic concurrency |
| Scripted deployment | Not supported |
| Installers | Container is created by installers. |
The Outbox feature requires partition planning.
Usage
Add a NuGet package reference to NServiceBus.Persistence.CosmosDB. Configure the endpoint to use the persister through the following configuration API:
endpointConfiguration.UsePersistence<CosmosPersistence>()
.CosmosClient(new CosmosClient("ConnectionString"));
Token credentials
Using a TokenCredential enables Microsoft Entra ID authentication, such as managed identities for Azure resources, instead of requiring a shared secret in the connection string.
A TokenCredential can be provided by using the corresponding CosmosClient constructor overload when creating the client passed to the persister.
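As a minimal sketch, assuming the Azure.Identity package is referenced; the account endpoint below is a placeholder for the actual Cosmos DB account URI:
var credential = new DefaultAzureCredential();
var cosmosClient = new CosmosClient(
    accountEndpoint: "https://my-cosmos-account.documents.azure.com:443/",
    tokenCredential: credential);
endpointConfiguration.UsePersistence<CosmosPersistence>()
    .CosmosClient(cosmosClient);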
Customizing the database used
By default, the persister stores records in a database named NServiceBus. This can be overridden using the following configuration API:
endpointConfiguration.UsePersistence<CosmosPersistence>()
.CosmosClient(new CosmosClient("ConnectionString"))
.DatabaseName("DatabaseName");
Customizing the container used
The container that is used by default for all incoming messages is specified via the DefaultContainer configuration API:
endpointConfiguration.UsePersistence<CosmosPersistence>()
.CosmosClient(new CosmosClient("ConnectionString"))
.DefaultContainer(
containerName: "ContainerName",
partitionKeyPath: "/partition/key/path");
Added in version 3.2.1: By default, message container extractors cannot override the configured default container. To allow extractors to override the default container, enable the EnableContainerFromMessageExtractor flag:
endpointConfiguration.UsePersistence<CosmosPersistence>()
    .EnableContainerFromMessageExtractor();
When this flag is enabled and multiple extractors are configured, the last extractor in the pipeline determines the final container. For example, if both a Header Extractor (physical stage) and a Message Extractor (logical stage) are configured, the Message Extractor takes precedence.
If an extractor fails to retrieve container information, the system falls back to the next available source in this order: Message Extractor → Header Extractor → configured default container. If no default container is configured and all extractors fail, an exception is thrown.
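As an illustration, extractors are registered through the transaction information API; the OrderAccepted message type, the header key, and the container names below are hypothetical:
var persistence = endpointConfiguration.UsePersistence<CosmosPersistence>();
var transactionInformation = persistence.TransactionInformation();

// Physical stage: derive the container from an incoming header
transactionInformation.ExtractContainerInformationFromHeader("destination-container", headerValue =>
    new ContainerInformation(headerValue, new PartitionKeyPath("/partition/key/path")));

// Logical stage: derive the container from the deserialized message;
// with EnableContainerFromMessageExtractor, this extractor takes precedence
transactionInformation.ExtractContainerInformationFromMessage<OrderAccepted>(message =>
    new ContainerInformation("OrdersContainer", new PartitionKeyPath("/partition/key/path")));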
For users upgrading from version 3.1 or older: Enabling this flag may require you to migrate data from the default container to the container configured in the message extractor. This action prevents message duplication.
When installers are enabled, this (default) container will be created if it doesn't exist. To opt out of creating the default container, either disable the installers or use DisableContainerCreation():
endpointConfiguration.UsePersistence<CosmosPersistence>()
.CosmosClient(new CosmosClient("ConnectionString"))
.DefaultContainer(
containerName: "ContainerName",
partitionKeyPath: "/partition/key/path")
.DisableContainerCreation();
Any other containers that are resolved by extracting partition information from incoming messages must be created manually in Azure, for example with the Cosmos DB SDK as sketched below.
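A minimal sketch of creating such a container up front; the database and container names are placeholders:
var database = cosmosClient.GetDatabase("NServiceBus");
await database.CreateContainerIfNotExistsAsync(
    new ContainerProperties(id: "OrdersContainer", partitionKeyPath: "/partition/key/path"));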
The transactions documentation details additional options on how to configure NServiceBus to specify the container using the incoming message headers or contents.
Customizing the CosmosClient provider
When the CosmosClient is configured and used via dependency injection, a custom provider can be implemented:
class CustomCosmosClientProvider
: IProvideCosmosClient
{
// get fully configured via DI
public CustomCosmosClientProvider(CosmosClient cosmosClient)
{
Client = cosmosClient;
}
public CosmosClient Client { get; }
}
and registered on the container:
endpointConfiguration.RegisterComponents(c => c.AddTransient<IProvideCosmosClient, CustomCosmosClientProvider>());
Capacity planning using request units (RU)
Understanding Request Units (RUs) is essential for effective capacity planning in Azure Cosmos DB. RUs represent the cost of database operations in terms of system resources. Knowing how your workload consumes them helps you avoid throttling, control costs, and size your setup appropriately, especially when using Provisioned Throughput or Serverless accounts.
Using the Microsoft Cosmos DB Capacity Planner
Microsoft provides a Cosmos DB capacity calculator which can be used to model the throughput costs of a solution. It uses several parameters to calculate this, but only the following are directly affected by the Azure Cosmos DB persistence.
| Capacity Calculator Parameter | Persistence Operation | Cosmos DB Operation |
|---|---|---|
| Point reads | Logical/physical outbox read, Outbox partition key fallback read, Saga read | ReadItemStreamAsync |
| Creates | New outbox record, New saga record | CreateItemStream |
| Updates | Saga update, Saga acquire lease, Saga release lock, Outbox dispatched, Outbox delete (updates TTL) | ReplaceItemStream, UpsertItemStream, PatchItemStreamAsync, PatchItem |
| Deletes | Saga complete, Outbox TTL background cleanup | DeleteItem |
| Queries | Saga migration mode | GetItemQueryStreamIterator |
Document size also affects RU usage: as the size of an item increases, the number of RUs consumed to read or write it also increases. The table below provides an estimate of the persistence cost that should be considered per message when modeling throughput requirements.
| Record Type | Estimated Size |
|---|---|
| Outbox | ~630 bytes + message body |
| Saga | ~300 bytes + saga data |
The tables below give an indication of which Cosmos DB operations occur for every processed message under different NServiceBus endpoint configurations. This can be used with the Cosmos DB Capacity Planner, along with other factors that affect pricing (such as the selected Cosmos DB API, number of regions, etc.), and the total message throughput to produce an estimated RU capacity requirement.
No Outbox
| Incoming message scenario | Point Reads | Creates | Updates | Deletes | Queries | Persistence Requirements* |
|---|---|---|---|---|---|---|
| No Saga | 0 | 0 | 0 | 0 | 0 | 0 bytes |
| Saga (new) | 1 | 1 | 0 | 0 | 0 | 300 bytes |
| Saga (new) + Migration Mode | 1 | 1 | 0 | 0 | 1 | 300 bytes |
| Saga (update) | 1 | 0 | 1 | 0 | 0 | 300 bytes |
| Saga (complete) | 1 | 0 | 0 | 1 | 0 | 300 bytes |
With Outbox
| Incoming message scenario | Point Reads | Creates | Updates | Deletes | Queries | Persistence Requirements* |
|---|---|---|---|---|---|---|
| No Saga | 1 | 1 | 1 | 1 (delayed) | 0 | 630 bytes (1 msg sent) |
| No Saga + Partition Key Fallback Read | 2 | 1 | 1 | 1 (delayed) | 0 | 630 bytes (1 msg sent) |
| Saga (new) | 2 | 2 | 1 | 1 (delayed) | 0 | 630 + 300 = 930 bytes |
| Saga (update) | 2 | 1 | 2 | 1 (delayed) | 0 | 630 + 300 = 930 bytes |
| Saga (complete) | 2 | 1 | 1 | 2 (delayed) | 0 | 630 + 300 = 930 bytes |
| Saga + Pessimistic Locking (no contention) | 1 | 1-2 | 3-4 | 1-2 (delayed) | 0 | 630 + 360 = 990 bytes |
| Saga + Pessimistic Locking (3 retries) | 1 | 1-2 | 9-10 | 1-2 (delayed) | 0 | 630 + 360 = 990 bytes |
*Persister requirements exclude message bodies and saga data and assume one handler sends one outgoing message.
Additional operations (conditional):
- Multiple Partition Keys: Separate operations per partition key
- More outgoing messages: +400 bytes overhead per additional message sent
Example
- Outbox: Enabled
- Sagas: Order saga (average 3 KB)
- Locking: Optimistic (default)
- Message rate: 500 messages/second peak
- Each handler sends average 2 outgoing messages (1 KB each)
Calculator Inputs
| Operation Type | Calculation | Result |
|---|---|---|
| Point Reads | 500 msg/sec × 2 reads = | 1,000/sec |
| Creates | 500 msg/sec × 1 create (outbox) = | 500/sec |
| Updates | 500 msg/sec × 2 updates (saga + outbox) = | 1,000/sec |
| Deletes | 500 msg/sec avg over 24h (steady state) = | 500/sec |
| Queries | 0 | 0/sec |
| OutboxRecord size | 200 bytes + (2 × 1000 bytes) = | 2.2 KB |
| Saga size | 3000 bytes + 300 bytes metadata = | 3.3 KB |
Using Code
Another, more direct approach to RU capacity planning is to attach a Cosmos DB RequestHandler to a customized CosmosClient provider in an NServiceBus endpoint in a development environment. This request handler can log every Cosmos DB request and response along with its associated RU charge. In this way, it is possible to measure exactly which operations are performed on the Cosmos DB database for each message processed by that endpoint, and what each operation costs in RUs. This can then be multiplied by the estimated production throughput of the endpoint.
It is not recommended to monitor RU costs with the RequestHandler approach in production, as this could have performance implications.
//...
var endpointConfiguration = new EndpointConfiguration("Name");
var builder = new CosmosClientBuilder(cosmosConnection);
// logger is assumed to come from the application's logging infrastructure
builder.AddCustomHandlers(new LoggingHandler(logger));
CosmosClient cosmosClient = builder.Build();
//...
class LoggingHandler : RequestHandler
{
    readonly ILogger logger;

    public LoggingHandler(ILogger logger)
    {
        this.logger = logger;
    }

    public override async Task<ResponseMessage> SendAsync(RequestMessage request, CancellationToken cancellationToken = default)
    {
        ResponseMessage response = await base.SendAsync(request, cancellationToken).ConfigureAwait(false);
        CosmosDiagnostics diagnostics = response.Diagnostics;
        // The diagnostics JSON string contains the operation name, e.g. ReadItemStreamAsync.
        // Use it to map the Cosmos DB operation to the capacity planner using the table above.
        string requestChargeRU = response.Headers["x-ms-request-charge"];
        logger.LogInformation("Cosmos DB request consumed {RequestCharge} RU", requestChargeRU);
        if ((int)response.StatusCode == 429)
        {
            logger.LogWarning("Request throttled");
        }
        return response;
    }
}
Using Azure
Alternatively, the Azure Cosmos DB diagnostic settings can be configured to route the diagnostic logs to an Azure Log Analytics workspace, where they can be queried for the same data used for RU capacity planning. This method is not recommended for live monitoring of RU usage, as diagnostic logs are typically delayed by a few minutes, and the cost and retention limits of Log Analytics can be a constraint.
For real-time monitoring, use the metrics pane of the Cosmos DB account.
Provisioned throughput rate-limiting
When using provisioned throughput, the Cosmos DB service may rate-limit usage, resulting in "request rate too large" exceptions indicated by a 429 status code.
When using the Cosmos DB persister with the outbox enabled, "request rate too large" errors may result in handler re-execution and/or duplicate message dispatches depending on which operation is throttled.
Microsoft provides guidance on how to diagnose and troubleshoot request rate too large exceptions.
The Cosmos DB SDK provides a mechanism to automatically retry operations when rate-limiting occurs. Besides changing the provisioned RUs or switching to the serverless tier, these settings can be adjusted to help prevent messages from failing during spikes in message volume.
These settings may be set when initializing the CosmosClient via the CosmosClientOptions MaxRetryAttemptsOnRateLimitedRequests and MaxRetryWaitTimeOnRateLimitedRequests properties:
endpointConfiguration.UsePersistence<CosmosPersistence>()
.CosmosClient(new CosmosClient("ConnectionString", new CosmosClientOptions
{
MaxRetryWaitTimeOnRateLimitedRequests = TimeSpan.FromSeconds(30),
MaxRetryAttemptsOnRateLimitedRequests = 9
}));
They may also be set when using a CosmosClientBuilder via the WithThrottlingRetryOptions method:
var cosmosClientBuilder = new CosmosClientBuilder("ConnectionString")
.WithThrottlingRetryOptions(
maxRetryWaitTimeOnThrottledRequests: TimeSpan.FromSeconds(30),
maxRetryAttemptsOnThrottledRequests: 9
);
endpointConfiguration.UsePersistence<CosmosPersistence>()
.CosmosClient(cosmosClientBuilder.Build());
Transactions
The Cosmos DB persister supports using the Cosmos DB transactional batch API. However, Cosmos DB only allows operations to be batched if all operations are performed within the same logical partition key. This is due to the distributed nature of the Cosmos DB service, which does not support distributed transactions.
The transactions documentation provides additional details on how to configure NServiceBus to resolve the incoming message to a specific partition key to take advantage of this Cosmos DB feature.
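As an illustration only: once the incoming message has been mapped to a partition key, the persister's shared TransactionalBatch can be accessed from a handler via the synchronized storage session. The OrderAccepted message and the document shape below are hypothetical:
public class OrderAcceptedHandler : IHandleMessages<OrderAccepted>
{
    public Task Handle(OrderAccepted message, IMessageHandlerContext context)
    {
        var session = context.SynchronizedStorageSession.CosmosPersistenceSession();

        // The document must contain the container's partition key property so that
        // it lands in the same logical partition as the outbox record
        session.Batch.CreateItem(new
        {
            id = message.OrderId,
            partitionKey = message.OrderId
        });

        // The batch is committed atomically when the storage session completes
        return Task.CompletedTask;
    }
}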
Outbox
Storage format
Version 3.2.1 and up
Unless explicitly overridden at runtime, a default synthetic partition key in the format {endpointName}-{messageId} is used for all incoming messages.
The default partition key should be overridden whenever the message handler creates or updates business records in Cosmos DB. To guarantee atomicity, explicitly set the outbox partition key to match the partition key of the business record, as sketched below; this is a requirement for including both the business record and the outbox record in the same Cosmos DB transactional batch. Conversely, for simplicity, the default partition key can be used when a handler's logic does not involve persisting business data.
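A sketch of overriding the default synthetic partition key through the transaction information API; the OrderAccepted message and its OrderId property are hypothetical:
var persistence = endpointConfiguration.UsePersistence<CosmosPersistence>();
var transactionInformation = persistence.TransactionInformation();
transactionInformation.ExtractPartitionKeyFromMessage<OrderAccepted>(message =>
    new PartitionKey(message.OrderId));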
To support backward compatibility of control messages during migration, the persistence includes a fallback mechanism. When enabled (default), and if a record is not found using the synthetic key format, the system falls back to the legacy {messageId} format. Since the fallback mechanism involves an additional read operation on the Outbox container, it is recommended to turn it off once all legacy records have expired.
endpointConfiguration
.EnableOutbox()
.DisableReadFallback();
Since message identities are not unique across endpoints from a processing perspective, when overriding the default synthetic key, either separate endpoints into different containers or derive the partition key in a way that ensures message identities are unique to each processing endpoint.
Version 3.1 and under
Versions v3. and up: For control messages, a default partition key in the format {endpointName}-{messageId} is used. As a result, multiple logical endpoints can share the same database and container.
Versions v3. and under: For control messages, a default partition key in the format {messageId} is used; these outbox records are not separated by endpoint name. As a result, multiple logical endpoints cannot share the same database and container, since message identities are not unique across endpoints from a processing perspective. To avoid conflicts, either separate endpoints into different containers, override the partition key, or update to a version of the persistence that scopes partition keys by endpoint name.
Outbox cleanup
When the outbox is enabled, the deduplication data is kept for seven days by default. To customize this time frame, use the following API:
var outbox = endpointConfiguration.EnableOutbox();
outbox.TimeToKeepOutboxDeduplicationData(TimeSpan.FromDays(7));
Outbox cleanup depends on the Cosmos DB time-to-live feature. If expired outbox records are not removed, the cause is usually a container with time-to-live disabled. Refer to the Cosmos DB documentation to configure the container correctly; a sketch of enabling TTL with the SDK follows.
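A minimal sketch of enabling time-to-live on an existing container with the Cosmos DB SDK; the database and container names are placeholders:
var container = cosmosClient.GetContainer("NServiceBus", "ContainerName");
ContainerProperties properties = (await container.ReadContainerAsync()).Resource;

// -1 enables TTL on the container without expiring items by default;
// the persister then controls expiry per outbox record
properties.DefaultTimeToLive = -1;
await container.ReplaceContainerAsync(properties);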