SQL Server transport in Azure SQL

Prior to NServiceBus.SqlServer version 6.0.1, the SQL Server transport used clustered indexes based on an identity column (sequence number). This structure was prone to high contention that caused severe issues in high-throughput scenarios, especially on Azure SQL. The index migration guide describes the details of upgrading to the new table schema.

Using SQL Server with Azure SQL allows building messaging systems that provide exactly-once message processing guarantees. The same instance of Azure SQL is used both as an application data store and as messaging infrastructure.

This article discusses throughput characteristics of the SQL Server transport in Azure SQL. Numbers presented here are rough estimates of what can be expected when running on Azure SQL. All measurements were made on a Mananaged Instance database running in vCore General Purpose service tier.

Testing methodology

A small set of stress tests has been done to estimate the maximum throughput on a single queue. It yielded the following results:

vCoremsg/s
8600
16800

While useful, these results are not representative of most real-life scenarios. In most deployments there are at least tens or even hundreds of endpoints, each processing messages with much smaller throughput.

A set of load tests were designed to measure the CPU usage while processing messages coming at a constant rate. Messages were processed by a chain of 15 endpoints. Each endpoint in the chain was forwarding messages to the next one.

Here are the CPU usage values:

  • 2 vCPU
msgs/sresource %
2530% IO, 35% CPU
5065% IO, 65% CPU
6080% IO, 80% CPU
  • 4 vCPU
msgs/sresource %
2515% IO, 15% CPU
5035% IO, 35% CPU
7550% IO, 60% CPU
9060% IO, 85% CPU
  • 8 vCPU
msgs/sresource %
258% IO, 8% CPU
5017% IO, 17% CPU
7525% IO, 30% CPU
10032% IO, 50% CPU
12540% IO, 80% CPU
  • 16 vCPU
msgs/sresource %
259% IO, 4% CPU
5017% IO, 9% CPU
7525% IO, 16% CPU
10033% IO, 25% CPU
12540-50% IO, 35-45% CPU

Discussion

The total system throughput (for 15 endpoints in the test) for different per-endpoint throughputs are:

single endpoint msg/stotal msg/s
25375
50750
60900
751125
1001500
1251875

In these scenarios the database is used both as a transport and as a data store for the application state, so it is recommended to assume that the CPU usage of the transport should not exceed 35% on average. That means that for a system consisting of 15 endpoints, the recommended vCPU count as a function of total throughput is:

total msg/svCores
0-3752
375-7504
750-11258
1125-150016
The exact values for a production system may vary but the table above should provide a basis for estimations. It is worth noting that the SQL Server transport scales better with number of endpoints than with throughput of each endpoint. That means that higher total throughput can be achieved with higher number of endpoints and lower per-endpoint throughput. In an extreme case, a 16 vCPU database can handle only around 800 messages with a single queue and well over 2000 messages when running with 15 queues.

When designing Software-as-a-Service systems, it is recommended to use separate databases and endpoint sets for groups of tenants rather than for vertical slices of a business flow. With this approach each database would support relatively high number (whole business flow end-to-end) of low-throughput endpoints (a portion of customer base) as opposed to low number of high-throughput endpoints.

When using multiple instances of SQL Azure consider connecting them with NServiceBus.Router. In order to ensure exactly-once message processing semantics, consider elastic transactions in the message router.


Last modified