Although messaging systems work best with small messages, some scenarios require sending binary large object (BLOB) data along with a message, a pattern also known as Claim Check. For this purpose, NServiceBus has a Data Bus feature that overcomes the message size limits imposed by the underlying transport.
How it works
Instead of serializing the payload along with the rest of the message, the Data Bus approach stores the payload in a separate location that both the sending and receiving parties can access, then puts a reference to that location in the message.
If the location is unavailable when sending, the send operation fails. If the location is unavailable when a message is received, the receive operation fails as well, which triggers the standard NServiceBus retry behavior and may result in the message being moved to the error queue if the error cannot be resolved.
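The flow described above can be illustrated without NServiceBus at all. The following is a minimal sketch of the claim check idea, assuming a shared directory as the payload location; `SharedBlobStore`, `MessageWithReference`, and `ClaimCheckDemo` are hypothetical names used only for illustration and are not part of the NServiceBus API.

```csharp
using System;
using System.IO;

// Hypothetical message shape: the large payload is replaced by a key
// that points to where the payload is stored.
public record MessageWithReference(string SomeProperty, string BlobKey);

// Hypothetical payload store backed by a directory that both the
// sender and the receiver can reach.
public class SharedBlobStore
{
    readonly string root;

    public SharedBlobStore(string root)
    {
        this.root = root;
        Directory.CreateDirectory(root);
    }

    // Stores the payload and returns the key to put in the message.
    public string Put(byte[] payload)
    {
        var key = Guid.NewGuid().ToString("N");
        File.WriteAllBytes(Path.Combine(root, key), payload);
        return key;
    }

    // Throws an IOException if the payload or the location is missing.
    public byte[] Get(string key) => File.ReadAllBytes(Path.Combine(root, key));
}

public static class ClaimCheckDemo
{
    // Sender side: if storing the payload throws, no message is produced,
    // which mirrors the send operation failing.
    public static MessageWithReference Send(SharedBlobStore store, string someProperty, byte[] payload)
    {
        var key = store.Put(payload);
        return new MessageWithReference(someProperty, key);
    }

    // Receiver side: an exception here mirrors the receive operation failing,
    // which in NServiceBus would trigger the retry behavior.
    public static byte[] Receive(SharedBlobStore store, MessageWithReference message)
    {
        return store.Get(message.BlobKey);
    }
}
```

Only the small `MessageWithReference` would travel over the transport in this sketch; the payload itself never passes through the queue.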
Transport message size limits
The Data Bus can be used to send messages that exceed the transport's message size limit. That limit is determined by the underlying queuing or storage technology.
| Transport | Maximum size |
|---|---|
| Amazon SQS | 256 KB |
| Amazon SQS (using S3) | 2 GB |
| Azure Storage Queues | 64 KB |
| Azure Service Bus (Standard tier) | 256 KB |
| Azure Service Bus (Premium tier) | 100 MB |
| RabbitMQ | Configured by max_message_size |
| SQL Server | No limit |
| Learning | No limit |
| MSMQ | 4 MB |
Enabling the data bus
See the individual data bus implementations for details on enabling and configuring the data bus.
Cleanup
By default, BLOBs are stored with no set expiration. If messages have a time to be received set, the data bus will pass this along to the data bus storage implementation.
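As an illustration, a time to be received can be declared on the message type with the `TimeToBeReceived` attribute. This is a sketch only: the ten-minute value is arbitrary, and whether and how the stored BLOB actually expires depends on the data bus storage implementation in use.

```csharp
using NServiceBus;

// The ten-minute value is an arbitrary example. The data bus passes it to
// the storage implementation; what happens next is implementation-specific.
[TimeToBeReceived("00:10:00")]
public class MessageWithLargePayload
{
    public string SomeProperty { get; set; }
    public DataBusProperty<byte[]> LargeBlob { get; set; }
}
```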
Specifying data bus properties
There are two ways to specify the message properties to be sent using the data bus:
- Using the DataBusProperty<T> type
- Message conventions
Using DataBusProperty<T>
Set the type of the property to be sent over the data bus as DataBusProperty<T>:
public class MessageWithLargePayload
{
    public string SomeProperty { get; set; }
    public DataBusProperty<byte[]> LargeBlob { get; set; }
}
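The following sketch shows how such a message might be sent and handled. It assumes that `MessageWithLargePayload` is recognized as a message type (through a marker interface such as `ICommand` or through conventions), that routing for it is configured, and that the data bus is enabled on both endpoints. `LargePayloadSender` and the handler class name are hypothetical.

```csharp
using System;
using System.Threading.Tasks;
using NServiceBus;

public static class LargePayloadSender
{
    public static Task SendLargePayload(IMessageSession session)
    {
        var message = new MessageWithLargePayload
        {
            // Travels in the message body as usual.
            SomeProperty = "Sent with the message body",
            // Wrapped value is stored by the data bus; only a reference
            // to it is placed in the message.
            LargeBlob = new DataBusProperty<byte[]>(new byte[5 * 1024 * 1024])
        };
        return session.Send(message);
    }
}

public class MessageWithLargePayloadHandler : IHandleMessages<MessageWithLargePayload>
{
    public Task Handle(MessageWithLargePayload message, IMessageHandlerContext context)
    {
        // By the time the handler runs, the payload has been loaded
        // from the data bus location and is available through Value.
        byte[] blob = message.LargeBlob.Value;
        Console.WriteLine($"Received {blob.Length} bytes");
        return Task.CompletedTask;
    }
}
```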
Using message conventions
NServiceBus also supports defining data bus properties by convention. This allows data properties to be sent using the data bus without using DataBusProperty<T>, which removes the need for the message types to depend on NServiceBus.
In the configuration of the endpoint include:
var conventions = endpointConfiguration.Conventions();
conventions.DefiningDataBusPropertiesAs(property =>
{
    return property.Name.EndsWith("DataBus");
});
Set the type of the property as byte[]:
public class MessageWithLargePayload
{
    public string SomeProperty { get; set; }
    public byte[] LargeBlobDataBus { get; set; }
}
Serialization
To configure the data bus, a serializer must be chosen. The recommended serializer is SystemJsonDataBusSerializer, which is built into NServiceBus and uses the System.Text.Json serializer.
endpointConfiguration.UseDataBus<FileShareDataBus, SystemJsonDataBusSerializer>();
Additional deserializers
Additional deserializers can be added when configuring the data bus. A deserializer is selected based on the data bus content-type header of the message. The additional deserializers are also tried when the main serializer fails to deserialize a message.
endpointConfiguration.UseDataBus<FileShareDataBus, SystemJsonDataBusSerializer>()
    .AddDeserializer<BsonDataBusSerializer>();
Implementing custom serializers
To override the data bus property serializer, create a class that implements IDataBusSerializer and add it to the dependency injection container when configuring the data bus. The custom serializer must be available to both the sending and the receiving endpoints.
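A minimal sketch of such a class follows. It assumes the interface shape found in NServiceBus 8 (a `ContentType` property plus `Serialize` and `Deserialize` methods in the `NServiceBus.DataBus` namespace); check the installed version before relying on it. The class name `CustomJsonDataBusSerializer` and the content type value are hypothetical, and System.Text.Json is used only to keep the example short.

```csharp
using System;
using System.IO;
using System.Text.Json;
using NServiceBus.DataBus;

// Hypothetical custom serializer; assumes the NServiceBus 8 interface shape.
public class CustomJsonDataBusSerializer : IDataBusSerializer
{
    // Written to the data bus content-type header so that receivers
    // can pick a matching deserializer.
    public string ContentType => "application/json";

    public void Serialize(object databusProperty, Stream stream)
    {
        // Writes to the provided stream without disposing it.
        JsonSerializer.Serialize(stream, databusProperty, databusProperty.GetType());
    }

    public object Deserialize(Type propertyType, Stream stream)
    {
        // Reads from the provided stream without disposing it and
        // deserializes into the requested property type.
        return JsonSerializer.Deserialize(stream, propertyType);
    }
}
```

Neither method wraps the stream in a `using` statement, so the stream stays open for the caller.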
Implementations of IDataBusSerializer should not close Stream instances that NServiceBus provides. NServiceBus manages the lifecycle of these Stream instances and may attempt to manipulate them after the custom serializer has been called.
[...] Type property provided to the Deserialize method.
Data bus attachments cleanup
Each data bus implementation behaves differently with regard to cleaning up the physical attachments used to transfer data properties.
Why attachments are not removed by default
Automatically removing these attachments can cause problems in many situations. For example:
- The supported data bus implementations do not participate in distributed transactions. If the message handler throws an exception and the transaction rolls back, the delete operation on the attachment cannot be rolled back. When the message is retried, the attachment is no longer present, which causes further failures.
- The message can be deferred so that the file is processed later. Removing the file after deferring the message results in a message without the corresponding file.
- Functional requirements might require the message to remain available for a longer duration.
- If the data bus feature is used when publishing an event to multiple subscribers, neither the publisher nor any specific subscribing endpoint can determine when all subscribers have successfully processed the message, so neither can know when the file can safely be cleaned up.
- If message processing fails, it will be handled by the recoverability feature. This message can then be retried some period after that failure. The data bus files need to exist for that message to be re-processed correctly.
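Given these constraints, one possible approach is to remove attachments out of band, only after they are old enough that no in-flight, deferred, or failed message can still refer to them. The following is a sketch under stated assumptions: attachments are plain files under a single root directory, and the retention period is chosen by the operator. `AttachmentCleanup` is a hypothetical helper, not part of NServiceBus.

```csharp
using System;
using System.IO;

// Hypothetical helper for a file-based attachment location.
public static class AttachmentCleanup
{
    // Deletes files last written before (utcNow - retention)
    // and returns how many files were removed.
    public static int DeleteOlderThan(string rootPath, TimeSpan retention, DateTime utcNow)
    {
        var cutoff = utcNow - retention;
        var deleted = 0;

        // GetFiles materializes the list first, so deleting
        // while looping does not disturb the enumeration.
        foreach (var file in Directory.GetFiles(rootPath, "*", SearchOption.AllDirectories))
        {
            if (File.GetLastWriteTimeUtc(file) < cutoff)
            {
                File.Delete(file);
                deleted++;
            }
        }

        return deleted;
    }
}
```

The retention period should comfortably exceed the longest deferral, retry, and audit window that applies to the messages involved.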
Alternatives
- Use a different transport or a different tier (e.g. Azure Service Bus Premium instead of Standard).
- Use message body compression, which works well on text-based payloads like XML and JSON or any payload (text or binary) that contains repetitive data.
  - The message mutator sample demonstrates message body compression.
- Use stream-based properties.
  - The sample showing how to handle large stream properties via the pipeline demonstrates a purely stream-based approach (instead of loading full payloads into memory).
- Use a more efficient serializer, such as a binary serializer.
  - A custom serializer can usually be implemented in only a few lines of code.
  - Some binary serializers are maintained by the community.
- Use NServiceBus.Attachments for unbounded binary payloads. The package is similar to the Data Bus but has some differences:
  - Read on demand: Attachments are only retrieved when read by a consumer.
  - Async enumeration: The package supports processing all data items using an IAsyncEnumerable.
  - No serialization: The serializer is not used, which may result in a significant reduction in memory usage.
  - Direct stream access: This makes the package more suitable for binary large objects (BLOBs), since stream contents do not necessarily have to be loaded into memory before storing them or when retrieving them.
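To illustrate the body compression alternative listed above, the following sketch compresses and decompresses a byte array with GZip from the .NET base class library. It shows only the compression step; wiring it into the message pipeline (for example through a message mutator, as in the sample mentioned above) is not shown. `BodyCompression` is a hypothetical helper name.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical helper showing GZip compression of a message body.
public static class BodyCompression
{
    public static byte[] Compress(byte[] body)
    {
        using var output = new MemoryStream();
        // Disposing the GZipStream flushes the remaining compressed data
        // into the output stream before it is read back.
        using (var gzip = new GZipStream(output, CompressionLevel.Optimal))
        {
            gzip.Write(body, 0, body.Length);
        }
        return output.ToArray();
    }

    public static byte[] Decompress(byte[] compressed)
    {
        using var input = new MemoryStream(compressed);
        using var gzip = new GZipStream(input, CompressionMode.Decompress);
        using var output = new MemoryStream();
        gzip.CopyTo(output);
        return output.ToArray();
    }
}
```

Compression helps most with repetitive data. Payloads that are already compressed, such as many image and video formats, are unlikely to shrink much.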
Other considerations
Monitoring and reliability
The storage location for data bus blobs is critical to the operation of endpoints. As such, it should be as reliable as other infrastructure such as the transport or persistence. It should also be monitored for errors and actively maintained. Since messages cannot be sent or received when the storage location is unavailable, it may be necessary to stop endpoints while maintenance tasks are performed.
Auditing
The data stored in data bus blobs may be considered part of an audit record. In these cases, data bus blobs should be archived alongside messages for as long as the audit record is required.