Asynchronous Handlers

It is difficult to give generic advice on structuring asynchronous code. It is important to understand compute-bound vs. I/O-bound operations and avoid copying and pasting snippets without analyzing their benefits for a given business scenario. Don't assume; measure it.

Handlers and sagas are executed by threads from the thread pool. Depending on the transport implementation, either a worker thread pool thread or an I/O thread pool thread might be used. Message handlers and sagas typically issue I/O-bound work, such as sending or publishing messages, storing information in databases, and calling web services. In other cases, message handlers are used to schedule compute-bound work. To write efficient message handlers and sagas, it is crucial to understand the difference between those scenarios.

Thread pool

A thread pool is associated with a process and manages the execution of asynchronous callbacks on behalf of the application. Its primary purpose is to reduce the number of application threads and provide efficient thread management. Every thread pool manages a pool of threads designated to handle one class of workload: I/O-bound or compute-bound work.
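
The two classes of pool threads can be inspected at runtime. The following is a minimal sketch, not part of the original guidance; the class name ThreadPoolInfo and its helper methods are hypothetical, while ThreadPool.GetMinThreads and ThreadPool.GetMaxThreads are standard .NET APIs. The reported values depend on the runtime, the platform, and the number of cores, so no particular output is assumed.

```csharp
using System;
using System.Threading;

// Hypothetical helper for illustration; only the ThreadPool calls are real .NET APIs.
public static class ThreadPoolInfo
{
    // Returns the minimum number of worker and I/O (completion port) threads.
    public static (int Worker, int Io) GetMinimums()
    {
        ThreadPool.GetMinThreads(out var worker, out var io);
        return (worker, io);
    }

    // Returns the maximum number of worker and I/O (completion port) threads.
    public static (int Worker, int Io) GetMaximums()
    {
        ThreadPool.GetMaxThreads(out var worker, out var io);
        return (worker, io);
    }

    public static void Main()
    {
        var min = GetMinimums();
        var max = GetMaximums();
        Console.WriteLine($"Worker threads: min {min.Worker}, max {max.Worker}");
        Console.WriteLine($"I/O threads:    min {min.Io}, max {max.Io}");
    }
}
```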

Worker thread pool

Parallel and compute-bound blocking work happens on the worker thread pool. APIs such as Task.Run, Task.Factory.StartNew, and Parallel.For schedule work on the worker thread pool.

Whenever compute-bound work is scheduled, the worker thread pool will expand its set of worker threads as needed (ramp-up phase). Ramping up more worker threads is expensive, and the thread injection rate of the worker thread pool is limited.

Compute-bound recommendations:

Manual scheduling of compute-bound work to the worker thread pool is a top-level concern only. Use Task.Run or Task.Factory.StartNew as high up in the call hierarchy as possible (e.g., in the Handle methods of either a handler or saga).
Avoid those operations deeper in the call hierarchy.
Group compute-bound operations together as much as possible.
Make compute-bound operations coarse-grained instead of fine-grained.
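
The recommendations above can be sketched as follows. This is an illustrative example under stated assumptions, not a prescribed implementation: CoarseGrainedCompute, Square, and SumOfSquaresAsync are hypothetical names, and Square stands in for real compute-bound business logic. The whole batch is offloaded with a single Task.Run at the top of the call hierarchy instead of scheduling one task per item deeper down.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class CoarseGrainedCompute
{
    // Hypothetical compute-bound step; contains no Task.Run of its own.
    static long Square(int value) => (long)value * value;

    // Coarse-grained: one Task.Run groups all the compute-bound work,
    // rather than one scheduled task per element.
    public static Task<long> SumOfSquaresAsync(int[] values)
    {
        return Task.Run(() =>
        {
            long sum = 0;
            foreach (var value in values)
            {
                sum += Square(value);
            }
            return sum;
        });
    }

    public static async Task Main()
    {
        var values = Enumerable.Range(1, 1000).ToArray();
        var sum = await SumOfSquaresAsync(values);
        Console.WriteLine(sum);
    }
}
```

Keeping Task.Run in one visible place makes it easier to reason about how much work is pushed to the worker thread pool and to measure its effect.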

I/O-thread pool

I/O-bound work is scheduled on the I/O-thread pool. The I/O-bound thread pool has a fixed number of worker threads (usually equal to the number of cores), which can work concurrently on thousands of I/O-bound tasks. I/O-bound work under Windows uses I/O completion ports (IOCP) to get notifications when an I/O-bound operation is completed. IOCP enables efficient offloading of I/O-bound work from the user code to the kernel, driver, and hardware without blocking the user code until the I/O work is done. To achieve that, the user code registers notifications in the form of a callback. The callback occurs on an I/O thread which is a pool thread managed by the I/O system made available to the user code.

I/O-bound work typically takes longer to complete than compute-bound work. The I/O system is optimized to keep the thread count low and schedule all callbacks, thereby allowing the execution of interleaved user code on that one thread. Due to those optimizations, all work gets serialized, and there is minimal context switching as the OS scheduler owns the threads. In general, asynchronous code can handle bursty traffic much better because of the "always-on" nature of the IOCP.

Memory and allocations

Asynchronous code tends to use much less memory because the amount of memory saved by freeing up a thread in the worker thread pool dwarfs the amount of memory used for all the compiler-generated async structures combined.

Synchronous vs. asynchronous

If each request is examined in isolation, asynchronous code would be slightly slower than the corresponding synchronous version. There might be extra kernel transitions, task scheduling, etc., but the scalability more than compensates for this.

From a server perspective, if asynchronous code is compared to synchronous code by looking at one method or one request at a time, then synchronous might make more sense. But if asynchronous code is compared to parallelism — watching the server as a whole — asynchronous wins. Every worker thread that can be freed up on a server is worth freeing up. It reduces the amount of memory needed and frees up the CPU for compute-bound work while saturating the I/O system completely.
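
The scalability argument can be illustrated with a small experiment. The sketch below is an assumption-laden stand-in: Task.Delay is timer-based and only simulates the latency of an I/O-bound call (it does not use IOCP), and ManyConcurrentWaits and RunAsync are hypothetical names. It starts many simulated operations concurrently without dedicating a thread to each and reports the elapsed time, which should be measured rather than assumed.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class ManyConcurrentWaits
{
    // Starts `count` simulated I/O operations concurrently and
    // returns how long it took for all of them to complete.
    public static async Task<TimeSpan> RunAsync(int count, TimeSpan latency)
    {
        var stopwatch = Stopwatch.StartNew();

        // No thread is blocked while the delays are pending.
        var operations = Enumerable.Range(0, count).Select(_ => Task.Delay(latency));
        await Task.WhenAll(operations);

        return stopwatch.Elapsed;
    }

    public static async Task Main()
    {
        var elapsed = await RunAsync(1000, TimeSpan.FromMilliseconds(200));
        Console.WriteLine($"1000 concurrent waits took {elapsed.TotalMilliseconds:F0} ms");
    }
}
```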

Calling short-running, compute-bound code

Short-running, compute-bound code that is executed in the handler should be executed directly on the I/O thread that is executing the handler code.

public class ShortComputeBoundHandler :
    IHandleMessages<MyMessage>
{
    public Task Handle(MyMessage message, IMessageHandlerContext context)
    {
        ComputeBoundComponent.BlocksForAShortTime();
        return Task.CompletedTask;
    }
}

Call the code directly, and do not wrap it with a Task.Run or Task.Factory.StartNew.

This approach is acceptable for most business scenarios since many of the asynchronous base class library methods in the .NET Framework will schedule continuations on the worker thread pool; the likelihood that no I/O thread is blocked is high.

Calling long-running, compute-bound code

This approach should be used only after a thorough analysis of the runtime behavior and the code involved in the call hierarchy of a handler. Wrapping code inside the handler with Task.Run or Task.Factory.StartNew can seriously harm the throughput if applied incorrectly. It should be used when multiple long-running compute-bound tasks must be executed in parallel.

Long-running compute-bound code executed in a handler could be offloaded to the worker thread pool.

public class LongComputeBoundHandler :
    IHandleMessages<MyMessage>
{
    public Task Handle(MyMessage message, IMessageHandlerContext context)
    {
        var longRunning1 = Task.Run(() => ComputeBoundComponent.BlocksForALongTime(), context.CancellationToken);
        var longRunning2 = Task.Run(() => ComputeBoundComponent.BlocksForALongTime(), context.CancellationToken);
        return Task.WhenAll(longRunning1, longRunning2);
    }
}

Wrap the compute-bound code in Task.Run or Task.Factory.StartNew, and await or return the resulting task.

Return or await

Await the task

For the majority of cases, it is sufficient to mark the handler's Handle method with the async keyword and await all asynchronous calls inside the method.

public class HandlerAwaitsTheTask :
    IHandleMessages<MyMessage>
{
    public async Task Handle(MyMessage message, IMessageHandlerContext context)
    {
        await SomeLibrary.SomeAsyncMethod(message);
    }
}

Return the task

For high-throughput scenarios, and if there are only one or two asynchronous exit points in the Handle method, the async keyword can be avoided entirely by returning the task instead of awaiting it. This will omit the state machine creation, which drives the async code, and reduce the number of allocations on the given code path.

public class HandlerReturnsATask :
    IHandleMessages<MyMessage>
{
    public Task Handle(MyMessage message, IMessageHandlerContext context)
    {
        var task = SomeLibrary.SomeAsyncMethod(message);
        return task;
    }
}

public class HandlerReturnsTwoTasks :
    IHandleMessages<MyMessage>
{
    bool someCondition = true;

    public Task Handle(MyMessage message, IMessageHandlerContext context)
    {
        if (someCondition)
        {
            // Nothing to do: return an already-completed task.
            return Task.CompletedTask;
        }

        return SomeLibrary.SomeAsyncMethod(message);
    }
}

Concurrency

Task-based APIs enable better composition of asynchronous code and allow conscious decisions on whether to execute the asynchronous code sequentially or concurrently.

Small amount of concurrent message operations

Batched

By default, all outgoing message operations on the message handler contexts are batched. Batching means messages are kept in memory and sent out when the handler is completed. So, the I/O-bound work happens outside the execution scope of a handler (individual transports may apply optimizations). For a few outgoing message operations, it makes sense to reduce complexity by sequentially awaiting all the outgoing operations, as shown below.

public class BatchedDispatchHandler :
    IHandleMessages<MyMessage>
{
    public async Task Handle(MyMessage message, IMessageHandlerContext context)
    {
        for (var i = 0; i < 100; i++)
        {
            var myMessage = new MyMessage();
            await context.Send(myMessage);
        }
    }
}

Immediate dispatch

Immediate dispatch means outgoing message operations will be immediately dispatched to the underlying transport. For immediate dispatch operations, it might make sense to execute them concurrently, as shown below.

public class ImmediateDispatchHandler :
    IHandleMessages<MyMessage>
{
    public Task Handle(MyMessage message, IMessageHandlerContext context)
    {
        var tasks = new Task[100];
        for (var i = 0; i < 100; i++)
        {
            var options = new SendOptions();
            options.RequireImmediateDispatch();

            var myMessage = new MyMessage();
            tasks[i] = context.Send(myMessage, options);
        }
        return Task.WhenAll(tasks);
    }
}

Large amount of concurrent message operations

Unbounded concurrency can be problematic. For large numbers of concurrent message operations, it might be preferable to package multiple outgoing operations together into batches, limiting the concurrency to the size of an individual batch (divide & conquer).

public class PacketsImmediateDispatchHandler :
    IHandleMessages<MyMessage>
{
    public async Task Handle(MyMessage message, IMessageHandlerContext context)
    {
        for (var i = 0; i < 100; i++)
        {
            var tasks = new Task[100];
            for (var j = 0; j < 100; j++)
            {
                var options = new SendOptions();
                options.RequireImmediateDispatch();
                var myMessage = new MyMessage();
                tasks[j] = context.Send(myMessage, options);
            }
            await Task.WhenAll(tasks);
        }
    }
}

It is also possible to limit the concurrency by using SemaphoreSlim as shown below.

public class LimitConcurrencyImmediateDispatchHandler :
    IHandleMessages<MyMessage>
{
    public async Task Handle(MyMessage message, IMessageHandlerContext context)
    {
        var semaphore = new SemaphoreSlim(100);

        var tasks = new Task[10000];
        for (var i = 0; i < 10000; i++)
        {
            await semaphore.WaitAsync(context.CancellationToken);

            tasks[i] = Send(context, semaphore);
        }
        await Task.WhenAll(tasks);
    }

    static async Task Send(IMessageHandlerContext context, SemaphoreSlim semaphore)
    {
        try
        {
            var options = new SendOptions();
            options.RequireImmediateDispatch();
            var message = new MyMessage();
            await context.Send(message, options);
        }
        finally
        {
            semaphore.Release();
        }
    }
}

In practice, packaging operations together has proven to be more effective regarding memory allocations and performance. The snippet is shown nonetheless for completeness and because SemaphoreSlim is a helpful concurrency primitive for various scenarios.

Integration with non-task-based APIs

Events

Sometimes, it is necessary to call APIs from an asynchronous handler that use events as the trigger for completion. Before async/await was introduced, ManualResetEvent or AutoResetEvent were usually used to synchronize runtime code flow. Unfortunately, these synchronization primitives are of a blocking nature. For asynchronous one-time event synchronization, the TaskCompletionSource<TResult> can be used.

public class HandlerWhichIntegratesWithEvent :
    IHandleMessages<MyMessage>
{
    public async Task Handle(MyMessage message, IMessageHandlerContext context)
    {
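        var taskCompletionSource = new TaskCompletionSource<object>();

        // Cancel the wait if the event is not raised within 10 seconds.
        // Disposing the source also releases its internal timer.
        using (var cancellationTokenSource = new CancellationTokenSource(TimeSpan.FromSeconds(10)))
        using (cancellationTokenSource.Token.Register(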
            callback: state =>
            {
                var completionSource = (TaskCompletionSource<object>)state;
                completionSource.TrySetCanceled();
            },
            state: taskCompletionSource))
        {
            var dependency = new DependencyWhichRaisedEvent();
            dependency.MyEvent += (sender, args) =>
            {
                taskCompletionSource.TrySetResult(null);
            };

            await taskCompletionSource.Task;
        }
    }
}

The above snippet shows how a TaskCompletionSource<TResult> can be used to asynchronously wait for an event to happen and to cancel the wait if the event does not occur in time.
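
The same technique can be tried outside a message handler. The following is a self-contained sketch under stated assumptions: EventRaisingDependency, its Completed event, and EventToTask.WaitForCompletionAsync are hypothetical names invented for illustration. It additionally unsubscribes the event handler once the wait ends and creates the TaskCompletionSource<TResult> with TaskCreationOptions.RunContinuationsAsynchronously so that the awaiting code does not resume inline on the thread raising the event; both are design choices of this sketch rather than requirements.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical event-based component, for illustration only.
public class EventRaisingDependency
{
    public event EventHandler<int> Completed;

    public void Start(int result)
    {
        // Raise the event from a thread pool thread after a short delay.
        Task.Run(async () =>
        {
            await Task.Delay(50);
            Completed?.Invoke(this, result);
        });
    }
}

public static class EventToTask
{
    // Bridges the Completed event to a task that can be awaited or canceled by timeout.
    public static async Task<int> WaitForCompletionAsync(
        EventRaisingDependency dependency, int value, TimeSpan timeout)
    {
        var completionSource = new TaskCompletionSource<int>(
            TaskCreationOptions.RunContinuationsAsynchronously);
        EventHandler<int> handler = (sender, result) => completionSource.TrySetResult(result);

        using (var timeoutSource = new CancellationTokenSource(timeout))
        using (timeoutSource.Token.Register(() => completionSource.TrySetCanceled()))
        {
            dependency.Completed += handler;
            try
            {
                dependency.Start(value);
                return await completionSource.Task;
            }
            finally
            {
                // Unsubscribe so the dependency does not keep the handler alive.
                dependency.Completed -= handler;
            }
        }
    }

    public static async Task Main()
    {
        var result = await WaitForCompletionAsync(
            new EventRaisingDependency(), 42, TimeSpan.FromSeconds(10));
        Console.WriteLine(result);
    }
}
```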
