Critical Errors

Component: NServiceBus
NuGet Package: NServiceBus (6.x)

NServiceBus has built-in recoverability. However, in certain scenarios it is not possible to handle errors gracefully, because NServiceBus does not have enough context to make a sensible decision about how to proceed after the error has occurred. These are called critical errors. Some examples of critical errors include:

  • An exception occurs while NServiceBus is attempting to move a message to the error queue.
  • There are repeated failures reading information from required storage.
  • An exception occurs while reading from the input queue.

Default behavior

The default behavior is to stop the endpoint.

This causes the endpoint to stop processing messages until it is recreated and started again. The host process is not affected.

Logging of critical errors

Critical errors are logged inside the default critical error action. This means that when replacing the critical error action with a custom one, the custom action must write the log entry itself:

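// LogManager is NServiceBus.Logging.LogManager
var logger = LogManager.GetLogger("NServiceBus");
// Write the entry the default action would otherwise have written
logger.Fatal(errorMessage, exception);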

Custom handling

It is possible to provide a delegate that overrides the default action. When a critical error occurs, the custom action is called instead of the default one.

Define a custom handler using the following code.

endpointConfiguration.DefineCriticalErrorAction(OnCriticalError);

A possible custom implementation

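async Task OnCriticalError(ICriticalErrorContext context)
{
    try
    {
        // The custom action replaces the default one, so it must write the log entry itself.
        var logger = LogManager.GetLogger("NServiceBus");
        logger.Fatal(context.Error, context.Exception);
        // Stop the endpoint. If the process were left running after this,
        // any attempt to send a message would throw an ObjectDisposedException.
        await context.Stop().ConfigureAwait(false);
        // Perform custom actions here, e.g. flush buffered log appenders:
        // NLog.LogManager.Shutdown();
    }
    finally
    {
        // Exit the process so that the hosting environment can restart it.
        var failMessage = $"Critical error shutting down:'{context.Error}'.";
        Environment.FailFast(failMessage, context.Exception);
    }
}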

When to override the default critical error action

The default action should be overridden in the following scenarios:

  • When using the NServiceBus Host, if custom operations need to be performed before the endpoint process exits.
  • When self-hosting.
The default action should always be overridden when self-hosting NServiceBus, because by default a critical error stops the endpoint without exiting the process.
If the endpoint is stopped without exiting the process, any subsequent Send operation will throw an ObjectDisposedException.
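For the self-hosting case, a minimal sketch of a console endpoint that registers a custom critical error action before starting is shown below. It assumes NServiceBus 6 with the MSMQ transport and in-memory persistence, chosen only for illustration; the endpoint name `Samples.SelfHosted` is hypothetical, and a real endpoint would pick its own transport and persistence.

```csharp
using System;
using System.Threading.Tasks;
using NServiceBus;
using NServiceBus.Logging;

static class Program
{
    static void Main()
    {
        AsyncMain().GetAwaiter().GetResult();
    }

    static async Task AsyncMain()
    {
        // "Samples.SelfHosted" is a hypothetical endpoint name.
        var endpointConfiguration = new EndpointConfiguration("Samples.SelfHosted");
        // Assumption: MSMQ transport and in-memory persistence, for illustration only.
        endpointConfiguration.UseTransport<MsmqTransport>();
        endpointConfiguration.UsePersistence<InMemoryPersistence>();
        endpointConfiguration.SendFailedMessagesTo("error");
        endpointConfiguration.EnableInstallers();
        // Replace the default action (stop the endpoint, keep the process alive)
        // with one that exits the process.
        endpointConfiguration.DefineCriticalErrorAction(OnCriticalError);

        var endpointInstance = await Endpoint.Start(endpointConfiguration).ConfigureAwait(false);
        Console.WriteLine("Press any key to exit");
        Console.ReadKey();
        await endpointInstance.Stop().ConfigureAwait(false);
    }

    static async Task OnCriticalError(ICriticalErrorContext context)
    {
        try
        {
            // A custom action must write the log entry itself.
            LogManager.GetLogger("NServiceBus").Fatal(context.Error, context.Exception);
            await context.Stop().ConfigureAwait(false);
        }
        finally
        {
            // Exit so that the hosting environment (e.g. Windows Service Recovery) can restart the process.
            Environment.FailFast($"Critical error shutting down:'{context.Error}'.", context.Exception);
        }
    }
}
```

Because the process exits, restarting the endpoint becomes the responsibility of whatever supervises the process; the Console.ReadKey call only keeps the sketch interactive.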

When implementing a custom critical error callback:

  • To exit the process, use the Environment.FailFast method. If the environment has threads running that should complete before shutdown (e.g. non-transactional operations), the Environment.Exit method can be used instead.
  • Wrap the code in a try...finally block: perform any custom operations in the try block, and call the method that exits the process in the finally block.
  • The custom operations should include flushing any in-memory state and cached data that is normally persisted at a certain interval or during a graceful shutdown. For example, when using buffering or asynchronous appenders with NLog or log4net, flush them by calling LogManager.Shutdown();.
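Following the guidelines above, a variant of the callback that exits through Environment.Exit instead of Environment.FailFast might look like the sketch below. It assumes NServiceBus 6; the class name `GracefulExitCriticalErrorAction`, the exit code `1`, and the commented-out `FlushCachedState` call are hypothetical illustration choices, not part of the NServiceBus API.

```csharp
using System;
using System.Threading.Tasks;
using NServiceBus;
using NServiceBus.Logging;

// Hypothetical class name; register with
// endpointConfiguration.DefineCriticalErrorAction(GracefulExitCriticalErrorAction.OnCriticalError);
static class GracefulExitCriticalErrorAction
{
    static readonly ILog log = LogManager.GetLogger("NServiceBus");

    public static async Task OnCriticalError(ICriticalErrorContext context)
    {
        try
        {
            // Write the log entry the default action would otherwise have written.
            log.Fatal(context.Error, context.Exception);
            await context.Stop().ConfigureAwait(false);
            // Flush buffered or asynchronous log appenders, e.g.
            // NLog.LogManager.Shutdown();
            // FlushCachedState(); // hypothetical: persist any in-memory state here
        }
        finally
        {
            // Environment.Exit raises AppDomain.ProcessExit before terminating, whereas
            // Environment.FailFast terminates immediately without running those handlers.
            // The non-zero exit code signals the failure to the hosting environment.
            Environment.Exit(1);
        }
    }
}
```

The trade-off is the one named in the list: FailFast guarantees the process dies at once, while Exit gives process-exit handlers a chance to run first.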

Whenever possible rely on the environment hosting the endpoint process to automatically restart it:

  • IIS: The IIS host will automatically spawn a new instance.
  • Windows Service: The OS can restart the service after a configured delay (for example, 1 minute) if Windows Service Recovery is enabled.
It is important to consider the effect these defaults have on anything else hosted in the same process, for example when co-hosting NServiceBus with a web service or website.

Raising a critical error

Any code in the endpoint can invoke the Critical Error action.

// 'criticalError' is an instance of NServiceBus.CriticalError
// This instance can be resolved from dependency injection
criticalError.Raise(errorMessage, exception);
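As an illustration of raising a critical error from application code, the sketch below shows a message handler that receives CriticalError through constructor injection and raises it after repeated failures reading from required storage. It assumes NServiceBus 6; the `ImportData` message, the `ReadFromRequiredStorage` placeholder, and the threshold of 10 consecutive failures are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using NServiceBus;

// Hypothetical command used only for this example.
public class ImportData : ICommand
{
}

public class ImportDataHandler : IHandleMessages<ImportData>
{
    // Hypothetical threshold for "repeated" failures.
    const int MaxConsecutiveFailures = 10;

    // Static because a new handler instance is typically created per message.
    static int consecutiveFailures;

    readonly CriticalError criticalError;

    // CriticalError is resolved from dependency injection.
    public ImportDataHandler(CriticalError criticalError)
    {
        this.criticalError = criticalError;
    }

    public async Task Handle(ImportData message, IMessageHandlerContext context)
    {
        try
        {
            await ReadFromRequiredStorage().ConfigureAwait(false);
            // A successful read resets the failure count.
            Interlocked.Exchange(ref consecutiveFailures, 0);
        }
        catch (Exception exception)
        {
            if (Interlocked.Increment(ref consecutiveFailures) >= MaxConsecutiveFailures)
            {
                Interlocked.Exchange(ref consecutiveFailures, 0);
                // Invokes the configured critical error action (default or custom).
                criticalError.Raise("Required storage is repeatedly unavailable.", exception);
            }
            // Rethrow so the message still goes through normal recoverability.
            throw;
        }
    }

    // Hypothetical placeholder for an application-specific storage read.
    static Task ReadFromRequiredStorage()
    {
        return Task.FromResult(0);
    }
}
```

Raising only after a run of failures leaves transient errors to recoverability and reserves the critical error action for the case where the endpoint genuinely cannot proceed.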

ServicePulse and ServiceControl Heartbeat functionality

The ServicePulse/ServiceControl heartbeat functionality starts pinging ServiceControl immediately after the endpoint starts, and only stops when the process exits. Therefore, the only way for a critical error to result in a heartbeat failure in ServicePulse/ServiceControl is for the critical error action to terminate the process.

