Automatic Retries

Sometimes processing of a message fails. The cause may be a transient problem, like a deadlock in the database, in which case retrying the message a few times might overcome it. Or the problem may be more protracted, like a third-party web service going down or a database being unavailable, in which case it might be useful to wait a little longer before retrying the message.

For situations like these, NServiceBus offers two levels of retries:

  • First Level Retry (FLR) is for transient errors, where quick successive retries could solve the problem.
  • Second Level Retry (SLR) is for cases where a small delay is needed between retries.

When a message cannot be deserialized, it bypasses all retry mechanisms, both FLR and SLR, and is moved directly to the error queue.

First Level Retries

When an exception is thrown during message processing, NServiceBus automatically retries the message, by default up to five successive times. This value can be configured through app.config or code.

The configured value describes the minimum number of times a message will be retried. In environments with competing consumers on the same queue, there is an increased chance that a failing message is retried more often than that across the endpoints.
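The immediate-retry behavior can be pictured as a plain loop around the handler. The following is a minimal sketch and not NServiceBus code: `FirstLevelRetrySketch` and `TryProcess` are hypothetical names, and `maxAttempts` merely stands in for the configured value.

```csharp
using System;

// Hypothetical illustration only; not part of the NServiceBus API.
static class FirstLevelRetrySketch
{
    // Invokes the handler until it succeeds or maxAttempts is reached.
    // Returns true on success; attemptsMade reports how often the handler ran.
    public static bool TryProcess(Action handler, int maxAttempts, out int attemptsMade)
    {
        attemptsMade = 0;
        while (attemptsMade < maxAttempts)
        {
            attemptsMade++;
            try
            {
                handler();
                return true;
            }
            catch (Exception)
            {
                // Assumed to be transient: fall through and retry immediately.
            }
        }
        // All attempts failed; the caller decides what happens next
        // (for example, handing the message to a slower retry level).
        return false;
    }
}
```

With `maxAttempts` set to 5, a handler that throws twice and then succeeds is invoked three times and `TryProcess` returns true.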

Configuring FLR using app.config

In Version 3 this configuration was available via MsmqTransportConfig.

In Version 4 and higher the configuration for this mechanism is implemented in the TransportConfig section.

<configuration>
  <configSections>
    <section name="MsmqTransportConfig"
             type="NServiceBus.Config.MsmqTransportConfig, NServiceBus.Core" />
  </configSections>
  <MsmqTransportConfig ErrorQueue="error"
                       NumberOfWorkerThreads="1"
                       MaxRetries="5"/>
</configuration>

<configuration>
  <configSections>
    <section name="TransportConfig"
             type="NServiceBus.Config.TransportConfig, NServiceBus.Core"/>
  </configSections>
  <MessageForwardingInCaseOfFaultConfig ErrorQueue="error"/>
  <TransportConfig MaxRetries="2" />
</configuration>

<configuration>
  <configSections>
    <section name="TransportConfig"
             type="NServiceBus.Config.TransportConfig, NServiceBus.Core"/>
  </configSections>
  <TransportConfig MaxRetries="2" />
</configuration>

Configuring FLR through IProvideConfiguration

class ProvideConfiguration : IProvideConfiguration<TransportConfig>
{
    public TransportConfig GetConfiguration()
    {
        return new TransportConfig
        {
            MaxRetries = 2
        };
    }
}

Configuring FLR through ConfigurationSource

public class ConfigurationSource : IConfigurationSource
{
    public T GetConfiguration<T>() where T : class, new()
    {
        //To Provide FLR Config
        if (typeof(T) == typeof(MsmqTransportConfig))
        {
            MsmqTransportConfig flrConfig = new MsmqTransportConfig
            {
                MaxRetries = 2
            };

            return flrConfig as T;
        }

        // Look in app.config for other sections not defined in this method; returns null if the section is not found.
        return ConfigurationManager.GetSection(typeof(T).Name) as T;
    }
}

public class ConfigurationSource : IConfigurationSource
{
    public T GetConfiguration<T>() where T : class, new()
    {
        //To Provide FLR Config
        if (typeof(T) == typeof(TransportConfig))
        {
            TransportConfig flrConfig = new TransportConfig
            {
                MaxRetries = 2
            };

            return flrConfig as T;
        }

        // Look in app.config for other sections not defined in this method; returns null if the section is not found.
        return ConfigurationManager.GetSection(typeof(T).Name) as T;
    }
}

Configure configure = Configure.With();
configure.CustomConfigurationSource(new ConfigurationSource());

busConfiguration.CustomConfigurationSource(new ConfigurationSource());

configuration.CustomConfigurationSource(new ConfigurationSource());

From Version 6, configuration of the FLR mechanism has no effect on how many times a deferred message is dispatched when an exception is thrown. In such a case the TimeoutManager will attempt the dispatch five times.

Second Level Retries

SLR adds a second level of retries for messages that fail processing. When using SLR, the message that causes the exception is, as before, instantly retried, but instead of then being sent to the error queue, it is sent to a retries queue.

SLR then picks up the message and defers it, by default first for 10 seconds, then 20, and lastly for 30 seconds, then returns it to the original worker queue.

For example, suppose a handler calls a web service, and the service goes down for five seconds just at that time. Without SLR, the message is retried instantly and then sent to the error queue. With SLR, the message is retried instantly, deferred for 10 seconds, and then retried again. By that time the web service is available again and the message is processed successfully.

Retrying messages for extended periods of time would hide failures from operators, preventing them from taking manual action to honor Service Level Agreements. To prevent this from happening because of misconfigured retry policies, NServiceBus makes sure that no message is retried for more than 24 hours before being sent to the error queue.
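The 24-hour limit amounts to a simple time comparison. The sketch below only illustrates that check; `RetryWindowSketch` and `ShouldStopRetrying` are hypothetical names, and how NServiceBus records the time of the first failure is not covered here, so it is passed in as a parameter.

```csharp
using System;

// Hypothetical illustration only; not part of the NServiceBus API.
static class RetryWindowSketch
{
    // Upper bound on how long a message may keep being retried.
    static readonly TimeSpan MaxRetryWindow = TimeSpan.FromHours(24);

    // True when more than 24 hours have passed since the first failure,
    // meaning the message should go to the error queue instead of being retried.
    public static bool ShouldStopRetrying(DateTime firstFailureUtc, DateTime nowUtc)
    {
        return nowUtc - firstFailureUtc > MaxRetryWindow;
    }
}
```

A message that first failed 25 hours ago would be stopped by this check, while one that first failed an hour ago would still be eligible for another retry.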

SLR can be configured in several ways.

Configuring SLR using app.config

To configure SLR, enable its configuration section:

<configSections>
  <section name="SecondLevelRetriesConfig"
           type="NServiceBus.Config.SecondLevelRetriesConfig, NServiceBus.Core"/>
</configSections>
<SecondLevelRetriesConfig Enabled="true" 
                          TimeIncrease="00:00:10"
                          NumberOfRetries="3" />
  • Enabled: Turns the feature on and off. Default: true.
  • TimeIncrease: The time span by which the delay between retries increases with each retry. Default: 10 seconds (00:00:10).
  • NumberOfRetries: Number of times SLR kicks in. Default: 3.
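
The default delays of 10, 20, and 30 seconds are consistent with a delay that grows by TimeIncrease on every retry. The following sketch computes such a schedule under that assumption of linear growth; `SlrDelaySketch`, `DelayFor`, and `Schedule` are hypothetical names and not part of NServiceBus.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only; assumes the delay grows linearly.
static class SlrDelaySketch
{
    // Delay before the given delayed retry (1-based): retryNumber * timeIncrease.
    public static TimeSpan DelayFor(int retryNumber, TimeSpan timeIncrease)
    {
        return TimeSpan.FromTicks(timeIncrease.Ticks * retryNumber);
    }

    // The full list of delays for the configured number of retries.
    public static List<TimeSpan> Schedule(int numberOfRetries, TimeSpan timeIncrease)
    {
        var delays = new List<TimeSpan>();
        for (int retry = 1; retry <= numberOfRetries; retry++)
        {
            delays.Add(DelayFor(retry, timeIncrease));
        }
        return delays;
    }
}
```

Calling `SlrDelaySketch.Schedule(3, TimeSpan.FromSeconds(10))` yields delays of 10, 20, and 30 seconds, matching the defaults described above.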

Configuring SLR through IProvideConfiguration

class ProvideConfiguration : IProvideConfiguration<SecondLevelRetriesConfig>
{
    public SecondLevelRetriesConfig GetConfiguration()
    {
        return new SecondLevelRetriesConfig
        {
            Enabled = true,
            NumberOfRetries = 2,
            TimeIncrease = TimeSpan.FromSeconds(10)
        };
    }
}

Configuring SLR through ConfigurationSource

public class ConfigurationSource : IConfigurationSource
{
    public T GetConfiguration<T>() where T : class, new()
    {
        // To provide SLR Config
        if (typeof(T) == typeof(SecondLevelRetriesConfig))
        {
            SecondLevelRetriesConfig slrConfig = new SecondLevelRetriesConfig
            {
                Enabled = true,
                NumberOfRetries = 2, 
                TimeIncrease = TimeSpan.FromSeconds(10)
            };

            return slrConfig as T;
        }

        // Look in app.config for other sections not defined in this method; returns null if the section is not found.
        return ConfigurationManager.GetSection(typeof(T).Name) as T;
    }
}

Configure configure = Configure.With();
configure.CustomConfigurationSource(new ConfigurationSource());

busConfiguration.CustomConfigurationSource(new ConfigurationSource());

configuration.CustomConfigurationSource(new ConfigurationSource());

Disabling SLR through code

To completely disable SLR through code:

Configure configure = Configure.With();
configure.DisableSecondLevelRetries();

Configure.Features
    .Disable<NServiceBus.Features.SecondLevelRetries>();

BusConfiguration busConfiguration = new BusConfiguration();
busConfiguration.DisableFeature<SecondLevelRetries>();

EndpointConfiguration configuration = new EndpointConfiguration();
configuration.DisableFeature<SecondLevelRetries>();

Custom Retry Policy

You can apply custom retry logic based on headers or timing in code.

Applying a custom policy

SecondLevelRetries.RetryPolicy = MyCustomRetryPolicy;

Configure.Features.SecondLevelRetries(s => s.CustomRetryPolicy(MyCustomRetryPolicy));

SecondLevelRetriesSettings retriesSettings = busConfiguration.SecondLevelRetries();
retriesSettings.CustomRetryPolicy(MyCustomRetryPolicy);

SecondLevelRetriesSettings retriesSettings = configuration.SecondLevelRetries();
retriesSettings.CustomRetryPolicy(MyCustomRetryPolicy);

Error Headers Helper

A custom policy has access to the raw message, including both the retry handling headers and the error forwarding headers. Any of these headers can be used to control the retries for a message. In the examples below, the following helper class provides access to a subset of the headers.

static class ErrorsHeadersHelper
{

    internal static int NumberOfRetries(this TransportMessage transportMessage)
    {
        string value;
        if (transportMessage.Headers.TryGetValue(Headers.Retries, out value))
        {
            return int.Parse(value);
        }
        return 0;
    }

    internal static string ExceptionType(this TransportMessage transportMessage)
    {
        return transportMessage.Headers["NServiceBus.ExceptionInfo.ExceptionType"];
    }

}

static class ErrorsHeadersHelper
{

    internal static int NumberOfRetries(this IncomingMessage incomingMessage)
    {
        string value;
        if (incomingMessage.Headers.TryGetValue(Headers.Retries, out value))
        {
            return int.Parse(value);
        }
        return 0;
    }

    internal static string ExceptionType(this IncomingMessage incomingMessage)
    {
        return incomingMessage.Headers["NServiceBus.ExceptionInfo.ExceptionType"];
    }

}

Simple Policy

Here is a simple retry policy that retries up to 3 times with a 5-second interval.

TimeSpan MyCustomRetryPolicy(TransportMessage transportMessage)
{
    // retry max 3 times
    if (transportMessage.NumberOfRetries() >= 3)
    {
        // sending back a TimeSpan.MinValue tells the 
        // SecondLevelRetry not to retry this message
        return TimeSpan.MinValue;
    }

    return TimeSpan.FromSeconds(5);
}

TimeSpan MyCustomRetryPolicy(IncomingMessage incomingMessage)
{
    // retry max 3 times
    if (incomingMessage.NumberOfRetries() >= 3)
    {
        // sending back a TimeSpan.MinValue tells the 
        // SecondLevelRetry not to retry this message
        return TimeSpan.MinValue;
    }

    return TimeSpan.FromSeconds(5);
}
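
To check how many delayed retries such a policy grants, its decision logic can be exercised without any NServiceBus types. This is a minimal sketch: `RetryPolicySketch`, `SimplePolicy`, and `CountGrantedRetries` are hypothetical names, and the sketch assumes the retry count seen by the policy goes up by one after each delayed retry.

```csharp
using System;

// Hypothetical illustration only; not part of the NServiceBus API.
static class RetryPolicySketch
{
    // Same decision logic as the simple policy above, but taking the
    // retry count directly instead of reading it from a message header.
    public static TimeSpan SimplePolicy(int numberOfRetries)
    {
        if (numberOfRetries >= 3)
        {
            // TimeSpan.MinValue signals "do not retry again".
            return TimeSpan.MinValue;
        }
        return TimeSpan.FromSeconds(5);
    }

    // Counts how many delayed retries a policy grants before it returns
    // TimeSpan.MinValue. safetyLimit guards against a policy that never stops.
    public static int CountGrantedRetries(Func<int, TimeSpan> policy, int safetyLimit)
    {
        int granted = 0;
        while (granted < safetyLimit && policy(granted) != TimeSpan.MinValue)
        {
            granted++;
        }
        return granted;
    }
}
```

`RetryPolicySketch.CountGrantedRetries(RetryPolicySketch.SimplePolicy, 100)` returns 3: the policy grants a retry when the count is 0, 1, or 2 and refuses once it reaches 3.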

Exception based Policy

Here is a policy that extends the above with custom handling for a specific exception.

TimeSpan MyCustomRetryPolicy(TransportMessage transportMessage)
{
    if (transportMessage.ExceptionType() == typeof(MyBusinessException).FullName)
    {
        // Do not retry for MyBusinessException
        return TimeSpan.MinValue;
    }

    if (transportMessage.NumberOfRetries() >= 3)
    {
        return TimeSpan.MinValue;
    }

    return TimeSpan.FromSeconds(5);
}

TimeSpan MyCustomRetryPolicy(IncomingMessage incomingMessage)
{
    if (incomingMessage.ExceptionType() == typeof(MyBusinessException).FullName)
    {
        // Do not retry for MyBusinessException
        return TimeSpan.MinValue;
    }

    if (incomingMessage.NumberOfRetries() >= 3)
    {
        return TimeSpan.MinValue;
    }

    return TimeSpan.FromSeconds(5);
}

Total number of possible attempts

The total number of possible attempts can be calculated with the following formula:

Total Attempts = (FLR:MaxRetries) * (SLR:NumberOfRetries + 1)

For example, here are the resulting totals for several combinations of FLR and SLR settings:

FLR:MaxRetries   SLR:NumberOfRetries   Total possible attempts
1                1                     2
1                2                     3
1                3                     4
2                1                     4
3                1                     6
2                2                     6
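
The formula translates directly into code. The helper below is a small sketch for checking a configuration by hand; `RetryMath` and `TotalAttempts` are hypothetical names, not part of NServiceBus.

```csharp
// Hypothetical illustration only; not part of the NServiceBus API.
static class RetryMath
{
    // Total Attempts = (FLR:MaxRetries) * (SLR:NumberOfRetries + 1)
    // The "+ 1" accounts for the initial round of immediate attempts
    // that happens before any delayed retry.
    public static int TotalAttempts(int flrMaxRetries, int slrNumberOfRetries)
    {
        return flrMaxRetries * (slrNumberOfRetries + 1);
    }
}
```

For instance, `RetryMath.TotalAttempts(3, 2)` gives 9, since 3 * (2 + 1) = 9.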

Retry Logging

Given the following configuration:

  • FLR MaxRetries: 3
  • SLR NumberOfRetries: 2

And a handler that both throws an exception and logs the current count of attempts:

Then the resultant output in the log will be:

Handler - Attempt 1
Warn. UnicastBus. Exception included. Text: MyMessageHandler failed handling message.
Warn. TransactionalTransport. Exception included. Text: Failed raising 'transport message received' event for message with ID=messageId

Handler - Attempt 2
Warn. UnicastBus. Exception included. Text: MyMessageHandler failed handling message.
Warn. TransactionalTransport. Exception included. Text: Failed raising 'transport message received' event for message with ID=messageId.

Handler - Attempt 3
Warn. UnicastBus. Exception included. Text: MyMessageHandler failed handling message.
Warn. TransactionalTransport. Exception included. Text: Failed raising 'transport message received' event for message with ID=messageId.
Error. TransactionalTransport. Exception omitted. Text: Message has failed the maximum number of times allowed, ID=messageId.

Handler - Attempt 4
Warn. UnicastBus. Exception included. Text: MyMessageHandler failed handling message.
Warn. TransactionalTransport. Exception included. Text: Failed raising 'transport message received' event for message with ID=messageId.

Handler - Attempt 5
Warn. UnicastBus. Exception included. Text: MyMessageHandler failed handling message.
Warn. TransactionalTransport. Exception included. Text: Failed raising 'transport message received' event for message with ID=messageId.

Handler - Attempt 6
Warn. UnicastBus. Exception included. Text: MyMessageHandler failed handling message.
Warn. TransactionalTransport. Exception included. Text: Failed raising 'transport message received' event for message with ID=messageId.
Error. TransactionalTransport. Exception omitted. Text: Message has failed the maximum number of times allowed, ID=messageId.

Handler - Attempt 7
Warn. UnicastBus. Exception included. Text: MyMessageHandler failed handling message.
Warn. TransactionalTransport. Exception included. Text: Failed raising 'transport message received' event for message with ID=messageId.

Handler - Attempt 8
Warn. UnicastBus. Exception included. Text: MyMessageHandler failed handling message. 
Warn. TransactionalTransport. Exception included. Text: Failed raising 'transport message received' event for message with ID=messageId.

Handler - Attempt 9
Warn. UnicastBus. Exception included. Text: MyMessageHandler failed handling message.
Warn. TransactionalTransport. Exception included. Text: Failed raising 'transport message received' event for message with ID=messageId.
Error. TransactionalTransport. Exception omitted. Text: Message has failed the maximum number of times allowed, ID=messageId.
Info. SecondLevelRetries. Exception omitted. Text: Send message to error queue, error@machine

Handler - Attempt 1
Info. TransportReceiver. Exception included. Text: Failed to process message.

Handler - Attempt 2
Info. TransportReceiver. Exception included. Text: Failed to process message.

Handler - Attempt 3
Info. TransportReceiver. Exception included. Text: Failed to process message.
Warn. FaultManager. Exception omitted. Text: Message with 'messageId' id has failed FLR and will be handed over to SLR for retry attempt 1.

Handler - Attempt 4
Info. TransportReceiver. Exception included. Text: Failed to process message.

Handler - Attempt 5
Info. TransportReceiver. Exception included. Text: Failed to process message.

Handler - Attempt 6
Info. TransportReceiver. Exception included. Text: Failed to process message.
Warn. FaultManager. Exception omitted. Text: Message with 'messageId' id has failed FLR and will be handed over to SLR for retry attempt 2.

Handler - Attempt 7
Info. TransportReceiver. Exception included. Text: Failed to process message.

Handler - Attempt 8
Info. TransportReceiver. Exception included. Text: Failed to process message. 

Handler - Attempt 9
Info. TransportReceiver. Exception included. Text: Failed to process message. 
Warn. FaultManager. Exception omitted. Text: Message with 'messageId' id has failed FLR and will be handed over to SLR for retry attempt 3.

Handler - Attempt 1
Info. TransportReceiver. Exception included. Text: Failed to process message.

Handler - Attempt 2
Info. TransportReceiver. Exception included. Text: Failed to process message.

Handler - Attempt 3
Info. TransportReceiver. Exception included. Text: Failed to process message.
Warn. FaultManager. Exception omitted. Text: Message with 'messageId' id has failed FLR and will be handed over to SLR for retry attempt 1.

Handler - Attempt 4
Info. TransportReceiver. Exception included. Text: Failed to process message.

Handler - Attempt 5
Info. TransportReceiver. Exception included. Text: Failed to process message.

Handler - Attempt 6
Info. TransportReceiver. Exception included. Text: Failed to process message.
Warn. FaultManager. Exception omitted. Text: Message with 'messageId' id has failed FLR and will be handed over to SLR for retry attempt 2.

Handler - Attempt 7
Info. TransportReceiver. Exception included. Text: Failed to process message.

Handler - Attempt 8
Info. TransportReceiver. Exception included. Text: Failed to process message.

Handler - Attempt 9
Info. TransportReceiver. Exception included. Text: Failed to process message.
Error. FaultManager. Exception omitted. Text: SLR has failed to resolve the issue with message messageId and will be forwarded to the error queue at error@machine.

Note that in some cases a log entry contains the exception (Exception included) and in some cases it is omitted (Exception omitted).

Samples

  • Fault Tolerance
    See how NServiceBus messaging can get past all sorts of failure scenarios.
  • Notifications
    Illustrates using the notifications API.

Last modified 2016-02-04 23:38:43Z