Automatic Retries

Sometimes processing of a message fails. This could be due to a transient problem, such as a deadlock in the database, in which case retrying the message a few times should solve the issue. If the problem is more protracted, such as a third-party web service going down or a database being unavailable, it will take longer to resolve. In that case it is useful to wait longer before retrying the message.

NServiceBus offers two levels of retries:

  • First Level Retry (FLR) is for transient errors, where quick successive retries solve the problem.
  • Second Level Retry (SLR) is for errors that persist after FLR, where a small delay is needed between retries.

When a message cannot be deserialized, it bypasses all retry mechanisms and is moved directly to the error queue.

First Level Retries

When an exception is thrown during message processing, NServiceBus automatically retries the message, up to five times by default. This value can be configured through app.config or via code.

The configured value describes the minimum number of times a message will be retried. Especially in environments with competing consumers on the same queue, there is an increased chance of retrying a failing message more times across the endpoints.
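
As a rough mental model only, FLR behaves like the loop sketched below. This is not how NServiceBus implements it (the real mechanism relies on the transport making the message available again), and the names `ProcessWithImmediateRetries` and `maxAttempts` are hypothetical. The sketch assumes the configured value counts processing attempts, which is what the Retry Logging section of this page suggests.

```csharp
using System;

static class FirstLevelRetrySketch
{
    // Hypothetical helper: invokes the handler until it succeeds or
    // maxAttempts attempts have failed. Returns the number of attempts
    // made, or -1 when every attempt failed.
    public static int ProcessWithImmediateRetries(Action handler, int maxAttempts)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                handler();
                return attempt;
            }
            catch (Exception)
            {
                // Treated as a transient failure: retry immediately.
            }
        }
        return -1;
    }

    static void Main()
    {
        int calls = 0;
        // Simulated handler that fails on its first two calls.
        Action flakyHandler = () =>
        {
            calls++;
            if (calls < 3)
            {
                throw new InvalidOperationException("Simulated deadlock");
            }
        };

        Console.WriteLine(ProcessWithImmediateRetries(flakyHandler, 5)); // prints 3
    }
}
```

A return value of -1 corresponds to the point where the message would be handed over to SLR or moved to the error queue.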

Transport transaction requirements

The FLR mechanism is implemented by making the message available for consumption again at the top of the queue, so that the endpoint can process it again immediately. FLR cannot be used when transport transactions are disabled. For more information, refer to the transport transactions documentation.

Configuring FLR using app.config

In Version 3 this configuration was available via MsmqTransportConfig.

In Version 4 and above the configuration for this mechanism is implemented in the TransportConfig section.

6-pre NServiceBus
<configuration>
  <configSections>
    <section name="TransportConfig"
             type="NServiceBus.Config.TransportConfig, NServiceBus.Core"/>
  </configSections>
  <TransportConfig MaxRetries="2" />
</configuration>
5.x NServiceBus
<configuration>
  <configSections>
    <section name="TransportConfig"
             type="NServiceBus.Config.TransportConfig, NServiceBus.Core"/>
  </configSections>
  <TransportConfig MaxRetries="2" />
</configuration>
4.x NServiceBus
<configuration>
  <configSections>
    <section name="TransportConfig"
             type="NServiceBus.Config.TransportConfig, NServiceBus.Core"/>
  </configSections>
  <MessageForwardingInCaseOfFaultConfig ErrorQueue="error"/>
  <TransportConfig MaxRetries="2" />
</configuration>
3.x NServiceBus
<configuration>
  <configSections>
    <section name="MsmqTransportConfig"
             type="NServiceBus.Config.MsmqTransportConfig, NServiceBus.Core" />
  </configSections>
  <MsmqTransportConfig ErrorQueue="error"
                       NumberOfWorkerThreads="1"
                       MaxRetries="5"/>
</configuration>

Configuring FLR through IProvideConfiguration

6-pre NServiceBus
class ProvideConfiguration : IProvideConfiguration<TransportConfig>
{
    public TransportConfig GetConfiguration()
    {
        return new TransportConfig
        {
            MaxRetries = 2
        };
    }
}
4.x - 5.x NServiceBus
class ProvideConfiguration : IProvideConfiguration<TransportConfig>
{
    public TransportConfig GetConfiguration()
    {
        return new TransportConfig
        {
            MaxRetries = 2
        };
    }
}

Configuring FLR through ConfigurationSource

6-pre NServiceBus
public class ConfigurationSource : IConfigurationSource
{
    public T GetConfiguration<T>() where T : class, new()
    {
        //To Provide FLR Config
        if (typeof(T) == typeof(TransportConfig))
        {
            TransportConfig flrConfig = new TransportConfig
            {
                MaxRetries = 2
            };

            return flrConfig as T;
        }

        // Look in app.config for other sections not defined in this method; otherwise return null.
        return ConfigurationManager.GetSection(typeof(T).Name) as T;
    }
}
4.x - 5.x NServiceBus
public class ConfigurationSource : IConfigurationSource
{
    public T GetConfiguration<T>() where T : class, new()
    {
        //To Provide FLR Config
        if (typeof(T) == typeof(TransportConfig))
        {
            TransportConfig flrConfig = new TransportConfig
            {
                MaxRetries = 2
            };

            return flrConfig as T;
        }

        // Look in app.config for other sections not defined in this method; otherwise return null.
        return ConfigurationManager.GetSection(typeof(T).Name) as T;
    }
}
3.x NServiceBus
public class ConfigurationSource : IConfigurationSource
{
    public T GetConfiguration<T>() where T : class, new()
    {
        //To Provide FLR Config
        if (typeof(T) == typeof(MsmqTransportConfig))
        {
            MsmqTransportConfig flrConfig = new MsmqTransportConfig
            {
                MaxRetries = 2
            };

            return flrConfig as T;
        }

        // Look in app.config for other sections not defined in this method; otherwise return null.
        return ConfigurationManager.GetSection(typeof(T).Name) as T;
    }
}
6-pre NServiceBus
endpointConfiguration.CustomConfigurationSource(new ConfigurationSource());
5.x NServiceBus
busConfiguration.CustomConfigurationSource(new ConfigurationSource());
3.x - 4.x NServiceBus
configure.CustomConfigurationSource(new ConfigurationSource());

Second Level Retries

SLR introduces another level of retry mechanism for messages that fail processing. SLR picks up the message and defers its delivery, by default first for 10 seconds, then 20, and lastly for 30 seconds, then returns it to the original worker queue.

For example, suppose a handler calls a web service that goes down for five seconds just as the message is processed. Without SLR, the message is retried immediately, fails each time, and is sent to the error queue. With SLR, the message is retried immediately, then deferred for 10 seconds and retried again. By then the web service is available again, and the message is processed successfully.

Retrying messages for extended periods of time would hide failures from operators, preventing them from taking manual action to honor their Service Level Agreements. To avoid this, NServiceBus ensures that no message is retried for more than 24 hours before being sent to the error queue.
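
The default schedule described above (10, then 20, then 30 seconds) can be reproduced with a small calculation. The sketch below assumes the delay before the n-th SLR attempt is n times the configured time increase. `Delays` is a hypothetical helper written for illustration, not an NServiceBus API.

```csharp
using System;
using System.Collections.Generic;

static class SlrDelaySketch
{
    // Hypothetical helper: models the delay before SLR attempt n as
    // n * timeIncrease, which matches the default 10, 20, 30 second schedule.
    public static List<TimeSpan> Delays(TimeSpan timeIncrease, int numberOfRetries)
    {
        var delays = new List<TimeSpan>();
        for (int attempt = 1; attempt <= numberOfRetries; attempt++)
        {
            delays.Add(TimeSpan.FromTicks(timeIncrease.Ticks * attempt));
        }
        return delays;
    }

    static void Main()
    {
        TimeSpan total = TimeSpan.Zero;
        foreach (TimeSpan delay in Delays(TimeSpan.FromSeconds(10), 3))
        {
            Console.WriteLine(delay.TotalSeconds);
            total += delay;
        }
        // Total time spent deferred under this model.
        Console.WriteLine(total.TotalSeconds);
    }
}
```

Under this model the default settings defer a message for 60 seconds in total, far below the 24 hour limit.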

Transport transaction requirements

The SLR mechanism is implemented by rolling back the transport transaction and scheduling the message for delayed delivery. Aborting the receive operation when transactions are turned off would result in message loss. Therefore SLR cannot be used when transport transactions are disabled.

Configuring SLR using app.config

To configure SLR, enable its configuration section:

6-pre NServiceBus
<configSections>
  <section name="SecondLevelRetriesConfig"
           type="NServiceBus.Config.SecondLevelRetriesConfig, NServiceBus.Core"/>
</configSections>
<SecondLevelRetriesConfig Enabled="true"
                          TimeIncrease="00:00:10"
                          NumberOfRetries="3" />
  • Enabled: Turns the feature on and off. Default: true.
  • TimeIncrease: The amount of time by which the delay increases with each successive retry. Default: 10 seconds (00:00:10).
  • NumberOfRetries: Number of times SLR kicks in. Default: 3.

Configuring SLR through IProvideConfiguration

6-pre NServiceBus
class ProvideConfiguration : IProvideConfiguration<SecondLevelRetriesConfig>
{
    public SecondLevelRetriesConfig GetConfiguration()
    {
        return new SecondLevelRetriesConfig
        {
            Enabled = true,
            NumberOfRetries = 2,
            TimeIncrease = TimeSpan.FromSeconds(10)
        };
    }
}
4.x - 5.x NServiceBus
class ProvideConfiguration : IProvideConfiguration<SecondLevelRetriesConfig>
{
    public SecondLevelRetriesConfig GetConfiguration()
    {
        return new SecondLevelRetriesConfig
        {
            Enabled = true,
            NumberOfRetries = 2,
            TimeIncrease = TimeSpan.FromSeconds(10)
        };
    }
}

Configuring SLR through ConfigurationSource

6-pre NServiceBus
public class ConfigurationSource : IConfigurationSource
{
    public T GetConfiguration<T>() where T : class, new()
    {
        // To provide SLR Config
        if (typeof(T) == typeof(SecondLevelRetriesConfig))
        {
            SecondLevelRetriesConfig slrConfig = new SecondLevelRetriesConfig
            {
                Enabled = true,
                NumberOfRetries = 2, 
                TimeIncrease = TimeSpan.FromSeconds(10)
            };

            return slrConfig as T;
        }

        // Look in app.config for other sections not defined in this method; otherwise return null.
        return ConfigurationManager.GetSection(typeof(T).Name) as T;
    }
}
6-pre NServiceBus
endpointConfiguration.CustomConfigurationSource(new ConfigurationSource());
5.x NServiceBus
busConfiguration.CustomConfigurationSource(new ConfigurationSource());
3.x - 4.x NServiceBus
configure.CustomConfigurationSource(new ConfigurationSource());

Disabling SLR through code

6-pre NServiceBus
endpointConfiguration.DisableFeature<SecondLevelRetries>();
5.x NServiceBus
busConfiguration.DisableFeature<SecondLevelRetries>();
4.x NServiceBus
Configure.Features
    .Disable<NServiceBus.Features.SecondLevelRetries>();
3.x NServiceBus
configure.DisableSecondLevelRetries();

Custom Retry Policy

You can apply custom retry logic based on headers or timing in code.

Applying a custom policy

6-pre NServiceBus
SecondLevelRetriesSettings retriesSettings = endpointConfiguration.SecondLevelRetries();
retriesSettings.CustomRetryPolicy(MyCustomRetryPolicy);
5.x NServiceBus
SecondLevelRetriesSettings retriesSettings = busConfiguration.SecondLevelRetries();
retriesSettings.CustomRetryPolicy(MyCustomRetryPolicy);
4.x NServiceBus
Configure.Features.SecondLevelRetries(s => s.CustomRetryPolicy(MyCustomRetryPolicy));
3.x NServiceBus
SecondLevelRetries.RetryPolicy = MyCustomRetryPolicy;

Error Headers Helper

A Custom Policy has access to the raw message including both the retries handling headers and the error forwarding headers. Any of these headers can be used to control the retries for a message. In the following examples the helper class will provide access to a subset of the headers.

6-pre NServiceBus
static class ErrorsHeadersHelper
{

    internal static int NumberOfRetries(this IncomingMessage incomingMessage)
    {
        string value;
        if (incomingMessage.Headers.TryGetValue(Headers.Retries, out value))
        {
            return int.Parse(value);
        }
        return 0;
    }

    internal static string ExceptionType(this IncomingMessage incomingMessage)
    {
        return incomingMessage.Headers["NServiceBus.ExceptionInfo.ExceptionType"];
    }

}
3.x - 5.x NServiceBus
static class ErrorsHeadersHelper
{

    internal static int NumberOfRetries(this TransportMessage transportMessage)
    {
        string value;
        if (transportMessage.Headers.TryGetValue(Headers.Retries, out value))
        {
            return int.Parse(value);
        }
        return 0;
    }

    internal static string ExceptionType(this TransportMessage transportMessage)
    {
        return transportMessage.Headers["NServiceBus.ExceptionInfo.ExceptionType"];
    }

}

Simple Policy

The following retry policy retries a message 3 times, with a 5 second interval between attempts.

6-pre NServiceBus
TimeSpan MyCustomRetryPolicy(IncomingMessage incomingMessage)
{
    // retry max 3 times
    if (incomingMessage.NumberOfRetries() >= 3)
    {
        // sending back a TimeSpan.MinValue tells the 
        // SecondLevelRetry not to retry this message
        return TimeSpan.MinValue;
    }

    return TimeSpan.FromSeconds(5);
}
3.x - 5.x NServiceBus
TimeSpan MyCustomRetryPolicy(TransportMessage transportMessage)
{
    // retry max 3 times
    if (transportMessage.NumberOfRetries() >= 3)
    {
        // sending back a TimeSpan.MinValue tells the 
        // SecondLevelRetry not to retry this message
        return TimeSpan.MinValue;
    }

    return TimeSpan.FromSeconds(5);
}

Exception based Policy

The following retry policy extends the previous policy with custom handling logic for a specific exception.

6-pre NServiceBus
TimeSpan MyCustomRetryPolicy(IncomingMessage incomingMessage)
{
    if (incomingMessage.ExceptionType() == typeof(MyBusinessException).FullName)
    {
        // Do not retry for MyBusinessException
        return TimeSpan.MinValue;
    }

    if (incomingMessage.NumberOfRetries() >= 3)
    {
        return TimeSpan.MinValue;
    }

    return TimeSpan.FromSeconds(5);
}
3.x - 5.x NServiceBus
TimeSpan MyCustomRetryPolicy(TransportMessage transportMessage)
{
    if (transportMessage.ExceptionType() == typeof(MyBusinessException).FullName)
    {
        // Do not retry for MyBusinessException
        return TimeSpan.MinValue;
    }

    if (transportMessage.NumberOfRetries() >= 3)
    {
        return TimeSpan.MinValue;
    }

    return TimeSpan.FromSeconds(5);
}
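
To check the decision logic of such a policy without a running endpoint, it can help to extract it into a pure function. The sketch below mirrors the exception based policy above. `Decide` and its parameters are hypothetical names, and the two arguments stand in for the values that the helper class reads from the message headers.

```csharp
using System;

class MyBusinessException : Exception
{
}

static class RetryPolicySketch
{
    // Hypothetical pure version of the exception based policy: it takes
    // the retry count and exception type name directly instead of a message.
    public static TimeSpan Decide(int numberOfRetries, string exceptionType)
    {
        if (exceptionType == typeof(MyBusinessException).FullName)
        {
            // Do not retry for MyBusinessException
            return TimeSpan.MinValue;
        }

        if (numberOfRetries >= 3)
        {
            // Retry limit reached
            return TimeSpan.MinValue;
        }

        return TimeSpan.FromSeconds(5);
    }

    static void Main()
    {
        string transient = typeof(TimeoutException).FullName;
        string business = typeof(MyBusinessException).FullName;

        Console.WriteLine(Decide(0, transient) == TimeSpan.FromSeconds(5)); // True
        Console.WriteLine(Decide(3, transient) == TimeSpan.MinValue);       // True
        Console.WriteLine(Decide(0, business) == TimeSpan.MinValue);        // True
    }
}
```

Keeping the decision separate from the message type also lets the same logic serve both the IncomingMessage and TransportMessage variants of the policy.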

Total number of possible retries

The total number of possible retries can be calculated with the following formula:

Total Attempts = (FLR:MaxRetries) * (SLR:NumberOfRetries + 1)

For example, the following combinations of FLR and SLR settings result in these numbers of possible attempts:

FLR:MaxRetries   SLR:NumberOfRetries   Total possible attempts
1                1                     2
1                2                     3
1                3                     4
2                1                     4
3                1                     6
2                2                     6

In Versions 6 and higher, the configuration of the FLR mechanism will have no effect on how many times a deferred message is dispatched when an exception is thrown. SLR will retry the message for the number of times specified in its configuration.
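
The formula above is simple enough to check by hand, but a short sketch makes it easy to try other settings. `TotalAttempts` is a hypothetical helper name, and the input pairs are the rows of the table above.

```csharp
using System;

static class RetryMath
{
    // Total Attempts = (FLR:MaxRetries) * (SLR:NumberOfRetries + 1)
    public static int TotalAttempts(int flrMaxRetries, int slrNumberOfRetries)
    {
        return flrMaxRetries * (slrNumberOfRetries + 1);
    }

    static void Main()
    {
        // Each pair is { FLR:MaxRetries, SLR:NumberOfRetries }.
        int[][] settings =
        {
            new[] { 1, 1 }, new[] { 1, 2 }, new[] { 1, 3 },
            new[] { 2, 1 }, new[] { 3, 1 }, new[] { 2, 2 }
        };

        foreach (int[] pair in settings)
        {
            Console.WriteLine("{0} {1} {2}", pair[0], pair[1], TotalAttempts(pair[0], pair[1]));
        }
    }
}
```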

Retry Logging

Given the following configuration:

  • FLR MaxRetries: 3
  • SLR NumberOfRetries: 2

and a Handler that both throws an exception and logs the current count of attempts, the output in the log will be:

5.x - 6.x NServiceBus
Handler - Attempt 1
Info. TransportReceiver. Exception included. Text: Failed to process message.

Handler - Attempt 2
Info. TransportReceiver. Exception included. Text: Failed to process message.

Handler - Attempt 3
Info. TransportReceiver. Exception included. Text: Failed to process message.
Warn. FaultManager. Exception omitted. Text: Message with 'messageId' id has failed FLR and will be handed over to SLR for retry attempt 1.

Handler - Attempt 4
Info. TransportReceiver. Exception included. Text: Failed to process message.

Handler - Attempt 5
Info. TransportReceiver. Exception included. Text: Failed to process message.

Handler - Attempt 6
Info. TransportReceiver. Exception included. Text: Failed to process message.
Warn. FaultManager. Exception omitted. Text: Message with 'messageId' id has failed FLR and will be handed over to SLR for retry attempt 2.

Handler - Attempt 7
Info. TransportReceiver. Exception included. Text: Failed to process message.

Handler - Attempt 8
Info. TransportReceiver. Exception included. Text: Failed to process message.

Handler - Attempt 9
Info. TransportReceiver. Exception included. Text: Failed to process message.
Error. FaultManager. Exception omitted. Text: SLR has failed to resolve the issue with message messageId and will be forwarded to the error queue at error@machine.
4.x NServiceBus
Handler - Attempt 1
Info. TransportReceiver. Exception included. Text: Failed to process message.

Handler - Attempt 2
Info. TransportReceiver. Exception included. Text: Failed to process message.

Handler - Attempt 3
Info. TransportReceiver. Exception included. Text: Failed to process message.
Warn. FaultManager. Exception omitted. Text: Message with 'messageId' id has failed FLR and will be handed over to SLR for retry attempt 1.

Handler - Attempt 4
Info. TransportReceiver. Exception included. Text: Failed to process message.

Handler - Attempt 5
Info. TransportReceiver. Exception included. Text: Failed to process message.

Handler - Attempt 6
Info. TransportReceiver. Exception included. Text: Failed to process message.
Warn. FaultManager. Exception omitted. Text: Message with 'messageId' id has failed FLR and will be handed over to SLR for retry attempt 2.

Handler - Attempt 7
Info. TransportReceiver. Exception included. Text: Failed to process message.

Handler - Attempt 8
Info. TransportReceiver. Exception included. Text: Failed to process message.

Handler - Attempt 9
Info. TransportReceiver. Exception included. Text: Failed to process message.
Warn. FaultManager. Exception omitted. Text: Message with 'messageId' id has failed FLR and will be handed over to SLR for retry attempt 3.
3.x NServiceBus
Handler - Attempt 1
Warn. UnicastBus. Exception included. Text: MyMessageHandler failed handling message.
Warn. TransactionalTransport. Exception included. Text: Failed raising 'transport message received' event for message with ID=messageId

Handler - Attempt 2
Warn. UnicastBus. Exception included. Text: MyMessageHandler failed handling message.
Warn. TransactionalTransport. Exception included. Text: Failed raising 'transport message received' event for message with ID=messageId.

Handler - Attempt 3
Warn. UnicastBus. Exception included. Text: MyMessageHandler failed handling message.
Warn. TransactionalTransport. Exception included. Text: Failed raising 'transport message received' event for message with ID=messageId.
Error. TransactionalTransport. Exception omitted. Text: Message has failed the maximum number of times allowed, ID=messageId.

Handler - Attempt 4
Warn. UnicastBus. Exception included. Text: MyMessageHandler failed handling message.
Warn. TransactionalTransport. Exception included. Text: Failed raising 'transport message received' event for message with ID=messageId.

Handler - Attempt 5
Warn. UnicastBus. Exception included. Text: MyMessageHandler failed handling message.
Warn. TransactionalTransport. Exception included. Text: Failed raising 'transport message received' event for message with ID=messageId.

Handler - Attempt 6
Warn. UnicastBus. Exception included. Text: MyMessageHandler failed handling message.
Warn. TransactionalTransport. Exception included. Text: Failed raising 'transport message received' event for message with ID=messageId.
Error. TransactionalTransport. Exception omitted. Text: Message has failed the maximum number of times allowed, ID=messageId.

Handler - Attempt 7
Warn. UnicastBus. Exception included. Text: MyMessageHandler failed handling message.
Warn. TransactionalTransport. Exception included. Text: Failed raising 'transport message received' event for message with ID=messageId.

Handler - Attempt 8
Warn. UnicastBus. Exception included. Text: MyMessageHandler failed handling message. 
Warn. TransactionalTransport. Exception included. Text: Failed raising 'transport message received' event for message with ID=messageId.

Handler - Attempt 9
Warn. UnicastBus. Exception included. Text: MyMessageHandler failed handling message.
Warn. TransactionalTransport. Exception included. Text: Failed raising 'transport message received' event for message with ID=messageId.
Error. TransactionalTransport. Exception omitted. Text: Message has failed the maximum number of times allowed, ID=messageId.
Info.  SecondLevelRetries. Exception omitted. Text: Send message to error queue, error@machine

Note that in some cases a log entry contains the exception (Exception included) and in some cases it is omitted (Exception omitted).
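
The order of attempts and hand-offs in the logs above can be reproduced with a small simulation. This is a conceptual sketch, not NServiceBus code: `Simulate` is a hypothetical helper, the status lines are simplified rather than real log output, and the handler is assumed to fail on every attempt.

```csharp
using System;
using System.Collections.Generic;

static class RetrySequenceSketch
{
    // Hypothetical helper: lists the handler attempts and hand-offs for a
    // message whose handler always throws.
    public static List<string> Simulate(int flrMaxRetries, int slrNumberOfRetries)
    {
        var lines = new List<string>();
        int attempt = 0;

        // One round of immediate attempts, plus one round per SLR retry.
        for (int slrRound = 0; slrRound <= slrNumberOfRetries; slrRound++)
        {
            for (int flr = 0; flr < flrMaxRetries; flr++)
            {
                attempt++;
                lines.Add("Handler - Attempt " + attempt);
            }

            if (slrRound < slrNumberOfRetries)
            {
                lines.Add("Handed over to SLR for retry attempt " + (slrRound + 1));
            }
            else
            {
                lines.Add("Forwarded to the error queue");
            }
        }

        return lines;
    }

    static void Main()
    {
        // Same settings as above: FLR MaxRetries 3, SLR NumberOfRetries 2.
        foreach (string line in Simulate(3, 2))
        {
            Console.WriteLine(line);
        }
    }
}
```

With these settings the sketch lists nine handler attempts, a hand-off to SLR after attempts 3 and 6, and forwarding to the error queue after attempt 9.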

Last modified 2016-04-26 12:53:13Z