NServiceBus is designed for scalability and reliability, but to take advantage of these features, you need to deploy it in a Windows Failover Cluster. Unfortunately, information on how to do this effectively is, as yet, incomplete and scattered. This article describes the process for deploying NServiceBus in a failover cluster. This article does not cover the generic setup of a failover cluster. There are other, better resources for that, such as Creating a Cluster in Windows Server 2008. The focus here is the setup related to NServiceBus.
A simple setup for scalability and reliability includes at least two servers in a failover cluster, and two additional servers for worker endpoints. The failover cluster servers run the following:
TimeoutManager, if you require one to support Sagas.
Stop method, as the Start method is called when the service starts, and the Stop method is called when the service stops (and is transferred to the other cluster node).
The two other servers are worker nodes, and contain only endpoints with simple message handlers. The endpoints request work from the clustered distributors, do the work, and then ask for more.
While technically it shouldn't matter which clustered server you perform the setup from, it is generally more reliable to set up everything from whichever server currently holds the Quorum disk. Find the server that has it (it moves around when the server holding it restarts), and open Failover Cluster Management under Administrative Tools.
Set up a clustered DTC access point:
Configure DTC for NServiceBus:
In Administrative Tools - Component Services, expand Component Services - Computers - My Computer - Distributed Transaction Coordinator.
Set up an MSMQ Cluster Group. A cluster group is a group of resources that share a unique DNS name and can be addressed externally like a computer.
For more information, see http://technet.microsoft.com/en-us/library/cc753575.aspx
For the NServiceBus endpoint destination, we address the queues by the MSMQ cluster group's name, where we will later add all the rest of our clustered resources. In non-cluster terms, we typically add the machine name to address the queue, i.e. queue@MachineName. In cluster terms, we address it by queue@MSMQ Network name.
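The two addressing forms above differ only in the part after the @ sign. As a minimal sketch (not part of NServiceBus; the helper name `queue_address` and the example queue and host names are hypothetical), the rule can be written as:

```python
def queue_address(queue: str, host: str) -> str:
    """Return an address in the queue@host form described in the text."""
    if not queue or not host:
        raise ValueError("both queue and host are required")
    if "@" in queue or "@" in host:
        raise ValueError("queue and host must not already contain '@'")
    return f"{queue}@{host}"


# Hypothetical names, for illustration only.
standalone = queue_address("orders", "SERVER01")        # non-clustered: machine name
clustered = queue_address("orders", "MsmqNetworkName")  # clustered: MSMQ network name

print(standalone)
print(clustered)
```

The point of the sketch is that callers of a clustered queue supply the MSMQ network name wherever they would otherwise supply a machine name.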
In Failover Cluster Management, from the server with Quorum:
This should give you a clustered MSMQ instance. Click the instance under Services and Applications to see the summary, which contains the Server Name, Storage, and MSMQ instance. Right click the MSMQ instance, select Properties, and the Dependencies tab. Make sure that it contains dependencies for the MSMQ Network Name AND IP Address AND Storage.
View the MSMQ MMC snap-in by right clicking the MSMQ cluster in the left pane and selecting Manage MSMQ, which opens the Computer Management tool geared toward the clustered instance.
Go to MSMQ by expanding Services and Applications - Message Queuing.
Keep in mind that this only seems to work if you're viewing Failover Cluster Management from the server where the MSMQ Network Name currently resides. If you are on Server A and you try to manage MSMQ for an MSMQ Network Name residing on Server B, you won't see Message Queuing in the Computer Management window.
Try swapping the MSMQ Network Name back and forth between nodes a few times. It's best to make sure that everything is working properly now before continuing.
The "cluster name" is a Network Name created for the cluster as part of the core Cluster Group, a group created by default for each cluster. The core cluster group is different from the MSMQ cluster group, and it has a different network name. One of the most common mistakes when using MSMQ on a cluster is using the Cluster Name in the client instead of the MSMQ Network Name.
Before you can cluster the NServiceBus.Host.exe processes, you need to install them as services on all clustered nodes.
Copy the Distributor binary as many times as you have logical queues, and then configure each one as described in the NServiceBus Distributor page. To keep everything straight, the queues are named according to the following convention:
A review of how the distributor works: Other endpoints send messages to the queue specified by the Distributor Data Bus, where they accumulate if no worker is running. When a worker comes online, it sends a ReadyMessage to the queue specified by the Distributor Control Bus. If there is work to be done, the distributor sends an item from the Data Bus to the worker's local input queue. Otherwise, it records the worker in the Distributor Storage Queue so that when work does come in, the distributor knows who is ready to process it.
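The flow described above can be illustrated with a toy, in-memory model. This is only a sketch of the described behavior, not the NServiceBus implementation: the class `ToyDistributor`, its method names, and the worker and message names are all hypothetical, and real queues are replaced by Python deques.

```python
from collections import deque


class ToyDistributor:
    """In-memory stand-in for the Data Bus, Control Bus, and Storage queue."""

    def __init__(self):
        self.data_bus = deque()  # work sent by other endpoints, waiting for a worker
        self.storage = deque()   # workers that reported ready while no work was waiting
        self.delivered = []      # (worker, message) pairs, kept for inspection

    def send_work(self, message):
        """A message arrives on the Data Bus."""
        if self.storage:
            # A worker is already known to be ready: hand the work over.
            self.delivered.append((self.storage.popleft(), message))
        else:
            # No worker is ready: the message accumulates.
            self.data_bus.append(message)

    def worker_ready(self, worker):
        """A ReadyMessage arrives on the Control Bus."""
        if self.data_bus:
            self.delivered.append((worker, self.data_bus.popleft()))
        else:
            # Nothing to do yet: remember that this worker is ready.
            self.storage.append(worker)


d = ToyDistributor()
d.send_work("m1")          # no worker yet, so m1 waits on the Data Bus
d.worker_ready("workerA")  # workerA receives m1
d.worker_ready("workerA")  # no work waiting, workerA is recorded in storage
d.worker_ready("workerB")  # workerB is recorded behind workerA
d.send_work("m2")          # m2 goes to workerA, the first ready worker
print(d.delivered)
print(list(d.storage))
```

In this model, every ready notification that arrives while no work is waiting adds one entry to the storage queue, so idle workers accumulate there until messages come in.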
Using this naming convention, all of your applications' queues are grouped together, and all of the queues for a logical QueueName are also grouped together in alphabetical order.
Install each distributor from the command line:
NServiceBus.Host.exe /install /serviceName:Distributor.ProjectName.QueueName /displayName:Distributor.ProjectName.QueueName /description:Distributor.ProjectName.QueueName /userName:DOMAIN\us /password:thepassword NServiceBus.Production NServiceBus.Distributor
It's easier to set the service name, display name, and description to be the same. It helps when trying to start and stop things from a NET START/STOP command and when viewing them in the multiple graphical tools. Starting each one with Distributor puts them all together alphabetically in the Services MMC snap-in.
Don't forget NServiceBus.Production at the end, which sets the profile for the NServiceBus generic host as described in the Generic Host page, and NServiceBus.Distributor, which sets up the host in distributor mode.
Do not try starting the services. If you do, they will run in the scope of the local server node, and will attempt to create their queues there.
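When there are several logical queues, typing the install command for each one by hand is error-prone. The following sketch only builds the command strings so they can be reviewed and pasted; it does not run anything. It reuses the flags and the Distributor.ProjectName.QueueName naming shown in the command above, while the function name `install_command` and the project, queue, and account values are hypothetical.

```python
def install_command(project: str, queue: str, user: str, password: str) -> str:
    """Build the distributor install command line for one logical queue."""
    # Service name, display name, and description are deliberately identical.
    name = f"Distributor.{project}.{queue}"
    return (
        "NServiceBus.Host.exe /install "
        f"/serviceName:{name} /displayName:{name} /description:{name} "
        f"/userName:{user} /password:{password} "
        "NServiceBus.Production NServiceBus.Distributor"
    )


# Hypothetical project, queues, and service account, for illustration only.
for logical_queue in ("Orders", "Billing"):
    print(install_command("MyApp", logical_queue, "DOMAIN\\serviceaccount", "thepassword"))
```

Because the output contains the service account password, treat it like any other credential and avoid saving it to a shared location.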
Now, add each distributor to the cluster:
With your distributors installed, you can repeat the same procedure for any Commander applications, if you have them. You may want to skip the Commander application for now, however. It's sometimes easier to get everything else installed first as a stable system that reacts to events but has no stimulus, and then add the Commander application which will get the whole system in motion.
Again, try swapping the cluster back and forth, to make sure it can move freely between the cluster nodes.
## Setting up the workers
Set up your worker processes on both worker servers (not the cluster nodes!) as services, as you did for the distributors. Instead of the NServiceBus.Distributor profile, use the NServiceBus.Worker profile.
Configure the workers' UnicastBusConfig sections to point to the distributor's data and control queues as described on the Distributor Page under Routing with the Distributor.
With your distributors running in the cluster and your worker processes coming online, you should see the Storage queues for each process start to fill up. The more worker threads you have configured, the more messages you can expect to see in each Storage queue.
In development, your endpoint configurations probably don't have any @ symbols in them. In production, you have to change all of them to point to the Data Bus queue on the cluster. For example, for application MyApp and logical queue MyQueue, your worker config looks like this:
    <?xml version="1.0" encoding="utf-8" ?>
    <configuration>
      <configSections>
        <!-- Other sections go here -->
        <section name="MasterNodeConfig" type="NServiceBus.Config.MasterNodeConfig, NServiceBus.Core" />
        <section name="UnicastBusConfig" type="NServiceBus.Config.UnicastBusConfig, NServiceBus.Core"/>
      </configSections>
      <!-- Other config options go here -->
      <MasterNodeConfig Node="MachineWhereDistributorRuns"/>
      <UnicastBusConfig DistributorControlAddress="distributorControlBus@MsmqNetworkName"
                        DistributorDataAddress="distributorDataBus@MsmqNetworkName">
        <MessageEndpointMappings>
          <!-- regular entries -->
        </MessageEndpointMappings>
      </UnicastBusConfig>
    </configuration>
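Since a development-style address without an @ symbol is easy to overlook when moving to production, a small check can help before deployment. The sketch below uses only Python's standard library and the two attribute names from the config above; the function name `unqualified_addresses` and the sample document are hypothetical, and the check only confirms that a host part is present, not that it is the right MSMQ network name.

```python
import xml.etree.ElementTree as ET

# Attribute names taken from the UnicastBusConfig element shown above.
ADDRESS_ATTRIBUTES = ("DistributorControlAddress", "DistributorDataAddress")


def unqualified_addresses(config_xml: str) -> list:
    """Return the distributor address attributes that lack a queue@host form."""
    root = ET.fromstring(config_xml)
    section = root.find("UnicastBusConfig")
    if section is None:
        # No section at all: report both attributes as missing.
        return list(ADDRESS_ATTRIBUTES)
    missing = []
    for attribute in ADDRESS_ATTRIBUTES:
        value = section.get(attribute, "")
        host = value.partition("@")[2]  # empty when there is no '@' or nothing after it
        if not host:
            missing.append(attribute)
    return missing


# Hypothetical sample in which the data address was left in its development form.
SAMPLE = """<configuration>
  <UnicastBusConfig DistributorControlAddress="distributorControlBus@MsmqNetworkName"
                    DistributorDataAddress="distributorDataBus">
    <MessageEndpointMappings />
  </UnicastBusConfig>
</configuration>"""

print(unqualified_addresses(SAMPLE))
```

An empty list means both distributor addresses carry a host part; any names returned point at attributes that still need the MSMQ network name appended.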
This article shows how to set up a Windows Failover Cluster and two worker node servers to run a scalable, maintainable, and reliable NServiceBus application infrastructure.
(This article is a lightly updated version of the article originally written and published by David Boike.)
Last modified 2014-11-18 01:59:36Z