ServiceControl, which exists to serve the management of distributed systems, is itself a distributed system. As a result, pieces of the system can be upgraded and managed separately.
This document describes in general terms how to replace a ServiceControl Audit instance, and links to more specific information on how to accomplish these tasks for each potential deployment method.
See Replacing an Error Instance for similar guidance for Error instances.
Overview
ServiceControl Audit instances store audit data for a configured period of time, after which expired audit data is removed. Using the ServiceControl remotes feature, multiple audit instances can each store a portion of the overall audit data (sharding), and the combined data is queried in a scatter-gather fashion.
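To make the scatter-gather idea concrete, the following is a minimal conceptual sketch, not ServiceControl's actual implementation. The `AuditShard` class, the `scatter_gather` function, and the message fields are hypothetical names invented for illustration. It shows a single query being sent to every shard and the partial results being merged into one view.

```python
from dataclasses import dataclass, field


@dataclass
class AuditShard:
    """Hypothetical stand-in for one ServiceControl Audit instance."""
    name: str
    messages: list[dict] = field(default_factory=list)

    def query(self, endpoint: str) -> list[dict]:
        # Each shard only knows about the messages it ingested itself.
        return [m for m in self.messages if m["endpoint"] == endpoint]


def scatter_gather(shards: list[AuditShard], endpoint: str) -> list[dict]:
    """Send the same query to every shard and merge the partial results."""
    merged: list[dict] = []
    for shard in shards:  # scatter: ask every shard
        merged.extend(shard.query(endpoint))
    # gather: present one combined, time-ordered view to the caller
    return sorted(merged, key=lambda m: m["processed_at"])


old = AuditShard("audit-old", [{"id": "a", "endpoint": "Sales", "processed_at": 1}])
new = AuditShard("audit-new", [
    {"id": "b", "endpoint": "Sales", "processed_at": 2},
    {"id": "c", "endpoint": "Billing", "processed_at": 3},
])

results = scatter_gather([old, new], "Sales")
print([m["id"] for m in results])
```

Because the caller only sees the merged result, a shard can be added or removed without changing how queries are issued, which is what makes the replacement process below possible.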
Using this capability, an Audit instance that can't be upgraded can be replaced without downtime. The process follows these steps:
- Add a new audit instance as a remote
- Disable audit queue ingestion on the old audit instance
- Decommission the old audit instance when all audit information is expired
For scenarios where retaining audit message data is not required (e.g. transient data that does not merit the effort to retain), this process is not necessary; the audit instance can simply be deleted and recreated with the same name.
Initial state
Before doing anything, the deployment looks like this:
Add a new audit instance
The first step is to create a new audit instance:
- Adding a new audit instance with ServiceControl Management
- Adding a new audit instance with PowerShell
- Adding a new audit instance with Containers
Then, the new Audit instance must be added to the Error instance's remotes collection:
- Updating the remotes collection with ServiceControl Management
- Updating the remotes collection with PowerShell
- Updating the remotes collection with Containers
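As a rough illustration of what the remotes collection contains, the sketch below builds a JSON value listing both audit instances. It assumes the setting takes a JSON array of objects with an `api_uri` key; that shape, the `build_remotes_value` helper, and the host names and ports are assumptions made for this example, so confirm the exact format in the deployment-specific pages linked above.

```python
import json


def build_remotes_value(audit_api_urls: list[str]) -> str:
    """Serialize Audit API URLs as a JSON array of {"api_uri": ...} objects.

    The "api_uri" key is an assumption about the setting's shape.
    """
    return json.dumps([{"api_uri": url} for url in audit_api_urls])


# Hypothetical host names and ports; substitute the real instance URLs.
remotes = build_remotes_value([
    "http://audit-old:44444/api",
    "http://audit-new:44445/api",
])
print(remotes)
```

While both instances are needed for queries, both URLs stay in the list. The old instance's entry is removed only when that instance is decommissioned.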
After this step the installation looks like this:
Although both ServiceControl Audit instances ingest messages from the audit queue, each message only ends up in a single instance. The ServiceControl Error instance queries both Audit instances transparently.
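The reason each message ends up in only one instance is that both instances act as competing consumers on the same queue. The toy simulation below illustrates the idea. The queue, the store names, and the round-robin hand-off are illustrative assumptions; a real message broker decides which consumer receives each message.

```python
from collections import deque
from itertools import cycle

# Hypothetical audit queue holding six messages.
audit_queue = deque(f"msg-{i}" for i in range(6))
stores: dict[str, list[str]] = {"audit-old": [], "audit-new": []}

# Round-robin is only a stand-in for the broker's delivery decisions.
for consumer in cycle(stores):
    if not audit_queue:
        break
    # Taking a message removes it from the queue, so no other consumer sees it.
    stores[consumer].append(audit_queue.popleft())

print(stores)
```

Every message is stored exactly once, so the union of both stores is the complete audit history and there are no duplicates to reconcile at query time.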
Disable audit queue ingestion on the old instance
Now that the new audit instance exists, the old audit instance must be configured so that it does not ingest any new audit data from the audit queue. This makes the old audit instance effectively read-only. The only reason it is not fully read-only is that it will continue to delete expired audit data that has passed the audit retention period.
- Disabling audit queue ingestion with ServiceControl Management
- Disabling audit queue ingestion with PowerShell
- Disabling audit queue ingestion with Containers
After this step the installation looks like this:
The ServiceControl Error instance continues to query both instances but the original Audit instance no longer reads new messages.
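Once ingestion is disabled, the newest message in the old instance dates from roughly the moment ingestion stopped, so the earliest point at which the instance can be empty is that moment plus the audit retention period. The sketch below works through that arithmetic. The function name, the date, and the 30-day retention period are hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def earliest_empty_time(ingestion_disabled_at: datetime, retention: timedelta) -> datetime:
    """Earliest time at which the last ingested message can have expired."""
    return ingestion_disabled_at + retention


# Hypothetical values: ingestion disabled on 1 March, 30-day retention period.
disabled_at = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)
retention = timedelta(days=30)

ready = earliest_empty_time(disabled_at, retention)
print(ready.isoformat())
```

This is only a lower bound. Expired data may be removed somewhat later than the computed time, so the instance should still be checked for remaining documents before it is removed.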
Decommission the old audit instance, when it is empty
As the original audit instance is no longer ingesting messages, it will be empty after the audit retention period has elapsed and can be removed. The following steps describe how to determine when an audit instance is empty:
- Access the database directly.
- Launch RavenDB Management Studio with a browser.
- If the instance is using RavenDB 3.5 for persistence, go to the <system> database. If the instance is using RavenDB 5, go to the audit database.
- Check the documents count in the ProcessedMessages collection.
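The decision that follows from the document count can be expressed as a small check. This is a sketch under stated assumptions: the `can_decommission` helper is hypothetical, and the counts are assumed to have been read by hand from RavenDB Management Studio rather than fetched through any particular API.

```python
def can_decommission(collection_counts: dict[str, int]) -> bool:
    """Return True when no ProcessedMessages documents remain.

    A missing collection is treated as empty, since a collection with no
    documents may not be listed at all.
    """
    return collection_counts.get("ProcessedMessages", 0) == 0


# Hypothetical counts noted down from the studio on two different days.
print(can_decommission({"ProcessedMessages": 1842}))
print(can_decommission({"ProcessedMessages": 0}))
```

Any non-zero count means unexpired audit data is still present, and removing the instance at that point would discard it.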
When the ProcessedMessages collection is empty, the audit instance can be decommissioned:
- Decommissioning the old audit instance using ServiceControl Management
- Decommissioning the old audit instance using PowerShell
- Decommissioning the old audit instance using Containers
After this step the installation looks like this:
At this point, the old Audit instance has been completely replaced by the new instance.