Replacing an Audit instance

ServiceControl, which exists to serve the management of distributed systems, is itself a distributed system. As a result, pieces of the system can be upgraded and managed separately.

This document describes in general terms how to replace a ServiceControl Audit instance, and links to more specific information on how to accomplish these tasks for each potential deployment method.

See Replacing an Error Instance for similar guidance for Error instances.

Overview

ServiceControl Audit instances store audit data for a configured period of time, after which expired audit data is removed. Using the ServiceControl remotes feature, multiple audit instances can store a portion of the overall audit data (sharding) which is queried in a scatter-gather fashion.
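The scatter-gather idea can be illustrated with a minimal in-memory sketch. This is not how ServiceControl implements it; `AuditRecord`, `scatter_gather`, and the two shard lists are hypothetical names introduced only to show that each shard answers the query for its own data and the partial results are merged into one view.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AuditRecord:
    """Hypothetical stand-in for a stored audit message."""
    message_id: str
    processed_at: datetime


def scatter_gather(shards, predicate):
    """Run the same query against every shard, then merge the partial results."""
    # Scatter: each shard filters only the records it holds.
    partial_results = [
        [record for record in shard if predicate(record)]
        for shard in shards
    ]
    # Gather: flatten the partial results and order them by processing time.
    merged = [record for part in partial_results for record in part]
    return sorted(merged, key=lambda record: record.processed_at)


# Two shards, each holding a portion of the overall audit data.
old_shard = [
    AuditRecord("a", datetime(2024, 1, 1)),
    AuditRecord("c", datetime(2024, 1, 3)),
]
new_shard = [AuditRecord("b", datetime(2024, 1, 2))]

results = scatter_gather([old_shard, new_shard], lambda record: True)
result_ids = [record.message_id for record in results]
print(result_ids)
```

The caller sees a single ordered result set and does not need to know which shard held which record.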

Using this capability, an Audit instance that can't be upgraded can be replaced without downtime. The process follows these steps:

Add a new audit instance as a remote
Disable audit queue ingestion on the old audit instance
Decommission the old audit instance when all audit information is expired

For scenarios where retaining audit message data is not required (e.g. transient data that does not merit the effort to retain), this process is not necessary: the audit instance can simply be deleted and recreated with the same name.

Initial state

Before doing anything, the deployment looks like this:

graph TD
  endpoints -- send errors to --> errorQ[Error Queue]
  endpoints -- send audits to --> auditQ[Audit Queue]
  errorQ -- ingested by --> sc[ServiceControl Error]
  auditQ -- ingested by --> sca[Original ServiceControl audit]
  sc -. connected to .-> sca
  sp[ServicePulse] -. connected to .-> sc
  si[ServiceInsight] -. connected to .-> sc
  classDef Endpoints fill:#00A3C4,stroke:#00729C,color:#FFFFFF
  classDef ServiceInsight fill:#878CAA,stroke:#585D80,color:#FFFFFF
  classDef ServicePulse fill:#409393,stroke:#205B5D,color:#FFFFFF
  classDef ServiceControlError fill:#A84198,stroke:#92117E,color:#FFFFFF,stroke-width:4px
  classDef ServiceControlRemote fill:#A84198,stroke:#92117E,color:#FFFFFF
  class endpoints Endpoints
  class si ServiceInsight
  class sp ServicePulse
  class sc ServiceControlError
  class sca ServiceControlRemote

Add a new audit instance

The first step is to create a new audit instance. The exact procedure depends on the deployment method.

Then, the new Audit instance must be added to the Error instance's remotes collection.
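As a rough sketch of what adding a remote amounts to, the snippet below appends an entry to a remotes collection held as JSON. It assumes the collection is a JSON array of objects with an `api_uri` field; that shape, the `add_remote` helper, and the example host names and ports are assumptions for illustration and should be checked against the configuration reference for the deployment method in use.

```python
import json


def add_remote(remotes_json: str, api_uri: str) -> str:
    """Return the remotes collection with api_uri appended, skipping duplicates."""
    # An empty setting is treated as an empty collection.
    remotes = json.loads(remotes_json) if remotes_json.strip() else []
    # Only append when no existing entry already points at this URI.
    if all(remote.get("api_uri") != api_uri for remote in remotes):
        remotes.append({"api_uri": api_uri})
    return json.dumps(remotes)


# Hypothetical addresses for the original and the new audit instance.
current = '[{"api_uri": "http://old-audit.example:44444/api"}]'
updated = add_remote(current, "http://new-audit.example:44445/api")
print(updated)
```

Keeping the original entry in the collection is what allows the Error instance to keep querying the old audit data while the new instance takes over.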

After this step the installation looks like this:

graph TD
  endpoints -- send errors to --> errorQ[Error Queue]
  endpoints -- send audits to --> auditQ[Audit Queue]
  errorQ -- ingested by --> sc[ServiceControl Error]
  auditQ -- ingested by --> sca[Original ServiceControl audit]
  auditQ -- ingested by --> sca2[New ServiceControl audit]
  sc -. connected to .-> sca
  sc -. connected to .-> sca2
  sp[ServicePulse] -. connected to .-> sc
  si[ServiceInsight] -. connected to .-> sc
  classDef Endpoints fill:#00A3C4,stroke:#00729C,color:#FFFFFF
  classDef ServiceInsight fill:#878CAA,stroke:#585D80,color:#FFFFFF
  classDef ServicePulse fill:#409393,stroke:#205B5D,color:#FFFFFF
  classDef ServiceControlError fill:#A84198,stroke:#92117E,color:#FFFFFF,stroke-width:4px
  classDef ServiceControlRemote fill:#A84198,stroke:#92117E,color:#FFFFFF
  class endpoints Endpoints
  class si ServiceInsight
  class sp ServicePulse
  class sc ServiceControlError
  class sca,sca2 ServiceControlRemote

Although both ServiceControl Audit instances ingest messages from the audit queue, each message only ends up in a single instance. The ServiceControl Error instance queries both Audit instances transparently.
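The reason each message lands in only one instance is that both instances consume from the same queue, so a message taken by one is no longer available to the other. The following toy model shows that property; the `ingest` function and the random choice of consumer are illustrative assumptions, not a description of the transport's actual delivery behavior.

```python
import random
from collections import deque


def ingest(audit_queue, instances, rng):
    """Drain the queue; every message is stored by exactly one instance."""
    while audit_queue:
        # Removing the message from the queue means no other consumer can see it.
        message = audit_queue.popleft()
        # Which instance wins a given message is arbitrary; modelled as a random pick.
        rng.choice(instances).append(message)


audit_queue = deque(f"msg-{i}" for i in range(10))
original_instance, new_instance = [], []
ingest(audit_queue, [original_instance, new_instance], random.Random(0))

print(len(original_instance) + len(new_instance))
```

No message is duplicated or lost, which is why querying both instances together yields the complete audit history.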

Disable audit queue ingestion on the old instance

Now that the new audit instance exists, the old audit instance must be configured so that it does not ingest any new audit data from the audit queue. This makes the old audit instance effectively read-only. It is not fully read-only only because it continues to delete expired audit data that has passed the audit retention period.
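A small model may help to show what "effectively read-only" means here: ingestion is refused, but retention cleanup keeps running. `AuditInstanceModel` and its methods are hypothetical and exist only for this illustration.

```python
from datetime import datetime, timedelta


class AuditInstanceModel:
    """Hypothetical model of an audit instance with a retention period."""

    def __init__(self, retention, ingest_enabled=True):
        self.retention = retention
        self.ingest_enabled = ingest_enabled
        self.records = {}  # message id -> time the message was processed

    def ingest(self, message_id, processed_at):
        """Store a message unless ingestion has been disabled."""
        if not self.ingest_enabled:
            return False
        self.records[message_id] = processed_at
        return True

    def expire(self, now):
        """Delete records older than the retention period; runs even when ingestion is off."""
        cutoff = now - self.retention
        expired = [mid for mid, ts in self.records.items() if ts < cutoff]
        for mid in expired:
            del self.records[mid]
        return len(expired)


start = datetime(2024, 1, 1)
old_instance = AuditInstanceModel(retention=timedelta(days=7))
old_instance.ingest("msg-1", start)

old_instance.ingest_enabled = False  # the "disable ingestion" step
accepted = old_instance.ingest("msg-2", start + timedelta(days=1))
removed = old_instance.expire(now=start + timedelta(days=8))
print(accepted, removed, len(old_instance.records))
```

With ingestion off and cleanup still active, the stored data can only shrink, so the instance eventually becomes empty.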

After this step the installation looks like this:

graph TD
  endpoints -- send errors to --> errorQ[Error Queue]
  endpoints -- send audits to --> auditQ[Audit Queue]
  errorQ -- ingested by --> sc[ServiceControl error]
  auditQ -- ingested by --> sca2[New ServiceControl audit]
  sc -. connected to .-> sca[Original ServiceControl audit]
  sc -. connected to .-> sca2
  sp[ServicePulse] -. connected to .-> sc
  si[ServiceInsight] -. connected to .-> sc
  classDef Endpoints fill:#00A3C4,stroke:#00729C,color:#FFFFFF
  classDef ServiceInsight fill:#878CAA,stroke:#585D80,color:#FFFFFF
  classDef ServicePulse fill:#409393,stroke:#205B5D,color:#FFFFFF
  classDef ServiceControlError fill:#A84198,stroke:#92117E,color:#FFFFFF,stroke-width:4px
  classDef ServiceControlRemote fill:#A84198,stroke:#92117E,color:#FFFFFF
  class endpoints Endpoints
  class si ServiceInsight
  class sp ServicePulse
  class sc ServiceControlError
  class sca,sca2 ServiceControlRemote

The ServiceControl Error instance continues to query both instances but the original Audit instance no longer reads new messages.

Decommission the old audit instance when it is empty

As the original audit instance is no longer ingesting messages, it will be empty after the audit retention period has elapsed and can be removed. The following steps describe how to determine when an audit instance is empty:

Access the database directly
Launch RavenDB Management Studio in a browser.
If the instance is using RavenDB 3.5 for persistence, go to the <system> database. If the instance is using RavenDB 5, go to the audit database.
Check the documents count in the ProcessedMessages collection.
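Before checking the database, it can be useful to estimate when the instance is expected to become empty. The sketch below does that arithmetic under the assumption that no record can outlive the moment ingestion was disabled plus the retention period; the function names and the example dates are hypothetical, and the document count itself must still be read from the database as described in the steps above.

```python
from datetime import datetime, timedelta


def earliest_empty_time(ingestion_disabled_at, retention):
    """The newest record was ingested no later than the disable time, so it expires by then plus retention."""
    return ingestion_disabled_at + retention


def can_decommission(processed_messages_count):
    """The instance is safe to remove only once no audit documents remain."""
    return processed_messages_count == 0


disabled_at = datetime(2024, 3, 1)
expected_empty = earliest_empty_time(disabled_at, timedelta(days=30))
print(expected_empty.date())
```

The estimate only indicates when to start checking; the actual document count remains the deciding criterion.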

When the ProcessedMessages collection is empty, the audit instance can be decommissioned.

After this step the installation looks like this:

graph TD
  endpoints -- send errors to --> errorQ[Error Queue]
  endpoints -- send audits to --> auditQ[Audit Queue]
  errorQ -- ingested by --> sc[ServiceControl error]
  auditQ -- ingested by --> sca2[New ServiceControl audit]
  sc -. connected to .-> sca2
  sp[ServicePulse] -. connected to .-> sc
  si[ServiceInsight] -. connected to .-> sc
  classDef Endpoints fill:#00A3C4,stroke:#00729C,color:#FFFFFF
  classDef ServiceInsight fill:#878CAA,stroke:#585D80,color:#FFFFFF
  classDef ServicePulse fill:#409393,stroke:#205B5D,color:#FFFFFF
  classDef ServiceControlError fill:#A84198,stroke:#92117E,color:#FFFFFF,stroke-width:4px
  classDef ServiceControlRemote fill:#A84198,stroke:#92117E,color:#FFFFFF
  class endpoints Endpoints
  class si ServiceInsight
  class sp ServicePulse
  class sc ServiceControlError
  class sca2 ServiceControlRemote

At this point, the old Audit instance has been completely replaced by the new instance.
