ServiceControl Error instances

Component: ServiceControl

ServiceControl instances collect and analyze data about the endpoints that make up a system and the messages flowing between them. This data is exposed to ServiceInsight and ServicePulse via an HTTP API, and to other consumers via external integration events.

The ServiceControl HTTP API is designed for use by ServicePulse and ServiceInsight only and may change at any time. Use of this HTTP API for other purposes is discouraged.

In versions of ServiceControl prior to 4.13.0, saga audit plugin data can only be processed by the ServiceControl Error instance using the input queue. Starting with version 4.13.0, saga audit plugin data can also be processed by a ServiceControl audit instance using the audit queue. The latter approach is recommended.

All endpoints in the system should be configured to send a copy of every message that is processed to a central audit queue. A ServiceControl instance consumes the messages from the audit queue and makes them available for visualization in ServiceInsight. If required, the messages may also be forwarded to an audit log queue for further processing.

In ServiceControl version 4 and above, messages in the audit queue are consumed by one or more separate ServiceControl Audit instances. The ServiceControl Error instance is configured to aggregate data from all connected ServiceControl Audit instances.

All endpoints in the system should be configured to send failed messages to a central error queue after those messages have exhausted immediate and delayed retries. A ServiceControl instance consumes the messages from the error queue and makes them available for manual retries in ServicePulse and ServiceInsight. If required, the messages may also be forwarded to an error log queue for further processing.
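Forwarding can be enabled per instance. As a hedged sketch, assuming the settings are added to the instance's ServiceControl.exe.config file, the audit and error forwarding settings might look like this:

```xml
<appSettings>
  <!-- Forward ingested audit messages to the audit log queue -->
  <add key="ServiceControl/ForwardAuditMessages" value="true" />
  <!-- Forward ingested error messages to the error log queue -->
  <add key="ServiceControl/ForwardErrorMessages" value="true" />
</appSettings>
```

Exact key names and accepted values should be confirmed against the configuration documentation for the installed version.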

An endpoint may also have plugins installed which collect and send data to a ServiceControl instance. The heartbeat plugin detects which endpoint instances are running and which are offline. The custom checks plugin sends user-defined health reports to ServiceControl on a regular schedule. The saga audit plugin enriches audit messages with the details of saga state changes, for visualization in ServiceInsight.

All ServiceControl instances publish external integration events which may be subscribed to by any endpoint.

All ServiceControl instances store data in an embedded database. Audit data is retained for 30 days by default. Failed message data is retained until the message is successfully retried or manually deleted. Both retention periods are configurable.
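The retention periods are controlled by instance settings. A minimal sketch, assuming the audit retention setting is applied to the audit instance's configuration file and the error retention setting to the error instance's, with values in d.hh:mm:ss format:

```xml
<appSettings>
  <!-- Audit instance: keep audit data for 7 days instead of the default 30 -->
  <add key="ServiceControl.Audit/AuditRetentionPeriod" value="7.00:00:00" />
</appSettings>

<appSettings>
  <!-- Error instance: keep failed message data for 30 days -->
  <add key="ServiceControl/ErrorRetentionPeriod" value="30.00:00:00" />
</appSettings>
```

The exact setting names and value formats should be verified against the configuration documentation for the installed version.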

Each environment should have a single audit queue and a single error queue that all endpoints are configured to use. Each environment should have at least one ServiceControl instance that is connected to its audit and error queues. The planning documentation should be consulted before creating a new ServiceControl instance.

Self-monitoring via custom checks

ServiceControl includes some basic self-monitoring implemented as custom checks. These checks are reported in ServicePulse along with other custom checks.

MSMQ transactional dead letter queue

A machine running MSMQ has a single transactional dead letter queue. Messages that cannot be delivered to queues located on remote machines are eventually moved to the transactional dead letter queue. ServiceControl monitors the transactional dead letter queue on the machine it is installed on. The presence of messages in this queue may indicate problems delivering messages for retries.

Azure Service Bus staging dead letter queue

Every Azure Service Bus queue has an associated dead letter queue. When ServiceControl sends a message for retry, it uses a staging queue. ServiceControl monitors the dead letter queue associated with the staging queue. The presence of messages in the dead letter queue indicates problems delivering messages for retries.

Failed imports

When ServiceControl is unable to ingest an audit or error message, an error is logged and the message is stored separately. ServiceControl monitors these messages. For more information, see re-importing failed messages.

Error message ingestion process

When ServiceControl has difficulty connecting to the configured transport, the error message ingestion process is shut down for sixty seconds. These shutdowns are monitored. The time to wait before restarting the error ingestion process is controlled by the ServiceControl/TimeToRestartErrorIngestionAfterFailure setting.
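The restart delay can be adjusted via the setting named above. A hedged example, assuming the key is added to the error instance's configuration file and accepts a timespan in hh:mm:ss format:

```xml
<appSettings>
  <!-- Wait two minutes before restarting error ingestion, instead of the default sixty seconds -->
  <add key="ServiceControl/TimeToRestartErrorIngestionAfterFailure" value="00:02:00" />
</appSettings>
```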

Message database storage space

ServiceControl stores messages in an embedded database. If the drive containing the database runs out of storage space, message ingestion fails and the ServiceControl instance stops. This may cause instability in the database, even after storage space is increased. The remaining storage space on the drive is monitored, and the check reports a failure if the drive has less than 20% of its total capacity remaining. This threshold is controlled by the ServiceControl/DataSpaceRemainingThreshold setting.
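The warning threshold can be changed via the setting named above. A sketch, assuming the value is a whole-number percentage of total drive capacity:

```xml
<appSettings>
  <!-- Report a failing check when less than 25% of the drive's capacity remains (default: 20) -->
  <add key="ServiceControl/DataSpaceRemainingThreshold" value="25" />
</appSettings>
```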

Critical message database storage space

This check is similar to the message database storage space check. However, in this case, if the drive containing the database has less than 5% of its total capacity remaining, message ingestion on the ServiceControl instance is stopped to prevent data loss, and a failure is reported. This threshold is controlled by the ServiceControl/MinimumStorageLeftRequiredForIngestion setting (for the error instance) and the ServiceControl.Audit/MinimumStorageLeftRequiredForIngestion setting (for the audit instance).
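The critical threshold can also be changed. A hedged sketch, assuming each key goes in the configuration file of the corresponding instance and takes a whole-number percentage:

```xml
<appSettings>
  <!-- Error instance: stop ingestion when less than 10% of capacity remains (default: 5) -->
  <add key="ServiceControl/MinimumStorageLeftRequiredForIngestion" value="10" />
</appSettings>

<appSettings>
  <!-- Audit instance: the equivalent setting uses the ServiceControl.Audit prefix -->
  <add key="ServiceControl.Audit/MinimumStorageLeftRequiredForIngestion" value="10" />
</appSettings>
```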
