Alarms

Analytics alarms can trigger for many reasons: for example, when your ArcGIS Online credits fall below a minimum number, when storage on a critical server is running low, or when a server is slow to respond.

Each type of alarm has default settings that define the criteria that cause the alarm to trigger, the recipients who are notified when it triggers, and a script that can run when it triggers. These default settings apply to all alarms of that type, for example, all Maximum CPU Usage alarms. Alternatively, you can configure individual settings for that type of alarm on a specific resource; the individual settings then override the default settings.

For some alarms, you can specify iterations: the number of consecutive times the alarm criteria must be met before the alarm triggers.

All triggered alarms are recorded and displayed in various Analytics panels in Status and Trends so that you can view them. In addition to viewing alarm information in the panels, you can also set up a list of people to receive alarm notifications. After you have the recipient list, you can configure individual alarms to notify one or more recipients directly via email or a text message to their phone (SMS). You can set up one or more recipients to receive all alarm notifications or be very specific about configuring which recipients get notifications about specific alarms and resources. For more information, see Initial Configuration.

To receive email notifications, you must first configure settings for the SMTP email server used to send emails.
To receive SMS alarm notifications to phones, you must set up a Twilio SMS account for Analytics to use when sending text notifications.

Within the panels that list alarms, you can pause an alarm, or run an immediate check on any alarm. Windows Performance Alarms that are located on a Hub or agent server cannot be checked immediately but need to be queued for checking during the next interval (usually less than 15 minutes). Paused alarms continue to be monitored and information about the resource is still collected and displayed in Analytics panels, but no notifications are sent and no scripts are run while the paused state is in effect.

Default and Individual Alarms

Alarms in Analytics have default settings that affect all resources of the same type that use that kind of alarm. You can change the default alarm settings for any set of resources, for example, all Portals, all Servers, or all Essentials instances. These alarms are referred to as default alarms.

In addition to default alarms, each individual resource has its own Alarms tab. These alarms are set on a single, specific resource. These alarms are referred to as individual alarms.

How Alarms Work

When an alarm triggers, the following events occur:

Alarms and Parent/Child Resources

Common Alarm Settings

Most alarms in Analytics have settings that apply only to that type of alarm, but the following settings are found in all alarms:

Run a Script When an Alarm Triggers

You can run a Windows batch script when any alarm in Analytics triggers. For example, you could run a script to restart a service when a Request Failure alarm triggers on that service, or run a script to restart a server when a Server Unavailable alarm triggers.

All scripts run under the same Windows account that is used to run the Geocortex Core service. By default, this is the SYSTEM user account.

When the script runs, Analytics passes parameters into the batch script that provide information about the alarm that was triggered and the resource that the alarm belongs to.

You can reference the parameter values in the batch script using argument variables:

In the batch script, the parameters are passed in the following order:

  1. The date the alarm triggered (%~1)
  2. The display name of the resource in Analytics (%~2)
  3. The type of resource (%~3)
  4. The external link to the resource if available (%~4)
  5. The type of alarm (%~5)
  6. The value of the alarm if available (%~7)
  7. The number of times the alarm has triggered consecutively (%~8)
  8. The configuration of the alarm if available (%~9)
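
As an illustration of the argument variables listed above, the following sketch copies the first five parameters into named variables so that the rest of a script is easier to read. The variable names (ALARM_DATE, RESOURCE_NAME, and so on) and the "(no link)" placeholder are hypothetical choices for this example, not names that Analytics defines.

```bat
@echo off
setlocal

rem Copy the documented positional arguments into named variables.
rem The ~ modifier removes any surrounding double quotes from each value.
rem All variable names below are hypothetical, chosen for this sketch only.
set "ALARM_DATE=%~1"
set "RESOURCE_NAME=%~2"
set "RESOURCE_TYPE=%~3"
set "RESOURCE_LINK=%~4"
set "ALARM_TYPE=%~5"

rem The external link is only passed "if available", so it may be empty.
if not defined RESOURCE_LINK set "RESOURCE_LINK=(no link)"

rem Print a one-line summary (visible when the script is run by hand).
echo %ALARM_DATE% %ALARM_TYPE% "%RESOURCE_NAME%" %RESOURCE_TYPE% %RESOURCE_LINK%

endlocal
```

Running the script by hand from a command prompt with five quoted arguments is a simple way to check the mapping before selecting the script on an alarm.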

Example Script

For example, you could create a Windows batch script to log your own alarms.
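
A minimal sketch of such a logging script might look like the following. It assumes the log file should sit in the same folder as the script (the %~dp0 location is an assumption made for this example) and that the message wording matches the sample line shown in this section.

```bat
@echo off
rem Sketch only: append one line to alarm-log.txt each time an alarm triggers.
rem %~1 = date the alarm triggered, %~2 = resource display name,
rem %~4 = external link to the resource, %~5 = type of alarm.
rem %~dp0 expands to the folder that contains this script (assumed log location).
>>"%~dp0alarm-log.txt" echo %~1: An %~5 alarm triggered for resource '%~2' (%~4)
```

The sketch does not escape special characters, so a link that contains a character such as & would need additional handling.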

When Analytics runs this batch script, the output would be a file called alarm-log.txt and it would contain the following information:

Tuesday, 20 August 2019 17:43: An HttpFailure alarm triggered for resource 'ArcGIS Server Prod 1' (https://ags-prod1.domain.com/arcgis/rest/services)

Run a Script

  1. Write the script using the above parameters and save it to the following directory on the Analytics Hub server:

    [DataInstallDir]/Analytics.ResourceCollector/Scripts

  2. In Analytics, on the Alarms tab, click to open the alarm.

  3. In the Script to Run drop-down list, select the script to run, and then Save the alarm.

Alarm Icons

Alarm icons indicate the state of a resource.

Indicates: A critical error on the resource.
Notes: The resource is broken and not functioning.

Indicates: A critical error on a parent resource.
Notes: A parent of the resource has a critical error and is not functioning. For example, a Site will show this error if its Essentials instance is down. When a resource stops functioning, this warning shows on all its child resources regardless of what the last known state of the child resource was.

Indicates: The alarm has been paused.
Notes: A paused alarm means that notifications are not sent and scripts are not run. Alarm information continues to be gathered and displayed in Analytics.

Indicates: Data is not yet collected on the resource, or monitoring is disabled on this resource.
Notes: Either the resource is new and Analytics has not had time to collect data on it, or monitoring of this resource has been disabled.

Indicates: A warning on the resource.
Notes: The resource has a problem of some kind and is not working optimally. For example, it is responding slower than it should.

Indicates: A child of this resource has a critical error.
Notes: A child resource of this resource has a critical error. For example, an ArcGIS Server would show this error if one of its services had a critical error. This type of warning cascades upwards only a single level in the resource hierarchy.

Indicates: The resource has been stopped or disabled.
Notes: The resource cannot be monitored as it is not running.

Indicates: A healthy resource.
Notes: The resource has no problems and is being monitored.

© 2019 Latitude Geographics Group Ltd. All Rights Reserved.

Documentation Version 1.5