AutoAlarm provides out-of-the-box monitoring with sensible defaults while allowing full customization through resource tags. In addition to default alarms,
AutoAlarm allows operations teams to customize alarms and monitoring when necessary using a simple tagging strategy.
Managing AutoAlarm
To enable AutoAlarm for a service instance, tag the instance as follows:

| Tag Key | Tag Value | Result |
| --- | --- | --- |
| autoalarm:enabled | true | Enables AutoAlarm alarm management for the resource and creates all default alarms. Required to use AutoAlarm. |
| autoalarm:enabled | false | Deletes all AutoAlarm-managed alarms (both default and custom). Alternatively, the tag can simply be removed. |
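
As a minimal sketch, the enabling tag can be expressed as a key/value pair in the shape that AWS tagging APIs generally accept. The helper name `autoalarm_enabled_tag` is hypothetical and is not part of AutoAlarm.

```python
def autoalarm_enabled_tag(enabled: bool) -> dict:
    """Build the autoalarm:enabled tag as a Key/Value pair."""
    # Tag values are strings, so the boolean is rendered as "true" or "false".
    return {"Key": "autoalarm:enabled", "Value": "true" if enabled else "false"}


if __name__ == "__main__":
    print(autoalarm_enabled_tag(True))
```

With boto3, a list of such pairs could be passed as the `Tags` argument of the EC2 client's `create_tags` call, together with `Resources=[instance_id]`. Other services use their own tagging calls.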
Overriding Default Alarm Values with Tags
Each alarm configuration supported by AutoAlarm has a default configuration. Furthermore, each service has alarms that are automatically included by default
any time the autoalarm:enabled tag is set to true. In scenarios where a user needs to change the default values on the default alarms or enable alarms
that are not included by default, these alarms can be configured using a tagging schema with specific tag keys and values as defined below:
Tag Value Structure
Each tag value consists of 8 parameters separated by /:
Supported Services and Default Alarm Configurations
A threshold value of '-' is undefined, and no alarm is created for that threshold (Warning or Critical). If neither the warning nor the critical
threshold value is provided in the tag value when setting the tag on the resource, no alarm will be created.
When setting up non-default alarms with tags, you must provide at least one of the first two values (warning and critical
thresholds) for the tag to function correctly if the default thresholds do not contain values. Otherwise, these alarms
will not be created.
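
The threshold rules above can be sketched as a small resolution step. This is an illustration only, not AutoAlarm's implementation: the function names are hypothetical, and it assumes an empty tag position falls back to the default while '-' means "no alarm for this threshold".

```python
from typing import Dict, Optional

NULLISH = "-"


def resolve_threshold(tag_value: Optional[str], default: str) -> Optional[float]:
    """Return the effective threshold, or None when no alarm should be created.

    tag_value: threshold taken from the tag ("" or None means "not provided").
    default:   documented default for this alarm ("-" means undefined).
    """
    chosen = tag_value if tag_value not in (None, "") else default
    if chosen == NULLISH:
        return None
    return float(chosen)


def alarms_to_create(
    tag_warning: Optional[str],
    tag_critical: Optional[str],
    default_warning: str,
    default_critical: str,
) -> Dict[str, float]:
    """Map each severity that ends up with a threshold to that threshold."""
    resolved = {
        "warning": resolve_threshold(tag_warning, default_warning),
        "critical": resolve_threshold(tag_critical, default_critical),
    }
    return {name: value for name, value in resolved.items() if value is not None}


if __name__ == "__main__":
    # Non-default alarm (both defaults undefined) with only a warning threshold set.
    print(alarms_to_create("80", "", "-", "-"))
```

An empty result means no alarm is created, which matches the case where the defaults are undefined and the tag supplies neither threshold.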
Prometheus alarms read only the warning threshold, the critical threshold, and the period from the tags. All other values are specific
to CloudWatch alarms and are not used in Prometheus alarms.
Alarm Types
Static Threshold Alarms
Trigger when metrics cross fixed values
Best for metrics with consistent, predictable ranges
Anomaly Detection Alarms
Trigger when metrics deviate from historical patterns
Use tag names containing 'anomaly'
Threshold values represent standard deviations from the baseline
Supported Tag Values
Threshold Configuration
| Parameter | Static Threshold Alarms | Anomaly Detection Alarms |
| --- | --- | --- |
| Warning Threshold | Numeric value that triggers warning (e.g., 80 for 80% CPU) | Number of standard deviations from baseline (e.g., 2) |
| Critical Threshold | Numeric value that triggers critical alert (e.g., 95 for 95% CPU) | Number of standard deviations from baseline (e.g., 3) |
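
| Parameter | Description | Valid Values | Example |
| --- | --- | --- | --- |
| Datapoints to Alarm | Number of breaching data points required to trigger alarm | Any positive integer. Must be equal to or less than the evaluation periods | 2 |
| Evaluation Periods | Total evaluation periods to consider | Any positive integer | 3 |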
Understanding Datapoints vs Periods
| Scenario | Period | Datapoints to Alarm | Number of Periods | Result |
| --- | --- | --- | --- | --- |
| Quick Response | 60s | 1 | 1 | Alarm triggers after 1 breach in 1 minute |
| Sustained Issue | 300s | 2 | 3 | Alarm triggers when 2 out of 3 five-minute periods breach |
| Highly Tolerant | 60s | 5 | 10 | Alarm triggers when 5 out of 10 one-minute periods breach |
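
The "M out of N" behavior in these scenarios can be sketched as a simple window check. This is a simplified model under stated assumptions: it ignores missing data and the finer details of how CloudWatch evaluates alarms, and the name `should_alarm` is hypothetical.

```python
from typing import Sequence


def should_alarm(
    breaches: Sequence[bool],
    datapoints_to_alarm: int,
    evaluation_periods: int,
) -> bool:
    """Return True when enough of the most recent periods are breaching.

    breaches: one flag per period, oldest first (True = threshold breached).
    """
    if not 1 <= datapoints_to_alarm <= evaluation_periods:
        raise ValueError("datapoints_to_alarm must be between 1 and evaluation_periods")
    # Only the most recent `evaluation_periods` periods are considered.
    window = breaches[-evaluation_periods:]
    return sum(window) >= datapoints_to_alarm


if __name__ == "__main__":
    # "Sustained Issue": 2 of the last 3 five-minute periods breach.
    print(should_alarm([True, False, True], datapoints_to_alarm=2, evaluation_periods=3))
```

Raising the number of periods while keeping the datapoints requirement low makes the alarm tolerant of gaps between breaches. Setting both to 1 reacts to a single breach.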
Statistic:
Note: AWS limits the characters allowed in the statistic value. You cannot use spaces, '%', '(', or ')'.
Each statistic must be the statistic name followed by a number, or by two numbers separated by a colon. For example, p95 or TM2:98.

| Statistic | Examples | Description |
| --- | --- | --- |
| Percentile | p95 | Value below which a percentage of data falls (e.g., p95 = 95% of data is below this value) |
| Trimmed Mean | tm90, TM2:98, TM150:1000 | Mean after excluding values outside boundaries. Can use percentages or absolute values |
| Interquartile Mean | IQM | Trimmed mean of middle 50% of values (equivalent to TM25:75) |
| Winsorized Mean | wm98, WM10:90 | Mean with outliers capped to boundary values instead of excluded |
| Percentile Rank | PR:300, PR100:2000 | Percentage of values meeting a threshold (exclusive lower, inclusive upper) |
| Trimmed Count | tc90, TC0.005:0.030 | Number of data points within trimmed mean boundaries |
| Trimmed Sum | ts90, TS80: | Sum of data points within trimmed mean boundaries (TM × TC) |
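
To illustrate the difference between excluding and capping outliers, here is a minimal sketch of a trimmed mean and a winsorized mean using absolute boundaries. It assumes both boundaries are inclusive, which may differ from how CloudWatch treats the edges, and it does not cover the percentage-based forms such as tm90. The function names are hypothetical.

```python
from typing import Sequence


def trimmed_mean(values: Sequence[float], lower: float, upper: float) -> float:
    """Mean of the values that fall inside [lower, upper]; the rest are dropped."""
    kept = [v for v in values if lower <= v <= upper]
    if not kept:
        raise ValueError("no values fall inside the boundaries")
    return sum(kept) / len(kept)


def winsorized_mean(values: Sequence[float], lower: float, upper: float) -> float:
    """Mean after capping every value to the nearest boundary instead of dropping it."""
    if not values:
        raise ValueError("values must not be empty")
    capped = [min(max(v, lower), upper) for v in values]
    return sum(capped) / len(capped)


if __name__ == "__main__":
    samples = [1, 5, 6, 7, 100]
    print(trimmed_mean(samples, 5, 10))     # outliers 1 and 100 are excluded
    print(winsorized_mean(samples, 5, 10))  # outliers become 5 and 10
```

The trimmed mean reflects only the in-range points, while the winsorized mean still counts every data point and so is pulled slightly toward the boundaries.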
Missing Data Treatment
| Tag Value | Behavior |
| --- | --- |
| missing | Data point is missing |
| ignore | Current alarm state maintained |
| breaching | Treated as threshold breach |
| notBreaching | Treated as within threshold |
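
The effect of each treatment on a gap in the data can be sketched as follows. This is a simplified illustration, not CloudWatch's evaluation logic: it assumes a "greater than or equal" comparison, and it maps both `missing` and `ignore` to "not counted" because the difference between them concerns how the alarm state is handled, which this sketch does not model. The function name is hypothetical.

```python
from typing import List, Optional, Sequence

TREATMENTS = ("missing", "ignore", "breaching", "notBreaching")


def classify_datapoints(
    values: Sequence[Optional[float]],
    threshold: float,
    treatment: str,
) -> List[Optional[bool]]:
    """Map each datapoint to True (breach), False (within threshold), or None (not counted).

    A value of None in `values` stands for a missing datapoint.
    """
    if treatment not in TREATMENTS:
        raise ValueError(f"unknown missing data treatment: {treatment}")
    result: List[Optional[bool]] = []
    for value in values:
        if value is not None:
            result.append(value >= threshold)  # assumed comparison for this sketch
        elif treatment == "breaching":
            result.append(True)
        elif treatment == "notBreaching":
            result.append(False)
        else:
            result.append(None)  # "missing" and "ignore": the gap is not counted
    return result


if __name__ == "__main__":
    print(classify_datapoints([90, None, 10], threshold=80, treatment="breaching"))
```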
Valid Comparison Operators
*Note: Use a comparison operator that is valid for the alarm type. Static threshold alarms and anomaly detection alarms accept different operators.
| Alarm Type | Comparison Operator | Description |
| --- | --- | --- |
| Static Threshold | GreaterThanOrEqualToThreshold | Alarm when metric ≥ threshold |
| Static Threshold | GreaterThanThreshold | Alarm when metric > threshold |
| Static Threshold | LessThanThreshold | Alarm when metric < threshold |
| Static Threshold | LessThanOrEqualToThreshold | Alarm when metric ≤ threshold |
| Anomaly Detection | GreaterThanUpperThreshold | Alarm when metric exceeds upper band |
| Anomaly Detection | LessThanLowerOrGreaterThanUpperThreshold | Alarm when metric is outside the band (either direction) |
| Anomaly Detection | LessThanLowerThreshold | Alarm when metric falls below lower band |
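
A check along these lines could catch a mismatched operator before a tag is applied. It is a sketch only: it assumes the alarm type is inferred from whether the tag key contains 'anomaly', and the function names are hypothetical rather than part of AutoAlarm.

```python
import operator

# Static threshold operators mapped to the comparison they describe.
STATIC_OPERATORS = {
    "GreaterThanOrEqualToThreshold": operator.ge,
    "GreaterThanThreshold": operator.gt,
    "LessThanThreshold": operator.lt,
    "LessThanOrEqualToThreshold": operator.le,
}

ANOMALY_OPERATORS = {
    "GreaterThanUpperThreshold",
    "LessThanLowerOrGreaterThanUpperThreshold",
    "LessThanLowerThreshold",
}


def is_valid_operator(tag_key: str, comparison_operator: str) -> bool:
    """Check the operator against the alarm type implied by the tag key."""
    if "anomaly" in tag_key:
        return comparison_operator in ANOMALY_OPERATORS
    return comparison_operator in STATIC_OPERATORS


def static_breach(value: float, threshold: float, comparison_operator: str) -> bool:
    """Evaluate a static threshold comparison for a single metric value."""
    return STATIC_OPERATORS[comparison_operator](value, threshold)


if __name__ == "__main__":
    print(is_valid_operator("autoalarm:4xx-errors-anomaly", "GreaterThanThreshold"))
    print(static_breach(95, 95, "GreaterThanOrEqualToThreshold"))
```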
Using the Nullish Character ("-") and Implicit Values in AutoAlarm
AutoAlarm supports shorthand notation to simplify tag configuration:
Key Concepts
Nullish Character (-): Disables alarm creation for warning or critical thresholds when used in place of a value.
Implicit Values: Omit values you don't want to change from the defaults.
Use empty positions (//) to skip to later parameters while keeping defaults for earlier ones.
Examples
*Note: When using implicit values, ensure that each implicit parameter leading up to the custom parameter is properly separated by a /. See Tag Value Structure.
Empty positions between slashes (//) preserve the default values for those parameters while allowing you to customize later parameters.
Warning alarm disabled with -, critical alarm customized

| Tag | Value | Description |
| --- | --- | --- |
| autoalarm:memory | -/- | Both alarms disabled (useful for overriding default alarms) |
| autoalarm:4xx-errors | //3/Minimum///notBreaching | Only period (3) and statistic (Minimum) customized, uses defaults for thresholds |
| autoalarm:5xx-errors | -/73////3 | Warning disabled, critical threshold=73, datapoints=7, other values from defaults |
| autoalarm:4xx-errors-anomaly | 3/-/ | Warning threshold=3, critical alarm disabled, remaining values from defaults |
| autoalarm:network-in-anomaly | / | Creates a non-default alarm with default values. Useful shorthand for deploying non-default alarms with defaults. |
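
The shorthand can be sketched as a positional split that labels each of the 8 slots. This is an illustration, not AutoAlarm's parser: it deliberately avoids naming the slots, it assumes trailing positions may be omitted, and it only reports '-' as nullish without deciding what that means outside the two threshold positions. The function name is hypothetical.

```python
from typing import List

SLOT_COUNT = 8  # a tag value has 8 parameters separated by "/"


def classify_slots(tag_value: str) -> List[str]:
    """Label each positional slot as 'default', 'nullish', or 'custom:<value>'."""
    parts = tag_value.split("/")
    if len(parts) > SLOT_COUNT:
        raise ValueError(f"expected at most {SLOT_COUNT} parameters, got {len(parts)}")
    # Omitted trailing parameters are treated like empty positions.
    parts += [""] * (SLOT_COUNT - len(parts))
    slots = []
    for part in parts:
        part = part.strip()
        if part == "":
            slots.append("default")   # implicit value: keep the default
        elif part == "-":
            slots.append("nullish")   # documented for the warning/critical thresholds
        else:
            slots.append(f"custom:{part}")
    return slots


if __name__ == "__main__":
    print(classify_slots("3/-/"))
```

Because empty positions map to "default", a value of `/` yields eight default slots, which matches the shorthand for creating a non-default alarm with default values.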
ReAlarm Tag Configuration and Behavior:
Overview
The ReAlarm function is an AWS Lambda-based handler designed to monitor and reset CloudWatch alarms that are in an
"ALARM" state. It is an optional part of the AutoAlarm system, aimed at ensuring alarms are not missed or ignored.
Default Values
By default, the ReAlarm function is enabled. When ReAlarm is enabled, it runs on a default schedule of every 120 minutes.
Configure ReAlarm Behavior with Tags
ReAlarm's behavior can be configured on a per-alarm basis using tags.
Customize ReAlarm Schedule:
The ReAlarm schedule by default runs every 120 minutes.
ReAlarm can be customized to run at different intervals on a per-alarm basis by setting the autoalarm:re-alarm-minutes
tag to a whole number value.
Disable ReAlarm for a Resource:
Alarms can be tagged with autoalarm:re-alarm-enabled=false to exclude them from the ReAlarm process.
When this tag is present on an alarm, ReAlarm will skip resetting it, regardless of its state.
This is useful for alarms that should be managed manually or have specific conditions that should not trigger ReAlarm.
Example
| Tag | Value | Description |
| --- | --- | --- |
| autoalarm:re-alarm-enabled | false | Disable ReAlarm for this alarm |
| autoalarm:re-alarm-minutes | 30, 60, 240 | Custom reset interval (minutes) |
Special Note:
ReAlarm is hardcoded to NOT reset alarms associated with AutoScaling actions. This is to prevent the function from
interfering with scaling operations.
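
The ReAlarm rules described in this section can be summarized in a short decision sketch. It is not the Lambda's actual code: the function names are hypothetical, and it assumes AutoScaling actions can be recognized by the `arn:aws:autoscaling:` prefix on the alarm's action ARNs.

```python
from typing import Dict, Iterable

DEFAULT_REALARM_MINUTES = 120  # default schedule stated above


def realarm_interval_minutes(tags: Dict[str, str]) -> int:
    """Return the per-alarm interval, falling back to the default schedule."""
    raw = tags.get("autoalarm:re-alarm-minutes")
    if raw is None:
        return DEFAULT_REALARM_MINUTES
    return int(raw)  # the tag is expected to hold a whole number


def should_realarm(state: str, tags: Dict[str, str], alarm_actions: Iterable[str]) -> bool:
    """Decide whether ReAlarm would reset this alarm."""
    if state != "ALARM":
        return False
    # ReAlarm is enabled unless the alarm is explicitly tagged as disabled.
    if tags.get("autoalarm:re-alarm-enabled", "true").lower() == "false":
        return False
    # Assumption: alarms that drive AutoScaling are detected by action ARN prefix.
    if any(arn.startswith("arn:aws:autoscaling:") for arn in alarm_actions):
        return False
    return True


if __name__ == "__main__":
    tags = {"autoalarm:re-alarm-minutes": "30"}
    print(realarm_interval_minutes(tags), should_realarm("ALARM", tags, []))
```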
Additional References:
For deployment and installation instructions, see DEPLOYMENT.md.
For a more thorough breakdown of the design and architecture, see ARCHITECTURE.MD.