This example demonstrates how to set up CloudWatch alarms and SNS notifications to monitor runner health and get notified of issues.
- How to create CloudWatch alarms for failed runner starts
- How to set up email notifications for failed runner image builds
- How to monitor runner health and get notified of issues
There are two critical things to monitor:
- Failed runner starts: When runners fail to start, jobs may sit and wait. Use `GitHubRunners.metric_failed()` to get a metric for the number of failed runner starts and create an alarm.
- Failed runner image builds: Runner images are rebuilt every week by default. Failed builds mean you'll get stuck with out-of-date software, which may lead to security vulnerabilities or slower runner start-ups. Use `GitHubRunners.failed_image_builds_topic()` to get an SNS topic that gets notified of failed runner image builds.
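The two monitors above can be wired up with a small helper. This is a minimal sketch, not the example's actual code: the function name `add_runner_monitoring`, the alarm ID, the threshold of 1, and the single evaluation period are all assumptions chosen for illustration. The helper takes the subscription as an argument so it does not depend on a particular notification channel.

```python
from typing import Any


def add_runner_monitoring(scope: Any, runners: Any, subscription: Any) -> Any:
    """Attach a failed-start alarm and a failed-image-build subscription.

    scope        -- the Stack (or other construct) that will own the alarm
    runners      -- a GitHubRunners instance
    subscription -- an SNS topic subscription, for example
                    aws_cdk.aws_sns_subscriptions.EmailSubscription(
                        "your-email@example.com")
    """
    # Alarm once at least one runner fails to start.
    # The ID, threshold, and period count are illustrative values only.
    alarm = runners.metric_failed().create_alarm(
        scope,
        "FailedRunnersAlarm",
        threshold=1,
        evaluation_periods=1,
        alarm_description="A GitHub Actions runner failed to start",
    )

    # Notify the subscriber whenever a runner image build fails.
    runners.failed_image_builds_topic().add_subscription(subscription)

    return alarm
```

Inside a stack's `__init__`, this would be called as `add_runner_monitoring(self, runners, subscription)`. The alarm is returned so that alarm actions can be attached to it later if needed.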
Other useful metrics to track:
- `GitHubRunners.metric_started()` - number of runners started
- `GitHubRunners.metric_stopped()` - number of runners stopped
- `GitHubRunners.metric_running()` - current number of running runners
- `GitHubRunners.metric_job_completed()` - number of completed jobs broken down by labels and job success
- `GitHubRunners.metric_time()` - total time a runner is running (includes the overhead of starting the runner)
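As a rough sketch of how these optional metrics might be gathered in one place, the helper below collects them into a dictionary. The function name `collect_runner_metrics` and the dictionary keys are hypothetical labels for this illustration; only the `metric_*` method names come from the list above.

```python
from typing import Any, Dict


def collect_runner_metrics(runners: Any) -> Dict[str, Any]:
    """Return the optional GitHubRunners metrics keyed by a display name.

    runners -- a GitHubRunners instance
    The keys are illustrative labels, not names defined by the library.
    """
    return {
        "Runners started": runners.metric_started(),
        "Runners stopped": runners.metric_stopped(),
        "Runners running": runners.metric_running(),
        "Jobs completed": runners.metric_job_completed(),
        "Runner time": runners.metric_time(),
    }
```

The values can then be plotted, for example by passing a list of them as the `left` argument of `aws_cdk.aws_cloudwatch.GraphWidget` and adding the widget to a dashboard.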
After deploying:
- CloudWatch will monitor your runners and trigger alarms when failures occur
- You'll receive email notifications when runner image builds fail
- Replace `your-email@example.com` with your actual email address before deploying
- Important: Update the email address in the code (`your-email@example.com`) to your actual email
- Deploy the stack: `cdk deploy`
- Check your email and confirm the SNS subscription (you'll receive a confirmation email)
- Follow the setup instructions in the main README.md to configure GitHub integration
- Optionally, uncomment the code to add email notifications to the failed runners alarm as well
- Use the `codebuild` label in your workflows