BetterStack: Resilient Website Monitoring System

Introduction

BetterStack is a resilient website monitoring system built to track service availability, measure response times, store health-check results, and surface incidents through a modern dashboard. The project is organized as a multi-part system with:

a Rust backend workspace for high-performance service components
a TurboRepo-based frontend and JavaScript service layer for dashboard, API, queue producers, and workers
PostgreSQL for persistent monitoring data
Redis Streams for decoupled job delivery between producers and workers

The system continuously checks whether registered websites are up or down, records each result as a tick, and presents the data through a frontend dashboard for users and operators.

Motivation

In June 2022, a major Cloudflare outage [1] affected thousands of websites and services globally. A key lesson from such incidents is simple: if the monitoring platform depends on the same failing infrastructure, the service and its monitoring can go down together. In that situation:

No alerts
No notifications
No visibility

This project is motivated by the need to build a monitoring platform that remains useful during service disruption. Instead of relying on a single tightly coupled component, BetterStack uses a queue-based architecture with dedicated workers, persistent storage, and independent health checks to improve resilience and observability.

Problem Statement

Modern applications rely on cloud platforms, APIs, databases, and distributed infrastructure. When one service fails:

businesses lose revenue
users lose trust
engineers respond late without timely signals

Many teams depend on third-party monitoring platforms, but those systems can also fail or become unreachable. BetterStack addresses this with a monitoring workflow that continuously collects availability data, dispatches checks asynchronously through a queue, and stores results for later alerting, analysis, and incident visibility.

Objectives

Monitor websites and service endpoints 24/7
Detect downtime and latency degradation quickly
Store health-check history for analysis
Support worker-based distributed monitoring
Separate data production from processing using Redis Streams
Provide a frontend dashboard for website, status, and incident visibility
Reduce single points of failure in the monitoring pipeline

System Overview

BetterStack currently contains two major implementation areas:

1. Rust Backend Workspace

Located in BetterStack_Rust/, this workspace contains Rust services focused on performance, persistence, and worker processing.

api: HTTP API built with Poem
store: shared database layer using Diesel and PostgreSQL
worker: background monitor that consumes Redis Stream jobs, sends HTTP requests, and stores ticks
redis: shared Redis Stream helpers
pusher: queue-oriented service scaffold for dispatching work

2. TurboRepo Frontend and JS Services

Located in BetterStack_turbo/, this monorepo contains the dashboard frontend and TypeScript/Bun service layer.

apps/my-app: Next.js frontend dashboard
apps/api: Express API for auth, websites, alerts, and status endpoints
apps/pusher: producer that pushes website jobs into Redis Streams
apps/worker: worker that consumes jobs and stores ticks
packages/store: shared Prisma client and schema
packages/redis-stream: Redis utility package
packages/ui: reusable UI components

Backend Architecture

Rust Backend

The Rust implementation is designed around performance and reliability.

Poem is used for HTTP server and routing in the Rust API.
Tokio powers asynchronous execution and worker concurrency.
Diesel handles PostgreSQL ORM and schema management.
Reqwest is used by workers to perform endpoint checks.
Redis Streams decouple website scheduling from processing.

Current Rust flow:

A website is registered through the API.
Website data is stored in PostgreSQL.
Jobs are placed into Redis Streams.
Rust workers consume pending jobs.
Each worker sends an HTTP request to the target endpoint.
The worker records response time and status (Up, Down, or Unknown) in the database.
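
The last two steps of this flow can be sketched as a small pure function. This is a minimal illustration, not the project's worker code: the `CheckOutcome` and `Tick` types are hypothetical, and the rule that 2xx/3xx responses count as Up, other responses and transport errors count as Down, and checks that never ran count as Unknown is an assumption made for the example.

```rust
use std::time::Duration;

/// Status values named in the flow above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WebsiteStatus {
    Up,
    Down,
    Unknown,
}

/// Hypothetical result of one HTTP check, independent of any HTTP client.
#[derive(Debug, Clone, Copy)]
enum CheckOutcome {
    /// The endpoint answered with an HTTP status code.
    Responded { status_code: u16, elapsed: Duration },
    /// The request failed in transit (timeout, refused connection, DNS error).
    TransportError { elapsed: Duration },
    /// The check could not be attempted, for example because the URL was malformed.
    NotAttempted,
}

/// Hypothetical tick record: what a worker would persist for one check.
#[derive(Debug, PartialEq, Eq)]
struct Tick {
    status: WebsiteStatus,
    response_time_ms: u64,
}

/// Assumed mapping: 2xx/3xx is Up, any other response or a transport error
/// is Down, and a check that never ran is Unknown.
fn to_tick(outcome: CheckOutcome) -> Tick {
    match outcome {
        CheckOutcome::Responded { status_code, elapsed } => Tick {
            status: if (200..400).contains(&status_code) {
                WebsiteStatus::Up
            } else {
                WebsiteStatus::Down
            },
            response_time_ms: elapsed.as_millis() as u64,
        },
        CheckOutcome::TransportError { elapsed } => Tick {
            status: WebsiteStatus::Down,
            response_time_ms: elapsed.as_millis() as u64,
        },
        CheckOutcome::NotAttempted => Tick {
            status: WebsiteStatus::Unknown,
            response_time_ms: 0,
        },
    }
}

fn main() {
    let outcomes = [
        CheckOutcome::Responded { status_code: 200, elapsed: Duration::from_millis(120) },
        CheckOutcome::TransportError { elapsed: Duration::from_millis(5000) },
        CheckOutcome::NotAttempted,
    ];
    for outcome in outcomes {
        println!("{:?} -> {:?}", outcome, to_tick(outcome));
    }
}
```

Keeping the mapping separate from the HTTP call makes the status rules easy to test without network access.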

TurboRepo Frontend and Service Layer

The Turbo repo provides the dashboard and a JavaScript runtime implementation of the platform.

Next.js is used for the user-facing frontend.
React powers the dashboard UI.
Tailwind CSS styles the frontend.
Express exposes backend HTTP APIs for auth and website management.
Bun is used to run the API, pusher, and worker services.
Prisma provides a shared database client for PostgreSQL.
Zustand is used for frontend state management.

Current frontend/service capabilities include:

landing page
sign-in flow
dashboard view
website detail view
incidents page
settings page
authenticated website CRUD operations
status and tick history retrieval

Data Flow

The monitoring pipeline can be summarized as:

User registers a website from the frontend dashboard.
API stores the website and ownership metadata.
Pusher reads websites and adds monitoring jobs to Redis Streams.
Workers consume queued jobs.
Workers hit the endpoint and calculate response time.
Workers store tick data in PostgreSQL.
Dashboard fetches website and tick history for visualization.
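
The producer and consumer steps of this pipeline can be sketched with an in-memory queue. This is only an illustration under stated assumptions: a `VecDeque` stands in for Redis Streams, the `Website`, `Job`, and `Tick` types are hypothetical, and the `check` closure replaces the real HTTP request.

```rust
use std::collections::VecDeque;

/// Hypothetical website row as a pusher would read it from the database.
#[derive(Debug, Clone)]
struct Website {
    id: u32,
    url: String,
}

/// Hypothetical job placed on the queue for each website.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Job {
    website_id: u32,
    url: String,
}

/// Hypothetical tick stored after a check.
#[derive(Debug, PartialEq, Eq)]
struct Tick {
    website_id: u32,
    is_up: bool,
    response_time_ms: u64,
}

/// Pusher step: enqueue one job per registered website.
fn push_jobs(websites: &[Website], queue: &mut VecDeque<Job>) {
    for site in websites {
        queue.push_back(Job { website_id: site.id, url: site.url.clone() });
    }
}

/// Worker step: drain the queue, run the supplied check, and collect ticks.
/// `check` stands in for the HTTP request and returns (is_up, response_time_ms).
fn drain_queue<F>(queue: &mut VecDeque<Job>, check: F) -> Vec<Tick>
where
    F: Fn(&str) -> (bool, u64),
{
    let mut ticks = Vec::new();
    while let Some(job) = queue.pop_front() {
        let (is_up, response_time_ms) = check(&job.url);
        ticks.push(Tick { website_id: job.website_id, is_up, response_time_ms });
    }
    ticks
}

fn main() {
    let websites = vec![
        Website { id: 1, url: "https://example.com".to_string() },
        Website { id: 2, url: "https://example.org".to_string() },
    ];
    let mut queue = VecDeque::new();
    push_jobs(&websites, &mut queue);

    // Fake check: pretend only example.com is reachable.
    let ticks = drain_queue(&mut queue, |url| (url.contains("example.com"), 42));
    for tick in &ticks {
        println!("{:?}", tick);
    }
}
```

The pusher and the worker share only the queue and the job shape, which is the property that lets the two sides be deployed and restarted separately.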

Development and Deployment

BetterStack is split into a Rust backend workspace and a TurboRepo frontend/service workspace. In the current deployment state, the frontend and HTTP APIs are not deployed publicly. The deployed production components are the monitoring workers, which run on Render free instances.

The active Render worker deployments are distributed across these regions:

Ohio
Virginia
Singapore
Frankfurt

For local development, first clone the repository and create the required .env files with database, Redis, JWT, and frontend API configuration values. The exact values depend on the local or hosted PostgreSQL and Redis instances being used.

Run the Rust backend services from BetterStack_Rust/. Each cargo run command keeps running in the foreground, so start each service in its own terminal:

cd BetterStack_Rust
cargo run -p api
cargo run -p pusher
REGION_NAME=ohio WORKER_ID=ohio-1 cargo run -p worker

The Rust API listens on port 3001. The Rust pusher reads websites from PostgreSQL and publishes jobs to Redis Streams. The Rust worker consumes those jobs, checks each website, and stores tick results in PostgreSQL. Workers can be started with different REGION_NAME and WORKER_ID values to represent separate monitoring locations.
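
As a rough sketch of how a worker might pick up these two variables, the function below builds a worker identity from a key lookup. The `WorkerIdentity` type and the fallback values `local` and `local-1` are assumptions for the example and are not taken from the project source.

```rust
use std::env;

/// Hypothetical worker identity derived from REGION_NAME and WORKER_ID.
#[derive(Debug, PartialEq, Eq)]
struct WorkerIdentity {
    region_name: String,
    worker_id: String,
}

/// Builds the identity from any key lookup, so the logic can be exercised
/// without touching the real process environment.
/// The fallback values "local" and "local-1" are assumptions for this sketch.
fn worker_identity<F>(lookup: F) -> WorkerIdentity
where
    F: Fn(&str) -> Option<String>,
{
    // Treat a missing or blank variable the same way and use the fallback.
    let non_empty = |key: &str, fallback: &str| {
        lookup(key)
            .map(|value| value.trim().to_string())
            .filter(|value| !value.is_empty())
            .unwrap_or_else(|| fallback.to_string())
    };
    WorkerIdentity {
        region_name: non_empty("REGION_NAME", "local"),
        worker_id: non_empty("WORKER_ID", "local-1"),
    }
}

fn main() {
    // Reads the real environment, e.g. REGION_NAME=ohio WORKER_ID=ohio-1.
    let identity = worker_identity(|key| env::var(key).ok());
    println!("region={} worker={}", identity.region_name, identity.worker_id);
}
```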

Run the TurboRepo frontend from BetterStack_turbo/:

cd BetterStack_turbo
npm install
npm run dev --workspace=my-app

The Next.js frontend runs on the default Next.js development port, usually http://localhost:3000. It calls the API configured through NEXT_PUBLIC_API_URL, which defaults to http://localhost:5000 in the frontend source. To use the Rust API locally, set NEXT_PUBLIC_API_URL to the Rust API address, http://localhost:3001.

The TurboRepo also contains TypeScript API, pusher, and worker implementations for development and comparison. Run each command from BetterStack_turbo/ in its own terminal:

npm run dev --workspace=api
cd apps/pusher && bun index.ts
cd apps/worker && bun index.ts

Database schema management differs by implementation. The Rust workspace uses Diesel migrations under BetterStack_Rust/store/migrations. The TurboRepo uses Prisma migrations under BetterStack_turbo/packages/store/prisma/migrations. PostgreSQL and Redis must already be available locally or through hosted services; npm run dev does not start those services automatically.

This deployment model keeps the production monitoring workload focused on the Render-hosted workers, while the frontend and API layers run locally and are not publicly deployed at this stage. Earlier VPS-specific production assumptions such as Nginx reverse proxying, Certbot SSL automation, Redis persistence tuning, and object-storage backups are not part of the current deployment.

Literature Review

Reliable monitoring systems are a core part of distributed systems engineering. Prior work and industry guidance consistently show that monitoring must be real-time, alert-driven, and resilient to infrastructure-level failures.

Google's SRE guidance [2] distinguishes between white-box monitoring and black-box monitoring, emphasizing that black-box checks are essential because they validate user-visible system behavior rather than only internal metrics. This directly aligns with BetterStack's model of hitting endpoints and measuring actual service responsiveness.

Research and operational reports [3], [4] also show that failures are not always complete shutdowns. Many production systems experience fail-slow behavior, partial outages, or degraded responsiveness before full failure. For this reason, monitoring systems should capture both availability and latency, not just binary up/down state.
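
To make the point about latency concrete, here is a minimal sketch of classifying a window of recent checks into more than two states. The `Health` levels, the threshold parameter, and the "more than half of successful checks are slow" rule are assumptions chosen for illustration, not behavior described by the project.

```rust
/// Hypothetical health levels that go beyond a binary up/down flag.
#[derive(Debug, PartialEq, Eq)]
enum Health {
    Healthy,
    Degraded,
    Down,
}

/// Classifies a window of recent checks. Each entry is Some(latency_ms) for a
/// successful check or None for a failed one. Returns None for an empty window.
fn classify_window(window: &[Option<u32>], slow_threshold_ms: u32) -> Option<Health> {
    if window.is_empty() {
        return None;
    }
    // Latencies of the checks that succeeded.
    let successes: Vec<u32> = window.iter().flatten().copied().collect();
    if successes.is_empty() {
        return Some(Health::Down);
    }
    let slow = successes.iter().filter(|&&ms| ms > slow_threshold_ms).count();
    let some_failed = successes.len() < window.len();
    // Assumed rule: degraded if any check failed or most successes were slow.
    if some_failed || slow * 2 > successes.len() {
        Some(Health::Degraded)
    } else {
        Some(Health::Healthy)
    }
}

fn main() {
    let window = [Some(100), Some(900), Some(950)];
    println!("{:?}", classify_window(&window, 500));
}
```

A binary check would report the window in `main` as fully up, while the latency-aware rule flags it as degraded.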

The 2022 Cloudflare outage [1] further illustrates why independent observability matters. If organizations rely on a tightly coupled monitoring setup, major infrastructure incidents can remove the very visibility needed to detect and respond to failures. BetterStack is motivated by this gap and adopts a decoupled producer-worker-storage architecture to improve resilience.

Tools and Technologies

Backend (Rust)

| Tool / Technology | Purpose |
| --- | --- |
| Rust | Systems programming language for backend services |
| Poem | HTTP server and routing |
| Tokio | Async runtime |
| Diesel | ORM for PostgreSQL |
| PostgreSQL | Persistent data store |
| Redis Streams | Queue-based job delivery |
| Reqwest | Endpoint health checks |
| Dotenvy | Environment variable loading |
| UUID / Chrono | ID generation and timestamps |

Frontend and Service Layer (TurboRepo)

| Tool / Technology | Purpose |
| --- | --- |
| TurboRepo | Monorepo orchestration |
| Next.js | Frontend framework |
| React | UI library |
| Tailwind CSS | Styling |
| TypeScript | Type-safe application code |
| Bun | Runtime for API, worker, and pusher |
| Express | REST API layer |
| Prisma | Database client and schema management |
| PostgreSQL | Shared relational database |
| Redis | Message stream transport |
| Zustand | Client-side state management |
| Radix UI / Lucide React | UI primitives and icons |

Database Model

The shared schema in the Turbo repo and the Rust store layer both reflect the core monitoring entities:

User: stores authentication and ownership data
Website: stores monitored URLs
Region: identifies the location or worker region
WebsiteTick: stores response time, status code, and check timestamp

This structure supports historical monitoring, per-site analysis, and per-region comparison of results as more worker regions are added.
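
As an illustration of the analysis this model allows, the sketch below computes uptime and mean latency per region from a list of ticks. The field names on `WebsiteTick` are assumptions for the example and may differ from the actual Prisma and Diesel schemas.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory shape of a WebsiteTick row; field names are assumptions.
#[derive(Debug, Clone)]
struct WebsiteTick {
    website_id: u32,
    region: &'static str,
    is_up: bool,
    response_time_ms: u32,
}

/// Uptime for one website as a percentage of its ticks, or None without ticks.
fn uptime_percent(ticks: &[WebsiteTick], website_id: u32) -> Option<f64> {
    let mut total = 0u32;
    let mut up = 0u32;
    for tick in ticks.iter().filter(|t| t.website_id == website_id) {
        total += 1;
        if tick.is_up {
            up += 1;
        }
    }
    if total == 0 {
        None
    } else {
        Some(100.0 * f64::from(up) / f64::from(total))
    }
}

/// Mean response time per region for one website, counting only successful checks.
fn mean_latency_by_region(ticks: &[WebsiteTick], website_id: u32) -> BTreeMap<&'static str, f64> {
    // region -> (sum of latencies, number of successful checks)
    let mut sums: BTreeMap<&'static str, (u64, u64)> = BTreeMap::new();
    for tick in ticks.iter().filter(|t| t.website_id == website_id && t.is_up) {
        let entry = sums.entry(tick.region).or_insert((0, 0));
        entry.0 += u64::from(tick.response_time_ms);
        entry.1 += 1;
    }
    sums.into_iter()
        .map(|(region, (sum, count))| (region, sum as f64 / count as f64))
        .collect()
}

fn main() {
    let ticks = vec![
        WebsiteTick { website_id: 1, region: "ohio", is_up: true, response_time_ms: 100 },
        WebsiteTick { website_id: 1, region: "frankfurt", is_up: false, response_time_ms: 0 },
    ];
    println!("uptime: {:?}", uptime_percent(&ticks, 1));
    println!("latency: {:?}", mean_latency_by_region(&ticks, 1));
}
```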

Key Features

user registration and authentication
website registration and ownership mapping
periodic website health checks
queue-based processing using Redis Streams
latency and status storage in PostgreSQL
incident and dashboard-oriented frontend
modular backend structure in both Rust and TurboRepo implementations

Why This Design Matters

This architecture improves resilience in several ways:

Producers and workers are decoupled, so checks can continue even if one service is restarted.
Results are persisted, so historical visibility is preserved.
Workers can scale horizontally for larger monitoring workloads.
The frontend remains separated from the check execution pipeline.
The design supports future extensions such as alerting, retries, region-based checks, and status-page generation.
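
The horizontal-scaling point can be illustrated with threads sharing one queue. This is a sketch under assumptions: a standard-library channel stands in for the Redis Stream, threads stand in for separate worker processes, and `run_workers` is a hypothetical helper that only records which worker took which job instead of performing a check.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Runs `worker_count` threads that all pull from one shared job queue and
/// returns (worker index, job) pairs describing who handled what.
fn run_workers(jobs: Vec<String>, worker_count: usize) -> Vec<(usize, String)> {
    let (job_tx, job_rx) = mpsc::channel::<String>();
    let (done_tx, done_rx) = mpsc::channel::<(usize, String)>();
    let job_rx = Arc::new(Mutex::new(job_rx));

    let handles: Vec<_> = (0..worker_count)
        .map(|index| {
            let job_rx = Arc::clone(&job_rx);
            let done_tx = done_tx.clone();
            thread::spawn(move || loop {
                // Hold the lock only while taking one job off the queue.
                let next = job_rx.lock().unwrap().recv();
                match next {
                    // A real worker would perform the HTTP check here.
                    Ok(url) => done_tx.send((index, url)).unwrap(),
                    // The queue is closed and fully drained.
                    Err(_) => break,
                }
            })
        })
        .collect();

    for job in jobs {
        job_tx.send(job).unwrap();
    }
    // Closing the senders lets workers exit and ends the result stream.
    drop(job_tx);
    drop(done_tx);

    for handle in handles {
        handle.join().unwrap();
    }
    done_rx.iter().collect()
}

fn main() {
    let jobs: Vec<String> = (0..6).map(|i| format!("https://site-{}.example", i)).collect();
    for (worker, url) in run_workers(jobs, 3) {
        println!("worker {} handled {}", worker, url);
    }
}
```

Each job is handled exactly once regardless of the worker count, so adding workers raises throughput without changing the producer.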

Suggested Future Improvements

email, SMS, or webhook alerting
multi-region worker deployment
SLO/SLA reporting
incident timeline generation
retry logic and failure classification
charts for uptime percentage and response-time trends
role-based access control
Docker Compose or Kubernetes deployment manifests

Repo Structure

BetterStack/
|-- Readme.md
|-- BetterStack_Rust/
|   |-- api/
|   |-- worker/
|   |-- store/
|   |-- redis/
|   `-- pusher/
|-- BetterStack_turbo/
|   |-- apps/
|   |   |-- my-app/
|   |   |-- api/
|   |   |-- pusher/
|   |   `-- worker/
|   `-- packages/
|       |-- store/
|       |-- redis-stream/
|       `-- ui/
`-- redis_stream_ex/

Conclusion

BetterStack is a practical distributed monitoring platform that combines a Rust backend implementation with a TurboRepo-based frontend and service ecosystem. It addresses a real-world reliability problem: monitoring must remain trustworthy even when parts of the infrastructure are under stress. By combining persistent storage, asynchronous workers, Redis Streams, and a dashboard interface, the project creates a strong foundation for resilient service monitoring.

References (IEEE Format)

[1] T. Strickx and J. Hartman, "Cloudflare outage on June 21, 2022," Cloudflare Blog, Jun. 21, 2022. [Online]. Available: https://blog.cloudflare.com/cloudflare-outage-on-june-21-2022/

[2] R. Ewaschuk, "Monitoring Distributed Systems," in Site Reliability Engineering, Google SRE. [Online]. Available: https://sre.google/sre-book/monitoring-distributed-systems/

[3] R. Lu et al., "Perseus: A Fail-Slow Detection Framework for Cloud Storage Systems," in 21st USENIX Conference on File and Storage Technologies (FAST '23), 2023. [Online]. Available: https://www.usenix.org/conference/fast23/presentation/lu

[4] M. P. Kasick, J. Tan, R. Gandhi, and P. Narasimhan, "Black-Box Problem Diagnosis in Parallel File Systems," in 8th USENIX Conference on File and Storage Technologies (FAST '10), 2010. [Online]. Available: https://www.usenix.org/conference/fast-10/black-box-problem-diagnosis-parallel-file-systems
