BetterStack is a resilient website monitoring system built to track service availability, measure response times, store health-check results, and surface incidents through a modern dashboard. The project is organized as a multi-part system with:
- a Rust backend workspace for high-performance service components
- a TurboRepo-based frontend and JavaScript service layer for dashboard, API, queue producers, and workers
- PostgreSQL for persistent monitoring data
- Redis Streams for decoupled job delivery between producers and workers
The system continuously checks whether registered websites are up or down, records each result as a tick, and presents the data through a frontend dashboard for users and operators.
In June 2022, a major Cloudflare outage affected thousands of websites and services globally. A key lesson from such incidents is simple: if the monitoring platform depends on the same failing infrastructure, both the service and the monitoring system may go down together. In that situation:
- No alerts
- No notifications
- No visibility
This project is motivated by the need to build a monitoring platform that remains useful during service disruption. Instead of relying on a single tightly coupled component, BetterStack uses a queue-based architecture with dedicated workers, persistent storage, and independent health checks to improve resilience and observability.
Modern applications rely on cloud platforms, APIs, databases, and distributed infrastructure. When one service fails:
- businesses lose revenue
- users lose trust
- engineers respond late without timely signals
Many teams depend on third-party monitoring platforms, but those systems can also fail or become unreachable. BetterStack addresses this with a monitoring workflow that continuously collects availability data, dispatches checks asynchronously, and stores results for later alerting, analysis, and incident visibility.
- Monitor websites and service endpoints 24/7
- Detect downtime and latency degradation quickly
- Store health-check history for analysis
- Support worker-based distributed monitoring
- Separate data production from processing using Redis Streams
- Provide a frontend dashboard for website, status, and incident visibility
- Reduce single points of failure in the monitoring pipeline
BetterStack currently contains two major implementation areas:
Located in BetterStack_Rust/, this workspace contains Rust services focused on performance, persistence, and worker processing.
- api: HTTP API built with Poem
- store: shared database layer using Diesel and PostgreSQL
- worker: background monitor that consumes Redis Stream jobs, sends HTTP requests, and stores ticks
- redis: shared Redis Stream helpers
- pusher: queue-oriented service scaffold for dispatching work
Located in BetterStack_turbo/, this monorepo contains the dashboard frontend and TypeScript/Bun service layer.
- apps/my-app: Next.js frontend dashboard
- apps/api: Express API for auth, websites, alerts, and status endpoints
- apps/pusher: producer that pushes website jobs into Redis Streams
- apps/worker: worker that consumes jobs and stores ticks
- packages/store: shared Prisma client and schema
- packages/redis-stream: Redis utility package
- packages/ui: reusable UI components
The Rust implementation is designed around performance and reliability.
- Poem is used for the HTTP server and routing in the Rust API.
- Tokio powers asynchronous execution and worker concurrency.
- Diesel handles PostgreSQL ORM and schema management.
- Reqwest is used by workers to perform endpoint checks.
- Redis Streams decouple website scheduling from processing.
Current Rust flow:
- A website is registered through the API.
- Website data is stored in PostgreSQL.
- Jobs are placed into Redis Streams.
- Rust workers consume pending jobs.
- Each worker sends an HTTP request to the target endpoint.
- The worker records response time and status (Up, Down, or Unknown) in the database.
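The check-and-record step above can be sketched in TypeScript. This is an illustrative sketch, not the actual Rust worker code, and the `classifyTick` helper and the exact status-code mapping are assumptions based on the flow described:

```typescript
// Illustrative sketch of classifying a single health check.
// Status values mirror the Up/Down/Unknown states described above.
type TickStatus = "Up" | "Down" | "Unknown";

interface Tick {
  status: TickStatus;
  responseTimeMs: number;
  checkedAt: Date;
}

// Assumed mapping: 2xx/3xx => Up, 4xx/5xx => Down, and a
// network-level failure with no status code at all => Unknown.
function classifyTick(statusCode: number | null, responseTimeMs: number): Tick {
  let status: TickStatus;
  if (statusCode === null) {
    status = "Unknown";
  } else if (statusCode >= 200 && statusCode < 400) {
    status = "Up";
  } else {
    status = "Down";
  }
  return { status, responseTimeMs, checkedAt: new Date() };
}
```

A worker would call this after each Reqwest (or fetch) round trip, then insert the resulting tick row into PostgreSQL.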
The Turbo repo provides the dashboard and a JavaScript runtime implementation of the platform.
- Next.js is used for the user-facing frontend.
- React powers the dashboard UI.
- Tailwind CSS styles the frontend.
- Express exposes backend HTTP APIs for auth and website management.
- Bun is used to run the API, pusher, and worker services.
- Prisma provides a shared database client for PostgreSQL.
- Zustand is used for frontend state management.
Current frontend/service capabilities include:
- landing page
- sign-in flow
- dashboard view
- website detail view
- incidents page
- settings page
- authenticated website CRUD operations
- status and tick history retrieval
The monitoring pipeline can be summarized as:
- User registers a website from the frontend dashboard.
- API stores the website and ownership metadata.
- Pusher reads websites and adds monitoring jobs to Redis Streams.
- Workers consume queued jobs.
- Workers hit the endpoint and calculate response time.
- Workers store tick data in PostgreSQL.
- Dashboard fetches website and tick history for visualization.
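The producer/worker decoupling in the pipeline above can be sketched with an in-memory queue. In the real system the queue is a Redis Stream and the store is PostgreSQL; the names `enqueueChecks` and `drainQueue` are illustrative only:

```typescript
// Minimal in-memory sketch of the pusher -> queue -> worker -> store pipeline.
interface Website { id: string; url: string; }
interface CheckJob { websiteId: string; url: string; }
interface TickRow { websiteId: string; status: "Up" | "Down"; responseTimeMs: number; }

const queue: CheckJob[] = []; // stands in for the Redis Stream
const ticks: TickRow[] = [];  // stands in for the tick table in PostgreSQL

// Pusher: read registered websites and enqueue one monitoring job per site.
function enqueueChecks(websites: Website[]): void {
  for (const w of websites) queue.push({ websiteId: w.id, url: w.url });
}

// Worker: consume jobs, "check" each endpoint, and persist a tick.
// The check function is injected so the sketch runs without network access.
function drainQueue(check: (url: string) => { ok: boolean; ms: number }): void {
  let job: CheckJob | undefined;
  while ((job = queue.shift()) !== undefined) {
    const result = check(job.url);
    ticks.push({
      websiteId: job.websiteId,
      status: result.ok ? "Up" : "Down",
      responseTimeMs: result.ms,
    });
  }
}

// Example run with a stubbed check function.
enqueueChecks([{ id: "w1", url: "https://example.com" }]);
drainQueue(() => ({ ok: true, ms: 42 }));
```

Because the pusher only writes to the queue and the worker only reads from it, either side can restart independently, which is the resilience property the Redis Streams design is after.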
BetterStack is split into a Rust backend workspace and a TurboRepo frontend/service workspace. In the current deployment state, the frontend and HTTP APIs are not deployed publicly. The deployed production components are the monitoring workers, which run on Render free instances.
The active Render worker deployments are distributed across these regions:
- Ohio
- Virginia
- Singapore
- Frankfurt
For local development, first clone the repository and create the required .env files with database, Redis, JWT, and frontend API configuration values. The exact values depend on the local or hosted PostgreSQL and Redis instances being used.
Run the Rust backend services from BetterStack_Rust/:
cd BetterStack_Rust
cargo run -p api
cargo run -p pusher
REGION_NAME=ohio WORKER_ID=ohio-1 cargo run -p worker
The Rust API listens on port 3001. The Rust pusher reads websites from PostgreSQL and publishes jobs to Redis Streams. The Rust worker consumes those jobs, checks each website, and stores tick results in PostgreSQL. Workers can be started with different REGION_NAME and WORKER_ID values to represent separate monitoring locations.
Run the TurboRepo frontend from BetterStack_turbo/:
cd BetterStack_turbo
npm install
npm run dev --workspace=my-app
The Next.js frontend runs on the default Next.js development port, usually http://localhost:3000. It calls the API configured through NEXT_PUBLIC_API_URL, defaulting to http://localhost:5000 in the frontend source. If using the Rust API locally, point the frontend environment variable at the Rust API port.
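The API-URL resolution described above amounts to a one-line fallback. The sketch below is illustrative; the helper name `apiBaseUrl` is an assumption, and the repo's actual frontend code may read the variable differently:

```typescript
// Resolve the frontend's API base URL: use NEXT_PUBLIC_API_URL when set,
// otherwise fall back to the documented default of http://localhost:5000.
function apiBaseUrl(env: Record<string, string | undefined>): string {
  return env.NEXT_PUBLIC_API_URL ?? "http://localhost:5000";
}

// Pointing the frontend at the local Rust API (port 3001) instead:
const rustApi = apiBaseUrl({ NEXT_PUBLIC_API_URL: "http://localhost:3001" });
const fallback = apiBaseUrl({});
```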
The TurboRepo also contains TypeScript API, pusher, and worker implementations for development and comparison:
npm run dev --workspace=api
cd apps/pusher && bun index.ts
cd apps/worker && bun index.ts
Database schema management differs by implementation. The Rust workspace uses Diesel migrations under BetterStack_Rust/store/migrations. The TurboRepo uses Prisma migrations under BetterStack_turbo/packages/store/prisma/migrations. PostgreSQL and Redis must already be available locally or through hosted services; npm run dev does not start those services automatically.
This deployment model keeps the production monitoring workload focused on the Render-hosted workers, while the frontend and API layers remain local and are not publicly deployed at this stage. Earlier VPS-specific production assumptions such as Nginx reverse proxying, Certbot SSL automation, Redis persistence tuning, and object-storage backups are not part of the current deployment.
Reliable monitoring systems are a core part of distributed systems engineering. Prior work and industry guidance consistently show that monitoring must be real-time, alert-driven, and resilient to infrastructure-level failures.
Google's SRE guidance distinguishes between white-box monitoring and black-box monitoring, emphasizing that black-box checks are essential because they validate user-visible system behavior rather than only internal metrics. This directly aligns with BetterStack's model of hitting endpoints and measuring actual service responsiveness.
Research and operational reports also show that failures are not always complete shutdowns. Many production systems experience fail-slow behavior, partial outages, or degraded responsiveness before full failure. For this reason, monitoring systems should capture both availability and latency, not just binary up/down state.
The 2022 Cloudflare outage further illustrates why independent observability matters. If organizations rely on a tightly coupled monitoring setup, major infrastructure incidents can remove the very visibility needed to detect and respond to failures. BetterStack is motivated by this gap and adopts a decoupled producer-worker-storage architecture to improve resilience.
| Tool / Technology | Purpose |
|---|---|
| Rust | Systems programming language for backend services |
| Poem | HTTP server and routing |
| Tokio | Async runtime |
| Diesel | ORM for PostgreSQL |
| PostgreSQL | Persistent data store |
| Redis Streams | Queue-based job delivery |
| Reqwest | Endpoint health checks |
| Dotenvy | Environment variable loading |
| UUID / Chrono | ID generation and timestamps |
| Tool / Technology | Purpose |
|---|---|
| TurboRepo | Monorepo orchestration |
| Next.js | Frontend framework |
| React | UI library |
| Tailwind CSS | Styling |
| TypeScript | Type-safe application code |
| Bun | Runtime for API, worker, and pusher |
| Express | REST API layer |
| Prisma | Database client and schema management |
| PostgreSQL | Shared relational database |
| Redis | Message stream transport |
| Zustand | Client-side state management |
| Radix UI / Lucide React | UI primitives and icons |
The shared schema in the Turbo repo and the Rust store layer both reflect the core monitoring entities:
- User: stores authentication and ownership data
- Website: stores monitored URLs
- Region: identifies the location or worker region
- WebsiteTick: stores response time, status code, and check timestamp
This structure supports historical monitoring, per-site analysis, and multi-region extension in future versions.
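The four entities above can be sketched as TypeScript shapes. Field names here are assumptions inferred from the descriptions; the actual Prisma schema and Diesel models may use different names and types:

```typescript
// Illustrative shapes mirroring the core monitoring entities.
interface User { id: string; email: string; passwordHash: string; }
interface Website { id: string; url: string; userId: string; } // ownership via userId
interface Region { id: string; name: string; }                 // e.g. "ohio", "frankfurt"
interface WebsiteTick {
  id: string;
  websiteId: string;
  regionId: string;
  status: "Up" | "Down" | "Unknown";
  responseTimeMs: number;
  createdAt: Date;
}

// A sample tick as a worker in the "ohio" region might record it:
const tick: WebsiteTick = {
  id: "t1",
  websiteId: "w1",
  regionId: "ohio",
  status: "Up",
  responseTimeMs: 120,
  createdAt: new Date(),
};
```

Keying each tick by both website and region is what makes per-site history and future multi-region analysis possible from the same table.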
- user registration and authentication
- website registration and ownership mapping
- periodic website health checks
- queue-based processing using Redis Streams
- latency and status storage in PostgreSQL
- incident and dashboard-oriented frontend
- modular backend structure in both Rust and TurboRepo implementations
This architecture improves resilience in several ways:
- Producers and workers are decoupled, so checks can continue even if one service is restarted.
- Results are persisted, so historical visibility is preserved.
- Workers can scale horizontally for larger monitoring workloads.
- The frontend remains separated from the check execution pipeline.
- The design supports future extensions such as alerting, retries, region-based checks, and status-page generation.
- email, SMS, or webhook alerting
- multi-region worker deployment
- SLO/SLA reporting
- incident timeline generation
- retry logic and failure classification
- charts for uptime percentage and response-time trends
- role-based access control
- Docker Compose or Kubernetes deployment manifests
BetterStack/
|-- Readme.md
|-- BetterStack_Rust/
| |-- api/
| |-- worker/
| |-- store/
| |-- redis/
| `-- pusher/
|-- BetterStack_turbo/
| |-- apps/
| | |-- my-app/
| | |-- api/
| | |-- pusher/
| | `-- worker/
| `-- packages/
| |-- store/
| |-- redis-stream/
| `-- ui/
`-- redis_stream_ex/
BetterStack is a practical distributed monitoring platform that combines a Rust backend implementation with a TurboRepo-based frontend and service ecosystem. It addresses a real-world reliability problem: monitoring must remain trustworthy even when parts of the infrastructure are under stress. By combining persistent storage, asynchronous workers, Redis Streams, and a dashboard interface, the project creates a strong foundation for resilient service monitoring.
[1] T. Strickx and J. Hartman, "Cloudflare outage on June 21, 2022," Cloudflare Blog, Jun. 21, 2022. [Online]. Available: https://blog.cloudflare.com/cloudflare-outage-on-june-21-2022/
[2] R. Ewaschuk, "Monitoring Distributed Systems," in Site Reliability Engineering, Google SRE. [Online]. Available: https://sre.google/sre-book/monitoring-distributed-systems/
[3] R. Lu et al., "Perseus: A Fail-Slow Detection Framework for Cloud Storage Systems," in 21st USENIX Conference on File and Storage Technologies (FAST '23), 2023. [Online]. Available: https://www.usenix.org/conference/fast23/presentation/lu
[4] M. P. Kasick, J. Tan, R. Gandhi, and P. Narasimhan, "Black-Box Problem Diagnosis in Parallel File Systems," in 8th USENIX Conference on File and Storage Technologies (FAST '10), 2010. [Online]. Available: https://www.usenix.org/conference/fast-10/black-box-problem-diagnosis-parallel-file-systems