
[Feature] Service health monitoring and status pages #152

@Polliog


Feature Description

Add proactive uptime monitoring capabilities on top of Logtide's existing reactive log and alert analysis. Users should be able to define monitored endpoints, get notified when they go down, and have automatic access to the related logs when an incident occurs.

Problem/Use Case

Logtide is excellent at telling you what happened after something goes wrong via logs and traces, but it currently has no way to tell you that something is down right now. Teams that use Logtide for observability still need to run a separate tool (Uptime Kuma, Better Stack, Freshping) just for uptime checks. This creates a split workflow: alerts come from one system, investigation happens in another.

Additionally, many teams need a status page to communicate service health to internal stakeholders or customers — another thing that currently requires a separate service.

Proposed Solution

  • Monitor definitions: configurable HTTP/HTTPS checks (interval, timeout, expected status code, optional response body assertion), TCP ping monitors, and "heartbeat" monitors that alert when no log has been received from a service within a configurable time window
  • Monitors executed by BullMQ workers on defined intervals, consistent with the existing job infrastructure
  • Monitor results stored as time-series data in TimescaleDB alongside logs and metrics
  • Automatic incident creation: when a monitor fails, an incident is created in the Logtide incident management system and linked to the relevant logs from that service in that time window
  • Status page: auto-generated per-project status page showing uptime history, current status per monitor, and recent incidents — configurable as public or auth-protected
  • Notifications via the existing channels already supported: email, Slack, Discord webhooks
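A monitor definition along the lines described above might look like the following sketch (TypeScript, to match the SvelteKit/BullMQ stack mentioned in this issue; all type and function names are illustrative, not an existing Logtide API):

```typescript
// Hypothetical shape of an HTTP monitor definition and the pass/fail
// evaluation for a single check. None of these names exist in Logtide today.
interface HttpMonitor {
  id: string;
  type: "http";
  url: string;
  intervalSeconds: number; // how often the worker runs the check
  timeoutMs: number;
  expectedStatus: number;  // e.g. 200
  bodyContains?: string;   // optional response body assertion
}

interface CheckResponse {
  status: number;
  body: string;
}

// Decide whether one check result counts as "up" or "down":
// status code must match, and the optional body assertion must hold.
function evaluateHttpCheck(mon: HttpMonitor, res: CheckResponse): "up" | "down" {
  if (res.status !== mon.expectedStatus) return "down";
  if (mon.bodyContains !== undefined && !res.body.includes(mon.bodyContains)) {
    return "down";
  }
  return "up";
}

// Example: a check expecting HTTP 200 with "ok" somewhere in the body.
const mon: HttpMonitor = {
  id: "api-health",
  type: "http",
  url: "https://example.com/health",
  intervalSeconds: 30,
  timeoutMs: 5000,
  expectedStatus: 200,
  bodyContains: "ok",
};

console.log(evaluateHttpCheck(mon, { status: 200, body: '{"status":"ok"}' })); // up
console.log(evaluateHttpCheck(mon, { status: 503, body: "maintenance" }));     // down
```

Keeping evaluation as a pure function like this would also make the failure rules (and later additions such as latency thresholds) easy to unit-test independently of the worker that performs the HTTP request.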

Alternatives Considered

  • Log-based alerting as a proxy for uptime: alerting on absence of expected log lines to infer service health. This is already partially possible with heartbeat monitors but is not user-friendly for simple HTTP endpoint checks.
  • External tool integration: receiving downtime alerts from Uptime Kuma via webhooks (see issue [Performance] Database Optimization & Query Speed Improvements #6). This is a valid complementary approach but doesn't eliminate the need to run a separate tool.

Implementation Details (Optional)

  • Monitor worker: new BullMQ queue monitor-checks with a repeatable job per monitor definition; results written to a monitor_results hypertable in TimescaleDB
  • Status page: a new public route /status/:projectSlug in SvelteKit, served without authentication when set to public
  • Heartbeat monitors: query the existing logs hypertable for the last log timestamp from a given service — no separate data store needed
  • Incident auto-linking: when a monitor failure creates an incident, attach a pre-filtered log query link scoped to the affected service and the failure time window
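One way the repeatable-job wiring could look, sketched under the assumptions above (the `monitor-checks` queue name comes from this issue; everything else is illustrative). BullMQ repeatable jobs take a `repeat: { every: ms }` option, so each monitor definition maps to one `queue.add(name, data, opts)` call:

```typescript
// Sketch: map a monitor definition to the arguments for a BullMQ
// repeatable job. Only "monitor-checks" is from the proposal; the
// rest of the names are hypothetical.
interface MonitorDef {
  id: string;
  intervalSeconds: number;
}

interface RepeatableJob {
  name: string;
  data: { monitorId: string };
  opts: { repeat: { every: number } }; // BullMQ re-enqueues every `every` ms
}

function toRepeatableJob(mon: MonitorDef): RepeatableJob {
  return {
    name: "check",
    data: { monitorId: mon.id },
    opts: { repeat: { every: mon.intervalSeconds * 1000 } },
  };
}

// Registration would then be roughly:
//   const queue = new Queue("monitor-checks", { connection });
//   const job = toRepeatableJob(mon);
//   await queue.add(job.name, job.data, job.opts);
// with the worker writing each result as a row in the monitor_results
// hypertable (e.g. monitor_id, ts, status, latency_ms).

console.log(toRepeatableJob({ id: "api-health", intervalSeconds: 30 }));
```

Deriving the job spec from the stored definition means edits to a monitor (say, changing its interval) reduce to removing the old repeatable job and re-adding it, rather than tracking scheduler state separately.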

Priority

  • Critical - Blocking my usage of Logtide
  • High - Would significantly improve my workflow
  • Medium - Nice to have
  • Low - Minor enhancement

Target Users

  • DevOps Engineers
  • Developers
  • Security/SIEM Users
  • System Administrators
  • All Users

Additional Context

This feature is particularly relevant for self-hosted teams that want to reduce the number of running services. Replacing Uptime Kuma (or similar) with native Logtide monitoring means one less Docker container, one less UI to check, and automatic correlation between downtime and the logs that explain it.

Contribution

  • I would like to work on implementing this feature
