S3 Events External Table Refresh #218

@robinskil

Summary

Add event-driven metadata updates for existing tables backed by object storage prefixes.

Beacon already supports creating and dropping tables. This issue only covers updating an existing table when new dataset files are added under the table's configured prefix.

Motivation

Tables backed by object storage can become stale when files are added outside of Beacon.

Beacon should be able to receive or process storage events, detect which table prefix the new file belongs to, and update the relevant table metadata without requiring a full manual refresh.

Goal

When a new dataset file is added under a table prefix, Beacon should update only the relevant table metadata.

Example:

table: observations
prefix: observations/files/

new file:
observations/files/new.parquet

result:
observations table metadata is updated

Requirements

  • Add support for processing object storage create events.
  • Match incoming file paths to existing table prefixes.
  • Update only the table whose prefix matches the new file.
  • Add the new file to the table's tracked dataset/file metadata.
  • Infer partition values from the path if the table is partitioned.
  • Ignore events that do not match any table prefix.
  • Avoid refreshing unrelated tables.
  • Ensure query planning can use the updated table metadata.
  • Ensure path matching is segment-aware.
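The requirements above could be sketched roughly as follows. This is a minimal illustration, not Beacon's implementation: `Table`, `segments`, and `apply_dataset_added` are hypothetical names, paths are assumed to be bucket-relative object keys, and the longest-prefix rule for nested prefixes is an assumption the issue does not state.

```rust
/// Hypothetical in-memory table entry; Beacon's real metadata types may differ.
struct Table {
    name: String,
    prefix: String,
    files: Vec<String>,
}

/// Split a path into its non-empty segments.
fn segments(path: &str) -> Vec<&str> {
    path.split('/').filter(|s| !s.is_empty()).collect()
}

/// Add `path` to the one table whose prefix contains it and return that table's name.
/// Returns None (and changes nothing) when no table prefix matches.
/// Assumption: when prefixes nest, the longest matching prefix wins.
fn apply_dataset_added<'a>(tables: &'a mut [Table], path: &str) -> Option<&'a str> {
    let key = segments(path);
    let table = tables
        .iter_mut()
        .filter(|t| {
            let prefix = segments(&t.prefix);
            // Compare whole segments, and require a file name below the prefix.
            !prefix.is_empty() && key.len() > prefix.len() && key.starts_with(&prefix)
        })
        .max_by_key(|t| segments(&t.prefix).len())?;
    // Skip files that are already tracked, so a repeated event is harmless.
    if !table.files.iter().any(|f| f.as_str() == path) {
        table.files.push(path.to_string());
    }
    Some(table.name.as_str())
}
```

Only the matched table is touched, so unrelated tables are never refreshed, and an event outside every prefix falls through as `None`.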

Event Matching

Events should be matched to tables by their configured storage prefix.

Example match:

table prefix:
data/example/

event path:
s3://bucket/data/example/file.parquet

matches:
yes

Example non-match:

table prefix:
data/example/

event path:
s3://bucket/data/example_2/file.parquet

matches:
no

Path matching must compare whole path segments. A plain string-prefix comparison is unsafe: a prefix stored without its trailing slash (data/example) would also match data/example_2/.
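One way to get segment-aware matching is to split both the prefix and the event path on `/` and compare segment by segment. The sketch below assumes event paths may arrive either as bare keys or as `scheme://bucket/key` URLs; `object_key` and `matches_prefix` are hypothetical helper names, not existing Beacon functions.

```rust
/// Strip an optional `scheme://bucket/` prefix and return the object key.
fn object_key(path: &str) -> &str {
    match path.split_once("://") {
        // Drop the bucket name, i.e. everything up to the first '/'.
        Some((_scheme, rest)) => rest.split_once('/').map_or("", |(_bucket, key)| key),
        None => path,
    }
}

/// Return true when `event_path` lies under `table_prefix`, comparing whole segments.
fn matches_prefix(table_prefix: &str, event_path: &str) -> bool {
    let prefix: Vec<&str> = table_prefix.split('/').filter(|s| !s.is_empty()).collect();
    let key: Vec<&str> = object_key(event_path)
        .split('/')
        .filter(|s| !s.is_empty())
        .collect();
    // An empty prefix never matches, and the key needs at least one
    // segment below the prefix (the file itself).
    !prefix.is_empty()
        && key.len() > prefix.len()
        && key.iter().zip(&prefix).all(|(a, b)| a == b)
}
```

Because the comparison works on segments, the result does not depend on whether the configured prefix ends with a slash.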

Event Types

Initial scope should support dataset/file creation events.

Example:

ObjectCreated

or an internal equivalent event such as:

DatasetAdded {
    path: String,
}

Deletion, update, and full resync can be handled in follow-up issues.
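Restricting the initial scope to creation events might look like the sketch below. `StorageEvent` and `to_dataset_added` are hypothetical; the sketch assumes creation events are named `ObjectCreated`, optionally followed by a `:<detail>` suffix, and that keys ending in `/` are folder placeholders rather than dataset files.

```rust
/// Internal event, mirroring the `DatasetAdded` shape from the issue.
#[derive(Debug, PartialEq)]
struct DatasetAdded {
    path: String,
}

/// Hypothetical raw storage notification: an event name plus an object key.
struct StorageEvent<'a> {
    name: &'a str,
    key: &'a str,
}

/// Map a raw notification to `DatasetAdded`, ignoring everything except creations.
fn to_dataset_added(event: &StorageEvent) -> Option<DatasetAdded> {
    // Assumption: creation events are "ObjectCreated" or "ObjectCreated:<detail>".
    let is_create = event.name == "ObjectCreated" || event.name.starts_with("ObjectCreated:");
    // Assumption: keys ending in '/' are folder placeholders, not dataset files.
    if is_create && !event.key.is_empty() && !event.key.ends_with('/') {
        Some(DatasetAdded { path: event.key.to_string() })
    } else {
        None
    }
}
```

Deletion and update events return `None` here, which leaves room for the follow-up issues to add their own variants.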

Example Flow

Existing table:

CREATE TABLE observations
STORED AS PARQUET
LOCATION 'observations/'
PARTITIONED BY (year, month);

Incoming event:

ObjectCreated:
observations/year=2026/month=05/part-000.parquet

Beacon should:

1. Find the table with matching prefix.
2. Add the file to that table's metadata.
3. Infer partition values:
   year = 2026
   month = 05
4. Make the file available for future queries.
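Step 3 can be illustrated with a small Hive-style parser. This is a sketch under assumptions: `infer_partitions` is a hypothetical helper that receives the path relative to the table location, expects one `column=value` directory per partition column in declared order, and keeps values as strings so that `05` is not silently turned into `5`.

```rust
/// Infer Hive-style partition values from the path below the table location.
/// Returns None when the directories do not match the partition columns in order.
fn infer_partitions(relative_path: &str, columns: &[&str]) -> Option<Vec<(String, String)>> {
    let mut segments: Vec<&str> = relative_path
        .split('/')
        .filter(|s| !s.is_empty())
        .collect();
    // The last segment is the file name, not a partition directory.
    segments.pop()?;
    if segments.len() != columns.len() {
        return None;
    }
    segments
        .iter()
        .zip(columns)
        .map(|(segment, column)| {
            let (name, value) = segment.split_once('=')?;
            // Each directory must name the expected column.
            (name == *column).then(|| (name.to_string(), value.to_string()))
        })
        .collect()
}
```

For the event above, `infer_partitions("year=2026/month=05/part-000.parquet", &["year", "month"])` yields the pairs `("year", "2026")` and `("month", "05")`. How Beacon should treat a file whose path does not match the partition layout is left open by the issue; the sketch simply returns `None`.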

Acceptance Criteria

  • Beacon can process a dataset/file-added event.
  • Events are matched to existing table prefixes.
  • Matching events update only the relevant table.
  • Non-matching events are ignored.
  • New files are added to the table's tracked metadata.
  • Partition values are inferred from Hive-style paths where applicable.
  • Query planning sees newly added files after the event is processed.
  • Path matching distinguishes example/ from example_2/.
  • Tests cover matching events.
  • Tests cover non-matching events.
  • Tests cover partition inference.
  • Tests cover safe prefix matching.

Labels: enhancement