Summary
Add event-driven metadata updates for existing tables backed by object storage prefixes.
Beacon already supports creating and dropping tables. This issue only covers updating an existing table when new dataset files are added under the table's configured prefix.
Motivation
Tables backed by object storage can become stale when files are added outside of Beacon.
Beacon should be able to receive or process storage events, detect which table prefix the new file belongs to, and update the relevant table metadata without requiring a full manual refresh.
Goal
When a new dataset file is added under a table prefix, Beacon should update only the relevant table metadata.
Example:
table: observations
prefix: observations/files/
new file:
observations/files/new.parquet
result:
observations table metadata is updated
Requirements
- Add support for processing object storage create events.
- Match incoming file paths to existing table prefixes.
- Update only the table whose prefix matches the new file.
- Add the new file to the table's tracked dataset/file metadata.
- Infer partition values from the path if the table is partitioned.
- Ignore events that do not match any table prefix.
- Avoid refreshing unrelated tables.
- Ensure query planning can use the updated table metadata.
- Ensure path matching is segment-aware.
Event Matching
Events should be matched to tables by their configured storage prefix.
Example match:
table prefix:
data/example/
event path:
s3://bucket/data/example/file.parquet
matches:
yes
Example non-match:
table prefix:
data/example/
event path:
s3://bucket/data/example_2/file.parquet
matches:
no
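Path matching must compare whole path segments rather than raw string prefixes. A plain string-prefix check on a prefix stored without its trailing slash (data/example) would accidentally match data/example_2/ as well.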
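A minimal sketch of segment-aware matching, to make the rule above concrete. The function names (`object_key`, `matches_prefix`, `find_table`) and the `(table name, prefix)` pair representation are hypothetical and not part of Beacon's current API. The sketch also assumes event paths are either bare object keys or `scheme://bucket/key` URLs.

```rust
/// Strip an optional `scheme://bucket/` prefix and return the object key.
/// Assumption: event paths are bare keys or `scheme://bucket/key` URLs.
fn object_key(path: &str) -> &str {
    match path.split_once("://") {
        // Drop the bucket, which is the first component after the scheme.
        Some((_, rest)) => rest.split_once('/').map_or("", |(_, key)| key),
        None => path,
    }
}

/// Return true when `path` lies under `prefix`, comparing whole segments.
fn matches_prefix(prefix: &str, path: &str) -> bool {
    let prefix_segments: Vec<&str> =
        prefix.split('/').filter(|s| !s.is_empty()).collect();
    let key_segments: Vec<&str> =
        object_key(path).split('/').filter(|s| !s.is_empty()).collect();

    // The key needs at least one segment beyond the prefix (the file itself),
    // and every prefix segment must equal the key segment at the same position.
    key_segments.len() > prefix_segments.len()
        && key_segments.iter().zip(&prefix_segments).all(|(k, p)| k == p)
}

/// Return the name of the first table whose prefix matches, if any.
/// `tables` holds hypothetical (table name, prefix) pairs.
fn find_table<'a>(tables: &[(&'a str, &'a str)], path: &str) -> Option<&'a str> {
    tables
        .iter()
        .find(|(_, prefix)| matches_prefix(prefix, path))
        .map(|(name, _)| *name)
}
```

Because whole segments are compared, the result does not depend on whether the configured prefix carries a trailing slash. An event that matches no table yields `None` and can simply be ignored. Note that an empty prefix would match every path, so a real implementation would probably reject it at table creation.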
Event Types
Initial scope should support dataset/file creation events.
Example:
ObjectCreated
or an internal equivalent event such as:
DatasetAdded {
    path: String,
}
Deletion, update, and full resync can be handled in follow-up issues.
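One possible shape for the internal event type, sketched under assumptions. The enum name `StorageEvent` and the function `to_internal_event` are hypothetical, and the raw event names (`ObjectCreated:Put`, optionally prefixed with `s3:`) are an assumption about the notification source rather than something this issue specifies.

```rust
/// Internal event type. Only creation is in the initial scope;
/// deletion, update, and resync variants would be added in follow-ups.
#[derive(Debug, PartialEq)]
enum StorageEvent {
    DatasetAdded { path: String },
}

/// Translate a raw storage notification into an internal event.
/// Returns `None` for event kinds that are out of scope.
fn to_internal_event(event_name: &str, object_path: &str) -> Option<StorageEvent> {
    // Tolerate an optional "s3:" prefix (assumption about the event source).
    let name = event_name.strip_prefix("s3:").unwrap_or(event_name);
    if name.starts_with("ObjectCreated") {
        Some(StorageEvent::DatasetAdded {
            path: object_path.to_string(),
        })
    } else {
        None
    }
}
```

Returning `Option` keeps out-of-scope events such as deletions explicit at the call site, so they can be dropped or logged without touching any table.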
Example Flow
Existing table:
CREATE TABLE observations
STORED AS PARQUET
LOCATION 'observations/'
PARTITIONED BY (year, month);
Incoming event:
ObjectCreated:
observations/year=2026/month=05/part-000.parquet
Beacon should:
1. Find the table with matching prefix.
2. Add the file to that table's metadata.
3. Infer partition values:
year = 2026
month = 05
4. Make the file available for future queries.
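The steps above can be sketched as follows. `TableMetadata`, `infer_partitions`, and `apply_dataset_added` are hypothetical names used only for illustration, not Beacon's actual types. The sketch assumes Hive-style `key=value` directory names and a table prefix that has been normalized to end with `/`.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory view of one table's metadata.
#[derive(Debug, Default)]
struct TableMetadata {
    /// Storage prefix, assumed to be normalized to end with '/'.
    prefix: String,
    partition_columns: Vec<String>,
    /// Tracked files together with their inferred partition values.
    files: Vec<(String, BTreeMap<String, String>)>,
}

/// Infer Hive-style `key=value` partition values from the directory
/// segments after the table prefix. Returns `None` if the key is outside
/// the prefix or a declared partition column is missing from the path.
fn infer_partitions(table: &TableMetadata, key: &str) -> Option<BTreeMap<String, String>> {
    let relative = key.strip_prefix(table.prefix.as_str())?;
    let mut segments: Vec<&str> = relative.split('/').collect();
    // The final segment is the file name, not a partition directory.
    segments.pop();

    let mut values = BTreeMap::new();
    for segment in segments {
        if let Some((column, value)) = segment.split_once('=') {
            if table.partition_columns.iter().any(|c| c == column) {
                values.insert(column.to_string(), value.to_string());
            }
        }
    }
    (values.len() == table.partition_columns.len()).then_some(values)
}

/// Record the file in the table's metadata so that later query planning
/// can see it. Returns false and leaves the table untouched on a mismatch.
fn apply_dataset_added(table: &mut TableMetadata, key: &str) -> bool {
    match infer_partitions(table, key) {
        Some(values) => {
            table.files.push((key.to_string(), values));
            true
        }
        None => false,
    }
}
```

Partition values are kept as strings here, so `month=05` stays `"05"` with its leading zero. Casting to the column's declared type is left to whatever layer owns the table schema. For an unpartitioned table, `partition_columns` is empty and any file under the prefix is accepted with no partition values.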
Acceptance Criteria
- Path matching distinguishes example/ from example_2/.