This repository was archived by the owner on Apr 16, 2026. It is now read-only.
Summary
Add a protobuf schema for the NormalizedEvent / TapEventData types that flow through the NATS JetStream -> ClickHouse pipeline. This is ensemble-tap's highest-throughput data path, and it currently carries untyped map[string]any payloads encoded as JSON.
Motivation
ensemble-tap ingests webhooks from 20+ SaaS providers (Stripe, HubSpot, Slack, etc.), normalizes them into a NormalizedEvent struct, and publishes to NATS JetStream. A ClickHouse consumer writes batches of 500 events every 2 seconds.
Current problems:
The changes and snapshot fields are serialized as JSON strings into ClickHouse String columns -- no schema, no type safety
TapEventData uses map[string]any for the event payload -- consumers must guess the shape
The CloudEvent data field is a json.RawMessage with no contract
Adding a new provider field means hoping all consumers handle it correctly
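To make the "consumers must guess the shape" problem concrete, here is a minimal sketch of how a guess can fail silently. The `amountCents` helper and the `amount` field are hypothetical and are not taken from ensemble-tap; the only behavior relied on is that Go's `encoding/json` decodes JSON numbers into `float64` when the target is an `any` value.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// amountCents is a hypothetical consumer helper that guesses the payload
// shape: it assumes data["amount"] holds an int.
func amountCents(data map[string]any) (int, bool) {
	v, ok := data["amount"].(int)
	return v, ok
}

// decodePayload unmarshals a raw event payload the way an untyped
// consumer would, into a map[string]any.
func decodePayload(raw []byte) (map[string]any, error) {
	var data map[string]any
	err := json.Unmarshal(raw, &data)
	return data, err
}

func main() {
	data, err := decodePayload([]byte(`{"amount": 1999, "currency": "usd"}`))
	if err != nil {
		panic(err)
	}
	// encoding/json stores JSON numbers as float64 inside an any value,
	// so the int assertion fails without raising any error.
	cents, ok := amountCents(data)
	fmt.Println(cents, ok)             // prints "0 false"
	fmt.Printf("%T\n", data["amount"]) // prints "float64"
}
```

With a generated protobuf type, the field's Go type would be fixed by the schema, so this class of mismatch would surface at compile time instead of at runtime.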
Scope
Define proto/tap/v1/event.proto with NormalizedEvent, TapEventData, and provider-specific payload types
Publish to NATS as serialized protobuf instead of JSON
Update the ClickHouse consumer to use the Protobuf input format for batch inserts
Keep the CloudEvents envelope as JSON (it's the transport), but use protobuf for the data payload
Provider-specific payload types can use google.protobuf.Struct initially, migrating to typed messages per provider over time
Expected Benefits
Schema enforcement at the NATS publish boundary
Smaller wire format for high-throughput batch writes
ClickHouse can ingest batches through its Protobuf input format, eliminating JSON parsing on ingest
Type-safe deserialization for any downstream consumer of the tap event stream
Foundation for adding new providers with schema validation