16 commits
64574e7
docs: add CLAUDE.md for repository guidance and command reference
neerajvipparla May 1, 2026
acdfd3e
docs: add ClickHouse Log Sink design document for new logging impleme…
neerajvipparla May 1, 2026
e4ee3db
build(deps): update Go modules with new indirect dependencies
neerajvipparla May 1, 2026
1b1eef1
feat(config): implement ClickHouse configuration and validation
neerajvipparla May 1, 2026
9afa432
feat(clickhouse): add schema management and DDL generation for ClickH…
neerajvipparla May 1, 2026
a768a33
feat(clickhouse): implement batch writing for ClickHouse logs
neerajvipparla May 1, 2026
6daca3d
feat(clickhouse): add unit tests for batch writing and core functiona…
neerajvipparla May 1, 2026
fdb0154
feat(clickhouse): enhance logRow structure and validation in configur…
neerajvipparla May 1, 2026
4f54bca
feat(clickhouse): enhance row extraction and configuration validation
neerajvipparla May 1, 2026
b0edef7
feat(clickhouse): enhance configuration and logging functionality
neerajvipparla May 2, 2026
7158dd2
feat(clickhouse): integrate ClickHouse logging into Ion framework
neerajvipparla May 2, 2026
4acad72
refactor(clickhouse): reorganize package structure and update imports
neerajvipparla May 2, 2026
39a8815
fix(config): remove deprecated 'warning' log level and enhance ClickH…
neerajvipparla May 2, 2026
e324030
Merge pull request #1 from neerajvipparla/add/clickhouse
neerajvipparla May 2, 2026
0b7f89c
docs(readme): add ClickHouse configuration details and schema informa…
neerajvipparla May 3, 2026
c6a42c0
Merge pull request #2 from neerajvipparla/add/clickhouse
neerajvipparla May 3, 2026
3 changes: 2 additions & 1 deletion .gitignore
@@ -7,4 +7,5 @@ examples/basic/basic
examples/otel-test/otel-test
*.exe

temp/
temp/
examples/clickhouse-simulator/*
107 changes: 107 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,107 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Commands

```bash
# Test (race detector + coverage, all packages)
make test
# or directly:
go test -v -race -cover ./...

# Run a single test
go test -v -run TestName ./...

# Lint (golangci-lint must be installed)
make lint

# Format
make fmt

# Dependencies
make deps
```

## Architecture

Ion is a Go observability library. The module path is `github.com/JupiterMetaLabs/ion`.

### Public API (stable, `internal/*` has no guarantees)

| File | Role |
|------|------|
| `ion.go` | `*Ion` struct — root observability instance; `New()`, `Child()`, `Named()`, `With()`, `Shutdown()` |
| `logger.go` | `Logger` interface |
| `logger_impl.go` | `zapLogger` struct — internal Zap wrapper, `prepareFields()`, field conversion |
| `config.go` | Type aliases into `internal/config`; `Default()`, `Development()` builders |
| `fields.go` | `Field` struct and typed constructors (`String`, `Int64`, `Err`, etc.) |
| `context.go` | Context helpers (`WithRequestID`, `extractContextZapFields`, etc.) |
| `tracer.go` | `Tracer`/`Span` interfaces; `noopTracer` |
| `attrs.go` | `Attr` type alias, OTEL status constants |
| `fields/blockchain.go` | Domain-specific field constructors (`TxHash`, `BlockHeight`, etc.) |
| `middleware/ionhttp/` | HTTP middleware for context propagation |
| `middleware/iongrpc/` | gRPC stats handler for context propagation |

### Internal packages

| Package | Role |
|---------|------|
| `internal/config/config.go` | All config structs (`Config`, `OTELConfig`, `FileConfig`, …), `Validate()`, `Default()`, `Development()`, `NewFileWriter()` |
| `internal/core/logger_factory.go` | `NewZapLogger()`: assembles all Zap cores (console, file, OTEL) into a single core via `zapcore.NewTee` |
| `internal/core/otel.go` | `SetupLogProvider()`, `SetupTracerProvider()`, OTLP exporter construction, endpoint/auth helpers |
| `internal/core/meter.go` | `SetupMeterProvider()`, OTLP metrics exporter construction |
| `internal/core/filter.go` | `filteringCore` — strips internal sentinel keys from log output |
| `internal/core/enforcer.go` | `levelEnforcer` — overrides a core's level check (needed for the OTEL zap bridge which defaults to Info) |
| `internal/core/constants.go` | `SentinelKey = "__ion_ctx__"`, `SystemFieldPrefix = "__ion_"` |

### Log pipeline (critical to understand before modifying)

```
ion.Info(ctx, msg, fields...)
  └─ zapLogger.prepareFields(ctx, fields)
       ├─ toZapFields(fields)                 // Field → zap.Field, zero-alloc for primitives
       └─ extractContextZapFields(ctx)        // trace_id, span_id, request_id, user_id
            + zap.Reflect(SentinelKey, ctx)   // carries raw ctx into the core for OTEL bridge
  └─ zap.Logger.Info(msg, zapFields...)
       └─ zapcore.Tee [
            filteringCore(consoleCore)                 // strips SentinelKey before writing to stdout/stderr
            filteringCore(fileCore)                    // strips SentinelKey before writing to file
            filteringCore(levelEnforcer(otelzapCore))  // OTEL bridge uses SentinelKey to
          ]                                            // extract TraceID/SpanID before it's stripped
```

**`SentinelKey` (`"__ion_ctx__"`)** is the mechanism for passing `context.Context` through Zap's field system so the `otelzap` bridge can call `trace.SpanContextFromContext()` inside the core, while `filteringCore` prevents the raw context from leaking into console/file output.

### Adding a new log sink (the pattern to follow for ClickHouse)

All sinks are assembled in `internal/core/logger_factory.go` inside `NewZapLogger()`. The `cores` slice is built and passed to `zapcore.NewTee`. Steps:

1. Add config struct fields to `internal/config/config.go` (follow `FileConfig` pattern).
2. Alias the new type in `config.go` (public package) and add a fluent builder on `Config`.
3. Create `internal/core/<sink>.go` implementing or wrapping `zapcore.Core`.
4. In `NewZapLogger()`: build the core if enabled, wrap it in `NewFilteringCore(core, SentinelKey)`, append to `cores`.
5. If the sink needs graceful shutdown (connections, flushers), add it to `ZapFactoryResult` and call shutdown in `zapLogger.Shutdown()`.
6. Wire any new config-level minimum-level calculation into the `minLevel` block in `NewZapLogger()`.

`filteringCore` **must** wrap every new core to strip the sentinel; otherwise raw `context.Context` values appear in output.
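
The wrap-and-append step can be sketched without importing Zap. This is a minimal illustration, not Ion's code: `Field`, `Sink`, `filteringSink`, `tee`, `memorySink`, and `buildSinks` are hypothetical stand-ins for `zap.Field`, `zapcore.Core`, `filteringCore`, the tee over cores, a new sink, and the assembly logic in `NewZapLogger()`. Only the sentinel key value is taken from this file.

```go
package main

import (
	"fmt"
	"strings"
)

// sentinelKey mirrors the SentinelKey value quoted in this file.
const sentinelKey = "__ion_ctx__"

// Field and Sink are simplified, hypothetical stand-ins for zap.Field and zapcore.Core.
type Field struct {
	Key   string
	Value any
}

type Sink interface {
	Write(msg string, fields []Field)
}

// filteringSink drops the sentinel field before delegating to the wrapped sink.
type filteringSink struct{ next Sink }

func (f filteringSink) Write(msg string, fields []Field) {
	kept := make([]Field, 0, len(fields))
	for _, fld := range fields {
		if fld.Key != sentinelKey {
			kept = append(kept, fld)
		}
	}
	f.next.Write(msg, kept)
}

// tee fans one entry out to every sink it holds.
type tee []Sink

func (t tee) Write(msg string, fields []Field) {
	for _, s := range t {
		s.Write(msg, fields)
	}
}

// memorySink records formatted lines; it plays the role of the new sink.
type memorySink struct{ lines []string }

func (m *memorySink) Write(msg string, fields []Field) {
	parts := []string{msg}
	for _, f := range fields {
		parts = append(parts, fmt.Sprintf("%s=%v", f.Key, f.Value))
	}
	m.lines = append(m.lines, strings.Join(parts, " "))
}

// buildSinks mirrors step 4: build the sink if enabled, wrap it in the filter, append.
func buildSinks(enabled bool, newSink Sink) tee {
	var sinks tee
	if enabled {
		sinks = append(sinks, filteringSink{next: newSink})
	}
	return sinks
}

func main() {
	mem := &memorySink{}
	sinks := buildSinks(true, mem)
	sinks.Write("block imported", []Field{
		{Key: "block_height", Value: 19000000},
		{Key: sentinelKey, Value: "raw context"},
	})
	fmt.Println(mem.lines[0]) // the sentinel field is absent from the recorded line
}
```

Putting the filter at assembly time, rather than inside each sink, means a new sink cannot forget to strip the sentinel.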

### `*Ion` vs `Logger` vs `zapLogger`

- `zapLogger` — concrete, unexported. Owns the `*zap.Logger`, `zap.AtomicLevel`, and `*core.LogProvider`. Implements `Logger`.
- `*Ion` — exported. Embeds `*zapLogger` (promoting all `Logger` methods). Also holds `tracerProvider` and `meterProvider`. The concrete type behind every `Logger` interface value returned by the public API.
- `Child()` returns `*Ion` directly (caller needs Tracer/Meter). `Named()`/`With()` return `Logger` (interface) — internally still `*Ion`.
- All children share the same `zap.AtomicLevel` pointer → `SetLevel()` on any instance propagates everywhere.
- Only the root `*Ion` from `New()` should be shut down: children share the root's providers, so calling `Shutdown()` on a child tears them down for every instance.
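
The shared-level behaviour described above can be illustrated with the standard library alone. In this sketch, `logger`, `newRoot`, `child`, `setLevel`, and `enabled` are hypothetical names, and a plain `*int32` stands in for the shared `zap.AtomicLevel`; the point is only that children copy the pointer, not the value.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// level is a stand-in for a log level; higher means more severe.
type level int32

const (
	debugLevel level = iota
	infoLevel
	errorLevel
)

// logger is a hypothetical stand-in for Ion's logger types.
type logger struct {
	name string
	lvl  *int32 // shared by the root and every child, accessed atomically
}

func newRoot(l level) *logger {
	shared := new(int32)
	atomic.StoreInt32(shared, int32(l))
	return &logger{name: "root", lvl: shared}
}

// child copies the pointer, not the value, so both loggers see later changes.
func (l *logger) child(name string) *logger {
	return &logger{name: l.name + "." + name, lvl: l.lvl}
}

func (l *logger) setLevel(v level) { atomic.StoreInt32(l.lvl, int32(v)) }

func (l *logger) enabled(v level) bool { return int32(v) >= atomic.LoadInt32(l.lvl) }

func main() {
	root := newRoot(infoLevel)
	rpc := root.child("rpc")

	fmt.Println(root.enabled(debugLevel)) // false: root starts at info
	rpc.setLevel(debugLevel)              // change made through the child
	fmt.Println(root.enabled(debugLevel)) // true: the root sees the same level
}
```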

### `Critical()` — no-exit Fatal

`Critical()` calls `zap.Fatal()` but `New()` installs `zap.WithFatalHook(noExitHook{})`, which is a no-op hook. This emits a `FATAL`-level log entry and returns control to the caller — it never calls `os.Exit`.

### Config inheritance

`Tracing` and `Metrics` configs inherit `Endpoint`, `Protocol`, `Insecure`, `Username`, `Password`, `Headers`, `Timeout`, `BatchSize`, and `ExportInterval` from `OTEL` config when their own values are empty. This inheritance is applied in `ion.go` `New()` before calling the setup functions.
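
A minimal sketch of this kind of inheritance, assuming "empty" means the Go zero value. `exporterConfig` and `inherit` are hypothetical and cover only three of the listed fields; the real logic lives in `New()` and may differ.

```go
package main

import (
	"fmt"
	"time"
)

// exporterConfig is a hypothetical subset of the inherited fields.
type exporterConfig struct {
	Endpoint string
	Protocol string
	Timeout  time.Duration
}

// inherit fills zero-valued fields of child from parent and returns the result.
// child is passed by value, so the caller's struct is not modified.
func inherit(child, parent exporterConfig) exporterConfig {
	if child.Endpoint == "" {
		child.Endpoint = parent.Endpoint
	}
	if child.Protocol == "" {
		child.Protocol = parent.Protocol
	}
	if child.Timeout == 0 {
		child.Timeout = parent.Timeout
	}
	return child
}

func main() {
	otel := exporterConfig{Endpoint: "collector:4317", Protocol: "grpc", Timeout: 10 * time.Second}
	tracing := exporterConfig{Timeout: 3 * time.Second} // only the timeout is overridden

	fmt.Printf("%+v\n", inherit(tracing, otel))
}
```

A zero-value check cannot tell "unset" apart from an explicit `false` or `0`, which matters for a boolean such as `Insecure`; how Ion handles that case is not shown in this diff.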

## Linting

golangci-lint v2 config is in `.golangci.yml`. Active linters: `govet`, `ineffassign`, `unused`, `nolintlint`, `staticcheck`, `errcheck`, `gosec`. Every `//nolint` directive must name the specific linter and include a reason comment. Import ordering is enforced by `goimports` with local prefix `github.com/JupiterMetaLabs/ion`.
72 changes: 72 additions & 0 deletions README.md
@@ -296,6 +296,7 @@ Ion uses a comprehensive configuration struct for behavior control. This maps 1:
| `OTEL` | `OTELConfig` | `Enabled: false` | Configuration for remote OpenTelemetry logging. |
| `Tracing` | `TracingConfig` | `Enabled: false` | Configuration for distributed tracing. |
| `Metrics` | `MetricsConfig` | `Enabled: false` | Configuration for OpenTelemetry metrics. |
| `ClickHouse` | `ClickHouseConfig` | `Enabled: false` | Configuration for ClickHouse analytics log sink. |

### Console Configuration (`ion.ConsoleConfig`)

@@ -348,6 +349,77 @@ Controls the OpenTelemetry **Trace** Provider. Empty fields inherit from `OTELCo
| `Username` | `string` | `""` | Inherits `OTEL.Username` if empty. |
| `Password` | `string` | `""` | Inherits `OTEL.Password` if empty. |

### ClickHouse Configuration (`ion.ClickHouseConfig`)

Ion can write log entries at or above the sink's `Level` to a ClickHouse table in parallel with console/file/OTEL output. The sink is fully asynchronous: log calls return immediately and a background goroutine batches rows to ClickHouse. If the queue is full, entries are dropped rather than blocking the caller (see `ChannelBuffer`). This enables fast analytical queries over log data (e.g., "error rate per validator in the last 5 minutes") that are slow or impractical in Loki.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `Enabled` | `bool` | `false` | Enables the ClickHouse sink. |
| `DSN` | `string` | `""` | Connection string. **Required when enabled.** `"http://user:pass@host:8123/db"` or `"clickhouse://user:pass@host:9000/db"`. |
| `Table` | `string` | `"ion_logs"` | Target table name. Must be a valid unquoted SQL identifier. |
| `Level` | `string` | `""` | Minimum level for this sink. Inherits global `Level` if empty. |
| `BatchSize` | `int` | `1000` | Max rows per flush. Larger batches mean fewer writes to ClickHouse, but rows can wait longer before they are flushed. |
| `FlushInterval` | `Duration` | `5s` | How often the background flusher sends pending rows. |
| `ChannelBuffer` | `int` | `10000` | Async queue depth. Entries are dropped (not blocked) when full. |
| `AutoSchema` | `bool` | `false` | Run `CREATE TABLE IF NOT EXISTS` on startup. Set `true` for dev; manage DDL yourself in production. |
| `DialTimeout` | `Duration` | `10s` | Connection dial timeout. |
| `WriteTimeout` | `Duration` | `30s` | Per-flush write timeout. |
| `MaxOpenConns` | `int` | `5` | Connection pool size. |
| `MaxIdleConns` | `int` | `5` | Idle connections kept open. |
| `ConnMaxLifetime` | `Duration` | `1h` | Max connection reuse duration. |

#### Table Schema

Ion creates the following table when `AutoSchema = true`. You can also run the DDL manually before deployment:

```sql
CREATE TABLE IF NOT EXISTS ion_logs
(
    timestamp   DateTime64(9, 'UTC'),        -- nanosecond precision
    level       LowCardinality(String),      -- dictionary-encoded, fast WHERE
    service     LowCardinality(String),
    version     LowCardinality(String),
    logger      String,
    message     String,
    trace_id    String,
    span_id     String,
    request_id  String,
    user_id     String,
    caller      String,
    str_fields  Map(String, String),         -- ion.String() fields
    int_fields  Map(String, Int64),          -- ion.Int64(), ion.Uint64() fields
    flt_fields  Map(String, Float64),        -- ion.Float64() fields
    bool_fields Map(String, UInt8),          -- ion.Bool() fields (0/1)
    extra       String                       -- JSON bag for errors, structs
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (service, level, timestamp)
TTL timestamp + INTERVAL 30 DAY DELETE
SETTINGS index_granularity = 8192;
```

**Schema design:** `LowCardinality` on `level`/`service`/`version` dictionary-encodes these low-cardinality, frequently repeated strings, which gives roughly 10× storage reduction for those columns. Typed map columns (`int_fields`, `flt_fields`) allow numeric predicates such as `WHERE int_fields['block_height'] > 19000000` without string casting. The `extra` column is a JSON escape hatch for complex Go types (errors, structs) and is not indexed.
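
The column layout implies a routing step from typed log fields to the four maps plus `extra`. The sketch below shows one way that routing could look. `Field`, `logRow`, and `buildRow` are hypothetical and dispatch on the Go type of the value; the actual row extraction in the PR is not shown here and may work differently, including how it treats `uint64` values above the `Int64` range.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Field is a simplified, hypothetical stand-in for an Ion log field.
type Field struct {
	Key   string
	Value any
}

// logRow holds only the typed-field columns of the table above.
type logRow struct {
	StrFields  map[string]string
	IntFields  map[string]int64
	FltFields  map[string]float64
	BoolFields map[string]uint8
	Extra      string // JSON object, empty when nothing was routed to it
}

// buildRow routes each field to the column matching its Go type.
func buildRow(fields []Field) (logRow, error) {
	row := logRow{
		StrFields:  map[string]string{},
		IntFields:  map[string]int64{},
		FltFields:  map[string]float64{},
		BoolFields: map[string]uint8{},
	}
	extra := map[string]any{}
	for _, f := range fields {
		switch v := f.Value.(type) {
		case string:
			row.StrFields[f.Key] = v
		case int64:
			row.IntFields[f.Key] = v
		case uint64:
			row.IntFields[f.Key] = int64(v) // assumption: values above MaxInt64 wrap
		case float64:
			row.FltFields[f.Key] = v
		case bool:
			if v {
				row.BoolFields[f.Key] = 1
			} else {
				row.BoolFields[f.Key] = 0
			}
		case error:
			extra[f.Key] = v.Error() // errors are stored as their message
		default:
			extra[f.Key] = v // structs and other values go to the JSON bag
		}
	}
	if len(extra) > 0 {
		b, err := json.Marshal(extra)
		if err != nil {
			return row, err
		}
		row.Extra = string(b)
	}
	return row, nil
}

func main() {
	row, err := buildRow([]Field{
		{Key: "validator", Value: "val-7"},
		{Key: "block_height", Value: int64(19000001)},
		{Key: "latency_ms", Value: 12.5},
		{Key: "finalized", Value: true},
		{Key: "err", Value: errors.New("timeout")},
	})
	if err != nil {
		fmt.Println("build failed:", err)
		return
	}
	fmt.Printf("%+v\n", row)
}
```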

#### Monitoring Back-Pressure

```go
app, _, _ := ion.New(cfg)

// DroppedCount returns the total entries dropped due to a full channel buffer.
// A rising value means the buffer is filling faster than the flusher can drain it.
dropped := app.DroppedCount()
```

Tuning guide:

| Symptom | Cause | Action |
|---------|-------|--------|
| `DroppedCount()` rising | Buffer filling faster than it is flushed | Decrease `FlushInterval` (flush more often), increase `BatchSize` (flush more rows per write), and increase `ChannelBuffer` (more room before drops occur) |
| Flush errors in stderr | ClickHouse unreachable | Fix connectivity; rows during downtime are lost |
| Memory growth | `ChannelBuffer` too large | Decrease `ChannelBuffer` |
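
Because `DroppedCount()` is a running total, alerting is usually easier on the change between two readings than on the absolute value. The sketch below assumes the method returns `uint64`, which this README does not state; `dropCounter`, `dropWatcher`, and `fakeCounter` are hypothetical helpers, with `fakeCounter` standing in for the Ion instance.

```go
package main

import "fmt"

// dropCounter is the only method this sketch needs.
// Assumption: DroppedCount returns uint64.
type dropCounter interface {
	DroppedCount() uint64
}

// dropWatcher reports how many entries were dropped since the previous check.
type dropWatcher struct {
	src  dropCounter
	last uint64
}

func (w *dropWatcher) delta() uint64 {
	now := w.src.DroppedCount()
	d := now - w.last
	w.last = now
	return d
}

// fakeCounter stands in for the Ion instance in this example.
type fakeCounter struct{ n uint64 }

func (f *fakeCounter) DroppedCount() uint64 { return f.n }

func main() {
	src := &fakeCounter{}
	w := &dropWatcher{src: src}

	src.n = 5
	fmt.Println(w.delta()) // 5: five drops since start
	fmt.Println(w.delta()) // 0: nothing new since the last check
	src.n = 12
	fmt.Println(w.delta()) // 7: seven new drops
}
```

Calling `delta()` on a fixed schedule (for example from a metrics scrape) gives a drop rate that can be compared against the tuning table above.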

### Metrics Configuration (`ion.MetricsConfig`)

Controls the OpenTelemetry **Metrics** Provider (OTLP Push). Empty fields inherit from `OTELConfig`.
4 changes: 4 additions & 0 deletions config.go
@@ -1,6 +1,7 @@
package ion

import (
	clickhouseconfig "github.com/JupiterMetaLabs/ion/internal/clickhouse/config"
	"github.com/JupiterMetaLabs/ion/internal/config"
)

@@ -23,6 +24,9 @@ type TracingConfig = config.TracingConfig
// MetricsConfig configures OpenTelemetry metrics export.
type MetricsConfig = config.MetricsConfig

// ClickHouseConfig configures the ClickHouse log sink.
type ClickHouseConfig = clickhouseconfig.Config

// Default returns a Config with sensible production defaults.
func Default() Config {
	return config.Default()