braintrustdata · Matt Perpick (clutchski) · Oct 22, 2025 · Oct 22, 2025
diff --git a/.DONE.md b/.DONE.md
@@ -0,0 +1,258 @@
+# Braintrust Ruby SDK - Completed Work
+
+## Phase 0: Documentation ✅
+
+- [x] Create .PLAN.md (moved to hidden)
+- [x] Create .TODO.md (moved to hidden)
+
+## Phase 1: Project Setup & Infrastructure ✅
+
+- [x] Create braintrust.gemspec (no runtime deps yet)
+- [x] Create Gemfile
+- [x] Create Rakefile (test, lint, ci tasks only)
+- [x] Create mise.toml with precommit hooks + bundle install
+- [x] Create .env.example
+- [x] Create .github/workflows/ci.yml (uses rake ci)
+- [x] Set up Standard linter config (via Rakefile)
+- [x] Set up SimpleCov config (via test_helper.rb)
+- [x] Create minimal README.md
+- [x] Create minimal CONTRIBUTING.md
+- [x] Create .gitignore
+- [x] Create CHANGELOG.md
+- [x] Create lib/braintrust/version.rb
+- [x] Create lib/braintrust.rb (skeleton)
+- [x] Create test/test_helper.rb
+- [x] Create scripts/install-deps.sh (cross-platform)
+- [x] Create main branch
+- [x] Add rake ci task
+
+## Phase 2: Core State & Configuration (TDD) ✅ COMPLETE
+
+### lib/braintrust/config.rb ✅
+- [x] Write test: parse ENV vars
+- [x] Implement Config.from_env
+- [x] Write test: default values
+- [x] Write test: merge options with ENV vars (options override)
+- [x] Write test: ENV vars override defaults
+- [x] All tests passing, linter clean
+
+### lib/braintrust/state.rb ✅
+- [x] Write test: create state with required fields
+- [x] Write test: validate required fields (api_key required)
+- [x] Write test: state is immutable (frozen)
+- [x] Write test: thread-safe global state access (Mutex)
+- [x] Implement State class
+- [x] Implement State.global getter/setter
+- [x] Implement State validation
+- [x] All tests passing, linter clean
+
+### lib/braintrust.rb ✅
+- [x] Write test: init sets global state by default
+- [x] Write test: init with set_global: false returns state
+- [x] Write test: init merges options with ENV vars
+- [x] Implement Braintrust.init
+- [x] Implement Braintrust.current_state
+- [x] Add blocking_login parameter to Braintrust.init
+- [x] Document all init options explicitly
+- [x] All tests passing, linter clean
+
+### lib/braintrust/api/auth.rb ✅
+- [x] Write test: login with valid API key
+- [x] Write test: login with invalid API key
+- [x] Implement API::Auth.login
+- [x] Implement AuthResult struct
+- [x] Handle 401/403 as invalid API key
+- [x] Handle 400/4xx/5xx with appropriate errors
+- [x] Implement API::Auth.mask_api_key
+- [x] All tests passing (real API tests), linter clean
+
+### lib/braintrust/logger.rb ✅
+- [x] Create logger with DEBUG level when BRAINTRUST_DEBUG=true
+- [x] Implement debug, info, warn, error methods
+- [x] Write to stderr
+
+### lib/braintrust/state.rb (login) ✅
+- [x] Add State#login method
+- [x] Login calls API::Auth.login
+- [x] Login updates state fields (org_id, org_name, api_url, proxy_url, logged_in)
+- [x] Add new attr_readers: org_id, proxy_url, logged_in
+- [x] Remove freeze (allow login to mutate state)
+- [x] All tests passing, linter clean
+
+### examples/login/ ✅
+- [x] Create examples/login/login_basic.rb
+- [x] Demonstrate blocking_login usage
+- [x] Test example runs successfully
+
+## Phase 3: Core Tracing (TDD) - ✅ COMPLETE (Trace.enable)
+
+### Add OpenTelemetry dependencies to braintrust.gemspec ✅
+- [x] Add opentelemetry-sdk runtime dependency
+- [x] Add opentelemetry-exporter-otlp runtime dependency
+- [x] Run bundle install
+
+### lib/braintrust/trace.rb ✅
+- [x] Write test: enable raises error if no state available
+- [x] Write test: enable with explicit state
+- [x] Write test: enable with global state
+- [x] Write test: enable adds console exporter when BRAINTRUST_ENABLE_TRACE_CONSOLE_LOG=true
+- [x] Implement Trace.enable(tracer_provider, state: nil)
+- [x] Configure OTLP HTTP exporter with correct endpoint (api_url/otel/v1/traces)
+- [x] Set Authorization header with API key
+- [x] Register BatchSpanProcessor with tracer provider
+- [x] Add SSL workaround (VERIFY_NONE with TODO)
+- [x] All tests passing (4 tests, 8 assertions), linter clean
+
+### examples/trace/trace_basic.rb ✅
+- [x] Create example demonstrating Trace.enable
+- [x] Show manual span creation with braintrust.parent attribute
+- [x] Test example runs successfully
+
+### lib/braintrust/trace/span_processor.rb ✅
+- [x] Write test: adds braintrust.parent attribute
+- [x] Write test: preserves existing parent attribute
+- [x] Write test: adds braintrust.org attribute
+- [x] Write test: adds braintrust.app_url attribute
+- [x] Implement SpanProcessor class
+- [x] Implement on_start hook (adds default_parent, org, app_url)
+- [x] Implement on_finish hook
+- [x] Wrap OTLP exporter in custom span processor
+- [x] Update State/Config to use single default_parent field
+- [x] Update BRAINTRUST_DEFAULT_PROJECT env var
+- [x] Update example to remove manual parent setting
+- [x] All tests passing (4 tests), linter clean
+
+## Phase 4: OpenAI Integration (TDD) - ✅ COMPLETE (First Pass)
+
+### lib/braintrust/trace/openai.rb ✅
+- [x] Add openai gem as development dependency
+- [x] Write basic test: wrapper creates span for chat.completions
+- [x] Implement basic OpenAI.wrap method
+- [x] Update wrapper to use braintrust.* attributes (match Go SDK)
+  - [x] Use `braintrust.input_json` for input messages (JSON-encoded once)
+  - [x] Use `braintrust.output_json` for output choices (JSON-encoded once)
+  - [x] Use `braintrust.metadata` for request/response metadata (JSON-encoded once)
+  - [x] Use `braintrust.metrics` for token usage (JSON-encoded once)
+- [x] Simplify output using `.to_h` to capture all fields (tool_calls, annotations, etc.)
+- [x] Update test to verify braintrust.input_json contains messages
+- [x] Update test to verify braintrust.output_json contains choices
+- [x] Update test to verify braintrust.metadata contains model, temperature, etc
+- [x] Update test to verify braintrust.metrics contains prompt_tokens, completion_tokens, tokens
+- [x] Update span name to "openai.chat.completions.create" (match Go)
+- [x] Test with real OpenAI API and verify in Braintrust UI
+
+### examples/openai.rb ✅
+- [x] Create openai.rb example with tracing
+- [x] Test example runs successfully
+- [x] Verify traces appear correctly in Braintrust UI with input/output/metadata
+
+### examples/internal/openai.rb ✅
+- [x] Create comprehensive example showcasing all features
+- [x] Vision (image understanding)
+- [x] Tool/function calling
+- [x] Reasoning models (o1-mini with reasoning tokens)
+- [x] Advanced parameters (temperature, top_p, etc.)
+- [x] All examples under single parent trace with permalink
+
+## Phase 6: Evals Framework (TDD) - ✅ MOSTLY COMPLETE
+
+### lib/braintrust/eval/case.rb ✅
+- [x] Write test: Case with input/expected
+- [x] Write test: Case with tags and metadata
+- [x] Implement Case class
+
+### lib/braintrust/eval/scorer.rb ✅
+- [x] Write test: Scorer interface
+- [x] Write test: Scorer helper with block
+- [x] Write test: Scorer returns score
+- [x] Implement Scorer module/class
+- [x] Implement Eval.scorer helper
+
+### lib/braintrust/eval/cases.rb ✅
+- [x] Write test: Cases enumerable
+- [x] Write test: Cases from array
+- [x] Implement Cases class
+
+### lib/braintrust/eval/result.rb ✅
+- [x] Write test: Result with success/failed status
+- [x] Implement Result class
+
+### lib/braintrust/internal/experiments.rb ✅
+- [x] Implement get_or_create for experiment resolution
+- [x] Implement project and experiment registration via API
+
+### lib/braintrust/eval.rb ✅ (Error handling complete)
+- [x] Write test: run with cases array
+- [x] Write test: run resolves project
+- [x] Write test: run resolves experiment
+- [x] Write test: run executes task for each case
+- [x] Write test: run executes scorers
+- [x] Write test: run creates OTEL spans
+- [x] Write test: run with explicit state
+- [x] Write test: run with global state
+- [x] Write test: run handles task errors
+- [x] Write test: run handles scorer errors
+- [x] Write test: task errors record exception events with stacktraces
+- [x] Write test: scorer errors record exception events with stacktraces
+- [x] Implement Eval.run
+- [x] Implement project resolution
+- [x] Implement experiment resolution
+- [x] Implement task execution
+- [x] Implement scorer execution
+- [x] Implement span creation
+- [x] Implement result generation
+- [x] Implement error recording with span.record_exception()
+- [x] Update record_span_error helper to use OpenTelemetry standard
+
+### Error Handling ✅ COMPLETE
+- [x] Task errors recorded on task span with full stacktrace
+- [x] Scorer errors recorded on score span with custom "ScorerError" type
+- [x] Eval span gets error status when child spans fail
+- [x] Exception events include type, message, and stacktrace
+- [x] Backend correctly extracts and populates error field
+- [x] Tests verify stacktrace attribute exists
+- [x] All 72 tests pass with 243 assertions
+
+## Session History
+
+### Session 1 Completed
+- Config class with ENV parsing, defaults, and option merging (4 tests)
+- State class with validation and thread-safe global state (5 tests)
+- Braintrust.init and Braintrust.current_state (3 tests)
+
+### Session 2 Completed
+- Login functionality (API::Auth.login with real API tests)
+- Logger with BRAINTRUST_DEBUG support
+- State#login method (updates org info from API)
+- Updated Braintrust.init with blocking_login option
+- Documented all init options
+- examples/login/login_basic.rb
+- Trace.enable method with OTLP exporter to Braintrust
+- Console debug support with BRAINTRUST_ENABLE_TRACE_CONSOLE_LOG
+- Custom Span Processor with automatic attribute injection
+- Changed to default_parent field (from project_id/project_name)
+- BRAINTRUST_DEFAULT_PROJECT env var (format: "project_name:foo")
+- examples/trace/trace_basic.rb
+- **Total: 21 test runs, 41 assertions, all passing, linter clean**
+
+### Session 3 Completed
+- OpenAI integration with braintrust.* attributes (input_json, output_json, metadata, metrics)
+- Simplified output using `.to_h` to capture all fields including tool_calls
+- Comprehensive test coverage (28 assertions)
+- examples/openai.rb with Trace.permalink
+- examples/internal/openai.rb showcasing vision, tools, reasoning, advanced params
+- Verified traces in Braintrust UI via MCP
+- SSL config improvements
+- **Total: 28 test runs, 82 assertions, all passing, linter clean**
+
+### Session 4 Completed (Error Handling)
+- Fixed error recording to match Go SDK behavior
+- Updated task error handling to use `span.record_exception(e)`
+- Updated `record_span_error` helper to use OpenTelemetry standard
+- Errors now include full stacktraces via exception events
+- Added stacktrace assertions to tests
+- Investigated backend error processing (api-ts/src/otel/collector.ts parseError function)
+- Verified errors populate in Braintrust database via MCP queries
+- Task errors: Full stacktrace on task span, error message on eval span
+- Scorer errors: Full stacktrace on score span with custom "ScorerError" type
+- **Total: 72 test runs, 243 assertions, all passing, linter clean**
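Note on the Phase 2 config items in `.DONE.md` above: the checklist records a three-level precedence (explicit options override ENV vars, which override defaults). A minimal sketch of that rule follows, assuming a plain hash merge. `ConfigSketch`, `resolve`, and the `env:` keyword are hypothetical names for illustration and are not the SDK's `Config.from_env`. The env var names and default URLs are the ones listed in `.PLAN.md`.

```ruby
# Hypothetical sketch of the precedence rule; not the SDK's Config class.
module ConfigSketch
  # Defaults as documented in .PLAN.md.
  DEFAULTS = {
    app_url: "https://www.braintrust.dev",
    api_url: "https://api.braintrust.dev"
  }.freeze

  # Option key => environment variable name.
  ENV_KEYS = {
    api_key: "BRAINTRUST_API_KEY",
    app_url: "BRAINTRUST_APP_URL",
    api_url: "BRAINTRUST_API_URL"
  }.freeze

  # Later merges win: defaults < ENV vars < explicit options.
  # `env` is injectable so the rule can be exercised without touching ENV.
  def self.resolve(env: ENV, **options)
    from_env = ENV_KEYS.each_with_object({}) do |(key, name), acc|
      value = env[name]
      # Unset or empty variables fall through to the default.
      acc[key] = value unless value.nil? || value.empty?
    end
    # nil options are dropped so they do not mask ENV values or defaults.
    DEFAULTS.merge(from_env).merge(options.compact)
  end
end

env = {
  "BRAINTRUST_API_KEY" => "from-env",
  "BRAINTRUST_API_URL" => "https://api.example.test"
}
resolved = ConfigSketch.resolve(env: env, api_key: "from-option")
# resolved[:api_key] comes from the option, resolved[:api_url] from env,
# and resolved[:app_url] from DEFAULTS.
```

Injecting `env` rather than reading `ENV` directly is one way to keep such tests free of global mutation; whether the SDK does this is not stated in the diff.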
diff --git a/.PLAN.md b/.PLAN.md
@@ -63,12 +63,15 @@ Braintrust.with_state(state)      # Temporarily override state
 
 **lib/braintrust/state.rb**
 
-Immutable state container.
+State container with login support.
 
 - Thread-safe global state management
 - Merges ENV vars with explicit options
-- Validates required fields
-- Holds tracer_provider instance
+- Validates required fields (api_key required)
+- Mutable to allow login() to update org info
+- login() method fetches org details from Braintrust API
+- Holds org_id, org_name, api_url, proxy_url after login
+- Will hold tracer_provider instance (Phase 3)
 
 ### Braintrust::Config
 
@@ -83,6 +86,8 @@ ENV vars:
 - `BRAINTRUST_DEFAULT_PROJECT_NAME` - Default project name
 - `BRAINTRUST_APP_URL` - App URL (default: https://www.braintrust.dev)
 - `BRAINTRUST_API_URL` - API URL (default: https://api.braintrust.dev)
+- `BRAINTRUST_DEBUG` - Enable debug logging
+- `BRAINTRUST_ENABLE_TRACE_CONSOLE_LOG` - Enable console trace logging (Phase 3)
 
 ### Braintrust::Trace
 
@@ -260,29 +265,88 @@ Utilities for testing:
 ## Dependencies
 
 ### Runtime
-- `opentelemetry-sdk` (~> 1.5) - OpenTelemetry SDK
-- `opentelemetry-exporter-otlp` (~> 0.29) - OTLP exporter
-- `ruby-openai` (~> 7.0) - OpenAI client
-- `faraday` (~> 2.0) - HTTP client (used by ruby-openai)
+**Note**: Runtime dependencies are added incrementally as features are implemented:
+- Phase 3: `opentelemetry-sdk`, `opentelemetry-exporter-otlp`
+- Phase 4: `ruby-openai`, `faraday`
+- Phase 5: HTTP client for Braintrust API
 
 ### Development
 - `minitest` (~> 5.0) - Testing framework
-- `standard` (~> 1.0) - Linting
-- `simplecov` - Code coverage
-- `rake` - Task automation
+- `rake` (~> 13.0) - Task automation
+- `standard` (~> 1.0) - Linting (zero-config)
+- `simplecov` (~> 0.22) - Code coverage
 
 ### Tools (via mise)
-- Ruby 3.2, 3.3, 3.4
+- Ruby 3.2 (pinned for development)
+- Rust 1.83 (for Ruby compilation)
 - watchexec - File watching for tests
 
 ## Key Differences from Go SDK
 
-1. **State Management**: Hybrid global/explicit vs pure global
+1. **State Management**: Hybrid global/explicit vs pure global (avoids Go SDK's global state issues)
 2. **API Style**: Ruby blocks/procs vs Go functions
 3. **Middleware**: Faraday vs HTTP middleware
 4. **Parallelism**: Threads vs goroutines
 5. **Testing**: Minitest vs testify
-6. **Linting**: Standard vs golangci-lint
+6. **Linting**: Standard (zero-config) vs golangci-lint
+7. **Dependencies**: Added incrementally as needed vs upfront
+
+## Implementation Notes
+
+### Session 1 (2025-10-21)
+
+**Completed**:
+- Full project infrastructure (gemspec, Rakefile, CI/CD)
+- mise.toml with automatic bundle install and precommit hooks
+- Cross-platform dependency installer (scripts/install-deps.sh)
+- Minimal docs (README.md, CONTRIBUTING.md)
+- Moved tracking docs to hidden files (.PLAN.md, .TODO.md)
+- Added `rake ci` task for CI verification
+- Removed build/release tasks (will add when ready to publish)
+- Created main branch
+- Config class with ENV parsing and option merging
+- State class with thread-safe global state management
+- Braintrust.init with set_global option
+
+**Decisions**:
+- Runtime deps added only when needed (not all upfront)
+- Standard linter (zero-config, opinionated)
+- Minitest (Ruby built-in, plain asserts)
+- Simplified docs (essentials only)
+- No system gem installation tasks
+- mise handles Ruby + Rust, brew handles C libraries
+- Hybrid state management (global + explicit state)
+- Mutable state (removed freeze to allow login to update fields)
+
+### Session 2 (2025-10-21)
+
+**Completed**:
+- Login API integration (lib/braintrust/api/auth.rb)
+  - AuthResult struct with org_id, org_name, api_url, proxy_url
+  - Proper HTTP error handling (401/403/400/4xx/5xx)
+  - API key masking for logging
+- Logger module (lib/braintrust/logger.rb)
+  - DEBUG level when BRAINTRUST_DEBUG=true env var set
+  - Outputs to stderr
+- State#login method
+  - Calls API::Auth.login
+  - Updates state with org info from API
+  - Added org_id, proxy_url, logged_in attributes
+- Updated Braintrust.init
+  - Added blocking_login parameter
+  - Documented all options explicitly (not **options)
+- Login example (examples/login/login_basic.rb)
+  - Demonstrates blocking_login usage
+  - Real API integration tests (no mocks)
+
+**Decisions**:
+- Real API tests (not mocks); tests fail if BRAINTRUST_API_KEY is not set
+- State#login mutates the current state in place (it does not return a new state)
+- Removed state immutability (freeze) to allow login mutations
+- API logic separated into lib/braintrust/api/ module structure
+- Struct-based return values (AuthResult) instead of raw hashes
+- SSL verification workaround for macOS (VERIFY_NONE with TODO)
+- State#login_until_success deferred (background thread with retries)
 
 ## Future Enhancements