258 changes: 258 additions & 0 deletions .DONE.md
@@ -0,0 +1,258 @@
# Braintrust Ruby SDK - Completed Work

## Phase 0: Documentation ✅

- [x] Create .PLAN.md (moved to hidden)
- [x] Create .TODO.md (moved to hidden)

## Phase 1: Project Setup & Infrastructure ✅

- [x] Create braintrust.gemspec (no runtime deps yet)
- [x] Create Gemfile
- [x] Create Rakefile (test, lint, ci tasks only)
- [x] Create mise.toml with precommit hooks + bundle install
- [x] Create .env.example
- [x] Create .github/workflows/ci.yml (uses rake ci)
- [x] Set up Standard linter config (via Rakefile)
- [x] Set up SimpleCov config (via test_helper.rb)
- [x] Create minimal README.md
- [x] Create minimal CONTRIBUTING.md
- [x] Create .gitignore
- [x] Create CHANGELOG.md
- [x] Create lib/braintrust/version.rb
- [x] Create lib/braintrust.rb (skeleton)
- [x] Create test/test_helper.rb
- [x] Create scripts/install-deps.sh (cross-platform)
- [x] Create main branch
- [x] Add rake ci task

## Phase 2: Core State & Configuration (TDD) ✅ COMPLETE

### lib/braintrust/config.rb ✅
- [x] Write test: parse ENV vars
- [x] Implement Config.from_env
- [x] Write test: default values
- [x] Write test: merge options with ENV vars (options override)
- [x] Write test: ENV vars override defaults
- [x] All tests passing, linter clean
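
The precedence these tests describe (defaults, then ENV vars, then explicit options) can be sketched as follows. This is a minimal illustration, not the SDK's code: `ConfigSketch`, its keyword arguments, and the returned hash shape are assumptions. The ENV var names and default URLs are the ones listed in the plan.

```ruby
# Hypothetical sketch: module name, method signature, and return shape are
# illustrative assumptions, not the SDK's actual Config class.
module ConfigSketch
  DEFAULTS = {
    api_url: "https://api.braintrust.dev",
    app_url: "https://www.braintrust.dev"
  }.freeze

  # Maps each option key to the ENV var that may supply it.
  ENV_KEYS = {
    api_key: "BRAINTRUST_API_KEY",
    api_url: "BRAINTRUST_API_URL",
    app_url: "BRAINTRUST_APP_URL"
  }.freeze

  # Precedence, lowest to highest: defaults, ENV vars, explicit options.
  # `env` is injectable so tests do not have to mutate the real ENV.
  def self.from_env(options: {}, env: ENV)
    env_values = ENV_KEYS.each_with_object({}) do |(key, var), acc|
      value = env[var]
      acc[key] = value unless value.nil? || value.empty?
    end
    # nil options are dropped so they cannot erase an ENV or default value.
    DEFAULTS.merge(env_values).merge(options.compact)
  end
end
```

Passing `env:` explicitly keeps the sketch testable without touching process-wide state.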

### lib/braintrust/state.rb ✅
- [x] Write test: create state with required fields
- [x] Write test: validate required fields (api_key required)
- [x] Write test: state is immutable (frozen)
- [x] Write test: thread-safe global state access (Mutex)
- [x] Implement State class
- [x] Implement State.global getter/setter
- [x] Implement State validation
- [x] All tests passing, linter clean
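
A rough sketch of what Mutex-guarded global state with required-field validation might look like. `StateSketch` and its attributes are hypothetical stand-ins; the real State class may be structured differently.

```ruby
# Hypothetical sketch of validated state plus a Mutex-guarded global
# accessor; not the SDK's actual State implementation.
class StateSketch
  attr_reader :api_key, :api_url

  # Class-level instance variables, shared by the singleton methods below.
  @mutex = Mutex.new
  @global = nil

  class << self
    # Reads and writes of the global go through the same lock.
    def global
      @mutex.synchronize { @global }
    end

    def global=(state)
      @mutex.synchronize { @global = state }
    end
  end

  def initialize(api_key:, api_url: "https://api.braintrust.dev")
    # api_key is the one required field named in the checklist above.
    raise ArgumentError, "api_key is required" if api_key.nil? || api_key.empty?

    @api_key = api_key
    @api_url = api_url
  end
end
```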

### lib/braintrust.rb ✅
- [x] Write test: init sets global state by default
- [x] Write test: init with set_global: false returns state
- [x] Write test: init merges options with ENV vars
- [x] Implement Braintrust.init
- [x] Implement Braintrust.current_state
- [x] Add blocking_login parameter to Braintrust.init
- [x] Document all init options explicitly
- [x] All tests passing, linter clean

### lib/braintrust/api/auth.rb ✅
- [x] Write test: login with valid API key
- [x] Write test: login with invalid API key
- [x] Implement API::Auth.login
- [x] Implement AuthResult struct
- [x] Handle 401/403 as invalid API key
- [x] Handle 400/4xx/5xx with appropriate errors
- [x] Implement API::Auth.mask_api_key
- [x] All tests passing (real API tests), linter clean
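
The status handling and key masking listed above could look roughly like this. Both helpers are hypothetical: the symbol names, the number of visible characters, and the mask format are assumptions, since this document does not specify them.

```ruby
# Hypothetical sketch: the real error classes and mask format used by
# API::Auth are not shown in this document.

# Maps an HTTP status to a coarse outcome, treating 401/403 as a bad key.
def classify_login_status(status)
  case status
  when 200..299 then :ok
  when 401, 403 then :invalid_api_key
  when 400..499 then :client_error
  when 500..599 then :server_error
  else :unexpected
  end
end

# Keeps the first and last `visible` characters and stars out the middle.
def mask_api_key(api_key, visible: 4)
  return "" if api_key.nil? || api_key.empty?
  # Short keys are fully masked so that nothing meaningful leaks.
  return "*" * api_key.length if api_key.length <= visible * 2

  head = api_key[0, visible]
  tail = api_key[-visible, visible]
  stars = "*" * (api_key.length - visible * 2)
  "#{head}#{stars}#{tail}"
end
```

The `401, 403` branch is checked before the general `400..499` range, so those two statuses are never reported as a generic client error.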

### lib/braintrust/logger.rb ✅
- [x] Create logger with DEBUG level when BRAINTRUST_DEBUG=true
- [x] Implement debug, info, warn, error methods
- [x] Write to stderr
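
A minimal sketch of this logger setup using Ruby's standard `Logger`. `build_logger` is a hypothetical helper name, and the `INFO` fallback level is an assumption; the checklist only states the DEBUG case.

```ruby
require "logger"

# Hypothetical sketch: build_logger is an illustrative name, and INFO as
# the non-debug level is an assumption not confirmed by this document.
def build_logger(env = ENV, io = $stderr)
  logger = Logger.new(io)
  debug = env["BRAINTRUST_DEBUG"] == "true"
  logger.level = debug ? Logger::DEBUG : Logger::INFO
  logger
end
```

Taking `io` as a parameter lets a test capture output in a `StringIO` instead of writing to the real stderr.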

### lib/braintrust/state.rb (login) ✅
- [x] Add State#login method
- [x] Login calls API::Auth.login
- [x] Login updates state fields (org_id, org_name, api_url, proxy_url, logged_in)
- [x] Add new attr_readers: org_id, proxy_url, logged_in
- [x] Remove freeze (allow login to mutate state)
- [x] All tests passing, linter clean

### examples/login/ ✅
- [x] Create examples/login/login_basic.rb
- [x] Demonstrate blocking_login usage
- [x] Test example runs successfully

## Phase 3: Core Tracing (TDD) - ✅ COMPLETE (Trace.enable)

### Add OpenTelemetry dependencies to braintrust.gemspec ✅
- [x] Add opentelemetry-sdk runtime dependency
- [x] Add opentelemetry-exporter-otlp runtime dependency
- [x] Run bundle install

### lib/braintrust/trace.rb ✅
- [x] Write test: enable raises error if no state available
- [x] Write test: enable with explicit state
- [x] Write test: enable with global state
- [x] Write test: enable adds console exporter when BRAINTRUST_ENABLE_TRACE_CONSOLE_LOG=true
- [x] Implement Trace.enable(tracer_provider, state: nil)
- [x] Configure OTLP HTTP exporter with correct endpoint (api_url/otel/v1/traces)
- [x] Set Authorization header with API key
- [x] Register BatchSpanProcessor with tracer provider
- [x] Add SSL workaround (VERIFY_NONE with TODO)
- [x] All tests passing (4 tests, 8 assertions), linter clean

### examples/trace/trace_basic.rb ✅
- [x] Create example demonstrating Trace.enable
- [x] Show manual span creation with braintrust.parent attribute
- [x] Test example runs successfully

### lib/braintrust/trace/span_processor.rb ✅
- [x] Write test: adds braintrust.parent attribute
- [x] Write test: preserves existing parent attribute
- [x] Write test: adds braintrust.org attribute
- [x] Write test: adds braintrust.app_url attribute
- [x] Implement SpanProcessor class
- [x] Implement on_start hook (adds default_parent, org, app_url)
- [x] Implement on_finish hook
- [x] Wrap OTLP exporter in custom span processor
- [x] Update State/Config to use single default_parent field
- [x] Update BRAINTRUST_DEFAULT_PROJECT env var
- [x] Update example to remove manual parent setting
- [x] All tests passing (4 tests), linter clean
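
The attribute injection described above might be sketched like this. To stay runnable without the OpenTelemetry gem, `FakeSpan` stands in for an SDK span, and `SpanProcessorSketch` is a hypothetical class. The `on_start(span, parent_context)` shape is meant to mirror the OpenTelemetry Ruby span processor interface, but treat that as an assumption to verify against the SDK.

```ruby
# Hypothetical sketch. FakeSpan is a stand-in for an OpenTelemetry span so
# the example runs with the standard library only.
class FakeSpan
  attr_reader :attributes

  def initialize(attributes = {})
    @attributes = attributes
  end

  def set_attribute(key, value)
    @attributes[key] = value
  end
end

class SpanProcessorSketch
  PARENT_KEY = "braintrust.parent"

  def initialize(default_parent:, org_name:, app_url:)
    @default_parent = default_parent
    @org_name = org_name
    @app_url = app_url
  end

  # Adds the default parent only when the span does not already carry one,
  # then stamps the org and app URL on every span.
  def on_start(span, _parent_context = nil)
    span.set_attribute(PARENT_KEY, @default_parent) unless span.attributes.key?(PARENT_KEY)
    span.set_attribute("braintrust.org", @org_name)
    span.set_attribute("braintrust.app_url", @app_url)
  end
end
```

With a processor like this in place, callers no longer need to set `braintrust.parent` by hand, which matches the checklist item about removing the manual parent from the example.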

## Phase 4: OpenAI Integration (TDD) - ✅ COMPLETE (First Pass)

### lib/braintrust/trace/openai.rb ✅
- [x] Add openai gem as development dependency
- [x] Write basic test: wrapper creates span for chat.completions
- [x] Implement basic OpenAI.wrap method
- [x] Update wrapper to use braintrust.* attributes (match Go SDK)
- [x] Use `braintrust.input_json` for input messages (JSON-encoded once)
- [x] Use `braintrust.output_json` for output choices (JSON-encoded once)
- [x] Use `braintrust.metadata` for request/response metadata (JSON-encoded once)
- [x] Use `braintrust.metrics` for token usage (JSON-encoded once)
- [x] Simplified output using `.to_h` to capture all fields (tool_calls, annotations, etc.)
- [x] Update test to verify braintrust.input_json contains messages
- [x] Update test to verify braintrust.output_json contains choices
- [x] Update test to verify braintrust.metadata contains model, temperature, etc
- [x] Update test to verify braintrust.metrics contains prompt_tokens, completion_tokens, tokens
- [x] Update span name to "openai.chat.completions.create" (match Go)
- [x] Test with real OpenAI API and verify in Braintrust UI
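
As a rough illustration of the attribute layout above, the following builds the four `braintrust.*` attributes from plain hashes. `chat_completion_attributes` is a hypothetical helper, the string-keyed hashes are an assumed shape for the request and response, and mapping OpenAI's `total_tokens` to the `tokens` metric is an assumption.

```ruby
require "json"

# Hypothetical sketch: the helper name and the hash shapes are assumptions;
# the real wrapper works on OpenAI client objects and calls .to_h on them.
def chat_completion_attributes(params, response)
  usage = response.fetch("usage", {})
  # Everything in the request except the messages is treated as metadata.
  metadata = params.reject { |key, _value| key == "messages" }
  metrics = {
    "prompt_tokens" => usage["prompt_tokens"],
    "completion_tokens" => usage["completion_tokens"],
    # Assumption: the "tokens" metric corresponds to OpenAI's total_tokens.
    "tokens" => usage["total_tokens"]
  }

  # Each value is JSON-encoded exactly once, as a string attribute.
  {
    "braintrust.input_json" => JSON.generate(params.fetch("messages")),
    "braintrust.output_json" => JSON.generate(response.fetch("choices")),
    "braintrust.metadata" => JSON.generate(metadata),
    "braintrust.metrics" => JSON.generate(metrics)
  }
end
```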

### examples/openai.rb ✅
- [x] Create openai.rb example with tracing
- [x] Test example runs successfully
- [x] Verify traces appear correctly in Braintrust UI with input/output/metadata

### examples/internal/openai.rb ✅
- [x] Create comprehensive example showcasing all features
- [x] Vision (image understanding)
- [x] Tool/function calling
- [x] Reasoning models (o1-mini with reasoning tokens)
- [x] Advanced parameters (temperature, top_p, etc.)
- [x] All examples under single parent trace with permalink

## Phase 6: Evals Framework (TDD) - ✅ MOSTLY COMPLETE

### lib/braintrust/eval/case.rb ✅
- [x] Write test: Case with input/expected
- [x] Write test: Case with tags and metadata
- [x] Implement Case class

### lib/braintrust/eval/scorer.rb ✅
- [x] Write test: Scorer interface
- [x] Write test: Scorer helper with block
- [x] Write test: Scorer returns score
- [x] Implement Scorer module/class
- [x] Implement Eval.scorer helper
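
A block-based scorer helper could be sketched along these lines. `ScorerSketch`, `EvalSketch.scorer`, and the `(input, expected, output)` argument order are hypothetical; the real `Eval.scorer` signature is not shown in this document.

```ruby
# Hypothetical sketch: class names and the block's argument order are
# assumptions, not the SDK's actual Scorer API.
class ScorerSketch
  attr_reader :name

  def initialize(name, &block)
    raise ArgumentError, "a scoring block is required" unless block

    @name = name
    @block = block
  end

  # Runs the block for one case and coerces the result to a Float, so a
  # block that returns a non-numeric value fails loudly.
  def call(input, expected, output)
    Float(@block.call(input, expected, output))
  end
end

module EvalSketch
  # Convenience helper that wraps a block in a named scorer.
  def self.scorer(name, &block)
    ScorerSketch.new(name, &block)
  end
end

exact_match = EvalSketch.scorer("exact_match") do |_input, expected, output|
  (output == expected) ? 1.0 : 0.0
end
```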

### lib/braintrust/eval/cases.rb ✅
- [x] Write test: Cases enumerable
- [x] Write test: Cases from array
- [x] Implement Cases class

### lib/braintrust/eval/result.rb ✅
- [x] Write test: Result with success/failed status
- [x] Implement Result class

### lib/braintrust/internal/experiments.rb ✅
- [x] Implement get_or_create for experiment resolution
- [x] Implement project and experiment registration via API

### lib/braintrust/eval.rb ✅ (Error handling complete)
- [x] Write test: run with cases array
- [x] Write test: run resolves project
- [x] Write test: run resolves experiment
- [x] Write test: run executes task for each case
- [x] Write test: run executes scorers
- [x] Write test: run creates OTEL spans
- [x] Write test: run with explicit state
- [x] Write test: run with global state
- [x] Write test: run handles task errors
- [x] Write test: run handles scorer errors
- [x] Write test: task errors record exception events with stacktraces
- [x] Write test: scorer errors record exception events with stacktraces
- [x] Implement Eval.run
- [x] Implement project resolution
- [x] Implement experiment resolution
- [x] Implement task execution
- [x] Implement scorer execution
- [x] Implement span creation
- [x] Implement result generation
- [x] Implement error recording with span.record_exception()
- [x] Update record_span_error helper to use OpenTelemetry standard

### Error Handling ✅ COMPLETE
- [x] Task errors recorded on task span with full stacktrace
- [x] Scorer errors recorded on score span with custom "ScorerError" type
- [x] Eval span gets error status when child spans fail
- [x] Exception events include type, message, and stacktrace
- [x] Backend correctly extracts and populates error field
- [x] Tests verify stacktrace attribute exists
- [x] All 72 tests pass with 243 assertions
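
The error-handling behavior listed above can be approximated without OpenTelemetry by collecting errors on a result object instead of a span. Everything here is a hypothetical sketch: `CaseResult`, `run_eval_sketch`, and the choice to skip scoring after a task failure are assumptions. The real implementation records errors with `span.record_exception`.

```ruby
# Hypothetical sketch: errors are collected on a plain result object here,
# whereas the SDK records them as exception events on OTEL spans.
CaseResult = Struct.new(:input, :output, :scores, :errors, keyword_init: true) do
  def success?
    errors.empty?
  end
end

def run_eval_sketch(cases:, task:, scorers:)
  cases.map do |kase|
    result = CaseResult.new(input: kase[:input], output: nil, scores: {}, errors: [])

    begin
      result.output = task.call(kase[:input])
    rescue => e
      # Assumption: a failed task leaves nothing to score, so stop here.
      result.errors << {type: e.class.name, message: e.message, stacktrace: e.backtrace}
      next result
    end

    scorers.each do |name, scorer|
      begin
        result.scores[name] = scorer.call(kase[:input], kase[:expected], result.output)
      rescue => e
        # Scorer failures get the custom "ScorerError" type; other scorers still run.
        result.errors << {type: "ScorerError", message: e.message, stacktrace: e.backtrace}
      end
    end

    result
  end
end
```

Each error entry carries a type, a message, and a stacktrace, mirroring the three fields the exception events are described as including.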

## Session History

### Session 1 Completed
- Config class with ENV parsing, defaults, and option merging (4 tests)
- State class with validation and thread-safe global state (5 tests)
- Braintrust.init and Braintrust.current_state (3 tests)

### Session 2 Completed
- Login functionality (API::Auth.login with real API tests)
- Logger with BRAINTRUST_DEBUG support
- State#login method (updates org info from API)
- Updated Braintrust.init with blocking_login option
- Documented all init options
- examples/login/login_basic.rb
- Trace.enable method with OTLP exporter to Braintrust
- Console debug support with BRAINTRUST_ENABLE_TRACE_CONSOLE_LOG
- Custom Span Processor with automatic attribute injection
- Changed to default_parent field (from project_id/project_name)
- BRAINTRUST_DEFAULT_PROJECT env var (format: "project_name:foo")
- examples/trace/trace_basic.rb
- **Total: 21 test runs, 41 assertions, all passing, linter clean**

### Session 3 Completed
- OpenAI integration with braintrust.* attributes (input_json, output_json, metadata, metrics)
- Simplified output using `.to_h` to capture all fields including tool_calls
- Comprehensive test coverage (28 assertions)
- examples/openai.rb with Trace.permalink
- examples/internal/openai.rb showcasing vision, tools, reasoning, advanced params
- Verified traces in Braintrust UI via MCP
- SSL config improvements
- **Total: 28 test runs, 82 assertions, all passing, linter clean**

### Session 4 Completed (Error Handling)
- Fixed error recording to match Go SDK behavior
- Updated task error handling to use `span.record_exception(e)`
- Updated `record_span_error` helper to use OpenTelemetry standard
- Errors now include full stacktraces via exception events
- Added stacktrace assertions to tests
- Investigated backend error processing (api-ts/src/otel/collector.ts parseError function)
- Verified errors populate in Braintrust database via MCP queries
- Task errors: Full stacktrace on task span, error message on eval span
- Scorer errors: Full stacktrace on score span with custom "ScorerError" type
- **Total: 72 test runs, 243 assertions, all passing, linter clean**
90 changes: 77 additions & 13 deletions .PLAN.md
@@ -63,12 +63,15 @@ Braintrust.with_state(state) # Temporarily override state

**lib/braintrust/state.rb**

Immutable state container.
State container with login support.

- Thread-safe global state management
- Merges ENV vars with explicit options
- Validates required fields
- Holds tracer_provider instance
- Validates required fields (api_key required)
- Mutable to allow login() to update org info
- login() method fetches org details from Braintrust API
- Holds org_id, org_name, api_url, proxy_url after login
- Will hold tracer_provider instance (Phase 3)

### Braintrust::Config

@@ -83,6 +86,8 @@ ENV vars:
- `BRAINTRUST_DEFAULT_PROJECT_NAME` - Default project name
- `BRAINTRUST_APP_URL` - App URL (default: https://www.braintrust.dev)
- `BRAINTRUST_API_URL` - API URL (default: https://api.braintrust.dev)
- `BRAINTRUST_DEBUG` - Enable debug logging
- `BRAINTRUST_ENABLE_TRACE_CONSOLE_LOG` - Enable console trace logging (Phase 3)

### Braintrust::Trace

@@ -260,29 +265,88 @@ Utilities for testing:
## Dependencies

### Runtime
- `opentelemetry-sdk` (~> 1.5) - OpenTelemetry SDK
- `opentelemetry-exporter-otlp` (~> 0.29) - OTLP exporter
- `ruby-openai` (~> 7.0) - OpenAI client
- `faraday` (~> 2.0) - HTTP client (used by ruby-openai)
**Note**: Runtime dependencies are added incrementally as features are implemented:
- Phase 3: `opentelemetry-sdk`, `opentelemetry-exporter-otlp`
- Phase 4: `ruby-openai`, `faraday`
- Phase 5: HTTP client for Braintrust API

### Development
- `minitest` (~> 5.0) - Testing framework
- `standard` (~> 1.0) - Linting
- `simplecov` - Code coverage
- `rake` - Task automation
- `rake` (~> 13.0) - Task automation
- `standard` (~> 1.0) - Linting (zero-config)
- `simplecov` (~> 0.22) - Code coverage

### Tools (via mise)
- Ruby 3.2, 3.3, 3.4
- Ruby 3.2 (pinned for development)
- Rust 1.83 (for Ruby compilation)
- watchexec - File watching for tests

## Key Differences from Go SDK

1. **State Management**: Hybrid global/explicit vs pure global
1. **State Management**: Hybrid global/explicit vs pure global (avoids Go SDK's global state issues)
2. **API Style**: Ruby blocks/procs vs Go functions
3. **Middleware**: Faraday vs HTTP middleware
4. **Parallelism**: Threads vs goroutines
5. **Testing**: Minitest vs testify
6. **Linting**: Standard vs golangci-lint
6. **Linting**: Standard (zero-config) vs golangci-lint
7. **Dependencies**: Added incrementally as needed vs upfront

## Implementation Notes

### Session 1 (2025-10-21)

**Completed**:
- Full project infrastructure (gemspec, Rakefile, CI/CD)
- mise.toml with automatic bundle install and precommit hooks
- Cross-platform dependency installer (scripts/install-deps.sh)
- Minimal docs (README.md, CONTRIBUTING.md)
- Moved tracking docs to hidden files (.PLAN.md, .TODO.md)
- Added `rake ci` task for CI verification
- Removed build/release tasks (will add when ready to publish)
- Created main branch
- Config class with ENV parsing and option merging
- State class with thread-safe global state management
- Braintrust.init with set_global option

**Decisions**:
- Runtime deps added only when needed (not all upfront)
- Standard linter (zero-config, opinionated)
- Minitest (Ruby built-in, plain asserts)
- Simplified docs (essentials only)
- No system gem installation tasks
- mise handles Ruby + Rust, brew handles C libraries
- Hybrid state management (global + explicit state)
- Mutable state (removed freeze to allow login to update fields)

### Session 2 (2025-10-21)

**Completed**:
- Login API integration (lib/braintrust/api/auth.rb)
- AuthResult struct with org_id, org_name, api_url, proxy_url
- Proper HTTP error handling (401/403/400/4xx/5xx)
- API key masking for logging
- Logger module (lib/braintrust/logger.rb)
- DEBUG level when BRAINTRUST_DEBUG=true env var set
- Outputs to stderr
- State#login method
- Calls API::Auth.login
- Updates state with org info from API
- Added org_id, proxy_url, logged_in attributes
- Updated Braintrust.init
- Added blocking_login parameter
- Documented all options explicitly (not **options)
- Login example (examples/login/login_basic.rb)
- Demonstrates blocking_login usage
- Real API integration tests (no mocks)

**Decisions**:
- Real API tests (not mocks), tests fail if BRAINTRUST_API_KEY not set
- State#login updates the current state in place (it does not return a new state)
- Removed state immutability (freeze) to allow login mutations
- API logic separated into lib/braintrust/api/ module structure
- Struct-based return values (AuthResult) instead of raw hashes
- SSL verification workaround for macOS (VERIFY_NONE with TODO)
- State#login_until_success deferred (background thread with retries)

## Future Enhancements
