diff --git a/CHANGELOG.md b/CHANGELOG.md
index 9692515..ff4bc47 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -22,4 +22,20 @@ The format is based on Keep a Changelog and this project follows Semantic Versio
 
 ### Added
 
-- Initial public migration tooling baseline with CLI, resource migrators, and tests.
+- Initial public release of the Braintrust migration CLI with orchestrated resource migrators and test coverage.
+- Support for migrating experiments, datasets, logs, ACLs, and experiment tags between Braintrust environments.
+- Opt-in group member user mapping via email matching during ACL migration.
+- Pagination support for resource listing, `created_before` filtering, and release/tag-based publishing workflows.
+- Opt-in live smoke and E2E validation coverage for concurrency-sensitive migration paths.
+
+### Changed
+
+- Refactored migration internals to use the current API surface instead of the older Braintrust API SDK.
+- Improved migration resilience around temporary-directory creation, client/config plumbing, and concurrent resource orchestration.
+- Hardened CI and release automation with versioned release workflow guidance and pinned GitHub Actions.
+
+### Fixed
+
+- Dry-run project discovery is now read-only and no longer creates destination projects.
+- File output now consistently uses UTF-8 encoding.
+- Restored DAG scheduler and orchestrator compatibility helpers expected by the test suite.
diff --git a/README.md b/README.md
index 9bb0916..58934b9 100644
--- a/README.md
+++ b/README.md
@@ -1,12 +1,12 @@
 # Braintrust Migration Tool
 
-> **⚠️ WARNING: Large-scale migrations (especially logs/experiments) can be extremely expensive and operationally risky. This tool includes streaming + resumable migration for high-volume event streams, but TB-scale migrations have not been fully soak-tested in production-like conditions.
Use with caution and test on a subset first.**
+> **⚠️ WARNING: Large-scale migrations can still be expensive and operationally risky, but high-volume event resources are now streamed and resumable. Logs, experiment events, and dataset events use BTQL sorted pagination with checkpointed resume and SDK-backed `logs3` writes. TB-scale migrations have not been fully soak-tested in production-like conditions, so test on a subset first.**
 
 A Python CLI & library for migrating Braintrust organizations with maximum fidelity, using direct HTTP requests (via `httpx`) against the Braintrust REST API.
 
 ## Overview
 
-This tool provides migration capabilities for Braintrust organizations, handling everything from AI provider credentials to project-level data. **It is best suited for small-scale migrations, such as moving POC/test data to a new deployment.**
+This tool provides migration capabilities for Braintrust organizations, handling everything from AI provider credentials to project-level data. It works well for small and medium migrations, and it now has dedicated streaming paths for high-volume logs, experiment events, and dataset events.
 
 - **Organization administrators** migrating between environments (dev → staging → prod)
 - **Teams** consolidating multiple organizations
@@ -19,7 +19,7 @@ This tool provides migration capabilities for Braintrust organizations, handling
 - **Dependency Resolution**: Handles resource dependencies (e.g., functions referenced by prompts, datasets referenced by experiments)
 - **Organization vs Project Scope**: Org-level resources are migrated once, project-level resources per project
 - **Real-time Progress**: Live progress indicators and detailed migration reports
-- **High-volume Streaming**: Logs, experiment events, and dataset events are migrated via BTQL sorted pagination (by `_pagination_key`) with bounded insert batches
+- **High-volume Streaming**: Logs, experiment events, and dataset events are migrated via BTQL sorted pagination (by `_pagination_key`) with checkpointed resume and SDK-backed `logs3` writes
 - **Resume + Idempotency**: Per-resource/per-experiment checkpoints + a SQLite "seen ids" store enable safe resume and help avoid duplicate inserts/overwrites
 - **Rate Limit Resilience**: Automatic LIMIT backoff on 500/504 errors (retries with progressively smaller page sizes: 1000 → 500 → 250 → ...)
 
@@ -170,12 +170,12 @@ These settings control BTQL-based streaming for high-volume resources.
 | Environment Variable | CLI Flag | Default | Description |
 |---------------------|----------|---------|-------------|
 | `MIGRATION_EVENTS_FETCH_LIMIT` | — | `1000` | BTQL fetch page size (rows per query) |
-| `MIGRATION_EVENTS_INSERT_BATCH_SIZE` | — | `200` | Events per insert API call |
+| `MIGRATION_EVENTS_FETCH_GROUP_SIZE` | — | `25` | Number of experiment or dataset ids to group into one BTQL event stream |
 | `MIGRATION_EVENTS_USE_SEEN_DB` | — | `true` | Use SQLite store for deduplication |
 | `MIGRATION_LOGS_FETCH_LIMIT` | `--logs-fetch-limit` | *(inherits)* | Override fetch limit for logs only |
-| `MIGRATION_LOGS_INSERT_BATCH_SIZE` | `--logs-insert-batch-size` | *(inherits)* | Override insert batch size for logs only |`
+| `MIGRATION_LOGS_INSERT_BATCH_SIZE` | `--logs-insert-batch-size` | `200` | Max rows per pre-SDK logs insert chunk before enqueueing to the SDK writer |
 
-Resource-specific overrides follow the pattern `MIGRATION_{RESOURCE}_FETCH_LIMIT`, `MIGRATION_{RESOURCE}_INSERT_BATCH_SIZE`, `MIGRATION_{RESOURCE}_USE_SEEN_DB` where `{RESOURCE}` is `LOGS`, `EXPERIMENT_EVENTS`, or `DATASET_EVENTS`.
+Resource-specific overrides follow the pattern `MIGRATION_{RESOURCE}_FETCH_LIMIT` and `MIGRATION_{RESOURCE}_USE_SEEN_DB` where `{RESOURCE}` is `LOGS`, `EXPERIMENT_EVENTS`, or `DATASET_EVENTS`. Logs additionally support `MIGRATION_LOGS_INSERT_BATCH_SIZE` and `MIGRATION_LOGS_USE_VERSION_SNAPSHOT`.
 
 #### Insert Request Sizing
 
diff --git a/RELEASING.md b/RELEASING.md
index fa248d1..5e92a81 100644
--- a/RELEASING.md
+++ b/RELEASING.md
@@ -47,6 +47,8 @@ git push origin vX.Y.Z
 - publish to PyPI
 - create a GitHub Release
 
+Create the tag only after the release commit is on `main`, so the tag points at an immutable revision whose `pyproject.toml` version and `CHANGELOG.md` entry already match `X.Y.Z`.
+
 ## Hotfixes
 
 1.
Branch from latest release tag: `hotfix/X.Y.Z+1`
diff --git a/braintrust_migrate/batching.py b/braintrust_migrate/batching.py
index 9d44eef..0797b4e 100644
--- a/braintrust_migrate/batching.py
+++ b/braintrust_migrate/batching.py
@@ -1,4 +1,4 @@
-"""Helpers for batching inserts by both count and approximate payload size."""
+"""Helpers for batching inserts by both count and payload size."""
 
 from __future__ import annotations
 
@@ -7,11 +7,10 @@
 from typing import Any
 
 
-# We deliberately use the *non-compact* json encoding here to slightly
-# over-estimate payload sizes vs compact separators. This works well with a
-# headroom ratio to avoid flirting with gateway limits.
+# Measure the actual UTF-8 payload bytes that will be sent over the wire.
+# This keeps batching aligned with gateway limits even for non-ASCII content.
 def approx_json_bytes(obj: Any) -> int:
-    return len(json.dumps(obj, ensure_ascii=False))
+    return len(json.dumps(obj, ensure_ascii=False).encode("utf-8"))
 
 
 _EMPTY_EVENTS_WRAPPER_BYTES = approx_json_bytes({"events": []})
@@ -25,7 +24,6 @@ def approx_events_insert_payload_bytes(
     """Approximate the JSON payload bytes for `{"events": events}`.
 
     This uses a fast additive estimate: wrapper overhead + sum(event bytes) + commas.
-    It intentionally overestimates in common cases.
     """
     if not events:
         return _EMPTY_EVENTS_WRAPPER_BYTES
diff --git a/braintrust_migrate/btql.py b/braintrust_migrate/btql.py
index ebaa006..edbdff7 100644
--- a/braintrust_migrate/btql.py
+++ b/braintrust_migrate/btql.py
@@ -11,6 +11,7 @@
 from __future__ import annotations
 
 from collections.abc import Callable
+from importlib.metadata import PackageNotFoundError, version
 from typing import Any, cast
 
 import httpx
@@ -19,6 +20,13 @@
 from braintrust_migrate.client import BraintrustClient
 
 
+def _client_version() -> str:
+    try:
+        return version("braintrust-migrate")
+    except PackageNotFoundError:
+        return "dev"
+
+
 def btql_quote(s: str) -> str:
     """Escape a string for inclusion in a single-quoted BTQL/SQL literal."""
     return s.replace("\\", "\\\\").replace("'", "\\'")
@@ -98,9 +106,12 @@ async def _do_btql(*, op: str, query_text: str) -> Any:
                     # All modern deployments are Brainstore-backed; do not attempt
                     # a Postgres fallback.
                     "use_brainstore": True,
+                    "client_version": _client_version(),
+                    "query_timeout_seconds": int(timeout_seconds),
                 },
                 timeout=timeout_seconds,
             ),
+            non_retryable_statuses={500, 504},
         )
 
     async def _fetch_one_limit(n: int) -> Any:
diff --git a/braintrust_migrate/cli.py b/braintrust_migrate/cli.py
index bacff90..aeb054b 100644
--- a/braintrust_migrate/cli.py
+++ b/braintrust_migrate/cli.py
@@ -817,6 +817,8 @@ def hook(update: dict[str, Any], *, _label: str = label) -> None:
                 inserted_last = update.get("inserted_last")
                 inserted_bytes_last = update.get("inserted_bytes_last")
                 insert_seconds = update.get("insert_seconds")
+                pending_buffered_rows = update.get("pending_buffered_rows")
+                pending_buffered_bytes = update.get("pending_buffered_bytes")
 
                 gb_part = ""
                 if isinstance(inserted_bytes, int):
@@ -844,25 +846,34 @@ def hook(update: dict[str, Any], *, _label: str = label) -> None:
 
                 # Per-resource context
                 page_part = f" page={page_num}" if page_num is not None else ""
+                pending_part = ""
+                if (
+                    isinstance(pending_buffered_rows, int)
+                    and pending_buffered_rows > 0
+                ):
+                    pending_part = f" buffered={pending_buffered_rows}"
+                    if isinstance(pending_buffered_bytes, int):
+                        pending_gb = pending_buffered_bytes / 1_000_000_000
+                        pending_part += f" pending_gb={pending_gb:.3f}"
 
                 if update.get("resource") == "experiment_events":
                     desc = (
                         f"{_label} ({_project_name}):{page_part}"
                         f" fetched={fetched} inserted={inserted}"
-                        f"{gb_part}{batch_rate_part}"
+                        f"{gb_part}{pending_part}{batch_rate_part}"
                     )
                 elif update.get("resource") == "dataset_events":
                     desc = (
                         f"{_label} ({_project_name}):{page_part}"
                         f" fetched={fetched} inserted={inserted}"
-                        f"{gb_part}{batch_rate_part}"
+                        f"{gb_part}{pending_part}{batch_rate_part}"
                     )
                 else:
                     # logs
                     desc = (
                         f"{_label} ({_project_name}):{page_part}"
                         f" fetched={fetched} inserted={inserted}"
-                        f"{gb_part}{batch_rate_part}"
+                        f"{gb_part}{pending_part}{batch_rate_part}"
                     )
 
                 if phase == "done":
diff --git a/braintrust_migrate/client.py b/braintrust_migrate/client.py
index ae97479..fd55aff 100644
--- a/braintrust_migrate/client.py
+++ b/braintrust_migrate/client.py
@@ -406,7 +406,13 @@ async def check_brainstore_enabled(self) -> bool:
             self._logger.warning("Could not determine Brainstore status", error=str(e))
             return False
 
-    async def with_retry(self, operation_name: str, coro_func):
+    async def with_retry(
+        self,
+        operation_name: str,
+        coro_func,
+        *,
+        non_retryable_statuses: set[int] | None = None,
+    ):
         """Execute a coroutine function with adaptive retry logic.
 
         Args:
@@ -457,6 +463,11 @@ def _classify_exception(
             if isinstance(exc, httpx.HTTPStatusError) and exc.response is not None:
                 status = int(exc.response.status_code)
                 retry_after = _parse_retry_after_seconds(exc.response)
+                if (
+                    non_retryable_statuses is not None
+                    and status in non_retryable_statuses
+                ):
+                    return False, status, retry_after
                 if status == HTTP_STATUS_TOO_MANY_REQUESTS:
                     return True, status, retry_after
                 if status in {408, 409, 425, 500, 502, 503, 504}:
diff --git a/braintrust_migrate/config.py b/braintrust_migrate/config.py
index e5cfafc..42570ff 100644
--- a/braintrust_migrate/config.py
+++ b/braintrust_migrate/config.py
@@ -186,20 +186,19 @@ class MigrationConfig(BaseModel):
         le=10_000,
         description="Fetch page size for streaming experiment events via BTQL (limit is in rows/spans)",
     )
-    experiment_events_insert_batch_size: int = Field(
-        default=200,
-        ge=1,
-        le=10_000,
-        description="Insert batch size for streaming experiment events (number of events per insert call)",
-    )
-    experiment_events_use_version_snapshot: bool = Field(
-        default=True,
-        description="Pin a stable snapshot version for streaming experiment event migration",
-    )
     experiment_events_use_seen_db: bool = Field(
         default=True,
         description="Use a SQLite seen-id store to prevent older versions overwriting newer ones during experiment pagination",
     )
+    events_fetch_group_size: int = Field(
+        default=25,
+        ge=1,
+        le=1_000,
+        description=(
+            "Number of dataset or experiment ids to group into a single BTQL event stream. "
+            "Used for grouped dataset/experiment event fetches."
+        ),
+    )
 
     # Dataset event migration tuning
     dataset_events_fetch_limit: int = Field(
@@ -208,16 +207,6 @@ class MigrationConfig(BaseModel):
         le=10_000,
         description="Fetch page size for streaming dataset events via BTQL (limit is in rows/spans)",
     )
-    dataset_events_insert_batch_size: int = Field(
-        default=200,
-        ge=1,
-        le=10_000,
-        description="Insert batch size for streaming dataset events (number of events per insert call)",
-    )
-    dataset_events_use_version_snapshot: bool = Field(
-        default=True,
-        description="Pin a stable snapshot version for streaming dataset event migration",
-    )
     dataset_events_use_seen_db: bool = Field(
         default=True,
         description="Use a SQLite seen-id store to prevent older versions overwriting newer ones during dataset pagination",
@@ -379,8 +368,8 @@ def from_env(cls) -> "Config":
         #
         # Unified:
         # MIGRATION_EVENTS_FETCH_LIMIT=1000
-        # MIGRATION_EVENTS_INSERT_BATCH_SIZE=200
         # MIGRATION_EVENTS_USE_SEEN_DB=true
+        # MIGRATION_EVENTS_FETCH_GROUP_SIZE=25
         #
         # Resource-specific overrides (optional):
         # MIGRATION_LOGS_FETCH_LIMIT, MIGRATION_EXPERIMENT_EVENTS_FETCH_LIMIT, etc.
@@ -392,20 +381,20 @@ def _get_bool(specific_key: str, unified_key: str, default: str) -> bool:
             val = os.getenv(specific_key) or os.getenv(unified_key, default)
             return val.lower() in {"1", "true", "yes", "y", "on"}
 
+        events_fetch_group_size = int(
+            os.getenv("MIGRATION_EVENTS_FETCH_GROUP_SIZE", "25")
+        )
+
         # Logs
         logs_fetch_limit = _get_int(
             "MIGRATION_LOGS_FETCH_LIMIT", "MIGRATION_EVENTS_FETCH_LIMIT", "1000"
         )
-        logs_insert_batch_size = _get_int(
-            "MIGRATION_LOGS_INSERT_BATCH_SIZE",
-            "MIGRATION_EVENTS_INSERT_BATCH_SIZE",
-            "200",
-        )
-        logs_use_version_snapshot = _get_bool(
-            "MIGRATION_LOGS_USE_VERSION_SNAPSHOT",
-            "MIGRATION_EVENTS_USE_VERSION_SNAPSHOT",
-            "true",
+        logs_insert_batch_size = int(
+            os.getenv("MIGRATION_LOGS_INSERT_BATCH_SIZE", "200")
         )
+        logs_use_version_snapshot = os.getenv(
+            "MIGRATION_LOGS_USE_VERSION_SNAPSHOT", "true"
+        ).lower() in {"1", "true", "yes", "y", "on"}
         logs_use_seen_db = _get_bool(
             "MIGRATION_LOGS_USE_SEEN_DB", "MIGRATION_EVENTS_USE_SEEN_DB", "true"
         )
@@ -420,16 +409,6 @@ def _get_bool(specific_key: str, unified_key: str, default: str) -> bool:
             "MIGRATION_EVENTS_FETCH_LIMIT",
             "1000",
         )
-        experiment_events_insert_batch_size = _get_int(
-            "MIGRATION_EXPERIMENT_EVENTS_INSERT_BATCH_SIZE",
-            "MIGRATION_EVENTS_INSERT_BATCH_SIZE",
-            "200",
-        )
-        experiment_events_use_version_snapshot = _get_bool(
-            "MIGRATION_EXPERIMENT_EVENTS_USE_VERSION_SNAPSHOT",
-            "MIGRATION_EVENTS_USE_VERSION_SNAPSHOT",
-            "true",
-        )
         experiment_events_use_seen_db = _get_bool(
             "MIGRATION_EXPERIMENT_EVENTS_USE_SEEN_DB",
             "MIGRATION_EVENTS_USE_SEEN_DB",
@@ -442,16 +421,6 @@ def _get_bool(specific_key: str, unified_key: str, default: str) -> bool:
             "MIGRATION_EVENTS_FETCH_LIMIT",
             "1000",
         )
-        dataset_events_insert_batch_size = _get_int(
-            "MIGRATION_DATASET_EVENTS_INSERT_BATCH_SIZE",
-            "MIGRATION_EVENTS_INSERT_BATCH_SIZE",
-            "200",
-        )
-        dataset_events_use_version_snapshot = _get_bool(
-            "MIGRATION_DATASET_EVENTS_USE_VERSION_SNAPSHOT",
-
            "MIGRATION_EVENTS_USE_VERSION_SNAPSHOT",
-            "true",
-        )
         dataset_events_use_seen_db = _get_bool(
             "MIGRATION_DATASET_EVENTS_USE_SEEN_DB",
             "MIGRATION_EVENTS_USE_SEEN_DB",
@@ -536,12 +505,9 @@ def _get_bool(specific_key: str, unified_key: str, default: str) -> bool:
             created_after=created_after,
             created_before=created_before,
             experiment_events_fetch_limit=experiment_events_fetch_limit,
-            experiment_events_insert_batch_size=experiment_events_insert_batch_size,
-            experiment_events_use_version_snapshot=experiment_events_use_version_snapshot,
             experiment_events_use_seen_db=experiment_events_use_seen_db,
+            events_fetch_group_size=events_fetch_group_size,
             dataset_events_fetch_limit=dataset_events_fetch_limit,
-            dataset_events_insert_batch_size=dataset_events_insert_batch_size,
-            dataset_events_use_version_snapshot=dataset_events_use_version_snapshot,
             dataset_events_use_seen_db=dataset_events_use_seen_db,
             copy_attachments=copy_attachments,
             attachment_max_bytes=attachment_max_bytes,
diff --git a/braintrust_migrate/insert_bisect.py b/braintrust_migrate/insert_bisect.py
deleted file mode 100644
index 3299074..0000000
--- a/braintrust_migrate/insert_bisect.py
+++ /dev/null
@@ -1,54 +0,0 @@
-"""Insert resilience helpers.
-
-Streaming migrators (logs / experiment events / dataset events) need to handle
-HTTP 413 payload limits robustly. We do this by bisecting batches (preserving
-order) until the payload is accepted, and isolating single events that are too
-large to ever insert.
-"""
-
-from __future__ import annotations
-
-import math
-from collections.abc import Awaitable, Callable
-from typing import TypeVar
-
-T = TypeVar("T")
-R = TypeVar("R")
-
-
-async def insert_with_413_bisect(
-    items: list[T],
-    *,
-    insert_fn: Callable[[list[T]], Awaitable[R]],
-    is_http_413: Callable[[Exception], bool],
-    on_success: Callable[[list[T], R], Awaitable[None]] | None = None,
-    on_single_413: Callable[[T, Exception], Awaitable[None]] | None = None,
-) -> None:
-    """Insert items, bisecting on 413 to isolate oversized payloads.
-
-    - Preserves the original item order by always processing the left half first.
-    - If a singleton batch triggers 413, invokes `on_single_413` (if provided) and re-raises.
-    """
-    stack: list[list[T]] = [items]
-    while stack:
-        batch = stack.pop()
-        if not batch:
-            continue
-        try:
-            res = await insert_fn(batch)
-            if on_success is not None:
-                await on_success(batch, res)
-        except Exception as e:
-            if is_http_413(e):
-                if len(batch) == 1:
-                    if on_single_413 is not None:
-                        await on_single_413(batch[0], e)
-                    raise
-                mid = math.ceil(len(batch) / 2)
-                left = batch[:mid]
-                right = batch[mid:]
-                # Process left first for determinism.
-                stack.append(right)
-                stack.append(left)
-                continue
-            raise
diff --git a/braintrust_migrate/orchestration.py b/braintrust_migrate/orchestration.py
index e8a31e9..aae97e7 100644
--- a/braintrust_migrate/orchestration.py
+++ b/braintrust_migrate/orchestration.py
@@ -658,8 +658,6 @@ async def _migrate_project(
                         project_checkpoint_dir,
                         self.config.migration.batch_size,
                         events_fetch_limit=self.config.migration.dataset_events_fetch_limit,
-                        events_insert_batch_size=self.config.migration.dataset_events_insert_batch_size,
-                        events_use_version_snapshot=self.config.migration.dataset_events_use_version_snapshot,
                         events_use_seen_db=self.config.migration.dataset_events_use_seen_db,
                         events_progress_hook=progress_hook,
                     )
@@ -671,8 +669,6 @@ async def _migrate_project(
                         project_checkpoint_dir,
                         self.config.migration.batch_size,
                         events_fetch_limit=self.config.migration.experiment_events_fetch_limit,
-                        events_insert_batch_size=self.config.migration.experiment_events_insert_batch_size,
-                        events_use_version_snapshot=self.config.migration.experiment_events_use_version_snapshot,
                         events_use_seen_db=self.config.migration.experiment_events_use_seen_db,
                         events_progress_hook=progress_hook,
                     )
@@ -736,6 +732,7 @@ async def _migrate_project(
                     migrated=resource_results["migrated"],
                     skipped=resource_results["skipped"],
                     failed=resource_results["failed"],
+                    skip_summary=resource_results.get("skip_summary") or None,
                 )
 
                 # Notify callback for real-time console feedback
@@ -1250,6 +1247,7 @@ async def _migrate_organization_resources(
                     migrated=resource_results["migrated"],
                     skipped=resource_results["skipped"],
                     failed=resource_results["failed"],
+                    skip_summary=resource_results.get("skip_summary") or None,
                 )
 
             except Exception as e:
diff --git a/braintrust_migrate/resources/base.py b/braintrust_migrate/resources/base.py
index bbf731b..6ca89e5 100644
--- a/braintrust_migrate/resources/base.py
+++ b/braintrust_migrate/resources/base.py
@@ -1076,6 +1076,18 @@ async def migrate_all(
                 # Save state after each batch
                 self._save_state()
 
+        skip_reasons: dict[str, int] = {}
+        for detail in skipped_details:
+            reason = detail["skip_reason"]
+            skip_reasons[reason] = skip_reasons.get(reason, 0) + 1
+
+        skip_summary_text = ", ".join(
+            [
+                f"{count} {reason.replace('_', ' ')}"
+                for reason, count in skip_reasons.items()
+            ]
+        )
+
         # Log detailed summary
         self._logger.info(
             f"Completed migration of {self.resource_name}",
@@ -1083,25 +1095,13 @@ async def migrate_all(
             migrated=migrated_count,
             skipped=skipped_count,
             failed=failed_count,
+            skip_summary=skip_summary_text or None,
         )
 
         # Log brief skip summary if any - make this very visible
         if skipped_details:
-            skip_reasons = {}
-            for detail in skipped_details:
-                reason = detail["skip_reason"]
-                skip_reasons[reason] = skip_reasons.get(reason, 0) + 1
-
-            # Create a brief, readable summary
-            skip_summary = ", ".join(
-                [
-                    f"{count} {reason.replace('_', ' ')}"
-                    for reason, count in skip_reasons.items()
-                ]
-            )
-
             self._logger.info(
-                f"📋 {self.resource_name} skipped: {skip_summary}",
+                f"📋 {self.resource_name} skipped: {skip_summary_text}",
                 breakdown=skip_reasons,
             )
 
@@ -1113,5 +1113,7 @@ async def migrate_all(
             "failed": failed_count,
             "errors": errors,
             "skipped_details": skipped_details,
+            "skip_breakdown": skip_reasons,
+            "skip_summary": skip_summary_text,
             "migrated_details": migrated_details,
         }
diff --git a/braintrust_migrate/resources/datasets.py b/braintrust_migrate/resources/datasets.py
index 682ac45..916c726 100644
--- a/braintrust_migrate/resources/datasets.py
+++ b/braintrust_migrate/resources/datasets.py
@@ -3,6 +3,7 @@
 from __future__ import annotations
 
 import json as _json
+import hashlib
 from collections.abc import Callable
 from pathlib import Path
 from typing import Any, ClassVar
@@ -16,11 +17,12 @@
     fetch_btql_sorted_page_with_retries,
 )
 from braintrust_migrate.resources.base import MigrationResult, ResourceMigrator
+from braintrust_migrate.sdk_logs import SDKDatasetWriter
 from braintrust_migrate.streaming_utils import (
     EventsStreamState,
     SeenIdsDB,
     build_btql_sorted_page_query,
-    stream_btql_sorted_events,
+    stream_btql_sorted_events_buffered,
 )
 
 logger = structlog.get_logger(__name__)
@@ -29,6 +31,20 @@
 HTTP_STATUS_REQUEST_ENTITY_TOO_LARGE = 413
 
 
+def _coerce_int_config(
+    cfg: Any, attr_name: str, default: int, *, minimum: int | None = None
+) -> int:
+    value = getattr(cfg, attr_name, default)
+    if not isinstance(value, int):
+        try:
+            value = int(value)
+        except Exception:
+            value = default
+    if minimum is not None and value < minimum:
+        return default
+    return value
+
+
 class DatasetMigrator(ResourceMigrator[dict]):
     """Migrator for Braintrust datasets.
 
@@ -41,6 +57,10 @@ class DatasetMigrator(ResourceMigrator[dict]):
     Uses raw API requests instead of SDK to avoid model dependencies.
     """
 
+    SDK_FLUSH_MAX_ROWS: ClassVar[int] = 5_000
+    SDK_FLUSH_MAX_BYTES: ClassVar[int] = 25 * 1024 * 1024
+    DEFAULT_EVENT_FETCH_GROUP_SIZE: ClassVar[int] = 25
+
     def __init__(
         self,
         source_client,
@@ -48,9 +68,7 @@ def __init__(
         checkpoint_dir: Path,
         batch_size: int = 100,
         *,
-        events_fetch_limit: int = 50,
-        events_insert_batch_size: int = 200,
-        events_use_version_snapshot: bool = True,
+        events_fetch_limit: int = 1000,
         events_use_seen_db: bool = True,
         events_progress_hook: Callable[[dict[str, Any]], None] | None = None,
     ) -> None:
@@ -58,8 +76,6 @@ def __init__(
             source_client, dest_client, checkpoint_dir, batch_size=batch_size
         )
         self.events_fetch_limit = events_fetch_limit
-        self.events_insert_batch_size = events_insert_batch_size
-        self.events_use_version_snapshot = events_use_version_snapshot
         self.events_use_seen_db = events_use_seen_db
         self._events_progress_hook = events_progress_hook
         self._attachment_copier: AttachmentCopier | None = None
@@ -86,6 +102,14 @@ def __init__(
             self._insert_max_bytes: int | None = int(max_req * headroom)
         except Exception:
             self._insert_max_bytes = None
+        self._sdk_flush_max_rows = int(self.SDK_FLUSH_MAX_ROWS)
+        self._sdk_flush_max_bytes = int(self.SDK_FLUSH_MAX_BYTES)
+
        self._event_fetch_group_size = _coerce_int_config(
+            cfg,
+            "events_fetch_group_size",
+            self.DEFAULT_EVENT_FETCH_GROUP_SIZE,
+            minimum=1,
+        )
 
     @property
     def resource_name(self) -> str:
@@ -310,30 +334,36 @@ async def _migrate_records_for_datasets(
             dataset_count=len(successful_migrations),
         )
 
-        for result in successful_migrations:
-            try:
+        async def _flush_group(group: list[MigrationResult]) -> None:
+            if not group:
+                return
+            source_to_dest: dict[str, str] = {}
+            for result in group:
                 if result.dest_id is None:
-                    raise ValueError(
-                        "Dataset migrated without dest_id; cannot copy records"
-                    )
-                await self._migrate_dataset_records(result.source_id, result.dest_id)
+                    raise ValueError("Dataset migrated without dest_id; cannot copy records")
+                source_to_dest[result.source_id] = result.dest_id
 
-                # Update metadata to indicate records are migrated
+            await self._migrate_dataset_records_streaming_grouped(source_to_dest)
+
+            for result in group:
                 if result.metadata:
                     result.metadata["records_pending"] = False
                     result.metadata["records_migrated"] = True
 
+        for start in range(0, len(successful_migrations), self._event_fetch_group_size):
+            group = successful_migrations[start : start + self._event_fetch_group_size]
+            try:
+                await _flush_group(group)
             except Exception as e:
                 self._logger.error(
-                    "Failed to migrate records for dataset",
-                    source_id=result.source_id,
-                    dest_id=result.dest_id,
+                    "Failed to migrate records for dataset group",
+                    source_ids=[result.source_id for result in group],
                     error=str(e),
                 )
-                # Update metadata to indicate record migration failed
-                if result.metadata:
-                    result.metadata["records_pending"] = False
-                    result.metadata["records_failed"] = True
+                for result in group:
+                    if result.metadata:
+                        result.metadata["records_pending"] = False
+                        result.metadata["records_failed"] = True
 
         self._logger.info(
             "Completed bulk record migration",
@@ -453,8 +483,15 @@ def _event_to_insert(
             if event.get("created") is not None:
                 origin["created"] = event.get("created")
             out["origin"] =
origin
+        out["dataset_id"] = source_dataset_id
         return out
 
+    def _event_to_insert_from_row(self, event: dict[str, Any]) -> dict[str, Any]:
+        source_dataset_id = event.get("dataset_id")
+        if not isinstance(source_dataset_id, str) or not source_dataset_id:
+            raise ValueError("Fetched dataset event missing dataset_id")
+        return self._event_to_insert(event, source_dataset_id)
+
     async def _fetch_dataset_events_page(
         self,
         *,
@@ -468,22 +505,23 @@ async def _fetch_dataset_events_page(
         _ = cursor
         _ = version
         return await self._fetch_dataset_events_page_btql_sorted(
-            dataset_id=dataset_id, limit=limit, state=state
+            dataset_ids=[dataset_id], limit=limit, state=state
         )
 
     async def _fetch_dataset_events_page_btql_sorted(
         self,
         *,
-        dataset_id: str,
+        dataset_ids: list[str],
         limit: int,
         state: EventsStreamState,
     ) -> dict[str, Any]:
         """Fetch one page via POST /btql using native BTQL syntax, sorted by _pagination_key."""
         last_pagination_key = state.btql_min_pagination_key
+        quoted_ids = ", ".join(f"'{btql_quote(dataset_id)}'" for dataset_id in dataset_ids)
 
         def _query_text_for_limit(n: int) -> str:
             return build_btql_sorted_page_query(
-                from_expr=f"dataset('{btql_quote(dataset_id)}') spans",
+                from_expr=f"dataset({quoted_ids}) spans",
                 limit=n,
                 last_pagination_key=last_pagination_key,
                 select="*",
@@ -494,23 +532,10 @@ def _query_text_for_limit(n: int) -> str:
             query_for_limit=_query_text_for_limit,
             configured_limit=int(limit),
             operation="btql_dataset_events_page",
-            log_fields={"source_dataset_id": dataset_id},
+            log_fields={"source_dataset_ids": dataset_ids},
             timeout_seconds=120.0,
         )
 
-    async def _insert_dataset_events(
-        self, *, dataset_id: str, events: list[dict[str, Any]]
-    ) -> None:
-        await self.dest_client.with_retry(
-            "insert_dataset_events",
-            lambda: self.dest_client.raw_request(
-                "POST",
-                f"/v1/dataset/{dataset_id}/insert",
-                json={"events": events},
-                timeout=120.0,
-            ),
-        )
-
     @staticmethod
     def _is_http_413(exc: Exception) -> bool:
         return (
@@ -591,17 +616,56 @@ def
_dump_oversize_event_summary(
             cursor=cursor,
         )
 
+    @staticmethod
+    def _group_stream_basename(source_dataset_ids: list[str]) -> str:
+        if len(source_dataset_ids) == 1:
+            return source_dataset_ids[0]
+        joined = ",".join(source_dataset_ids)
+        digest = hashlib.sha1(joined.encode("utf-8")).hexdigest()[:12]
+        return f"group_{digest}"
+
+    async def _insert_dataset_events_grouped(
+        self,
+        *,
+        batch: list[dict[str, Any]],
+        writers_by_source: dict[str, SDKDatasetWriter],
+    ) -> None:
+        grouped: dict[str, list[dict[str, Any]]] = {}
+        for event in batch:
+            source_dataset_id = event.get("dataset_id")
+            if not isinstance(source_dataset_id, str) or not source_dataset_id:
+                raise ValueError("Dataset event batch missing source dataset_id")
+            grouped.setdefault(source_dataset_id, []).append(event)
+
+        for source_dataset_id, rows in grouped.items():
+            writer = writers_by_source.get(source_dataset_id)
+            if writer is None:
+                raise KeyError(f"No destination writer for source dataset {source_dataset_id}")
+            await self.dest_client.with_retry(
+                "insert_dataset_events",
+                lambda rows=rows, writer=writer: writer.write_rows(rows),
+            )
+
     async def _migrate_dataset_records_streaming(
         self, source_dataset_id: str, dest_dataset_id: str
+    ) -> None:
+        await self._migrate_dataset_records_streaming_grouped(
+            {source_dataset_id: dest_dataset_id}
+        )
+
+    async def _migrate_dataset_records_streaming_grouped(
+        self, source_to_dest_dataset_ids: dict[str, str]
     ) -> None:
         events_dir = self.checkpoint_dir / "dataset_events"
         events_dir.mkdir(parents=True, exist_ok=True)
+        source_dataset_ids = list(source_to_dest_dataset_ids.keys())
+        group_basename = self._group_stream_basename(source_dataset_ids)
 
-        state_path = events_dir / f"{source_dataset_id}_state.json"
+        state_path = events_dir / f"{group_basename}_state.json"
         state = EventsStreamState.from_path(state_path)
 
         seen_db = (
-            SeenIdsDB(str(events_dir / f"{source_dataset_id}_seen.sqlite3"))
+            SeenIdsDB(str(events_dir /
f"{group_basename}_seen.sqlite3"))
             if self.events_use_seen_db
             else None
         )
@@ -613,12 +677,14 @@ async def _migrate_dataset_records_streaming(
             # beginning and rely on seen_db for idempotency.
             self._logger.warning(
                 "Dataset events checkpoint contains legacy cursor but no btql_min_pagination_key; restarting BTQL stream from beginning",
-                source_dataset_id=source_dataset_id,
+                source_dataset_ids=source_dataset_ids,
             )
             state.cursor = None
 
-        # BTQL-based streaming does not use version snapshots.
-        version = None
+        writers_by_source = {
+            source_dataset_id: SDKDatasetWriter(self.dest_client, dest_dataset_id)
+            for source_dataset_id, dest_dataset_id in source_to_dest_dataset_ids.items()
+        }
 
         progress = self._events_progress_hook
 
@@ -629,9 +695,9 @@ def _save_state() -> None:
 
         async def _fetch(n: int) -> dict[str, Any]:
             return await self._fetch_dataset_events_page(
-                dataset_id=source_dataset_id,
+                dataset_id=source_dataset_ids[0],
                 cursor=None,
-                version=version,
+                version=None,
                 limit=n,
                 state=state,
             )
@@ -640,69 +706,95 @@ async def _on_single_413(event: dict[str, Any], err: Exception) -> None:
             self._dump_oversize_event_summary(
                 events_dir=events_dir,
                 cursor=state.btql_min_pagination_key,
-                dest_dataset_id=dest_dataset_id,
+                dest_dataset_id=(
+                    source_to_dest_dataset_ids.get(source_dataset_id)
+                    if isinstance(source_dataset_id, str)
+                    else None
+                )
+                or "unknown",
                 event=event,
                 error=err,
             )
 
-        await stream_btql_sorted_events(
-            fetch_page=_fetch,
+        async def _fetch_group(n: int) -> dict[str, Any]:
+            return await self._fetch_dataset_events_page_btql_sorted(
+                dataset_ids=source_dataset_ids,
+                limit=n,
+                state=state,
+            )
+
+        event_to_insert = (
+            self._event_to_insert_from_row
+            if len(source_dataset_ids) > 1
+            else lambda event, _source_dataset_id=source_dataset_ids[0]: self._event_to_insert(
+                event, _source_dataset_id
+            )
+        )
+
+        await stream_btql_sorted_events_buffered(
+            fetch_page=_fetch_group,
             page_limit=int(self.events_fetch_limit),
-            get_last_pk=lambda:
state.btql_min_pagination_key, - set_last_pk=lambda pk: setattr(state, "btql_min_pagination_key", pk), + state=state, save_state=_save_state, page_event_filter=lambda e: e.get("_object_delete") is True, - event_to_insert=lambda e: self._event_to_insert(e, source_dataset_id), + event_to_insert=event_to_insert, seen_db=seen_db, - insert_batch_size=int(self.events_insert_batch_size), - insert_max_bytes=self._insert_max_bytes, rewrite_event_in_place=( None if self._attachment_copier is None else self._attachment_copier.rewrite_event_in_place ), - insert_events=lambda batch: self._insert_dataset_events( - dataset_id=dest_dataset_id, events=batch + insert_events=lambda batch: self._insert_dataset_events_grouped( + batch=batch, + writers_by_source=writers_by_source, ), + flush_max_rows=self._sdk_flush_max_rows, + flush_max_bytes=self._sdk_flush_max_bytes, is_http_413=self._is_http_413, on_single_413=_on_single_413, - incr_fetched=lambda n: setattr( - state, "fetched_events", int(state.fetched_events) + int(n) - ), - incr_inserted=lambda n: setattr( - state, "inserted_events", int(state.inserted_events) + int(n) - ), - incr_inserted_bytes=lambda n: setattr( - state, "inserted_bytes", int(state.inserted_bytes) + int(n) - ), - incr_skipped_deleted=lambda n: setattr( - state, "skipped_deleted", int(state.skipped_deleted) + int(n) - ), - incr_skipped_seen=lambda n: setattr( - state, "skipped_seen", int(state.skipped_seen) + int(n) - ), - incr_attachments_copied=lambda n: setattr( - state, - "attachments_copied", - int(state.attachments_copied) + int(n), - ), hooks=None if progress is None else { + "on_fetch": lambda info, _p=progress: _p( + { + "resource": "dataset_events", + "phase": "fetch", + "source_dataset_ids": source_dataset_ids, + "dest_dataset_ids": list(source_to_dest_dataset_ids.values()), + "page_num": info.get("page_num"), + "page_events": info.get("page_events"), + "fetched_total": info.get("fetched_total"), + "inserted_total": info.get("inserted_total"), + 
"inserted_bytes_total": info.get("inserted_bytes_total"), + "skipped_deleted_total": info.get("skipped_deleted_total"), + "skipped_seen_total": info.get("skipped_seen_total"), + "attachments_copied_total": info.get("attachments_copied_total"), + "pending_buffered_rows": info.get("pending_buffered_rows"), + "pending_buffered_bytes": info.get("pending_buffered_bytes"), + "cursor": ( + (state.btql_min_pagination_key[:16] + "…") + if isinstance(state.btql_min_pagination_key, str) + else None + ), + "next_cursor": None, + } + ), "on_page": lambda info, _p=progress: _p( { "resource": "dataset_events", "phase": "page", - "source_dataset_id": source_dataset_id, - "dest_dataset_id": dest_dataset_id, + "source_dataset_ids": source_dataset_ids, + "dest_dataset_ids": list(source_to_dest_dataset_ids.values()), "page_num": info.get("page_num"), "page_events": info.get("page_events"), - "fetched_total": state.fetched_events, - "inserted_total": state.inserted_events, - "inserted_bytes_total": state.inserted_bytes, - "skipped_deleted_total": state.skipped_deleted, - "skipped_seen_total": state.skipped_seen, - "attachments_copied_total": state.attachments_copied, + "fetched_total": info.get("fetched_total"), + "inserted_total": info.get("inserted_total"), + "inserted_bytes_total": info.get("inserted_bytes_total"), + "skipped_deleted_total": info.get("skipped_deleted_total"), + "skipped_seen_total": info.get("skipped_seen_total"), + "attachments_copied_total": info.get("attachments_copied_total"), + "pending_buffered_rows": info.get("pending_buffered_rows"), + "pending_buffered_bytes": info.get("pending_buffered_bytes"), "cursor": ( (state.btql_min_pagination_key[:16] + "…") if isinstance(state.btql_min_pagination_key, str) @@ -711,18 +803,20 @@ async def _on_single_413(event: dict[str, Any], err: Exception) -> None: "next_cursor": None, } ), - "on_done": lambda _info, _p=progress: _p( + "on_done": lambda info, _p=progress: _p( { "resource": "dataset_events", "phase": "done", - 
"source_dataset_id": source_dataset_id, - "dest_dataset_id": dest_dataset_id, - "fetched_total": state.fetched_events, - "inserted_total": state.inserted_events, - "inserted_bytes_total": state.inserted_bytes, - "skipped_deleted_total": state.skipped_deleted, - "skipped_seen_total": state.skipped_seen, - "attachments_copied_total": state.attachments_copied, + "source_dataset_ids": source_dataset_ids, + "dest_dataset_ids": list(source_to_dest_dataset_ids.values()), + "fetched_total": info.get("fetched_total"), + "inserted_total": info.get("inserted_total"), + "inserted_bytes_total": info.get("inserted_bytes_total"), + "skipped_deleted_total": info.get("skipped_deleted_total"), + "skipped_seen_total": info.get("skipped_seen_total"), + "attachments_copied_total": info.get("attachments_copied_total"), + "pending_buffered_rows": info.get("pending_buffered_rows"), + "pending_buffered_bytes": info.get("pending_buffered_bytes"), "cursor": None, "next_cursor": None, } @@ -732,8 +826,8 @@ async def _on_single_413(event: dict[str, Any], err: Exception) -> None: self._logger.info( "Migrated dataset records (streaming)", - source_dataset_id=source_dataset_id, - dest_dataset_id=dest_dataset_id, + source_dataset_ids=source_dataset_ids, + dest_dataset_ids=list(source_to_dest_dataset_ids.values()), fetched=state.fetched_events, inserted=state.inserted_events, skipped_deleted=state.skipped_deleted, diff --git a/braintrust_migrate/resources/experiments.py b/braintrust_migrate/resources/experiments.py index c8de897..3c09a2f 100644 --- a/braintrust_migrate/resources/experiments.py +++ b/braintrust_migrate/resources/experiments.py @@ -3,6 +3,7 @@ from __future__ import annotations import json as _json +import hashlib from collections.abc import Callable from pathlib import Path from typing import Any, ClassVar @@ -15,17 +16,32 @@ fetch_btql_sorted_page_with_retries, ) from braintrust_migrate.resources.base import MigrationResult, ResourceMigrator +from braintrust_migrate.sdk_logs import 
SDKExperimentWriter from braintrust_migrate.streaming_utils import ( EventsStreamState, SeenIdsDB, build_btql_sorted_page_query, - stream_btql_sorted_events, + stream_btql_sorted_events_buffered, ) # HTTP status codes HTTP_STATUS_REQUEST_ENTITY_TOO_LARGE = 413 +def _coerce_int_config( + cfg: Any, attr_name: str, default: int, *, minimum: int | None = None +) -> int: + value = getattr(cfg, attr_name, default) + if not isinstance(value, int): + try: + value = int(value) + except Exception: + value = default + if minimum is not None and value < minimum: + return default + return value + + class ExperimentMigrator(ResourceMigrator[dict]): """Migrator for Braintrust experiments. @@ -38,6 +54,10 @@ class ExperimentMigrator(ResourceMigrator[dict]): Uses raw API requests instead of SDK to avoid model dependencies. """ + SDK_FLUSH_MAX_ROWS: ClassVar[int] = 1_000 + SDK_FLUSH_MAX_BYTES: ClassVar[int] = 25 * 1024 * 1024 + DEFAULT_EVENT_FETCH_GROUP_SIZE: ClassVar[int] = 25 + def __init__( self, source_client, @@ -45,9 +65,7 @@ def __init__( checkpoint_dir: Path, batch_size: int = 100, *, - events_fetch_limit: int = 50, - events_insert_batch_size: int = 200, - events_use_version_snapshot: bool = True, + events_fetch_limit: int = 1000, events_use_seen_db: bool = True, events_progress_hook: Callable[[dict[str, Any]], None] | None = None, ) -> None: @@ -55,8 +73,6 @@ def __init__( source_client, dest_client, checkpoint_dir, batch_size=batch_size ) self.events_fetch_limit = events_fetch_limit - self.events_insert_batch_size = events_insert_batch_size - self.events_use_version_snapshot = events_use_version_snapshot self.events_use_seen_db = events_use_seen_db self._events_progress_hook = events_progress_hook self._attachment_copier: AttachmentCopier | None = None @@ -83,6 +99,14 @@ def __init__( self._insert_max_bytes: int | None = int(max_req * headroom) except Exception: self._insert_max_bytes = None + self._sdk_flush_max_rows = int(self.SDK_FLUSH_MAX_ROWS) + 
self._sdk_flush_max_bytes = int(self.SDK_FLUSH_MAX_BYTES) + self._event_fetch_group_size = _coerce_int_config( + cfg, + "events_fetch_group_size", + self.DEFAULT_EVENT_FETCH_GROUP_SIZE, + minimum=1, + ) @property def resource_name(self) -> str: @@ -386,30 +410,38 @@ async def _migrate_events_for_experiments( experiment_count=len(successful_migrations), ) - for result in successful_migrations: - try: + async def _flush_group(group: list[MigrationResult]) -> None: + if not group: + return + source_to_dest: dict[str, str] = {} + for result in group: if result.dest_id is None: raise ValueError( "Experiment migrated without dest_id; cannot copy events" ) - await self._migrate_experiment_events(result.source_id, result.dest_id) + source_to_dest[result.source_id] = result.dest_id + + await self._migrate_experiment_events_streaming_grouped(source_to_dest) - # Update metadata to indicate events are migrated + for result in group: if result.metadata: result.metadata["events_pending"] = False result.metadata["events_migrated"] = True + for start in range(0, len(successful_migrations), self._event_fetch_group_size): + group = successful_migrations[start : start + self._event_fetch_group_size] + try: + await _flush_group(group) except Exception as e: self._logger.error( - "Failed to migrate events for experiment", - source_id=result.source_id, - dest_id=result.dest_id, + "Failed to migrate events for experiment group", + source_ids=[result.source_id for result in group], error=str(e), ) - # Update metadata to indicate event migration failed - if result.metadata: - result.metadata["events_pending"] = False - result.metadata["events_failed"] = True + for result in group: + if result.metadata: + result.metadata["events_pending"] = False + result.metadata["events_failed"] = True self._logger.info( "Completed bulk event migration", @@ -486,8 +518,8 @@ async def _migrate_experiment_events( Deleted events (`_object_delete=true`) are skipped. 
""" - await self._migrate_experiment_events_streaming( - source_experiment_id, dest_experiment_id + await self._migrate_experiment_events_streaming_grouped( + {source_experiment_id: dest_experiment_id} ) @staticmethod @@ -518,6 +550,7 @@ def _event_to_insert( # Ensure id preserved for idempotency if "id" in event: out["id"] = event["id"] + out["experiment_id"] = source_experiment_id # Add provenance if absent if "origin" not in out or out.get("origin") is None: origin: dict[str, Any] = { @@ -532,6 +565,12 @@ def _event_to_insert( out["origin"] = origin return out + def _event_to_insert_from_row(self, event: dict[str, Any]) -> dict[str, Any]: + source_experiment_id = event.get("experiment_id") + if not isinstance(source_experiment_id, str) or not source_experiment_id: + raise ValueError("Fetched experiment event missing experiment_id") + return self._event_to_insert(event, source_experiment_id) + async def _fetch_experiment_events_page( self, *, @@ -545,22 +584,23 @@ async def _fetch_experiment_events_page( _ = cursor _ = version return await self._fetch_experiment_events_page_btql_sorted( - experiment_id=experiment_id, limit=limit, state=state + experiment_ids=[experiment_id], limit=limit, state=state ) async def _fetch_experiment_events_page_btql_sorted( self, *, - experiment_id: str, + experiment_ids: list[str], limit: int, state: EventsStreamState, ) -> dict[str, Any]: """Fetch one page via POST /btql using native BTQL syntax, sorted by _pagination_key.""" last_pagination_key = state.btql_min_pagination_key + quoted_ids = ", ".join(f"'{btql_quote(experiment_id)}'" for experiment_id in experiment_ids) def _query_text_for_limit(n: int) -> str: return build_btql_sorted_page_query( - from_expr=f"experiment('{btql_quote(experiment_id)}') spans", + from_expr=f"experiment({quoted_ids}) spans", limit=n, last_pagination_key=last_pagination_key, select="*", @@ -571,23 +611,10 @@ def _query_text_for_limit(n: int) -> str: query_for_limit=_query_text_for_limit, 
configured_limit=int(limit), operation="btql_experiment_events_page", - log_fields={"source_experiment_id": experiment_id}, + log_fields={"source_experiment_ids": experiment_ids}, timeout_seconds=120.0, ) - async def _insert_experiment_events( - self, *, experiment_id: str, events: list[dict[str, Any]] - ) -> None: - await self.dest_client.with_retry( - "insert_experiment_events", - lambda: self.dest_client.raw_request( - "POST", - f"/v1/experiment/{experiment_id}/insert", - json={"events": events}, - timeout=120.0, - ), - ) - @staticmethod def _is_http_413(exc: Exception) -> bool: return ( @@ -668,18 +695,52 @@ def _dump_oversize_event_summary( cursor=cursor, ) - async def _migrate_experiment_events_streaming( - self, source_experiment_id: str, dest_experiment_id: str + @staticmethod + def _group_stream_basename(source_experiment_ids: list[str]) -> str: + if len(source_experiment_ids) == 1: + return source_experiment_ids[0] + joined = ",".join(source_experiment_ids) + digest = hashlib.sha1(joined.encode("utf-8")).hexdigest()[:12] + return f"group_{digest}" + + async def _insert_experiment_events_grouped( + self, + *, + batch: list[dict[str, Any]], + writers_by_source: dict[str, SDKExperimentWriter], + ) -> None: + grouped: dict[str, list[dict[str, Any]]] = {} + for event in batch: + source_experiment_id = event.get("experiment_id") + if not isinstance(source_experiment_id, str) or not source_experiment_id: + raise ValueError("Experiment event batch missing source experiment_id") + grouped.setdefault(source_experiment_id, []).append(event) + + for source_experiment_id, rows in grouped.items(): + writer = writers_by_source.get(source_experiment_id) + if writer is None: + raise KeyError( + f"No destination writer for source experiment {source_experiment_id}" + ) + await self.dest_client.with_retry( + "insert_experiment_events", + lambda rows=rows, writer=writer: writer.write_rows(rows), + ) + + async def _migrate_experiment_events_streaming_grouped( + self, 
source_to_dest_experiment_ids: dict[str, str] ) -> None: try: events_dir = self.checkpoint_dir / "experiment_events" events_dir.mkdir(parents=True, exist_ok=True) + source_experiment_ids = list(source_to_dest_experiment_ids.keys()) + group_basename = self._group_stream_basename(source_experiment_ids) - state_path = events_dir / f"{source_experiment_id}_state.json" + state_path = events_dir / f"{group_basename}_state.json" state = EventsStreamState.from_path(state_path) seen_db = ( - SeenIdsDB(str(events_dir / f"{source_experiment_id}_seen.sqlite3")) + SeenIdsDB(str(events_dir / f"{group_basename}_seen.sqlite3")) if self.events_use_seen_db else None ) @@ -691,13 +752,17 @@ async def _migrate_experiment_events_streaming( # beginning and rely on seen_db for idempotency. self._logger.warning( "Experiment events checkpoint contains legacy cursor but no btql_min_pagination_key; restarting BTQL stream from beginning", - source_experiment_id=source_experiment_id, + source_experiment_ids=source_experiment_ids, ) state.cursor = None - # BTQL-based streaming does not use version snapshots. 
- version = None progress = self._events_progress_hook + writers_by_source = { + source_experiment_id: SDKExperimentWriter( + self.dest_client, dest_experiment_id + ) + for source_experiment_id, dest_experiment_id in source_to_dest_experiment_ids.items() + } def _save_state() -> None: state.cursor = None @@ -705,85 +770,115 @@ def _save_state() -> None: _json.dump(state.to_dict(), f, indent=2) async def _fetch(n: int) -> dict[str, Any]: - return await self._fetch_experiment_events_page( - experiment_id=source_experiment_id, - cursor=None, - version=version, + return await self._fetch_experiment_events_page_btql_sorted( + experiment_ids=source_experiment_ids, limit=n, state=state, ) async def _on_single_413(event: dict[str, Any], err: Exception) -> None: + source_experiment_id = event.get("experiment_id") self._dump_oversize_event_summary( events_dir=events_dir, cursor=state.btql_min_pagination_key, - dest_experiment_id=dest_experiment_id, + dest_experiment_id=( + source_to_dest_experiment_ids.get(source_experiment_id) + if isinstance(source_experiment_id, str) + else None + ) + or "unknown", event=event, error=err, ) - await stream_btql_sorted_events( + await stream_btql_sorted_events_buffered( fetch_page=_fetch, page_limit=int(self.events_fetch_limit), - get_last_pk=lambda: state.btql_min_pagination_key, - set_last_pk=lambda pk: setattr( - state, "btql_min_pagination_key", pk - ), + state=state, save_state=_save_state, page_event_filter=lambda e: e.get("_object_delete") is True, - event_to_insert=lambda e: self._event_to_insert( - e, source_experiment_id - ), + event_to_insert=self._event_to_insert_from_row, seen_db=seen_db, - insert_batch_size=int(self.events_insert_batch_size), - insert_max_bytes=self._insert_max_bytes, rewrite_event_in_place=( None if self._attachment_copier is None else self._attachment_copier.rewrite_event_in_place ), - insert_events=lambda batch: self._insert_experiment_events( - experiment_id=dest_experiment_id, events=batch + 
insert_events=lambda batch: self._insert_experiment_events_grouped( + batch=batch, + writers_by_source=writers_by_source, ), + flush_max_rows=self._sdk_flush_max_rows, + flush_max_bytes=self._sdk_flush_max_bytes, is_http_413=self._is_http_413, on_single_413=_on_single_413, - incr_fetched=lambda n: setattr( - state, "fetched_events", int(state.fetched_events) + int(n) - ), - incr_inserted=lambda n: setattr( - state, "inserted_events", int(state.inserted_events) + int(n) - ), - incr_inserted_bytes=lambda n: setattr( - state, "inserted_bytes", int(state.inserted_bytes) + int(n) - ), - incr_skipped_deleted=lambda n: setattr( - state, "skipped_deleted", int(state.skipped_deleted) + int(n) - ), - incr_skipped_seen=lambda n: setattr( - state, "skipped_seen", int(state.skipped_seen) + int(n) - ), - incr_attachments_copied=lambda n: setattr( - state, - "attachments_copied", - int(state.attachments_copied) + int(n), - ), hooks=None if progress is None else { + "on_fetch": lambda info, _p=progress: _p( + { + "resource": "experiment_events", + "phase": "fetch", + "source_experiment_ids": source_experiment_ids, + "dest_experiment_ids": list( + source_to_dest_experiment_ids.values() + ), + "page_num": info.get("page_num"), + "page_events": info.get("page_events"), + "fetched_total": info.get("fetched_total"), + "inserted_total": info.get("inserted_total"), + "inserted_bytes_total": info.get( + "inserted_bytes_total" + ), + "skipped_deleted_total": info.get( + "skipped_deleted_total" + ), + "skipped_seen_total": info.get("skipped_seen_total"), + "attachments_copied_total": info.get( + "attachments_copied_total" + ), + "pending_buffered_rows": info.get( + "pending_buffered_rows" + ), + "pending_buffered_bytes": info.get( + "pending_buffered_bytes" + ), + "cursor": ( + (state.btql_min_pagination_key[:16] + "…") + if isinstance(state.btql_min_pagination_key, str) + else None + ), + "next_cursor": None, + } + ), "on_page": lambda info, _p=progress: _p( { "resource": 
"experiment_events", "phase": "page", - "source_experiment_id": source_experiment_id, - "dest_experiment_id": dest_experiment_id, + "source_experiment_ids": source_experiment_ids, + "dest_experiment_ids": list( + source_to_dest_experiment_ids.values() + ), "page_num": info.get("page_num"), "page_events": info.get("page_events"), - "fetched_total": state.fetched_events, - "inserted_total": state.inserted_events, - "inserted_bytes_total": state.inserted_bytes, - "skipped_deleted_total": state.skipped_deleted, - "skipped_seen_total": state.skipped_seen, - "attachments_copied_total": state.attachments_copied, + "fetched_total": info.get("fetched_total"), + "inserted_total": info.get("inserted_total"), + "inserted_bytes_total": info.get( + "inserted_bytes_total" + ), + "skipped_deleted_total": info.get( + "skipped_deleted_total" + ), + "skipped_seen_total": info.get("skipped_seen_total"), + "attachments_copied_total": info.get( + "attachments_copied_total" + ), + "pending_buffered_rows": info.get( + "pending_buffered_rows" + ), + "pending_buffered_bytes": info.get( + "pending_buffered_bytes" + ), "cursor": ( (state.btql_min_pagination_key[:16] + "…") if isinstance(state.btql_min_pagination_key, str) @@ -792,18 +887,32 @@ async def _on_single_413(event: dict[str, Any], err: Exception) -> None: "next_cursor": None, } ), - "on_done": lambda _info, _p=progress: _p( + "on_done": lambda info, _p=progress: _p( { "resource": "experiment_events", "phase": "done", - "source_experiment_id": source_experiment_id, - "dest_experiment_id": dest_experiment_id, - "fetched_total": state.fetched_events, - "inserted_total": state.inserted_events, - "inserted_bytes_total": state.inserted_bytes, - "skipped_deleted_total": state.skipped_deleted, - "skipped_seen_total": state.skipped_seen, - "attachments_copied_total": state.attachments_copied, + "source_experiment_ids": source_experiment_ids, + "dest_experiment_ids": list( + source_to_dest_experiment_ids.values() + ), + "fetched_total": 
info.get("fetched_total"), + "inserted_total": info.get("inserted_total"), + "inserted_bytes_total": info.get( + "inserted_bytes_total" + ), + "skipped_deleted_total": info.get( + "skipped_deleted_total" + ), + "skipped_seen_total": info.get("skipped_seen_total"), + "attachments_copied_total": info.get( + "attachments_copied_total" + ), + "pending_buffered_rows": info.get( + "pending_buffered_rows" + ), + "pending_buffered_bytes": info.get( + "pending_buffered_bytes" + ), "cursor": None, "next_cursor": None, } @@ -813,8 +922,8 @@ async def _on_single_413(event: dict[str, Any], err: Exception) -> None: self._logger.info( "Migrated experiment events (streaming)", - source_experiment_id=source_experiment_id, - dest_experiment_id=dest_experiment_id, + source_experiment_ids=source_experiment_ids, + dest_experiment_ids=list(source_to_dest_experiment_ids.values()), fetched=state.fetched_events, inserted=state.inserted_events, skipped_deleted=state.skipped_deleted, @@ -830,19 +939,19 @@ async def _on_single_413(event: dict[str, Any], err: Exception) -> None: if "Error code: 303" in error_str: self._logger.warning( "Experiment events fetch returned HTTP 303 - skipping events migration", - source_experiment_id=source_experiment_id, + source_experiment_ids=list(source_to_dest_experiment_ids.keys()), ) return elif "Error code: 404" in error_str: self._logger.info( "No events found in source experiment (404)", - source_experiment_id=source_experiment_id, + source_experiment_ids=list(source_to_dest_experiment_ids.keys()), ) return else: self._logger.error( "Failed to migrate experiment events", - source_experiment_id=source_experiment_id, + source_experiment_ids=list(source_to_dest_experiment_ids.keys()), error=str(e), ) raise diff --git a/braintrust_migrate/resources/logs.py b/braintrust_migrate/resources/logs.py index d795bb8..fad64cf 100644 --- a/braintrust_migrate/resources/logs.py +++ b/braintrust_migrate/resources/logs.py @@ -7,6 +7,7 @@ from __future__ import annotations 
import json as _json +import time from collections.abc import Callable from dataclasses import dataclass from pathlib import Path @@ -16,6 +17,10 @@ import structlog from braintrust_migrate.attachments import AttachmentCopier +from braintrust_migrate.batching import ( + approx_events_insert_payload_bytes, + approx_json_bytes, +) from braintrust_migrate.btql import ( btql_quote, fetch_btql_sorted_page_with_retries, @@ -23,10 +28,10 @@ ) from braintrust_migrate.client import BraintrustClient from braintrust_migrate.resources.base import MigrationState, ResourceMigrator +from braintrust_migrate.sdk_logs import SDKProjectLogsWriter from braintrust_migrate.streaming_utils import ( SeenIdsDB, build_btql_sorted_page_query, - stream_btql_sorted_events, ) logger = structlog.get_logger(__name__) @@ -52,7 +57,6 @@ class _LogsStreamingState: btql_min_pagination_key: str | None = None btql_min_pagination_key_inclusive: bool = False btql_last_created: str | None = None - query_source: str | None = None created_after: str | None = None created_before: str | None = None @@ -76,7 +80,6 @@ def from_path(cls, path: Path) -> _LogsStreamingState: data.get("btql_min_pagination_key_inclusive", False) ), btql_last_created=data.get("btql_last_created"), - query_source=data.get("query_source"), created_after=data.get("created_after"), created_before=data.get("created_before"), ) @@ -94,7 +97,6 @@ def to_dict(self) -> dict[str, Any]: "btql_min_pagination_key": self.btql_min_pagination_key, "btql_min_pagination_key_inclusive": self.btql_min_pagination_key_inclusive, "btql_last_created": self.btql_last_created, - "query_source": self.query_source, "created_after": self.created_after, "created_before": self.created_before, } @@ -103,6 +105,9 @@ def to_dict(self) -> dict[str, Any]: class LogsMigrator(ResourceMigrator[dict[str, Any]]): """Streaming migrator for Braintrust project logs.""" + SDK_FLUSH_MAX_ROWS: ClassVar[int] = 5_000 + SDK_FLUSH_MAX_BYTES: ClassVar[int] = 25 * 1024 * 1024 + 
_INSERT_FIELDS: ClassVar[set[str]] = { "input", "output", @@ -150,6 +155,9 @@ def __init__( _ = batch_size # intentionally ignored self._logger = logger.bind(migrator=self.__class__.__name__) + self._sdk_logs_writer: SDKProjectLogsWriter | None = None + self._sdk_flush_max_rows = int(self.SDK_FLUSH_MAX_ROWS) + self._sdk_flush_max_bytes = int(self.SDK_FLUSH_MAX_BYTES) self._stream_state_path = self.checkpoint_dir / "logs_streaming_state.json" self._stream_state = _LogsStreamingState.from_path(self._stream_state_path) @@ -317,16 +325,26 @@ def _query_text_for_limit(n: int) -> str: async def _insert_events( self, *, project_id: str, events: list[dict[str, Any]] ) -> None: + if self._sdk_logs_writer is None: + self._sdk_logs_writer = SDKProjectLogsWriter(self.dest_client, project_id) await self.dest_client.with_retry( "insert_project_logs_events", - lambda: self.dest_client.raw_request( - "POST", - f"/v1/project_logs/{project_id}/insert", - json={"events": events}, - timeout=120.0, - ), + lambda: self._sdk_logs_writer.write_rows(events), ) + @staticmethod + def _extract_ids(events: list[dict[str, Any]]) -> list[str]: + ids: list[str] = [] + for event in events: + event_id = event.get("id") + if isinstance(event_id, str) and event_id: + ids.append(event_id) + return ids + + @staticmethod + def _sum_event_bytes(events: list[dict[str, Any]]) -> int: + return sum(approx_json_bytes(event) for event in events) + @staticmethod def _is_http_413(exc: Exception) -> bool: return ( @@ -470,7 +488,6 @@ async def migrate_all( "migration_config.created_after must be a non-empty string when set" ) self._stream_state.created_after = created_after_cfg - self._stream_state.query_source = "btql_sorted_date_filter" self._save_stream_state() elif created_after_cfg is None: raise ValueError( @@ -493,12 +510,14 @@ async def migrate_all( or created_before_cfg is not None ): if self._stream_state.created_before is None: - if not isinstance(created_before_cfg, str) or not created_before_cfg: + 
if ( + not isinstance(created_before_cfg, str) + or not created_before_cfg + ): raise TypeError( "migration_config.created_before must be a non-empty string when set" ) self._stream_state.created_before = created_before_cfg - self._stream_state.query_source = "btql_sorted_date_filter" self._save_stream_state() elif created_before_cfg is None: raise ValueError( @@ -561,15 +580,29 @@ async def migrate_all( } self._stream_state.btql_min_pagination_key = start_pk self._stream_state.btql_min_pagination_key_inclusive = True - self._stream_state.query_source = "btql_sorted_date_filter" self._save_stream_state() progress_hook = self._progress_hook current_page_num: int | None = None current_page_events: int | None = None + active_last_pk = self._stream_state.btql_min_pagination_key + active_last_pk_inclusive = bool( + self._stream_state.btql_min_pagination_key_inclusive + ) + + pending_events: list[dict[str, Any]] = [] + pending_seen_ids: set[str] = set() + pending_row_bytes = 0 + pending_fetched_events = 0 + pending_inserted_events = 0 + pending_inserted_bytes = 0 + pending_skipped_seen = 0 + pending_attachments_copied = 0 + pending_last_pk: str | None = None + pending_last_created: str | None = None def _pk_cursor_prefix() -> str | None: - pk = self._stream_state.btql_min_pagination_key + pk = pending_last_pk or active_last_pk return (pk[:16] + "…") if isinstance(pk, str) else pk def _save_state() -> None: @@ -577,14 +610,6 @@ def _save_state() -> None: self._stream_state.cursor = None self._save_stream_state() - async def _fetch(n: int) -> dict[str, Any]: - return await self._fetch_page( - project_id=source_project_id, - cursor=None, - version=None, - limit=n, - ) - async def _on_single_413(event: dict[str, Any], err: Exception) -> None: self._dump_oversize_event_summary( cursor=self._stream_state.btql_min_pagination_key, @@ -609,11 +634,16 @@ def _on_fetch(info: dict[str, Any]) -> None: "page_events": current_page_events, "configured_fetch_limit": self.page_limit, 
"configured_insert_batch_size": self.insert_batch_size, - "fetched_total": self._stream_state.fetched_events, + "fetched_total": self._stream_state.fetched_events + + pending_fetched_events, "inserted_total": self._stream_state.inserted_events, "inserted_bytes_total": self._stream_state.inserted_bytes, - "skipped_seen_total": self._stream_state.skipped_seen, - "attachments_copied_total": self._stream_state.attachments_copied, + "skipped_seen_total": self._stream_state.skipped_seen + + pending_skipped_seen, + "attachments_copied_total": self._stream_state.attachments_copied + + pending_attachments_copied, + "pending_buffered_rows": pending_inserted_events, + "pending_buffered_bytes": pending_row_bytes, "cursor": _pk_cursor_prefix(), } ) @@ -637,6 +667,8 @@ def _on_insert(insert_info: dict[str, Any]) -> None: "inserted_bytes_total": self._stream_state.inserted_bytes, "skipped_seen_total": self._stream_state.skipped_seen, "attachments_copied_total": self._stream_state.attachments_copied, + "pending_buffered_rows": 0, + "pending_buffered_bytes": 0, "cursor": _pk_cursor_prefix(), } ) @@ -654,11 +686,16 @@ def _on_page(info: dict[str, Any]) -> None: "page_events": info.get("page_events"), "configured_fetch_limit": self.page_limit, "configured_insert_batch_size": self.insert_batch_size, - "fetched_total": self._stream_state.fetched_events, + "fetched_total": self._stream_state.fetched_events + + pending_fetched_events, "inserted_total": self._stream_state.inserted_events, "inserted_bytes_total": self._stream_state.inserted_bytes, - "skipped_seen_total": self._stream_state.skipped_seen, - "attachments_copied_total": self._stream_state.attachments_copied, + "skipped_seen_total": self._stream_state.skipped_seen + + pending_skipped_seen, + "attachments_copied_total": self._stream_state.attachments_copied + + pending_attachments_copied, + "pending_buffered_rows": pending_inserted_events, + "pending_buffered_bytes": pending_row_bytes, "cursor": _pk_cursor_prefix(), 
"next_cursor": None, } @@ -678,6 +715,8 @@ def _on_done(_info: dict[str, Any]) -> None: "inserted_bytes_total": self._stream_state.inserted_bytes, "skipped_seen_total": self._stream_state.skipped_seen, "attachments_copied_total": self._stream_state.attachments_copied, + "pending_buffered_rows": 0, + "pending_buffered_bytes": 0, "cursor": None, "next_cursor": None, } @@ -753,63 +792,192 @@ def _set_last_pk(pk: str | None) -> None: self._stream_state.btql_min_pagination_key = pk self._stream_state.btql_min_pagination_key_inclusive = False - await stream_btql_sorted_events( - fetch_page=_fetch, - page_limit=int(self.page_limit), - get_last_pk=lambda: self._stream_state.btql_min_pagination_key, - set_last_pk=_set_last_pk, - save_state=_save_state, - page_event_filter=None, - event_to_insert=lambda e: self._event_to_insert(e, source_project_id), - seen_db=seen_db, - insert_batch_size=int(self.insert_batch_size), - insert_max_bytes=self._insert_max_bytes, - rewrite_event_in_place=( - None - if self._attachment_copier is None - else self._attachment_copier.rewrite_event_in_place - ), - insert_events=lambda batch: self._insert_events( - project_id=dest_project_id, events=batch - ), - is_http_413=self._is_http_413, - on_single_413=_on_single_413, - incr_fetched=lambda n: setattr( - self._stream_state, - "fetched_events", - int(self._stream_state.fetched_events) + int(n), - ), - incr_inserted=lambda n: setattr( - self._stream_state, - "inserted_events", - int(self._stream_state.inserted_events) + int(n), - ), - incr_inserted_bytes=lambda n: setattr( - self._stream_state, - "inserted_bytes", - int(self._stream_state.inserted_bytes) + int(n), - ), - incr_skipped_deleted=None, - incr_skipped_seen=lambda n: setattr( - self._stream_state, - "skipped_seen", - int(self._stream_state.skipped_seen) + int(n), - ), - incr_attachments_copied=lambda n: setattr( - self._stream_state, - "attachments_copied", - int(self._stream_state.attachments_copied) + int(n), - ), - hooks=None - if 
progress_hook is None - else { - "on_fetch": _on_fetch, - "on_insert": _on_insert, - "on_page": _on_page, - "on_done": _on_done, - "on_batch_error": _on_batch_error, - }, - ) + async def _flush_pending_events() -> None: + nonlocal pending_events + nonlocal pending_seen_ids + nonlocal pending_row_bytes + nonlocal pending_fetched_events + nonlocal pending_inserted_events + nonlocal pending_inserted_bytes + nonlocal pending_skipped_seen + nonlocal pending_attachments_copied + nonlocal pending_last_pk + nonlocal pending_last_created + + if ( + pending_fetched_events == 0 + and pending_inserted_events == 0 + and pending_skipped_seen == 0 + and pending_attachments_copied == 0 + and pending_last_pk is None + ): + return + + if pending_events: + batch = list(pending_events) + started = time.perf_counter() + try: + await self._insert_events( + project_id=dest_project_id, events=batch + ) + except Exception as e: + _on_batch_error( + { + "page_num": current_page_num, + "batch": batch, + "error": e, + } + ) + if len(batch) == 1 and self._is_http_413(e): + await _on_single_413(batch[0], e) + raise + if seen_db is not None and pending_seen_ids: + seen_db.mark_seen(list(pending_seen_ids)) + self._stream_state.inserted_events += pending_inserted_events + self._stream_state.inserted_bytes += pending_inserted_bytes + _on_insert( + { + "inserted_last": pending_inserted_events, + "inserted_bytes_last": pending_inserted_bytes, + "insert_seconds": max(0.0, time.perf_counter() - started), + "flush_rows": pending_inserted_events, + "flush_buffer_bytes": pending_row_bytes, + } + ) + + self._stream_state.fetched_events += pending_fetched_events + self._stream_state.skipped_seen += pending_skipped_seen + self._stream_state.attachments_copied += pending_attachments_copied + self._stream_state.btql_last_created = pending_last_created + if pending_last_pk is not None: + _set_last_pk(pending_last_pk) + _save_state() + + pending_events = [] + pending_seen_ids = set() + pending_row_bytes = 0 + 
pending_fetched_events = 0 + pending_inserted_events = 0 + pending_inserted_bytes = 0 + pending_skipped_seen = 0 + pending_attachments_copied = 0 + pending_last_pk = None + pending_last_created = None + + page_num = 0 + while True: + page_num += 1 + current_page_num = page_num + + from_expr = f"project_logs('{btql_quote(source_project_id)}') spans" + + def _query_text_for_limit(n: int) -> str: + return build_btql_sorted_page_query( + from_expr=from_expr, + limit=n, + last_pagination_key=active_last_pk, + last_pagination_key_inclusive=active_last_pk_inclusive, + created_after=self._stream_state.created_after, + created_before=self._stream_state.created_before, + select="*", + ) + + page = await fetch_btql_sorted_page_with_retries( + client=self.source_client, + query_for_limit=_query_text_for_limit, + configured_limit=int(self.page_limit), + operation="btql_project_logs_page", + log_fields={"source_project_id": source_project_id}, + timeout_seconds=120.0, + ) + page_events = cast(list[dict[str, Any]], page.get("events") or []) + page_last_pk = cast(str | None, page.get("btql_last_pagination_key")) + current_page_events = len(page_events) + + if page_events: + _on_fetch( + { + "page_num": page_num, + "page_events": len(page_events), + "configured_fetch_limit": int(self.page_limit), + } + ) + + if not page_events: + await _flush_pending_events() + _save_state() + _on_done({"page_num": page_num}) + break + + pending_fetched_events += len(page_events) + + insert_events_list = [ + self._event_to_insert(event, source_project_id) + for event in page_events + ] + + if seen_db is not None: + all_ids = self._extract_ids(insert_events_list) + if all_ids: + unseen = set(seen_db.filter_unseen(all_ids)) + skipped_seen = len(all_ids) - len(unseen) + if skipped_seen: + pending_skipped_seen += skipped_seen + insert_events_list = [ + event + for event in insert_events_list + if event.get("id") in unseen + ] + + if pending_seen_ids: + deduped_events: list[dict[str, Any]] = [] + 
pending_duplicates = 0 + for event in insert_events_list: + event_id = event.get("id") + if isinstance(event_id, str) and event_id in pending_seen_ids: + pending_duplicates += 1 + continue + deduped_events.append(event) + if pending_duplicates: + pending_skipped_seen += pending_duplicates + insert_events_list = deduped_events + + if self._attachment_copier is not None: + copied = 0 + for event in insert_events_list: + copied += int( + await self._attachment_copier.rewrite_event_in_place(event) + ) + pending_attachments_copied += copied + + if insert_events_list: + pending_events.extend(insert_events_list) + pending_seen_ids.update(self._extract_ids(insert_events_list)) + pending_inserted_events += len(insert_events_list) + pending_inserted_bytes += approx_events_insert_payload_bytes( + insert_events_list + ) + pending_row_bytes += self._sum_event_bytes(insert_events_list) + + pending_last_pk = page_last_pk + last_created = page_events[-1].get("created") + if isinstance(last_created, str): + pending_last_created = last_created + active_last_pk = page_last_pk + active_last_pk_inclusive = False + + if ( + pending_inserted_events >= self._sdk_flush_max_rows + or pending_row_bytes >= self._sdk_flush_max_bytes + ): + await _flush_pending_events() + + _on_page( + { + "page_num": page_num, + "page_events": len(page_events), + } + ) return { "resource_type": self.resource_name, diff --git a/braintrust_migrate/sdk_logs.py b/braintrust_migrate/sdk_logs.py new file mode 100644 index 0000000..10f336f --- /dev/null +++ b/braintrust_migrate/sdk_logs.py @@ -0,0 +1,90 @@ +"""SDK-backed logs3 writers for pre-shaped migration rows.""" + +from __future__ import annotations + +import asyncio +from collections.abc import Mapping, Sequence +from typing import Any + +from braintrust_migrate.client import BraintrustClient + +PROJECT_LOGS_LOG_ID = "g" + + +class SDKRowWriter: + """Use the Braintrust Python SDK's logs3 transport for pre-shaped rows.""" + + def __init__( + self, dest_client: 
BraintrustClient, object_id_fields: Mapping[str, Any] + ) -> None: + self._dest_client = dest_client + self._object_id_fields = dict(object_id_fields) + self._background_logger: Any | None = None + self._lazy_value_cls: Any | None = None + + def _ensure_logger(self) -> None: + if self._background_logger is not None and self._lazy_value_cls is not None: + return + + try: + from braintrust.logger import HTTPConnection, _HTTPBackgroundLogger + from braintrust.util import LazyValue + except ImportError as exc: + raise ImportError( + "braintrust package is required for SDK-backed migration writes" + ) from exc + + conn = HTTPConnection(str(self._dest_client.org_config.url).rstrip("/")) + conn.set_token(self._dest_client.org_config.api_key) + conn.make_long_lived() + + logger = _HTTPBackgroundLogger(LazyValue(lambda: conn, use_mutex=True)) + logger.sync_flush = True + self._background_logger = logger + self._lazy_value_cls = LazyValue + + def _prepare_row(self, row: dict[str, Any]) -> dict[str, Any]: + return { + **dict(row), + **self._object_id_fields, + } + + def write_rows_sync(self, rows: Sequence[dict[str, Any]]) -> None: + self._ensure_logger() + assert self._background_logger is not None + assert self._lazy_value_cls is not None + + events = [ + self._lazy_value_cls( + lambda prepared=self._prepare_row(row): prepared, + use_mutex=False, + ) + for row in rows + ] + if events: + self._background_logger.log(*events) + self._background_logger.flush() + + async def write_rows(self, rows: Sequence[dict[str, Any]]) -> None: + await asyncio.to_thread(self.write_rows_sync, rows) + + +class SDKProjectLogsWriter(SDKRowWriter): + def __init__(self, dest_client: BraintrustClient, project_id: str) -> None: + super().__init__( + dest_client, + { + "project_id": project_id, + "log_id": PROJECT_LOGS_LOG_ID, + }, + ) + + +class SDKExperimentWriter(SDKRowWriter): + def __init__(self, dest_client: BraintrustClient, experiment_id: str) -> None: + super().__init__(dest_client, 
{"experiment_id": experiment_id}) + + +class SDKDatasetWriter(SDKRowWriter): + def __init__(self, dest_client: BraintrustClient, dataset_id: str) -> None: + super().__init__(dest_client, {"dataset_id": dataset_id}) diff --git a/braintrust_migrate/streaming_utils.py b/braintrust_migrate/streaming_utils.py index 9d5b534..0df3c2a 100644 --- a/braintrust_migrate/streaming_utils.py +++ b/braintrust_migrate/streaming_utils.py @@ -17,10 +17,9 @@ from braintrust_migrate.batching import ( approx_events_insert_payload_bytes, - iter_ordered_batches_by_count_and_bytes, + approx_json_bytes, ) from braintrust_migrate.btql import btql_quote -from braintrust_migrate.insert_bisect import insert_with_413_bisect @dataclass @@ -30,7 +29,6 @@ class EventsStreamState: version: str | None = None cursor: str | None = None btql_min_pagination_key: str | None = None - query_source: str | None = None fetched_events: int = 0 inserted_events: int = 0 inserted_bytes: int = 0 @@ -48,7 +46,6 @@ def from_path(cls, path: Path) -> EventsStreamState: version=data.get("version"), cursor=data.get("cursor"), btql_min_pagination_key=data.get("btql_min_pagination_key"), - query_source=data.get("query_source"), fetched_events=int(data.get("fetched_events", 0)), inserted_events=int(data.get("inserted_events", 0)), inserted_bytes=int(data.get("inserted_bytes", 0)), @@ -62,7 +59,6 @@ def to_dict(self) -> dict[str, Any]: "version": self.version, "cursor": self.cursor, "btql_min_pagination_key": self.btql_min_pagination_key, - "query_source": self.query_source, "fetched_events": self.fetched_events, "inserted_events": self.inserted_events, "inserted_bytes": self.inserted_bytes, @@ -171,54 +167,128 @@ def _extract_ids(events: list[dict[str, Any]]) -> list[str]: return ids -async def stream_btql_sorted_events( +async def stream_btql_sorted_events_buffered( *, fetch_page: Callable[[int], Awaitable[dict[str, Any]]], page_limit: int, - # State plumbing - get_last_pk: Callable[[], str | None], - set_last_pk: 
Callable[[str | None], None], + state: EventsStreamState, save_state: Callable[[], None], - # Event processing page_event_filter: Callable[[dict[str, Any]], bool] | None, event_to_insert: Callable[[dict[str, Any]], dict[str, Any]], - # Idempotency + batching seen_db: SeenIdsDB | None, - insert_batch_size: int, - insert_max_bytes: int | None, - # Optional attachment rewrite (returns count of rewritten refs or 0/1) rewrite_event_in_place: Callable[[dict[str, Any]], Awaitable[int]] | None, - # Insert behavior insert_events: Callable[[list[dict[str, Any]]], Awaitable[None]], + flush_max_rows: int, + flush_max_bytes: int, is_http_413: Callable[[Exception], bool], on_single_413: Callable[[dict[str, Any], Exception], Awaitable[None]] | None, - # Counters (mutated via closures so each migrator keeps its own state shape) - incr_fetched: Callable[[int], None], - incr_inserted: Callable[[int], None], - incr_inserted_bytes: Callable[[int], None], - incr_skipped_deleted: Callable[[int], None] | None, - incr_skipped_seen: Callable[[int], None] | None, - incr_attachments_copied: Callable[[int], None] | None, - pipeline: bool = False, - # Optional progress hooks hooks: StreamHooks | None = None, ) -> None: - """Shared BTQL-sorted streaming loop. - - This handles: - - Fetching pages (sorted by `_pagination_key`) - - Filtering deleted events (optional) - - Filtering seen IDs via SeenIdsDB (optional) - - Batching by count and bytes - - Optional attachment rewrite - - Insert with 413-bisect isolation - - Advancing pagination key only after successful inserts + """Stream BTQL-sorted rows, buffering multiple pages before flushing. + + This preserves restart safety by checkpointing only after a buffered flush + succeeds. Rows may therefore be fetched across several pages before any are + committed to the destination or to the seen-id database. 
""" - _ = pipeline + if flush_max_rows <= 0: + raise ValueError(f"flush_max_rows must be positive; got {flush_max_rows}") + if flush_max_bytes <= 0: + raise ValueError(f"flush_max_bytes must be positive; got {flush_max_bytes}") + + active_last_pk = state.btql_min_pagination_key + pending_events: list[dict[str, Any]] = [] + pending_seen_ids: set[str] = set() + pending_row_bytes = 0 + pending_fetched_events = 0 + pending_inserted_events = 0 + pending_inserted_bytes = 0 + pending_skipped_deleted = 0 + pending_skipped_seen = 0 + pending_attachments_copied = 0 + pending_last_pk: str | None = None + current_page_num: int | None = None + + async def _flush_pending() -> None: + nonlocal active_last_pk + nonlocal pending_events + nonlocal pending_seen_ids + nonlocal pending_row_bytes + nonlocal pending_fetched_events + nonlocal pending_inserted_events + nonlocal pending_inserted_bytes + nonlocal pending_skipped_deleted + nonlocal pending_skipped_seen + nonlocal pending_attachments_copied + nonlocal pending_last_pk + + if ( + pending_fetched_events == 0 + and pending_inserted_events == 0 + and pending_skipped_deleted == 0 + and pending_skipped_seen == 0 + and pending_attachments_copied == 0 + and pending_last_pk is None + ): + return + + if pending_events: + batch = list(pending_events) + started = time.perf_counter() + try: + await insert_events(batch) + except Exception as e: + if hooks and "on_batch_error" in hooks: + hooks["on_batch_error"]( + { + "page_num": current_page_num, + "batch": batch, + "error": e, + } + ) + if len(batch) == 1 and on_single_413 is not None and is_http_413(e): + await on_single_413(batch[0], e) + raise + + if seen_db is not None and pending_seen_ids: + seen_db.mark_seen(list(pending_seen_ids)) + state.inserted_events += pending_inserted_events + state.inserted_bytes += pending_inserted_bytes + if hooks and "on_insert" in hooks: + hooks["on_insert"]( + { + "inserted_last": pending_inserted_events, + "inserted_bytes_last": pending_inserted_bytes, 
+ "insert_seconds": max(0.0, time.perf_counter() - started), + "flush_rows": pending_inserted_events, + "flush_buffer_bytes": pending_row_bytes, + } + ) + + state.fetched_events += pending_fetched_events + state.skipped_deleted += pending_skipped_deleted + state.skipped_seen += pending_skipped_seen + state.attachments_copied += pending_attachments_copied + state.btql_min_pagination_key = pending_last_pk + save_state() + active_last_pk = pending_last_pk + + pending_events = [] + pending_seen_ids = set() + pending_row_bytes = 0 + pending_fetched_events = 0 + pending_inserted_events = 0 + pending_inserted_bytes = 0 + pending_skipped_deleted = 0 + pending_skipped_seen = 0 + pending_attachments_copied = 0 + pending_last_pk = None + page_num = 0 while True: page_num += 1 + state.btql_min_pagination_key = active_last_pk + current_page_num = page_num page = await fetch_page(page_limit) page_events: list[dict[str, Any]] = cast( list[dict[str, Any]], page.get("events") or [] @@ -233,115 +303,118 @@ async def stream_btql_sorted_events( "page_num": page_num, "page_events": len(page_events), "configured_fetch_limit": int(page_limit), + "fetched_total": state.fetched_events + pending_fetched_events, + "inserted_total": state.inserted_events + pending_inserted_events, + "inserted_bytes_total": state.inserted_bytes + + pending_inserted_bytes, + "skipped_deleted_total": state.skipped_deleted + + pending_skipped_deleted, + "skipped_seen_total": state.skipped_seen + pending_skipped_seen, + "attachments_copied_total": state.attachments_copied + + pending_attachments_copied, + "pending_buffered_rows": pending_inserted_events, + "pending_buffered_bytes": pending_row_bytes, } ) if not page_events: - # Persist end-of-stream state (so resume knows it's complete and counters are saved). 
+ await _flush_pending() save_state() if hooks and "on_done" in hooks: - hooks["on_done"]({"page_num": page_num}) + hooks["on_done"]( + { + "page_num": page_num, + "fetched_total": state.fetched_events, + "inserted_total": state.inserted_events, + "inserted_bytes_total": state.inserted_bytes, + "skipped_deleted_total": state.skipped_deleted, + "skipped_seen_total": state.skipped_seen, + "attachments_copied_total": state.attachments_copied, + "pending_buffered_rows": 0, + "pending_buffered_bytes": 0, + } + ) break - incr_fetched(len(page_events)) + pending_fetched_events += len(page_events) kept: list[dict[str, Any]] = [] if page_event_filter is None: kept = page_events else: skipped_deleted = 0 - for e in page_events: - if page_event_filter(e): + for event in page_events: + if page_event_filter(event): skipped_deleted += 1 continue - kept.append(e) - if skipped_deleted and incr_skipped_deleted is not None: - incr_skipped_deleted(skipped_deleted) + kept.append(event) + pending_skipped_deleted += skipped_deleted - insert_events_list = [event_to_insert(e) for e in kept] + insert_events_list = [event_to_insert(event) for event in kept] - # Filter seen ids up-front for the whole page (preserves order and reduces DB calls). 
if seen_db is not None: all_ids = _extract_ids(insert_events_list) if all_ids: unseen = set(seen_db.filter_unseen(all_ids)) - skipped_seen = len(all_ids) - len(unseen) - if skipped_seen and incr_skipped_seen is not None: - incr_skipped_seen(skipped_seen) + pending_skipped_seen += len(all_ids) - len(unseen) insert_events_list = [ - e for e in insert_events_list if e.get("id") in unseen + event for event in insert_events_list if event.get("id") in unseen ] - async def _insert_one(batch: list[dict[str, Any]]) -> float: - t0 = time.perf_counter() - await insert_events(batch) - return max(0.0, time.perf_counter() - t0) - - async def _on_success(batch: list[dict[str, Any]], dt: float) -> None: - if seen_db is not None: - seen_db.mark_seen(_extract_ids(batch)) - incr_inserted(len(batch)) - inserted_bytes_last = approx_events_insert_payload_bytes(batch) - incr_inserted_bytes(inserted_bytes_last) - if hooks and "on_insert" in hooks: - hooks["on_insert"]( - { - "inserted_last": len(batch), - "inserted_bytes_last": inserted_bytes_last, - "insert_seconds": dt, - } - ) + if pending_seen_ids: + deduped_events: list[dict[str, Any]] = [] + pending_duplicates = 0 + for event in insert_events_list: + event_id = event.get("id") + if isinstance(event_id, str) and event_id in pending_seen_ids: + pending_duplicates += 1 + continue + deduped_events.append(event) + pending_skipped_seen += pending_duplicates + insert_events_list = deduped_events + + if rewrite_event_in_place is not None: + copied = 0 + for event in insert_events_list: + copied += int(await rewrite_event_in_place(event)) + pending_attachments_copied += copied + + if insert_events_list: + pending_events.extend(insert_events_list) + pending_seen_ids.update(_extract_ids(insert_events_list)) + pending_inserted_events += len(insert_events_list) + pending_inserted_bytes += approx_events_insert_payload_bytes( + insert_events_list + ) + pending_row_bytes += sum( + approx_json_bytes(event) for event in insert_events_list + ) - async 
def _on_single_413(event: dict[str, Any], err: Exception) -> None: - if on_single_413 is not None: - await on_single_413(event, err) + pending_last_pk = page_last_pk + if isinstance(page_last_pk, str) and page_last_pk: + active_last_pk = page_last_pk - for batch in iter_ordered_batches_by_count_and_bytes( - insert_events_list, - max_items=int(insert_batch_size), - max_bytes=insert_max_bytes, + if ( + pending_inserted_events >= flush_max_rows + or pending_row_bytes >= flush_max_bytes ): - if not batch: - continue - - if ( - rewrite_event_in_place is not None - and incr_attachments_copied is not None - ): - copied = 0 - for e in batch: - copied += int(await rewrite_event_in_place(e)) - if copied: - incr_attachments_copied(copied) - - try: - await insert_with_413_bisect( - batch, - insert_fn=_insert_one, - is_http_413=is_http_413, - on_success=_on_success, - on_single_413=_on_single_413, - ) - except Exception as e: - if hooks and "on_batch_error" in hooks: - hooks["on_batch_error"]( - { - "page_num": page_num, - "batch": batch, - "error": e, - } - ) - raise - - # Commit pagination progress only after inserts for this page succeed. 
- if isinstance(page_last_pk, str) and page_last_pk: - set_last_pk(page_last_pk) - save_state() + await _flush_pending() if hooks and "on_page" in hooks: hooks["on_page"]( { "page_num": page_num, "page_events": len(page_events), + "fetched_total": state.fetched_events + pending_fetched_events, + "inserted_total": state.inserted_events + pending_inserted_events, + "inserted_bytes_total": state.inserted_bytes + + pending_inserted_bytes, + "skipped_deleted_total": state.skipped_deleted + + pending_skipped_deleted, + "skipped_seen_total": state.skipped_seen + pending_skipped_seen, + "attachments_copied_total": state.attachments_copied + + pending_attachments_copied, + "pending_buffered_rows": pending_inserted_events, + "pending_buffered_bytes": pending_row_bytes, } ) diff --git a/pyproject.toml b/pyproject.toml index 8454ff2..8252ab2 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -23,6 +23,7 @@ classifiers = [ ] dependencies = [ + "braintrust~=0.12.0", "typer[all]>=0.9.0", "pydantic>=2.5.0", "structlog>=23.2.0", diff --git a/tests/integration/test_migration_flow.py b/tests/integration/test_migration_flow.py index 170b88f..43953c9 100644 --- a/tests/integration/test_migration_flow.py +++ b/tests/integration/test_migration_flow.py @@ -82,7 +82,13 @@ async def mock_create_client_pair(source_config, dest_config, migration_config): ) mock_client.create_project = AsyncMock(return_value=mock_project_data) - async def mock_with_retry(_op_name, coro_func): + async def mock_with_retry( + _op_name, + coro_func, + *, + non_retryable_statuses=None, + ): + _ = non_retryable_statuses res = coro_func() if hasattr(res, "__await__"): return await res diff --git a/tests/unit/test_btql_helper.py b/tests/unit/test_btql_helper.py index 8afde2e..024cada 100644 --- a/tests/unit/test_btql_helper.py +++ b/tests/unit/test_btql_helper.py @@ -12,12 +12,39 @@ class _StubBtqlClient: def __init__(self) -> None: self.calls: list[dict[str, Any]] = [] self.mode: str = "ok" + self.with_retry_calls: 
list[dict[str, Any]] = [] - async def with_retry(self, _operation_name: str, coro_func): - res = coro_func() - if hasattr(res, "__await__"): - return await res - return res + async def with_retry( + self, + _operation_name: str, + coro_func, + *, + non_retryable_statuses: set[int] | None = None, + ): + self.with_retry_calls.append( + { + "non_retryable_statuses": set(non_retryable_statuses or set()), + } + ) + while True: + try: + res = coro_func() + if hasattr(res, "__await__"): + return await res + return res + except httpx.HTTPStatusError as exc: + status = ( + int(exc.response.status_code) if exc.response is not None else None + ) + if ( + status is not None + and non_retryable_statuses is not None + and status in non_retryable_statuses + ): + raise + if status == 429: + continue + raise async def raw_request( self, @@ -102,10 +129,10 @@ def q(limit: int) -> str: ) assert out["btql_last_pagination_key"] == "p1" - # Should try LIMIT 1000 first, then smaller. queries = [call["query"] for call in c.calls] - assert any("LIMIT 1000" in qq for qq in queries) - assert any("LIMIT 500" in qq for qq in queries) + assert sum("LIMIT 1000" in qq for qq in queries) == 1 + assert sum("LIMIT 500" in qq for qq in queries) == 1 + assert c.with_retry_calls[0]["non_retryable_statuses"] == {500, 504} @pytest.mark.asyncio @@ -126,5 +153,110 @@ def q(limit: int) -> str: assert out["btql_last_pagination_key"] == "p1" queries = [call["query"] for call in c.calls] - assert any("LIMIT 1000" in qq for qq in queries) - assert any("LIMIT 500" in qq for qq in queries) + assert sum("LIMIT 1000" in qq for qq in queries) == 1 + assert sum("LIMIT 500" in qq for qq in queries) == 1 + + +@pytest.mark.asyncio +async def test_btql_helper_still_retries_429_at_same_limit() -> None: + class _RateLimitOnceClient(_StubBtqlClient): + def __init__(self) -> None: + super().__init__() + self.rate_limited = False + + async def raw_request( + self, + method: str, + path: str, + *, + json: Any | None = None, + 
timeout: float | None = None, + **kwargs: Any, + ) -> Any: + _ = kwargs + assert method.upper() == "POST" + assert path == "/btql" + assert json is not None + assert timeout is not None + self.calls.append(json) + if not self.rate_limited: + self.rate_limited = True + req = httpx.Request("POST", "https://api.braintrust.dev/btql") + resp = httpx.Response(429, request=req, text="rate limit") + raise httpx.HTTPStatusError("rate limit", request=req, response=resp) + return {"data": [{"_pagination_key": "p1"}]} + + c = _RateLimitOnceClient() + + def q(limit: int) -> str: + return f"SELECT * FROM project_logs('p', shape => 'spans') ORDER BY _pagination_key ASC LIMIT {limit}" + + out = await fetch_btql_sorted_page_with_retries( + client=c, # type: ignore[arg-type] + query_for_limit=q, + configured_limit=1000, + operation="btql_test", + log_fields={"x": "y"}, + ) + + assert out["btql_last_pagination_key"] == "p1" + queries = [call["query"] for call in c.calls] + assert sum("LIMIT 1000" in qq for qq in queries) == 2 + + +@pytest.mark.asyncio +async def test_btql_helper_resets_to_configured_limit_on_next_fetch() -> None: + class _Fail1000Then500Client(_StubBtqlClient): + async def raw_request( + self, + method: str, + path: str, + *, + json: Any | None = None, + timeout: float | None = None, + **kwargs: Any, + ) -> Any: + _ = kwargs + assert method.upper() == "POST" + assert path == "/btql" + assert json is not None + assert timeout is not None + self.calls.append(json) + + q = json.get("query") + assert isinstance(q, str) + if "LIMIT 1000" in q: + req = httpx.Request("POST", "https://api.braintrust.dev/btql") + resp = httpx.Response(500, request=req, text="internal error") + raise httpx.HTTPStatusError("internal", request=req, response=resp) + return {"data": [{"_pagination_key": "p1"}]} + + c = _Fail1000Then500Client() + + def q(limit: int) -> str: + return f"SELECT * FROM project_logs('p', shape => 'spans') ORDER BY _pagination_key ASC LIMIT {limit}" + + out1 = await 
fetch_btql_sorted_page_with_retries( + client=c, # type: ignore[arg-type] + query_for_limit=q, + configured_limit=1000, + operation="btql_test", + log_fields={"x": "y"}, + ) + out2 = await fetch_btql_sorted_page_with_retries( + client=c, # type: ignore[arg-type] + query_for_limit=q, + configured_limit=1000, + operation="btql_test", + log_fields={"x": "y"}, + ) + + assert out1["btql_last_pagination_key"] == "p1" + assert out2["btql_last_pagination_key"] == "p1" + queries = [call["query"] for call in c.calls] + assert [("LIMIT 1000" in qq, "LIMIT 500" in qq) for qq in queries] == [ + (True, False), + (False, True), + (True, False), + (False, True), + ] diff --git a/tests/unit/test_byte_batching.py b/tests/unit/test_byte_batching.py index 62300b7..50cacf7 100644 --- a/tests/unit/test_byte_batching.py +++ b/tests/unit/test_byte_batching.py @@ -33,3 +33,12 @@ def test_byte_batcher_splits_by_bytes_and_preserves_order() -> None: assert len(batches) >= 2 for b in batches: assert approx_json_bytes({"events": b}) <= max_bytes + + +def test_approx_json_bytes_counts_utf8_bytes() -> None: + payload = {"events": [{"id": "a", "input": "🙂漢字"}]} + + serialized = '{"events": [{"id": "a", "input": "🙂漢字"}]}' + + assert approx_json_bytes(payload) == len(serialized.encode("utf-8")) + assert approx_json_bytes(payload) > len(serialized) diff --git a/tests/unit/test_concurrent_event_streaming.py b/tests/unit/test_concurrent_event_streaming.py index 2b40cdb..7b65ed2 100644 --- a/tests/unit/test_concurrent_event_streaming.py +++ b/tests/unit/test_concurrent_event_streaming.py @@ -12,6 +12,8 @@ from braintrust_migrate.resources.datasets import DatasetMigrator from braintrust_migrate.resources.experiments import ExperimentMigrator from braintrust_migrate.resources.base import MigrationResult +import braintrust_migrate.resources.datasets as datasets_module +import braintrust_migrate.resources.experiments as experiments_module # 
--------------------------------------------------------------------------- @@ -30,6 +32,7 @@ def __init__( self._pages = btql_pages_per_resource or {} self._page_idx: dict[str, int] = {} self.inserts: list[dict[str, Any]] = [] + self.btql_queries: list[str] = [] # Concurrency tracking self._in_flight = 0 @@ -42,8 +45,16 @@ def __init__( self.migration_config.copy_attachments = False self.migration_config.insert_max_request_bytes = 6 * 1024 * 1024 self.migration_config.insert_request_headroom_ratio = 0.75 + self.migration_config.events_fetch_group_size = 25 - async def with_retry(self, _name: str, coro_func): + async def with_retry( + self, + _name: str, + coro_func, + *, + non_retryable_statuses: set[int] | None = None, + ): + _ = non_retryable_statuses res = coro_func() if hasattr(res, "__await__"): return await res @@ -62,6 +73,8 @@ async def raw_request( _ = timeout if method.upper() == "POST" and path == "/btql": + query = json.get("query", "") if json else "" + self.btql_queries.append(query) async with self._lock: self._in_flight += 1 self.max_in_flight = max(self.max_in_flight, self._in_flight) @@ -69,27 +82,26 @@ async def raw_request( await asyncio.sleep(0.01) # simulate latency # Determine which resource this BTQL query is for by parsing the query. 
- query = json.get("query", "") if json else "" resource_id = None - for rid in self._pages: - if rid in query: - resource_id = rid - break + matching_ids = [rid for rid in self._pages if rid in query] async with self._lock: self._in_flight -= 1 - if resource_id and resource_id in self._pages: - idx = self._page_idx.get(resource_id, 0) - pages = self._pages[resource_id] - if idx < len(pages): - self._page_idx[resource_id] = idx + 1 - events = pages[idx] - last_pk = events[-1]["_pagination_key"] if events else None - return { - "data": events, - "btql_last_pagination_key": last_pk, - } + if matching_ids: + combined_events: list[dict[str, Any]] = [] + for resource_id in matching_ids: + idx = self._page_idx.get(resource_id, 0) + pages = self._pages[resource_id] + if idx < len(pages): + self._page_idx[resource_id] = idx + 1 + combined_events.extend(pages[idx]) + combined_events.sort(key=lambda event: event["_pagination_key"]) + last_pk = combined_events[-1]["_pagination_key"] if combined_events else None + return { + "data": combined_events, + "btql_last_pagination_key": last_pk, + } return {"data": []} if method.upper() == "POST" and "/insert" in path: @@ -116,39 +128,58 @@ async def raw_request( @pytest.mark.asyncio class TestConcurrentDatasetStreaming: - async def test_multiple_datasets_streamed_concurrently(self, tmp_path: Path): - """Event streams for multiple datasets should overlap.""" + async def test_multiple_datasets_streamed_in_grouped_btql_fetch( + self, tmp_path: Path + ): + """Dataset events should be fetched via grouped BTQL queries.""" # Set up 3 datasets, each with one page of events. 
pages = { - "ds1": [[{"id": "e1", "_pagination_key": "pk1", "_xact_id": "1"}]], - "ds2": [[{"id": "e2", "_pagination_key": "pk2", "_xact_id": "2"}]], - "ds3": [[{"id": "e3", "_pagination_key": "pk3", "_xact_id": "3"}]], + "ds1": [[{"id": "e1", "dataset_id": "ds1", "_pagination_key": "pk1", "_xact_id": "1"}]], + "ds2": [[{"id": "e2", "dataset_id": "ds2", "_pagination_key": "pk2", "_xact_id": "2"}]], + "ds3": [[{"id": "e3", "dataset_id": "ds3", "_pagination_key": "pk3", "_xact_id": "3"}]], } source = _ConcurrencyTrackingClient(btql_pages_per_resource=pages) dest = _ConcurrencyTrackingClient() + original_writer = datasets_module.SDKDatasetWriter + + class _FakeSDKDatasetWriter: + def __init__(self, dest_client: _ConcurrencyTrackingClient, dataset_id: str) -> None: + self._dest_client = dest_client + self._dataset_id = dataset_id + + async def write_rows(self, rows: list[dict[str, Any]]) -> None: + self._dest_client.inserts.append( + { + "dataset_id": self._dataset_id, + "events": [dict(row) for row in rows], + } + ) + + datasets_module.SDKDatasetWriter = _FakeSDKDatasetWriter - migrator = DatasetMigrator( - source, # type: ignore[arg-type] - dest, # type: ignore[arg-type] - tmp_path, - events_fetch_limit=100, - events_insert_batch_size=100, - events_use_seen_db=False, - ) + try: + migrator = DatasetMigrator( + source, # type: ignore[arg-type] + dest, # type: ignore[arg-type] + tmp_path, + events_fetch_limit=100, + events_use_seen_db=False, + ) - # Simulate successful dataset creations. 
- results = [ - MigrationResult(success=True, source_id="ds1", dest_id="dest-ds1"), - MigrationResult(success=True, source_id="ds2", dest_id="dest-ds2"), - MigrationResult(success=True, source_id="ds3", dest_id="dest-ds3"), - ] + results = [ + MigrationResult(success=True, source_id="ds1", dest_id="dest-ds1"), + MigrationResult(success=True, source_id="ds2", dest_id="dest-ds2"), + MigrationResult(success=True, source_id="ds3", dest_id="dest-ds3"), + ] - await migrator._migrate_records_for_datasets(results) + await migrator._migrate_records_for_datasets(results) - # Events should have been inserted for all 3 datasets. - assert len(dest.inserts) >= 3 - # Concurrency should have been observed (source fetch overlap). - assert source.max_in_flight >= 1 # At least serial works + assert len(dest.inserts) == 3 + assert source.max_in_flight >= 1 + assert len(source.btql_queries) == 2 + assert "dataset('ds1', 'ds2', 'ds3') spans" in source.btql_queries[0] + finally: + datasets_module.SDKDatasetWriter = original_writer # --------------------------------------------------------------------------- @@ -159,33 +190,53 @@ async def test_multiple_datasets_streamed_concurrently(self, tmp_path: Path): @pytest.mark.asyncio class TestConcurrentExperimentStreaming: - async def test_multiple_experiments_streamed_concurrently(self, tmp_path: Path): - """Event streams for multiple experiments should overlap.""" + async def test_multiple_experiments_streamed_in_grouped_btql_fetch( + self, tmp_path: Path + ): + """Experiment events should be fetched via grouped BTQL queries.""" pages = { - "exp1": [[{"id": "e1", "_pagination_key": "pk1", "_xact_id": "1"}]], - "exp2": [[{"id": "e2", "_pagination_key": "pk2", "_xact_id": "2"}]], - "exp3": [[{"id": "e3", "_pagination_key": "pk3", "_xact_id": "3"}]], + "exp1": [[{"id": "e1", "experiment_id": "exp1", "_pagination_key": "pk1", "_xact_id": "1"}]], + "exp2": [[{"id": "e2", "experiment_id": "exp2", "_pagination_key": "pk2", "_xact_id": "2"}]], + 
"exp3": [[{"id": "e3", "experiment_id": "exp3", "_pagination_key": "pk3", "_xact_id": "3"}]], } source = _ConcurrencyTrackingClient(btql_pages_per_resource=pages) dest = _ConcurrencyTrackingClient() - - migrator = ExperimentMigrator( - source, # type: ignore[arg-type] - dest, # type: ignore[arg-type] - tmp_path, - events_fetch_limit=100, - events_insert_batch_size=100, - events_use_seen_db=False, - ) - - results = [ - MigrationResult(success=True, source_id="exp1", dest_id="dest-exp1"), - MigrationResult(success=True, source_id="exp2", dest_id="dest-exp2"), - MigrationResult(success=True, source_id="exp3", dest_id="dest-exp3"), - ] - - await migrator._migrate_events_for_experiments(results) - - # Events should have been inserted. - assert len(dest.inserts) >= 3 - assert source.max_in_flight >= 1 + original_writer = experiments_module.SDKExperimentWriter + + class _FakeSDKExperimentWriter: + def __init__(self, dest_client: _ConcurrencyTrackingClient, experiment_id: str) -> None: + self._dest_client = dest_client + self._experiment_id = experiment_id + + async def write_rows(self, rows: list[dict[str, Any]]) -> None: + self._dest_client.inserts.append( + { + "experiment_id": self._experiment_id, + "events": [dict(row) for row in rows], + } + ) + + experiments_module.SDKExperimentWriter = _FakeSDKExperimentWriter + + try: + migrator = ExperimentMigrator( + source, # type: ignore[arg-type] + dest, # type: ignore[arg-type] + tmp_path, + events_fetch_limit=100, + events_use_seen_db=False, + ) + + results = [ + MigrationResult(success=True, source_id="exp1", dest_id="dest-exp1"), + MigrationResult(success=True, source_id="exp2", dest_id="dest-exp2"), + MigrationResult(success=True, source_id="exp3", dest_id="dest-exp3"), + ] + + await migrator._migrate_events_for_experiments(results) + + assert len(dest.inserts) == 3 + assert len(source.btql_queries) == 2 + assert "experiment('exp1', 'exp2', 'exp3') spans" in source.btql_queries[0] + finally: + 
experiments_module.SDKExperimentWriter = original_writer diff --git a/tests/unit/test_concurrent_migrate_batch.py b/tests/unit/test_concurrent_migrate_batch.py index b1f7570..263c301 100644 --- a/tests/unit/test_concurrent_migrate_batch.py +++ b/tests/unit/test_concurrent_migrate_batch.py @@ -170,3 +170,23 @@ async def mock_list(project_id=None): assert results["migrated"] == 6 assert m._max_in_flight <= 3 assert m._max_in_flight > 1 # Should have had some concurrency + + async def test_migrate_all_includes_skip_summary(self, tmp_path: Path): + """migrate_all should return a human-readable skip summary.""" + m = _TestMigrator(tmp_path) + m.state.id_mapping["existing"] = "dest-existing" + + async def mock_list(project_id=None): + _ = project_id + return [ + {"id": "existing", "name": "Already"}, + {"id": "new", "name": "New"}, + ] + + m.list_source_resources = mock_list # type: ignore[assignment] + + results = await m.migrate_all(project_id="proj", max_concurrent=2) + + assert results["skipped"] == 1 + assert results["skip_breakdown"] == {"already_migrated": 1} + assert results["skip_summary"] == "1 already migrated" diff --git a/tests/unit/test_config.py b/tests/unit/test_config.py index e40ddc5..f077845 100644 --- a/tests/unit/test_config.py +++ b/tests/unit/test_config.py @@ -132,6 +132,7 @@ def test_valid_env_config(self, monkeypatch): monkeypatch.setenv("MIGRATION_ACL_AUTO_INVITE_USERS", "true") monkeypatch.setenv("MIGRATION_GROUP_MAP_USERS", "true") monkeypatch.setenv("MIGRATION_GROUP_AUTO_INVITE_USERS", "true") + monkeypatch.setenv("MIGRATION_EVENTS_FETCH_GROUP_SIZE", "17") monkeypatch.setenv("LOG_LEVEL", "DEBUG") config = Config.from_env() @@ -145,4 +146,5 @@ def test_valid_env_config(self, monkeypatch): assert config.migration.acl_auto_invite_users is True assert config.migration.group_map_users is True assert config.migration.group_auto_invite_users is True + assert config.migration.events_fetch_group_size == 17 assert config.logging.level == "DEBUG" diff 
--git a/tests/unit/test_dataset_insert_bisect_413.py b/tests/unit/test_dataset_insert_bisect_413.py deleted file mode 100644 index 47462ed..0000000 --- a/tests/unit/test_dataset_insert_bisect_413.py +++ /dev/null @@ -1,109 +0,0 @@ -from __future__ import annotations - -from pathlib import Path -from typing import Any - -import httpx -import pytest - -from braintrust_migrate.resources.datasets import DatasetMigrator - - -class _SourceClient: - def __init__(self, btql_pages: list[list[dict[str, Any]]]) -> None: - self.btql_pages = btql_pages - - async def with_retry(self, _operation_name: str, coro_func): - res = coro_func() - if hasattr(res, "__await__"): - return await res - return res - - async def raw_request( - self, - method: str, - path: str, - *, - params: dict[str, Any] | None = None, - json: Any | None = None, - timeout: float | None = None, - ) -> Any: - _ = params - _ = timeout - assert method.lower() == "post" - if path == "/btql": - assert json is not None and isinstance(json.get("query"), str) - if not self.btql_pages: - return {"data": []} - return {"data": self.btql_pages.pop(0)} - raise AssertionError(f"Unexpected path: {path}") - - -class _Dest413Client: - def __init__(self, *, max_events_per_insert: int) -> None: - self.max_events_per_insert = max_events_per_insert - self.successful_inserts: list[int] = [] - - async def with_retry(self, _operation_name: str, coro_func): - res = coro_func() - if hasattr(res, "__await__"): - return await res - return res - - async def raw_request( - self, - method: str, - path: str, - *, - params: dict[str, Any] | None = None, - json: Any | None = None, - timeout: float | None = None, - ) -> Any: - _ = params - _ = timeout - assert method.lower() == "post" - if path.endswith("/insert"): - assert json is not None - events = json.get("events", []) - if len(events) > self.max_events_per_insert: - req = httpx.Request( - "POST", "https://api.braintrust.dev/v1/dataset/x/insert" - ) - resp = httpx.Response(413, 
request=req) - raise httpx.HTTPStatusError("413", request=req, response=resp) - self.successful_inserts.append(len(events)) - return {"row_ids": [e.get("id", "") for e in events]} - raise AssertionError(f"Unexpected path: {path}") - - -@pytest.mark.asyncio -async def test_dataset_insert_bisects_on_413(tmp_path: Path) -> None: - TOTAL_EVENTS = 25 - MAX_PER_INSERT = 10 - ds_id = "ds-source" - - page1 = [ - { - "id": f"e{i}", - "_pagination_key": f"p{i:05d}", - "_xact_id": str(100 - i), - "created": "2023-01-01T00:00:00Z", - } - for i in range(TOTAL_EVENTS) - ] - source = _SourceClient([page1]) - dest = _Dest413Client(max_events_per_insert=MAX_PER_INSERT) - - migrator = DatasetMigrator( - source, # type: ignore[arg-type] - dest, # type: ignore[arg-type] - tmp_path, - events_fetch_limit=50, - events_insert_batch_size=50, - events_use_version_snapshot=False, - events_use_seen_db=False, - ) - - await migrator._migrate_dataset_records_streaming(ds_id, "ds-dest") - assert sum(dest.successful_inserts) == TOTAL_EVENTS - assert all(n <= MAX_PER_INSERT for n in dest.successful_inserts) diff --git a/tests/unit/test_dataset_streaming_migrator.py b/tests/unit/test_dataset_streaming_migrator.py index f1f3e01..329863a 100644 --- a/tests/unit/test_dataset_streaming_migrator.py +++ b/tests/unit/test_dataset_streaming_migrator.py @@ -5,6 +5,7 @@ import pytest +import braintrust_migrate.resources.datasets as datasets_module from braintrust_migrate.resources.datasets import DatasetMigrator @@ -13,7 +14,14 @@ def __init__(self, *, btql_pages: list[list[dict[str, Any]]] | None = None) -> N self._btql_pages = btql_pages or [] self.inserts: list[dict[str, Any]] = [] - async def with_retry(self, _operation_name: str, coro_func): + async def with_retry( + self, + _operation_name: str, + coro_func, + *, + non_retryable_statuses: set[int] | None = None, + ): + _ = non_retryable_statuses res = coro_func() if hasattr(res, "__await__"): return await res @@ -38,15 +46,75 @@ async def raw_request( 
return {"data": self._btql_pages.pop(0)} return {"data": []} - if path.endswith("/insert"): - assert json is not None - self.inserts.append(json) - events = json.get("events", []) - return {"row_ids": [e.get("id", "") for e in events]} - raise AssertionError(f"Unexpected path: {path}") +class _FakeSDKDatasetWriter: + def __init__(self, dest_client: _StubClient, dataset_id: str) -> None: + self._dest_client = dest_client + self._dataset_id = dataset_id + + async def write_rows(self, rows: list[dict[str, Any]]) -> None: + self._dest_client.inserts.append( + { + "dataset_id": self._dataset_id, + "events": [dict(row) for row in rows], + } + ) + + +class _PaginationAwareStubClient(_StubClient): + def __init__(self) -> None: + super().__init__(btql_pages=None) + self.btql_queries: list[str] = [] + + async def raw_request( + self, + method: str, + path: str, + *, + params: dict[str, Any] | None = None, + json: Any | None = None, + timeout: float | None = None, + ) -> Any: + _ = params + _ = timeout + assert method.lower() == "post" + + if path != "/btql": + raise AssertionError(f"Unexpected path: {path}") + + assert json is not None and isinstance(json.get("query"), str) + query = json["query"] + self.btql_queries.append(query) + + if "_pagination_key > 'p2'" in query: + return {"data": []} + if "_pagination_key > 'p1'" in query: + return { + "data": [ + { + "id": "b", + "dataset_id": "source-dataset-id", + "_pagination_key": "p2", + "_xact_id": "2", + "created": "2023-01-01T00:00:01Z", + } + ] + } + return { + "data": [ + { + "id": "a", + "dataset_id": "source-dataset-id", + "_pagination_key": "p1", + "_xact_id": "1", + "created": "2023-01-01T00:00:00Z", + } + ] + } + + @pytest.mark.asyncio async def test_dataset_streaming_skips_deleted_and_duplicates(tmp_path: Path) -> None: # Page 1 includes one deleted event "d" @@ -72,23 +140,60 @@ async def test_dataset_streaming_skips_deleted_and_duplicates(tmp_path: Path) -> source = _StubClient(btql_pages=[page1, page2]) dest = 
_StubClient() + original_writer = datasets_module.SDKDatasetWriter + datasets_module.SDKDatasetWriter = _FakeSDKDatasetWriter + + try: + migrator = DatasetMigrator( + source, # type: ignore[arg-type] + dest, # type: ignore[arg-type] + tmp_path, + events_fetch_limit=1, + events_use_seen_db=True, + ) + + await migrator._migrate_dataset_records("source-dataset-id", "dest-dataset-id") # type: ignore[attr-defined] + + inserted_ids: list[str] = [] + for call in dest.inserts: + inserted_ids.extend([e["id"] for e in call["events"]]) + assert inserted_ids == ["a"] + assert len(dest.inserts) == 1 + assert all(call["dataset_id"] == "dest-dataset-id" for call in dest.inserts) + assert (tmp_path / "dataset_events" / "source-dataset-id_state.json").exists() + finally: + datasets_module.SDKDatasetWriter = original_writer - migrator = DatasetMigrator( - source, # type: ignore[arg-type] - dest, # type: ignore[arg-type] - tmp_path, - events_fetch_limit=1, - events_insert_batch_size=2, - events_use_version_snapshot=False, - events_use_seen_db=True, - ) - - await migrator._migrate_dataset_records("source-dataset-id", "dest-dataset-id") # type: ignore[attr-defined] - - inserted_ids: list[str] = [] - for call in dest.inserts: - inserted_ids.extend([e["id"] for e in call["events"]]) - assert inserted_ids == ["a"] - - # Checkpoint exists - assert (tmp_path / "dataset_events" / "source-dataset-id_state.json").exists() + +@pytest.mark.asyncio +async def test_dataset_streaming_advances_btql_pagination_before_flush( + tmp_path: Path, +) -> None: + source = _PaginationAwareStubClient() + dest = _StubClient() + original_writer = datasets_module.SDKDatasetWriter + datasets_module.SDKDatasetWriter = _FakeSDKDatasetWriter + + try: + migrator = DatasetMigrator( + source, # type: ignore[arg-type] + dest, # type: ignore[arg-type] + tmp_path, + events_fetch_limit=100000, + events_use_seen_db=False, + ) + migrator._sdk_flush_max_rows = 100 + + await migrator._migrate_dataset_records( # type: 
ignore[attr-defined] + "source-dataset-id", "dest-dataset-id" + ) + + inserted_ids: list[str] = [] + for call in dest.inserts: + inserted_ids.extend([e["id"] for e in call["events"]]) + + assert inserted_ids == ["a", "b"] + assert any("_pagination_key > 'p1'" in query for query in source.btql_queries) + assert len(source.btql_queries) == 3 + finally: + datasets_module.SDKDatasetWriter = original_writer diff --git a/tests/unit/test_dataset_streaming_resume.py b/tests/unit/test_dataset_streaming_resume.py index 8fd4d6b..6ef9a7f 100644 --- a/tests/unit/test_dataset_streaming_resume.py +++ b/tests/unit/test_dataset_streaming_resume.py @@ -5,6 +5,7 @@ import pytest +import braintrust_migrate.resources.datasets as datasets_module from braintrust_migrate.resources.datasets import DatasetMigrator @@ -21,7 +22,14 @@ def __init__( self.fail_on_insert_call: int | None = None self._insert_calls = 0 - async def with_retry(self, _operation_name: str, coro_func): + async def with_retry( + self, + _operation_name: str, + coro_func, + *, + non_retryable_statuses: set[int] | None = None, + ): + _ = non_retryable_statuses res = coro_func() if hasattr(res, "__await__"): return await res @@ -51,18 +59,21 @@ async def raw_request( return {"data": []} return {"data": self._page1} - if path.endswith("/insert"): - self._insert_calls += 1 - if self.fail_on_insert_call == self._insert_calls: - raise RuntimeError("simulated insert failure") - assert json is not None - events = json.get("events", []) - self.inserts.append(events) - return {"row_ids": [e.get("id", "") for e in events]} - raise AssertionError(f"Unexpected path: {path}") +class _FakeSDKDatasetWriter: + def __init__(self, dest_client: _StubClient, dataset_id: str) -> None: + self._dest_client = dest_client + self._dataset_id = dataset_id + + async def write_rows(self, rows: list[dict[str, Any]]) -> None: + self._dest_client._insert_calls += 1 + if self._dest_client.fail_on_insert_call == self._dest_client._insert_calls: + raise 
RuntimeError("simulated insert failure") + self._dest_client.inserts.append([dict(row) for row in rows]) + + @pytest.mark.asyncio async def test_dataset_streaming_resume_after_insert_failure(tmp_path: Path) -> None: page1 = [ @@ -85,28 +96,34 @@ async def test_dataset_streaming_resume_after_insert_failure(tmp_path: Path) -> source = _StubClient(page1=page1, page2=page2) dest = _StubClient() dest.fail_on_insert_call = 2 + original_writer = datasets_module.SDKDatasetWriter + original_flush_max_rows = datasets_module.DatasetMigrator.SDK_FLUSH_MAX_ROWS + datasets_module.SDKDatasetWriter = _FakeSDKDatasetWriter + datasets_module.DatasetMigrator.SDK_FLUSH_MAX_ROWS = 1 + + try: + migrator = DatasetMigrator( + source, # type: ignore[arg-type] + dest, # type: ignore[arg-type] + tmp_path, + events_fetch_limit=1, + events_use_seen_db=True, + ) + + with pytest.raises(RuntimeError): + await migrator._migrate_dataset_records( # type: ignore[attr-defined] + "source-dataset-id", "dest-dataset-id" + ) + + inserted_first = [e["id"] for batch in dest.inserts for e in batch] + assert inserted_first == ["a"] - migrator = DatasetMigrator( - source, # type: ignore[arg-type] - dest, # type: ignore[arg-type] - tmp_path, - events_fetch_limit=1, - events_insert_batch_size=10, - events_use_version_snapshot=False, - events_use_seen_db=True, - ) - - with pytest.raises(RuntimeError): + dest.fail_on_insert_call = None await migrator._migrate_dataset_records( # type: ignore[attr-defined] "source-dataset-id", "dest-dataset-id" ) - - inserted_first = [e["id"] for batch in dest.inserts for e in batch] - assert inserted_first == ["a"] - - dest.fail_on_insert_call = None - await migrator._migrate_dataset_records( # type: ignore[attr-defined] - "source-dataset-id", "dest-dataset-id" - ) - inserted_all = [e["id"] for batch in dest.inserts for e in batch] - assert inserted_all == ["a", "b"] + inserted_all = [e["id"] for batch in dest.inserts for e in batch] + assert inserted_all == ["a", "b"] + finally: + 
datasets_module.SDKDatasetWriter = original_writer + datasets_module.DatasetMigrator.SDK_FLUSH_MAX_ROWS = original_flush_max_rows diff --git a/tests/unit/test_experiment_dependencies.py b/tests/unit/test_experiment_dependencies.py index b921e29..d28e2dd 100644 --- a/tests/unit/test_experiment_dependencies.py +++ b/tests/unit/test_experiment_dependencies.py @@ -7,8 +7,14 @@ from braintrust_migrate.resources.experiments import ExperimentMigrator -async def _passthrough_with_retry(_operation_name: str, coro_func): +async def _passthrough_with_retry( + _operation_name: str, + coro_func, + *, + non_retryable_statuses: set[int] | None = None, +): """Execute the callable passed to with_retry, like the real BraintrustClient does.""" + _ = non_retryable_statuses result = coro_func() if hasattr(result, "__await__"): return await result @@ -197,7 +203,10 @@ async def test_migrate_resource_with_resolved_base_exp( """Test migration with resolved base experiment dependency.""" # Mock successful experiment creation via raw_request - async def mock_with_retry(operation_name, coro_func): + async def mock_with_retry( + operation_name, coro_func, *, non_retryable_statuses=None + ): + _ = operation_name, non_retryable_statuses result = coro_func() if hasattr(result, "__await__"): return await result @@ -234,7 +243,10 @@ async def test_migrate_resource_with_resolved_dataset( """Test migration with resolved dataset dependency.""" # Mock successful experiment creation - async def mock_with_retry(operation_name, coro_func): + async def mock_with_retry( + operation_name, coro_func, *, non_retryable_statuses=None + ): + _ = operation_name, non_retryable_statuses result = coro_func() if hasattr(result, "__await__"): return await result @@ -271,7 +283,10 @@ async def test_migrate_resource_with_unresolved_dependencies( """Test migration with unresolved dependencies (should log warnings but continue).""" # Mock successful experiment creation - async def mock_with_retry(operation_name, 
coro_func): + async def mock_with_retry( + operation_name, coro_func, *, non_retryable_statuses=None + ): + _ = operation_name, non_retryable_statuses result = coro_func() if hasattr(result, "__await__"): return await result @@ -307,7 +322,10 @@ async def test_migrate_resource_without_dependencies( """Test migration of experiment without dependencies.""" # Mock successful experiment creation - async def mock_with_retry(operation_name, coro_func): + async def mock_with_retry( + operation_name, coro_func, *, non_retryable_statuses=None + ): + _ = operation_name, non_retryable_statuses result = coro_func() if hasattr(result, "__await__"): return await result @@ -375,7 +393,10 @@ async def test_populate_dependency_mappings( } # Setup mock with_retry to handle raw_request calls - async def mock_with_retry(operation_name, coro_func): + async def mock_with_retry( + operation_name, coro_func, *, non_retryable_statuses=None + ): + _ = operation_name, non_retryable_statuses result = coro_func() if hasattr(result, "__await__"): return await result diff --git a/tests/unit/test_experiment_insert_bisect_413.py b/tests/unit/test_experiment_insert_bisect_413.py deleted file mode 100644 index 1766f8c..0000000 --- a/tests/unit/test_experiment_insert_bisect_413.py +++ /dev/null @@ -1,110 +0,0 @@ -from __future__ import annotations - -from pathlib import Path -from typing import Any - -import httpx -import pytest - -from braintrust_migrate.resources.experiments import ExperimentMigrator - - -class _SourceClient: - def __init__(self, btql_pages: list[list[dict[str, Any]]]) -> None: - self.btql_pages = btql_pages - - async def with_retry(self, _operation_name: str, coro_func): - res = coro_func() - if hasattr(res, "__await__"): - return await res - return res - - async def raw_request( - self, - method: str, - path: str, - *, - params: dict[str, Any] | None = None, - json: Any | None = None, - timeout: float | None = None, - ) -> Any: - _ = params - _ = timeout - assert method.lower() 
== "post" - if path == "/btql": - assert json is not None and isinstance(json.get("query"), str) - if not self.btql_pages: - return {"data": []} - return {"data": self.btql_pages.pop(0)} - raise AssertionError(f"Unexpected path: {path}") - - -class _Dest413Client: - def __init__(self, *, max_events_per_insert: int) -> None: - self.max_events_per_insert = max_events_per_insert - self.successful_inserts: list[int] = [] - - async def with_retry(self, _operation_name: str, coro_func): - res = coro_func() - if hasattr(res, "__await__"): - return await res - return res - - async def raw_request( - self, - method: str, - path: str, - *, - params: dict[str, Any] | None = None, - json: Any | None = None, - timeout: float | None = None, - ) -> Any: - _ = params - _ = timeout - assert method.lower() == "post" - if path.endswith("/insert"): - assert json is not None - events = json.get("events", []) - if len(events) > self.max_events_per_insert: - req = httpx.Request( - "POST", "https://api.braintrust.dev/v1/experiment/x/insert" - ) - resp = httpx.Response(413, request=req) - raise httpx.HTTPStatusError("413", request=req, response=resp) - self.successful_inserts.append(len(events)) - return {"row_ids": [e.get("id", "") for e in events]} - raise AssertionError(f"Unexpected path: {path}") - - -@pytest.mark.asyncio -async def test_experiment_insert_bisects_on_413(tmp_path: Path) -> None: - TOTAL_EVENTS = 25 - MAX_PER_INSERT = 10 - exp_id = "exp-source" - - page1 = [ - { - "id": f"e{i}", - "_pagination_key": f"p{i:05d}", - "_xact_id": str(100 - i), - "created": "2023-01-01T00:00:00Z", - } - for i in range(TOTAL_EVENTS) - ] - source = _SourceClient([page1]) - dest = _Dest413Client(max_events_per_insert=MAX_PER_INSERT) - - migrator = ExperimentMigrator( - source, # type: ignore[arg-type] - dest, # type: ignore[arg-type] - tmp_path, - events_fetch_limit=50, - events_insert_batch_size=50, - events_use_version_snapshot=False, - events_use_seen_db=False, - ) - - # call internal 
streaming method to avoid mocking experiment list/create plumbing - await migrator._migrate_experiment_events_streaming(exp_id, "exp-dest") - assert sum(dest.successful_inserts) == TOTAL_EVENTS - assert all(n <= MAX_PER_INSERT for n in dest.successful_inserts) diff --git a/tests/unit/test_experiment_streaming_migrator.py b/tests/unit/test_experiment_streaming_migrator.py index 2e54d3a..6750ec6 100644 --- a/tests/unit/test_experiment_streaming_migrator.py +++ b/tests/unit/test_experiment_streaming_migrator.py @@ -5,6 +5,7 @@ import pytest +import braintrust_migrate.resources.experiments as experiments_module from braintrust_migrate.resources.experiments import ExperimentMigrator @@ -14,7 +15,14 @@ def __init__(self, *, btql_pages: list[list[dict[str, Any]]] | None = None) -> N self.inserts: list[dict[str, Any]] = [] self.fail_insert_once: bool = False - async def with_retry(self, _operation_name: str, coro_func): + async def with_retry( + self, + _operation_name: str, + coro_func, + *, + non_retryable_statuses: set[int] | None = None, + ): + _ = non_retryable_statuses res = coro_func() if hasattr(res, "__await__"): return await res @@ -39,15 +47,75 @@ async def raw_request( return {"data": self._btql_pages.pop(0)} return {"data": []} - if path.endswith("/insert"): - assert json is not None - self.inserts.append(json) - events = json.get("events", []) - return {"row_ids": [e.get("id", "") for e in events]} - raise AssertionError(f"Unexpected path: {path}") +class _FakeSDKExperimentWriter: + def __init__(self, dest_client: _StubClient, experiment_id: str) -> None: + self._dest_client = dest_client + self._experiment_id = experiment_id + + async def write_rows(self, rows: list[dict[str, Any]]) -> None: + self._dest_client.inserts.append( + { + "experiment_id": self._experiment_id, + "events": [dict(row) for row in rows], + } + ) + + +class _PaginationAwareStubClient(_StubClient): + def __init__(self) -> None: + super().__init__(btql_pages=None) + self.btql_queries: 
list[str] = [] + + async def raw_request( + self, + method: str, + path: str, + *, + params: dict[str, Any] | None = None, + json: Any | None = None, + timeout: float | None = None, + ) -> Any: + _ = params + _ = timeout + assert method.lower() == "post" + + if path != "/btql": + raise AssertionError(f"Unexpected path: {path}") + + assert json is not None and isinstance(json.get("query"), str) + query = json["query"] + self.btql_queries.append(query) + + if "_pagination_key > 'p2'" in query: + return {"data": []} + if "_pagination_key > 'p1'" in query: + return { + "data": [ + { + "id": "b", + "experiment_id": "source-exp-id", + "_pagination_key": "p2", + "_xact_id": "2", + "created": "2023-01-01T00:00:01Z", + } + ] + } + return { + "data": [ + { + "id": "a", + "experiment_id": "source-exp-id", + "_pagination_key": "p1", + "_xact_id": "1", + "created": "2023-01-01T00:00:00Z", + } + ] + } + + @pytest.mark.asyncio async def test_experiment_streaming_skips_deleted_and_duplicates( tmp_path: Path, @@ -56,17 +124,25 @@ async def test_experiment_streaming_skips_deleted_and_duplicates( page1 = [ { "id": "a", + "experiment_id": "source-exp-id", "_pagination_key": "p1", "_xact_id": "10", "created": "2023-01-01T00:00:00Z", }, - {"id": "d", "_pagination_key": "p1.5", "_xact_id": "9", "_object_delete": True}, + { + "id": "d", + "experiment_id": "source-exp-id", + "_pagination_key": "p1.5", + "_xact_id": "9", + "_object_delete": True, + }, ] # Page 2 has duplicate older version of "a" (should be skipped) page2 = [ { "id": "a", + "experiment_id": "source-exp-id", "_pagination_key": "p2", "_xact_id": "8", "created": "2022-12-31T23:59:59Z", @@ -75,25 +151,62 @@ async def test_experiment_streaming_skips_deleted_and_duplicates( source = _StubClient(btql_pages=[page1, page2]) dest = _StubClient() + original_writer = experiments_module.SDKExperimentWriter + experiments_module.SDKExperimentWriter = _FakeSDKExperimentWriter + + try: + migrator = ExperimentMigrator( + source, # type: 
ignore[arg-type] + dest, # type: ignore[arg-type] + tmp_path, + events_fetch_limit=1, + events_use_seen_db=True, + ) + + await migrator._migrate_experiment_events( # type: ignore[attr-defined] + "source-exp-id", "dest-exp-id" + ) + + inserted_ids: list[str] = [] + for call in dest.inserts: + inserted_ids.extend([e["id"] for e in call["events"]]) + assert inserted_ids == ["a"] + assert len(dest.inserts) == 1 + assert all(call["experiment_id"] == "dest-exp-id" for call in dest.inserts) + assert (tmp_path / "experiment_events" / "source-exp-id_state.json").exists() + finally: + experiments_module.SDKExperimentWriter = original_writer + - migrator = ExperimentMigrator( - source, # type: ignore[arg-type] - dest, # type: ignore[arg-type] - tmp_path, - events_fetch_limit=1, - events_insert_batch_size=2, - events_use_version_snapshot=False, - events_use_seen_db=True, - ) - - await migrator._migrate_experiment_events( # type: ignore[attr-defined] - "source-exp-id", "dest-exp-id" - ) - - inserted_ids: list[str] = [] - for call in dest.inserts: - inserted_ids.extend([e["id"] for e in call["events"]]) - assert inserted_ids == ["a"] - - # Checkpoint exists - assert (tmp_path / "experiment_events" / "source-exp-id_state.json").exists() +@pytest.mark.asyncio +async def test_experiment_streaming_advances_btql_pagination_before_flush( + tmp_path: Path, +) -> None: + source = _PaginationAwareStubClient() + dest = _StubClient() + original_writer = experiments_module.SDKExperimentWriter + experiments_module.SDKExperimentWriter = _FakeSDKExperimentWriter + + try: + migrator = ExperimentMigrator( + source, # type: ignore[arg-type] + dest, # type: ignore[arg-type] + tmp_path, + events_fetch_limit=100000, + events_use_seen_db=False, + ) + migrator._sdk_flush_max_rows = 100 + + await migrator._migrate_experiment_events( # type: ignore[attr-defined] + "source-exp-id", "dest-exp-id" + ) + + inserted_ids: list[str] = [] + for call in dest.inserts: + inserted_ids.extend([e["id"] for e in 
call["events"]]) + + assert inserted_ids == ["a", "b"] + assert any("_pagination_key > 'p1'" in query for query in source.btql_queries) + assert len(source.btql_queries) == 3 + finally: + experiments_module.SDKExperimentWriter = original_writer diff --git a/tests/unit/test_experiment_streaming_resume.py b/tests/unit/test_experiment_streaming_resume.py index dec3aad..1c06601 100644 --- a/tests/unit/test_experiment_streaming_resume.py +++ b/tests/unit/test_experiment_streaming_resume.py @@ -5,6 +5,7 @@ import pytest +import braintrust_migrate.resources.experiments as experiments_module from braintrust_migrate.resources.experiments import ExperimentMigrator @@ -21,7 +22,14 @@ def __init__( self.fail_on_insert_call: int | None = None self._insert_calls = 0 - async def with_retry(self, _operation_name: str, coro_func): + async def with_retry( + self, + _operation_name: str, + coro_func, + *, + non_retryable_statuses: set[int] | None = None, + ): + _ = non_retryable_statuses res = coro_func() if hasattr(res, "__await__"): return await res @@ -51,24 +59,28 @@ async def raw_request( return {"data": []} return {"data": self._page1} - if path.endswith("/insert"): - self._insert_calls += 1 - if self.fail_on_insert_call == self._insert_calls: - raise RuntimeError("simulated insert failure") - assert json is not None - events = json.get("events", []) - self.inserts.append(events) - return {"row_ids": [e.get("id", "") for e in events]} - raise AssertionError(f"Unexpected path: {path}") +class _FakeSDKExperimentWriter: + def __init__(self, dest_client: _StubClient, experiment_id: str) -> None: + self._dest_client = dest_client + self._experiment_id = experiment_id + + async def write_rows(self, rows: list[dict[str, Any]]) -> None: + self._dest_client._insert_calls += 1 + if self._dest_client.fail_on_insert_call == self._dest_client._insert_calls: + raise RuntimeError("simulated insert failure") + self._dest_client.inserts.append([dict(row) for row in rows]) + + 
@pytest.mark.asyncio async def test_experiment_streaming_resume_after_insert_failure(tmp_path: Path) -> None: # Page 1 (will insert "a") page1 = [ { "id": "a", + "experiment_id": "source-exp-id", "_pagination_key": "p1", "_xact_id": "10", "created": "2023-01-01T00:00:00Z", @@ -78,6 +90,7 @@ async def test_experiment_streaming_resume_after_insert_failure(tmp_path: Path) page2 = [ { "id": "b", + "experiment_id": "source-exp-id", "_pagination_key": "p2", "_xact_id": "9", "created": "2023-01-01T00:00:01Z", @@ -87,29 +100,34 @@ async def test_experiment_streaming_resume_after_insert_failure(tmp_path: Path) source = _StubClient(page1=page1, page2=page2) dest = _StubClient() dest.fail_on_insert_call = 2 # fail on second insert during first run + original_writer = experiments_module.SDKExperimentWriter + original_flush_max_rows = experiments_module.ExperimentMigrator.SDK_FLUSH_MAX_ROWS + experiments_module.SDKExperimentWriter = _FakeSDKExperimentWriter + experiments_module.ExperimentMigrator.SDK_FLUSH_MAX_ROWS = 1 + + try: + migrator = ExperimentMigrator( + source, # type: ignore[arg-type] + dest, # type: ignore[arg-type] + tmp_path, + events_fetch_limit=1, + events_use_seen_db=True, + ) + + with pytest.raises(RuntimeError): + await migrator._migrate_experiment_events( # type: ignore[attr-defined] + "source-exp-id", "dest-exp-id" + ) + + inserted_first = [e["id"] for batch in dest.inserts for e in batch] + assert inserted_first == ["a"] - migrator = ExperimentMigrator( - source, # type: ignore[arg-type] - dest, # type: ignore[arg-type] - tmp_path, - events_fetch_limit=1, - events_insert_batch_size=10, - events_use_version_snapshot=False, - events_use_seen_db=True, - ) - - with pytest.raises(RuntimeError): + dest.fail_on_insert_call = None await migrator._migrate_experiment_events( # type: ignore[attr-defined] "source-exp-id", "dest-exp-id" ) - - inserted_first = [e["id"] for batch in dest.inserts for e in batch] - assert inserted_first == ["a"] - - # Resume - 
dest.fail_on_insert_call = None - await migrator._migrate_experiment_events( # type: ignore[attr-defined] - "source-exp-id", "dest-exp-id" - ) - inserted_all = [e["id"] for batch in dest.inserts for e in batch] - assert inserted_all == ["a", "b"] + inserted_all = [e["id"] for batch in dest.inserts for e in batch] + assert inserted_all == ["a", "b"] + finally: + experiments_module.SDKExperimentWriter = original_writer + experiments_module.ExperimentMigrator.SDK_FLUSH_MAX_ROWS = original_flush_max_rows diff --git a/tests/unit/test_logs_btql_sorted_fetch.py b/tests/unit/test_logs_btql_sorted_fetch.py index dc70cfb..5e7ad06 100644 --- a/tests/unit/test_logs_btql_sorted_fetch.py +++ b/tests/unit/test_logs_btql_sorted_fetch.py @@ -5,6 +5,7 @@ import pytest +import braintrust_migrate.resources.logs as logs_module from braintrust_migrate.config import MigrationConfig from braintrust_migrate.resources.logs import LogsMigrator @@ -17,7 +18,14 @@ def __init__( self.migration_config = mig_cfg self._calls = 0 - async def with_retry(self, _operation_name: str, coro_func): + async def with_retry( + self, + _operation_name: str, + coro_func, + *, + non_retryable_statuses: set[int] | None = None, + ): + _ = non_retryable_statuses res = coro_func() if hasattr(res, "__await__"): return await res @@ -61,7 +69,14 @@ def __init__( self.migration_config = mig_cfg self.queries: list[str] = [] - async def with_retry(self, _operation_name: str, coro_func): + async def with_retry( + self, + _operation_name: str, + coro_func, + *, + non_retryable_statuses: set[int] | None = None, + ): + _ = non_retryable_statuses res = coro_func() if hasattr(res, "__await__"): return await res @@ -101,18 +116,14 @@ def __init__(self, mig_cfg: MigrationConfig) -> None: self.migration_config = mig_cfg self.inserted_ids: list[str] = [] - class _Views: - async def create(self, **kwargs: Any) -> Any: - _ = kwargs - return {"id": "v1"} - - class _ClientObj: - def __init__(self) -> None: - self.views = _Views() - - 
self.client = _ClientObj() - - async def with_retry(self, _operation_name: str, coro_func): + async def with_retry( + self, + _operation_name: str, + coro_func, + *, + non_retryable_statuses: set[int] | None = None, + ): + _ = non_retryable_statuses res = coro_func() if hasattr(res, "__await__"): return await res @@ -121,15 +132,20 @@ async def with_retry(self, _operation_name: str, coro_func): async def raw_request( self, method: str, path: str, *, json: Any = None, **kwargs: Any ) -> Any: - _ = kwargs - assert method.upper() == "POST" - assert "/v1/project_logs/" in path - assert path.endswith("/insert") - events = (json or {}).get("events", []) - for e in events: - if isinstance(e, dict) and isinstance(e.get("id"), str): - self.inserted_ids.append(e["id"]) - return {"row_ids": self.inserted_ids} + _ = method, path, json, kwargs + raise AssertionError("SDK-backed logs migration should not use raw_request") + + +class _FakeSDKProjectLogsWriter: + def __init__(self, dest_client: _DestInsertClient, project_id: str) -> None: + self._dest_client = dest_client + self._project_id = project_id + + async def write_rows(self, rows: list[dict[str, Any]]) -> None: + _ = self._project_id + for row in rows: + if isinstance(row, dict) and isinstance(row.get("id"), str): + self._dest_client.inserted_ids.append(row["id"]) class _SourceBtqlClientCreatedAfter: @@ -143,7 +159,14 @@ def __init__( self._preflight_calls = 0 self._SECOND_CALL = 2 - async def with_retry(self, _operation_name: str, coro_func): + async def with_retry( + self, + _operation_name: str, + coro_func, + *, + non_retryable_statuses: set[int] | None = None, + ): + _ = non_retryable_statuses res = coro_func() if hasattr(res, "__await__"): return await res @@ -206,19 +229,24 @@ async def test_logs_btql_sorted_fetch_inserts_in_created_order(tmp_path: Path) - source = _SourceBtqlClient(pages, source_cfg) dest = _DestInsertClient(dest_cfg) - - migrator = LogsMigrator( - source, # type: ignore[arg-type] - dest, # type: 
ignore[arg-type] - tmp_path, - page_limit=2, - insert_batch_size=10, - use_version_snapshot=False, - use_seen_db=False, - progress_hook=None, - ) - migrator.set_destination_project_id("proj-dest") - res = await migrator.migrate_all("proj-source") + original_writer = logs_module.SDKProjectLogsWriter + logs_module.SDKProjectLogsWriter = _FakeSDKProjectLogsWriter + + try: + migrator = LogsMigrator( + source, # type: ignore[arg-type] + dest, # type: ignore[arg-type] + tmp_path, + page_limit=2, + insert_batch_size=10, + use_version_snapshot=False, + use_seen_db=False, + progress_hook=None, + ) + migrator.set_destination_project_id("proj-dest") + res = await migrator.migrate_all("proj-source") + finally: + logs_module.SDKProjectLogsWriter = original_writer assert res["migrated"] == EXPECTED_MIGRATED assert dest.inserted_ids == ["a", "b", "c"] @@ -236,19 +264,24 @@ async def test_logs_btql_fetch_retries_smaller_limit_on_500(tmp_path: Path) -> N source = _SourceBtqlClient500ThenOK(pages, source_cfg) dest = _DestInsertClient(dest_cfg) - - migrator = LogsMigrator( - source, # type: ignore[arg-type] - dest, # type: ignore[arg-type] - tmp_path, - page_limit=1000, # will 500, then retry with smaller limit - insert_batch_size=10, - use_version_snapshot=False, - use_seen_db=False, - progress_hook=None, - ) - migrator.set_destination_project_id("proj-dest") - res = await migrator.migrate_all("proj-source") + original_writer = logs_module.SDKProjectLogsWriter + logs_module.SDKProjectLogsWriter = _FakeSDKProjectLogsWriter + + try: + migrator = LogsMigrator( + source, # type: ignore[arg-type] + dest, # type: ignore[arg-type] + tmp_path, + page_limit=1000, # will 500, then retry with smaller limit + insert_batch_size=10, + use_version_snapshot=False, + use_seen_db=False, + progress_hook=None, + ) + migrator.set_destination_project_id("proj-dest") + res = await migrator.migrate_all("proj-source") + finally: + logs_module.SDKProjectLogsWriter = original_writer assert res["migrated"] == 
1 assert dest.inserted_ids == ["a"] @@ -277,19 +310,24 @@ async def test_logs_created_after_uses_preflight_and_inclusive_start_pk( source = _SourceBtqlClientCreatedAfter(pages, source_cfg) dest = _DestInsertClient(dest_cfg) - - migrator = LogsMigrator( - source, # type: ignore[arg-type] - dest, # type: ignore[arg-type] - tmp_path, - page_limit=1, - insert_batch_size=10, - use_version_snapshot=False, - use_seen_db=False, - progress_hook=None, - ) - migrator.set_destination_project_id("proj-dest") - res = await migrator.migrate_all("proj-source") + original_writer = logs_module.SDKProjectLogsWriter + logs_module.SDKProjectLogsWriter = _FakeSDKProjectLogsWriter + + try: + migrator = LogsMigrator( + source, # type: ignore[arg-type] + dest, # type: ignore[arg-type] + tmp_path, + page_limit=1, + insert_batch_size=10, + use_version_snapshot=False, + use_seen_db=False, + progress_hook=None, + ) + migrator.set_destination_project_id("proj-dest") + res = await migrator.migrate_all("proj-source") + finally: + logs_module.SDKProjectLogsWriter = original_writer EXPECTED_MIGRATED = 2 assert res["migrated"] == EXPECTED_MIGRATED diff --git a/tests/unit/test_logs_byte_batching.py b/tests/unit/test_logs_byte_batching.py index efbf287..ebcc51d 100644 --- a/tests/unit/test_logs_byte_batching.py +++ b/tests/unit/test_logs_byte_batching.py @@ -7,6 +7,7 @@ from braintrust_migrate.batching import approx_json_bytes from braintrust_migrate.config import MigrationConfig +import braintrust_migrate.resources.logs as logs_module from braintrust_migrate.resources.logs import LogsMigrator @@ -16,7 +17,14 @@ def __init__(self, events: list[dict[str, Any]], mig_cfg: MigrationConfig) -> No self._calls = 0 self.migration_config = mig_cfg - async def with_retry(self, _operation_name: str, coro_func): + async def with_retry( + self, + _operation_name: str, + coro_func, + *, + non_retryable_statuses: set[int] | None = None, + ): + _ = non_retryable_statuses res = coro_func() if hasattr(res, 
"__await__"): return await res @@ -39,7 +47,14 @@ def __init__(self, mig_cfg: MigrationConfig) -> None: self.migration_config = mig_cfg self.insert_calls: list[list[str]] = [] - async def with_retry(self, _operation_name: str, coro_func): + async def with_retry( + self, + _operation_name: str, + coro_func, + *, + non_retryable_statuses: set[int] | None = None, + ): + _ = non_retryable_statuses res = coro_func() if hasattr(res, "__await__"): return await res @@ -48,27 +63,29 @@ async def with_retry(self, _operation_name: str, coro_func): async def raw_request( self, method: str, path: str, *, json: Any = None, **kwargs: Any ) -> Any: - _ = kwargs - assert method.upper() == "POST" - assert "/v1/project_logs/" in path - assert path.endswith("/insert") - events = (json or {}).get("events", []) + _ = method, path, json, kwargs + raise AssertionError("SDK-backed logs migration should not use raw_request") + + +class _FakeSDKProjectLogsWriter: + def __init__(self, dest_client: _DestInsertClient, project_id: str) -> None: + self._dest_client = dest_client + self._project_id = project_id + + async def write_rows(self, rows: list[dict[str, Any]]) -> None: ids: list[str] = [] - for e in events: + for e in rows: if not isinstance(e, dict): continue v = e.get("id") if isinstance(v, str): ids.append(v) - self.insert_calls.append(ids) - row_ids = ids - return {"row_ids": row_ids} + self._dest_client.insert_calls.append(ids) @pytest.mark.asyncio -async def test_logs_migrator_splits_inserts_by_bytes(tmp_path: Path) -> None: +async def test_logs_migrator_buffers_rows_before_sdk_flush(tmp_path: Path) -> None: EXPECTED_MIGRATED = 2 - MIN_INSERT_CALLS = 2 # Make events large enough that 2 events exceed the byte cap, but 1 fits. raw_events = [ { @@ -89,7 +106,8 @@ async def test_logs_migrator_splits_inserts_by_bytes(tmp_path: Path) -> None: two = approx_json_bytes({"events": raw_events}) assert one < two - # Target cap just above one-event payload so we must split. 
+ # Target cap just above one-event payload. The logs migrator should still buffer + # both rows together and let the SDK handle any downstream request splitting. max_req = one + 25 cfg = MigrationConfig( @@ -99,23 +117,27 @@ async def test_logs_migrator_splits_inserts_by_bytes(tmp_path: Path) -> None: source = _SourceFetchClient(raw_events, cfg) dest = _DestInsertClient(cfg) - - migrator = LogsMigrator( - source, # type: ignore[arg-type] - dest, # type: ignore[arg-type] - tmp_path, - page_limit=10, - insert_batch_size=10_000, # ensure count does not cause splitting - use_version_snapshot=False, - use_seen_db=False, - progress_hook=None, - ) - migrator.set_destination_project_id("proj-dest") - res = await migrator.migrate_all("proj-source") + monkeypatch = pytest.MonkeyPatch() + monkeypatch.setattr(logs_module, "SDKProjectLogsWriter", _FakeSDKProjectLogsWriter) + + try: + migrator = LogsMigrator( + source, # type: ignore[arg-type] + dest, # type: ignore[arg-type] + tmp_path, + page_limit=10, + insert_batch_size=10_000, # ensure count does not cause splitting + use_version_snapshot=False, + use_seen_db=False, + progress_hook=None, + ) + migrator.set_destination_project_id("proj-dest") + res = await migrator.migrate_all("proj-source") + finally: + monkeypatch.undo() assert res["migrated"] == EXPECTED_MIGRATED assert migrator._stream_state.inserted_bytes > 0 - # Must have at least 2 insert calls due to byte cap. 
- assert len(dest.insert_calls) >= MIN_INSERT_CALLS + assert len(dest.insert_calls) == 1 flattened = [x for call in dest.insert_calls for x in call] assert flattened == ["a", "b"] diff --git a/tests/unit/test_logs_insert_bisect_413.py b/tests/unit/test_logs_insert_bisect_413.py deleted file mode 100644 index 0160311..0000000 --- a/tests/unit/test_logs_insert_bisect_413.py +++ /dev/null @@ -1,106 +0,0 @@ -from __future__ import annotations - -from pathlib import Path -from typing import Any - -import httpx -import pytest - -from braintrust_migrate.resources.logs import LogsMigrator - - -class _Dest413Client: - def __init__(self, *, max_events_per_insert: int) -> None: - self.max_events_per_insert = max_events_per_insert - self.inserts: list[int] = [] - - async def with_retry(self, _operation_name: str, coro_func): - return await coro_func() - - async def raw_request( - self, - method: str, - path: str, - *, - params: dict[str, Any] | None = None, - json: Any | None = None, - timeout: float | None = None, - ) -> Any: - _ = params - _ = timeout - assert method.lower() == "post" - if path.endswith("/insert"): - assert json is not None - events = json.get("events", []) - if len(events) > self.max_events_per_insert: - req = httpx.Request( - "POST", "https://api.braintrust.dev/v1/project_logs/x/insert" - ) - resp = httpx.Response(413, request=req) - raise httpx.HTTPStatusError("413", request=req, response=resp) - self.inserts.append(len(events)) - return {"row_ids": [e.get("id", "") for e in events]} - raise AssertionError(f"Unexpected path: {path}") - - -class _SourceClient: - def __init__(self, pages: list[list[dict[str, Any]]]) -> None: - self._pages = pages - - async def with_retry(self, _operation_name: str, coro_func): - return await coro_func() - - async def raw_request( - self, - method: str, - path: str, - *, - params: dict[str, Any] | None = None, - json: Any | None = None, - timeout: float | None = None, - ) -> Any: - _ = params - _ = timeout - assert 
method.lower() == "post" - assert path == "/btql" - assert json is not None - assert isinstance(json.get("query"), str) - if self._pages: - return {"data": self._pages.pop(0)} - return {"data": []} - - -@pytest.mark.asyncio -async def test_logs_insert_bisects_on_413(tmp_path: Path) -> None: - TOTAL_EVENTS = 25 - MAX_PER_INSERT = 10 - # One page with 25 events; initial insert batch is 25 and will 413. - page_events = [ - { - "id": f"e{i}", - "_pagination_key": f"p{i:05d}", - "_xact_id": str(100 - i), - "created": "2023-01-01T00:00:00Z", - } - for i in range(TOTAL_EVENTS) - ] - source = _SourceClient([page_events]) - dest = _Dest413Client(max_events_per_insert=MAX_PER_INSERT) - - migrator = LogsMigrator( - source, # type: ignore[arg-type] - dest, # type: ignore[arg-type] - tmp_path, - page_limit=50, - insert_batch_size=50, # force single large insert attempt - use_version_snapshot=False, - use_seen_db=False, - ) - migrator.set_destination_project_id("dest-project-id") - - result = await migrator.migrate_all("source-project-id") - assert result["migrated"] == TOTAL_EVENTS - # Should have split at least once (multiple insert calls) - assert len(dest.inserts) > 1 - # All individual inserts should be <= 10 - assert all(n <= MAX_PER_INSERT for n in dest.inserts) diff --git a/tests/unit/test_logs_sdk_writer_failure.py b/tests/unit/test_logs_sdk_writer_failure.py new file mode 100644 index 0000000..827f5c6 --- /dev/null +++ b/tests/unit/test_logs_sdk_writer_failure.py @@ -0,0 +1,97 @@ +from __future__ import annotations + +from pathlib import Path +from typing import Any + +import pytest + +import braintrust_migrate.resources.logs as logs_module +from braintrust_migrate.resources.logs import LogsMigrator + + +class _SourceClient: + def __init__(self, pages: list[list[dict[str, Any]]]) -> None: + self._pages = pages + + async def with_retry( + self, + _operation_name: str, + coro_func, + *, + non_retryable_statuses: set[int] | None = None, + ): + _ = non_retryable_statuses + 
return await coro_func() + + async def raw_request( + self, + method: str, + path: str, + *, + params: dict[str, Any] | None = None, + json: Any | None = None, + timeout: float | None = None, + ) -> Any: + _ = params + _ = timeout + assert method.lower() == "post" + assert path == "/btql" + assert json is not None + assert isinstance(json.get("query"), str) + if self._pages: + return {"data": self._pages.pop(0)} + return {"data": []} + + +class _DestClient: + async def with_retry( + self, + _operation_name: str, + coro_func, + *, + non_retryable_statuses: set[int] | None = None, + ): + _ = non_retryable_statuses + return await coro_func() + + +class _FailingSDKProjectLogsWriter: + def __init__(self, _dest_client: _DestClient, _project_id: str) -> None: + self.calls = 0 + + async def write_rows(self, rows: list[dict[str, Any]]) -> None: + _ = rows + raise RuntimeError("sdk flush failed") + + +@pytest.mark.asyncio +async def test_logs_migrator_propagates_sdk_writer_failure(tmp_path: Path) -> None: + page_events = [ + { + "id": "e1", + "_pagination_key": "p1", + "_xact_id": "10", + "created": "2023-01-01T00:00:00Z", + } + ] + source = _SourceClient([page_events]) + dest = _DestClient() + + original_writer = logs_module.SDKProjectLogsWriter + logs_module.SDKProjectLogsWriter = _FailingSDKProjectLogsWriter + try: + migrator = LogsMigrator( + source, # type: ignore[arg-type] + dest, # type: ignore[arg-type] + tmp_path, + page_limit=10, + insert_batch_size=10, + use_version_snapshot=False, + use_seen_db=False, + ) + migrator.set_destination_project_id("dest-project-id") + + with pytest.raises(RuntimeError, match="sdk flush failed"): + await migrator.migrate_all("source-project-id") + finally: + logs_module.SDKProjectLogsWriter = original_writer diff --git a/tests/unit/test_logs_streaming_migrator.py b/tests/unit/test_logs_streaming_migrator.py index bec12c2..877cf5a 100644 --- a/tests/unit/test_logs_streaming_migrator.py +++ b/tests/unit/test_logs_streaming_migrator.py @@ 
-5,6 +5,7 @@ import pytest +import braintrust_migrate.resources.logs as logs_module from braintrust_migrate.resources.logs import LogsMigrator @@ -13,7 +14,14 @@ def __init__(self, *, btql_pages: list[list[dict[str, Any]]] | None = None) -> N self._btql_pages = btql_pages or [] self.inserts: list[dict[str, Any]] = [] - async def with_retry(self, _operation_name: str, coro_func): + async def with_retry( + self, + _operation_name: str, + coro_func, + *, + non_retryable_statuses: set[int] | None = None, + ): + _ = non_retryable_statuses return await coro_func() async def raw_request( @@ -46,6 +54,20 @@ async def raw_request( raise AssertionError(f"Unexpected path: {path}") +class _FakeSDKProjectLogsWriter: + def __init__(self, dest_client: _StubClient, project_id: str) -> None: + self._dest_client = dest_client + self._project_id = project_id + + async def write_rows(self, rows: list[dict[str, Any]]) -> None: + self._dest_client.inserts.append( + { + "project_id": self._project_id, + "events": [dict(row) for row in rows], + } + ) + + @pytest.mark.asyncio async def test_logs_migrator_skips_duplicate_ids_across_pages(tmp_path: Path) -> None: expected_inserted = 2 @@ -78,29 +100,33 @@ async def test_logs_migrator_skips_duplicate_ids_across_pages(tmp_path: Path) -> source = _StubClient(btql_pages=[page1, page2]) dest = _StubClient() - - migrator = LogsMigrator( - source, # type: ignore[arg-type] - dest, # type: ignore[arg-type] - tmp_path, - page_limit=1, - insert_batch_size=2, - use_version_snapshot=False, - use_seen_db=True, - ) - migrator.set_destination_project_id("dest-project-id") - - result = await migrator.migrate_all("source-project-id") - assert result["streaming"] is True - assert result["version"] is None - assert result["migrated"] == expected_inserted # a + b inserted once - assert result["skipped"] == 1 # older duplicate 'a' skipped on page 2 - - # Ensure destination insert saw only two ids total - inserted_ids: list[str] = [] - for call in dest.inserts: - 
inserted_ids.extend([e["id"] for e in call["events"]]) - assert inserted_ids == ["a", "b"] - - # Ensure checkpoint exists - assert (tmp_path / "logs_streaming_state.json").exists() + original_writer = logs_module.SDKProjectLogsWriter + logs_module.SDKProjectLogsWriter = _FakeSDKProjectLogsWriter + + try: + migrator = LogsMigrator( + source, # type: ignore[arg-type] + dest, # type: ignore[arg-type] + tmp_path, + page_limit=1, + insert_batch_size=2, + use_version_snapshot=False, + use_seen_db=True, + ) + migrator.set_destination_project_id("dest-project-id") + + result = await migrator.migrate_all("source-project-id") + assert result["streaming"] is True + assert result["version"] is None + assert result["migrated"] == expected_inserted # a + b inserted once + assert result["skipped"] == 1 # older duplicate 'a' skipped on page 2 + + inserted_ids: list[str] = [] + for call in dest.inserts: + inserted_ids.extend([e["id"] for e in call["events"]]) + assert inserted_ids == ["a", "b"] + assert len(dest.inserts) == 1 + assert all(call["project_id"] == "dest-project-id" for call in dest.inserts) + assert (tmp_path / "logs_streaming_state.json").exists() + finally: + logs_module.SDKProjectLogsWriter = original_writer diff --git a/tests/unit/test_logs_streaming_resume.py b/tests/unit/test_logs_streaming_resume.py index 6e6f7e4..598fb7c 100644 --- a/tests/unit/test_logs_streaming_resume.py +++ b/tests/unit/test_logs_streaming_resume.py @@ -5,6 +5,7 @@ import pytest +import braintrust_migrate.resources.logs as logs_module from braintrust_migrate.resources.logs import LogsMigrator @@ -18,7 +19,14 @@ def __init__( self.fail_on_insert_call: int | None = None self._insert_calls = 0 - async def with_retry(self, _operation_name: str, coro_func): + async def with_retry( + self, + _operation_name: str, + coro_func, + *, + non_retryable_statuses: set[int] | None = None, + ): + _ = non_retryable_statuses res = coro_func() if hasattr(res, "__await__"): return await res @@ -61,6 +69,18 @@ 
async def raw_request( raise AssertionError(f"Unexpected path: {path}") +class _FakeSDKProjectLogsWriter: + def __init__(self, dest_client: _StubClient, project_id: str) -> None: + self._dest_client = dest_client + self._project_id = project_id + + async def write_rows(self, rows: list[dict[str, Any]]) -> None: + self._dest_client._insert_calls += 1 + if self._dest_client.fail_on_insert_call == self._dest_client._insert_calls: + raise RuntimeError("simulated insert failure") + self._dest_client.inserts.append([dict(row) for row in rows]) + + @pytest.mark.asyncio async def test_logs_migrator_resume_after_insert_failure(tmp_path: Path) -> None: page1 = [ @@ -83,27 +103,33 @@ async def test_logs_migrator_resume_after_insert_failure(tmp_path: Path) -> None source = _StubClient(page1=page1, page2=page2) dest = _StubClient(page1=page1, page2=page2) dest.fail_on_insert_call = 2 # fail on second insert (page2) during first run - - migrator = LogsMigrator( - source, # type: ignore[arg-type] - dest, # type: ignore[arg-type] - tmp_path, - page_limit=1, - insert_batch_size=10, - use_version_snapshot=True, - use_seen_db=True, - ) - migrator.set_destination_project_id("dest-project-id") - - with pytest.raises(RuntimeError): + original_writer = logs_module.SDKProjectLogsWriter + original_flush_max_rows = logs_module.LogsMigrator.SDK_FLUSH_MAX_ROWS + logs_module.SDKProjectLogsWriter = _FakeSDKProjectLogsWriter + logs_module.LogsMigrator.SDK_FLUSH_MAX_ROWS = 1 + + try: + migrator = LogsMigrator( + source, # type: ignore[arg-type] + dest, # type: ignore[arg-type] + tmp_path, + page_limit=1, + insert_batch_size=10, + use_version_snapshot=True, + use_seen_db=True, + ) + migrator.set_destination_project_id("dest-project-id") + + with pytest.raises(RuntimeError): + await migrator.migrate_all("source-project-id") + + inserted_first = [e["id"] for batch in dest.inserts for e in batch] + assert inserted_first == ["a"] + + dest.fail_on_insert_call = None await 
migrator.migrate_all("source-project-id") - - # First run inserted only "a" - inserted_first = [e["id"] for batch in dest.inserts for e in batch] - assert inserted_first == ["a"] - - # Second run should resume and insert only "b" (not reinsert "a") - dest.fail_on_insert_call = None - await migrator.migrate_all("source-project-id") - inserted_all = [e["id"] for batch in dest.inserts for e in batch] - assert inserted_all == ["a", "b"] + inserted_all = [e["id"] for batch in dest.inserts for e in batch] + assert inserted_all == ["a", "b"] + finally: + logs_module.SDKProjectLogsWriter = original_writer + logs_module.LogsMigrator.SDK_FLUSH_MAX_ROWS = original_flush_max_rows diff --git a/tests/unit/test_sdk_logs.py b/tests/unit/test_sdk_logs.py new file mode 100644 index 0000000..562175d --- /dev/null +++ b/tests/unit/test_sdk_logs.py @@ -0,0 +1,147 @@ +from __future__ import annotations + +import sys +from types import ModuleType, SimpleNamespace +from typing import Any + +from braintrust_migrate.sdk_logs import ( + SDKDatasetWriter, + PROJECT_LOGS_LOG_ID, + SDKExperimentWriter, + SDKProjectLogsWriter, +) + + +class _FakeLazyValue: + def __init__(self, fn, use_mutex: bool = False) -> None: + _ = use_mutex + self._fn = fn + + def get(self) -> Any: + return self._fn() + + +class _FakeHTTPConnection: + def __init__(self, base_url: str) -> None: + self.base_url = base_url + self.token: str | None = None + self.long_lived = False + + def set_token(self, token: str) -> None: + self.token = token + + def make_long_lived(self) -> None: + self.long_lived = True + + +class _FakeBackgroundLogger: + def __init__(self, api_conn: _FakeLazyValue) -> None: + self.api_conn = api_conn + self.sync_flush = False + self.logged_rows: list[dict[str, Any]] = [] + self.flush_count = 0 + + def log(self, *args: _FakeLazyValue) -> None: + self.logged_rows.extend(arg.get() for arg in args) + + def flush(self) -> None: + self.flush_count += 1 + + +def 
test_sdk_project_logs_writer_adds_logs3_object_ids(monkeypatch) -> None: + fake_logger_module = ModuleType("braintrust.logger") + fake_logger_module.HTTPConnection = _FakeHTTPConnection + fake_logger_module._HTTPBackgroundLogger = _FakeBackgroundLogger + fake_util_module = ModuleType("braintrust.util") + fake_util_module.LazyValue = _FakeLazyValue + + monkeypatch.setitem(sys.modules, "braintrust.logger", fake_logger_module) + monkeypatch.setitem(sys.modules, "braintrust.util", fake_util_module) + + dest_client = SimpleNamespace( + org_config=SimpleNamespace( + url="https://api.example.com", + api_key="secret-token", + ) + ) + + writer = SDKProjectLogsWriter(dest_client, "dest-project-id") + writer.write_rows_sync([{"id": "row1", "input": "hello"}]) + + logger = writer._background_logger + assert logger is not None + assert logger.flush_count == 1 + assert logger.logged_rows == [ + { + "id": "row1", + "input": "hello", + "project_id": "dest-project-id", + "log_id": PROJECT_LOGS_LOG_ID, + } + ] + conn = logger.api_conn.get() + assert conn.base_url == "https://api.example.com" + assert conn.token == "secret-token" + assert conn.long_lived is True + + +def test_sdk_experiment_writer_adds_experiment_id(monkeypatch) -> None: + fake_logger_module = ModuleType("braintrust.logger") + fake_logger_module.HTTPConnection = _FakeHTTPConnection + fake_logger_module._HTTPBackgroundLogger = _FakeBackgroundLogger + fake_util_module = ModuleType("braintrust.util") + fake_util_module.LazyValue = _FakeLazyValue + + monkeypatch.setitem(sys.modules, "braintrust.logger", fake_logger_module) + monkeypatch.setitem(sys.modules, "braintrust.util", fake_util_module) + + dest_client = SimpleNamespace( + org_config=SimpleNamespace( + url="https://api.example.com", + api_key="secret-token", + ) + ) + + writer = SDKExperimentWriter(dest_client, "dest-experiment-id") + writer.write_rows_sync([{"id": "row1", "input": "hello"}]) + + logger = writer._background_logger + assert logger is not None + 
assert logger.logged_rows == [ + { + "id": "row1", + "input": "hello", + "experiment_id": "dest-experiment-id", + } + ] + + +def test_sdk_dataset_writer_adds_dataset_id(monkeypatch) -> None: + fake_logger_module = ModuleType("braintrust.logger") + fake_logger_module.HTTPConnection = _FakeHTTPConnection + fake_logger_module._HTTPBackgroundLogger = _FakeBackgroundLogger + fake_util_module = ModuleType("braintrust.util") + fake_util_module.LazyValue = _FakeLazyValue + + monkeypatch.setitem(sys.modules, "braintrust.logger", fake_logger_module) + monkeypatch.setitem(sys.modules, "braintrust.util", fake_util_module) + + dest_client = SimpleNamespace( + org_config=SimpleNamespace( + url="https://api.example.com", + api_key="secret-token", + ) + ) + + writer = SDKDatasetWriter(dest_client, "dest-dataset-id") + writer.write_rows_sync([{"id": "row1", "input": "hello"}]) + + logger = writer._background_logger + assert logger is not None + assert logger.logged_rows == [ + { + "id": "row1", + "input": "hello", + "dataset_id": "dest-dataset-id", + } + ] diff --git a/tests/unit/test_streaming_pipeline.py b/tests/unit/test_streaming_pipeline.py deleted file mode 100644 index 8917320..0000000 --- a/tests/unit/test_streaming_pipeline.py +++ /dev/null @@ -1,236 +0,0 @@ -"""Unit tests for pipelined event streaming (pipeline=True).""" - -from __future__ import annotations - -import asyncio -from typing import Any - -import pytest - -from braintrust_migrate.streaming_utils import stream_btql_sorted_events - - -# --------------------------------------------------------------------------- -# Helpers -# --------------------------------------------------------------------------- - - -def _make_pages( - pages: list[list[dict[str, Any]]], -) -> tuple[list[dict[str, Any]], dict[str, Any]]: - """Convert a list of event-lists into BTQL-shaped page responses. - - Returns (page_responses, counters_dict). 
- """ - responses: list[dict[str, Any]] = [] - for i, events in enumerate(pages): - last_pk = events[-1]["_pagination_key"] if events else None - responses.append( - { - "events": events, - "btql_last_pagination_key": last_pk, - } - ) - # Append an empty terminal page. - responses.append({"events": [], "btql_last_pagination_key": None}) - return responses, {} - - -def _build_streaming_harness( - pages: list[list[dict[str, Any]]], - *, - pipeline: bool = False, - fail_on_insert_page: int | None = None, -): - """Build all the callbacks needed by stream_btql_sorted_events. - - Returns (run_coro, state) where state has fetch_order, insert_order, - and the final pagination key. - """ - page_responses, _ = _make_pages(pages) - page_idx = {"i": 0} - pk = {"value": None} - - fetch_order: list[int] = [] - insert_calls: list[list[dict[str, Any]]] = [] - fetch_count = {"n": 0} - insert_count = {"n": 0} - insert_bytes = {"n": 0} - - async def fetch_page(limit: int) -> dict[str, Any]: - idx = page_idx["i"] - page_idx["i"] += 1 - fetch_order.append(idx) - # Small delay to allow concurrency - await asyncio.sleep(0.005) - return page_responses[idx] - - async def insert_events(batch: list[dict[str, Any]]) -> None: - page_num = len(insert_calls) - if fail_on_insert_page is not None and page_num == fail_on_insert_page: - raise RuntimeError("insert failed") - insert_calls.append(batch) - await asyncio.sleep(0.005) - - state = { - "fetch_order": fetch_order, - "insert_calls": insert_calls, - "pk": pk, - "fetch_count": fetch_count, - "insert_count": insert_count, - "insert_bytes": insert_bytes, - } - - async def run() -> None: - await stream_btql_sorted_events( - fetch_page=fetch_page, - page_limit=100, - get_last_pk=lambda: pk["value"], - set_last_pk=lambda v: pk.__setitem__("value", v), - save_state=lambda: None, - page_event_filter=None, - event_to_insert=lambda e: e, - seen_db=None, - insert_batch_size=1000, - insert_max_bytes=None, - rewrite_event_in_place=None, - 
insert_events=insert_events, - is_http_413=lambda _: False, - on_single_413=None, - incr_fetched=lambda n: fetch_count.__setitem__("n", fetch_count["n"] + n), - incr_inserted=lambda n: insert_count.__setitem__("n", insert_count["n"] + n), - incr_inserted_bytes=lambda n: insert_bytes.__setitem__("n", insert_bytes["n"] + n), - incr_skipped_deleted=None, - incr_skipped_seen=None, - incr_attachments_copied=None, - pipeline=pipeline, - ) - - return run, state - - -# --------------------------------------------------------------------------- -# Tests -# --------------------------------------------------------------------------- - - -@pytest.mark.asyncio -class TestStreamingPipeline: - - async def test_pipeline_false_is_sequential(self): - """With pipeline=False, fetch and insert should not overlap.""" - pages = [ - [{"id": "a", "_pagination_key": "pk1"}], - [{"id": "b", "_pagination_key": "pk2"}], - ] - run, state = _build_streaming_harness(pages, pipeline=False) - await run() - - assert state["fetch_count"]["n"] == 2 - assert state["insert_count"]["n"] == 2 - assert state["pk"]["value"] == "pk2" - - async def test_pipeline_true_processes_all_events(self): - """With pipeline=True, all events should still be processed correctly.""" - pages = [ - [{"id": "a", "_pagination_key": "pk1"}], - [{"id": "b", "_pagination_key": "pk2"}], - [{"id": "c", "_pagination_key": "pk3"}], - ] - run, state = _build_streaming_harness(pages, pipeline=True) - await run() - - assert state["fetch_count"]["n"] == 3 - assert state["insert_count"]["n"] == 3 - assert state["pk"]["value"] == "pk3" - - # All events should have been inserted - all_inserted_ids = [e["id"] for batch in state["insert_calls"] for e in batch] - assert all_inserted_ids == ["a", "b", "c"] - - async def test_pipeline_commits_pk_after_insert(self): - """Pagination key should be committed (save_state) after inserts.""" - save_state_pks: list[str | None] = [] - pk = {"value": None} - - pages = [ - [{"id": "a", "_pagination_key": 
"pk1"}], - [{"id": "b", "_pagination_key": "pk2"}], - ] - page_responses, _ = _make_pages(pages) - page_idx = {"i": 0} - - async def fetch_page(limit: int) -> dict[str, Any]: - idx = page_idx["i"] - page_idx["i"] += 1 - return page_responses[idx] - - async def insert_events(batch: list[dict[str, Any]]) -> None: - pass - - def save_state() -> None: - save_state_pks.append(pk["value"]) - - await stream_btql_sorted_events( - fetch_page=fetch_page, - page_limit=100, - get_last_pk=lambda: pk["value"], - set_last_pk=lambda v: pk.__setitem__("value", v), - save_state=save_state, - page_event_filter=None, - event_to_insert=lambda e: e, - seen_db=None, - insert_batch_size=1000, - insert_max_bytes=None, - rewrite_event_in_place=None, - insert_events=insert_events, - is_http_413=lambda _: False, - on_single_413=None, - incr_fetched=lambda _: None, - incr_inserted=lambda _: None, - incr_inserted_bytes=lambda _: None, - incr_skipped_deleted=None, - incr_skipped_seen=None, - incr_attachments_copied=None, - pipeline=True, - ) - - # save_state should have been called after each page + end-of-stream. - # pk should advance: pk1, pk2, then end-of-stream save with pk2. - assert "pk1" in save_state_pks - assert "pk2" in save_state_pks - - async def test_pipeline_insert_error_cancels_prefetch(self): - """If insert fails, the prefetch task should be cancelled cleanly.""" - pages = [ - [{"id": "a", "_pagination_key": "pk1"}], - [{"id": "b", "_pagination_key": "pk2"}], - ] - run, state = _build_streaming_harness( - pages, pipeline=True, fail_on_insert_page=0 - ) - - with pytest.raises(RuntimeError, match="insert failed"): - await run() - - # The first insert failed; the second page may have been prefetched - # but its insert should NOT have happened. 
- assert len(state["insert_calls"]) == 0 # insert_calls only appended on success - - async def test_pipeline_empty_pages(self): - """Pipeline should handle empty input gracefully.""" - run, state = _build_streaming_harness([], pipeline=True) - await run() - - assert state["fetch_count"]["n"] == 0 - assert state["insert_count"]["n"] == 0 - - async def test_pipeline_single_page(self): - """Pipeline with only one data page should work fine.""" - pages = [[{"id": "x", "_pagination_key": "pk_only"}]] - run, state = _build_streaming_harness(pages, pipeline=True) - await run() - - assert state["fetch_count"]["n"] == 1 - assert state["insert_count"]["n"] == 1 - assert state["pk"]["value"] == "pk_only" diff --git a/tests/unit/test_streaming_utf8_and_rewrite_batching.py b/tests/unit/test_streaming_utf8_and_rewrite_batching.py new file mode 100644 index 0000000..95a2114 --- /dev/null +++ b/tests/unit/test_streaming_utf8_and_rewrite_batching.py @@ -0,0 +1,69 @@ +from __future__ import annotations + +from typing import Any + +import pytest + +from braintrust_migrate.batching import approx_json_bytes +from braintrust_migrate.streaming_utils import ( + EventsStreamState, + stream_btql_sorted_events_buffered, +) + + +@pytest.mark.asyncio +async def test_stream_batches_using_post_rewrite_payload_size() -> None: + max_bytes = approx_json_bytes({"id": "a", "input": "🙂" * 16}) + 1 + + page_calls = 0 + inserted_batches: list[list[str]] = [] + insert_updates: list[dict[str, Any]] = [] + state = EventsStreamState() + + async def fetch_page(_limit: int) -> dict[str, Any]: + nonlocal page_calls + page_calls += 1 + if page_calls == 1: + return { + "events": [ + {"id": "a", "_pagination_key": "p1", "input": "x"}, + {"id": "b", "_pagination_key": "p2", "input": "y"}, + ], + "btql_last_pagination_key": "p2", + } + return {"events": [], "btql_last_pagination_key": None} + + async def rewrite_event_in_place(event: dict[str, Any]) -> int: + event["input"] = "🙂" * 16 + return 1 + + async def 
insert_events(batch: list[dict[str, Any]]) -> None: + inserted_batches.append([str(event["id"]) for event in batch]) + + await stream_btql_sorted_events_buffered( + fetch_page=fetch_page, + page_limit=100, + state=state, + save_state=lambda: None, + page_event_filter=None, + event_to_insert=lambda event: { + "id": event["id"], + "input": event["input"], + }, + seen_db=None, + rewrite_event_in_place=rewrite_event_in_place, + insert_events=insert_events, + flush_max_rows=100, + flush_max_bytes=max_bytes, + is_http_413=lambda _exc: False, + on_single_413=None, + hooks={"on_insert": insert_updates.append}, + ) + + assert inserted_batches == [["a", "b"]] + assert sum(len(batch) for batch in inserted_batches) == 2 + assert state.inserted_events == 2 + assert state.inserted_bytes > max_bytes + assert state.attachments_copied == 2 + assert len(insert_updates) == 1 + assert insert_updates[0]["flush_buffer_bytes"] > max_bytes diff --git a/uv.lock b/uv.lock index ad73ce0..6b35a83 100644 --- a/uv.lock +++ b/uv.lock @@ -25,11 +25,44 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/a1/ee/48ca1a7c89ffec8b6a0c5d02b89c305671d5ffd8d3c94acf8b8c408575bb/anyio-4.9.0-py3-none-any.whl", hash = "sha256:9f76d541cad6e36af7beb62e978876f3b41e3e04f2c1fbf0884604c0a9c4d93c", size = 100916, upload-time = "2025-03-17T00:02:52.713Z" }, ] +[[package]] +name = "attrs" +version = "26.1.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/9a/8e/82a0fe20a541c03148528be8cac2408564a6c9a0cc7e9171802bc1d26985/attrs-26.1.0.tar.gz", hash = "sha256:d03ceb89cb322a8fd706d4fb91940737b6642aa36998fe130a9bc96c985eff32", size = 952055, upload-time = "2026-03-19T14:22:25.026Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/64/b4/17d4b0b2a2dc85a6df63d1157e028ed19f90d4cd97c36717afef2bc2f395/attrs-26.1.0-py3-none-any.whl", hash = "sha256:c647aa4a12dfbad9333ca4e71fe62ddc36f4e63b2d260a37a8b83d2f043ac309", size = 67548, upload-time = 
"2026-03-19T14:22:23.645Z" }, +] + +[[package]] +name = "braintrust" +version = "0.12.1" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "chevron" }, + { name = "exceptiongroup" }, + { name = "gitpython" }, + { name = "jsonschema" }, + { name = "packaging" }, + { name = "python-dotenv" }, + { name = "python-slugify" }, + { name = "requests" }, + { name = "sseclient-py" }, + { name = "tqdm" }, + { name = "typing-extensions" }, + { name = "wrapt" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/27/53/eeb7c55712d5b1c5710b343b1eb77e8462bcf5c808a23025b91b4518eb8a/braintrust-0.12.1.tar.gz", hash = "sha256:0656adc9367a1c8f0f2338af48340e01fea35ff617ab815dd71761354a3b11ff", size = 458002, upload-time = "2026-04-02T17:29:31.341Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/c0/d8/145d2bc55a2d53203c8a60457e13ff67afcd9b9586c07b9ae1441169cc2e/braintrust-0.12.1-py3-none-any.whl", hash = "sha256:cfcf6b2a7ca818aa85f496b2dbfc1250b80b8ef3fb4486f22333eee026afe237", size = 531466, upload-time = "2026-04-02T17:29:29.424Z" }, +] + [[package]] name = "braintrust-migrate" version = "0.1.0" source = { editable = "." 
} dependencies = [ + { name = "braintrust" }, { name = "httpx" }, { name = "pydantic" }, { name = "python-dotenv" }, @@ -58,6 +91,7 @@ scripts = [ [package.metadata] requires-dist = [ + { name = "braintrust", specifier = "~=0.12.0" }, { name = "httpx", specifier = ">=0.25.0" }, { name = "mypy", marker = "extra == 'dev'", specifier = ">=1.7.0" }, { name = "pydantic", specifier = ">=2.5.0" }, @@ -88,6 +122,88 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/4a/7e/3db2bd1b1f9e95f7cddca6d6e75e2f2bd9f51b1246e546d88addca0106bd/certifi-2025.4.26-py3-none-any.whl", hash = "sha256:30350364dfe371162649852c63336a15c70c6510c2ad5015b21c2345311805f3", size = 159618, upload-time = "2025-04-26T02:12:27.662Z" }, ] +[[package]] +name = "charset-normalizer" +version = "3.4.7" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/e7/a1/67fe25fac3c7642725500a3f6cfe5821ad557c3abb11c9d20d12c7008d3e/charset_normalizer-3.4.7.tar.gz", hash = "sha256:ae89db9e5f98a11a4bf50407d4363e7b09b31e55bc117b4f7d80aab97ba009e5", size = 144271, upload-time = "2026-04-02T09:28:39.342Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/0c/eb/4fc8d0a7110eb5fc9cc161723a34a8a6c200ce3b4fbf681bc86feee22308/charset_normalizer-3.4.7-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:eca9705049ad3c7345d574e3510665cb2cf844c2f2dcfe675332677f081cbd46", size = 311328, upload-time = "2026-04-02T09:26:24.331Z" }, + { url = "https://files.pythonhosted.org/packages/f8/e3/0fadc706008ac9d7b9b5be6dc767c05f9d3e5df51744ce4cc9605de7b9f4/charset_normalizer-3.4.7-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:6178f72c5508bfc5fd446a5905e698c6212932f25bcdd4b47a757a50605a90e2", size = 208061, upload-time = "2026-04-02T09:26:25.568Z" }, + { url = 
"https://files.pythonhosted.org/packages/42/f0/3dd1045c47f4a4604df85ec18ad093912ae1344ac706993aff91d38773a2/charset_normalizer-3.4.7-cp312-cp312-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:e1421b502d83040e6d7fb2fb18dff63957f720da3d77b2fbd3187ceb63755d7b", size = 229031, upload-time = "2026-04-02T09:26:26.865Z" }, + { url = "https://files.pythonhosted.org/packages/dc/67/675a46eb016118a2fbde5a277a5d15f4f69d5f3f5f338e5ee2f8948fcf43/charset_normalizer-3.4.7-cp312-cp312-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:edac0f1ab77644605be2cbba52e6b7f630731fc42b34cb0f634be1a6eface56a", size = 225239, upload-time = "2026-04-02T09:26:28.044Z" }, + { url = "https://files.pythonhosted.org/packages/4b/f8/d0118a2f5f23b02cd166fa385c60f9b0d4f9194f574e2b31cef350ad7223/charset_normalizer-3.4.7-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:5649fd1c7bade02f320a462fdefd0b4bd3ce036065836d4f42e0de958038e116", size = 216589, upload-time = "2026-04-02T09:26:29.239Z" }, + { url = "https://files.pythonhosted.org/packages/b1/f1/6d2b0b261b6c4ceef0fcb0d17a01cc5bc53586c2d4796fa04b5c540bc13d/charset_normalizer-3.4.7-cp312-cp312-manylinux_2_31_armv7l.whl", hash = "sha256:203104ed3e428044fd943bc4bf45fa73c0730391f9621e37fe39ecf477b128cb", size = 202733, upload-time = "2026-04-02T09:26:30.5Z" }, + { url = "https://files.pythonhosted.org/packages/6f/c0/7b1f943f7e87cc3db9626ba17807d042c38645f0a1d4415c7a14afb5591f/charset_normalizer-3.4.7-cp312-cp312-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:298930cec56029e05497a76988377cbd7457ba864beeea92ad7e844fe74cd1f1", size = 212652, upload-time = "2026-04-02T09:26:31.709Z" }, + { url = "https://files.pythonhosted.org/packages/38/dd/5a9ab159fe45c6e72079398f277b7d2b523e7f716acc489726115a910097/charset_normalizer-3.4.7-cp312-cp312-musllinux_1_2_aarch64.whl", hash = 
"sha256:708838739abf24b2ceb208d0e22403dd018faeef86ddac04319a62ae884c4f15", size = 211229, upload-time = "2026-04-02T09:26:33.282Z" }, + { url = "https://files.pythonhosted.org/packages/d5/ff/531a1cad5ca855d1c1a8b69cb71abfd6d85c0291580146fda7c82857caa1/charset_normalizer-3.4.7-cp312-cp312-musllinux_1_2_armv7l.whl", hash = "sha256:0f7eb884681e3938906ed0434f20c63046eacd0111c4ba96f27b76084cd679f5", size = 203552, upload-time = "2026-04-02T09:26:34.845Z" }, + { url = "https://files.pythonhosted.org/packages/c1/4c/a5fb52d528a8ca41f7598cb619409ece30a169fbdf9cdce592e53b46c3a6/charset_normalizer-3.4.7-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:4dc1e73c36828f982bfe79fadf5919923f8a6f4df2860804db9a98c48824ce8d", size = 230806, upload-time = "2026-04-02T09:26:36.152Z" }, + { url = "https://files.pythonhosted.org/packages/59/7a/071feed8124111a32b316b33ae4de83d36923039ef8cf48120266844285b/charset_normalizer-3.4.7-cp312-cp312-musllinux_1_2_riscv64.whl", hash = "sha256:aed52fea0513bac0ccde438c188c8a471c4e0f457c2dd20cdbf6ea7a450046c7", size = 212316, upload-time = "2026-04-02T09:26:37.672Z" }, + { url = "https://files.pythonhosted.org/packages/fd/35/f7dba3994312d7ba508e041eaac39a36b120f32d4c8662b8814dab876431/charset_normalizer-3.4.7-cp312-cp312-musllinux_1_2_s390x.whl", hash = "sha256:fea24543955a6a729c45a73fe90e08c743f0b3334bbf3201e6c4bc1b0c7fa464", size = 227274, upload-time = "2026-04-02T09:26:38.93Z" }, + { url = "https://files.pythonhosted.org/packages/8a/2d/a572df5c9204ab7688ec1edc895a73ebded3b023bb07364710b05dd1c9be/charset_normalizer-3.4.7-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:bb6d88045545b26da47aa879dd4a89a71d1dce0f0e549b1abcb31dfe4a8eac49", size = 218468, upload-time = "2026-04-02T09:26:40.17Z" }, + { url = "https://files.pythonhosted.org/packages/86/eb/890922a8b03a568ca2f336c36585a4713c55d4d67bf0f0c78924be6315ca/charset_normalizer-3.4.7-cp312-cp312-win32.whl", hash = "sha256:2257141f39fe65a3fdf38aeccae4b953e5f3b3324f4ff0daf9f15b8518666a2c", 
size = 148460, upload-time = "2026-04-02T09:26:41.416Z" }, + { url = "https://files.pythonhosted.org/packages/35/d9/0e7dffa06c5ab081f75b1b786f0aefc88365825dfcd0ac544bdb7b2b6853/charset_normalizer-3.4.7-cp312-cp312-win_amd64.whl", hash = "sha256:5ed6ab538499c8644b8a3e18debabcd7ce684f3fa91cf867521a7a0279cab2d6", size = 159330, upload-time = "2026-04-02T09:26:42.554Z" }, + { url = "https://files.pythonhosted.org/packages/9e/5d/481bcc2a7c88ea6b0878c299547843b2521ccbc40980cb406267088bc701/charset_normalizer-3.4.7-cp312-cp312-win_arm64.whl", hash = "sha256:56be790f86bfb2c98fb742ce566dfb4816e5a83384616ab59c49e0604d49c51d", size = 147828, upload-time = "2026-04-02T09:26:44.075Z" }, + { url = "https://files.pythonhosted.org/packages/c1/3b/66777e39d3ae1ddc77ee606be4ec6d8cbd4c801f65e5a1b6f2b11b8346dd/charset_normalizer-3.4.7-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:f496c9c3cc02230093d8330875c4c3cdfc3b73612a5fd921c65d39cbcef08063", size = 309627, upload-time = "2026-04-02T09:26:45.198Z" }, + { url = "https://files.pythonhosted.org/packages/2e/4e/b7f84e617b4854ade48a1b7915c8ccfadeba444d2a18c291f696e37f0d3b/charset_normalizer-3.4.7-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:0ea948db76d31190bf08bd371623927ee1339d5f2a0b4b1b4a4439a65298703c", size = 207008, upload-time = "2026-04-02T09:26:46.824Z" }, + { url = "https://files.pythonhosted.org/packages/c4/bb/ec73c0257c9e11b268f018f068f5d00aa0ef8c8b09f7753ebd5f2880e248/charset_normalizer-3.4.7-cp313-cp313-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:a277ab8928b9f299723bc1a2dabb1265911b1a76341f90a510368ca44ad9ab66", size = 228303, upload-time = "2026-04-02T09:26:48.397Z" }, + { url = "https://files.pythonhosted.org/packages/85/fb/32d1f5033484494619f701e719429c69b766bfc4dbc61aa9e9c8c166528b/charset_normalizer-3.4.7-cp313-cp313-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = 
"sha256:3bec022aec2c514d9cf199522a802bd007cd588ab17ab2525f20f9c34d067c18", size = 224282, upload-time = "2026-04-02T09:26:49.684Z" }, + { url = "https://files.pythonhosted.org/packages/fa/07/330e3a0dda4c404d6da83b327270906e9654a24f6c546dc886a0eb0ffb23/charset_normalizer-3.4.7-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:e044c39e41b92c845bc815e5ae4230804e8e7bc29e399b0437d64222d92809dd", size = 215595, upload-time = "2026-04-02T09:26:50.915Z" }, + { url = "https://files.pythonhosted.org/packages/e3/7c/fc890655786e423f02556e0216d4b8c6bcb6bdfa890160dc66bf52dee468/charset_normalizer-3.4.7-cp313-cp313-manylinux_2_31_armv7l.whl", hash = "sha256:f495a1652cf3fbab2eb0639776dad966c2fb874d79d87ca07f9d5f059b8bd215", size = 201986, upload-time = "2026-04-02T09:26:52.197Z" }, + { url = "https://files.pythonhosted.org/packages/d8/97/bfb18b3db2aed3b90cf54dc292ad79fdd5ad65c4eae454099475cbeadd0d/charset_normalizer-3.4.7-cp313-cp313-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:e712b419df8ba5e42b226c510472b37bd57b38e897d3eca5e8cfd410a29fa859", size = 211711, upload-time = "2026-04-02T09:26:53.49Z" }, + { url = "https://files.pythonhosted.org/packages/6f/a5/a581c13798546a7fd557c82614a5c65a13df2157e9ad6373166d2a3e645d/charset_normalizer-3.4.7-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:7804338df6fcc08105c7745f1502ba68d900f45fd770d5bdd5288ddccb8a42d8", size = 210036, upload-time = "2026-04-02T09:26:54.975Z" }, + { url = "https://files.pythonhosted.org/packages/8c/bf/b3ab5bcb478e4193d517644b0fb2bf5497fbceeaa7a1bc0f4d5b50953861/charset_normalizer-3.4.7-cp313-cp313-musllinux_1_2_armv7l.whl", hash = "sha256:481551899c856c704d58119b5025793fa6730adda3571971af568f66d2424bb5", size = 202998, upload-time = "2026-04-02T09:26:56.303Z" }, + { url = 
"https://files.pythonhosted.org/packages/e7/4e/23efd79b65d314fa320ec6017b4b5834d5c12a58ba4610aa353af2e2f577/charset_normalizer-3.4.7-cp313-cp313-musllinux_1_2_ppc64le.whl", hash = "sha256:f59099f9b66f0d7145115e6f80dd8b1d847176df89b234a5a6b3f00437aa0832", size = 230056, upload-time = "2026-04-02T09:26:57.554Z" }, + { url = "https://files.pythonhosted.org/packages/b9/9f/1e1941bc3f0e01df116e68dc37a55c4d249df5e6fa77f008841aef68264f/charset_normalizer-3.4.7-cp313-cp313-musllinux_1_2_riscv64.whl", hash = "sha256:f59ad4c0e8f6bba240a9bb85504faa1ab438237199d4cce5f622761507b8f6a6", size = 211537, upload-time = "2026-04-02T09:26:58.843Z" }, + { url = "https://files.pythonhosted.org/packages/80/0f/088cbb3020d44428964a6c97fe1edfb1b9550396bf6d278330281e8b709c/charset_normalizer-3.4.7-cp313-cp313-musllinux_1_2_s390x.whl", hash = "sha256:3dedcc22d73ec993f42055eff4fcfed9318d1eeb9a6606c55892a26964964e48", size = 226176, upload-time = "2026-04-02T09:27:00.437Z" }, + { url = "https://files.pythonhosted.org/packages/6a/9f/130394f9bbe06f4f63e22641d32fc9b202b7e251c9aef4db044324dac493/charset_normalizer-3.4.7-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:64f02c6841d7d83f832cd97ccf8eb8a906d06eb95d5276069175c696b024b60a", size = 217723, upload-time = "2026-04-02T09:27:02.021Z" }, + { url = "https://files.pythonhosted.org/packages/73/55/c469897448a06e49f8fa03f6caae97074fde823f432a98f979cc42b90e69/charset_normalizer-3.4.7-cp313-cp313-win32.whl", hash = "sha256:4042d5c8f957e15221d423ba781e85d553722fc4113f523f2feb7b188cc34c5e", size = 148085, upload-time = "2026-04-02T09:27:03.192Z" }, + { url = "https://files.pythonhosted.org/packages/5d/78/1b74c5bbb3f99b77a1715c91b3e0b5bdb6fe302d95ace4f5b1bec37b0167/charset_normalizer-3.4.7-cp313-cp313-win_amd64.whl", hash = "sha256:3946fa46a0cf3e4c8cb1cc52f56bb536310d34f25f01ca9b6c16afa767dab110", size = 158819, upload-time = "2026-04-02T09:27:04.454Z" }, + { url = 
"https://files.pythonhosted.org/packages/68/86/46bd42279d323deb8687c4a5a811fd548cb7d1de10cf6535d099877a9a9f/charset_normalizer-3.4.7-cp313-cp313-win_arm64.whl", hash = "sha256:80d04837f55fc81da168b98de4f4b797ef007fc8a79ab71c6ec9bc4dd662b15b", size = 147915, upload-time = "2026-04-02T09:27:05.971Z" }, + { url = "https://files.pythonhosted.org/packages/97/c8/c67cb8c70e19ef1960b97b22ed2a1567711de46c4ddf19799923adc836c2/charset_normalizer-3.4.7-cp314-cp314-macosx_10_15_universal2.whl", hash = "sha256:c36c333c39be2dbca264d7803333c896ab8fa7d4d6f0ab7edb7dfd7aea6e98c0", size = 309234, upload-time = "2026-04-02T09:27:07.194Z" }, + { url = "https://files.pythonhosted.org/packages/99/85/c091fdee33f20de70d6c8b522743b6f831a2f1cd3ff86de4c6a827c48a76/charset_normalizer-3.4.7-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:1c2aed2e5e41f24ea8ef1590b8e848a79b56f3a5564a65ceec43c9d692dc7d8a", size = 208042, upload-time = "2026-04-02T09:27:08.749Z" }, + { url = "https://files.pythonhosted.org/packages/87/1c/ab2ce611b984d2fd5d86a5a8a19c1ae26acac6bad967da4967562c75114d/charset_normalizer-3.4.7-cp314-cp314-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:54523e136b8948060c0fa0bc7b1b50c32c186f2fceee897a495406bb6e311d2b", size = 228706, upload-time = "2026-04-02T09:27:09.951Z" }, + { url = "https://files.pythonhosted.org/packages/a8/29/2b1d2cb00bf085f59d29eb773ce58ec2d325430f8c216804a0a5cd83cbca/charset_normalizer-3.4.7-cp314-cp314-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:715479b9a2802ecac752a3b0efa2b0b60285cf962ee38414211abdfccc233b41", size = 224727, upload-time = "2026-04-02T09:27:11.175Z" }, + { url = "https://files.pythonhosted.org/packages/47/5c/032c2d5a07fe4d4855fea851209cca2b6f03ebeb6d4e3afdb3358386a684/charset_normalizer-3.4.7-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = 
"sha256:bd6c2a1c7573c64738d716488d2cdd3c00e340e4835707d8fdb8dc1a66ef164e", size = 215882, upload-time = "2026-04-02T09:27:12.446Z" }, + { url = "https://files.pythonhosted.org/packages/2c/c2/356065d5a8b78ed04499cae5f339f091946a6a74f91e03476c33f0ab7100/charset_normalizer-3.4.7-cp314-cp314-manylinux_2_31_armv7l.whl", hash = "sha256:c45e9440fb78f8ddabcf714b68f936737a121355bf59f3907f4e17721b9d1aae", size = 200860, upload-time = "2026-04-02T09:27:13.721Z" }, + { url = "https://files.pythonhosted.org/packages/0c/cd/a32a84217ced5039f53b29f460962abb2d4420def55afabe45b1c3c7483d/charset_normalizer-3.4.7-cp314-cp314-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:3534e7dcbdcf757da6b85a0bbf5b6868786d5982dd959b065e65481644817a18", size = 211564, upload-time = "2026-04-02T09:27:15.272Z" }, + { url = "https://files.pythonhosted.org/packages/44/86/58e6f13ce26cc3b8f4a36b94a0f22ae2f00a72534520f4ae6857c4b81f89/charset_normalizer-3.4.7-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:e8ac484bf18ce6975760921bb6148041faa8fef0547200386ea0b52b5d27bf7b", size = 211276, upload-time = "2026-04-02T09:27:16.834Z" }, + { url = "https://files.pythonhosted.org/packages/8f/fe/d17c32dc72e17e155e06883efa84514ca375f8a528ba2546bee73fc4df81/charset_normalizer-3.4.7-cp314-cp314-musllinux_1_2_armv7l.whl", hash = "sha256:a5fe03b42827c13cdccd08e6c0247b6a6d4b5e3cdc53fd1749f5896adcdc2356", size = 201238, upload-time = "2026-04-02T09:27:18.229Z" }, + { url = "https://files.pythonhosted.org/packages/6a/29/f33daa50b06525a237451cdb6c69da366c381a3dadcd833fa5676bc468b3/charset_normalizer-3.4.7-cp314-cp314-musllinux_1_2_ppc64le.whl", hash = "sha256:2d6eb928e13016cea4f1f21d1e10c1cebd5a421bc57ddf5b1142ae3f86824fab", size = 230189, upload-time = "2026-04-02T09:27:19.445Z" }, + { url = "https://files.pythonhosted.org/packages/b6/6e/52c84015394a6a0bdcd435210a7e944c5f94ea1055f5cc5d56c5fe368e7b/charset_normalizer-3.4.7-cp314-cp314-musllinux_1_2_riscv64.whl", hash = 
"sha256:e74327fb75de8986940def6e8dee4f127cc9752bee7355bb323cc5b2659b6d46", size = 211352, upload-time = "2026-04-02T09:27:20.79Z" }, + { url = "https://files.pythonhosted.org/packages/8c/d7/4353be581b373033fb9198bf1da3cf8f09c1082561e8e922aa7b39bf9fe8/charset_normalizer-3.4.7-cp314-cp314-musllinux_1_2_s390x.whl", hash = "sha256:d6038d37043bced98a66e68d3aa2b6a35505dc01328cd65217cefe82f25def44", size = 227024, upload-time = "2026-04-02T09:27:22.063Z" }, + { url = "https://files.pythonhosted.org/packages/30/45/99d18aa925bd1740098ccd3060e238e21115fffbfdcb8f3ece837d0ace6c/charset_normalizer-3.4.7-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:7579e913a5339fb8fa133f6bbcfd8e6749696206cf05acdbdca71a1b436d8e72", size = 217869, upload-time = "2026-04-02T09:27:23.486Z" }, + { url = "https://files.pythonhosted.org/packages/5c/05/5ee478aa53f4bb7996482153d4bfe1b89e0f087f0ab6b294fcf92d595873/charset_normalizer-3.4.7-cp314-cp314-win32.whl", hash = "sha256:5b77459df20e08151cd6f8b9ef8ef1f961ef73d85c21a555c7eed5b79410ec10", size = 148541, upload-time = "2026-04-02T09:27:25.146Z" }, + { url = "https://files.pythonhosted.org/packages/48/77/72dcb0921b2ce86420b2d79d454c7022bf5be40202a2a07906b9f2a35c97/charset_normalizer-3.4.7-cp314-cp314-win_amd64.whl", hash = "sha256:92a0a01ead5e668468e952e4238cccd7c537364eb7d851ab144ab6627dbbe12f", size = 159634, upload-time = "2026-04-02T09:27:26.642Z" }, + { url = "https://files.pythonhosted.org/packages/c6/a3/c2369911cd72f02386e4e340770f6e158c7980267da16af8f668217abaa0/charset_normalizer-3.4.7-cp314-cp314-win_arm64.whl", hash = "sha256:67f6279d125ca0046a7fd386d01b311c6363844deac3e5b069b514ba3e63c246", size = 148384, upload-time = "2026-04-02T09:27:28.271Z" }, + { url = "https://files.pythonhosted.org/packages/94/09/7e8a7f73d24dba1f0035fbbf014d2c36828fc1bf9c88f84093e57d315935/charset_normalizer-3.4.7-cp314-cp314t-macosx_10_15_universal2.whl", hash = "sha256:effc3f449787117233702311a1b7d8f59cba9ced946ba727bdc329ec69028e24", size = 330133, 
upload-time = "2026-04-02T09:27:29.474Z" }, + { url = "https://files.pythonhosted.org/packages/8d/da/96975ddb11f8e977f706f45cddd8540fd8242f71ecdb5d18a80723dcf62c/charset_normalizer-3.4.7-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:fbccdc05410c9ee21bbf16a35f4c1d16123dcdeb8a1d38f33654fa21d0234f79", size = 216257, upload-time = "2026-04-02T09:27:30.793Z" }, + { url = "https://files.pythonhosted.org/packages/e5/e8/1d63bf8ef2d388e95c64b2098f45f84758f6d102a087552da1485912637b/charset_normalizer-3.4.7-cp314-cp314t-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:733784b6d6def852c814bce5f318d25da2ee65dd4839a0718641c696e09a2960", size = 234851, upload-time = "2026-04-02T09:27:32.44Z" }, + { url = "https://files.pythonhosted.org/packages/9b/40/e5ff04233e70da2681fa43969ad6f66ca5611d7e669be0246c4c7aaf6dc8/charset_normalizer-3.4.7-cp314-cp314t-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:a89c23ef8d2c6b27fd200a42aa4ac72786e7c60d40efdc76e6011260b6e949c4", size = 233393, upload-time = "2026-04-02T09:27:34.03Z" }, + { url = "https://files.pythonhosted.org/packages/be/c1/06c6c49d5a5450f76899992f1ee40b41d076aee9279b49cf9974d2f313d5/charset_normalizer-3.4.7-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:6c114670c45346afedc0d947faf3c7f701051d2518b943679c8ff88befe14f8e", size = 223251, upload-time = "2026-04-02T09:27:35.369Z" }, + { url = "https://files.pythonhosted.org/packages/2b/9f/f2ff16fb050946169e3e1f82134d107e5d4ae72647ec8a1b1446c148480f/charset_normalizer-3.4.7-cp314-cp314t-manylinux_2_31_armv7l.whl", hash = "sha256:a180c5e59792af262bf263b21a3c49353f25945d8d9f70628e73de370d55e1e1", size = 206609, upload-time = "2026-04-02T09:27:36.661Z" }, + { url = 
"https://files.pythonhosted.org/packages/69/d5/a527c0cd8d64d2eab7459784fb4169a0ac76e5a6fc5237337982fd61347e/charset_normalizer-3.4.7-cp314-cp314t-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:3c9a494bc5ec77d43cea229c4f6db1e4d8fe7e1bbffa8b6f0f0032430ff8ab44", size = 220014, upload-time = "2026-04-02T09:27:38.019Z" }, + { url = "https://files.pythonhosted.org/packages/7e/80/8a7b8104a3e203074dc9aa2c613d4b726c0e136bad1cc734594b02867972/charset_normalizer-3.4.7-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:8d828b6667a32a728a1ad1d93957cdf37489c57b97ae6c4de2860fa749b8fc1e", size = 218979, upload-time = "2026-04-02T09:27:39.37Z" }, + { url = "https://files.pythonhosted.org/packages/02/9a/b759b503d507f375b2b5c153e4d2ee0a75aa215b7f2489cf314f4541f2c0/charset_normalizer-3.4.7-cp314-cp314t-musllinux_1_2_armv7l.whl", hash = "sha256:cf1493cd8607bec4d8a7b9b004e699fcf8f9103a9284cc94962cb73d20f9d4a3", size = 209238, upload-time = "2026-04-02T09:27:40.722Z" }, + { url = "https://files.pythonhosted.org/packages/c2/4e/0f3f5d47b86bdb79256e7290b26ac847a2832d9a4033f7eb2cd4bcf4bb5b/charset_normalizer-3.4.7-cp314-cp314t-musllinux_1_2_ppc64le.whl", hash = "sha256:0c96c3b819b5c3e9e165495db84d41914d6894d55181d2d108cc1a69bfc9cce0", size = 236110, upload-time = "2026-04-02T09:27:42.33Z" }, + { url = "https://files.pythonhosted.org/packages/96/23/bce28734eb3ed2c91dcf93abeb8a5cf393a7b2749725030bb630e554fdd8/charset_normalizer-3.4.7-cp314-cp314t-musllinux_1_2_riscv64.whl", hash = "sha256:752a45dc4a6934060b3b0dab47e04edc3326575f82be64bc4fc293914566503e", size = 219824, upload-time = "2026-04-02T09:27:43.924Z" }, + { url = "https://files.pythonhosted.org/packages/2c/6f/6e897c6984cc4d41af319b077f2f600fc8214eb2fe2d6bcb79141b882400/charset_normalizer-3.4.7-cp314-cp314t-musllinux_1_2_s390x.whl", hash = "sha256:8778f0c7a52e56f75d12dae53ae320fae900a8b9b4164b981b9c5ce059cd1fcb", size = 233103, upload-time = "2026-04-02T09:27:45.348Z" }, + { url = 
"https://files.pythonhosted.org/packages/76/22/ef7bd0fe480a0ae9b656189ec00744b60933f68b4f42a7bb06589f6f576a/charset_normalizer-3.4.7-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:ce3412fbe1e31eb81ea42f4169ed94861c56e643189e1e75f0041f3fe7020abe", size = 225194, upload-time = "2026-04-02T09:27:46.706Z" }, + { url = "https://files.pythonhosted.org/packages/c5/a7/0e0ab3e0b5bc1219bd80a6a0d4d72ca74d9250cb2382b7c699c147e06017/charset_normalizer-3.4.7-cp314-cp314t-win32.whl", hash = "sha256:c03a41a8784091e67a39648f70c5f97b5b6a37f216896d44d2cdcb82615339a0", size = 159827, upload-time = "2026-04-02T09:27:48.053Z" }, + { url = "https://files.pythonhosted.org/packages/7a/1d/29d32e0fb40864b1f878c7f5a0b343ae676c6e2b271a2d55cc3a152391da/charset_normalizer-3.4.7-cp314-cp314t-win_amd64.whl", hash = "sha256:03853ed82eeebbce3c2abfdbc98c96dc205f32a79627688ac9a27370ea61a49c", size = 174168, upload-time = "2026-04-02T09:27:49.795Z" }, + { url = "https://files.pythonhosted.org/packages/de/32/d92444ad05c7a6e41fb2036749777c163baf7a0301a040cb672d6b2b1ae9/charset_normalizer-3.4.7-cp314-cp314t-win_arm64.whl", hash = "sha256:c35abb8bfff0185efac5878da64c45dafd2b37fb0383add1be155a763c1f083d", size = 153018, upload-time = "2026-04-02T09:27:51.116Z" }, + { url = "https://files.pythonhosted.org/packages/db/8f/61959034484a4a7c527811f4721e75d02d653a35afb0b6054474d8185d4c/charset_normalizer-3.4.7-py3-none-any.whl", hash = "sha256:3dce51d0f5e7951f8bb4900c257dad282f49190fdbebecd4ba99bcc41fef404d", size = 61958, upload-time = "2026-04-02T09:28:37.794Z" }, +] + +[[package]] +name = "chevron" +version = "0.14.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/15/1f/ca74b65b19798895d63a6e92874162f44233467c9e7c1ed8afd19016ebe9/chevron-0.14.0.tar.gz", hash = "sha256:87613aafdf6d77b6a90ff073165a61ae5086e21ad49057aa0e53681601800ebf", size = 11440, upload-time = "2021-01-02T22:47:59.233Z" } +wheels = [ + { url = 
"https://files.pythonhosted.org/packages/52/93/342cc62a70ab727e093ed98e02a725d85b746345f05d2b5e5034649f4ec8/chevron-0.14.0-py3-none-any.whl", hash = "sha256:fbf996a709f8da2e745ef763f482ce2d311aa817d287593a5b990d6d6e4f0443", size = 11595, upload-time = "2021-01-02T22:47:57.847Z" }, +] + [[package]] name = "click" version = "8.2.1" @@ -163,6 +279,42 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/f7/e6/efe534ef0952b531b630780e19cabd416e2032697019d5295defc6ef9bd9/deepdiff-8.6.1-py3-none-any.whl", hash = "sha256:ee8708a7f7d37fb273a541fa24ad010ed484192cd0c4ffc0fa0ed5e2d4b9e78b", size = 91378, upload-time = "2025-09-03T19:40:39.679Z" }, ] +[[package]] +name = "exceptiongroup" +version = "1.3.1" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "typing-extensions", marker = "python_full_version < '3.13'" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/50/79/66800aadf48771f6b62f7eb014e352e5d06856655206165d775e675a02c9/exceptiongroup-1.3.1.tar.gz", hash = "sha256:8b412432c6055b0b7d14c310000ae93352ed6754f70fa8f7c34141f91c4e3219", size = 30371, upload-time = "2025-11-21T23:01:54.787Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/8a/0e/97c33bf5009bdbac74fd2beace167cab3f978feb69cc36f1ef79360d6c4e/exceptiongroup-1.3.1-py3-none-any.whl", hash = "sha256:a7a39a3bd276781e98394987d3a5701d0c4edffb633bb7a5144577f82c773598", size = 16740, upload-time = "2025-11-21T23:01:53.443Z" }, +] + +[[package]] +name = "gitdb" +version = "4.0.12" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "smmap" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/72/94/63b0fc47eb32792c7ba1fe1b694daec9a63620db1e313033d18140c2320a/gitdb-4.0.12.tar.gz", hash = "sha256:5ef71f855d191a3326fcfbc0d5da835f26b13fbcba60c32c21091c349ffdb571", size = 394684, upload-time = "2025-01-02T07:20:46.413Z" } +wheels = [ + { url = 
"https://files.pythonhosted.org/packages/a0/61/5c78b91c3143ed5c14207f463aecfc8f9dbb5092fb2869baf37c273b2705/gitdb-4.0.12-py3-none-any.whl", hash = "sha256:67073e15955400952c6565cc3e707c554a4eea2e428946f7a4c162fab9bd9bcf", size = 62794, upload-time = "2025-01-02T07:20:43.624Z" }, +] + +[[package]] +name = "gitpython" +version = "3.1.46" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "gitdb" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/df/b5/59d16470a1f0dfe8c793f9ef56fd3826093fc52b3bd96d6b9d6c26c7e27b/gitpython-3.1.46.tar.gz", hash = "sha256:400124c7d0ef4ea03f7310ac2fbf7151e09ff97f2a3288d64a440c584a29c37f", size = 215371, upload-time = "2026-01-01T15:37:32.073Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/6a/09/e21df6aef1e1ffc0c816f0522ddc3f6dcded766c3261813131c78a704470/gitpython-3.1.46-py3-none-any.whl", hash = "sha256:79812ed143d9d25b6d176a10bb511de0f9c67b1fa641d82097b0ab90398a2058", size = 208620, upload-time = "2026-01-01T15:37:30.574Z" }, +] + [[package]] name = "h11" version = "0.16.0" @@ -218,6 +370,33 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/2c/e1/e6716421ea10d38022b952c159d5161ca1193197fb744506875fbb87ea7b/iniconfig-2.1.0-py3-none-any.whl", hash = "sha256:9deba5723312380e77435581c6bf4935c94cbfab9b1ed33ef8d238ea168eb760", size = 6050, upload-time = "2025-03-19T20:10:01.071Z" }, ] +[[package]] +name = "jsonschema" +version = "4.26.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "attrs" }, + { name = "jsonschema-specifications" }, + { name = "referencing" }, + { name = "rpds-py" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/b3/fc/e067678238fa451312d4c62bf6e6cf5ec56375422aee02f9cb5f909b3047/jsonschema-4.26.0.tar.gz", hash = "sha256:0c26707e2efad8aa1bfc5b7ce170f3fccc2e4918ff85989ba9ffa9facb2be326", size = 366583, upload-time = "2026-01-07T13:41:07.246Z" } +wheels = [ + { url = 
"https://files.pythonhosted.org/packages/69/90/f63fb5873511e014207a475e2bb4e8b2e570d655b00ac19a9a0ca0a385ee/jsonschema-4.26.0-py3-none-any.whl", hash = "sha256:d489f15263b8d200f8387e64b4c3a75f06629559fb73deb8fdfb525f2dab50ce", size = 90630, upload-time = "2026-01-07T13:41:05.306Z" }, +] + +[[package]] +name = "jsonschema-specifications" +version = "2025.9.1" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "referencing" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/19/74/a633ee74eb36c44aa6d1095e7cc5569bebf04342ee146178e2d36600708b/jsonschema_specifications-2025.9.1.tar.gz", hash = "sha256:b540987f239e745613c7a9176f3edb72b832a4ac465cf02712288397832b5e8d", size = 32855, upload-time = "2025-09-08T01:34:59.186Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/41/45/1a4ed80516f02155c51f51e8cedb3c1902296743db0bbc66608a0db2814f/jsonschema_specifications-2025.9.1-py3-none-any.whl", hash = "sha256:98802fee3a11ee76ecaca44429fda8a41bff98b00a0f2838151b113f210cc6fe", size = 18437, upload-time = "2025-09-08T01:34:57.871Z" }, +] + [[package]] name = "markdown-it-py" version = "3.0.0" @@ -415,6 +594,18 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/1e/18/98a99ad95133c6a6e2005fe89faedf294a748bd5dc803008059409ac9b1e/python_dotenv-1.1.0-py3-none-any.whl", hash = "sha256:d7c01d9e2293916c18baf562d95698754b0dbbb5e74d457c45d4f6561fb9d55d", size = 20256, upload-time = "2025-03-25T10:14:55.034Z" }, ] +[[package]] +name = "python-slugify" +version = "8.0.4" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "text-unidecode" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/87/c7/5e1547c44e31da50a460df93af11a535ace568ef89d7a811069ead340c4a/python-slugify-8.0.4.tar.gz", hash = "sha256:59202371d1d05b54a9e7720c5e038f928f45daaffe41dd10822f3907b937c856", size = 10921, upload-time = "2024-02-08T18:32:45.488Z" } +wheels = [ + { url = 
"https://files.pythonhosted.org/packages/a4/62/02da182e544a51a5c3ccf4b03ab79df279f9c60c5e82d5e8bec7ca26ac11/python_slugify-8.0.4-py2.py3-none-any.whl", hash = "sha256:276540b79961052b66b7d116620b36518847f52d5fd9e3a70164fc8c50faa6b8", size = 10051, upload-time = "2024-02-08T18:32:43.911Z" }, +] + [[package]] name = "pyyaml" version = "6.0.2" @@ -441,6 +632,35 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/fa/de/02b54f42487e3d3c6efb3f89428677074ca7bf43aae402517bc7cca949f3/PyYAML-6.0.2-cp313-cp313-win_amd64.whl", hash = "sha256:8388ee1976c416731879ac16da0aff3f63b286ffdd57cdeb95f3f2e085687563", size = 156446, upload-time = "2024-08-06T20:33:04.33Z" }, ] +[[package]] +name = "referencing" +version = "0.37.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "attrs" }, + { name = "rpds-py" }, + { name = "typing-extensions", marker = "python_full_version < '3.13'" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/22/f5/df4e9027acead3ecc63e50fe1e36aca1523e1719559c499951bb4b53188f/referencing-0.37.0.tar.gz", hash = "sha256:44aefc3142c5b842538163acb373e24cce6632bd54bdb01b21ad5863489f50d8", size = 78036, upload-time = "2025-10-13T15:30:48.871Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/2c/58/ca301544e1fa93ed4f80d724bf5b194f6e4b945841c5bfd555878eea9fcb/referencing-0.37.0-py3-none-any.whl", hash = "sha256:381329a9f99628c9069361716891d34ad94af76e461dcb0335825aecc7692231", size = 26766, upload-time = "2025-10-13T15:30:47.625Z" }, +] + +[[package]] +name = "requests" +version = "2.33.1" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "certifi" }, + { name = "charset-normalizer" }, + { name = "idna" }, + { name = "urllib3" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/5f/a4/98b9c7c6428a668bf7e42ebb7c79d576a1c3c1e3ae2d47e674b468388871/requests-2.33.1.tar.gz", hash = "sha256:18817f8c57c6263968bc123d237e3b8b08ac046f5456bd1e307ee8f4250d3517", size = 
134120, upload-time = "2026-03-30T16:09:15.531Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/d7/8e/7540e8a2036f79a125c1d2ebadf69ed7901608859186c856fa0388ef4197/requests-2.33.1-py3-none-any.whl", hash = "sha256:4e6d1ef462f3626a1f0a0a9c42dd93c63bad33f9f1c1937509b8c5c8718ab56a", size = 64947, upload-time = "2026-03-30T16:09:13.83Z" }, +] + [[package]] name = "respx" version = "0.22.0" @@ -466,6 +686,87 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/0d/9b/63f4c7ebc259242c89b3acafdb37b41d1185c07ff0011164674e9076b491/rich-14.0.0-py3-none-any.whl", hash = "sha256:1c9491e1951aac09caffd42f448ee3d04e58923ffe14993f6e83068dc395d7e0", size = 243229, upload-time = "2025-03-30T14:15:12.283Z" }, ] +[[package]] +name = "rpds-py" +version = "0.30.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/20/af/3f2f423103f1113b36230496629986e0ef7e199d2aa8392452b484b38ced/rpds_py-0.30.0.tar.gz", hash = "sha256:dd8ff7cf90014af0c0f787eea34794ebf6415242ee1d6fa91eaba725cc441e84", size = 69469, upload-time = "2025-11-30T20:24:38.837Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/03/e7/98a2f4ac921d82f33e03f3835f5bf3a4a40aa1bfdc57975e74a97b2b4bdd/rpds_py-0.30.0-cp312-cp312-macosx_10_12_x86_64.whl", hash = "sha256:a161f20d9a43006833cd7068375a94d035714d73a172b681d8881820600abfad", size = 375086, upload-time = "2025-11-30T20:22:17.93Z" }, + { url = "https://files.pythonhosted.org/packages/4d/a1/bca7fd3d452b272e13335db8d6b0b3ecde0f90ad6f16f3328c6fb150c889/rpds_py-0.30.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:6abc8880d9d036ecaafe709079969f56e876fcf107f7a8e9920ba6d5a3878d05", size = 359053, upload-time = "2025-11-30T20:22:19.297Z" }, + { url = "https://files.pythonhosted.org/packages/65/1c/ae157e83a6357eceff62ba7e52113e3ec4834a84cfe07fa4b0757a7d105f/rpds_py-0.30.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = 
"sha256:ca28829ae5f5d569bb62a79512c842a03a12576375d5ece7d2cadf8abe96ec28", size = 390763, upload-time = "2025-11-30T20:22:21.661Z" }, + { url = "https://files.pythonhosted.org/packages/d4/36/eb2eb8515e2ad24c0bd43c3ee9cd74c33f7ca6430755ccdb240fd3144c44/rpds_py-0.30.0-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:a1010ed9524c73b94d15919ca4d41d8780980e1765babf85f9a2f90d247153dd", size = 408951, upload-time = "2025-11-30T20:22:23.408Z" }, + { url = "https://files.pythonhosted.org/packages/d6/65/ad8dc1784a331fabbd740ef6f71ce2198c7ed0890dab595adb9ea2d775a1/rpds_py-0.30.0-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:f8d1736cfb49381ba528cd5baa46f82fdc65c06e843dab24dd70b63d09121b3f", size = 514622, upload-time = "2025-11-30T20:22:25.16Z" }, + { url = "https://files.pythonhosted.org/packages/63/8e/0cfa7ae158e15e143fe03993b5bcd743a59f541f5952e1546b1ac1b5fd45/rpds_py-0.30.0-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:d948b135c4693daff7bc2dcfc4ec57237a29bd37e60c2fabf5aff2bbacf3e2f1", size = 414492, upload-time = "2025-11-30T20:22:26.505Z" }, + { url = "https://files.pythonhosted.org/packages/60/1b/6f8f29f3f995c7ffdde46a626ddccd7c63aefc0efae881dc13b6e5d5bb16/rpds_py-0.30.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:47f236970bccb2233267d89173d3ad2703cd36a0e2a6e92d0560d333871a3d23", size = 394080, upload-time = "2025-11-30T20:22:27.934Z" }, + { url = "https://files.pythonhosted.org/packages/6d/d5/a266341051a7a3ca2f4b750a3aa4abc986378431fc2da508c5034d081b70/rpds_py-0.30.0-cp312-cp312-manylinux_2_31_riscv64.whl", hash = "sha256:2e6ecb5a5bcacf59c3f912155044479af1d0b6681280048b338b28e364aca1f6", size = 408680, upload-time = "2025-11-30T20:22:29.341Z" }, + { url = "https://files.pythonhosted.org/packages/10/3b/71b725851df9ab7a7a4e33cf36d241933da66040d195a84781f49c50490c/rpds_py-0.30.0-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.whl", hash = 
"sha256:a8fa71a2e078c527c3e9dc9fc5a98c9db40bcc8a92b4e8858e36d329f8684b51", size = 423589, upload-time = "2025-11-30T20:22:31.469Z" }, + { url = "https://files.pythonhosted.org/packages/00/2b/e59e58c544dc9bd8bd8384ecdb8ea91f6727f0e37a7131baeff8d6f51661/rpds_py-0.30.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:73c67f2db7bc334e518d097c6d1e6fed021bbc9b7d678d6cc433478365d1d5f5", size = 573289, upload-time = "2025-11-30T20:22:32.997Z" }, + { url = "https://files.pythonhosted.org/packages/da/3e/a18e6f5b460893172a7d6a680e86d3b6bc87a54c1f0b03446a3c8c7b588f/rpds_py-0.30.0-cp312-cp312-musllinux_1_2_i686.whl", hash = "sha256:5ba103fb455be00f3b1c2076c9d4264bfcb037c976167a6047ed82f23153f02e", size = 599737, upload-time = "2025-11-30T20:22:34.419Z" }, + { url = "https://files.pythonhosted.org/packages/5c/e2/714694e4b87b85a18e2c243614974413c60aa107fd815b8cbc42b873d1d7/rpds_py-0.30.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:7cee9c752c0364588353e627da8a7e808a66873672bcb5f52890c33fd965b394", size = 563120, upload-time = "2025-11-30T20:22:35.903Z" }, + { url = "https://files.pythonhosted.org/packages/6f/ab/d5d5e3bcedb0a77f4f613706b750e50a5a3ba1c15ccd3665ecc636c968fd/rpds_py-0.30.0-cp312-cp312-win32.whl", hash = "sha256:1ab5b83dbcf55acc8b08fc62b796ef672c457b17dbd7820a11d6c52c06839bdf", size = 223782, upload-time = "2025-11-30T20:22:37.271Z" }, + { url = "https://files.pythonhosted.org/packages/39/3b/f786af9957306fdc38a74cef405b7b93180f481fb48453a114bb6465744a/rpds_py-0.30.0-cp312-cp312-win_amd64.whl", hash = "sha256:a090322ca841abd453d43456ac34db46e8b05fd9b3b4ac0c78bcde8b089f959b", size = 240463, upload-time = "2025-11-30T20:22:39.021Z" }, + { url = "https://files.pythonhosted.org/packages/f3/d2/b91dc748126c1559042cfe41990deb92c4ee3e2b415f6b5234969ffaf0cc/rpds_py-0.30.0-cp312-cp312-win_arm64.whl", hash = "sha256:669b1805bd639dd2989b281be2cfd951c6121b65e729d9b843e9639ef1fd555e", size = 230868, upload-time = "2025-11-30T20:22:40.493Z" }, + { url = 
"https://files.pythonhosted.org/packages/ed/dc/d61221eb88ff410de3c49143407f6f3147acf2538c86f2ab7ce65ae7d5f9/rpds_py-0.30.0-cp313-cp313-macosx_10_12_x86_64.whl", hash = "sha256:f83424d738204d9770830d35290ff3273fbb02b41f919870479fab14b9d303b2", size = 374887, upload-time = "2025-11-30T20:22:41.812Z" }, + { url = "https://files.pythonhosted.org/packages/fd/32/55fb50ae104061dbc564ef15cc43c013dc4a9f4527a1f4d99baddf56fe5f/rpds_py-0.30.0-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:e7536cd91353c5273434b4e003cbda89034d67e7710eab8761fd918ec6c69cf8", size = 358904, upload-time = "2025-11-30T20:22:43.479Z" }, + { url = "https://files.pythonhosted.org/packages/58/70/faed8186300e3b9bdd138d0273109784eea2396c68458ed580f885dfe7ad/rpds_py-0.30.0-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:2771c6c15973347f50fece41fc447c054b7ac2ae0502388ce3b6738cd366e3d4", size = 389945, upload-time = "2025-11-30T20:22:44.819Z" }, + { url = "https://files.pythonhosted.org/packages/bd/a8/073cac3ed2c6387df38f71296d002ab43496a96b92c823e76f46b8af0543/rpds_py-0.30.0-cp313-cp313-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:0a59119fc6e3f460315fe9d08149f8102aa322299deaa5cab5b40092345c2136", size = 407783, upload-time = "2025-11-30T20:22:46.103Z" }, + { url = "https://files.pythonhosted.org/packages/77/57/5999eb8c58671f1c11eba084115e77a8899d6e694d2a18f69f0ba471ec8b/rpds_py-0.30.0-cp313-cp313-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:76fec018282b4ead0364022e3c54b60bf368b9d926877957a8624b58419169b7", size = 515021, upload-time = "2025-11-30T20:22:47.458Z" }, + { url = "https://files.pythonhosted.org/packages/e0/af/5ab4833eadc36c0a8ed2bc5c0de0493c04f6c06de223170bd0798ff98ced/rpds_py-0.30.0-cp313-cp313-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:692bef75a5525db97318e8cd061542b5a79812d711ea03dbc1f6f8dbb0c5f0d2", size = 414589, upload-time = "2025-11-30T20:22:48.872Z" }, + { url = 
"https://files.pythonhosted.org/packages/b7/de/f7192e12b21b9e9a68a6d0f249b4af3fdcdff8418be0767a627564afa1f1/rpds_py-0.30.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:9027da1ce107104c50c81383cae773ef5c24d296dd11c99e2629dbd7967a20c6", size = 394025, upload-time = "2025-11-30T20:22:50.196Z" }, + { url = "https://files.pythonhosted.org/packages/91/c4/fc70cd0249496493500e7cc2de87504f5aa6509de1e88623431fec76d4b6/rpds_py-0.30.0-cp313-cp313-manylinux_2_31_riscv64.whl", hash = "sha256:9cf69cdda1f5968a30a359aba2f7f9aa648a9ce4b580d6826437f2b291cfc86e", size = 408895, upload-time = "2025-11-30T20:22:51.87Z" }, + { url = "https://files.pythonhosted.org/packages/58/95/d9275b05ab96556fefff73a385813eb66032e4c99f411d0795372d9abcea/rpds_py-0.30.0-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:a4796a717bf12b9da9d3ad002519a86063dcac8988b030e405704ef7d74d2d9d", size = 422799, upload-time = "2025-11-30T20:22:53.341Z" }, + { url = "https://files.pythonhosted.org/packages/06/c1/3088fc04b6624eb12a57eb814f0d4997a44b0d208d6cace713033ff1a6ba/rpds_py-0.30.0-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:5d4c2aa7c50ad4728a094ebd5eb46c452e9cb7edbfdb18f9e1221f597a73e1e7", size = 572731, upload-time = "2025-11-30T20:22:54.778Z" }, + { url = "https://files.pythonhosted.org/packages/d8/42/c612a833183b39774e8ac8fecae81263a68b9583ee343db33ab571a7ce55/rpds_py-0.30.0-cp313-cp313-musllinux_1_2_i686.whl", hash = "sha256:ba81a9203d07805435eb06f536d95a266c21e5b2dfbf6517748ca40c98d19e31", size = 599027, upload-time = "2025-11-30T20:22:56.212Z" }, + { url = "https://files.pythonhosted.org/packages/5f/60/525a50f45b01d70005403ae0e25f43c0384369ad24ffe46e8d9068b50086/rpds_py-0.30.0-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:945dccface01af02675628334f7cf49c2af4c1c904748efc5cf7bbdf0b579f95", size = 563020, upload-time = "2025-11-30T20:22:58.2Z" }, + { url = 
"https://files.pythonhosted.org/packages/0b/5d/47c4655e9bcd5ca907148535c10e7d489044243cc9941c16ed7cd53be91d/rpds_py-0.30.0-cp313-cp313-win32.whl", hash = "sha256:b40fb160a2db369a194cb27943582b38f79fc4887291417685f3ad693c5a1d5d", size = 223139, upload-time = "2025-11-30T20:23:00.209Z" }, + { url = "https://files.pythonhosted.org/packages/f2/e1/485132437d20aa4d3e1d8b3fb5a5e65aa8139f1e097080c2a8443201742c/rpds_py-0.30.0-cp313-cp313-win_amd64.whl", hash = "sha256:806f36b1b605e2d6a72716f321f20036b9489d29c51c91f4dd29a3e3afb73b15", size = 240224, upload-time = "2025-11-30T20:23:02.008Z" }, + { url = "https://files.pythonhosted.org/packages/24/95/ffd128ed1146a153d928617b0ef673960130be0009c77d8fbf0abe306713/rpds_py-0.30.0-cp313-cp313-win_arm64.whl", hash = "sha256:d96c2086587c7c30d44f31f42eae4eac89b60dabbac18c7669be3700f13c3ce1", size = 230645, upload-time = "2025-11-30T20:23:03.43Z" }, + { url = "https://files.pythonhosted.org/packages/ff/1b/b10de890a0def2a319a2626334a7f0ae388215eb60914dbac8a3bae54435/rpds_py-0.30.0-cp313-cp313t-macosx_10_12_x86_64.whl", hash = "sha256:eb0b93f2e5c2189ee831ee43f156ed34e2a89a78a66b98cadad955972548be5a", size = 364443, upload-time = "2025-11-30T20:23:04.878Z" }, + { url = "https://files.pythonhosted.org/packages/0d/bf/27e39f5971dc4f305a4fb9c672ca06f290f7c4e261c568f3dea16a410d47/rpds_py-0.30.0-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:922e10f31f303c7c920da8981051ff6d8c1a56207dbdf330d9047f6d30b70e5e", size = 353375, upload-time = "2025-11-30T20:23:06.342Z" }, + { url = "https://files.pythonhosted.org/packages/40/58/442ada3bba6e8e6615fc00483135c14a7538d2ffac30e2d933ccf6852232/rpds_py-0.30.0-cp313-cp313t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:cdc62c8286ba9bf7f47befdcea13ea0e26bf294bda99758fd90535cbaf408000", size = 383850, upload-time = "2025-11-30T20:23:07.825Z" }, + { url = 
"https://files.pythonhosted.org/packages/14/14/f59b0127409a33c6ef6f5c1ebd5ad8e32d7861c9c7adfa9a624fc3889f6c/rpds_py-0.30.0-cp313-cp313t-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:47f9a91efc418b54fb8190a6b4aa7813a23fb79c51f4bb84e418f5476c38b8db", size = 392812, upload-time = "2025-11-30T20:23:09.228Z" }, + { url = "https://files.pythonhosted.org/packages/b3/66/e0be3e162ac299b3a22527e8913767d869e6cc75c46bd844aa43fb81ab62/rpds_py-0.30.0-cp313-cp313t-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:1f3587eb9b17f3789ad50824084fa6f81921bbf9a795826570bda82cb3ed91f2", size = 517841, upload-time = "2025-11-30T20:23:11.186Z" }, + { url = "https://files.pythonhosted.org/packages/3d/55/fa3b9cf31d0c963ecf1ba777f7cf4b2a2c976795ac430d24a1f43d25a6ba/rpds_py-0.30.0-cp313-cp313t-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:39c02563fc592411c2c61d26b6c5fe1e51eaa44a75aa2c8735ca88b0d9599daa", size = 408149, upload-time = "2025-11-30T20:23:12.864Z" }, + { url = "https://files.pythonhosted.org/packages/60/ca/780cf3b1a32b18c0f05c441958d3758f02544f1d613abf9488cd78876378/rpds_py-0.30.0-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:51a1234d8febafdfd33a42d97da7a43f5dcb120c1060e352a3fbc0c6d36e2083", size = 383843, upload-time = "2025-11-30T20:23:14.638Z" }, + { url = "https://files.pythonhosted.org/packages/82/86/d5f2e04f2aa6247c613da0c1dd87fcd08fa17107e858193566048a1e2f0a/rpds_py-0.30.0-cp313-cp313t-manylinux_2_31_riscv64.whl", hash = "sha256:eb2c4071ab598733724c08221091e8d80e89064cd472819285a9ab0f24bcedb9", size = 396507, upload-time = "2025-11-30T20:23:16.105Z" }, + { url = "https://files.pythonhosted.org/packages/4b/9a/453255d2f769fe44e07ea9785c8347edaf867f7026872e76c1ad9f7bed92/rpds_py-0.30.0-cp313-cp313t-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:6bdfdb946967d816e6adf9a3d8201bfad269c67efe6cefd7093ef959683c8de0", size = 414949, upload-time = "2025-11-30T20:23:17.539Z" }, + { url = 
"https://files.pythonhosted.org/packages/a3/31/622a86cdc0c45d6df0e9ccb6becdba5074735e7033c20e401a6d9d0e2ca0/rpds_py-0.30.0-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:c77afbd5f5250bf27bf516c7c4a016813eb2d3e116139aed0096940c5982da94", size = 565790, upload-time = "2025-11-30T20:23:19.029Z" }, + { url = "https://files.pythonhosted.org/packages/1c/5d/15bbf0fb4a3f58a3b1c67855ec1efcc4ceaef4e86644665fff03e1b66d8d/rpds_py-0.30.0-cp313-cp313t-musllinux_1_2_i686.whl", hash = "sha256:61046904275472a76c8c90c9ccee9013d70a6d0f73eecefd38c1ae7c39045a08", size = 590217, upload-time = "2025-11-30T20:23:20.885Z" }, + { url = "https://files.pythonhosted.org/packages/6d/61/21b8c41f68e60c8cc3b2e25644f0e3681926020f11d06ab0b78e3c6bbff1/rpds_py-0.30.0-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:4c5f36a861bc4b7da6516dbdf302c55313afa09b81931e8280361a4f6c9a2d27", size = 555806, upload-time = "2025-11-30T20:23:22.488Z" }, + { url = "https://files.pythonhosted.org/packages/f9/39/7e067bb06c31de48de3eb200f9fc7c58982a4d3db44b07e73963e10d3be9/rpds_py-0.30.0-cp313-cp313t-win32.whl", hash = "sha256:3d4a69de7a3e50ffc214ae16d79d8fbb0922972da0356dcf4d0fdca2878559c6", size = 211341, upload-time = "2025-11-30T20:23:24.449Z" }, + { url = "https://files.pythonhosted.org/packages/0a/4d/222ef0b46443cf4cf46764d9c630f3fe4abaa7245be9417e56e9f52b8f65/rpds_py-0.30.0-cp313-cp313t-win_amd64.whl", hash = "sha256:f14fc5df50a716f7ece6a80b6c78bb35ea2ca47c499e422aa4463455dd96d56d", size = 225768, upload-time = "2025-11-30T20:23:25.908Z" }, + { url = "https://files.pythonhosted.org/packages/86/81/dad16382ebbd3d0e0328776d8fd7ca94220e4fa0798d1dc5e7da48cb3201/rpds_py-0.30.0-cp314-cp314-macosx_10_12_x86_64.whl", hash = "sha256:68f19c879420aa08f61203801423f6cd5ac5f0ac4ac82a2368a9fcd6a9a075e0", size = 362099, upload-time = "2025-11-30T20:23:27.316Z" }, + { url = 
"https://files.pythonhosted.org/packages/2b/60/19f7884db5d5603edf3c6bce35408f45ad3e97e10007df0e17dd57af18f8/rpds_py-0.30.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:ec7c4490c672c1a0389d319b3a9cfcd098dcdc4783991553c332a15acf7249be", size = 353192, upload-time = "2025-11-30T20:23:29.151Z" }, + { url = "https://files.pythonhosted.org/packages/bf/c4/76eb0e1e72d1a9c4703c69607cec123c29028bff28ce41588792417098ac/rpds_py-0.30.0-cp314-cp314-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:f251c812357a3fed308d684a5079ddfb9d933860fc6de89f2b7ab00da481e65f", size = 384080, upload-time = "2025-11-30T20:23:30.785Z" }, + { url = "https://files.pythonhosted.org/packages/72/87/87ea665e92f3298d1b26d78814721dc39ed8d2c74b86e83348d6b48a6f31/rpds_py-0.30.0-cp314-cp314-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:ac98b175585ecf4c0348fd7b29c3864bda53b805c773cbf7bfdaffc8070c976f", size = 394841, upload-time = "2025-11-30T20:23:32.209Z" }, + { url = "https://files.pythonhosted.org/packages/77/ad/7783a89ca0587c15dcbf139b4a8364a872a25f861bdb88ed99f9b0dec985/rpds_py-0.30.0-cp314-cp314-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:3e62880792319dbeb7eb866547f2e35973289e7d5696c6e295476448f5b63c87", size = 516670, upload-time = "2025-11-30T20:23:33.742Z" }, + { url = "https://files.pythonhosted.org/packages/5b/3c/2882bdac942bd2172f3da574eab16f309ae10a3925644e969536553cb4ee/rpds_py-0.30.0-cp314-cp314-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:4e7fc54e0900ab35d041b0601431b0a0eb495f0851a0639b6ef90f7741b39a18", size = 408005, upload-time = "2025-11-30T20:23:35.253Z" }, + { url = "https://files.pythonhosted.org/packages/ce/81/9a91c0111ce1758c92516a3e44776920b579d9a7c09b2b06b642d4de3f0f/rpds_py-0.30.0-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:47e77dc9822d3ad616c3d5759ea5631a75e5809d5a28707744ef79d7a1bcfcad", size = 382112, upload-time = "2025-11-30T20:23:36.842Z" }, + { url = 
"https://files.pythonhosted.org/packages/cf/8e/1da49d4a107027e5fbc64daeab96a0706361a2918da10cb41769244b805d/rpds_py-0.30.0-cp314-cp314-manylinux_2_31_riscv64.whl", hash = "sha256:b4dc1a6ff022ff85ecafef7979a2c6eb423430e05f1165d6688234e62ba99a07", size = 399049, upload-time = "2025-11-30T20:23:38.343Z" }, + { url = "https://files.pythonhosted.org/packages/df/5a/7ee239b1aa48a127570ec03becbb29c9d5a9eb092febbd1699d567cae859/rpds_py-0.30.0-cp314-cp314-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:4559c972db3a360808309e06a74628b95eaccbf961c335c8fe0d590cf587456f", size = 415661, upload-time = "2025-11-30T20:23:40.263Z" }, + { url = "https://files.pythonhosted.org/packages/70/ea/caa143cf6b772f823bc7929a45da1fa83569ee49b11d18d0ada7f5ee6fd6/rpds_py-0.30.0-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:0ed177ed9bded28f8deb6ab40c183cd1192aa0de40c12f38be4d59cd33cb5c65", size = 565606, upload-time = "2025-11-30T20:23:42.186Z" }, + { url = "https://files.pythonhosted.org/packages/64/91/ac20ba2d69303f961ad8cf55bf7dbdb4763f627291ba3d0d7d67333cced9/rpds_py-0.30.0-cp314-cp314-musllinux_1_2_i686.whl", hash = "sha256:ad1fa8db769b76ea911cb4e10f049d80bf518c104f15b3edb2371cc65375c46f", size = 591126, upload-time = "2025-11-30T20:23:44.086Z" }, + { url = "https://files.pythonhosted.org/packages/21/20/7ff5f3c8b00c8a95f75985128c26ba44503fb35b8e0259d812766ea966c7/rpds_py-0.30.0-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:46e83c697b1f1c72b50e5ee5adb4353eef7406fb3f2043d64c33f20ad1c2fc53", size = 553371, upload-time = "2025-11-30T20:23:46.004Z" }, + { url = "https://files.pythonhosted.org/packages/72/c7/81dadd7b27c8ee391c132a6b192111ca58d866577ce2d9b0ca157552cce0/rpds_py-0.30.0-cp314-cp314-win32.whl", hash = "sha256:ee454b2a007d57363c2dfd5b6ca4a5d7e2c518938f8ed3b706e37e5d470801ed", size = 215298, upload-time = "2025-11-30T20:23:47.696Z" }, + { url = 
"https://files.pythonhosted.org/packages/3e/d2/1aaac33287e8cfb07aab2e6b8ac1deca62f6f65411344f1433c55e6f3eb8/rpds_py-0.30.0-cp314-cp314-win_amd64.whl", hash = "sha256:95f0802447ac2d10bcc69f6dc28fe95fdf17940367b21d34e34c737870758950", size = 228604, upload-time = "2025-11-30T20:23:49.501Z" }, + { url = "https://files.pythonhosted.org/packages/e8/95/ab005315818cc519ad074cb7784dae60d939163108bd2b394e60dc7b5461/rpds_py-0.30.0-cp314-cp314-win_arm64.whl", hash = "sha256:613aa4771c99f03346e54c3f038e4cc574ac09a3ddfb0e8878487335e96dead6", size = 222391, upload-time = "2025-11-30T20:23:50.96Z" }, + { url = "https://files.pythonhosted.org/packages/9e/68/154fe0194d83b973cdedcdcc88947a2752411165930182ae41d983dcefa6/rpds_py-0.30.0-cp314-cp314t-macosx_10_12_x86_64.whl", hash = "sha256:7e6ecfcb62edfd632e56983964e6884851786443739dbfe3582947e87274f7cb", size = 364868, upload-time = "2025-11-30T20:23:52.494Z" }, + { url = "https://files.pythonhosted.org/packages/83/69/8bbc8b07ec854d92a8b75668c24d2abcb1719ebf890f5604c61c9369a16f/rpds_py-0.30.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:a1d0bc22a7cdc173fedebb73ef81e07faef93692b8c1ad3733b67e31e1b6e1b8", size = 353747, upload-time = "2025-11-30T20:23:54.036Z" }, + { url = "https://files.pythonhosted.org/packages/ab/00/ba2e50183dbd9abcce9497fa5149c62b4ff3e22d338a30d690f9af970561/rpds_py-0.30.0-cp314-cp314t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:0d08f00679177226c4cb8c5265012eea897c8ca3b93f429e546600c971bcbae7", size = 383795, upload-time = "2025-11-30T20:23:55.556Z" }, + { url = "https://files.pythonhosted.org/packages/05/6f/86f0272b84926bcb0e4c972262f54223e8ecc556b3224d281e6598fc9268/rpds_py-0.30.0-cp314-cp314t-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:5965af57d5848192c13534f90f9dd16464f3c37aaf166cc1da1cae1fd5a34898", size = 393330, upload-time = "2025-11-30T20:23:57.033Z" }, + { url = 
"https://files.pythonhosted.org/packages/cb/e9/0e02bb2e6dc63d212641da45df2b0bf29699d01715913e0d0f017ee29438/rpds_py-0.30.0-cp314-cp314t-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:9a4e86e34e9ab6b667c27f3211ca48f73dba7cd3d90f8d5b11be56e5dbc3fb4e", size = 518194, upload-time = "2025-11-30T20:23:58.637Z" }, + { url = "https://files.pythonhosted.org/packages/ee/ca/be7bca14cf21513bdf9c0606aba17d1f389ea2b6987035eb4f62bd923f25/rpds_py-0.30.0-cp314-cp314t-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:e5d3e6b26f2c785d65cc25ef1e5267ccbe1b069c5c21b8cc724efee290554419", size = 408340, upload-time = "2025-11-30T20:24:00.2Z" }, + { url = "https://files.pythonhosted.org/packages/c2/c7/736e00ebf39ed81d75544c0da6ef7b0998f8201b369acf842f9a90dc8fce/rpds_py-0.30.0-cp314-cp314t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:626a7433c34566535b6e56a1b39a7b17ba961e97ce3b80ec62e6f1312c025551", size = 383765, upload-time = "2025-11-30T20:24:01.759Z" }, + { url = "https://files.pythonhosted.org/packages/4a/3f/da50dfde9956aaf365c4adc9533b100008ed31aea635f2b8d7b627e25b49/rpds_py-0.30.0-cp314-cp314t-manylinux_2_31_riscv64.whl", hash = "sha256:acd7eb3f4471577b9b5a41baf02a978e8bdeb08b4b355273994f8b87032000a8", size = 396834, upload-time = "2025-11-30T20:24:03.687Z" }, + { url = "https://files.pythonhosted.org/packages/4e/00/34bcc2565b6020eab2623349efbdec810676ad571995911f1abdae62a3a0/rpds_py-0.30.0-cp314-cp314t-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:fe5fa731a1fa8a0a56b0977413f8cacac1768dad38d16b3a296712709476fbd5", size = 415470, upload-time = "2025-11-30T20:24:05.232Z" }, + { url = "https://files.pythonhosted.org/packages/8c/28/882e72b5b3e6f718d5453bd4d0d9cf8df36fddeb4ddbbab17869d5868616/rpds_py-0.30.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:74a3243a411126362712ee1524dfc90c650a503502f135d54d1b352bd01f2404", size = 565630, upload-time = "2025-11-30T20:24:06.878Z" }, + { url = 
"https://files.pythonhosted.org/packages/3b/97/04a65539c17692de5b85c6e293520fd01317fd878ea1995f0367d4532fb1/rpds_py-0.30.0-cp314-cp314t-musllinux_1_2_i686.whl", hash = "sha256:3e8eeb0544f2eb0d2581774be4c3410356eba189529a6b3e36bbbf9696175856", size = 591148, upload-time = "2025-11-30T20:24:08.445Z" }, + { url = "https://files.pythonhosted.org/packages/85/70/92482ccffb96f5441aab93e26c4d66489eb599efdcf96fad90c14bbfb976/rpds_py-0.30.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:dbd936cde57abfee19ab3213cf9c26be06d60750e60a8e4dd85d1ab12c8b1f40", size = 556030, upload-time = "2025-11-30T20:24:10.956Z" }, + { url = "https://files.pythonhosted.org/packages/20/53/7c7e784abfa500a2b6b583b147ee4bb5a2b3747a9166bab52fec4b5b5e7d/rpds_py-0.30.0-cp314-cp314t-win32.whl", hash = "sha256:dc824125c72246d924f7f796b4f63c1e9dc810c7d9e2355864b3c3a73d59ade0", size = 211570, upload-time = "2025-11-30T20:24:12.735Z" }, + { url = "https://files.pythonhosted.org/packages/d0/02/fa464cdfbe6b26e0600b62c528b72d8608f5cc49f96b8d6e38c95d60c676/rpds_py-0.30.0-cp314-cp314t-win_amd64.whl", hash = "sha256:27f4b0e92de5bfbc6f86e43959e6edd1425c33b5e69aab0984a72047f2bcf1e3", size = 226532, upload-time = "2025-11-30T20:24:14.634Z" }, +] + [[package]] name = "ruff" version = "0.11.11" @@ -500,6 +801,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/e0/f9/0595336914c5619e5f28a1fb793285925a8cd4b432c9da0a987836c7f822/shellingham-1.5.4-py2.py3-none-any.whl", hash = "sha256:7ecfff8f2fd72616f7481040475a65b2bf8af90a56c89140852d1120324e8686", size = 9755, upload-time = "2023-10-24T04:13:38.866Z" }, ] +[[package]] +name = "smmap" +version = "5.0.3" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/1f/ea/49c993d6dfdd7338c9b1000a0f36817ed7ec84577ae2e52f890d1a4ff909/smmap-5.0.3.tar.gz", hash = "sha256:4d9debb8b99007ae47165abc08670bd74cb74b5227dda7f643eccc4e9eb5642c", size = 22506, upload-time = "2026-03-09T03:43:26.1Z" } +wheels = [ + { 
url = "https://files.pythonhosted.org/packages/c1/d4/59e74daffcb57a07668852eeeb6035af9f32cbfd7a1d2511f17d2fe6a738/smmap-5.0.3-py3-none-any.whl", hash = "sha256:c106e05d5a61449cf6ba9a1e650227ecfb141590d2a98412103ff35d89fc7b2f", size = 24390, upload-time = "2026-03-09T03:43:24.361Z" }, +] + [[package]] name = "sniffio" version = "1.3.1" @@ -509,6 +819,14 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/e9/44/75a9c9421471a6c4805dbf2356f7c181a29c1879239abab1ea2cc8f38b40/sniffio-1.3.1-py3-none-any.whl", hash = "sha256:2f6da418d1f1e0fddd844478f41680e794e6051915791a034ff65e5f100525a2", size = 10235, upload-time = "2024-02-25T23:20:01.196Z" }, ] +[[package]] +name = "sseclient-py" +version = "1.9.0" +source = { registry = "https://pypi.org/simple" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/4d/2e/59920f7d66b7f9932a3d83dd0ec53fab001be1e058bf582606fe414a5198/sseclient_py-1.9.0-py3-none-any.whl", hash = "sha256:340062b1587fc2880892811e2ab5b176d98ef3eee98b3672ff3a3ba1e8ed0f6f", size = 8351, upload-time = "2026-01-02T23:39:30.995Z" }, +] + [[package]] name = "structlog" version = "25.3.0" @@ -527,6 +845,27 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/e5/30/643397144bfbfec6f6ef821f36f33e57d35946c44a2352d3c9f0ae847619/tenacity-9.1.2-py3-none-any.whl", hash = "sha256:f77bf36710d8b73a50b2dd155c97b870017ad21afe6ab300326b0371b3b05138", size = 28248, upload-time = "2025-04-02T08:25:07.678Z" }, ] +[[package]] +name = "text-unidecode" +version = "1.3" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/ab/e2/e9a00f0ccb71718418230718b3d900e71a5d16e701a3dae079a21e9cd8f8/text-unidecode-1.3.tar.gz", hash = "sha256:bad6603bb14d279193107714b288be206cac565dfa49aa5b105294dd5c4aab93", size = 76885, upload-time = "2019-08-30T21:36:45.405Z" } +wheels = [ + { url = 
"https://files.pythonhosted.org/packages/a6/a5/c0b6468d3824fe3fde30dbb5e1f687b291608f9473681bbf7dabbf5a87d7/text_unidecode-1.3-py2.py3-none-any.whl", hash = "sha256:1311f10e8b895935241623731c2ba64f4c455287888b18189350b67134a822e8", size = 78154, upload-time = "2019-08-30T21:37:03.543Z" }, +] + +[[package]] +name = "tqdm" +version = "4.67.3" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "colorama", marker = "sys_platform == 'win32'" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/09/a9/6ba95a270c6f1fbcd8dac228323f2777d886cb206987444e4bce66338dd4/tqdm-4.67.3.tar.gz", hash = "sha256:7d825f03f89244ef73f1d4ce193cb1774a8179fd96f31d7e1dcde62092b960bb", size = 169598, upload-time = "2026-02-03T17:35:53.048Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/16/e1/3079a9ff9b8e11b846c6ac5c8b5bfb7ff225eee721825310c91b3b50304f/tqdm-4.67.3-py3-none-any.whl", hash = "sha256:ee1e4c0e59148062281c49d80b25b67771a127c85fc9676d3be5f243206826bf", size = 78374, upload-time = "2026-02-03T17:35:50.982Z" }, +] + [[package]] name = "typer" version = "0.16.0" @@ -583,3 +922,67 @@ sdist = { url = "https://files.pythonhosted.org/packages/8a/78/16493d9c386d8e60e wheels = [ { url = "https://files.pythonhosted.org/packages/6b/11/cc635220681e93a0183390e26485430ca2c7b5f9d33b15c74c2861cb8091/urllib3-2.4.0-py3-none-any.whl", hash = "sha256:4e16665048960a0900c702d4a66415956a584919c03361cac9f1df5c5dd7e813", size = 128680, upload-time = "2025-04-10T15:23:37.377Z" }, ] + +[[package]] +name = "wrapt" +version = "2.1.2" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/2e/64/925f213fdcbb9baeb1530449ac71a4d57fc361c053d06bf78d0c5c7cd80c/wrapt-2.1.2.tar.gz", hash = "sha256:3996a67eecc2c68fd47b4e3c564405a5777367adfd9b8abb58387b63ee83b21e", size = 81678, upload-time = "2026-03-06T02:53:25.134Z" } +wheels = [ + { url = 
"https://files.pythonhosted.org/packages/4c/b6/1db817582c49c7fcbb7df6809d0f515af29d7c2fbf57eb44c36e98fb1492/wrapt-2.1.2-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:ff2aad9c4cda28a8f0653fc2d487596458c2a3f475e56ba02909e950a9efa6a9", size = 61255, upload-time = "2026-03-06T02:52:45.663Z" }, + { url = "https://files.pythonhosted.org/packages/a2/16/9b02a6b99c09227c93cd4b73acc3678114154ec38da53043c0ddc1fba0dc/wrapt-2.1.2-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:6433ea84e1cfacf32021d2a4ee909554ade7fd392caa6f7c13f1f4bf7b8e8748", size = 61848, upload-time = "2026-03-06T02:53:48.728Z" }, + { url = "https://files.pythonhosted.org/packages/af/aa/ead46a88f9ec3a432a4832dfedb84092fc35af2d0ba40cd04aea3889f247/wrapt-2.1.2-cp312-cp312-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:c20b757c268d30d6215916a5fa8461048d023865d888e437fab451139cad6c8e", size = 121433, upload-time = "2026-03-06T02:54:40.328Z" }, + { url = "https://files.pythonhosted.org/packages/3a/9f/742c7c7cdf58b59085a1ee4b6c37b013f66ac33673a7ef4aaed5e992bc33/wrapt-2.1.2-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:79847b83eb38e70d93dc392c7c5b587efe65b3e7afcc167aa8abd5d60e8761c8", size = 123013, upload-time = "2026-03-06T02:53:26.58Z" }, + { url = "https://files.pythonhosted.org/packages/e8/44/2c3dd45d53236b7ed7c646fcf212251dc19e48e599debd3926b52310fafb/wrapt-2.1.2-cp312-cp312-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:f8fba1bae256186a83d1875b2b1f4e2d1242e8fac0f58ec0d7e41b26967b965c", size = 117326, upload-time = "2026-03-06T02:53:11.547Z" }, + { url = "https://files.pythonhosted.org/packages/74/e2/b17d66abc26bd96f89dec0ecd0ef03da4a1286e6ff793839ec431b9fae57/wrapt-2.1.2-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:e3d3b35eedcf5f7d022291ecd7533321c4775f7b9cd0050a31a68499ba45757c", size = 121444, upload-time = "2026-03-06T02:54:09.5Z" }, + { url = 
"https://files.pythonhosted.org/packages/3c/62/e2977843fdf9f03daf1586a0ff49060b1b2fc7ff85a7ea82b6217c1ae36e/wrapt-2.1.2-cp312-cp312-musllinux_1_2_riscv64.whl", hash = "sha256:6f2c5390460de57fa9582bc8a1b7a6c86e1a41dfad74c5225fc07044c15cc8d1", size = 116237, upload-time = "2026-03-06T02:54:03.884Z" }, + { url = "https://files.pythonhosted.org/packages/88/dd/27fc67914e68d740bce512f11734aec08696e6b17641fef8867c00c949fc/wrapt-2.1.2-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:7dfa9f2cf65d027b951d05c662cc99ee3bd01f6e4691ed39848a7a5fffc902b2", size = 120563, upload-time = "2026-03-06T02:53:20.412Z" }, + { url = "https://files.pythonhosted.org/packages/ec/9f/b750b3692ed2ef4705cb305bd68858e73010492b80e43d2a4faa5573cbe7/wrapt-2.1.2-cp312-cp312-win32.whl", hash = "sha256:eba8155747eb2cae4a0b913d9ebd12a1db4d860fc4c829d7578c7b989bd3f2f0", size = 58198, upload-time = "2026-03-06T02:53:37.732Z" }, + { url = "https://files.pythonhosted.org/packages/8e/b2/feecfe29f28483d888d76a48f03c4c4d8afea944dbee2b0cd3380f9df032/wrapt-2.1.2-cp312-cp312-win_amd64.whl", hash = "sha256:1c51c738d7d9faa0b3601708e7e2eda9bf779e1b601dce6c77411f2a1b324a63", size = 60441, upload-time = "2026-03-06T02:52:47.138Z" }, + { url = "https://files.pythonhosted.org/packages/44/e1/e328f605d6e208547ea9fd120804fcdec68536ac748987a68c47c606eea8/wrapt-2.1.2-cp312-cp312-win_arm64.whl", hash = "sha256:c8e46ae8e4032792eb2f677dbd0d557170a8e5524d22acc55199f43efedd39bf", size = 58836, upload-time = "2026-03-06T02:53:22.053Z" }, + { url = "https://files.pythonhosted.org/packages/4c/7a/d936840735c828b38d26a854e85d5338894cda544cb7a85a9d5b8b9c4df7/wrapt-2.1.2-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:787fd6f4d67befa6fe2abdffcbd3de2d82dfc6fb8a6d850407c53332709d030b", size = 61259, upload-time = "2026-03-06T02:53:41.922Z" }, + { url = "https://files.pythonhosted.org/packages/5e/88/9a9b9a90ac8ca11c2fdb6a286cb3a1fc7dd774c00ed70929a6434f6bc634/wrapt-2.1.2-cp313-cp313-macosx_11_0_arm64.whl", hash = 
"sha256:4bdf26e03e6d0da3f0e9422fd36bcebf7bc0eeb55fdf9c727a09abc6b9fe472e", size = 61851, upload-time = "2026-03-06T02:52:48.672Z" }, + { url = "https://files.pythonhosted.org/packages/03/a9/5b7d6a16fd6533fed2756900fc8fc923f678179aea62ada6d65c92718c00/wrapt-2.1.2-cp313-cp313-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:bbac24d879aa22998e87f6b3f481a5216311e7d53c7db87f189a7a0266dafffb", size = 121446, upload-time = "2026-03-06T02:54:14.013Z" }, + { url = "https://files.pythonhosted.org/packages/45/bb/34c443690c847835cfe9f892be78c533d4f32366ad2888972c094a897e39/wrapt-2.1.2-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:16997dfb9d67addc2e3f41b62a104341e80cac52f91110dece393923c0ebd5ca", size = 123056, upload-time = "2026-03-06T02:54:10.829Z" }, + { url = "https://files.pythonhosted.org/packages/93/b9/ff205f391cb708f67f41ea148545f2b53ff543a7ac293b30d178af4d2271/wrapt-2.1.2-cp313-cp313-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:162e4e2ba7542da9027821cb6e7c5e068d64f9a10b5f15512ea28e954893a267", size = 117359, upload-time = "2026-03-06T02:53:03.623Z" }, + { url = "https://files.pythonhosted.org/packages/1f/3d/1ea04d7747825119c3c9a5e0874a40b33594ada92e5649347c457d982805/wrapt-2.1.2-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:f29c827a8d9936ac320746747a016c4bc66ef639f5cd0d32df24f5eacbf9c69f", size = 121479, upload-time = "2026-03-06T02:53:45.844Z" }, + { url = "https://files.pythonhosted.org/packages/78/cc/ee3a011920c7a023b25e8df26f306b2484a531ab84ca5c96260a73de76c0/wrapt-2.1.2-cp313-cp313-musllinux_1_2_riscv64.whl", hash = "sha256:a9dd9813825f7ecb018c17fd147a01845eb330254dff86d3b5816f20f4d6aaf8", size = 116271, upload-time = "2026-03-06T02:54:46.356Z" }, + { url = "https://files.pythonhosted.org/packages/98/fd/e5ff7ded41b76d802cf1191288473e850d24ba2e39a6ec540f21ae3b57cb/wrapt-2.1.2-cp313-cp313-musllinux_1_2_x86_64.whl", hash = 
"sha256:6f8dbdd3719e534860d6a78526aafc220e0241f981367018c2875178cf83a413", size = 120573, upload-time = "2026-03-06T02:52:50.163Z" }, + { url = "https://files.pythonhosted.org/packages/47/c5/242cae3b5b080cd09bacef0591691ba1879739050cc7c801ff35c8886b66/wrapt-2.1.2-cp313-cp313-win32.whl", hash = "sha256:5c35b5d82b16a3bc6e0a04349b606a0582bc29f573786aebe98e0c159bc48db6", size = 58205, upload-time = "2026-03-06T02:53:47.494Z" }, + { url = "https://files.pythonhosted.org/packages/12/69/c358c61e7a50f290958809b3c61ebe8b3838ea3e070d7aac9814f95a0528/wrapt-2.1.2-cp313-cp313-win_amd64.whl", hash = "sha256:f8bc1c264d8d1cf5b3560a87bbdd31131573eb25f9f9447bb6252b8d4c44a3a1", size = 60452, upload-time = "2026-03-06T02:53:30.038Z" }, + { url = "https://files.pythonhosted.org/packages/8e/66/c8a6fcfe321295fd8c0ab1bd685b5a01462a9b3aa2f597254462fc2bc975/wrapt-2.1.2-cp313-cp313-win_arm64.whl", hash = "sha256:3beb22f674550d5634642c645aba4c72a2c66fb185ae1aebe1e955fae5a13baf", size = 58842, upload-time = "2026-03-06T02:52:52.114Z" }, + { url = "https://files.pythonhosted.org/packages/da/55/9c7052c349106e0b3f17ae8db4b23a691a963c334de7f9dbd60f8f74a831/wrapt-2.1.2-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:0fc04bc8664a8bc4c8e00b37b5355cffca2535209fba1abb09ae2b7c76ddf82b", size = 63075, upload-time = "2026-03-06T02:53:19.108Z" }, + { url = "https://files.pythonhosted.org/packages/09/a8/ce7b4006f7218248dd71b7b2b732d0710845a0e49213b18faef64811ffef/wrapt-2.1.2-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:a9b9d50c9af998875a1482a038eb05755dfd6fe303a313f6a940bb53a83c3f18", size = 63719, upload-time = "2026-03-06T02:54:33.452Z" }, + { url = "https://files.pythonhosted.org/packages/e4/e5/2ca472e80b9e2b7a17f106bb8f9df1db11e62101652ce210f66935c6af67/wrapt-2.1.2-cp313-cp313t-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:2d3ff4f0024dd224290c0eabf0240f1bfc1f26363431505fb1b0283d3b08f11d", size = 152643, upload-time = "2026-03-06T02:52:42.721Z" }, + { url 
= "https://files.pythonhosted.org/packages/36/42/30f0f2cefca9d9cbf6835f544d825064570203c3e70aa873d8ae12e23791/wrapt-2.1.2-cp313-cp313t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:3278c471f4468ad544a691b31bb856374fbdefb7fee1a152153e64019379f015", size = 158805, upload-time = "2026-03-06T02:54:25.441Z" }, + { url = "https://files.pythonhosted.org/packages/bb/67/d08672f801f604889dcf58f1a0b424fe3808860ede9e03affc1876b295af/wrapt-2.1.2-cp313-cp313t-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:a8914c754d3134a3032601c6984db1c576e6abaf3fc68094bb8ab1379d75ff92", size = 145990, upload-time = "2026-03-06T02:53:57.456Z" }, + { url = "https://files.pythonhosted.org/packages/68/a7/fd371b02e73babec1de6ade596e8cd9691051058cfdadbfd62a5898f3295/wrapt-2.1.2-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:ff95d4264e55839be37bafe1536db2ab2de19da6b65f9244f01f332b5286cfbf", size = 155670, upload-time = "2026-03-06T02:54:55.309Z" }, + { url = "https://files.pythonhosted.org/packages/86/2d/9fe0095dfdb621009f40117dcebf41d7396c2c22dca6eac779f4c007b86c/wrapt-2.1.2-cp313-cp313t-musllinux_1_2_riscv64.whl", hash = "sha256:76405518ca4e1b76fbb1b9f686cff93aebae03920cc55ceeec48ff9f719c5f67", size = 144357, upload-time = "2026-03-06T02:54:24.092Z" }, + { url = "https://files.pythonhosted.org/packages/0e/b6/ec7b4a254abbe4cde9fa15c5d2cca4518f6b07d0f1b77d4ee9655e30280e/wrapt-2.1.2-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:c0be8b5a74c5824e9359b53e7e58bef71a729bacc82e16587db1c4ebc91f7c5a", size = 150269, upload-time = "2026-03-06T02:53:31.268Z" }, + { url = "https://files.pythonhosted.org/packages/6e/6b/2fabe8ebf148f4ee3c782aae86a795cc68ffe7d432ef550f234025ce0cfa/wrapt-2.1.2-cp313-cp313t-win32.whl", hash = "sha256:f01277d9a5fc1862f26f7626da9cf443bebc0abd2f303f41c5e995b15887dabd", size = 59894, upload-time = "2026-03-06T02:54:15.391Z" }, + { url = 
"https://files.pythonhosted.org/packages/ca/fb/9ba66fc2dedc936de5f8073c0217b5d4484e966d87723415cc8262c5d9c2/wrapt-2.1.2-cp313-cp313t-win_amd64.whl", hash = "sha256:84ce8f1c2104d2f6daa912b1b5b039f331febfeee74f8042ad4e04992bd95c8f", size = 63197, upload-time = "2026-03-06T02:54:41.943Z" }, + { url = "https://files.pythonhosted.org/packages/c0/1c/012d7423c95d0e337117723eb8ecf73c622ce15a97847e84cf3f8f26cd7e/wrapt-2.1.2-cp313-cp313t-win_arm64.whl", hash = "sha256:a93cd767e37faeddbe07d8fc4212d5cba660af59bdb0f6372c93faaa13e6e679", size = 60363, upload-time = "2026-03-06T02:54:48.093Z" }, + { url = "https://files.pythonhosted.org/packages/39/25/e7ea0b417db02bb796182a5316398a75792cd9a22528783d868755e1f669/wrapt-2.1.2-cp314-cp314-macosx_10_15_x86_64.whl", hash = "sha256:1370e516598854e5b4366e09ce81e08bfe94d42b0fd569b88ec46cc56d9164a9", size = 61418, upload-time = "2026-03-06T02:53:55.706Z" }, + { url = "https://files.pythonhosted.org/packages/ec/0f/fa539e2f6a770249907757eaeb9a5ff4deb41c026f8466c1c6d799088a9b/wrapt-2.1.2-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:6de1a3851c27e0bd6a04ca993ea6f80fc53e6c742ee1601f486c08e9f9b900a9", size = 61914, upload-time = "2026-03-06T02:52:53.37Z" }, + { url = "https://files.pythonhosted.org/packages/53/37/02af1867f5b1441aaeda9c82deed061b7cd1372572ddcd717f6df90b5e93/wrapt-2.1.2-cp314-cp314-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:de9f1a2bbc5ac7f6012ec24525bdd444765a2ff64b5985ac6e0692144838542e", size = 120417, upload-time = "2026-03-06T02:54:30.74Z" }, + { url = "https://files.pythonhosted.org/packages/c3/b7/0138a6238c8ba7476c77cf786a807f871672b37f37a422970342308276e7/wrapt-2.1.2-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:970d57ed83fa040d8b20c52fe74a6ae7e3775ae8cff5efd6a81e06b19078484c", size = 122797, upload-time = "2026-03-06T02:54:51.539Z" }, + { url = 
"https://files.pythonhosted.org/packages/e1/ad/819ae558036d6a15b7ed290d5b14e209ca795dd4da9c58e50c067d5927b0/wrapt-2.1.2-cp314-cp314-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:3969c56e4563c375861c8df14fa55146e81ac11c8db49ea6fb7f2ba58bc1ff9a", size = 117350, upload-time = "2026-03-06T02:54:37.651Z" }, + { url = "https://files.pythonhosted.org/packages/8b/2d/afc18dc57a4600a6e594f77a9ae09db54f55ba455440a54886694a84c71b/wrapt-2.1.2-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:57d7c0c980abdc5f1d98b11a2aa3bb159790add80258c717fa49a99921456d90", size = 121223, upload-time = "2026-03-06T02:54:35.221Z" }, + { url = "https://files.pythonhosted.org/packages/b9/5b/5ec189b22205697bc56eb3b62aed87a1e0423e9c8285d0781c7a83170d15/wrapt-2.1.2-cp314-cp314-musllinux_1_2_riscv64.whl", hash = "sha256:776867878e83130c7a04237010463372e877c1c994d449ca6aaafeab6aab2586", size = 116287, upload-time = "2026-03-06T02:54:19.654Z" }, + { url = "https://files.pythonhosted.org/packages/f7/2d/f84939a7c9b5e6cdd8a8d0f6a26cabf36a0f7e468b967720e8b0cd2bdf69/wrapt-2.1.2-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:fab036efe5464ec3291411fabb80a7a39e2dd80bae9bcbeeca5087fdfa891e19", size = 119593, upload-time = "2026-03-06T02:54:16.697Z" }, + { url = "https://files.pythonhosted.org/packages/0b/fe/ccd22a1263159c4ac811ab9374c061bcb4a702773f6e06e38de5f81a1bdc/wrapt-2.1.2-cp314-cp314-win32.whl", hash = "sha256:e6ed62c82ddf58d001096ae84ce7f833db97ae2263bff31c9b336ba8cfe3f508", size = 58631, upload-time = "2026-03-06T02:53:06.498Z" }, + { url = "https://files.pythonhosted.org/packages/65/0a/6bd83be7bff2e7efaac7b4ac9748da9d75a34634bbbbc8ad077d527146df/wrapt-2.1.2-cp314-cp314-win_amd64.whl", hash = "sha256:467e7c76315390331c67073073d00662015bb730c566820c9ca9b54e4d67fd04", size = 60875, upload-time = "2026-03-06T02:53:50.252Z" }, + { url = 
"https://files.pythonhosted.org/packages/6c/c0/0b3056397fe02ff80e5a5d72d627c11eb885d1ca78e71b1a5c1e8c7d45de/wrapt-2.1.2-cp314-cp314-win_arm64.whl", hash = "sha256:da1f00a557c66225d53b095a97eace0fc5349e3bfda28fa34ffae238978ee575", size = 59164, upload-time = "2026-03-06T02:53:59.128Z" }, + { url = "https://files.pythonhosted.org/packages/71/ed/5d89c798741993b2371396eb9d4634f009ff1ad8a6c78d366fe2883ea7a6/wrapt-2.1.2-cp314-cp314t-macosx_10_15_x86_64.whl", hash = "sha256:62503ffbc2d3a69891cf29beeaccdb4d5e0a126e2b6a851688d4777e01428dbb", size = 63163, upload-time = "2026-03-06T02:52:54.873Z" }, + { url = "https://files.pythonhosted.org/packages/c6/8c/05d277d182bf36b0a13d6bd393ed1dec3468a25b59d01fba2dd70fe4d6ae/wrapt-2.1.2-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:c7e6cd120ef837d5b6f860a6ea3745f8763805c418bb2f12eeb1fa6e25f22d22", size = 63723, upload-time = "2026-03-06T02:52:56.374Z" }, + { url = "https://files.pythonhosted.org/packages/f4/27/6c51ec1eff4413c57e72d6106bb8dec6f0c7cdba6503d78f0fa98767bcc9/wrapt-2.1.2-cp314-cp314t-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:3769a77df8e756d65fbc050333f423c01ae012b4f6731aaf70cf2bef61b34596", size = 152652, upload-time = "2026-03-06T02:53:23.79Z" }, + { url = "https://files.pythonhosted.org/packages/db/4c/d7dd662d6963fc7335bfe29d512b02b71cdfa23eeca7ab3ac74a67505deb/wrapt-2.1.2-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:a76d61a2e851996150ba0f80582dd92a870643fa481f3b3846f229de88caf044", size = 158807, upload-time = "2026-03-06T02:53:35.742Z" }, + { url = "https://files.pythonhosted.org/packages/b4/4d/1e5eea1a78d539d346765727422976676615814029522c76b87a95f6bcdd/wrapt-2.1.2-cp314-cp314t-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:6f97edc9842cf215312b75fe737ee7c8adda75a89979f8e11558dfff6343cc4b", size = 146061, upload-time = "2026-03-06T02:52:57.574Z" }, + { url = 
"https://files.pythonhosted.org/packages/89/bc/62cabea7695cd12a288023251eeefdcb8465056ddaab6227cb78a2de005b/wrapt-2.1.2-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:4006c351de6d5007aa33a551f600404ba44228a89e833d2fadc5caa5de8edfbf", size = 155667, upload-time = "2026-03-06T02:53:39.422Z" }, + { url = "https://files.pythonhosted.org/packages/e9/99/6f2888cd68588f24df3a76572c69c2de28287acb9e1972bf0c83ce97dbc1/wrapt-2.1.2-cp314-cp314t-musllinux_1_2_riscv64.whl", hash = "sha256:a9372fc3639a878c8e7d87e1556fa209091b0a66e912c611e3f833e2c4202be2", size = 144392, upload-time = "2026-03-06T02:54:22.41Z" }, + { url = "https://files.pythonhosted.org/packages/40/51/1dfc783a6c57971614c48e361a82ca3b6da9055879952587bc99fe1a7171/wrapt-2.1.2-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:3144b027ff30cbd2fca07c0a87e67011adb717eb5f5bd8496325c17e454257a3", size = 150296, upload-time = "2026-03-06T02:54:07.848Z" }, + { url = "https://files.pythonhosted.org/packages/6c/38/cbb8b933a0201076c1f64fc42883b0023002bdc14a4964219154e6ff3350/wrapt-2.1.2-cp314-cp314t-win32.whl", hash = "sha256:3b8d15e52e195813efe5db8cec156eebe339aaf84222f4f4f051a6c01f237ed7", size = 60539, upload-time = "2026-03-06T02:54:00.594Z" }, + { url = "https://files.pythonhosted.org/packages/82/dd/e5176e4b241c9f528402cebb238a36785a628179d7d8b71091154b3e4c9e/wrapt-2.1.2-cp314-cp314t-win_amd64.whl", hash = "sha256:08ffa54146a7559f5b8df4b289b46d963a8e74ed16ba3687f99896101a3990c5", size = 63969, upload-time = "2026-03-06T02:54:39Z" }, + { url = "https://files.pythonhosted.org/packages/5c/99/79f17046cf67e4a95b9987ea129632ba8bcec0bc81f3fb3d19bdb0bd60cd/wrapt-2.1.2-cp314-cp314t-win_arm64.whl", hash = "sha256:72aaa9d0d8e4ed0e2e98019cea47a21f823c9dd4b43c7b77bba6679ffcca6a00", size = 60554, upload-time = "2026-03-06T02:53:14.132Z" }, + { url = "https://files.pythonhosted.org/packages/1a/c7/8528ac2dfa2c1e6708f647df7ae144ead13f0a31146f43c7264b4942bf12/wrapt-2.1.2-py3-none-any.whl", hash = 
"sha256:b8fd6fa2b2c4e7621808f8c62e8317f4aae56e59721ad933bac5239d913cf0e8", size = 43993, upload-time = "2026-03-06T02:53:12.905Z" }, +]