Refactor#72
Open
SaiKireeti2905 wants to merge 90 commits into
Open
Conversation
There was a problem hiding this comment.
Pull request overview
This pull request introduces a comprehensive modernization of the CogFlow codebase, focusing on replacing legacy plugin-based architecture with a cleaner, Pydantic-based configuration system and adding extensive test coverage across core modules. The changes enhance type safety, developer experience, and maintainability by removing deprecated .ini files, consolidating network utilities, and establishing a robust exception framework.
Changes:
- Introduced centralized Pydantic-based configuration management replacing legacy
.inifiles - Added comprehensive test coverage for storage, serving, orchestration, network, models, datasets, components, exceptions, and common utilities
- Removed legacy plugin system files (PluginManager, setup.py, docker-compose.yaml) and migrated to modern pyproject.toml-based project structure
Reviewed changes
Copilot reviewed 55 out of 60 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_storage.py | New test suite for MinIO client creation and S3 endpoint parsing |
| tests/test_serving.py | New test suite for ServingManager Kubernetes operations |
| tests/test_orchestration.py | New test suite for Kubeflow Pipelines orchestration layer |
| tests/test_network.py | New test suite for HTTP request utilities with retry logic |
| tests/test_models.py | Comprehensive test suite for ModelManager MLflow operations |
| tests/test_datasets.py | New test suite for DatasetManager operations |
| tests/test_exceptions.py | New test suite for CogFlow exception framework |
| tests/test_components.py | New test suite for component registry and YAML parsing |
| tests/test_common.py | New test suite for UUID, serialization, and Kubernetes utilities |
| tests/conftest.py | New pytest configuration with autouse fixtures for mocking |
| pytest.ini | Updated pytest configuration with asyncio mode |
| pyproject.toml | Migrated from setup.py to modern pyproject.toml with dynamic versioning |
| cogflow/utils/storage.py | New MinIO client utility with URL parsing |
| cogflow/utils/network.py | New centralized network request utilities with tenacity retry logic |
| cogflow/utils/logging.py | New standardized logger configuration |
| cogflow/utils/imports.py | New lazy import utility for heavy dependencies |
| cogflow/utils/exceptions.py | New comprehensive exception hierarchy and error handler |
| cogflow/utils/common.py | New common utilities for UUID, serialization, and Kubernetes helpers |
| cogflow/api.py | Commented out legacy API exposure in preparation for modular refactor |
| setup.py | Removed in favor of pyproject.toml |
| docker-compose.yaml | Removed local development compose stack |
| cogflow/v2/init.py | Removed KFP v2 globals exposure |
| cogflow/util.py | Removed and refactored into cogflow/utils/network.py and cogflow/utils/common.py |
| tests/test_plugin_manager.py | Removed legacy plugin system tests |
| tests/test_notebook_plugin.py | Removed legacy notebook plugin tests |
| tests/test_mlflowplugin.py | Removed legacy MLflow plugin tests |
| tests/test_kubeflowplugin.py | Removed legacy Kubeflow plugin tests |
| tests/test_dataset_plugin.py | Removed legacy dataset plugin tests |
| tests/test.ipynb | Removed test notebook file |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Models: - Add get_run(), get_run_artifact_uri(), list_artifacts_grouped() Serving: - Add model_type param to deploy_model(), update_model(), _process_isvc() - Add AsyncServingManager using kubernetes_asyncio for non-blocking ops - Expose async_deploy_model, async_list_models, etc. at module level Pipelines: - Add pipeline query functions: list_pipelines_by_name, get_pipeline_task_sequence*, get_task_structure_by_task_id - Add pod inspection: get_pod_definition, get_pod_events, get_pod_logs, get_inference_service_logs (both sync and async variants) - Add list_all_kfp_runs for paginated KFP run listing Dependencies: - Add kubernetes_asyncio to pyproject.toml
- Fix _parse_run() experiment_id extraction to iterate resource_references - Fix get_pipeline_task_sequence_by_pipeline_id to filter runs by pipeline_id - Fix _list_runs_by_pipeline_id to filter by pipeline_id - Fix get_pod_logs to raise CogflowConnectionError instead of generic Exception - Fix async config loading to use config_exception.ConfigException - Add model_type docstrings to deploy_model and update_model
- test_async_serving.py: 9 tests covering AsyncServingManager CRUD, deploy_model with/without model_type, list_models, and module exports - test_inspection.py: 13 tests covering pipeline inspection, pagination, pipeline_id filtering, workflow node parsing, and pod inspection error handling - test_serving.py: 2 new tests for sync deploy_model with model_type - Fix circular import in async_serving.py via lazy ServingManager loading - Fix test_init_loads_k8s_config_once to reset common._k8s_loaded_flag for proper test isolation
async_serving.py: - Align async create_isvc spec with sync ServingManager.create_isvc: use serviceAccountName/minReplicas in predictor, place transformer at top-level spec.transformer only when transformer_env present - Align async update_model transformer patch path with sync version (spec.transformer instead of spec.predictor.transformer) - Remove unreachable ApiException handler in update_model since get_isvc already wraps to CogflowConnectionError; allow that exception to propagate inspection.py: - get_pod_logs / async_get_pod_logs now return List[str] instead of JSON-encoded string, eliminating double-encoding in get_inference_service_logs / async_get_inference_service_logs - Remove unused imports (timedelta, CogflowErrorHandler, CogflowPipelineError) test_models.py: - Add 9 new tests covering get_run, get_run_artifact_uri, and list_artifacts_grouped (success, normalization, failure wrapping, directory grouping)
Release cogflow v2.0.1b1
Split build-and-release into build → publish-pypi → github-release jobs. Publish job uses OIDC trusted publishing (no API tokens) and runs only on v* tags via a `pypi` GitHub environment.
…ve-names deploy_llm: derive isvc_name / served_model_name from hf_model_id
Pre-release docker publishes (beta / rc / alpha) now build linux/amd64
only; stable releases stay multi-arch. Rationale: the pre-release
channels are consumed exclusively by our own dev-cluster CD, which
runs on amd64 nodes, so the arm64 layer was built and pushed but never
pulled. Arm64 emulation under QEMU was also the slow step of the job;
skipping it trims ~8–10 min per beta release.
Implementation: the tags step now also emits `platforms`, which the
build-push step consumes and which gates the QEMU setup via `contains(.,
'arm64')`. Stable still emits `linux/amd64,linux/arm64`.
Other improvements bundled:
- Add buildx GitHub-Actions cache (`type=gha`, `mode=max`) so runs
that only change the cogflow version or a requirements pin reuse
earlier layers instead of rebuilding from scratch.
- Bump docker/build-push-action v5 → v6 (matches Cog-Engine cd.yml).
- Bump Node-20 actions to Node-24-ready versions ahead of the 2026-06-02
runner switch:
actions/checkout v4 → v5
actions/setup-python v5 → v6
actions/upload-artifact v4 → v5
actions/download-artifact v4 → v5
- Tidy the docker-publish `if:` block and its comment — same condition,
clearer structure, explains why `always()` is still required for the
workflow_dispatch path.
Avoids a hardcoded platform list drifting from the step that gates the QEMU setup (Copilot review on the paired Cog-Engine PR #275).
…amd64-only-for-prereleases release.yml: amd64-only for pre-releases, cache, action bumps
Introduces the high-level LLM entrypoint — ``cogflow.serving.serve_llm``
and its async counterpart — that callers (notebook users or
Cog-Engine's POST /models-serving) can reach for to deploy an LLM and
have it appear in the CogFlow catalog in one shot.
Pairs with a new ``cogflow.models.register_llm_catalog_entry`` helper
that opens an MLflow run, tags it with LLM metadata (type, source,
hf_model_id, note), and POSTs a catalog entry to ``{API_PATH}/models/log``
— the same pattern ``log_model`` already uses for classical artifacts.
The POST is best-effort (warn-on-failure, like log_model) so a blip on
the catalog service doesn't abort a successful MLflow run.
Why: today, notebook users calling ``cogflow.serving.deploy_llm`` get
a KServe InferenceService but no catalog row — the LLM never shows up
in ``GET /models``. Cog-Engine's ``_deploy_llm`` papers over this by
inserting a row itself, but uses ``uuid4()`` as the id and skips
MLflow entirely. That breaks the long-standing invariant
``model_info.id == MLflow run_id`` and makes ``GET /models/{id}``
500 on LLM rows (because ``cogflow_models.get_run(model_id)`` has no
run to return). Putting the registration in cogflow fixes both paths
with one implementation.
What's in the box:
- ``ModelManager.register_llm_catalog_entry`` (cogflow/core/models.py):
opens the MLflow run, sets tags, best-effort POSTs to the catalog,
returns the run_id. Mirrors ``log_model``'s POST pattern (same URL,
same headers shape, same warn-on-failure).
- ``ServingManager.serve_llm`` / ``AsyncServingManager.serve_llm``
(cogflow/core/serving.py, cogflow/core/async_serving.py): resolves
storage_uri shorthand (``hf_model_id`` → ``hf://{id}``), derives
``isvc_name`` / ``served_model_name`` via the existing
``derive_llm_names`` helper, registers the catalog entry, merges
``model_id``/``model_type``/``hf_model_id`` annotations onto the
caller-provided ones, then delegates to ``deploy_llm`` for the ISVC
create. Returns {run_id, isvc_name, served_model_name, isvc}.
- Async path keeps the catalog step sync (MLflow tracking and the
catalog POST both block) and only the ISVC create is awaited —
preserves behavioural parity with the existing deploy_llm pair.
- ``deploy_llm`` itself is unchanged: kept as a low-level primitive so
callers that want just the ISVC (no catalog) can still reach for it.
- Tests cover: end-to-end serve_llm, hf:// shorthand → hf_model_id
extraction, caller-annotation merging (model_id overrides), missing
source → validation error, explicit isvc_name preserved, the
register_llm_catalog_entry POST payload shape, and the warn-only
behaviour on catalog POST failure. Async variant has a matching
regression test for the ISVC annotations + export list.
Catalog-side (Cog-Engine) will follow in a paired PR: switch
``_deploy_llm`` to call ``async_serve_llm``, drop its manual DB
insert, add a belt-and-braces MLflow-run fallback in
``get_model_by_id`` for legacy rows that predate this change.
- Identity annotations (``model_type``, ``hf_model_id``) are now set
authoritatively in both sync and async ``serve_llm`` — caller values
don't override what actually got deployed. Unrelated caller
annotations still pass through. Prevents ISVC metadata drifting from
the catalog entry.
- Reject ``storage_uri`` / ``hf_model_id`` mismatches up-front: if
both are provided and the ``hf://`` URI encodes a different id, raise
``CogflowValidationError``. Without this, we'd catalog/tag one model
while deploying another.
- ``register_llm_catalog_entry`` applies caller ``extra_tags`` *before*
the reserved identity tags so the reserved ones always win on a key
collision. A caller passing ``extra_tags={"type": "foo"}`` can no
longer desync the MLflow run from the catalog entry.
- ``model_version`` payload: use 0 (the "unknown registry version"
sentinel already used by ``log_model``) instead of 1. HF-sourced LLMs
have no MLflow registry version; 1 was arbitrary.
Tests:
- test_serve_llm_identity_annotations_are_authoritative: identity keys
overridden regardless of caller annotation values.
- test_serve_llm_storage_uri_hf_id_mismatch_raises: mismatch surfaces
as a typed error.
- test_register_llm_catalog_entry_extra_tags_cannot_override_reserved:
set_tag call order proves reserved keys win on collision.
serve_llm already validates hf_model_id when it arrives as part of a ``hf://`` storage_uri, but a caller passing only ``hf_model_id=`` got no validation before the catalog step. A value like ``"Qwen/Qwen2.5-Coder-7B-Instruct/"`` (trailing slash) would open an MLflow run + POST to /models/log with the unnormalized id, and only then get rejected inside deploy_llm — leaving an orphan run behind. Route a bare ``hf_model_id`` through the same ``_extract_hf_model_id`` helper the URI path already uses. Also normalize both sides of the ``storage_uri`` / ``hf_model_id`` mismatch check so ``"Org/Name/"`` compares equal to ``"Org/Name"`` (preventing false-negative mismatches). Applies to both sync and async serve_llm. Tests: - test_serve_llm_normalizes_dirty_hf_model_id: trailing/leading slashes are stripped; catalog and annotations see the canonical form. - test_serve_llm_invalid_hf_model_id_rejected_before_catalog: empty / whitespace / slash-only ids raise before register_llm_catalog_entry runs — no orphan run / catalog side effects.
If a caller accidentally passes ``hf_model_id="hf://Org/Name"``, we'd build ``"hf://hf://Org/Name"`` and ``_extract_hf_model_id`` would accept it (it only strips one ``hf://`` and doesn't reject embedded schemes), so we'd deploy / tag a bogus URI. Strip one optional ``hf://`` prefix up-front in both sync and async paths. Covers the bare-id branch and the mismatch-check branch with a single line. Test asserts the canonical id flows into tags, annotations, and the vLLM ``--model_id`` arg.
Four changes, all test/defensive hygiene: 1. Stub MLflow tracking-server health check at conftest-import time so tests importing ``cogflow.core.models`` don't hit real HTTP. The ``_models = ModelManager()`` singleton is built at module import, so an autouse fixture runs too late. Approach: swap the network helper for a no-op, force-import ``cogflow.core.models`` so its singleton initializes against the stub, then restore the real helper so ``tests/test_network.py`` still tests genuine behaviour. 2. Clean up ``_patch_llm_catalog`` test helper: remove the unused ``serving_module`` parameter and update the docstring to reference the actual patch target (``cogflow.core.models``, not ``cogflow.models``). All 10 call sites updated. 3. ``register_llm_catalog_entry`` now normalizes ``hf_model_id`` on entry — strip an optional ``hf://`` prefix, then route through ``_extract_hf_model_id`` to reject whitespace / empty / slash-only inputs. ``serve_llm`` already did this; direct callers (notebook users reaching for the helper) now get the same protection so we can't end up with an orphan MLflow run on bad input regardless of entry point. Full suite: 243 pass.
Round 3 of fixes. Previous single-layer hf:// strip was incomplete: an input like ``hf_model_id='hf://hf://Org/Name'`` would strip to ``hf://Org/Name``, pass ``_extract_hf_model_id`` (which only strips one prefix and didn't reject embedded schemes), and reach vLLM as an invalid ``--model_id``. Same flaw on ``storage_uri='hf://hf://…'``. Fix at the DRY layer: ``_extract_hf_model_id`` now rejects any extracted id containing ``://``. Every caller (``_build_llm_predictor``, sync+async ``serve_llm``, direct ``register_llm_catalog_entry``) inherits the validation. Real HF ids are ``org/name`` or ``name`` — they never contain ``://``, so the check is safe. Also closes a related consistency gap: rejecting ``hf_model_id`` paired with a non-``hf://`` ``storage_uri`` (e.g. ``s3://``). We'd otherwise tag the catalog as HuggingFace-sourced while deploying from s3 — catalog diverges from the actual artifact. MLflow-backed LLMs use ``storage_uri`` alone; ``hf_model_id`` stays reserved for the HF path. Comment on the single-layer ``hf://`` strip in ``serve_llm`` updated to note why there's no loop (the validator catches deeper nesting). Tests: - test_extract_hf_model_id_rejects_embedded_scheme: DRY-layer coverage for double-prefix, s3://-smuggled, file://-smuggled inputs. - test_serve_llm_rejects_double_hf_scheme_prefix: serve_llm paths (bare id, storage_uri, non-hf scheme) all reject pre-catalog. - test_serve_llm_rejects_hf_id_with_non_hf_storage_uri: non-HF storage with hf_model_id is rejected as inconsistent.
…m-with-catalog-registration serve_llm: one-call LLM deploy with MLflow-backed catalog registration
Root cause of the 2.0.1b10 docker-publish failure: after PR #87 made pre-release builds amd64-only, the total runtime dropped from ~17 min (with arm64 emulation) to ~3 min, removing an unintended "propagation buffer" between the ``Wait for PyPI simple index`` step and the ``pip install cogflow==$VERSION`` inside the docker build. The wait step saw a CDN edge that already had the new version; pip inside buildx hit a different edge still serving the stale cache. Cleanest fix: skip PyPI entirely on the tag-push path. The ``build`` job already produces the exact wheel PyPI gets and uploads it as an artifact. ``docker-publish`` now downloads that artifact into ``./dist/`` and the Dockerfile installs from the wheel directly. Upsides: - No race: the image uses the same bytes PyPI got. - ~4 min saved on every tag-push release (no more 4-min poll loop). - Works even during a transient PyPI outage. ``workflow_dispatch`` keeps the PyPI path (no artifact — the ``build`` job is skipped for that trigger). The wait step stays only for that path; dispatch targets already-published versions, so propagation is usually a non-issue anyway. The Dockerfile falls back to ``pip install cogflow==$VERSION`` when ``./dist/`` is empty, so: - Local ``docker build`` with no wheel still works (PyPI install). - workflow_dispatch with empty ``dist/`` uses PyPI as today. - Tag push with wheel present uses the local wheel. Glob check uses POSIX-sh-compatible ``[ -n "$(ls …)" ]`` since the ``python:3.10-slim`` base image's ``/bin/sh`` is dash (no ``compgen``). Verified locally: ``docker build`` with empty ``dist/`` falls back to the PyPI install path and produces a working image (``import cogflow`` smoke test passes).
Previous ``cogflow-*.whl`` glob matched any version, so a stale wheel
from a different build left in the context would silently install the
wrong cogflow and contradict the ``COGFLOW_VERSION`` build arg.
Now use ``cogflow-\${COGFLOW_VERSION}-*.whl`` and assert the match
count:
- exactly 1 → install that wheel
- more than 1 → hard-fail rather than guess which to install
- 0 → fall back to PyPI (workflow_dispatch / local no-artifact path)
Uses ``find ... -name`` instead of ``ls ...`` so the version can be
interpolated safely, stays POSIX-sh portable (slim base is dash), and
keeps the fail-fast semantic.
Verified locally: empty dist → PyPI fallback; single matching wheel →
install it; two matching wheels → fail; stale wheel from another
version → PyPI fallback (no silent wrong-version install).
Copilot review on PR #89.
Previous ``COPY dist/ /tmp/cogflow-dist/`` + later ``rm -rf`` leaves the wheel bytes committed in an intermediate image layer — Docker layers are immutable so the later ``rm`` can't reclaim the space. On the tag-push path the wheel is downloaded into ``dist/`` specifically to be consumed, so there's zero reason for it to end up in the final image. Switch to a BuildKit bind mount (``RUN --mount=type=bind,source=dist``): the wheel is mounted read-only for the duration of the RUN only, nothing gets committed to a layer. Added ``# syntax=docker/dockerfile:1`` at the top of the file — the ``--mount`` feature requires Dockerfile frontend 1.2+. ``docker/build-push-action@v6`` already uses BuildKit so the release workflow doesn't need any other change. Also dropped the now-unnecessary ``rm -rf /tmp/cogflow-dist`` cleanup (the mount disappears automatically on RUN exit) and the matching cleanup in the error branch. Verified locally: ``DOCKER_BUILDKIT=1 docker build`` with empty ``dist/`` still falls through to the PyPI install path and produces a working image (``import cogflow`` smoke test passes).
Real fix for the cog-api -> cogflow -> cog-api deadlock hit in prod
by POST /apidev/models-serving. When cog-api's async handler awaits
async_serve_llm, the inner register_llm_catalog_entry called a
blocking requests.post back to {API_PATH}/models/log — and since that
URL resolves to the same uvicorn worker, the event loop accepting
the callback was the one blocked on it. 15s × 3 retries, catalog
POST lost, ISVC orphan.
Unlike asyncio.to_thread (a workaround), this gives the callback
path real async I/O:
- Add httpx dep (>=0.25,<1). aiohttp was available transitively but
httpx matches the codebase's sync requests style 1:1 (json=, .is_success,
.raise_for_status, etc.) and lets us drop it in without rethinking
multipart/streaming semantics.
- Add cogflow.utils.network.make_async_post_request — mirror of the
sync make_post_request but httpx-backed. tenacity @Retry works on
async functions since 6.2, so we keep the same 3-attempt backoff
policy.
- Extract two helpers in ModelManager so sync and async registration
paths don't duplicate logic:
- _prepare_llm_catalog_run: validate hf_model_id, open/close the
MLflow run, set tags, return (run_id, start_time_ms, description,
normalized_hf_model_id).
- _build_llm_catalog_payload: shape the /models/log POST body
(url, model_dict, headers, resolved_user).
- Sync register_llm_catalog_entry stays behaviourally identical,
now built on the helpers.
- New async_register_llm_catalog_entry: same semantics as the sync
version, uses make_async_post_request for the backend call.
- async_serve_llm awaits async_register_llm_catalog_entry instead
of the sync method.
MLflow's tracking client is still sync — no first-party async MLflow
exists and MLflow server is a separate service, so the brief
event-loop block it causes doesn't self-deadlock. If/when an async
MLflow client appears, _prepare_llm_catalog_run is the single place
to swap it in.
Sync callers (notebook users of cogflow.models.register_llm_catalog_entry,
the sync serve_llm path) are unchanged.
1. Add httpx to requirements.txt — CI installs from this file (not
pyproject.toml), so without it the import in network.py fails at
test time.
2. Mirror sync make_post_request body conditional: change
'if data is not None' to 'if data:' so empty dict {} is treated
as no body (matches sync contract).
3. Restrict @Retry to httpx.HTTPError via retry_if_exception_type so
logic errors (e.g. ValueError from a bogus json() call) fail fast
on the first attempt instead of being retried 3x. Docstring already
claimed this; now the behaviour matches.
4. Add four async tests in test_network.py mirroring the sync POST
tests:
- success with json body
- empty body sends no json (parity with sync)
- 4xx triggers raise_for_status -> httpx.HTTPError -> RetryError
after 3 attempts (verifies retry policy + count)
- non-httpx exception (ValueError from .json()) raised on first
attempt (verifies the retry restriction)
Cogflow repo guideline: public (non-underscore) function/method docstrings must not name internal implementation dependencies — consumers shouldn't couple their mental model to what's under the hood. Only the public docstrings touched in this PR are changed; pre-existing mentions elsewhere in the module are out of scope. - make_async_post_request: drop httpx / tenacity mentions; abstract to "async HTTP client" and "retries up to N times". - register_llm_catalog_entry: drop MLflow mentions; use "tracking run", "tracking backend", "catalog service". Also reshape the source-tag note so it no longer spells the fallback string. - async_register_llm_catalog_entry: same; describe behaviour in terms of "async HTTP client" and "tracking service", not MLflow or uvicorn. - async_serve_llm: drop httpx / uvicorn / MLflow mentions; use "async HTTP client", "process", "tracking service". Private helpers (_prepare_llm_catalog_run, _build_llm_catalog_payload) and inline code comments keep the concrete lib names for maintainer clarity — same guideline's carve-out.
…llm-catalog-httpx async_serve_llm: use httpx for /models/log callback
- Pin syntax frontend to docker/dockerfile:1.4 instead of bare "1". Bare "1" floats to whatever is latest; 1.4 is the minimum that supports RUN --mount=type=bind and lets builds stay reproducible. - Add --no-cache-dir to both wheel-install and PyPI-fallback pip calls (plus the shap/xgboost install above) so pip's download/ build cache doesn't persist into an image layer. protobuf install already uses it. - Capture the find result once instead of running it twice. A single variable (wheel_paths) now backs both the count derivation and the install-target / stderr dump, eliminating the count-vs-path divergence risk Copilot flagged. Semantics unchanged: 0 / 1 / >1 branches behave the same.
…use-build-artifact-no-pypi-race release: install cogflow from build-job artifact, not PyPI
LLM pods are large, GPU-pinned, and almost always run as a single replica. Knative/Serverless mode requires the new revision to reach Ready before the old one is torn down — but on a one-GPU node the new pod can never schedule (the GPU is held by the old pod), so the rollout deadlocks. Defaulting to RawDeployment with ``strategy.type: Recreate`` makes the old pod terminate first, freeing the GPU for the new one. - Adds ``ServingManager._apply_raw_deployment_defaults`` static helper. Returns the (possibly augmented) predictor and the merged annotations dict. Caller-supplied ``serving.kserve.io/deploymentMode`` wins, so opting back into Serverless is a single annotation away. - ``deploymentStrategy`` is rejected by the KServe webhook for Serverless ISVCs, so it's only injected when the resolved deployment mode is RawDeployment. - Wired into both ``ServingManager.deploy_llm`` (sync) and ``AsyncServingManager._deploy_llm`` (async) so the two surfaces stay consistent.
Stage N+1 of the Cog-Engine async migration. ``cogflow/utils/network.py`` already had ``make_async_post_request`` (added in PR #91 to fix the cog-api → cogflow → cog-api self-deadlock). This PR completes the async surface so any callsite that's already inside an ``async def`` can issue any HTTP verb without blocking the event loop. New helpers, all using ``httpx.AsyncClient`` and matching their sync counterparts call-for-call: - ``make_async_get_request`` — same path-param + pagination semantics as the sync ``make_get_request``. Tenacity retries on ``httpx.HTTPError`` only (not on JSON-decode bugs). - ``make_async_delete_request`` — 200/202/204 → True; same retry-on-transport-error policy. - ``make_async_patch_request`` — JSON body when ``data`` is set, raises ``RetryError`` after 3 retries on a non-success. - ``make_async_get_request_raw`` — returns parsed JSON or ``None`` on transport failure (mirrors the sync helper's actual behaviour; the ``Response`` annotation in the original docstring was wrong). - ``make_async_health_check_request`` — no retry; any 2xx → True, anything else (including transport error) → False. ``make_async_get_request_stream`` deliberately omitted: streaming with ``httpx.AsyncClient`` requires the client to outlive the iteration, which doesn't fit the "single helper, return a Response" shape. Callers needing async streaming can use the ``AsyncExitStack`` + handed-off-flag pattern from ``S3Utils.fetch_s3_object_stream`` in Cog-Engine. Tests: extended ``_FakeAsyncClient`` to cover GET/DELETE/PATCH per-verb captured args + per-call response sequences for pagination. 17 new tests covering success, failure (RetryError after 3 retries on transport errors), and the no-retry-on-transport-failure shape of the raw + health-check helpers. ``test_make_async_get_request_pagination`` exercises the multi-page path with two distinct responses returned in sequence.
Add async_get_dataset / async_get_prometheus_dataset / async_delete_dataset on DatasetManager, plus an async_register_component function. All route through the async network helpers landed earlier in this PR so async callers (e.g. an async FastAPI route) no longer block the event loop on the registry round-trip. The MinIO upload prelude in async_register_component still runs sync inside asyncio.to_thread because the MinIO SDK has no async client; that consumes a worker thread but doesn't block the loop. Skipped on purpose: - async_register_dataset: needs multipart files= which the new async POST helper doesn't accept yet. - async_download_dataset: streaming response, needs the AsyncExitStack pattern (separate follow-up). Tests cover happy + error paths for each new public method.
network.py - All async helpers now instantiate httpx.AsyncClient with follow_redirects=True to match sync requests' default. Without it the async path silently diverged on HTTP->HTTPS hops and any trailing-slash redirect (Copilot inline comments #1, #2, #5). - POST and PATCH success check switched from response.is_success (2xx only) to "not response.is_error" (<400) so behaviour matches sync requests.Response.ok exactly. Also closes the implicit-None return path on a 3xx end-state for PATCH that Copilot flagged on the original code (#4). - GET pagination loop switched from "not is_success" to "is_error" for the same reason (3xx pages should not silently break the loop). components.py - Dropped the unused make_async_get_request import (#6). - Tightened docstring: only the MinIO upload runs inside asyncio.to_thread, not the YAML parse or file read (#7). - async_register_component now raises CogflowComponentRegistryError when the registry check returns None (transport failure) instead of silently treating it as "no existing component" and risking a duplicate create during an intermittent registry outage (#8). tests - FakeAsyncResponse now exposes is_error derived from is_success so existing fixtures keep working without per-test changes. - New test_async_clients_follow_redirects pins down that every async helper instantiates httpx.AsyncClient with follow_redirects=True, so this divergence cannot be reintroduced. - New test_async_register_component_registry_unreachable covers the None-from-make_async_get_request_raw path. Copilot inline comment #3 (DELETE retries on 4xx/5xx through raise_for_status -> HTTPStatusError) is intentional: the sync make_delete_request behaves identically (RequestException is the base class, bare @Retry catches everything). The PR description will be updated separately to remove the misleading "transport-error-only" wording for DELETE.
network: async mirrors for GET/DELETE/PATCH/raw/health
…w-deployment-default Default LLM ISVCs to RawDeployment + Recreate strategy
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This pull request introduces major improvements to the CogFlow project, focusing on codebase modernization, configuration management, and developer ergonomics. The most significant changes include a complete rewrite of the
README.mdfor clarity, the introduction of a centralized and environment-driven configuration system using Pydantic, the addition of a lazy-loading mechanism for submodules, and the removal of legacy plugin configuration files and unused Kafka consumer code. Pre-commit and linting configurations were also updated for consistency and to better exclude virtual environments and test directories.Documentation and Developer Experience:
README.mdwith a clear, modern overview of CogFlow, its architecture, and its core features, replacing the previous function-by-function listing with a more conceptual and user-focused introduction.Configuration Management:
config.pymodule using Pydantic for centralized, environment-variable-driven configuration, replacing the old.ini-based approach. This improves maintainability, type safety, and integration with modern Python tooling.cogflow_config.inifile, consolidating all configuration into the new Python-based system.Codebase Modernization and Simplification:
_lazy.pymodule implementing a robust lazy-loading mechanism forcogflowsubmodules, improving import performance and reducing unnecessary dependencies at startup.kafka/consumer.py, reducing codebase clutter and potential maintenance overhead.core/pipelines/__init__.pyto provide a unified, user-facing API for pipeline and component management, improving discoverability and consistency.api.pyin preparation for a more modern, modular API approach.Linting and Pre-commit Configuration:
.pre-commit-config.yamlto more comprehensively exclude virtual environments, examples, wrappers, and test directories from linting and formatting hooks, ensuring that only relevant source code is checked. [1] [2] [3] [4].pylintrcto allow more attributes per class, increasedmax-args, and added new disables, reflecting evolving codebase complexity and style needs. (Fceca9eaL1)