Skip to content

Refactor#72

Open
SaiKireeti2905 wants to merge 90 commits into
developfrom
refactor
Open

Refactor#72
SaiKireeti2905 wants to merge 90 commits into
developfrom
refactor

Conversation

@SaiKireeti2905
Copy link
Copy Markdown
Collaborator

This pull request introduces major improvements to the CogFlow project, focusing on codebase modernization, configuration management, and developer ergonomics. The most significant changes include a complete rewrite of the README.md for clarity, the introduction of a centralized and environment-driven configuration system using Pydantic, the addition of a lazy-loading mechanism for submodules, and the removal of legacy plugin configuration files and unused Kafka consumer code. Pre-commit and linting configurations were also updated for consistency and to better exclude virtual environments and test directories.

Documentation and Developer Experience:

  • Revamped README.md with a clear, modern overview of CogFlow, its architecture, and its core features, replacing the previous function-by-function listing with a more conceptual and user-focused introduction.

Configuration Management:

  • Introduced a new config.py module using Pydantic for centralized, environment-variable-driven configuration, replacing the old .ini-based approach. This improves maintainability, type safety, and integration with modern Python tooling.
  • Removed the obsolete cogflow_config.ini file, consolidating all configuration into the new Python-based system.

Codebase Modernization and Simplification:

  • Added a _lazy.py module implementing a robust lazy-loading mechanism for cogflow submodules, improving import performance and reducing unnecessary dependencies at startup.
  • Deleted the unused Kafka consumer implementation in kafka/consumer.py, reducing codebase clutter and potential maintenance overhead.
  • Updated core/pipelines/__init__.py to provide a unified, user-facing API for pipeline and component management, improving discoverability and consistency.
  • Commented out legacy API exposure in api.py in preparation for a more modern, modular API approach.

Linting and Pre-commit Configuration:

  • Enhanced .pre-commit-config.yaml to more comprehensively exclude virtual environments, examples, wrappers, and test directories from linting and formatting hooks, ensuring that only relevant source code is checked. [1] [2] [3] [4]
  • Updated .pylintrc to allow more attributes per class, increased max-args, and added new disables, reflecting evolving codebase complexity and style needs. (Fceca9eaL1)
  • Removed redundant test execution step from the CI workflow, likely to avoid duplication with pre-commit or other test runners.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request introduces a comprehensive modernization of the CogFlow codebase, focusing on replacing legacy plugin-based architecture with a cleaner, Pydantic-based configuration system and adding extensive test coverage across core modules. The changes enhance type safety, developer experience, and maintainability by removing deprecated .ini files, consolidating network utilities, and establishing a robust exception framework.

Changes:

  • Introduced centralized Pydantic-based configuration management replacing legacy .ini files
  • Added comprehensive test coverage for storage, serving, orchestration, network, models, datasets, components, exceptions, and common utilities
  • Removed legacy plugin system files (PluginManager, setup.py, docker-compose.yaml) and migrated to modern pyproject.toml-based project structure

Reviewed changes

Copilot reviewed 55 out of 60 changed files in this pull request and generated no comments.

Show a summary per file
File Description
tests/test_storage.py New test suite for MinIO client creation and S3 endpoint parsing
tests/test_serving.py New test suite for ServingManager Kubernetes operations
tests/test_orchestration.py New test suite for Kubeflow Pipelines orchestration layer
tests/test_network.py New test suite for HTTP request utilities with retry logic
tests/test_models.py Comprehensive test suite for ModelManager MLflow operations
tests/test_datasets.py New test suite for DatasetManager operations
tests/test_exceptions.py New test suite for CogFlow exception framework
tests/test_components.py New test suite for component registry and YAML parsing
tests/test_common.py New test suite for UUID, serialization, and Kubernetes utilities
tests/conftest.py New pytest configuration with autouse fixtures for mocking
pytest.ini Updated pytest configuration with asyncio mode
pyproject.toml Migrated from setup.py to modern pyproject.toml with dynamic versioning
cogflow/utils/storage.py New MinIO client utility with URL parsing
cogflow/utils/network.py New centralized network request utilities with tenacity retry logic
cogflow/utils/logging.py New standardized logger configuration
cogflow/utils/imports.py New lazy import utility for heavy dependencies
cogflow/utils/exceptions.py New comprehensive exception hierarchy and error handler
cogflow/utils/common.py New common utilities for UUID, serialization, and Kubernetes helpers
cogflow/api.py Commented out legacy API exposure in preparation for modular refactor
setup.py Removed in favor of pyproject.toml
docker-compose.yaml Removed local development compose stack
cogflow/v2/init.py Removed KFP v2 globals exposure
cogflow/util.py Removed and refactored into cogflow/utils/network.py and cogflow/utils/common.py
tests/test_plugin_manager.py Removed legacy plugin system tests
tests/test_notebook_plugin.py Removed legacy notebook plugin tests
tests/test_mlflowplugin.py Removed legacy MLflow plugin tests
tests/test_kubeflowplugin.py Removed legacy Kubeflow plugin tests
tests/test_dataset_plugin.py Removed legacy dataset plugin tests
tests/test.ipynb Removed test notebook file

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

SaiKireeti2905 and others added 12 commits January 22, 2026 07:41
Models:
- Add get_run(), get_run_artifact_uri(), list_artifacts_grouped()

Serving:
- Add model_type param to deploy_model(), update_model(), _process_isvc()
- Add AsyncServingManager using kubernetes_asyncio for non-blocking ops
- Expose async_deploy_model, async_list_models, etc. at module level

Pipelines:
- Add pipeline query functions: list_pipelines_by_name,
  get_pipeline_task_sequence*, get_task_structure_by_task_id
- Add pod inspection: get_pod_definition, get_pod_events, get_pod_logs,
  get_inference_service_logs (both sync and async variants)
- Add list_all_kfp_runs for paginated KFP run listing

Dependencies:
- Add kubernetes_asyncio to pyproject.toml
- Fix _parse_run() experiment_id extraction to iterate resource_references
- Fix get_pipeline_task_sequence_by_pipeline_id to filter runs by pipeline_id
- Fix _list_runs_by_pipeline_id to filter by pipeline_id
- Fix get_pod_logs to raise CogflowConnectionError instead of generic Exception
- Fix async config loading to use config_exception.ConfigException
- Add model_type docstrings to deploy_model and update_model
- test_async_serving.py: 9 tests covering AsyncServingManager CRUD,
  deploy_model with/without model_type, list_models, and module exports
- test_inspection.py: 13 tests covering pipeline inspection, pagination,
  pipeline_id filtering, workflow node parsing, and pod inspection error
  handling
- test_serving.py: 2 new tests for sync deploy_model with model_type
- Fix circular import in async_serving.py via lazy ServingManager loading
- Fix test_init_loads_k8s_config_once to reset common._k8s_loaded_flag
  for proper test isolation
async_serving.py:
- Align async create_isvc spec with sync ServingManager.create_isvc:
  use serviceAccountName/minReplicas in predictor, place transformer
  at top-level spec.transformer only when transformer_env present
- Align async update_model transformer patch path with sync version
  (spec.transformer instead of spec.predictor.transformer)
- Remove unreachable ApiException handler in update_model since
  get_isvc already wraps to CogflowConnectionError; allow that
  exception to propagate

inspection.py:
- get_pod_logs / async_get_pod_logs now return List[str] instead of
  JSON-encoded string, eliminating double-encoding in
  get_inference_service_logs / async_get_inference_service_logs
- Remove unused imports (timedelta, CogflowErrorHandler,
  CogflowPipelineError)

test_models.py:
- Add 9 new tests covering get_run, get_run_artifact_uri, and
  list_artifacts_grouped (success, normalization, failure wrapping,
  directory grouping)
Split build-and-release into build → publish-pypi → github-release jobs.
Publish job uses OIDC trusted publishing (no API tokens) and runs only on
v* tags via a `pypi` GitHub environment.
alishalbaf and others added 2 commits April 21, 2026 13:36
alishalbaf and others added 11 commits April 21, 2026 14:04
Pre-release docker publishes (beta / rc / alpha) now build linux/amd64
only; stable releases stay multi-arch. Rationale: the pre-release
channels are consumed exclusively by our own dev-cluster CD, which
runs on amd64 nodes, so the arm64 layer was built and pushed but never
pulled. Arm64 emulation under QEMU was also the slow step of the job;
skipping it trims ~8–10 min per beta release.

Implementation: the tags step now also emits `platforms`, which the
build-push step consumes and which gates the QEMU setup via `contains(.,
'arm64')`. Stable still emits `linux/amd64,linux/arm64`.

Other improvements bundled:

- Add buildx GitHub-Actions cache (`type=gha`, `mode=max`) so runs
  that only change the cogflow version or a requirements pin reuse
  earlier layers instead of rebuilding from scratch.
- Bump docker/build-push-action v5 → v6 (matches Cog-Engine cd.yml).
- Bump Node-20 actions to Node-24-ready versions ahead of the 2026-06-02
  runner switch:
    actions/checkout        v4 → v5
    actions/setup-python    v5 → v6
    actions/upload-artifact v4 → v5
    actions/download-artifact v4 → v5
- Tidy the docker-publish `if:` block and its comment — same condition,
  clearer structure, explains why `always()` is still required for the
  workflow_dispatch path.
Avoids a hardcoded platform list drifting from the step that gates
the QEMU setup (Copilot review on the paired Cog-Engine PR #275).
…amd64-only-for-prereleases

release.yml: amd64-only for pre-releases, cache, action bumps
Introduces the high-level LLM entrypoint — ``cogflow.serving.serve_llm``
and its async counterpart — that callers (notebook users or
Cog-Engine's POST /models-serving) can reach for to deploy an LLM and
have it appear in the CogFlow catalog in one shot.

Pairs with a new ``cogflow.models.register_llm_catalog_entry`` helper
that opens an MLflow run, tags it with LLM metadata (type, source,
hf_model_id, note), and POSTs a catalog entry to ``{API_PATH}/models/log``
— the same pattern ``log_model`` already uses for classical artifacts.
The POST is best-effort (warn-on-failure, like log_model) so a blip on
the catalog service doesn't abort a successful MLflow run.

Why: today, notebook users calling ``cogflow.serving.deploy_llm`` get
a KServe InferenceService but no catalog row — the LLM never shows up
in ``GET /models``. Cog-Engine's ``_deploy_llm`` papers over this by
inserting a row itself, but uses ``uuid4()`` as the id and skips
MLflow entirely. That breaks the long-standing invariant
``model_info.id == MLflow run_id`` and makes ``GET /models/{id}``
500 on LLM rows (because ``cogflow_models.get_run(model_id)`` has no
run to return). Putting the registration in cogflow fixes both paths
with one implementation.

What's in the box:

- ``ModelManager.register_llm_catalog_entry`` (cogflow/core/models.py):
  opens the MLflow run, sets tags, best-effort POSTs to the catalog,
  returns the run_id. Mirrors ``log_model``'s POST pattern (same URL,
  same headers shape, same warn-on-failure).
- ``ServingManager.serve_llm`` / ``AsyncServingManager.serve_llm``
  (cogflow/core/serving.py, cogflow/core/async_serving.py): resolves
  storage_uri shorthand (``hf_model_id`` → ``hf://{id}``), derives
  ``isvc_name`` / ``served_model_name`` via the existing
  ``derive_llm_names`` helper, registers the catalog entry, merges
  ``model_id``/``model_type``/``hf_model_id`` annotations onto the
  caller-provided ones, then delegates to ``deploy_llm`` for the ISVC
  create. Returns {run_id, isvc_name, served_model_name, isvc}.
- Async path keeps the catalog step sync (MLflow tracking and the
  catalog POST both block) and only the ISVC create is awaited —
  preserves behavioural parity with the existing deploy_llm pair.
- ``deploy_llm`` itself is unchanged: kept as a low-level primitive so
  callers that want just the ISVC (no catalog) can still reach for it.
- Tests cover: end-to-end serve_llm, hf:// shorthand → hf_model_id
  extraction, caller-annotation merging (model_id overrides), missing
  source → validation error, explicit isvc_name preserved, the
  register_llm_catalog_entry POST payload shape, and the warn-only
  behaviour on catalog POST failure. Async variant has a matching
  regression test for the ISVC annotations + export list.

Catalog-side (Cog-Engine) will follow in a paired PR: switch
``_deploy_llm`` to call ``async_serve_llm``, drop its manual DB
insert, add a belt-and-braces MLflow-run fallback in
``get_model_by_id`` for legacy rows that predate this change.
- Identity annotations (``model_type``, ``hf_model_id``) are now set
  authoritatively in both sync and async ``serve_llm`` — caller values
  don't override what actually got deployed. Unrelated caller
  annotations still pass through. Prevents ISVC metadata drifting from
  the catalog entry.
- Reject ``storage_uri`` / ``hf_model_id`` mismatches up-front: if
  both are provided and the ``hf://`` URI encodes a different id, raise
  ``CogflowValidationError``. Without this, we'd catalog/tag one model
  while deploying another.
- ``register_llm_catalog_entry`` applies caller ``extra_tags`` *before*
  the reserved identity tags so the reserved ones always win on a key
  collision. A caller passing ``extra_tags={"type": "foo"}`` can no
  longer desync the MLflow run from the catalog entry.
- ``model_version`` payload: use 0 (the "unknown registry version"
  sentinel already used by ``log_model``) instead of 1. HF-sourced LLMs
  have no MLflow registry version; 1 was arbitrary.

Tests:
- test_serve_llm_identity_annotations_are_authoritative: identity keys
  overridden regardless of caller annotation values.
- test_serve_llm_storage_uri_hf_id_mismatch_raises: mismatch surfaces
  as a typed error.
- test_register_llm_catalog_entry_extra_tags_cannot_override_reserved:
  set_tag call order proves reserved keys win on collision.
serve_llm already validates hf_model_id when it arrives as part of a
``hf://`` storage_uri, but a caller passing only ``hf_model_id=`` got
no validation before the catalog step. A value like
``"Qwen/Qwen2.5-Coder-7B-Instruct/"`` (trailing slash) would open an
MLflow run + POST to /models/log with the unnormalized id, and only
then get rejected inside deploy_llm — leaving an orphan run behind.

Route a bare ``hf_model_id`` through the same ``_extract_hf_model_id``
helper the URI path already uses. Also normalize both sides of the
``storage_uri`` / ``hf_model_id`` mismatch check so ``"Org/Name/"``
compares equal to ``"Org/Name"`` (preventing false-negative mismatches).
Applies to both sync and async serve_llm.

Tests:
- test_serve_llm_normalizes_dirty_hf_model_id: trailing/leading slashes
  are stripped; catalog and annotations see the canonical form.
- test_serve_llm_invalid_hf_model_id_rejected_before_catalog: empty /
  whitespace / slash-only ids raise before register_llm_catalog_entry
  runs — no orphan run / catalog side effects.
If a caller accidentally passes ``hf_model_id="hf://Org/Name"``, we'd
build ``"hf://hf://Org/Name"`` and ``_extract_hf_model_id`` would
accept it (it only strips one ``hf://`` and doesn't reject embedded
schemes), so we'd deploy / tag a bogus URI.

Strip one optional ``hf://`` prefix up-front in both sync and async
paths. Covers the bare-id branch and the mismatch-check branch with
a single line. Test asserts the canonical id flows into tags,
annotations, and the vLLM ``--model_id`` arg.
Four changes, all test/defensive hygiene:

1. Stub MLflow tracking-server health check at conftest-import time so
   tests importing ``cogflow.core.models`` don't hit real HTTP. The
   ``_models = ModelManager()`` singleton is built at module import,
   so an autouse fixture runs too late. Approach: swap the network
   helper for a no-op, force-import ``cogflow.core.models`` so its
   singleton initializes against the stub, then restore the real
   helper so ``tests/test_network.py`` still tests genuine behaviour.

2. Clean up ``_patch_llm_catalog`` test helper: remove the unused
   ``serving_module`` parameter and update the docstring to reference
   the actual patch target (``cogflow.core.models``, not
   ``cogflow.models``). All 10 call sites updated.

3. ``register_llm_catalog_entry`` now normalizes ``hf_model_id`` on
   entry — strip an optional ``hf://`` prefix, then route through
   ``_extract_hf_model_id`` to reject whitespace / empty / slash-only
   inputs. ``serve_llm`` already did this; direct callers (notebook
   users reaching for the helper) now get the same protection so we
   can't end up with an orphan MLflow run on bad input regardless of
   entry point.

Full suite: 243 pass.
Round 3 of fixes. Previous single-layer hf:// strip was incomplete:
an input like ``hf_model_id='hf://hf://Org/Name'`` would strip to
``hf://Org/Name``, pass ``_extract_hf_model_id`` (which only strips
one prefix and didn't reject embedded schemes), and reach vLLM as an
invalid ``--model_id``. Same flaw on ``storage_uri='hf://hf://…'``.

Fix at the DRY layer: ``_extract_hf_model_id`` now rejects any
extracted id containing ``://``. Every caller
(``_build_llm_predictor``, sync+async ``serve_llm``, direct
``register_llm_catalog_entry``) inherits the validation. Real HF ids
are ``org/name`` or ``name`` — they never contain ``://``, so the
check is safe.

Also closes a related consistency gap: rejecting ``hf_model_id`` paired
with a non-``hf://`` ``storage_uri`` (e.g. ``s3://``). We'd otherwise
tag the catalog as HuggingFace-sourced while deploying from s3 —
catalog diverges from the actual artifact. MLflow-backed LLMs use
``storage_uri`` alone; ``hf_model_id`` stays reserved for the HF path.

Comment on the single-layer ``hf://`` strip in ``serve_llm`` updated
to note why there's no loop (the validator catches deeper nesting).

Tests:
- test_extract_hf_model_id_rejects_embedded_scheme: DRY-layer coverage
  for double-prefix, s3://-smuggled, file://-smuggled inputs.
- test_serve_llm_rejects_double_hf_scheme_prefix: serve_llm paths
  (bare id, storage_uri, non-hf scheme) all reject pre-catalog.
- test_serve_llm_rejects_hf_id_with_non_hf_storage_uri: non-HF
  storage with hf_model_id is rejected as inconsistent.
…m-with-catalog-registration

serve_llm: one-call LLM deploy with MLflow-backed catalog registration
alishalbaf and others added 15 commits April 22, 2026 01:02
Root cause of the 2.0.1b10 docker-publish failure: after PR #87 made
pre-release builds amd64-only, the total runtime dropped from ~17 min
(with arm64 emulation) to ~3 min, removing an unintended "propagation
buffer" between the ``Wait for PyPI simple index`` step and the
``pip install cogflow==$VERSION`` inside the docker build. The wait
step saw a CDN edge that already had the new version; pip inside
buildx hit a different edge still serving the stale cache.

Cleanest fix: skip PyPI entirely on the tag-push path. The ``build``
job already produces the exact wheel PyPI gets and uploads it as an
artifact. ``docker-publish`` now downloads that artifact into
``./dist/`` and the Dockerfile installs from the wheel directly.

Upsides:
- No race: the image uses the same bytes PyPI got.
- ~4 min saved on every tag-push release (no more 4-min poll loop).
- Works even during a transient PyPI outage.

``workflow_dispatch`` keeps the PyPI path (no artifact — the ``build``
job is skipped for that trigger). The wait step stays only for that
path; dispatch targets already-published versions, so propagation is
usually a non-issue anyway.

The Dockerfile falls back to ``pip install cogflow==$VERSION`` when
``./dist/`` is empty, so:
- Local ``docker build`` with no wheel still works (PyPI install).
- workflow_dispatch with empty ``dist/`` uses PyPI as today.
- Tag push with wheel present uses the local wheel.

Glob check uses POSIX-sh-compatible ``[ -n "$(ls …)" ]`` since the
``python:3.10-slim`` base image's ``/bin/sh`` is dash (no ``compgen``).

Verified locally: ``docker build`` with empty ``dist/`` falls back to
the PyPI install path and produces a working image (``import cogflow``
smoke test passes).
Previous ``cogflow-*.whl`` glob matched any version, so a stale wheel
from a different build left in the context would silently install the
wrong cogflow and contradict the ``COGFLOW_VERSION`` build arg.

Now use ``cogflow-\${COGFLOW_VERSION}-*.whl`` and assert the match
count:
- exactly 1 → install that wheel
- more than 1 → hard-fail rather than guess which to install
- 0 → fall back to PyPI (workflow_dispatch / local no-artifact path)

Uses ``find ... -name`` instead of ``ls ...`` so the version can be
interpolated safely, stays POSIX-sh portable (slim base is dash), and
keeps the fail-fast semantic.

Verified locally: empty dist → PyPI fallback; single matching wheel →
install it; two matching wheels → fail; stale wheel from another
version → PyPI fallback (no silent wrong-version install).

Copilot review on PR #89.
Previous ``COPY dist/ /tmp/cogflow-dist/`` + later ``rm -rf`` leaves
the wheel bytes committed in an intermediate image layer — Docker
layers are immutable so the later ``rm`` can't reclaim the space. On
the tag-push path the wheel is downloaded into ``dist/`` specifically
to be consumed, so there's zero reason for it to end up in the final
image.

Switch to a BuildKit bind mount (``RUN --mount=type=bind,source=dist``):
the wheel is mounted read-only for the duration of the RUN only,
nothing gets committed to a layer. Added ``# syntax=docker/dockerfile:1``
at the top of the file — the ``--mount`` feature requires Dockerfile
frontend 1.2+. ``docker/build-push-action@v6`` already uses BuildKit
so the release workflow doesn't need any other change.

Also dropped the now-unnecessary ``rm -rf /tmp/cogflow-dist`` cleanup
(the mount disappears automatically on RUN exit) and the matching
cleanup in the error branch.

Verified locally: ``DOCKER_BUILDKIT=1 docker build`` with empty
``dist/`` still falls through to the PyPI install path and produces
a working image (``import cogflow`` smoke test passes).
Real fix for the cog-api -> cogflow -> cog-api deadlock hit in prod
by POST /apidev/models-serving. When cog-api's async handler awaits
async_serve_llm, the inner register_llm_catalog_entry called a
blocking requests.post back to {API_PATH}/models/log — and since that
URL resolves to the same uvicorn worker, the event loop accepting
the callback was the one blocked on it. 15s × 3 retries, catalog
POST lost, ISVC orphan.

Unlike asyncio.to_thread (a workaround), this gives the callback
path real async I/O:

- Add httpx dep (>=0.25,<1). aiohttp was available transitively but
  httpx matches the codebase's sync requests style 1:1 (json=, .is_success,
  .raise_for_status, etc.) and lets us drop it in without rethinking
  multipart/streaming semantics.
- Add cogflow.utils.network.make_async_post_request — mirror of the
  sync make_post_request but httpx-backed. tenacity @Retry works on
  async functions since 6.2, so we keep the same 3-attempt backoff
  policy.
- Extract two helpers in ModelManager so sync and async registration
  paths don't duplicate logic:
  - _prepare_llm_catalog_run: validate hf_model_id, open/close the
    MLflow run, set tags, return (run_id, start_time_ms, description,
    normalized_hf_model_id).
  - _build_llm_catalog_payload: shape the /models/log POST body
    (url, model_dict, headers, resolved_user).
- Sync register_llm_catalog_entry stays behaviourally identical,
  now built on the helpers.
- New async_register_llm_catalog_entry: same semantics as the sync
  version, uses make_async_post_request for the backend call.
- async_serve_llm awaits async_register_llm_catalog_entry instead
  of the sync method.

MLflow's tracking client is still sync — no first-party async MLflow
exists and MLflow server is a separate service, so the brief
event-loop block it causes doesn't self-deadlock. If/when an async
MLflow client appears, _prepare_llm_catalog_run is the single place
to swap it in.

Sync callers (notebook users of cogflow.models.register_llm_catalog_entry,
the sync serve_llm path) are unchanged.
1. Add httpx to requirements.txt — CI installs from this file (not
   pyproject.toml), so without it the import in network.py fails at
   test time.
2. Mirror sync make_post_request body conditional: change
   'if data is not None' to 'if data:' so empty dict {} is treated
   as no body (matches sync contract).
3. Restrict @Retry to httpx.HTTPError via retry_if_exception_type so
   logic errors (e.g. ValueError from a bogus json() call) fail fast
   on the first attempt instead of being retried 3x. Docstring already
   claimed this; now the behaviour matches.
4. Add four async tests in test_network.py mirroring the sync POST
   tests:
   - success with json body
   - empty body sends no json (parity with sync)
   - 4xx triggers raise_for_status -> httpx.HTTPError -> RetryError
     after 3 attempts (verifies retry policy + count)
   - non-httpx exception (ValueError from .json()) raised on first
     attempt (verifies the retry restriction)
Cogflow repo guideline: public (non-underscore) function/method
docstrings must not name internal implementation dependencies —
consumers shouldn't couple their mental model to what's under the
hood. Only the public docstrings touched in this PR are changed;
pre-existing mentions elsewhere in the module are out of scope.

- make_async_post_request: drop httpx / tenacity mentions; abstract
  to "async HTTP client" and "retries up to N times".
- register_llm_catalog_entry: drop MLflow mentions; use "tracking
  run", "tracking backend", "catalog service". Also reshape the
  source-tag note so it no longer spells the fallback string.
- async_register_llm_catalog_entry: same; describe behaviour in
  terms of "async HTTP client" and "tracking service", not MLflow
  or uvicorn.
- async_serve_llm: drop httpx / uvicorn / MLflow mentions; use
  "async HTTP client", "process", "tracking service".

Private helpers (_prepare_llm_catalog_run, _build_llm_catalog_payload)
and inline code comments keep the concrete lib names for maintainer
clarity — same guideline's carve-out.
…llm-catalog-httpx

async_serve_llm: use httpx for /models/log callback
- Pin syntax frontend to docker/dockerfile:1.4 instead of bare "1".
  Bare "1" floats to whatever is latest; 1.4 is the minimum that
  supports RUN --mount=type=bind and lets builds stay reproducible.
- Add --no-cache-dir to both wheel-install and PyPI-fallback pip
  calls (plus the shap/xgboost install above) so pip's download/
  build cache doesn't persist into an image layer. protobuf install
  already uses it.
- Capture the find result once instead of running it twice. A single
  variable (wheel_paths) now backs both the count derivation and the
  install-target / stderr dump, eliminating the count-vs-path
  divergence risk Copilot flagged.

Semantics unchanged: 0 / 1 / >1 branches behave the same.
…use-build-artifact-no-pypi-race

release: install cogflow from build-job artifact, not PyPI
LLM pods are large, GPU-pinned, and almost always run as a single
replica. Knative/Serverless mode requires the new revision to reach
Ready before the old one is torn down — but on a one-GPU node the
new pod can never schedule (the GPU is held by the old pod), so the
rollout deadlocks. Defaulting to RawDeployment with
``strategy.type: Recreate`` makes the old pod terminate first,
freeing the GPU for the new one.

- Adds ``ServingManager._apply_raw_deployment_defaults`` static
  helper. Returns the (possibly augmented) predictor and the
  merged annotations dict. Caller-supplied ``serving.kserve.io/deploymentMode``
  wins, so opting back into Serverless is a single annotation away.
- ``deploymentStrategy`` is rejected by the KServe webhook for
  Serverless ISVCs, so it's only injected when the resolved
  deployment mode is RawDeployment.
- Wired into both ``ServingManager.deploy_llm`` (sync) and
  ``AsyncServingManager._deploy_llm`` (async) so the two surfaces
  stay consistent.
Stage N+1 of the Cog-Engine async migration. ``cogflow/utils/network.py``
already had ``make_async_post_request`` (added in PR #91 to fix the
cog-api → cogflow → cog-api self-deadlock). This PR completes the
async surface so any callsite that's already inside an ``async def``
can issue any HTTP verb without blocking the event loop.

New helpers, all using ``httpx.AsyncClient`` and matching their sync
counterparts call-for-call:

- ``make_async_get_request`` — same path-param + pagination
  semantics as the sync ``make_get_request``. Tenacity retries on
  ``httpx.HTTPError`` only (not on JSON-decode bugs).
- ``make_async_delete_request`` — 200/202/204 → True; same
  retry-on-transport-error policy.
- ``make_async_patch_request`` — JSON body when ``data`` is set,
  raises ``RetryError`` after 3 retries on a non-success.
- ``make_async_get_request_raw`` — returns parsed JSON or ``None``
  on transport failure (mirrors the sync helper's actual behaviour;
  the ``Response`` annotation in the original docstring was wrong).
- ``make_async_health_check_request`` — no retry; any 2xx → True,
  anything else (including transport error) → False.

``make_async_get_request_stream`` deliberately omitted: streaming with
``httpx.AsyncClient`` requires the client to outlive the iteration,
which doesn't fit the "single helper, return a Response" shape.
Callers needing async streaming can use the ``AsyncExitStack`` +
handed-off-flag pattern from ``S3Utils.fetch_s3_object_stream`` in
Cog-Engine.

Tests: extended ``_FakeAsyncClient`` to cover GET/DELETE/PATCH per-verb
captured args + per-call response sequences for pagination. 17 new
tests covering success, failure (RetryError after 3 retries on
transport errors), and the no-retry-on-transport-failure shape of the
raw + health-check helpers. ``test_make_async_get_request_pagination``
exercises the multi-page path with two distinct responses returned in
sequence.
Add async_get_dataset / async_get_prometheus_dataset /
async_delete_dataset on DatasetManager, plus an async_register_component
function. All route through the async network helpers landed earlier
in this PR so async callers (e.g. an async FastAPI route) no longer
block the event loop on the registry round-trip.

The MinIO upload prelude in async_register_component still runs sync
inside asyncio.to_thread because the MinIO SDK has no async client;
that consumes a worker thread but doesn't block the loop.

Skipped on purpose:
- async_register_dataset: needs multipart files= which the new async
  POST helper doesn't accept yet.
- async_download_dataset: streaming response, needs the AsyncExitStack
  pattern (separate follow-up).

Tests cover happy + error paths for each new public method.
network.py
- All async helpers now instantiate httpx.AsyncClient with
  follow_redirects=True to match sync requests' default. Without it
  the async path silently diverged on HTTP->HTTPS hops and any
  trailing-slash redirect (Copilot inline comments #1, #2, #5).
- POST and PATCH success check switched from response.is_success
  (2xx only) to "not response.is_error" (<400) so behaviour matches
  sync requests.Response.ok exactly. Also closes the implicit-None
  return path on a 3xx end-state for PATCH that Copilot flagged on
  the original code (#4).
- GET pagination loop switched from "not is_success" to "is_error"
  for the same reason (3xx pages should not silently break the loop).

components.py
- Dropped the unused make_async_get_request import (#6).
- Tightened docstring: only the MinIO upload runs inside
  asyncio.to_thread, not the YAML parse or file read (#7).
- async_register_component now raises CogflowComponentRegistryError
  when the registry check returns None (transport failure) instead
  of silently treating it as "no existing component" and risking a
  duplicate create during an intermittent registry outage (#8).

tests
- FakeAsyncResponse now exposes is_error derived from is_success so
  existing fixtures keep working without per-test changes.
- New test_async_clients_follow_redirects pins down that every async
  helper instantiates httpx.AsyncClient with follow_redirects=True,
  so this divergence cannot be reintroduced.
- New test_async_register_component_registry_unreachable covers the
  None-from-make_async_get_request_raw path.

Copilot inline comment #3 (DELETE retries on 4xx/5xx through
raise_for_status -> HTTPStatusError) is intentional: the sync
make_delete_request behaves identically (RequestException is the
base class, bare @Retry catches everything). The PR description
will be updated separately to remove the misleading
"transport-error-only" wording for DELETE.
network: async mirrors for GET/DELETE/PATCH/raw/health
…w-deployment-default

Default LLM ISVCs to RawDeployment + Recreate strategy
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants