Refactor by SaiKireeti2905 · Pull Request #72 · HIRO-MicroDataCenters-BV/cogflow

SaiKireeti2905 · 2026-01-20T08:53:11Z

This pull request introduces major improvements to the CogFlow project, focusing on codebase modernization, configuration management, and developer ergonomics. The most significant changes include a complete rewrite of the README.md for clarity, the introduction of a centralized and environment-driven configuration system using Pydantic, the addition of a lazy-loading mechanism for submodules, and the removal of legacy plugin configuration files and unused Kafka consumer code. Pre-commit and linting configurations were also updated for consistency and to better exclude virtual environments and test directories.

Documentation and Developer Experience:

Revamped README.md with a clear, modern overview of CogFlow, its architecture, and its core features, replacing the previous function-by-function listing with a more conceptual and user-focused introduction.

Configuration Management:

Introduced a new config.py module using Pydantic for centralized, environment-variable-driven configuration, replacing the old .ini-based approach. This improves maintainability, type safety, and integration with modern Python tooling.
Removed the obsolete cogflow_config.ini file, consolidating all configuration into the new Python-based system.

Codebase Modernization and Simplification:

Added a _lazy.py module implementing a robust lazy-loading mechanism for cogflow submodules, improving import performance and reducing unnecessary dependencies at startup.
Deleted the unused Kafka consumer implementation in kafka/consumer.py, reducing codebase clutter and potential maintenance overhead.
Updated core/pipelines/__init__.py to provide a unified, user-facing API for pipeline and component management, improving discoverability and consistency.
Commented out legacy API exposure in api.py in preparation for a more modern, modular API approach.

Linting and Pre-commit Configuration:

Enhanced .pre-commit-config.yaml to more comprehensively exclude virtual environments, examples, wrappers, and test directories from linting and formatting hooks, ensuring that only relevant source code is checked. [1] [2] [3] [4]
Updated .pylintrc to allow more attributes per class, increased max-args, and added new disables, reflecting evolving codebase complexity and style needs. (Fceca9eaL1)
Removed redundant test execution step from the CI workflow, likely to avoid duplication with pre-commit or other test runners.

2.0.0b1

Copilot

Pull request overview

This pull request introduces a comprehensive modernization of the CogFlow codebase, focusing on replacing legacy plugin-based architecture with a cleaner, Pydantic-based configuration system and adding extensive test coverage across core modules. The changes enhance type safety, developer experience, and maintainability by removing deprecated .ini files, consolidating network utilities, and establishing a robust exception framework.

Changes:

Introduced centralized Pydantic-based configuration management replacing legacy .ini files
Added comprehensive test coverage for storage, serving, orchestration, network, models, datasets, components, exceptions, and common utilities
Removed legacy plugin system files (PluginManager, setup.py, docker-compose.yaml) and migrated to modern pyproject.toml-based project structure

Reviewed changes

Copilot reviewed 55 out of 60 changed files in this pull request and generated no comments.

Show a summary per file

File	Description
tests/test_storage.py	New test suite for MinIO client creation and S3 endpoint parsing
tests/test_serving.py	New test suite for ServingManager Kubernetes operations
tests/test_orchestration.py	New test suite for Kubeflow Pipelines orchestration layer
tests/test_network.py	New test suite for HTTP request utilities with retry logic
tests/test_models.py	Comprehensive test suite for ModelManager MLflow operations
tests/test_datasets.py	New test suite for DatasetManager operations
tests/test_exceptions.py	New test suite for CogFlow exception framework
tests/test_components.py	New test suite for component registry and YAML parsing
tests/test_common.py	New test suite for UUID, serialization, and Kubernetes utilities
tests/conftest.py	New pytest configuration with autouse fixtures for mocking
pytest.ini	Updated pytest configuration with asyncio mode
pyproject.toml	Migrated from setup.py to modern pyproject.toml with dynamic versioning
cogflow/utils/storage.py	New MinIO client utility with URL parsing
cogflow/utils/network.py	New centralized network request utilities with tenacity retry logic
cogflow/utils/logging.py	New standardized logger configuration
cogflow/utils/imports.py	New lazy import utility for heavy dependencies
cogflow/utils/exceptions.py	New comprehensive exception hierarchy and error handler
cogflow/utils/common.py	New common utilities for UUID, serialization, and Kubernetes helpers
cogflow/api.py	Commented out legacy API exposure in preparation for modular refactor
setup.py	Removed in favor of pyproject.toml
docker-compose.yaml	Removed local development compose stack
cogflow/v2/init.py	Removed KFP v2 globals exposure
cogflow/util.py	Removed and refactored into cogflow/utils/network.py and cogflow/utils/common.py
tests/test_plugin_manager.py	Removed legacy plugin system tests
tests/test_notebook_plugin.py	Removed legacy notebook plugin tests
tests/test_mlflowplugin.py	Removed legacy MLflow plugin tests
tests/test_kubeflowplugin.py	Removed legacy Kubeflow plugin tests
tests/test_dataset_plugin.py	Removed legacy dataset plugin tests
tests/test.ipynb	Removed test notebook file

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

2.0.0b1

Models: - Add get_run(), get_run_artifact_uri(), list_artifacts_grouped() Serving: - Add model_type param to deploy_model(), update_model(), _process_isvc() - Add AsyncServingManager using kubernetes_asyncio for non-blocking ops - Expose async_deploy_model, async_list_models, etc. at module level Pipelines: - Add pipeline query functions: list_pipelines_by_name, get_pipeline_task_sequence*, get_task_structure_by_task_id - Add pod inspection: get_pod_definition, get_pod_events, get_pod_logs, get_inference_service_logs (both sync and async variants) - Add list_all_kfp_runs for paginated KFP run listing Dependencies: - Add kubernetes_asyncio to pyproject.toml

- Fix _parse_run() experiment_id extraction to iterate resource_references - Fix get_pipeline_task_sequence_by_pipeline_id to filter runs by pipeline_id - Fix _list_runs_by_pipeline_id to filter by pipeline_id - Fix get_pod_logs to raise CogflowConnectionError instead of generic Exception - Fix async config loading to use config_exception.ConfigException - Add model_type docstrings to deploy_model and update_model

- test_async_serving.py: 9 tests covering AsyncServingManager CRUD, deploy_model with/without model_type, list_models, and module exports - test_inspection.py: 13 tests covering pipeline inspection, pagination, pipeline_id filtering, workflow node parsing, and pod inspection error handling - test_serving.py: 2 new tests for sync deploy_model with model_type - Fix circular import in async_serving.py via lazy ServingManager loading - Fix test_init_loads_k8s_config_once to reset common._k8s_loaded_flag for proper test isolation

async_serving.py: - Align async create_isvc spec with sync ServingManager.create_isvc: use serviceAccountName/minReplicas in predictor, place transformer at top-level spec.transformer only when transformer_env present - Align async update_model transformer patch path with sync version (spec.transformer instead of spec.predictor.transformer) - Remove unreachable ApiException handler in update_model since get_isvc already wraps to CogflowConnectionError; allow that exception to propagate inspection.py: - get_pod_logs / async_get_pod_logs now return List[str] instead of JSON-encoded string, eliminating double-encoding in get_inference_service_logs / async_get_inference_service_logs - Remove unused imports (timedelta, CogflowErrorHandler, CogflowPipelineError) test_models.py: - Add 9 new tests covering get_run, get_run_artifact_uri, and list_artifacts_grouped (success, normalization, failure wrapping, directory grouping)

Release cogflow v2.0.1b1

Split build-and-release into build → publish-pypi → github-release jobs. Publish job uses OIDC trusted publishing (no API tokens) and runs only on v* tags via a `pypi` GitHub environment.

…ve-names deploy_llm: derive isvc_name / served_model_name from hf_model_id

Pre-release docker publishes (beta / rc / alpha) now build linux/amd64 only; stable releases stay multi-arch. Rationale: the pre-release channels are consumed exclusively by our own dev-cluster CD, which runs on amd64 nodes, so the arm64 layer was built and pushed but never pulled. Arm64 emulation under QEMU was also the slow step of the job; skipping it trims ~8–10 min per beta release. Implementation: the tags step now also emits `platforms`, which the build-push step consumes and which gates the QEMU setup via `contains(., 'arm64')`. Stable still emits `linux/amd64,linux/arm64`. Other improvements bundled: - Add buildx GitHub-Actions cache (`type=gha`, `mode=max`) so runs that only change the cogflow version or a requirements pin reuse earlier layers instead of rebuilding from scratch. - Bump docker/build-push-action v5 → v6 (matches Cog-Engine cd.yml). - Bump Node-20 actions to Node-24-ready versions ahead of the 2026-06-02 runner switch: actions/checkout v4 → v5 actions/setup-python v5 → v6 actions/upload-artifact v4 → v5 actions/download-artifact v4 → v5 - Tidy the docker-publish `if:` block and its comment — same condition, clearer structure, explains why `always()` is still required for the workflow_dispatch path.

Avoids a hardcoded platform list drifting from the step that gates the QEMU setup (Copilot review on the paired Cog-Engine PR #275).

…amd64-only-for-prereleases release.yml: amd64-only for pre-releases, cache, action bumps

Introduces the high-level LLM entrypoint — ``cogflow.serving.serve_llm`` and its async counterpart — that callers (notebook users or Cog-Engine's POST /models-serving) can reach for to deploy an LLM and have it appear in the CogFlow catalog in one shot. Pairs with a new ``cogflow.models.register_llm_catalog_entry`` helper that opens an MLflow run, tags it with LLM metadata (type, source, hf_model_id, note), and POSTs a catalog entry to ``{API_PATH}/models/log`` — the same pattern ``log_model`` already uses for classical artifacts. The POST is best-effort (warn-on-failure, like log_model) so a blip on the catalog service doesn't abort a successful MLflow run. Why: today, notebook users calling ``cogflow.serving.deploy_llm`` get a KServe InferenceService but no catalog row — the LLM never shows up in ``GET /models``. Cog-Engine's ``_deploy_llm`` papers over this by inserting a row itself, but uses ``uuid4()`` as the id and skips MLflow entirely. That breaks the long-standing invariant ``model_info.id == MLflow run_id`` and makes ``GET /models/{id}`` 500 on LLM rows (because ``cogflow_models.get_run(model_id)`` has no run to return). Putting the registration in cogflow fixes both paths with one implementation. What's in the box: - ``ModelManager.register_llm_catalog_entry`` (cogflow/core/models.py): opens the MLflow run, sets tags, best-effort POSTs to the catalog, returns the run_id. Mirrors ``log_model``'s POST pattern (same URL, same headers shape, same warn-on-failure). - ``ServingManager.serve_llm`` / ``AsyncServingManager.serve_llm`` (cogflow/core/serving.py, cogflow/core/async_serving.py): resolves storage_uri shorthand (``hf_model_id`` → ``hf://{id}``), derives ``isvc_name`` / ``served_model_name`` via the existing ``derive_llm_names`` helper, registers the catalog entry, merges ``model_id``/``model_type``/``hf_model_id`` annotations onto the caller-provided ones, then delegates to ``deploy_llm`` for the ISVC create. Returns {run_id, isvc_name, served_model_name, isvc}. - Async path keeps the catalog step sync (MLflow tracking and the catalog POST both block) and only the ISVC create is awaited — preserves behavioural parity with the existing deploy_llm pair. - ``deploy_llm`` itself is unchanged: kept as a low-level primitive so callers that want just the ISVC (no catalog) can still reach for it. - Tests cover: end-to-end serve_llm, hf:// shorthand → hf_model_id extraction, caller-annotation merging (model_id overrides), missing source → validation error, explicit isvc_name preserved, the register_llm_catalog_entry POST payload shape, and the warn-only behaviour on catalog POST failure. Async variant has a matching regression test for the ISVC annotations + export list. Catalog-side (Cog-Engine) will follow in a paired PR: switch ``_deploy_llm`` to call ``async_serve_llm``, drop its manual DB insert, add a belt-and-braces MLflow-run fallback in ``get_model_by_id`` for legacy rows that predate this change.

- Identity annotations (``model_type``, ``hf_model_id``) are now set authoritatively in both sync and async ``serve_llm`` — caller values don't override what actually got deployed. Unrelated caller annotations still pass through. Prevents ISVC metadata drifting from the catalog entry. - Reject ``storage_uri`` / ``hf_model_id`` mismatches up-front: if both are provided and the ``hf://`` URI encodes a different id, raise ``CogflowValidationError``. Without this, we'd catalog/tag one model while deploying another. - ``register_llm_catalog_entry`` applies caller ``extra_tags`` *before* the reserved identity tags so the reserved ones always win on a key collision. A caller passing ``extra_tags={"type": "foo"}`` can no longer desync the MLflow run from the catalog entry. - ``model_version`` payload: use 0 (the "unknown registry version" sentinel already used by ``log_model``) instead of 1. HF-sourced LLMs have no MLflow registry version; 1 was arbitrary. Tests: - test_serve_llm_identity_annotations_are_authoritative: identity keys overridden regardless of caller annotation values. - test_serve_llm_storage_uri_hf_id_mismatch_raises: mismatch surfaces as a typed error. - test_register_llm_catalog_entry_extra_tags_cannot_override_reserved: set_tag call order proves reserved keys win on collision.

serve_llm already validates hf_model_id when it arrives as part of a ``hf://`` storage_uri, but a caller passing only ``hf_model_id=`` got no validation before the catalog step. A value like ``"Qwen/Qwen2.5-Coder-7B-Instruct/"`` (trailing slash) would open an MLflow run + POST to /models/log with the unnormalized id, and only then get rejected inside deploy_llm — leaving an orphan run behind. Route a bare ``hf_model_id`` through the same ``_extract_hf_model_id`` helper the URI path already uses. Also normalize both sides of the ``storage_uri`` / ``hf_model_id`` mismatch check so ``"Org/Name/"`` compares equal to ``"Org/Name"`` (preventing false-negative mismatches). Applies to both sync and async serve_llm. Tests: - test_serve_llm_normalizes_dirty_hf_model_id: trailing/leading slashes are stripped; catalog and annotations see the canonical form. - test_serve_llm_invalid_hf_model_id_rejected_before_catalog: empty / whitespace / slash-only ids raise before register_llm_catalog_entry runs — no orphan run / catalog side effects.

If a caller accidentally passes ``hf_model_id="hf://Org/Name"``, we'd build ``"hf://hf://Org/Name"`` and ``_extract_hf_model_id`` would accept it (it only strips one ``hf://`` and doesn't reject embedded schemes), so we'd deploy / tag a bogus URI. Strip one optional ``hf://`` prefix up-front in both sync and async paths. Covers the bare-id branch and the mismatch-check branch with a single line. Test asserts the canonical id flows into tags, annotations, and the vLLM ``--model_id`` arg.

Four changes, all test/defensive hygiene: 1. Stub MLflow tracking-server health check at conftest-import time so tests importing ``cogflow.core.models`` don't hit real HTTP. The ``_models = ModelManager()`` singleton is built at module import, so an autouse fixture runs too late. Approach: swap the network helper for a no-op, force-import ``cogflow.core.models`` so its singleton initializes against the stub, then restore the real helper so ``tests/test_network.py`` still tests genuine behaviour. 2. Clean up ``_patch_llm_catalog`` test helper: remove the unused ``serving_module`` parameter and update the docstring to reference the actual patch target (``cogflow.core.models``, not ``cogflow.models``). All 10 call sites updated. 3. ``register_llm_catalog_entry`` now normalizes ``hf_model_id`` on entry — strip an optional ``hf://`` prefix, then route through ``_extract_hf_model_id`` to reject whitespace / empty / slash-only inputs. ``serve_llm`` already did this; direct callers (notebook users reaching for the helper) now get the same protection so we can't end up with an orphan MLflow run on bad input regardless of entry point. Full suite: 243 pass.

Round 3 of fixes. Previous single-layer hf:// strip was incomplete: an input like ``hf_model_id='hf://hf://Org/Name'`` would strip to ``hf://Org/Name``, pass ``_extract_hf_model_id`` (which only strips one prefix and didn't reject embedded schemes), and reach vLLM as an invalid ``--model_id``. Same flaw on ``storage_uri='hf://hf://…'``. Fix at the DRY layer: ``_extract_hf_model_id`` now rejects any extracted id containing ``://``. Every caller (``_build_llm_predictor``, sync+async ``serve_llm``, direct ``register_llm_catalog_entry``) inherits the validation. Real HF ids are ``org/name`` or ``name`` — they never contain ``://``, so the check is safe. Also closes a related consistency gap: rejecting ``hf_model_id`` paired with a non-``hf://`` ``storage_uri`` (e.g. ``s3://``). We'd otherwise tag the catalog as HuggingFace-sourced while deploying from s3 — catalog diverges from the actual artifact. MLflow-backed LLMs use ``storage_uri`` alone; ``hf_model_id`` stays reserved for the HF path. Comment on the single-layer ``hf://`` strip in ``serve_llm`` updated to note why there's no loop (the validator catches deeper nesting). Tests: - test_extract_hf_model_id_rejects_embedded_scheme: DRY-layer coverage for double-prefix, s3://-smuggled, file://-smuggled inputs. - test_serve_llm_rejects_double_hf_scheme_prefix: serve_llm paths (bare id, storage_uri, non-hf scheme) all reject pre-catalog. - test_serve_llm_rejects_hf_id_with_non_hf_storage_uri: non-HF storage with hf_model_id is rejected as inconsistent.

…m-with-catalog-registration serve_llm: one-call LLM deploy with MLflow-backed catalog registration

Root cause of the 2.0.1b10 docker-publish failure: after PR #87 made pre-release builds amd64-only, the total runtime dropped from ~17 min (with arm64 emulation) to ~3 min, removing an unintended "propagation buffer" between the ``Wait for PyPI simple index`` step and the ``pip install cogflow==$VERSION`` inside the docker build. The wait step saw a CDN edge that already had the new version; pip inside buildx hit a different edge still serving the stale cache. Cleanest fix: skip PyPI entirely on the tag-push path. The ``build`` job already produces the exact wheel PyPI gets and uploads it as an artifact. ``docker-publish`` now downloads that artifact into ``./dist/`` and the Dockerfile installs from the wheel directly. Upsides: - No race: the image uses the same bytes PyPI got. - ~4 min saved on every tag-push release (no more 4-min poll loop). - Works even during a transient PyPI outage. ``workflow_dispatch`` keeps the PyPI path (no artifact — the ``build`` job is skipped for that trigger). The wait step stays only for that path; dispatch targets already-published versions, so propagation is usually a non-issue anyway. The Dockerfile falls back to ``pip install cogflow==$VERSION`` when ``./dist/`` is empty, so: - Local ``docker build`` with no wheel still works (PyPI install). - workflow_dispatch with empty ``dist/`` uses PyPI as today. - Tag push with wheel present uses the local wheel. Glob check uses POSIX-sh-compatible ``[ -n "$(ls …)" ]`` since the ``python:3.10-slim`` base image's ``/bin/sh`` is dash (no ``compgen``). Verified locally: ``docker build`` with empty ``dist/`` falls back to the PyPI install path and produces a working image (``import cogflow`` smoke test passes).

Previous ``cogflow-*.whl`` glob matched any version, so a stale wheel from a different build left in the context would silently install the wrong cogflow and contradict the ``COGFLOW_VERSION`` build arg. Now use ``cogflow-\${COGFLOW_VERSION}-*.whl`` and assert the match count: - exactly 1 → install that wheel - more than 1 → hard-fail rather than guess which to install - 0 → fall back to PyPI (workflow_dispatch / local no-artifact path) Uses ``find ... -name`` instead of ``ls ...`` so the version can be interpolated safely, stays POSIX-sh portable (slim base is dash), and keeps the fail-fast semantic. Verified locally: empty dist → PyPI fallback; single matching wheel → install it; two matching wheels → fail; stale wheel from another version → PyPI fallback (no silent wrong-version install). Copilot review on PR #89.

Previous ``COPY dist/ /tmp/cogflow-dist/`` + later ``rm -rf`` leaves the wheel bytes committed in an intermediate image layer — Docker layers are immutable so the later ``rm`` can't reclaim the space. On the tag-push path the wheel is downloaded into ``dist/`` specifically to be consumed, so there's zero reason for it to end up in the final image. Switch to a BuildKit bind mount (``RUN --mount=type=bind,source=dist``): the wheel is mounted read-only for the duration of the RUN only, nothing gets committed to a layer. Added ``# syntax=docker/dockerfile:1`` at the top of the file — the ``--mount`` feature requires Dockerfile frontend 1.2+. ``docker/build-push-action@v6`` already uses BuildKit so the release workflow doesn't need any other change. Also dropped the now-unnecessary ``rm -rf /tmp/cogflow-dist`` cleanup (the mount disappears automatically on RUN exit) and the matching cleanup in the error branch. Verified locally: ``DOCKER_BUILDKIT=1 docker build`` with empty ``dist/`` still falls through to the PyPI install path and produces a working image (``import cogflow`` smoke test passes).

@Retry

Real fix for the cog-api -> cogflow -> cog-api deadlock hit in prod by POST /apidev/models-serving. When cog-api's async handler awaits async_serve_llm, the inner register_llm_catalog_entry called a blocking requests.post back to {API_PATH}/models/log — and since that URL resolves to the same uvicorn worker, the event loop accepting the callback was the one blocked on it. 15s × 3 retries, catalog POST lost, ISVC orphan. Unlike asyncio.to_thread (a workaround), this gives the callback path real async I/O: - Add httpx dep (>=0.25,<1). aiohttp was available transitively but httpx matches the codebase's sync requests style 1:1 (json=, .is_success, .raise_for_status, etc.) and lets us drop it in without rethinking multipart/streaming semantics. - Add cogflow.utils.network.make_async_post_request — mirror of the sync make_post_request but httpx-backed. tenacity @Retry works on async functions since 6.2, so we keep the same 3-attempt backoff policy. - Extract two helpers in ModelManager so sync and async registration paths don't duplicate logic: - _prepare_llm_catalog_run: validate hf_model_id, open/close the MLflow run, set tags, return (run_id, start_time_ms, description, normalized_hf_model_id). - _build_llm_catalog_payload: shape the /models/log POST body (url, model_dict, headers, resolved_user). - Sync register_llm_catalog_entry stays behaviourally identical, now built on the helpers. - New async_register_llm_catalog_entry: same semantics as the sync version, uses make_async_post_request for the backend call. - async_serve_llm awaits async_register_llm_catalog_entry instead of the sync method. MLflow's tracking client is still sync — no first-party async MLflow exists and MLflow server is a separate service, so the brief event-loop block it causes doesn't self-deadlock. If/when an async MLflow client appears, _prepare_llm_catalog_run is the single place to swap it in. Sync callers (notebook users of cogflow.models.register_llm_catalog_entry, the sync serve_llm path) are unchanged.

@Retry

1. Add httpx to requirements.txt — CI installs from this file (not pyproject.toml), so without it the import in network.py fails at test time. 2. Mirror sync make_post_request body conditional: change 'if data is not None' to 'if data:' so empty dict {} is treated as no body (matches sync contract). 3. Restrict @Retry to httpx.HTTPError via retry_if_exception_type so logic errors (e.g. ValueError from a bogus json() call) fail fast on the first attempt instead of being retried 3x. Docstring already claimed this; now the behaviour matches. 4. Add four async tests in test_network.py mirroring the sync POST tests: - success with json body - empty body sends no json (parity with sync) - 4xx triggers raise_for_status -> httpx.HTTPError -> RetryError after 3 attempts (verifies retry policy + count) - non-httpx exception (ValueError from .json()) raised on first attempt (verifies the retry restriction)

Cogflow repo guideline: public (non-underscore) function/method docstrings must not name internal implementation dependencies — consumers shouldn't couple their mental model to what's under the hood. Only the public docstrings touched in this PR are changed; pre-existing mentions elsewhere in the module are out of scope. - make_async_post_request: drop httpx / tenacity mentions; abstract to "async HTTP client" and "retries up to N times". - register_llm_catalog_entry: drop MLflow mentions; use "tracking run", "tracking backend", "catalog service". Also reshape the source-tag note so it no longer spells the fallback string. - async_register_llm_catalog_entry: same; describe behaviour in terms of "async HTTP client" and "tracking service", not MLflow or uvicorn. - async_serve_llm: drop httpx / uvicorn / MLflow mentions; use "async HTTP client", "process", "tracking service". Private helpers (_prepare_llm_catalog_run, _build_llm_catalog_payload) and inline code comments keep the concrete lib names for maintainer clarity — same guideline's carve-out.

…llm-catalog-httpx async_serve_llm: use httpx for /models/log callback

- Pin syntax frontend to docker/dockerfile:1.4 instead of bare "1". Bare "1" floats to whatever is latest; 1.4 is the minimum that supports RUN --mount=type=bind and lets builds stay reproducible. - Add --no-cache-dir to both wheel-install and PyPI-fallback pip calls (plus the shap/xgboost install above) so pip's download/ build cache doesn't persist into an image layer. protobuf install already uses it. - Capture the find result once instead of running it twice. A single variable (wheel_paths) now backs both the count derivation and the install-target / stderr dump, eliminating the count-vs-path divergence risk Copilot flagged. Semantics unchanged: 0 / 1 / >1 branches behave the same.

…use-build-artifact-no-pypi-race release: install cogflow from build-job artifact, not PyPI

LLM pods are large, GPU-pinned, and almost always run as a single replica. Knative/Serverless mode requires the new revision to reach Ready before the old one is torn down — but on a one-GPU node the new pod can never schedule (the GPU is held by the old pod), so the rollout deadlocks. Defaulting to RawDeployment with ``strategy.type: Recreate`` makes the old pod terminate first, freeing the GPU for the new one. - Adds ``ServingManager._apply_raw_deployment_defaults`` static helper. Returns the (possibly augmented) predictor and the merged annotations dict. Caller-supplied ``serving.kserve.io/deploymentMode`` wins, so opting back into Serverless is a single annotation away. - ``deploymentStrategy`` is rejected by the KServe webhook for Serverless ISVCs, so it's only injected when the resolved deployment mode is RawDeployment. - Wired into both ``ServingManager.deploy_llm`` (sync) and ``AsyncServingManager._deploy_llm`` (async) so the two surfaces stay consistent.

Stage N+1 of the Cog-Engine async migration. ``cogflow/utils/network.py`` already had ``make_async_post_request`` (added in PR #91 to fix the cog-api → cogflow → cog-api self-deadlock). This PR completes the async surface so any callsite that's already inside an ``async def`` can issue any HTTP verb without blocking the event loop. New helpers, all using ``httpx.AsyncClient`` and matching their sync counterparts call-for-call: - ``make_async_get_request`` — same path-param + pagination semantics as the sync ``make_get_request``. Tenacity retries on ``httpx.HTTPError`` only (not on JSON-decode bugs). - ``make_async_delete_request`` — 200/202/204 → True; same retry-on-transport-error policy. - ``make_async_patch_request`` — JSON body when ``data`` is set, raises ``RetryError`` after 3 retries on a non-success. - ``make_async_get_request_raw`` — returns parsed JSON or ``None`` on transport failure (mirrors the sync helper's actual behaviour; the ``Response`` annotation in the original docstring was wrong). - ``make_async_health_check_request`` — no retry; any 2xx → True, anything else (including transport error) → False. ``make_async_get_request_stream`` deliberately omitted: streaming with ``httpx.AsyncClient`` requires the client to outlive the iteration, which doesn't fit the "single helper, return a Response" shape. Callers needing async streaming can use the ``AsyncExitStack`` + handed-off-flag pattern from ``S3Utils.fetch_s3_object_stream`` in Cog-Engine. Tests: extended ``_FakeAsyncClient`` to cover GET/DELETE/PATCH per-verb captured args + per-call response sequences for pagination. 17 new tests covering success, failure (RetryError after 3 retries on transport errors), and the no-retry-on-transport-failure shape of the raw + health-check helpers. ``test_make_async_get_request_pagination`` exercises the multi-page path with two distinct responses returned in sequence.

Add async_get_dataset / async_get_prometheus_dataset / async_delete_dataset on DatasetManager, plus an async_register_component function. All route through the async network helpers landed earlier in this PR so async callers (e.g. an async FastAPI route) no longer block the event loop on the registry round-trip. The MinIO upload prelude in async_register_component still runs sync inside asyncio.to_thread because the MinIO SDK has no async client; that consumes a worker thread but doesn't block the loop. Skipped on purpose: - async_register_dataset: needs multipart files= which the new async POST helper doesn't accept yet. - async_download_dataset: streaming response, needs the AsyncExitStack pattern (separate follow-up). Tests cover happy + error paths for each new public method.

@Retry

network.py - All async helpers now instantiate httpx.AsyncClient with follow_redirects=True to match sync requests' default. Without it the async path silently diverged on HTTP->HTTPS hops and any trailing-slash redirect (Copilot inline comments #1, #2, #5). - POST and PATCH success check switched from response.is_success (2xx only) to "not response.is_error" (<400) so behaviour matches sync requests.Response.ok exactly. Also closes the implicit-None return path on a 3xx end-state for PATCH that Copilot flagged on the original code (#4). - GET pagination loop switched from "not is_success" to "is_error" for the same reason (3xx pages should not silently break the loop). components.py - Dropped the unused make_async_get_request import (#6). - Tightened docstring: only the MinIO upload runs inside asyncio.to_thread, not the YAML parse or file read (#7). - async_register_component now raises CogflowComponentRegistryError when the registry check returns None (transport failure) instead of silently treating it as "no existing component" and risking a duplicate create during an intermittent registry outage (#8). tests - FakeAsyncResponse now exposes is_error derived from is_success so existing fixtures keep working without per-test changes. - New test_async_clients_follow_redirects pins down that every async helper instantiates httpx.AsyncClient with follow_redirects=True, so this divergence cannot be reintroduced. - New test_async_register_component_registry_unreachable covers the None-from-make_async_get_request_raw path. Copilot inline comment #3 (DELETE retries on 4xx/5xx through raise_for_status -> HTTPStatusError) is intentional: the sync make_delete_request behaves identically (RequestException is the base class, bare @Retry catches everything). The PR description will be updated separately to remove the misleading "transport-error-only" wording for DELETE.

network: async mirrors for GET/DELETE/PATCH/raw/health

…w-deployment-default Default LLM ISVCs to RawDeployment + Recreate strategy

SaiKireeti2905 and others added 16 commits October 30, 2025 18:02

refactor

8f5069c

abc

8b538b4

abc

2776813

refactor

8f71d95

fixed models

1e302aa

fixed models

506adce

fixed version

8fc93dc

Remove accidentally committed .venv311 virtual environment

8ad0c13

2.0.0b1

a8871e1

working models

ba4e0a7

added tests

9fcca9b

serving

861c9b6

datasets

1638170

v2 b1

f011f4f

v2 b1

5e096d2

Merge pull request #70 from HIRO-MicroDataCenters-BV/2.0.0b1

ae8d234

2.0.0b1

SaiKireeti2905 requested a review from Copilot January 20, 2026 08:54

Copilot AI reviewed Jan 20, 2026

View reviewed changes

SaiKireeti2905 and others added 12 commits January 22, 2026 07:41

2.0.0b1

d5fc9f7

Merge pull request #73 from HIRO-MicroDataCenters-BV/2.0.0b1

5f87c76

2.0.0b1

Bump version to 2.0.1b1

4b2ae0b

Add GitHub Actions workflow for build and release

24565d8

Add pytest-asyncio to dev dependencies for async tests

659166f

Mock K8s config loading in async tests for CI compatibility

9a5cfff

Merge pull request #74 from HIRO-MicroDataCenters-BV/release/v2.0.1b1

ce29176

Release cogflow v2.0.1b1

Add PyPI trusted publishing to release workflow

1d920fe

Split build-and-release into build → publish-pypi → github-release jobs. Publish job uses OIDC trusted publishing (no API tokens) and runs only on v* tags via a `pypi` GitHub environment.

alishalbaf and others added 2 commits April 21, 2026 13:36

Merge pull request #86 from HIRO-MicroDataCenters-BV/feature/llm-deri…

bcb79f4

…ve-names deploy_llm: derive isvc_name / served_model_name from hf_model_id

Bump version to 2.0.1b9 for LLM-name-derivation release

ed193ac

alishalbaf temporarily deployed to pypi April 21, 2026 11:40 — with GitHub Actions Inactive

alishalbaf and others added 11 commits April 21, 2026 14:04

Drive QEMU platforms from the computed output

9ceb364

Avoids a hardcoded platform list drifting from the step that gates the QEMU setup (Copilot review on the paired Cog-Engine PR #275).

Merge pull request #87 from HIRO-MicroDataCenters-BV/feature/release-…

a3018a9

…amd64-only-for-prereleases release.yml: amd64-only for pre-releases, cache, action bumps

Merge pull request #88 from HIRO-MicroDataCenters-BV/feature/serve-ll…

9f54a74

…m-with-catalog-registration serve_llm: one-call LLM deploy with MLflow-backed catalog registration

Bump version to 2.0.1b10 for serve_llm release

2c48a18

alishalbaf temporarily deployed to pypi April 21, 2026 22:15 — with GitHub Actions Inactive

alishalbaf and others added 15 commits April 22, 2026 01:02

Merge pull request #91 from HIRO-MicroDataCenters-BV/fix/async-serve-…

61cbaed

…llm-catalog-httpx async_serve_llm: use httpx for /models/log callback

Merge pull request #89 from HIRO-MicroDataCenters-BV/feature/release-…

4b7152e

…use-build-artifact-no-pypi-race release: install cogflow from build-job artifact, not PyPI

Merge pull request #93 from HIRO-MicroDataCenters-BV/feat/async-network

7db985f

network: async mirrors for GET/DELETE/PATCH/raw/health

Merge pull request #92 from HIRO-MicroDataCenters-BV/feat/llm-isvc-ra…

8b11394

…w-deployment-default Default LLM ISVCs to RawDeployment + Recreate strategy

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor#72

Refactor#72
SaiKireeti2905 wants to merge 90 commits into
developfrom
refactor

SaiKireeti2905 commented Jan 20, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

SaiKireeti2905 commented Jan 20, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants