From b3216b2183e74571db6516efc8ad02f13db92a1a Mon Sep 17 00:00:00 2001 From: Popescu V <136721202+popescu-v@users.noreply.github.com> Date: Tue, 19 May 2026 12:25:34 +0200 Subject: [PATCH 1/2] Remove unused file --- .../Reviews.csv | 11 ----------- 1 file changed, 11 deletions(-) delete mode 100644 tests/resources/tmp/test_error_y_type_must_be_str_when_x_type_is_tuple/Reviews.csv diff --git a/tests/resources/tmp/test_error_y_type_must_be_str_when_x_type_is_tuple/Reviews.csv b/tests/resources/tmp/test_error_y_type_must_be_str_when_x_type_is_tuple/Reviews.csv deleted file mode 100644 index 716dd083..00000000 --- a/tests/resources/tmp/test_error_y_type_must_be_str_when_x_type_is_tuple/Reviews.csv +++ /dev/null @@ -1,11 +0,0 @@ -User_ID Age Clothing ID Date New Title Recommended IND Positive Feedback average class -60B2Xk_3Fw 33 767 2019-03-22 True Awesome 1 0.0 Intimates -J94geVHf_- 34 1080 2019-03-23 False Very lovely 1 4.3 Dresses -jsPsQUdVAL 60 1077 2019-03-24 True Some major design flaws 0 0.0 Dresses -tSSBwAcIvw 50 1049 2019-03-25 False My favorite buy! 1 0.5 Pants --I-UlX4n-B 47 847 2019-03-26 False Flattering shirt 1 6.0 Blouses -4TQsd3FX7i 49 1080 2019-03-27 True Not for the very petite 0 4.0 Dresses -7w824zHOgN 39 858 2019-03-28 True Cagrcoal shimmer fun 1 3.6 Knits -Cm6fu01r99 39 858 2019-03-29 True Shimmer, surprisingly goes with lots 1 4.0 Knits -zbbZRgbqar 24 1077 2019-03-30 False Flattering 1 0.0 Dresses -WfkfYVhQFy 34 1077 2019-03-31 False Such a fun dress! 
1 0.0 Dresses From 02411cef46b06bc9d760c5032ed67dc9d4152d6e Mon Sep 17 00:00:00 2001 From: Popescu V <136721202+popescu-v@users.noreply.github.com> Date: Tue, 28 Apr 2026 17:07:43 +0200 Subject: [PATCH 2/2] Add Copilot instructions files - generic instruction file: .github/copilot-instructions.md - instruction file specific to the CI workflows: .github/instructions/ci-workflows.instructions.md - instruction file specific to the documentation generation: .github/instructions/doc-changes.instructions.md - instruction file specific to the development Docker image maintenance: .github/instructions/docker-changes.instructions.md - instruction file specific to the maintenance of the Python code itself: .github/instructions/python-changes.instructions.md --- .github/copilot-instructions.md | 105 +++++++++ .../instructions/ci-workflows.instructions.md | 120 ++++++++++ .../instructions/doc-changes.instructions.md | 208 ++++++++++++++++++ .../docker-changes.instructions.md | 147 +++++++++++++ .../python-changes.instructions.md | 85 +++++++ 5 files changed, 665 insertions(+) create mode 100644 .github/copilot-instructions.md create mode 100644 .github/instructions/ci-workflows.instructions.md create mode 100644 .github/instructions/doc-changes.instructions.md create mode 100644 .github/instructions/docker-changes.instructions.md create mode 100644 .github/instructions/python-changes.instructions.md diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md new file mode 100644 index 00000000..1a70b6c8 --- /dev/null +++ b/.github/copilot-instructions.md @@ -0,0 +1,105 @@ +# Copilot Instructions for khiops-python + +Use this file as the shared repository guide. When you work in a path covered by a +scoped instruction file, apply both this document and the matching file in +`.github/instructions/`. 
+ +## Scoped Instruction Files + +- `.github/instructions/python-changes.instructions.md` — Python source and test + changes (`**/*.py`) +- `.github/instructions/docker-changes.instructions.md` — development Docker image + changes (`packaging/docker/khiopspydev/**`) +- `.github/instructions/doc-changes.instructions.md` — documentation source changes + (`doc/**`) +- `.github/instructions/ci-workflows.instructions.md` — GitHub Actions workflow + changes (`.github/workflows/**`) + +## Architecture + +Khiops Python is a Python interface to the **Khiops AutoML suite** for building +supervised models (classifiers, regressors, encoders) and unsupervised models +(coclusterings). It provides two ways to use Khiops from Python: + +- **`khiops.core`** — The low-level API that drives the Khiops binaries via + dictionary files (`.kdic`, `.kdicj`) and tabular data files. The code that implements this API must depend only on Python built-in modules. + - `core.api` — public functions such as `train_predictor` and + `train_recoder` + - `core.dictionary` — data classes for Khiops dictionary files (in the + `.kdic` and JSON `.kdicj` formats) + - `core.coclustering_results` — data classes for Khiops report files + (`.khj`, `.khcj`) + - `core.internals.runner` — backend abstraction for local, Docker, and other + execution modes, configurable with `get_runner()` and `set_runner()` + - `core.internals.filesystems` — filesystem abstraction for local, S3, and + GCS access + - `core.internals.task`, `core.internals.tasks/` — task definitions for + Khiops operations +- **`khiops.sklearn`** — scikit-learn-compatible estimators built on top of + `khiops.core`. The code that implements these estimators may depend on pandas and scikit-learn only. 
+ ``` + KhiopsEstimator(ABC, BaseEstimator) + ├── KhiopsCoclustering(ClusterMixin) + └── KhiopsSupervisedEstimator + ├── KhiopsPredictor + │ ├── KhiopsClassifier(ClassifierMixin) + │ └── KhiopsRegressor(RegressorMixin) + └── KhiopsEncoder(TransformerMixin) + ``` + - `sklearn.dataset` — normalizes DataFrames, file paths, and multi-table + dictionaries into Khiops-compatible datasets +- **`khiops.extras`** — Optional integrations such as the Docker runner +- **`khiops.samples`** — Sample scripts, also used to generate parts of the + documentation via `doc/convert-samples-hook` + +Keep changes inside these layer boundaries. + +## Shared Conventions + +### Python Style + +- Use **paragraph-oriented programming**: group code into short paragraphs with + a comment header describing the intent, separated by blank lines. Avoid + commenting every line. +- Format Python code with **Black** (88-character line length) and sort imports + with **isort** using the Black profile. Configuration is in `pyproject.toml`. +- Black does not wrap long literal strings. Wrap those manually and use + `pylint --disable=all --enable=line-too-long khiops/` to find violations. +- Address all pylint **errors** (code E). Other pylint warnings are lower + priority; do not let the linter dictate the code. +- Keep code and comments in English. +- `pylint: disable=invalid-name` is used in `khiops/sklearn/estimators.py` to + allow scikit-learn's `X` and `y` naming convention. Do not add that + suppression elsewhere. + +### Dependency Rules + +- `khiops.core` must only import Python built-in modules. +- `khiops.sklearn` may directly depend on pandas and scikit-learn only. +- Do not add new external dependencies without discussion. Minimize external + package dependencies to reduce installation problems. +- Development and documentation generation dependencies (e.g., `black`, + `isort`, `sphinx`, `wrapt`, `furo`) can be more permissive, but still avoid + unnecessary additions. 
+- Test dependencies are listed in `test-requirements.txt` (`coverage`, `wrapt`). + Package dependencies are extracted from `pyproject.toml` at CI time via + `scripts/extract_dependencies_from_pyproject_toml.py`. + +### Python Support Policy + +- CI tests run against Python 3.10–3.14. + +### Versioning + +The project uses `MAJOR.MINOR.PATCH.INCREMENT[-PRE_RELEASE]`, where +`MAJOR.MINOR.PATCH` tracks the compatible Khiops native version and `INCREMENT` +tracks the Python package's own evolution. + +For Pip and Conda packages, the dash before the pre-release atom is removed to +comply with +[Python version specifiers](https://packaging.python.org/en/latest/specifications/version-specifiers/#version-specifiers) +(e.g., `11.0.0.2a1` instead of `11.0.0.2-a.1`). + +## License + +BSD 3-Clause-Clear. See `LICENSE.md`. diff --git a/.github/instructions/ci-workflows.instructions.md b/.github/instructions/ci-workflows.instructions.md new file mode 100644 index 00000000..f4f8ff79 --- /dev/null +++ b/.github/instructions/ci-workflows.instructions.md @@ -0,0 +1,120 @@ +--- +applyTo: ".github/workflows/**" +--- + +# CI Workflow Changes + +Use these rules for files under `.github/workflows/`. Apply the shared guidance +from `.github/copilot-instructions.md` first, then this workflow-specific +guidance. + +## Workflow Overview + +This repository has seven GitHub Actions workflows in `.github/workflows/`. Most +workflows use concurrency groups to cancel in-progress runs when superseded, +except `release.yml` (no concurrency group) and `api-docs.yml` (which uses a +`pages` concurrency group that does not cancel in-progress runs). + +### `quick-checks.yml` + +Runs pre-commit hooks on every pull request and on `workflow_dispatch`. Uses +Python 3.11 on `ubuntu-latest`. 
The hooks (configured in +`.pre-commit-config.yaml`) are: Black, pylint, isort (with special no-sections +config for sample files), yamlfix, shellcheck, GitHub workflow/action schema +validation (`check-github-workflows`, `check-github-actions`), and a local +`samples-generation` hook that regenerates reST samples when +`khiops/samples/samples.py` or `khiops/samples/samples_sklearn.py` change. + +### `tests.yml` + +The main test suite. Triggers on PRs that touch `khiops/**/*.py`, +`tests/**/*.py`, `tests/resources/**` (excluding `tests/resources/**/*.md`), or +the workflow file itself. Also supports `workflow_dispatch`. + +Three job groups: + +- **`run`** (Linux matrix): Runs across Python 3.10–3.14 in custom Docker + containers (`ghcr.io/khiopsml/khiops-python/khiopspydev-ubuntu22.04`). Each + Python version uses a dedicated Conda environment with native Khiops. + Coverage is collected with `coverage` and reported as XML. Test results use + JUnit XML via `unittest-xml-reporting`. +- **`check-khiops-integration-on-linux`**: Runs integration tests on multiple + Linux containers (ubuntu22.04, rocky8, rocky9, debian13). Validates Khiops + status, runs samples, tests major-version mismatch detection with a + `py3_khiops10_conda` environment, and runs the integration test suite. +- **`check-khiops-integration-on-windows`**: Installs Khiops Desktop via NSIS + installer on Windows 2022 with Python 3.12. Runs integration tests and + samples outside a Python virtual environment, then installs khiops-python + inside a venv and validates the installation status. + +**Expensive tests** (remote file access with S3/GCS/Azure): Skipped by default +on feature branches. Enabled on `main`/`main-v10` branches or via the +`run-expensive-tests` workflow dispatch input. These require GCP Workload +Identity Federation, a local fake S3 server, and Azure storage credentials. + +**Environment variables**: `KHIOPS_SAMPLES_DIR` points to a checkout of +`khiopsml/khiops-samples`. 
`KHIOPS_PROC_NUMBER=4` forces MPI multi-process +execution. MPI oversubscribe flags are set for Open MPI 4.x and 5+. + +### `pip.yml` + +Builds an **sdist** package (no wheel) and tests it in Docker containers +(ubuntu22.04, rocky9, debian13). Triggers on: + +- Tag pushes (any tag) — automatically publishes to GitHub Releases +- PRs touching `pyproject.toml`, `LICENSE.md`, or the workflow file +- `workflow_dispatch` with optional `pypi-target` choice (`None`, `testpypi`, + `pypi`) + +Publishing to TestPyPI/PyPI uses OIDC Trusted Publishing and requires the +corresponding GitHub environment (`testpypi` or `pypi`). Only runs for the +`KhiopsML` org on tag pushes. + +### `release.yml` + +Manual workflow that merges `dev` into `main`, tags the merge commit with the +provided version, and resets `dev` to `main`. Only triggered via +`workflow_dispatch` with a `version` input. + +### `api-docs.yml` + +Builds Sphinx documentation inside a dev Docker container. Triggers on: + +- Tag pushes — builds docs and uploads a zip archive to GitHub Releases +- PRs touching `doc/**/*.rst`, `doc/create-doc`, `doc/clean-doc`, `doc/*.py`, + `khiops/**/*.py`, or the workflow file +- `workflow_dispatch` with optional tutorial and samples revision inputs + +Uses the `khiopspydev-ubuntu22.04` Docker image and runs +`./create-doc -t -d -g `. Uses a `pages` concurrency group that does +**not** cancel in-progress runs (to avoid interrupting production deployments). + +### `dev-docker.yml` + +Builds development Docker images for multiple OS targets (ubuntu22.04, rocky8, +rocky9, debian13) with configurable Khiops revision, server revision, Python +versions (3.10–3.14), and remote file driver versions (GCS, S3, Azure). +Triggers on PRs touching `packaging/docker/khiopspydev/Dockerfile.*` or the +workflow file, and on `workflow_dispatch`. Images are pushed to +`ghcr.io/khiopsml/khiops-python/khiopspydev-*` only when manually requested via +`push: true`. 
The `set-latest` flag only works on the `main` or `main-v10` +branches. + +### `test-conda-forge-package.yml` + +Manual-only workflow that tests the released `khiops` Conda package on the +`conda-forge` channel across a broad matrix: Python 3.10–3.14 × multiple OS +environments (Ubuntu 20.04/22.04/24.04, Rocky 8/9, Windows 2022/2025, macOS +14/15/15-Intel). Tests both normal Conda environments and "Conda-based +environments" (where `CONDA_PREFIX` is unset to simulate non-Conda invocation). + +## Editing Rules + +- Workflow YAML files are validated by pre-commit hooks + (`check-github-workflows`, `check-github-actions`) and formatted by `yamlfix`. +- The dev Docker images are the test environment for both `tests.yml` and + `pip.yml`. If you need new system dependencies in CI, they go into the + Dockerfiles under `packaging/docker/khiopspydev/`. +- Test dependencies are in `test-requirements.txt` (`coverage`, `wrapt`). + Package dependencies are extracted from `pyproject.toml` at CI time via + `scripts/extract_dependencies_from_pyproject_toml.py`. diff --git a/.github/instructions/doc-changes.instructions.md b/.github/instructions/doc-changes.instructions.md new file mode 100644 index 00000000..d448674c --- /dev/null +++ b/.github/instructions/doc-changes.instructions.md @@ -0,0 +1,208 @@ +--- +applyTo: "doc/**" +--- + +# Documentation Changes + +Use these rules for files under `doc/`. Apply the shared guidance from +`.github/copilot-instructions.md` first, then this documentation-specific +guidance. 
+ +## Folder Structure + +``` +doc/ +├── conf.py # Sphinx configuration +├── index.rst # Top-level doc page +├── create-doc # Full build script (tutorials + Sphinx) +├── clean-doc # Clean script (supports --clean-tutorial) +├── convert-samples-hook # Pre-commit hook: regenerates sample reST + notebooks +├── convert_samples.py # Converts samples.py / samples_sklearn.py to reST or .ipynb +├── convert_tutorials.py # Converts tutorial Jupyter notebooks to reST +├── requirements.txt # Python doc-build dependencies +├── multi_table_primer.rst # Multi-table learning guide +├── notes.rst # API notes (common params, input types, sampling) +├── core/index.rst # khiops.core API reference (autosummary) +├── sklearn/index.rst # khiops.sklearn API reference (autosummary) +├── internal/index.rst # Internal modules reference +├── tools/index.rst # khiops.tools reference +├── samples/ # Generated reST sample pages (via convert-samples-hook) +├── tutorials/ # Generated reST tutorials (via create-doc -t) +├── _static/ # CSS and images (branding, logo) +└── _templates/autosummary/ # Custom autosummary templates (class, function, method, module) +``` + +## Build and Validation + +```bash +cd doc + +# Install doc dependencies (do NOT create a virtualenv inside doc/ — Sphinx will process its .rst files) +pip install -U -r requirements.txt + +# Also requires: +# - A system-wide pandoc installation (used by nbconvert for notebook→reST conversion) +# - The 'black' Python package (used by convert_samples.py to format code snippets) + +# Regenerate reST samples and notebooks from samples.py / samples_sklearn.py. +# This hook also runs automatically via pre-commit when those files are modified. +./convert-samples-hook + +# Full build: download tutorials, convert notebooks to reST, run Sphinx +./create-doc -d -t + +# Incremental build (Sphinx only, after reST files are already generated): +sphinx-build -M html . 
_build/ + +# Clean generated docs (add --clean-tutorial to also remove tutorials/ and khiops-python-tutorial/) +./clean-doc +``` + +The `create-doc` script requires `tar`, `python`, `make`, `zip`, and `git` (if +downloading tutorials). Output goes to `doc/_build/html/`. + +The `create-doc` script accepts the following options: + +- `-d` — Download the khiops-python-tutorial repository (implies `-t`) +- `-t` — Transform tutorial Jupyter notebooks into reST +- `-r REPO_URL` — Set the tutorial repository URL +- `-g GIT_REF` — Set the tutorial repository Git reference (branch or tag) +- `-l DIR` — Set the local directory of the tutorial repository + +## CI Workflow + +The **API Docs** workflow (`.github/workflows/api-docs.yml`) triggers on: + +- **Tag pushes** — builds docs and uploads a zip archive to GitHub Releases as + a prerelease (with `allowUpdates: true`) +- **PRs** touching `doc/**.rst`, `doc/create-doc`, `doc/clean-doc`, `doc/*.py`, + `khiops/**.py`, or the workflow file itself +- **`workflow_dispatch`** with optional inputs: + - `khiops-python-tutorial-revision` (default: `11.0.0.0`) + - `khiops-samples-revision` (default: `11.0.0`) + - `image-tag` (default: `latest`) — the dev Docker image tag + +The workflow uses a concurrency group (`pages`) so only one deployment runs at a +time — queued runs are skipped but in-progress runs are never cancelled. + +**Build job** — runs inside the +`ghcr.io/khiopsml/khiops-python/khiopspydev-ubuntu22.04:` Docker +image: + +1. Installs the khiops-python package itself (`pip install .`) +2. Downloads sample datasets via `kh-download-datasets` +3. Installs doc Python requirements from `doc/requirements.txt` +4. Runs `./create-doc -t -d -g ` +5. Uploads the built HTML as an `api-docs` artifact + +**Release job** (tag pushes only) — downloads the artifact, zips it, and +uploads the zip to GitHub Releases. 
+ +## Sphinx Setup + +- **Engine**: Sphinx with the [Furo](https://pradyunsg.me/furo/) theme + (Orange-branded colors and Helvetica Neue font) +- **Docstring format**: [NumPy style](https://numpydoc.readthedocs.io/en/latest/format.html) parsed by the `numpydoc` extension + (`numpydoc_show_class_members = False`) +- **Extensions**: `autodoc`, `autosummary`, `intersphinx`, `numpydoc`, + `sphinx_copybutton` +- **Intersphinx targets**: Python, pandas, scikit-learn, NumPy, SciPy +- **Custom templates**: `_templates/autosummary/` provides templates for + `class.rst`, `function.rst`, `method.rst`, `module.rst` +- **Strict mode**: `nitpicky = True` — broken references are errors +- **Default role**: `obj` (configured as `default_role = "obj"` in `conf.py`) — + allows cross-referencing without explicit `:class:`/`:func:` qualifiers in + most cases +- **Warning suppression**: `conf.py` defines a `suppress_sklearn_warnings` + callback that silences known false-positive missing-reference warnings for + sklearn variables (`X`, `y`) and tutorial literals +- Sphinx warnings **should not be ignored** — they almost always indicate + rendering errors + +## Docstring Conventions + +### Parameters and Attributes (NumPy format) + +**Always put a space before the colon** or the rendering will break: + +``` +# Mandatory parameter +some_param : str + Description ending in a period. + +# Optional parameter +some_param : str, optional + Description ending in a period. + +# Optional with default +some_param : int, default 10 + Description ending in a period. +``` + +### Punctuation Rules + +- Docstring title: **no punctuation**. Put details in the long description. +- Parameter/attribute header: only a colon, no trailing period. +- Parameter/attribute description: **must end in a period**. + +```python +# Correct +def train(data): + """Trains a model + + Trains a supervised model on the provided dataset. 
+ """ +``` + +### Verbatim Markup + +Use for: Python constants (`True`, `None`), file names/extensions, parameter names. +Do **not** use for: string values (use double quotes), numeric values. + +### Container Types + +Keep concise — use `list of ` for simple cases. For complex containers, put +`list` or `dict` and describe contents in the description body. + +### Type Referencing + +Use type referencing (backtick cross-references) only for complex types and +Exceptions. Do not use it for built-in types like `str` or `int`. + +``` +# No — str and int link to the Python docs unnecessarily: +some_string : `str` +some_int : `int` + +# Yes — Khiops internal class: +dictionary : `.Dictionary` + +# Yes — Pandas project class (via intersphinx): +df : `pandas.DataFrame` + +# Yes — Exception: +Raises +------ +`ValueError` + When something wrong happens. +``` + +### Cross-References + +```rst +`~khiops.core.api.train_predictor` # shows "train_predictor" (short form) +`khiops.core.api.train_predictor` # shows full path +`.train_predictor` # wildcard — works if unambiguous +`train_predictor` # within the same module +``` + +Use explicit `:func:`, `:class:` domains only for complex types and Exceptions. +The `default_role = "obj"` setting handles most cases. + +## reStructuredText Pitfalls + +The docstrings use **reST, not Markdown**. Key differences: + +- **Lists** require an empty line before the first item +- **Code blocks** use `::` (with empty line + indentation) or `.. code-block:: python` +- **Links**: `` `Link text `_ `` instead of `[text](url)` diff --git a/.github/instructions/docker-changes.instructions.md b/.github/instructions/docker-changes.instructions.md new file mode 100644 index 00000000..da5625b6 --- /dev/null +++ b/.github/instructions/docker-changes.instructions.md @@ -0,0 +1,147 @@ +--- +applyTo: "packaging/docker/khiopspydev/**" +--- + +# Docker Packaging Changes + +Use these rules for files under `packaging/docker/khiopspydev/`. 
Apply the +shared guidance from `.github/copilot-instructions.md` first, then this +packaging-specific guidance. + +## Scope + +The Dockerfiles in `packaging/docker/khiopspydev/` build **development images** +used by the CI workflows `tests.yml`, `pip.yml`, and `api-docs.yml` to run +tests, build packages, and generate documentation. They are built and published +by the `dev-docker.yml` workflow. + +## Image Variants + +There are three Dockerfiles, one per OS family: + +| Dockerfile | OS targets | Package manager | Remote file drivers | +|---|---|---|---| +| `Dockerfile.ubuntu` | Ubuntu 22.04 | apt | System-wide `.deb` (GCS, S3, Azure) + fakeS3 | +| `Dockerfile.debian` | Debian 13 | apt | System-wide bookworm `.deb` (GCS, S3, Azure) + fakeS3 | +| `Dockerfile.rocky` | Rocky 8, Rocky 9 | dnf | None | + +All images are published to `ghcr.io/khiopsml/khiops-python/khiopspydev-`. + +### Debian and Ubuntu + +`Dockerfile.debian` and `Dockerfile.ubuntu` are nearly identical. They diverge +because Debian 13 remote file driver packages are not available, so Debian +forces the Debian 12 (bookworm) builds for the GCS, S3, and Azure driver `.deb` +packages. Any shared change should be applied to both files. There is an open +TODO to unify them (see the comment at the top of `Dockerfile.debian`). + +### Rocky + +`Dockerfile.rocky` uses `dnf` instead of `apt` and installs Python 3.11 +explicitly on Rocky 8/9 (which ship with Python ≤ 3.9). It does **not** install +system-wide remote file drivers or fakeS3 (no Ruby/gem support). It also does +not copy `run_fake_remote_file_servers.sh` (only `run_service.sh`). 
+ +## Build Arguments + +All Dockerfiles accept these `ARG` values, supplied by `dev-docker.yml`: + +| ARG | Description | Example | +|---|---|---| +| `KHIOPSDEV_OS` | OS tag for the base image (`ubuntu22.04`, `rocky8`, `rocky9`, `debian13`) | `ubuntu22.04` | +| `KHIOPS_REVISION` | Khiops native release tag to install | `11.0.0` | +| `SERVER_REVISION` | Git ref for the `khiops-server` image (copied into the final stage) | `main` | +| `PYTHON_VERSIONS` | Space-separated Python versions for Conda environments | `3.10 3.11 3.12 3.13 3.14` | +| `KHIOPS_GCS_DRIVER_REVISION` | GCS remote file driver version (Debian/Ubuntu only) | `0.0.16` | +| `KHIOPS_S3_DRIVER_REVISION` | S3 remote file driver version (Debian/Ubuntu only) | `0.0.15` | +| `KHIOPS_AZURE_DRIVER_REVISION` | Azure remote file driver version (Debian/Ubuntu only) | `0.0.6` | + +## Multi-Stage Build Structure + +Each Dockerfile uses a multi-stage build: + +1. **`khiopsdev`** — Based on `ghcr.io/khiopsml/khiops/khiopsdev-:latest`. + Installs dev tools (git, pip, pandoc, wget), the Khiops native binary, and + Miniforge for Conda. On Debian/Ubuntu, also installs system-wide remote file + drivers (GCS, S3, Azure `.deb` packages) and `ruby-dev` (needed for fakeS3). +2. **`server`** — Pulls the `khiops-server` binary from + `ghcr.io/khiopsml/khiops-server:`. +3. **`base`** (final) — Copies the server binary into the `khiopsdev` stage. On + Ubuntu/Debian, also installs fakeS3 via Ruby gem and exposes port 4569. + +## Conda Environments + +For each Python version in `PYTHON_VERSIONS`, a Conda environment is created: + +- **`py`** — Bare Python (for pip-based test installs). + +A special **`py3_khiops10_conda`** environment is always created with +`khiops-core==10.3.2` to test backward compatibility with Khiops major +version 10. + +## Helper Scripts + +| Script | Purpose | +|---|---| +| `run_service.sh` | Runs `/usr/bin/service` (the khiops-server binary) if present; otherwise exits. Copied into all images. 
| +| `run_fake_remote_file_servers.sh` | Launches fakeS3 in background on the port extracted from `$AWS_ENDPOINT_URL`. Serves pre-provisioned files from `tests/resources/remote-access`. Copied into Ubuntu/Debian images only. | + +## CI Workflow Integration + +### `dev-docker.yml` (Build and Push) + +- **Triggers**: PRs touching `Dockerfile.*` or the workflow file; + `workflow_dispatch` for manual builds. +- **Matrix**: `ubuntu22.04`, `rocky8`, `rocky9`, `debian13`. +- **Concurrency**: Per-workflow + per-PR/ref, with `cancel-in-progress: true`. +- **Push**: Only on manual dispatch with `push: true`. The `set-latest` tag is + restricted to the `main` or `main-v10` branches. +- **Image tags**: `.` (e.g., `11.0.0.0`), + optionally also `latest`. +- **Build context**: `./packaging/docker/khiopspydev/`. +- **Build args**: Passes `KHIOPS_REVISION`, `KHIOPSDEV_OS`, `SERVER_REVISION`, + `PYTHON_VERSIONS`, `KHIOPS_GCS_DRIVER_REVISION`, `KHIOPS_S3_DRIVER_REVISION`, + and `KHIOPS_AZURE_DRIVER_REVISION`. +- **Important**: The `add-hosts: s3-bucket.localhost:127.0.0.1` input is required + because buildx mounts `/etc/hosts` read-only, so the fakeS3 hostname cannot + be added inside the Dockerfile. + +### `tests.yml` (Consumer) + +Runs the main test matrix inside `khiopspydev-ubuntu22.04` containers across +Python 3.10–3.14. Integration tests also run on `rocky8`, `rocky9`, and +`debian13` containers. + +### `pip.yml` (Consumer) + +Builds and tests the source distribution package inside +`khiopspydev-ubuntu22.04`, `khiopspydev-rocky9`, and `khiopspydev-debian13` +containers. + +### `api-docs.yml` (Consumer) + +Builds the Sphinx documentation inside the `khiopspydev-ubuntu22.04` container. + +## Editing Rules + +- **Apply shared changes to all relevant Dockerfiles.** Ubuntu and Debian + Dockerfiles are near-duplicates; always update both. +- **Bump Miniforge version** by updating the download URL, filename, and SHA-256 + checksum in all three Dockerfiles. 
+- **Adding Python versions**: Update `DEFAULT_PYTHON_VERSIONS` in + `dev-docker.yml` and add the new version to the `matrix.python-version` lists + in consumer workflows (`tests.yml`, etc.). +- **Adding system dependencies**: Install them in the appropriate `RUN` block + (apt for Debian/Ubuntu, dnf for Rocky). +- **Remote file drivers**: Version bumps go in the + `DEFAULT_KHIOPS_GCS_DRIVER_REVISION` / `DEFAULT_KHIOPS_S3_DRIVER_REVISION` / + `DEFAULT_KHIOPS_AZURE_DRIVER_REVISION` env vars in `dev-docker.yml`. Note: + there is a known workaround in the Ubuntu and Debian Dockerfiles for a release + tag typo in the Azure driver repository (the download URL hard-codes tag + `0.0.7` regardless of the revision ARG — see the `XXX` comment). +- **fakeS3**: Pinned to version `1.2.1` because `>= 1.3` requires a license key. + If fakeS3 becomes incompatible, consider alternatives (the Dockerfile comments + mention `s3rver` as a candidate). +- After modifying Dockerfiles, **images are rebuilt and pushed** via a manual + `dev-docker.yml` run with `push: true` before merging, so that consumer + workflows pick up the changes. diff --git a/.github/instructions/python-changes.instructions.md b/.github/instructions/python-changes.instructions.md new file mode 100644 index 00000000..6845d824 --- /dev/null +++ b/.github/instructions/python-changes.instructions.md @@ -0,0 +1,85 @@ +--- +applyTo: "**/*.py" +--- + +# Python Code Changes + +Use these rules for any Python source or test change. Apply the shared guidance +from `.github/copilot-instructions.md` first, then these Python-specific rules. + +## Architecture and Dependencies + +- `khiops.core` is the low-level API that drives the Khiops binary and must only + depend on Python built-in modules. +- `khiops.sklearn` builds on top of `khiops.core` and may directly depend on + pandas and scikit-learn only. +- Keep changes inside these layer boundaries and do not introduce new external + dependencies without asking. 
+- Extra optional dependencies are required for the remote-access filesystem + API in `core.internals.filesystems`, namely to access S3, GCS, and Azure + remote storage. +- Test dependencies are listed in `test-requirements.txt` (`coverage`, `wrapt`). + Package dependencies are declared in `pyproject.toml`. + +## Style and Formatting + +- Format with **Black** (88-character line length) and sort imports with **isort** + (Black profile). Configuration is in `pyproject.toml`. +- **isort exception**: `khiops/samples/samples.py` and + `khiops/samples/samples_sklearn.py` are handled by a separate `isort-samples` + pre-commit hook with `--no-sections` (no import section grouping). +- Wrap long literal strings to stay within the 88-character limit. +- Pylint **hard failures** (`fail-on` in `pyproject.toml`): all errors (code E), + `line-too-long`, `unused-variable`, and `unused-import`. The overall pylint + score must stay at or above `9.9` (`fail-under`). Other pylint warnings are + lower priority. +- Pylint is **not run** on `doc/convert_samples.py` and `doc/conf.py` (excluded + in `.pre-commit-config.yaml`). +- All code and comments must be in English. +- `pylint: disable=invalid-name` is used in `khiops/sklearn/estimators.py` to + permit scikit-learn's `X`, `y` naming convention. Do not add this suppression + elsewhere. + +## Paragraph-Oriented Programming + +Structure code as **paragraphs**: each paragraph is a comment header describing the intent, +followed by the code body, and paragraphs are separated by blank lines. 
+ +```python +def value_count(values): + """Prints the counts of each unique value in an array""" + + # Initialize the counts dictionary + counts = {} + + # Count the unique occurrences in values + for value in values: + if value in counts: + counts[value] += 1 + else: + counts[value] = 1 + + # Print the counts + for value, count in counts.items(): + print(f"{value}: {count}") +``` + +Exceptions where a paragraph header is not needed: +- Return statements +- Loop variable assignments (e.g., in `while` loops) +- Very short and obvious methods where the docstring suffices + +Keep the number of paragraphs minimal. Commenting every line technically +conforms but defeats the purpose. + +## Pre-Commit Hooks + +The `.pre-commit-config.yaml` runs the following hooks on Python files: + +| Hook | Scope | Notes | +|---|---|---| +| **black** | All `.py` files | Code formatting | +| **pylint** | All `.py` except `doc/convert_samples.py`, `doc/conf.py` | Linting | +| **isort** | All `.py` except samples scripts | Import sorting (Black profile) | +| **isort-samples** | `khiops/samples/samples.py`, `samples_sklearn.py` | Import sorting with `--no-sections` | +| **samples-generation** | Triggered by changes to samples scripts | Runs `doc/convert-samples-hook` to regenerate reST pages and notebooks |