577 add comprehensive instructions for khiops python development #580
Open
popescu-v wants to merge 2 commits into main from 577-add-comprehensive-instructions-for-khiops-python-development
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,105 @@ | ||
| # Copilot Instructions for khiops-python | ||
|
|
||
| Use this file as the shared repository guide. When you work in a path covered by a | ||
| scoped instruction file, apply both this document and the matching file in | ||
| `.github/instructions/`. | ||
|
|
||
| ## Scoped Instruction Files | ||
|
|
||
| - `.github/instructions/python-changes.instructions.md` — Python source and test | ||
| changes (`**/*.py`) | ||
| - `.github/instructions/docker-changes.instructions.md` — development Docker image | ||
| changes (`packaging/docker/khiopspydev/**`) | ||
| - `.github/instructions/doc-changes.instructions.md` — documentation source changes | ||
| (`doc/**`) | ||
| - `.github/instructions/ci-workflows.instructions.md` — GitHub Actions workflow | ||
| changes (`.github/workflows/**`) | ||
|
|
||
| ## Architecture | ||
|
|
||
| Khiops Python is a Python interface to the **Khiops AutoML suite** for building | ||
| supervised models (classifiers, regressors, encoders) and unsupervised models | ||
| (coclusterings). It provides two ways to use Khiops from Python: | ||
|
|
||
| - **`khiops.core`** — The low-level API that drives the Khiops binaries via | ||
| dictionary files (`.kdic`, `.kdicj`) and tabular data files. The code which implements this API must depend only on Python built-in modules. | ||
| - `core.api` — public functions such as `train_predictor` and | ||
| `train_recoder` | ||
| - `core.dictionary` — data classes for Khiops dictionary files (in the | ||
| `.kdic` and JSON `.kdicj` formats) | ||
| - `core.coclustering_results` — data classes for Khiops report files | ||
| (`.khj`, `.khcj`) | ||
| - `core.internals.runner` — backend abstraction for local, Docker, and other | ||
| execution modes, configurable with `get_runner()` and `set_runner()` | ||
| - `core.internals.filesystems` — filesystem abstraction for local, S3, and | ||
| GCS access | ||
| - `core.internals.task`, `core.internals.tasks/` — task definitions for | ||
| Khiops operations | ||
| - **`khiops.sklearn`** — Scikit-learn-compatible estimators built on top of | ||
| `khiops.core`. The code which implements these estimators may depend on Pandas and Scikit-learn only. | ||
| ``` | ||
| KhiopsEstimator(ABC, BaseEstimator) | ||
| ├── KhiopsCoclustering(ClusterMixin) | ||
| └── KhiopsSupervisedEstimator | ||
| ├── KhiopsPredictor | ||
| │ ├── KhiopsClassifier(ClassifierMixin) | ||
| │ └── KhiopsRegressor(RegressorMixin) | ||
| └── KhiopsEncoder(TransformerMixin) | ||
| ``` | ||
| - `sklearn.dataset` — normalizes DataFrames, file paths, and multi-table | ||
| dictionaries into Khiops-compatible datasets | ||
| - **`khiops.extras`** — Optional integrations such as the Docker runner | ||
| - **`khiops.samples`** — Sample scripts, also used to generate parts of the | ||
| documentation via `doc/convert-samples-hook` | ||
|
|
||
| Keep changes inside these layer boundaries. | ||
|
|
||
| ## Shared Conventions | ||
|
|
||
| ### Python Style | ||
|
|
||
| - Use **paragraph-oriented programming**: group code into short paragraphs with | ||
| a comment header describing the intent, separated by blank lines. Avoid | ||
| commenting every line. | ||
| - Format Python code with **Black** (88-character line length) and sort imports | ||
| with **isort** using the Black profile. Configuration is in `pyproject.toml`. | ||
| - Black does not wrap long literal strings. Wrap those manually and use | ||
| `pylint --disable=all --enable=line-too-long khiops/` to find violations. | ||
| - Address all pylint **errors** (code E). Other pylint warnings are lower | ||
| priority; do not follow the linter slavishly. | ||
| - Keep code and comments in English. | ||
| - `pylint: disable=invalid-name` is used in `khiops/sklearn/estimators.py` to | ||
| allow scikit-learn's `X` and `y` naming convention. Do not add that | ||
| suppression elsewhere. | ||
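
Illustration only, not part of the proposed file: a minimal sketch of what the paragraph-oriented style described above could look like. The function `summarize_column` and its behavior are hypothetical and do not come from the khiops-python code base.

```python
import csv
import io


def summarize_column(csv_text, column):
    """Return (count, mean) of a numeric column in CSV text (hypothetical)."""
    # Parse the CSV text into a list of row dictionaries
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)

    # Extract the numeric values, skipping empty cells
    values = [float(row[column]) for row in rows if row[column] != ""]

    # Compute the summary; an empty column has no mean
    count = len(values)
    mean = sum(values) / count if count else None

    return count, mean
```

Each paragraph has one comment header stating its intent, and blank lines separate the paragraphs, so the individual lines need no further comments.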
|
|
||
| ### Dependency Rules | ||
|
|
||
| - `khiops.core` must only import Python built-in modules. | ||
| - `khiops.sklearn` may directly depend on Pandas and Scikit-learn only. | ||
| - Do not add new external dependencies without discussion. Minimize external | ||
| package dependencies to reduce installation problems. | ||
| - Development and documentation generation dependencies (e.g., `black`, | ||
| `isort`, `sphinx`, `wrapt`, `furo`) can be more permissive, but still avoid | ||
| unnecessary additions. | ||
| - Test dependencies are listed in `test-requirements.txt` (`coverage`, `wrapt`). | ||
| Package dependencies are extracted from `pyproject.toml` at CI time via | ||
| `scripts/extract_dependencies_from_pyproject_toml.py`. | ||
|
|
||
| ### Python Support Policy | ||
|
|
||
| - CI tests run against Python 3.10–3.14. | ||
|
|
||
| ### Versioning | ||
|
|
||
| The project uses `MAJOR.MINOR.PATCH.INCREMENT[-PRE_RELEASE]`, where | ||
| `MAJOR.MINOR.PATCH` tracks the compatible Khiops native version and `INCREMENT` | ||
| tracks the Python package's own evolution. | ||
|
|
||
| For Pip and Conda packages, the dash before the pre-release atom is removed to | ||
| comply with | ||
| [Python version specifiers](https://packaging.python.org/en/latest/specifications/version-specifiers/#version-specifiers) | ||
| (e.g., `11.0.0.2a1` instead of `11.0.0.2-a.1`). | ||
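
Illustration only, not part of the proposed file: a small sketch of the version conversion described above. Only the `11.0.0.2-a.1` to `11.0.0.2a1` example comes from the text. The function name is hypothetical, and treating every other pre-release form the same way is an assumption.

```python
def to_python_package_version(version):
    """Convert MAJOR.MINOR.PATCH.INCREMENT[-PRE_RELEASE] to a Pip/Conda form."""
    # Split off the optional pre-release part after the first dash
    base, _, pre_release = version.partition("-")

    # Drop the dash and the dot inside the pre-release atom ("a.1" -> "a1")
    return base + pre_release.replace(".", "")
```

A version without a pre-release part is returned unchanged.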
|
|
||
| ## License | ||
|
|
||
| BSD 3-Clause-Clear. See `LICENSE.md`. | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,120 @@ | ||
| --- | ||
| applyTo: ".github/workflows/**" | ||
| --- | ||
|
|
||
| # CI Workflow Changes | ||
|
|
||
| Use these rules for files under `.github/workflows/`. Apply the shared guidance | ||
| from `.github/copilot-instructions.md` first, then this workflow-specific | ||
| guidance. | ||
|
|
||
| ## Workflow Overview | ||
|
|
||
| This repository has seven GitHub Actions workflows in `.github/workflows/`. Most | ||
| workflows use concurrency groups to cancel in-progress runs when superseded, | ||
| except `release.yml` (no concurrency group) and `api-docs.yml` (which uses a | ||
| `pages` concurrency group that does not cancel in-progress runs). | ||
|
|
||
| ### `quick-checks.yml` | ||
|
|
||
| Runs pre-commit hooks on every pull request and on `workflow_dispatch`. Uses | ||
| Python 3.11 on `ubuntu-latest`. The hooks (configured in | ||
| `.pre-commit-config.yaml`) are: Black, pylint, isort (with special no-sections | ||
| config for sample files), yamlfix, shellcheck, GitHub workflow/action schema | ||
| validation (`check-github-workflows`, `check-github-actions`), and a local | ||
| `samples-generation` hook that regenerates reST samples when | ||
| `khiops/samples/samples.py` or `khiops/samples/samples_sklearn.py` change. | ||
|
|
||
| ### `tests.yml` | ||
|
|
||
| The main test suite. Triggers on PRs that touch `khiops/**/*.py`, | ||
| `tests/**/*.py`, `tests/resources/**` (excluding `tests/resources/**/*.md`), or | ||
| the workflow file itself. Also supports `workflow_dispatch`. | ||
|
|
||
| Three job groups: | ||
|
|
||
| - **`run`** (Linux matrix): Runs across Python 3.10–3.14 in custom Docker | ||
| containers (`ghcr.io/khiopsml/khiops-python/khiopspydev-ubuntu22.04`). Each | ||
| Python version uses a dedicated Conda environment with native Khiops. | ||
| Coverage is collected with `coverage` and reported as XML. Test results use | ||
| JUnit XML via `unittest-xml-reporting`. | ||
| - **`check-khiops-integration-on-linux`**: Runs integration tests on multiple | ||
| Linux containers (ubuntu22.04, rocky8, rocky9, debian13). Validates Khiops | ||
| status, runs samples, tests major-version mismatch detection with a | ||
| `py3_khiops10_conda` environment, and runs the integration test suite. | ||
| - **`check-khiops-integration-on-windows`**: Installs Khiops Desktop via NSIS | ||
| installer on Windows 2022 with Python 3.12. Runs integration tests and | ||
| samples outside a Python virtual environment, then installs khiops-python | ||
| inside a venv and validates the installation status. | ||
|
|
||
| **Expensive tests** (remote file access with S3/GCS/Azure): Skipped by default | ||
| on feature branches. Enabled on `main`/`main-v10` branches or via the | ||
| `run-expensive-tests` workflow dispatch input. These require GCP Workload | ||
| Identity Federation, a local fake S3 server, and Azure storage credentials. | ||
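
Illustration only, not part of the proposed file: the gating rule for expensive tests described above, written as a Python predicate. The actual condition lives in the workflow YAML, which is not shown here, so the function and constant names below are hypothetical.

```python
# Branches on which expensive tests run by default, as stated in the text above
EXPENSIVE_TEST_BRANCHES = {"main", "main-v10"}


def should_run_expensive_tests(branch, run_expensive_tests_input=False):
    """Return True when the remote file access tests should run."""
    # Run on the protected branches, or when explicitly requested on dispatch
    return branch in EXPENSIVE_TEST_BRANCHES or bool(run_expensive_tests_input)
```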
|
|
||
| **Environment variables**: `KHIOPS_SAMPLES_DIR` points to a checkout of | ||
| `khiopsml/khiops-samples`. `KHIOPS_PROC_NUMBER=4` forces MPI multi-process | ||
| execution. MPI oversubscribe flags are set for Open MPI 4.x and 5+. | ||
|
|
||
| ### `pip.yml` | ||
|
|
||
| Builds an **sdist** package (no wheel) and tests it in Docker containers | ||
| (ubuntu22.04, rocky9, debian13). Triggers on: | ||
|
|
||
| - Tag pushes (any tag) — automatically publishes to GitHub Releases | ||
| - PRs touching `pyproject.toml`, `LICENSE.md`, or the workflow file | ||
| - `workflow_dispatch` with optional `pypi-target` choice (`None`, `testpypi`, | ||
| `pypi`) | ||
|
|
||
| Publishing to TestPyPI/PyPI uses OIDC Trusted Publishing and requires the | ||
| corresponding GitHub environment (`testpypi` or `pypi`). Only runs for the | ||
| `KhiopsML` org on tag pushes. | ||
|
|
||
| ### `release.yml` | ||
|
|
||
| Manual workflow that merges `dev` into `main`, tags the merge commit with the | ||
| provided version, and resets `dev` to `main`. Only triggered via | ||
| `workflow_dispatch` with a `version` input. | ||
|
|
||
| ### `api-docs.yml` | ||
|
|
||
| Builds Sphinx documentation inside a dev Docker container. Triggers on: | ||
|
|
||
| - Tag pushes — builds docs and uploads a zip archive to GitHub Releases | ||
| - PRs touching `doc/**/*.rst`, `doc/create-doc`, `doc/clean-doc`, `doc/*.py`, | ||
| `khiops/**/*.py`, or the workflow file | ||
| - `workflow_dispatch` with optional tutorial and samples revision inputs | ||
|
|
||
| Uses the `khiopspydev-ubuntu22.04` Docker image and runs | ||
| `./create-doc -t -d -g <revision>`. Uses a `pages` concurrency group that does | ||
| **not** cancel in-progress runs (to avoid interrupting production deployments). | ||
|
|
||
| ### `dev-docker.yml` | ||
|
|
||
| Builds development Docker images for multiple OS targets (ubuntu22.04, rocky8, | ||
| rocky9, debian13) with configurable Khiops revision, server revision, Python | ||
| versions (3.10–3.14), and remote file driver versions (GCS, S3, Azure). | ||
| Triggers on PRs touching `packaging/docker/khiopspydev/Dockerfile.*` or the | ||
| workflow file, and on `workflow_dispatch`. Images are pushed to | ||
| `ghcr.io/khiopsml/khiops-python/khiopspydev-*` only when manually requested via | ||
| `push: true`. The `set-latest` flag only works on the `main` or `main-v10` | ||
| branches. | ||
|
|
||
| ### `test-conda-forge-package.yml` | ||
|
|
||
| Manual-only workflow that tests the released `khiops` Conda package on the | ||
| `conda-forge` channel across a broad matrix: Python 3.10–3.14 × multiple OS | ||
| environments (Ubuntu 20.04/22.04/24.04, Rocky 8/9, Windows 2022/2025, macOS | ||
| 14/15/15-Intel). Tests both normal Conda environments and "Conda-based | ||
| environments" (where `CONDA_PREFIX` is unset to simulate non-Conda invocation). | ||
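
Illustration only, not part of the proposed file: a sketch of how a "Conda-based environment" could be simulated from Python by removing `CONDA_PREFIX` from a copy of the environment. How the workflow itself unsets the variable is not shown here, and the helper name is hypothetical.

```python
import os


def environment_without_conda_prefix(environ=None):
    """Return a copy of the environment with CONDA_PREFIX removed."""
    # Copy so that the caller's environment is left untouched
    env = dict(os.environ if environ is None else environ)

    # Remove the variable if present; a missing variable is not an error
    env.pop("CONDA_PREFIX", None)

    return env
```

The returned mapping can be passed as the `env` argument of `subprocess.run` to start a child process that does not see `CONDA_PREFIX`.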
|
|
||
| ## Editing Rules | ||
|
|
||
| - Workflow YAML files are validated by pre-commit hooks | ||
| (`check-github-workflows`, `check-github-actions`) and formatted by `yamlfix`. | ||
| - The dev Docker images are the test environment for both `tests.yml` and | ||
| `pip.yml`. If you need new system dependencies in CI, they go into the | ||
| Dockerfiles under `packaging/docker/khiopspydev/`. | ||
| - Test dependencies are in `test-requirements.txt` (`coverage`, `wrapt`). | ||
| Package dependencies are extracted from `pyproject.toml` at CI time via | ||
| `scripts/extract_dependencies_from_pyproject_toml.py`. |
We could add Azure too
Yes, will update this.