105 changes: 105 additions & 0 deletions .github/copilot-instructions.md
@@ -0,0 +1,105 @@
# Copilot Instructions for khiops-python

Use this file as the shared repository guide. When you work in a path covered by a
scoped instruction file, apply both this document and the matching file in
`.github/instructions/`.

## Scoped Instruction Files

- `.github/instructions/python-changes.instructions.md` — Python source and test
changes (`**/*.py`)
- `.github/instructions/docker-changes.instructions.md` — development Docker image
changes (`packaging/docker/khiopspydev/**`)
- `.github/instructions/doc-changes.instructions.md` — documentation source changes
(`doc/**`)
- `.github/instructions/ci-workflows.instructions.md` — GitHub Actions workflow
changes (`.github/workflows/**`)

## Architecture

Khiops Python is a Python interface to the **Khiops AutoML suite** for building
supervised models (classifiers, regressors, encoders) and unsupervised models
(coclusterings). It provides two ways to use Khiops from Python:

- **`khiops.core`** — The low-level API that drives the Khiops binaries via
dictionary files (`.kdic`, `.kdicj`) and tabular data files. The code that
implements this API must depend only on Python built-in modules.
- `core.api` — public functions such as `train_predictor` and
`train_recoder`
- `core.dictionary` — data classes for Khiops dictionary files (in the
`.kdic` and JSON `.kdicj` formats)
- `core.coclustering_results` — data classes for Khiops report files
(`.khj`, `.khcj`)
- `core.internals.runner` — backend abstraction for local, Docker, and other
execution modes, configurable with `get_runner()` and `set_runner()`
- `core.internals.filesystems` — filesystem abstraction for local, S3, and
Review comment (Collaborator): We could add Azure too

Reply (Collaborator Author): Yes, will update this.

GCS access
- `core.internals.task`, `core.internals.tasks/` — task definitions for
Khiops operations
- **`khiops.sklearn`** — Scikit-Learn compatible estimators built on top of
`khiops.core`. The code that implements these estimators may depend only on
Pandas and Scikit-learn.
```
KhiopsEstimator(ABC, BaseEstimator)
├── KhiopsCoclustering(ClusterMixin)
└── KhiopsSupervisedEstimator
    ├── KhiopsPredictor
    │   ├── KhiopsClassifier(ClassifierMixin)
    │   └── KhiopsRegressor(RegressorMixin)
    └── KhiopsEncoder(TransformerMixin)
```
- `sklearn.dataset` — normalizes DataFrames, file paths, and multi-table
dictionaries into Khiops-compatible datasets
- **`khiops.extras`** — Optional integrations such as the Docker runner
- **`khiops.samples`** — Sample scripts, also used to generate parts of the
documentation via `doc/convert-samples-hook`

Keep changes inside these layer boundaries.

## Shared Conventions

### Python Style

- Use **paragraph-oriented programming**: group code into short paragraphs with
a comment header describing the intent, separated by blank lines. Avoid
commenting every line.
- Format Python code with **Black** (88-character line length) and sort imports
with **isort** using the Black profile. Configuration is in `pyproject.toml`.
- Black does not wrap long literal strings. Wrap those manually and use
`pylint --disable=all --enable=line-too-long khiops/` to find violations.
- Address all pylint **errors** (code E). Other pylint messages are lower
priority: use judgment rather than following the linter blindly.
- Keep code and comments in English.
- `pylint: disable=invalid-name` is used in `khiops/sklearn/estimators.py` to
allow scikit-learn's `X` and `y` naming convention. Do not add that
suppression elsewhere.
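
The paragraph-oriented style described above can be illustrated with a minimal sketch. The function `summarize_lengths` is hypothetical and is not part of khiops-python. It only shows the layout: short blocks, each introduced by one intent comment and separated by blank lines.

```python
def summarize_lengths(lines):
    """Return the count, shortest and longest stripped length of non-empty lines"""
    # Keep only the non-empty lines, ignoring surrounding whitespace
    stripped_lines = [line.strip() for line in lines]
    non_empty_lines = [line for line in stripped_lines if line]

    # Handle the degenerate case of an input with no content
    if not non_empty_lines:
        return {"count": 0, "shortest": 0, "longest": 0}

    # Compute the summary statistics
    lengths = [len(line) for line in non_empty_lines]
    return {"count": len(lengths), "shortest": min(lengths), "longest": max(lengths)}
```

Each comment states what the paragraph is for rather than restating individual statements, which is the distinction the style rule draws.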

### Dependency Rules

- `khiops.core` must only import Python built-in modules.
- `khiops.sklearn` may directly depend on Pandas and Scikit-learn only.
- Do not add new external dependencies without discussion. Minimize external
package dependencies to reduce installation problems.
- The rules for development and documentation generation dependencies (e.g.,
`black`, `isort`, `sphinx`, `wrapt`, `furo`) are more permissive, but still
avoid unnecessary additions.
- Test dependencies are listed in `test-requirements.txt` (`coverage`, `wrapt`).
Package dependencies are extracted from `pyproject.toml` at CI time via
`scripts/extract_dependencies_from_pyproject_toml.py`.

### Python Support Policy

- CI tests run against Python 3.10–3.14.

### Versioning

The project uses `MAJOR.MINOR.PATCH.INCREMENT[-PRE_RELEASE]`, where
`MAJOR.MINOR.PATCH` tracks the compatible Khiops native version and `INCREMENT`
tracks the Python package's own evolution.

For Pip and Conda packages, the dash before the pre-release atom is removed to
comply with
[Python version specifiers](https://packaging.python.org/en/latest/specifications/version-specifiers/#version-specifiers)
(e.g., `11.0.0.2a1` instead of `11.0.0.2-a.1`).
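
The conversion above can be sketched in a few lines. The helper `to_package_version` is hypothetical, and the accepted pre-release shape (lowercase letters, a dot, then digits) is an assumption generalized from the single `a.1` example. Note that the example drops the dot inside the pre-release atom as well as the dash, so the sketch does the same.

```python
import re

# Assumed layout: MAJOR.MINOR.PATCH.INCREMENT, optionally followed by "-" and a
# pre-release atom shaped like "a.1" (the only shape shown in this guide)
_VERSION_PATTERN = re.compile(r"^(\d+\.\d+\.\d+\.\d+)(?:-([a-z]+)\.(\d+))?$")


def to_package_version(project_version):
    """Convert a project version to the dash-free form used for Pip and Conda"""
    # Reject anything that does not follow the assumed layout
    match = _VERSION_PATTERN.match(project_version)
    if match is None:
        raise ValueError(f"Unexpected version format: {project_version!r}")

    # Append the pre-release atom without its dash and dot, if present
    release, pre_release_label, pre_release_number = match.groups()
    if pre_release_label is None:
        return release
    return f"{release}{pre_release_label}{pre_release_number}"
```

With this sketch, `to_package_version("11.0.0.2-a.1")` returns `"11.0.0.2a1"`, and a version without a pre-release atom is returned unchanged.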

## License

BSD 3-Clause Clear License. See `LICENSE.md`.
120 changes: 120 additions & 0 deletions .github/instructions/ci-workflows.instructions.md
@@ -0,0 +1,120 @@
---
applyTo: ".github/workflows/**"
---

# CI Workflow Changes

Use these rules for files under `.github/workflows/`. Apply the shared guidance
from `.github/copilot-instructions.md` first, then this workflow-specific
guidance.

## Workflow Overview

This repository has seven GitHub Actions workflows in `.github/workflows/`. Most
workflows use concurrency groups to cancel in-progress runs when superseded,
except `release.yml` (no concurrency group) and `api-docs.yml` (which uses a
`pages` concurrency group that does not cancel in-progress runs).

### `quick-checks.yml`

Runs pre-commit hooks on every pull request and on `workflow_dispatch`. Uses
Python 3.11 on `ubuntu-latest`. The hooks (configured in
`.pre-commit-config.yaml`) are: Black, pylint, isort (with special no-sections
config for sample files), yamlfix, shellcheck, GitHub workflow/action schema
validation (`check-github-workflows`, `check-github-actions`), and a local
`samples-generation` hook that regenerates reST samples when
`khiops/samples/samples.py` or `khiops/samples/samples_sklearn.py` change.

### `tests.yml`

The main test suite. Triggers on PRs that touch `khiops/**/*.py`,
`tests/**/*.py`, `tests/resources/**` (excluding `tests/resources/**/*.md`), or
the workflow file itself. Also supports `workflow_dispatch`.
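
To reason about which changes trigger this workflow, the path filters can be approximated in Python. This is a simplified sketch under stated assumptions: the pattern lists are transcribed from the description above rather than read from the workflow file, the helper names are hypothetical, and the glob translation covers only `*`, `**`, and `**/`. GitHub evaluates negated patterns in order, which this sketch reduces to a plain include and exclude check.

```python
import re

# Assumption: these lists mirror the description above, not the workflow file
INCLUDED = [
    "khiops/**/*.py",
    "tests/**/*.py",
    "tests/resources/**",
    ".github/workflows/tests.yml",
]
EXCLUDED = ["tests/resources/**/*.md"]


def glob_to_regex(pattern):
    """Translate a path filter glob into a compiled regular expression"""
    # Translate the pattern token by token, longest token first
    regex_parts = []
    position = 0
    while position < len(pattern):
        if pattern.startswith("**/", position):
            regex_parts.append("(?:.*/)?")
            position += 3
        elif pattern.startswith("**", position):
            regex_parts.append(".*")
            position += 2
        elif pattern[position] == "*":
            regex_parts.append("[^/]*")
            position += 1
        else:
            regex_parts.append(re.escape(pattern[position]))
            position += 1

    return re.compile("".join(regex_parts) + r"\Z")


def triggers_tests(changed_paths):
    """Return True if at least one changed path passes the filters"""
    # Compile the filters once
    included = [glob_to_regex(pattern) for pattern in INCLUDED]
    excluded = [glob_to_regex(pattern) for pattern in EXCLUDED]

    # A path counts when it is included and not excluded
    for path in changed_paths:
        is_included = any(regex.match(path) for regex in included)
        is_excluded = any(regex.match(path) for regex in excluded)
        if is_included and not is_excluded:
            return True
    return False
```

Under these assumptions, a pull request that only edits Markdown files below `tests/resources/` does not trigger the suite, while any other file there does.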

Three job groups:

- **`run`** (Linux matrix): Runs across Python 3.10–3.14 in custom Docker
containers (`ghcr.io/khiopsml/khiops-python/khiopspydev-ubuntu22.04`). Each
Python version uses a dedicated Conda environment with native Khiops.
Coverage is collected with `coverage` and reported as XML. Test results use
JUnit XML via `unittest-xml-reporting`.
- **`check-khiops-integration-on-linux`**: Runs integration tests on multiple
Linux containers (ubuntu22.04, rocky8, rocky9, debian13). Validates Khiops
status, runs samples, tests major-version mismatch detection with a
`py3_khiops10_conda` environment, and runs the integration test suite.
- **`check-khiops-integration-on-windows`**: Installs Khiops Desktop via NSIS
installer on Windows 2022 with Python 3.12. Runs integration tests and
samples outside a Python virtual environment, then installs khiops-python
inside a venv and validates the installation status.

**Expensive tests** (remote file access with S3/GCS/Azure): Skipped by default
on feature branches. Enabled on `main`/`main-v10` branches or via the
`run-expensive-tests` workflow dispatch input. These require GCP Workload
Identity Federation, a local fake S3 server, and Azure storage credentials.
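
The gating rule for expensive tests can be summarized as a small predicate. This is only a sketch of the rule as described above: the function and constant names are hypothetical, and the real condition is expressed in the workflow file.

```python
# Branches on which the expensive tests are described as enabled
ALWAYS_EXPENSIVE_BRANCHES = {"main", "main-v10"}


def should_run_expensive_tests(branch, run_expensive_tests_input=False):
    """Return True when the remote file access tests should run"""
    # An explicit request through the dispatch input always wins
    if run_expensive_tests_input:
        return True

    # Otherwise only the long-lived branches run them
    return branch in ALWAYS_EXPENSIVE_BRANCHES
```

A feature branch therefore runs the expensive tests only when the dispatch input is set.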

**Environment variables**: `KHIOPS_SAMPLES_DIR` points to a checkout of
`khiopsml/khiops-samples`. `KHIOPS_PROC_NUMBER=4` forces MPI multi-process
execution. MPI oversubscribe flags are set for Open MPI 4.x and 5+.

### `pip.yml`

Builds an **sdist** package (no wheel) and tests it in Docker containers
(ubuntu22.04, rocky9, debian13). Triggers on:

- Tag pushes (any tag) — automatically publishes to GitHub Releases
- PRs touching `pyproject.toml`, `LICENSE.md`, or the workflow file
- `workflow_dispatch` with optional `pypi-target` choice (`None`, `testpypi`,
`pypi`)

Publishing to TestPyPI/PyPI uses OIDC Trusted Publishing and requires the
corresponding GitHub environment (`testpypi` or `pypi`). It only runs for the
`KhiopsML` org on tag pushes.

### `release.yml`

Manual workflow that merges `dev` into `main`, tags the merge commit with the
provided version, and resets `dev` to `main`. Only triggered via
`workflow_dispatch` with a `version` input.

### `api-docs.yml`

Builds Sphinx documentation inside a dev Docker container. Triggers on:

- Tag pushes — builds docs and uploads a zip archive to GitHub Releases
- PRs touching `doc/**/*.rst`, `doc/create-doc`, `doc/clean-doc`, `doc/*.py`,
`khiops/**/*.py`, or the workflow file
- `workflow_dispatch` with optional tutorial and samples revision inputs

Uses the `khiopspydev-ubuntu22.04` Docker image and runs
`./create-doc -t -d -g <revision>`. Uses a `pages` concurrency group that does
**not** cancel in-progress runs (to avoid interrupting production deployments).

### `dev-docker.yml`

Builds development Docker images for multiple OS targets (ubuntu22.04, rocky8,
rocky9, debian13) with configurable Khiops revision, server revision, Python
versions (3.10–3.14), and remote file driver versions (GCS, S3, Azure).
Triggers on PRs touching `packaging/docker/khiopspydev/Dockerfile.*` or the
workflow file, and on `workflow_dispatch`. Images are pushed to
`ghcr.io/khiopsml/khiops-python/khiopspydev-*` only when manually requested via
`push: true`. The `set-latest` flag only works on the `main` or `main-v10`
branches.

### `test-conda-forge-package.yml`

Manual-only workflow that tests the released `khiops` Conda package on the
`conda-forge` channel across a broad matrix: Python 3.10–3.14 × multiple OS
environments (Ubuntu 20.04/22.04/24.04, Rocky 8/9, Windows 2022/2025, macOS
14/15/15-Intel). Tests both normal Conda environments and "Conda-based
environments" (where `CONDA_PREFIX` is unset to simulate non-Conda invocation).
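
The idea of unsetting `CONDA_PREFIX` to simulate a non-Conda invocation can be reproduced locally with a child process. This is a minimal sketch, not the mechanism used by the workflow: the helper `run_without_conda_prefix` is hypothetical, and the command it runs here is only a probe that reports whether the variable is still visible.

```python
import os
import subprocess
import sys


def run_without_conda_prefix(command):
    """Run a command with CONDA_PREFIX removed from its environment"""
    # Copy the current environment so the parent process stays untouched
    environment = dict(os.environ)
    environment.pop("CONDA_PREFIX", None)

    # Run the command and capture its output for inspection
    return subprocess.run(
        command, env=environment, capture_output=True, text=True, check=True
    )


# Usage: the child process reports whether it still sees the variable
check_command = [
    sys.executable,
    "-c",
    "import os; print('CONDA_PREFIX' in os.environ)",
]
result = run_without_conda_prefix(check_command)
print(result.stdout.strip())
```

The probe prints `False` whether or not the parent process runs inside a Conda environment, because the variable is removed only from the copy passed to the child.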

## Editing Rules

- Workflow YAML files are validated by pre-commit hooks
(`check-github-workflows`, `check-github-actions`) and formatted by `yamlfix`.
- The dev Docker images are the test environment for both `tests.yml` and
`pip.yml`. If you need new system dependencies in CI, they go into the
Dockerfiles under `packaging/docker/khiopspydev/`.
- Test dependencies are in `test-requirements.txt` (`coverage`, `wrapt`).
Package dependencies are extracted from `pyproject.toml` at CI time via
`scripts/extract_dependencies_from_pyproject_toml.py`.