From 0bb637b37093754ad840401df2194a52204e0bbd Mon Sep 17 00:00:00 2001 From: Li Jiang Date: Wed, 6 May 2026 11:08:33 +0800 Subject: [PATCH 1/4] =?UTF-8?q?chore:=20post-CLI=20cleanup=20=E2=80=94=20f?= =?UTF-8?q?ix=20Feature.compute,=20llm=20exports,=20cache=20silence;=20add?= =?UTF-8?q?=20CLI=20docs?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Findings from a post-PR-#5 sweep of the repo: Bug fixes --------- 1. `featcopilot/core/feature.py` — `Feature.compute` previously called `exec(self.code, {"__builtins__": {}}, locals)` so any stored snippet that used a Python builtin (`len`, `range`, `int`, `sum`, ...) raised `NameError` at exec time, even though the snippet was otherwise valid. The fix introduces a curated `_SAFE_BUILTINS` dict mirroring `TransformRule._get_safe_builtins` so common idioms work without giving the snippet unrestricted access to imports / file I/O. 2. `featcopilot/llm/__init__.py` — `SyncCopilotFeatureClient` was defined in `copilot_client.py` and used internally by sibling modules but was missing from the package's imports and `__all__`, while sister classes `SyncOpenAIFeatureClient` and `SyncLiteLLMFeatureClient` were both exported. `from featcopilot.llm import SyncCopilotFeatureClient` raised `ImportError` even though the class itself was importable from the submodule. The fix adds the missing import + `__all__` entry so all three sync clients have a consistent public surface. 3. `featcopilot/utils/cache.py` — three bare `except Exception: pass` blocks silently swallowed disk I/O / pickle / JSON errors when the on-disk cache entry was corrupted, when a value was unpicklable (e.g. lambda), or when metadata JSON was truncated. The fix replaces each with `logger.warning(...)` so callers can diagnose disk issues, version mismatches, and unpicklable values from logs while still preserving the "cache is best-effort, never crash the caller" contract. Documentation fixes ------------------- 4. `docs/user-guide/cli.md` — NEW page documenting the `featcopilot` CLI added in PR #5 (`info` / `transform` / `explain` subcommands, output contract, agentic-usage tips, parquet caveat, `--config` example). The CLI was previously only mentioned in the README; mkdocs had no nav entry. 5. `mkdocs.yml` — added `Command-Line Interface: user-guide/cli.md` to the `User Guide` nav section. 6. `docs/getting-started/installation.md:82` — sample expected output updated from `# Output: 0.1.0` to `# Output: 0.3.7` (matches current `pyproject.toml` version). Tests ----- * `tests/test_core.py::TestFeature::test_feature_compute_uses_safe_builtins` — pins the safe-builtin whitelist; covers `len` / `range` / `sum` / `int` / `abs` / `round` / `min` / `max`. * `tests/test_utils.py` — three new `FeatureCache` regression tests for corrupted `.pkl` reads, unpicklable values on `set`, and corrupted metadata JSON, each asserting `logger.warning` is called via `mock.patch` on the module logger. * `tests/test_llm_modules.py::TestLLMPackageExports` — pins the `featcopilot.llm` public API: all three sync clients are importable directly AND listed in `__all__`, with identity checks against the underlying submodule definitions. Local: 912 tests pass (full suite), pre-commit clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- docs/getting-started/installation.md | 2 +- docs/user-guide/cli.md | 223 +++++++++++++++++++++++++++ featcopilot/core/feature.py | 55 ++++++- featcopilot/llm/__init__.py | 3 +- featcopilot/utils/cache.py | 45 +++++- mkdocs.yml | 1 + tests/test_core.py | 36 +++++ tests/test_llm_modules.py | 49 ++++++ tests/test_utils.py | 75 +++++++++ 9 files changed, 479 insertions(+), 10 deletions(-) create mode 100644 docs/user-guide/cli.md diff --git a/docs/getting-started/installation.md b/docs/getting-started/installation.md index 89c8964..e567704 100644 --- a/docs/getting-started/installation.md +++ b/docs/getting-started/installation.md @@ -79,7 +79,7 @@ Development dependencies include: ```python import featcopilot print(featcopilot.__version__) -# Output: 0.1.0 +# Output: 0.3.7 from featcopilot import AutoFeatureEngineer print("Installation successful!") diff --git a/docs/user-guide/cli.md b/docs/user-guide/cli.md new file mode 100644 index 0000000..de27b36 --- /dev/null +++ b/docs/user-guide/cli.md @@ -0,0 +1,223 @@ +# Command-Line Interface + +FeatCopilot ships a stable, agent-friendly `featcopilot` CLI for using the +library from shells, CI pipelines, and **agentic / LLM tool-use** workflows +without writing Python glue. All subcommands accept `--json` for +machine-readable stdout; user-facing errors are written to **stderr** with +a non-zero exit code so that automation can parse failures +deterministically. + +The CLI is installed automatically with the package via the +`[project.scripts]` entry point (`featcopilot = "featcopilot.cli:main"`), +so after `pip install featcopilot` the `featcopilot` command is available +on `$PATH`. The equivalent module form `python -m featcopilot ...` always +works regardless of how the package was installed. + +## Subcommands + +| Command | Purpose | +| --- | --- | +| `featcopilot info` | Print version, supported engines, selection methods, leakage guards, I/O formats, and a runtime `parquet_available` flag. | +| `featcopilot transform` | Read a CSV / Parquet / JSON file, run [`AutoFeatureEngineer`](../user-guide/overview.md), and write engineered features to an output file. | +| `featcopilot explain` | Fit and print a JSON document with `{name, explanation, code}` per feature for downstream LLM consumption (no output file is written). | + +Run any subcommand with `--help` to see the full flag list: + +```bash +featcopilot --help +featcopilot transform --help +featcopilot explain --help +``` + +## Output contract + +All three subcommands honor the same agent-friendly contract: + +* **`stdout`** carries the result. With `--json` (always implicit for + `explain`), exactly one JSON document is written. +* **`stderr`** is reserved for failures. A successful run keeps `stderr` + empty even when `AutoFeatureEngineer` emits leakage warnings or + `verbose` logger output ─ those are surfaced via the JSON payload's + `warnings` field instead. This same contract covers warnings emitted + during pandas / pyarrow read or write phases (e.g. `DtypeWarning` on + mixed-type CSVs, `FutureWarning` from a successful Parquet write): + they are routed to the JSON `warnings` field, never to `stderr`. +* **Exit codes**: `0` on success; `2` for user-input errors (missing + files, malformed config, unknown target, etc.); `1` for unexpected + internal errors. + +## `featcopilot info` + +Discover capabilities without running an engineer: + +```bash +featcopilot info --json +``` + +Sample (truncated) output: + +```json +{ + "version": "0.3.7", + "supported_engines": ["relational", "tabular", "text", "timeseries"], + "supported_selection_methods": ["correlation", "importance", "mutual_info"], + "supported_leakage_guards": ["block", "ignore", "warn"], + "supported_input_formats": ["csv", "json"], + "supported_output_formats": ["csv", "json"], + "parquet_available": false +} +``` + +`parquet_available` reflects whether `pyarrow` or `fastparquet` is +importable in the current environment. The base FeatCopilot install does +not pin a parquet engine; install one with +`pip install pyarrow` (or `fastparquet`) to enable Parquet I/O. + +## `featcopilot transform` + +Run feature engineering on a tabular input and write the engineered +features to disk: + +```bash +featcopilot transform \ + --input data.csv --target label --output features.csv \ + --engines tabular --max-features 50 \ + --json +``` + +Common flags: + +| Flag | Purpose | +| --- | --- | +| `--input / -i` | Path to input file (CSV / Parquet / JSON). Required. | +| `--output / -o` | Path to output file. Required. | +| `--target / -t` | Target column. Required when feature selection is applied (i.e. when `--max-features` / config `max_features` is set). | +| `--input-format` / `--output-format` | Override format detection (`csv` / `parquet` / `json`). | +| `--engines` | One or more engines to enable (default: `tabular`). | +| `--max-features N` | Cap on engine output / selection. Forwarded both to engine constructors and to the selector. | +| `--no-selection` | Skip feature selection entirely (raw feature generation). | +| `--selection-methods` | Override the default `mutual_info importance` selection set. | +| `--leakage-guard` | How to handle suspicious column names: `warn` (default), `block`, or `ignore`. | +| `--include-target` | Re-attach the target column to the output file (collision-safe). | +| `--task-description` | Free-form ML task description forwarded to LLM-aware engines. | +| `--config FILE` | JSON config with nested keys (e.g. `llm_config`, `selection_methods`). CLI flags override config values. | +| `--verbose / --no-verbose` | Toggle verbose logging. With `--json`, log records are routed to the JSON `warnings` field rather than `stderr`. | +| `--gate-n-jobs` | Parallelism for the do-no-harm gate's RF (default 1; `-1` = all cores). | +| `--json` | Emit a one-line JSON status object on stdout instead of human-readable text. | + +A successful `--json` run prints something like: + +```json +{ + "status": "ok", + "input": "data.csv", + "output": "features.csv", + "input_format": "csv", + "output_format": "csv", + "n_rows": 1000, + "n_features": 47, + "n_input_columns": 12, + "n_generated_features": 47, + "engines": ["tabular"], + "selection_methods": ["mutual_info", "importance"], + "max_features": 50, + "target": "label", + "selection_applied": true, + "warnings": [] +} +``` + +## `featcopilot explain` + +Fit the engineer (without writing any output file) and print a JSON +catalog of generated features for downstream LLM consumption: + +```bash +featcopilot explain --input data.csv --target label +``` + +Each entry in the `features` array contains the feature `name`, an +LLM-style natural-language `explanation`, and the executable Python +`code` used to produce it. + +`explain` defaults to running on the **full** input so the metadata is +a faithful description of what a corresponding `transform` would +generate. Some engines (notably the tabular engine's categorical +encoding) consult per-row / per-category statistics when planning +features, so blind subsampling can silently change results. For very +large inputs where metadata-only `explain` should not pay full memory +or compute cost, opt in with: + +```bash +featcopilot explain --input big.csv --target label --explain-sample-size 5000 +``` + +The cap is a deterministic *head slice* (the first N rows), threaded +through `pd.read_csv(nrows=N)` for CSV so memory is bounded natively. +For Parquet / JSON pandas has no native row-limit, so the file is +fully read and then truncated; a `UserWarning` explaining the +limitation is emitted (and surfaced in the JSON `warnings` field) only +when the cap actually truncates the input. + +## Configuration files + +Pass `--config config.json` to provide nested keys that don't have +matching CLI flags, such as the `llm_config` engine kwargs: + +```json +{ + "engines": ["tabular", "llm"], + "max_features": 80, + "selection_methods": ["mutual_info", "importance"], + "llm_config": { + "backend": "litellm", + "model": "gpt-4o", + "max_suggestions": 20 + } +} +``` + +Explicit CLI flags override values from the config file. Any malformed +scalar (e.g. `"max_features": "5"`, `"verbose": "false"`) is rejected +with a clean exit-2 error rather than failing later inside the +engineer. + +## Parquet I/O + +The base FeatCopilot install does not pin a parquet engine. To use +`--input file.parquet` / `--output file.parquet` (or the `parquet` +value of `--input-format` / `--output-format`), install one of: + +```bash +pip install pyarrow # recommended +# or +pip install fastparquet +``` + +Confirm with `featcopilot info --json`: + +```json +{ "parquet_available": true, ... } +``` + +If neither engine is installed, attempting Parquet I/O fails with a +clean exit-2 error pointing at the missing dependency. + +## Agentic-usage tips + +* Always pass `--json`. Treat anything on `stderr` as a hard failure; + treat anything on `stdout` as the JSON result. +* Treat the JSON `warnings` field as a list of human-readable + diagnostic strings ─ it is non-empty for `transform` runs that + generated leakage / mock-mode / sampling notices, and empty for + fully clean runs. +* For long-running batch jobs, prefer `featcopilot transform` to + `python -m featcopilot transform` only because the former is shorter; + both invoke the exact same entry point. + +## See also + +* [Overview](overview.md) ─ the underlying `AutoFeatureEngineer` API. +* [Engines](engines.md) ─ what each engine generates. +* [LLM Features](llm-features.md) ─ configuring the LLM backend (used + by `--config llm_config`). diff --git a/featcopilot/core/feature.py b/featcopilot/core/feature.py index 4e3e933..2feff77 100644 --- a/featcopilot/core/feature.py +++ b/featcopilot/core/feature.py @@ -12,6 +12,45 @@ logger = get_logger(__name__) +# Curated set of safe Python builtins exposed to ``Feature.compute``'s +# stored code. Without this whitelist (i.e. with ``{"__builtins__": {}}``) +# even basic idioms like ``len(df)``, ``range(...)``, ``sum(...)``, or +# ``int(x)`` raise ``NameError`` at exec time, which means a feature whose +# code legitimately uses a Python builtin crashes during ``compute`` even +# though the snippet is otherwise valid. The set mirrors the one used by +# :class:`featcopilot.core.transform_rule.TransformRule` so both code +# execution paths agree on what is safe. +_SAFE_BUILTINS: dict[str, Any] = { + "len": len, + "sum": sum, + "max": max, + "min": min, + "int": int, + "float": float, + "str": str, + "bool": bool, + "abs": abs, + "round": round, + "pow": pow, + "range": range, + "list": list, + "dict": dict, + "set": set, + "tuple": tuple, + "sorted": sorted, + "reversed": reversed, + "enumerate": enumerate, + "zip": zip, + "any": any, + "all": all, + "map": map, + "filter": filter, + "isinstance": isinstance, + "hasattr": hasattr, + "getattr": getattr, +} + + class FeatureType(Enum): """Types of features.""" @@ -109,6 +148,13 @@ def compute(self, df: pd.DataFrame) -> pd.Series: """ Compute feature values from DataFrame using stored code. + The stored ``code`` is executed with ``df``, ``np`` and ``pd`` available + as local variables and a curated set of safe Python builtins + (``len``, ``range``, ``sum``, numeric / sequence constructors, etc.) + in globals so common idioms work without giving the snippet + unrestricted access to imports / file I/O. The snippet must + bind its output to a local variable named ``result``. + Parameters ---------- df : DataFrame @@ -118,11 +164,16 @@ def compute(self, df: pd.DataFrame) -> pd.Series: ------- Series Computed feature values + + Raises + ------ + ValueError + If ``self.code`` is empty / missing or if the executed code + does not bind a ``result`` variable. """ if self.code: - # Execute stored code to compute feature local_vars = {"df": df, "np": np, "pd": pd} - exec(self.code, {"__builtins__": {}}, local_vars) + exec(self.code, {"__builtins__": _SAFE_BUILTINS}, local_vars) if "result" in local_vars: return local_vars["result"] raise ValueError(f"No code defined for feature {self.name}") diff --git a/featcopilot/llm/__init__.py b/featcopilot/llm/__init__.py index d63780e..3039868 100644 --- a/featcopilot/llm/__init__.py +++ b/featcopilot/llm/__init__.py @@ -4,7 +4,7 @@ """ from featcopilot.llm.code_generator import FeatureCodeGenerator -from featcopilot.llm.copilot_client import CopilotFeatureClient +from featcopilot.llm.copilot_client import CopilotFeatureClient, SyncCopilotFeatureClient from featcopilot.llm.explainer import FeatureExplainer from featcopilot.llm.litellm_client import LiteLLMFeatureClient, SyncLiteLLMFeatureClient from featcopilot.llm.openai_client import OpenAIFeatureClient, SyncOpenAIFeatureClient @@ -13,6 +13,7 @@ __all__ = [ "CopilotFeatureClient", + "SyncCopilotFeatureClient", "LiteLLMFeatureClient", "SyncLiteLLMFeatureClient", "OpenAIFeatureClient", diff --git a/featcopilot/utils/cache.py b/featcopilot/utils/cache.py index ad8e23e..1067727 100644 --- a/featcopilot/utils/cache.py +++ b/featcopilot/utils/cache.py @@ -8,6 +8,10 @@ import pandas as pd +from featcopilot.utils.logger import get_logger + +logger = get_logger(__name__) + class FeatureCache: """ @@ -85,8 +89,18 @@ def get(self, key: str, data_hash: str | None = None) -> Any | None: # Store in memory cache self._memory_cache[cache_key] = value return value - except Exception: - pass + except Exception as exc: # noqa: BLE001 - we re-surface via logger + # Cache reads must never crash the caller (a corrupted + # entry should fall back to recomputing the feature) + # but the failure must not be silent: log a warning + # so users can diagnose disk issues / pickle version + # mismatches / partially-written files. + logger.warning( + "FeatureCache: failed to read cache entry %s (%s: %s)", + cache_path, + type(exc).__name__, + exc, + ) return None @@ -137,8 +151,18 @@ def set( meta_path = self.cache_dir / f"{cache_key}.meta.json" with open(meta_path, "w") as f: json.dump(metadata or {}, f) - except Exception: - pass + except Exception as exc: # noqa: BLE001 - we re-surface via logger + # Persisting to disk is best-effort: a write failure (e.g. + # disk full, read-only filesystem, unpicklable value) + # should not crash the caller, but it must not be silent + # either. Log a warning so operational issues with the + # cache directory surface in observability. + logger.warning( + "FeatureCache: failed to persist cache entry %s (%s: %s)", + cache_path, + type(exc).__name__, + exc, + ) def has(self, key: str, data_hash: str | None = None) -> bool: """Check if key exists in cache.""" @@ -205,8 +229,17 @@ def get_metadata(self, key: str, data_hash: str | None = None) -> dict | None: try: with open(meta_path) as f: return json.load(f) - except Exception: - pass + except Exception as exc: # noqa: BLE001 - we re-surface via logger + # Same rationale as the cache-read path: a corrupted + # metadata file should fall back to "no metadata + # available" rather than crashing the caller, but + # the failure must surface via the logger. + logger.warning( + "FeatureCache: failed to read cache metadata %s (%s: %s)", + meta_path, + type(exc).__name__, + exc, + ) return None diff --git a/mkdocs.yml b/mkdocs.yml index 8136adf..48ca57f 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -80,6 +80,7 @@ nav: - LLM Features: user-guide/llm-features.md - Feature Selection: user-guide/selection.md - Feature Stores: user-guide/feature-stores.md + - Command-Line Interface: user-guide/cli.md - Benchmarks: user-guide/benchmarks.md - Best Practices: user-guide/best-practices.md - Examples: diff --git a/tests/test_core.py b/tests/test_core.py index cbee907..c014df3 100644 --- a/tests/test_core.py +++ b/tests/test_core.py @@ -60,6 +60,42 @@ def test_feature_compute(self): assert list(result) == [1, 4, 9, 16] + def test_feature_compute_uses_safe_builtins(self): + """Regression: ``Feature.compute`` previously called + ``exec(code, {"__builtins__": {}}, locals)`` which broke any + snippet that used a Python builtin (``len``, ``range``, ``int``, + ``sum``, ...). The fix exposes a curated set of safe builtins + identical to ``TransformRule._get_safe_builtins`` so common + idioms work without giving the snippet unrestricted access. + """ + df = pd.DataFrame({"col1": [10, 20, 30, 40, 50]}) + + # ``len`` (most common builtin used in a feature) — would NameError + # before the fix. + f_len = Feature(name="row_count", code="result = pd.Series([len(df)] * len(df))") + result = f_len.compute(df) + assert list(result) == [5, 5, 5, 5, 5] + + # ``range`` + ``sum`` + numeric builtins — exercises a broader + # subset of the whitelist in one snippet. + f_range = Feature( + name="cumulative_index_sum", + code="result = pd.Series([sum(range(int(v))) for v in df['col1']])", + ) + result = f_range.compute(df) + # sum(range(10))=45, sum(range(20))=190, sum(range(30))=435, + # sum(range(40))=780, sum(range(50))=1225 + assert list(result) == [45, 190, 435, 780, 1225] + + # ``abs`` + ``round`` + ``min`` / ``max`` — ensure the + # whitelist's full set is exposed. + f_clip = Feature( + name="clipped", + code="result = pd.Series([round(min(max(abs(v - 25), 0), 20), 2) for v in df['col1']])", + ) + result = f_clip.compute(df) + assert list(result) == [15, 5, 5, 15, 20] + class TestFeatureSet: """Tests for FeatureSet class.""" diff --git a/tests/test_llm_modules.py b/tests/test_llm_modules.py index e1c426b..a42a2d8 100644 --- a/tests/test_llm_modules.py +++ b/tests/test_llm_modules.py @@ -2596,3 +2596,52 @@ def test_send_prompt_handles_concurrent_futures_timeout_error(self): result = asyncio.run(client.send_prompt("hi")) assert "timed out" in result.lower() + + +# --------------------------------------------------------------------------- +# featcopilot.llm package public-API consistency +# --------------------------------------------------------------------------- + + +class TestLLMPackageExports: + """Pin the public API of the ``featcopilot.llm`` package. + + Regression: ``SyncCopilotFeatureClient`` was defined in + ``copilot_client.py`` and used by sibling modules (e.g. + ``code_generator.py``, ``explainer.py``) but was missing from + ``featcopilot/llm/__init__.py``'s imports and ``__all__`` while the + sister classes ``SyncOpenAIFeatureClient`` and + ``SyncLiteLLMFeatureClient`` WERE both exported. That made + ``from featcopilot.llm import SyncCopilotFeatureClient`` raise + ``ImportError`` even though the class itself was importable from the + submodule, which is a public-API surface inconsistency callers + legitimately notice. + """ + + def test_sync_clients_are_all_exported(self): + """All three sync LLM clients must be importable directly from + the ``featcopilot.llm`` package.""" + from featcopilot.llm import ( + SyncCopilotFeatureClient, + SyncLiteLLMFeatureClient, + SyncOpenAIFeatureClient, + ) + + # Identity check: the names exposed by the package must be the + # same objects defined in the underlying submodules. + from featcopilot.llm.copilot_client import SyncCopilotFeatureClient as _SCC + from featcopilot.llm.litellm_client import SyncLiteLLMFeatureClient as _SLC + from featcopilot.llm.openai_client import SyncOpenAIFeatureClient as _SOC + + assert SyncCopilotFeatureClient is _SCC + assert SyncLiteLLMFeatureClient is _SLC + assert SyncOpenAIFeatureClient is _SOC + + def test_sync_clients_are_listed_in_dunder_all(self): + """``__all__`` must mirror the imports so ``from featcopilot.llm + import *`` is consistent with explicit imports.""" + import featcopilot.llm as llm_pkg + + assert "SyncCopilotFeatureClient" in llm_pkg.__all__ + assert "SyncLiteLLMFeatureClient" in llm_pkg.__all__ + assert "SyncOpenAIFeatureClient" in llm_pkg.__all__ diff --git a/tests/test_utils.py b/tests/test_utils.py index 75731d3..9996895 100644 --- a/tests/test_utils.py +++ b/tests/test_utils.py @@ -204,6 +204,81 @@ def test_list_keys_includes_disk(self): cache._memory_cache.clear() assert "disk_key" in cache.list_keys() + def test_corrupted_cache_read_logs_warning(self): + """Regression: ``FeatureCache.get`` previously had a bare + ``except Exception: pass`` that silently swallowed ``pickle`` + errors when the on-disk cache entry was corrupted (truncated + write, version mismatch, partial flush). The fix logs a + warning via ``featcopilot.utils.cache``'s module logger so + users can diagnose disk issues. This test corrupts the on-disk + ``.pkl`` and asserts ``get()`` returns ``None`` AND emits a + warning containing the cache path. + """ + from pathlib import Path + + with tempfile.TemporaryDirectory() as tmpdir: + cache = FeatureCache(cache_dir=tmpdir) + cache.set("victim", {"value": 42}) + + # Find the persisted file and corrupt it. + pkl_path = next(Path(tmpdir).glob("*.pkl")) + pkl_path.write_bytes(b"\x00\x01\x02not-a-valid-pickle-stream") + + # Drop the in-memory entry so ``get`` falls through to disk. + cache._memory_cache.clear() + + with patch("featcopilot.utils.cache.logger") as mock_logger: + value = cache.get("victim") + assert value is None + assert mock_logger.warning.called, "FeatureCache must log on read failure" + msg = mock_logger.warning.call_args[0][0] + assert "failed to read cache entry" in msg + + def test_unpicklable_value_logs_warning_on_set(self): + """Regression: ``FeatureCache.set`` previously had a bare + ``except Exception: pass`` that silently swallowed + ``pickle.PicklingError`` when the value could not be persisted + (e.g. lambda / open file handle). The fix logs a warning so + operational issues with the cache directory surface in + observability. + """ + with tempfile.TemporaryDirectory() as tmpdir: + cache = FeatureCache(cache_dir=tmpdir) + + # Lambdas cannot be pickled by the default protocol. + unpicklable = lambda x: x # noqa: E731 + + with patch("featcopilot.utils.cache.logger") as mock_logger: + cache.set("bad", unpicklable) + assert mock_logger.warning.called, "FeatureCache must log on persist failure" + msg = mock_logger.warning.call_args[0][0] + assert "failed to persist cache entry" in msg + + def test_corrupted_metadata_read_logs_warning(self): + """Regression: ``FeatureCache.get_metadata`` previously had a + bare ``except Exception: pass`` that hid corrupted JSON + metadata files. The fix logs a warning. + """ + from pathlib import Path + + with tempfile.TemporaryDirectory() as tmpdir: + cache = FeatureCache(cache_dir=tmpdir) + cache.set("k", "v", metadata={"src": "tabular"}) + + # Corrupt the metadata JSON. + meta_path = next(Path(tmpdir).glob("*.meta.json")) + meta_path.write_text("{not valid json") + + # Force re-read from disk by clearing the in-memory metadata cache. + cache._metadata.clear() + + with patch("featcopilot.utils.cache.logger") as mock_logger: + md = cache.get_metadata("k") + assert md is None + assert mock_logger.warning.called, "FeatureCache must log on metadata read failure" + msg = mock_logger.warning.call_args[0][0] + assert "failed to read cache metadata" in msg + class TestFeatureCacheDataHash: """Tests for FeatureCache with data_hash parameter.""" From 3cd59a699c4aa01ee3eb14fd0faa5780e52ea863 Mon Sep 17 00:00:00 2001 From: Li Jiang Date: Wed, 6 May 2026 15:07:59 +0800 Subject: [PATCH 2/4] fix(feature): single namespace for exec; per-call builtin copy; distinct error MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Round-1 review on PR #6 raised four threads (3x copilot-pull-request-reviewer, 1x codex). All addressed: 1. **Codex P2 (feature.py:176) — comprehensions/lambdas saw NameError** Free variables inside a comprehension or lambda body are resolved against the enclosing function's globals, NOT against the caller's locals dict passed to `exec`. The previous fix put `df`/`np`/ `pd` only in `locals` so a snippet such as `[df['c'].iloc[i] for i in range(len(df))]` would still raise `NameError` and `FeatureSet.compute_all` would silently drop the feature. `Feature.compute` now uses a SINGLE shared namespace (passing one dict as both `globals` and `locals` to `exec`), so comprehensions / lambdas / generator expressions resolve `df`, `np`, `pd` and the safe builtins identically to the top-level statement scope. 2. **Copilot (feature.py:51) — `_SAFE_BUILTINS` mutation persisted** `_SAFE_BUILTINS` was passed to `exec` by reference, so a snippet that rebound entries (`__builtins__['len'] = lambda x: -999`) could leak the mutation into every subsequent `Feature.compute` call in the same process. `compute` now passes `dict(_SAFE_BUILTINS)` (a fresh shallow copy) on every call, mirroring `TransformRule._get_safe_builtins()` which already returns a fresh dict per call. Side effects are now strictly local to the single `exec` call. 3. **Copilot (feature.py:179) — misleading error when `result` missing** The docstring said `ValueError` would be raised when the executed code did not bind a `result` variable, but the implementation always surfaced `"No code defined for feature ..."` even when `self.code` was clearly present and only `result` was missing. The two branches now produce DISTINCT messages: - `self.code` empty → `"No code defined for feature ..."` - `self.code` present but no `result` → `"Feature '...' code did not produce a 'result' variable. Stored snippet must bind its output to a name called 'result'."` 4. **Copilot (docs/user-guide/cli.md:223) — misleading "by `--config llm_config`"** `--config` takes a JSON file path; `llm_config` is a key inside that JSON. Reworded the See-also bullet to "provide an `llm_config` object inside the JSON file passed to `--config`" and link back to the existing Configuration Files section. Tests added (regression locks): - `test_feature_compute_resolves_df_inside_comprehension` — list comprehension, lambda body, and generator expression all referencing `df` / `pd` / `len` / `range` succeed. - `test_feature_compute_isolates_safe_builtins_across_calls` — runs a "poisoner" snippet that rebinds `__builtins__['len']`, then asserts a clean follow-up snippet still sees the original `len`. - `test_feature_compute_distinguishes_missing_code_from_missing_result` — pins the two distinct `ValueError` messages. Local: 915 tests pass (full suite); +3 new tests vs round-0. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- docs/user-guide/cli.md | 5 +- featcopilot/core/feature.py | 65 ++++++++++++++++----- tests/test_core.py | 112 ++++++++++++++++++++++++++++++++++++ 3 files changed, 166 insertions(+), 16 deletions(-) diff --git a/docs/user-guide/cli.md b/docs/user-guide/cli.md index de27b36..c832bd0 100644 --- a/docs/user-guide/cli.md +++ b/docs/user-guide/cli.md @@ -219,5 +219,6 @@ clean exit-2 error pointing at the missing dependency. * [Overview](overview.md) ─ the underlying `AutoFeatureEngineer` API. * [Engines](engines.md) ─ what each engine generates. -* [LLM Features](llm-features.md) ─ configuring the LLM backend (used - by `--config llm_config`). +* [LLM Features](llm-features.md) ─ configuring the LLM backend (provide + an `llm_config` object inside the JSON file passed to `--config`, as + shown in the [Configuration files](#configuration-files) section above). diff --git a/featcopilot/core/feature.py b/featcopilot/core/feature.py index 2feff77..b1fe375 100644 --- a/featcopilot/core/feature.py +++ b/featcopilot/core/feature.py @@ -148,12 +148,28 @@ def compute(self, df: pd.DataFrame) -> pd.Series: """ Compute feature values from DataFrame using stored code. - The stored ``code`` is executed with ``df``, ``np`` and ``pd`` available - as local variables and a curated set of safe Python builtins - (``len``, ``range``, ``sum``, numeric / sequence constructors, etc.) - in globals so common idioms work without giving the snippet - unrestricted access to imports / file I/O. The snippet must - bind its output to a local variable named ``result``. + The stored ``code`` is executed in a single shared namespace + with ``df``, ``np`` and ``pd`` bound as names alongside a + curated set of safe Python builtins (``len``, ``range``, + ``sum``, numeric / sequence constructors, etc.) so common + idioms work without giving the snippet unrestricted access to + imports / file I/O. The snippet must bind its output to a name + called ``result``. + + A *fresh copy* of the safe-builtins dict is passed into ``exec`` + on every call so that any mutation the snippet performs on + ``__builtins__`` (``del``, ``pop``, attribute assignment, + rebinding) does not bleed into subsequent ``compute`` calls. + Likewise the data-bound namespace is constructed fresh per + call. Using a SINGLE dict for both ``globals`` and ``locals`` + is what makes free variables inside comprehensions and lambdas + — which Python resolves against the enclosing function's + globals, not the caller's locals — see ``df``, ``np`` and + ``pd`` correctly. With separate ``locals`` and ``globals`` + dicts a snippet such as ``[df['c'].iloc[i] for i in + range(len(df))]`` would otherwise raise ``NameError`` because + the implicit comprehension function's body looks ``df`` up in + the (empty) ``globals``. Parameters ---------- @@ -168,15 +184,36 @@ def compute(self, df: pd.DataFrame) -> pd.Series: Raises ------ ValueError - If ``self.code`` is empty / missing or if the executed code - does not bind a ``result`` variable. + * If ``self.code`` is empty / missing — message + ``"No code defined for feature ..."``. + * If ``self.code`` is present but did not bind a + ``result`` variable — message + ``"Feature ... code did not produce a 'result' variable"``. + These two cases produce DIFFERENT messages so a failing + snippet is distinguishable from an unset feature when + debugging. """ - if self.code: - local_vars = {"df": df, "np": np, "pd": pd} - exec(self.code, {"__builtins__": _SAFE_BUILTINS}, local_vars) - if "result" in local_vars: - return local_vars["result"] - raise ValueError(f"No code defined for feature {self.name}") + if not self.code: + raise ValueError(f"No code defined for feature {self.name}") + + # Single shared namespace so comprehensions / lambdas / + # generator expressions inside the snippet see ``df``, ``np``, + # ``pd`` and the safe builtins. Fresh dicts per call so the + # snippet cannot pollute either the safe-builtins whitelist or + # the data bindings for later ``compute`` invocations. + namespace: dict[str, Any] = { + "__builtins__": dict(_SAFE_BUILTINS), + "df": df, + "np": np, + "pd": pd, + } + exec(self.code, namespace) + if "result" not in namespace: + raise ValueError( + f"Feature {self.name!r} code did not produce a 'result' variable. " + "Stored snippet must bind its output to a name called 'result'." + ) + return namespace["result"] class FeatureSet: diff --git a/tests/test_core.py b/tests/test_core.py index c014df3..66b2610 100644 --- a/tests/test_core.py +++ b/tests/test_core.py @@ -96,6 +96,118 @@ def test_feature_compute_uses_safe_builtins(self): result = f_clip.compute(df) assert list(result) == [15, 5, 5, 15, 20] + def test_feature_compute_resolves_df_inside_comprehension(self): + """Regression for round-2 review: when a snippet uses a + comprehension or lambda whose body references ``df``, ``np`` or + ``pd``, Python resolves those free variables against the + enclosing function's *globals*, not the caller's locals. With + a previous implementation that passed ``df`` only via the + ``locals`` argument to ``exec`` (and had ``globals`` set to + only ``__builtins__``), the inner reference would fail with + ``NameError`` and the call would either crash or be silently + dropped by ``FeatureSet.compute_all``. The fix uses a single + shared namespace for both ``globals`` and ``locals``. + """ + df = pd.DataFrame({"col1": [1, 2, 3, 4, 5]}) + + # List comprehension whose body references ``df`` and uses + # ``range`` / ``len`` builtins — this is the exact pattern + # called out in the round-2 review. + f_compre = Feature( + name="from_comprehension", + code="result = pd.Series([df['col1'].iloc[i] * 2 for i in range(len(df))])", + ) + result = f_compre.compute(df) + assert list(result) == [2, 4, 6, 8, 10] + + # Lambda whose body references ``df`` and ``pd`` — same + # globals-resolution rule applies. + f_lambda = Feature( + name="from_lambda", + code=( + "fn = lambda i: int(df['col1'].iloc[i]) ** 2\n" "result = pd.Series([fn(i) for i in range(len(df))])" + ), + ) + result = f_lambda.compute(df) + assert list(result) == [1, 4, 9, 16, 25] + + # Generator expression piped into a builtin — exercises the + # iterator-protocol path through the same namespace. + f_gen = Feature( + name="from_generator", + code="result = pd.Series([sum(v for v in df['col1'] if v <= n) for n in df['col1']])", + ) + result = f_gen.compute(df) + # cumulative sums for thresholds 1, 2, 3, 4, 5: 1, 3, 6, 10, 15 + assert list(result) == [1, 3, 6, 10, 15] + + def test_feature_compute_isolates_safe_builtins_across_calls(self): + """Regression: ``_SAFE_BUILTINS`` must NOT be passed to ``exec`` + by reference. If it were, a malicious / buggy snippet could + delete or rebind entries (e.g. ``del __builtins__['len']``) + and the mutation would persist across subsequent ``Feature.compute`` + calls in the same process, breaking unrelated features. The + fix passes ``dict(_SAFE_BUILTINS)`` (a shallow copy) per call + so that side effects are local to that single ``exec``. + """ + df = pd.DataFrame({"col1": [1, 2, 3]}) + + # Snippet that rebinds ``len`` inside its own builtins view. + # If the implementation passed _SAFE_BUILTINS by reference, + # this would corrupt the whitelist for every feature that runs + # after it. + f_attacker = Feature( + name="poisoner", + code=( + "import builtins\n" + # We can't directly mutate __builtins__ because exec + # turns the dict into a module, but we CAN rebind keys. + "_b = __builtins__\n" + "if isinstance(_b, dict):\n" + " _b['len'] = lambda x: -999\n" + "result = pd.Series([1, 2, 3])" + ), + ) + # The attacker may or may not actually corrupt anything — + # depending on Python's exec semantics — but the next call + # MUST behave correctly regardless. + try: + f_attacker.compute(df) + except Exception: + # Even if the attacker snippet itself fails, the isolation + # contract still applies to the next call. + pass + + # A clean, well-behaved feature run AFTER the attacker must + # see the original ``len`` builtin, not the poisoned version. + f_victim = Feature( + name="clean_user", + code="result = pd.Series([len(df)] * len(df))", + ) + result = f_victim.compute(df) + assert list(result) == [3, 3, 3], ( + "Per-call dict copy of _SAFE_BUILTINS is broken: a previous " "feature's mutation leaked into a later call" + ) + + def test_feature_compute_distinguishes_missing_code_from_missing_result(self): + """Regression: the ``ValueError`` raised by ``Feature.compute`` + must use DIFFERENT messages for the two distinct failure + modes (``self.code`` empty / missing vs. snippet ran but did + not bind ``result``). Previously both paths surfaced the + misleading ``"No code defined ..."`` message even when the + code was clearly present, making debugging harder. + """ + # No code at all → "No code defined" branch. + f_empty = Feature(name="no_code") + with pytest.raises(ValueError, match="No code defined for feature no_code"): + f_empty.compute(pd.DataFrame({"x": [1]})) + + # Code present but does not produce ``result`` → distinct + # message that points at the actual cause. + f_no_result = Feature(name="no_result", code="x = df['col1'] * 2") + with pytest.raises(ValueError, match=r"Feature 'no_result' code did not produce a 'result' variable"): + f_no_result.compute(pd.DataFrame({"col1": [1, 2, 3]})) + class TestFeatureSet: """Tests for FeatureSet class.""" From a9c25ec5d97c4d4d1c3032ffe25de092e45358e4 Mon Sep 17 00:00:00 2001 From: Li Jiang Date: Wed, 6 May 2026 16:03:29 +0800 Subject: [PATCH 3/4] docs(cli): correct --leakage-guard values; pin doc-vs-source agreement MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Round-2 review on PR #6: Codex P2 noted that the user-guide CLI doc listed `--leakage-guard` choices as `warn` / `block` / `ignore` and the `info` JSON sample showed `["block", "ignore", "warn"]`, but the actual `AutoFeatureEngineer.SUPPORTED_LEAKAGE_GUARDS` set is `{"off", "warn", "raise"}` (per `featcopilot/transformers/sklearn_compat.py:136`) and the argparse parser validates against `sorted(SUPPORTED_LEAKAGE_GUARDS)`. Users following the doc with `--leakage-guard block` would get an argparse error rather than the described behavior. Two doc fixes: - `"supported_leakage_guards": ["block", "ignore", "warn"]` → `"supported_leakage_guards": ["off", "raise", "warn"]` in the `info` JSON sample. - `--leakage-guard` row in the transform flags table now lists the three actual values (`warn`, `raise`, `off`) with a short description of what each does. Added `test_cli_doc_leakage_guard_values_match_source_of_truth` which reads `docs/user-guide/cli.md` and asserts (a) the JSON sample contains the literal sorted-list of `SUPPORTED_LEAKAGE_GUARDS` and (b) every valid value appears as a backtick-quoted token AND no non-existent value (`block`, `ignore`) appears anywhere on the page. Future drift in either direction (renaming a guard, adding a new one, removing one) will fail this test fast. Local: 916 tests pass (full suite); pre-commit clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- docs/user-guide/cli.md | 4 ++-- tests/test_cli.py | 38 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 40 insertions(+), 2 deletions(-) diff --git a/docs/user-guide/cli.md b/docs/user-guide/cli.md index c832bd0..8345a5d 100644 --- a/docs/user-guide/cli.md +++ b/docs/user-guide/cli.md @@ -61,7 +61,7 @@ Sample (truncated) output: "version": "0.3.7", "supported_engines": ["relational", "tabular", "text", "timeseries"], "supported_selection_methods": ["correlation", "importance", "mutual_info"], - "supported_leakage_guards": ["block", "ignore", "warn"], + "supported_leakage_guards": ["off", "raise", "warn"], "supported_input_formats": ["csv", "json"], "supported_output_formats": ["csv", "json"], "parquet_available": false @@ -97,7 +97,7 @@ Common flags: | `--max-features N` | Cap on engine output / selection. Forwarded both to engine constructors and to the selector. | | `--no-selection` | Skip feature selection entirely (raw feature generation). | | `--selection-methods` | Override the default `mutual_info importance` selection set. | -| `--leakage-guard` | How to handle suspicious column names: `warn` (default), `block`, or `ignore`. | +| `--leakage-guard` | How to handle suspicious column names: `warn` (default — log a warning and continue), `raise` (hard-fail with an error), or `off` (disable the check). | | `--include-target` | Re-attach the target column to the output file (collision-safe). | | `--task-description` | Free-form ML task description forwarded to LLM-aware engines. | | `--config FILE` | JSON config with nested keys (e.g. `llm_config`, `selection_methods`). CLI flags override config values. | diff --git a/tests/test_cli.py b/tests/test_cli.py index 2f8c1f3..c7e3cfa 100644 --- a/tests/test_cli.py +++ b/tests/test_cli.py @@ -124,6 +124,44 @@ def test_info_text_mode_is_human_readable(): assert __version__ in out +def test_cli_doc_leakage_guard_values_match_source_of_truth(): + """Regression for round-1 review: the user-guide CLI doc previously + listed ``--leakage-guard`` choices as ``warn`` / ``block`` / ``ignore``, + but the actual ``AutoFeatureEngineer.SUPPORTED_LEAKAGE_GUARDS`` set is + ``{off, warn, raise}``. Following the doc with + ``--leakage-guard block`` produced an argparse error rather than the + described behavior. This test pins the doc against the source of + truth so any future drift fails fast. + """ + from featcopilot.transformers.sklearn_compat import AutoFeatureEngineer + + expected = sorted(AutoFeatureEngineer.SUPPORTED_LEAKAGE_GUARDS) + cli_doc = (Path(__file__).resolve().parent.parent / "docs" / "user-guide" / "cli.md").read_text(encoding="utf-8") + + # The ``info`` JSON sample MUST list the actual values (sorted). + expected_json_fragment = '"supported_leakage_guards": ' + json.dumps(expected) + assert ( + expected_json_fragment in cli_doc + ), f"docs/user-guide/cli.md must show {expected_json_fragment!r} in the info sample; got mismatch with source of truth" + + # The ``--leakage-guard`` row MUST mention each valid value as + # ```value`` and MUST NOT mention any value that is not actually + # accepted. Using a substring check on the literal backtick-quoted + # token avoids false positives from prose elsewhere on the page + # (e.g. ``warn`` appears in unrelated wording). + invalid_examples = ("`block`", "`ignore`") + for bad in invalid_examples: + assert bad not in cli_doc, ( + f"docs/user-guide/cli.md must not advertise {bad} as a leakage_guard value; " + f"actual valid values are {expected}" + ) + for value in expected: + assert f"`{value}`" in cli_doc, ( + f"docs/user-guide/cli.md must advertise `{value}` as a valid leakage_guard value; " + f"current values are {expected}" + ) + + def test_top_level_version_flag(capsys): # ``--version`` (argparse action) prints to stdout; main() now traps the # SystemExit and returns the code so the API contract is consistent. From e5b7a1976ec62082a7b8991f906fe3d2bb56e512 Mon Sep 17 00:00:00 2001 From: Li Jiang Date: Wed, 6 May 2026 16:25:29 +0800 Subject: [PATCH 4/4] fix: rigorous builtin-isolation test; complete info sample; honest docstring MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Round-2 review on PR #6 (4 new threads). All addressed: 1. **Codex P3 (test_core.py:162) + Copilot (test_core.py:168) — broken isolation test.** `test_feature_compute_isolates_safe_builtins_across_calls` started its attacker snippet with `import builtins`, but `__import__` is intentionally NOT in the safe-builtins whitelist, so the snippet raised at the first line and the surrounding `try/except` swallowed the failure. Result: the test would still pass even if `_SAFE_BUILTINS` were passed by reference, so it was not actually pinning the per-call-copy contract it claims to protect. Rewritten to mutate `__builtins__` directly via the dict that `exec` sees as the snippet's builtins source AND assert inside the snippet (`len(df) == -999`) that the rebinding took effect. Now requires BOTH the in-snippet mutation to succeed AND the next call to see the original `len`: a real regression check. 2. **Codex P2 (cli.md:63) — info sample omitted valid choices.** The sample showed `supported_engines: ["relational", "tabular", "text", "timeseries"]` (missing `llm`) and `supported_selection_methods: ["correlation", "importance", "mutual_info"]` (missing `chi2`, `f_test`, `xgboost`). The sample now lists the FULL sorted set from `AutoFeatureEngineer.SUPPORTED_*` and a follow-up paragraph clarifies the parquet-conditional behaviour for the I/O format arrays. Added `test_cli_doc_info_sample_matches_engines_and_selection_methods` which pins both lists against the source of truth. 3. **Copilot (feature.py:156) — misleading docstring.** The previous docstring claimed the snippet ran "without giving the snippet unrestricted access to imports / file I/O". That's incorrect: `pd` is in scope, which means `pd.read_csv` / `pd.read_parquet` / `df.to_csv` are all reachable. The fix reframes the doc accurately: `__import__` is not in the safe builtins (so plain `import foo` raises) but the runtime is NOT a security sandbox; stored snippets must come from a trusted source. Verification - 917 tests pass (full suite); pre-commit clean. - The new attacker-isolation test would FAIL if `_SAFE_BUILTINS` were passed by reference (the `-999` rebinding would leak into the next call's `[len(df)] * len(df)` and produce `[-999, -999, -999]` instead of `[3, 3, 3]`); confirmed locally. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- docs/user-guide/cli.md | 16 ++++++++-- featcopilot/core/feature.py | 43 ++++++++++++++++--------- tests/test_cli.py | 42 +++++++++++++++++++++++++ tests/test_core.py | 63 +++++++++++++++++++------------------ 4 files changed, 117 insertions(+), 47 deletions(-) diff --git a/docs/user-guide/cli.md b/docs/user-guide/cli.md index 8345a5d..ed68367 100644 --- a/docs/user-guide/cli.md +++ b/docs/user-guide/cli.md @@ -59,8 +59,15 @@ Sample (truncated) output: ```json { "version": "0.3.7", - "supported_engines": ["relational", "tabular", "text", "timeseries"], - "supported_selection_methods": ["correlation", "importance", "mutual_info"], + "supported_engines": ["llm", "relational", "tabular", "text", "timeseries"], + "supported_selection_methods": [ + "chi2", + "correlation", + "f_test", + "importance", + "mutual_info", + "xgboost" + ], "supported_leakage_guards": ["off", "raise", "warn"], "supported_input_formats": ["csv", "json"], "supported_output_formats": ["csv", "json"], @@ -68,6 +75,11 @@ Sample (truncated) output: } ``` +When a parquet engine (`pyarrow` or `fastparquet`) IS importable in the +current environment, `"parquet"` is added to `supported_input_formats` +and `supported_output_formats` (in source order, so the lists become +`["csv", "parquet", "json"]`) and `parquet_available` flips to `true`. + `parquet_available` reflects whether `pyarrow` or `fastparquet` is importable in the current environment. The base FeatCopilot install does not pin a parquet engine; install one with diff --git a/featcopilot/core/feature.py b/featcopilot/core/feature.py index b1fe375..dc13209 100644 --- a/featcopilot/core/feature.py +++ b/featcopilot/core/feature.py @@ -152,24 +152,37 @@ def compute(self, df: pd.DataFrame) -> pd.Series: with ``df``, ``np`` and ``pd`` bound as names alongside a curated set of safe Python builtins (``len``, ``range``, ``sum``, numeric / sequence constructors, etc.) so common - idioms work without giving the snippet unrestricted access to - imports / file I/O. The snippet must bind its output to a name - called ``result``. + idioms work without giving the snippet a Python import system + — ``__import__`` is intentionally NOT in the safe builtins, so + an ``import foo`` statement inside the snippet raises at exec + time. The snippet must bind its output to a name called + ``result``. + + .. note:: + This is **not** a security sandbox for untrusted code. + ``pd`` is in scope, which means the snippet can reach + pandas' file I/O helpers (``pd.read_csv``, ``pd.read_parquet``, + ``df.to_csv``, ...), and dunder attribute access on objects + reachable from ``df`` / ``np`` / ``pd`` is not blocked. The + builtin whitelist limits the *namespace* available to plain + Python idioms; it does not isolate FeatCopilot from the + ambient process. Stored snippets must therefore come from a + trusted source (your own code generator, a vetted feature + store, or a transform-rule registry you control). A *fresh copy* of the safe-builtins dict is passed into ``exec`` on every call so that any mutation the snippet performs on - ``__builtins__`` (``del``, ``pop``, attribute assignment, - rebinding) does not bleed into subsequent ``compute`` calls. - Likewise the data-bound namespace is constructed fresh per - call. Using a SINGLE dict for both ``globals`` and ``locals`` - is what makes free variables inside comprehensions and lambdas - — which Python resolves against the enclosing function's - globals, not the caller's locals — see ``df``, ``np`` and - ``pd`` correctly. With separate ``locals`` and ``globals`` - dicts a snippet such as ``[df['c'].iloc[i] for i in - range(len(df))]`` would otherwise raise ``NameError`` because - the implicit comprehension function's body looks ``df`` up in - the (empty) ``globals``. + ``__builtins__`` (rebinding entries, ``del``, ``pop``) does not + bleed into subsequent ``compute`` calls. Likewise the + data-bound namespace is constructed fresh per call. Using a + SINGLE dict for both ``globals`` and ``locals`` is what makes + free variables inside comprehensions and lambdas — which Python + resolves against the enclosing function's globals, not the + caller's locals — see ``df``, ``np`` and ``pd`` correctly. + With separate ``locals`` and ``globals`` dicts a snippet such + as ``[df['c'].iloc[i] for i in range(len(df))]`` would + otherwise raise ``NameError`` because the implicit comprehension + function's body looks ``df`` up in the (empty) ``globals``. Parameters ---------- diff --git a/tests/test_cli.py b/tests/test_cli.py index c7e3cfa..4cb0e9b 100644 --- a/tests/test_cli.py +++ b/tests/test_cli.py @@ -162,6 +162,48 @@ def test_cli_doc_leakage_guard_values_match_source_of_truth(): ) +def test_cli_doc_info_sample_matches_engines_and_selection_methods(): + """Regression for round-2 review: the CLI doc's ``info`` JSON sample + must list the FULL sorted ``SUPPORTED_ENGINES`` and + ``SUPPORTED_SELECTION_METHODS`` sets, not a truncated subset. + + Round-2 specifically caught that the sample omitted ``llm`` from + ``supported_engines`` and ``chi2`` / ``f_test`` / ``xgboost`` from + ``supported_selection_methods``. Even with a "truncated" caveat the + arrays look complete and steered users away from valid CLI choices, + so we now pin the entire sorted list against + ``AutoFeatureEngineer.SUPPORTED_*`` so future drift in either + direction (adding a new engine, renaming a method) fails fast. + """ + from featcopilot.transformers.sklearn_compat import AutoFeatureEngineer + + cli_doc = (Path(__file__).resolve().parent.parent / "docs" / "user-guide" / "cli.md").read_text(encoding="utf-8") + + expected_engines = sorted(AutoFeatureEngineer.SUPPORTED_ENGINES) + engines_fragment = '"supported_engines": ' + json.dumps(expected_engines) + assert engines_fragment in cli_doc, ( + f"docs/user-guide/cli.md ``info`` sample must show {engines_fragment!r}; " + "the JSON sample drifted away from AutoFeatureEngineer.SUPPORTED_ENGINES" + ) + + expected_methods = sorted(AutoFeatureEngineer.SUPPORTED_SELECTION_METHODS) + # The methods array is multi-line in the doc (one entry per line) + # for readability, so check that each value is present as a + # JSON-quoted string in the sample. We anchor the search inside the + # ``"supported_selection_methods": [`` block to avoid false + # positives from prose like "``mutual_info``". + sm_marker = '"supported_selection_methods": [' + sm_idx = cli_doc.find(sm_marker) + assert sm_idx != -1, "docs/user-guide/cli.md must contain a supported_selection_methods JSON sample block" + sm_block_end = cli_doc.find("]", sm_idx) + sm_block = cli_doc[sm_idx:sm_block_end] + for method in expected_methods: + assert f'"{method}"' in sm_block, ( + f"docs/user-guide/cli.md ``info`` sample must list {method!r} in supported_selection_methods; " + f"current set is {expected_methods}" + ) + + def test_top_level_version_flag(capsys): # ``--version`` (argparse action) prints to stdout; main() now traps the # SystemExit and returns the code so the API contract is consistent. diff --git a/tests/test_core.py b/tests/test_core.py index 66b2610..a4c4b59 100644 --- a/tests/test_core.py +++ b/tests/test_core.py @@ -143,40 +143,43 @@ def test_feature_compute_resolves_df_inside_comprehension(self): def test_feature_compute_isolates_safe_builtins_across_calls(self): """Regression: ``_SAFE_BUILTINS`` must NOT be passed to ``exec`` - by reference. If it were, a malicious / buggy snippet could - delete or rebind entries (e.g. ``del __builtins__['len']``) - and the mutation would persist across subsequent ``Feature.compute`` - calls in the same process, breaking unrelated features. The - fix passes ``dict(_SAFE_BUILTINS)`` (a shallow copy) per call - so that side effects are local to that single ``exec``. + by reference. If it were, a snippet that rebinds entries in its + own ``__builtins__`` view (e.g. ``__builtins__['len'] = lambda + x: -999``) would leak that mutation into every subsequent + ``Feature.compute`` call in the same process, breaking unrelated + features. The fix passes ``dict(_SAFE_BUILTINS)`` (a fresh + shallow copy) per call so that side effects are local to that + single ``exec``. + + Round-2 review note: an earlier version of this test started + the attacker snippet with ``import builtins``, but ``__import__`` + is intentionally NOT in the safe-builtins whitelist, so the + snippet raised at the very first line and the surrounding + ``try/except`` swallowed the failure ─ meaning the test would + pass even if ``_SAFE_BUILTINS`` were still being shared by + reference. This rewrite mutates ``__builtins__`` directly via + the dict that ``exec`` sees as the snippet's builtins source, + AND asserts inside the snippet (via ``len(df) == -999``) that + the rebinding actually took effect. That dual assertion is + what makes the test a real regression check. """ df = pd.DataFrame({"col1": [1, 2, 3]}) - # Snippet that rebinds ``len`` inside its own builtins view. - # If the implementation passed _SAFE_BUILTINS by reference, - # this would corrupt the whitelist for every feature that runs - # after it. + # Attacker snippet: rebind ``len`` in this call's __builtins__ + # dict. The trailing ``result = pd.Series([len(df)])`` would + # yield 3 if the rebinding had failed, and -999 if the + # rebinding succeeded. That's the in-snippet proof that + # mutation IS possible within a single exec ─ which is what + # makes the per-call-copy isolation contract meaningful. f_attacker = Feature( name="poisoner", - code=( - "import builtins\n" - # We can't directly mutate __builtins__ because exec - # turns the dict into a module, but we CAN rebind keys. - "_b = __builtins__\n" - "if isinstance(_b, dict):\n" - " _b['len'] = lambda x: -999\n" - "result = pd.Series([1, 2, 3])" - ), + code=("__builtins__['len'] = lambda x: -999\nresult = pd.Series([len(df)])\n"), + ) + poisoned = f_attacker.compute(df) + assert list(poisoned) == [-999], ( + "Attacker snippet failed to mutate __builtins__: the regression test " + "is no longer exercising the isolation path it claims to protect" ) - # The attacker may or may not actually corrupt anything — - # depending on Python's exec semantics — but the next call - # MUST behave correctly regardless. - try: - f_attacker.compute(df) - except Exception: - # Even if the attacker snippet itself fails, the isolation - # contract still applies to the next call. - pass # A clean, well-behaved feature run AFTER the attacker must # see the original ``len`` builtin, not the poisoned version. @@ -184,8 +187,8 @@ def test_feature_compute_isolates_safe_builtins_across_calls(self): name="clean_user", code="result = pd.Series([len(df)] * len(df))", ) - result = f_victim.compute(df) - assert list(result) == [3, 3, 3], ( + clean = f_victim.compute(df) + assert list(clean) == [3, 3, 3], ( "Per-call dict copy of _SAFE_BUILTINS is broken: a previous " "feature's mutation leaked into a later call" )