diff --git a/docs/getting-started/installation.md b/docs/getting-started/installation.md index 89c8964..e567704 100644 --- a/docs/getting-started/installation.md +++ b/docs/getting-started/installation.md @@ -79,7 +79,7 @@ Development dependencies include: ```python import featcopilot print(featcopilot.__version__) -# Output: 0.1.0 +# Output: 0.3.7 from featcopilot import AutoFeatureEngineer print("Installation successful!") diff --git a/docs/user-guide/cli.md b/docs/user-guide/cli.md new file mode 100644 index 0000000..ed68367 --- /dev/null +++ b/docs/user-guide/cli.md @@ -0,0 +1,236 @@ +# Command-Line Interface + +FeatCopilot ships a stable, agent-friendly `featcopilot` CLI for using the +library from shells, CI pipelines, and **agentic / LLM tool-use** workflows +without writing Python glue. All subcommands accept `--json` for +machine-readable stdout; user-facing errors are written to **stderr** with +a non-zero exit code so that automation can parse failures +deterministically. + +The CLI is installed automatically with the package via the +`[project.scripts]` entry point (`featcopilot = "featcopilot.cli:main"`), +so after `pip install featcopilot` the `featcopilot` command is available +on `$PATH`. The equivalent module form `python -m featcopilot ...` always +works regardless of how the package was installed. + +## Subcommands + +| Command | Purpose | +| --- | --- | +| `featcopilot info` | Print version, supported engines, selection methods, leakage guards, I/O formats, and a runtime `parquet_available` flag. | +| `featcopilot transform` | Read a CSV / Parquet / JSON file, run [`AutoFeatureEngineer`](../user-guide/overview.md), and write engineered features to an output file. | +| `featcopilot explain` | Fit and print a JSON document with `{name, explanation, code}` per feature for downstream LLM consumption (no output file is written). | + +Run any subcommand with `--help` to see the full flag list: + +```bash +featcopilot --help +featcopilot transform --help +featcopilot explain --help +``` + +## Output contract + +All three subcommands honor the same agent-friendly contract: + +* **`stdout`** carries the result. With `--json` (always implicit for + `explain`), exactly one JSON document is written. +* **`stderr`** is reserved for failures. A successful run keeps `stderr` + empty even when `AutoFeatureEngineer` emits leakage warnings or + `verbose` logger output ─ those are surfaced via the JSON payload's + `warnings` field instead. This same contract covers warnings emitted + during pandas / pyarrow read or write phases (e.g. `DtypeWarning` on + mixed-type CSVs, `FutureWarning` from a successful Parquet write): + they are routed to the JSON `warnings` field, never to `stderr`. +* **Exit codes**: `0` on success; `2` for user-input errors (missing + files, malformed config, unknown target, etc.); `1` for unexpected + internal errors. + +## `featcopilot info` + +Discover capabilities without running an engineer: + +```bash +featcopilot info --json +``` + +Sample (truncated) output: + +```json +{ + "version": "0.3.7", + "supported_engines": ["llm", "relational", "tabular", "text", "timeseries"], + "supported_selection_methods": [ + "chi2", + "correlation", + "f_test", + "importance", + "mutual_info", + "xgboost" + ], + "supported_leakage_guards": ["off", "raise", "warn"], + "supported_input_formats": ["csv", "json"], + "supported_output_formats": ["csv", "json"], + "parquet_available": false +} +``` + +When a parquet engine (`pyarrow` or `fastparquet`) IS importable in the +current environment, `"parquet"` is added to `supported_input_formats` +and `supported_output_formats` (in source order, so the lists become +`["csv", "parquet", "json"]`) and `parquet_available` flips to `true`. + +`parquet_available` reflects whether `pyarrow` or `fastparquet` is +importable in the current environment. The base FeatCopilot install does +not pin a parquet engine; install one with +`pip install pyarrow` (or `fastparquet`) to enable Parquet I/O. + +## `featcopilot transform` + +Run feature engineering on a tabular input and write the engineered +features to disk: + +```bash +featcopilot transform \ + --input data.csv --target label --output features.csv \ + --engines tabular --max-features 50 \ + --json +``` + +Common flags: + +| Flag | Purpose | +| --- | --- | +| `--input / -i` | Path to input file (CSV / Parquet / JSON). Required. | +| `--output / -o` | Path to output file. Required. | +| `--target / -t` | Target column. Required when feature selection is applied (i.e. when `--max-features` / config `max_features` is set). | +| `--input-format` / `--output-format` | Override format detection (`csv` / `parquet` / `json`). | +| `--engines` | One or more engines to enable (default: `tabular`). | +| `--max-features N` | Cap on engine output / selection. Forwarded both to engine constructors and to the selector. | +| `--no-selection` | Skip feature selection entirely (raw feature generation). | +| `--selection-methods` | Override the default `mutual_info importance` selection set. | +| `--leakage-guard` | How to handle suspicious column names: `warn` (default — log a warning and continue), `raise` (hard-fail with an error), or `off` (disable the check). | +| `--include-target` | Re-attach the target column to the output file (collision-safe). | +| `--task-description` | Free-form ML task description forwarded to LLM-aware engines. | +| `--config FILE` | JSON config with nested keys (e.g. `llm_config`, `selection_methods`). CLI flags override config values. | +| `--verbose / --no-verbose` | Toggle verbose logging. With `--json`, log records are routed to the JSON `warnings` field rather than `stderr`. | +| `--gate-n-jobs` | Parallelism for the do-no-harm gate's RF (default 1; `-1` = all cores). | +| `--json` | Emit a one-line JSON status object on stdout instead of human-readable text. | + +A successful `--json` run prints something like: + +```json +{ + "status": "ok", + "input": "data.csv", + "output": "features.csv", + "input_format": "csv", + "output_format": "csv", + "n_rows": 1000, + "n_features": 47, + "n_input_columns": 12, + "n_generated_features": 47, + "engines": ["tabular"], + "selection_methods": ["mutual_info", "importance"], + "max_features": 50, + "target": "label", + "selection_applied": true, + "warnings": [] +} +``` + +## `featcopilot explain` + +Fit the engineer (without writing any output file) and print a JSON +catalog of generated features for downstream LLM consumption: + +```bash +featcopilot explain --input data.csv --target label +``` + +Each entry in the `features` array contains the feature `name`, an +LLM-style natural-language `explanation`, and the executable Python +`code` used to produce it. + +`explain` defaults to running on the **full** input so the metadata is +a faithful description of what a corresponding `transform` would +generate. Some engines (notably the tabular engine's categorical +encoding) consult per-row / per-category statistics when planning +features, so blind subsampling can silently change results. For very +large inputs where metadata-only `explain` should not pay full memory +or compute cost, opt in with: + +```bash +featcopilot explain --input big.csv --target label --explain-sample-size 5000 +``` + +The cap is a deterministic *head slice* (the first N rows), threaded +through `pd.read_csv(nrows=N)` for CSV so memory is bounded natively. +For Parquet / JSON pandas has no native row-limit, so the file is +fully read and then truncated; a `UserWarning` explaining the +limitation is emitted (and surfaced in the JSON `warnings` field) only +when the cap actually truncates the input. + +## Configuration files + +Pass `--config config.json` to provide nested keys that don't have +matching CLI flags, such as the `llm_config` engine kwargs: + +```json +{ + "engines": ["tabular", "llm"], + "max_features": 80, + "selection_methods": ["mutual_info", "importance"], + "llm_config": { + "backend": "litellm", + "model": "gpt-4o", + "max_suggestions": 20 + } +} +``` + +Explicit CLI flags override values from the config file. Any malformed +scalar (e.g. `"max_features": "5"`, `"verbose": "false"`) is rejected +with a clean exit-2 error rather than failing later inside the +engineer. + +## Parquet I/O + +The base FeatCopilot install does not pin a parquet engine. To use +`--input file.parquet` / `--output file.parquet` (or the `parquet` +value of `--input-format` / `--output-format`), install one of: + +```bash +pip install pyarrow # recommended +# or +pip install fastparquet +``` + +Confirm with `featcopilot info --json`: + +```json +{ "parquet_available": true, ... } +``` + +If neither engine is installed, attempting Parquet I/O fails with a +clean exit-2 error pointing at the missing dependency. + +## Agentic-usage tips + +* Always pass `--json`. Treat anything on `stderr` as a hard failure; + treat anything on `stdout` as the JSON result. +* Treat the JSON `warnings` field as a list of human-readable + diagnostic strings ─ it is non-empty for `transform` runs that + generated leakage / mock-mode / sampling notices, and empty for + fully clean runs. +* For long-running batch jobs, prefer `featcopilot transform` to + `python -m featcopilot transform` only because the former is shorter; + both invoke the exact same entry point. + +## See also + +* [Overview](overview.md) ─ the underlying `AutoFeatureEngineer` API. +* [Engines](engines.md) ─ what each engine generates. +* [LLM Features](llm-features.md) ─ configuring the LLM backend (provide + an `llm_config` object inside the JSON file passed to `--config`, as + shown in the [Configuration files](#configuration-files) section above). diff --git a/featcopilot/core/feature.py b/featcopilot/core/feature.py index 4e3e933..dc13209 100644 --- a/featcopilot/core/feature.py +++ b/featcopilot/core/feature.py @@ -12,6 +12,45 @@ logger = get_logger(__name__) +# Curated set of safe Python builtins exposed to ``Feature.compute``'s +# stored code. Without this whitelist (i.e. with ``{"__builtins__": {}}``) +# even basic idioms like ``len(df)``, ``range(...)``, ``sum(...)``, or +# ``int(x)`` raise ``NameError`` at exec time, which means a feature whose +# code legitimately uses a Python builtin crashes during ``compute`` even +# though the snippet is otherwise valid. The set mirrors the one used by +# :class:`featcopilot.core.transform_rule.TransformRule` so both code +# execution paths agree on what is safe. +_SAFE_BUILTINS: dict[str, Any] = { + "len": len, + "sum": sum, + "max": max, + "min": min, + "int": int, + "float": float, + "str": str, + "bool": bool, + "abs": abs, + "round": round, + "pow": pow, + "range": range, + "list": list, + "dict": dict, + "set": set, + "tuple": tuple, + "sorted": sorted, + "reversed": reversed, + "enumerate": enumerate, + "zip": zip, + "any": any, + "all": all, + "map": map, + "filter": filter, + "isinstance": isinstance, + "hasattr": hasattr, + "getattr": getattr, +} + + class FeatureType(Enum): """Types of features.""" @@ -109,6 +148,42 @@ def compute(self, df: pd.DataFrame) -> pd.Series: """ Compute feature values from DataFrame using stored code. + The stored ``code`` is executed in a single shared namespace + with ``df``, ``np`` and ``pd`` bound as names alongside a + curated set of safe Python builtins (``len``, ``range``, + ``sum``, numeric / sequence constructors, etc.) so common + idioms work without giving the snippet a Python import system + — ``__import__`` is intentionally NOT in the safe builtins, so + an ``import foo`` statement inside the snippet raises at exec + time. The snippet must bind its output to a name called + ``result``. + + .. note:: + This is **not** a security sandbox for untrusted code. + ``pd`` is in scope, which means the snippet can reach + pandas' file I/O helpers (``pd.read_csv``, ``pd.read_parquet``, + ``df.to_csv``, ...), and dunder attribute access on objects + reachable from ``df`` / ``np`` / ``pd`` is not blocked. The + builtin whitelist limits the *namespace* available to plain + Python idioms; it does not isolate FeatCopilot from the + ambient process. Stored snippets must therefore come from a + trusted source (your own code generator, a vetted feature + store, or a transform-rule registry you control). + + A *fresh copy* of the safe-builtins dict is passed into ``exec`` + on every call so that any mutation the snippet performs on + ``__builtins__`` (rebinding entries, ``del``, ``pop``) does not + bleed into subsequent ``compute`` calls. Likewise the + data-bound namespace is constructed fresh per call. Using a + SINGLE dict for both ``globals`` and ``locals`` is what makes + free variables inside comprehensions and lambdas — which Python + resolves against the enclosing function's globals, not the + caller's locals — see ``df``, ``np`` and ``pd`` correctly. + With separate ``locals`` and ``globals`` dicts a snippet such + as ``[df['c'].iloc[i] for i in range(len(df))]`` would + otherwise raise ``NameError`` because the implicit comprehension + function's body looks ``df`` up in the (empty) ``globals``. + Parameters ---------- df : DataFrame @@ -118,14 +193,40 @@ def compute(self, df: pd.DataFrame) -> pd.Series: ------- Series Computed feature values + + Raises + ------ + ValueError + * If ``self.code`` is empty / missing — message + ``"No code defined for feature ..."``. + * If ``self.code`` is present but did not bind a + ``result`` variable — message + ``"Feature ... code did not produce a 'result' variable"``. + These two cases produce DIFFERENT messages so a failing + snippet is distinguishable from an unset feature when + debugging. """ - if self.code: - # Execute stored code to compute feature - local_vars = {"df": df, "np": np, "pd": pd} - exec(self.code, {"__builtins__": {}}, local_vars) - if "result" in local_vars: - return local_vars["result"] - raise ValueError(f"No code defined for feature {self.name}") + if not self.code: + raise ValueError(f"No code defined for feature {self.name}") + + # Single shared namespace so comprehensions / lambdas / + # generator expressions inside the snippet see ``df``, ``np``, + # ``pd`` and the safe builtins. Fresh dicts per call so the + # snippet cannot pollute either the safe-builtins whitelist or + # the data bindings for later ``compute`` invocations. + namespace: dict[str, Any] = { + "__builtins__": dict(_SAFE_BUILTINS), + "df": df, + "np": np, + "pd": pd, + } + exec(self.code, namespace) + if "result" not in namespace: + raise ValueError( + f"Feature {self.name!r} code did not produce a 'result' variable. " + "Stored snippet must bind its output to a name called 'result'." + ) + return namespace["result"] class FeatureSet: diff --git a/featcopilot/llm/__init__.py b/featcopilot/llm/__init__.py index d63780e..3039868 100644 --- a/featcopilot/llm/__init__.py +++ b/featcopilot/llm/__init__.py @@ -4,7 +4,7 @@ """ from featcopilot.llm.code_generator import FeatureCodeGenerator -from featcopilot.llm.copilot_client import CopilotFeatureClient +from featcopilot.llm.copilot_client import CopilotFeatureClient, SyncCopilotFeatureClient from featcopilot.llm.explainer import FeatureExplainer from featcopilot.llm.litellm_client import LiteLLMFeatureClient, SyncLiteLLMFeatureClient from featcopilot.llm.openai_client import OpenAIFeatureClient, SyncOpenAIFeatureClient @@ -13,6 +13,7 @@ __all__ = [ "CopilotFeatureClient", + "SyncCopilotFeatureClient", "LiteLLMFeatureClient", "SyncLiteLLMFeatureClient", "OpenAIFeatureClient", diff --git a/featcopilot/utils/cache.py b/featcopilot/utils/cache.py index ad8e23e..1067727 100644 --- a/featcopilot/utils/cache.py +++ b/featcopilot/utils/cache.py @@ -8,6 +8,10 @@ import pandas as pd +from featcopilot.utils.logger import get_logger + +logger = get_logger(__name__) + class FeatureCache: """ @@ -85,8 +89,18 @@ def get(self, key: str, data_hash: str | None = None) -> Any | None: # Store in memory cache self._memory_cache[cache_key] = value return value - except Exception: - pass + except Exception as exc: # noqa: BLE001 - we re-surface via logger + # Cache reads must never crash the caller (a corrupted + # entry should fall back to recomputing the feature) + # but the failure must not be silent: log a warning + # so users can diagnose disk issues / pickle version + # mismatches / partially-written files. + logger.warning( + "FeatureCache: failed to read cache entry %s (%s: %s)", + cache_path, + type(exc).__name__, + exc, + ) return None @@ -137,8 +151,18 @@ def set( meta_path = self.cache_dir / f"{cache_key}.meta.json" with open(meta_path, "w") as f: json.dump(metadata or {}, f) - except Exception: - pass + except Exception as exc: # noqa: BLE001 - we re-surface via logger + # Persisting to disk is best-effort: a write failure (e.g. + # disk full, read-only filesystem, unpicklable value) + # should not crash the caller, but it must not be silent + # either. Log a warning so operational issues with the + # cache directory surface in observability. + logger.warning( + "FeatureCache: failed to persist cache entry %s (%s: %s)", + cache_path, + type(exc).__name__, + exc, + ) def has(self, key: str, data_hash: str | None = None) -> bool: """Check if key exists in cache.""" @@ -205,8 +229,17 @@ def get_metadata(self, key: str, data_hash: str | None = None) -> dict | None: try: with open(meta_path) as f: return json.load(f) - except Exception: - pass + except Exception as exc: # noqa: BLE001 - we re-surface via logger + # Same rationale as the cache-read path: a corrupted + # metadata file should fall back to "no metadata + # available" rather than crashing the caller, but + # the failure must surface via the logger. + logger.warning( + "FeatureCache: failed to read cache metadata %s (%s: %s)", + meta_path, + type(exc).__name__, + exc, + ) return None diff --git a/mkdocs.yml b/mkdocs.yml index 8136adf..48ca57f 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -80,6 +80,7 @@ nav: - LLM Features: user-guide/llm-features.md - Feature Selection: user-guide/selection.md - Feature Stores: user-guide/feature-stores.md + - Command-Line Interface: user-guide/cli.md - Benchmarks: user-guide/benchmarks.md - Best Practices: user-guide/best-practices.md - Examples: diff --git a/tests/test_cli.py b/tests/test_cli.py index 2f8c1f3..4cb0e9b 100644 --- a/tests/test_cli.py +++ b/tests/test_cli.py @@ -124,6 +124,86 @@ def test_info_text_mode_is_human_readable(): assert __version__ in out +def test_cli_doc_leakage_guard_values_match_source_of_truth(): + """Regression for round-1 review: the user-guide CLI doc previously + listed ``--leakage-guard`` choices as ``warn`` / ``block`` / ``ignore``, + but the actual ``AutoFeatureEngineer.SUPPORTED_LEAKAGE_GUARDS`` set is + ``{off, warn, raise}``. Following the doc with + ``--leakage-guard block`` produced an argparse error rather than the + described behavior. This test pins the doc against the source of + truth so any future drift fails fast. + """ + from featcopilot.transformers.sklearn_compat import AutoFeatureEngineer + + expected = sorted(AutoFeatureEngineer.SUPPORTED_LEAKAGE_GUARDS) + cli_doc = (Path(__file__).resolve().parent.parent / "docs" / "user-guide" / "cli.md").read_text(encoding="utf-8") + + # The ``info`` JSON sample MUST list the actual values (sorted). + expected_json_fragment = '"supported_leakage_guards": ' + json.dumps(expected) + assert ( + expected_json_fragment in cli_doc + ), f"docs/user-guide/cli.md must show {expected_json_fragment!r} in the info sample; got mismatch with source of truth" + + # The ``--leakage-guard`` row MUST mention each valid value as + # ```value`` and MUST NOT mention any value that is not actually + # accepted. Using a substring check on the literal backtick-quoted + # token avoids false positives from prose elsewhere on the page + # (e.g. ``warn`` appears in unrelated wording). + invalid_examples = ("`block`", "`ignore`") + for bad in invalid_examples: + assert bad not in cli_doc, ( + f"docs/user-guide/cli.md must not advertise {bad} as a leakage_guard value; " + f"actual valid values are {expected}" + ) + for value in expected: + assert f"`{value}`" in cli_doc, ( + f"docs/user-guide/cli.md must advertise `{value}` as a valid leakage_guard value; " + f"current values are {expected}" + ) + + +def test_cli_doc_info_sample_matches_engines_and_selection_methods(): + """Regression for round-2 review: the CLI doc's ``info`` JSON sample + must list the FULL sorted ``SUPPORTED_ENGINES`` and + ``SUPPORTED_SELECTION_METHODS`` sets, not a truncated subset. + + Round-2 specifically caught that the sample omitted ``llm`` from + ``supported_engines`` and ``chi2`` / ``f_test`` / ``xgboost`` from + ``supported_selection_methods``. Even with a "truncated" caveat the + arrays look complete and steered users away from valid CLI choices, + so we now pin the entire sorted list against + ``AutoFeatureEngineer.SUPPORTED_*`` so future drift in either + direction (adding a new engine, renaming a method) fails fast. + """ + from featcopilot.transformers.sklearn_compat import AutoFeatureEngineer + + cli_doc = (Path(__file__).resolve().parent.parent / "docs" / "user-guide" / "cli.md").read_text(encoding="utf-8") + + expected_engines = sorted(AutoFeatureEngineer.SUPPORTED_ENGINES) + engines_fragment = '"supported_engines": ' + json.dumps(expected_engines) + assert engines_fragment in cli_doc, ( + f"docs/user-guide/cli.md ``info`` sample must show {engines_fragment!r}; " + "the JSON sample drifted away from AutoFeatureEngineer.SUPPORTED_ENGINES" + ) + + expected_methods = sorted(AutoFeatureEngineer.SUPPORTED_SELECTION_METHODS) + # The methods array is multi-line in the doc (one entry per line) + # for readability, so check that each value is present as a + # JSON-quoted string in the sample. We anchor the search inside the + # ``"supported_selection_methods": [`` block to avoid false + # positives from prose like "``mutual_info``". + sm_marker = '"supported_selection_methods": [' + sm_idx = cli_doc.find(sm_marker) + assert sm_idx != -1, "docs/user-guide/cli.md must contain a supported_selection_methods JSON sample block" + sm_block_end = cli_doc.find("]", sm_idx) + sm_block = cli_doc[sm_idx:sm_block_end] + for method in expected_methods: + assert f'"{method}"' in sm_block, ( + f"docs/user-guide/cli.md ``info`` sample must list {method!r} in supported_selection_methods; " + f"current set is {expected_methods}" + ) + + def test_top_level_version_flag(capsys): # ``--version`` (argparse action) prints to stdout; main() now traps the # SystemExit and returns the code so the API contract is consistent. diff --git a/tests/test_core.py b/tests/test_core.py index cbee907..a4c4b59 100644 --- a/tests/test_core.py +++ b/tests/test_core.py @@ -60,6 +60,157 @@ def test_feature_compute(self): assert list(result) == [1, 4, 9, 16] + def test_feature_compute_uses_safe_builtins(self): + """Regression: ``Feature.compute`` previously called + ``exec(code, {"__builtins__": {}}, locals)`` which broke any + snippet that used a Python builtin (``len``, ``range``, ``int``, + ``sum``, ...). The fix exposes a curated set of safe builtins + identical to ``TransformRule._get_safe_builtins`` so common + idioms work without giving the snippet unrestricted access. + """ + df = pd.DataFrame({"col1": [10, 20, 30, 40, 50]}) + + # ``len`` (most common builtin used in a feature) — would NameError + # before the fix. + f_len = Feature(name="row_count", code="result = pd.Series([len(df)] * len(df))") + result = f_len.compute(df) + assert list(result) == [5, 5, 5, 5, 5] + + # ``range`` + ``sum`` + numeric builtins — exercises a broader + # subset of the whitelist in one snippet. + f_range = Feature( + name="cumulative_index_sum", + code="result = pd.Series([sum(range(int(v))) for v in df['col1']])", + ) + result = f_range.compute(df) + # sum(range(10))=45, sum(range(20))=190, sum(range(30))=435, + # sum(range(40))=780, sum(range(50))=1225 + assert list(result) == [45, 190, 435, 780, 1225] + + # ``abs`` + ``round`` + ``min`` / ``max`` — ensure the + # whitelist's full set is exposed. + f_clip = Feature( + name="clipped", + code="result = pd.Series([round(min(max(abs(v - 25), 0), 20), 2) for v in df['col1']])", + ) + result = f_clip.compute(df) + assert list(result) == [15, 5, 5, 15, 20] + + def test_feature_compute_resolves_df_inside_comprehension(self): + """Regression for round-2 review: when a snippet uses a + comprehension or lambda whose body references ``df``, ``np`` or + ``pd``, Python resolves those free variables against the + enclosing function's *globals*, not the caller's locals. With + a previous implementation that passed ``df`` only via the + ``locals`` argument to ``exec`` (and had ``globals`` set to + only ``__builtins__``), the inner reference would fail with + ``NameError`` and the call would either crash or be silently + dropped by ``FeatureSet.compute_all``. The fix uses a single + shared namespace for both ``globals`` and ``locals``. + """ + df = pd.DataFrame({"col1": [1, 2, 3, 4, 5]}) + + # List comprehension whose body references ``df`` and uses + # ``range`` / ``len`` builtins — this is the exact pattern + # called out in the round-2 review. + f_compre = Feature( + name="from_comprehension", + code="result = pd.Series([df['col1'].iloc[i] * 2 for i in range(len(df))])", + ) + result = f_compre.compute(df) + assert list(result) == [2, 4, 6, 8, 10] + + # Lambda whose body references ``df`` and ``pd`` — same + # globals-resolution rule applies. + f_lambda = Feature( + name="from_lambda", + code=( + "fn = lambda i: int(df['col1'].iloc[i]) ** 2\n" "result = pd.Series([fn(i) for i in range(len(df))])" + ), + ) + result = f_lambda.compute(df) + assert list(result) == [1, 4, 9, 16, 25] + + # Generator expression piped into a builtin — exercises the + # iterator-protocol path through the same namespace. + f_gen = Feature( + name="from_generator", + code="result = pd.Series([sum(v for v in df['col1'] if v <= n) for n in df['col1']])", + ) + result = f_gen.compute(df) + # cumulative sums for thresholds 1, 2, 3, 4, 5: 1, 3, 6, 10, 15 + assert list(result) == [1, 3, 6, 10, 15] + + def test_feature_compute_isolates_safe_builtins_across_calls(self): + """Regression: ``_SAFE_BUILTINS`` must NOT be passed to ``exec`` + by reference. If it were, a snippet that rebinds entries in its + own ``__builtins__`` view (e.g. ``__builtins__['len'] = lambda + x: -999``) would leak that mutation into every subsequent + ``Feature.compute`` call in the same process, breaking unrelated + features. The fix passes ``dict(_SAFE_BUILTINS)`` (a fresh + shallow copy) per call so that side effects are local to that + single ``exec``. + + Round-2 review note: an earlier version of this test started + the attacker snippet with ``import builtins``, but ``__import__`` + is intentionally NOT in the safe-builtins whitelist, so the + snippet raised at the very first line and the surrounding + ``try/except`` swallowed the failure ─ meaning the test would + pass even if ``_SAFE_BUILTINS`` were still being shared by + reference. This rewrite mutates ``__builtins__`` directly via + the dict that ``exec`` sees as the snippet's builtins source, + AND asserts inside the snippet (via ``len(df) == -999``) that + the rebinding actually took effect. That dual assertion is + what makes the test a real regression check. + """ + df = pd.DataFrame({"col1": [1, 2, 3]}) + + # Attacker snippet: rebind ``len`` in this call's __builtins__ + # dict. The trailing ``result = pd.Series([len(df)])`` would + # yield 3 if the rebinding had failed, and -999 if the + # rebinding succeeded. That's the in-snippet proof that + # mutation IS possible within a single exec ─ which is what + # makes the per-call-copy isolation contract meaningful. + f_attacker = Feature( + name="poisoner", + code=("__builtins__['len'] = lambda x: -999\nresult = pd.Series([len(df)])\n"), + ) + poisoned = f_attacker.compute(df) + assert list(poisoned) == [-999], ( + "Attacker snippet failed to mutate __builtins__: the regression test " + "is no longer exercising the isolation path it claims to protect" + ) + + # A clean, well-behaved feature run AFTER the attacker must + # see the original ``len`` builtin, not the poisoned version. + f_victim = Feature( + name="clean_user", + code="result = pd.Series([len(df)] * len(df))", + ) + clean = f_victim.compute(df) + assert list(clean) == [3, 3, 3], ( + "Per-call dict copy of _SAFE_BUILTINS is broken: a previous " "feature's mutation leaked into a later call" + ) + + def test_feature_compute_distinguishes_missing_code_from_missing_result(self): + """Regression: the ``ValueError`` raised by ``Feature.compute`` + must use DIFFERENT messages for the two distinct failure + modes (``self.code`` empty / missing vs. snippet ran but did + not bind ``result``). Previously both paths surfaced the + misleading ``"No code defined ..."`` message even when the + code was clearly present, making debugging harder. + """ + # No code at all → "No code defined" branch. + f_empty = Feature(name="no_code") + with pytest.raises(ValueError, match="No code defined for feature no_code"): + f_empty.compute(pd.DataFrame({"x": [1]})) + + # Code present but does not produce ``result`` → distinct + # message that points at the actual cause. + f_no_result = Feature(name="no_result", code="x = df['col1'] * 2") + with pytest.raises(ValueError, match=r"Feature 'no_result' code did not produce a 'result' variable"): + f_no_result.compute(pd.DataFrame({"col1": [1, 2, 3]})) + class TestFeatureSet: """Tests for FeatureSet class.""" diff --git a/tests/test_llm_modules.py b/tests/test_llm_modules.py index e1c426b..a42a2d8 100644 --- a/tests/test_llm_modules.py +++ b/tests/test_llm_modules.py @@ -2596,3 +2596,52 @@ def test_send_prompt_handles_concurrent_futures_timeout_error(self): result = asyncio.run(client.send_prompt("hi")) assert "timed out" in result.lower() + + +# --------------------------------------------------------------------------- +# featcopilot.llm package public-API consistency +# --------------------------------------------------------------------------- + + +class TestLLMPackageExports: + """Pin the public API of the ``featcopilot.llm`` package. + + Regression: ``SyncCopilotFeatureClient`` was defined in + ``copilot_client.py`` and used by sibling modules (e.g. + ``code_generator.py``, ``explainer.py``) but was missing from + ``featcopilot/llm/__init__.py``'s imports and ``__all__`` while the + sister classes ``SyncOpenAIFeatureClient`` and + ``SyncLiteLLMFeatureClient`` WERE both exported. That made + ``from featcopilot.llm import SyncCopilotFeatureClient`` raise + ``ImportError`` even though the class itself was importable from the + submodule, which is a public-API surface inconsistency callers + legitimately notice. + """ + + def test_sync_clients_are_all_exported(self): + """All three sync LLM clients must be importable directly from + the ``featcopilot.llm`` package.""" + from featcopilot.llm import ( + SyncCopilotFeatureClient, + SyncLiteLLMFeatureClient, + SyncOpenAIFeatureClient, + ) + + # Identity check: the names exposed by the package must be the + # same objects defined in the underlying submodules. + from featcopilot.llm.copilot_client import SyncCopilotFeatureClient as _SCC + from featcopilot.llm.litellm_client import SyncLiteLLMFeatureClient as _SLC + from featcopilot.llm.openai_client import SyncOpenAIFeatureClient as _SOC + + assert SyncCopilotFeatureClient is _SCC + assert SyncLiteLLMFeatureClient is _SLC + assert SyncOpenAIFeatureClient is _SOC + + def test_sync_clients_are_listed_in_dunder_all(self): + """``__all__`` must mirror the imports so ``from featcopilot.llm + import *`` is consistent with explicit imports.""" + import featcopilot.llm as llm_pkg + + assert "SyncCopilotFeatureClient" in llm_pkg.__all__ + assert "SyncLiteLLMFeatureClient" in llm_pkg.__all__ + assert "SyncOpenAIFeatureClient" in llm_pkg.__all__ diff --git a/tests/test_utils.py b/tests/test_utils.py index 75731d3..9996895 100644 --- a/tests/test_utils.py +++ b/tests/test_utils.py @@ -204,6 +204,81 @@ def test_list_keys_includes_disk(self): cache._memory_cache.clear() assert "disk_key" in cache.list_keys() + def test_corrupted_cache_read_logs_warning(self): + """Regression: ``FeatureCache.get`` previously had a bare + ``except Exception: pass`` that silently swallowed ``pickle`` + errors when the on-disk cache entry was corrupted (truncated + write, version mismatch, partial flush). The fix logs a + warning via ``featcopilot.utils.cache``'s module logger so + users can diagnose disk issues. This test corrupts the on-disk + ``.pkl`` and asserts ``get()`` returns ``None`` AND emits a + warning containing the cache path. + """ + from pathlib import Path + + with tempfile.TemporaryDirectory() as tmpdir: + cache = FeatureCache(cache_dir=tmpdir) + cache.set("victim", {"value": 42}) + + # Find the persisted file and corrupt it. + pkl_path = next(Path(tmpdir).glob("*.pkl")) + pkl_path.write_bytes(b"\x00\x01\x02not-a-valid-pickle-stream") + + # Drop the in-memory entry so ``get`` falls through to disk. + cache._memory_cache.clear() + + with patch("featcopilot.utils.cache.logger") as mock_logger: + value = cache.get("victim") + assert value is None + assert mock_logger.warning.called, "FeatureCache must log on read failure" + msg = mock_logger.warning.call_args[0][0] + assert "failed to read cache entry" in msg + + def test_unpicklable_value_logs_warning_on_set(self): + """Regression: ``FeatureCache.set`` previously had a bare + ``except Exception: pass`` that silently swallowed + ``pickle.PicklingError`` when the value could not be persisted + (e.g. lambda / open file handle). The fix logs a warning so + operational issues with the cache directory surface in + observability. + """ + with tempfile.TemporaryDirectory() as tmpdir: + cache = FeatureCache(cache_dir=tmpdir) + + # Lambdas cannot be pickled by the default protocol. + unpicklable = lambda x: x # noqa: E731 + + with patch("featcopilot.utils.cache.logger") as mock_logger: + cache.set("bad", unpicklable) + assert mock_logger.warning.called, "FeatureCache must log on persist failure" + msg = mock_logger.warning.call_args[0][0] + assert "failed to persist cache entry" in msg + + def test_corrupted_metadata_read_logs_warning(self): + """Regression: ``FeatureCache.get_metadata`` previously had a + bare ``except Exception: pass`` that hid corrupted JSON + metadata files. The fix logs a warning. + """ + from pathlib import Path + + with tempfile.TemporaryDirectory() as tmpdir: + cache = FeatureCache(cache_dir=tmpdir) + cache.set("k", "v", metadata={"src": "tabular"}) + + # Corrupt the metadata JSON. + meta_path = next(Path(tmpdir).glob("*.meta.json")) + meta_path.write_text("{not valid json") + + # Force re-read from disk by clearing the in-memory metadata cache. + cache._metadata.clear() + + with patch("featcopilot.utils.cache.logger") as mock_logger: + md = cache.get_metadata("k") + assert md is None + assert mock_logger.warning.called, "FeatureCache must log on metadata read failure" + msg = mock_logger.warning.call_args[0][0] + assert "failed to read cache metadata" in msg + class TestFeatureCacheDataHash: """Tests for FeatureCache with data_hash parameter."""