
Commit 07b4cf4

test: add staged pytest integration suite (#182)
* test: add staged pytest integration suite
* fix: clean up verbose integration logging
* chore: bump version to 1.3.5
* fix: make staged integration framework-aware on resume
* chore: clean catboost artifacts in staged integration runner
* chore: address review feedback on staged integration runner
1 parent 89f8083 commit 07b4cf4

20 files changed

Lines changed: 843 additions & 7 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
```diff
@@ -174,6 +174,7 @@ cython_debug/
 
 # Working directory for model generation
 workdir/
+catboost_info/
 
 # Files generated by running the MLE Bench script
 mle-bench-config.yaml
```

AGENTS.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -81,6 +81,8 @@ make build-databricks # Databricks Connect
 
 # Run tests
 poetry run pytest tests/unit/
+make test-integration          # Staged pytest integration suite (seed -> search -> eval)
+make test-integration-verbose  # Same suite with live test logs in terminal
 
 # Format and lint
 poetry run black .
```

CONTRIBUTING.md

Lines changed: 15 additions & 0 deletions
````diff
@@ -91,6 +91,21 @@ To set up the development environment:
    poetry run pytest
    ```
 
+4. **Run staged integration tests before opening a PR**:
+
+   ```bash
+   # Requires ANTHROPIC_API_KEY and local Spark/Java setup
+   bash scripts/tests/run_integration_staged.sh
+   ```
+
+   The staged suite runs three pytest phases with hard barriers:
+   - `integration_seed`: builds reusable checkpoints through phase 3
+   - `integration_search`: resumes from seeds and runs model search
+   - `integration_eval`: resumes from search checkpoints, runs evaluation, and validates predictor inference
+
+   This `tests/integration` suite is the primary pre-PR integration workflow.
+   Makefile Docker targets remain optional/manual end-to-end checks.
+
 Ensure all tests pass before making contributions.
 
 ## Style Guides
````
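The hard-barrier behaviour described above — a later stage runs only if every earlier stage passed — can be sketched in a few lines of Python (names like `STAGES`, `stage_command`, and `run_staged` are illustrative, not part of the repo):

```python
import subprocess
import sys

# The three stage markers, in dependency order.
STAGES = ["integration_seed", "integration_search", "integration_eval"]


def stage_command(marker: str, path: str = "tests/integration") -> list[str]:
    """Build the pytest invocation for one stage."""
    return ["poetry", "run", "pytest", path, "-m", marker, "--maxfail=1"]


def run_staged() -> None:
    """Run each stage as a hard barrier: stop at the first failure."""
    for marker in STAGES:
        result = subprocess.run(stage_command(marker))
        if result.returncode != 0:
            sys.exit(result.returncode)
```

The real entry point is `scripts/tests/run_integration_staged.sh`, which adds worker selection and verbose-logging options on top of this core loop.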

Makefile

Lines changed: 26 additions & 0 deletions
```diff
@@ -4,6 +4,8 @@
 # Quick reference for developers:
 #   make help                      Show all available commands
 #   make test-quick                Fast test (~30s, 1 iteration)
+#   make test-integration          Staged pytest integration suite
+#   make test-integration-verbose  Staged suite with live logs
 #   make test-xgboost              Test XGBoost only
 #   make test-catboost             Test CatBoost only
 #   make test-all-models           Test all model types
@@ -36,6 +38,8 @@ help:
 	@echo "  make test-lightgbm    Test LightGBM model type"
 	@echo "  make test-pytorch     Test PyTorch model type"
 	@echo "  make test-keras       Test Keras model type"
+	@echo "  make test-integration          Run staged pytest integration suite"
+	@echo "  make test-integration-verbose  Run staged suite with live logs"
 	@echo "  make test-all-models  Test all model types (sequential)"
 	@echo "  make test-full        Full test run (3 iterations + evaluation)"
 	@echo ""
@@ -61,6 +65,28 @@ help:
 # Quick Development Tests
 # ============================================
 
+# Staged pytest-native integration suite (seed -> search -> eval).
+# Optional: make test-integration INTEGRATION_RUN_ID=my_run_id
+.PHONY: test-integration
+test-integration:
+	@echo "🧪 Running staged pytest integration suite..."
+	@if [ -n "$(INTEGRATION_RUN_ID)" ]; then \
+		echo "Using integration run id: $(INTEGRATION_RUN_ID)"; \
+		PLEXE_IT_RUN_ID="$(INTEGRATION_RUN_ID)" bash scripts/tests/run_integration_staged.sh; \
+	else \
+		bash scripts/tests/run_integration_staged.sh; \
+	fi
+
+.PHONY: test-integration-verbose
+test-integration-verbose:
+	@echo "🧪 Running staged pytest integration suite (verbose)..."
+	@if [ -n "$(INTEGRATION_RUN_ID)" ]; then \
+		echo "Using integration run id: $(INTEGRATION_RUN_ID)"; \
+		PLEXE_IT_RUN_ID="$(INTEGRATION_RUN_ID)" PLEXE_IT_VERBOSE=1 bash scripts/tests/run_integration_staged.sh; \
+	else \
+		PLEXE_IT_VERBOSE=1 bash scripts/tests/run_integration_staged.sh; \
+	fi
+
 # Fast sanity check - 1 iteration, minimal config
 .PHONY: test-quick
 test-quick: build
```

plexe/config.py

Lines changed: 3 additions & 0 deletions
```diff
@@ -562,6 +562,9 @@ def setup_logging(config: Config) -> logging.Logger:
     # Get package root logger
     package_logger = logging.getLogger("plexe")
     package_logger.setLevel(getattr(logging, config.log_level.upper()))
+    # Avoid duplicate output when external handlers (e.g., pytest live logging)
+    # are attached to the root logger.
+    package_logger.propagate = False
 
     # Clear existing handlers to avoid duplicates
     package_logger.handlers = []
```
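For context on why this one-line change works: a logger with `propagate = True` (the default) hands every record to its ancestors' handlers in addition to its own, regardless of the levels set on the ancestor loggers. A minimal, self-contained illustration (the `demo_pkg` logger name is arbitrary):

```python
import logging


class CountingHandler(logging.Handler):
    """Handler that simply counts emitted records."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def emit(self, record):
        self.count += 1


root_handler = CountingHandler()
pkg_handler = CountingHandler()

logging.getLogger().addHandler(root_handler)  # stands in for pytest live logging
pkg = logging.getLogger("demo_pkg")
pkg.addHandler(pkg_handler)
pkg.setLevel(logging.INFO)

pkg.info("hello")        # propagate=True (default): both handlers fire
pkg.propagate = False
pkg.info("hello again")  # now only the package handler fires

# root_handler.count == 1, pkg_handler.count == 2
```

This is why attaching handlers to both `plexe` and the root logger used to print each line twice under pytest's live logging.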

plexe/execution/dataproc/session.py

Lines changed: 2 additions & 2 deletions
```diff
@@ -109,8 +109,8 @@ def _create_local_spark(config) -> SparkSession:
         logger.info("Using pre-bundled Spark JARs from /opt/spark-jars/")
         builder = builder.config("spark.jars", spark_jars_env)
     else:
-        # Fallback: Download JARs at runtime via Maven (local development)
-        logger.info("Downloading Spark JARs from Maven Central (first run may take ~40s)")
+        # Fallback: Resolve JARs via Maven (download occurs only on cache miss)
+        logger.info("Resolving Spark JARs via Maven Central (download only on first run/cache miss)")
         builder = builder.config(
             "spark.jars.packages",
             "org.apache.hadoop:hadoop-aws:3.3.6,com.amazonaws:aws-java-sdk-bundle:1.12.367",
```

plexe/workflow.py

Lines changed: 29 additions & 0 deletions
```diff
@@ -76,6 +76,33 @@
 # ============================================
 
 
+def _apply_allowed_model_types_on_resume(context: BuildContext, config: Config, start_phase: int) -> None:
+    """Restrict checkpoint-resumed model types to config.allowed_model_types when provided."""
+    if start_phase <= 1 or not config.allowed_model_types:
+        return
+
+    allowed_types = list(dict.fromkeys(config.allowed_model_types))
+    if not context.viable_model_types:
+        context.viable_model_types = allowed_types
+        logger.info(f"Checkpoint missing viable model types; using allowed model types: {allowed_types}")
+        return
+
+    filtered_model_types = [m for m in context.viable_model_types if m in allowed_types]
+    if not filtered_model_types:
+        raise ValueError(
+            "No model types remain after applying allowed_model_types on resume: "
+            f"checkpoint={context.viable_model_types}, allowed={allowed_types}"
+        )
+
+    if filtered_model_types != context.viable_model_types:
+        logger.info(
+            "Restricting resumed model types from checkpoint %s to %s",
+            context.viable_model_types,
+            filtered_model_types,
+        )
+        context.viable_model_types = filtered_model_types
+
+
 def build_model(
     spark: SparkSession,
     train_dataset_uri: str,
@@ -182,6 +209,8 @@ def build_model(
         context.scratch["_user_feedback"] = user_feedback
         logger.info("📝 User feedback injected - agents will incorporate guidance into their work")
 
+    _apply_allowed_model_types_on_resume(context, config, start_phase)
+
     # Wrap entire workflow in top-level trace span
     with tracer.start_as_current_span("ModelBuilder") as root_span:
         root_span.set_attribute("experiment_id", experiment_id)
```
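Stripped of the `BuildContext`/`Config` plumbing, the resume-time restriction is a small pure function; a hypothetical standalone sketch of the same semantics:

```python
def filter_resumed_model_types(checkpoint_types: list[str], allowed_types: list[str]) -> list[str]:
    """Mirror of _apply_allowed_model_types_on_resume, minus context/config plumbing."""
    allowed = list(dict.fromkeys(allowed_types))  # dedupe while preserving order
    if not checkpoint_types:
        return allowed                            # checkpoint had none: fall back to allowed
    filtered = [m for m in checkpoint_types if m in allowed]
    if not filtered:
        # Disjoint sets are a configuration error, not something to silently ignore.
        raise ValueError("No model types remain after applying allowed_model_types on resume")
    return filtered
```

For example, a checkpoint holding `["xgboost", "catboost"]` resumed under `allowed_model_types=["catboost", "keras"]` is narrowed to `["catboost"]`, while a fully disjoint combination raises.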

poetry.lock

Lines changed: 37 additions & 1 deletion

pyproject.toml

Lines changed: 9 additions & 1 deletion
```diff
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "plexe"
-version = "1.3.4"
+version = "1.3.5"
 description = "An agentic framework for building ML models from natural language"
 authors = [
     "Marcello De Bernardi <mdebernardi@plexe.ai>",
@@ -84,13 +84,21 @@ vision = ["torch"]
 
 [tool.poetry.group.dev.dependencies]
 pytest = "^8.3.4"
+pytest-xdist = "^3.8.0"
 pre-commit = "^4.0.1"
 ruff = "^0.14.9"
 black = ">=23.0.0"
 streamlit = ">=1.52.1,<2.0.0"
 plotly = ">=6.5.0,<7.0.0"
 boto3 = "^1.42.44"
 
+[tool.pytest.ini_options]
+markers = [
+    "integration_seed: stage 1 integration tests that build reusable checkpoints through phase 3",
+    "integration_search: stage 2 integration tests that resume from seeds and pause after phase 4",
+    "integration_eval: stage 3 integration tests that resume from search checkpoints and run evaluation + packaging",
+]
+
 [tool.semantic_release]
 version_variables = ["pyproject.toml:version"]
 commit_parser = "angular"
```
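Registering the markers under `[tool.pytest.ini_options]` silences pytest's unknown-mark warnings and lets test modules opt into a stage with standard decorators; a hypothetical test module might look like:

```python
import pytest


@pytest.mark.integration_seed
def test_build_seed_checkpoint():
    """Stage 1: would build a reusable checkpoint through phase 3."""
    assert True  # placeholder body


@pytest.mark.integration_eval
def test_predictor_inference():
    """Stage 3: would validate predictor inference from a search checkpoint."""
    assert True  # placeholder body
```

Selection then happens per stage with `pytest -m integration_seed` (and so on), which is exactly how the staged runner invokes each barrier.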
scripts/tests/run_integration_staged.sh (new file)

Lines changed: 87 additions & 0 deletions

```bash
#!/usr/bin/env bash
set -euo pipefail

ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/../.." && pwd)"
cd "$ROOT_DIR"

CATBOOST_INFO_DIR="$ROOT_DIR/catboost_info"

cleanup_catboost_info() {
  if [[ "${PLEXE_IT_KEEP_CATBOOST_INFO:-0}" == "1" ]]; then
    return
  fi
  rm -rf "$CATBOOST_INFO_DIR"
}

# Remove stale CatBoost local artifacts from previous runs.
cleanup_catboost_info
# Keep repo clean even if a stage fails midway.
trap cleanup_catboost_info EXIT

if [[ -z "${PLEXE_IT_RUN_ID:-}" ]]; then
  PLEXE_IT_RUN_ID="$(date +%Y%m%d_%H%M%S)"
fi
export PLEXE_IT_RUN_ID

ARTIFACT_ROOT="$ROOT_DIR/.pytest_cache/integration/$PLEXE_IT_RUN_ID"
mkdir -p "$ARTIFACT_ROOT"

if ! poetry run python -c "import importlib.util,sys; sys.exit(0 if importlib.util.find_spec('xdist') else 1)"; then
  echo "ERROR: pytest-xdist is required for staged integration tests."
  echo "Install dependencies with: poetry install"
  echo "Then verify with: poetry run pytest --help | grep -E '(^| )-n( |$)'"
  exit 2
fi

if [[ -n "${PLEXE_IT_WORKERS:-}" ]]; then
  WORKERS="${PLEXE_IT_WORKERS}"
elif [[ "${PLEXE_IT_VERBOSE:-0}" == "1" ]]; then
  # In verbose mode, default to main-process execution for reliable live logs.
  WORKERS="0"
else
  WORKERS="auto"
fi
PYTEST_PARALLEL_ARGS=(-n "$WORKERS")
PYTEST_LOG_DISABLE_ARGS=(
  --log-disable=LiteLLM
  --log-disable=litellm
  --log-disable=httpx
  --log-disable=httpcore
  --log-disable=urllib3
  --log-disable=py4j
  --log-disable=py4j.clientserver
  --log-disable=py4j.java_gateway
)

run_stage() {
  local marker="$1"
  local cmd=(poetry run pytest tests/integration -m "$marker" "${PYTEST_PARALLEL_ARGS[@]}" --maxfail=1)

  if [[ "${PLEXE_IT_VERBOSE:-0}" == "1" ]]; then
    cmd+=(-s -vv -o log_cli=true -o log_cli_level=INFO --capture=tee-sys "${PYTEST_LOG_DISABLE_ARGS[@]}")
  fi

  "${cmd[@]}"
}

echo "Running staged integration tests with run id: $PLEXE_IT_RUN_ID"
echo "Artifacts: $ARTIFACT_ROOT"
echo "Workers: $WORKERS"
if [[ "${PLEXE_IT_VERBOSE:-0}" == "1" ]]; then
  echo "Verbose mode: enabled (live logs and test output)"
fi

echo ""
echo "Stage 1/3: building reusable seeds through phase 3"
run_stage "integration_seed"

echo ""
echo "Stage 2/3: resuming from seeds through phase 4"
run_stage "integration_search"

echo ""
echo "Stage 3/3: final evaluation, packaging, and predictor checks"
run_stage "integration_eval"

echo ""
echo "Staged integration suite completed successfully."
```
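The pytest-xdist guard near the top of the script relies on `importlib.util.find_spec`, which reports whether a module is importable without actually importing it; the same check in plain Python (the helper name is illustrative):

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be imported, without importing it."""
    return importlib.util.find_spec(name) is not None


# The script exits with status 2 when this is False for "xdist".
print(module_available("json"))                      # stdlib module, always importable
print(module_available("no_such_module_hopefully"))  # missing top-level module
```

Checking the spec instead of running `import xdist` keeps the guard cheap and avoids triggering any import-time side effects.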
