
UmdTask464_DATA605_Spring2026_Ray_Housing_Price_Prediction#487

Open
Delvitron1019 wants to merge 21 commits into gpsaggese:master from Delvitron1019:UmdTask464_DATA605_Spring2026_Ray_Housing_Price_Prediction

Conversation


Delvitron1019 commented Apr 29, 2026

Overview

This project implements a Ray-based housing price prediction pipeline.

Changes

  • Data loading and preprocessing
  • Exploratory Data Analysis (EDA)
  • Baseline RandomForest model
  • Parallel training using Ray
  • Hyperparameter tuning using Ray Tune

Results

Best configuration:

  • n_estimators: 200
  • max_depth: 20
  • RMSE: ~0.504

Ray enabled efficient parallel experimentation and improved model performance.
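As a sketch of how the best configuration above could be refit and persisted (the later Serve commit describes refitting with the best Tune config and saving via joblib), the flow might look like this. The variable names and the random stand-in data are illustrative, not the project's actual code:

```python
import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Best Tune config reported above
best_config = {"n_estimators": 200, "max_depth": 20}

# Stand-in for the California Housing features (8 columns) and target
rng = np.random.default_rng(0)
X = rng.random((100, 8))
y = rng.random(100)

# Refit a fresh model with the best config and persist it to disk
model = RandomForestRegressor(**best_config, random_state=0)
model.fit(X, y)
joblib.dump(model, "model.pkl")

# A Serve deployment can then load the pickled model at startup
reloaded = joblib.load("model.pkl")
```

Persisting the refit model decouples the serving code from notebook state, which is the point of the self-contained API described in the commits below.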

Status

Project complete.

Delvitron1019 and others added 21 commits April 1, 2026 23:02
I have to do this because I am currently using a university-issued laptop that does not support git or terminal usage, even for cloning.
- Remove redundant local copy of project_template/ (canonical version
  is at class_project/project_template/, available via sparse checkout)
- Add .gitignore for Jupyter checkpoints, __pycache__, and OS files
Copies the canonical Docker build/run/jupyter scripts and helper files
from class_project/project_template/ into the project directory.
Uses Dockerfile.python_slim as the base image to keep the image small.

- Dockerfile (python:3.12-slim base)
- .dockerignore
- bashrc, etc_sudoers (config)
- version.sh (logs installed package versions during build)
- docker_*.sh family (build, bash, clean, cmd, exec, jupyter, push)
- run_jupyter.sh (launches JupyterLab inside the container)
- docker_name.sh: set IMAGE_NAME to the project tag
- requirements.txt: pin ray[default,tune,serve]==2.49.0 and the rest
  of the ML stack to versions known compatible with Python 3.12

Verified end-to-end: container builds in ~70s, Ray imports cleanly,
and ray_utils.load_data() returns the California Housing dataset.
Replaces the 5-line stub with comprehensive documentation:

- What Ray is and the four libraries used (Core, Data, Tune, Serve)
- Project objective and pipeline overview
- File layout with descriptions
- Quick-start build and run instructions
- Notebook tour distinguishing API.ipynb (didactic) from example.ipynb (applied)
- curl example for the deployed REST endpoint
- Results summary and architectural decisions
- References to Ray docs, dataset, and class template

Hits the documentation rubric: installation steps, usage examples,
API descriptions, and architectural decisions.
Restructures the applied end-to-end pipeline into 9 narrated sections
and fixes four real bugs in the original code:

1. Replaced deprecated 'from ray.air import session' with 'tune.report'
   (ray.air.session was removed in Ray 2.10+).
2. Pass training data into the Tune trainable explicitly via
   tune.with_parameters instead of capturing notebook globals; the old
   pattern would break on any multi-process Ray runtime.
3. Replaced deprecated mean_squared_error(squared=False) with
   root_mean_squared_error throughout.
4. The Ray Serve deployment used to deploy the *baseline* model from
   section 4 (a closure over a notebook variable). It now refits a
   RandomForestRegressor with the best Tune config, persists it via
   joblib, and HousingModel.__init__ loads model.pkl from disk — so
   the API is self-contained and uses the actually-tuned model.

Added markdown headers for sections 1-10 explaining what each Ray
concept does and why we use it. Verified end-to-end: notebook runs
cleanly inside the Docker image and the deployed endpoint returns
predictions.
- *.pkl: model.pkl can be hundreds of MB and is regenerated whenever
  the example notebook runs end-to-end
- .Trash-*/: directories created by JupyterLab when files are deleted
  from the UI inside the container should not leak into git
Replaces the old housing-pipeline content with a clean, didactic tour
of Ray's four core APIs in isolation:

- @ray.remote — task parallelism with a trivial squaring example
- ray.data.from_pandas — wrap a 5-row DataFrame and apply a
  parallelized transformation via map_batches
- tune.run — minimize a one-parameter parabola over a small grid
  search to demonstrate the trainable / report / analysis loop
- @serve.deployment — a hello-world endpoint queried via requests.get

Each section uses the smallest possible self-contained example so the
notebook reads as documentation rather than as an applied project.

The Serve example explicitly returns a starlette JSONResponse rather
than a bare dict; in Ray 2.49 the auto-conversion path can return 500
for dicts depending on deployment context, while JSONResponse is
unambiguous.

Verified end-to-end inside the Docker image: every cell runs cleanly
and the live HTTP endpoint returns the expected JSON.
The previous version had every markdown special character escaped
(\# instead of #, etc.), apparently from a paste through an
intermediate that auto-escaped. GitHub couldn't parse the file as
markdown and rendered the raw escape sequences. This commit replaces
it with clean markdown that GitHub renders properly.
