
UmdTask464_DATA605_Spring2026_Ray_Housing_Price_Prediction#487

Open
Delvitron1019 wants to merge 21 commits into gpsaggese:master from Delvitron1019:UmdTask464_DATA605_Spring2026_Ray_Housing_Price_Prediction

Conversation


Delvitron1019 commented Apr 29, 2026

Overview

This project implements a Ray-based housing price prediction pipeline.

Changes

  • Data loading and preprocessing
  • Exploratory Data Analysis (EDA)
  • Baseline RandomForest model
  • Parallel training using Ray
  • Hyperparameter tuning using Ray Tune

Results

Best configuration:

  • n_estimators: 200
  • max_depth: 20
  • RMSE: ~0.504

Ray enabled efficient parallel experimentation and improved model performance.
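As a sketch of how the best configuration above could be refit and persisted (the later Serve commit describes refitting with the best Tune config and saving via joblib), the flow might look like this. The variable names and the random stand-in data are illustrative, not the project's actual code:

```python
import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Best Tune config reported above
best_config = {"n_estimators": 200, "max_depth": 20}

# Stand-in for the California Housing features (8 columns) and target
rng = np.random.default_rng(0)
X = rng.random((100, 8))
y = rng.random(100)

# Refit a fresh model with the best config and persist it to disk
model = RandomForestRegressor(**best_config, random_state=0)
model.fit(X, y)
joblib.dump(model, "model.pkl")

# A Serve deployment can then load the pickled model at startup
reloaded = joblib.load("model.pkl")
```

Persisting the refit model decouples the serving code from notebook state, which is the point of the self-contained API described in the commits below.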

Status

Project complete.

Delvitron1019 and others added 21 commits April 1, 2026 23:02
I have to do this because I am currently using a university-issued laptop that does not support git or terminal usage, even for cloning.
- Remove redundant local copy of project_template/ (canonical version
  is at class_project/project_template/, available via sparse checkout)
- Add .gitignore for Jupyter checkpoints, __pycache__, and OS files
Copies the canonical Docker build/run/jupyter scripts and helper files
from class_project/project_template/ into the project directory.
Uses Dockerfile.python_slim as the base image to keep the image small.

- Dockerfile (python:3.12-slim base)
- .dockerignore
- bashrc, etc_sudoers (config)
- version.sh (logs installed package versions during build)
- docker_*.sh family (build, bash, clean, cmd, exec, jupyter, push)
- run_jupyter.sh (launches JupyterLab inside the container)
- docker_name.sh: set IMAGE_NAME to the project tag
- requirements.txt: pin ray[default,tune,serve]==2.49.0 and the rest
  of the ML stack to versions known compatible with Python 3.12

Verified end-to-end: container builds in ~70s, Ray imports cleanly,
and ray_utils.load_data() returns the California Housing dataset.
Replaces the 5-line stub with comprehensive documentation:

- What Ray is and the four libraries used (Core, Data, Tune, Serve)
- Project objective and pipeline overview
- File layout with descriptions
- Quick-start build and run instructions
- Notebook tour distinguishing API.ipynb (didactic) from example.ipynb (applied)
- curl example for the deployed REST endpoint
- Results summary and architectural decisions
- References to Ray docs, dataset, and class template

Hits the documentation rubric: installation steps, usage examples,
API descriptions, and architectural decisions.
Restructures the applied end-to-end pipeline into 9 narrated sections
and fixes four real bugs in the original code:

1. Replaced deprecated 'from ray.air import session' with 'tune.report'
   (ray.air.session was removed in Ray 2.10+).
2. Pass training data into the Tune trainable explicitly via
   tune.with_parameters instead of capturing notebook globals; the old
   pattern would break on any multi-process Ray runtime.
3. Replaced deprecated mean_squared_error(squared=False) with
   root_mean_squared_error throughout.
4. The Ray Serve deployment used to deploy the *baseline* model from
   section 4 (a closure over a notebook variable). It now refits a
   RandomForestRegressor with the best Tune config, persists it via
   joblib, and HousingModel.__init__ loads model.pkl from disk — so
   the API is self-contained and uses the actually-tuned model.

Added markdown headers for sections 1-10 explaining what each Ray
concept does and why we use it. Verified end-to-end: notebook runs
cleanly inside the Docker image and the deployed endpoint returns
predictions.
- *.pkl: model.pkl can be hundreds of MB and is regenerated whenever
  the example notebook runs end-to-end
- .Trash-*/: directories created by JupyterLab when files are deleted
  from the UI inside the container should not leak into git
Replaces the old housing-pipeline content with a clean, didactic tour
of Ray's four core APIs in isolation:

- @ray.remote — task parallelism with a trivial squaring example
- ray.data.from_pandas — wrap a 5-row DataFrame and apply a
  parallelized transformation via map_batches
- tune.run — minimize a one-parameter parabola over a small grid
  search to demonstrate the trainable / report / analysis loop
- @serve.deployment — a hello-world endpoint queried via requests.get

Each section uses the smallest possible self-contained example so the
notebook reads as documentation rather than as an applied project.

The Serve example explicitly returns a starlette JSONResponse rather
than a bare dict; in Ray 2.49 the auto-conversion path can return 500
for dicts depending on deployment context, while JSONResponse is
unambiguous.

Verified end-to-end inside the Docker image: every cell runs cleanly
and the live HTTP endpoint returns the expected JSON.
The previous version had every markdown special character escaped
(\# instead of #, etc.), apparently from a paste through an
intermediate that auto-escaped. GitHub couldn't parse the file as
markdown and rendered the raw escape sequences. This commit replaces
it with clean markdown that GitHub renders properly.
