UmdTask464_DATA605_Spring2026_Ray_Housing_Price_Prediction#487
Open
Delvitron1019 wants to merge 21 commits into
I have to do this because I am currently using a university-issued laptop that does not support git or terminal use, even for cloning.
…y_Housing_Price_Prediction
- Remove redundant local copy of project_template/ (canonical version is at class_project/project_template/, available via sparse checkout)
- Add .gitignore for Jupyter checkpoints, __pycache__, and OS files
Copies the canonical Docker build/run/jupyter scripts and helper files from class_project/project_template/ into the project directory. Uses Dockerfile.python_slim as the base image to keep the image small.

- Dockerfile (python:3.12-slim base)
- .dockerignore
- bashrc, etc_sudoers (config)
- version.sh (logs installed package versions during build)
- docker_*.sh family (build, bash, clean, cmd, exec, jupyter, push)
- run_jupyter.sh (launches JupyterLab inside the container)
- docker_name.sh: set IMAGE_NAME to the project tag
- requirements.txt: pin ray[default,tune,serve]==2.49.0 and the rest of the ML stack to versions known compatible with Python 3.12

Verified end-to-end: container builds in ~70s, Ray imports cleanly, and ray_utils.load_data() returns the California Housing dataset.
Replaces the 5-line stub with comprehensive documentation:

- What Ray is and the four libraries used (Core, Data, Tune, Serve)
- Project objective and pipeline overview
- File layout with descriptions
- Quick-start build and run instructions
- Notebook tour distinguishing API.ipynb (didactic) from example.ipynb (applied)
- curl example for the deployed REST endpoint
- Results summary and architectural decisions
- References to Ray docs, dataset, and class template

Hits the documentation rubric: installation steps, usage examples, API descriptions, and architectural decisions.
Restructures the applied end-to-end pipeline into 9 narrated sections and fixes four real bugs in the original code:

1. Replaced deprecated 'from ray.air import session' with 'tune.report' (ray.air.session was removed in Ray 2.10+).
2. Pass training data into the Tune trainable explicitly via tune.with_parameters instead of capturing notebook globals; the old pattern would break on any multi-process Ray runtime.
3. Replaced deprecated mean_squared_error(squared=False) with root_mean_squared_error throughout.
4. The Ray Serve deployment used to deploy the *baseline* model from section 4 (a closure over a notebook variable). It now refits a RandomForestRegressor with the best Tune config, persists it via joblib, and HousingModel.__init__ loads model.pkl from disk, so the API is self-contained and uses the actually-tuned model.

Added markdown headers for sections 1-10 explaining what each Ray concept does and why we use it.

Verified end-to-end: notebook runs cleanly inside the Docker image and the deployed endpoint returns predictions.
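The load-from-disk pattern described in fix 4 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the notebook's code: it uses the standard-library `pickle` in place of joblib, a plain dict of linear coefficients in place of the fitted RandomForestRegressor, and omits the `@serve.deployment` decorator entirely. `save_model`, the `predict` method, and the parameter layout are hypothetical; only the `HousingModel` name and the `model.pkl` filename come from the commit message.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(params: dict, path: Path) -> None:
    """Persist fitted parameters to disk (stand-in for joblib.dump)."""
    with path.open("wb") as f:
        pickle.dump(params, f)


class HousingModel:
    """Hypothetical deployment class that loads its model in __init__.

    Because the model is read from disk rather than captured from a
    notebook variable, the class does not depend on notebook state.
    """

    def __init__(self, model_path: Path) -> None:
        with model_path.open("rb") as f:
            self.params = pickle.load(f)

    def predict(self, features: list[float]) -> float:
        # Toy linear model: bias + dot(weights, features).
        weights = self.params["weights"]
        bias = self.params["bias"]
        return bias + sum(w * x for w, x in zip(weights, features))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "model.pkl"
        save_model({"weights": [0.5, 2.0], "bias": 1.0}, path)
        model = HousingModel(path)
        print(model.predict([2.0, 3.0]))
```

The point of the design is that whatever process constructs `HousingModel` only needs the file path, so the served model is whichever one was last written to disk rather than whichever variable happened to be in scope.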
- *.pkl: model.pkl can be hundreds of MB and is regenerated whenever the example notebook runs end-to-end
- .Trash-*/: directories created by JupyterLab when files are deleted from the UI inside the container should not leak into git
Replaces the old housing-pipeline content with a clean, didactic tour of Ray's four core APIs in isolation:

- @ray.remote: task parallelism with a trivial squaring example
- ray.data.from_pandas: wrap a 5-row DataFrame and apply a parallelized transformation via map_batches
- tune.run: minimize a one-parameter parabola over a small grid search to demonstrate the trainable / report / analysis loop
- @serve.deployment: a hello-world endpoint queried via requests.get

Each section uses the smallest possible self-contained example so the notebook reads as documentation rather than as an applied project.

The Serve example explicitly returns a starlette JSONResponse rather than a bare dict; in Ray 2.49 the auto-conversion path can return 500 for dicts depending on deployment context, while JSONResponse is unambiguous.

Verified end-to-end inside the Docker image: every cell runs cleanly and the live HTTP endpoint returns the expected JSON.
The previous version had every markdown special character escaped (\# instead of #, etc.), apparently from a paste through an intermediate that auto-escaped. GitHub couldn't parse the file as markdown and rendered the raw escape sequences. This commit replaces it with clean markdown that GitHub renders properly.
Overview
This project implements a Ray-based housing price prediction pipeline.
Changes
Results
Best configuration:
Ray enabled efficient parallel experimentation and improved model performance.
Status
Project complete.