By the end of Session 8, you should be able to:
- explain what Apache Spark is and why PySpark is useful
- run PySpark in Google Colab without installing Spark locally
- create and inspect Spark DataFrames
- transform data with `select`, `filter`, `withColumn`, and `groupBy`
- run SQL-style queries with Spark SQL
- install and test PySpark locally if you want a local development setup

The session has three parts plus homework:
- Part 1: PySpark in Google Colab
- Part 2: SQL-style analytics with PySpark
- Part 3: Install Spark locally (optional)
- Homework

Suggested workflow:
- Practice with the quizzes when you are ready.
- Write your own work in `solutions/`.
- Review the reference solutions only after attempting the tasks yourself.

Take the quizzes with `quizmd`:

```bash
quizmd quizzes/python-session-08-part-01-quiz.md
quizmd quizzes/python-session-08-part-02-quiz.md
quizmd quizzes/python-session-08-part-03-quiz.md
quizmd quizzes/python-session-08-homework-quiz.md
```

- Parts 1 and 2 are designed for Google Colab.
- Part 3 is optional for students who want to run PySpark locally.
- Tutorial and warm-up material is included directly in each part's markdown file.
- Keep your own solutions in separate files inside `solutions/`.
- Use exercise-style names in `solutions/`:
  - `exercise-08-01.py`
  - `exercise-08-02.py`
  - `exercise-08-03.py`
  - `exercise-08-homework.md`
- Reference answers are in `session_solutions/`.
- Install local dependencies with:

```bash
pip install -r requirements.txt
```