
Welcome to Session 9

Learning goals

By the end of Session 9, you should be able to:

- load CSV data into Spark with a clear schema
- inspect rows, columns, data types, and row counts
- register Spark DataFrames as temporary SQL views
- create derived columns with Spark DataFrame functions
- extract date, hour, and day-of-week features from timestamps
- write grouped and sorted Spark SQL analytics queries
- build rankings with Spark window functions
- save a final Spark summary CSV for reporting
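
As a rough illustration of how three of these goals fit together (temporary views, grouped SQL, and window rankings), here is a minimal sketch. The column names `service`, `region`, and `requests`, the view name `events`, and the helper `rank_services` are assumptions made for this example; they are not taken from the tutorial parts, so adjust them to the real CSV header.

```python
# Hypothetical column names (service, region, requests): check the CSV header.
RANKING_QUERY = """
SELECT region,
       service,
       total_requests,
       RANK() OVER (PARTITION BY region ORDER BY total_requests DESC) AS request_rank
FROM (
    SELECT region, service, SUM(requests) AS total_requests
    FROM events
    GROUP BY region, service
) AS totals
ORDER BY region, request_rank
"""


def rank_services(spark, events_df):
    """Register the DataFrame as a temp view, then rank services per region."""
    # The view lives only for the current Spark session.
    events_df.createOrReplaceTempView("events")
    return spark.sql(RANKING_QUERY)


# Usage, assuming an active SparkSession `spark` and a loaded DataFrame `events`:
# rank_services(spark, events).show()
```

The inner query does the grouping and the outer query applies the window function, which keeps each step easy to inspect on its own.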

Recommended order

Part 1: Load service events with Spark
Part 2: Derived columns and time features
Part 3: Rankings and final summary tables
Homework
Practice with quizzes when ready.
Write your own work in solutions/.
Review the reference solutions in session_solutions/ only after attempting the tasks yourself.

Dataset

The tutorial dataset is:

datasets/service_events.csv

It contains small cloud service activity logs with service names, regions, timestamps, request counts, error counts, latency, and traffic columns.
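
Loading this file with an explicit schema, rather than letting Spark infer types, might look like the sketch below. The README lists only the kinds of columns, so every column name and type in `EVENTS_SCHEMA` is a placeholder, and `load_events` is a hypothetical helper.

```python
# Placeholder column names and types: replace them with the real CSV header.
EVENTS_SCHEMA = (
    "service STRING, region STRING, event_time TIMESTAMP, "
    "requests INT, errors INT, latency_ms DOUBLE, traffic_mb DOUBLE"
)


def load_events(spark, path):
    """Read the events CSV using a DDL schema string instead of type inference."""
    # header=True skips the header row; the column names come from the schema.
    return spark.read.csv(path, header=True, schema=EVENTS_SCHEMA)


# Usage (requires pyspark):
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.appName("session9").getOrCreate()
# events = load_events(spark, "datasets/service_events.csv")
# events.printSchema()
# print(events.count())
```

An explicit schema avoids the extra pass over the data that type inference needs and makes wrong types show up early.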

Run local Python files from the session9 folder so this path works as written:

events_path = "datasets/service_events.csv"

In Google Colab, if you upload the CSV directly into the notebook files panel, use:

events_path = "service_events.csv"
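
If you switch between a local checkout and Colab often, one option is to pick whichever of the two paths exists. This is only a convenience sketch; `resolve_events_path` is a hypothetical helper and not part of the tutorial code.

```python
from pathlib import Path


def resolve_events_path(
    candidates=("datasets/service_events.csv", "service_events.csv"),
):
    """Return the first candidate path that exists relative to the working directory."""
    for candidate in candidates:
        if Path(candidate).is_file():
            return candidate
    raise FileNotFoundError(
        f"None of these files exist in {Path.cwd()}: {', '.join(candidates)}"
    )


# Usage:
# events_path = resolve_events_path()
```

The local path is tried first, so running from the session9 folder behaves the same as setting `events_path` by hand.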

Quizzes

quizmd quizzes/python-session-09-part-01-quiz.md
quizmd quizzes/python-session-09-part-02-quiz.md
quizmd quizzes/python-session-09-part-03-quiz.md
quizmd quizzes/python-session-09-homework-quiz.md

Notes

The tutorial parts are written for Google Colab first, but the same code also works locally.
Part 3 shows patterns that are useful for the final project Spark analytics section.
This session uses service logs instead of market data so you can practice the same Spark skills without copying final project answers.
Keep your own solutions in separate files inside solutions/.
Use exercise-style names in solutions/:
- exercise-09-01.py
- exercise-09-02.py
- exercise-09-03.py
- exercise-09-homework.md
Reference answers are in session_solutions/.
Install local dependencies with:
- pip install -r requirements.txt
