Learning Session Transcriber

A simple, KISS-style tool that helps you download or collect learning session videos, transcribe them with the OpenAI API, and organise the outputs (videos, transcripts, prompts) by session.

Everything is file-based and explicit: you describe a session in a session.yaml, then run small, focused steps.

Project layout

src/learning_session_transcriber/
- config.py – env-based configuration (OpenAI models, log level, etc.).
- sessions.py – SessionConfig models and session.yaml loader/validation.
- run_session.py – high-level pipeline entry point that orchestrates all steps for a session.
- downloader.py – video acquisition step (copy local files or download via yt-dlp).
- transcriber.py – transcription step using OpenAI audio transcription.
- extract_pdf.py – optional PDF extraction step for attaching PDF content to prompts.
- synthesizer.py – builds the combined main document from per-video transcripts.
- prompts.py – applies per-video and main-document prompts defined in prompts.yaml.
- manifest.py – manages manifest.json tracking all generated artifacts.
- audio_joiner.py – standalone module to convert WAV/M4A to MP3 and join session audio with silence gaps and ID3 metadata.
tests/ – pytest tests
- test_config.py – unit tests for Config.from_env.
- test_downloader.py – offline test for the downloader.
- test_transcriber.py – integration test for OpenAI transcription (opt‑in).
- __init__.py – test package marker.
scripts/
- openai_chat_demo.py – manual script to test OpenAI chat.
- openai_audio_transcription_demo.py – manual script to test audio transcription.
- openai_list_models.py – manual script to list available OpenAI models.
pyproject.toml – project metadata and tool config.
requirements.txt – pinned dependencies (mirrors pyproject.toml).
env.example – example env vars (copy to .env).

Installation and setup

Create and activate a virtual environment

Unix/macOS:

python3 -m venv .venv
source .venv/bin/activate

Windows:

py -m venv .venv
.venv\Scripts\activate

Install dependencies

pip install -r requirements.txt
pip install -e ".[dev]"

Configure environment variables

Copy the example file and edit it:
```
cp env.example .env
```
The important variables are:
- APP_ENV – e.g. development, production. Defaults to development.
- LOG_LEVEL – e.g. INFO, DEBUG. Defaults to INFO (uppercased by code).
- OPENAI_API_KEY – your OpenAI API key.
- OPENAI_MODEL – chat model. Defaults to gpt-5-mini; set it to another gpt-5.x model if you prefer.
- OPENAI_TRANSCRIPTION_MODEL – audio transcription model (e.g. gpt-4o-transcribe).
The Config class in config.py reads from the OS environment and .env (via python-dotenv) without overwriting existing OS variables.
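
To make the defaults above concrete, here is a minimal sketch of an env-backed config. It is not the project's actual code: the field names, the optional `env` parameter, and the default transcription model are assumptions made for illustration, and loading `.env` through python-dotenv is left out so the sketch has no third-party dependencies.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Config:
    """Illustrative stand-in for the project's Config class (field names assumed)."""

    app_env: str
    log_level: str
    openai_api_key: Optional[str]
    openai_model: str
    openai_transcription_model: str

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "Config":
        # Fall back to the real process environment when no mapping is passed.
        env = os.environ if env is None else env
        return cls(
            app_env=env.get("APP_ENV", "development"),
            # LOG_LEVEL is uppercased, so "debug" and "DEBUG" behave the same.
            log_level=env.get("LOG_LEVEL", "INFO").upper(),
            openai_api_key=env.get("OPENAI_API_KEY"),
            openai_model=env.get("OPENAI_MODEL", "gpt-5-mini"),
            # Assumed default; the README only names this model as an example.
            openai_transcription_model=env.get(
                "OPENAI_TRANSCRIPTION_MODEL", "gpt-4o-transcribe"
            ),
        )


config = Config.from_env({"LOG_LEVEL": "debug"})
print(config.app_env, config.log_level, config.openai_model)
```

Passing a plain mapping instead of touching `os.environ` keeps the sketch easy to exercise in a unit test.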

Defining a learning session

Each learning session is described in a YAML configuration file, typically under:

sessions/<content_name>/session.yaml

Where content_name follows the pattern:

YYYYMMDD_HHmmss_session-topic

For a given content name, the paths line up as follows:

Content name: YYYYMMDD_HHmmss_session-topic
Folder: sessions/YYYYMMDD_HHmmss_session-topic/
Config file: sessions/YYYYMMDD_HHmmss_session-topic/session.yaml
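
The naming convention above could be checked and expanded into paths roughly like this. The regular expression (in particular the characters allowed in the topic part), the helper name `session_paths`, and the sample content name are all assumptions made for illustration.

```python
import re
from pathlib import Path

# Assumed shape: 8 digits (date), 6 digits (time), then a hyphenated topic slug.
CONTENT_NAME_RE = re.compile(r"^\d{8}_\d{6}_[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*$")


def session_paths(content_name: str, root: Path = Path("sessions")) -> dict:
    """Return the session folder and config path for a content name."""
    if not CONTENT_NAME_RE.match(content_name):
        raise ValueError(f"unexpected content name: {content_name!r}")
    folder = root / content_name
    return {"folder": folder, "config": folder / "session.yaml"}


# Made-up content name, used only to exercise the helper.
paths = session_paths("20240131_093000_intro-to-graphs")
print(paths["config"].as_posix())
```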

YAML Format

See session.example.yaml for a complete, up‑to‑date example including:

- Basic fields (content_name, topic, language, llm_model).
- Optional prompts_file (path relative to session directory) to specify which prompts YAML file to use; if omitted, defaults to prompts.yaml at project root.
- videos list with index, title, url (and optional local_path).
- Optional postprocess_prompts per video (a list) to choose one or more per‑video prompts from the prompts file.
- Optional main_postprocess_prompts (a list) to choose one or more main‑document prompts from the prompts file.
- Optional include_resources (key → path, relative to session directory) to attach extra material (e.g. slides, notes) to prompts.

The SessionConfig model in sessions.py validates this structure and exposes helper properties such as:

outputs_root – outputs/<content_name>/ (all artifacts for a session live here in a flattened layout)
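
As a rough picture of what such a model might hold, the sketch below mirrors the fields listed above with plain dataclasses. The real SessionConfig may be built differently (for example with a validation library); the `Video` class, the `from_dict` helper, the "url or local_path" check, and the sample values are assumptions.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List, Optional


@dataclass
class Video:
    index: int
    title: str
    url: Optional[str] = None
    local_path: Optional[str] = None
    postprocess_prompts: List[str] = field(default_factory=list)


@dataclass
class SessionConfig:
    content_name: str
    topic: str
    language: str
    llm_model: str
    videos: List[Video]
    prompts_file: Optional[str] = None
    main_postprocess_prompts: List[str] = field(default_factory=list)
    include_resources: Dict[str, str] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, data: dict) -> "SessionConfig":
        data = dict(data)  # shallow copy so the caller's dict is not mutated
        videos = [Video(**v) for v in data.pop("videos", [])]
        if not videos:
            raise ValueError("session must define at least one video")
        for v in videos:
            # Assumed rule: every video needs some source to fetch from.
            if not (v.url or v.local_path):
                raise ValueError(f"video {v.index} needs url or local_path")
        return cls(videos=videos, **data)

    @property
    def outputs_root(self) -> Path:
        # All artifacts for a session live directly under this folder.
        return Path("outputs") / self.content_name


session = SessionConfig.from_dict({
    "content_name": "20240131_093000_intro-to-graphs",
    "topic": "Intro to graphs",
    "language": "en",
    "llm_model": "gpt-5-mini",
    "videos": [{"index": 1, "title": "Part 1", "url": "https://example.com/v1"}],
})
print(session.outputs_root.as_posix())
```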

Workflow: from `session.yaml` to transcripts

1. Download or collect videos

The downloader reads your session.yaml, then:

- Copies local files when local_path is provided, or
- Uses yt-dlp to download from url.

For each video it:

- Writes an .mp4 file into outputs/<content_name>/ named:
  - <content_name>_index_<n>_video.mp4
- Extracts an .mp3 audio file from that video using ffmpeg, named:
  - <content_name>_index_<n>_audio.mp3
- Records both paths in a manifest.json at outputs/<content_name>/manifest.json:
  - output_path – path to the .mp4 video.
  - audio_path – path to the extracted .mp3 audio (preferred for transcription).
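
The naming scheme and manifest fields above can be sketched as follows. Only `output_path` and `audio_path` come from this README; the helper names, the extra `index` key, and the exact ffmpeg arguments are assumptions, and the ffmpeg command is only built here, not executed.

```python
from pathlib import Path


def artifact_paths(content_name: str, index: int, outputs: Path = Path("outputs")) -> dict:
    """Build the video/audio paths for one video, following the naming above."""
    root = outputs / content_name
    stem = f"{content_name}_index_{index}"
    return {
        "output_path": root / f"{stem}_video.mp4",
        "audio_path": root / f"{stem}_audio.mp3",
    }


def ffmpeg_extract_audio_cmd(video: Path, audio: Path) -> list:
    """Command line that would extract an MP3 track (built, not executed)."""
    # -y overwrites, -vn drops the video stream, libmp3lame encodes MP3.
    return ["ffmpeg", "-y", "-i", str(video), "-vn", "-acodec", "libmp3lame", str(audio)]


def manifest_entry(content_name: str, index: int) -> dict:
    """One manifest record; the 'index' key is an assumption."""
    paths = artifact_paths(content_name, index)
    return {"index": index, **{key: p.as_posix() for key, p in paths.items()}}


entry = manifest_entry("demo-session", 2)
cmd = ffmpeg_extract_audio_cmd(Path(entry["output_path"]), Path(entry["audio_path"]))
print(entry["audio_path"])
print(" ".join(cmd))
```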

You normally do not need to call the downloader directly; instead, use the unified pipeline entry point described below. For advanced/manual usage you can still run:

python -m learning_session_transcriber.downloader --config sessions/YYYYMMDD_HHmmss_session-topic/session.yaml

2. Transcribe videos with OpenAI

The transcriber:

- Loads the same session.yaml.
- Reads manifest.json produced by the downloader.
- For each entry, prefers the extracted .mp3 in audio_path (falling back to output_path).
- Splits long audio files into sequential chunks using ffmpeg so they respect the model’s maximum duration per request.
- Calls the OpenAI audio transcription API for each chunk and concatenates the partial transcripts.
- Writes Markdown transcripts directly into outputs/<content_name>/ with filenames:
  - <content_name>_index_<n>_transcript.md
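
The fallback and chunking logic described above might look roughly like this. It is a sketch under assumptions: the function names, the blank-line separator between partial transcripts, and the injected `transcribe_chunk` callable (a stand-in for the real API call) are not taken from the project.

```python
from typing import Callable, List, Tuple


def pick_media(entry: dict) -> str:
    """Prefer the extracted audio; fall back to the video file."""
    return entry.get("audio_path") or entry["output_path"]


def chunk_ranges(duration: float, max_chunk: float) -> List[Tuple[float, float]]:
    """Split [0, duration) into sequential (start, length) pairs."""
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    ranges = []
    start = 0.0
    while start < duration:
        length = min(max_chunk, duration - start)
        ranges.append((start, length))
        start += length
    return ranges


def transcribe_in_chunks(
    duration: float,
    max_chunk: float,
    transcribe_chunk: Callable[[float, float], str],
) -> str:
    """Transcribe each chunk in order and join the partial transcripts."""
    parts = [transcribe_chunk(s, n) for s, n in chunk_ranges(duration, max_chunk)]
    return "\n\n".join(p.strip() for p in parts if p.strip())


def fake_transcribe(start: float, length: float) -> str:
    # Stand-in for the real API call, so the sketch runs offline.
    return f"[{start:.0f}s +{length:.0f}s]"


print(pick_media({"output_path": "a.mp4", "audio_path": "a.mp3"}))
print(transcribe_in_chunks(25, 10, fake_transcribe))
```

Keeping the API call behind a callable means the chunk planning can be tested without network access or an API key.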

Again, the recommended way is to use the unified pipeline. For manual control you can run:

python -m learning_session_transcriber.transcriber --config sessions/YYYYMMDD_HHmmss_session-topic/session.yaml

Internally, this calls transcribe_videos(config_path: Path), which uses:

- Config.from_env() to obtain the transcription model.
- An OpenAI client (with OPENAI_API_KEY from env).
- ffmpeg to split audio into smaller chunks when needed.

3. (Optional) Post‑process transcripts and main document

If you configure postprocess_prompts per video and/or main_postprocess_prompts in your session.yaml, you can run an additional step that applies prompt templates defined in your prompts file (default: prompts.yaml at project root, or the file specified by prompts_file in your session):

- per_video: prompts are used for individual transcripts (e.g. summary, key_concepts).
- main_document: prompts are used for the combined main document (e.g. study_guide, executive_summary).
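
To illustrate how prompt names from a session could be matched against those two sections, here is a small sketch. The in-memory `PROMPTS` dict stands in for a parsed prompts file; its template texts, the `{transcript}` placeholder, and the `resolve_prompts` helper are hypothetical and may not match the real prompts.yaml.

```python
from typing import Dict, List

# Hypothetical in-memory equivalent of a prompts file with two sections.
PROMPTS: Dict[str, Dict[str, str]] = {
    "per_video": {
        "summary": "Summarise this transcript:\n\n{transcript}",
        "key_concepts": "List the key concepts in:\n\n{transcript}",
    },
    "main_document": {
        "study_guide": "Write a study guide from:\n\n{transcript}",
    },
}


def resolve_prompts(prompts: dict, section: str, names: List[str]) -> Dict[str, str]:
    """Look up the named templates in one section, failing on unknown names."""
    available = prompts.get(section, {})
    missing = [n for n in names if n not in available]
    if missing:
        raise KeyError(f"unknown {section} prompt(s): {', '.join(missing)}")
    return {n: available[n] for n in names}


selected = resolve_prompts(PROMPTS, "per_video", ["summary"])
print(selected["summary"].format(transcript="...transcript text..."))
```

Failing early on an unknown name surfaces typos in session.yaml before any API call is made.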

Run the prompt application step:

python -m learning_session_transcriber.prompts --config sessions/YYYYMMDD_HHmmss_session-topic/session.yaml

For each video where postprocess_prompts contains one or more prompt names, this will:

- For each prompt name, append a ## Postprocess: <prompt_name> section to <content_name>_index_<n>_transcript.md.
- Write sibling files <content_name>_index_<n>_<prompt_name>.md in outputs/<content_name>/.

If main_postprocess_prompts is set and the main document already exists, it will:

- For each prompt name, append a ## Postprocess: <prompt_name> section to <content_name>_main.md.
- Write sibling files <content_name>_main_<prompt_name>.md in outputs/<content_name>/.
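
The append-and-sibling behaviour described in this step can be sketched with plain file operations. The helper name `write_prompt_output`, its `base` argument, and the exact spacing around the appended section are assumptions; the demo runs in a temporary directory rather than outputs/.

```python
import tempfile
from pathlib import Path


def write_prompt_output(base: Path, source: Path, prompt_name: str, text: str) -> Path:
    """Append a '## Postprocess' section to `source` and write a sibling file.

    `base` is the path prefix without suffix, e.g. .../x_index_1 or .../x_main.
    """
    with source.open("a", encoding="utf-8") as fh:
        fh.write(f"\n\n## Postprocess: {prompt_name}\n\n{text.strip()}\n")
    sibling = base.with_name(f"{base.name}_{prompt_name}.md")
    sibling.write_text(text.strip() + "\n", encoding="utf-8")
    return sibling


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    transcript = out / "demo_index_1_transcript.md"
    transcript.write_text("# Transcript\n", encoding="utf-8")
    sibling = write_prompt_output(out / "demo_index_1", transcript, "summary", "Short summary.")
    sibling_name = sibling.name
    updated = transcript.read_text(encoding="utf-8")

print(sibling_name)
```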

Unified pipeline (`run_session`)

For most use cases you will want to run the whole pipeline for a session with a single command:

python -m learning_session_transcriber.run_session --config sessions/<content_name>/session.yaml

Or, after installing the package (e.g. pip install -e .), via the console script:

learning-session-transcriber --config sessions/<content_name>/session.yaml

This will, in order:

1. Download or copy videos according to your session.yaml.
2. Transcribe audio into per-video transcripts.
3. Optionally extract PDFs if configured.
4. Synthesize a main document from all transcripts.
5. Apply any configured per-video and main-document prompts.

Steps can be selectively enabled/disabled via command-line arguments; see the module docstring in run_session.py for details.
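
Conceptually, the orchestration amounts to running a fixed list of steps in order and leaving out the disabled ones. The sketch below shows that idea only: the step names, the `skip` parameter, and the `run_pipeline` function are illustrative assumptions and do not reflect the actual command-line arguments, which are documented in run_session.py.

```python
from typing import Callable, Dict, Iterable, List

# Step names are illustrative; the order follows the list above.
STEP_ORDER = ["download", "transcribe", "extract_pdf", "synthesize", "prompts"]


def run_pipeline(steps: Dict[str, Callable[[], None]], skip: Iterable[str] = ()) -> List[str]:
    """Run the known steps in order, skipping disabled ones; return what ran."""
    skipped = set(skip)
    unknown = skipped - set(STEP_ORDER)
    if unknown:
        raise ValueError(f"unknown step(s): {sorted(unknown)}")
    ran = []
    for name in STEP_ORDER:
        if name in skipped or name not in steps:
            continue
        steps[name]()
        ran.append(name)
    return ran


log: List[str] = []
# Each demo step just records its own name instead of doing real work.
steps = {name: (lambda n=name: log.append(n)) for name in STEP_ORDER}
ran = run_pipeline(steps, skip=["extract_pdf"])
print(ran)
```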

Audio joiner (standalone)

The audio joiner is a separate module for folders that contain only audio files (WAV, M4A, or MP3). It does not use session.yaml; it uses its own audio_metadata.yaml in the session folder.

Use case: You have a folder of numbered WAV (or MP3) files and want to convert WAVs to MP3, then join all MP3s into a single file with a configurable silence gap between segments and ID3v1 metadata.

Create a folder under sessions/ (e.g. sessions/MyAudioSession/).
Put your WAV and/or MP3 files in that folder (any names; they are processed in alphabetical order).
Copy audio_metadata.example.yaml into the folder as audio_metadata.yaml and edit:
- silence_gap_seconds – seconds of silence between each segment.
- output_filename – name of the final joined file (see template variables below).
- per_file – ID3v1 tags applied to each individual MP3 when converting WAV/M4A to MP3.
- joined – ID3v1 tags applied to the final joined MP3.
Template variables (resolved at runtime via Python str.format()):
- per_file section: {index} (1-based position), {filename} (original filename without extension), {session_name} (folder name).
- joined section and output_filename: {session_name} (folder name), {total_files} (number of audio files).
Run:

python -m learning_session_transcriber.audio_joiner --session sessions/MyAudioSession/

Output is written to outputs/MyAudioSession/: converted/copied per-file MP3s and one joined MP3. Requires ffmpeg on your PATH.
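
The template variables described above are ordinary `str.format()` fields, so rendering them can be sketched in a few lines. The tag keys (`title`, `album`), the helper names, and the sample filenames are assumptions about what an audio_metadata.yaml might contain; `sorted()` is used here as one plausible reading of "alphabetical order".

```python
from pathlib import Path
from typing import Dict, List


def render_tags(template: Dict[str, str], **values) -> Dict[str, str]:
    """Fill every tag template with str.format()."""
    return {key: text.format(**values) for key, text in template.items()}


def per_file_tags(template: Dict[str, str], files: List[str], session_name: str) -> List[Dict[str, str]]:
    """Render tags per file, in sorted order with a 1-based index."""
    ordered = sorted(files)
    return [
        render_tags(template, index=i, filename=Path(name).stem, session_name=session_name)
        for i, name in enumerate(ordered, start=1)
    ]


# Tag keys ("title", "album") are assumptions about audio_metadata.yaml.
per_file = {"title": "{index:02d} - {filename}", "album": "{session_name}"}
joined = {"title": "{session_name} ({total_files} files)"}

files = ["b_outro.wav", "a_intro.wav"]
tags = per_file_tags(per_file, files, "MyAudioSession")
joined_tags = render_tags(joined, session_name="MyAudioSession", total_files=len(files))
print(tags[0]["title"])
print(joined_tags["title"])
```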

Repository and generated data

The sessions/ and outputs/ directories are treated as user data (per-session configs and generated artifacts) and are ignored by git via .gitignore.
The canonical configuration and documentation files tracked in the repository are:
- session.example.yaml – template for new session configs.
- audio_metadata.example.yaml – template for audio joiner metadata (ID3 and silence gap).
- env.example – template for environment variables.
- prompts.yaml – shared prompt definitions.
- README.md – this documentation.

Manual OpenAI smoke tests (`scripts/`)

You can quickly verify OpenAI connectivity and models using the small scripts in scripts/.

Chat demo
```
python -m scripts.openai_chat_demo
```
Uses OPENAI_API_KEY and OPENAI_MODEL to send a short prompt (in Spanish) and print the response.
Audio transcription demo
```
python -m scripts.openai_audio_transcription_demo path/to/audio_or_video.mp4
```
Uses OPENAI_TRANSCRIPTION_MODEL to transcribe a single file and print the text.
List available models
```
python -m scripts.openai_list_models
```
Lists all models available for your OPENAI_API_KEY, sorted alphabetically.

These scripts are for manual testing only and are not part of the automated pytest suite.

Running tests

Unit tests (fast, CI‑friendly):
```
pytest
```
This runs tests in tests/ such as:
- test_config.py – configuration defaults and custom env handling.
- test_downloader.py – downloader behaviour using a fake yt-dlp.
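
One common way to test a downloader offline is to pass the network call in as a parameter and replace it with a fake in tests. The sketch below shows that pattern; it is not necessarily how test_downloader.py fakes yt-dlp, and `fetch_video`, `fake_download`, and the example URL are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Optional


def fetch_video(
    dest: Path,
    url: Optional[str] = None,
    local_path: Optional[Path] = None,
    download: Optional[Callable[[str, Path], None]] = None,
) -> Path:
    """Copy a local file if given, otherwise delegate to `download`."""
    if local_path is not None:
        shutil.copyfile(local_path, dest)
    elif url is not None and download is not None:
        download(url, dest)
    else:
        raise ValueError("need local_path, or url plus a download callable")
    return dest


def fake_download(url: str, dest: Path) -> None:
    # Test double: writes placeholder bytes instead of hitting the network.
    dest.write_bytes(b"fake video for " + url.encode())


with tempfile.TemporaryDirectory() as tmp:
    target = fetch_video(
        Path(tmp) / "demo_index_1_video.mp4",
        url="https://example.com/v",
        download=fake_download,
    )
    payload = target.read_bytes()

print(payload)
```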

Coverage:

pytest --cov=src/learning_session_transcriber

OpenAI integration test (optional):

tests/test_transcriber.py is marked as an integration test and:
- Skips automatically when OPENAI_API_KEY is not set.
- Calls the real OpenAI transcription API with a small dummy video file.
To run only integration tests, set your key and use a marker, for example:
```
export OPENAI_API_KEY=sk-...
pytest -m integration
```
(You can further customise markers in pytest.ini if needed.)

Design principles

- KISS: prefer simple, explicit steps and file structures over heavy frameworks.
- Layered: keep configuration, session description, downloading, and transcription separated.
- Config via env: Config.from_env() is the single source of truth; no secrets in code.
- Testable: downloader is testable without network; OpenAI integration is opt‑in and clearly marked.
- Scriptable: small python -m ... entry points instead of complex CLIs, so you can compose steps however you like.
