Simple, KISS-style tool to help you download/collect learning session videos, transcribe them with the OpenAI API, and organise the outputs (videos, transcripts, prompts) by session.
Everything is file-based and explicit: you describe a session in a session.yaml, then run small, focused steps.
- src/learning_session_transcriber/
  - config.py – env-based configuration (OpenAI models, log level, etc.).
  - sessions.py – SessionConfig models and session.yaml loader/validation.
  - run_session.py – high-level pipeline entry point that orchestrates all steps for a session.
  - downloader.py – video acquisition step (copy local files or download via yt-dlp).
  - transcriber.py – transcription step using OpenAI audio transcription.
  - extract_pdf.py – optional PDF extraction step for attaching PDF content to prompts.
  - synthesizer.py – builds the combined main document from per-video transcripts.
  - prompts.py – applies per-video and main-document prompts defined in prompts.yaml.
  - manifest.py – manages manifest.json tracking all generated artifacts.
  - audio_joiner.py – standalone module to convert WAV/M4A to MP3 and join session audio with silence gaps and ID3 metadata.
- tests/ – pytest tests
  - test_config.py – unit tests for Config.from_env.
  - test_downloader.py – offline test for the downloader.
  - test_transcriber.py – integration test for OpenAI transcription (opt‑in).
  - tests/__init__.py – test package marker.
- scripts/
  - openai_chat_demo.py – manual script to test OpenAI chat.
  - openai_audio_transcription_demo.py – manual script to test audio transcription.
  - openai_list_models.py – manual script to list available OpenAI models.
- pyproject.toml – project metadata and tool config.
- requirements.txt – pinned dependencies (mirrors pyproject.toml).
- env.example – example env vars (copy to .env).
- Create and activate a virtual environment

  Unix/macOS:
  python3 -m venv .venv
  source .venv/bin/activate

  Windows:
  py -m venv .venv
  .venv\Scripts\activate

- Install dependencies

  pip install -r requirements.txt
  pip install -e ".[dev]"

- Configure environment variables

  Copy the example file and edit it:
  cp env.example .env

  The important variables are:
  - APP_ENV – e.g. development, production. Defaults to development.
  - LOG_LEVEL – e.g. INFO, DEBUG. Defaults to INFO (uppercased by code).
  - OPENAI_API_KEY – your OpenAI API key.
  - OPENAI_MODEL – chat model (gpt-5-mini by default, or another gpt-5.x model).
  - OPENAI_TRANSCRIPTION_MODEL – audio transcription model (e.g. gpt-4o-transcribe).

  The Config class in config.py reads from the OS environment and .env (via python-dotenv) without overwriting existing OS variables.
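
To make the "OS variables win over .env" rule concrete, here is a minimal sketch of how such a loader could look. It is not the project's actual Config: the field names and the fallback transcription model are assumptions, and load_env_file is a simplified stdlib stand-in for python-dotenv.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


def load_env_file(path: Path) -> None:
    """Copy KEY=VALUE pairs from a .env file into os.environ.

    Simplified stand-in for python-dotenv. Existing OS variables win,
    because setdefault() only fills in keys that are still missing.
    """
    if not path.exists():
        return
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


@dataclass(frozen=True)
class Config:
    # Hypothetical field names; the real Config class may differ.
    app_env: str
    log_level: str
    openai_api_key: Optional[str]
    openai_model: str
    openai_transcription_model: str

    @classmethod
    def from_env(cls, env_file: Path = Path(".env")) -> "Config":
        load_env_file(env_file)
        return cls(
            app_env=os.environ.get("APP_ENV", "development"),
            log_level=os.environ.get("LOG_LEVEL", "INFO").upper(),
            openai_api_key=os.environ.get("OPENAI_API_KEY"),
            openai_model=os.environ.get("OPENAI_MODEL", "gpt-5-mini"),
            # Assumed fallback; the README only lists this model as an example.
            openai_transcription_model=os.environ.get(
                "OPENAI_TRANSCRIPTION_MODEL", "gpt-4o-transcribe"
            ),
        )
```

With this shape, a value exported in the shell always takes precedence over the same key in .env, which keeps local overrides predictable.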
Each learning session is described in a YAML configuration file, typically under:
sessions/<content_name>/session.yaml
Where content_name follows the pattern:
YYYYMMDD_HHmmss_session-topic
Example:
- Content name: YYYYMMDD_HHmmss_session-topic
- Folder: sessions/YYYYMMDD_HHmmss_session-topic/
- Config file: sessions/YYYYMMDD_HHmmss_session-topic/session.yaml
See session.example.yaml for a complete, up‑to‑date example including:
- Basic fields (content_name, topic, language, llm_model).
- Optional prompts_file (path relative to session directory) to specify which prompts YAML file to use; if omitted, defaults to prompts.yaml at project root.
- videos list with index, title, url (and optional local_path).
- Optional postprocess_prompts per video (a list) to choose one or more per‑video prompts from the prompts file.
- Optional main_postprocess_prompts (a list) to choose one or more main‑document prompts from the prompts file.
- Optional include_resources (key → path, relative to session directory) to attach extra material (e.g. slides, notes) to prompts.
The SessionConfig model in sessions.py validates this structure and exposes helper properties such as:
- outputs_root – outputs/<content_name>/ (all artifacts for a session live here in a flattened layout)
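
The structure described above can be sketched with plain dataclasses. This is an illustration only: the real SessionConfig in sessions.py may use a different modelling library, and the session_from_dict helper and the "url or local_path" check are assumptions made for this example.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class VideoConfig:
    index: int
    title: str
    url: Optional[str] = None
    local_path: Optional[str] = None
    postprocess_prompts: list[str] = field(default_factory=list)


@dataclass
class SessionConfig:
    content_name: str
    topic: str
    language: str
    llm_model: str
    videos: list[VideoConfig] = field(default_factory=list)
    main_postprocess_prompts: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Assumed rule: every video needs some source to fetch from.
        for video in self.videos:
            if not (video.url or video.local_path):
                raise ValueError(f"video {video.index} has no url or local_path")

    @property
    def outputs_root(self) -> Path:
        # All artifacts for a session live in outputs/<content_name>/.
        return Path("outputs") / self.content_name


def session_from_dict(data: dict) -> SessionConfig:
    """Hypothetical helper: build a SessionConfig from parsed YAML data."""
    videos = [VideoConfig(**v) for v in data.get("videos", [])]
    return SessionConfig(
        content_name=data["content_name"],
        topic=data["topic"],
        language=data["language"],
        llm_model=data["llm_model"],
        videos=videos,
        main_postprocess_prompts=data.get("main_postprocess_prompts", []),
    )
```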
The downloader reads your session.yaml, then:
- Copies local files when local_path is provided, or
- Uses yt-dlp to download from url.
For each video it:
- Writes an .mp4 file into outputs/<content_name>/ named:
  <content_name>_index_<n>_video.mp4
- Extracts an .mp3 audio file from that video using ffmpeg, named:
  <content_name>_index_<n>_audio.mp3
- Records both paths in a manifest.json at outputs/<content_name>/manifest.json:
  - output_path – path to the .mp4 video.
  - audio_path – path to the extracted .mp3 audio (preferred for transcription).
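
The naming scheme and manifest fields above could be produced along these lines. This is a sketch under assumptions: the helper names are hypothetical, <n> is taken to be the plain integer index, the manifest entry layout beyond output_path and audio_path is invented for illustration, and the ffmpeg command is only built, not executed.

```python
from pathlib import Path


def artifact_path(outputs_root: Path, content_name: str, index: int, suffix: str) -> Path:
    """Build <content_name>_index_<n>_<suffix> inside the outputs folder."""
    return outputs_root / f"{content_name}_index_{index}_{suffix}"


def ffmpeg_extract_audio_cmd(video: Path, audio: Path) -> list[str]:
    """Return an ffmpeg command that extracts an MP3 track from a video."""
    # -y overwrites the target, -vn drops the video stream,
    # libmp3lame encodes the remaining audio as MP3.
    return ["ffmpeg", "-y", "-i", str(video), "-vn", "-codec:a", "libmp3lame", str(audio)]


def manifest_entry(outputs_root: Path, content_name: str, index: int) -> dict:
    """Hypothetical manifest entry holding both artifact paths."""
    video = artifact_path(outputs_root, content_name, index, "video.mp4")
    audio = artifact_path(outputs_root, content_name, index, "audio.mp3")
    return {"index": index, "output_path": str(video), "audio_path": str(audio)}
```

A command list like the one returned by ffmpeg_extract_audio_cmd can be handed to subprocess.run(cmd, check=True) once ffmpeg is on your PATH.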
You normally do not need to call the downloader directly; instead, use the unified pipeline entry point described below. For advanced/manual usage you can still run:
python -m learning_session_transcriber.downloader --config sessions/YYYYMMDD_HHmmss_session-topic/session.yaml
The transcriber:
- Loads the same session.yaml.
- Reads manifest.json produced by the downloader.
- For each entry, prefers the extracted .mp3 in audio_path (falling back to output_path).
- Splits long audio files into sequential chunks using ffmpeg so they respect the model's maximum duration per request.
- Calls the OpenAI audio transcription API for each chunk and concatenates the partial transcripts.
- Writes Markdown transcripts directly into outputs/<content_name>/ with filenames:
  <content_name>_index_<n>_transcript.md
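
Two of the steps above, choosing the media file and splitting it into sequential chunks, can be sketched as follows. The function names are hypothetical, the maximum chunk length is a caller-supplied value rather than a documented model limit, and the ffmpeg command is only constructed here.

```python
def pick_media_path(entry: dict) -> str:
    """Prefer the extracted audio; fall back to the video file."""
    return entry.get("audio_path") or entry["output_path"]


def chunk_ranges(total_seconds: float, max_chunk_seconds: float) -> list[tuple[float, float]]:
    """Split [0, total_seconds) into back-to-back (start, end) ranges."""
    if max_chunk_seconds <= 0:
        raise ValueError("max_chunk_seconds must be positive")
    ranges = []
    start = 0.0
    while start < total_seconds:
        end = min(start + max_chunk_seconds, total_seconds)
        ranges.append((start, end))
        start = end
    return ranges


def ffmpeg_chunk_cmd(src: str, dst: str, start: float, end: float) -> list[str]:
    """Return an ffmpeg command that copies one time range without re-encoding."""
    # -ss and -t placed before -i seek in the input and limit the read duration.
    return [
        "ffmpeg", "-y",
        "-ss", str(start), "-t", str(end - start),
        "-i", src, "-codec:a", "copy", dst,
    ]
```

Because the ranges are contiguous and ordered, concatenating the per-chunk transcripts in the same order reproduces the full transcript.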
Again, the recommended way is to use the unified pipeline. For manual control you can run:
python -m learning_session_transcriber.transcriber --config sessions/YYYYMMDD_HHmmss_session-topic/session.yaml
Internally, this calls transcribe_videos(config_path: Path), which uses:
- Config.from_env() to obtain the transcription model.
- An OpenAI client (with OPENAI_API_KEY from env).
- ffmpeg to split audio into smaller chunks when needed.
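
As a rough illustration of how these pieces fit together, the per-chunk transcription loop might look like the sketch below. transcribe_chunks is a hypothetical helper, not the project's transcribe_videos; the client is passed in, so it works with an OpenAI() instance from the official Python SDK or with a test double.

```python
from pathlib import Path
from typing import Any, Iterable


def transcribe_chunks(client: Any, model: str, chunk_paths: Iterable[Path]) -> str:
    """Transcribe each chunk in order and join the partial transcripts."""
    parts = []
    for path in chunk_paths:
        with open(path, "rb") as audio_file:
            # Same call shape as the OpenAI SDK's audio transcription endpoint.
            result = client.audio.transcriptions.create(model=model, file=audio_file)
        parts.append(result.text.strip())
    # Blank line between chunks keeps the Markdown output readable.
    return "\n\n".join(parts)
```

Injecting the client keeps the function testable offline, in line with the opt-in integration test described later.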
If you configure postprocess_prompts per video and/or main_postprocess_prompts in your session.yaml, you can run an additional step that applies prompt templates defined in your prompts file (default: prompts.yaml at project root, or the file specified by prompts_file in your session):
- per_video: prompts are used for individual transcripts (e.g. summary, key_concepts).
- main_document: prompts are used for the combined main document (e.g. study_guide, executive_summary).
Run the prompt application step:
python -m learning_session_transcriber.prompts --config sessions/YYYYMMDD_HHmmss_session-topic/session.yaml
For each video where postprocess_prompts contains one or more prompt names, this will:
- For each prompt name, append a ## Postprocess: <prompt_name> section to <content_name>_index_<n>_transcript.md.
- Write sibling files <content_name>_index_<n>_<prompt_name>.md in outputs/<content_name>/.
If main_postprocess_prompts is set and the main document already exists, it will:
- For each prompt name, append a ## Postprocess: <prompt_name> section to <content_name>_main.md.
- Write sibling files <content_name>_main_<prompt_name>.md in outputs/<content_name>/.
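
The "append a section and write a sibling file" behaviour can be sketched as a single helper. write_postprocess_output is a hypothetical name, and the model's response is passed in as a plain string so the file handling can be shown without calling any API.

```python
from pathlib import Path


def write_postprocess_output(base_doc: Path, prompt_name: str, generated: str) -> Path:
    """Append a Postprocess section to base_doc and write the sibling file."""
    body = generated.strip()
    with base_doc.open("a", encoding="utf-8") as fh:
        fh.write(f"\n\n## Postprocess: {prompt_name}\n\n{body}\n")

    # Transcripts drop the "_transcript" part in the sibling name;
    # the main document keeps its "_main" suffix.
    stem = base_doc.stem
    if stem.endswith("_transcript"):
        stem = stem[: -len("_transcript")]
    sibling = base_doc.with_name(f"{stem}_{prompt_name}.md")
    sibling.write_text(body + "\n", encoding="utf-8")
    return sibling
```

For example, applying a prompt named summary to demo_index_1_transcript.md yields demo_index_1_summary.md, while applying study_guide to demo_main.md yields demo_main_study_guide.md.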
For most use cases you will want to run the whole pipeline for a session with a single command:
python -m learning_session_transcriber.run_session --config sessions/<content_name>/session.yaml
Or, after installing the package (e.g. pip install -e .), via the console script:
learning-session-transcriber --config sessions/<content_name>/session.yaml
This will, in order:
- Download or copy videos according to your session.yaml.
- Transcribe audio into per-video transcripts.
- Optionally extract PDFs if configured.
- Synthesize a main document from all transcripts.
- Apply any configured per-video and main-document prompts.
Steps can be selectively enabled/disabled via command-line arguments; see the module docstring in run_session.py for details.
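
Conceptually, the orchestration is an ordered list of steps with an optional skip set. The sketch below illustrates that idea only; the step names, the run_pipeline function, and the skip parameter are hypothetical and do not reflect the actual command-line arguments of run_session.py.

```python
from typing import Callable, Iterable

# A step is a (name, function) pair; each function receives the config path.
Step = tuple[str, Callable[[str], None]]


def run_pipeline(config_path: str, steps: Iterable[Step], skip: Iterable[str] = ()) -> list[str]:
    """Run steps in order, skipping any whose name is in `skip`."""
    skipped = set(skip)
    executed = []
    for name, func in steps:
        if name in skipped:
            continue
        func(config_path)
        executed.append(name)
    return executed


if __name__ == "__main__":
    # Placeholder steps that only print; real steps would call the modules.
    demo_steps = [
        (name, lambda path, n=name: print(f"{n}: {path}"))
        for name in ("download", "transcribe", "extract_pdf", "synthesize", "prompts")
    ]
    run_pipeline("sessions/demo/session.yaml", demo_steps, skip={"extract_pdf"})
```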
The audio joiner is a separate module for folders that contain only audio files (WAV or MP3). It does not use session.yaml; it uses its own audio_metadata.yaml in the session folder.
Use case: You have a folder of numbered WAV (or MP3) files and want to convert WAVs to MP3, then join all MP3s into a single file with a configurable silence gap between segments and ID3v1 metadata.
- Create a folder under sessions/ (e.g. sessions/MyAudioSession/).
- Put your WAV and/or MP3 files in that folder (any names; they are processed in alphabetical order).
- Copy audio_metadata.example.yaml into the folder as audio_metadata.yaml and edit:
  - silence_gap_seconds – seconds of silence between each segment.
  - output_filename – name of the final joined file (see template variables below).
  - per_file – ID3v1 tags applied to each individual MP3 when converting WAV/M4A to MP3.
  - joined – ID3v1 tags applied to the final joined MP3.

  Template variables (resolved at runtime via Python str.format()):
  - per_file section: {index} (1-based position), {filename} (original filename without extension), {session_name} (folder name).
  - joined section and output_filename: {session_name} (folder name), {total_files} (number of audio files).
- Run:
  python -m learning_session_transcriber.audio_joiner --session sessions/MyAudioSession/
  Output is written to outputs/MyAudioSession/: converted/copied per-file MP3s and one joined MP3. Requires ffmpeg on your PATH.
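
Since the template variables are resolved with str.format(), their behaviour is easy to try in isolation. In this small sketch, the tag keys (title, album) and the render_tags helper are assumptions made for illustration; only the variable names come from the description above.

```python
def render_tags(template: dict[str, str], **variables) -> dict[str, str]:
    """Resolve {placeholders} in every tag value using str.format()."""
    return {key: value.format(**variables) for key, value in template.items()}


# Hypothetical per_file section of audio_metadata.yaml, already parsed.
per_file_template = {"title": "{index:02d} - {filename}", "album": "{session_name}"}

tags = render_tags(
    per_file_template,
    index=3,
    filename="intro",
    session_name="MyAudioSession",
)
# tags == {"title": "03 - intro", "album": "MyAudioSession"}

output_filename = "{session_name}_joined_{total_files}.mp3".format(
    session_name="MyAudioSession", total_files=12
)
# output_filename == "MyAudioSession_joined_12.mp3"
```

Standard format specifiers such as {index:02d} work as usual, which is handy for zero-padded track titles.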
- The sessions/ and outputs/ directories are treated as user data (per-session configs and generated artifacts) and are ignored by git via .gitignore.
- The canonical configuration and documentation files tracked in the repository are:
  - session.example.yaml – template for new session configs.
  - audio_metadata.example.yaml – template for audio joiner metadata (ID3 and silence gap).
  - env.example – template for environment variables.
  - prompts.yaml – shared prompt definitions.
  - README.md – this documentation.
You can quickly verify OpenAI connectivity and models using the small scripts in scripts/.
- Chat demo
  python -m scripts.openai_chat_demo
  Uses OPENAI_API_KEY and OPENAI_MODEL to send a short prompt (in Spanish) and print the response.
- Audio transcription demo
  python -m scripts.openai_audio_transcription_demo path/to/audio_or_video.mp4
  Uses OPENAI_TRANSCRIPTION_MODEL to transcribe a single file and print the text.
- List available models
  python -m scripts.openai_list_models
  Lists all models available for your OPENAI_API_KEY, sorted alphabetically.
These scripts are for manual testing only and are not part of the automated pytest suite.
- Unit tests (fast, CI‑friendly):
  pytest
  This runs tests in tests/ such as:
  - test_config.py – configuration defaults and custom env handling.
  - test_downloader.py – downloader behaviour using a fake yt-dlp.
- Coverage:
  pytest --cov=src/learning_session_transcriber
- OpenAI integration test (optional):
  tests/test_transcriber.py is marked as an integration test and:
  - Skips automatically when OPENAI_API_KEY is not set.
  - Calls the real OpenAI transcription API with a small dummy video file.

  To run only integration tests, set your key and use a marker, for example:
  export OPENAI_API_KEY=sk-...
  pytest -m integration
  (You can further customise markers in pytest.ini if needed.)
- KISS: prefer simple, explicit steps and file structures over heavy frameworks.
- Layered: keep configuration, session description, downloading, and transcription separated.
- Config via env: Config.from_env() is the single source of truth; no secrets in code.
- Testable: downloader is testable without network; OpenAI integration is opt‑in and clearly marked.
- Scriptable: small python -m ... entry points instead of complex CLIs, so you can compose steps however you like.