The Export module (`src/journal_utilities/export/`) converts transcripts into five distribution formats: plaintext, Markdown, JSON, HTML, and PDF.
The module uses a functional strategy pattern: each `ExportFormat` is mapped to a writer function. There is no class; all logic lives in top-level functions.
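A minimal sketch of the format-to-writer mapping, assuming each writer takes the transcript text and returns the rendered file content. The `ExportFormat` members shown, the `render_*` functions, `WRITERS`, and `render` are hypothetical stand-ins, not the module's actual definitions.

```python
import json
from enum import Enum
from typing import Callable


class ExportFormat(Enum):
    # Hypothetical stand-in for the module's ExportFormat enum;
    # the real members and values may differ.
    PLAINTEXT = "plaintext"
    JSON = "json"


def render_plaintext(transcript: str) -> str:
    # Illustrative only: the real writer also prepends a metadata header.
    return transcript


def render_json(transcript: str) -> str:
    return json.dumps({"transcript": transcript}, ensure_ascii=False)


# The strategy table: each format maps to a plain function, no class needed.
WRITERS: dict[ExportFormat, Callable[[str], str]] = {
    ExportFormat.PLAINTEXT: render_plaintext,
    ExportFormat.JSON: render_json,
}


def render(fmt: ExportFormat, transcript: str) -> str:
    """Look up the writer registered for `fmt` and apply it."""
    try:
        writer = WRITERS[fmt]
    except KeyError:
        raise ValueError(f"No writer registered for {fmt}") from None
    return writer(transcript)
```

With this shape, supporting a new format means adding one enum member, one function, and one dictionary entry.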
- Input: Directory of `.txt` transcript files.
- Output: Subdirectories for each format (`pdf/`, `json/`, etc.).
- Concurrency: Sequential processing (the workload is both CPU-bound and I/O-bound).
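The input/output layout above can be sketched as a plain sequential loop. `export_directory` and the shape of its `writers` mapping (format name to file extension plus writer) are hypothetical; the module's own entry point is `export_transcripts`.

```python
from pathlib import Path
from typing import Callable, Mapping

# Hypothetical writer signature: transcript text in, file content out.
Writer = Callable[[str], str]


def export_directory(
    transcript_dir: Path,
    output_dir: Path,
    writers: Mapping[str, tuple[str, Writer]],
) -> list[Path]:
    """Export every .txt transcript into one subdirectory per format.

    `writers` maps a format name (used as the subdirectory, e.g. "json")
    to (file extension, writer). Files are processed one at a time,
    with no threads or processes, mirroring the sequential design.
    """
    written: list[Path] = []
    for src in sorted(transcript_dir.glob("*.txt")):
        text = src.read_text(encoding="utf-8")
        for fmt_name, (ext, writer) in writers.items():
            fmt_dir = output_dir / fmt_name
            fmt_dir.mkdir(parents=True, exist_ok=True)
            dest = fmt_dir / f"{src.stem}{ext}"
            dest.write_text(writer(text), encoding="utf-8")
            written.append(dest)
    return written
```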
| Format | Extension | Library | Implementation Details |
|---|---|---|---|
| Plaintext | `.txt` | Built-in | Metadata header (title, category, speakers, duration, URL), UTF-8. |
| Markdown | `.md` | Built-in | YAML frontmatter (title, category, series, episode, speakers, duration, URL, views). |
| JSON | `.json` | Built-in | Structured data including metadata, word count, and transcript. |
| HTML | `.html` | Built-in | Standalone file with embedded CSS, metadata subtitle, and YouTube link. |
| PDF | `.pdf` | fpdf2 | Clean typography with title + metadata subtitle, sanitized Unicode. |
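"Sanitized Unicode" in the PDF row matters because fpdf2's built-in core fonts (such as Helvetica) only cover Latin-1. A minimal sketch of one way to sanitize text before passing it to such a font, assuming the exporter uses a core font rather than an embedded TTF; the replacement table and `sanitize_for_core_font` are illustrative, not the module's code.

```python
# Common typographic characters outside Latin-1, mapped to ASCII look-alikes.
# This table is an illustrative assumption, not the module's actual list.
_REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
    "\u2026": "...",                # horizontal ellipsis
}


def sanitize_for_core_font(text: str) -> str:
    """Make text safe for a Latin-1-only core font."""
    for bad, good in _REPLACEMENTS.items():
        text = text.replace(bad, good)
    # Anything still outside Latin-1 becomes "?" instead of raising an error.
    return text.encode("latin-1", errors="replace").decode("latin-1")
```

Latin-1 accented letters such as `é` pass through unchanged; characters from other scripts degrade to `?`.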
The export module is typically invoked via the main `run.py` runner:

```bash
# Export all configured formats
python run.py export
```

It can also be called programmatically:

```python
from pathlib import Path

from journal_utilities.export.exporter import ExportFormat, export_transcripts

export_transcripts(
    transcript_dir=Path("data/output/transcripts"),
    output_dir=Path("data/export"),
    formats=[ExportFormat.PDF, ExportFormat.MARKDOWN],
    data_dir=Path("data/output"),  # enables metadata enrichment
)
```

When `data_dir` is provided, the exporter loads `channel_videos.json` and enriches every export with:
- Title: Full video title (replaces YouTube ID)
- Category/Series/Episode: From the categorizer heuristics
- Speakers: Parsed from titles (text after `~`)
- Duration: Formatted as `H:MM:SS`
- URL: Direct YouTube link
- Views: View count
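Most enrichment fields are direct lookups; duration and speakers need a little parsing. A sketch under stated assumptions: durations are stored in seconds, speakers after `~` are separated by commas or `&`, and the URL uses the standard `watch?v=` form. `format_duration`, `parse_speakers`, and `youtube_url` are hypothetical names.

```python
import re


def format_duration(seconds: int) -> str:
    """Format a duration in seconds as H:MM:SS (hours are not zero-padded)."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def parse_speakers(title: str) -> list[str]:
    """Return the speakers named after '~' in a title, if any.

    Assumes speakers are separated by ',' or '&'; the real parser's
    delimiter handling may differ.
    """
    if "~" not in title:
        return []
    tail = title.split("~", 1)[1]
    return [name.strip() for name in re.split(r"[,&]", tail) if name.strip()]


def youtube_url(video_id: str) -> str:
    """Build a direct watch link from a YouTube video ID."""
    return f"https://www.youtube.com/watch?v={video_id}"
```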
In `config.ini`:

```ini
[export]
# Enable/disable specific formats
plaintext = true
markdown = true
pdf = true
json = true
html = true
# Output directory relative to project root
output_dir = data/export
```
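A sketch of reading this section with Python's standard `configparser`. `load_export_settings`, `FORMAT_KEYS`, and treating a missing format key as enabled are assumptions, not the module's documented behavior.

```python
import configparser
from pathlib import Path

# Hypothetical list of the format switches shown in the [export] section.
FORMAT_KEYS = ("plaintext", "markdown", "pdf", "json", "html")


def load_export_settings(config_text: str, project_root: Path) -> tuple[list[str], Path]:
    """Return (enabled format names, output directory) from an [export] section."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = parser["export"]
    # getboolean accepts true/false, yes/no, on/off, 1/0.
    # Missing keys default to enabled here (an assumption).
    enabled = [key for key in FORMAT_KEYS if section.getboolean(key, fallback=True)]
    # output_dir is resolved relative to the project root, as the comment states.
    output_dir = project_root / section.get("output_dir", fallback="data/export")
    return enabled, output_dir
```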