GBrain supports storage tiering to separate version-controlled content from bulk machine-generated data. This prevents git repositories from becoming bloated with large amounts of automatically generated content while still preserving it in the database.
Note on naming: prior to v0.22.11 the keys were git_tracked/supabase_only. The canonical names are now db_tracked/db_only (engine-agnostic: they work on both PGLite and Postgres). The deprecated keys still load with a once-per-process warning. Run gbrain doctor --fix for an automated rename when that path lands.
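As a rough illustration of the deprecated-key fallback, here is a minimal Python sketch. It is not gbrain's code: the function name canonicalize_storage_keys and the warning text are hypothetical, and only the four key names come from the note above.

```python
import warnings

# Deprecated key -> canonical key (key names taken from the naming note).
DEPRECATED_KEYS = {"git_tracked": "db_tracked", "supabase_only": "db_only"}

_warned = False  # module-level flag so the warning fires at most once per process


def canonicalize_storage_keys(storage: dict) -> dict:
    """Return a copy of the storage section that uses canonical key names."""
    global _warned
    result = {}
    for key, value in storage.items():
        canonical = DEPRECATED_KEYS.get(key, key)
        if canonical != key and not _warned:
            # Hypothetical message; only the first deprecated key seen is named.
            warnings.warn(f"storage key '{key}' is deprecated; use '{canonical}'")
            _warned = True
        result[canonical] = value
    return result
```

Calling canonicalize_storage_keys({"git_tracked": ["people/"]}) would return {"db_tracked": ["people/"]} and warn on the first such call only.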
Add a storage section to your gbrain.yml file in the brain repository root:
storage:
  # Directories that are version-controlled (human-edited, committed to git).
  db_tracked:
    - people/
    - companies/
    - deals/
    - concepts/
    - yc/
    - ideas/
    - projects/
  # Directories persisted via the brain database only (bulk machine-generated
  # content). Written to disk as a local cache but not committed to git;
  # `gbrain sync` auto-manages .gitignore for these paths. `gbrain export
  # --restore-only` repopulates missing files from the database.
  db_only:
    - media/x/
    - media/articles/
    - meetings/transcripts/

Path requirements:
- Each directory must end with / for canonical form. The validator auto-normalizes missing trailing slashes (a one-time info note shows what changed).
- A directory cannot appear in both tiers. That is a tier-overlap error, and loadStorageConfig throws StorageConfigError. Edit gbrain.yml to remove the overlap and try again.
When storage configuration is present, gbrain sync automatically manages .gitignore entries on every successful sync:
- Adds missing db_only directory patterns to .gitignore.
- Idempotent: re-running adds no duplicate entries.
- Stable comment header so the managed block is grep-able.
- Skipped on --dry-run (preview mode does not mutate disk).
- Skipped on blocked_by_failures status (sync state is inconsistent).
- Skipped when the repo is a git submodule (.git is a file, not a directory), because submodule .gitignore changes don't survive parent updates. A warning explains.
- Skipped entirely when GBRAIN_NO_GITIGNORE=1 is set (escape hatch for shared-repo setups where a maintainer wants gbrain to leave .gitignore alone).
- Failures (write permission denied, etc.) are caught and logged, never crash sync.
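The rules above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not gbrain's implementation: update_gitignore and the logger name are hypothetical, the blocked_by_failures check is left out because it depends on sync state, and new entries are simply appended at the end of the file.

```python
import logging
import os
from pathlib import Path

HEADER = "# Auto-managed by gbrain (db_only directories)"
log = logging.getLogger("gitignore-sketch")  # hypothetical logger name


def update_gitignore(repo: Path, db_only: list[str], dry_run: bool = False) -> list[str]:
    """Append db_only patterns missing from .gitignore; return what was added."""
    if dry_run or os.environ.get("GBRAIN_NO_GITIGNORE") == "1":
        return []
    if (repo / ".git").is_file():  # submodule: .git is a file, not a directory
        log.warning("submodule detected; leaving .gitignore alone")
        return []
    path = repo / ".gitignore"
    try:
        existing = path.read_text().splitlines() if path.exists() else []
        present = {line.strip() for line in existing}
        missing = [d for d in db_only if d not in present]
        if not missing:
            return []  # idempotent: a re-run finds nothing to add
        header = [] if HEADER in present else [HEADER]
        path.write_text("\n".join(existing + header + missing) + "\n")
        return missing
    except OSError as err:
        # Failures are logged and swallowed so the caller never crashes.
        log.error("could not update .gitignore: %s", err)
        return []
```

Running it twice with the same db_only list writes the header and entries once; the second call returns an empty list and leaves the file untouched.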
Example .gitignore addition:
# Auto-managed by gbrain (db_only directories)
media/x/
media/articles/
meetings/transcripts/

# Restore only missing db_only files from the database.
gbrain export --restore-only --repo /path/to/brain
# Filter by page type.
gbrain export --restore-only --type media --repo /path/to/brain
# Filter by slug prefix.
gbrain export --restore-only --slug-prefix media/x/ --repo /path/to/brain
# Combine filters.
gbrain export --restore-only --type media --slug-prefix media/x/ --repo /path/to/brain

The --restore-only flag:
- Resolves repoPath via the chain --repo → typed sources.getDefault() → hard error. Never falls through to the current directory.
- Only exports pages that match db_only patterns AND are missing from disk.
- Ideal for container restart recovery and fresh clones.
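The selection logic described above can be approximated as follows. This is a sketch with labeled assumptions, not the actual export code: resolve_repo and pages_to_restore are hypothetical names, matching is assumed to be a plain slug-prefix test, and the .md file extension is assumed for illustration.

```python
from pathlib import Path
from typing import Optional


def resolve_repo(cli_repo: Optional[str], default_source: Optional[str]) -> Path:
    """Use --repo first, then the configured default source, else fail hard."""
    for candidate in (cli_repo, default_source):
        if candidate:
            return Path(candidate)
    # Deliberately no fallback to the current directory.
    raise SystemExit("no brain repo: pass --repo or configure a default source")


def pages_to_restore(repo: Path, slugs: list[str], db_only: list[str],
                     ext: str = ".md") -> list[str]:
    """Slugs under a db_only directory whose file is absent from disk."""
    return [
        slug for slug in slugs
        if any(slug.startswith(prefix) for prefix in db_only)  # assumed prefix match
        and not (repo / (slug + ext)).exists()                 # assumed extension
    ]
```

Both conditions must hold: a page outside db_only is never restored, and a db_only page already cached on disk is skipped.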
# Human-readable status.
gbrain storage status --repo /path/to/brain
# JSON output for scripts and orchestrators.
gbrain storage status --repo /path/to/brain --json

Output includes:
- Total page counts by storage tier.
- Disk usage breakdown by tier.
- Missing files that need restoration (top 10 shown; full list in --json).
- Configuration validation warnings.
- Current tier directory listing.
Example output:
Storage Status
==============
Repository: /data/brain
Total pages: 15,243
Storage Tiers:
-------------
DB tracked: 2,156 pages
DB only: 12,887 pages
Unspecified: 200 pages
Disk Usage:
-----------
DB tracked: 45.2 MB
DB only: 2.1 GB
Missing Files (need restore):
-----------------------------
media/x/tweet-1234567890
media/x/tweet-0987654321
... and 47 more
Use: gbrain export --restore-only --repo "/data/brain"
Configuration:
--------------
DB tracked directories:
- people/
- companies/
- deals/
DB-only directories:
- media/x/
- media/articles/
- meetings/transcripts/
loadStorageConfig runs normalizeAndValidateStorageConfig after parsing:
- Auto-fixes (silent, with one-time info note showing what changed):
  - Missing trailing / is added: 'media/x' → 'media/x/'.
- Throws StorageConfigError (caller sees a clean exit-1 with actionable message):
  - Same directory in both db_tracked and db_only (ambiguous routing).
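A minimal Python sketch of the two validation behaviors, normalizing trailing slashes and rejecting tier overlap, might look like this. It is an approximation for illustration only: normalize_and_validate is a hypothetical name, the StorageConfigError class here is a stand-in for the error type named above, and overlap is assumed to mean an exact directory match after normalization.

```python
class StorageConfigError(Exception):
    """Stand-in for the error type named in the text."""


def normalize_and_validate(db_tracked: list[str], db_only: list[str]):
    """Return (db_tracked, db_only, notes) with trailing slashes enforced."""
    notes: list[str] = []  # one entry per auto-fix, for the info note

    def with_slash(directory: str) -> str:
        if directory.endswith("/"):
            return directory
        fixed = directory + "/"
        notes.append(f"'{directory}' -> '{fixed}'")
        return fixed

    tracked = [with_slash(d) for d in db_tracked]
    only = [with_slash(d) for d in db_only]

    # Assumed rule: the same normalized directory in both tiers is an error.
    overlap = sorted(set(tracked) & set(only))
    if overlap:
        raise StorageConfigError(
            f"in both db_tracked and db_only: {', '.join(overlap)}"
        )
    return tracked, only, notes
```

Normalizing before the overlap check matters: 'media/x' in one tier and 'media/x/' in the other name the same directory and should be rejected together.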
Perfect for brain repositories crossing 50K-200K+ files where:
- Core knowledge (people, companies, deals) remains git-tracked.
- Bulk data (tweets, articles, transcripts) moves to db_only.
- Development stays fast with smaller git repos.
- Full data remains available via the database.
Essential for ephemeral container environments:
- Git repo contains only essential files.
- Container restarts don't lose db_only data.
- gbrain export --restore-only quickly restores bulk files when needed.
- Local disk acts as a cache layer.
Enables consistent data access across environments:
- Development: small git clone, restore bulk data on demand.
- Production: full dataset via the database, selective local caching.
- CI/CD: fast tests with git-tracked data only.
- Assess current repository: use gbrain storage status to understand current distribution.
- Plan directory structure: identify which directories should be db_tracked vs db_only.
- Create gbrain.yml: add storage configuration to the repository root.
- Test with dry-run: gbrain sync --dry-run to verify behavior; .gitignore is NOT touched on dry-run.
- Run a real sync: gbrain sync updates .gitignore automatically on success.
- Verify restore: test gbrain export --restore-only --repo . against a small db_only directory.
- Directory naming: end storage paths with / (canonical form). The validator normalizes if you forget.
- Start small: begin with clearly machine-generated directories in db_only.
- Address validation errors: tier overlap is an error, not a warning. Fix it before sync.
- Test restore: regularly test --restore-only in staging environments.
- Document decisions: comment your gbrain.yml to explain tier choices.
On the PGLite engine (gbrain's local-only embedded Postgres), the "DB" your db_only pages live in IS the local file gbrain uses for everything else. The .gitignore housekeeping still helps (keeps bulk content out of git history), but the offload-to-DB promise is technically vacuous. A once-per-process soft-warn explains when the engine is detected. To get full tiering, migrate to Postgres with gbrain migrate --to supabase.
- Backward compatible: systems without gbrain.yml work unchanged.
- Progressive enhancement: add configuration when needed.
- Database unchanged: all data remains in Postgres regardless of tier.
- Existing workflows: all existing sync and export behavior preserved.
- Deprecated keys: git_tracked/supabase_only still load with a once-per-process warning.