Threedv/llm_text

llm_text

Small Python utility to turn a local directory into one LLM-friendly text digest.

It walks a directory, records the tree of included files, and appends the content of each included text file to a single output text file.

Useful when:

  • a repo is private
  • you want to share project context with an LLM without pushing the repo
  • you want a simple local alternative to remote ingestion tools

Install

python -m pip install -r requirements.txt

What it does

  • scans a local directory only
  • applies the source directory's .gitignore by default
  • skips common cache / build / binary files
  • includes code, config, and doc files up to a configurable size limit
  • supports include / exclude glob filters
  • can convert .ipynb notebooks into readable text
  • estimates tokens with tiktoken using the o200k_base encoding
  • can print lightweight scan progress
  • writes to a file or stdout
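The scanning behaviour listed above can be pictured with a minimal sketch. This is not the code in local_dir_ingest.py: the names `SKIP_DIRS`, `looks_binary`, and `scan` are hypothetical, the skip list is illustrative, and the NUL-byte check is an assumed heuristic for spotting binary files. The .gitignore handling and glob filters are left out.

```python
import os
from pathlib import Path

# Hypothetical defaults for illustration; the tool's real lists may differ.
SKIP_DIRS = {".git", "node_modules", "dist", "build", ".venv", "__pycache__"}
MAX_FILE_BYTES = 50 * 1024  # mirrors the documented 50 KB default


def looks_binary(path: Path, probe_bytes: int = 1024) -> bool:
    """Assumed heuristic: a NUL byte near the start suggests a binary file."""
    with path.open("rb") as handle:
        return b"\x00" in handle.read(probe_bytes)


def scan(root: Path, max_bytes: int = MAX_FILE_BYTES) -> list[Path]:
    """Return included files under root as sorted paths relative to root."""
    included = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk never descends into them.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            path = Path(dirpath) / name
            # Skip files over the size limit and files that look binary.
            if path.stat().st_size > max_bytes or looks_binary(path):
                continue
            included.append(path.relative_to(root))
    return sorted(included)
```

Pruning `dirnames` in place is what keeps large directories such as node_modules from being traversed at all, rather than being walked and then filtered out.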

Requirements

  • Python 3.9+
  • pathspec>=0.12.1
  • tiktoken>=0.7.0

Quick start

python local_dir_ingest.py /path/to/project

This writes digest.txt in the current directory.

Basic usage

Write digest to a file

python local_dir_ingest.py /path/to/project -o project_digest.txt

Print digest to stdout

python local_dir_ingest.py /path/to/project -o -

Show scan progress

python local_dir_ingest.py /path/to/project --progress

Increase max file size

python local_dir_ingest.py /path/to/project --max-file-kb 200

Include only selected file types

python local_dir_ingest.py /path/to/project -i "*.py" -i "*.md"

Exclude paths or file patterns

python local_dir_ingest.py /path/to/project -e "data/*" -e "*.pt"
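How `-i` and `-e` patterns might combine can be sketched with the standard library's `fnmatch`. The README does not state the tool's exact matching rules, so this sketch makes two assumptions: a pattern is tried against both the relative path and the bare file name, and an exclude match always wins over an include match. The function name `is_selected` is hypothetical.

```python
from fnmatch import fnmatch


def is_selected(rel_path: str, include: list[str], exclude: list[str]) -> bool:
    """Decide whether a POSIX-style relative path passes the glob filters."""
    name = rel_path.rsplit("/", 1)[-1]

    def matches(pattern: str) -> bool:
        # Assumption: a pattern may match the full relative path or the file name.
        return fnmatch(rel_path, pattern) or fnmatch(name, pattern)

    # Assumption: exclusion takes precedence over inclusion.
    if any(matches(pattern) for pattern in exclude):
        return False
    return any(matches(pattern) for pattern in include)


include = ["*.py", "*.md"]
exclude = ["data/*", "*.pt"]
print(is_selected("src/app.py", include, exclude))     # kept
print(is_selected("data/train.py", include, exclude))  # dropped by data/*
```

Under these assumptions `data/*` only matches a top-level `data` directory; a nested `src/data/x.py` would still be included.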

Include notebook outputs

python local_dir_ingest.py /path/to/project --include-notebook-output
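As a rough illustration of what converting a notebook into readable text can involve, the sketch below parses the .ipynb JSON with the standard library and emits one labelled block per cell. The function name `notebook_to_text`, the `# Cell` and `# Output:` labels, and the choice to keep only textual outputs are assumptions for this example, not the tool's actual format.

```python
import json


def notebook_to_text(raw_json: str, include_output: bool = False) -> str:
    """Flatten a notebook's cells (and optionally text outputs) into plain text."""
    notebook = json.loads(raw_json)
    parts = []
    for index, cell in enumerate(notebook.get("cells", []), start=1):
        kind = cell.get("cell_type", "unknown")
        # "source" may be a single string or a list of lines; join handles both.
        source = "".join(cell.get("source", []))
        parts.append(f"# Cell {index} [{kind}]\n{source}")
        if include_output and kind == "code":
            for output in cell.get("outputs", []):
                # Stream outputs carry "text"; results carry data["text/plain"].
                text = output.get("text") or output.get("data", {}).get("text/plain")
                if text:
                    parts.append("# Output:\n" + "".join(text))
    return "\n\n".join(parts)
```

Image and HTML outputs are ignored here, which keeps the digest small; only the plain-text representation is kept when outputs are requested.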

Follow symlinked directories

python local_dir_ingest.py /path/to/project --follow-symlinks

Ignore .gitignore

python local_dir_ingest.py /path/to/project --no-gitignore

Output format

The generated digest contains:

  1. source directory summary
  2. traversal stats
  3. included directory tree
  4. file-by-file content blocks

Example:

Directory: /path/to/project
Scanned directories: 24
Scanned files: 312
Files analyzed: 12
Included bytes: 41,203 (40.2 KB)
Estimated tokens: 12.3k
...

Directory structure:
└── project/
    ├── README.md
    ├── app.py
    └── utils/
        └── helpers.py

================================================
FILE: README.md
================================================
...
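The tree and file-block layout in the example above can be reproduced with a short sketch. The helper names `render_tree` and `render_file_block` are hypothetical, and two details are inferred from the example rather than confirmed by the source: files are listed before subdirectories, and the separator is 48 `=` characters wide.

```python
SEPARATOR = "=" * 48  # width inferred from the example output, an assumption


def render_tree(root_name: str, rel_paths: list[str]) -> str:
    """Draw a box-character tree from POSIX-style relative file paths."""
    tree: dict = {}
    for rel in rel_paths:
        node = tree
        for part in rel.split("/"):
            node = node.setdefault(part, {})  # files end up as empty dicts

    lines = [f"└── {root_name}/"]

    def walk(node: dict, prefix: str) -> None:
        # Assumed ordering: files first, then directories, each alphabetical.
        entries = sorted(node.items(), key=lambda item: (bool(item[1]), item[0]))
        for position, (name, child) in enumerate(entries):
            last = position == len(entries) - 1
            label = name + "/" if child else name
            lines.append(prefix + ("└── " if last else "├── ") + label)
            if child:
                walk(child, prefix + ("    " if last else "│   "))

    walk(tree, "    ")
    return "\n".join(lines)


def render_file_block(rel_path: str, content: str) -> str:
    """Wrap one file's content in the FILE header shown in the example."""
    return f"{SEPARATOR}\nFILE: {rel_path}\n{SEPARATOR}\n{content}"


print(render_tree("project", ["README.md", "app.py", "utils/helpers.py"]))
```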

Notes

  • default max file size is 50 KB
  • default include patterns cover common code, config, and docs such as *.py, *.sh, *.md, *.txt, *.yaml, *.json, and *.toml
  • the source directory's .gitignore is applied by default
  • binary / media / archive files are skipped
  • common directories like .git, node_modules, dist, build, and virtual envs are skipped
  • token estimates use the same o200k_base tokenizer approach as gitingest
  • if no files match, the tool still writes a valid digest with a message
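The token estimate mentioned in these notes can be approximated as follows. `tiktoken.get_encoding("o200k_base")` and `Encoding.encode` are real tiktoken calls; everything else is an assumption for illustration: the function names, the four-characters-per-token fallback used when tiktoken or its encoding data is unavailable, and the `12.3k` style rounding, which is inferred from the example output.

```python
def rough_token_estimate(text: str) -> int:
    """Fallback heuristic (not the tool's method): about 4 characters per token."""
    return len(text) // 4


def estimate_tokens(text: str) -> int:
    """Count tokens with o200k_base, falling back to the rough heuristic."""
    try:
        import tiktoken  # listed in the Requirements section

        encoding = tiktoken.get_encoding("o200k_base")
        # disallowed_special=() treats special-token text as ordinary text.
        return len(encoding.encode(text, disallowed_special=()))
    except Exception:
        # tiktoken missing, or its encoding data could not be loaded.
        return rough_token_estimate(text)


def format_token_count(count: int) -> str:
    """Render a count compactly, e.g. 12300 as '12.3k'."""
    if count >= 1_000_000:
        return f"{count / 1_000_000:.1f}M"
    if count >= 1_000:
        return f"{count / 1_000:.1f}k"
    return str(count)
```

For example, `format_token_count(estimate_tokens(digest_text))` would produce the kind of value shown on the "Estimated tokens" line, where `digest_text` stands for the assembled digest string.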

Combined example

python local_dir_ingest.py ~/dev/my_private_repo -i "*.py" -i "*.md" -e "data/*" -o digest.txt

License

MIT
