- Python 3.10+
- pip (included with Python)
- Virtual environment (recommended)
# Clone and install
git clone https://github.com/codefzer/QuickCat.git
cd QuickCat
# Create virtual environment (optional but recommended)
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Verify installation
pytest -v
QuickCat/
├── shared-resources/ # Core utilities (stdlib + pymarc only)
│ └── scripts/
│ ├── quickcat_loader.py # Bootstrap — registers all cross-package imports
│ ├── config_loader.py # Cached JSON config loaders (config.json, servers.json, validation-rules.json)
│ ├── marc_io.py # read_mrc() / write_mrc() helpers
│ ├── marc_utils.py # nfc() NFC normalizer, similarity() scorer
│ ├── normalize_dates.py
│ ├── parse_marc.py
│ └── transaction_log.py
├── skills/ # Feature modules (loaded on-demand)
│ ├── copy-cataloger/ # Metadata harvesting & consensus
│ ├── batch-cleaner/ # MARC record sanitization
│ ├── brief-to-full-enhancer/ # AI metadata enhancement
│ ├── authority-grounder/ # Authority heading matching
│ ├── marc-importer/ # Excel/MARC import
│ ├── marc-exporter/ # MARC export & validation
│ ├── vision-to-marc/ # OCR to MARC conversion
│ └── record-rollback/ # Revision management
├── tests/
│ ├── conftest.py # Module shim + fixtures
│ ├── (unit tests by skill)
│ └── integration/ # End-to-end pipeline tests
├── shared-resources/scripts/quickcat_loader.py # Bootstrap module (solves hyphen/underscore imports)
├── pyproject.toml # Pytest configuration
└── requirements.txt
Problem: Directory names use hyphens (shared-resources, batch-cleaner), but hyphens are not valid in Python identifiers, so an import statement cannot name these directories as packages.
Solution: quickcat_loader.py loads scripts by file path using importlib and registers them in sys.modules:
import importlib.util
import sys
spec = importlib.util.spec_from_file_location("module_name", path)
mod = importlib.util.module_from_spec(spec)
sys.modules["module_name"] = mod
spec.loader.exec_module(mod)
This allows normal imports like from batch_clean import clean_record even though the actual path is skills/batch-cleaner/scripts/batch_clean.py.
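The path-based loading described above can be tried in isolation. The following is a minimal sketch, not the code in quickcat_loader.py: the helper name load_by_path and the demo_clean module are made up for illustration, and the script is written to a temporary directory that only mimics the hyphenated layout.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def load_by_path(name: str, path: Path):
    """Execute the file at `path` and register it in sys.modules under `name`."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot build an import spec for {path}")
    mod = importlib.util.module_from_spec(spec)
    sys.modules[name] = mod  # register first so the name resolves during execution
    spec.loader.exec_module(mod)
    return mod


# Throwaway tree that mimics the hyphenated layout (demo_clean is a made-up name).
root = Path(tempfile.mkdtemp())
script = root / "skills" / "batch-cleaner" / "scripts" / "demo_clean.py"
script.parent.mkdir(parents=True)
script.write_text("def clean_record(rec):\n    return rec.strip()\n")

load_by_path("demo_clean", script)

# A plain import statement now resolves from sys.modules; no sys.path entry is needed.
from demo_clean import clean_record

print(clean_record("  245 $a Title  "))
```

The import statement succeeds because Python checks sys.modules before it searches sys.path, so the hyphenated directory never has to be a package.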
QuickCat uses a 4-layer architecture to minimize startup overhead:
| Layer | When Loaded | Dependencies | Use Case |
|---|---|---|---|
| Eager | On import quickcat_loader | stdlib + pymarc | All scripts need this |
| register_copy_cataloger() | On demand | httpx, tenacity | Metadata harvest/consensus |
| register_tie_breaker() | On demand | (calls copy_cataloger) | Conflict resolution |
| register_batch_cleaner() | On demand | pymarc | Record cleaning |
| register_marc_importer() | On demand | pandas | Excel import only |
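The split between an eager layer and on-demand layers in the table can be sketched as follows. This is an illustration under stated assumptions, not QuickCat's loader: the names _register, register_demo_heavy, demo_eager, and demo_heavy are hypothetical, and the stdlib modules json and csv stand in for the real eager and heavy dependencies.

```python
import importlib
import sys

_registered: set[str] = set()


def _register(alias: str, target: str) -> None:
    """Import `target` once and expose it in sys.modules under `alias`."""
    if alias not in _registered:
        sys.modules[alias] = importlib.import_module(target)
        _registered.add(alias)


# Eager layer: runs as soon as the loader itself is imported.
_register("demo_eager", "json")


def register_demo_heavy() -> None:
    """On-demand layer: only callers of this helper pay the import cost."""
    _register("demo_heavy", "csv")  # csv is a stand-in for pandas or httpx


before = "demo_heavy" in sys.modules
register_demo_heavy()
after = "demo_heavy" in sys.modules
print(before, after)  # False True
```

A script that never calls register_demo_heavy() never triggers the second import, which is the property the on-demand rows in the table rely on.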
Each script calls only the helpers it needs:
import quickcat_loader
quickcat_loader.register_batch_cleaner() # Load batch_clean only
from batch_clean import clean_record # flat name registered by the loader
pytest -v # Run all 146 tests
pytest tests/ --ignore=tests/integration/ # Unit tests only (139)
pytest tests/integration/ -v # Integration tests only (7)
- Unit Tests (139): Isolated tests for each module in tests/*/test_*.py
- Integration Tests (7): End-to-end pipeline tests in tests/integration/
Directory names use hyphens (shared-resources, batch-cleaner) but Python imports use underscores. The solution in tests/conftest.py:
import importlib.util
import sys
def _reg(name, path):
    """Load a script by file path and register in sys.modules."""
    spec = importlib.util.spec_from_file_location(name, path)
    mod = importlib.util.module_from_spec(spec)
    sys.modules[name] = mod
    spec.loader.exec_module(mod)
This allows test files to use normal imports like import import_pipeline even though the actual path is skills/marc-importer/scripts/import_pipeline.py.
QuickCat uses lazy loading to avoid unnecessary dependencies:
- Eager layer (always loaded): normalize_dates, transaction_log, parse_marc, config_loader, marc_io, marc_utils (stdlib + pymarc)
- On-demand layers (loaded when needed):
  - register_copy_cataloger() – heavy: httpx, tenacity
  - register_batch_cleaner() – needs pymarc
  - register_marc_importer() – heavy: pandas
Scripts only load what they need. Tests verify this isolation (see test_import_pipeline_e2e.py).
| What | How | Where |
|---|---|---|
| httpx calls | respx fixtures | test_harvest_metadata.py |
| Anthropic API | monkeypatch _call_claude | test_enhance_record.py |
| File I/O | tmp_path fixture | All file tests |
| Config loading | monkeypatch.setattr | Integration tests |
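As an illustration of the monkeypatch and tmp_path rows, here is a minimal sketch of a test in that style. The enhancer namespace and the summarize function are hypothetical stand-ins for a skill module; only the name _call_claude comes from the table, and its signature here is an assumption.

```python
import json
import types

# Hypothetical stand-in for a skill module; the real enhance_record script is not shown here.
enhancer = types.SimpleNamespace()


def _call_claude(prompt: str) -> str:
    raise RuntimeError("network access is not allowed in unit tests")


def summarize(title: str) -> str:
    # Looks up _call_claude on the module at call time, so a patch takes effect.
    return enhancer._call_claude(f"Summarize: {title}")


enhancer._call_claude = _call_claude
enhancer.summarize = summarize


def test_summarize_uses_stub(monkeypatch, tmp_path):
    """pytest injects both fixtures; no API call and no file outside tmp_path."""
    monkeypatch.setattr(enhancer, "_call_claude", lambda prompt: "stub summary")
    out = tmp_path / "summary.json"
    out.write_text(json.dumps({"520": enhancer.summarize("Moby Dick")}))
    assert json.loads(out.read_text()) == {"520": "stub summary"}
```

Under pytest, monkeypatch undoes the patch after the test and tmp_path is a fresh directory per test, so neither the stub nor the file can leak into other tests.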
- @pytest.mark.asyncio for async functions
- pytest-asyncio configured in pyproject.toml with asyncio_mode = "auto"
- No manual asyncio.run() needed in test definitions
| Skill | Tests | Focus |
|---|---|---|
| shared_resources | 34 | Config loading, MARC I/O, utils, date normalization, MARC parsing, revision log |
| copy_cataloger | 17 | ISBN/LCCN validation, metadata consensus |
| batch_cleaner | 11 | Record sanitization, field deletion, org codes |
| brief_to_full_enhancer | 10 | AI-powered summaries, field stamping |
| authority_grounder | 8 | Authority heading matching, LC API mocking |
| marc_exporter | 7 | Record classification, validation |
| marc_importer | 9 | Name inversion, column mapping, CSV parsing |
| vision_to_marc | 6 | OCR → MARC conversion, template application |
| record_rollback | 5 | Revision log parsing, record restoration |
| Integration | 7 | Full pipeline workflows, output validation |
| TOTAL | 146 | Comprehensive coverage |
- Create test file in appropriate tests/SKILL_NAME/ directory
- Import modules directly: import module_name (conftest handles paths)
- Use shared fixtures from conftest.py: sample_record, tmp_path, etc.
- For async tests, use @pytest.mark.asyncio
- For httpx mocking, use respx (already in requirements.txt)
- tests/conftest.py – Module shim + shared fixtures
- pyproject.toml – pytest config with asyncio_mode="auto"
- requirements.txt – Must include: pytest, pytest-asyncio, respx, pymarc
To verify a script only loads expected dependencies:
# Check what gets loaded with lightweight input
python3 skills/marc-importer/scripts/import_pipeline.py sample.mrc
# Should NOT import pandas/excel libraries
Test: test_import_pipeline_e2e.py verifies ISO-2709 path never touches Excel stack.
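One way to automate this kind of check is to run the code in a fresh interpreter and inspect sys.modules afterwards. The sketch below is an assumption about how such a probe could look, not the logic in test_import_pipeline_e2e.py: the helper modules_loaded_by is hypothetical, and json and csv stand in for a lightweight code path and the Excel stack.

```python
import subprocess
import sys


def modules_loaded_by(statement: str, watch: list[str]) -> dict[str, bool]:
    """Run `statement` in a fresh interpreter and report which watched modules it imported."""
    probe = "\n".join([
        "import sys",
        statement,
        f"print('LOADED:' + ','.join(m for m in {watch!r} if m in sys.modules))",
    ])
    result = subprocess.run(
        [sys.executable, "-c", probe], capture_output=True, text=True, check=True
    )
    # The statement may print its own output, so pick out the marker line.
    marker = [ln for ln in result.stdout.splitlines() if ln.startswith("LOADED:")][-1]
    loaded = set(marker[len("LOADED:"):].split(",")) - {""}
    return {name: name in loaded for name in watch}


# Stand-in check: importing json should not pull in csv.
report = modules_loaded_by("import json", ["json", "csv"])
print(report)
```

A subprocess is used because the current interpreter may already have the heavy module loaded from an earlier test, which would hide a regression.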
- Never rename directories – Hyphenated names are integral to the design
- Add imports at module level – Lazy-load heavy dependencies only where used
- Document module boundaries – Comments show which helpers register which modules
- Run tests after changes – pytest -v must pass before committing
# Example: MARC import pipeline with ISO-2709 input
python3 skills/marc-importer/scripts/import_pipeline.py input.mrc --out output.mrc
# With validation profile
python3 skills/marc-importer/scripts/import_pipeline.py input.mrc \
--profile skills/batch-cleaner/assets/default-profile.json
# Check startup overhead for lightweight operations
python3 skills/marc-importer/scripts/import_pipeline.py sample.mrc
# Should NOT import pandas since input is .mrc (ISO-2709)
- Create the skill's scripts in skills/NEW_SKILL/scripts/
- Add unit tests in tests/new_skill/
- If the skill has heavy dependencies, create a register function in quickcat_loader.py
- Add an integration test if the skill is pipeline-relevant
- Update this CLAUDE.md with gotchas
# Create feature branch
git checkout -b feature/my-feature
# Make changes and test
pytest -v
git add .
git commit -m "Add my feature description"
# Push and open PR
git push origin feature/my-feature
Important: All tests must pass before pushing.
Gotcha: Running a script directly (e.g., python3 batch_clean.py) causes the file to be imported twice: once as __main__, and again via quickcat_loader._reg().
Solution: _reg() checks if the target path matches sys.modules["__main__"].__file__ and aliases instead of re-executing. See quickcat_loader.py lines 49-55.
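The guard described above can be sketched as follows. This is an illustration of the idea only; the actual code in quickcat_loader.py is not reproduced here, and the function name reg_once is hypothetical.

```python
import importlib.util
import sys
from pathlib import Path


def reg_once(name: str, path: Path):
    """Register `path` under `name`, reusing __main__ when it is the same file."""
    main = sys.modules.get("__main__")
    main_file = getattr(main, "__file__", None)
    if main_file is not None and Path(main_file).resolve() == Path(path).resolve():
        sys.modules[name] = main  # alias: same module object, body is not executed again
        return main
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot build an import spec for {path}")
    mod = importlib.util.module_from_spec(spec)
    sys.modules[name] = mod
    spec.loader.exec_module(mod)
    return mod
```

Aliasing matters beyond the wasted work: without it, the script would exist as two separate module objects, so module-level state set through one name would be invisible through the other.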
Gotcha: Code at module level in shared modules runs every time the module is loaded.
Solution: Isolate heavy imports behind register_*() helpers so unrelated scripts don't pay the cost. Example:
# harvest_metadata.py imports httpx at module level — this is fine,
# but quickcat_loader defers its registration so only scripts that
# call register_copy_cataloger() load it.
# In quickcat_loader.py:
def register_copy_cataloger():
    """Deferred because harvest_metadata imports httpx/tenacity at module level."""
    _reg("harvest_metadata", ROOT / "skills/copy-cataloger/scripts/harvest_metadata.py")
Gotcha: Scripts define _load_config() at module level but call it inside runtime functions (e.g., orchestrate(), harvest_metadata()). The real config.json must exist at the repo root for production use, but tests should mock it to avoid file I/O.
Solution: Use the shared mock_config_factory fixture from conftest.py:
async def test_example(mock_config_factory):
    mock_config_factory(harvest_orchestrator, {"org_code": "TEST_ORG"})
Gotcha: Testing merge/consensus logic requires multiple metadata sources, otherwise the loop never executes.
Solution: Always test with sources=["loc", "nls"] or similar to trigger records[1:] loop. See test_orchestrate_full_harvest_workflow().
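The reason a single source hides the merge logic shows up in a small sketch. The merge rule below (first source wins, later sources only fill gaps) is a hypothetical placeholder and not QuickCat's consensus algorithm; only the records[1:] loop shape is taken from the text above.

```python
def merge_records(records: list[dict[str, str]]) -> tuple[dict[str, str], int]:
    """Merge field dicts from several sources; return the result and the comparison count."""
    if not records:
        raise ValueError("need at least one source record")
    merged = dict(records[0])
    comparisons = 0
    for other in records[1:]:  # empty when only one source responded
        comparisons += 1
        for tag, value in other.items():
            merged.setdefault(tag, value)  # fill gaps; the first source wins conflicts
    return merged, comparisons


single = merge_records([{"245": "Moby Dick"}])
double = merge_records([{"245": "Moby Dick"}, {"245": "Moby-Dick", "100": "Melville, Herman"}])
print(single[1], double[1])  # 0 1
```

With one record the loop body never runs, so a test built on a single source would pass even if the merge branch were broken.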
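Gotcha: pytest does not await async test functions on its own; pytest-asyncio has to collect them.
Solution: pyproject.toml sets asyncio_mode = "auto", so pytest-asyncio picks up async test functions without a marker and no manual asyncio.run() is needed. Adding @pytest.mark.asyncio is still harmless, and it becomes required if the mode is ever changed to "strict".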