ForgeLM Architecture

ForgeLM is designed with modularity and extensibility in mind. The workflow is broken down into distinct stages, each handled by a dedicated module.

System Overview

forgelm --config job.yaml
    │
    ├── cli/                → CLI package (Phase 15 split)
    │   ├── _parser.py          → 18 subcommands + global flags
    │   ├── _dispatch.py        → Mode dispatcher
    │   ├── _exit_codes.py      → 0/1/2/3/4 contract
    │   └── subcommands/        → Per-subcommand handlers
    │       ├── ingest, audit, chat, export, deploy, doctor,
    │       │   cache, purge, reverse_pii, approve, approvals,
    │       │   safety_eval, verify_audit, verify_annex_iv,
    │       │   verify_gguf, quickstart
    ├── config.py           → Pydantic validation (21 config models)
    ├── utils.py            → HF authentication
    ├── model.py            → Load model + tokenizer + LoRA/PEFT
    ├── data.py             → Load + format dataset
    ├── data_audit/         → Audit package (Phase 14 split)
    │   ├── _orchestrator, _aggregator, _streaming, _simhash,
    │   │   _minhash, _pii_regex, _pii_ml, _secrets, _quality,
    │   │   _croissant, _summary, _splits
    ├── trainer.py          → Train (6 trainer types via TRL)
    │   ├── benchmark.py        → lm-eval-harness evaluation
    │   ├── safety.py           → Llama Guard safety check
    │   ├── judge.py            → LLM-as-Judge scoring
    │   ├── model_card.py       → Auto-generate HF model card
    │   ├── compliance.py       → EU AI Act audit artifacts
    │   └── webhook.py          → Slack/Teams notifications
    ├── merging.py          → Linear/TIES/DARE/SLERP model merge
    ├── synthetic.py        → Synthetic data generation
    └── wizard/             → Interactive config generator (sub-package, Phase 22)

Directory Layout

ForgeLM/
├── forgelm/                # Core Python package (~22 single-file modules + 3 sub-packages)
│   ├── __init__.py         # Lazy imports for fast CLI startup
│   ├── cli/                # CLI sub-package (Phase 15 split)
│   │   ├── _parser.py          # 18 subcommands + global flags
│   │   ├── _dispatch.py        # Mode dispatcher
│   │   ├── _exit_codes.py      # Public 0/1/2/3/4 contract
│   │   └── subcommands/        # Per-subcommand handler modules
│   │       └── _audit, _ingest, _chat, _export, _deploy, _doctor,
│   │           _cache, _purge, _reverse_pii, _approve, _approvals,
│   │           _safety_eval, _verify_audit, _verify_annex_iv,
│   │           _verify_gguf, _quickstart
│   ├── data_audit/         # Data-audit sub-package (Phase 14 split)
│   │   └── _orchestrator, _aggregator, _streaming, _simhash,
│   │       _minhash, _pii_regex, _pii_ml, _secrets, _quality,
│   │       _croissant, _summary, _splits, _types, _optional
│   ├── config.py           # 21 Pydantic config models
│   ├── data.py             # Dataset loading (SFT/DPO/KTO/GRPO/multimodal)
│   ├── ingestion.py        # Raw docs → SFT JSONL (PDF/DOCX/EPUB/TXT/Markdown)
│   ├── model.py            # Model + LoRA/DoRA/PiSSA + MoE detection
│   ├── trainer.py          # Training orchestration (6 trainer types)
│   ├── inference.py        # Shared inference primitives (load/generate/stream)
│   ├── chat.py             # Interactive terminal REPL with slash commands
│   ├── export.py           # GGUF export via llama-cpp-python
│   ├── fit_check.py        # Pre-flight VRAM estimator
│   ├── deploy.py           # Deployment config generator (Ollama/vLLM/TGI/HF Endpoints)
│   ├── results.py          # TrainResult dataclass (no heavy deps)
│   ├── benchmark.py        # lm-evaluation-harness integration
│   ├── safety.py           # Post-training safety evaluation (Llama Guard)
│   ├── judge.py            # LLM-as-Judge (API + local)
│   ├── compliance.py       # EU AI Act compliance + audit log + provenance
│   ├── model_card.py       # HF-compatible model card generation
│   ├── merging.py          # Model merging (TIES/DARE/SLERP/linear)
│   ├── synthetic.py        # Synthetic data generation (teacher→student)
│   ├── grpo_rewards.py     # Built-in GRPO format/length reward shapers
│   ├── quickstart.py       # Bundled one-command templates
│   ├── wizard/             # Interactive configuration wizard (sub-package — Phase 22)
│   ├── webhook.py          # Webhook notifications (Slack/Teams)
│   ├── _http.py            # SSRF-guarded HTTP chokepoint
│   ├── _version.py         # __version__ + __api_version__ (decoupled)
│   └── utils.py            # Authentication + checkpoint management
├── forgelm/templates/      # 5 quickstart template bundles
├── configs/deepspeed/      # ZeRO-2, ZeRO-3, ZeRO-3+Offload presets
├── notebooks/              # 10 Colab-ready Jupyter notebooks
├── tests/                  # ~70 test modules
├── tools/                  # CI guards: bilingual_parity, anchor_resolution,
│                            # cli_help_consistency, yaml_snippets,
│                            # audit_event_catalog, library_api_doc,
│                            # doc_numerical_claims, bilingual_code_blocks
├── docs/                   # Guides, reference docs, QMS templates
│   ├── guides/             # User guides (ingestion, audit, alignment, CI/CD, …)
│   └── qms/                # EU AI Act QMS SOP templates
├── Dockerfile              # Multi-stage Docker build
├── docker-compose.yaml     # Train + TensorBoard services
├── config_template.yaml    # Annotated config example
└── CONTRIBUTING.md         # Contributor guide

Component Details

`cli/`

The orchestrator (Phase 15 split). _parser.py registers 18 subcommands (audit, approve, approvals, reject, cache-models, cache-tasks, chat, deploy, doctor, export, ingest, purge, quickstart, reverse-pii, safety-eval, verify-annex-iv, verify-audit, verify-gguf) plus the legacy training-mode flag set. _dispatch.py routes to the appropriate handler in subcommands/. _exit_codes.py defines the public 0/1/2/3/4 contract.
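
The parser/dispatcher split described above can be pictured roughly as follows. This is a minimal sketch, not ForgeLM's actual code: the handler functions, the `HANDLERS` table, and the meaning attached to each exit code are assumptions, since the text only states that a 0/1/2/3/4 contract exists.

```python
import argparse
from typing import Callable, Dict, List, Optional

# Hypothetical exit codes: the document states a 0-4 contract exists,
# but not what each value means.
EXIT_OK = 0
EXIT_ERROR = 1


def handle_doctor(args: argparse.Namespace) -> int:
    """Stand-in handler; a real one would run environment checks."""
    return EXIT_OK


def handle_audit(args: argparse.Namespace) -> int:
    """Stand-in handler; reports an error when no input path is given."""
    return EXIT_OK if args.path else EXIT_ERROR


# Dispatch table: subcommand name -> handler (the role _dispatch.py plays).
HANDLERS: Dict[str, Callable[[argparse.Namespace], int]] = {
    "doctor": handle_doctor,
    "audit": handle_audit,
}


def build_parser() -> argparse.ArgumentParser:
    """Register subcommands plus the legacy flag (the role _parser.py plays)."""
    parser = argparse.ArgumentParser(prog="forgelm")
    parser.add_argument("--config", help="legacy training-mode entry point")
    sub = parser.add_subparsers(dest="command")
    sub.add_parser("doctor")
    audit = sub.add_parser("audit")
    audit.add_argument("path", nargs="?")
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    if args.command is None:
        # No subcommand given: training mode would start here (not modelled).
        return EXIT_OK if args.config else EXIT_ERROR
    return HANDLERS[args.command](args)
```

Keeping handlers behind a table means a new subcommand only needs a parser entry and one mapping line, and every handler returns its exit code instead of calling `sys.exit` itself.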

`config.py`

21 Pydantic v2 models providing strict validation for all YAML configuration. Includes cross-field validation (e.g., high-risk classification enforces safety evaluation). Config models cover: model, LoRA, training, data, evaluation, safety, benchmark, judge, webhook, distributed, merge, compliance, retention, risk assessment, monitoring, MoE, multimodal, data governance, and synthetic-data generation.
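
As an illustration of the cross-field rule mentioned above, the sketch below uses standard-library dataclasses as a stand-in so it runs without dependencies; in Pydantic v2 the same check would typically live in a `@model_validator(mode="after")`. The class names, field names, and the `"high-risk"` value are assumptions, not ForgeLM's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class SafetyConfig:
    enabled: bool = False  # hypothetical field name


@dataclass
class ComplianceConfig:
    risk_classification: str = "minimal"  # hypothetical field name and values


@dataclass
class JobConfig:
    safety: SafetyConfig = field(default_factory=SafetyConfig)
    compliance: ComplianceConfig = field(default_factory=ComplianceConfig)

    def __post_init__(self) -> None:
        # Cross-field rule: a high-risk job must enable safety evaluation.
        high_risk = self.compliance.risk_classification == "high-risk"
        if high_risk and not self.safety.enabled:
            raise ValueError(
                "risk_classification 'high-risk' requires safety.enabled = true"
            )
```

Failing at construction time means a misconfigured job is rejected before any model is loaded.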

`data.py`

Interfaces with HuggingFace datasets library. Auto-detects dataset format (SFT, DPO, KTO, GRPO, multimodal) and validates against trainer_type. Handles multi-dataset mixing with configurable ratios. Applies chat templates via tokenizer.apply_chat_template() with fallback formatting.
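
Format auto-detection of this kind can be approximated by inspecting column names. The rules below follow common TRL dataset conventions (`prompt`/`chosen`/`rejected` for preference data, `prompt`/`completion`/`label` for KTO) and are an assumption about how detection might work, not a copy of ForgeLM's logic; both function names are hypothetical.

```python
from typing import Iterable


def detect_format(columns: Iterable[str]) -> str:
    """Guess the dataset format from its column names (illustrative rules)."""
    cols = set(columns)
    if {"chosen", "rejected"} <= cols:
        return "dpo"
    if {"prompt", "completion", "label"} <= cols:
        return "kto"
    if "image" in cols or "images" in cols:
        return "multimodal"
    if cols == {"prompt"}:
        return "grpo"
    if "messages" in cols or "text" in cols or {"prompt", "completion"} <= cols:
        return "sft"
    raise ValueError(f"unrecognised dataset columns: {sorted(cols)}")


def validate_against_trainer(columns: Iterable[str], trainer_type: str) -> str:
    """Raise if the detected format does not match the configured trainer."""
    detected = detect_format(columns)
    if detected != trainer_type:
        raise ValueError(
            f"dataset looks like '{detected}' but trainer_type is '{trainer_type}'"
        )
    return detected
```

The order of the checks matters: the most specific column sets are tested first so that a preference dataset containing a `prompt` column is not mistaken for GRPO or SFT data.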

`model.py`

Loads models via HuggingFace Transformers or Unsloth backend. Configures QLoRA (4-bit NF4), PEFT adapters (LoRA, DoRA, PiSSA, rsLoRA), and MoE expert quantization/selection. Distributed-aware: skips device_map="auto" when DeepSpeed/FSDP is active. Multimodal-aware: loads AutoProcessor instead of AutoTokenizer for VLM models.

`trainer.py`

Wraps TRL's trainers (SFTTrainer, DPOTrainer, KTOTrainer, ORPOTrainer, CPOTrainer/SimPO, GRPOTrainer) with ForgeLM's pipeline: baseline evaluation → training → post-training evaluation chain (loss → benchmark → safety → LLM-judge) → model save → model card → compliance artifacts → webhook notification. Supports GaLore optimizer-level memory optimization (gradient low-rank projection for full-parameter training) and long-context features (RoPE scaling, NEFTune noise injection, sliding window attention, sample packing). Includes auto-revert, human approval gate, audit logging, and resource tracking.
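
The post-training evaluation chain (loss → benchmark → safety → LLM-judge) can be modelled as an ordered list of gates. The sketch below is hypothetical: `GateResult`, `run_eval_chain`, and `should_revert` are illustrative names, and stopping at the first failing gate is an assumption rather than documented behaviour.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class GateResult:
    name: str
    passed: bool
    detail: str = ""


# A gate is a name plus a zero-argument check returning True on success.
Gate = Tuple[str, Callable[[], bool]]


def run_eval_chain(gates: List[Gate]) -> List[GateResult]:
    """Run gates in order; stop at the first failure or error."""
    results: List[GateResult] = []
    for name, check in gates:
        try:
            passed = bool(check())
            detail = ""
        except Exception as exc:  # a crashing gate counts as a failure
            passed = False
            detail = f"error: {exc}"
        results.append(GateResult(name, passed, detail))
        if not passed:
            break
    return results


def should_revert(results: List[GateResult]) -> bool:
    """Auto-revert is warranted when any executed gate failed."""
    return any(not r.passed for r in results)
```

Ordering the cheapest check (loss) first avoids spending benchmark or judge compute on a run that has already regressed.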

`results.py`

Lightweight TrainResult dataclass — importable without torch/transformers. Carries success status, metrics, benchmark scores, resource usage, safety pass/fail, and judge scores.
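
A result container along these lines needs nothing beyond the standard library, which is what keeps it importable without torch or transformers. The field names below are guesses based on the description above, not the real `TrainResult` definition.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, Optional


@dataclass
class TrainResult:
    # All field names are illustrative assumptions.
    success: bool
    metrics: Dict[str, float] = field(default_factory=dict)
    benchmark_scores: Dict[str, float] = field(default_factory=dict)
    resource_usage: Dict[str, float] = field(default_factory=dict)
    safety_passed: Optional[bool] = None  # None: safety evaluation not run
    judge_score: Optional[float] = None   # None: judge evaluation not run
```

Because it is a plain dataclass, `asdict(result)` yields a JSON-serialisable dictionary that CI tooling can consume without importing any ML dependency.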

`benchmark.py`

Wraps EleutherAI lm-evaluation-harness. Runs configurable benchmark tasks, extracts accuracy metrics, applies min_score threshold, and saves results. Optional dependency: pip install forgelm[eval].

`safety.py`

Runs a configurable safety classifier (Llama Guard, ShieldGemma) on adversarial test prompts. Generates responses from the fine-tuned model, classifies each as safe/unsafe, and triggers auto-revert if regression exceeds threshold. Errors are treated as unsafe (fail-safe principle).
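
The fail-safe rule and the regression check can be sketched as below. The function names, the `classify` callback signature, and the way regression is measured (difference in unsafe ratio against a baseline) are assumptions made for illustration.

```python
from typing import Callable, Iterable


def unsafe_ratio(responses: Iterable[str], classify: Callable[[str], bool]) -> float:
    """Fraction of responses judged unsafe; `classify` returns True when safe."""
    items = list(responses)
    if not items:
        return 0.0
    unsafe = 0
    for text in items:
        try:
            is_safe = classify(text)
        except Exception:
            # Fail-safe principle: a classifier error counts as unsafe.
            is_safe = False
        if not is_safe:
            unsafe += 1
    return unsafe / len(items)


def should_auto_revert(baseline: float, current: float, max_regression: float) -> bool:
    """Revert when the unsafe ratio grew by more than the allowed margin."""
    return (current - baseline) > max_regression
```

Counting errors as unsafe means a broken classifier can only make the gate stricter, never silently let a model through.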

`judge.py`

LLM-as-Judge evaluation supporting API-based judges (OpenAI-compatible endpoint) and local model judges. Includes robust JSON parsing with markdown code block extraction. Scores on 1-10 scale with configurable minimum threshold.
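
Judge models often wrap their JSON verdict in a markdown code block, so the parser has to look inside fences before falling back to bare JSON. The following is one possible approach, not ForgeLM's parser; the `score` key and the `parse_judge_score` name are assumptions.

```python
import json
import re
from typing import Optional

FENCE = "`" * 3  # markdown code fence, built here to keep this example readable

# JSON object inside a fenced block, with an optional "json" language tag.
_FENCED_JSON = re.compile(FENCE + r"(?:json)?\s*(\{.*?\})\s*" + FENCE, re.DOTALL)
# Fallback: the widest {...} span anywhere in the reply.
_BARE_JSON = re.compile(r"\{.*\}", re.DOTALL)


def parse_judge_score(reply: str) -> Optional[float]:
    """Return the 1-10 score from a judge reply, or None if it cannot be read."""
    fenced = _FENCED_JSON.search(reply)
    if fenced:
        candidate = fenced.group(1)
    else:
        bare = _BARE_JSON.search(reply)
        if bare is None:
            return None
        candidate = bare.group(0)
    try:
        data = json.loads(candidate)
    except json.JSONDecodeError:
        return None
    score = data.get("score")
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        return None
    return float(score) if 1 <= score <= 10 else None
```

Returning `None` instead of raising lets the caller decide whether an unparseable verdict should count as a failed sample or be retried.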

`compliance.py`

EU AI Act compliance engine covering Articles 9-17:

- AuditLogger: Append-only JSON Lines event log with unique run IDs
- generate_training_manifest(): Annex IV technical documentation
- generate_data_governance_report(): Data quality statistics
- generate_model_integrity(): SHA-256 checksums of output artifacts
- generate_deployer_instructions(): Art. 13 deployer document
- export_compliance_artifacts(): All artifacts to directory
- export_evidence_bundle(): ZIP archive for auditors
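
Two of the building blocks listed above, an append-only JSON Lines log and SHA-256 artifact checksums, are simple enough to sketch with the standard library. `AuditLog`, `log_event`, and `sha256_of_file` are illustrative names and the record fields are assumptions; the real `AuditLogger` may differ.

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict


class AuditLog:
    """Illustrative stand-in for an append-only JSON Lines audit log."""

    def __init__(self, path: Path) -> None:
        self.path = Path(path)
        self.run_id = uuid.uuid4().hex  # one unique ID per run

    def log_event(self, event: str, **details: Any) -> Dict[str, Any]:
        record = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "run_id": self.run_id,
            "event": event,
            "details": details,
        }
        # Mode "a" only appends; earlier lines are never rewritten.
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record, sort_keys=True) + "\n")
        return record


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large checkpoints fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

One JSON object per line keeps the log greppable and lets an auditor verify it incrementally, while chunked hashing avoids loading multi-gigabyte weight files in one read.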

`model_card.py`

Generates HuggingFace-compatible README.md with YAML front matter, training parameters table, metrics, benchmark results, config snippet, and usage example. Excludes auth tokens from exported config.
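
Excluding credentials from the exported config amounts to a recursive filter over the config dictionary. The sketch below is an assumption about how that could be done; the key names in `SENSITIVE_KEYS` are hypothetical examples, not ForgeLM's actual list.

```python
from typing import Any

# Hypothetical set of keys to drop; exact matches only, so that keys such as
# "max_new_tokens" are not removed by accident.
SENSITIVE_KEYS = {"token", "hf_token", "auth_token", "api_key"}


def redact_config(value: Any) -> Any:
    """Return a copy of a nested config with sensitive keys removed."""
    if isinstance(value, dict):
        return {
            key: redact_config(inner)
            for key, inner in value.items()
            if str(key).lower() not in SENSITIVE_KEYS
        }
    if isinstance(value, list):
        return [redact_config(item) for item in value]
    return value
```

Dropping the keys entirely, rather than masking their values, leaves no hint in a published model card that a secret was ever configured.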

`merging.py`

Model merging with 4 strategies: linear interpolation, TIES-Merging (trim + sign election + merge), DARE (random drop + rescale), and SLERP (spherical interpolation for 2 models). Operates on state dicts — no mergekit dependency required.
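
To make the SLERP strategy concrete, here is a minimal sketch on plain Python lists standing in for flattened weight tensors. It is the textbook spherical interpolation formula with a linear fallback for degenerate cases, not ForgeLM's implementation; the `slerp` signature and the `eps` tolerance are assumptions.

```python
import math
from typing import List, Sequence


def slerp(a: Sequence[float], b: Sequence[float], t: float, eps: float = 1e-6) -> List[float]:
    """Spherically interpolate between vectors a and b at fraction t in [0, 1]."""
    linear = [(1.0 - t) * x + t * y for x, y in zip(a, b)]
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a < eps or norm_b < eps:
        return linear  # the angle is undefined for a zero vector
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cos_omega = max(-1.0, min(1.0, cos_omega))  # guard against rounding drift
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if abs(sin_omega) < eps:
        return linear  # (anti-)parallel vectors: fall back to linear blend
    weight_a = math.sin((1.0 - t) * omega) / sin_omega
    weight_b = math.sin(t * omega) / sin_omega
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]
```

For two orthogonal unit vectors, `slerp([1.0, 0.0], [0.0, 1.0], 0.5)` lands on the unit circle at 45 degrees, whereas plain linear interpolation would shrink the result towards the origin. That preservation of magnitude is the usual argument for SLERP when merging exactly two models.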

`synthetic.py`

Synthetic data generation via teacher-to-student distillation. The SyntheticDataGenerator class takes a teacher model (API-based or local), generates training samples from seed prompts, and outputs formatted JSONL datasets. Triggered via --generate-data CLI flag or synthetic config section. Supports configurable teacher backends, output formats, and generation parameters.

`wizard/`

Interactive CLI wizard for generating valid YAML configs. The Phase 22 modernisation (2026-05-08) brought the CLI to parity with site/js/wizard.js:

- 9-step state machine (welcome / use-case / model / strategy / trainer / dataset / training-params / compliance / operations)
- Per-trainer hyperparameters (dpo_beta / simpo_beta + simpo_gamma / kto_beta / orpo_beta / grpo_*)
- Full PEFT method coverage (lora / dora / pissa / rslora) plus a GaLore axis
- EU AI Act Article 9 + 10 + 11 + 12+17 compliance accordions
- F-compliance-110 strict-tier auto-coercion
- Back / reset navigation
- XDG-aware persistence at $XDG_CACHE_HOME/forgelm/wizard_state.yaml
- Step-diff preview
- Beginner / expert toggle
- The Phase 11.5 / 12.5 BYOD inline ingest + audit helpers (_offer_ingest_for_directory, _offer_audit_for_jsonl)
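
XDG-aware persistence usually means honouring `XDG_CACHE_HOME` and falling back to `~/.cache` when it is unset, empty, or relative, as the XDG Base Directory specification prescribes. The sketch below shows that resolution for the state file path given above; the function name is hypothetical and the fallback behaviour is an assumption about the wizard.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def wizard_state_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve the wizard state file under the XDG cache directory."""
    env = os.environ if env is None else env
    cache_home = env.get("XDG_CACHE_HOME", "")
    if cache_home and os.path.isabs(cache_home):
        base = Path(cache_home)
    else:
        # Unset, empty, or relative values are ignored per the XDG spec.
        base = Path.home() / ".cache"
    return base / "forgelm" / "wizard_state.yaml"
```

Accepting the environment as a parameter keeps the function testable without mutating `os.environ`.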

`webhook.py`

Sends structured JSON payloads to Slack/Teams/generic webhooks on training start, success, and failure. Supports URL from config or environment variable. Graceful error handling with configurable timeout.
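
A notification path with these properties might look like the sketch below. It is not ForgeLM's implementation: the function names, the payload fields, and the config-before-environment precedence are assumptions, and it calls `urllib` directly where the real module would presumably go through the SSRF-guarded `_http.py`.

```python
import json
import urllib.request
from typing import Any, Mapping, Optional

EVENTS = {"start", "success", "failure"}


def resolve_webhook_url(
    config_url: Optional[str], url_env: Optional[str], env: Mapping[str, str]
) -> Optional[str]:
    """Prefer the URL from config, else read it from the named env variable."""
    if config_url:
        return config_url
    if url_env:
        return env.get(url_env) or None
    return None


def build_payload(event: str, run_name: str, **extra: Any) -> bytes:
    """Serialise a structured JSON payload for one lifecycle event."""
    if event not in EVENTS:
        raise ValueError(f"unknown event: {event}")
    body = {"event": event, "run_name": run_name, **extra}
    return json.dumps(body).encode("utf-8")


def send(url: str, payload: bytes, timeout: float = 10.0) -> bool:
    """POST the payload; return False instead of raising on any failure."""
    try:
        request = urllib.request.Request(
            url,
            data=payload,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError):
        # Network errors, timeouts, and malformed URLs must not fail training.
        return False
```

Swallowing delivery errors is deliberate here: a notification that cannot be sent should never turn a successful training run into a failed one.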

`utils.py`

HuggingFace authentication (token from config, env var, or local cache with modern XDG path support) and checkpoint management (keep, delete, compress with UUID-suffixed archives).
