Skip to content

feat: normalize OmniRoute model metadata and deduplicate alias entries#20

Closed
Alph4d0g wants to merge 28 commits into
mainfrom
feat/omniroute-normalization-dedup
Closed

feat: normalize OmniRoute model metadata and deduplicate alias entries#20
Alph4d0g wants to merge 28 commits into
mainfrom
feat/omniroute-normalization-dedup

Conversation

@Alph4d0g
Copy link
Copy Markdown
Owner

Summary

This PR addresses the review feedback from #18 by implementing comprehensive model metadata normalization and deduplication:

Changes

  1. Extended Types (src/types.ts)

    • Added snake_case fields (context_length, max_input_tokens, max_output_tokens) to OmniRouteModel
    • Added capabilities object with vision, tool_calling, reasoning, thinking, attachment, temperature fields
    • Added top-level snake_case capability fields (vision, tool_calling) for direct API mapping
  2. Comprehensive Normalization (src/models.ts)

    • Extracted inline normalization into standalone normalizeModel() function
    • Reads all field variants with proper precedence: camelCase > snake_case > capabilities object
    • Handles context limits, streaming, vision, tools, reasoning, attachment, and temperature
  3. Provider Alias Mapping (src/constants.ts)

    • Added PROVIDER_ALIAS_TO_CANONICAL mapping for known aliases:
      • ollamacloudollama-cloud
      • ccclaude
      • ghgithub
      • cxcodex
      • krkiro
      • ifqoder
  4. Smart Deduplication (src/models.ts)

    • Added deduplicateModels() that groups by canonical provider+model key
    • Only deduplicates known aliases; preserves unknown provider prefixes to avoid breaking user metadata
    • Prefers canonical metadata when both alias and canonical exist
    • Normalizes alias IDs to canonical form even when canonical is missing
  5. Metadata Key Canonicalization (src/plugin.ts)

    • Added resolveProviderAliasForMetadata() to canonicalize model IDs for metadata lookups
    • Added isProviderAlias() helper
    • User metadata with alias keys (e.g., cx/gpt-5.5) is merged into canonical keys (e.g., codex/gpt-5.5)
    • Generated metadata uses canonical keys to match deduplicated model IDs
  6. Test Coverage (test/models.test.mjs)

    • normalizeModel reads snake_case fields - verifies capabilities object and snake_case field reading
    • normalizeModel prefers camelCase over snake_case - verifies precedence rules
    • deduplication removes alias when canonical exists - verifies canonical preference
    • deduplication keeps alias when canonical is missing - verifies alias normalization

Test Results

All 42 tests passing:

  • 38 existing tests (no regressions)
  • 4 new tests for normalization and deduplication

Design Decisions

  • Precedence: camelCase > snake_case > capabilities object (consistent with existing code)
  • Selective Deduplication: Only known aliases are deduplicated to preserve user metadata for unknown providers
  • Metadata Key Canonicalization: User metadata keys are canonicalized to match deduplicated model IDs
  • Timing: Normalization and deduplication happen before models.dev enrichment

Backward Compatibility

  • All existing tests pass without modification (except updating expectations for canonical IDs)
  • Unknown provider prefixes are preserved as-is
  • User metadata with alias keys continues to work (merged into canonical keys)

abien and others added 28 commits May 16, 2026 08:37
… resolution

- Add env var fallback for API key in config hook (OMNIROUTE_API_KEY)
- Generate modelMetadata from enriched models so OpenCode applies capabilities
- Add top-level capability fields (temperature, reasoning, attachment, tool_call, modalities) to provider model output
- Generate reasoning effort variants (low/medium/high) for reasoning models
- Add provider aliases: glmt/glm→zai-coding-plan, kimi-coding/kmc→moonshotai, gh/github→google
- Add subscription fallback: zai-coding-plan→zai, kimi-for-coding→moonshotai
- Add model aliases for naming mismatches (kimi-k2.6-thinking→kimi-k2-thinking)
- Strip reasoning effort variant suffixes (gpt-5.5-xhigh→gpt-5.5) for lookup
- Close 19 of 132 models with 4096 default context limits
Preserve user modelMetadata while merging generated capabilities, propagate new capability fields through models.dev and combo paths, and consolidate alias resolution.
Treat missing attachment capability as unknown instead of false when folding combo model capabilities, while still honoring explicit false values.
Avoid falling back to vision support when models.dev or combo metadata explicitly marks attachments unsupported.
- Fix temperature AND-chain bug in calculateLowestCommonCapabilities by tracking hasTemperatureMetadata
- Fix reasoning AND-chain bug in calculateLowestCommonCapabilities by tracking hasReasoningMetadata
- Respect explicit supportsAttachment=false for vision models in toProviderModel
- Add normalized key candidates to getModelLookupCandidates for better alias/variant matching

Addresses feedback from Copilot, Gemini Code Assist, and Kilo Code Bot reviews.
- Expand sanitizeForLog to strip all control chars except tab (was only \r\n)
- Export sanitizeForLog from omniroute-combos.ts for reuse in plugin.ts
- Sanitize model IDs and external values in plugin.ts debug/warn lines:
  - Available models list (line 146)
  - provider.api / apiMode mismatch (lines 209, 215)
  - Unsupported apiMode option (line 231)
  - Unsupported baseURL protocol (line 254)
  - Invalid baseURL (line 260)
  - Invalid metadata field (line 371)
  - Intercepting request URL (line 581)
- Include models.ts null-data fix in same commit

Refs: task-10-log-injection
…ariant+alias, and subscription fallback tests

- Task 15: Add test verifying single model and combo-with-self produce identical capabilities
- Task 17: Add temperature/reasoning combo tests (mixed defined/undefined, false override, all three capabilities, undefined single model)
- Task 18: Add variant suffix stripping + alias resolution integration test
- Task 19: Add subscription provider fallback enrichment test
- Add getDummyBaseUrl(port) helper function to eliminate magic numbers
- Replace all hardcoded http://localhost:20xxx URLs with helper calls
- Keeps assertions and external URLs using hardcoded strings for clarity
@Alph4d0g
Copy link
Copy Markdown
Owner Author

Implementation Complete

This PR implements the comprehensive normalization + deduplication plan discussed in #18.

Key Implementation Details

Normalization Precedence:

camelCase fields > snake_case fields > capabilities object

Example: If API returns both contextWindow: 64000 and context_length: 32000, we use 64000.

Deduplication Logic:

  • Only deduplicates known aliases from PROVIDER_ALIAS_TO_CANONICAL
  • Unknown provider prefixes are kept as-is to preserve user metadata
  • When both alias and canonical exist: keeps canonical, drops alias
  • When only alias exists: normalizes ID to canonical form

Metadata Handling:

  • User metadata keys with aliases (e.g., cx/gpt-5.5) are merged into canonical keys (e.g., codex/gpt-5.5)
  • Generated metadata from fetched models uses canonical keys
  • This ensures metadata lookups work correctly after deduplication

Verification

npm test
# 42 tests passing (38 existing + 4 new)

npm run build
# TypeScript compilation: clean

npx tsc --noEmit
# Type checking: no errors

Files Changed

  • src/types.ts - Extended OmniRouteModel interface
  • src/models.ts - Added normalizeModel(), deduplicateModels(), resolveProviderAliasForMetadata(), isProviderAlias()
  • src/constants.ts - Added PROVIDER_ALIAS_TO_CANONICAL mapping
  • src/plugin.ts - Updated metadata generation to use canonical keys
  • test/models.test.mjs - Added 4 new test cases
  • test/plugin.test.mjs - Updated expectations for canonical IDs

Ready for review!

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a wide range of improvements to the OmniRoute plugin, including a shift to asynchronous logging, enhanced model normalization, and a deduplication system for provider aliases. It expands the model metadata schema to support temperature, reasoning, and attachments, and implements a mechanism for handling model variants like reasoning effort. Review feedback points out critical issues in the implementation of metadata merging and deduplication: specifically, the need for proper precedence and validation in array-based metadata configurations, and the risk of data loss or unreachable logic when merging canonical model entries with their aliases.

Comment thread src/plugin.ts
Comment on lines +363 to +370
if (Array.isArray(userConfig)) {
const generatedBlocks = Object.entries(generated).map(([id, metadata]) => ({
match: id,
...metadata,
}));

return [...generatedBlocks, ...userConfig];
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There are two issues here:

  1. Precedence: In most OpenCode matching systems, the first matching block in the modelMetadata array wins. By prepending generatedBlocks, user-defined overrides in userConfig will be ignored for any model already present in the generated metadata. userConfig should be placed first.
  2. Validation: Unlike the object-based merge logic below, the array-based merge does not validate the user-provided metadata blocks. Invalid metadata should be filtered out to prevent issues in the OpenCode framework.
  if (Array.isArray(userConfig)) {
    const validUserConfig = userConfig.filter((block) => {
      const validation = isValidModelMetadata(block);
      if (!validation.valid) {
        warn(`Invalid metadata block for match "${sanitizeForLog(String(block.match))}" (field: ${sanitizeForLog(validation.field ?? '')}), skipping`);
        return false;
      }
      return true;
    });

    const generatedBlocks = Object.entries(generated).map(([id, metadata]) => ({
      match: id,
      ...metadata,
    }));

    // User config should come first to take precedence in first-match-wins systems
    return [...validUserConfig, ...generatedBlocks];
  }

Comment thread src/models.ts
Comment on lines +123 to +126
if (!canonicalPrefix) {
seen.set(model.id, model);
continue;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

When a canonical model entry is encountered, it currently overwrites any existing entry in the seen map. If an alias entry for the same model was processed first, any metadata unique to that alias entry will be lost. It is better to merge the metadata while ensuring the canonical entry's fields take precedence.

Suggested change
if (!canonicalPrefix) {
seen.set(model.id, model);
continue;
}
if (!canonicalPrefix) {
const existing = seen.get(model.id);
seen.set(model.id, existing ? { ...existing, ...model } : model);
continue;
}

Comment thread src/models.ts
Comment on lines +130 to +148
const existing = seen.get(canonicalId);
if (!existing) {
// First time seeing this model - store with canonical ID
seen.set(canonicalId, {
...model,
id: canonicalId,
});
} else {
// Already have canonical version - merge metadata, prefer non-alias
const isAlias = providerPrefix !== canonicalPrefix;
if (!isAlias) {
// This is the canonical version, overwrite alias
seen.set(canonicalId, {
...model,
id: canonicalId,
});
}
// If alias and we already have canonical, drop it
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current deduplication logic for aliases has two issues: it contains dead code (the if (!isAlias) block is unreachable because canonicalPrefix is only defined for aliases) and it fails to merge metadata from the alias into an existing canonical entry. Merging metadata ensures that fields present in the alias but missing in the canonical entry are preserved.

    const existing = seen.get(canonicalId);
    if (!existing) {
      seen.set(canonicalId, { ...model, id: canonicalId });
    } else {
      // Current model is an alias, merge into existing preferring existing fields
      // (which might be the canonical entry or a previously seen alias)
      seen.set(canonicalId, { ...model, ...existing, id: canonicalId });
    }

@Alph4d0g Alph4d0g closed this May 17, 2026
Alph4d0g added a commit that referenced this pull request May 17, 2026
- Fix array-based metadata merge precedence: userConfig now comes first
  in first-match-wins systems instead of being overridden by generated blocks
- Add validation for array-based user metadata blocks to filter out invalid
  entries before merging
- Merge metadata for unknown provider prefixes instead of overwriting,
  preserving all available metadata
- Simplify deduplication logic: remove unreachable isAlias check and merge
  alias metadata into canonical entries, preferring existing (canonical) fields
- Update test expectations to match new array ordering (user first, generated second)
Alph4d0g added a commit that referenced this pull request May 17, 2026
* feat: enrich model capabilities via models.dev and fix provider alias resolution

- Add env var fallback for API key in config hook (OMNIROUTE_API_KEY)
- Generate modelMetadata from enriched models so OpenCode applies capabilities
- Add top-level capability fields (temperature, reasoning, attachment, tool_call, modalities) to provider model output
- Generate reasoning effort variants (low/medium/high) for reasoning models
- Add provider aliases: glmt/glm→zai-coding-plan, kimi-coding/kmc→moonshotai, gh/github→google
- Add subscription fallback: zai-coding-plan→zai, kimi-for-coding→moonshotai
- Add model aliases for naming mismatches (kimi-k2.6-thinking→kimi-k2-thinking)
- Strip reasoning effort variant suffixes (gpt-5.5-xhigh→gpt-5.5) for lookup
- Close 19 of 132 models with 4096 default context limits

* fix: address model metadata review feedback

Preserve user modelMetadata while merging generated capabilities, propagate new capability fields through models.dev and combo paths, and consolidate alias resolution.

* fix: preserve combo attachment metadata

Treat missing attachment capability as unknown instead of false when folding combo model capabilities, while still honoring explicit false values.

* fix: respect explicit attachment false

Avoid falling back to vision support when models.dev or combo metadata explicitly marks attachments unsupported.

* fix: address code review issues from PR #18

- Fix temperature AND-chain bug in calculateLowestCommonCapabilities by tracking hasTemperatureMetadata
- Fix reasoning AND-chain bug in calculateLowestCommonCapabilities by tracking hasReasoningMetadata
- Respect explicit supportsAttachment=false for vision models in toProviderModel
- Add normalized key candidates to getModelLookupCandidates for better alias/variant matching

Addresses feedback from Copilot, Gemini Code Assist, and Kilo Code Bot reviews.

* fix: use nullish coalescing for API key fallback

* fix: add runtime validation for modelMetadata merge

* fix: improve modelMetadata validation coverage

* fix: include modelsDev config in cache key

* refactor: extract splitModelId to eliminate DRY violation

* docs: document supportsTools default rationale

* fix: avoid logging sensitive response data in models fetch error

* style: add trailing newline to models-dev.ts

* fix: sanitize model IDs in log messages to prevent injection

* perf: use async file I/O for logger to prevent event loop blocking

* fix: strengthen log sanitization and cover missed interpolation sites

- Expand sanitizeForLog to strip all control chars except tab (was only \r\n)
- Export sanitizeForLog from omniroute-combos.ts for reuse in plugin.ts
- Sanitize model IDs and external values in plugin.ts debug/warn lines:
  - Available models list (line 146)
  - provider.api / apiMode mismatch (lines 209, 215)
  - Unsupported apiMode option (line 231)
  - Unsupported baseURL protocol (line 254)
  - Invalid baseURL (line 260)
  - Invalid metadata field (line 371)
  - Intercepting request URL (line 581)
- Include models.ts null-data fix in same commit

Refs: task-10-log-injection

* perf: avoid duplicate lookup candidates when alias resolves to same key

* types: tighten variants type to prevent invalid reasoning effort values

* refactor: extract magic constant 4096 to named defaults

* style: remove redundant as const assertions

* test(tasks 15,17,18,19): Add model metadata, temperature/reasoning, variant+alias, and subscription fallback tests

- Task 15: Add test verifying single model and combo-with-self produce identical capabilities
- Task 17: Add temperature/reasoning combo tests (mixed defined/undefined, false override, all three capabilities, undefined single model)
- Task 18: Add variant suffix stripping + alias resolution integration test
- Task 19: Add subscription provider fallback enrichment test

* test(task 20): Replace hardcoded ports with getDummyBaseUrl() helper

- Add getDummyBaseUrl(port) helper function to eliminate magic numbers
- Replace all hardcoded http://localhost:20xxx URLs with helper calls
- Keeps assertions and external URLs using hardcoded strings for clarity

* fix: align provider model fields with OpenCode plugin types

* types: add OmniRoute native fields (snake_case, capabilities) to OmniRouteModel

* feat: normalize all OmniRoute field variants (snake_case, capabilities)

* feat: add provider alias-to-canonical mapping for deduplication

* test: verify normalization of snake_case and capabilities fields

* fix: preserve user metadata by only deduplicating known aliases and canonicalizing metadata keys

* fix: address gemini-code-assist review feedback on PR #20

- Fix array-based metadata merge precedence: userConfig now comes first
  in first-match-wins systems instead of being overridden by generated blocks
- Add validation for array-based user metadata blocks to filter out invalid
  entries before merging
- Merge metadata for unknown provider prefixes instead of overwriting,
  preserving all available metadata
- Simplify deduplication logic: remove unreachable isAlias check and merge
  alias metadata into canonical entries, preferring existing (canonical) fields
- Update test expectations to match new array ordering (user first, generated second)

* chore: bump version to 1.2.0, update CHANGELOG and README

- Bump version from 1.1.4 to 1.2.0 in package.json
- Add comprehensive CHANGELOG entry for v1.2.0 covering:
  - Model metadata normalization (camelCase, snake_case, capabilities)
  - Provider alias deduplication system
  - Array-based metadata validation and precedence fixes
  - Metadata key canonicalization
  - All gemini-code-assist review feedback addressed
- Update README with:
  - New features: metadata normalization, alias deduplication, variant support
  - Updated OmniRouteModel type documentation with native fields
  - Star History graph at the bottom

---------

Co-authored-by: Alexander Bien <abien@gmx.net>
@Alph4d0g Alph4d0g deleted the feat/omniroute-normalization-dedup branch May 19, 2026 09:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants