Added a proxy for model swapping by samfundev · Pull Request #1645 · LostRuins/koboldcpp

samfundev · 2025-07-13T20:12:51Z

This is a very rough version of a proxy for kobold so that it can swap models for each request.

Specify models by name
Multimodal support
Try using reload_config instead of subprocessing
Possibly split out into a separate file?

Only text models are supported but that'll be fixed as well. First step towards #1623.

This can currently be used w/ things like open webui to chat with multiple models.

LostRuins · 2025-07-14T07:06:54Z

Hmm I think

A proxy, if added, should be in a separate external file, instead of directly in KoboldCpp.py
Why use subprocesses to open/close every time? Just use the admin api which is designed for switching models already.

henk717 · 2025-07-14T22:57:05Z

Would a seperate file play nice with the pyinstallers?

LostRuins · 2025-07-15T14:15:48Z

If communicating solely thru API I don't see why not.

Proxy accepts request, stalls the user
Proxy calls /api/admin/reload_config to switch model and waits until model switched
Proxy calls /api/v1/generate with request (or v1/chat/completions for openai mode)
Proxy receives reply from KoboldCpp
Proxy sends reply back to original requestor transparently.

This should be doable cleanly as an entirely separate program. However, SSE streaming will be more challenging.

henk717 · 2025-07-16T14:34:00Z

Either way it should be the regular koboldcpp setting it up. I dont want the mess of users having to start seperate things for this feature. However we do it should be a thing the main koboldcpp launcher / cli starts when admin mode is in use

Personally I do think integrating it into koboldcpp.py makes sense.

samfundev · 2025-08-06T19:25:13Z

I've updated this to use the model name. I also implemented support for getting the list of models instead of the currently active model. This was enough to get it working in open webui. I've updated the original comment with a checklist.

henk717 · 2025-08-07T00:28:34Z

The idea is that it would show the configs the admin api already shows. You get multi modal for free since it accepts kcpps files.

samfundev · 2025-08-21T14:55:11Z

I could be missing something, but I don't think that config files give us multi modal for free. If the goal is to keep at most one model loaded for each modality, having a config file either comes with a drawback (implementation 1) or requires splitting configs for each modality (implementation 2).

There's a two ways I can think to implement multi modality:

Have one server and if that server doesn't have the right model, swap to one that does. This has the drawback that if you have multiple modalities loaded, you have to load/unload multiple models for each swap of the server.
We can split each modality into a separate server, then each modality can be swapped without effecting the others. This has the drawback of more overhead since multiple servers are running. But avoids the drawback of implementation 1.

samfundev · 2025-08-22T16:41:40Z

@LostRuins: Why use subprocesses to open/close every time? Just use the admin api which is designed for switching models already.

I've just tried to implement it using this way but I ran into a problem. The API responds before the server has switched over. If I try to connect to the server right after the server responds, the old server is still active so it errors out when the connection gets closed. I added a sleep to try to wait until the old server closes but it feels like this method is unreliable.

pqnet · 2025-09-13T15:20:52Z

@LostRuins: Why use subprocesses to open/close every time? Just use the admin api which is designed for switching models already.

I've just tried to implement it using this way but I ran into a problem. The API responds before the server has switched over. If I try to connect to the server right after the server responds, the old server is still active so it errors out when the connection gets closed. I added a sleep to try to wait until the old server closes but it feels like this method is unreliable.

For this particular feature you could poll the model endpoint until it matches the model you want (or you get "no model loaded" in case of an error)

samfundev · 2025-09-20T23:12:19Z

Based on @pqnet suggested, I swapped over to the admin api.

lee-b · 2026-02-16T16:07:22Z

I could be missing something, but I don't think that config files give us multi modal for free. If the goal is to keep at most one model loaded for each modality, having a config file either comes with a drawback (implementation 1) or requires splitting configs for each modality (implementation 2).

There's a two ways I can think to implement multi modality:
1. Have one server and if that server doesn't have the right model, swap to one that does. This has the drawback that if you have multiple modalities loaded, you have to load/unload multiple models for each swap of the server.

2. We can split each modality into a separate server, then each modality can be swapped without effecting the others. This has the drawback of more overhead since multiple servers are running. But avoids the drawback of implementation 1.

#3 would be to process the chat completion's history of messages according to modality, and feed them to text/vision/audio/other-modality-capable mapped models accordingly, translating all to the nearest "dumbed-down" equivalent that the main model can support. i.e., receive image, but main llm only handles text? call image-to-text model, replace, cache prompt, feed to llm.

HOWEVER, #1 through #3 are HACKS. True multimodality (#4) means that the same model can look at the image, and the audio, and video, and text, and whatever else, load it ALL into the same high dimensional "mind", and "figure out what it all means" before rendering it back down into an answer. Converting it all to text and feeding it into a text model is very different. But #3 in particular would be a reasonable HACK and to my mind the only one that's API-correct if we're aiming to be multi-modal and emulate models like GPT-4o accessed via OpenAI API's.

What a similar approach, like #1 or llama-swap (which is really #1 as a proxy using #2) does for you, though, is to let you configure one server endpoint that LISTS multiple models and can serve them, and then the CLIENT can switch between, IFF it has separate configuration for each modality. That's just the client and server supporting multiple models and modalities separately though, not "multi-modality".

samfundev marked this pull request as draft July 13, 2025 20:12

samfundev force-pushed the concedo branch from 47a5291 to 9c92eb8 Compare August 6, 2025 19:19

samfundev added 2 commits September 20, 2025 19:10

added a proxy

7a32fd2

switch to the admin api

684aed4

samfundev force-pushed the concedo branch from 9c92eb8 to 684aed4 Compare September 20, 2025 23:10

LostRuins added the enhancement New feature or request label Oct 31, 2025

LostRuins mentioned this pull request Jan 25, 2026

Reload_config api to return promise #1945

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added a proxy for model swapping#1645

Added a proxy for model swapping#1645
samfundev wants to merge 2 commits intoLostRuins:concedofrom
samfundev:concedo

samfundev commented Jul 13, 2025 •

edited

Loading

Uh oh!

LostRuins commented Jul 14, 2025

Uh oh!

henk717 commented Jul 14, 2025

Uh oh!

LostRuins commented Jul 15, 2025 •

edited

Loading

Uh oh!

henk717 commented Jul 16, 2025 •

edited

Loading

Uh oh!

samfundev commented Aug 6, 2025

Uh oh!

henk717 commented Aug 7, 2025

Uh oh!

samfundev commented Aug 21, 2025

Uh oh!

samfundev commented Aug 22, 2025

Uh oh!

pqnet commented Sep 13, 2025

Uh oh!

samfundev commented Sep 20, 2025

Uh oh!

lee-b commented Feb 16, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

samfundev commented Jul 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

LostRuins commented Jul 14, 2025

Uh oh!

henk717 commented Jul 14, 2025

Uh oh!

LostRuins commented Jul 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

henk717 commented Jul 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

samfundev commented Aug 6, 2025

Uh oh!

henk717 commented Aug 7, 2025

Uh oh!

samfundev commented Aug 21, 2025

Uh oh!

samfundev commented Aug 22, 2025

Uh oh!

pqnet commented Sep 13, 2025

Uh oh!

samfundev commented Sep 20, 2025

Uh oh!

lee-b commented Feb 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

samfundev commented Jul 13, 2025 •

edited

Loading

LostRuins commented Jul 15, 2025 •

edited

Loading

henk717 commented Jul 16, 2025 •

edited

Loading

lee-b commented Feb 16, 2026 •

edited

Loading