This repo contains Oumi’s problem statement and starter template for:
- Nebius.Build SF, March 15, 2026
- Eclipse 6.0, April 4-5, 2026
The theme for Oumi’s hackathon track is Small Language Models (SLMs) for Voice Agents.
A Voice Agent differs from a standard AI agent in that users interact with it through spoken conversation. Instead of typing prompts, users speak to the agent and it responds with synthesized speech.
A typical Voice Agent pipeline works as follows:
- Audio is captured from the user’s microphone.
- A speech-to-text (STT) model transcribes the audio into text.
- The text is processed by an AI agent powered by a language model.
- The agent’s response is converted back into audio using a text-to-speech (TTS) model and played to the user.
Voice Agents are commonly used in real-time applications such as automated telephone customer support.
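The pipeline above can be sketched in a few lines. This is a minimal illustration, not the template's actual code: the `stt()`, `agent()`, and `tts()` helpers are hypothetical placeholders you would replace with a real STT model, your fine-tuned SLM, and a TTS engine.

```python
# Minimal Voice Agent turn (sketch). All three helpers are placeholders --
# swap in real STT, SLM, and TTS components in an actual build.

def stt(audio: bytes) -> str:
    """Placeholder speech-to-text: pretend the audio decodes to a question."""
    return "what are your opening hours"

def agent(text: str) -> str:
    """Placeholder agent: a fine-tuned SLM would generate this reply."""
    return f"You asked: '{text}'. We are open 9am to 5pm."

def tts(text: str) -> bytes:
    """Placeholder text-to-speech: encode the reply as audio bytes."""
    return text.encode("utf-8")

def voice_agent_turn(audio_in: bytes) -> bytes:
    """One conversational turn: audio in -> text -> agent -> audio out."""
    transcript = stt(audio_in)   # 1. transcribe the user's speech
    reply = agent(transcript)    # 2. reason over the text
    return tts(reply)            # 3. synthesize the spoken response

print(voice_agent_turn(b"\x00\x01").decode("utf-8"))
```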
In this hackathon, Small Language Models (SLMs) are defined as language models with fewer than ~10 billion parameters. Compared to large models, SLMs are significantly cheaper to run and can often operate efficiently on consumer-grade edge devices. When fine-tuned for specific tasks, SLMs can even outperform much larger models such as GPT-5.4 or Claude Opus 4.6.
Voice Agents have strict latency requirements because they operate in real time. Delays in transcription, reasoning, or speech synthesis can significantly degrade the conversational experience. Because SLMs are smaller and faster to run, they can reduce response times and improve overall responsiveness. In many cases, an architecture composed of multiple specialized SLMs working together may achieve lower latency and better performance than a single large general-purpose model.
Your task in this hackathon is to build a Voice Agent where one or more fine-tuned SLMs play a central role.
Requirements:
- You must use Oumi to fine-tune the models in your solution and 🌟 star the Oumi GitHub repo 🌟.
- There are no restrictions on the application domain, but the agent should address a specific, realistic use case.
- There is no need to justify the use of SLMs with evaluations, although the latency benefits should be clear.

- There is no requirement to use open-weight models, although doing so is highly encouraged.
Submissions will be evaluated based on:
- Creativity
- Real-world impact
- Technical quality
Here are some suggestions for how you can modularize an agent into task-specific models, each of which could be implemented as a fine-tuned SLM:
- Guardrails and LLM-Judges (are my inputs and outputs valid, safe, and relevant?)
- Query rewriting (how could this query be rewritten for more effective knowledge retrieval?)
- Execution routing (which step in the workflow should I take next given the user query?)
- Retrieval routing (which of my data sources - vector/graph database(s) etc. - should I search given the user query?)
- Model routing (should this query go to the powerful LLM or simpler SLM?)
- Planner (develop a multi-step plan to achieve the intended outcome)
- Verifier (what would happen if we carried out the plan - is it a good idea?)
- Executors (convert the plan into a sequence of tool calls)
- Memory management (are there any relevant facts in the query that would be useful in the future?)
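As one example of the modules above, "execution routing" can be framed as a classification task for a small model. The sketch below uses a keyword stub, `fake_slm_route()`, in place of a fine-tuned SLM so the example runs without a model; the route labels are hypothetical.

```python
# Execution-routing sketch: a small model picks the next workflow step.
# fake_slm_route() is a stand-in for a fine-tuned SLM that emits one label.

ROUTES = ["faq_lookup", "order_status", "human_handoff"]

def fake_slm_route(query: str) -> str:
    """Placeholder router: keyword rules instead of a real SLM."""
    q = query.lower()
    if "order" in q:
        return "order_status"
    if "human" in q or "agent" in q:
        return "human_handoff"
    return "faq_lookup"

def handle(query: str) -> str:
    route = fake_slm_route(query)
    # A real system should validate the model's output against known routes.
    assert route in ROUTES
    return route

print(handle("Where is my order?"))      # order_status
print(handle("Can I talk to a human?"))  # human_handoff
```

A fine-tuned SLM trained on (query, route) pairs would replace the keyword rules, keeping this decision fast enough for a real-time voice loop.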
We have included a starter template Voice Agent in this repo under /template. See the README there for instructions on installation and usage.
The template is intended to let participants focus on the "agent" part of the voice agent without having to worry about the STT, TTS, and audio pipeline parts. There is no requirement to use this code, although it may help you build faster.
After judging is complete, we will add interesting submissions to this section.
- Blog
- Oumi Open-Source Stack
- Agent frameworks
- Other libraries
- Papers
