Gemma Coding Test: Agent Task Arena

This repository is a deliberately scoped coding-agent evaluation project for Gemma, a local Hermes Agent profile.

Gemma's assignment is to build Agent Task Arena: a local-first Next.js dashboard for managing coding-agent tasks, sessions, implementation notes, and evaluation outcomes.

What is in this repo now?

This seed repo contains the planning artifacts Gemma needs before implementation:

docs/PRD.md — product requirements document
docs/TASKS.md — sequential implementation plan
docs/GEMMA_WORKFLOW.md — exact instructions for running Gemma one fresh session per task
docs/EVALUATION_RUBRIC.md — scoring criteria for each task/session
docs/task-packets/ — copy/paste-ready task prompts for Gemma

Recommended execution model

Use the async runner to execute one fresh Gemma session per task while Gemma works unattended:

cd ~/projects/gemma-coding-test
HERMES_PROFILE=gemma scripts/run-gemma-tasks.sh

The runner commits/pushes after each task and stops on the first failure. This preserves clean evaluation boundaries without requiring Jake to manually restart Gemma each time.

This is intentionally more controlled than running every task in a single long session, because it produces a cleaner evaluation:

Gemma receives bounded context.
Each task has a clear diff.
Failures are isolated.
GitHub history becomes the audit trail.
Jake can stop, inspect, or redirect between tasks.

See docs/GEMMA_WORKFLOW.md for exact commands.
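
The control flow described above (one fresh session per task, commit and push after each task, stop on the first failure) can be sketched roughly as follows. This is an illustration only, not the contents of scripts/run-gemma-tasks.sh: the names runTasksInOrder, runSession, and commitAndPush are hypothetical, and the sketch assumes that a failed task is not committed.

```typescript
// Hypothetical sketch of the runner's control flow.
// It is NOT the actual scripts/run-gemma-tasks.sh implementation.
type TaskResult = { taskId: string; ok: boolean };

type RunnerDeps = {
  // Starts one fresh agent session for a task; resolves to true on success.
  runSession: (taskId: string) => Promise<boolean>;
  // Commits and pushes the work produced for a task.
  commitAndPush: (taskId: string) => Promise<void>;
};

async function runTasksInOrder(
  taskIds: string[],
  deps: RunnerDeps,
): Promise<TaskResult[]> {
  const results: TaskResult[] = [];
  for (const taskId of taskIds) {
    const ok = await deps.runSession(taskId);
    results.push({ taskId, ok });
    if (!ok) {
      // Stop on the first failure so later tasks never start.
      break;
    }
    // Assumption: only successful tasks are committed and pushed.
    await deps.commitAndPush(taskId);
  }
  return results;
}
```

Injecting runSession and commitAndPush keeps the loop testable without starting a real agent session or touching git.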

Target application

Agent Task Arena will be a local-first taskboard for coding-agent work:

Create implementation tasks with acceptance criteria
Track status across backlog/running/review/done/failed
Record agent sessions and notes per task
Score task outcomes with a rubric
Export task/session reports as markdown
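
As a rough illustration of the features listed above, a task record and its markdown export might look like the sketch below. The status values come from the list above; every other name here (Task, AgentSession, taskToMarkdown, the field names, and the numeric score) is an assumption for illustration and is not taken from docs/PRD.md.

```typescript
// Hypothetical data model; field names are assumptions, not the PRD's schema.
type TaskStatus = "backlog" | "running" | "review" | "done" | "failed";

interface AgentSession {
  id: string;
  startedAt: string; // ISO 8601 timestamp
  notes: string;
}

interface Task {
  id: string;
  title: string;
  status: TaskStatus;
  acceptanceCriteria: string[];
  sessions: AgentSession[];
  score?: number; // rubric score; the scale is an assumption
}

// Renders one task and its sessions as a markdown report.
function taskToMarkdown(task: Task): string {
  const lines: string[] = [
    `# ${task.title}`,
    "",
    `Status: ${task.status}`,
    `Score: ${task.score ?? "not scored"}`,
    "",
    "## Acceptance criteria",
    ...task.acceptanceCriteria.map((c) => `- ${c}`),
    "",
    "## Sessions",
    ...task.sessions.map((s) => `- ${s.id} (${s.startedAt}): ${s.notes}`),
  ];
  return lines.join("\n");
}
```

Modeling the status as a string union lets the TypeScript compiler reject any value outside the five board columns.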

Tech stack target

Next.js App Router
TypeScript
Tailwind CSS
shadcn/ui-style primitives or simple accessible components
Local JSON persistence first, SQLite optional later
Vitest for unit tests
Playwright for smoke/e2e tests if practical
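
For the "local JSON persistence first" item, one possible shape is a pair of load and save helpers built on Node's fs module. This is a minimal sketch under stated assumptions: the StoredTask type, the function names, and the write-to-temp-then-rename step are illustrative choices, not requirements from the planning docs.

```typescript
import { existsSync, mkdirSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { dirname } from "node:path";

// Hypothetical stored shape; the real schema is defined by the project docs.
interface StoredTask {
  id: string;
  title: string;
  status: string;
}

// Returns an empty list when the data file does not exist yet.
function loadTasks(filePath: string): StoredTask[] {
  if (!existsSync(filePath)) return [];
  return JSON.parse(readFileSync(filePath, "utf8")) as StoredTask[];
}

// Writes to a temporary file first, then renames it over the target,
// so an interrupted write is less likely to leave a truncated data file.
function saveTasks(filePath: string, tasks: StoredTask[]): void {
  mkdirSync(dirname(filePath), { recursive: true });
  const tmpPath = `${filePath}.tmp`;
  writeFileSync(tmpPath, JSON.stringify(tasks, null, 2), "utf8");
  renameSync(tmpPath, filePath);
}
```

Keeping all file access behind two functions like these would make a later move to SQLite a change in one place.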

Current status

Seed docs only. Gemma should implement the application by following docs/TASKS.md in order.
