assets-manager-start-projects-evaluation

This repository contains the code and data used to evaluate different AI models and prompting techniques for selecting relevant code snippets based on project descriptions. The selection algorithm itself is implemented in assets-manager-api.

Overview

The Assets manager API automatically assembles starter projects for digital banking components by selecting the most relevant code snippets based on a text description. I compared several AI models by processing 100 synthetic “projects” against a pool of 100 candidate snippets, measuring how often each model retrieved exactly the needed snippets.

Model Selection Evaluation

Models are benchmarked in ascending order of cost (per 1 M in + 1 M out tokens):

gpt-4.1-nano ($0.50)
gpt-4o-mini ($0.75)
o3-mini ($5.50)
gpt-4o ($12.50)
o1 ($75)
gpt-4.5 ($225)
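
As a small illustration of the cost ordering, the sketch below sorts the models by the combined figures quoted above and expresses each one relative to the cheapest. The dictionary and function names are hypothetical and are not taken from this repository; the prices are copied from the list and may be out of date.

```python
# Combined cost in USD per 1M input + 1M output tokens, copied from the list above.
# The variable and function names are illustrative assumptions.
COST_PER_1M_IN_PLUS_OUT = {
    "gpt-4.1-nano": 0.50,
    "gpt-4o-mini": 0.75,
    "o3-mini": 5.50,
    "o1": 75.00,
    "gpt-4o": 12.50,
    "gpt-4.5": 225.00,
}


def models_by_ascending_cost(costs):
    """Return model names sorted from cheapest to most expensive."""
    return sorted(costs, key=costs.get)


def relative_cost(costs, model, baseline):
    """Return how many times more expensive `model` is than `baseline`."""
    return costs[model] / costs[baseline]


if __name__ == "__main__":
    cheapest = models_by_ascending_cost(COST_PER_1M_IN_PLUS_OUT)[0]
    for name in models_by_ascending_cost(COST_PER_1M_IN_PLUS_OUT):
        ratio = relative_cost(COST_PER_1M_IN_PLUS_OUT, name, cheapest)
        print(f"{name}: ${COST_PER_1M_IN_PLUS_OUT[name]:.2f} ({ratio:.1f}x {cheapest})")
```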

Each model received the project description and the full snippet pool and returned its top choices. I classified outcomes into:

Exact match: all and only needed snippets selected
Partial – Extra: most needed, but with extras
Partial – Missing: missed some needed
Partial – Mixed: extras and missing together
Mismatch: irrelevant selections
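
The categories above can be sketched as a set comparison between the needed and the selected snippet IDs. This is a minimal sketch, not the code from evaluate_start_projects.py: the function names are hypothetical, and it assumes that "Mismatch" means no overlap with the needed snippets and that "Partial – Extra" means every needed snippet was selected along with at least one extra.

```python
from collections import Counter

EXACT = "Exact match"
EXTRA = "Partial – Extra"
MISSING = "Partial – Missing"
MIXED = "Partial – Mixed"
MISMATCH = "Mismatch"


def classify_outcome(needed, selected):
    """Classify one model answer by comparing snippet ID sets (assumed semantics)."""
    needed, selected = set(needed), set(selected)
    if selected == needed:
        return EXACT
    if not (needed & selected):
        # Assumption: nothing relevant was selected at all.
        return MISMATCH
    extras = selected - needed
    missing = needed - selected
    if extras and missing:
        return MIXED
    if extras:
        return EXTRA
    return MISSING


def exact_match_rate(pairs):
    """Return the share of (needed, selected) pairs classified as an exact match."""
    counts = Counter(classify_outcome(needed, selected) for needed, selected in pairs)
    total = sum(counts.values())
    return counts[EXACT] / total if total else 0.0
```

For example, classify_outcome({"a", "b"}, {"a", "c"}) falls into the mixed category, because "b" is missing and "c" is an extra.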

Prompt Engineering Evaluation

Using gpt-4o-mini as my chosen backbone, I tested three prompting styles on the same data:

Zero-shot (no examples)
Few-shot (with example pairs)
Chain-of-Thought (model explains reasoning then selects)

Technique	Exact Match
Zero-shot	16%
Few-shot	22%
Chain-of-Thought	34%

With Chain-of-Thought prompting, gpt-4o-mini came close to the performance of much costlier models at a fraction of the price.
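
To make the difference between the three prompting styles concrete, here is a minimal sketch of how such prompts might be assembled. The prompt wording, the function name build_prompt, and the data shapes (a dict of snippet ID to summary, and a list of example pairs) are all assumptions for illustration; the actual prompts used in the evaluation are not reproduced here.

```python
def build_prompt(description, snippets, style="zero-shot", examples=()):
    """Assemble a snippet-selection prompt in one of three styles.

    description: project description text
    snippets:    dict mapping snippet ID -> short summary (hypothetical shape)
    style:       "zero-shot", "few-shot", or "chain-of-thought"
    examples:    (description, [snippet IDs]) pairs, used only for few-shot
    """
    if style not in ("zero-shot", "few-shot", "chain-of-thought"):
        raise ValueError(f"unknown prompting style: {style}")

    pool = "\n".join(f"- {sid}: {summary}" for sid, summary in snippets.items())
    parts = [
        "Select the snippets needed for the project described below.",
        "Candidate snippets:\n" + pool,
    ]

    if style == "few-shot":
        # Few-shot: show worked description -> selection pairs before the real task.
        for ex_description, ex_ids in examples:
            parts.append(
                f"Example project: {ex_description}\nSelected: {', '.join(ex_ids)}"
            )

    parts.append(f"Project: {description}")

    if style == "chain-of-thought":
        # Chain-of-Thought: ask for the reasoning first, then the final selection.
        parts.append(
            "Explain your reasoning step by step, "
            "then finish with a line starting with 'Selected:'."
        )
    else:
        parts.append("Answer with a single line starting with 'Selected:'.")

    return "\n\n".join(parts)
```

Keeping the final "Selected:" line identical across styles means the same parser can extract the answer regardless of which style produced it.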

Results

Best Model: gpt-4o-mini for cost/quality trade-off
Best Prompting: Chain-of-Thought

Notes

To access API endpoints exposed by a different Docker container from within the current one, use host.docker.internal instead of localhost.
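
A minimal sketch of how a script might pick the right host name is shown below. The function name, the default port in the usage line, and the /.dockerenv check are assumptions for illustration: the file check is a common heuristic for detecting a Docker container, not a guarantee. Note also that host.docker.internal resolves out of the box on Docker Desktop, while on Linux it may need to be mapped explicitly (for example with --add-host=host.docker.internal:host-gateway).

```python
import os


def api_base_url(port, in_container=None, scheme="http"):
    """Build a base URL for an API published on the Docker host.

    If in_container is None, guess by checking for /.dockerenv
    (a heuristic, assumed here for illustration).
    """
    if in_container is None:
        in_container = os.path.exists("/.dockerenv")
    host = "host.docker.internal" if in_container else "localhost"
    return f"{scheme}://{host}:{port}"


if __name__ == "__main__":
    # Port 8000 is a hypothetical example, not the port used by assets-manager-api.
    print(api_base_url(8000))
```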
