This repository contains the implementation code for the paper "Humanizing Automated Programming Feedback: Fine-Tuning Generative Models with Student-Written Feedback" presented at EDM 2025.
```
.
├── data/
│   ├── bugspotter_problems/       # Synthetic programming problems with bugs
│   │   └── problem_1/             # Example problem with multiple buggy variations
│   │       ├── description.txt    # Problem description
│   │       ├── driver_template.c  # Test driver template
│   │       └── prog_{1,2,3}/      # Different buggy implementations
│   │           ├── buggy.c        # Buggy code
│   │           ├── fixed.c        # Fixed version
│   │           └── testcases.json # Test cases
│   └── finetuning_data/           # Training data for fine-tuning
│       └── example_processed_data.jsonl  # Example feedback data in JSONL format
├── src/
│   ├── finetuning.py              # Fine-tuning script for models
│   ├── generate_prompts.py        # Generate prompts for inference
│   ├── generate_responses.py      # Generate model responses
│   └── utils.py                   # Utility functions
└── prompt_templates/
    └── inference_templates.py     # Prompt templates for different feedback types
```
```bash
# (Recommended) create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate

# Install required packages
pip install -r requirements.txt
```

Generate prompts for model inference using the buggy programs and test cases:
```bash
python src/generate_prompts.py --type basic_prompt
# or
python src/generate_prompts.py --type engineered_prompt
```

Parameters:
- `--type`: type of feedback generation (`basic_prompt` for basic feedback, `engineered_prompt` for detailed feedback)
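For intuition, here is a minimal sketch of how such a prompt could be assembled from the problem files. The exact wording lives in `prompt_templates/inference_templates.py`; the template text below is an assumption for illustration, not the repository's actual prompt.

```python
from pathlib import Path

def build_basic_prompt(problem_dir, prog):
    """Assemble a feedback prompt from one buggy program's files.

    Follows the data/bugspotter_problems/ layout shown in the repository
    tree; the prompt wording is illustrative, not the exact template used
    by src/generate_prompts.py.
    """
    root = Path(problem_dir)
    description = (root / "description.txt").read_text()
    buggy = (root / prog / "buggy.c").read_text()
    return (
        "Problem description:\n" + description.strip()
        + "\n\nBuggy student code:\n" + buggy.strip()
        + "\n\nWrite helpful feedback for the student."
    )
```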
Fine-tune language models on the student-written feedback data. Key arguments:
- `--model_name`: HF model id (e.g., `microsoft/Phi-3-mini-4k-instruct`)
- `--seed`: random seed
- `--data_file`: path to JSONL training data
- `--output_dir`: directory to save the fine-tuned model

Tested with: `microsoft/Phi-3-mini-4k-instruct` and `meta-llama/Meta-Llama-3-8B-Instruct`.
Example:
```bash
python src/finetuning.py \
    --model_name microsoft/Phi-3-mini-4k-instruct \
    --seed 42 \
    --data_file data/finetuning_data/example_processed_data.jsonl \
    --output_dir models/sft_bugspotter
```

The script will:
- Load the training data from `--data_file`
- Fine-tune the specified model
- Save the final model and a `training_details.json` to `--output_dir`
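During fine-tuning, each JSONL record has to be flattened into a single training string. A sketch of one way to do this; the real script presumably applies the base model's chat template, so the `### Feedback:` separator here is an assumption for illustration only:

```python
def format_example(record, eos_token=""):
    """Concatenate one JSONL record into a single training string.

    The separator and eos handling are assumptions; src/finetuning.py
    may use the base model's own chat template instead.
    """
    return (
        record["input"].strip()
        + "\n\n### Feedback:\n"
        + record["output"].strip()
        + eos_token
    )
```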
Generate feedback using either a local Hugging Face model or an OpenAI model.
Required inputs:
- `--model`: model identifier. Use a Hugging Face model id (e.g., `microsoft/Phi-3-mini-4k-instruct`) or an OpenAI model name (e.g., `gpt-4o-mini`).
- `--type`: prompt set to use. Must match what you generated earlier with `src/generate_prompts.py` (e.g., `basic_prompt` or `engineered_prompt`).
Optional inputs:
- `--seed`: random seed (used for local HF models; default 37)
- `--tokenizer`: tokenizer path/id (defaults to `--model`)
- `--OPENAI_API_KEY`: only needed when using OpenAI models
Notes:
- Ensure you have generated prompts first (see section above). The script reads from `generated/prompts/{type}/`.
- Outputs are saved under `generated/responses/{type}/{model}/{temperature}/` with per-prompt subfolders.
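The documented output layout can be built up mechanically; a small sketch, assuming slashes in Hugging Face model ids become nested folders (the actual script may sanitize them differently):

```python
from pathlib import Path

def response_dir(feedback_type, model, temperature):
    """Build the documented generated/responses/{type}/{model}/{temperature}/
    path. How model ids with slashes are handled is an assumption."""
    return Path("generated") / "responses" / feedback_type / model / str(temperature)
```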
Example (local Hugging Face model):
```bash
python src/generate_responses.py \
    --model microsoft/Phi-3-mini-4k-instruct \
    --type basic_prompt \
    --seed 42
```

Example (OpenAI model):
```bash
python src/generate_responses.py \
    --model gpt-4o-mini \
    --type basic_prompt \
    --OPENAI_API_KEY "$OPENAI_API_KEY"
```

Each problem contains:
- `description.txt`: Natural language description of the programming task
- `driver_template.c`: Template for testing student solutions
- Multiple program variations, each with:
  - `buggy.c`: Student code with bugs
  - `fixed.c`: Corrected version
  - `testcases.json`: Input/output test cases
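A buggy program can be checked against its test cases by compiling it (e.g., with `gcc` and the driver template) and comparing outputs. A hedged sketch, assuming `testcases.json` holds a list of `{"input": ..., "expected_output": ...}` objects; the repository's actual schema may differ:

```python
import json
import subprocess

def normalize(text):
    """Trim trailing whitespace so cosmetic differences don't fail a test."""
    return "\n".join(line.rstrip() for line in text.strip().splitlines())

def run_testcases(binary, testcases_path):
    """Run a compiled program against JSON test cases.

    The {"input", "expected_output"} schema is an assumption for
    illustration; adjust to the fields actually present in testcases.json.
    """
    with open(testcases_path, encoding="utf-8") as f:
        cases = json.load(f)
    verdicts = []
    for case in cases:
        proc = subprocess.run([binary], input=case["input"],
                              capture_output=True, text=True, timeout=5)
        verdicts.append(normalize(proc.stdout) == normalize(case["expected_output"]))
    return verdicts
```

Usage would look like `run_testcases("./buggy", "testcases.json")` after compiling the program.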
The fine-tuning data is in JSONL format with each line containing:
```json
{
  "input": "Problem description, test case, buggy code, and fixed code",
  "output": "Student-written or expert feedback"
}
```

This repository includes synthetic example data to demonstrate the system's functionality:
- 1 programming problem (CountEvenNumbers)
- 3 buggy variations (off-by-one error, array bounds error, logic error)
- 4 example feedback instances for fine-tuning
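Since each line of the fine-tuning file is a standalone JSON object, the data can be inspected with the standard library alone. A minimal loader sketch (`load_feedback_pairs` is a hypothetical helper, not part of `src/`):

```python
import json

def load_feedback_pairs(path):
    """Load (input, output) fine-tuning pairs from a JSONL file."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # skip blank lines
                continue
            record = json.loads(line)
            pairs.append((record["input"], record["output"]))
    return pairs
```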
If you use this code or methodology in your research, please cite our paper:
```bibtex
@inproceedings{2025.EDM.short-papers.35,
  author    = {Victor-Alexandru P{\u a}durean and Tung Phung and Nachiket Kotalwar and Michael Liut and Juho Leinonen and Paul Denny and Adish Singla},
  booktitle = {Proceedings of the 18th International Conference on Educational Data Mining},
  doi       = {10.5281/zenodo.15870290},
  editor    = {Caitlin Mills and Giora Alexandron and Davide Taibi and Giosuè Lo Bosco and Luc Paquette},
  isbn      = {978-1-7336736-6-2},
  month     = {July},
  pages     = {434--441},
  publisher = {International Educational Data Mining Society},
  title     = {Humanizing Automated Programming Feedback: Fine-Tuning Generative Models with Student-Written Feedback},
  year      = {2025}
}
```