This repository contains the implementation code for the paper "Humanizing Automated Programming Feedback: Fine-Tuning Generative Models with Student-Written Feedback" presented at EDM 2025.
```
.
├── data/
│   ├── bugspotter_problems/       # Synthetic programming problems with bugs
│   │   └── problem_1/             # Example problem with multiple buggy variations
│   │       ├── description.txt    # Problem description
│   │       ├── driver_template.c  # Test driver template
│   │       └── prog_{1,2,3}/      # Different buggy implementations
│   │           ├── buggy.c        # Buggy code
│   │           ├── fixed.c        # Fixed version
│   │           └── testcases.json # Test cases
│   └── finetuning_data/           # Training data for fine-tuning
│       └── example_processed_data.jsonl  # Example feedback data in JSONL format
├── src/
│   ├── finetuning.py              # Fine-tuning script for models
│   ├── generate_prompts.py        # Generate prompts for inference
│   ├── generate_responses.py      # Generate model responses
│   └── utils.py                   # Utility functions
└── prompt_templates/
    └── inference_templates.py     # Prompt templates for different feedback types
```
```bash
# (Recommended) create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate

# Install required packages
pip install -r requirements.txt
```

Generate prompts for model inference using the buggy programs and test cases:
```bash
python src/generate_prompts.py --type basic_prompt
# or
python src/generate_prompts.py --type engineered_prompt
```

Parameters:
- `--type`: type of feedback generation (`basic_prompt` for basic feedback, `engineered_prompt` for detailed feedback)
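For intuition, here is a minimal sketch of how such a prompt could be assembled from the problem files. The exact wording lives in `prompt_templates/inference_templates.py`; the template text below is an assumption for illustration, not the repository's actual prompt.

```python
from pathlib import Path

def build_basic_prompt(problem_dir, prog):
    """Assemble a feedback prompt from one buggy program's files.

    Follows the data/bugspotter_problems/ layout shown in the repository
    tree; the prompt wording is illustrative, not the exact template used
    by src/generate_prompts.py.
    """
    root = Path(problem_dir)
    description = (root / "description.txt").read_text()
    buggy = (root / prog / "buggy.c").read_text()
    return (
        "Problem description:\n" + description.strip()
        + "\n\nBuggy student code:\n" + buggy.strip()
        + "\n\nWrite helpful feedback for the student."
    )
```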
Fine-tune language models on the student-written feedback data. Key arguments:
- `--model_name`: HF model id (e.g., `microsoft/Phi-3-mini-4k-instruct`)
- `--seed`: random seed
- `--data_file`: path to JSONL training data
- `--output_dir`: directory to save the fine-tuned model

Tested with: `microsoft/Phi-3-mini-4k-instruct` and `meta-llama/Meta-Llama-3-8B-Instruct`.
Example:
```bash
python src/finetuning.py \
    --model_name microsoft/Phi-3-mini-4k-instruct \
    --seed 42 \
    --data_file data/finetuning_data/example_processed_data.jsonl \
    --output_dir models/sft_bugspotter
```

The script will:
- Load the training data from `--data_file`
- Fine-tune the specified model
- Save the final model and a `training_details.json` to `--output_dir`
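During fine-tuning, each JSONL record has to be flattened into a single training string. A sketch of one way to do this; the real script presumably applies the base model's chat template, so the `### Feedback:` separator here is an assumption for illustration only:

```python
def format_example(record, eos_token=""):
    """Concatenate one JSONL record into a single training string.

    The separator and eos handling are assumptions; src/finetuning.py
    may use the base model's own chat template instead.
    """
    return (
        record["input"].strip()
        + "\n\n### Feedback:\n"
        + record["output"].strip()
        + eos_token
    )
```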
Generate feedback using either a local Hugging Face model or an OpenAI model.
Required inputs:
- `--model`: model identifier. Use a Hugging Face model id (e.g., `microsoft/Phi-3-mini-4k-instruct`) or an OpenAI model name (e.g., `gpt-4o-mini`).
- `--type`: prompt set to use. Must match what you generated earlier with `src/generate_prompts.py` (e.g., `basic_prompt` or `engineered_prompt`).
Optional inputs:
- `--seed`: random seed (used for local HF models; default 37)
- `--tokenizer`: tokenizer path/id (defaults to `--model`)
- `--OPENAI_API_KEY`: only needed when using OpenAI models
Notes:
- Ensure you have generated prompts first (see section above). The script reads from `generated/prompts/{type}/`.
- Outputs are saved under `generated/responses/{type}/{model}/{temperature}/` with per-prompt subfolders.
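The documented output layout can be built up mechanically; a small sketch, assuming slashes in Hugging Face model ids become nested folders (the actual script may sanitize them differently):

```python
from pathlib import Path

def response_dir(feedback_type, model, temperature):
    """Build the documented generated/responses/{type}/{model}/{temperature}/
    path. How model ids with slashes are handled is an assumption."""
    return Path("generated") / "responses" / feedback_type / model / str(temperature)
```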
Example (local Hugging Face model):
```bash
python src/generate_responses.py \
    --model microsoft/Phi-3-mini-4k-instruct \
    --type basic_prompt \
    --seed 42
```

Example (OpenAI model):
```bash
python src/generate_responses.py \
    --model gpt-4o-mini \
    --type basic_prompt \
    --OPENAI_API_KEY "$OPENAI_API_KEY"
```

Each problem contains:
- `description.txt`: Natural language description of the programming task
- `driver_template.c`: Template for testing student solutions
- Multiple program variations, each with:
  - `buggy.c`: Student code with bugs
  - `fixed.c`: Corrected version
  - `testcases.json`: Input/output test cases
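A buggy program can be checked against its test cases by compiling it (e.g., with `gcc` and the driver template) and comparing outputs. A hedged sketch, assuming `testcases.json` holds a list of `{"input": ..., "expected_output": ...}` objects; the repository's actual schema may differ:

```python
import json
import subprocess

def normalize(text):
    """Trim trailing whitespace so cosmetic differences don't fail a test."""
    return "\n".join(line.rstrip() for line in text.strip().splitlines())

def run_testcases(binary, testcases_path):
    """Run a compiled program against JSON test cases.

    The {"input", "expected_output"} schema is an assumption for
    illustration; adjust to the fields actually present in testcases.json.
    """
    with open(testcases_path, encoding="utf-8") as f:
        cases = json.load(f)
    verdicts = []
    for case in cases:
        proc = subprocess.run([binary], input=case["input"],
                              capture_output=True, text=True, timeout=5)
        verdicts.append(normalize(proc.stdout) == normalize(case["expected_output"]))
    return verdicts
```

Usage would look like `run_testcases("./buggy", "testcases.json")` after compiling the program.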
The fine-tuning data is in JSONL format with each line containing:
```json
{
  "input": "Problem description, test case, buggy code, and fixed code",
  "output": "Student-written or expert feedback"
}
```

This repository includes synthetic example data to demonstrate the system's functionality:
- 1 programming problem (CountEvenNumbers)
- 3 buggy variations (off-by-one error, array bounds error, logic error)
- 4 example feedback instances for fine-tuning
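Since each line of the fine-tuning file is a standalone JSON object, the data can be inspected with the standard library alone. A minimal loader sketch (`load_feedback_pairs` is a hypothetical helper, not part of `src/`):

```python
import json

def load_feedback_pairs(path):
    """Load (input, output) fine-tuning pairs from a JSONL file."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # skip blank lines
                continue
            record = json.loads(line)
            pairs.append((record["input"], record["output"]))
    return pairs
```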
If you use this code or methodology in your research, please cite our paper:
```bibtex
@inproceedings{2025.EDM.short-papers.35,
  author    = {Victor-Alexandru P{\u a}durean and Tung Phung and Nachiket Kotalwar and Michael Liut and Juho Leinonen and Paul Denny and Adish Singla},
  booktitle = {Proceedings of the 18th International Conference on Educational Data Mining},
  doi       = {10.5281/zenodo.15870290},
  editor    = {Caitlin Mills and Giora Alexandron and Davide Taibi and Giosuè Lo Bosco and Luc Paquette},
  isbn      = {978-1-7336736-6-2},
  month     = {July},
  pages     = {434--441},
  publisher = {International Educational Data Mining Society},
  title     = {Humanizing Automated Programming Feedback: Fine-Tuning Generative Models with Student-Written Feedback},
  year      = {2025}
}
```