machine-teaching-group/edm2025-humanizing-feedback

Humanizing Automated Programming Feedback: Fine-Tuning Generative Models

This repository contains the implementation code for the paper "Humanizing Automated Programming Feedback: Fine-Tuning Generative Models with Student-Written Feedback" presented at EDM 2025.

Repository Structure

.
├── data/
│   ├── bugspotter_problems/       # Synthetic programming problems with bugs
│   │   └── problem_1/             # Example problem with multiple buggy variations
│   │       ├── description.txt    # Problem description
│   │       ├── driver_template.c  # Test driver template
│   │       └── prog_{1,2,3}/      # Different buggy implementations
│   │           ├── buggy.c        # Buggy code
│   │           ├── fixed.c        # Fixed version
│   │           └── testcases.json # Test cases
│   └── finetuning_data/           # Training data for fine-tuning
│       └── example_processed_data.jsonl  # Example feedback data in JSONL format
├── src/
│   ├── finetuning.py              # Fine-tuning script for models
│   ├── generate_prompts.py        # Generate prompts for inference
│   ├── generate_responses.py      # Generate model responses
│   └── utils.py                   # Utility functions
└── prompt_templates/
    └── inference_templates.py     # Prompt templates for different feedback types

Installing Requirements

# (Recommended) create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate

# Install required packages
pip install -r requirements.txt

Usage

1. Generate Prompts

Generate prompts for model inference using the buggy programs and test cases:

python src/generate_prompts.py --type basic_prompt
# or
python src/generate_prompts.py --type engineered_prompt

Parameters:

  • --type: Type of feedback generation (basic_prompt for basic feedback, engineered_prompt for detailed feedback)
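
Conceptually, each prompt is assembled from a problem's assets. A minimal sketch of how a basic prompt could be built from a problem directory (the function name and prompt wording here are illustrative, not the repository's actual API):

```python
from pathlib import Path

def build_basic_prompt(problem_dir: str, prog: str) -> str:
    """Assemble a feedback prompt from a problem's description and one
    buggy program.

    Illustrative only: the real src/generate_prompts.py may structure
    its prompts differently.
    """
    root = Path(problem_dir)
    description = (root / "description.txt").read_text()
    buggy_code = (root / prog / "buggy.c").read_text()
    return (
        "Problem description:\n" + description.strip() + "\n\n"
        "Buggy student code:\n" + buggy_code.strip() + "\n\n"
        "Explain the bug to the student in a helpful, encouraging tone."
    )
```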

2. Fine-tune Models

Fine-tune language models on the student-written feedback data. Key arguments:

  • --model_name: HF model id (e.g., microsoft/Phi-3-mini-4k-instruct)
  • --seed: random seed
  • --data_file: path to JSONL training data
  • --output_dir: directory to save the fine-tuned model

Tested with microsoft/Phi-3-mini-4k-instruct and meta-llama/Meta-Llama-3-8B-Instruct.

Example:

python src/finetuning.py \
  --model_name microsoft/Phi-3-mini-4k-instruct \
  --seed 42 \
  --data_file data/finetuning_data/example_processed_data.jsonl \
  --output_dir models/sft_bugspotter

The script will:

  • Load the training data from --data_file
  • Fine-tune the specified model
  • Save the final model and a training_details.json to --output_dir
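
The data-loading step can be sketched as follows; this is a hedged outline (the actual src/finetuning.py may apply a different, e.g. chat-style, template before tokenization) showing how each JSONL pair could be rendered into one training string:

```python
import json

def load_finetuning_examples(jsonl_path: str) -> list[dict]:
    """Read the JSONL training file and render each (input, output)
    pair into a single supervised fine-tuning text.

    Illustrative sketch: the section headers below are hypothetical,
    not the repository's actual formatting.
    """
    examples = []
    with open(jsonl_path) as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            text = (
                "### Input:\n" + record["input"]
                + "\n\n### Feedback:\n" + record["output"]
            )
            examples.append({"text": text})
    return examples
```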

3. Generate Responses

Generate feedback using either a local Hugging Face model or an OpenAI model.

Required inputs:

  • --model: model identifier. Use a Hugging Face model id (e.g., microsoft/Phi-3-mini-4k-instruct) or an OpenAI model name (e.g., gpt-4o-mini).
  • --type: prompt set to use. Must match what you generated earlier with src/generate_prompts.py (e.g., basic_prompt or engineered_prompt).

Optional inputs:

  • --seed: random seed (used for local HF models; default 37)
  • --tokenizer: tokenizer path/id (defaults to --model)
  • --OPENAI_API_KEY: only when using OpenAI models

Notes:

  • Ensure you have generated prompts first (see section above). The script reads from generated/prompts/{type}/.
  • Outputs are saved under generated/responses/{type}/{model}/{temperature}/ with per-prompt subfolders.
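
That output layout could be produced with path handling along these lines (a sketch, not the script's exact code; replacing "/" in model ids is an assumption made here so an id like org/name maps to one directory component):

```python
from pathlib import Path

def response_dir(prompt_type: str, model: str, temperature: float,
                 base: str = "generated/responses") -> Path:
    """Build the generated/responses/{type}/{model}/{temperature}/ path.

    Sketch only: the real src/generate_responses.py may sanitize the
    model id and format the temperature differently.
    """
    safe_model = model.replace("/", "_")
    return Path(base) / prompt_type / safe_model / str(temperature)
```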

Example (local Hugging Face model):

python src/generate_responses.py \
  --model microsoft/Phi-3-mini-4k-instruct \
  --type basic_prompt \
  --seed 42

Example (OpenAI model):

python src/generate_responses.py \
  --model gpt-4o-mini \
  --type basic_prompt \
  --OPENAI_API_KEY "$OPENAI_API_KEY"

Data Format

Problem Structure

Each problem contains:

  • description.txt: Natural language description of the programming task
  • driver_template.c: Template for testing student solutions
  • Multiple program variations with:
    • buggy.c: Student code with bugs
    • fixed.c: Corrected version
    • testcases.json: Input/output test cases
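
The exact schema of testcases.json is defined by the repository's test driver and is not documented here. Purely as a hypothetical illustration, one entry might pair a stdin string with the expected stdout for the CountEvenNumbers example:

```python
# Hypothetical testcases.json entry (NOT the repository's actual schema),
# shown as a Python dict for the CountEvenNumbers example problem:
example_testcase = {
    "input": "5\n1 2 3 4 5\n",   # stdin fed to the compiled program
    "expected_output": "2\n",    # stdout the driver compares against
}
```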

Fine-tuning Data Format

The fine-tuning data is in JSONL format with each line containing:

{
  "input": "Problem description, test case, buggy code, and fixed code",
  "output": "Student-written or expert feedback"
}
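
A quick way to sanity-check a training file before fine-tuning (a hedged helper, not part of the repository; it checks only the input/output fields shown above):

```python
import json

def validate_jsonl(path: str) -> int:
    """Check that every non-empty line is valid JSON with string
    'input' and 'output' fields; return the number of examples."""
    count = 0
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # blank lines are allowed and skipped
            record = json.loads(line)  # raises on malformed JSON
            for key in ("input", "output"):
                if not isinstance(record.get(key), str):
                    raise ValueError(f"line {lineno}: missing/invalid '{key}'")
            count += 1
    return count
```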

Synthetic Data

This repository includes synthetic example data to demonstrate the system's functionality:

  • 1 programming problem (CountEvenNumbers)
  • 3 buggy variations (off-by-one error, array bounds error, logic error)
  • 4 example feedback instances for fine-tuning

Citation

If you use this code or methodology in your research, please cite our paper:

@inproceedings{2025.EDM.short-papers.35,
 author = {Victor-Alexandru P{\u a}durean and Tung Phung and Nachiket Kotalwar and Michael Liut and Juho Leinonen and Paul Denny and Adish Singla},
 booktitle = {Proceedings of the 18th International Conference on Educational Data Mining},
 doi = {10.5281/zenodo.15870290},
 editor = {Caitlin Mills and Giora Alexandron and Davide Taibi and Giosuè Lo Bosco and Luc Paquette},
 isbn = {978-1-7336736-6-2},
 month = {July},
 pages = {434--441},
 publisher = {International Educational Data Mining Society},
 title = {Humanizing Automated Programming Feedback: Fine-Tuning Generative Models with Student-Written Feedback},
 year = {2025}
}
