
[Feature] Add Evaluation & Benchmarking Script Add files via upload #49

Open
Diksha-3905 wants to merge 1 commit into apple:main from Diksha-3905:main

@Diksha-3905


Summary
This PR introduces a new benchmark.py script to evaluate FastVLM model checkpoints with:

  • Time-to-First-Token (TTFT) measurement
  • Latency per image
  • Simple accuracy metric (placeholder for VQA/captioning tasks)
  • CLI interface for easy use with different checkpoints and datasets
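To make the two timing metrics concrete, here is a minimal sketch of how TTFT and per-image latency could be measured around a streaming generation call. It is not taken from benchmark.py: `time_generation` and `fake_stream` are hypothetical names, and the fake stream merely stands in for whatever token iterator the model exposes.

```python
import time
from typing import Iterable, Iterator, Tuple


def time_generation(stream: Iterable[str]) -> Tuple[float, float, int]:
    """Return (ttft_seconds, total_latency_seconds, num_tokens) for one token stream."""
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in stream:
        if ttft is None:
            # The first token has arrived: this interval is the time-to-first-token.
            ttft = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    # If nothing was generated, fall back to reporting the total time as TTFT.
    return (ttft if ttft is not None else total), total, count


def fake_stream(n_tokens: int = 3, delay: float = 0.01) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model call (assumption, not FastVLM API)."""
    for i in range(n_tokens):
        time.sleep(delay)
        yield f"tok{i}"


if __name__ == "__main__":
    ttft, latency, n_tokens = time_generation(fake_stream())
    print(f"TTFT: {ttft:.4f}s, latency: {latency:.4f}s, tokens: {n_tokens}")
```

On CUDA, kernels are launched asynchronously, so a real measurement would probably need a `torch.cuda.synchronize()` call before each clock read; otherwise the timings can understate the actual work.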

Key Changes
  • Added benchmark.py script with modular design:
      • Supports folder-based image datasets.
      • Uses build_model_and_transforms to load any FastVLM checkpoint.
      • Computes and logs TTFT, latency, and simple accuracy.
      • Outputs a summary of benchmark results.
  • Integrated with torchvision transforms for preprocessing.
  • Included CLI arguments for model path, image directory, and device selection.
  • Created a minimal dataset loader (ImageFolderDataset) for quick evaluation.
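For readers who want a picture of what a minimal folder-based loader might look like, the sketch below indexes image files in a directory and delegates decoding to a caller-supplied function. Only the class name ImageFolderDataset comes from this PR; the extension list, the `loader` parameter, and the return shape are assumptions made for illustration, and the real class may differ.

```python
from pathlib import Path
from typing import Any, Callable, List, Optional, Tuple

# Assumption: which extensions count as images is illustrative only.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


class ImageFolderDataset:
    """Index the image files in one folder; decoding is delegated to `loader`."""

    def __init__(self, img_dir: str, loader: Optional[Callable[[Path], Any]] = None):
        # Sort for a deterministic evaluation order across runs.
        self.paths: List[Path] = sorted(
            p for p in Path(img_dir).iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
        )
        # Assumption: a real script would pass an image decoder followed by
        # torchvision transforms; the default here just returns the path.
        self.loader = loader or (lambda p: p)

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, idx: int) -> Tuple[Path, Any]:
        path = self.paths[idx]
        return path, self.loader(path)
```

Keeping the decoder as a parameter means the same index can be reused with different preprocessing pipelines, and the class can be exercised without any image library installed.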

Usage
python benchmark.py --model checkpoints/fastvlm_0.5b_stage3 --img-dir ./sample_images --device cuda
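The three flags in the command above could be wired up with argparse roughly as follows. This is a sketch, not the parser from benchmark.py: the flag names come from the usage line, while `build_parser`, the help strings, and the `cuda` default are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Benchmark a FastVLM checkpoint on a folder of images."
    )
    parser.add_argument("--model", required=True,
                        help="Path to the model checkpoint directory")
    parser.add_argument("--img-dir", required=True,
                        help="Folder containing the images to evaluate")
    # Assumption: defaulting to "cuda" is illustrative; the PR only states
    # that both CUDA and CPU devices were tested.
    parser.add_argument("--device", default="cuda",
                        help="Device to run inference on, e.g. cuda or cpu")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # argparse exposes --img-dir as args.img_dir.
    print(args.model, args.img_dir, args.device)
```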

Future Enhancements
  • Add COCO/VQA dataset loaders.
  • Integrate BLEU, CIDEr, and other standard metrics.
  • Support batch inference and multi-GPU evaluation.
  • Generate JSON/CSV reports and visual plots.
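As one possible shape for the JSON/CSV reporting item, the sketch below aggregates per-image timings into a summary and writes both formats using only the standard library. The record fields (`image`, `ttft_s`, `latency_s`) and the function names are hypothetical; nothing here is part of the current script.

```python
import csv
import json
from statistics import mean
from typing import Any, Dict, List


def summarize(records: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Aggregate per-image timings; raises StatisticsError if records is empty."""
    return {
        "num_images": len(records),
        "mean_ttft_s": mean(r["ttft_s"] for r in records),
        "mean_latency_s": mean(r["latency_s"] for r in records),
    }


def write_reports(records: List[Dict[str, Any]], json_path: str, csv_path: str) -> None:
    # JSON carries both the summary and the raw per-image rows.
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump({"summary": summarize(records), "per_image": records}, f, indent=2)
    # CSV carries one row per image, convenient for spreadsheets and plotting.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["image", "ttft_s", "latency_s"])
        writer.writeheader()
        writer.writerows(records)
```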

Testing
  • Verified with FastVLM-0.5B checkpoint on sample images.
  • Works on CUDA and CPU devices.

Checklist
  • Code compiles and runs without errors.
  • Tested basic benchmarking on sample images.
  • Added CLI interface and documented usage.

@Diksha-3905

Author

feat: add evaluation & benchmarking script for FastVLM models

  • Introduced benchmark.py to evaluate model checkpoints.
  • Measures Time-to-First-Token (TTFT), average latency, and simple accuracy.
  • Added CLI interface for easy benchmarking with any image folder.
  • Implemented a lightweight dataset loader (ImageFolderDataset).
  • Prepared for future metrics (BLEU, CIDEr) and dataset integrations (COCO, VQA).

Usage:
python benchmark.py --model checkpoints/fastvlm_0.5b_stage3 --img-dir ./sample_images --device cuda
