[Feature] Add Evaluation & Benchmarking Script (Add files via upload) #49
Open
Diksha-3905 wants to merge 1 commit into
feat: add evaluation & benchmarking script for FastVLM models
Summary
This PR introduces a new benchmark.py script to evaluate FastVLM model checkpoints with:
Time-to-First-Token (TTFT) measurement
Latency per image
Simple accuracy metric (placeholder for VQA/captioning tasks)
CLI interface for easy use with different checkpoints and datasets
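The metrics listed above could be measured roughly as in the following sketch. It is not the code from benchmark.py: `time_generation`, `simple_accuracy`, and `fake_stream` are hypothetical names, the model is replaced by a dummy token stream, and the accuracy rule (case-insensitive substring match) is an assumed placeholder.

```python
import time
from typing import Callable, Iterable, List, Sequence, Tuple


def time_generation(generate: Callable[[], Iterable[str]]) -> Tuple[float, float, List[str]]:
    """Return (ttft_seconds, total_latency_seconds, tokens) for one streamed generation.

    `generate` is any zero-argument callable that yields tokens one at a time.
    """
    # Note: on a GPU, kernels run asynchronously, so a real harness would
    # typically call torch.cuda.synchronize() before reading the clock.
    # That is omitted here to keep the sketch dependency-free.
    start = time.perf_counter()
    ttft = None
    tokens: List[str] = []
    for token in generate():
        if ttft is None:
            # Time-to-First-Token: elapsed time until the first token arrives.
            ttft = time.perf_counter() - start
        tokens.append(token)
    total = time.perf_counter() - start
    if ttft is None:
        ttft = total  # nothing was generated; fall back to the total time
    return ttft, total, tokens


def simple_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions that contain their reference string (assumed rule)."""
    if not references:
        return 0.0
    hits = sum(
        ref.strip().lower() in pred.strip().lower()
        for pred, ref in zip(predictions, references)
    )
    return hits / len(references)


def fake_stream() -> Iterable[str]:
    """Hypothetical stand-in for a streaming model call."""
    time.sleep(0.01)
    yield "a"
    time.sleep(0.01)
    yield "cat"


if __name__ == "__main__":
    ttft, latency, tokens = time_generation(fake_stream)
    print(f"TTFT: {ttft:.4f}s  latency: {latency:.4f}s  output: {' '.join(tokens)}")
```

Using `time.perf_counter` rather than `time.time` gives a monotonic, high-resolution clock, which matters when per-token intervals are only a few milliseconds.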
Key Changes
Added benchmark.py script with modular design:
Supports folder-based image datasets.
Uses build_model_and_transforms to load any FastVLM checkpoint.
Computes and logs TTFT, latency, and simple accuracy.
Outputs a summary of benchmark results.
Integrated with torchvision transforms for preprocessing.
Included CLI arguments for model path, image directory, and device selection.
Created a minimal dataset loader (ImageFolderDataset) for quick evaluation.
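For reviewers who want a picture of what a minimal folder-based loader involves, here is a sketch. It only borrows the class name from this PR; the constructor arguments (`loader`, `transform`), the extension list, and the returned tuple are assumptions, and the real ImageFolderDataset in benchmark.py may differ. It is written as a plain Python class with `__len__` and `__getitem__` so that it has no third-party dependencies.

```python
from pathlib import Path
from typing import Any, Callable, Optional, Tuple

# Assumed set of extensions; the actual script may accept a different list.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


class ImageFolderDataset:
    """Sketch of a flat image-folder dataset (no labels, no subfolders)."""

    def __init__(
        self,
        img_dir: str,
        loader: Optional[Callable[[Path], Any]] = None,
        transform: Optional[Callable[[Any], Any]] = None,
    ) -> None:
        root = Path(img_dir)
        if not root.is_dir():
            raise NotADirectoryError(f"not a directory: {root}")
        # Sort so that runs are reproducible across file systems.
        self.paths = sorted(
            p for p in root.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
        )
        self.loader = loader        # e.g. a function that opens the image file
        self.transform = transform  # e.g. a preprocessing pipeline

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, index: int) -> Tuple[Any, str]:
        path = self.paths[index]
        # Without a loader the item is just the path, which keeps the sketch stdlib-only.
        item = self.loader(path) if self.loader else path
        if self.transform:
            item = self.transform(item)
        return item, path.name
```

In practice the `loader` would open the file as an image and the `transform` would be the preprocessing returned alongside the model; both are left as injectable callables here so the listing logic can be checked on its own.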
Usage
python benchmark.py \
  --model checkpoints/fastvlm_0.5b_stage3 \
  --img-dir ./sample_images \
  --device cuda
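A parser that accepts the three flags shown in the usage command might look like the sketch below. The flag names come from this PR; the help strings, the `required` settings, and the `cuda` default are assumptions, and `build_parser` is a hypothetical helper name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser for the flags shown in the usage example (sketch)."""
    parser = argparse.ArgumentParser(
        description="Benchmark a FastVLM checkpoint on a folder of images."
    )
    parser.add_argument("--model", required=True,
                        help="path to the model checkpoint directory")
    parser.add_argument("--img-dir", required=True,
                        help="directory containing the images to evaluate")
    # Defaulting to "cuda" is an assumption; the script may choose differently.
    parser.add_argument("--device", default="cuda",
                        help="device to run on, for example cuda or cpu")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # argparse converts --img-dir to the attribute name img_dir.
    print(args.model, args.img_dir, args.device)
```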
Future Enhancements
Add COCO/VQA dataset loaders.
Integrate BLEU, CIDEr, and other standard metrics.
Support batch inference and multi-GPU evaluation.
Generate JSON/CSV reports and visual plots.
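As one possible shape for the JSON/CSV reporting item above, the sketch below writes per-image results and a small summary using only the standard library. Nothing here exists in the PR yet: `write_reports`, the file names, and the row keys (`image`, `ttft_s`, `latency_s`) are all hypothetical.

```python
import csv
import json
from pathlib import Path
from statistics import mean
from typing import Dict, List

FIELDS = ["image", "ttft_s", "latency_s"]  # assumed per-image columns


def write_reports(rows: List[Dict], out_dir: str) -> Dict:
    """Write results.json and results.csv into out_dir and return the summary."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    summary = {
        "num_images": len(rows),
        "mean_ttft_s": mean(r["ttft_s"] for r in rows) if rows else None,
        "mean_latency_s": mean(r["latency_s"] for r in rows) if rows else None,
    }

    # JSON keeps the summary and the per-image rows together in one file.
    (out / "results.json").write_text(
        json.dumps({"summary": summary, "per_image": rows}, indent=2)
    )

    # CSV holds one row per image, convenient for spreadsheets and plotting.
    with (out / "results.csv").open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)

    return summary
```

Keeping the per-image rows alongside the aggregate makes it possible to recompute other statistics (median, percentiles) later without rerunning the benchmark.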
Testing
Verified with FastVLM-0.5B checkpoint on sample images.
Works on CUDA and CPU devices.
Checklist
Code compiles and runs without errors.
Tested basic benchmarking on sample images.
Added CLI interface and documented usage.