… format (not just db). New unit tests
…checkpointing

🚀 Major improvements to dataset generation pipeline:

✅ QA Parallel Processing (qa_workers):
- Implement ThreadPoolExecutor.map() for order-preserving parallelization
- Thread-safe image indexing with unique IDs
- Automatic fallback to sequential when qa_workers=1
- 2-4x speedup for ground truth scenarios, no overhead for GPU inference
- Comprehensive testing: verified identical output between parallel/sequential

✅ Simplified Logging System:
- Single GRAID_DEBUG_VERBOSE env var controls console debug output
- Debug messages always go to log files (for troubleshooting)
- Timestamped log files: graid_YYYYMMDD_HHMM.log
- Cleaned up complex logging logic

✅ Robust Checkpointing:
- Save/resume functionality via save_steps parameter
- Automatic checkpoint cleanup on successful completion
- Force restart capability (force parameter)
- Crash recovery for large dataset generation

✅ Enhanced Configuration:
- Added force, save_steps, use_original_filenames, filename_prefix parameters
- CLI arguments now properly override config file values
- Maintains backward compatibility

🧪 Verified Features:
- Parallel QA generates identical results as sequential (100% match)
- Order preservation maintained across all scenarios
- Question-image correspondence preserved
- Profiling and timing aggregation works across threads
- Debug logging working correctly (both console and file)

All changes maintain full backward compatibility and existing functionality.
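For reviewers, a minimal sketch of the order-preserving parallelization described under QA Parallel Processing. It is not the code from this PR: `run_qa` and `generate` are hypothetical names, and only `qa_workers` and the use of `ThreadPoolExecutor.map()` come from the commit message. `Executor.map()` yields results in input order regardless of which worker finishes first, which is what keeps question-image correspondence intact.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_qa(items: Iterable[T], generate: Callable[[T], R], qa_workers: int = 1) -> List[R]:
    """Apply `generate` to every item, returning results in input order.

    `run_qa` and `generate` are illustrative names, not the PR's API.
    """
    items = list(items)
    # Sequential fallback: no thread pool is created when qa_workers is 1.
    if qa_workers <= 1:
        return [generate(item) for item in items]
    with ThreadPoolExecutor(max_workers=qa_workers) as pool:
        # map() returns results in the order of `items`, not completion order.
        return list(pool.map(generate, items))
```

Comparing `run_qa(items, fn, qa_workers=4)` against `run_qa(items, fn, qa_workers=1)` is one way to reproduce the "identical output" check listed under Verified Features.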
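The logging behaviour above could be wired up roughly as follows. This is a sketch under assumptions: `setup_logging` and `log_dir` are hypothetical names, and treating `1`, `true`, or `yes` as "verbose" is a guess, since the commit message does not say which values `GRAID_DEBUG_VERBOSE` accepts. Only the env var name and the `graid_YYYYMMDD_HHMM.log` file pattern come from the source.

```python
import logging
import os
from datetime import datetime
from pathlib import Path


def setup_logging(log_dir: str, name: str = "graid"):
    """Return (logger, log_path). Hypothetical helper, not the PR's function."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    # File name pattern from the commit message: graid_YYYYMMDD_HHMM.log
    stamp = datetime.now().strftime("%Y%m%d_%H%M")
    log_path = Path(log_dir) / f"graid_{stamp}.log"

    # Assumption: these values switch console debug output on.
    verbose = os.environ.get("GRAID_DEBUG_VERBOSE", "").lower() in {"1", "true", "yes"}

    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()
    logger.propagate = False

    # Debug messages always reach the log file.
    file_handler = logging.FileHandler(log_path)
    file_handler.setLevel(logging.DEBUG)
    logger.addHandler(file_handler)

    # The console shows debug messages only when the env var is set.
    console_handler = logging.StreamHandler()
    console_handler.setLevel(logging.DEBUG if verbose else logging.INFO)
    logger.addHandler(console_handler)

    return logger, log_path
```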
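A rough illustration of the save/resume behaviour listed under Robust Checkpointing. The `save_steps` and `force` parameter names come from the commit message; the function name, the JSON checkpoint format, and the `checkpoint_path` argument are assumptions made for this sketch and may not match how the PR stores its state.

```python
import json
from pathlib import Path
from typing import Callable, List, Sequence


def process_with_checkpoints(
    items: Sequence,
    process: Callable,
    checkpoint_path: str,
    save_steps: int = 100,
    force: bool = False,
) -> List:
    """Process items in order, saving progress every `save_steps` items."""
    path = Path(checkpoint_path)
    # force=True discards any earlier progress and restarts from scratch.
    if force and path.exists():
        path.unlink()

    results: List = []
    if path.exists():
        # Resume: reuse results saved before the previous run stopped.
        results = json.loads(path.read_text())["results"]

    for i in range(len(results), len(items)):
        results.append(process(items[i]))
        if save_steps and (i + 1) % save_steps == 0:
            # Write to a temp file, then rename, so a crash mid-write
            # cannot leave a truncated checkpoint behind.
            tmp = path.with_suffix(path.suffix + ".tmp")
            tmp.write_text(json.dumps({"results": results}))
            tmp.replace(path)

    # Successful completion: remove the checkpoint.
    if path.exists():
        path.unlink()
    return results
```

After a crash, calling the function again with the same `checkpoint_path` picks up from the last saved step rather than from the first item.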
No description provided.