support multiple device evaluation for activation quantized model #1394
base: main
Conversation
Pull request overview
This PR adds new device-management functionality to support multi-device evaluation of activation-quantized models and comments out multiple test cases in the scheme tests.
Changes:
- Commented out extensive test cases in the scheme test file to focus on a single `test_set_scheme` test
- Added a new `dispatch_model_block_wise` utility function for multi-device model dispatching (a hedged sketch follows this list)
- Updated evaluation code to use the new device dispatch mechanism
- Changed a logging level from warning to info and added `tie_weights()` calls in multiple locations
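As a rough illustration of what a block-wise multi-device dispatch helper can look like, here is a minimal sketch built on accelerate's `get_balanced_memory`, `infer_auto_device_map`, and `dispatch_model`. The function body, the `max_mem_ratio` default, and the use of `model._no_split_modules` are assumptions for illustration, not the PR's actual `dispatch_model_block_wise` implementation:

```python
from accelerate import dispatch_model, infer_auto_device_map
from accelerate.utils import get_balanced_memory


def dispatch_model_block_wise(model, max_mem_ratio=0.9):
    """Spread a model block-wise across the available accelerators.

    Sketch only; the real helper in auto_round/utils/device.py may differ.
    """
    # Balance the per-device budgets, then keep some headroom for activations,
    # temporary tensors, and allocator fragmentation to reduce OOM risk.
    max_memory = get_balanced_memory(
        model, no_split_module_classes=model._no_split_modules
    )
    max_memory = {dev: int(mem * max_mem_ratio) for dev, mem in max_memory.items()}
    device_map = infer_auto_device_map(
        model, max_memory=max_memory, no_split_module_classes=model._no_split_modules
    )
    return dispatch_model(model, device_map=device_map)
```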
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| test/test_cpu/schemes/test_scheme.py | Commented out multiple test cases, leaving only test_set_scheme active |
| auto_round/utils/device.py | Added new dispatch_model_block_wise function for block-wise model dispatching across devices |
| auto_round/eval/evaluation.py | Updated prepare_model_for_eval to use new device dispatch utility and changed parameter name |
| auto_round/compressors/base.py | Changed logger level from warning to info, added tie_weights() call, and modified error raising |
| auto_round/auto_scheme/utils.py | Added tie_weights() call before device map inference |
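As context for the `tie_weights()` additions above: calling it before device-map inference makes shared parameters (for example, input embeddings tied to the LM head) actually point to the same storage, so the mapping does not split them across devices. A minimal sketch, assuming a Hugging Face causal LM; the model name is only an example:

```python
from accelerate import infer_auto_device_map
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
model.tie_weights()  # ensure tied parameters share storage before mapping
device_map = infer_auto_device_map(
    model, no_split_module_classes=model._no_split_modules
)
```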
    # temporary tensors, other processes, and allocator fragmentation, reducing
    # the chance of runtime OOM while still utilizing most available memory.
    new_max_memory[device] = max_memory[device] * max_mem_ratio
    new_max_memory = get_balanced_memory(
If I remember correctly, this will use all of CUDA_VISIBLE_DEVICES. Setting CUDA_VISIBLE_DEVICES based on the device_map information might help.
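One way to realize that suggestion, shown only as a hedged sketch: the device indices, memory limits, and model name below are placeholders, not values from this PR.

```python
import os

# Option 1: limit visibility before anything initializes CUDA in this process.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0,1")

from accelerate.utils import get_balanced_memory
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
model.tie_weights()

# Option 2: pass budgets only for the devices the device_map should use,
# so the balancing step never considers the other visible GPUs.
max_memory = {0: "20GiB", 1: "20GiB", "cpu": "64GiB"}
balanced_memory = get_balanced_memory(model, max_memory=max_memory)
```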
Description
Please briefly describe your main changes and the motivation behind them.
Type of Change
Related Issues
Fixes or relates to #
Checklist Before Submitting