Conversation
mlx-community/GLM-Z1-9B-0414-bf16 fails with:
Models are now tracked within the catalogue. See catalog.

Added the working models to the catalogue. I'll look into the 6-bit and bf16 quantization problems.
I'm curious: are you not using the native MLX engine, which already supports many more model architectures? It would be wonderful to be able to use any MLX-supported model. Currently the project doesn't make much sense with my 128GB Mac: the only supported model bigger than what I can run conventionally is Hermes-4-405B-MLX-4bit, at 228GB. MoE models like Qwen3 235B-A22B (132GB at 4-bit) or GLM 4.6 355B-A32B (198GB at 4-bit, 154GB at 3-bit) would be much more relevant, since large dense models are too slow for inference on Apple Silicon.
You're right about this. The reason we started with a very minimal set was to test and expand the software itself. Although we are using MLX, models are not directly usable; we need to update the model scripts and test them accordingly. In short, we'll add all the models supported by MLX. We are also working on many optimizations, including MoE runtime routing, expert assignment, and sparsity.
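For reference, the model sizes quoted above follow from a simple back-of-envelope estimate: weight memory is roughly parameter count times bits per weight divided by 8. This is a minimal sketch; the helper name and the overhead note are illustrative assumptions, not part of the project, and real MLX checkpoints run somewhat larger than the raw estimate because of unquantized embeddings/norms and quantization group scales.

```python
def quantized_weight_gb(params_billion: float, bits: int) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes) for a model with
    `params_billion` billion parameters stored at `bits` bits per weight.
    Actual on-disk sizes tend to be ~10-15% larger due to unquantized
    layers and per-group quantization scales."""
    return params_billion * bits / 8

# Figures discussed above (quoted on-disk sizes in the comments differ
# by the overhead noted in the docstring):
for name, params_b, bits in [
    ("Hermes-4-405B @ 4-bit", 405, 4),    # quoted ~228 GB
    ("Qwen3-235B-A22B @ 4-bit", 235, 4),  # quoted ~132 GB
    ("GLM-4.6-355B @ 4-bit", 355, 4),     # quoted ~198 GB
    ("GLM-4.6-355B @ 3-bit", 355, 3),     # quoted ~154 GB
]:
    print(f"{name}: ~{quantized_weight_gb(params_b, bits):.1f} GB raw weights")
```

The gap between the raw estimate and the quoted sizes is why a 235B MoE at 4-bit still fits comfortably in 128GB of unified memory only once overheads and KV cache are accounted for.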
Summary
Add support for new model architectures:
Changes
Testing
Also modified existing catalogue entries:
Dependencies
This commit depends on distilp PRs firstbatchxyz/distilp#18 and firstbatchxyz/distilp#17.