- [x] Clearer instructions for running the model
- [x] Add completion feature (Python)
- [x] Add chat feature (Python)
- [x] Add a samples folder with several examples
- [x] Check the llama-cpp-python repo
- [ ] Add batched inference feature (Python)
- [ ] Add quantization methods
- [ ] Add server API support
- [ ] Add more arguments to control the model (based on the libgemma interface)
- [ ] Add streaming output for the completion function (see the sketch below)
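
For the streaming item, one possible shape is a `stream` flag that makes the completion function return a token iterator instead of a full string. This is only a minimal sketch under assumed names: the `Gemma` class, the `complete` method, and the weights filename are all hypothetical placeholders, not the project's actual API.

```python
# Hypothetical sketch of a streaming completion interface.
# All names here are assumptions, not the project's real bindings.
from typing import Iterator, Union


class Gemma:
    """Placeholder stand-in for the real model binding."""

    def __init__(self, weights_path: str) -> None:
        self.weights_path = weights_path

    def _generate_tokens(self, prompt: str) -> Iterator[str]:
        # A real implementation would call into libgemma here;
        # this stub just echoes the prompt word by word.
        yield from prompt.split()

    def complete(self, prompt: str, stream: bool = False) -> Union[str, Iterator[str]]:
        """Return the full completion, or a token iterator when stream=True."""
        tokens = self._generate_tokens(prompt)
        if stream:
            return tokens           # caller consumes tokens as they arrive
        return " ".join(tokens)     # blocking call: collect everything first


if __name__ == "__main__":
    model = Gemma("gemma-2b-it.sbs")  # hypothetical weights file
    # Streaming usage: print tokens as they are produced.
    for token in model.complete("Explain quantization briefly", stream=True):
        print(token, end=" ", flush=True)
```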