Hello InteractiveOmni Team,
First of all, I would like to express my sincere appreciation for your outstanding work on InteractiveOmni. The model's performance is impressive, and we find the Audio-Visual Multi-turn Dialogue capability particularly robust and well-suited for the research tasks our team is currently working on.
We are very interested in diving deeper into this project and would like to inquire about a few things regarding future updates:
- Training Code Release: Do you have any plans to open-source the training code? Having access to the training pipeline would be incredibly helpful for us to understand the model better and adapt it to our specific scenarios.
- Framework Support: Are there any plans to support popular training frameworks such as ms-swift or LLaMA-Factory? Integration with these frameworks would greatly facilitate the fine-tuning and deployment process for the community.
- RL / GRPO Support: Given the recent advancements in Multimodal RL, we are wondering if you are considering supporting Reinforcement Learning methods like GRPO (Group Relative Policy Optimization) for this model? We believe this could further enhance the model's reasoning and interaction capabilities.
Thank you again for your contribution to the open-source community! Looking forward to your response.
Best regards,