Is your feature request related to a problem? Please describe.
Right now, in dynamicemb, embedding data is copied in memory multiple times during the forward and backward passes, which hurts performance. We should batch those copies so that each pass moves the data as few times as possible.
Describe the solution you'd like
- For HBM mode, remove the intermediate unique_emb buffer.
- For cache mode, reuse the forward results in the backward pass to avoid touching non-HBM storage multiple times in a single iteration.
- Fuse insert with update for training.
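
To make the intent concrete, here is a minimal pure-Python sketch of the second and third points. It is not dynamicemb code: `SlowStore`, `read_batch`, `upsert_batch`, `forward`, and `backward` are hypothetical names made up for illustration, and the plain SGD step is an assumption. The idea it shows is that the forward pass reads the slow (non-HBM) storage once per batch, the backward pass reuses the rows fetched in forward, and new and existing keys are written back in a single fused batched call.

```python
class SlowStore:
    """Hypothetical stand-in for non-HBM storage; counts batched accesses."""

    def __init__(self, dim):
        self.dim = dim
        self.rows = {}
        self.reads = 0
        self.writes = 0

    def read_batch(self, keys):
        # One batched read. Keys not stored yet come back as zero rows.
        self.reads += 1
        return {k: list(self.rows.get(k, [0.0] * self.dim)) for k in keys}

    def upsert_batch(self, rows):
        # Fused insert + update: one batched write covers new and existing keys.
        self.writes += 1
        self.rows.update(rows)


def forward(store, keys):
    # Deduplicate keys (order preserved) so each row is fetched once.
    unique_keys = list(dict.fromkeys(keys))
    fetched = store.read_batch(unique_keys)
    output = [fetched[k] for k in keys]
    # Return the fetched rows so backward can reuse them.
    return output, fetched


def backward(store, keys, grads, fetched, lr=0.1):
    # Accumulate gradients per unique key.
    acc = {}
    for k, g in zip(keys, grads):
        row = acc.setdefault(k, [0.0] * len(g))
        for i, v in enumerate(g):
            row[i] += v
    # Apply SGD to the rows saved in forward; no second read of the store.
    updated = {
        k: [w - lr * g for w, g in zip(fetched[k], acc[k])] for k in acc
    }
    store.upsert_batch(updated)


store = SlowStore(dim=2)
keys = [7, 3, 7]
out, saved = forward(store, keys)
backward(store, keys, [[1.0, 1.0], [2.0, 0.0], [1.0, 1.0]], saved)
```

In this sketch one training iteration touches the slow store exactly twice (one read, one write), regardless of how many duplicate keys the batch contains.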