
feat(megatron): Add support for Gated Delta Net (GDN) & Kimi Delta Attention (KDA)#676

Draft
clairesonglee wants to merge 20 commits into main from
clairlee/kda-optimized-training-patch

Conversation

@clairesonglee
Collaborator

No description provided.

Comment thread: tools/chat_zebra_llama.py, lines +121 to +122

    # try:
    #     user = input("User> ").strip()
Comment thread: tools/chat_zebra_llama.py, lines +127 to +134

    # if not user:
    #     continue
    # if user == "/exit":
    #     return
    # if user == "/reset":
    #     history.clear()
    #     print("[Info] History cleared.\n")
    #     continue
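The two threads above quote a commented-out interactive REPL loop. As a hedged sketch only, this is roughly what the loop might look like once enabled; the `chat_loop` wrapper, the `generate` callback, and the `history` representation are assumptions for illustration, not part of this PR.

```python
def chat_loop(generate):
    """Minimal REPL sketch (hypothetical): reads user turns, supports
    /exit and /reset, and delegates reply generation to `generate`."""
    history = []
    while True:
        try:
            user = input("User> ").strip()
        except (EOFError, KeyboardInterrupt):
            return
        if not user:
            continue
        if user == "/exit":
            return
        if user == "/reset":
            history.clear()
            print("[Info] History cleared.\n")
            continue
        history.append(("user", user))
        reply = generate(history)  # generate(history) -> str is an assumed interface
        history.append(("assistant", reply))
        print(f"Assistant> {reply}\n")
```

The control-flow order matters: the empty-input and slash-command checks run before the model is invoked, so `/reset` and `/exit` never reach generation.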
Comment on lines +120 to +123

    # if 'layer_norm_weight' in new_key:
    #     new_key = new_key.replace('layer_norm_weight', 'weight')
    # if 'layer_norm_bias' in new_key:
    #     new_key = new_key.replace('layer_norm_bias', 'bias')

# Ensure that the tensor passed between pipeline parallel stages is
# viewless. See related notes in TransformerBlock and TransformerLayer
output = make_viewless_tensor(
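The fragment above calls Megatron-Core's `make_viewless_tensor` so the tensor handed between pipeline-parallel stages is not a view. As a hedged illustration of the underlying pattern (this mirrors how Megatron-Core detaches storage from the view chain, but `make_viewless` here is a local sketch, not the library function):

```python
import torch

def make_viewless(inp: torch.Tensor) -> torch.Tensor:
    """Return a tensor sharing `inp`'s storage but with no view relationship,
    so it does not keep the upstream tensor (and its graph) alive."""
    out = torch.empty_like(inp)
    out.data = inp.data  # share storage, drop the ._base view link
    return out

x = torch.arange(6.0).view(2, 3)  # x is a view: x._base is the flat arange tensor
y = make_viewless(x)              # same values and storage, but y._base is None
```

This matters across pipeline stages because holding a view would pin the producing stage's activation memory past the point it is needed.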
):
    # no cache support
    _ = past_key_value
    use_cache = False

):
    # No KV cache support for now.
    _ = past_key_value
    use_cache = False

):
    # No cache support
    _ = past_key_values
    use_cache = False
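The three fragments above repeat one pattern: the new GDN/KDA layers accept attention-style cache arguments for interface compatibility but discard them, since linear-attention layers carry recurrent state rather than a KV cache. A hedged sketch of that pattern (the function name and return shape are illustrative, not the PR's actual classes):

```python
def gdn_forward(hidden_states, past_key_value=None, use_cache=False):
    """Forward sketch for a linear-attention (GDN/KDA) layer: accepts the
    standard cache arguments but ignores them -- no KV cache support."""
    _ = past_key_value   # discard any cache the caller passed in
    use_cache = False    # force caching off regardless of the request
    # ... actual GDN/KDA token mixing elided ...
    return hidden_states, None  # (output, no cache to return)
```

Keeping the signature intact lets callers treat these layers interchangeably with softmax-attention layers that do cache.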

import argparse
import json
import os
from megatron.core.inference.contexts import BaseInferenceContext
from megatron.core.process_groups_config import ProcessGroupCollection
from megatron.core.ssm.mamba_hybrid_layer_allocation import Symbols as LayerSymbols
from megatron.core.ssm.mamba_hybrid_layer_allocation import allocate_layers
from megatron.core.transformer.identity_op import IdentityOp
from megatron.core.fusions.fused_bias_dropout import get_bias_dropout_add
from megatron.core.models.gpt.moe_module_specs import get_moe_module_spec
from megatron.core.ssm.mamba_block import MambaStack, MambaStackSubmodules
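The imports above pull in `allocate_layers` and the layer `Symbols` from Megatron-Core's hybrid layer allocation, which maps a per-layer symbol pattern to layer types in a hybrid stack. As a hedged sketch of how such a pattern is consumed (the exact symbol set and mapping here are assumptions; consult `mamba_hybrid_layer_allocation` for the real ones):

```python
def allocate_from_pattern(pattern: str) -> list:
    """Expand a hybrid-layer pattern string into per-layer types.
    Assumed symbols: 'M' = SSM/linear-attention mixer, '*' = attention, '-' = MLP."""
    symbol_to_layer = {'M': 'ssm', '*': 'attention', '-': 'mlp'}
    return [symbol_to_layer[s] for s in pattern]

layers = allocate_from_pattern("M*M-")  # one entry per layer, in stack order
```

A pattern string makes hybrid architectures (e.g. mostly GDN/KDA layers with periodic full attention) configurable without code changes.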

3 participants