Open
Labels
core: Improves core model while keeping core idea intact
engineering: Software-engineering problems that don't require ML-Expertise
Description
Our learning rate scheduler currently uses a linear ramp-up followed by an exponential decay, so our learning rate curve looks like the following:

where the duration of the initial ramp-up and the decay are tuneable hyperparameters.
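As a rough illustration of the current shape, here is a minimal sketch of a linear ramp-up followed by an exponential decay. The function name and the parameters `peak_lr`, `warmup_steps`, and `decay_steps` are hypothetical and not taken from this repository; in particular, treating the decay duration as an exponential time constant is an assumption.

```python
import math


def linear_exp_lr(step, peak_lr=1e-3, warmup_steps=1000, decay_steps=10000):
    """Hypothetical sketch: linear ramp-up, then exponential decay.

    warmup_steps and decay_steps stand in for the two tunable durations;
    decay_steps is assumed to be the time constant of the exponential.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr over warmup_steps steps.
        return peak_lr * step / warmup_steps
    # Exponential decay, starting from peak_lr at the end of the ramp-up.
    return peak_lr * math.exp(-(step - warmup_steps) / decay_steps)
```

With these defaults, the rate reaches `peak_lr` at step 1000 and has fallen by a factor of `e` one `decay_steps` later.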
However, others pointed out that square ramp-up and square decay can perform significantly better, so we might also want to use them. The modified curve (orange) would look like the following:
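One possible reading of "square ramp-up and square decay" is a quadratic schedule in both phases. The sketch below assumes that interpretation; the function name, parameters, and the choice to let the rate reach zero at the end of the decay window are all assumptions for illustration, not the project's implementation.

```python
def quadratic_lr(step, peak_lr=1e-3, warmup_steps=1000, decay_steps=10000):
    """Hypothetical sketch: quadratic ramp-up, then quadratic decay to zero.

    This is one interpretation of "square" ramp-up and decay; the intended
    formula in the issue may differ.
    """
    if step < warmup_steps:
        # Quadratic ramp: slow start, reaching peak_lr at warmup_steps.
        return peak_lr * (step / warmup_steps) ** 2
    # Fraction of the decay window still remaining, clamped at 0.
    remaining = max(0.0, 1.0 - (step - warmup_steps) / decay_steps)
    # Quadratic decay: falls quickly at first, then flattens near zero.
    return peak_lr * remaining ** 2
```

Compared with a linear ramp-up, this variant spends more of the warmup at small learning rates, and unlike an exponential decay it reaches exactly zero after `decay_steps`.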
