Adadelta optimizer
It has been proposed in ADADELTA: An Adaptive Learning Rate Method (Zeiler, 2012).
optim_adadelta(params, lr = 1, rho = 0.9, eps = 1e-06, weight_decay = 0)
params | (iterable): list of parameters to optimize or list defining parameter groups |
lr | (float, optional): learning rate (default: 1) |
rho | (float, optional): coefficient used for computing a running average of squared gradients (default: 0.9) |
eps | (float, optional): term added to the denominator to improve numerical stability (default: 1e-6) |
weight_decay | (float, optional): weight decay (L2 penalty) (default: 0) |
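A minimal sketch of constructing the optimizer with the arguments above; the nn_linear module, its sizes, and the hyperparameter values are illustrative choices, not prescribed by this help page:

library(torch)

# Illustrative model; any module exposing $parameters works the same way.
model <- nn_linear(in_features = 8, out_features = 1)

optimizer <- optim_adadelta(
  params = model$parameters, # parameters (or parameter groups) to optimize
  lr = 1,                    # step scaling applied to the Adadelta update
  rho = 0.95,                # decay rate of the running average of squared gradients
  eps = 1e-6,                # numerical stability term in the denominator
  weight_decay = 1e-4        # L2 penalty added to the gradients
)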
According to the original paper, the decaying average of the squared gradients is computed as follows:
E[g^2]_t = \rho E[g^2]_{t-1} + (1 - \rho) g_t^2
The RMS of the gradients up to time t is then:
RMS[g]_t = \sqrt{E[g^2]_t + \epsilon}
Adadelta update rule:
\begin{array}{ll}
\Delta\theta_t = -\frac{RMS[\Delta\theta]_{t-1}}{RMS[g]_t} g_t \\
\theta_{t+1} = \theta_t + \Delta\theta_t
\end{array}
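To make the update concrete, here is a minimal sketch of one Adadelta step written in plain R on a numeric vector; the parameter values and the accumulators E_g2 and E_dx2 are illustrative (the optimizer itself maintains this state as torch tensors):

rho <- 0.9
eps <- 1e-6
theta <- c(0.5, -0.3)   # parameters theta_t
g     <- c(0.1,  0.2)   # gradient g_t at theta_t
E_g2  <- c(0, 0)        # running average E[g^2]_{t-1}
E_dx2 <- c(0, 0)        # running average E[(Delta theta)^2]_{t-1}

E_g2  <- rho * E_g2 + (1 - rho) * g^2        # E[g^2]_t
rms_g <- sqrt(E_g2 + eps)                    # RMS[g]_t
delta <- -(sqrt(E_dx2 + eps) / rms_g) * g    # Delta theta_t
E_dx2 <- rho * E_dx2 + (1 - rho) * delta^2   # accumulate squared updates for the next step
theta <- theta + delta                       # theta_{t+1}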
if (torch_is_installed()) {
## Not run: 
optimizer <- optim_adadelta(model$parameters, lr = 0.1)
optimizer$zero_grad()
loss_fn(model(input), target)$backward()
optimizer$step()
## End(Not run)
}
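The example above assumes that model, input, target, and loss_fn already exist. A self-contained sketch with illustrative stand-ins (a linear regression on random data, using torch's nnf_mse_loss) could look like this:

library(torch)

# Illustrative definitions for the names used in the example.
model   <- nn_linear(10, 1)
input   <- torch_randn(64, 10)
target  <- torch_randn(64, 1)
loss_fn <- nnf_mse_loss

optimizer <- optim_adadelta(model$parameters, lr = 0.1)

for (step in 1:20) {
  optimizer$zero_grad()                  # clear gradients accumulated in the previous step
  loss <- loss_fn(model(input), target)  # forward pass and loss
  loss$backward()                        # backpropagate to populate gradients
  optimizer$step()                       # apply the Adadelta update
}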