
optim_adadelta

Adadelta optimizer


Description

Implements the Adadelta optimization algorithm, an adaptive learning-rate method described in the original Adadelta paper.

Usage

optim_adadelta(params, lr = 1, rho = 0.9, eps = 1e-06, weight_decay = 0)

Arguments

params

(iterable): list of parameters to optimize or list defining parameter groups

lr

(float, optional): learning rate (default: 1)

rho

(float, optional): coefficient used for computing a running average of squared gradients (default: 0.9)

eps

(float, optional): term added to the denominator to improve numerical stability (default: 1e-6)

weight_decay

(float, optional): weight decay (L2 penalty) (default: 0)
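
Taken together, these arguments map directly onto the constructor call. Below is a minimal sketch of creating the optimizer with explicit hyperparameter values, assuming torch is installed; the tensor w and all values shown are purely illustrative, not recommendations.

library(torch)

# a single trainable tensor, just so there is something to optimize
w <- torch_zeros(10, requires_grad = TRUE)

opt <- optim_adadelta(
  params = list(w),    # or module$parameters for an nn_module
  lr = 1,              # scales the size of each update
  rho = 0.9,           # running-average coefficient for squared gradients
  eps = 1e-6,          # added to the denominator for numerical stability
  weight_decay = 0     # L2 penalty; 0 disables it
)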

Note

According to the original paper, the decaying average of the squared gradients is computed as follows:

E[g^2]_t = \rho E[g^2]_{t-1} + (1 - \rho) g_t^2

Root mean square (RMS) of the gradients up to time t:

RMS[g]_t = \sqrt{E[g^2]_t + \epsilon}

Adadelta update rule:

\Delta\theta_t = - \frac{RMS[\Delta\theta]_{t-1}}{RMS[g]_t} g_t

\theta_{t+1} = \theta_t + \Delta\theta_t
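
For concreteness, one Adadelta step can be written out directly with torch tensors, following the three formulas above. This is only an illustrative sketch (names such as avg_sq_grad and the tensor shapes are made up here); in practice the built-in optim_adadelta handles this bookkeeping internally.

library(torch)

rho <- 0.9
eps <- 1e-6
lr  <- 1

avg_sq_grad  <- torch_zeros(3)   # E[g^2], running average of squared gradients
avg_sq_delta <- torch_zeros(3)   # E[(delta theta)^2], running average of squared updates

theta <- torch_randn(3)          # parameters (illustrative values)
g     <- torch_randn(3)          # gradient at the current step (illustrative values)

# E[g^2]_t = rho * E[g^2]_{t-1} + (1 - rho) * g_t^2
avg_sq_grad <- rho * avg_sq_grad + (1 - rho) * g^2

# delta_t = - (RMS[delta theta]_{t-1} / RMS[g]_t) * g_t
delta <- - torch_sqrt(avg_sq_delta + eps) / torch_sqrt(avg_sq_grad + eps) * g

# accumulate squared updates, then apply the lr-scaled step
avg_sq_delta <- rho * avg_sq_delta + (1 - rho) * delta^2
theta <- theta + lr * delta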

Examples

if (torch_is_installed()) {
## Not run: 
optimizer <- optim_adadelta(model$parameters, lr = 0.1)
optimizer$zero_grad()
loss_fn(model(input), target)$backward()
optimizer$step()

## End(Not run)

}
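
The objects model, input, target, and loss_fn in the example above are assumed to be defined elsewhere. A self-contained variant of the same loop, using a small linear model and MSE loss (shapes, batch size, and hyperparameters are illustrative):

library(torch)

if (torch_is_installed()) {
  # tiny regression problem, purely for illustration
  model  <- nn_linear(10, 1)
  input  <- torch_randn(32, 10)
  target <- torch_randn(32, 1)

  optimizer <- optim_adadelta(model$parameters, lr = 0.1)

  for (step in 1:5) {
    optimizer$zero_grad()
    loss <- nnf_mse_loss(model(input), target)
    loss$backward()
    optimizer$step()
  }
}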

torch: Tensors and Neural Networks with 'GPU' Acceleration (version 0.3.0)

License: MIT + file LICENSE

Authors: Daniel Falbel [aut, cre, cph], Javier Luraschi [aut], Dmitriy Selivanov [ctb], Athos Damiani [ctb], Christophe Regouby [ctb], Krzysztof Joachimiak [ctb], RStudio [cph]
