Adadelta optimizer
It has been proposed in ADADELTA: An Adaptive Learning Rate Method (Zeiler, 2012).
optim_adadelta(params, lr = 1, rho = 0.9, eps = 1e-06, weight_decay = 0)
params | (iterable): list of parameters to optimize or list defining parameter groups |
lr | (float, optional): learning rate (default: 1) |
rho | (float, optional): coefficient used for computing a running average of squared gradients (default: 0.9) |
eps | (float, optional): term added to the denominator to improve numerical stability (default: 1e-6) |
weight_decay | (float, optional): weight decay (L2 penalty) (default: 0) |
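A minimal sketch of constructing the optimizer with the arguments above; the nn_linear module, its sizes, and the hyperparameter values are illustrative choices, not prescribed by this help page:

library(torch)

# Illustrative model; any module exposing $parameters works the same way.
model <- nn_linear(in_features = 8, out_features = 1)

optimizer <- optim_adadelta(
  params = model$parameters, # parameters (or parameter groups) to optimize
  lr = 1,                    # step scaling applied to the Adadelta update
  rho = 0.95,                # decay rate of the running average of squared gradients
  eps = 1e-6,                # numerical stability term in the denominator
  weight_decay = 1e-4        # L2 penalty added to the gradients
)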
According to the original paper, the decaying average of the squared gradients is computed as follows:
E[g^2]_t = \rho E[g^2]_{t-1} + (1 - \rho) g_t^2
The RMS of the gradients up to time t is then:
RMS[g]_t = \sqrt{E[g^2]_t + \epsilon}
Adadelta update rule:
\begin{array}{ll}
\Delta\theta_t = -\frac{RMS[\Delta\theta]_{t-1}}{RMS[g]_t} g_t \\
\theta_{t+1} = \theta_t + \Delta\theta_t
\end{array}
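To make the update concrete, here is a minimal sketch of one Adadelta step written in plain R on a numeric vector; the parameter values and the accumulators E_g2 and E_dx2 are illustrative (the optimizer itself maintains this state as torch tensors):

rho <- 0.9
eps <- 1e-6
theta <- c(0.5, -0.3)   # parameters theta_t
g     <- c(0.1,  0.2)   # gradient g_t at theta_t
E_g2  <- c(0, 0)        # running average E[g^2]_{t-1}
E_dx2 <- c(0, 0)        # running average E[(Delta theta)^2]_{t-1}

E_g2  <- rho * E_g2 + (1 - rho) * g^2        # E[g^2]_t
rms_g <- sqrt(E_g2 + eps)                    # RMS[g]_t
delta <- -(sqrt(E_dx2 + eps) / rms_g) * g    # Delta theta_t
E_dx2 <- rho * E_dx2 + (1 - rho) * delta^2   # accumulate squared updates for the next step
theta <- theta + delta                       # theta_{t+1}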
if (torch_is_installed()) {
## Not run: 
optimizer <- optim_adadelta(model$parameters, lr = 0.1)
optimizer$zero_grad()
loss_fn(model(input), target)$backward()
optimizer$step()
## End(Not run)
}
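The example above assumes that model, input, target, and loss_fn already exist. A self-contained sketch with illustrative stand-ins (a linear regression on random data, using torch's nnf_mse_loss) could look like this:

library(torch)

# Illustrative definitions for the names used in the example.
model   <- nn_linear(10, 1)
input   <- torch_randn(64, 10)
target  <- torch_randn(64, 1)
loss_fn <- nnf_mse_loss

optimizer <- optim_adadelta(model$parameters, lr = 0.1)

for (step in 1:20) {
  optimizer$zero_grad()                  # clear gradients accumulated in the previous step
  loss <- loss_fn(model(input), target)  # forward pass and loss
  loss$backward()                        # backpropagate to populate gradients
  optimizer$step()                       # apply the Adadelta update
}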