Layer normalization
Applies Layer Normalization over a mini-batch of inputs as described in the paper Layer Normalization.
nn_layer_norm(normalized_shape, eps = 1e-05, elementwise_affine = TRUE)
normalized_shape: (int or list) input shape from an expected input of size [* \times \text{normalized\_shape}[0] \times \text{normalized\_shape}[1] \times \ldots \times \text{normalized\_shape}[-1]]. If a single integer is used, it is treated as a singleton list, and this module will normalize over the last dimension, which is expected to be of that specific size.
eps: a value added to the denominator for numerical stability. Default: 1e-5
elementwise_affine: a boolean value that when set to TRUE, this module has learnable per-element affine parameters initialized to ones (for weights) and zeros (for biases). Default: TRUE
y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta
The mean and standard deviation are calculated separately over the last dimensions, which have to be of the shape specified by normalized_shape. γ and β are learnable affine transform parameters of normalized_shape if elementwise_affine is TRUE.
The standard deviation is calculated via the biased estimator, equivalent to torch_var(input, unbiased = FALSE).
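As an illustration, the formula above can be reproduced by hand with the biased variance. This is a minimal sketch, not part of the documented interface; the variable names (x, mu, sigma2, y_manual) and the tolerance are chosen for the example only.
if (torch_is_installed()) {
  x <- torch_randn(20, 5, 10, 10)
  # Normalize over the last two dimensions, with the affine transform disabled
  m <- nn_layer_norm(c(10, 10), elementwise_affine = FALSE)
  # Dimensions are 1-indexed in R, so the last two dimensions of x are 3 and 4
  mu <- x$mean(dim = c(3, 4), keepdim = TRUE)
  sigma2 <- x$var(dim = c(3, 4), unbiased = FALSE, keepdim = TRUE)  # biased estimator
  y_manual <- (x - mu) / torch_sqrt(sigma2 + 1e-5)
  torch_allclose(m(x), y_manual, atol = 1e-6)  # expected: TRUE
}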
Input: (N, *)
Output: (N, *) (same shape as input)
Unlike Batch Normalization and Instance Normalization, which apply a scalar scale and bias to each entire channel/plane with the affine option, Layer Normalization applies per-element scale and bias with elementwise_affine.
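A small sketch of what this implies for the parameter shapes, assuming the module stores its learnable scale and shift in its parameter list as usual; the object names below are illustrative.
if (torch_is_installed()) {
  # With elementwise_affine = TRUE there is one scale and one shift per element,
  # so every learnable parameter has shape normalized_shape (here 10 x 10)
  m <- nn_layer_norm(c(10, 10))
  lapply(m$parameters, function(p) p$shape)
  # With elementwise_affine = FALSE the module has no learnable parameters
  m_plain <- nn_layer_norm(c(10, 10), elementwise_affine = FALSE)
  length(m_plain$parameters)  # expected: 0
}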
This layer uses statistics computed from input data in both training and evaluation modes.
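Because no running statistics are kept, switching between training and evaluation mode does not change the output. A minimal sketch of this behavior (variable names are illustrative):
if (torch_is_installed()) {
  m <- nn_layer_norm(10)
  x <- torch_randn(4, 10)
  m$train()
  out_train <- m(x)
  m$eval()
  out_eval <- m(x)
  torch_allclose(out_train, out_eval)  # expected: TRUE, unlike batch norm
}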
if (torch_is_installed()) {
  input <- torch_randn(20, 5, 10, 10)
  # With Learnable Parameters
  m <- nn_layer_norm(input$size()[-1])
  # Without Learnable Parameters
  m <- nn_layer_norm(input$size()[-1], elementwise_affine = FALSE)
  # Normalize over last two dimensions
  m <- nn_layer_norm(c(10, 10))
  # Normalize over last dimension of size 10
  m <- nn_layer_norm(10)
  # Activating the module
  output <- m(input)
}