SFSI: fitBLUP – R documentation

Pricing

Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!

Get Started for Free

Documentation

SFSI

fitBLUP

Function fitBLUP

Description

Solves the Linear Mixed Model and calculates the Best Linear Unbiased Predictor (BLUP)

Usage

fitBLUP(y, X = NULL, Z = NULL, K = NULL, U = NULL, d = NULL, 
        indexK = NULL, h2 = NULL, BLUP = TRUE, 
        method = c("REML","ML"), return.Hinv = FALSE, tol = 1E-5, 
        maxIter = 1000, interval = c(1E-9,1E9), warn = TRUE)

Arguments

`y`	(numeric vector) Response variable
`X`	(numeric matrix) Design matrix for fixed effects. When `X=NULL` a vector of ones is constructed only for the intercept (default)
`Z`	(numeric matrix) Design matrix for random effects. When `Z=NULL` an identity matrix is considered (default) thus G = K; otherwise G = Z K Z' is used
`K`	(numeric matrix) Kinship relationships. This can be a name of a binary file where the matrix is stored
`U`	(numeric matrix) Eigenvectors from spectral value decomposition of G = U D U'
`d`	(numeric vector) Eigenvalues from spectral value decomposition of G = U D U'
`indexK`	(integer vector) Which columns and rows will be read when `K` is the name of a binary file. Default `indexK=NULL` will read the whole matrix
`h2`	(numeric) Heritability of the response variable. When it is not `NULL`, the variance components are not estimated and only fixed and random effects are computed
`BLUP`	`TRUE` or `FALSE` to whether return the random effects estimates
`method`	(character) Either 'REML' (Restricted Maximum Likelihood) or 'ML' (Maximum Likelihood)
`return.Hinv`	`TRUE` or `FALSE` to whether return the inverse of the matrix H
`tol`	(numeric) Maximum error between two consecutive solutions (convergence tolerance) when finding the root of the log-likelihood's first derivative
`maxIter`	(integer) Maximum number of iterations to run before convergence is reached
`interval`	(numeric vector) Range of values in which the root is searched
`warn`	`TRUE` or `FALSE` to whether show warnings

Details

The basic linear mixed model that relates phenotypes with genetic values is of the form

y = X b + Z u + e

where y is a vector with the response, b is the vector of fixed effects, u is the vector of the (random) genetic values of the genotypes, e is the vector of environmental residuals (random error), and X and Z are design matrices conecting the fixed and genetic effects with replicates. Genetic values are assumed to follow a Normal distribution as u ~ N(0,σ²_uK), and the error terms are assumed e ~ N(0,σ²_eI).

The resulting vector of genetic values g = Z u will therefore follow g ~ N(0,σ²_uG) where G = Z K Z'. In the un-replicated case, Z = I is an identity matrix, and hence g = u and G = K.

Only information from observed data (training set) is used to derive predictions of the values u_trn = (u_i), i = 1,2,...,n_trn, as

u_trn = H (y_trn - X_trnb)

where H is a matrix of weights given by

H = G_trn (G_trn + λ₀I)^-1

where G_trn is the sub-matrix corresponding to the training set, and λ₀ = σ²_e/σ²_u is the variance components ratio representing a shrinkage parameter. This parameter is expressed in terms of the heritability, h² = σ²_u/(σ²_u + σ²_e), as λ₀ = (1 - h²)/h².

The prediction of values u_tst corresponding to un-observed data (testing set) can be done by using

H = G_tst,trn (G_trn + λ₀I)^-1

where G_tst,trn is the sub-matrix of G corresponding to the testing set (in rows) and training set (in columns).

Solutions are found using the GEMMA (Genome-wide Efficient Mixed Model Analysis) approach (Zhou & Stephens, 2012). GEMMA implements the Brent's method to efficiently optimize the first derivative of the log-likelihood to solve for the variance components ratio.

Value

Returns a list object that contains the elements:

b: (vector) fixed effects solutions (including the intercept).
u: (vector) random effects solutions.
varU: random effect variance.
varE: error variance.
h2: heritability.
convergence: (logical) whether Brent's method converged.
method: either 'REML' or 'ML' method used.

Author(s)

Paulino Perez-Rodriguez, Marco Lopez-Cruz (maraloc@gmail.com) and Gustavo de los Campos

References

VanRaden PM (2008). Efficient methods to compute genomic predictions. Journal of Dairy Science, 91(11), 4414–4423.

Zhou X, Stephens M (2012). Genome-wide efficient mixed-model analysis for association studies. Nature Genetics, 44(7), 821-824

Examples

require(SFSI)
  data(wheatHTP)
  
  X = scale(X[1:300,])/sqrt(ncol(X))   # Subset and scale markers
  G = tcrossprod(X)                    # Genomic relationship matrix
  y = as.vector(scale(Y[1:300,"YLD"])) # Subset response variable

  # Training and testing sets
  tst = 1:ceiling(0.3*length(y))
  trn = seq_along(y)[-tst]

  yNA <- y
  yNA[tst] <- NA
  fm1 = fitBLUP(yNA, K=G)
  plot(y[tst],fm1$u[tst])      # Predicted vs observed values in testing set
  cor(y[tst],fm1$u[tst])       # Prediction accuracy in testing set
  cor(y[trn],fm1$u[trn])       # Prediction accuracy in training set
  fm1$h2                       # Heritability
  
  if(requireNamespace("float")){
   # Using a 'float' type variable
   G2 = float::fl(G)
   fm2 = fitBLUP(yNA, K=G2)
   max(abs(fm1$u-fm2$u))  # Check for discrepances
  }

SFSI

Sparse Family and Selection Index

v0.3.0

GPL-3

Authors

Marco Lopez-Cruz [aut, cre], Gustavo de los Campos [aut], Paulino Perez-Rodriguez [ctb]

Initial release

2021-04-29

fitBLUP

Description

Usage

Arguments

Details

Value

Author(s)

References

Examples

SFSI

We don't support your browser anymore