Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!
Get Started for Free

fitBLUP

Function fitBLUP


Description

Solves the Linear Mixed Model and calculates the Best Linear Unbiased Predictor (BLUP)

Usage

fitBLUP(y, X = NULL, Z = NULL, K = NULL, U = NULL, d = NULL, 
        indexK = NULL, h2 = NULL, BLUP = TRUE, 
        method = c("REML","ML"), return.Hinv = FALSE, tol = 1E-5, 
        maxIter = 1000, interval = c(1E-9,1E9), warn = TRUE)

Arguments

y

(numeric vector) Response variable

X

(numeric matrix) Design matrix for fixed effects. When X=NULL a vector of ones is constructed only for the intercept (default)

Z

(numeric matrix) Design matrix for random effects. When Z=NULL an identity matrix is considered (default) thus G = K; otherwise G = Z K Z' is used

K

(numeric matrix) Kinship relationships. This can be a name of a binary file where the matrix is stored

U

(numeric matrix) Eigenvectors from spectral value decomposition of G = U D U'

d

(numeric vector) Eigenvalues from spectral value decomposition of G = U D U'

indexK

(integer vector) Which columns and rows will be read when K is the name of a binary file. Default indexK=NULL will read the whole matrix

h2

(numeric) Heritability of the response variable. When it is not NULL, the variance components are not estimated and only fixed and random effects are computed

BLUP

TRUE or FALSE to whether return the random effects estimates

method

(character) Either 'REML' (Restricted Maximum Likelihood) or 'ML' (Maximum Likelihood)

return.Hinv

TRUE or FALSE to whether return the inverse of the matrix H

tol

(numeric) Maximum error between two consecutive solutions (convergence tolerance) when finding the root of the log-likelihood's first derivative

maxIter

(integer) Maximum number of iterations to run before convergence is reached

interval

(numeric vector) Range of values in which the root is searched

warn

TRUE or FALSE to whether show warnings

Details

The basic linear mixed model that relates phenotypes with genetic values is of the form

y = X b + Z u + e

where y is a vector with the response, b is the vector of fixed effects, u is the vector of the (random) genetic values of the genotypes, e is the vector of environmental residuals (random error), and X and Z are design matrices conecting the fixed and genetic effects with replicates. Genetic values are assumed to follow a Normal distribution as u ~ N(02uK), and the error terms are assumed e ~ N(02eI).

The resulting vector of genetic values g = Z u will therefore follow g ~ N(02uG) where G = Z K Z'. In the un-replicated case, Z = I is an identity matrix, and hence g = u and G = K.

Only information from observed data (training set) is used to derive predictions of the values utrn = (ui), i = 1,2,...,ntrn, as

utrn = H (ytrn - Xtrnb)

where H is a matrix of weights given by

H = Gtrn (Gtrn + λ0I)-1

where Gtrn is the sub-matrix corresponding to the training set, and λ0 = σ2e2u is the variance components ratio representing a shrinkage parameter. This parameter is expressed in terms of the heritability, h2 = σ2u/(σ2u + σ2e), as λ0 = (1 - h2)/h2.

The prediction of values utst corresponding to un-observed data (testing set) can be done by using

H = Gtst,trn (Gtrn + λ0I)-1

where Gtst,trn is the sub-matrix of G corresponding to the testing set (in rows) and training set (in columns).

Solutions are found using the GEMMA (Genome-wide Efficient Mixed Model Analysis) approach (Zhou & Stephens, 2012). GEMMA implements the Brent's method to efficiently optimize the first derivative of the log-likelihood to solve for the variance components ratio.

Value

Returns a list object that contains the elements:

  • b: (vector) fixed effects solutions (including the intercept).

  • u: (vector) random effects solutions.

  • varU: random effect variance.

  • varE: error variance.

  • h2: heritability.

  • convergence: (logical) whether Brent's method converged.

  • method: either 'REML' or 'ML' method used.

Author(s)

Paulino Perez-Rodriguez, Marco Lopez-Cruz (maraloc@gmail.com) and Gustavo de los Campos

References

VanRaden PM (2008). Efficient methods to compute genomic predictions. Journal of Dairy Science, 91(11), 4414–4423.

Zhou X, Stephens M (2012). Genome-wide efficient mixed-model analysis for association studies. Nature Genetics, 44(7), 821-824

Examples

require(SFSI)
  data(wheatHTP)
  
  X = scale(X[1:300,])/sqrt(ncol(X))   # Subset and scale markers
  G = tcrossprod(X)                    # Genomic relationship matrix
  y = as.vector(scale(Y[1:300,"YLD"])) # Subset response variable

  # Training and testing sets
  tst = 1:ceiling(0.3*length(y))
  trn = seq_along(y)[-tst]

  yNA <- y
  yNA[tst] <- NA
  fm1 = fitBLUP(yNA, K=G)
  plot(y[tst],fm1$u[tst])      # Predicted vs observed values in testing set
  cor(y[tst],fm1$u[tst])       # Prediction accuracy in testing set
  cor(y[trn],fm1$u[trn])       # Prediction accuracy in training set
  fm1$h2                       # Heritability
  
  if(requireNamespace("float")){
   # Using a 'float' type variable
   G2 = float::fl(G)
   fm2 = fitBLUP(yNA, K=G2)
   max(abs(fm1$u-fm2$u))  # Check for discrepances
  }

SFSI

Sparse Family and Selection Index

v0.3.0
GPL-3
Authors
Marco Lopez-Cruz [aut, cre], Gustavo de los Campos [aut], Paulino Perez-Rodriguez [ctb]
Initial release
2021-04-29

We don't support your browser anymore

Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.