This function fits multi-age genetic competition models using asreml::asreml(). Currently only models for forest breeding are implemented.

asr_ma(
  prep.out,
  fixed,
  random = ~1,
  spatial = TRUE,
  cor = TRUE,
  lrtest = FALSE,
  ...
)

Arguments

prep.out

A comprepfor object.

fixed

A formula object specifying the fixed terms in the model, with the response on the left of a ~ operator, and the terms, separated by + operators, on the right. If data is given, all names used in all formulae should appear in the data frame. A model with the intercept as the only fixed effect can be specified as ~ 1; there must be at least one fixed effect specified. If the response (y) evaluates to a matrix then a factor trait with levels dimnames(y)[[2]] is added to the model frame, and must be explicitly included in the model formulae (default = NULL).

random

The function is set to fit the genetic competition models in the random term. To consider further random effects, random receives a formula object specifying them. Otherwise, random = ~1 (default). This argument has the same general characteristics as fixed, but there can be no left side to the ~ operator. Variance structures imposed on random terms are specified using special model functions. See asreml::asreml() for details.

spatial

A logical value. If TRUE (default), fits a spatial-genetic competition model

cor

A logical value. If TRUE (default), fits a model considering the correlation between direct and indirect genetic effects.

lrtest

A logical value. If TRUE, performs a likelihood ratio test to verify the significance of the direct and indirect genetic effects. Defaults to FALSE.

...

Arguments passed on to asreml::asreml

sparse

A formula object, specifying the fixed effects for which the full variance-covariance matrix is not required. This argument has the same general characteristics as fixed, but there can be no left side to the ~ expression. Wald statistics are not available for sparse fixed terms in order to reduce the computing load (default = ~NULL).

G.param

Either:

  • A list object derived from the random formula, holding initial parameter estimates and boundary constraints for each term, or

  • A character string naming a comma-delimited file with a header line and three columns for the variance component name, initial value and constraint code, respectively. This file can be created using the start.values = TRUE argument; the internal list object is then generated from the contents of this file.

On termination, G.param is updated with the final random component estimates.

R.param

Either:

  • A list object derived from the random formula, holding initial parameter estimates and boundary constraints, or

  • A character string naming a comma-delimited file with a header line and three columns for the variance component name, initial value and constraint code, respectively. This file can be created using the start.values = TRUE argument; the internal list object is then generated from the contents of this file.

On termination, R.param is updated with the final residual component estimates.

na.action

A call to na.method() specifying the action to be taken when missing values are encountered in the response (y) or explanatory variables (x for factors and/or variates). The function definition for na.method is:

function(y = c("include", "omit", "fail"),

x = c("fail", "include", "omit"))

The default action is "include" (and estimate) missing values in the response, and raise an error ("fail" if there are missing values in any of the explanatory variables.

subset

A logical vector identifying which subset of the rows of data should be used in the fit. All observations are included by default.

weights

A character string or name identifying the column of data to use as weights in the fit.

predict

A list object specifying the classifying factors and related options when forming predictions from the model. This list would normally be the value returned by a call to the method predict for asreml objects.

vcm

A matrix defining relationships among variance parameters. The matrix has a row for each original variance parameter and a column for each new parameter. The default is the identity matrix, that is, no action. See function vcm.lm for further information and an example.

vcc

Equality constraints between variance parameters; a two-column numeric matrix with a dimnames attribute. The first column defines the grouping structure of equated components, that is, components within an equality group are given the same numeric index, and the second column contains the scaling coefficients. The dimnames()[[1]] attribute must match the component names in the asreml parameter vector; see start.values.

The parameters are scaled relative to the first parameter in its group, so the scaling of the first parameter in each group is one.

For example, the following vcc matrix:

11
21
22
31

is equivalent to the vcm matrix:

100
010
020
001
family

A list of functions and expressions for defining the link and variance functions.

Supported families are: gaussian, inverse Gaussian, binomial, negative binomial, poisson and Gamma. Family objects are generated from the asreml family distributions which prefix the usual function names with "asr_"; for example asr_gaussian(), asr_binomial(), etc. In addition to the link argument, these functions take an additional dispersion argument and a total argument where relevant; for example:

asr_binomial(dispersion = 1.0, total = counts).

The default for asr_gaussian() is dispersion = NA, which implies that asreml will estimate the dispersion parameter, otherwise the scale is fixed at the nominated value.

asmv

A character string or name specifying the column in the data that identifies the traits in a multivariate analysis. If not NULL, asmv implies that the data for a multivariate analysis is set up as though it were for a univariate analysis with the response in a single variate (default = NULL).

mbf

A named list specifying sets of covariates to be included with one or more mbf() model functions. Each component of the list must in turn contain components named key and cov, where cov is a character string naming the data frame holding the covariates, and key is a character vector of length two naming the columns in data and cov, respectively, used to match corresponding records in the two data frames. The default is an empty list.

equate.levels

A character vector of factor names whose levels are to be equated. For example, if factor A has levels a, b, c, d and factor B has levels a, b, c, e, the effect of equate.levels(A, B) is that both A and B have five levels, with as.numeric(A) = 1, 2, 3, 4 and as.numeric(B) = 1, 2, 3, 5. This may be necessary if using the and() model function to overlay columns of the model's design matrix in forming a compound term. The default is a zero-length character vector.

start.values

If TRUE, asreml exits prior to the fitting process and returns a list of length three: the G.param and R.param lists, and a data frame (containing variance parameter names, initial values and boundary constraints). Initial values or constraints can then be set in the list or data frame objects (default = FALSE).

If this is a character string, then a file of that name is created and the data frame object containing initial parameter values is written out in comma-separated form. This file can be edited externally and subsequently specified in the G.param or R.param arguments.

knot.points

A named list where each component is a vector of user-supplied knot points for a particular spline term; the component name is the object of the spl() model function.

pwr.points

A named list with each component containing a vector of distances to be used in a one-dimensional power model. The component names must correspond to the object arguments of the power function model terms.

wald

A named list with three components: denDF, ssType and Ftest.

  • denDF: A character string from the options: "none", "numeric", "algebraic", and "default" specifying the calculation of approximate denominator degrees of freedom. The option "none" is to suppress the computations. Algebraic computations are not feasible in large analyses; use "default" to automatically choose numeric or algebraic computations depending on problem size. The denominator degrees of freedom are calculated according to Kenward and Roger, 1997 for terms in the fixed model formula (default = "none").

  • ssType: It can be "incremental" for incremental sum of squares or "conditional" for F-tests that respect both structural and intrinsic marginality (default = "incremental".

  • Ftest: A one-sided formula of the form ~ test_term | background_terms specifying a conditional Wald test of the contribution of test_term conditional on those fixed terms listed in background_terms, and the those in the random and sparse model formulae.

prune

A named list with each component generated from a call to Subset(). The argument prune, in conjunction with Subset and the model function sbs(), forms a new factor from an existing one by selecting a subset of its levels. The function Subset is defined as:

function(f, x)

where f is the name of an existing factor and x is a character or numeric vector of levels to select. The name of the list component is the new factor that may appear in the model formulae as the argument to the sbs() model function. For example,

prune = list(A = Subset(Site, c(2, 3)))

creates a new factor A by selecting the second and third levels of Site, and would be included in the model as: sbs(A), for example by using idv(sbs(A)) as part of a random term. While the actions of prune can be duplicated outside asreml, sbs() is necessary if the asreml method predict() is to be used.

combine

A named list with each component generated from a call to Levels(). The argument combine, in conjunction with Levels and the model function gpf(), forms a new factor from an existing one by merging a subset of its levels. The function Levels is defined as:

function(f, x)

where f is the name of an existing factor and x is a vector of length
length(levels(f)) defining the levels of f to merge. The name of the list component is the new factor that may appear in the model formulae as the argument to the gpf() model function. For example, if Site has levels "1", "2" and "3",

combine = list(A = Levels(Site, c("1", "2", "1")))

creates a new factor A with levels "1" and "2" by merging levels "1" and "3" of Site, and would be included in the model as gpf(A). While the actions of combine can be duplicated outside asreml, gpf() is necessary if the asreml method predict() is to be used.

uid

A named list with each component generated from a call to Units(). The argument uid, in conjunction with Units and the model function uni(), forms a new factor by selecting a subset of records from an existing one. The function Units is defined as:

function(f, n = 0)

where f is the name of an existing factor and n is a character or numeric scalar that determines which records are selected. The default, n = 0, forms a factor with a level for each record where f is non-zero (strictly, f != 0). Otherwise, a factor with a level for each record in data where f has the value n is formed. For example,

uid = list(A = Units(group, 1))

creates a new factor A with levels from row.names(data) where group = 1, and would be included in the model as: uni(A). While the actions of uid can be duplicated outside asreml, uni() is necessary if the asreml method predict() is to be used.

mef

A named list linking a relationship matrix (or its inverse) as specified in the vm() special function with the original matrix of subject x regressor (typically molecular marker) scores. If this is not an empty list mef flags the computation of the regressor (marker) effects from the subject effects. For example,

mef = list(MM = snp.mat)

links the relationship matrix MM to the original marker scores found in the file snp.mat.

The mef list would typically be set from a call to the asreml meff() method.

last

A named list restricting the order equations are solved in the sparse partition for the nominated model terms. Each component of the list is named by a model term and contains a scalar \(n\) specifying that the first \(n\) levels of the term will be solved after all others in the sparse set. It is intended for use when there are multiple fixed terms in the sparse equations so that asreml will be consistent in which effects are identified as singular. A maximum of three factor/level pairs can be specified.

model.frame

If TRUE, the model frame (a data.table object with additional attributes derived from the model specification) is included in the returned object (default = TRUE). The model frame is required by the asreml summary, plot, resid and fitted methods.

In large analyses, the model frame is likely to be a large object. If model.frame is a character string, the model frame is saved in a file as an RDS object by a call to saveRDS(), and named by the supplied string with the extension .RDS. If the model frame is not included in the returned asreml object, this RDS file is searched for by the methods noted previously.

Value

An object of class asreml containing the results of the fitted linear model. Instances of generic methods such as plot(), predict() and summary() return various derived results of the fit. The method resid(), coef() and fitted() extract some of its components. See asreml::asreml.object() for details of the components of the returned list.

Details

A general multi-age genetic competition linear mixed model can be represented by:

$$\mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \mathbf{Z}_g \mathbf{g} + \mathbf{Z}_c \mathbf{c} + \mathbf{Z}_p \mathbf{p} + \boldsymbol{\xi}$$

where \(\mathbf{y}\) is the vector of phenotypic records, \(\boldsymbol{\beta}\) is the vector of fixed effects, \(\mathbf{g}\) is the vector of direct genetic effects (DGE) within ages, \(\mathbf{c}\) is the vector of indirect genetic effects (IGE) within ages, \(\mathbf{p}\) is the vector of other random effects, and \(\boldsymbol{\varepsilon}\) is the vector of errors. \(\mathbf{X}\) is the incidence matrix of the fixed effects, \(\mathbf{Z}_g\) is the DGE incidence matrix, \(\mathbf{Z}_c\) is the IGE incidence matrix , and \(\mathbf{Z}_p\) is the design matrix of other random effects. The dimensions of \(\mathbf{Z}_c\) are the same as \(\mathbf{Z}_g\). \(\mathbf{g}\) and \(\mathbf{c}\) follow a compound symmetry structure, with a explicit variance for the main effects, and another for the genotype-by-environment interaction. The residual variance is assumed to be heterogeneous (one variance per age). If spatial = TRUE, the errors will be adjusted using a first-order autoregressive structure, with autocorrelation coefficients particularized per age. If cor = TRUE, the function will fit a model in which the main effects of \(\mathbf{g}\) and \(\mathbf{c}\) are correlated outcomes of the main genotypic effects decomposition. They both follow a Gaussian distribution, with mean centred in zero, and covariance given by:

$$\mathbf{\Sigma_g} = \begin{bmatrix}\sigma_{\text{g}}^2 & \sigma_{\text{gc}}\\\sigma_{\text{gc}} & \sigma_{\text{c}}^2\\\end{bmatrix}\otimes {{\mathbf I_V}}$$

where \(\sigma_{\text{g}}^2\) is the DGE variance, \(\sigma_{\text{c}}^2\) is the IGE variance, and \(\sigma_{\text{gc}}\) is the covariance between DGE and IGE.

The likelihood ratio test is performed using a model without the correlation between DGE and IGE.

Examples

# \donttest{
 library(gencomp)
 comp_mat = prepfor(data = euca, gen = 'clone', area = 'area',
                   plt = 'tree', age = 'age', row = 'row', col = 'col',
                   dist.col = 3, dist.row = 2, trait = 'MAI', method = 'SK',
                   n.dec = 3, verbose = FALSE, effs = c("block"))
 model = asr_ma(prep.out = comp_mat,
                fixed = MAI~ age, 
                random = ~ block:age, 
                lrtest = TRUE, 
                spatial = TRUE, 
                cor = TRUE, 
                maxit = 20)
#> ASReml Version 4.2 21/01/2025 18:22:44
#>           LogLik        Sigma2     DF     wall
#>  1     -5695.638           1.0   1645   18:22:44  (  4 restrained)
#>  2     -5632.490           1.0   1645   18:22:45  (  5 restrained)
#>  3     -5562.100           1.0   1645   18:22:45  (  3 restrained)
#>  4     -5515.785           1.0   1645   18:22:45  (  2 restrained)
#>  5     -5489.058           1.0   1645   18:22:45  (  1 restrained)
#>  6     -5461.928           1.0   1645   18:22:46  (  1 restrained)
#>  7     -5447.018           1.0   1645   18:22:46
#>  8     -5442.411           1.0   1645   18:22:46
#>  9     -5441.232           1.0   1645   18:22:47
#> 10     -5441.178           1.0   1645   18:22:47
#> 11     -5441.176           1.0   1645   18:22:47
#> ====> Starting likelihood ratio tests
#> ASReml Version 4.2 21/01/2025 18:22:47
#>           LogLik        Sigma2     DF     wall
#>  1     -5567.226           1.0   1645   18:22:48  (  1 restrained)
#>  2     -5502.968           1.0   1645   18:22:48  (  3 restrained)
#>  3     -5479.250           1.0   1645   18:22:48  (  1 restrained)
#>  4     -5460.279           1.0   1645   18:22:48  (  1 restrained)
#>  5     -5453.254           1.0   1645   18:22:49  (  1 restrained)
#>  6     -5450.791           1.0   1645   18:22:49  (  1 restrained)
#>  7     -5449.892           1.0   1645   18:22:49
#>  8     -5449.753           1.0   1645   18:22:49
#>  9     -5449.751           1.0   1645   18:22:50
#> ASReml Version 4.2 21/01/2025 18:22:50
#>           LogLik        Sigma2     DF     wall
#>  1     -5596.539           1.0   1645   18:22:50  (  1 restrained)
#>  2     -5528.593           1.0   1645   18:22:50  (  3 restrained)
#>  3     -5496.563           1.0   1645   18:22:50  (  1 restrained)
#>  4     -5469.550           1.0   1645   18:22:51  (  1 restrained)
#>  5     -5457.877           1.0   1645   18:22:51  (  1 restrained)
#>  6     -5453.135           1.0   1645   18:22:51  (  1 restrained)
#>  7     -5451.147           1.0   1645   18:22:51
#>  8     -5450.770           1.0   1645   18:22:51
#>  9     -5450.762           1.0   1645   18:22:51
#> ASReml Version 4.2 21/01/2025 18:22:51
#>           LogLik        Sigma2     DF     wall
#>  1     -5538.532           1.0   1645   18:22:52
#>  2     -5501.241           1.0   1645   18:22:52  (  2 restrained)
#>  3     -5479.725           1.0   1645   18:22:52  (  1 restrained)
#>  4     -5466.675           1.0   1645   18:22:52
#>  5     -5455.535           1.0   1645   18:22:52
#>  6     -5450.825           1.0   1645   18:22:52
#>  7     -5449.790           1.0   1645   18:22:52
#>  8     -5449.751           1.0   1645   18:22:53
#>  9     -5449.751           1.0   1645   18:22:53
#> ASReml Version 4.2 21/01/2025 18:22:53
#>           LogLik        Sigma2     DF     wall
#>  1     -5683.330           1.0   1645   18:22:53  (  1 restrained)
#>  2     -5604.604           1.0   1645   18:22:53  (  2 restrained)
#>  3     -5565.312           1.0   1645   18:22:53  (  2 restrained)
#>  4     -5541.890           1.0   1645   18:22:54  (  2 restrained)
#>  5     -5527.033           1.0   1645   18:22:54  (  1 restrained)
#>  6     -5512.722           1.0   1645   18:22:54  (  1 restrained)
#>  7     -5506.215           1.0   1645   18:22:54
#>  8     -5504.876           1.0   1645   18:22:54
#>  9     -5504.665           1.0   1645   18:22:55
#> 10     -5504.661           1.0   1645   18:22:55
#> ASReml Version 4.2 21/01/2025 18:22:55
#>           LogLik        Sigma2     DF     wall
#>  1     -5573.612           1.0   1645   18:22:55
#>  2     -5532.774           1.0   1645   18:22:55  (  2 restrained)
#>  3     -5509.337           1.0   1645   18:22:55
#>  4     -5489.122           1.0   1645   18:22:55
#>  5     -5480.358           1.0   1645   18:22:56
#>  6     -5478.228           1.0   1645   18:22:56
#>  7     -5478.111           1.0   1645   18:22:56
#>  8     -5478.108           1.0   1645   18:22:56
#> ====> LRT results:
#>    effect  LR-statistic    Pr(Chisq)
#> 1     DGE  1.098212e+02 0.000000e+00
#> 2     IGE  5.671546e+01 2.520206e-14
#> 3 DGE:age  2.022865e+00 7.747327e-02
#> 4 IGE:age -9.514352e-05 5.000000e-01
 # }