Automate the AgMIP Phase IV Calibration protocol

run_protocol_agmip(
  obs_list,
  model_function,
  model_options,
  optim_options = list(),
  param_info = NULL,
  forced_param_values = NULL,
  transform_var = NULL,
  transform_obs = NULL,
  transform_sim = NULL,
  satisfy_par_const = NULL,
  var_to_simulate = NULL,
  info_crit_func = list(CroptimizR::AICc, CroptimizR::AIC, CroptimizR::BIC),
  step,
  out_dir = getwd(),
  info_level = 0
)

Arguments

obs_list

List of observed values to use in the protocol, in cropr format. A named list (names = situation names) of data.frames, each containing:

one column named Date with the observation dates (in Date or POSIXct format),
and one column per observed variable, containing either the measured values or NA if the variable is not observed at the given date.

model_function

Crop Model wrapper function to use.

model_options

List of options for the Crop Model wrapper (see help of the Crop Model wrapper function used).

optim_options

(optional) List of options controlling the minimization method (Nelder–Mead simplex), containing:

ranseed: random seed used to make results reproducible. If NULL, the seed is not fixed and results may differ between runs. Otherwise, set it to an integer value (e.g. 1234). Default is NULL.
nb_rep: number of multi-start repetitions of the minimization. If provided, this value is used for all minimizations in both step6 and step7. The same setting is therefore applied to the whole protocol. By default, this is set to c(10, 5) for step6 (respectively for major-parameter estimation and for each candidate-parameter addition) and to c(20) for step7.
maxeval: maximum number of evaluations of the minimized criterion per minimization (default: 50000).
xtol_rel: relative tolerance threshold on parameter values. Minimization stops when parameter values change by less than xtol_rel times the absolute value of the parameter between two successive iterations. The default value is 1e-3, which provides a good compromise between computational cost and numerical accuracy for most applications of the AgMIP protocol.
additional options can be provided; see ?nl.opts for a complete list.

For debugging or testing purposes (i.e. to simply check that the protocol executes correctly without aiming at meaningful results), the user can use very small values, for example nb_rep = 1 and maxeval = 2, in order to obtain a fast "dry run" of the whole protocol. Such settings must of course be removed for any real calibration experiment.

param_info

Information about the parameters to estimate. A list containing:

ub and lb: named numeric vectors of upper and lower bounds,
default: named numeric vector of default values (optional).

The names correspond to the parameter names. Default values are used when a parameter is not estimated in the current step (e.g. major or candidate parameter estimated in a subsequent step, candidate parameter that was not selected, etc.), and also as one of the initial values when the parameter is estimated.

forced_param_values

(optional) Named vector or list specifying parameter values to force in the model. It may also contain arithmetic expressions to define equality constraints between parameters (see the Details section of estim_param). In this case, the values to force are computed before each call to the model wrapper and passed through its param_values argument during the estimation procedure. This argument should not include values for parameters that are estimated (i.e. parameters defined in param_info).

transform_var

Named vector of functions to apply both on simulated and observed variables. transform_var=c(var1=log, var2=sqrt) will for example apply log-transformation on simulated and observed values of variable var1, and square-root transformation on values of variable var2.

transform_obs

(optional) User-defined function to transform observations before each criterion evaluation. See the Details section of estim_param for more information.

transform_sim

(optional) User-defined function to transform simulations before each criterion evaluation. See the Details section of estim_param for more information.

satisfy_par_const

(optional) User-defined function to enforce inequality constraints on estimated parameters. See the Details section of estim_param for more information.

var_to_simulate

(optional) List of variables for which the model wrapper must return results. By default the wrapper is asked to simulate only the observed variables. However, it may be useful to simulate also other variables, typically when transform_sim and/or transform_obs functions are used. Note however that it is active only if the model_function used handles this argument.

info_crit_func

Function or list of functions used to compute information criteria (optional; see the default value in the function signature and https://sticsrpacks.github.io/CroptimizR/reference/information_criteria.html for the list of available criteria).

The values of all provided information criteria are stored in the returned object. If parameter selection is activated (i.e. if candidate_param is provided in at least one step of step6), the first information criterion in the list is used for parameter selection.

step

A list defining the sub-steps for step 6 of the AgMIP Calibration protocol (see Details section).

out_dir

Path to the directory where the optimization results will be written. (optional, default to getwd())

info_level

(optional) Integer controlling how much information is stored during each call to estim_param() inside the AgMIP calibration protocol.

This argument is a direct pass-through to the info_level argument of estim_param(), and therefore controls the amount of diagnostic information kept in memory during the optimization steps.

Because the AgMIP protocol may involve a large number of successive calibrations, the default value is set to 0 in order to limit memory usage and disk storage.

However, note that:

Setting info_level = 0 disables the storage of intermediate values of the simplex and therefore prevents the generation of diagnostic plots such as ValuesVSIt.pdf and ValuesVSIt_2D.pdf.
A value of at least 1 is required to enable these diagnostic outputs.

Higher values provide increasingly detailed information (simulations, observations, full model outputs) for each evaluation, but may lead to very large memory consumption and should therefore be used with caution inside the AgMIP protocol.

See estim_param for a detailed description of the different levels.

Value

Prints, graphs, and a list containing the results of the AgMIP Phase IV Calibration protocol. All results are saved in the folder specified by out_dir.

During execution, a console display indicates the description of the step currently being run.

The generated plots include:

diagnostics recommended in Wallach et al. (2025): MSE, bias², rRMSE, and Efficiency for each variable at each step (files barplot_rRMSE_EF_per_step.pdf and plot_MSE_Bias2_per_step.pdf),
scatter plots of simulations versus observations before and after each step (files scatter_plots_*.pdf),
diagnostic plots for each minimization performed (see subfolders AgMIP_protocol_step6, AgMIP_protocol_step7, and their contents).

The returned object is a list containing:

final_values: a named vector with the values of all parameters that were finally estimated. This includes all parameters selected during step6 and re-estimated in step7.
forced_param_values: a named vector with the values of all parameters that were not estimated in the final calibration step 7. This includes in particular:
- candidate parameters that were tested during step6 but not selected,
- parameters defined through equality constraints or forced by the user (see input argument forced_param_values).
obs_var_list: a character vector with the names of observed variables used in the protocol,
values_per_step: a data.frame containing the default parameter values (from param_info$default, or NA if not provided) and the estimated values after step6 and step7,
stats_per_step: a data.frame containing statistics (MSE, bias², rRMSE, and Efficiency) for each variable, before and after each step,
step6: a list with detailed results for step6,
step7: a list with detailed results for step7.

Details

The AgMIP Phase IV Calibration protocol

The AgMIP Phase IV Calibration protocol is thoroughly described in Wallach et al. (2024) and Wallach et al. (2025).

This protocol consists of two successive steps, called step6 and step7.

Step6 consists in a sequential parameter estimation by groups of variables. For each group of variables, parameters are estimated by ordinary least squares (OLS) using a multi-start Nelder–Mead optimization (i.e. several minimizations starting from different initial values). Once estimated, parameters are fixed to their estimated values for the subsequent steps.

For each group of variables, the user defines:

a set of major parameters, supposed to mainly reduce bias for these variables,
and a set of candidate parameters, expected to explain variability between environments.

Candidate parameters should be ordered, as far as possible, by decreasing expected importance. For each group of variables, candidate parameters are progressively added to the list of parameters to estimate, and are retained only if they improve an information criterion (corrected Akaike Information Criterion by default). If a candidate parameter is not selected, it is fixed to its default value.

By default, the estimation of the major parameters for a given step is performed using 10 multi-start repetitions. When candidate parameters are considered, 5 additional multi-start repetitions are performed each time a new candidate parameter is added to the set of parameters to estimate.

Step7 consists in re-estimating all parameters selected during step6 using all available observations, by weighted least squares (WLS). The weights are set to the estimated standard deviation of the model error for each variable, as obtained at the end of step6.

The WLS minimization is performed using a multi-start Nelder–Mead simplex algorithm with 20 repetitions by default. The first two repetitions are initialized respectively from: (i) the parameter values estimated at the end of step6, and (ii) the default parameter values. The remaining repetitions are initialized from parameter values randomly drawn within their respective bounds.

Loading protocol definitions from Excel

Protocol definitions (step, param_info, forced_param_values) can either be:

Created directly in R (as detailed in sections and examples below), or
Loaded from an Excel file using load_protocol_agmip.

Helper functions get_agmip_protocol_template and get_agmip_protocol_example are provided to obtain a ready-to-use template or a fully worked example of an AgMIP calibration protocol, respectively. See the vignette agmip_calibration_protocol for a complete, step-by-step workflow.

Describing step6 (argument `step`)

The argument step is a list of lists describing the successive sub-steps to apply in step6 of the AgMIP protocol. Each sub-step corresponds to a group of variables (e.g. phenology, biomass, etc.).

Each element of step is a named list that must contain:

obs_var: a character vector giving the names of the observed variables to use at this step,
optionally, major_param: a character vector giving the names of the major parameters to estimate at this step,
optionally, candidate_param: a character vector giving the names of the candidate parameters.

At least one of major_param or candidate_param must be provided. If candidate_param is not provided, only the major parameters are estimated for this step. If major_param is not provided, the step only performs candidate-parameter selection.

The name of each list element is optional, but it is recommended to use the name of the corresponding group of variables. This name is used in printed outputs and in the results.

Technical information about parameters (bounds, default values, ...), the observation list in cropr format, optimization options, forced parameter values, transformation functions, functions defining equality constraints, the information criterion function, etc., can be provided once for all steps via the corresponding arguments of run_protocol_agmip (param_info, obs_list, ...).

If the user wants this information to be specific to a given step, it can also be provided inside the corresponding step description, using the same argument names. Please note however, that obs_list and transform_obs cannot be provided inside a sub-step. They must always be passed directly as arguments to run_protocol_agmip.

For example:

param_info <- list(
  p1 = list(lb = 0, ub = 1, default = 0.1),
  p2 = list(lb = 0, ub = 1, default = 0.5),
  p3 = list(lb = 5, ub = 15, default = 15)
)

steps <- list(
  group1 = list(
    obs_var = c("var1", "var2"),
    major_param = c("p1"),
    candidate_param = c("p2")
  ),
  group2 = list(
    obs_var = c("var3"),
    major_param = c("p3")
  )
)

res <- run_protocol_agmip(
  obs_list = obs_list,
  model_function = my_model_wrapper,
  model_options = model_options,
  param_info = param_info,
  step = steps
)

In this example, step6 of the AgMIP protocol will be run in two successive steps called "group1" and "group2". In the first step, variables "var1" and "var2" are used to estimate parameter "p1" (major parameter), and candidate parameter "p2" is considered in the automatic parameter selection procedure. The observations for variables "var1" and "var2", as well as the information about parameters "p1" and "p2", are automatically extracted from obs_list and param_info, which contain the information for all steps.

Observations used in step7

Note that in step7, all observations included in obs_list are used, regardless of the obs_var variables defined for step6. Thus, observations for variables not used in step6 (e.g. because no parameter is directly associated with these variables) can still be used in step7, where all parameters selected during step6 are re-estimated using all observed variables (WLS step).