R/run_protocol_agmip.R
run_protocol_agmip.RdAutomate the AgMIP Phase IV Calibration protocol
run_protocol_agmip(
obs_list,
model_function,
model_options,
optim_options = list(),
param_info = NULL,
forced_param_values = NULL,
transform_var = NULL,
transform_obs = NULL,
transform_sim = NULL,
satisfy_par_const = NULL,
var_to_simulate = NULL,
info_crit_func = list(CroptimizR::AICc, CroptimizR::AIC, CroptimizR::BIC),
step,
out_dir = getwd(),
info_level = 0
)List of observed values to use in the protocol, in cropr format.
A named list (names = situation names) of data.frames, each containing:
one column named Date with the observation dates (in Date or POSIXct format),
and one column per observed variable, containing either the measured values or NA
if the variable is not observed at the given date.
Crop Model wrapper function to use.
List of options for the Crop Model wrapper (see help of the Crop Model wrapper function used).
(optional) List of options controlling the minimization method (Nelder–Mead simplex), containing:
ranseed: random seed used to make results reproducible. If NULL, the seed is not
fixed and results may differ between runs. Otherwise, set it to an integer value
(e.g. 1234). Default is NULL.
nb_rep: number of multi-start repetitions of the minimization.
If provided, this value is used for all minimizations in both step6 and step7.
The same setting is therefore applied to the whole protocol.
By default, this is set to c(10, 5) for step6 (respectively for major-parameter estimation
and for each candidate-parameter addition) and to c(20) for step7.
maxeval: maximum number of evaluations of the minimized criterion per minimization
(default: 50000).
xtol_rel: relative tolerance threshold on parameter values. Minimization stops when
parameter values change by less than xtol_rel times the absolute value of the parameter
between two successive iterations.
The default value is 1e-3, which provides a good compromise between computational cost
and numerical accuracy for most applications of the AgMIP protocol.
additional options can be provided; see ?nl.opts for a complete list.
For debugging or testing purposes (i.e. to simply check that the protocol executes correctly
without aiming at meaningful results), the user can use very small values, for example
nb_rep = 1 and maxeval = 2, in order to obtain a fast "dry run" of the whole protocol.
Such settings must of course be removed for any real calibration experiment.
Information about the parameters to estimate. A list containing:
ub and lb: named numeric vectors of upper and lower bounds,
default: named numeric vector of default values (optional).
The names correspond to the parameter names. Default values are used when a parameter is not estimated in the current step (e.g. major or candidate parameter estimated in a subsequent step, candidate parameter that was not selected, etc.), and also as one of the initial values when the parameter is estimated.
(optional) Named vector or list specifying parameter values to force in
the model.
It may also contain arithmetic expressions to define equality constraints between parameters
(see the Details section of estim_param). In this case, the values to force are computed before
each call to the model wrapper and passed through its param_values argument during the estimation procedure.
This argument should not include values for parameters that are estimated (i.e. parameters
defined in param_info).
Named vector of functions to apply both on simulated and
observed variables. transform_var=c(var1=log, var2=sqrt) will for example
apply log-transformation on simulated and observed values of variable var1,
and square-root transformation on values of variable var2.
(optional) User-defined function to transform observations before each criterion
evaluation. See the Details section of estim_param for more information.
(optional) User-defined function to transform simulations before each criterion
evaluation. See the Details section of estim_param for more information.
(optional) User-defined function to enforce inequality constraints on estimated
parameters. See the Details section of estim_param for more information.
(optional) List of variables for which the model wrapper must return results. By default the wrapper is asked to simulate only the observed variables. However, it may be useful to simulate also other variables, typically when transform_sim and/or transform_obs functions are used. Note however that it is active only if the model_function used handles this argument.
Function or list of functions used to compute information criteria (optional; see the default value in the function signature and https://sticsrpacks.github.io/CroptimizR/reference/information_criteria.html for the list of available criteria).
The values of all provided information criteria are stored in the returned object.
If parameter selection is activated (i.e. if candidate_param is provided in at least one
step of step6), the first information criterion in the list is used for parameter selection.
A list defining the sub-steps for step 6 of the AgMIP Calibration protocol (see Details section).
Path to the directory where the optimization results will be written. (optional, default to getwd())
(optional) Integer controlling how much information is stored during each call
to estim_param() inside the AgMIP calibration protocol.
This argument is a direct pass-through to the info_level argument of estim_param(),
and therefore controls the amount of diagnostic information kept in memory during the optimization steps.
Because the AgMIP protocol may involve a large number of successive calibrations,
the default value is set to 0 in order to limit memory usage and disk storage.
However, note that:
Setting info_level = 0 disables the storage of intermediate values of the simplex
and therefore prevents the generation of diagnostic plots such as
ValuesVSIt.pdf and ValuesVSIt_2D.pdf.
A value of at least 1 is required to enable these diagnostic outputs.
Higher values provide increasingly detailed information (simulations, observations, full model outputs) for each evaluation, but may lead to very large memory consumption and should therefore be used with caution inside the AgMIP protocol.
See estim_param for a detailed description of the different levels.
Prints, graphs, and a list containing the results of the AgMIP Phase IV Calibration protocol.
All results are saved in the folder specified by out_dir.
During execution, a console display indicates the description of the step currently being run.
The generated plots include:
diagnostics recommended in Wallach et al. (2025): MSE, bias², rRMSE, and Efficiency
for each variable at each step (files barplot_rRMSE_EF_per_step.pdf and
plot_MSE_Bias2_per_step.pdf),
scatter plots of simulations versus observations before and after each step
(files scatter_plots_*.pdf),
diagnostic plots for each minimization performed (see subfolders AgMIP_protocol_step6,
AgMIP_protocol_step7, and their contents).
The returned object is a list containing:
final_values: a named vector with the values of all parameters that were finally estimated.
This includes all parameters selected during step6 and re-estimated in step7.
forced_param_values: a named vector with the values of all parameters that were not estimated
in the final calibration step 7. This includes in particular:
candidate parameters that were tested during step6 but not selected,
parameters defined through equality constraints or forced by the user (see input argument forced_param_values).
obs_var_list: a character vector with the names of observed variables used in the protocol,
values_per_step: a data.frame containing the default parameter values (from param_info$default, or NA if not provided) and the estimated
values after step6 and step7,
stats_per_step: a data.frame containing statistics (MSE, bias², rRMSE, and Efficiency)
for each variable, before and after each step,
step6: a list with detailed results for step6,
step7: a list with detailed results for step7.
The AgMIP Phase IV Calibration protocol is thoroughly described in Wallach et al. (2024) and Wallach et al. (2025).
This protocol consists of two successive steps, called step6 and step7.
Step6 consists in a sequential parameter estimation by groups of variables. For each group of variables, parameters are estimated by ordinary least squares (OLS) using a multi-start Nelder–Mead optimization (i.e. several minimizations starting from different initial values). Once estimated, parameters are fixed to their estimated values for the subsequent steps.
For each group of variables, the user defines:
a set of major parameters, supposed to mainly reduce bias for these variables,
and a set of candidate parameters, expected to explain variability between environments.
Candidate parameters should be ordered, as far as possible, by decreasing expected importance. For each group of variables, candidate parameters are progressively added to the list of parameters to estimate, and are retained only if they improve an information criterion (corrected Akaike Information Criterion by default). If a candidate parameter is not selected, it is fixed to its default value.
By default, the estimation of the major parameters for a given step is performed using 10 multi-start repetitions. When candidate parameters are considered, 5 additional multi-start repetitions are performed each time a new candidate parameter is added to the set of parameters to estimate.
Step7 consists in re-estimating all parameters selected during step6 using all available observations, by weighted least squares (WLS). The weights are set to the estimated standard deviation of the model error for each variable, as obtained at the end of step6.
The WLS minimization is performed using a multi-start Nelder–Mead simplex algorithm with 20 repetitions by default. The first two repetitions are initialized respectively from: (i) the parameter values estimated at the end of step6, and (ii) the default parameter values. The remaining repetitions are initialized from parameter values randomly drawn within their respective bounds.
Protocol definitions (step, param_info, forced_param_values) can either be:
Created directly in R (as detailed in sections and examples below), or
Loaded from an Excel file using load_protocol_agmip.
Helper functions get_agmip_protocol_template and get_agmip_protocol_example
are provided to obtain a ready-to-use template or a fully worked example of an AgMIP
calibration protocol, respectively. See the vignette agmip_calibration_protocol for
a complete, step-by-step workflow.
step)The argument step is a list of lists describing the successive sub-steps to apply in step6
of the AgMIP protocol. Each sub-step corresponds to a group of variables (e.g. phenology,
biomass, etc.).
Each element of step is a named list that must contain:
obs_var: a character vector giving the names of the observed variables to use at this step,
optionally, major_param: a character vector giving the names of the major parameters to estimate at this step,
optionally, candidate_param: a character vector giving the names of the candidate parameters.
At least one of major_param or candidate_param must be provided.
If candidate_param is not provided, only the major parameters are estimated for this step.
If major_param is not provided, the step only performs candidate-parameter selection.
The name of each list element is optional, but it is recommended to use the name of the corresponding group of variables. This name is used in printed outputs and in the results.
Technical information about parameters (bounds, default values, ...), the observation list
in cropr format, optimization options, forced parameter values, transformation functions,
functions defining equality constraints, the information criterion function, etc., can be
provided once for all steps via the corresponding arguments of run_protocol_agmip
(param_info, obs_list, ...).
If the user wants this information to be specific to a given step, it can also be provided
inside the corresponding step description, using the same argument names. Please note however,
that obs_list and transform_obs cannot be provided inside a sub-step.
They must always be passed directly as arguments to run_protocol_agmip.
For example:
param_info <- list(
p1 = list(lb = 0, ub = 1, default = 0.1),
p2 = list(lb = 0, ub = 1, default = 0.5),
p3 = list(lb = 5, ub = 15, default = 15)
)
steps <- list(
group1 = list(
obs_var = c("var1", "var2"),
major_param = c("p1"),
candidate_param = c("p2")
),
group2 = list(
obs_var = c("var3"),
major_param = c("p3")
)
)
res <- run_protocol_agmip(
obs_list = obs_list,
model_function = my_model_wrapper,
model_options = model_options,
param_info = param_info,
step = steps
)In this example, step6 of the AgMIP protocol will be run in two successive steps called
"group1" and "group2".
In the first step, variables "var1" and "var2" are used to estimate parameter "p1"
(major parameter), and candidate parameter "p2" is considered in the automatic parameter
selection procedure.
The observations for variables "var1" and "var2", as well as the information about
parameters "p1" and "p2", are automatically extracted from obs_list and param_info,
which contain the information for all steps.
Note that in step7, all observations included in obs_list are used, regardless of the
obs_var variables defined for step6.
Thus, observations for variables not used in step6 (e.g. because no parameter is directly
associated with these variables) can still be used in step7, where all parameters selected
during step6 are re-estimated using all observed variables (WLS step).
load_protocol_agmip to extract step, param_info and forced_param_values from a structured Excel file,
get_agmip_protocol_template and get_agmip_protocol_example to obtain
template and example protocol files,
The vignette agmip_calibration_protocol for a complete example workflow,
Wallach et al. (2024, 2025) for detailed AgMIP protocol description and examples,
estim_param for basic parameter estimation using CroptimizR.