Main function for parameter estimation

estim_param(
  obs_list,
  model_function,
  model_options = NULL,
  crit_function = crit_log_cwss,
  optim_method = "nloptr.simplex",
  optim_options = NULL,
  param_info,
  forced_param_values = NULL,
  candidate_param = NULL,
  transform_var = NULL,
  transform_obs = NULL,
  transform_sim = NULL,
  satisfy_par_const = NULL,
  var_to_simulate = NULL,
  info_crit_func = list(CroptimizR::AICc, CroptimizR::AIC, CroptimizR::BIC),
  weight = NULL,
  step = NULL,
  out_dir = getwd(),
  info_level = 1,
  var = lifecycle::deprecated()
)

Arguments

obs_list

List of observed values to use for parameter estimation. A named list (names = situation names) of data.frames, each containing one column named Date with the dates (Date or POSIXct format) of the observations and one column per observed variable with either the measured value or NA if the variable is not observed at the given date. See the details section for more information on the observations actually used during the parameter estimation procedure.
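As an illustration, a minimal `obs_list` with the structure described above could be built as follows. The situation names (`sit1`, `sit2`), variable names (`lai`, `biomass`) and values are hypothetical.

```r
# Hypothetical situations and variables, for illustration only.
obs_list <- list(
  sit1 = data.frame(
    Date = as.Date(c("2020-05-01", "2020-06-01")),
    lai = c(1.2, 3.4),
    biomass = c(NA, 5.1) # biomass was not observed on the first date
  ),
  sit2 = data.frame(
    Date = as.Date("2021-05-10"),
    lai = 0.9
  )
)
str(obs_list)
```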

model_function

Crop Model wrapper function to use.

model_options

List of options for the Crop Model wrapper (see help of the Crop Model wrapper function used).

crit_function

Function implementing the criterion to optimize (optional, see default value in the function signature). See here for more details about the list of proposed criteria.

optim_method

Name of the parameter estimation method to use (optional, see default value in the function signature). For the moment, can be "simplex" or "dreamzs". See here for a brief description and references on the available methods.

optim_options

List of options of the parameter estimation method, containing:

  • ranseed Random seed. Set it to a number of your choice (e.g. 1234) so that each execution of estim_param gives the same results when using the same seed. If you want randomization, set it to NULL (optional, defaults to NULL, which means a random seed).

  • specific options depending on the method used. Click on the links to see examples with the simplex and DreamZS methods.

  • out_dir [Deprecated] Definition of out_dir in optim_options is no longer supported, use the new argument out_dir of estim_param instead.

param_info

Information on the parameters to estimate. Either a list containing:

  • ub and lb, named vectors of upper and lower bounds (-Inf and Inf can be used if init_values is provided),

  • default, a named vector of default values (optional; each parameter is set to its default value when it is part of the candidate_param list and is not estimated; these values are also used as the first initial values when the parameters are estimated),

  • init_values, a data.frame containing initial values to test for the parameters (optional; if not provided, or if fewer values are provided than the number of repetitions of the minimization, all or part of the initial values are randomly generated using LHS sampling within the parameter bounds).

or a named list containing for each parameter:

  • sit_list, the list of groups of situations for which the estimated parameter must take different values (see here for an example),

  • ub and lb, vectors of upper and lower bounds (one value per group),

  • init_values, the initial values per group (data.frame, one column per group, optional),

  • default, vector of default values per group (optional; the parameter is set to its default value when it is part of the candidate_param list and is not estimated; the default value is also used as the first initial value when the parameter is estimated).
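The two accepted forms of `param_info` can be sketched as below. Parameter names, situation names and numeric values are hypothetical, and the optional elements are only partly filled in.

```r
# Form 1: each parameter takes a single value for all situations.
param_info_simple <- list(
  lb = c(p1 = 0, p2 = 10),
  ub = c(p1 = 1, p2 = 50),
  default = c(p1 = 0.5, p2 = 20),
  init_values = data.frame(p1 = c(0.2, 0.8), p2 = c(15, 40))
)

# Form 2: one entry per parameter. Here p1 takes different values for two
# groups of situations, while p2 takes the same value for all situations.
param_info_groups <- list(
  p1 = list(
    sit_list = list(c("sit1", "sit2"), c("sit3")),
    lb = c(0, 0),
    ub = c(1, 1),
    default = c(0.5, 0.5)
  ),
  p2 = list(
    sit_list = list(c("sit1", "sit2", "sit3")),
    lb = 10,
    ub = 50
  )
)
```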

forced_param_values

Named vector or list, must contain the values (or arithmetic expression, see details section) for the model parameters to force. The corresponding values will be transferred to the model wrapper through its param_values argument during the estimation procedure. Should not include values for estimated parameters (i.e. parameters defined in param_info argument), except if they are listed as candidate parameters (see argument candidate_param).

candidate_param

Names of the parameters, among those defined in the argument param_info, that must only be considered as candidates for parameter estimation (see details section). All parameters included in param_info that are not listed in candidate_param will be estimated.

transform_var

Named vector of functions to apply both on simulated and observed variables. transform_var=c(var1=log, var2=sqrt) will for example apply log-transformation on simulated and observed values of variable var1, and square-root transformation on values of variable var2.

transform_obs

User function for transforming observations before each criterion evaluation (optional), see details section for more information.

transform_sim

User function for transforming simulations before each criterion evaluation (optional), see details section for more information.

satisfy_par_const

User function for including constraints on estimated parameters (optional), see details section for more information.

var_to_simulate

(optional) List of variables for which the model wrapper must return results. By default the wrapper is asked to simulate only the observed variables. However, it may be useful to simulate also other variables, typically when transform_sim and/or transform_obs functions are used. Note however that it is active only if the model_function used handles this argument.

info_crit_func

Function (or list of functions) to compute information criteria (optional, see default value in the function signature and here for more details about the list of proposed information criteria). Values of the information criteria will be stored in the returned list. If parameter selection is activated (i.e. if the argument candidate_param is defined, see details section), the first information criterion given will be used. ONLY AVAILABLE FOR THE MOMENT FOR crit_function==crit_ols.

weight

Weights to use in the criterion to optimize. A function that takes as input a vector of observed values and the name of the corresponding variable, and that must return either a single weight for the given variable or a vector of weights of the same length as the vector of observed values.
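A weight function matching this description might look like the sketch below. The function name, the variable name `biomass` and the weighting rule are hypothetical choices made for illustration.

```r
# Returns one weight per observation for "biomass" and a single weight
# for any other variable.
weight_by_var <- function(obs, var_name) {
  if (var_name == "biomass") {
    # Hypothetical rule: weight equal to 10% of each observed value.
    return(0.1 * obs)
  }
  1
}

weight_by_var(c(10, 20), "biomass")
weight_by_var(c(1.2, 3.4), "lai")
```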

step

(optional) List that describes the steps of the parameter estimation procedure (see details section). If NULL, a single default step will be created using the estim_param arguments.

out_dir

Path to the directory where the optimization results will be written (optional, defaults to getwd()).

info_level

(optional) Integer that controls the level of information returned and stored by estim_param (in addition to the results automatically provided, which depend on the method used). Higher values give more details.

  • 0 to add nothing,

  • 1 to add criterion and parameter values, and constraint if satisfy_par_const is provided, for each evaluation (element params_and_crit in the returned list),

  • 2 to add model results, after transformation if transform_sim is provided, and after intersection with observations, i.e. as used to compute the criterion for each evaluation (element sim_intersect in the returned list),

  • 3 to add observations, after transformation if transform_obs is provided, and after intersection with simulations, i.e. as used to compute the criterion for each evaluation (element obs_intersect in the returned list),

  • 4 to add all model wrapper results for each evaluation, and all transformations if transform_sim is provided. (elements sim and sim_transformed in the returned list).

var

[Deprecated] var is no longer supported, use var_to_simulate instead.

Value

Prints, graphs and a list containing the results of the parameter estimation, whose content depends on the method used and on the value of the info_level argument. All results are saved in the folder out_dir.

Details

Observations used

In CroptimizR, parameter estimation is based on the comparison between the values of the observed and simulated variables at corresponding dates. Only the situations, variables and dates common to both the observations (provided in the obs_list argument) and the simulations returned by the wrapper used are taken into account in the parameter estimation procedure. If the value of an observed variable is NA for a given situation and date, it is not taken into account. If the value of a simulated variable is NA (or Inf) for a given situation and date for which there is an observation, the optimized criterion takes the NA value, which may stop the procedure, and the user is warned.
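The matching rule can be pictured with base R for a single situation. This is only a sketch of the idea under made-up data, not the code CroptimizR uses internally.

```r
# Made-up observations and simulations for one situation.
obs <- data.frame(
  Date = as.Date(c("2020-05-01", "2020-06-01", "2020-07-01")),
  lai = c(1.2, NA, 4.0)
)
sim <- data.frame(
  Date = as.Date(c("2020-05-01", "2020-06-01", "2020-08-01")),
  lai = c(1.0, 3.1, 2.5),
  biomass = c(0.5, 4.8, 9.0)
)

# Variables present in both tables (biomass is simulated but not observed).
common_vars <- setdiff(intersect(names(obs), names(sim)), "Date")

# Keep only the dates present in both tables.
paired <- merge(obs[c("Date", common_vars)], sim[c("Date", common_vars)],
  by = "Date", suffixes = c("_obs", "_sim")
)

# Observations equal to NA are ignored.
paired <- paired[!is.na(paired$lai_obs), ]
print(paired)
```

Only the pair for 2020-05-01 remains: 2020-06-01 has no observed value, and the two other dates are not common to both tables.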

Parameter selection procedure (argument candidate_param)

If the candidate_param argument is given, a parameter selection procedure following Wallach et al. (2023) will be performed.

The candidate parameters are added one by one (in the given order) to the parameters that MUST be estimated (i.e. those defined in param_info but not in candidate_param). Each time a new candidate is added:

  • the parameter estimation is performed and an information criterion is computed (see argument info_crit_func)

  • if the information criterion is lower than all those obtained before, then the current candidate parameter is added to the list of parameters to estimate

The result includes a summary of all the steps (data.frame param_selection_steps).

For an example of this procedure, see the vignette Parameter selection with CroptimizR.
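The selection loop described above can be sketched in a few lines. Everything here is a toy stand-in: `select_params`, `toy_crit` and the criterion values are hypothetical, and the sketch assumes that a first estimation is run with only the parameters that must be estimated, to provide a baseline criterion value.

```r
# info_crit is any function that returns the information criterion obtained
# after estimating the given set of parameters.
select_params <- function(major, candidates, info_crit) {
  selected <- major
  best <- info_crit(selected) # baseline (assumption of this sketch)
  steps <- data.frame(
    params = paste(selected, collapse = "+"), crit = best, kept = TRUE
  )
  for (cand in candidates) {
    trial <- c(selected, cand)
    crit <- info_crit(trial)
    keep <- crit < best # lower than all values obtained before
    steps <- rbind(steps, data.frame(
      params = paste(trial, collapse = "+"), crit = crit, kept = keep
    ))
    if (keep) {
      selected <- trial
      best <- crit
    }
  }
  list(selected = selected, steps = steps)
}

# Made-up criterion values replacing real estimations.
toy_scores <- c("p1" = 100, "p1+p2" = 90, "p1+p2+p3" = 92, "p1+p2+p4" = 85)
toy_crit <- function(params) unname(toy_scores[paste(params, collapse = "+")])

res <- select_params("p1", c("p2", "p3", "p4"), toy_crit)
res$selected
res$steps
```

With these values, p2 and p4 are retained and p3 is rejected, because adding p3 does not lower the criterion below the best value obtained so far.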

Transformation of simulations and observations (arguments transform_sim and transform_obs)

The optional argument transform_sim must be a function with 4 arguments:

  • model_results: the list of simulated results returned by the model wrapper used

  • obs_list: the list of observations as given to the estim_param function

  • param_values: a named vector containing the current parameter values proposed by the estimation algorithm

  • model_options: the list of model options as given to the estim_param function

It must return a list of simulated results (same format as that returned by the model wrapper used) that will be used to compute the criterion to optimize.

The optional argument transform_obs must be a function with 4 arguments:

  • model_results: the list of simulated results returned by the model wrapper used

  • obs_list: the list of observations as given to the estim_param function

  • param_values: a named vector containing the current parameter values proposed by the estimation algorithm

  • model_options: the list of model options as given to the estim_param function

It must return a list of observations (same format as obs_list argument) that will be used to compute the criterion to optimize.
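A possible pair of transformation functions is sketched below. It assumes that the simulated results are stored in an element named `sim_list` of `model_results` (a named list with one data.frame per situation); check the documentation of the wrapper you use. The variable names and the unit conversion are hypothetical.

```r
# Adds a new simulated variable computed from two existing ones.
transform_sim <- function(model_results, obs_list, param_values, model_options) {
  for (sit in names(model_results$sim_list)) { # sim_list: assumed element name
    df <- model_results$sim_list[[sit]]
    df$total_biomass <- df$biomass_aerial + df$biomass_root
    model_results$sim_list[[sit]] <- df
  }
  model_results
}

# Converts an observed variable to the unit used by the model.
transform_obs <- function(model_results, obs_list, param_values, model_options) {
  lapply(obs_list, function(df) {
    df$yield <- df$yield / 1000 # hypothetical conversion, kg/ha to t/ha
    df
  })
}
```

Both functions take the four arguments listed above even when some of them are not used in the body.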

Constraints on estimated parameters (argument satisfy_par_const)

The optional argument satisfy_par_const must be a function with 2 arguments:

  • param_values: a named vector containing the current parameter values proposed by the estimation algorithm

  • model_options: the list of model options as given to the estim_param function

It must return a logical indicating whether the parameter values satisfy the constraints (freely defined by the user in the function body).
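For instance, an inequality constraint between two estimated parameters could be written as follows. The parameter names `p1` and `p2` and the constraint itself are hypothetical.

```r
# Returns TRUE when the proposed values satisfy the constraint p1 < p2.
satisfy_par_const <- function(param_values, model_options) {
  unname(param_values["p1"] < param_values["p2"])
}

satisfy_par_const(c(p1 = 1, p2 = 2), model_options = NULL)
satisfy_par_const(c(p1 = 3, p2 = 2), model_options = NULL)
```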

Model parameters to force (argument forced_param_values)

The optional argument forced_param_values may contain arithmetic expressions to automatically compute the values of some parameters as a function of the values of the estimated parameters (equality constraints). In that case, forced_param_values must be a named list, and the arithmetic expressions must be R expressions given as character strings. For example:

forced_param_values = list(p1=5, p2=7, p3="5*p5+p6")

will pass to the model wrapper the value 5 for parameter p1 and 7 for parameter p2, and will dynamically compute the value of p3 as a function of the values of parameters p5 and p6 proposed at each iteration by the parameter estimation algorithm. In this example, the parameters p5 and p6 must thus be part of the list of parameters to estimate, i.e. described in the param_info argument.
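One way to picture how such expressions can be resolved is shown below with base R. The helper `resolve_forced` and the values of p5 and p6 are hypothetical; this is an illustration of the principle, not CroptimizR's internal code.

```r
forced_param_values <- list(p1 = 5, p2 = 7, p3 = "5*p5+p6")

# Values of the estimated parameters proposed at one iteration (made up).
current_estimates <- c(p5 = 2, p6 = 1)

# Evaluates character expressions using the estimated values; numeric
# entries are passed through unchanged.
resolve_forced <- function(forced, estimates) {
  env <- as.list(estimates)
  sapply(forced, function(v) {
    if (is.character(v)) eval(parse(text = v), envir = env) else v
  })
}

resolve_forced(forced_param_values, current_estimates)
```

Here p3 evaluates to 5 * 2 + 1 = 11, while p1 and p2 keep their fixed values.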

Multi-steps estimation procedure (argument step)

The argument step is a list of lists used to perform parameter estimation in multiple sequential steps. If provided, each step represents a separate stage in the estimation procedure, allowing different configurations for each step (e.g., different sets of parameters to estimate, different observed variables, different situations, etc.). When multiple steps are defined, the parameter values estimated in one step are used as fixed values in the subsequent step.

Each step is a named list that may contain any argument of the estim_param function (e.g. candidate_param, optim_options, ...). Only the arguments that differ from those given to estim_param need to be specified: any element not explicitly defined in a step inherits its value from the corresponding argument of estim_param.

When step is used, the set of parameters to estimate and the observed variables to use usually differ between steps. For the sake of simplicity, a single global param_info list can be provided to estim_param (containing bounds, etc. for all parameters that may ever be estimated), and each step specifies explicitly:

  • major_param: a vector containing the names of the parameters that must be estimated at this step,

  • candidate_param (optional): a vector containing the names of the parameters that are candidates for estimation at this step,

  • obs_var (optional): a vector containing the names of the observed variables to use at this step,

  • situation (optional): a vector containing the names of the situations to use at this step.

When step is not used (step = NULL), a single-step estimation is performed using the arguments of estim_param. In this case, the list of parameters to be estimated is automatically deduced from the param_info argument: all parameters defined in param_info are considered for estimation (possibly subject to selection if candidate_param is used).

Suppose the step argument is defined as follows:

step <- list()
step[[1]] <- list(
  major_param = c("p1"),
  candidate_param = c("p2"),
  obs_var = c("var1")
)
step[[2]] <- list(
  major_param = c("p3"),
  obs_var = c("var2")
)

In this case, the parameter estimation procedure will proceed in two steps:

  • Step 1: Parameter p1 is estimated, while p2 is included in a parameter selection procedure. Only observed variable var1 (from obs_list defined in argument of estim_param) is used.

  • Step 2: Parameter p3 is estimated, and only observed variable var2 is used. Parameters p1 (and possibly p2, if selected) are fixed at the values estimated in Step 1.

Technical information about parameters (bounds, default values, ...) can be provided once for all steps via the global param_info argument of estim_param.

The results of the parameter estimation procedure are stored in the folder out_dir, with a separate subfolder for each step.
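The inheritance rule for step elements can be pictured with base R's `modifyList`. This is a sketch of the rule only: `global_args` is a hypothetical subset of the arguments given to estim_param, and CroptimizR's own resolution of steps may differ in its details.

```r
# Hypothetical subset of the arguments passed to estim_param.
global_args <- list(optim_method = "nloptr.simplex", info_level = 1)

step <- list(
  list(major_param = "p1", candidate_param = "p2", obs_var = "var1"),
  list(major_param = "p3", obs_var = "var2", info_level = 2)
)

# Each step keeps its own elements and inherits the others.
resolved <- lapply(step, function(s) modifyList(global_args, s))
str(resolved)
```

The first step inherits info_level = 1, while the second overrides it with 2; both inherit optim_method.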

See also

For more details and examples, see the vignettes on the CroptimizR website.