Provide several metrics to assess the quality of the predictions of a model (see note) against observations.
n_obs(obs)
mean_obs(obs, na.rm = TRUE)
mean_sim(sim, na.rm = TRUE)
sd_obs(obs, na.rm = TRUE)
sd_sim(sim, na.rm = TRUE)
CV_obs(obs, na.rm = TRUE)
CV_sim(sim, na.rm = TRUE)
r_means(sim, obs, na.rm = TRUE)
R2(sim, obs, na.action = stats::na.omit)
SS_res(sim, obs, na.rm = TRUE)
Inter(sim, obs, na.action = stats::na.omit)
Slope(sim, obs, na.action = stats::na.omit)
RMSE(sim, obs, na.rm = TRUE)
RMSEs(sim, obs, na.rm = TRUE)
RMSEu(sim, obs, na.rm = TRUE)
nRMSE(sim, obs, na.rm = TRUE)
rRMSE(sim, obs, na.rm = TRUE)
rRMSEs(sim, obs, na.rm = TRUE)
rRMSEu(sim, obs, na.rm = TRUE)
pMSEs(sim, obs, na.rm = TRUE)
pMSEu(sim, obs, na.rm = TRUE)
Bias2(sim, obs, na.rm = TRUE)
SDSD(sim, obs, na.rm = TRUE)
LCS(sim, obs, na.rm = TRUE)
rbias2(sim, obs, na.rm = TRUE)
rSDSD(sim, obs, na.rm = TRUE)
rLCS(sim, obs, na.rm = TRUE)
MAE(sim, obs, na.rm = TRUE)
ABS(sim, obs, na.rm = TRUE)
MSE(sim, obs, na.rm = TRUE)
EF(sim, obs, na.rm = TRUE)
NSE(sim, obs, na.rm = TRUE)
Bias(sim, obs, na.rm = TRUE)
MAPE(sim, obs, na.rm = TRUE)
FVU(sim, obs, na.rm = TRUE)
RME(sim, obs, na.rm = TRUE)
tSTUD(sim, obs, na.rm = TRUE)
tLimit(sim, obs, risk = 0.05, na.rm = TRUE)
Decision(sim, obs, risk = 0.05, na.rm = TRUE)
obs: Observed values.
na.rm: Boolean. Remove NA values if TRUE (default).
sim: Simulated values.
na.action: A function which indicates what should happen when the data contain NAs.
risk: Risk of the statistical test.

Value: A statistic depending on the function used.
The statistics for model quality can differ between sources. Here is
a short description of each statistic and its equation (see the HTML
version for the LaTeX rendering):
n_obs()
: Number of observations.
mean_obs()
: Mean of observed values
mean_sim()
: Mean of simulated values
sd_obs()
: Standard deviation of observed values
sd_sim()
: Standard deviation of simulated values
CV_obs()
: Coefficient of variation of observed values
CV_sim()
: Coefficient of variation of simulated values
r_means()
: Ratio between mean simulated values and mean observed
values (%),
computed as: $$r\_means = 100\cdot\frac{\frac{1}{n}\sum_{i=1}^n\hat{y_i}}
{\frac{1}{n}\sum_{i=1}^n y_i}$$
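As a quick numerical illustration of r_means (sketched in Python rather than R, with made-up values), it is simply the ratio of the two means scaled to a percentage:

```python
# Illustrative values (not from the package): simulated and observed series.
sim = [10.0, 12.0]
obs = [9.0, 11.0]

mean_sim = sum(sim) / len(sim)  # 11.0
mean_obs = sum(obs) / len(obs)  # 10.0

# Ratio of mean simulated to mean observed values, in percent.
r_means = 100 * mean_sim / mean_obs
print(r_means)  # -> 110.0
```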
R2()
: Coefficient of determination, computed using stats::lm()
on obs~sim.
SS_res()
: residual sum of squares (see notes).
Inter()
: Intercept of regression line, computed using stats::lm()
on sim~obs.
Slope()
: Slope of regression line, computed using stats::lm()
on sim~obs.
RMSE()
: Root Mean Squared Error, computed as
$$RMSE = \sqrt{\frac{\sum_1^n(\hat{y_i}-y_i)^2}{n}}$$
RMSE = sqrt(mean((sim - obs)^2))
RMSEs()
: Systematic Root Mean Squared Error, computed as
$$RMSEs = \sqrt{\frac{\sum_1^n(\tilde{y_i}-y_i)^2}{n}}$$
RMSEs = sqrt(mean((fitted.values(lm(formula = sim ~ obs)) - obs)^2))
RMSEu()
: Unsystematic Root Mean Squared Error, computed as
$$RMSEu = \sqrt{\frac{\sum_1^n(\tilde{y_i}-\hat{y_i})^2}{n}}$$
RMSEu = sqrt(mean((fitted.values(lm(formula = sim ~ obs)) - sim)^2))
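The systematic/unsystematic split uses the fitted values of the sim~obs regression to partition the MSE into a systematic part (RMSEs squared) and an unsystematic part (RMSEu squared), which sum exactly to the MSE because OLS residuals are orthogonal to the regression line. A minimal numerical check, sketched in Python with NumPy (the package itself is R; the data here are illustrative):

```python
import numpy as np

# Illustrative data (not from the package).
obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
sim = np.array([1.3, 1.8, 3.4, 3.9, 5.6])

# OLS regression of sim on obs, as in lm(sim ~ obs).
slope, intercept = np.polyfit(obs, sim, 1)
fitted = intercept + slope * obs  # values predicted by the regression

mse = np.mean((sim - obs) ** 2)
rmses = np.sqrt(np.mean((fitted - obs) ** 2))  # systematic part
rmseu = np.sqrt(np.mean((fitted - sim) ** 2))  # unsystematic part

# The decomposition is exact: MSE = RMSEs^2 + RMSEu^2.
print(np.isclose(mse, rmses**2 + rmseu**2))  # -> True
```

This identity is also why pMSEs and pMSEu (below) always sum to 1.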
NSE()
: Nash-Sutcliffe Efficiency, alias of EF, provided for user
convenience.
nRMSE()
: Normalized Root Mean Squared Error, also denoted as
CV(RMSE), and computed as:
$$nRMSE = \frac{RMSE}{\bar{y}}\cdot100$$
nRMSE = (RMSE/mean(obs))*100
rRMSE()
: Relative Root Mean Squared Error, computed as:
$$rRMSE = \frac{RMSE}{\bar{y}}$$
rRMSEs()
: Relative Systematic Root Mean Squared Error, computed as
$$rRMSEs = \frac{RMSEs}{\bar{y}}$$
rRMSEu()
: Relative Unsystematic Root Mean Squared Error,
computed as
$$rRMSEu = \frac{RMSEu}{\bar{y}}$$
pMSEs()
: Proportion of Systematic Mean Squared Error in Mean
Squared Error, computed as:
$$pMSEs = \frac{MSEs}{MSE}$$
pMSEu()
: Proportion of Unsystematic Mean Squared Error in Mean
Squared Error, computed as:
$$pMSEu = \frac{MSEu}{MSE}$$
Bias2()
: Bias squared (1st term of Kobayashi and Salam
(2000) MSE decomposition):
$$Bias2 = Bias^2$$
SDSD()
: Difference between sd_obs and sd_sim squared
(2nd term of Kobayashi and Salam (2000) MSE decomposition), computed as:
$$SDSD = (sd\_obs-sd\_sim)^2$$
LCS()
: Lack of correlation between observed and simulated values, weighted
by their standard deviations (3rd term of Kobayashi and Salam (2000) MSE decomposition), computed as:
$$LCS = 2*sd\_obs*sd\_sim*(1-r)$$
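Together, these three terms reproduce the Kobayashi and Salam (2000) decomposition MSE = Bias2 + SDSD + LCS, which is exact when the standard deviations and correlation are computed with the population (divide-by-n) convention they use. A numerical check, sketched in Python with NumPy (illustrative data; the package itself is R):

```python
import numpy as np

# Illustrative data (not from the package).
obs = np.array([2.0, 4.0, 6.0, 8.0])
sim = np.array([2.5, 3.5, 6.5, 9.0])

# Population (ddof=0) statistics, as in Kobayashi & Salam (2000).
bias = np.mean(sim - obs)
sd_obs = np.std(obs)  # divide-by-n standard deviation
sd_sim = np.std(sim)
r = np.corrcoef(sim, obs)[0, 1]

bias2 = bias ** 2
sdsd = (sd_obs - sd_sim) ** 2
lcs = 2 * sd_obs * sd_sim * (1 - r)

# The three terms sum exactly to the MSE.
mse = np.mean((sim - obs) ** 2)
print(np.isclose(mse, bias2 + sdsd + lcs))  # -> True
```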
rbias2()
: Relative bias squared, computed as:
$$rbias2 = \frac{Bias^2}{\bar{y}^2}$$
rbias2 = Bias^2/mean(obs)^2
rSDSD()
: Relative difference between sd_obs and sd_sim squared,
computed as:
$$rSDSD = \frac{SDSD}{\bar{y}^2}$$
rLCS()
: Relative correlation between observed and simulated values,
computed as:
$$rLCS = \frac{LCS}{\bar{y}^2}$$
MAE()
: Mean Absolute Error, computed as:
$$MAE = \frac{\sum_1^n(\left|\hat{y_i}-y_i\right|)}{n}$$
MAE = mean(abs(sim-obs))
ABS()
: Mean Absolute Bias, which is an alias of MAE()
FVU()
: Fraction of variance unexplained, computed as:
$$FVU = \frac{SS_{res}}{SS_{tot}}$$
MSE()
: Mean Squared Error, computed as:
$$MSE = \frac{1}{n}\sum_{i=1}^n(\hat{y_i}-y_i)^2$$
MSE = mean((sim-obs)^2)
EF()
: Model efficiency, also called Nash-Sutcliffe efficiency
(NSE). This statistic is related to the FVU as
\(EF = 1-FVU\). It is also related to \(R^2\),
as they share the same equation, except that \(SS_{tot}\) is computed
relative to the identity function (i.e. the 1:1 line) instead of the
regression line. It is computed
as: $$EF = 1-\frac{SS_{res}}{SS_{tot}}$$
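Since EF and FVU share the same sums of squares, the relation EF = 1 - FVU can be checked directly. A sketch in Python (illustrative values; the package computes these in R):

```python
import numpy as np

# Illustrative data (not from the package).
obs = np.array([3.0, 5.0, 7.0, 9.0])
sim = np.array([2.8, 5.4, 6.9, 9.5])

ss_res = np.sum((obs - sim) ** 2)           # residual sum of squares
ss_tot = np.sum((obs - np.mean(obs)) ** 2)  # total sum of squares

fvu = ss_res / ss_tot       # fraction of variance unexplained
ef = 1 - ss_res / ss_tot    # model efficiency (Nash-Sutcliffe)

print(np.isclose(ef, 1 - fvu))  # -> True
```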
Bias()
: Modelling bias, simply computed as:
$$Bias = \frac{\sum_1^n(\hat{y_i}-y_i)}{n}$$
Bias = mean(sim-obs)
MAPE()
: Mean Absolute Percent Error, computed as:
$$MAPE = \frac{\sum_1^n(\frac{\left|\hat{y_i}-y_i\right|}
{y_i})}{n}$$
RME()
: Relative mean error, computed as:
$$RME = \frac{\sum_1^n(\frac{\hat{y_i}-y_i}{y_i})}{n}$$
RME = mean((sim-obs)/obs)
tSTUD()
: Student's t test of the mean difference, computed as:
$$tSTUD = \frac{Bias}{\sqrt{\frac{var(\hat{y}-y)}{n\_obs}}}$$
tSTUD = Bias/sqrt(var(sim - obs)/n_obs)
tLimit()
: Student's t threshold, computed using qt():
$$tLimit = qt(1-\frac{\alpha}{2}, df = length(obs)-1)$$
tLimit = qt(1-risk/2,df =length(obs)-1)
Decision()
: Decision of the Student's t test of the mean difference
(can the bias be considered statistically not different from 0 at alpha level
0.05, i.e. a 5% probability of erroneously rejecting this hypothesis?),
computed as:
$$Decision = abs(tSTUD ) < tLimit$$
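Putting tSTUD, tLimit, and Decision together: this is a two-sided one-sample Student's t test of the mean difference against 0. A sketch in Python with SciPy (illustrative data; the package itself is R, and var() here uses the sample n-1 convention, as in R):

```python
import numpy as np
from scipy import stats

# Illustrative data (not from the package).
obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
sim = np.array([1.2, 2.1, 2.8, 4.3, 5.1])
risk = 0.05

d = sim - obs               # differences between simulated and observed
bias = np.mean(d)
n = len(obs)
t_stud = bias / np.sqrt(np.var(d, ddof=1) / n)  # t statistic
t_limit = stats.t.ppf(1 - risk / 2, df=n - 1)   # qt(1 - risk/2, n - 1)

decision = abs(t_stud) < t_limit  # True: bias not significantly != 0

# Cross-check against SciPy's one-sample t test.
print(np.isclose(t_stud, stats.ttest_1samp(d, 0).statistic))  # -> True
```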
\(SS_{res}\) is the residual sum of squares and \(SS_{tot}\) the total sum of squares. They are computed as:
$$SS_{res} = \sum_{i=1}^n (y_i - \hat{y_i})^2$$
SS_res = sum((obs - sim)^2)
$$SS_{tot} = \sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)^2$$
SS_tot = sum((obs - mean(obs))^2)
Also, note that \(y_i\) refers to the observed values, \(\hat{y_i}\) to the predicted values, \(\bar{y}\) to the mean value of observations, and \(\tilde{y_i}\) to the values predicted by the linear regression.