Abstract

We estimate the parameters of 21 simulation models using 9 estimation algorithms to discover which one is better at matching simulations to data. Unfortunately no single algorithm is best at minimizing estimation error for all or even most the simulations; the best algorithm differs for each simulation, and sometimes for each parameter of each simulation. Cross-validation is necessary to pair each simulation’s parameter to its appropriate estimation algorithm. This is computationally feasible for reference table algorithms. In terms of confidence intervals the results are more clear: bootstrap generates more precise prediction intervals than either quantiles or Approximate Bayesian Computation.Agent-based models; Individual-based models; Estimation; Calibration; Approximate Bayesian Computation; Random Forest; Generalized Additive Model; Bootstrap;

Models take parameters as input to describe and predict a phenomenon. When we gather real data we can search for the input parameters that most likely generated it. This is difficult for simulation models whose likelihood function is often unknown or intractable. Fortunately, many likelihood-free estimation algorithms are available(Hartig et al. 2011). We would like to know if any of them estimate parameters better, particularly when data is high dimensional, simulating is computationally expensive and in the presence of identification issues.

In this paper we estimate the parameters of 21 simulations with nine different algorithms. We focus on “reference table” algorithms: rejection ABC (Approximate Bayesian Computation as in Beaumont, Zhang, and Balding 2002), regression-adjusted ABC and regression-only methods. We rank them by their estimation error and the quality of their confidence intervals. Superficially, there is a winner: GAM, generalized additive models, regressions (Hastie and Tibshirani 1986; Wood and Augustin 2002) produce most often the lowest estimation errors, the best confidence intervals and no identification failures. Even so, GAMs produce the best estimates for only 30% of the parameters. In fact, each of the algorithms tested is the best estimator in at least one instance.

Since best estimation algorithm changes between simulations and even between parameters of the same simulation, there is no alternative to test multiple algorithms against each parameter of a simulation. Fortunately this plays to the advantage of reference table algorithms since the data used to train one algorithm can be recycled to train all the others as well as rank them by cross-validation.

We define here a simulation model as any function that depends on a set of parameter \(\theta\) to generate a set of summary statistic \(S(\theta)\). We are interested in the estimation problem where we observe summary statistics \(S^*\) and we want to know which parameter \(\theta^*\) most likely generated them.

We parametrize 21 simulation models (described in section 2.1). We have nine candidate algorithms to do so (described in section 2.2). All are “reference table” algorithms: algorithms whose only input for estimation is a table of simulation parameters \(\theta\), selected by random sampling, and the summary statistics \(S(\theta)\) they generate. We split this reference table into training and testing sets and ask each algorithm to estimate the parameters of the testing set observing only the training portion of the reference table.

We measure algorithm performance with two indicators: predictivity and coverage. Predictivity (Salle and Yıldızoğlu 2014; modelling efficiency in Stow et al. 2009) is the out of sample mean square error when using a particular algorithm normalized by the mean square error of using the average parameter instead: \[\begin{equation} 1 - \frac{ \sum \left( \theta_i - \hat \theta_i \right)^2}{ \sum \left( \theta_i - \bar \theta \right)^2} \tag{2.1} \end{equation}\] Where \(\hat \theta\) is the estimated parameter, \(\theta\) is the real simulation parameter and \(\bar \theta\) is the average parameter value. Predictivity ranges from 1 (perfectly estimated) to 0 (unidentified) to negative values (misidentified).

We define coverage as in Raynal et al. (2018) as the percentage of times the real parameter falls within the 95% prediction intervals suggested by the estimating algorithm. The best coverage is 95%: higher generates type I errors, lower generates type II errors.

We estimate the parameters of 21 separate simulations, repeating some with different amount of data or summary statistics. We can roughly categorize the simulations into three groups: simple, ill posed and complicated. Table 2.1 lists them all.

Experiment | No. of parameters | No. of summary statistics | No. of simulations | Testing |
---|---|---|---|---|

\(\alpha\)-stable | 3 | 11 | 1,250 or 5,000 | 5-fold CV |

Birds ABM | 2 | 2 or 105 | 5,000 | 5-fold CV |

Broken Line | 1 | 10 | 1,250 or 5,000 | 5-fold CV |

Coalescence | 2 | 7 | 100,000 | Single testing set |

Earthworm | 11 | 160 | 100,000 | Single testing set |

\(g\)-and-\(k\) distribution | 4 | 11 | 1,250 or 5,000 | 5-fold CV |

Hierarchical Normal Mean | 2 | 61 | 1,250 or 5,000 | 5-fold CV |

Lotke-Volterra | 2 | 16 (noisy or non-noisy) | 100,000 | Single testing set |

Locally Identifiable | 2 | 2 | 1,250 or 5,000 | 5-fold CV |

Moving Average (2) | 2 | 2 | 1,250 or 5,000 | 5-fold CV |

Median and MAD | 2 | 2 or 4 | 1,250 or 5,000 | 5-fold CV |

\(\mu\)-\(\sigma^2\) | 2 | 2 | 10,000 | 5-fold CV |

Normal 25 | 2 | 25 | 1,250 or 5,000 | 5-fold CV |

Scale | 2 | 1 | 1,250 or 5,000 | 5-fold CV |

Unidentifiable | 2 | 1 | 1,250 or 5,000 | 5-fold CV |

Partially Identifiable | 2 | 2 | 1,250 or 5,000 | 5-fold CV |

Real Business Cycle | 6 | 44 or 48 | 2,944 or 2,961 | 5-fold CV |

Pathogen | 4 | 11 | 200,000 | Single testing set |

Toy Model | 2 | 2 | 1,250 or 5,000 | 5-fold CV |

Ecological Traits | 4 | 4 | 1,250 or 5,000 | 5-fold CV |

Wilkinson | 1 | 1 | 1,250 or 5,000 | 5-fold CV |

Simple simulations, few parameters and summary statistics, feature prominently in the ABC literature both as a teaching tool and to compare different techniques. They are useful because they run quickly but they may bias comparisons towards simpler estimation algorithms. We compute predictivity and coverage for all the experiments in this section by 5-fold cross-validation: keeping one fifth of the data out of sample, using the remaining portion to train our algorithms and doing this five times, rotating each time the portion of data used for testing. We run all the experiments in this section twice: once the total data is 1,250 simulation runs and once it is 5,000 simulation runs.

*\(\alpha\)-stable*: Rubio and Johansen (2013) uses ABC to recover the parameters of an \(\alpha\)-stable distribution by looking at sample of 1096 independent observations from it. We replicate this here using the original priors for the three parameters (\(\alpha \sim U(1,2)\), \(\mu \sim U(-0.1,0.1)\), \(\sigma \sim U(0.0035,0.0125)\)). We use 11 summary statistics representing the 0%,10%,\(\dots\),100% deciles of each sample generated.

*\(g\)-and-\(k\) distribution*: Karabatsos and Leisen (2017) uses ABC to estimate the parameters of the g-and-k distribution (an extension of the normal distribution whose density function has no analytical expression). We replicate this here using the `gk`

package in R (Prangle 2017).
We want to retrieve the 4 parameters of the distribution \(A,B,g,k \sim U[0,10]\) given the 11 deciles (0%,10%,…,100%) of a sample of 1,000 observations from that distribution.

*Normal 25*: Sometimes sufficient summary statistics exist but the modeller may miss them and use others of lower quality.
In this example 25 i.i.d observations from the same normal distribution \(\sim N(\mu,\sigma^2)| \mu \sim U(-5,5); \sigma \sim U(1,10)\) are used directly as summary statistics to retrieve the two distribution parameters \(\mu,\sigma^2\).

*Moving Average(2)*: Creel (2017) used neural networks to recover the parameters of the MA(2) process with \(\beta_1 \sim U(-2,2); \beta_2\sim U(-1,1)\). We observe generated a time series of size 100 and we summarise it with the coefficients a AR(10) regression.

*Median and MAD*: As a simple experiment we sample 100 observations from a normal distribution \(\mu \sim U(-5,5)\) and \(\sigma \sim U(0.1,10)\) and we collect as summary statistics their median and median absolute deviation, using them to retrieve the original distributions.
We run this experiment twice, the second time adding two useless summary statistics \(S_3 \sim N(3,1)\) and \(S_4 \sim N(100,.01)\).

*\(\mu\)-\(\sigma^2\)*: The `abc`

package in R (Csilléry, François, and Blum 2012) provides a simple dataset example connecting two observed statistics: “mean”" and “variance” as" generated by the parameters \(\mu\) and \(\sigma^2\). The posterior that connects the two derives from the Iris setosa observation (Anderson 1935).
The data set contains 10,000 observations and we log-transform \(\sigma^2\) when estimating.

*Toy Model*: A simple toy model suggested by the `EasyABC`

R package(Jabot, Faure, and Dumoulin 2013) involves retrieving two parameters, \(a \sim U[0,1]; b \sim U[1,2]\), observing two summary statistics \(S_1 = a + b + \epsilon_1 ; S_2 = a b +\epsilon_2 | \epsilon_1,\epsilon_2 \sim N(0,.1^2)\).

*Ecological Traits*: The `EasyABC`

R package(Jabot, Faure, and Dumoulin 2013) provides a replication of Jabot (2010), a trait-based ecological simulator. Here we fix the number of individuals to 500 and the number of traits to 1, leaving four free parameters: \(I \sim U(3,5),A\sim U(0.1,5),h\sim U(-25,125),\sigma\sim U(0.5,25)\). We want to estimate these with four summary statistics: richness of community \(S\), shannon index \(H\), mean and skewness of traiv values in the community.

*Wilkinson*: Wilkinson (2013) suggested a simple toy model with one parameter, \(\theta \sim U(-10,10)\), and one summary statistic \(S_1 \sim N(2 (\theta + 2) \theta(\theta-2), 0.1 + \theta^2)\).
We run this experiment twice, once where the total data is 1,250 sets of summary statistics and one where the total data is 5,000 sets of summary statistics.

We often face identification issues: the inability to recover parameters given the information we have. Because these issues take many forms, we produce a series of experiments to test each. As with simple simulations, we test all the experiments with 5-fold cross validation and run each twice: once where the total reference table has 1,250 total rows, and once where it has 5,000.

Ideally two behaviours should emerge from an estimation algorithm under these circumstances. First we would like to maximize the quality of our estimated parameters when the information is noisy (the lesser problem of “weak” identification). Second we would like our estimation algorithm to recognize when the model cannot be identified and not be fooled into still producing an arbitrary estimate and a small confidence interval around it.

*Broken Line*: we observe 10 summary statistics \(S=(S_0,\dots,S_9)\) generated by:
\[
S_i=\left\{\begin{matrix}
\epsilon & i < 5\\
\beta i + \epsilon & i\geq5
\end{matrix}\right. \tag{2.2}
\]
and where \(\beta \sim U(0,2)\)

*Hierarchical Normal Mean*: Raynal et al. (2018) compares ABC to direct random forest estimation in a “toy” hierarchical normal mean model:
\[
\left.\begin{matrix}
y_i |\theta_1,\theta_2 \sim N(\theta_1,\theta_2) \\
\theta_1|\theta_2 \sim N(0,\theta_2) \\
\theta_2 \sim IG(\kappa,\lambda)
\end{matrix}\right. \tag{2.3}
\]
Where \(IG(\cdot)\) is the inverse gamma distribution. We want to estimate \(\theta_1,\theta_2\) given a sampled vecor \(y\) of size 10 which is described by 61 summary statistics: the mean, the variance, the median absolute deviation of the sample, all possible combinations of their products and sums as well as 50 noise summary statistics \(\sim U(0,1)\).

*Locally Identifiable*: macroeconomics often deals with structural models that are only locally identifiable (see Fernández-Villaverde, Rubio Ramírez, and Schorfheide 2016). These are models where the true parameter is only present in the data for some of its possible values. Here we use the example:
\[
S_i=\left\{\begin{matrix}
y \sim N(\theta_1,\theta_2) & \theta_1>2, \theta_2>2\\
y \sim N(0,1) & \text{Otherwise}
\end{matrix}\right. \tag{2.4}
\]
Where \(\theta_1,\theta_2 \sim U[0.1,5]\), each simulation we sample the vector \(y\) of size 100 and we collect its mean and standard deviation as summary statistics.

*Scale*: a common source of under-identification in economics occurs when “when two structural parameters enter the objective function only proportionally, making them separately unrecoverable”(Canova and Sala 2009).
In this example, two people of weight \(w_1,w_2\sim U[80,150]\) step together on a scale whose reading \(S_1 = w_1 + w_2 + \epsilon | \epsilon \sim N(0,1)\) is the only summary statistic we can use.
This problem is locally identifiable to an extent: very low readings means both people are light (and viceversa).

*Unidentifiable*: in some cases the model parameters are just unrecoverable and we hope that our estimation algorithm does not tell us otherwise.
In this example the three summary statistics \(S_1,S_2,S_3 \sim N(x,1)| x \sim U[0,50]\) provide no information regarding the two parameters we are interested in: \(\mu\sim U(0,50), \sigma \sim U(0,25)\).

*Partially Identifiable*: Fernández-Villaverde, Rubio Ramírez, and Schorfheide (2016) mention how partial identification can occur when a model is the real data generating process conditional on some other unobserved parameter. This makes the model identifiable in some samples but not others. The example we use is a slight modification of the original: we try to retrieve parameter \(\theta \sim U[1,5]\) when we observe mean and standard deviation of a size 10 vector \(y\) generated as follows:
\[
y \sim N(\theta\cdot x,1), ~
x=\left\{\begin{matrix}
0 & \text{with probability } \frac 1 2\\
\sim N(1,1) & \text{Otherwise}
\end{matrix}\right. \tag{2.5}
\]

Simulations, particularly agent-based models, tend to be large. They involve many input parameters and summary statistics. Inference in a high-dimensional space is difficult not just for ABC methods but for non-parametric smoothing as well (see section 4.5 in Wasserman 2006). In principle one could solve this by just simulating more data but complicated simulation models tend also to be slow. This puts a premium on the quality of the algorithms to extrapolate from the data they have.

*Birds ABM*: Thiele, Kurth, and Grimm (2014) estimated the parameters of a simple agent-based bird population model (originally in Railsback and Grimm (2012)) with ABC. Their paper provided an open source NETLOGO implementation of the model. The model depends on two parameters: `scout-prob`

\(\sim U[0,0.5]\) and `survival-prob`

\(\sim U[0.95,1]\). We ran this experiment twice, once where there are only 2 summary statistics: mean abundance and mean variation over 20 years, and one where are 105 (comprising the average, last value, standard deviation, range and the coefficients of fitting an AR(5) regression to the time series of abundance, variation, months spent foraging and average age within bird population).
This experiment is useful because in the original specification (with 2 summary statistics) the `scout-prob`

parameter is unidentifiable.
For each experiment we ran the model 5000 times.

*Coalescence*: the `abctools`

package (Nunes and Prangle 2015) provides 100,000 observations of 7 summary statistics from a DNA coalescent model depending on two parameters \(\theta \sim u[2,10]\) and \(\rho \sim U[0,10]\). Blum et al. (2013) in particular used this dataset to compare the quality of ABC dimensionality reduction schemes to better estimate the two parameters.
This data-set is too big for cross-validation so in this experiment we simply used 1,250 observation as the testing data-set and the rest for training.

*Lotke-Volterra*: Toni et al. (2009) showcases SMC-ABC with a 2 species deterministic Lotke-Volterra model with 2 parameters: \(a,b\).
\[
\left\{\begin{matrix}
\frac{dx}{dt} = ax - yx \\
\frac{dy}{dt} = bxy - y
\end{matrix}\right.
\]
Here we assume \(a,b \sim U(0,10)\) (avoiding the negative values in the original paper). For each simulation we sample 8 observations for predator and prey at time \(t=1,1.2, 2.4, 3.9, 5.7, 7.5, 9.6, 11.9, 14.5\) (as in the original paper).
We run this experiment twice, once where data is observed perfectly and one where to each observation we add noise \(\sim N(0,0.5)\).
In both experiments we do not perform 5-fold cross validation, rather we generate 100,000 sets of summary statistics for training and another 1,250 sets of summary statistics to test the parametrization.

*Real Business Cycle*: we want to parametrize the default Real Business Cycle model (a simple but outdated class of macro-economics models) implemented in the `gEcon`

R package(Klima, Podemski, and Retkiewicz-Wijtiwiak 2018).
It has 6 parameters (\(\beta,\delta,\eta,\mu,\phi,\sigma\)) and we try to parametrize them in two separate experiments.
In the first, we use as summary statistics the -10,+10 cross-correlation table between output \(Y\), consumption \(C\), investment \(I\), interest rates \(r\) and employment \(L\) (44 summary statistics in total). For this experiment we have 2,944 distinct observations.
In the second experiment we follow Carrella, Bailey, and Madsen (2018) using as summary statistics (i) coefficients of regressing \(Y\) on \(Y_{t-1},I_{t},I_{t-1}\), (ii) coefficients of regressing \(Y\) on \(Y_{t-1},C_{t},C_{t-1}\), (iii) coefficients of regressing \(Y\) on \(Y_{t-1},r_{t},r_{t-1}\), (iv) coefficients of regressing \(Y\) on \(Y_{t-1},L_{t},L_{t-1}\), (v) coefficients of regressing \(Y\) on \(C,r\) (vi) coefficients of fitting AR(5) on \(Y\), (vii) the (lower triangular) covariance matrix of \(Y,I,C,r,L\). 48 summary statistics in total. For this experiment we have 2,961 distinct observations.

*Pathogen*: another dataset used by Blum et al. (2013) to test dimensionality reduction methods for ABC concerns the ability to predict pathogens’ fitness changes due to antibiotic resistance (the original model and data is from Francis et al. 2009). The model has four free parameters and 11 summary statistics.
While the original data-set contains 1,000,000 separate observations, we only sample 200,000 at random for training the algorithms and 1,250 more for testing.

*Earthworm*: Vaart et al. (2015) calibrated an agent-based model of earthworms with rejection ABC. The simplified version of the model contains 11 parameters and 160 summary statistics. The original paper already carried out cross-validation proving under-identification: the model contains a mixture of unidentified, weakly identified and well identified parameters.
We use 100,000 runs from the original paper, setting 1,250 aside for out of sample testing and using the rest for training.

We test nine algorithms to parametrize simulations: five are ABC and four are regressions-only. In this paper we focused exclusively on reference table algorithms. We thus ignored methods that combine search and estimation, such as synthetic likelihood (Wood 2010; Fasiolo and Wood 2014), ABC-MCMC (Hartig et al. 2011) and Bayesian optimization (Snoek, Larochelle, and Adams 2012). We also ignored regression-only techniques that do not generate prediction intervals such as the deep neural networks proposed in Creel (2017) and the elastic nets proposed in Carrella, Bailey, and Madsen (2018).

The first algorithm we use is the simple rejection ABC (Pritchard et al. 1999; Beaumont, Zhang, and Balding 2002). Start by ranking all training observations by their euclidean distance to the testing summary statistics \(\left( \sum_i S_i(\theta) - S_i(\theta^*) \right)^2\). Ignore all training observations except the closest 10%. Use the \(\theta\) parameters of the accepted observations to generate the posterior estimate for the testing parameter \(\theta^*\).

The second algorithm is the local-linear regression adjusted ABC (Beaumont, Zhang, and Balding 2002). Weigh all training observations by an Epanechnikov kernel with bandwidth equal to euclidean the distance between the testing summary statistics \(S(\theta^*)\) and the furthest \(S(\theta)\) we would have accepted using simple rejection. Then run a local-linear regression on the weighted training set to predict \(E[\theta |S(\theta)]\) and use the residuals of that regression to estimate the posterior distribution for the testing parameter \(\theta^*\).

The third algorithm, neural network ABC, feeds the same weighted training set to a feed forward neural network (Blum and Francois 2010). The approach is similar to the local-linear regression above but the residuals are also weighted by a second regression (on the log squared residuals) to correct for heteroskedasticity.

These three algorithms are implemented in the `abc`

package(Csilléry, François, and Blum 2012) in R. We used the package default settings for its neural networks (10 networks, 5 units in the hidden layer and weight decay randomly chosen for each network between \(0.0001\),\(0.001\) and \(0.01\)).

The fourth and fifth algorithm are semi-automatic ABC methods which “pre-process” summary statistics before applying rejection ABC(Prangle et al. 2014).
More precisely, the original summary statistics \(S(\theta)\) are fed into a set linear regressions estimating \(r_i=E[\theta_i|S(\theta)]\) (one for each parameter of the model) and then regression fitted values are used as summary statistics for the simple rejection ABC.
The rationale is that these regressions will project the summary statistics into a space where rejection ABC performs better.
We do this in two different ways here: by running first or fourth degree linear regressions in the pre-processing phase.
This is done using the R package `abctools`

(Nunes and Prangle 2015) and their default parameters: using half of the training set to run the regression and the other half to run the rejection ABC.

A feature of all ABC methods is that they are local: they remove or weight training observations differently depending on the \(\theta^*\) we want to estimate. This means that during cross-validation we need to retrain each ABC for each row of the testing set.

Estimating parameters by regression is a straightforward process. We build a separate regression \(r\) for each \(\theta\) in the training set as dependent variable using the summary statistics \(S(\theta)\) as the independent variables. We then plug in these regressions the testing summary statistic \(S(\theta^*)\) to predict the testing parameter \(\theta^*\). When a simulation depends on multiple parameters we build a separate regression for each parameter.

The simplest algorithm of this class is linear regression of degree one. It is linear, its output is understandable and is fast to compute. This speed allows us to estimate the prediction interval of \(\theta^*\) by resampling bootstrap(Davison and Hinkley 1997): we produce 200 training data sets by resampling with replacement from the original one and run the same linear regression on on each new data set. From each regression \(i\) we collect their prediction \(\beta_i S(\theta^*)\) and sample one standardized residual \(e\) (a residual divided by the square root of one minus the hat value associated with that residual). This produces a set of 200 \(\beta_i S(\theta) + e_i\). The 95% prediction interval is then defined by 2.5 and 97.5 percentile of this set.

A more complex algorithm that is not linear but still additive is the generalized additive model(GAM), where we regress:
\[ \theta = \sum s_i(S_i(\theta)) \]
\(s_i\) is a smooth spline transformation (see chapter 9 in Hastie, Tibshirani, and Friedman 2009; also Wood and Augustin 2002).
We use the `mgcv`

R package(Wood 2017, 2004).
The boostrap prediction intervals techniques we used for linear regressions is too computationally expensive to replicate with GAMs.
Instead we produce prediction intervals by assuming normal standard errors (generated by the regression itself) and by resampling residuals directly: we generate 10,000 draws of \(z(S(\theta))+\epsilon\) where \(z\) is normally distributed with standard deviation equal to regression’s standard error at \(S(\theta)\) and \(\epsilon\) is a randomly drawn residual of the original regression.

The 95% prediction interval for \(\theta^*\) is then defined by 2.5 and 97.5 percentile of the generated \(z(S(\theta^*))+\epsilon\) set.

A completely non-parametric regression advocated in Raynal et al. (2018) is the random forest(Breiman 2001).
We implement this in two ways here.
First, as a quantile random forest (Meinshausen 2006), using in `quantregForest`

R package (Meinshausen 2017); prediction intervals for any simulation parameter \(\theta^*\) are the predicted 2.5 and 97.5 quantile at \(S(\theta^*)\).
Second, as a regression random forest using the `ranger`

and `caret`

packages in R (Wright and Ziegler 2015; Kuhn 2008).
For this method we generate prediction intervals as in GAM regressions.
We generate 10,000 draws of \(z(S(\theta))+\epsilon\) where \(z\) is normally distributed with standard deviation equal to the infinitesimal jackknife standard error(Wager, Hastie, and Efron 2014) at \(S(\theta)\) and \(\epsilon\) is a resampled residual; we then take the 2.5 and 97.5 percentile of the \(z(S(\theta^*))+\epsilon\) set as our prediction interval.

Table 3.1 summarises the predictivity of each algorithm across all identifiable estimation problems (here defined as those where at least one algorithm achieves predictivity of 0.05 or above). Estimation by GAM achieves the highest average predictivity and the lowest regret (average distance between its predictivity and the highest predictivity in each simulation). Even so, GAM has the best predictivity only for 30 out of a total of 93 identifiable parameters.

Non-linear methods like Random Forest and Neural Network ABC have high average predictivity and together they account for the top performance of another 35 parameters. Local-linear ABC is on average a very effective algorithm but its average predictivity is penalized by very large errors in a few experiments (in particular the “Hierarchical Normal Mean”).

Algorithm | # of times highest predictivity | Regret | Average predictivity |
---|---|---|---|

Rejection ABC | 2 | -0.433 | 0.348 |

Semi-automatic ABC 4D | 2 | -0.207 | 0.476 |

Semi-automatic ABC 1D | 1 | -0.297 | 0.424 |

Local-linear ABC | 15 | < -10 | < -10 |

Neural Network ABC | 12 | -0.120 | 0.556 |

Linear Regression | 8 | -0.248 | 0.479 |

GAM | 30 | -0.072 | 0.577 |

Quantile Random Forest | 3 | -0.187 | 0.547 |

Regression Random Forest | 20 | -0.117 | 0.571 |

Table 3.2 lists the number of identification failures: parameters for which at least another algorithm achieved predictivity of .1 but the algorithm in the table achieved predictivity below .05. Most parameters are either identified by all the algorithms or none of them. Local-linear regression struggled with the “natural mean hierarchy” simulation. Linear regression failed to estimate the \(b\) parameter from the Lotka-Volterra models, the \(\sigma\) parameter from the normal distribution and the \(A\) parameter from the ecological traits model. Random forests failed to identify \(\mu\) and \(\delta\) from the RBC macroeconomic model.

Algorithm | Identification Failures |
---|---|

Rejection ABC | 11 |

Semi-automatic ABC 4D | 1 |

Semi-automatic ABC 1D | 1 |

Local-linear ABC | 8 |

Neural Network ABC | 0 |

Linear Regression | 6 |

GAM | 0 |

Quantile Random Forest | 3 |

Regression Random Forest | 2 |

Figure compares algorithms pairwise with respect to their predictivity. Even the “best” algorithm, GAM, has lower predictivity for 30-40% of the parameters against to neural network ABC, local-linear regression ABC and random forests. Linear regression of degree one wins about half of the comparisons against any ABC.