Estimating and cross-validating the Schelling segregation model with freelunch

18 February 2021

A new R-package to estimate simulation models!

So I’ve been working on yet another revision of a paper on the estimation of agent-based models, and I thought it would be useful to extract all the estimation commands into an R package for others to use.
So here is the freelunch package.

It focuses only on “reference tables” methods:

  1. Run the model a bunch of times feeding it random parameters
  2. Collect for each set of parameters you put in, a set of summary statistics describing what the model outputs
  3. Try to generalize this data-set of input-output simulations via a regression or an Approximate Bayesian Computation
  4. Use that generalization to estimate the REAL parameters from the REAL summary statistics.
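The four steps above can be sketched in a few lines of base R. This is not the freelunch implementation, just a toy rejection-ABC loop where the "simulator" (`run_model`) is a made-up stand-in:

```r
# Toy reference-table / rejection-ABC workflow (run_model is a stand-in
# for an actual simulation model such as Schelling).
set.seed(42)
n_sims    <- 1000
params    <- runif(n_sims, min = 0, max = 1)        # 1. random parameters
run_model <- function(p) p + rnorm(1, sd = 0.05)    # stand-in simulator
summaries <- vapply(params, run_model, numeric(1))  # 2. summary statistics
observed  <- 0.7                                    # the REAL summary statistic
keep      <- order(abs(summaries - observed))[1:50] # 3. keep the closest runs
accepted  <- params[keep]                           # 4. approximate posterior
c(estimate = median(accepted), quantile(accepted, c(.025, .975)))
```

The accepted parameter draws play the role of a posterior sample: their median is the point estimate and their quantiles give the credible interval.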

It’s an extremely inefficient way of estimating models compared to search-based methods, but it also has many advantages:

  1. It produces decent confidence intervals
  2. It is quickly testable (you can figure out how good the estimates/CIs are)
  3. It’s all “post-processing”: all it needs is a CSV file with a bunch of random runs; no need to mess around with NETLOGO or whatever else you are using to run the model

Here I thought I’d showcase it by estimating a four-population, three-parameter Schelling model I nicked from COMSES.

Installation guide

Grab it straight from GitHub; beware that the installation may take a while (since this package really just provides an interface to call methods from better packages):

##Might need to uncomment this line on windows/mac
##if you don't build packages from sources:
##Sys.setenv("R_REMOTES_NO_ERRORS_FROM_WARNINGS" = "true")

devtools::install_github("carrknight/freelunch")

Estimating Schelling with ONE summary statistic

We are studying the housing spatial distribution of 4 racial groups within a city and we notice a steady trend towards segregation. We think the Schelling segregation model could provide a good explanation for it so we want to estimate its parameters given the data we have.
Unfortunately the only data we have is that “similarity” (the % of neighbors who are of the same race, i.e. segregation) has increased steadily, at a linear trend of about .1 per month.
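A “trend” summary statistic like this is just the slope of a linear fit of the time series on time. A minimal sketch with made-up monthly similarity numbers:

```r
# The trend summary statistic is the slope of a linear regression of
# similarity on time (illustrative made-up data, not the real series).
similarity <- c(60.2, 60.4, 60.3, 60.6, 60.7, 60.8)  # monthly observations
month      <- seq_along(similarity)
trend      <- unname(coef(lm(similarity ~ month))["month"])
trend  # slope of the fit: 0.12 per month for these made-up numbers
```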

So we take a NETLOGO version of the famous Schelling segregation model and just run it a bunch of times with random parameters.
This package comes with those runs already done so let’s load them!

library(tidyverse)
library(ggplot2)
library(freelunch)

data("shelling")
glimpse(shelling)
#> Rows: 5,000
#> Columns: 81
#> $ random.seed                     <dbl> 19434.69, 68105.88, 58719.91, 43160.4…
#> $ density                         <dbl> 72, 94, 55, 85, 52, 65, 95, 86, 85, 8…
#> $ X..similar.wanted               <dbl> 30, 35, 27, 61, 43, 38, 73, 39, 28, 3…
#> $ radiusNeighborhood              <dbl> 4, 5, 3, 3, 3, 4, 5, 2, 4, 4, 1, 3, 1…
#> $ mean.percent.similar            <dbl> 63.76226, 42.62576, 65.66402, 66.1955…
#> $ mean.percent.similar.Red        <dbl> 70.01694, 59.62370, 72.47819, 84.8666…
#> $ mean.percent.similar.Blue       <dbl> 54.17323, 10.57785, 52.76714, 17.5342…
#> $ mean.percent.unhappy            <dbl> 9.3662493, 37.7464349, 3.0284300, 47.…
#> $ mean.percent.unhappy.Red        <dbl> 7.911612e-02, 5.548658e-03, 5.432130e…
#> $ mean.percent.unhappy.Blue       <dbl> 32.549750, 100.000000, 13.306136, 99.…
#> $ mean.percent.clustering.Red     <dbl> 1.218620, 1.014474, 1.218561, 1.41699…
#> $ min.percent.similar             <dbl> 40.15779, 41.02683, 41.61093, 41.6127…
#> $ min.percent.similar.Red         <dbl> 57.89020, 58.80828, 59.02070, 59.7649…
#> $ min.percent.similar.Blue        <dbl> 10.181818, 9.719713, 8.983218, 9.2171…
#> $ min.percent.unhappy             <dbl> 2.3516836, 36.9734151, 0.0000000, 40.…
#> $ min.percent.unhappy.Red         <dbl> 0.0000000, 0.0000000, 0.0000000, 15.5…
#> $ min.percent.unhappy.Blue        <dbl> 10.101010, 100.000000, 0.000000, 99.0…
#> $ min.percent.clustering.Red      <dbl> 1.0075588, 1.0006003, 0.9923031, 0.99…
#> $ last.percent.similar            <dbl> 70.50973, 42.72531, 68.04761, 72.2500…
#> $ last.percent.similar.Red        <dbl> 73.86879, 59.66695, 73.59369, 88.1371…
#> $ last.percent.similar.Blue       <dbl> 70.97535, 10.76825, 60.23023, 20.7817…
#> $ last.percent.unhappy            <dbl> 2.3516836, 37.5051125, 0.0000000, 40.…
#> $ last.percent.unhappy.Red        <dbl> 0.0000000, 0.0000000, 0.0000000, 15.7…
#> $ last.percent.unhappy.Blue       <dbl> 10.101010, 100.000000, 0.000000, 100.…
#> $ last.percent.clustering.Red     <dbl> 1.285661, 1.015210, 1.237316, 1.47160…
#> $ sd.percent.similar              <dbl> 7.8630897, 0.3545979, 4.6797985, 4.60…
#> $ sd.percent.similar.Red          <dbl> 4.3909583, 0.1642589, 2.3431204, 3.90…
#> $ sd.percent.similar.Blue         <dbl> 19.8695685, 0.3512378, 13.9403901, 2.…
#> $ sd.percent.unhappy              <dbl> 8.3924392, 0.6793271, 5.7913405, 4.39…
#> $ sd.percent.unhappy.Red          <dbl> 0.133751471, 0.027814485, 0.172597010…
#> $ sd.percent.unhappy.Blue         <dbl> 26.86042555, 0.00000000, 25.05878194,…
#> $ sd.percent.clustering.Red       <dbl> 0.076423098, 0.002794801, 0.039394406…
#> $ trend.percent.similar           <dbl> 0.081336262, 0.001945294, 0.036708780…
#> $ trend.percent.similar.Red       <dbl> 0.0458888407, 0.0008668063, 0.0175192…
#> $ trend.percent.similar.Blue      <dbl> 0.2059321799, 0.0001900657, 0.1173271…
#> $ trend.percent.unhappy           <dbl> -0.083779995, -0.003628294, -0.046317…
#> $ trend.percent.unhappy.Red       <dbl> -8.743380e-04, -9.921442e-05, -9.4404…
#> $ trend.percent.unhappy.Blue      <dbl> -2.691585e-01, 3.475738e-17, -2.09609…
#> $ trend.percent.clustering.Red    <dbl> 7.986793e-04, 1.474837e-05, 2.945478e…
#> $ max.percent.similar             <dbl> 70.74361, 42.98516, 68.04761, 72.2500…
#> $ max.percent.similar.Red         <dbl> 74.18262, 59.78800, 73.59369, 88.3309…
#> $ max.percent.similar.Blue        <dbl> 72.26667, 11.66986, 60.23023, 22.5548…
#> $ max.percent.unhappy             <dbl> 39.33725, 41.18609, 33.75617, 72.7354…
#> $ max.percent.unhappy.Red         <dbl> 0.7441860, 0.2087683, 1.4218009, 54.4…
#> $ max.percent.unhappy.Blue        <dbl> 100.00000, 100.00000, 99.26471, 100.0…
#> $ max.percent.clustering.Red      <dbl> 1.291123, 1.017270, 1.237316, 1.47484…
#> $ snap.50.percent.similar         <dbl> 55.60111, 42.81253, 63.64492, 62.1832…
#> $ snap.50.percent.similar.Red     <dbl> 65.00607, 59.70926, 71.46123, 82.0816…
#> $ snap.50.percent.similar.Blue    <dbl> 29.018987, 10.540541, 42.900830, 15.2…
#> $ snap.50.percent.unhappy         <dbl> 17.42383752, 37.50511247, 5.56730092,…
#> $ snap.50.percent.unhappy.Red     <dbl> 0.00000000, 0.00000000, 0.11848341, 2…
#> $ snap.50.percent.unhappy.Blue    <dbl> 61.11111, 100.00000, 28.67647, 100.00…
#> $ snap.50.percent.clustering.Red  <dbl> 1.131408, 1.015930, 1.201463, 1.37049…
#> $ snap.100.percent.similar        <dbl> 63.10817, 42.78007, 67.20875, 65.8841…
#> $ snap.100.percent.similar.Red    <dbl> 69.04976, 59.69415, 73.23038, 85.2958…
#> $ snap.100.percent.similar.Blue   <dbl> 54.48378, 10.40620, 58.16366, 17.9016…
#> $ snap.100.percent.unhappy        <dbl> 8.711919, 37.627812, 1.409443, 48.625…
#> $ snap.100.percent.unhappy.Red    <dbl> 0.00000000, 0.00000000, 0.00000000, 1…
#> $ snap.100.percent.unhappy.Blue   <dbl> 30.303030, 100.000000, 5.147059, 100.…
#> $ snap.100.percent.clustering.Red <dbl> 1.201787, 1.015673, 1.231208, 1.42416…
#> $ snap.150.percent.similar        <dbl> 67.34460, 42.62108, 67.67686, 67.6822…
#> $ snap.150.percent.similar.Red    <dbl> 72.18845, 59.65246, 73.37447, 86.3703…
#> $ snap.150.percent.similar.Blue   <dbl> 64.45296, 10.43992, 59.82778, 19.2883…
#> $ snap.150.percent.unhappy        <dbl> 5.6654196, 37.6687117, 0.4933051, 47.…
#> $ snap.150.percent.unhappy.Red    <dbl> 0.09302326, 0.00000000, 0.00000000, 1…
#> $ snap.150.percent.unhappy.Blue   <dbl> 19.6969697, 100.0000000, 0.7352941, 1…
#> $ snap.150.percent.clustering.Red <dbl> 1.256415, 1.014964, 1.233630, 1.44210…
#> $ snap.200.percent.similar        <dbl> 69.32517, 42.69064, 67.86721, 69.0428…
#> $ snap.200.percent.similar.Red    <dbl> 73.19037, 59.61686, 73.47304, 87.2113…
#> $ snap.200.percent.similar.Blue   <dbl> 67.20882, 10.52940, 60.18553, 18.3012…
#> $ snap.200.percent.unhappy        <dbl> 3.84820951, 37.62781186, 0.14094433, …
#> $ snap.200.percent.unhappy.Red    <dbl> 0.00000000, 0.00000000, 0.00000000, 1…
#> $ snap.200.percent.unhappy.Blue   <dbl> 15.65657, 100.00000, 0.00000, 100.000…
#> $ snap.200.percent.clustering.Red <dbl> 1.273853, 1.014358, 1.235287, 1.45614…
#> $ snap.250.percent.similar        <dbl> 70.33764, 42.59361, 68.04761, 69.8444…
#> $ snap.250.percent.similar.Red    <dbl> 73.68131, 59.65367, 73.59369, 87.0185…
#> $ snap.250.percent.similar.Blue   <dbl> 70.705171, 9.860148, 60.230227, 18.25…
#> $ snap.250.percent.unhappy        <dbl> 3.0464992, 37.6687117, 0.0000000, 44.…
#> $ snap.250.percent.unhappy.Red    <dbl> 0.00000000, 0.00000000, 0.00000000, 1…
#> $ snap.250.percent.unhappy.Blue   <dbl> 12.1212121, 100.0000000, 0.0000000, 1…
#> $ snap.250.percent.clustering.Red <dbl> 1.282398, 1.014984, 1.237316, 1.45292…

We ran the model 5,000 times changing its three parameters: density (controls how much “free” space there is for people to move about; set between 50 and 99), radiusNeighborhood (how far people look spatially when judging their neighbors; between 1 and 5) and X..similar.wanted (the % of similar people wanted in one’s neighborhood; between 25% and 75%). Besides the random seed we collected 77 summary statistics (basic min/max/trend and snapshots for a few time series).
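Generating parameter draws like these is straightforward; a sketch with uniform priors over the ranges just described (the actual draws in the package may have been produced differently):

```r
# Drawing 5,000 random parameter sets from uniform priors over the
# ranges described above (a sketch, not how the dataset was built).
set.seed(1)
n <- 5000
priors <- data.frame(
  density            = runif(n, 50, 99),               # free space in the city
  radiusNeighborhood = sample(1:5, n, replace = TRUE), # neighborhood radius
  X..similar.wanted  = runif(n, 25, 75)                # % similar wanted
)
summary(priors)
```

Each row would then be fed to the NETLOGO model, and the summary statistics of that run appended as extra columns.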

Of course, in reality we only have one summary statistic: trend.percent.similar is 0.1.
We can try to estimate the three unknown Schelling parameters with that one summary statistic. Let’s try a simple rejection ABC:

estimation<-freelunch::fit_rejection_abc(
  # insert here the random simulations you want to generalize from
  training_runs = shelling,
  # insert here the observed summary statistics from the REAL world
  target_runs = c(0.1),
  # insert here the column names for the parameters in the training dataset
  parameter_colnames = c("density","radiusNeighborhood","X..similar.wanted"),
  # insert here the column names for the summary statistics in the training dataset
  # (the same order as you input them in target_runs!)
  summary_statistics_colnames = c("trend.percent.similar")
)
freelunch::tidy_up_estimation(estimation) %>% knitr::kable()
| run | variable           | estimate | higher_bound | lower_bound | real |
|----:|:-------------------|---------:|-------------:|------------:|:----:|
|   1 | density            |       64 |       93.525 |          51 | NA   |
|   1 | radiusNeighborhood |        4 |        5.000 |           1 | NA   |
|   1 | X..similar.wanted  |       52 |       66.000 |          34 | NA   |

Well, we do get an estimate for the three parameters, but accepting it at face value is very dangerous. Is it a good estimate?
The first alarm bell rings when we look at the confidence intervals: they are huge! Not much different from our original bounds, in fact.

In practice, however, wide CIs alone are just a hint (and they could be wrong too!). If we want to know how good our estimates are, we need to test the estimation method itself. We do this by cross-validation: taking some runs, pretending they are the real data, and seeing whether we estimate their parameters correctly.
We can do this in one line here:

abc_cv<-cross_validate_rejection_abc(
  # the full set of simulation runs to split into training and testing
  total_data  = shelling,
  # number of sub-groups to do cross-validation with
  ngroup = 5,
  parameter_colnames = c("density","radiusNeighborhood","X..similar.wanted"),
  summary_statistics_colnames = c("trend.percent.similar")
)
abc_cv$performance
#>            density radiusNeighborhood  X..similar.wanted 
#>        0.126938269       -0.009338484       -0.021497262

And the results are awful. Performance, defined as 1 minus the ratio between the root mean square error of the method (ABC here) and that of just guessing the average, goes from zero (the estimation is no better than guessing the average) to 1 (perfect estimation). A performance of zero means that both radius and %-similar-wanted are not identified at all; density’s performance is also abysmal. My rule of thumb is that things start looking somewhat identified when performance reaches 0.3; in practice, though, it’s much easier to just check visually what we have managed to estimate.
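The performance definition above fits in a few lines; here is a sketch (my own helper, not a freelunch function) checked against its two extreme cases:

```r
# Performance as defined above: 1 - RMSE(method) / RMSE(guessing the mean).
# 0 means no better than guessing the average, 1 means perfect estimation.
performance <- function(truth, estimate) {
  rmse_method   <- sqrt(mean((truth - estimate)^2))
  rmse_baseline <- sqrt(mean((truth - mean(truth))^2))
  1 - rmse_method / rmse_baseline
}
truth <- c(55, 70, 90, 60)
perfect  <- performance(truth, truth)                 # exact estimates: 1
baseline <- performance(truth, rep(mean(truth), 4))   # always the mean: 0
c(perfect = perfect, baseline = baseline)
```

Note that performance can even go negative, when the method does worse than guessing the average, which is exactly what we see for radius and %-similar-wanted above.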

The confidence intervals seem to be a bit too wide, since their coverage (the % of times the real parameter falls in the CI) exceeds the target of 95%:

abc_cv$contained
#>            density radiusNeighborhood  X..similar.wanted 
#>             0.9648             0.9952             0.9722
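Coverage itself is simple to compute by hand: the share of cross-validated runs whose interval contains the true parameter. A sketch with illustrative numbers (not the shelling data):

```r
# Coverage: the fraction of runs whose CI contains the real parameter
# (illustrative made-up values for five cross-validated runs).
truth <- c(60, 72, 55, 90, 48)
lower <- c(51, 60, 50, 70, 51)
upper <- c(93, 80, 70, 95, 66)
coverage <- mean(truth >= lower & truth <= upper)
coverage  # 4 of 5 intervals contain the truth: 0.8
```

For a well-calibrated 95% interval, coverage should be close to 0.95; much above that (as here) means the intervals are too conservative.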

and we can confirm this visually:

plot_confidence_intervals(abc_cv,limit_number_of_runs_to_plot_to = 60) +
  theme_minimal()

Red dots are the “real” values and the error bars are the CIs of our estimates. Again, it’s clear we haven’t really discovered anything: the CIs are very wide, about the size of the original priors. ABC is refusing to make a call.
Notice, however, that it does seem possible to narrow down the %-similar-wanted parameter when it is around 40%-65%. This makes sense, since that is the Schelling sweet spot (much lower and nobody moves; much higher and everybody is always dissatisfied and moves).

We can also look at point-prediction quality, which is just as dreadful:

plot_point_prediction_quality(abc_cv)