Estimating and cross-validating the Schelling segregation model with freelunch

18 February 2021

A new R package to estimate simulation models!

So I’ve been working on yet another revision of a paper on the estimation of agent-based models, and I thought it would be useful to extract all the estimation commands and make them available in an R package for others to use.
So here is the freelunch package.

It focuses only on “reference table” methods:

  1. Run the model a bunch of times, feeding it random parameters
  2. For each set of parameters you put in, collect a set of summary statistics describing what the model outputs
  3. Try to generalize this data set of input-output simulations via a regression, or run an Approximate Bayesian Computation
  4. Use that generalization to estimate the REAL parameters from the REAL summary statistics you observe (see the sketch after this list)
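For concreteness, here is a minimal sketch of that recipe in R. The run_model() function is a hypothetical stand-in for whatever simulator you are calling; the parameter ranges are the ones used later in this post:

n_runs <- 5000
# step 1: draw random parameters from the prior ranges
params <- data.frame(
  density            = runif(n_runs, 50, 99),
  radiusNeighborhood = sample(1:5, n_runs, replace = TRUE),
  X..similar.wanted  = runif(n_runs, 25, 75)
)
# step 2: one row of summary statistics per parameter set
# (run_model() is hypothetical: it runs the simulator once and returns
#  its summary statistics as a one-row data frame)
summaries <- do.call(rbind, lapply(seq_len(n_runs), function(i) run_model(params[i, ])))
# steps 3-4 then generalize from this reference table
reference_table <- cbind(params, summaries)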

It’s an extremely inefficient way of estimating models compared to search-based methods, but it also has many advantages:

  1. It produces decent confidence intervals
  2. It is quickly testable (you can figure out how good the estimates/CIs are)
  3. It’s all “post-processing”: all it needs is a csv file with a bunch of random runs, so there is no need to mess around with NETLOGO or whatever else you are using to run the model (see the snippet below)
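“Post-processing” really does mean post-processing: any table of random runs will do. For instance, assuming you exported a NETLOGO behavior-space experiment to a (hypothetical) my_random_runs.csv:

# any csv with one row per run, containing parameters and summary statistics
my_runs <- read.csv("my_random_runs.csv")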

Here I thought I’d showcase it by estimating a four-population, three-parameter Schelling model I nicked from COMSES.

Installation guide

Grab it straight from GitHub; beware that the installation may take a while (since this package really just provides an interface to call methods from better packages):

## You might need to uncomment this line on Windows/Mac
## if you don't build packages from source:
## Sys.setenv("R_REMOTES_NO_ERRORS_FROM_WARNINGS" = "true")

devtools::install_github("carrknight/freelunch")

Estimating Schelling with ONE summary statistic

We are studying the spatial distribution of housing across 4 racial groups within a city, and we notice a steady trend towards segregation. We think the Schelling segregation model could provide a good explanation for it, so we want to estimate its parameters given the data we have.
Unfortunately, the only data we have is that “similarity” (the % of neighbors who are of the same race, i.e. segregation) has increased steadily, at a linear trend of about .1 per month.
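In case you are wondering where such a number comes from, a trend like this can be read off a simple linear regression of similarity on time. A sketch, assuming a hypothetical data frame observed with columns month and percent_similar:

# the slope of a linear fit is the trend per month; here, about 0.1
real_trend <- coef(lm(percent_similar ~ month, data = observed))[["month"]]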

So we take a NETLOGO version of the famous Schelling segregation model and just run it a bunch of times with random parameters.
This package comes with those runs already done, so let’s load them!

library(tidyverse)
library(ggplot2)
library(freelunch)

data("shelling")
glimpse(shelling)
#> Rows: 5,000
#> Columns: 81
#> $ random.seed                     <dbl> 19434.69, 68105.88, 58719.91, 43160.4…
#> $ density                         <dbl> 72, 94, 55, 85, 52, 65, 95, 86, 85, 8…
#> $ X..similar.wanted               <dbl> 30, 35, 27, 61, 43, 38, 73, 39, 28, 3…
#> $ radiusNeighborhood              <dbl> 4, 5, 3, 3, 3, 4, 5, 2, 4, 4, 1, 3, 1…
#> $ mean.percent.similar            <dbl> 63.76226, 42.62576, 65.66402, 66.1955…
#> $ mean.percent.similar.Red        <dbl> 70.01694, 59.62370, 72.47819, 84.8666…
#> $ mean.percent.similar.Blue       <dbl> 54.17323, 10.57785, 52.76714, 17.5342…
#> $ mean.percent.unhappy            <dbl> 9.3662493, 37.7464349, 3.0284300, 47.…
#> $ mean.percent.unhappy.Red        <dbl> 7.911612e-02, 5.548658e-03, 5.432130e…
#> $ mean.percent.unhappy.Blue       <dbl> 32.549750, 100.000000, 13.306136, 99.…
#> $ mean.percent.clustering.Red     <dbl> 1.218620, 1.014474, 1.218561, 1.41699…
#> $ min.percent.similar             <dbl> 40.15779, 41.02683, 41.61093, 41.6127…
#> $ min.percent.similar.Red         <dbl> 57.89020, 58.80828, 59.02070, 59.7649…
#> $ min.percent.similar.Blue        <dbl> 10.181818, 9.719713, 8.983218, 9.2171…
#> $ min.percent.unhappy             <dbl> 2.3516836, 36.9734151, 0.0000000, 40.…
#> $ min.percent.unhappy.Red         <dbl> 0.0000000, 0.0000000, 0.0000000, 15.5…
#> $ min.percent.unhappy.Blue        <dbl> 10.101010, 100.000000, 0.000000, 99.0…
#> $ min.percent.clustering.Red      <dbl> 1.0075588, 1.0006003, 0.9923031, 0.99…
#> $ last.percent.similar            <dbl> 70.50973, 42.72531, 68.04761, 72.2500…
#> $ last.percent.similar.Red        <dbl> 73.86879, 59.66695, 73.59369, 88.1371…
#> $ last.percent.similar.Blue       <dbl> 70.97535, 10.76825, 60.23023, 20.7817…
#> $ last.percent.unhappy            <dbl> 2.3516836, 37.5051125, 0.0000000, 40.…
#> $ last.percent.unhappy.Red        <dbl> 0.0000000, 0.0000000, 0.0000000, 15.7…
#> $ last.percent.unhappy.Blue       <dbl> 10.101010, 100.000000, 0.000000, 100.…
#> $ last.percent.clustering.Red     <dbl> 1.285661, 1.015210, 1.237316, 1.47160…
#> $ sd.percent.similar              <dbl> 7.8630897, 0.3545979, 4.6797985, 4.60…
#> $ sd.percent.similar.Red          <dbl> 4.3909583, 0.1642589, 2.3431204, 3.90…
#> $ sd.percent.similar.Blue         <dbl> 19.8695685, 0.3512378, 13.9403901, 2.…
#> $ sd.percent.unhappy              <dbl> 8.3924392, 0.6793271, 5.7913405, 4.39…
#> $ sd.percent.unhappy.Red          <dbl> 0.133751471, 0.027814485, 0.172597010…
#> $ sd.percent.unhappy.Blue         <dbl> 26.86042555, 0.00000000, 25.05878194,…
#> $ sd.percent.clustering.Red       <dbl> 0.076423098, 0.002794801, 0.039394406…
#> $ trend.percent.similar           <dbl> 0.081336262, 0.001945294, 0.036708780…
#> $ trend.percent.similar.Red       <dbl> 0.0458888407, 0.0008668063, 0.0175192…
#> $ trend.percent.similar.Blue      <dbl> 0.2059321799, 0.0001900657, 0.1173271…
#> $ trend.percent.unhappy           <dbl> -0.083779995, -0.003628294, -0.046317…
#> $ trend.percent.unhappy.Red       <dbl> -8.743380e-04, -9.921442e-05, -9.4404…
#> $ trend.percent.unhappy.Blue      <dbl> -2.691585e-01, 3.475738e-17, -2.09609…
#> $ trend.percent.clustering.Red    <dbl> 7.986793e-04, 1.474837e-05, 2.945478e…
#> $ max.percent.similar             <dbl> 70.74361, 42.98516, 68.04761, 72.2500…
#> $ max.percent.similar.Red         <dbl> 74.18262, 59.78800, 73.59369, 88.3309…
#> $ max.percent.similar.Blue        <dbl> 72.26667, 11.66986, 60.23023, 22.5548…
#> $ max.percent.unhappy             <dbl> 39.33725, 41.18609, 33.75617, 72.7354…
#> $ max.percent.unhappy.Red         <dbl> 0.7441860, 0.2087683, 1.4218009, 54.4…
#> $ max.percent.unhappy.Blue        <dbl> 100.00000, 100.00000, 99.26471, 100.0…
#> $ max.percent.clustering.Red      <dbl> 1.291123, 1.017270, 1.237316, 1.47484…
#> $ snap.50.percent.similar         <dbl> 55.60111, 42.81253, 63.64492, 62.1832…
#> $ snap.50.percent.similar.Red     <dbl> 65.00607, 59.70926, 71.46123, 82.0816…
#> $ snap.50.percent.similar.Blue    <dbl> 29.018987, 10.540541, 42.900830, 15.2…
#> $ snap.50.percent.unhappy         <dbl> 17.42383752, 37.50511247, 5.56730092,…
#> $ snap.50.percent.unhappy.Red     <dbl> 0.00000000, 0.00000000, 0.11848341, 2…
#> $ snap.50.percent.unhappy.Blue    <dbl> 61.11111, 100.00000, 28.67647, 100.00…
#> $ snap.50.percent.clustering.Red  <dbl> 1.131408, 1.015930, 1.201463, 1.37049…
#> $ snap.100.percent.similar        <dbl> 63.10817, 42.78007, 67.20875, 65.8841…
#> $ snap.100.percent.similar.Red    <dbl> 69.04976, 59.69415, 73.23038, 85.2958…
#> $ snap.100.percent.similar.Blue   <dbl> 54.48378, 10.40620, 58.16366, 17.9016…
#> $ snap.100.percent.unhappy        <dbl> 8.711919, 37.627812, 1.409443, 48.625…
#> $ snap.100.percent.unhappy.Red    <dbl> 0.00000000, 0.00000000, 0.00000000, 1…
#> $ snap.100.percent.unhappy.Blue   <dbl> 30.303030, 100.000000, 5.147059, 100.…
#> $ snap.100.percent.clustering.Red <dbl> 1.201787, 1.015673, 1.231208, 1.42416…
#> $ snap.150.percent.similar        <dbl> 67.34460, 42.62108, 67.67686, 67.6822…
#> $ snap.150.percent.similar.Red    <dbl> 72.18845, 59.65246, 73.37447, 86.3703…
#> $ snap.150.percent.similar.Blue   <dbl> 64.45296, 10.43992, 59.82778, 19.2883…
#> $ snap.150.percent.unhappy        <dbl> 5.6654196, 37.6687117, 0.4933051, 47.…
#> $ snap.150.percent.unhappy.Red    <dbl> 0.09302326, 0.00000000, 0.00000000, 1…
#> $ snap.150.percent.unhappy.Blue   <dbl> 19.6969697, 100.0000000, 0.7352941, 1…
#> $ snap.150.percent.clustering.Red <dbl> 1.256415, 1.014964, 1.233630, 1.44210…
#> $ snap.200.percent.similar        <dbl> 69.32517, 42.69064, 67.86721, 69.0428…
#> $ snap.200.percent.similar.Red    <dbl> 73.19037, 59.61686, 73.47304, 87.2113…
#> $ snap.200.percent.similar.Blue   <dbl> 67.20882, 10.52940, 60.18553, 18.3012…
#> $ snap.200.percent.unhappy        <dbl> 3.84820951, 37.62781186, 0.14094433, …
#> $ snap.200.percent.unhappy.Red    <dbl> 0.00000000, 0.00000000, 0.00000000, 1…
#> $ snap.200.percent.unhappy.Blue   <dbl> 15.65657, 100.00000, 0.00000, 100.000…
#> $ snap.200.percent.clustering.Red <dbl> 1.273853, 1.014358, 1.235287, 1.45614…
#> $ snap.250.percent.similar        <dbl> 70.33764, 42.59361, 68.04761, 69.8444…
#> $ snap.250.percent.similar.Red    <dbl> 73.68131, 59.65367, 73.59369, 87.0185…
#> $ snap.250.percent.similar.Blue   <dbl> 70.705171, 9.860148, 60.230227, 18.25…
#> $ snap.250.percent.unhappy        <dbl> 3.0464992, 37.6687117, 0.0000000, 44.…
#> $ snap.250.percent.unhappy.Red    <dbl> 0.00000000, 0.00000000, 0.00000000, 1…
#> $ snap.250.percent.unhappy.Blue   <dbl> 12.1212121, 100.0000000, 0.0000000, 1…
#> $ snap.250.percent.clustering.Red <dbl> 1.282398, 1.014984, 1.237316, 1.45292…

We ran the model 5,000 times, changing its three parameters: density (controls how much “free” space there is for people to move about; set to between 50 and 99), radiusNeighborhood (how far people look spatially when judging their neighbors; between 1 and 5) and X..similar.wanted (the % of similar people wanted in one’s neighborhood; between 25% and 75%). Besides the random seed, we collected 77 summary statistics (basic min/max/trend and snapshots for a few time series).

Of course, in reality, as we said, we only have one summary statistic: trend.percent.similar is 0.1.
We can try to estimate the three unknown Schelling parameters with that one summary statistic. Let’s try a simple rejection ABC:
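To see what rejection ABC actually does, here is a hand-rolled sketch of the same idea (the package call below also rescales the statistics and builds confidence intervals for us; the 10% acceptance rate here is just an illustrative choice):

# distance between each simulated run and the real summary statistic
distance <- abs(shelling$trend.percent.similar - 0.1)
# accept the closest 10% of runs: they approximate the posterior
accepted <- shelling[rank(distance) <= 500, ]
quantile(accepted$density, c(0.025, 0.5, 0.975))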

estimation<-freelunch::fit_rejection_abc(
  # insert here the random simulations you want to generalize from
  training_runs = shelling,
  # insert here the observed summary statistics from the REAL world
  target_runs = c(0.1),
  # insert here the column names for the parameters in the training dataset
  parameter_colnames = c("density","radiusNeighborhood","X..similar.wanted"),
  # insert here the column names for the summary statistics in the training dataset
  # (the same order as you input them in target runs!)
  summary_statistics_colnames = c("trend.percent.similar")
)
freelunch::tidy_up_estimation(estimation) %>% knitr::kable()
| run|variable           | estimate| higher_bound| lower_bound|real |
|---:|:------------------|--------:|------------:|-----------:|:----|
|   1|density            |       64|       93.525|          51|NA   |
|   1|radiusNeighborhood |        4|        5.000|           1|NA   |
|   1|X..similar.wanted  |       52|       66.000|          34|NA   |

Well, we do get an estimate for the three parameters, but accepting it at face value is very dangerous. Is it a good estimate?
The first alarm bell starts ringing when we look at the confidence intervals: they are huge! Not much different from our original bounds, in fact.

In practice, however, wide CIs alone are just a hint (and they could be wrong too!). If we want to know how good our estimates are, we need to test the estimation method itself. We do this by cross-validation: take some runs, pretend they are the real data, and check whether we estimate their parameters correctly.
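Hand-rolled, the idea looks roughly like this sketch (the fitting step is left as a comment):

folds <- sample(rep(1:5, length.out = nrow(shelling)))  # 5 folds of ~1,000 runs
for (k in 1:5) {
  training <- shelling[folds != k, ]
  held_out <- shelling[folds == k, ]
  # ...fit on `training`, estimate the parameters of `held_out` from their
  # summary statistics, and compare against the parameters we actually fed in...
}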
The package does all of this in one line:

abc_cv<-cross_validate_rejection_abc(
  # the full set of random simulation runs (it will be split into folds)
  total_data  = shelling,
  # number of sub-groups to do cross-validation with
  ngroup = 5,
  parameter_colnames = c("density","radiusNeighborhood","X..similar.wanted"),
  summary_statistics_colnames = c("trend.percent.similar")
)
abc_cv$performance
#>            density radiusNeighborhood  X..similar.wanted 
#>        0.126938269       -0.009338484       -0.021497262

And the results are awful. Performance, defined as 1 minus the ratio between the root mean square error of the method (ABC here) and the root mean square error of just guessing the average, goes from zero (the estimation is no better than just guessing the average) to 1 (perfect estimation); it can even dip below zero when the method does worse than guessing the average. A performance of about zero means that both the radius and the %-similarity-wanted parameters are not identified at all, and density does little better. My rule of thumb is that things start looking somewhat identified when performance reaches 0.3; in practice, however, it’s much easier to check visually what we have managed to estimate.
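To be concrete, here is that performance score written out as a small function; predicted and real are vectors of estimated and true values for one parameter across the cross-validation runs:

performance <- function(predicted, real) {
  rmse_method  <- sqrt(mean((predicted - real)^2))   # error of the estimator
  rmse_average <- sqrt(mean((mean(real) - real)^2))  # error of guessing the mean
  1 - rmse_method / rmse_average
}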

The confidence intervals seem to be a bit too large, since their coverage (the % of times the real parameter is in the CI) exceeds the target of 95%:
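Coverage itself is a one-liner; a sketch, with real, lower and upper being vectors collected during cross-validation:

coverage <- function(real, lower, upper) mean(real >= lower & real <= upper)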

abc_cv$contained
#>            density radiusNeighborhood  X..similar.wanted 
#>             0.9648             0.9952             0.9722

and we can confirm this visually:

plot_confidence_intervals(abc_cv,limit_number_of_runs_to_plot_to = 60) +
  theme_minimal()

Red dots are the “real” values and the error bars are the CIs of our estimations. Again, it’s clear we haven’t really discovered anything: the CIs are very wide, about the size of the original priors. ABC is refusing to make a call.
Notice, however, that it does seem possible to narrow down the % similarity wanted parameter when it’s around 40%-65%. This makes sense, since that’s the sweet spot of Schelling’s model (much lower and nobody moves; much higher and everybody is always dissatisfied and keeps moving).

We can also look at point-prediction quality, which is just as dreadful:

plot_point_prediction_quality(abc_cv) 

We would like the points to lie on the 45-degree line, and instead they are everywhere. I think it’s clear the estimation doesn’t work here.
The model is under-identified with the data we have!

We could try something a bit fancier, like a local-linear ABC, but it ends up being no better (in fact, slightly worse!):

ll_cv<-cross_validate_loclinear_abc(total_data  = shelling,
                                    ngroup = 5,
                                    parameter_colnames = c("density","radiusNeighborhood","X..similar.wanted"),
                                    summary_statistics_colnames = c("trend.percent.similar")
)
ll_cv$performance
#>            density radiusNeighborhood  X..similar.wanted 
#>         0.12555132         0.01697889        -0.01305731

We could (and should) try other methods, but it seems that we are just unable to estimate these parameters with only one summary statistic.

Adding more summary statistics

Trend in general unhappiness

So we couldn’t quite estimate anything just looking at the segregation trend. What if we tried adding more data?

Imagine that we rummage through the real data and discover that, together with the trend in segregation, somebody ran a monthly survey asking people whether they were unhappy about their neighbors and planning to move. This nets us a second trend (trend.percent.unhappy) that we can plug in to see the effects.

So we run the cross-validation again with one more summary statistic, this time using a GAM (because we are fancy and probably listen to Radio 4 on our way to work):

gam_cv<-cross_validate_gam(shelling,ngroup=5,
                           parameter_colnames = c("density","radiusNeighborhood","X..similar.wanted"),
                           summary_statistics_colnames =
                             c(
                               #new summary statistic:
                               "trend.percent.unhappy",
                               "trend.percent.similar"))
gam_cv$performance
#>            density radiusNeighborhood  X..similar.wanted 
#>          0.1484307          0.0750796          0.1045652

Unfortunately performance is still very low, and it’s clear by looking at the plots that we are still far from properly estimating the model.

gam_cv %>% plot_confidence_intervals()

Trend in unhappiness - per population

We go back to the data and see that we can disaggregate the overall trend in unhappiness and compute it separately for two of the four sub-populations: the “red” population (the majority) and the “blue” population (one of the three minorities).

This leaves us with four summary statistics. Let’s switch to random forests for this, maybe using the “quick and dirty” option to run it quickly:

cv_rf<-cross_validate_random_forest(shelling,ngroup=5, fast=TRUE,
                                     parameter_colnames = 
                                       c("density","radiusNeighborhood","X..similar.wanted"),
                                     summary_statistics_colnames =
                                       c("trend.percent.unhappy",
                                         ##new summary statistics here:
                                         "trend.percent.unhappy.Red",
                                         "trend.percent.unhappy.Blue",
                                         "trend.percent.similar"))

cv_rf$performance
#>            density radiusNeighborhood  X..similar.wanted 
#>          0.3822976          0.5624102          0.8088432

That performance is much better, particularly for “% of similarity wanted”! And the visuals confirm it.

cv_rf %>% plot_confidence_intervals() + theme_minimal(20)

cv_rf %>% plot_point_prediction_quality() 

Now we can estimate!

So now that we have some confidence in our ability to estimate the model given the data we have, we can plug in the “real” values. We bring out the random forest again, this time feeding it the observed summary statistics:

estimation<-fit_random_forest(training_runs = shelling, fast=TRUE,
                              target_runs = 
                                ### plugging in the "real" data:
                                c(
                                  -0.02,
                                  -0.05,
                                  -0.02,
                                  0.1),
                              parameter_colnames = c("density","radiusNeighborhood","X..similar.wanted"),
                              summary_statistics_colnames =
                                c("trend.percent.unhappy",
                                  ## same order as the target_runs values above:
                                  "trend.percent.unhappy.Red",
                                  "trend.percent.unhappy.Blue",
                                  "trend.percent.similar"))
tidy_up_estimation(estimation) %>% knitr::kable()
| run|variable           | estimate| higher_bound| lower_bound|real |
|---:|:------------------|--------:|------------:|-----------:|:----|
|   1|density            | 66.79780|    84.045475|   48.779854|NA   |
|   1|radiusNeighborhood |  1.97510|     3.284475|    1.013845|NA   |
|   1|X..similar.wanted  | 62.32203|    69.439441|   55.278477|NA   |

So it is still quite difficult to narrow down the neighborhood radius and the density, but at least we know that “% of similarity wanted” is somewhere between 55% and 69%: scientific proof that people are quite racist. Time to tweet the results and harvest the likes.

What next?

Well, this really depends. We now have a parametrized model: we could go back to the simulator, plug in the estimated parameters, check its dynamics, and compute alternative histories, policies and so on.
It might be that it’s imperative to get radiusNeighborhood right, in which case we need to come back here, try to narrow down its range, and improve its estimation performance. But at least this package can help.