# 1 The Model

## 1.1 Fishers

Fishers are simple finite state automata. A fisher is either at port, moving to or from his fishing spot (using the standard A* pathfinder), or fishing. Each fisher has a boat (with a given speed, mileage and hold size), a fishing gear (with a given catchability and gas consumption per hour deployed), a bank balance and a port he belongs to.
Finally, a fisher has a utility function to judge his status and a set of friends he can communicate with.
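This three-state cycle can be sketched as a minimal finite-state machine. The state names and the transition helper below are illustrative, not identifiers from the model's code:

```python
from enum import Enum, auto

class FisherState(Enum):
    # Illustrative names for the three states described in the text.
    AT_PORT = auto()
    MOVING = auto()
    FISHING = auto()

def next_state(state, at_destination, trip_done):
    """One step of the fisher's cycle (a simplified sketch).

    at_destination: the pathfinder has reached the chosen spot (or port).
    trip_done: the fisher has decided the trip is over.
    """
    if state is FisherState.AT_PORT:
        return FisherState.MOVING  # the fisher decided to go out
    if state is FisherState.MOVING:
        return FisherState.FISHING if at_destination else FisherState.MOVING
    # FISHING: keep fishing until the fisher decides to head back
    return FisherState.MOVING if trip_done else FisherState.FISHING
```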

A fisher has to make three decisions repeatedly: whether to go out fishing, where to go and when to come back. Fishers need to make these decisions without precise knowledge of the world's laws and states beyond what they can glean from past experience. Knowledge decays rapidly, however, as biomass, quotas and profit opportunities are simultaneously exploited by competitors. Additional decisions may be possible in certain situations: the fisher might need to choose his gear or his reservation price for additional catch quotas.

We model fishers’ decisions here as separate bandit problems. The “multi-armed bandit problem” is a framework to study the exploration–exploitation trade-off faced when repeatedly choosing among a finite set of options (see the monograph on the subject by Bubeck and Cesa-Bianchi 2012 as a reference for both the mathematical problem and common algorithms to solve it; see Kuleshov and Precup 2014 for a gentler introduction). For example, when choosing where to fish, one either goes to the best known spot (exploitation) or tows at a new, promising location (exploration). Often fishers face bandit problems that are adversarial (as biomass and profitability are affected by natural processes and other fishers), contextual (as information about one area might provide clues about neighbouring locations) and vast: a map gridded into 50 rows and 50 columns forces fishers to choose 1 option out of 2,500.

In this model, fishers deal with bandit problems through a slightly modified $$\epsilon$$-greedy algorithm. Fishers explore with probability $$\epsilon$$ and otherwise exploit. Exploiting means repeating the last best choice. Exploring means choosing a new option in the neighbourhood of the last one. Geographical exploration means picking a random cell from the $$\delta$$-sized von Neumann neighbourhood of the current best. Numerical exploration means shocking the current best by random noise $$\delta \sim U[\text{Minimum},\text{Maximum}]$$ (these are just variants of stochastic hill-climbing; see chapter 4 of Russell and Norvig 2010). If the explored choice improves utility, it becomes the current best. The following pseudo-code summarizes this approach:

```
given current_best
with probability epsilon, explore:
    choose new_option in neighborhood of current_best
    if utility(new_option) > utility(current_best):
        current_best = new_option
else exploit:
    choose current_best
```
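As a concrete sketch, one iteration of this rule for a numerical decision might look as follows in Python. The utility function, noise bounds and the "hours at sea" interpretation are illustrative assumptions, not the model's calibration:

```python
import random

def epsilon_greedy_step(current_best, utility, epsilon, minimum, maximum, rng):
    """One iteration of the modified epsilon-greedy rule for a numerical
    decision: with probability epsilon, shock the current best with
    uniform noise and keep the shock only if it improves utility;
    otherwise exploit by repeating the current best unchanged."""
    if rng.random() < epsilon:  # explore
        candidate = current_best + rng.uniform(minimum, maximum)
        if utility(candidate) > utility(current_best):
            return candidate
    return current_best  # exploit

# Toy example: the (unknown to the agent) best value of the decision
# is 10.0, e.g. hours spent at sea; repeated steps climb towards it.
utility = lambda x: -(x - 10.0) ** 2
best, rng = 0.0, random.Random(0)
for _ in range(2000):
    best = epsilon_greedy_step(best, utility, epsilon=0.2,
                               minimum=-1.0, maximum=1.0, rng=rng)
```

Because a shock is kept only when it improves utility, repeated iterations behave as stochastic hill-climbing and `best` drifts towards the optimum.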

We show a geographical example below. The dark grey patch maximises profits, but the fisher doesn't know it. He initially tows at a random spot and from time to time explores around it, eventually finding the best area.
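The geographical variant can be sketched the same way on a toy grid. The grid size, the peaked profit field and all parameters below are illustrative assumptions, not the paper's setup:

```python
import random

def von_neumann_neighborhood(cell, delta, size):
    """All grid cells within Manhattan distance delta, clipped to the map."""
    x, y = cell
    return [(i, j)
            for i in range(max(0, x - delta), min(size, x + delta + 1))
            for j in range(max(0, y - delta), min(size, y + delta + 1))
            if abs(i - x) + abs(j - y) <= delta and (i, j) != cell]

def explore_grid(profit, steps, epsilon, rng, delta=1, size=50):
    """Modified epsilon-greedy search over a gridded map: start at a
    random tow, occasionally try a neighbouring cell, and adopt it
    only if it yields higher profit."""
    best = (rng.randrange(size), rng.randrange(size))  # random first tow
    for _ in range(steps):
        if rng.random() < epsilon:  # explore near the current best
            candidate = rng.choice(von_neumann_neighborhood(best, delta, size))
            if profit(candidate) > profit(best):
                best = candidate
        # otherwise exploit: tow again at the current best cell
    return best

# Toy profit field peaked at cell (30, 30), unknown to the fisher.
profit = lambda c: -(abs(c[0] - 30) + abs(c[1] - 30))
found = explore_grid(profit, steps=2000, epsilon=0.5, rng=random.Random(42))
```

Since every accepted exploration strictly improves profit, the search never leaves the peak once it reaches it, mirroring the fisher settling on the dark grey patch.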