Way back when this century was young, the navigation group in Oxford published a series of papers demonstrating that pigeons, when repeatedly released from the same site, would learn to follow the same route back to the home loft each time.
If a pigeon is learning and following a route, this ought to make its flight patterns predictable. If those flights are getting more and more predictable, we should be able to observe that by using a model to predict the flights with increasing accuracy. In other words, we should have a model which gives the probability of a flight path, and that probability should get higher as our predictions get better.
In the last post we saw how to assign probabilities to individual flight paths using a Gaussian process (GP). The precise probability of a given flight path depended on the mean, m, and covariance, S, of that GP. I told you that the covariance dictated how likely the flight path was to be either smooth or wiggly, and we used the straight line between release point and home loft to create the mean. For convenience I'll write down the resulting probability as:
p(x| m, S) = GP(x; m, S)
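To make that formula concrete, here is a minimal sketch of evaluating log GP(x; m, S) for a path sampled at a handful of time points. Everything specific in it is an assumption for illustration: the path is reduced to a single sideways offset from the straight line (so m is all zeros), S is a squared-exponential covariance with made-up parameters, and the function names `sq_exp_cov` and `gp_log_prob` are hypothetical.

```python
import numpy as np

def sq_exp_cov(t, amplitude=1.0, length_scale=0.2, jitter=1e-6):
    # Hypothetical squared-exponential covariance; the small jitter on the
    # diagonal keeps the matrix numerically positive definite.
    d = t[:, None] - t[None, :]
    return amplitude**2 * np.exp(-0.5 * (d / length_scale) ** 2) + jitter * np.eye(len(t))

def gp_log_prob(x, m, S):
    # log GP(x; m, S) at the sampled times: a multivariate Gaussian log density,
    # computed through a Cholesky factor of S.
    L = np.linalg.cholesky(S)
    z = np.linalg.solve(L, x - m)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (z @ z + log_det + len(x) * np.log(2.0 * np.pi))

t = np.linspace(0.0, 1.0, 20)         # normalised time along the flight
m = np.zeros_like(t)                  # straight line = zero sideways offset
S = sq_exp_cov(t)

smooth = 0.5 * np.sin(np.pi * t)      # one gentle bend away from the line
wiggly = 0.5 * np.sin(9 * np.pi * t)  # same amplitude, many oscillations

log_p_smooth = gp_log_prob(smooth, m, S)
log_p_wiggly = gp_log_prob(wiggly, m, S)
print(log_p_smooth, log_p_wiggly)
```

With this choice of S the smooth path comes out far more probable than the wiggly one, which is the sense in which the covariance controls how smooth or wiggly we expect a path to be.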
Now, the reason we chose the straight line path to be the mean was that if we only look at a single path, and we have never seen this particular bird fly before, there is no reason to assume it will fly to one side or the other of this most efficient route. We don't expect the flight path to be perfectly straight, but we don't know beforehand in which direction it will go.
Imagine instead that we had already seen the flight paths below.
Now we should have a very good idea where the next path is going to be: somewhere close to the paths we have already seen. It looks like the pigeon is following a particular route home every time, so it's unlikely to suddenly fly directly south from the release point next time. Obviously it doesn't fly exactly the same path every time, but each new flight path is like an imperfect attempt to fly some memorised route.
Let's imagine that we could look into the mind of the pigeon and retrieve exactly what its memorised route looks like. We can call this route h (for 'habitual'). Then we might replace the earlier straight line mean path with the one we now know the bird is trying to fly:
p(x | h, S) = GP(x; h, S)
(I'm going to assume for simplicity that we know what S is, but in practice we would infer it from the data.)
What's more, if we want to find the probability of several flight paths by the same bird, each an attempt to replicate h, we can simply multiply the probabilities of the individual paths together, because each one is independent if we know h.
p(x1, x2, ..., xn | h, S) = GP(x1; h, S) × GP(x2; h, S) × ... × GP(xn; h, S)
Hang on! Surely those flight paths aren't really independent?! After all, they all look the same. Yes! But the reason they look the same is that they are all attempts to replicate h. The way each path varies around h is independent. All the shared structure in the paths is located in h.
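A quick simulation can make this distinction tangible. The sketch below is illustrative only: it uses a made-up squared-exponential S, treats each path as a one-dimensional sideways offset, and draws a fresh habitual route h from GP(m, S) for each of many simulated birds. It then compares two flights by the same bird, first as raw paths and then as deviations from that bird's h.

```python
import numpy as np

rng = np.random.default_rng(0)

t = np.linspace(0.0, 1.0, 15)
d = t[:, None] - t[None, :]
# Hypothetical covariance, with a little jitter for numerical stability.
S = np.exp(-0.5 * (d / 0.3) ** 2) + 1e-6 * np.eye(len(t))
L = np.linalg.cholesky(S)
m = np.zeros_like(t)

def draw(mean, size):
    # Draw `size` samples from GP(mean, S); `mean` may hold one row per sample.
    return mean + rng.standard_normal((size, len(t))) @ L.T

n_birds = 5000
h = draw(m, n_birds)      # one habitual route per simulated bird
x1 = draw(h, n_birds)     # two flights per bird, each an attempt at that bird's h
x2 = draw(h, n_birds)

mid = len(t) // 2         # compare sideways offsets at the middle of the flight
raw_corr = np.corrcoef(x1[:, mid], x2[:, mid])[0, 1]
resid_corr = np.corrcoef(x1[:, mid] - h[:, mid], x2[:, mid] - h[:, mid])[0, 1]
print(raw_corr, resid_corr)
```

Under these assumptions the raw paths should be clearly correlated (theoretically 0.5, because half of each path's variance comes from the shared h), while the deviations around h should be essentially uncorrelated.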
Ok, that's nice, but the problem is that we don't know what h is. All we can see are a few paths that look a bit like h. But never fear - Bayes is here...we can use those flight paths we have actually seen to infer what h is. Recall Bayes' rule, which allows us to reverse the order of the conditional probability:
p(h | x1, x2, ..., xn, S) = p(x1, x2, ..., xn | h, S) × p(h | S) / p(x1, x2, ..., xn | S)
But we seem to be creating more trouble for ourselves. Now we need to know two more things, p(h | S) and p(x1, x2, ...,xn | S). Are we digging a hole for ourselves?
No! The first of these terms is a prior distribution. It's how likely we think any particular habitual route would be before we see any real paths. So we need to place a probability distribution over a path that could lie anywhere between the release point and the home loft. That's exactly what we learned how to do in the last post! Before we see any real paths there's no reason to expect the habitual path to be on either side of the straight line, so the probability of h is exactly like that of a single path on its own, with the straight line as a mean.
p(h | S) = GP(h; m, S)
The second term is the joint probability of the real paths, if we don't know what h is. This can be calculated by integrating over all possible values of h.
∫ p(x1, x2, ..., xn | h, S) × p(h | S) dh
and this is where the theory of Gaussian processes really helps us. Integrals like this are really easy to do (using a few matrix rules...easy is a relative term!) when everything is Gaussian...
∫ p(x1, x2, ..., xn | h, S) p(h | S) dh = ∫ GP(x1; h, S) GP(x2; h, S) ... GP(xn; h, S) GP(h; m, S) dh
= GP ([x1, x2, ...,xn], [m,m,...,m], Σ)
where those square brackets indicate that we're concatenating the n paths and n copies of the vector m. We have a big new covariance matrix, Σ, which is generated from S. If you want the mathematical details of how to do that, I would suggest reading them in this paper (Open access), where it's all properly formatted without the restrictions of html. Here we'll just assume we know the matrix rules for multiplying Gaussian distributions together - check out Appendix A of my thesis if you're interested.
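For a feel of what Σ looks like, here is a small sketch under this post's simplification that the same S describes both the prior on h and the variation of each path around h (the paper may parameterise things differently). Writing each path as x_i = h + e_i, with h drawn from GP(m, S) and each e_i drawn independently from GP(0, S), two different paths share only h, so their covariance is S, while a path's covariance with itself is 2S. The function name `joint_covariance` is hypothetical.

```python
import numpy as np

def joint_covariance(S, n):
    # Block matrix with 2S on the diagonal blocks and S on the off-diagonal
    # blocks, matching the concatenation order [x1, x2, ..., xn].
    return np.kron(np.eye(n) + np.ones((n, n)), S)

t = np.linspace(0.0, 1.0, 4)
d = t[:, None] - t[None, :]
S = np.exp(-0.5 * (d / 0.3) ** 2) + 1e-3 * np.eye(len(t))  # made-up covariance

Sigma = joint_covariance(S, n=3)
T = len(t)
print(Sigma.shape)  # (12, 12)
```

The off-diagonal blocks are what carry the information from one path to the next: without them, seeing earlier flights would tell us nothing about later ones.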
The upshot of all this is that we can calculate a probability distribution, p(h | x1, x2, ...,xn, S), which tells us how likely any given habitual route h is, based on the flight paths we've already seen. Does it work? Well, look at the picture below, showing a set of flight paths from two birds, and the distribution (mean + variance) of the inferred habitual routes. The faint black lines are the flight paths, recorded from GPS. The thick black lines are the 'best guess' of the habitual routes, and the dashed red lines indicate how uncertain these are. The dashed black lines indicate where most future flight paths are expected to lie.
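As a rough illustration of that calculation, the sketch below applies the standard Gaussian update for an unknown mean to some simulated paths. It is not the code behind the figure: the paths are one-dimensional sideways offsets, S and the 'true' route are made up, and `posterior_habitual_route` is a hypothetical name. It is written for a general prior covariance and path-variation covariance, then called with the same S for both, as in this post.

```python
import numpy as np

def posterior_habitual_route(paths, m, S_prior, S_noise):
    # Conjugate Gaussian update for h ~ N(m, S_prior), x_i | h ~ N(h, S_noise):
    # precisions add, and the mean is the precision-weighted combination.
    n = paths.shape[0]
    P_prior = np.linalg.inv(S_prior)
    P_noise = np.linalg.inv(S_noise)
    cov = np.linalg.inv(P_prior + n * P_noise)
    mean = cov @ (P_prior @ m + P_noise @ paths.sum(axis=0))
    return mean, cov

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 15)
d = t[:, None] - t[None, :]
S = 0.2**2 * (np.exp(-0.5 * (d / 0.3) ** 2) + 1e-3 * np.eye(len(t)))  # made up
m = np.zeros_like(t)                   # straight line = zero sideways offset

h_true = 0.5 * np.sin(np.pi * t)       # made-up habitual route
L = np.linalg.cholesky(S)
paths = h_true + rng.standard_normal((5, len(t))) @ L.T   # 5 simulated flights

h_mean, h_cov = posterior_habitual_route(paths, m, S, S)
n = paths.shape[0]
print(np.abs(h_mean - h_true).max())
```

When the same S is used in both places, the update collapses to something very simple: the best guess for h is (m + x1 + ... + xn) / (n + 1), and its covariance is S / (n + 1). In other words, the straight line counts like one extra observed path, and the uncertainty about h shrinks as more flights are seen.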
If we can infer what the habitual route is, we should then be able to do exactly what I suggested at the top of this post, and make some predictions about where future flight paths will be, and see if these become more accurate as the birds learn their routes. In fact, we have already done everything we need. We calculated the joint probability of n paths, assuming that we didn't know the habitual route.
p(x1, x2, ..., xn | S) = GP([x1, x2, ..., xn], [m, m, ..., m], Σ)
If we want to calculate how probable path xn is, based on the previous n-1 paths, we simply calculate the joint probability of x1, x2, ..., xn and of x1, x2, ..., xn-1, and divide the first by the second:
p(xn | x1, x2, ..., xn-1, S) = p(x1, x2, ..., xn | S) / p(x1, x2, ..., xn-1 | S)
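In log space that ratio becomes a difference of two joint log probabilities. The sketch below shows one way this could be computed, again assuming one-dimensional paths, a made-up S, and the same S for the prior on h and for the variation around h, which gives Σ a simple block form. The function names are hypothetical. As a sanity check it also computes the same quantity directly from the Gaussian predictive distribution implied by those assumptions.

```python
import numpy as np

def log_gauss(x, mean, cov):
    # Multivariate Gaussian log density via a Cholesky factor.
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, x - mean)
    return -0.5 * (z @ z + 2.0 * np.sum(np.log(np.diag(L))) + len(x) * np.log(2.0 * np.pi))

def log_joint(paths, m, S):
    # log p(x1, ..., xn | S) with h integrated out. Assumes Sigma has 2S on
    # its diagonal blocks and S elsewhere (same S for prior and variation).
    n = paths.shape[0]
    Sigma = np.kron(np.eye(n) + np.ones((n, n)), S)
    return log_gauss(paths.ravel(), np.tile(m, n), Sigma)

def log_predictive(paths, m, S):
    # log p(xn | x1, ..., xn-1, S); needs at least two paths.
    return log_joint(paths, m, S) - log_joint(paths[:-1], m, S)

rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 15)
d = t[:, None] - t[None, :]
S = 0.2**2 * (np.exp(-0.5 * (d / 0.3) ** 2) + 1e-3 * np.eye(len(t)))  # made up
m = np.zeros_like(t)
h_true = 0.5 * np.sin(np.pi * t)       # made-up habitual route
paths = h_true + rng.standard_normal((4, len(t))) @ np.linalg.cholesky(S).T

lp = log_predictive(paths, m, S)

# Cross-check: condition on the first 3 paths, then predict the 4th directly.
n_prev = paths.shape[0] - 1
pred_mean = (m + paths[:-1].sum(axis=0)) / (n_prev + 1)
pred_cov = S / (n_prev + 1) + S
lp_direct = log_gauss(paths[-1], pred_mean, pred_cov)
print(lp, lp_direct)
```

To score a flight using only the two flights immediately before it, one could pass just those three paths, for example `log_predictive(paths[k-2:k+1], m, S)` for flight index k of 2 or more.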
So let's test it out. In the experiments done in Oxford the typical procedure was to release the same bird 20 times from the same spot. What happens if we calculate how likely each of these flight paths is, based on the 2 flights immediately before it?
That graph shows the (log) probability of the next path becoming higher over time - the pigeons are becoming more predictable, just as we hoped! The point where the curve crosses zero on the y-axis is where the paths become more predictable than if we just guessed wildly without seeing any previous flights. Therefore we can say that after ~10 flights the birds are more predictable than random - they have learnt their routes.
This demonstration of increasing predictability is a nice alternative way of seeing the route learning that was previously shown by measuring the average distance between successive paths, but it's not immediately clear why it should be any more useful. In the next post we'll see how we can see not only that the route is being learnt, but where it is being learnt, to identify where the landmarks the pigeons use to navigate are and what they might be.