For our purposes, a pigeon flight path consists of a number of recorded 'x' and 'y' co-ordinates from a Global Positioning System (GPS, don't confuse the two!) recorder, each with a time stamp 't'.

For the sake of simplicity, let's imagine that any such path begins at time $t=1$ and ends at time $t=100$, with 100 recorded points equally spaced in time in between (this isn't strictly true, but it won't make any real difference to understanding this). How can we assign a probability to this path?

What we do is claim that the 100 recorded 'x' co-ordinates are a sample from a 100-dimensional multivariate Normal distribution, $N$, with some mean vector, $\mathbf{m}$, and covariance matrix, $\mathbf{S}$:

$$p([x_1, x_2, \ldots, x_{100}]) = N([x_1, x_2, \ldots, x_{100}]; \mathbf{m}, \mathbf{S})$$

*(NB: the 'y' co-ordinates will have their own distribution, but we can get away with just considering the 'x's for now; we'll worry about the 'y's a bit later.)*

Now, a 100-dimensional distribution sounds a lot scarier than it actually is. All this is telling us is that these 100 recorded locations are connected, *e.g.* $x_6$ is likely to be very close to $x_5$, since the pigeon does not have time to move very far between $t=5$ and $t=6$. Conversely, the connection between $x_5$ and $x_{90}$ will be much weaker, since the bird is free to move a large distance during that time. The Normal distribution provides a convenient tool for assigning probabilities to large numbers of correlated variables, and it's mathematically easy to deal with (as we'll see as we go further).

So what are $\mathbf{m}$ and $\mathbf{S}$? The mean vector, $\mathbf{m}$, is quite simple. It is where we "expect" the bird to be at a given time. Since we know where the bird starts and finishes, we can expect that $x_1$ will be at the release point and $x_{100}$ will be at the home loft. Without any other information it is reasonable to assume that the other 98 points should be spaced equally along the straight line between the release point and home. Of course, they almost certainly won't actually be exactly on this line, but there is no reason for us to believe the bird will show a preference to fly one way or another before we see any data. In the picture below the thick black line indicates the locations of $\mathbf{m}$.
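As a minimal sketch of this mean vector, the snippet below spaces 100 points equally along the straight line from release to home using NumPy. The co-ordinates and the names `straight_line_mean`, `release` and `home` are made up for illustration and do not come from any real flight.

```python
import numpy as np

def straight_line_mean(start, end, n_points=100):
    """Mean vector: n_points equally spaced values from start to end."""
    return np.linspace(start, end, n_points)

# Hypothetical co-ordinates in arbitrary units (an assumption, not real data).
release = (0.0, 0.0)
home = (10.0, 5.0)

m_x = straight_line_mean(release[0], home[0])  # expected x at t = 1..100
m_y = straight_line_mean(release[1], home[1])  # expected y at t = 1..100
```

The first and last entries sit exactly on the release point and the home loft, and the 98 points in between are evenly spaced.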

The covariance matrix, $\mathbf{S}$, specifies two things. Firstly, the diagonal entries, such as $S_{ii}$, specify how much the values of $x_i$ are likely to differ from the expected value given by the mean, $m_i$. The other entries, $S_{ij}$, indicate how strongly connected the values of $x_i$ and $x_j$ are. High values of $S_{ij}$ mean that $x_i$ and $x_j$ will be strongly correlated. If $S_{ij}$ is zero then there is no correlation between $x_i$ and $x_j$.

We don't want to have to specify a correlation between every pair of points individually. Instead we construct the matrix $\mathbf{S}$ using a *covariance function* $k(i, j)$, which depends on the difference between $i$ and $j$, *e.g.*

$$S_{ij} = k(i, j) = k_0 \exp\left(-\frac{(i-j)^2}{L}\right)$$

With this function the correlation between $x_i$ and $x_j$ gets weaker as the difference $|i-j|$ gets larger. The parameter $L$ determines how quickly this happens. If $L$ is large then correlations will persist over longer separations between points. If $L$ is very small then correlations will almost disappear after just a few time steps. If the correlations between points persist for long periods of time then the path will be very smooth, since any points close to each other in time must also be close in space. Equally, if $L$ is small then the path can be much more 'wiggly' and the bird can change its position quickly.

$k_0$ tells us how uncertain the path is. If $k_0$ were to be zero then all of the entries of $\mathbf{S}$ would be zero and the path would be forced to lie along the mean: there would be no uncertainty. Large values of $k_0$ mean that any path can be quite far from the straight line. The plot below shows $k(i, j)$ as a function of $dt = |i-j|$, using different values of $L$ (the *Input Scale*), with $k_0$ set to 1.

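To get a feel for how the input scale controls the decay, here is a small sketch that evaluates the covariance function at a range of separations. The three values of $L$ are illustrative guesses, not necessarily the ones used for the plot, and the names `k`, `input_scales` and `curves` are my own.

```python
import numpy as np

def k(i, j, k0=1.0, L=100.0):
    """Covariance function k(i, j) = k0 * exp(-(i - j)^2 / L)."""
    return k0 * np.exp(-((i - j) ** 2) / L)

dt = np.arange(0, 50)                  # separations |i - j| in time steps
input_scales = [10.0, 100.0, 1000.0]   # assumed values of L, for illustration

# One curve per input scale, with k0 fixed at 1.
curves = {L: k(dt, 0, k0=1.0, L=L) for L in input_scales}

for L, values in curves.items():
    print(f"L = {L:6.0f}: k at dt = 5 is {values[5]:.3f}")
```

Every curve starts at $k_0$ when $dt = 0$ and falls towards zero, and the larger the value of $L$, the more slowly it falls.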
By applying the function $k(i, j)$ to every pair of points we can construct the full matrix $\mathbf{S}$, which will typically look like the example below:

The values of $\mathbf{S}$ peak along the main diagonal and decay as you move away from it. The width of the central red band shows how strongly correlations persist over time. Here points are correlated when they are within about 20-30 time steps of each other.

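Building the matrix takes only a few lines with NumPy broadcasting. In this sketch the value $L = 200$ is my own guess, picked so that correlations fade over a few tens of time steps; `build_covariance` is a hypothetical helper name.

```python
import numpy as np

def build_covariance(n_points=100, k0=1.0, L=200.0):
    """Fill S[i, j] = k0 * exp(-(i - j)^2 / L) for all pairs of time steps."""
    t = np.arange(1, n_points + 1)
    diff = t[:, None] - t[None, :]   # (i - j) for every pair, via broadcasting
    return k0 * np.exp(-(diff ** 2) / L)

S = build_covariance()

# Entries are largest on the main diagonal and shrink with distance from it.
print(S[0, 0], S[0, 10], S[0, 30])
```

The result is symmetric, has $k_0$ all along the diagonal, and each row decays on either side of it.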
So, we can get the probability of any path of 100 points, given only a mean and a covariance matrix. The mean, as we saw, is specified simply by knowing where the bird starts and finishes. The covariance matrix is specified by only 2 parameters, $k_0$ and $L$. So, the probability of the x co-ordinates (writing $\mathbf{x}$ for $[x_1, x_2, \ldots, x_{100}]$) depends only on these two parameters, as well as knowing the start and finish, which we'll assume are always known:

$$p(\mathbf{x} \mid k_0, L) = N(\mathbf{x}; \mathbf{m}, \mathbf{S}(k_0, L))$$
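Here is one way this probability could be evaluated in practice, working with its logarithm to avoid underflow. This is a sketch under assumptions: the two example paths are made up, the parameter values are arbitrary, and a small "jitter" term is added to the diagonal of the covariance purely for numerical stability (it is not part of the model described here). Because of the jitter, only the comparison between paths is meaningful, not the absolute numbers.

```python
import numpy as np

def build_covariance(n_points, k0, L, jitter=1e-6):
    """S[i, j] = k0 * exp(-(i - j)^2 / L), plus a small diagonal jitter."""
    t = np.arange(1, n_points + 1)
    diff = t[:, None] - t[None, :]
    S = k0 * np.exp(-(diff ** 2) / L)
    # Assumption: jitter is for numerical stability only; it is not in the post.
    return S + jitter * k0 * np.eye(n_points)

def log_likelihood(x, m, k0, L):
    """log p(x | k0, L) = log N(x; m, S(k0, L)), via a Cholesky factor of S."""
    S = build_covariance(len(x), k0, L)
    chol = np.linalg.cholesky(S)                   # S = chol @ chol.T
    z = np.linalg.solve(chol, x - m)               # z @ z = (x-m)^T S^-1 (x-m)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))  # log |S|
    return -0.5 * (z @ z + log_det + len(x) * np.log(2.0 * np.pi))

# Made-up example paths around a straight-line mean (not real pigeon data).
t = np.arange(1, 101)
m = np.linspace(0.0, 10.0, 100)
x_smooth = m + 0.5 * np.sin(np.pi * (t - 1) / 99)  # one gentle bulge
x_wiggly = m + 0.5 * np.sin(2.0 * t)               # rapid zig-zag

ll_smooth = log_likelihood(x_smooth, m, k0=1.0, L=200.0)
ll_wiggly = log_likelihood(x_wiggly, m, k0=1.0, L=200.0)
print(ll_smooth > ll_wiggly)
```

Both paths stray the same maximum distance from the mean, but with a fairly large $L$ the smooth one should come out with the higher log-probability.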

We can take this further and either find the optimal values of $k_0$ and $L$ or, even better, sum over our uncertainty by using an appropriate prior distribution that expresses how likely we think different values of these parameters are (see the post on Bayesianism for more details). This gives us a probability for the path, independent of any particular choice of parameters:

$$p(\mathbf{x}) = \iint p(\mathbf{x} \mid k_0, L)\, p(k_0)\, p(L)\, dk_0\, dL = \iint N(\mathbf{x}; \mathbf{m}, \mathbf{S}(k_0, L))\, p(k_0)\, p(L)\, dk_0\, dL$$
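A crude way to approximate this double integral is to replace it with an average over a grid of parameter values. The sketch below assumes independent uniform priors over a handful of hand-picked values of $k_0$ and $L$, which is my simplification rather than the priors actually used; the example path, the grid values, the jitter term and all function names are likewise hypothetical.

```python
import numpy as np

def log_likelihood(x, m, k0, L, jitter=1e-6):
    """log N(x; m, S(k0, L)) with S[i, j] = k0 * exp(-(i - j)^2 / L)."""
    n = len(x)
    t = np.arange(1, n + 1)
    diff = t[:, None] - t[None, :]
    # Assumption: diagonal jitter added for numerical stability only.
    S = k0 * np.exp(-(diff ** 2) / L) + jitter * k0 * np.eye(n)
    chol = np.linalg.cholesky(S)
    z = np.linalg.solve(chol, x - m)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (z @ z + log_det + n * np.log(2.0 * np.pi))

def log_marginal(x, m, k0_grid, L_grid):
    """Approximate log p(x), assuming uniform priors over the grid values."""
    log_liks = np.array([[log_likelihood(x, m, k0, L) for L in L_grid]
                         for k0 in k0_grid])
    log_prior = -np.log(log_liks.size)   # equal weight p(k0)p(L) per grid cell
    a = log_liks + log_prior
    a_max = a.max()
    return a_max + np.log(np.sum(np.exp(a - a_max)))  # stable log-sum-exp

# Made-up path and parameter grids (assumptions for illustration only).
t = np.arange(1, 101)
m = np.linspace(0.0, 10.0, 100)
x = m + 0.5 * np.sin(np.pi * (t - 1) / 99)
k0_grid = [0.25, 1.0, 4.0]
L_grid = [50.0, 200.0, 800.0]

log_p_x = log_marginal(x, m, k0_grid, L_grid)
print(log_p_x)
```

Because the grid weights sum to one, the result is a weighted average of the individual likelihoods, so it always lies between the worst and best single-parameter values. A finer grid or a proper numerical integration scheme would be needed for anything beyond illustration.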

Now, remember those y co-ordinates we removed? We can apply exactly the same analysis as we've done here for the x co-ordinates, but for the y co-ordinates instead, with their own mean (derived again from the straight line path) and covariance (the bird may vary more along x or y axes). Not knowing anything in advance about how the bird's path will vary around the straight line we can treat the x and y co-ordinates as independent (once the mean path is accounted for). Therefore we can get the probability of the whole path simply by multiplying the two probabilities for both sets of co-ordinates.

$$p(\text{path}) = p(\mathbf{x})\, p(\mathbf{y})$$

So that's how we go about assigning a probability to a path. This probability will reflect our instincts about how 'likely' a path is: paths that lie close to the straight line will be more likely than ones that go off in some bizarre direction, and paths that are excessively 'wiggly' will have a low probability. Nice smooth flight paths in the vague vicinity of the straight line are what we expect a flying animal that cares about energy efficiency to produce.

This might all seem a little dry and you may be wondering exactly what we gain by doing this. For now, I'm going to have to ask you to trust me. In the next few posts we'll see how the simple act of matching paths to probabilities gives us some exciting analytical power.