Tuesday, November 6, 2018

Noise and collective behaviour

"You should never do anything random!", Michael Osborne told the room at large for the umpteenth time during my PhD. Mike, now an Associate Professor at Oxford, is still telling anyone who will listen about the evils of using random numbers in machine learning algorithms. He contends that for any algorithm that makes decisions at random, there is a better algorithm that makes those decisions according to some non-random criterion.
Seeing this debate erupt sporadically over a decade or so has burned the issue of random decision making deep into my psyche, where it has lurked until I recently had reason to think a bit more deeply about its use in my own research area: modelling animal behaviour.

Many models of animal behaviour make use of random decisions, or 'noise'. For example, an animal may choose a new direction in which to move by averaging the current directions of the other individuals around itself, and then add some random error to that average to give its own new direction. But why should an animal do something random? Surely there is a 'best' action to be taken based on what the animal knows, and it should do this. Indeed, how would an animal do something random? It is remarkably difficult for humans to write down a 'random' series of numbers without the aid of a random number generator, such as a coin or a die. If you were asked to pick one of two options with a probability of 0.65, how would you do it? Why should an animal be any different?
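For a model, of course, 'choosing with probability 0.65' is no mystery: given a source of uniform random numbers, the recipe is one line. A minimal sketch in Python (the option names are just placeholders):

```python
import random

def choose(p_a=0.65):
    """Pick option 'A' with probability p_a, otherwise 'B'."""
    return 'A' if random.random() < p_a else 'B'

# Over many draws the frequency of 'A' approaches p_a.
random.seed(0)
picks = [choose() for _ in range(100_000)]
frequency_a = picks.count('A') / len(picks)
```

The puzzle in the text is that animals, and unaided humans, have no obvious equivalent of `random.random()` to draw on.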

Usually when we ascribe noise to animal or human decisions, what we are really doing is modelling that part of the decision that we either don't understand, or that we choose to ignore. For example, in a recent paper my coauthors and I looked at factors influencing neighbourhood choice in Stockholm. We modelled choices as being influenced by the characteristics of the moving household and the neighbourhoods they were choosing from, but ultimately being probabilistic - i.e. random. As we say in the paper, this is equivalent to assuming that the households are influenced by many other factors that we don't observe, and that they make the best choice given all this extra information. Because we don't see everything that influences the decision, it appears 'noisy' to us.
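This modelling style is the standard random-utility (discrete choice) framework: each option's observed characteristics give it a utility, everything unobserved is folded into noise, and the result is a probability for each option. A minimal multinomial-logit sketch with made-up utilities, not the paper's fitted model:

```python
import math

def choice_probabilities(utilities):
    """Multinomial logit: softmax of the observed utilities.
    Equivalent to assuming each household maximises observed utility
    plus unobserved i.i.d. Gumbel-distributed noise."""
    exps = [math.exp(u) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical observed utilities for three neighbourhoods.
probs = choice_probabilities([1.0, 0.5, 0.0])
```

The neighbourhood with the highest observed utility is only the most *likely* choice, not the certain one: the rest of the probability mass stands in for everything we didn't measure.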

So far, so good. This is fundamentally no more controversial than treating a coin toss as random, even though we know that the coin is obeying deterministic physical rules. As long as we use a decent model for the stochasticity in these decisions, we can happily treat what are really deterministic decisions as being random, and still make solid inferences about what influenced them. But we can run into trouble when we forget that only we are playing this trick. This becomes a problem in the world of collective behaviour, where we want to understand how animals are influencing and being influenced by each other. Though we might treat individual animals' decisions as being partly random, we cannot guarantee that the animals themselves also do the same thing. Indeed, it is likely that the animals themselves have a better idea about what factors motivate and influence each other than we do. Where we might, in our ignorance, see a random action, another animal might well see a response to some cue that we haven't thought to look for.

To illustrate, let's imagine that you and I are trying to choose a restaurant. For simplicity I will assume that we like much the same things in a restaurant - we have the same tastes in food and ambience. We approach two similar-looking restaurants, A and B. I can smell that the food in restaurant A is somewhat more appetising than in B. Nonetheless, I see you starting to walk towards restaurant B. I know we can both smell that the food in A is better, so what should I make of your decision? If I assume your decision is partly random, I might just conclude that you made a mistake - A really is better, but you randomly picked B instead. I am then free to pick A. But if I assume you made the best choice with the information available, I must conclude that you have some private information that outweighs the information we share - maybe you earlier read an excellent review of restaurant B. Since our tastes are very similar, I should also conclude that if I had access to your private information as well, I would have made the same choice, since the choice is determined exactly by the information. So now I really ought to pick restaurant B as well.
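The gap between the two readings can be made concrete with Bayes' rule. Below, my prior that B is better is 0.3 (the smell favours A); what I infer from your choice depends entirely on how likely I think you are to pick B under each hypothesis. The numbers are illustrative only:

```python
def posterior_b_better(prior_b, p_pick_b_if_b_better, p_pick_b_if_a_better):
    """Bayes' rule: my belief that B is the better restaurant
    after watching you walk towards B."""
    numerator = prior_b * p_pick_b_if_b_better
    denominator = numerator + (1 - prior_b) * p_pick_b_if_a_better
    return numerator / denominator

# If I think your choice is noisy (you pick B 30% of the time even
# when A is better), your choice tells me little:
noisy = posterior_b_better(0.3, 0.7, 0.3)
# If I think you choose rationally on possibly-private information
# (you would rarely pick B unless B really were better):
rational = posterior_b_better(0.3, 0.99, 0.05)
```

Under the 'noisy' reading my belief only rises to 0.5, so the smell still wins and I pick A; under the 'rational' reading it jumps to roughly 0.89, and I should follow you into B.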
[Image: This place looked great on the web... - Kyle Moore, CC-SA 1.0]
Looking at collective decision making this way shows that how individuals should respond to each other depends on how much they ascribe the choices made by others to random chance, not how much we do. We therefore need to be careful not to assume that 'noise' in the behaviour of animals in groups is an intrinsic property of the decisions, but instead remember that it depends on choices we make in deciding what to measure, and what to care about. The animals themselves may make very different choices. The consequences of adopting this viewpoint are laid out in detail in my recent paper: Collective decision making by rational individuals. In short they are:

1. The order in which previous decisions have been made is crucial in determining what the next individual will do - the most recent decisions are the most important.

2. Because of the above, how animals appear to interact depends strongly on what we choose to measure. Social information isn't something we can measure in a model-free, neutral way.

3. Group behaviour should change predictably when we observe animals (or humans) in their natural habitat versus the laboratory. In general, social behaviour will probably be stronger in the lab.

None of this is to say that animals or humans always (or ever) do behave rationally. Rather, that they make decisions on the basis of reasons, not the roll of a die. And their reaction to the choices made by others will be shaped by what they perceive those reasons to be in other individuals. Perhaps, to paraphrase Michael Osborne, we should never assume that other people or animals are doing anything random. Or at least we shouldn't assume that other people are assuming that...

Thursday, November 1, 2018

Yet more reasons to fund diverse basic science

Research is an incremental, iterative process. New advances build on those that came before, and open up new lines of research to follow afterwards. But not all research leads anywhere. The office drawers of academics are full of manuscripts that never got published, or data from studies that never showed any results. Whole fields such as phrenology enjoy periods in the sun before fading away (if you know of any modern research that directly descends from phrenology, let me know in the comments).

In this respect, research is a lot like the Tree of Life, with each project or study being a species. Species may give rise to new species (new research questions), or they may go extinct, but the Tree of Research (hopefully) endures. 

Mathematicians have tools for understanding tree-generating processes such as these: birth-death models. These specify what types of tree are likely to be generated based on the rates of speciation and extinction for individual species.

Graham Budd and I recently published a study investigating the properties of these processes. Trees generated by birth-death processes are very vulnerable; a newly created tree with only a few species can easily stop growing if all of those species go extinct. On the flip side, trees that have already generated many species can be very robust and are hard to push towards extinction. A consequence of this is that trees that do survive a long time tend to have bursts of rapid diversification at the start. Looking more deeply into the trees that survive, we find that the surviving lineages (those species that have modern descendants) are always diversifying like crazy, speciating at twice the rate we would otherwise expect.
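The early vulnerability is easy to see in a toy Gillespie-style simulation of a linear birth-death process (an illustration of the general phenomenon, not a reproduction of Budd & Mann's analysis):

```python
import random

def simulate_tree(birth=1.0, death=0.9, t_max=20.0, cap=10_000):
    """Linear birth-death process starting from one species.
    Returns the species count at time t_max (0 means extinction)."""
    n, t = 1, 0.0
    while 0 < n < cap:
        # Waiting time to the next event: total rate is n * (birth + death).
        t += random.expovariate(n * (birth + death))
        if t >= t_max:
            break
        if random.random() < birth / (birth + death):
            n += 1  # a speciation event
        else:
            n -= 1  # one species goes extinct
    return n

random.seed(1)
outcomes = [simulate_tree() for _ in range(500)]
survivors = sum(1 for n in outcomes if n > 0)
```

Even though births outpace deaths on average here, most runs die out while the tree is still small; the minority that survive are the ones that happened to diversify quickly early on.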

Trees that survive for a long time tend to diversify quickly when they are small (Budd & Mann 2018)

What does this have to do with research funding? Increasingly, research funding is allocated on the basis of competitive grant applications. I have written before about the waste involved in this, but another consequence is that research diversity suffers. To get a grant in the UK, for example, you must convince the funder and reviewers that you have a very good chance of making notable findings and having impact in academia, industry and elsewhere. This requirement, along with the notable and growing bias towards funding senior academics who already hold substantial funding, favours research that is predictable, that follows the researcher's previously demonstrated expertise, and for which preliminary results are already available. This in turn reduces the diversity of possible research avenues that might be explored.

What is the result of reducing diversity? Our research suggests that if we depress the diversification of research we risk extinguishing the Tree of Research altogether. If we focus research efforts too narrowly we put too many eggs in too few baskets. The future success of those research areas is less predictable than we might like to think - few phrenologists thought that their expertise would one day be seen as quackery. If those bets don't pay off then scientific progress may slow down or stop altogether.

Lineages that give rise to long-term descendants are always diversifying quickly (red lines). Green lines diversify slowly and go extinct (Budd & Mann 2018)

But surely, you might reply, isn't it a good idea to check on the track record of scientists and look at their ideas before giving them lots of public money? No doubt there is some value in scrutiny, but given the competition for academic jobs I think we can safely say that most academics have already been scrutinised before they start asking for money. As stated above, I believe our ability to predict what will be a success is highly limited. Moreover, several studies have shown that we can't even agree on what is good or not anyway, reducing weeks or months of labour to a lottery. Just as importantly, as another of my recent papers, this time with Dirk Helbing, has shown, the way that we allocate rewards and resources based on past success can distort the things that people choose to research, and as a result reduce the collective wisdom of academia as a whole. Dirk and I showed that too much diversity in what people choose to research is greatly preferable to too little: as a collective we need the individuals who research seemingly mad questions with little chance of success. Unfortunately, the most natural ways to reward and fund academics based on their track record would seem to create far too little diversity of research.

Fig. 2: Rewards influence diversity and collective wisdom. Too much diversity (orange line) is better than too little (black and blue lines). (Mann & Helbing 2017)

So what can be done? Dirk and I showed that collective intelligence can be optimised by retrospectively rewarding individuals who are proved right when the majority is wrong. This mirrors an approach in statistics for ensemble learning called boosting, wherein we train models to predict the data that other models were unable to predict accurately. So I would be in favour of targeting grants at those who have gone against prevailing opinion and been proved right. However, we also showed that if agents choose what to research at random, this creates greater collective intelligence than many reward schemes do. That supports giving many scientists unconditional funding and letting their research go wherever their curiosity takes them, which would have the additional advantage of removing much of the deadweight cost of grant applications.
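The retrospective rule can be stated in a few lines. This sketch rewards an agent only when it was correct while the majority was not - an illustrative rule in the spirit of the paper, not its exact scheme:

```python
from collections import Counter

def contrarian_rewards(choices, truth):
    """Reward 1.0 to each agent who chose correctly while the
    majority chose wrongly; 0.0 otherwise."""
    majority = Counter(choices).most_common(1)[0][0]
    if majority == truth:
        return [0.0] * len(choices)
    return [1.0 if c == truth else 0.0 for c in choices]

# Four agents guess; the majority backs 'X' but the truth is 'Y'.
rewards = contrarian_rewards(['X', 'X', 'Y', 'X'], truth='Y')
```

Under this rule, conforming when everyone is right earns nothing, so the scheme pays only for independent information - which is exactly what the collective lacks when diversity is too low.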