Monday, December 5, 2016

Machine-learning doesn't give you a free pass

A few weeks ago I read this paper on arXiv, purporting to use machine-learning techniques to determine criminality from facial features. The paper uses ID photos of "criminals and non-criminals" and infers quantifiable facial structures that separate these two classes. I had a lot of issues with it and was annoyed, if not surprised, when the media got excited by it. Last week I also saw this excellent review of the paper that echoes many of my own concerns, and in the spirit of shamelessly jumping on the bandwagon I thought I'd add my two cents.

As someone who has dabbled in criminology research, I was pretty disturbed by the paper from an ethical standpoint. I think this subject, even if it is declared fair game for research, ought to be approached with the utmost caution. The findings simply appeal too strongly to some of our more base instincts, and to historically dangerous ideas, to be treated casually. The sparsity of information about the data is troubling, and I personally find the idea of publishing photos of "criminals and non-criminals" in a freely-available academic paper to be extremely unsettling (I'm not going to reproduce them here). The paper contains no information on any ethical procedures followed.


Aside from these issues, I was also disappointed from a statistical perspective, and in a way that is becoming increasingly common in applications of machine-learning. The authors of this paper appear not to have considered any possible issues with the causality of what they are inferring. I have no reason to doubt that the facial patterns they found in the "criminal" photos are distinct in some way from those in the "non-criminal" set. That is, I believe they can, given a photo, with some accuracy predict which set it belongs to. However, they give no consideration to any possible causal explanation for why these individuals ended up in these two sets, beyond the implied idea that some individuals are simply born to be criminals and have faces to match.

Is it not possible, for example, that those involved in law enforcement are biased against individuals who look a certain way? Of course it is. It's not like there isn't research on exactly this question. Imagine what would happen if you conducted this research in western societies: do you doubt that the distinctive facial features of minority communities would be inferred as criminal, simply because of well-documented police and judicial bias against these individuals? In fact, you need not imagine, because this already happens: machine-learning software analyses prisoners' risk of reoffending, and entirely unsurprisingly attributes higher risk to black offenders, even though race is not explicitly included as a factor.
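To make the point concrete, here is a toy simulation. Everything in it is made up for illustration: the trait, the offence rate and the two detection probabilities are my own assumptions, not numbers from the paper or from any real data. By construction, an arbitrary facial trait has no relationship to offending, but offenders with the trait are more likely to be caught and convicted.

```python
import random

def simulate(n=100_000, offence_rate=0.1,
             p_caught_without=0.2, p_caught_with=0.6, seed=0):
    """Return (has_trait, offends, convicted) tuples for a synthetic population.

    All parameters are hypothetical. Offending is drawn independently of the
    trait; only the chance of being caught depends on it.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        has_trait = rng.random() < 0.5          # arbitrary facial trait
        offends = rng.random() < offence_rate   # independent of the trait
        p_caught = p_caught_with if has_trait else p_caught_without
        convicted = offends and rng.random() < p_caught  # biased enforcement
        rows.append((has_trait, offends, convicted))
    return rows

def rate(rows, outcome_idx, trait_value):
    """Share of people with the given trait value for whom the outcome is true."""
    group = [r for r in rows if r[0] == trait_value]
    return sum(r[outcome_idx] for r in group) / len(group)

rows = simulate()

# True offending rates: roughly equal in both groups, by construction.
offend_with = rate(rows, 1, True)
offend_without = rate(rows, 1, False)

# Conviction rates: these differ, purely because detection is biased.
convicted_with = rate(rows, 2, True)
convicted_without = rate(rows, 2, False)

# What a classifier trained on "convicted vs. not" would actually see.
convicted_rows = [r for r in rows if r[2]]
trait_share_among_convicted = sum(r[0] for r in convicted_rows) / len(convicted_rows)

print(f"offending rate with / without trait:  {offend_with:.3f} / {offend_without:.3f}")
print(f"conviction rate with / without trait: {convicted_with:.3f} / {convicted_without:.3f}")
print(f"share of convicted who have the trait: {trait_share_among_convicted:.3f}")
```

Any classifier trained on the conviction labels would happily learn that the trait "predicts criminality", and it would be accurate at sorting photos into the two sets. It would still be telling us about who gets caught, not about who offends.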

If this subject matter were less troublesome, I would support the publication of such results as long as the authors presented the findings as suggesting avenues for future, more carefully controlled studies. However, in this case the authors resolutely do not take this approach. Instead, they conclude that their work definitively demonstrates the link between criminality and facial features:
"We are the first to study automated face-induced inference on criminality. By extensive experiments and vigorous cross validations, we have demonstrated that via supervised machine learning, data-driven face classifiers are able to make reliable inference on criminality. Furthermore, we have discovered that a law of normality for faces of noncriminals. After controlled for race, gender and age, the general law-biding public have facial appearances that vary in a significantly lesser degree than criminals."

This paper remains unreviewed, and let us hope it does not get a stamp of approval from a reputable journal. However, it highlights a problem with the recent fascination with machine-learning methods. Partly because of the apparent sophistication of these methods, and partly because many in the field are originally computer scientists, physicists or engineers, rather than statisticians, there has been a reluctance to engage with statistical rigour and questions of causality. With many researchers hoping to be picked up by Google, Facebook or Amazon, the focus has been on predictive accuracy, and on computational efficiency in the face of overwhelming data. Some have even declared that the scientific method is dead now that we have Big Data. As Katherine Bailey has said: "Being proficient in the use of machine learning algorithms such as neural networks, a skill that’s in such incredibly high demand these days, must feel to some people almost god-like".

This is dangerous nonsense, as the claim to infer criminality from facial features shows. It is true that Big Data gives us many new opportunities. In some cases, accurate prediction is all we need, and as we have argued in a recent paper, prediction is easy, cheap and unproblematic compared to causal inference. Where simple predictions can help, we should go ahead. We absolutely should be bringing the methods and insights of machine-learning into the mainstream of statistics (this is a large part of what I try to do in my research). Neil Lawrence has said that Neural Networks are "punk statistics", and by God statistics could do with a few punks! But we should not pretend that simply having a more sophisticated model, and a huge data set, absolves us of the statistical problems that have plagued analysts for centuries when testing scientific theories. Our models must be designed precisely to account for possible confounding factors, and we still need controlled studies to carefully assess causality. As computer scientists should know: garbage in, garbage out.


This is not a plea for researchers to 'stay in their lane'. I think criminology and statistics both need fresh ideas, and many of the smartest people I know work in machine-learning. We should all be looking for new areas to apply our ideas in. But working in a new field comes with some responsibility to learn the basic issues in that area. Almost everyone in biology or social science has a story about a physicist who thought they could solve every problem in a new field with a few simple equations, and I don't want data scientists to do the same thing. I fear that if modern data science had been invented before the discovery of the Theory of Gravity, we would now have computers capable of insanely accurate predictions of ballistics and planetary motions, and absolutely no idea how any of it really worked.