Monday, July 18, 2016

Will your job be automated? A critique of the predictions of Frey and Osborne

You cannot have failed to encounter the current hype and/or panic about job automation. The basic story is compelling. Drawing on the availability of Big Data, artificial intelligence is progressing at a breakneck speed, solving problems that once seemed like science fiction: driverless cars, recognising people in photos, giving eerily accurate suggestions about which films we might want to watch or even what email replies we might want to give. More mundane tasks that were once the preserve of highly-trained professionals are also at risk, such as legal research. A computer can scan millions of legal texts for relevant information while a lawyer is still finding the reference for the text they need.

All of this has led to a widespread belief that many people face the loss of their jobs in the near future. Of course, automation has been with us since the industrial revolution, and in some areas even before then. Resistance to, and despair about, automation are as old as automation itself. But the new panic is about the possible scale of job losses, and the lack of useful employment opportunities for those displaced. An oft-quoted figure is that 47% of U.S. jobs are at risk of automation.

The figure of 47% originates in the work of Carl Frey and Michael Osborne, of Oxford University. Frey and Osborne persuasively argue that the progress in data collection, data analytics and artificial intelligence puts many tasks that were previously thought to be out of reach for computers and robots within touching distance of being automated. They contend that advances in pattern recognition mean that computers, which previously had been used to automate routine tasks, such as performing repeated calculations or fitting parts together in factories, will increasingly be able to tackle non-routine tasks. For example, Siri and similar artificial personal assistants take in unstructured voice requests and determine what the user wants, where to seek the required information and how to present it to them. With enough data, they suggest, almost any task can be automated by looking for patterns in the data that inform the task at hand:

"...we argue that it is largely already technologically possible to automate almost any task, provided that sufficient amounts of data are gathered for pattern recognition." [F&O]

These arguments are persuasive, and there is no doubt that modern machine-learning research has made great strides - it is worth trying to recall how outlandish some of today's AI technologies would have seemed just 10 years ago. Nonetheless, others, such as Neil Lawrence of Sheffield University, have argued that relying on huge data sets in this way is not the same thing as true artificial intelligence. Only a few organisations in the world, such as Google and Facebook, have access to truly vast amounts of data about our daily behaviours, and a great deal of their research is dedicated to targeting adverts at us with increasing precision. Moreover, if a computer needs a vast data set to learn what it should do, how readily can it adapt to new tasks? Will there always be a big enough relevant data set that has been, or even could be, collected? What about tasks where the computer may not have access to 'the grid' and the vast centres where data is stored? These are big questions that drive significant bodies of research in AI. Given these uncertainties, it is worth considering how F&O arrive at the rather precise figure of 47% for the proportion of jobs at risk.

Fittingly enough, F&O use machine learning itself to determine whether a job is automatable. They use a tool called Gaussian process classification (GPC) to make this prediction, based on the characteristics of each job, as defined and measured in a data set called O*NET, collected by the US Department of Labor. O*NET lists the skills and knowledge required to perform each job. Using GPC to make predictions requires two things: a set of predictors (in this case the O*NET data) and a matching set of known outputs on which to train the classifier. In plain terms, they require not only the job characteristics, but also, for some of these jobs, a known risk of automation. Where does this second part come from? In short, they make an educated guess (or, more precisely, they ask a group of well-informed people to make such a guess). In the paper they describe this process:

"First, together with a group of ML researchers, we subjectively hand-labelled 70 occupations, assigning 1 if automatable, and 0 if not. For our subjective assessments, we draw upon a workshop held at the Oxford University Engineering Sciences Department, examining the automatability of a wide range of tasks. Our label assignments were based on eyeballing the O∗NET tasks and job description of each occupation. This information is particular to each occupation, as opposed to standardised across different jobs. The hand-labelling of the occupations was made by answering the question “Can the tasks of this job be sufficiently specified, conditional on the availability of big data, to be performed by state of the art computer-controlled equipment”." [F&O]

To make the process plain: they took the 70 jobs in the data set about which they were most confident and made their best guess as to whether each would be automated. They then used the GPC to translate these subjective opinions about 70 jobs into predictions for the 600 or so others in the data set. Essentially, they trained the GPC to learn what it is about certain jobs that made the labellers believe those jobs will be automated. The GPC then propagates this subjective opinion to all the other jobs, and the result is that 47% of jobs are predicted to be at high risk of automation.
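The pipeline described above can be sketched in a few lines of Python using scikit-learn's GaussianProcessClassifier. To be clear, everything here is a toy stand-in: the feature matrix, the labelling rule and the kernel choice are invented for illustration, whereas F&O's real predictors come from O*NET and their labels from the workshop. The 0.7 cut-off for 'high risk' is the one F&O use in the paper.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Synthetic stand-in for O*NET: ~700 occupations, each described by a
# handful of skill/knowledge scores scaled to [0, 1].
n_jobs, n_features = 702, 9
X = rng.random((n_jobs, n_features))

# Hand-label 70 occupations (1 = automatable, 0 = not), standing in for
# the workshop judgements; here the labels follow a made-up rule.
labelled = rng.choice(n_jobs, size=70, replace=False)
y = (X[labelled, 0] < 0.5).astype(int)  # toy rule, purely illustrative

# Train a Gaussian process classifier on the 70 labelled jobs...
gpc = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0),
                                random_state=0)
gpc.fit(X[labelled], y)

# ...then propagate those subjective labels to every occupation.
p_auto = gpc.predict_proba(X)[:, 1]   # P(automatable) for each job
share_at_risk = np.mean(p_auto > 0.7)  # F&O class p > 0.7 as 'high risk'
print(f"{share_at_risk:.0%} of jobs classed as high risk")
```

The point of the sketch is structural: whatever number comes out at the end is entirely a function of the 70 hand-assigned labels that went in at the top.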

As a side effect, the GPC can also identify the factors that seemed to influence whether the workshop participants thought a job would be automated. The factors identified seem reasonable: jobs requiring high social perceptiveness have a low risk, for example. But we should perhaps treat these findings with care - the very fact that they seem reasonable to us suggests that they also seemed reasonable to the people making the predictions - no wonder, then, that they labelled jobs requiring high social perceptiveness as less likely to be automated. Moreover, while the participants of a workshop at the Oxford University Engineering Sciences Department no doubt have greater expertise than the average person in judging the capabilities of machines, we should also be aware that such groups are somewhat self-selected for technological optimism - few people choose to become researchers in artificial intelligence if they do not believe it is important, any more than you would become a teacher if you didn't think education made a difference. Any biases or blind spots these individuals might have will be carried through into the final figure of 47%, as well as into the characteristics identified as most important.
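For what it's worth, one standard way a Gaussian process classifier can surface influential inputs is an anisotropic (automatic relevance determination) kernel, which learns a separate length scale per feature: a short learned length scale marks a feature the classifier leans on heavily. A minimal sketch with made-up data - the feature names and labelling rule are purely illustrative, not F&O's actual setup:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(1)

# 70 hand-labelled occupations with three hypothetical O*NET-style
# scores, e.g. [social perceptiveness, finger dexterity, originality].
X = rng.random((70, 3))
y = (X[:, 0] < 0.5).astype(int)  # toy rule: only the first score matters

# An anisotropic RBF learns one length scale per feature (ARD);
# a shorter learned scale means predictions vary faster along that axis,
# i.e. the classifier treats that feature as more relevant.
gpc = GaussianProcessClassifier(kernel=RBF(length_scale=[1.0, 1.0, 1.0]))
gpc.fit(X, y)

print(gpc.kernel_.length_scale)  # one learned length scale per feature
```

Note that this only recovers what drove the labels it was trained on - which is exactly the circularity worry raised above.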

There is a danger, when reading the paper (if one does - no doubt many news sources do not), of being impressed by the mathematical sophistication of the GPC prediction machinery. It is an impressive piece of technical work. But the GPC can only work with what it is given - it generalises from known examples in the data. The old computer-science saying 'garbage in, garbage out' is overly pejorative here - the predictions the GPC has been trained on are not garbage, but the best educated guesses of well-informed people. They are internally consistent - the GPC can predict well the labels the workshop participants would assign to unseen examples. But the GPC cannot predict more accurately than the individuals themselves. It is important to realise that the trained GPC is effectively a machine for making the predictions these same individuals would have made themselves had they been asked. With all the uncertainties involved in a still nascent and quickly changing field, making precise predictions is extremely speculative. Just imagine how different many of these predictions would have been if people had been asked 10 years ago. What might they look like in 10 years' time?

All of this makes me very skeptical about the now ubiquitous assumption that mass job losses are inevitable. In many ways I hope they are - we should hope that more of the tasks we only do out of necessity will be automated, as long as the economic gains can be spread equitably (a whole other ball game!). But a narrative of huge disruption feeds into the rather millennial milieu in which we find ourselves, plagued with doubts about our economic system, possible catastrophic climate change, antibiotic resistance and so on. It is very tempting to believe that disruptive, destructive change is now a permanent feature of our lives. F&O, to their credit, do not take this line - I have seen Michael Osborne present this work, and he speaks of all the great possibilities automation creates. It is also worth noting that many tasks that can be automated take an amazingly long time to be so. I recently took a trip to the National Coal Mining Museum, where I was amazed to learn that very few mines had any serious machinery involved in the actual hacking off of coal until nationalisation and unionisation drove up labour costs and pushed efficiency up the agenda after the war. I'm perpetually amazed, as a renter, how many people think dishwashers are optional! As Frey and Osborne note, but few news outlets pick up on, automation will only happen if the cost of labour is sufficiently high - and many government policies are directed explicitly at lowering the cost of labour to the employer.

We shall no doubt see feats of automation in our lifetimes that would stagger us today, just as the household appliances created in the 20th century would amaze our ancestors. But exactly which jobs will disappear, when they will do so and how many people will become unemployed? I would not want to guess.

Reference: [F&O] The future of employment: How susceptible are jobs to computerisation? Carl Benedikt Frey and Michael A. Osborne

1 comment:

  1. Richard,

    You might find this of interest:

    It takes the Frey and Osborne risk scores and uses them to create 4 different scenarios of future automation job loss. It is an interactive visualization, so you can dig down into the data and explore it interactively.

    My takeaway from the project was that automation risk is highly concentrated among low pay and low skilled workers.