From data privacy to bias to ethics, teaching machines to behave intelligently raises a number of difficult issues for society.
At its core, machine learning is concerned with using large data sets to learn the relationships between variables, make predictions, and interact with a changing environment. And it is becoming an increasingly important tool in business—so much so that almost all employees are likely to be impacted by it in one way or another over the next few years.
Large data sets on variables describing consumer purchases, stock price movements, and many other aspects of a business are not new. What is new is that advances in computer processing speeds and reductions in data storage costs allow us to reach conclusions from large data sets in ways that were simply not possible 20 or 30 years ago.
Machine learning, also referred to as data science, can be viewed as the new world of statistics. Traditionally, statistics has been concerned with such topics as probability distributions, confidence intervals, significance tests, and linear regression. Knowledge of these topics remains important, but we are now able to learn from large data sets in new ways. For example:
These applications of machine learning are now possible because of increases in computer processing speeds and reductions in data storage costs. And as a result, data science may well prove to be the most rewarding and exciting profession of the 21st century.
My latest book, Machine Learning in Business: An Introduction to the World of Data Science, explains the most popular algorithms used by data scientists. The objective is to enable readers to interact productively with data scientists and understand how data science can be used in a variety of business situations.
In this excerpt from the book, I will present some of the key issues posed to society by AI, which should be on the radar of leaders everywhere. But first, a brief history of humankind’s longstanding relationship with machines.
Man vs. Machine: A Brief History
Human progress has been marked by a number of industrial revolutions:
1. Steam and water power (1760-1840)
2. Electricity and mass production (1840-1920)
3. Computers and digital technology (1950-2000)
4. Artificial intelligence (2000-present)
There can be no doubt that the first three revolutions have brought huge benefits to society. The benefits were not always realized immediately, but they eventually produced big improvements in our quality of life. At various times there were concerns that jobs traditionally carried out by humans would be moved to machines and that unemployment would result. This did not happen. Some jobs were lost during the first three industrial revolutions, but others were created.
For example, the first industrial revolution led to people leaving rural lifestyles to work in factories; the second changed the nature of the work done in factories with the introduction of assembly lines; and the third has led to more jobs involving the use of computers. The impact of the fourth industrial revolution is not yet clear.
It is worth noting that the third industrial revolution did not require all employees to become computer programmers. But it did require people in many jobs to learn how to use computers and work with software such as Word and Excel. We can expect the fourth industrial revolution to be similar to the third in that many individuals will have to learn new skills related to the use of artificial intelligence.
We are now reaching the stage where machine learning algorithms can make many routine decisions as well as, if not better than, human beings. But the key word here is ‘routine’, because the nature of the decision and the environment must be similar to what they were in the past. If the decision is non-standard or the environment has changed so that past data is no longer relevant, we cannot expect a machine learning algorithm to make good decisions.
Driverless cars provide an example here. If we changed the rules of the road—perhaps regarding how cars can make right or left turns—it would be very dangerous to rely on a driverless car that had been trained using the old rules.
A key task for human beings is likely to be managing large data sets and monitoring machine learning algorithms to ensure that decisions are not made on the basis of inappropriate data. Just as the third industrial revolution did not require everyone to become a computer programmer, the fourth industrial revolution will not require everyone to become a data scientist. However, for many jobs it will be important to understand the language of data science and what data scientists do. Today, many jobs involve using programs developed by others for carrying out various tasks. In the future, they may involve monitoring the operation of machine learning algorithms that have been developed by others.
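A minimal sketch of what such monitoring might look like, offered as an illustration rather than as a method from the book: the population stability index (PSI), a drift measure widely used in credit scoring, compares how a feature was distributed in the training data with how it is distributed in the data the algorithm now sees. The income figures, the bin edges, and the 0.25 alert level (a common rule of thumb, not a regulatory standard) are all assumptions.

```python
import math
from bisect import bisect_right

def bin_shares(values, edges):
    """Share of values falling in each bin defined by sorted interior edges."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]

def population_stability_index(expected, actual, edges, floor=1e-4):
    """PSI = sum over bins of (a - e) * ln(a / e).

    `floor` avoids division by zero and log(0) when a bin is empty.
    """
    psi = 0.0
    for e, a in zip(bin_shares(expected, edges), bin_shares(actual, edges)):
        e, a = max(e, floor), max(a, floor)
        psi += (a - e) * math.log(a / e)
    return psi

# Hypothetical applicant incomes (in thousands) at training time and today.
training_incomes = [32, 45, 51, 38, 60, 72, 55, 41, 48, 66]
recent_incomes = [80, 95, 88, 102, 77, 91, 85, 99, 70, 110]
edges = [40, 55, 70]  # assumed bin boundaries

psi = population_stability_index(training_incomes, recent_incomes, edges)
ALERT_LEVEL = 0.25  # rule-of-thumb threshold, an assumption
print(f"PSI = {psi:.2f}; review the model: {psi > ALERT_LEVEL}")
```

A large PSI does not say the algorithm is wrong, only that the world it now faces differs from the one it learned from, which is exactly the situation in which a person should look at its decisions.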
The fact is, for some time to come, a human plus a trained machine is likely to be more effective than a human or a machine on its own. I will now look at some of the key issues this raises for society—and for leaders.
Issues for Society
For many years, computers have been used to automate business tasks such as record keeping and sending out invoices, and for the most part, society has benefited from this. But it is important to recognize that AI innovations involve more than just the automation of tasks: They actually allow machines to learn. Their aim is to allow machines to make decisions and interact with the environment in much the same way that humans do. Indeed, in many cases, the goal is to train machines so that they improve on the way human beings carry out certain tasks.
Most readers are familiar with the success of Google’s AlphaGo in beating the world champion Go player, Ke Jie. Go is a very complex game, with far too many possible sequences of moves for a computer to calculate them all, so AlphaGo uses a deep learning strategy to approximate the way the best human players think about their moves, and then to improve on it. The key point is that AlphaGo’s programmers did not teach AlphaGo ‘how to play Go’: They taught it ‘to learn how to play Go’.
Teaching machines to use data to learn and behave intelligently raises a number of difficult issues for society. Following are five particular issues that leaders should familiarize themselves with.
Data Privacy. Issues associated with data privacy received a great deal of publicity as a result of the Cambridge Analytica saga. This company worked both for Donald Trump’s 2016 presidential campaign and for an organization campaigning for the UK to leave the European Union. It managed to acquire and use personal data on millions of Facebook users without obtaining permission from them. The data was detailed enough for the company to create profiles and determine what kind of advertisements or other actions would be most effective in promoting the interests of the organizations that had hired it.
Many governments are concerned about data privacy. The European Union has been particularly proactive: it passed the General Data Protection Regulation (GDPR), which took effect in May 2018. It recognizes that data is valuable and includes in its requirements the following:
Fines for non-compliance with GDPR can be as high as 20 million euros or four per cent of a company’s global annual revenue, whichever is higher. It is likely that other governments will pass similar legislation in the future. Interestingly, it is not just governments that are voicing concerns about the need to regulate the way data is used by companies. Mark Zuckerberg, Facebook’s CEO, agrees that rules are needed to govern the internet and has expressed support for GDPR.
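Under GDPR the cap for the most serious infringements is whichever of the two amounts is higher (the percentage applies to total worldwide annual turnover for the preceding financial year), so the four per cent figure only matters for firms with annual revenue above 500 million euros, since 20 million divided by 0.04 is 500 million. A minimal sketch of that arithmetic, covering only the highest fine tier and using hypothetical revenue figures:

```python
def max_gdpr_fine_eur(annual_global_turnover_eur: float) -> float:
    """Upper limit of the highest GDPR fine tier: EUR 20 million or 4% of
    worldwide annual turnover, whichever is higher."""
    return max(20_000_000.0, 0.04 * annual_global_turnover_eur)

# Hypothetical companies with different annual turnovers (EUR).
for turnover in (100e6, 500e6, 2e9):
    print(f"turnover {turnover:,.0f} EUR -> maximum fine {max_gdpr_fine_eur(turnover):,.0f} EUR")
```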
Biases. By now, we all know that human beings exhibit biases. Some lead to risk-averse behaviour, others to risk-seeking behaviour; some make us care about people, others lead us to be insensitive. It might be thought that one advantage of machines is that they make logical decisions and are not subject to biases at all. Unfortunately, this is not the case.
Like humans, machine learning algorithms exhibit many biases. One of the main ones to pay attention to concerns the data that has been collected: It might not be representative.
A classic example here (from a time well before the advent of machine learning) is an attempt by the Literary Digest to predict the result of the 1936 U.S. presidential election. The magazine polled ten million people (a huge sample) and received 2.4 million responses. It predicted that Landon (a Republican) would beat Roosevelt (a Democrat) by 57.1 to 42.9 per cent. In fact, Roosevelt won. What went wrong? The answer is that the Literary Digest used a biased sample made up of Digest readers, telephone users, and people with registered cars. It turned out that, taken together, these groups were predominantly Republican supporters.
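The lesson that a very large sample does not cure a biased sampling method can be checked with a small simulation. The population, the 40 per cent ‘reachable’ share, and the support rates below are invented purely for illustration; they are not estimates of the 1936 electorate.

```python
import random

def simulate_poll(population_size=200_000, random_sample_size=2_000, seed=1936):
    """Contrast a huge but unrepresentative poll with a small random one.

    All proportions are hypothetical and chosen only to illustrate selection bias.
    """
    rng = random.Random(seed)
    voters = []  # (reachable, supports_candidate_a)
    for _ in range(population_size):
        # Assumption: 40% of voters are reachable through the pollster's lists,
        # and reachable voters favour candidate A more than everyone else does.
        reachable = rng.random() < 0.40
        p_support_a = 0.60 if reachable else 0.25
        voters.append((reachable, rng.random() < p_support_a))

    true_share = sum(a for _, a in voters) / population_size

    # Biased poll: asks every reachable voter and nobody else.
    reachable_answers = [a for r, a in voters if r]
    biased_share = sum(reachable_answers) / len(reachable_answers)

    # Simple random sample drawn from the whole population.
    sample = rng.sample(voters, random_sample_size)
    random_share = sum(a for _, a in sample) / random_sample_size

    return true_share, biased_share, len(reachable_answers), random_share

true_share, biased_share, n_biased, random_share = simulate_poll()
print(f"true support for A:       {true_share:.3f}")
print(f"biased poll (n={n_biased:,}): {biased_share:.3f}")
print(f"random sample (n=2,000):  {random_share:.3f}")
```

Because the biased poll never reaches the other 60 per cent of voters, asking more reachable people only makes the wrong answer more precise; the much smaller random sample lands close to the true figure.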
More recently, we can point to examples where facial recognition software was trained largely on images of white people and therefore did not recognize people of other races reliably, resulting in misidentifications by police forces using the software.
There is a natural tendency for machine learning to rely on readily available data and to be biased in favour of existing practices. The data available for making lending decisions in the future is likely to be the data on loans that were actually made in the past. It would be nice to know how the loans that were not made in the past would have worked out, but this data, by its nature, is not available. Amazon experienced a similar bias when developing recruiting software. Its existing recruits were predominantly male, and this led to the software being biased against women.
As a result, choosing the features that will be considered in a machine learning exercise is a critical task. In most cases, it is clearly unacceptable to use features such as race, gender or religious affiliation. But data scientists also have to be careful not to include other features that are highly correlated with these sensitive features. For example, if a particular neighbourhood has a high proportion of black residents, using ‘neighbourhood of residence’ as a feature when developing an algorithm for loan decisions may lead to racial biases.
There are many other ways in which an analyst can (consciously or unconsciously) exhibit biases when developing a machine learning algorithm. For example, the way in which data is cleaned, the choice of models, and the way the results from an algorithm are interpreted and used can be subject to biases.
Ethics. Machine learning raises many ethical considerations. Many people feel that China has gone too far with its Social Credit System, which is intended to standardize the way citizens are assessed. An individual’s social credit score moves up and down depending on his or her behaviour. Bad driving, smoking in non-smoking areas, and buying too many video games are examples of activities that will lower the score. The score can affect which schools a person’s children attend, whether he or she can travel abroad, and his or her employment prospects.
Should machine learning be used in warfare? It is perhaps inevitable that it will be. Google decided not to renew its contract for Project Maven, a collaboration with the U.S. Department of Defense to improve drone strike targeting, after thousands of its employees signed an open letter condemning the project. However, the U.S. and other nations continue to research how AI can be used for military purposes.
Can machine learning algorithms be programmed to behave in a morally responsible and ethical way? One idea here is to create a new machine learning algorithm and provide it with a large amount of data labeled as ‘ethical’ or ‘unethical’ so that it learns to identify unethical data. When new data arrives for a particular project, the algorithm could be used to decide whether or not it is ethically appropriate to use the data. The thinking here is that if a human being can learn ethical behaviour, so can a machine. Indeed, some have argued that machines can learn to be more ethical than humans.
An interesting ethical dilemma arises in connection with driverless cars. If an accident is unavoidable, what decision should be taken? How should an algorithm choose between killing a senior citizen and a younger person? How should it choose between killing a jaywalker and someone who is obeying the rules for crossing roads? How should it choose between hitting a cyclist wearing a helmet and one who is not?
The interaction of human beings with machine learning technologies can sometimes lead to unexpected results, with inappropriate and unethical behaviour being learned. In March 2016, Microsoft released Tay (short for ‘thinking about you’), which was designed to learn by interacting with human beings on Twitter so that it would mimic the language patterns of a 19-year-old American girl. Some Twitter users began tweeting politically incorrect phrases. Tay learned from these, and as a result sent racist and sexually charged messages to other Twitter users. Microsoft shut down the service just 16 hours after it was released.
Transparency. When a bank uses a decision tree machine learning algorithm to make loan decisions, it is fairly easy to see why a loan was accepted or rejected. However, most machine learning algorithms are ‘black boxes’ in the sense that the reasons for the output are not immediately apparent.
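A hand-built two-level tree shows why this kind of model is easy to explain: the sequence of tests an application passes through is itself the reason for the decision. The features, thresholds, and outcomes below are hypothetical and not taken from any real lender.

```python
def tree_decision(applicant):
    """Hypothetical two-level loan tree that also records the rules it applied."""
    path = []
    if applicant["credit_score"] < 620:
        path.append("credit_score < 620")
        if applicant["income"] < 40_000:
            path.append("income < 40,000")
            return "reject", path
        path.append("income >= 40,000")
        return "refer to loan officer", path
    path.append("credit_score >= 620")
    if applicant["debt_to_income"] > 0.45:
        path.append("debt_to_income > 0.45")
        return "reject", path
    path.append("debt_to_income <= 0.45")
    return "accept", path

decision, reasons = tree_decision(
    {"credit_score": 700, "income": 52_000, "debt_to_income": 0.50}
)
print(decision, "because", " and ".join(reasons))
```

Most other algorithms have no such readable path, which is what makes them black boxes.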
This can create problems. An applicant who is refused a loan might, not unreasonably, ask why the decision was made. An answer along the lines of ‘The algorithm has rejected you. I have no further information’ is likely to prove unsatisfactory. The General Data Protection Regulation mentioned earlier includes a ‘right to explanation’ with regard to machine learning algorithms applied to the data of citizens of the European Union. Specifically, individuals have the right to “meaningful information about the logic involved, as well as the significance and the envisaged consequences of such processing for the data subject.”
When making predictions, it is important to develop ways of making the results of machine learning algorithms accessible to those who are affected by the results. One way of assessing the importance of a particular feature (e.g., a credit score in a loan application) is to make a change to the feature and see what effect it has on the target (probability of default in the case of a loan application). The size of the change can reflect the dispersion of the feature’s values (for example, their standard deviation) in the data on which the machine learning algorithm has been trained.
Using this approach it is possible to provide an explanation that assigns a certain percentage to each of the features used. For example, a loan applicant might be told: ‘40 per cent of the decision to reject your application was based on your credit score, 25 per cent on your income, 20 per cent on your debt-to-income ratio, and 15 per cent on other factors.’
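The procedure can be sketched in a few lines. Everything specific here is an assumption: the logistic model and its coefficients are invented, each feature is raised by one standard deviation of its training values, and the absolute changes in the predicted probability of default are simply rescaled to sum to 100 per cent. This is only one of several possible attribution schemes (Shapley values are a widely used alternative), and different choices can produce different percentages.

```python
import math
from statistics import pstdev

# Hypothetical logistic model for the probability of default.
# Coefficients are invented for illustration only.
COEFFS = {"credit_score": -0.012, "income": -0.00002, "debt_to_income": 3.0}
INTERCEPT = 6.0

def prob_default(applicant):
    z = INTERCEPT + sum(COEFFS[name] * applicant[name] for name in COEFFS)
    return 1.0 / (1.0 + math.exp(-z))

def feature_shares(applicant, training_rows):
    """Raise each feature by one standard deviation of its training values,
    record the absolute change in predicted default probability, and
    express each change as a percentage of the total change."""
    base = prob_default(applicant)
    effects = {}
    for name in COEFFS:
        sd = pstdev([row[name] for row in training_rows])
        bumped = {**applicant, name: applicant[name] + sd}
        effects[name] = abs(prob_default(bumped) - base)
    total = sum(effects.values())
    return {name: 100 * effect / total for name, effect in effects.items()}

# Hypothetical training data and applicant.
training_rows = [
    {"credit_score": 580, "income": 35_000, "debt_to_income": 0.50},
    {"credit_score": 640, "income": 48_000, "debt_to_income": 0.40},
    {"credit_score": 700, "income": 62_000, "debt_to_income": 0.30},
    {"credit_score": 760, "income": 85_000, "debt_to_income": 0.20},
]
applicant = {"credit_score": 610, "income": 42_000, "debt_to_income": 0.45}

shares = feature_shares(applicant, training_rows)
for name, pct in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{pct:.0f} per cent of the explanation: {name}")
```

For this invented applicant the credit score accounts for about half of the explanation, with income and the debt-to-income ratio sharing the rest.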
It is also important for companies to understand the algorithms they use so they can be confident that decisions are being made in a sensible way. There is always a risk that algorithms appear to be making intelligent decisions when they are actually taking advantage of obscure correlations.
An example here is the story of a German horse named Hans (often called Clever Hans), who in the early 20th century appeared to be intelligent and able to solve mathematical problems. For example, he could add, subtract, multiply, and divide, and answer questions such as: ‘If the ninth day of the month is a Wednesday, what day of the month is the following Friday?’
Hans indicated answers by stomping his hoof a number of times and received a reward when the answer was correct.
It turned out that the horse was really good at reading the expressions on the face of the person asking the questions and, as a result, knew when to stop stomping. He did not actually have any mathematical intelligence. In short, there was a correlation between the correct answer and the expressions on the questioner’s face as the horse stomped his hoof.
Similarly, there are stories of image recognition software that appears to distinguish between polar bears and dogs but is actually just responding to the background (ice or grass/trees), not to the animals themselves. If we are to trust an algorithm to make important decisions for an organization, it is clearly important that we understand exactly how it is making those decisions.
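The polar bear story can be reproduced in miniature. In the sketch below, each hypothetical image is reduced to two yes/no features, and a deliberately simple learner (a one-feature ‘decision stump’, chosen here for illustration) keeps whichever feature best fits the training labels. Because every polar bear in the invented training set was photographed on ice, the background wins.

```python
# Hypothetical training set: each "image" is reduced to two yes/no features.
train = [
    ({"snowy_background": 1, "white_fur": 1}, "polar bear"),
    ({"snowy_background": 1, "white_fur": 1}, "polar bear"),
    ({"snowy_background": 1, "white_fur": 1}, "polar bear"),
    ({"snowy_background": 0, "white_fur": 0}, "dog"),
    ({"snowy_background": 0, "white_fur": 0}, "dog"),
    ({"snowy_background": 0, "white_fur": 1}, "dog"),  # a white dog on grass
]

def fit_stump(examples, positive_label):
    """Return the single feature whose rule 'feature == 1 means positive_label'
    has the highest accuracy on the training examples."""
    def accuracy(feature):
        hits = sum((x[feature] == 1) == (y == positive_label) for x, y in examples)
        return hits / len(examples)
    return max(examples[0][0], key=accuracy)

chosen = fit_stump(train, "polar bear")

def predict(features):
    return "polar bear" if features[chosen] == 1 else "dog"

print("feature learned:", chosen)
print("polar bear photographed on grass ->", predict({"snowy_background": 0, "white_fur": 1}))
```

The stump is perfect on its training data and still wrong about what a polar bear is; only inspecting which feature it relies on reveals the problem.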
Adversarial Machine Learning. Adversarial machine learning refers to the possibility of a machine learning algorithm being attacked with data designed to fool it. Arguably it is easier to fool a machine than a human being. A simple example of this is an individual who understands how a spam filter works and designs an email to get past it.
‘Spoofing’ in algorithmic trading is a form of adversarial machine learning. A spoofer attempts to manipulate the market (illegally) by placing buy or sell orders and canceling them before they are executed. A serious example of adversarial machine learning could be a malevolent individual who targets driverless cars by placing a sign beside a road that is designed to confuse the cars’ algorithms and cause accidents.
One approach to this problem is to generate examples of adversarial machine learning attempts and train the machine not to be fooled by them. However, it seems likely that humans will have to monitor machine learning algorithms for some time to come to ensure that the algorithms are not being fooled or manipulated. The dangers of adversarial machine learning reinforce the point that machine learning algorithms should not be black boxes without any interpretation. Transparency and interpretability of the output are extremely important.
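Both the spam-filter attack and the defence of training on adversarial examples can be illustrated with a deliberately naive keyword filter. The blocked words and the look-alike character substitutions below are hypothetical; most modern spam filters are statistical models rather than word lists, but the cat-and-mouse dynamic is similar.

```python
import re
from itertools import product

BLOCKED_WORDS = {"lottery", "winner", "prize"}  # hypothetical spam vocabulary
LOOKALIKES = {"o": "0", "i": "1", "e": "3", "a": "@", "s": "$"}  # assumed tricks

def is_spam(message, blocked=BLOCKED_WORDS):
    """Naive filter: flag the message if any token is on the blocked list."""
    tokens = re.findall(r"[a-z0-9@$]+", message.lower())
    return any(token in blocked for token in tokens)

# The attack: someone who knows how the filter works swaps letters for digits.
attack = "L0ttery W1nner! Claim your pr1ze now"

def variants(word):
    """Every spelling produced by swapping any subset of letters for look-alikes."""
    options = [(ch, LOOKALIKES[ch]) if ch in LOOKALIKES else (ch,) for ch in word]
    return {"".join(chars) for chars in product(*options)}

# The defence: generate adversarial spellings up front and "train" on them
# (here, simply by adding them to the blocked list).
hardened = set().union(*(variants(w) for w in BLOCKED_WORDS))

print("original filter catches attack:", is_spam(attack))
print("hardened filter catches attack:", is_spam(attack, hardened))
```

The hardened filter catches only the substitutions someone thought to generate; an attacker who uses a new trick, for example inserting spaces inside words, gets through again, which is one reason for continued human monitoring.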
In closing
We should not underestimate future advances in machine learning. Eventually, machines will very likely be smarter than human beings in almost every respect. As a result, a continuing challenge for the human race will be addressing the issues discussed herein and figuring out how to partner with machines in a way that benefits rather than damages mankind.
John C. Hull is a University Professor at the Rotman School of Management and Academic Director of FinHub, Rotman’s Financial Innovation lab. His latest book is Machine Learning in Business: An Introduction to the World of Data Science (2019). He is also the author of three best-selling books in the derivatives and risk management area.
Rotman faculty research is ranked in the top 10 worldwide by the Financial Times.
[This article has been reprinted, with permission, from Rotman Management, the magazine of the University of Toronto's Rotman School of Management]