Idea in Brief
Bias will find its way into AI and machine-learning models no matter how strong your technology is or how diverse your organization may be.
There are many sources of biased AI, all of which can easily fly under the radar of data scientists and other technologists.
An AI ethics committee can identify and mitigate the ethical risks of AI products that are developed in-house or procured from third-party vendors.
In 2019 a study published in the journal Science found that artificial intelligence from Optum, which many health systems were using to spot high-risk patients who should receive follow-up care, was prompting medical professionals to pay more attention to white people than to Black people. Only 18% of the people identified by the AI were Black, while 82% were white. After reviewing data on the patients who were actually the sickest, the researchers calculated that the numbers should have been about 46% and 53%, respectively. The impact was far-reaching: The researchers estimated that the AI had been applied to at least 100 million patients.
While the data scientists and executives involved in creating the Optum algorithm never set out to discriminate against Black people, they fell into a shockingly common trap: training AI with data that reflects historical discrimination, resulting in biased outputs. In this particular case, the data that was used showed that Black people receive fewer health care resources, which caused the algorithm to mistakenly infer that they needed less help.
There are a lot of well-documented and highly publicized ethical risks associated with AI; unintended bias and invasions of privacy are just two of the most notable kinds. In many instances the risks are specific to particular uses, like the possibility that self-driving cars will run over pedestrians or that AI-generated social media newsfeeds will sow distrust of public institutions. In some cases they’re major reputational, regulatory, financial, and legal threats. Because AI is built to operate at scale, when a problem occurs, it affects all the people the technology engages with—for instance, everyone who responds to a job listing or applies for a mortgage at a bank. If companies don’t carefully address ethical issues in planning and executing AI projects, they can waste a lot of time and money developing software that is ultimately too risky to use or sell, as many have already learned.
Your organization’s AI strategy needs to take into account several questions: How might the AI we design, procure, and deploy pose ethical risks that cannot be avoided? How do we systematically and comprehensively identify and mitigate them? If we ignore them, how much time and labor would it take us to respond to a regulatory investigation? How large a fine might we pay if found guilty, let alone negligent, of violating regulations or laws? How much would we need to spend to rebuild consumer and public trust, provided that money could solve the problem?
The answers to those questions will underscore how much your organization needs an AI ethical risk program. It must start at the executive level and permeate your company’s ranks—and, ultimately, the technology itself. In this article I’ll focus on one crucial element of such a program—an AI ethical risk committee—and explain why it’s critical that it include ethicists, lawyers, technologists, business strategists, and bias scouts. Then I’ll explore what that committee requires to be effective at a large enterprise.
But first, to provide a sense of why such a committee is so important, I’ll take a deep dive into the issue of discriminatory AI. Keep in mind that this is just one of the risks AI presents; there are many others that also need to be investigated in a systematic way.
Why and How Does AI Discriminate?
Two factors make bias in AI a formidable challenge: A wide variety of accidental paths can lead to it, and it isn’t remedied with a technical fix.
The sources of bias in AI are many. As I’ve noted, one issue is that real-world discrimination is often reflected in the data sets used to train it. For example, a 2021 investigation by the nonprofit newsroom The Markup found that lenders were more likely to deny home loans to people of color than to white people with similar financial characteristics. Holding 17 factors steady in a statistical analysis of more than 2 million conventional mortgage applications for home purchases, the researchers found that lenders were 80% more likely to reject Black applicants than to reject white ones. AI programs built on historical mortgage data, then, are highly likely to learn not to lend to Black people.
An AI ethical risk program must start at the executive level and permeate your company’s ranks—and, ultimately, the technology itself.
In some cases discrimination is the result of undersampling data from populations that the AI will have an impact on. Suppose you need data about the travel patterns of people commuting to and from work in order to create public transportation schedules, so you gather information on the geolocations of smartphones during commuting hours. The problem is that 15% of Americans, or roughly 50 million people, don’t own a smartphone. Many simply cannot afford a device and a data plan. People who are financially less well off, then, would be underrepresented in the data used to train your AI. As a result, your AI would tend to make decisions that benefit the neighborhoods where wealthy people live.
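To make the undersampling mechanism concrete, here is a minimal simulation sketch. The population, the 50% wealthy split, and the ownership rates are hypothetical numbers chosen for illustration; the only assumption carried over from the text is that smartphone ownership correlates with wealth, so geolocation data overrepresents the well off.

```python
import random

random.seed(0)

# Hypothetical population: smartphone ownership correlates with wealth,
# so geolocation data systematically misses less-wealthy commuters.
population = []
for _ in range(100_000):
    wealthy = random.random() < 0.5              # illustrative 50/50 split
    owns_phone = random.random() < (0.95 if wealthy else 0.75)
    population.append((wealthy, owns_phone))

# The training data only observes smartphone owners.
sampled = [p for p in population if p[1]]

true_share = sum(1 for w, _ in population if not w) / len(population)
observed_share = sum(1 for w, _ in sampled if not w) / len(sampled)

print(f"Less-wealthy share of actual population: {true_share:.1%}")
print(f"Less-wealthy share of training data:     {observed_share:.1%}")
```

Even with this mild ownership gap, the less-wealthy share of the training data drops several percentage points below its share of the real population, and any model trained on it inherits that skew.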
Proxy bias is another common problem. In one of its investigations ProPublica obtained the recidivism risk scores assigned to more than 7,000 people arrested in Broward County, Florida, in 2013 and 2014. The scores, which were generated by AI, were designed to predict which defendants were likely to commit additional crimes within two years of arrest and thus help judges determine bail and sentencing. When ProPublica checked to see how many defendants were actually charged with new crimes over the next two years, it found that the scores’ forecasts were unreliable. For example, only 20% of the people who were predicted to commit violent offenses did so. The algorithm doing the scoring was also twice as likely to falsely flag Black defendants as future criminals as it was to falsely flag white defendants.
Although Northpointe, the developer of the AI’s algorithm, disputed ProPublica’s findings (more on that later), the underlying bias is worth examining. To wit: Two subpopulations can commit crimes at the same rate, but if one of them is policed more heavily than the other, perhaps because of racial profiling, it will have higher arrest rates despite the equal crime rates. Thus, when AI developers use arrest data as a proxy for the actual incidence of crimes, they produce software that erroneously claims one population is more likely to commit them than another.
In some cases the problem lies with the goal you’ve set for your AI—that is, in the decision about what the AI should predict. For instance, if you’re determining who should get lung transplants, you might prefer to give them to younger patients so that you can maximize the number of years the lungs will be used. But if you asked your AI to determine which patients were most likely to use the lungs for the longest amount of time, you would inadvertently discriminate against Black patients. Why? Because life expectancy at birth for the total U.S. population is 77.8 years, according to the Centers for Disease Control and Prevention’s National Center for Health Statistics. Life expectancy for the Black population is only 72 years.
Addressing these kinds of problems isn’t easy. Your company may not have the ability to account for historical injustices in data or the resources to carry out the investigation needed to make a well-informed decision about AI discrimination. And the examples raise a broader question: When is it ethically OK to produce differential effects across subpopulations, and when is it an affront to equality? The answers will vary by case, and they cannot be found by adjusting AI algorithms.
This brings us to the second hurdle: the inability of technology—and technologists—to effectively solve the discrimination problem.
At the highest level, AI takes a set of inputs, performs various calculations, and creates a set of outputs: Input this data about loan applicants, and the AI produces decisions about who is approved or denied. Input data about what transactions occurred where, when, and by whom, and the AI generates assessments of whether the transactions are legitimate or fraudulent. Input criminal justice histories, résumés, and symptoms, and the AI makes judgments about recidivism risk, interview worthiness, and medical conditions, respectively.
When is it OK to produce differential effects across subpopulations, and when is it an affront to equality? The answers will vary and cannot be found by adjusting AI algorithms.
One thing the AI is doing is dispensing benefits: loans, lighter sentences, interviews, and so on. And if you have information about the demographics of the recipients, then you can see how those benefits are distributed across various subpopulations. You may then ask, Is this a fair and equitable distribution? And if you’re a technologist, you may try to answer that question by applying one or more of the quantitative metrics for fairness unearthed by the growing research on machine learning.
Problems with this approach abound. Perhaps the biggest is that while roughly two dozen quantitative metrics for fairness exist, they are not compatible with one another. You simply cannot be fair according to all of them at the same time.
For example, Northpointe, the maker of COMPAS, the software that provides risk ratings on defendants, replied to charges of discrimination by pointing out that it was using a perfectly legitimate quantitative metric for fairness. More specifically, COMPAS aimed to maximize the rate at which it accurately identified people who would commit new offenses across Black and white defendants. But ProPublica used a different metric: the rate of false positives across Black and white defendants. Northpointe wanted to maximize true positives, while ProPublica wanted to minimize false ones. The issue is, you can’t do both at once. When you maximize true positives, you increase false positives, and when you minimize false positives, you decrease true positives.
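The arithmetic behind this impasse is easy to demonstrate. The sketch below uses made-up confusion-matrix counts (not the real COMPAS data) for two groups with different underlying reoffense rates. It shows how a model can satisfy Northpointe’s metric, equal precision across groups, while failing ProPublica’s, equal false-positive rates.

```python
# Hypothetical confusion-matrix counts for two groups (illustrative only):
# tp = flagged and reoffended, fp = flagged but didn't,
# fn = not flagged but reoffended, tn = not flagged and didn't.
groups = {
    "A": {"tp": 300, "fp": 200, "fn": 100, "tn": 400},  # higher base rate
    "B": {"tp": 150, "fp": 100, "fn": 150, "tn": 600},  # lower base rate
}

for name, c in groups.items():
    precision = c["tp"] / (c["tp"] + c["fp"])   # Northpointe's metric
    fpr = c["fp"] / (c["fp"] + c["tn"])         # ProPublica's metric
    print(f"Group {name}: precision={precision:.2f}, false-positive rate={fpr:.2f}")
```

Both groups score an identical 0.60 precision, yet group A’s false-positive rate (0.33) is more than double group B’s (0.14). When base rates differ across groups, equalizing one metric mathematically forces a gap in the other; no amount of tuning satisfies both.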
Technical tools just aren’t enough here. They can tell you how various tweaks to your AI will result in different scores on different metrics of fairness, but they cannot tell you which metric to use. An ethical and business judgment needs to be made about that, and data scientists and engineers are not equipped to make it. The reason has nothing to do with their character; it’s simply that the vast majority of them have no experience or training in grappling with complex ethical dilemmas. Part of the solution to the problem, then, is to create an AI ethical risk committee with the right expertise and with the authority to have an impact.
The Function and Jurisdiction of an AI Ethics Committee
Your AI ethics committee can be a new entity within your organization or an existing body that you assign responsibility to. And if your organization is large, you might need more than one committee.
At a high level the function of the committee is simple: to systematically and comprehensively identify and help mitigate the ethical risks of AI products that are developed in-house or purchased from third-party vendors. When product and procurement teams bring it a proposal for an AI solution, the committee can confirm that the solution poses no serious ethical risks; recommend changes to it (and, once they’re adopted, give it a second review); or advise against developing or procuring the solution altogether.
One important question you need to examine is how much authority the committee will have. If consulting it isn’t required but is merely advised, only a subset of your teams (and probably a small one) will do so. And only a subset of that subset will take up the committee’s recommendations. This is risky. If being ethically sound is at the top of the pyramid of your company’s values, granting the committee the power to veto proposals is a good idea. That will ensure that it has a real business impact.
In addition, you can reinforce the committee’s work by regularly recognizing employees, both informally (with, say, shoutouts at meetings) and formally (perhaps through promotions) for sincerely upholding and strengthening ethical standards for AI.
Giving the committee real power builds deep trust with the company’s employees, clients, consumers, and other stakeholders, such as the government, especially if the organization is transparent about the committee’s operations, even if not about its exact decisions. However, companies that aren’t ready to grant that kind of authority to an internal committee but are serious about AI ethical risk mitigation can still find a middle ground. They can allow a senior executive, most likely someone in the C-suite, to overrule the committee, which would let their organizations take ethical risks that they consider to be worthwhile.
Who Should Serve on the Committee?
Now it’s time to dive a little deeper into the cross-functional expertise of the members: Who needs to be on your AI ethics committee and why?
Ethicists.
These could be people with PhDs in philosophy who specialize in ethics, say, or people with master’s degrees in the ethics of criminal justice (or whatever your industry is). They aren’t there to render decisions about the company’s ethics, however. They’re there because they have the training, knowledge, and experience needed to understand and spot a vast array of ethical risks, are familiar with concepts and distinctions that aid in clear-eyed ethical deliberations, and are skilled at helping groups objectively assess ethical issues. This is not to say that you need full-time ethicists on staff; rather, you can bring them in and consult them when appropriate.
Because technical tools aren’t enough to solve the problem of bias, what is legally permissible often becomes an important consideration.
Lawyers.
Lawyers, of course, are better equipped than anyone to figure out whether using a particular metric for fairness that has different effects on different subgroups might be viewed as discrimination under the law. But lawyers can also help determine whether using technical tools to assess fairness is even legal. It may well be prohibited by anti-discrimination law, which doesn’t allow data on variables associated with protected classes to be taken into account in a very wide range of decisions.
Business strategists.
The expected financial returns on AI differ from use to use, and so do the business risks (promises have been made to clients, and contracts have been signed). The magnitude and kinds of ethical risks also vary, along with the strategies for addressing them and the investments of time and money those strategies will require.
So what mitigation tactics to take, when to take them, who should execute them, and so on is a business consideration. And while I tend to prioritize identifying and mitigating ethical risk, I must admit that in some cases that risk is small enough and other business risks are big enough that a restrained approach to managing it is reasonable. All of this is why having someone with a firm grip on business necessities on the committee is itself a business necessity.
Technologists.
Though I’ve explained what technologists cannot do, I must also acknowledge what they can: help others understand the technical underpinnings of AI models, the probability of success of various risk mitigation strategies, and whether some of those strategies are even feasible.
For example, using technology to flag possible bias presupposes that your organization has and can use demographic data to determine how a model’s output distributes goods or services across various subpopulations. But if you lack that demographic data or, as happens in financial services, you’re legally barred from collecting it, you’ll be stymied. You’ll have to turn to other strategies—such as creating synthetic data to train your AI. And whether those strategies are technologically possible—and, if so, how heavy a lift they are—is something that only a technologist can tell you. That information must find its way into the deliberations of the committee.
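When the demographic data does exist and may lawfully be used, the first check a technologist would run is straightforward. The sketch below is a minimal, hypothetical version of such an audit; the group labels, records, and approval decisions are invented for illustration and stand in for a model's real outputs joined to demographic labels.

```python
# Minimal sketch of a subgroup outcome audit, assuming you have (and may
# legally use) demographic labels alongside model decisions. All records
# here are hypothetical stand-ins for real model output.
decisions = [
    {"group": "A", "approved": True},  {"group": "A", "approved": True},
    {"group": "A", "approved": False}, {"group": "B", "approved": True},
    {"group": "B", "approved": False}, {"group": "B", "approved": False},
]

def approval_rates(records):
    """Compute the approval rate for each demographic group."""
    totals, approved = {}, {}
    for r in records:
        totals[r["group"]] = totals.get(r["group"], 0) + 1
        if r["approved"]:
            approved[r["group"]] = approved.get(r["group"], 0) + 1
    return {g: approved.get(g, 0) / n for g, n in totals.items()}

rates = approval_rates(decisions)
print(rates)  # group A approves 2 of 3 applicants; group B, 1 of 3
```

A gap between the groups' rates doesn't by itself settle whether the model is unfair, for the reasons discussed above, but without the demographic labels even this first step is impossible, which is exactly the constraint a technologist must surface to the committee.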
Bias scouts and subject matter experts.
Technical bias-mitigation tools measure the output of AI models—after data sets have been chosen and models have been trained. If they detect a problem that cannot be solved with relatively minimal tweaking, you’ll have to go back to the drawing board. Starting mitigation at step one of product development—during data collection and before model training—would be far more efficient and greatly increase your chances of success.
That is why you need people on your committee who might spot biases early in the process. Subject matter experts tend to be good at this. If your AI will be deployed in India, for instance, then an expert on Indian society should weigh in on its development. That person may understand that the way the data was gathered is likely to have undersampled some subset of the population—or that achieving the goal set for the AI may exacerbate an existing inequality in the country.
. . .
A strong artificial intelligence ethics committee is an essential tool for identifying and mitigating the risks of a powerful technology that promises great opportunities. Failing to pay careful attention to how you create that committee and how it gets folded into your organization could be devastating to your business’s reputation and, ultimately, its bottom line.
Editor’s note: Reid Blackman is the author of Ethical Machines: Your Concise Guide to Totally Unbiased, Transparent, and Respectful AI (Harvard Business Review Press, 2022), from which this article is adapted.