How Can Your Organization Manage AI Model Biases?

I’ve been reading the interesting and soul-searching (from a
data scientist perspective) book from Cathy O’Neil titled
“Weapons of Math Destruction”, or WMD as it is called in the book.
The book provides several real-world examples of how Big Data and
Data Science – when not properly structured – can lead to
ethically wrong unintended consequences.

Chapter 3 “Arms Race: Going to College” describes how the
college ranking system developed by “US News & World
Report” in 1983 has created its own self-fulfilling, misaligned
ecosystem. Because of the influence the “US News & World
Report” ranking has on the multi-billion-dollar college
recruiting business, a few key metrics – SAT scores,
student-teacher ratios, acceptance rates, alumni donations,
freshman retention – get overvalued in colleges’ investment
strategies. 

The unintended consequence is that many colleges focus their
investments on overly opulent facilities and overpaid research
faculty programs in an effort to increase their ranking, sometimes
at the expense of a more holistic “quality education and
enlightening personal experience” for their students.

But the “US News & World Report” ranking is deeply flawed
by the omission of several critical metrics. For example, the
ranking doesn’t consider price. If cost is not an issue for
someone deciding to go to college, then that’s okay. But for the
other 99% of us, cost is an important factor in determining a
“quality” educational experience.

And that’s the challenge with AI model biases: if you don’t
carefully consider the different variables and metrics against
which you need to measure model progress and success, you may end
up with AI models that deliver ethically wrong unintended
consequences.

So, how does one mitigate the negative impacts of models that
are supposed to represent the real world but actually provide a
dangerously biased and skewed perspective on that world? Here are
two things every organization can do to reduce the ethically wrong
unintended consequences caused by AI models that turn into
“Weapons of Math Destruction”:

  1. Brainstorm – across a diverse group of stakeholders – the
    “diverse, sometimes conflicting set of metrics” that drives the
    AI Utility Function
  2. Thoroughly explore and quantify the AI model costs of False
    Positives and False Negatives

#1. Brainstorm Diverse Set of Metrics to Power AI Utility Function

One way to avoid AI models that deliver unintended consequences
is to invest the time upfront to brainstorm a “diverse, sometimes
conflicting set of metrics” against which the AI model will seek
to optimize. This means embracing a diverse set of stakeholders (a
stakeholder map can help to identify the different stakeholders who
either impact or are impacted by the AI model) who can provide a
diverse set of perspectives on how best to measure the AI model’s
progress and success.

To understand why it’s important to capture a diverse and
sometimes conflicting set of metrics against which the AI model
must seek to optimize, one needs to understand how an AI model (AI
Agent) works (see Figure 1):

  1. The AI model relies upon the creation of an “AI Agent” that
    interacts with the environment to learn, where learning is guided
    by the definition of the rewards and penalties associated with
    actions taken by the AI Agent.
  2. The rewards and penalties against which the “AI Agent” seeks
    to take the “right” or optimal actions are framed by the
    definition of value as represented in the AI Utility Function.
  3. In order to create an “AI Agent” that makes the “right”
    decision, the AI Utility Function must be composed of a holistic
    definition of “value” including financial/economic, operational,
    customer, society, environmental and spiritual value (a minimal
    sketch of this agent-and-utility-function loop follows below).
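
To make this concrete, here is a minimal, runnable Python sketch (not
pulled from any production AI system) of an AI Agent whose rewards and
penalties come from an AI Utility Function that scores more than just
financial value. The actions, outcome values and weights are purely
illustrative assumptions.

```python
# A minimal sketch, assuming hypothetical actions, outcomes and weights, of an
# AI Agent learning "right versus wrong" from a multi-dimensional Utility Function.
import random
from collections import defaultdict

# Each action produces outcomes across several value dimensions, not just financial.
ACTIONS = {
    "aggressive_recruiting": {"financial": 0.9, "customer": -0.2, "societal": -0.4},
    "balanced_investment":   {"financial": 0.5, "customer": 0.4,  "societal": 0.3},
    "cost_cutting":          {"financial": 0.7, "customer": -0.5, "societal": -0.1},
}

# The AI Utility Function: a weighted, holistic definition of "value".
WEIGHTS = {"financial": 0.4, "customer": 0.35, "societal": 0.25}

def utility(outcome: dict) -> float:
    return sum(WEIGHTS[dim] * outcome.get(dim, 0.0) for dim in WEIGHTS)

# A simple epsilon-greedy agent: it explores occasionally and otherwise exploits
# whichever action has earned the highest average reward (utility) so far.
value_estimates = defaultdict(float)
counts = defaultdict(int)

for _ in range(1000):
    if random.random() < 0.1:
        action = random.choice(list(ACTIONS))                     # explore
    else:
        action = max(ACTIONS, key=lambda a: value_estimates[a])   # exploit
    reward = utility(ACTIONS[action]) + random.gauss(0, 0.05)     # noisy reward/penalty
    counts[action] += 1
    value_estimates[action] += (reward - value_estimates[action]) / counts[action]

print("Action the agent learns to prefer:", max(value_estimates, key=value_estimates.get))
```

With this holistic weighting the agent settles on the balanced action;
drop the customer and societal terms from WEIGHTS and it will happily
learn the more extractive behaviors instead.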


Figure 1: Role of AI Agents and Continuously Learning and Adapting

Bottom-line: the AI Agent determines or learns “right versus
wrong” based upon the definition of value as articulated in the AI
Utility Function. The AI Utility Function provides the metrics
against which the AI model will learn the right actions to take in
which situations (see Figure 2).


Figure 2: AI Utility Function

To avoid the unintended consequences of a poorly constructed AI
Utility Function, collaboration with a diverse set of stakeholders
is required to identify the short-term and long-term metrics and
KPIs against which AI model progress and success will be measured.
The short-term and long-term metrics associated with the
financial/economic, operational, customer, society, environmental
and spiritual dimensions must be carefully weighed if we are to
make AI work to the benefit of all stakeholders (and maybe avoid
those pesky Terminators in the process).
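
To see why that careful weighing matters, here is a small sketch
showing how the very same candidate decisions get ranked differently
depending on whether the Utility Function reflects only short-term
financial value or a more holistic, stakeholder-negotiated definition
of value. The decisions, metric values and weights are invented for
illustration.

```python
# Sketch only: hypothetical decisions, metrics and weights showing how the choice
# of Utility Function weights changes what the model judges as the "right" action.
CANDIDATES = {
    "decision_A": {"short_term_profit": 0.9, "long_term_customer_value": -0.3, "environmental": -0.6},
    "decision_B": {"short_term_profit": 0.4, "long_term_customer_value": 0.6,  "environmental": 0.2},
}

def score(metrics: dict, weights: dict) -> float:
    return sum(weights.get(name, 0.0) * value for name, value in metrics.items())

# A narrow, finance-only Utility Function...
narrow = {"short_term_profit": 1.0}
# ...versus one that a diverse set of stakeholders might negotiate.
holistic = {"short_term_profit": 0.4, "long_term_customer_value": 0.4, "environmental": 0.2}

for label, weights in [("narrow", narrow), ("holistic", holistic)]:
    best = max(CANDIDATES, key=lambda c: score(CANDIDATES[c], weights))
    print(f"{label} utility function prefers {best}")
# The narrow function rewards decision_A; the holistic one rewards decision_B.
```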

To help brainstorm this diverse set of metrics, embrace the
“Thinking Like a Data Scientist” methodology, which is designed
to drive the cross-organizational collaboration necessary to root
out and brainstorm these different metrics. The “Thinking Like a
Data Scientist” process guides the incorporation of “diverse,
sometimes conflicting metrics” into the data science modeling work
because the real world is full of “diverse, sometimes conflicting
metrics” against which the world must try to optimize (see Figure 3).


Figure 3: The Art of Thinking Like a Data Scientist

A key deliverable from the “Thinking Like a Data Scientist”
process is the Hypothesis Development Canvas. The Hypothesis
Development Canvas helps in the identification of the variables and
metrics against which one is going to measure the targeted use
case’s progress and success. For example: increase financial value,
while reducing operational costs and risks, while improving customer
satisfaction and likelihood to recommend, while improving societal
value and quality of life, while reducing environmental impact and
carbon footprint (see Figure 4).


Figure 4: Hypothesis Development Canvas

The AI modeling requirements captured in the Hypothesis
Development Canvas then need to be translated into the AI Utility
Function that guides the metrics and variables against which the AI
model will seek to optimize. Shortcutting the process to define the
measures against which to monitor any complicated business
initiative is naïve…and could ultimately be dangerous depending
upon the costs associated with False Positives and False
Negatives. 
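
As a rough sketch of that translation step (the field names and
values are hypothetical placeholders, since the real content comes
out of the stakeholder collaboration described above), the Hypothesis
Development Canvas might feed the AI Utility Function something like
this:

```python
# A hedged sketch: hypothetical canvas fields and values feeding the Utility Function.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class CanvasMetric:
    name: str        # metric identified with the stakeholders
    direction: str   # "maximize" or "minimize"
    weight: float    # relative importance agreed during the canvas work

@dataclass
class HypothesisCanvas:
    use_case: str
    metrics: List[CanvasMetric]
    cost_false_positive: float   # cost of acting when we should not have (see #2 below)
    cost_false_negative: float   # cost of failing to act when we should have (see #2 below)

def utility_weights(canvas: HypothesisCanvas) -> Dict[str, float]:
    """Turn canvas metrics into signed weights the AI Utility Function will optimize."""
    return {m.name: (m.weight if m.direction == "maximize" else -m.weight)
            for m in canvas.metrics}

canvas = HypothesisCanvas(
    use_case="hypothetical: reduce unplanned hospital readmissions",
    metrics=[
        CanvasMetric("financial_value", "maximize", 0.30),
        CanvasMetric("operational_risk", "minimize", 0.20),
        CanvasMetric("customer_satisfaction", "maximize", 0.25),
        CanvasMetric("environmental_impact", "minimize", 0.25),
    ],
    cost_false_positive=1_000.0,
    cost_false_negative=50_000.0,
)

print(utility_weights(canvas))
```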

#2. Codifying the Costs Associated with False Positives and False Negatives

Unintended consequences can easily occur with an AI model if a
thorough, comprehensive exploration of “what could go wrong”
isn’t conducted prior to building the AI model, with those costs
then integrated into the AI Utility Function. And that brings us
into the realm of Type I and Type II errors, or False Positives and
False Negatives.

  • A Type I Error, or False Positive, occurs when asserting
    something as true when it is actually false. This false positive
    error is basically a “false alarm” – a result that indicates a
    given condition has been fulfilled when it actually has not been
    fulfilled (i.e., a positive result has been erroneously assumed).
  • A Type II Error, or False Negative, occurs when a test result
    indicates that a condition has failed when in reality the
    condition was successful. A Type II Error or False Negative
    occurs when we fail to believe that something (like someone being
    sick, or a part being about to break) is a true condition (see
    the sketch after this list).
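
To make those definitions concrete, here is a tiny sketch with
made-up labels (1 means the condition is present, 0 means it is
absent) that counts Type I and Type II errors from a set of model
predictions:

```python
# Made-up labels for illustration: 1 = condition present, 0 = condition absent.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 1, 0, 1, 0, 1, 1, 0]

# Type I error (False Positive): we assert the condition is true when it is not.
false_positives = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
# Type II error (False Negative): we fail to detect a condition that is actually true.
false_negatives = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

print(f"Type I errors (False Positives):  {false_positives}")   # 2
print(f"Type II errors (False Negatives): {false_negatives}")   # 1
```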

I think most folks struggle to understand Type I (False
Positive) and Type II (False Negative) errors, which is why Figure 5
summarizes them so nicely (he-he-he).


Figure 5: Understanding Type I (False Positive) and Type II (False Negative) Errors

In Figure 5, a Type I Error (False Positive) occurs when the
doctor tells the man that he is pregnant, when obviously he can’t
be. The Type II Error (False Negative) occurs when the doctor
tells the woman that she is NOT pregnant when visual inspection
confirms that she is.

Let’s look at the costs of False Positives and False Negatives
using a real-world COVID-19 example. With respect to COVID-19, when
one has incomplete data and is trying to buy time in order to get
more complete, accurate and trusted data through testing, the best
thing one can do is to make decisions based upon the costs of the
False Positives and False Negatives. In the case of COVID-19, that
means:

  • The cost of a False Positive is that a healthy person will be
    quarantined and will be one of the first to receive the vaccine
    when it becomes available. The costs of being wrong in this case
    are the costs associated with being quarantined, such as lost
    wages and the inconvenience of the quarantine itself. The cost
    of the False Positive in this case is very low.
  • On the other hand, the cost of a False Negative is that an
    infected person is classified as healthy and continues to mingle
    in public, infecting others and potentially even leading to the
    deaths of others. The cost of the False Negative in this case is
    very high (see the sketch below).
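
Here is a hedged sketch of that reasoning; the infection
probabilities and dollar costs below are illustrative assumptions
only, but they show how wildly asymmetric False Positive and False
Negative costs can drive the decision even when the data is
incomplete:

```python
# Illustrative assumptions only: the probabilities and costs are not real estimates.
COST_FALSE_POSITIVE = 1_000      # quarantine a healthy person: lost wages, inconvenience
COST_FALSE_NEGATIVE = 100_000    # miss an infected person: further spread, possible deaths

def expected_cost_if_quarantined(p_infected: float) -> float:
    # If we quarantine, we are only "wrong" when the person is healthy (False Positive).
    return (1 - p_infected) * COST_FALSE_POSITIVE

def expected_cost_if_not_quarantined(p_infected: float) -> float:
    # If we do not quarantine, we are only "wrong" when the person is infected (False Negative).
    return p_infected * COST_FALSE_NEGATIVE

for p in (0.01, 0.05, 0.20):
    quarantine = expected_cost_if_quarantined(p) < expected_cost_if_not_quarantined(p)
    print(f"P(infected) = {p:.2f} -> quarantine? {quarantine}")
# Because the False Negative cost dwarfs the False Positive cost, even a small
# probability of infection tips the decision toward quarantining.
```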

See the blog “Using Confusion Matrices to Quantify the Cost of
Being Wrong” for more homework on understanding the costs
associated with False Positives and False Negatives. Maybe some of
you can share this blog with some of our elected officials…

Summary

Any time you see a very complex, multi-faceted decision that has
been boiled down to a single number…WATCH OUT! Creating a single
number against which to monitor any complicated business initiative
is naïve. Baseball, for example, leverages a bevy of numbers and
metrics to determine the value of a particular player, and many of
those numbers and metrics – such as Wins above Replacement,
Offensive Wins above Replacement, Offensive Runs above Average and
W-L Percentage of Offensive Wins above Average – are complex,
composite metrics that are comprised of additional data and metrics.

In a world more and more driven by AI models, Data Scientists
cannot effectively ascertain on their own the costs associated with
the unintended consequences of False Positives and False Negatives.
Mitigating unintended consequences requires collaboration across a
diverse set of stakeholders in order to identify the metrics
against which the AI Utility Function will seek to optimize. And
these metrics need to represent multiple, sometimes conflicting
objectives including financial/economic, operational, customer,
society, environmental and spiritual objectives. And again,
determining the metrics that comprise the AI Utility Function is
not a Data Scientist’s job alone, unless, of course, you don’t mind
herds of Terminators roaming the local mall (I hear that they like
sunglasses).