NYAI #24: Developing Trust in Artificial Intelligence and Machine Learning for High-Stakes Applications w/ Dr. Kush Varshney (IBM Research AI)

© 2018 International Business Machines Corporation
IBM Research AI
Developing Trust in Artificial Intelligence
and Machine Learning for High-Stakes
Applications
Kush R. Varshney 
krvarshn@us.ibm.com 
http://krvarshney.github.io

© 2018 International Business Machines Corporation IBM Research AI
▪ Pneumonia patients with a history of asthma predicted to have lower mortality risk
than general population

Machine Learning in High-Stakes Applications
▪ Machine learning algorithms have started influencing every part of our lives
– Health and wellness, law and order, commerce, entertainment, finance, human capital management,
communication, transportation, philanthropy, …
– Technological components of larger sociotechnical systems
▪ Many consequential decisions are being supported by machine learning models:
Credit Employment Admission Sentencing

Trust

Maslow’s Hierarchy of Needs
https://mrsomarali.files.wordpress.com/2015/10/maslow1.jpg

Hierarchy of Needs for AI
https://mrsomarali.files.wordpress.com/2015/10/maslow1.jpg
basic accuracy and reliability
safety and security
transparency and trust

• Medical diagnosis
• Prison sentencing
• Loan approval
• (Enterprise AI)
• Streaming services deciding on the
compression level of video packets
• Web portals deciding which news
story to show on top
• Phoneme classification within
speech transcription systems
• (Consumer AI)
Decision Sciences Data Products
Application Domains

• High human cost of errors
• Uncertainty of training set being
representative of test set, and few
predictions made
• Safety and security issues
• Consideration of strategies for
achieving safety and security
beyond basic accuracy
• Quality of service has low human cost
• Large training set, large test set, ability
to explore the feature space, so little
epistemic uncertainty
• Not safety and security issues
• Focus squarely on basic accuracy
Decision Sciences Data Products
Application Domains

• Self-driving cars
• Robotic surgery
• Autonomous weapons
• Humongous state space
• Difficult value alignment
Cyber-Physical Systems
Application Domains

Some of Our Focus Areas – Enterprise AI vs. Consumer AI
▪ Transferability
▪ Learning + Reasoning
▪ Safety and Security

Agenda
▪ Formal definition of safety in machine learning
▪ Strategies for achieving safety
▪ Details on interpretable machine learning
▪ Details on algorithmic fairness
– AI Fairness 360
▪ Transparency via supplier’s declarations of conformity (SDoCs)
▪ Conclusion

K. R. Varshney and H. Alemzadeh, “On the Safety of Machine Learning:
Cyber-Physical Systems, Decision Sciences, and Data Products,” Big
Data, vol. 5, no. 3, p. 246–255, Sept. 2017.

Safety
▪ Commonly used term across engineering disciplines connoting the absence of
failures or conditions that render a system dangerous (Ferrell, 2010)
– Safe food and water, safe vehicles and roads, safe medical treatments, safe toys, safe
neighborhoods, safe industrial plants, …
▪ Each domain has specific design principles and regulations applicable only to it
▪ Few works attempt a precise definition applicable broadly
▪ Definition based on harm, risk, and epistemic uncertainty (Möller, 2012)

Harm
▪ A system yields an outcome based on its state and the inputs it receives
▪ The outcome event may be desired or undesired
▪ Outcomes have associated costs that can be measured and quantified by society
▪ An undesired outcome is a harm if its cost exceeds some threshold
▪ Unwanted events of small severity are not counted as safety issues

Risk
▪ We do not know what the outcome will be, but its distribution is known and we can
calculate the expectation of its cost
▪ Risk is the expected value of the cost of harm

Epistemic Uncertainty
▪ We still do not know what the outcome will be, but in contrast to risk, its probability
distribution is also unknown
▪ Epistemic uncertainty, in contrast to aleatoric uncertainty, results from lack of
knowledge that could be obtained in principle, but may be practically intractable to
gather
▪ Some decision theorists argue that all uncertainty can be captured probabilistically,
but we maintain the distinction between risk and uncertainty, following Möller
(2012)

Safety
▪ Safety is the reduction or minimization of risk and epistemic uncertainty of harmful
events
▪ Costs have to be sufficiently high in some human sense for events to be harmful
▪ Safety involves reducing both the probability of expected harms and the possibility
of unexpected harms

Risk Minimization in Statistical Machine Learning
▪ Risk minimization is the basis of statistical machine learning theory and practice
– Features and labels with probability density
– Function mapping
– Loss function
– Find to minimize risk
▪ Given i.i.d. training samples, not
– Empirical risk minimization
– converges to uniformly for all as goes to infinity (Glivenko-Cantelli)
▪ When is small, minimizing may not yield an with small
– Restrict complexity of based on some inductive bias (Vapnik, 1992)

Epistemic Uncertainty in Machine Learning
▪ Risk minimization has many strengths but does not capture uncertainty
▪ Not always the case that training samples are drawn from true underlying
probability distribution of
– The distribution the samples come from cannot always be known
Precludes covariate shift and domain adaptation techniques
– Training on a data set from a different distribution can cause much harm
▪ Even when drawn from true distribution, training samples may be absent from
large parts of due to small probability density there

Epistemic Uncertainty in Machine Learning
▪ Statistical learning theory analysis utilizes laws of large numbers to study the effect
of finite training data and the convergence of to
▪ In considering safety, should also be cognizant that in deployment, a machine
learning system only encounters a finite number of test samples
– The actual operational risk is an empirical quantity of the test set
– This operational risk may be much larger than the true risk for small cardinality test sets, even if is
risk-optimal
– This uncertainty caused by the instantiation of the test set can have large safety implications on
individual test samples

Loss Functions in Machine Learning
▪ In risk minimization, the domain of the loss function is and the range is an abstract
quantity representing prediction error
▪ In real-world applications, the value of the loss function may be endowed with
some human cost
– That human cost may imply a loss function that also includes in the domain
▪ The cost may be severe enough to be harmful and thus a safety issue in some
parts of the domain and not in others

Value Alignment
▪ Obtaining is non-trivial and context-dependent
▪ Eliciting the values of society
▪ Encapsulation of morals

Four Categories of Strategies for Achieving Safety
▪ Möller and Hansson (2008) have identified four main
categories of approaches to achieve safety
1. Inherently safe design
– Exclusion of a potential hazard from the system, e.g. excluding
hydrogen from buoyant material of blimp
2. Safe fail
– System remains safe when it fails in its intended operation, e.g.
dead man’s switches
3. Safety reserves
– Safety factors and safety margins, e.g. vessel wall is designed
thicker than the largest pressure it will experience by some ratio
4. Procedural safeguards
– Audits, training, posted warnings

Inherently Safe Design in Machine Learning
▪ Want robustness against the uncertainty of the training set not being sampled from the test
distribution
– The training set may have various quirks or biases that are unknown to the user and that will not be present
during the test phase
– Highly complex modeling techniques used today, including extreme gradient boosting and deep neural networks,
may pick up on those data vagaries in the learned models they produce to achieve high accuracy, but might fail
due to an unknown shift in the data domain
– The models are so complex that it is very difficult to understand how they will react to such shifts and whether
they will produce harmful outcomes as a result
▪ Interpretability of models and causality of predictors
– Extra constraints on
▪ May be performance loss in accuracy by doing so when measuring accuracy with a
common training and testing data probability distribution, but the reduction in epistemic
uncertainty by doing so increases safety

Safe Fail in Machine Learning
▪ When predictions cannot be given confidently, a reject option is used
– A human operator intervenes and provides a manual prediction
▪ Models are reported to be least confident and in the reject regime near the
decision boundary in classification problems
– Implicitly assumed that distance from the boundary is inversely related to confidence
– Reasonable in parts of with high probability density and many training samples
▪ In parts of with low density and no training samples, decision boundary may be
completely based on inductive bias with much epistemic uncertainty
– Distance from boundary as a trigger for reject option is meaningless
▪ For a rare combination of features in a test sample, always elect manual
examination (Attenberg et al., 2015)

Safety Reserves in Machine Learning (Uncertainty)
▪ Parametrize all uncertainty using
– Uncertainty in training set being from true distribution
Uncertain class priors, label noise, etc.
– Uncertainty in instantiation of test set
Do not care as much about average test error as maximum test error in certain applications
▪ Let the risk of the risk-optimal model if were known be
▪ Robust formulations find while constraining or minimizing either:

Safety Reserves in Machine Learning (Loss Function)
▪ Fairness
– The risk of harm for members of protected groups should not be much worse than the risk of harm for
others
– Feature space partition contains unprivileged group members (e.g. defined by caste or gender)
– Feature space partition contains priviliged group members
▪ Disparate impact constraint (Feldman et al., 2015)

Procedural Safeguards in Machine Learning
▪ Defining the training set and setting up evaluation procedures have certain
subtleties that can cause harm if done incorrectly
– User experience design can be used to guide and warn users and thereby increase safety
▪ Open source software and open data allows for the possibility of public audit
– Safety hazards can be discovered through public examination

Three Dimensions of Explainability 
One explanation does not fit all: There are many ways to explain things
directly interpretable vs. post hoc interpretation
global (model-level) vs. local (instance-level)
static vs. interactive (visual analytics)
The oldest AI formats, such as decision rule sets,
decision trees, and decision tables are simple enough
for people to understand. Supervised learning of these
models is directly interpretable.
Start with a black box model and probe into it with a
companion model to create interpretations. The black box
model continues to provide the actual prediction while
interpretation improve human interactions.
Show the entire predictive model to the user to help
them understand it (e.g. a small decision tree,
whether obtained directly or in a post hoc manner).
Only show the explanations associated with individual
predictions (i.e. what was it about the features of this
particular person that made her loan denied).
The interpretation is simply presented to the user. The user can interact with interpretation.

D. M. Malioutov and K. R. Varshney, “Exact Rule Learning via Boolean
Compressed Sensing,” Int. Conf. Machine Learning, p. 765-773, Jun. 2013.

Directly Interpretable Supervised Binary Classification
▪ Binary classification: special case of supervised learning when
▪ As discussed, different inductive biases lead to different ℋ
▪ The ℋ containing only sparse rule-based classifiers is interpretable (Freitas, 2014)
▪ Interpretability implies inherently safe design

Rule-Based Classifiers
▪ Single rule is an AND clause of a few Boolean terms
▪ Predict Federer to defeat Murray if he:
– Wins more than 59% of 4 to 9 shot rallies AND
– Wins more than 78% of points when serving at 30-30 or Deuce AND
– Serves less than 20% of serves into the body
▪ Rule set is a collection of single rules

Challenges of Rule Learning
▪ Finding compact decision rules involving few Boolean terms that best approximate
a given data set is an NP hard combinatorial optimization problem
▪ Most existing approaches for finding maximize criteria such as information gain,
support, confidence, lift, Gini impurity, etc.
– Decision trees, decision lists, RIPPER, SLIPPER, etc.
– Greedy heuristics with ad hoc pruning
▪ Renewed interest in rule learning driven by optimizing a principled objective, but
which retains interpretability (Rudin, 2015)

Group Testing Problem
▪ Discover a sparse subset of faulty items in a large set of mostly good items
using a few pooled tests
– Blood screening of large groups of army recruits
– Computational biology
– Fault discovery in computer networks
▪ Mix together the blood of several recruits
– If test is negative, none of the recruits are diseased
– If test is positive, at least one of the recruits is diseased
– Logical OR operation
▪ Construct the pools in an intelligent way to require a small number of tests with
perfect recovery of diseased individuals

Rule Learning as Group Testing
▪ Standard supervised binary classification problem
– with features and Boolean labels
▪ Construct individual Boolean clauses from features , for
– Months since promoted > 13
– Compensation plan == quota-based
– For continuous dimensions of X, make comparisons to set of thresholds
▪ Calculate the truth value of each Boolean term for each training sample to
construct an truth table matrix with entries

Rule Learning as Group Testing (continued)
▪ The positive training samples are now equivalent to diseased pools of army
recruits
▪ Determine an Boolean coefficient vector that specifies which Boolean terms to
OR together in a decision rule to recover the positive samples
▪ Learn so that , where Boolean notation means:
▪ AND rules are more natural than OR rules for people
– DeMorgan’s laws

Discussion
▪ Efficient optimization by relaxation with provable guarantees
▪ Achieve competitive accuracy and high interpretability

Selected 
Recent Explainability
Innovations from IBM
Research AI
Global, Directly Interpretable
Boolean Decision Rules via Column Generation
NIPS 2018 (accepted). Dash, Gunluk, Wei.
Global, Post-Hoc
Improving Simple Models with Confidence Profiles
NIPS 2018 (accepted). Dhurandhar, Shanmugam, Luss, Olsen.
Local, Directly Interpretable
Teaching Meaningful Explanations
AAAI 2019 (submitted). Codella, Hind, Ramamurthy, Campbell, Dhurandhar, Varshney, Wei, Mojsilovic.
Local, Post-Hoc
Explanations Based on the Missing: Towards Contrastive Explanations with Pertinent
Negatives
NIPS 2018 (accepted). Dhurandhar, Chen, Luss, Tu, Ting, Shanmugam, Das.
Interactive Model Visualization
Seq2Seq-Vis: A Visual Debugging Tool for Sequence-to-Sequence Models
IEEE VAST 2018 (accepted). Strobelt, Gehrmann, Behrisch, Perer, Pfister, Rush.

Global, Post-Hoc
Improving Simple Models with Confidence Profiles
NIPS 2018. Dhurandhar, Shanmugam, Luss, Olsen.
Domain experts preference: SMEs many times want to use a model they are comfortable
with and trust.
Small Data settings: Small client data where complex neural networks might overfit so
simple models are preferred. However, possible to improve performance if we have a
pretrained network on a large public or private corpora belonging to the same feature space.
Resource Constrained Settings: In memory and power constrained settings only small
models can be deployed. Here improving small neural networks with the help of larger
neural network can be useful.
Challenge: Transfer information from complex models with higher accuracy to the decision
trees in a manner that enhances performance and adds valuable insight.
Method: Find area under curve (AUC) based on confidence scores at each probe for each
example based on complex neural network. Weight each training example by corresponding
AUC (or optimal weighting procedure) and retrain simple model such as a decision tree.

Local, Directly Interpretable 
 
 
Teaching Meaningful Explanations
AAAI 2019 (submitted). Codella, Hind, Ramamurthy, Campbell, Dhurandhar, Varshney, Wei, Mojsilovic.
• Ask SME to provide explanations (e) with their training
data (x, y, e)
• Train ML algorithm on features to jointly predict label (y)
and explanations (e)
• Apply classifier on new features inputs to produce label
(y) and explanation (e)

Local, Post-Hoc
Explanations Based on the Missing: Towards Contrastive Explanations with Pertinent Negatives
NIPS 2018. Dhurandhar, Chen, Luss, Tu, Ting, Shanmugam, Das.
3
• Current methods focus mostly on what is present
based on positive and negative relevance.
• Our method highlights what should be minimally
sufficiently present and necessarily absent.

Interactive Model Visualization 
 
 
Seq2Seq-Vis: A Visual Debugging Tool for Sequence-to-Sequence Models
IEEE VAST 2018. Strobelt, Gehrmann, Behrisch, Perer, Pfister, Rush.

Unwanted bias and algorithmic fairness 
Machine learning, by its very nature, is always a form of statistical discrimination
Discrimination becomes
objectionable when it places certain
privileged groups at systematic
advantage and certain unprivileged
groups at systematic disadvantage
Illegal in certain contexts

Unwanted bias and algorithmic fairness 
Machine learning, by its very nature, is always a form of statistical discrimination
Unwanted bias in training data
yields models with unwanted bias
that scale out
Prejudice in labels
Undersampling or oversampling

Fairness in building and deploying models
(d’Alessandro et al., 2017)

Metrics, Algorithms
dataset
metric
pre-
processing
algorithm in-processing
algorithm
post-
processing
algorithm
classifier
metric

Metrics, Algorithms, and Explainers
dataset
metric
pre-
processing
algorithm in-processing
algorithm
post-
processing
algorithm
classifier
metric
classifier
metric
explainer
dataset
metric
explainer

21 (or more) definitions of fairness 
Group and individual fairness
Disparate impact, statistical parity, equality of odds, equality of opportunity
There is no one definition of fairness applicable in all contexts
Some definitions even conflict

Bias mitigation is not easy 
Cannot simply drop protected attributes because features are correlated with them

(Hardt, 2017)
Research 
Algorithmic fairness is one of the hottest topics in the ML/AI research community

F. P. Calmon, D. Wei, B. Vinzamuri, K. N. Ramamurthy, and K. R. Varshney,
“Optimized Data Pre-Processing for Discrimination Prevention,” Advances in
Neural Information Processing Systems, Dec. 2017.

Pre-Processing for Flexible Discrimination Prevention
(Biased) Data Model Decision
Pre-processing
In-processing
Post-processing
Published Data

Our Approach
▪ Given dataset containing
• – discriminatory (protected) variables
• – other (non-protected) variables
• – outcome
▪ Learn optimized (randomized) mapping
▪ Map both train and test data before model
Race Age Degree Priors Recid
Black >45 Misd. 0 N
White 25-45 Felony 0 N
Black <25 Felony 1-3 Y
White >45 Misd. 0 N
Race Age Degree Priors Recid
Black >45 Felony 0 N
White >45 Felony 0 N
Black <25 Felony 1-3 N
White >45 Misd. 0 N

Optimization Criteria
1. Group discrimination
Control dependence of
transformed outcome on
3. Utility preservation
Retain joint distribution so model
can still learn task
2. Individual distortion
Avoid large changes in individual
features
𝑥, 𝑦
^𝑥, ^𝑦
𝛿
𝑑1 𝑑2

Original COMPAS Dataset
re-commit crime
race
red = misdemeanor 
blue = felony

Pre-Processed COMPAS Dataset
race
red = misdemeanor 
blue = felony
re-commit crime

Interpretable Classifier Learned from Original Dataset
prior crimes
race
> 0
predicted not to  
re-offend
= 0
prior crimes age
Caucasian
African-American
re-offend
predicted to  
re-offend
> 3 ≤ 3
re-offend
predicted to  
re-offend
< 45 ≥ 45

Interpretable Classifier Learned from Pre-Processed Dataset
prior crimes
predicted to  
re-offend
> 3
re-offend
≤ 3

Balanced Accuracy vs. Fairness
For a classifier trained with original training data, at the
best classification rate, this measure is quite high. This
implies unfairness.
abs(1-disparate impact) must be close to zero (typically < 0.2) for classifier predictions to be fair.
For a classifier trained with transformed training data,
at the best classification rate, the measure drops
significantly, implying bias mitigation.

P. Sattigeri, S. C. Hoffman, V. Chenthamarakshan, and K. R. Varshney,
“Fairness GAN,” AAAI 2019, submitted.

Our Approach
An auxiliary classifier generative adversarial network (AC-GAN) with one main twist: try
to predict the protected attribute as poorly as possible from the features

1. Joint Conditional Generative
Model
Learn a generative model that
matches the true distribution .
3. GAN Formulation
Posed as an adversarial learning
problem and solved using novel
GAN formulation
2. Label and Sensitive
Attribute Independence
Maximization
Make outcome and sensitive
variable () independent in the
learned distribution.
Discriminator Training:
Generator Training:
Fairness GAN

Fairness GAN

05/03/18 Facebook says it has a tool to detect bias in its artificial intelligence Quartz
05/25/18 Microsoft is creating an oracle for catching biased AI algorithms MIT Technology Review
05/31/18 Pymetrics open-sources Audit AI, an algorithm bias detection tool VentureBeat
06/07/18 Google Education Guide to Responsible AI Practices – Fairness Google
06/09/18 Accenture wants to beat unfair AI with a professional toolkit TechCrunch

Fairness Measures
Framework to test given algorithm on variety of datasets and fairness
metrics
https://github.com/megantosh/
fairness_measures_code
Fairness
Comparison
Extensible test-bed to facilitate direct comparisons of algorithms with
respect to fairness measures. Includes raw & preprocessed datasets
https://github.com/algofairness/
fairness-comparison
Themis-ML
Python library built on scikit-learn that implements fairness-aware
machine learning algorithms
https://github.com/cosmicBboy/themis-
ml
FairML
Looks at significance of model inputs to quantify prediction
dependence on inputs
https://github.com/adebayoj/fairml
Aequitas
Web audit tool as well as python lib. Generates bias report for given
model and dataset
https://github.com/dssg/aequitas
Fairtest
Tests for associations between algorithm outputs and protected
populations
https://github.com/columbia/fairtest
Themis
Takes a black-box decision-making procedure and designs test cases
automatically to explore where the procedure might be exhibiting
group-based or causal discrimination
https://github.com/LASER-UMASS/
Themis
Audit-AI
Python library built on top of scikit-learn with various statistical tests
for classification and regression tasks
https://github.com/pymetrics/audit-ai

AI Fairness 360  Differentiation
Datasets
Toolbox
Fairness metrics (30+)
Fairness metric explanations
Bias mitigation algorithms (9+)
Guidance
Industry-specific tutorials
Comprehensive bias mitigation toolbox (including
unique algorithms from IBM Research)
Several metrics and algorithms that have no available
implementations elsewhere
Extensible
Designed to translate new research from the lab to
industry practitioners 
(e.g. scikit-learn’s fit/predict paradigm)

Supplier’s Declarations of Conformity
▪ A common factsheet voluntarily released with products in many different industries
▪ Intended use, provenance, etc.
▪ Reliability, safety, and security test results
▪ Transparency for the final step of achieving trust
▪ Information asymmetry / market for lemons

Supplier’s Declarations of Conformity
▪ What is the intended use of the service output?
▪ What algorithms or techniques does this service implement?
▪ Which datasets was the service tested on? (Provide links to datasets that were used for testing, along with corresponding
datasheets.)
▪ Describe the testing methodology.
▪ Describe the test results.
▪ Are you aware of possible examples of bias, ethical issues, or other safety risks as a result of using the service?
▪ Are the service outputs explainable and/or interpretable?
▪ For each dataset used by the service: Was the dataset checked for bias? What efforts were made to ensure that it is fair and
representative?
▪ Does the service implement and perform any bias detection and remediation?
▪ What is the expected performance on unseen data or data with different distributions?
▪ Was the service checked for robustness against adversarial attacks?
▪ When were the models last updated?

Summary and Conclusion
▪ Considerations beyond accuracy in enterprise AI
▪ Very basic definition of safety in terms of harm, risk, uncertainty
▪ Minimization of epistemic uncertainty is missing from standard modes of machine
learning developed around risk minimization
▪ Research agenda motivated by trust
– Risk minimization close to the limits of what is achievable
– Epistemic uncertainty minimization has new and exciting problems
– Some examples of strategies for increasing safety in machine learning, but far from complete
– Interpretability, fairness, SDoCs
– Open source toolbox: AI Fairness 360 (aif360)

http://aif360.mybluemix.net 
https://github.com/ibm/aif360 
https://pypi.org/project/aif360

NYAI #24: Developing Trust in Artificial Intelligence and Machine Learning for High-Stakes Applications w/ Dr. Kush Varshney (IBM Research AI)

Recommandé

Recommandé

Contenu connexe

Tendances

Tendances (20)

Similaire à NYAI #24: Developing Trust in Artificial Intelligence and Machine Learning for High-Stakes Applications w/ Dr. Kush Varshney (IBM Research AI)

Similaire à NYAI #24: Developing Trust in Artificial Intelligence and Machine Learning for High-Stakes Applications w/ Dr. Kush Varshney (IBM Research AI) (20)

Plus de Maryam Farooq

Plus de Maryam Farooq (15)

Dernier

Dernier (20)

NYAI #24: Developing Trust in Artificial Intelligence and Machine Learning for High-Stakes Applications w/ Dr. Kush Varshney (IBM Research AI)