Fairness in Machine Learning
Delip Rao
Every ML practitioner's dream scenario
Well-defined evaluation objective
Lots of clean data
Rich data with lots of attributes
Incorporating Ethnicity improves Engagement Metrics
But should you do it?
Goldstein, the “computer expert”
Dramatic Changes in Machine Learning Landscape
Rise of fast/cheap data collection, processing
Rise of popular, easy-to-use tools
Rise of Data Scientist Factories
Two Questions
Should everything that can be predicted, be predicted?
If you really have to predict, what should you be aware of?
“Catalog of Evils”
Dwork et al 2011
[Figure: the population, with the protected class S and its complement S^c]
Blatant Explicit Discrimination
Feature4231:Race=’Black’
Discrimination Based on Redundant Encoding
Feature4231:Race=’Black’
Features = {‘loc’, ‘income’, ..}
Polynomial kernel with degree 2
Feature6578:Loc=’EastOakland’^Income=’<10k’
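The redundant-encoding point can be sketched in a few lines. This is a minimal illustration, not the deck's own code: it assumes one-hot (0/1) features, and the helper name `degree2_features` and the `Loc=Berkeley` feature are made up for the example. A degree-2 polynomial expansion multiplies pairs of features, so two innocuous indicators combine into a conjunction that can act as a proxy for the removed Race feature.

```python
from itertools import combinations


def degree2_features(row):
    """Return the original 0/1 features plus all pairwise conjunctions.

    For one-hot inputs, the product of two indicators is 1 only when both
    are active, which is what the cross terms of a degree-2 polynomial
    expansion compute.
    """
    expanded = dict(row)
    for a, b in combinations(sorted(row), 2):
        expanded[f"{a}^{b}"] = row[a] * row[b]
    return expanded


# Hypothetical record with the explicit Race feature already dropped.
record = {"Loc=EastOakland": 1, "Income=<10k": 1, "Loc=Berkeley": 0}
expanded = degree2_features(record)

# The conjunction fires even though Race was never given to the model.
print(expanded["Income=<10k^Loc=EastOakland"])
```

Dropping the sensitive column is therefore not enough on its own; the derived features need to be audited as well.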
Big Data
There is no data like more data.
Big Data
[Plot: classifier error rate vs. number of training examples in your data]
Most ML objective functions create models that are accurate for the majority class at the expense of the protected class
Cultural differences can throw a wrench in your models
Look at Error Cases vs. Error Rates
Macros - Accuracy, RMSE, F1, etc
vs.
Individuals
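A small sketch of why aggregate metrics can hide what happens to a protected group. The data and the helper name `error_rates` are hypothetical and chosen only to make the effect visible: the overall error rate looks acceptable while every protected-group example is misclassified.

```python
from collections import defaultdict


def error_rates(y_true, y_pred, groups):
    """Return (overall_error, {group: error}) for three parallel lists."""
    wrong = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        wrong[g] += int(t != p)
    overall = sum(wrong.values()) / len(y_true)
    per_group = {g: wrong[g] / total[g] for g in total}
    return overall, per_group


# Hypothetical toy data: 8 majority examples, 2 protected examples.
groups = ["majority"] * 8 + ["protected"] * 2
y_true = [1, 0, 1, 0, 1, 0, 1, 0, 1, 1]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0, 0, 0]

overall, per_group = error_rates(y_true, y_pred, groups)
print(overall)    # aggregate error rate
print(per_group)  # error rate broken out by group
```

Breaking the error rate out by group is only a first step; reading the individual error cases is what the slide argues for.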
Becoming Responsible Gatekeepers
We are pretty good at learning function approximations today
[Figure: NNs & Decision Trees. Image credit: Jason Eisner, the three cultures of ML, 2016]
Need:
- Ways to characterize fairness
- Learning methods that introduce fairness
How can we characterize fairness?
What does fairness even mean?
Group Fairness vs. Individual Fairness
How can we characterize fairness?
One way to characterize group fairness is to ensure that both the majority and the protected population have similar outcomes,
or
P(FavorableOutcome | S) : P(FavorableOutcome | S^c) = 1 : 1
Often this is hard to achieve.
For example, for jobs, the EEOC specifies that this ratio should be no less than 0.8 : 1 (aka the 80% rule).
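The ratio above is straightforward to compute from observed decisions. The following is a minimal sketch with hypothetical hiring data; the function names are illustrative and do not come from any library.

```python
def favorable_rate(outcomes):
    """Fraction of favorable (truthy) outcomes in a non-empty list."""
    return sum(1 for o in outcomes if o) / len(outcomes)


def disparate_impact_ratio(protected_outcomes, other_outcomes):
    """P(FavorableOutcome | S) / P(FavorableOutcome | S^c).

    Assumes the non-protected group has at least one favorable outcome;
    otherwise the ratio is undefined and this raises ZeroDivisionError.
    """
    return favorable_rate(protected_outcomes) / favorable_rate(other_outcomes)


def passes_80_percent_rule(protected_outcomes, other_outcomes, threshold=0.8):
    """True when the ratio is at least the threshold (0.8 by default)."""
    return disparate_impact_ratio(protected_outcomes, other_outcomes) >= threshold


# Hypothetical hiring decisions (1 = hired).
protected = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]  # 2 of 10 hired
others = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]     # 5 of 10 hired

ratio = disparate_impact_ratio(protected, others)
print(ratio, passes_80_percent_rule(protected, others))
```

With these numbers the ratio falls well short of 0.8 : 1, so the decisions would fail the 80% rule.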
Characterizing Fairness of a black box classifier
One way: Is classifier outcome correlated with membership in S?
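One simple way to probe this for a black box is to collect its predictions and compute the Pearson correlation between the predicted outcome and a 0/1 membership indicator. This is a sketch under assumptions: the data is made up, and Pearson correlation is just one possible choice of dependence measure.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        # Correlation is undefined for a constant vector; 0.0 is returned
        # here purely as a convention for this sketch.
        return 0.0
    return cov / sqrt(vx * vy)


# Hypothetical audit data: 1 = member of S, 1 = favorable prediction.
in_S = [1, 1, 1, 1, 0, 0, 0, 0]
pred = [0, 0, 0, 1, 1, 1, 1, 0]

r = pearson(in_S, pred)
print(r)
```

A value near zero suggests outcomes are not linearly related to membership in S; a strongly negative value, as in this toy data, means members of S receive favorable outcomes less often.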
Fairness as a constraint
Is classifier outcome correlated with membership in S?
Sensitive attributes
Decision function
Want
Fairness as a constraint
Constraint to be added:
Supervised Learning with Fairness Constraint
minimize
such that
Zafar et al, ICML 2015
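For reference, the covariance-based formulation in the cited Zafar et al. paper can be written roughly as below. The notation is an assumption relative to the slide: z_i is the sensitive attribute of example i, \bar{z} its mean, d_\theta(x_i) the signed distance of x_i from the decision boundary (\theta^T x_i for a linear classifier), L(\theta) the training loss, and c a threshold that trades accuracy against fairness.

```latex
\mathrm{Cov}\bigl(z, d_\theta(x)\bigr) \approx \frac{1}{N} \sum_{i=1}^{N} (z_i - \bar{z})\, d_\theta(x_i)

\begin{aligned}
\text{minimize}\quad & L(\theta) \\
\text{such that}\quad & \frac{1}{N} \sum_{i=1}^{N} (z_i - \bar{z})\, d_\theta(x_i) \le c, \\
& \frac{1}{N} \sum_{i=1}^{N} (z_i - \bar{z})\, d_\theta(x_i) \ge -c
\end{aligned}
```

Setting c close to zero forces the decision function to be nearly uncorrelated with the sensitive attribute, usually at some cost in accuracy.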
“If we allowed a model to be used for college admissions in 1870, we’d still have 0.7% of women going to college.”
Recommended Reading
Reading List
There’s much material on fairness in data-driven decision/policy making from literature in
- law
- sociology
- political science
- computer science/machine learning
- economics
(the machine learning literature is nascent, only around 2009 onwards)
Reading List (Fairness in ML)
Pedreschi, Dino, Salvatore Ruggieri, and Franco Turini. "Measuring Discrimination in Socially-Sensitive Decision Records." SDM, 2009.
Kamiran, Faisal, and Toon Calders. "Classifying without discriminating." 2nd International Conference on Computer, Control and Communication (IC4 2009). IEEE, 2009.
Dwork, Cynthia, et al. "Fairness through awareness." Proceedings of the 3rd Innovations in Theoretical Computer Science Conference. ACM, 2012.
Romei, Andrea, and Salvatore Ruggieri. "A multidisciplinary survey on discrimination analysis." The Knowledge Engineering Review 29.05 (2014).
Friedler, Sorelle, Carlos Scheidegger, and Suresh Venkatasubramanian. "Certifying and removing disparate impact." CoRR (2014).
Barocas, Solon, and Andrew D. Selbst. "Big Data's Disparate Impact" (August 14, 2015). California Law Review, Vol. 104.
Zafar, Muhammad Bilal, et al. "Fairness Constraints: A Mechanism for Fair Classification." arXiv preprint arXiv:1507.05259 (2015).
Zliobaite, Indre. "On the relation between accuracy and fairness in binary classification." arXiv preprint arXiv:1505.05723 (2015).
Other resources
NSF’s “Big Data Innovation Hubs” were created in part to address these challenges
http://www.nsf.gov/news/news_summ.jsp?cntn_id=136784
Stanford Law Review touches upon this topic regularly
http://www.stanfordlawreview.org/online/privacy-and-big-data
Fairness blog
http://fairness.haverford.edu
Academic: FATML workshops (NIPS 2014, ICML 2015)
www.fatml.org
Lessons
Discrimination is an emergent property of any learning algorithm
Watch out for discrimination (implicitly) encoded in features
Big Data can cause Big Problems
Watch out for the proportion of the protected classes
Always do error analysis with protected classes in mind
Notions of fairness are nascent at best. Involve as many people as possible to improve understanding.
There is no one best notion of fairness
questions
@deliprao / delip@joostware.com

Fairness in Machine Learning

  • 1. Fairness in Machine Learning Delip Rao
  • 2.
  • 4. Every ML practitioners dream scenario Well defined eval objective Lots of clean data Rich data with lot of attributes
  • 5. Incorporating Ethnicity improves Engagement Metrics But should you do it?
  • 6.
  • 7.
  • 9.
  • 10.
  • 11. link
  • 13. Dramatic Changes in Machine Learning Landscape
  • 14. Rise of fast/cheap data collection, processing
  • 15. Rise of popular, easy-to-use tools
  • 16. Rise of Data Scientist Factories
  • 17.
  • 18. Two Questions Should everything that can be predicted, be predicted? If you really have to predict, what should you be aware of?
  • 22. Discrimination Based on Redundant Encoding Feature4231:Race=’Black’ Features = {‘loc’, ‘income’, ..} Polynomial kernel with degree 2 Feature6578:Loc=’EastOakland’^Income=’<10k’
  • 23.
  • 24. Big Data There is no data like more data.
  • 25. Big Data Classifier error rate Number of training examples in your data
  • 26. Most ML Objective functions create models accurate for the majority class at the expense of the protected class
  • 27. Cultural differences can throw a wrench in your models
  • 28. Look at Error Cases vs. Error Rates: macro metrics (accuracy, RMSE, F1, etc.) vs. individuals
  • 30. We are pretty good at learning function approximations today
  • 31. Image Credit: Jason Eisner, the three cultures of ML, 2016 NNs & Decision Trees
  • 32. Need: learning methods that introduce fairness; ways to characterize fairness
  • 33. How can we characterize fairness? What does fairness even mean? Group Fairness vs. Individual Fairness
  • 34. How can we characterize fairness? One way to characterize group fairness is to ensure that both the majority and the protected population have similar outcomes, or P(FavorableOutcome | S) : P(FavorableOutcome | S^c) = 1 : 1
  • 35. How can we characterize fairness? One way to characterize group fairness is to ensure that both the majority and the protected population have similar outcomes, or P(FavorableOutcome | S) : P(FavorableOutcome | S^c) = 1 : 1. Often this is hard to achieve. For example, for jobs, the EEOC specifies that this ratio should be no less than 0.8 : 1 (aka the 80% rule).
  • 36. Characterizing fairness of a black-box classifier. One way: is the classifier outcome correlated with membership in S?
  • 37. Fairness as a constraint Is classifier outcome correlated with membership in S? Sensitive attributes Decision function Want
  • 38. Fairness as a constraint Constraint to be added:
  • 39. Supervised Learning with Fairness Constraint minimize such that Zafar et al, ICML 2015
  • 40. “If we allowed a model to be used for college admissions in 1870, we’d still have 0.7% of women going to college.” Recommended Reading
  • 41. Reading List: There’s much material on fairness in data-driven decision/policy making in the literature of law, sociology, political science, computer science/machine learning, and economics (the machine learning literature is nascent, only from around 2009 onwards)
  • 42. Reading List (Fairness in ML) Pedreschi, Dino, Salvatore Ruggieri, and Franco Turini. "Measuring Discrimination in Socially-Sensitive Decision Records." SDM. 2009. Kamiran, Faisal, and Toon Calders. "Classifying without discriminating." Computer, Control and Communication, 2009. IC4 2009. 2nd International Conference on. IEEE, 2009. Dwork, Cynthia, et al. "Fairness through awareness." Proceedings of the 3rd Innovations in Theoretical Computer Science Conference. ACM, 2012. Romei, Andrea, and Salvatore Ruggieri. "A multidisciplinary survey on discrimination analysis." The Knowledge Engineering Review 29.05 (2014).
  • 43. Reading List (Fairness in ML) Friedler, Sorelle, Carlos Scheidegger, and Suresh Venkatasubramanian. "Certifying and removing disparate impact." CoRR (2014). Barocas, Solon, and Andrew D. Selbst. "Big Data's Disparate Impact" (August 14, 2015). California Law Review, Vol. 104. Zafar, Muhammad Bilal, et al. "Fairness Constraints: A Mechanism for Fair Classification." arXiv preprint arXiv:1507.05259 (2015). Zliobaite, Indre. "On the relation between accuracy and fairness in binary classification." arXiv preprint arXiv:1505.05723 (2015).
  • 44. Other resources NSF’s “Big Data Innovation Hubs” were created in part to address these challenges http://www.nsf.gov/news/news_summ.jsp?cntn_id=136784 Stanford Law Review touches upon this topic regularly http://www.stanfordlawreview.org/online/privacy-and-big-data Fairness blog http://fairness.haverford.edu Academic: FATML workshops (NIPS 2014, ICML 2015) www.fatml.org
  • 45. Lessons: Discrimination is an emergent property of any learning algorithm. Watch out for discrimination (implicitly) encoded in features. Big Data can cause Big Problems. Watch out for the proportion of the protected classes. Always do error analysis with protected classes in mind. Notions of fairness are nascent at best; involve as many people as possible to improve understanding. There is no one best notion of fairness.
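
The group-fairness ratio on slides 34 and 35 can be checked directly from a classifier's decisions. The following is a minimal sketch, not the speaker's code: the function names, the toy decisions, and the group memberships are all made up for illustration, and it assumes binary decisions where 1 is the favorable outcome.

```python
def favorable_rate(decisions):
    """Fraction of favorable (1) outcomes in a list of 0/1 decisions."""
    return sum(decisions) / len(decisions)


def disparate_impact_ratio(decisions, in_protected):
    """Estimate P(FavorableOutcome | S) / P(FavorableOutcome | S^c).

    decisions:    list of 0/1 classifier outcomes (1 = favorable)
    in_protected: list of bools, True if the individual belongs to S
    """
    protected = [d for d, s in zip(decisions, in_protected) if s]
    rest = [d for d, s in zip(decisions, in_protected) if not s]
    rest_rate = favorable_rate(rest)
    if rest_rate == 0:
        raise ValueError("no favorable outcomes outside S; ratio undefined")
    return favorable_rate(protected) / rest_rate


# Hypothetical toy data: 10 applicants, the first 4 belong to S.
decisions = [1, 0, 0, 1, 1, 1, 1, 0, 1, 1]
in_protected = [True] * 4 + [False] * 6

ratio = disparate_impact_ratio(decisions, in_protected)
# The 80% rule from slide 35: the ratio should be no less than 0.8.
passes_80_percent_rule = ratio >= 0.8
print(f"ratio = {ratio:.2f}, passes 80% rule: {passes_80_percent_rule}")
```

Here 2 of 4 protected applicants and 5 of 6 others receive the favorable outcome, so the ratio is about 0.6 and the toy classifier fails the 80% rule.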

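The formulas on slides 37 to 39 were images and are not in this transcript. As a hedged sketch of the idea, assuming the formulation in the Zafar et al. paper from the reading list (bound the empirical covariance between the sensitive attribute and the signed distance to the decision boundary), the check for a linear classifier could look like this. The names `boundary_covariance` and `satisfies_constraint`, the threshold `c`, and the toy data are illustrative assumptions.

```python
def boundary_covariance(z, X, theta):
    """Empirical covariance between the sensitive attribute z_i and the
    signed distance to a linear decision boundary, theta . x_i."""
    n = len(z)
    z_mean = sum(z) / n
    distances = [sum(t * xj for t, xj in zip(theta, x)) for x in X]
    return sum((zi - z_mean) * d for zi, d in zip(z, distances)) / n


def satisfies_constraint(z, X, theta, c):
    """True if |covariance| <= c, i.e. the boundary is (nearly)
    uncorrelated with membership in S."""
    return abs(boundary_covariance(z, X, theta)) <= c


# Hypothetical toy data: z = 1 marks membership in S.
z = [1, 1, 0, 0]
X = [[1.0, 2.0], [1.0, 1.0], [1.0, -1.0], [1.0, -2.0]]

theta_correlated = [0.0, 1.0]    # uses the feature that tracks z
theta_uncorrelated = [1.0, 0.0]  # uses only the constant feature

cov_correlated = boundary_covariance(z, X, theta_correlated)      # 0.75
cov_uncorrelated = boundary_covariance(z, X, theta_uncorrelated)  # 0.0
print(satisfies_constraint(z, X, theta_correlated, c=0.1))    # False
print(satisfies_constraint(z, X, theta_uncorrelated, c=0.1))  # True
```

In training, this quantity would be added as a constraint on the loss minimization rather than checked after the fact; the sketch only evaluates it for fixed parameters.
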
Editor's Notes

  1. Say you are building a dating app like Tinder. This involves solving a recommendation problem: recommend a bunch of profiles to be rated thumbs up/thumbs down. This is a very straightforward collaborative filtering problem. We can build really performant models because, in addition to the historical rating data, we also have very rich profile data. Note: I’m using Tinder as a convenient example. This work has nothing to do with the Match Group or their dating app Tinder.
  2. Let’s say our goal is to improve the following two business/engagement metrics: % of right swipes and % of matches (a match happens when two people right-swipe each other).
  3. A good question to ask yourself: if this were a newspaper personals ad, would mentioning “looking for white guys/girls only” be tolerated? Would a newspaper publish such an ad? If not, how can we build apps that essentially enable that? “But that is what the people want! I am race blind, but I have to give my users what they want.” People wanted segregated bathrooms at some point. Perhaps some people still do. But as a society, with a collective conscience, we agreed that this is not in the interest of minorities and vulnerable classes, and that it promotes discrimination. So why are we okay with building the online equivalent of segregated bathrooms?
  4. This disparity can arise not just from machine learning, but in any kind of data-driven policy making, which is becoming the norm today. Consider a city fixing potholes.
  5. Imagine if there was an app to report the potholes, and the city could send somebody to fix them. Crowdsourcing for efficient governance. Sounds like a good idea, right? What if most of the complaints came from well-to-do neighborhoods, because they complain about every little thing, and most of the limited city road-repair resources were diverted to these well-to-do neighborhoods?
  6. The disparity caused by the Street Bump app was actually noted in this report, commissioned by the White House in early 2014.
  7. From the report abstract
  8. Among the many calls to action, expanding technical expertise was identified as a major one.
  9. Talking about expanding technical expertise, consider this piece on predictive policing by Gillian Tett. This appeared 3 months after the White House report came out. Gillian is an experienced journalist, who railed against Wall Street quants for thoughtlessly deploying models.
  10. The same Gillian Tett now writes this on Chicago’s predictive policing: “The program has nothing to do with race but multi-variable equations”.
  11. Today’s ML landscape is changing as we speak. Things are scaling in many dimensions.
  12. Scaling dimension 1: Data collection and processing. It’s super easy to collect troves of data of all kinds, and very cheap and fast to store and process it.
  13. Scaling dimension 2: Tools. Today, pretty much anyone with basic programming skills (not even algorithmic) can build a predictive application. With startups building ML as a service, all you need is a REST endpoint and basic JS coding to build a model and serve predictions.
  14. Scaling dimension 3: Data Scientist Factories. These are filling a need, but I’m not sure that is the right way to fill it ..
  15. Statistics used to be a rigorous discipline. That hasn’t stopped cargo-cult statisticians from popping up and doing bad statistics. Similarly, ML used to be a very academic discipline, with practitioners having insight into the models they were training. Today that’s no longer the case. The curricula in most of these “data scientist factories” are questionable at best (8 weeks to learn ML/NLP/DataViz in one of them), and none of them have time to get into the critical aspects of model thinking. What is worrisome is that graduates from these places go on to work in places that build applications affecting people’s lives. At the very least, an awareness of the ethical issues of machine learning should be incorporated into every ML curriculum.
  16. Let’s say there’s a sensitive attribute or set of attributes (like gender, race, etc) that partitions a population into a protected class (or a minority class), and the majority class.
  17. Sometimes information about the sensitive attributes can “leak in” from other data sources, even if not explicitly encoded. For example, someone’s last name could divulge ethnicity, or an individual’s location could correlate with race. Or, in this example, knowing who your Facebook friends are, and their sexual orientation, tells a lot about your own sexual orientation! http://firstmonday.org/article/view/2611/2302 Often sensitive attributes like race and gender are redundantly encoded in other variables. For example, the text in tweets can reveal many demographic variables: http://www.fastcompany.com/1769217/you-cant-keep-your-secrets-twitter
  18. Big data can cause big problems
  19. The very thing that makes big data helpful in driving down error rates and producing better models also explains why models perform poorly on the minority population. For minority populations, the number of training samples is dwarfed by that of the majority. One could try balancing the number of training samples in the majority, but that in turn creates an accuracy-fairness tradeoff.
  20. https://en.wikipedia.org/wiki/Nymwars
  21. If your classifier has 95% accuracy and you deploy it in production, the 5% of errors might be affecting a big chunk of the minority population.
  22. Not exhaustive, but should provide a good seed set. The ones in bold are a must read.
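
The point in notes 19 and 21, that a good overall error rate can hide a poor one for the minority, is easy to see by breaking errors down per group. This is a minimal sketch with made-up labels, predictions, and group names; the helper functions are hypothetical and not from the talk.

```python
def error_rate(y_true, y_pred):
    """Fraction of examples where the prediction differs from the label."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def error_rate_by_group(y_true, y_pred, groups):
    """Error rate computed separately for each group label."""
    by_group = {}
    for t, p, g in zip(y_true, y_pred, groups):
        labels, preds = by_group.setdefault(g, ([], []))
        labels.append(t)
        preds.append(p)
    return {g: error_rate(ts, ps) for g, (ts, ps) in by_group.items()}


# Hypothetical data: 100 examples, 90 majority and 10 minority.
# All 5 errors fall on the minority group.
groups = ["majority"] * 90 + ["minority"] * 10
y_true = [1] * 100
y_pred = [1] * 90 + [0] * 5 + [1] * 5

overall = error_rate(y_true, y_pred)
per_group = error_rate_by_group(y_true, y_pred, groups)
print(f"overall error: {overall:.2f}")
for group, rate in per_group.items():
    print(f"{group} error: {rate:.2f}")
```

The classifier is 95% accurate overall, yet it is wrong for half of the minority group and never wrong for the majority, which is why error analysis should be done with protected classes in mind.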