SlideShare une entreprise Scribd logo
1  sur  38
Télécharger pour lire hors ligne
Testing for Cognitive Bias in
AI Systems
Peter Varhol, Technology Strategy Research
About Me
• International speaker and writer
• Degrees in Math, CS, Psychology
• Technology communicator
• Former university professor, tech journalist
• Cat owner and distance runner
• peter@petervarhol.com
What You Will Learn
• What kind of systems produce nondeterministic results based on
data
• How machine learning results can be biased
• How to recognize and understand bias in machine learning
systems
Agenda
• What are machine learning and adaptive systems?
• How can software possibly be biased?
• How can we tell if they are biased?
• Can we do anything?
• Summary and conclusions
Machine Learning and Adaptive Systems
• We are building a different kind of software
• It never returns the same result
• That doesn’t make it wrong
• How can we assess if it’s correct?
• How do we know if there is a bug?
• How do we know if it’s biased?
Bug vs. Bias
• A bug is an identifiable and measurable error in process or result
• Usually fixed with a code change
• A bias is a systematic inflection in decisions that produces results
inconsistent with reality
• Bias can’t be fixed with a code change
How Does This Happen?
• The problem domain is ambiguous
• There is no single “right” answer
• “Close enough” can usually work
• As long as we can quantify “close enough”
• We don’t know quite why the software
responds as it does
• We can’t easily trace code paths
• We choose the data
• The software “learns” from past actions
How Can We Tell If It’s Biased?
• We look very carefully at the training data
• We set strict success criteria based on the system requirements
• We run many tests
• Most change parameters only slightly
• Some use radical inputs
• Compare results to success criteria
We Train Systems With Data
• We know the input data and expected results
• Input data is run through the algorithms and the results compared
to actual results
• The algorithms adjust parameters based on a comparison
between algorithm and actual
• What happens if the data systematically
misrepresents the domain?
Amazon Can’t Rid Its AI of Bias
• Amazon created an AI to crawl the web to find job candidates
• Training data was all resumes submitted for the last ten years
• In IT, the overwhelming majority were male
• The AI “learned” that males were superior for IT jobs
• Amazon couldn’t fix that training bias
Amazon Can’t Rid Its AI of Bias
• So just train the AI with the candidates hired
• Right?
• No, because males were also hired in overwhelming numbers
• Amazon hasn’t found a way to overcome this bias
• So they stopped using the tool
• Does all data have bias?
Many Systems Use Objective Data
• Electric wind sensor
• Determines wind speed and direction
• Based on the cooling of filaments
• Designed a three-layer neural network
• Then used the known data to train it
• Cooling in degrees of all four filaments
• Wind speed, direction
Can This Possibly Be Biased?
• Well, yes
• The training data could have been recorded in single
temperature/sunlight/humidity conditions
• Which could affect results under those conditions
• It’s a possible bias that doesn’t hurt anyone
• Or does it?
• Does anyone remember a certain O-ring?
Where Do Biases Come From?
• Data selection
• We choose training data that represents only one segment of the domain
• We limit our training data to certain times or seasons
• We overrepresent one population
• Or
• The problem domain has subtly changed
Where Do Biases Come From?
• Latent bias
• Concepts become incorrectly correlated
• Correlation does not mean causation
• But it be high enough to believe
• We could be promoting stereotypes
• This describes Amazon’s problem
Where Do Biases Come From?
• Interaction bias
• We may focus on keywords that users apply incorrectly
• User incorporates slang or unusual words
• “That’s bad, man”
• The story of Microsoft Tay
• It wasn’t bad, it was trained that way
Why Does Bias Matter?
• Wrong answers
• Often with no recourse
• Subtle discrimination (legal or illegal)
• And no one knows it
• Suboptimal results
• We’re not getting it right often enough
How Are These Systems Used?
• Transportation
• Self-driving cars
• Ecommerce
• Recommendation engines
• Finance
• Stock trading systems
• HR
• This is a scary one
• Medicine
• This is scarier
It’s Not Just AI
• All software has biases
• It’s written by people
• People make decisions on how to design and implement
• Bias is inevitable
• But can we find it and correct it?
• Do we have to?
Like This One
• A London doctor can’t get into her fitness center locker room
• The fitness center uses a “smart card” to access and record services
• While acknowledging the problem
• The fitness center couldn’t fix it
• But the software development team could
• They had hard-coded “doctor” to be synonymous with “male”
• It was meant as a convenient shortcut
About That Data
• We use data from the problem domain
• What’s that?
• In some cases, scientific measurements are accurate
• But we can choose the wrong measures
• Or not fully represent the problem domain
• But data can also be subjective
• We train with photos of one race over another
• We train with our own values of beauty
Another Example
• Retail recommendation engines
• Other people bought this
• You may also be interested in that
• They can bring in additional revenue
• But they can be very wrong sometimes
• The algorithms think we want to see
what they show us
Where Bias Can Be Very Wrong
• Brooks Ghost running shoes
• Versus ghost costumes
• Our data doesn’t take context into account
Challenges to Validating Requirements
• What does it mean to be correct?
• The result may be different every time
• There is no one single right answer
• How will this really work in production?
• How do I test it for bias?
Possible Answers
• Only look at outputs for given inputs
• And set accuracy parameters
• Don’t look at the outputs at all
• Focus on performance/usability/other features
• We can’t test accuracy
• Throw up our hands and go home
• We can’t easily define bias, let alone test for it
Testing Machine Learning Systems For Bias
• Have objective acceptance criteria
• Test with new data
• Don’t count on all results being accurate
• Understand the architecture of the network as a part of the testing
process
• Communicate the level of confidence you have in the results
Gearing Up to Test
• Think outside the box
• Understand your users intimately
• What will a user do?
• Know the training data
• Look for bias in the data
• Write tests to exploit edge cases
• Run many tests to look for trends
What is a Bug?
• A mismatch between inputs and outputs?
• It supposed to be that way!
• Not every recommendation will be a good one
• But that doesn’t mean it’s a bug
• Too many wrong answers
• Define too many
• Define wrong
We Found a Bug, Now What?
• The bug could be unrelated to the neural network
• Treat it as a normal bug
• If the neural network is involved
• Determine a definition of inaccurate
• Determine the likelihood of an inaccurate answer
• This may involve serious redevelopment
Bias and Truth and AI, Oh My
• All human decisions are biased in some way
• Do we want human decisions, or do we want correct ones?
• Human decisions come with human biases
• Can we get those out of our software?
• And will it be better if we do?
• Even with bias, computer decisions may be better
A Larger Problem
• What if we want the system to replicate how we would decide?
• Irrespective of bias
• We may accept bias if it is consistent
• After all, humans are biased
• This is a question with no easy answer in practice
Delivering in the Clutch
• Machines treat all events as equal
• Humans recognize the importance of some events
• And sometimes can rise to the occasion
• There is no mechanism for code to do this
• We could have data and algorithms to recognize
the importance of a specific event
• But the software cannot “improve” its answer
• This is less a bias than an inherent weakness
The Human in the Loop
• We don’t understand complex software systems
• Disasters often happen because software behaves in unexpected
ways
• Human oversight may prevent disasters, or wrong decisions
• Can we overcome human bias?
• The problem is that machines respond too quickly
• In many cases, there is not enough time for human oversight
Objections
• I will never encounter this type of application!
• You might be surprised
• I will do what I’ve always done
• No, you won’t
• My goals will be defined by others
• Unless they’re not
• You may be the one
Conclusions
• You can’t make intelligence from code
• Algorithms are thoughtless
• “Learning” systems are what we program and train
• No easy answer
• If we choose the data, it will likely be biased
• Recognize how your data fits into the problem domain
Testing for cognitive bias in ai systems
References
• https://qz.com/989137/when-a-robot-ai-doctor-misdiagnoses-you-
whos-to-blame/
• https://qz.com/1089555/artificial-intelligence-programmers-are-
trying-to-turn-code-into-intelligence-like-alchemists-tried-turning-
lead-into-gold/
• https://www.theatlantic.com/technology/archive/2017/09/saving-
the-world-from-code/540393/
• https://qz.com/work/1454396/amazons-sexist-hiring-algorithm-
could-still-be-better-than-a-human/
The End
Peter Varhol
peter@petervarhol.com
@pvarhol

Contenu connexe

Tendances

Worst practices in software testing by the Testing troll
Worst practices in software testing by the Testing trollWorst practices in software testing by the Testing troll
Worst practices in software testing by the Testing trollViktor Slavchev
 
Automated decision making with predictive applications – Big Data Frankfurt
Automated decision making with predictive applications – Big Data FrankfurtAutomated decision making with predictive applications – Big Data Frankfurt
Automated decision making with predictive applications – Big Data FrankfurtLars Trieloff
 
The good, the bad, and the metrics webinar hosted by xbo soft
The good, the bad, and the metrics webinar hosted by xbo softThe good, the bad, and the metrics webinar hosted by xbo soft
The good, the bad, and the metrics webinar hosted by xbo softXBOSoft
 
Automated Decision Making with Predictive Applications – Big Data Düsseldorf
Automated Decision Making with Predictive Applications – Big Data DüsseldorfAutomated Decision Making with Predictive Applications – Big Data Düsseldorf
Automated Decision Making with Predictive Applications – Big Data DüsseldorfLars Trieloff
 
"Worst" practices of software testing
"Worst" practices of software testing"Worst" practices of software testing
"Worst" practices of software testingViktor Slavchev
 
Rekard Edgren - Curing Our Binary Disease - EuroSTAR 2012
Rekard Edgren - Curing Our Binary Disease - EuroSTAR 2012Rekard Edgren - Curing Our Binary Disease - EuroSTAR 2012
Rekard Edgren - Curing Our Binary Disease - EuroSTAR 2012TEST Huddle
 
Test automation – the bitter truth
Test automation – the bitter truthTest automation – the bitter truth
Test automation – the bitter truthViktor Slavchev
 
Data science unit 1 By: Professor Lili Saghafi
Data science unit 1 By: Professor Lili Saghafi Data science unit 1 By: Professor Lili Saghafi
Data science unit 1 By: Professor Lili Saghafi Professor Lili Saghafi
 
Automating fetal heart monitor using machine learning
Automating fetal heart monitor using machine learningAutomating fetal heart monitor using machine learning
Automating fetal heart monitor using machine learningTamjid Rayhan
 
Learn Learning + Prototype Testing
Learn Learning + Prototype TestingLearn Learning + Prototype Testing
Learn Learning + Prototype TestingDave Hora
 
Machine Learning Vital Signs
Machine Learning Vital SignsMachine Learning Vital Signs
Machine Learning Vital SignsDonald Miner
 
Advancing Testing Using Axioms
Advancing Testing Using AxiomsAdvancing Testing Using Axioms
Advancing Testing Using AxiomsSQALab
 
Hindsight lessons about API testing
Hindsight lessons about API testingHindsight lessons about API testing
Hindsight lessons about API testingViktor Slavchev
 
Renat Gilmanov "Visual search results validation or where is my ninja turtles?"
Renat Gilmanov "Visual search results validation or where is my ninja turtles?"Renat Gilmanov "Visual search results validation or where is my ninja turtles?"
Renat Gilmanov "Visual search results validation or where is my ninja turtles?"Lviv Startup Club
 
Uncertainty Quantification in Complex Physical Systems. (An Inroduction)
Uncertainty Quantification in Complex Physical Systems. (An Inroduction)Uncertainty Quantification in Complex Physical Systems. (An Inroduction)
Uncertainty Quantification in Complex Physical Systems. (An Inroduction)Ogechi Onuoha
 
Data Science Toolkit for Product Managers
Data Science Toolkit for Product ManagersData Science Toolkit for Product Managers
Data Science Toolkit for Product ManagersMahmoud Jalajel
 
Data science toolkit for product managers
Data science toolkit for product managers Data science toolkit for product managers
Data science toolkit for product managers ProductFolks
 
Building a Successful Organization By Mastering Failure
Building a Successful Organization By Mastering FailureBuilding a Successful Organization By Mastering Failure
Building a Successful Organization By Mastering Failurejgoulah
 
How to think smarter about software development
How to think smarter about software developmentHow to think smarter about software development
How to think smarter about software developmentNilanjan Bhattacharya
 
Test-Driven Development
 Test-Driven Development  Test-Driven Development
Test-Driven Development Amir Assad
 

Tendances (20)

Worst practices in software testing by the Testing troll
Worst practices in software testing by the Testing trollWorst practices in software testing by the Testing troll
Worst practices in software testing by the Testing troll
 
Automated decision making with predictive applications – Big Data Frankfurt
Automated decision making with predictive applications – Big Data FrankfurtAutomated decision making with predictive applications – Big Data Frankfurt
Automated decision making with predictive applications – Big Data Frankfurt
 
The good, the bad, and the metrics webinar hosted by xbo soft
The good, the bad, and the metrics webinar hosted by xbo softThe good, the bad, and the metrics webinar hosted by xbo soft
The good, the bad, and the metrics webinar hosted by xbo soft
 
Automated Decision Making with Predictive Applications – Big Data Düsseldorf
Automated Decision Making with Predictive Applications – Big Data DüsseldorfAutomated Decision Making with Predictive Applications – Big Data Düsseldorf
Automated Decision Making with Predictive Applications – Big Data Düsseldorf
 
"Worst" practices of software testing
"Worst" practices of software testing"Worst" practices of software testing
"Worst" practices of software testing
 
Rekard Edgren - Curing Our Binary Disease - EuroSTAR 2012
Rekard Edgren - Curing Our Binary Disease - EuroSTAR 2012Rekard Edgren - Curing Our Binary Disease - EuroSTAR 2012
Rekard Edgren - Curing Our Binary Disease - EuroSTAR 2012
 
Test automation – the bitter truth
Test automation – the bitter truthTest automation – the bitter truth
Test automation – the bitter truth
 
Data science unit 1 By: Professor Lili Saghafi
Data science unit 1 By: Professor Lili Saghafi Data science unit 1 By: Professor Lili Saghafi
Data science unit 1 By: Professor Lili Saghafi
 
Automating fetal heart monitor using machine learning
Automating fetal heart monitor using machine learningAutomating fetal heart monitor using machine learning
Automating fetal heart monitor using machine learning
 
Learn Learning + Prototype Testing
Learn Learning + Prototype TestingLearn Learning + Prototype Testing
Learn Learning + Prototype Testing
 
Machine Learning Vital Signs
Machine Learning Vital SignsMachine Learning Vital Signs
Machine Learning Vital Signs
 
Advancing Testing Using Axioms
Advancing Testing Using AxiomsAdvancing Testing Using Axioms
Advancing Testing Using Axioms
 
Hindsight lessons about API testing
Hindsight lessons about API testingHindsight lessons about API testing
Hindsight lessons about API testing
 
Renat Gilmanov "Visual search results validation or where is my ninja turtles?"
Renat Gilmanov "Visual search results validation or where is my ninja turtles?"Renat Gilmanov "Visual search results validation or where is my ninja turtles?"
Renat Gilmanov "Visual search results validation or where is my ninja turtles?"
 
Uncertainty Quantification in Complex Physical Systems. (An Inroduction)
Uncertainty Quantification in Complex Physical Systems. (An Inroduction)Uncertainty Quantification in Complex Physical Systems. (An Inroduction)
Uncertainty Quantification in Complex Physical Systems. (An Inroduction)
 
Data Science Toolkit for Product Managers
Data Science Toolkit for Product ManagersData Science Toolkit for Product Managers
Data Science Toolkit for Product Managers
 
Data science toolkit for product managers
Data science toolkit for product managers Data science toolkit for product managers
Data science toolkit for product managers
 
Building a Successful Organization By Mastering Failure
Building a Successful Organization By Mastering FailureBuilding a Successful Organization By Mastering Failure
Building a Successful Organization By Mastering Failure
 
How to think smarter about software development
How to think smarter about software developmentHow to think smarter about software development
How to think smarter about software development
 
Test-Driven Development
 Test-Driven Development  Test-Driven Development
Test-Driven Development
 

Similaire à Testing for cognitive bias in ai systems

Not fair! testing ai bias and organizational values
Not fair! testing ai bias and organizational valuesNot fair! testing ai bias and organizational values
Not fair! testing ai bias and organizational valuesPeter Varhol
 
Not fair! testing AI bias and organizational values
Not fair! testing AI bias and organizational valuesNot fair! testing AI bias and organizational values
Not fair! testing AI bias and organizational valuesPeter Varhol
 
Correlation does not mean causation
Correlation does not mean causationCorrelation does not mean causation
Correlation does not mean causationPeter Varhol
 
Testing a movingtarget_quest_dynatrace
Testing a movingtarget_quest_dynatraceTesting a movingtarget_quest_dynatrace
Testing a movingtarget_quest_dynatracePeter Varhol
 
Making disaster routine
Making disaster routineMaking disaster routine
Making disaster routinePeter Varhol
 
AI in the Real World: Challenges, and Risks and how to handle them?
AI in the Real World: Challenges, and Risks and how to handle them?AI in the Real World: Challenges, and Risks and how to handle them?
AI in the Real World: Challenges, and Risks and how to handle them?Srinath Perera
 
Pdf analytics-and-witch-doctoring -why-executives-succumb-to-the-black-box-me...
Pdf analytics-and-witch-doctoring -why-executives-succumb-to-the-black-box-me...Pdf analytics-and-witch-doctoring -why-executives-succumb-to-the-black-box-me...
Pdf analytics-and-witch-doctoring -why-executives-succumb-to-the-black-box-me...OrateTeam
 
Don’t Let Missed Bugs Cause Mayhem in your Organization!
Don’t Let Missed Bugs Cause Mayhem in your Organization!Don’t Let Missed Bugs Cause Mayhem in your Organization!
Don’t Let Missed Bugs Cause Mayhem in your Organization!Qualitest
 
How did i miss that bug rtc
How did i miss that bug rtcHow did i miss that bug rtc
How did i miss that bug rtcGerieOwen
 
AI Models For Fun and Profit by Walmart Director of Artificial Intelligence
AI Models For Fun and Profit by Walmart Director of Artificial IntelligenceAI Models For Fun and Profit by Walmart Director of Artificial Intelligence
AI Models For Fun and Profit by Walmart Director of Artificial IntelligenceProduct School
 
Joined up Marketing Thinkery. How to automate your business
Joined up Marketing Thinkery. How to automate your businessJoined up Marketing Thinkery. How to automate your business
Joined up Marketing Thinkery. How to automate your businessAndrew Balerdi
 
Unit 1-ML (1) (1).pptx
Unit 1-ML (1) (1).pptxUnit 1-ML (1) (1).pptx
Unit 1-ML (1) (1).pptxChitrachitrap
 
Protecting privacy with fuzzy-feeling test data
Protecting privacy with fuzzy-feeling test dataProtecting privacy with fuzzy-feeling test data
Protecting privacy with fuzzy-feeling test dataMatt Bowen
 
How to make m achines learn
How to make m achines learnHow to make m achines learn
How to make m achines learniskamegy
 
Responsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons LearnedResponsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons LearnedKrishnaram Kenthapadi
 
How to Get the Most Out of Security Tools
How to Get the Most Out of Security ToolsHow to Get the Most Out of Security Tools
How to Get the Most Out of Security ToolsSecurity Innovation
 
Using AI to Build Fair and Equitable Workplaces
Using AI to Build Fair and Equitable WorkplacesUsing AI to Build Fair and Equitable Workplaces
Using AI to Build Fair and Equitable WorkplacesData Con LA
 

Similaire à Testing for cognitive bias in ai systems (20)

Not fair! testing ai bias and organizational values
Not fair! testing ai bias and organizational valuesNot fair! testing ai bias and organizational values
Not fair! testing ai bias and organizational values
 
Not fair! testing AI bias and organizational values
Not fair! testing AI bias and organizational valuesNot fair! testing AI bias and organizational values
Not fair! testing AI bias and organizational values
 
Correlation does not mean causation
Correlation does not mean causationCorrelation does not mean causation
Correlation does not mean causation
 
Testing a movingtarget_quest_dynatrace
Testing a movingtarget_quest_dynatraceTesting a movingtarget_quest_dynatrace
Testing a movingtarget_quest_dynatrace
 
Making disaster routine
Making disaster routineMaking disaster routine
Making disaster routine
 
AI in the Real World: Challenges, and Risks and how to handle them?
AI in the Real World: Challenges, and Risks and how to handle them?AI in the Real World: Challenges, and Risks and how to handle them?
AI in the Real World: Challenges, and Risks and how to handle them?
 
Pdf analytics-and-witch-doctoring -why-executives-succumb-to-the-black-box-me...
Pdf analytics-and-witch-doctoring -why-executives-succumb-to-the-black-box-me...Pdf analytics-and-witch-doctoring -why-executives-succumb-to-the-black-box-me...
Pdf analytics-and-witch-doctoring -why-executives-succumb-to-the-black-box-me...
 
Don’t Let Missed Bugs Cause Mayhem in your Organization!
Don’t Let Missed Bugs Cause Mayhem in your Organization!Don’t Let Missed Bugs Cause Mayhem in your Organization!
Don’t Let Missed Bugs Cause Mayhem in your Organization!
 
How did i miss that bug rtc
How did i miss that bug rtcHow did i miss that bug rtc
How did i miss that bug rtc
 
Model validation
Model validationModel validation
Model validation
 
AI Models For Fun and Profit by Walmart Director of Artificial Intelligence
AI Models For Fun and Profit by Walmart Director of Artificial IntelligenceAI Models For Fun and Profit by Walmart Director of Artificial Intelligence
AI Models For Fun and Profit by Walmart Director of Artificial Intelligence
 
Joined up Marketing Thinkery. How to automate your business
Joined up Marketing Thinkery. How to automate your businessJoined up Marketing Thinkery. How to automate your business
Joined up Marketing Thinkery. How to automate your business
 
Unit 1-ML (1) (1).pptx
Unit 1-ML (1) (1).pptxUnit 1-ML (1) (1).pptx
Unit 1-ML (1) (1).pptx
 
Bad Metric, Bad!
Bad Metric, Bad!Bad Metric, Bad!
Bad Metric, Bad!
 
Protecting privacy with fuzzy-feeling test data
Protecting privacy with fuzzy-feeling test dataProtecting privacy with fuzzy-feeling test data
Protecting privacy with fuzzy-feeling test data
 
How to make m achines learn
How to make m achines learnHow to make m achines learn
How to make m achines learn
 
L15. Machine Learning - Black Art
L15. Machine Learning - Black ArtL15. Machine Learning - Black Art
L15. Machine Learning - Black Art
 
Responsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons LearnedResponsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons Learned
 
How to Get the Most Out of Security Tools
How to Get the Most Out of Security ToolsHow to Get the Most Out of Security Tools
How to Get the Most Out of Security Tools
 
Using AI to Build Fair and Equitable Workplaces
Using AI to Build Fair and Equitable WorkplacesUsing AI to Build Fair and Equitable Workplaces
Using AI to Build Fair and Equitable Workplaces
 

Plus de Peter Varhol

DevOps and the Impostor Syndrome
DevOps and the Impostor SyndromeDevOps and the Impostor Syndrome
DevOps and the Impostor SyndromePeter Varhol
 
162 the technologist of the future
162   the technologist of the future162   the technologist of the future
162 the technologist of the futurePeter Varhol
 
Digital transformation through devops dod indianapolis
Digital transformation through devops dod indianapolisDigital transformation through devops dod indianapolis
Digital transformation through devops dod indianapolisPeter Varhol
 
What Aircrews Can Teach Testing Teams
What Aircrews Can Teach Testing TeamsWhat Aircrews Can Teach Testing Teams
What Aircrews Can Teach Testing TeamsPeter Varhol
 
Identifying and measuring testing debt
Identifying and measuring testing debtIdentifying and measuring testing debt
Identifying and measuring testing debtPeter Varhol
 
What aircrews can teach devops teams ignite
What aircrews can teach devops teams igniteWhat aircrews can teach devops teams ignite
What aircrews can teach devops teams ignitePeter Varhol
 
Talking to people lightning
Talking to people lightningTalking to people lightning
Talking to people lightningPeter Varhol
 
Varhol oracle database_firewall_oct2011
Varhol oracle database_firewall_oct2011Varhol oracle database_firewall_oct2011
Varhol oracle database_firewall_oct2011Peter Varhol
 
Qa test managed_code_varhol
Qa test managed_code_varholQa test managed_code_varhol
Qa test managed_code_varholPeter Varhol
 
Talking to people: the forgotten DevOps tool
Talking to people: the forgotten DevOps toolTalking to people: the forgotten DevOps tool
Talking to people: the forgotten DevOps toolPeter Varhol
 
How do we fix testing
How do we fix testingHow do we fix testing
How do we fix testingPeter Varhol
 
Moneyball peter varhol_starwest2012
Moneyball peter varhol_starwest2012Moneyball peter varhol_starwest2012
Moneyball peter varhol_starwest2012Peter Varhol
 

Plus de Peter Varhol (12)

DevOps and the Impostor Syndrome
DevOps and the Impostor SyndromeDevOps and the Impostor Syndrome
DevOps and the Impostor Syndrome
 
162 the technologist of the future
162   the technologist of the future162   the technologist of the future
162 the technologist of the future
 
Digital transformation through devops dod indianapolis
Digital transformation through devops dod indianapolisDigital transformation through devops dod indianapolis
Digital transformation through devops dod indianapolis
 
What Aircrews Can Teach Testing Teams
What Aircrews Can Teach Testing TeamsWhat Aircrews Can Teach Testing Teams
What Aircrews Can Teach Testing Teams
 
Identifying and measuring testing debt
Identifying and measuring testing debtIdentifying and measuring testing debt
Identifying and measuring testing debt
 
What aircrews can teach devops teams ignite
What aircrews can teach devops teams igniteWhat aircrews can teach devops teams ignite
What aircrews can teach devops teams ignite
 
Talking to people lightning
Talking to people lightningTalking to people lightning
Talking to people lightning
 
Varhol oracle database_firewall_oct2011
Varhol oracle database_firewall_oct2011Varhol oracle database_firewall_oct2011
Varhol oracle database_firewall_oct2011
 
Qa test managed_code_varhol
Qa test managed_code_varholQa test managed_code_varhol
Qa test managed_code_varhol
 
Talking to people: the forgotten DevOps tool
Talking to people: the forgotten DevOps toolTalking to people: the forgotten DevOps tool
Talking to people: the forgotten DevOps tool
 
How do we fix testing
How do we fix testingHow do we fix testing
How do we fix testing
 
Moneyball peter varhol_starwest2012
Moneyball peter varhol_starwest2012Moneyball peter varhol_starwest2012
Moneyball peter varhol_starwest2012
 

Dernier

IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfDaniel Santiago Silva Capera
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 
GenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncGenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncObject Automation
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationIES VE
 
Introduction to Quantum Computing
Introduction to Quantum ComputingIntroduction to Quantum Computing
Introduction to Quantum ComputingGDSC PJATK
 
Things you didn't know you can use in your Salesforce
Things you didn't know you can use in your SalesforceThings you didn't know you can use in your Salesforce
Things you didn't know you can use in your SalesforceMartin Humpolec
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdfPedro Manuel
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding TeamAdam Moalla
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024SkyPlanner
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URLRuncy Oommen
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAshyamraj55
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaborationbruanjhuli
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 

Dernier (20)

IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
GenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncGenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation Inc
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
Introduction to Quantum Computing
Introduction to Quantum ComputingIntroduction to Quantum Computing
Introduction to Quantum Computing
 
Things you didn't know you can use in your Salesforce
Things you didn't know you can use in your SalesforceThings you didn't know you can use in your Salesforce
Things you didn't know you can use in your Salesforce
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdf
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 

Testing for cognitive bias in ai systems

  • 1. Testing for Cognitive Bias in AI Systems Peter Varhol, Technology Strategy Research
  • 2. About Me • International speaker and writer • Degrees in Math, CS, Psychology • Technology communicator • Former university professor, tech journalist • Cat owner and distance runner • peter@petervarhol.com
  • 3. What You Will Learn • What kind of systems produce nondeterministic results based on data • How machine learning results can be biased • How to recognize and understand bias in machine learning systems
  • 4. Agenda • What are machine learning and adaptive systems? • How can software possibly be biased? • How can we tell if they are biased? • Can we do anything? • Summary and conclusions
  • 5. Machine Learning and Adaptive Systems • We are building a different kind of software • It never returns the same result • That doesn’t make it wrong • How can we assess if it’s correct? • How do we know if there is a bug? • How do we know if it’s biased?
  • 6. Bug vs. Bias • A bug is an identifiable and measurable error in process or result • Usually fixed with a code change • A bias is a systematic inflection in decisions that produces results inconsistent with reality • Bias can’t be fixed with a code change
  • 7. How Does This Happen? • The problem domain is ambiguous • There is no single “right” answer • “Close enough” can usually work • As long as we can quantify “close enough” • We don’t know quite why the software responds as it does • We can’t easily trace code paths • We choose the data • The software “learns” from past actions
  • 8. How Can We Tell If It’s Biased? • We look very carefully at the training data • We set strict success criteria based on the system requirements • We run many tests • Most change parameters only slightly • Some use radical inputs • Compare results to success criteria
  • 9. We Train Systems With Data • We know the input data and expected results • Input data is run through the algorithms and the results compared to actual results • The algorithms adjust parameters based on a comparison between algorithm and actual • What happens if the data systematically misrepresents the domain?
  • 10. Amazon Can’t Rid Its AI of Bias • Amazon created an AI to crawl the web to find job candidates • Training data was all resumes submitted for the last ten years • In IT, the overwhelming majority were male • The AI “learned” that males were superior for IT jobs • Amazon couldn’t fix that training bias
  • 11. Amazon Can’t Rid Its AI of Bias • So just train the AI with the candidates hired • Right? • No, because males were also hired in overwhelming numbers • Amazon hasn’t found a way to overcome this bias • So they stopped using the tool • Does all data have bias?
  • 12. Many Systems Use Objective Data • Electric wind sensor • Determines wind speed and direction • Based on the cooling of filaments • Designed a three-layer neural network • Then used the known data to train it • Cooling in degrees of all four filaments • Wind speed, direction
  • 13. Can This Possibly Be Biased? • Well, yes • The training data could have been recorded in single temperature/sunlight/humidity conditions • Which could affect results under those conditions • It’s a possible bias that doesn’t hurt anyone • Or does it? • Does anyone remember a certain O-ring?
  • 14. Where Do Biases Come From? • Data selection • We choose training data that represents only one segment of the domain • We limit our training data to certain times or seasons • We overrepresent one population • Or • The problem domain has subtly changed
  • 15. Where Do Biases Come From? • Latent bias • Concepts become incorrectly correlated • Correlation does not mean causation • But it be high enough to believe • We could be promoting stereotypes • This describes Amazon’s problem
  • 16. Where Do Biases Come From? • Interaction bias • We may focus on keywords that users apply incorrectly • User incorporates slang or unusual words • “That’s bad, man” • The story of Microsoft Tay • It wasn’t bad, it was trained that way
  • 17. Why Does Bias Matter? • Wrong answers • Often with no recourse • Subtle discrimination (legal or illegal) • And no one knows it • Suboptimal results • We’re not getting it right often enough
  • 18. How Are These Systems Used? • Transportation • Self-driving cars • Ecommerce • Recommendation engines • Finance • Stock trading systems • HR • This is a scary one • Medicine • This is scarier
  • 19. It’s Not Just AI • All software has biases • It’s written by people • People make decisions on how to design and implement • Bias is inevitable • But can we find it and correct it? • Do we have to?
  • 20. Like This One • A London doctor can’t get into her fitness center locker room • The fitness center uses a “smart card” to access and record services • While acknowledging the problem • The fitness center couldn’t fix it • But the software development team could • They had hard-coded “doctor” to be synonymous with “male” • It was meant as a convenient shortcut
  • 21. About That Data • We use data from the problem domain • What’s that? • In some cases, scientific measurements are accurate • But we can choose the wrong measures • Or not fully represent the problem domain • But data can also be subjective • We train with photos of one race over another • We train with our own values of beauty
  • 22. Another Example • Retail recommendation engines • Other people bought this • You may also be interested in that • They can bring in additional revenue • But they can be very wrong sometimes • The algorithms think we want to see what they show us
  • 23. Where Bias Can Be Very Wrong • Brooks Ghost running shoes • Versus ghost costumes • Our data doesn’t take context into account
  • 24. Challenges to Validating Requirements • What does it mean to be correct? • The result may be different every time • There is no one single right answer • How will this really work in production? • How do I test it for bias?
  • 25. Possible Answers • Only look at outputs for given inputs • And set accuracy parameters • Don’t look at the outputs at all • Focus on performance/usability/other features • We can’t test accuracy • Throw up our hands and go home • We can’t easily define bias, let alone test for it
  • 26. Testing Machine Learning Systems For Bias • Have objective acceptance criteria • Test with new data • Don’t count on all results being accurate • Understand the architecture of the network as a part of the testing process • Communicate the level of confidence you have in the results
  • 27. Gearing Up to Test • Think outside the box • Understand your users intimately • What will a user do? • Know the training data • Look for bias in the data • Write tests to exploit edge cases • Run many tests to look for trends
  • 28. What is a Bug? • A mismatch between inputs and outputs? • It supposed to be that way! • Not every recommendation will be a good one • But that doesn’t mean it’s a bug • Too many wrong answers • Define too many • Define wrong
  • 29. We Found a Bug, Now What? • The bug could be unrelated to the neural network • Treat it as a normal bug • If the neural network is involved • Determine a definition of inaccurate • Determine the likelihood of an inaccurate answer • This may involve serious redevelopment
  • 30. Bias and Truth and AI, Oh My • All human decisions are biased in some way • Do we want human decisions, or do we want correct ones? • Human decisions come with human biases • Can we get those out of our software? • And will it be better if we do? • Even with bias, computer decisions may be better
  • 31. A Larger Problem • What if we want the system to replicate how we would decide? • Irrespective of bias • We may accept bias if it is consistent • After all, humans are biased • This is a question with no easy answer in practice
  • 32. Delivering in the Clutch • Machines treat all events as equal • Humans recognize the importance of some events • And sometimes can rise to the occasion • There is no mechanism for code to do this • We could have data and algorithms to recognize the importance of a specific event • But the software cannot “improve” its answer • This is less a bias than an inherent weakness
  • 33. The Human in the Loop • We don’t understand complex software systems • Disasters often happen because software behaves in unexpected ways • Human oversight may prevent disasters, or wrong decisions • Can we overcome human bias? • The problem is that machines respond too quickly • In many cases, there is not enough time for human oversight
  • 34. Objections • I will never encounter this type of application! • You might be surprised • I will do what I’ve always done • No, you won’t • My goals will be defined by others • Unless they’re not • You may be the one
  • 35. Conclusions • You can’t make intelligence from code • Algorithms are thoughtless • “Learning” systems are what we program and train • No easy answer • If we choose the data, it will likely be biased • Recognize how your data fits into the problem domain
  • 37. References • https://qz.com/989137/when-a-robot-ai-doctor-misdiagnoses-you- whos-to-blame/ • https://qz.com/1089555/artificial-intelligence-programmers-are- trying-to-turn-code-into-intelligence-like-alchemists-tried-turning- lead-into-gold/ • https://www.theatlantic.com/technology/archive/2017/09/saving- the-world-from-code/540393/ • https://qz.com/work/1454396/amazons-sexist-hiring-algorithm- could-still-be-better-than-a-human/

Notes de l'éditeur

  1. There is a type of software where having a defined output is no longer the case. Actually, two types. One is machine learning systems. The second is predictive analytics, or adaptive systems.
  2. These types of software are becoming increasingly common, in areas such as ecommerce, public transportation, automotive, finance, and computer networks. They have the potential to make decisions given sufficiently well-defined inputs and goals. In some instances, they are characterized as artificial intelligence, in that they seemingly make decisions that were once the purview of a human user or operator.
  3. How do you test this? You do know what the answer is supposed to be, because you built the network using the test data, but it will be rare to get an exactly correct answer all of the time. The product is actually tested during the training process. Training either brings convergence to accurate results, or it diverges. The question is how you evaluate the quality of the network.
  4. Have objective acceptance criteria. Know the amount of error you and your users are willing to accept. Test with new data. Once you’ve trained the network and frozen the architecture and coefficients, use fresh inputs and outputs to verify its accuracy. Don’t count on all results being accurate. That’s just the nature of the beast. And you may have to recommend throwing out the entire network architecture and starting over. Understand the architecture of the network as a part of the testing process. Few if any will be able to actually follow a set of inputs through the network of algorithms, but understanding how the network is constructed will help testers determine if another architecture might produce better results. Communicate the level of confidence you have in the results to management and users. Machine learning systems offer you the unique opportunity to describe confidence in statistical terms, so use them. One important thing to note is that the training data itself could well contain inaccuracies. In this case, because of measurement error, the recorded wind speed and direction could be off or ambiguous. In other cases, the cooling of the filament likely has some error in its measurement.
  5. You can’t make gold from lead, and you can’t make intelligence from code.