SlideShare une entreprise Scribd logo
1  sur  18
www.globalbigdataconference.com
Twitter : @bigdataconf
Architecting a predictive,
petabyte-scale, self-learning
fraud detection system
3
4
WHAT WE’RE UP AGAINST
4
4
50+Schemes
(and counting)
99.9999%‘Good’ messages
6+Months
per case
Needle in a haystack
Hybrid analytics
No training data
Semi-supervised learning
Adversarial learning
Online feedback
5
WHY HYBRID ANALYTICS?
5
5
Ignore
more rules
Unusual
timing of
events
Unusual
personal
network
Teamwork
& scale
Think & talk
differently
6
(BITS OF) THE TOOLBOX
6
6
Rule
Inference
Time
Series
AnalysisLink
Analysis
Ensemble
Learning
Natural
Language
7
THE CODE, PLEASE
7 7
Freely available Jupyter notebooks
Open source libraries & open data
Github.com/atigeo/hunting_criminals_demo
8
9
STREAM PROCESSING
9
9
Kafka
Email Stream
Account transactions
Stream
Email NLP
Features
People graph
Transactions time series
1 0
SAMPLE EMAIL PATTERNS
1 1
SAMPLE NATURAL LANGUAGE ANNOTATORS
Understand vocabulary
– Jargon
– Code words
– Multi-lingual
Understand grammar
– Who are we talking about?
– Past, present or future?
– Compound sentences
Understand context
– Email: Re:, Fwd:, attachments
– SMS & IM have their own grammar
1 2
SAMPLE GRAPH FEATURES
Standard algorithms like KMeans don’t work on “haystacks”
1 3
SAMPLE GRAPH FEATURES
Bregman Bubble Clustering
1 4
USER ANALYSIS ITERATION
Email NLP
Features
User graph
Transactions
time series
Graph Features
Time Series
Features
NLP Features
Agent Feedback
Train/TestClassifier
1 5
1 6
1 7
•Needle in a very large haystack
– Actually needs a petabyte-scale platform
•Multi-modal: no single trick works
– Hybrid analytics
•No labeled data
– Semi-supervised learning
– Cold start problem
•Sparse & high-dimensional
– Graph based features & change over time
•Adversarial
– Feedback & online learning
SUMMARY: CHALLENGES OF LEARNING CRIMINALS
1 8
@davidtalby

Contenu connexe

Tendances

Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Spark Summit
 
Big data bi-mature-oanyc summit
Big data bi-mature-oanyc summitBig data bi-mature-oanyc summit
Big data bi-mature-oanyc summit
Open Analytics
 
Big data-science-oanyc
Big data-science-oanycBig data-science-oanyc
Big data-science-oanyc
Open Analytics
 
Finding bad apples early: Minimizing performance impact
Finding bad apples early: Minimizing performance impactFinding bad apples early: Minimizing performance impact
Finding bad apples early: Minimizing performance impact
Arun Kejariwal
 

Tendances (20)

Get Started with Driverless AI Recipes - Hands-on Training
Get Started with Driverless AI Recipes - Hands-on TrainingGet Started with Driverless AI Recipes - Hands-on Training
Get Started with Driverless AI Recipes - Hands-on Training
 
Big data and AI in Socialbakers
Big data and AI in SocialbakersBig data and AI in Socialbakers
Big data and AI in Socialbakers
 
Machine learning model to production
Machine learning model to productionMachine learning model to production
Machine learning model to production
 
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
 
Plume - A Code Property Graph Extraction and Analysis Library
Plume - A Code Property Graph Extraction and Analysis LibraryPlume - A Code Property Graph Extraction and Analysis Library
Plume - A Code Property Graph Extraction and Analysis Library
 
Simplify Governance of Streaming Data
Simplify Governance of Streaming Data Simplify Governance of Streaming Data
Simplify Governance of Streaming Data
 
FROM DATAFRAMES TO GRAPH Data Science with pyTigerGraph
FROM DATAFRAMES TO GRAPH Data Science with pyTigerGraphFROM DATAFRAMES TO GRAPH Data Science with pyTigerGraph
FROM DATAFRAMES TO GRAPH Data Science with pyTigerGraph
 
Big data bi-mature-oanyc summit
Big data bi-mature-oanyc summitBig data bi-mature-oanyc summit
Big data bi-mature-oanyc summit
 
Explore Your Data Using Amazon QuickSight and Build Your First Machine Learni...
Explore Your Data Using Amazon QuickSight and Build Your First Machine Learni...Explore Your Data Using Amazon QuickSight and Build Your First Machine Learni...
Explore Your Data Using Amazon QuickSight and Build Your First Machine Learni...
 
ルールベースによるTwitter タイムライン感情分析
ルールベースによるTwitter タイムライン感情分析ルールベースによるTwitter タイムライン感情分析
ルールベースによるTwitter タイムライン感情分析
 
Getting Started With Dato - August 2015
Getting Started With Dato - August 2015Getting Started With Dato - August 2015
Getting Started With Dato - August 2015
 
Big data-science-oanyc
Big data-science-oanycBig data-science-oanyc
Big data-science-oanyc
 
Beyond Matching: Applying Data Science Techniques to IOC-based Detection
Beyond Matching: Applying Data Science Techniques to IOC-based DetectionBeyond Matching: Applying Data Science Techniques to IOC-based Detection
Beyond Matching: Applying Data Science Techniques to IOC-based Detection
 
Etl is Dead; Long Live Streams
Etl is Dead; Long Live StreamsEtl is Dead; Long Live Streams
Etl is Dead; Long Live Streams
 
Rakshit (Rocky) Bhatt Resume - 2022
Rakshit (Rocky) Bhatt Resume - 2022Rakshit (Rocky) Bhatt Resume - 2022
Rakshit (Rocky) Bhatt Resume - 2022
 
Elevation Query Extension: Introducing Subselects into Lucene Queries
Elevation Query Extension: Introducing Subselects into Lucene QueriesElevation Query Extension: Introducing Subselects into Lucene Queries
Elevation Query Extension: Introducing Subselects into Lucene Queries
 
Finding bad apples early: Minimizing performance impact
Finding bad apples early: Minimizing performance impactFinding bad apples early: Minimizing performance impact
Finding bad apples early: Minimizing performance impact
 
Webinar: Question Answering and Virtual Assistants with Deep Learning
Webinar: Question Answering and Virtual Assistants with Deep LearningWebinar: Question Answering and Virtual Assistants with Deep Learning
Webinar: Question Answering and Virtual Assistants with Deep Learning
 
Sharing is Caring: Understanding and Measuring Threat Intelligence Sharing Ef...
Sharing is Caring: Understanding and Measuring Threat Intelligence Sharing Ef...Sharing is Caring: Understanding and Measuring Threat Intelligence Sharing Ef...
Sharing is Caring: Understanding and Measuring Threat Intelligence Sharing Ef...
 
Helping data scientists escape the seduction of the sandbox - Krish Swamy, We...
Helping data scientists escape the seduction of the sandbox - Krish Swamy, We...Helping data scientists escape the seduction of the sandbox - Krish Swamy, We...
Helping data scientists escape the seduction of the sandbox - Krish Swamy, We...
 

Similaire à Architecting a Predictive, Petabyte-Scale, Self-Learning Fraud Detection System

Online Predictive Modeling of Fraud Schemes from Mulitple Live Streams by Cla...
Online Predictive Modeling of Fraud Schemes from Mulitple Live Streams by Cla...Online Predictive Modeling of Fraud Schemes from Mulitple Live Streams by Cla...
Online Predictive Modeling of Fraud Schemes from Mulitple Live Streams by Cla...
Spark Summit
 
Sis fri 1030 michael holmes
Sis fri 1030 michael holmesSis fri 1030 michael holmes
Sis fri 1030 michael holmes
MediaPost
 
Natural Language Processing & Semantic Models in an Imperfect World
Natural Language Processing & Semantic Modelsin an Imperfect WorldNatural Language Processing & Semantic Modelsin an Imperfect World
Natural Language Processing & Semantic Models in an Imperfect World
Vital.AI
 
PHP to Python with No Regrets
PHP to Python with No RegretsPHP to Python with No Regrets
PHP to Python with No Regrets
Alex Ezell
 
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Sri Ambati
 

Similaire à Architecting a Predictive, Petabyte-Scale, Self-Learning Fraud Detection System (20)

Online Predictive Modeling of Fraud Schemes from Mulitple Live Streams by Cla...
Online Predictive Modeling of Fraud Schemes from Mulitple Live Streams by Cla...Online Predictive Modeling of Fraud Schemes from Mulitple Live Streams by Cla...
Online Predictive Modeling of Fraud Schemes from Mulitple Live Streams by Cla...
 
Hunting criminals with hybrid analytics -- October 2015
Hunting criminals with hybrid analytics -- October 2015Hunting criminals with hybrid analytics -- October 2015
Hunting criminals with hybrid analytics -- October 2015
 
Sis fri 1030 michael holmes
Sis fri 1030 michael holmesSis fri 1030 michael holmes
Sis fri 1030 michael holmes
 
Active learning from streams of graph, language & time series signals
Active learning from streams of graph, language & time series signalsActive learning from streams of graph, language & time series signals
Active learning from streams of graph, language & time series signals
 
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
 
Introduction To Python
Introduction To PythonIntroduction To Python
Introduction To Python
 
Realtime search at Yammer
Realtime search at YammerRealtime search at Yammer
Realtime search at Yammer
 
Real-time Search at Yammer - By Aleksandrovsky Boris
Real-time Search at Yammer - By Aleksandrovsky BorisReal-time Search at Yammer - By Aleksandrovsky Boris
Real-time Search at Yammer - By Aleksandrovsky Boris
 
Real Time Search at Yammer
Real Time Search at YammerReal Time Search at Yammer
Real Time Search at Yammer
 
Hunting criminals with hybrid analytics strata hadoop v4
Hunting criminals with hybrid analytics   strata hadoop v4Hunting criminals with hybrid analytics   strata hadoop v4
Hunting criminals with hybrid analytics strata hadoop v4
 
Natural Language Processing & Semantic Models in an Imperfect World
Natural Language Processing & Semantic Modelsin an Imperfect WorldNatural Language Processing & Semantic Modelsin an Imperfect World
Natural Language Processing & Semantic Models in an Imperfect World
 
Codemotion Berlin 2015 recap
Codemotion Berlin 2015   recapCodemotion Berlin 2015   recap
Codemotion Berlin 2015 recap
 
Hacking CEH cheat sheet
Hacking  CEH cheat sheetHacking  CEH cheat sheet
Hacking CEH cheat sheet
 
Hacking - CEH Cheat Sheet Exercises.pdf
Hacking - CEH Cheat Sheet Exercises.pdfHacking - CEH Cheat Sheet Exercises.pdf
Hacking - CEH Cheat Sheet Exercises.pdf
 
Data science in 10 steps
Data science in 10 stepsData science in 10 steps
Data science in 10 steps
 
ChatGPT - 5 lessons in 5 minutes
ChatGPT - 5 lessons in 5 minutesChatGPT - 5 lessons in 5 minutes
ChatGPT - 5 lessons in 5 minutes
 
PHP to Python with No Regrets
PHP to Python with No RegretsPHP to Python with No Regrets
PHP to Python with No Regrets
 
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
 
1435488539 221998
1435488539 2219981435488539 221998
1435488539 221998
 
Understanding Human Conversations with AI
Understanding Human Conversations with AI Understanding Human Conversations with AI
Understanding Human Conversations with AI
 

Plus de David Talby

Natural Language Understanding with Machine Learned Annotators and Deep Learn...
Natural Language Understanding with Machine Learned Annotators and Deep Learn...Natural Language Understanding with Machine Learned Annotators and Deep Learn...
Natural Language Understanding with Machine Learned Annotators and Deep Learn...
David Talby
 
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...
David Talby
 

Plus de David Talby (11)

Building State-of-the-art Natural Language Processing Projects with Free Soft...
Building State-of-the-art Natural Language Processing Projects with Free Soft...Building State-of-the-art Natural Language Processing Projects with Free Soft...
Building State-of-the-art Natural Language Processing Projects with Free Soft...
 
Turning Medical Expert Knowledge into Responsible Language Models - K1st World
Turning Medical Expert Knowledge into Responsible Language Models - K1st WorldTurning Medical Expert Knowledge into Responsible Language Models - K1st World
Turning Medical Expert Knowledge into Responsible Language Models - K1st World
 
How to Apply NLP to Analyze Clinical Trials
How to Apply NLP to Analyze Clinical TrialsHow to Apply NLP to Analyze Clinical Trials
How to Apply NLP to Analyze Clinical Trials
 
New Frontiers in Applied NLP​ - PAW Healthcare 2022
New Frontiers in Applied NLP​ - PAW Healthcare 2022New Frontiers in Applied NLP​ - PAW Healthcare 2022
New Frontiers in Applied NLP​ - PAW Healthcare 2022
 
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
 
Applying NLP to Personalized Healthcare - 2021
Applying NLP to Personalized Healthcare - 2021Applying NLP to Personalized Healthcare - 2021
Applying NLP to Personalized Healthcare - 2021
 
Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...
Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...
Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...
 
Natural Language Understanding in Healthcare
Natural Language Understanding in HealthcareNatural Language Understanding in Healthcare
Natural Language Understanding in Healthcare
 
Deep learning for natural language understanding
Deep learning for natural language understandingDeep learning for natural language understanding
Deep learning for natural language understanding
 
Natural Language Understanding with Machine Learned Annotators and Deep Learn...
Natural Language Understanding with Machine Learned Annotators and Deep Learn...Natural Language Understanding with Machine Learned Annotators and Deep Learn...
Natural Language Understanding with Machine Learned Annotators and Deep Learn...
 
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...
 

Dernier

%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
masabamasaba
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
masabamasaba
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
masabamasaba
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
VishalKumarJha10
 

Dernier (20)

%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
 
Generic or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsGeneric or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisions
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 

Architecting a Predictive, Petabyte-Scale, Self-Learning Fraud Detection System

  • 2. Architecting a predictive, petabyte-scale, self-learning fraud detection system
  • 3. 3
  • 4. 4 WHAT WE’RE UP AGAINST 4 4 50+Schemes (and counting) 99.9999%‘Good’ messages 6+Months per case Needle in a haystack Hybrid analytics No training data Semi-supervised learning Adversarial learning Online feedback
  • 5. 5 WHY HYBRID ANALYTICS? 5 5 Ignore more rules Unusual timing of events Unusual personal network Teamwork & scale Think & talk differently
  • 6. 6 (BITS OF) THE TOOLBOX 6 6 Rule Inference Time Series AnalysisLink Analysis Ensemble Learning Natural Language
  • 7. 7 THE CODE, PLEASE 7 7 Freely available Jupyter notebooks Open source libraries & open data Github.com/atigeo/hunting_criminals_demo
  • 8. 8
  • 9. 9 STREAM PROCESSING 9 9 Kafka Email Stream Account transactions Stream Email NLP Features People graph Transactions time series
  • 10. 1 0 SAMPLE EMAIL PATTERNS
  • 11. 1 1 SAMPLE NATURAL LANGUAGE ANNOTATORS Understand vocabulary – Jargon – Code words – Multi-lingual Understand grammar – Who are we talking about? – Past, present or future? – Compound sentences Understand context – Email: Re:, Fwd:, attachments – SMS & IM have their own grammar
  • 12. 1 2 SAMPLE GRAPH FEATURES Standard algorithms like KMeans don’t work on “haystacks”
  • 13. 1 3 SAMPLE GRAPH FEATURES Bregman Bubble Clustering
  • 14. 1 4 USER ANALYSIS ITERATION Email NLP Features User graph Transactions time series Graph Features Time Series Features NLP Features Agent Feedback Train/TestClassifier
  • 15. 1 5
  • 16. 1 6
  • 17. 1 7 •Needle in a very large haystack – Actually needs a petabyte-scale platform •Multi-modal: no single trick works – Hybrid analytics •No labeled data – Semi-supervised learning – Cold start problem •Sparse & high-dimensional – Graph based features & change over time •Adversarial – Feedback & online learning SUMMARY: CHALLENGES OF LEARNING CRIMINALS