Meet TransmogrifAI, Open Source AutoML That Powers Einstein Predictions
mtovbin@salesforce.com, @tovbinm
Matthew Tovbin, Principal Engineer, Einstein
Forward Looking Statement
Statement under the Private Securities Litigation Reform Act of 1995:
This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties
materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results
expressed or implied by the forward-looking statements we make. All statements other than statements of historical fact could be deemed
forward-looking, including any projections of product or service availability, subscriber growth, earnings, revenues, or other financial items
and any statements regarding strategies or plans of management for future operations, statements of belief, any statements concerning
new, planned, or upgraded services or technology developments and customer contracts or use of our services.
The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new
functionality for our service, new products and services, our new business model, our past operating losses, possible fluctuations in our
operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of any
litigation, risks associated with completed and any possible mergers and acquisitions, the immature market in which we operate, our
relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new releases of our
service and successful customer deployment, our limited history reselling non-salesforce.com products, and utilization and selling to larger
enterprise customers. Further information on potential factors that could affect the financial results of salesforce.com, inc. is included in
our annual report on Form 10-K for the most recent fiscal year and in our quarterly report on Form 10-Q for the most recent fiscal quarter.
These documents and others containing important disclosures are available on the SEC Filings section of the Investor Information section
of our Web site.
Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently
available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based
upon features that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these
forward-looking statements.
Machine Learning is Hard and Even Harder for the Enterprise
Multi-cloud and Multi-tenant
Lessons our Data Scientists Learned while Building Einstein:
1. Customer-specific models beat global models
2. Majority of business data is structured
3. Too many use cases, too few data scientists
Enterprise Machine Learning
1. Customer-specific Models Beat Global Models
● Customers care about data privacy
● Every customer’s data is different
2. Majority of Business Data is Structured
Source: https://www.kaggle.com/surveys/2017

The standard approach to building an ML model:
Data Prep → Feature Engineering → Feature Selection → Model Training → Model
3. Too Many Use Cases, Too Few Data Scientists
ML is far harder in the Enterprise, where every customer-specific model needs its own pipeline
[Figure: the Data Prep → Feat. Eng → Feat. Selection → Model Training → Model pipeline repeated eight times, once per customer-specific model]
Introducing TransmogrifAI
Automated Machine Learning for Structured Data
Customer-specific models + Structured, transactional data + Data science at scale
● Automated feature engineering, feature selection & model selection
● ML abstractions that improve developer productivity & collaboration
● Model explainability to improve debuggability and transparency
>90% accuracy with 100x reduction in time
What’s in a name?
transmogrify: transform in a surprising or magical manner
The AutoML Engine in the Einstein Platform
[Figure: Einstein Platform architecture, serving 5B+ predictions per day. Applications: Lead Scoring, Engagement Scoring, Case Classification, Prediction Builder, ... Machine Learning: DL, TransmogrifAI. Other components: Compute, Orchestration, Data Store, Model Lifecycle Management, Data Science Experience, Configuration Services, Infrastructure, Metrics, Health Monitoring, ETL/GDPR/Data Processing.]
Einstein Prediction Builder
• Product: Point. Click. Predict.
• Engineering: any customer can create any number of ML applications on any data?! Impossible!
Under the Hood
● Automated Feature Engineering
● Automated Feature Selection
● Automated Model Selection
Automatic Feature Engineering
Type Hierarchy For Machine Learning
[Figure: feature type hierarchy rooted at FeatureType. Types shown include OPNumeric, OPCollection, OPSet, OPList, OPVector, OPMap, Text, NonNullable, Email, Base64, Phone, ID, URL, ComboBox, PickList, TextArea, TextList, MultiPickList, TextMap, BinaryMap, IntegralMap, RealMap, StateMap, DateList, DateTimeList, Integral, Real, RealNN, Binary, Percent, Currency, Date, DateTime, Location, City, Street, Country, State, PostalCode, Geolocation, SingleResponse, MultiResponse, Categorical, Prediction, ...
Legend: bold - abstract type, regular - concrete type, italic - trait, solid line - inheritance, dashed line - trait mixin]
Salesforce field types: https://developer.salesforce.com/docs/atlas.en-us.api.meta/api/field_types.htm
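To make the legend concrete (solid line for inheritance, dashed line for trait mixin), here is a small Python stand-in for a few of the types named in the figure. It is a hypothetical simplification: Python classes and mixins play the role of the Scala classes and traits, and the class bodies are assumptions, not TransmogrifAI's actual definitions.

```python
from typing import Optional


class FeatureType:
    """Root of the hierarchy: by default a feature value may be empty (null)."""

    def __init__(self, value: Optional[object]):
        self.value = value

    @property
    def is_empty(self) -> bool:
        return self.value is None


# Traits (dashed lines in the figure), modelled here as mixin classes.
class NonNullable:
    """Mixin: rejects empty values at construction time."""

    def __init__(self, value):
        if value is None:
            raise ValueError(f"{type(self).__name__} cannot be empty")
        super().__init__(value)


class Categorical:
    """Marker mixin: the value is drawn from a set of categories."""


class SingleResponse(Categorical):
    """Marker mixin: exactly one category per record."""


# Types (solid lines = inheritance).
class OPNumeric(FeatureType):
    pass


class Real(OPNumeric):
    pass


class RealNN(NonNullable, Real):
    """A Real that may never be empty (NonNullable mixed in)."""


class Text(FeatureType):
    pass


class Email(Text):
    pass


class PickList(SingleResponse, Text):
    """Text that is also a single-response categorical."""
```

With this layout, `Real(None).is_empty` is `True`, `RealNN(None)` raises `ValueError`, and a `PickList` is both `Text` and `Categorical`, which is the kind of information a type-aware feature engineering step can dispatch on.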
Automatic Feature Engineering
[Figure: transmogrify() turns raw fields (Lat Lon, Subject, Phone, Email, Age) into derived features (Age [0-15], Age [15-35], Age [>35], Email Is Spammy, Top Email Domains, Country Code, Phone Is Valid, Top TF-IDF Terms, City, State), which are assembled into a single Feature Vector.]
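As a rough illustration of the figure, the sketch below derives features from two raw fields: an age becomes one-hot bucket indicators and an email address becomes its domain. The bucket edges follow the slide, but treating them as left-closed intervals ([0, 15), [15, 35), [35, ∞)) is an assumption, so an age of exactly 35 lands in the "[>35]" bucket here. `derive_features` is a hypothetical helper, not TransmogrifAI's `transmogrify()`.

```python
from typing import Dict, Optional

# Assumed bucket edges from the slide, as left-closed intervals [low, high).
AGE_BUCKETS = [
    ("Age [0-15]", 0, 15),
    ("Age [15-35]", 15, 35),
    ("Age [>35]", 35, float("inf")),
]


def age_bucket_indicators(age: Optional[float]) -> Dict[str, float]:
    """One indicator per bucket; all zeros when the age is missing."""
    return {
        name: 1.0 if age is not None and low <= age < high else 0.0
        for name, low, high in AGE_BUCKETS
    }


def email_domain(email: Optional[str]) -> Optional[str]:
    """Lower-cased part after the last '@', or None if absent or malformed."""
    if not email or "@" not in email:
        return None
    domain = email.rsplit("@", 1)[1].strip().lower()
    return domain or None


def derive_features(record: Dict[str, object]) -> Dict[str, object]:
    """Hypothetical helper: derive features from raw 'Age' and 'Email' fields."""
    features: Dict[str, object] = dict(age_bucket_indicators(record.get("Age")))
    features["Email Domain"] = email_domain(record.get("Email"))
    return features
```

For example, `derive_features({"Age": 20, "Email": "Jane@Example.com"})` sets only the `Age [15-35]` indicator and reports the domain `example.com`; a real pipeline would then pivot the domain (for instance, keeping the top domains) before adding it to the vector.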
Numeric – Imputation and Null value tracking
Feature (raw)   Feature (imputed)   Feature Null Indicator
34,200.03       34,200.03           0
14,001.02       14,001.02           0
(null)          16,045.21           1
22,430.11       22,430.11           0
(null)          16,045.21           1
47,895.66       47,895.66           0
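The table above pairs each imputed value with a null indicator. The slide does not say how the fill value is computed, so this minimal sketch assumes mean imputation over the observed values; the indicator column lets the model still tell imputed rows apart from real ones.

```python
from statistics import mean
from typing import List, Optional, Tuple


def impute_with_null_indicator(
    values: List[Optional[float]],
) -> List[Tuple[float, int]]:
    """Fill missing values with the mean of the observed ones (an assumed
    strategy) and return (value, null_indicator) pairs, where the indicator
    is 1 for imputed entries and 0 otherwise."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if observed else 0.0  # fallback when every value is null
    return [(fill, 1) if v is None else (v, 0) for v in values]
```

For example, `impute_with_null_indicator([1.0, None, 3.0])` returns `[(1.0, 0), (2.0, 1), (3.0, 0)]`.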
Categorical: One Hot Encoding
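A common refinement of one-hot encoding, and the one suggested by the "Dynamic Top K pivot" and "Track null value" entries for categorical features, keeps only the most frequent categories. This minimal sketch assumes a fixed `k` (how TransmogrifAI chooses K dynamically is not covered here), an OTHER column for the long tail, and a NULL column for missing values.

```python
from collections import Counter
from typing import List, Optional


def top_k_one_hot(values: List[Optional[str]], k: int) -> List[List[int]]:
    """One-hot encode the k most frequent categories; any other value goes to
    an OTHER column and missing values to a NULL column.
    Column order: top-k categories (most frequent first), OTHER, NULL."""
    counts = Counter(v for v in values if v is not None)
    # Ties are broken alphabetically so the column order is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    top = [category for category, _ in ranked[:k]]
    index = {category: i for i, category in enumerate(top)}
    other_col, null_col = len(top), len(top) + 1
    rows = []
    for v in values:
        row = [0] * (len(top) + 2)
        if v is None:
            row[null_col] = 1
        elif v in index:
            row[index[v]] = 1
        else:
            row[other_col] = 1
        rows.append(row)
    return rows
```

Keeping only the top K bounds the width of the feature vector no matter how many distinct values a customer's field contains.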
Text: TF-IDF
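TF-IDF weights a term by how often it appears in a record, discounted by how many records contain it. This sketch assumes the text is already tokenized and uses one common smoothed IDF variant, idf(t) = ln((1 + N) / (1 + df(t))) + 1; other libraries use slightly different smoothing, and this is not necessarily the formula TransmogrifAI uses.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Raw term frequency times smoothed inverse document frequency,
    idf(t) = ln((1 + N) / (1 + df(t))) + 1, for pre-tokenized documents."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
```

A term that appears in every record, such as `quote` in `[["quote", "sent"], ["quote", "signed"]]`, gets the minimum weight 1.0, while rarer terms such as `sent` are weighted higher.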
Temporal: Circular Statistics
Circular distributions wrap around: the end of a cycle is adjacent to its start, so there is no true zero. This makes them a good fit for temporal features and helps capture seasonality:
● Hours of the Day
● Weeks of the Month
● Months of the Year
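One common way to apply this (an assumption about the exact encoding; the slide only names circular statistics) is to map each cyclic value to a point on the unit circle with cosine and sine, so that 23:00 and 00:00 end up close together instead of 23 units apart.

```python
import math
from typing import Tuple


def circular_encode(value: float, period: float) -> Tuple[float, float]:
    """Map a cyclic value (e.g. hour of day with period 24, month with
    period 12) onto the unit circle so the end of a cycle sits next to
    its start."""
    angle = 2 * math.pi * (value % period) / period
    return math.cos(angle), math.sin(angle)
```

With `period=24`, the encodings of hour 23 and hour 0 are much closer to each other than hour 0 is to hour 12, which is exactly the property a raw 0 to 23 integer lacks.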
Automatic Feature Engineering
● Numeric: Imputation; Track null value; Scaling (zNormalize, log, linear); Smart Binning
● Categorical: Imputation; Track null value; One Hot Encoding; Dynamic Top K pivot
● Text: Language Detection; Language-wise Tokenization; Hash Encoding; Tf-Idf; Word2Vec; Named Entity Resolution; Smart Categorical
● Spatial: Reverse Geocoding; Nearest POI
● Temporal: Time difference; Circular Statistics; Time extraction (day, week, month, year)
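Two of the numeric scalings listed for numeric features, z-normalization and log scaling, are short enough to sketch directly. Which scaling TransmogrifAI picks for a given column, and how, is not shown here; the guard for constant columns and the `log1p` choice are assumptions of this sketch.

```python
import math
from statistics import mean, pstdev
from typing import List


def z_normalize(values: List[float]) -> List[float]:
    """Center on the mean and divide by the population standard deviation;
    a constant column maps to all zeros instead of dividing by zero."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def log_scale(values: List[float]) -> List[float]:
    """log(1 + x) compresses long right tails; assumes values >= 0."""
    return [math.log1p(v) for v in values]
```

`z_normalize([1.0, 2.0, 3.0])` centers the middle value at 0.0 with the other two symmetric around it; `log_scale` is the better fit for skewed amounts such as the currency values in the imputation example.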
Automatic Feature Selection
Problems with doing Machine
Learning on Enterprise Data
1. Hindsight Bias
2. Field Usage Changes
3. Bulk Uploads
4. Field Type Abuse
5. More...
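Problem #1 – Hindsight Bias (aka Label Leakage)
[Figure: Lead Before Conversion vs. Lead At Conversion]
The model is trained on field values that are only filled in once the outcome (e.g. lead conversion) is known, so it learns from information it will not have at prediction time. In layman’s terms, it is like Marty McFly traveling to the future, getting his hands on the Sports Almanac, and using it to bet on the games of the present.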
Problem #2 – Field Usage Changes Over Time
Problem #3 – Bulk Upload by Business Workflow
A business process bulk-updated records whose distribution differs from the rest of the data, biased towards the negative outcome
Problem #4 – Feature types abused
[Figure, left: a typical free-text feature (a block of filler pangram text). Right: the ‘Last Open Stage’ text feature, which holds only a few distinct values: align, answer, collect, contracting, negotiate, opportunity won, qualify, qualify/align.]
‘Last Open Stage’ is stored as text but behaves like a picklist, and its ‘opportunity won’ value is the outcome/label itself, so that value of the feature is a leaker.
● Analyze every feature and output descriptive statistics
○ Mean
○ Min
○ Max
○ Variance
○ Number of Nulls
● Ensure Features have acceptable ranges
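A minimal sketch of both bullets, assuming the acceptable range for a feature is supplied by the caller (for example, 0 to 100 for a percent field; those bounds are hypothetical) and that variance means population variance.

```python
from statistics import mean, pvariance
from typing import Dict, List, Optional


def describe(values: List[Optional[float]]) -> Dict[str, float]:
    """Per-feature summary: count, number of nulls, mean, min, max and
    population variance of the non-null values."""
    observed = [v for v in values if v is not None]
    if not observed:
        return {"count": 0, "nulls": len(values)}
    return {
        "count": len(observed),
        "nulls": len(values) - len(observed),
        "mean": mean(observed),
        "min": min(observed),
        "max": max(observed),
        "variance": pvariance(observed),
    }


def out_of_range(stats: Dict[str, float], low: float, high: float) -> bool:
    """Flag a feature whose observed values fall outside [low, high]."""
    return stats.get("count", 0) > 0 and (stats["min"] < low or stats["max"] > high)
```

A percent feature whose summary reports `max` of 250 would be flagged by `out_of_range(stats, 0, 100)`, a hint that the field is being used for something other than its declared type.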
Automatic Feature Selection
● Analyze each feature’s correlation with the label: which features have the most and the least predictive power?
● Drop features with low predictive power
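A minimal sketch of correlation-based selection. Both thresholds are hypothetical, and the upper one goes beyond the slide: a feature that is almost perfectly correlated with the label is more likely a leaker (Problem #1) than a genuinely strong predictor. Pearson correlation is a stand-in here; categorical features would need a different association measure.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(
    features: Dict[str, List[float]],
    label: List[float],
    min_abs_corr: float = 0.05,  # hypothetical threshold
    max_abs_corr: float = 0.95,  # hypothetical threshold
) -> Tuple[List[str], List[str], List[str]]:
    """Split features into (kept, dropped_as_weak, flagged_as_leakers)
    by the absolute correlation of each feature with the label."""
    kept, weak, leakers = [], [], []
    for name, column in features.items():
        r = abs(pearson(column, label))
        if r < min_abs_corr:
            weak.append(name)
        elif r > max_abs_corr:
            leakers.append(name)
        else:
            kept.append(name)
    return kept, weak, leakers
```

With label `[0, 0, 1, 1]`, a feature identical to the label is flagged as a leaker, an uncorrelated or constant feature is dropped as weak, and a moderately correlated one is kept.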
Automatic Feature Selection
● Auto Bucketize
● Training vs scoring
● Feature Lineage
Evaluating Models
Need to know the true label to evaluate the model
● Usually do a random train/holdout split on the labeled data and use cross-validation on the training set
[Figure: labeled data split into a Training set and a Holdout set]
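The random train/holdout split plus cross-validation described above can be sketched as follows. The 10% holdout fraction, the fixed seed and the strided fold assignment are assumptions of this sketch; the folds rely on the rows already being shuffled, which the split does.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_holdout_split(
    rows: Sequence[T], holdout_fraction: float = 0.1, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle the labeled rows and hold out a fraction for final evaluation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(round(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """(train, validation) index pairs for k-fold cross-validation on the
    training set; fold i takes every k-th index starting at i, so fold
    sizes differ by at most one."""
    folds = [list(range(i, n, k)) for i in range(k)]
    return [
        ([j for f, fold in enumerate(folds) if f != i for j in fold], folds[i])
        for i in range(k)
    ]
```

Models are compared on the cross-validation folds of the training set only; the holdout set is scored once at the end, so it stays an unbiased check.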
Evaluating Models
● A time-based evaluation dataset is the true test of how well a model is performing
○ Wait for existing (or new) records to have their label determined
○ Predict from the older state of that record and compare to the true label
● The biggest problem is usually waiting for enough data to be available
● We can also switch over to constructing the model from the true event sequence rather than a snapshot
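A sketch of building a time-based evaluation set, under assumed data shapes: each snapshot is a dict with hypothetical `id` and `as_of` keys, and `outcomes` maps a record id to its eventual label and the time that label was determined. Only records labeled after the cutoff are kept, each paired with its last state at or before the cutoff.

```python
from datetime import datetime
from typing import Any, Dict, List, Tuple

Snapshot = Dict[str, Any]


def time_based_eval_set(
    snapshots: List[Snapshot],
    outcomes: Dict[str, Tuple[int, datetime]],
    cutoff: datetime,
) -> List[Tuple[Snapshot, int]]:
    """Pair each record's last snapshot at or before `cutoff` with the label
    that was determined only after `cutoff`."""
    latest: Dict[str, Snapshot] = {}
    for snap in snapshots:
        if snap["as_of"] <= cutoff:
            prev = latest.get(snap["id"])
            if prev is None or snap["as_of"] > prev["as_of"]:
                latest[snap["id"]] = snap
    pairs = []
    for record_id, (label, labeled_at) in outcomes.items():
        # Records labeled before the cutoff are skipped: their snapshot could
        # already reflect the outcome (hindsight bias).
        if labeled_at > cutoff and record_id in latest:
            pairs.append((latest[record_id], label))
    return pairs
```

Scoring the model on these pairs asks it to predict from the older state of each record, as in the bullets above; the catch, as the slide notes, is waiting until enough records have been labeled after the cutoff.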
What does label leakage look like?
Leakers removed by AutoML: 73
Leakers removed by data scientist hand tuning: 42
[Figure: names of the fields flagged as leakers]
Department, mkto_si__Last_Interesting_Moment__c, Description, OtherPostalCode, et4ae5__Mobile_Country_Code__c, Title, mkto2__Acquisition_Program_Id__c, JigsawContactId, ReportsToId, OtherCity, pi__last_activity__c, MailingLongitude, pi__first_activity__c, AssistantPhone, HomePhone, Fax, OtherStreet, Partner_Last_Name__c, mkto_si__Last_Interesting_Moment_Desc__c, mkto2__Acquisition_Program__c, Jigsaw, Company__c, OtherLongitude, AssistantName, Salutation, OtherLatitude, Purchase_Motivation__c, Secondary_Email__c, TimetoPurchase__c, mkto_si__Last_Interesting_Moment_Source__c, MailingGeocodeAccuracy, MailingLatitude, pi__created_date__c, CommentCapture__c, Preferred_Communication_Method__c, TopPriorityValue__c, mkto_si__Last_Interesting_Moment_Type__c, OtherState, TopPriorityProcess__c, OtherCountry, MasterRecordId, OtherGeocodeAccuracy, TopPriorityProduct__c
emailbounceddate, lastcurequestdate, lastcuupdatedate, lastreferenceddate, lastvieweddate, mkto2__acquisition_date__c, mkto_si__hidedate__c, pi__grade__c, pi__notes__c, pi__utm_content__c, account_link_easy_closets__c, csat_survey_completed_date__c, csat_survey_net_promoter_score__c, csat_survey_results_link__c, birthdate, mkto_si__last_interesting_moment_date__c, pi__campaign__c, pi__comments__c, pi__first_search_term__c, pi__first_search_type__c, pi__first_touch_url__c, pi__score__c, pi__url__c, pi__utm_campaign__c, pi__utm_medium__c, pi__utm_source__c, historical_lead_score__c, pi__utm_term__c, first_activity_timestamp__c, predicted_likelihood_to_purchase_2__c, best_time_to_call_date__c, total_lead_score__c, csat_customer_service_survey_disallowed__c, referral_credit_applied__c, referral_days_til_purchase__c, predicted_likelihood_to_purchase__c, createdbyid, createddate, lastactivitydate, lastmodifieddate, last_activity_date__c, systemmodstamp
AutoML vs Hand Tuned – Showdown
[Figure: live prediction results]
Automated Model Selection
● Many hyperparameters for each algorithm
● Automated hyperparameter tuning
○ Faster model creation with improved metrics
○ Search algorithms to find the optimal hyperparameters, e.g. grid search, random search
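Grid search and random search differ only in how candidate hyperparameter settings are generated. In this sketch `score` stands in for a cross-validated metric; the parameter names `reg` and `elastic_net` are hypothetical, chosen to echo the ElasticNet-regularized models on the next slide.

```python
import itertools
import random
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]


def grid_search(
    space: Dict[str, List[float]], score: Callable[[Params], float]
) -> Tuple[Params, float]:
    """Evaluate every combination of the listed values; return the best."""
    names = list(space)
    candidates = [dict(zip(names, combo)) for combo in itertools.product(*space.values())]
    return max(((p, score(p)) for p in candidates), key=lambda ps: ps[1])


def random_search(
    space: Dict[str, Tuple[float, float]],
    score: Callable[[Params], float],
    n_trials: int,
    seed: int = 0,
) -> Tuple[Params, float]:
    """Sample n_trials settings uniformly from each [low, high] range."""
    rng = random.Random(seed)
    candidates = [
        {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        for _ in range(n_trials)
    ]
    return max(((p, score(p)) for p in candidates), key=lambda ps: ps[1])
```

Grid search cost grows multiplicatively with every parameter added, whereas random search spends a fixed budget of `n_trials`, which is why it often scales better when only a few parameters really matter.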
[Figure: hyperparameter search strategies: Grid Search, Random Search, Bayesian Search]

Automated Model Selection: Compete Algorithms
● Binary Classification: Logistic Regression w/ ElasticNet Regularization, Random Forests, Decision Trees, Gradient Boosted Trees, Naive Bayes
● Regression: Linear Regression w/ ElasticNet Regularization, Random Forests, Decision Trees
● Multi-Class Classification: Multinomial Logistic Regression w/ ElasticNet, Random Forests, Decision Trees, Naive Bayes
● Metrics shown: RMSE (regression), Accuracy, AuROC

Automated Model Selection
Different permutations of thresholds lead to different results
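Reading "thresholds" as the decision threshold applied to a binary classifier's scores (an interpretation; the slide does not elaborate), this minimal sketch shows how moving the threshold trades precision against recall on the same predictions. Reporting an undefined precision as 0.0 is a convention chosen here.

```python
from typing import List, Tuple


def precision_recall_at(
    scores: List[float], labels: List[int], threshold: float
) -> Tuple[float, float]:
    """Precision and recall when predicting positive for score >= threshold.
    Precision is undefined with no predicted positives; reported as 0.0."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For scores `[0.9, 0.8, 0.4, 0.3]` with labels `[1, 0, 1, 0]`, a threshold of 0.85 gives precision 1.0 and recall 0.5, while 0.35 gives recall 1.0 at lower precision: same model, different results.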
Demo
Image credit: Wikipedia
How well does it work?
• TransmogrifAI empowers:
  • Predictive Journeys
  • Lead Scoring
  • Prediction Builder
  • Case Classification
• Most of the models deployed in production are completely hands-free
• Serves 5B+ predictions per day
Where do WE go next?
• Deeper model & score insights – LOCO, LIME
• Hyperparameter search strategies – Bayesian, Bandit-based
• Feature engineering – text embeddings, model specific
• Model portability
• Enable more applications – recommenders, unsupervised learning
• Perf tuning, bug fixes, docs, examples
• <Your requirements / feedback>
Where do YOU go next?
• Read the blog post - https://www.sfdc.co/open-sourcing-transmogrifai
• Try it out - https://transmogrif.ai
• Reach out and contribute - https://sfdc.co/transmogrifai-contributing
• Student? Apply to Google Summer of Code (GSoC) 2019 to work with us!
• Feeling creative? We need a logo.
Questions?
These documents and others containing important disclosures are available on the SEC Filings section of the Investor Information section of our Web site. Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.
  • 4. Machine Learning is Hard and Even Harder for the Enterprise: Lessons our Data Scientists Learned while Building Einstein. 1. Customer-specific models beat global models 2. Majority of business data is structured 3. Too many use cases, too few data scientists
  • 5. Enterprise Machine Learning. 1. Customer-specific Models Beat Global Models ● Customers care about data privacy ● Every customer’s data is different
  • 6. 2. Majority of Business Data is Structured https://www.kaggle.com/surveys/2017
  • 7. 3. Too Many Use Cases, Too Few Data Scientists. The standard approach to building an ML model: Data Prep → Feature Engineering → Feature Selection → Model Training → Model
  • 8. 3. Too Many Use Cases, Too Few Data Scientists. ML is exponentially harder in the Enterprise with many customer-specific models. [Figure: the Data Prep → Feat. Eng → Feat. Selection → Model Training → Model pipeline repeated once for each customer-specific model]
  • 10. Introducing TransmogrifAI: Automated Machine Learning for Structured Data ● Automated feature engineering, feature selection & model selection ● ML abstractions that improve developer productivity & collaboration ● Model explainability to improve debuggability and transparency ● >90% accuracy with 100x reduction in time
  • 11. What’s in a name? transmogrify: to transform in a surprising or magical manner
  • 12. TransmogrifAI: The AutoML Engine in the Einstein Platform (5B+ predictions per day). Applications: Lead Scoring, Engagement Scoring, Case Classification, Prediction Builder, ... Machine Learning: TransmogrifAI, DL. Einstein Platform services: Compute, Orchestration, Data Store, Model Lifecycle Management, Data Science Experience, Configuration Services, Infrastructure Metrics, Health Monitoring, ETL/GDPR/Data Processing
  • 13. Einstein Prediction Builder • Product: Point. Click. Predict. • Engineering: any customer can create any number of ML applications on any data?! Impossible!
  • 14. Under the Hood ● Automated Feature Engineering ● Automated Feature Selection ● Automated Model Selection
  • 16. Type Hierarchy for Machine Learning. [Figure: feature type hierarchy rooted at FeatureType, with OPNumeric, Text and OPCollection (OPList, OPSet, OPMap, OPVector) branches. Types shown include Integral, Real, RealNN, Binary, Percent, Currency, Date, DateTime, Email, Base64, Phone, ID, URL, ComboBox, PickList, TextArea, City, Street, Country, PostalCode, State, Geolocation, MultiPickList, TextList, DateList, DateTimeList, TextMap, RealMap, IntegralMap, BinaryMap, StateMap and Prediction; traits shown include NonNullable, Categorical, SingleResponse, MultiResponse and Location. Legend: bold = abstract type, regular = concrete type, italic = trait, solid line = inheritance, dashed line = trait mixin.] Salesforce field types: https://developer.salesforce.com/docs/atlas.en-us.api.meta/api/field_types.htm
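To illustrate how a typed feature hierarchy lets an engine choose a treatment for each feature without per-field configuration, here is a minimal plain-Python sketch. The class names mirror a few types from the slide, but the constructors, the `NonNullable` check, the `Email.domain()` helper and the `default_treatment` function are hypothetical; TransmogrifAI implements its types in Scala, and this is not its API.

```python
class FeatureType:
    """Root of this hypothetical hierarchy: any value may be empty (None)."""

    def __init__(self, value=None):
        self.value = value

    def is_empty(self) -> bool:
        return self.value is None


# Mixins modeled on the slide's traits (hypothetical Python versions).
class NonNullable:
    """Rejects empty values when the feature is constructed."""

    def __init__(self, value=None):
        if value is None:
            raise ValueError(f"{type(self).__name__} cannot be empty")
        super().__init__(value)


class Categorical:
    """Marker: values come from a small set of choices."""


class Location:
    """Marker: the value describes a place."""


# Abstract groups.
class OPNumeric(FeatureType):
    pass


class Text(FeatureType):
    pass


# Concrete types.
class Real(OPNumeric):
    pass


class RealNN(NonNullable, Real):
    pass


class Email(Text):
    def domain(self):
        """Lowercased domain, or None for empty or malformed values."""
        if self.value is None or "@" not in self.value:
            return None
        return self.value.rsplit("@", 1)[1].lower()


class PickList(Categorical, Text):
    pass


class City(Location, Text):
    pass


def default_treatment(feature: FeatureType) -> str:
    """Choose a vectorization strategy from the type alone; most specific checks first."""
    if isinstance(feature, Categorical):
        return "one-hot pivot + null indicator"
    if isinstance(feature, Email):
        return "top email domains"
    if isinstance(feature, Location):
        return "geocoding"
    if isinstance(feature, Text):
        return "tokenize + hash"
    if isinstance(feature, NonNullable):
        return "use as-is"
    if isinstance(feature, OPNumeric):
        return "impute + null indicator"
    raise TypeError(f"no default treatment for {type(feature).__name__}")
```

Because `PickList` and `City` are also `Text`, the order of the checks encodes "most specific type wins"; adding a new type only requires placing it in the hierarchy.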
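  • 17. Automatic Feature Engineering with transmogrify(): raw features (Age, Email, Phone, Subject, Lat/Lon) are expanded into derived features (Age buckets [0-15], [15-35], [>35]; Email Is Spammy and Top Email Domains; Phone Country Code and Phone Is Valid; Top TF-IDF Terms from Subject; City, State from Lat/Lon), which are combined into a single Feature Vector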
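The idea behind `transmogrify()` (derive several features from each raw field and concatenate them into one vector) can be sketched in plain Python. Everything here is an assumption for illustration: the function names, the bucket edges (the slide's [0-15], [15-35], [>35] read as half-open intervals), the fixed `TOP_DOMAINS` list that a real system would learn from data, and the crude phone rule. It is not TransmogrifAI's API.

```python
import re
from typing import Optional

AGE_EDGES = [15, 35]  # hypothetical buckets: [0, 15), [15, 35), [35, inf)
TOP_DOMAINS = ["gmail.com", "yahoo.com"]  # hypothetical; normally learned from training data


def age_features(age: Optional[float]) -> list[float]:
    """One-hot age bucket followed by a null indicator."""
    buckets = [0.0] * (len(AGE_EDGES) + 1)
    if age is None:
        return buckets + [1.0]
    buckets[sum(age >= edge for edge in AGE_EDGES)] = 1.0
    return buckets + [0.0]


def email_features(email: Optional[str]) -> list[float]:
    """One-hot over top domains, then an 'other domain' slot and a null indicator."""
    slots = [0.0] * (len(TOP_DOMAINS) + 2)
    if not email or "@" not in email:
        slots[-1] = 1.0
        return slots
    domain = email.rsplit("@", 1)[1].lower()
    if domain in TOP_DOMAINS:
        slots[TOP_DOMAINS.index(domain)] = 1.0
    else:
        slots[len(TOP_DOMAINS)] = 1.0
    return slots


def phone_is_valid(phone: Optional[str]) -> list[float]:
    """Crude validity flag (assumption): 10 to 15 digits once punctuation is removed."""
    digits = re.sub(r"\D", "", phone or "")
    return [1.0 if 10 <= len(digits) <= 15 else 0.0]


def transmogrify_record(record: dict) -> list[float]:
    """Concatenate every derived feature into one flat feature vector."""
    return (
        age_features(record.get("age"))
        + email_features(record.get("email"))
        + phone_is_valid(record.get("phone"))
    )
```

For a record with age 22, a Gmail address and an 11-digit phone number, the vector is `[0, 1, 0, 0, 1, 0, 0, 0, 1]` (as floats): three age buckets, the age null flag, two domain slots, the other-domain slot, the email null flag and the phone-valid flag.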
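  • 18. Numeric – Imputation and Null Value Tracking. Input Feature column: 34,200.03 | 14,001.02 | (null) | 22,430.11 | (null) | 47,895.66. Output (Feature, Null Indicator): (34,200.03, 0) | (14,001.02, 0) | (16,045.21, 1) | (22,430.11, 0) | (16,045.21, 1) | (47,895.66, 0). Missing values are replaced with an imputed value, and the Null Indicator column records which rows were imputed.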
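A minimal sketch of numeric imputation with null tracking, assuming mean imputation. The slide does not say how its fill value is chosen, so the value filled in here will not match the slide's 16,045.21; the function name is hypothetical.

```python
from statistics import mean
from typing import Optional


def impute_with_null_indicator(column: list[Optional[float]]) -> list[tuple[float, int]]:
    """Fill missing values and return (value, null_indicator) pairs.

    Assumes mean imputation over the observed values; the indicator is 1
    exactly for the rows that were filled in.
    """
    observed = [v for v in column if v is not None]
    fill = mean(observed) if observed else 0.0  # fallback when every value is missing
    return [(fill, 1) if v is None else (v, 0) for v in column]
```

`impute_with_null_indicator([1.0, None, 3.0])` returns `[(1.0, 0), (2.0, 1), (3.0, 0)]`. Keeping the indicator lets a model learn from the fact that a value was missing, which imputation alone would hide.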
  • 21. Temporal: Circular Statistics. Circular distributions are those that have no true zero. They are great for temporal features and deal with seasonality: ● Hours of the Day ● Weeks of the Month ● Months of the Year
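Circular encoding is one common way to apply this idea: map a cyclic value to a point on the unit circle with sine and cosine so that the end of a cycle lands next to its start. This is a sketch of that standard technique in plain Python; the function name is hypothetical, and the slide does not specify which circular statistics TransmogrifAI computes.

```python
import math


def circular_encode(value: float, period: float) -> tuple[float, float]:
    """Map a cyclic value to (sin, cos) on the unit circle, so that values at
    the end of a cycle sit next to values at its start."""
    angle = 2 * math.pi * (value % period) / period
    return (math.sin(angle), math.cos(angle))
```

Hour 23 and hour 0 end up close together, while hour 0 and hour 12 are opposite points on the circle; months can be encoded as `circular_encode(month - 1, 12)`.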
  • 22. Automatic Feature Engineering. Numeric: Imputation, Track null value, Scaling (zNormalize, log, linear), Smart Binning. Categorical: Imputation, Track null value, One Hot Encoding, Dynamic Top K pivot. Spatial: Reverse Geocoding, Nearest POI. Temporal: Time difference, Circular Statistics, Time extraction (day, week, month, year). Text: Language Detection, Language-wise Tokenization, Hash Encoding, Tf-Idf, Word2Vec, Named Entity Resolution, Smart Categorical
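Of the text techniques listed, hash encoding is easy to show in a few lines: tokens are mapped to a fixed number of buckets by a hash function, so no vocabulary needs to be stored. A minimal sketch, assuming a simple lowercase word tokenizer and CRC32 as the hash (Python's built-in `hash()` is randomized per process for strings, so it would give different vectors on each run); the function name and bucket count are hypothetical.

```python
import re
import zlib
from typing import Optional


def hash_encode(text: Optional[str], num_buckets: int = 16) -> list[int]:
    """Bag-of-words hashing trick: every lowercase word token increments the
    bucket picked by a stable CRC32 hash, so no vocabulary is stored."""
    counts = [0] * num_buckets
    for token in re.findall(r"\w+", (text or "").lower()):
        bucket = zlib.crc32(token.encode("utf-8")) % num_buckets
        counts[bucket] += 1
    return counts
```

Different tokens can collide in the same bucket; more buckets reduce collisions at the cost of a wider vector.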
  • 24. Problems with doing Machine Learning on Enterprise Data 1. Hindsight Bias 2. Field Usage Changes 3. Bulk Uploads 4. Field Type Abuse 5. More...
  • 25. Problem #1 – Hindsight Bias (aka Label Leakage): Lead Before Conversion vs. Lead At Conversion
  • 26. In layman’s terms, it is like Marty McFly traveling to the future, getting his hands on the Sports Almanac, and using it to bet on the games of the present.
  • 27. Problem #2 – Field Usage Changes Over Time
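One simple way to surface field usage changes is to compare how often each field is filled in the training data versus the data being scored. A sketch under assumptions: the function names and the 0.2 threshold are hypothetical, and fill rate is only one possible drift signal; the slide does not describe TransmogrifAI's actual check.

```python
def fill_rate(values: list) -> float:
    """Fraction of non-missing values (0.0 for an empty or absent column)."""
    return sum(v is not None for v in values) / len(values) if values else 0.0


def fill_rate_drift(train: dict[str, list], scoring: dict[str, list], max_diff: float = 0.2) -> list[str]:
    """Names of fields whose fill rate in scoring data differs from training by more than max_diff."""
    return [
        name
        for name, values in train.items()
        if abs(fill_rate(values) - fill_rate(scoring.get(name, []))) > max_diff
    ]
```

A field that was always filled in at training time but is mostly empty when scoring (or the reverse) is a candidate for exclusion or closer review.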
  • 28. Problem #3 – Bulk Upload by Business Workflow: a business process updated records with a different distribution, biased towards the negative outcome
  • 29. Problem #4 – Feature types abused. [Figure: a typical text feature holds free-form sentences (sample pangrams such as “The quick, brown fox jumps over a lazy dog.”), whereas the ‘Last Open Stage’ text feature holds only a handful of distinct values: align, answer, collect, contracting, negotiate, opportunity won, qualify, qualify/align]
  • 30. Problem #4 – Feature types abused: the ‘Opportunity Won’ value of this feature is the outcome/label itself, so the feature is a leaker
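A field like ‘Last Open Stage’ can be caught by checking its cardinality: free text has many distinct values, while an abused text field has only a few. A minimal sketch; the function name and the `max_cardinality` default of 30 are hypothetical. A field that passes this check could be treated as a pick list rather than tokenized as text, and its values still need to be checked for leakage against the label.

```python
from typing import Optional


def looks_categorical(values: list[Optional[str]], max_cardinality: int = 30) -> bool:
    """True if the field has at least one and at most max_cardinality
    distinct normalized (trimmed, lowercased) non-empty values."""
    distinct = {v.strip().lower() for v in values if v is not None and v.strip()}
    return 0 < len(distinct) <= max_cardinality
```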
  • 31. Automatic Feature Selection ● Analyze every feature and output descriptive statistics: ○ Mean ○ Min ○ Max ○ Variance ○ Number of Nulls ● Ensure features have acceptable ranges
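The per-feature statistics on this slide can be computed as follows. A sketch in plain Python: the function names and the explicit `[low, high]` range check are assumptions, and it reports population variance (`statistics.pvariance`); sample variance would use `statistics.variance` instead.

```python
from statistics import mean, pvariance
from typing import Optional


def describe(column: list[Optional[float]]) -> dict:
    """Count, null count, mean, min, max and population variance of the non-null values."""
    observed = [v for v in column if v is not None]
    stats = {"count": len(observed), "nulls": len(column) - len(observed)}
    if observed:
        stats.update(
            mean=mean(observed),
            min=min(observed),
            max=max(observed),
            variance=pvariance(observed),
        )
    return stats


def out_of_range(stats: dict, low: float, high: float) -> bool:
    """True if any observed value falls outside the acceptable [low, high] range."""
    return stats["count"] > 0 and (stats["min"] < low or stats["max"] > high)
```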
  • 32. Automatic Feature Selection ● Analyze each feature’s correlation with the label: which features have the most and least predictive power? ● Drop features with low predictive power
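A sketch of correlation-based screening using Pearson correlation. Dropping features below `min_abs_corr` follows the slide; also flagging features whose correlation is suspiciously close to 1 as possible leakers is an assumption added to connect with the label-leakage problem, and both thresholds are hypothetical.

```python
import math


def pearson(xs: list[float], ys: list[float]):
    """Pearson correlation, or None when either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return None
    return cov / (sx * sy)


def screen_features(features: dict[str, list[float]], label: list[float],
                    min_abs_corr: float = 0.05, max_abs_corr: float = 0.95) -> dict[str, str]:
    """Decide keep/drop for each feature from its correlation with the label."""
    decisions = {}
    for name, values in features.items():
        r = pearson(values, label)
        if r is None:
            decisions[name] = "drop: constant"
        elif abs(r) < min_abs_corr:
            decisions[name] = "drop: low predictive power"
        elif abs(r) > max_abs_corr:
            decisions[name] = "drop: possible leaker"
        else:
            decisions[name] = "keep"
    return decisions
```

Pearson correlation only captures linear relationships, so a real system might use other measures for categorical or non-linear features.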
  • 33. Feature Lineage: Auto Bucketize, training vs. scoring
  • 34. Evaluating Models. We need to know the true label to evaluate the model. ● Usually we do a random train/holdout split on the labeled data and use cross-validation on the training set. [Figure: labeled data split into a Training set and a Holdout set]
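The random train/holdout split with cross-validation on the training set can be sketched with index lists. The function names, the 20% holdout fraction, the seed and the fold count are hypothetical choices; the holdout set is set aside and used only for the final evaluation.

```python
import random
from typing import Iterator


def train_holdout_split(n_rows: int, holdout_fraction: float = 0.2,
                        seed: int = 42) -> tuple[list[int], list[int]]:
    """Randomly split row indices into a training set and a holdout set."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_holdout = int(round(n_rows * holdout_fraction))
    return indices[n_holdout:], indices[:n_holdout]


def k_fold(train: list[int], k: int = 3) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (fit, validate) index lists for k-fold cross-validation on the training set."""
    folds = [train[i::k] for i in range(k)]
    for i in range(k):
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, folds[i]
```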
  • 35. Evaluating Models ● A time-based evaluation dataset is the true test of how well a model is performing: ○ Wait for existing (or new) records to have their label determined ○ Predict from the older state of each record and compare to the true label ● The biggest problem is usually waiting for enough data to become available ● We can also switch to constructing the model from the true event sequence rather than a snapshot
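The core of time-based evaluation is to score each record from the state it was in before its label was known. A sketch, assuming each record carries a list of timestamped snapshots, a `label_time` and a `label` (hypothetical field names), and that `predict` is any callable returning a label.

```python
from datetime import datetime
from typing import Callable, Optional


def state_before(history: list[tuple[datetime, dict]], label_time: datetime) -> Optional[dict]:
    """Latest snapshot recorded strictly before the label was determined."""
    earlier = [(ts, snapshot) for ts, snapshot in history if ts < label_time]
    if not earlier:
        return None
    return max(earlier, key=lambda pair: pair[0])[1]


def time_based_accuracy(records: list[dict], predict: Callable[[dict], int]) -> float:
    """Accuracy of predictions made from each record's pre-label state."""
    hits = total = 0
    for record in records:
        state = state_before(record["history"], record["label_time"])
        if state is None:
            continue  # no snapshot predates the label, so the record cannot be scored fairly
        hits += int(predict(state) == record["label"])
        total += 1
    return hits / total if total else float("nan")
```

A predictor that keys on a post-outcome value (for example `lambda s: int(s["stage"] == "won")`) gets no help from this setup, because the snapshot it sees predates the outcome.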
  • 36. What does label leakage look like?
  • 37. What does label leakage look like?
  • 38. AutoML vs Hand Tuned – Showdown. Leakers removed by AutoML: 73. Leakers removed by data scientist hand tuning: 42. Leaker fields listed on the slide: Department, mkto_si__Last_Interesting_Moment__c, Description, OtherPostalCode, et4ae5__Mobile_Country_Code__c, Title, mkto2__Acquisition_Program_Id__c, JigsawContactId, ReportsToId, OtherCity, pi__last_activity__c, MailingLongitude, pi__first_activity__c, AssistantPhone, HomePhone, Fax, OtherStreet, Partner_Last_Name__c, mkto_si__Last_Interesting_Moment_Desc__c, mkto2__Acquisition_Program__c, Jigsaw, Company__c, OtherLongitude, AssistantName, Salutation, OtherLatitude, Purchase_Motivation__c, Secondary_Email__c, TimetoPurchase__c, mkto_si__Last_Interesting_Moment_Source__c, MailingGeocodeAccuracy, MailingLatitude, pi__created_date__c, CommentCapture__c, Preferred_Communication_Method__c, TopPriorityValue__c, mkto_si__Last_Interesting_Moment_Type__c, OtherState, TopPriorityProcess__c, OtherCountry, MasterRecordId, OtherGeocodeAccuracy, TopPriorityProduct__c, emailbounceddate, lastcurequestdate, lastcuupdatedate, lastreferenceddate, lastvieweddate, mkto2__acquisition_date__c, mkto_si__hidedate__c, pi__grade__c, pi__notes__c, pi__utm_content__c, account_link_easy_closets__c, csat_survey_completed_date__c, csat_survey_net_promoter_score__c, csat_survey_results_link__c, birthdate, mkto_si__last_interesting_moment_date__c, pi__campaign__c, pi__comments__c, pi__first_search_term__c, pi__first_search_type__c, pi__first_touch_url__c, pi__score__c, pi__url__c, pi__utm_campaign__c, pi__utm_medium__c, pi__utm_source__c, historical_lead_score__c, pi__utm_term__c, first_activity_timestamp__c, predicted_likelihood_to_purchase_2__c, best_time_to_call_date__c, total_lead_score__c, csat_customer_service_survey_disallowed__c, referral_credit_applied__c, referral_days_til_purchase__c, predicted_likelihood_to_purchase__c, createdbyid, createddate, lastactivitydate, lastmodifieddate, last_activity_date__c, systemmodstamp
  • 39. AutoML vs Hand Tuned – Showdown: Live Prediction Results
  • 41. Automated Model Selection ● Many hyperparameters for each algorithm ● Automated hyperparameter tuning ○ Faster model creation with improved metrics ○ Search algorithms to find the optimal hyperparameters, e.g. grid search, random search [Figure: Grid Search, Random Search, Bayesian Search]
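Grid search and random search differ only in which hyperparameter combinations they evaluate. A minimal sketch: `score` stands for any hypothetical function that trains a model with the given parameters and returns a validation metric (higher is better). Sampling random search from the same discrete lists keeps the example short, whereas in practice it often samples continuous ranges; Bayesian search is not shown.

```python
import itertools
import random
from typing import Callable

Params = dict[str, object]


def grid_search(space: dict[str, list], score: Callable[[Params], float]) -> tuple[Params, float]:
    """Evaluate every combination in the grid; return (best_params, best_score)."""
    names = list(space)
    best = None
    for combo in itertools.product(*(space[name] for name in names)):
        params = dict(zip(names, combo))
        s = score(params)
        if best is None or s > best[1]:
            best = (params, s)
    return best


def random_search(space: dict[str, list], score: Callable[[Params], float],
                  n_trials: int, seed: int = 0) -> tuple[Params, float]:
    """Evaluate n_trials randomly sampled combinations; return (best_params, best_score)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        s = score(params)
        if best is None or s > best[1]:
            best = (params, s)
    return best
```

The cost of grid search grows with the product of the list sizes, while random search has a fixed budget of `n_trials` evaluations.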
  • 42. Automated Model Selection: Compete Algorithms. Binary Classification (AuROC): Random Forests, Decision Trees, Logistic Regression w/ ElasticNet Regularization, Naive Bayes. Regression (RMSE): Gradient Boosted Trees, Decision Trees, Random Forests, Linear Regression w/ ElasticNet Regularization. Multi-Class Classification (Accuracy): Random Forests, Decision Trees, Multinomial Logistic Regression w/ ElasticNet, Naive Bayes
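Letting algorithms compete amounts to scoring every candidate on the same validation data and keeping the best one by the problem's metric. A sketch for the binary case, with AuROC computed by pairwise comparison (quadratic in the number of rows, fine for illustration); the function names are hypothetical, and the candidate names and score lists in the usage line are placeholders, not results from the talk.

```python
def auroc(labels: list[int], scores: list[float]) -> float:
    """Area under the ROC curve: the probability that a random positive is
    scored above a random negative (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def select_model(candidates: dict[str, list[float]], labels: list[int]) -> tuple[str, float]:
    """Return (name, AuROC) of the candidate with the highest validation AuROC."""
    metrics = {name: auroc(labels, scores) for name, scores in candidates.items()}
    best = max(metrics, key=metrics.get)
    return best, metrics[best]
```

`select_model({"logistic_regression": [0.1, 0.4, 0.35, 0.8], "random_forest": [0.2, 0.3, 0.7, 0.9]}, [0, 0, 1, 1])` returns `("random_forest", 1.0)`.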
  • 43. Different Permutations of Thresholds Lead to Different Results
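A binary classifier's scores become predictions only after a threshold is chosen, and each threshold trades precision against recall. A small sketch; the function name is hypothetical, and the labels and scores in the usage line are made up for illustration.

```python
def precision_recall_at(labels: list[int], scores: list[float], threshold: float) -> tuple[float, float]:
    """Precision and recall when every score >= threshold is predicted positive.
    Undefined ratios (no predicted or no actual positives) are reported as 0.0."""
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

With labels `[0, 0, 1, 1, 1]` and scores `[0.2, 0.6, 0.4, 0.7, 0.9]`, a threshold of 0.3 gives precision 0.75 and recall 1.0, while a threshold of 0.8 gives precision 1.0 and recall of about 0.33.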
  • 45. How well does it work? • TransmogrifAI powers: • Predictive Journeys • Lead Scoring • Prediction Builder • Case Classification • Most of the models deployed in production are completely hands-free • Serves 5B+ predictions per day
  • 46. Where do WE go next? • Deeper model & score insights – LOCO, LIME • Hyperparameter search strategies – Bayesian, Bandit-based • Feature engineering – text embeddings, model specific • Model portability • Enable more applications – recommenders, unsupervised learning • Perf tuning, bug fixes, docs, examples • <Your requirements / feedback>
  • 47. Where do YOU go next? • Read the blog post - https://www.sfdc.co/open-sourcing-transmogrifai • Try it out - https://transmogrif.ai • Reach out and contribute - https://sfdc.co/transmogrifai-contributing • Student? Apply to Google Summer of Code (GSoC) 2019 to work with us! • Feeling creative? We need a logo.