SlideShare une entreprise Scribd logo
1  sur  23
Télécharger pour lire hors ligne
Python	
  as	
  part	
  of	
  a	
  produc0on	
  
machine	
  learning	
  stack	
  
	
  
	
  
	
  
Michael	
  Manapat	
  
@mlmanapat	
  
Stripe	
  
	
  
Outline	
  
	
  
-­‐Why	
  we	
  need	
  ML	
  at	
  Stripe	
  
-­‐Simple	
  models	
  with	
  sklearn	
  
-­‐Pipelines	
  with	
  Luigi	
  
-­‐Scoring	
  as	
  a	
  service	
  
	
  
Stripe	
  is	
  a	
  technology	
  company	
  
focusing	
  on	
  making	
  payments	
  easy	
  
	
  
-­‐Short	
  applica>on	
  
	
  
Tokeniza0on	
  
	
  
	
   Customer	
  
browser	
  
Stripe	
  
Stripe.js	
  
Token	
  
Merchant	
  
server	
  
Stripe	
  
API	
  call	
  
Result	
  
API	
  Call	
  
	
  
import stripe

stripe.Charge.create(

amount=400,

currency="usd",

card="tok_103xnl2gR5VxTSB”

email=customer@example.com"

)"
Fraud	
  /	
  business	
  viola0ons	
  
	
  
-­‐Terms	
  of	
  service	
  viola>ons	
  (weapons)	
  
-­‐Merchant	
  fraud	
  (card	
  “cashers”)	
  	
  	
  
-­‐Transac>on	
  fraud	
  
	
  
-­‐No	
  machine	
  learning	
  a	
  year	
  ago	
  
Fraud	
  /	
  business	
  viola0ons	
  
	
  
-­‐Terms	
  of	
  service	
  viola>ons	
  
	
  
E-­‐cigareMes,	
  drugs,	
  weapons,	
  etc.	
  
	
  
How	
  do	
  we	
  find	
  these	
  automa>cally?	
  
Merchant	
  sign	
  up	
  flow	
  
	
  
	
  
	
  
	
  
Applica>on	
  
submission	
  
Website	
  
scraped	
  
Text	
  scored	
  
Applica>on	
  
reviewed	
  
Merchant	
  sign	
  up	
  flow	
  
	
  
	
  
	
  
	
  
Applica>on	
  
submission	
  
Website	
  
scraped	
  
Text	
  scored	
  
Applica>on	
  
reviewed	
  
Machine	
  
learning	
  
pipeline	
  and	
  
service	
  
Building	
  a	
  classifier:	
  e-­‐cigareIes	
  
	
  
data = pandas.from_pickle(‘ecigs’)

data.head()



text violator

0 " please verify your age i am 21 years or older ... True

1 coming soon toggle me drag me with your mouse ... False

2 drink moscow mules cart 0 log in or create an ... False

3 vapors electronic cigarette buy now insuper st... True

4 t-shirts shorts hawaii about us silver coll... False



[5 rows x 2 columns]	
  
Features	
  for	
  text	
  classifica0on	
  
	
  
cv = CountVectorizer



features = 

cv.fit_transform(data['text'])



Sparse	
  matrix	
  of	
  word	
  counts	
  from	
  
input	
  text	
  (omiSng	
  feature	
  selec>on)	
  
Features	
  for	
  text	
  classifica0on	
  


X_train, X_test, y_train, y_test = 

train_test_split(

features, data['violator'], 

test_size=0.2)



-­‐Avoid	
  leakage	
  
-Other	
  cross-­‐valida>on	
  methods	
  
Training	
  


model = LogisticRegression()

model.fit(X_train, y_train)



Serializer	
  reads	
  from	
  


model.intercept_

model.coef_

	
  
Valida0on	
  


probs = model.predict_proba(X_test)



fpr, tpr, thresholds =

roc_curve(y_test, probs[:, 1])



matplotlib.pyplot(fpr, tpr)	
  
ROC:	
  Receiver	
  opera0ng	
  characteris0c	
  




	
  
Pipeline	
  
	
  
-­‐Fetch	
  website	
  snapshots	
  from	
  S3	
  
-­‐Fetch	
  classifica>ons	
  from	
  SQL/Impala	
  
-­‐Sani>ze	
  text	
  (strip	
  HTML)	
  
-­‐Run	
  feature	
  genera>on	
  and	
  selec>on	
  
-­‐Train	
  and	
  serialize	
  model	
  
-­‐Export	
  valida>on	
  sta>s>cs	
  
Luigi	
  
	
  
class GetSnapshots(luigi.Task):

def run(self):

" "...



class GenFeatures(luigi.Task):

def requires(self):

return GetSnapshots()"
Luigi	
  runs	
  tasks	
  on	
  Hadoop	
  cluster	
  
"
Scoring	
  as	
  a	
  service	
  
	
  
"Applica>on	
  
submission	
  
Website	
  
scraped	
  
Text	
  scored	
  
Applica>on	
  
reviewed	
  
ThriO	
  
RPC	
  
Scoring	
  
Service	
  
Scoring	
  as	
  a	
  service	
  
	
  
struct ScoringRequest {

1: string text

2: optional string model_name

}



struct ScoringResponse {

1: double score" " "// Experiments?

2: double request_duration

}"
Why	
  a	
  service?	
  
	
  
-­‐Same	
  code	
  base	
  for	
  training/scoring	
  
	
  
-­‐Reduced	
  duplica>on/easier	
  deploys	
  
	
  
-­‐Experimenta>on	
  
	
  
-­‐Log	
  requests	
  
	
  and	
  responses	
  
	
  (Parquet/Impala)	
  
	
  
-­‐Centralized	
  
	
  monitoring	
  
	
  (Graphite)	
  
Summary	
  
	
  
-­‐Simple	
  models	
  with	
  sklearn	
  
-­‐Pipelines	
  with	
  Luigi	
  
-­‐Scoring	
  as	
  a	
  service	
  
	
  
Thanks!	
  
@mlmanapat	
  
	
  

Contenu connexe

Tendances

Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
 
Less04 database instance
Less04 database instanceLess04 database instance
Less04 database instanceAmit Bhalla
 
Optimizing Your Cluster with Coordinator Nodes (Eric Lubow, SimpleReach) | Ca...
Optimizing Your Cluster with Coordinator Nodes (Eric Lubow, SimpleReach) | Ca...Optimizing Your Cluster with Coordinator Nodes (Eric Lubow, SimpleReach) | Ca...
Optimizing Your Cluster with Coordinator Nodes (Eric Lubow, SimpleReach) | Ca...DataStax
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internalsKostas Tzoumas
 
Getting up to Speed with MirrorMaker 2 (Mickael Maison, IBM & Ryanne Dolan) K...
Getting up to Speed with MirrorMaker 2 (Mickael Maison, IBM & Ryanne Dolan) K...Getting up to Speed with MirrorMaker 2 (Mickael Maison, IBM & Ryanne Dolan) K...
Getting up to Speed with MirrorMaker 2 (Mickael Maison, IBM & Ryanne Dolan) K...HostedbyConfluent
 
Seamless replication and disaster recovery for Apache Hive Warehouse
Seamless replication and disaster recovery for Apache Hive WarehouseSeamless replication and disaster recovery for Apache Hive Warehouse
Seamless replication and disaster recovery for Apache Hive WarehouseDataWorks Summit
 
Présentation Oracle DataBase 11g
Présentation Oracle DataBase 11gPrésentation Oracle DataBase 11g
Présentation Oracle DataBase 11gCynapsys It Hotspot
 
How a Developer can Troubleshoot a SQL performing poorly on a Production DB
How a Developer can Troubleshoot a SQL performing poorly on a Production DBHow a Developer can Troubleshoot a SQL performing poorly on a Production DB
How a Developer can Troubleshoot a SQL performing poorly on a Production DBCarlos Sierra
 
Evidence-Based Business Process Management
Evidence-Based Business Process ManagementEvidence-Based Business Process Management
Evidence-Based Business Process ManagementMarlon Dumas
 
Webinar slides: An Introduction to Performance Monitoring for PostgreSQL
Webinar slides: An Introduction to Performance Monitoring for PostgreSQLWebinar slides: An Introduction to Performance Monitoring for PostgreSQL
Webinar slides: An Introduction to Performance Monitoring for PostgreSQLSeveralnines
 
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberDemystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberFlink Forward
 
Your tuning arsenal: AWR, ADDM, ASH, Metrics and Advisors
Your tuning arsenal: AWR, ADDM, ASH, Metrics and AdvisorsYour tuning arsenal: AWR, ADDM, ASH, Metrics and Advisors
Your tuning arsenal: AWR, ADDM, ASH, Metrics and AdvisorsJohn Kanagaraj
 
Presentation oracle net services
Presentation    oracle net servicesPresentation    oracle net services
Presentation oracle net servicesxKinAnx
 
Oracle database performance tuning
Oracle database performance tuningOracle database performance tuning
Oracle database performance tuningYogiji Creations
 
Practical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsPractical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsFlink Forward
 

Tendances (20)

Introduction to influx db
Introduction to influx dbIntroduction to influx db
Introduction to influx db
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
Less04 database instance
Less04 database instanceLess04 database instance
Less04 database instance
 
Optimizing Your Cluster with Coordinator Nodes (Eric Lubow, SimpleReach) | Ca...
Optimizing Your Cluster with Coordinator Nodes (Eric Lubow, SimpleReach) | Ca...Optimizing Your Cluster with Coordinator Nodes (Eric Lubow, SimpleReach) | Ca...
Optimizing Your Cluster with Coordinator Nodes (Eric Lubow, SimpleReach) | Ca...
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
 
Airflow 101
Airflow 101Airflow 101
Airflow 101
 
Getting up to Speed with MirrorMaker 2 (Mickael Maison, IBM & Ryanne Dolan) K...
Getting up to Speed with MirrorMaker 2 (Mickael Maison, IBM & Ryanne Dolan) K...Getting up to Speed with MirrorMaker 2 (Mickael Maison, IBM & Ryanne Dolan) K...
Getting up to Speed with MirrorMaker 2 (Mickael Maison, IBM & Ryanne Dolan) K...
 
Seamless replication and disaster recovery for Apache Hive Warehouse
Seamless replication and disaster recovery for Apache Hive WarehouseSeamless replication and disaster recovery for Apache Hive Warehouse
Seamless replication and disaster recovery for Apache Hive Warehouse
 
Présentation Oracle DataBase 11g
Présentation Oracle DataBase 11gPrésentation Oracle DataBase 11g
Présentation Oracle DataBase 11g
 
Oracle GoldenGate
Oracle GoldenGate Oracle GoldenGate
Oracle GoldenGate
 
How a Developer can Troubleshoot a SQL performing poorly on a Production DB
How a Developer can Troubleshoot a SQL performing poorly on a Production DBHow a Developer can Troubleshoot a SQL performing poorly on a Production DB
How a Developer can Troubleshoot a SQL performing poorly on a Production DB
 
Evidence-Based Business Process Management
Evidence-Based Business Process ManagementEvidence-Based Business Process Management
Evidence-Based Business Process Management
 
Webinar slides: An Introduction to Performance Monitoring for PostgreSQL
Webinar slides: An Introduction to Performance Monitoring for PostgreSQLWebinar slides: An Introduction to Performance Monitoring for PostgreSQL
Webinar slides: An Introduction to Performance Monitoring for PostgreSQL
 
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberDemystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
 
Your tuning arsenal: AWR, ADDM, ASH, Metrics and Advisors
Your tuning arsenal: AWR, ADDM, ASH, Metrics and AdvisorsYour tuning arsenal: AWR, ADDM, ASH, Metrics and Advisors
Your tuning arsenal: AWR, ADDM, ASH, Metrics and Advisors
 
SQL Tuning 101
SQL Tuning 101SQL Tuning 101
SQL Tuning 101
 
Presentation oracle net services
Presentation    oracle net servicesPresentation    oracle net services
Presentation oracle net services
 
Apache Airflow overview
Apache Airflow overviewApache Airflow overview
Apache Airflow overview
 
Oracle database performance tuning
Oracle database performance tuningOracle database performance tuning
Oracle database performance tuning
 
Practical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsPractical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobs
 

En vedette

Machine learning in production with scikit-learn
Machine learning in production with scikit-learnMachine learning in production with scikit-learn
Machine learning in production with scikit-learnJeff Klukas
 
Production machine learning_infrastructure
Production machine learning_infrastructureProduction machine learning_infrastructure
Production machine learning_infrastructurejoshwills
 
Using PySpark to Process Boat Loads of Data
Using PySpark to Process Boat Loads of DataUsing PySpark to Process Boat Loads of Data
Using PySpark to Process Boat Loads of DataRobert Dempsey
 
Production and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning ModelsProduction and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning ModelsTuri, Inc.
 
Multi runtime serving pipelines for machine learning
Multi runtime serving pipelines for machine learningMulti runtime serving pipelines for machine learning
Multi runtime serving pipelines for machine learningStepan Pushkarev
 
Serverless machine learning operations
Serverless machine learning operationsServerless machine learning operations
Serverless machine learning operationsStepan Pushkarev
 
Machine learning in production
Machine learning in productionMachine learning in production
Machine learning in productionTuri, Inc.
 
Managing and Versioning Machine Learning Models in Python
Managing and Versioning Machine Learning Models in PythonManaging and Versioning Machine Learning Models in Python
Managing and Versioning Machine Learning Models in PythonSimon Frid
 
Square's Machine Learning Infrastructure and Applications - Rong Yan
Square's Machine Learning Infrastructure and Applications - Rong YanSquare's Machine Learning Infrastructure and Applications - Rong Yan
Square's Machine Learning Infrastructure and Applications - Rong YanHakka Labs
 
Building A Production-Level Machine Learning Pipeline
Building A Production-Level Machine Learning PipelineBuilding A Production-Level Machine Learning Pipeline
Building A Production-Level Machine Learning PipelineRobert Dempsey
 
PostgreSQL + Kafka: The Delight of Change Data Capture
PostgreSQL + Kafka: The Delight of Change Data CapturePostgreSQL + Kafka: The Delight of Change Data Capture
PostgreSQL + Kafka: The Delight of Change Data CaptureJeff Klukas
 
Machine Learning In Production
Machine Learning In ProductionMachine Learning In Production
Machine Learning In ProductionSamir Bessalah
 
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...Jose Quesada (hiring)
 
Machine Learning Pipelines
Machine Learning PipelinesMachine Learning Pipelines
Machine Learning Pipelinesjeykottalam
 
Spark and machine learning in microservices architecture
Spark and machine learning in microservices architectureSpark and machine learning in microservices architecture
Spark and machine learning in microservices architectureStepan Pushkarev
 
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017Carol Smith
 

En vedette (16)

Machine learning in production with scikit-learn
Machine learning in production with scikit-learnMachine learning in production with scikit-learn
Machine learning in production with scikit-learn
 
Production machine learning_infrastructure
Production machine learning_infrastructureProduction machine learning_infrastructure
Production machine learning_infrastructure
 
Using PySpark to Process Boat Loads of Data
Using PySpark to Process Boat Loads of DataUsing PySpark to Process Boat Loads of Data
Using PySpark to Process Boat Loads of Data
 
Production and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning ModelsProduction and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning Models
 
Multi runtime serving pipelines for machine learning
Multi runtime serving pipelines for machine learningMulti runtime serving pipelines for machine learning
Multi runtime serving pipelines for machine learning
 
Serverless machine learning operations
Serverless machine learning operationsServerless machine learning operations
Serverless machine learning operations
 
Machine learning in production
Machine learning in productionMachine learning in production
Machine learning in production
 
Managing and Versioning Machine Learning Models in Python
Managing and Versioning Machine Learning Models in PythonManaging and Versioning Machine Learning Models in Python
Managing and Versioning Machine Learning Models in Python
 
Square's Machine Learning Infrastructure and Applications - Rong Yan
Square's Machine Learning Infrastructure and Applications - Rong YanSquare's Machine Learning Infrastructure and Applications - Rong Yan
Square's Machine Learning Infrastructure and Applications - Rong Yan
 
Building A Production-Level Machine Learning Pipeline
Building A Production-Level Machine Learning PipelineBuilding A Production-Level Machine Learning Pipeline
Building A Production-Level Machine Learning Pipeline
 
PostgreSQL + Kafka: The Delight of Change Data Capture
PostgreSQL + Kafka: The Delight of Change Data CapturePostgreSQL + Kafka: The Delight of Change Data Capture
PostgreSQL + Kafka: The Delight of Change Data Capture
 
Machine Learning In Production
Machine Learning In ProductionMachine Learning In Production
Machine Learning In Production
 
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
 
Machine Learning Pipelines
Machine Learning PipelinesMachine Learning Pipelines
Machine Learning Pipelines
 
Spark and machine learning in microservices architecture
Spark and machine learning in microservices architectureSpark and machine learning in microservices architecture
Spark and machine learning in microservices architecture
 
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
 

Similaire à Python as part of a production machine learning stack by Michael Manapat PyData SV 2014

미움 받을 용기 : 저 팀은 뭘 안다고 추천한다고 들쑤시고 다니는건가
미움 받을 용기 : 저 팀은 뭘 안다고 추천한다고 들쑤시고 다니는건가미움 받을 용기 : 저 팀은 뭘 안다고 추천한다고 들쑤시고 다니는건가
미움 받을 용기 : 저 팀은 뭘 안다고 추천한다고 들쑤시고 다니는건가JaeCheolKim10
 
MLflow at Company Scale
MLflow at Company ScaleMLflow at Company Scale
MLflow at Company ScaleDatabricks
 
Vertical Recommendation Using Collaborative Filtering
Vertical Recommendation Using Collaborative FilteringVertical Recommendation Using Collaborative Filtering
Vertical Recommendation Using Collaborative Filteringgorass
 
Swift distributed tracing method and tools v2
Swift distributed tracing method and tools v2Swift distributed tracing method and tools v2
Swift distributed tracing method and tools v2zhang hua
 
Iswim for testing
Iswim for testingIswim for testing
Iswim for testingClarkTony
 
Iswim for testing
Iswim for testingIswim for testing
Iswim for testingClarkTony
 
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scalaAutomate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scalaChetan Khatri
 
Machine Learning with Microsoft Azure
Machine Learning with Microsoft AzureMachine Learning with Microsoft Azure
Machine Learning with Microsoft AzureDmitry Petukhov
 
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...Chetan Khatri
 
Introduction to AWS Step Functions:
Introduction to AWS Step Functions: Introduction to AWS Step Functions:
Introduction to AWS Step Functions: Amazon Web Services
 
Profiling Mondrian MDX Requests in a Production Environment
Profiling Mondrian MDX Requests in a Production EnvironmentProfiling Mondrian MDX Requests in a Production Environment
Profiling Mondrian MDX Requests in a Production EnvironmentRaimonds Simanovskis
 
A Practical Deep Dive into Observability of Streaming Applications with Kosta...
A Practical Deep Dive into Observability of Streaming Applications with Kosta...A Practical Deep Dive into Observability of Streaming Applications with Kosta...
A Practical Deep Dive into Observability of Streaming Applications with Kosta...HostedbyConfluent
 
stackconf 2023 | How to reduce expenses on monitoring with VictoriaMetrics by...
stackconf 2023 | How to reduce expenses on monitoring with VictoriaMetrics by...stackconf 2023 | How to reduce expenses on monitoring with VictoriaMetrics by...
stackconf 2023 | How to reduce expenses on monitoring with VictoriaMetrics by...NETWAYS
 
Becoming a SOC2 Ruby Shop - Montreal.rb November, 5, 2022 Ruby Meetup
Becoming a SOC2 Ruby Shop - Montreal.rb November, 5, 2022 Ruby MeetupBecoming a SOC2 Ruby Shop - Montreal.rb November, 5, 2022 Ruby Meetup
Becoming a SOC2 Ruby Shop - Montreal.rb November, 5, 2022 Ruby MeetupAndy Maleh
 
Building Machine Learning Pipelines
Building Machine Learning PipelinesBuilding Machine Learning Pipelines
Building Machine Learning PipelinesInMobi Technology
 
IRJET- Credit Card Fraud Detection : A Comparison using Random Forest, SVM an...
IRJET- Credit Card Fraud Detection : A Comparison using Random Forest, SVM an...IRJET- Credit Card Fraud Detection : A Comparison using Random Forest, SVM an...
IRJET- Credit Card Fraud Detection : A Comparison using Random Forest, SVM an...IRJET Journal
 
Machine learning techniques in fraud prevention
Machine learning techniques in fraud preventionMachine learning techniques in fraud prevention
Machine learning techniques in fraud preventionVolodymyr Syzonenko
 
Meet TransmogrifAI, Open Source AutoML That Powers Einstein Predictions
Meet TransmogrifAI, Open Source AutoML That Powers Einstein PredictionsMeet TransmogrifAI, Open Source AutoML That Powers Einstein Predictions
Meet TransmogrifAI, Open Source AutoML That Powers Einstein PredictionsMatthew Tovbin
 

Similaire à Python as part of a production machine learning stack by Michael Manapat PyData SV 2014 (20)

미움 받을 용기 : 저 팀은 뭘 안다고 추천한다고 들쑤시고 다니는건가
미움 받을 용기 : 저 팀은 뭘 안다고 추천한다고 들쑤시고 다니는건가미움 받을 용기 : 저 팀은 뭘 안다고 추천한다고 들쑤시고 다니는건가
미움 받을 용기 : 저 팀은 뭘 안다고 추천한다고 들쑤시고 다니는건가
 
MLflow at Company Scale
MLflow at Company ScaleMLflow at Company Scale
MLflow at Company Scale
 
Vertical Recommendation Using Collaborative Filtering
Vertical Recommendation Using Collaborative FilteringVertical Recommendation Using Collaborative Filtering
Vertical Recommendation Using Collaborative Filtering
 
Swift distributed tracing method and tools v2
Swift distributed tracing method and tools v2Swift distributed tracing method and tools v2
Swift distributed tracing method and tools v2
 
Iswim for testing
Iswim for testingIswim for testing
Iswim for testing
 
Iswim for testing
Iswim for testingIswim for testing
Iswim for testing
 
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scalaAutomate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
 
Machine Learning with Microsoft Azure
Machine Learning with Microsoft AzureMachine Learning with Microsoft Azure
Machine Learning with Microsoft Azure
 
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
 
Introduction to AWS Step Functions:
Introduction to AWS Step Functions: Introduction to AWS Step Functions:
Introduction to AWS Step Functions:
 
Profiling Mondrian MDX Requests in a Production Environment
Profiling Mondrian MDX Requests in a Production EnvironmentProfiling Mondrian MDX Requests in a Production Environment
Profiling Mondrian MDX Requests in a Production Environment
 
A Practical Deep Dive into Observability of Streaming Applications with Kosta...
A Practical Deep Dive into Observability of Streaming Applications with Kosta...A Practical Deep Dive into Observability of Streaming Applications with Kosta...
A Practical Deep Dive into Observability of Streaming Applications with Kosta...
 
Payments On Rails
Payments On RailsPayments On Rails
Payments On Rails
 
stackconf 2023 | How to reduce expenses on monitoring with VictoriaMetrics by...
stackconf 2023 | How to reduce expenses on monitoring with VictoriaMetrics by...stackconf 2023 | How to reduce expenses on monitoring with VictoriaMetrics by...
stackconf 2023 | How to reduce expenses on monitoring with VictoriaMetrics by...
 
MaheshCV_Yepme
MaheshCV_YepmeMaheshCV_Yepme
MaheshCV_Yepme
 
Becoming a SOC2 Ruby Shop - Montreal.rb November, 5, 2022 Ruby Meetup
Becoming a SOC2 Ruby Shop - Montreal.rb November, 5, 2022 Ruby MeetupBecoming a SOC2 Ruby Shop - Montreal.rb November, 5, 2022 Ruby Meetup
Becoming a SOC2 Ruby Shop - Montreal.rb November, 5, 2022 Ruby Meetup
 
Building Machine Learning Pipelines
Building Machine Learning PipelinesBuilding Machine Learning Pipelines
Building Machine Learning Pipelines
 
IRJET- Credit Card Fraud Detection : A Comparison using Random Forest, SVM an...
IRJET- Credit Card Fraud Detection : A Comparison using Random Forest, SVM an...IRJET- Credit Card Fraud Detection : A Comparison using Random Forest, SVM an...
IRJET- Credit Card Fraud Detection : A Comparison using Random Forest, SVM an...
 
Machine learning techniques in fraud prevention
Machine learning techniques in fraud preventionMachine learning techniques in fraud prevention
Machine learning techniques in fraud prevention
 
Meet TransmogrifAI, Open Source AutoML That Powers Einstein Predictions
Meet TransmogrifAI, Open Source AutoML That Powers Einstein PredictionsMeet TransmogrifAI, Open Source AutoML That Powers Einstein Predictions
Meet TransmogrifAI, Open Source AutoML That Powers Einstein Predictions
 

Plus de PyData

Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...PyData
 
Unit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif WalshUnit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif WalshPyData
 
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiThe TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiPyData
 
Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...PyData
 
Deploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerDeploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerPyData
 
Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam LermaGraph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam LermaPyData
 
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...PyData
 
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroRESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroPyData
 
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...PyData
 
Avoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven LottAvoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven LottPyData
 
Words in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroWords in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroPyData
 
End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...PyData
 
Pydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica PuertoPydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica PuertoPyData
 
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...PyData
 
Extending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will AydExtending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will AydPyData
 
Measuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen HooverMeasuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen HooverPyData
 
What's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper SeaboldWhat's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper SeaboldPyData
 
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...PyData
 
Solving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-WardSolving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-WardPyData
 
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...PyData
 

Plus de PyData (20)

Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
 
Unit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif WalshUnit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif Walsh
 
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiThe TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
 
Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...
 
Deploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerDeploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne Bauer
 
Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam LermaGraph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
 
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
 
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroRESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
 
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
 
Avoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven LottAvoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
 
Words in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroWords in Space - Rebecca Bilbro
Words in Space - Rebecca Bilbro
 
End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...
 
Pydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica PuertoPydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica Puerto
 
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
 
Extending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will AydExtending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will Ayd
 
Measuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen HooverMeasuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen Hoover
 
What's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper SeaboldWhat's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper Seabold
 
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
 
Solving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-WardSolving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-Ward
 
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
 

Dernier

Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 

Dernier (20)

Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 

Python as part of a production machine learning stack by Michael Manapat PyData SV 2014

  • 1. Python  as  part  of  a  produc0on   machine  learning  stack         Michael  Manapat   @mlmanapat   Stripe    
  • 2. Outline     -­‐Why  we  need  ML  at  Stripe   -­‐Simple  models  with  sklearn   -­‐Pipelines  with  Luigi   -­‐Scoring  as  a  service    
  • 3. Stripe  is  a  technology  company   focusing  on  making  payments  easy     -­‐Short  applica>on    
  • 4. Tokeniza0on       Customer   browser   Stripe   Stripe.js   Token   Merchant   server   Stripe   API  call   Result  
  • 5. API  Call     import stripe
 stripe.Charge.create(
 amount=400,
 currency="usd",
 card="tok_103xnl2gR5VxTSB”
 email=customer@example.com"
 )"
  • 6. Fraud  /  business  viola0ons     -­‐Terms  of  service  viola>ons  (weapons)   -­‐Merchant  fraud  (card  “cashers”)       -­‐Transac>on  fraud     -­‐No  machine  learning  a  year  ago  
  • 7. Fraud  /  business  viola0ons     -­‐Terms  of  service  viola>ons     E-­‐cigareMes,  drugs,  weapons,  etc.     How  do  we  find  these  automa>cally?  
  • 8. Merchant  sign  up  flow           Applica>on   submission   Website   scraped   Text  scored   Applica>on   reviewed  
  • 9. Merchant  sign  up  flow           Applica>on   submission   Website   scraped   Text  scored   Applica>on   reviewed   Machine   learning   pipeline  and   service  
  • 10. Building  a  classifier:  e-­‐cigareIes     data = pandas.from_pickle(‘ecigs’)
 data.head()
 
 text violator
 0 " please verify your age i am 21 years or older ... True
 1 coming soon toggle me drag me with your mouse ... False
 2 drink moscow mules cart 0 log in or create an ... False
 3 vapors electronic cigarette buy now insuper st... True
 4 t-shirts shorts hawaii about us silver coll... False
 
 [5 rows x 2 columns]  
  • 11. Features  for  text  classifica0on     cv = CountVectorizer
 
 features = 
 cv.fit_transform(data['text'])
 
 Sparse  matrix  of  word  counts  from   input  text  (omiSng  feature  selec>on)  
  • 12. Features  for  text  classifica0on   
 X_train, X_test, y_train, y_test = 
 train_test_split(
 features, data['violator'], 
 test_size=0.2)
 
 -­‐Avoid  leakage   -Other  cross-­‐valida>on  methods  
  • 13. Training   
 model = LogisticRegression()
 model.fit(X_train, y_train)
 
 Serializer  reads  from   
 model.intercept_
 model.coef_
  
  • 14. Valida0on   
 probs = model.predict_proba(X_test)
 
 fpr, tpr, thresholds =
 roc_curve(y_test, probs[:, 1])
 
 matplotlib.pyplot(fpr, tpr)  
  • 15. ROC:  Receiver  opera0ng  characteris0c   
 
  
  • 16. Pipeline     -­‐Fetch  website  snapshots  from  S3   -­‐Fetch  classifica>ons  from  SQL/Impala   -­‐Sani>ze  text  (strip  HTML)   -­‐Run  feature  genera>on  and  selec>on   -­‐Train  and  serialize  model   -­‐Export  valida>on  sta>s>cs  
  • 17. Luigi     class GetSnapshots(luigi.Task):
 def run(self):
 " "...
 
 class GenFeatures(luigi.Task):
 def requires(self):
 return GetSnapshots()"
  • 18. Luigi  runs  tasks  on  Hadoop  cluster   "
  • 19. Scoring  as  a  service     "Applica>on   submission   Website   scraped   Text  scored   Applica>on   reviewed   ThriO   RPC   Scoring   Service  
  • 20. Scoring  as  a  service     struct ScoringRequest {
 1: string text
 2: optional string model_name
 }
 
 struct ScoringResponse {
 1: double score" " "// Experiments?
 2: double request_duration
 }"
  • 21. Why  a  service?     -­‐Same  code  base  for  training/scoring     -­‐Reduced  duplica>on/easier  deploys     -­‐Experimenta>on    
  • 22. -­‐Log  requests    and  responses    (Parquet/Impala)     -­‐Centralized    monitoring    (Graphite)  
  • 23. Summary     -­‐Simple  models  with  sklearn   -­‐Pipelines  with  Luigi   -­‐Scoring  as  a  service     Thanks!   @mlmanapat