Hopsworks Feature Store 2.0, a new paradigm
Jim Dowling, Logical Clocks
2020-12-14
1st Global Feature Stores for ML Meetup
Growing Consensus on how to manage complexity of AI
[Diagram: ML platform components: Feature Store Online, Feature Store Offline, Feature Engineering, Connectors to External Data Sources, Distributed Training, HyperParameter Tuning, Model Serving, A/B Testing, Monitoring, Pipeline Management; Data, Model, Prediction, φ(x)]
Growing Consensus on how to manage complexity of AI
[Diagram: components grouped under the labels ENGINEER, ML PLATFORM (TRAIN and SERVE), and FEATURE STORE: Data validation, Distributed, Model Serving, A/B Testing, Monitoring, Pipeline Management, HyperParameter Tuning, Feature Engineering, Data Collection, Hardware Management; Data, Model, Prediction, φ(x)]
End-to-End ML Pipelines and the Feature Store
[Diagram: Data Lake, Warehouse, Kafka; Feature Engineering; Feature Store; Model Training; Model registry; Model Deploy; Model Serving. Edge labels: Features, Validate, Retrieve Feature Values]
End-to-End ML Pipelines and the Feature Store with CI/CD
[Diagram: Code and configuration; Experiments/Development; Data Lake, Warehouse, Kafka; Feature Engineering; Feature Store; Model Training; Model registry; Model Deploy; Model Serving; Model Monitoring. Edge labels: Features, Validate, Retrieve Feature Values, Log Predictions, Retrieve Feature Statistics for Data Drift Detection]
End-to-End ML Pipelines and the Feature Store with CI/CD and Provenance
[Diagram: Code and configuration; Experiments/Development; Data Lake, Warehouse, Kafka; Feature Engineering; Feature Store; Model Training; Model registry; Model Deploy; Model Serving; Model Monitoring; Scaleout Metadata; Elasticsearch. Edge labels: Features, Validate, Retrieve Feature Values, Log Predictions, Retrieve Feature Statistics for Data Drift Detection, Sync]
Hopsworks Feature Store Concepts: Features, Feature Groups, and Training Datasets
[Diagram: Feature Groups and their Features. Titanic Passenger List: name, Pclass, Sex, Survive. Passenger Bank Account: Name, Balance]
Hopsworks Feature Store Concepts: Features, Feature Groups, and Training Datasets
[Diagram: Feature Groups and their Features. Titanic Passenger List: name, Pclass, Sex, Survive. Passenger Bank Account: Name, Balance. A Join of the two Feature Groups produces a Training Dataset with the features Survive, name, PClass, Sex, Balance]
Hopsworks Feature Store Concepts: Features, Feature Groups, and Training Datasets
[Diagram: Feature Groups and their Features. Titanic Passenger List: name, Pclass, Sex, Survive. Passenger Bank Account: Name, Balance. A Join of the two Feature Groups produces a Training Dataset with the features Survive, name, PClass, Sex, Balance]
Training Dataset file format: .tfrecord, .npy, .csv, .hdf5, .petastorm, etc.
Training Dataset storage: Azure, S3, HopsFS
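The join on this slide can be sketched in plain Python. This is only an illustration of the concept, not the Hopsworks API: the helper names `join_feature_groups` and `select_features` and all row values are hypothetical, and the feature names are taken from the slide.

```python
# Hypothetical rows; feature names follow the slide, values are invented.
passenger_list_fg = [
    {"name": "Alice", "pclass": 1, "sex": "female", "survive": 1},
    {"name": "Bob", "pclass": 3, "sex": "male", "survive": 0},
    {"name": "Carol", "pclass": 2, "sex": "female", "survive": 1},
]
bank_account_fg = [
    {"name": "Alice", "balance": 1200.0},
    {"name": "Bob", "balance": 35.5},
]


def join_feature_groups(left, right, key):
    """Inner-join two lists of feature rows on a shared key."""
    right_by_key = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            joined.append({**row, **match})
    return joined


def select_features(rows, features):
    """Keep only the features that should end up in the training dataset."""
    return [{f: row[f] for f in features} for row in rows]


training_dataset = select_features(
    join_feature_groups(passenger_list_fg, bank_account_fg, "name"),
    ["survive", "name", "pclass", "sex", "balance"],
)
for row in training_dataset:
    print(row)
```

Carol has no bank account row, so an inner join drops her. Whether a real training dataset should use an inner or a left join depends on how missing features are meant to be handled.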
Features are created/updated at different cadences
Click features every 10 secs
CDC data every 30 secs
User profile updates every hour
Featurized weblogs data every day
[Diagram: sources (Event Data, Real-Time Data, User-Entered Features (<2 secs), SQL DW, S3, HDFS, SQL) feed the Feature Store. The Online Feature Store serves Low Latency Features (<10ms) to the Online App; the Offline Feature Store serves High Latency Features (TBs/PBs) to Train and Batch App]
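One way to picture the online/offline split is a store that writes every update twice: the online side keeps only the latest value per key for fast lookups, and the offline side keeps the full history for training. The class below is a minimal sketch under that assumption; `DualFeatureStore` and its methods are hypothetical names, not part of Hopsworks.

```python
class DualFeatureStore:
    """Toy dual store: latest values online, full history offline."""

    def __init__(self):
        self.online = {}   # primary key -> (event_time, latest feature row)
        self.offline = []  # append-only history of (event_time, key, features)

    def ingest(self, key, event_time, features):
        # Offline: always append, so training can see every historical value.
        self.offline.append((event_time, key, dict(features)))
        # Online: overwrite only if this update is at least as new as the stored one.
        current = self.online.get(key)
        if current is None or event_time >= current[0]:
            self.online[key] = (event_time, dict(features))

    def get_online(self, key):
        entry = self.online.get(key)
        return None if entry is None else entry[1]

    def read_offline(self, key):
        history = sorted(self.offline, key=lambda record: record[0])
        return [features for (_, k, features) in history if k == key]


store = DualFeatureStore()
store.ingest("user_1", 10, {"clicks": 3})
store.ingest("user_1", 20, {"clicks": 5})
store.ingest("user_1", 15, {"clicks": 4})  # late-arriving update

print(store.get_online("user_1"))
print(store.read_offline("user_1"))
```

The late-arriving update lands in the offline history but does not overwrite the newer online value, which is one possible policy among several.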
FeatureGroup Ingestion in Hopsworks
[Diagram: sources (User Clicks, DB Updates, User Profile Updates, Weblogs; Event Data, SQL DW, S3, HDFS, SQL) are written through the DataFrameAPI into the Feature Store as ClickFeatureGroup, TableFeatureGroup, UserFeatureGroup, and LogsFeatureGroup. Kafka Input passes through "Hof: Real-time feature Engineering" to Kafka Output and RTFeatureGroup. Consumers: Online App; Train, Batch App]
Hopsworks Feature Store V1 API
First Feature Store with a General Purpose DataFrame API
Feature Store is a cache for materialized features, not a library.
Online and Offline Feature Stores to support low latency and scale, respectively
Reuse of Features means JOINS – Spark as a join engine
Hopsworks Feature Store V2 API
Enforce feature-group scope and schema+data versioning as best practice
Better support for multiple feature stores - join features from development and production feature stores
Better support for complex joins of features
First class API support for time-travel
Support any Python or Spark client with a single library
Example Ingestion of data into a FeatureGroup
https://docs.hopsworks.ai/
df = spark.read.json("s3://dataset/rain.json")
# do feature engineering on your dataframe
# (min_val and max_val stand for the precomputed minimum and maximum of df.val)
df = df.withColumn('precipitation', (df.val - min_val) / (max_val - min_val))
fg = fs.create_feature_group("rain",
    version=1,
    description="Rain features",
    primary_key=['date', 'location_id'],
    online_enabled=True)
fg.save(df)
fg.add_tag(name="ingestion", value="Databricks:jim; Pii;notebook.ipynb")
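The feature engineering step in the ingestion example is a min-max normalization. A minimal standalone sketch of that formula, using plain Python lists instead of a Spark DataFrame (the function name `min_max_scale` and the sample values are hypothetical):

```python
def min_max_scale(values):
    """Rescale values to [0, 1] with (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal; avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


raw_rain_mm = [0.0, 5.0, 20.0, 10.0]
precipitation = min_max_scale(raw_rain_mm)
print(precipitation)  # [0.0, 0.25, 1.0, 0.5]
```

In a real pipeline the minimum and maximum would normally be computed once on the training data and reused at serving time, so that both sides apply the same scaling.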
Example Creation of Train/Test Data from a Feature Store
https://docs.hopsworks.ai/
# Join features across FeatureGroups. Use "on=[..]" to explicitly enter the JOIN key.
feature_join = (rain_fg.select_all()
    .join(temperature_fg.select_all(), on=["date", "location_id"])
    .join(location_fg.select_all()))
sc = fs.get_storage_connector("myBucket", "S3")
td = fs.create_training_dataset("training_dataset", version=1,
    storage_connector=sc,
    data_format="tfrecords",
    description="Training dataset, TfRecords format",
    splits={'train': 0.7, 'test': 0.2, 'validate': 0.1})
td.save(feature_join)
# When training a model, read the training data (use "test" to read test data):
ds = td.read(split="train")
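The `splits` argument assigns a fraction of the rows to each named split. How Hopsworks performs the split internally is not shown on the slide; the sketch below is one plausible approach (seeded shuffle, then contiguous slices), with `split_rows` as a hypothetical helper.

```python
import random


def split_rows(rows, splits, seed=42):
    """Shuffle rows deterministically and cut them into named splits."""
    if abs(sum(splits.values()) - 1.0) > 1e-9:
        raise ValueError("split fractions must sum to 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)

    result = {}
    names = list(splits)
    start = 0
    for i, name in enumerate(names):
        if i == len(names) - 1:
            end = len(shuffled)  # last split takes whatever remains
        else:
            end = start + round(splits[name] * len(shuffled))
        result[name] = shuffled[start:end]
        start = end
    return result


rows = list(range(10))
splits = {"train": 0.7, "test": 0.2, "validate": 0.1}
parts = split_rows(rows, splits)
print({name: len(part) for name, part in parts.items()})
```

Fixing the seed makes the split reproducible, which matters when a training dataset is meant to be a versioned, immutable artifact.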
FeatureGroup Time-Travel
https://docs.hopsworks.ai/
fg.insert(upsert_df)
fg.commit_details()
fg = fs.get_feature_group("rain", 1)
fg.read("2020-12-15 09:00:01").show()
fg.read_changes("2020-12-14 09:00:01",
    "2020-12-15 09:00:01").show()
[Diagram: Feature Group (v1) as a commit log: Commit 1 (Timestamp 1), Commit 2 (Timestamp 2), ..., Commit n (Timestamp n)]
FeatureGroup Time-Travel
https://docs.hopsworks.ai/
fg.insert(upsert_df)
fg.commit_details()
fg = fs.get_feature_group("rain", 1)
fg.read("2020-12-15 09:00:01").show()
fg.read_changes("2020-12-14 09:00:01",
    "2020-12-15 09:00:01").show()
[Diagram: Feature Group (v1) as a commit log, annotated "show log": Commit 1 (Timestamp 1), Commit 2 (Timestamp 2), ..., Commit n (Timestamp n)]
FeatureGroup Schema Versioning
https://docs.hopsworks.ai/
fg.insert(upsert_df)
fg.commit_details()
fg = fs.get_feature_group("rain", 1)
fg.read("2020-12-15 09:00:01").show()
fg.read_changes("2020-12-14 09:00:01",
    "2020-12-15 09:00:01").show()
[Diagram: Feature Group (v1) as a commit log: Commit 1 (Timestamp 1), Commit 2 (Timestamp 2), ..., Commit n (Timestamp n); alongside it Feature Group (v2), with the annotation "latest commit of schema (v1)"]
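Schema versioning can be pictured as a registry keyed by (name, version), where each version has its own schema and its own commit log. The sketch below assumes that a schema change creates a new version and leaves the old one readable; `FeatureGroupRegistry` and the `wind_speed` feature are hypothetical and do not reflect how Hopsworks stores versions internally.

```python
class FeatureGroupRegistry:
    """Toy registry: one schema and one commit log per (name, version)."""

    def __init__(self):
        self._groups = {}

    def create(self, name, version, schema):
        key = (name, version)
        if key in self._groups:
            raise ValueError(f"{name} v{version} already exists")
        self._groups[key] = {"schema": tuple(schema), "commits": []}

    def insert(self, name, version, row):
        group = self._groups[(name, version)]
        if set(row) != set(group["schema"]):
            raise ValueError("row does not match the schema of this version")
        group["commits"].append(dict(row))

    def commits(self, name, version):
        return list(self._groups[(name, version)]["commits"])

    def latest_version(self, name):
        return max(v for (n, v) in self._groups if n == name)


registry = FeatureGroupRegistry()
registry.create("rain", 1, ["date", "location_id", "precipitation"])
registry.insert("rain", 1, {"date": "2020-12-14", "location_id": 1, "precipitation": 0.4})

# A schema change (a new feature) goes into version 2; version 1 is untouched.
registry.create("rain", 2, ["date", "location_id", "precipitation", "wind_speed"])
registry.insert("rain", 2, {"date": "2020-12-15", "location_id": 1,
                            "precipitation": 0.1, "wind_speed": 7.5})

print(registry.latest_version("rain"))
print(registry.commits("rain", 1))
```

Keeping versions side by side lets models trained on version 1 keep reading it while new pipelines move to version 2.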
FeatureGroup Time-Travel
https://docs.hopsworks.ai/
fg.insert(upsert_df)
fg.commit_details()
fg = fs.get_feature_group("rain", 1)
fg.read("2020-12-15 09:00:01").show()
fg.read_changes("2020-12-14 09:00:01",
    "2020-12-15 09:00:01").show()
[Diagram: Feature Group (v1) as a commit log: Commit 1 (Timestamp 1), Commit 2 (Timestamp 2), ..., Commit n-1, Commit n (Timestamp n), with the point in time 2020-12-15 09:00:01 marked]
FeatureGroup Time-Travel
https://docs.hopsworks.ai/
fg.insert(upsert_df)
fg.commit_details()
fg = fs.get_feature_group("rain", 1)
fg.read("2020-12-15 09:00:01").show()
fg.read_changes("2020-12-14 09:00:01",
    "2020-12-15 09:00:01").show()
[Diagram: Feature Group (v1) as a commit log: Commit 1 (Timestamp 1), Commit 2 (2020-12-14 09:00:01), ..., Commit n-1, Commit n (Timestamp n), with the point in time 2020-12-15 09:00:01 marked]
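The two time-travel reads can be sketched over an in-memory commit log: a read as of a timestamp replays all commits up to that time, and a read of changes returns only what was upserted between two timestamps. `CommitLog` is a hypothetical class, and the boundary handling (start exclusive, end inclusive) is an assumption, not documented Hopsworks behavior.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


class CommitLog:
    """Toy commit log for one feature group version."""

    def __init__(self):
        self.commits = []  # (commit_time, {primary_key: value}), in time order

    def insert(self, commit_time, upserts):
        ts = datetime.strptime(commit_time, FMT)
        if self.commits and ts <= self.commits[-1][0]:
            raise ValueError("commits must be strictly increasing in time")
        self.commits.append((ts, dict(upserts)))

    def read(self, as_of):
        """State of the feature group as of the given timestamp."""
        ts = datetime.strptime(as_of, FMT)
        state = {}
        for commit_time, upserts in self.commits:
            if commit_time > ts:
                break
            state.update(upserts)  # later commits overwrite earlier values
        return state

    def read_changes(self, start, end):
        """Rows upserted after `start` and up to and including `end`."""
        t0 = datetime.strptime(start, FMT)
        t1 = datetime.strptime(end, FMT)
        changes = {}
        for commit_time, upserts in self.commits:
            if t0 < commit_time <= t1:
                changes.update(upserts)
        return changes


log = CommitLog()
log.insert("2020-12-13 09:00:00", {("2020-12-13", 1): 0.2})
log.insert("2020-12-14 12:00:00", {("2020-12-14", 1): 0.5})
log.insert("2020-12-15 10:00:00", {("2020-12-13", 1): 0.3})  # later correction

snapshot = log.read("2020-12-15 09:00:01")
changes = log.read_changes("2020-12-14 09:00:01", "2020-12-15 09:00:01")
print(snapshot)
print(changes)
```

The snapshot does not include the correction committed at 10:00:00 on 2020-12-15, which is what makes a training dataset built from an earlier point in time reproducible.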
Hopsworks Demo
github.com/logicalclocks/hopsworks - @logicalclocks - www.logicalclocks.com
