Soumettre la recherche
Mettre en ligne
Distributed Heterogeneous Mixture Learning On Spark
•
2 j'aime
•
866 vues
Spark Summit
Suivre
Spark Summit 2016 talk by Masato Asahara (NEC) and Ryohei Fujimaki (NEC)
Lire moins
Lire la suite
Données & analyses
Signaler
Partager
Signaler
Partager
1 sur 51
Télécharger maintenant
Télécharger pour lire hors ligne
Recommandé
Distributed Heterogeneous Mixture Learning On Spark
Distributed Heterogeneous Mixture Learning On Spark
Spark Summit
Spark and Couchbase: Augmenting the Operational Database with Spark
Spark and Couchbase: Augmenting the Operational Database with Spark
Spark Summit
Enterprise Scale Topological Data Analysis Using Spark
Enterprise Scale Topological Data Analysis Using Spark
Alpine Data
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Jen Aman
Deep Learning with DL4J on Apache Spark: Yeah it’s Cool, but are You Doing it...
Deep Learning with DL4J on Apache Spark: Yeah it’s Cool, but are You Doing it...
Databricks
Harnessing Spark Catalyst for Custom Data Payloads
Harnessing Spark Catalyst for Custom Data Payloads
Simeon Fitch
Spark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit
End-to-End Data Pipelines with Apache Spark
End-to-End Data Pipelines with Apache Spark
Burak Yavuz
Recommandé
Distributed Heterogeneous Mixture Learning On Spark
Distributed Heterogeneous Mixture Learning On Spark
Spark Summit
Spark and Couchbase: Augmenting the Operational Database with Spark
Spark and Couchbase: Augmenting the Operational Database with Spark
Spark Summit
Enterprise Scale Topological Data Analysis Using Spark
Enterprise Scale Topological Data Analysis Using Spark
Alpine Data
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Jen Aman
Deep Learning with DL4J on Apache Spark: Yeah it’s Cool, but are You Doing it...
Deep Learning with DL4J on Apache Spark: Yeah it’s Cool, but are You Doing it...
Databricks
Harnessing Spark Catalyst for Custom Data Payloads
Harnessing Spark Catalyst for Custom Data Payloads
Simeon Fitch
Spark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit
End-to-End Data Pipelines with Apache Spark
End-to-End Data Pipelines with Apache Spark
Burak Yavuz
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
Databricks
Taming the Search: A Practical Way of Enforcing GDPR and CCPA in Very Large D...
Taming the Search: A Practical Way of Enforcing GDPR and CCPA in Very Large D...
Databricks
Spark Summit EU talk by Brij Bhushan Ravat
Spark Summit EU talk by Brij Bhushan Ravat
Spark Summit
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Cloudera, Inc.
Downscaling: The Achilles heel of Autoscaling Apache Spark Clusters
Downscaling: The Achilles heel of Autoscaling Apache Spark Clusters
Databricks
Big Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source Toolkits
DataWorks Summit
Machine learning at scale challenges and solutions
Machine learning at scale challenges and solutions
Stavros Kontopoulos
Fully Automated QA System For Large Scale Search And Recommendation Engines U...
Fully Automated QA System For Large Scale Search And Recommendation Engines U...
Spark Summit
Accelerate Your Apache Spark with Intel Optane DC Persistent Memory
Accelerate Your Apache Spark with Intel Optane DC Persistent Memory
Databricks
Geospatial Analytics at Scale with Deep Learning and Apache Spark
Geospatial Analytics at Scale with Deep Learning and Apache Spark
Databricks
Cooperative Task Execution for Apache Spark
Cooperative Task Execution for Apache Spark
Databricks
Practical Distributed Machine Learning Pipelines on Hadoop
Practical Distributed Machine Learning Pipelines on Hadoop
DataWorks Summit
Big Data Heterogeneous Mixture Learning on Spark
Big Data Heterogeneous Mixture Learning on Spark
DataWorks Summit/Hadoop Summit
Spark summit 2019 infrastructure for deep learning in apache spark 0425
Spark summit 2019 infrastructure for deep learning in apache spark 0425
Wee Hyong Tok
Scaling Data and ML with Apache Spark and Feast
Scaling Data and ML with Apache Spark and Feast
Databricks
Handling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Handling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Spark Summit
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Jen Aman
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Spark Summit
Hyperspace for Delta Lake
Hyperspace for Delta Lake
Databricks
Producing Spark on YARN for ETL
Producing Spark on YARN for ETL
DataWorks Summit/Hadoop Summit
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
Databricks
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
Databricks
Contenu connexe
Tendances
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
Databricks
Taming the Search: A Practical Way of Enforcing GDPR and CCPA in Very Large D...
Taming the Search: A Practical Way of Enforcing GDPR and CCPA in Very Large D...
Databricks
Spark Summit EU talk by Brij Bhushan Ravat
Spark Summit EU talk by Brij Bhushan Ravat
Spark Summit
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Cloudera, Inc.
Downscaling: The Achilles heel of Autoscaling Apache Spark Clusters
Downscaling: The Achilles heel of Autoscaling Apache Spark Clusters
Databricks
Big Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source Toolkits
DataWorks Summit
Machine learning at scale challenges and solutions
Machine learning at scale challenges and solutions
Stavros Kontopoulos
Fully Automated QA System For Large Scale Search And Recommendation Engines U...
Fully Automated QA System For Large Scale Search And Recommendation Engines U...
Spark Summit
Accelerate Your Apache Spark with Intel Optane DC Persistent Memory
Accelerate Your Apache Spark with Intel Optane DC Persistent Memory
Databricks
Geospatial Analytics at Scale with Deep Learning and Apache Spark
Geospatial Analytics at Scale with Deep Learning and Apache Spark
Databricks
Cooperative Task Execution for Apache Spark
Cooperative Task Execution for Apache Spark
Databricks
Practical Distributed Machine Learning Pipelines on Hadoop
Practical Distributed Machine Learning Pipelines on Hadoop
DataWorks Summit
Big Data Heterogeneous Mixture Learning on Spark
Big Data Heterogeneous Mixture Learning on Spark
DataWorks Summit/Hadoop Summit
Spark summit 2019 infrastructure for deep learning in apache spark 0425
Spark summit 2019 infrastructure for deep learning in apache spark 0425
Wee Hyong Tok
Scaling Data and ML with Apache Spark and Feast
Scaling Data and ML with Apache Spark and Feast
Databricks
Handling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Handling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Spark Summit
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Jen Aman
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Spark Summit
Hyperspace for Delta Lake
Hyperspace for Delta Lake
Databricks
Producing Spark on YARN for ETL
Producing Spark on YARN for ETL
DataWorks Summit/Hadoop Summit
Tendances
(20)
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
Taming the Search: A Practical Way of Enforcing GDPR and CCPA in Very Large D...
Taming the Search: A Practical Way of Enforcing GDPR and CCPA in Very Large D...
Spark Summit EU talk by Brij Bhushan Ravat
Spark Summit EU talk by Brij Bhushan Ravat
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Downscaling: The Achilles heel of Autoscaling Apache Spark Clusters
Downscaling: The Achilles heel of Autoscaling Apache Spark Clusters
Big Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source Toolkits
Machine learning at scale challenges and solutions
Machine learning at scale challenges and solutions
Fully Automated QA System For Large Scale Search And Recommendation Engines U...
Fully Automated QA System For Large Scale Search And Recommendation Engines U...
Accelerate Your Apache Spark with Intel Optane DC Persistent Memory
Accelerate Your Apache Spark with Intel Optane DC Persistent Memory
Geospatial Analytics at Scale with Deep Learning and Apache Spark
Geospatial Analytics at Scale with Deep Learning and Apache Spark
Cooperative Task Execution for Apache Spark
Cooperative Task Execution for Apache Spark
Practical Distributed Machine Learning Pipelines on Hadoop
Practical Distributed Machine Learning Pipelines on Hadoop
Big Data Heterogeneous Mixture Learning on Spark
Big Data Heterogeneous Mixture Learning on Spark
Spark summit 2019 infrastructure for deep learning in apache spark 0425
Spark summit 2019 infrastructure for deep learning in apache spark 0425
Scaling Data and ML with Apache Spark and Feast
Scaling Data and ML with Apache Spark and Feast
Handling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Handling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Hyperspace for Delta Lake
Hyperspace for Delta Lake
Producing Spark on YARN for ETL
Producing Spark on YARN for ETL
Similaire à Distributed Heterogeneous Mixture Learning On Spark
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
Databricks
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
Databricks
Quick! Quick! Exploration!: A framework for searching a predictive model on A...
Quick! Quick! Exploration!: A framework for searching a predictive model on A...
DataWorks Summit
Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”
Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”
Databricks
The Sierra Supercomputer: Science and Technology on a Mission
The Sierra Supercomputer: Science and Technology on a Mission
inside-BigData.com
Performance Characterization and Optimization of In-Memory Data Analytics on ...
Performance Characterization and Optimization of In-Memory Data Analytics on ...
Ahsan Javed Awan
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...
Srivatsan Ramanujam
OpenACC Monthly Highlights: October2020
OpenACC Monthly Highlights: October2020
OpenACC
Multi Processor Architecture for image processing
Multi Processor Architecture for image processing
ideas2ignite
The Fast Path to Building Operational Applications with Spark
The Fast Path to Building Operational Applications with Spark
SingleStore
Applying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System Integrations
inside-BigData.com
OpenACC and Open Hackathons Monthly Highlights May 2023.pdf
OpenACC and Open Hackathons Monthly Highlights May 2023.pdf
OpenACC
HPC in higher education
HPC in higher education
Kishor Satpathy
OpenACC Monthly Highlights Summer 2019
OpenACC Monthly Highlights Summer 2019
OpenACC
OpenACC Monthly Highlights: September 2021
OpenACC Monthly Highlights: September 2021
OpenACC
Big Data Analytics With MATLAB
Big Data Analytics With MATLAB
CodeOps Technologies LLP
TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform
Seldon
“Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Present...
“Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Present...
Edge AI and Vision Alliance
electronics-11-03883.pdf
electronics-11-03883.pdf
RioCarthiis
Optimize Single Particle Orbital (SPO) Evaluations Based on B-splines
Optimize Single Particle Orbital (SPO) Evaluations Based on B-splines
Intel® Software
Similaire à Distributed Heterogeneous Mixture Learning On Spark
(20)
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
Quick! Quick! Exploration!: A framework for searching a predictive model on A...
Quick! Quick! Exploration!: A framework for searching a predictive model on A...
Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”
Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”
The Sierra Supercomputer: Science and Technology on a Mission
The Sierra Supercomputer: Science and Technology on a Mission
Performance Characterization and Optimization of In-Memory Data Analytics on ...
Performance Characterization and Optimization of In-Memory Data Analytics on ...
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...
OpenACC Monthly Highlights: October2020
OpenACC Monthly Highlights: October2020
Multi Processor Architecture for image processing
Multi Processor Architecture for image processing
The Fast Path to Building Operational Applications with Spark
The Fast Path to Building Operational Applications with Spark
Applying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System Integrations
OpenACC and Open Hackathons Monthly Highlights May 2023.pdf
OpenACC and Open Hackathons Monthly Highlights May 2023.pdf
HPC in higher education
HPC in higher education
OpenACC Monthly Highlights Summer 2019
OpenACC Monthly Highlights Summer 2019
OpenACC Monthly Highlights: September 2021
OpenACC Monthly Highlights: September 2021
Big Data Analytics With MATLAB
Big Data Analytics With MATLAB
TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform
“Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Present...
“Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Present...
electronics-11-03883.pdf
electronics-11-03883.pdf
Optimize Single Particle Orbital (SPO) Evaluations Based on B-splines
Optimize Single Particle Orbital (SPO) Evaluations Based on B-splines
Plus de Spark Summit
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
Spark Summit
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
Spark Summit
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Spark Summit
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Spark Summit
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
Spark Summit
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
Spark Summit
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
Spark Summit
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
Spark Summit
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
Spark Summit
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
Spark Summit
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
Spark Summit
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Spark Summit
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Spark Summit
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
Spark Summit
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spark Summit
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
Spark Summit
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Spark Summit
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Spark Summit
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Spark Summit
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
Spark Summit
Plus de Spark Summit
(20)
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
Dernier
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
Seán Kennedy
Learn How Data Science Changes Our World
Learn How Data Science Changes Our World
Eduminds Learning
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
TecnoIncentive
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Boston Institute of Analytics
INTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processing
socarem879
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
Dr Arash Najmaei ( Phd., MBA, BSc)
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
KarteekMane1
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
aleedritatuxx
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
VICTOR MAESTRE RAMIREZ
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
Boston Institute of Analytics
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
deepakthakur548787
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
Cathrine Wilhelmsen
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
Jeremy Anderson
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Boston Institute of Analytics
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Boston Institute of Analytics
Principles and Practices of Data Visualization
Principles and Practices of Data Visualization
KianJazayeri1
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
17djon017
convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdf
SubhamKumar3239
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Boston Institute of Analytics
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptx
HimangsuNath
Dernier
(20)
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
Learn How Data Science Changes Our World
Learn How Data Science Changes Our World
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
INTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processing
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Principles and Practices of Data Visualization
Principles and Practices of Data Visualization
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdf
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptx
Distributed Heterogeneous Mixture Learning On Spark
1.
Distributed Heterogeneous Mixture Learning
on Spark Masato Asahara and Ryohei Fujimaki NEC Data Science Research Labs. Jun/08/2016 @Spark Summit 2016
2.
2 © NEC
Corporation 2016 Agenda
3.
3 © NEC
Corporation 2016 Agenda HML
4.
4 © NEC
Corporation 2016 Agenda CPU 12 SEP 11 SEP 10 SEP HML
5.
5 © NEC
Corporation 2016 Agenda HML
6.
NEC’s Predictive Analytics
and Heterogeneous Mixture Learning
7.
7 © NEC
Corporation 2016 Enterprise Applications of HML Driver Risk Assessment Inventory Optimization Churn Retention Predictive Maintenance Product Price Optimization Sales Optimization Energy/Water Operation Mgmt.
8.
8 © NEC
Corporation 2016 NEC’s Heterogeneous Mixture Learning (HML) NEC’s machine learning that automatically derives “accurate” and “transparent” formulas behind Big Data. Sun Sat Y=a1x1+ ・・・ +anxn Y=a1x1+ ・・・+knx3 Y=b1x2+ ・・・+hnxn Y=c1x4+ ・・・+fnxn Mon Sun Explore Massive Formulas Transparent data segmentation and predictive formulas Heterogeneous mixture data
9.
9 © NEC
Corporation 2016 The HML Model Gender Height tallshort age smoke food weight sports sports tea/coffee beer/wine The health risk of a tall man can be predicted by age, alcohol and smoke!
10.
10 © NEC
Corporation 2016 HML (Heterogeneous Mixture Learning)
11.
11 © NEC
Corporation 2016 HML (Heterogeneous Mixture Learning)
12.
12 © NEC
Corporation 2016 HML Algorithm Segment data by a rule-based tree Feature Selection Pruning Data Segmentation Training data
13.
13 © NEC
Corporation 2016 HML Algorithm Select features and fit predictive models for data segments Feature Selection Pruning Data Segmentation Training data
14.
14 © NEC
Corporation 2016 Enterprise Applications of HML Driver Risk Assessment Inventory Optimization Churn Retention Predictive Maintenance Energy/Water Operation Mgmt. Sales Optimization Product Price Optimization
15.
15 © NEC
Corporation 2016 Driver Risk Assessment Inventory Optimization Churn Retention Predictive Maintenanc e Energy/Water Operation Mgmt. Product Price Optimization Demand for Big Data Analysis 24(hour)×365(days)×3(year)×1000(shops) ~26,000,000 training samples Sales Optimization
16.
16 © NEC
Corporation 2016 Driver Risk Assessment Inventory Optimization Churn Retention Predictive Maintenance Energy/Water Demand Forecasting Sales Forecasting Product Price Optimization Demand for Big Data Analysis 5 million(customers)×12(months) =60,000,000 training samples
17.
17 © NEC
Corporation 2016 Driver Risk Assessment Inventory Optimization Churn Retention Predictive Maintenanc e Energy/Water Demand Forecasting Sales Forecasting Product Price Optimization Demand for Big Data Analysis
18.
Distributed HML on
Spark
19.
19 © NEC
Corporation 2016 Why Spark, not Hadoop? Because Spark’s in-memory architecture can run HML faster Feature Selection Pruning Data Segmentation CPU CPU CPU Data segmentation Feature selection ~ Pruning CPU CPU Data segmentation Feature selection
20.
20 © NEC
Corporation 2016 Data Scalability powered by Spark Treat unlimited scale of training data by adding executors executor executor executor CPU CPU CPU driver HDFS Add executors, and get more memory and CPU power Training data
21.
21 © NEC
Corporation 2016 Distributed HML Engine: Architecture HDFS Distributed HML Core (Scala) YARN invoke HML model Distributed HML Interface (Python) Matrix computation libraries (Breeze, BLAS) Data Scientist
22.
22 © NEC
Corporation 2016 3 Technical Key Points to Fast Run ML on Spark CPU 12 SEP 11 SEP 10 SEP
23.
23 © NEC
Corporation 2016 3 Technical Key Points to Fast Run ML on Spark CPU 12 SEP 11 SEP 10 SEP
24.
24 © NEC
Corporation 2016 Challenge to Avoid Data Shuffling executor executor executor driver Model: 10KB~10MB HML Model Merge Algorithm Training data: TB~ executor executor executor driver Naïve Design Distributed HML
25.
25 © NEC
Corporation 2016 Execution Flow of Distributed HML Executors build a rule-based tree from their local data in parallel Driver Executor Feature Selection Pruning Data Segmentation
26.
26 © NEC
Corporation 2016 Execution Flow of Distributed HML Driver merges the trees Driver Executor Feature Selection Pruning Data Segmentation HML Segment Merge Algorithm
27.
27 © NEC
Corporation 2016 Execution Flow of Distributed HML Driver broadcasts the merged tree to executors Driver Executor Feature Selection Pruning Data Segmentation
28.
28 © NEC
Corporation 2016 Execution Flow of Distributed HML Executors perform feature selection with their local data in parallel Driver Executor Feature Selection Pruning Data Segmentation Sat Sat Sat Sat Sat
29.
29 © NEC
Corporation 2016 Execution Flow of Distributed HML Driver merges the results of feature selection Driver Executor Feature Selection Pruning Data Segmentation Sat Sat Sat Sat Sat HML Feature Merge Algorithm
30.
30 © NEC
Corporation 2016 Execution Flow of Distributed HML Driver merges the results of feature selection Driver Executor Feature Selection Pruning Data Segmentation Sat HML Feature Merge Algorithm
31.
31 © NEC
Corporation 2016 Execution Flow of Distributed HML Driver broadcasts the merged results of feature selection to executors Driver Executor Feature Selection Pruning Data Segmentation Sat SatSatSat
32.
32 © NEC
Corporation 2016 3 Technical Key Points to Fast Run ML on Spark CPU 12 SEP 11 SEP 10 SEP
33.
33 © NEC
Corporation 2016 Wait Time for Other Executors Delays Execution Machine learning is likely to cause unbalanced computation load Executor Sat Sat Time Executor Executor
34.
34 © NEC
Corporation 2016 Balance Computational Effort for Each Executor Executor optimizes all predictive formulas with equally-divided training data Executor Sat Sat Time Executor Executor
35.
35 © NEC
Corporation 2016 Balance Computational Effort for Each Executor Executor optimizes all predictive formulas with equally-divided training data Executor Sat Sat Time Executor Executor Sat Sat Sat Sat Completion time reduced!
36.
36 © NEC
Corporation 2016 3 Technical Key Points to Fast Run ML on Spark 1TB CPU 12 SEP 11 SEP 10 SEP
37.
37 © NEC
Corporation 2016 Machine Learning Consists of Many Matrix Computations Feature Selection Pruning Data Segmentation Basic Matrix Computation Matrix Inversion Combinatorial Optimization … CPU
38.
38 © NEC
Corporation 2016 RDD Design for Leveraging High-Speed Libraries Distributed HML’s RDD Design Matrix object1 element … Straightforward RDD Design Training sample1 partition Training sample2 CPU CPU Matrix computation libraries (Breeze, BLAS) ~Training sample1 Training sample2 … seq(Training sample1, Training sample2, …) mapPartition Matrix object1 … Training sample1 Training sample2 map Iterator Matrix object creation element
39.
39 © NEC
Corporation 2016 Performance Degradation by Long-Chained RDD Long-chained RDD operations cause high-cost recalculations Feature Selection Pruning Data Segmentation RDD.map(TreeMapFunc) .map(FeatureSelectionMapFunc) .reduce(TreeReduceFunc) .reduce(FeatureSelectionReduceFunc) .map(TreeMapFunc) .map(FeatureSelectionMapFunc) .reduce(TreeReduceFunc) .reduce(FeatureSelectionReduceFunc) .map(PruningMapFunc)
40.
40 © NEC
Corporation 2016 Performance Degradation by Long-Chained RDD Long-chained RDD operations cause high-cost recalculations Feature Selection Pruning Data Segmentation RDD.map(TreeMapFunc) .map(FeatureSelectionMapFunc) .reduce(TreeReduceFunc) .reduce(FeatureSelectionReduceFunc) .map(TreeMapFunc) .map(FeatureSelectionMapFunc) .reduce(TreeReduceFunc) .reduce(FeatureSelectionReduceFunc) GC .map(TreeMapFunc) CPU .map(PruningMapFunc)
41.
41 © NEC
Corporation 2016 Performance Degradation by Long-Chained RDD Cut long-chained operations periodically by checkpoint() Feature Selection Pruning Data Segmentation RDD.map(TreeMapFunc) .map(FeatureSelectionMapFunc) .reduce(TreeReduceFunc) .reduce(FeatureSelectionReduceFunc) .map(TreeMapFunc) .map(FeatureSelectionMapFunc) .reduce(TreeReduceFunc) .reduce(FeatureSelectionReduceFunc) GC .map(TreeMapFunc) savedRDD .checkpoint() = CPU ~ .map(PruningMapFunc)
42.
Benchmark Performance Evaluations
43.
43 © NEC
Corporation 2016 Eval 1: Prediction Error (vs. Spark MLlib algorithms) Distributed HML achieves low error competitive to a complex model data # samples Distributed HML Logistic Regression Decision Tree Random Forests gas sensor array (CO)* 4,208,261 0.542 0.597 0.587 0.576 household power consumption * 2,075,259 0.524 0.531 0.529 0.655 HIGGS* 11,000,000 0.335 0.358 0.337 0.317 HEPMASS* 7,000,000 0.156 0.163 0.167 0.175 * UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/) 1st 1st 1st 2nd
44.
44 © NEC
Corporation 2016 Eval 2: Performance Scalability (vs. Spark MLlib algos) Distributed HML is competitive with Spark MLlib implementations. HIGGS* (11M samples) HEPMASS* (7M samples)# CPU cores Distributed HML Logistic Regression Distributed HML * UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/) Logistic Regression
45.
Evaluation in Real
Case
46.
46 © NEC
Corporation 2016 ATM Cash Balance Prediction Much cash High inventory cost Little cash High risk of out of stock Around 20,000 ATMs
47.
47 © NEC
Corporation 2016 Training Speed 2 hours9 days ▌Summary of data # ATMs: around 20,000 (in Japan) # training samples: around 10M ▌Cluster spec (10 nodes) # CPU cores: 128 Memory: 2.5TB Spark 1.6.1, Hadoop 2.7.2 (HDP 2.3.2) * Run w/ 1 CPU core and 256GB memory 110x Speed up Serial HML* Distributed HML
48.
48 © NEC
Corporation 2016 Prediction Error ▌Summary of data # ATMs: around 20,000 (in Japan) # training samples: around 23M ▌Cluster spec (10 nodes) # CPU cores: 128 Memory: 2.5TB Spark 1.6.1, Hadoop 2.7.2 (HDP 2.3.2) Average Error: 17% lower Serial HML Distributed HMLSerial HML Serial HML vs Winner: Distributed HML
49.
49 © NEC
Corporation 2016 Summary CPU 12 SEP 11 SEP 10 SEP
50.
50 © NEC
Corporation 2016 Summary
Télécharger maintenant