Tobias Fuchs
tobias.fuchs@nm.ifi.lmu.de
LMU Munich, MNM Team
www.mnm-team.org
Expressing and Exploiting
Multi-Dimensional Locality
in DASH
SPPEXA Symposium 2016
Background
DASH
• Vision: “C++ standard template library for HPC”.
• Provides n-dim array abstraction for stencil- and dense matrix
operations.
• Realization of the PGAS (partitioned global address space)
programming model.
PGAS and Locality
• Combine distributed memory into virtual global memory space.
• Strong sense of data ownership:
private, shared local, shared global
// private: visible to the local unit only
int p = 42;
// shared local: element in the local portion of a global array
dash::Array<T> a;
a.local[4] = p;
// shared global: element that may reside in another unit's memory
p = a[40];
PGAS and Locality
• Locality (access distance to data) predominant factor for efficiency.
L = (local accesses) / (total accesses)
• Access pattern on data depends on implementation of algorithm.
• Complexity to maintain locality increases exponentially with the number
of data dimensions.
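The locality ratio L can be made concrete with a small sketch that does not use DASH. It assumes a 1-D array of `n` elements split into contiguous blocks over `units` units, and a 3-point stencil in which every element is updated by its owner. The names `owner_of` and `stencil_locality` are hypothetical helpers, not part of any library.

```cpp
#include <cassert>

// Owner of global index i when n elements are split into contiguous
// blocks of ceil(n / units) elements (an assumed, simple distribution).
long owner_of(long i, long n, long units) {
  assert(units > 0 && i >= 0 && i < n);
  long block = (n + units - 1) / units;
  return i / block;
}

// L = (local accesses) / (total accesses) for a 3-point stencil:
// the owner of element i reads elements i-1, i and i+1.
double stencil_locality(long n, long units) {
  long local = 0, total = 0;
  for (long i = 0; i < n; ++i) {
    long me = owner_of(i, n, units);
    for (long j = i - 1; j <= i + 1; ++j) {
      if (j < 0 || j >= n) continue;  // skip out-of-range neighbors
      ++total;
      if (owner_of(j, n, units) == me) ++local;
    }
  }
  return total == 0 ? 1.0 : static_cast<double>(local) / total;
}
```

With a single unit every access is local and L is 1. Splitting the same array over more units only adds remote accesses at block boundaries, so L drops as blocks get smaller.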
Objective and Approach
Objective
Portable efficiency by automatic deduction of optimal data distribution.
Approach
1. Identify distribution properties that allow well-defined specification of
any data distribution.
2. Let algorithms specify soft / hard constraints on distribution properties.
3. Derive optimal distribution for a given set of constraints.
→ Automatic deduction of optimal data distribution
Distribution Properties
Property Categories
Mappings in data distribution can be categorized by their stages:
Partitioning  Decomposing the index domain to blocks
Mapping       Assigning blocks to units
Layout        Storage order of block elements in units’ local memory
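The three stages can be sketched as one function that resolves a 2-D global index to a unit and a local offset. This is a minimal illustration and not the DASH implementation. It assumes rectangular tiles that evenly divide the index domain, round-robin assignment of tiles to units, and row-major storage inside each tile. `Location` and `locate` are hypothetical names.

```cpp
#include <cassert>

struct Location { int unit; int offset; };

// Resolves element (r, c) of a rows x cols index domain that is split
// into tiles of tile_r x tile_c elements and distributed over `units`.
Location locate(int r, int c, int rows, int cols,
                int tile_r, int tile_c, int units) {
  assert(rows % tile_r == 0 && cols % tile_c == 0 && units > 0);
  assert(r >= 0 && r < rows && c >= 0 && c < cols);
  // 1. Partitioning: which block does (r, c) fall into?
  int blocks_per_row = cols / tile_c;
  int block = (r / tile_r) * blocks_per_row + (c / tile_c);
  // 2. Mapping: assign blocks to units round-robin.
  int unit = block % units;
  // 3. Layout: a unit stores its blocks one after another,
  //    elements within a block in row-major order.
  int local_block = block / units;
  int in_block = (r % tile_r) * tile_c + (c % tile_c);
  return Location{ unit, local_block * tile_r * tile_c + in_block };
}
```

Changing only one stage, for example the assignment rule in step 2, yields a different distribution while the other two stages stay untouched.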
Example: Morton Order Distribution
Category      Properties
Partitioning  balanced, regular, rectangular
Mapping       balanced, minimal, neighbor
Layout        blocked, linear, canonical
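For readers unfamiliar with Morton (Z-curve) order: a common way to compute a block's position on the curve is to interleave the bits of its coordinates. The sketch below shows that generic technique. It is not taken from DASH, the name `morton2d` is hypothetical, and placing `x` in the even bits is a convention chosen here.

```cpp
#include <cassert>
#include <cstdint>

// Interleaves the bits of block coordinates (x, y): bit k of x goes to
// bit 2k of the result, bit k of y goes to bit 2k + 1.
std::uint32_t morton2d(std::uint16_t x, std::uint16_t y) {
  std::uint32_t z = 0;
  for (int k = 0; k < 16; ++k) {
    z |= ((static_cast<std::uint32_t>(x) >> k) & 1u) << (2 * k);
    z |= ((static_cast<std::uint32_t>(y) >> k) & 1u) << (2 * k + 1);
  }
  return z;
}
```

Blocks that are close in the 2-D block grid tend to receive nearby Morton indices, which is one way to obtain the "neighbor" mapping property listed above.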
Use Cases
Automatic Deduction of Optimal Data Distribution
“Find a data distribution that fulfills a set of properties.”
// Deduces pattern type, initializes pattern instance:
auto pattern =
  make_pattern<
    // Compile time deduction via C++11 generic template metaprogramming:
    partitioning_properties< balanced, regular >,
    mapping_properties< neighbor >,
    layout_properties< blocked, row_major >
  >(
    // Run time deduction:
    Size<2>(10000,10000),
    Team<2>(24,24));
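How a compile-time deduction of this kind can work is sketched below in plain C++11, without DASH. The property tags reuse names from the slide, but everything else is an assumption for illustration: `contains`, `deduce_pattern`, the two stand-in pattern types, and the rule that a requested `neighbor` mapping selects the tiled type.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical property tags (names mirror the slide, not the DASH API).
struct balanced {}; struct regular {}; struct neighbor {};
struct blocked {};  struct row_major {};

// contains<T, Ts...>::value is true if T occurs in the pack Ts...
template <typename T, typename... Ts>
struct contains : std::false_type {};
template <typename T, typename U, typename... Ts>
struct contains<T, U, Ts...>
  : std::conditional<std::is_same<T, U>::value,
                     std::true_type, contains<T, Ts...>>::type {};

// Two stand-in pattern types to choose between.
struct TiledPattern {};
struct BlockedPattern {};

// Assumed rule: requesting `neighbor` selects the tiled pattern type,
// anything else falls back to the blocked pattern type.
template <typename... Props>
struct deduce_pattern {
  typedef typename std::conditional<
      contains<neighbor, Props...>::value,
      TiledPattern, BlockedPattern>::type type;
};

// The selection happens entirely at compile time:
static_assert(
    std::is_same<deduce_pattern<balanced, neighbor, blocked>::type,
                 TiledPattern>::value,
    "neighbor mapping should select the tiled type");
```

A factory function could then construct an instance of `deduce_pattern<...>::type` from the sizes and team layout, which are only known at run time.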
Automatic Deduction of Optimal Data Distribution
“Find a data distribution that is optimal for a given algorithm.”
// Deduce pattern from algorithm constraints:
auto pattern = dash::make_pattern< dash::summa_pattern_constraints >(
Size<2>(10000,10000),
Team<2>(24,24));
dash::Matrix<double, 2> matrix_a(pattern);
dash::Matrix<double, 2> matrix_b(pattern);
dash::Matrix<double, 2> matrix_c(pattern);
dash::summa(matrix_a, matrix_b, matrix_c);
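For context, SUMMA computes the product as a sequence of panel updates. The single-process sketch below shows only that step structure. It is not `dash::summa`, and `Mat` and `summa_like` are hypothetical names. In a distributed run each step would first broadcast a column panel of A along process rows and a row panel of B along process columns, then do the local update.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<double> Mat;  // dense n x n matrix, row-major

// Returns A * B, accumulated as n / b panel products of width b.
Mat summa_like(const Mat& A, const Mat& B, int n, int b) {
  assert(b > 0 && n % b == 0);
  assert(A.size() == static_cast<std::size_t>(n) * n);
  assert(B.size() == A.size());
  Mat C(A.size(), 0.0);
  for (int k0 = 0; k0 < n; k0 += b) {  // one SUMMA step per panel
    // Distributed version: broadcast A(:, k0..k0+b-1) and
    // B(k0..k0+b-1, :) here, then update the local part of C.
    for (int i = 0; i < n; ++i)
      for (int k = k0; k < k0 + b; ++k)
        for (int j = 0; j < n; ++j)
          C[i * n + j] += A[i * n + k] * B[k * n + j];
  }
  return C;
}
```

Because every step touches whole row and column panels, the algorithm benefits from a distribution in which those panels are spread evenly over the process grid.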
Automatic Deduction of Optimal Algorithm
“Find algorithm variant that is optimal for a given data distribution.”
// Specify how data is distributed in global memory:
auto pattern = dash::TilePattern<2>(10000,10000, TILED(100,100));
dash::Matrix<double, 2> matrix_a(pattern);
dash::Matrix<double, 2> matrix_b(pattern);
dash::Matrix<double, 2> matrix_c(pattern);
// Selects matrix product algorithm variant that is optimal for the given
// pattern:
dash::multiply(matrix_a, matrix_b, matrix_c);
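Selecting an algorithm variant from the pattern type can be done with tag dispatch. The sketch below illustrates the idea under assumptions: the pattern and matrix types are stand-ins, the `tiled` flag is a made-up trait, and the variants only return their names instead of multiplying anything.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Stand-in pattern types carrying one compile-time property.
struct TilePattern  { static const bool tiled = true;  };
struct BlockPattern { static const bool tiled = false; };

// Stand-in matrix type that exposes its pattern type.
template <typename Pattern>
struct Matrix { typedef Pattern pattern_type; };

// Variant chosen for tiled patterns.
template <typename M>
std::string multiply_impl(const M&, const M&, M&, std::true_type) {
  return "summa";
}
// Fallback variant for all other patterns.
template <typename M>
std::string multiply_impl(const M&, const M&, M&, std::false_type) {
  return "naive";
}

// Front end: the overload is resolved at compile time from the pattern.
template <typename M>
std::string multiply(const M& a, const M& b, M& c) {
  return multiply_impl(
      a, b, c, std::integral_constant<bool, M::pattern_type::tiled>());
}
```

The caller writes the same `multiply(a, b, c)` for every distribution, and the compiler picks the variant without any run-time branching.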
Automatic Deduction of Optimal Algorithm
“Find data distribution for the most efficient algorithm variant.”
// Use constraints of most efficient algorithm, usually SUMMA for DGEMM:
auto pattern = dash::make_pattern< dash::multiply_pattern_constraints >(
Size<2>(10000,10000),
Team<2>(24,24));
dash::Matrix<double, 2> matrix_a(pattern);
dash::Matrix<double, 2> matrix_b(pattern);
dash::Matrix<double, 2> matrix_c(pattern);
// Calls dash::summa
dash::multiply(matrix_a, matrix_b, matrix_c);
Evaluation: DGEMM
MKL multithreaded vs. DASH MPI (GFLOP/s)
DASH: automatic distribution of matrix elements to MPI processes,
each using serial MKL for block matrix multiplication (SUMMA).
MKL: OpenMP threads, matrix initialization in master thread.
MKL multithreaded vs. DASH MPI (Speedup)
DASH: High locality due to optimal data distribution,
massive communication overhead (MPI, no shared windows).
MKL: Low locality (first touch issues), no communication.
→ DASH beats MKL for bigger N and higher degrees of parallelism.
Speedup = DASH_GFLOPS / MKL_GFLOPS
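The two metrics can be written down as small helpers. This sketch assumes the conventional operation count of 2·N³ floating-point operations for an N×N matrix product. The slides do not state which count the benchmark used, and the function names are hypothetical.

```cpp
#include <cassert>

// GFLOP/s of an N x N matrix product that took `seconds`,
// assuming 2 * N^3 floating-point operations.
double gemm_gflops(double n, double seconds) {
  assert(n > 0 && seconds > 0);
  return 2.0 * n * n * n / seconds / 1e9;
}

// Speedup as defined on the slide: DASH throughput over MKL throughput.
double speedup(double dash_gflops, double mkl_gflops) {
  assert(mkl_gflops > 0);
  return dash_gflops / mkl_gflops;
}
```

A speedup above 1 means the DASH run achieved higher throughput than the multithreaded MKL run for the same problem size.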
Evaluation: SGEMM
MKL multithreaded vs. DASH MPI (GFLOP/s)
DASH: automatic distribution of matrix elements to MPI processes,
each using serial MKL for block matrix multiplication (SUMMA).
MKL: OpenMP threads, matrix initialization in master thread.
MKL multithreaded vs. DASH MPI (Speedup)
DASH: High locality due to optimal data distribution,
massive communication overhead (MPI, no shared windows).
MKL: Low locality (first touch issues), no communication.
→ DASH beats MKL for bigger N and higher degrees of parallelism.
Speedup = DASH_GFLOPS / MKL_GFLOPS
Summary
• Optimal distribution of n-dim data depends on unmanageable multitude
of factors (topology, access pattern, data flow, …).
• We defined a universal classification of distribution properties.
• Property system allows automatic deduction of optimal data distribution
and algorithm variants at compile time and run time.
Works with any C++11 compiler (tested: Intel 14.0+, gcc 4.7+, clang).
• Work in progress: optimal data distribution for data flows.
Tobias Fuchs
tobias.fuchs@nm.ifi.lmu.de
www.mnm-team.org/~fuchst
DASH Project
www.dash-project.org
Visit for upcoming release

Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 

Expressing and Exploiting Multi-Dimensional Locality in DASH

  • 1. Tobias Fuchs tobias.fuchs@nm.ifi.lmu.de LMU Munich, MNM Team www.mnm-team.org Expressing and Exploiting Multi-Dimensional Locality in DASH SPPEXA Symposium 2016
  • 3. Background: DASH
      • Vision: “C++ standard template library for HPC”.
      • Provides an n-dimensional array abstraction for stencil and dense matrix operations.
      • Realization of the PGAS (partitioned global address space) programming model.
  • 4. Background: PGAS and Locality
      • Combine distributed memory into a virtual global memory space.
      • Strong sense of data ownership: private, shared local, shared global
        int p = 42;
  • 5. Background: PGAS and Locality
      • Combine distributed memory into a virtual global memory space.
      • Strong sense of data ownership: private, shared local, shared global
        int p = 42;
        dash::Array<T> a;
        a.local[4] = p;
  • 6. Background: PGAS and Locality
      • Combine distributed memory into a virtual global memory space.
      • Strong sense of data ownership: private, shared local, shared global
        int p;
        dash::Array<T> a;
        p = a[40];
  • 7. Background: PGAS and Locality
      • Locality (access distance to data) is the predominant factor for efficiency.
        L = (local accesses) / (total accesses)
      • The access pattern on data depends on the implementation of the algorithm.
      • The complexity of maintaining locality increases exponentially with the number of data dimensions.
  • 8. Objective and Approach
      Objective
      Portable efficiency by automatic deduction of optimal data distribution.
      Approach
      1. Identify distribution properties that allow well-defined specification of any data distribution.
      2. Let algorithms specify soft / hard constraints on distribution properties.
      3. Derive optimal distribution for a given set of constraints.
      → Automatic deduction of optimal data distribution
  • 9. Distribution Properties: Property Categories
      Mappings in data distribution can be categorized by their stages:
      Partitioning  Decomposing the index domain to blocks
      Mapping       Assigning blocks to units
      Layout        Storage order of block elements in units’ local memory
  • 10. Distribution Properties: Example: Morton Order Distribution
      Category      Properties
      Partitioning  balanced, regular, rectangular
      Mapping       balanced, minimal, neighbor
      Layout        blocked, linear, canonical
  • 11. Use Cases: Automatic Deduction of Optimal Data Distribution
      “Find a data distribution that fulfills a set of properties.”
        // Deduces pattern type, initializes pattern instance:
        auto pattern =
          make_pattern<
            // Template arguments: compile time deduction
            // via C++11 generic meta template programming
            partitioning_properties<
              balanced, regular >,
            mapping_properties<
              neighbor >,
            layout_properties<
              blocked, row_major >
          >(
            // Constructor arguments: run time deduction
            Size<2>(10000,10000),
            Team<2>(24,24));
  • 12. Use Cases: Automatic Deduction of Optimal Data Distribution
      “Find a data distribution that is optimal for a given algorithm.”
        // Deduce pattern from algorithm constraints:
        auto pattern = dash::make_pattern<
                         dash::summa_pattern_constraints >(
                           Size<2>(10000,10000),
                           Team<2>(24,24));
        dash::Matrix<double, 2> matrix_a(pattern);
        dash::Matrix<double, 2> matrix_b(pattern);
        dash::Matrix<double, 2> matrix_c(pattern);
        dash::summa(matrix_a, matrix_b, matrix_c);
  • 13. Use Cases: Automatic Deduction of Optimal Algorithm
      “Find algorithm variant that is optimal for a given data distribution.”
        // Specify how data is distributed in global memory:
        auto pattern = dash::TilePattern<2>(10000,10000, TILED(100,100));
        dash::Matrix<double, 2> matrix_a(pattern);
        dash::Matrix<double, 2> matrix_b(pattern);
        dash::Matrix<double, 2> matrix_c(pattern);
        // Selects matrix product algorithm variant that is optimal for the given
        // pattern:
        dash::multiply(matrix_a, matrix_b, matrix_c);
  • 14. Use Cases: Automatic Deduction of Optimal Algorithm
      “Find data distribution for the most efficient algorithm variant.”
        // Use constraints of most efficient algorithm, usually SUMMA for DGEMM:
        auto pattern = dash::make_pattern<
                         dash::multiply_pattern_constraints >(
                           Size<2>(10000,10000),
                           Team<2>(24,24));
        dash::Matrix<double, 2> matrix_a(pattern);
        dash::Matrix<double, 2> matrix_b(pattern);
        dash::Matrix<double, 2> matrix_c(pattern);
        // Calls dash::summa
        dash::multiply(matrix_a, matrix_b, matrix_c);
  • 15. Evaluation: DGEMM, MKL multithreaded vs. DASH MPI (GFLOP/s)
      DASH: automatic distribution of matrix elements to MPI processes, each using serial MKL for block matrix multiplication (SUMMA).
      MKL: OpenMP threads, matrix initialization in master thread.
  • 16. Evaluation: DGEMM, MKL multithreaded vs. DASH MPI (Speedup)
      Speedup = DASH_GFLOPS / MKL_GFLOPS
      DASH: High locality due to optimal data distribution, massive communication overhead (MPI, no shared windows).
      MKL: Low locality (first touch issues), no communication.
      → DASH beats MKL for bigger N and higher degrees of parallelism.
  • 17. Evaluation: SGEMM, MKL multithreaded vs. DASH MPI (GFLOP/s)
      DASH: automatic distribution of matrix elements to MPI processes, each using serial MKL for block matrix multiplication (SUMMA).
      MKL: OpenMP threads, matrix initialization in master thread.
  • 18. Evaluation: SGEMM, MKL multithreaded vs. DASH MPI (Speedup)
      Speedup = DASH_GFLOPS / MKL_GFLOPS
      DASH: High locality due to optimal data distribution, massive communication overhead (MPI, no shared windows).
      MKL: Low locality (first touch issues), no communication.
      → DASH beats MKL for bigger N and higher degrees of parallelism.
  • 19. Summary
      • Optimal distribution of n-dim data depends on an unmanageable multitude of factors (topology, access pattern, data flow, …).
      • We defined a universal classification of distribution properties.
      • Property system allows automatic deduction of optimal data distribution and algorithm variants at compile time and run time. Works with any C++11 compiler (tested: Intel 14.0+, gcc 4.7+, clang).
      • Work in progress: optimal data distribution for data flows.
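
The three stages named on slide 9 (partitioning, mapping, layout) and the locality ratio L from slide 7 can be illustrated with a minimal C++14 sketch. This is not DASH code and does not use the DASH API: the names Coord, AccessCount, BlockedDist2D and stencil_accesses are hypothetical, the distribution is a plain blocked one, and the 5-point stencil is an assumed access pattern chosen only to have something to count.

```cpp
#include <cstddef>

// Hypothetical illustration types; none of these names are part of DASH.
struct Coord { std::size_t row, col; };
struct AccessCount { std::size_t local, total; };

// A plain blocked distribution of a rows x cols index domain over a
// team_rows x team_cols grid of units (one block per unit).
struct BlockedDist2D {
    std::size_t rows, cols;
    std::size_t team_rows, team_cols;

    // Partitioning: block extents, rounded up so the blocks cover the domain.
    constexpr std::size_t block_rows() const { return (rows + team_rows - 1) / team_rows; }
    constexpr std::size_t block_cols() const { return (cols + team_cols - 1) / team_cols; }

    // Mapping: the unit that owns a global coordinate (units numbered row-major).
    constexpr std::size_t unit_at(Coord g) const {
        return (g.row / block_rows()) * team_cols + (g.col / block_cols());
    }

    // Layout: row-major offset of the element inside its owner's local block.
    constexpr std::size_t local_offset(Coord g) const {
        return (g.row % block_rows()) * block_cols() + (g.col % block_cols());
    }
};

// Counts the accesses of a 5-point stencil sweep over all interior elements.
// An access is "local" if the accessed element lives on the same unit as the
// element being updated, so L = local / total.
constexpr AccessCount stencil_accesses(BlockedDist2D d) {
    AccessCount n{0, 0};
    for (std::size_t r = 1; r + 1 < d.rows; ++r) {
        for (std::size_t c = 1; c + 1 < d.cols; ++c) {
            const std::size_t owner = d.unit_at({r, c});
            const Coord touched[5] = {
                {r, c}, {r - 1, c}, {r + 1, c}, {r, c - 1}, {r, c + 1}};
            for (std::size_t i = 0; i < 5; ++i) {
                ++n.total;
                if (d.unit_at(touched[i]) == owner) ++n.local;
            }
        }
    }
    return n;
}

// Same 8x8 domain and 4 units, two different team arrangements.
constexpr BlockedDist2D tiles{8, 8, 2, 2};    // 2D blocks of 4x4 elements
constexpr BlockedDist2D stripes{8, 8, 4, 1};  // row stripes of 2x8 elements

static_assert(tiles.unit_at({5, 6}) == 3, "element (5,6) is owned by unit 3");
static_assert(tiles.local_offset({5, 6}) == 6, "offset 6 inside its 4x4 block");
static_assert(stencil_accesses(tiles).total == 180, "36 interior points x 5");
static_assert(stencil_accesses(tiles).local == 156, "L = 156/180 for tiles");
static_assert(stencil_accesses(stripes).local == 144, "L = 144/180 for stripes");
```

For this assumed stencil, the 2D tiles give L = 156/180 (about 0.87) and the row stripes give L = 144/180 = 0.80 on the same four units. Even in this toy setting, the choice of partitioning and mapping changes the share of local accesses, which is the quantity the property system in the slides is meant to optimize.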