SECTION 3:
MODERN COMPUTING:
CLOUD, DISTRIBUTED & HIGH PERFORMANCE
DR. ÜMIT V. ÇATALYÜREK
PROFESSOR AND ASSOCIATE CHAIR
Georgia Institute of Technology
JANUARY 27, 2017
The Big Data to Knowledge (BD2K)
Guide to the Fundamentals of Data Science
1
ÜMİT V. ÇATALYÜREK
• A Professor in the School of Computational Science &
Engineering in the College of Computing at the Georgia
Institute of Technology.
• A recipient of an NSF CAREER award.
• Principal investigator of several awards from the
Department of Energy, the National Institutes of Health, & the
National Science Foundation.
• An Associate Editor for Parallel Computing, & editorial board
member for IEEE Transactions on Parallel & Distributed
Systems, & the Journal of Parallel & Distributed Computing.
• A Fellow of IEEE, member of ACM & SIAM, Chair for
IEEE TCPP for the 2016-2017 term, & Vice-Chair for ACM SIGBio
for the 2015-2018 term.
• Main research areas: parallel computing, combinatorial
scientific computing & biomedical informatics.
• More information about Dr. Ümit V. Çatalyürek can be
found at http://cc.gatech.edu/~umit.
2
MODERN COMPUTING: CLOUD, DISTRIBUTED &
HIGH PERFORMANCE COMPUTING
Ümit V. Çatalyürek
Professor and Associate Chair
School of Computational Science and Engineering
Georgia Institute of Technology
The BD2K Guide to the Fundamentals of Data Science Series
27 January 2017
3
Outline
• HPC
• What is it? Why?
• A Crash Course on (HPC) Computer Architecture
• History of Single “Processor” Performance
• Taxonomy of Processors, Memory Topology of Parallel Computers
• Supercomputers
• How to speed up your application?
• Focus on the common case
• Pay attention to locality
• Take advantage of parallelism
• An Example Application: Whole-Slide Histopathology
Image Analysis for Neuroblastoma
• Summary
4
What does High Performance Computing (HPC) mean?
• There is no such thing as “Low Performance Computing”
• “HPC most generally refers to the practice of aggregating computing
power in a way that delivers much higher performance than one could
get out of a typical desktop computer or workstation in order to solve
large problems in science, engineering, or business” (insideHPC)
• “HPC allows scientists and engineers to solve complex science,
engineering, and business problems using applications that require
high bandwidth, enhanced networking, and very high compute
capabilities.” (Amazon AWS)
• “HPC is the use of parallel processing for running advanced
application programs efficiently, reliably and quickly… The term HPC
is occasionally used as a synonym for supercomputing.”
(SearchEnterpriseLinux/WhatIs.com)
5
My Definition of High Performance Computing (HPC)
• Efficient use of computing platforms for running application
programs quickly.
• Why do we care about speed?
• We do not want science to wait for computing.
• Why do we care about efficiency?
• Efficient use of resources means more resources available to all of us :)
• Somebody has to pay the bills!
• When your program is efficient, it will also be very fast!
• Supercomputing is HPC, but HPC does not mean just
supercomputing
• For Supercomputers check top500.org (more later).
6
Computing Today
• Computing = Parallel Computing = HPC
• Any “computer” you touch has parallel processing power:
• Your laptop’s CPU has at least 2 cores.
• Your cell phone has 4-8 cores!
• This is a BD2K Seminar: Data (and hence computational need) is BIG!
• So big that it does not fit into your computer.
• It takes too long to compute on your computer.
[Chart: GenBank bases (megabases, log scale) vs. time, Dec 1982 to Jul 2015, showing the exponential growth of GenBank and WGS sequence data. Source: http://www.genome.gov/sequencingcosts/]
Oxford Nanopore MinION MkI
7
History of Single “Processor” Performance
9
[Chart: single-processor performance over time, showing rapid growth in the RISC era, then a plateau and the move to multi-cores.]
Bandwidth and Latency
• Bandwidth or throughput
• Total work done in a given time
• 10,000-25,000X improvement for processors
• 300-1200X improvement for memory and disks
• Latency or response time
• Time between start and completion of an event
• 30-80X improvement for processors
• 6-8X improvement for memory and disks
10
Bandwidth and Latency
11
[Chart: log-log plot of bandwidth and latency milestones.]
Flynn’s Taxonomy
12
Classification by instruction streams (Single/SI vs. Multiple/MI) and data streams (Single/SD vs. Multiple/MD):
• SISD (single instruction, single data): single-threaded process
• MISD (multiple instruction, single data): pipeline architecture
• SIMD (single instruction, multiple data): vector processing
• MIMD (multiple instruction, multiple data): shared-/distributed-memory computing
SISD
13
[Diagram: one processor executes a single instruction stream on a single data stream.]
SIMD
14
[Diagram: one processor applies the same instruction stream simultaneously to many data elements D0, D1, …, Dn.]
GPU (SIMD) Advantage
15
Images are from W. Dally’s SC10 Keynote Talk
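The SISD/SIMD distinction above can be sketched in plain Python. This is a conceptual model only (the function names are ours): real SIMD hardware applies one instruction to all data lanes in lockstep, whereas this sketch still iterates.

```python
# Conceptual sketch of SISD vs. SIMD (illustration only; real SIMD
# hardware applies one instruction to all lanes in a single step,
# whereas this Python model still loops sequentially).

def sisd(instructions, datum):
    """Single instruction stream applied to a single data stream."""
    for op in instructions:
        datum = op(datum)
    return datum

def simd(instructions, data):
    """Single instruction stream applied to multiple data elements."""
    for op in instructions:
        data = [op(x) for x in data]  # same op on every "lane"
    return data

program = [lambda x: x * 2, lambda x: x + 1]
```

Running `program` over a vector applies each instruction to every element before moving to the next instruction, which is exactly the lockstep behavior that makes vector processors and GPUs efficient on regular data.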
MIMD
16
[Diagram: multiple processors, each executing its own instruction stream on its own data stream.]
Memory Topology: Shared
17
[Diagram: several processors all connected to one shared memory; a.k.a. SMPs.]
Memory Topology: Distributed
18
[Diagram: each processor has its own local memory; processor-memory pairs communicate over a network.]
Memory Topology: Hybrid
19
[Diagram: multiple shared-memory nodes, each with several processors sharing one memory, connected by a network.]
Memory Topology: Hybrid + Heterogeneous
20
[Diagram: multiple shared-memory nodes, each with several processors plus a GPU, connected by a network.]
Oxen or Chicken Dilemma
• “If you were plowing a field, which would you rather use?
Two strong oxen or 1024 chickens?”
Seymour Cray
22
Highlights from Top500
[Slides 23-26: charts highlighting the Top500 list of the world's fastest supercomputers; see top500.org for the current list.]
26
Amdahl’s Law
28

Speedup_overall = ExTime_old / ExTime_new
                = 1 / ((1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)

ExTime_new = ExTime_old × ((1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)

Best you could ever hope to do:

Speedup_maximum = 1 / (1 − Fraction_enhanced)

Amdahl’s Law Example:
29
• Sequence Analysis Pipeline has a “slow” step which does
error correction of the input reads
• New CPU is 10X faster on that step
• I/O-bound server, so 60% of the time is spent waiting for I/O;
only 40% of the execution is enhanced

Speedup_overall = 1 / ((1 − 0.4) + 0.4 / 10) = 1 / 0.64 ≈ 1.56

• Apparently, it’s human nature to be attracted by the “10X
faster”, vs. keeping in perspective that it’s just 1.6X faster overall
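The formula above is easy to sanity-check in a few lines of Python (a minimal sketch; the function name is ours):

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only a fraction of the time is enhanced.

    fraction_enhanced: fraction of the original execution time (0..1)
                       that benefits from the enhancement
    speedup_enhanced:  speedup factor of the enhanced portion
    """
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)

# The slide's example: a 10x faster CPU, but 60% of the time is I/O wait.
overall = amdahl_speedup(0.4, 10)  # 1 / (0.6 + 0.04) = 1.5625
```

Note how the un-enhanced 60% dominates: even an infinitely fast CPU would cap the overall speedup at 1 / 0.6 ≈ 1.67x.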
Multiple Sequence Alignment
VTISCTGSSSNIG-AGNHVKWYQQLPG
VTISCTGTSSNIG--SITVNWYQQLPG
LRLSCSSSGFIFS--SYAMYWVRQAPG
LSLTCTVSGTSFD--DYYSTWVRQPPG
PEVTCVVVDVSHEDPQVKFNW--YVDG
ATLVCLISDFYPG--AVTVAW--KADS
AALGCLVKDYFPE--PVTVSW--NS-G
VSLTCLVKGFYPS--DIAVEW--ESNG
30
VTISCTGSSSNIGAG-NHVKWYQQLPG
VTISCTGTSSNIGS--ITVNWYQQLPG
LRLSCSSSGFIFSS--YAMYWVRQAPG
LSLTCTVSGTSFDD--YYSTWVRQPPG
PEVTCVVVDVSHEDPQVKFNWYVDG--
ATLVCLISDFYPGA--VTVAWKADS--
AALGCLVKDYFPEP--VTVSWNSG---
VSLTCLVKGFYPSD--IAVEWESNG--
• Optimal multiple alignment: O(2^n ∏ |l_i|) for n sequences of lengths l_i
• 6 sequences of length 100, with a constant of 10^-9 seconds per cell:
• running time 2^6 × 100^6 × 10^-9 ≈ 6.4 × 10^4 seconds (~17.7 hours)
• add 2 sequences (8 sequences of length 100):
• running time 2^8 × 100^8 × 10^-9 ≈ 2.6 × 10^9 seconds (~82.4 years!)
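The slide's arithmetic can be reproduced directly (a small sketch; the helper name and equal-length simplification are ours):

```python
# Cost model from the slide: optimal multiple sequence alignment by
# dynamic programming costs O(2^n * product of sequence lengths).
def optimal_msa_seconds(n_seqs, seq_len, sec_per_cell=1e-9):
    """Estimated running time for n_seqs sequences of length seq_len."""
    return (2 ** n_seqs) * (seq_len ** n_seqs) * sec_per_cell

six_seqs = optimal_msa_seconds(6, 100)    # 6.4e4 s, about 17.7 hours
eight_seqs = optimal_msa_seconds(8, 100)  # 2.56e9 s, about 81 years
```

Adding just two sequences multiplies the cost by 2^2 × 100^2 = 40,000, which is why heuristics like CLUSTAL W are used in practice.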
CLUSTAL W
• Based on Higgins & Sharp’s CLUSTAL [Gene88]
• Progressive alignment-based strategy
• Pairwise Alignment: O(n²l²)
• A distance matrix is computed using either an approximate method (fast) or
dynamic programming (more accurate, slower)
• Computation of Guide Tree: O(n³), a phylogenetic tree
• Computed from the distance matrix by iteratively selecting aligned pairs and
linking them.
• Progressive Alignment: O(nl²)
• A series of pairwise alignments computed using full dynamic programming to align
larger and larger groups of sequences.
• The order in the Guide Tree determines the ordering of sequence alignments.
• At each step, either two sequences are aligned, a new sequence is aligned with
a group, or two groups are aligned.
• n: number of sequences in the query
• l: average sequence length
31
Speeding up CLUSTALW
[Chart: breakdown of CLUSTAL W execution time on a PIII-650MHz for 25 to 1000 GPCR sequences; pairwise alignment dominates the time fractions relative to guide-tree computation and progressive alignment.]
• By parallelizing the most time-consuming part: pairwise alignment
[Chart: speedup of the parallelized version of CLUSTAL W on 1-8 processors; pairwise-alignment speedup is close to the linear ideal, while total speedup is lower.]
32
More on Amdahl’s law
[Chart: speedup vs. number of processors (0 to 9000) for serial fractions of 10%, 5%, 2%, 1%, 0.5%, and 0.1%; each curve saturates near 1/(serial fraction).]
33
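The saturation behind those curves can be checked numerically with the same Amdahl formula, written here in terms of the serial fraction (a small sketch; the function name is ours):

```python
def parallel_speedup(serial_fraction, processors):
    """Amdahl speedup when the parallel part is spread over `processors`."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)

# A 1% serial fraction caps speedup near 100x no matter how many
# processors are added; even 9000 processors reach only ~99x.
at_9000 = parallel_speedup(0.01, 9000)
limit = 1.0 / 0.01  # asymptotic limit for a 1% serial fraction
```

This is why shrinking the serial fraction (the common case) matters far more at scale than adding processors.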
Levels of the Memory Hierarchy
35
Level          Capacity        Access time              Cost         Transferred as
Registers      100s of bytes   300-500 ps (0.3-0.5 ns)               instr. operands
L1/L2 cache    10s-100s KB     ~1 ns to ~10 ns          $1000s/GB    blocks
Main memory    GBs             80-200 ns                ~$100/GB     pages
Disk           10s of TB       10 ms (10,000,000 ns)    ~$1/GB       files
Tape           infinite        sec-min                  ~$1/GB

Upper levels are smaller and faster; lower levels are larger and slower.
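The hierarchy above is why access order matters: traversing the same data in cache-friendly order can be much faster than jumping around. A minimal sketch (the function names are ours; the effect is modest in pure Python but pronounced with contiguous arrays, e.g. in C):

```python
# Locality sketch: summing a 2-D array row-by-row walks memory in
# order (good spatial locality); column-by-column jumps between rows
# (poor locality). Both orders compute the same sum.
N = 500
matrix = [[(i * N + j) % 7 for j in range(N)] for i in range(N)]

def sum_row_major(m):
    total = 0
    for row in m:              # consecutive elements of one row
        for x in row:
            total += x
    return total

def sum_col_major(m):
    total = 0
    for j in range(len(m[0])):   # one element from each row per step
        for i in range(len(m)):
            total += m[i][j]
    return total
```

Timing both versions with `time.perf_counter()` typically shows the row-major order winning, because each cache block brought in from memory is fully used before it is evicted.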
Locality Aware Remote Visualization
• Scientific and clinical research generate multi-GB to multi-TB of
spatially and temporally correlated data
• Different spatial and temporal resolutions
• Different acquisition modalities, from CT to light microscopy to electron
microscopy
• Example applications: Visible Human, mouse BIRN
• DataCutter streams data to the MPI-based OSC parallel renderer
• Setup
• Full-color Visible Woman dataset
• Super-sampled at 2x for the entire dataset, 4x and 8x for regions of the dataset
• Data stored on 20 nodes
• 8 rendering nodes and 1 compositing node with texture VR
• Remote thin client connected over the internet
System Overview
Query Execution
Implementation of OSC Parallel Renderer
Current and Emerging Scientific Applications
42
Processing Remotely-Sensed Data
NOAA Tiros-N
w/ AVHRR sensor
AVHRR Level 1 Data
• As the TIROS-N satellite orbits, the
Advanced Very High Resolution Radiometer (AVHRR)
sensor scans perpendicular to the satellite’s track.
• At regular intervals along a scan line measurements
are gathered to form an instantaneous field of view
(IFOV).
• Scan lines are aggregated into Level 1 data sets.
A single file of Global Area
Coverage (GAC) data
represents:
• ~one full earth orbit.
• ~110 minutes.
• ~40 megabytes.
• ~15,000 scan lines.
One scan line is 409 IFOVs.
[Image montage of data analysis applications: satellite data processing, DCE-MRI analysis, short sequence mapping, quantum chemistry, image processing, multimedia, video surveillance, Montage.]
Application Patterns
• Complex and diverse processing structures
43
Bag-of-Tasks Model
[Diagram: data analysis applications structured as bags of tasks, each task operating on files.]
Application Patterns
• Complex and diverse processing structures
44
[Diagram: data analysis applications split into bag-of-tasks applications (tasks operating on files) and workflows (non-streaming and streaming) built from sequential or parallel tasks.]
Taxonomy of Parallelism
• Complex and diverse processing structures
• Varied parallelism
46
[Diagram: a bag-of-tasks application; independent sequential tasks and their files distributed across processors P1-P4.]
Application Patterns
• Complex and diverse processing structures
• Varied parallelism
• Bag-of-tasks applications: task-parallelism
47
[Diagram: a non-streaming workflow of sequential or parallel tasks mapped to processors P1-P4, exhibiting both task- and data-parallelism.]
Taxonomy of Parallelism
• Complex and diverse processing structures
• Varied parallelism
• Bag-of-tasks: task-parallelism
• Non-streaming workflows: task- and data-parallelism
49
[Diagram: a streaming workflow of sequential or parallel tasks mapped to processors P1-P4, exhibiting task-, data- and pipelined-parallelism.]
Taxonomy of Parallelism
• Complex and diverse processing structures
• Varied parallelism
• Bag-of-tasks: task-parallelism
• Non-streaming workflows: task- and data-parallelism
• Streaming workflows: task-, data- and pipelined-parallelism
51
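The bag-of-tasks pattern maps naturally onto a worker pool. A minimal sketch with Python's standard library (the task function and file names are hypothetical stand-ins, not part of the original deck):

```python
from concurrent.futures import ThreadPoolExecutor

# Bag-of-tasks: independent tasks with no communication between them,
# so a simple worker pool extracts all of the available task-parallelism.
# (CPU-bound Python work would use ProcessPoolExecutor instead.)
def process_file(name):
    """Hypothetical stand-in for per-file analysis work."""
    return (name, len(name))

files = [f"tile_{i}.dat" for i in range(8)]

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_file, files))
```

Because the tasks share nothing, the same pattern scales from a laptop's cores to a cluster's nodes with only the scheduler changing.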
An Example Application: Whole-Slide Histopathology
Image Analysis for Neuroblastoma
• Classify	biopsy	tissue	images	into	different	subtypes	of	
prognostic	significance
• Very	high	resolution	slides
• Divided	into	smaller	tiles
• Multi-resolution	image	analysis
• Mimics	the	way	pathologists	perform	their	analysis
• If	classification	at	lower	resolution	is	not	satisfactory,	analysis	algorithm	is	
executed	at	higher	resolution(s),	hence	the	dynamic	workload.
53
Why do we need HPC?
§ Due to the large sizes of whole-slide images
§ A 120K x 120K image digitized at 40x occupies more than 40 GB.
§ The processing time on a single CPU
§ For an image tile of 1K x 1K: ≈6 secs w/ Matlab, 850 msecs w/ C++
§ For a “small” 50K x 50K slide (assuming 50% background): ≈20 min.
§ In algorithm development
§ Algorithm development is done in Matlab
§ Requires evaluation of many different techniques, parameters, etc.
§ In clinical practice, 8-9 biopsy samples are collected per patient. For an
average of 500 neuroblastoma patients treated annually, our biomedical
image analysis consumes:
§ On a CPU: 24 months using Matlab and 3.4 months using C++.
§ Can we reduce this to a couple of days or even hours?
54
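The sizes and times quoted above can be checked with simple arithmetic (a sketch; the 3 bytes/pixel RGB assumption is ours, since the slide does not state it):

```python
# Back-of-envelope check of the slide's whole-slide-image numbers.
bytes_per_pixel = 3  # assumed 24-bit RGB
image_gb = 120_000 * 120_000 * bytes_per_pixel / 1e9   # ~43 GB

# A 50K x 50K slide split into 1K x 1K tiles, half of them background:
tiles = (50_000 // 1_000) ** 2            # 2500 tiles
foreground_tiles = tiles // 2             # 1250 tiles actually processed
minutes_cpp = foreground_tiles * 0.850 / 60   # ~18 min at 850 ms/tile
```

Both results line up with the slide: a bit over 40 GB per image, and roughly 20 minutes per "small" slide even with the fast C++ kernel.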
[Diagram: a whole-slide image is divided into image tiles (40X magnification); classification labels (Label 1, Label 2, background, undetermined) are assigned to produce a classification map. The computational infrastructure consists of heterogeneous computation units: CPUs (C/C++, SSE), Intel Xeon Phi, and GPUs.]
55
Characterizing the GPU/CPU speed-up
56
Kernel             Color conversion  Co-occur. matrices  LBP operator  Histogram
Color channels     Three             Three               One           One
Output results     1Kx1K tile        4x4 matrix          1Kx1K tile    256 bins
Comput. weight     Heavy             Average             Heavy         Low
Operator type      Streaming         Iterative           Streaming     Iterative
Data reuse         None              Strong              Little        Strong
Locality access    None              High                Little        High
Arithm. intensity  Heavy             Low                 Average       Low
Memory access      Low               High                Average       High
GPU speed-up       166.09x           16.75x              85.86x        8.32x
Effect of runtime optimizations
57
[Charts: homogeneous and heterogeneous base-case evaluations; tile recalculation rate is the % of tiles recalculated at higher resolution.]
• ODDS improves performance even in the base case
• Using an additional CPU-only machine is more than 3x faster than the GPU-only version
Cluster Comput (2012) 15:125–144, p. 139
Table 6: Different demand-driven scheduling policies used in Sect. 6

Policy   Area of effect  Sender queue       Receiver queue     Size of request for data buffers
DDFCFS   Intra-filter    Unsorted           Unsorted           Static
DDWRR    Intra-filter    Unsorted           Sorted by speedup  Static
ODDS     Inter-filter    Sorted by speedup  Sorted by speedup  Dynamic
In Table 6 we present three demand-driven policies (where consumer filters only get as much data as they request) used in our evaluation. All these scheduling policies maintain some minimal queue at the receiver side, such that processor idle time is avoided. Simpler policies like round-robin or random do not fit into the demand-driven paradigm, as they simply push data buffers down to the consumer filters without any knowledge of whether the data buffers are being processed efficiently. As such, we do not consider these to be good scheduling methods, and we exclude them from our evaluation.

The First-Come, First-Served (DDFCFS) policy simply maintains FIFO queues of data buffers on both ends of the stream, and a filter instance requesting data will get whatever data buffer is next out of the queue. The DDWRR policy uses the same technique as DDFCFS on the sender side, but sorts its receiver-side queue of data buffers by the relative speedup to give the highest-performing data buffers to each processor. Both DDFCFS and DDWRR have a static value for requests for data buffers during execution, which is chosen by the programmer. For ODDS, discussed in Sect. 5.3, the sender and receiver queues are sorted by speedup and the receiver’s number of requests for data buffers is dynamically calculated at run-time.
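The receiver-side difference between DDFCFS and DDWRR can be sketched in a few lines (a toy model; the buffer names and speedup values are invented for illustration):

```python
import heapq

# Demand-driven receiver queue sketch: DDFCFS hands out buffers in FIFO
# order; DDWRR sorts the receiver-side queue so each processor gets the
# buffer with the highest relative speedup first.
buffers = [("tileA", 16.75), ("tileB", 166.09), ("tileC", 8.32)]

# DDFCFS: first come, first served.
ddfcfs_order = [name for name, _ in buffers]

# DDWRR: max-heap keyed on speedup (negated for Python's min-heap).
heap = [(-speedup, name) for name, speedup in buffers]
heapq.heapify(heap)
ddwrr_order = [heapq.heappop(heap)[1] for _ in range(len(buffers))]
```

Under DDWRR the high-speedup buffer ("tileB" here) is dispatched first, which is the behavior the table attributes to its sorted receiver queue.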
6.5.1 Homogeneous cluster base case
This section presents the results of experiments run in the
Fig. 17 Homogeneous base case evaluation
How about Cloud Computing?
• Cloud Computing
• It is not really “Cloud”; it is someone else’s computer!
• Rent instead of buy.
• Pay for Compute, Data Storage and Transfer.
• Our current best bet to enable sharing of large data, workflows and
computational resources.
• For “most of us” our best bet to achieve scalability and speed.
• Sample reading:
• Nature Reviews Genetics 11, 647-657 (September 2010) | doi:10.1038/nrg2857
• Computational solutions to large-scale data management and analysis
• Eric E. Schadt, Michael D. Linderman, Jon Sorenson, Lawrence Lee, and Garry
P. Nolan
• http://www.nature.com/nrg/multimedia/compsolutions/slideshow.html
• See also: Correspondence by Trelles et al. | Correspondence by Schadt et al.
59
Summary
• How to speed up your application?
• Focus on the common case
• If only 50% can be “improved”, the best you can get is a 2x speedup!
• Pay attention to locality
• Reduce data movement
• Move computation to data
• Take advantage of parallelism
• Multiple types of parallelism: task-, data- and pipelined-parallelism
• The fastest processor does not mean your application will run fast; find the most
suitable architecture.
• GPUs are good for “regular” computations
• GPUs can be up to 10x faster compared to a multi-core CPU; in many real-life
applications, it is usually 3-5x
60
QUESTIONS?
61
 
Available HPC resources at CSUC
Available HPC resources at CSUCAvailable HPC resources at CSUC
Available HPC resources at CSUC
 
Future of hpc
Future of hpcFuture of hpc
Future of hpc
 
Introduction to Parallel and Distributed Computing
Introduction to Parallel and Distributed ComputingIntroduction to Parallel and Distributed Computing
Introduction to Parallel and Distributed Computing
 
Stories About Spark, HPC and Barcelona by Jordi Torres
Stories About Spark, HPC and Barcelona by Jordi TorresStories About Spark, HPC and Barcelona by Jordi Torres
Stories About Spark, HPC and Barcelona by Jordi Torres
 
parallelprogramming-130823023925-phpapp01.pptx
parallelprogramming-130823023925-phpapp01.pptxparallelprogramming-130823023925-phpapp01.pptx
parallelprogramming-130823023925-phpapp01.pptx
 
Lecture 1
Lecture 1Lecture 1
Lecture 1
 
Scientific
Scientific Scientific
Scientific
 
Intro to High Performance Computing in the AWS Cloud
Intro to High Performance Computing in the AWS CloudIntro to High Performance Computing in the AWS Cloud
Intro to High Performance Computing in the AWS Cloud
 
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
 
A Queue Simulation Tool for a High Performance Scientific Computing Center
A Queue Simulation Tool for a High Performance Scientific Computing CenterA Queue Simulation Tool for a High Performance Scientific Computing Center
A Queue Simulation Tool for a High Performance Scientific Computing Center
 
NASA Advanced Computing Environment for Science & Engineering
NASA Advanced Computing Environment for Science & EngineeringNASA Advanced Computing Environment for Science & Engineering
NASA Advanced Computing Environment for Science & Engineering
 
AWS Webcast - An Introduction to High Performance Computing on AWS
AWS Webcast - An Introduction to High Performance Computing on AWSAWS Webcast - An Introduction to High Performance Computing on AWS
AWS Webcast - An Introduction to High Performance Computing on AWS
 
Pdc lecture1
Pdc lecture1Pdc lecture1
Pdc lecture1
 
Chap2 slides
Chap2 slidesChap2 slides
Chap2 slides
 

Plus de inside-BigData.com

Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...inside-BigData.com
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networksinside-BigData.com
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...inside-BigData.com
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...inside-BigData.com
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...inside-BigData.com
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networksinside-BigData.com
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoringinside-BigData.com
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecastsinside-BigData.com
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Updateinside-BigData.com
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19inside-BigData.com
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuninginside-BigData.com
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODinside-BigData.com
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Accelerationinside-BigData.com
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficientlyinside-BigData.com
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Erainside-BigData.com
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computinginside-BigData.com
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Clusterinside-BigData.com
 

Plus de inside-BigData.com (20)

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networks
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Update
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Era
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
 
Overview of HPC Interconnects
Overview of HPC InterconnectsOverview of HPC Interconnects
Overview of HPC Interconnects
 

Dernier

UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Adtran
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding TeamAdam Moalla
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxMatsuo Lab
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXTarek Kalaji
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopBachir Benyammi
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxGDSC PJATK
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?IES VE
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URLRuncy Oommen
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...Aggregage
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024SkyPlanner
 

Dernier (20)

UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
20230104 - machine vision
20230104 - machine vision20230104 - machine vision
20230104 - machine vision
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptx
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBX
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 Workshop
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024
 

Modern Computing: Cloud, Distributed, & High Performance

  • 1. SECTION 3: MODERN COMPUTING: CLOUD, DISTRIBUTED & HIGH PERFORMANCE DR. ÜMIT V. ÇATALYÜREK PROFESSOR AND ASSOCIATE CHAIR Georgia Institute of Technology JANUARY 27, 2017 The Big Data to Knowledge (BD2K) Guide to the Fundamentals of Data Science 1
  • 2. ÜMİT V. ÇATALYÜREK • A Professor in the School of Computational Science & Engineering in the College of Computing at the Georgia Institute of Technology. • A recipient of an NSF CAREER award. • The principal investigator of several awards from the Department of Energy, the National Institutes of Health, & the National Science Foundation. • An Associate Editor for Parallel Computing, & editorial board member for IEEE Transactions on Parallel & Distributed Systems, & the Journal of Parallel & Distributed Computing. • A Fellow of IEEE, member of ACM & SIAM, & the Chair for IEEE TCPP for 2016-2017, & Vice-Chair for ACM SIGBio for the 2015-2018 term. • Main research areas: parallel computing, combinatorial scientific computing & biomedical informatics. • More information about Dr. Ümit V. Çatalyürek can be found at http://cc.gatech.edu/~umit. 2
  • 3. MODERN COMPUTING: CLOUD, DISTRIBUTED & HIGH PERFORMANCE COMPUTING Ümit V. Çatalyürek Professor and Associate Chair School of Computational Science and Engineering Georgia Institute of Technology The BD2K Guide to the Fundamentals of Data Science Series 27 January 2017 3
  • 4. Outline • HPC • What is it? Why? • A Crash Course on (HPC) Computer Architecture • History of Single “Processor” Performance • Taxonomy of Processors, Memory Topology of Parallel Computers • Supercomputers • How to speed up your application? • Focus on the common case • Pay attention to locality • Take advantage of parallelism • An Example Application: Whole-Slide Histopathology Image Analysis for Neuroblastoma • Summary 4
  • 5. What does High Performance Computing (HPC) mean? • There is no such thing as “Low Performance Computing” • “HPC most generally refers to the practice of aggregating computing power in a way that delivers much higher performance than one could get out of a typical desktop computer or workstation in order to solve large problems in science, engineering, or business” (insideHPC) • “HPC allows scientists and engineers to solve complex science, engineering, and business problems using applications that require high bandwidth, enhanced networking, and very high compute capabilities.” (Amazon AWS) • “HPC is the use of parallel processing for running advanced application programs efficiently, reliably and quickly… The term HPC is occasionally used as a synonym for supercomputing.” (SearchEnterpriseLinux/WhatIs.com) 5
  • 6. My Definition of High Performance Computing (HPC) • Efficient use of computing platforms for running application programs quickly. • Why do we care about speed? • We do not want science to wait for computing. • Why do we care about efficiency? • Efficient use of resources means more resources available to all of us :) • Somebody has to pay the bills! • An efficient program will also be very fast! • Supercomputing is HPC, but HPC does not mean just supercomputing • For supercomputers, check top500.org (more later). 6
  • 7. Computing Today • Computing = Parallel Computing = HPC • Any “computer” you touch has parallel processing power: • Your laptop’s CPU has at least 2 cores. • Your cell phone has 4-8 cores! • This is a BD2K Seminar: Data (and hence computational need) is BIG! • So big that it does not fit into your computer. • It takes too long to compute on your computer. • (Figure: growth of GenBank bases and GenBank WGS, in megabases, from Dec 1982 to Jul 2015 on a logarithmic scale. Source: http://www.genome.gov/sequencingcosts/) • Oxford Nanopore MinION MkI 7
  • 8. Outline • HPC • What is it? Why? • A Crash Course on (HPC) Computer Architecture • History of Single “Processor” Performance • Taxonomy of Processors, Memory Topology of Parallel Computers • Supercomputers • How to speed up your application? • Focus on the common case • Pay attention to locality • Take advantage of parallelism • An Example Application: Whole-Slide Histopathology Image Analysis for Neuroblastoma • Summary 8
  • 9. History of Single “Processor” Performance 9 RISC Move to multi-cores
  • 10. Bandwidth and Latency •Bandwidth or throughput • Total work done in a given time • 10,000-25,000X improvement for processors • 300-1200X improvement for memory and disks •Latency or response time • Time between start and completion of an event • 30-80X improvement for processors • 6-8X improvement for memory and disks 10
  • 11. Bandwidth and Latency 11 Log-log plot of bandwidth and latency milestones
  • 12. Flynn’s Taxonomy • SISD (Single Instruction, Single Data): single-threaded process • SIMD (Single Instruction, Multiple Data): vector processing • MISD (Multiple Instruction, Single Data): pipeline architecture • MIMD (Multiple Instruction, Multiple Data): shared-/distributed-memory computing 12
  • 13. SISD • (Figure: a single instruction stream driving one processor over a single stream of data items.) 13
  • 14. SIMD • (Figure: a single instruction stream driving the processor over many data streams D0, D1, D2, …, Dn in lockstep.) 14
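The SISD/SIMD contrast on the two slides above can be sketched in Python with NumPy, whose vectorized expressions apply one operation across a whole array (and, on real hardware, often compile down to SIMD instructions). This is an illustrative sketch, not part of the original slides:

```python
import numpy as np

# The scalar loop and the vectorized expression compute the same result;
# the vectorized form expresses "one instruction, many data".
data = np.arange(8, dtype=np.float64)

# SISD view: one instruction applied to one datum at a time.
scalar_result = np.empty_like(data)
for i in range(len(data)):
    scalar_result[i] = 2.0 * data[i] + 1.0

# SIMD view: one (vector) operation applied across the whole array.
vector_result = 2.0 * data + 1.0

assert np.allclose(scalar_result, vector_result)
```

The point is structural: the vectorized form hands the library (and hardware) an entire data-parallel operation at once instead of one element per instruction.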
  • 15. GPU (SIMD) Advantage 15 Images are from W. Dally’s SC10 Keynote Talk
  • 16. MIMD • (Figure: multiple processors, each with its own instruction stream, operating on separate data streams.) 16
  • 17. Memory Topology: Shared • (Figure: four processors connected to a single shared memory; a.k.a. SMPs.) 17
  • 18. Memory Topology: Distributed • (Figure: four processor-memory pairs connected by a network.) 18
  • 20. Memory Topology: Hybrid + Heterogeneous • (Figure: four nodes connected by a network; each node has a shared memory, two processors, and a GPU.) 20
  • 21. Outline • HPC • What is it? Why? • A Crash Course on (HPC) Computer Architecture • History of Single “Processor” Performance • Taxonomy of Processors, Memory Topology of Parallel Computers • Supercomputers • How to speed up your application? • Focus on the common case • Pay attention to locality • Take advantage of parallelism • An Example Application: Whole-Slide Histopathology Image Analysis for Neuroblastoma • Summary 21
  • 22. Oxen or Chicken Dilemma • “If you were plowing a field, which would you rather use? Two strong oxen or 1024 chickens?” (Seymour Cray) 22
  • 27. Outline • HPC • What is it? Why? • A Crash Course on (HPC) Computer Architecture • History of Single “Processor” Performance • Taxonomy of Processors, Memory Topology of Parallel Computers • Supercomputers • How to speed up your application? • Focus on the common case • Pay attention to locality • Take advantage of parallelism • An Example Application: Whole-Slide Histopathology Image Analysis for Neuroblastoma • Summary 27
  • 28. Amdahl’s Law • Speedup_overall = ExTime_old / ExTime_new = 1 / ((1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced) • ExTime_new = ExTime_old × ((1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced) • Best you could ever hope to do: Speedup_maximum = 1 / (1 − Fraction_enhanced) 28
  • 29. Amdahl’s Law Example • Sequence analysis pipeline has a “slow” step which does error correction of the input reads • New CPU is 10X faster • I/O-bound server, so 60% of the time is spent waiting for I/O • Speedup_overall = 1 / ((1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced) = 1 / ((1 − 0.4) + 0.4 / 10) = 1 / 0.64 = 1.56 • Apparently, it’s human nature to be attracted by “10X faster,” vs. keeping in perspective that it’s just 1.6X faster 29
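The arithmetic on the slide above can be checked with a few lines of Python; `amdahl_speedup` is a hypothetical helper name for illustration, not something from the slides:

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Amdahl's law: overall speedup when only fraction_enhanced of the
    execution time is accelerated by a factor of speedup_enhanced."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# The slide's example: a 10X faster CPU helps only the 40% of time
# that is not spent waiting for I/O.
overall = amdahl_speedup(0.4, 10)
print(f"{overall:.2f}")  # 1.56
```

Note how insensitive the result is to the enhancement factor: a 100X faster CPU would still yield only 1 / 0.604 ≈ 1.66X overall.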
  • 31. CLUSTAL W • Based on Higgins & Sharp CLUSTAL [Gene88] • Progressive alignment-based strategy • Pairwise Alignment (O(n²l²)): a distance matrix is computed using either an approximate method (fast) or dynamic programming (more accurate, slower) • Computation of Guide Tree (O(n³)): a phylogenetic tree computed from the distance matrix by iteratively selecting aligned pairs and linking them • Progressive Alignment (O(nl²)): a series of pairwise alignments computed using full dynamic programming to align larger and larger groups of sequences; the order in the guide tree determines the ordering of sequence alignments; at each step, either two sequences are aligned, a new sequence is aligned with a group, or two groups are aligned • n: number of sequences in the query • l: average sequence length 31
  • 32. Speeding up CLUSTAL W • (Figure: breakdown of CLUSTAL W execution time on a PIII-650MHz for 25 to 1,000 GPCR sequences; the pairwise-alignment stage dominates as the number of sequences grows.) • By parallelizing the most time-consuming part: pairwise alignment • (Figure: speedup of the parallelized version of CLUSTAL W on 1–8 processors; the pairwise-alignment stage scales nearly linearly, while total speedup lags the ideal.) 32
  • 33. More on Amdahl’s Law • (Figure: speedup versus number of processors, up to 9,000, for serial fractions of 10%, 5%, 2%, 1%, 0.5%, and 0.1%; each curve plateaus at 1/serial-fraction.) 33
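The curves on the slide above follow directly from Amdahl's law: with serial fraction f, the speedup on p processors is 1 / (f + (1 − f)/p), which can never exceed 1/f no matter how many processors are added. A small sketch (`parallel_speedup` is an illustrative name):

```python
def parallel_speedup(serial_fraction, processors):
    """Amdahl's law with the parallelizable part split over `processors`."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)

# Asymptotic limits for the serial fractions plotted on the slide.
for f in (0.10, 0.05, 0.02, 0.01, 0.005, 0.001):
    s = parallel_speedup(f, 9000)
    print(f"serial {f:6.2%}: speedup on 9,000 procs = {s:7.1f}, limit = {1.0 / f:6.0f}")
```

Even a 1% serial fraction caps speedup at 100X, which is why thousands of processors pay off only for applications that are almost entirely parallel.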
  • 34. Outline • HPC • What is it? Why? • A Crash Course on (HPC) Computer Architecture • History of Single “Processor” Performance • Taxonomy of Processors, Memory Topology of Parallel Computers • Supercomputers • How to speed up your application? • Focus on the common case • Pay attention to locality • Take advantage of parallelism • An Example Application: Whole-Slide Histopathology Image Analysis for Neuroblastoma • Summary 34
  • 35. Levels of the Memory Hierarchy (capacity, access time, cost) • CPU registers: 100s of bytes, 300–500 ps (0.3–0.5 ns) • L1 and L2 cache: 10s–100s of KB, ~1 ns to ~10 ns, ~$1000s/GB • Main memory: GBs, 80–200 ns, ~$100/GB • Disk: 10s of TB, ~10 ms (10,000,000 ns), ~$1/GB • Tape: “infinite” capacity, seconds to minutes, ~$1/GB • Data moves between adjacent levels as instruction operands, cache blocks, pages, and files; upper levels are faster, lower levels are larger. 35
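A common way to see the hierarchy's effect is array traversal order: a row-major sweep touches contiguous memory and reuses cache lines, while a column-major sweep strides across them. The sketch below is illustrative only; in pure Python the gap is far smaller than it would be in C or NumPy, where memory layout dominates:

```python
import time

# Locality sketch: sum a 2-D array in two traversal orders.
N = 600
matrix = [[i * N + j for j in range(N)] for i in range(N)]

def sum_row_major(m):
    total = 0
    for row in m:          # inner loop walks one row contiguously
        for x in row:
            total += x
    return total

def sum_col_major(m):
    total = 0
    n = len(m)
    for j in range(n):     # strided access: one element from each row
        for i in range(n):
            total += m[i][j]
    return total

t0 = time.perf_counter(); r = sum_row_major(matrix); t1 = time.perf_counter()
c = sum_col_major(matrix); t2 = time.perf_counter()
assert r == c  # same answer; the cache-friendly order is typically faster
print(f"row-major {t1 - t0:.3f}s  col-major {t2 - t1:.3f}s")
```

The same traversal-order effect is why "pay attention to locality" appears alongside parallelism in the outline: both orders do identical arithmetic, yet only one matches how data is laid out in memory.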
  • 36. Locality-Aware Remote Visualization • Scientific and clinical research generate multi-GB to multi-TB of spatially and temporally correlated data • Different spatial and temporal resolutions • Different acquisition modalities, from CT to light microscopy to electron micrography • Example applications: Visible Human, mouse BIRN • DataCutter streams data to an MPI-based OSC parallel renderer • Setup • Full-color Visible Woman dataset • Super-sampled at 2x for the entire dataset, 4x and 8x for regions of the dataset • Data stored on 20 nodes • 8 rendering nodes and 1 compositing node with texture VR • Remote thin client connected over the internet
  • 39. Implementation of OSC Parallel Renderer
  • 40. Implementation of OSC Parallel Renderer
  • 41. Outline • HPC • What is it? Why? • A Crash Course on (HPC) Computer Architecture • History of Single “Processor” Performance • Taxonomy of Processors, Memory Topology of Parallel Computers • Supercomputers • How to speed up your application? • Focus on the common case • Pay attention to locality • Take advantage of parallelism • An Example Application: Whole-Slide Histopathology Image Analysis for Neuroblastoma • Summary 41
  • 42. Current and Emerging Scientific Applications 42 • Processing remotely-sensed data (NOAA TIROS-N with the AVHRR sensor; AVHRR Level 1 data): • As the TIROS-N satellite orbits, the Advanced Very High Resolution Radiometer (AVHRR) sensor scans perpendicular to the satellite’s track. • At regular intervals along a scan line, measurements are gathered to form an instantaneous field of view (IFOV); one scan line is 409 IFOVs. • Scan lines are aggregated into Level 1 data sets; a single file of Global Area Coverage (GAC) data represents ~one full earth orbit, ~110 minutes, ~40 megabytes, ~15,000 scan lines. • Other examples: satellite data processing, DCE-MRI analysis, short sequence mapping, quantum chemistry, image processing, multimedia, video surveillance, Montage
  • 43. Application Patterns • Complex and diverse processing structures 43 • Data analysis applications: the bag-of-tasks model (independent tasks, each with an input file), e.g., the remotely-sensed data processing above
  • 44. Application Patterns • Complex and diverse processing structures 44 • Data analysis applications: bag-of-tasks applications and non-streaming workflows (composed of sequential or parallel tasks)
  • 45. Application Patterns • Complex and diverse processing structures 45 • Data analysis applications: bag-of-tasks applications, non-streaming workflows, and streaming workflows (composed of sequential or parallel tasks)
  • 46. Taxonomy of Parallelism • Complex and diverse processing structures • Varied parallelism 46 • Bag-of-tasks applications: independent sequential tasks distributed across processors (P1-P4), i.e., task-parallelism
  • 47. Application Patterns • Complex and diverse processing structures • Varied parallelism •Bag-of-tasks applications: task-parallelism 47
  • 48. Taxonomy of Parallelism • Complex and diverse processing structures • Varied parallelism 48 • Non-streaming workflows: sequential or parallel tasks across processors (P1-P4), combining task-parallelism and data-parallelism
  • 49. Taxonomy of Parallelism • Complex and diverse processing structures • Varied parallelism •Bag-of-tasks: task-parallelism •Non-streaming workflows: task- and data-parallelism 49
  • 50. Taxonomy of Parallelism • Complex and diverse processing structures • Varied parallelism 50 • Streaming workflows: sequential or parallel tasks across processors (P1-P4), combining task-, data- and pipelined-parallelism
  • 51. Taxonomy of Parallelism • Complex and diverse processing structures • Varied parallelism •Bag-of-tasks: task-parallelism •Non-streaming workflows: task- and data-parallelism •Streaming workflows: task-, data- and pipelined- parallelism 51
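The three kinds of parallelism in this taxonomy can be illustrated with a short Python sketch (illustrative only, not from the talk; note that Python threads share one interpreter lock, so real CPU-bound speedups need processes, MPI, or a compiled language):

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

# Task-parallelism (bag-of-tasks): independent tasks, any worker, any order.
def analyze(tile):
    return sum(tile)  # stand-in for a real per-tile kernel

tiles = [list(range(i, i + 4)) for i in range(0, 16, 4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    task_results = list(pool.map(analyze, tiles))

# Data-parallelism: one input partitioned into chunks, the same kernel on
# each chunk, partial results combined at the end.
data = list(range(1000))
chunks = [data[i:i + 250] for i in range(0, 1000, 250)]
with ThreadPoolExecutor(max_workers=4) as pool:
    data_result = sum(pool.map(sum, chunks))

# Pipelined-parallelism: two stages connected by a queue; while stage 2
# classifies item k, stage 1 can already be decoding item k+1.
q = queue.Queue()
results = []

def decode(items):              # stage 1
    for x in items:
        q.put(x * 2)
    q.put(None)                 # end-of-stream marker

def classify():                 # stage 2
    while (x := q.get()) is not None:
        results.append("big" if x >= 10 else "small")

t1 = threading.Thread(target=decode, args=(range(10),))
t2 = threading.Thread(target=classify)
t1.start(); t2.start(); t1.join(); t2.join()
print(task_results, data_result, results[:3])
```

A streaming workflow combines all three: many pipelines (task-parallelism), each stage possibly parallel over its data (data-parallelism), stages overlapped in time (pipelined-parallelism).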
  • 52. Outline • HPC • What is it? Why? • A Crash Course on (HPC) Computer Architecture • History of Single “Processor” Performance • Taxonomy of Processors, Memory Topology of Parallel Computers • Supercomputers • How to speed up your application? • Focus on the common case • Pay attention to locality • Take advantage of parallelism • An Example Application: Whole-Slide Histopathology Image Analysis for Neuroblastoma • Summary 52
  • 53. An Example Application: Whole-Slide Histopathology Image Analysis for Neuroblastoma • Classify biopsy tissue images into different subtypes of prognostic significance • Very high-resolution slides • Divided into smaller tiles • Multi-resolution image analysis • Mimics the way pathologists perform their analysis • If classification at a lower resolution is not satisfactory, the analysis algorithm is executed at higher resolution(s); hence the dynamic workload. 53
  • 54. Why do we need HPC? • Due to the large sizes of whole-slide images • A 120K x 120K image digitized at 40x occupies more than 40 GB. • The processing time on a single CPU • For an image tile of 1K x 1K is ≈6 secs w/ Matlab, 850 msecs w/ C++ • For a “small” 50K x 50K slide (assuming 50% background), ≈20 min. • In algorithm development • Algorithm development in Matlab • Requires evaluation of many different techniques, parameters, etc. • In clinical practice, 8-9 biopsy samples are collected per patient. For an average of 500 neuroblastoma patients treated annually, our biomedical image analysis consumes: • On a CPU: 24 months using Matlab and 3.4 months using C++. • Can we reduce this to a couple of days, or even hours? 54
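The per-slide figures above follow from simple arithmetic; this sketch uses the slide's assumed numbers (1K x 1K tiles, ~50% of tiles skipped as background, ~6 s/tile in Matlab, 850 ms/tile in C++):

```python
# Back-of-envelope check of the slide's per-slide processing-time claim.
tiles_total = 50 * 50               # a 50K x 50K slide cut into 1K x 1K tiles
tiles_processed = tiles_total // 2  # ~50% of tiles are background and skipped
secs_cpp = tiles_processed * 0.85   # 850 ms per tile in C++
secs_matlab = tiles_processed * 6.0 # ~6 s per tile in Matlab
print(f"C++:    {secs_cpp / 60:.1f} min per slide")     # ~18 min, i.e. the slide's "≈20 min"
print(f"Matlab: {secs_matlab / 60:.1f} min per slide")  # about 2 hours
```

Multiplying by thousands of slides per year is what pushes the total into months, and why parallel hardware is needed.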
  • 55. Computational infrastructure • Whole-slide image → image tiles (40X magnification) → assign classification labels (Label 1, Label 2, background, undetermined) → classification map • Computation units: CPU (C/C++), CPU SSE, GPU, Intel Xeon Phi, ... 55
  • 56. Characterizing the GPU/CPU speed-up 56

  | | Color conversion | Co-occurrence matrices | LBP operator | Histogram |
  | Color channels | Three | Three | One | One |
  | Output results | 1Kx1K tile | 4x4 matrix | 1Kx1K tile | 256 bins |
  | Computational weight | Heavy | Average | Heavy | Low |
  | Operator type | Streaming | Iterative | Streaming | Iterative |
  | Data reuse | None | Strong | Little | Strong |
  | Locality of access | None | High | Little | High |
  | Arithmetic intensity | Heavy | Low | Average | Low |
  | Memory access | Low | High | Average | High |
  | GPU speed-up | 166.09x | 16.75x | 85.86x | 8.32x |
  • 57. Effect of runtime optimizations 57 • Homogeneous base case vs. heterogeneous base case • Tile recalculation rate: % of tiles recalculated at higher resolution • ODDS improves performance even in the base case • Using an additional CPU-only machine is more than 3x faster than the GPU-only version

  From Cluster Comput (2012) 15:125-144, p. 139, Table 6 (demand-driven scheduling policies used in Sect. 6):

  | Policy | Area of effect | Sender queue | Receiver queue | Size of request for data buffers |
  | DDFCFS | Intra-filter | Unsorted | Unsorted | Static |
  | DDWRR | Intra-filter | Unsorted | Sorted by speedup | Static |
  | ODDS | Inter-filter | Sorted by speedup | Sorted by speedup | Dynamic |

  All three demand-driven policies (consumer filters only get as much data as they request) maintain a minimal queue on the receiver side so that processor idle time is avoided. Simpler policies like round-robin or random do not fit the demand-driven paradigm, since they push data buffers to consumer filters without knowing whether the buffers are being processed efficiently, so they are excluded from the evaluation. First-Come, First-Served (DDFCFS) keeps FIFO queues of data buffers on both ends of the stream: a filter instance requesting data gets whatever buffer is next in the queue. DDWRR uses the same technique on the sender side but sorts its receiver-side queue by relative speedup, giving the highest-performing data buffers to each processor. Both DDFCFS and DDWRR use a static, programmer-chosen request size for data buffers. For ODDS, the sender and receiver queues are sorted by speedup and the receiver's number of requests for data buffers is computed dynamically at run time. (Sect. 6.5.1 and Fig. 17 of the paper evaluate the homogeneous cluster base case.)
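The speedup-sorted queues behind DDWRR and ODDS can be sketched in a few lines of Python (an illustrative toy, not the DataCutter implementation; the buffer names and speedup values are made up, loosely echoing the kernel speedups on the previous slide):

```python
import heapq

class SpeedupQueue:
    """Dispatch buffers by estimated GPU-vs-CPU speedup: a GPU that asks for
    work gets the buffer it accelerates most; a CPU gets the buffer the GPU
    would help least, so the GPU's time is never wasted on low-gain work."""

    def __init__(self):
        self._heap = []  # max-heap on speedup via negated keys

    def put(self, buf, speedup):
        heapq.heappush(self._heap, (-speedup, buf))

    def get_for(self, device):
        if device == "gpu":                 # best GPU candidate: highest speedup
            return heapq.heappop(self._heap)[1]
        # CPU: linearly find the entry with the lowest speedup, then re-heapify
        i = min(range(len(self._heap)), key=lambda k: -self._heap[k][0])
        entry = self._heap.pop(i)
        heapq.heapify(self._heap)
        return entry[1]

q = SpeedupQueue()
q.put("color-conversion tile", 166.1)
q.put("histogram tile", 8.3)
q.put("LBP tile", 85.9)

print(q.get_for("gpu"))  # color-conversion tile
print(q.get_for("cpu"))  # histogram tile
```

ODDS additionally sorts the sender side and sizes each receiver's request dynamically at run time; this sketch shows only the receiver-side sorting idea.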
  • 58. Outline • HPC • What is it? Why? • A Crash Course on (HPC) Computer Architecture • History of Single “Processor” Performance • Taxonomy of Processors, Memory Topology of Parallel Computers • Supercomputers • How to speed up your application? • Focus on the common case • Pay attention to locality • Take advantage of parallelism • An Example Application: Whole-Slide Histopathology Image Analysis for Neuroblastoma • Summary 58
  • 59. How about Cloud Computing? • Cloud Computing • It is not really “Cloud”; it is someone else’s computer! • Rent instead of buy. • Pay for Compute, Data Storage and Transfer. • Our current best bet to enable sharing of large data, workflows and computational resources. • For “most of us” our best bet to achieve scalability and speed. • Sample reading: • Nature Reviews Genetics 11, 647-657 (September 2010) | doi:10.1038/nrg2857 • Computational solutions to large-scale data management and analysis • Eric E. Schadt, Michael D. Linderman, Jon Sorenson, Lawrence Lee, and Garry P. Nolan • http://www.nature.com/nrg/multimedia/compsolutions/slideshow.html • See also: Correspondence by Trelles et al. | Correspondence by Schadt et al. 59
  • 60. Summary • How to speed up your application? • Focus on the common case • If only 50% of the runtime can be improved, the best you can get is a 2x speedup! • Pay attention to locality • Reduce data movement • Move computation to data • Take advantage of parallelism • Multiple types of parallelism: task-, data- and pipelined-parallelism • The fastest processor does not mean your application will run fast; find the most suitable architecture. • GPUs are good for “regular” computations • GPUs can be up to 10x faster than a multi-core CPU; in many real-life applications the gain is usually 3-5x 60
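The "only 50% improvable means at most 2x" remark is Amdahl's law: if a fraction f of the runtime benefits from a speedup s, the overall speedup is 1 / ((1 - f) + f / s). A minimal sketch:

```python
def amdahl(f, s):
    """Overall speedup when fraction f of runtime is accelerated by factor s."""
    return 1.0 / ((1.0 - f) + f / s)

print(amdahl(0.5, 10))    # ~1.82x: a 10x kernel speedup on half the runtime
print(amdahl(0.5, 1e9))   # approaches, but never reaches, the 2x ceiling
print(amdahl(0.95, 100))  # ~16.8x: focusing on a 95% common case pays off
```

This is why "focus on the common case" comes first in the list above: the improvable fraction f caps the payoff of every other optimization.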