3. FORMLESS
• FORMLESS: Scalable Utilization of Embedded
Manycores in Streaming Applications
[LCTES’12]
– Functionally-cOnsistent stRucturally-MalLEable
Streaming Specification
– Actor-oriented specification models
– Space exploration scheme
• to customize the application specification to better fit
the target platform.
6. Dynamic Load Balancing
• A Distributed and Adaptive Dynamic Load
Balancing Scheme for Parallel Processing of
Medium-Grain Tasks
[IEEE Journal, 1990]
– Challenge: Allocate and distribute tasks
dynamically with minimal run-time overhead.
– Design: A distributed and adaptive load balancing
scheme for medium-grain tasks
7. Dynamic Load Balancing (cont.)
• Key idea 1: Neighborhood average strategy
– Attempts to balance load within a neighborhood
by distributing tasks
• such that all neighbors have loads close to the
neighborhood average.
– The decision of when to balance load is based on
neighborhood state information, which is checked
periodically.
• Each processor maintains status information of all its
neighbors.
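The neighborhood-average idea on this slide can be sketched as a single balancing step; this is an illustrative simplification, not the paper's actual protocol, and the function name and integer-task model are assumptions:

```python
# Hedged sketch of one neighborhood-average balancing step: a processor
# computes the average load over itself and its neighbors, then ships its
# surplus tasks to neighbors that are below that average.
def balance_step(local_load, neighbor_loads):
    """Return how many tasks to send to each neighbor so that loads
    move toward the neighborhood average (illustrative model)."""
    loads = [local_load] + list(neighbor_loads)
    avg = sum(loads) / len(loads)          # neighborhood average
    surplus = local_load - avg             # tasks we can afford to give away
    transfers = []
    for load in neighbor_loads:
        if surplus <= 0:
            transfers.append(0)
            continue
        deficit = max(avg - load, 0)       # how far this neighbor is below avg
        send = int(min(surplus, deficit))
        transfers.append(send)
        surplus -= send
    return transfers
```

For example, a processor with 10 tasks and neighbors holding 2 and 6 (average 6) would send 4 tasks to the first neighbor and none to the second, leaving all three at the average.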
8. Dynamic Load Balancing (cont.)
• Key idea 2: Grain Size Control
– If the cost of making work available to another
processor exceeds the cost of executing it on the
local processor, it does not make sense to keep
decomposing and parallelizing the work; there is a
minimum useful grain size.
– Granularity control: determine when to stop
breaking a computation down into parallel
subcomputations; at that frontier, a node is
treated as a leaf and executed sequentially.
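The grain-size rule above can be shown as a small recursive sketch; the cost model, the halving split, and the function names are all illustrative assumptions, not the paper's formulation:

```python
# Illustrative grain-size control: keep splitting a task only while the
# cost of running it locally exceeds the cost of shipping it elsewhere.
def should_parallelize(task_cost, transfer_cost):
    """Split only if local execution costs more than making the work
    available to another processor (assumed cost model)."""
    return task_cost > transfer_cost

def process(task_cost, transfer_cost):
    """Return the leaf tasks that would be executed sequentially."""
    if should_parallelize(task_cost, transfer_cost):
        half = task_cost / 2               # assumed: split work in half
        return (process(half, transfer_cost) +
                process(half, transfer_cost))
    return [task_cost]                     # frontier node: run as a leaf
```

With a transfer cost of 3, a task of cost 8 is split twice into four leaves of cost 2, at which point further decomposition no longer pays off.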
9. Adaptive Load Balancing
• Compiler and Run-Time Support for Adaptive
Load Balancing in Software Distributed Shared
Memory Systems
[1998]
– Use information provided by the compiler to help
the run-time system distribute the work of
parallel loops
• in proportion to the relative power of the processors
• while minimizing communication and page sharing
10. Adaptive Load Balancing (cont.)
• Compile-Time Support for Load Balancing
– The compiler is built on the SUIF system, which is
organized as a set of compiler passes.
– The SUIF pass extracts the shared data access
patterns in each of the SPMD regions, and feeds
this information to the run-time system.
• The pass is also responsible for adding hooks in the
parallelized code that allow the run-time library to
change the load distribution.
--------
SUIF: Stanford University Intermediate Format
SPMD: Single-Program Multiple-Data
11. Adaptive Load Balancing (cont.)
– Access pattern extraction
• SUIF pass walks through the program looking for
accesses to shared memory.
– Prefetching
• Use the access pattern information to prefetch data
through prefetching calls.
– Load balancing interface and strategy
• The compiler can direct the run-time to choose
between two partitioning strategies for distributing the
parallel loops.
1. Shifting of loop boundaries
2. Multiple loop bounds
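The two partitioning strategies named above can be sketched as follows; the function names, the weight-based sizing, and the round-robin chunking are assumptions for illustration, not the paper's exact interface:

```python
# Strategy 1 (shifting of loop boundaries): each processor gets one
# contiguous chunk of iterations, sized by its relative power.
def shift_boundaries(n_iters, weights):
    """weights[i] = assumed relative power of processor i; returns
    a (start, end) half-open range per processor."""
    total = sum(weights)
    bounds, start = [], 0
    for i, w in enumerate(weights):
        # Last processor absorbs any rounding remainder.
        end = n_iters if i == len(weights) - 1 \
              else start + round(n_iters * w / total)
        bounds.append((start, end))
        start = end
    return bounds

# Strategy 2 (multiple loop bounds): each processor owns several
# disjoint ranges, allowing finer-grained redistribution.
def multiple_bounds(n_iters, n_procs, chunk):
    """Round-robin fixed-size chunks over the processors."""
    ranges = [[] for _ in range(n_procs)]
    for start in range(0, n_iters, chunk):
        p = (start // chunk) % n_procs
        ranges[p].append((start, min(start + chunk, n_iters)))
    return ranges
```

Shifting boundaries keeps each processor's data contiguous (good for locality and page sharing), while multiple bounds trades some locality for more flexible load adjustment.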
12. Adaptive Load Balancing (cont.)
• Run-Time Load Balancing Support
– The run-time library is responsible for keeping
track of the progress of each process
• collect statistics about the execution time of each
parallel task, and
• adjust the load accordingly
– Load balancing vs. Locality management
• need to avoid unnecessary movement of data and
minimize page sharing
• Locality-conscious load balancing: the run-time library
uses the information supplied by the compiler about
what loop distribution strategy to use.
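The run-time adjustment loop described above can be sketched as recomputing per-processor weights from measured task times; the inverse-time model and the function name are assumptions, not the paper's actual library:

```python
# Illustrative run-time adjustment: derive new processor weights from
# the measured execution time of each processor's last share, so that
# faster processors receive more iterations in the next distribution.
def adjust_weights(exec_times):
    """exec_times[i] = time processor i took for its last equal-sized
    share; weight is proportional to observed speed (1 / time)."""
    speeds = [1.0 / t for t in exec_times]
    total = sum(speeds)
    return [s / total for s in speeds]
```

A processor that ran three times slower than its peer would, under this sketch, receive a quarter of the next loop's iterations instead of half.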
13. Algorithms for Scheduling
• Scheduling Malleable Parallel Tasks: An
Asymptotic Fully Polynomial-Time
Approximation Scheme [2002]
• Mapping and Scheduling Heterogeneous Tasks
using Genetic Algorithms [1995]
Editor's notes
Design space exploration for platform-driven instantiation of a FORMLESS specification.
FORMLESS specification of the sort example: A) Actor specifications. B-D) Example instantiations.
The scheme attempts to balance load within a neighborhood by distributing tasks such that all neighbors have loads close to the neighborhood average.
In terms of processing time, the average grain size is defined as (Total Sequential Execution Time / Total Number of Messages Processed).
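A quick worked instance of that definition (the numbers are made up for illustration):

```python
# Average grain size = total sequential execution time / messages processed.
total_sequential_time = 12.0   # seconds (illustrative value)
messages_processed = 48        # illustrative value
avg_grain_size = total_sequential_time / messages_processed  # 0.25 s/message
```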
The goal is to minimize execution time by considering both communication and the computation components.