Composing and Executing Parallel Data-flow Graphs with Shell Pipes
Edward Walker (TACC)
Weijia Xu (TACC)
Vinoth Chandar (Oracle Corp)
Agenda
• Motivation

• Shell language extensions

• Implementation

• Experimental evaluation

• Conclusions
Motivation
• Distributed memory clusters are becoming pervasive in
  industry and academia

• Shells are the default login environment on these
  systems

• Shell pipes are commonly used for composing extensible
  Unix commands.

• There has been no change to the syntax/semantics of
  shell pipes since their invention over 30 years ago.

• Growing need to compose massively parallel jobs
  quickly, using existing software
Extending Shells for Parallel Computing
•   Build a simple, powerful coordination layer at the Shell


•   The coordination layer transparently manages the
    parallelism in the workflow


•   User specifies parallel computation as a dataflow graph
    using extensions to the Shell


•   Provides the ability to combine different tools and build
    interesting parallel programs quickly.
Shell pipe extensions
• Pipeline fork
     A | B on n procs
• Pipeline join
     A on n procs | B
• Pipeline cycles
     (++ n A)
• Pipeline key-value aggregation
     A | B on keys
Parallel shell tasks extensions
> function foo()
{
   echo "hello world"
}

> foo on all procs          # foo() on all CPUs

> foo on all nodes          # foo() on all nodes

> foo on 10:2 procs         # 10 tasks, 2 tasks on each node (stride = 2)

> foo on 10:2:2 procs       # 10 tasks, 2 tasks on alternate nodes (span = 2)
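
The stride and span fields above can be read as a task-to-node placement rule. The sketch below is one possible reading, not the authors' implementation: it assumes tasks fill `stride` slots per node and that used nodes are `span` apart (span 1 when omitted). The function name `task_node` is hypothetical.

```shell
# task_node TASK STRIDE [SPAN]: index of the node that would run TASK,
# assuming STRIDE tasks per node and SPAN nodes between used nodes.
# (Assumed semantics; the slides do not spell out the exact rule.)
task_node() {
  local task="$1" stride="$2" span="${3:-1}"
  echo $(( (task / stride) * span ))
}

# Placement for "foo on 10:2:2 procs" under the assumption above.
t=0
while [ "$t" -lt 10 ]; do
  echo "task $t -> node $(task_node "$t" 2 2)"
  t=$((t + 1))
done
```

Under this reading, `10:2` would use nodes 0 through 4, while `10:2:2` would use nodes 0, 2, 4, 6 and 8.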
Composing data-flow graphs
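• Example 1: A feeds two parallel branches, B1 and B2, whose outputs join in C.

 function B1() { :; }    # placeholder body
 function B2() { :; }    # placeholder body

 function B()
 {
   if (( $_ASPECT_TASKID == 0 )) ; then
       B1
   else
       B2
   fi
 }

 A | B on 2 procs | C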
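
For readers without the extended shell, Example 1 can be imitated on one machine with plain shell. This is a rough sketch only: it assumes every task reads a full copy of A's output (as the `cat stdin_file | B` command files in the implementation slides suggest). `fork_on` is a hypothetical helper, and the bodies of `B1` and `B2` are invented for illustration.

```shell
B1() { sed 's/^/B1: /'; }   # hypothetical branch bodies, for illustration
B2() { sed 's/^/B2: /'; }

B() {
  if [ "$_ASPECT_TASKID" -eq 0 ]; then B1; else B2; fi
}

# fork_on N CMD: save stdin to a file, run N copies of CMD on it in
# parallel (task ids 0..N-1), then print the outputs in task order.
fork_on() {
  local n="$1" cmd="$2" dir t=0
  dir=$(mktemp -d) || return 1
  cat > "$dir/stdin_file"
  while [ "$t" -lt "$n" ]; do
    _ASPECT_TASKID=$t "$cmd" < "$dir/stdin_file" > "$dir/out.$t" &
    t=$((t + 1))
  done
  wait                      # block until every task has finished
  t=0
  while [ "$t" -lt "$n" ]; do
    cat "$dir/out.$t"
    t=$((t + 1))
  done
  rm -rf "$dir"
}

# Imitates "A | B on 2 procs | C" with printf as A and sort as C.
printf 'x\ny\n' | fork_on 2 B | sort
```

Writing each task's output to its own file keeps lines from different tasks from interleaving; the real system marshals results through I/O multiplexors instead.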
Composing data-flow graphs
• Example 2: map tasks emit key-value tuples into a key-value DHT, and reduce tasks consume them by key.

  function map()
  {
    emit_tuple -k key -v value
  }

  function reduce()
  {
    consume_tuple -k key -v value

    num=${#value[@]}
    for ((i=0; i < $num; i++)) ; do
        # process key=$key, value=${value[$i]}
        :
    done
  }

  map on all procs | reduce on keys
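
The grouping that `reduce on keys` implies can be approximated on a single machine by sorting tuples by key and calling a reduce function once per key. The sketch below is a local stand-in, not the DHT-based mechanism of the slides. `group_by_key` and `count_values` are hypothetical names, and tuples are assumed to arrive as tab-separated `key<TAB>value` lines.

```shell
# A sample reduce step: print each key with the number of values it received.
count_values() {
  local key="$1"
  shift
  printf '%s\t%d\n' "$key" "$#"
}

# group_by_key REDUCE_FN: read "key<TAB>value" lines on stdin, group them by
# key, and call REDUCE_FN once per key as: REDUCE_FN key value1 value2 ...
group_by_key() {
  local reduce_fn="$1" tab current="" key value
  tab=$(printf '\t')
  LC_ALL=C sort -t "$tab" -k 1,1 | {
    set --                      # "$@" accumulates the current key's values
    while IFS="$tab" read -r key value; do
      if [ "$#" -gt 0 ] && [ "$key" != "$current" ]; then
        "$reduce_fn" "$current" "$@"
        set --
      fi
      current="$key"
      set -- "$@" "$value"
    done
    if [ "$#" -gt 0 ]; then
      "$reduce_fn" "$current" "$@"
    fi
  }
}

# Three tuples over two keys, reduced to one line per key.
printf 'b\t1\na\t2\nb\t3\n' | group_by_key count_values
```

Sorting first means each key's values are contiguous, so the reduce function sees all of them in a single call, much as `reduce()` above sees them in the `value` array.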
BASH Implementation
Startup Overlay
• Script may have many instances requiring
  startup of parallel tasks
• Motivation for overlay:
  – Fast startup of parallel shell workers
  – Handles node failures gracefully
• Two level hierarchy: sectors and proxies
• Overlay node addressing: the compute node ID (bits 7 to 0) is split into a
  sector id (high bits) and a proxy id (low bits)
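
Splitting a node ID into sector and proxy fields is a shift and a mask. The slides do not give the field widths, so the sketch below assumes a 2-bit proxy id (`PROXY_BITS` is a made-up parameter), which happens to match a layout with nodes 0 to 3 in sector 0 and nodes 4 to 7 in sector 1.

```shell
PROXY_BITS=2   # assumption for illustration: low 2 bits hold the proxy id

# High bits of the 8-bit compute node ID.
sector_id() { echo $(( ($1 & 255) >> PROXY_BITS )); }

# Low PROXY_BITS bits of the compute node ID.
proxy_id()  { echo $(( $1 & ((1 << PROXY_BITS) - 1) )); }

for node in 0 3 4 7; do
  echo "node $node -> sector $(sector_id "$node"), proxy $(proxy_id "$node")"
done
```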
Fault-Tolerance

• Proxy nodes monitor peers within sector, and
  sector heads monitor peer sectors
• Node 0 maintains a list of available nodes in the
  overlay in a master_node file
[Figure: overlay sector 0 (nodes 0 to 3) and overlay sector 1 (nodes 4 to 7); each node runs a proxy and an exec process; master_node file.]
Starting shell workers with startup overlay
1. Bash spawns agent.
2. Agent queries master_node and spawns node I/O multiplexor.
[Figure: overlay diagram with BASH, agent, master_node and node I/O MUX; steps (1) and (2) marked.]
3. Agent invokes overlay to spawn CPU I/O multiplexor on node.
[Figure: overlay diagram with BASH, agent, node I/O MUX and CPU I/O MUX; steps (1) to (3) marked.]
4. CPU I/O multiplexor spawns a shell worker per CPU on node.
[Figure: overlay diagram with BASH, agent, node I/O MUX, CPU I/O MUX and per-CPU workers; steps (1) to (4) marked.]
5. CPU I/O multiplexor calls back to node I/O multiplexor.
[Figure: overlay diagram with BASH, agent, node I/O MUX, CPU I/O MUX and per-CPU workers; steps (1) to (5) marked.]
Implementation of pipeline fork
1. Process B pipes stdin into stdin_file
[Figure: A | B on N procs. Components: BASH, A (stdout), pipe, aspect-agent B with stdin reader, stdin_file; step (1) marked.]
2. Constructs command files for each task
[Figure: A | B on N procs. The Cmd dispatcher in aspect-agent B writes Cmd files for B, each containing "cat stdin_file | B"; steps (1) and (2) marked.]
3., 4. and 5. Execute command files in shell workers and marshal results back to shell
[Figure: A | B on N procs. Components: aspect-agent B (stdin reader, Cmd dispatcher, flushers, I/O MUX with stdout and control), queue of Cmd files, Node MUXes, compute node with shell workers running B; steps (1) to (5) marked.]
6. Replay command files on failure
[Figure: same components as the previous figure, plus a replayer (6) that runs B in shell workers on the local compute node.]
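
Replaying a command file amounts to re-running it when it fails. The sketch below is a minimal local illustration, not the replayer from the slides: `run_with_replay` is a hypothetical helper, the retry limit is an invented parameter, and failure is assumed to mean a non-zero exit status.

```shell
# run_with_replay CMD_FILE [MAX_TRIES]: execute a command file, re-running it
# while it exits non-zero, up to MAX_TRIES attempts (default 3).
run_with_replay() {
  local cmd_file="$1" max_tries="${2:-3}" try=1
  while [ "$try" -le "$max_tries" ]; do
    if sh "$cmd_file"; then
      return 0
    fi
    echo "attempt $try of $cmd_file failed" >&2
    try=$((try + 1))
  done
  return 1
}
```

Because a command file such as `cat stdin_file | B` reads its input from a saved file rather than a live pipe, it can be run again without asking A to regenerate its output.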
Implementation of key-value aggregation
1. Agent inspects and hashes key
[Figure: A | B on keys. Components: BASH, A, pipe, control channels, aspect-agent B with Key dispatcher; step (1) marked.]
2. Agent routes each key-value pair to a compute node based on the key hash,
where it is stored in a hash table
[Figure: A | B on keys. The Key dispatcher sends tuples through the Node MUX (2) to compute nodes; each compute node holds a gdbm hash table, together forming the Distributed Hash Table.]
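
Routing by key hash can be illustrated with a checksum and a modulo. This is a sketch under assumptions: the slides do not name the hash function, so the POSIX `cksum` CRC stands in for it here, and `node_for_key` is a hypothetical helper.

```shell
# node_for_key KEY NUM_NODES: choose a node index in 0..NUM_NODES-1 by
# hashing KEY with cksum (a stand-in hash, not the one used by the authors).
node_for_key() {
  local key="$1" num_nodes="$2" crc
  crc=$(printf '%s' "$key" | cksum | cut -d ' ' -f 1)
  echo $(( crc % num_nodes ))
}

for key in apple banana cherry; do
  echo "$key -> node $(node_for_key "$key" 4)"
done
```

The property that matters is determinism: every tuple with the same key lands on the same node, so that node's hash table ends up holding all values for the key.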
3. Each node constructs command files to pipe the key-value entry from its hash table
into process B
[Figure: A | B on keys. On each compute node, emit_tuple reads from the gdbm hash table and feeds B; steps (1) to (3) marked.]
4. Results from the command files execution are marshaled back to the shell
[Figure: A | B on keys. Output from B on each compute node returns through the Node MUX (4) to the I/O MUX (stdout and control) in aspect-agent B; steps (1) to (4) marked.]
Experimental Evaluation
Startup overlay performance (compared to the default SSH mechanism)
Syntactic benchmark I: performance of pipeline join
Syntactic benchmark II: performance of key-value aggregation
TeraSort benchmark: Parallel bucket sort
• Step 1: spawn the data generator in parallel on
  each compute node, partitioning data across N
  nodes for task T if the first 2 bytes fall in the
  range $\left[\, 2^{16}\cdot\frac{T}{N},\ 2^{16}\cdot\frac{T+1}{N} \,\right)$

• Step 2: perform sort on local data on each node

• Step 3: merge results onto global file system
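
The partitioning rule in Step 1 can be turned into a direct bucket computation. Reading the first 2 bytes of a record as an integer between 0 and 65535, the task T whose range contains it is the integer part of prefix × N / 2^16. The helper name `bucket_for_prefix` is hypothetical, and the half-open range is an assumption made so that every prefix maps to exactly one task.

```shell
# bucket_for_prefix PREFIX N: task T with 2^16*T/N <= PREFIX < 2^16*(T+1)/N,
# where PREFIX is the record's first two bytes as an integer (0..65535).
bucket_for_prefix() {
  local prefix="$1" n="$2"
  echo $(( prefix * n / 65536 ))
}

# With N = 4 nodes, each task covers a quarter of the 16-bit prefix space.
for prefix in 0 16383 16384 65535; do
  echo "prefix $prefix -> task $(bucket_for_prefix "$prefix" 4)"
done
```

Because the buckets are ordered by prefix, concatenating the locally sorted outputs in task order (Steps 2 and 3) yields a globally sorted result.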
TeraSort benchmark: Sorting rate
Related Work
• Ptolemy – embedded system design

• Yahoo Pipes – web content filtering

• Hadoop – Java implementation of
  MapReduce

• Dryad - distributed DAG data flow
  computation
Conclusion
• A debugger would be extremely helpful.
  Working on bashdb implementation.

• A run-time simulator would be helpful to
  predict performance based on the
  characteristics of the cluster.

• Still thinking about how to incorporate our
  extensions for named pipes (i.e. mkfifo).
Questions ?

Contenu connexe

Tendances

For the Greater Good: Leveraging VMware's RPC Interface for fun and profit by...
For the Greater Good: Leveraging VMware's RPC Interface for fun and profit by...For the Greater Good: Leveraging VMware's RPC Interface for fun and profit by...
For the Greater Good: Leveraging VMware's RPC Interface for fun and profit by...CODE BLUE
 
Exactly Once Semantics Revisited (Jason Gustafson, Confluent) Kafka Summit NY...
Exactly Once Semantics Revisited (Jason Gustafson, Confluent) Kafka Summit NY...Exactly Once Semantics Revisited (Jason Gustafson, Confluent) Kafka Summit NY...
Exactly Once Semantics Revisited (Jason Gustafson, Confluent) Kafka Summit NY...confluent
 
Functional Operations (Functional Programming at Comcast Labs Connect)
Functional Operations (Functional Programming at Comcast Labs Connect)Functional Operations (Functional Programming at Comcast Labs Connect)
Functional Operations (Functional Programming at Comcast Labs Connect)Susan Potter
 
Inter-process communication of Android
Inter-process communication of AndroidInter-process communication of Android
Inter-process communication of AndroidTetsuyuki Kobayashi
 
Take a Jailbreak -Stunning Guards for iOS Jailbreak- by Kaoru Otsuka
Take a Jailbreak -Stunning Guards for iOS Jailbreak- by Kaoru OtsukaTake a Jailbreak -Stunning Guards for iOS Jailbreak- by Kaoru Otsuka
Take a Jailbreak -Stunning Guards for iOS Jailbreak- by Kaoru OtsukaCODE BLUE
 
Essentials of Multithreaded System Programming in C++
Essentials of Multithreaded System Programming in C++Essentials of Multithreaded System Programming in C++
Essentials of Multithreaded System Programming in C++Shuo Chen
 
[CCC-28c3] Post Memory Corruption Memory Analysis
[CCC-28c3] Post Memory Corruption Memory Analysis[CCC-28c3] Post Memory Corruption Memory Analysis
[CCC-28c3] Post Memory Corruption Memory AnalysisMoabi.com
 
Dynamo: Not Just For Datastores
Dynamo: Not Just For DatastoresDynamo: Not Just For Datastores
Dynamo: Not Just For DatastoresSusan Potter
 
[COSCUP 2021] LLVM Project: The Good, The Bad, and The Ugly
[COSCUP 2021] LLVM Project: The Good, The Bad, and The Ugly[COSCUP 2021] LLVM Project: The Good, The Bad, and The Ugly
[COSCUP 2021] LLVM Project: The Good, The Bad, and The UglyMin-Yih Hsu
 
CSW2017 Henry li how to find the vulnerability to bypass the control flow gua...
CSW2017 Henry li how to find the vulnerability to bypass the control flow gua...CSW2017 Henry li how to find the vulnerability to bypass the control flow gua...
CSW2017 Henry li how to find the vulnerability to bypass the control flow gua...CanSecWest
 
Detecting hardware virtualization rootkits
Detecting hardware virtualization rootkitsDetecting hardware virtualization rootkits
Detecting hardware virtualization rootkitsEdgar Barbosa
 
Trends of SW Platforms for Heterogeneous Multi-core systems and Open Source ...
Trends of SW Platforms for Heterogeneous Multi-core systems and  Open Source ...Trends of SW Platforms for Heterogeneous Multi-core systems and  Open Source ...
Trends of SW Platforms for Heterogeneous Multi-core systems and Open Source ...Seunghwa Song
 
Docker - container and lightweight virtualization
Docker - container and lightweight virtualization Docker - container and lightweight virtualization
Docker - container and lightweight virtualization Sim Janghoon
 
syzbot and the tale of million kernel bugs
syzbot and the tale of million kernel bugssyzbot and the tale of million kernel bugs
syzbot and the tale of million kernel bugsDmitry Vyukov
 
Ricon/West 2013: Adventures with Riak Pipe
Ricon/West 2013: Adventures with Riak PipeRicon/West 2013: Adventures with Riak Pipe
Ricon/West 2013: Adventures with Riak PipeSusan Potter
 
The Silence of the Canaries
The Silence of the CanariesThe Silence of the Canaries
The Silence of the CanariesKernel TLV
 
Kernel Recipes 2019 - CVEs are dead, long live the CVE!
Kernel Recipes 2019 - CVEs are dead, long live the CVE!Kernel Recipes 2019 - CVEs are dead, long live the CVE!
Kernel Recipes 2019 - CVEs are dead, long live the CVE!Anne Nicolas
 
ADB(Android Debug Bridge): How it works?
ADB(Android Debug Bridge): How it works?ADB(Android Debug Bridge): How it works?
ADB(Android Debug Bridge): How it works?Tetsuyuki Kobayashi
 

Tendances (20)

For the Greater Good: Leveraging VMware's RPC Interface for fun and profit by...
For the Greater Good: Leveraging VMware's RPC Interface for fun and profit by...For the Greater Good: Leveraging VMware's RPC Interface for fun and profit by...
For the Greater Good: Leveraging VMware's RPC Interface for fun and profit by...
 
The pocl Kernel Compiler
The pocl Kernel CompilerThe pocl Kernel Compiler
The pocl Kernel Compiler
 
Exactly Once Semantics Revisited (Jason Gustafson, Confluent) Kafka Summit NY...
Exactly Once Semantics Revisited (Jason Gustafson, Confluent) Kafka Summit NY...Exactly Once Semantics Revisited (Jason Gustafson, Confluent) Kafka Summit NY...
Exactly Once Semantics Revisited (Jason Gustafson, Confluent) Kafka Summit NY...
 
Functional Operations (Functional Programming at Comcast Labs Connect)
Functional Operations (Functional Programming at Comcast Labs Connect)Functional Operations (Functional Programming at Comcast Labs Connect)
Functional Operations (Functional Programming at Comcast Labs Connect)
 
Inter-process communication of Android
Inter-process communication of AndroidInter-process communication of Android
Inter-process communication of Android
 
Take a Jailbreak -Stunning Guards for iOS Jailbreak- by Kaoru Otsuka
Take a Jailbreak -Stunning Guards for iOS Jailbreak- by Kaoru OtsukaTake a Jailbreak -Stunning Guards for iOS Jailbreak- by Kaoru Otsuka
Take a Jailbreak -Stunning Guards for iOS Jailbreak- by Kaoru Otsuka
 
Essentials of Multithreaded System Programming in C++
Essentials of Multithreaded System Programming in C++Essentials of Multithreaded System Programming in C++
Essentials of Multithreaded System Programming in C++
 
[CCC-28c3] Post Memory Corruption Memory Analysis
[CCC-28c3] Post Memory Corruption Memory Analysis[CCC-28c3] Post Memory Corruption Memory Analysis
[CCC-28c3] Post Memory Corruption Memory Analysis
 
Dynamo: Not Just For Datastores
Dynamo: Not Just For DatastoresDynamo: Not Just For Datastores
Dynamo: Not Just For Datastores
 
[COSCUP 2021] LLVM Project: The Good, The Bad, and The Ugly
[COSCUP 2021] LLVM Project: The Good, The Bad, and The Ugly[COSCUP 2021] LLVM Project: The Good, The Bad, and The Ugly
[COSCUP 2021] LLVM Project: The Good, The Bad, and The Ugly
 
CSW2017 Henry li how to find the vulnerability to bypass the control flow gua...
CSW2017 Henry li how to find the vulnerability to bypass the control flow gua...CSW2017 Henry li how to find the vulnerability to bypass the control flow gua...
CSW2017 Henry li how to find the vulnerability to bypass the control flow gua...
 
Detecting hardware virtualization rootkits
Detecting hardware virtualization rootkitsDetecting hardware virtualization rootkits
Detecting hardware virtualization rootkits
 
Xen Debugging
Xen DebuggingXen Debugging
Xen Debugging
 
Trends of SW Platforms for Heterogeneous Multi-core systems and Open Source ...
Trends of SW Platforms for Heterogeneous Multi-core systems and  Open Source ...Trends of SW Platforms for Heterogeneous Multi-core systems and  Open Source ...
Trends of SW Platforms for Heterogeneous Multi-core systems and Open Source ...
 
Docker - container and lightweight virtualization
Docker - container and lightweight virtualization Docker - container and lightweight virtualization
Docker - container and lightweight virtualization
 
syzbot and the tale of million kernel bugs
syzbot and the tale of million kernel bugssyzbot and the tale of million kernel bugs
syzbot and the tale of million kernel bugs
 
Ricon/West 2013: Adventures with Riak Pipe
Ricon/West 2013: Adventures with Riak PipeRicon/West 2013: Adventures with Riak Pipe
Ricon/West 2013: Adventures with Riak Pipe
 
The Silence of the Canaries
The Silence of the CanariesThe Silence of the Canaries
The Silence of the Canaries
 
Kernel Recipes 2019 - CVEs are dead, long live the CVE!
Kernel Recipes 2019 - CVEs are dead, long live the CVE!Kernel Recipes 2019 - CVEs are dead, long live the CVE!
Kernel Recipes 2019 - CVEs are dead, long live the CVE!
 
ADB(Android Debug Bridge): How it works?
ADB(Android Debug Bridge): How it works?ADB(Android Debug Bridge): How it works?
ADB(Android Debug Bridge): How it works?
 

En vedette

Voldemort : Prototype to Production
Voldemort : Prototype to ProductionVoldemort : Prototype to Production
Voldemort : Prototype to ProductionVinoth Chandar
 
Voldemort on Solid State Drives
Voldemort on Solid State DrivesVoldemort on Solid State Drives
Voldemort on Solid State DrivesVinoth Chandar
 
Lecture 3
Lecture 3Lecture 3
Lecture 3Mr SMAK
 
Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived Vinoth Chandar
 
Applications of paralleL processing
Applications of paralleL processingApplications of paralleL processing
Applications of paralleL processingPage Maker
 
Introduction to parallel processing
Introduction to parallel processingIntroduction to parallel processing
Introduction to parallel processingPage Maker
 
Introducción a Voldemort - Innova4j
Introducción a Voldemort - Innova4jIntroducción a Voldemort - Innova4j
Introducción a Voldemort - Innova4jInnova4j
 

En vedette (13)

Voldemort : Prototype to Production
Voldemort : Prototype to ProductionVoldemort : Prototype to Production
Voldemort : Prototype to Production
 
Bluetube
BluetubeBluetube
Bluetube
 
Voldemort on Solid State Drives
Voldemort on Solid State DrivesVoldemort on Solid State Drives
Voldemort on Solid State Drives
 
Project Voldemort
Project VoldemortProject Voldemort
Project Voldemort
 
Voldemort Nosql
Voldemort NosqlVoldemort Nosql
Voldemort Nosql
 
Vol1
Vol1Vol1
Vol1
 
Voldemort
VoldemortVoldemort
Voldemort
 
Lecture 47
Lecture 47Lecture 47
Lecture 47
 
Lecture 3
Lecture 3Lecture 3
Lecture 3
 
Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived
 
Applications of paralleL processing
Applications of paralleL processingApplications of paralleL processing
Applications of paralleL processing
 
Introduction to parallel processing
Introduction to parallel processingIntroduction to parallel processing
Introduction to parallel processing
 
Introducción a Voldemort - Innova4j
Introducción a Voldemort - Innova4jIntroducción a Voldemort - Innova4j
Introducción a Voldemort - Innova4j
 

Composing and Executing Parallel Data Flow Graphs with Shell Pipes

  • 1. Composing and Executing Parallel Data-flow Graphs with Shell Pipes. Edward Walker (TACC), Weijia Xu (TACC), Vinoth Chandar (Oracle Corp)
  • 2. Agenda • Motivation • Shell language extensions • Implementation • Experimental evaluation • Conclusions
  • 3. Motivation • Distributed-memory clusters are becoming pervasive in industry and academia • Shells are the default login environment on these systems • Shell pipes are commonly used for composing extensible Unix commands • The syntax and semantics of shell pipes have not changed since their invention over 30 years ago • There is a growing need to compose massively parallel jobs quickly, using existing software
  • 4. Extending Shells for Parallel Computing • Build a simple, powerful coordination layer at the Shell • The coordination layer transparently manages the parallelism in the workflow • User specifies parallel computation as a dataflow graph using extensions to the Shell • Provides the ability to combine different tools and build interesting parallel programs quickly.
  • 5. Shell pipe extensions
      – Pipeline fork: A | B on n procs
      – Pipeline join: A on n procs | B
      – Pipeline cycles: (++ n A)
      – Pipeline key-value aggregation: A | B on keys
  • 6. Parallel shell tasks extensions
      > function foo()
      {
         echo "hello world"
      }
      > foo on all procs      # foo() on all CPUs
      > foo on all nodes      # foo() on all nodes
      > foo on 10:2 procs     # 10 tasks, 2 tasks on each node (the ":2" is the stride)
      > foo on 10:2:2 procs   # 10 tasks, 2 tasks on alternate nodes (the final ":2" is the span)
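One way to read the stride and span fields on this slide is as a placement rule that maps each task to a node index. The sketch below is plain bash, not the extended shell, and the rule itself (integer division by the stride, multiplied by the span) is an assumption inferred from the slide's comments; `node_for_task` and `show_placement` are hypothetical names.

```shell
#!/usr/bin/env bash

# Hypothetical placement rule (an assumption, not taken from the slides):
# node index for a task, given the stride (tasks per node) and the
# span (step between the nodes that are used).
node_for_task() {
  local task=$1 stride=$2 span=${3:-1}
  echo $(( task / stride * span ))
}

# Print the placement of every task for a "count:stride[:span]" spec.
show_placement() {
  local count stride span task
  IFS=: read -r count stride span <<< "$1"
  for (( task = 0; task < count; task++ )); do
    echo "task $task -> node $(node_for_task "$task" "$stride" "${span:-1}")"
  done
}
```

Under this reading, `show_placement 10:2` puts tasks 0 and 1 on node 0, tasks 2 and 3 on node 1, and so on, while `show_placement 10:2:2` skips every other node.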
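  • 7. Composing data-flow graphs • Example 1:
      function B1() { :; }   # body elided on the slide
      function B2() { :; }   # body elided on the slide
      function B()
      {
        if (( $_ASPECT_TASKID == 0 )) ; then
          B1
        else
          B2
        fi
      }
      A | B on 2 procs | C
      Data-flow graph: A feeds B1 and B2 in parallel, and both feed C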
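The fork in Example 1 can be approximated in an unmodified bash to see how the task ID selects a branch. This is only a local emulation under stated assumptions: it follows the later implementation slides in saving stdin to a file and handing every task the whole file, it runs the tasks as background subshells on one machine, and `fork_B` plus the bodies of `B1` and `B2` are hypothetical stand-ins.

```shell
#!/usr/bin/env bash

# Hypothetical stand-ins for the slide's B1 and B2: tag each input line.
B1() { sed 's/^/B1: /'; }
B2() { sed 's/^/B2: /'; }

# B picks a branch from the task ID, as in Example 1.
B() {
  if (( _ASPECT_TASKID == 0 )); then
    B1
  else
    B2
  fi
}

# Local emulation of "A | B on N procs": save stdin to a file, then
# start N background tasks that each read the whole file.
fork_B() {
  local ntasks=$1 stdin_file task
  stdin_file=$(mktemp)
  cat > "$stdin_file"
  for (( task = 0; task < ntasks; task++ )); do
    ( export _ASPECT_TASKID=$task; B < "$stdin_file" ) &
  done
  wait                      # let every task finish before cleaning up
  rm -f "$stdin_file"
}
```

For example, `printf 'x\n' | fork_B 2 | sort` plays the role of `A | B on 2 procs | C`, with `sort` standing in for C and making the output order deterministic.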
  • 8. Composing data-flow graphs • Example 2:
      function map()
      {
        emit_tuple -k key -v value
      }
      function reduce()
      {
        consume_tuple -k key -v value
        num=${#value[@]}
        for ((i=0; i < $num; i++)) ; do
          # process key=$key, value=${value[$i]}
        done
      }
      map on all procs | reduce on keys
      Data-flow graph: the map tasks feed a key-value DHT, which feeds the reduce tasks
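The grouping that `reduce on keys` performs can be imitated on a single machine to see what each reduce invocation receives. The sketch below assumes bash 4 or newer and a tab-separated "key value" line format; `reduce_on_keys` is a hypothetical name, and it does not use the deck's `emit_tuple` or `consume_tuple` commands.

```shell
#!/usr/bin/env bash
# Needs bash 4 or newer for associative arrays and mapfile.

# Hypothetical local stand-in for "reduce on keys": read "key<TAB>value"
# lines, group the values by key, and print "key count values..." per key.
reduce_on_keys() {
  local -A groups=()
  local -a value=()
  local key val
  while IFS=$'\t' read -r key val; do
    groups[$key]+="$val"$'\n'          # append this value to the key's group
  done
  for key in "${!groups[@]}"; do
    # Rebuild the value array for this key, as consume_tuple would supply it.
    mapfile -t value <<< "${groups[$key]%$'\n'}"
    echo "$key ${#value[@]} ${value[*]}"
  done | sort
}
```

Feeding it `printf 'a\t1\nb\t2\na\t3\n'` yields one line per key, with `${#value[@]}` matching the `num` variable in the slide's reduce function.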
  • 10. Startup Overlay • Script may have many instances requiring startup of parallel tasks • Motivation for overlay: – Fast startup of parallel shell workers – Handles node failures gracefully • Two-level hierarchy: sectors and proxies • Overlay node addressing: the compute node ID (bits 7 to 0) is made up of a sector id and a proxy id
  • 11. Fault-Tolerance • Proxy nodes monitor peers within sector, and sector heads monitor peer sectors • Node 0 maintains a list of available nodes in the overlay in a master_node file [Figure: overlay sectors 0 and 1 containing nodes 0 to 7, each node running a proxy and an exec process, plus the master_node file]
  • 12. Starting shell workers with startup overlay
  • 13. 1. Bash spawns agent. 2. Agent queries master_node and spawns node I/O multiplexor [Figure: BASH, agent, node I/O MUX and master_node alongside overlay sectors 0 and 1]
  • 14. 3. Agent invokes overlay to spawn CPU I/O multiplexor on node
  • 15. 4. CPU I/O multiplexor spawns a shell worker per CPU on node
  • 16. 5. CPU I/O multiplexor calls back to node I/O multiplexor
  • 18. A | B on N procs: 1. Process B pipes stdin into stdin_file [Figure: BASH pipe from A into aspect-agent B, whose stdin reader writes stdin_file]
  • 19. A | B on N procs: 2. Constructs command files for each task, each of the form "cat stdin_file | B"
  • 20. A | B on N procs: 3., 4. and 5. Execute command files in shell workers and marshal results back to shell [Figure: aspect-agent B with stdin reader, command dispatcher, queue, flushers and I/O MUX; node MUXes; shell workers running B on the compute nodes]
  • 21. A | B on N procs: 6. Replay command files on failure [Figure: as on the previous slide, plus a replayer and shell workers on the local compute node]
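The replay step can be pictured as a retry loop around each command file. This is a minimal sketch under assumptions: it treats a non-zero exit status as the failure signal and simply re-runs the file in the current shell's host, whereas the slides do not say how failures are detected or how many replays are attempted. `run_with_replay` and its `max_tries` parameter are hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical replay loop: run a command file and, if it exits non-zero,
# run it again, up to max_tries attempts in total.
run_with_replay() {
  local cmd_file=$1 max_tries=${2:-3} try
  for (( try = 1; try <= max_tries; try++ )); do
    if bash "$cmd_file"; then
      return 0                       # the command file succeeded
    fi
    echo "attempt $try of $cmd_file failed" >&2
  done
  return 1                           # every attempt failed
}
```

A command file here is just a small script such as `cat stdin_file | B`, so replaying it is safe as long as B has no side effects that must not be repeated.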
  • 23. A | B on keys: 1. Agent inspects and hashes key [Figure: BASH pipe from A into aspect-agent B with its key dispatcher]
  • 24. A | B on keys: 2. Routes each key-value pair to a compute node based on the key hash, where it is stored in a hash table [Figure: node MUX and compute nodes forming a distributed hash table of gdbm hash tables]
  • 25. A | B on keys: 3. Each node constructs command files to pipe the key-value entries from its hash table into process B [Figure: emit_tuple feeding B on each compute node]
  • 26. A | B on keys: 4. Results from the command file execution are marshaled back to the shell [Figure: stdout returned through the I/O MUX]
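Steps 1 and 2 amount to hashing each key and using the hash to choose a node. The sketch below illustrates that idea with the POSIX `cksum` CRC and a modulo, and writes each pair to a per-node file instead of a gdbm table. The choice of hash, the tab-separated input format, and the names `node_for_key` and `route_pairs` are all assumptions; the slides do not say which hash function is used.

```shell
#!/usr/bin/env bash

# Hypothetical helper: map a key to one of N nodes via a CRC of the key.
node_for_key() {
  local key=$1 nodes=$2 crc
  crc=$(printf '%s' "$key" | cksum | cut -d ' ' -f 1)
  echo $(( crc % nodes ))
}

# Route "key<TAB>value" lines into one file per node under outdir.
# The files stand in for the per-node hash tables on the slides.
route_pairs() {
  local nodes=$1 outdir=$2 key value node
  while IFS=$'\t' read -r key value; do
    node=$(node_for_key "$key" "$nodes")
    printf '%s\t%s\n' "$key" "$value" >> "$outdir/node_$node.tsv"
  done
}
```

Because the node depends only on the key, every value for a given key lands on the same node, which is what lets B process each key's values together.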
  • 28. Startup overlay performance (when compared to SSH default mechanism)
  • 29. Syntactic benchmark I: performance of pipeline join
  • 30. Syntactic benchmark II: performance of key-value aggregation
  • 31. TeraSort benchmark: Parallel bucket sort • Step 1: spawn the data generator in parallel on each compute node, partitioning data across N nodes: a record goes to task T if its first 2 bytes fall in the range $\left[\, 2^{16}\,\frac{T}{N},\; 2^{16}\,\frac{T+1}{N} \right)$ • Step 2: perform sort on local data on each node • Step 3: merge results onto global file system
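The bucket rule in Step 1 can be inverted to find the task for a given record: if the first two bytes form a 16-bit value v, the range test is equivalent to T = floor(v * N / 2^16). The sketch below assumes the two bytes are read most significant byte first, which the slide does not state; `task_for_prefix` and `task_for_record` are hypothetical names.

```shell
#!/usr/bin/env bash

# Task index T for a record whose first two bytes are b0 and b1 (0..255),
# when the 16-bit prefix space is split evenly across n tasks.
# Assumption: b0 is the most significant byte.
task_for_prefix() {
  local b0=$1 b1=$2 n=$3
  local prefix=$(( b0 * 256 + b1 ))
  echo $(( prefix * n / 65536 ))
}

# Same, but read the two leading bytes of a record from stdin with od.
task_for_record() {
  local n=$1 bytes b0 b1
  bytes=$(od -An -tu1 -N2)           # e.g. "  65  66" for a record starting "AB"
  read -r b0 b1 <<< "$bytes"
  task_for_prefix "$b0" "$b1" "$n"
}
```

With N = 4, prefixes 0x0000 to 0x3FFF go to task 0, 0x4000 to 0x7FFF to task 1, and so on, so concatenating the locally sorted outputs in task order gives a globally sorted result.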
  • 32. TeraSort benchmark: Sorting rate
  • 33. Related Work • Ptolemy – embedded system design • Yahoo Pipes – web content filtering • Hadoop – Java implementation of MapReduce • Dryad – distributed DAG data flow computation
  • 34. Conclusion • A debugger would be extremely helpful. Working on bashdb implementation. • Run-time simulator would be helpful to predict performance based on characteristics of cluster. • Still thinking about how to incorporate our extensions for named pipes (i.e. mkfifo).