A quick overview of the seed for the Meandre 2.0 series. It covers the main motivations moving forward and the disruptive changes introduced by the adoption of Scala and MongoDB.
1. Xavier Llorà
Data-Intensive Technologies and Applications
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
xllora@illinois.edu
2. • Great feedback and lessons learned from 1.4.X series
• Hot topics on 1.4.X
• Complex concurrency model based on traditional
semaphores written in Java
• Server performance bounded by JENA’s persistent
model implementation
• State caching on individual servers increases
complexity of single-image clusters
• Cloud-deployable, but not cloud-friendly
3. • How did the 1.5 efforts turn into 2.0?
• Cloud-friendly infrastructure required rethinking
core functionalities
• Drastic redesign of backend state storage
• Revisited execution engine to support distributed
flow execution
• Changes to the API that render the returned
JSON documents incompatible with 1.4.X
5. • Rewritten from scratch in Scala
• RDBMS backend via Jena/JDBC has been dropped
• MongoDB for state management and scalability
• Meandre 2.0 server is stateless
• Meandre API revised
• Revised response documents
• Simplified API (reduced the number of services)
• New job API
6. • New HTML interaction interface
• Off-the-shelf full-fledged single-image cluster
• Revised flow execution lifecycle: Queued, Preparing,
Running, Done, Failed, Killed, Aborted
• Flow execution as a separate spawned process.
Multiple execution engines are available
• Running flows can be killed on demand
• Rewritten execution engine (Snowfield)
• Support for distributed flow fragment execution
8. • MongoDB bridges the gap between
• Key-value stores (which are fast and highly
scalable)
• Traditional RDBMS systems (which provide rich
queries and deep functionality)
• MongoDB supports replication of data between
servers for failover and redundancy
• MongoDB is designed to scale horizontally via
auto-sharding, permitting the development of large-
9. • Fast REST API prototyping and development for Scala
• Built on top of Jetty (http://jetty.codehaus.org/jetty/)
• Enables quick prototyping of REST APIs
• Provides a simple DSL built on Scala
• Developed to support the development of Meandre
2.0
• http://github.com/xllora/Crochet
10. import crochet._
new Crochet {
  get("/message", "text/plain") { "Hello World!" }
} serving "./static_content" as "/static" on 8080
Get your server up and running by running
$ scala -cp crochet-0.1.4.jar:crochet-3dparty-libraries-0.1.X.jar hello-world-with-static.scala
11. • Notification fabric for distributed Scala applications
• Backed by MongoDB for scalability
• Snare monitors developed for Meandre 2.0
• Track activity via heartbeat
• Provide messaging between monitors and global
broadcasting of BSON objects
• Basic monitoring over HTTP via Crochet
• http://github.com/xllora/Snare
12. scala> import snare.tools.Implicits._
scala> val monitors = (1 to 3).toList.map(
         i => snare.Snare(
           "me_" + i,
           "my_pool",
           (o) => { println(o); true }
         )
       )
scala> monitors.map(_.activity=true)
2010.01.28 16:47:05.222::INFO:[EVTL] Notification event loop engaged for 230815e0-30cc-3afe-99ac-936d497d1282
2010.01.28 16:47:05.231::INFO:[EVTL] Notification event loop engaged for baec232f-d74d-3fd1-ad3a-caf362f58b7d
2010.01.28 16:47:05.236::INFO:[EVTL] Notification event loop engaged for d057fcde-fd10-3edd-9fd2-cfe464c6971c
2010.01.28 16:47:08.136::INFO:[HRTB] Heartbeat engaged for baec232f-d74d-3fd1-ad3a-caf362f58b7d
2010.01.28 16:47:08.136::INFO:[HRTB] Heartbeat engaged for 230815e0-30cc-3afe-99ac-936d497d1282
2010.01.28 16:47:08.136::INFO:[HRTB] Heartbeat engaged for d057fcde-fd10-3edd-9fd2-cfe464c6971c
scala> monitors(0).broadcast("""{"msg":"Fooo!!!"}""")
scala> monitors(0).notifyPeer(
         "230815e0-30cc-3afe-99ac-936d497d1282",
         """{"msg":"Fooo!!!"}"""
       )
14. • Meandre 2.0 requires at least 2 separate services
running
• A MongoDB for shared state storage and
management
• A Meandre server to provide services (via Crochet)
and facilitate execution (customizable execution
engines)
• A single-image Meandre cluster scales horizontally
by adding new Meandre servers and sharding the
MongoDB store
15. • Can be broken into three basic functional units
1. The Meandre server (main activity coordinator)
2. The MongoDB store (holds all server state, job
related information, and system information)
3. Meandre customizable executor (in charge of flow
execution allowing selection of multiple
execution engines)
16. [Figure: architecture of a single Meandre server]
Crochet server components: State API, Snare Monitor, Job Manager API
Stored state:
• User Info, Profiles & Roles
• Repositories
• Unified Job Queue
• Job Consoles and Logs
• Snare Cluster Status & Heartbeat
Job Manager API:
• Execution coordination
• Spawns external jobs for execution
• Customizable execution engine
• One job running per server
• Allows consuming all server resources
17. • A cluster is formed by one or more Meandre servers
• A single MongoDB instance can support tens of Meandre
servers
• Adding more Meandre servers provides:
• Web service load balancing
• Fault tolerance
• Higher job execution throughput (the number of
concurrent jobs equals the number of Meandre
servers in the cluster)
18. [Figure: a single-image cluster. Three Crochet servers, each exposing the State API, Snare Monitor, and Job Manager API, sit behind load balancing and share one store holding User Info, Profiles & Roles; Repositories; Unified Job Queue; Job Consoles and Logs; Snare Cluster Status & Heartbeat.]
19. • A single-image cluster can be scaled out by relying
on MongoDB
• MongoDB is the key to a single-image cluster
• Starting with 1.6.X, MongoDB provides
production-ready auto-sharding
• State scalability via sharded collections allows a
large-scale single-image Meandre cluster to keep
scaling
22. • The response messages have been revised
• Homogenized the structure of the response contents
• Revisited execution mechanics
• Introduced a new job API that helps
• Submit jobs for execution
• Track them (monitor state, kill, etc.)
• Inspect console and logs in real time
23. • Repository API
Manage user repository of components and flows
• Location API
Manage locations from where components and flows
can be imported into a user repository
• Security API
Allow administrators to manage users and their
profiles and roles in a given cluster
24. • Publish API
Helps manage the components and flows that get
published to the publicly shared global repository
• Cluster management & logs API
The cluster management API mostly focuses on cluster
monitoring (via Snare web monitor), selective
server/cluster shutdown, and access to server/cluster logs
• Job API
The new job API allows you to submit, monitor, and
control jobs submitted for execution to a cluster
25. • Public API
Miscellaneous public services providing access to the
public repository, demo repository, and pinging
services (targeted to specific servers)
26. • The prefix of the REST API is configurable
• Each call specifies the response format using a simple
file extension convention
• The next few slides provide a raw list of the revised
API (further details should be looked up on the
Meandre documentation website)
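As a rough illustration of the two conventions above (a configurable prefix and a response format chosen by file extension), a request path could be assembled and parsed as sketched below. The prefix `/services`, the service name `jobs/list`, and the `json` extension are hypothetical placeholders, not names taken from the Meandre documentation.

```scala
object RestPathSketch {
  /** Joins prefix + service + "." + format, normalizing the slash between them.
    * All concrete values passed in are illustrative assumptions. */
  def path(prefix: String, service: String, format: String): String = {
    val p = prefix.stripSuffix("/")
    val s = service.stripPrefix("/")
    s"$p/$s.$format"
  }

  /** Recovers the requested response format from the trailing file extension, if any. */
  def formatOf(requestPath: String): Option[String] = {
    val last = requestPath.substring(requestPath.lastIndexOf('/') + 1)
    val dot = last.lastIndexOf('.')
    if (dot >= 0 && dot < last.length - 1) Some(last.substring(dot + 1)) else None
  }
}
```

For example, `RestPathSketch.path("/services", "jobs/list", "json")` yields a path ending in `.json`, and `formatOf` recovers that extension on the receiving side.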
35. • As already mentioned, flows in Meandre 2.0 are
spawned in a separate process
• The execution process is a wrapper
• STDIN: Reads the repository RDF to execute
• STDOUT: Outputs the flow console output
• STDERR: Outputs the flow logs
• Console and logs are streamed and archived by the
Meandre server in real time
36. • Console and logs are linked to job submission
• Users can query anytime for consoles and logs and
they will get the current contents
• Once flow execution finishes consoles and logs are
compacted but are still available on demand
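37. [Figure: the Crochet server (State API, Snare Monitor, Job Manager API) controls a spawned flow execution process. The flow and components RDF is sent to the process over STDIN; the console comes back over STDOUT and the logs over STDERR. The server keeps consoles, logs, and job tracking information.]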
38. • The Meandre 2.0 server does not provide any execution
facility. Instead, it spawns a separate process
• The process is passed a command-line parameter (the
port number for the WebUI)
• The process is assumed to read the repository to
execute (flow and required components RDF)
• The server reads the console (STDOUT) and logs (STDERR)
and pushes them into MongoDB
• The server is able to terminate a spawned job on demand
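The server-side half of this contract can be sketched with the standard `scala.sys.process` package. This is a minimal illustration, not the Meandre implementation: the Unix `cat` command stands in for an execution engine, in-memory buffers stand in for the MongoDB-backed console and log archives, and the WebUI port parameter is omitted. The object and method names are made up for the example.

```scala
import java.io.{InputStream, OutputStream}
import scala.io.Source
import scala.sys.process.{Process, ProcessIO}

object SpawnSketch {
  /** Spawns `command`, writes `input` to its STDIN, and returns
    * (exit code, captured STDOUT, captured STDERR). */
  def run(command: Seq[String], input: String): (Int, String, String) = {
    val console = new StringBuilder // stand-in for the archived job console
    val logs = new StringBuilder    // stand-in for the archived job logs

    // Reads a stream to its end and appends the text to the given buffer.
    def drain(sink: StringBuilder)(in: InputStream): Unit = {
      val src = Source.fromInputStream(in, "UTF-8")
      try sink.append(src.mkString) finally src.close()
    }

    val io = new ProcessIO(
      (stdin: OutputStream) => {
        // Send the repository text, then close STDIN so the process sees end of input.
        try stdin.write(input.getBytes("UTF-8")) finally stdin.close()
      },
      (stdout: InputStream) => drain(console)(stdout),
      (stderr: InputStream) => drain(logs)(stderr)
    )

    // exitValue() blocks until the spawned process has ended.
    val exit = Process(command).run(io).exitValue()
    (exit, console.toString, logs.toString)
  }
}
```

The handle returned by `run(io)` also exposes `destroy()`, which is one plausible way to kill a job on demand; the sketch does not exercise it.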
39. • The Job API submit service accepts a parameter
(“wrapper”) that allows you to request specific
execution engines.
• The default execution engines provided in 2.0 are
• echo: Just reprints the input RDF to the console
and logs beginning and end of execution
• 1.4.x: The latest execution engine released on the
1.4.x series
• snowfield: The revamped Meandre 2.0 execution
engine (also the basic execution piece of
distributed execution of flows)
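The behavior described for the echo engine is simple enough to sketch. The following is an assumption-laden illustration, not the shipped `execution_echo.scala`: it copies whatever arrives on STDIN to the console (STDOUT) and writes a start and an end message to the logs (STDERR). The object name and the log wording are invented for the example.

```scala
import java.io.{BufferedReader, InputStream, InputStreamReader, PrintStream}

object EchoEngineSketch {
  /** Copies the RDF read from `in` to `console`, logging start and end on `log`. */
  def run(in: InputStream, console: PrintStream, log: PrintStream): Unit = {
    log.println("echo engine: execution started")
    val reader = new BufferedReader(new InputStreamReader(in, "UTF-8"))
    var line = reader.readLine()
    while (line != null) {
      console.println(line) // reprint the input line by line
      line = reader.readLine()
    }
    console.flush()
    log.println("echo engine: execution finished")
    log.flush()
  }

  // When launched as a process, wire the three standard streams.
  def main(args: Array[String]): Unit = run(System.in, System.out, System.err)
}
```

Taking the streams as parameters keeps the wrapper logic testable without spawning a process.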
40. • All execution engines are placed in the
<MEANDRE_HOME>/scripts directory
• All execution engines are launched via Scala scripts
using the naming convention
execution_<NAME>.scala
• The provided execution engines are named
• execution_echo.scala
• execution_1.4.x.scala
• execution_snowfield.scala
41. • You can add an execution engine by adding a script
following the previous naming convention. For
instance, execution engine my_engine will require
a Scala wrapper placed in the <MEANDRE_HOME>/scripts
folder named
execution_my_engine.scala
• You can request your customized execution engine
by submitting jobs via the REST API and adding the
parameter &wrapper=my_engine
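The naming convention maps a wrapper name directly to a script path. A small sketch of that lookup follows; the object name, the example home directory, and the character check on the wrapper name are assumptions added for illustration, not documented Meandre behavior.

```scala
import java.nio.file.{Path, Paths}

object EngineScripts {
  /** Maps a wrapper name to <MEANDRE_HOME>/scripts/execution_<NAME>.scala. */
  def scriptFor(meandreHome: String, wrapper: String): Path = {
    // Assumed safety check: refuse names that could escape the scripts folder.
    require(wrapper.matches("[A-Za-z0-9_.]+"), s"unexpected wrapper name: $wrapper")
    Paths.get(meandreHome, "scripts", s"execution_${wrapper}.scala")
  }
}
```

With a hypothetical home of `/opt/meandre`, the request parameter `wrapper=my_engine` would resolve to `execution_my_engine.scala` inside its `scripts` folder.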
43. • The introduction of the Job API has refined the flow
lifecycle
• 1.4.X execution was on demand (potentially
overloading the box)
• 2.0.X introduces refined execution states
44. [Figure: flow execution lifecycle. States: Submitted, Preparing, Running, Done, Aborted, Failed, Killed. Transition labels: server available, execution engine ready, execution successfully completed, user request, infrastructure failure, bad-behaved flow.]
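The lifecycle in the figure can be modeled as a small state machine. In the sketch below, the forward path Submitted, Preparing, Running, Done follows the figure; which non-terminal states can reach Failed, Killed, or Aborted is not recoverable from the slide, so the sketch assumes any of them can. All names are illustrative.

```scala
object JobLifecycleSketch {
  sealed trait State
  case object Submitted extends State
  case object Preparing extends State
  case object Running   extends State
  case object Done      extends State
  case object Failed    extends State
  case object Killed    extends State
  case object Aborted   extends State

  /** States from which no further transition is possible. */
  val terminal: Set[State] = Set(Done, Failed, Killed, Aborted)

  // Forward path taken from the figure.
  private val next: Map[State, State] =
    Map(Submitted -> Preparing, Preparing -> Running, Running -> Done)

  // Assumption: every non-terminal state may exit to any of these.
  private val abnormalExits: Set[State] = Set(Failed, Killed, Aborted)

  def canMove(from: State, to: State): Boolean =
    !terminal(from) && (next.get(from).contains(to) || abnormalExits(to))
}
```

A job tracker could call `canMove` before persisting a state change, rejecting updates such as moving a finished job back to Running.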
46. • Data-driven execution
• No centralized control
• Designed with multi and many cores in mind
• The underlying assumption
• One thread per component
• Finite buffers on input ports to decouple
production/consumption
• Ideal for shared-memory machines (e.g. Cobalt)
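The thread-per-component model with finite input buffers can be illustrated with two threads and a bounded queue from `java.util.concurrent`. This is a sketch under simplifying assumptions, not Snowfield code: there are only two components, and an end-of-stream marker signals completion where the real engine relies on a monitoring thread.

```scala
import java.util.concurrent.ArrayBlockingQueue
import scala.collection.mutable.ListBuffer

object DataflowSketch {
  // End-of-stream marker: a simplification made for this sketch.
  private val End = Int.MinValue

  /** Pushes `items` through a bounded input port and returns their squares. */
  def run(items: List[Int], bufferSize: Int): List[Int] = {
    val port = new ArrayBlockingQueue[Int](bufferSize) // finite input buffer
    val results = ListBuffer.empty[Int]

    // Producer component: one thread, pushes data downstream.
    val producer = new Thread(() => {
      items.foreach(i => port.put(i)) // blocks while the buffer is full
      port.put(End)
    })

    // Consumer component: one thread, fires whenever data is available.
    val consumer = new Thread(() => {
      var item = port.take()          // blocks while the buffer is empty
      while (item != End) {
        results += item * item
        item = port.take()
      }
    })

    producer.start(); consumer.start()
    producer.join(); consumer.join()
    results.toList
  }
}
```

The bounded queue is what decouples production from consumption: a fast producer is throttled by `put`, and a fast consumer simply waits in `take`, with no central controller involved.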
48. • Two other threads are created
• Mr. Proper
• This thread monitors the status of component threads
• If no thread is running and there is no data, the flow is
done and it is time to clean up
• Mr. Probe can
• Record firing events
• Data in the buffers
• Component state
50. • Key benefit for Meandre after the Scala transition
• High level parallel constructs
• Simple concurrency model
• Actors modeled after Erlang
• Actors are lightweight when compared to threads
• Configurable scheduling for actors
51. • Actors are the primitive of concurrent computation
• Actors respond to messages they receive
• Actors perform local operations
• Actors send messages to other actors
• Actors can create new actors
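These primitives can be shown with a toy actor built only on the standard library. It is a deliberately simplified sketch: each actor here owns a thread, whereas a real actor runtime multiplexes many lightweight actors over a scheduler, and `MiniActor` is an invented name rather than an API from Scala's actor library.

```scala
import java.util.concurrent.{CountDownLatch, LinkedBlockingQueue, TimeUnit}
import scala.collection.mutable.ListBuffer

object ActorSketch {
  /** Toy actor: a mailbox drained in order by one daemon thread. */
  final class MiniActor[A](behavior: A => Unit) {
    private val mailbox = new LinkedBlockingQueue[A]()
    private val worker = new Thread(() => {
      try { while (true) behavior(mailbox.take()) } // respond to each message
      catch { case _: InterruptedException => () }  // stop() ends the loop
    })
    worker.setDaemon(true)
    worker.start()

    /** Asynchronous send: enqueue the message and return immediately. */
    def !(msg: A): Unit = mailbox.put(msg)
    def stop(): Unit = worker.interrupt()
  }

  /** Two actors in a pipeline: one doubles numbers, the other collects them. */
  def demo(): List[Int] = {
    val seen = ListBuffer.empty[Int]
    val done = new CountDownLatch(3)
    val collector = new MiniActor[Int](n => { seen += n; done.countDown() })
    val doubler = new MiniActor[Int](n => collector ! (n * 2)) // local work, then send
    List(1, 2, 3).foreach(n => doubler ! n)
    done.await(5, TimeUnit.SECONDS)
    doubler.stop(); collector.stop()
    seen.toList
  }
}
```

Because the only interaction between the two actors is message passing, nothing in `demo` depends on them sharing memory, which is the property that makes distribution straightforward.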
52. [Figure: a single JVM in which components C1 to C6 are each backed by an actor (A1 to A6) run by an actor scheduler; actor A0 hosts Mr. Probe and Mr. Proper.]
53. • Abstraction
• Break the relation between components and threads
• Minimize context switching between threads
• Main benefit
• Simple communication model
• Trivial to distribute!
54. [Figure: the same flow distributed over several JVMs, each with its own actor scheduler. JVM0 hosts actor A0 (Mr. Probe and Mr. Proper); JVM1 hosts C1 and C3 (A1, A3); JVM2 hosts C2 and C4 (A2, A4); JVM3 and JVM4 host C5 (A5) and C6 (A6).]
55. • Now JVMs can be placed on different machines
• Questions?
• How do I group components in JVMs?
• Where do I place the JVMs?
• Scheduling and mapping relies on 3rd parties
• Manually by user
• Model 1 job and let the grid do the allocation (e.g.
Abe, Blue Waters)
• Cloud orchestrated