Flux is a YAML DSL for easily configuring and deploying Apache Storm topologies without embedding configuration in code. It allows defining Storm components, streams, and topology configuration in a YAML file. This file can then be used with the Flux tool to locally run or remotely deploy the topology. Flux aims to reduce duplication of effort by separating configuration from code and allowing topologies to be deployed by submitting a single YAML file without recompilation. It supports Storm core APIs as well as existing topology code and components from other projects like Storm-Kafka.
2. About me…
• VP - Apache Storm
• ASF Member
• Member of Technical Staff, Hortonworks
3. What is Flux?
• An easier way to configure and deploy Apache Storm topologies
• A YAML DSL for defining and configuring Storm topologies
• And more…
6. What’s wrong here?
public static void main(String[] args) throws Exception {
    String name = "myTopology";
    StormTopology topology = getTopology();
    // logic to determine if we're running locally or not…
    boolean runLocal = shouldRunLocal();
    // create necessary config options…
    Config conf = new Config();
    if (runLocal) {
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology(name, conf, topology);
    } else {
        StormSubmitter.submitTopology(name, conf, topology);
    }
}
7. What’s wrong here?
public static void main(String[] args) throws Exception {
    String name = "myTopology";
    StormTopology topology = getTopology();
    // logic to determine if we're running locally or not…
    boolean runLocal = shouldRunLocal();
    // create necessary config options…
    Config conf = new Config();
    if (runLocal) {
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology(name, conf, topology);
    } else {
        StormSubmitter.submitTopology(name, conf, topology);
    }
}
• Configuration tightly coupled with code.
• Changes require recompilation & repackaging.
8. Wouldn’t this be easier?
storm jar mycomponents.jar org.apache.storm.flux.Flux --local config.yaml
OR…
storm jar mycomponents.jar org.apache.storm.flux.Flux --remote config.yaml
9. Flux allows you to package all your Storm components once.
Then wire, configure, and deploy topologies using a YAML
definition.
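Pulling those pieces together, a complete definition can be as small as the sketch below (the spout, bolt, and grouping details are illustrative, assembled from the fragments shown elsewhere in this deck):

```yaml
# minimal end-to-end topology definition (illustrative)
name: "word-count"
config:
  topology.workers: 1
spouts:
  - id: "sentence-spout"
    className: "org.apache.storm.flux.wrappers.spouts.FluxShellSpout"
    constructorArgs:
      - ["node", "randomsentence.js"]
      - ["word"]
    parallelism: 1
bolts:
  - id: "count-bolt"
    className: "backtype.storm.testing.TestWordCounter"
    parallelism: 1
streams:
  - from: "sentence-spout"
    to: "count-bolt"
    grouping:
      type: FIELDS
      args: ["word"]
```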
10. Flux Features
• Easily configure and deploy Storm topologies (both the Storm core and micro-batch
APIs) without embedding configuration in your topology code
• Support for existing topology code
• Define Storm Core API (Spouts/Bolts) using a flexible YAML DSL
• YAML DSL support for virtually any Storm component (storm-kafka, storm-hdfs,
storm-hbase, etc.)
• Convenient support for multi-lang components
• External property substitution/filtering for easily switching between
configurations/environments (similar to Maven-style ${variable.name}
substitution)
13. Config
A Map-of-Maps (Objects) that will be passed to the topology at
submission time (Storm config).
# topology name
name: "myTopology"
# topology configuration
config:
  topology.workers: 5
  topology.max.spout.pending: 1000
  # ...
14. Components
• Catalog (list/map) of Objects that can be used/referenced in other
parts of the YAML configuration
• Roughly analogous to Spring beans.
15. Components
Simple Java class with default constructor:
# Components
components:
  - id: "stringScheme"
    className: "storm.kafka.StringScheme"
16. Components: Constructor Arguments
Component classes can be instantiated with "constructorArgs" (a list of
class constructor arguments):
# Components
components:
  - id: "zkHosts"
    className: "storm.kafka.ZkHosts"
    constructorArgs:
      - "localhost:2181"
17. Components: References
Components can be “referenced” throughout the YAML config and
used as arguments:
# Components
components:
  - id: "stringScheme"
    className: "storm.kafka.StringScheme"
  - id: "stringMultiScheme"
    className: "backtype.storm.spout.SchemeAsMultiScheme"
    constructorArgs:
      - ref: "stringScheme"
18. Components: Properties
Components can be configured using JavaBean setter methods and
public instance variables:
- id: "spoutConfig"
  className: "storm.kafka.SpoutConfig"
  properties:
    - name: "forceFromStart"
      value: true
    - name: "scheme"
      ref: "stringMultiScheme"
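Under the hood, each `properties` entry amounts to a reflective JavaBean setter call. The following stdlib-only sketch illustrates the idea; the `Bean` class and `applyProperty` helper are hypothetical and are not Flux's actual implementation:

```java
import java.lang.reflect.Method;

// Hypothetical bean standing in for a Storm component such as SpoutConfig.
class Bean {
    private boolean forceFromStart;
    public void setForceFromStart(boolean b) { this.forceFromStart = b; }
    public boolean isForceFromStart() { return forceFromStart; }
}

public class PropertySketch {
    // Map a YAML property name to its setter, e.g.
    // "forceFromStart" -> setForceFromStart(value), and invoke it.
    static void applyProperty(Object target, String name, Object value) throws Exception {
        String setter = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
        for (Method m : target.getClass().getMethods()) {
            if (m.getName().equals(setter) && m.getParameterCount() == 1) {
                m.invoke(target, value);
                return;
            }
        }
        throw new IllegalArgumentException("no setter for property: " + name);
    }

    public static void main(String[] args) throws Exception {
        Bean bean = new Bean();
        applyProperty(bean, "forceFromStart", true);
        System.out.println(bean.isForceFromStart()); // prints "true"
    }
}
```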
19. Components: Config Methods
Call arbitrary methods to configure a component:
- id: "recordFormat"
  className: "org.apache.storm.hdfs.bolt.format.DelimitedRecordFormat"
  configMethods:
    - name: "withFieldDelimiter"
      args: ["|"]
References can be used here as well.
20. Spouts
A list of objects that implement the IRichSpout interface, each with an
associated parallelism setting.
# spout definitions
spouts:
  - id: "sentence-spout"
    className: "org.apache.storm.flux.wrappers.spouts.FluxShellSpout"
    # shell spout constructor takes 2 arguments: String[], String[]
    constructorArgs:
      # command line
      - ["node", "randomsentence.js"]
      # output fields
      - ["word"]
    parallelism: 1
    # ...
21. Bolts
A list of objects that implement the IRichBolt or IBasicBolt interface with
an associated parallelism setting.
# bolt definitions
bolts:
  - id: "splitsentence"
    className: "org.apache.storm.flux.wrappers.bolts.FluxShellBolt"
    constructorArgs:
      # command line
      - ["python", "splitsentence.py"]
      # output fields
      - ["word"]
    parallelism: 1
    # ...
  - id: "count"
    className: "backtype.storm.testing.TestWordCounter"
    parallelism: 1
    # ...
22. Spout and Bolt definitions are just extensions of
“Component” with a “parallelism” attribute, so all
component features (references, constructor
args, properties, config methods) can be used.
23. Streams
• Represent Spout-to-Bolt and Bolt-to-Bolt connections
• In graph terms: “edges”
• Also define Stream Groupings:
• ALL, CUSTOM, DIRECT, SHUFFLE, LOCAL_OR_SHUFFLE,
FIELDS, GLOBAL, or NONE.
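For the common cases, a stream definition just names its endpoints and a grouping type. A sketch of SHUFFLE and FIELDS groupings (the component ids are illustrative):

```yaml
# stream definitions
streams:
  - name: "spout --> splitter" # name is optional
    from: "sentence-spout"
    to: "splitsentence"
    grouping:
      type: SHUFFLE
  - from: "splitsentence"
    to: "count"
    grouping:
      type: FIELDS
      args: ["word"]
```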
24. Streams
Custom stream grouping:
- from: "bolt-1"
  to: "bolt-2"
  grouping:
    type: CUSTOM
    customClass:
      className: "backtype.storm.testing.NGrouping"
      constructorArgs:
        - 1
Again, you can use references, properties, and config methods.
25. Filtering/Variable Substitution
Define properties in an external properties file, and reference them in
YAML using ${} syntax:
- id: "rotationAction"
  className: "org.apache.storm.hdfs.common.rotation.MoveFileAction"
  configMethods:
    - name: "toDestination"
      args: ["${hdfs.dest.dir}"]
The placeholder is replaced with the property's value prior to YAML parsing.
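The substitution itself is plain text replacement applied before the YAML ever reaches the parser. A minimal stdlib-only sketch of the idea (the `FilterSketch` class is hypothetical, not Flux's actual code):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FilterSketch {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replace ${key} tokens in raw YAML text with values from a properties map,
    // leaving unknown keys untouched. Mimics a pre-parse filtering pass.
    static String substitute(String yaml, Map<String, String> props) {
        Matcher m = VAR.matcher(yaml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String val = props.getOrDefault(m.group(1), m.group(0));
            m.appendReplacement(out, Matcher.quoteReplacement(val));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("hdfs.dest.dir", "/data/out");
        System.out.println(substitute("args: [\"${hdfs.dest.dir}\"]", props));
        // prints: args: ["/data/out"]
    }
}
```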
26. Filtering/Variable Substitution
Environment variables can be referenced in YAML using ${ENV-[NAME]}
syntax:
- id: "rotationAction"
  className: "org.apache.storm.hdfs.common.rotation.MoveFileAction"
  configMethods:
    - name: "toDestination"
      args: ["${ENV-HDFS_DIR}"]
The placeholder is replaced with the value of the $HDFS_DIR environment variable prior to YAML parsing.
27. File Includes and Overrides
Include files/classpath resources and optionally override values:
name: "include-topology"
includes:
  - resource: true
    file: "/configs/shell_test.yaml"
    override: false # otherwise subsequent includes that define 'name' would override it
29. Existing Topologies
• Alternative to YAML Spout/Bolt/Stream DSL
• Same syntax
• Works with transactional/micro-batch (Trident) topologies
• Tell Flux about the class that will produce your topology
• Components, references, constructor args, properties, config
methods, etc. can all be used
30. Existing Topologies
Provide a class with a public method that returns a StormTopology
instance:
/**
 * Marker interface for objects that can produce `StormTopology` objects.
 *
 * If a `topology-source` class implements the `getTopology()` method, Flux will
 * call that method. Otherwise, it will introspect the given class and look for a
 * similar method that produces a `StormTopology` instance.
 *
 * Note that it is not strictly necessary for a class to implement this interface.
 * If a class defines a method with a similar signature, Flux should be able to find
 * and invoke it.
 */
public interface TopologySource {
    StormTopology getTopology(Map<String, Object> config);
}
This can be a Spout/Bolt or Trident topology.
31. Existing Topologies
Define a topologySource to tell Flux how to configure the class that
creates the topology:
# configuration that uses an existing topology that does not implement TopologySource
name: "existing-topology"
topologySource:
  className: "org.apache.storm.flux.test.SimpleTopology"
  methodName: "getTopologyWithDifferentMethodName"
  constructorArgs:
    - "foo"
    - "bar"
Components, references, constructor args, properties,
config methods, etc. can all be used.
32. Flux Usage
• Add the Flux dependency to your project.
• Use the Maven shade plugin to create a fat jar file.
• Use the `storm` command to run (locally) or deploy (remotely) your
topology:
storm jar mycomponents.jar org.apache.storm.flux.Flux [options] <config file>
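A typical shade plugin configuration for step two might look like the sketch below (adjust the details to your project; the transformers merge service files and set Flux as the default main class):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <transformers>
      <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
      <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
        <mainClass>org.apache.storm.flux.Flux</mainClass>
      </transformer>
    </transformers>
  </configuration>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```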
33. Flux Usage: Command Line Options
usage: storm jar <my_topology_uber_jar.jar> org.apache.storm.flux.Flux [options] <topology-config.yaml>
 -d,--dry-run               Do not run or deploy the topology. Just build, validate, and print information about the topology.
 -e,--env-filter            Perform environment variable substitution. Keys identified with `${ENV-[NAME]}` will be replaced with the value of the corresponding `NAME` environment variable.
 -f,--filter <file>         Perform property substitution. Use the specified file as a source of properties, and replace keys identified with `${property.name}` with the value defined in the properties file.
 -i,--inactive              Deploy the topology, but do not activate it.
 -l,--local                 Run the topology in local mode.
 -n,--no-splash             Suppress the printing of the splash screen.
 -q,--no-detail             Suppress the printing of topology details.
 -r,--remote                Deploy the topology to a remote cluster.
 -R,--resource              Treat the supplied path as a classpath resource instead of a file.
 -s,--sleep <ms>            When running locally, the amount of time to sleep (in ms) before killing the topology and shutting down the local cluster.
 -z,--zookeeper <host:port> When running in local mode, use the ZooKeeper at the specified <host>:<port> instead of the in-process ZooKeeper (requires Storm 0.9.3 or later).
34. With great power comes great responsibility.
It’s up to you to avoid shooting yourself in the foot!