███████╗██╗     ██╗   ██╗██╗  ██╗
██╔════╝██║     ██║   ██║╚██╗██╔╝
█████╗  ██║     ██║   ██║ ╚███╔╝
██╔══╝  ██║     ██║   ██║ ██╔██╗
██║     ███████╗╚██████╔╝██╔╝ ██╗
╚═╝     ╚══════╝ ╚═════╝ ╚═╝  ╚═╝
Apache Storm
Frictionless Topology Configuration & Deployment
P. Taylor Goetz, Hortonworks
@ptgoetz
Storm BoF - Hadoop Summit Brussels 2015
About me…
• VP - Apache Storm
• ASF Member
• Member of Technical Staff, Hortonworks
What is Flux?
• An easier way to configure and deploy Apache Storm topologies
• A YAML DSL for defining and configuring Storm topologies
• And more…
Why Flux?
Because seeing duplication of
effort makes me sad…
What’s wrong here?
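import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.StormSubmitter;
import backtype.storm.generated.StormTopology;

// getTopology() and shouldRunLocal() are application-specific helpers (not shown).
public static void main(String[] args) throws Exception {
    String name = "myTopology";
    StormTopology topology = getTopology();

    // logic to determine if we're running locally or not…
    boolean runLocal = shouldRunLocal();

    // create necessary config options…
    Config conf = new Config();

    if (runLocal) {
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology(name, conf, topology);
    } else {
        StormSubmitter.submitTopology(name, conf, topology);
    }
}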
• Configuration tightly coupled with code.
• Changes require recompilation & repackaging.
Wouldn’t this be easier?
storm jar mycomponents.jar org.apache.storm.flux.Flux --local config.yaml
OR…
storm jar mycomponents.jar org.apache.storm.flux.Flux --remote config.yaml
Flux allows you to package all your Storm components once.
Then wire, configure, and deploy topologies using a YAML
definition.
Flux Features
• Easily configure and deploy Storm topologies (both the Storm Core and micro-batch/Trident APIs) without embedding configuration in your topology code
• Support for existing topology code
• Define Storm Core API topologies (spouts/bolts) using a flexible YAML DSL
• YAML DSL support for virtually any Storm component (storm-kafka, storm-hdfs,
storm-hbase, etc.)
• Convenient support for multi-lang components
• External property substitution/filtering for easily switching between
configurations/environments (similar to Maven-style ${variable.name}
substitution)
Flux YAML DSL
YAML Definition Consists of:
• Topology Name (1)
• Includes (0…*)
• Config Map (0…1)
• Components (0…*)
• Spouts (1…*)
• Bolts (1…*)
• Streams (1…*)
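To make the cardinalities concrete, here is a minimal sketch that models a definition as a plain Java object and checks them. `TopologyDef` and `validate` are hypothetical names for illustration, not Flux's own classes, and the check covers only the spout/bolt/stream DSL form (a definition that points at an existing topology class would not need spouts, bolts, or streams).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of a Flux YAML definition, used only to illustrate
// the cardinalities listed on the slide.
class TopologyDef {
    String name;                                  // exactly one
    List<String> includes = new ArrayList<>();    // zero or more
    Map<String, Object> config;                   // zero or one (null if absent)
    List<String> components = new ArrayList<>();  // zero or more (ids)
    List<String> spouts = new ArrayList<>();      // one or more (ids)
    List<String> bolts = new ArrayList<>();       // one or more (ids)
    List<String> streams = new ArrayList<>();     // one or more

    /** Returns the list of problems found; an empty list means the definition passes. */
    List<String> validate() {
        List<String> errors = new ArrayList<>();
        if (name == null || name.isEmpty()) errors.add("name is required");
        if (spouts.isEmpty()) errors.add("at least one spout is required");
        if (bolts.isEmpty()) errors.add("at least one bolt is required");
        if (streams.isEmpty()) errors.add("at least one stream is required");
        return errors;
    }

    public static void main(String[] args) {
        TopologyDef def = new TopologyDef();
        def.name = "myTopology";
        def.spouts.add("sentence-spout");
        // prints [at least one bolt is required, at least one stream is required]
        System.out.println(def.validate());
    }
}
```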
Flux YAML DSL
Config
A Map-of-Maps (Objects) that will be passed to the topology at
submission time (Storm config).
# topology name
name: "myTopology"

# topology configuration
config:
  topology.workers: 5
  topology.max.spout.pending: 1000
  # ...
Components
• Catalog (list/map) of Objects that can be used/referenced in other
parts of the YAML configuration
• Roughly analogous to Spring beans.
Components
Simple Java class with default constructor:
# Components
components:
  - id: "stringScheme"
    className: "storm.kafka.StringScheme"
Components: Constructor Arguments
Component classes can be instantiated with “constructorArgs” (a list of class constructor arguments):
# Components
components:
  - id: "zkHosts"
    className: "storm.kafka.ZkHosts"
    constructorArgs:
      - "localhost:2181"
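As a rough illustration of what "instantiate with constructorArgs" involves, the sketch below loads a class by name and picks a public constructor whose parameter types accept the given arguments. The matching rule (first compatible constructor wins) is an assumption of this sketch, not necessarily how Flux chooses, and `java.lang.StringBuilder` stands in for `storm.kafka.ZkHosts` so the example runs without Storm on the classpath.

```java
import java.lang.reflect.Constructor;
import java.util.List;

// Sketch: instantiating a component from `className` and optional
// `constructorArgs`. Assumption: the first public constructor whose parameter
// list accepts the arguments is used.
class ComponentFactory {

    static Object create(String className, List<Object> constructorArgs) {
        try {
            Class<?> clazz = Class.forName(className);
            if (constructorArgs.isEmpty()) {
                // No constructorArgs: use the public no-arg (default) constructor.
                return clazz.getConstructor().newInstance();
            }
            for (Constructor<?> ctor : clazz.getConstructors()) {
                if (accepts(ctor.getParameterTypes(), constructorArgs)) {
                    return ctor.newInstance(constructorArgs.toArray());
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
        throw new IllegalArgumentException("No matching constructor on " + className);
    }

    // True if each argument can be passed to the parameter at the same position.
    private static boolean accepts(Class<?>[] params, List<Object> args) {
        if (params.length != args.size()) return false;
        for (int i = 0; i < params.length; i++) {
            if (!wrap(params[i]).isInstance(args.get(i))) return false;
        }
        return true;
    }

    // Maps a few primitive types to their wrappers; other types are returned unchanged.
    private static Class<?> wrap(Class<?> t) {
        if (t == int.class) return Integer.class;
        if (t == long.class) return Long.class;
        if (t == boolean.class) return Boolean.class;
        if (t == double.class) return Double.class;
        return t;
    }

    public static void main(String[] args) {
        // java.lang.StringBuilder stands in for a class like storm.kafka.ZkHosts.
        Object sb = create("java.lang.StringBuilder", List.of("localhost:2181"));
        System.out.println(sb); // localhost:2181
    }
}
```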
Components: References
Components can be “referenced” throughout the YAML config and
used as arguments:
# Components
components:
  - id: "stringScheme"
    className: "storm.kafka.StringScheme"

  - id: "stringMultiScheme"
    className: "backtype.storm.spout.SchemeAsMultiScheme"
    constructorArgs:
      - ref: "stringScheme"
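One way to picture references is a catalog keyed by component id, with `ref` arguments swapped for the already-built objects before they are passed on. This is a minimal sketch; `ComponentRegistry` and `Ref` are hypothetical names, and rejecting duplicate ids is this sketch's own rule.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: a catalog of components keyed by id, and resolution of `ref`
// arguments against it. `Ref` and `ComponentRegistry` are hypothetical names.
class ComponentRegistry {

    /** Stands in for a YAML argument written as `- ref: "someId"`. */
    static final class Ref {
        final String id;
        Ref(String id) { this.id = id; }
    }

    private final Map<String, Object> byId = new HashMap<>();

    /** Adds a built component; this sketch rejects duplicate ids. */
    void register(String id, Object component) {
        if (byId.putIfAbsent(id, component) != null) {
            throw new IllegalStateException("Duplicate component id: " + id);
        }
    }

    /** Returns a copy of args with every Ref replaced by the component it names. */
    List<Object> resolve(List<Object> args) {
        List<Object> resolved = new ArrayList<>();
        for (Object arg : args) {
            if (arg instanceof Ref) {
                String id = ((Ref) arg).id;
                Object target = byId.get(id);
                if (target == null) throw new IllegalArgumentException("Unknown ref: " + id);
                resolved.add(target);
            } else {
                resolved.add(arg); // literal values pass through unchanged
            }
        }
        return resolved;
    }
}
```

The resolved list can then be handed to whatever code invokes the constructor, so `SchemeAsMultiScheme` receives the actual `StringScheme` instance rather than its id.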
Components: Properties
Components can be configured using JavaBean setter methods and
public instance variables:
- id: "spoutConfig"
  className: "storm.kafka.SpoutConfig"
  properties:
    - name: "forceFromStart"
      value: true
    - name: "scheme"
      ref: "stringMultiScheme"
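The "setter or public field" rule can be sketched with plain reflection: look for `setName(...)` first and fall back to a public field called `name`. This is an illustrative assumption of the lookup order, and `DemoSpoutConfig` is a hypothetical stand-in for `storm.kafka.SpoutConfig`.

```java
import java.lang.reflect.Method;

// Sketch: applying one `properties` entry (name + value) to an object. A JavaBean
// setter (setName) is preferred; otherwise a public field of that name is set.
// Overloaded setters are not disambiguated: the first one-argument match is used.
class PropertyApplier {

    static void apply(Object target, String name, Object value) {
        String setter = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
        try {
            for (Method m : target.getClass().getMethods()) {
                if (m.getName().equals(setter) && m.getParameterCount() == 1) {
                    m.invoke(target, value);
                    return;
                }
            }
            // No setter found: fall back to a public field with the same name.
            target.getClass().getField(name).set(target, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot set property " + name, e);
        }
    }

    /** Hypothetical stand-in for a class such as storm.kafka.SpoutConfig. */
    public static class DemoSpoutConfig {
        public boolean forceFromStart;   // no setter, so it is set as a public field
        private Object scheme;
        public void setScheme(Object scheme) { this.scheme = scheme; }
        public Object getScheme() { return scheme; }
    }
}
```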
Components: Config Methods
Call arbitrary methods to configure a component:
- id: "recordFormat"
  className: "org.apache.storm.hdfs.bolt.format.DelimitedRecordFormat"
  configMethods:
    - name: "withFieldDelimiter"
      args: ["|"]
References can be used here as well.
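A config method entry boils down to "call the method with this name and these arguments on the component". The sketch below matches on name and argument count only, which is a simplification, and it discards the return value, so it assumes the method configures the object in place (as a fluent `withX` that returns `this` does). `DemoRecordFormat` is a hypothetical stand-in for `DelimitedRecordFormat`.

```java
import java.lang.reflect.Method;
import java.util.List;

// Sketch: invoking a configMethods entry (method name + args) on a component.
// Matching is by name and argument count only; the return value is ignored.
class ConfigMethodInvoker {

    static void invoke(Object target, String methodName, List<Object> args) {
        for (Method m : target.getClass().getMethods()) {
            if (m.getName().equals(methodName) && m.getParameterCount() == args.size()) {
                try {
                    m.invoke(target, args.toArray());
                    return;
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Failed to call " + methodName, e);
                }
            }
        }
        throw new IllegalArgumentException(
                "No method " + methodName + " taking " + args.size() + " args");
    }

    /** Hypothetical stand-in for a fluent class such as DelimitedRecordFormat. */
    public static class DemoRecordFormat {
        private String delimiter = ",";
        public DemoRecordFormat withFieldDelimiter(String d) { this.delimiter = d; return this; }
        public String format(List<String> fields) { return String.join(delimiter, fields); }
    }
}
```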
Spouts
A list of objects that implement the IRichSpout interface and an
associated parallelism setting.
# spout definitions
spouts:
  - id: "sentence-spout"
    className: "org.apache.storm.flux.wrappers.spouts.FluxShellSpout"
    # shell spout constructor takes 2 arguments: String[], String[]
    constructorArgs:
      # command line
      - ["node", "randomsentence.js"]
      # output fields
      - ["word"]
    parallelism: 1
    # ...
Bolts
A list of objects that implement the IRichBolt or IBasicBolt interface with
an associated parallelism setting.
# bolt definitions
bolts:
  - id: "splitsentence"
    className: "org.apache.storm.flux.wrappers.bolts.FluxShellBolt"
    constructorArgs:
      # command line
      - ["python", "splitsentence.py"]
      # output fields
      - ["word"]
    parallelism: 1
    # ...

  - id: "count"
    className: "backtype.storm.testing.TestWordCounter"
    parallelism: 1
    # ...
Spout and Bolt definitions are just extensions of
“Component” with a “parallelism” attribute, so all
component features (references, constructor
args, properties, config methods) can be used.
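The "spout/bolt = component + parallelism" relationship maps naturally onto a small class hierarchy. This is a hypothetical sketch (`ComponentDef` and `VertexDef` are illustrative names); the check that parallelism is at least 1 is this sketch's own choice, not a documented Flux rule.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of "a spout or bolt is a component plus parallelism" as a class
// hierarchy. ComponentDef and VertexDef are hypothetical names.
class VertexDefs {

    /** Fields shared by every component definition (properties etc. omitted). */
    static class ComponentDef {
        final String id;
        final String className;
        final List<Object> constructorArgs = new ArrayList<>();

        ComponentDef(String id, String className) {
            this.id = id;
            this.className = className;
        }
    }

    /** A spout or bolt definition adds only a parallelism hint. */
    static class VertexDef extends ComponentDef {
        final int parallelism;

        VertexDef(String id, String className, int parallelism) {
            super(id, className);
            // This sketch rejects non-positive values.
            if (parallelism < 1) throw new IllegalArgumentException("parallelism must be >= 1");
            this.parallelism = parallelism;
        }
    }
}
```

Because `VertexDef` inherits everything from `ComponentDef`, any code that builds components (constructor args, references, properties, config methods) can build spouts and bolts unchanged.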
Streams
• Represent Spout-to-Bolt and Bolt-to-Bolt connections
• In graph terms: “edges”
• Also define Stream Groupings:
• ALL, CUSTOM, DIRECT, SHUFFLE, LOCAL_OR_SHUFFLE,
FIELDS, GLOBAL, or NONE.
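In graph terms, streams are directed edges between spout and bolt ids, each tagged with one of the grouping types above. The sketch below is hypothetical (`StreamDefs`, `StreamDef`, `validate`) and omits grouping arguments such as field names for FIELDS or the class for CUSTOM; it only checks that each edge starts at a known spout or bolt and ends at a bolt.

```java
import java.util.List;
import java.util.Set;

// Sketch: streams as directed edges between spout/bolt ids, each with a
// grouping type. Names are hypothetical; the grouping values mirror the slide.
class StreamDefs {

    enum GroupingType { ALL, CUSTOM, DIRECT, SHUFFLE, LOCAL_OR_SHUFFLE, FIELDS, GLOBAL, NONE }

    static final class StreamDef {
        final String from;
        final String to;
        final GroupingType grouping;

        StreamDef(String from, String to, GroupingType grouping) {
            this.from = from;
            this.to = to;
            this.grouping = grouping;
        }
    }

    /** Spout-to-bolt and bolt-to-bolt only: sources must exist, targets must be bolts. */
    static void validate(List<StreamDef> streams, Set<String> spoutIds, Set<String> boltIds) {
        for (StreamDef s : streams) {
            if (!spoutIds.contains(s.from) && !boltIds.contains(s.from)) {
                throw new IllegalArgumentException("Unknown source: " + s.from);
            }
            if (!boltIds.contains(s.to)) {
                throw new IllegalArgumentException("Target must be a bolt: " + s.to);
            }
        }
    }
}
```

A YAML value such as `type: SHUFFLE` could be mapped onto the enum with `GroupingType.valueOf("SHUFFLE")`.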
Streams
Custom stream grouping:
- from: "bolt-1"
  to: "bolt-2"
  grouping:
    type: CUSTOM
    customClass:
      className: "backtype.storm.testing.NGrouping"
      constructorArgs:
        - 1
Again, you can use references, properties, and config methods.
Filtering/Variable Substitution
Define properties in an external properties file, and reference them in YAML using ${} syntax:
- id: "rotationAction"
  className: "org.apache.storm.hdfs.common.rotation.MoveFileAction"
  configMethods:
    - name: "toDestination"
      args: ["${hdfs.dest.dir}"]
When Flux is run with `--filter <file>`, the placeholder is replaced with the property's value before the YAML is parsed.
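Because substitution happens on the raw text before parsing, it can be sketched as a simple regex pass over the YAML string using `java.util.Properties`. This is an illustrative sketch, not Flux's implementation; leaving unknown placeholders untouched is an assumption.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: replacing ${property.name} placeholders in raw YAML text with values
// from a Properties object, before the text is handed to a YAML parser.
// Assumption: placeholders with no matching property are left untouched.
class PropertyFilter {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    static String filter(String yaml, Properties props) {
        Matcher m = PLACEHOLDER.matcher(yaml);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = props.getProperty(m.group(1));
            String replacement = (value != null) ? value : m.group(0);
            // quoteReplacement stops '$' and '\' in values from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```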
Filtering/Variable Substitution
Environment variables can be referenced in YAML using ${ENV-} syntax:
- id: "rotationAction"
  className: "org.apache.storm.hdfs.common.rotation.MoveFileAction"
  configMethods:
    - name: "toDestination"
      args: ["${ENV-HDFS_DIR}"]
When Flux is run with `--env-filter`, the placeholder is replaced with the value of the $HDFS_DIR environment variable before the YAML is parsed.
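The `${ENV-NAME}` form can be handled the same way, looking names up in the process environment. In this sketch the environment is passed in as a `Map` so the logic is testable; a real caller would pass `System.getenv()`. Leaving unset variables untouched is an assumption of the sketch.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: replacing ${ENV-NAME} placeholders with environment variable values.
// The environment is a parameter so the logic can be tested; unset variables
// are left as-is (an assumption of this sketch).
class EnvFilter {

    private static final Pattern ENV_PLACEHOLDER = Pattern.compile("\\$\\{ENV-([^}]+)\\}");

    static String filter(String yaml, Map<String, String> env) {
        Matcher m = ENV_PLACEHOLDER.matcher(yaml);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = env.get(m.group(1));
            m.appendReplacement(out, Matcher.quoteReplacement(value != null ? value : m.group(0)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(filter("args: [\"${ENV-HDFS_DIR}\"]", System.getenv()));
    }
}
```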
File Includes and Overrides
Include files/classpath resources and optionally override values:
name: "include-topology"
includes:
  - resource: true
    file: "/configs/shell_test.yaml"
    override: false # otherwise subsequent includes that define 'name' would override
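The effect of `override` can be sketched as a top-level map merge. This sketch assumes that `override: true` lets the included file's values replace values already defined, while `override: false` keeps existing values and only fills gaps, consistent with the comment above; nested structures are not merged, and `IncludeMerger` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of top-level include merging under an assumed reading of `override`:
//   true  -> values from the included file replace values already defined
//   false -> values already defined win; the include only adds missing keys
// Only top-level keys are merged; nested maps are replaced as whole values.
class IncludeMerger {

    static Map<String, Object> merge(Map<String, Object> current,
                                     Map<String, Object> included,
                                     boolean override) {
        Map<String, Object> result = new LinkedHashMap<>(current);
        for (Map.Entry<String, Object> e : included.entrySet()) {
            if (override) {
                result.put(e.getKey(), e.getValue());
            } else {
                result.putIfAbsent(e.getKey(), e.getValue());
            }
        }
        return result;
    }
}
```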
Existing Topologies & Trident Topologies
Existing Topologies
• Alternative to YAML Spout/Bolt/Stream DSL
• Uses the same YAML syntax
• Works with transactional/micro-batch (Trident) topologies
• Tell Flux about the class that will produce your topology
• Components, references, constructor args, properties, config
methods, etc. can all be used
Existing Topologies
Provide a class with a public method that returns a StormTopology
instance:
import java.util.Map;

import backtype.storm.generated.StormTopology;

/**
 * Marker interface for objects that can produce `StormTopology` objects.
 *
 * If a `topology-source` class implements the `getTopology()` method, Flux will
 * call that method. Otherwise, it will introspect the given class and look for a
 * similar method that produces a `StormTopology` instance.
 *
 * Note that it is not strictly necessary for a class to implement this interface.
 * If a class defines a method with a similar signature, Flux should be able to find
 * and invoke it.
 */
public interface TopologySource {
    StormTopology getTopology(Map<String, Object> config);
}
This can be a Spout/Bolt or Trident topology.
Existing Topologies
Define a topologySource to tell Flux how to configure the class that
creates the topology:
# configuration that uses an existing topology that does not implement TopologySource
name: "existing-topology"
topologySource:
  className: "org.apache.storm.flux.test.SimpleTopology"
  methodName: "getTopologyWithDifferentMethodName"
  constructorArgs:
    - "foo"
    - "bar"
Components, references, constructor args, properties,
config methods, etc. can all be used.
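The lookup described above (use `methodName` if given, otherwise introspect for a method that produces a topology) can be sketched with reflection. `Topology` is a hypothetical stand-in for `backtype.storm.generated.StormTopology` so the example runs without Storm, `SimpleTopology` here is an illustrative class rather than the one in Flux's tests, and "first public method returning a topology" is this sketch's assumed fallback rule.

```java
import java.lang.reflect.Method;
import java.util.Map;

// Sketch of locating the method that produces a topology. If methodName is
// given it is used; otherwise the first public method returning Topology is.
// Methods may take the config Map or no arguments.
class TopologySourceLoader {

    /** Hypothetical stand-in for StormTopology. */
    static final class Topology {
        final String description;
        Topology(String description) { this.description = description; }
    }

    static Topology load(Object source, String methodName, Map<String, Object> config) {
        for (Method m : source.getClass().getMethods()) {
            boolean nameOk = methodName == null || m.getName().equals(methodName);
            if (!nameOk || m.getReturnType() != Topology.class) continue;
            Class<?>[] params = m.getParameterTypes();
            try {
                if (params.length == 1 && params[0] == Map.class) {
                    return (Topology) m.invoke(source, config); // pass the Storm config
                }
                if (params.length == 0) {
                    return (Topology) m.invoke(source);
                }
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Could not invoke " + m.getName(), e);
            }
        }
        throw new IllegalArgumentException("No method returning a topology found");
    }

    /** Hypothetical existing class that implements no marker interface. */
    public static class SimpleTopology {
        private final String a;
        private final String b;
        public SimpleTopology(String a, String b) { this.a = a; this.b = b; }
        public Topology getTopologyWithDifferentMethodName(Map<String, Object> config) {
            return new Topology(a + "/" + b + " workers=" + config.get("topology.workers"));
        }
    }

    public static void main(String[] args) {
        // For brevity the source object is constructed directly with "foo", "bar".
        Object source = new SimpleTopology("foo", "bar");
        Topology t = load(source, "getTopologyWithDifferentMethodName",
                Map.<String, Object>of("topology.workers", 5));
        System.out.println(t.description); // foo/bar workers=5
    }
}
```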
Flux Usage
• Add the Flux dependency to your project.
• Use the Maven shade plugin to create a fat jar file.
• Use the `storm` command to run (locally) or deploy (remotely) your
topology:
storm jar mycomponents.jar org.apache.storm.flux.Flux [options] <config file>
Flux Usage: Command Line Options
usage: storm jar <my_topology_uber_jar.jar> org.apache.storm.flux.Flux [options] <topology-config.yaml>
 -d,--dry-run                 Do not run or deploy the topology. Just build, validate, and print
                              information about the topology.
 -e,--env-filter              Perform environment variable substitution. Keys identified with
                              `${ENV-[NAME]}` are replaced with the value of the corresponding
                              `NAME` environment variable.
 -f,--filter <file>           Perform property substitution. Use the specified file as a source of
                              properties, and replace keys identified with `${[property name]}`
                              with the value defined in the properties file.
 -i,--inactive                Deploy the topology, but do not activate it.
 -l,--local                   Run the topology in local mode.
 -n,--no-splash               Suppress the printing of the splash screen.
 -q,--no-detail               Suppress the printing of topology details.
 -r,--remote                  Deploy the topology to a remote cluster.
 -R,--resource                Treat the supplied path as a classpath resource instead of a file.
 -s,--sleep <ms>              When running locally, the amount of time to sleep (in ms) before
                              killing the topology and shutting down the local cluster.
 -z,--zookeeper <host:port>   When running in local mode, use the ZooKeeper at the specified
                              <host>:<port> instead of the in-process ZooKeeper (requires Storm
                              0.9.3 or later).
With great power comes great
responsibility.
It’s up to you to avoid shooting yourself in the foot!
Feedback/Contributions Welcome
Flux on GitHub: https://github.com/ptgoetz/flux
Thank you! AMA…
P. Taylor Goetz, Hortonworks
@ptgoetz

Contenu connexe

Tendances

Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant)
Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant) Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant)
Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant) BigDataEverywhere
 
Kotlin coroutines and spring framework
Kotlin coroutines and spring frameworkKotlin coroutines and spring framework
Kotlin coroutines and spring frameworkSunghyouk Bae
 
Python client api
Python client apiPython client api
Python client apidreampuf
 
Building node.js applications with Database Jones
Building node.js applications with Database JonesBuilding node.js applications with Database Jones
Building node.js applications with Database JonesJohn David Duncan
 
JUnit5 and TestContainers
JUnit5 and TestContainersJUnit5 and TestContainers
JUnit5 and TestContainersSunghyouk Bae
 
Backbone.js: Run your Application Inside The Browser
Backbone.js: Run your Application Inside The BrowserBackbone.js: Run your Application Inside The Browser
Backbone.js: Run your Application Inside The BrowserHoward Lewis Ship
 
Kotlin @ Coupang Backed - JetBrains Day seoul 2018
Kotlin @ Coupang Backed - JetBrains Day seoul 2018Kotlin @ Coupang Backed - JetBrains Day seoul 2018
Kotlin @ Coupang Backed - JetBrains Day seoul 2018Sunghyouk Bae
 
Zabbix LLD from a C Module by Jan-Piet Mens
Zabbix LLD from a C Module by Jan-Piet MensZabbix LLD from a C Module by Jan-Piet Mens
Zabbix LLD from a C Module by Jan-Piet MensNETWAYS
 
Python高级编程(二)
Python高级编程(二)Python高级编程(二)
Python高级编程(二)Qiangning Hong
 
Down to Stack Traces, up from Heap Dumps
Down to Stack Traces, up from Heap DumpsDown to Stack Traces, up from Heap Dumps
Down to Stack Traces, up from Heap DumpsAndrei Pangin
 
SQL Server Select Topics
SQL Server Select TopicsSQL Server Select Topics
SQL Server Select TopicsJay Coskey
 
Python 3.6 Features 20161207
Python 3.6 Features 20161207Python 3.6 Features 20161207
Python 3.6 Features 20161207Jay Coskey
 
XQuery Extensions
XQuery ExtensionsXQuery Extensions
XQuery ExtensionsAaron Buma
 
Developing for Node.JS with MySQL and NoSQL
Developing for Node.JS with MySQL and NoSQLDeveloping for Node.JS with MySQL and NoSQL
Developing for Node.JS with MySQL and NoSQLJohn David Duncan
 
Scala ActiveRecord
Scala ActiveRecordScala ActiveRecord
Scala ActiveRecordscalaconfjp
 

Tendances (20)

Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant)
Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant) Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant)
Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant)
 
Kotlin coroutines and spring framework
Kotlin coroutines and spring frameworkKotlin coroutines and spring framework
Kotlin coroutines and spring framework
 
Spring data requery
Spring data requerySpring data requery
Spring data requery
 
Python client api
Python client apiPython client api
Python client api
 
Building node.js applications with Database Jones
Building node.js applications with Database JonesBuilding node.js applications with Database Jones
Building node.js applications with Database Jones
 
JUnit5 and TestContainers
JUnit5 and TestContainersJUnit5 and TestContainers
JUnit5 and TestContainers
 
Backbone.js: Run your Application Inside The Browser
Backbone.js: Run your Application Inside The BrowserBackbone.js: Run your Application Inside The Browser
Backbone.js: Run your Application Inside The Browser
 
Kotlin @ Coupang Backed - JetBrains Day seoul 2018
Kotlin @ Coupang Backed - JetBrains Day seoul 2018Kotlin @ Coupang Backed - JetBrains Day seoul 2018
Kotlin @ Coupang Backed - JetBrains Day seoul 2018
 
Scala active record
Scala active recordScala active record
Scala active record
 
Zabbix LLD from a C Module by Jan-Piet Mens
Zabbix LLD from a C Module by Jan-Piet MensZabbix LLD from a C Module by Jan-Piet Mens
Zabbix LLD from a C Module by Jan-Piet Mens
 
Python高级编程(二)
Python高级编程(二)Python高级编程(二)
Python高级编程(二)
 
Down to Stack Traces, up from Heap Dumps
Down to Stack Traces, up from Heap DumpsDown to Stack Traces, up from Heap Dumps
Down to Stack Traces, up from Heap Dumps
 
Java 7 New Features
Java 7 New FeaturesJava 7 New Features
Java 7 New Features
 
SQL Server Select Topics
SQL Server Select TopicsSQL Server Select Topics
SQL Server Select Topics
 
Python 3.6 Features 20161207
Python 3.6 Features 20161207Python 3.6 Features 20161207
Python 3.6 Features 20161207
 
XQuery Extensions
XQuery ExtensionsXQuery Extensions
XQuery Extensions
 
Developing for Node.JS with MySQL and NoSQL
Developing for Node.JS with MySQL and NoSQLDeveloping for Node.JS with MySQL and NoSQL
Developing for Node.JS with MySQL and NoSQL
 
Polyglot Persistence
Polyglot PersistencePolyglot Persistence
Polyglot Persistence
 
XQuery Design Patterns
XQuery Design PatternsXQuery Design Patterns
XQuery Design Patterns
 
Scala ActiveRecord
Scala ActiveRecordScala ActiveRecord
Scala ActiveRecord
 

Similaire à Flux: Apache Storm Frictionless Topology Configuration & Deployment

Terraform at Scale - All Day DevOps 2017
Terraform at Scale - All Day DevOps 2017Terraform at Scale - All Day DevOps 2017
Terraform at Scale - All Day DevOps 2017Jonathon Brouse
 
Comprehensive Terraform Training
Comprehensive Terraform TrainingComprehensive Terraform Training
Comprehensive Terraform TrainingYevgeniy Brikman
 
Infrastructure as code deployed using Stacker
Infrastructure as code deployed using StackerInfrastructure as code deployed using Stacker
Infrastructure as code deployed using StackerMessageMedia
 
Introduction To Apache Mesos
Introduction To Apache MesosIntroduction To Apache Mesos
Introduction To Apache MesosJoe Stein
 
Real-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormReal-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormDavorin Vukelic
 
Introduction to Apache Mesos
Introduction to Apache MesosIntroduction to Apache Mesos
Introduction to Apache MesosJoe Stein
 
Null Bachaav - May 07 Attack Monitoring workshop.
Null Bachaav - May 07 Attack Monitoring workshop.Null Bachaav - May 07 Attack Monitoring workshop.
Null Bachaav - May 07 Attack Monitoring workshop.Prajal Kulkarni
 
Getting started with Clojure
Getting started with ClojureGetting started with Clojure
Getting started with ClojureJohn Stevenson
 
Riak add presentation
Riak add presentationRiak add presentation
Riak add presentationIlya Bogunov
 
Terraform infraestructura como código
Terraform infraestructura como códigoTerraform infraestructura como código
Terraform infraestructura como códigoVictor Adsuar
 
Testing NodeJS with Mocha, Should, Sinon, and JSCoverage
Testing NodeJS with Mocha, Should, Sinon, and JSCoverageTesting NodeJS with Mocha, Should, Sinon, and JSCoverage
Testing NodeJS with Mocha, Should, Sinon, and JSCoveragemlilley
 
DZone Java 8 Block Buster: Query Databases Using Streams
DZone Java 8 Block Buster: Query Databases Using StreamsDZone Java 8 Block Buster: Query Databases Using Streams
DZone Java 8 Block Buster: Query Databases Using StreamsSpeedment, Inc.
 
JavaScript Growing Up
JavaScript Growing UpJavaScript Growing Up
JavaScript Growing UpDavid Padbury
 
Solr As A SparkSQL DataSource
Solr As A SparkSQL DataSourceSolr As A SparkSQL DataSource
Solr As A SparkSQL DataSourceSpark Summit
 

Similaire à Flux: Apache Storm Frictionless Topology Configuration & Deployment (20)

Terraform at Scale - All Day DevOps 2017
Terraform at Scale - All Day DevOps 2017Terraform at Scale - All Day DevOps 2017
Terraform at Scale - All Day DevOps 2017
 
Lobos Introduction
Lobos IntroductionLobos Introduction
Lobos Introduction
 
Comprehensive Terraform Training
Comprehensive Terraform TrainingComprehensive Terraform Training
Comprehensive Terraform Training
 
The Future of Apache Storm
The Future of Apache StormThe Future of Apache Storm
The Future of Apache Storm
 
Infrastructure as code deployed using Stacker
Infrastructure as code deployed using StackerInfrastructure as code deployed using Stacker
Infrastructure as code deployed using Stacker
 
Introduction To Apache Mesos
Introduction To Apache MesosIntroduction To Apache Mesos
Introduction To Apache Mesos
 
Real-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormReal-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache Storm
 
Introduction to Apache Mesos
Introduction to Apache MesosIntroduction to Apache Mesos
Introduction to Apache Mesos
 
Null Bachaav - May 07 Attack Monitoring workshop.
Null Bachaav - May 07 Attack Monitoring workshop.Null Bachaav - May 07 Attack Monitoring workshop.
Null Bachaav - May 07 Attack Monitoring workshop.
 
Apache Storm Tutorial
Apache Storm TutorialApache Storm Tutorial
Apache Storm Tutorial
 
Getting started with Clojure
Getting started with ClojureGetting started with Clojure
Getting started with Clojure
 
Riak add presentation
Riak add presentationRiak add presentation
Riak add presentation
 
Jstl Guide
Jstl GuideJstl Guide
Jstl Guide
 
The Future of Apache Storm
The Future of Apache StormThe Future of Apache Storm
The Future of Apache Storm
 
Terraform infraestructura como código
Terraform infraestructura como códigoTerraform infraestructura como código
Terraform infraestructura como código
 
Testing NodeJS with Mocha, Should, Sinon, and JSCoverage
Testing NodeJS with Mocha, Should, Sinon, and JSCoverageTesting NodeJS with Mocha, Should, Sinon, and JSCoverage
Testing NodeJS with Mocha, Should, Sinon, and JSCoverage
 
DZone Java 8 Block Buster: Query Databases Using Streams
DZone Java 8 Block Buster: Query Databases Using StreamsDZone Java 8 Block Buster: Query Databases Using Streams
DZone Java 8 Block Buster: Query Databases Using Streams
 
Storm
StormStorm
Storm
 
JavaScript Growing Up
JavaScript Growing UpJavaScript Growing Up
JavaScript Growing Up
 
Solr As A SparkSQL DataSource
Solr As A SparkSQL DataSourceSolr As A SparkSQL DataSource
Solr As A SparkSQL DataSource
 

Plus de P. Taylor Goetz

From Device to Data Center to Insights: Architectural Considerations for the ...
From Device to Data Center to Insights: Architectural Considerations for the ...From Device to Data Center to Insights: Architectural Considerations for the ...
From Device to Data Center to Insights: Architectural Considerations for the ...P. Taylor Goetz
 
Past, Present, and Future of Apache Storm
Past, Present, and Future of Apache StormPast, Present, and Future of Apache Storm
Past, Present, and Future of Apache StormP. Taylor Goetz
 
Large Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraphLarge Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraphP. Taylor Goetz
 
The Future of Apache Storm
The Future of Apache StormThe Future of Apache Storm
The Future of Apache StormP. Taylor Goetz
 
Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014P. Taylor Goetz
 
Apache storm vs. Spark Streaming
Apache storm vs. Spark StreamingApache storm vs. Spark Streaming
Apache storm vs. Spark StreamingP. Taylor Goetz
 
Hadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureHadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureP. Taylor Goetz
 
Cassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market SceinceCassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market SceinceP. Taylor Goetz
 

Plus de P. Taylor Goetz (8)

From Device to Data Center to Insights: Architectural Considerations for the ...
From Device to Data Center to Insights: Architectural Considerations for the ...From Device to Data Center to Insights: Architectural Considerations for the ...
From Device to Data Center to Insights: Architectural Considerations for the ...
 
Past, Present, and Future of Apache Storm
Past, Present, and Future of Apache StormPast, Present, and Future of Apache Storm
Past, Present, and Future of Apache Storm
 
Large Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraphLarge Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraph
 
The Future of Apache Storm
The Future of Apache StormThe Future of Apache Storm
The Future of Apache Storm
 
Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014
 
Apache storm vs. Spark Streaming
Apache storm vs. Spark StreamingApache storm vs. Spark Streaming
Apache storm vs. Spark Streaming
 
Hadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureHadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm Architecture
 
Cassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market SceinceCassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market Sceince
 

Dernier

Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 

Dernier (20)

Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Flux: Apache Storm Frictionless Topology Configuration & Deployment

  • 1. FLUX: Apache Storm Frictionless Topology Configuration & Deployment. P. Taylor Goetz, Hortonworks, @ptgoetz. Storm BoF, Hadoop Summit Brussels 2015
  • 2. About me… • VP - Apache Storm • ASF Member • Member of Technical Staff, Hortonworks
  • 3. What is Flux? • An easier way to configure and deploy Apache Storm topologies • A YAML DSL for defining and configuring Storm topologies • And more…
  • 5. Because seeing duplication of effort makes me sad…
  • 7. What’s wrong here?
      public static void main(String[] args) throws Exception {
          String name = "myTopology";
          StormTopology topology = getTopology();
          // logic to determine if we're running locally or not…
          boolean runLocal = shouldRunLocal();
          // create necessary config options…
          Config conf = new Config();
          if (runLocal) {
              LocalCluster cluster = new LocalCluster();
              cluster.submitTopology(name, conf, topology);
          } else {
              StormSubmitter.submitTopology(name, conf, topology);
          }
      }
      • Configuration tightly coupled with code.
      • Changes require recompilation & repackaging.
  • 8. Wouldn’t this be easier?
      storm jar mycomponents.jar org.apache.storm.flux.Flux --local config.yaml
      OR…
      storm jar mycomponents.jar org.apache.storm.flux.Flux --remote config.yaml
  • 9. Flux allows you to package all your Storm components once. Then wire, configure, and deploy topologies using a YAML definition.
  • 10. Flux Features
      • Easily configure and deploy Storm topologies (both the Storm Core and micro-batch/Trident APIs) without embedding configuration in your topology code
      • Support for existing topology code
      • Define Storm Core API topologies (spouts/bolts) using a flexible YAML DSL
      • YAML DSL support for virtually any Storm component (storm-kafka, storm-hdfs, storm-hbase, etc.)
      • Convenient support for multi-lang components
      • External property substitution/filtering for easily switching between configurations/environments (similar to Maven-style ${variable.name} substitution)
  • 11. Flux YAML DSL
      A YAML definition consists of:
      • Topology name (1)
      • Includes (0…*)
      • Config map (0…1)
      • Components (0…*)
      • Spouts (1…*)
      • Bolts (1…*)
      • Streams (1…*)
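The cardinalities on this slide can be read as validation rules. Below is a minimal sketch, assuming a hypothetical in-memory model (`TopologyDef`, `validate` and the string-typed fields are illustrative names, not Flux's own classes), that checks a parsed definition against them. It covers only the spout/bolt DSL form; a definition that uses an existing topology class (slides 29 to 31) would not need spouts, bolts or streams.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class DefinitionCheck {

    // Hypothetical in-memory form of a parsed YAML definition (not Flux's real model classes).
    record TopologyDef(String name,
                       List<String> includes,          // 0..*
                       Map<String, Object> config,     // 0..1 (null when absent)
                       List<String> components,        // 0..*
                       List<String> spouts,            // 1..*
                       List<String> bolts,             // 1..*
                       List<String> streams) {}        // 1..*

    // Returns one message per violated cardinality; an empty list means the definition passes.
    static List<String> validate(TopologyDef def) {
        List<String> errors = new ArrayList<>();
        if (def.name() == null || def.name().isBlank()) {
            errors.add("a topology name is required");
        }
        if (isEmpty(def.spouts())) errors.add("at least one spout is required");
        if (isEmpty(def.bolts())) errors.add("at least one bolt is required");
        if (isEmpty(def.streams())) errors.add("at least one stream is required");
        // includes, config and components are optional, so they need no check here.
        return errors;
    }

    private static boolean isEmpty(List<?> list) {
        return list == null || list.isEmpty();
    }

    public static void main(String[] args) {
        TopologyDef wordCount = new TopologyDef(
                "myTopology", List.of(), Map.of("topology.workers", 5), List.of(),
                List.of("sentence-spout"), List.of("splitsentence", "count"),
                List.of("sentence-spout -> splitsentence", "splitsentence -> count"));
        System.out.println(validate(wordCount)); // prints []

        TopologyDef empty = new TopologyDef(null, null, null, null, null, null, null);
        System.out.println(validate(empty).size()); // prints 4
    }
}
```

Collecting every violation instead of failing on the first one mirrors the spirit of a dry run: you see all the problems with a definition at once.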
  • 13. Config
      A map of maps (objects) that will be passed to the topology at submission time (the Storm config):
      # topology name
      name: "myTopology"

      # topology configuration
      config:
        topology.workers: 5
        topology.max.spout.pending: 1000
        # ...
  • 14. Components • Catalog (list/map) of Objects that can be used/referenced in other parts of the YAML configuration • Roughly analogous to Spring beans.
  • 15. Components
      Simple Java class with a default constructor:
      # Components
      components:
        - id: "stringScheme"
          className: "storm.kafka.StringScheme"
  • 16. Components: Constructor Arguments
      Component classes can be instantiated with “constructorArgs” (a list of class constructor arguments):
      # Components
      components:
        - id: "zkHosts"
          className: "storm.kafka.ZkHosts"
          constructorArgs:
            - "localhost:2181"
  • 17. Components: References
      Components can be “referenced” throughout the YAML config and used as arguments:
      # Components
      components:
        - id: "stringScheme"
          className: "storm.kafka.StringScheme"

        - id: "stringMultiScheme"
          className: "backtype.storm.spout.SchemeAsMultiScheme"
          constructorArgs:
            - ref: "stringScheme"
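The component catalog behaves like a small registry: each entry is created once by reflection, and later entries can receive earlier ones through `ref`. Below is a minimal sketch of that idea, assuming a hypothetical `ComponentRegistry` and `Ref` type (not Flux's internals) and using JDK classes in place of the storm-kafka classes so it runs without Storm on the classpath.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ComponentRegistry {

    // Hypothetical stand-in for a `- ref: "someId"` entry in constructorArgs.
    public record Ref(String id) {}

    private final Map<String, Object> components = new HashMap<>();

    // Resolves Ref arguments against earlier components, instantiates className
    // through a matching public constructor, and stores the result under id.
    public Object register(String id, String className, List<?> constructorArgs) {
        Object[] args = new Object[constructorArgs.size()];
        for (int i = 0; i < args.length; i++) {
            Object arg = constructorArgs.get(i);
            args[i] = (arg instanceof Ref ref) ? lookup(ref.id()) : arg;
        }
        try {
            Object instance = construct(Class.forName(className), args);
            components.put(id, instance);
            return instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot create component '" + id + "'", e);
        }
    }

    public Object lookup(String id) {
        Object component = components.get(id);
        if (component == null) {
            throw new IllegalArgumentException("unknown component id: " + id);
        }
        return component;
    }

    // Uses the first public constructor whose parameter types accept every argument.
    private static Object construct(Class<?> cls, Object[] args) throws ReflectiveOperationException {
        for (Constructor<?> c : cls.getConstructors()) {
            Class<?>[] types = c.getParameterTypes();
            if (types.length != args.length) continue;
            boolean matches = true;
            for (int i = 0; i < types.length && matches; i++) {
                matches = wrap(types[i]).isInstance(args[i]);
            }
            if (matches) return c.newInstance(args);
        }
        throw new NoSuchMethodException("no matching public constructor on " + cls.getName());
    }

    // Maps common primitive parameter types to wrappers so boxed YAML values can match.
    private static Class<?> wrap(Class<?> t) {
        if (t == int.class) return Integer.class;
        if (t == long.class) return Long.class;
        if (t == double.class) return Double.class;
        if (t == boolean.class) return Boolean.class;
        return t; // other primitives are not handled in this sketch
    }

    public static void main(String[] args) {
        ComponentRegistry registry = new ComponentRegistry();
        // JDK classes stand in for storm.kafka classes so the sketch runs without Storm.
        registry.register("hosts", "java.lang.StringBuilder", List.of("localhost:2181"));
        Object copy = registry.register("hostsCopy", "java.lang.StringBuilder",
                List.of(new Ref("hosts")));
        System.out.println(copy); // prints localhost:2181
    }
}
```

Because references are resolved against components that already exist, this sketch requires a component to be declared before anything that refers to it.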
  • 18. Components: Properties
      Components can be configured using JavaBean setter methods and public instance variables:
        - id: "spoutConfig"
          className: "storm.kafka.SpoutConfig"
          properties:
            - name: "forceFromStart"
              value: true
            - name: "scheme"
              ref: "stringMultiScheme"
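Applying a `properties` entry comes down to two reflective lookups: a JavaBean setter, or failing that a public field of the same name. Below is a minimal sketch of that lookup order, assuming a hypothetical `DemoSpoutConfig` in place of `storm.kafka.SpoutConfig`; in a real loader a `ref` value would first be resolved to the referenced component object before being passed in as `value`.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

public class PropertyInjector {

    // Hypothetical stand-in for a component such as storm.kafka.SpoutConfig.
    public static class DemoSpoutConfig {
        public boolean forceFromStart;   // only reachable as a public field
        private String scheme;           // only reachable through its setter

        public void setScheme(String scheme) { this.scheme = scheme; }
        public String getScheme() { return scheme; }
    }

    // Applies one `properties` entry: a JavaBean setter is tried first, then a public field.
    public static void setProperty(Object target, String name, Object value) {
        String setterName = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
        try {
            for (Method m : target.getClass().getMethods()) {
                if (m.getName().equals(setterName) && m.getParameterCount() == 1
                        && accepts(m.getParameterTypes()[0], value)) {
                    m.invoke(target, value);
                    return;
                }
            }
            Field field = target.getClass().getField(name); // public fields only
            field.set(target, value);                         // unboxes for primitive fields
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot set property '" + name + "'", e);
        }
    }

    // True if `value` can be passed for a parameter of `type` (with unboxing for common primitives).
    private static boolean accepts(Class<?> type, Object value) {
        if (value == null) return !type.isPrimitive();
        if (type == boolean.class) return value instanceof Boolean;
        if (type == int.class) return value instanceof Integer;
        if (type == long.class) return value instanceof Long;
        return type.isInstance(value);
    }

    public static void main(String[] args) {
        DemoSpoutConfig config = new DemoSpoutConfig();
        setProperty(config, "forceFromStart", true);         // public field
        setProperty(config, "scheme", "stringMultiScheme");   // setter
        System.out.println(config.forceFromStart + " " + config.getScheme());
        // prints true stringMultiScheme
    }
}
```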
  • 19. Components: Config Methods
      Call arbitrary methods to configure a component:
        - id: "recordFormat"
          className: "org.apache.storm.hdfs.bolt.format.DelimitedRecordFormat"
          configMethods:
            - name: "withFieldDelimiter"
              args: ["|"]
      References can be used here as well.
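A `configMethods` entry is essentially "find a public method by name whose parameters fit these args, then call it". Below is a minimal sketch, assuming a hypothetical `DemoRecordFormat` that imitates the fluent `withFieldDelimiter` style of `DelimitedRecordFormat`. The sketch discards the method's return value, so it only works for methods that configure the object in place, as fluent builders that return `this` do.

```java
import java.lang.reflect.Method;
import java.util.List;

public class ConfigMethodInvoker {

    // Hypothetical stand-in for org.apache.storm.hdfs.bolt.format.DelimitedRecordFormat.
    public static class DemoRecordFormat {
        private String delimiter = ",";

        public DemoRecordFormat withFieldDelimiter(String delimiter) {
            this.delimiter = delimiter;
            return this; // fluent style: mutates and returns the same object
        }

        public String format(List<String> fields) {
            return String.join(delimiter, fields);
        }
    }

    // Calls the first public method called `name` whose parameters accept `args`.
    // The return value is discarded, so this assumes the method configures `target` in place.
    public static void invoke(Object target, String name, List<?> args) {
        Object[] argv = args.toArray();
        for (Method m : target.getClass().getMethods()) {
            if (!m.getName().equals(name) || m.getParameterCount() != argv.length) continue;
            Class<?>[] types = m.getParameterTypes();
            boolean matches = true;
            for (int i = 0; i < types.length && matches; i++) {
                matches = wrap(types[i]).isInstance(argv[i]);
            }
            if (!matches) continue;
            try {
                m.invoke(target, argv);
                return;
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("call to " + name + " failed", e);
            }
        }
        throw new IllegalArgumentException("no public method " + name + " accepting " + args);
    }

    private static Class<?> wrap(Class<?> t) {
        if (t == int.class) return Integer.class;
        if (t == long.class) return Long.class;
        if (t == boolean.class) return Boolean.class;
        return t; // other primitives are not handled in this sketch
    }

    public static void main(String[] args) {
        DemoRecordFormat format = new DemoRecordFormat();
        invoke(format, "withFieldDelimiter", List.of("|"));
        System.out.println(format.format(List.of("a", "b", "c"))); // prints a|b|c
    }
}
```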
  • 20. Spouts
      A list of objects that implement the IRichSpout interface and an associated parallelism setting.
      # spout definitions
      spouts:
        - id: "sentence-spout"
          className: "org.apache.storm.flux.wrappers.spouts.FluxShellSpout"
          # shell spout constructor takes 2 arguments: String[], String[]
          constructorArgs:
            # command line
            - ["node", "randomsentence.js"]
            # output fields
            - ["word"]
          parallelism: 1
          # ...
  • 21. Bolts
      A list of objects that implement the IRichBolt or IBasicBolt interface with an associated parallelism setting.
      # bolt definitions
      bolts:
        - id: "splitsentence"
          className: "org.apache.storm.flux.wrappers.bolts.FluxShellBolt"
          constructorArgs:
            # command line
            - ["python", "splitsentence.py"]
            # output fields
            - ["word"]
          parallelism: 1
          # ...

        - id: "count"
          className: "backtype.storm.testing.TestWordCounter"
          parallelism: 1
          # ...
  • 22. Spout and Bolt definitions are just extensions of “Component” with a “parallelism” attribute, so all component features (references, constructor args, properties, config methods) can be used.
  • 23. Streams
      • Represent spout-to-bolt and bolt-to-bolt connections
      • In graph terms: “edges”
      • Also define stream groupings:
          • ALL, CUSTOM, DIRECT, SHUFFLE, LOCAL_OR_SHUFFLE, FIELDS, GLOBAL, or NONE.
  • 24. Streams
      Custom stream grouping:
        - from: "bolt-1"
          to: "bolt-2"
          grouping:
            type: CUSTOM
            customClass:
              className: "backtype.storm.testing.NGrouping"
              constructorArgs:
                - 1
      Again, you can use references, properties, and config methods.
  • 25. Filtering/Variable Substitution
      Define properties in an external properties file, and reference them in YAML using ${} syntax:
        - id: "rotationAction"
          className: "org.apache.storm.hdfs.common.rotation.MoveFileAction"
          configMethods:
            - name: "toDestination"
              args: ["${hdfs.dest.dir}"]
      When Flux is run with --filter <file>, the placeholder is replaced with the property's value before the YAML is parsed.
  • 26. Filtering/Variable Substitution
      Environment variables can be referenced in YAML using ${ENV-} syntax:
        - id: "rotationAction"
          className: "org.apache.storm.hdfs.common.rotation.MoveFileAction"
          configMethods:
            - name: "toDestination"
              args: ["${ENV-HDFS_DIR}"]
      When Flux is run with --env-filter, the placeholder is replaced with the value of the $HDFS_DIR environment variable before the YAML is parsed.
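Because filtering happens on the raw text before YAML parsing, it can be pictured as a plain string rewrite. Below is a minimal sketch of both forms, assuming a hypothetical `YamlFilter.filter` helper (not Flux's own code): `${name}` is looked up in a `Properties` object (the `--filter` case) and `${ENV-NAME}` in an environment map (the `--env-filter` case). In this sketch, keys with no value are left untouched, and passing `null` for either source disables that kind of substitution.

```java
import java.util.Map;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class YamlFilter {

    // Matches ${...}; group 1 is the key, e.g. "hdfs.dest.dir" or "ENV-HDFS_DIR".
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Rewrites placeholders in the raw YAML text before it is parsed.
    static String filter(String yaml, Properties props, Map<String, String> env) {
        Matcher m = PLACEHOLDER.matcher(yaml);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String key = m.group(1);
            String value = key.startsWith("ENV-")
                    ? (env == null ? null : env.get(key.substring("ENV-".length())))
                    : (props == null ? null : props.getProperty(key));
            // Unknown keys keep their original ${...} text.
            String replacement = (value != null) ? value : m.group(0);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("hdfs.dest.dir", "/dest");
        String yaml = "args: [\"${hdfs.dest.dir}\", \"${ENV-HDFS_DIR}\"]";
        // In real use the environment map would come from System.getenv().
        System.out.println(filter(yaml, props, Map.of("HDFS_DIR", "/data/hdfs")));
        // prints args: ["/dest", "/data/hdfs"]
    }
}
```

`Matcher.quoteReplacement` matters here: values such as Windows paths or strings containing `$` would otherwise be misread as regex group references.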
  • 27. File Includes and Overrides
      Include files/classpath resources and optionally override values:
      name: "include-topology"

      includes:
        - resource: true
          file: "/configs/shell_test.yaml"
          override: false # otherwise subsequent includes that define 'name' would override
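The `override` flag decides who wins when the including file and an included file define the same key. Below is a minimal sketch of that rule for top-level scalar keys such as `name`, assuming a hypothetical `IncludeMerge.merge` helper and hypothetical included values. Flux itself may treat list sections such as components differently; this sketch does not model that.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class IncludeMerge {

    // Merges the top-level keys of an included definition into the current one.
    // override == true : values from the included file replace existing values.
    // override == false: values already in the current file win; only new keys are added.
    public static Map<String, Object> merge(Map<String, Object> current,
                                            Map<String, Object> included,
                                            boolean override) {
        Map<String, Object> result = new LinkedHashMap<>(current);
        for (Map.Entry<String, Object> entry : included.entrySet()) {
            if (override || !result.containsKey(entry.getKey())) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> current = Map.of("name", "include-topology");
        Map<String, Object> included = Map.of(
                "name", "shell-topology",                   // hypothetical included values
                "config", Map.of("topology.workers", 1));
        System.out.println(merge(current, included, false).get("name")); // prints include-topology
        System.out.println(merge(current, included, true).get("name"));  // prints shell-topology
    }
}
```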
  • 29. Existing Topologies
      • Alternative to the YAML spout/bolt/stream DSL
      • Same syntax
      • Works with transactional/micro-batch (Trident) topologies
      • Tell Flux about the class that will produce your topology
      • Components, references, constructor args, properties, config methods, etc. can all be used
  • 30. Existing Topologies
      Provide a class with a public method that returns a StormTopology instance:
      import java.util.Map;
      import backtype.storm.generated.StormTopology;

      /**
       * Marker interface for objects that can produce `StormTopology` objects.
       *
       * If a `topology-source` class implements the `getTopology()` method, Flux will
       * call that method. Otherwise, it will introspect the given class and look for a
       * similar method that produces a `StormTopology` instance.
       *
       * Note that it is not strictly necessary for a class to implement this interface.
       * If a class defines a method with a similar signature, Flux should be able to find
       * and invoke it.
       */
      public interface TopologySource {
          public StormTopology getTopology(Map<String, Object> config);
      }
      This can be a Spout/Bolt or Trident topology.
  • 31. Existing Topologies
      Define a topologySource to tell Flux how to configure the class that creates the topology:
      # configuration that uses an existing topology that does not implement TopologySource
      name: "existing-topology"
      topologySource:
        className: "org.apache.storm.flux.test.SimpleTopology"
        methodName: "getTopologyWithDifferentMethodName"
        constructorArgs:
          - "foo"
          - "bar"
      Components, references, constructor args, properties, config methods, etc. can all be used.
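The `topologySource` block combines constructor arguments with either an explicit `methodName` or the introspection fallback that the `TopologySource` comment describes. Below is a minimal sketch of that lookup, assuming hypothetical stand-ins (`DemoTopology` for `StormTopology`, `SimpleSource` for a class like `SimpleTopology`) and supporting only String constructor arguments. A class that implemented a `TopologySource`-style interface could simply be cast and called directly; this sketch shows only the reflective path.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Map;

public class TopologySourceLoader {

    // Hypothetical stand-in for backtype.storm.generated.StormTopology.
    public record DemoTopology(String description) {}

    // Hypothetical stand-in for a class like org.apache.storm.flux.test.SimpleTopology
    // that does not implement TopologySource.
    public static class SimpleSource {
        private final String first;
        private final String second;

        public SimpleSource(String first, String second) {
            this.first = first;
            this.second = second;
        }

        public DemoTopology getTopologyWithDifferentMethodName(Map<String, Object> config) {
            return new DemoTopology(first + "/" + second + " workers=" + config.get("topology.workers"));
        }
    }

    // Instantiates `cls` with String constructor args, then invokes `methodName`, or, when
    // methodName is null, the first public method returning `resultType`. The method may
    // take either no arguments or a single config Map.
    public static <T> T load(Class<?> cls, String methodName, String[] ctorArgs,
                             Map<String, Object> config, Class<T> resultType) {
        try {
            Class<?>[] ctorTypes = new Class<?>[ctorArgs.length];
            Arrays.fill(ctorTypes, String.class); // sketch: String constructor args only
            Object source = cls.getConstructor(ctorTypes).newInstance((Object[]) ctorArgs);
            for (Method m : cls.getMethods()) {
                if (methodName != null && !m.getName().equals(methodName)) continue;
                if (!resultType.isAssignableFrom(m.getReturnType())) continue;
                Class<?>[] params = m.getParameterTypes();
                if (params.length == 1 && params[0].isAssignableFrom(Map.class)) {
                    return resultType.cast(m.invoke(source, config));
                }
                if (params.length == 0) {
                    return resultType.cast(m.invoke(source));
                }
            }
            throw new IllegalArgumentException("no suitable method on " + cls.getName());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load topology source " + cls.getName(), e);
        }
    }

    public static void main(String[] args) {
        // A real loader would obtain the Class via Class.forName(className) from the YAML.
        DemoTopology topology = load(SimpleSource.class, "getTopologyWithDifferentMethodName",
                new String[] {"foo", "bar"}, Map.of("topology.workers", 5), DemoTopology.class);
        System.out.println(topology.description()); // prints foo/bar workers=5
    }
}
```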
  • 32. Flux Usage
      • Add the Flux dependency to your project.
      • Use the Maven Shade plugin to create a fat jar file.
      • Use the `storm` command to run (locally) or deploy (remotely) your topology:
      storm jar mycomponents.jar org.apache.storm.flux.Flux [options] <config file>
  • 33. Flux Usage: Command Line Options
      usage: storm jar <my_topology_uber_jar.jar> org.apache.storm.flux.Flux [options] <topology-config.yaml>
       -d,--dry-run                 Do not run or deploy the topology. Just build, validate, and print information about the topology.
       -e,--env-filter              Perform environment variable substitution. Keys identified with `${ENV-[NAME]}` will be replaced with the corresponding `NAME` environment value.
       -f,--filter <file>           Perform property substitution. Use the specified file as a source of properties, and replace keys identified with `${[property name]}` with the value defined in the properties file.
       -i,--inactive                Deploy the topology, but do not activate it.
       -l,--local                   Run the topology in local mode.
       -n,--no-splash               Suppress the printing of the splash screen.
       -q,--no-detail               Suppress the printing of topology details.
       -r,--remote                  Deploy the topology to a remote cluster.
       -R,--resource                Treat the supplied path as a classpath resource instead of a file.
       -s,--sleep <ms>              When running locally, the amount of time to sleep (in ms) before killing the topology and shutting down the local cluster.
       -z,--zookeeper <host:port>   When running in local mode, use the ZooKeeper at the specified <host>:<port> instead of the in-process ZooKeeper (requires Storm 0.9.3 or later).
  • 34. With great power comes great responsibility. It’s up to you to avoid shooting yourself in the foot!
  • 36. Thank you! AMA… P. Taylor Goetz, Hortonworks @ptgoetz