Advanced Functional
Programming in Scala
Patrick Nicolas
Oct 2013
Rev. July 2015
patricknicolas.blogspot.com
www.slideshare.net/pnicolas
This is an overview of some interesting advanced
features of Scala. It is not meant to be a tutorial and
assumes that you are familiar with the key constructs
of the language.
Some of the examples are extracted from Scala for
Machine Learning – Packt Publishing
Scala has a lot of features …..
Actors
Composed futures
F-bound
Reactive
Advanced functional programming?
... among them
Higher kind projection
Contravariant functors
Monadic composition
Streams
Views
Type classes
Stacked mixins models
Cake pattern
Magnet pattern
View bounds
F-bound polymorphism
Dataflow back pressure
Continuation passing style
Functors and monads are defined as single type higher kinds:
M[_]. The problem is to define monadic composition for
objects that belong to categories with two or more types,
M[_, _] (e.g. Function1[U, V]).
Higher kind projection
Scala supports functorial and monadic operations for multi-type
categories using higher kind type projection.
Higher kind projection
Let us consider a covariant functor F that applies a morphism f
within a category C defined as
$$\forall a, b \in C, \quad f: a \to b$$
$$F(a \to b) = F(a) \to F(b)$$
The definition of a functor in Scala relies on a single type
higher kind M
(*) Functors are important concepts in algebraic topology used
in defining algebra for tensors for example.
Higher kind projection
How can we define a functor for classes that have multiple
parameterized types?
Let’s consider the definition of a tensor using Scala Function1
The covariant CoVector (resp. contravariant Vector) vectors are
created through a projection onto the covariant (resp.
contravariant) parameterized type T of Function1.
Higher kind projection
The implementation of the functor for the Vector type uses the
projection of the higher kind Function1 to its covariant
component by accessing, with the # operator, the inner type Vector of Tensor.
The map method applies covariant composition, using compose of Function1.
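A minimal sketch of such a functor. The original slide code is not available here, so the names Functor and vectorFunctor and the exact shape of Tensor (inner types Vector and CoVector as aliases of Function1) are assumptions made for illustration:

```scala
object HigherKind {
  // Functor over a single type higher kind M[_]
  trait Functor[M[_]] {
    def map[U, V](m: M[U])(f: U => V): M[V]
  }

  // Assumed definition: both inner types are projections of Function1
  trait Tensor[T] {
    type Vector[V] = Function1[T, V]    // parameterized on the output (covariant) type
    type CoVector[U] = Function1[U, T]  // parameterized on the input (contravariant) type
  }

  // Tensor[T]#Vector projects the two-type kind Function1 onto a one-type kind
  def vectorFunctor[T]: Functor[Tensor[T]#Vector] =
    new Functor[Tensor[T]#Vector] {
      def map[U, V](vu: T => U)(f: U => V): T => V = f compose vu
    }
}
```

For example, HigherKind.vectorFunctor[Int].map((n: Int) => n + 1)((m: Int) => m * 2) yields a function that adds one, then doubles.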
Contravariant functors
Some categories of objects, such as covariant tensors or
functions parameterized on their input (contravariant)
type (i.e. T => Function1[T, U] for a given type U),
require the order of morphisms to be reversed.
Morphisms on a contravariant argument type are transported
through contravariant functors.
Contravariant functors
Let us consider a contravariant functor F that applies a
morphism f within a category C defined as
$$\forall a, b \in C, \quad f: a \to b$$
$$F(a \to b) = F(b) \to F(a)$$
The definition of a contravariant functor in Scala relies on a
single type higher kind M
Contravariant functors
The implementation of the contravariant functor for the CoVector
type uses the projection of the higher kind Function1 to its
contravariant component by accessing, with the # operator, the inner
type CoVector of Tensor.
The map method applies contravariant composition, using andThen of Function1.
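A minimal sketch of a contravariant functor for the CoVector type. The names CoFunctor and coVectorFunctor, and the definition of Tensor, are assumptions made for illustration rather than the code shown on the slide:

```scala
object ContraKind {
  // Contravariant functor: the morphism V => U is mapped to M[U] => M[V]
  trait CoFunctor[M[_]] {
    def map[U, V](m: M[U])(f: V => U): M[V]
  }

  // Assumed definition of the covector as a projection of Function1 on its input type
  trait Tensor[T] {
    type CoVector[U] = Function1[U, T]
  }

  def coVectorFunctor[T]: CoFunctor[Tensor[T]#CoVector] =
    new CoFunctor[Tensor[T]#CoVector] {
      // f runs first, then the covector: the order of morphisms is reversed
      def map[U, V](cu: U => T)(f: V => U): V => T = f andThen cu
    }
}
```

Here a covector String => Int can be turned into a covector Int => Int by supplying a morphism Int => String.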
It is quite common to compose, iteratively or recursively,
functions, methods or data transformations.
Monadic composition
Monads extend the concept of functor to support the
composition (or chaining) of computations.
Monads are abstract structures in algebraic topology related to
category theory.
A category C is a structure which has
● objects {a, b, c...}
● morphisms or maps on objects f: a->b
● composition of morphisms
f: a->b, g: b->c => g o f: a->c
Monads enable the “monadic” composition or chaining of
functions or computations on a single type argument.
Monadic composition
Let’s consider the definition of a kernel function Kf as the composition
of 2 functions g o h.
$$\mathcal{K}_f(\mathbf{x}, \mathbf{y}) = g\Big(\sum_i h(x_i, y_i)\Big)$$
Monadic composition
We create a monad to generate any kind of kernel function Kf by
composing its components: g1 o g2 o … o gn o h.
A monad extends a functor with a binding method (flatMap).
The monadic definition of the kernel function component h
Monadic composition
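Example of Kernel functions
$$\mathcal{K}(\mathbf{x}, \mathbf{y}) = e^{-\frac{1}{2}\left(\frac{\mathbf{x}-\mathbf{y}}{\sigma}\right)^2}$$
$$h: (x, y) \to x - y \qquad g: x \to e^{-\frac{1}{2\sigma^2} x^2}$$
Polynomial kernel
$$\mathcal{K}(\mathbf{x}, \mathbf{y}) = (1 + \mathbf{x} \cdot \mathbf{y})^d$$
$$h: (x, y) \to x \cdot y \qquad g: x \to (1 + x)^d$$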
Monadic composition
Radial basis function kernel
The monadic composition consists of chaining flatMap invocations
on the functor method, map, which preserves morphisms on kernel functions.
Monadic composition
The for-comprehension is syntactic sugar for the iterative
monadic composition.
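A minimal sketch of the idea, applied to the polynomial kernel. The class KF, its map, flatMap and metric methods and the helper polynomial are hypothetical names, and the structure is only monad-like (flatMap keeps the component h of the first kernel); it is not the code from the slides:

```scala
object KernelMonad {
  type F1 = Double => Double            // component g
  type F2 = (Double, Double) => Double  // component h

  // Kernel K(x, y) = g(sum_i h(x_i, y_i)), parameterized on the type of g
  final case class KF[G](g: G, h: F2) {
    def map[H](f: G => H): KF[H] = KF(f(g), h)
    def flatMap[H](f: G => KF[H]): KF[H] = KF(f(g).g, h)

    def metric(x: Seq[Double], y: Seq[Double])(implicit ev: G <:< F1): Double =
      ev(g)(x.zip(y).map { case (a, b) => h(a, b) }.sum)
  }

  // (1 + x.y)^d built as g1 andThen g2, with h(x, y) = x * y
  def polynomial(d: Int): KF[F1] =
    for {
      g1 <- KF[F1](x => 1.0 + x, (a, b) => a * b)
      g2 <- KF[F1](x => math.pow(x, d.toDouble), (a, b) => a * b)
    } yield g1 andThen g2
}
```

The for-comprehension expands to a flatMap on the first kernel followed by a map on the second, which is the chaining described above.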
Streams
Streams reduce memory consumption by allocating and
releasing chunks of data (slices of a time series) while allowing
reuse of intermediate results.
Some problems require processing very large data
sets of unknown size, for which the execution may have to be
aborted or re-applied.
The large data set is converted into a stream, then broken
down into manageable slices. The slices are instantiated,
processed (e.g. by a loss function) and released back to the
garbage collector, one at a time.
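[Figure: a data stream X0, X1, ..., Xn, ..., Xm is traversed one slice Xi at a time. Each slice is allocated on the heap (.take), used to evaluate the loss function, then released to the garbage collector (.drop).]
Loss function:
$$\frac{1}{2m} \sum_{n} \big(y_n - f(\mathbf{w}|x_n)\big)^2 + \lambda \|\mathbf{w}\|^2$$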
Streams
Slices of NOBS observations are allocated one at a time (take),
processed, then released (drop).
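A minimal sketch of the take/drop traversal, computing the unregularized part of the loss. The object SliceTraversal, the method loss and its parameters are hypothetical. It uses Stream as on the slides; since Scala 2.13 Stream is deprecated in favour of LazyList, which offers the same take and drop methods. Whether slices are actually reclaimed depends on no other reference to the head of the stream being retained:

```scala
object SliceTraversal {
  // Half mean squared error of model f over (x, y) observations,
  // evaluated one slice of nObs observations at a time.
  def loss(data: Stream[(Double, Double)], f: Double => Double, nObs: Int): Double = {
    @annotation.tailrec
    def loop(rest: Stream[(Double, Double)], sum: Double, m: Int): Double =
      if (rest.isEmpty) { if (m == 0) 0.0 else sum / (2.0 * m) }
      else {
        val slice = rest.take(nObs)                 // allocate the next slice
        val sq = slice.map { case (x, y) =>
          val d = y - f(x)
          d * d
        }.sum
        loop(rest.drop(nObs), sum + sq, m + slice.size) // move past the processed slice
      }
    loop(data, 0.0, 0)
  }
}
```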
Views and streams
The reference streamRef has to be weak, in order to have the slices
garbage collected. Otherwise the memory consumption increases
with each new batch of data.
(*) Alternatives: define strmRef as a def or use StreamIterator
Views and streams
Comparing list, stream and stream with weak references.
[Figure: comparison chart with the operating zone marked.]
Views
Scientific computations require chaining complex data
transformations on large data sets. There is not always a
need to process all elements of the data set.
Scala allows the creation of a view on collections that are
the result of a data transformation. The elements are
instantiated only when needed.
Views
Accessing an element of the list requires allocating
the entire list in memory.
Accessing an element of the view requires allocating
only this element in memory.
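A small sketch that makes the difference observable by counting how many times the transformation is evaluated. The object ViewDemo and its methods are hypothetical names:

```scala
object ViewDemo {
  // Strict collection: every element is transformed before one is read.
  // Returns (element at index, number of evaluations).
  def strict(data: Vector[Int], index: Int): (Double, Int) = {
    var calls = 0
    val out = data.map { x => calls += 1; math.sqrt(x.toDouble) }
    (out(index), calls)
  }

  // View: the transformation is applied only to the element that is read.
  def viaView(data: Vector[Int], index: Int): (Double, Int) = {
    var calls = 0
    val out = data.view.map { x => calls += 1; math.sqrt(x.toDouble) }
    (out(index), calls)
  }
}
```

On a vector of 1000 elements, strict evaluates the transformation 1000 times to read one element, while viaView evaluates it once.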
Type classes
Scala library classes cannot always be sub-classed.
Wrapping a library component in a helper class clutters the
design.
Type classes extend the functionality of classes without
cluttering name spaces (an alternative to helper classes).
The purpose of reusability goes beyond refactoring code.
It includes leveraging existing, well understood concepts
and semantics.
Let’s consider the definition of a tensor as being either a vector
or a covector.
Type classes
Let’s extend the concept of tensor with a metric. A metric is computed as
the inner product or composition of a CoVector and a Vector.
The computation is implemented by the method Metric.apply
Type classes
The inner object Metric defines the implicit conversion
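A minimal sketch of a Metric type class. It resolves implicit instances from the companion object rather than reproducing the implicit conversion of the slides, and the aliases Vec and CoVec, the instances and the helper inner are hypothetical:

```scala
object TensorMetric {
  type Vec = Seq[Double]      // vector
  type CoVec = Vec => Double  // covector: maps a vector to a scalar

  // Type class: how a tensor of type T is combined with a vector
  trait Metric[T] {
    def apply(t: T, v: Vec): Double
  }

  object Metric {
    // Inner product of two vectors
    implicit val vecMetric: Metric[Vec] = new Metric[Vec] {
      def apply(w: Vec, v: Vec): Double = w.zip(v).map { case (a, b) => a * b }.sum
    }
    // Composition of a covector and a vector
    implicit val coVecMetric: Metric[CoVec] = new Metric[CoVec] {
      def apply(c: CoVec, v: Vec): Double = c(v)
    }
  }

  // The instance is selected from the static type of t
  def inner[T](t: T, v: Vec)(implicit m: Metric[T]): Double = m(t, v)
}
```

Neither Seq nor Function1 is sub-classed or wrapped; the behaviour is attached from the outside.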
Stacked mixins models
Scala stacked traits and abstract values preserve the core
formalism of mathematical expressions.
Traditional programming languages compare unfavorably to
scientific languages such as R because of their inability
to follow a strict mathematical formalism:
1. Variable declaration
$$f: \mathbb{R}^n \to \mathbb{R}^n \qquad g: \mathbb{R}^n \to \mathbb{R}$$
2. Model definition
$$h = g \circ f$$
3. Instantiation
$$f(x) = e^x \qquad g(\mathbf{x}) = \sum_i x_i$$
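The three steps can be sketched with stacked traits and a self-type. The names F, G, H and the alias V are assumptions chosen to mirror the formulas:

```scala
object StackedModel {
  type V = Seq[Double]

  // 1. Declaration: f maps R^n to R^n, g maps R^n to R
  trait F { def f(x: V): V }
  trait G { def g(x: V): Double }

  // 2. Model: h = g o f, expressed against the abstract declarations
  class H { self: G with F =>
    def apply(x: V): Double = g(f(x))
  }

  // 3. Instantiation: f(x) = e^x, g(x) = sum of the components
  val h: H with G with F = new H with G with F {
    def f(x: V): V = x.map(math.exp(_))
    def g(x: V): Double = x.sum
  }
}
```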
Stacked mixins models
Stacked mixins models
Building machine learning apps requires configurable,
dynamic workflows
Leverage mixins, inheritance and abstract values to create
models and weave data transformation.
Factory design patterns have been used to model dynamic
systems (GoF). Dependency injection has gained popularity
for creating configurable systems (e.g. the Spring framework).
Stacked mixins models
Multiple models and algorithms are typically evaluated by
weaving computation tasks.
A learning platform is a framework that
• Defines computational tasks
• Wires the tasks (data flow)
• Deploys the tasks (*)
It overcomes the limitations of monadic composition (3 levels of
dynamic binding…)
(*) Actor-based deployment
Even the simplest workflow, defined as a pipeline of data transformations
requires a flexible design …
Stacked mixins models
Stacked mixins models
Summary of the 3 configurability layers of Cake pattern
1. Given the objective of the computation, select the best
sequence of modules/tasks (e.g. Modeling: Preprocessing +
Training + Validating)
2. Given the profile of the data input, select the best data
transformation for each module (e.g. Data preprocessing:
Kalman, DFT, Moving average….)
3. Given the computing platform, select the best
implementation for each data transformation (e.g. Kalman:
KalmanOnAkka, Spark…)
Implementation of Preprocessing module
Stacked mixins models
Implementation of Preprocessing module using discrete Fourier
… and discrete Kalman filter
Stacked mixins models
[Figure: workflow modules Loading, Preprocessing, Reducing, Training and Validating, combined into the Clustering and Modeling workflows. Components shown: Preprocessor (DFTFilter, Kalman), Reducer (EM, PCA), Supervisor (SVM, MLP).]
Clustering workflow = preprocessing task -> reducing task
Modeling workflow = preprocessing task -> model training task
-> model validation
Stacked mixins models
A simple clustering workflow requires a preprocessor and a
reducer. The computation sequence exec transforms a time
series of elements of type U and returns a time series of
type W as an Option.
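A minimal sketch of such a workflow with the Cake pattern. To stay short it fixes the element type to Double and uses two stand-in transformations, MovingAverage and Downsampler, in place of the DFT, Kalman, EM or PCA components named on the slides. All names below are hypothetical:

```scala
object CakeWorkflow {
  type TS[T] = Vector[T]

  // Module 1: the concrete preprocessor is an abstract value bound at instantiation
  trait PreprocessingModule {
    val preprocessor: Preprocessor
    trait Preprocessor { def execute(xt: TS[Double]): TS[Double] }
    class MovingAverage(period: Int) extends Preprocessor {
      def execute(xt: TS[Double]): TS[Double] =
        xt.sliding(period).map(w => w.sum / w.size).toVector
    }
  }

  // Module 2: reducer, also left abstract
  trait ReducingModule {
    val reducer: Reducer
    trait Reducer { def execute(xt: TS[Double]): TS[Double] }
    class Downsampler(step: Int) extends Reducer {
      def execute(xt: TS[Double]): TS[Double] = xt.grouped(step).map(_.head).toVector
    }
  }

  // Workflow: the self-type wires the two modules together
  class Clustering { self: PreprocessingModule with ReducingModule =>
    def exec(xt: TS[Double]): Option[TS[Double]] =
      if (xt.isEmpty) None else Some(reducer.execute(preprocessor.execute(xt)))
  }

  // Instantiation selects the implementation of each module
  val clustering = new Clustering with PreprocessingModule with ReducingModule {
    val preprocessor: Preprocessor = new MovingAverage(2)
    val reducer: Reducer = new Downsampler(2)
  }
}
```

Swapping the preprocessor or the reducer only changes the two value bindings at instantiation; the Clustering class itself is untouched.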
Stacked mixins models
A model is created by processing the original time series of type TS[T]
through a preprocessor, a training supervisor and a validator
Stacked mixins models
Putting it all together for a conditional path execution …
Stacked mixins models
Magnet pattern
Method overloading in Scala has limitations:
• Type erasure in the JVM causes collision of type of
arguments in overloaded methods
• Overloaded methods cannot be lifted into a function
• Code may be unnecessarily duplicated
The magnet pattern overcomes these limitations by
encapsulating the return and redefining the overloaded
methods as implicit functions.
Magnet pattern
Let’s consider the following three incarnations of the method test
These methods have different return types. The first and last
methods conflict because of type erasure on T => List[Double]
Magnet pattern
Step 1: Define generic return type and constructor
Step 2: Implement the test methods as implicits
Magnet pattern
Step 3: Implement the lifted function test as follows
The first call invokes the implicit fromTN and the second
triggers the implicit fromT.
The return type is inferred from the type of argument
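A minimal sketch of the three steps. The implicit names fromT and fromTN come from the slides, but their argument types, the Magnet trait and the summation they perform are assumptions made for illustration. As plain overloads, test(List[Double]) and test(List[Int]) would collide after type erasure:

```scala
import scala.language.implicitConversions

object MagnetDemo {
  // Step 1: generic return type and constructor
  sealed trait Magnet {
    type Result
    def apply(): Result
  }

  // Step 2: each former overload becomes an implicit conversion to Magnet
  object Magnet {
    implicit def fromT(xs: List[Double]): Magnet { type Result = Double } =
      new Magnet {
        type Result = Double
        def apply(): Double = xs.sum
      }
    implicit def fromTN(xs: List[Int]): Magnet { type Result = Int } =
      new Magnet {
        type Result = Int
        def apply(): Int = xs.sum
      }
  }

  // Step 3: a single method whose return type depends on the magnet
  def test(magnet: Magnet): magnet.Result = magnet()
}
```

Because test is no longer overloaded, it can also be lifted into a function of type Magnet => Any.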
View bound
Context bound cannot be used to bind the parameterized
type of a generic class to a primitive type.
Scala view bounds allow developers to create
classes with parameterized types associated with a Scala or
Java primitive type.
View bound
Let’s consider a class whose parameterized type can be
manipulated as a Float.
Context bound is not permissible
Constraining the type with an upper bound Float does not
work as Float is a final class.
View bound
The solution is to bind the class type to a Float using an
implicit conversion (or view)
The <% directive is the short notation for
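A minimal sketch of the expanded form, in which the view bound becomes an implicit parameter of type T => Float. The class Signal and its mean method are hypothetical. The expanded form is used here because the <% syntax is deprecated in recent Scala 2 releases and absent from Scala 3:

```scala
object ViewBoundDemo {
  // class Signal[T <% Float](xs: Seq[T]) is shorthand for the signature below
  class Signal[T](xs: Seq[T])(implicit toFloat: T => Float) {
    // Every element is manipulated as a Float through the view
    def mean: Float = xs.map(toFloat).sum / xs.size
  }
}
```

In Scala 2, new ViewBoundDemo.Signal(Seq(1, 2, 3)).mean resolves the conversion from the built-in numeric widening of Int to Float; the conversion can also be passed explicitly, as in new ViewBoundDemo.Signal(Seq(1, 2, 3))(_.toFloat).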
F-Bound polymorphism is a parametric type polymorphism
that constrains the subtypes to themselves using bounds.
It is important to write code that catches errors at compile
time. How can we enforce type integrity in subclasses?
F-Bound polymorphism
F-Bound polymorphism
Let’s create a trait that defines a discriminative learning model
with methods to manipulate data.
The classes Svm and Mlp implement the Discriminative trait.
The problem is that nothing prevents the creation of a class Nnet
that impersonates an Svm class.
F-Bound polymorphism
One solution is to restrict (or bound) the type to a Discriminative
class
It prevents a new class from inserting itself into the hierarchy,
but does not guarantee the type integrity for existing classes.
F-Bound polymorphism
The self reference guarantees the integrity of each existing
and new subclass. F-Bound polymorphism is a self-referenced
bound polymorphism.
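A minimal sketch combining the bound and the self reference. Discriminative, Svm, Mlp and Nnet are the names used on the slides; the members weights and update are hypothetical:

```scala
object FBound {
  // The bound T <: Discriminative[T] restricts T to the hierarchy;
  // the self-type forces each subclass to pass itself as T.
  trait Discriminative[T <: Discriminative[T]] { self: T =>
    def weights: Vector[Double]
    def update(w: Vector[Double]): T
  }

  final case class Svm(weights: Vector[Double]) extends Discriminative[Svm] {
    def update(w: Vector[Double]): Svm = copy(weights = w)
  }

  final case class Mlp(weights: Vector[Double]) extends Discriminative[Mlp] {
    def update(w: Vector[Double]): Mlp = copy(weights = w)
  }

  // Rejected at compile time: Nnet does not conform to the self-type Svm
  // class Nnet extends Discriminative[Svm]
}
```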
Data flow back pressure
A data flow control mechanism handles back pressure
on the bounded mailboxes of upstream actors.
Scala actors provide a reliable way to deploy workflows
on a distributed environment. However, some nodes
may experience slow processing and create performance
bottlenecks.
Data flow back pressure
Actor-based workflow has to consider
- Cascading failures => supervision strategy
- Cascading bottleneck => Mailbox back-pressure strategy
Message-passing scheme to process various data streams
with transformations.
[Figure: Controller, Workers (router, dispatcher, …) with bounded mailboxes, Watcher and Dataset, exchanging the messages Load, Compute, GetStatus, Status and Completed.]
Data flow back pressure
Worker actors process the data chunk msg.xt sent by the
controller with the transformation msg.fct
Message sent by collector to trigger computation
Data flow back pressure
The Watcher actor monitors the message queues and reports to the collector with
a Status message.
GetStatus message sent by the collector has no payload
Data flow back pressure
The Controller creates the workers, a bounded mailbox for each worker
actor (msgQueues) and the watcher actor.
Data flow back pressure
The Controller loads the data sets in chunks upon receiving the
message Load from the main program. It processes the results of
the computation from the workers (Completed) and throttles the
input to workers for each Status message.
Data flow back pressure
The Load message is implemented as a loop that creates data chunks
whose size is adjusted according to the load computed by the
watcher and forwarded to the controller in a Status message.
Data flow back pressure
A simple throttle increases or decreases the size of the batch of
observations given the current load and a specified watermark.
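A minimal sketch of such a throttle as a pure function, independent of any actor library. The function name, the halving and doubling policy, the pair of low and high watermarks and the size limits are all assumptions made for illustration:

```scala
object Throttle {
  // load: mailbox occupancy between 0.0 (empty) and 1.0 (full)
  // low, high: watermarks delimiting the band in which the batch size is kept
  def nextBatchSize(current: Int, load: Double,
                    low: Double, high: Double,
                    minSize: Int, maxSize: Int): Int =
    if (load > high) math.max(minSize, current / 2)      // mailboxes filling up: send less
    else if (load < low) math.min(maxSize, current * 2)  // mailboxes draining: send more
    else current
}
```

The controller would call such a function each time it receives a Status message and use the result as the size of the next data chunk.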
Data flow back pressure
Selecting faster/slower and less/more accurate version of algorithm
can also be used in the regulation strategy
The feedback control loop adjusts the size of the batches given the
load in the mailboxes and the complexity of the computation
Data flow back pressure
• Feedback control loop should be smoothed (moving
average, Kalman…)
• A larger variety of data flow control actions such as
adding more workers, increasing queue capacity, …
• The watchdog should handle dead letters, in case of a
failure of the feedback control or the workers.
• Reactive streams introduced in Akka 2.2+ have
sophisticated TCP-based propagation and back pressure
control flows
Notes
Data flow back pressure
Delimited continuation
Continuation Passing Style (CPS) is a technique that
abstracts a computation unit as a data structure in order to
control the state of a computer program, workflow or
sequence of data transformations.
Continuations are used to ‘jump’ to a method that
produces a call to the current method. They can be
regarded as a ‘functional GOTO’.
Delimited continuation
A data transformation (or computation unit) can be
extended (continued) with another transformation known
as a continuation. The continuation is provided as an argument
of the original transformation.
Let’s consider the following workflow
The first workflow is not a continuation, the second is
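A minimal sketch contrasting the two styles on a normalization step. The function names normalize and normalizeCps are hypothetical and stand in for the workflows shown on the slide:

```scala
object Cps {
  // Direct style: the result is returned to the caller
  def normalize(xs: Seq[Double]): Seq[Double] = {
    val total = xs.sum
    xs.map(_ / total)
  }

  // Continuation passing style: the result is handed to the continuation k,
  // which decides what the rest of the workflow does with it
  def normalizeCps[R](xs: Seq[Double])(k: Seq[Double] => R): R =
    k(normalize(xs))
}
```

The caller extends the transformation by supplying the continuation, for example Cps.normalizeCps(Seq(1.0, 3.0))(_.max).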
Delimited continuation
A delimited continuation is a section of the workflow that
is reified into a function returning a value. This technique
relies on control delimiters (shift/reset) to make the
continuation composable and reusable.
More Scala nuggets…
• Domain specific language
• Reactive streams
• Back-pressure strategy using connection state
Wait a minute, there is more…..

Multi-tenancy in Private Clouds
 

Dernier

How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 

Dernier (20)

How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 

Advanced Functional Programming in Scala

  • 1. Advanced Functional Programming in Scala Patrick Nicolas Oct 2013 Rev. July 2015 patricknicolas.blogspot.com www.slideshare.net/pnicolas
  • 2. This is an overview of some interesting advanced features of Scala. It is not meant to be a tutorial and assumes that you are familiar with the key constructs of the language. Some of the examples are extracted from Scala for Machine Learning – Packt Publishing
  • 3. Scala has a lot of features: actors, composed futures, F-bound, reactive ... Advanced functional programming? ... among them.
  • 4. Higher kind projection Contravariant functors Monadic composition Streams Views Type classes Stacked mixins models Cake pattern Magnet pattern View bounds F-bound polymorphism Dataflow back pressure Continuation passing style
  • 5. Higher kind projection: Functors and monads are defined as single-type higher kinds, M[_]. The problem is to define monadic composition for objects that belong to categories with two or more types, M[_, _] (e.g. Function1[U, V]). Scala supports functorial and monadic operations for multi-type categories using higher kind type projection.
  • 6. Higher kind projection: Let us consider a covariant functor F that applies a morphism f within a category C, defined as $\forall a, b \in C,\ f: a \to b,\quad F(a \to b) = F(a) \to F(b)$. The definition of a functor in Scala relies on a single-type higher kind M. (*) Functors are important concepts in algebraic topology, used for example in defining an algebra for tensors.
  • 7. Higher kind projection: How can we define a functor for classes that have multiple parameterized types? Let’s consider the definition of a tensor using Scala Function1. The covariant CoVector (resp. contravariant Vector) vectors are created through a projection onto the covariant (resp. contravariant) parameterized type T of Function1.
  • 8. Higher kind projection: The implementation of the functor for the Vector type uses the projection of the higher kind Function1 to its covariant component by accessing, with #, the inner type Vector of Tensor. The map applies covariant composition, compose of Function1.
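The code shown on the slides did not survive in this transcript. The following is a minimal sketch of the idea, not the author's implementation: the names `FunctorSketch`, `Functor` and `vectorFunctor`, and the exact shape of `Tensor`, are assumptions made for illustration. It shows how the projection `Tensor[T]#Vector` turns the two-parameter `Function1` into a single-parameter kind that a functor can accept.

```scala
object FunctorSketch {
  // Functor over a single-parameter type constructor M[_]
  trait Functor[M[_]] {
    def map[U, V](m: M[U])(f: U => V): M[V]
  }

  // Hypothetical Tensor: fixes one parameter of Function1 and exposes
  // the remaining one through an inner type alias
  trait Tensor[T] {
    type Vector[V] = Function1[T, V]   // T is the fixed input type
    type CoVector[V] = Function1[V, T] // T is the fixed output type
  }

  // Tensor[T]#Vector projects Function1[T, _] onto a kind with one parameter
  def vectorFunctor[T]: Functor[Tensor[T]#Vector] =
    new Functor[Tensor[T]#Vector] {
      // Covariant composition: apply vu first, then f
      def map[U, V](vu: T => U)(f: U => V): T => V = f.compose(vu)
    }
}
```

For example, mapping `_ * 2` over a `String => Int` function yields another `String => Int` function whose result is doubled.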
  • 10. Contravariant functors: Some categories of objects, such as covariant tensors or functions parameterized on the input or contravariant type (i.e. T => Function1[T, U] for a given type U), require the order of morphisms to be reversed. Morphisms on a contravariant argument type are transported through contravariant functors.
  • 11. Contravariant functors: Let us consider a contravariant functor F that applies a morphism f within a category C, defined as $\forall a, b \in C,\ f: a \to b,\quad F(a \to b) = F(b) \to F(a)$. The definition of a contravariant functor in Scala relies on a single-type higher kind M.
  • 12. Contravariant functors: The implementation of the contravariant functor for the CoVector type uses the projection of the higher kind Function1 to its contravariant (input) component by accessing, with #, the inner type CoVector of Tensor. The map applies contravariant composition, andThen of Function1.
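As with the covariant case, the original slide code is missing here. The sketch below is an assumption about its general shape: `CoFunctorSketch`, `CoFunctor` and `coVectorFunctor` are hypothetical names. The point it illustrates is that the morphism passed to `map` runs in the reverse direction (`V => U`) and is applied before the covector.

```scala
object CoFunctorSketch {
  // Contravariant functor: the morphism f goes from V to U, not from U to V
  trait CoFunctor[M[_]] {
    def map[U, V](m: M[U])(f: V => U): M[V]
  }

  // Hypothetical Tensor exposing the input parameter of Function1
  trait Tensor[T] {
    type CoVector[V] = Function1[V, T]
  }

  def coVectorFunctor[T]: CoFunctor[Tensor[T]#CoVector] =
    new CoFunctor[Tensor[T]#CoVector] {
      // Contravariant composition: apply f first, then the covector vu
      def map[U, V](vu: U => T)(f: V => U): V => T = f.andThen(vu)
    }
}
```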
  • 14. Monadic composition: It is quite common to compose, iteratively or recursively, functions, methods or data transformations. Monads extend the concept of functor to support the composition (or chaining) of computations.
  • 15. Monadic composition: Monads are abstract structures in algebraic topology related to category theory. A category C is a structure which has ● objects {a, b, c...} ● morphisms or maps on objects f: a->b ● composition of morphisms f: a->b, g: b->c => g o f: a->c. Monads enable the “monadic” composition or chaining of functions or computations on a single type argument.
  • 16. Monadic composition: Let’s consider the definition of a kernel function Kf as the composition of 2 functions g o h: $\mathcal{K}_f(\mathbf{x}, \mathbf{y}) = g\left(\sum_i h(x_i, y_i)\right)$. We create a monad to generate any kind of kernel function Kf by composing its components g: g1 o g2 o … o gn o h.
  • 17. Monadic composition: A monad extends a functor with a binding method (flatMap). The monadic definition of the kernel function component h.
  • 18. Monadic composition: Examples of kernel functions. Radial basis function kernel: $\mathcal{K}(\mathbf{x}, \mathbf{y}) = e^{-\frac{1}{2}\left(\frac{\|\mathbf{x}-\mathbf{y}\|}{\sigma}\right)^2}$, with $h: (x, y) \to x - y$ and $g: x \to e^{-\frac{1}{2\sigma^2}x^2}$. Polynomial kernel: $\mathcal{K}(\mathbf{x}, \mathbf{y}) = (1 + \mathbf{x} \cdot \mathbf{y})^d$, with $h: (x, y) \to x \cdot y$ and $g: x \to (1 + x)^d$.
  • 19. Monadic composition: The monadic composition consists of chaining the flatMap invocations on the functor, map, that preserves morphisms on kernel functions. The for comprehension is syntactic sugar for the iterative monadic composition.
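A minimal sketch of this composition, under stated assumptions: the class name `KF` and the type `F1` appear in the editor's notes, but the method bodies, the helper `evaluate` and the kernels `polynomial` and `scaled` are hypothetical. Following the "partial" monad described in the notes, `map` and `flatMap` transform only the outer function g and keep the inner function h of the first kernel.

```scala
object KernelSketch {
  type F1 = Double => Double
  type F2 = (Double, Double) => Double

  // K(x, y) = g(sum_i h(x_i, y_i)); G is the type of the outer function g
  final case class KF[G](g: G, h: F2) {
    def map[H](f: G => H): KF[H] = KF(f(g), h)
    // "Partial" bind: keeps h from this kernel, takes g from the result of f
    def flatMap[H](f: G => KF[H]): KF[H] = KF(f(g).g, h)
  }

  // Hypothetical helper that evaluates a kernel on two observations
  def evaluate(kf: KF[F1], x: Seq[Double], y: Seq[Double]): Double =
    kf.g(x.zip(y).map { case (a, b) => kf.h(a, b) }.sum)

  // Polynomial kernel: h(x, y) = x * y and g(s) = (1 + s)^d
  def polynomial(d: Int): KF[F1] =
    KF[F1]((s: Double) => math.pow(1.0 + s, d), (a: Double, b: Double) => a * b)

  // Hypothetical second component that only rescales the result
  def scaled(c: Double): KF[F1] =
    KF[F1]((s: Double) => c * s, (a: Double, b: Double) => a * b)

  // The for comprehension desugars to flatMap followed by map
  val composed: KF[F1] = for {
    g1 <- polynomial(2)
    g2 <- scaled(0.5)
  } yield g1.andThen(g2)
}
```

With x = (1, 2) and y = (3, 4), the inner sum is 11, the polynomial kernel of degree 2 gives 144, and the composed kernel gives 72.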
  • 21. Streams: Some problems require processing very large data sets of unknown size, for which the execution may have to be aborted or re-applied. Streams reduce memory consumption by allocating and releasing chunks of data (or slices of a time series) while allowing reuse of intermediate results.
  • 22. Streams: The large data set is converted into a stream, then broken down into manageable slices. The slices are instantiated, processed (e.g. by a loss function) and released back to the garbage collector, one at a time. [Diagram: the data stream X0, X1, ..., Xn, ..., Xm is traversed slice by slice; each slice Xi is allocated on the heap with .take, used to evaluate the loss function $\frac{1}{2m}\sum_n \left(y_n - f(\mathbf{w}|x_n)\right)^2 + \lambda\|\mathbf{w}\|^2$, then released to the garbage collector with .drop.]
  • 23. Views and streams: Slices of NOBS observations are allocated one at a time (take), processed, then released (drop).
  • 24. Views and streams: The reference streamRef has to be weak in order to have the slices garbage collected. Otherwise the memory consumption increases with each new batch of data. (*) Alternatives: define strmRef as a def or use StreamIterator
  • 25. Views and streams: Comparing list, stream and stream with weak references. [Chart; annotated region: operating zone]
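The slice-by-slice traversal described in the Streams slides can be sketched as follows. This is an illustration under assumptions, not the deck's code: `StreamSketch` and `sumSquaredError` are hypothetical names, only the sum-of-squares term of the loss is computed (no regularization), and the weak reference discussed on slide 24 is not reproduced. `Stream` is used as in the deck; from Scala 2.13 onward it is deprecated in favor of `LazyList`.

```scala
import scala.annotation.tailrec

object StreamSketch {
  // Sum of squared errors, computed one slice of observations (x, y) at a time
  def sumSquaredError(obs: Stream[(Double, Double)], sliceSize: Int)
                     (f: Double => Double): Double = {
    @tailrec
    def loop(rest: Stream[(Double, Double)], acc: Double): Double =
      if (rest.isEmpty) acc
      else {
        // Allocate the next slice (.take) and process it
        val slice = rest.take(sliceSize)
        val partial = slice.map { case (x, y) => val e = y - f(x); e * e }.sum
        // Move past the slice (.drop) so it is no longer reachable from rest
        loop(rest.drop(sliceSize), acc + partial)
      }
    loop(obs, 0.0)
  }
}
```

Whether the processed slices are actually collected depends on no other reference holding on to the head of the stream, which is the concern raised on slide 24.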
  • 27. Views: Scientific computations require chaining complex data transformations on large data sets. There is not always a need to process all the elements of the data set. Scala allows the creation of a view on a collection that is the result of a data transformation. The elements are instantiated only when needed.
  • 28. Views: Accessing an element of the list requires allocating the entire list in memory. Accessing an element of the view requires allocating only this element in memory.
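A small sketch of this laziness, with hypothetical names (`ViewSketch`, `firstSquares`): the transformation is declared on a view of the whole collection, and a counter records how many elements are actually transformed when only the first few are requested.

```scala
object ViewSketch {
  // Returns the first n squares together with the number of elements
  // that were actually transformed
  def firstSquares(data: Seq[Int], n: Int): (List[Int], Int) = {
    var evaluated = 0
    // Nothing is computed here: the view only records the transformation
    val squares = data.view.map { x => evaluated += 1; x * x }
    // Forcing the first n elements evaluates only what is needed
    (squares.take(n).toList, evaluated)
  }
}
```

Calling `ViewSketch.firstSquares(1 to 1000000, 3)` returns the three squares without transforming the remaining elements of the range.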
  • 30. Type classes: The purpose of reusability goes beyond refactoring code. It includes leveraging existing, well understood concepts and semantics. Scala library classes cannot always be sub-classed. Wrapping a library component in a helper class clutters the design. Type classes extend the functionality of classes without cluttering name spaces (alternative to type classes)
  • 31. Type classes: Let’s consider the definition of a tensor as being either a vector or a covector. Let’s extend the concept of tensor with a metric. A metric is computed as the inner product or composition of a covector and a vector. The computation is implemented by the method Metric.apply
  • 32. Type classes: The inner object Metric defines the implicit conversion
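The slide code for `Metric` is not part of this transcript, so the following is a generic sketch of the type class technique rather than the deck's tensor implementation. Apart from the name `Metric`, everything here (`TypeClassSketch`, `inner`, `MetricOps`, `dot`) is an assumption. It adds an inner product to the library class `Vector[Double]` without sub-classing or wrapping it at the call site.

```scala
object TypeClassSketch {
  // Type class: an inner product for some vector-like type V
  trait Metric[V] {
    def inner(u: V, v: V): Double
  }

  object Metric {
    // Instance for a library class that cannot be sub-classed
    implicit val vectorOfDouble: Metric[Vector[Double]] =
      new Metric[Vector[Double]] {
        def inner(u: Vector[Double], v: Vector[Double]): Double =
          u.zip(v).map { case (a, b) => a * b }.sum
      }
  }

  // Generic entry point: the instance is resolved implicitly
  def inner[V](u: V, v: V)(implicit m: Metric[V]): Double = m.inner(u, v)

  // Optional syntax: after `import TypeClassSketch._`, u.dot(v) is available
  implicit class MetricOps[V](u: V)(implicit m: Metric[V]) {
    def dot(v: V): Double = m.inner(u, v)
  }
}
```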
  • 34. Stacked mixins models: Traditional programming languages compare unfavorably to scientific languages such as R because of their inability to follow a strict mathematical formalism: 1. Variable declaration 2. Model definition 3. Instantiation. Scala stacked traits and abstract values preserve the core formalism of mathematical expressions.
  • 35. Stacked mixins models: Declaration: $f \in \mathbb{R}^n \to \mathbb{R}^n$, $g \in \mathbb{R}^n \to \mathbb{R}$. Model: $h = g \circ f$. Instantiation: $f(x) = e^x$, $g(\mathbf{x}) = \sum_i x_i$.
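The three steps (declaration, model, instantiation) can be sketched with stacked traits and abstract values. The trait and class names below are assumptions chosen to mirror the formulas, not the code from the slide.

```scala
object MixinSketch {
  // 1. Declaration: f and g are declared as abstract values
  trait F { val f: Double => Double }
  trait G { val g: Array[Double] => Double }

  // 2. Model: h = g o f, defined for any class that mixes in F and G
  class H(x: Array[Double]) { self: F with G =>
    def apply(): Double = g(x.map(f))
  }

  // 3. Instantiation: f(x) = e^x and g(x) = sum of the x_i
  def h(x: Array[Double]): H with F with G = new H(x) with F with G {
    val f: Double => Double = (v: Double) => math.exp(v)
    val g: Array[Double] => Double = (vs: Array[Double]) => vs.sum
  }
}
```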
  • 37. Stacked mixins models: Building machine learning apps requires configurable, dynamic workflows. Factory design patterns have been used to model dynamic systems (GoF). Dependency injection has gained popularity for creating configurable systems (e.g. Spring framework). Leverage mixins, inheritance and abstract values to create models and weave data transformations.
  • 38. Stacked mixins models: Multiple models and algorithms are typically evaluated by weaving computation tasks. A learning platform is a framework that • defines computational tasks • wires the tasks (data flow) • deploys the tasks. (*) Overcomes the limitations of monadic composition (3 levels of dynamic binding…) (*) Actor-based deployment
  • 39. Stacked mixins models: Even the simplest workflow, defined as a pipeline of data transformations, requires a flexible design …
  • 40. Stacked mixins models: Summary of the 3 configurability layers of the Cake pattern. 1. Given the objective of the computation, select the best sequence of modules/tasks (e.g. Modeling: Preprocessing + Training + Validating) 2. Given the profile of the data input, select the best data transformation for each module (e.g. Data preprocessing: Kalman, DFT, Moving average….) 3. Given the computing platform, select the best implementation for each data transformation (e.g. Kalman: KalmanOnAkka, Spark…)
  • 41. Stacked mixins models: Implementation of the Preprocessing module
  • 42. Stacked mixins models: Implementation of the Preprocessing module using discrete Fourier … and discrete Kalman filter
  • 43. Stacked mixins models: Clustering workflow = preprocessing task -> reducing task. Modeling workflow = preprocessing task -> model training task -> model validation. [Diagram labels: Loading, Preprocessing, Reducing, Training, Validating; Preprocessor, Reducer, Supervisor; DFTFilter, Kalman, EM, PCA, SVM, MLP; Clustering, Modeling.]
  • 44. Stacked mixins models: A simple clustering workflow requires a preprocessor and a reducer. The computation sequence exec transforms a time series of elements of type U and returns a time series of type W as an option.
  • 45. Stacked mixins models: A model is created by processing the original time series of type TS[T] through a preprocessor, a training supervisor and a validator.
  • 46. Stacked mixins models: Putting it all together for a conditional path execution …
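A reduced sketch of the clustering workflow (preprocessing task followed by a reducing task) built with the Cake pattern. The module and component names echo the slides, but the bodies are placeholders: `MovingAverage` and `Mean` are hypothetical stand-ins for the DFT, Kalman, EM or PCA implementations, and the time series is simplified to `Vector[Double]`.

```scala
object CakeSketch {
  type TS = Vector[Double]

  // Module with an abstract component (layer 2: which transformation)
  trait PreprocessingModule {
    val preprocessor: Preprocessor
    trait Preprocessor { def execute(xt: TS): Option[TS] }

    // Hypothetical implementation: simple moving average
    class MovingAverage(period: Int) extends Preprocessor {
      def execute(xt: TS): Option[TS] =
        if (period <= 0 || xt.size < period) None
        else Some(xt.sliding(period).map(_.sum / period).toVector)
    }
  }

  trait ReducingModule {
    val reducer: Reducer
    trait Reducer { def execute(xt: TS): Option[Double] }

    // Hypothetical stand-in for a real reducer such as PCA or EM
    class Mean extends Reducer {
      def execute(xt: TS): Option[Double] =
        if (xt.isEmpty) None else Some(xt.sum / xt.size)
    }
  }

  // Layer 1: the workflow fixes the sequence of modules
  class ClusteringWorkflow { self: PreprocessingModule with ReducingModule =>
    def exec(xt: TS): Option[Double] =
      preprocessor.execute(xt).flatMap(ys => reducer.execute(ys))
  }

  // Wiring: concrete components are selected at instantiation
  def workflow(period: Int): ClusteringWorkflow =
    new ClusteringWorkflow with PreprocessingModule with ReducingModule {
      val preprocessor: Preprocessor = new MovingAverage(period)
      val reducer: Reducer = new Mean
    }
}
```

Swapping the preprocessor or the reducer only changes the two `val` definitions in the wiring, which is the configurability the summary slide describes.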
  • 48. Magnet pattern: Method overloading in Scala has limitations: • Type erasure in the JVM causes collisions between the argument types of overloaded methods • Overloaded methods cannot be lifted into a function • Code may be unnecessarily duplicated. The magnet pattern overcomes these limitations by encapsulating the return type and redefining the overloaded methods as implicit functions.
  • 49. Magnet pattern: Let’s consider the following three incarnations of the method test. These methods have different return types. The first and last methods conflict because of type erasure on T => List[Double].
  • 50. Magnet pattern: Step 1: Define a generic return type and constructor. Step 2: Implement the test methods as implicits.
  • 51. Magnet pattern: Step 3: Implement the lifted function test as follows. The first call invokes the implicit fromTN and the second triggers the implicit fromT. The return type is inferred from the type of the argument.
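The three steps can be sketched as follows. The implicit names `fromT` and `fromTN` come from the slide text; the `Magnet` trait, the argument types and the method bodies are assumptions made so that the example is self-contained.

```scala
import scala.language.implicitConversions

object MagnetSketch {
  // Step 1: generic return type, carried by the magnet
  sealed trait Magnet {
    type Result
    def apply(): Result
  }

  // Step 2: each former overload becomes an implicit conversion to a Magnet
  object Magnet {
    implicit def fromT(xs: List[Double]): Magnet { type Result = Double } =
      new Magnet {
        type Result = Double
        def apply(): Double = xs.sum
      }

    implicit def fromTN(args: (List[Double], Int)): Magnet { type Result = List[Double] } =
      new Magnet {
        type Result = List[Double]
        def apply(): List[Double] = args._1.take(args._2)
      }
  }

  // Step 3: a single method whose return type depends on the magnet
  def test(magnet: Magnet): magnet.Result = magnet()
}
```

Here `MagnetSketch.test((List(1.0, 2.0, 3.0), 2))` goes through `fromTN` and returns a `List[Double]`, while `MagnetSketch.test(List(1.0, 2.0, 3.0))` goes through `fromT` and returns a `Double`.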
  • 53. View bound: A context bound cannot be used to bind the parameterized type of a generic class to a primitive type. Scala view bounds allow developers to create classes with parameterized types associated with a Scala or Java primitive type.
  • 54. View bound: Let’s consider a class whose parameterized type can be manipulated as a Float. A context bound is not permissible. Constraining the type with an upper bound Float does not work, as Float is a final class.
  • 55. View bound: The solution is to bind the class type to a Float using an implicit conversion (or view). The <% directive is the short notation for an implicit conversion parameter.
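A sketch of the expanded form of a view bound, written with an explicit implicit parameter instead of `<%` (the `<%` syntax is deprecated in recent Scala 2 releases and is not available in Scala 3). The names `Celsius` and `Reading` are hypothetical and only serve the illustration.

```scala
object ViewBoundSketch {
  // Hypothetical type that can be viewed as a Float
  final case class Celsius(degrees: Float)
  object Celsius {
    implicit val celsiusToFloat: Celsius => Float = _.degrees
  }

  // Expanded form of `class Reading[T <% Float](value: T)`:
  // the view is passed as an implicit conversion from T to Float
  class Reading[T](value: T)(implicit toFloat: T => Float) {
    def scaled(factor: Float): Float = toFloat(value) * factor
  }
}
```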
  • 57. F-Bound polymorphism: It is important to write code that catches errors at compile time. How can we enforce type integrity in subclasses? F-Bound polymorphism is a parametric type polymorphism that constrains the subtypes to themselves using bounds.
  • 58. F-Bound polymorphism: Let’s create a trait that defines a discriminative learning model with methods to manipulate data. The classes Svm and Mlp implement the Discriminative trait. The problem is that nothing prevents the creation of a class Nnet that impersonates an Svm class.
  • 59. F-Bound polymorphism: One solution is to restrict (or bound) the type to a Discriminative class. It prevents a new class from inserting itself into the hierarchy ... but does not guarantee the type integrity of existing classes.
  • 60. F-Bound polymorphism: The self reference guarantees the integrity of each existing and new subclass. F-Bound polymorphism is a self-referenced bound polymorphism.
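A minimal sketch combining the bound and the self reference. The names `Discriminative`, `Svm`, `Mlp` and `Nnet` come from the slides; the members `weights` and `updated` are hypothetical.

```scala
object FBoundSketch {
  // The bound M <: Discriminative[M] restricts M to the hierarchy;
  // the self-type forces each subclass to pass itself as M
  trait Discriminative[M <: Discriminative[M]] { self: M =>
    def weights: Vector[Double]
    def updated(w: Vector[Double]): M
  }

  final class Svm(val weights: Vector[Double]) extends Discriminative[Svm] {
    def updated(w: Vector[Double]): Svm = new Svm(w)
  }

  final class Mlp(val weights: Vector[Double]) extends Discriminative[Mlp] {
    def updated(w: Vector[Double]): Mlp = new Mlp(w)
  }

  // Rejected at compile time, because Nnet does not conform to the self-type Svm:
  // class Nnet(val weights: Vector[Double]) extends Discriminative[Svm] { ... }
}
```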
  • 62. Data flow back pressure: A data flow control mechanism handling back pressure on bounded mailboxes of upstream actors. Scala actors provide a reliable way to deploy workflows in a distributed environment. However, some nodes may experience slow processing and create performance bottlenecks.
  • 63. Data flow back pressure: An actor-based workflow has to consider - Cascading failures => supervision strategy - Cascading bottlenecks => mailbox back-pressure strategy. [Diagram labels: Workers, Router, Dispatcher, …]
  • 64. Data flow back pressure: Message-passing scheme to process various data streams with transformations. [Diagram: Dataset, Controller, Workers with bounded mailboxes, and Watcher, exchanging the messages Load, Compute, GetStatus, Status and Completed.]
  • 65. Data flow back pressure: Worker actors process the data chunk msg.xt sent by the controller with the transformation msg.fct. Message sent by collector to trigger computation.
  • 66. Data flow back pressure: The Watcher actor monitors the message queues and reports to the collector with a Status message. The GetStatus message sent by the collector has no payload.
  • 67. Data flow back pressure: The Controller creates the workers, a bounded mailbox for each worker actor (msgQueues) and the watcher actor.
  • 68. Data flow back pressure: The Controller loads the data sets chunk by chunk upon receiving the message Load from the main program. It processes the results of the computation from the workers (Completed) and throttles the input to the workers for each Status message.
  • 69. Data flow back pressure: The Load message is implemented as a loop that creates data chunks whose size is adjusted according to the load computed by the watcher and forwarded to the controller (Status).
  • 70. Data flow back pressure: A simple throttle increases or decreases the size of the batch of observations given the current load and a specified watermark. Selecting a faster/slower and less/more accurate version of an algorithm can also be used in the regulation strategy.
  • 71. Data flow back pressure: A feedback control loop adjusts the size of the batches given the load in the mailboxes and the complexity of the computation.
  • 72. Data flow back pressure, notes: • The feedback control loop should be smoothed (moving average, Kalman…) • A larger variety of data flow control actions, such as adding more workers, increasing queue capacity, … • The watch dog should handle dead letters, in case of a failure of the feedback control or the workers. • Reactive streams introduced in Akka 2.2+ have a sophisticated TCP-based propagation and back pressure control flows
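The simple throttle of slide 70 can be sketched as a pure function, independent of any actor library. Everything in this sketch is an assumption: the names `ThrottleSketch`, `Watermark` and `nextBatchSize`, the use of a low and a high watermark, and the halving/doubling policy are illustrative choices, not the deck's regulation strategy.

```scala
object ThrottleSketch {
  // Hypothetical watermarks on the mailbox load, expressed as a ratio in [0, 1]
  final case class Watermark(low: Double, high: Double)

  // Shrinks the batch when the load is above the high watermark,
  // grows it when the load is below the low watermark
  def nextBatchSize(current: Int,
                    load: Double,
                    mark: Watermark,
                    minSize: Int = 1,
                    maxSize: Int = 4096): Int =
    if (load > mark.high) math.max(minSize, current / 2)
    else if (load < mark.low) math.min(maxSize, current * 2)
    else current
}
```

In the workflow described above, a controller could call such a function each time it receives a Status message, before creating the next data chunk.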
  • 74. Delimited continuation: Continuation Passing Style (CPS) is a technique that abstracts a computation unit as a data structure in order to control the state of a computer program, workflow or sequence of data transformations. Continuations are used to ‘jump’ to a method that produces a call to the current method. They can be regarded as a ‘functional GOTO’.
  • 75. Delimited continuation: A data transformation (or computation unit) can be extended (continued) with another transformation known as a continuation. The continuation is provided as an argument of the original transformation. Let’s consider the following workflow. The first workflow is not a continuation, the second is.
  • 76. Delimited continuation: A delimited continuation is a section of the workflow that is reified into a function returning a value. This technique relies on control delimiters (shift/reset) to make the continuation composable and reusable.
  • 77. Wait a minute, there is more… More Scala nuggets: • Domain specific language • Reactive streams • Back-pressure strategy using connection state
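A sketch contrasting a direct-style workflow with its continuation-passing counterpart. It uses plain functions only: the shift/reset delimiters mentioned above rely on the Scala continuations compiler plugin and are not shown. All names are hypothetical.

```scala
object CpsSketch {
  // Direct style: each transformation returns its result to the caller
  def square(xs: Vector[Double]): Vector[Double] = xs.map(x => x * x)
  def sumOfSquaresDirect(xs: Vector[Double]): Double = square(xs).sum

  // Continuation passing style: each transformation receives the next
  // step k as an argument and hands its result over to it
  def squareCps[R](xs: Vector[Double])(k: Vector[Double] => R): R =
    k(xs.map(x => x * x))

  def sumCps[R](xs: Vector[Double])(k: Double => R): R = k(xs.sum)

  // The workflow is the nesting of continuations
  def sumOfSquares(xs: Vector[Double]): Double =
    squareCps[Double](xs)(sq => sumCps[Double](sq)(s => s))
}
```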

Editor's notes

  1. Context of the presentation: The transition from Java and Python to Scala is not that easy; it goes beyond selecting Scala for its obvious benefits: support for functional concepts; ability to leverage open source libraries and frameworks if needed; fast and distributed enough to handle large data sets. Scala was the most logical choice. Scientific programming may very well involve different roles in a project: mathematicians for formulas, data scientists for data processing and modeling, software engineers for implementation, DevOps and performance engineers for deployment in production. In order to ease the pain, we tend to learn and adopt Scala incrementally within a development team. The problem is that you end up with an inconsistent code base with different levels of quality, and the team develops a somewhat negative attitude toward the language. The solution is to select a list of problems or roadblocks (in our case machine learning) and compare the solution in Scala with Java, Python ... (you sell the outcome, not the process). Presentation: a set of diverse Scala features or constructs for which the entire team agreed that Scala is a far better solution than Python or Java.
  2. Disclaimer…
  3. Being an object oriented and functional language, Scala has a lot of features and powerful constructs to choose from…. Here is a list of some of the features of Scala that are particularly valuable for writing scientific workflows, machine learning algorithms and complex analytics solutions.
  4. Geometric entities on a differential (Riemann) manifold are defined as tensors. Tensor can be Covariant Contra variant A bilinear form such as Tensor product Inner product n-differential forms …
  5. Some useful references on the theory of categories and monads: (1) Monads are Elephants, Part 2, J. Iry, One Div Zero blog, 2007, http://james-iry.blogspot.com/2007/10/monads-are-elephants-part-2.html (2) Monad Design for the Web, §7 A Review of Collections as Monads, L.G. Meredith, Artima, 2012
  6. For a better understanding of kernel functions in machine learning, see: Introduction to Machine Learning, §Nonparametric Regression: Smoothing Models, E. Alpaydin, MIT Press, 2007; and A Short Introduction to Learning with Kernels, B. Schölkopf (Max Planck Institut für Biologische Kybernetik) and A. Smola (Australian National University), 2005, http://alex.smola.org/papers/2003/SchSmo03c.pdf The purpose here is to generate and experiment with any kind of explicit kernel by defining and composing two functions, g and h. The function h operates on each feature or component of the vectors. The function g is the transformation of the dot product of the two vectors. The “dot” product K is computed by traversing the two observations (vectors of features), applying h to each pair of components, computing the sum and finally applying the g transform. The type parameter is the type of the function g (F1 = Double => Double).
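The decomposition of a kernel into g and h described in this note can be sketched as follows. This is a minimal illustration, not the code shown on the slide: the wrapper name `KF`, the type alias `F2`, and the three sample kernels are assumptions made for the example; only `F1 = Double => Double` comes from the note.

```scala
object KernelSketch {
  type F1 = Double => Double            // g: transform applied to the aggregated sum
  type F2 = (Double, Double) => Double  // h: applied to each pair of components (assumed alias)

  // Hypothetical wrapper for the two components of an explicit kernel.
  final case class KF(g: F1, h: F2) {
    // K(x, y) = g( sum_i h(x_i, y_i) )
    def apply(x: Seq[Double], y: Seq[Double]): Double = {
      require(x.size == y.size, "observations must have the same number of features")
      g(x.zip(y).map { case (a, b) => h(a, b) }.sum)
    }
  }

  // Sample kernels, chosen for illustration only.
  val linear: KF = KF(s => s, (a, b) => a * b)

  def rbf(gamma: Double): KF =
    KF(s => math.exp(-gamma * s), (a, b) => (a - b) * (a - b))

  def polynomial(degree: Int): KF =
    KF(s => math.pow(1.0 + s, degree), (a, b) => a * b)
}
```

With this split, changing h alters how individual features are compared, while changing g alters how the aggregated value is projected.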
  7. Once the functor is defined, the monad is created by adding the flatMap method. The monad KFMonad, which takes a kernel function as argument, is defined as an implicit class so that kernel functions “inherit” the monadic methods. The map and flatMap transformations apply to the function g, that is, the transformation of the inner product. The flatMap method is implemented by creating a new kernel and applying the transformation to only one of the components of the kernel function: function h (in red). This “partial” monadic operation is good enough for building kernel functions on the fly.
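One possible shape for such an implicit class is sketched below. It is an assumption-laden illustration rather than the slide's implementation: `KF`, `F2`, and the two sample kernels `doubled` and `shifted` are hypothetical, and in this sketch `map` transforms g while `flatMap` simply returns the kernel produced by its argument, so the composed kernel keeps the h function of the last generator in the for comprehension.

```scala
object KernelMonadSketch {
  type F1 = Double => Double
  type F2 = (Double, Double) => Double

  // Hypothetical kernel wrapper: K(x, y) = g( sum_i h(x_i, y_i) )
  final case class KF(g: F1, h: F2) {
    def apply(x: Seq[Double], y: Seq[Double]): Double =
      g(x.zip(y).map { case (a, b) => h(a, b) }.sum)
  }

  // Adds map and flatMap to any KF without modifying the class itself.
  implicit class KFMonad(kf: KF) {
    // Transforms g only; h is carried over unchanged.
    def map(f: F1 => F1): KF = KF(f(kf.g), kf.h)
    // "Partial" monadic operation: only g is passed on to f.
    def flatMap(f: F1 => KF): KF = f(kf.g)
  }

  // Two sample kernels, for illustration only.
  val doubled: KF = KF(s => 2.0 * s, (a, b) => a * b)
  val shifted: KF = KF(s => s + 1.0, (a, b) => (a - b) * (a - b))

  // Desugars to doubled.flatMap(g1 => shifted.map(g2 => g1.andThen(g2))).
  val composed: KF = for {
    g1 <- doubled
    g2 <- shifted
  } yield g1.andThen(g2)
}
```

Here `composed` applies the h of `shifted` to each pair of components, then g1 followed by g2 to the sum.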
  8. Kernel functions that project the inner product onto the manifold for non-linear models belong to the family of exponential functions. Polynomial functions and radial basis functions are two of the most commonly used kernel functions. Note: The source code is shown here to illustrate the fact that the implementation in any other language would be a lot messier and would not fit on any of those slides.
  9. Finally, we can chain flatMap and map to compose two kernel functions and compute the dot/inner product of the resulting kernel function. Note that the composed kernel function uses the h function of the last invocation in the for comprehension. The method does not expose the functor or KF classes that wrap the components of the kernel function.
  10. Real-time streaming is becoming popular (e.g. Apache Spark streaming library, Akka reactive streams….). Short of using one of these frameworks, you can create a simple streaming mechanism for large data sets. Streams vs. Iterator: An iterator does not allow you to dynamically select the chunk of memory or to preserve it if necessary for future computation. It is not uncommon to have to train a model as labeled data becomes available (online training). In this case, the size of the data set can be very large and unknown. The processing of the data would result in high memory consumption and a heavy load on the garbage collector. Finally, traversing the entire data set (~ allocating a large memory chunk) may not even be needed, as some computations may abort if some condition is met. Scala’s streams can help!
  11. In order to minimize the memory footprint, two actions have to take place: (1) allocate a slice of the data set (memory chunk) from the heap using the take method; (2) release the memory chunk back to the garbage collector through the drop method. This example is taken from “Scala for Machine Learning”, Packt Publishing. Most machine learning algorithms consist of minimizing a loss function or maximizing a likelihood. For each iteration (or recursion), the cumulative loss is computed and passed to the optimizer (e.g. gradient descent or a variant) to update the model parameters. Each slice of n observations is allocated, processed by the loss function, then released back to the garbage collector.
  12. The Loss class has a single method, exec, that traverses the stream. Once again, the loss is computed using a tail recursion. An observation is defined as y = f(x), where x is the feature set containing, for instance, the age, ethnicity and body temperature of a patient, and y is the label value, such as the diagnosed disease. The tail recursion allocates the next slice of STEP observations through the take method, computes the loss, nextLoss, then drops the slice. The reference is recursively redefined as the reference to the remaining stream. The problem is that the garbage collector cannot reclaim the memory because the first reference to the stream is created outside the recursion. The solution is to declare the reference to the stream as weak so that the chunks of memory associated with the slices/batches of observations already processed can be reclaimed.
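The take/drop recursion described in this note can be sketched as below. The names `Loss`, `exec`, `nextLoss` and the step size come from the note; everything else is an assumption for illustration: an observation is reduced to a single feature, the loss is a squared error, and an explicit count `n` bounds the traversal. The sketch deliberately keeps a strong reference to the head of the stream in `exec`, so it exhibits the retention problem the note describes rather than the weak-reference fix. It uses `Stream`, as in the slides; on Scala 2.13 and later `Stream` is deprecated in favor of `LazyList`.

```scala
import scala.annotation.tailrec

object StreamLossSketch {
  type Obs = (Double, Double) // (feature x, label y): a single feature, for illustration

  // Hypothetical Loss class: sums the squared error slice by slice.
  final class Loss(predict: Double => Double, step: Int) {
    require(step > 0, "step must be positive")

    // Traverses at most n observations of the stream, step observations at a time.
    def exec(data: Stream[Obs], n: Int): Double = {
      @tailrec
      def loop(rest: Stream[Obs], remaining: Int, acc: Double): Double =
        if (remaining <= 0 || rest.isEmpty) acc
        else {
          // Allocate the next slice of observations.
          val slice = rest.take(math.min(step, remaining))
          val nextLoss = slice.map { case (x, y) =>
            val e = predict(x) - y
            e * e
          }.sum
          // Drop the processed slice and recurse on the remaining stream.
          loop(rest.drop(step), remaining - step, acc + nextLoss)
        }
      loop(data, n, 0.0)
    }
  }
}
```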
  13. The reference to the stream is created as a Java weak reference to an instance created by the stream constructor, Stream.iterate. In this case, the weak reference has been used to show that Java concepts are still relevant.
  14. Let’s compare the memory consumption of three strategies to compute the loss function on a very large dataset: a list, a stream with a standard (strong) reference, and a stream with a weak reference. In the first scenario, and as expected, the memory for the entire data set is allocated before processing. The memory requirement for the stream with a strong reference increases each time a new slice is instantiated, because the memory block is held by the reference to the original stream. Only the stream with a weak reference guarantees that only the memory for a slice of STEP observations is needed throughout the entire execution.
  15. The first thing that comes to mind when creating a complex system from existing objects (or classes) is a factory design pattern. Design patterns were introduced by the “Gang of Four” in “Design Patterns: Elements of Reusable Object-Oriented Software” some 20 years ago… Their creational patterns include Builder, Prototype, Factory Method and, obviously, Singleton, alongside structural patterns such as Composite and Bridge. Those patterns are not very convenient for weaving data transformations (these transformations being defined as classes or interfaces). This is where dependency injection, popularized by the Spring framework, comes into play. Beyond composition and inheritance, Scala enables us to implement and chain data transformations/reductions by stacking the traits that declare these transformations or reductions.
  16. The implementation in Scala matches perfectly the universal mathematical formalism. Here is another example. Declaration of variables: x ∈ ℝ, y ∈ ℝ. Declaration of the model: f(x, y) = x + y. Instantiation of the variables: x = 5, y = 7; f(5, 7) = 12.
  17. The first thing that comes to mind when creating a complex system from existing objects (or classes) is a factory design pattern. Design patterns were introduced by the “Gang of Four” in “Design Patterns: Elements of Reusable Object-Oriented Software” some 20 years ago… Their creational patterns include Builder, Prototype, Factory Method and, obviously, Singleton, alongside structural patterns such as Composite and Bridge. Those patterns are not very convenient for weaving data transformations (these transformations being defined as classes or interfaces). This is where dependency injection, popularized by the Spring framework, comes into play. Beyond composition and inheritance, Scala enables us to implement and chain data transformations/reductions by stacking the traits that declare these transformations or reductions.
  18. We briefly mentioned that the for comprehension can be used to chain/stack data transformations. Dependency injection, sometimes referred to as the Cake pattern, provides a very flexible approach to creating workflows dynamically. Note: As for the third point, the deployment of tasks, it usually involves an actor-based (non-blocking) distributed architecture such as Akka and Spark. We will mention it briefly later in this presentation when introducing the mailbox back-pressure mechanism.
  19. Let’s look at the Training module/task as an example. The task of training a model is executed by a Supervisor instance that can be either a support vector machine or a multi-layer perceptron, in this simplistic case. Each of these two “supervisors” can have several implementations (single host, distributed through a low-latency network, …). Once defined, the modules are woven/chained together by making sure that the output of a module/task matches the input of the subsequent task. Notes: The training module can be broken down further into generative and discriminative models. Real-world applications are significantly more complex and would include REST services, DAOs to access relational databases, caches…. The terms “module”, “task” and “computational task” are used interchangeably in this section.
  20. Let’s consider the Preprocessing module (or task), implemented as a trait. The preprocessing of a data set is performed by a processor of type Preprocessor that is defined at run time. Therefore the preprocessor has to be declared as an abstract value. The three preprocessors defined in the preprocessing module are the Kalman filter, the moving average (MovAv) and the discrete Fourier filter (DFTF). These inner classes act as adapters or stubs to the actual implementations of those algorithms.
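A module with an abstract preprocessor value could look like the sketch below. The names `Preprocessor` and `MovAv` come from the note; the trait name `PreprocessingModule`, the `execute` signature and the moving-average body are assumptions for illustration, and the Kalman and DFTF adapters are omitted to keep the example short.

```scala
object PreprocessingSketch {
  trait PreprocessingModule {
    // Common interface of the preprocessors available in this module (assumed signature).
    trait Preprocessor {
      def execute(xs: Vector[Double]): Vector[Double]
    }

    // Abstract value: the concrete preprocessor is bound when the module is instantiated.
    val preprocessor: Preprocessor

    // Inner adapter for a simple moving average over a sliding window.
    final class MovAv(period: Int) extends Preprocessor {
      require(period > 0, "period must be positive")
      def execute(xs: Vector[Double]): Vector[Double] =
        xs.sliding(period).map(w => w.sum / w.size).toVector
    }
  }

  // The preprocessor is selected at the point where the module is instantiated.
  val smoothing = new PreprocessingModule {
    val preprocessor: Preprocessor = new MovAv(2)
  }
}
```

Swapping `new MovAv(2)` for another inner class changes the algorithm without touching the code that calls `preprocessor.execute`.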
  21. Here is an implementation of the Kalman and discrete Fourier transform band-pass filters. The 2 inner classes, Kalman and DFTF, act as adapters or stubs to the actual implementations of those algorithms. This allows an implementation to come in multiple versions. For instance, filtering.Kalman is a trait with several implementations of the algorithm (single host, distributed, using Spark…). Such a design allows us to: (1) select the type of preprocessing algorithm within the Preprocessing module or namespace; (2) select the implementation of this particular type of algorithm/preprocessor in the filtering package.
  22. From the data management perspective, Clustering implements two consecutive data transformations: preprocessing and dimension reduction.
  23. The Modeling workflow is created by chaining implementations of the filter, the training and the validator, all selected at run time. Modeling is therefore implemented as a stack of 3 traits, each representing a transformation or reduction on data sets.
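The stacking of three traits can be sketched as follows. Only the name `Modeling` and the three stages come from the note; the module trait names, the function-typed abstract values and the toy implementations (a filter that drops negative values, a "model" reduced to a mean, a validator returning the mean squared deviation) are assumptions chosen to keep the example self-contained.

```scala
object ModelingSketch {
  type Data = Vector[Double]

  // Each module declares one abstract value, bound when the workflow is instantiated.
  trait PreprocessingModule { val preprocessor: Data => Data }
  trait TrainingModule { val supervisor: Data => Double }            // returns a toy "model"
  trait ValidationModule { val validator: (Double, Data) => Double } // returns a toy score

  // The workflow is the stack of the three traits; run chains the three stages.
  trait Modeling extends PreprocessingModule with TrainingModule with ValidationModule {
    def run(xs: Data): Double = {
      val filtered = preprocessor(xs)
      val model = supervisor(filtered)
      validator(model, filtered)
    }
  }

  // The concrete components are chosen where the workflow is created.
  val modeling: Modeling = new Modeling {
    val preprocessor: Data => Data = _.filter(_ >= 0.0)
    val supervisor: Data => Double = xs => xs.sum / xs.size
    val validator: (Double, Data) => Double =
      (mean, xs) => xs.map(x => (x - mean) * (x - mean)).sum / xs.size
  }
}
```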
  24. Computational tasks related to machine learning can be complex and lengthy. The process should be able to select the appropriate data flow (or sequence of data transformations or reductions) at run time, according to the state of the computation. In this simple case, a clustering task is triggered if anomalyDetection is needed; otherwise the training of a model is launched. These conditional execution paths are important for complex analyses or lengthy computations that require unattended execution (e.g. overnight or over the weekend). Note: The overrides of the abstract values for the Modeling workflow are omitted here for the sake of clarity. Summary: This factory pattern operates on 3 levels of componentization, with dynamic selection of: 1- the workflow or sequence of tasks according to the objective of the computation (e.g. Clustering => Preprocessing); 2- the task processing algorithm according to the data (e.g. Preprocessing => Kalman filter); 3- the implementation of the task processing according to the environment (e.g. Kalman filter => implementation on Apache Spark).
  25. The objective is to avoid bottlenecks in the computation data flow, which would result in overflowing the actors’ mailboxes/local buffers. A strategy to control the flow (or back-pressure flow) is needed to regulate the data flow across all modules. This example uses a back-pressure handling mechanism that consists of monitoring bounded mailboxes. This is a simplistic approach to flow control, described for the sake of illustrating the concept. As we will see later, there is a far more effective mechanism to deal with back-pressure.
  26. We mentioned earlier that a learning platform requires implementing, wiring and deploying tasks. Akka, or frameworks derived from Akka, are commonly used to deploy workflows for large datasets because of the immutability, non-blocking and supervision characteristics of actors. Scala/Akka actors are resilient because they are defined within a hierarchical context in which an actor becomes a supervisor of other actors. In this slide, a router is a supervising actor for the workers and, depending on the selected strategy, is responsible for restarting a worker in case of failure. But, what about the case for which the load (number of messages in the
  27. In this example, an actor, Controller, loads a chunk of data, partitions it and distributes it across multiple Worker actors, along with a data transformation. Upon receiving the message ‘Compute’, the workers process the data given a transformation function. The workers return the processed data through a Completed message. The purpose of the watchdog actor, ‘Watcher’, is to monitor the utilization of the mailboxes and report it to the Controller. This is a simple feedback control: 1- The Watcher monitors the utilization of the mailboxes (average length). 2- The Controller adjusts the size of each batch in the Load message handler (throttling). 3- The workers process the next batch.
  28. Let’s start with the Worker actor. The load on a worker depends on three variables: 1- the amount of data to process; 2- the complexity of the data transformation; 3- the underlying system (cores, memory..). The controller provides the slice of data to be processed by the workers, msg.xt, as well as the data transformation, msg.fct.
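The message exchange between controller and worker can be sketched without any actor library, as a pure function from a Compute message to a Completed message. The message names and the fields `xt` and `fct` come from the notes; the `id` field, the `DblVec` alias and the `process` function are assumptions for illustration. In an actor-based deployment this logic would sit inside the worker's message handler, which is not shown here.

```scala
object WorkerSketch {
  type DblVec = Vector[Double]

  sealed trait Message
  // Sent by the controller: a slice of data (xt) and the transformation to apply (fct).
  final case class Compute(id: Int, xt: DblVec, fct: DblVec => DblVec) extends Message
  // Returned by the worker once the slice has been processed.
  final case class Completed(id: Int, xt: DblVec) extends Message

  // Hypothetical core of the worker: apply the transformation to the slice.
  def process(msg: Compute): Completed = Completed(msg.id, msg.fct(msg.xt))
}
```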
  29. Let’s look at our watchdog actor, Watcher: it computes the load as the average mailbox utilization and sends it back to the controller through a Status message.
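One plausible way to compute that load is the mean queue length divided by the mailbox capacity, as sketched below. The function name `load` and its parameters are assumptions; how the queue lengths are read from the mailboxes is left out.

```scala
object WatcherSketch {
  // Average utilization of the bounded mailboxes, as a fraction of their capacity.
  // queueSizes: current number of pending messages in each worker mailbox (assumed input).
  def load(queueSizes: Seq[Int], capacity: Int): Double = {
    require(capacity > 0, "mailbox capacity must be positive")
    if (queueSizes.isEmpty) 0.0
    else queueSizes.sum.toDouble / (queueSizes.size.toDouble * capacity)
  }
}
```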
  30. As its name implies, the controller configures and manages the dynamic workflow and controls the back pressure from the worker actors. As far as the configuration is concerned, the Controller generates a list of workers, the bounded mailboxes (msgQueues) for the workers and ultimately the watcher actor. The workers and the watcher are created within the Controller context using the Akka actorOf constructor.
  31. As far as the management of the data flow and the feedback control loop is concerned, the Controller: loads, partitions and distributes batches of data points to the worker actors (message: Load); processes the results of the computation in the workers (message: Completed); throttles the flow up or down upon receiving the status on the utilization of the mailboxes from the watcher (message: Status).
  32. The composition of the messages processed by the controller is self-explanatory. The controller: adjusts the size of the next batch if required (throttle method); extracts the next batch of data from the input stream; partitions and distributes the batch across the worker actors; sends each partition along with the data transformation to the workers.
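The batch extraction and partitioning steps can be sketched as two small functions. Both names, `nextBatch` and `partition`, and the even-split policy are assumptions for illustration; the actual distribution to the worker actors is not shown. `Stream` is used as in the slides (it is deprecated in favor of `LazyList` from Scala 2.13).

```scala
object BatchSketch {
  // Extracts the next batch from the input stream and returns it with the remaining stream.
  def nextBatch[T](input: Stream[T], batchSize: Int): (Vector[T], Stream[T]) = {
    require(batchSize > 0, "batch size must be positive")
    (input.take(batchSize).toVector, input.drop(batchSize))
  }

  // Splits a batch into at most numWorkers partitions of nearly equal size.
  def partition[T](batch: Vector[T], numWorkers: Int): Vector[Vector[T]] = {
    require(numWorkers > 0, "at least one worker is required")
    if (batch.isEmpty) Vector.empty
    else {
      val size = (batch.size + numWorkers - 1) / numWorkers // ceiling division
      batch.grouped(size).toVector
    }
  }
}
```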
  33. The implementation of the throttle method is rather simple. It takes the load computed by the watcher actor and the current batch size (number of data points to be processed) as input. It updates the batch size using a simple ratio relative to the watermark. For instance, if the load is below the watermark, the batch size is increased.
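A ratio-based throttle of this kind might look like the sketch below. The method name `throttle` and its two inputs come from the note; the default watermark of 0.5, the size bounds and the doubling rule for an empty mailbox are assumptions made for the example.

```scala
object ThrottleSketch {
  // Returns the next batch size given the load reported by the watcher.
  // The batch grows when load < watermark and shrinks when load > watermark.
  def throttle(
      load: Double,
      batchSize: Int,
      watermark: Double = 0.5, // assumed target utilization
      minSize: Int = 1,        // assumed lower bound
      maxSize: Int = 4096      // assumed upper bound
  ): Int = {
    require(watermark > 0.0 && batchSize > 0, "watermark and batch size must be positive")
    require(minSize > 0 && minSize <= maxSize, "invalid size bounds")
    // An idle mailbox (load <= 0) doubles the batch instead of dividing by zero.
    val ratio = if (load <= 0.0) 2.0 else watermark / load
    val target = batchSize * ratio
    // Clamp before converting to Int so that very large targets cannot overflow.
    math.round(math.max(minSize.toDouble, math.min(maxSize.toDouble, target))).toInt
  }
}
```

For example, with a watermark of 0.5, a reported load of 0.25 doubles a batch of 100 data points, while a load of 1.0 halves it.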
  34. The bottom graph describes the throttle action and the complexity of the data transformation. The complexity of the data transformation has an impact on the load on the workers. It varies from 0 (simple map operation) to 2 (complex data processing involving recursion or iterations). The throttle intensity ranges between -6 (rapid decrease of the size of the batches) and +6 (rapid increase of the size of the batches of data). The top graph displays the actual utilization of the mailboxes, with a capacity of 512 messages, as regulated by the feedback control loop (executed by the controller).
  35. The deployment of a reactive data flow in production would require significant improvements to our naive model. The feedback control loop could be smoothed with a moving average technique or a Kalman filter to avoid erratic behavior. We would need to provide a larger range of options for control actions besides adjusting the size of the data batches: increasing the number of workers, the mailbox capacity, the caching strategy, .. A fine-grained set of actions also reduces the risk of unstable systems. The watchdog should be able to handle dead letters in case of failure (mailbox overflowing). Reactive streams control the flow back pressure at the TCP connection level. This is far more accurate and responsive than mailbox utilization.