From Event to Action: Accelerate Your Decision Making with Real-Time Automation
Introducing the WSO2 Complex Event Processor
1. Introducing the WSO2 Complex Event Processor
Simplifying Complexities of Data Processing
S. Suhothayan
Software Engineer,
Data Technologies Team.
2. Outline
ƒ Introduction to CEP
ƒ WSO2 CEP Server
ƒ Siddhi Runtime
ƒ HA & Scalability of WSO2 CEP
ƒ WSO2 CEP server and WSO2 BAM
ƒ Use Cases
3. Event Processing (Contd.)
ƒ Event processing is about listening to events and
detecting patterns in near real-time without storing
all events.
ƒ Three models
o Simple Event Processing
- Simple filters (e.g. Is this a gold or platinum customer?)
o Event Stream Processing
- Looking across multiple event streams and joining them
o Complex Event Processing
- Processing multiple event streams to identify meaningful
patterns, using complex conditions & temporal windows
- E.g. There has been a more than 10% increase in overall
trading activity AND the average price of commodities
has fallen 2% in the last 4 hours
4. Complex Event Processing
ƒ We categorize events into different streams
ƒ Process with minimal storage
ƒ Use queries to evaluate the continuous event
streams (usually in a SQL-like query language)
ƒ Very fast results (in the millisecond range)
5. CEP Queries
ƒ The main query types are:
o Filters and Projection
o Windows – events are processed within temporal
windows (e.g. for aggregation and joins).
Time window vs. length window.
o Ordering – identify event sequences and patterns
(e.g. a credit card used at a new location, followed by
a small and then a large purchase, might suggest fraud)
o Joins – join two streams
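The difference between a time window and a length window can be illustrated with a minimal Python sketch. This is not how any CEP engine implements windows; the class names `LengthWindow` and `TimeWindow` are hypothetical, and timestamps are assumed to be in milliseconds.

```python
from collections import deque


class LengthWindow:
    """Sliding window that keeps the last `size` events (hypothetical helper)."""

    def __init__(self, size):
        self.events = deque(maxlen=size)

    def add(self, value):
        self.events.append(value)
        return list(self.events)


class TimeWindow:
    """Sliding window that keeps events from the last `span_ms` milliseconds."""

    def __init__(self, span_ms):
        self.span_ms = span_ms
        self.events = deque()  # (timestamp_ms, value) pairs, oldest first

    def add(self, timestamp_ms, value):
        self.events.append((timestamp_ms, value))
        # Expire events that have fallen out of the time span.
        while self.events and self.events[0][0] <= timestamp_ms - self.span_ms:
            self.events.popleft()
        return [v for _, v in self.events]


length_win = LengthWindow(size=3)
time_win = TimeWindow(span_ms=1000)
for ts, price in [(0, 10.0), (400, 11.0), (900, 12.0), (1500, 13.0)]:
    by_length = length_win.add(price)
    by_time = time_win.add(ts, price)
```

After the last event, the length window still holds three prices, while the time window has dropped the two events that are more than one second old.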
6. Example Query
from p=PINChangeEvents#window.time(3600) join
t=TransactionEvents[amount>10000]#window.time(3600)
on p.custid==t.custid
return t.custid, t.amount;
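As a rough illustration of what this query computes, the following Python sketch joins two time windows on `custid`. It is a simplification, not the engine's implementation: the function names are hypothetical, and the window length `3600` is assumed to be in the same unit as the event timestamps.

```python
from collections import deque

WINDOW = 3600  # assumed to be in the same unit as the timestamps

pin_changes = deque()   # (timestamp, custid)
transactions = deque()  # (timestamp, custid, amount)


def expire(window, now):
    """Drop events that are older than the window length."""
    while window and window[0][0] <= now - WINDOW:
        window.popleft()


def on_pin_change(ts, custid):
    expire(pin_changes, ts)
    expire(transactions, ts)
    pin_changes.append((ts, custid))
    # Join the new PIN change against large transactions still in the window.
    return [(c, amount) for _, c, amount in transactions if c == custid]


def on_transaction(ts, custid, amount):
    expire(pin_changes, ts)
    expire(transactions, ts)
    if amount <= 10000:  # the [amount>10000] filter
        return []
    transactions.append((ts, custid, amount))
    # Join the new transaction against PIN changes still in the window.
    return [(custid, amount) for _, c in pin_changes if c == custid]


alerts = []
alerts += on_pin_change(0, "c1")
alerts += on_transaction(100, "c1", 500)       # filtered out: amount too small
alerts += on_transaction(200, "c1", 20000)     # matches the PIN change
alerts += on_transaction(5000, "c1", 30000)    # PIN change has expired
```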
7. Open Source CEP Runtimes
ƒ Siddhi
o Apache License, a Java library, tuple-based event
model
o Supports distributed processing
o Supports multiple query models
- Based on a SQL-like language
- Filters, Windows, Joins, Ordering and others
ƒ Esper, http://esper.codehaus.org
o GPLv2 License, a Java library; events can be XML, Map,
or Object
o Supports multiple query models
- Based on a SQL-like language
- Filters, Windows, Joins, Ordering and others
ƒ Drools Fusion
o Apache License, a Java library
o Support for temporal reasoning + windows
8. WSO2 CEP Server
ƒ Enterprise grade server for CEP runtimes
ƒ Provides support for several transports
(network access) and data formats
o SOAP/WS-Eventing – XML messages
o REST/JSON – JSON messages
o JMS – map messages, XML messages
o Thrift – WSO2 data bridge format
- High-performance event capturing and delivery framework;
supports Java/C/C++/C# via Thrift language bindings.
ƒ Supports multiple CEP runtimes
o Siddhi – WSO2, new, very fast, distributed
o Esper – well-known CEP runtime
o Drools Fusion – rule-based, but much slower
ƒ Easy to plug in new brokers and new CEP engines
10. CEP Buckets
ƒ A CEP bucket is a logical execution unit
ƒ Each CEP bucket has a set of queries, event sources,
and input and output event mappings.
ƒ It maps one-to-one to a CEP engine
11. Management UI
ƒ Define buckets
ƒ Update running queries without resetting the
current execution state
ƒ Manage brokers (data adapters)
12. Developer Studio UI
ƒ Eclipse-based tool to define buckets
ƒ Can manage the configurations through the
production lifecycle
14. Big Picture
ƒ Users provide query/queries
ƒ Map event streams to queries
ƒ Siddhi keeps the queries running and invokes
callbacks registered against one or more
queries/streams
ƒ Example Query
from cseEventStream[symbol == 'IBM']#win.time(50000)
insert into IBMStockQuote symbol, avg(price) as avgPrice
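The flow described above (a long-running query, events pushed in, callbacks invoked on output) can be sketched in Python as follows. This is only an illustration of the idea and does not use the Siddhi API; `AvgPriceQuery`, `add_callback`, and `send` are hypothetical names.

```python
from collections import deque


class AvgPriceQuery:
    """Hypothetical stand-in for a running query with a sliding time window."""

    def __init__(self, symbol, window_ms):
        self.symbol = symbol
        self.window_ms = window_ms
        self.window = deque()  # (timestamp_ms, price)
        self.callbacks = []

    def add_callback(self, callback):
        self.callbacks.append(callback)

    def send(self, timestamp_ms, symbol, price):
        if symbol != self.symbol:  # the [symbol == 'IBM'] filter
            return
        self.window.append((timestamp_ms, price))
        # Expire events that are older than the time window.
        while self.window and self.window[0][0] <= timestamp_ms - self.window_ms:
            self.window.popleft()
        avg_price = sum(p for _, p in self.window) / len(self.window)
        for callback in self.callbacks:
            callback({"symbol": symbol, "avgPrice": avg_price})


received = []
query = AvgPriceQuery("IBM", window_ms=50000)
query.add_callback(received.append)
query.send(0, "IBM", 100.0)
query.send(1000, "MSFT", 30.0)   # ignored by the filter
query.send(2000, "IBM", 110.0)
```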
16. Siddhi Queries: Filters
from <stream-name> [<conditions>]*
insert into <stream-name>
ƒ Filters the events by conditions
ƒ Conditions
o >, <, ==, >=, <=, !=
o contains
o and, or, not
ƒ Example
from cseEventStream[price >= 20 and symbol=='IBM']
insert into StockQuote symbol, volume
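In plain Python terms, this filter and projection behaves roughly like the generator below. The event layout (dictionaries with `symbol`, `price`, and `volume` keys) is an assumption made for illustration.

```python
def filter_and_project(events):
    """Keep IBM events priced at 20 or more and project symbol and volume."""
    for event in events:
        if event["price"] >= 20 and event["symbol"] == "IBM":
            yield {"symbol": event["symbol"], "volume": event["volume"]}


events = [
    {"symbol": "IBM", "price": 25.0, "volume": 100},
    {"symbol": "IBM", "price": 15.0, "volume": 200},   # price too low
    {"symbol": "WSO2", "price": 30.0, "volume": 300},  # wrong symbol
]
stock_quotes = list(filter_and_project(events))
```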
17. Window
from <stream-name> [<conditions>]#window.<window-name>(<parameters>)
insert [<output-type>] into <stream-name>
ƒ Types of Windows
o (Time | Length) (Sliding| Batch) windows
o Unique window, First unique (not supported in 1.0)
ƒ Type of aggregate functions
o sum, avg, max, min
ƒ Example
from cseEventStream[price >= 20]#window.lengthBatch(50)
insert expired-events into StockQuote
symbol, avg(price) as avgPrice
group by symbol
having avgPrice>50
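A simplified Python sketch of a length batch window with grouping and a `having` clause is shown below. It ignores the distinction between current and expired events and simply emits one aggregate per symbol whenever a batch fills up; the function name and the small batch size are chosen for illustration only.

```python
from collections import defaultdict


def length_batch_avg(events, batch_size, min_price=20, min_avg=50):
    """Emit per-symbol average prices once every `batch_size` filtered events."""
    batch = []
    for symbol, price in events:
        if price < min_price:  # the [price >= 20] filter
            continue
        batch.append((symbol, price))
        if len(batch) == batch_size:
            prices_by_symbol = defaultdict(list)  # group by symbol
            for s, p in batch:
                prices_by_symbol[s].append(p)
            for s, prices in prices_by_symbol.items():
                avg_price = sum(prices) / len(prices)
                if avg_price > min_avg:  # having avgPrice > 50
                    yield {"symbol": s, "avgPrice": avg_price}
            batch = []  # start the next batch


events = [
    ("IBM", 60.0), ("WSO2", 30.0), ("IBM", 80.0),
    ("ORCL", 10.0), ("WSO2", 40.0), ("IBM", 55.0),
]
output = list(length_batch_avg(events, batch_size=4))
```

Here the first four events that pass the filter form one batch; only IBM has an average above 50, and the trailing IBM event stays in an incomplete batch.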
18. Join
from <stream>#<window> [unidirectional] join <stream>#<window>
on <condition> within <time>
insert into <stream>
ƒ Join two streams based on a condition and window
ƒ Join can take multiple forms ((left|right|full outer) |
inner) join; only inner join is supported in 1.0
ƒ Unidirectional – only events arriving on the
unidirectional stream trigger the join
ƒ Example
from TickEvent[symbol=='IBM']#win.length(2000)
join NewsEvent#win.time(500)
insert into JoinStream *
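The effect of the unidirectional keyword can be sketched in Python as below. This is a simplification under stated assumptions: both sides use length windows, the join condition is symbol equality (the example above gives no `on` clause), and `UnidirectionalJoin` is a hypothetical name.

```python
from collections import deque


class UnidirectionalJoin:
    """Only events on the left (unidirectional) stream produce join output."""

    def __init__(self, left_length, right_length):
        self.left = deque(maxlen=left_length)
        self.right = deque(maxlen=right_length)

    def on_left(self, event):
        # Triggering stream: join the new event against the right window.
        self.left.append(event)
        return [(event, r) for r in self.right if r["symbol"] == event["symbol"]]

    def on_right(self, event):
        # Non-triggering stream: the event is only buffered.
        self.right.append(event)
        return []


join = UnidirectionalJoin(left_length=2000, right_length=500)
out_news_1 = join.on_right({"symbol": "IBM", "headline": "earnings"})
out_tick = join.on_left({"symbol": "IBM", "price": 100.0})
out_news_2 = join.on_right({"symbol": "IBM", "headline": "merger"})
```

The second news event produces no output even though a matching tick is already in the left window, because only the left stream triggers the join.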
19. Pattern
from [every] <condition> -> [every] <condition> … <condition>
within <time>
insert into StockQuote (<attribute-name>* | * )
ƒ Checks whether condition A happens before/after condition B
ƒ Can do iterative checks via the “every” keyword.
ƒ With “within <time>”, Siddhi emits only events
that are within that time of each other
ƒ Example
from every (a1 = purchase[price < 10])
-> a2 = purchase[price > 10000 and a1.cardNo==a2.cardNo]
within 300000
insert into potentialFraud
a2.cardNo as cardNo, a2.price as price, a2.place as place
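The matching logic of this pattern can be approximated in Python as follows. It is a sketch, not the engine's algorithm: the function name and event layout are hypothetical, and the `within` value is assumed to be in milliseconds.

```python
def detect_fraud(purchases, within_ms=300000):
    """Match a small purchase followed by a large one on the same card."""
    pending = []  # small purchases (a1) waiting for a large follow-up (a2)
    alerts = []
    for p in purchases:
        # Drop candidates that can no longer match within the time limit.
        pending = [a1 for a1 in pending if p["ts"] - a1["ts"] <= within_ms]
        if p["price"] > 10000:
            # "every": each pending a1 on this card yields its own match.
            for a1 in pending:
                if a1["cardNo"] == p["cardNo"]:
                    alerts.append({"cardNo": p["cardNo"],
                                   "price": p["price"],
                                   "place": p["place"]})
            pending = [a1 for a1 in pending if a1["cardNo"] != p["cardNo"]]
        elif p["price"] < 10:
            pending.append(p)
    return alerts


purchases = [
    {"ts": 0, "cardNo": "A", "price": 5, "place": "Kandy"},
    {"ts": 1000, "cardNo": "B", "price": 8, "place": "Galle"},
    {"ts": 2000, "cardNo": "A", "price": 20000, "place": "Colombo"},
    {"ts": 400000, "cardNo": "B", "price": 15000, "place": "Jaffna"},  # too late
]
alerts = detect_fraud(purchases)
```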
20. Sequence
from <event-regular-expression> within <time> insert into <stream>
ƒ Regular Expressions supported
o * - Zero or more matches (reluctant).
o + - One or more matches (reluctant).
o ? - Zero or one match (reluctant).
o or – either event
ƒ Events matched by * or + must be referenced using
square brackets to access a specific occurrence of
that event
from a1 = requestOrder[action == "buy"],
b1 = cseEventStream[price > a1.price and symbol==a1.symbol]+,
b2 = cseEventStream[price < b1.price]
insert into purchaseOrder
a1.symbol as symbol, b1[0].price as firstPrice, b2.price as orderPrice
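A small state machine in Python gives a feel for how such a sequence is matched over consecutive events. Several points are assumptions made for this sketch: all events arrive in one ordered list with a `stream` field, `b1.price` in the last condition is taken to mean the most recent `b1` event, and any non-matching event resets the match.

```python
def match_sequence(events):
    """Match: buy order, then one or more rising quotes, then a lower quote."""
    results = []
    a1, b1 = None, []  # current buy order and the quotes matched by "+"
    for e in events:
        if a1 is not None and e["stream"] == "cseEventStream":
            # Reluctant "+": move on to b2 as soon as its condition holds.
            if b1 and e["price"] < b1[-1]["price"]:
                results.append({"symbol": a1["symbol"],
                                "firstPrice": b1[0]["price"],
                                "orderPrice": e["price"]})
                a1, b1 = None, []
                continue
            if e["price"] > a1["price"] and e["symbol"] == a1["symbol"]:
                b1.append(e)
                continue
        # Any other event breaks the sequence; a buy order starts a new one.
        a1, b1 = None, []
        if e["stream"] == "requestOrder" and e.get("action") == "buy":
            a1 = e
    return results


events = [
    {"stream": "requestOrder", "action": "buy", "symbol": "IBM", "price": 100},
    {"stream": "cseEventStream", "symbol": "IBM", "price": 105},
    {"stream": "cseEventStream", "symbol": "IBM", "price": 110},
    {"stream": "cseEventStream", "symbol": "IBM", "price": 108},
]
orders = match_sequence(events)
```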
21. Performance Results
ƒ We compared Siddhi with Esper, the widely used
open source CEP engine
ƒ For the evaluation, we set up different queries on both
systems, pushed events into each system, and measured the
time until all of them were processed.
ƒ We used an Intel(R) Xeon(R) X3440 @ 2.53GHz, 4 cores, 8M
cache, 8GB RAM, running Debian (kernel 2.6.32-5-amd64)
22. Performance Comparison With ESPER
Simple filter without window
from StockTick[prize >6] return symbol, prize
23. Performance Comparison With ESPER
State machine query for pattern matching
from f=FraudWarningEvent ->
p=PINChangeEvent(accountNumber=f.accountNumber)
return accountNumber;
24. Siddhi Features
ƒ Supports State Persistence
o Enables queries to span lifetimes much longer
than server uptime.
o Takes periodic snapshots and stores all state
information and windows in a scalable persistence
store (Apache Cassandra).
o Pluggable persistence stores.
ƒ Supports highly available deployment
o Uses the Hazelcast distributed cache as shared
working memory.
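The idea of snapshotting window state and restoring it after a restart can be sketched in Python as follows. This is illustrative only: a dictionary stands in for the persistence store, JSON is an arbitrary choice of snapshot format, and `SumWindow` is a hypothetical class rather than anything from Siddhi.

```python
import json
from collections import deque


class SumWindow:
    """Length window with a running sum that can be snapshotted and restored."""

    def __init__(self, size):
        self.size = size
        self.events = deque(maxlen=size)

    def add(self, value):
        self.events.append(value)
        return sum(self.events)

    def snapshot(self):
        # Serialize everything needed to rebuild the window later.
        return json.dumps({"size": self.size, "events": list(self.events)})

    @classmethod
    def restore(cls, blob):
        state = json.loads(blob)
        window = cls(state["size"])
        window.events.extend(state["events"])
        return window


store = {}  # stand-in for a persistence store
window = SumWindow(size=3)
window.add(1)
window.add(2)
store["query-1"] = window.snapshot()  # periodic snapshot

# After a simulated restart, processing resumes from the snapshot.
recovered = SumWindow.restore(store["query-1"])
total = recovered.add(3)
```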
25. HA/ Persistence
ƒ This is the ability to recover runtime state in
case of a failure
ƒ The CEP server can support this if the CEP engine
supports persistence (works with Siddhi and Esper)
26. Scaling
ƒ A CEP pipeline can be distributed, but queries such as
windows, patterns, and joins are hard to distribute
ƒ WSO2 CEP with Siddhi uses a distributed cache
(Hazelcast) as shared memory, together with a selective
processing approach, to achieve scalability in
distributed processing
27. Event Recording
ƒ Ability to record all/some of the events for
future processing
ƒ A few options
o Publish them to a Cassandra cluster using the WSO2 data
bridge API or BAM (data in Cassandra can then be processed
with Hadoop using WSO2 BAM).
o Write them to a distributed cache
o Use a custom Thrift-based event recorder
31. Scenario
ƒ Monitoring a stock exchange for game-changing
moments
ƒ Two input event streams.
o Event stream of stock quotes from a stock
exchange
o Event stream of word counts for various company
names from Twitter
ƒ Check whether the last traded price of the
stock has changed significantly (by 2%) within the
last minute, and people are tweeting about that
company (word count > 10) within the last minute
36. Queries
from allStockQuotes[win.time(60000)]
insert into fastMovingStockQuotes
symbol,price, avg(price) as averagePrice
group by symbol
having ((price > averagePrice*1.02) or (averagePrice*0.98 > price ))
from twitterFeed[win.time(60000)]
insert into highFrequentTweets
company as company, sum(wordCount) as words
group by company
having (words > 10)
from fastMovingStockQuotes[win.time(60000)] as fastMovingStockQuotes
join highFrequentTweets[win.time(60000)] as highFrequentTweets
on fastMovingStockQuotes.symbol==highFrequentTweets.company
insert into predictedStockQuotes
fastMovingStockQuotes.symbol as company,
fastMovingStockQuotes.averagePrice as amount,
highFrequentTweets.words as words
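The three queries can be approximated end to end with the Python sketch below. It is a simplification: the final join keeps only the latest qualifying state per symbol instead of joining two 60-second windows, company names are assumed to equal stock symbols, and all names (`MinuteWindow`, `on_quote`, `on_tweet`, `predict`) are hypothetical.

```python
from collections import deque

WINDOW_MS = 60000


class MinuteWindow:
    """Sliding one-minute window holding (timestamp, key, value) events."""

    def __init__(self):
        self.events = deque()

    def add(self, ts, key, value):
        self.events.append((ts, key, value))
        while self.events and self.events[0][0] <= ts - WINDOW_MS:
            self.events.popleft()
        return [v for _, k, v in self.events if k == key]  # group by key


quotes, tweets = MinuteWindow(), MinuteWindow()
fast_moving = {}  # symbol -> average price, when the price moved by over 2%
high_tweets = {}  # company -> word count, when it exceeds 10


def predict(symbol):
    """Join the two intermediate results on symbol == company."""
    if symbol in fast_moving and symbol in high_tweets:
        return {"company": symbol,
                "amount": fast_moving[symbol],
                "words": high_tweets[symbol]}
    return None


def on_quote(ts, symbol, price):
    prices = quotes.add(ts, symbol, price)
    average = sum(prices) / len(prices)
    if price > average * 1.02 or average * 0.98 > price:
        fast_moving[symbol] = average
    else:
        fast_moving.pop(symbol, None)
    return predict(symbol)


def on_tweet(ts, company, word_count):
    words = sum(tweets.add(ts, company, word_count))
    if words > 10:
        high_tweets[company] = words
    else:
        high_tweets.pop(company, None)
    return predict(company)


on_quote(0, "IBM", 100.0)
on_tweet(1000, "IBM", 6)
on_tweet(2000, "IBM", 7)
alert = on_quote(3000, "IBM", 110.0)
```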
37. Alert
ƒ As XML
<quotedata:StockQuoteDataEvent
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:quotedata="http://ws.cdyne.com/">
<quotedata:StockSymbol>{company}</quotedata:StockSymbol>
<quotedata:LastTradeAmount>{amount}</quotedata:LastTradeAmount>
<quotedata:WordCount>{words}</quotedata:WordCount>
</quotedata:StockQuoteDataEvent>
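How the `{company}`, `{amount}`, and `{words}` placeholders might be filled from an output event can be sketched in Python as below. This is an assumption about the mapping step, not the server's actual code; the `xsi` and `xsd` namespace declarations are omitted for brevity, and `render_alert` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

TEMPLATE = (
    '<quotedata:StockQuoteDataEvent xmlns:quotedata="http://ws.cdyne.com/">'
    "<quotedata:StockSymbol>{company}</quotedata:StockSymbol>"
    "<quotedata:LastTradeAmount>{amount}</quotedata:LastTradeAmount>"
    "<quotedata:WordCount>{words}</quotedata:WordCount>"
    "</quotedata:StockQuoteDataEvent>"
)


def render_alert(event):
    """Substitute event attributes into the template, escaping XML characters."""
    safe = {name: escape(str(value)) for name, value in event.items()}
    return TEMPLATE.format(**safe)


alert_xml = render_alert({"company": "AT&T", "amount": 105.0, "words": 13})

# Parse the result back to confirm that it is well-formed XML.
root = ET.fromstring(alert_xml)
symbol = root.find("{http://ws.cdyne.com/}StockSymbol").text
```

Escaping the values keeps the output well-formed even when an attribute contains characters such as `&`.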
38. Useful links
ƒ WSO2 CEP 2.0.0 Milestone 2
https://svn.wso2.org/repos/wso2/people/suho/packs/cep/wso2cep-2.0.0-M2.zip
ƒ Distributed Processing Sample With Siddhi CEP
and ActiveMQ JMS Broker.
http://suhothayan.blogspot.com/2012/08/distributed-processing-sample-for-wso2.html