SlideShare une entreprise Scribd logo
1  sur  41
Enabling Operational Intelligence
Using Hadoop MapReduce
Copyright © 2014 by ScaleOut Software, Inc.
Hadoop Summit
June 3-5, 2014
Bill Bain, CEO (wbain@scaleoutsoftware.com)
2 ScaleOut Software, Inc.
• The Need for Operational Intelligence (OI)
• Operational Intelligence vs. Business Intelligence
• Implementing OI Using In-Memory Computing:
• In-Memory Data Grid
• Data-Parallel Computation: “Parallel Method Invocation”
• Implementing MapReduce Unchanged on an IMDG
• A Detailed Example in Financial Services
• Video Demo
• Examples of Applications in Operational Intelligence
Agenda
3 ScaleOut Software, Inc.
Goal: Provide immediate feedback to a system handling live data.
A few examples:
• Equity trading: to minimize risk during a trading day
• Ecommerce: to optimize real-time shopping activity
• Reservations systems: to identify issues, reroute, etc.
• Credit cards & wire transfers: to detect fraud in real time
• Smart grids: to optimize power distribution & detect issues
Online Systems Need Operational
Intelligence
4 ScaleOut Software, Inc.
• To keep up with fast
growing “live” workloads &
maintain fast response times:
• Ex.: Handle incoming data
streams in real time.
• Ex. Process updates to data
set based on incoming data.
• To identify and respond to
trends in fast-changing data:
• Ex. Evaluate data set changes in
real time.
• Ex. Respond to identified
patterns within seconds.
Challenges for Operational Intelligence
0
50
100
150
200
250
300
1995
1996
1997
1998
1999
2000
2001
2002
2003
2004
2005
2006
2007
2008
2009
2010
Millions
Growth in Web Servers
Source:
Netcraft
0
500
1000
1500
2000
2500
3000
3500
4000
2000
2001
2002
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
Exebytes
Growth in “Big Data”
“More data has been
created in the past three
years than in the past
40,000.”
5 ScaleOut Software, Inc.
Big Data Analytics
Real-Time vs. Batch Analytics
Static data sets
Petabytes
Disk storage
Hours to minutes
Best uses:
• Analyzing
warehoused data
• Mining for long-
term trends
Live data sets
Gigabytes to terabytes
In-memory storage
Minutes to seconds
Best uses:
• Tracking live data
• Immediately
identifying trends
and capturing
opportunities
• Providing immediate
feedback
IMDGs
Spark
Storm
CEP
Hadoop
IBM
Teradata
Oracle
SAP
Real-Time Batch
Real-time
“Operational Intelligence”
Batch
“Business Intelligence”
6 ScaleOut Software, Inc.
• Traditional Hadoop MapReduce
platforms analyze offline data:
• Very large, disk-based datasets
• Data repeatedly copied from disk to
memory.
• Batch-scheduled (multi-tenant)
• IMDGs store and analyze live data:
• Fast-changing, operational data
integrated with live updates
• Data kept memory-resident (data
motion is minimized)
• Inline-scheduled (single tenant)
Design Goals for Hadoop vs. IMDGs
7 ScaleOut Software, Inc.
• Operational intelligence can co-exist with business intelligence:
• Processes streaming data close to its sources.
• Provides real-time, “tactical” feedback (e.g., recommendations, alerts).
• Translates data for storage in the data warehouse (ETL).
• Data warehouse provides “strategic” guidance.
• Using the same tool set (i.e., Hadoop MapReduce) lowers TCO:
• Leverages common skill set.
• Simplifies design (e.g., loading data into HDFS).
Integrated View of Analytics
8 ScaleOut Software, Inc.
• In-memory data
grid (IMDG) holds
active entities
undergoing state
changes in memory.
• IMDG updates entities
with incoming stream
of state changes.
• Backing store
optionally holds large
population of entities.
• Analytics engine
examines entities in
real time and
generates alerts
within seconds as
needed.
In-Memory Architecture for
Operational Intelligence
9 ScaleOut Software, Inc.
In-Memory Data Grid (IMDG) stores “live” data in a cluster:
• Fits in the business logic layer:
• Follows object-oriented view of data
(vs. relational view).
• Stores unstructured collections of
Java/.NET objects.
• Uses create/read/update/delete
and query APIs to access data.
• Implemented across a cluster of
servers or VMs:
• Scales storage and throughput
by adding servers.
• Provides high availability
in case a server fails.
In-Memory Data Grid for Live Data
10 ScaleOut Software, Inc.
• IMDG’s collections of objects behave like
in-memory collections:
• Unstructured, typically instances of a class
(stored as serialized blobs)
• Individually accessible / update-able
• IMDG adds attributes:
• Accessible by global key
• Query-able by properties
• Highly available
• Optional timeouts
• Distributed locking
• Integration with a backing store
• Optional dependency relationships
• Asynchronous event handling
IMDG Stores “Live” Data
Basic “CRUD” APIs:
• Create(key, obj, tout)
• Read(key)
• Update(key, obj)
• Delete(key)
and…
• Lock(key)
• Unlock(key)
Object
key
11 ScaleOut Software, Inc.
IMDG Analyzes Live Data
• Integrated execution engine: “Parallel Method Invocation” (PMI)
• Object-oriented version of HPC
data-parallel computing model
• Serves as a platform for implementing
MapReduce and other data-parallel
operators.
• Runs user-defined methods in
parallel across the cluster.
• Globally merges results.
• Benefits:
• Simple, well understood model
• Fast startup time
• Fast global barrier
• Minimum data motion
• Automatic code shipping
Analyze Data
(Eval)
Combine Results
(Merge)
12 ScaleOut Software, Inc.
PMI Enables Linear Speedup
Avoids data motion (network or disk I/O) which limits throughput:
13 ScaleOut Software, Inc.
Spark / Spark Streaming from U.C.
Berkeley amplab:
• In-memory computing to accelerate and extend
Hadoop MapReduce using data-parallel
operators in Scala.
• Stores data as “resilient
distributed datasets” (RDDs):
• Distributed across cluster
• Immutable
• Hold data from/output to HDFS.
• Store data stream as a sequence of RDDs.
• Comparison to IMDG:
• Not designed for “live” data:
• Lacks CRUD on individual objects.
• Lacks high availability.
• Designed for “data parallel” transformations.
Comparison: IMDGs to Spark
14 ScaleOut Software, Inc.
Run MapReduce as two PMI
phases:
• Data can be input from either the
IMDG or an external data source.
• Works with any input/output format
compatible with the Apache
distribution.
• IMDG uses its data-parallel
execution engine (PMI) to invoke
the mappers and the reducers.
• Eliminates batch scheduling
overhead.
• Intermediate results are stored
within the IMDG.
• Minimizes data motion between the
mappers and reducers.
• Allows optional sorting.
• Output of a single reducer/combiner
optionally can be globally merged.
Implementing MapReduce on IMDG
15 ScaleOut Software, Inc.
// This job will run using the Hadoop
// job tracker:
public static void main(String[] args)
throws Exception {
Configuration conf = new Configuration();
Job job = new Job(conf,
"wordcount");
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setInputFormatClass(
TextInputFormat.class);
job.setOutputFormatClass(
TextOutputFormat.class);
FileInputFormat.addInputPath(
job, new Path(args[0]));
FileOutputFormat.setOutputPath(
job, new Path(args[1]));
job.waitForCompletion(true);
}
// This job will run using ScaleOut hServer:
public static void main(String[] args)
throws Exception {
Configuration conf = new Configuration();
Job job = new HServerJob(conf,
"wordcount");
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setInputFormatClass(
TextInputFormat.class);
job.setOutputFormatClass(
TextOutputFormat.class);
FileInputFormat.addInputPath(
job, new Path(args[0]));
FileOutputFormat.setOutputPath(
job, new Path(args[1]));
job.waitForCompletion(true);
}
Configuring Application for the IMDG
• Without YARN, just subclass the Hadoop Job class with a one-line
change:
16 ScaleOut Software, Inc.
Running Under YARN
• With YARN, just replace the MapReduce execution framework:
• Example of running MapReduce on
IMDG using Hortonworks YARN:
• YARN directs jobs
to IMDG.
• IMDG accelerates
execution.
$ hadoop jar hadoop-mapreduce-examples.jar wordcount
-Dmapreduce.framework.name=hserver-yarn in out
17 ScaleOut Software, Inc.
• With YARN, IMDG can run Apache or
other Hive distribution unchanged.
• Accelerates queries for datasets hosted
in HDFS or the IMDG.
• Limitation: Intermediate data must fit
within the IMDG.
• Implementation note:
• Hive not thread-safe
• Requires multiple JVMs per server for one
Hive query
• Currently seeing 3X speedup (tuning in
progress)
• More optimizations possible, but…
• Limited by “unchanged” approach
Using YARN to Run Hive on IMDG
18 ScaleOut Software, Inc.
• A Hadoop distribution does not have to be installed unless HDFS is used.
• The developer starts MapReduce applications from a remote workstation.
• The IMDG automatically builds a reusable “invocation grid” of JVMs on the
grid’s servers for PMI and ships the application’s jars.
• Results are stored in the IMDG, HDFS, or optionally globally merged and
returned to the remote workstation.
Running MapReduce on an IMDG
19 ScaleOut Software, Inc.
The invocation grid can be re-used across MapReduce jobs:
Accelerating Start-Up Times
//Configure and load the invocation grid
InvocationGrid grid = HServerJob.getInvocationGridBuilder("myGrid").
// Add JAR files as IG dependencies
addJar("main-job.jar"). addJar(“mylib.jar").
// Add classes as IG dependencies
addClass(MyMap.class). addClass(MyRed.class).
// Define custom JVM parameters
setJVMParameters("-Xms512M -Xmx1024M").
load();
//Run 10 jobs on the same invocation grid
for(int i=0; i<10; i++) {
Configuration conf = new Configuration();
//The preloaded invocation grid is passed to the job.
Job job = new HServerJob(conf, "Job number "+i, false, grid);
//......Configure the job here.........
//Run the job
job.waitForCompletion(true);
}
//Unload the invocation grid when we are done
grid.unload();
20 ScaleOut Software, Inc.
• IMDG adds grid input format for
accessing key/value pairs held in the
IMDG.
• MapReduce programs optionally can
output results to IMDG with grid
output format.
• Grid Record Reader optimizes access
to key/value pairs to eliminate
network overhead.
• Applications can access and update
key/value pairs as operational data
during analysis.
Accessing In-Memory Data
21 ScaleOut Software, Inc.
IMDG needs multiple in-memory
storage models:
• Named cache, optimized for
rich semantics on large
objects:
• Property-based query
• Distributed locking
• Access from remote grids
• Named map, optimized for
efficient storage and bulk
analysis (e.g., MapReduce):
• Highly efficient object storage
• Pipelined, bulk-access
mechanisms
• Follows Java Named Map
semantics.
Optimizing In-Memory Storage for M/R
22 ScaleOut Software, Inc.
In-Memory Named Map:
• Stores key/value pairs in chunks.
• Allows CRUD operations on kvps.
• Automatically organizes chunks into
splits.
• Uses per-split hash table to access
keys and manage multi-valued
keys.
• Stores shuffled data set between
mappers and reducers.
• Pipelines chunks to mappers and
from reducers.
• Optionally uses memory mapped
files to reduce access latency.
• Provides support for sorting keys.
Named Map Optimizations
23 ScaleOut Software, Inc.
• IMDG adds Dataset Record Reader (wrapper) to cache HDFS data
during program execution.
• Hadoop automatically retrieves data from IMDG on subsequent runs.
• Dataset Record Reader
stores and retrieves data
with minimum network
and memory overheads.
• Tests with Terasort
benchmark have
demonstrated 11X
faster access latency
over HDFS without IMDG.
Optional Caching of HDFS Data
24 ScaleOut Software, Inc.
• IMDG caches “chunks” of key/value pairs instead of HDFS records.
• Serves key/value pairs directly to mappers on cache access.
• Avoids overhead of reparsing records.
Details of HDFS Caching
“Record” Phase “Playback” Phase
25 ScaleOut Software, Inc.
• Measured performance:
• Startup times reduced to a few milliseconds
• Word count benchmark shows 20X speedup.
• Real-world example shows >40X speedup.
• MapReduce optimizations:
• Optional sorting
• Optional multicast of parameters to mappers
• Optional O(logN) global combining (avoids
single reducer)
• Optional HDFS caching
• Optional reuse of JVMs across jobs
• Current limitations:
• No specific security for multi-tenancy
• Intermediate data must fit in the IMDG
Performance & Optimizations
26 ScaleOut Software, Inc.
In-Memory MapReduce:
• Enables use of Hadoop MapReduce for operational intelligence.
• Accelerates data access by holding data in memory.
• Analyzes and updates “live” data.
• Reduces overheads of standard
Hadoop distributions:
• Batch scheduling
• Disk access
• Data shuffling
• Mandatory key sorting
• Avoids vendor-specific APIs:
• Leverages Hadoop skill sets.
Summary of Benefits
27 ScaleOut Software, Inc.
Integrate analysis into a stock trading platform:
• The IMDG holds market data and hedging strategies.
• Updates to market data
continuously flow through
the IMDG.
• The IMDG performs
repeated data-parallel
analysis on hedging
strategies and alerts
traders in real time.
• IMDG automatically and dynamically
scales its throughput to handle new
hedging strategies by adding servers.
• Measured >40X speedup over Apache 1.2.
Example in Financial Services
28 ScaleOut Software, Inc.
The Challenge: Quickly evaluate and respond to sub-second
market changes:
• Hedge fund tracks a set of hedging strategies:
• Strategies can cover various market
sectors, such as high-tech, automotive,
energy, consumer, real estate, etc.
• Each strategy contains list of holdings
and rules for managing the holdings
(such as target allocations).
• Updates to market data
continuously arrive during
the trading day.
• Challenge: The hedge fund must be able to quickly update and
analyze its hedging strategies and provide alerts to traders.
Demo of the Finserv Application
29 ScaleOut Software, Inc.
• Delivers a stream of alerts to traders
within a few seconds.
• Enables the trader to examine strategy details in real time:
Output: Real-Time Alerts
30 ScaleOut Software, Inc.
• Video Link
Video
31 ScaleOut Software, Inc.
• Measured a similar financial services application (back testing stock
trading strategies on stock histories)
• Hosted IMDG in Amazon EC2 using 75 servers holding 1 TB of stock
history data in memory
• IMDG handled a continuous stream of updates (1.1 GB/s)
• Results: analyzed 1 TB in 4.1 seconds (250 GB/s) with linear scaling
Example of Performance Scaling
32 ScaleOut Software, Inc.
Fast map/reduce reconciles inventory and order systems for an
online retailer:
• Challenge: Inventory and online
order management are handled
by different applications.
• Reconciled once per day.
• Inaccurate orders reduces margins.
• Solution:
• Host SKUs in IMDG updated in real
time by order & inventory systems.
• Use MapReduce to reconcile in two minutes.
• Results: Real-time reconciliation ensures accurate orders.
Example in Ecommerce: Inventory
Management
33 ScaleOut Software, Inc.
• IMDG holds customer
information for active
Web users.
• IMDG saves/retrieves
customer information
from backing store.
• Web browsers send
activity information to
analytics engine.
• IMDG updates customer history and
preferences.
• Analytics engine identifies browsing and
buying patterns.
• Analytics engine makes suggestions in
real-time. Also sends email follow-ups.
Example: Web Shopping
34 ScaleOut Software, Inc.
• Track
connectivity
issues.
• Obtain time-
sensitive
business data.
• Offer enhanced
services.
• Increase
security.
Example: Telecommunications
Optimize Operations
Customer Experience
Historical queries
for real-time data
enrichment
Stream
persistence for
future analysis
Network
Elements
35 ScaleOut Software, Inc.
• Online systems need operational
intelligence on “live” data for
immediate feedback.
• Operational intelligence can be
implemented using Hadoop
MapReduce unchanged.
• In-memory data grid provides
an excellent platform for MR-
based operational intelligence:
• Hosts and updates “live” data.
• Implements high availability.
• Offers fast MapReduce execution
for immediate results.
• Leverages Hadoop skill sets.
Recap
Additional Information
36
37 ScaleOut Software, Inc.
• Storm implements pipelined, task-parallel execution by “bolts” on
incoming data streams.
• Streams can be distributed to bolts with configurable mappings.
• Developer controls the number of tasks per bolt.
• Storm uses a centralized master node and Zookeeper for fault-
tolerance.
• Key strength: continuous
processing of input
streams
• Issues:
• Complexity / tuning
• Minimizing data motion
• Managing global state
Comparison to Storm
38 ScaleOut Software, Inc.
• Create method to analyze a queried stock object and another
method to pair-wise merge the results:
Java Example: Parallel Method Invocation
public class StockAnalysis implements
Invokable<Stock, StockCalcParams, Double>
{
public Double eval(Stock stock, StockCalcParams param)
throws InvokeException {
return stock.getPrice() * stock.getTotalShares();
}
public Double merge(Double first, Double second)
throws InvokeException {
return first + second;
}
}
39 ScaleOut Software, Inc.
• Run a parallel method invocation on a queried set of objects:
Java Example: Parallel Method Invocation
NamedCache cache = CacheFactory.getCache("Stocks");
InvokeResult valueOfSelectedStocks =
cache.invoke(
StockAnalysis.class,
Stock.class,
or(equal("ticker", "GOOG"), equal("ticker", "ORCL")),
new StockCalcParams());
System.out.println("The value of selected stocks is" +
valueOfSelectedStocks.getResult());
40 ScaleOut Software, Inc.
• IMDG ships user’s code and libraries to its servers.
• IMDG automatically schedules analysis operations across all grid
servers and cores.
• The analysis runs on all objects selected
by the parallel query.
• Each grid server analyzes its locally stored
objects to minimize data motion.
• Parallel execution ensures fast
completion time:
• IMDG automatically distributes
workload across servers/cores.
• Scaling the IMDG automatically
handles larger data sets.
PMI: Running the Analysis
41 ScaleOut Software, Inc.
• The IMDG automatically merges all analysis results.
• The IMDG first merges all results within each grid server in parallel.
• It then merges results across all grid servers to create one combined
result.
• Efficient parallel merge
minimizes the delay in
combining all results.
• The IMDG delivers the
combined result to the
invoking application as
one object.
PMI: Merging the Results

Contenu connexe

Tendances

Using Mainframe Data in the Cloud: Design Once, Deploy Anywhere in a Hybrid W...
Using Mainframe Data in the Cloud: Design Once, Deploy Anywhere in a Hybrid W...Using Mainframe Data in the Cloud: Design Once, Deploy Anywhere in a Hybrid W...
Using Mainframe Data in the Cloud: Design Once, Deploy Anywhere in a Hybrid W...Precisely
 
What’s New: Splunk App for Stream and Splunk MINT
What’s New: Splunk App for Stream and Splunk MINTWhat’s New: Splunk App for Stream and Splunk MINT
What’s New: Splunk App for Stream and Splunk MINTSplunk
 
ExtraHop Product Overview Datasheet
ExtraHop Product Overview DatasheetExtraHop Product Overview Datasheet
ExtraHop Product Overview DatasheetExtraHop Networks
 
Whitepaper factors to consider commercial infrastructure management vendors
Whitepaper  factors to consider commercial infrastructure management vendorsWhitepaper  factors to consider commercial infrastructure management vendors
Whitepaper factors to consider commercial infrastructure management vendorsapprize360
 
Splunk MINT and Stream Breakout
Splunk MINT and Stream BreakoutSplunk MINT and Stream Breakout
Splunk MINT and Stream BreakoutSplunk
 
Whitepaper factors to consider when selecting an open source infrastructure ...
Whitepaper  factors to consider when selecting an open source infrastructure ...Whitepaper  factors to consider when selecting an open source infrastructure ...
Whitepaper factors to consider when selecting an open source infrastructure ...apprize360
 
Experiences in Mainframe-to-Splunk Big Data Access
Experiences in Mainframe-to-Splunk Big Data AccessExperiences in Mainframe-to-Splunk Big Data Access
Experiences in Mainframe-to-Splunk Big Data AccessPrecisely
 
Federal Webinar: Application monitoring for on-premises, hybrid, and multi-cl...
Federal Webinar: Application monitoring for on-premises, hybrid, and multi-cl...Federal Webinar: Application monitoring for on-premises, hybrid, and multi-cl...
Federal Webinar: Application monitoring for on-premises, hybrid, and multi-cl...SolarWinds
 
Orion NTA Customer Training
Orion NTA Customer TrainingOrion NTA Customer Training
Orion NTA Customer TrainingSolarWinds
 
Splunk App for Stream
Splunk App for StreamSplunk App for Stream
Splunk App for StreamSplunk
 
Government and Education: IT Tools to Support Your Hybrid Workforce
Government and Education: IT Tools to Support Your Hybrid WorkforceGovernment and Education: IT Tools to Support Your Hybrid Workforce
Government and Education: IT Tools to Support Your Hybrid WorkforceSolarWinds
 
Migrating workloads to OpenStack
Migrating workloads to OpenStackMigrating workloads to OpenStack
Migrating workloads to OpenStackopenstackstl
 
SplunkLive! Austin Customer Presentation - Dell
SplunkLive! Austin Customer Presentation - DellSplunkLive! Austin Customer Presentation - Dell
SplunkLive! Austin Customer Presentation - DellSplunk
 
DGI Compliance Webinar
DGI Compliance WebinarDGI Compliance Webinar
DGI Compliance WebinarSolarWinds
 
Fast 360 assessment sample report
Fast 360 assessment sample reportFast 360 assessment sample report
Fast 360 assessment sample reportExtraHop Networks
 
SplunkLive! Austin Customer Presentation - Xerox
SplunkLive! Austin Customer Presentation - XeroxSplunkLive! Austin Customer Presentation - Xerox
SplunkLive! Austin Customer Presentation - XeroxSplunk
 
Im 2021 tutorial next-generation closed-loop automation - an inside view - ...
Im 2021 tutorial   next-generation closed-loop automation - an inside view - ...Im 2021 tutorial   next-generation closed-loop automation - an inside view - ...
Im 2021 tutorial next-generation closed-loop automation - an inside view - ...Ishan Vaishnavi
 

Tendances (20)

Using Mainframe Data in the Cloud: Design Once, Deploy Anywhere in a Hybrid W...
Using Mainframe Data in the Cloud: Design Once, Deploy Anywhere in a Hybrid W...Using Mainframe Data in the Cloud: Design Once, Deploy Anywhere in a Hybrid W...
Using Mainframe Data in the Cloud: Design Once, Deploy Anywhere in a Hybrid W...
 
What’s New: Splunk App for Stream and Splunk MINT
What’s New: Splunk App for Stream and Splunk MINTWhat’s New: Splunk App for Stream and Splunk MINT
What’s New: Splunk App for Stream and Splunk MINT
 
ExtraHop Product Overview Datasheet
ExtraHop Product Overview DatasheetExtraHop Product Overview Datasheet
ExtraHop Product Overview Datasheet
 
Whitepaper factors to consider commercial infrastructure management vendors
Whitepaper  factors to consider commercial infrastructure management vendorsWhitepaper  factors to consider commercial infrastructure management vendors
Whitepaper factors to consider commercial infrastructure management vendors
 
Splunk MINT and Stream Breakout
Splunk MINT and Stream BreakoutSplunk MINT and Stream Breakout
Splunk MINT and Stream Breakout
 
Whitepaper factors to consider when selecting an open source infrastructure ...
Whitepaper  factors to consider when selecting an open source infrastructure ...Whitepaper  factors to consider when selecting an open source infrastructure ...
Whitepaper factors to consider when selecting an open source infrastructure ...
 
IBM Operations Analytics For z Systems V2.2 - Client Long Pres
IBM Operations Analytics For z Systems V2.2 - Client Long PresIBM Operations Analytics For z Systems V2.2 - Client Long Pres
IBM Operations Analytics For z Systems V2.2 - Client Long Pres
 
IBM IT Operations Analytics for z systems
IBM IT Operations Analytics for z systemsIBM IT Operations Analytics for z systems
IBM IT Operations Analytics for z systems
 
Experiences in Mainframe-to-Splunk Big Data Access
Experiences in Mainframe-to-Splunk Big Data AccessExperiences in Mainframe-to-Splunk Big Data Access
Experiences in Mainframe-to-Splunk Big Data Access
 
SteelCentral NetSensor 3.0
SteelCentral NetSensor 3.0SteelCentral NetSensor 3.0
SteelCentral NetSensor 3.0
 
Federal Webinar: Application monitoring for on-premises, hybrid, and multi-cl...
Federal Webinar: Application monitoring for on-premises, hybrid, and multi-cl...Federal Webinar: Application monitoring for on-premises, hybrid, and multi-cl...
Federal Webinar: Application monitoring for on-premises, hybrid, and multi-cl...
 
Orion NTA Customer Training
Orion NTA Customer TrainingOrion NTA Customer Training
Orion NTA Customer Training
 
Splunk App for Stream
Splunk App for StreamSplunk App for Stream
Splunk App for Stream
 
Government and Education: IT Tools to Support Your Hybrid Workforce
Government and Education: IT Tools to Support Your Hybrid WorkforceGovernment and Education: IT Tools to Support Your Hybrid Workforce
Government and Education: IT Tools to Support Your Hybrid Workforce
 
Migrating workloads to OpenStack
Migrating workloads to OpenStackMigrating workloads to OpenStack
Migrating workloads to OpenStack
 
SplunkLive! Austin Customer Presentation - Dell
SplunkLive! Austin Customer Presentation - DellSplunkLive! Austin Customer Presentation - Dell
SplunkLive! Austin Customer Presentation - Dell
 
DGI Compliance Webinar
DGI Compliance WebinarDGI Compliance Webinar
DGI Compliance Webinar
 
Fast 360 assessment sample report
Fast 360 assessment sample reportFast 360 assessment sample report
Fast 360 assessment sample report
 
SplunkLive! Austin Customer Presentation - Xerox
SplunkLive! Austin Customer Presentation - XeroxSplunkLive! Austin Customer Presentation - Xerox
SplunkLive! Austin Customer Presentation - Xerox
 
Im 2021 tutorial next-generation closed-loop automation - an inside view - ...
Im 2021 tutorial   next-generation closed-loop automation - an inside view - ...Im 2021 tutorial   next-generation closed-loop automation - an inside view - ...
Im 2021 tutorial next-generation closed-loop automation - an inside view - ...
 

En vedette

Operational Intelligence at Eadington Conference
Operational Intelligence at Eadington ConferenceOperational Intelligence at Eadington Conference
Operational Intelligence at Eadington ConferenceDavid Paster
 
SplunkLive Auckland - Operational Intelligence
SplunkLive Auckland - Operational IntelligenceSplunkLive Auckland - Operational Intelligence
SplunkLive Auckland - Operational IntelligenceSplunk
 
Mainframe Optimization in 2017
Mainframe Optimization in 2017Mainframe Optimization in 2017
Mainframe Optimization in 2017Precisely
 
State of the Mainframe for 2017
State of the Mainframe for 2017State of the Mainframe for 2017
State of the Mainframe for 2017Precisely
 
Unlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data LakeUnlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data LakeMongoDB
 
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo ClinicBig Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo ClinicDataWorks Summit
 
Evolution of Big Data at Intel - Crawl, Walk and Run Approach
Evolution of Big Data at Intel - Crawl, Walk and Run ApproachEvolution of Big Data at Intel - Crawl, Walk and Run Approach
Evolution of Big Data at Intel - Crawl, Walk and Run ApproachDataWorks Summit
 
Apache Spark: The Analytics Operating System
Apache Spark: The Analytics Operating SystemApache Spark: The Analytics Operating System
Apache Spark: The Analytics Operating SystemAdarsh Pannu
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Helena Edelson
 

En vedette (10)

Operational Intelligence at Eadington Conference
Operational Intelligence at Eadington ConferenceOperational Intelligence at Eadington Conference
Operational Intelligence at Eadington Conference
 
Operational Intelligence
Operational IntelligenceOperational Intelligence
Operational Intelligence
 
SplunkLive Auckland - Operational Intelligence
SplunkLive Auckland - Operational IntelligenceSplunkLive Auckland - Operational Intelligence
SplunkLive Auckland - Operational Intelligence
 
Mainframe Optimization in 2017
Mainframe Optimization in 2017Mainframe Optimization in 2017
Mainframe Optimization in 2017
 
State of the Mainframe for 2017
State of the Mainframe for 2017State of the Mainframe for 2017
State of the Mainframe for 2017
 
Unlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data LakeUnlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data Lake
 
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo ClinicBig Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
 
Evolution of Big Data at Intel - Crawl, Walk and Run Approach
Evolution of Big Data at Intel - Crawl, Walk and Run ApproachEvolution of Big Data at Intel - Crawl, Walk and Run Approach
Evolution of Big Data at Intel - Crawl, Walk and Run Approach
 
Apache Spark: The Analytics Operating System
Apache Spark: The Analytics Operating SystemApache Spark: The Analytics Operating System
Apache Spark: The Analytics Operating System
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
 

Similaire à Operational Intelligence Using Hadoop

Making Hadoop Realtime by Dr. William Bain of Scaleout Software
Making Hadoop Realtime by Dr. William Bain of Scaleout SoftwareMaking Hadoop Realtime by Dr. William Bain of Scaleout Software
Making Hadoop Realtime by Dr. William Bain of Scaleout SoftwareData Con LA
 
November 2013 HUG: Real-time analytics with in-memory grid
November 2013 HUG: Real-time analytics with in-memory gridNovember 2013 HUG: Real-time analytics with in-memory grid
November 2013 HUG: Real-time analytics with in-memory gridYahoo Developer Network
 
Real-time analysis using an in-memory data grid - Cloud Expo 2013
Real-time analysis using an in-memory data grid - Cloud Expo 2013Real-time analysis using an in-memory data grid - Cloud Expo 2013
Real-time analysis using an in-memory data grid - Cloud Expo 2013ScaleOut Software
 
New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S...
 New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S... New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S...
New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S...Big Data Spain
 
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...In-Memory Computing Summit
 
Webinar: Enterprise Trends for Database-as-a-Service
Webinar: Enterprise Trends for Database-as-a-ServiceWebinar: Enterprise Trends for Database-as-a-Service
Webinar: Enterprise Trends for Database-as-a-ServiceMongoDB
 
IMCSummit 2015 - Day 1 Developer Track - In-memory Computing for Iterative CP...
IMCSummit 2015 - Day 1 Developer Track - In-memory Computing for Iterative CP...IMCSummit 2015 - Day 1 Developer Track - In-memory Computing for Iterative CP...
IMCSummit 2015 - Day 1 Developer Track - In-memory Computing for Iterative CP...In-Memory Computing Summit
 
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflowsCloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflowsYong Feng
 
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...AboutYouGmbH
 
Bigdata and Hadoop with Docker
Bigdata and Hadoop with DockerBigdata and Hadoop with Docker
Bigdata and Hadoop with Dockerharidasnss
 
InfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experienceInfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experienceWilfried Hoge
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingBikas Saha
 
In memory computing principles by Mac Moore of GridGain
In memory computing principles by Mac Moore of GridGainIn memory computing principles by Mac Moore of GridGain
In memory computing principles by Mac Moore of GridGainData Con LA
 
Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?Precisely
 
Apache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data ProcessingApache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data Processinghitesh1892
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query ProcessingApache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query ProcessingHortonworks
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingDataWorks Summit
 
Simplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduSimplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduCloudera, Inc.
 
Designing your SaaS Database for Scale with Postgres
Designing your SaaS Database for Scale with PostgresDesigning your SaaS Database for Scale with Postgres
Designing your SaaS Database for Scale with PostgresOzgun Erdogan
 

Similaire à Operational Intelligence Using Hadoop (20)

Making Hadoop Realtime by Dr. William Bain of Scaleout Software
Making Hadoop Realtime by Dr. William Bain of Scaleout SoftwareMaking Hadoop Realtime by Dr. William Bain of Scaleout Software
Making Hadoop Realtime by Dr. William Bain of Scaleout Software
 
November 2013 HUG: Real-time analytics with in-memory grid
November 2013 HUG: Real-time analytics with in-memory gridNovember 2013 HUG: Real-time analytics with in-memory grid
November 2013 HUG: Real-time analytics with in-memory grid
 
Real-time analysis using an in-memory data grid - Cloud Expo 2013
Real-time analysis using an in-memory data grid - Cloud Expo 2013Real-time analysis using an in-memory data grid - Cloud Expo 2013
Real-time analysis using an in-memory data grid - Cloud Expo 2013
 
New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S...
 New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S... New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S...
New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S...
 
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...
 
Webinar: Enterprise Trends for Database-as-a-Service
Webinar: Enterprise Trends for Database-as-a-ServiceWebinar: Enterprise Trends for Database-as-a-Service
Webinar: Enterprise Trends for Database-as-a-Service
 
IMCSummit 2015 - Day 1 Developer Track - In-memory Computing for Iterative CP...
IMCSummit 2015 - Day 1 Developer Track - In-memory Computing for Iterative CP...IMCSummit 2015 - Day 1 Developer Track - In-memory Computing for Iterative CP...
IMCSummit 2015 - Day 1 Developer Track - In-memory Computing for Iterative CP...
 
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflowsCloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflows
 
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
 
Bigdata and Hadoop with Docker
Bigdata and Hadoop with DockerBigdata and Hadoop with Docker
Bigdata and Hadoop with Docker
 
InfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experienceInfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experience
 
Ibm db2update2019 icp4 data
Ibm db2update2019   icp4 dataIbm db2update2019   icp4 data
Ibm db2update2019 icp4 data
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query Processing
 
In memory computing principles by Mac Moore of GridGain
In memory computing principles by Mac Moore of GridGainIn memory computing principles by Mac Moore of GridGain
In memory computing principles by Mac Moore of GridGain
 
Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?
 
Apache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data ProcessingApache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data Processing
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query ProcessingApache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 
Simplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduSimplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache Kudu
 
Designing your SaaS Database for Scale with Postgres
Designing your SaaS Database for Scale with PostgresDesigning your SaaS Database for Scale with Postgres
Designing your SaaS Database for Scale with Postgres
 

Plus de DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Plus de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Dernier

Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxGDSC PJATK
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?IES VE
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxUdaiappa Ramachandran
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDELiveplex
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding TeamAdam Moalla
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfJamie (Taka) Wang
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Adtran
 

Dernier (20)

Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptx
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
201610817 - edge part1
201610817 - edge part1201610817 - edge part1
201610817 - edge part1
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
20150722 - AGV
20150722 - AGV20150722 - AGV
20150722 - AGV
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 

Operational Intelligence Using Hadoop

  • 1. Enabling Operational Intelligence Using Hadoop MapReduce Copyright © 2014 by ScaleOut Software, Inc. Hadoop Summit June 3-5, 2014 Bill Bain, CEO (wbain@scaleoutsoftware.com)
  • 2. 2 ScaleOut Software, Inc. • The Need for Operational Intelligence (OI) • Operational Intelligence vs. Business Intelligence • Implementing OI Using In-Memory Computing: • In-Memory Data Grid • Data-Parallel Computation: “Parallel Method Invocation” • Implementing MapReduce Unchanged on an IMDG • A Detailed Example in Financial Services • Video Demo • Examples of Applications in Operational Intelligence Agenda
  • 3. 3 ScaleOut Software, Inc. Goal: Provide immediate feedback to a system handling live data. A few examples: • Equity trading: to minimize risk during a trading day • Ecommerce: to optimize real-time shopping activity • Reservations systems: to identify issues, reroute, etc. • Credit cards & wire transfers: to detect fraud in real time • Smart grids: to optimize power distribution & detect issues Online Systems Need Operational Intelligence
  • 4. 4 ScaleOut Software, Inc. • To keep up with fast growing “live” workloads & maintain fast response times: • Ex.: Handle incoming data streams in real time. • Ex. Process updates to data set based on incoming data. • To identify and respond to trends in fast-changing data: • Ex. Evaluate data set changes in real time. • Ex. Respond to identified patterns within seconds. Challenges for Operational Intelligence 0 50 100 150 200 250 300 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 Millions Growth in Web Servers Source: Netcraft 0 500 1000 1500 2000 2500 3000 3500 4000 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 Exebytes Growth in “Big Data” “More data has been created in the past three years than in the past 40,000.”
  • 5. 5 ScaleOut Software, Inc. Big Data Analytics Real-Time vs. Batch Analytics Static data sets Petabytes Disk storage Hours to minutes Best uses: • Analyzing warehoused data • Mining for long- term trends Live data sets Gigabytes to terabytes In-memory storage Minutes to seconds Best uses: • Tracking live data • Immediately identifying trends and capturing opportunities • Providing immediate feedback IMDGs Spark Storm CEP Hadoop IBM Teradata Oracle SAP Real-Time Batch Real-time “Operational Intelligence” Batch “Business Intelligence”
  • 6. 6 ScaleOut Software, Inc. • Traditional Hadoop MapReduce platforms analyze offline data: • Very large, disk-based datasets • Data repeatedly copied from disk to memory. • Batch-scheduled (multi-tenant) • IMDGs store and analyze live data: • Fast-changing, operational data integrated with live updates • Data kept memory-resident (data motion is minimized) • Inline-scheduled (single tenant) Design Goals for Hadoop vs. IMDGs
  • 7. 7 ScaleOut Software, Inc. • Operational intelligence can co-exist with business intelligence: • Processes streaming data close to its sources. • Provides real-time, “tactical” feedback (e.g., recommendations, alerts). • Translates data for storage in the data warehouse (ETL). • Data warehouse provides “strategic” guidance. • Using the same tool set (i.e., Hadoop MapReduce) lowers TCO: • Leverages common skill set. • Simplifies design (e.g., loading data into HDFS). Integrated View of Analytics
  • 8. 8 ScaleOut Software, Inc. • In-memory data grid (IMDG) holds active entities undergoing state changes in memory. • IMDG updates entities with incoming stream of state changes. • Backing store optionally holds large population of entities. • Analytics engine examines entities in real time and generates alerts within seconds as needed. In-Memory Architecture for Operational Intelligence
  • 9. 9 ScaleOut Software, Inc. In-Memory Data Grid (IMDG) stores “live” data in a cluster: • Fits in the business logic layer: • Follows object-oriented view of data (vs. relational view). • Stores unstructured collections of Java/.NET objects. • Uses create/read/update/delete and query APIs to access data. • Implemented across a cluster of servers or VMs: • Scales storage and throughput by adding servers. • Provides high availability in case a server fails. In-Memory Data Grid for Live Data
  • 10. 10 ScaleOut Software, Inc. • IMDG’s collections of objects behave like in-memory collections: • Unstructured, typically instances of a class (stored as serialized blobs) • Individually accessible / update-able • IMDG adds attributes: • Accessible by global key • Query-able by properties • Highly available • Optional timeouts • Distributed locking • Integration with a backing store • Optional dependency relationships • Asynchronous event handling IMDG Stores “Live” Data Basic “CRUD” APIs: • Create(key, obj, tout) • Read(key) • Update(key, obj) • Delete(key) and… • Lock(key) • Unlock(key) Object key
  • 11. 11 ScaleOut Software, Inc. IMDG Analyzes Live Data • Integrated execution engine: “Parallel Method Invocation” (PMI) • Object-oriented version of HPC data-parallel computing model • Serves as a platform for implementing MapReduce and other data-parallel operators. • Runs user-defined methods in parallel across the cluster. • Globally merges results. • Benefits: • Simple, well understood model • Fast startup time • Fast global barrier • Minimum data motion • Automatic code shipping Analyze Data (Eval) Combine Results (Merge)
  • 12. 12 ScaleOut Software, Inc. PMI Enables Linear Speedup Avoids data motion (network or disk I/O) which limits throughput:
  • 13. 13 ScaleOut Software, Inc. Spark / Spark Streaming from U.C. Berkeley amplab: • In-memory computing to accelerate and extend Hadoop MapReduce using data-parallel operators in Scala. • Stores data as “resilient distributed datasets” (RDDs): • Distributed across cluster • Immutable • Hold data from/output to HDFS. • Store data stream as a sequence of RDDs. • Comparison to IMDG: • Not designed for “live” data: • Lacks CRUD on individual objects. • Lacks high availability. • Designed for “data parallel” transformations. Comparison: IMDGs to Spark
  • 14. 14 ScaleOut Software, Inc. Run MapReduce as two PMI phases: • Data can be input from either the IMDG or an external data source. • Works with any input/output format compatible with the Apache distribution. • IMDG uses its data-parallel execution engine (PMI) to invoke the mappers and the reducers. • Eliminates batch scheduling overhead. • Intermediate results are stored within the IMDG. • Minimizes data motion between the mappers and reducers. • Allows optional sorting. • Output of a single reducer/combiner optionally can be globally merged. Implementing MapReduce on IMDG
  • 15. 15 ScaleOut Software, Inc. // This job will run using the Hadoop // job tracker: public static void main(String[] args) throws Exception { Configuration conf = new Configuration(); Job job = new Job(conf, "wordcount"); job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); job.setMapperClass(Map.class); job.setReducerClass(Reduce.class); job.setInputFormatClass( TextInputFormat.class); job.setOutputFormatClass( TextOutputFormat.class); FileInputFormat.addInputPath( job, new Path(args[0])); FileOutputFormat.setOutputPath( job, new Path(args[1])); job.waitForCompletion(true); } // This job will run using ScaleOut hServer: public static void main(String[] args) throws Exception { Configuration conf = new Configuration(); Job job = new HServerJob(conf, "wordcount"); job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); job.setMapperClass(Map.class); job.setReducerClass(Reduce.class); job.setInputFormatClass( TextInputFormat.class); job.setOutputFormatClass( TextOutputFormat.class); FileInputFormat.addInputPath( job, new Path(args[0])); FileOutputFormat.setOutputPath( job, new Path(args[1])); job.waitForCompletion(true); } Configuring Application for the IMDG • Without YARN, just subclass the Hadoop Job class with a one-line change:
  • 16. 16 ScaleOut Software, Inc. Running Under YARN • With YARN, just replace the MapReduce execution framework: • Example of running MapReduce on IMDG using Hortonworks YARN: • YARN directs jobs to IMDG. • IMDG accelerates execution. $ hadoop jar hadoop-mapreduce-examples.jar wordcount -Dmapreduce.framework.name=hserver-yarn in out
  • 17. 17 ScaleOut Software, Inc. • With YARN, IMDG can run Apache or other Hive distribution unchanged. • Accelerates queries for datasets hosted in HDFS or the IMDG. • Limitation: Intermediate data must fit within the IMDG. • Implementation note: • Hive not thread-safe • Requires multiple JVMs per server for one Hive query • Currently seeing 3X speedup (tuning in progress) • More optimizations possible, but… • Limited by “unchanged” approach Using YARN to Run Hive on IMDG
  • 18. 18 ScaleOut Software, Inc. • A Hadoop distribution does not have to be installed unless HDFS is used. • The developer starts MapReduce applications from a remote workstation. • The IMDG automatically builds a reusable “invocation grid” of JVMs on the grid’s servers for PMI and ships the application’s jars. • Results are stored in the IMDG, HDFS, or optionally globally merged and returned to the remote workstation. Running MapReduce on an IMDG
  • 19. 19 ScaleOut Software, Inc. The invocation grid can be re-used across MapReduce jobs: Accelerating Start-Up Times //Configure and load the invocation grid InvocationGrid grid = HServerJob.getInvocationGridBuilder("myGrid"). // Add JAR files as IG dependencies addJar("main-job.jar"). addJar(“mylib.jar"). // Add classes as IG dependencies addClass(MyMap.class). addClass(MyRed.class). // Define custom JVM parameters setJVMParameters("-Xms512M -Xmx1024M"). load(); //Run 10 jobs on the same invocation grid for(int i=0; i<10; i++) { Configuration conf = new Configuration(); //The preloaded invocation grid is passed to the job. Job job = new HServerJob(conf, "Job number "+i, false, grid); //......Configure the job here......... //Run the job job.waitForCompletion(true); } //Unload the invocation grid when we are done grid.unload();
  • 20. 20 ScaleOut Software, Inc. • IMDG adds grid input format for accessing key/value pairs held in the IMDG. • MapReduce programs optionally can output results to IMDG with grid output format. • Grid Record Reader optimizes access to key/value pairs to eliminate network overhead. • Applications can access and update key/value pairs as operational data during analysis. Accessing In-Memory Data
  • 21. 21 ScaleOut Software, Inc. IMDG needs multiple in-memory storage models: • Named cache, optimized for rich semantics on large objects: • Property-based query • Distributed locking • Access from remote grids • Named map, optimized for efficient storage and bulk analysis (e.g., MapReduce): • Highly efficient object storage • Pipelined, bulk-access mechanisms • Follows Java Named Map semantics. Optimizing In-Memory Storage for M/R
  • 22. 22 ScaleOut Software, Inc. In-Memory Named Map: • Stores key/value pairs in chunks. • Allows CRUD operations on kvps. • Automatically organizes chunks into splits. • Uses per-split hash table to access keys and manage multi-valued keys. • Stores shuffled data set between mappers and reducers. • Pipelines chunks to mappers and from reducers. • Optionally uses memory mapped files to reduce access latency. • Provides support for sorting keys. Named Map Optimizations
  • 23. 23 ScaleOut Software, Inc. • IMDG adds Dataset Record Reader (wrapper) to cache HDFS data during program execution. • Hadoop automatically retrieves data from IMDG on subsequent runs. • Dataset Record Reader stores and retrieves data with minimum network and memory overheads. • Tests with Terasort benchmark have demonstrated 11X faster access latency over HDFS without IMDG. Optional Caching of HDFS Data
  • 24. 24 ScaleOut Software, Inc. • IMDG caches “chunks” of key/value pairs instead of HDFS records. • Serves key/value pairs directly to mappers on cache access. • Avoids overhead of reparsing records. Details of HDFS Caching “Record” Phase “Playback” Phase
  • 25. 25 ScaleOut Software, Inc. • Measured performance: • Startup times reduced to a few milliseconds • Word count benchmark shows 20X speedup. • Real-world example shows >40X speedup. • MapReduce optimizations: • Optional sorting • Optional multicast of parameters to mappers • Optional O(logN) global combining (avoids single reducer) • Optional HDFS caching • Optional reuse of JVMs across jobs • Current limitations: • No specific security for multi-tenancy • Intermediate data must fit in the IMDG Performance & Optimizations
  • 26. 26 ScaleOut Software, Inc. In-Memory MapReduce: • Enables use of Hadoop MapReduce for operational intelligence. • Accelerates data access by holding data in memory. • Analyzes and updates “live” data. • Reduces overheads of standard Hadoop distributions: • Batch scheduling • Disk access • Data shuffling • Mandatory key sorting • Avoids vendor-specific APIs: • Leverages Hadoop skill sets. Summary of Benefits
  • 27. 27 ScaleOut Software, Inc. Integrate analysis into a stock trading platform: • The IMDG holds market data and hedging strategies. • Updates to market data continuously flow through the IMDG. • The IMDG performs repeated data-parallel analysis on hedging strategies and alerts traders in real time. • IMDG automatically and dynamically scales its throughput to handle new hedging strategies by adding servers. • Measured >40X speedup over Apache 1.2. Example in Financial Services
  • 28. 28 ScaleOut Software, Inc. The Challenge: Quickly evaluate and respond to sub-second market changes: • Hedge fund tracks a set of hedging strategies: • Strategies can cover various market sectors, such as high-tech, automotive, energy, consumer, real estate, etc. • Each strategy contains list of holdings and rules for managing the holdings (such as target allocations). • Updates to market data continuously arrive during the trading day. • Challenge: The hedge fund must be able to quickly update and analyze its hedging strategies and provide alerts to traders. Demo of the Finserv Application
  • 29. 29 ScaleOut Software, Inc. • Delivers a stream of alerts to traders within a few seconds. • Enables the trader to examine strategy details in real time: Output: Real-Time Alerts
  • 30. 30 ScaleOut Software, Inc. • Video Link Video
  • 31. 31 ScaleOut Software, Inc. • Measured a similar financial services application (back testing stock trading strategies on stock histories) • Hosted IMDG in Amazon EC2 using 75 servers holding 1 TB of stock history data in memory • IMDG handled a continuous stream of updates (1.1 GB/s) • Results: analyzed 1 TB in 4.1 seconds (250 GB/s) with linear scaling Example of Performance Scaling
  • 32. 32 ScaleOut Software, Inc. Fast map/reduce reconciles inventory and order systems for an online retailer: • Challenge: Inventory and online order management are handled by different applications. • Reconciled once per day. • Inaccurate orders reduces margins. • Solution: • Host SKUs in IMDG updated in real time by order & inventory systems. • Use MapReduce to reconcile in two minutes. • Results: Real-time reconciliation ensures accurate orders. Example in Ecommerce: Inventory Management
  • 33. 33 ScaleOut Software, Inc. • IMDG holds customer information for active Web users. • IMDG saves/retrieves customer information from backing store. • Web browsers send activity information to analytics engine. • IMDG updates customer history and preferences. • Analytics engine identifies browsing and buying patterns. • Analytics engine makes suggestions in real-time. Also sends email follow-ups. Example: Web Shopping
  • 34. 34 ScaleOut Software, Inc. • Track connectivity issues. • Obtain time- sensitive business data. • Offer enhanced services. • Increase security. Example: Telecommunications Optimize Operations Customer Experience Historical queries for real-time data enrichment Stream persistence for future analysis Network Elements
  • 35. 35 ScaleOut Software, Inc. • Online systems need operational intelligence on “live” data for immediate feedback. • Operational intelligence can be implemented using Hadoop MapReduce unchanged. • In-memory data grid provides an excellent platform for MR- based operational intelligence: • Hosts and updates “live” data. • Implements high availability. • Offers fast MapReduce execution for immediate results. • Leverages Hadoop skill sets. Recap
  • 37. 37 ScaleOut Software, Inc. • Storm implements pipelined, task-parallel execution by “bolts” on incoming data streams. • Streams can be distributed to bolts with configurable mappings. • Developer controls the number of tasks per bolt. • Storm uses a centralized master node and Zookeeper for fault- tolerance. • Key strength: continuous processing of input streams • Issues: • Complexity / tuning • Minimizing data motion • Managing global state Comparison to Storm
  • 38. 38 ScaleOut Software, Inc. • Create method to analyze a queried stock object and another method to pair-wise merge the results: Java Example: Parallel Method Invocation public class StockAnalysis implements Invokable<Stock, StockCalcParams, Double> { public Double eval(Stock stock, StockCalcParams param) throws InvokeException { return stock.getPrice() * stock.getTotalShares(); } public Double merge(Double first, Double second) throws InvokeException { return first + second; } }
  • 39. 39 ScaleOut Software, Inc. • Run a parallel method invocation on a queried set of objects: Java Example: Parallel Method Invocation NamedCache cache = CacheFactory.getCache("Stocks"); InvokeResult valueOfSelectedStocks = cache.invoke( StockAnalysis.class, Stock.class, or(equal("ticker", "GOOG"), equal("ticker", "ORCL")), new StockCalcParams()); System.out.println("The value of selected stocks is" + valueOfSelectedStocks.getResult());
  • 40. 40 ScaleOut Software, Inc. • IMDG ships user’s code and libraries to its servers. • IMDG automatically schedules analysis operations across all grid servers and cores. • The analysis runs on all objects selected by the parallel query. • Each grid server analyzes its locally stored objects to minimize data motion. • Parallel execution ensures fast completion time: • IMDG automatically distributes workload across servers/cores. • Scaling the IMDG automatically handles larger data sets. PMI: Running the Analysis
  • 41. 41 ScaleOut Software, Inc. • The IMDG automatically merges all analysis results. • The IMDG first merges all results within each grid server in parallel. • It then merges results across all grid servers to create one combined result. • Efficient parallel merge minimizes the delay in combining all results. • The IMDG delivers the combined result to the invoking application as one object. PMI: Merging the Results