© 2014 MapR Technologies
Who am I?
Ted Dunning, Chief Applications Architect, MapR Technologies
Email tdunning@mapr.com tdunning@apache.org
Twitter @Ted_Dunning
e-book available courtesy of MapR
http://bit.ly/1jQ9QuL
A New Look at Anomaly Detection
by Ted Dunning and Ellen Friedman © June 2014 (published by O’Reilly)
Agenda
• The Internet is turning upside down
• A story
• The last (mile) shall be first
• Time series on NoSQL
• Faster time series on NoSQL
• Summary
How the Internet Works
• Big content servers feed data across the backbone to
• Regional caches and servers feed data across neighborhood
transport to
• The “last mile”
• Bits are nearly conserved, $ are concentrated centrally
– But total $ mass at the edge is much higher
How The Internet Works
[Diagram: server, caches, gateways, switches, firewalls, and clients c1, c2]
Conservation of Bits Decreases Bandwidth
[Diagram: server, caches, gateways, switches, firewalls, and clients c1, c2]
Total Investment Dominated by Last Mile
[Diagram: server, caches, gateways, switches, firewalls, and clients c1, c2]
The Rub
• What's the problem?
– Speed (end-to-end latency, backbone bandwidth)
– Feasibility (cost for consumer links)
– Caching
• What do we need?
– Cheap last-mile hardware
– Good caches
First:
An apology for going
off-script
Now, the story
By the 1840s, the NY-SF
sailing time was down to
130-180 days
In 1851, the record was
set at 89 days by the
Flying Cloud
The difference was due
(in part) to big data
and a primitive kind of
time-series database
These charts were free …
If you donated your data
But how does this apply
today?
What has changed?
Where will it lead?
Things
Emitting data
How The Internet Works
[Diagram: server, caches, gateways, switches, firewalls, and clients c1, c2]
How the Internet is Going to Work
[Diagram: server, caches, gateways, switches, controllers, and nodes m1 to m6]
Where Will The $ Go?
[Diagram: server, caches, gateways, switches, controllers, and nodes m1 to m6]
Sensors
Controllers
The Problems
• Sensors and controllers have little processing or space
– SIM cards = 20 MHz processor, 128 kb space = 16 kB
– Arduino mini = 15kB RAM (more EPROM)
– BeagleBone/Raspberry Pi = 500 MB RAM
• Sensors and controllers have little power
– Very common to power down 99% of the time
• Sensors and controllers often have very low bandwidth
– Mesh networks with base rates << 1 Mb/s
– Power line networking
– Intermittent 3G/4G/LTE connectivity
What Do We Need to Do With a Time Series
• Acquire
– Measurement, transmission, reception
– Mostly not our problem
• Store
– We own this
• Retrieve
– We have to allow this
• Analyze and visualize
– We facilitate this via retrieval
Retrieval Requirements
• Retrieve by time-series, time range, tags
– Possibly pull millions of data points at a time
– Possibly do on-the-fly windowed aggregations
• Search by unstructured data
– Typically requires time-windowed faceting after search
– Also need to dive in with first kind of retrieval
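
As a rough illustration of the first kind of retrieval, the sketch below selects one series by time range and computes windowed means on the fly. The function name `windowed_mean`, the in-memory list of `(timestamp, value)` pairs, and the 60-second window are assumptions made for this example; a real store would scan a table range instead.

```python
from collections import defaultdict

def windowed_mean(points, start, end, window_s=60):
    """Average (timestamp, value) points in [start, end) per fixed window.

    Returns {window_start: mean}; windows with no points are omitted.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in points:
        if start <= ts < end:
            bucket = ts - (ts - start) % window_s  # align windows to `start`
            sums[bucket] += value
            counts[bucket] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}

# Hypothetical data: five points, the last one falls outside the range.
points = [(0, 1.0), (30, 3.0), (60, 10.0), (119, 20.0), (120, 7.0)]
result = windowed_mean(points, start=0, end=120)
print(result)
```

Aggregating while scanning keeps the reply small even when the underlying range holds millions of points.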
Storage choices and trade-offs
• Flat files
– Great for rapid ingest with massive data
– Handles essentially any data type
– Less good for data requiring frequent updates
– Harder to find specific ranges
• Traditional relational db
– Ingests up to tens of thousands of rows/s; prefers well-structured (numerical) data; expensive
• Non-relational db: Tables (such as MapR tables in M7 or HBase)
– Ingests up to 100,000 rows/sec
– Handles wide variety of data
– Good for frequent updates
– Easily scanned in a range
Specific Example
• Consider a server farm
• Lots of system metrics
• Typically 100-300 stats / 30 s
• Loads, RPCs, packets, requests/s
• Common to have 100 – 10,000 machines
The General Outline
• 10 samples / second / machine
x 1,000 machines
= 10,000 samples / second
• This is what OpenTSDB was designed to handle
• Install and go, but don’t test at scale
Specific Example
• Consider oil drilling rigs
• When drilling wells, there are *lots* of moving parts
• Typically a drilling rig makes about 10K samples/s
• Temperatures, pressures, magnetics,
machine vibration levels, salinity, voltage,
currents, many others
• Typical project has 100 rigs
The General Outline
• 10K samples / second / rig
x 100 rigs
= 1M samples / second
• But wait, there’s more
– Suppose you want to test your system
– Perhaps with a year of data
– And you want to load that data in << 1 year
• 100x real-time = 100M samples / second
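
The arithmetic on this slide can be checked with a few lines. The constants come from the slides; the variable names and the 365-day year are assumptions for illustration only.

```python
SAMPLES_PER_RIG_PER_S = 10_000   # from the slide: about 10K samples/s per rig
RIGS = 100                       # typical project size from the slide
SPEEDUP = 100                    # replay history at 100x real time

live_rate = SAMPLES_PER_RIG_PER_S * RIGS   # samples/s arriving live
replay_rate = live_rate * SPEEDUP          # samples/s needed for the test load

SECONDS_PER_YEAR = 365 * 24 * 3600         # assumes a 365-day year
samples_per_year = live_rate * SECONDS_PER_YEAR
replay_days = samples_per_year / replay_rate / 86_400

print(f"live: {live_rate:,} samples/s, replay: {replay_rate:,} samples/s")
print(f"one year is {samples_per_year:.2e} samples, loaded in {replay_days:.2f} days")
```

At 100x real time, a year of history loads in about 3.65 days, which is the sense in which the load finishes in much less than a year.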
How Should That Work?
[Diagram components: samples, message queue, collector, MapR table, web service, users]
Example Time Series
...
1409497082 327810227706 mysql.bytes_received schema=foo host=db1
1409497099 6604859181710 mysql.bytes_sent schema=foo host=db1
1409497106 327812421706 mysql.bytes_received schema=foo host=db1
1409497113 6604901075387 mysql.bytes_sent schema=foo host=db
...
UNIX epoch timestamp: $(date +%s)
a metric (often hierarchical)
two tags
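
A minimal sketch of parsing one such line, assuming the column order shown in the listing above (timestamp, value, metric, then tags). The `Sample` type and `parse_sample` function are hypothetical names for this example and are not part of OpenTSDB.

```python
from typing import NamedTuple

class Sample(NamedTuple):
    timestamp: int      # UNIX epoch seconds
    value: int
    metric: str         # often hierarchical, e.g. "mysql.bytes_sent"
    tags: dict          # tag name -> tag value

def parse_sample(line: str) -> Sample:
    """Parse one line in the column order shown in the listing above."""
    timestamp, value, metric, *tag_fields = line.split()
    tags = dict(field.split("=", 1) for field in tag_fields)
    return Sample(int(timestamp), int(value), metric, tags)

sample = parse_sample(
    "1409497082 327810227706 mysql.bytes_received schema=foo host=db1"
)
print(sample.metric, sample.tags)
```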
The Whole Picture
[Diagram: storage layer is HBase or MapR-DB]
Wide Table Design: Point-by-Point
Wide Table Design: Hybrid Point-by-Point + Blob
Insertion of data as blob makes original columns redundant
Non-relational, but you can query these tables with Drill
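
The slide diagrams are not reproduced here, so the following is only one plausible reading of the two designs, sketched with a plain dictionary standing in for the table. The one-hour row width, the key layout, the JSON blob encoding, and all function names are assumptions for illustration.

```python
import json

WINDOW_S = 3600  # assumed row width: one hour of samples per row

def row_key(series_id: str, ts: int) -> tuple:
    """Row key = series plus the start of the window containing ts."""
    return (series_id, ts - ts % WINDOW_S)

def insert_point(table: dict, series_id: str, ts: int, value) -> None:
    """Point-by-point: one column per sample, keyed by offset in the window."""
    row = table.setdefault(row_key(series_id, ts), {})
    row[ts % WINDOW_S] = value

def compact_row(table: dict, key: tuple) -> None:
    """Hybrid: pack all point columns into one blob column, then drop them."""
    row = table[key]
    points = json.loads(row.get("blob", "[]"))  # keep any earlier blob
    points += [[off, v] for off, v in row.items() if off != "blob"]
    table[key] = {"blob": json.dumps(sorted(points))}

table = {}
for ts, value in [(7200, 1), (7201, 2), (7260, 3)]:
    insert_point(table, "db1.bytes_sent", ts, value)
compact_row(table, ("db1.bytes_sent", 7200))
print(table)
```

After compaction the row holds a single blob column, which is why the original per-point columns become redundant.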
Status to This Point
• Each sample requires one insertion, compaction requires
another
• Typical performance on SE cluster
– 1 edge node + 4 cluster nodes
– 20,000 samples per second observed
– Would be faster on performance cluster, possibly not a lot
• Suitable for server monitoring
• Not suitable for large scale history ingestion
• Bulk load helps a little, but not much
• Still 1000x too slow for industrial work
Speeding up OpenTSDB
20,000 data points per second per node in the test cluster
Why can’t it be faster?
Speeding up OpenTSDB: open source MapR extensions
Available on GitHub: https://github.com/mapr-demos/opentsdb
Status to This Point
• 3600 samples require one insertion
• Typical results on SE cluster
– 1 edge node + 4 cluster nodes
– 14 million samples per second observed
– ~700x faster ingestion
• Typical results on performance cluster
– 2-4 edge nodes + 4-9 cluster nodes
– 110 million samples/s (4 nodes) to >200 million samples/s (8 nodes)
• Suitable for large scale history ingestion
• 30 million data points retrieved in 20s
• Ready for industrial work
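
The idea that 3600 samples need only one insertion can be sketched as a buffering collector. This is not the code of the MapR extensions; the class name `BatchingCollector`, the list standing in for the table, and the series name are hypothetical.

```python
class BatchingCollector:
    """Buffer samples per series and write each full batch as one insertion."""

    def __init__(self, batch_size=3600):
        self.batch_size = batch_size
        self.buffers = {}      # series_id -> list of (timestamp, value)
        self.insertions = 0    # number of writes issued to the table
        self.table = []        # stand-in for the real table

    def add(self, series_id, ts, value):
        buf = self.buffers.setdefault(series_id, [])
        buf.append((ts, value))
        if len(buf) >= self.batch_size:
            self.flush(series_id)

    def flush(self, series_id):
        buf = self.buffers.pop(series_id, [])
        if buf:
            self.table.append((series_id, buf))  # one write for the whole batch
            self.insertions += 1

collector = BatchingCollector()
for ts in range(7200):
    collector.add("rig7.pressure", ts, ts * 0.5)
print(collector.insertions)
```

Here 7200 samples produce two insertions instead of 7200. The slide's roughly 700x figure is consistent with its own numbers: 14 million / 20,000 = 700.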
Going Further
• OpenTSDB is substantially limited in many respects
– Millisecond resolution is a bit of a hack
– Data formats “just growed”, better design needed
– Internal code is difficult to modify safely
• Possible improvements
– Compress and batch at collectors
– Use advanced compression technology
– Interface with modern query systems (Apache Drill)
Compression example
Samples are
64-bit time, 16-bit sample
Sample rate of 10 kHz
Sample time jitter makes it
important to keep original
time-stamp
How much overhead to
retain time-stamp?
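
The slide's own answer is not reproduced here. As a hedged sketch of how the overhead could be estimated, the code below stores each timestamp as its deviation from the nominal 100 µs spacing, using a zigzag varint. The microsecond unit, the ±5 µs jitter model, and the encoding choice are all assumptions for illustration, not the method from the talk.

```python
import random

def zigzag(n: int) -> int:
    """Map signed ints to unsigned so small magnitudes stay small."""
    return 2 * n if n >= 0 else -2 * n - 1

def varint_len(n: int) -> int:
    """Bytes needed to store an unsigned int at 7 payload bits per byte."""
    length = 1
    while n >= 0x80:
        n >>= 7
        length += 1
    return length

def timestamp_bytes(timestamps_us, nominal_us=100):
    """Size of the first timestamp stored raw plus later ones as jitter deltas."""
    total = 8  # first timestamp kept as a full 64-bit value
    for prev, cur in zip(timestamps_us, timestamps_us[1:]):
        total += varint_len(zigzag(cur - prev - nominal_us))
    return total

random.seed(0)
n = 10_000  # one second of data at 10 kHz
# Assumed jitter model: each sample lands within +/-5 us of its nominal slot.
times = [i * 100 + random.randint(-5, 5) for i in range(n)]

raw = 8 * n
packed = timestamp_bytes(times)
print(f"raw timestamps: {raw} bytes, delta-encoded: {packed} bytes")
```

Under this jitter model every delta fits in one byte, so keeping the original timestamp costs about 1 byte per sample next to the 2-byte value, rather than 8.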
Key Results
• Ingestion is network limited
– Edge nodes are the critical resource
– Number of edge nodes defines a limit to scaling
• With enough edge nodes scaling is near perfect
• Performance of raw OpenTSDB is limited by stateless daemon
• Modified OpenTSDB can run 1000x faster
Overall Ingestion Rate
[Plot: total ingestion rate (millions of points/second) versus number of nodes (4, 5, 8, 9), with curves for one and two ingestors]
Normalized Ingestion Rate
[Plot: ingestion per node (millions of points/second) versus number of nodes (4, 5, 8, 9), with curves for one and two ingestors]
Why MapR?
• MapR tables are inherently faster, safer
– Sustained > 1GB/s ingest rate in tests
• Mirror to M5 or M7 cluster to isolate analytics load
• Transaction logs involve frequent appends, many files
When is this All Wrong?
• In some cases, retrieval by series-id + time range not sufficient
• May need very flexible retrieval of events based on text-like
criteria
• Search may be better than a classic time-series database
• Can scale Lucene-based search to > 1 million events / second
When is it Even More Right?
• In many industrial settings, data rates from individual sensors are
relatively high
– Latency to view is still measured in seconds, not sample points
• This allows batching at source
• Common requirement for highly variable sample rates
– 1 sample/s baseline, switching to 10k samples/s
– Small batches during slow times are just fine since number of sensors is
constant
– Requires variable window sizes
Summary
• The internet is turning upside down
• This will make time series ubiquitous
• Current open source systems are much too slow
• We can fix that with modern NoSQL systems
– (I wear a red hat for a reason)
Questions
Thank You
@mapr maprtech
tdunning@mapr.com
tdunning@apache.org
Ted Dunning, Chief Application Architect
MapR Technologies
maprtech
mapr-technologies

Contenu connexe

Tendances

Boston hug-2012-07
Boston hug-2012-07Boston hug-2012-07
Boston hug-2012-07Ted Dunning
 
NoSQL Application Development with JSON and MapR-DB
NoSQL Application Development with JSON and MapR-DBNoSQL Application Development with JSON and MapR-DB
NoSQL Application Development with JSON and MapR-DBMapR Technologies
 
Dealing with an Upside Down Internet With High Performance Time Series Database
Dealing with an Upside Down Internet  With High Performance Time Series DatabaseDealing with an Upside Down Internet  With High Performance Time Series Database
Dealing with an Upside Down Internet With High Performance Time Series DatabaseDataWorks Summit
 
Dchug m7-30 apr2013
Dchug m7-30 apr2013Dchug m7-30 apr2013
Dchug m7-30 apr2013jdfiori
 
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer ShiranThe Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer ShiranMapR Technologies
 
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013Modern Data Stack France
 
Hug france-2012-12-04
Hug france-2012-12-04Hug france-2012-12-04
Hug france-2012-12-04Ted Dunning
 
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Codemotion
 
Syncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreSyncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreModern Data Stack France
 
Getting Started with HBase
Getting Started with HBaseGetting Started with HBase
Getting Started with HBaseCarol McDonald
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemShivaji Dutta
 
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop EcosystemWhy Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop EcosystemCloudera, Inc.
 
Hadoop 2 - More than MapReduce
Hadoop 2 - More than MapReduceHadoop 2 - More than MapReduce
Hadoop 2 - More than MapReduceUwe Printz
 
MapR 5.2: Getting More Value from the MapR Converged Community Edition
MapR 5.2: Getting More Value from the MapR Converged Community EditionMapR 5.2: Getting More Value from the MapR Converged Community Edition
MapR 5.2: Getting More Value from the MapR Converged Community EditionMapR Technologies
 
Apache Spark streaming and HBase
Apache Spark streaming and HBaseApache Spark streaming and HBase
Apache Spark streaming and HBaseCarol McDonald
 

Tendances (20)

Apache Spark & Hadoop
Apache Spark & HadoopApache Spark & Hadoop
Apache Spark & Hadoop
 
Boston hug-2012-07
Boston hug-2012-07Boston hug-2012-07
Boston hug-2012-07
 
NoSQL Application Development with JSON and MapR-DB
NoSQL Application Development with JSON and MapR-DBNoSQL Application Development with JSON and MapR-DB
NoSQL Application Development with JSON and MapR-DB
 
Dealing with an Upside Down Internet With High Performance Time Series Database
Dealing with an Upside Down Internet  With High Performance Time Series DatabaseDealing with an Upside Down Internet  With High Performance Time Series Database
Dealing with an Upside Down Internet With High Performance Time Series Database
 
Dchug m7-30 apr2013
Dchug m7-30 apr2013Dchug m7-30 apr2013
Dchug m7-30 apr2013
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer ShiranThe Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
 
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
 
Hug france-2012-12-04
Hug france-2012-12-04Hug france-2012-12-04
Hug france-2012-12-04
 
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
 
Syncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreSyncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScore
 
Getting Started with HBase
Getting Started with HBaseGetting Started with HBase
Getting Started with HBase
 
10c introduction
10c introduction10c introduction
10c introduction
 
Big Data Journey
Big Data JourneyBig Data Journey
Big Data Journey
 
February 2014 HUG : Pig On Tez
February 2014 HUG : Pig On TezFebruary 2014 HUG : Pig On Tez
February 2014 HUG : Pig On Tez
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystem
 
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop EcosystemWhy Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
 
Hadoop 2 - More than MapReduce
Hadoop 2 - More than MapReduceHadoop 2 - More than MapReduce
Hadoop 2 - More than MapReduce
 
MapR 5.2: Getting More Value from the MapR Converged Community Edition
MapR 5.2: Getting More Value from the MapR Converged Community EditionMapR 5.2: Getting More Value from the MapR Converged Community Edition
MapR 5.2: Getting More Value from the MapR Converged Community Edition
 
Apache Spark streaming and HBase
Apache Spark streaming and HBaseApache Spark streaming and HBase
Apache Spark streaming and HBase
 

En vedette

Hadoop User Group - Status Apache Drill
Hadoop User Group - Status Apache DrillHadoop User Group - Status Apache Drill
Hadoop User Group - Status Apache DrillMapR Technologies
 
Maintaining Low Latency While Maximizing Throughput on a Single Cluster
Maintaining Low Latency While Maximizing Throughput on a Single ClusterMaintaining Low Latency While Maximizing Throughput on a Single Cluster
Maintaining Low Latency While Maximizing Throughput on a Single ClusterMapR Technologies
 
Hadoop: Revolutionizing Analytics AND Operations
Hadoop: Revolutionizing Analytics AND OperationsHadoop: Revolutionizing Analytics AND Operations
Hadoop: Revolutionizing Analytics AND OperationsMapR Technologies
 
Hadoop as a Platform for Genomics
Hadoop as a Platform for GenomicsHadoop as a Platform for Genomics
Hadoop as a Platform for GenomicsMapR Technologies
 
Drill Lightning London Big Data
Drill Lightning London Big DataDrill Lightning London Big Data
Drill Lightning London Big DataMapR Technologies
 
Hadoop from Hive with Stinger to Tez
Hadoop from Hive with Stinger to TezHadoop from Hive with Stinger to Tez
Hadoop from Hive with Stinger to TezJan Pieter Posthuma
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseenissoz
 

En vedette (14)

Drill at the Chicago Hug
Drill at the Chicago HugDrill at the Chicago Hug
Drill at the Chicago Hug
 
Summit EU Machine Learning
Summit EU Machine LearningSummit EU Machine Learning
Summit EU Machine Learning
 
New directions for mahout
New directions for mahoutNew directions for mahout
New directions for mahout
 
Hadoop User Group - Status Apache Drill
Hadoop User Group - Status Apache DrillHadoop User Group - Status Apache Drill
Hadoop User Group - Status Apache Drill
 
News From Mahout
News From MahoutNews From Mahout
News From Mahout
 
Maintaining Low Latency While Maximizing Throughput on a Single Cluster
Maintaining Low Latency While Maximizing Throughput on a Single ClusterMaintaining Low Latency While Maximizing Throughput on a Single Cluster
Maintaining Low Latency While Maximizing Throughput on a Single Cluster
 
Apache drill
Apache drillApache drill
Apache drill
 
Hadoop: Revolutionizing Analytics AND Operations
Hadoop: Revolutionizing Analytics AND OperationsHadoop: Revolutionizing Analytics AND Operations
Hadoop: Revolutionizing Analytics AND Operations
 
Hadoop as a Platform for Genomics
Hadoop as a Platform for GenomicsHadoop as a Platform for Genomics
Hadoop as a Platform for Genomics
 
Drill Lightning London Big Data
Drill Lightning London Big DataDrill Lightning London Big Data
Drill Lightning London Big Data
 
Cmu 2011 09.pptx
Cmu 2011 09.pptxCmu 2011 09.pptx
Cmu 2011 09.pptx
 
Apache Drill
Apache DrillApache Drill
Apache Drill
 
Hadoop from Hive with Stinger to Tez
Hadoop from Hive with Stinger to TezHadoop from Hive with Stinger to Tez
Hadoop from Hive with Stinger to Tez
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 

Similaire à Dealing with an Upside Down Internet

Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...
Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...
Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...NoSQLmatters
 
How the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownHow the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownTed Dunning
 
Dunning time-series-2015
Dunning time-series-2015Dunning time-series-2015
Dunning time-series-2015Ted Dunning
 
Time Series Data in a Time Series World
Time Series Data in a Time Series WorldTime Series Data in a Time Series World
Time Series Data in a Time Series WorldMapR Technologies
 
Building HBase Applications - Ted Dunning
Building HBase Applications - Ted DunningBuilding HBase Applications - Ted Dunning
Building HBase Applications - Ted DunningMapR Technologies
 
Real time-hadoop
Real time-hadoopReal time-hadoop
Real time-hadoopTed Dunning
 
HUG_Ireland_Streaming_Ted_Dunning
HUG_Ireland_Streaming_Ted_DunningHUG_Ireland_Streaming_Ted_Dunning
HUG_Ireland_Streaming_Ted_DunningJohn Mulhall
 
Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop DataWorks Summit/Hadoop Summit
 
Lawrence Livermore Labs talk 2011
Lawrence Livermore Labs talk 2011Lawrence Livermore Labs talk 2011
Lawrence Livermore Labs talk 2011MapR Technologies
 
2014.06.16 - BGI - Genomics BigData Workloads - Shenzhen China
2014.06.16 - BGI - Genomics BigData Workloads - Shenzhen China2014.06.16 - BGI - Genomics BigData Workloads - Shenzhen China
2014.06.16 - BGI - Genomics BigData Workloads - Shenzhen ChinaAllen Day, PhD
 
How to find what you didn't know to look for, oractical anomaly detection
How to find what you didn't know to look for, oractical anomaly detectionHow to find what you didn't know to look for, oractical anomaly detection
How to find what you didn't know to look for, oractical anomaly detectionDataWorks Summit
 
Spark SQL versus Apache Drill: Different Tools with Different Rules
Spark SQL versus Apache Drill: Different Tools with Different RulesSpark SQL versus Apache Drill: Different Tools with Different Rules
Spark SQL versus Apache Drill: Different Tools with Different RulesDataWorks Summit/Hadoop Summit
 
Tsinghua University: Two Exemplary Applications in China
Tsinghua University: Two Exemplary Applications in ChinaTsinghua University: Two Exemplary Applications in China
Tsinghua University: Two Exemplary Applications in ChinaDataStax Academy
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopHortonworks
 
Zeta Architecture: The Next Generation Big Data Architecture
Zeta Architecture: The Next Generation Big Data ArchitectureZeta Architecture: The Next Generation Big Data Architecture
Zeta Architecture: The Next Generation Big Data ArchitectureMapR Technologies
 
Scalable Deep Learning in ExtremeEarth-phiweek19
Scalable Deep Learning in ExtremeEarth-phiweek19Scalable Deep Learning in ExtremeEarth-phiweek19
Scalable Deep Learning in ExtremeEarth-phiweek19ExtremeEarth
 

Similaire à Dealing with an Upside Down Internet (20)

Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...
Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...
Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...
 
How the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownHow the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside Down
 
Dunning time-series-2015
Dunning time-series-2015Dunning time-series-2015
Dunning time-series-2015
 
Time Series Data in a Time Series World
Time Series Data in a Time Series WorldTime Series Data in a Time Series World
Time Series Data in a Time Series World
 
Building HBase Applications - Ted Dunning
Building HBase Applications - Ted DunningBuilding HBase Applications - Ted Dunning
Building HBase Applications - Ted Dunning
 
Keys for Success from Streams to Queries
Keys for Success from Streams to QueriesKeys for Success from Streams to Queries
Keys for Success from Streams to Queries
 
Real time-hadoop
Real time-hadoopReal time-hadoop
Real time-hadoop
 
HUG_Ireland_Streaming_Ted_Dunning
HUG_Ireland_Streaming_Ted_DunningHUG_Ireland_Streaming_Ted_Dunning
HUG_Ireland_Streaming_Ted_Dunning
 
Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop
 
Yarnthug2014
Yarnthug2014Yarnthug2014
Yarnthug2014
 
Lawrence Livermore Labs talk 2011
Lawrence Livermore Labs talk 2011Lawrence Livermore Labs talk 2011
Lawrence Livermore Labs talk 2011
 
2014.06.16 - BGI - Genomics BigData Workloads - Shenzhen China
2014.06.16 - BGI - Genomics BigData Workloads - Shenzhen China2014.06.16 - BGI - Genomics BigData Workloads - Shenzhen China
2014.06.16 - BGI - Genomics BigData Workloads - Shenzhen China
 
Deep Learning for Fraud Detection
Deep Learning for Fraud DetectionDeep Learning for Fraud Detection
Deep Learning for Fraud Detection
 
How to find what you didn't know to look for, oractical anomaly detection
How to find what you didn't know to look for, oractical anomaly detectionHow to find what you didn't know to look for, oractical anomaly detection
How to find what you didn't know to look for, oractical anomaly detection
 
Spark SQL versus Apache Drill: Different Tools with Different Rules
Spark SQL versus Apache Drill: Different Tools with Different RulesSpark SQL versus Apache Drill: Different Tools with Different Rules
Spark SQL versus Apache Drill: Different Tools with Different Rules
 
Tsinghua University: Two Exemplary Applications in China
Tsinghua University: Two Exemplary Applications in ChinaTsinghua University: Two Exemplary Applications in China
Tsinghua University: Two Exemplary Applications in China
 
Kafka talk
Kafka talkKafka talk
Kafka talk
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with Hadoop
 
Zeta Architecture: The Next Generation Big Data Architecture
Zeta Architecture: The Next Generation Big Data ArchitectureZeta Architecture: The Next Generation Big Data Architecture
Zeta Architecture: The Next Generation Big Data Architecture
 
Scalable Deep Learning in ExtremeEarth-phiweek19
Scalable Deep Learning in ExtremeEarth-phiweek19Scalable Deep Learning in ExtremeEarth-phiweek19
Scalable Deep Learning in ExtremeEarth-phiweek19
 

Plus de MapR Technologies

Converging your data landscape
Converging your data landscapeConverging your data landscape
Converging your data landscapeMapR Technologies
 
ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationMapR Technologies
 
Self-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataSelf-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataMapR Technologies
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureMapR Technologies
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...MapR Technologies
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsMapR Technologies
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMapR Technologies
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action MapR Technologies
 
Live Tutorial – Streaming Real-Time Events Using Apache APIs
Dealing with an Upside Down Internet

  • 1. © 2014 MapR Technologies
  • 2. Who am I?
      Ted Dunning, Chief Applications Architect, MapR Technologies
      Email: tdunning@mapr.com, tdunning@apache.org
      Twitter: @Ted_Dunning
  • 3. e-book available courtesy of MapR: http://bit.ly/1jQ9QuL
      A New Look at Anomaly Detection, by Ted Dunning and Ellen Friedman, © June 2014 (published by O’Reilly)
  • 4. Agenda
      • The Internet is turning upside down
      • A story
      • The last (mile) shall be first
      • Time series on NO-SQL
      • Faster time series on NO-SQL
      • Summary
  • 5. How the Internet Works
      • Big content servers feed data across the backbone to
      • Regional caches and servers feed data across neighborhood transport to
      • The “last mile”
      • Bits are nearly conserved, $ are concentrated centrally
          – But total $ mass at the edge is much higher
  • 6. How The Internet Works (diagram labels: Server, Cache, Gateway, Switch, Firewall, c1, c2)
  • 7. Conservation of Bits Decreases Bandwidth (diagram labels: Server, Cache, Gateway, Switch, Firewall, c1, c2)
  • 8. Total Investment Dominated by Last Mile (diagram labels: Server, Cache, Gateway, Switch, Firewall, c1, c2)
  • 9. The Rub
      • What's the problem?
          – Speed (end-to-end latency, backbone bw)
          – Feasibility (cost for consumer links)
          – Caching
      • What do we need?
          – Cheap last-mile hardware
          – Good caches
  • 10. First: An apology for going off-script
  • 11. Now, the story
  • 12. (no text on this slide)
  • 13. By the 1840’s, the NY-SF sailing time was down to 130-180 days
  • 14. (no text on this slide)
  • 15. In 1851, the record was set at 89 days by the Flying Cloud
  • 16. The difference was due (in part) to big data and a primitive kind of time-series database
  • 17–19. (no text on these slides)
  • 20. These charts were free … if you donated your data
  • 21. But how does this apply today?
  • 22. What has changed? Where will it lead?
  • 23–32. (no text on these slides)
  • 33. Things
  • 34. Emitting data
  • 35. How The Internet Works (diagram labels: Server, Cache, Gateway, Switch, Firewall, c1, c2)
  • 36. How the Internet is Going to Work (diagram labels: Server, Cache, Gateway, Switch, Controller, m1 to m6)
  • 37. Where Will The $ Go? (diagram labels: Server, Cache, Gateway, Switch, Controller, m1 to m6)
  • 38. Sensors
  • 39. Controllers
  • 40. The Problems
      • Sensors and controllers have little processing or space
          – SIM cards = 20 MHz processor, 128 kb space = 16 kB
          – Arduino mini = 15 kB RAM (more EPROM)
          – BeagleBone/Raspberry Pi = 500 kB RAM
      • Sensors and controllers have little power
          – Very common to power down 99% of the time
      • Sensors and controllers often have very low bandwidth
          – Mesh networks with base rates << 1 Mb/s
          – Power line networking
          – Intermittent 3G/4G/LTE connectivity
  • 41. What Do We Need to Do With a Time Series?
      • Acquire
          – Measurement, transmission, reception
          – Mostly not our problem
      • Store
          – We own this
      • Retrieve
          – We have to allow this
      • Analyze and visualize
          – We facilitate this via retrieval
  • 42. Retrieval Requirements
      • Retrieve by time series, time range, tags
          – Possibly pull millions of data points at a time
          – Possibly do on-the-fly windowed aggregations
      • Search by unstructured data
          – Typically requires time-windowed faceting after search
          – Also need to dive in with the first kind of retrieval
  • 43. Storage choices and trade-offs
      • Flat files
          – Great for rapid ingest with massive data
          – Handles essentially any data type
          – Less good for data requiring frequent updates
          – Harder to find specific ranges
      • Traditional relational db
          – Ingests up to 10,000s of rows/sec; prefers well-structured (numerical) data; expensive
      • Non-relational db: tables (such as MapR tables in M7 or HBase)
          – Ingests up to 100,000 rows/sec
          – Handles wide variety of data
          – Good for frequent updates
          – Easily scanned in a range
  • 44. Specific Example
      • Consider a server farm
      • Lots of system metrics
      • Typically 100-300 stats / 30 s
      • Loads, RPCs, packets, requests/s
      • Common to have 100 to 10,000 machines
  • 45. The General Outline
      • 10 samples / second / machine x 1,000 machines = 10,000 samples / second
      • This is what OpenTSDB was designed to handle
      • Install and go, but don’t test at scale
  • 46. Specific Example
      • Consider oil drilling rigs
      • When drilling wells, there are *lots* of moving parts
      • Typically a drilling rig makes about 10K samples/s
      • Temperatures, pressures, magnetics, machine vibration levels, salinity, voltage, currents, many others
      • Typical project has 100 rigs
  • 47. The General Outline
      • 10K samples / second / rig x 100 rigs = 1M samples / second
  • 48. The General Outline
      • 10K samples / second / rig x 100 rigs = 1M samples / second
      • But wait, there’s more
          – Suppose you want to test your system
          – Perhaps with a year of data
          – And you want to load that data in << 1 year
      • 100x real-time = 100M samples / second
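The arithmetic on slides 46 to 48 can be checked with a few lines of Python. This is only a back-of-the-envelope sketch; the constant names are made up for the example, and the one-year history and 100x replay factor are the figures the slides suggest.

```python
# Back-of-the-envelope ingest rates, using the figures quoted on the slides.
SAMPLES_PER_RIG_PER_SEC = 10_000   # "about 10K samples/s" per drilling rig
RIGS = 100                         # "typical project has 100 rigs"
REPLAY_SPEEDUP = 100               # load history at 100x real time
SECONDS_PER_YEAR = 365 * 24 * 3600

real_time_rate = SAMPLES_PER_RIG_PER_SEC * RIGS      # samples per second, live
replay_rate = real_time_rate * REPLAY_SPEEDUP        # samples per second, replay
samples_per_year = real_time_rate * SECONDS_PER_YEAR # size of one year of history
replay_days = 365 / REPLAY_SPEEDUP                   # wall-clock days to reload it

print(f"real-time rate:   {real_time_rate:,} samples/s")
print(f"replay rate:      {replay_rate:,} samples/s")
print(f"one year of data: {samples_per_year:,} samples")
print(f"reload time:      {replay_days} days")
```

At 100x real time, a year of history reloads in under four days, which is why the test-load rate rather than the live rate ends up setting the ingestion requirement.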
  • 49. How Should That Work? (diagram labels: Samples, Message queue, Collector, MapR table, Web service, Users)
  • 50. Example Time Series
      ...
      1409497082 327810227706 mysql.bytes_received schema=foo host=db1
      1409497099 6604859181710 mysql.bytes_sent schema=foo host=db1
      1409497106 327812421706 mysql.bytes_received schema=foo host=db1
      1409497113 6604901075387 mysql.bytes_sent schema=foo host=db
      ...
      Callouts: UNIX epoch timestamp: $(date +%s); a metric (often hierarchical); two tags
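To make the layout of the sample lines on slide 50 concrete, here is a minimal parsing sketch. It assumes the field order shown on the slide (timestamp, value, metric, then `key=value` tags); `Sample` and `parse_sample` are names invented for this example and are not part of OpenTSDB.

```python
from typing import Dict, NamedTuple


class Sample(NamedTuple):
    timestamp: int        # UNIX epoch seconds
    value: int            # the measured value
    metric: str           # metric name, often hierarchical (dot-separated)
    tags: Dict[str, str]  # tag name -> tag value


def parse_sample(line: str) -> Sample:
    """Parse one whitespace-separated line in the order shown on the slide."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 fields, got {len(fields)}: {line!r}")
    timestamp, value, metric, *tag_fields = fields
    # Each remaining field is a key=value tag; split on the first '=' only.
    tags = dict(field.split("=", 1) for field in tag_fields)
    return Sample(int(timestamp), int(value), metric, tags)


sample = parse_sample(
    "1409497082 327810227706 mysql.bytes_received schema=foo host=db1"
)
print(sample.metric, sample.tags)
```

The combination of metric name and tag set is what identifies a single time series; the timestamp and value are the data point within it.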
  • 51. The Whole Picture (diagram label: HBase or MapR-DB)
  • 52. Wide Table Design: Point-by-Point
  • 53. Wide Table Design: Hybrid Point-by-Point + Blob
      • Insertion of data as blob makes original columns redundant
      • Non-relational, but you can query these tables with Drill
  • 54. Status to This Point
      • Each sample requires one insertion, compaction requires another
      • Typical performance on SE cluster
          – 1 edge node + 4 cluster nodes
          – 20,000 samples per second observed
          – Would be faster on performance cluster, possibly not a lot
      • Suitable for server monitoring
      • Not suitable for large scale history ingestion
      • Bulk load helps a little, but not much
      • Still 1000x too slow for industrial work
  • 55. Speeding up OpenTSDB
      • 20,000 data points per second per node in the test cluster
      • Why can’t it be faster?
  • 56. Speeding up OpenTSDB: open source MapR extensions
      • Available on GitHub: https://github.com/mapr-demos/opentsdb
  • 57. Status to This Point
      • 3600 samples require one insertion
      • Typical results on SE cluster
          – 1 edge node + 4 cluster nodes
          – 14 million samples per second observed
          – ~700x faster ingestion
      • Typical results on performance cluster
          – 2-4 edge nodes + 4-9 cluster nodes
          – 110 million samples/s (4 nodes) to >200 million samples/s (8 nodes)
      • Suitable for large scale history ingestion
      • 30 million data points retrieved in 20 s
      • Ready for industrial work
  • 58. Going Further
      • OpenTSDB is substantially limited in many respects
          – Millisecond resolution is a bit of a hack
          – Data formats “just growed”, better design needed
          – Internal code is difficult to modify safely
      • Possible improvements
          – Compress and batch at collectors
          – Use advanced compression technology
          – Interface with modern query systems (Apache Drill)
  • 59. Compression example
      • Samples are 64-bit time, 16-bit sample
      • Sample time at 10 kHz
      • Sample time jitter makes it important to keep original time-stamp
      • How much overhead to retain time-stamp?
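One way to get a feel for the question on slide 59 is to store each time-stamp as its deviation from the nominal sample interval. The sketch below is an illustration only, not the scheme used in the talk: the zigzag and variable-length integer encoding, the microsecond time unit, the ±3 µs jitter, and all function names are assumptions chosen for the example.

```python
import random
import struct

NOMINAL_US = 100  # 10 kHz sampling means a nominal 100 microseconds per sample


def zigzag(n: int) -> int:
    """Map signed ints to unsigned so that small magnitudes stay small."""
    return 2 * n if n >= 0 else -2 * n - 1


def unzigzag(z: int) -> int:
    return z // 2 if z % 2 == 0 else -(z + 1) // 2


def put_varint(out: bytearray, z: int) -> None:
    """Append z as a little-endian base-128 varint."""
    while z >= 0x80:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)


def get_varint(buf: bytes, pos: int):
    """Read one varint starting at pos; return (value, next position)."""
    shift = 0
    z = 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        if b < 0x80:
            return z, pos
        shift += 7


def encode_times(times, nominal):
    """First time-stamp as 8 bytes, then (delta - nominal) per sample as a varint."""
    out = bytearray(struct.pack("<q", times[0]))
    prev = times[0]
    for t in times[1:]:
        put_varint(out, zigzag(t - prev - nominal))
        prev = t
    return bytes(out)


def decode_times(buf, nominal, count):
    (first,) = struct.unpack_from("<q", buf, 0)
    times = [first]
    pos = 8
    for _ in range(count - 1):
        z, pos = get_varint(buf, pos)
        times.append(times[-1] + nominal + unzigzag(z))
    return times


# Synthetic data: 3600 samples at 10 kHz with a few microseconds of jitter.
rng = random.Random(0)
start = 1_409_497_082_000_000
times = [start + i * NOMINAL_US + rng.randint(-3, 3) for i in range(3600)]

encoded = encode_times(times, NOMINAL_US)
assert decode_times(encoded, NOMINAL_US, len(times)) == times  # lossless
print("raw bytes per time-stamp:    ", 8)
print("encoded bytes per time-stamp:", len(encoded) / len(times))
```

With jitter this small, every deviation fits in a single byte, so the exact original time-stamps are retained at roughly one byte per sample instead of eight. Real sensor jitter may be larger, and a general-purpose compressor applied to the whole batch could reduce the overhead further.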
  • 60. Key Results
      • Ingestion is network limited
          – Edge nodes are the critical resource
          – Number of edge nodes defines a limit to scaling
      • With enough edge nodes scaling is near perfect
      • Performance of raw OpenTSDB is limited by the stateless daemon
      • Modified OpenTSDB can run 1000x faster
  • 61. Overall Ingestion Rate (chart: total ingestion rate in millions of points/second vs. number of nodes (4, 5, 8, 9); series: one ingestor, two ingestors)
  • 62. Normalized Ingestion Rate (chart: ingestion per node in millions of points/second vs. number of nodes (4, 5, 8, 9); series: one ingestor, two ingestors)
  • 63. Why MapR?
      • MapR tables are inherently faster, safer
          – Sustained > 1 GB/s ingest rate in tests
      • Mirror to M5 or M7 cluster to isolate analytics load
      • Transaction logs involve frequent appends, many files
  • 64. When is this All Wrong?
      • In some cases, retrieval by series-id + time range is not sufficient
      • May need very flexible retrieval of events based on text-like criteria
      • Search may be better than a classic time-series database
      • Can scale Lucene-based search to > 1 million events / second
  • 65. When is it Even More Right?
      • In many industrial settings, data rates from individual sensors are relatively high
          – Latency to view is still measured in seconds, not sample points
      • This allows batching at source
      • Common requirement for highly variable sample rates
          – 1 sample/s baseline, switch to 10 k samples/s
          – Small batches during slow times are just fine since number of sensors is constant
          – Requires variable window sizes
  • 66. Summary
      • The internet is turning upside down
      • This will make time series ubiquitous
      • Current open source systems are much too slow
      • We can fix that with modern NoSQL systems
          – (I wear a red hat for a reason)
  • 67. Questions
  • 68. Thank You
      Ted Dunning, Chief Application Architect, MapR Technologies
      tdunning@mapr.com, tdunning@apache.org
      @mapr, maprtech, mapr-technologies

Editor's notes

  1. Talk track: second in the series; the first was on how to build a simple recommender. This one, on anomaly detection, is being sold by O’Reilly on Amazon, but for a limited time MapR is giving away the e-book for free. Here’s the link where you can register to get one.
  2. Ted’s original talk notes: OpenTSDB consists of a Time Series Daemon (TSD) as well as a set of command line utilities. Interaction with OpenTSDB is primarily achieved by running one or more of the TSDs. Each TSD is independent. There is no master and no shared state, so you can run as many TSDs as required to handle any load you throw at it. Each TSD uses the open source database HBase to store and retrieve time-series data. The HBase schema is highly optimized for fast aggregations of similar time series and to minimize storage space. Users of the TSD never need to access HBase directly. You can communicate with the TSD via a simple telnet-style protocol, an HTTP API or a simple built-in GUI. All communications happen on the same port (the TSD figures out the protocol of the client by looking at the first few bytes it receives).
  3. Key ideas: a unique row key based on an id for each time series (looked up from a separate look-up table); an important part of the efficiency of the design is to have each column be a time offset from the start time shown in the row key. Note that data is stored point-by-point in this wide table design. Ted’s notes from his original slide: One technique for increasing the rate at which data can be retrieved from a time series database is to store many values in each row. Doing this allows data points to be retrieved at a higher speed. Because both HBase and MapR-DB store data ordered by the primary key, this design will cause rows containing data from a single time series to wind up near one another on disk. Retrieving data from a particular time series for a time range will involve largely sequential disk operations and therefore will be much faster than would be the case if the rows were widely scattered. Typically, the time window is adjusted so that 100–1,000 samples are in each row.
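The row-key idea in note 3 can be sketched in a few lines. The layout below is a simplified, hypothetical one (a 3-byte series id followed by a 4-byte window start time, with a one-hour window); it is not claimed to be OpenTSDB's exact on-disk format, and `row_key` and `column_offset` are names invented for the example.

```python
import struct

WINDOW_SECONDS = 3600  # assumed window: one row per series per hour


def row_key(series_id: int, timestamp: int) -> bytes:
    """Series id plus the start time of the window that contains timestamp."""
    base = timestamp - timestamp % WINDOW_SECONDS
    # Big-endian encoding makes byte-wise key order match (series, time) order.
    return series_id.to_bytes(3, "big") + struct.pack(">I", base)


def column_offset(timestamp: int) -> int:
    """Column qualifier: seconds since the start of the row's window."""
    return timestamp % WINDOW_SECONDS


# Two samples in the same hour share a row; the next hour starts a new row.
a = row_key(7, 1409497082)
b = row_key(7, 1409497199)
c = row_key(7, 1409497200)
print(a == b, a < c, column_offset(1409497082))
```

Because the store keeps rows sorted by key, all rows for series 7 sit next to each other in time order, so a time-range query becomes a single contiguous scan.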
  4. Ted’s notes from original slide: The table design is improved by collapsing all of the data for a row into a single data structure known as a blob. This blob can be highly compressed so that less data needs to be read from disk. Also, having a single column per row decreases the per-column overhead incurred by the on-disk format that HBase uses, which further increases performance. Data can be progressively converted to the compressed format as soon as it is known that little or no new data is likely to arrive for that time series and time window. Commonly, once the time window ends, new data will only arrive for a few more seconds, and the compression of the data can begin. Since compressed and uncompressed data can coexist in the same row, if a few samples arrive after the row is compressed, the row can simply be compressed again to merge the blob and the late-arriving samples.
  5. Richard: This is based on a figure from Chapter 3 of our book. The point here is to show that with standard OpenTSDB, data is loaded into the wide table point-by-point, then pulled out and compressed to a blob, then reloaded to form the hybrid table. This is a fairly efficient arrangement. The next slide will show how this is sped up with the MapR open source extensions. Here are Ted’s original notes: Since data is inserted in the uncompressed format, the arrival of each data point requires a row update operation to insert the value into the database. The data is then read again by the blob maker. Reads are approximately equal to writes. Once data is compressed to blobs, it is again written to the database. This row update can limit the insertion rate for data to as little as 20,000 data points per second per node in the cluster.
  6. Richard: Also based on a figure from Chapter 3 of the book. This slide shows the increased performance using the open source code MapR made available on GitHub. I’ve added the GitHub link. The key difference is that the blob production occurs upstream, before the data is ever loaded into the table. The restart logs are useful so that if there were ever a glitch with the process of compressing data to blobs and insertion, you would not lose the original data. Note that there is still the delay while blobs are made… see the explanation in the book, chapters 3 and 4. Richard: Please preserve the rest of the material on fast ingestion with MapR extensions (direct blob loading) for Ted’s talk on Sat. Use this slide as a preview and mention that Ted will be talking about this on Friday. Ted’s original notes: The direct blob insertion data flow allows the insertion rate to be increased by as much as roughly 1,000-fold. How does the direct blob approach get this bump in performance? The essential difference is that the blob maker has been moved into the data flow between the catcher and the NoSQL time series database. This way, the blob maker can use incoming data from a memory cache rather than extracting its input from wide table rows already stored in the storage tier. The full data stream is only written to the memory cache, which is fast, rather than to the database. Data is not written to the storage tier until it’s compressed into blobs, so writing can be much faster. The number of database operations is decreased by the average number of data points in each of the compressed data blobs. This decrease can easily be a factor in the thousands.
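The direct blob flow described in note 6 can be sketched as follows. This is a minimal illustration, not the MapR extension code: `DirectBlobLoader`, the `(series_id, base_time)` key, the packed `>Hd` point layout, the use of zlib, and the plain dict standing in for the table are all assumptions made for the example, and the restart log is left out.

```python
import struct
import zlib
from collections import defaultdict

WINDOW_SECONDS = 3600  # assumed window: one blob per series per hour


class DirectBlobLoader:
    """Buffer points in memory and write one compressed blob per row."""

    def __init__(self, table: dict):
        self.table = table              # stand-in for the NoSQL table
        self.cache = defaultdict(list)  # (series_id, base) -> [(offset, value)]
        self.writes = 0                 # number of database operations issued

    def add(self, series_id: int, timestamp: int, value: float) -> None:
        # Incoming points go only to the memory cache, never to the table.
        base = timestamp - timestamp % WINDOW_SECONDS
        self.cache[(series_id, base)].append((timestamp - base, value))

    def flush(self) -> None:
        # One table write per (series, window), however many points it holds.
        for key, points in self.cache.items():
            points.sort()
            raw = b"".join(struct.pack(">Hd", off, val) for off, val in points)
            self.table[key] = zlib.compress(raw)
            self.writes += 1
        self.cache.clear()


table = {}
loader = DirectBlobLoader(table)
base = 1409493600  # start of an hour-long window
for t in range(base, base + WINDOW_SECONDS):
    loader.add(1, t, float(t % 10))
loader.flush()

print("points buffered:", WINDOW_SECONDS)
print("table writes:   ", loader.writes)
```

Point-by-point insertion would have issued one row update per sample here; batching the whole window in memory replaces those with a single write, which is the reduction in database operations that the note describes.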