SlideShare une entreprise Scribd logo
1  sur  22
Copyright © 2015 Splunk Inc.
Hunk – Unlocking the
Power of Big Data
Splunk
Disruptive Approach to Unstructured Data
Structured
RDBMS
SQL Search
Schema at Write Schema at Read
1980-2010 2010+
ETL Universal
Indexing
Unstructured
Volume | Velocity | Variety
Mainframe
Data
VMware
Platform for Machine Data
Exchange PCI Security
DB Connect MobileForwarders
Syslog,
TCP,
Other
Sensors,
Control
Systems
600+ Ecosystem of Apps
Stream
SPLUNK TODAY
Copyright © 2015 Splunk Inc.
Splunk – Big Data Engine
5
Distributed File System
(semi-structured)
Key/Value, Columnar or
Other (semi-structured)
Relational Database
(highly structured)
MapReduce
Cassandra
Accumulo
MongoDB
Splunk - Big Data Technologies
SQL &
MapReduce
NoSQL
Temporal, Unstructured
Heterogeneous
Hadoop
RDBMS HDFS Storage +
MapReduce
Real-Time Indexing
5
Oracle
MySQL
IBM DB2
Teradata
Massive Linear Scalability to Tens of TBs/Day
Send data from 1000s of servers using combination of Splunk Forwarders, syslog, WMI, message queues, or other remote protocols
Auto load-balanced forwarding to as many Splunk Indexers as you need to index terabytes/day
Offload search load to Splunk Search Heads
6
Automatic load balancing linearly
scales indexing
Distributed search and MapReduce
linearly scales search and reporting
7
Splunk Real-Time Analytics
Data
ParsingQueue
Parsing Pipeline
• Source, event typing
• Character set
normalization
• Line breaking
• Timestamp identification
• Regex transforms
Indexing
Pipeline
Real-time
Buffer
Raw data
Index Files
Real-time
Search
Process
Monitor Input
IndexQueue
TCP/UDP Input
Scripted Input
Splunk
Index
7
8
Search Head Clustering
Ability to group search heads into a cluster in order to provide
Highly Available and Scalable search services = Thousands of Users
8
MISSION
CRITICAL
ENTERPRISE
9
Splunk Index Replication – High Availability
9
2
Master asks the redundant
peer to act as primary
3
Peers copies the search
files / index files / raw data
2 3
1
Master auto-detects that a
peer is down
1
• Default is 3X Replication
Copyright © 2015 Splunk Inc.
Hunk – Hadoop
11
Splunk and Hadoop
1
Hunk:
– Main use case = Analyze Hadoop Data using Hadoop Processing
Splunk Hadoop Connect:
– Main use case = Real-time export data from Splunk to Hadoop
Hunk Archive
– Main use case = Archive Splunk indexers to Hadoop
Splunk HadoopOps:
– Main use case = Monitor Hadoop
1
Integrated Analytics Platform
Full-featured,
Integrated
Product
Insights for
Everyone
Works with
What You
Have Today
Explore Visualize Dashboard
s
ShareAnalyze
Hadoop Clusters NoSQL and Other Data Stores
Hadoop Client Libraries Streaming Resource Libraries
for Diverse Data Stores
13
Hunk – Unique
1
1. Run Natively in Hadoop:
– Use Hadoop MapReduce
2. Mixed Mode:
– Allows for data Preview
3. Auto deploy SplunkD to DataNodes:
– On the fly Indexing
4. Access Control:
– Allows for many users / many Hadoop directories / support Kerberos
5. Schema On the Fly
14
Run Natively in Hadoop
External resource
(e.g. hadoop.prod)
MapReduce
jobs
Tasks
/ working
directory
Index on data nodes
Hunk
search head >
1
5
3
4
2
NameNode
JobTracker
(YARN)
DataNode /
TaskTracker
DataNode /
TaskTracker
DataNode /
TaskTracker
HDFS
14
Hadoop
MR Jobs
15
Mixed-mode Search
15
Time
Hadoop MR /
Splunk Index
Splunk Stream
Switch over
time
preview
preview
• Data Preview
• Allows users to search interactively by pausing and
refining queries
16
Indexing On the fly - Hunk Data Processing
16
HDFS
Results
Final search
results
ERP
Search process
Remote results Remote results
Search head
MapReduce
Search process
TaskTracker
raw
preprocessed
Remote results
Remote results
17 1
Role-based Security for Shared Clusters
Pass-through
Authentication
• Provide role-based security
for Hadoop clusters
• Access Hadoop resources
under security and
compliance
• Integrates with Kerberos
for Hadoop security
Business
Analyst
Marketing
Analyst
Sys
Admin
Business
Analyst
Queue:
Biz Analytics
Marketing
Analyst
Queue:
Marketing
Sys
Admin2
Queue:
Prod
18
Managed Archiving Splunk Enterprise to Hunk-HDFS
1
• Archive buckets to Hadoop (HDFS) instead of freezing buckets or throwing data away
• Store old data up to 1/10 cheaper in Hadoop cheap batch storage instead of SANs
• Optimize Splunk Enterprise search head performance for real-time monitoring,
alerting and dashboarding with short-term historical context
• Hunk search, analyze and visualize months or years of historical data in Hadoop
• Run federated queries and dashboards across Splunk Enterprise and Hunk
Hadoop Clusters
WARM
COLD
FROZEN
19
Hunk Enables Hadoop as Self Service
1
20
New Search
i ndex=" j obsummar y_l ogs_al l _r ed" cl ust er =" di l i t hi um* " | eval t ot al _sl ot _seconds=( m apSl ot Seconds + r educeSl ot Sec
onds) | eval gb_hour s=( ( t ot al _sl ot _seconds * 0. 5) / 3600) | eval gb_hour s=r ound( gb_h our s) | t i mechar t span=6h sum
( gb_hour s) as gb_hour s by queue
Last 7 days
✓ 1,175,726 events (5/20/ 14 8:00:00.000 PM to 5/ 27/14 8:26:26.000 PM)
200,000
400,000
600,000
_time ↕
OTH
ER
↕
apg_dai
lyhigh_
p3 ↕
apg_dail
ymedium
_p5 ↕
apg_hou
rlyhigh_
p1 ↕
apg_ho
urlylow_
p4 ↕
apg_hourl
ymedium
_p2 ↕
apg
_p7
↕
curveb
all_larg
e ↕
curveb
all_me
d ↕
sling
shot
↕
sling
stone
↕
Visualization
_time
Wed May 21
2014
Thu May 22 Fri May 23 Sat May 24 Sun May 25 Mon May 26
Yahoo - Visualizing Hadoop
2
• 600PB of Data
• Very large clusters used by many
groups across the enterprise
• 35,000 individual Datanodes
• Hadoop is provided as a Self
Service
21
Vantrix Mobile media optimization
2
144 Hadoop Nodes,
69 TB SSD Storage
Analytics Application
10 Million subscribers generate:
• 80GB of raw session log data / day
• 26 Million video data session records
Hunk Query
• 20 sec – search through 27M events
• Returning 4.7M events
Hunk as indexer - Automatically indexed and counted field value occurrences
Hunk as Self Service - Proved invaluable for identifying and exploring use cases
Hunk business value – Help identify when subscribers abandon video
Thank You

Contenu connexe

Tendances

Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: S...
Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: S...Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: S...
Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: S...Spark Summit
 
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadinSpark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadinSpark Summit
 
Spark Summit East 2015 Keynote -- Databricks CEO Ion Stoica
Spark Summit East 2015 Keynote -- Databricks CEO Ion StoicaSpark Summit East 2015 Keynote -- Databricks CEO Ion Stoica
Spark Summit East 2015 Keynote -- Databricks CEO Ion StoicaDatabricks
 
Scalable Monitoring Using Apache Spark and Friends with Utkarsh Bhatnagar
Scalable Monitoring Using Apache Spark and Friends with Utkarsh BhatnagarScalable Monitoring Using Apache Spark and Friends with Utkarsh Bhatnagar
Scalable Monitoring Using Apache Spark and Friends with Utkarsh BhatnagarDatabricks
 
Enabling Exploratory Analytics of Data in Shared-service Hadoop Clusters
Enabling Exploratory Analytics of Data in Shared-service Hadoop ClustersEnabling Exploratory Analytics of Data in Shared-service Hadoop Clusters
Enabling Exploratory Analytics of Data in Shared-service Hadoop ClustersDataWorks Summit
 
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in ProductionTugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in ProductionCodemotion
 
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)Databricks
 
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...Spark Summit
 
Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and DruidOpen Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and DruidDataWorks Summit
 
SplunkLive! London 2016 Getting started with Splunk
SplunkLive! London 2016 Getting started with SplunkSplunkLive! London 2016 Getting started with Splunk
SplunkLive! London 2016 Getting started with SplunkSplunk
 
A Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
A Non-Standard use Case of Hadoop: High Scale Image Processing and AnalyticsA Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
A Non-Standard use Case of Hadoop: High Scale Image Processing and AnalyticsDataWorks Summit
 
Big Telco - Yousun Jeong
Big Telco - Yousun JeongBig Telco - Yousun Jeong
Big Telco - Yousun JeongSpark Summit
 
Dataiku big data paris - the rise of the hadoop ecosystem
Dataiku   big data paris - the rise of the hadoop ecosystemDataiku   big data paris - the rise of the hadoop ecosystem
Dataiku big data paris - the rise of the hadoop ecosystemDataiku
 
Spark Application Carousel: Highlights of Several Applications Built with Spark
Spark Application Carousel: Highlights of Several Applications Built with SparkSpark Application Carousel: Highlights of Several Applications Built with Spark
Spark Application Carousel: Highlights of Several Applications Built with SparkDatabricks
 
Spark Summit - Stratio Streaming
Spark Summit - Stratio Streaming Spark Summit - Stratio Streaming
Spark Summit - Stratio Streaming Stratio
 
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Sparkhbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and SparkMichael Stack
 
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)Uwe Printz
 

Tendances (20)

Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: S...
Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: S...Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: S...
Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: S...
 
Distributed Deep Learning on Hadoop Clusters
Distributed Deep Learning on Hadoop ClustersDistributed Deep Learning on Hadoop Clusters
Distributed Deep Learning on Hadoop Clusters
 
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadinSpark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
 
Continuous Analytics & Optimisation using Apache Spark (Big Data Analytics, L...
Continuous Analytics & Optimisation using Apache Spark (Big Data Analytics, L...Continuous Analytics & Optimisation using Apache Spark (Big Data Analytics, L...
Continuous Analytics & Optimisation using Apache Spark (Big Data Analytics, L...
 
Spark Summit East 2015 Keynote -- Databricks CEO Ion Stoica
Spark Summit East 2015 Keynote -- Databricks CEO Ion StoicaSpark Summit East 2015 Keynote -- Databricks CEO Ion Stoica
Spark Summit East 2015 Keynote -- Databricks CEO Ion Stoica
 
Scalable Monitoring Using Apache Spark and Friends with Utkarsh Bhatnagar
Scalable Monitoring Using Apache Spark and Friends with Utkarsh BhatnagarScalable Monitoring Using Apache Spark and Friends with Utkarsh Bhatnagar
Scalable Monitoring Using Apache Spark and Friends with Utkarsh Bhatnagar
 
Enabling Exploratory Analytics of Data in Shared-service Hadoop Clusters
Enabling Exploratory Analytics of Data in Shared-service Hadoop ClustersEnabling Exploratory Analytics of Data in Shared-service Hadoop Clusters
Enabling Exploratory Analytics of Data in Shared-service Hadoop Clusters
 
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in ProductionTugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
 
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
 
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
 
Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and DruidOpen Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
 
SplunkLive! London 2016 Getting started with Splunk
SplunkLive! London 2016 Getting started with SplunkSplunkLive! London 2016 Getting started with Splunk
SplunkLive! London 2016 Getting started with Splunk
 
A Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
A Non-Standard use Case of Hadoop: High Scale Image Processing and AnalyticsA Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
A Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
 
Big Telco - Yousun Jeong
Big Telco - Yousun JeongBig Telco - Yousun Jeong
Big Telco - Yousun Jeong
 
Dataiku big data paris - the rise of the hadoop ecosystem
Dataiku   big data paris - the rise of the hadoop ecosystemDataiku   big data paris - the rise of the hadoop ecosystem
Dataiku big data paris - the rise of the hadoop ecosystem
 
Spark Application Carousel: Highlights of Several Applications Built with Spark
Spark Application Carousel: Highlights of Several Applications Built with SparkSpark Application Carousel: Highlights of Several Applications Built with Spark
Spark Application Carousel: Highlights of Several Applications Built with Spark
 
Spark Summit - Stratio Streaming
Spark Summit - Stratio Streaming Spark Summit - Stratio Streaming
Spark Summit - Stratio Streaming
 
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Sparkhbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
 
What's next for Big Data? -- Apache Spark
What's next for Big Data? -- Apache SparkWhat's next for Big Data? -- Apache Spark
What's next for Big Data? -- Apache Spark
 
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
 

Similaire à Hunk - Unlocking the Power of Big Data

Intro to Apache Spark by CTO of Twingo
Intro to Apache Spark by CTO of TwingoIntro to Apache Spark by CTO of Twingo
Intro to Apache Spark by CTO of TwingoMapR Technologies
 
Solution Brief: Big Data Lab Accelerator
Solution Brief: Big Data Lab AcceleratorSolution Brief: Big Data Lab Accelerator
Solution Brief: Big Data Lab AcceleratorBlueData, Inc.
 
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Cloudera, Inc.
 
Hopsworks in the cloud Berlin Buzzwords 2019
Hopsworks in the cloud Berlin Buzzwords 2019 Hopsworks in the cloud Berlin Buzzwords 2019
Hopsworks in the cloud Berlin Buzzwords 2019 Jim Dowling
 
Hadoop breizhjug
Hadoop breizhjugHadoop breizhjug
Hadoop breizhjugDavid Morin
 
Dataiku - hadoop ecosystem - @Epitech Paris - janvier 2014
Dataiku  - hadoop ecosystem - @Epitech Paris - janvier 2014Dataiku  - hadoop ecosystem - @Epitech Paris - janvier 2014
Dataiku - hadoop ecosystem - @Epitech Paris - janvier 2014Dataiku
 
9/2017 STL HUG - Back to School
9/2017 STL HUG - Back to School9/2017 STL HUG - Back to School
9/2017 STL HUG - Back to SchoolAdam Doyle
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerMark Kromer
 
Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived Vinoth Chandar
 
Stsg17 speaker yousunjeong
Stsg17 speaker yousunjeongStsg17 speaker yousunjeong
Stsg17 speaker yousunjeongYousun Jeong
 
Inroduction to Big Data
Inroduction to Big DataInroduction to Big Data
Inroduction to Big DataOmnia Safaan
 
Getting Started with Splunk Breakout Session
Getting Started with Splunk Breakout SessionGetting Started with Splunk Breakout Session
Getting Started with Splunk Breakout SessionSplunk
 
Hadoop ecosystem framework n hadoop in live environment
Hadoop ecosystem framework  n hadoop in live environmentHadoop ecosystem framework  n hadoop in live environment
Hadoop ecosystem framework n hadoop in live environmentDelhi/NCR HUG
 
Introduction to Apache Hadoop
Introduction to Apache HadoopIntroduction to Apache Hadoop
Introduction to Apache HadoopSteve Watt
 
In Memory Analytics with Apache Spark
In Memory Analytics with Apache SparkIn Memory Analytics with Apache Spark
In Memory Analytics with Apache SparkVenkata Naga Ravi
 
Splunk and map_reduce
Splunk and map_reduceSplunk and map_reduce
Splunk and map_reduceGreg Hanchin
 
Taboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache SparkTaboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache Sparktsliwowicz
 
Python Powered Data Science at Pivotal (PyData 2013)
Python Powered Data Science at Pivotal (PyData 2013)Python Powered Data Science at Pivotal (PyData 2013)
Python Powered Data Science at Pivotal (PyData 2013)Srivatsan Ramanujam
 

Similaire à Hunk - Unlocking the Power of Big Data (20)

Intro to Apache Spark by CTO of Twingo
Intro to Apache Spark by CTO of TwingoIntro to Apache Spark by CTO of Twingo
Intro to Apache Spark by CTO of Twingo
 
Solution Brief: Big Data Lab Accelerator
Solution Brief: Big Data Lab AcceleratorSolution Brief: Big Data Lab Accelerator
Solution Brief: Big Data Lab Accelerator
 
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
 
Hopsworks in the cloud Berlin Buzzwords 2019
Hopsworks in the cloud Berlin Buzzwords 2019 Hopsworks in the cloud Berlin Buzzwords 2019
Hopsworks in the cloud Berlin Buzzwords 2019
 
Hadoop breizhjug
Hadoop breizhjugHadoop breizhjug
Hadoop breizhjug
 
Dataiku - hadoop ecosystem - @Epitech Paris - janvier 2014
Dataiku  - hadoop ecosystem - @Epitech Paris - janvier 2014Dataiku  - hadoop ecosystem - @Epitech Paris - janvier 2014
Dataiku - hadoop ecosystem - @Epitech Paris - janvier 2014
 
9/2017 STL HUG - Back to School
9/2017 STL HUG - Back to School9/2017 STL HUG - Back to School
9/2017 STL HUG - Back to School
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL Server
 
Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived
 
InternReport
InternReportInternReport
InternReport
 
Stsg17 speaker yousunjeong
Stsg17 speaker yousunjeongStsg17 speaker yousunjeong
Stsg17 speaker yousunjeong
 
Inroduction to Big Data
Inroduction to Big DataInroduction to Big Data
Inroduction to Big Data
 
Getting Started with Splunk Breakout Session
Getting Started with Splunk Breakout SessionGetting Started with Splunk Breakout Session
Getting Started with Splunk Breakout Session
 
Hadoop ecosystem framework n hadoop in live environment
Hadoop ecosystem framework  n hadoop in live environmentHadoop ecosystem framework  n hadoop in live environment
Hadoop ecosystem framework n hadoop in live environment
 
Introduction to Apache Hadoop
Introduction to Apache HadoopIntroduction to Apache Hadoop
Introduction to Apache Hadoop
 
In Memory Analytics with Apache Spark
In Memory Analytics with Apache SparkIn Memory Analytics with Apache Spark
In Memory Analytics with Apache Spark
 
Splunk and map_reduce
Splunk and map_reduceSplunk and map_reduce
Splunk and map_reduce
 
Hadoop 2.0 handout 5.0
Hadoop 2.0 handout 5.0Hadoop 2.0 handout 5.0
Hadoop 2.0 handout 5.0
 
Taboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache SparkTaboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache Spark
 
Python Powered Data Science at Pivotal (PyData 2013)
Python Powered Data Science at Pivotal (PyData 2013)Python Powered Data Science at Pivotal (PyData 2013)
Python Powered Data Science at Pivotal (PyData 2013)
 

Plus de Splunk

.conf Go 2023 - Data analysis as a routine
.conf Go 2023 - Data analysis as a routine.conf Go 2023 - Data analysis as a routine
.conf Go 2023 - Data analysis as a routineSplunk
 
.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV
.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV
.conf Go 2023 - How KPN drives Customer Satisfaction on IPTVSplunk
 
.conf Go 2023 - Navegando la normativa SOX (Telefónica)
.conf Go 2023 - Navegando la normativa SOX (Telefónica).conf Go 2023 - Navegando la normativa SOX (Telefónica)
.conf Go 2023 - Navegando la normativa SOX (Telefónica)Splunk
 
.conf Go 2023 - Raiffeisen Bank International
.conf Go 2023 - Raiffeisen Bank International.conf Go 2023 - Raiffeisen Bank International
.conf Go 2023 - Raiffeisen Bank InternationalSplunk
 
.conf Go 2023 - På liv og død Om sikkerhetsarbeid i Norsk helsenett
.conf Go 2023 - På liv og død Om sikkerhetsarbeid i Norsk helsenett .conf Go 2023 - På liv og død Om sikkerhetsarbeid i Norsk helsenett
.conf Go 2023 - På liv og død Om sikkerhetsarbeid i Norsk helsenett Splunk
 
.conf Go 2023 - Many roads lead to Rome - this was our journey (Julius Bär)
.conf Go 2023 - Many roads lead to Rome - this was our journey (Julius Bär).conf Go 2023 - Many roads lead to Rome - this was our journey (Julius Bär)
.conf Go 2023 - Many roads lead to Rome - this was our journey (Julius Bär)Splunk
 
.conf Go 2023 - Das passende Rezept für die digitale (Security) Revolution zu...
.conf Go 2023 - Das passende Rezept für die digitale (Security) Revolution zu....conf Go 2023 - Das passende Rezept für die digitale (Security) Revolution zu...
.conf Go 2023 - Das passende Rezept für die digitale (Security) Revolution zu...Splunk
 
.conf go 2023 - Cyber Resilienz – Herausforderungen und Ansatz für Energiever...
.conf go 2023 - Cyber Resilienz – Herausforderungen und Ansatz für Energiever....conf go 2023 - Cyber Resilienz – Herausforderungen und Ansatz für Energiever...
.conf go 2023 - Cyber Resilienz – Herausforderungen und Ansatz für Energiever...Splunk
 
.conf go 2023 - De NOC a CSIRT (Cellnex)
.conf go 2023 - De NOC a CSIRT (Cellnex).conf go 2023 - De NOC a CSIRT (Cellnex)
.conf go 2023 - De NOC a CSIRT (Cellnex)Splunk
 
conf go 2023 - El camino hacia la ciberseguridad (ABANCA)
conf go 2023 - El camino hacia la ciberseguridad (ABANCA)conf go 2023 - El camino hacia la ciberseguridad (ABANCA)
conf go 2023 - El camino hacia la ciberseguridad (ABANCA)Splunk
 
Splunk - BMW connects business and IT with data driven operations SRE and O11y
Splunk - BMW connects business and IT with data driven operations SRE and O11ySplunk - BMW connects business and IT with data driven operations SRE and O11y
Splunk - BMW connects business and IT with data driven operations SRE and O11ySplunk
 
Splunk x Freenet - .conf Go Köln
Splunk x Freenet - .conf Go KölnSplunk x Freenet - .conf Go Köln
Splunk x Freenet - .conf Go KölnSplunk
 
Splunk Security Session - .conf Go Köln
Splunk Security Session - .conf Go KölnSplunk Security Session - .conf Go Köln
Splunk Security Session - .conf Go KölnSplunk
 
Data foundations building success, at city scale – Imperial College London
 Data foundations building success, at city scale – Imperial College London Data foundations building success, at city scale – Imperial College London
Data foundations building success, at city scale – Imperial College LondonSplunk
 
Splunk: How Vodafone established Operational Analytics in a Hybrid Environmen...
Splunk: How Vodafone established Operational Analytics in a Hybrid Environmen...Splunk: How Vodafone established Operational Analytics in a Hybrid Environmen...
Splunk: How Vodafone established Operational Analytics in a Hybrid Environmen...Splunk
 
SOC, Amore Mio! | Security Webinar
SOC, Amore Mio! | Security WebinarSOC, Amore Mio! | Security Webinar
SOC, Amore Mio! | Security WebinarSplunk
 
.conf Go 2022 - Observability Session
.conf Go 2022 - Observability Session.conf Go 2022 - Observability Session
.conf Go 2022 - Observability SessionSplunk
 
.conf Go Zurich 2022 - Keynote
.conf Go Zurich 2022 - Keynote.conf Go Zurich 2022 - Keynote
.conf Go Zurich 2022 - KeynoteSplunk
 
.conf Go Zurich 2022 - Platform Session
.conf Go Zurich 2022 - Platform Session.conf Go Zurich 2022 - Platform Session
.conf Go Zurich 2022 - Platform SessionSplunk
 
.conf Go Zurich 2022 - Security Session
.conf Go Zurich 2022 - Security Session.conf Go Zurich 2022 - Security Session
.conf Go Zurich 2022 - Security SessionSplunk
 

Plus de Splunk (20)

.conf Go 2023 - Data analysis as a routine
.conf Go 2023 - Data analysis as a routine.conf Go 2023 - Data analysis as a routine
.conf Go 2023 - Data analysis as a routine
 
.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV
.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV
.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV
 
.conf Go 2023 - Navegando la normativa SOX (Telefónica)
.conf Go 2023 - Navegando la normativa SOX (Telefónica).conf Go 2023 - Navegando la normativa SOX (Telefónica)
.conf Go 2023 - Navegando la normativa SOX (Telefónica)
 
.conf Go 2023 - Raiffeisen Bank International
.conf Go 2023 - Raiffeisen Bank International.conf Go 2023 - Raiffeisen Bank International
.conf Go 2023 - Raiffeisen Bank International
 
.conf Go 2023 - På liv og død Om sikkerhetsarbeid i Norsk helsenett
.conf Go 2023 - På liv og død Om sikkerhetsarbeid i Norsk helsenett .conf Go 2023 - På liv og død Om sikkerhetsarbeid i Norsk helsenett
.conf Go 2023 - På liv og død Om sikkerhetsarbeid i Norsk helsenett
 
.conf Go 2023 - Many roads lead to Rome - this was our journey (Julius Bär)
.conf Go 2023 - Many roads lead to Rome - this was our journey (Julius Bär).conf Go 2023 - Many roads lead to Rome - this was our journey (Julius Bär)
.conf Go 2023 - Many roads lead to Rome - this was our journey (Julius Bär)
 
.conf Go 2023 - Das passende Rezept für die digitale (Security) Revolution zu...
.conf Go 2023 - Das passende Rezept für die digitale (Security) Revolution zu....conf Go 2023 - Das passende Rezept für die digitale (Security) Revolution zu...
.conf Go 2023 - Das passende Rezept für die digitale (Security) Revolution zu...
 
.conf go 2023 - Cyber Resilienz – Herausforderungen und Ansatz für Energiever...
.conf go 2023 - Cyber Resilienz – Herausforderungen und Ansatz für Energiever....conf go 2023 - Cyber Resilienz – Herausforderungen und Ansatz für Energiever...
.conf go 2023 - Cyber Resilienz – Herausforderungen und Ansatz für Energiever...
 
.conf go 2023 - De NOC a CSIRT (Cellnex)
.conf go 2023 - De NOC a CSIRT (Cellnex).conf go 2023 - De NOC a CSIRT (Cellnex)
.conf go 2023 - De NOC a CSIRT (Cellnex)
 
conf go 2023 - El camino hacia la ciberseguridad (ABANCA)
conf go 2023 - El camino hacia la ciberseguridad (ABANCA)conf go 2023 - El camino hacia la ciberseguridad (ABANCA)
conf go 2023 - El camino hacia la ciberseguridad (ABANCA)
 
Splunk - BMW connects business and IT with data driven operations SRE and O11y
Splunk - BMW connects business and IT with data driven operations SRE and O11ySplunk - BMW connects business and IT with data driven operations SRE and O11y
Splunk - BMW connects business and IT with data driven operations SRE and O11y
 
Splunk x Freenet - .conf Go Köln
Splunk x Freenet - .conf Go KölnSplunk x Freenet - .conf Go Köln
Splunk x Freenet - .conf Go Köln
 
Splunk Security Session - .conf Go Köln
Splunk Security Session - .conf Go KölnSplunk Security Session - .conf Go Köln
Splunk Security Session - .conf Go Köln
 
Data foundations building success, at city scale – Imperial College London
 Data foundations building success, at city scale – Imperial College London Data foundations building success, at city scale – Imperial College London
Data foundations building success, at city scale – Imperial College London
 
Splunk: How Vodafone established Operational Analytics in a Hybrid Environmen...
Splunk: How Vodafone established Operational Analytics in a Hybrid Environmen...Splunk: How Vodafone established Operational Analytics in a Hybrid Environmen...
Splunk: How Vodafone established Operational Analytics in a Hybrid Environmen...
 
SOC, Amore Mio! | Security Webinar
SOC, Amore Mio! | Security WebinarSOC, Amore Mio! | Security Webinar
SOC, Amore Mio! | Security Webinar
 
.conf Go 2022 - Observability Session
.conf Go 2022 - Observability Session.conf Go 2022 - Observability Session
.conf Go 2022 - Observability Session
 
.conf Go Zurich 2022 - Keynote
.conf Go Zurich 2022 - Keynote.conf Go Zurich 2022 - Keynote
.conf Go Zurich 2022 - Keynote
 
.conf Go Zurich 2022 - Platform Session
.conf Go Zurich 2022 - Platform Session.conf Go Zurich 2022 - Platform Session
.conf Go Zurich 2022 - Platform Session
 
.conf Go Zurich 2022 - Security Session
.conf Go Zurich 2022 - Security Session.conf Go Zurich 2022 - Security Session
.conf Go Zurich 2022 - Security Session
 

Dernier

Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAshyamraj55
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Will Schroeder
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopBachir Benyammi
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsSafe Software
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?IES VE
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...Aggregage
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024SkyPlanner
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxMatsuo Lab
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdfPedro Manuel
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxGDSC PJATK
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 

Dernier (20)

Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 Workshop
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptx
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdf
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 

Hunk - Unlocking the Power of Big Data

  • 1. Copyright © 2015 Splunk Inc. Hunk – Unlocking the Power of Big Data
  • 2. Splunk Disruptive Approach to Unstructured Data Structured RDBMS SQL Search Schema at Write Schema at Read 1980-2010 2010+ ETL Universal Indexing Unstructured Volume | Velocity | Variety
  • 3. Mainframe Data VMware Platform for Machine Data Exchange PCI Security DB Connect MobileForwarders Syslog, TCP, Other Sensors, Control Systems 600+ Ecosystem of Apps Stream SPLUNK TODAY
  • 4. Copyright © 2015 Splunk Inc. Splunk – Big Data Engine
  • 5. 5 Distributed File System (semi-structured) Key/Value, Columnar or Other (semi-structured) Relational Database (highly structured) MapReduce Cassandra Accumulo MongoDB Splunk - Big Data Technologies SQL & MapReduce NoSQL Temporal, Unstructured Heterogeneous Hadoop RDBMS HDFS Storage + MapReduce Real-Time Indexing 5 Oracle MySQL IBM DB2 Teradata
  • 6. Massive Linear Scalability to Tens of TBs/Day Send data from 1000s of servers using combination of Splunk Forwarders, syslog, WMI, message queues, or other remote protocols Auto load-balanced forwarding to as many Splunk Indexers as you need to index terabytes/day Offload search load to Splunk Search Heads 6 Automatic load balancing linearly scales indexing Distributed search and MapReduce linearly scales search and reporting
  • 7. 7 Splunk Real-Time Analytics Data ParsingQueue Parsing Pipeline • Source, event typing • Character set normalization • Line breaking • Timestamp identification • Regex transforms Indexing Pipeline Real-time Buffer Raw data Index Files Real-time Search Process Monitor Input IndexQueue TCP/UDP Input Scripted Input Splunk Index 7
  • 8. 8 Search Head Clustering Ability to group search heads into a cluster in order to provide Highly Available and Scalable search services = Thousands of Users 8 MISSION CRITICAL ENTERPRISE
  • 9. 9 Splunk Index Replication – High Availability 9 2 Master asks the redundant peer to act as primary 3 Peers copies the search files / index files / raw data 2 3 1 Master auto-detects that a peer is down 1 • Default is 3X Replication
  • 10. Copyright © 2015 Splunk Inc. Hunk – Hadoop
  • 11. 11 Splunk and Hadoop 1 Hunk: – Main use case = Analyze Hadoop Data using Hadoop Processing Splunk Hadoop Connect: – Main use case = Real-time export data from Splunk to Hadoop Hunk Archive – Main use case = Archive Splunk indexers to Hadoop Splunk HadoopOps: – Main use case = Monitor Hadoop
  • 12. 1 Integrated Analytics Platform Full-featured, Integrated Product Insights for Everyone Works with What You Have Today Explore Visualize Dashboard s ShareAnalyze Hadoop Clusters NoSQL and Other Data Stores Hadoop Client Libraries Streaming Resource Libraries for Diverse Data Stores
  • 13. 13 Hunk – Unique 1 1. Run Natively in Hadoop: – Use Hadoop MapReduce 2. Mixed Mode: – Allows for data Preview 3. Auto deploy SplunkD to DataNodes: – On the fly Indexing 4. Access Control: – Allows for many users / many Hadoop directories / support Kerberos 5. Schema On the Fly
  • 14. 14 Run Natively in Hadoop External resource (e.g. hadoop.prod) MapReduce jobs Tasks / working directory Index on data nodes Hunk search head > 1 5 3 4 2 NameNode JobTracker (YARN) DataNode / TaskTracker DataNode / TaskTracker DataNode / TaskTracker HDFS 14 Hadoop MR Jobs
  • 15. 15 Mixed-mode Search 15 Time Hadoop MR / Splunk Index Splunk Stream Switch over time preview preview • Data Preview • Allows users to search interactively by pausing and refining queries
  • 16. 16 Indexing On the fly - Hunk Data Processing 16 HDFS Results Final search results ERP Search process Remote results Remote results Search head MapReduce Search process TaskTracker raw preprocessed Remote results Remote results
  • 17. 17 1 Role-based Security for Shared Clusters Pass-through Authentication • Provide role-based security for Hadoop clusters • Access Hadoop resources under security and compliance • Integrates with Kerberos for Hadoop security Business Analyst Marketing Analyst Sys Admin Business Analyst Queue: Biz Analytics Marketing Analyst Queue: Marketing Sys Admin2 Queue: Prod
  • 18. 18 Managed Archiving Splunk Enterprise to Hunk-HDFS 1 • Archive buckets to Hadoop (HDFS) instead of freezing buckets or throwing data away • Store old data up to 1/10 cheaper in Hadoop cheap batch storage instead of SANs • Optimize Splunk Enterprise search head performance for real-time monitoring, alerting and dashboarding with short-term historical context • Hunk search, analyze and visualize months or years of historical data in Hadoop • Run federated queries and dashboards across Splunk Enterprise and Hunk Hadoop Clusters WARM COLD FROZEN
  • 19. 19 Hunk Enables Hadoop as Self Service 1
  • 20. 20 New Search i ndex=" j obsummar y_l ogs_al l _r ed" cl ust er =" di l i t hi um* " | eval t ot al _sl ot _seconds=( m apSl ot Seconds + r educeSl ot Sec onds) | eval gb_hour s=( ( t ot al _sl ot _seconds * 0. 5) / 3600) | eval gb_hour s=r ound( gb_h our s) | t i mechar t span=6h sum ( gb_hour s) as gb_hour s by queue Last 7 days ✓ 1,175,726 events (5/20/ 14 8:00:00.000 PM to 5/ 27/14 8:26:26.000 PM) 200,000 400,000 600,000 _time ↕ OTH ER ↕ apg_dai lyhigh_ p3 ↕ apg_dail ymedium _p5 ↕ apg_hou rlyhigh_ p1 ↕ apg_ho urlylow_ p4 ↕ apg_hourl ymedium _p2 ↕ apg _p7 ↕ curveb all_larg e ↕ curveb all_me d ↕ sling shot ↕ sling stone ↕ Visualization _time Wed May 21 2014 Thu May 22 Fri May 23 Sat May 24 Sun May 25 Mon May 26 Yahoo - Visualizing Hadoop 2 • 600PB of Data • Very large clusters used by many groups across the enterprise • 35,000 individual Datanodes • Hadoop is provided as a Self Service
  • 21. 21 Vantrix Mobile media optimization 2 144 Hadoop Nodes, 69 TB SSD Storage Analytics Application 10 Million subscribers generate: • 80GB of raw session log data / day • 26 Million video data session records Hunk Query • 20 sec – search through 27M events • Returning 4.7M events Hunk as indexer - Automatically indexed and counted field value occurrences Hunk as Self Service - Proved invaluable for identifying and exploring use cases Hunk business value – Help identify when subscribers abandon video

Notes de l'éditeur

  1. But listening to your machine data isn’t as easy as it sounds. Machine Data is different: It is voluminous unstructured time series data with no predefined schema It is generated by all IT systems– from servers and applications, to RFIDs and wire data. It is non-standard data and characterized by many unpredictable and changing formats   Because of this, machine data cannot be managed using traditional approaches. Traditional approaches require you to transform your data and force fit it into a brittle schema – They aren’t designed to handle the inconsistent machine data formats Traditional approaches are designed with specific use cases and queries in mind – they limit the problems that you can solve Traditional approaches rely on siloed tools that are designed for structured data approaches and legacy computing environments – They are inherently limited in their ability to scale   To listen to your machine data, you need a solution with no limits: No limits on the formats of data No limits on where you can collect the data from No limits on the questions that you can ask and the use cases you can solve. And no limits on scale.   You need a solution that can keep up with Machine Data.  
  2. Since then, Splunk has invested significantly to expand from a search tool to a mission-critical platform. The platform includes hundreds of data types and can scale to massive volumes Today, it’s more than Splunk Enterprise, we’ve added Splunk Cloud, Hunk, Splunk MINT for mobile intelligence; and have more than 600 Apps. Machine data is more than logs! It’s wire data, mainframe data, mobile device data, sensor data, metrics Your use cases have evolved well beyond troubleshooting so we’re investing in solutions that leverage the power of Splunk Enterprise to provide you with packaged views into your data for faster, deeper insights. Our most well-known solution is Splunk Enterprise Security and if you aren’t using it yet, we encourage you to find out why it’s turning the traditional SIEM market upside down.
  3. How has big data evolved over time. For a long time, ‘big data’ was was simply a large database. The database industry – in order to handle large data – moved to smaller databases, but many of them. Horizontal partitioning (Also known as Sharding) is a database design principle whereby rows of a database table are held separately (For example, A -> D in one database E -> H in a second database, etc ..) Hadoop was introduced by Google and was adapted as the de-facto big data system. Hadoop is an open source project from Apache that has evolved rapidly into a major technology movement. It has emerged as a popular way to handle massive amounts of data, including structured and complex unstructured data. Its popularity is due in part to its ability to store and process large amounts of data effectively across clusters of commodity hardware. Apache Hadoop is not actually a single product but instead a collection of several components. For the most part, Hadoop is a batch oriented system. ** Teradata Aster Data & SQL on Hadoop are SQL interface systems that can talk to Hadoop ** Cassandra & HBase are NoSQL databases that can process data using a Key / Value in real-time. Splunk = Temporal, Unstructured, Heterogeneous, real-time analytics platform.
  4. Splunk allows you divide up the work of search and indexing across as many servers as you need to achieve the performance and scale you require. Using work dividing techniques such as MapReduce, Splunk can take a single search and query as many indexers as you need to complete the job, allowing you to use inexpensive commodity hardware in massively parallel clusters. For example, if you had 1 million events to search, one Indexer can easily complete that search. But it will take a little time – let’s say 30 seconds. However, if the same million events was spread across 10 indexers, the same search would complete in 3 seconds. How fast and how large you want your searches is yours to control by adding indexers as desired.
  5. For the most part, you can use monitor to add nearly all your data sources from files and directories. However, you might want to use upload to add one-time inputs, such as an archive of historical data. You can enable Splunk to accept an input on any TCP or UDP port. Splunk consumes any data sent on these ports. Use this method for syslog (default port is UDP 514), or set up netcat and bind to a port. TCP is the protocol underlying Splunk's data distribution and is the recommended method for sending data from any remote machine to your Splunk server. Splunk can index remote data from syslog-ng or any other application that transmits via TCP. However, there are times when you want to use scripts to feed data to Splunk for indexing, or prepare data from a non-standard source so Splunk can properly parse events and extract fields. You can use shell scripts, python scripts, Windows batch files, PowerShell, or any other utility that can format and stream the data that you want Splunk to index. You can stream the data to Splunk or write the data from a script to a file. All data that comes into Splunk enters through the parsing pipeline as large chunks. During parsing, Splunk breaks these chunks into events which it hands off to the indexing pipeline, where final processing occurs. During both parsing and indexing, Splunk acts on the data, transforming it in various ways. Most of these processes are configurable, so you have the ability to adapt them to your needs. To kick off a real-time search in Splunk Web, use the time range menu to select a preset Real-time time range window, such as 30 seconds or 1 minute. You can also specify a sliding time range window to apply to your real-time search. This defines a real-time buffer. The Splunk Index is the repository for Splunk Enterprise data. Splunk Enterprise transforms incoming data into events, which it stores in indexes.
  6.   Faster Recovery II -   If you look at the screen – 2 indexers on the left with green cylinders – searchable copies of the data, 2 indexers on the right – only raw data What happens when a peer goes down, master waits for hb timeout and marks the peer down. <Cl> Reassigns primaries to another peer. Then tries to enforce the replication policy, makes copies of the raw data and search files In 5.0, search files are generated on each peer from the raw data, In 6.0, the search files are copied over from a peer that already has them instead of regenerating. <Cl> These statistics are from our internal tests... Another point to note is generating search files from the rawdata is cpu intensive as compared to copying search files.  
  7. Quick to set-up, scales to multiple concurrent databases Enrich machine data with structured data from relational databases Execute database queries directly from the Splunk user interface Browse and navigate database schemas and tables Combine machine data with structured data from relational databases
  8. Quick to set-up, scales to multiple concurrent databases Enrich machine data with structured data from relational databases Execute database queries directly from the Splunk user interface Browse and navigate database schemas and tables Combine machine data with structured data from relational databases
  9. Search execution: The Hunk Search head takes the list of content of directories in the virtual index. The search head filters directories & files based on the search & time range (partition pruning) The NameNode and JobTracker (MapReduce Resource Manager in YARN) read data from MapReduce framework and feed it to search process. The process computes File Splits, constructs and submits the MapReduce jobs. Hunk streams a few File Splits from HDFS and processes them in the Search Head to provider quick previews. The search head consumes and merges the MapReduce results (provide incremental previews) while the MapReduce jobs kick off. The data nodes run a copy of splunkd to process the the jobs and write them to a working directory in HDFS. Final results are stored in the Hunk search head. Hunk utilizes the Splunk Search Processing Language, the industry-leading method to enable interactive data exploration across large, diverse data sets. There is no requirement to "understand" data up front. For customers of Splunk Enterprise, reuse your Search Processing Language knowledge and skill set for data stored in Hadoop. Any commands whose output depends on the event input order would yield different results – this is because Splunk guarantees events to be delivered in descending time order. Hunk doesn’t. This is the reason why transaction and localize do not work. We can see the results from the intermediate Hadoop Map jobs getting steamed into the Splunk UI even before all the Map jobs are finished, and once all the Hadoop Maps are done processing the results, Splunk displays the full results. In essence, Splunk acts as the Hadoop Reduce phase and there is no need to use Hadoop for that phase.
  10. Hunk starts the streaming and reporting modes concurrently. Streaming results show until the reporting results come in. Allows users to search interactively by pausing and refining queries. This is a major, unique advantage of Hunk compared to alternative approaches such as Hive or SQL on Hadoop which require fixed schema in an effort to speed up searches, while Hunk retains the combination of schema on the fly with results preview.
  11. In this new feature, planned for release in the next Hunk release (version 6.2.1), archive buckets to Hadoop (the Hadoop Distributed File System, or HDFS) instead of freezing buckets or throwing data away. This significantly lowers the total cost of ownership (TCO) for Splunk Enterprise installations while giving security analysts, risk managers and marketers access to months or years of historical data integral for their job success. Store old data up to 1/10 cheaper in Hadoop cheap batch storage instead of SANs Optimize Splunk Enterprise search head performance for real-time monitoring, alerting and dashboarding with short-term historical context Hunk search, analyze and visualize months or years of historical data in Hadoop Run federated queries and dashboards across Splunk Enterprise and Hunk
  12. Indexing