Indrasis Mondal
August 11, 2018
Real-time Operational Intelligence at Hulu
❏ Operational Intelligence at Hulu
❏ Buy vs Build
❏ High Level Architecture
❏ Use Cases
❏ Conclusion/Challenges
Agenda
Operational Intelligence
● Operational intelligence (OI) is a category of
real-time, dynamic business analytics that
delivers visibility and insight into data, streaming
events, and business operations.
Business Intelligence
● Business intelligence solutions help
organizations improve their business
performance over time.
What is Operational Intelligence
Source:
https://www.linkedin.com/pulse/operational-reporting-vs-business-intelligence-whats-sean-williams/
The operational intelligence tool at Hulu is known as Glyph. It lets Hulu easily draw insights from
real-time and historical event-driven data.
● Capabilities
○ Real-time data exploration, analytics, and visualization
○ Interactive query
○ Real-time dashboard
○ Dynamic Real-time funnels
○ API Service
● Primary Usages
○ Operational Intelligence and Reporting
○ User interaction
○ Product usage
○ Video quality
○ App and device performance, etc.
Operational Intelligence Capability at Hulu
● Guiding Principles
○ Need for a data visualization tool.
○ This tool is capable of answering event-driven questions
○ Questions about user interaction, app health or quality of service
○ This tool serves as the serving layer in a lambda pattern
● Key Assumptions
○ Primary stakeholders are Product, Technology, Engineering Operation, and Analysts
○ Ad Hoc questions related to events are time-bound
○ Aggregations at any other level are not optimized, so there is a risk of slow responses
○ The data is available in real-time
○ The data is aggregated, not sampled
○ Result delay is not the same as query delay; it is a data availability question
○ CAP: Availability and partition tolerance guaranteed
○ CAP: Eventual consistency achieved through batch layer
Glyph Introduction
Why Build?
• ~8TB of device and app data produced each day, with ~4PB available in Hadoop
• ~150K Events / Second flowing through the pipeline
• ~1.5TB of Druid data generated each day
• ~5-second delay between data being emitted by the client and that data being queryable
• ~150k Glyph queries per day, resulting in ~450k Druid queries
• 250ms Average response time
• 1100ms P95 response time across all queries
• ~50% of Hulu employees use Glyph with ~10% using it on any given day
• Largest single data source in Druid produces ~0.5TB per day
Glyph by Numbers
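A few derived figures follow from the numbers above by simple arithmetic. The sketch below assumes decimal terabytes and assumes that the ~8TB/day and ~150K events/second figures describe the same event stream, which the slide does not state explicitly, so the per-event size is only a rough estimate.

```python
# Back-of-the-envelope figures derived from the approximate numbers on the slide.
# Assumption: decimal units (1 TB = 10**12 bytes) and a constant event rate all day.
EVENTS_PER_SECOND = 150_000
RAW_BYTES_PER_DAY = 8 * 10**12
DRUID_BYTES_PER_DAY = 1_500_000_000_000
GLYPH_QUERIES_PER_DAY = 150_000
DRUID_QUERIES_PER_DAY = 450_000
SECONDS_PER_DAY = 24 * 60 * 60

events_per_day = EVENTS_PER_SECOND * SECONDS_PER_DAY
avg_raw_bytes_per_event = RAW_BYTES_PER_DAY / events_per_day
druid_to_raw_ratio = DRUID_BYTES_PER_DAY / RAW_BYTES_PER_DAY
druid_queries_per_glyph_query = DRUID_QUERIES_PER_DAY / GLYPH_QUERIES_PER_DAY
avg_druid_queries_per_second = DRUID_QUERIES_PER_DAY / SECONDS_PER_DAY

print(f"{events_per_day:,} events/day")
print(f"{avg_raw_bytes_per_event:.0f} raw bytes per event (rough)")
print(f"Druid data is {druid_to_raw_ratio:.1%} of raw volume")
print(f"{druid_queries_per_glyph_query:.0f} Druid queries per Glyph query")
print(f"{avg_druid_queries_per_second:.1f} Druid queries/second on average")
```

Read this way, each Glyph query fans out to about three Druid queries on average, and the average Druid query rate is modest compared with the ingestion rate.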
High Level Architecture (Lambda Pattern at Hulu)
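The architecture diagram itself is not reproduced in this transcript. As a minimal sketch of the lambda pattern named in the title, the serving layer can be thought of as answering from the batch view wherever batch results exist and from the real-time view for anything newer. The function name, the per-minute counts, and the watermark parameter are all hypothetical and are not taken from the talk.

```python
def merge_views(batch_view, realtime_view, batch_watermark):
    """Combine per-minute counts from a batch view and a real-time view.

    Minutes before the watermark come from the batch layer, which is
    treated as the corrected source of truth; minutes at or after the
    watermark come from the real-time layer.
    """
    merged = {}
    for minute, count in batch_view.items():
        if minute < batch_watermark:
            merged[minute] = count
    for minute, count in realtime_view.items():
        if minute >= batch_watermark:
            merged[minute] = count
    return merged


# Hypothetical data: the real-time layer slightly undercounted minute 1.
batch_view = {0: 100, 1: 120}
realtime_view = {1: 118, 2: 95, 3: 40}
merged = merge_views(batch_view, realtime_view, batch_watermark=2)
print(merged)
```

In this toy example the real-time count for minute 1 is replaced once the batch layer has processed it, which is one way to picture the "eventual consistency achieved through batch layer" assumption.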
● Open-source, column-oriented time-series datastore: http://druid.io/
● Supports streaming data, which is immediately queryable
● Built for lambda-style architectures
● Highly distributed
● Sub-second response times
● Built-in time-based tiering for data storage
● All-in-one system split across several roles
○ Historical: Data node. Loads segments determined by the coordinator and makes them available
for querying. Executes queries on the portions of data owned by the given node.
○ Coordinator: Data availability node. Manages the Historical nodes and performs segment balancing
and handoffs.
○ Middle Manager: Indexing node. Spawns many workers (peons) on each host which ingest
streaming data.
○ Overlord: “Middle manager” manager. Distributes indexing jobs across the many available middle
managers.
○ Broker: Query node. Farms user queries out across many Historicals and peons, then aggregates the
per-node results.
Druid at a glance
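The Broker role described above is a scatter/gather step. The following is a toy illustration of that idea only, not Druid code: the class names, the segment ids, and the rule of picking the first node that holds a segment are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Stand-in for a Historical or a peon: it answers for the segments it holds."""
    name: str
    segments: dict = field(default_factory=dict)  # segment id -> list of event rows

    def partial_count(self, segment_id, predicate):
        # Count matching rows in one locally held segment.
        return sum(1 for row in self.segments[segment_id] if predicate(row))


class Broker:
    """Routes each requested segment to a node that holds it, then merges the results."""

    def __init__(self, nodes):
        self.nodes = nodes

    def count(self, segment_ids, predicate):
        total = 0
        for segment_id in segment_ids:
            # Pick the first node holding the segment so replicas are not double counted.
            owner = next((n for n in self.nodes if segment_id in n.segments), None)
            if owner is not None:
                total += owner.partial_count(segment_id, predicate)
        return total


historical = DataNode("historical-1", {"2018-08-10": [{"platform": "roku"}, {"platform": "ios"}]})
peon = DataNode("peon-1", {"2018-08-11": [{"platform": "roku"}]})
broker = Broker([historical, peon])
roku_events = broker.count(["2018-08-10", "2018-08-11"], lambda row: row["platform"] == "roku")
print(roku_events)
```

The point of the sketch is that a single user query can span older data on Historicals and still-ingesting data on peons, with the merge hidden behind one entry point.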
● Cluster: ~80 Druid nodes with ~160 TB of total storage
● Map events to data sources
○ High-volume and/or high-complexity events get their own dedicated data sources
■ Can individually scale and modify the retention to fit the requirements
■ Higher cost, as they require dedicated indexing capacity
○ Low-volume events get merged into shared data sources
■ Reduces the indexing resources used, at the cost of potentially worse storage size
● Simplify column types for ingestion
○ Dimension: Some column people want to filter/split over
○ Metric: Some column people want to aggregate. Each metric generates every aggregation type,
allowing users to execute any query
○ High-cardinality: Some column which would normally be a dimension but, because of its cardinality,
has limited value at full fidelity. These columns can only have a count distinct
query run over them, as they have been aggregated away into a sketch representation
Druid - How we use it
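The three simplified column types above could be expanded into ingestion settings along the following lines. This is a sketch under assumptions: the function name and the naming scheme for generated metrics are hypothetical, the returned dictionary is a fragment rather than a complete Druid ingestion spec, and the talk does not say which aggregator or sketch types Hulu uses. The aggregator type names used here (`count`, `doubleSum`, `doubleMin`, `doubleMax`, `hyperUnique`) are standard Druid aggregators chosen for illustration.

```python
def build_schema_fragment(columns):
    """Expand simplified column kinds into dimensions and metric aggregators.

    columns maps a column name to "dimension", "metric", or "high_cardinality".
    """
    dimensions = []
    metrics = [{"type": "count", "name": "event_count"}]
    for name, kind in columns.items():
        if kind == "dimension":
            dimensions.append(name)
        elif kind == "metric":
            # Each metric gets every aggregation type so users can run any query.
            for agg in ("doubleSum", "doubleMin", "doubleMax"):
                metrics.append({"type": agg, "name": f"{name}_{agg}", "fieldName": name})
        elif kind == "high_cardinality":
            # Stored only as a sketch, so only count distinct can be answered.
            metrics.append({"type": "hyperUnique", "name": f"{name}_unique", "fieldName": name})
        else:
            raise ValueError(f"unknown column kind for {name!r}: {kind!r}")
    return {"dimensions": dimensions, "metricsSpec": metrics}


fragment = build_schema_fragment(
    {"platform": "dimension", "bitrate": "metric", "user_id": "high_cardinality"}
)
print(fragment)
```

The trade-off matches the slide: a high-cardinality column such as the hypothetical `user_id` never becomes a dimension, which keeps rollup effective, but it can no longer be filtered or split on.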
● Core problems:
○ Data definitions and requirements constantly change
○ Many data sources, hard to maintain consistency across them all
○ Need flexibility in order to change things like segment size, granularity and schemas as data ages
○ Hard to tell if data is getting dropped, or if there is just no data due to ingestion setup
● What we tried:
○ Ingest data via blacklist
■ PRO: Easiest setup imaginable
■ CON: Easiest database failure mode imaginable
○ Ingest data via whitelist shipped along with ingestion services
■ PRO: Easy to add a new event; just had to modify 2 config files
■ CON: Was hard to implement a config schema that didn’t involve complex logic across
multiple projects
■ CON: Each project ended up having config differences due to forgotten / delayed
deployments
● Current: Ingest data via whitelist served via micro-service
○ PRO: Guaranteed consistency across ingestion services
○ PRO: Allowed development of configuration as a service, rather than as an afterthought
○ CON: Effectively introduced a single point of failure; if the config service went down, ingestion would fail
Ingestion Configs
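A minimal sketch of whitelist-based ingestion with the whitelist obtained from a shared service is shown below. Everything named here is hypothetical: `fetch_whitelist` stands in for a call to the configuration micro-service, and the whitelist shape (event type mapped to allowed fields) is an assumption. Counting dropped events per type is included because the slide lists "hard to tell if data is getting dropped" as a core problem; the talk does not describe how Hulu addresses it.

```python
from collections import Counter


class WhitelistFilter:
    """Keeps only whitelisted event types and fields, and counts what it drops."""

    ALWAYS_KEPT = ("type", "timestamp")

    def __init__(self, fetch_whitelist):
        # fetch_whitelist: callable returning {event_type: set_of_allowed_fields}.
        self._fetch = fetch_whitelist
        self._whitelist = {}
        self.dropped = Counter()

    def refresh(self):
        # Every ingestion service calling the same source sees the same config.
        self._whitelist = self._fetch()

    def apply(self, event):
        event_type = event.get("type")
        allowed = self._whitelist.get(event_type)
        if allowed is None:
            self.dropped[event_type] += 1
            return None
        return {k: v for k, v in event.items() if k in allowed or k in self.ALWAYS_KEPT}


flt = WhitelistFilter(lambda: {"playback_start": {"platform"}})
flt.refresh()
kept = flt.apply({"type": "playback_start", "timestamp": 1, "platform": "roku", "debug": "x"})
skipped = flt.apply({"type": "unlisted_event", "timestamp": 2})
print(kept, skipped, dict(flt.dropped))
```

Because the filter depends on `refresh()` succeeding, the sketch also makes the single-point-of-failure concern visible: with no whitelist loaded, every event is dropped.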
● Problems:
○ Druid syntax is decidedly *not* SQL
○ Use case requirements were for very simple querying; don’t bring a firetruck to a water gun fight
○ Wanted query descriptions to be as simple as possible
■ Not building a generic SQL engine, so able to define a simple data model to describe the queries
■ Simple query description allowed us to easily fit it into a rich UI, as well as work with it internally
○ Given use cases, common to see the same query issued many times, but we have had some
difficulties with Druid query caching at scale
○ Need to abstract consumers away from our Druid implementation choices
○ Sometimes people want joins
● Solutions:
○ Build an API to do simplified data model -> query translation
○ Build in query-aware caching logic
○ Abstract away our implementations during query translations
○ Build in API-side query time lookups to project properties on top of existing data
Glyph API
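The first two solutions, translating a simplified data model into a Druid query and caching repeated queries, can be sketched as follows. The simplified query shape, the function names, and the cache class are hypothetical and are not Glyph's actual API. The output follows Druid's native JSON timeseries query format (`queryType`, `dataSource`, `granularity`, `intervals`, `aggregations`, `filter`). The cache here never expires entries, which a real-time system would need to handle for intervals that still receive data.

```python
import json


def to_druid_query(simple):
    """Translate a simplified query description into a native Druid timeseries query."""
    query = {
        "queryType": "timeseries",
        "dataSource": simple["source"],
        "granularity": simple.get("granularity", "minute"),
        "intervals": [f'{simple["start"]}/{simple["end"]}'],
        "aggregations": [
            {"type": "longSum", "name": m, "fieldName": m} for m in simple["metrics"]
        ],
    }
    filters = [
        {"type": "selector", "dimension": dim, "value": value}
        for dim, value in sorted(simple.get("filters", {}).items())
    ]
    if len(filters) == 1:
        query["filter"] = filters[0]
    elif filters:
        query["filter"] = {"type": "and", "fields": filters}
    return query


def cache_key(simple):
    # Canonical JSON, so the same logical query always maps to the same key.
    return json.dumps(simple, sort_keys=True, separators=(",", ":"))


class CachingApi:
    def __init__(self, run_druid_query):
        self._run = run_druid_query  # callable that sends a query dict to Druid
        self._cache = {}

    def query(self, simple):
        key = cache_key(simple)
        if key not in self._cache:
            self._cache[key] = self._run(to_druid_query(simple))
        return self._cache[key]


sent = []
api = CachingApi(lambda q: sent.append(q) or {"rows": []})
request = {
    "source": "playback_events",
    "metrics": ["event_count"],
    "start": "2018-08-01T00:00:00Z",
    "end": "2018-08-02T00:00:00Z",
    "filters": {"platform": "roku"},
}
api.query(request)
api.query(dict(reversed(list(request.items()))))  # same query, different key order
print(len(sent), json.dumps(sent[0], indent=2))
```

Keying the cache on the simplified description rather than on the generated Druid query is a design choice of this sketch: it keeps consumers and the cache independent of how the translation layer chooses to express a query in Druid.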
Over the first half of 2018, 6-10% of Hulu’s employees
viewed Glyph dashboards each month. What are these
dashboards used for?
1. Real time monitoring of special events
2. Real time monitoring of feature launch
3. Real time monitoring of device health
Use Cases
Glyph is itself instrumented, so Glyph usage data can be
queried in Glyph.
Real-time Monitoring of Special Events
Why monitor:
● Keep on top of quality of service issues
● Evaluate effect on sign up
● Measure the concurrent users watching the event
● Measure the percent of users watching the event
● Determine if there are platform-specific issues
How do we monitor:
● Set up dashboard with relevant metrics
● Circulate widely before the event
When to monitor:
● During the event, refer to the dashboard when people have questions related to usage and quality of
service
Real-time Monitoring of Feature Launch
Why monitor:
● Determine how fast new features are adopted by users
● Determine common usage patterns related to new features
● Determine platform-specific performance
What to monitor:
● Fields or events related to the new features
How do we monitor:
● Set up dashboard with relevant metrics
○ Include data from beta testing, if possible
● Share widely among product managers and client teams
When to monitor:
● Set a fixed endpoint right before the feature launch, and monitor the dashboard as data starts to
roll in
Real-time Monitoring of Device Health
Why monitor:
● Continuously understanding app behavior will allow us to detect issues in new app versions.
● Compare user behavior and app performance of different app versions
What to monitor:
● Adoption rates of new app versions
● App performance by app version
● Performance comparison week over week and day over day
How do we monitor:
● Set up dashboards for each app client
When to monitor:
● Always, but especially when new app versions are released
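Two of the quantities listed under "What to monitor" reduce to simple arithmetic: the adoption rate of each app version and the period-over-period change in a performance metric. The helper names and the sample numbers below are hypothetical and are only meant to make the definitions concrete.

```python
def percent_change(current, previous):
    """Week-over-week or day-over-day change, in percent; None if there is no baseline."""
    if previous == 0:
        return None
    return (current - previous) * 100.0 / previous


def adoption_share(sessions_by_version):
    """Fraction of sessions coming from each app version."""
    total = sum(sessions_by_version.values())
    if total == 0:
        return {version: 0.0 for version in sessions_by_version}
    return {version: count / total for version, count in sessions_by_version.items()}


# Hypothetical numbers: average startup time in ms this week versus last week,
# and session counts split by app version.
startup_change = percent_change(current=1100, previous=1000)
shares = adoption_share({"4.1": 750, "4.2": 250})
print(startup_change, shares)
```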
Challenges & Lessons Learned
THANK YOU!
indrasis.mondal@hulu.com

More Related Content

What's hot

Making the Transition from the Suite to the Hub
Making the Transition from the Suite to the HubMaking the Transition from the Suite to the Hub
Making the Transition from the Suite to the HubJerika Phelps
 
Systematic Migration of Monolith to Microservices
Systematic Migration of Monolith to MicroservicesSystematic Migration of Monolith to Microservices
Systematic Migration of Monolith to MicroservicesPradeep Dalvi
 
Using Kafka: Anatomy of the Flowable event registry
Using Kafka: Anatomy of the Flowable event registryUsing Kafka: Anatomy of the Flowable event registry
Using Kafka: Anatomy of the Flowable event registryFlowable
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Apache Apex
 
Ledingkart Meetup #4: Data pipeline @ lk
Ledingkart Meetup #4: Data pipeline @ lkLedingkart Meetup #4: Data pipeline @ lk
Ledingkart Meetup #4: Data pipeline @ lkMukesh Singh
 
Zsolt Várnai, Principal Software Engineer at Skyscanner - "The advantages of...
 Zsolt Várnai, Principal Software Engineer at Skyscanner - "The advantages of... Zsolt Várnai, Principal Software Engineer at Skyscanner - "The advantages of...
Zsolt Várnai, Principal Software Engineer at Skyscanner - "The advantages of...Dataconomy Media
 
Data monstersrealtimeetl new
Data monstersrealtimeetl newData monstersrealtimeetl new
Data monstersrealtimeetl newGreenM
 
Ledingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @LendingkartLedingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @LendingkartMukesh Singh
 
Evgen Kostenko "How to process 80 million events per day and build relational...
Evgen Kostenko "How to process 80 million events per day and build relational...Evgen Kostenko "How to process 80 million events per day and build relational...
Evgen Kostenko "How to process 80 million events per day and build relational...Fwdays
 
Data Strategies for Managing the Cycles in Oil and Gas
Data Strategies for Managing the Cycles in Oil and GasData Strategies for Managing the Cycles in Oil and Gas
Data Strategies for Managing the Cycles in Oil and GasDenodo
 
YaJUG/Kaiserslautern JUG - 3 easy improvements in your microservices architec...
YaJUG/Kaiserslautern JUG - 3 easy improvements in your microservices architec...YaJUG/Kaiserslautern JUG - 3 easy improvements in your microservices architec...
YaJUG/Kaiserslautern JUG - 3 easy improvements in your microservices architec...Nicolas Fränkel
 
go>tech world - 3 performance improvements with Hazelcast IMDG in your micros...
go>tech world - 3 performance improvements with Hazelcast IMDG in your micros...go>tech world - 3 performance improvements with Hazelcast IMDG in your micros...
go>tech world - 3 performance improvements with Hazelcast IMDG in your micros...Nicolas Fränkel
 
Istanbul JUG - 3 performance improvements with Hazelcast IMDG in your microse...
Istanbul JUG - 3 performance improvements with Hazelcast IMDG in your microse...Istanbul JUG - 3 performance improvements with Hazelcast IMDG in your microse...
Istanbul JUG - 3 performance improvements with Hazelcast IMDG in your microse...Nicolas Fränkel
 
Voxxed Days Cluj - 3 performance improvements with Hazelcast IMDG in your mic...
Voxxed Days Cluj - 3 performance improvements with Hazelcast IMDG in your mic...Voxxed Days Cluj - 3 performance improvements with Hazelcast IMDG in your mic...
Voxxed Days Cluj - 3 performance improvements with Hazelcast IMDG in your mic...Nicolas Fränkel
 
Flowable: Building a crowd sourced document extraction and verification system
Flowable: Building a crowd sourced document extraction and verification systemFlowable: Building a crowd sourced document extraction and verification system
Flowable: Building a crowd sourced document extraction and verification systemFlowable
 
Flowable Business Processing from Kafka Events
Flowable Business Processing from Kafka Events Flowable Business Processing from Kafka Events
Flowable Business Processing from Kafka Events Flowable
 
CMMN makes BPMN smarter and engaging
CMMN makes BPMN smarter and engagingCMMN makes BPMN smarter and engaging
CMMN makes BPMN smarter and engagingFlowable
 
[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQLWSO2
 

What's hot (19)

Making the Transition from the Suite to the Hub
Making the Transition from the Suite to the HubMaking the Transition from the Suite to the Hub
Making the Transition from the Suite to the Hub
 
Systematic Migration of Monolith to Microservices
Systematic Migration of Monolith to MicroservicesSystematic Migration of Monolith to Microservices
Systematic Migration of Monolith to Microservices
 
Using Kafka: Anatomy of the Flowable event registry
Using Kafka: Anatomy of the Flowable event registryUsing Kafka: Anatomy of the Flowable event registry
Using Kafka: Anatomy of the Flowable event registry
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex
 
Ford
FordFord
Ford
 
Ledingkart Meetup #4: Data pipeline @ lk
Ledingkart Meetup #4: Data pipeline @ lkLedingkart Meetup #4: Data pipeline @ lk
Ledingkart Meetup #4: Data pipeline @ lk
 
Zsolt Várnai, Principal Software Engineer at Skyscanner - "The advantages of...
 Zsolt Várnai, Principal Software Engineer at Skyscanner - "The advantages of... Zsolt Várnai, Principal Software Engineer at Skyscanner - "The advantages of...
Zsolt Várnai, Principal Software Engineer at Skyscanner - "The advantages of...
 
Data monstersrealtimeetl new
Data monstersrealtimeetl newData monstersrealtimeetl new
Data monstersrealtimeetl new
 
Ledingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @LendingkartLedingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @Lendingkart
 
Evgen Kostenko "How to process 80 million events per day and build relational...
Evgen Kostenko "How to process 80 million events per day and build relational...Evgen Kostenko "How to process 80 million events per day and build relational...
Evgen Kostenko "How to process 80 million events per day and build relational...
 
Data Strategies for Managing the Cycles in Oil and Gas
Data Strategies for Managing the Cycles in Oil and GasData Strategies for Managing the Cycles in Oil and Gas
Data Strategies for Managing the Cycles in Oil and Gas
 
YaJUG/Kaiserslautern JUG - 3 easy improvements in your microservices architec...
YaJUG/Kaiserslautern JUG - 3 easy improvements in your microservices architec...YaJUG/Kaiserslautern JUG - 3 easy improvements in your microservices architec...
YaJUG/Kaiserslautern JUG - 3 easy improvements in your microservices architec...
 
go>tech world - 3 performance improvements with Hazelcast IMDG in your micros...
go>tech world - 3 performance improvements with Hazelcast IMDG in your micros...go>tech world - 3 performance improvements with Hazelcast IMDG in your micros...
go>tech world - 3 performance improvements with Hazelcast IMDG in your micros...
 
Istanbul JUG - 3 performance improvements with Hazelcast IMDG in your microse...
Istanbul JUG - 3 performance improvements with Hazelcast IMDG in your microse...Istanbul JUG - 3 performance improvements with Hazelcast IMDG in your microse...
Istanbul JUG - 3 performance improvements with Hazelcast IMDG in your microse...
 
Voxxed Days Cluj - 3 performance improvements with Hazelcast IMDG in your mic...
Voxxed Days Cluj - 3 performance improvements with Hazelcast IMDG in your mic...Voxxed Days Cluj - 3 performance improvements with Hazelcast IMDG in your mic...
Voxxed Days Cluj - 3 performance improvements with Hazelcast IMDG in your mic...
 
Flowable: Building a crowd sourced document extraction and verification system
Flowable: Building a crowd sourced document extraction and verification systemFlowable: Building a crowd sourced document extraction and verification system
Flowable: Building a crowd sourced document extraction and verification system
 
Flowable Business Processing from Kafka Events
Flowable Business Processing from Kafka Events Flowable Business Processing from Kafka Events
Flowable Business Processing from Kafka Events
 
CMMN makes BPMN smarter and engaging
CMMN makes BPMN smarter and engagingCMMN makes BPMN smarter and engaging
CMMN makes BPMN smarter and engaging
 
[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL
 

Similar to Data Con LA 2018 - Enabling real-time exploration and analytics at scale at Hulu by Indrasis Mondal

Easy Microservices with JHipster - Devoxx BE 2017
Easy Microservices with JHipster - Devoxx BE 2017Easy Microservices with JHipster - Devoxx BE 2017
Easy Microservices with JHipster - Devoxx BE 2017Deepu K Sasidharan
 
Devoxx Belgium 2017 - easy microservices with JHipster
Devoxx Belgium 2017 - easy microservices with JHipsterDevoxx Belgium 2017 - easy microservices with JHipster
Devoxx Belgium 2017 - easy microservices with JHipsterJulien Dubois
 
Last Conference 2017: Big Data in a Production Environment: Lessons Learnt
Last Conference 2017: Big Data in a Production Environment: Lessons LearntLast Conference 2017: Big Data in a Production Environment: Lessons Learnt
Last Conference 2017: Big Data in a Production Environment: Lessons LearntMark Grebler
 
[WSO2 Meetup] Tools and Techniques for Building and Maintaining Streaming-bas...
[WSO2 Meetup] Tools and Techniques for Building and Maintaining Streaming-bas...[WSO2 Meetup] Tools and Techniques for Building and Maintaining Streaming-bas...
[WSO2 Meetup] Tools and Techniques for Building and Maintaining Streaming-bas...WSO2
 
#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo
#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo
#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, CriteoParis Open Source Summit
 
Data Platform in the Cloud
Data Platform in the CloudData Platform in the Cloud
Data Platform in the CloudAmihay Zer-Kavod
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixC4Media
 
Simply Business' Data Platform
Simply Business' Data PlatformSimply Business' Data Platform
Simply Business' Data PlatformDani Solà Lagares
 
Exploring Alluxio for Daily Tasks at Robinhood
Exploring Alluxio for Daily Tasks at RobinhoodExploring Alluxio for Daily Tasks at Robinhood
Exploring Alluxio for Daily Tasks at RobinhoodAlluxio, Inc.
 
Drools5 Community Training Module 5 Drools BLIP Architectural Overview + Demos
Drools5 Community Training Module 5 Drools BLIP Architectural Overview + DemosDrools5 Community Training Module 5 Drools BLIP Architectural Overview + Demos
Drools5 Community Training Module 5 Drools BLIP Architectural Overview + DemosMauricio (Salaboy) Salatino
 
Extracting Insights from Data at Twitter
Extracting Insights from Data at TwitterExtracting Insights from Data at Twitter
Extracting Insights from Data at TwitterPrasad Wagle
 
How to make your data count webinar, 26 Nov 2018
How to make your data count webinar, 26 Nov 2018How to make your data count webinar, 26 Nov 2018
How to make your data count webinar, 26 Nov 2018ARDC
 
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !Piyush Kumar
 
Rakuten’s Journey with Splunk - Evolution of Splunk as a Service
Rakuten’s Journey with Splunk - Evolution of Splunk as a ServiceRakuten’s Journey with Splunk - Evolution of Splunk as a Service
Rakuten’s Journey with Splunk - Evolution of Splunk as a ServiceRakuten Group, Inc.
 
MuleSoft Manchester Meetup #4 slides 11th February 2021
MuleSoft Manchester Meetup #4 slides 11th February 2021MuleSoft Manchester Meetup #4 slides 11th February 2021
MuleSoft Manchester Meetup #4 slides 11th February 2021Ieva Navickaite
 
Data Virtualization Journey: How to Grow from Single Project and to Enterpris...
Data Virtualization Journey: How to Grow from Single Project and to Enterpris...Data Virtualization Journey: How to Grow from Single Project and to Enterpris...
Data Virtualization Journey: How to Grow from Single Project and to Enterpris...Denodo
 
Digicorp - Supply Chain Analytics Apps
Digicorp - Supply Chain Analytics AppsDigicorp - Supply Chain Analytics Apps
Digicorp - Supply Chain Analytics AppsDigicorp
 
Alex Nauda [Nobl9] | How Not to Build an SLO Platform | InfluxDays NA 2021
Alex Nauda [Nobl9] | How Not to Build an SLO Platform | InfluxDays NA 2021Alex Nauda [Nobl9] | How Not to Build an SLO Platform | InfluxDays NA 2021
Alex Nauda [Nobl9] | How Not to Build an SLO Platform | InfluxDays NA 2021InfluxData
 
NetflixOSS Meetup S6E1 - Titus & Containers
NetflixOSS Meetup S6E1 - Titus & ContainersNetflixOSS Meetup S6E1 - Titus & Containers
NetflixOSS Meetup S6E1 - Titus & Containersaspyker
 

Similar to Data Con LA 2018 - Enabling real-time exploration and analytics at scale at Hulu by Indrasis Mondal (20)

Easy Microservices with JHipster - Devoxx BE 2017
Easy Microservices with JHipster - Devoxx BE 2017Easy Microservices with JHipster - Devoxx BE 2017
Easy Microservices with JHipster - Devoxx BE 2017
 
Devoxx Belgium 2017 - easy microservices with JHipster
Devoxx Belgium 2017 - easy microservices with JHipsterDevoxx Belgium 2017 - easy microservices with JHipster
Devoxx Belgium 2017 - easy microservices with JHipster
 
Last Conference 2017: Big Data in a Production Environment: Lessons Learnt
Last Conference 2017: Big Data in a Production Environment: Lessons LearntLast Conference 2017: Big Data in a Production Environment: Lessons Learnt
Last Conference 2017: Big Data in a Production Environment: Lessons Learnt
 
[WSO2 Meetup] Tools and Techniques for Building and Maintaining Streaming-bas...
[WSO2 Meetup] Tools and Techniques for Building and Maintaining Streaming-bas...[WSO2 Meetup] Tools and Techniques for Building and Maintaining Streaming-bas...
[WSO2 Meetup] Tools and Techniques for Building and Maintaining Streaming-bas...
 
#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo
#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo
#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo
 
Data Platform in the Cloud
Data Platform in the CloudData Platform in the Cloud
Data Platform in the Cloud
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFix
 
Simply Business' Data Platform
Simply Business' Data PlatformSimply Business' Data Platform
Simply Business' Data Platform
 
Exploring Alluxio for Daily Tasks at Robinhood
Exploring Alluxio for Daily Tasks at RobinhoodExploring Alluxio for Daily Tasks at Robinhood
Exploring Alluxio for Daily Tasks at Robinhood
 
Drools5 Community Training Module 5 Drools BLIP Architectural Overview + Demos
Drools5 Community Training Module 5 Drools BLIP Architectural Overview + DemosDrools5 Community Training Module 5 Drools BLIP Architectural Overview + Demos
Drools5 Community Training Module 5 Drools BLIP Architectural Overview + Demos
 
Extracting Insights from Data at Twitter
Extracting Insights from Data at TwitterExtracting Insights from Data at Twitter
Extracting Insights from Data at Twitter
 
How to make your data count webinar, 26 Nov 2018
How to make your data count webinar, 26 Nov 2018How to make your data count webinar, 26 Nov 2018
How to make your data count webinar, 26 Nov 2018
 
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
 
Rakuten’s Journey with Splunk - Evolution of Splunk as a Service
Rakuten’s Journey with Splunk - Evolution of Splunk as a ServiceRakuten’s Journey with Splunk - Evolution of Splunk as a Service
Rakuten’s Journey with Splunk - Evolution of Splunk as a Service
 
MuleSoft Manchester Meetup #4 slides 11th February 2021
MuleSoft Manchester Meetup #4 slides 11th February 2021MuleSoft Manchester Meetup #4 slides 11th February 2021
MuleSoft Manchester Meetup #4 slides 11th February 2021
 
Dynomite @ RedisConf 2017
Dynomite @ RedisConf 2017Dynomite @ RedisConf 2017
Dynomite @ RedisConf 2017
 
Data Virtualization Journey: How to Grow from Single Project and to Enterpris...
Data Virtualization Journey: How to Grow from Single Project and to Enterpris...Data Virtualization Journey: How to Grow from Single Project and to Enterpris...
Data Virtualization Journey: How to Grow from Single Project and to Enterpris...
 
Digicorp - Supply Chain Analytics Apps
Digicorp - Supply Chain Analytics AppsDigicorp - Supply Chain Analytics Apps
Digicorp - Supply Chain Analytics Apps
 
Alex Nauda [Nobl9] | How Not to Build an SLO Platform | InfluxDays NA 2021
Alex Nauda [Nobl9] | How Not to Build an SLO Platform | InfluxDays NA 2021Alex Nauda [Nobl9] | How Not to Build an SLO Platform | InfluxDays NA 2021
Alex Nauda [Nobl9] | How Not to Build an SLO Platform | InfluxDays NA 2021
 
NetflixOSS Meetup S6E1 - Titus & Containers
NetflixOSS Meetup S6E1 - Titus & ContainersNetflixOSS Meetup S6E1 - Titus & Containers
NetflixOSS Meetup S6E1 - Titus & Containers
 

More from Data Con LA

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA
 
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA
 
Data Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA
 
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA
 
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA
 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA
 




Data Con LA 2018 - Enabling real-time exploration and analytics at scale at Hulu by Indrasis Mondal

  • 1. Real-time Operational Intelligence at Hulu (Indrasis Mondal, August 11, 2018)
  • 2. Agenda
    ❏ Operational Intelligence at Hulu
    ❏ Buy vs Build
    ❏ High Level Architecture
    ❏ Use Cases
    ❏ Conclusion/Challenges
  • 3. What is Operational Intelligence
    Operational Intelligence
    ● Operational intelligence (OI) is a category of real-time, dynamic business analytics that delivers visibility and insight into data, streaming events and business operations.
    Business Intelligence
    ● Business intelligence solutions help organizations improve their business performance over time.
    Source: https://www.linkedin.com/pulse/operational-reporting-vs-business-intelligence-whats-sean-williams/
  • 4. Operational Intelligence Capability at Hulu
    The operational intelligence tool at Hulu is known as Glyph. It empowers Hulu to easily draw insights from real-time and historical event-driven data.
    ● Capabilities
      ○ Real-time data exploration, analytics and visualization
      ○ Interactive query
      ○ Real-time dashboard
      ○ Dynamic real-time funnels
      ○ API service
    ● Primary Usages
      ○ Operational intelligence and reporting
      ○ User interaction
      ○ Product usage
      ○ Video quality
      ○ App and device performance, etc.
  • 5. Glyph Introduction
    ● Guiding Principles
      ○ Need for a data visualization tool
      ○ The tool is capable of answering event-driven questions
      ○ Questions about user interaction, app health or quality of service
      ○ The tool serves as the serving layer in a lambda pattern
    ● Key Assumptions
      ○ Primary stakeholders are Product, Technology, Engineering Operation, and Analysts
      ○ Ad hoc questions related to events are time-bound
      ○ Aggregation at any other level is not optimized, with a risk of slow response
      ○ The data is available in real time
      ○ The data is aggregated, not sampled
      ○ Result delay is not equal to query delay; it is a data availability question
      ○ CAP: Availability and partition tolerance guaranteed
      ○ CAP: Eventual consistency achieved through the batch layer
  • 7. Glyph by Numbers
    ● ~8TB of device and app data produced each day, with ~4PB available in Hadoop
    ● ~150K events/second flowing through the pipeline
    ● ~1.5TB of Druid data generated each day
    ● ~5 second delta between data being emitted by the client and becoming queryable
    ● ~150k Glyph queries per day, resulting in ~450k Druid queries
    ● 250ms average response time
    ● 1100ms P95 response time across all queries
    ● ~50% of Hulu employees use Glyph, with ~10% using it on any given day
    ● Largest single data source in Druid produces ~0.5TB per day
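A few derived quantities follow from the approximate figures on this slide. The sketch below is only a back-of-the-envelope calculation: it assumes decimal units (1 TB = 10^12 bytes) and rates that stay constant across the day, neither of which the slide states. It also assumes the raw and Druid volumes describe the same event stream.

```python
# Back-of-the-envelope figures derived from the slide's approximate numbers.
# Assumptions (not from the slide): decimal terabytes and constant daily rates.
SECONDS_PER_DAY = 24 * 60 * 60

events_per_second = 150_000           # ~150K events/second
raw_bytes_per_day = 8 * 10**12        # ~8 TB/day of device and app data
druid_bytes_per_day = 1.5 * 10**12    # ~1.5 TB/day of Druid data
glyph_queries_per_day = 150_000       # ~150k Glyph queries/day
druid_queries_per_day = 450_000       # ~450k Druid queries/day

# Total events flowing through the pipeline in one day.
events_per_day = events_per_second * SECONDS_PER_DAY

# Average number of Druid queries issued per Glyph query.
druid_queries_per_glyph_query = druid_queries_per_day / glyph_queries_per_day

# Druid data volume relative to raw volume (only meaningful if both
# figures cover the same events, which is an assumption here).
druid_to_raw_ratio = druid_bytes_per_day / raw_bytes_per_day

print(f"{events_per_day:,} events/day")
print(f"{druid_queries_per_glyph_query:.1f} Druid queries per Glyph query")
print(f"Druid/raw volume ratio: {druid_to_raw_ratio:.4f}")
```

Under these assumptions, each Glyph query fans out into about three Druid queries on average.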
  • 8. High Level Architecture (Lambda Pattern at Hulu)
  • 9. Druid at a glance
    ● Open source, column-oriented timeseries datastore: http://druid.io/
    ● Supports streaming data, which is immediately queryable
    ● Built for lambda-style architectures
    ● Highly distributed
    ● Sub-second response times
    ● Built-in time-based tiering for data storage
    ● All-in-one system split across several roles
      ○ Historical: Data node. Loads segments determined by the coordinator and makes them available for querying. Executes queries on the portions of data owned by the given node.
      ○ Coordinator: Data availability node. Manages the Historical nodes and performs segment balancing and handoffs.
      ○ Middle Manager: Indexing node. Spawns many workers (peons) on each host, which ingest streaming data.
      ○ Overlord: “Middle manager” manager. Distributes indexing jobs across the many available middle managers.
      ○ Broker: Query node. Farms user queries out over many historicals and peons, then aggregates the node results.
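The broker's scatter-gather role described above can be illustrated with a toy merge step. This is a minimal sketch in plain Python, not Druid code: the per-node result shape (time bucket mapped to a count) and the names `merge_partials`, `historical_a`, `historical_b` and `peon` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable

# Hypothetical shape: each data node returns partial counts keyed by time bucket.
PartialResult = Dict[str, int]


def merge_partials(partials: Iterable[PartialResult]) -> Dict[str, int]:
    """Combine per-node partial sums into one result, as a broker-like role would."""
    merged: Dict[str, int] = defaultdict(int)
    for partial in partials:
        for bucket, count in partial.items():
            merged[bucket] += count
    # Return buckets in time order for stable output.
    return dict(sorted(merged.items()))


# Two historical nodes serve older segments; a peon serves freshly ingested data.
historical_a = {"2018-08-11T10:00": 120, "2018-08-11T10:01": 95}
historical_b = {"2018-08-11T10:00": 80}
peon = {"2018-08-11T10:01": 30, "2018-08-11T10:02": 12}

result = merge_partials([historical_a, historical_b, peon])
print(result)
```

Summing works here because counts are additive. Aggregations such as distinct counts need mergeable intermediate states instead of plain numbers.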
  • 10. Druid - How we use it
    ● Cluster: ~80 Druid nodes with ~160 TB of total storage
    ● Map events to data sources
      ○ High-volume and/or high-complexity events get their own dedicated data sources
        ■ Can individually scale and modify the retention to fit the requirements
        ■ Higher cost, as they require dedicated indexing capacity
      ○ Low-volume events get merged into shared data sources
        ■ Reduces indexing resources used, in exchange for potentially worse storage size
    ● Simplify column types for ingestion
      ○ Dimension: A column people want to filter or split over
      ○ Metric: A column people want to aggregate. Each metric generates every aggregation type, allowing users to execute any query
      ○ High-cardinality: A column which would normally be a dimension, but whose cardinality gives it limited value in full fidelity. These columns can only have a count distinct query run over them, as they have been aggregated away into a sketch representation
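The three simplified column types can be pictured as a small mapping onto Druid-style ingestion specs. The sketch below is an illustration under assumptions, not Hulu's actual configuration: the function `build_specs`, the column names, the output naming scheme, and the choice of `doubleSum`/`doubleMin`/`doubleMax` for metrics and `hyperUnique` for high-cardinality columns are all hypothetical.

```python
from typing import Dict, List


def build_specs(columns: Dict[str, str]) -> Dict[str, List]:
    """Map simplified column types to Druid-style dimension and metric specs.

    `columns` maps a column name to one of: "dimension", "metric",
    "high_cardinality". The aggregator choices below are assumptions.
    """
    dimensions: List[str] = []
    # A row count is always kept so that rolled-up rows can still be counted.
    metrics: List[dict] = [{"type": "count", "name": "count"}]

    for name, kind in columns.items():
        if kind == "dimension":
            dimensions.append(name)
        elif kind == "metric":
            # "Each metric generates every aggregation type".
            for agg in ("doubleSum", "doubleMin", "doubleMax"):
                metrics.append(
                    {"type": agg, "name": f"{name}_{agg}", "fieldName": name}
                )
        elif kind == "high_cardinality":
            # Stored only as a sketch, so only count-distinct is possible later.
            metrics.append(
                {"type": "hyperUnique", "name": f"{name}_unique", "fieldName": name}
            )
        else:
            raise ValueError(f"unknown column type for {name!r}: {kind!r}")

    return {"dimensions": dimensions, "metrics": metrics}


specs = build_specs(
    {"device": "dimension", "bitrate": "metric", "user_id": "high_cardinality"}
)
print(specs["dimensions"])
print([m["name"] for m in specs["metrics"]])
```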
  • 11. Ingestion Configs
    ● Core problems:
      ○ Data definitions and requirements constantly change
      ○ Many data sources, hard to maintain consistency across them all
      ○ Need flexibility in order to change things like segment size, granularity and schemas as data ages
      ○ Hard to tell if data is getting dropped, or if there is just no data due to ingestion setup
    ● What we tried:
      ○ Ingest data via blacklist
        ■ PRO: Easiest setup imaginable
        ■ CON: Easiest database failure mode imaginable
      ○ Ingest data via whitelist shipped along with ingestion services
        ■ PRO: Easy to add a new event; just had to modify 2 config files
        ■ CON: Was hard to implement a config schema that didn’t involve complex logic across multiple projects
        ■ CON: Each project ended up having config differences due to forgotten or delayed deployments
    ● Current: Ingest data via whitelist served via micro-service
      ○ PRO: Guaranteed consistency across ingestion services
      ○ PRO: Allowed development of configuration as a service, rather than as an afterthought
      ○ CON: Effectively introduced a single point of failure: if the config service went down, ingestion would fail
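A whitelist served by a config service might be consumed roughly as follows. This is a minimal sketch with hypothetical names (`WhitelistClient`, `fake_fetch`, the `playback_start` event). Keeping the last successfully fetched whitelist is one possible way to soften the single point of failure noted above; the slides do not say Hulu does this.

```python
from typing import Callable, Dict, List, Optional, Set


class WhitelistClient:
    """Filters events against a whitelist fetched from a config service."""

    def __init__(self, fetch: Callable[[], Dict[str, List[str]]]) -> None:
        self._fetch = fetch
        self._last_good: Optional[Dict[str, Set[str]]] = None

    def refresh(self) -> bool:
        """Reload the whitelist; on failure keep the last known good copy."""
        try:
            raw = self._fetch()
        except Exception:
            return False
        self._last_good = {event: set(fields) for event, fields in raw.items()}
        return True

    def filter_event(self, event_type: str, payload: dict) -> Optional[dict]:
        """Return only whitelisted fields, or None if the event type is not listed."""
        if self._last_good is None:
            raise RuntimeError("no whitelist has been loaded yet")
        allowed = self._last_good.get(event_type)
        if allowed is None:
            return None
        return {key: value for key, value in payload.items() if key in allowed}


# Stand-in for the config service: succeeds once, then simulates an outage.
calls = {"n": 0}


def fake_fetch() -> Dict[str, List[str]]:
    calls["n"] += 1
    if calls["n"] > 1:
        raise ConnectionError("config service down")
    return {"playback_start": ["device", "bitrate"]}


client = WhitelistClient(fake_fetch)
first_refresh = client.refresh()    # loads the whitelist
second_refresh = client.refresh()   # fails, previous whitelist is kept
kept = client.filter_event(
    "playback_start", {"device": "tv", "bitrate": 5, "debug": "x"}
)
dropped = client.filter_event("unlisted_event", {"a": 1})
print(first_refresh, second_refresh, kept, dropped)
```

Because every ingestion service reads the same served whitelist, they stay consistent with each other without coordinated redeployments.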
  • 13. Glyph API
    ● Problems:
      ○ Druid syntax is decidedly *not* SQL
      ○ Use case requirements were for very simple querying; don’t bring a firetruck to a water gun fight
      ○ Wanted query descriptions to be as simple as possible
        ■ Not building a generic SQL engine, so able to define a simple data model to describe the queries
        ■ Simple query descriptions allowed us to easily fit them into a rich UI, as well as work with them internally
      ○ Given the use cases, it is common to see the same query issued many times, but we have had some difficulties with Druid query caching at scale
      ○ Need to abstract consumers away from our Druid implementation choices
      ○ Sometimes people want joins
    ● Solutions:
      ○ Build an API to do simplified data model -> query translation
      ○ Build in query-aware caching logic
      ○ Abstract away our implementations during query translation
      ○ Build in API-side query-time lookups to project properties on top of existing data
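The first two solutions can be sketched together: translate a simplified query description into a Druid native `groupBy` query, then cache on a canonical form of the translated query. The simplified data model, the function `to_druid_query`, the class `CachingExecutor`, and the example field names are hypothetical, since Glyph's actual model is not shown in the slides. "Query-aware" is interpreted here only as keying the cache on the normalized query.

```python
import json
from typing import Callable, Dict, List


def to_druid_query(simple: dict) -> dict:
    """Translate a simplified query description into a Druid native groupBy query."""
    query = {
        "queryType": "groupBy",
        "dataSource": simple["source"],
        "granularity": simple.get("granularity", "all"),
        "intervals": [f'{simple["start"]}/{simple["end"]}'],
        "dimensions": list(simple.get("split", [])),
        "aggregations": [
            {"type": "longSum", "name": metric, "fieldName": metric}
            for metric in simple["metrics"]
        ],
    }
    # Sort filters so logically equal queries translate identically.
    filters = [
        {"type": "selector", "dimension": dim, "value": value}
        for dim, value in sorted(simple.get("where", {}).items())
    ]
    if len(filters) == 1:
        query["filter"] = filters[0]
    elif filters:
        query["filter"] = {"type": "and", "fields": filters}
    return query


class CachingExecutor:
    """Runs translated queries, reusing results for identical queries."""

    def __init__(self, run: Callable[[dict], List[dict]]) -> None:
        self._run = run
        self._cache: Dict[str, List[dict]] = {}
        self.backend_calls = 0

    def execute(self, simple: dict) -> List[dict]:
        query = to_druid_query(simple)
        key = json.dumps(query, sort_keys=True)  # canonical cache key
        if key not in self._cache:
            self.backend_calls += 1
            self._cache[key] = self._run(query)
        return self._cache[key]


base = {
    "source": "playback_events",
    "start": "2018-08-01T00:00:00Z",
    "end": "2018-08-02T00:00:00Z",
    "split": ["device"],
    "metrics": ["count"],
}
query_a = {**base, "where": {"device": "roku", "app_version": "1.2"}}
query_b = {**base, "where": {"app_version": "1.2", "device": "roku"}}

# Stand-in backend; a real deployment would send the query to a Druid broker.
executor = CachingExecutor(lambda q: [{"device": "roku", "count": 42}])
rows_a = executor.execute(query_a)
rows_b = executor.execute(query_b)  # same query after normalization: cache hit
print(executor.backend_calls)
```

Consumers only see the simplified description, so the Druid-specific details can change behind the translation step without affecting them.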
  • 14. Use Cases
    Over the first half of 2018, 6-10% of Hulu’s employees viewed Glyph dashboards each month. What are these dashboards used for?
    1. Real-time monitoring of special events
    2. Real-time monitoring of feature launches
    3. Real-time monitoring of device health
    Note: Glyph is instrumented to allow for Glyph data usage queries in Glyph.
  • 15. Real-time Monitoring of Special Events
    Why monitor:
    ● Keep on top of quality of service issues
    ● Evaluate effect on sign-up
    ● Measure the concurrent users watching the event
    ● Measure the percent of users watching the event
    ● Determine if there are platform-specific issues
    How do we monitor:
    ● Set up dashboard with relevant metrics
    ● Circulate widely before the event
    When to monitor:
    ● During the event, refer to the dashboard when people have questions related to usage and quality of service
  • 16. Real-time Monitoring of Feature Launch
    Why monitor:
    ● Determine how fast new features are adopted by users
    ● Determine common usage patterns related to new features
    ● Determine platform-specific performance
    What to monitor:
    ● Fields or events related to the new features
    How do we monitor:
    ● Set up dashboard with relevant metrics
      ○ Include data from beta testing, if possible
    ● Share widely among product managers and client teams
    When to monitor:
    ● Set a fixed endpoint right before the feature launch, and monitor the dashboard as data starts to roll in
  • 17. Real-time Monitoring of Device Health
    Why monitor:
    ● Continuously understanding app behavior allows us to detect issues in new app versions
    ● Compare user behavior and app performance of different app versions
    What to monitor:
    ● Adoption rates of new app versions
    ● App performance by app version
    ● Performance comparison week over week and day over day
    How do we monitor:
    ● Set up dashboards for each app client
    When to monitor:
    ● Always, but especially when new app versions are released