© Rocana, Inc. All Rights Reserved. | 1
Eric Sammer – CTO and co-founder, @esammer
Data Day Texas 2016
High cardinality time series search
A new level of scale
Context
• We build a system for large scale realtime collection, processing, and
analysis of event-oriented machine data
• On prem or in the cloud, but not SaaS
• Supportability is a big deal for us
• Predictability of performance, including under failures
• Ease of configuration and operation
• Behavior in wacky environments
• All of our decisions are informed by this - YMMV
What I mean by “scale”
• Typical: 10s of TB of new data per day
• Average event size ~200-500 bytes
• 20TB per day
• @200 bytes = 1.2M events / second, ~109.9B events / day, 40.1T events / year
• @500 bytes = 509K events / second, ~43.9B events / day, 16T events / year
• Retaining years online for query
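The arithmetic behind these figures can be checked with a short sketch. It assumes binary terabytes (1 TB = 2^40 bytes) and a 365-day year, which is what appears to reproduce the numbers on this slide; the function name is illustrative only.

```python
SECONDS_PER_DAY = 86_400
TB = 2 ** 40  # assumption: binary terabytes, which matches the slide's figures


def event_rates(tb_per_day: float, event_bytes: int) -> dict:
    """Convert a daily data volume into event counts per second, day, and year."""
    per_day = tb_per_day * TB / event_bytes
    return {
        "per_second": per_day / SECONDS_PER_DAY,
        "per_day": per_day,
        "per_year": per_day * 365,  # assumption: 365-day year
    }


for size in (200, 500):
    r = event_rates(20, size)
    print(f"@{size} bytes: {r['per_second']:,.0f} events/s, "
          f"{r['per_day'] / 1e9:.2f}B events/day, "
          f"{r['per_year'] / 1e12:.2f}T events/year")
```

The slide's values look like these results truncated rather than rounded.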
General purpose search – the good parts
• We originally built against Solr Cloud (but most of this goes for
Elasticsearch too)
• Amazing feature set for general purpose search
• Good support for moderate scale
• Excellent at
• Content search – news sites, document repositories
• Finite size datasets – product catalogs, job postings, things you prune
• Low(er) cardinality datasets that (mostly) fit in memory
Problems with general purpose search systems
• Fixed shard allocation models – always N partitions
• Multi-level and semantic partitioning is painful without building your own
macro query planner
• All shards open all the time; poor resource control for high retention
• APIs are record-at-a-time focused for NRT indexing; poor ingest
performance (aka: please stop making everything REST!)
• Ingest concurrency is wonky
• High write amplification on data we know won’t change
• Other smaller stuff…
“Well actually…”
Plenty of ways to push general purpose systems
(We tried many of them)
• Using multiple collections as partitions, macro query planning
• Running multiple JVMs per node for better utilization
• Pushing historical searches into another system
• Building weirdo caches of things
At some point the cost of hacking outweighed the cost of building
Warning!
• This is not a condemnation of general purpose search systems!
• Unless the sky is falling, use one of those systems
We built a thing: Rocana Search
High cardinality, low latency, parallel search system for time-oriented events
Features of Rocana Search
• Fully parallelized ingest and query, built for large clusters
• Every node is an indexer, query coordinator, and executor
• Optimized for high cardinality time-oriented event data
• Built to keep all data online and queryable without wasting resources for
infrequently used data
• Fully durable, resistant to node failures
• Operationally friendly: online ops, predictable resource usage and
performance
• Uses battle tested open source components (Kafka, Lucene, HDFS, ZK)
Major differences
• Storage and partition model looks more like range-partitioned tables in
databases; new partitions easily added, old ones dropped, support for
multi-field partitioning
• Partitions subdivided into slices for parallel writes
• Query engine aggressively prunes partitions by analyzing predicates
• Ingestion path is Kafka, built for extremely high throughput of small
events
What we know about our data allows us to optimize
Architecture
(A single node)
[Diagram: an RS node containing Metadata, Index Management, Coordinator, Executor, and Indexes, shown alongside HDFS, Kafka, ZK, Data Producers, and a Query Client]
Collections, partitions, and slices
• A search collection is split into partitions by a partition strategy
• Think: “By year, month, day, hour”
• Partitioning invisible to queries (e.g. `time:[x TO y] AND host:z` works normally)
• Partitions are divided into slices to support lock-free parallel writes
• Think: “This hour has 20 slices, each of which is independent for write”
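A partition strategy of this kind can be sketched as a function from an event timestamp to a partition name. This is only an illustration: the function names, the epoch-millisecond input, the UTC assumption, and the `YYYY/MM/DD` naming are assumptions, not Rocana Search's actual API. The slice mapping assumes one slice per Kafka partition.

```python
from datetime import datetime, timezone

# Hypothetical partition-name formats for each granularity.
FORMATS = {
    "year": "%Y",
    "month": "%Y/%m",
    "day": "%Y/%m/%d",
    "hour": "%Y/%m/%d/%H",
}


def partition_for(timestamp_ms: int, granularity: str = "day") -> str:
    """Map an event timestamp (epoch milliseconds, UTC) to a partition name."""
    ts = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    return ts.strftime(FORMATS[granularity])


def slice_for(kafka_partition: int) -> int:
    """Assumption: one slice per Kafka partition, so the ids coincide."""
    return kafka_partition


print(partition_for(1451606400000))          # midnight UTC, 1 Jan 2016
print(partition_for(1451606400000, "hour"))
```

Because the strategy is a pure function of the event, a query on `time` can be mapped to the same partition names without the caller knowing about partitioning.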
Collections, partitions, and slices
Collection “events”
• Partition “2016/01/01”: Slice 0, Slice 1, Slice 2, … Slice N
• Partition “2016/01/02”: Slice 0, Slice 1, Slice 2, … Slice N
From events to partitions to slices
[Diagram: topic “events” with Kafka partitions KP 0 and KP 1 carries Event 1 (2016/01/01), Event 2 (2016/01/01), and Event 3 (2016/01/02). The events (E1, E2, E3) are placed into Slice 0 or Slice 1 of partitions 2016/01/01 and 2016/01/02.]
Assigning slices to nodes
Topic “events”: KP 0, KP 2, KP 1, KP 3
Node 1
• Partition 2016/01/01: S 0, S 2
• Partition 2016/01/02: S 0, S 2
• Partition 2016/01/03: S 0, S 2
• Partition 2016/01/04: S 0, S 2
Node 2
• Partition 2016/01/01: S 1, S 3
• Partition 2016/01/02: S 1, S 3
• Partition 2016/01/03: S 1, S 3
• Partition 2016/01/04: S 1, S 3
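The layout shown here (slices 0 and 2 on one node, slices 1 and 3 on the other) is consistent with a simple round-robin assignment of Kafka partitions to nodes. The talk does not say how Rocana Search actually assigns slices, so the sketch below is an assumption, and the function and node names are hypothetical.

```python
def assign_partitions(kafka_partitions: int, nodes: list) -> dict:
    """Round-robin Kafka partitions (and so slices) across nodes.

    Assumption: slice id == Kafka partition id, and assignment is round-robin.
    """
    assignment = {node: [] for node in nodes}
    for kp in range(kafka_partitions):
        assignment[nodes[kp % len(nodes)]].append(kp)
    return assignment


print(assign_partitions(4, ["node1", "node2"]))
```

Under this scheme each node owns the same slice ids in every time partition, so ownership does not change as new partitions are created.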
Following the write path
• One of the search nodes is the exclusive owner of KP 0 and KP 1
• Consume a batch of events
• For each event, use the partition strategy to determine which RS partition it belongs to
• Kafka messages carry the partition so we know the slice
• Event written to the proper partition/slice
• Eventually the indexes are committed
• If the partition or slice is new, metadata service is informed
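The steps above can be sketched with an in-memory stand-in for a search node. This is a toy model under stated assumptions: the class, its fields, the by-day strategy, and the event shape (`ts` in epoch seconds) are all hypothetical. Real slices would be Lucene indexes on HDFS, and the commit step is omitted.

```python
from collections import defaultdict
from datetime import datetime, timezone


class InMemoryIndexer:
    """Toy stand-in for a search node that owns some Kafka partitions."""

    def __init__(self):
        self.slices = defaultdict(list)   # (partition, slice) -> buffered events
        self.known = set()                # partition/slice pairs seen so far
        self.metadata_notifications = []  # what would be sent to the metadata service

    def partition_of(self, event: dict) -> str:
        # Hypothetical "by day" partition strategy on a UTC epoch-seconds field.
        ts = datetime.fromtimestamp(event["ts"], tz=timezone.utc)
        return ts.strftime("%Y/%m/%d")

    def write_batch(self, kafka_partition: int, batch: list) -> None:
        for event in batch:
            # The Kafka partition the message came from identifies the slice.
            key = (self.partition_of(event), kafka_partition)
            if key not in self.known:
                self.known.add(key)
                self.metadata_notifications.append(key)
            self.slices[key].append(event)


node = InMemoryIndexer()
node.write_batch(0, [{"ts": 1451606400}, {"ts": 1451606410}, {"ts": 1451692800}])
print(node.metadata_notifications)
```

Because each Kafka partition has exactly one owner, no two writers touch the same slice, which is what makes the writes lock-free across slices.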
Query engine basics
• Queries submitted to coordinator via RPC
• Coordinator (smart) parses, plans, schedules and monitors fragments,
merges results, responds to client
• Fragments are submitted to executors for processing
• Executors (dumb) search exactly what they’re told, stream to coordinator
• Fragment is generated for every partition/slice that may contain data
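Fragment generation with partition pruning might look like the following sketch. It assumes day partitions named `YYYY/MM/DD`, a half-open time range, and a known set of existing partitions; none of these details come from the talk, and `plan_fragments` is a hypothetical name.

```python
from datetime import date, timedelta
from itertools import product


def plan_fragments(start: date, end: date, partitions: set, slices: int) -> list:
    """Return one (partition, slice) fragment per slice of each day partition
    that overlaps the half-open range [start, end) and actually exists."""
    days = (start + timedelta(days=i) for i in range((end - start).days))
    wanted = [d.strftime("%Y/%m/%d") for d in days]
    # Prune: skip partitions outside the time predicate or not yet created.
    live = [p for p in wanted if p in partitions]
    return list(product(live, range(slices)))


existing = {"2016/01/01", "2016/01/02", "2016/01/03"}
for fragment in plan_fragments(date(2016, 1, 2), date(2016, 1, 4), existing, 2):
    print(fragment)
```

Only the time predicate is used for pruning here; other predicates such as `host:z` would be evaluated by the executors inside each fragment.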
Some implications
• Search processes are on the same nodes as the HDFS DataNode
• First replica of any event received by search from Kafka is written locally
• Result: Unless nodes fail, all reads are local (HDFS short circuit reads)
• Linux kernel page cache is useful here
• HDFS caching can be used
• Search has an off-heap block cache as well
• In case of failure, any search node can read any index
• HDFS overhead winds up being very little, still get the advantages
Contrived query scenario
• 80 Kafka partitions (80 slices)
• Collection partitioned by day
• 80 nodes, 16 executor threads each
• Query: time:[2015-01-01 TO 2016-01-01] AND service:sshd
• 365 * 80 = 29200 fragments generated for the query (a lot!)
• 29200 / (80 * 16) = 22.8, so ~23 “waves” of fragments
• If each “wave” takes ~0.5 second, the query takes ~11.5 seconds
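The back-of-the-envelope estimate for this scenario can be written out as a small sketch. The function name is illustrative, and the model assumes every fragment takes the same time and that executor threads are the only limit on concurrency.

```python
import math


def estimate_query_seconds(days, slices, nodes, threads_per_node, wave_seconds):
    """Estimate query latency as the number of fragment waves times time per wave."""
    fragments = days * slices                   # one fragment per partition/slice
    concurrency = nodes * threads_per_node      # fragments that can run at once
    waves = math.ceil(fragments / concurrency)  # a partial last wave still costs a wave
    return fragments, waves, waves * wave_seconds


fragments, waves, seconds = estimate_query_seconds(365, 80, 80, 16, 0.5)
print(fragments, waves, seconds)
```

Rounding the partial last wave up gives 23 waves and about 11.5 seconds, in line with the rough figures above.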
More real, but preliminary
• 24 AWS EC2 d2.2xl, instance storage
• Ingesting data at ~3 million events per minute (50K eps)
• 24 Kafka partitions / RS slices
• Index size: 5.9 billion events
• Query: All events, facet by 3 fields
• No tuning (default config): ~10 seconds (with a silly bug)
• 10 concurrent instances of the same query: ~21 seconds total
• 50 concurrent instances: ~41 seconds
• We will do much better shortly (*ahem*, Brett)!
What we’ve really shown
In the context of search, scale means:
• High cardinality: Billions of events per day
• High speed ingest: Hundreds of thousands of events per second
• Not having to age data out of the collection
• Handling large, concurrent queries, while ingesting data
• Fully utilizing modern hardware
These things are very possible
Next steps
• Read replicas
• Smarter partition elimination in complex queries
• Speculative execution of query fragments
• Additional metadata for index fields to improve storage efficiency
• Smarter cache management
• Better visibility into performance and health
• Strong consensus (e.g. Raft, Multi-Paxos) for metadata?
Thank you!
Hopefully I still have time for
questions.
rocana.com
@esammer
esammer@rocana.com
(ask me for stickers)
The (amazing) core search team:
• Brett Hoerner - @bretthoerner
• Michael Peterson - @quux00
• Mark Tozzi - @not_napoleon

Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 

High cardinality time series search: A new level of scale - Data Day Texas 2016

  • 1. © Rocana, Inc. All Rights Reserved. | 1 Eric Sammer – CTO and co-founder, @esammer Data Day Texas 2016 High cardinality time series search A new level of scale
  • 2. Context • We build a system for large scale realtime collection, processing, and analysis of event-oriented machine data • On prem or in the cloud, but not SaaS • Supportability is a big deal for us • Predictability of performance, including under failures • Ease of configuration and operation • Behavior in wacky environments • All of our decisions are informed by this - YMMV
  • 3. What I mean by “scale” • Typical: 10s of TB of new data per day • Average event size ~200-500 bytes • 20TB per day • @200 bytes = 1.2M events / second, ~109.9B events / day, 40.1T events / year • @500 bytes = 509K events / second, ~43.9B events / day, 16T events / year • Retaining years online for query
  • 4. General purpose search – the good parts • We originally built against Solr Cloud (but most of this goes for Elasticsearch too) • Amazing feature set for general purpose search • Good support for moderate scale • Excellent at • Content search – news sites, document repositories • Finite size datasets – product catalogs, job postings, things you prune • Low(er) cardinality datasets that (mostly) fit in memory
  • 5. Problems with general purpose search systems • Fixed shard allocation models – always N partitions • Multi-level and semantic partitioning is painful without building your own macro query planner • All shards open all the time; poor resource control for high retention • APIs are record-at-a-time focused for NRT indexing; poor ingest performance (aka: please stop making everything REST!) • Ingest concurrency is wonky • High write amplification on data we know won’t change • Other smaller stuff…
  • 6. “Well actually…” Plenty of ways to push general purpose systems (we tried many of them) • Using multiple collections as partitions, macro query planning • Running multiple JVMs per node for better utilization • Pushing historical searches into another system • Building weirdo caches of things At some point the cost of hacking outweighed the cost of building
  • 7. Warning! • This is not a condemnation of general purpose search systems! • Unless the sky is falling, use one of those systems
  • 8. We built a thing: Rocana Search High cardinality, low latency, parallel search system for time-oriented events
  • 9. Features of Rocana Search • Fully parallelized ingest and query, built for large clusters • Every node is an indexer, query coordinator, and executor • Optimized for high cardinality time-oriented event data • Built to keep all data online and queryable without wasting resources for infrequently used data • Fully durable, resistant to node failures • Operationally friendly: online ops, predictable resource usage and performance • Uses battle tested open source components (Kafka, Lucene, HDFS, ZK)
  • 10. Major differences • Storage and partition model looks more like range-partitioned tables in databases; new partitions easily added, old ones dropped, support for multi-field partitioning • Partitions subdivided into slices for parallel writes • Query engine aggressively prunes partitions by analyzing predicates • Ingestion path is Kafka, built for extremely high throughput of small events What we know about our data allows us to optimize
  • 11. Architecture (a single node) [Diagram; labels: RS, HDFS, Metadata, Index Management, Coordinator, Executor, Indexes, Query Client, Kafka, Data Producers, ZK]
  • 12. Collections, partitions, and slices • A search collection is split into partitions by a partition strategy • Think: “By year, month, day, hour” • Partitioning invisible to queries (e.g. `time:[x TO y] AND host:z` works normally) • Partitions are divided into slices to support lock-free parallel writes • Think: “This hour has 20 slices, each of which is independent for write”
  • 13. Collections, partitions, and slices [Diagram: collection “events” contains partition “2016/01/01” (Slice 0, Slice 1, Slice 2, Slice N) and partition “2016/01/02” (Slice 0, Slice 1, Slice 2, Slice N)]
  • 14. From events to partitions to slices [Diagram: Event 1 (2016/01/01), Event 2 (2016/01/01), and Event 3 (2016/01/02); topic “events” with Kafka partitions KP 0 and KP 1; partitions 2016/01/01 and 2016/01/02, each with Slice 0 and Slice 1; events labeled E1, E2, E3 placed in the slices]
  • 15. Assigning slices to nodes [Diagram: topic “events” with Kafka partitions KP 0, KP 2, KP 1, KP 3; Node 1 holds slices S 0 and S 2 of partitions 2016/01/01, 2016/01/02, 2016/01/03, and 2016/01/04; Node 2 holds slices S 1 and S 3 of the same four partitions]
  • 16. Following the write path • One of the search nodes is the exclusive owner of KP 0 and KP 1 • Consume a batch of events • Use the partition strategy to figure out to which RS partition it belongs • Kafka messages carry the partition so we know the slice • Event written to the proper partition/slice • Eventually the indexes are committed • If the partition or slice is new, metadata service is informed
  • 17. Query engine basics • Queries submitted to coordinator via RPC • Coordinator (smart) parses, plans, schedules and monitors fragments, merges results, responds to client • Fragments are submitted to executors for processing • Executors (dumb) search exactly what they’re told, stream to coordinator • Fragment is generated for every partition/slice that may contain data
  • 18. Some implications • Search processes are on the same nodes as the HDFS DataNode • First replica of any event received by search from Kafka is written locally • Result: Unless nodes fail, all reads are local (HDFS short circuit reads) • Linux kernel page cache is useful here • HDFS caching can be used • Search has an off-heap block cache as well • In case of failure, any search node can read any index • HDFS overhead winds up being very little, still get the advantages
  • 19. Contrived query scenario • 80 Kafka partitions (80 slices) • Collection partitioned by day • 80 nodes, 16 executor threads each • Query: time:[2015-01-01 TO 2016-01-01] AND service:sshd • 365 * 80 = 29200 fragments generated for the query (a lot!) • 29200 / (80 * 16) = ~22 “waves” of fragments • If each “wave” takes ~0.5 second, the query takes ~11 seconds
  • 20. More real, but preliminary • 24 AWS EC2 d2.2xl, instance storage • Ingesting data at ~3 million events per minute (50K eps) • 24 Kafka partitions / RS slices • Index size: 5.9 billion events • Query: All events, facet by 3 fields • No tuning (default config): ~10 seconds (with a silly bug) • 10 concurrent instances of the same query: ~21 seconds total • 50 concurrent instances: ~41 seconds • We will do much better shortly (*ahem*, Brett)!
  • 21. What we’ve really shown In the context of search, scale means: • High cardinality: Billions of events per day • High speed ingest: Hundreds of thousands of events per second • Not having to age data out of the collection • Handling large, concurrent queries, while ingesting data • Fully utilizing modern hardware These things are very possible
  • 22. Next steps • Read replicas • Smarter partition elimination in complex queries • Speculative execution of query fragments • Additional metadata for index fields to improve storage efficiency • Smarter cache management • Better visibility into performance and health • Strong consensus (e.g. Raft, multi-paxos) for metadata?
  • 23. Thank you! Hopefully I still have time for questions. rocana.com @esammer esammer@rocana.com (ask me for stickers) The (amazing) core search team: • Brett Hoerner - @bretthoerner • Michael Peterson - @quux00 • Mark Tozzi - @not_napoleon
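
The write path on slides 12 to 16 can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Rocana Search code: `Event`, `day_partition`, and `SearchNode` are hypothetical names, a per-day strategy stands in for the configurable partition strategy, the Kafka partition number is used directly as the slice number, and the choice of which event arrives on which Kafka partition is arbitrary.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    timestamp: datetime  # timezone-aware event time
    body: str


def day_partition(event: Event) -> str:
    """Hypothetical partition strategy: one partition per UTC day."""
    return event.timestamp.astimezone(timezone.utc).strftime("%Y/%m/%d")


class SearchNode:
    """Toy search node that exclusively owns a set of Kafka partitions."""

    def __init__(self, owned_kafka_partitions):
        self.owned = set(owned_kafka_partitions)
        # (partition name, slice number) -> events written to that slice
        self.slices = defaultdict(list)
        # New partition/slice pairs that would be reported to the metadata service
        self.metadata_updates = []

    def consume(self, kafka_partition: int, batch) -> None:
        """Route a batch read from one owned Kafka partition to its slices."""
        if kafka_partition not in self.owned:
            raise ValueError(f"node does not own Kafka partition {kafka_partition}")
        for event in batch:
            # The partition strategy picks the partition; the Kafka
            # partition the message arrived on picks the slice.
            key = (day_partition(event), kafka_partition)
            if key not in self.slices:
                self.metadata_updates.append(key)
            self.slices[key].append(event)


node = SearchNode(owned_kafka_partitions={0, 1})
e1 = Event(datetime(2016, 1, 1, 9, 0, tzinfo=timezone.utc), "E1")
e2 = Event(datetime(2016, 1, 1, 17, 30, tzinfo=timezone.utc), "E2")
e3 = Event(datetime(2016, 1, 2, 0, 5, tzinfo=timezone.utc), "E3")
node.consume(0, [e1, e3])
node.consume(1, [e2])
for key in sorted(node.slices):
    print(key, [e.body for e in node.slices[key]])
```

Because each slice is written only by the node that owns the matching Kafka partition, no two writers ever touch the same slice, which is what makes the lock-free parallel writes on slide 12 possible in this model.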

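The fragment arithmetic in the contrived query scenario (slide 19) can be reproduced with a small planner sketch. Everything here is an assumption made for illustration: `plan_fragments` and `estimate_waves` are hypothetical helpers, the upper bound of the time range is treated as exclusive so that the count matches the slide's 365 days, and the only pruning shown is on the time predicate.

```python
import math
from datetime import date, timedelta


def plan_fragments(partition_days, slices_per_partition, start, end):
    """One fragment per (partition, slice) whose day falls in [start, end)."""
    return [
        (day, slice_no)
        for day in partition_days
        if start <= day < end  # partition pruning on the time predicate
        for slice_no in range(slices_per_partition)
    ]


def estimate_waves(fragment_count, nodes, threads_per_node):
    """Rounds of execution needed if every executor thread runs one fragment at a time."""
    return math.ceil(fragment_count / (nodes * threads_per_node))


# Three years of daily partitions (2014-01-01 through 2016-12-31) exist,
# but the query only covers 2015.
all_days = [date(2014, 1, 1) + timedelta(days=i) for i in range(1096)]
fragments = plan_fragments(all_days, 80, date(2015, 1, 1), date(2016, 1, 1))
waves = estimate_waves(len(fragments), nodes=80, threads_per_node=16)
seconds = waves * 0.5  # assumed ~0.5 second per wave, as on the slide

print(len(fragments), waves, seconds)
```

The slide's “~22 waves” is 29200 / 1280 = 22.8 before rounding; rounding up to whole waves gives 23 and roughly 11.5 seconds, in line with the slide's ~11 second estimate.
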
Editor's notes

  1. YMMV: not necessarily true for you. Enterprise software – shipping stuff to people. Fine grained events – logs, user behavior, etc. For everything – solving the problem of “enterprise wide” ops, so it’s everything from everywhere from everyone for all time (until they run out of money for nodes). This isn’t a condemnation of general purpose search engines as much as what we had to do for our domain.
  2. YMMV: not necessarily true for you. Enterprise software – shipping stuff to people. Fine grained events – logs, user behavior, etc. For everything – solving the problem of “enterprise wide” ops, so it’s everything from everywhere from everyone for all time (until they run out of money for nodes). This isn’t a condemnation of general purpose search engines as much as what we had to do for our domain.
  3. It does most of what you want for most cases most of the time. They’ve solved some really hard problems. Content search (e.g. news sites, document repos), finite size datasets (e.g. product catalogs), low cardinality datasets that fit in memory. Not us.
  4. Flexible systems with a bevy of full text search features. Moderate and fixed document count: big by historical standards, small by ours. Design reflects these assumptions. Fixed sharding at index creation. Partition events into N buckets. For long retention time-based systems, this isn’t how we think. Let’s keep it until it’s painful. Then we add boxes. When that’s painful, we prune. Not sure what that looks like. Repartitioning is not feasible at scale. Partition count should be dynamic. Multi-level partitioning is painful without building your own query layer; by range(time), then hash(region) or identity(region). All shards are open all the time. Implicit assumption that either you 1. have queries that touch the data evenly or 2. have infinite resources. Recent time events are hotter than distant, but distant still needs to be available for query. Poor cache control. Recent data should be in cache. Historical scans shouldn’t push recent data out of cache. APIs are extremely “single record” focused. REST with record-at-a-time is absolutely abysmal for high throughput systems. Batch indexing is not useful. No in between. Read replicas are expensive and homogeneous. Ideally we have 3 read replicas for the last N days and 1 for others. Replicas (for performance) should take up space in memory, but not on disk. Ingest concurrency tends to be wonky; whole lotta locking going on. Anecdotally, it’s difficult to get Solr Cloud to light up all cores on a box without running multiple JVMs; something is weird. We can get the benefits of NRT indexing speed with fewer writer checkpoints because our ingest pipeline acts as a reliable log. We recover from Kafka based on the last time the writer checkpointed so we can checkpoint very infrequently if we want. We know our data doesn’t change, or changes very little, after a certain point, so we can optimize and freeze indexes reducing write amplification from compactions.
  5. There are plenty of ways we could have pushed the general purpose systems, and we did. We layered our own partitioning and shard selection on top of Solr Cloud with time-based collection round robining. That got us pretty far, but not far enough. We were starting to do a lot of query rewriting and scheduling. Run multiple JVMs per box. Gross. Unsupportable. Push historical queries out of search to a system such as Spark. Build weird caches of frequent data sets. At some point, the cost of hacking outweighed the cost of building.
  6. Flexible systems with a bevy of full text search features. Moderate and fixed document count: big by historical standards, small by ours. Design reflects these assumptions. Fixed sharding at index creation. Partition events into N buckets. For long retention time-based systems, this isn’t how we think. Let’s keep it until it’s painful. Then we add boxes. When that’s painful, we prune. Not sure what that looks like. Repartitioning is not feasible at scale. Partition count should be dynamic. Multi-level partitioning is painful without building your own query layer; by range(time), then hash(region) or identity(region). All shards are open all the time. Implicit assumption that either you 1. have queries that touch the data evenly or 2. have infinite resources. Recent time events are hotter than distant, but distant still needs to be available for query. Poor cache control. Recent data should be in cache. Historical scans shouldn’t push recent data out of cache. APIs are extremely “single record” focused. REST with record-at-a-time is absolutely abysmal for high throughput systems. Batch indexing is not useful. No in between. Read replicas are expensive and homogeneous. Ideally we have 3 read replicas for the last N days and 1 for others. Replicas (for performance) should take up space in memory, but not on disk. Ingest concurrency tends to be wonky; whole lotta locking going on. Anecdotally, it’s difficult to get Solr Cloud to light up all cores on a box without running multiple JVMs; something is weird. We can get the benefits of NRT indexing speed with fewer writer checkpoints because our ingest pipeline acts as a reliable log. We recover from Kafka based on the last time the writer checkpointed so we can checkpoint very infrequently if we want. We know our data doesn’t change, or changes very little, after a certain point, so we can optimize and freeze indexes reducing write amplification from compactions.
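
The notes above say the writer can checkpoint infrequently because the ingest pipeline is a reliable log that can be replayed from the last checkpoint. A minimal sketch of that idea follows, with loud assumptions: a Python list stands in for one Kafka partition, `CheckpointingWriter` and its methods are hypothetical names, and no real Kafka or Lucene API is used or implied.

```python
class CheckpointingWriter:
    """Toy index writer that commits rarely and recovers by replaying a log."""

    def __init__(self, log, checkpoint_every):
        self.log = log                    # stand-in for one Kafka partition
        self.checkpoint_every = checkpoint_every
        self.committed = []               # durable index contents
        self.committed_offset = 0         # next log offset to read after recovery
        self.buffer = []                  # uncommitted events, lost on a crash
        self.position = 0                 # next log offset to read

    def poll(self, max_events):
        """Read up to max_events from the log and buffer them in memory."""
        batch = self.log[self.position:self.position + max_events]
        self.buffer.extend(batch)
        self.position += len(batch)
        if len(self.buffer) >= self.checkpoint_every:
            self.checkpoint()

    def checkpoint(self):
        """Make buffered events durable and remember how far the log was read."""
        self.committed.extend(self.buffer)
        self.buffer.clear()
        self.committed_offset = self.position

    def crash_and_recover(self):
        """Drop uncommitted state and rewind to the last checkpointed offset."""
        self.buffer.clear()
        self.position = self.committed_offset


log = list(range(10))
writer = CheckpointingWriter(log, checkpoint_every=4)
writer.poll(3)               # 3 events buffered, no checkpoint yet
writer.poll(3)               # 6 buffered, checkpoint at offset 6
writer.poll(3)               # 3 more buffered, not yet durable
writer.crash_and_recover()   # buffered events lost, rewind to offset 6
writer.poll(10)              # replay offsets 6-9 from the log, then checkpoint
print(writer.committed)
```

In this model a crash costs only the work since the last checkpoint, never data, so the checkpoint interval becomes a trade-off between commit overhead and replay time after a failure.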