Untangling Cluster Management with Helix

Helix team @ LinkedIn
Kishore Gopalakrishna
http://www.linkedin.com/in/kgopalak
@kishoreg1980
     Recruiting Solutions
Outline

 • What is Helix
 • Use case 1: distributed data store
 • Architecture
 • Use case 2: consumer group
 • Helix at LinkedIn
 • Q&A

What is Helix

  A cluster management framework for distributed systems,
  based on a declarative state model

Distributed system examples

Motivation

 • A system starts out simple…
 • …but gets complex in the real world
 • …as you address real requirements

  [Diagram: an Application reaches the System (Replica 1 …, Replica 2 …) through a
   client library that performs call routing; the added requirements shown are
   Scale, Failover, and Bootstrapping.]

Motivation

 • These are cluster management problems:
     – Scale
     – Failover
     – Bootstrapping
 • Helix solves them once…
 • …so you can focus on your system

Use-Case: Distributed Data Store

 • Distributed

  [Figure: P.1; Node 1, Node 2, Node 3]

Use-Case: Distributed Data Store

 • Distributed
 • Partitioned

    Node 1   Node 2   Node 3
    P.1      P.5      P.9
    P.2      P.6      P.10
    P.3      P.7      P.11
    P.4      P.8      P.12

Use-Case: Distributed Data Store

 • Distributed
 • Partitioned
 • Replicated

    Node 1   Node 2   Node 3
    P.1      P.5      P.9
    P.2      P.6      P.10
    P.3      P.7      P.11
    P.4      P.8      P.12
    P.5      P.1      P.3
    P.6      P.2      P.4
    P.9      P.11     P.7
    P.10     P.12     P.8

Partition Layout

 • Highly available
 • Master accepts writes
 • Balanced distribution

  [Figure: the replicated layout of P.1 to P.12 on Node 1, Node 2, Node 3,
   with each replica marked Master or Slave]

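A balanced layout like this one can be approximated with a simple round-robin rule. The sketch below is hypothetical (RoundRobinPlacement is a made-up name, it is not Helix's rebalancer, and it need not reproduce the exact layout in the figure): replica r of partition p goes to node (p + r) mod N, replica 0 is the master, and requiring replicas <= nodes keeps every replica of a partition on a different node.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical round-robin placement; not Helix's own rebalancer.
final class RoundRobinPlacement {

    // Returns, for each partition p (0-based), its ordered node list; index 0 is the MASTER.
    static List<List<Integer>> assign(int numPartitions, int numNodes, int replicas) {
        if (replicas > numNodes) {
            // Replicas of one partition must land on different nodes (no single point of failure).
            throw new IllegalArgumentException("replicas must not exceed nodes");
        }
        List<List<Integer>> assignment = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            List<Integer> nodes = new ArrayList<>();
            for (int r = 0; r < replicas; r++) {
                nodes.add((p + r) % numNodes); // distinct because r < numNodes
            }
            assignment.add(nodes);
        }
        return assignment;
    }

    // Number of partitions each node masters (replica 0 of the partition).
    static int[] masterCounts(List<List<Integer>> assignment, int numNodes) {
        int[] counts = new int[numNodes];
        for (List<Integer> nodes : assignment) {
            counts[nodes.get(0)]++;
        }
        return counts;
    }
}
```

With 12 partitions, 3 nodes and 2 replicas, each node masters 4 partitions and hosts 8 replicas in total, the same per-node replica count as in the figure.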
Failover

  [Figure: the same Master/Slave layout on Node 1, Node 2, Node 3]
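A minimal failover step can be sketched under one assumption: each partition's replicas are kept in an ordered list with the master first. Removing the failed node from every list then promotes the next surviving replica (a slave) to master. The Failover class, its methods and the list encoding are hypothetical, and re-creating the lost replicas elsewhere is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical failover step; replica lists are ordered with the MASTER first.
final class Failover {

    static Map<String, List<String>> handleNodeFailure(Map<String, List<String>> assignment,
                                                       String failedNode) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : assignment.entrySet()) {
            List<String> survivors = new ArrayList<>(e.getValue());
            survivors.remove(failedNode); // remove(Object): drops the failed replica if present
            result.put(e.getKey(), survivors);
        }
        return result;
    }

    // The master is the first surviving replica, or null if no replica survives.
    static String masterOf(Map<String, List<String>> assignment, String partition) {
        List<String> nodes = assignment.get(partition);
        return (nodes == null || nodes.isEmpty()) ? null : nodes.get(0);
    }
}
```

If Node 1 fails while mastering P.1 (slave on Node 2), Node 2 becomes the master of P.1; partitions mastered elsewhere keep their master and only lose a slave.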
Add Capacity

  [Figure: Node 4 joins the Master/Slave layout on Node 1, Node 2, Node 3 and hosts
   replicas of P.1, P.5, P.9, P.10, P.12, and P.8]
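Adding a node while minimizing partition movement can be sketched as a greedy step: the new node repeatedly takes one replica from the most loaded node, skipping partitions it already hosts, until it holds its fair share (total replicas divided by nodes, rounded down). This is a hypothetical illustration (AddCapacity is a made-up name, not Helix's algorithm) that ignores master/slave roles and the actual data copy.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical "add capacity" step; only replicas moved to the new node change place.
final class AddCapacity {

    // Mutates nodeToPartitions; returns how many replicas were moved.
    static int addNode(Map<String, Set<String>> nodeToPartitions, String newNode) {
        nodeToPartitions.putIfAbsent(newNode, new LinkedHashSet<>());
        int total = 0;
        for (Set<String> parts : nodeToPartitions.values()) total += parts.size();
        int target = total / nodeToPartitions.size(); // fair share per node, rounded down

        int moved = 0;
        Set<String> mine = nodeToPartitions.get(newNode);
        while (mine.size() < target) {
            // Donor = most loaded node that still has a partition the new node lacks.
            String donor = null;
            String candidate = null;
            for (Map.Entry<String, Set<String>> e : nodeToPartitions.entrySet()) {
                if (e.getKey().equals(newNode)) continue;
                if (donor != null && e.getValue().size() <= nodeToPartitions.get(donor).size()) continue;
                for (String p : e.getValue()) {
                    if (!mine.contains(p)) { donor = e.getKey(); candidate = p; break; }
                }
            }
            if (donor == null) break; // nothing movable without co-locating two replicas
            nodeToPartitions.get(donor).remove(candidate);
            mine.add(candidate);
            moved++;
        }
        return moved;
    }
}
```

Starting from the 3-node replicated layout (8 replicas per node), adding Node 4 this way moves 6 replicas and leaves every node with 6; all other replicas stay where they were.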
Use-case requirements

  • Partition constraints
     • 1 master per partition
     • Balance partitions across the cluster
     • No single point of failure: replicas on different nodes
  • Handle failures: transfer mastership
  • Elasticity
     • Distribute workload across added nodes
     • Minimize partition movement
  • Meet SLAs
     • Throttle concurrent data movement

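The last requirement, throttling concurrent data movement, can be illustrated with a per-round limit. In this hypothetical sketch the controller starts at most maxPerNode pending moves per node and at most maxTotal moves cluster-wide in one round, and defers the rest to a later round; the limits and the Throttle/Move names are illustrative, not Helix configuration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical throttle for replica moves; not a Helix API.
final class Throttle {

    static final class Move {
        final String node;
        final String partition;
        Move(String node, String partition) { this.node = node; this.partition = partition; }
        @Override public String toString() { return partition + "@" + node; }
    }

    // Picks the moves to start in this round, in pending order.
    static List<Move> selectRound(List<Move> pending, int maxPerNode, int maxTotal) {
        List<Move> selected = new ArrayList<>();
        Map<String, Integer> perNode = new HashMap<>();
        for (Move m : pending) {
            if (selected.size() >= maxTotal) break;   // cluster-wide cap reached
            int used = perNode.getOrDefault(m.node, 0);
            if (used >= maxPerNode) continue;         // this node is busy; defer the move
            perNode.put(m.node, used + 1);
            selected.add(m);
        }
        return selected;
    }
}
```

Bounding the moves per node keeps the data copy on any single machine from competing too hard with live traffic, which is what makes an SLA easier to hold.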
Generalizing cluster management

  [Diagram: STATE MACHINE, CONSTRAINTS, OBJECTIVE]

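One way to read "state machine plus constraints" as code is a state model declared as data: the allowed transitions plus a per-state upper bound for the replicas of one partition. The sketch is hypothetical (DeclarativeStateModel is not Helix's class) and mirrors the MasterSlave example from these slides; an objective such as balanced placement would be a separate scoring step and is left out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical declarative state model: transitions + per-partition state bounds.
final class DeclarativeStateModel {
    private final Map<String, Set<String>> transitions = new HashMap<>();
    private final Map<String, Integer> maxPerPartition = new HashMap<>();

    DeclarativeStateModel transition(String from, String to) {
        transitions.computeIfAbsent(from, k -> new HashSet<>()).add(to);
        return this;
    }

    DeclarativeStateModel bound(String state, int max) {
        maxPerPartition.put(state, max);
        return this;
    }

    boolean isAllowed(String from, String to) {
        return transitions.getOrDefault(from, Set.of()).contains(to);
    }

    // Constraint check for one partition; replicaStates maps node -> state.
    boolean satisfiesBounds(Map<String, String> replicaStates) {
        Map<String, Integer> counts = new HashMap<>();
        for (String s : replicaStates.values()) counts.merge(s, 1, Integer::sum);
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            Integer max = maxPerPartition.get(e.getKey());
            if (max != null && e.getValue() > max) return false;
        }
        return true;
    }

    static DeclarativeStateModel masterSlave() {
        return new DeclarativeStateModel()
            .transition("OFFLINE", "SLAVE").transition("SLAVE", "MASTER")
            .transition("MASTER", "SLAVE").transition("SLAVE", "OFFLINE")
            .bound("MASTER", 1); // "1 master per partition"
    }
}
```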
Helix Based System Roles

  [Diagram: the Controller works from the IDEAL STATE and the CURRENT STATE, sends
   COMMANDs to the PARTICIPANT nodes (Node 1, Node 2, Node 3, hosting replicas of
   P.1 to P.12) and receives their RESPONSEs; the SPECTATOR holds the partition
   routing logic.]

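The controller's core job in this picture is to compare IDEAL STATE with CURRENT STATE and issue commands where they differ. The sketch below is a hypothetical illustration of that comparison: StateDiff is a made-up name, replicas are keyed by "partition/node" strings for brevity, replicas missing from the current state count as OFFLINE, and each command names only the target state (splitting it into individual transitions is a separate step).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical ideal-vs-current diff; keys are "partition/node", values are states.
final class StateDiff {

    static List<String> commands(Map<String, String> ideal, Map<String, String> current) {
        List<String> out = new ArrayList<>();
        // TreeMap copies give a deterministic command order.
        for (Map.Entry<String, String> e : new TreeMap<>(ideal).entrySet()) {
            String have = current.getOrDefault(e.getKey(), "OFFLINE");
            if (!have.equals(e.getValue())) {
                out.add(e.getKey() + ": " + have + " -> " + e.getValue());
            }
        }
        // Replicas that exist now but are absent from the ideal state should go OFFLINE.
        for (Map.Entry<String, String> e : new TreeMap<>(current).entrySet()) {
            if (!ideal.containsKey(e.getKey()) && !e.getValue().equals("OFFLINE")) {
                out.add(e.getKey() + ": " + e.getValue() + " -> OFFLINE");
            }
        }
        return out;
    }
}
```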
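Controller Execution Flow

  [Diagram: the REBALANCER takes the state model (OFFLINE, SLAVE, MASTER) and the
   partition assignment (N1: P1, P2; N2: P2, P3; N3: P3, P1) and produces transition
   messages such as P1:OS (OFFLINE to SLAVE) and P1:SM (SLAVE to MASTER), delivered
   to nodes N1, N2, N3 through a MESSAGE QUEUE; ZooKeeper and the SPECTATORS also
   appear in the figure.]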
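The P1:OS and P1:SM messages show a replica being moved one allowed transition at a time. A hypothetical way to plan those steps is a breadth-first search over the state model's transition graph; TransitionPlanner and the "FROM->TO" string encoding are made up for this sketch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical planner: shortest chain of allowed transitions between two states.
final class TransitionPlanner {

    // model maps a state to the states it may move to; returns null if unreachable.
    static List<String> path(Map<String, List<String>> model, String from, String to) {
        Map<String, String> parent = new HashMap<>(); // state -> state it was reached from
        Deque<String> queue = new ArrayDeque<>();
        queue.add(from);
        parent.put(from, null);
        while (!queue.isEmpty()) {
            String s = queue.poll();
            if (s.equals(to)) break;
            for (String next : model.getOrDefault(s, List.of())) {
                if (!parent.containsKey(next)) {
                    parent.put(next, s);
                    queue.add(next);
                }
            }
        }
        if (!parent.containsKey(to)) return null;
        List<String> steps = new ArrayList<>();
        for (String s = to; parent.get(s) != null; s = parent.get(s)) {
            steps.add(parent.get(s) + "->" + s);
        }
        Collections.reverse(steps); // walked back from the target, so reverse
        return steps;
    }
}
```

For the MasterSlave model, OFFLINE to MASTER comes out as OFFLINE->SLAVE followed by SLAVE->MASTER, the same two steps as P1:OS and P1:SM.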
Controller fault tolerance

  [Diagram: Controller 1, Controller 2, and Controller 3, all connected to ZooKeeper]
   • Controller 1: LEADER
   • Controller 2: STANDBY
   • Controller 3: STANDBY

Controller fault tolerance

  [Diagram: Controller 1, Controller 2, and Controller 3, all connected to ZooKeeper]
   • Controller 1: OFFLINE
   • Controller 2: LEADER
   • Controller 3: STANDBY

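The two slides show one LEADER controller with STANDBY controllers taking over through ZooKeeper. A common ZooKeeper pattern for this is for every controller to try to create the same ephemeral node: whoever succeeds leads, and when the leader's session ends the node disappears and the standbys try again. The sketch below simulates that pattern in memory with a ConcurrentHashMap, driven from a single thread; ControllerElection is hypothetical, no real ZooKeeper client is involved, and it is an assumption about the mechanism rather than Helix's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory stand-in for ephemeral-node leader election (single-threaded use).
final class ControllerElection {
    private final ConcurrentMap<String, String> store = new ConcurrentHashMap<>();
    private final List<String> live = new ArrayList<>();

    void join(String controller) {
        live.add(controller);
        tryBecomeLeader(controller);
    }

    private void tryBecomeLeader(String controller) {
        store.putIfAbsent("leader", controller); // like creating the znode: only the first succeeds
    }

    void fail(String controller) {
        live.remove(controller);
        // The "ephemeral" entry disappears only if the failed controller owned it.
        if (store.remove("leader", controller)) {
            for (String c : live) tryBecomeLeader(c); // standbys race; first one wins
        }
    }

    String stateOf(String controller) {
        if (!live.contains(controller)) return "OFFLINE";
        return controller.equals(store.get("leader")) ? "LEADER" : "STANDBY";
    }
}
```

Joining Controller 1, 2 and 3 and then failing Controller 1 reproduces the two slides: LEADER/STANDBY/STANDBY becomes OFFLINE/LEADER/STANDBY.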
Participant Plug-in code

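The participant side can be sketched as follows, as a hypothetical stand-in rather than the code from the slide: the application registers one callback per transition (for example OFFLINE to SLAVE, SLAVE to MASTER), and the participant runs the matching callback when a command arrives, then records the new current state. PartitionStateMachine and its methods are invented for illustration; Helix's own participant API is also organized around per-transition callbacks, but its class and method names differ and should be taken from the Helix documentation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical participant plug-in: one application callback per state transition.
final class PartitionStateMachine {
    private final Map<String, Consumer<String>> handlers = new HashMap<>();
    private final Map<String, String> currentState = new HashMap<>();

    PartitionStateMachine on(String from, String to, Consumer<String> handler) {
        handlers.put(from + "->" + to, handler);
        return this;
    }

    // Called for each COMMAND from the controller; returns the new state as the RESPONSE.
    String handle(String partition, String from, String to) {
        String actual = currentState.getOrDefault(partition, "OFFLINE");
        if (!actual.equals(from)) {
            throw new IllegalStateException(partition + " is " + actual + ", not " + from);
        }
        Consumer<String> h = handlers.get(from + "->" + to);
        if (h == null) throw new IllegalArgumentException("no handler for " + from + "->" + to);
        h.accept(partition);             // application-specific work, e.g. bootstrapping data
        currentState.put(partition, to); // becomes part of the CURRENT STATE
        return to;
    }
}
```

The participant only needs local knowledge here: it checks that the command matches its own state and runs the callback; deciding which transition to send stays with the controller.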
Spectator Plug-in code

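On the spectator side, the partition routing logic boils down to a lookup in a snapshot of which node holds which partition in which state. The sketch below is hypothetical: RoutingTable, the partition -> (node -> state) map shape, and the hash-mod-N partitioning function are assumptions for illustration, not the code from the slide or a Helix API.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical spectator routing table over a partition -> (node -> state) snapshot.
final class RoutingTable {
    private final Map<String, Map<String, String>> view;

    RoutingTable(Map<String, Map<String, String>> view) { this.view = view; }

    // Writes go to the MASTER replica, if the partition currently has one.
    Optional<String> masterFor(String partition) {
        return view.getOrDefault(partition, Map.of()).entrySet().stream()
                .filter(e -> e.getValue().equals("MASTER"))
                .map(Map.Entry::getKey)
                .findFirst();
    }

    // Assumed partitioning function for illustration: key hash mod partition count, named P.1..P.n.
    static String partitionOf(String key, int numPartitions) {
        return "P." + (Math.floorMod(key.hashCode(), numPartitions) + 1);
    }
}
```

A client would call partitionOf for a key and then masterFor on the result; refreshing the snapshot when the cluster changes is what keeps routing correct after a failover.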
Benefits

 • Cluster operations “just work”
   – Bootstrapping
   – Failover
   – Add nodes
 • Global vs local
   – Helix Controller
       • Global knowledge
       • Makes cluster decisions
   – Participant
       • Local knowledge
       • Follows orders

Consumer group

Consumer group: Scaling

Consumer group: Fault tolerance

Consumer group: state model

 • States: ONLINE and OFFLINE
 • ONLINE has MAX=1: at most one consumer is ONLINE for a given queue partition

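The speaker notes spell out the consumer-group requirement: queues divided equally among consumers, with partition affinity. A hypothetical rebalancer that respects the ONLINE MAX=1 constraint could work in two passes: first keep each queue on its previous consumer while that consumer is alive and under the floor share, then hand the remaining queues to the least-loaded consumer. ConsumerGroupRebalancer is a made-up name, not Helix's built-in rebalancer, and it can move slightly more queues than strictly necessary when the division is uneven.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical consumer-group rebalancer; each queue is ONLINE on exactly one consumer.
final class ConsumerGroupRebalancer {

    // previous and the result both map queue -> consumer where the queue is ONLINE.
    static Map<String, String> rebalance(List<String> queues, List<String> consumers,
                                         Map<String, String> previous) {
        Map<String, String> result = new LinkedHashMap<>();
        if (consumers.isEmpty()) return result;       // no consumers: every queue stays OFFLINE

        Map<String, Integer> load = new LinkedHashMap<>();
        for (String c : consumers) load.put(c, 0);
        int base = queues.size() / consumers.size();  // floor share per consumer

        // Pass 1 (affinity): keep a queue on its previous consumer if that consumer is
        // still alive and holds fewer than `base` queues so far.
        List<String> orphans = new ArrayList<>();
        for (String q : queues) {
            String owner = previous.get(q);
            if (owner != null && load.containsKey(owner) && load.get(owner) < base) {
                result.put(q, owner);
                load.merge(owner, 1, Integer::sum);
            } else {
                orphans.add(q);
            }
        }

        // Pass 2: remaining queues go to the least-loaded consumer, so final loads
        // differ by at most one.
        for (String q : orphans) {
            String target = consumers.get(0);
            for (String c : consumers) {
                if (load.get(c) < load.get(target)) target = c;
            }
            result.put(q, target);
            load.merge(target, 1, Integer::sum);
        }
        return result;
    }
}
```

With the notes' 6 queues and 3 consumers, each consumer gets 2 queues; if one consumer then fails, only its 2 queues move and the other 4 keep their consumer (scaling and fault tolerance from the previous slides).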
Helix usage at LinkedIn

 • Espresso
   – a timeline-consistent, distributed data store
 • Databus
   – a change data capture service
 • Search as a Service
   – a multi-tenant service for multiple search applications
 • More planned

Summary

 • Building distributed data systems is hard
   – Abstraction and modularity are key
 • Helix: a generic framework for cluster management
 • Simple programming model: declarative state machine

Helix: Future Roadmap


• Features
   • Span multiple data centers
   • Load balancing


• Announcement
   • Open source: https://github.com/linkedin/helix
   • Apache incubation
   • New contributors
Questions?


Apache Helix presentation at SOCC 2012


Speaker notes

  1. Partitioned queue consumption: say there are 6 queues and some consumers that consume from these queues. The requirement is simple: the queues must be divided equally among the consumers. On top of that, we need partition affinity while consuming, instead of randomly picking up from any queue.