SlideShare une entreprise Scribd logo
1  sur  36
Télécharger pour lire hors ligne
Data Infrastructure at LinkedIn
Kapil Surlaker

http://www.linkedin.com/in/kapilsurlaker
@kapilsurlaker




                                           1
Outline


 LinkedIn Products
 Data Ecosystem
 LinkedIn Data Infrastructure Solutions
 Next Play




                                           2
LinkedIn By The Numbers

   150M + users*
   ~ 4.2B People Searches in 2011**
   >2M companies with LinkedIn Company Pages**
   16 languages
   75% of Fortune 100 Companies use LinkedIn to hire***




                                                  * As of February 9th 2012
                                              ** As of December 31st 2011
                                            *** As of September 30th 2011




                                                                         3
Broad Range of Products & Services




                                     4
User Profiles
                 Large dataset
                Medium writes
                Very high reads
                Freshness <1s




                             5
Communications
                 Large dataset
                   High writes
                   High reads
                 Freshness <1s




                            6
People You May Know
                        Large dataset
                      Compute intensive
                         High reads
                       Freshness ~hrs




                                   7
LinkedIn Today    Moving dataset
                    High writes
                    High reads
                 Freshness ~mins




                                   8
Outline


 LinkedIn Products
 Data Ecosystem
 LinkedIn Data Infrastructure Solutions
 Next Play




                                           9
Three Paradigms : Simplifying the Data Continuum




• Member Profiles         • Linkedin Today            • People You May Know

• Company Profiles        • Profile Standardization   • Connection Strength
• Connections             • News                      • News
• Communications          • Recommendations           • Recommendations
                          • Search                    • Next best idea
                          • Communications


 Online                   Nearline                     Offline
Activity that should     Activity that should         Activity that can be
be reflected immediately be reflected soon            reflected later


                                                                              10
LinkedIn Product Architecture




                                11
LinkedIn Product Architecture




                                12
LinkedIn Product Architecture




                                13
LinkedIn Data Infrastructure Solutions
Databus : Timeline-Consistent Change Data Capture




                                                    14
Databus at LinkedIn
                                                           Client
                   Relay                                    Consumer 1




                                              Client Lib
      Capture                      On-line




                                              Databus
 DB   Changes                      Changes
                  Event Win                                 Consumer n
                 On-line
                Changes

                 Bootstrap                                 Client
                                                            Consumer 1




                                              Client Lib
                                              Databus
                               Consistent
                              Snapshot at U
                      DB                                    Consumer n




                                                                         15
Databus at LinkedIn
                                                                  Client
                         Relay                                     Consumer 1




                                                     Client Lib
        Capture                           On-line




                                                     Databus
  DB    Changes                           Changes
                        Event Win                                  Consumer n
                      On-line
                     Changes

                       Bootstrap                                  Client
                                                                   Consumer 1




                                                     Client Lib
                                                     Databus
                                      Consistent
                                     Snapshot at U
                           DB                                      Consumer n




 Transport independent of data       Tens of relays
  source: Oracle, MySQL, …            Hundreds of sources
 Transactional semantics             Low latency - milliseconds
 In order, at least once delivery

                                                                                16
LinkedIn Product Architecture




                                17
LinkedIn Product Architecture




                                18
LinkedIn Data Infrastructure Solutions

Voldemort: Highly-Available Distributed KV Store




                                                   19
Voldemort: Architecture




 •   Pluggable components    • 10 clusters, 100+ nodes
 •   Tunable consistency /   • Largest cluster – 10K+ qps
     availability            • Avg latency: 3ms
 •   Key/value model,        • Hundreds of Stores
     server side “views”     • Largest store – 2.8TB+
LinkedIn Product Architecture




                                21
LinkedIn Data Infrastructure Solutions
Kafka: High-Volume Low-Latency Messaging System




                                                  22
LinkedIn Product Architecture




                                23
Kafka: Architecture
                          Broker Tier
WebTier                                                                  Consumers

     Push         Sequential write            sendfile     Pull
                                                                                    Iterator 1




                                                                       Client Lib
     Event                                                 Events
                             Topic 1




                                                                        Kafka
     s
     100 MB/sec                                           200 MB/sec
                             Topic 2
                                                                                    Iterator n
                             Topic N

                                                                                    Topic  Offset


                                     Topic, Partition                    Offset
                                     Ownership
                                                         Zookeeper       Management




                                                                                                 24
Kafka: Architecture
                          Broker Tier
WebTier                                                                     Consumers

        Push      Sequential write            sendfile        Pull
                                                                                       Iterator 1




                                                                          Client Lib
        Event                                                 Events
                             Topic 1




                                                                           Kafka
        s
     100 MB/sec                                              200 MB/sec
                             Topic 2
                                                                                       Iterator n
                             Topic N

                                                                                       Topic  Offset


                                     Topic, Partition                       Offset
                                     Ownership
                                                            Zookeeper       Management


        At least once delivery                            Billions of Events, TBs per day
        Very high throughput                              50K+ per sec at peak
        Low latency                                       Inter and Intra-cluster replication
        Durability                                        End-to-end latency: few seconds
                                                                                                    25
LinkedIn Product Architecture




                                26
LinkedIn Data Infrastructure Solutions
Espresso: Indexed Timeline-Consistent Distributed
           Data Store




                                                    27
Application View


                   Hierarchical data model



                   Rich functionality on resources
                        Conditional updates
                        Partial updates
                        Atomic counters


                   Rich functionality within
                   resource groups
                        Transactions
                        Secondary index
                        Text search


                                               28
Partitioning




               29
Espresso Partition Layout: Master, Slave
3 Storage Engine nodes, 2 way replication

   Database
                       P.1    P.2     P.3    P.5     P.6     P.7
 Partition: P.1
 Node: 1               P.4    P.5     P.6    P.8     P.1     P.2
 …
 Partition: P.12
 Node: 3
                       P.9    P.1            P.11    P.1
                              0                      2
                             Node 1                 Node 2
   Cluster

 Node: 1
 M: P.1 – Active       P.9    P.1     P.11
     …                        0
 S: P.5 – Active
     …                 P.1    P.3     P.4
                       2
                       P.7    P.8                            Master
   Cluster                                                   Slave
   Manager                   Node 3
Espresso: System Components




                              31
Generic Cluster Manager: Helix

• Generic Distributed State Model
• Centralized Config Management
• Automatic Load Balancing
• Fault tolerance
• Health monitoring
• Cluster expansion and
  rebalancing
• Espresso, Databus and Search
• Open Source Apr 2012
• https://github.com/linkedin/helix




                                      32
Espresso@Linkedin

 Launched first application Oct 2011
 Open source 2012
 Future
   – Multi-Datacenter support
   – Global secondary indexes
   – Time-partitioned data




                                        33
LinkedIn Product Architecture




                                34
Acknowledgments

 Siddharth Anand, Aditya Auradkar, Chavdar Botev, Vinoth Chandar,
 Shirshanka Das, Dave DeMaagd, Alex Feinberg, John Fung, Phanindra
 Ganti, Mihir Gandhi, Lei Gao, Bhaskar Ghosh, Kishore Gopalakrishna,
 Brendan Harris, Rajappa Iyer, Swaroop Jagadish, Joel Koshy, Kevin Krawez,
 Jay Kreps, Shi Lu, Sunil Nagaraj, Neha Narkhede, Sasha Pachev, Igor
 Perisic, Lin Qiao, Tom Quiggle, Jun Rao, Bob Schulman, Abraham
 Sebastian, Oliver Seeliger, Adam Silberstein, Boris Shkolnik, Chinmay
 Soman, Subbu Subramaniam, Roshan Sumbaly, Kapil Surlaker, Sajid
 Topiwala, Cuong Tran, Balaji Varadarajan, Jemiah Westerman, Zach White,
 Victor Ye, David Zhang, and Jason Zhang




                                                                         35
Questions?




             36

Contenu connexe

Tendances

[DataCon.TW 2017] Data Lake: centralize in on-prem vs. decentralize on cloud
[DataCon.TW 2017] Data Lake: centralize in on-prem vs. decentralize on cloud[DataCon.TW 2017] Data Lake: centralize in on-prem vs. decentralize on cloud
[DataCon.TW 2017] Data Lake: centralize in on-prem vs. decentralize on cloudJeff Hung
 
The "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedInThe "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedInSam Shah
 
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop WarehouseData Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop WarehouseDataWorks Summit
 
Enabling Fast Data Strategy: What’s new in Denodo Platform 6.0
Enabling Fast Data Strategy: What’s new in Denodo Platform 6.0Enabling Fast Data Strategy: What’s new in Denodo Platform 6.0
Enabling Fast Data Strategy: What’s new in Denodo Platform 6.0Denodo
 
L’architettura di Classe Enterprise di Nuova Generazione
L’architettura di Classe Enterprise di Nuova GenerazioneL’architettura di Classe Enterprise di Nuova Generazione
L’architettura di Classe Enterprise di Nuova GenerazioneMongoDB
 
MongoDB Europe 2016 - The Rise of the Data Lake
MongoDB Europe 2016 - The Rise of the Data LakeMongoDB Europe 2016 - The Rise of the Data Lake
MongoDB Europe 2016 - The Rise of the Data LakeMongoDB
 
What's New In MongoDB 3.6
What's New In MongoDB 3.6What's New In MongoDB 3.6
What's New In MongoDB 3.6MongoDB
 
The Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture ViewThe Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture ViewDataWorks Summit/Hadoop Summit
 
All In - Migrating a Genomics Pipeline from BASH/Hive to Spark (Azure Databri...
All In - Migrating a Genomics Pipeline from BASH/Hive to Spark (Azure Databri...All In - Migrating a Genomics Pipeline from BASH/Hive to Spark (Azure Databri...
All In - Migrating a Genomics Pipeline from BASH/Hive to Spark (Azure Databri...Databricks
 
MongoDB Europe 2016 - MongoDB Atlas
MongoDB Europe 2016 - MongoDB AtlasMongoDB Europe 2016 - MongoDB Atlas
MongoDB Europe 2016 - MongoDB AtlasMongoDB
 
Amundsen at Brex and Looker integration
Amundsen at Brex and Looker integrationAmundsen at Brex and Looker integration
Amundsen at Brex and Looker integrationmarkgrover
 
Webinar: 10-Step Guide to Creating a Single View of your Business
Webinar: 10-Step Guide to Creating a Single View of your BusinessWebinar: 10-Step Guide to Creating a Single View of your Business
Webinar: 10-Step Guide to Creating a Single View of your BusinessMongoDB
 
How Lyft Drives Data Discovery
How Lyft Drives Data DiscoveryHow Lyft Drives Data Discovery
How Lyft Drives Data DiscoveryNeo4j
 
REA Group's journey with Data Cataloging and Amundsen
REA Group's journey with Data Cataloging and AmundsenREA Group's journey with Data Cataloging and Amundsen
REA Group's journey with Data Cataloging and Amundsenmarkgrover
 
[DataCon.TW 2018] Metadata Store: Generalized Entity Database for Intelligenc...
[DataCon.TW 2018] Metadata Store: Generalized Entity Database for Intelligenc...[DataCon.TW 2018] Metadata Store: Generalized Entity Database for Intelligenc...
[DataCon.TW 2018] Metadata Store: Generalized Entity Database for Intelligenc...Jeff Hung
 
Data Virtualization and ETL
Data Virtualization and ETLData Virtualization and ETL
Data Virtualization and ETLLily Luo
 
Encompassing Information Integration
Encompassing Information IntegrationEncompassing Information Integration
Encompassing Information Integrationnguyenfilip
 
Ultralight Data Movement for IoT with SDC Edge
Ultralight Data Movement for IoT with SDC EdgeUltralight Data Movement for IoT with SDC Edge
Ultralight Data Movement for IoT with SDC EdgeDataWorks Summit
 
Disrupting Data Discovery
Disrupting Data DiscoveryDisrupting Data Discovery
Disrupting Data Discoverymarkgrover
 
IoT Connected Brewery
IoT Connected BreweryIoT Connected Brewery
IoT Connected BreweryJason Hubbard
 

Tendances (20)

[DataCon.TW 2017] Data Lake: centralize in on-prem vs. decentralize on cloud
[DataCon.TW 2017] Data Lake: centralize in on-prem vs. decentralize on cloud[DataCon.TW 2017] Data Lake: centralize in on-prem vs. decentralize on cloud
[DataCon.TW 2017] Data Lake: centralize in on-prem vs. decentralize on cloud
 
The "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedInThe "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedIn
 
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop WarehouseData Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
 
Enabling Fast Data Strategy: What’s new in Denodo Platform 6.0
Enabling Fast Data Strategy: What’s new in Denodo Platform 6.0Enabling Fast Data Strategy: What’s new in Denodo Platform 6.0
Enabling Fast Data Strategy: What’s new in Denodo Platform 6.0
 
L’architettura di Classe Enterprise di Nuova Generazione
L’architettura di Classe Enterprise di Nuova GenerazioneL’architettura di Classe Enterprise di Nuova Generazione
L’architettura di Classe Enterprise di Nuova Generazione
 
MongoDB Europe 2016 - The Rise of the Data Lake
MongoDB Europe 2016 - The Rise of the Data LakeMongoDB Europe 2016 - The Rise of the Data Lake
MongoDB Europe 2016 - The Rise of the Data Lake
 
What's New In MongoDB 3.6
What's New In MongoDB 3.6What's New In MongoDB 3.6
What's New In MongoDB 3.6
 
The Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture ViewThe Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture View
 
All In - Migrating a Genomics Pipeline from BASH/Hive to Spark (Azure Databri...
All In - Migrating a Genomics Pipeline from BASH/Hive to Spark (Azure Databri...All In - Migrating a Genomics Pipeline from BASH/Hive to Spark (Azure Databri...
All In - Migrating a Genomics Pipeline from BASH/Hive to Spark (Azure Databri...
 
MongoDB Europe 2016 - MongoDB Atlas
MongoDB Europe 2016 - MongoDB AtlasMongoDB Europe 2016 - MongoDB Atlas
MongoDB Europe 2016 - MongoDB Atlas
 
Amundsen at Brex and Looker integration
Amundsen at Brex and Looker integrationAmundsen at Brex and Looker integration
Amundsen at Brex and Looker integration
 
Webinar: 10-Step Guide to Creating a Single View of your Business
Webinar: 10-Step Guide to Creating a Single View of your BusinessWebinar: 10-Step Guide to Creating a Single View of your Business
Webinar: 10-Step Guide to Creating a Single View of your Business
 
How Lyft Drives Data Discovery
How Lyft Drives Data DiscoveryHow Lyft Drives Data Discovery
How Lyft Drives Data Discovery
 
REA Group's journey with Data Cataloging and Amundsen
REA Group's journey with Data Cataloging and AmundsenREA Group's journey with Data Cataloging and Amundsen
REA Group's journey with Data Cataloging and Amundsen
 
[DataCon.TW 2018] Metadata Store: Generalized Entity Database for Intelligenc...
[DataCon.TW 2018] Metadata Store: Generalized Entity Database for Intelligenc...[DataCon.TW 2018] Metadata Store: Generalized Entity Database for Intelligenc...
[DataCon.TW 2018] Metadata Store: Generalized Entity Database for Intelligenc...
 
Data Virtualization and ETL
Data Virtualization and ETLData Virtualization and ETL
Data Virtualization and ETL
 
Encompassing Information Integration
Encompassing Information IntegrationEncompassing Information Integration
Encompassing Information Integration
 
Ultralight Data Movement for IoT with SDC Edge
Ultralight Data Movement for IoT with SDC EdgeUltralight Data Movement for IoT with SDC Edge
Ultralight Data Movement for IoT with SDC Edge
 
Disrupting Data Discovery
Disrupting Data DiscoveryDisrupting Data Discovery
Disrupting Data Discovery
 
IoT Connected Brewery
IoT Connected BreweryIoT Connected Brewery
IoT Connected Brewery
 

En vedette

Resume- William Myers FD2016.1.4
Resume- William Myers FD2016.1.4Resume- William Myers FD2016.1.4
Resume- William Myers FD2016.1.4William Myers
 
Personal branding playbook
Personal branding playbookPersonal branding playbook
Personal branding playbookOnline Business
 
LinkedIn Segmentation & Targeting Platform: A Big Data Application
LinkedIn Segmentation & Targeting Platform: A Big Data ApplicationLinkedIn Segmentation & Targeting Platform: A Big Data Application
LinkedIn Segmentation & Targeting Platform: A Big Data ApplicationAmy W. Tang
 
Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop
Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop
Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop Shirshanka Das
 
Using Big Data for Improved Healthcare Operations and Analytics
Using Big Data for Improved Healthcare Operations and AnalyticsUsing Big Data for Improved Healthcare Operations and Analytics
Using Big Data for Improved Healthcare Operations and AnalyticsPerficient, Inc.
 
Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...
Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...
Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...Shirshanka Das
 
Participatory Design: Bringing Users Into Your Process
Participatory Design: Bringing Users Into Your ProcessParticipatory Design: Bringing Users Into Your Process
Participatory Design: Bringing Users Into Your ProcessDavid Sherwin
 
Unlocking the Experts
Unlocking the ExpertsUnlocking the Experts
Unlocking the ExpertsLinkedIn
 
Introduction To TensorFlow | Deep Learning Using TensorFlow | TensorFlow Tuto...
Introduction To TensorFlow | Deep Learning Using TensorFlow | TensorFlow Tuto...Introduction To TensorFlow | Deep Learning Using TensorFlow | TensorFlow Tuto...
Introduction To TensorFlow | Deep Learning Using TensorFlow | TensorFlow Tuto...Edureka!
 
What to Upload to SlideShare
What to Upload to SlideShareWhat to Upload to SlideShare
What to Upload to SlideShareSlideShare
 
Making Great User Experiences, Pittsburgh Scrum MeetUp, Oct 17, 2017
Making Great User Experiences, Pittsburgh Scrum MeetUp, Oct 17, 2017Making Great User Experiences, Pittsburgh Scrum MeetUp, Oct 17, 2017
Making Great User Experiences, Pittsburgh Scrum MeetUp, Oct 17, 2017Carol Smith
 
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...Edureka!
 
Taming the ever-evolving Compliance Beast : Lessons learnt at LinkedIn [Strat...
Taming the ever-evolving Compliance Beast : Lessons learnt at LinkedIn [Strat...Taming the ever-evolving Compliance Beast : Lessons learnt at LinkedIn [Strat...
Taming the ever-evolving Compliance Beast : Lessons learnt at LinkedIn [Strat...Shirshanka Das
 
What is Artificial Intelligence | Artificial Intelligence Tutorial For Beginn...
What is Artificial Intelligence | Artificial Intelligence Tutorial For Beginn...What is Artificial Intelligence | Artificial Intelligence Tutorial For Beginn...
What is Artificial Intelligence | Artificial Intelligence Tutorial For Beginn...Edureka!
 
Top 5 Deep Learning and AI Stories - October 6, 2017
Top 5 Deep Learning and AI Stories - October 6, 2017Top 5 Deep Learning and AI Stories - October 6, 2017
Top 5 Deep Learning and AI Stories - October 6, 2017NVIDIA
 
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017Carol Smith
 

En vedette (17)

Resume- William Myers FD2016.1.4
Resume- William Myers FD2016.1.4Resume- William Myers FD2016.1.4
Resume- William Myers FD2016.1.4
 
Personal branding playbook
Personal branding playbookPersonal branding playbook
Personal branding playbook
 
LinkedIn Segmentation & Targeting Platform: A Big Data Application
LinkedIn Segmentation & Targeting Platform: A Big Data ApplicationLinkedIn Segmentation & Targeting Platform: A Big Data Application
LinkedIn Segmentation & Targeting Platform: A Big Data Application
 
Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop
Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop
Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop
 
Using Big Data for Improved Healthcare Operations and Analytics
Using Big Data for Improved Healthcare Operations and AnalyticsUsing Big Data for Improved Healthcare Operations and Analytics
Using Big Data for Improved Healthcare Operations and Analytics
 
Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...
Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...
Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...
 
Participatory Design: Bringing Users Into Your Process
Participatory Design: Bringing Users Into Your ProcessParticipatory Design: Bringing Users Into Your Process
Participatory Design: Bringing Users Into Your Process
 
Unlocking the Experts
Unlocking the ExpertsUnlocking the Experts
Unlocking the Experts
 
Introduction To TensorFlow | Deep Learning Using TensorFlow | TensorFlow Tuto...
Introduction To TensorFlow | Deep Learning Using TensorFlow | TensorFlow Tuto...Introduction To TensorFlow | Deep Learning Using TensorFlow | TensorFlow Tuto...
Introduction To TensorFlow | Deep Learning Using TensorFlow | TensorFlow Tuto...
 
Big data ppt
Big  data pptBig  data ppt
Big data ppt
 
What to Upload to SlideShare
What to Upload to SlideShareWhat to Upload to SlideShare
What to Upload to SlideShare
 
Making Great User Experiences, Pittsburgh Scrum MeetUp, Oct 17, 2017
Making Great User Experiences, Pittsburgh Scrum MeetUp, Oct 17, 2017Making Great User Experiences, Pittsburgh Scrum MeetUp, Oct 17, 2017
Making Great User Experiences, Pittsburgh Scrum MeetUp, Oct 17, 2017
 
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
 
Taming the ever-evolving Compliance Beast : Lessons learnt at LinkedIn [Strat...
Taming the ever-evolving Compliance Beast : Lessons learnt at LinkedIn [Strat...Taming the ever-evolving Compliance Beast : Lessons learnt at LinkedIn [Strat...
Taming the ever-evolving Compliance Beast : Lessons learnt at LinkedIn [Strat...
 
What is Artificial Intelligence | Artificial Intelligence Tutorial For Beginn...
What is Artificial Intelligence | Artificial Intelligence Tutorial For Beginn...What is Artificial Intelligence | Artificial Intelligence Tutorial For Beginn...
What is Artificial Intelligence | Artificial Intelligence Tutorial For Beginn...
 
Top 5 Deep Learning and AI Stories - October 6, 2017
Top 5 Deep Learning and AI Stories - October 6, 2017Top 5 Deep Learning and AI Stories - October 6, 2017
Top 5 Deep Learning and AI Stories - October 6, 2017
 
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
 

Similaire à Data Infrastructure at LinkedIn

Xldb2011 tue 1005_linked_in
Xldb2011 tue 1005_linked_inXldb2011 tue 1005_linked_in
Xldb2011 tue 1005_linked_inliqiang xu
 
Why Every NoSQL Deployment Should Be Paired with Hadoop Webinar
Why Every NoSQL Deployment Should Be Paired with Hadoop WebinarWhy Every NoSQL Deployment Should Be Paired with Hadoop Webinar
Why Every NoSQL Deployment Should Be Paired with Hadoop WebinarCloudera, Inc.
 
Introduction to Databus
Introduction to DatabusIntroduction to Databus
Introduction to DatabusAmy W. Tang
 
Creating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital TransformationCreating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital TransformationMongoDB
 
Apache Kafka: Past, Present and Future
Apache Kafka: Past, Present and FutureApache Kafka: Past, Present and Future
Apache Kafka: Past, Present and Futureconfluent
 
Meetup realtime datacollection
Meetup realtime datacollectionMeetup realtime datacollection
Meetup realtime datacollectionInder Singh
 
InfiniDB 3 - Speeding Big Data Analytics in Amazon EC2
InfiniDB 3 - Speeding Big Data Analytics in Amazon EC2InfiniDB 3 - Speeding Big Data Analytics in Amazon EC2
InfiniDB 3 - Speeding Big Data Analytics in Amazon EC2Calpont Corporation
 
Lovett introducing cloud computing nov 2009
Lovett introducing cloud computing nov 2009Lovett introducing cloud computing nov 2009
Lovett introducing cloud computing nov 2009Hilde Lovett
 
Embedded Analytics: The Next Mega-Wave of Innovation
Embedded Analytics: The Next Mega-Wave of InnovationEmbedded Analytics: The Next Mega-Wave of Innovation
Embedded Analytics: The Next Mega-Wave of InnovationInside Analysis
 
Overcoming the Top Four Challenges to Real‐Time Performance in Large‐Scale, D...
Overcoming the Top Four Challenges to Real‐Time Performance in Large‐Scale, D...Overcoming the Top Four Challenges to Real‐Time Performance in Large‐Scale, D...
Overcoming the Top Four Challenges to Real‐Time Performance in Large‐Scale, D...SL Corporation
 
AWS Partner Presentation - Datapipe - Deploying Hybrid IT, AWS Summit 2012 - NYC
AWS Partner Presentation - Datapipe - Deploying Hybrid IT, AWS Summit 2012 - NYCAWS Partner Presentation - Datapipe - Deploying Hybrid IT, AWS Summit 2012 - NYC
AWS Partner Presentation - Datapipe - Deploying Hybrid IT, AWS Summit 2012 - NYCAmazon Web Services
 
Audaxis : BI Project for an Association of Pharmacists
Audaxis : BI Project for an Association of PharmacistsAudaxis : BI Project for an Association of Pharmacists
Audaxis : BI Project for an Association of PharmacistsAudaxis
 
Citrix Netscaler Intro
Citrix Netscaler IntroCitrix Netscaler Intro
Citrix Netscaler IntroRui Lopes
 
Load Balancing und Beschleunigung mit Citrix Net Scaler
Load Balancing und Beschleunigung mit Citrix Net ScalerLoad Balancing und Beschleunigung mit Citrix Net Scaler
Load Balancing und Beschleunigung mit Citrix Net ScalerDigicomp Academy AG
 
All Aboard the Databus
All Aboard the DatabusAll Aboard the Databus
All Aboard the DatabusAmy W. Tang
 
NOSQLEU - Graph Databases and Neo4j
NOSQLEU - Graph Databases and Neo4jNOSQLEU - Graph Databases and Neo4j
NOSQLEU - Graph Databases and Neo4jTobias Lindaaker
 
Big Data Journeys: Review of roadmaps taken by early adopters to achieve thei...
Big Data Journeys: Review of roadmaps taken by early adopters to achieve thei...Big Data Journeys: Review of roadmaps taken by early adopters to achieve thei...
Big Data Journeys: Review of roadmaps taken by early adopters to achieve thei...Krishnan Parasuraman
 
Meetup Microservices Commandments
Meetup Microservices CommandmentsMeetup Microservices Commandments
Meetup Microservices CommandmentsBill Zajac
 
Service Oriented Architecture (SOA) [1/5] : Introduction to SOA
Service Oriented Architecture (SOA) [1/5] : Introduction to SOAService Oriented Architecture (SOA) [1/5] : Introduction to SOA
Service Oriented Architecture (SOA) [1/5] : Introduction to SOAIMC Institute
 

Similaire à Data Infrastructure at LinkedIn (20)

Xldb2011 tue 1005_linked_in
Xldb2011 tue 1005_linked_inXldb2011 tue 1005_linked_in
Xldb2011 tue 1005_linked_in
 
Why Every NoSQL Deployment Should Be Paired with Hadoop Webinar
Why Every NoSQL Deployment Should Be Paired with Hadoop WebinarWhy Every NoSQL Deployment Should Be Paired with Hadoop Webinar
Why Every NoSQL Deployment Should Be Paired with Hadoop Webinar
 
Introduction to Databus
Introduction to DatabusIntroduction to Databus
Introduction to Databus
 
Creating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital TransformationCreating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital Transformation
 
Secure Big Data Analytics - Hadoop & Intel
Secure Big Data Analytics - Hadoop & IntelSecure Big Data Analytics - Hadoop & Intel
Secure Big Data Analytics - Hadoop & Intel
 
Apache Kafka: Past, Present and Future
Apache Kafka: Past, Present and FutureApache Kafka: Past, Present and Future
Apache Kafka: Past, Present and Future
 
Meetup realtime datacollection
Meetup realtime datacollectionMeetup realtime datacollection
Meetup realtime datacollection
 
InfiniDB 3 - Speeding Big Data Analytics in Amazon EC2
InfiniDB 3 - Speeding Big Data Analytics in Amazon EC2InfiniDB 3 - Speeding Big Data Analytics in Amazon EC2
InfiniDB 3 - Speeding Big Data Analytics in Amazon EC2
 
Lovett introducing cloud computing nov 2009
Lovett introducing cloud computing nov 2009Lovett introducing cloud computing nov 2009
Lovett introducing cloud computing nov 2009
 
Embedded Analytics: The Next Mega-Wave of Innovation
Embedded Analytics: The Next Mega-Wave of InnovationEmbedded Analytics: The Next Mega-Wave of Innovation
Embedded Analytics: The Next Mega-Wave of Innovation
 
Overcoming the Top Four Challenges to Real‐Time Performance in Large‐Scale, D...
Overcoming the Top Four Challenges to Real‐Time Performance in Large‐Scale, D...Overcoming the Top Four Challenges to Real‐Time Performance in Large‐Scale, D...
Overcoming the Top Four Challenges to Real‐Time Performance in Large‐Scale, D...
 
AWS Partner Presentation - Datapipe - Deploying Hybrid IT, AWS Summit 2012 - NYC
AWS Partner Presentation - Datapipe - Deploying Hybrid IT, AWS Summit 2012 - NYCAWS Partner Presentation - Datapipe - Deploying Hybrid IT, AWS Summit 2012 - NYC
AWS Partner Presentation - Datapipe - Deploying Hybrid IT, AWS Summit 2012 - NYC
 
Audaxis : BI Project for an Association of Pharmacists
Audaxis : BI Project for an Association of PharmacistsAudaxis : BI Project for an Association of Pharmacists
Audaxis : BI Project for an Association of Pharmacists
 
Citrix Netscaler Intro
Citrix Netscaler IntroCitrix Netscaler Intro
Citrix Netscaler Intro
 
Load Balancing und Beschleunigung mit Citrix Net Scaler
Load Balancing und Beschleunigung mit Citrix Net ScalerLoad Balancing und Beschleunigung mit Citrix Net Scaler
Load Balancing und Beschleunigung mit Citrix Net Scaler
 
All Aboard the Databus
All Aboard the DatabusAll Aboard the Databus
All Aboard the Databus
 
NOSQLEU - Graph Databases and Neo4j
NOSQLEU - Graph Databases and Neo4jNOSQLEU - Graph Databases and Neo4j
NOSQLEU - Graph Databases and Neo4j
 
Big Data Journeys: Review of roadmaps taken by early adopters to achieve thei...
Big Data Journeys: Review of roadmaps taken by early adopters to achieve thei...Big Data Journeys: Review of roadmaps taken by early adopters to achieve thei...
Big Data Journeys: Review of roadmaps taken by early adopters to achieve thei...
 
Meetup Microservices Commandments
Meetup Microservices CommandmentsMeetup Microservices Commandments
Meetup Microservices Commandments
 
Service Oriented Architecture (SOA) [1/5] : Introduction to SOA
Service Oriented Architecture (SOA) [1/5] : Introduction to SOAService Oriented Architecture (SOA) [1/5] : Introduction to SOA
Service Oriented Architecture (SOA) [1/5] : Introduction to SOA
 

Plus de Amy W. Tang

Building a Real-Time Data Pipeline: Apache Kafka at LinkedIn
Building a Real-Time Data Pipeline: Apache Kafka at LinkedInBuilding a Real-Time Data Pipeline: Apache Kafka at LinkedIn
Building a Real-Time Data Pipeline: Apache Kafka at LinkedInAmy W. Tang
 
Espresso: LinkedIn's Distributed Data Serving Platform (Talk)
Espresso: LinkedIn's Distributed Data Serving Platform (Talk)Espresso: LinkedIn's Distributed Data Serving Platform (Talk)
Espresso: LinkedIn's Distributed Data Serving Platform (Talk)Amy W. Tang
 
Espresso: LinkedIn's Distributed Data Serving Platform (Paper)
Espresso: LinkedIn's Distributed Data Serving Platform (Paper)Espresso: LinkedIn's Distributed Data Serving Platform (Paper)
Espresso: LinkedIn's Distributed Data Serving Platform (Paper)Amy W. Tang
 
Building Distributed Systems Using Helix
Building Distributed Systems Using HelixBuilding Distributed Systems Using Helix
Building Distributed Systems Using HelixAmy W. Tang
 
LinkedIn Graph Presentation
LinkedIn Graph PresentationLinkedIn Graph Presentation
LinkedIn Graph PresentationAmy W. Tang
 
Voldemort on Solid State Drives
Voldemort on Solid State DrivesVoldemort on Solid State Drives
Voldemort on Solid State DrivesAmy W. Tang
 
Untangling Cluster Management with Helix
Untangling Cluster Management with HelixUntangling Cluster Management with Helix
Untangling Cluster Management with HelixAmy W. Tang
 
A Small Overview of Big Data Products, Analytics, and Infrastructure at LinkedIn
A Small Overview of Big Data Products, Analytics, and Infrastructure at LinkedInA Small Overview of Big Data Products, Analytics, and Infrastructure at LinkedIn
A Small Overview of Big Data Products, Analytics, and Infrastructure at LinkedInAmy W. Tang
 

Plus de Amy W. Tang (8)

Building a Real-Time Data Pipeline: Apache Kafka at LinkedIn
Building a Real-Time Data Pipeline: Apache Kafka at LinkedInBuilding a Real-Time Data Pipeline: Apache Kafka at LinkedIn
Building a Real-Time Data Pipeline: Apache Kafka at LinkedIn
 
Espresso: LinkedIn's Distributed Data Serving Platform (Talk)
Espresso: LinkedIn's Distributed Data Serving Platform (Talk)Espresso: LinkedIn's Distributed Data Serving Platform (Talk)
Espresso: LinkedIn's Distributed Data Serving Platform (Talk)
 
Espresso: LinkedIn's Distributed Data Serving Platform (Paper)
Espresso: LinkedIn's Distributed Data Serving Platform (Paper)Espresso: LinkedIn's Distributed Data Serving Platform (Paper)
Espresso: LinkedIn's Distributed Data Serving Platform (Paper)
 
Building Distributed Systems Using Helix
Building Distributed Systems Using HelixBuilding Distributed Systems Using Helix
Building Distributed Systems Using Helix
 
LinkedIn Graph Presentation
LinkedIn Graph PresentationLinkedIn Graph Presentation
LinkedIn Graph Presentation
 
Voldemort on Solid State Drives
Voldemort on Solid State DrivesVoldemort on Solid State Drives
Voldemort on Solid State Drives
 
Untangling Cluster Management with Helix
Untangling Cluster Management with HelixUntangling Cluster Management with Helix
Untangling Cluster Management with Helix
 
A Small Overview of Big Data Products, Analytics, and Infrastructure at LinkedIn
A Small Overview of Big Data Products, Analytics, and Infrastructure at LinkedInA Small Overview of Big Data Products, Analytics, and Infrastructure at LinkedIn
A Small Overview of Big Data Products, Analytics, and Infrastructure at LinkedIn
 

Dernier

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 

Dernier (20)

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 

Data Infrastructure at LinkedIn

  • 1. Data Infrastructure at LinkedIn Kapil Surlaker http://www.linkedin.com/in/kapilsurlaker @kapilsurlaker 1
  • 2. Outline  LinkedIn Products  Data Ecosystem  LinkedIn Data Infrastructure Solutions  Next Play 2
  • 3. LinkedIn By The Numbers  150M + users*  ~ 4.2B People Searches in 2011**  >2M companies with LinkedIn Company Pages**  16 languages  75% of Fortune 100 Companies use LinkedIn to hire*** * As of February 9th 2012 ** As of December 31st 2011 *** As of September 30th 2011 3
  • 4. Broad Range of Products & Services 4
  • 5. User Profiles Large dataset Medium writes Very high reads Freshness <1s 5
  • 6. Communications Large dataset High writes High reads Freshness <1s 6
  • 7. People You May Know Large dataset Compute intensive High reads Freshness ~hrs 7
  • 8. LinkedIn Today Moving dataset High writes High reads Freshness ~mins 8
  • 9. Outline  LinkedIn Products  Data Ecosystem  LinkedIn Data Infrastructure Solutions  Next Play 9
  • 10. Three Paradigms : Simplifying the Data Continuum • Member Profiles • Linkedin Today • People You May Know • Company Profiles • Profile Standardization • Connection Strength • Connections • News • News • Communications • Recommendations • Recommendations • Search • Next best idea • Communications Online Nearline Offline Activity that should Activity that should Activity that can be be reflected immediately be reflected soon reflected later 10
  • 14. LinkedIn Data Infrastructure Solutions Databus : Timeline-Consistent Change Data Capture 14
  • 15. Databus at LinkedIn Client Relay Consumer 1 Client Lib Capture On-line Databus DB Changes Changes Event Win Consumer n On-line Changes Bootstrap Client Consumer 1 Client Lib Databus Consistent Snapshot at U DB Consumer n 15
  • 16. Databus at LinkedIn Client Relay Consumer 1 Client Lib Capture On-line Databus DB Changes Changes Event Win Consumer n On-line Changes Bootstrap Client Consumer 1 Client Lib Databus Consistent Snapshot at U DB Consumer n  Transport independent of data  Tens of relays source: Oracle, MySQL, …  Hundreds of sources  Transactional semantics  Low latency - milliseconds  In order, at least once delivery 16
  • 19. LinkedIn Data Infrastructure Solutions Voldemort: Highly-Available Distributed KV Store 19
  • 20. Voldemort: Architecture • Pluggable components • 10 clusters, 100+ nodes • Tunable consistency / • Largest cluster – 10K+ qps availability • Avg latency: 3ms • Key/value model, • Hundreds of Stores server side “views” • Largest store – 2.8TB+
  • 22. LinkedIn Data Infrastructure Solutions Kafka: High-Volume Low-Latency Messaging System 22
  • 24. Kafka: Architecture Broker Tier WebTier Consumers Push Sequential write sendfile Pull Iterator 1 Client Lib Event Events Topic 1 Kafka s 100 MB/sec 200 MB/sec Topic 2 Iterator n Topic N Topic  Offset Topic, Partition Offset Ownership Zookeeper Management 24
  • 25. Kafka: Architecture Broker Tier WebTier Consumers Push Sequential write sendfile Pull Iterator 1 Client Lib Event Events Topic 1 Kafka s 100 MB/sec 200 MB/sec Topic 2 Iterator n Topic N Topic  Offset Topic, Partition Offset Ownership Zookeeper Management  At least once delivery  Billions of Events, TBs per day  Very high throughput  50K+ per sec at peak  Low latency  Inter and Intra-cluster replication  Durability  End-to-end latency: few seconds 25
  • 27. LinkedIn Data Infrastructure Solutions Espresso: Indexed Timeline-Consistent Distributed Data Store 27
  • 28. Application View Hierarchical data model Rich functionality on resources  Conditional updates  Partial updates  Atomic counters Rich functionality within resource groups  Transactions  Secondary index  Text search 28
  • 30. Espresso Partition Layout: Master, Slave 3 Storage Engine nodes, 2 way replication Database P.1 P.2 P.3 P.5 P.6 P.7 Partition: P.1 Node: 1 P.4 P.5 P.6 P.8 P.1 P.2 … Partition: P.12 Node: 3 P.9 P.1 P.11 P.1 0 2 Node 1 Node 2 Cluster Node: 1 M: P.1 – Active P.9 P.1 P.11 … 0 S: P.5 – Active … P.1 P.3 P.4 2 P.7 P.8 Master Cluster Slave Manager Node 3
  • 32. Generic Cluster Manager: Helix • Generic Distributed State Model • Centralized Config Management • Automatic Load Balancing • Fault tolerance • Health monitoring • Cluster expansion and rebalancing • Espresso, Databus and Search • Open Source Apr 2012 • https://github.com/linkedin/helix 32
  • 33. Espresso@Linkedin  Launched first application Oct 2011  Open source 2012  Future – Multi-Datacenter support – Global secondary indexes – Time-partitioned data 33
  • 35. Acknowledgments Siddharth Anand, Aditya Auradkar, Chavdar Botev, Vinoth Chandar, Shirshanka Das, Dave DeMaagd, Alex Feinberg, John Fung, Phanindra Ganti, Mihir Gandhi, Lei Gao, Bhaskar Ghosh, Kishore Gopalakrishna, Brendan Harris, Rajappa Iyer, Swaroop Jagadish, Joel Koshy, Kevin Krawez, Jay Kreps, Shi Lu, Sunil Nagaraj, Neha Narkhede, Sasha Pachev, Igor Perisic, Lin Qiao, Tom Quiggle, Jun Rao, Bob Schulman, Abraham Sebastian, Oliver Seeliger, Adam Silberstein, Boris Shkolnik, Chinmay Soman, Subbu Subramaniam, Roshan Sumbaly, Kapil Surlaker, Sajid Topiwala, Cuong Tran, Balaji Varadarajan, Jemiah Westerman, Zach White, Victor Ye, David Zhang, and Jason Zhang 35