Cloud Infrastructure:
  GFS & MapReduce
          Andrii Vozniuk
  Uses slides by Jeff Dean and Ed Austin


Data Management in the Cloud
            EPFL
      February 27, 2012
Outline
•   Motivation
•   Problem Statement
•   Storage: Google File System (GFS)
•   Processing: MapReduce
•   Benchmarks
•   Conclusions
Motivation
• Huge amounts of data to store and process
• Example @2004:
  – 20+ billion web pages x 20KB/page = 400+ TB
  – Reading from one disk: 30-35 MB/s
     • About four months just to read the web
     • About 1000 hard drives just to store the web
     • Even more complicated if we want to process the data
• Exponential growth: the solution must be scalable.
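The figures on this slide can be checked with a few lines of arithmetic. This is a minimal sketch, assuming decimal units (1 TB = 10^12 bytes), a 30-day month, and the upper end of the 30-35 MB/s range; the variable names are illustrative only.

```python
# Back-of-the-envelope check of the 2004 numbers (decimal units assumed).
PAGES = 20e9          # 20+ billion web pages
PAGE_SIZE = 20e3      # 20 KB per page, in bytes
DISK_RATE = 35e6      # upper end of 30-35 MB/s, in bytes per second

total_bytes = PAGES * PAGE_SIZE
total_tb = total_bytes / 1e12
read_days = total_bytes / DISK_RATE / 86_400      # seconds per day
read_months = read_days / 30                      # 30-day month assumed
drive_size_gb = total_bytes / 1000 / 1e9          # if spread over 1000 drives

print(f"{total_tb:.0f} TB, {read_months:.1f} months, {drive_size_gb:.0f} GB per drive")
```

At 30 MB/s the read time is somewhat longer, so "about four months" is the optimistic end of the estimate.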
Motivation
• Buy super fast, ultra reliable hardware?
  – Ultra expensive
  – Controlled by a third party
  – Internals can be hidden and proprietary
  – Hard to predict scalability
  – Fails less often, but still fails!
  – No suitable solution on the market
Motivation
• Use commodity hardware? Benefits:
   –   Commodity machines offer much better perf/$
   –   Full control over, and understanding of, internals
   –   Can be highly optimized for their workloads
   –   Really smart people can do really smart things

• Not that easy:
   – Fault tolerance: something breaks all the time
   – Application development
   – Debugging, optimization, locality
   – Communication and coordination
   – Status reporting, monitoring
• All of these issues must be handled for every problem you want to solve
Problem Statement
• Develop a scalable distributed file system for
  large data-intensive applications running on
  inexpensive commodity hardware
• Develop a tool for processing large data sets in
  parallel on inexpensive commodity hardware
• Develop both in coordination for optimal
  performance
Google Cluster Environment
• Servers -> Racks -> Clusters -> Datacenters -> Cloud
• @2009:
   –   200+ clusters
   –   1000+ machines in many of them
   –   4+ PB File systems
   –   40GB/s read/write load
   –   Frequent HW failures
   –   100s to 1000s of active jobs (1 to 1000 tasks each)
• A cluster is 1000s of machines
   – Stuff breaks: about 1 failure per day with 1,000 machines, about 10 per day with 10,000
   – How to store data reliably with high throughput?
   – How to make it easy to develop distributed applications?
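The failure rule of thumb above scales linearly with cluster size. The sketch below assumes independent failures at a constant per-machine rate, which is a simplification that the slides do not state; the function name is hypothetical.

```python
# Rate implied by "1 failure per day per 1000 machines" (assumed constant).
FAILURES_PER_MACHINE_PER_DAY = 1 / 1000

def expected_failures_per_day(machines: int) -> float:
    """Expected number of machine failures per day in a cluster."""
    return machines * FAILURES_PER_MACHINE_PER_DAY

# Mean time between failures for a single machine, in days.
mean_days_between_failures = 1 / FAILURES_PER_MACHINE_PER_DAY

for size in (1_000, 10_000):
    print(size, "machines:", expected_failures_per_day(size), "failures/day")
```

Under this assumption a single machine fails roughly once every 1000 days, yet a 10,000-machine cluster sees failures many times a day, which is why fault tolerance has to be built into the software.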
Google Technology Stack
• Focus of this talk: GFS and MapReduce
GFS: Google File System*
•   Inexpensive commodity hardware
•   High Throughput > Low Latency
•   Large files (multi GB)
•   Multiple clients
•   Workload
    – Large streaming reads
    – Small random reads; writes are mostly large sequential appends
    – Concurrent append to the same file
* The Google File System. S. Ghemawat, H. Gobioff, S. Leung. SOSP, 2003
GFS: Architecture




•   User-level process running on commodity Linux machines
•   Consists of Master Server and Chunk Servers
•   Files broken into chunks (typically 64 MB), 3x redundancy (clusters, DCs)
•   Data transfers happen directly between clients and Chunk Servers
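Because chunks have a fixed size, a client can turn a byte offset within a file into a chunk index with simple integer arithmetic before asking the master where that chunk lives. The following is a minimal sketch of that translation; `locate` and `chunks_for_range` are hypothetical helper names, not part of any GFS API.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the typical chunk size from the slide

def locate(offset: int) -> tuple[int, int]:
    """Return (chunk index, offset inside that chunk) for a file byte offset."""
    return divmod(offset, CHUNK_SIZE)

def chunks_for_range(offset: int, length: int) -> range:
    """Chunk indexes touched by a read of `length` bytes starting at `offset`."""
    if length <= 0:
        return range(0)
    first = offset // CHUNK_SIZE
    last = (offset + length - 1) // CHUNK_SIZE
    return range(first, last + 1)

print(locate(CHUNK_SIZE + 5))                     # second chunk, 5 bytes in
print(list(chunks_for_range(CHUNK_SIZE - 1, 2)))  # a read that spans two chunks
```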
GFS: Master Node
• Centralization for simplicity
• Namespace and metadata management
• Managing chunks
   –   Where they are (file -> chunks, chunk -> replicas)
   –   Where to put new chunks
   –   When to re-replicate (failure, load-balancing)
   –   When and what to delete (garbage collection)
• Fault tolerance
   –   Shadow masters
   –   Monitoring infrastructure outside of GFS
   –   Periodic snapshots
   –   Mirrored operations log
GFS: Master Node
• Metadata is in memory – it’s fast!
  – A 64 MB chunk needs less than 64 B of metadata => 640 TB of data needs less than 640 MB
• Asks Chunk Servers which chunks they hold when
  – Master starts
  – Chunk Server joins the cluster
• Operation log
  – Is used for serialization of concurrent operations
  – Replicated
  – Master responds to a client only after the log is flushed locally and
    remotely
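The in-memory metadata estimate on this slide follows directly from the chunk size. A minimal sketch, assuming decimal units as the slide does (64 MB = 64 x 10^6 bytes) and treating 64 bytes per chunk as an upper bound; the function name is illustrative.

```python
CHUNK_SIZE = 64 * 10**6      # 64 MB per chunk (decimal units, as on the slide)
METADATA_PER_CHUNK = 64      # upper bound, in bytes of master memory per chunk

def master_metadata_bytes(stored_bytes: int) -> int:
    """Upper bound on master memory needed to track `stored_bytes` of file data."""
    chunks = -(-stored_bytes // CHUNK_SIZE)   # ceiling division
    return chunks * METADATA_PER_CHUNK

print(master_metadata_bytes(640 * 10**12) / 10**6, "MB")
```

This is why a single master can keep all chunk metadata in RAM: the metadata is about a millionth of the data it describes.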
GFS: Chunk Servers
• 64MB chunks as Linux files
  – Reduce the size of the master's data structures
  – Reduce client-master interaction
  – Internal fragmentation => allocate space lazily
  – Possible hotspots => re-replicate
• Fault tolerance
  – Heart-beat to the master
  – Something wrong => master initiates re-replication
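GFS: Mutation Order
• Client -> Master: who is the current lease holder?
• Master -> Client: identity of primary, location of replicas (cached by client)
• Client pushes data to all replicas (3a, 3b, 3c)
• Client -> Primary: write request
• Primary assigns # to mutations, applies them, forwards the write request to secondaries
• Secondaries -> Primary: operation completed
• Primary -> Client: operation completed or error report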
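The ordering role of the primary can be illustrated with a small in-memory model. This is only a sketch under simplifying assumptions (no network, no leases, no retries); the `Replica` and `Primary` classes and their methods are hypothetical and are not GFS interfaces.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    name: str
    buffered: dict = field(default_factory=dict)   # data_id -> bytes pushed by client
    applied: list = field(default_factory=list)    # (serial, data) in apply order

    def push(self, data_id: str, data: bytes) -> None:
        self.buffered[data_id] = data               # data flow: store, do not apply yet

    def apply(self, serial: int, data_id: str) -> None:
        # Raises KeyError if the data was never pushed to this replica.
        self.applied.append((serial, self.buffered.pop(data_id)))

@dataclass
class Primary(Replica):
    secondaries: list = field(default_factory=list)
    next_serial: int = 0

    def write(self, data_id: str) -> str:
        serial = self.next_serial                   # primary alone picks the order
        self.next_serial += 1
        try:
            for replica in [self, *self.secondaries]:
                replica.apply(serial, data_id)      # same serial on every replica
        except KeyError:
            return "error"                          # some replica lacked the data
        return "completed"

s1, s2 = Replica("s1"), Replica("s2")
primary = Primary("p", secondaries=[s1, s2])
for replica in (primary, s1, s2):
    replica.push("d1", b"x")
status = primary.write("d1")
print(status, primary.applied == s1.applied == s2.applied)
```

Because only the primary hands out serial numbers, every replica applies concurrent mutations in the same order without the master being involved in each write.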
GFS: Control & Data Flow
• Decouple control flow and data flow
• Control flow
  – Master -> Primary -> Secondaries
• Data flow
  – Carefully picked chain of Chunk Servers
       • Forward to the closest first
       • Distance estimated based on IP
  – Fully utilize outbound bandwidth
  – Pipelining to exploit full-duplex links
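The benefit of pipelining along a chain can be estimated with the formula given in the GFS paper: without congestion, pushing B bytes to R replicas takes about B/T + R*L, where T is the link throughput and L the per-hop latency. The sketch below compares that with a store-and-forward chain; the function names and the store-and-forward model are illustrative assumptions.

```python
def pipelined_seconds(size_bytes: float, replicas: int,
                      link_bits_per_s: float, hop_latency_s: float) -> float:
    """Ideal time when each server forwards data as soon as it starts arriving."""
    return size_bytes * 8 / link_bits_per_s + replicas * hop_latency_s

def store_and_forward_seconds(size_bytes: float, replicas: int,
                              link_bits_per_s: float, hop_latency_s: float) -> float:
    """Time when each server waits for the whole payload before forwarding it."""
    return replicas * (size_bytes * 8 / link_bits_per_s + hop_latency_s)

# 1 MB to 3 replicas over 100 Mbps links with 1 ms latency per hop.
args = (1e6, 3, 100e6, 0.001)
print(f"pipelined: {pipelined_seconds(*args) * 1000:.0f} ms")
print(f"store-and-forward: {store_and_forward_seconds(*args) * 1000:.0f} ms")
```

With these numbers the pipelined transfer takes roughly a third of the store-and-forward time, since the payload crosses all three links almost simultaneously.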
GFS: Other Important Things
• Snapshot operation – make a copy very fast
• Smart chunk creation policy
  – Below-average disk utilization, limited # of recent creations per Chunk Server
• Smart re-replication policy
  – Most under-replicated chunks first
  – Chunks that are blocking a client
  – Live files first (rather than deleted)
• Rebalance and GC periodically
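The re-replication criteria above can be read as a sort key over the chunks that need new replicas. The sketch below is one possible way to combine them; the exact precedence between the three criteria, the `Chunk` record, and the function name are assumptions made for illustration, not the policy GFS actually implements.

```python
from dataclasses import dataclass

REPLICATION_GOAL = 3  # 3x redundancy, as on the architecture slide

@dataclass
class Chunk:
    handle: str
    live_replicas: int
    blocking_client: bool = False
    file_deleted: bool = False

def rereplication_order(chunks):
    """Most urgent first (assumed precedence): chunks blocking a client,
    then chunks with the fewest live replicas, then live files before deleted ones."""
    needy = [c for c in chunks if c.live_replicas < REPLICATION_GOAL]
    return sorted(needy, key=lambda c: (not c.blocking_client,
                                        c.live_replicas,
                                        c.file_deleted))

chunks = [
    Chunk("a", live_replicas=2),
    Chunk("b", live_replicas=1),
    Chunk("c", live_replicas=2, blocking_client=True),
    Chunk("d", live_replicas=1, file_deleted=True),
    Chunk("e", live_replicas=3),   # already at goal, not queued
]
order = [c.handle for c in rereplication_order(chunks)]
print(order)
```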

          How to process data stored in GFS?
MapReduce*
• A simple programming model applicable to many
  large-scale computing problems
• Divide and conquer strategy
• Hide messy details in MapReduce runtime library:
     –   Automatic parallelization
     –   Load balancing
     –   Network and disk transfer optimizations
     –   Fault tolerance
     –   Part of the stack: improvements to core library benefit
         all users of library
*MapReduce: Simplified Data Processing on Large Clusters.
J. Dean, S. Ghemawat. OSDI, 2004
MapReduce: Typical problem
• Read a lot of data. Break it into parts
• Map: extract something important from each part
• Shuffle and Sort
• Reduce: aggregate, summarize, filter, transform Map results
• Write the results
• Chain, Cascade

• Implement Map and Reduce to fit the problem
MapReduce: Other suitable examples
•   Distributed grep
•   Distributed sort (Hadoop MapReduce won TeraSort)
•   Term-vector per host
•   Document clustering
•   Machine learning
•   Web access log stats
•   Web link-graph reversal
•   Inverted index construction
•   Statistical machine translation
MapReduce: Model
• Programmer specifies two primary methods
    – Map(k,v) -> <k’,v’>*
    – Reduce(k’, <v’>*) -> <k’,v’>*
• All v’ with same k’ are reduced together, in order
• Usually also specify how to partition k’
    – Partition(k’, total partitions) -> partition for k’
        • Often a simple hash of the key
        • Allows reduce operations for different k’ to be parallelized
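A hash-based Partition function of the kind described above might look as follows. This is a minimal sketch: the use of CRC32 is an assumption chosen because Python's built-in `hash()` for strings is randomized per process, whereas every map worker must send the same key to the same partition.

```python
import zlib

def partition(key: str, total_partitions: int) -> int:
    """Map an intermediate key k' to a reduce partition in [0, total_partitions)."""
    return zlib.crc32(key.encode("utf-8")) % total_partitions

words = ["to", "be", "or", "not", "to", "be"]
assignments = {w: partition(w, 4) for w in words}
print(assignments)
```

Every occurrence of a key lands in the same partition, so one reducer sees all values for that key while different keys can be reduced in parallel.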
MapReduce: Code
Map
    map(String key, String value):
      // key: document name
      // value: document contents
      for each word w in value:
          EmitIntermediate(w, "1");
Reduce
    reduce(String key, Iterator values):
      // key: a word
      // values: a list of counts
      int result = 0;
      for each v in values:
          result += ParseInt(v);
      Emit(AsString(result));
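The word-count pseudocode above can be run on a single machine with a tiny in-memory driver. The sketch below is not the MapReduce library: `run_mapreduce` is a hypothetical stand-in that does the shuffle and sort in a dictionary, and counts are kept as integers rather than strings.

```python
from collections import defaultdict

def map_fn(key, value):
    # key: document name, value: document contents
    for word in value.split():
        yield word, 1

def reduce_fn(key, values):
    # key: a word, values: an iterable of counts
    yield key, sum(values)

def run_mapreduce(inputs, map_fn, reduce_fn):
    groups = defaultdict(list)                 # shuffle: group values by key
    for key, value in inputs:
        for k2, v2 in map_fn(key, value):
            groups[k2].append(v2)
    results = []
    for k2 in sorted(groups):                  # sort: reduce keys in order
        results.extend(reduce_fn(k2, groups[k2]))
    return results

docs = [("a.txt", "to be or not to be"), ("b.txt", "to do")]
counts = run_mapreduce(docs, map_fn, reduce_fn)
print(counts)
```

Only `map_fn` and `reduce_fn` are specific to word counting; the driver stays the same for other problems.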
MapReduce: Architecture




        • One master, many workers
        • Infrastructure manages scheduling and distribution
        • User implements Map and Reduce
MapReduce: Architecture




        Combiner = local Reduce
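For word counting, a combiner can pre-aggregate the output of one map task before it crosses the network. This works here because addition is associative and commutative. A minimal sketch with hypothetical function names:

```python
from collections import Counter

def pairs_without_combiner(contents):
    """Plain map output: one (word, 1) pair per occurrence."""
    return [(word, 1) for word in contents.split()]

def pairs_with_combiner(contents):
    """Map output after a local reduce: one (word, count) pair per distinct word."""
    local_counts = Counter(contents.split())
    return sorted(local_counts.items())

text = "to be or not to be"
plain = pairs_without_combiner(text)
combined = pairs_with_combiner(text)
print(len(plain), "pairs without combiner,", len(combined), "with combiner")
```

The reducer produces the same totals either way; the combiner only reduces the number of intermediate pairs that have to be shuffled.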
MapReduce: Important things
•   Mappers scheduled close to data
•   Chunk replication improves locality
•   Reducers often run on same machine as mappers
•   Fault tolerance
    –   Map crash – re-launch all tasks of that machine
    –   Reduce crash – repeat crashed task only
    –   Master crash – repeat whole job
    –   Skip bad records
• Fighting ‘stragglers’ by launching backup tasks
• Proven scalability
• In September 2009, Google ran 3,467,000 MR jobs averaging
  488 machines per job
• Extensively used at Yahoo and Facebook via Hadoop
Benchmarks: GFS
Benchmarks: MapReduce
Conclusions
“We believe we get tremendous competitive advantage
by essentially building our own infrastructure”
-- Eric Schmidt

• GFS & MapReduce
   – Google achieved their goals
   – A fundamental part of their stack
• Open source implementations
   – GFS -> Hadoop Distributed FS (HDFS)
   – MapReduce -> Hadoop MapReduce
Thank you for your attention!
   Andrii.Vozniuk@epfl.ch

Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 

Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk

  • 1. Cloud Infrastructure: GFS & MapReduce. Andrii Vozniuk. Uses slides by Jeff Dean and Ed Austin. Data Management in the Cloud, EPFL, February 27, 2012
  • 2. Outline • Motivation • Problem Statement • Storage: Google File System (GFS) • Processing: MapReduce • Benchmarks • Conclusions
  • 3. Motivation • Huge amounts of data to store and process • Example @2004: – 20+ billion web pages x 20KB/page = 400+ TB – Reading from one disk at 30-35 MB/s • About four months just to read the web • 1000 hard drives just to store the web • Even more complicated if we want to process the data • Exponential growth, so the solution must be scalable.
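The figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 KB = 1000 bytes, 1 MB = 10^6 bytes) and a single disk read sequentially without interruption; the variable and function names are illustrative only.

```python
# Back-of-the-envelope check of the 2004 figures quoted on the slide.
# Assumption: decimal units (1 KB = 1000 B, 1 MB = 10**6 B, 1 TB = 10**12 B).
PAGES = 20 * 10**9            # 20 billion web pages
PAGE_SIZE_BYTES = 20 * 10**3  # 20 KB per page
SECONDS_PER_DAY = 86_400

total_bytes = PAGES * PAGE_SIZE_BYTES  # 4 * 10**14 bytes = 400 TB


def days_to_read(num_bytes, mb_per_second):
    """Days needed to read num_bytes sequentially from a single disk."""
    return num_bytes / (mb_per_second * 10**6) / SECONDS_PER_DAY


slow_days = days_to_read(total_bytes, 30)  # at 30 MB/s
fast_days = days_to_read(total_bytes, 35)  # at 35 MB/s

# Spreading the data over 1000 drives implies this capacity per drive.
gb_per_drive = total_bytes / 1000 / 10**9

print(f"total: {total_bytes / 10**12:.0f} TB")
print(f"read time: {fast_days:.0f} to {slow_days:.0f} days on one disk")
print(f"capacity per drive for 1000 drives: {gb_per_drive:.0f} GB")
```

Under these assumptions the single-disk read takes between four and five months, consistent with the order of magnitude quoted on the slide.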
  • 4. Motivation • Buy super fast, ultra reliable hardware? – Ultra expensive – Controlled by third party – Internals can be hidden and proprietary – Hard to predict scalability – Fails less often, but still fails! – No suitable solution on the market
  • 5. Motivation • Use commodity hardware? Benefits: – Commodity machines offer much better perf/$ – Full control on and understanding of internals – Can be highly optimized for their workloads – Really smart people can do really smart things • Not that easy: – Fault tolerance: something breaks all the time – Applications development – Debugging, Optimization, Locality – Communication and coordination – Status reporting, monitoring • Handle all these issues for every problem you want to solve
  • 6. Problem Statement • Develop a scalable distributed file system for large data-intensive applications running on inexpensive commodity hardware • Develop a tool for processing large data sets in parallel on inexpensive commodity hardware • Develop both in coordination for optimal performance
  • 7. Google Cluster Environment Cloud Datacenters Servers Racks Clusters
  • 8. Google Cluster Environment • @2009: – 200+ clusters – 1000+ machines in many of them – 4+ PB file systems – 40 GB/s read/write load – Frequent HW failures – 100s to 1000s of active jobs (1 to 1000 tasks each) • A cluster is 1000s of machines – Stuff breaks: with 1000 machines, about 1 failure per day; with 10000, about 10 per day – How to store data reliably with high throughput? – How to make it easy to develop distributed applications?
  • 10. Google Technology Stack (figure label: Focus of this talk)
  • 11. Google File System* • Inexpensive commodity hardware • High throughput matters more than low latency • Large files (multi GB) • Multiple clients • Workload – Large streaming reads – Small random writes – Concurrent appends to the same file * The Google File System. S. Ghemawat, H. Gobioff, S. Leung. SOSP, 2003
  • 12. Architecture • User-level process running on commodity Linux machines • Consists of a Master Server and Chunk Servers • Files broken into chunks (typically 64 MB), 3x redundancy (clusters, DCs) • Data transfers happen directly between clients and Chunk Servers
  • 13. Master Node • Centralization for simplicity • Namespace and metadata management • Managing chunks – Where they are (file-to-chunk mapping, replica locations) – Where to put new chunks – When to re-replicate (failure, load balancing) – When and what to delete (garbage collection) • Fault tolerance – Shadow masters – Monitoring infrastructure outside of GFS – Periodic snapshots – Mirrored operations log
  • 14. Master Node • Metadata is in memory, so it's fast! – A 64 MB chunk needs less than 64 B of metadata => for 640 TB, less than 640 MB • Asks Chunk Servers which chunks they hold when – the Master starts – a Chunk Server joins the cluster • Operation log – Is used for serialization of concurrent operations – Replicated – Responds to the client only when the log is flushed locally and remotely
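The in-memory metadata estimate follows directly from the chunk size. The sketch below reproduces it, assuming binary units (1 MB = 2^20 bytes) and treating the 64 B per chunk as an upper bound, as the slide does; the function name is made up for illustration.

```python
# Upper bound on master metadata, using the figures from the slide.
# Assumption: binary units, i.e. 1 MB = 2**20 bytes and 1 TB = 2**40 bytes.
CHUNK_SIZE_BYTES = 64 * 2**20    # 64 MB per chunk
METADATA_BYTES_PER_CHUNK = 64    # "less than 64 B", treated as an upper bound


def master_metadata_bytes(stored_bytes):
    """Upper bound on the metadata the master keeps in memory."""
    # Ceiling division: a partially filled chunk still needs an entry.
    chunks = -(-stored_bytes // CHUNK_SIZE_BYTES)
    return chunks * METADATA_BYTES_PER_CHUNK


stored = 640 * 2**40  # 640 TB of file data
metadata = master_metadata_bytes(stored)

print(f"chunks: {stored // CHUNK_SIZE_BYTES:,}")
print(f"metadata upper bound: {metadata / 2**20:.0f} MB")
```

The ratio is fixed at one byte of metadata per megabyte of data, which is why a single machine's memory can cover hundreds of terabytes of storage.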
  • 15. Chunk Servers • 64 MB chunks stored as Linux files – Reduces the size of the master's data structures – Reduces client-master interaction – Internal fragmentation => allocate space lazily – Possible hotspots => re-replicate • Fault tolerance – Heartbeat to the master – Something wrong => master initiates re-replication
  • 16. Mutation Order (figure labels, in order) • Client to master: current lease holder? • Master to client: identity of primary, location of replicas (cached by client) • 3a, 3b, 3c: client pushes data to the replicas • Client to primary: write request • Primary assigns # to mutations, applies them, forwards the write request to the secondaries • Secondaries to primary: operation completed • Primary to client: operation completed or error report
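The key idea of the mutation order is that the primary alone picks the serial order, and every replica applies mutations in that order regardless of when the data arrived. The following is a toy, single-process sketch of that idea only: the classes `Replica` and `Primary` and their methods are hypothetical, and leases, the master, networking, and error reporting are all left out.

```python
class Replica:
    """Toy chunk replica: data is staged first, then applied in serial order."""

    def __init__(self):
        self.staged = {}    # data pushed by clients, keyed by a data id
        self.applied = []   # mutations applied so far, in serial order

    def push_data(self, data_id, data):
        # Data flow: the payload arrives before the write request.
        self.staged[data_id] = data

    def apply(self, serial, data_id):
        # Control flow: mutations must be applied in serial-number order.
        if serial != len(self.applied):
            raise ValueError("mutation applied out of order")
        self.applied.append(self.staged.pop(data_id))


class Primary(Replica):
    """Toy primary: assigns serial numbers and forwards the write request."""

    def __init__(self, secondaries):
        super().__init__()
        self.secondaries = secondaries

    def write(self, data_id):
        serial = len(self.applied)            # next serial number
        self.apply(serial, data_id)           # apply locally first
        for secondary in self.secondaries:    # then forward to secondaries
            secondary.apply(serial, data_id)
        return serial


secondaries = [Replica(), Replica()]
primary = Primary(secondaries)
replicas = [primary] + secondaries

# Two payloads reach the replicas in different orders...
arrival_orders = (["a", "b"], ["b", "a"], ["a", "b"])
for replica, order in zip(replicas, arrival_orders):
    for data_id in order:
        replica.push_data(data_id, f"payload-{data_id}")

# ...but the write requests reach the primary as b, then a.
primary.write("b")
primary.write("a")

for replica in replicas:
    print(replica.applied)
```

All three replicas end up with the same sequence because only the order chosen by the primary matters, not the order in which data was pushed.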
  • 17. Control & Data Flow • Decouple control flow and data flow • Control flow – Master -> Primary -> Secondaries • Data flow – Carefully picked chain of Chunk Servers • Forward to the closest first • Distance estimated from IP addresses – Fully utilize outbound bandwidth – Pipelining to exploit full-duplex links
  • 18. Other Important Things • Snapshot operation – makes a copy very fast • Smart chunk creation policy – Below-average disk utilization, limited # of recent creations • Smart re-replication policy – Under-replicated chunks first – Chunks that are blocking clients – Live files first (rather than deleted) • Rebalance and GC periodically • How to process data stored in GFS?
  • 19. MapReduce* • A simple programming model applicable to many large-scale computing problems • Divide-and-conquer strategy • Hide messy details in the MapReduce runtime library: – Automatic parallelization – Load balancing – Network and disk transfer optimizations – Fault tolerance – Part of the stack: improvements to the core library benefit all users of the library * MapReduce: Simplified Data Processing on Large Clusters. J. Dean, S. Ghemawat. OSDI, 2004
  • 20. Typical problem • Read a lot of data and break it into parts • Map: extract something important from each part • Shuffle and Sort • Reduce: aggregate, summarize, filter, transform Map results • Write the results • Chain, Cascade • Implement Map and Reduce to fit the problem
  • 21. Nice Example
  • 22. Other suitable examples • Distributed grep • Distributed sort (Hadoop MapReduce won TeraSort) • Term-vector per host • Document clustering • Machine learning • Web access log stats • Web link-graph reversal • Inverted index construction • Statistical machine translation
  • 23. Model • Programmer specifies two primary methods – Map(k,v) -> <k’,v’>* – Reduce(k’, <v’>*) -> <k’,v’>* • All v’ with the same k’ are reduced together, in order • Usually also specify how to partition k’ – Partition(k’, total partitions) -> partition for k’ • Often a simple hash of the key • Allows reduce operations for different k’ to be parallelized
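The Partition function described above can be sketched as a hash of the key modulo the number of partitions. In the sketch below, the choice of `zlib.crc32` is an assumption made so that the result is stable across processes (Python's built-in `hash` for strings is randomized per run); the helper `shuffle` is a hypothetical name, not part of any MapReduce library.

```python
import zlib
from collections import defaultdict


def partition(key: str, total_partitions: int) -> int:
    """Map an intermediate key to a reduce partition with a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % total_partitions


def shuffle(pairs, total_partitions):
    """Group intermediate (key, value) pairs by partition, then by key."""
    partitions = [defaultdict(list) for _ in range(total_partitions)]
    for key, value in pairs:
        partitions[partition(key, total_partitions)][key].append(value)
    return partitions


intermediate = [("gfs", 1), ("map", 1), ("gfs", 1), ("reduce", 1)]
for index, groups in enumerate(shuffle(intermediate, 3)):
    print(index, dict(groups))
```

Because every occurrence of a key lands in the same partition, each partition can be handed to a different reducer and processed independently.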
  • 24. Code
      Map
        map(String key, String value):
          // key: document name
          // value: document contents
          for each word w in value:
            EmitIntermediate(w, "1");
      Reduce
        reduce(String key, Iterator values):
          // key: a word
          // values: a list of counts
          int result = 0;
          for each v in values:
            result += ParseInt(v);
          Emit(AsString(result))
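The word-count pseudocode above can be made runnable with a tiny single-process stand-in for the runtime. This is only a sketch of the programming model: `run_mapreduce` is a hypothetical helper that does the grouping in memory, with none of the distribution, scheduling, or fault tolerance that the real library provides.

```python
from collections import defaultdict


def map_fn(key, value):
    # key: document name, value: document contents
    for word in value.split():
        yield word, "1"


def reduce_fn(key, values):
    # key: a word, values: a list of counts (as strings, as in the slide)
    result = 0
    for v in values:
        result += int(v)
    yield key, str(result)


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Run map, group by intermediate key, then reduce, all in one process."""
    groups = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    output = {}
    for out_key in sorted(groups):  # keys are processed in sorted order
        for final_key, final_value in reduce_fn(out_key, groups[out_key]):
            output[final_key] = final_value
    return output


docs = [("doc1", "the quick fox"), ("doc2", "the lazy dog the end")]
counts = run_mapreduce(docs, map_fn, reduce_fn)
print(counts)
```

Only `map_fn` and `reduce_fn` are specific to word counting; the same `run_mapreduce` helper would work unchanged for other problems that fit the model.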
  • 25. Architecture • One master, many workers • Infrastructure manages scheduling and distribution • User implements Map and Reduce
  • 26. Architecture
  • 27. Architecture
  • 28. Architecture • Combiner = local Reduce
  • 29. Important things • Mappers scheduled close to data • Chunk replication improves locality • Reducers often run on the same machine as mappers • Fault tolerance – Map crash: re-launch all tasks of that machine – Reduce crash: repeat the crashed task only – Master crash: repeat the whole job – Skip bad records • Fighting ‘stragglers’ by launching backup tasks • Proven scalability • In September 2009 Google ran 3,467,000 MR jobs averaging 488 machines per job • Extensively used at Yahoo and Facebook with Hadoop
  • 30. Benchmarks: GFS
  • 31. Benchmarks: MapReduce
  • 32. Conclusions “We believe we get tremendous competitive advantage by essentially building our own infrastructure” -- Eric Schmidt • GFS & MapReduce – Google achieved its goals – A fundamental part of its stack • Open source implementations – GFS -> Hadoop Distributed FS (HDFS) – MapReduce -> Hadoop MapReduce
  • 33. Thank you for your attention! Andrii.Vozniuk@epfl.ch