the cloud
COMPUTING
Outline
 Large-scale Distributed Systems
 Introduction to Cloud Computing
 Cloud Computing paradigms and models
 Introduction to MapReduce
 Alternative architectures
 Writing Application using Hadoop
Distributed Systems
 Set of discrete machines which cooperate to perform
computation
 Give the illusion of a single “machine”
 Keep the distribution transparent
 Examples:
 Compute clusters
 Distributed storage systems, such as Dropbox, Google Drive, etc.
 The Web
Characteristics
 Ordering
 Time is used to ensure ordering
 In most cases, it is enough to know that event a happened before
event b, known as the happens-before relation
 Distributed Mutual Exclusion
 Concurrent access to shared resources needs to be
synchronized
 Central lock server: All lock requests are handled by a
central server
 Token passing: Arrange nodes into a ring and a token is
passed around
 Totally-ordered multicast: Clients multicast requests to
each other
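The happens-before relation under Ordering can be tracked without a global clock. One common technique, which the slides do not name, is a Lamport logical clock; the sketch below assumes it, and the class and method names are hypothetical. Each node keeps a counter, increments it on every local event, and on receipt of a message jumps past the sender's timestamp, so a send always carries a smaller timestamp than the matching receive.

```java
// Hypothetical illustration: a Lamport logical clock for a single node.
final class LamportClock {
    private long time = 0;

    // Local event or message send: advance the clock and return the timestamp.
    synchronized long tick() {
        return ++time;
    }

    // Message receive: move past the sender's timestamp, then count the receive event.
    synchronized long receive(long senderTime) {
        time = Math.max(time, senderTime) + 1;
        return time;
    }
}

final class LamportDemo {
    public static void main(String[] args) {
        LamportClock nodeA = new LamportClock();
        LamportClock nodeB = new LamportClock();
        long send = nodeA.tick();        // node A sends a message
        long recv = nodeB.receive(send); // node B receives it
        System.out.println("send=" + send + " recv=" + recv); // send=1 recv=2
    }
}
```

The timestamps only guarantee that if a happened before b then a's timestamp is smaller; the converse does not hold for unrelated events.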
Characteristics (2)
 Distributed transactions
 Distributed transactions span multiple transaction processing servers
 Actions need to be coordinated across multiple parties
 Replication
A number of distributed systems involve replication
 Data replication: Multiple copies of some object stored at different servers
 Computation replication: Multiple servers capable of providing an
operation
 Advantages
1. Load balancing: Work spread out across replicas
2. Lower latency: Better performance if replica close to the client
3. Fault tolerance: Failure of some replicas can be tolerated
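The latency and fault-tolerance advantages can be seen in how a client might pick a replica. This is a minimal sketch under stated assumptions: the `Replica` type, its fields, and the selection rule (closest live replica wins) are all hypothetical and not taken from any particular system.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical replica descriptor: name, measured latency, and liveness flag.
final class Replica {
    final String name;
    final int latencyMs;
    final boolean alive;

    Replica(String name, int latencyMs, boolean alive) {
        this.name = name;
        this.latencyMs = latencyMs;
        this.alive = alive;
    }
}

final class ReplicaSelector {
    // Skip failed replicas (fault tolerance) and prefer the closest one (lower latency).
    // Returns an empty Optional only when every replica has failed.
    static Optional<Replica> choose(List<Replica> replicas) {
        return replicas.stream()
                .filter(r -> r.alive)
                .min(Comparator.comparingInt(r -> r.latencyMs));
    }
}
```

As long as at least one replica is alive, the request can still be served, only at a higher latency.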
CAP
 CAP
 Consistency: All nodes see the same state
 Availability: All requests get a response
 Partition tolerance: System continues to operate even when
the network splits nodes into groups that cannot communicate
 Brewer's conjecture states that a distributed
system can guarantee only 2 of the 3 properties at once
 In practice, partitions are a given:
hardware and software fail all the time
 Therefore, during a partition, systems need to choose between
consistency and availability
Advantages
 Scalability:
 The scale of the Internet (think how many queries Google servers handle daily)
 Only a matter of adding more machines
 Cheaper than supercomputers
 More machines means more parallelism, hence better performance
 Sharing:
 The same resource is shared between multiple users
 Just like the Internet is shared between millions of users
 Communication:
 Communication between (potentially geographically isolated) machines and
users (via email, Facebook, etc.)
 Reliability:
 The service can remain active even if multiple machines go down
Challenges
 Concurrency:
 Concurrent execution requires some form of coordination
 Fault-tolerance:
 Any component can fail at any instant due to a software or
hardware fault
 Security:
 One machine can compromise the entire system
 Coordination:
 No global clock, so coordination is non-trivial
 Troubleshooting:
 Hard to troubleshoot because it is hard to reason about the
system
Introduction to Cloud Computing
 An emerging IT development, deployment, and delivery model that enables
real-time delivery of a broad range of IT products, services and solutions over
the internet
 A realization of utility computing in which computation, storage, and services
are offered as a metered service
 Grid Computing: form of distributed computing, acting
in concert to perform very large tasks
 Utility Computing: metered service similar to a
traditional public utility
 Autonomic Computing: capable of self-management
 Cloud Computing: deployments as of 2009 depend on
grids, have autonomic characteristics and bill like
utilities
Characteristics
 On-demand self-service: allows users to obtain,
configure and deploy cloud services themselves using
cloud service catalogues, without requiring the
assistance of IT.
 Broad network access: capabilities are available over
the network and accessed through standard
mechanisms that promote use by heterogeneous thin
or thick client platforms
 Resource pooling: The provider's computing resources
are pooled to serve multiple consumers using a multi-tenant
model, with different physical and virtual
resources dynamically assigned and reassigned
according to consumer demand.
Characteristics (2)
 Rapid elasticity: Capabilities can be rapidly and
elastically provisioned, in some cases automatically,
to quickly scale out and rapidly released to quickly
scale in. To the consumer, the capabilities available
for provisioning often appear to be unlimited and
can be purchased in any quantity at any time.
 Measured service: Cloud systems automatically
control and optimize resource use by leveraging a
metering capability at some level of abstraction
appropriate to the type of service (e.g., storage,
processing, bandwidth, and active user accounts).
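Rapid elasticity and measured service both reduce to simple arithmetic. The sketch below is illustrative only: the sizing rule, the bounds, and the per-hour rate are hypothetical parameters and do not describe any provider's actual policy or pricing.

```java
// Hypothetical illustration of elastic sizing and metered billing.
final class ElasticPool {
    // Instances needed to serve the current load, clamped to [min, max].
    // Assumes each instance can handle `perInstance` requests per second.
    static int desiredInstances(int requestsPerSec, int perInstance, int min, int max) {
        int needed = (requestsPerSec + perInstance - 1) / perInstance; // ceiling division
        return Math.max(min, Math.min(max, needed));
    }

    // Metered bill in cents: consumed instance-hours times a per-hour rate.
    static long billCents(long instanceHours, long centsPerInstanceHour) {
        return instanceHours * centsPerInstanceHour;
    }
}
```

For example, `ElasticPool.desiredInstances(950, 100, 1, 20)` scales out to 10 instances, and the pool scales back in to the minimum of 1 when the load drops to zero.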
Cloud Service Models
 SaaS – Software as a Service: Network-hosted
application
 PaaS – Platform as a Service: Network-hosted software
development platform
 IaaS – Infrastructure as a Service: Provider hosts
customer VMs or provides network storage
 DaaS – Data as a Service: Customer queries against
provider’s database
 IPMaaS – Identity and Policy Management as a
Service: Provider manages identity and/or access
control policy for customer
 NaaS – Network as a Service: Provider offers virtualized
networks (e.g. VPNs)
Deployment Models
 Private Cloud: infrastructure is operated solely for an
organization.
 Public Cloud: infrastructure is made available to the general
public as a pay-as-you-go model, e.g. Amazon Web Services,
Google AppEngine, and Microsoft Azure
 Community Cloud: infrastructure shared between several
organizations from a specific community with common
concerns (security, compliance, jurisdiction, etc.), whether
managed internally or by a third-party and hosted internally or
externally.
 Hybrid Cloud: infrastructure is a combination of two or more
clouds (private, community, or public) that remain unique
entities but are bound together by standardized or proprietary
technology that enables data and application portability
between environments.
[Figure: deployment models (Private Cloud, Private Outsourced Cloud, Public Cloud, Hybrid Cloud)]
Advantages
Advantages to both service providers and end users
 Service providers:
 Simplified software installation and maintenance
 Centralized control over versioning
 No need to build, provision, and maintain a datacenter
 On the fly scaling
 End users:
 “Anytime, anywhere” access
 Share data and collaborate easily
 Safeguard data stored in the infrastructure
Obstacles
 Bugs in large-scale distributed systems: Hard to
debug large-scale applications in full deployment
 Scaling quickly: Automatically scaling while
conserving resources and money is an open-ended
problem
 Reputation fate sharing: Bad behavior by one
tenant can reflect badly on the rest
 Software licensing: Gap between pay-as-you-go
model and software licensing
Obstacles (2)
 Service availability: Possibility of cloud outage
 Data lock-in: Dependence on cloud specific APIs
 Security: Requires strong encrypted storage, VLANs,
and network middle-boxes (firewalls, etc.)
 Data transfer bottlenecks: Moving large amounts of
data in and out is expensive
 Performance unpredictability: Resource sharing
between applications
 Scalable storage: No standard model to arbitrarily
scale storage up and down on-demand while
ensuring data durability and high availability
Introduction to MapReduce
 A simple programming model that applies to many
large-scale computing problems
 Hide messy details in MR runtime library:
 Automatic parallelization
 Load balancing
 Network and disk transfer optimization
 Handling of machine failures
 Robustness
 Improvements to core library benefit all users of library
Google MapReduce – Idea
 The core idea behind MapReduce is mapping your
data set into a collection of <key, value> pairs, and
then reducing over all pairs with the same key.
 Map
 Apply function to all elements of a list
 square x = x * x;
 Map square [1, 2, 3, 4, 5];
 [1, 4, 9, 16, 25]
 Reduce
 Combine all elements of a list
 Reduce (+)[1, 2, 3, 4, 5];
 15
Google MapReduce – Overview
MapReduce architecture
 Master: In charge of all meta data, work scheduling
and distribution, and job orchestration
 Workers: Contain slots to execute map or reduce
functions
 Mappers:
 A map worker reads the contents of the input split that it has
been assigned
 It parses the file and converts it to key/value pairs and invokes
the user-defined map function for each pair
 The intermediate key/value pairs after the application of the
map logic are collected (buffered) in memory
 Once the buffered key/value pairs exceed a threshold they are
written to local disk and partitioned (using a partitioning
function) into R partitions. The location of each partition is passed
to the master
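The partitioning function mentioned above decides which of the R reduce tasks receives a given intermediate key. A minimal sketch, assuming the common choice of hashing the key modulo R (the class name here is hypothetical, and a job may plug in a different function):

```java
final class HashPartitioner {
    // Map a key to one of numPartitions buckets, numbered 0 .. numPartitions - 1.
    static int partition(String key, int numPartitions) {
        // Mask off the sign bit so negative hash codes still yield a valid index.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```

Because the result depends only on the key, every occurrence of a key lands in the same partition no matter which map worker emitted it, which is what lets a single reducer see all values for that key.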
MapReduce architecture (2)
 Workers: Contain slots to execute map or reduce
functions
 Reducers:
 A reduce worker gets locations of its input partitions from the
master and uses HTTP requests to retrieve them
 Once it has read all its input, it sorts it by key to group
together all occurrences of the same key
 It then invokes the user-defined reduce for each key and
passes it the key and its associated values
 The key/value pairs generated after the application of the
reduce logic are then written to a final output file, which is
subsequently written to the distributed filesystem
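The reduce-side steps above (sort by key, group equal keys, call reduce once per key) can be sketched in plain Java. This is an in-memory illustration with a summing reduce; the class and helper names are hypothetical, and a real reduce worker would sort data fetched over the network rather than a local list.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ReduceSide {
    // Convenience constructor for an intermediate <key, value> record.
    static Map.Entry<String, Integer> kv(String key, int value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    // Sort the fetched records by key, then reduce each run of equal keys by summing.
    static Map<String, Integer> sortGroupReduce(List<Map.Entry<String, Integer>> fetched) {
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(fetched);
        sorted.sort(Map.Entry.comparingByKey());

        Map<String, Integer> output = new LinkedHashMap<>(); // preserves sorted key order
        int i = 0;
        while (i < sorted.size()) {
            String key = sorted.get(i).getKey();
            int sum = 0;
            // Consume the whole run of records that share this key.
            while (i < sorted.size() && sorted.get(i).getKey().equals(key)) {
                sum += sorted.get(i).getValue();
                i++;
            }
            output.put(key, sum);
        }
        return output;
    }
}
```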
Google File System - GFS
 In-house distributed file system at Google
 Stores all input and output files
 Stores files…
 divided into 64 MB blocks
 on at least 3 different machines
 Machines running GFS also run MapReduce
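The block size and replication factor above translate directly into storage arithmetic. The sketch below uses the 64 MB block size and the minimum of 3 copies stated on the slide; the class and method names are hypothetical.

```java
final class BlockMath {
    static final long BLOCK_BYTES = 64L * 1024 * 1024; // 64 MB block size
    static final int REPLICAS = 3;                     // minimum number of copies per block

    // Number of blocks needed to hold a file (the last block may be partly filled).
    static long blocksFor(long fileBytes) {
        return (fileBytes + BLOCK_BYTES - 1) / BLOCK_BYTES; // ceiling division
    }

    // Total block copies kept across the cluster for one file.
    static long storedBlockCopies(long fileBytes) {
        return blocksFor(fileBytes) * REPLICAS;
    }
}
```

A 1 GB file, for instance, occupies 16 blocks and therefore at least 48 block copies spread over different machines.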
MapReduce job phases
A MapReduce job can be divided into 4 phases:
 Input split: The input dataset is sliced into M splits, one
per map task
 Map logic: The user-supplied map function is invoked
 In tandem a sort phase is also applied that ensures that
map output is locally sorted by key
 In addition, the key space is also partitioned amongst the
reducers
 Shuffle: Each reduce task fetches its partition of the map output from every map task
 Reduce logic: The user-provided reduce function is
invoked
 Before the application of the reduce function, the input
keys are merged to get globally sorted key/value pairs
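The four phases can be traced end to end with a single-process word count. This is only a sketch of the data flow under simplifying assumptions: everything runs in memory, each input string stands for one split, and the hash-modulo partitioning rule and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class MiniMapReduce {
    // Returns one sorted <word, count> map per reducer.
    static List<Map<String, Integer>> wordCount(List<String> splits, int numReducers) {
        // One bucket of intermediate data per reducer; TreeMap keeps keys sorted.
        List<TreeMap<String, List<Integer>>> partitions = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) {
            partitions.add(new TreeMap<>());
        }

        // Input split + map logic: each split is tokenized and emits <word, 1>.
        for (String split : splits) {
            for (String word : split.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // an empty split yields no records
                }
                // Partition + shuffle: every key is routed to exactly one reducer.
                int r = (word.hashCode() & Integer.MAX_VALUE) % numReducers;
                partitions.get(r).computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }

        // Reduce logic: each reducer visits its keys in sorted order and sums the values.
        List<Map<String, Integer>> outputs = new ArrayList<>();
        for (TreeMap<String, List<Integer>> partition : partitions) {
            Map<String, Integer> out = new TreeMap<>();
            for (Map.Entry<String, List<Integer>> e : partition.entrySet()) {
                int sum = 0;
                for (int v : e.getValue()) {
                    sum += v;
                }
                out.put(e.getKey(), sum);
            }
            outputs.add(out);
        }
        return outputs;
    }
}
```

With two reducers the result is two independent output maps, one per reduce task, mirroring the separate output files a real job writes.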
Google MapReduce – Example
Wordcount map in Java
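// Fields of the enclosing Mapper<Object, Text, Text, IntWritable> subclass
private static final IntWritable one = new IntWritable(1);
private final Text word = new Text();

public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
    // Split the input line into whitespace-separated tokens
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one); // emit <word, 1>
    }
}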
Wordcount reduce in Java
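// Field of the enclosing Reducer<Text, IntWritable, Text, IntWritable> subclass
private final IntWritable result = new IntWritable();

public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
    // Sum all counts emitted for this word
    int sum = 0;
    for (IntWritable val : values) {
        sum += val.get();
    }
    result.set(sum);
    context.write(key, result); // emit <word, total>
}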
Hadoop
 Open-source implementation of MapReduce, started by
Doug Cutting and Mike Cafarella within the Nutch project and
developed further at Yahoo! from 2006
 Now a top-level Apache open-source project
 Implemented in Java (Google's in-house
implementation is in C++)
 Jobs can be written in C++, Java, Python, etc.
 Comes with an associated distributed filesystem,
HDFS (clone of GFS)
Hadoop Components
 Hadoop consists of two core components
– The Hadoop Distributed File System (HDFS)
– MapReduce Software Framework
 There are many other projects based around
core Hadoop
– Often referred to as the 'Hadoop
Ecosystem'
– Pig, Hive, HBase,
Flume, Oozie, Sqoop, etc.
Hadoop Users
 Adobe: Several areas from social services to
unstructured data storage and processing
 eBay: 532 nodes cluster storing 5.3PB of data
 Facebook: Used for reporting/analytics; one cluster
with 1100 nodes (12PB) and another with 300 nodes
(3PB)
 LinkedIn: 3 clusters with collectively 4000 nodes
 Twitter: To store and process Tweets and log files
 Yahoo!: Multiple clusters with collectively 40000
nodes; largest cluster has 4500 nodes!
Running a Hadoop Application
 The first step is to format the Hadoop DFS
 Jump to the Hadoop directory and execute:
bin/hadoop namenode -format
 Running Hadoop
 To run Hadoop and HDFS:
bin/start-all.sh
 To terminate them:
bin/stop-all.sh
Running a Hadoop Application (2)
 Generating a dataset
 Create a temporary directory to hold the data:
 mkdir /tmp/gutenberg
 Jump to it:
 cd /tmp/gutenberg
 Download text files:
 wget www.gutenberg.org/etext/20417
 wget www.gutenberg.org/etext/5000
 wget www.gutenberg.org/etext/4300
Running a Hadoop Application (3)
 Copying the dataset to the HDFS
 Jump to the Hadoop directory and execute:
 bin/hadoop dfs -copyFromLocal /tmp/gutenberg /ccw/gutenberg
 Running Wordcount
 bin/hadoop jar hadoop-examples-1.0.4.jar wordcount /ccw/gutenberg /ccw/gutenberg-output
 Retrieving results from the HDFS
 Copy to the local FS:
 bin/hadoop dfs -getmerge /ccw/gutenberg-output /tmp/gutenberg-output
Running a Hadoop Application (4)
 Accessing the web interface
 JobTracker: http://localhost:50030
 TaskTracker: http://localhost:50060
 Reference: Running Hadoop on Ubuntu Linux (Single-Node Cluster):
 http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
Thanks

Cloud computing

  • 2. Outline  Large-scale Distributed Systems  Introduction to Cloud Computing  Cloud Computing paradigms and models  Introduction to MapReduce  Alternative architectures  Writing Application using Hadoop
  • 3. Distributed Systems  Set of discrete machines which cooperate to perform computation  Give the notion of a single “machine”  Keep the distribution transparent  Examples:  Compute clusters  Distributed storage systems, such as Dropbox, Google Drive, etc.  The Web
  • 4. Characteristics  Ordering  Time is used to ensure ordering  In most cases, only need to know that a happened before b, known as the happens-before relation  Distributed Mutual Exclusion  Concurrent access to shared resources needs to be synchronized  Central lock server: All lock requests are handled by a central server  Token passing: Arrange nodes into a ring and a token is passed around  Totally-ordered multicast: Clients multicast requests to each other
  • 5. Characteristics (2)  Distributed transactions  Distributed transactions span multiple transaction processing servers  Actions need to be coordinated across multiple parties  Replication A number of distributed systems involve replication  Data replication: Multiple copies of some object stored at different servers  Computation replication: Multiple servers capable of providing an operation  Advantages 1. Load balancing: Work spread out across clients 2. Lower latency: Better performance if replica close to the client 3. Fault tolerance: Failure of some replicas can be tolerated
  • 6. CAP  CAP  Consistency: All nodes see the same state  Availability: All requests get a response  Partition tolerance: System continues to operate even when the network splits nodes into groups that cannot communicate  Brewer's conjecture states that a distributed system can guarantee only 2 of the 3 properties at once  In practice, partitions are a given: Hardware/software fails all the time  Therefore, systems need to choose between consistency and availability
  • 7. Advantages  Scalability:  The scale of the Internet (think how many queries Google servers handle daily)  Only a matter of adding more machines  Cheaper than supercomputers  More machines means more parallelism, hence better performance  Sharing:  The same resource is shared between multiple users  Just like the Internet is shared between millions of users  Communication:  Communication between (potentially geographically isolated) machines and users (via email, Facebook, etc.)  Reliability:  The service can remain active even if multiple machines go down
  • 8. Challenges  Concurrency:  Concurrent execution requires some form of coordination  Fault-tolerance:  Any component can fail at any instant due to a software or hardware fault  Security:  One compromised machine can compromise the entire system  Coordination:  No global time, so coordination is non-trivial  Troubleshooting:  Hard to troubleshoot because it is hard to reason about the system as a whole
  • 9. Introduction to Cloud Computing  An emerging IT development, deployment, and delivery model that enables real-time delivery of a broad range of IT products, services and solutions over the internet  A realization of utility computing in which computation, storage, and services are offered as a metered service  Grid Computing: form of distributed computing, acting in concert to perform very large tasks  Utility Computing: metered service similar to a traditional public utility  Autonomic Computing: capable of self-management  Cloud Computing: deployments as of 2009 depend on grids, have autonomic characteristics and bill like utilities
  • 10. Characteristics  On-demand self-service: allows users to obtain, configure and deploy cloud services themselves using cloud service catalogues, without requiring the assistance of IT.  Broad network access: capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms  Resource pooling: The provider's computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand.
  • 11. Characteristics (2)  Rapid elasticity: Capabilities can be rapidly and elastically provisioned, in some cases automatically, to quickly scale out and rapidly released to quickly scale in. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be purchased in any quantity at any time.  Measured service: Cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts).
  • 12. Cloud Service Models  SaaS – Software as a Service: Network-hosted application  PaaS – Platform as a Service: Network-hosted software development platform  IaaS – Infrastructure as a Service: Provider hosts customer VMs or provides network storage  DaaS – Data as a Service: Customer queries against provider’s database  IPMaaS – Identity and Policy Management as a Service: Provider manages identity and/or access control policy for customer  NaaS – Network as a Service: Provider offers virtualized networks (e.g. VPNs)
  • 13. Deployment Models  Private Cloud: infrastructure is operated solely for an organization.  Public Cloud: infrastructure is made available to the general public under a pay-as-you-go model, e.g. Amazon Web Services, Google AppEngine, and Microsoft Azure  Community Cloud: infrastructure is shared between several organizations from a specific community with common concerns (security, compliance, jurisdiction, etc.), whether managed internally or by a third party and hosted internally or externally.  Hybrid Cloud: infrastructure is a combination of two or more clouds (private, community, or public) that remain unique entities but are bound together by standardized or proprietary technology that enables data and application portability between environments.
  • 18. Advantages Advantages to both service providers and end users  Service providers:  Simplified software installation and maintenance  Centralized control over versioning  No need to build, provision, and maintain a datacenter  On the fly scaling  End users:  “Anytime, anywhere” access  Share data and collaborate easily  Safeguard data stored in the infrastructure
  • 19. Obstacles  Bugs in large-scale distributed systems: Hard to debug large-scale applications in full deployment  Scaling quickly: Automatically scaling while conserving resources and money is an open ended problem  Reputation fate sharing: Bad behavior by one tenant can reflect badly on the rest  Software licensing: Gap between pay-as-you-go model and software licensing
  • 20. Obstacles (2)  Service availability: Possibility of cloud outage  Data lock-in: Dependence on cloud specific APIs  Security: Requires strong encrypted storage, VLANs, and network middle-boxes (firewalls, etc.)  Data transfer bottlenecks: Moving large amounts of data in and out is expensive  Performance unpredictability: Resource sharing between applications  Scalable storage: No standard model to arbitrarily scale storage up and down on-demand while ensuring data durability and high availability
  • 21. Introduction to MapReduce  A simple programming model that applies to many large-scale computing problems  Hide messy details in MR runtime library:  Automatic parallelization  Load balancing  Network and disk transfer optimization  Handling of machine failures  Robustness  Improvements to core library benefit all users of library
  • 22. Google MapReduce – Idea  The core idea behind MapReduce is mapping your data set into a collection of <key, value> pairs, and then reducing over all pairs with the same key.  Map  Apply function to all elements of a list  square x = x * x;  Map square [1, 2, 3, 4, 5];  [1, 4, 9, 16, 25]  Reduce  Combine all elements of a list  Reduce (+)[1, 2, 3, 4, 5];  15
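The `Map square` and `Reduce (+)` examples on the slide above can be written with Java streams. This is a minimal sketch of the idea only, not of the distributed runtime; the class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class MapReduceIdea {
    // Map: apply a function (here, squaring) to every element of a list.
    static List<Integer> mapSquare(List<Integer> xs) {
        return xs.stream().map(x -> x * x).collect(Collectors.toList());
    }

    // Reduce: combine all elements of a list with (+), starting from 0.
    static int reduceSum(List<Integer> xs) {
        return xs.stream().reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(mapSquare(xs)); // [1, 4, 9, 16, 25]
        System.out.println(reduceSum(xs)); // 15
    }
}
```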
  • 24. MapReduce architecture  Master: In charge of all meta data, work scheduling and distribution, and job orchestration  Workers: Contain slots to execute map or reduce functions  Mappers:  A map worker reads the contents of the input split that it has been assigned  It parses the file and converts it to key/value pairs and invokes the user-defined map function for each pair  The intermediate key/value pairs after the application of the map logic are collected (buffered) in memory  Once the buffered key/value pairs exceed a threshold they are written to local disk and partitioned (using a partitioning function) into R partitions. The location of each partition is passed to the master
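The slide above mentions a partitioning function that splits map output into R partitions but does not define it. A common choice is hashing the key modulo R; the sketch below assumes that choice, and the class and method names are illustrative.

```java
final class HashPartition {
    // Assign a key to one of r partitions (assumed scheme: key hash modulo r).
    // Masking with Integer.MAX_VALUE clears the sign bit so the result is never negative.
    static int partitionFor(String key, int r) {
        return (key.hashCode() & Integer.MAX_VALUE) % r;
    }
}
```

Because the partition depends only on the key, every occurrence of a given key lands in the same partition regardless of which mapper produced it, which is what lets a single reducer see all values for that key.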
  • 25. MapReduce architecture (2)  Workers: Contain slots to execute map or reduce functions  Reducers:  A reduce worker gets locations of its input partitions from the master and uses HTTP requests to retrieve them  Once it has read all its input, it sorts it by key to group together all occurrences of the same key  It then invokes the user-defined reduce for each key and passes it the key and its associated values  The key/value pairs generated after the application of the reduce logic are then written to a final output file, which is subsequently written to the distributed filesystem
  • 26. Google File System - GFS  In-house distributed file system at Google  Stores all input and output files  Stores files…  divided into 64 MB blocks  on at least 3 different machines  Machines running GFS also run MapReduce
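To make the storage figures on the slide above concrete, the sketch below computes how many 64 MB blocks a file needs and how many block copies result from keeping 3 replicas. It is plain arithmetic under those two stated numbers; the class and method names are illustrative and not part of GFS.

```java
final class GfsBlocks {
    static final long BLOCK_BYTES = 64L * 1024 * 1024; // 64 MB block size, from the slide
    static final int REPLICAS = 3;                      // minimum number of copies, from the slide

    // Number of blocks needed for a file; the last block may be only partly filled.
    static long blockCount(long fileBytes) {
        return (fileBytes + BLOCK_BYTES - 1) / BLOCK_BYTES;
    }

    // Total number of block copies stored across the cluster.
    static long replicaCount(long fileBytes) {
        return blockCount(fileBytes) * REPLICAS;
    }
}
```

For example, a 1 GB file (1024 MB) occupies 16 blocks, so at least 48 block copies are stored in total.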
  • 27. MapReduce job phases A MapReduce job can be divided into 4 phases:  Input split: The input dataset is sliced into M splits, one per map task  Map logic: The user-supplied map function is invoked  In tandem a sort phase is also applied that ensures that map output is locally sorted by key  In addition, the key space is also partitioned amongst the reducers  Shuffle: Map output is relayed to all reduce tasks  Reduce logic: The user-provided reduce function is invoked  Before the application of the reduce function, the input keys are merged to get globally sorted key/value pairs
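The four phases above can be imitated in a single process. The sketch below is an in-memory word count using only the Java standard library, not Hadoop: each input string stands for one split, and keys are routed to reducers by hash modulo R (an assumed partitioning scheme). All names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class MiniWordCount {
    // Runs input split, map, shuffle and reduce in memory; returns one output map per reducer.
    static List<Map<String, Integer>> run(List<String> splits, int r) {
        // One sorted map per reducer stands in for its merged, key-sorted input.
        List<TreeMap<String, List<Integer>>> reducerInput = new ArrayList<>();
        for (int i = 0; i < r; i++) {
            reducerInput.add(new TreeMap<>());
        }

        // Map logic, partitioning and shuffle: emit (word, 1) and route it by key hash.
        for (String split : splits) {
            for (String word : split.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // an empty split yields a single empty token
                }
                int p = (word.hashCode() & Integer.MAX_VALUE) % r;
                reducerInput.get(p).computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }

        // Reduce logic: sum the values of each key, visiting keys in sorted order.
        List<Map<String, Integer>> output = new ArrayList<>();
        for (TreeMap<String, List<Integer>> input : reducerInput) {
            Map<String, Integer> result = new TreeMap<>();
            for (Map.Entry<String, List<Integer>> e : input.entrySet()) {
                int sum = 0;
                for (int v : e.getValue()) {
                    sum += v;
                }
                result.put(e.getKey(), sum);
            }
            output.add(result);
        }
        return output;
    }
}
```

Calling `MiniWordCount.run(Arrays.asList("a b a", "b c"), 2)` produces two reducer outputs whose counts together cover every word exactly once, since each key is owned by exactly one reducer.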
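  • 29. Wordcount map in Java
      // Assumes two fields on the enclosing Mapper class that the slide does not show:
      // an IntWritable `one` holding 1 and a reusable Text `word`.
      public void map(Object key, Text value, Context context)
          throws IOException, InterruptedException {
        // Split the input line into whitespace-separated tokens.
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
          word.set(itr.nextToken());
          context.write(word, one);  // emit (word, 1)
        }
      }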
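  • 30. Wordcount reduce in Java
      // Assumes an IntWritable field `result` on the enclosing Reducer class, not shown on the slide.
      public void reduce(Text key, Iterable<IntWritable> values, Context context)
          throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
          sum += val.get();  // add up the 1s emitted for this word
        }
        result.set(sum);
        context.write(key, result);  // emit (word, total)
      }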
  • 31. Hadoop  Open-source implementation of MapReduce, started by Doug Cutting and Mike Cafarella as part of the Nutch project and later developed heavily at Yahoo!  Now a top-level Apache open-source project  Implemented in Java (Google's in-house implementation is in C++)  Jobs can be written in C++, Java, Python, etc.  Comes with an associated distributed filesystem, HDFS (clone of GFS)
  • 32. Hadoop Components  Hadoop consists of two core components – The Hadoop Distributed File System (HDFS) – MapReduce Software Framework  There are many other projects based around core Hadoop – Often referred to as the "Hadoop Ecosystem" – Pig, Hive, HBase, Flume, Oozie, Sqoop, etc.
  • 33. Hadoop Users  Adobe: Several areas from social services to unstructured data storage and processing  eBay: 532-node cluster storing 5.3PB of data  Facebook: Used for reporting/analytics; one cluster with 1100 nodes (12PB) and another with 300 nodes (3PB)  LinkedIn: 3 clusters with 4000 nodes collectively  Twitter: To store and process Tweets and log files  Yahoo!: Multiple clusters with 40000 nodes collectively; largest cluster has 4500 nodes
  • 34. Running a Hadoop Application  The first order of the day is to format the Hadoop DFS  Jump to the Hadoop directory and execute: bin/hadoop namenode -format  Running Hadoop  To run Hadoop and HDFS: bin/start-all.sh  To terminate them: bin/stop-all.sh
  • 35. Running a Hadoop Application  Generating a dataset  Create a temporary directory to hold the data:  mkdir /tmp/gutenberg  Jump to it:  cd /tmp/gutenberg  Download text files:  wget www.gutenberg.org/etext/20417  wget www.gutenberg.org/etext/5000  wget www.gutenberg.org/etext/4300
  • 36. Running a Hadoop Application  Copying the dataset to the HDFS  Jump to the Hadoop directory and execute:  bin/hadoop dfs -copyFromLocal /tmp/gutenberg /ccw/gutenberg  Running Wordcount  bin/hadoop jar hadoop-examples-1.0.4.jar wordcount /ccw/gutenberg /ccw/gutenberg-output  Retrieving results from the HDFS  Copy to the local FS:  bin/hadoop dfs -getmerge /ccw/gutenberg-output /tmp/gutenberg-output
  • 37. Running a Hadoop Application  Accessing the web interface  JobTracker: http://localhost:50030  TaskTracker: http://localhost:50060  Reference: Running Hadoop on Ubuntu Linux (Single-Node Cluster):  http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/