PetaByte Scale Computing on
 Amazon EC2 with Big Data
         Vishal Malik
        Head Cloud CoE
          Cognizant

©2011, Cognizant
Some Background..


•Only 5% of data on the web today is structured.

•The challenge will be cutting through the noise:
   Processing huge volumes of data and filtering at scale.
   Turning raw unstructured data into insights using ML etc.
   Adding relevance to data by personalizing content.
   Applying ML to learn what a user likes and serving more
   of it (driving online-ad revenue, for example).

•By 2013, we’ll have 650 exabytes on the internet!

•Sentiment analysis in real time will become more prevalent.

•The need for a single organization to process 40+ TB
 (compressed) of data per day will become more prevalent.




The Challenge?

How to scale without a significant increase in
 infrastructure cost (processing & storage).

How to do analytics in near real time, as opposed to
 guesswork!
   Process 5TB+ (uncompressed data) in less
    than 1 minute! (today it can take ~ an hour)

People are asking new questions every day, hence the
 need to have all data in the DWH and the ability to answer
 these questions in near real time. (Agile BI!)

A feedback loop that presents data in line with user
 preferences.
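
To put the 5TB target in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes decimal units (1 TB = 10**12 bytes) and treats "~ an hour" as exactly 3600 seconds; the function name is illustrative.

```python
# Back-of-the-envelope throughput for the slide's 5 TB target.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 GB = 10**9 bytes).
TB = 10**12
GB = 10**9


def required_throughput(total_bytes: float, seconds: float) -> float:
    """Sustained bytes per second needed to process total_bytes within seconds."""
    return total_bytes / seconds


target = required_throughput(5 * TB, 60)       # 5 TB in one minute
today = required_throughput(5 * TB, 60 * 60)   # 5 TB in roughly one hour

print(f"target: {target / GB:.1f} GB/s")
print(f"today:  {today / GB:.2f} GB/s")
print(f"speed-up needed: {target / today:.0f}x")
```

Under these assumptions the goal implies roughly a 60x increase in sustained processing throughput, which is why the deck turns to distributed processing.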

RDBMS: The “good and the bad”

Good for:
  Relational data transactions

Bad for:
  Queues, polling, caching
  Social graph tree traversal, NxN relationships
  Workloads that don’t require ACID for everything
  Scaling to PetaBytes of data

Traditional SQL-based systems have these problems:
  Replication delay & cache eviction produce
    inconsistent results for the end user.
  Slow (single threaded)
  Locks create contention for popular data, hence
     they can’t scale to PetaBytes

Solution?


A cost-effective way to
   Process data, and
   Store data

Processing side: One of the most popular approaches is:

       Use Hadoop (open-source MR framework) for back-end
        distributed processing.
       Build a SQL-like (lightweight) layer on top of Hadoop.
       Access time is in microseconds, moving towards
        near real time!

Storage side: Popular and very stable options are:

       Use S3, SimpleDB (from Amazon’s AWS) etc.
       Private cloud using NoSQL DBs, namely HBase,
        CouchDB, MongoDB, Riak, Redis etc.
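
For readers new to the MR (MapReduce) model mentioned above, the following is a minimal single-process sketch of its three phases: map, shuffle (group by key) and reduce. It is not Hadoop's API and does no distribution; the names map_reduce, count_words and total, and the sample input lines, are made up for illustration.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run mapper over every record, group emitted values by key, then reduce."""
    grouped = defaultdict(list)
    for record in records:
        # Map phase: each record may emit any number of (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle phase: collect all values that share a key.
            grouped[key].append(value)
    # Reduce phase: collapse each key's list of values into one result.
    return {key: reducer(key, values) for key, values in grouped.items()}


def count_words(line):
    """Mapper: emit (word, 1) for every word in a line of text."""
    for word in line.lower().split():
        yield word, 1


def total(key, values):
    """Reducer: sum the counts collected for one word."""
    return sum(values)


lines = ["big data on EC2", "big data at scale"]  # illustrative input
counts = map_reduce(lines, count_words, total)
print(counts)
```

A SQL-like layer on top of such a framework would let the same job be written as a GROUP BY with a COUNT instead of hand-written mapper and reducer functions.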

Current State of Storage Tiering




Existing Solutions
 •     Only h/w based option
 •     Cost of implementation is very high assuming RAID 6,
       RAID 10+0 and other costly options.
 •     H/W based solution not user friendly and policies set
       are transparent to the user.
 •     Purely based on disk storage hardware from support
       perspective.
 •     Storing 7TB in 6 hours or less is not possible using
       current disks with 80MBytes/sec write rate.

Innovation?
       Customers are asking for storage solutions that are
       cheaper and easier to implement, understand.
       “We’re seeing a big opportunity to position iMoveS where
       data is growing significantly along with cost”

Solutions required
 •   Easy to manage storage
 •   Easy to implement storage systems.
 •   Have a say in policies set to move data wherever required
     at the disk level.
 •   Visibility of what is happening to my data and how/where
     it is stored
 •   Better control over where/how/what is stored on my
     storage systems.

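
The 7TB claim can be checked with simple arithmetic. The sketch below assumes decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes) and, for the disk count, perfectly parallel writes with no RAID or replication overhead; both are simplifying assumptions, not figures from the deck.

```python
import math

# Assumption: decimal units.
TB = 10**12
MB = 10**6

data_bytes = 7 * TB
write_rate = 80 * MB      # bytes per second for one disk (from the slide)
window_s = 6 * 3600       # the 6-hour window, in seconds

# Time for a single disk to absorb the whole data set.
single_disk_hours = data_bytes / write_rate / 3600

# Aggregate write rate needed to finish inside the window.
required_rate = data_bytes / window_s

# Disks needed if writes spread perfectly across them (idealized assumption).
disks_needed = math.ceil(required_rate / write_rate)

print(f"single disk: {single_disk_hours:.1f} hours")
print(f"required:    {required_rate / MB:.0f} MB/s")
print(f"disks:       {disks_needed}")
```

A single disk would need about a day, so meeting the window requires spreading writes over several devices, which is what the software-based approach in the deck aims to manage.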
NoSQL DataStores…

Make storing/retrieving of information easier to
 manage & use
     Based on access pattern, migrate data to the right
      storage engine based on pre-set policies. E.g. < 10%
      writes go to HBase. > 50GB stores go to HBase.
      < 50GB go to MongoDB.
     Understand access patterns to refine and retune
      policies under which data migration happens

Make S/W based storage engine do all the
 intelligent work
       Performance gains
       High availability
       Administration & monitoring
       Low cost/gigabyte

All done using iMoveS Engine
 •   S/W based checkpoint system
 •   Policy based data object migration
 •   Policy based data access/storage
 •   Extreme Scalability
 •   Great for machine generated data for analysis.

Anyone should be able to store data and not worry about
replication, RAID, mirroring.
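
The example policies on this slide can be sketched as a small routing function. This is not the iMoveS engine, whose interface the deck does not show; AccessProfile and choose_engine are hypothetical names, the order in which the rules are checked is an assumption, and the slide does not say where a store of exactly 50GB goes (this sketch sends it to MongoDB).

```python
from dataclasses import dataclass


@dataclass
class AccessProfile:
    """Hypothetical summary of how a data store is used."""
    write_fraction: float  # share of operations that are writes, 0.0 to 1.0
    size_gb: float         # total size of the store in gigabytes


def choose_engine(profile: AccessProfile) -> str:
    """Pick a storage engine using the slide's example thresholds."""
    # Slide rule: fewer than 10% writes go to HBase.
    if profile.write_fraction < 0.10:
        return "HBase"
    # Slide rule: stores larger than 50GB go to HBase.
    if profile.size_gb > 50:
        return "HBase"
    # Slide rule: stores smaller than 50GB go to MongoDB.
    # Assumption: exactly 50GB also lands here, since the slide is silent.
    return "MongoDB"


print(choose_engine(AccessProfile(write_fraction=0.05, size_gb=10)))
print(choose_engine(AccessProfile(write_fraction=0.50, size_gb=200)))
print(choose_engine(AccessProfile(write_fraction=0.50, size_gb=10)))
```

Keeping the thresholds in one function makes it straightforward to retune them as access patterns become better understood, which is the refinement loop the slide describes.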

Thank You





AWS Partner Presentation - PetaByte Scale Computing on Amazon EC2 with BigData - Vishal Malik, Cognizant
