AWS Partner Presentation - PetaByte Scale Computing on Amazon EC2 with BigData - Vishal Malik, Cognizant
3. Some Background
•Only 5% of data on the web today is structured
•Challenge would be cutting through the noise!
Ability to process huge volumes of data and filter at scale.
Turning raw unstructured data into insights using ML etc.
Adding relevance to data via personalizing content.
Analyzing what users like by applying ML, then giving them more
of it (driving online-ad revenue, for example).
•By 2013, we’ll have 650 exabytes on the internet!
•Sentiment analysis in real time will become more prevalent.
•The need for a single organization to process 40+ TB of (compressed)
data per day will become more prevalent.
2 | ©2011, Cognizant
4. The Challenge?
How to scale without a significant increase in
infrastructure cost (processing & storage).
How to do analytics in near real time, as opposed to
guesswork!
Process 5TB+ (uncompressed data) in less
than 1 minute! (today it can take about an hour)
People are asking new questions every day, hence the
need to have all data in the DWH and the ability to answer
these questions in near real time. (Agile BI!)
Feedback loop presenting data in line with user
preferences.
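To put the "5TB+ in less than 1 minute" goal in perspective, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 TB = 10^12 bytes) and a purely hypothetical per-node scan rate of 100 MB/s; neither figure comes from the slides, and real clusters will not scale perfectly linearly.

```python
import math

TB = 10**12  # decimal terabyte in bytes (assumption)
MB = 10**6   # decimal megabyte in bytes (assumption)

def required_throughput(data_bytes, seconds):
    """Aggregate bytes/second needed to scan data_bytes within seconds."""
    return data_bytes / seconds

def nodes_needed(data_bytes, seconds, per_node_bytes_per_sec):
    """Nodes needed if work splits perfectly evenly (an idealisation)."""
    aggregate = required_throughput(data_bytes, seconds)
    return math.ceil(aggregate / per_node_bytes_per_sec)

target = required_throughput(5 * TB, 60)    # goal: 5 TB in one minute
today = required_throughput(5 * TB, 3600)   # today: 5 TB in about an hour

print(f"target: {target / 1e9:.1f} GB/s")   # prints "target: 83.3 GB/s"
print(f"today:  {today / 1e9:.2f} GB/s")    # prints "today:  1.39 GB/s"

# Hypothetical 100 MB/s scan rate per node, not a figure from the slides.
print(nodes_needed(5 * TB, 60, 100 * MB))   # prints 834
```

The goal is a 60x speed-up over the one-hour baseline, which is why the later slides turn to distributed processing rather than a single faster machine.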
5. RDBMS: The “good and the bad”
Good for:
Relational data transactions
Bad for:
Queues, polling, caching
Social graph tree traversal, NxN relationships
Workloads that don’t require ACID for everything!
Scaling to petabytes of data
Traditional SQL based systems have:
Replication delay & cache eviction, which produce
inconsistent results for the end user.
Slow (single threaded) execution
Locks that create contention for popular data, hence
they can’t scale to petabytes
6. Solution?
Cost effective way to
Process data and,
Store data
Processing side: One of the most popular approaches is:
Use Hadoop (open-source MapReduce framework) for back-end
distributed processing.
Build a SQL-like (lightweight) layer on top of Hadoop.
Access time is in microseconds, moving towards
near real time!
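As an illustration of what a SQL-like layer on top of MapReduce has to do, here is a minimal in-process sketch of the map, shuffle and reduce phases for a query like `SELECT page, COUNT(*) FROM views GROUP BY page`. It is plain Python, not the Hadoop API; the `(user, page)` record layout and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(record):
    """Emit (page, 1) per page view; record is a hypothetical (user, page) tuple."""
    _user, page = record
    yield page, 1

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Sum the counts for one key (the COUNT(*) of the GROUP BY)."""
    return key, sum(values)

def run_job(records):
    """Run map, shuffle and reduce sequentially in a single process."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())

views = [("u1", "/home"), ("u2", "/home"), ("u1", "/cart")]
counts = run_job(views)
print(counts)  # prints {'/home': 2, '/cart': 1}
```

In a real cluster the map and reduce calls run on many machines in parallel and the shuffle moves data over the network; the sketch only shows the shape of the computation.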
Storage side: Popular and very stable ones are:
Use S3, SimpleDB (from Amazon’s AWS), etc.
Private cloud using NoSQL DBs, namely HBase,
CouchDB, MongoDB, Riak, Redis, etc.
7. Current State of Storage Tiering Solutions
Existing solutions:
• Only h/w based options.
• Cost of implementation is very high, assuming RAID 6, RAID 10+0 and other costly options.
• H/W based solutions are not user friendly, and the policies set are invisible to the user.
• Purely based on disk storage hardware from a support perspective.
• Storing 7TB in 6 hours or less is not possible using current disks with an 80MBytes/sec write rate.
Innovation required?
Customers are asking for storage solutions that are cheaper and easier to implement and understand:
• Easy to manage storage.
• Easy to implement storage systems.
• Have a say in the policies set to move data wherever required at the disk level.
• Visibility of what is happening to my data and how/where it is stored.
• Better control over where/how/what is stored on my storage systems.
“We’re seeing a big opportunity to position iMoveS where data is growing significantly along with cost”
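The write-rate limit quoted on this slide (7TB in 6 hours at 80MBytes/sec) can be checked with quick arithmetic. The sketch below assumes decimal units and an idealised setup where data stripes perfectly across disks with no RAID or filesystem overhead; those assumptions are mine, not the slide's.

```python
import math

MB = 10**6   # decimal megabyte in bytes (assumption)
TB = 10**12  # decimal terabyte in bytes (assumption)

def hours_to_write(data_bytes, rate_bytes_per_sec, disks=1):
    """Hours needed to write data_bytes at a sustained rate per disk."""
    return data_bytes / (rate_bytes_per_sec * disks) / 3600

def disks_needed(data_bytes, rate_bytes_per_sec, deadline_hours):
    """Minimum disks written in parallel to meet the deadline (ideal striping)."""
    per_disk_capacity = rate_bytes_per_sec * deadline_hours * 3600
    return math.ceil(data_bytes / per_disk_capacity)

single = hours_to_write(7 * TB, 80 * MB)
print(f"one disk: {single:.1f} hours")      # prints "one disk: 24.3 hours"
print(disks_needed(7 * TB, 80 * MB, 6))     # prints 5
```

A single disk at 80 MB/s writes only about 1.7 TB in 6 hours, so the target is reachable only by spreading writes over several disks or nodes.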
8. NoSQL DataStores…
Make storing/retrieving of information easier to manage & use
Based on access pattern, migrate data to the right storage engine based on pre-set policies. E.g. < 10% writes go to HBase. > 50GB stores go to HBase. < 50GB go to MongoDB.
Understand access patterns to refine and retune policies under which data migration happens
Make S/W based storage engine do all the intelligent work
Performance gains
High availability
Administration & monitoring
Low cost/gigabytes
All done using iMoveS Engine
• S/W based checkpoint system
• Policy based data object migration
• Policy based data access/storage
• Extreme Scalability
• Great for machine generated data for analysis.
Anyone should be able to store data and not worry about
replication, RAID, mirroring.
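The example policy on this slide (< 10% writes to HBase, > 50GB stores to HBase, < 50GB to MongoDB) can be sketched as a small routing function. This is not the iMoveS API, which the slides do not show; the `StoreProfile` type, the `choose_engine` name, the rule ordering, and the handling of a store of exactly 50GB are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StoreProfile:
    """Hypothetical summary of a data store's observed access pattern."""
    size_gb: float
    write_fraction: float  # share of operations that are writes, 0.0 to 1.0

def choose_engine(profile):
    """Pick a storage engine from the slide's example thresholds.

    Assumed rule order: the write-ratio rule is checked first, then size.
    A store of exactly 50 GB is not covered by the slide; here it falls
    through to MongoDB.
    """
    if profile.write_fraction < 0.10:
        return "HBase"      # "< 10% writes go to HBase"
    if profile.size_gb > 50:
        return "HBase"      # "> 50GB stores go to HBase"
    return "MongoDB"        # "< 50GB go to MongoDB"

print(choose_engine(StoreProfile(size_gb=20, write_fraction=0.05)))   # HBase
print(choose_engine(StoreProfile(size_gb=200, write_fraction=0.40)))  # HBase
print(choose_engine(StoreProfile(size_gb=20, write_fraction=0.40)))   # MongoDB
```

Keeping the thresholds in one function makes it easy to retune them as access patterns are observed, which is the feedback step the slide describes.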