© Wikibon 2008© Wikibon 2011 | Confidential www.wikibon.org
[[The Wikibon Project]]
Big Data and Hadoop: Key Drivers,
Ecosystem and Use Cases
November 2011
What is Big Data?
Big Data (n.): Data sets whose size, type
and/or speed make them impractical
to process and analyze with traditional
database technologies and related data
management tools.
Why is Big Data Important?
Big Data is the new definitive source
of competitive advantage across
industries …
… For those organizations that
embrace Big Data, the possibilities
for innovation, improved agility, and
increased profitability are nearly
endless.
Three Key Big Data Drivers
1. Volume, Variety, Velocity
2. Hardware Commoditization
3. Cloud Computing
Characteristics of Big Data
Sources of Big Data
Hadoop
Open source framework for processing, storing
and analyzing Big Data.
Fundamental concept: Rather than banging
away at one huge block of data with a single
machine, Hadoop breaks up Big Data into
multiple parts so each part can be processed
and analyzed in parallel.
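The split-and-process-in-parallel idea can be sketched in a few lines of Python. This is only an illustration of the concept, not Hadoop's API: the function names (split_into_parts, count_words, merge_counts, parallel_word_count) are hypothetical, word counting is an assumed example task, and local threads stand in for the separate machines a real cluster would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_parts(records, num_parts):
    """Break one large list of records into roughly equal parts."""
    return [records[i::num_parts] for i in range(num_parts)]


def count_words(part):
    """'Map' step: process a single part independently of the others."""
    counts = Counter()
    for line in part:
        counts.update(line.lower().split())
    return counts


def merge_counts(partial_counts):
    """'Reduce' step: combine the per-part results into one answer."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


def parallel_word_count(records, num_parts=4):
    parts = split_into_parts(records, num_parts)
    # Threads stand in for cluster nodes here (an assumption for this sketch);
    # a framework like Hadoop would send each part to a different machine.
    with ThreadPoolExecutor(max_workers=num_parts) as pool:
        partial_counts = list(pool.map(count_words, parts))
    return merge_counts(partial_counts)
```

Calling parallel_word_count(lines) on a list of text lines returns a Counter of word frequencies. Because each part is counted without reference to the others, adding more workers (or, in a real cluster, more machines) lets the same job handle more data.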
Hadoop: The Pros and Cons
First the pros … Hadoop is a time- and
cost-effective way to store,
process and analyze large volumes of
unstructured data, allowing for new and
unprecedented types of analytics.
Now the cons … Hadoop is complex and
difficult to deploy and manage; there’s a
dearth of Hadoop-savvy engineers and
Data Scientists on the job market; the
risk of forking and vendor lock-in
remains.
Hadoop: The Pros and Cons cont.
More pros … Many bright minds are contributing to
Hadoop, resulting in rapid development, and an
ecosystem of vendors is emerging to make Hadoop
enterprise-ready.
The Big Data Ecosystem
Big Data Pioneers
• Largest Hadoop instance on the planet … 40,000 nodes handling 200+ PB of data.
• Used to support research for ad systems and Web search.
• Match ads with users, detect spam in Yahoo! Mail, pick relevant top stories.
Big Data Pioneers cont.
• Two major clusters processing and storing over 30 PB of data.
• Uses HDFS to store copies of internal log and dimension data.
• Developed Hive to perform large-scale analytics on user data.
• Using HBase to store, manage and retrieve Facebook Messenger data.
Big Data Pioneers cont.
• Uses Hadoop to support “People You May Know” feature.
• Tailors its search engine to return most relevant results for recruiters, employers and job seekers.
• Created a visualization tool to allow users to explore their professional network to discover hidden patterns.
Big Data in Financial Services
• Over 30,000 databases and 15,000 applications spread across 7 business units.
• Using Hadoop as the basis of its Common Data Platform.
• Looking to establish a 360-degree view of the customer for upsell and cross-sell opportunities.
Big Data in Financial Services cont.
• Managing and analyzing risk to understand financial exposure.
• Detecting fraudulent transactions and potentially criminal activity.
• Conducting sentiment analysis on social media data.
Thank You
Jeffrey F. Kelly
Principal Research Contributor
The Wikibon Project
jeff.kelly@wikibon.org
@jeffreyfkelly
www.wikibon.org
www.siliconangle.com

Ensuring Technical Readiness For Copilot in Microsoft 365
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 

Big Data and Hadoop - key drivers, ecosystem and use cases

  • 1. © Wikibon 2008© Wikibon 2011 | Confidential www.wikibon.org [[The Wikibon Project]] Big Data and Hadoop: Key Drivers, Ecosystem and Use Cases November 2011
  • 2. What is Big Data? Big Data (n.): Data sets whose size, type and/or speed make them impractical to process and analyze with traditional database technologies and related data management tools.
  • 3. Why is Big Data Important? Big Data is the new definitive source of competitive advantage across industries … For those organizations that embrace Big Data, the possibilities for innovation, improved agility, and increased profitability are nearly endless.
  • 4. Three Key Big Data Drivers
      1. Volume, Variety, Velocity
      2. Hardware Commoditization
      3. Cloud Computing
  • 5. Characteristics of Big Data
  • 6. Sources of Big Data
  • 7. Hadoop: Open source framework for processing, storing and analyzing Big Data. Fundamental concept: Rather than banging away at one huge block of data with a single machine, Hadoop breaks up Big Data into multiple parts so each part can be processed and analyzed in parallel.
  • 8. Hadoop: The Pros and Cons
      • First the pros … Hadoop is a time- and cost-effective approach to store, process and analyze large volumes of unstructured data, allowing for new and unprecedented types of analytics.
      • Now the cons … Hadoop is complex and difficult to deploy and manage; there’s a dearth of Hadoop-savvy engineers and Data Scientists on the job market; the risk of forking and vendor lock-in remains.
  • 9. Hadoop: The Pros and Cons cont.
      • More pros … Many bright minds are contributing to Hadoop, resulting in rapid development, and an ecosystem of vendors is emerging to make Hadoop enterprise-ready.
  • 10. The Big Data Ecosystem
  • 11. Big Data Pioneers
      • Largest Hadoop instance on the planet … 40,000 nodes handling 200+ PB of data.
      • Used to support research for ad systems and Web search.
      • Match ads with users, detect spam in Yahoo! Mail, pick relevant top stories.
  • 12. Big Data Pioneers cont.
      • Two major clusters processing and storing over 30 PB of data.
      • Uses HDFS to store copies of internal log and dimension data.
      • Developed Hive to perform large-scale analytics on user data.
      • Using HBase to store, manage and retrieve Facebook Messenger data.
  • 13. Big Data Pioneers cont.
      • Uses Hadoop to support “People You May Know” feature.
      • Tailors its search engine to return most relevant results for recruiters, employers and job seekers.
      • Created a visualization tool to allow users to explore their professional network to discover hidden patterns.
  • 14. Big Data in Financial Services
      • Over 30,000 databases and 15,000 applications spread across 7 business units.
      • Using Hadoop as the basis of its Common Data Platform.
      • Looking to establish 360 degree view of customer for upsell and cross-sell opportunities.
  • 15. Big Data in Financial Services cont.
      • Risk management and analysis to understand financial exposure.
      • Detecting fraudulent transactions and potentially criminal activity.
      • Conduct sentiment analysis on social media data.
  • 16. Thank You. Jeffrey F. Kelly, Principal Research Contributor, The Wikibon Project. jeff.kelly@wikibon.org | @jeffreyfkelly | www.wikibon.org | www.siliconangle.com
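
The "fundamental concept" on slide 7 (break the data into parts, process each part in parallel, then combine the results) can be illustrated with a small sketch. This is not Hadoop code and does not use any Hadoop API: it is a plain-Python word count in which threads stand in for cluster machines, and every function name here (`split_into_parts`, `count_words`, `merge_counts`, `parallel_word_count`) is a hypothetical one chosen for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_parts(records, num_parts):
    """Break the input into roughly equal parts (a stand-in for data blocks)."""
    return [records[i::num_parts] for i in range(num_parts)]


def count_words(part):
    """'Map' step: count the words in one part, independently of the others."""
    counts = Counter()
    for line in part:
        counts.update(line.lower().split())
    return counts


def merge_counts(partial_counts):
    """'Reduce' step: combine the per-part results into a single answer."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


def parallel_word_count(records, num_parts=4):
    parts = split_into_parts(records, num_parts)
    # Assumption: threads stand in for the separate machines of a real cluster.
    with ThreadPoolExecutor(max_workers=num_parts) as pool:
        partial_counts = list(pool.map(count_words, parts))
    return merge_counts(partial_counts)


lines = [
    "big data and hadoop",
    "hadoop breaks up big data",
    "each part is processed in parallel",
]
result = parallel_word_count(lines, num_parts=2)
print(result.most_common(3))
```

Because each part is counted without reference to any other part, the same answer comes out whether the work runs on one worker or many; that independence is what lets the approach scale out across commodity hardware.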