Big Data and Hadoop - key drivers, ecosystem and use cases

This document discusses big data and Hadoop. It defines big data as data sets whose size, type or speed make them impractical to process with traditional database technologies. It identifies three key drivers of big data: the volume, variety and velocity of data; hardware commoditization; and cloud computing. Hadoop is introduced as an open source framework for storing, processing and analyzing big data in parallel across multiple machines. Yahoo, Facebook and LinkedIn are given as examples of big data pioneers using Hadoop, and potential uses of big data in the financial services industry are briefly outlined.

© Wikibon 2008© Wikibon 2011 | Confidential www.wikibon.org
[[The Wikibon Project]]
Big Data and Hadoop: Key Drivers, Ecosystem and Use Cases
November 2011

What is Big Data?
Big Data (n.): Data sets whose size, type
and/or speed make them impractical
to process and analyze with traditional
database technologies and related data
management tools.

Why is Big Data Important?
Big Data is the new definitive source
of competitive advantage across
industries …
… For those organizations that
embrace Big Data, the possibilities
for innovation, improved agility, and
increased profitability are nearly
endless.

Three Key Big Data Drivers
1.  Volume, Variety, Velocity
2.  Hardware Commoditization
3.  Cloud Computing

Characteristics of Big Data

Sources of Big Data

Hadoop
Open source framework for processing, storing
and analyzing Big Data.
Fundamental concept: Rather than banging
away at one huge block of data with a single
machine, Hadoop breaks up Big Data into
multiple parts so each part can be processed
and analyzed in parallel.
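
The split-and-process idea can be sketched on a single machine. The example below is only an illustration, not Hadoop's actual API: Hadoop itself is a Java framework that distributes work across many machines, whereas here threads stand in for cluster nodes. The word-count task and the function names (split_into_parts, count_words, merge_counts, parallel_word_count) are hypothetical and chosen for this sketch.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_parts(lines, num_parts):
    """Divide the input into roughly equal parts (stand-ins for data blocks)."""
    return [lines[i::num_parts] for i in range(num_parts)]


def count_words(part):
    """'Map' step: process one part independently of all the others."""
    counts = Counter()
    for line in part:
        counts.update(line.lower().split())
    return counts


def merge_counts(partial_counts):
    """'Reduce' step: combine the per-part results into one answer."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


def parallel_word_count(lines, num_parts=4):
    """Split the input, count each part in parallel, then merge the results."""
    parts = split_into_parts(lines, num_parts)
    # Assumption: threads play the role of cluster nodes in this sketch.
    with ThreadPoolExecutor(max_workers=num_parts) as pool:
        partial_counts = list(pool.map(count_words, parts))
    return merge_counts(partial_counts)


if __name__ == "__main__":
    sample = [
        "big data needs parallel processing",
        "hadoop splits big data into parts",
        "each part is processed in parallel",
    ]
    print(parallel_word_count(sample, num_parts=2).most_common(3))
```

Because each part is counted without reference to the others, adding more workers (or, in a real cluster, more machines) lets the same job cover more data in similar time; only the final merge needs to see every partial result.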

Hadoop: The Pros and Cons
First the pros … Hadoop is a time- and
cost-effective approach to storing,
processing and analyzing large volumes of
unstructured data, allowing for new and
unprecedented types of analytics.
Now the cons … Hadoop is complex and
difficult to deploy and manage; there’s a
dearth of Hadoop-savvy engineers and
Data Scientists on the job market; the
risk of forking and vendor lock-in
remains.

Hadoop: The Pros and Cons cont.
More pros … Many bright minds are contributing to
Hadoop, resulting in rapid development, and an
ecosystem of vendors is emerging to make Hadoop
enterprise-ready.

The Big Data Ecosystem

Big Data Pioneers
Yahoo!
•  Largest Hadoop instance
on the planet … 40,000
nodes handling 200+ PB
of data.
•  Used to support research
for ad systems and Web
search.
•  Match ads with users,
detect spam in Yahoo!
Mail, pick relevant top
stories.

Big Data Pioneers cont.
Facebook
•  Two major clusters processing and
storing over 30 PB of data.
•  Uses HDFS to store copies of
internal log and dimension data.
•  Developed Hive to
perform large-scale
analytics on user data.
•  Using HBase to store,
manage and retrieve
Facebook Messenger
data.

Big Data Pioneers cont.
LinkedIn
•  Uses Hadoop to support the “People You May Know” feature.
•  Tailors its search engine to return the most relevant results
for recruiters, employers and job seekers.
•  Created a visualization tool to allow users to explore their
professional network to discover hidden patterns.

Big Data in Financial Services
•  Over 30,000 databases and 15,000 applications
spread across 7 business units.
•  Using Hadoop as the basis of its Common Data
Platform.
•  Looking to establish a 360-degree view of the customer
for upsell and cross-sell opportunities.

Big Data in Financial Services cont.
•  Risk management and analysis to understand
financial exposure.
•  Detecting fraudulent transactions and potentially
criminal activity.
•  Conducting sentiment analysis on social media data.

Thank You
Jeffrey F. Kelly
Principal Research Contributor
The Wikibon Project
jeff.kelly@wikibon.org
@jeffreyfkelly
www.wikibon.org
www.siliconangle.com
