This presentation was given by MapR CMO Jack Norris at the Gartner BI and Analytics Summit in Las Vegas on April 2, 2014.
Hadoop revolutionizes how data is stored, processed, and analyzed. Hadoop represents a new data and compute stack that provides huge operational advantages and is being used to change how organizations compete. This session will provide an overview of how customers are using Hadoop today, with details on initial uses and a glimpse of how this new platform is providing organizations 10X performance at 1/10th the cost.
Hadoop: Revolutionizing Analytics AND Operations.

Session outline: Overview of big data and data-driven companies, with two or three use-case examples showing the importance of leveraging data. Existing systems are getting overrun; examples of what this means include the size of data, Oracle hitting the wall, and analytic speed. Hadoop is at the center: what Hadoop is, plus additional proof points. Three realities: Hadoop relieves the pressure (a processing example in terms of how it scales, a cost example, and the fact that you don't need to know the questions you're going to ask ahead of time). Small, rapid decisions. Examples of operational Hadoop: Rubicon and three or four others, followed by the use case. Architecture matters: why this is the case, and the results. Where do you start: offloading examples, such as Cisco (data warehouse offload) and IRI (mainframe offload).
The first trend is that the industry leaders have shown how to use big data to compete and win in their markets. It's no longer a nice-to-have: you need big data to compete. Google pioneered MapReduce processing on commodity hardware and used it to catapult themselves into the leading search engine, even though they were 19th in the market. Yahoo! leveraged these ideas to create Hadoop to keep up with Google, and many mainstream companies have followed with new data-driven applications such as "people you may know" (started by LinkedIn and now used by Facebook, Twitter, and every social application), product recommendation engines, contextual and personalized music services (Beats), measuring digital media effectiveness (comScore), serving more relevant, targeted ads (Comcast, Rubicon Project), fraud and risk detection, healthcare efficacy, and more. What makes the difference? A lot of attention is given to data science and developing sophisticated new algorithms, but in many cases simply having more data beats better algorithms. (Make the point about collecting more consumer interaction data as well as transaction data, as an example.) In addition, competitive advantage is decided by very small percentages. Just a 1% improvement in fraud detection can mean hundreds of millions of dollars in savings. A 1/2% lift in advertising effectiveness means millions in new product sales and profitability. The same applies to customer churn, disease diagnosis, and more.
Doctors, particularly oncologists, are faced with an enormous amount of data regarding patient treatments, outcomes, and disease states. Hadoop is having an impact across the health care industry, but for this minute we will focus on its use for developing better treatments. In one minute, Hadoop can analyze more than 20,000 genes across hundreds of thousands of patients. The goal of this analysis is to better understand genomic factors and to integrate imaging and clinical analytics in order to better understand, predict, and impact survival. In any given minute, our cluster is sequencing 422,000 genes.
Beats headphones by Dr. Dre have swept the audio market. Beats has launched a new Beats Music service that is able to personalize music selections and pick the perfect song in a minute from a library of over 20 million songs. It joins a crowded space for online music, but by using MapR, Beats is able to provide a completely new personalized service. It's not about delivering 20 million songs; it's about providing a continuously updating, personalized, tailored experience to users.
A second trend in enterprise architecture is big data overwhelming the existing workload-specific systems in production. (List the requirements for each of these on the side in text.) People started with mainframes or operational systems which run ERP, finance, CRM, and other mission-critical applications. They require… (pick out the attributes you want to stress on the left). You also have data warehouses, marts, data mining, and other analytical systems which pull data from these operational and other systems to provide insights to the business for decision making. The amount and variety of data has been overloading these systems. As you try to ingest new types of data, you reach a point where these systems are not cost-effective to scale to terabytes or petabytes of data.
Hadoop has become the de facto big data platform, allowing organizations to keep up with big data and feed data-driven applications and processes. This chart shows the percentage growth of jobs from Indeed.com. Compared to other popular technologies such as MongoDB and Cassandra, Hadoop is not only the fastest growing big data technology, it's one of the fastest growing technologies, period. Hadoop has the most robust ecosystem and momentum and is the big data platform of choice for industry-leading, data-driven companies. (Also of interest: Indeed.com, a subsidiary of a Japanese-owned company, is a customer of MapR; they harness and analyze all of their job-trends data using MapR.)
As implemented, MapReduce is actually a collection of complementary techniques and strategies that include employing commoditized hardware and software, specialized underlying file systems, and parallel processing methodologies. Many of the benefits arise from the fact that computation can be done on the same machines where data resides and from the fact that individual pieces of the overall computation can be recomputed if necessary due to hardware failure or other delays. This is a revolutionary architectural philosophy that shelters the average developer from the overwhelming complexity that had formerly been required to properly carry out parallel processing. But as we’ll see later, the implementation of MapReduce laid the foundation for significant problems now being experienced by many enterprises that are seeking to put it to work.
MapReduce is a paradigm shift: it moves the processing to the data. Apache Hadoop is a software framework that supports data-intensive distributed applications. Hadoop was inspired by Google's published MapReduce whitepaper. Apache Hadoop provides a new platform to analyze and process big data. With data growth exploding and new unstructured sources of data expanding, a new approach is required to handle the volume, variety, and velocity of this growing data. Hadoop clustering exploits commodity servers and increasingly inexpensive compute, network, and storage. Google is the poster child for the power of MapReduce. They were the 19th search engine to enter the market; there were 18 more successful companies, and within two years Google was the dominant player. That's the power of the MapReduce framework.

Long version: A poster child for this is Google. We now take Google's dominance for granted, but when Google launched their beta in 1998 they were late: at least the 19th search engine on the market. Yahoo was dominant, and there were Infoseek, Excite, Lycos, Ask Jeeves, and AltaVista (which had the technical cred). It wasn't until Google published a paper in 2003 that we got a glimpse of their back-end architecture. Google was able to reach dominance because they recognized the paradigm shift early on, and they were able to index more data, get better results, and do it much more efficiently and cost-effectively than their competitors. They went from 19th to first in a few short years because of MapReduce. A Yahoo engineer by the name of Doug Cutting read that same paper in 2003 and developed a Java implementation of MapReduce, named after his son's stuffed elephant, that became the basis for the open source Hadoop project. Now when we say Hadoop, we're talking about a robust ecosystem; there are multiple commercial versions of Hadoop.
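To make the map/shuffle/reduce pattern concrete, here is a minimal sketch of the idea in plain Python, using the classic word-count example. This is illustrative only: real Hadoop jobs are typically written in Java against the Hadoop MapReduce API, and the shuffle step shown here is something the framework performs for you across the cluster.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit (key, value) pairs -- here, (word, 1) for each word."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all values by key (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key -- here, sum the counts."""
    return key, sum(values)

documents = ["the quick brown fox", "the lazy dog", "the fox"]

# In a cluster, each mapper runs on the node that stores its input split
# ("moving the processing to the data"); here we just run the phases
# sequentially on one machine.
mapped = chain.from_iterable(map_phase(d) for d in documents)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts["the"])  # -> 3
```

The point of the pattern is that the map and reduce functions are independent per record and per key, so the framework can run thousands of them in parallel and re-run any piece that fails.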
There's a complete stack that includes job management, development tools, schedulers, machine learning libraries, and more. MapR's co-founder and CTO was at Google, where he was in charge of the BigTable group, and he understands MapReduce at scale. Our charter was to fix the underlying flaws of the Hadoop implementation to make it appropriate for a broader set of applications and make it work for most organizations.
You need a platform that serves the broadest set of use cases.
The first reality is that as people put Hadoop into production to relieve the pressure on other systems in their enterprise architecture, it needs to be reliable. Hadoop needs to be held to the same enterprise standards as your Oracle, SAP, Teradata, NetApp storage, or any other enterprise system. Many organizations are putting Hadoop into their data center to provide (list of use cases underneath)… It can do all of this and more, but for Hadoop to act as a system of record, it must provide the same guarantees for SLAs, performance, data protection, and more. Most importantly, Hadoop has the potential for both analytics AND operations. It can be used to optimize the data warehouse, provide batch data refining, or provide storage. But, done right, Hadoop can also handle many operational analytics and database operations/jobs.
Choosing the right big data architecture is critical for success with your Hadoop projects and business applications. One analogy is building a skyscraper: before you can start building up, you have to lay a rock-solid foundation. This building is the new Wilshire Grand project in Los Angeles. In February of this year, they set a Guinness World Record by pouring a 21,000 cubic yard (16,000 cubic meter) foundation over 26 hours (http://www.theguardian.com/cities/2014/feb/14/world-largest-concrete-pour-la-trucks-los-angeles). When completed in 2017, the building will be the tallest in the US outside of New York and Chicago.
This analogy applies to building a data platform as well: you have to architect for the future. This allows you to build higher, stronger, and faster, without retrofitting later down the road. (Anyone who has added a second story to their house can attest to the additional cost and construction delays when the foundation wasn't designed to hold the stress.) For business-critical applications you must have data protection and security (availability, data protection, and recovery), high performance (with a random read-write system), multi-tenancy (to support multiple business units and isolate applications or user data), good resource and workload management to support multiple applications, and open standards to integrate with the rest of the enterprise data architecture. This data foundation allows you to support new data-driven applications (both operational and analytical), maintain service level agreements with the business, provide information you can trust and count on being there when you need it, and ultimately deliver the best TCO for the long run, supporting enterprise systems without retrofits or multiple clusters to work around platform deficiencies. (For example, to support operational/online applications in Hadoop today, you need a separate HBase cluster, separate from the rest of your Hadoop cluster/investment.)
In a recent article (http://www.cmswire.com/cms/big-data/5-things-to-lessen-your-anxiety-about-big-data-024382.php), Tom Davenport says that big data's biggest wins come from making many small decisions rather than one huge one. The majority of big-data-driven decisions will be recurring, made at speed (in milliseconds), and at scale; actions will be taken automatically rather than reviewed and approved by an individual. Examples include ad platforms making constant adjustments, fraud detection across millions of transactions based on individual patterns, and fleet management and routing that takes current conditions into account. This requires a Hadoop platform that can go beyond batch and support streaming writes, so data can be constantly written to the system while analysis is being conducted. It requires high performance to meet the business needs, and real-time operations: the ability to perform online database operations so you can react to the business situation and impact the business as it happens, not report on it a week, month, or quarter later. To do this requires THE RIGHT ARCHITECTURE.
One great example is the Rubicon Project, who recently filed their S-1 to go public. They bet their business on data, with Hadoop as the cornerstone, and developed pioneering technology that created a new model for the advertising industry, similar to what NASDAQ did for stock trading. Rubicon Project uses MapR for their automated advertising platform, which processes over 100 billion ad auctions a day and provides the most extensive ad reach in the industry, touching 96% of internet users in the US. They use MapR because of its superior system reliability and performance and its ability to run in their "lights out" datacenters. They switched from one of our competitors after experiencing a NameNode failure and constant ups and downs. That was fine in development, but Hadoop needed to be a production system in 2011, which is when they switched to MapR.
In India, there is no social security card. It's difficult for the average citizen to set up a bank account, access benefit programs, and enjoy economic mobility. It's difficult for the government as well, with over $1B of government aid classified as leakage, the result of fraud and corruption. The Aadhaar program is poised to change all that by leveraging the unique biometric identifiers that all people are born with to create the largest biometric database in the world. The program aims to collect fingerprints and retina scans for all 1.2 billion citizens. The scale of this project required MapR's in-Hadoop database, which is capable of 200-millisecond response times while supporting millions of concurrent look-ups.
They ran the MinuteSort benchmark, a test that shows how much data you can sort in one minute. The MinuteSort world record had been set by Yahoo, which sorted 1.6 terabytes with 2,200 nodes. This MapR customer broke the record by sorting 1.65 TB with just 298 nodes. That's roughly 1/7th the hardware, which translates into tremendous cost, space, and management savings.
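The hardware claim is easy to sanity-check from the figures above; a quick back-of-the-envelope calculation also shows the implied per-node throughput difference:

```python
# Back-of-the-envelope check of the MinuteSort comparison above.
yahoo_tb, yahoo_nodes = 1.6, 2200   # Yahoo's record run
mapr_tb, mapr_nodes = 1.65, 298     # the MapR customer's run

# Node-count ratio: how much less hardware the MapR run used.
hardware_ratio = yahoo_nodes / mapr_nodes
print(round(hardware_ratio, 1))     # -> 7.4, i.e. roughly 1/7th the nodes

# Per-node throughput ratio (TB sorted per node per minute).
per_node_speedup = (mapr_tb / mapr_nodes) / (yahoo_tb / yahoo_nodes)
print(round(per_node_speedup, 1))   # -> 7.6x more data sorted per node
```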
Because only MapR can reliably run both operational and analytical applications on one platform/cluster, MapR enables a faster closed-loop process between operational applications and analytics. This means interactive marketers and algorithms can update rules engines more quickly and provide more real-time targeting of offers and relevant content to consumers, and fraud models are kept up to date with the latest patterns to better detect anomalies and act more quickly on bad actors.
MapR creates a new opportunity for enterprises: the opportunity to revolutionize the enterprise data architecture. From "redundant processing silos" and "data science experiments," where you need separate Hadoop clusters for streaming, HDFS/Hive, HBase, and more… To a "converged data and processing hub" that provides a TRUE PRODUCTION enterprise data hub. This allows you to consolidate operational and analytical workloads, not only across Hadoop use cases and applications, but also to optimize your enterprise data architecture.