WHITE paper
www.hcltech.com
TABLE OF CONTENTS
Abstract
Abbreviations
Market Trends and Challenges
Solution
Case Study
Revenue Benchmarking
MR Latency Benchmarking
Word Count with Combiner
Word Count without Combiner
Best Practices
Conclusion
Reference
Author Info
Adaptive Map Reduce
Abstract
This paper explores the various Map Reduce (MR) design patterns and arrives at a unified working solution in the form of a library. The library has the potential to adapt itself to any data processing need that can be met with Map Reduce. This would not only save HCL and its clients many man-hours but would also enforce the good practices of Map Reduce design patterns in the code by default.
HCL Technologies (HCLT) has been working actively with multiple clients over the last couple of years in verticals such as ISPs, Aero, Banking & Finance, and Media & Entertainment, delivering services and solutions in the Big Data/Data Analytics domain. One of the fundamental problems that all of these leading companies brought to HCLT was processing big data that arrives in different data formats, is spread across multiple sources, and has one or more correlated mapping parameters. This fuels the need for a unified library that can act as a bridge for solving these varied cross-domain problems while applying the good practices of Map Reduce.
Abbreviations
Sl. No.   Acronym   Full form
1         AMR       Adaptive Map Reduce
Market Trends and Challenges
Hadoop efficiently addressed the Volume and Velocity of Big Data; however, a gap remains that calls for a solution which uses existing frameworks to address the Variety problem. Solving the third V (Variety) boils down to handling data processing seamlessly even when the data type or the processing algorithm changes. Clients generally come up with ad hoc data source, processing, and mapping problems, and we have to implement the appropriate MR programs. However, because the problems and data sources are isolated, standalone programs get written, resulting in redundant effort within and across teams and projects. Most of the time, clients initially lack clear visibility of the entire set of requirements and may request midway that another data source be included. In most cases this calls for a lot of rework, which amounts to a scope change from a project management perspective, and clients are generally reluctant to reschedule.
The project we are currently implementing for the largest aerospace company is a pre-production application that will expand into a full-time production environment in the near future. We currently have visibility into only 3 data sources, and in production the number of data sources would be at least 5 times more. The client has asked us to deliver with minimal code changes and no change at all in the architecture. This challenge is in line with the problem described in the paragraphs above.
ID   Task Name                                   Duration   Start          Finish         Predecessors
49   Data processor/MR job for                   5 days     Tue 11/25/14   Mon 12/1/14    33
50   Data processor/MR job for                   5 days     Tue 11/25/14   Mon 12/1/14    33
51   Data processor/MR job for                   5 days     Tue 12/2/14    Mon 12/8/14    49
52   Unit test with representative data          5 days     Tue 12/9/14    Mon 12/15/14   51
53   Report & Dashboard development              35 days    Mon 11/3/14    Fri 12/19/14
54   Tool evaluation for reports and dashboard   5 days     Mon 11/3/14    Fri 11/7/14
55   Develop reports (3 reports)                 15 days    Mon 11/10/14   Fri 11/28/14   54
56   Develop dashboard                           10 days    Mon 12/1/14    Fri 12/12/14   55
Fig. 1. MPP report highlighting the efforts in man-days
As Fig. 1 shows, supporting each data processing algorithm takes about 5 man-days for the development alone. With the use of AMR, the need for such cycles can be eliminated.
Solution
As in any programming paradigm, MR has its own set of design patterns. Design patterns generally derive from good practices that evolve out of years of research and implementation in the industry. Currently, when MR programs are written, these patterns are not always used; however, a considerable improvement in performance has been noticed when they are. By introducing a library/framework, we would require projects to follow the good practices of MR. This would also enable projects to map their processing logic to a pattern quickly, without much research, and would considerably ease the development effort.
The HCLT Analytics group has many customizable off-the-shelf solutions for Data Ingestion, Data Persistence, and Multi-Tenancy; however, we do not have a framework/library for the core data processing in Hadoop.
[Figure: two axes. (a) Degree to which acquired software is customized: Entirely Off-the-Shelf Software; Off-the-Shelf Software, Partly Customized; Entirely Custom Software. (b) Scale of acquisition, or degree to which the overall acquisition is acquired as separate components: Full System; Several Components; Single Component.]
Fig. 2. Major variables affecting software acquisition
Fig. 2 depicts the fact that the degree to which software is customized plays an important role in project acquisitions. Hence a highly customizable solution for the Big Data processing module can be a great value addition for HCLT as a company. It will enable us to pursue project acquisitions with overall solutions for every aspect of Data Analytics.
We decided to approach this problem first by analysing the Map Reduce design patterns. There are 23 patterns as of now, grouped as follows:
Summarization: Numerical Summarization, Inverted Index Summarization, Counting with Counters
Filtering: Filtering, Bloom Filtering, Top Ten Items, Distinct
Data Organization: Structured to Hierarchical, Partitioning, Binning, Total Order Sorting, Shuffling
Join: Reduce Side Join, Replicated Join, Composite Join, Cartesian Products
Meta Patterns: Job Chaining, Chain Folding, Job Merging
Input and Output: Generating Data, External Source Data, External Source Input, Partition Pruning
The idea was to identify the commonality across these patterns and to understand the level of dependency among the implementation details of each pattern. We found that each pattern requires at least:
Input and Output Paths: Which dataset should be processed? Where should the output be written?
Class of Action: What kind of processing is required, for example Filtering, Aggregation, etc.?
Processing Details: Which set of fields is required, and how should they be processed?
Input and Output Data Types: What is to be processed?
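The four requirements above can be pictured as a single job description object. The following is only a minimal sketch in Python (not the Hadoop Java API and not the AMR library itself); the class name JobSpec, its field names, and the set of known actions are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical action classes; the text names only Filtering and Aggregation as examples.
KNOWN_ACTIONS = {"filtering", "aggregation"}


@dataclass(frozen=True)
class JobSpec:
    """Illustrative container for the four items every pattern is said to need."""
    input_path: str            # which dataset to process
    output_path: str           # where the output should be written
    action: str                # class of action, e.g. "filtering"
    fields: List[str]          # processing details: which fields are required
    input_type: str = "text"   # input data type (illustrative label)
    output_type: str = "text"  # output data type (illustrative label)

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means the spec is usable."""
        found = []
        if not self.input_path or not self.output_path:
            found.append("input and output paths are required")
        if self.action not in KNOWN_ACTIONS:
            found.append(f"unknown action: {self.action}")
        if not self.fields:
            found.append("at least one field is required")
        return found
```

Collecting the four items in one place lets a job description be checked before any job is launched, which is one plausible way to keep the common parameters independent of the individual patterns.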
The question we asked ourselves was how to create a library/framework that can instantiate the MR Job objects required to serve any MR pattern. The well-established Factory design pattern was used for this purpose.
As depicted in the diagram, different shapes are created using the Factory pattern. The shapes are defined as concrete classes; the Factory is passed the information needed to create an object, instantiates the concrete class that matches that information, and returns a shape object.
Fig2. Major Variables affecting Software Acquisition
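The shape example used to explain the Factory pattern can be sketched in a few lines. This is a generic textbook-style illustration in Python, not code from the AMR library; the class names Shape, Circle, Square, and ShapeFactory are assumptions made for the example.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    """Common interface that every concrete shape implements."""

    @abstractmethod
    def draw(self) -> str:
        ...


class Circle(Shape):
    def draw(self) -> str:
        return "circle"


class Square(Shape):
    def draw(self) -> str:
        return "square"


class ShapeFactory:
    """Maps the information passed by the caller to a concrete class."""

    _registry = {"circle": Circle, "square": Square}

    @classmethod
    def create(cls, kind: str) -> Shape:
        try:
            # Look up the concrete class by name and instantiate it.
            return cls._registry[kind.lower()]()
        except KeyError:
            raise ValueError(f"unknown shape: {kind}") from None
```

The caller only names the kind of object it wants, for example ShapeFactory.create("circle"), and never refers to a concrete class directly.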
In AMR we created concrete MR classes for every MR design pattern. The information about which class to instantiate is passed to the Factory through an XML configuration file, as shown in the diagram above. When data comes into the system, the appropriate object is instantiated according to the rules set for the source/algorithm, and the MR Job is started.
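As a rough illustration of this flow, the sketch below reads rules from an XML string and lets a factory function instantiate the matching pattern class. It is written in Python over in-memory records rather than as Hadoop jobs, and everything in it is assumed for illustration: the XML layout, the attribute names, and the classes FilteringJob and CountingJob are hypothetical, since the real class diagrams and configuration file are not published.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical configuration layout; the real AMR config file is not public.
CONFIG = """
<amr>
  <job source="weblogs" pattern="filtering" field="status" value="500"/>
  <job source="orders" pattern="counting" field="country"/>
</amr>
"""


class FilteringJob:
    """Keeps records whose configured field equals the configured value."""

    def __init__(self, field, value=None):
        self.field, self.value = field, value

    def run(self, records):
        return [r for r in records if r.get(self.field) == self.value]


class CountingJob:
    """Counts records per distinct value of the configured field."""

    def __init__(self, field, value=None):
        self.field = field

    def run(self, records):
        return dict(Counter(r[self.field] for r in records))


# One concrete class per pattern; only two patterns are sketched here.
PATTERNS = {"filtering": FilteringJob, "counting": CountingJob}


def job_for(source, config_xml=CONFIG):
    """Factory: find the rule for this source and instantiate its pattern class."""
    for rule in ET.fromstring(config_xml.strip()).findall("job"):
        if rule.get("source") == source:
            cls = PATTERNS[rule.get("pattern")]
            return cls(rule.get("field"), rule.get("value"))
    raise LookupError(f"no rule configured for source {source!r}")
```

Under these assumptions, onboarding a new data source means adding one job element to the configuration; the calling code, job_for("orders").run(records), stays unchanged.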
The design is in its nascent stage. Although we currently use the Factory pattern, we can gradually evolve towards a Builder pattern when we want to achieve finer granularity in the data processing. As of now, the generic version of the library is a work in progress (WIP).
* We cannot currently reveal the original class diagrams and full configuration file details due to an NDA.
Case Study
The quantitative benefits achievable with AMR are mostly measurable; the framework/library also has the potential to win us some project acquisitions. We have not yet taken the solution to our sales teams, who are likely to give us those figures. Through latency and cost benchmarking, we can illustrate the measurable parameters as follows:
Revenue Benchmarking
Let us assume an average effort of 5 man-days for onboarding a data source. If the proposed AMR reduces this to an average of 4 days per data source, we can claim a 20% reduction ((5 - 4) / 5) in the development effort needed to onboard a new data source.
MR Latency Benchmarking
The showcased example is the simplest MR example, Word Count, but the benchmarks clearly highlight the advantages of using a design pattern.
Data Set:
NY Times news articles (source: ldc.upenn.edu)
Documents = 300000
No. of Words = 102660
Size of Data = 1 GB
Word Count with Combiner
The MR Job with the Combiner takes about 40 min to complete, as evident in the screenshot. The CPU time taken is about 1964120 ms. One can notice that the Combine input/output records are present in the screenshot.
Word Count without Combiner
Now, as a control measure, we comment out the Combiner class and run the program again.
The MR Job without the Combiner takes about 42.5 min to complete. The CPU time taken is about 1853760 ms. One can notice that the Combine input/output records are 0 in the screenshot.
We can deduce the following from the above:
With the Combiner, there is a gain of about 2.5 min in processing latency (42.5 min versus 40 min).
With the Combiner, CPU time utilization increases by about 6% (1964120 ms versus 1853760 ms) and physical memory utilization by about 2%, which shows greater consumption of the machine resources. Higher utilization of the available machine resources is generally preferable in a distributed environment.
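The role of the combiner in the Word Count benchmark can be illustrated with a small in-memory simulation. This is only a sketch in Python, not the Hadoop job that was benchmarked; the function names and the two sample splits are assumptions. It counts how many records would pass through the shuffle with and without map-side combining.

```python
from collections import Counter, defaultdict


def map_phase(split):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for line in split for word in line.split()]


def combine(pairs):
    """Locally sum counts per word on the map side (the combiner step)."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def reduce_phase(shuffled):
    """Sum all counts received for each word."""
    totals = defaultdict(int)
    for word, count in shuffled:
        totals[word] += count
    return dict(totals)


def word_count(splits, use_combiner):
    """Return (totals, number of records sent through the shuffle)."""
    shuffled = []
    for split in splits:
        pairs = map_phase(split)
        shuffled.extend(combine(pairs) if use_combiner else pairs)
    return reduce_phase(shuffled), len(shuffled)


# Two tiny illustrative splits, each holding one line of text.
SPLITS = [["to be or not to be"], ["to do is to be"]]
```

For these two splits the totals are identical in both modes, while the shuffle carries 8 records with the combiner and 11 without it. The combiner trades some extra map-side work for less data to move between the map and reduce phases.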
Best Practices
We are utilizing the best practices of the industry and bringing them all under one umbrella. This would result in significant qualitative benefits in terms of program code and processes.
The quality principle/objective of HCL as an organization is “We shall satisfy our customers by delivering quality products and services that meet their requirements on time, every time”. AMR as a framework helps ensure a high level of quality in the product/service we develop for implementing data processing for Big Data.
We also believe that “The quality of a product is largely determined by the quality of the process that is used to develop and maintain it”. By introducing AMR, we would be able to enforce a standardized MR process across the organization, based on the industry’s best practices in terms of design patterns, thus ensuring a high level of quality in the process itself.
“On time Delivery, Cost Control, Enhance Customer Satisfaction and Continual Service Improvement” are the key quality objectives of HCLT; AMR would allow us to realize most of these goals effectively. One of the core principles of quality is REUSE, which AMR promotes by reusing MR code.
The tools used for developing the library are free, open source tools, none of which is proprietary to the client or any company. However, it may be noted that the AMR concept and the library developed are proprietary to HCLT as a whole.
Conclusion
Key domains where Big Data is in use today include Aero, Auto, Manufacturing, Public Sector, Governance, Health Care, and Media, and the list goes on. All of these domains have unique processing needs for each data source and algorithm, which AMR can address. On closer inspection, the solution is also domain independent: the only modification required is to the configuration file used to run the program. The solution can be used as-is as a library in any scenario where MR has to be used for processing data.
The solution is not dependent on a library version or tool. It can support upgrades or modifications in the supporting libraries as long as there is no major change in the implementation of the Map Reduce algorithm itself. We are currently using it with Cloudera Hadoop 4/5 releases as well as vanilla Apache Hadoop.
Reference
http://www.byzantinereality.com/2009/4/History-of-MapReduce-Part-2w
http://www.maxwideman.com/papers/acquisition/involve.htm
http://www.slideshare.net/zhengwenshen/20130201-mapreduce-design-patterns
https://qualitydiva.hcl.com/Other_Links/OMS_Overview.ppt
http://www.tutorialspoint.com/design_pattern/factory_pattern.htm
Author Info
Kinnar Kumar Sen
HCL Engineering and R&D Services
Hello there! I am an Ideapreneur. I believe that sustainable business outcomes are driven by relationships nurtured through values like trust, transparency and flexibility. I respect the contract, but believe in going beyond through collaboration, applied innovation and new generation partnership models that put your interest above everything else. Right now 110,000 Ideapreneurs are in a Relationship Beyond the Contract™ with 500 customers in 31 countries. How can I help you?

"Cost Savings Enabled for European Financial Services company "HCL Technologies
 
Transforming the Product Portfolio
Transforming the Product PortfolioTransforming the Product Portfolio
Transforming the Product PortfolioHCL Technologies
 
Improved Underwriting Capabilities for Life Insurance Provider
Improved Underwriting Capabilities for Life Insurance ProviderImproved Underwriting Capabilities for Life Insurance Provider
Improved Underwriting Capabilities for Life Insurance ProviderHCL Technologies
 

Plus de HCL Technologies (20)

HCL HELPS A US BASED WIRELINE TELECOM OPERATOR FOR BETTER LEAD-TO-CASH AND TH...
HCL HELPS A US BASED WIRELINE TELECOM OPERATOR FOR BETTER LEAD-TO-CASH AND TH...HCL HELPS A US BASED WIRELINE TELECOM OPERATOR FOR BETTER LEAD-TO-CASH AND TH...
HCL HELPS A US BASED WIRELINE TELECOM OPERATOR FOR BETTER LEAD-TO-CASH AND TH...
 
Noise Control of Vacuum Cleaners
Noise Control of Vacuum CleanersNoise Control of Vacuum Cleaners
Noise Control of Vacuum Cleaners
 
Comply
Comply Comply
Comply
 
Cost-effective Video Analytics in Smart Cities
Cost-effective Video Analytics in Smart CitiesCost-effective Video Analytics in Smart Cities
Cost-effective Video Analytics in Smart Cities
 
A novel approach towards a Smarter DSLR Camera
A novel approach towards a Smarter DSLR CameraA novel approach towards a Smarter DSLR Camera
A novel approach towards a Smarter DSLR Camera
 
Security framework for connected devices
Security framework for connected devicesSecurity framework for connected devices
Security framework for connected devices
 
Connected Cars - Use Cases for Indian Scenario
Connected Cars - Use Cases for Indian ScenarioConnected Cars - Use Cases for Indian Scenario
Connected Cars - Use Cases for Indian Scenario
 
A Sigh of Relief for Patients with Chronic Diseases
A Sigh of Relief for Patients with Chronic DiseasesA Sigh of Relief for Patients with Chronic Diseases
A Sigh of Relief for Patients with Chronic Diseases
 
Painting a Social & Mobile Picture in Real Time
Painting a Social & Mobile Picture in Real TimePainting a Social & Mobile Picture in Real Time
Painting a Social & Mobile Picture in Real Time
 
A Novel Design Approach for Electronic Equipment - FEA Based Methodology
A Novel Design Approach for Electronic Equipment - FEA Based MethodologyA Novel Design Approach for Electronic Equipment - FEA Based Methodology
A Novel Design Approach for Electronic Equipment - FEA Based Methodology
 
Intrusion Detection System (IDS)
Intrusion Detection System (IDS)Intrusion Detection System (IDS)
Intrusion Detection System (IDS)
 
Manufacturing Automation and Digitization
Manufacturing Automation and DigitizationManufacturing Automation and Digitization
Manufacturing Automation and Digitization
 
Managing Customer Care in Digital
Managing Customer Care in DigitalManaging Customer Care in Digital
Managing Customer Care in Digital
 
Digital Customer Care Solutions, Smart Customer Care Solutions, Next Gen Cust...
Digital Customer Care Solutions, Smart Customer Care Solutions, Next Gen Cust...Digital Customer Care Solutions, Smart Customer Care Solutions, Next Gen Cust...
Digital Customer Care Solutions, Smart Customer Care Solutions, Next Gen Cust...
 
The Internet of Things. Wharton Guest Lecture by Sandeep Kishore – Corporate ...
The Internet of Things. Wharton Guest Lecture by Sandeep Kishore – Corporate ...The Internet of Things. Wharton Guest Lecture by Sandeep Kishore – Corporate ...
The Internet of Things. Wharton Guest Lecture by Sandeep Kishore – Corporate ...
 
Be Digital or Be Extinct. Wharton Guest Lecture by Sandeep Kishore – Corporat...
Be Digital or Be Extinct. Wharton Guest Lecture by Sandeep Kishore – Corporat...Be Digital or Be Extinct. Wharton Guest Lecture by Sandeep Kishore – Corporat...
Be Digital or Be Extinct. Wharton Guest Lecture by Sandeep Kishore – Corporat...
 
Transform and Modernize -UK's leading specialists in Pension and Employee Ben...
Transform and Modernize -UK's leading specialists in Pension and Employee Ben...Transform and Modernize -UK's leading specialists in Pension and Employee Ben...
Transform and Modernize -UK's leading specialists in Pension and Employee Ben...
 
"Cost Savings Enabled for European Financial Services company "
"Cost Savings Enabled for European Financial Services company ""Cost Savings Enabled for European Financial Services company "
"Cost Savings Enabled for European Financial Services company "
 
Transforming the Product Portfolio
Transforming the Product PortfolioTransforming the Product Portfolio
Transforming the Product Portfolio
 
Improved Underwriting Capabilities for Life Insurance Provider
Improved Underwriting Capabilities for Life Insurance ProviderImproved Underwriting Capabilities for Life Insurance Provider
Improved Underwriting Capabilities for Life Insurance Provider
 

Dernier

Intro to BCG's Carbon Emissions Benchmark_vF.pdf
Intro to BCG's Carbon Emissions Benchmark_vF.pdfIntro to BCG's Carbon Emissions Benchmark_vF.pdf
Intro to BCG's Carbon Emissions Benchmark_vF.pdfpollardmorgan
 
8447779800, Low rate Call girls in Rohini Delhi NCR
8447779800, Low rate Call girls in Rohini Delhi NCR8447779800, Low rate Call girls in Rohini Delhi NCR
8447779800, Low rate Call girls in Rohini Delhi NCRashishs7044
 
NewBase 19 April 2024 Energy News issue - 1717 by Khaled Al Awadi.pdf
NewBase  19 April  2024  Energy News issue - 1717 by Khaled Al Awadi.pdfNewBase  19 April  2024  Energy News issue - 1717 by Khaled Al Awadi.pdf
NewBase 19 April 2024 Energy News issue - 1717 by Khaled Al Awadi.pdfKhaled Al Awadi
 
2024 Numerator Consumer Study of Cannabis Usage
2024 Numerator Consumer Study of Cannabis Usage2024 Numerator Consumer Study of Cannabis Usage
2024 Numerator Consumer Study of Cannabis UsageNeil Kimberley
 
Ms Motilal Padampat Sugar Mills vs. State of Uttar Pradesh & Ors. - A Milesto...
Ms Motilal Padampat Sugar Mills vs. State of Uttar Pradesh & Ors. - A Milesto...Ms Motilal Padampat Sugar Mills vs. State of Uttar Pradesh & Ors. - A Milesto...
Ms Motilal Padampat Sugar Mills vs. State of Uttar Pradesh & Ors. - A Milesto...ShrutiBose4
 
8447779800, Low rate Call girls in Saket Delhi NCR
8447779800, Low rate Call girls in Saket Delhi NCR8447779800, Low rate Call girls in Saket Delhi NCR
8447779800, Low rate Call girls in Saket Delhi NCRashishs7044
 
Kenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby AfricaKenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby Africaictsugar
 
Ten Organizational Design Models to align structure and operations to busines...
Ten Organizational Design Models to align structure and operations to busines...Ten Organizational Design Models to align structure and operations to busines...
Ten Organizational Design Models to align structure and operations to busines...Seta Wicaksana
 
Kenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith PereraKenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith Pereraictsugar
 
MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?Olivia Kresic
 
Independent Call Girls Andheri Nightlaila 9967584737
Independent Call Girls Andheri Nightlaila 9967584737Independent Call Girls Andheri Nightlaila 9967584737
Independent Call Girls Andheri Nightlaila 9967584737Riya Pathan
 
Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024Kirill Klimov
 
Memorándum de Entendimiento (MoU) entre Codelco y SQM
Memorándum de Entendimiento (MoU) entre Codelco y SQMMemorándum de Entendimiento (MoU) entre Codelco y SQM
Memorándum de Entendimiento (MoU) entre Codelco y SQMVoces Mineras
 
FULL ENJOY Call girls in Paharganj Delhi | 8377087607
FULL ENJOY Call girls in Paharganj Delhi | 8377087607FULL ENJOY Call girls in Paharganj Delhi | 8377087607
FULL ENJOY Call girls in Paharganj Delhi | 8377087607dollysharma2066
 
8447779800, Low rate Call girls in New Ashok Nagar Delhi NCR
8447779800, Low rate Call girls in New Ashok Nagar Delhi NCR8447779800, Low rate Call girls in New Ashok Nagar Delhi NCR
8447779800, Low rate Call girls in New Ashok Nagar Delhi NCRashishs7044
 
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCRashishs7044
 
Investment in The Coconut Industry by Nancy Cheruiyot
Investment in The Coconut Industry by Nancy CheruiyotInvestment in The Coconut Industry by Nancy Cheruiyot
Investment in The Coconut Industry by Nancy Cheruiyotictsugar
 
Future Of Sample Report 2024 | Redacted Version
Future Of Sample Report 2024 | Redacted VersionFuture Of Sample Report 2024 | Redacted Version
Future Of Sample Report 2024 | Redacted VersionMintel Group
 

Dernier (20)

Intro to BCG's Carbon Emissions Benchmark_vF.pdf
Intro to BCG's Carbon Emissions Benchmark_vF.pdfIntro to BCG's Carbon Emissions Benchmark_vF.pdf
Intro to BCG's Carbon Emissions Benchmark_vF.pdf
 
8447779800, Low rate Call girls in Rohini Delhi NCR
8447779800, Low rate Call girls in Rohini Delhi NCR8447779800, Low rate Call girls in Rohini Delhi NCR
8447779800, Low rate Call girls in Rohini Delhi NCR
 
NewBase 19 April 2024 Energy News issue - 1717 by Khaled Al Awadi.pdf
NewBase  19 April  2024  Energy News issue - 1717 by Khaled Al Awadi.pdfNewBase  19 April  2024  Energy News issue - 1717 by Khaled Al Awadi.pdf
NewBase 19 April 2024 Energy News issue - 1717 by Khaled Al Awadi.pdf
 
2024 Numerator Consumer Study of Cannabis Usage
2024 Numerator Consumer Study of Cannabis Usage2024 Numerator Consumer Study of Cannabis Usage
2024 Numerator Consumer Study of Cannabis Usage
 
Enjoy ➥8448380779▻ Call Girls In Sector 18 Noida Escorts Delhi NCR
Enjoy ➥8448380779▻ Call Girls In Sector 18 Noida Escorts Delhi NCREnjoy ➥8448380779▻ Call Girls In Sector 18 Noida Escorts Delhi NCR
Enjoy ➥8448380779▻ Call Girls In Sector 18 Noida Escorts Delhi NCR
 
Ms Motilal Padampat Sugar Mills vs. State of Uttar Pradesh & Ors. - A Milesto...
Ms Motilal Padampat Sugar Mills vs. State of Uttar Pradesh & Ors. - A Milesto...Ms Motilal Padampat Sugar Mills vs. State of Uttar Pradesh & Ors. - A Milesto...
Ms Motilal Padampat Sugar Mills vs. State of Uttar Pradesh & Ors. - A Milesto...
 
8447779800, Low rate Call girls in Saket Delhi NCR
8447779800, Low rate Call girls in Saket Delhi NCR8447779800, Low rate Call girls in Saket Delhi NCR
8447779800, Low rate Call girls in Saket Delhi NCR
 
Kenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby AfricaKenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby Africa
 
Ten Organizational Design Models to align structure and operations to busines...
Ten Organizational Design Models to align structure and operations to busines...Ten Organizational Design Models to align structure and operations to busines...
Ten Organizational Design Models to align structure and operations to busines...
 
Kenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith PereraKenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith Perera
 
MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?
 
Independent Call Girls Andheri Nightlaila 9967584737
Independent Call Girls Andheri Nightlaila 9967584737Independent Call Girls Andheri Nightlaila 9967584737
Independent Call Girls Andheri Nightlaila 9967584737
 
Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024
 
Memorándum de Entendimiento (MoU) entre Codelco y SQM
Memorándum de Entendimiento (MoU) entre Codelco y SQMMemorándum de Entendimiento (MoU) entre Codelco y SQM
Memorándum de Entendimiento (MoU) entre Codelco y SQM
 
Call Us ➥9319373153▻Call Girls In North Goa
Call Us ➥9319373153▻Call Girls In North GoaCall Us ➥9319373153▻Call Girls In North Goa
Call Us ➥9319373153▻Call Girls In North Goa
 
FULL ENJOY Call girls in Paharganj Delhi | 8377087607
FULL ENJOY Call girls in Paharganj Delhi | 8377087607FULL ENJOY Call girls in Paharganj Delhi | 8377087607
FULL ENJOY Call girls in Paharganj Delhi | 8377087607
 
8447779800, Low rate Call girls in New Ashok Nagar Delhi NCR
8447779800, Low rate Call girls in New Ashok Nagar Delhi NCR8447779800, Low rate Call girls in New Ashok Nagar Delhi NCR
8447779800, Low rate Call girls in New Ashok Nagar Delhi NCR
 
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR
 
Investment in The Coconut Industry by Nancy Cheruiyot
Investment in The Coconut Industry by Nancy CheruiyotInvestment in The Coconut Industry by Nancy Cheruiyot
Investment in The Coconut Industry by Nancy Cheruiyot
 
Future Of Sample Report 2024 | Redacted Version
Future Of Sample Report 2024 | Redacted VersionFuture Of Sample Report 2024 | Redacted Version
Future Of Sample Report 2024 | Redacted Version
 

USING FACTORY DESIGN PATTERNS IN MAP REDUCE DESIGN FOR BIG DATA ANALYTICS

  • 1. WHITE PAPER: Adaptive Map Reduce (www.hcltech.com). Table of Contents: Abstract; Abbreviations; Market Trends and Challenges; Solution; Case Study; Revenue Benchmarking; MR Latency Benchmarking; Word Count with Combiner; Word Count without Combiner; Best Practices; Conclusion; Reference; Author Info
  • 2. Abstract

    This paper explores the various Map Reduce (MR) design patterns and proposes a unified working solution in the form of a library. The library can adapt itself to any data processing need that Map Reduce can serve. This would not only save HCL and its clients many man-hours but also enforce the good practices of Map Reduce design patterns in the code by default. HCL Technologies (HCLT) has been working for the last couple of years with multiple clients in verticals such as ISPs, Aero, Banking & Finance, and Media & Entertainment, delivering services and solutions in the Big Data/Data Analytics domain. One fundamental problem that all of these companies brought to HCLT was processing big data that arrives in different formats, is spread across multiple sources, and has one or more correlated mapping parameters. This creates a need for a unified library that can act as a bridge for solving these varied cross-domain problems while applying the good practices of Map Reduce.

    Abbreviations

    Sl. No. | Acronym | Full form
    1       | AMR     | Adaptive Map Reduce

    Market Trends and Challenges

    Hadoop efficiently addressed the Volume and Velocity of Big Data; however, a gap remains that calls for a solution which uses existing frameworks to solve the Variety problem. Solving this third V (Variety) boils down to handling data processing seamlessly even when the data type or the processing algorithm changes. Clients generally come up with ad hoc data source, processing, and mapping problems, and we have to implement the appropriate MR programs. Because the problems and data sources are treated in isolation, standalone programs are written, which results in redundant effort within and across teams and projects. Clients often lack clear visibility of the full requirements at the outset and may ask midway to include another data source. In most cases this involves a lot of rework, which amounts to a scope change from a project management perspective, and clients are generally unwilling to reschedule much.

    The project we are currently implementing for the largest aerospace company is a pre-production application that will expand into a full-time production environment in the near future. We currently have visibility into only 3 data sources, and in production the number of data sources would be at least 5 times more. The client has asked us to deliver this with minimum code changes and no change at all in the architecture. This challenge is in line with the problem described in the paragraphs above.

    Fig1. MPP Report highlighting the efforts in man-days

    ID | Task Name                                 | Duration | Start        | Finish       | Predecessors
    49 | Data processor/MR job for                 | 5 days   | Tue 11/25/14 | Mon 12/1/14  | 33
    50 | Data processor/MR job for                 | 5 days   | Tue 11/25/14 | Mon 12/1/14  | 33
    51 | Data processor/MR job for                 | 5 days   | Tue 12/2/14  | Mon 12/8/14  | 49
    52 | Unit test with representative data        | 5 days   | Tue 12/9/14  | Mon 12/15/14 | 51
    53 | Report & Dashboard development            | 35 days  | Mon 11/3/14  | Fri 12/19/14 |
    54 | Tool evaluation for reports and dashboard | 5 days   | Mon 11/3/14  | Fri 11/7/14  |
    55 | Develop reports (3 reports)               | 15 days  | Mon 11/10/14 | Fri 11/28/14 | 54
    56 | Develop dashboard                         | 10 days  | Mon 12/1/14  | Fri 12/12/14 | 55
  • 3. As the diagram above shows, supporting each data processing algorithm takes about 5 man-days of development effort alone. With AMR, the need for such cycles can be eliminated.

    As in any programming paradigm, MR has its own set of design patterns. Design patterns are generally based on good practices that evolve out of years of research and implementation in the industry. Currently, when MR programs are written, these patterns are not always used, although a considerable improvement in performance has been observed when they are. By introducing a library/framework we would make projects follow the good practices of MR. It would also let projects quickly map their processing logic to a pattern without much research, which would ease the development effort considerably.

    HCLT's Analytics group has many customizable off-the-shelf solutions for Data Ingestion, Data Persistence, and Multi Tenancy; however, we do not have a framework or library for the core data processing in Hadoop. The diagram shows that the degree to which software is customized plays an important role in project acquisitions. Hence a highly customizable solution for the Big Data processing module can add great value to HCLT as a company. It will enable us to pursue project acquisitions with end-to-end solutions for every aspect of Data Analytics.

    Fig2. Major Variables affecting Software Acquisition
    (a) Degree to which Acquired Software is Customized: Entirely Off-the-Shelf Software; Off-the-Shelf Software Partly Customized; Entirely Custom Software
    (b) Scale of Acquisition, or Degree to which the overall Acquisition is Acquired as Separated Components: Full System; Several Components; Single Component

    Solution

    We decided to approach this problem first by analysing the Map Reduce design patterns. There are 23 patterns as of now:

    Join: Reduce Side Join, Replicated Join, Composite Join, Cartesian Product
    Meta Patterns: Job Chaining, Chain Folding, Job Merging
    Input and Output: Generating Data, External Source Output, External Source Input, Partition Pruning
    Summarization: Numerical Summarization, Inverted Index Summarization, Counting with Counters
    Filtering: Filtering, Bloom Filtering, Top Ten Items, Distinct
    Data Organization: Structured to Hierarchical, Partitioning, Binning, Total Order Sorting, Shuffling
  • 4. The idea was to identify the commonality across these patterns and to understand the level of dependency among the implementation details of each pattern. We found that each pattern requires at least:

      - Input and Output Paths: Which dataset should be processed? Where should the output be written?
      - Class of Action required, for example Filtering or Aggregation.
      - Processing Details: Which set of fields is required, and how should they be processed?
      - Input and Output Data Types: What is to be processed?

    The question we asked ourselves was how to create a library/framework that can instantiate the MR Job objects required to serve any MR pattern. The well-established Factory design pattern was used for this purpose.

    As depicted in the diagram, different shapes are created using the Factory Pattern. The shapes are defined by concrete classes. The Factory is passed the information needed to create an object, instantiates the concrete class that matches that information, and returns a shape object.
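The Factory idea described above can be sketched in plain Java. This is only an illustrative sketch, not the AMR library itself: the names `PatternJob`, `FilteringJob`, `SummarizationJob`, and `PatternJobFactory` are hypothetical, and the job objects merely describe themselves instead of wrapping a real Hadoop job.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a configured MR job; a real library would
// wrap a Hadoop job object here instead of returning a description.
interface PatternJob {
    String describe(String inputPath, String outputPath);
}

// One concrete class per MR design pattern (only two are shown).
class FilteringJob implements PatternJob {
    public String describe(String inputPath, String outputPath) {
        return "FILTERING: " + inputPath + " -> " + outputPath;
    }
}

class SummarizationJob implements PatternJob {
    public String describe(String inputPath, String outputPath) {
        return "SUMMARIZATION: " + inputPath + " -> " + outputPath;
    }
}

// The factory maps a pattern name (for example, read from a config file)
// to the concrete class that implements that pattern.
class PatternJobFactory {
    private static final Map<String, Supplier<PatternJob>> REGISTRY = new LinkedHashMap<>();

    static {
        REGISTRY.put("FILTERING", FilteringJob::new);
        REGISTRY.put("SUMMARIZATION", SummarizationJob::new);
    }

    static PatternJob create(String patternName) {
        Supplier<PatternJob> supplier = REGISTRY.get(patternName.toUpperCase(Locale.ROOT));
        if (supplier == null) {
            throw new IllegalArgumentException("Unknown MR pattern: " + patternName);
        }
        return supplier.get();
    }
}

class FactoryDemo {
    public static void main(String[] args) {
        PatternJob job = PatternJobFactory.create("filtering");
        // prints "FILTERING: /data/in -> /data/out"
        System.out.println(job.describe("/data/in", "/data/out"));
    }
}
```

Supporting a further pattern in this sketch means adding one concrete class and one registry entry; the calling code stays unchanged, which is the property the Factory pattern is chosen for.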
  • 5. In AMR we created concrete MR classes for every MR design pattern. The information about which class to instantiate is passed to the Factory through the XML configuration file, as shown in the diagram above. When data comes into the system, the appropriate object is instantiated according to the rules set for the source and algorithm, and the MR job is started.

    The design is in its nascent stages. Although we currently use Factory, we can gradually evolve to a Builder pattern when we want finer granularity in the data processing. As of now the generic version of the library is a work in progress (WIP).

    * We cannot currently reveal the original class diagrams and full config file details due to an NDA.

    Case Study

    The quantitative benefits of AMR are mostly measurable; however, the framework/library also has the potential to win us some project acquisitions. We have not yet taken the solution to our sales teams, who are likely to give us those figures. Through latency and cost benchmarking we can illustrate the measurable parameters as follows.

    Revenue Benchmarking

    Let us assume an average effort of 5 man-days for onboarding a data source. If AMR reduces this to an average of 4 days per data source, we can claim a 20% reduction in the development effort to onboard a new data source.

    MR Latency Benchmarking

    The showcased example is the simplest example of Word Count in MR, but the benchmarks clearly highlight the advantages of using a design pattern.

    Data Set: NY Times news articles
    Source: ldc.upenn.edu
    Documents = 300000
    No. of Words = 102660
    Size of Data = 1 GB

    Word Count with Combiner

    The MR job above, run with the Combiner, takes about 40 min to complete, as evident in the screenshot above. The CPU time taken is about 1964120 ms. One can notice that the Combine input/output records are present in the screenshot below.
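To make the role of the Combiner concrete, here is a minimal in-memory sketch in plain Java. It does not use the Hadoop API: the class and method names (`CombinerSketch`, `WordCountResult`, `run`) and the two sample input splits are assumptions made for illustration. It shows only that local, map-side summing leaves the final counts unchanged while fewer records cross the shuffle.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Holds the final word counts and the number of records sent through the shuffle.
class WordCountResult {
    final Map<String, Integer> counts;
    final int shuffledRecords;

    WordCountResult(Map<String, Integer> counts, int shuffledRecords) {
        this.counts = counts;
        this.shuffledRecords = shuffledRecords;
    }
}

class CombinerSketch {
    // Map phase for one input split: emit one (word, 1) pair per token.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : split.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return out;
    }

    // Combiner: sum the counts per word locally, on the map side.
    static List<Map.Entry<String, Integer>> combine(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> local = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            local.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return new ArrayList<>(local.entrySet());
    }

    // Runs word count over all splits and counts the records that reach the reducer.
    static WordCountResult run(List<String> splits, boolean useCombiner) {
        int shuffled = 0;
        Map<String, Integer> totals = new TreeMap<>();
        for (String split : splits) {
            List<Map.Entry<String, Integer>> pairs = map(split);
            if (useCombiner) {
                pairs = combine(pairs);
            }
            shuffled += pairs.size();
            // Reduce: the same summing logic as the combiner, applied globally.
            for (Map.Entry<String, Integer> pair : pairs) {
                totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
            }
        }
        return new WordCountResult(totals, shuffled);
    }

    public static void main(String[] args) {
        List<String> splits = Arrays.asList("to be or not to be", "to do or not to do");
        WordCountResult plain = run(splits, false);
        WordCountResult combined = run(splits, true);
        // prints "{be=2, do=2, not=2, or=2, to=4} shuffled=12"
        System.out.println(plain.counts + " shuffled=" + plain.shuffledRecords);
        // prints "{be=2, do=2, not=2, or=2, to=4} shuffled=8"
        System.out.println(combined.counts + " shuffled=" + combined.shuffledRecords);
    }
}
```

A combiner is safe here because addition is associative and commutative, so partial sums per split can be merged in any order. In this sketch the saving comes from repeated words within a split.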
  • 6. Word Count without Combiner

    Now, as a control measure, we comment out the Combiner class as depicted above and run the program again.

    The MR job above, run without the Combiner, takes about 42.5 min to complete. The CPU time taken is about 1853760 ms. One can notice that the Combine input/output records are 0 in the screenshot below.

    We can deduce the following from the above:

      - With the Combiner there is a gain of about 2.5 min in processing latency.
      - With the Combiner there is an increase of about 6% in CPU time utilization and 2% in physical memory utilization, which shows greater consumption of the machine resources. Fuller use of the available machine resources is generally preferable in a distributed environment.
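The deductions above follow from simple arithmetic on the figures quoted for the two runs (about 40 min and 1964120 ms of CPU time with the Combiner, about 42.5 min and 1853760 ms without it). A small Java check is sketched below; the class and method names are illustrative only, and the physical memory figure is not recomputed because the underlying values are not given in the text.

```java
import java.util.Locale;

class BenchmarkDeltas {
    // Relative increase of value over baseline, in percent.
    static double percentIncrease(double baseline, double value) {
        return (value - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        double minutesWithCombiner = 40.0;
        double minutesWithoutCombiner = 42.5;
        long cpuMsWithCombiner = 1_964_120L;
        long cpuMsWithoutCombiner = 1_853_760L;

        double latencyGain = minutesWithoutCombiner - minutesWithCombiner;
        double cpuIncrease = percentIncrease(cpuMsWithoutCombiner, cpuMsWithCombiner);

        // prints "Latency gain: 2.5 min"
        System.out.println(String.format(Locale.ROOT, "Latency gain: %.1f min", latencyGain));
        // prints "CPU time increase: 5.95%"
        System.out.println(String.format(Locale.ROOT, "CPU time increase: %.2f%%", cpuIncrease));
    }
}
```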
  • 7. Best Practices

    We are utilizing the industry's best practices and bringing them all under one umbrella. This would result in significant qualitative benefits in terms of program code and processes.

    The quality principle of HCL as an organization is “We shall satisfy our customers by delivering quality products and services that meet their requirements on time, every time”. AMR as a framework promotes a high level of quality in the product or service we develop for implementing Data Processing for Big Data.

    We also believe that “The quality of a product is largely determined by the quality of the process that is used to develop and maintain it”. By introducing AMR we would be able to enforce a standardized MR process across the organization, based on the industry's best practices in design patterns, thus promoting a high level of quality in the process itself.

    “On time Delivery, Cost Control, Enhance Customer Satisfaction and Continual Service Improvement” are the key quality objectives of HCLT; AMR would allow us to realize most of these goals effectively. One of the core principles of quality is reuse, which AMR promotes by reusing MR code.

    The tools used to develop the library are free, open source tools, none of which is proprietary to the client or to any company. However, the AMR concept and the library developed are proprietary to HCLT as a whole.

    Conclusion

    Key domains where Big Data is in use today include Aero, Auto, Manufacturing, Public Sector, Governance, Health Care, and Media, and the list goes on. Each of these domains has unique processing needs for its data sources and algorithms, which AMR can address. The solution is also domain independent: the only modification required is to the configuration file used to run the program. The solution can be used as-is, as a library, in any scenario where MR is used to process data.

    The solution does not depend on a particular library version or tool. It can support upgrades or modifications in the supporting libraries as long as there is no major change in the implementation of the Map Reduce algorithms themselves. We are currently using it with Cloudera Hadoop 4/5 releases as well as vanilla Apache Hadoop.

    Author Info

    Kinnar Kumar Sen, HCL Engineering and R&D Services