Three Big Data Case Studies
Great use cases of Big Data
• Big Data Exploration: Find, visualize, and understand all big data to improve decision making
• Enhanced 360° View of the Customer: Extend existing customer views (CRM, etc.) by incorporating additional internal and external information sources
• Security/Intelligence Extension: Lower risk, detect fraud, and monitor cyber security in real time
• Data Warehouse Augmentation: Integrate big data and data warehouse capabilities to increase operational efficiency
• Operations Analysis: Analyze a variety of machine data for improved business results
Why Big Data
• Greater efficiencies in business processes
• New insights from combining and analyzing data types in new ways
• Develop new business models with resulting increased market presence and revenue
[Diagram labels: File Systems, Relational Data, Content Mgmt, Email, CRM, Supply Chain, ERP, RSS Feeds, Cloud, Custom Sources; Data; Views; Applications/Users]
Atidan Approach
• Implement a Hadoop-centric reference architecture
• Move enterprise batch processing to Hadoop
• Make Hadoop the single point of truth
• Massively reduce ETL by transforming within Hadoop
• Move results and aggregates back to legacy systems for consumption
• Retain, within Hadoop, source files at the finest granularity for re-use
Top Criteria
• Allow users to use familiar consumption interfaces (web, mobile)
• Enable businesses to unlock previously unusable data
[Diagram: Big Data Architecture (High Level), with labels: Unlock Big Data, Simplify Your Warehouse, Preprocess Raw Data, Ingest]
Atidan Case Study
Usage Analysis using Hadoop
• Business Need
  • A large conglomerate needed to analyze the last 10 years of usage of its web applications using IIS logs
  • The logs received from IIS were stored across multiple files (e.g., daily logs)
  • The data was unstructured, included free text, and contained irrelevant entries
  • The exact analysis criteria, parameters, and desired outcome were not known in advance
• Solution
  • A traditional RDBMS could not handle the problem because of the type and volume of the data and the uncertainty around the ultimate analysis criteria
  • Atidan delivered a Hadoop-based solution that transformed the raw data into reports
  • The solution was tolerant of data inconsistencies
  • Hadoop provided elasticity for incremental data addition
  • Scalability into the petabyte range
  • Based on data size and complexity, the processing can be scaled from one node to 100 nodes
  • The schema-less architecture made it possible to change the data model and analytics dynamically, even at a late stage in the project
  • The organization gained completely new and unexpected insights into employee, customer, and vendor/partner behavior
  • Correlations were established between employees' usage patterns and both attrition and productivity
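The log-to-report step described above can be illustrated with a small, single-machine sketch. It is not Atidan's implementation: it only assumes the logs follow the W3C extended format that IIS can emit (a `#Fields:` directive followed by space-separated values), and the sample lines, user names, and helper functions are all hypothetical. Malformed rows are skipped, mirroring the tolerance to data inconsistencies, and the counting follows the map-then-reduce shape that a Hadoop job would distribute across nodes.

```python
from collections import Counter

# Hypothetical sample in W3C extended log format (the values are invented).
SAMPLE_LOG = """\
#Software: Microsoft Internet Information Services
#Fields: date time cs-method cs-uri-stem cs-username sc-status
2001-01-05 10:00:01 GET /home.aspx user_a 200
2001-01-05 10:00:02 GET /missing.aspx user_b 404
this line is free text and should be skipped
2001-02-11 09:30:00 POST /orders.aspx user_a 201
2001-02-11 09:31:00 GET /home.aspx - 200
"""


def parse_records(lines):
    """Yield one dict per well-formed row, keyed by the latest #Fields header."""
    fields = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#") or fields is None:
            continue  # other directives, or data before any header
        parts = line.split()
        if len(parts) != len(fields):
            continue  # tolerate inconsistent rows instead of failing
        yield dict(zip(fields, parts))


def count_by(records, key_fn):
    """Map each record to a key, then reduce by counting occurrences."""
    counts = Counter()
    for record in records:
        counts[key_fn(record)] += 1
    return counts


records = list(parse_records(SAMPLE_LOG.splitlines()))
status_counts = count_by(records, lambda r: r["sc-status"])
monthly_counts = count_by(records, lambda r: r["date"][:7])  # YYYY-MM

print(status_counts.most_common())
print(sorted(monthly_counts.items()))
```

Because the raw rows are kept and the grouping key is just a function, a new question (for example, requests per `cs-username`) needs only a different `key_fn`, which is the flexibility the slide attributes to not fixing the analysis criteria up front.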
Atidan Case Study
Usage Analysis using Hadoop
[Charts: "Request Types" (request counts by HTTP response status, e.g. Created (201), OK (200)); "Monthly Requests" (request counts by month, 2001 and 2002); "Users" (request counts per user)]
Big Query - Hive
• The size of data being collected and analyzed in industry for business intelligence (BI) is growing rapidly, making traditional warehousing solutions prohibitively expensive
• MapReduce is low level and complex to write
• Hive provides a high-level, SQL-like query language
• This allows for ad-hoc analysis
• The business need not know in advance which patterns to look for
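To give a feel for what ad-hoc, SQL-style analysis looks like, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for Hive. This is an assumption made purely so the example runs anywhere: SQLite is not Hive, but the queries use only `SELECT`, `COUNT`, `GROUP BY`, and `ORDER BY`, which HiveQL also supports. The `requests` table, its columns, and its rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for a Hive table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (day TEXT, username TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?, ?)",
    [
        ("2001-01-05", "user_a", 200),
        ("2001-01-05", "user_b", 404),
        ("2001-02-11", "user_a", 201),
        ("2001-02-11", "user_c", 200),
    ],
)

# First ad-hoc question: how many requests per response status?
by_status = conn.execute(
    "SELECT status, COUNT(*) AS n FROM requests "
    "GROUP BY status ORDER BY n DESC, status"
).fetchall()

# A follow-up question nobody planned for: which users are most active?
by_user = conn.execute(
    "SELECT username, COUNT(*) AS n FROM requests "
    "GROUP BY username ORDER BY n DESC, username"
).fetchall()

print(by_status)
print(by_user)
conn.close()
```

Each new question is a one-line declarative query rather than a hand-written MapReduce job, which is the trade-off the bullets above describe.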
Atidan Case Study
Customer data collection (KYC) using Hadoop
• Business Need
  • A financial institution had to collect customer data periodically
  • Customers are very reluctant to provide updated data
  • This customer data has to be cross-checked against the billions of transactions the institution receives per day
  • The institution wants to collate data that is available in the public domain from known social media sites
  • The data was unstructured, included free text, and contained irrelevant entries
• Solution
  • A graph database is constructed over the extracted social data to analyze transactions
  • Atidan delivered a Hadoop-based solution that transformed the raw data into a graph database
  • Aggregate customer information from existing sources, social media, and government sources
  • Analyze transactions to find hidden patterns
  • Enable link analysis and risk monitoring
  • Facilitate decision making (new products) and customer discovery
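The link analysis mentioned above can be sketched with a plain adjacency map and a breadth-first search. This is only an illustration under stated assumptions: the deck does not name the graph database or its query language, so the customer identifiers, the edge list, and the `within_hops` helper are hypothetical. An edge here stands for any relationship extracted from the data, such as a transaction between two parties.

```python
from collections import defaultdict, deque

# Hypothetical relationships between customers (e.g. a transaction or a
# shared attribute); the identifiers are invented for illustration.
EDGES = [
    ("cust_1", "cust_2"),
    ("cust_2", "cust_3"),
    ("cust_4", "cust_5"),
]


def build_graph(edges):
    """Build an undirected adjacency map from (a, b) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def within_hops(graph, start, max_hops):
    """Return {customer: distance} for everyone within max_hops of start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return {n: d for n, d in distance.items() if n != start}


graph = build_graph(EDGES)
linked = within_hops(graph, "cust_1", 2)
print(sorted(linked.items()))
```

If `cust_1` were flagged as risky, the result lists the customers linked to it within two hops. A dedicated graph database answers the same kind of question at a much larger scale.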
Atidan Case Study
Customer data collection (KYC) using Hadoop
[Diagram: Customer Information sources (Web, Social, Channel Partners, Utility Providers, Aadhaar/UIDAI) feed Big Data Processing and a Graph Database; outputs: Customer Clustering, Income/Expense changes, Corporate structure changes, AML, Peer group analysis, Pattern Analysis]
Advantages of Hadoop (KYC) Solution to Banks
• Lowers the cost of follow-up with users
• Reduces losses by highlighting risky users early
• Graph-database-based AML
• Insights into
  • New products
  • New customers
  • New loans to existing customers
  • New investment opportunities for customers
• Reduces operational errors
• Traceability of data source
[Diagram labels: AML, Graph Queries, Due Diligence, Risk, Credit Scoring, Mitigation, Analysis, Peer groups, New Prospects, Insights, New Products, New Customers]
Atidan Case Study
Email scanning and categorization using MongoDB
Business Need
Retrieve potentially millions of daily emails from a common webmail account, categorize them, and post them to each individual user's page for frontend access.
The existing process had significant performance, reliability, and scalability issues. Users also received a lot of spam.
Solution
Atidan proposed a MongoDB-Drupal based solution with the following approach:
• A scheduler was created to pull only the headers from the common all-user webmail account
• The headers were stored in an intermediate catalog in MongoDB
• The data was transformed based on the recipient address and user preferences, and spam was removed. The email body was fetched for the filtered records and saved into the final catalog in MongoDB
• Emails from the final catalog were pushed to the frontend platform (Drupal)
Key Takeaways
• Leverage the power of MongoDB in processing the "Big Data" of millions of daily emails. It is much faster, easy to scale, and very flexible
• The task was split into multiple sub-tasks, and a better algorithm was used for performance and efficiency
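The staged approach described above (headers first, filter, then fetch bodies only for what survives) can be sketched as follows. This is not the delivered system, which used Node.js and MongoDB: here plain Python lists and dicts stand in for the webmail account and the two MongoDB catalogs, and every name, address, and the sender-blocklist spam rule is a hypothetical placeholder.

```python
# Hypothetical stand-ins for the webmail account and user preferences.
HEADERS = [
    {"id": 1, "to": "alice@example.com", "from": "news@example.com",
     "subject": "Weekly digest"},
    {"id": 2, "to": "alice@example.com", "from": "promo@spam.example",
     "subject": "WIN A PRIZE"},
    {"id": 3, "to": "bob@example.com", "from": "billing@example.com",
     "subject": "Invoice 42"},
]
BODIES = {1: "Digest body", 2: "Spam body", 3: "Invoice body"}
PREFERENCES = {
    "alice@example.com": {"blocked_senders": {"promo@spam.example"}},
    "bob@example.com": {"blocked_senders": set()},
}

fetched_ids = []  # records which bodies were actually downloaded


def pull_headers():
    """Stage 1: pull headers only into the intermediate catalog."""
    return [dict(header) for header in HEADERS]


def fetch_body(message_id):
    """Fetch a full body; called only for messages that pass filtering."""
    fetched_ids.append(message_id)
    return BODIES[message_id]


def build_final_catalog(intermediate, preferences):
    """Stage 2: filter by recipient and preferences, then attach bodies."""
    final = {}
    for header in intermediate:
        prefs = preferences.get(header["to"])
        if prefs is None:
            continue  # unknown recipient: nothing to post
        if header["from"] in prefs["blocked_senders"]:
            continue  # treated as spam under this simplified rule
        document = dict(header, body=fetch_body(header["id"]))
        final.setdefault(header["to"], []).append(document)
    return final


intermediate_catalog = pull_headers()
final_catalog = build_final_catalog(intermediate_catalog, PREFERENCES)
print(fetched_ids)
print({user: len(docs) for user, docs in final_catalog.items()})
```

The point of the split is visible in `fetched_ids`: the body of the spam message is never downloaded, so the expensive step runs only on the records that will reach the frontend.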
Atidan Case Study
Email scanning and categorization using MongoDB
Technologies used
• Node.js (data transformation)
• MongoDB (database)
  • Schema-less
  • RESTful service to access data from the browser
• Drupal (frontend)
• Basic unit of data storage and transfer was the JSON object
• Storage and querying
  • NoSQL/simple/schema-less database
• Advantages
  • Highly scalable, very flexible, simple
• Connectivity
  • Node.js: server-side JavaScript
Thank you!
www.atidan.com
social@atidan.com

Three Big Data Case Studies

  • 2. Great use cases of Big Data Big Data Exploration Find, visualize, understand all big data to improve decision making Enhanced 3600 View of the Customer Extend existing customer views (CRM, etc) by incorporating additional internal and external information sources Security/Intelligence Extension Lower risk, detect fraud and monitor cyber security in real-time Data Warehouse Augmentation Integrate big data and data warehouse capabilities to increase operational efficiency Operations Analysis Analyze a variety of machine data for improved business results
  • 3. • Greater efficiencies in business processes • New insights from combining and analyzing data types in new ways • Develop new business models with resulting increased market presence and revenue Why Big Data File Systems Relational Data Content Mgmt Email CRM Supply Chain ERP RSS Feeds Cloud Custom SourcesDataViews Applications/ Users
  • 4. Atidan Approach Implement a Hadoop- centric reference architecture Move enterprise batch processing to Hadoop Make Hadoop the single point of truth Massively reduce ETL by transforming within Hadoop Move results and aggregates back to legacy systems for consumption Retain, within Hadoop, source files at the finest granularity for re-use Top Criteria • Allow users to use familiar consumption interfaces (web, mobile) • Enable businesses to unlock previously unusable data Unlock Big Data Simplify Your Warehouse Preprocess Raw Data Ingest BigData ArchitectureHighlevel
  • 5.
  • 6. Atidan Case Study Usage Analysis using Hadoop • Business Need • A large conglomerate had to analyze the last 10 years usage of its web applications by using the IIS logs • The logs received from IIS were stored in multiple files e.g. Daily logs • The data had free text, it was unstructured and it also contained irrelevant data • The exact analysis criteria/parameters/desired outcome were not pre-known • Solution • Traditional RDBMS could not handle the problem due to the type and volume of the data and the uncertainty around ultimate analysis criteria • Atidan delivered a Hadoop based solution that performed transformation of raw data into reports easily • The solution was fault tolerant to data inconsistencies • Hadoop provided elasticity to incremental data addition • Scalability in the range of Peta Bytes • Based on data size and complexity, the processing can be scaled from one node to 100 nodes • Schema-less architecture helped in dynamically changing the data model and analytics even at a late stage in the project • The organization got completely new and unexpected insights on employee, customer and vendor/partner behavior • Correlations between employee’s usage pattern and attrition as well as productivity were established
  • 7. Atidan Case Study: Usage Analysis using Hadoop
      [Chart: Request Types, request counts by HTTP response (Accepted…, BadRequest…, Created (201), Forbidden…, Not…, NotFound…, OK (200), Unauthorise…)]
      [Chart: Monthly Requests, request counts by month for 2001 and 2002]
      [Chart: Users, request counts per user (Amare, Amit, Bhagat, Mukesh, Praneel, Sanjog, Vimal)]
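The request-type, monthly, and per-user counts charted on this slide are the kind of aggregate that falls out of a map step (parse each log line) followed by a reduce step (count by key). The following is a minimal single-machine sketch of that idea, not Atidan's implementation: the sample log, the field subset, and the function names `map_records` and `reduce_counts` are all assumptions made for illustration. It relies only on the `#Fields:` directive that IIS W3C extended logs use to name their columns, and it silently skips rows that do not match, mirroring the tolerance of inconsistent data described in the case study.

```python
from collections import Counter

# Hypothetical sample in IIS W3C extended log style (field subset chosen for brevity).
SAMPLE_LOG = """\
#Software: Microsoft Internet Information Services
#Fields: date time cs-method cs-uri-stem cs-username sc-status
2001-01-15 10:02:11 GET /app/home.aspx amit 200
2001-01-15 10:02:15 GET /app/missing.aspx amit 404
this line is free text and should be skipped
2001-02-03 09:41:00 POST /app/orders.aspx vimal 201
2001-02-03 09:45:27 GET /app/home.aspx - 200
"""


def map_records(lines):
    """Map step: yield one dict per well-formed log line, skipping the rest."""
    fields = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The directive names the columns of the rows that follow.
            fields = line.split()[1:]
            continue
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if not fields or len(parts) != len(fields):
            continue  # tolerate free text and inconsistent rows
        yield dict(zip(fields, parts))


def reduce_counts(records, key_fn):
    """Reduce step: count records per key."""
    return Counter(key_fn(r) for r in records)


records = list(map_records(SAMPLE_LOG.splitlines()))
by_status = reduce_counts(records, lambda r: r["sc-status"])
by_month = reduce_counts(records, lambda r: r["date"][:7])
by_user = reduce_counts(
    (r for r in records if r["cs-username"] != "-"),  # "-" marks an anonymous request
    lambda r: r["cs-username"],
)
```

Because the analysis key is just a function passed to `reduce_counts`, a new question (for example, counts per URL) needs no change to the parsing step, which is the practical benefit of deferring the schema.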
  • 8. Big Query - Hive
      • The volume of data collected and analyzed in industry for business intelligence (BI) is growing rapidly, making traditional warehousing solutions prohibitively expensive
      • MapReduce is low-level and complex to write
      • Hive provides a high-level, SQL-like query language
      • This allows for ad-hoc analysis
      • The business need not know in advance which patterns to look for
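To make the contrast with hand-written MapReduce concrete, here is a small sketch of an ad-hoc, SQL-style aggregation. It is an illustration under loud assumptions: Python's built-in `sqlite3` module stands in for Hive purely so the example runs on one machine, and the `requests` table and its rows are hypothetical. A simple `GROUP BY` like this one reads much the same in HiveQL, where the engine, rather than the analyst, turns it into distributed jobs.

```python
import sqlite3

# In-memory SQLite database as a runnable stand-in for a Hive table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (day TEXT, username TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?, ?)",
    [
        ("2001-01-15", "amit", 200),
        ("2001-01-15", "amit", 404),
        ("2001-02-03", "vimal", 201),
        ("2001-02-03", "vimal", 200),
    ],
)

# Ad-hoc question: how many requests ended in each status code?
rows = conn.execute(
    """
    SELECT status, COUNT(*) AS hits
    FROM requests
    GROUP BY status
    ORDER BY hits DESC, status
    """
).fetchall()
conn.close()
```

A different question only requires a different query, not new job code, which is what makes this style suited to analysis whose criteria are not known in advance.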
  • 9.
  • 10. Atidan Case Study: Customer data collection (KYC) using Hadoop
      Business Need
      • A financial institution had to collect customer data periodically
      • Customers are very reluctant to provide updated data
      • The customer data has to be cross-checked against the billions of transactions the institution receives per day
      • The institution wanted to collate data available in the public domain from known social media sites
      • The data was unstructured, contained free text, and included irrelevant records
      Solution
      • A graph database is constructed over the extracted social data to analyze transactions
      • Atidan delivered a Hadoop-based solution that transformed the raw data into a graph database
      • Aggregate customer information from existing sources, social media, and government sources
      • Analyze transactions to find hidden patterns
      • Enable link analysis and risk monitoring
      • Facilitate decision making (new products) and customer discovery
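Link analysis over a transaction graph can be sketched in a few lines. The example below is an assumption-laden illustration, not the delivered solution: the account names, the `TRANSACTIONS` list, and the helpers `build_graph` and `linked_within` are hypothetical, and a plain in-memory adjacency map stands in for the graph database. It treats each transaction as an undirected edge and uses breadth-first search to find every account within a given number of hops of an account of interest.

```python
from collections import defaultdict, deque

# Hypothetical (sender, receiver, amount) records.
TRANSACTIONS = [
    ("cust_a", "cust_b", 1200),
    ("cust_b", "cust_c", 300),
    ("cust_c", "shell_co", 9500),
    ("cust_d", "cust_e", 50),
]


def build_graph(transactions):
    """Build an undirected adjacency map: account -> set of counterparties."""
    graph = defaultdict(set)
    for sender, receiver, _amount in transactions:
        graph[sender].add(receiver)
        graph[receiver].add(sender)
    return graph


def linked_within(graph, start, max_hops):
    """Return {account: hops} for accounts reachable from start in <= max_hops."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    del hops[start]
    return hops


graph = build_graph(TRANSACTIONS)
near_shell_co = linked_within(graph, "shell_co", max_hops=2)
```

Here `near_shell_co` contains the accounts one and two hops from `shell_co`; accounts further away or in a disconnected component are left out, which is the basic shape of a peer-group or risk-proximity query.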
  • 11. Atidan Case Study: Customer data collection (KYC) using Hadoop
      [Diagram labels: Big Data Processing; Graph Database; Customer Clustering; Income/Expense changes; Corporate structure changes; AML; Peer group analysis; Pattern Analysis; Customer Information; Web; Social; Channel Partners; Utility Providers; Aadhar UIDAI]
  • 12. Advantages of Hadoop (KYC) Solution to Banks
      • Lowers the cost of follow-up with users
      • Reduces losses by highlighting risky users early
      • Graph-database-based AML
      • Insights into
          • New products
          • New customers
          • New loans to existing customers
          • New investment opportunities for customers
      • Reduces operational errors
      • Traceability of data source
      [Diagram labels: AML, Graph Queries, Due Diligence, Risk, Credit Scoring, Mitigation, Analysis, Peer groups, New Prospects, Insights, New Products, New Customers]
  • 13.
  • 14. Atidan Case Study: Email scanning and categorization using MongoDB
      Business Need
      • Retrieve potentially millions of daily emails from a common webmail account, categorize them, and post them to each individual user's page for frontend access
      • The existing process had significant performance, reliability, and scalability issues. Users also received a lot of spam
      Solution
      Atidan proposed a MongoDB-Drupal based solution with the following approach:
      • A scheduler was created to pull only the headers from the common webmail account shared by all users
      • The headers were stored in an intermediate catalog in MongoDB
      • The data was transformed based on the recipient address and user preferences, and spam was removed. The email body was fetched only for the filtered records and saved in the final catalog in MongoDB
      • Emails from the final catalog were pushed to the frontend platform (Drupal)
      Key Takeaways
      • Leverage the power of MongoDB to process the 'Big Data' of millions of daily emails. It is much faster, easy to scale, and very flexible
      • The task was split into multiple sub-tasks, and a better algorithm was used for performance and efficiency
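The headers-first approach described on this slide can be sketched as a two-stage pipeline. This is a minimal illustration under stated assumptions, not the delivered system: plain Python dicts and lists stand in for the webmail account and the two MongoDB catalogs, the spam rule is a toy per-user blocked-word check, and every name (`MAILBOX`, `USER_PREFS`, `pull_headers`, `keep`, `fetch_body`) is hypothetical. The point it shows is that the expensive step, fetching the body, runs only for messages that survive filtering.

```python
# Hypothetical stand-in for the shared webmail account: message id -> message.
MAILBOX = {
    1: {"to": "amit@example.com", "subject": "Quarterly report", "body": "Report attached."},
    2: {"to": "amit@example.com", "subject": "WIN A PRIZE now", "body": "Click here."},
    3: {"to": "vimal@example.com", "subject": "Build failed", "body": "See CI log."},
    4: {"to": "unknown@example.com", "subject": "Hello", "body": "Hi."},
}

# Hypothetical per-user preferences used for filtering.
USER_PREFS = {
    "amit@example.com": {"blocked_words": ["prize"]},
    "vimal@example.com": {"blocked_words": []},
}

body_fetches = []  # records which bodies were actually downloaded


def pull_headers(mailbox):
    """Stage 1: collect headers only (no bodies) for the intermediate catalog."""
    return [{"id": mid, "to": m["to"], "subject": m["subject"]} for mid, m in mailbox.items()]


def keep(header, prefs):
    """Filter: drop mail for unknown recipients or with a blocked word in the subject."""
    user = prefs.get(header["to"])
    if user is None:
        return False
    subject = header["subject"].lower()
    return not any(word in subject for word in user["blocked_words"])


def fetch_body(mailbox, message_id):
    """Stage 2: fetch the body, called only for headers that passed the filter."""
    body_fetches.append(message_id)
    return mailbox[message_id]["body"]


intermediate_catalog = pull_headers(MAILBOX)
final_catalog = {}  # recipient -> list of complete email documents
for header in intermediate_catalog:
    if keep(header, USER_PREFS):
        document = dict(header, body=fetch_body(MAILBOX, header["id"]))
        final_catalog.setdefault(header["to"], []).append(document)
```

With this sample data only two of the four bodies are fetched, which is where the performance gain of the split into sub-tasks would come from at scale.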
  • 15. Atidan Case Study Email scanning and categorization using MongoDB
  • 16. Technologies used
      • Node.js (data transformation)
      • MongoDB (database)
      • Schema-less
      • RESTful service to access data from the browser
      • Drupal (frontend)
      • The basic unit of data storage and transfer was the JSON object
      • Storage and querying: NoSQL, simple, schema-less database
      • Advantages: highly scalable, very flexible, simple
      • Connectivity: Node.js (server-side JavaScript)