SlideShare une entreprise Scribd logo
1  sur  34
Télécharger pour lire hors ligne
! 
Mohammad Quraishi (IT Senior Principal - Cigna) 
atif71@gmail.com 
! 
Leading a Healthcare Company 
to the 
Big Data Promised Land: 
!!! 
A Case Study of Hadoop in Healthcare
About me 
•BS in Computer Science and Engineering from University of 
Connecticut 
•In the Healthcare Industry for over 19 years 
•Programmer most of my career - Architect, Designer 
•Worked in the SOA space for a number of years 
•Lead engineer in the mobile application space 
•Now Lead engineer in the Big Data Analytics Space - Hadoop 
! 
In my spare time 
• Love to travel with the family 
• Video games, music, movies 
• Community relations work 
• Fan of College basketball 
2
Breakdown of the Hadoop Journey 
1 2 3 
The blowback 
What we 
accomplished 
Roadmap to the 
future 
Lessons Learned 
Questions? 
3 
Making the case 
Vision 
Architecture
The Elephant in the room 
Image Credit: Guian Bolisay/Flickr 4
What’s the problem? 
5 
We already have a mature data analysis infrastructure
And it looks something like this… 
What we already do 
•We have independent data marts 
•We have the Hub-and-spoke architecture, the centralized warehouse 
6
What is the vision? 
7 
The ability to perform 
•Descriptive, Predictive and Prescriptive Analytics 
! 
Remove the traditional IT barriers separating the 
business users from insights
Benefits of Big Data 
8 
•Hadoop has the lowest cost per TB ratio of any 
data technology available 
! 
•Getting started with Hadoop is fairly inexpensive 
•“Entry-level” clusters relatively inexpensive 
•Grow in small steps
9 
Benefits of Big Data 
! 
You don’t have to throw away data anymore!
Vision - Reference Architecture 
Analysis 
Microstrategy 
10 
Logs 
Web 
IVR 
Portal 
Mobile 
NoSQL 
Storing 
weblogs 
Real%me'Data'Store'or'event'processing 
SAS 
Pentaho 
R 
? 
Analysis/Modeling'Tools 
Tableau 
Spotfire 
Platfora 
Cognos 
? 
Data'Science'Tools' 
Teradata 
filestack 
RDBMS 
External'Hadoop'Output 
Or'in'HDFS 
*Use Spark 
Streaming and 
append to Hadoop 
output - Realtime 
events 
Live Data 
Streams 
Web Analytics 
Event detection(Storm) 
Scalding 
(Scala) 
MapReduce'Distributed'Programming'Framework 
HDFS 
2 
Hadoop'Cluster'running'HDFS'and'MapReduce 
Includes'Management,'Monitoring'and'Security 
Visualization 
Map Reduce jobs 
Batch 
Log files to 
HDFS 
Data from tables 
to HDFS 
CED/ 
Claims 
Clinical 
Data 
RDBMS 
SQOOP 
Back'up'or'copy'data'from'HDFS'to'a' 
redundant'cluster'for'quick'recovery 
*'For'future'implementa%on'TBD 
Realtime Feed 
Flume 
Teradata 
Hive/Impala 
(SQL) 
Cascading 
(Java) 
Python 
1 
5 
3 
7 
6 
4 
Flume 
SQOOP 
Edge'Node'For'Hadoop'Client 
Jobs 
SQOOP/ 
Flume 
Chronos 
Hadoop' 
Cluster'#2 
Hadoop' 
Cluster'#3 
8
The Initial Evaluation 
11 
•Vendor Evaluation: Which relationship best fits 
our needs without lock-in? 
! 
•Selection of use cases for demonstration 
! 
•Visualization of those use cases
Use Case 1 
12
Use Case 2 
13
Success! 
14 
•Ready to tackle tougher more 
complicated problems 
! 
•Went out looking for more use cases
Ran into misconceptions 
15 
“Let’s use Hadoop as ETL!” 
! 
“Help us move data.” 
! 
“Can we back up data for archiving?” 
! 
! 
!
… & Challenges 
16
But Why? 
17 
•Overuse of the words “Big” & “Data” 
! 
•There was an overlap with other tools and 
platforms 
! 
•Hadoop looked like a swiss army knife 
! 
•Will it take over the world and replace other 
platforms?
Broader impact - Business Benefits 
18 
!! 
•Building a Customer Persona 
!! 
•Service Ops efficiency 
!! 
•Being Customer Centric 
!! 
•Product Efficiency 
!! •Brand Impact
Broader impact - IT Benefits 
19 
!!! 
•Predictive threat modeling 
!! 
•Data Archival 
!! 
•Network Efficiency
Hadoop and Big Data 
20 
! 
•Big Data = Hadoop + Relational + other 
suitable task related technologies 
! 
•Hadoop is complementary
Hadoop is Complementary 
21 
•Hadoop excels at processing and analyzing large volumes of 
distributed, unstructured, structured and semi-structured 
data in batch or near real-time fashion for analysis 
! 
•NoSQL databases are adept at storing and serving up multi-structured 
data in near-real time for web-based applications 
! 
•Massively parallel OLAP databases are best at providing 
analysis of large volumes of mainly structured data - 
Teradata 
! 
•SAS/R - Modeling and Business Intelligence 
! 
•Tableau - Visualization
Embrace the Most Important Change: 
Culture 
22 
Democratize your data and 
reap the benefits!
Why is Hadoop Complementary? 
23
What we accomplished? 
24 
•Evangelized Hadoop 
! 
•Linked Hadoop to BI Tools 
! 
•R on Hadoop 
! 
•A fail fast iterative analytics approach
Lambda Architecture as the foundation 
Credit Nathan Marz - Big Data 
25 
The master dataset is the only part of the Lambda Architecture that absolutely must be safeguarded from corruption. So for this λ
What we accomplished? 
26 
•ETL - Ingest, Transform and Move patterns 
! 
•Logs generated from consumer channels were 
ingested with Flume 
! 
•Standardized on Parquet (Storage) and Snappy 
(Compression) 
! 
•Lifecycle and organization of Data on HDFS 
! 
•LUKS - dm-crypt — for data at rest encryption 
! 
•Sentry and LDAP for Role Based Access Control
A Custom NLP Framework 
27
A Roadmap to the Future 
28
A Roadmap to the Future 
29 
! 
Data Driven Solutions + FP 
! 
! 
“Functional Programming: I came for the 
concurrency, but I stayed for the Data Science” 
Dean Wampler
The Hadoop Stack – Advanced View 
There’s also Workflow Management with Oozie. 
30
Lessons Learned 
31 
•Overuse of the words “Big” & “Data” 
! 
•The overlap 
! 
•Everyone found a use for Hadoop 
! 
•Big Change/Baby Steps 
! 
•Agility + Process = Cognitive Dissonance
Healthcare company needs 
32 
•Security 
! 
•Vendors 
! 
•Vendor Partnerships
WWYS 
33 
“Difficult to see. Always in motion 
is the future…” 
Yoda 
! 
“Many of the truths that we cling to depend on 
our point of view.” 
Yoda 
! 
The Journey of a thousand miles begins with one 
cluster…
34 
Questions? 
! 
Mohammad Quraishi (IT Senior Principal - Cigna) 
atif71@gmail.com

Contenu connexe

Tendances

Big Data - An Overview
Big Data -  An OverviewBig Data -  An Overview
Big Data - An OverviewArvind Kalyan
 
ROI of Big Data Analytics Native on Hadoop
ROI of Big Data Analytics Native on HadoopROI of Big Data Analytics Native on Hadoop
ROI of Big Data Analytics Native on HadoopDataWorks Summit
 
Introduction To Big Data Analytics On Hadoop - SpringPeople
Introduction To Big Data Analytics On Hadoop - SpringPeopleIntroduction To Big Data Analytics On Hadoop - SpringPeople
Introduction To Big Data Analytics On Hadoop - SpringPeopleSpringPeople
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerMark Kromer
 
The DBA Is Dead (Again). Long Live the DBA !
The DBA Is Dead (Again). Long Live the DBA !The DBA Is Dead (Again). Long Live the DBA !
The DBA Is Dead (Again). Long Live the DBA !Christian Bilien
 
Data Science with Hadoop: A Primer
Data Science with Hadoop: A PrimerData Science with Hadoop: A Primer
Data Science with Hadoop: A PrimerDataWorks Summit
 
Introduction to big data and apache spark
Introduction to big data and apache sparkIntroduction to big data and apache spark
Introduction to big data and apache sparkMohammed Guller
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use CasesInSemble
 
Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and HadoopFebiyan Rachman
 
Paytm labs soyouwanttodatascience
Paytm labs soyouwanttodatasciencePaytm labs soyouwanttodatascience
Paytm labs soyouwanttodatascienceAdam Muise
 
Introduction to Big data with Hadoop & Spark | Big Data Hadoop Spark Tutorial...
Introduction to Big data with Hadoop & Spark | Big Data Hadoop Spark Tutorial...Introduction to Big data with Hadoop & Spark | Big Data Hadoop Spark Tutorial...
Introduction to Big data with Hadoop & Spark | Big Data Hadoop Spark Tutorial...CloudxLab
 
Big data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructureBig data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructureRoman Nikitchenko
 
Splunk hunkbeta
Splunk hunkbetaSplunk hunkbeta
Splunk hunkbetaAhnku Toh
 
Big Data - A brief introduction
Big Data - A brief introductionBig Data - A brief introduction
Big Data - A brief introductionFrans van Noort
 
The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016 The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016 Dataiku
 
An introduction to Big Data
An introduction to Big DataAn introduction to Big Data
An introduction to Big DataForwardSprint
 

Tendances (20)

Big Data - An Overview
Big Data -  An OverviewBig Data -  An Overview
Big Data - An Overview
 
ROI of Big Data Analytics Native on Hadoop
ROI of Big Data Analytics Native on HadoopROI of Big Data Analytics Native on Hadoop
ROI of Big Data Analytics Native on Hadoop
 
Introduction To Big Data Analytics On Hadoop - SpringPeople
Introduction To Big Data Analytics On Hadoop - SpringPeopleIntroduction To Big Data Analytics On Hadoop - SpringPeople
Introduction To Big Data Analytics On Hadoop - SpringPeople
 
Smart data for a predictive bank
Smart data for a predictive bankSmart data for a predictive bank
Smart data for a predictive bank
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL Server
 
The DBA Is Dead (Again). Long Live the DBA !
The DBA Is Dead (Again). Long Live the DBA !The DBA Is Dead (Again). Long Live the DBA !
The DBA Is Dead (Again). Long Live the DBA !
 
Data Science with Hadoop: A Primer
Data Science with Hadoop: A PrimerData Science with Hadoop: A Primer
Data Science with Hadoop: A Primer
 
Introduction to big data and apache spark
Introduction to big data and apache sparkIntroduction to big data and apache spark
Introduction to big data and apache spark
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Cases
 
Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and Hadoop
 
Analysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data AnalyticsAnalysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data Analytics
 
BIG DATA
BIG DATABIG DATA
BIG DATA
 
Paytm labs soyouwanttodatascience
Paytm labs soyouwanttodatasciencePaytm labs soyouwanttodatascience
Paytm labs soyouwanttodatascience
 
Introduction to Big data with Hadoop & Spark | Big Data Hadoop Spark Tutorial...
Introduction to Big data with Hadoop & Spark | Big Data Hadoop Spark Tutorial...Introduction to Big data with Hadoop & Spark | Big Data Hadoop Spark Tutorial...
Introduction to Big data with Hadoop & Spark | Big Data Hadoop Spark Tutorial...
 
Big data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructureBig data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructure
 
Splunk hunkbeta
Splunk hunkbetaSplunk hunkbeta
Splunk hunkbeta
 
Big Data - A brief introduction
Big Data - A brief introductionBig Data - A brief introduction
Big Data - A brief introduction
 
The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016 The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016
 
The Ecosystem is too damn big
The Ecosystem is too damn big The Ecosystem is too damn big
The Ecosystem is too damn big
 
An introduction to Big Data
An introduction to Big DataAn introduction to Big Data
An introduction to Big Data
 

Similaire à Leading Healthcare to Big Data with Hadoop

Open source stak of big data techs open suse asia
Open source stak of big data techs   open suse asiaOpen source stak of big data techs   open suse asia
Open source stak of big data techs open suse asiaMuhammad Rifqi
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft PlatformJesus Rodriguez
 
Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which DataWorks Summit
 
Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Andrew Brust
 
Hadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data ModelHadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data ModelUwe Printz
 
Hadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewHadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewAbhishek Roy
 
Real Time Interactive Queries IN HADOOP: Big Data Warehousing Meetup
Real Time Interactive Queries IN HADOOP: Big Data Warehousing MeetupReal Time Interactive Queries IN HADOOP: Big Data Warehousing Meetup
Real Time Interactive Queries IN HADOOP: Big Data Warehousing MeetupCaserta
 
Hitachi Data Systems Hadoop Solution
Hitachi Data Systems Hadoop SolutionHitachi Data Systems Hadoop Solution
Hitachi Data Systems Hadoop SolutionHitachi Vantara
 
Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointInside Analysis
 
Trafodion overview
Trafodion overviewTrafodion overview
Trafodion overviewRohit Jain
 
Simple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform ConceptSimple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform ConceptSatish Mohan
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the OrganizationSeeling Cheung
 
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)Eric Baldeschwieler
 
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...MapR Technologies
 
Big Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKBig Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKRajesh Jayarman
 
Has Traditional MDM Finally Met its Match?
Has Traditional MDM Finally Met its Match?Has Traditional MDM Finally Met its Match?
Has Traditional MDM Finally Met its Match?Inside Analysis
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouseStephen Alex
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouseStephen Alex
 

Similaire à Leading Healthcare to Big Data with Hadoop (20)

Open source stak of big data techs open suse asia
Open source stak of big data techs   open suse asiaOpen source stak of big data techs   open suse asia
Open source stak of big data techs open suse asia
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft Platform
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
 
Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which
 
Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Big Data Strategy for the Relational World
Big Data Strategy for the Relational World
 
Hadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data ModelHadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data Model
 
Hadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewHadoop Master Class : A concise overview
Hadoop Master Class : A concise overview
 
Real Time Interactive Queries IN HADOOP: Big Data Warehousing Meetup
Real Time Interactive Queries IN HADOOP: Big Data Warehousing MeetupReal Time Interactive Queries IN HADOOP: Big Data Warehousing Meetup
Real Time Interactive Queries IN HADOOP: Big Data Warehousing Meetup
 
Hitachi Data Systems Hadoop Solution
Hitachi Data Systems Hadoop SolutionHitachi Data Systems Hadoop Solution
Hitachi Data Systems Hadoop Solution
 
Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter Point
 
Trafodion overview
Trafodion overviewTrafodion overview
Trafodion overview
 
Simple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform ConceptSimple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform Concept
 
50 Shades of SQL
50 Shades of SQL50 Shades of SQL
50 Shades of SQL
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the Organization
 
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)
 
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
 
Big Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKBig Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RK
 
Has Traditional MDM Finally Met its Match?
Has Traditional MDM Finally Met its Match?Has Traditional MDM Finally Met its Match?
Has Traditional MDM Finally Met its Match?
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
 

Plus de BigDataEverywhere

Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...
Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...
Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...BigDataEverywhere
 
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)BigDataEverywhere
 
Big Data Everywhere Chicago: The Big Data Imperative -- Discovering & Protect...
Big Data Everywhere Chicago: The Big Data Imperative -- Discovering & Protect...Big Data Everywhere Chicago: The Big Data Imperative -- Discovering & Protect...
Big Data Everywhere Chicago: The Big Data Imperative -- Discovering & Protect...BigDataEverywhere
 
Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant)
Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant) Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant)
Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant) BigDataEverywhere
 
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...BigDataEverywhere
 
Big Data Everywhere Chicago: SQL on Hadoop
Big Data Everywhere Chicago: SQL on Hadoop Big Data Everywhere Chicago: SQL on Hadoop
Big Data Everywhere Chicago: SQL on Hadoop BigDataEverywhere
 
Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...
Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...
Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...BigDataEverywhere
 

Plus de BigDataEverywhere (7)

Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...
Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...
Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...
 
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)
 
Big Data Everywhere Chicago: The Big Data Imperative -- Discovering & Protect...
Big Data Everywhere Chicago: The Big Data Imperative -- Discovering & Protect...Big Data Everywhere Chicago: The Big Data Imperative -- Discovering & Protect...
Big Data Everywhere Chicago: The Big Data Imperative -- Discovering & Protect...
 
Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant)
Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant) Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant)
Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant)
 
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
 
Big Data Everywhere Chicago: SQL on Hadoop
Big Data Everywhere Chicago: SQL on Hadoop Big Data Everywhere Chicago: SQL on Hadoop
Big Data Everywhere Chicago: SQL on Hadoop
 
Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...
Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...
Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...
 

Dernier

DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in collegessuser7a7cd61
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 

Dernier (20)

DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in college
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 

Leading Healthcare to Big Data with Hadoop

  • 1. ! Mohammad Quraishi (IT Senior Principal - Cigna) atif71@gmail.com ! Leading a Healthcare Company to the Big Data Promised Land: !!! A Case Study of Hadoop in Healthcare
  • 2. About me •BS in Computer Science and Engineering from University of Connecticut •In the Healthcare Industry for over 19 years •Programmer most of my career - Architect, Designer •Worked in the SOA space for a number of years •Lead engineer in the mobile application space •Now Lead engineer in the Big Data Analytics Space - Hadoop ! In my spare time • Love to travel with the family • Video games, music, movies • Community relations work • Fan of College basketball 2
  • 3. Breakdown of the Hadoop Journey 1 2 3 The blowback What we accomplished Roadmap to the future Lessons Learned Questions? 3 Making the case Vision Architecture
  • 4. The Elephant in the room Image Credit: Guian Bolisay/Flickr 4
  • 5. What’s the problem? 5 We already have a mature data analysis infrastructure
  • 6. And it looks something like this… What we already do •We have independent data marts •We have the Hub-and-spoke architecture, the centralized warehouse 6
  • 7. What is the vision? 7 The ability to perform •Descriptive, Predictive and Prescriptive Analytics ! Remove the traditional IT barriers separating the business users from insights
  • 8. Benefits of Big Data 8 •Hadoop has the lowest cost per TB ratio of any data technology available ! •Getting started with Hadoop is fairly inexpensive •“Entry-level” clusters relatively inexpensive •Grow in small steps
  • 9. 9 Benefits of Big Data ! You don’t have to throw away data anymore!
  • 10. Vision - Reference Architecture Analysis Microstrategy 10 Logs Web IVR Portal Mobile NoSQL Storing weblogs Real%me'Data'Store'or'event'processing SAS Pentaho R ? Analysis/Modeling'Tools Tableau Spotfire Platfora Cognos ? Data'Science'Tools' Teradata filestack RDBMS External'Hadoop'Output Or'in'HDFS *Use Spark Streaming and append to Hadoop output - Realtime events Live Data Streams Web Analytics Event detection(Storm) Scalding (Scala) MapReduce'Distributed'Programming'Framework HDFS 2 Hadoop'Cluster'running'HDFS'and'MapReduce Includes'Management,'Monitoring'and'Security Visualization Map Reduce jobs Batch Log files to HDFS Data from tables to HDFS CED/ Claims Clinical Data RDBMS SQOOP Back'up'or'copy'data'from'HDFS'to'a' redundant'cluster'for'quick'recovery *'For'future'implementa%on'TBD Realtime Feed Flume Teradata Hive/Impala (SQL) Cascading (Java) Python 1 5 3 7 6 4 Flume SQOOP Edge'Node'For'Hadoop'Client Jobs SQOOP/ Flume Chronos Hadoop' Cluster'#2 Hadoop' Cluster'#3 8
  • 11. The Initial Evaluation 11 •Vendor Evaluation: Which relationship best fits our needs without lock-in? ! •Selection of use cases for demonstration ! •Visualization of those use cases
  • 14. Success! 14 •Ready to tackle tougher more complicated problems ! •Went out looking for more use cases
  • 15. Ran into misconceptions 15 “Let’s use Hadoop as ETL!” ! “Help us move data.” ! “Can we back up data for archiving?” ! ! !
  • 17. But Why? 17 •Overuse of the words “Big” & “Data” ! •There was an overlap with other tools and platforms ! •Hadoop looked like a swiss army knife ! •Will it take over the world and replace other platforms?
  • 18. Broader impact - Business Benefits 18 !! •Building a Customer Persona !! •Service Ops efficiency !! •Being Customer Centric !! •Product Efficiency !! •Brand Impact
  • 19. Broader impact - IT Benefits 19 !!! •Predictive threat modeling !! •Data Archival !! •Network Efficiency
  • 20. Hadoop and Big Data 20 ! •Big Data = Hadoop + Relational + other suitable task related technologies ! •Hadoop is complementary
  • 21. Hadoop is Complementary 21 •Hadoop excels at processing and analyzing large volumes of distributed, unstructured, structured and semi-structured data in batch or near real-time fashion for analysis ! •NoSQL databases are adept at storing and serving up multi-structured data in near-real time for web-based applications ! •Massively parallel OLAP databases are best at providing analysis of large volumes of mainly structured data - Teradata ! •SAS/R - Modeling and Business Intelligence ! •Tableau - Visualization
  • 22. Embrace the Most Important Change: Culture 22 Democratize your data and reap the benefits!
  • 23. Why is Hadoop Complementary? 23
  • 24. What we accomplished? 24 •Evangelized Hadoop ! •Linked Hadoop to BI Tools ! •R on Hadoop ! •A fail fast iterative analytics approach
  • 25. Lambda Architecture as the foundation Credit Nathan Marz - Big Data 25 The master dataset is the only part of the Lambda Architecture that absolutely must be safeguarded from corruption. So for this λ
  • 26. What we accomplished? 26 •ETL - Ingest, Transform and Move patterns ! •Logs generated from consumer channels were ingested with Flume ! •Standardized on Parquet (Storage) and Snappy (Compression) ! •Lifecycle and organization of Data on HDFS ! •LUKS - dm-crypt — for data at rest encryption ! •Sentry and LDAP for Role Based Access Control
  • 27. A Custom NLP Framework 27
  • 28. A Roadmap to the Future 28
  • 29. A Roadmap to the Future 29 ! Data Driven Solutions + FP ! ! “Functional Programming: I came for the concurrency, but I stayed for the Data Science” Dean Wampler
  • 30. The Hadoop Stack – Advanced View There’s also Workflow Management with Oozie. 30
  • 31. Lessons Learned 31 •Overuse of the words “Big” & “Data” ! •The overlap ! •Everyone found a use for Hadoop ! •Big Change/Baby Steps ! •Agility + Process = Cognitive Dissonance
  • 32. Healthcare company needs 32 •Security ! •Vendors ! •Vendor Partnerships
  • 33. WWYS 33 “Difficult to see. Always in motion is the future…” Yoda ! “Many of the truths that we cling to depend on our point of view.” Yoda ! The Journey of a thousand miles begins with one cluster…
  • 34. 34 Questions? ! Mohammad Quraishi (IT Senior Principal - Cigna) atif71@gmail.com