Migrating Master Data to a Data Lake
DAMA Chicago – December 2017 Chapter Meeting
My Background
▪ Employed by Protective Insurance (just started in October of this year):
– Senior Enterprise Data Architect
– Previous employer was CNO Financial Group (Director – Data Strategy & Architecture)
▪ Experience (IT, over 25 yrs; Data focus, nearly 20 yrs):
– Disciplines:
• Enterprise Data Strategy, Data Architecture, Data Design, Data Integration, Reference & Master Data, Data Warehousing, Business Intelligence, Metadata, Data Quality, Data Governance
– Industries:
• Insurance & Financial Services
• Pharmaceutical
• State Government
• Manufacturing
▪ Other Items:
– Founding member (since 2009) and current President (2016) of DAMA Indiana chapter
– Hold CDMP certification (Master level since 2010)
– Contributing author to DMBOK2 (Reference & Master Data), released in June of this year
Discussion Topics
– Current State Review
• Data Model
• Data Architecture
– Future State Proposed
• Overall Architecture
• Data Lake Specific
– Big Data POC (Proof of Concept)
• Environment Setup
• Use Case Review
• POC Results
– Items on Deck
• Data Access / Presentation Layer
• Information Governance Implications
– Wrap up and Questions
Current State – Enterprise Data Model (High-level Conceptual)
▪ Main Business Entities: (9 in Total)
– Product (Coverage Master)
– Client (Consolidated Level View)
– Party (Source Level View)
– Point of Contact (Communication Method)
– Agent (Producer Contracts & Licenses)
– Application (for Policy Coverage)
– Policy (Pending, Active, or Terminated)
– Claim (Submitted against Policy)
– Event (Type and Timestamp)
▪ Subject Area Relationships:
– Identify Relationship Type / Role
▪ Enterprise Data Glossary:
– Business Terms & Attributes
– Vetted by Data Governance Council
Current State – Data Sharing Model (High-level Logical)
▪ Current Data Design:
– Relational Model
– Abstract Design
– Source Linkage and Lineage
– Lends Itself to Columnar
▪ Reference Entities:
– Static Reference Data
– Environment Metadata
▪ Subject Area Entities:
– Domain Specific (by Business Entity)
– Key-value Pairs (Simulate Columnar)
▪ Model instantiated for each Subject Area identified (9 in total)
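The "Key-value Pairs (Simulate Columnar)" idea can be illustrated with a minimal Python sketch. The slide gives no schema, so the row layout `(entity_id, attribute, value)`, the sample Policy values, and both helper names are hypothetical and are not the presenter's actual design.

```python
from collections import defaultdict

# Hypothetical key-value rows for a "Policy" subject area:
# (entity_id, attribute_name, attribute_value). All names are illustrative.
KEY_VALUE_ROWS = [
    ("POL-001", "status", "Active"),
    ("POL-001", "product", "Term Life"),
    ("POL-002", "status", "Pending"),
]


def pivot_to_records(rows):
    """Group key-value rows by entity so each entity becomes one wide record."""
    records = defaultdict(dict)
    for entity_id, attribute, value in rows:
        records[entity_id][attribute] = value
    return dict(records)


def column_values(records, attribute):
    """Read one attribute across all entities (a column-style access).

    Entities that never recorded the attribute map to None.
    """
    return {entity_id: rec.get(attribute) for entity_id, rec in records.items()}


records = pivot_to_records(KEY_VALUE_ROWS)
print(records["POL-001"])
print(column_values(records, "status"))
```

Because every attribute is stored as its own row, a new attribute needs no schema change, which is one plausible reading of why the slide says this design lends itself to a columnar store.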
Current State – Data Sharing Architecture
▪ Current Data Stores (all Oracle):
– Landing Zone
– Master Data Hub
– Enterprise Data Warehouse
▪ Current Data Flows:
– Traditional ETL (Informatica)
– Custom Extracts (COBOL, PL/SQL)
▪ Current Reporting & Analytics:
– Static (Business Objects)
– Visualization (Tableau)
– Predictive / Statistical (SAS)
▪ Current Data Profiling:
– Informatica IDQ and Traditional SQL
Future State – Proposed Architecture
▪ Data Layer Components:
– Operational Zone
– Presentation Zone + DV
– Data Lake (BDE)
– Ad-Hoc Zone
▪ Data Flows:
– Batch (solid black lines)
– Service (solid red lines)
• proxied via ESB
– RT Query (dashed black lines)
▪ All Data Layer components expected to be on-prem with exception of Ad-Hoc Zone (to enable variable use and cost models)
Future State – Proposed Architecture
▪ Architecture Approach:
– Assure Data Centric
– Design as Hub-n-Spoke
– Reduce Point-to-Point
– Enable Data Accessibility
– Implement Data Services
▪ Data Layer as Hub:
– Manage Client Identities
– Proxy Transactions
– Implement EDW
– Provide Data Domain Perspective Views
– Curate Master Data
– Link Transactional Data
– Enable Data Archiving
– Establish Enterprise LZ
Future State – Proposed Data Lake
▪ Data Lake Environment:
– Cloudera distribution of Hadoop
– 14 Node cluster (10 data, 4 name/edge)
▪ Technical Considerations:
– Enterprise Landing Zone (HDFS + Hive)
– Archive Zone (HDFS)
– Curation Zone (Hive + Impala + Kudu)
– Insights Zone (Hive + Impala + HBase)
– Sandbox Zone (Hive + HBase + SAS)
– Ingestion (Sqoop + Syncsort)
– Transformation (M/R + Hive + Python + SAS)
▪ Existing MDS Hub to be migrated from relational Oracle data store to columnar Kudu data store
▪ Existing ETL to be migrated from Informatica to Hive + Impala
▪ Utilize Security Toolset from Cloudera to ensure Data encrypted at rest
▪ Note that Informatica BDM (Big Data Management) suite was reviewed / considered
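As a rough aid for reading the zone list above, the zone-to-technology pairings can be written down as a small lookup. The pairings are transcribed from the slide; the dictionary layout and the `zones_using` helper are illustrative assumptions, not part of any Cloudera tooling.

```python
# Zone-to-technology pairings transcribed from the slide. The structure and
# the helper below are hypothetical, for illustration only.
DATA_LAKE_ZONES = {
    "Enterprise Landing Zone": ["HDFS", "Hive"],
    "Archive Zone": ["HDFS"],
    "Curation Zone": ["Hive", "Impala", "Kudu"],
    "Insights Zone": ["Hive", "Impala", "HBase"],
    "Sandbox Zone": ["Hive", "HBase", "SAS"],
}


def zones_using(component):
    """Return the zones (sorted by name) whose stack includes the component."""
    return sorted(
        zone for zone, stack in DATA_LAKE_ZONES.items() if component in stack
    )


print(zones_using("Impala"))
print(zones_using("Kudu"))
```

In this reading, Kudu appears only in the Curation Zone, which matches the stated plan to move the master data hub from Oracle to Kudu.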
Data Lake POC (Proof of Concept)
▪ POC Environment:
– MS Azure (IaaS set up)
– Cloudera distribution of Hadoop
– 4 Node cluster (3 data, 1 name/edge)
▪ Focused on Three (3) Use Cases:
– Actuarial Valuation Analysis (Single Product Type)
– Ingestion of Relational and Mainframe Data
– Data Service Query (Performance Goal <= 300ms)
▪ Results:
– Condensed Valuation Process (From Two Weeks to Twenty Hours)
– Ingestion of Relational Data (via Sqoop) and Mainframe Data (via Syncsort) Successful
– Mirrored 1000 simultaneous executions (Average Response Time Obtained of 150ms)
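The reported results can be put in relative terms with a short calculation. This is a sketch under one stated assumption: "two weeks" is read as 14 calendar days of elapsed time (the slide does not say whether it means calendar or working time), so the speedup figure is only indicative.

```python
HOURS_PER_DAY = 24

# Assumption: "two weeks" means 14 calendar days of elapsed time.
baseline_hours = 14 * HOURS_PER_DAY
poc_hours = 20  # "Twenty Hours" from the slide
speedup = baseline_hours / poc_hours

goal_ms = 300     # performance goal from the slide
average_ms = 150  # average response time reported on the slide
headroom = (goal_ms - average_ms) / goal_ms

print(f"Valuation speedup: {speedup:.1f}x")
print(f"Latency headroom against goal: {headroom:.0%}")
```

Note that an average of 150ms shows headroom against the 300ms goal on average only; the slide does not report tail latencies, so it does not establish that every one of the 1000 executions met the goal.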
Next Steps – Items on Deck
▪ Data Access / Presentation Layer:
– Perform POC on Data Virtualization Product (Denodo)
– Determine How to Package Conformed Dimensions from EDW to Present ‘Perspective Views’
– Establish Integration Patterns within ESB Environment (Semantic / Taxonomic Messaging Approach)
– Execute Performance Testing of Data Service Queries from Presentation Zone
▪ Information Governance Implications:
– Establish Governance Policies
– Determine Data Classification Approach
– Define Security Architecture for Data Lake
– Identify Access Roles and Security Controls
– Certify Security of Data Lake Environment
Next Steps – Plans for 2018
▪ Funding Secured for POC Environment until June:
– But Establish a Larger Cluster (10 data, 4 name/edge)
– Along with Security Set-up and Data Encryption
▪ Collaborate with Business Areas on new / expanded prospective Use Cases:
– Expand Actuarial Valuation to Other Product Types
– Additional Actuarial Items outside of Valuation
– Agent Recruiting and Retention
– Claims Fraud (although this one has a long tail…)
– Customer Experience (Journey Map and/or Retention)
▪ Go on the Road…
– Presentations to Business Partners and IT folks
– Extol the Value of BD and Future State Architecture
– Troll for Funding…$$$ (Sad but true…)
Recap
– Current State Review
• Data Model (Conceptual and Logical)
• Data Architecture
– Future State Proposed
• Overall Architecture (Layout and Approach)
• Data Layer Components
• Data Lake Environment
– Big Data POC (Proof of Concept)
• Environment Setup
• Use Case Review
• POC Results
– Items on Deck
• Data Access / Presentation Layer
• Information Governance Implications
• Next Steps
In the end it is all about…
Happy Holidays
Thank You For Your Time and Interest…!!!
Contact Information:
Gene Boomer
Protective Insurance
gboomer@protectiveinsurance.com
