2. DAMA Chicago – December 2017 Chapter Meeting 2
My Background
Employed by Protective Insurance (just started in October of this year):
– Senior Enterprise Data Architect
– Previous employer was CNO Financial Group (Director – Data Strategy & Architecture)
Experience (IT, over 25 yrs; Data focus, nearly 20 yrs):
– Disciplines:
• Enterprise Data Strategy, Data Architecture, Data Design, Data Integration, Reference & Master Data, Data
Warehousing, Business Intelligence, Metadata, Data Quality, Data Governance
– Industries:
• Insurance & Financial Services
• Pharmaceutical
• State Government
• Manufacturing
Other Items:
– Founding member (since 2009) and current President (2016) of DAMA Indiana chapter
– Hold CDMP certification (Master level since 2010)
– Contributing author to DAMA-DMBOK2 (Reference & Master Data) released June of this year
Discussion Topics
– Current State Review
• Data Model
• Data Architecture
– Future State Proposed
• Overall Architecture
• Data Lake Specific
– Big Data POC (Proof of Concept)
• Environment Setup
• Use Case Review
• POC Results
– Items on Deck
• Data Access / Presentation Layer
• Information Governance Implications
– Wrap up and Questions
Current State – Enterprise Data Model (High-level Conceptual)
Main Business Entities: (9 in Total)
– Product (Coverage Master)
– Client (Consolidated Level View)
– Party (Source Level View)
– Point of Contact (Communication Method)
– Agent (Producer Contracts & Licenses)
– Application (for Policy Coverage)
– Policy (Pending, Active, or Terminated)
– Claim (Submitted against Policy)
– Event (Type and Timestamp)
Subject Area Relationships:
– Identify Relationship Type / Role
Enterprise Data Glossary:
– Business Terms & Attributes
– Vetted by Data Governance Council
Current State – Data Sharing Model (High-level Logical)
Current Data Design:
– Relational Model
– Abstract Design
– Source Linkage and Lineage
– Lends Itself to Columnar
Reference Entities:
– Static Reference Data
– Environment Metadata
Subject Area Entities:
– Domain Specific (by Business Entity)
– Key-value Pairs (Simulate Columnar)
Model instantiated for each Subject Area identified (9 in total)
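To make the key-value design concrete, here is a minimal sketch of how key-value rows for one subject area could be folded back into one record per entity. The row layout, attribute names, and source system codes are hypothetical illustrations, not the actual model from the slides.

```python
from collections import defaultdict

# Hypothetical key-value rows for the Policy subject area:
# (entity_id, attribute_name, attribute_value, source_system)
rows = [
    ("POL-001", "status", "Active", "SRC_A"),
    ("POL-001", "effective_date", "2017-01-15", "SRC_A"),
    ("POL-002", "status", "Pending", "SRC_B"),
]


def pivot(kv_rows):
    """Fold key-value rows into one dict of attributes per entity."""
    entities = defaultdict(dict)
    for entity_id, name, value, _source in kv_rows:
        entities[entity_id][name] = value
    return dict(entities)


policies = pivot(rows)
print(policies["POL-001"])
```

Because each attribute is its own row, adding a new attribute needs no schema change, which is one reason a design like this lends itself to a columnar store.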
Current State – Data Sharing Architecture
Current Data Stores (all Oracle):
– Landing Zone
– Master Data Hub
– Enterprise Data Warehouse
Current Data Flows:
– Traditional ETL (Informatica)
– Custom Extracts (COBOL, PL/SQL)
Current Reporting & Analytics:
– Static (Business Objects)
– Visualization (Tableau)
– Predictive / Statistical (SAS)
Current Data Profiling:
– Informatica IDQ and Traditional SQL
Future State – Proposed Architecture
Data Layer Components:
– Operational Zone
– Presentation Zone + DV
– Data Lake (BDE)
– Ad-Hoc Zone
Data Flows:
– Batch (solid black lines)
– Service (solid red lines)
• proxied via ESB
– RT Query (dashed black lines)
All Data Layer components expected to be on-prem with exception of Ad-Hoc Zone (to enable variable use and cost models)
Future State – Proposed Architecture
Architecture Approach:
– Assure Data Centric
– Design as Hub-n-Spoke
– Reduce Point-to-Point
– Enable Data Accessibility
– Implement Data Services
Data Layer as Hub:
– Manage Client Identities
– Proxy Transactions
– Implement EDW
– Provide Data Domain Perspective Views
– Curate Master Data
– Link Transactional Data
– Enable Data Archiving
– Establish Enterprise LZ
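The case for reducing point-to-point integration can be shown with simple counting. This sketch assumes the worst case, where every system exchanges data with every other system over one interface per pair, and compares it with one interface per system into a central hub. The function names and system counts are illustrative only.

```python
def point_to_point_links(n_systems: int) -> int:
    """Interfaces needed if every pair of systems is connected directly."""
    return n_systems * (n_systems - 1) // 2


def hub_and_spoke_links(n_systems: int) -> int:
    """Interfaces needed if each system connects only to the hub."""
    return n_systems


for n in (5, 10, 20):
    print(f"{n} systems: point-to-point={point_to_point_links(n)}, "
          f"hub-n-spoke={hub_and_spoke_links(n)}")
```

Point-to-point interfaces grow quadratically with the number of systems, while hub-n-spoke interfaces grow linearly.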
Future State – Proposed Data Lake
Data Lake Environment:
– Cloudera distribution of Hadoop
– 14 Node cluster (10 data, 4 name/edge)
Technical Considerations:
– Enterprise Landing Zone (HDFS + Hive)
– Archive Zone (HDFS)
– Curation Zone (Hive + Impala + Kudu)
– Insights Zone (Hive + Impala + HBase)
– Sandbox Zone (Hive + HBase + SAS)
– Ingestion (Sqoop + Syncsort)
– Transformation (M/R + Hive + Python + SAS)
Existing MDS Hub to be migrated from relational Oracle data store to columnar Kudu data store
Existing ETL to be migrated from Informatica to Hive + Impala
Utilize Security Toolset from Cloudera to ensure Data encrypted at rest
Note that Informatica BDM (Big Data Management) suite was reviewed / considered
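As an illustration of the Sqoop ingestion path, the sketch below assembles a `sqoop import` command for one relational table. The JDBC URL, user, paths, and table name are hypothetical placeholders. The flags are standard Sqoop 1 import options, but the settings actually used in this environment are not given in the slides.

```python
import shlex


def sqoop_import_command(jdbc_url, username, password_file,
                         table, target_dir, mappers=4):
    """Build the argument list for a basic Sqoop table import into HDFS."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,  # keeps the password off the command line
        "--table", table,
        "--target-dir", target_dir,        # HDFS directory in the landing zone
        "--num-mappers", str(mappers),     # number of parallel import tasks
    ]


# All values below are made-up examples.
cmd = sqoop_import_command(
    jdbc_url="jdbc:oracle:thin:@//db.example.com:1521/ORCL",
    username="etl_user",
    password_file="/user/etl_user/.sqoop_pw",
    table="POLICY",
    target_dir="/data/landing/policy",
)
print(shlex.join(cmd))
```

Building the command as a list rather than a single string makes it safe to pass to `subprocess.run` without shell quoting problems.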
Data Lake POC (Proof of Concept)
POC Environment:
– MS Azure (IaaS set up)
– Cloudera distribution of Hadoop
– 4 Node cluster (3 data, 1 name/edge)
Focused on Three (3) Use Cases:
– Actuarial Valuation Analysis (Single Product Type)
– Ingestion of Relational and Mainframe Data
– Data Service Query (Performance Goal <= 300ms)
Results:
– Condensed Valuation Process (From Two Weeks to Twenty Hours)
– Ingestion of Relational Data (via Sqoop) and Mainframe Data (via Syncsort) Successful
– Mirrored 1000 simultaneous executions (Average Response Time Obtained of 150ms)
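A response-time check like the one in the third use case can be sketched as a small load-test harness. This is not the tooling used in the POC. `fake_data_service_query` is a stand-in that only sleeps, and the worker count is an assumption, so at most 100 requests are in flight at once rather than all 1000.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

GOAL_MS = 300.0  # performance goal stated for the data service query


def fake_data_service_query(_request_id):
    """Placeholder for the real query; simulates a short round trip."""
    time.sleep(0.005)
    return "ok"


def timed_call(request_id):
    """Run one query and return its elapsed time in milliseconds."""
    start = time.perf_counter()
    fake_data_service_query(request_id)
    return (time.perf_counter() - start) * 1000.0


def run_load_test(n_requests=1000, max_workers=100):
    """Issue n_requests queries through a thread pool and collect latencies."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(timed_call, range(n_requests)))


def meets_goal(avg_ms, goal_ms=GOAL_MS):
    """True when the average latency is within the goal."""
    return avg_ms <= goal_ms


latencies = run_load_test()
avg_ms = mean(latencies)
print(f"requests={len(latencies)} avg={avg_ms:.1f}ms goal_met={meets_goal(avg_ms)}")
```

Against the reported 150ms average, `meets_goal(150.0)` returns `True` for the 300ms goal. A real test would also look at tail latencies, not only the average.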
Next Steps – Items on Deck
Data Access / Presentation Layer:
– Perform POC on Data Virtualization Product (Denodo)
– Determine How to Package Conformed Dimensions from EDW to Present ‘Perspective Views’
– Establish Integration Patterns within ESB Environment (Semantic / Taxonomic Messaging Approach)
– Execute Performance Testing of Data Service Queries from Presentation Zone
Information Governance Implications:
– Establish Governance Policies
– Determine Data Classification Approach
– Define Security Architecture for Data Lake
– Identify Access Roles and Security Controls
– Certify Security of Data Lake Environment
Next Steps – Plans for 2018
Funding Secured for POC Environment until June:
– But Establish a Larger Cluster (10 data, 4 name/edge)
– Along with Security Set-up and Data Encryption
Collaborate with Business Areas on new / expanded prospective Use Cases:
– Expand Actuarial Valuation to Other Product Types
– Additional Actuarial Items outside of Valuation
– Agent Recruiting and Retention
– Claims Fraud (although this one has a long tail…)
– Customer Experience (Journey Map and/or Retention)
Go on the Road…
– Presentations to Business Partners and IT folks
– Extol the Value of Big Data and Future State Architecture
– Troll for Funding…$$$ (Sad but true…)
Recap
– Current State Review
• Data Model (Conceptual and Logical)
• Data Architecture
– Future State Proposed
• Overall Architecture (Layout and Approach)
• Data Layer Components
• Data Lake Environment
– Big Data POC (Proof of Concept)
• Environment Setup
• Use Case Review
• POC Results
– Items on Deck
• Data Access / Presentation Layer
• Information Governance Implications
• Next Steps
In the end it is all about…
14. Happy Holidays
Thank You For Your Time and Interest…!!!
Contact Information:
Gene Boomer
Protective Insurance
gboomer@protectiveinsurance.com