Getting to Real-Time in a
Multi-Model Architecture
Data Architecture Summit 2017
Benjamin Nussbaum – CTO, AtomRain Inc.
(Creators of GraphGrid Connected Data Platform)
ben@atomrain.com | @bennussbaum
atomrain.com | graphgrid.com
#DASummit
Business and Architecture
An exercise in how to think about data and the technologies
available to make it a valuable business asset
11/21/17
Technology is Overhyped
• No one-size-fits-all solutions or silver bullets
• Need to align the shape of your data with the technology made for it
• Rarely does a truly transformational technology come along
• Deep understanding and discipline required to get through the noise
• Avoid running from one technology to the next hot one
• A cursory or surface understanding is not enough here
• Throwing technology or more developer bodies at a problem is not the
solution; often it makes it worse.
• Work with a SME to establish and navigate the data landscape
Data is Living
• Constantly changing as business needs and requirements change
• Data comes in many shapes and sizes
• Data goes through many transformations within an enterprise
• Data needs to become all things to all people
Separating Data Concerns
What do I need to do with my data?
Ingest it (ETL) from multiple sources
Store both structured and unstructured data
Process the unstructured data to make it useable
Contextualize, enrich and improve the structured data
Analyze, Reason & Learn. Understand & Drive Decisions
Indexes for Direct Retrieval
• Designed for responding to anticipated questions
• Rapid responses for indexed values
• Not designed for ad hoc and unexpected questions
• Costly to maintain in a rapidly changing data environment
• Slow to update because it requires dev/DBA cycles
Pointers for Traversal
• Designed for responding to unanticipated questions
• Rapid responses for connection-centric and depth-based questions
• Not designed for static cache-like return sets
• Easy to maintain in a rapidly changing data environment
• No dev/DBA cycle required to maintain performance as data changes
• Optimal for ad hoc and unexpected connection-centric questions
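The contrast between index-based retrieval and pointer-based traversal can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not how any particular database is implemented; the names `email_index`, `Node`, and `reachable_within` are hypothetical.

```python
from collections import deque

# Direct retrieval: an index maps an anticipated lookup value to a record.
records = [
    {"id": 1, "email": "ann@example.com"},
    {"id": 2, "email": "bob@example.com"},
]
# The index has to be maintained whenever the underlying records change.
email_index = {r["email"]: r for r in records}


class Node:
    """A record that holds direct references (pointers) to its neighbors."""

    def __init__(self, name):
        self.name = name
        self.neighbors = []


def reachable_within(start, depth):
    """Follow pointers breadth-first up to `depth` hops; no index is consulted."""
    seen = {start.name}
    queue = deque([(start, 0)])
    while queue:
        node, d = queue.popleft()
        if d == depth:
            continue
        for nxt in node.neighbors:
            if nxt.name not in seen:
                seen.add(nxt.name)
                queue.append((nxt, d + 1))
    return seen - {start.name}


ann, bob, cat = Node("ann"), Node("bob"), Node("cat")
ann.neighbors.append(bob)
bob.neighbors.append(cat)

print(email_index["bob@example.com"]["id"])  # anticipated question: 2
print(sorted(reachable_within(ann, 2)))      # depth-based question: ['bob', 'cat']
```

The index answers only the question it was built for, while the traversal answers any depth-based question without extra structures being built first.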
Disk/S3/etc for Binary Files
• Designed for storing files of binary types (e.g., PDF, DOCX)
• Efficient for raw storage of files
• Processing phase should extract meaningful information
• Structured data should reference stored location when relevant
• Optimal for non-processed, unstructured data
Search for Natural Language
• Designed for finding data through plain text
• Rapid response in ranked order for defined indexes
• Not designed for ad hoc and unexpected questions
• Requires synchronization with primary database as data changes
• Optimal for processed data built into indexed text documents
Separating Data Concerns
Many aspects are not solved by databases
All The Database Things
DBMS: Database Management System
Object DBMS
Specialty Database:
• Designed for storing object oriented structures without translation
• Defines classes, objects and models (no separation – mapping – between data and applications)
• Object structures, including inheritance and other objects referenced by a property, are persisted together
• Not designed for multiple applications with varying code structures
• Object oriented code structures are stored directly
• Today many applications use the same database but with varying views and configurations of the data
• Not an option as a backing store for your primary data
Native XML DBMS
Specialty Database:
• Designed for storing XML structures without translation
• Internal data model corresponds to XML documents, but the data is not necessarily stored as XML documents
• Support XML specific query languages such as XPath, XQuery and/or XSLT
• Not designed for normalized representations of data
• Similar to document stores in this way
• Overlapping representations of the same data in varying XML documents
• Not an option as a backing store for your primary data
Figure: Native XML DBMS (image source: http://www.brainkart.com/media/extra/COsmyOf.jpg)
Time Series
Specialty Database:
• Designed for streams of time series data as inputs where time is written in ascending order (most recent on top)
• Not designed for all CRUD operations
• Updates are expected to be rare
• Deletions are rare and usually expected to cover a large chunk of data far in the past
• Not an option as a backing store for your primary data
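The append-oriented access pattern described above can be sketched as follows. This is a minimal in-memory illustration, not a real time series engine; the function names `append`, `query`, and `drop_before` are hypothetical.

```python
from bisect import bisect_left, bisect_right

timestamps = []  # kept in ascending order
values = []


def append(ts, value):
    """Accept only in-order writes, mirroring the append-oriented design."""
    if timestamps and ts < timestamps[-1]:
        raise ValueError("out-of-order write")
    timestamps.append(ts)
    values.append(value)


def query(start, end):
    """Return (ts, value) pairs with start <= ts <= end using binary search."""
    lo = bisect_left(timestamps, start)
    hi = bisect_right(timestamps, end)
    return list(zip(timestamps[lo:hi], values[lo:hi]))


def drop_before(cutoff):
    """Delete one large chunk of old data, the rare deletion case."""
    cut = bisect_left(timestamps, cutoff)
    del timestamps[:cut]
    del values[:cut]


for t in range(0, 100, 10):
    append(t, t * 1.5)

print(query(20, 40))
drop_before(50)
print(timestamps[0])
```

Because writes arrive in time order, range queries reduce to two binary searches, and there is no code path for updating a value in place.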
Figure: Time Series (image source: https://axibase.com/wp-content/uploads/2015/08/img_55ccc91fe5d9a.png)
Search Engine
Specialty Database:
• Designed for finding content within the search data store
• Typically stored as finely tuned textual or geospatial indexes
• Should be seen as a purpose-built cache to support finding data in ranked order
• Not designed for ACID reliability
• Not an option as a backing store for your primary data
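A textual search index of the kind described above is commonly built as an inverted index. The sketch below is a deliberately simplified illustration: the scoring (raw term counts) and the names `inverted` and `search` are assumptions for this example, and real search engines use far more refined analysis and ranking.

```python
from collections import defaultdict

docs = {
    1: "graph databases store connections",
    2: "search engines index text",
    3: "graph search over text",
}

# Build the inverted index: term -> {doc_id: term frequency}.
inverted = defaultdict(dict)
for doc_id, text in docs.items():
    for term in text.lower().split():
        inverted[term][doc_id] = inverted[term].get(doc_id, 0) + 1


def search(query):
    """Return doc ids ranked by how many query-term occurrences they contain."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for doc_id, tf in inverted.get(term, {}).items():
            scores[doc_id] += tf
    # Highest score first; ties broken by doc id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


print(search("graph text"))  # [3, 1, 2]
```

The index is derived entirely from `docs`, which is why it behaves like a cache: it can be rebuilt from the primary data at any time, and it must be resynchronized when that data changes.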
Figure: Search Engine (image source: https://developer.apple.com/library/content/documentation/UserExperience/Conceptual/SearchKitConcepts/art/inverted_index_textposition.jpg)
RDF/Triple Stores
Specialty Database:
• Designed for storing RDF model
• Resource Description Framework (RDF) is a framework for describing information
• Information is represented in triples: subject – predicate – object
• Provide methods specifically for dealing with triples through a query language such as SPARQL, which resembles SQL in syntax
• Rely on indexes being built and maintained for retrieving data
• Not designed for application and other such non-RDF format data
• Not an option as a backing store for your primary data
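The subject, predicate, object structure can be sketched with plain Python tuples. This is not SPARQL and uses no RDF library; the `match` helper and the sample data are hypothetical, and the linear scan stands in for the indexes a real triple store would build and maintain.

```python
# Each fact is a (subject, predicate, object) triple.
triples = {
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksFor", "acme"),
}


def match(s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


print(match(p="knows"))   # every "knows" statement
print(match(s="alice"))   # everything stated about alice
```

Every question is a pattern over the three positions, which is why application data that does not naturally fit the triple shape is awkward to store this way.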
Figure: RDF/Triple Stores (image source: https://image.slidesharecdn.com/semanticblockchain-170220202334/95/semantic-blockchain-12-638.jpg?cb=1487667022)
Key-Value Stores
Specialty Database:
• Designed to store and retrieve values associated with a given key
• Very simple structure that works well in a store<-->retrieve paradigm with a known key
• Good for caching requirements, when near-instant retrieval of a key’s value is needed
• Not designed for ACID reliability
• Not an option as a backing store for your primary data
Document Stores
Specialty Database:
• Designed to store and index schema-free textual documents (e.g., JSON)
• Enables defining and maintaining indexes for querying against fields in the documents
• Good for store<-->retrieve of large text document structures based on indexed fields within them
• Some implementations support varying levels of ACID reliability at the document level
Figure: Document Stores (image source: http://docs.mongodb.org/master/MongoDB-data-models-guide.pdf)
Wide Column Stores
Specialty Database:
• Designed to store a large number of dynamic columns
• Uses a table structure where columns are created for each row instead of being defined by the table
• Column names and record keys are not fixed (schema-free in this regard compared to RDBMS tables)
• Good for store<-->retrieve of data that looks like a two-dimensional key-value store and is within a table
• Some implementations support varying levels of ACID reliability
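The "two-dimensional key-value" shape can be sketched as a dictionary of dictionaries, where each row carries its own set of columns. This is a toy model only; the `put` and `get` helpers and the row keys are hypothetical and do not correspond to any specific product's API.

```python
from collections import defaultdict

# row key -> {column name -> value}; each row carries its own set of columns.
table = defaultdict(dict)


def put(row_key, column, value):
    """Write one cell; the column is created for this row if it is new."""
    table[row_key][column] = value


def get(row_key, column, default=None):
    """Read one cell by (row key, column name)."""
    return table.get(row_key, {}).get(column, default)


put("user:1", "name", "Ann")
put("user:1", "last_login", "2017-11-21")
put("user:2", "name", "Bob")
put("user:2", "plan", "pro")  # a column that user:1 does not have

print(sorted(table["user:1"]))  # ['last_login', 'name']
print(get("user:2", "plan"))    # pro
```

Nothing at the table level declares which columns exist, which is the schema-free property the bullets above describe.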
Relational DBMS
Primary Database:
• Designed to store and retrieve data using a table-oriented data model
• Tables look like Excel – columns are the schema (properties/attributes) and rows are the data entries
• Good for aggregations and filters within a single table
• Supports JOIN operations to return data across 2 or more tables
• JOIN operations can slow down exponentially with each table included
• This is because, in the worst case, a JOIN considers the Cartesian product of the joined tables
• Designed to be fully ACID and Transactional (avoid those that aren’t)
• Historically has been the backing store for primary data throughout the enterprise
• Yes, you can read this as “a changing of the guard is taking place” (I’ve seen it accelerate a lot this year)
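The Cartesian-product point can be made concrete with a naive nested-loop join. This sketch shows the worst case only: it assumes no indexes and no query planner, whereas real relational engines use indexes, hash joins, and join reordering to avoid inspecting every combination. The table contents and the `naive_join` function are hypothetical.

```python
from itertools import product

people = [{"id": i} for i in range(10)]
accounts = [{"id": i, "person_id": i} for i in range(10)]
orders = [{"id": i, "account_id": i} for i in range(10)]


def naive_join(people, accounts, orders):
    """Nested-loop join with no indexes: every combination is inspected."""
    inspected = 0
    rows = []
    for p, a, o in product(people, accounts, orders):
        inspected += 1
        if a["person_id"] == p["id"] and o["account_id"] == a["id"]:
            rows.append((p["id"], a["id"], o["id"]))
    return rows, inspected


rows, inspected = naive_join(people, accounts, orders)
print(len(rows), inspected)  # 10 1000
```

With three tables of 10 rows each, 1,000 combinations are inspected to produce 10 result rows; adding a fourth 10-row table would multiply the inspected count by 10 again.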
Figure: Relational DBMS (image source: http://www2.amk.fi/digma.fi/www.amk.fi/material/images/vanhaamk/etuotanto/5hNkauUBp/Concepts.gif)
Graph DBMS
Primary Database:
• Designed to store and retrieve data using a connection-oriented model
• Connections (edges) provide the context of how two things are related
• Entities (nodes) are the things (e.g., Person, Account) in your data
• Properties are supported on nodes and edges (avoid those that don’t)
• Indexes are only used to find starting points in the data (avoid those that rely on indexes for traversal)
• This removes JOIN pain when answering questions requiring movement across data entities (traversals)
• Designed to be fully ACID and Transactional (avoid those that aren’t)
• Provides dynamic (index-free), constant-time movement across data entities
• Becoming the backing store for primary data throughout the enterprise
• Yes, you can read this as “replacing RDBMS as the primary store” (I’ve been using one like this for 5+ years)
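A property graph with an index used only for the starting point might be sketched as below. The data, the `name_index`, and the `accounts_owned_by` function are hypothetical, and the Python dict lookups merely stand in for the direct record pointers a native graph store would follow.

```python
# Nodes and edges both carry properties; edges give the context of a relationship.
nodes = {
    1: {"label": "Person", "name": "Ann"},
    2: {"label": "Account", "number": "A-100"},
    3: {"label": "Account", "number": "A-200"},
}
# Edges are stored with their source node: id -> [(edge type, edge props, target id)]
out_edges = {
    1: [("OWNS", {"since": 2015}, 2), ("OWNS", {"since": 2017}, 3)],
    2: [],
    3: [],
}

# An index is used only to find the starting node.
name_index = {props["name"]: nid for nid, props in nodes.items() if "name" in props}


def accounts_owned_by(name):
    """Find the start node via the index, then follow stored edges from it."""
    start = name_index[name]  # index lookup: starting point only
    return [
        (nodes[target]["number"], props["since"])
        for edge_type, props, target in out_edges[start]  # traversal step
        if edge_type == "OWNS"
    ]


print(accounts_owned_by("Ann"))  # [('A-100', 2015), ('A-200', 2017)]
```

The work done after the index lookup depends on how many edges the start node has, not on the total number of accounts in the data set.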
Multi-Model Database
Distracted Database:
• Mostly designed for or favors one primary storage model and purpose
• Then added support for other conceptual models at the API level
• Beneficial for marketing
• Convenient for developers
• Largely a distraction from the initial primary objective
• There are always tradeoffs in optimizing for a specific model
Not a Database
Not a Database:
• Be wary of databases that aren’t actually databases
• There is much marketing confusion and misclassification happening
• Commonly tout a pluggable storage engine (that just means they’re using a database for storage)
• Other times you have to dig through the documentation to see how they’re actually storing their data
• Often the novel thing is a value added API layer
• Using these will tie you into whichever data storage decisions they’ve made
• Not all “databases” listed on db-engines are actually databases in the true sense
• A database that uses another database as the storage engine
• A database that is all in memory and doesn’t deal with persisted data
Separating Data Concerns
Many aspects are not solved by databases
Real-World, Real-Time
An example scenario of what becomes possible when
matching the shape of your data with the technology
Real-World, Real-Time
Business: Buying and selling of online advertising
Accepted Reality: Maximum of 1hr to update bids
Original Technical: 3TB SQL RDBMS relying on distributed,
federated and highly indexed views to come close to 1hr
Challenge: Taking more than 1hr to update bids
Real-World, Real-Time
Solution: Identified data structure as highly-connected & deep
New Reality: Search and Intelligent Bid Optimization
Solution Technical: 3TB Neo4j (10% of hardware), Elasticsearch
integrated on GraphGrid, writing over 2B nodes/edges per day
Result: Taking less than 300ms to update bids
Real-World, Real-Time
Business: Selling complex content packages
Accepted Reality: Between 4-6hrs for sales rep to get answer
Original Technical: Generating 1B-row hash tables (Oracle
RDBMS) with only 1 or 2 SMEs able to modify the stored procedure
Challenge: Takes 4-6hrs to know if content package can be sold
Real-World, Real-Time
Solution: Identified data structure as highly-connected, living
New Reality: Search and intelligent content package negotiator
Solution Technical: Neo4j, Elasticsearch integrated on
GraphGrid, interactive package optimizer & recommender
Result: Sub-second determination of non-conflicting package
across entire sales organization & advisory recommender
system suggesting content to include/exclude throughout deal
Real-World, Real-Time
Business: Highly regulated global financial institution
Accepted Reality: Complex data lineages will never finish
Original Technical: Oracle SQL RDBMS
Challenge: Queries for complex lineages never finish
Real-World, Real-Time
Solution: Identified data structure as highly-connected & deep
New Reality: Complex lineages finish in under 1 minute
Solution Technical: Neo4j
Result: Even the most complex lineages finish under 1 minute
Thank You! Questions?
Getting to Real-Time in a Multi-Model Architecture
by
Benjamin Nussbaum – CTO, AtomRain Inc.
(Creators of GraphGrid Connected Data Platform)
ben@atomrain.com | @bennussbaum
atomrain.com | graphgrid.com
@atomrain | @graphgrid
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...GQ Research
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 

Dernier (20)

RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 

Getting to Real-Time in a Multi-Model Architecture

  • 1. Getting to Real-Time in a Multi-Model Architecture. Data Architecture Summit 2017. Benjamin Nussbaum, CTO, AtomRain Inc. (creators of the GraphGrid Connected Data Platform). ben@atomrain.com | @bennussbaum | atomrain.com | graphgrid.com
  • 2. #DASummit (11/21/17). Business and Architecture: an exercise in how to think about data and the technologies available to make it a valuable business asset.
  • 3. Technology is Overhyped
      • There are no one-size-fits-all silver bullets
      • You need to align the shape of your data with the technology made for it
      • A truly transformational technology rarely comes along
      • Deep understanding and discipline are required to get through the noise
      • Avoid running from one technology to the next hot one
      • A cursory or surface understanding is not enough here
      • Throwing technology or more developers at a problem is not the solution; often it makes it worse
      • Work with a subject-matter expert (SME) to establish and navigate the data landscape
  • 4. Data is Living
      • Constantly changing as business needs and requirements change
      • Data comes in many shapes and sizes
      • Data goes through many transformations within an enterprise
      • Data needs to become all things to all people
  • 5-10. Separating Data Concerns (one step revealed per slide): What do I need to do with my data?
      • Ingest it (ETL) from multiple sources
      • Store both structured and unstructured data
      • Process the unstructured data to make it usable
      • Contextualize, enrich and improve the structured data
      • Analyze, reason and learn; understand and drive decisions
  • 11. Indexes for Direct Retrieval
      • Designed for responding to anticipated questions
      • Rapid responses for indexed values
      • Not designed for ad hoc and unexpected questions
      • Costly to maintain in a rapidly changing data environment
      • Slow to update because changes require dev/DBA cycles
  • 12. Pointers for Traversal
      • Designed for responding to unanticipated questions
      • Rapid responses for connection-centric and depth-based questions
      • Not designed for static, cache-like return sets
      • Easy to maintain in a rapidly changing data environment
      • No dev/DBA cycle required to maintain performance as data changes
      • Optimal for ad hoc and unexpected connection-centric questions
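The contrast between the two slides above can be sketched in a few lines of Python. This is only an illustration under assumed data: the `people` and `knows` structures and the `within_hops` helper are hypothetical names, not part of the talk. The dictionary `name_index` plays the role of an index built for an anticipated question, while `within_hops` follows stored pointers to answer a depth-based question that no index was prepared for.

```python
from collections import deque

# Hypothetical data: people and who they know (adjacency lists act as pointers).
people = {1: "Ann", 2: "Bob", 3: "Cy", 4: "Dee"}
knows = {1: [2], 2: [3], 3: [4], 4: []}

# Index: built ahead of time for an anticipated question (find a person by name).
name_index = {name: pid for pid, name in people.items()}


def within_hops(start, max_depth):
    """Follow pointers outward from start, breadth-first, up to max_depth hops."""
    seen = {start}
    queue = deque([(start, 0)])
    reached = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for neighbor in knows[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                reached.append(people[neighbor])
                queue.append((neighbor, depth + 1))
    return reached


# The index finds the starting point; the traversal answers the depth question.
start = name_index["Ann"]
print(within_hops(start, 2))
```

Changing the depth or the starting person needs no new index, which is the maintenance point the slide makes.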
  • 13. Disk/S3/etc. for Binary Files
      • Designed for storing binary files (e.g., PDF, DOCX)
      • Efficient for raw storage of files
      • A processing phase should extract the meaningful information
      • Structured data should reference the stored location when relevant
      • Optimal for non-processed, unstructured data
  • 14. Search for Natural Language
      • Designed for finding data through plain text
      • Rapid response in ranked order for defined indexes
      • Not designed for ad hoc and unexpected questions
      • Requires synchronization with the primary database as data changes
      • Optimal for processed data built into indexed text documents
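As a rough sketch of what "ranked order for defined indexes" means, the Python below builds a tiny inverted index and ranks documents by summed term frequency. The documents, the scoring rule and the `search` function are assumptions made for illustration; real search engines use far more refined scoring.

```python
from collections import Counter, defaultdict

# Hypothetical processed documents, keyed by id.
docs = {
    "d1": "graph database for connected data",
    "d2": "search engine for text data",
    "d3": "graph search over graph data",
}

# Inverted index: term -> {doc_id: number of occurrences}.
index = defaultdict(Counter)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term][doc_id] += 1


def search(query):
    """Return doc ids ranked by summed term frequency of the query terms."""
    scores = Counter()
    for term in query.lower().split():
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    # Highest score first; ties broken by doc id for a stable order.
    return [doc_id for doc_id, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


print(search("graph search"))
```

Because `index` is derived from `docs`, it has to be rebuilt or updated whenever the source data changes, which is the synchronization cost noted on the slide.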
  • 15. Separating Data Concerns: many aspects are not solved by databases
  • 16. All The Database Things. DBMS: Database Management System
  • 17. Object DBMS (Specialty Database)
      • Designed for storing object-oriented structures without translation
      • Defines classes, objects and models (no separation, or mapping, between data and applications)
      • Object structures, including inheritance and other objects referenced on a property, are persisted together
      • Not designed for multiple applications with varying code structures
      • Object-oriented code structures are stored directly
      • Today many applications use the same database but with varying views and configurations of the data
      • Not an option as a backing store for your primary data
  • 19. Native XML DBMS (Specialty Database)
      • Designed for storing XML structures without translation
      • The internal data model corresponds to XML documents, but data is not necessarily stored as XML documents
      • Supports XML-specific query languages such as XPath, XQuery and/or XSLT
      • Not designed for normalized representations of data
      • Similar to document stores in this way
      • Overlapping representations of the same data in varying XML documents
      • Not an option as a backing store for your primary data
  • 21. Time Series (Specialty Database)
      • Designed for streams of time series data as inputs, where time is written in ascending order (most recent on top)
      • Not designed for all CRUD operations
      • Updates are expected to be rare
      • Deletions are rare and are expected to remove only a large chunk of data far in the past
      • Not an option as a backing store for your primary data
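The access pattern described on the time series slide can be sketched as an append-only structure. The `TimeSeries` class below is a hypothetical illustration, not any product's API: writes must arrive in ascending time order, reads are range scans, and the only deletion offered is dropping an old prefix of the data.

```python
from bisect import bisect_left, bisect_right


class TimeSeries:
    """Append-only series: ordered writes, range reads, bulk deletion of old data."""

    def __init__(self):
        self.timestamps = []
        self.values = []

    def append(self, ts, value):
        # Writes are expected in ascending time order; anything else is rejected.
        if self.timestamps and ts < self.timestamps[-1]:
            raise ValueError("out-of-order write")
        self.timestamps.append(ts)
        self.values.append(value)

    def range(self, start, end):
        """Return values with start <= timestamp <= end using binary search."""
        lo = bisect_left(self.timestamps, start)
        hi = bisect_right(self.timestamps, end)
        return self.values[lo:hi]

    def drop_before(self, cutoff):
        """Delete everything older than cutoff in one chunk."""
        cut = bisect_left(self.timestamps, cutoff)
        del self.timestamps[:cut]
        del self.values[:cut]
```

There is deliberately no update or single-row delete method, mirroring the slide's point that such operations are not what this kind of store is designed for.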
  • 23. Search Engine (Specialty Database)
      • Designed for finding content within the search data store
      • Typically stored as finely tuned textual or geospatial indexes
      • Should be seen as a purpose-built cache to support finding data in ranked order
      • Not designed for ACID reliability
      • Not an option as a backing store for your primary data
  • 25. RDF/Triple Stores (Specialty Database)
      • Designed for storing the RDF model
      • The Resource Description Framework (RDF) is a method for describing information
      • Information is represented in triples: subject, predicate, object
      • Provide methods specifically for dealing with triples through an SQL-like or SPARQL query language
      • Rely on indexes being built and maintained for retrieving data
      • Not designed for application data and other data that is not in RDF format
      • Not an option as a backing store for your primary data
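To make the subject, predicate, object shape concrete, here is a minimal Python sketch of a triple set with wildcard pattern matching. The predicate names (`worksFor`, `created`, `type`) and the `match` helper are invented for illustration; a real triple store would expose this through SPARQL and maintain indexes rather than scanning.

```python
# Hypothetical triples in (subject, predicate, object) form.
triples = {
    ("ben", "worksFor", "AtomRain"),
    ("AtomRain", "created", "GraphGrid"),
    ("GraphGrid", "type", "Platform"),
}


def match(subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    pattern = (subject, predicate, obj)
    return sorted(
        t for t in triples
        if all(p is None or p == v for p, v in zip(pattern, t))
    )


print(match(predicate="created"))
print(match(subject="GraphGrid"))
```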
  • 27. Key-Value Stores (Specialty Database)
      • Designed to store and retrieve values associated with a given key
      • Very simple structure that works well in a store<-->retrieve paradigm with a known key
      • Good for caching requirements, when near-instant retrieval of a key's value is needed
      • Not designed for ACID reliability
      • Not an option as a backing store for your primary data
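The caching use described above is often arranged as a read-through cache in front of the primary store. The sketch below assumes a plain dictionary as the key-value store and a hypothetical `load_from_primary` function standing in for a slow primary database lookup.

```python
# The key-value store: known key in, value out.
cache = {}

# Records every trip to the (hypothetical) primary database.
primary_calls = []


def load_from_primary(key):
    """Stand-in for a slow lookup against the primary data store."""
    primary_calls.append(key)
    return key.upper()


def get(key):
    """Read-through cache: hit the primary store only on a cache miss."""
    if key not in cache:
        cache[key] = load_from_primary(key)
    return cache[key]
```

Calling `get` twice with the same key touches the primary store once; the cache holds a copy, not the system of record, which matches the slide's warning against using it as the backing store.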
  • 29. Document Stores (Specialty Database)
      • Designed to store and index schema-free textual documents (e.g., JSON)
      • Enables defining and maintaining indexes for querying against fields in the documents
      • Good for store<-->retrieve of large text document structures based on indexed fields within them
      • Some implementations support varying levels of ACID reliability at the document level
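A minimal Python sketch of the idea, assuming made-up JSON documents and a hypothetical `build_index` helper: documents carry different fields (schema-free), and an index on one field is defined separately so that retrieval by that field does not scan every document.

```python
import json
from collections import defaultdict

# Hypothetical schema-free documents: each one has a different set of fields.
raw_docs = [
    '{"id": 1, "type": "invoice", "customer": "acme", "total": 120}',
    '{"id": 2, "type": "note", "customer": "acme"}',
    '{"id": 3, "type": "invoice", "customer": "globex", "lines": [1, 2]}',
]
docs = {d["id"]: d for d in map(json.loads, raw_docs)}


def build_index(field):
    """Map each value of a field to the ids of the documents containing it."""
    index = defaultdict(list)
    for doc_id, doc in docs.items():
        if field in doc:  # documents lacking the field are simply not indexed
            index[doc[field]].append(doc_id)
    return index


by_customer = build_index("customer")
print([docs[i]["type"] for i in by_customer["acme"]])
```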
  • 31. Wide Column Stores (Specialty Database)
      • Designed to store a large number of dynamic columns
      • Uses a table structure where columns are created for each row instead of being defined by the table
      • Column names and record keys are not fixed (schema-free in this regard compared to RDBMS tables)
      • Good for store<-->retrieve of data that looks like a two-dimensional key-value store within a table
      • Some implementations support varying levels of ACID reliability
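The "two-dimensional key-value store" description can be pictured as a map from row key to a map of column name to value. The Python below is a sketch under that assumption; `put_cell`, `get_cell` and the sample keys are hypothetical names.

```python
from collections import defaultdict

# row key -> {column name -> value}; each row owns its own set of columns.
rows = defaultdict(dict)


def put_cell(row_key, column, value):
    """Create the column on this row if it does not exist yet."""
    rows[row_key][column] = value


def get_cell(row_key, column, default=None):
    """Look up one cell by (row key, column name) without creating the row."""
    return rows.get(row_key, {}).get(column, default)


put_cell("user:1", "email", "ann@example.com")
put_cell("user:1", "plan", "pro")
put_cell("user:2", "last_login", "2017-11-21")  # a column user:1 never had

print(sorted(rows["user:1"]), sorted(rows["user:2"]))
```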
  • 33. Relational DBMS (Primary Database)
      • Designed to store and retrieve data using a table-oriented data model
      • Tables look like Excel: columns are the schema (properties/attributes) and rows are the data entries
      • Good for aggregations and filters within a single table
      • Supports JOIN operations to return data across two or more tables
      • JOIN operations can slow down sharply with each table included
      • Logically, a JOIN filters the Cartesian product of the joined tables, so the number of candidate row combinations multiplies with every table added
      • Designed to be fully ACID and transactional (avoid those that aren't)
      • Historically has been the backing store for primary data throughout the enterprise
      • Yes, you can read this as "a changing of the guard is taking place" (I've seen it accelerate a lot this year)
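For readers who want to see the table model and a JOIN side by side, here is a small sketch using Python's built-in `sqlite3` module. The `person` and `account` tables and their rows are invented for illustration and do not come from the talk.

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE account (
        id INTEGER PRIMARY KEY,
        owner_id INTEGER REFERENCES person(id),
        balance INTEGER
    );
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO account VALUES (10, 1, 100), (11, 1, 250), (12, 2, 40);
""")

# Aggregation across two tables requires a JOIN on the foreign key.
totals = conn.execute("""
    SELECT p.name, SUM(a.balance)
    FROM person AS p
    JOIN account AS a ON a.owner_id = p.id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()

print(totals)
```

Each further entity type that a question touches adds another JOIN clause of this kind, which is where the cost discussed on the slide comes from.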
  • 35. Graph DBMS (Primary Database)
      • Designed to store and retrieve data using a connection-oriented model
      • Connections (edges) provide the context of how two things are related
      • Entities (nodes) are the things (e.g., Person, Account) in your data
      • Properties are supported on nodes and edges (avoid those that don't support both)
      • Indexes are only used to find starting points in the data (avoid those that rely on indexes beyond that)
      • This removes JOIN pain when answering questions requiring movement across data entities (traversals)
      • Designed to be fully ACID and transactional (avoid those that aren't)
      • Provide dynamic (index-free), constant-time movement across data entities
      • Becoming the backing store for primary data throughout the enterprise
      • Yes, you can read this as "replacing RDBMS as the primary store" (I've been using one like this for 5+ years)
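The model on this slide can be sketched in plain Python: nodes and edges both carry properties, an index is consulted once to find the starting node, and every hop after that follows a direct reference held by the node itself. All names here (`connect`, `follow`, the sample labels and relationship types) are assumptions for illustration and are not Neo4j's API.

```python
# Hypothetical nodes with properties; "out" holds each node's outgoing edges.
nodes = {
    1: {"label": "Person", "name": "Ann", "out": []},
    2: {"label": "Account", "number": "A-1", "out": []},
    3: {"label": "Account", "number": "A-2", "out": []},
}


def connect(src, rel_type, dst, **props):
    """Store the edge on the source node with a direct reference to the target."""
    nodes[src]["out"].append({"type": rel_type, "to": nodes[dst], "props": props})


connect(1, "OWNS", 2, since=2015)               # edge property: since
connect(2, "TRANSFERRED_TO", 3, amount=500)     # edge property: amount

# The only index: used to find a starting point, nothing more.
person_by_name = {n["name"]: n for n in nodes.values() if n["label"] == "Person"}


def follow(node, rel_type):
    """One hop: read the node's own edge list, with no index lookup involved."""
    return [edge["to"] for edge in node["out"] if edge["type"] == rel_type]


ann = person_by_name["Ann"]
reached = [
    dst["number"]
    for acct in follow(ann, "OWNS")
    for dst in follow(acct, "TRANSFERRED_TO")
]
print(reached)
```

The cost of each hop depends on the number of edges on the node being visited, not on the total size of the data set, which is the property the slide calls index-free movement.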
  • 37. Multi-Model Database (Distracted Database)
      • Mostly designed for, or favors, one primary storage model and purpose
      • Support for other conceptual models is then added at the API level
      • Beneficial for marketing
      • Convenient for developers
      • Largely a distraction from the initial primary objective
      • There are always tradeoffs in optimizing for a specific model
  • 38. Not a Database
      • Be wary of databases that aren't actually databases
      • There is much marketing confusion and misclassification happening
      • They commonly tout a pluggable storage engine (that just means they're using a database for storage)
      • Other times you have to dig through the documentation to see how they're actually storing their data
      • Often the novel thing is a value-added API layer
      • Using these will tie you into whichever data storage decisions they've made
      • Not all "databases" listed on db-engines are actually databases in the true sense, for example:
          • A database that uses another database as the storage engine
          • A database that is all in memory and doesn't deal with persisted data
  • 39. Separating Data Concerns: many aspects are not solved by databases
  • 40. Real-World, Real-Time: an example scenario of what becomes possible when matching the shape of your data with the technology
  • 41. Real-World, Real-Time
      • Business: Buying and selling of online advertising
      • Accepted Reality: Maximum of 1 hr to update bids
      • Original Technical: 3 TB SQL RDBMS relying on distributed, federated and highly indexed views to come close to 1 hr
      • Challenge: Taking more than 1 hr to update bids
  • 42. Real-World, Real-Time
      • Solution: Identified data structure as highly connected and deep
      • New Reality: Search and Intelligent Bid Optimization
      • Solution Technical: 3 TB Neo4j (10% of hardware), Elasticsearch integrated on GraphGrid, writing over 2B nodes/edges per day
      • Result: Taking less than 300 ms to update bids
  • 43. Real-World, Real-Time: an example scenario of what becomes possible when matching the shape of your data with the technology
  • 44. Real-World, Real-Time
      • Business: Selling complex content packages
      • Accepted Reality: Between 4 and 6 hrs for a sales rep to get an answer
      • Original Technical: Generating 1B-row hash tables (Oracle RDBMS) with only 1 or 2 SMEs able to modify the stored procedure
      • Challenge: Takes 4-6 hrs to know if a content package can be sold
  • 45. Real-World, Real-Time
      • Solution: Identified data structure as highly connected and living
      • New Reality: Search and intelligent content package negotiator
      • Solution Technical: Neo4j, Elasticsearch integrated on GraphGrid, interactive package optimizer and recommender
      • Result: Sub-second determination of a non-conflicting package across the entire sales organization, plus an advisory recommender system suggesting content to include or exclude throughout the deal
  • 46. Real-World, Real-Time: an example scenario of what becomes possible when matching the shape of your data with the technology
  • 47. Real-World, Real-Time
      • Business: Highly regulated global financial institution
      • Accepted Reality: Complex data lineages will never finish
      • Original Technical: Oracle SQL RDBMS
      • Challenge: Queries for complex lineages never finish
  • 48. Real-World, Real-Time
      • Solution: Identified data structure as highly connected and deep
      • New Reality: Complex lineages finish in under 1 minute
      • Solution Technical: Neo4j
      • Result: Even the most complex lineages finish in under 1 minute
  • 49. Thank You! Questions? Getting to Real-Time in a Multi-Model Architecture by Benjamin Nussbaum, CTO, AtomRain Inc. (creators of the GraphGrid Connected Data Platform). ben@atomrain.com | @bennussbaum | atomrain.com | graphgrid.com | @atomrain | @graphgrid