Graph databases are seeing a spike in popularity as their value in leveraging large data sets for key areas such as fraud detection, marketing, and network optimization becomes increasingly apparent. With graph databases, it’s been said that ‘the data model and the metadata are the database’. What does this mean in a practical application, and how can this technology be optimized for maximum business value?
2. UNCONNECTED DATA IS A LIABILITY
3. ENTERPRISES NEED FLEXIBLE, REUSABLE DATA ON DEMAND, WITH LESS DISRUPTION AND OVERHEAD
4. KNOWLEDGE GRAPH IS THE ANSWER
FLEXIBLE
REUSABLE
ACCRETIVE
5. KNOWLEDGE GRAPH = KNOWLEDGE TOOLKIT + GRAPH DB
6. WHAT'S A KNOWLEDGE TOOLKIT?
VIRTUAL GRAPHS BUILD KNOWLEDGE ACROSS SILOS
BUSINESS LOGIC BUILDS REUSABLE, LOGICAL REASONING INTO THE GRAPH
MACHINE LEARNING INTEGRATES STATISTICAL REASONING
INTEGRITY CONSTRAINT VALIDATION EMPOWERS DATA STANDARDS
7. KNOWLEDGE = DATA PLUS REASONING
FACT COUNT: 4 EXPLICIT FACTS
[Figure: graph of the films Inferno and Rogue One linked to Tom Hanks, Felicity Jones, and Gareth Edwards by three “actor” edges and one “director” edge]
8. KNOWLEDGE = DATA PLUS REASONING
actorOf inverseOf actor
directorOf inverseOf director
actorOf subPropertyOf workedOn
directorOf subPropertyOf workedOn
coworker propertyChain (workedOn [inverseOf workedOn])
coworker subPropertyOf connectedTo
connectedTo a TransitiveProperty
[Figure: the same graph of Inferno, Rogue One, Tom Hanks, Felicity Jones, and Gareth Edwards, now with inferred edges labeled actorOf, directorOf, workedOn, coworker, and connectedTo added to the original actor and director edges]
FACT COUNT: 15 EXPLICIT/IMPLICIT FACTS
BUSINESS LOGIC THAT BETTER EXPLAINS THE DOMAIN
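The rules on this slide can be sketched as a small forward-chaining loop. This is a minimal illustration in plain Python, not how Stardog or any other reasoner is implemented. The assignment of edges to films follows the actual film credits, because the figure layout does not survive in this text; the helper name `infer` is hypothetical. The sketch skips self-links (a person as their own coworker), so its fact count is not expected to match the slide's figure of 15.

```python
# Explicit facts as (subject, predicate, object) triples.
# ASSUMPTION: which film each edge attaches to is taken from the real film
# credits, since the slide's figure layout is not recoverable here.
explicit = {
    ("Inferno", "actor", "Tom Hanks"),
    ("Inferno", "actor", "Felicity Jones"),
    ("Rogue One", "actor", "Felicity Jones"),
    ("Rogue One", "director", "Gareth Edwards"),
}

# Rules from the slide, encoded as lookup tables.
INVERSE_OF = {"actor": "actorOf", "director": "directorOf"}
SUB_PROPERTY_OF = {
    "actorOf": "workedOn",
    "directorOf": "workedOn",
    "coworker": "connectedTo",
}
TRANSITIVE = {"connectedTo"}


def infer(facts):
    """Apply the rules repeatedly until no new triple appears (a fixpoint)."""
    facts = set(facts)
    while True:
        new = set()
        for s, p, o in facts:
            if p in INVERSE_OF:                 # inverseOf
                new.add((o, INVERSE_OF[p], s))
            if p in SUB_PROPERTY_OF:            # subPropertyOf
                new.add((s, SUB_PROPERTY_OF[p], o))
        # coworker = workedOn followed by the inverse of workedOn.
        # Simplification: self-links (a == b) are skipped.
        worked = [(s, o) for s, p, o in facts if p == "workedOn"]
        for a, film_a in worked:
            for b, film_b in worked:
                if film_a == film_b and a != b:
                    new.add((a, "coworker", b))
        # One transitive step per pass; repeated passes give the closure.
        for prop in TRANSITIVE:
            pairs = [(s, o) for s, p, o in facts if p == prop]
            for a, b in pairs:
                for c, d in pairs:
                    if b == c and a != d:
                        new.add((a, prop, d))
        if new <= facts:
            return facts
        facts |= new


inferred = infer(explicit)
print(("Tom Hanks", "connectedTo", "Gareth Edwards") in inferred)
```

Tom Hanks and Gareth Edwards never worked on the same film, so they are not coworkers, but the transitive connectedTo rule still links them through Felicity Jones. That implicit fact is what the reasoning adds on top of the stored data.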
9. KNOWLEDGE GRAPHS CONNECT ALL DATA
CONNECTING ALL DATA CHANGES EVERYTHING
10. THANK YOU
A.J. COOK, NORTH AMERICAN SALES
AJ@STARDOG.COM
11. Data Modeling & Metadata for Graph Databases
Donna Burbank
Global Data Strategy Ltd.
Lessons in Data Modeling DATAVERSITY Series
July 27th, 2017
12. Global Data Strategy, Ltd. 2017
Donna Burbank
Donna is a recognised industry expert in information management with over 20 years of experience in data strategy, information management, data modeling, metadata management, and enterprise architecture. Her background is multi-faceted across consulting, product development, product management, brand strategy, marketing, and business leadership.
She is currently the Managing Director at Global Data Strategy, Ltd., an international information management consulting company that specializes in the alignment of business drivers with data-centric technology. In past roles, she has served in key brand strategy and product management roles at CA Technologies and Embarcadero Technologies for several of the leading data management products in the market.
As an active contributor to the data management community, she is a longtime DAMA International member, Past President and Advisor to the DAMA Rocky Mountain chapter, and was awarded the Excellence in Data Management Award from DAMA International in 2016. She was on the review committee for the Object Management Group’s (OMG) Information Management Metamodel (IMM) and the Business Process Modeling Notation (BPMN). Donna is also an analyst at the Boulder BI Brain Trust (BBBT), where she provides advice and gains insight on the latest BI and Analytics software in the market.
She has worked with dozens of Fortune 500 companies worldwide in the Americas, Europe, Asia, and Africa and speaks regularly at industry conferences. She has co-authored two books, Data Modeling for the Business and Data Modeling Made Simple with ERwin Data Modeler, and is a regular contributor to industry publications. She can be reached at donna.burbank@globaldatastrategy.com
Donna is based in Boulder, Colorado, USA.
Follow on Twitter @donnaburbank
Today’s hashtag: #LessonsDM
13. Lessons in Data Modeling Series
This Year’s Line Up
• January 26th How Data Modeling Fits Into an Overall Enterprise Architecture
• February 23rd Data Modeling and Business Intelligence
• March Conceptual Data Modeling – How to Get the Attention of Business Users
• April The Evolving Role of the Data Architect – What does it mean for your Career?
• May Data Modeling & Metadata Management
• June Self-Service Data Analysis, Data Wrangling, Data Munging, and Data Modeling
• July Data Modeling & Metadata for Graph Databases
• August Data Modeling & Data Integration
• September Data Modeling & MDM
• October Agile & Data Modeling – How Can They Work Together?
• December Data Modeling, Data Quality & Data Governance
14. Word from our Sponsor
Stardog Enterprise Knowledge Graph
www.stardog.com
15. Agenda
What we’ll cover today
• What is a Graph Database?
• Use Cases for Graph Databases
• Data Modeling & Metadata for Graph Databases
16. What is a Graph Database?
• A graph database uses a set of nodes, edges, and properties to represent and store data.
• With graph databases, the relationships between data points often matter more than the individual points themselves. In order to leverage those data relationships, your organization needs a database technology that stores those relationships.
• These relationships can help you discover new insights from your data.
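The three building blocks named above (nodes, edges, properties) can be sketched with a few plain Python classes. This is only an in-memory illustration under assumed names (`Node`, `Edge`, `Graph`, `neighbors` are all hypothetical), not the storage model of any particular graph database.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    label: str
    target: str
    properties: dict = field(default_factory=dict)


class Graph:
    """A toy graph: nodes keyed by id, edges kept as a flat list."""

    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)

    def add_edge(self, source, label, target, **properties):
        self.edges.append(Edge(source, label, target, properties))

    def neighbors(self, node_id, label=None):
        """Follow outgoing edges from node_id, optionally filtered by label."""
        return [
            e.target
            for e in self.edges
            if e.source == node_id and (label is None or e.label == label)
        ]


# Illustrative data only.
g = Graph()
g.add_node("mary", name="Mary")
g.add_node("john", name="John")
g.add_edge("mary", "SIBLING_OF", "john")
print(g.neighbors("mary"))
```

The edge is a stored object in its own right, with its own label and properties, rather than something reconstructed at query time.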
18. Graph Database = Thing Relates to Thing
Node / Vertex
Edge / Relationship
The more formal way of referring to “thing relates to thing” is “Nodes & Edges”, “Vertices & Relationships”, etc.
19. Graph Databases Mirror the Way We Think
Graph databases can be intuitive to many, since they mirror the way the human brain typically thinks – through Association.
I should go visit Mary
I wonder how her brother John is doing?
Is he still dating Stephanie?
Do they still have that house at the Lake?
Riding their boats on the lake was great.
Remember when John crashed the boat?
Like my toy as a child.
Squirrel!
…In the mind, as in data, there are always random data points…
20. “Traditional” way of Looking at the World: Hierarchies
• Carolus Linnaeus in 1735 established a hierarchy/taxonomy for organizing and identifying biological systems.
Kingdom
Phylum
Class
Order
Family
Genus
Species
21. “New” Way of Looking at the World - Emergence
“In philosophy, systems theory, science, and art, emergence is the way complex systems and patterns arise out of a multiplicity of relatively simple interactions.” - Wikipedia
22. Graph Databases Combine Flexibility w/ Structure & Meaning
• In many ways, graph databases provide the “best of both worlds”.
Flexibility of the “New World” of Discovery & “Emergence” + Structure & Meaning of the “Old World” through Ontologies
23. It’s All About Relationships
• In graph databases, relationships are first class constructs.
• Rather ironically, relational databases lack relationships.
• In relational databases, relationships are enforced through joins and constraints.
• NoSQL (e.g. Key Value) databases are also weak at supporting relationships.
“A relational database isn’t about relationships, it’s about constraints.” – Karen Lopez
[Figure: a Customer node linked to an Account node by an “Is Owner Of” edge]
<Customer> <Owner Of> <Account>
25. Social Networks
Who are the cool kids? i.e. People linked with Donna
[Figure: social network graph with a node labeled “Donna” and a node labeled “Sad, Lonely Person who doesn’t like data”]
26. X Degrees of Separation – “The Bacon Number”
• What’s Audrey Hepburn’s “Bacon Number”, i.e. her degrees of separation from actor Kevin Bacon?
• As always, metadata and data quality are important, i.e. which Audrey Hepburn?
Courtesy of oracleofbacon.org
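A “Bacon Number” is a shortest-path length in a graph where actors are linked when they appear in the same film. A minimal sketch using breadth-first search follows. The cast data is entirely hypothetical (placeholder names, not real filmography), and the function names are assumptions for illustration; this is not how oracleofbacon.org is implemented.

```python
from collections import deque

# Hypothetical co-appearance data: film -> cast. None of this is real filmography.
CASTS = {
    "Film 1": ["Actor A", "Actor B"],
    "Film 2": ["Actor B", "Actor C"],
    "Film 3": ["Actor C", "Target Actor"],
    "Film 4": ["Actor D"],
}


def build_costar_graph(casts):
    """Link every pair of actors who share a film."""
    graph = {}
    for cast in casts.values():
        for actor in cast:
            graph.setdefault(actor, set()).update(a for a in cast if a != actor)
    return graph


def degrees_of_separation(graph, start, target):
    """Breadth-first search; returns the hop count, or None if unreachable."""
    if start not in graph or target not in graph:
        return None
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        actor, dist = queue.popleft()
        if actor == target:
            return dist
        for costar in graph[actor]:
            if costar not in seen:
                seen.add(costar)
                queue.append((costar, dist + 1))
    return None


costars = build_costar_graph(CASTS)
print(degrees_of_separation(costars, "Actor A", "Target Actor"))  # 3
```

The data quality point on the slide shows up directly here: the search keys on the actor's name, so two different people with the same name would be merged into one node unless each has a unique identifier.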
27. Fraud Detection in Online Transactions
• Online transactions typically have certain identifiers, e.g. User ID, IP address, geo location, tracking cookie, credit card number, etc.
• Graph patterns can help detect fraud, e.g.
• The more interconnections exist among identifiers, the greater the cause for concern.
• Typically they would be 1:1.
• Some variations may occur, e.g. multiple credit cards for one person, families using the same machine, etc.
• Large and tightly-knit graphs are very strong indicators that fraud is taking place.
• Triggers can be put into place so that these patterns are uncovered before they cause damage.
[Figure: three example graphs linking IP addresses to credit cards (CC1 through CC17), labeled “Fraud?”, “Family”, and “Personal & Business Card”]
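The pattern described above (many identifiers knitted together instead of the usual 1:1) can be sketched by grouping linked identifiers into connected components and flagging the large ones. The transactions below are invented, and the threshold `max_identifiers=4` is an arbitrary assumption for illustration, not a recommended value.

```python
from collections import defaultdict

# Hypothetical transactions: (ip_address, credit_card). Invented data.
TRANSACTIONS = [
    ("IP1", "CC1"), ("IP1", "CC2"), ("IP1", "CC3"),
    ("IP1", "CC4"), ("IP1", "CC5"), ("IP1", "CC6"),
    ("IP2", "CC7"), ("IP2", "CC8"),   # e.g. a family sharing one machine
    ("IP3", "CC9"),                   # the typical 1:1 case
]


def connected_components(pairs):
    """Group identifiers that are linked, directly or indirectly."""
    adjacency = defaultdict(set)
    for a, b in pairs:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


def suspicious(pairs, max_identifiers=4):
    """Flag components with more identifiers than the (assumed) threshold."""
    return [c for c in connected_components(pairs) if len(c) > max_identifiers]


flagged = suspicious(TRANSACTIONS)
print(flagged)
```

Only the IP1 cluster exceeds the threshold; the family and 1:1 cases pass. A real trigger would weigh more identifier types (user ID, cookie, geo location) and tune the threshold against known cases.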
28. Recommendation Engines
• Recommendation Engines are familiar to most of us who do any online shopping.
• These engines can be powered by a graph database, e.g.
• Capture a customer’s browsing behavior and demographics
• Combine those with their buying history to provide relevant recommendations
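A very small version of this idea treats customers and products as nodes, purchases as edges, and recommends what “nearby” customers bought. The sketch below uses buying history only; browsing behavior and demographics are left out for brevity. All data and the `recommend` helper are hypothetical.

```python
from collections import Counter

# Hypothetical purchase history: customer -> set of products. Invented data.
PURCHASES = {
    "alice": {"tent", "sleeping bag", "stove"},
    "bob": {"tent", "sleeping bag", "lantern"},
    "carol": {"tent", "lantern"},
    "dave": {"novel"},
}


def recommend(customer, purchases, top_n=2):
    """Suggest products bought by customers who share a purchase with `customer`."""
    mine = purchases[customer]
    scores = Counter()
    for other, theirs in purchases.items():
        if other == customer:
            continue
        overlap = len(mine & theirs)   # shared purchases = strength of the link
        if overlap == 0:
            continue
        for product in theirs - mine:
            scores[product] += overlap
    return [product for product, _ in scores.most_common(top_n)]


print(recommend("carol", PURCHASES))  # ['sleeping bag', 'stove']
```

Dave shares no purchases with anyone, so he gets no recommendations at all, which illustrates how small or sparse data sets limit what such an engine can say.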
29. Data Quality & Volume Matter
• Recommendation engines are based on evaluating data sets. If those data sets are faulty or of poor quality, your results will be flawed.
• This is especially true if the data sets are small.
30. Master Data Management (MDM)
• Master Data Management (MDM) is the practice of identifying, cleansing, storing & governing the core data assets of the organization (e.g. customer, product, etc.)
• There are many architectural approaches to MDM. Two are the following:
Centralized (MDM hub) – Commonly Relational
• Core data stored in a common schema in a centralized “hub”.
• Used as a common reference for operational systems, DW, etc.
Virtualized/Registry (Virtualization Layer) – Commonly Graph
• Data remains in source systems.
• Referenced through a common virtualization layer.
BOTH require the same core foundation of data quality, parsing & matching, semantic meaning, data governance, etc. in order to be successful… and that’s usually the hardest stuff.
31. Graph Databases for Data Warehousing
When you have a Hammer, everything looks like a nail
i.e. Data Warehouses serve a particular purpose: aggregating & summarizing data. That workload is not ideal for graph databases.
32. Data Warehousing & Enterprise Knowledge Graph
Data Warehouse: Relational & Dimensional data model
…Show me Total Sales by Region and by Customer each month in 2017
Enterprise Knowledge Graph: Graph data model
…Who are my most influential customers (those with the most connections)?
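The graph-style question, “who has the most connections”, is a degree count over the edges. A minimal sketch, assuming an invented list of customer-to-customer links and a hypothetical helper `most_connected`:

```python
from collections import Counter

# Hypothetical links between customers (e.g. referrals). Invented data.
CONNECTIONS = [
    ("c1", "c2"), ("c1", "c3"), ("c1", "c4"),
    ("c2", "c3"),
    ("c5", "c4"),
]


def most_connected(edges, top_n=1):
    """Rank customers by degree (number of connections)."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree.most_common(top_n)


print(most_connected(CONNECTIONS))  # [('c1', 3)]
```

Degree is the simplest notion of influence; it counts direct links only and says nothing about how well connected those neighbors are in turn.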
33. Data Management & Ballroom Dancing
“First you dance with yourself, then with your partner, then you dance with the room.”
34. An Enterprise Knowledge Graph Provides a Holistic View of the Organization through Relationships
“First you dance with yourself, then with your partner, then you dance with the room.”
Data Quality & Semantics are important for core enterprise data assets.
Customer Data: Name: Audrey Hepburn; DOB: May 4, 1929; Current Customer: No
But the true value is in the interrelationships between data assets.
Mother of: Name: Luca Dotti; DOB: February 8, 1970; Current Customer: Yes
Other relationships shown in the figure: Purchased Yacht Insurance, Purchased Home Insurance, Filed a Claim
36. Data Modeling for Graph Databases
• There are several dominant ways to model graph databases. Two popular ones include:
• Resource Description Framework (RDF) Triples
• Labeled Property Graph
Labeled Property Graph
• Made up of nodes, relationships, properties & labels
• Sample Query language: Cypher
• Sample Vendor: Neo4j
Resource Description Framework (RDF) Triples
• Made up of subject, predicate, object triples
• Sample Query language: SPARQL
• Sample Vendor: Stardog
• Both have a close affinity between logical & physical models
• i.e. We already think in “thing relates to thing”
• In the following slides, we’ll use the RDF example, since that is a W3C Open Standard.
37. Graph Query Languages
• Unlike relational databases, where SQL is a general standard, there are a number of query language options available for graph databases:
• SPARQL: a SQL-like declarative query language created by the W3C to query RDF (Resource Description Framework) graphs.
• Cypher: also a declarative query language that resembles SQL. Created by Neo4j.
• GraphQL: a query language for APIs. It isn’t specific to graph databases, but can be used for them. Developed by Facebook.
• Gremlin: a graph traversal language developed for Apache TinkerPop™, an open source, vendor-agnostic graph computing framework distributed under the Apache 2.0 license.
Again, we’ll use SPARQL in our examples since it’s a W3C standard.
38. Resource Description Framework (RDF)
• The RDF (Resource Description Framework) model from the World Wide Web Consortium (W3C) provides a way to link resources on the web (people, places, things) using the concept of “triples”.
• This linking structure forms a directed, labeled graph, where the edges represent the named link between two resources, represented by the graph nodes.
[Figure: RDF Triples, with a Subject node linked to an Object node by a Predicate edge]
39. RDF Triple Example
[Figure: Cynthia (Subject) linked to Fido (Object) by the edge “Is Owner Of” (Predicate)]
<Cynthia> <Owner Of> <Fido>
• Reference: Brackets indicate individual references in RDF. Note that these are defined by URIs in RDF, but have been simplified for this example.
40. RDF Triples
<Cynthia> <type> <Person> .
<Fido> <type> <Dog> .
<Cynthia> <hasName> “Cynthia Smith” .
<Fido> <hasName> “Fido” .
<Cynthia> <ownerOf> <Fido> .
Callouts: <Cynthia> and <Fido> are Instances, <Person> and <Dog> are Classes, and “Cynthia Smith” and “Fido” are Literals.
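Because every statement has the same three-part shape, a triple store can be approximated by a list of tuples plus a pattern-matching lookup. The sketch below is a plain-Python illustration of that idea with a hypothetical `match` helper; it is not an RDF library, and plain strings stand in for URIs just as the slide's simplified notation does.

```python
# The five triples from the slide, as (subject, predicate, object) tuples.
# Simplification: plain strings stand in for URIs, so the instance Fido and
# the literal "Fido" look identical here. Real RDF keeps them distinct.
TRIPLES = [
    ("Cynthia", "type", "Person"),
    ("Fido", "type", "Dog"),
    ("Cynthia", "hasName", "Cynthia Smith"),
    ("Fido", "hasName", "Fido"),
    ("Cynthia", "ownerOf", "Fido"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for s, p, o in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


print(match(TRIPLES, subject="Cynthia"))     # everything known about Cynthia
print(match(TRIPLES, predicate="ownerOf"))   # all ownership links
```

No schema has to be declared before adding a new kind of statement: a new predicate is just another tuple in the list.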
41. RDF Triple Graphical Representation
• RDF triples can be intuitively visualized graphically
[Figure: the triples drawn as a graph. <Cynthia> has a <type> edge to <Person>, a <hasName> edge to “Cynthia Smith”, and an <ownerOf> edge to <Fido>. <Fido> has a <type> edge to <Dog> and a <hasName> edge to “Fido”.]
42. Logical Groupings
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix example: <http://example.org/example#> .
example:Cynthia rdf:type example:Person ;
    example:hasName "Cynthia Smith" ;
    example:ownerOf example:Fido .
example:Fido rdf:type example:Dog ;
    example:hasName "Fido" .
• A Person has a name
• A Person can be an owner
• A Dog has a name
43. Ontologies
• An ontology is a data model of sorts to describe the “things” in RDF data.
• Two types of languages include:
• OWL (W3C Web Ontology Language): a Semantic Web language designed to represent rich and complex knowledge about things, groups of things, and relations between things.
• RDFS (RDF Schema): a general-purpose language for representing simple RDF vocabularies. It is considered a precursor to OWL.
• For example:
RDFS
• People have Names
• People can own kinds of things
• Pets can be owned
• A dog is a pet
• Dogs can have names
OWL can be more Expressive
• A Mother is the intersection of (Parent, Woman)
• This Family ontology links with the Person ontology (meta-meta-metadata)
• Etc.
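The RDFS-style statement “A dog is a pet” is a subclass relationship, and its practical effect is that anything typed as a Dog can also be treated as a Pet. A minimal sketch of that inference follows, assuming a single-parent class hierarchy for brevity (RDFS itself allows multiple superclasses). The dictionaries and the `all_types` helper are illustrative, not part of any RDFS library.

```python
# Assumed encoding of the slide's statements; names are illustrative.
SUBCLASS_OF = {"Dog": "Pet"}                      # "A dog is a pet"
TYPES = {"Fido": "Dog", "Cynthia": "Person"}      # asserted instance types


def all_types(instance, types, subclass_of):
    """Return the asserted class plus every superclass reachable from it."""
    result = []
    cls = types.get(instance)
    while cls is not None and cls not in result:  # stop on cycles
        result.append(cls)
        cls = subclass_of.get(cls)
    return result


print(all_types("Fido", TYPES, SUBCLASS_OF))  # ['Dog', 'Pet']
```

With this in place, a query for “all pets that can be owned” finds Fido even though no one ever stored the fact that Fido is a Pet.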
44. Ontologies help Define Queries
Ontology
• People have Names
• People can own kinds of things
• Pets can be owned
• A dog is a pet
• Dogs can have names
Query
• Show me all of the People who Own Dogs
45. Putting Ontologies & Queries Together
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX example: <http://example.org/example#>
SELECT ?name
WHERE {
  ?person rdf:type example:Person ;
          example:hasName ?name ;
          example:ownerOf ?pet .
  ?pet rdf:type example:Dog .
}
-> RESULT "Cynthia Smith"
• Define Variables (?person, ?name, ?pet)
• Write out the Graph using Variables (the triple patterns inside WHERE)
• Query across the Graph
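What the query engine does with such a pattern can be sketched in plain Python: try each triple pattern against the data, carry variable bindings forward, and keep only the bindings that satisfy every pattern. This is a naive illustration with hypothetical helpers `unify` and `select`, using simplified names in place of URIs; it is not how a real SPARQL engine is implemented or optimized.

```python
TRIPLES = [
    ("Cynthia", "type", "Person"),
    ("Fido", "type", "Dog"),
    ("Cynthia", "hasName", "Cynthia Smith"),
    ("Fido", "hasName", "Fido"),
    ("Cynthia", "ownerOf", "Fido"),
]

# The WHERE clause as triple patterns; strings starting with "?" are variables.
PATTERNS = [
    ("?person", "type", "Person"),
    ("?person", "hasName", "?name"),
    ("?person", "ownerOf", "?pet"),
    ("?pet", "type", "Dog"),
]


def unify(pattern, triple, binding):
    """Extend `binding` so that pattern equals triple, or return None."""
    binding = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if binding.setdefault(term, value) != value:
                return None          # variable already bound to something else
        elif term != value:
            return None              # constant does not match
    return binding


def select(variable, patterns, triples):
    """Evaluate the patterns left to right, joining on shared variables."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = unify(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return [b[variable] for b in bindings]


print(select("?name", PATTERNS, TRIPLES))  # ['Cynthia Smith']
```

The shared variables do the work that joins do in SQL: `?person` ties the first three patterns together and `?pet` ties the owner to the dog.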
46. Summary
• Graph Databases provide powerful enterprise-wide association using simple constructs
• “Thing Relates to Thing”
• Relationships are first class constructs
• The best-suited enterprise use cases are those that focus on interrelationships between data points
• Social Networks
• Fraud Detection
• Recommendation Engines
• Enterprise Knowledge Graph
• Data Modeling & Metadata are supported by simple constructs
• Data structures through Triples: Subject, Predicate, Object
• Semantics through Ontologies (e.g. OWL)
• Queries through SPARQL and other methods
47. About Global Data Strategy, Ltd
• Global Data Strategy is an international information management consulting company that specializes in the alignment of business drivers with data-centric technology.
• Our passion is data, and helping organizations enrich their business opportunities through data and information.
• Our core values center around providing solutions that are:
• Business-Driven: We put the needs of your business first, before we look at any technology solution.
• Clear & Relevant: We provide clear explanations using real-world examples.
• Customized & Right-Sized: Our implementations are based on the unique needs of your organization’s size, corporate culture, and geography.
• High Quality & Technically Precise: We pride ourselves in excellence of execution, with years of technical expertise in the industry.
Data-Driven Business Transformation: Business Strategy Aligned With Data Strategy
Visit www.globaldatastrategy.com for more information
48. Contact Info
• Email: donna.burbank@globaldatastrategy.com
• Twitter: @donnaburbank, @GlobalDataStrat
• Website: www.globaldatastrategy.com