Managing Complex, Heterogeneous Data with Knowledge Graphs

Knowledge Bases as a Data Platform: How to Effectively Manage Complex, Heterogeneous Data
Data Architecture Summit 2017
Benjamin Nussbaum – CTO, AtomRain Inc.
(Creators of GraphGrid Connected Data Platform)
ben@atomrain.com | @bennussbaum
atomrain.com | graphgrid.com

#DASummit
Complex Data Use Cases
Representing data as a knowledge graph
11/21/17

What We Want From Data
Intuitive
Speed
Agility
Meaning
Intelligence

How is Data a Graph?
Representing data as a knowledge graph

User Personalization

Master Data Management

Fraud Detection

Graph Based Search

Network & IT Operations

Identity & Access Management

Day in the Life of an RDBMS Developer

SELECT
  p.name,
  c.country, c.leader, p.hair,
  u.name, u.pres, u.state
FROM
  people p
  LEFT JOIN country c ON c.id = p.country
  LEFT JOIN uni u ON p.uni = u.id
WHERE
  u.state = 'CT'

SQL Pains
• Relationships are complex to model and store
• Query performance degrades as the data grows
• Queries get long and complex
• Maintenance is painful

Graph Gains
• Easy to model and store relationships
• Relationship traversal performance stays constant as the data grows
• Queries are shorter and more readable
• Properties and relationships can be added on the fly, with no migrations

What does this knowledge graph look like?

(Dan)-[:LOVES]->(Ann)

Property Graph Model
CREATE (:Person {name: "Dan"})-[:LOVES]->(:Person {name: "Ann"})
Each of the two nodes is annotated NODE, LABEL, PROPERTY; LOVES is the relationship between them.
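As a rough illustration of the property graph model, the sketch below mirrors the Dan and Ann example in plain Python. The `Node` and `Relationship` classes and the `describe` helper are hypothetical names made up for this example; they are not part of Neo4j or any graph library.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                       # e.g. "Person"
    properties: dict = field(default_factory=dict)   # e.g. {"name": "Dan"}


@dataclass
class Relationship:
    start: Node
    type: str                                        # e.g. "LOVES"
    end: Node
    properties: dict = field(default_factory=dict)   # relationships can carry properties too


# Rough equivalent of the CREATE statement on the slide.
dan = Node("Person", {"name": "Dan"})
ann = Node("Person", {"name": "Ann"})
loves = Relationship(dan, "LOVES", ann)


def describe(rel: Relationship) -> str:
    """Render a relationship in a Cypher-like arrow notation (illustrative only)."""
    return f'({rel.start.properties["name"]})-[:{rel.type}]->({rel.end.properties["name"]})'


print(describe(loves))
```

The point of the sketch is only that labels, properties, and typed, directed relationships are all first-class parts of the model.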

MATCH
  (p:Person)-[:WENT_TO]->(u:Uni),
  (p)-[:LIVES_IN]->(c:Country),
  (u)-[:LED_BY]->(l:Leader),
  (u)-[:LOCATED_IN]->(s:State)
WHERE
  s.abbr = 'CT'
RETURN
  p.name,
  c.country, c.leader, p.hair,
  u.name, l.name, s.abbr

How do you get to a knowledge graph?
CREATE MODEL + LOAD DATA, then QUERY DATA

Architecture Example

RDBMS to Graph Options

From RDBMS to Graphs

( )-[:TO]->(Graph)
Northwind – the canonical RDBMS Example

( )-[:IS_BETTER_AS]->(Graph)

Starting with the ER Diagram

Locate the Foreign Keys

Drop the Foreign Keys

Find the JOIN Tables

(Simple) JOIN Tables Become Relationships

Attributed JOIN Tables -> Relationships with Properties
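The JOIN-table steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the converter used in the talk: the sample rows, the `IN_TERRITORY` relationship type, and the `join_table_to_relationships` helper are all hypothetical. Only `INCLUDES` appears elsewhere in the deck.

```python
# Simple JOIN table: nothing but two foreign keys (hypothetical sample row).
employee_territories = [{"EmployeeID": 1, "TerritoryID": 10}]

# Attributed JOIN table: two foreign keys plus extra columns (hypothetical sample row).
order_details = [{"OrderID": 100, "ProductID": 7, "UnitPrice": 18.0, "Quantity": 2}]


def join_table_to_relationships(rows, start_key, end_key, rel_type):
    """Turn each JOIN-table row into a (start_id, type, end_id, properties) tuple."""
    relationships = []
    for row in rows:
        # Columns that are not foreign keys become relationship properties.
        props = {k: v for k, v in row.items() if k not in (start_key, end_key)}
        relationships.append((row[start_key], rel_type, row[end_key], props))
    return relationships


in_territory = join_table_to_relationships(
    employee_territories, "EmployeeID", "TerritoryID", "IN_TERRITORY")
includes = join_table_to_relationships(
    order_details, "OrderID", "ProductID", "INCLUDES")

print(in_territory)
print(includes)
```

A simple JOIN table yields relationships with empty property maps, while an attributed one carries its extra columns along as relationship properties.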

Converting a Subset of the Tables

As a Graph

Querying the Knowledge Graph
Using openCypher: http://www.opencypher.org/
- The SQL for graph databases
- Implementers include Apache Spark, SAP HANA, etc.

Property Graph Model

Who do people report to?
MATCH
  (e:Employee)<-[:REPORTS_TO]-(sub:Employee)
RETURN
  *

MATCH
  (e:Employee)<-[:REPORTS_TO]-(sub:Employee)
RETURN
  e.employeeID AS managerID,
  e.firstName AS managerName,
  sub.employeeID AS employeeID,
  sub.firstName AS employeeName;

Who does Robert report to?
MATCH
  p = (e:Employee)<-[:REPORTS_TO]-(sub:Employee)
WHERE
  sub.firstName = 'Robert'
RETURN
  p

What is Robert’s reporting chain?
MATCH
  p = (e:Employee)<-[:REPORTS_TO*]-(sub:Employee)
WHERE
  sub.firstName = 'Robert'
RETURN
  p
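To make the variable-length `REPORTS_TO*` pattern more concrete, here is a small Python sketch that follows manager links hop by hop. The sample data and the `reporting_chain` function are assumptions made for illustration. Unlike the Cypher query, which returns every path of length one or more, this sketch returns only the single longest chain.

```python
# Hypothetical REPORTS_TO relationships, stored as employee -> manager.
reports_to = {
    "Robert": "Steven",
    "Steven": "Andrew",
    "Nancy": "Andrew",
}


def reporting_chain(employee, reports_to):
    """Follow REPORTS_TO hop by hop until reaching someone with no manager."""
    chain = [employee]
    seen = {employee}
    while chain[-1] in reports_to:
        manager = reports_to[chain[-1]]
        if manager in seen:  # guard against cyclic data
            break
        chain.append(manager)
        seen.add(manager)
    return chain


print(reporting_chain("Robert", reports_to))
```

Each step is a single lookup from the current person to their manager, so the work grows with the depth of the chain rather than with the total number of employees.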

Who’s the big boss?
MATCH
  (e:Employee)
WHERE
  NOT (e)-[:REPORTS_TO]->()
RETURN
  e.firstName AS bigBoss

Product Cross-Selling
MATCH
  (choc:Product {productName: 'Chocolade'})
    <-[:INCLUDES]-(:Order)<-[:SOLD]-(employee),
  (employee)-[:SOLD]->(o2)-[:INCLUDES]->(other:Product)
RETURN
  employee.firstName,
  other.productName,
  COUNT(DISTINCT o2) AS count
ORDER BY
  count DESC
LIMIT 5;
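The cross-selling query can be read as: find the employees who sold an order containing the target product, then count their other orders per product. The Python sketch below follows that reading over hypothetical in-memory data. The `cross_sell` function and the sample orders are assumptions, and the alphabetical tie-break is added only to make the output deterministic (Cypher leaves the order of ties undefined).

```python
from collections import defaultdict

# Hypothetical data: who sold each order, and which products each order includes.
sold_by = {"o1": "Nancy", "o2": "Nancy", "o3": "Janet"}
includes = {
    "o1": {"Chocolade", "Chai"},
    "o2": {"Chai", "Tofu"},
    "o3": {"Tofu"},
}


def cross_sell(target, sold_by, includes, limit=5):
    """Rank (employee, product) pairs by distinct other orders from sellers of `target`."""
    orders_of = defaultdict(set)
    for order, employee in sold_by.items():
        orders_of[employee].add(order)

    distinct_orders = defaultdict(set)  # (employee, product) -> orders containing it
    for first, items in includes.items():
        if target not in items:
            continue
        employee = sold_by[first]
        # Other orders sold by the same employee (the o2 of the query).
        for o2 in orders_of[employee] - {first}:
            for product in includes.get(o2, ()):
                distinct_orders[(employee, product)].add(o2)

    ranked = sorted(distinct_orders.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(emp, prod, len(orders)) for (emp, prod), orders in ranked[:limit]]


print(cross_sell("Chocolade", sold_by, includes))
```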

(Aside on Graph Compute)
MATCH
  p = shortestPath(
    (a:Airport {code: "SFO"})-[*0..2]->
    (b:Airport {code: "MSO"}))
RETURN
  p
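For readers unfamiliar with what a bounded shortest-path search does, the following is a minimal breadth-first sketch in Python. It is not how Neo4j implements `shortestPath`. The flight data and the `shortest_path` function are hypothetical, and `max_hops=2` stands in for the `*0..2` bound in the query.

```python
from collections import deque

# Hypothetical directed flights between airports.
flights = {
    "SFO": ["SEA", "DEN"],
    "SEA": ["MSO"],
    "DEN": ["SLC"],
    "SLC": ["MSO"],
}


def shortest_path(graph, start, goal, max_hops=2):
    """Breadth-first search that gives up beyond `max_hops` relationships."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        if len(path) - 1 == max_hops:
            continue  # this path already uses the maximum number of hops
        for neighbor in graph.get(path[-1], []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no path within the hop limit


print(shortest_path(flights, "SFO", "MSO"))
```

Because breadth-first search explores paths in order of length, the first path that reaches the goal is a shortest one.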

Loading Data into Knowledge Graph

3 Steps to Creating the Graph
1. Import nodes
2. Create indexes
3. Import relationships

Create Indexes
CREATE INDEX ON :Product(productID);
CREATE INDEX ON :Product(productName);
CREATE INDEX ON :Category(categoryID);
CREATE INDEX ON :Employee(employeeID);
CREATE INDEX ON :Supplier(supplierID);
CREATE INDEX ON :Customer(customerID);
CREATE INDEX ON :Customer(customerName);

Import Nodes
// Create customers
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/customers.csv" AS row
CREATE (:Customer {companyName: row.CompanyName, customerID: row.CustomerID,
  fax: row.Fax, phone: row.Phone});
// Create products
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/products.csv" AS row
CREATE (:Product {productName: row.ProductName, productID: row.ProductID,
  unitPrice: toFloat(row.UnitPrice)});

Import Nodes
// Create suppliers
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/suppliers.csv" AS row
CREATE (:Supplier {companyName: row.CompanyName, supplierID: row.SupplierID});
// Create employees
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/employees.csv" AS row
CREATE (:Employee {employeeID: row.EmployeeID, firstName: row.FirstName,
  lastName: row.LastName, title: row.Title});

Import Nodes
// Create categories
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/categories.csv" AS row
CREATE (:Category {categoryID: row.CategoryID, categoryName: row.CategoryName,
  description: row.Description});
// Create orders
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/orders.csv" AS row
MERGE (order:Order {orderID: row.OrderID}) ON CREATE SET order.shipName = row.ShipName;

Creating Relationships
// `row` in these statements comes from a LOAD CSV ... AS row clause, as in the node imports
MATCH (order:Order {orderID: row.OrderID})
MATCH (customer:Customer {customerID: row.CustomerID})
MERGE (customer)-[:PURCHASED]->(order);
MATCH (product:Product {productID: row.ProductID})
MATCH (supplier:Supplier {supplierID: row.SupplierID})
MERGE (supplier)-[:SUPPLIES]->(product);

MERGE (order)-[pu:INCLUDES]->(product)
ON CREATE SET pu.unitPrice = toFloat(row.UnitPrice), pu.quantity = toFloat(row.Quantity);
MATCH (employee:Employee {employeeID: row.EmployeeID})
MERGE (employee)-[:SOLD]->(order);

MATCH (category:Category {categoryID: row.CategoryID})
MERGE (product)-[:PART_OF]->(category);
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/employees.csv" AS row
MATCH (employee:Employee {employeeID: row.EmployeeID})
MATCH (manager:Employee {employeeID: row.ReportsTo})
MERGE (employee)-[:REPORTS_TO]->(manager);
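The import flow (nodes first, then relationships resolved by ID) can be imitated in plain Python to show why the ID lookups matter. This is a sketch under assumptions: the CSV excerpt is made up in the shape of employees.csv, and `load_graph` is a hypothetical helper, not a Neo4j API.

```python
import csv
import io

# Hypothetical excerpt in the shape of employees.csv (only the columns used here).
EMPLOYEES_CSV = """EmployeeID,FirstName,ReportsTo
2,Andrew,NULL
5,Steven,2
7,Robert,5
"""


def load_graph(text):
    """Build employee nodes and REPORTS_TO relationships from CSV text."""
    rows = list(csv.DictReader(io.StringIO(text)))

    # Import nodes, keyed by ID so the dict doubles as the lookup index.
    employees = {row["EmployeeID"]: {"firstName": row["FirstName"]} for row in rows}

    # Import relationships by looking up both endpoints, similar to MATCH + MERGE.
    # A set keeps the result free of duplicates, loosely like MERGE.
    reports_to = set()
    for row in rows:
        manager_id = row["ReportsTo"]
        if manager_id in employees:  # no matching manager, no relationship
            reports_to.add((row["EmployeeID"], "REPORTS_TO", manager_id))
    return employees, reports_to


employees, reports_to = load_graph(EMPLOYEES_CSV)
print(sorted(reports_to))
```

As in the Cypher version, a row whose manager cannot be matched simply produces no relationship, which is how the top of the hierarchy ends up with no outgoing `REPORTS_TO`.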

Today we’re seeing graph projects across virtually every industry
Social networks
Retail
HR & Recruiting
Manufacturing & Logistics
Health Care
Telco
Finance

Traditional Retail Value Chain
End Consumers
Component Manufacturers
Logistics
Retailers
Wholesalers
Assembly Plants

The Online Retail Value Chain
PAYMENTS
SALES-CHANNELS
SUPPLY CHAIN
PRODUCTS
MARKETING
CRM
CUSTOMER EXPERIENCE

Added to the value chain diagram: Store, Mobile, Webstore; then Shipping, Inventory, Express goods, Home delivery

Digital Transformation in Retail today requires us to put all this data to good use

SHOPPING EXPERIENCE

Recommendations in Real-Time
The main product
Related products
People who bought X also bought Y

KITCHEN AID SERIES
Diagram build, connecting the product step by step to: Complaints, Reviews, Tweets, Emails, Returns, Inventory, Home delivery, Express goods, Location/Address, Promotions, Bundling, Purchase History, Price-range, Category

Knowledge Graph Successes
To get real-time results from a highly interconnected dataset and to share common knowledge across the enterprise, you need a graph.

Adidas uses Neo4j to combine content and product data into a single, searchable graph database, which is used to create a personalized customer experience.
“We have many different silos, many different data domains, and in order to make sense out of our data, we needed to bring those together and make them useful for us.”
– Sokratis Kartelias, Adidas

eBay Now tackles eCommerce delivery service routing with Neo4j.
“We needed to rebuild when growth and new features made our slowest query longer than our fastest delivery - 15 minutes! Neo4j gave us best solution”
– Volker Pacher, eBay

Walmart uses Neo4j to give customers the best web experience through relevant and personal recommendations.
“As the current market leader in graph databases, and with enterprise features for scalability and availability, Neo4j is the right choice to meet our demands.”
– Marcos Vada, Walmart

Graph DBMS
Primary Database:
• Designed to store and retrieve data using a connection-oriented model
• Connections (edges) provide the context of how two things are related
• Entities (nodes) are the things (e.g., Person, Account) in your data
• Properties are supported on nodes and edges (avoid those that don’t)
• Indexes are only used to find starting points in the data (avoid those that don’t)
• This removes JOIN pain when answering questions that require movement across data entities (traversals)
• Designed to be fully ACID and transactional (avoid those that aren’t)
• Provides dynamic (index-free), constant-time movement across data entities
• Becoming the backing store for primary data throughout the enterprise
• Yes, you can read this as “replacing RDBMS as the primary store” (I’ve been using one like this for 5+ years)

Indexes for Direct Retrieval
• Designed for responding to anticipated questions
• Rapid responses for indexed values
• Not designed for ad hoc and unexpected questions
• Costly to maintain in a rapidly changing data environment
• Slow to update because it requires dev/DBA cycles

Pointers for Traversal
• Designed for responding to unanticipated questions
• Rapid responses for connection-centric and depth-based questions
• Not designed for static cache-like return sets
• Easy to maintain in a rapidly changing data environment
• No dev/DBA cycle required to maintain performance as data changes
• Optimal for ad hoc and unexpected connection-centric questions
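The contrast between index lookups and pointer traversal can be sketched in Python as follows. This is only an illustration of the idea, assuming in-memory objects: the `Node` class, `connect`, `index`, and `names_within` are hypothetical names, and a real graph database stores its pointers on disk rather than as Python references.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.out = []  # direct references to neighboring nodes


def connect(a, b):
    """Add a directed connection from a to b."""
    a.out.append(b)


# Hypothetical data: Dan -> Ann -> Eve.
ann, dan, eve = Node("Ann"), Node("Dan"), Node("Eve")
connect(dan, ann)
connect(ann, eve)

# The index is consulted once, to find the starting point.
index = {node.name: node for node in (ann, dan, eve)}


def names_within(start_name, depth):
    """Names reachable from the start node in at most `depth` hops."""
    frontier = [index[start_name]]      # the only index lookup
    found = []
    seen = {start_name}
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for neighbor in node.out:   # pointer hop, no index lookup
                if neighbor.name not in seen:
                    seen.add(neighbor.name)
                    found.append(neighbor.name)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return found


print(names_within("Dan", 2))
```

After the single lookup for the starting node, every further step follows a stored reference, so the cost of a hop depends on the node's own connections rather than on the total size of the dataset.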

GraphGrid Data Platform
Provides all components needed for knowledge bases

Thank You! Questions?
