The document discusses how knowledge graphs can be used to effectively manage complex, heterogeneous data. It provides examples of how various use cases like user personalization, fraud detection, and network operations can be modeled as a knowledge graph. It also describes how to convert relational data to a graph and load data into a knowledge graph. Key benefits of knowledge graphs include intuitive queries, speed, and the ability to easily add new relationships and properties.
Managing Complex, Heterogeneous Data with Knowledge Graphs
Knowledge Bases as a Data Platform: How to Effectively Manage Complex, Heterogeneous Data
Data Architecture Summit 2017
Benjamin Nussbaum – CTO, AtomRain Inc.
(Creators of GraphGrid Connected Data Platform)
ben@atomrain.com | @bennussbaum
atomrain.com | graphgrid.com
#DASummit
SQL Pains
• Complex to model and store relationships
• Performance degrades with increases in data
• Queries get long and complex
• Maintenance is painful
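The sketch below makes the "queries get long and complex" point concrete. It uses Python's built-in sqlite3 module as a stand-in RDBMS, which is an assumption: the slide names no specific database. The `employees` table, its columns, and the sample rows are hypothetical, loosely modeled on the Northwind employees data imported later in the deck. A single level of reporting already needs a self-join, and reports at any depth need a recursive CTE.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create an in-memory employees table with a self-referencing reports_to column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees ("
        " employee_id INTEGER PRIMARY KEY,"
        " first_name TEXT NOT NULL,"
        " reports_to INTEGER REFERENCES employees(employee_id))"
    )
    # Hypothetical sample rows: 1 is the top manager, 2 and 3 report to 1, 4 reports to 2.
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?, ?)",
        [(1, "Andrew", None), (2, "Nancy", 1), (3, "Janet", 1), (4, "Robert", 2)],
    )
    return conn


def direct_reports(conn: sqlite3.Connection) -> list:
    """One level: join the table to itself to pair each employee with their manager."""
    sql = (
        "SELECT m.first_name, e.first_name FROM employees e "
        "JOIN employees m ON e.reports_to = m.employee_id "
        "ORDER BY m.employee_id, e.employee_id"
    )
    return conn.execute(sql).fetchall()


def all_reports(conn: sqlite3.Connection, manager_id: int) -> list:
    """Any depth: a recursive CTE that keeps re-joining the table (assumes no cycles)."""
    sql = (
        "WITH RECURSIVE chain(employee_id, depth) AS ("
        " SELECT employee_id, 1 FROM employees WHERE reports_to = ?"
        " UNION ALL"
        " SELECT e.employee_id, c.depth + 1 FROM employees e"
        " JOIN chain c ON e.reports_to = c.employee_id)"
        " SELECT employee_id, depth FROM chain ORDER BY employee_id"
    )
    return conn.execute(sql, (manager_id,)).fetchall()


if __name__ == "__main__":
    db = build_db()
    print(direct_reports(db))
    print(all_reports(db, 1))
```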
11/21/17
Graph Gains
• Easy to model and store relationships
• Performance of relationship traversal remains roughly constant as data size grows, since a query touches only the part of the graph it traverses
• Queries are shorter and more readable
• New properties and relationships can be added on the fly, with no migrations
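Below is a minimal in-memory sketch of the schema-free property graph model behind the last bullet. It is written in Python purely for illustration and is not how Neo4j stores data. `PropertyGraph`, `set_prop`, and `add_rel` are hypothetical names. Adding a property to one node, or using a relationship type for the first time, touches only the records involved. There is no table-wide migration step.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set
    props: dict
    # Outgoing relationships grouped by type: rel type -> list of target node ids.
    out: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy in-memory property graph with no fixed schema (illustration only, not Neo4j)."""

    def __init__(self) -> None:
        self.nodes: dict[int, Node] = {}

    def add_node(self, *labels: str, **props) -> int:
        node_id = len(self.nodes)
        self.nodes[node_id] = Node(set(labels), dict(props))
        return node_id

    def set_prop(self, node_id: int, key: str, value) -> None:
        # Only this node changes; other nodes with the same label are untouched,
        # so there is no table-wide ALTER/migration step.
        self.nodes[node_id].props[key] = value

    def add_rel(self, start: int, rel_type: str, end: int) -> None:
        # A relationship type never seen before needs no prior declaration.
        self.nodes[start].out.setdefault(rel_type, []).append(end)

    def neighbors(self, node_id: int, rel_type: str) -> list[int]:
        return list(self.nodes[node_id].out.get(rel_type, []))


if __name__ == "__main__":
    g = PropertyGraph()
    a = g.add_node("Customer", customerID="C1")  # hypothetical IDs
    b = g.add_node("Customer", customerID="C2")
    g.set_prop(a, "loyaltyTier", "gold")  # new property on a single node
    g.add_rel(a, "REFERRED", b)           # brand-new relationship type
    print(g.nodes[a].props, g.neighbors(a, "REFERRED"))
```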
Querying the Knowledge Graph
Using openCypher: http://www.opencypher.org/
- The SQL for graph databases
- Implementers include Apache Spark, SAP HANA, and others
Who do people report to?
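// REPORTS_TO edges point from a subordinate to their manager,
// so e binds to the manager and sub to the direct report
MATCH (e:Employee)<-[:REPORTS_TO]-(sub:Employee)
RETURN e.employeeID AS managerID,
       e.firstName AS managerName,
       sub.employeeID AS employeeID,
       sub.firstName AS employeeName;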
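This hypothetical Python sketch spells out the arrow direction in the query above. It evaluates the same pattern over a plain edge list, where each `(subordinateID, managerID)` pair stands for one `REPORTS_TO` relationship. The `who_reports_to` function and `Row` type are illustrative assumptions. The field names mirror the `RETURN` aliases.

```python
from typing import NamedTuple


class Row(NamedTuple):
    managerID: str
    managerName: str
    employeeID: str
    employeeName: str


def who_reports_to(employees: dict, reports_to: list) -> list:
    """Evaluate (e:Employee)<-[:REPORTS_TO]-(sub:Employee) over an in-memory edge list.

    employees:  employeeID -> firstName
    reports_to: (subordinateID, managerID) pairs; each pair is one edge sub -> manager
    """
    rows = []
    for sub_id, mgr_id in reports_to:
        # The arrow points at the manager, so `e` binds to the edge's end node
        # and `sub` to its start node, matching the RETURN aliases.
        rows.append(Row(managerID=mgr_id, managerName=employees[mgr_id],
                        employeeID=sub_id, employeeName=employees[sub_id]))
    return rows
```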
3 Steps to Creating the Graph
• Create indexes
• Import nodes
• Import relationships
Create Indexes
// Index every property the imports below MATCH or MERGE on
CREATE INDEX ON :Product(productID);
CREATE INDEX ON :Product(productName);
CREATE INDEX ON :Category(categoryID);
CREATE INDEX ON :Employee(employeeID);
CREATE INDEX ON :Supplier(supplierID);
CREATE INDEX ON :Customer(customerID);
CREATE INDEX ON :Customer(companyName);
CREATE INDEX ON :Order(orderID);
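The indexes above exist so that a `MATCH` or `MERGE` on an ID can find its starting node quickly. Below is a rough Python analogue. It assumes a simple map from property value to node IDs, whereas real Neo4j indexes are more sophisticated. It contrasts an indexed lookup with the full label scan that would otherwise be needed. `LabelPropertyIndex` and `scan` are hypothetical names.

```python
from collections import defaultdict


class LabelPropertyIndex:
    """Toy analogue of CREATE INDEX ON :Label(prop): maps a property value to node ids.

    It only answers "which nodes with this label have prop == value?", i.e. it finds
    starting points; following relationships afterwards does not use it.
    """

    def __init__(self, label: str, prop: str) -> None:
        self.label = label
        self.prop = prop
        self._entries = defaultdict(set)

    def add(self, node_id: int, labels: set, props: dict) -> None:
        # Nodes without the label or the property are simply not indexed.
        if self.label in labels and self.prop in props:
            self._entries[props[self.prop]].add(node_id)

    def lookup(self, value) -> set:
        # .get on a defaultdict does not insert missing keys.
        return set(self._entries.get(value, set()))


def scan(nodes: dict, label: str, prop: str, value) -> set:
    """What the same lookup costs without an index: inspect every node."""
    return {nid for nid, (labels, props) in nodes.items()
            if label in labels and props.get(prop) == value}
```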
Import Nodes
// Create customers
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/customers.csv" AS row
CREATE (:Customer {companyName: row.CompanyName, customerID: row.CustomerID,
  fax: row.Fax, phone: row.Phone});
// Create products
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/products.csv" AS row
CREATE (:Product {productName: row.ProductName, productID: row.ProductID,
  unitPrice: toFloat(row.UnitPrice)});
// Create suppliers
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/suppliers.csv" AS row
CREATE (:Supplier {companyName: row.CompanyName, supplierID: row.SupplierID});
// Create employees
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/employees.csv" AS row
CREATE (:Employee {employeeID: row.EmployeeID, firstName: row.FirstName,
  lastName: row.LastName, title: row.Title});
// Create categories
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/categories.csv" AS row
CREATE (:Category {categoryID: row.CategoryID, categoryName: row.CategoryName,
  description: row.Description});
// Create orders
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/orders.csv" AS row
MERGE (order:Order {orderID: row.OrderID})
ON CREATE SET order.shipName = row.ShipName;
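Two details of the node imports are easy to miss. First, `LOAD CSV` hands every field over as a string, hence `toFloat` for prices, while IDs stay strings and must be matched as strings later. Second, orders use `MERGE` rather than `CREATE`, presumably because the same `OrderID` can appear on several rows of orders.csv: the relationship imports read a `ProductID` from that same file. The Python sketch below mimics both behaviours on hypothetical sample CSV text. It is an illustration, not Neo4j's implementation.

```python
import csv
import io


def load_orders(csv_text: str) -> dict:
    """Mimic MERGE (o:Order {orderID: row.OrderID}) ON CREATE SET o.shipName = row.ShipName.

    Returns orderID -> properties. Repeated OrderIDs collapse into one entry, and the
    ON CREATE SET only applies the first time a given OrderID is seen.
    """
    orders = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["OrderID"] not in orders:  # ON CREATE branch
            orders[row["OrderID"]] = {"orderID": row["OrderID"], "shipName": row["ShipName"]}
        # Matched an existing order: nothing is set, so later rows leave it unchanged.
    return orders


def create_products(csv_text: str) -> list:
    """Mimic CREATE (:Product {...}): one node per row; only unitPrice is converted."""
    return [
        {"productName": r["ProductName"], "productID": r["ProductID"],
         "unitPrice": float(r["UnitPrice"])}
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
```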
Creating Relationships
// Link customers to the orders they purchased
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/orders.csv" AS row
MATCH (order:Order {orderID: row.OrderID})
MATCH (customer:Customer {customerID: row.CustomerID})
MERGE (customer)-[:PURCHASED]->(order);
// Link suppliers to the products they supply
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/products.csv" AS row
MATCH (product:Product {productID: row.ProductID})
MATCH (supplier:Supplier {supplierID: row.SupplierID})
MERGE (supplier)-[:SUPPLIES]->(product);
// Link orders to the products they include, with line-item price and quantity
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/orders.csv" AS row
MATCH (order:Order {orderID: row.OrderID})
MATCH (product:Product {productID: row.ProductID})
MERGE (order)-[pu:INCLUDES]->(product)
ON CREATE SET pu.unitPrice = toFloat(row.UnitPrice),
  pu.quantity = toInteger(row.Quantity);
// Link employees to the orders they sold
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/orders.csv" AS row
MATCH (order:Order {orderID: row.OrderID})
MATCH (employee:Employee {employeeID: row.EmployeeID})
MERGE (employee)-[:SOLD]->(order);
// Link products to their categories
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/products.csv" AS row
MATCH (product:Product {productID: row.ProductID})
MATCH (category:Category {categoryID: row.CategoryID})
MERGE (product)-[:PART_OF]->(category);
// Link employees to their managers
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/employees.csv" AS row
MATCH (employee:Employee {employeeID: row.EmployeeID})
MATCH (manager:Employee {employeeID: row.ReportsTo})
MERGE (employee)-[:REPORTS_TO]->(manager);
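Each relationship import follows the same shape: look up both endpoints by ID, then `MERGE` the relationship between them. Below is a hypothetical Python sketch of that logic, in which dictionaries stand in for the indexes. A row whose endpoint is missing produces no relationship, because the second `MATCH` finds nothing. An example is an employee with an empty `ReportsTo`. `MERGE` keeps a repeated pair from being stored twice. The function name and its parameters are illustrative assumptions.

```python
import csv
import io


def merge_relationships(csv_text: str, start_index: dict, end_index: dict,
                        start_col: str, end_col: str) -> set:
    """Mimic: MATCH (a {id: row[start_col]}) MATCH (b {id: row[end_col]}) MERGE (a)-[:REL]->(b)

    start_index / end_index map an ID value to a node id (the role the indexes play).
    Returns the set of (start node, end node) relationships.
    """
    rels = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        a = start_index.get(row[start_col])
        b = end_index.get(row[end_col])
        if a is None or b is None:
            continue  # a MATCH found nothing, so the row is dropped
        rels.add((a, b))  # set semantics: merging an existing pair is a no-op
    return rels
```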
Today we’re seeing graph projects across virtually every industry:
• Social networks
• Retail
• HR & Recruiting
• Manufacturing & Logistics
• Health Care
• Telco
• Finance
Knowledge Graph Successes
To get real-time results from a highly interconnected dataset and to share common knowledge across the enterprise, you need a graph
Adidas uses Neo4j to combine content and product data into a single, searchable graph database, which is used to create a personalized customer experience
“We have many different silos, many different data domains, and in order to make sense out of our data, we needed to bring those together and make them useful for us.”
– Sokratis Kartelias, Adidas
eBay Now Tackles eCommerce Delivery Service Routing with Neo4j
“We needed to rebuild when growth and new features made our slowest query longer than our fastest delivery - 15 minutes! Neo4j gave us best solution”
– Volker Pacher, eBay
Walmart uses Neo4j to give customers the best web experience through relevant and personal recommendations
“As the current market leader in graph databases, and with enterprise features for scalability and availability, Neo4j is the right choice to meet our demands.”
– Marcos Vada, Walmart
Graph DBMS
Primary Database:
• Designed to store and retrieve data using a connection-oriented model
• Connections (edges) provide the context of how two things are related
• Entities (nodes) are the things in your data (e.g., Person, Account)
• Properties are supported on both nodes and edges (avoid databases that don’t support this)
• Indexes are used only to find starting points in the data (avoid databases that also need them for traversal)
• This removes JOIN pain when answering questions that require moving across data entities (traversals)
• Designed to be fully ACID-compliant and transactional (avoid databases that aren’t)
• Provides dynamic (index-free), constant-time movement from an entity to its connected entities
• Becoming the backing store for primary data throughout the enterprise
• Yes, you can read this as “replacing the RDBMS as the primary store” (I’ve been using one this way for 5+ years)
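Index-free adjacency, the constant-time movement bullet above, can be sketched in a few lines of Python. Each node keeps direct references to its neighbours, so one hop reads only that node's own list. Its cost grows with that node's degree, not with the size of the graph. By contrast, a relational join typically goes through an index whose lookup cost grows, often logarithmically, with table size. The `AdjacencyGraph` class and its `touched` counter are hypothetical illustrations, not Neo4j's storage format.

```python
class AdjacencyGraph:
    """Toy index-free adjacency: each node stores direct references to its neighbours."""

    def __init__(self) -> None:
        self.adj: dict = {}
        self.touched = 0  # adjacency entries read during hops (a crude cost counter)

    def add_edge(self, a, b) -> None:
        self.adj.setdefault(a, []).append(b)
        self.adj.setdefault(b, [])

    def hop(self, node) -> list:
        # Reads only this node's own list; no global index is consulted.
        out = self.adj.get(node, [])
        self.touched += len(out)
        return list(out)


if __name__ == "__main__":
    g = AdjacencyGraph()
    g.add_edge("a", "b")
    g.add_edge("a", "c")
    g.hop("a")
    before = g.touched
    for i in range(10_000):  # grow the graph with unrelated edges
        g.add_edge(f"x{i}", f"y{i}")
    g.hop("a")
    print(before, g.touched - before)  # same cost for the same hop
```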
Indexes for Direct Retrieval
• Designed for responding to anticipated questions
• Rapid responses for indexed values
• Not designed for ad hoc and unexpected questions
• Costly to maintain in a rapidly changing data environment
• Slow to change because each change requires dev/DBA cycles
Pointers for Traversal
• Designed for responding to unanticipated questions
• Rapid responses for connection-centric and depth-based questions
• Not designed for static cache-like return sets
• Easy to maintain in a rapidly changing data environment
• No dev/DBA cycle required to maintain performance as data changes
• Optimal for ad hoc and unexpected connection-centric questions
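Depth-based questions are where following pointers pays off. The minimal Python sketch below assumes reporting lines arrive as hypothetical `(subordinate, manager)` pairs and walks the reporting tree breadth-first up to `max_depth` hops. That is roughly what a Cypher variable-length pattern such as `(m)<-[:REPORTS_TO*1..3]-(sub)` asks for. Cypher may return a node once per matching path, whereas this sketch returns each person once, at their shallowest depth. The function name is illustrative.

```python
from collections import deque


def reports_within(reports_to_edges: list, manager: str, max_depth: int) -> dict:
    """Everyone under `manager`, directly or indirectly, up to max_depth hops.

    Returns person -> depth. Edges are (subordinate, manager) pairs.
    """
    # Build "manager -> direct reports" pointers once (a graph store already holds them).
    below: dict = {}
    for sub, mgr in reports_to_edges:
        below.setdefault(mgr, []).append(sub)
    found: dict = {}
    queue = deque([(manager, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for sub in below.get(node, []):
            # Guard against cycles and duplicate paths; BFS visits shallowest first.
            if sub not in found and sub != manager:
                found[sub] = depth + 1
                queue.append((sub, depth + 1))
    return found
```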
Thank You! Questions?