2. About Me
• Max De Marzi - Neo4j Field Engineer
• My Blog: http://maxdemarzi.com
• Find me on Twitter: @maxdemarzi
• Email me: maxdemarzi@gmail.com
• GitHub: http://github.com/maxdemarzi
7. The Problem
• All JOINs are executed every time you query (traverse) the relationship
• Executing a JOIN means searching for a key in another table
• With indices, executing a JOIN means looking up a key
• B-Tree index: O(log(n)) per lookup
• More entries => more lookups => slower JOINs
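To make the O(log(n)) cost concrete, here is a minimal Python sketch, assuming a sorted key list as a stand-in for a B-Tree index (both give logarithmic lookups; a real B-Tree reads far fewer disk pages per lookup because of its high fan-out, but its cost still grows with table size). The tables, `make_table`, `build_index`, `index_lookup` and `join` are all hypothetical. The same three join-table rows are resolved against a small and a large table, and the number of key comparisons grows with the table even though the rows do not change.

```python
# Hypothetical tables; the sorted key list stands in for a B-Tree index,
# since both give O(log n) lookups.

def build_index(rows):
    """Sort (key, value) rows and split them into parallel key/value lists."""
    ordered = sorted(rows)
    return [k for k, _ in ordered], [v for _, v in ordered]


def index_lookup(keys, values, key):
    """Binary-search the index; return (value or None, comparisons made)."""
    lo, hi, comparisons = 0, len(keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if keys[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(keys) and keys[lo] == key:
        return values[lo], comparisons
    return None, comparisons


def join(join_rows, keys, values):
    """Resolve the foreign key of every join-table row through the index."""
    resolved, total = [], 0
    for _, foreign_key in join_rows:
        value, comparisons = index_lookup(keys, values, foreign_key)
        resolved.append(value)
        total += comparisons
    return resolved, total


def make_table(n):
    """Hypothetical table with primary keys 0..n-1."""
    return [(k, f"row-{k}") for k in range(n)]


join_rows = [(143, 326), (143, 725), (143, 981)]  # (person id, conference id)
small = build_index(make_table(1_000))
large = build_index(make_table(100_000))

small_result, small_cost = join(join_rows, *small)
large_result, large_cost = join(join_rows, *large)
print(small_result, small_cost, large_cost)
```

Every JOIN pays this search again on every query; the cost is per lookup and it never goes away as the data grows.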
9. [Diagram: relational tables]
• Person: Max (id 143)
• Conferences: Big Data Tech Con, NoSQL Now, Chariot Data IO (ids 326, 725, 981)
• Join table rows: (143, 981), (143, 725), (143, 326)
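10. A Property Graph
[Diagram: nodes and relationships]
• Node uid: MDM, name: Max
• Node uid: BDTC, where: Burlingame
• Node uid: NSN, where: San Francisco
• Node uid: CDIO, where: Philadelphia
• Max is connected to each of the other three nodes by a "member" relationship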
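A minimal Python sketch of the slide 10 graph, assuming plain dicts for storage and assuming the "member" relationships point from Max to the conferences (the diagram does not show direction). The names `nodes`, `relationships` and `related` are hypothetical. The point is that each node carries its own relationship list, so there is no separate join table to search.

```python
# Node properties from the slide, keyed by uid.
nodes = {
    "MDM": {"name": "Max"},
    "BDTC": {"where": "Burlingame"},
    "NSN": {"where": "San Francisco"},
    "CDIO": {"where": "Philadelphia"},
}

# Each node carries its own relationship list as (type, target uid) pairs;
# the Max -> conference direction is an assumption.
relationships = {
    "MDM": [("member", "BDTC"), ("member", "NSN"), ("member", "CDIO")],
}


def related(uid, rel_type):
    """Return the uids reached from uid over relationships of rel_type."""
    return [target for kind, target in relationships.get(uid, []) if kind == rel_type]


places = [nodes[uid]["where"] for uid in related("MDM", "member")]
print(places)  # ['Burlingame', 'San Francisco', 'Philadelphia']
```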
11. Neo4j Secret Sauce
• Pointers instead of lookups
• Fixed-size records for fast access
• Do all your “joining” on creation
• Spin, spin, spin through this data structure
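The three ideas on slide 11 fit together as follows, sketched in Python as a toy record store. This is a hypothetical layout, far simpler than Neo4j's actual store files (for example, Neo4j keeps relationship chains for both endpoints, not only the start node). Because every record has a fixed size, a record id times the record size is its byte offset; the "join" happens once, when a relationship is created and linked into its start node's chain; and a traversal simply follows next-relationship pointers without searching any index. `RecordStore` and its methods are illustrative names only.

```python
import struct

# Toy record layouts (hypothetical, far simpler than Neo4j's real format).
NODE_FMT = "<i"     # node record: id of its first relationship (-1 = none)
REL_FMT = "<iiii"   # rel record: start node, end node, type id, next rel of start node
NODE_SIZE = struct.calcsize(NODE_FMT)  # 4 bytes
REL_SIZE = struct.calcsize(REL_FMT)    # 16 bytes
NO_ID = -1


class RecordStore:
    """Fixed-size records: record id * record size gives the byte offset."""

    def __init__(self):
        self.nodes = bytearray()
        self.rels = bytearray()

    def create_node(self):
        node_id = len(self.nodes) // NODE_SIZE
        self.nodes += struct.pack(NODE_FMT, NO_ID)
        return node_id

    def first_rel(self, node_id):
        return struct.unpack_from(NODE_FMT, self.nodes, node_id * NODE_SIZE)[0]

    def create_rel(self, start, end, type_id):
        """'Join' once, at creation: prepend the new record to start's chain."""
        rel_id = len(self.rels) // REL_SIZE
        self.rels += struct.pack(REL_FMT, start, end, type_id, self.first_rel(start))
        struct.pack_into(NODE_FMT, self.nodes, start * NODE_SIZE, rel_id)
        return rel_id

    def neighbours(self, node_id):
        """Follow next-relationship pointers; no index is searched."""
        found = []
        rel_id = self.first_rel(node_id)
        while rel_id != NO_ID:
            _start, end, _type, rel_id = struct.unpack_from(
                REL_FMT, self.rels, rel_id * REL_SIZE)
            found.append(end)
        return found


store = RecordStore()
max_node = store.create_node()
conferences = [store.create_node() for _ in range(3)]
for conf in conferences:
    store.create_rel(max_node, conf, type_id=0)  # 0 = hypothetical "member" type
print(store.neighbours(max_node))  # [3, 2, 1]
```

Traversal cost here depends only on how many relationships the node has, not on how many records the store holds, which is the contrast with the index lookups on slide 7.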
12. Relational Databases Can’t Handle Relationships Well
• Cannot model or store data and relationships without complexity
• Performance degrades with the number and levels of relationships, and with database size
• Query complexity grows with the need for JOINs
• Adding new types of data and relationships requires schema redesign, increasing time to market
… making traditional databases inappropriate when relationships are valuable in real time
Slow development
Poor performance
Low scalability
Hard to maintain
13. NoSQL Databases Don’t Handle Relationships
• No data structures to model or store relationships
• No query constructs to support relationships
• Relating data requires “JOIN logic” in the application
• No ACID support for transactions
… making NoSQL databases inappropriate when relationships are valuable in real time
14. Real-Time Query Performance
Performance must hold steady with scale.
[Chart: response time vs. connectedness and size of data set]
• Relational and other NoSQL databases: 0 to 2 hops, 0 to 3 degrees, thousands of connections
• Neo4j: tens to hundreds of hops, thousands of degrees, billions of connections
Neo4j is 1000x faster: reduces minutes to milliseconds
15. Re-Imagine Your Data as a Graph
Neo4j is an enterprise-grade graph database that enables you to:
• Model and store your data as a graph
• Query relationships with ease and in real time
• Seamlessly evolve applications to support new requirements by adding new kinds of data and relationships
Agile development
High performance
Vertical and horizontal scale
Seamless evolution
16. Neo4j Overview
Product
• Neo4j - World’s leading graph database
• 1M+ downloads, adding 50k+ per month
• 150+ enterprise subscription customers, including over 50 of the Global 2000
Company
• Neo Technology, creator of Neo4j
• 80 employees with offices in Silicon Valley, London, Munich, Paris and Malmö
• $45M in funding from Fidelity, Sunstone, Conor, Creandum and Dawn Capital
17. Neo4j: The Graph Database Leader
[Timeline: 2000, 2003, 2007, 2009, 2011, 2013, 2014, 2015; tracks: Technical Leadership, Commercial Leadership, Funding]
• Invented property graph model
• First native graph DB in 24/7 production
• Contributed first graph DB to open source
• $2.5M Seed Round from Sunstone and Conor
• Introduced Cypher, a declarative query language for property graphs
• First Global 2000 customer
• GraphConnect, first conference for graph DBs
• Published O’Reilly book on Graph Databases
• Extended graph data model to labeled property graph
• $11M Series A from Fidelity, Sunstone and Conor
• $11M Series B from Fidelity, Sunstone and Conor
• 150+ customers, 50K+ monthly downloads, 500+ graph DB events worldwide
• $20M Series C led by Creandum, with Dawn and existing investors
18. Neo4j Leads the Graph Database Revolution
“Forrester estimates that over 25% of enterprises will be using graph databases by 2017”
“Neo4j is the current market leader in graph databases.”
“Graph analysis is possibly the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions after the design of data capture.”
1. IT Market Clock for Database Management Systems, 2014
2. TechRadar™: Enterprise DBMS, Q1 2014
3. Graph Databases – and Their Potential to Transform How We Capture Interdependencies (Enterprise Management Associates)
19. Building a Recommendation Engine in 2 Minutes with Neo4j
Developer Experience: Neo4j UI with Cypher Query Language
Two-Minute Video Demo: https://www.youtube.com/watch?v=qbZ_Q-YnHYo
20. Neo4j – Key Product Features
• Native Graph Storage: ensures data consistency and performance
• Native Graph Processing: millions of hops per second, in real time
• “Whiteboard Friendly” Data Modeling: model data as it naturally occurs
• High Data Integrity: fully ACID transactions
• The Graph Query Language, Cypher: requires 10x to 100x less code than SQL
• Scalability and High Availability: vertical and horizontal scaling optimized for graphs
• Built-in ETL: seamless import from other databases
• Integration: drivers and APIs for popular languages
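21. Property Graph Model Components
[Diagram: two PERSON nodes, one with name: “Dan”, born: May 29, 1970, twitter: “@dan” and one with name: “Ann”, born: Dec 5, 1975, joined by LOVES relationships in both directions and a LIVES WITH relationship; a CAR node with brand: “Volvo”, model: “V70”, linked to the people by OWNS and DRIVES relationships; one relationship carries the property since: Jan 10, 2011]
Nodes
• The objects in the graph
• Can have properties
• Can be labeled
Relationships
• Relate nodes by type and direction
• Can have properties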
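The components listed on slide 21 can be sketched in Python with two small classes. This models the concepts only, not Neo4j storage. Two details are assumptions, because the flattened diagram does not settle them: that `since` sits on the LIVES WITH relationship (written `LIVES_WITH` here), and that Dan OWNS the car while Ann DRIVES it. `Node`, `Relationship` and `outgoing` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare nodes by identity, like graph entities
class Node:
    labels: set
    properties: dict


@dataclass(eq=False)
class Relationship:
    type: str
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)


dan = Node({"Person"}, {"name": "Dan", "born": "May 29, 1970", "twitter": "@dan"})
ann = Node({"Person"}, {"name": "Ann", "born": "Dec 5, 1975"})
car = Node({"Car"}, {"brand": "Volvo", "model": "V70"})

# Which relationship carries "since", and who OWNS vs DRIVES the car,
# are assumptions about the diagram.
relationships = [
    Relationship("LOVES", dan, ann),
    Relationship("LOVES", ann, dan),
    Relationship("LIVES_WITH", dan, ann, {"since": "Jan 10, 2011"}),
    Relationship("OWNS", dan, car),
    Relationship("DRIVES", ann, car),
]


def outgoing(node, rel_type):
    """Nodes reached from node over relationships of rel_type (direction matters)."""
    return [r.end for r in relationships if r.start is node and r.type == rel_type]


print([n.properties["name"] for n in outgoing(dan, "LOVES")])  # ['Ann']
print(outgoing(ann, "DRIVES")[0].properties["model"])          # V70
```

Labels, node properties, typed and directed relationships, and relationship properties each map to one field; that last field is what slide 22 says a triple store lacks.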
22. Triple Store/RDF Model
• Resource Description Framework
• Subject, Predicate, Object
• Standard data model
• Names for subjects, predicates and objects must be URIs
• Names must be global
• No properties on the relationships
• Like “3rd Normal Form” for relational databases (but really more like 5th/6th)
25. Property Graph vs. Triple Store
• The property graph is a more general case of the triple store
• The lack of properties on relationships in triple stores reduces (or complicates) their expressive power
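What "complicates" means in practice can be sketched in Python. A relationship without properties maps to a single triple. A relationship with a property has no place to put it, so one common workaround, the standard RDF reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`), turns the edge into its own resource. The `ex:` prefix stands in for full URIs and, along with `to_triples` and the ISO date, is hypothetical.

```python
def to_triples(edge, edge_id):
    """Convert a property-graph edge (subject, predicate, object, properties)
    into subject-predicate-object triples.

    A plain triple cannot hold properties, so an edge that has them is also
    reified: edge_id becomes a resource describing the statement, and each
    property hangs off that resource.
    """
    subject, predicate, obj, props = edge
    triples = [(subject, predicate, obj)]
    if props:
        triples += [
            (edge_id, "rdf:type", "rdf:Statement"),
            (edge_id, "rdf:subject", subject),
            (edge_id, "rdf:predicate", predicate),
            (edge_id, "rdf:object", obj),
        ]
        triples += [(edge_id, "ex:" + key, value) for key, value in props.items()]
    return triples


loves = ("ex:Dan", "ex:LOVES", "ex:Ann", {})
lives_with = ("ex:Dan", "ex:LIVES_WITH", "ex:Ann", {"since": "2011-01-10"})
print(len(to_triples(loves, "ex:edge1")))       # 1
print(len(to_triples(lives_with, "ex:edge2")))  # 6
```

One relationship property costs five extra triples, and queries must join through the statement resource to read it back.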
27. General Use Cases
• Graph databases:
  • Local queries (anchor on a node or set of nodes, then traverse)
  • Real-time (<20 ms) requirements
  • Complex, deep traversals
  • Flexible graph models
• Triple stores:
  • Global queries (find a pattern in a large volume of information)
  • Browsing content
  • Inference discovery
41. Traversal API
• Start with the simple defaults (order, relationships, depth, uniqueness, etc.)
• Custom expanders
  • Where should I go next?
• Custom evaluators
  • I’ve gone there… should I accept this path?
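In Neo4j's Java Traversal API, an expander decides which relationships to follow next and an evaluator returns an `Evaluation` such as `INCLUDE_AND_CONTINUE` or `EXCLUDE_AND_PRUNE`. The Python sketch below is an analog of that split, not the Java API itself. It runs a breadth-first traversal with node-global uniqueness, where `expander` answers "where should I go next?" and `evaluator` answers "should I accept this path?". The `friends` graph and the friends-of-friends evaluator are hypothetical.

```python
from collections import deque
from enum import Enum


class Evaluation(Enum):
    """(include path in results?, continue expanding past it?)"""
    INCLUDE_AND_CONTINUE = (True, True)
    INCLUDE_AND_PRUNE = (True, False)
    EXCLUDE_AND_CONTINUE = (False, True)
    EXCLUDE_AND_PRUNE = (False, False)


def traverse(start, expander, evaluator):
    """Breadth-first traversal with node-global uniqueness.

    expander(path)  -> candidate next nodes   ("where should I go next?")
    evaluator(path) -> Evaluation             ("should I accept this path?")
    """
    results, visited = [], {start}
    queue = deque([(start,)])
    while queue:
        path = queue.popleft()
        include, keep_going = evaluator(path).value
        if include:
            results.append(path)
        if not keep_going:
            continue
        for nxt in expander(path):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + (nxt,))
    return results


# Hypothetical FRIEND graph.
friends = {
    "Max": ["Ann", "Bob"], "Ann": ["Cid", "Max"], "Bob": ["Dee"],
    "Cid": ["Eve"], "Dee": [], "Eve": [],
}


def friends_of_friends(path):
    """Accept paths exactly two hops long and stop expanding there."""
    if len(path) - 1 < 2:
        return Evaluation.EXCLUDE_AND_CONTINUE
    return Evaluation.INCLUDE_AND_PRUNE


paths = traverse("Max", lambda path: friends[path[-1]], friends_of_friends)
print(paths)  # [('Max', 'Ann', 'Cid'), ('Max', 'Bob', 'Dee')]
```

Pruning at depth two is what keeps Eve out of the results: the evaluator stops the traversal before it ever expands Cid.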
44. Cypher: Powerful and Expressive Query Language
MATCH (:Person {name: "Dan"})-[:LOVES]->(:Person {name: "Ann"})
[Diagram annotations: two nodes (Dan, Ann), each with the label Person and a name property, joined by the LOVES relationship]
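The pattern on slide 44 constrains each node by a label and a property map, and constrains the relationship by type and direction. A naive Python matcher over a hypothetical in-memory graph shows how those constraints combine. It is not how Cypher is executed, and `nodes`, `edges`, `fits` and `match` are illustrative names. The pet named "Ann" is there to show that the `:Person` label, not the name alone, decides the match.

```python
# Hypothetical graph: node id -> (labels, properties); edges as (start, type, end).
nodes = {
    1: ({"Person"}, {"name": "Dan"}),
    2: ({"Person"}, {"name": "Ann"}),
    3: ({"Pet"}, {"name": "Ann"}),  # same name, different label
}
edges = [(1, "LOVES", 2), (2, "LOVES", 1), (1, "LOVES", 3)]


def fits(node_id, pattern):
    """True if the node carries the pattern's label and every listed property."""
    label, props = pattern
    labels, values = nodes[node_id]
    return label in labels and all(values.get(k) == v for k, v in props.items())


def match(start_pattern, rel_type, end_pattern):
    """Find (start, end) pairs for (start_pattern)-[:rel_type]->(end_pattern)."""
    return [(s, e) for s, t, e in edges
            if t == rel_type and fits(s, start_pattern) and fits(e, end_pattern)]


print(match(("Person", {"name": "Dan"}), "LOVES", ("Person", {"name": "Ann"})))  # [(1, 2)]
```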
45. Express Complex Queries Easily with Cypher
Find all direct reports and how many people they manage, up to 3 levels down
MATCH (boss)-[:MANAGES*0..3]->(sub),
      (sub)-[:MANAGES*1..3]->(report)
WHERE boss.name = "John Doe"
RETURN sub.name AS Subordinate, count(report) AS Total
[Figure: the Cypher query compared with the equivalent SQL query]
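The query on slide 45 can be read step by step with a Python sketch over a hypothetical org chart. Each `sub` is anyone 0 to 3 MANAGES hops below the boss, so it includes the boss himself. Each `report` is anyone 1 to 3 hops below that sub. The rows are then grouped by sub and counted. In a tree, every path leads to a distinct person, so counting paths equals counting people; a sub with no reports produces no row, as in Cypher. `manages`, `reachable` and `subordinate_totals` are illustrative names.

```python
from collections import Counter

# Hypothetical org chart: manager -> direct reports (MANAGES relationships).
manages = {
    "John Doe": ["Ann", "Bob"],
    "Ann": ["Cid", "Dee"],
    "Bob": ["Eve"],
    "Cid": ["Fay"],
}


def reachable(start, min_hops, max_hops):
    """Yield one entry per MANAGES path of min_hops..max_hops from start."""
    frontier = [start]
    for hops in range(max_hops + 1):
        if hops >= min_hops:
            yield from frontier
        frontier = [report for person in frontier for report in manages.get(person, [])]


def subordinate_totals(boss_name):
    """One row per (sub, report) match, grouped by sub and counted."""
    totals = Counter()
    for sub in reachable(boss_name, 0, 3):       # (boss)-[:MANAGES*0..3]->(sub)
        for _report in reachable(sub, 1, 3):     # (sub)-[:MANAGES*1..3]->(report)
            totals[sub] += 1
    return totals


print(dict(subordinate_totals("John Doe")))
# {'John Doe': 6, 'Ann': 3, 'Bob': 1, 'Cid': 1}
```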
49. Cypher Query: Movie Recommendation
MATCH (watched:Movie {title: "Toy Story"})<-[r1:RATED]-()-[r2:RATED]->(unseen:Movie)
WHERE r1.rating > 7 AND r2.rating > 7
  AND watched.genres = unseen.genres
  AND NOT((:Person {username: "maxdemarzi"})-[:RATED|WATCHED]->(unseen))
RETURN unseen.title, COUNT(*) AS score
ORDER BY score DESC
LIMIT 25
What are the top 25 movies
• that I haven't seen
• with the same genres as Toy Story
• given high ratings
• by people who liked Toy Story
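The query on slide 49 can be followed in plain Python under a few assumptions: ratings are (user, movie, rating) tuples on a 0 to 10 scale, each user rates a movie at most once, genres are stored as comparable tuples, and "seen" means the movie is in a set of titles the user has RATED or WATCHED. All names and data are hypothetical. The function finds the people who rated Toy Story above 7, keeps their other ratings above 7 for movies with the same genres that I have not seen, and counts one row per matching rating, like `COUNT(*)`.

```python
from collections import Counter


def recommend(ratings, genres, watched_title, seen_by_me, limit=25):
    """Rank unseen movies by how many fans of watched_title rated them highly.

    ratings:    list of (user, movie, rating), at most one rating per pair
    genres:     movie -> genre tuple
    seen_by_me: movies I have RATED or WATCHED
    """
    fans = {u for u, m, r in ratings if m == watched_title and r > 7}
    counts = Counter(
        m for u, m, r in ratings
        if u in fans and r > 7 and m != watched_title
        and genres[m] == genres[watched_title] and m not in seen_by_me
    )
    return counts.most_common(limit)


genres = {
    "Toy Story": ("Animation", "Comedy"), "A Bug's Life": ("Animation", "Comedy"),
    "Cars": ("Animation", "Comedy"), "Monsters, Inc.": ("Animation", "Comedy"),
    "Heat": ("Crime",),
}
ratings = [
    ("ann", "Toy Story", 9), ("ann", "A Bug's Life", 8), ("ann", "Cars", 9), ("ann", "Heat", 10),
    ("bob", "Toy Story", 8), ("bob", "Cars", 8), ("bob", "Monsters, Inc.", 9),
    ("cid", "Toy Story", 5), ("cid", "A Bug's Life", 10),  # cid is not a fan
    ("dee", "Toy Story", 10), ("dee", "Cars", 6),          # rating too low to count
]
top = recommend(ratings, genres, "Toy Story", seen_by_me={"Monsters, Inc."})
print(top)  # [('Cars', 2), ("A Bug's Life", 1)]
```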
51. Cypher Query: k-NN Recommendation
MATCH (m:Movie)<-[r:RATED]-(b:Person)-[s:SIMILARITY]-(p:Person {name: 'Zoltan Varju'})
WHERE NOT((p)-[:RATED|WATCHED]->(m))
WITH m, s.similarity AS similarity, r.rating AS rating
ORDER BY m.name, similarity DESC
WITH m.name AS movie, COLLECT(rating)[0..3] AS ratings
WITH movie, REDUCE(s = 0, i IN ratings | s + i) * 1.0 / SIZE(ratings) AS recommendation
ORDER BY recommendation DESC
RETURN movie, recommendation
LIMIT 25
What are the top 25 movies
• that Zoltan Varju has not seen
• using the average rating
• by Zoltan's top 3 neighbors (most similar raters of each movie)
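The Python sketch below follows the logic of the slide 51 query, assuming similarities are stored as undirected `(a, b) -> score` pairs and ratings as `(rater, movie) -> rating`. All names and data are hypothetical. For each movie the person has not seen, the neighbours' ratings are sorted by similarity, the top k (3, as in `COLLECT(rating)[0..3]`) are kept, and they are averaged. A movie rated by fewer than k neighbours is averaged over the ratings it has, as in the Cypher.

```python
def knn_recommend(person, similarity, ratings, seen, k=3, limit=25):
    """Average each unseen movie's ratings from the k most similar raters.

    similarity: {(a, b): score}, treated as undirected
    ratings:    {(rater, movie): rating}
    seen:       movies `person` has RATED or WATCHED
    """
    neighbours = {}
    for (a, b), score in similarity.items():
        if person in (a, b):
            neighbours[b if a == person else a] = score

    by_movie = {}
    for (rater, movie), rating in ratings.items():
        if rater in neighbours and movie not in seen:
            by_movie.setdefault(movie, []).append((neighbours[rater], rating))

    scored = []
    for movie, pairs in by_movie.items():
        pairs.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
        top = [rating for _, rating in pairs[:k]]
        scored.append((movie, sum(top) / len(top)))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:limit]


similarity = {("Zoltan Varju", "a"): 0.9, ("b", "Zoltan Varju"): 0.8,
              ("Zoltan Varju", "c"): 0.7, ("Zoltan Varju", "d"): 0.1}
ratings = {("a", "Alien"): 9, ("b", "Alien"): 7, ("c", "Alien"): 8, ("d", "Alien"): 1,
           ("a", "Brazil"): 6, ("d", "Brazil"): 4, ("b", "Casablanca"): 10}
result = knn_recommend("Zoltan Varju", similarity, ratings, seen={"Casablanca"})
print(result)  # [('Alien', 8.0), ('Brazil', 5.0)]
```

The least similar neighbour's rating of 1 for Alien is dropped because only the three most similar raters count.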
57. Neo4j Clustering Architecture Optimized for Speed & Availability at Scale
Performance Benefits:
• No network hops within queries
• Real-time operations with fast and consistent response times
• Cache sharding spreads cache across cluster for very large graphs
Clustering Features:
• Master-slave replication with master re-election and failover
• Each instance has its own local cache
• Horizontal scaling & disaster recovery
[Diagram: a load balancer in front of three Neo4j instances]
58. Getting Data into Neo4j
Cypher-Based “LOAD CSV” Capability
• Transactional (ACID) writes
• Initial and incremental loads of up to 10 million nodes and relationships
Command-Line Bulk Loader neo4j-import
• For initial database population
• For loads with 10B+ records
• Up to 1M records per second
4.58 million things and their relationships… loads in 100 seconds!
59. Neo4j Fits into Your Enterprise Environment
[Diagram: an application backed by a Neo4j graph database cluster (three Neo4j instances) alongside existing databases for data storage and business rules execution; ETL into bulk analytic infrastructure (graph compute engine, Hadoop, EDW, …) for data mining and aggregation; ad hoc analysis by data scientists and end users]
60. Value from Relationships – Common Use Cases
Internal Applications
• Master Data Management
• Network and IT Operations
• Fraud Detection
Customer-Facing Applications
• Real-Time Recommendations
• Graph-Based Search
• Identity and Access Management
69. Recommend Love
Find your soulmate in the graph
• Are they energetic?
• Do they like dogs?
• Have a good sense of humor?
• Neat and tidy, but not crazy about it?
What are the top 10 potential mates for me
• that are in the same location
• are sexually compatible
• have traits I want
• want traits I have
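The four conditions on slide 69 can be turned into a filter and a ranking. The Python sketch below does this over a hypothetical `Profile` record. The scoring rule, which adds the number of wanted traits the other person has to the number of my traits they want, is an assumption made for illustration; the slide does not say how candidates are ranked. Ties are broken alphabetically so the output is deterministic.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    name: str
    location: str
    gender: str
    seeking: set   # genders this person is interested in
    has: set       # traits this person has
    wants: set     # traits this person wants in a partner


def potential_mates(me, people, limit=10):
    """Same location, mutual interest, and traits wanted in both directions."""
    scored = []
    for other in people:
        if other.name == me.name or other.location != me.location:
            continue
        if other.gender not in me.seeking or me.gender not in other.seeking:
            continue
        have_what_i_want = len(me.wants & other.has)
        want_what_i_have = len(other.wants & me.has)
        if have_what_i_want and want_what_i_have:
            # Hypothetical score: total number of matched traits.
            scored.append((have_what_i_want + want_what_i_have, other.name))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:limit]]


me = Profile("max", "NYC", "m", {"f"}, {"energetic", "funny"}, {"likes dogs", "funny", "tidy"})
people = [
    Profile("ann", "NYC", "f", {"m"}, {"likes dogs", "funny"}, {"funny"}),
    Profile("bea", "NYC", "f", {"m"}, {"tidy"}, {"energetic", "funny"}),
    Profile("cat", "LA", "f", {"m"}, {"funny"}, {"funny"}),    # wrong location
    Profile("dee", "NYC", "f", {"f"}, {"funny"}, {"funny"}),   # not compatible
    Profile("eve", "NYC", "f", {"m"}, {"tidy"}, {"rich"}),     # wants nothing I have
]
print(potential_mates(me, people))  # ['ann', 'bea']
```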
73. Walmart
BUSINESS CASE
World’s largest company by revenue
World’s largest retailer and private employer
SF-based global e-commerce division manages several websites
Incorporated in 1969 in Bentonville, Arkansas
• Needed online customer recommendations to keep pace with competition
• Data connections provided predictive context, but were not in a usable format
• Solution had to serve many millions of customers and products while maintaining superior scalability and performance
74. Walmart
SOLUTION
• Brings customers, preferences, purchases, products and locations into a graph model
• Uses connections to make product recommendations
• Solution deployed across Walmart divisions and websites
75. Global Courier
BUSINESS CASE
World’s largest courier
480,000 employees
€55 billion in revenue
Needed new B2C and B2B parcel routing system for its logistics practice
Legacy system supported neither the full network nor the shift to online demand
Needed to replace aging B2B and B2C parcel routing system whose requirements include:
• 24x7 availability
• Peak loads of 5M parcels per day, 3K per second
• Support for complex and diverse software stack
• Predictable performance with linear scalability
• Daily changes to logistics networks
• Route from any point to any point
• Single point of truth for entire network
76. Global Courier
SOLUTION
Neo4j provides the ideal domain fit since a logistics network is a graph
• High availability and performance via Neo4j clustering
• Greatly simplified Cypher queries for routing versus relational SQL queries
• Flexible data model that reflects the real logistics world far better than relational
• Easy-to-grasp, whiteboard-friendly model
77. eBay
BUSINESS CASE
C2C and B2C retail network
Full e-commerce functionality for individuals and businesses
Integrated with logistics vendors for product deliveries
• Needed an offering to compete with Amazon Prime
• Enable customer-selected delivery inside 90 minutes
• Calculate best route option in real time
• Scale to enable a variety of services
• Offer more predictable delivery times
78. eBay Now
SOLUTION
• Acquired UK-based Shutl, a leader in same-day delivery
• Used Neo4j to create eBay Now
• 1000 times faster than the prior MySQL-based solution
• Faster time-to-market
• Improved code quality with 10 to 100 times less query code
79. Classmates
BUSINESS CASE
Online yearbook connecting friends from school, work and military in US and Canada
Founded as Memory Lane in Seattle
Develop new social networking capabilities to monetize yearbook-related offerings
• Show all the people I know in a yearbook
• Show yearbooks my friends appear in most often
• Show sections of a yearbook that my friends appear in most
• Show me other schools my friends attended
80. Classmates
SOLUTION
Neo4j provides a robust and scalable graph database solution
• 3-instance cluster with cache sharding and disaster recovery
• 18 ms response time for top 4 queries
• 100M nodes and 600M relationships in initial graph, including people, images, schools, yearbooks and pages
• Projected to grow to 1B nodes and 6B relationships
81. National Geographic
BUSINESS CASE
Non-profit scientific and educational institution founded in 1888
Covers geography, archaeology, natural science, environment and historical conservation
Journals, online media, radio, TV, documentaries, live events and consumer content and goods
• Improve poor performance of PostgreSQL app
• Increase user engagement by linking to 100+ years of multimedia content
• Improve targeting by understanding subscribers’ interests better
• Recommend content and services to users based on their interests
82. National Geographic
SOLUTION
• Enabled complex real-time analytics across eight million users and a century of content
• Delivered robust performance by eliminating triple-nested SQL joins
• Cross-refers users among content, live events, travel, goods and causes
• Neo4j solution much less cumbersome and easier to maintain than previous SQL system
83. Curaspan
BUSINESS CASE
Leader in patient management for discharges and referrals
Manages patient referrals for 4,600+ health care facilities
Connects providers and payers via web-based patient management platform
Founded in 1999 in Newton, Massachusetts
• Improve poor performance of Oracle solution
• Support more complexity, including granular, role-based access control
• Satisfy complex Graph Search queries by discharge nurses and intake coordinators
Find a skilled nursing facility within n miles of a given location, belonging to health care group XYZ, offering speech therapy and cardiac care, and optionally Italian language services
84. Curaspan
SOLUTION
• Met fast, real-time performance demands
• Supported queries spanning multiple hierarchies, including provider and employee-permissions graphs
• Improved data model to handle adding more dimensions to the data, such as insurance networks, service areas and care organizations
• Greatly simplified queries, condensing multi-page SQL statements into one Neo4j function
85. FiftyThree
BUSINESS CASE
Maker of Paper, one of the top apps in Apple’s App Store, with millions of users
Based in New York City
• Add social capabilities to digital-paper app
• Support social collaboration across millions of users in new Mix app
• Enable seamless interaction between social and content-asset networks
• Ensure new apps are robust, scalable and fast
86. FiftyThree
SOLUTION
• Neo4j data model ideal for social network, content management and access control
• Users create, publish and share designs simply
• Easy to develop and evolve Neo4j-based app
• Integrates well with FiftyThree EC2 architecture
See the Neo4j solution in action: Betting the Company (Literally) on a Graph Database
http://aseemk.com/talks/neo4j-lessons-learned#/
App Store Editor’s Choice
2012 iPad App of the Year
Apple Best Apps of 2014