OrientDB vs Neo4j
Comparison (queries and functionality)
Curtis Mosters
@02.12.2014
Content
• Schema
• Indexes
• Comparison
• Query/Speed
• Functionality
• Results
2OrientDB vs Neo4j - Comparison
Prototype Comparison
Schema
• Person (ID:INTEGER, name:String)
• Appln (ID:INTEGER, title:String)
• Abstract (ID:INTEGER, abstract:String)
• Person -[WROTE]-> Appln
• Appln -[HAS_ABSTRACT]-> Abstract
Indexes
• Appln.title
  • LUCENE FULLTEXT
• Appln.ID
  • SBTREE UNIQUE (in Neo4j the usual INDEX)
• Person.name
  • LUCENE FULLTEXT
• Person.ID
  • SBTREE UNIQUE (in Neo4j the usual INDEX)
Queries and systems used
• Comparing the speed of both on typical requests
• Linux 64-bit (same instance on AWS)
• OrientDB v.2.0M2
• Neo4j v.2.1.5
• Speed tests were run in the same order as the slides/rows
• One database per instance, i.e. 2 instances
• Servers are idle, with only OrientDB/Neo4j running
• Queries are tested by hand on the command line (not in the studio)
• Queries always return the same results on both databases
• Times are always given in milliseconds (ms) unless specified otherwise
• Both databases use the StandardAnalyzer from Lucene
• Cache cleared after queries
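The measurements in this deck were taken by hand on the command line. As a rough sketch of how a single timing in milliseconds could be captured programmatically instead, the helper below wraps any callable. The names `time_ms` and `fake_query` are hypothetical, and `fake_query` is only a stand-in for a real database call.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def time_ms(run_query: Callable[[], T]) -> Tuple[T, float]:
    """Run one query callable and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = run_query()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


def fake_query() -> int:
    """Stand-in for a real driver call (assumption, not part of the deck)."""
    return sum(range(1000))


result, elapsed = time_ms(fake_query)
print(f"result={result}, elapsed={elapsed:.3f} ms")
```

A wall-clock timer like this includes client and network overhead, so its numbers would not be directly comparable to the hand-measured values on the slides.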
System cache notes
• OrientDB always clears the cache when restarted
• Neo4j does not clear the cache
• So in the Neo4j column I tested in some cases with the system cache cleared and in others without
• If there is just one Neo4j column, it is "No system cache cleared"
Comparison (Query/Speed)
Import
OrientDB
• Officially supported methods
  • OrientDB-ETL/JDBC
  • Java API
• Clean Java code
• ETL tool is performant but had issues with edge creation in the latest tests
• Not using multi-threading
• Not using mapping
Neo4j
• Officially supported methods
  • LOAD CSV command
  • Java API
  • Groovy
  • Batch-Importer
  • Talend
• No really "easy" way, but Java is the fastest and most reliable way
• Using multi-threading and mapping
~300 million lines {APPLNs, TITLEs, PERSONs} with edges and indexes
• OrientDB: 25 hours
• Neo4j: 19 hours
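To put the two import times in perspective, the arithmetic below converts them into an approximate throughput. It assumes the first figure (25 hours) belongs to OrientDB and the second (19 hours) to Neo4j, following the slide's column order, and it treats "~300 million" as exactly 300,000,000 lines, so the results are rough estimates only.

```python
LINES = 300_000_000  # "~300mio lines" from the slide, taken as an approximation


def lines_per_second(total_lines: int, hours: float) -> float:
    """Average import throughput over the whole run."""
    return total_lines / (hours * 3600)


orientdb_rate = lines_per_second(LINES, 25)  # assumed: OrientDB column
neo4j_rate = lines_per_second(LINES, 19)     # assumed: Neo4j column

print(round(orientdb_rate), round(neo4j_rate))
print(f"speed-up: {25 / 19:.2f}x")
```

Under these assumptions the import ran at roughly 3,300 lines per second on OrientDB and roughly 4,400 lines per second on Neo4j.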
Startup/Shutdown speed
OrientDB
• Nearly always the same time when starting or shutting down the server
• 2 sec – 10 sec
Neo4j
• Varying times when starting and especially when shutting down the server while a task is still running
• 3 sec – 3 min (no information given)
Good for testing and later reliability
Query #1
Checking Single ID lookup
OrientDB: SELECT FROM Appln WHERE ID=?
Neo4j: MATCH (a:Appln) WHERE a.ID=? RETURN a
ID | OrientDB | Neo4j (no system cache cleared) | Neo4j (system cache cleared)
1412 | 27 | 71 | 939
763773 | 9 | 30 | 44
234526 | 15 | 26 | 43
858584 | 10 | 25 | 44
536367 | 11 | 25 | 43
2323 | 17 | 18 | 31
5267 | 1 | 15 | 24
73573 | 14 | 29 | 35
585985 | 10 | 25 | 34
797977 | 10 | 26 | 35
Average | 12.4 (10 of 10) | 29 (0 of 10) |
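The "Average" row and the "n of 10" counts can be reproduced from the Query #1 timings. The sketch below copies the OrientDB column and the Neo4j "no system cache cleared" column from the slide. The helper name `summarize` is hypothetical, and counting a win only when a value is strictly lower is an assumption about how the slide's counts were derived.

```python
from statistics import mean

# Query #1 timings in ms, copied from the slide.
orientdb = [27, 9, 15, 10, 11, 17, 1, 14, 10, 10]
neo4j_no_clear = [71, 30, 26, 25, 25, 18, 15, 29, 25, 26]


def summarize(a, b):
    """Return (mean of a, mean of b, number of rows where a is strictly faster)."""
    wins_a = sum(1 for x, y in zip(a, b) if x < y)
    return mean(a), mean(b), wins_a


avg_orientdb, avg_neo4j, wins_orientdb = summarize(orientdb, neo4j_no_clear)
print(avg_orientdb, avg_neo4j, f"{wins_orientdb} of {len(orientdb)}")
```

The computed means (12.4 ms and 29 ms) and the 10 of 10 count agree with the values shown on the slide.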
Query #2
Checking Fulltext Lucene Lookup
OrientDB: SELECT FROM (SELECT title,ID FROM Appln WHERE title LUCENE "?" ORDER BY ID) LIMIT 10
Neo4j: START n=node:titles('title:?') RETURN n.title,n.ID ORDER BY n.ID LIMIT 10
Note on Neo4j: each additional word needs its own property clause, e.g. instead of 'title:super efficient' we use 'title:super OR title:efficient'
Search term | OrientDB | Neo4j (no system cache cleared) | Neo4j (system cache cleared)
solar | 10172 | 801 | 137088
panel | 263698 | 121494 | 161215
druck | 25582 | 9679 | 11290
machine | 1146339 | 297645 | 357818
cell | 253565 | 55397 | 26298
automatic vehicle | 961054 | 131772 | 163794
super efficient | 53380 | 8432 | 8707
motor | 398803 | 79527 | 46687
airplane | 14066 | 892 | 390
windshield | 8969 | 1004 | 536
Average | 313 sec (5.2 min) (0 of 10) | 70 sec (10 of 10) |
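The Neo4j note on this slide (one `field:word` clause per word, joined with OR) can be sketched as a small helper. The function name `lucene_or_query` is hypothetical, and the sketch does not escape Lucene special characters, so it is only suitable for plain words like the search terms used here.

```python
def lucene_or_query(field: str, text: str) -> str:
    """Turn 'super efficient' into 'title:super OR title:efficient'."""
    words = text.split()
    if not words:
        raise ValueError("search text must contain at least one word")
    return " OR ".join(f"{field}:{word}" for word in words)


print(lucene_or_query("title", "super efficient"))
# title:super OR title:efficient
print(lucene_or_query("title", "solar"))
# title:solar
```

The resulting string would take the place of the `'title:?'` placeholder in the `START n=node:titles(...)` query.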
Query #3.1
Checking Fulltext Lucene Lookup Overall Count on 1 index
OrientDB: SELECT $totalHits FROM Appln WHERE title LUCENE "?" LIMIT 1
Neo4j: START n=node:titles("title:?") RETURN count(*)
Note on Neo4j: each additional word needs its own property clause, e.g. instead of 'title:super efficient' we use 'title:super OR title:efficient'
Search term | OrientDB | Neo4j
solar | 4611 | 215263
panel | 3318 | 77442
druck | 2890 | 12503
machine | 1846 | 198479
cell | 2351 | 34685
automatic vehicle | 1063 | 49283
super efficient | 984 | 4054
motor | 465 | 47085
airplane | 1172 | 429
windshield | 62 | 585
Average | 9 of 10 | 1 of 10
Query #3.2
Checking Fulltext Lucene Lookup Overall Count on 2 indexes
OrientDB: SELECT $totalHits FROM Appln WHERE [title,abstract] LUCENE "?" LIMIT 1
Neo4j: START n=node:titles('title:?') MATCH (n)-[:HAS_ABSTRACT]->(a) WHERE a.abstract =~ ".*?.*" RETURN count(*)
Note on Neo4j: each additional word needs its own property clause, e.g. instead of 'title:super efficient' we use 'title:super OR title:efficient'
solar 227234
panel
druck
machine
cell
automatic vehicle
super efficient
motor
airplane
windshield
Average
Query #4
Internal ID function node lookup
OrientDB: SELECT title FROM #11:? / SELECT name FROM #12:?
Neo4j: START n=node(?) RETURN n.title / START n=node(?) RETURN n.name
OrientDB ID | Neo4j ID | OrientDB | Neo4j (no system cache cleared) | Neo4j (system cache cleared)
11:0 | 0 | 1 | 10 | 816
11:141 | 141 | 1 | 13 | 27
11:26526 | 26526 | 3 | 13 | 28
11:2526 | 2526 | 2 | 12 | 27
11:6262 | 6262 | 1 | 12 | 28
12:0 | 76594275 | 1 | 11 | 25
12:515 | 76594790 | 2 | 14 | 23
12:4115 | 76598390 | 3 | 14 | 25
12:52627 | 76646902 | 2 | 13 | 26
12:47484 | 76641759 | 1 | 13 | 25
Average | | 2 (10 of 10) | 13 (0 of 10) |
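The ID pairs on this slide follow a visible pattern: OrientDB record IDs in cluster 11 map to the same Neo4j node ID, while those in cluster 12 are shifted by 76594275. The sketch below encodes that observation. The offsets are inferred from the ten pairs shown and are specific to this import, not a documented rule of either database, and the function name `rid_to_node_id` is hypothetical.

```python
# Offsets inferred from the ID pairs on the slide (an observation about this
# particular data set, not a general OrientDB/Neo4j mapping).
CLUSTER_OFFSETS = {11: 0, 12: 76594275}


def rid_to_node_id(rid: str) -> int:
    """Map an OrientDB record ID such as '12:515' to the matching Neo4j node ID."""
    cluster, position = (int(part) for part in rid.lstrip("#").split(":"))
    return CLUSTER_OFFSETS[cluster] + position


for rid in ["11:141", "12:0", "12:515", "12:52627"]:
    print(rid, rid_to_node_id(rid))
```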
Query #5
Count Applns of a specific Person
OrientDB: SELECT out(WROTE).size() FROM #?
Neo4j: START p=node(?) MATCH (p)-[:WROTE]->(a) RETURN count(*)
OrientDB ID | Neo4j ID | OrientDB | Neo4j (no system cache cleared) | Neo4j (system cache cleared)
12:0 | 76594275 | 8 | 81 | 980
12:1 | 76594276 | 1 | 18 | 42
12:2 | 76594277 | 1 | 20 | 41
12:3 | 76594278 | 1 | 18 | 38
12:4 | 76594279 | 1 | 17 | 39
12:5 | 76594280 | 1 | 23 | 41
12:6 | 76594281 | 1 | 21 | 37
12:7 | 76594282 | 1 | 17 | 43
12:8 | 76594283 | 1 | 18 | 45
12:9 | 76594284 | 1 | 17 | 41
Average | | 1 (10 of 10) | 25 (0 of 10) |
Query #6
Searching for 3 Applns of one specific Person
OrientDB: select out.@class as sourceClass, out.@rid as source, out.name as sourceName, in.@class as targetClass, in.@rid as target, in.ID as targetID, in.nrEpodoc as targetName from (select expand(outE('WROTE')) from #?) order by targetID ASC limit 3
Neo4j: START p=node(?) MATCH (p)-[:WROTE]->(a) RETURN labels(p) as sourceClass, id(p) as source, p.name as sourceName, labels(a) as targetClass, id(a) as target, a.nrEpodoc as targetName ORDER BY a.ID ASC LIMIT 3
OrientDB ID | Neo4j ID | OrientDB | Neo4j (no system cache cleared) | Neo4j (system cache cleared)
12:0 | 76594275 | 1051 | 107 | 212
12:1 | 76594276 | 3 | 39 | 77
12:2 | 76594277 | 2 | 40 | 68
12:3 | 76594278 | 2 | 38 | 60
12:4 | 76594279 | 3 | 41 | 58
12:5 | 76594280 | 53 | 59 | 55
12:6 | 76594281 | 56 | 53 | 59
12:7 | 76594282 | 7 | 38 | 56
12:8 | 76594283 | 5 | 38 | 62
12:9 | 76594284 | 2 | 33 | 66
Average | | 118 (8 of 10) | 49 (2 of 10) |
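In Query #6 OrientDB is faster in 8 of 10 rows yet has the worse average, because one slow run (1051 ms) dominates the mean. The sketch below contrasts mean and median for the two columns. The median is not reported on the slides and is computed here only to illustrate the effect of the outlier.

```python
from statistics import mean, median

# Query #6 timings in ms, copied from the slide
# (OrientDB column and Neo4j "no system cache cleared" column).
orientdb = [1051, 3, 2, 2, 3, 53, 56, 7, 5, 2]
neo4j = [107, 39, 40, 38, 41, 59, 53, 38, 38, 33]

orientdb_mean, orientdb_median = mean(orientdb), median(orientdb)
neo4j_mean, neo4j_median = mean(neo4j), median(neo4j)

print(f"OrientDB mean={orientdb_mean} median={orientdb_median}")
print(f"Neo4j    mean={neo4j_mean} median={neo4j_median}")
```

The means (118.4 and 48.6) round to the averages shown on the slide, while the medians (4 ms for OrientDB, 39.5 ms for Neo4j) reflect the typical run rather than the single slow one.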
Query #7
Searching for Appln.title and Appln.abstract
return Person.name matching both
OrientDB: SELECT FROM (SELECT title,abstract,ID from Appln where [title,abstract] LUCENE "?" ORDER BY ID) LIMIT 3
Neo4j: START p=node:titles('title:?') MATCH (p)-[:HAS_ABSTRACT]->(a) WHERE a.abstract =~ ".*?.*" RETURN p.title,a.abstract,a.ID ORDER BY a.ID LIMIT 3
Title | OrientDB | Neo4j
panel | 1733261 | 424789
Average | |
Query #7
Searching a Person.name + searching on Appln.title for Appln of that specific Person
return Person.name matching both
Neo4j: START p=node:people('name:?') MATCH (p)-[:WROTE]->(a) WHERE a.title =~ ".*?.*" RETURN p.name,a.title,a.ID ORDER BY a.ID LIMIT 3
Title
machine 99538
Average
Query #8
Searching for an Abstract of an Appln
OrientDB: select @rid,abstract,ID as titleID,in(HAS_ABSTRACT).title as title,in(HAS_ABSTRACT).ID as AbstrID from Abstract where abstract LUCENE "method" LIMIT 3
Neo4j: START n=node:abstracts("abstract:method") WITH n limit 3 MATCH (x:Appln)-[:HAS_ABSTRACT]->(n) RETURN n.ID,x.ID
Note on Neo4j: each additional word needs its own property clause, e.g. instead of 'title:super efficient' we use 'title:super OR title:efficient'
solar
panel
druck
machine
cell
automatic vehicle
super efficient
motor
airplane
windshield
Average
Query #9
Counting the Applns of Person.names containing a specific name
OrientDB: SELECT sum(out(WROTE).size()) FROM Person WHERE name LUCENE "?" LIMIT -1
Neo4j: START p=node:people('name:?') MATCH (p)-[:WROTE]->(a) RETURN count(a)
Name | OrientDB | Neo4j
bosch | 7475 | 3771
intel | 13261 | 7461
siemens | 19302 | 16297
audi | 3888 | 1844
volkswagen | 2872 | 1298
toyota | 23223 | 13561
sony | 16520 | 11449
panasonic | 6314 | 2287
microsoft | 2849 | 1313
apple | 3127 | 1088
Average | 0 of 10 | 10 of 10
Comparison (Functionality)
Database Overview
OrientDB
• Schema, naming policies, overall records, cluster information and much more
• Whole page in 0.1 sec
Neo4j
• No schema information except naming policies
• Counting the nodes of a single label takes ~10 min
• Neo4j's supported way to get information on all labels in one query just gives a heap error (maybe too much data?)
Easy and fast way to check the state of the database
Graph Explorer
OrientDB (v.2 only!)
• Good overview, straightforward and fast
• Nodes can be edited, edges added
• Behaves like a never-ending graph
Neo4j
• Shows nodes/edges and, when clicked, some information about them
• No other features, not even zooming or dragging all elements
Good for checking graph issues as close as possible to the database
Result view
OrientDB
• Great overview; paging is possible to reduce display and query time
• If you forget to set a "LIMIT", it is set for you!
• Uses the new GraphTab for visual things (v.2!)
Neo4j
• Graph and Table view
• Forgot to set a LIMIT? Go for a smoke
• Graph view can only show up to 10 nodes
• Table view scrolls endlessly
Getting an overview is quite important to check specific query issues
Function integration
OrientDB
• Good overview and management
• Integrated in the Studio
• No restart needed
• Functions can even be copied to another db
Neo4j
• Server plugins [1]
• Need to be written in Java and inherit from the ServerPlugin class
• No overview
• Not fail-safe
• No easy change/access
• Requires server restart
• Many lines for simple things
Needed for exchanging information with the prototype
Query style
OrientDB
• Simple queries are really short
• Queries are hard to write once they get complex
• Poor overview, and using variable names is not intuitive
Neo4j
• Simple queries are really long due to the required Cypher statements
• Complex queries are also easy to write
• Using variable names is very intuitive and always keeps the overview
Useful for result checking and testing
Lucene Index
OrientDB
• Still a "new" add-on
• Before v.2 a plugin was needed
• With v.2 integrated in OrientDB
• Use it as if you were setting a usual index
• Index can easily be changed at any time
• Analyzer can be easily changed
Neo4j
• Neo4j does not always use Lucene as indexer
• Needs to be set before importing data
• Works together with the node_auto_index configuration
• Changing the index or switching it to Lucene after the import is not viable in terms of time
• Analyzer is not easy to change
Important for the full-text search that the new graph tab builds on
Security
OrientDB
• Different security levels (like in MySQL)
Neo4j
• None
Good for integrating more databases and setting access levels
Disc usage
OrientDB
• DB size = 120 GB
• Classes in different files
• Classes can also easily be deleted by deleting their files externally
Neo4j
• DB size = 40 GB
• Nodes, properties and relations in separate files
• Specific data can only be deleted with Neo4j commands
Good for testing and later reliability
Future Perspective
OrientDB
• OrientDB is still "new" on the market, with many features still to come
• Still much room for improvement
• Offers the possibility to replace MySQL
Neo4j
• Neo4j is the "oldest" graph database and has nearly every feature
• Algorithms are already optimized as far as possible
• No possibility to replace a current system, just an extension for using graphs
To look ahead of the current state
Costs
OrientDB
• Good support available for free
• Commercial support much cheaper than Neo4j's
• Enterprise version available with good monitoring features
Neo4j
• Commercial support needed to set up a well-defined database
• Features like clustering only available when paying (e.g. important for our where clause)
Important for startups
Support / Production speed / Own Ideas
OrientDB
• Good support via
  • E-Mail
  • Google Group (anyone from the team helping)
  • Gitter
  • Github
• New release every 2-3 weeks
• Own issues answered in 1-2 days
• Own ideas are discussed; 30-40 comments in Github every day
Neo4j
• Poor support for the most popular graph db
• Google Group is only a semi-active community
• Just one member from Neo4j helping there
• New release every 1-2 months
• Own issues answered in ~1 week
• Own ideas are mainly ignored; 20-30 comments in Github every day
Important for solving issues later
Results (Speed)
Measure | OrientDB | Neo4j
Import | no use of MT/mapping | full use of MT/mapping
Startup/Shutdown Speed | x | -
Query #1 Checking Single ID lookup | x | -
Query #2 Checking Fulltext Lucene Lookup | - | x
Query #3.1 Checking Fulltext Lucene Lookup Overall Count on 1 index | x | -
Query #3.2 Checking Fulltext Lucene Lookup Overall Count on 2 indexes | - | -
Query #4 Internal ID function node lookup | x | -
Query #5 Count Applns of a specific Person | x | -
Query #6 Searching for 3 Applns of one specific Person | single outlier causing a poor average value | always about the same speed
Query #7 Searching a Person.name + searching on Appln.title for Appln | - | -
Query #8 Searching for an Abstract of an Appln | - | -
Query #9 Counting the Applns of Person.names containing a specific name | - | x
Results | 4 | 3
Results (Misc)
Measure | OrientDB | Neo4j
Database Overview | x |
Graph Explorer | x |
Result View | x |
Function Integration | x |
Query style | | x
Lucene Index | x |
Security | x |
Disc Usage | every class in a single file | using less disk space
Future Perspective | x |
Costs | x |
Support / Production Speed / Own ideas | x |
Results | 9 | 1
Results
• OrientDB is working on fixing the very slow queries
• OrientDB sometimes has inconsistent query speed (very fast and very slow)
• OrientDB Studio is on a whole other level
• Neo4j Studio is nearly useless compared to OrientDB's
Supporters
• I want to give special thanks to Michael Hunger; without him the Neo4j import would still be having trouble
• I also want to thank Enrico Risa for his help and fast implementation of Lucene improvements
• Keep up the great work!
Links
• [1] http://docs.neo4j.org/chunked/stable/server-plugins.html
• [2] http://docs.neo4j.org/refcard/2.0/
37OrientDB vs Neo4j - Comparison

Contenu connexe

Tendances

Frustration-Reduced Spark: DataFrames and the Spark Time-Series Library
Frustration-Reduced Spark: DataFrames and the Spark Time-Series LibraryFrustration-Reduced Spark: DataFrames and the Spark Time-Series Library
Frustration-Reduced Spark: DataFrames and the Spark Time-Series LibraryIlya Ganelin
 
Using Neo4j from Java
Using Neo4j from JavaUsing Neo4j from Java
Using Neo4j from JavaNeo4j
 
Beyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFramesBeyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFramesDatabricks
 
Introducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data ScienceIntroducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data ScienceDatabricks
 
Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...
Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...
Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...Databricks
 
Strata NYC 2015 - What's coming for the Spark community
Strata NYC 2015 - What's coming for the Spark communityStrata NYC 2015 - What's coming for the Spark community
Strata NYC 2015 - What's coming for the Spark communityDatabricks
 
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on TutorialsSparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on TutorialsDatabricks
 
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...Julian Hyde
 
Sasi, cassandra on full text search ride
Sasi, cassandra on full text search rideSasi, cassandra on full text search ride
Sasi, cassandra on full text search rideDuyhai Doan
 
Best Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache SparkBest Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache SparkDatabricks
 
Spark SQL with Scala Code Examples
Spark SQL with Scala Code ExamplesSpark SQL with Scala Code Examples
Spark SQL with Scala Code ExamplesTodd McGrath
 
Demystifying DataFrame and Dataset
Demystifying DataFrame and DatasetDemystifying DataFrame and Dataset
Demystifying DataFrame and DatasetKazuaki Ishizaki
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsDatabricks
 
Works with persistent graphs using OrientDB
Works with persistent graphs using OrientDB Works with persistent graphs using OrientDB
Works with persistent graphs using OrientDB graphdevroom
 
SQL on Big Data using Optiq
SQL on Big Data using OptiqSQL on Big Data using Optiq
SQL on Big Data using OptiqJulian Hyde
 
Spark Cassandra Connector Dataframes
Spark Cassandra Connector DataframesSpark Cassandra Connector Dataframes
Spark Cassandra Connector DataframesRussell Spitzer
 
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
A Rusty introduction to Apache Arrow and how it applies to a  time series dat...A Rusty introduction to Apache Arrow and how it applies to a  time series dat...
A Rusty introduction to Apache Arrow and how it applies to a time series dat...Andrew Lamb
 
Structured Streaming for Columnar Data Warehouses with Jack Gudenkauf
Structured Streaming for Columnar Data Warehouses with Jack GudenkaufStructured Streaming for Columnar Data Warehouses with Jack Gudenkauf
Structured Streaming for Columnar Data Warehouses with Jack GudenkaufDatabricks
 
Data centric Metaprogramming by Vlad Ulreche
Data centric Metaprogramming by Vlad UlrecheData centric Metaprogramming by Vlad Ulreche
Data centric Metaprogramming by Vlad UlrecheSpark Summit
 

Tendances (20)

Frustration-Reduced Spark: DataFrames and the Spark Time-Series Library
Frustration-Reduced Spark: DataFrames and the Spark Time-Series LibraryFrustration-Reduced Spark: DataFrames and the Spark Time-Series Library
Frustration-Reduced Spark: DataFrames and the Spark Time-Series Library
 
Using Neo4j from Java
Using Neo4j from JavaUsing Neo4j from Java
Using Neo4j from Java
 
Beyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFramesBeyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFrames
 
Introducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data ScienceIntroducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data Science
 
Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...
Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...
Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...
 
Strata NYC 2015 - What's coming for the Spark community
Strata NYC 2015 - What's coming for the Spark communityStrata NYC 2015 - What's coming for the Spark community
Strata NYC 2015 - What's coming for the Spark community
 
Real-World NoSQL Schema Design
Real-World NoSQL Schema DesignReal-World NoSQL Schema Design
Real-World NoSQL Schema Design
 
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on TutorialsSparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
 
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
 
Sasi, cassandra on full text search ride
Sasi, cassandra on full text search rideSasi, cassandra on full text search ride
Sasi, cassandra on full text search ride
 
Best Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache SparkBest Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache Spark
 
Spark SQL with Scala Code Examples
Spark SQL with Scala Code ExamplesSpark SQL with Scala Code Examples
Spark SQL with Scala Code Examples
 
Demystifying DataFrame and Dataset
Demystifying DataFrame and DatasetDemystifying DataFrame and Dataset
Demystifying DataFrame and Dataset
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
 
Works with persistent graphs using OrientDB
Works with persistent graphs using OrientDB Works with persistent graphs using OrientDB
Works with persistent graphs using OrientDB
 
SQL on Big Data using Optiq
SQL on Big Data using OptiqSQL on Big Data using Optiq
SQL on Big Data using Optiq
 
Spark Cassandra Connector Dataframes
Spark Cassandra Connector DataframesSpark Cassandra Connector Dataframes
Spark Cassandra Connector Dataframes
 
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
A Rusty introduction to Apache Arrow and how it applies to a  time series dat...A Rusty introduction to Apache Arrow and how it applies to a  time series dat...
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
 
Structured Streaming for Columnar Data Warehouses with Jack Gudenkauf
Structured Streaming for Columnar Data Warehouses with Jack GudenkaufStructured Streaming for Columnar Data Warehouses with Jack Gudenkauf
Structured Streaming for Columnar Data Warehouses with Jack Gudenkauf
 
Data centric Metaprogramming by Vlad Ulreche
Data centric Metaprogramming by Vlad UlrecheData centric Metaprogramming by Vlad Ulreche
Data centric Metaprogramming by Vlad Ulreche
 

Similaire à OrientDB vs Neo4j - Comparison of query/speed/functionality

MySQL Performance Optimization
MySQL Performance OptimizationMySQL Performance Optimization
MySQL Performance OptimizationMindfire Solutions
 
Distributed Queries in IDS: New features.
Distributed Queries in IDS: New features.Distributed Queries in IDS: New features.
Distributed Queries in IDS: New features.Keshav Murthy
 
Drill / SQL / Optiq
Drill / SQL / OptiqDrill / SQL / Optiq
Drill / SQL / OptiqJulian Hyde
 
PerlApp2Postgresql (2)
PerlApp2Postgresql (2)PerlApp2Postgresql (2)
PerlApp2Postgresql (2)Jerome Eteve
 
Odtug2011 adf developers make the database work for you
Odtug2011 adf developers make the database work for youOdtug2011 adf developers make the database work for you
Odtug2011 adf developers make the database work for youLuc Bors
 
Extending Spark for Qbeast's SQL Data Source​ with Paola Pardo and Cesare Cug...
Extending Spark for Qbeast's SQL Data Source​ with Paola Pardo and Cesare Cug...Extending Spark for Qbeast's SQL Data Source​ with Paola Pardo and Cesare Cug...
Extending Spark for Qbeast's SQL Data Source​ with Paola Pardo and Cesare Cug...Qbeast
 
New Features in Neo4j 3.4 / 3.3 - Graph Algorithms, Spatial, Date-Time & Visu...
New Features in Neo4j 3.4 / 3.3 - Graph Algorithms, Spatial, Date-Time & Visu...New Features in Neo4j 3.4 / 3.3 - Graph Algorithms, Spatial, Date-Time & Visu...
New Features in Neo4j 3.4 / 3.3 - Graph Algorithms, Spatial, Date-Time & Visu...jexp
 
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)Serban Tanasa
 
MySQL Optimizer Overview
MySQL Optimizer OverviewMySQL Optimizer Overview
MySQL Optimizer OverviewOlav Sandstå
 
SFDC Advanced Apex
SFDC Advanced Apex SFDC Advanced Apex
SFDC Advanced Apex Sujit Kumar
 
SQL Performance Tuning and New Features in Oracle 19c
SQL Performance Tuning and New Features in Oracle 19cSQL Performance Tuning and New Features in Oracle 19c
SQL Performance Tuning and New Features in Oracle 19cRachelBarker26
 
MySQL Optimizer Overview
MySQL Optimizer OverviewMySQL Optimizer Overview
MySQL Optimizer OverviewOlav Sandstå
 
MySQL Optimizer: What's New in 8.0
MySQL Optimizer: What's New in 8.0MySQL Optimizer: What's New in 8.0
MySQL Optimizer: What's New in 8.0Manyi Lu
 
Scylla Summit 2017: Planning Your Queries for Maximum Performance
Scylla Summit 2017: Planning Your Queries for Maximum PerformanceScylla Summit 2017: Planning Your Queries for Maximum Performance
Scylla Summit 2017: Planning Your Queries for Maximum PerformanceScyllaDB
 
Fast and Reliable Apache Spark SQL Releases
Fast and Reliable Apache Spark SQL ReleasesFast and Reliable Apache Spark SQL Releases
Fast and Reliable Apache Spark SQL ReleasesDataWorks Summit
 
SparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDsSparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDsDatabricks
 
MySQL Optimizer Cost Model
MySQL Optimizer Cost ModelMySQL Optimizer Cost Model
MySQL Optimizer Cost ModelOlav Sandstå
 
Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)
Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)
Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)Jason L Brugger
 
Sql query tuning or query optimization
Sql query tuning or query optimizationSql query tuning or query optimization
Sql query tuning or query optimizationVivek Singh
 

Similaire à OrientDB vs Neo4j - Comparison of query/speed/functionality (20)

MySQL Performance Optimization
MySQL Performance OptimizationMySQL Performance Optimization
MySQL Performance Optimization
 
Distributed Queries in IDS: New features.
Distributed Queries in IDS: New features.Distributed Queries in IDS: New features.
Distributed Queries in IDS: New features.
 
Drill / SQL / Optiq
Drill / SQL / OptiqDrill / SQL / Optiq
Drill / SQL / Optiq
 
Chapter15
Chapter15Chapter15
Chapter15
 
PerlApp2Postgresql (2)
PerlApp2Postgresql (2)PerlApp2Postgresql (2)
PerlApp2Postgresql (2)
 
Odtug2011 adf developers make the database work for you
Odtug2011 adf developers make the database work for youOdtug2011 adf developers make the database work for you
Odtug2011 adf developers make the database work for you
 
Extending Spark for Qbeast's SQL Data Source​ with Paola Pardo and Cesare Cug...
Extending Spark for Qbeast's SQL Data Source​ with Paola Pardo and Cesare Cug...Extending Spark for Qbeast's SQL Data Source​ with Paola Pardo and Cesare Cug...
Extending Spark for Qbeast's SQL Data Source​ with Paola Pardo and Cesare Cug...
 
New Features in Neo4j 3.4 / 3.3 - Graph Algorithms, Spatial, Date-Time & Visu...
New Features in Neo4j 3.4 / 3.3 - Graph Algorithms, Spatial, Date-Time & Visu...New Features in Neo4j 3.4 / 3.3 - Graph Algorithms, Spatial, Date-Time & Visu...
New Features in Neo4j 3.4 / 3.3 - Graph Algorithms, Spatial, Date-Time & Visu...
 
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
 
MySQL Optimizer Overview
MySQL Optimizer OverviewMySQL Optimizer Overview
MySQL Optimizer Overview
 
SFDC Advanced Apex
SFDC Advanced Apex SFDC Advanced Apex
SFDC Advanced Apex
 
SQL Performance Tuning and New Features in Oracle 19c
SQL Performance Tuning and New Features in Oracle 19cSQL Performance Tuning and New Features in Oracle 19c
SQL Performance Tuning and New Features in Oracle 19c
 
MySQL Optimizer Overview
MySQL Optimizer OverviewMySQL Optimizer Overview
MySQL Optimizer Overview
 
MySQL Optimizer: What's New in 8.0
MySQL Optimizer: What's New in 8.0MySQL Optimizer: What's New in 8.0
MySQL Optimizer: What's New in 8.0
 
Scylla Summit 2017: Planning Your Queries for Maximum Performance
Scylla Summit 2017: Planning Your Queries for Maximum PerformanceScylla Summit 2017: Planning Your Queries for Maximum Performance
Scylla Summit 2017: Planning Your Queries for Maximum Performance
 
Fast and Reliable Apache Spark SQL Releases
Fast and Reliable Apache Spark SQL ReleasesFast and Reliable Apache Spark SQL Releases
Fast and Reliable Apache Spark SQL Releases
 
SparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDsSparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDs
 
MySQL Optimizer Cost Model
MySQL Optimizer Cost ModelMySQL Optimizer Cost Model
MySQL Optimizer Cost Model
 
Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)
Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)
Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)
 
Sql query tuning or query optimization
Sql query tuning or query optimizationSql query tuning or query optimization
Sql query tuning or query optimization
 

OrientDB vs Neo4j - Comparison of query/speed/functionality

  • 1. OrientDB vs Neo4j: Comparisons (queries and functionality). Curtis Mosters, 02.12.2014
  • 2. Content: Schema, Indexes, Comparison, Query/Speed, Functionality, Results
  • 4. Indexes
      • Appln.title: LUCENE FULLTEXT
      • Appln.ID: SBTREE UNIQUE (in Neo4j the usual INDEX)
      • Person.name: LUCENE FULLTEXT
      • Person.ID: SBTREE UNIQUE (in Neo4j the usual INDEX)
  • 5. Queries and systems used
      • Comparing the speed of both databases on typical requests
      • Linux 64-bit (same instance on AWS)
      • OrientDB v.2.0M2
      • Neo4j v.2.1.5
      • Speed tests were run in the same order as the slides/rows
      • One database per instance, so 2 instances
      • Servers are idle, with only OrientDB/Neo4j running
      • Queries were run by hand on the command line (not in the Studio)
      • Queries always return the same results on both databases
      • Times are given in milliseconds (ms) unless otherwise specified
      • Both databases use the StandardAnalyzer from Lucene
      • Cache cleared after queries
  • 6. System cache notes
      • OrientDB always clears the cache when restarted
      • Neo4j does not clear the cache
      • In the Neo4j column, some tests were therefore run with the system cache cleared and some without
      • If there is only one Neo4j column, it is "No system cache cleared"
  • 8. Import
      OrientDB
      • Officially supported methods: OrientDB-ETL/JDBC, Java API
      • Clean Java code
      • The ETL tool performs well, but in the latest tests it had issues with edge creation
      • Not using multi-threading
      • Not using mapping
      Neo4j
      • Officially supported methods: LOAD CSV command, Java API, Groovy, Batch-Importer, Talend
      • No really "easy" way, but Java is the fastest and most reliable
      • Using multi-threading and mapping
      Import of ~300 million lines {APPLNs, TITLEs, PERSONs} with edges and indexes: OrientDB 25 hours, Neo4j 19 hours
  • 9. Startup/Shutdown speed
      OrientDB
      • Starting or shutting down the server nearly always takes the same time
      • 2 sec to 10 sec
      Neo4j
      • Times vary when starting, and especially when shutting down the server while a task is still running
      • 3 sec to 3 min (no information given)
      Good for testing and later reliability
  • 10. Query #1: Checking single ID lookup
      OrientDB: SELECT FROM Appln WHERE ID=?
      Neo4j:    MATCH (a:Appln) WHERE a.ID=? RETURN a
      ID (?) | OrientDB | Neo4j (no system cache cleared) | Neo4j (system cache cleared)
      1412 | 27 | 71 | 939
      763773 | 9 | 30 | 44
      234526 | 15 | 26 | 43
      858584 | 10 | 25 | 44
      536367 | 11 | 25 | 43
      2323 | 17 | 18 | 31
      5267 | 1 | 15 | 24
      73573 | 14 | 29 | 35
      585985 | 10 | 25 | 34
      797977 | 10 | 26 | 35
      Average | 12.4 (10 of 10) | 29 (0 of 10) |
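The "Average" row and the "n of 10" counts for Query #1 can be recomputed from the timings. The following is a minimal sketch, assuming that "n of 10" means the number of runs in which a database was the faster one; the helper name `summarize` is illustrative and not part of the original slides.

```python
from statistics import mean

# Timings in ms copied from the Query #1 table:
# (OrientDB, Neo4j without clearing the system cache).
query1_ms = [
    (27, 71), (9, 30), (15, 26), (10, 25), (11, 25),
    (17, 18), (1, 15), (14, 29), (10, 25), (10, 26),
]


def summarize(pairs):
    """Return (mean_a, mean_b, wins_a, wins_b) for a list of (a, b) timings."""
    a_times = [a for a, _ in pairs]
    b_times = [b for _, b in pairs]
    wins_a = sum(1 for a, b in pairs if a < b)  # runs where the first system was faster
    wins_b = sum(1 for a, b in pairs if b < a)  # runs where the second system was faster
    return mean(a_times), mean(b_times), wins_a, wins_b


orient_avg, neo_avg, orient_wins, neo_wins = summarize(query1_ms)
print(f"OrientDB: {orient_avg:.1f} ms ({orient_wins} of {len(query1_ms)})")
print(f"Neo4j:    {neo_avg:.1f} ms ({neo_wins} of {len(query1_ms)})")
```

The same helper can be applied to the timing pairs of the other query slides.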
  • 11. Query #2: Checking fulltext Lucene lookup
      Note on Neo4j: each additional word needs its own property statement, e.g. instead of 'title:super efficient' we use 'title:super OR title:efficient'.
      OrientDB: SELECT FROM (SELECT title,ID FROM Appln WHERE title LUCENE "?" ORDER BY ID) LIMIT 10
      Neo4j:    START n=node:titles('title:?') RETURN n.title,n.ID ORDER BY n.ID LIMIT 10
      Search term (?) | OrientDB | Neo4j (no system cache cleared) | Neo4j (system cache cleared)
      solar | 10172 | 801 | 137088
      panel | 263698 | 121494 | 161215
      druck | 25582 | 9679 | 11290
      machine | 1146339 | 297645 | 357818
      cell | 253565 | 55397 | 26298
      automatic vehicle | 961054 | 131772 | 163794
      super efficient | 53380 | 8432 | 8707
      motor | 398803 | 79527 | 46687
      airplane | 14066 | 892 | 390
      windshield | 8969 | 1004 | 536
      Average | 313 sec (5.2 min) (0 of 10) | 70 sec (10 of 10) |
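The Neo4j note on multi-word search terms can be illustrated with a small helper that expands a search string into one property statement per word. This is only a sketch: the function name `neo4j_lucene_query` is hypothetical, and it does not escape Lucene special characters, which real input would need.

```python
def neo4j_lucene_query(field: str, search: str) -> str:
    """Build 'field:w1 OR field:w2 ...' from a whitespace-separated search string."""
    words = search.split()
    if not words:
        raise ValueError("search string must contain at least one word")
    # One property statement per word, joined with OR as described in the note.
    return " OR ".join(f"{field}:{word}" for word in words)


single = neo4j_lucene_query("title", "solar")
multi = neo4j_lucene_query("title", "super efficient")
print(single)
print(multi)
```

The resulting string is what would be placed inside the index lookup of the Neo4j query, in place of the `title:?` placeholder.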
  • 12. Query #3.1: Checking fulltext Lucene lookup, overall count on 1 index
      Note on Neo4j: each additional word needs its own property statement, e.g. instead of 'title:super efficient' we use 'title:super OR title:efficient'.
      OrientDB: SELECT $totalHits FROM Appln WHERE title LUCENE "?" LIMIT 1
      Neo4j:    START n=node:titles("title:?") RETURN count(*)
      Search term (?) | OrientDB | Neo4j
      solar | 4611 | 215263
      panel | 3318 | 77442
      druck | 2890 | 12503
      machine | 1846 | 198479
      cell | 2351 | 34685
      automatic vehicle | 1063 | 49283
      super efficient | 984 | 4054
      motor | 465 | 47085
      airplane | 1172 | 429
      windshield | 62 | 585
      Average | 9 of 10 | 1 of 10
  • 13. Query #3.2: Checking fulltext Lucene lookup, overall count on 2 indices
      Note on Neo4j: each additional word needs its own property statement, e.g. instead of 'title:super efficient' we use 'title:super OR title:efficient'.
      OrientDB: SELECT $totalHits FROM Appln WHERE [title,abstract] LUCENE "?" LIMIT 1
      Neo4j:    START n=node:titles('title:?') MATCH (n)-[:HAS_ABSTRACT]->(a) WHERE a.abstract =~ ".*?.*" RETURN count(*)
      Recorded timings: solar 227234. No timings are listed for panel, druck, machine, cell, automatic vehicle, super efficient, motor, airplane or windshield, and no average is given.
  • 14. Query #4: Internal ID function node lookup
      OrientDB: SELECT title FROM #11:? / SELECT name FROM #12:?
      Neo4j:    START n=node(?) RETURN n.title / START n=node(?) RETURN n.name
      ? OrientDB | ? Neo4j | OrientDB | Neo4j (no system cache cleared) | Neo4j (system cache cleared)
      11:0 | 0 | 1 | 10 | 816
      11:141 | 141 | 1 | 13 | 27
      11:26526 | 26526 | 3 | 13 | 28
      11:2526 | 2526 | 2 | 12 | 27
      11:6262 | 6262 | 1 | 12 | 28
      12:0 | 76594275 | 1 | 11 | 25
      12:515 | 76594790 | 2 | 14 | 23
      12:4115 | 76598390 | 3 | 14 | 25
      12:52627 | 76646902 | 2 | 13 | 26
      12:47484 | 76641759 | 1 | 13 | 25
      Average | | 2 (10 of 10) | 13 (0 of 10) |
  • 15. Query #5: Count Applns of a specific Person
      OrientDB: SELECT out(WROTE).size() FROM #?
      Neo4j:    START p=node(?) MATCH (p)-[:WROTE]->(a) RETURN count(*)
      ? OrientDB | ? Neo4j | OrientDB | Neo4j (no system cache cleared) | Neo4j (system cache cleared)
      12:0 | 76594275 | 8 | 81 | 980
      12:1 | 76594276 | 1 | 18 | 42
      12:2 | 76594277 | 1 | 20 | 41
      12:3 | 76594278 | 1 | 18 | 38
      12:4 | 76594279 | 1 | 17 | 39
      12:5 | 76594280 | 1 | 23 | 41
      12:6 | 76594281 | 1 | 21 | 37
      12:7 | 76594282 | 1 | 17 | 43
      12:8 | 76594283 | 1 | 18 | 45
      12:9 | 76594284 | 1 | 17 | 41
      Average | | 1 (10 of 10) | 25 (0 of 10) |
  • 16. Query #6: Searching for 3 Applns of one specific Person
      OrientDB: select out.@class as sourceClass, out.@rid as source, out.name as sourceName, in.@class as targetClass, in.@rid as target, in.ID as targetID, in.nrEpodoc as targetName from (select expand(outE('WROTE')) from #?) order by targetID ASC limit 3
      Neo4j:    START p=node(?) MATCH (p)-[:WROTE]->(a) RETURN labels(p) as sourceClass, id(p) as source, p.name as sourceName, labels(a) as targetClass, id(a) as target, a.nrEpodoc as targetName ORDER BY a.ID ASC LIMIT 3
      ? OrientDB | ? Neo4j | OrientDB | Neo4j (no system cache cleared) | Neo4j (system cache cleared)
      12:0 | 76594275 | 1051 | 107 | 212
      12:1 | 76594276 | 3 | 39 | 77
      12:2 | 76594277 | 2 | 40 | 68
      12:3 | 76594278 | 2 | 38 | 60
      12:4 | 76594279 | 3 | 41 | 58
      12:5 | 76594280 | 53 | 59 | 55
      12:6 | 76594281 | 56 | 53 | 59
      12:7 | 76594282 | 7 | 38 | 56
      12:8 | 76594283 | 5 | 38 | 62
      12:9 | 76594284 | 2 | 33 | 66
      Average | | 118 (8 of 10) | 49 (2 of 10) |
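In Query #6, OrientDB is faster in 8 of 10 runs yet has the worse average, because the first run (1051 ms) dominates the mean. The sketch below compares mean and median for the timings above; the median is an addition for illustration and is not a measure used in the original slides.

```python
from statistics import mean, median

# Timings in ms copied from the Query #6 table
# (Neo4j values are from the "no system cache cleared" column).
orientdb_ms = [1051, 3, 2, 2, 3, 53, 56, 7, 5, 2]
neo4j_ms = [107, 39, 40, 38, 41, 59, 53, 38, 38, 33]

orient_mean, orient_median = mean(orientdb_ms), median(orientdb_ms)
neo_mean, neo_median = mean(neo4j_ms), median(neo4j_ms)

for name, avg, med in [
    ("OrientDB", orient_mean, orient_median),
    ("Neo4j", neo_mean, neo_median),
]:
    # The mean is pulled up by a single slow run; the median is not.
    print(f"{name}: mean {avg:.1f} ms, median {med:.1f} ms")
```

A robust statistic such as the median makes the "one slow run" effect visible without discarding any measurement.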
  • 17. Query #7: Searching for Appln.title and Appln.abstract, return Person.name matching both
      OrientDB: SELECT FROM (SELECT title,abstract,ID from Appln where [title,abstract] LUCENE "?" ORDER BY ID) LIMIT 3
      Neo4j:    START p=node:titles('title:?') MATCH (p)-[:HAS_ABSTRACT]->(a) WHERE a.abstract =~ ".*?.*" RETURN p.title,a.abstract,a.ID ORDER BY a.ID LIMIT 3
      Title (?) | OrientDB | Neo4j
      panel | 1733261 | 424789
      Average | |
  • 18. Query #7: Searching a Person.name + searching on Appln.title for Appln of that specific Person, return Person.name matching both
      Neo4j: START p=node:people('name:?') MATCH (p)-[:WROTE]->(a) WHERE a.title =~ ".*?.*" RETURN p.name,a.title,a.ID ORDER BY a.ID LIMIT 3
      Recorded timings (Title ?): machine 99538. No average is given.
  • 19. Query #8: Searching for an Abstract of an Appln
      Note on Neo4j: each additional word needs its own property statement, e.g. instead of 'title:super efficient' we use 'title:super OR title:efficient'.
      OrientDB: select @rid, abstract, ID as titleID, in(HAS_ABSTRACT).title as title, in(HAS_ABSTRACT).ID as AbstrID from Abstract where abstract LUCENE "method" LIMIT 3
      Neo4j:    START n=node:abstracts("abstract:method") WITH n limit 3 MATCH (x:Appln)-[:HAS_ABSTRACT]->(n) RETURN n.ID,x.ID
      Search terms listed, without timings or an average: solar, panel, druck, machine, cell, automatic vehicle, super efficient, motor, airplane, windshield
  • 20. Query #9: Counting the Applns of Person.names containing a specific name
      OrientDB: SELECT sum(out(WROTE).size()) FROM Person WHERE name LUCENE "?" LIMIT -1
      Neo4j:    START p=node:people('name:?') MATCH (p)-[:WROTE]->(a) RETURN count(a)
      Name (?) | OrientDB | Neo4j
      bosch | 7475 | 3771
      intel | 13261 | 7461
      siemens | 19302 | 16297
      audi | 3888 | 1844
      volkswagen | 2872 | 1298
      toyota | 23223 | 13561
      sony | 16520 | 11449
      panasonic | 6314 | 2287
      microsoft | 2849 | 1313
      apple | 3127 | 1088
      Average | 0 of 10 | 10 of 10
  • 21. Comparison (Functionality)
  • 22. Database Overview
      OrientDB
      • Schema, naming policies, overall record counts, cluster information and much more
      • Whole page loads in 0.1 sec
      Neo4j
      • No schema information except naming policies
      • Counting the nodes of a single label takes ~10 min
      • Neo4j's supported way to get information on all labels in one query just gives a heap error (maybe too much data?)
      Easy and fast way to check the state of the database
  • 23. Graph Explorer
      OrientDB (v.2 only!)
      • Good overview, straightforward and fast
      • Nodes can be edited, edges added
      • Never-ending graph like Neo4j
      • Shows nodes/edges and, when clicked, some information about them
      • No other features, not even zooming or dragging all elements
      Good for checking graph issues as close as possible to the database
  • 24. Result view
      OrientDB
      • Great overview; paging is possible, which reduces display and query time
      • If you forget to set a "LIMIT", it is set for you!
      • Uses the new Graph tab for visual output (v.2!)
      Neo4j
      • Graph and Table view
      • Forgot to set a LIMIT? Go for a smoke.
      • Graph view shows only up to 10 nodes
      • Table view scrolls endlessly
      Getting an overview is quite important for checking specific query issues
  • 25. Function integration
      OrientDB
      • Good overview and management
      • Integrated in the Studio
      • No restart needed
      • Functions can even be copied to another db
      Neo4j
      • Server plugins [1]
      • Must be written in Java and inherit from the ServerPlugin class
      • No overview
      • Not fail-safe
      • No easy change/access
      • Requires a server restart
      • Many lines for simple things
      Needed for exchanging information with the prototype
  • 26. Query style
      OrientDB
      • Simple queries are really short
      • Queries are hard to write once they get complex
      • Poor overview, and the use of variable names is not intuitive
      Neo4j
      • Simple queries are really long because of the required Cypher statements
      • Complex queries are also easy to write
      • Using variable names is very intuitive and always keeps the query easy to follow
      Useful for result checking and testing
  • 27. Lucene Index
      OrientDB
      • Still a "new" add-on
      • Before v.2 a plugin was needed
      • Integrated in OrientDB as of v.2
      • Used as if you were setting a usual index
      • Index can easily be changed at any time
      • Analyzer can be easily changed
      Neo4j
      • Neo4j does not always use Lucene as indexer
      • Needs to be set before importing data
      • Works together with the node_auto_index configuration
      • Changing the index, or switching it to Lucene, after the import is not viable because of the time required
      • Analyzer is not easy to change
      Important for full-text search, which the new graph tab builds on
  • 28. Security
      OrientDB
      • Different security levels (like in MySQL)
      Neo4j
      • None
      Good for integrating more databases and setting access levels
  • 29. Disc usage
      OrientDB
      • Db size = 120 GB
      • Classes in different files
      • Classes can also be easily deleted by external deletion
      Neo4j
      • Db size = 40 GB
      • Nodes, properties and relations in separate files
      • Specific data can only be deleted by Neo4j commands
      Good for testing and later reliability
  • 30. Future Perspective
      OrientDB
      • OrientDB is still "new" on the market, with many features still to come
      • Still much room for improvement
      • Offers the possibility of replacing MySQL
      Neo4j
      • Neo4j is the "oldest" graph database and has nearly every feature
      • Algorithms are already optimized as far as possible
      • No possibility of replacing a current system, just an extension for using graphs
      To look ahead beyond the current state
  • 31. Costs
      OrientDB
      • Good support available for free
      • Commercial support much cheaper than Neo4j's
      • Enterprise version available with good monitoring features
      Neo4j
      • Commercial support needed to set up a well-defined database
      • Features like clustering are only available when paying (e.g. important for our where clause)
      Important for startups
  • 32. Support / Production speed / Own ideas
      OrientDB
      • Good support via e-mail, Google Group (anyone from the team helps), Gitter and GitHub
      • New release every 2-3 weeks
      • Own issues answered in 1-2 days
      • Own ideas are discussed; 30-40 comments on GitHub every day
      Neo4j
      • Poor support for the most popular graph db
      • Google Group is only a semi-active community
      • Just one member from Neo4j helps there
      • New release every 1-2 months
      • Own issues answered in ~1 week
      • Own ideas are mainly ignored; 20-30 comments on GitHub every day
      Important for solving issues later on
  • 33. Results (Speed)
      Measure | OrientDB | Neo4j
      Import | no use of MT/mapping | full use of MT/mapping
      Startup/Shutdown speed | x | -
      Query #1 Checking single ID lookup | x | -
      Query #2 Checking fulltext Lucene lookup | - | x
      Query #3.1 Checking fulltext Lucene lookup, overall count on 1 index | x | -
      Query #3.2 Checking fulltext Lucene lookup, overall count on 2 indices | - | -
      Query #4 Internal ID function node lookup | x | -
      Query #5 Count Applns of a specific Person | x | -
      Query #6 Searching for 3 Applns of one specific Person | a single outlier makes the average poor | always about the same speed
      Query #7 Searching a Person.name + searching on Appln.title for Appln | - | -
      Query #8 Searching for an Abstract of an Appln | - | -
      Query #9 Counting the Applns of Person.names containing a specific name | - | x
      Results | 4 | 3
  • 34. Results (Misc)
      Measures marked with an x: Database Overview, Graph Explorer, Result View, Function Integration, Query style, Lucene Index, Security, Future Perspective, Costs, Support / Production speed / Own ideas
      Disc Usage: OrientDB keeps every class in a single file; Neo4j uses less disk space
      Results: OrientDB 9, Neo4j 1
  • 35. Results
      • OrientDB is working on fixing the very slow queries
      • OrientDB sometimes has inconsistent query speed (very high and very low)
      • OrientDB Studio is on a whole other level
      • Neo4j Studio is nearly useless compared to OrientDB's
  • 36. Supporters
      • Special thanks to Michael Hunger; without him the Neo4j import would still be having trouble
      • Thanks also to Enrico Risa for his help and the fast implementation of Lucene improvements
      • Keep up the great work!
  • 37. Links
      • [1] http://docs.neo4j.org/chunked/stable/server-plugins.html
      • [2] http://docs.neo4j.org/refcard/2.0/