SlideShare a Scribd company logo
1 of 31
ETL into Neo4j
    Max De Marzi
About Me
    Built the Neography Gem (Ruby
    Wrapper to the Neo4j REST API)
    Playing with Neo4j since 10/2009


•   My Blog: http://maxdemarzi.com
•   Find me on Twitter: @maxdemarzi
•   Email me: maxdemarzi@gmail.com
•   GitHub: http://github.com/maxdemarzi
Agenda
•   ETL your mind
•   ETL with Batch and the REST API
•   ETL with Gremlin and Groovy
•   ETL with the Batch Importer
•   ETL from SQL
ETL your Mind

You have to start there
More Relational than Relational
Stop thinking about how   Start thinking about relationships
Tables are related
Objects like to mingle
Optimized for “trees” of data   Optimized for seeing the forest and the
                                trees, and the branches, and the trunks
SELECT skills.*, user_skill.*
FROM users
JOIN user_skill ON users.id = user_skill.user_id
JOIN skills ON user_skill.skill_id = skill.id
WHERE users.id = 1
START user = node(1)
MATCH user -[user_skill]-> skill
RETURN skill, user_skill
Property Graph
Language    LanguageCountry          Country

language_code     language_code      country_code
language_name     country_code       country_name
word_count        primary            flag_uri




       Language                             Country

name                                 name
                    IS_SPOKEN_IN
code                                 code
word_count           as_primary      flag_uri
name: “Canada”
                 languages_spoken: “[ „English‟, „French‟ ]”




                           language:“English”     spoken_in
                                                               name: “USA”




name: “Canada”




                 language:“French”    spoken_in
                                                     name: “France”
Country

                 name
                 flag_uri
                 language_name
                 number_of_words
                 yes_in_langauge
                 no_in_language
                 currency_code
                 currency_name

       Country
                                          Language
name                               name
flag_uri                SPEAKS
                                   number_of_words
                                   yes
                                   no
                        Currency
                   code
                   name
ETL with Batch and the REST API
Batch command from REST API




Great for importing Facebook/Twitter friends

Keep each request under 10k commands

Preferably send a request every 2k to 5k commands
Using Batch from Neography
Why Batch
 Transactional: any failures not
 committed.

 Ordered: responses guaranteed
 to be in the same order as sent.

 Continuous loading/updating
 nodes and relationships in
 spurts or streaming.
ETL with Gremlin and Groovy
Commit every 1000 changes or so, make sure to stop the transaction to commit the
last few changes at the very end.


Look into auto-indexing to make life easier.




Disabled by default. See Docs for trick to make it full text
instead of exact index.

http://docs.neo4j.org/chunked/milestone/auto-indexing.html
Crazy Format is ok
                                                               Id :: Title :: Genre|Genre|Genre

                                                               But it’s preferable to stay clear of
                                                               escape characters like “|”




String location of data file, converted to URL, then processed one line at a time.
Movie vertex created, genre vertex created unless it exists (index lookup), edge
from movie to genre is created.

Full walk-through on http://maxdemarzi.com/2012/01/13/neo4j-on-heroku-part-
one/
ETL with the Batch Importer
Installation Walk-Through
Testing it




7.5M nodes, 42M relationships in just over 3 minutes on a laptop.
Loading it into Neo4j




Full walk-through on http://maxdemarzi.com/2012/02/28/batch-importer-part-1/
When to use the Batch Importer?

           • 1st time loading or
             periodic reloading

           • When you need Speed

           • When you don’t mind a
             little Java
ETL from SQL
Identities who vouched for each other




row_number() and INTO are our friends
The “term” vouched for will serve as our relationship type, status is a relationship property.
Notice there are no node ids.
These are automatic, clkao is node 1
No time to get coffee   >8-[
What about multiple types of nodes?
No problem, just add the MAX(node_id) from the first table.




                  Full walk-through at:
                  http://maxdemarzi.com/2012/02/28/batch-importer-part-2/

                  Need help? E-mail me, catch me on Google chat or Skype.

                  Please don’t be shy…. and read my blog:


                   http://maxdemarzi.com
Thank you!
 http://maxdemarzi.com

More Related Content

What's hot

Intro to Graphs and Neo4j
Intro to Graphs and Neo4jIntro to Graphs and Neo4j
Intro to Graphs and Neo4jjexp
 
Neo4j Graph Platform Overview, Kurt Freytag, Neo4j
Neo4j Graph Platform Overview, Kurt Freytag, Neo4jNeo4j Graph Platform Overview, Kurt Freytag, Neo4j
Neo4j Graph Platform Overview, Kurt Freytag, Neo4jNeo4j
 
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...DataWorks Summit/Hadoop Summit
 
The Graph Database Universe: Neo4j Overview
The Graph Database Universe: Neo4j OverviewThe Graph Database Universe: Neo4j Overview
The Graph Database Universe: Neo4j OverviewNeo4j
 
Neo4j - Cas d'usages pour votre métier
Neo4j - Cas d'usages pour votre métierNeo4j - Cas d'usages pour votre métier
Neo4j - Cas d'usages pour votre métierNeo4j
 
Introduction to Neo4j
Introduction to Neo4jIntroduction to Neo4j
Introduction to Neo4jNeo4j
 
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseApache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseDataWorks Summit/Hadoop Summit
 
A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.Navdeep Charan
 
Applying Network Analytics in KYC
Applying Network Analytics in KYCApplying Network Analytics in KYC
Applying Network Analytics in KYCNeo4j
 
A Simple Introduction to Neural Information Retrieval
A Simple Introduction to Neural Information RetrievalA Simple Introduction to Neural Information Retrieval
A Simple Introduction to Neural Information RetrievalBhaskar Mitra
 
Azure AI platform - Automated ML workshop
Azure AI platform - Automated ML workshopAzure AI platform - Automated ML workshop
Azure AI platform - Automated ML workshopParashar Shah
 
Intro to Neo4j
Intro to Neo4jIntro to Neo4j
Intro to Neo4jNeo4j
 
The Data Platform for Today’s Intelligent Applications
The Data Platform for Today’s Intelligent ApplicationsThe Data Platform for Today’s Intelligent Applications
The Data Platform for Today’s Intelligent ApplicationsNeo4j
 
Graphs for Genealogists
Graphs for GenealogistsGraphs for Genealogists
Graphs for GenealogistsNeo4j
 
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...Ed Fernandez
 

What's hot (20)

MLOps at OLX
MLOps at OLXMLOps at OLX
MLOps at OLX
 
Intro to Graphs and Neo4j
Intro to Graphs and Neo4jIntro to Graphs and Neo4j
Intro to Graphs and Neo4j
 
Graph Databases
Graph DatabasesGraph Databases
Graph Databases
 
Neo4j Graph Platform Overview, Kurt Freytag, Neo4j
Neo4j Graph Platform Overview, Kurt Freytag, Neo4jNeo4j Graph Platform Overview, Kurt Freytag, Neo4j
Neo4j Graph Platform Overview, Kurt Freytag, Neo4j
 
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
 
The Graph Database Universe: Neo4j Overview
The Graph Database Universe: Neo4j OverviewThe Graph Database Universe: Neo4j Overview
The Graph Database Universe: Neo4j Overview
 
Neo4j - Cas d'usages pour votre métier
Neo4j - Cas d'usages pour votre métierNeo4j - Cas d'usages pour votre métier
Neo4j - Cas d'usages pour votre métier
 
Introduction to Neo4j
Introduction to Neo4jIntroduction to Neo4j
Introduction to Neo4j
 
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseApache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
 
DDS QoS Unleashed
DDS QoS UnleashedDDS QoS Unleashed
DDS QoS Unleashed
 
A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.
 
Applying Network Analytics in KYC
Applying Network Analytics in KYCApplying Network Analytics in KYC
Applying Network Analytics in KYC
 
A Simple Introduction to Neural Information Retrieval
A Simple Introduction to Neural Information RetrievalA Simple Introduction to Neural Information Retrieval
A Simple Introduction to Neural Information Retrieval
 
Azure AI platform - Automated ML workshop
Azure AI platform - Automated ML workshopAzure AI platform - Automated ML workshop
Azure AI platform - Automated ML workshop
 
Intro to Neo4j
Intro to Neo4jIntro to Neo4j
Intro to Neo4j
 
The Data Platform for Today’s Intelligent Applications
The Data Platform for Today’s Intelligent ApplicationsThe Data Platform for Today’s Intelligent Applications
The Data Platform for Today’s Intelligent Applications
 
Real time simulation with HLA and DDS
Real time simulation with HLA and DDSReal time simulation with HLA and DDS
Real time simulation with HLA and DDS
 
UML Profile for DDS
UML Profile for DDSUML Profile for DDS
UML Profile for DDS
 
Graphs for Genealogists
Graphs for GenealogistsGraphs for Genealogists
Graphs for Genealogists
 
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
 

Viewers also liked

Converting Relational to Graph Databases
Converting Relational to Graph DatabasesConverting Relational to Graph Databases
Converting Relational to Graph DatabasesAntonio Maccioni
 
Relational to Graph - Import
Relational to Graph - ImportRelational to Graph - Import
Relational to Graph - ImportNeo4j
 
Data Modeling with Neo4j
Data Modeling with Neo4jData Modeling with Neo4j
Data Modeling with Neo4jNeo4j
 
Importing Data into Neo4j quickly and easily - StackOverflow
Importing Data into Neo4j quickly and easily - StackOverflowImporting Data into Neo4j quickly and easily - StackOverflow
Importing Data into Neo4j quickly and easily - StackOverflowNeo4j
 
Designing and Building a Graph Database Application – Architectural Choices, ...
Designing and Building a Graph Database Application – Architectural Choices, ...Designing and Building a Graph Database Application – Architectural Choices, ...
Designing and Building a Graph Database Application – Architectural Choices, ...Neo4j
 
Migrating from MongoDB to Neo4j - Lessons Learned
Migrating from MongoDB to Neo4j - Lessons LearnedMigrating from MongoDB to Neo4j - Lessons Learned
Migrating from MongoDB to Neo4j - Lessons LearnedNick Manning
 
Пути открытого доступа и открытые лицензии для научных журналов
Пути открытого доступа и открытые лицензии для научных журналовПути открытого доступа и открытые лицензии для научных журналов
Пути открытого доступа и открытые лицензии для научных журналовDmitry Semyachkin
 
Правовой стандарт научной коммуникации в мире
Правовой стандарт научной коммуникации в миреПравовой стандарт научной коммуникации в мире
Правовой стандарт научной коммуникации в миреDmitry Semyachkin
 
Model Visualisation
Model VisualisationModel Visualisation
Model VisualisationAmit Kapoor
 
How the IBM Platform LSF Architecture Accelerates Technical Computing
How the IBM Platform LSF Architecture Accelerates Technical ComputingHow the IBM Platform LSF Architecture Accelerates Technical Computing
How the IBM Platform LSF Architecture Accelerates Technical ComputingIBM India Smarter Computing
 
Cloud PARTE: Elastic Complex Event Processing based on Mobile Actors
Cloud PARTE: Elastic Complex Event Processing based on Mobile ActorsCloud PARTE: Elastic Complex Event Processing based on Mobile Actors
Cloud PARTE: Elastic Complex Event Processing based on Mobile ActorsStefan Marr
 
Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...
Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...
Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...Karthik Ramasamy
 
Graph Stream Processing : spinning fast, large scale, complex analytics
Graph Stream Processing : spinning fast, large scale, complex analyticsGraph Stream Processing : spinning fast, large scale, complex analytics
Graph Stream Processing : spinning fast, large scale, complex analyticsParis Carbone
 
Webinar: Large Scale Graph Processing with IBM Power Systems & Neo4j
Webinar: Large Scale Graph Processing with IBM Power Systems & Neo4jWebinar: Large Scale Graph Processing with IBM Power Systems & Neo4j
Webinar: Large Scale Graph Processing with IBM Power Systems & Neo4jNeo4j
 
Neo4j -[:LOVES]-> Cypher
Neo4j -[:LOVES]-> CypherNeo4j -[:LOVES]-> Cypher
Neo4j -[:LOVES]-> Cypherjexp
 
Deploying Massive Scale Graphs for Realtime Insights
Deploying Massive Scale Graphs for Realtime InsightsDeploying Massive Scale Graphs for Realtime Insights
Deploying Massive Scale Graphs for Realtime InsightsNeo4j
 
Intro to Graphs and Neo4j
Intro to Graphs and Neo4jIntro to Graphs and Neo4j
Intro to Graphs and Neo4jNeo4j
 
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache FlinkGelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache FlinkVasia Kalavri
 
Introducing Neo4j 3.1: New Security and Clustering Architecture
Introducing Neo4j 3.1: New Security and Clustering Architecture Introducing Neo4j 3.1: New Security and Clustering Architecture
Introducing Neo4j 3.1: New Security and Clustering Architecture Neo4j
 

Viewers also liked (20)

Converting Relational to Graph Databases
Converting Relational to Graph DatabasesConverting Relational to Graph Databases
Converting Relational to Graph Databases
 
Relational to Graph - Import
Relational to Graph - ImportRelational to Graph - Import
Relational to Graph - Import
 
Data Modeling with Neo4j
Data Modeling with Neo4jData Modeling with Neo4j
Data Modeling with Neo4j
 
Importing Data into Neo4j quickly and easily - StackOverflow
Importing Data into Neo4j quickly and easily - StackOverflowImporting Data into Neo4j quickly and easily - StackOverflow
Importing Data into Neo4j quickly and easily - StackOverflow
 
Designing and Building a Graph Database Application – Architectural Choices, ...
Designing and Building a Graph Database Application – Architectural Choices, ...Designing and Building a Graph Database Application – Architectural Choices, ...
Designing and Building a Graph Database Application – Architectural Choices, ...
 
Migrating from MongoDB to Neo4j - Lessons Learned
Migrating from MongoDB to Neo4j - Lessons LearnedMigrating from MongoDB to Neo4j - Lessons Learned
Migrating from MongoDB to Neo4j - Lessons Learned
 
Пути открытого доступа и открытые лицензии для научных журналов
Пути открытого доступа и открытые лицензии для научных журналовПути открытого доступа и открытые лицензии для научных журналов
Пути открытого доступа и открытые лицензии для научных журналов
 
Правовой стандарт научной коммуникации в мире
Правовой стандарт научной коммуникации в миреПравовой стандарт научной коммуникации в мире
Правовой стандарт научной коммуникации в мире
 
Model Visualisation
Model VisualisationModel Visualisation
Model Visualisation
 
How the IBM Platform LSF Architecture Accelerates Technical Computing
How the IBM Platform LSF Architecture Accelerates Technical ComputingHow the IBM Platform LSF Architecture Accelerates Technical Computing
How the IBM Platform LSF Architecture Accelerates Technical Computing
 
Cloud PARTE: Elastic Complex Event Processing based on Mobile Actors
Cloud PARTE: Elastic Complex Event Processing based on Mobile ActorsCloud PARTE: Elastic Complex Event Processing based on Mobile Actors
Cloud PARTE: Elastic Complex Event Processing based on Mobile Actors
 
Flink. Pure Streaming
Flink. Pure StreamingFlink. Pure Streaming
Flink. Pure Streaming
 
Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...
Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...
Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...
 
Graph Stream Processing : spinning fast, large scale, complex analytics
Graph Stream Processing : spinning fast, large scale, complex analyticsGraph Stream Processing : spinning fast, large scale, complex analytics
Graph Stream Processing : spinning fast, large scale, complex analytics
 
Webinar: Large Scale Graph Processing with IBM Power Systems & Neo4j
Webinar: Large Scale Graph Processing with IBM Power Systems & Neo4jWebinar: Large Scale Graph Processing with IBM Power Systems & Neo4j
Webinar: Large Scale Graph Processing with IBM Power Systems & Neo4j
 
Neo4j -[:LOVES]-> Cypher
Neo4j -[:LOVES]-> CypherNeo4j -[:LOVES]-> Cypher
Neo4j -[:LOVES]-> Cypher
 
Deploying Massive Scale Graphs for Realtime Insights
Deploying Massive Scale Graphs for Realtime InsightsDeploying Massive Scale Graphs for Realtime Insights
Deploying Massive Scale Graphs for Realtime Insights
 
Intro to Graphs and Neo4j
Intro to Graphs and Neo4jIntro to Graphs and Neo4j
Intro to Graphs and Neo4j
 
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache FlinkGelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
 
Introducing Neo4j 3.1: New Security and Clustering Architecture
Introducing Neo4j 3.1: New Security and Clustering Architecture Introducing Neo4j 3.1: New Security and Clustering Architecture
Introducing Neo4j 3.1: New Security and Clustering Architecture
 

Similar to ETL into Neo4j

NoSQL, Neo4J for Java Developers , OracleWeek-2012
NoSQL, Neo4J for Java Developers , OracleWeek-2012NoSQL, Neo4J for Java Developers , OracleWeek-2012
NoSQL, Neo4J for Java Developers , OracleWeek-2012Eugene Hanikblum
 
Writing Readable Code
Writing Readable CodeWriting Readable Code
Writing Readable Codeeddiehaber
 
Flow or Type - how to React to that?
Flow or Type - how to React to that?Flow or Type - how to React to that?
Flow or Type - how to React to that?Krešimir Antolić
 
Learning To Run - XPages for Lotus Notes Client Developers
Learning To Run - XPages for Lotus Notes Client DevelopersLearning To Run - XPages for Lotus Notes Client Developers
Learning To Run - XPages for Lotus Notes Client DevelopersKathy Brown
 
Ruby for .NET developers
Ruby for .NET developersRuby for .NET developers
Ruby for .NET developersMax Titov
 
ConFoo Montreal - Microservices for building an IDE - The innards of JetBrain...
ConFoo Montreal - Microservices for building an IDE - The innards of JetBrain...ConFoo Montreal - Microservices for building an IDE - The innards of JetBrain...
ConFoo Montreal - Microservices for building an IDE - The innards of JetBrain...Maarten Balliauw
 
From Dream socialbot to Multiskill AI Assistant Platform
From Dream socialbot to Multiskill AI Assistant PlatformFrom Dream socialbot to Multiskill AI Assistant Platform
From Dream socialbot to Multiskill AI Assistant PlatformDaniel Kornev
 
How To Build And Launch A Successful Globalized App From Day One Or All The ...
How To Build And Launch A Successful Globalized App From Day One  Or All The ...How To Build And Launch A Successful Globalized App From Day One  Or All The ...
How To Build And Launch A Successful Globalized App From Day One Or All The ...agileware
 
Sugar Presentation - YULHackers March 2009
Sugar Presentation - YULHackers March 2009Sugar Presentation - YULHackers March 2009
Sugar Presentation - YULHackers March 2009spierre
 
Jariko - A JVM interpreter for RPG written in kotlin
Jariko - A JVM interpreter for RPG written in kotlinJariko - A JVM interpreter for RPG written in kotlin
Jariko - A JVM interpreter for RPG written in kotlinFederico Tomassetti
 
What we can learn from Rebol?
What we can learn from Rebol?What we can learn from Rebol?
What we can learn from Rebol?lichtkind
 
NDC Sydney 2019 - Microservices for building an IDE – The innards of JetBrain...
NDC Sydney 2019 - Microservices for building an IDE – The innards of JetBrain...NDC Sydney 2019 - Microservices for building an IDE – The innards of JetBrain...
NDC Sydney 2019 - Microservices for building an IDE – The innards of JetBrain...Maarten Balliauw
 
Os Worthington
Os WorthingtonOs Worthington
Os Worthingtonoscon2007
 
Lambda The Extreme: Test-Driving a Functional Language
Lambda The Extreme: Test-Driving a Functional LanguageLambda The Extreme: Test-Driving a Functional Language
Lambda The Extreme: Test-Driving a Functional LanguageAccenture | SolutionsIQ
 
Modern_2.pptx for java
Modern_2.pptx for java Modern_2.pptx for java
Modern_2.pptx for java MayaTofik
 
Jmp107 Web Services
Jmp107 Web ServicesJmp107 Web Services
Jmp107 Web Servicesdominion
 
Groovy Update - JavaPolis 2007
Groovy Update - JavaPolis 2007Groovy Update - JavaPolis 2007
Groovy Update - JavaPolis 2007Guillaume Laforge
 

Similar to ETL into Neo4j (20)

NoSQL, Neo4J for Java Developers , OracleWeek-2012
NoSQL, Neo4J for Java Developers , OracleWeek-2012NoSQL, Neo4J for Java Developers , OracleWeek-2012
NoSQL, Neo4J for Java Developers , OracleWeek-2012
 
Writing Readable Code
Writing Readable CodeWriting Readable Code
Writing Readable Code
 
kornev.pdf
kornev.pdfkornev.pdf
kornev.pdf
 
Flow or Type - how to React to that?
Flow or Type - how to React to that?Flow or Type - how to React to that?
Flow or Type - how to React to that?
 
Chado-XML
Chado-XMLChado-XML
Chado-XML
 
Learning To Run - XPages for Lotus Notes Client Developers
Learning To Run - XPages for Lotus Notes Client DevelopersLearning To Run - XPages for Lotus Notes Client Developers
Learning To Run - XPages for Lotus Notes Client Developers
 
Ruby for .NET developers
Ruby for .NET developersRuby for .NET developers
Ruby for .NET developers
 
ConFoo Montreal - Microservices for building an IDE - The innards of JetBrain...
ConFoo Montreal - Microservices for building an IDE - The innards of JetBrain...ConFoo Montreal - Microservices for building an IDE - The innards of JetBrain...
ConFoo Montreal - Microservices for building an IDE - The innards of JetBrain...
 
From Dream socialbot to Multiskill AI Assistant Platform
From Dream socialbot to Multiskill AI Assistant PlatformFrom Dream socialbot to Multiskill AI Assistant Platform
From Dream socialbot to Multiskill AI Assistant Platform
 
How To Build And Launch A Successful Globalized App From Day One Or All The ...
How To Build And Launch A Successful Globalized App From Day One  Or All The ...How To Build And Launch A Successful Globalized App From Day One  Or All The ...
How To Build And Launch A Successful Globalized App From Day One Or All The ...
 
Sugar Presentation - YULHackers March 2009
Sugar Presentation - YULHackers March 2009Sugar Presentation - YULHackers March 2009
Sugar Presentation - YULHackers March 2009
 
Avro
AvroAvro
Avro
 
Jariko - A JVM interpreter for RPG written in kotlin
Jariko - A JVM interpreter for RPG written in kotlinJariko - A JVM interpreter for RPG written in kotlin
Jariko - A JVM interpreter for RPG written in kotlin
 
What we can learn from Rebol?
What we can learn from Rebol?What we can learn from Rebol?
What we can learn from Rebol?
 
NDC Sydney 2019 - Microservices for building an IDE – The innards of JetBrain...
NDC Sydney 2019 - Microservices for building an IDE – The innards of JetBrain...NDC Sydney 2019 - Microservices for building an IDE – The innards of JetBrain...
NDC Sydney 2019 - Microservices for building an IDE – The innards of JetBrain...
 
Os Worthington
Os WorthingtonOs Worthington
Os Worthington
 
Lambda The Extreme: Test-Driving a Functional Language
Lambda The Extreme: Test-Driving a Functional LanguageLambda The Extreme: Test-Driving a Functional Language
Lambda The Extreme: Test-Driving a Functional Language
 
Modern_2.pptx for java
Modern_2.pptx for java Modern_2.pptx for java
Modern_2.pptx for java
 
Jmp107 Web Services
Jmp107 Web ServicesJmp107 Web Services
Jmp107 Web Services
 
Groovy Update - JavaPolis 2007
Groovy Update - JavaPolis 2007Groovy Update - JavaPolis 2007
Groovy Update - JavaPolis 2007
 

More from Max De Marzi

DataDay 2023 Presentation
DataDay 2023 PresentationDataDay 2023 Presentation
DataDay 2023 PresentationMax De Marzi
 
DataDay 2023 Presentation - Notes
DataDay 2023 Presentation - NotesDataDay 2023 Presentation - Notes
DataDay 2023 Presentation - NotesMax De Marzi
 
Developer Intro Deck-PowerPoint - Download for Speaker Notes
Developer Intro Deck-PowerPoint - Download for Speaker NotesDeveloper Intro Deck-PowerPoint - Download for Speaker Notes
Developer Intro Deck-PowerPoint - Download for Speaker NotesMax De Marzi
 
Outrageous Ideas for Graph Databases
Outrageous Ideas for Graph DatabasesOutrageous Ideas for Graph Databases
Outrageous Ideas for Graph DatabasesMax De Marzi
 
Neo4j Training Cypher
Neo4j Training CypherNeo4j Training Cypher
Neo4j Training CypherMax De Marzi
 
Neo4j Training Modeling
Neo4j Training ModelingNeo4j Training Modeling
Neo4j Training ModelingMax De Marzi
 
Neo4j Training Introduction
Neo4j Training IntroductionNeo4j Training Introduction
Neo4j Training IntroductionMax De Marzi
 
Detenga el fraude complejo con Neo4j
Detenga el fraude complejo con Neo4jDetenga el fraude complejo con Neo4j
Detenga el fraude complejo con Neo4jMax De Marzi
 
Data Modeling Tricks for Neo4j
Data Modeling Tricks for Neo4jData Modeling Tricks for Neo4j
Data Modeling Tricks for Neo4jMax De Marzi
 
Fraud Detection and Neo4j
Fraud Detection and Neo4j Fraud Detection and Neo4j
Fraud Detection and Neo4j Max De Marzi
 
Detecion de Fraude con Neo4j
Detecion de Fraude con Neo4jDetecion de Fraude con Neo4j
Detecion de Fraude con Neo4jMax De Marzi
 
Neo4j Data Science Presentation
Neo4j Data Science PresentationNeo4j Data Science Presentation
Neo4j Data Science PresentationMax De Marzi
 
Neo4j Stored Procedure Training Part 2
Neo4j Stored Procedure Training Part 2Neo4j Stored Procedure Training Part 2
Neo4j Stored Procedure Training Part 2Max De Marzi
 
Neo4j Stored Procedure Training Part 1
Neo4j Stored Procedure Training Part 1Neo4j Stored Procedure Training Part 1
Neo4j Stored Procedure Training Part 1Max De Marzi
 
Decision Trees in Neo4j
Decision Trees in Neo4jDecision Trees in Neo4j
Decision Trees in Neo4jMax De Marzi
 
Neo4j y Fraude Spanish
Neo4j y Fraude SpanishNeo4j y Fraude Spanish
Neo4j y Fraude SpanishMax De Marzi
 
Data modeling with neo4j tutorial
Data modeling with neo4j tutorialData modeling with neo4j tutorial
Data modeling with neo4j tutorialMax De Marzi
 
Neo4j Fundamentals
Neo4j FundamentalsNeo4j Fundamentals
Neo4j FundamentalsMax De Marzi
 
Neo4j Presentation
Neo4j PresentationNeo4j Presentation
Neo4j PresentationMax De Marzi
 
Fraud Detection Class Slides
Fraud Detection Class SlidesFraud Detection Class Slides
Fraud Detection Class SlidesMax De Marzi
 

More from Max De Marzi (20)

DataDay 2023 Presentation
DataDay 2023 PresentationDataDay 2023 Presentation
DataDay 2023 Presentation
 
DataDay 2023 Presentation - Notes
DataDay 2023 Presentation - NotesDataDay 2023 Presentation - Notes
DataDay 2023 Presentation - Notes
 
Developer Intro Deck-PowerPoint - Download for Speaker Notes
Developer Intro Deck-PowerPoint - Download for Speaker NotesDeveloper Intro Deck-PowerPoint - Download for Speaker Notes
Developer Intro Deck-PowerPoint - Download for Speaker Notes
 
Outrageous Ideas for Graph Databases
Outrageous Ideas for Graph DatabasesOutrageous Ideas for Graph Databases
Outrageous Ideas for Graph Databases
 
Neo4j Training Cypher
Neo4j Training CypherNeo4j Training Cypher
Neo4j Training Cypher
 
Neo4j Training Modeling
Neo4j Training ModelingNeo4j Training Modeling
Neo4j Training Modeling
 
Neo4j Training Introduction
Neo4j Training IntroductionNeo4j Training Introduction
Neo4j Training Introduction
 
Detenga el fraude complejo con Neo4j
Detenga el fraude complejo con Neo4jDetenga el fraude complejo con Neo4j
Detenga el fraude complejo con Neo4j
 
Data Modeling Tricks for Neo4j
Data Modeling Tricks for Neo4jData Modeling Tricks for Neo4j
Data Modeling Tricks for Neo4j
 
Fraud Detection and Neo4j
Fraud Detection and Neo4j Fraud Detection and Neo4j
Fraud Detection and Neo4j
 
Detecion de Fraude con Neo4j
Detecion de Fraude con Neo4jDetecion de Fraude con Neo4j
Detecion de Fraude con Neo4j
 
Neo4j Data Science Presentation
Neo4j Data Science PresentationNeo4j Data Science Presentation
Neo4j Data Science Presentation
 
Neo4j Stored Procedure Training Part 2
Neo4j Stored Procedure Training Part 2Neo4j Stored Procedure Training Part 2
Neo4j Stored Procedure Training Part 2
 
Neo4j Stored Procedure Training Part 1
Neo4j Stored Procedure Training Part 1Neo4j Stored Procedure Training Part 1
Neo4j Stored Procedure Training Part 1
 
Decision Trees in Neo4j
Decision Trees in Neo4jDecision Trees in Neo4j
Decision Trees in Neo4j
 
Neo4j y Fraude Spanish
Neo4j y Fraude SpanishNeo4j y Fraude Spanish
Neo4j y Fraude Spanish
 
Data modeling with neo4j tutorial
Data modeling with neo4j tutorialData modeling with neo4j tutorial
Data modeling with neo4j tutorial
 
Neo4j Fundamentals
Neo4j FundamentalsNeo4j Fundamentals
Neo4j Fundamentals
 
Neo4j Presentation
Neo4j PresentationNeo4j Presentation
Neo4j Presentation
 
Fraud Detection Class Slides
Fraud Detection Class SlidesFraud Detection Class Slides
Fraud Detection Class Slides
 

Recently uploaded

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 

Recently uploaded (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 

ETL into Neo4j

  • 1. ETL into Neo4j Max De Marzi
  • 2. About Me Built the Neography Gem (Ruby Wrapper to the Neo4j REST API) Playing with Neo4j since 10/2009 • My Blog: http://maxdemarzi.com • Find me on Twitter: @maxdemarzi • Email me: maxdemarzi@gmail.com • GitHub: http://github.com/maxdemarzi
  • 3. Agenda • ETL your mind • ETL with Batch and the REST API • ETL with Gremlin and Groovy • ETL with the Batch Importer • ETL from SQL
  • 4. ETL your Mind You have to start there
  • 5. More Relational than Relational Stop thinking about how Start thinking about relationships Tables are related
  • 6. Objects like to mingle Optimized for “trees” of data Optimized for seeing the forest and the trees, and the branches, and the trunks
  • 7. SELECT skills.*, user_skill.* FROM users JOIN user_skill ON users.id = user_skill.user_id JOIN skills ON user_skill.skill_id = skill.id WHERE users.id = 1
  • 8. START user = node(1) MATCH user -[user_skill]-> skill RETURN skill, user_skill
  • 10. Language LanguageCountry Country language_code language_code country_code language_name country_code country_name word_count primary flag_uri Language Country name name IS_SPOKEN_IN code code word_count as_primary flag_uri
  • 11. name: “Canada” languages_spoken: “[ „English‟, „French‟ ]” language:“English” spoken_in name: “USA” name: “Canada” language:“French” spoken_in name: “France”
  • 12. Country name flag_uri language_name number_of_words yes_in_langauge no_in_language currency_code currency_name Country Language name name flag_uri SPEAKS number_of_words yes no Currency code name
  • 13. ETL with Batch and the REST API
  • 14. Batch command from REST API Great for importing Facebook/Twitter friends Keep each request under 10k commands Preferably send a request every 2k to 5k commands
  • 15. Using Batch from Neography
  • 16. Why Batch Transactional: any failures not committed. Ordered: responses guaranteed to be in the same order as sent. Continuous loading/updating nodes and relationships in spurts or streaming.
  • 17. ETL with Gremlin and Groovy
  • 18. Commit every 1000 changes or so, make sure to stop the transaction to commit the last few changes at the very end. Look into auto-indexing to make life easier. Disabled by default. See Docs for trick to make it full text instead of exact index. http://docs.neo4j.org/chunked/milestone/auto-indexing.html
  • 19. Crazy Format is ok Id :: Title :: Genre|Genre|Genre But it’s preferable to stay clear of escape characters like “|” String location of data file, converted to URL, then processed one line at a time. Movie vertex created, genre vertex created unless it exists (index lookup), edge from movie to genre is created. Full walk-through on http://maxdemarzi.com/2012/01/13/neo4j-on-heroku-part- one/
  • 20. ETL with the Batch Importer
  • 22. Testing it 7.5M nodes, 42M relationships in just over 3 minutes on a laptop.
  • 23. Loading it into Neo4j Full walk-through on http://maxdemarzi.com/2012/02/28/batch-importer-part-1/
  • 24. When to use the Batch Importer? • 1st time loading or periodic reloading • When you need Speed • When you don’t mind a little Java
  • 26. Identities who vouched for each other row_number() and INTO are our friends
  • 27. The “term” vouched for will serve as our relationship type, status is a relationship property.
  • 28. Notice there are no node ids. These are automatic, clkao is node 1
  • 29. No time to get coffee >8-[
  • 30. What about multiple types of nodes? No problem, just add the MAX(node_id) from the first table. Full walk-through at: http://maxdemarzi.com/2012/02/28/batch-importer-part-2/ Need help? E-mail me, catch me on Google chat or Skype. Please don’t be shy…. and read my blog: http://maxdemarzi.com