OQGraph 3 for MariaDB
   Graphs and Hierarchies in Plain SQL
             http://goo.gl/gqr7b




Antony T Curtis <atcurtis@gmail.com>


                        graph@openquery.com
                        http://openquery.com/graph
Graphs / Networks
     ● Nodes connected by Edges.
     ● Edges may be directional.
     ● Edges may have a "weight" / "cost" attribute.
     ● Directed graphs may have bi-directional edges.
     ● Unconnected sets of nodes may exist on same graph.
     ● There need not be a "root" node.




   Examples:
    ● "Social Graphs" / friend relationships.
    ● Decision / State graphs.
    ● Airline routes
OQGRAPH computation engine © 2009-2013 Open Query
RDBMS with Hierarchies and Graphs

     ● Not always a particularly good fit.
     ● Various tree models exist; each with limitations:
        ○ Adjacency model
           ■ Either uses fixed max depth or recursive queries.
           ■ Oracle has CONNECT BY PRIOR
           ■ SQL99 has WITH RECURSIVE...UNION...
        ○ Nested set
           ■ complex
           ■ recursive queries to find path to root.
        ○ Materialised path
           ■ Ugly and not relational.
           ■ Can be quite effective when used correctly.

                                              Further reading: http://dev.mysql.com/tech-resources/articles/hierarchical-data.html
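
As an illustration of the adjacency model with a recursive query, here is a minimal sketch that uses Python's built-in sqlite3 module rather than MariaDB. The `category` table, its columns and the sample rows are assumptions made up for this example; only the WITH RECURSIVE ... UNION ALL shape comes from the slide.

```python
import sqlite3


def path_to_root(conn, node_id):
    """Return the names from node_id up to the root of the hierarchy."""
    rows = conn.execute(
        """
        WITH RECURSIVE ancestors(id, parent, name, depth) AS (
            -- anchor row: the node we start from
            SELECT id, parent, name, 0 FROM category WHERE id = ?
            UNION ALL
            -- recursive step: follow the parent pointer one level up
            SELECT c.id, c.parent, c.name, a.depth + 1
            FROM category AS c
            JOIN ancestors AS a ON c.id = a.parent
        )
        SELECT name FROM ancestors ORDER BY depth
        """,
        (node_id,),
    ).fetchall()
    return [name for (name,) in rows]


# Hypothetical adjacency-model table: each row stores a pointer to its parent.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE category (id INTEGER PRIMARY KEY, parent INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO category VALUES (?, ?, ?)",
    [(1, None, "electronics"), (2, 1, "televisions"), (3, 2, "lcd"), (4, 1, "audio")],
)

print(path_to_root(conn, 3))  # ['lcd', 'televisions', 'electronics']
```

The recursion stops on its own at the root, because a NULL parent matches no row in the join.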

What is OQGRAPH?

    ● Implemented as a storage engine.
       ○ Original concept by Arjen Lentz
    ● Mk. 2 implementation 2008
       ○ GPLv2+
       ○ Bundled with MariaDB 5.2+
       ○ Boost Graph Library
    ● Mk. 3 implementation
       ○ GPLv2+
       ○ Bundled with MariaDB 10.0 (soon)
    ● Easy to enable
         ○ INSTALL PLUGIN oqgraph SONAME 'ha_oqgraph';




OQGRAPH: A Computation Engine

     ● It is not a general purpose data engine.
        ○ unlike MyISAM, InnoDB or MEMORY.
     ● Looks like an ordinary table.
     ● Has a very different internal architecture.
     ● It does not operate in terms of
        ○ storing data for later retrieval.
        ○ having indexes on data.

     ● May be regarded as a "magic view" or "table function".




OQGRAPH: A Computation Engine

   [Diagram: MySQL Server architecture. Components shown: Communications, Session and
   Thread Management; Management Services, Logging, Utilities and Runtime Libraries;
   DDL, DML, Tables, Views, Lock Management; SQL Parser and SQL Stored Procedure
   Engine; Buffers and Caches; Query Optimizer and Execution Engine; built-in and
   run-time loaded plug-ins, including OQGraph and InnoDB.]

What's new in OQGRAPH 3
   Features:
    ● Judy array bitmaps for Graph coloring.
    ● Uses existing tables for edge data.
    ● Much lower memory cost per query.
    ● Does not impose any strict structure on the source table.
    ● Can handle significantly larger graphs than OQGRAPHv2.
       ○ 100K+ index reads per second are possible.
       ○ Millions of edges are possible.
    ● All edges of graph need not fit in memory.
       ○ Only Judy bitmap array must be held in RAM.
   Notes:
    ● Tables are read-only and only read from the backing table.
    ● Table must be in same schema as the backing table.
    ● Table must have appropriate indexes.
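
To give a feel for why a bitmap keeps the per-query memory cost low, here is a rough sketch of a "visited" map that spends one bit per vertex id. It is not the OQGRAPH implementation and does not use Judy arrays: the class name `VisitedBitmap` is hypothetical, and a plain Python integer stands in for the sparse Judy bitmap.

```python
class VisitedBitmap:
    """One bit per vertex id; a dense stand-in for a sparse Judy bitmap."""

    def __init__(self):
        self._bits = 0  # Python integers grow as needed

    def set(self, vertex_id):
        """Mark a vertex as visited."""
        self._bits |= 1 << vertex_id

    def test(self, vertex_id):
        """Return True if the vertex has been marked."""
        return ((self._bits >> vertex_id) & 1) == 1

    def count(self):
        """Number of vertices marked so far."""
        return bin(self._bits).count("1")


visited = VisitedBitmap()
visited.set(1)
visited.set(6)
print(visited.test(6), visited.test(3), visited.count())  # True False 2
```

In this sketch the traversal state is only the bitmap; the edges themselves can stay in the backing table and be fetched through its indexes as needed.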

Anatomy of an OQGRAPH 3 table
   CREATE TABLE db.tblname (
     latch SMALLINT UNSIGNED NULL,
     origid BIGINT UNSIGNED NULL,
     destid BIGINT UNSIGNED NULL,
     weight DOUBLE NULL,
     seq BIGINT UNSIGNED NULL,
     linkid BIGINT UNSIGNED NULL,
     KEY (latch, origid, destid) USING HASH,
     KEY (latch, destid, origid) USING HASH
   ) ENGINE=OQGRAPH
     data_table='link'       -- data table
     origid='source'         -- column name
     destid='target'         -- column name
     weight='weight';        -- optional column name
OQGRAPH - Data source
     ● Edges are directed.
     ● Edge weights are optional and default to 1.0.
     ● Undirected edges may be represented as two directed
       edges, in opposite directions.

   CREATE TABLE foo (
      origid INT UNSIGNED NOT NULL,
      destid INT UNSIGNED NOT NULL,
      PRIMARY KEY(origid, destid),
      KEY (destid)
   );
   INSERT INTO foo (origid,destid) VALUES
   (1,2), (2,3), (2,4),
   (4,5), (3,6), (5,6);


OQGRAPH - Data source, cont.

   Creating the OQGRAPH table:
   CREATE TABLE foo_graph (
     latch SMALLINT UNSIGNED NULL,
     origid BIGINT UNSIGNED NULL,
     destid BIGINT UNSIGNED NULL,
     weight DOUBLE NULL,
     seq BIGINT UNSIGNED NULL,
     linkid BIGINT UNSIGNED NULL,
     KEY (latch, origid, destid) USING HASH,
     KEY (latch, destid, origid) USING HASH
   ) ENGINE=OQGRAPH
     data_table='foo' origid='origid' destid='destid';




Selecting Edges

   MariaDB [foo]> select * from foo_graph;
   +-------+--------+--------+--------+------+--------+
   | latch | origid | destid | weight | seq  | linkid |
   +-------+--------+--------+--------+------+--------+
   |  NULL |      1 |      2 |      1 | NULL |   NULL |
   |  NULL |      2 |      3 |      1 | NULL |   NULL |
   |  NULL |      2 |      4 |      1 | NULL |   NULL |
   |  NULL |      3 |      6 |      1 | NULL |   NULL |
   |  NULL |      4 |      5 |      1 | NULL |   NULL |
   |  NULL |      5 |      6 |      1 | NULL |   NULL |
   +-------+--------+--------+--------+------+--------+
   6 rows in set (0.38 sec)




Now, it's time for some magic.
   (shortest path calculation)

      ● SELECT * FROM foo_graph
        WHERE latch=1 AND origid=1 AND destid=6;
        +-------+--------+--------+--------+------+--------+
        | latch | origid | destid | weight | seq  | linkid |
        +-------+--------+--------+--------+------+--------+
        |     1 |      1 |      6 |   NULL |    0 |      1 |
        |     1 |      1 |      6 |      1 |    1 |      2 |
        |     1 |      1 |      6 |      1 |    2 |      3 |
        |     1 |      1 |      6 |      1 |    3 |      6 |
        +-------+--------+--------+--------+------+--------+


      ● SELECT GROUP_CONCAT(linkid ORDER BY seq) AS path
        FROM foo_graph WHERE latch=1 AND origid=1 AND destid=6 \G
        path: 1,2,3,6
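
To see where the path 1,2,3,6 comes from, the same computation can be modelled outside the database. The following is a plain-Python sketch of Dijkstra's algorithm over the six edges inserted into `foo`, with every edge weight defaulting to 1.0. It mirrors what the query asks for, not how the engine is implemented; the function name `shortest_path` is an assumption for this example.

```python
import heapq

# The six directed edges inserted into the foo table earlier in the deck.
EDGES = [(1, 2), (2, 3), (2, 4), (4, 5), (3, 6), (5, 6)]


def shortest_path(edges, origid, destid, default_weight=1.0):
    """Return the list of node ids on a cheapest path, or [] if unreachable."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append((dst, default_weight))

    dist = {origid: 0.0}   # best known cost to each node
    prev = {}              # predecessor on the best known path
    heap = [(0.0, origid)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == destid:
            break
        if cost > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < dist.get(nxt, float("inf")):
                dist[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))

    if destid not in dist:
        return []
    path = [destid]
    while path[-1] != origid:
        path.append(prev[path[-1]])
    return path[::-1]


print(",".join(str(n) for n in shortest_path(EDGES, 1, 6)))  # 1,2,3,6
```

The alternative route 1,2,4,5,6 has four edges instead of three, so it loses. Because the edges are directed, asking for a path from 6 back to 1 returns nothing.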


Other computations,
     ● Which paths lead to node 4?
        SELECT GROUP_CONCAT(linkid) AS list
        FROM foo_graph WHERE latch=1 AND destid=4 \G

        list: 1,2,4


     ● Where can I get to from node 4?
        SELECT GROUP_CONCAT(linkid) AS list
        FROM foo_graph WHERE latch=1 AND origid=4 \G

        list: 6,5,4
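
Both questions are reachability searches: one follows edges forwards from node 4, the other follows them backwards into node 4. Here is a small breadth-first sketch over the same six edges from `foo`; the helper name `reachable` and its `reverse` flag are assumptions for this example. It returns sorted node ids, whereas GROUP_CONCAT without ORDER BY makes no promise about order.

```python
from collections import deque

# The six directed edges inserted into the foo table earlier in the deck.
EDGES = [(1, 2), (2, 3), (2, 4), (4, 5), (3, 6), (5, 6)]


def reachable(edges, start, reverse=False):
    """Nodes reachable from start, or nodes that can reach start if reverse."""
    adjacent = {}
    for src, dst in edges:
        if reverse:
            src, dst = dst, src  # walk the edges backwards
        adjacent.setdefault(src, []).append(dst)

    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacent.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(seen)


print(reachable(EDGES, 4))                # [4, 5, 6]
print(reachable(EDGES, 4, reverse=True))  # [1, 2, 4]
```

The start node is included in both results, which matches the lists shown above.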




Other computations, continued.

     ● See docs for latch 0 and latch NULL
     ● latch 1 : Dijkstra's shortest path.
        ○ O((V + E) log V)
     ● latch 2 : Breadth-first search.
        ○ O(V+E)
     ● Other algorithms possible




Joins make it prettier,
     ● INSERT INTO people VALUES
       (1,'pearce'), (2,'hunnicut'), (3,'potter'),
       (4,'hoolihan'), (5,'winchester'), (6,'mulcahy');


     ● SELECT GROUP_CONCAT(name ORDER BY seq) path
       FROM foo_graph
       JOIN people ON (foo_graph.linkid = people.id)
       WHERE latch=1 AND origid=1 AND destid=6 \G

        path: pearce,hunnicut,potter,mulcahy


Tree of Life
 Load the tol.sql schema,

 Create tol_link backing store table,
 CREATE TABLE tol_link (
   source INT UNSIGNED NOT NULL,
   target INT UNSIGNED NOT NULL,
   PRIMARY KEY (source, target),
   KEY (target) ) ENGINE=innodb;

 Populate it with all the edges we need:
 INSERT INTO tol_link (source,target)
 SELECT parent,id FROM tol WHERE parent IS NOT NULL
 UNION ALL
 SELECT id,parent FROM tol WHERE parent IS NOT NULL;
 Query OK, 178102 rows affected (46.35 sec)
 Records: 178102 Duplicates: 0 Warnings: 0

                 Direct download: http://bazaar.launchpad.net/~openquery-core/oqgraph/trunk/view/head:/examples/tree-of-life/tol.sql
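
The UNION ALL above stores every parent/child relationship twice, once in each direction, so the tree can be walked both up and down. A tiny sketch of that doubling step, using a made-up four-node `tol` mapping of id to parent (not the real data set):

```python
def undirected_links(parent_of):
    """Emit (source, target) rows in both directions for each parent link."""
    links = []
    for node_id, parent in parent_of.items():
        if parent is None:
            continue  # the root has no parent, so it contributes no edge
        links.append((parent, node_id))  # parent -> child
        links.append((node_id, parent))  # child -> parent
    return links


# Hypothetical miniature of the tol table: id -> parent id.
tol = {1: None, 2: 1, 3: 1, 4: 2}
links = undirected_links(tol)
print(len(links))  # 6
```

By the same arithmetic, the 178102 rows reported above correspond to 89051 parent/child pairs.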

Tree of Life, cont.

   Creating the OQGRAPH table:
   CREATE TABLE tol_tree (
     latch SMALLINT UNSIGNED NULL,
     origid BIGINT UNSIGNED NULL,
     destid BIGINT UNSIGNED NULL,
     weight DOUBLE NULL,
     seq BIGINT UNSIGNED NULL,
     linkid BIGINT UNSIGNED NULL,
     KEY (latch, origid, destid) USING HASH,
     KEY (latch, destid, origid) USING HASH
   ) ENGINE=OQGRAPH
     data_table='tol_link' origid='source' destid='target';




Tree of Life - finding H.Sapiens

   SELECT
      GROUP_CONCAT(name ORDER BY seq SEPARATOR ' -> ') AS path
      FROM tol_tree JOIN tol ON (linkid=id)
      WHERE latch=1 AND origid=1 AND destid=16421 \G

   path: Life on Earth -> Eukaryotes -> Unikonts ->
   Opisthokonts -> Animals -> Bilateria ->
   Deuterostomia -> Chordata -> Craniata -> Vertebrata
   -> Gnathostomata -> Teleostomi -> Osteichthyes ->
   Sarcopterygii -> Terrestrial Vertebrates ->
   Tetrapoda -> Reptiliomorpha -> Amniota -> Synapsida
   -> Eupelycosauria -> Sphenacodontia ->
   Sphenacodontoidea -> Therapsida -> Theriodontia ->
   Cynodontia -> Mammalia -> Eutheria -> Primates ->
   Catarrhini -> Hominidae -> Homo -> Homo sapiens

Internet Movie DataBase (IMDB)
 Transform and load the movie database (this takes a long time)
 CREATE TABLE `entity` (
   `id` int(11) NOT NULL AUTO_INCREMENT,
   `type` enum('ACTOR','MOVIE','TV MOVIE','TV MINI','TV SERIES','VIDEO MOVIE',
     'VIDEO GAME','VOICE','ARCHIVE') NOT NULL,
   `name` varchar(128) COLLATE utf8_unicode_ci NOT NULL,
   PRIMARY KEY (`id`),
   UNIQUE KEY `type` (`type`,`name`) USING BTREE
 ) ENGINE=InnoDB;

 CREATE TABLE `link` (
   `rel_id` int(11) NOT NULL AUTO_INCREMENT,
   `link_from` int(11) NOT NULL,
   `link_to` int(11) NOT NULL,
   PRIMARY KEY (`rel_id`),
   KEY `link_from` (`link_from`,`link_to`),
   KEY `link_to` (`link_to`)
 ) ENGINE=InnoDB;




Degrees of N!xau
 A movie graph of approximately 3.7 million nodes and 9 million edges. The tables are
 about 1GB, and InnoDB is configured with a 512MB buffer pool.
 MariaDB [imdb]> SELECT
               -> GROUP_CONCAT(name ORDER BY seq SEPARATOR ' -> ') AS path
               -> FROM movie_graph JOIN entity ON (id=linkid)
               -> WHERE latch=1
               -> AND origid=(SELECT a.id FROM entity a
               ->               WHERE name='Kevin Bacon')
               -> AND destid=(SELECT b.id FROM entity b
               ->               WHERE name='N!xau')\G
 *************************** 1. row ***************************
 path: Kevin Bacon -> The Air Up There (1994) -> Fanyana H. Sidumo ->
 The Gods Must Be Crazy (1981) -> N!xau
 1 row in set (3 min 9.67 sec)
 --again
 *************************** 1. row ***************************
 path: Kevin Bacon -> The Air Up There (1994) -> Fanyana H. Sidumo ->
 The Gods Must Be Crazy (1981) -> N!xau
 1 row in set (1 min 7.13 sec)
 Each query requires approximately 7.8 million secondary key reads.




Degrees of N!xau
 A graph of approximately 3.7 million nodes and 30 million edges. The tables are about
 3.5GB, and InnoDB is configured with a 512MB buffer pool.
 MariaDB [imdb]> SELECT
               -> GROUP_CONCAT(name ORDER BY seq SEPARATOR ' -> ') AS path
               -> FROM imdb_graph JOIN entity ON (id=linkid)
               -> WHERE latch=1
               -> AND origid=(SELECT a.id FROM entity a
               ->               WHERE name='Kevin Bacon')
               -> AND destid=(SELECT b.id FROM entity b
               ->               WHERE name='N!xau')\G
 *************************** 1. row ***************************
 path: Kevin Bacon -> The 45th Annual Golden Globe Awards (1988) ->
 Richard Attenborough -> In Darkest Hollywood: Cinema and Apartheid
 (1993) -> N!xau
 1 row in set (10 min 6.55 sec)
 --again
 *************************** 1. row ***************************
 path: Kevin Bacon -> The 45th Annual Golden Globe Awards (1988) ->
 Richard Attenborough -> In Darkest Hollywood: Cinema and Apartheid
 (1993) -> N!xau
 1 row in set (8 min 29.66 sec)
 Each query requires approximately 16.6 million secondary key reads.

We want your feedback!

     ● Very easy to use...
         But do feel free to ask us for help/advice.

     ● Open Query created friendlist_graph for Drupal 6.
        ○ Currently based on OQGraph v2.
        ○ Addition to the existing friendlist module.
        ○ Enables easy social networking in Drupal.
        ○ Peter Lieverdink (@cafuego) did this in about 30 minutes.

     ● We would like to know how you are using OQGRAPH!
       ○ You could be doing something really cool...



Links and support
    ● Binaries & Packages
       ○ http://mariadb.com (MariaDB 10.0 soon)
    ● Source collaboration
       ○ https://launchpad.net/oqgraph
       ○ https://code.launchpad.net/~oqgraph-dev/maria/10.0-oqgraph3
    ● Info, Docs, Support, Licensing, Engineering
       ○ http://openquery.com/graph
       ○ This presentation: http://goo.gl/gqr7b




                                     Thank you!
                                     Antony Curtis & Arjen Lentz
                                     graph@openquery.com
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 

Dernier (20)

E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 

OQGraph @ SCaLE 11x 2013

  • 1. OQGraph 3 for MariaDB: Graphs and Hierarchies in Plain SQL
       http://goo.gl/gqr7b
       Antony T Curtis <atcurtis@gmail.com>
       graph@openquery.com
       http://openquery.com/graph
  • 2. Graphs / Networks
       ● Nodes connected by Edges.
       ● Edges may be directional.
       ● Edges may have a "weight" / "cost" attribute.
       ● Directed graphs may have bi-directional edges.
       ● Unconnected sets of nodes may exist on the same graph.
       ● There need not be a "root" node.
       Examples:
       ● "Social Graphs" / friend relationships.
       ● Decision / State graphs.
       ● Airline routes.
       OQGRAPH computation engine © 2009-2013 Open Query
  • 3. RDBMS with Hierarchies and Graphs
       ● Not always a particularly good fit.
       ● Various tree models exist; each with limitations:
         ○ Adjacency model
           ■ Either uses fixed max depth or recursive queries.
           ■ Oracle has CONNECT BY PRIOR
           ■ SQL99 has WITH RECURSIVE...UNION...
         ○ Nested set
           ■ Complex.
           ■ Recursive queries to find path to root.
         ○ Materialised path
           ■ Ugly and not relational.
           ■ Can be quite effective when used correctly.
       Further reading: http://dev.mysql.com/tech-resources/articles/hierarchical-data.html
  • 4. What is OQGRAPH?
       ● Implemented as a storage engine.
         ○ Original concept by Arjen Lentz
       ● Mk. 2 implementation 2008
         ○ GPLv2+
         ○ Bundled with MariaDB 5.2+
         ○ Boost Graph Library
       ● Mk. 3 implementation
         ○ GPLv2+
         ○ Bundled with MariaDB 10.0 (soon)
       ● Easy to enable
         ○ INSTALL PLUGIN oqgraph SONAME 'ha_oqgraph';
  • 5. OQGRAPH: A Computation Engine
       ● It is not a general purpose data engine.
         ○ unlike MyISAM, InnoDB or MEMORY.
       ● Looks like an ordinary table.
       ● Has a very different internal architecture.
       ● It does not operate in terms of
         ○ storing data for later retrieval.
         ○ having indexes on data.
       ● May be regarded as a "magic view" or "table function".
  • 6. OQGRAPH: A Computation Engine
       [Architecture diagram of the MySQL Server: communications, session and thread management; DDL, DML, tables, views and the stored procedure engine; SQL parser; lock services; buffers and caches; logging, utilities and runtime libraries; query optimizer and execution engine. OQGraph and InnoDB are shown among the built-in and run-time loaded plug-ins.]
  • 7. What's new in OQGRAPH 3
       Features:
       ● Judy array bitmaps for Graph coloring.
       ● Uses existing tables for edge data.
       ● Much lower memory cost per query.
       ● Does not impose any strict structure on the source table.
       ● Can handle significantly larger graphs than OQGRAPHv2.
         ○ 100K+ index reads per second are possible.
         ○ Millions of edges are possible.
       ● All edges of graph need not fit in memory.
         ○ Only Judy bitmap array must be held in RAM.
       Notes:
       ● Tables are read-only and only read from the backing table.
       ● Table must be in same schema as the backing table.
       ● Table must have appropriate indexes.
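The memory claim on this slide can be pictured with a small sketch. It is not OQGRAPH's implementation: a plain `bytearray` bitmap stands in for the Judy array, and `lookup` is a hypothetical stand-in for an index read on the backing table. Python is used here, rather than SQL, only so that the sketch runs without the engine installed. The point it illustrates is that the traversal holds one bit per node in memory while edges are fetched on demand.

```python
from collections import deque

def reachable(start, max_id, neighbours):
    """Breadth-first traversal that keeps only a visited bitmap in memory.

    neighbours(node) is a hypothetical stand-in for an index lookup on the
    backing table; the full edge list is never loaded by this function.
    """
    visited = bytearray((max_id >> 3) + 1)   # one bit per node id

    def seen(n):
        return visited[n >> 3] & (1 << (n & 7))

    def mark(n):
        visited[n >> 3] |= 1 << (n & 7)

    order = []
    queue = deque([start])
    mark(start)
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in neighbours(node):
            if not seen(nxt):
                mark(nxt)
                queue.append(nxt)
    return order

# Edge data from the foo example table shown on slide 9.
EDGES = [(1, 2), (2, 3), (2, 4), (4, 5), (3, 6), (5, 6)]

def lookup(node):
    # Stand-in for "SELECT destid FROM foo WHERE origid = node".
    return [d for o, d in EDGES if o == node]

print(reachable(1, 6, lookup))
```

The bitmap grows with the highest node id rather than with the number of edges, which is the property the slide attributes to the Judy bitmap.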
  • 8. Anatomy of an OQGRAPH 3 table
       CREATE TABLE db.tblname (
         latch  SMALLINT UNSIGNED NULL,
         origid BIGINT UNSIGNED NULL,
         destid BIGINT UNSIGNED NULL,
         weight DOUBLE NULL,
         seq    BIGINT UNSIGNED NULL,
         linkid BIGINT UNSIGNED NULL,
         KEY (latch, origid, destid) USING HASH,
         KEY (latch, destid, origid) USING HASH
       ) ENGINE=OQGRAPH
         data_table='link'   -- data table
         origid='source'     -- column name
         destid='target'     -- column name
         weight='weight';    -- optional column name
  • 9. OQGRAPH - Data source
       ● Edges are directed edges.
       ● Edge weights are optional and default to 1.0.
       ● Undirected edges may be represented as two directed edges, in opposite directions.
       CREATE TABLE foo (
         origid INT UNSIGNED NOT NULL,
         destid INT UNSIGNED NOT NULL,
         PRIMARY KEY (origid, destid),
         KEY (destid)
       );
       INSERT INTO foo (origid,destid)
       VALUES (1,2), (2,3), (2,4), (4,5), (3,6), (5,6);
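The three rules on this slide (directed edges, a default weight of 1.0, undirected edges stored as two directed ones) can be sketched outside the database. The helper names `build_graph` and `undirected` are invented for illustration and are not part of OQGRAPH; the rows are the ones inserted into `foo` above.

```python
from collections import defaultdict

DEFAULT_WEIGHT = 1.0  # the slide states that a missing weight defaults to 1.0

def build_graph(rows):
    """Build {origid: {destid: weight}} from (origid, destid[, weight]) rows."""
    graph = defaultdict(dict)
    for row in rows:
        origid, destid = row[0], row[1]
        has_weight = len(row) > 2 and row[2] is not None
        graph[origid][destid] = row[2] if has_weight else DEFAULT_WEIGHT
    return graph

def undirected(rows):
    """Expand each edge into two directed edges, one per direction."""
    for row in rows:
        yield row
        yield (row[1], row[0]) + tuple(row[2:])

rows = [(1, 2), (2, 3), (2, 4), (4, 5), (3, 6), (5, 6)]
directed = build_graph(rows)
both_ways = build_graph(undirected(rows))

print(directed[1])         # edges leaving node 1, with the default weight
print(1 in directed[2])    # False: the edge 1 -> 2 does not imply 2 -> 1
print(both_ways[2])        # node 2 now also has an edge back to node 1
```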
  • 10. OQGRAPH - Data source, cont.
       Creating the OQGRAPH table:
       CREATE TABLE foo_graph (
         latch  SMALLINT UNSIGNED NULL,
         origid BIGINT UNSIGNED NULL,
         destid BIGINT UNSIGNED NULL,
         weight DOUBLE NULL,
         seq    BIGINT UNSIGNED NULL,
         linkid BIGINT UNSIGNED NULL,
         KEY (latch, origid, destid) USING HASH,
         KEY (latch, destid, origid) USING HASH
       ) ENGINE=OQGRAPH
         data_table='foo' origid='origid' destid='destid';
  • 11. Selecting Edges
       MariaDB [foo]> select * from foo_graph;
       +-------+--------+--------+--------+------+--------+
       | latch | origid | destid | weight | seq  | linkid |
       +-------+--------+--------+--------+------+--------+
       |  NULL |      1 |      2 |      1 | NULL |   NULL |
       |  NULL |      2 |      3 |      1 | NULL |   NULL |
       |  NULL |      2 |      4 |      1 | NULL |   NULL |
       |  NULL |      3 |      6 |      1 | NULL |   NULL |
       |  NULL |      4 |      5 |      1 | NULL |   NULL |
       |  NULL |      5 |      6 |      1 | NULL |   NULL |
       +-------+--------+--------+--------+------+--------+
       6 rows in set (0.38 sec)
  • 12. Now, it's time for some magic. (shortest path calculation)
       ● SELECT * FROM foo_graph WHERE latch=1 AND origid=1 AND destid=6;
         +-------+--------+--------+--------+------+--------+
         | latch | origid | destid | weight | seq  | linkid |
         +-------+--------+--------+--------+------+--------+
         |     1 |      1 |      6 |   NULL |    0 |      1 |
         |     1 |      1 |      6 |      1 |    1 |      2 |
         |     1 |      1 |      6 |      1 |    2 |      3 |
         |     1 |      1 |      6 |      1 |    3 |      6 |
         +-------+--------+--------+--------+------+--------+
       ● SELECT GROUP_CONCAT(linkid ORDER BY seq) AS path
           FROM foo_graph
          WHERE latch=1 AND origid=1 AND destid=6 \G
         path: 1,2,3,6
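For readers who want to check the result by hand, here is a minimal Dijkstra sketch over the same six edges. It is an independent illustration, not the engine's code; `shortest_path` is a hypothetical helper and every edge is given the default weight of 1.0. The returned node ids correspond to the `linkid` column read in `seq` order.

```python
import heapq

EDGES = [(1, 2), (2, 3), (2, 4), (4, 5), (3, 6), (5, 6)]

def shortest_path(edges, origid, destid, weight=1.0):
    """Return the least-cost node sequence from origid to destid, or None."""
    adjacency = {}
    for o, d in edges:
        adjacency.setdefault(o, []).append((d, weight))

    best = {origid: 0.0}          # cheapest known cost per node
    previous = {}                 # back-pointers for path reconstruction
    heap = [(0.0, origid)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == destid:
            break
        if cost > best.get(node, float("inf")):
            continue              # stale heap entry
        for nxt, w in adjacency.get(node, []):
            new_cost = cost + w
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                previous[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))

    if destid not in best:
        return None
    path = [destid]
    while path[-1] != origid:
        path.append(previous[path[-1]])
    return path[::-1]

print(shortest_path(EDGES, 1, 6))
```

The alternative route 1, 2, 4, 5, 6 has four hops against three, so it is not chosen. Because the edges are directed, asking for a path from 6 to 1 returns `None`.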
  • 13. Other computations,
       ● Which paths lead to node 4?
         SELECT GROUP_CONCAT(linkid) AS list
           FROM foo_graph
          WHERE latch=1 AND destid=4 \G
         list: 1,2,4
       ● Where can I get to from node 4?
         SELECT GROUP_CONCAT(linkid) AS list
           FROM foo_graph
          WHERE latch=1 AND origid=4 \G
         list: 6,5,4
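Leaving out one endpoint turns the query into a reachability question. The sketch below reproduces both lists with a plain breadth-first walk, following edges backwards for "which nodes lead to node 4" and forwards for "where can I get to from node 4". The helper `closure` is hypothetical, and results are sorted because a `GROUP_CONCAT` without `ORDER BY` promises no particular order.

```python
from collections import deque

EDGES = [(1, 2), (2, 3), (2, 4), (4, 5), (3, 6), (5, 6)]

def closure(edges, start, reverse=False):
    """Nodes reachable from start, following edges forwards or backwards."""
    adjacency = {}
    for o, d in edges:
        if reverse:
            o, d = d, o
        adjacency.setdefault(o, []).append(d)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(seen)

print(closure(EDGES, 4, reverse=True))   # nodes with a path to node 4
print(closure(EDGES, 4))                 # nodes reachable from node 4
```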
  • 14. Other computations, continued.
       ● See docs for latch 0 and latch NULL
       ● latch 1 : Dijkstra's shortest path.
         ○ O((V + E) log V)
       ● latch 2 : Breadth-first search.
         ○ O(V + E)
       ● Other algorithms possible
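The two algorithms answer different questions once weights are involved: breadth-first search counts hops, while Dijkstra sums weights. The following sketch uses a three-edge graph invented for illustration (it does not come from the slides) in which the direct edge is expensive, so the two disagree about node 2.

```python
import heapq
from collections import deque

# Hypothetical weighted edges (origid, destid, weight), not from the slides.
EDGES = [(1, 2, 10.0), (1, 3, 1.0), (3, 2, 1.0)]

def adjacency(edges):
    adj = {}
    for o, d, w in edges:
        adj.setdefault(o, []).append((d, w))
    return adj

def bfs_hops(edges, start):
    """Fewest-hop distance to every reachable node; weights are ignored."""
    adj, hops, queue = adjacency(edges), {start: 0}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt, _ in adj.get(node, []):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return hops

def dijkstra_cost(edges, start):
    """Least total weight to every reachable node."""
    adj, cost, heap = adjacency(edges), {start: 0.0}, [(0.0, start)]
    while heap:
        c, node = heapq.heappop(heap)
        if c > cost[node]:
            continue  # stale heap entry
        for nxt, w in adj.get(node, []):
            if c + w < cost.get(nxt, float("inf")):
                cost[nxt] = c + w
                heapq.heappush(heap, (c + w, nxt))
    return cost

print(bfs_hops(EDGES, 1))        # node 2 is one hop away
print(dijkstra_cost(EDGES, 1))   # but cheapest via node 3, at total cost 2.0
```

With the default weight of 1.0 on every edge, as in the `foo` example, hop count and total cost coincide.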
  • 15. Joins make it prettier,
       ● INSERT INTO people VALUES
           (1,'pearce'), (2,'hunnicut'), (3,'potter'),
           (4,'hoolihan'), (5,'winchester'), (6,'mulcahy');
       ● SELECT GROUP_CONCAT(name ORDER BY seq) path
           FROM foo_graph JOIN people ON (foo_graph.linkid = people.id)
          WHERE latch=1 AND origid=1 AND destid=6 \G
         path: pearce,hunnicut,potter,mulcahy
  • 16. Tree of Life
       Load the tol.sql schema.
       Create the tol_link backing store table:
       CREATE TABLE tol_link (
         source INT UNSIGNED NOT NULL,
         target INT UNSIGNED NOT NULL,
         PRIMARY KEY (source, target),
         KEY (target)
       ) ENGINE=innodb;
       Populate it with all the edges we need:
       INSERT INTO tol_link (source,target)
         SELECT parent,id FROM tol WHERE parent IS NOT NULL
         UNION ALL
         SELECT id,parent FROM tol WHERE parent IS NOT NULL;
       Query OK, 178102 rows affected (46.35 sec)
       Records: 178102 Duplicates: 0 Warnings: 0
       Direct download: http://bazaar.launchpad.net/~openquery-core/oqgraph/trunk/view/head:/examples/tree-of-life/tol.sql
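The `UNION ALL` stores every parent/child relationship twice, once in each direction, so the tree can be walked both up and down. A small sketch of the same transformation, using four made-up `(id, parent)` rows in place of the real `tol` table:

```python
# Hypothetical (id, parent) rows standing in for the tol table.
# The root row has no parent and therefore contributes no edges.
tol = [(1, None), (2, 1), (3, 1), (4, 2)]

def link_rows(rows):
    """Mirror the INSERT ... SELECT ... UNION ALL: emit each pair in both directions."""
    down = [(parent, node) for node, parent in rows if parent is not None]
    up = [(node, parent) for node, parent in rows if parent is not None]
    return down + up

links = link_rows(tol)
print(len(links))   # twice the number of rows that have a parent
print(links)
```

By the same arithmetic, the 178102 rows reported on the slide correspond to 89051 parent/child pairs.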
  • 17. Tree of Life, cont.
       Creating the OQGRAPH table:
       CREATE TABLE tol_tree (
         latch  SMALLINT UNSIGNED NULL,
         origid BIGINT UNSIGNED NULL,
         destid BIGINT UNSIGNED NULL,
         weight DOUBLE NULL,
         seq    BIGINT UNSIGNED NULL,
         linkid BIGINT UNSIGNED NULL,
         KEY (latch, origid, destid) USING HASH,
         KEY (latch, destid, origid) USING HASH
       ) ENGINE=OQGRAPH
         data_table='tol_link' origid='source' destid='target';
  • 18. Tree of Life - finding H. sapiens
       SELECT GROUP_CONCAT(name ORDER BY seq SEPARATOR ' -> ') AS path
         FROM tol_tree JOIN tol ON (linkid=id)
        WHERE latch=1 AND origid=1 AND destid=16421 \G
       path: Life on Earth -> Eukaryotes -> Unikonts -> Opisthokonts -> Animals
         -> Bilateria -> Deuterostomia -> Chordata -> Craniata -> Vertebrata
         -> Gnathostomata -> Teleostomi -> Osteichthyes -> Sarcopterygii
         -> Terrestrial Vertebrates -> Tetrapoda -> Reptiliomorpha -> Amniota
         -> Synapsida -> Eupelycosauria -> Sphenacodontia -> Sphenacodontoidea
         -> Therapsida -> Theriodontia -> Cynodontia -> Mammalia -> Eutheria
         -> Primates -> Catarrhini -> Hominidae -> Homo -> Homo sapiens
  • 19. Internet Movie DataBase (IMDB)
       Transform and load the movie database (this takes a long time).
       CREATE TABLE `entity` (
         `id` int(11) NOT NULL AUTO_INCREMENT,
         `type` enum('ACTOR','MOVIE','TV MOVIE','TV MINI','TV SERIES',
                     'VIDEO MOVIE','VIDEO GAME','VOICE','ARCHIVE') NOT NULL,
         `name` varchar(128) COLLATE utf8_unicode_ci NOT NULL,
         PRIMARY KEY (`id`),
         UNIQUE KEY `type` (`type`,`name`) USING BTREE
       ) ENGINE=InnoDB;
       CREATE TABLE `link` (
         `rel_id` int(11) NOT NULL AUTO_INCREMENT,
         `link_from` int(11) NOT NULL,
         `link_to` int(11) NOT NULL,
         PRIMARY KEY (`rel_id`),
         KEY `link_from` (`link_from`,`link_to`),
         KEY `link_to` (`link_to`)
       ) ENGINE=InnoDB;
  • 20. Degrees of N!xau
       Graph of movies: approximately 3.7 million nodes with 9 million edges.
       Tables are about 1GB and InnoDB is configured for a 512MB buffer pool.
       MariaDB [imdb]> SELECT
           ->   GROUP_CONCAT(name ORDER BY seq SEPARATOR ' -> ') AS path
           -> FROM movie_graph JOIN entity ON (id=linkid)
           -> WHERE latch=1
           ->   AND origid=(SELECT a.id FROM entity a
           ->               WHERE name='Kevin Bacon')
           ->   AND destid=(SELECT b.id FROM entity b WHERE name='N!xau')\G
  • 21. Degrees of N!xau
       Graph of movies: approximately 3.7 million nodes with 9 million edges.
       Tables are about 1GB and InnoDB is configured for a 512MB buffer pool.
       MariaDB [imdb]> SELECT
           ->   GROUP_CONCAT(name ORDER BY seq SEPARATOR ' -> ') AS path
           -> FROM movie_graph JOIN entity ON (id=linkid)
           -> WHERE latch=1
           ->   AND origid=(SELECT a.id FROM entity a
           ->               WHERE name='Kevin Bacon')
           ->   AND destid=(SELECT b.id FROM entity b WHERE name='N!xau')\G
       *************************** 1. row ***************************
       path: Kevin Bacon -> The Air Up There (1994) -> Fanyana H. Sidumo -> The Gods Must Be Crazy (1981) -> N!xau
       1 row in set (3 min 9.67 sec)
       --again
       *************************** 1. row ***************************
       path: Kevin Bacon -> The Air Up There (1994) -> Fanyana H. Sidumo -> The Gods Must Be Crazy (1981) -> N!xau
       1 row in set (1 min 7.13 sec)
       Each query requires approximately 7.8 million secondary key reads.
  • 22. Degrees of N!xau
       Graph of approximately 3.7 million nodes with 30 million edges.
       Tables are about 3.5GB and InnoDB is configured for a 512MB buffer pool.
       MariaDB [imdb]> SELECT
           ->   GROUP_CONCAT(name ORDER BY seq SEPARATOR ' -> ') AS path
           -> FROM imdb_graph JOIN entity ON (id=linkid)
           -> WHERE latch=1
           ->   AND origid=(SELECT a.id FROM entity a
           ->               WHERE name='Kevin Bacon')
           ->   AND destid=(SELECT b.id FROM entity b WHERE name='N!xau')\G
       *************************** 1. row ***************************
       path: Kevin Bacon -> The 45th Annual Golden Globe Awards (1988) -> Richard Attenborough -> In Darkest Hollywood: Cinema and Apartheid (1993) -> N!xau
       1 row in set (10 min 6.55 sec)
       --again
       *************************** 1. row ***************************
       path: Kevin Bacon -> The 45th Annual Golden Globe Awards (1988) -> Richard Attenborough -> In Darkest Hollywood: Cinema and Apartheid (1993) -> N!xau
       1 row in set (8 min 29.66 sec)
       Each query requires approximately 16.6 million secondary key reads.
  • 23. We want your feedback!
       ● Very easy to use... But do feel free to ask us for help/advice.
       ● OpenQuery created friendlist_graph for Drupal 6.
         ○ Currently based on OQGraph v2
         ○ Addition to the existing friendlist module.
         ○ Enables easy social networking in Drupal.
         ○ Peter Lieverdink (@cafuego) did this in about 30 minutes
       ● We would like to know how you are using OQGRAPH!
         ○ You could be doing something really cool...
  • 24. Links and support
       ● Binaries & Packages
         ○ http://mariadb.com (MariaDB 10.0 soon)
       ● Source collaboration
         ○ https://launchpad.net/oqgraph
         ○ https://code.launchpad.net/~oqgraph-dev/maria/10.0-oqgraph3
       ● Info, Docs, Support, Licensing, Engineering
         ○ http://openquery.com/graph
         ○ This presentation: http://goo.gl/gqr7b
       Thank you!
       Antony Curtis & Arjen Lentz
       graph@openquery.com