MySQL 8 introduces support for ANSI SQL recursive queries with common table expressions, a powerful method for working with recursive data references. Until now, MySQL application developers have had to use workarounds for hierarchical data relationships. It's time to write SQL queries in a more standardized way, and be compatible with other brands of SQL implementations. But as always, the bottom line is: how does it perform? This presentation will briefly describe how to use recursive queries, and then test the performance and scalability of those queries against other solutions for hierarchical queries.
2. Bill Karwin
Software developer, consultant, trainer
Using MySQL since 2000
Senior Database Architect at SchoolMessenger
Author of SQL Antipatterns: Avoiding the Pitfalls of
Database Programming
Oracle ACE Director
3. How to Query a Tree?
Hierarchical data
§ Organization charts
§ Categories and sub-categories
§ Parts explosion
§ Threaded discussions
https://commons.wikimedia.org/wiki/File:Staff_Organisation_Diagram,_1896.jpg
5. Adjacency List Example Data
comment_id parent_id author comment
1 NULL Fran What’s the cause of this bug?
2 1 Ollie I think it’s a null pointer.
3 2 Fran No, I checked for that.
4 1 Kukla We need to check valid input.
5 4 Ollie Yes, that’s a bug.
6 4 Fran Yes, please add a check
7 6 Kukla That fixed it.
6. Can’t Easily Query Deep Trees
SELECT * FROM Comments c1
LEFT JOIN Comments c2 ON (c2.parent_id = c1.comment_id)
LEFT JOIN Comments c3 ON (c3.parent_id = c2.comment_id)
LEFT JOIN Comments c4 ON (c4.parent_id = c3.comment_id)
LEFT JOIN Comments c5 ON (c5.parent_id = c4.comment_id)
LEFT JOIN Comments c6 ON (c6.parent_id = c5.comment_id)
LEFT JOIN Comments c7 ON (c7.parent_id = c6.comment_id)
LEFT JOIN Comments c8 ON (c8.parent_id = c7.comment_id)
LEFT JOIN Comments c9 ON (c9.parent_id = c8.comment_id)
LEFT JOIN Comments c10 ON (c10.parent_id = c9.comment_id)
...
8. MySQL Workarounds
MySQL lacked support for recursive queries, so workarounds were needed
These are all denormalized designs, most don’t have referential integrity
§Path enumeration
§Nested sets
§Closure table
9. Path Enumeration Example Data
comment_id path author comment
1 1/ Fran What’s the cause of this bug?
2 1/2/ Ollie I think it’s a null pointer.
3 1/2/3/ Fran No, I checked for that.
4 1/4/ Kukla We need to check valid input.
5 1/4/5/ Ollie Yes, that’s a bug.
6 1/4/6/ Fran Yes, please add a check
7 1/4/6/7/ Kukla That fixed it.
10. Path Enumeration Example Queries
Query ancestors of comment #7:
SELECT * FROM Comments
WHERE '1/4/6/7/' LIKE CONCAT(path, '%');
Query descendants of comment #4:
SELECT * FROM Comments
WHERE path LIKE '1/4/%';
11. Path Enumeration Pros and Cons
Pros:
§Single non-recursive query to get a tree or a subtree
Cons:
§Complex updates to add or remove a node
§Numbers are stored in a string—no referential integrity
12. Nested Sets
Each comment encodes its descendants using two numbers:
§ A comment’s left number is less than all numbers used by the comment’s descendants.
§ A comment’s right number is greater than all numbers used by the comment’s
descendants.
§ A comment’s numbers are between all
numbers used by the comment’s ancestors.
References:
§ “Recursive Hierarchies: The Relational Taboo!” Michael J. Kamfonas,
Relational Journal, Oct/Nov 1992
§ “Trees and Hierarchies in SQL For Smarties,” Joe Celko, 2004
§ “Managing Hierarchical Data in MySQL,” Mike Hillyer, 2005
14. Nested Sets Example Data
comment_id nsleft nsright author comment
1 1 14 Fran What’s the cause of this bug?
2 2 5 Ollie I think it’s a null pointer.
3 3 4 Fran No, I checked for that.
4 6 13 Kukla We need to check valid input.
5 7 8 Ollie Yes, that’s a bug.
6 9 12 Fran Yes, please add a check
7 10 11 Kukla That fixed it.
15. Nested Sets Example Queries
Query ancestors of comment #7:
SELECT ancestor.* FROM Comments child
JOIN Comments ancestor
ON child.nsleft BETWEEN ancestor.nsleft AND ancestor.nsright
WHERE child.comment_id = 7;
Query subtree under comment #4:
SELECT descendant.* FROM Comments parent
JOIN Comments descendant
ON descendant.nsleft BETWEEN parent.nsleft AND parent.nsright
WHERE parent.comment_id = 4;
16. Nested Sets Pros and Cons
Pros:
§Single non-recursive query to get a tree or a subtree
Cons:
§Complex updates to add or remove a node
§Numbers are not foreign keys—no referential integrity
17. Closure Table
Many-to-many table
Stores every path from each node to each of its descendants
A node even connects to itself
CREATE TABLE Closure (
ancestor INT NOT NULL,
descendant INT NOT NULL,
length INT NOT NULL,
PRIMARY KEY (ancestor, descendant),
FOREIGN KEY(ancestor) REFERENCES Comments(comment_id),
FOREIGN KEY(descendant) REFERENCES Comments(comment_id)
);
19. Closure Table Example Data
comment_id author comment
1 Fran What’s the cause of this bug?
2 Ollie I think it’s a null pointer.
3 Fran No, I checked for that.
4 Kukla We need to check valid input.
5 Ollie Yes, that’s a bug.
6 Fran Yes, please add a check
7 Kukla That fixed it.
ancestor descendant length
1 1 0
1 2 1
1 3 2
1 4 1
1 5 2
1 6 2
1 7 3
2 2 0
2 3 1
3 3 0
4 4 0
4 5 1
4 6 1
4 7 2
5 5 0
6 6 0
6 7 1
7 7 0
20. Closure Table Example Queries
Query ancestors of comment #7:
SELECT c.* FROM Comments c
JOIN Closure t
ON (c.comment_id = t.ancestor)
WHERE t.descendant = 7;
Query subtree under comment #4:
SELECT c.* FROM Comments c
JOIN Closure t
ON (c.comment_id = t.descendant)
WHERE t.ancestor = 4;
21. Closure Table Pros and Cons
Pros:
§Single non-recursive query to get a tree or a subtree
§Referential integrity!
Cons:
§Extra table is required
§Hierarchy is stored redundantly, too easy to mess up
§Lots of joins to do most kinds of queries
23. WITHer Recursive Queries in MySQL?
SQL vendors gradually implemented SQL-99 WITH syntax:
§ IBM DB2 UDB 8 (Dec. 2002)
§ Microsoft SQL Server 2005 (Oct. 2005)
§ Sybase SQL Anywhere 11 (Aug. 2008)
§ Firebird 2.1 (Sep. 2008)
§ PostgreSQL 8.4 (Jul. 2009)
§ Oracle 11g release 2 (Sep. 2009)
§ Teradata (date and version of support unknown, at least 2009)
§ HSQLDB 2.3 (Jul. 2013)
§ SQLite 3.8.3.1 (Feb. 2014)
§ H2 (date and version unknown)
https://www.percona.com/blog/2014/02/11/wither-recursive-queries/
24. ANSI SQL Recursive Common Table Expression
WITH RECURSIVE cte_name (col_name, col_name, col_name) AS
(
subquery base case
UNION ALL
subquery referencing cte_name
)
SELECT ... FROM cte_name ...
https://dev.mysql.com/doc/refman/8.0/en/with.html
25. Generating a Series of Numbers
WITH RECURSIVE MySeries (n) AS
(
SELECT 1 AS n
UNION ALL
SELECT 1+n FROM MySeries WHERE n < 10
)
SELECT * FROM MySeries;
+------+
| n |
+------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
| 10 |
+------+
26. Generating a Series of Dates
WITH RECURSIVE MyDates (d) AS
(
SELECT CURRENT_DATE() AS d
UNION ALL
SELECT d + INTERVAL 1 DAY FROM MyDates
WHERE d < CURRENT_DATE() + INTERVAL 7 DAY
)
SELECT * FROM MyDates;
+------------+
| d |
+------------+
| 2017-04-24 |
| 2017-04-25 |
| 2017-04-26 |
| 2017-04-27 |
| 2017-04-28 |
| 2017-04-29 |
| 2017-04-30 |
| 2017-05-01 |
+------------+
27. Query ancestors of comment #7
WITH RECURSIVE CommentTree (comment_id, parent_id, author, comment,
depth) AS
(
SELECT comment_id, parent_id, author, comment, 0 AS depth
FROM Comments
WHERE comment_id = 7
UNION ALL
SELECT c.comment_id, c.parent_id, c.author, c.comment, ct.depth+1
FROM CommentTree ct
JOIN Comments c ON (ct.parent_id = c.comment_id)
)
SELECT * FROM CommentTree;
28.
29. Query subtree under comment #4
WITH RECURSIVE CommentTree (comment_id, parent_id, author, comment,
depth) AS
(
SELECT comment_id, parent_id, author, comment, 0 AS depth
FROM Comments
WHERE comment_id = 4
UNION ALL
SELECT c.comment_id, c.parent_id, c.author, c.comment, ct.depth+1
FROM CommentTree ct
JOIN Comments c ON (ct.comment_id = c.parent_id)
)
SELECT * FROM CommentTree;
30. Recursive CTE Pros and Cons
Pros:
§ ANSI SQL-99 Standard
§ Compatible with other SQL implementations
§ Works with Adjacency List (single source of authority)
§ Referential integrity!
Cons:
§ Not compatible with earlier MySQL versions
§ Use of materialized temporary tables may cause performance problems
31. MySQL CTE Implementation: 💯
Thanks to @MarkusWinand for his preview analysis based on 8.0.1-dmr
http://modern-sql.com/feature/with
33. ITIS: Sample Hierarchical Data
Integrated Taxonomic Information System
(https://www.itis.gov/)
§Biological database of species of animals, plants, fungi
§One big tree of 544,954 nodes
§Data comes in adjacency list & path enumeration format
§I converted to closure table for query tests
34. ITIS Data Model
mysql> select * from longnames
where completename = 'Eschscholzia californica';
+--------+---------------------------+
| tsn | completename |
+--------+---------------------------+
| 18956 | Eschscholzia californica |
+--------+---------------------------+
mysql> select * from hierarchy where TSN = '18956'G
TSN: 18956
Parent_TSN: 18954
level: 11
ChildrenCount: 8
hierarchy_string: 202422-954898-846494-954900-846496-846504-18063-846547-18409-18880-18954-18956
36. Breadcrumbs Query
WITH RECURSIVE taxonomy AS
(
SELECT base.tsn, base.parent_tsn, 0 as depth
FROM hierarchy base
WHERE tsn = '18956'
UNION ALL
SELECT next.tsn, next.parent_tsn, t.depth+1
FROM hierarchy next JOIN taxonomy t
WHERE t.parent_tsn = next.tsn
)
SELECT * FROM taxonomy JOIN longnames USING (tsn)
ORDER BY depth DESC;
42. Tree Expansion Query
WITH RECURSIVE ancestors (tsn, parent_tsn) AS (
SELECT h.tsn, h.parent_tsn FROM hierarchy AS h WHERE h.tsn = %s
UNION ALL
SELECT h.tsn, h.parent_tsn FROM hierarchy AS h JOIN ancestors AS base ON h.tsn = base.parent_tsn
),
breadcrumbs (tsn, parent_tsn, depth, breadcrumbs) AS (
SELECT h.tsn, h.parent_tsn, 0 AS depth, CAST(LPAD(h.tsn, 8, '0') AS CHAR(255)) AS breadcrumbs
FROM hierarchy AS h WHERE h.parent_tsn = 0
UNION ALL
SELECT h.tsn, h.parent_tsn, base.depth+1 AS depth, CONCAT(base.breadcrumbs, ',', LPAD(h.tsn, 8,
'0'))
FROM hierarchy AS h
JOIN ancestors AS a ON h.tsn = a.tsn
JOIN breadcrumbs AS base ON h.parent_tsn = base.tsn
)
SELECT l.tsn, l.completename, b.depth, b.breadcrumbs
FROM breadcrumbs AS b JOIN longnames AS l ON b.tsn = l.tsn
UNION
SELECT l.tsn, l.completename, b.depth+1, CONCAT(b.breadcrumbs, ',', LPAD(h.tsn, 8, '0'))
FROM breadcrumbs AS b
JOIN hierarchy AS h ON b.tsn = h.parent_tsn
JOIN longnames AS l ON l.tsn = h.tsn
ORDER BY breadcrumbs
43. Tree Expansion Query EXPLAIN
--------------+------------+--------+-------------+---------+-------------------+--------+----------+--------------------------------
select_type | table | type | key | key_len | ref | rows | filtered | Extra
--------------+------------+--------+-------------+---------+-------------------+--------+----------+--------------------------------
PRIMARY | <derived2> | ALL | NULL | NULL | NULL | 250230 | 100.00 | Using where
PRIMARY | l | eq_ref | PRIMARY | 4 | b.tsn | 1 | 100.00 | NULL
DERIVED | h | index | TSN | 9 | NULL | 500466 | 10.00 | Using where; Using index
UNION | base | ALL | NULL | NULL | NULL | 50046 | 100.00 | Recursive; Using where
UNION | <derived4> | ALL | NULL | NULL | NULL | 4 | 100.00 | Using where; Using join buffer
UNION | h | ref | TSN | 9 | a.tsn,base.tsn | 1 | 100.00 | Using index
DERIVED | h | ref | TSN | 4 | const | 1 | 100.00 | Using index
UNION | base | ALL | NULL | NULL | NULL | 2 | 100.00 | Recursive; Using where
UNION | h | ref | TSN | 4 | base.parent_tsn | 1 | 100.00 | Using index
UNION | h | index | TSN | 9 | NULL | 500466 | 100.00 | Using where; Using index
UNION | l | eq_ref | PRIMARY | 4 | itis.h.TSN | 1 | 100.00 | NULL
UNION | <derived2> | ref | <auto_key0> | 5 | itis.h.Parent_TSN | 10 | 100.00 | NULL
| UNION RESULT | <union1,8> | ALL | NULL | NULL | NULL | NULL | NULL | Using temporary; Using filesort
--------------+------------+--------+-------------+---------+-------------------+--------+----------+--------------------------------
Maybe I need more indexes?
Unfortunately I ran out of time to analyze.
47. Conclusions
§Overall, MySQL 8 support for recursive CTE queries is
worth the wait.
§Exotic cases exist that are beyond any optimizer.
§I'm excited to upgrade to MySQL 8.0.x ASAP!
§Now that virtually all major SQL brands support
recursive CTE's, we need developer tools and popular
apps to use them!
48. License and Copyright
Copyright 2017 Bill Karwin
http://www.slideshare.net/billkarwin
Released under a Creative Commons 3.0 License:
http://creativecommons.org/licenses/by-nc-nd/3.0/
You are free to share—to copy, distribute,
and transmit this work, under the following conditions:
Attribution.
You must attribute this
work to Bill Karwin.
Noncommercial.
You may not use this
work for commercial
purposes.
No Derivative Works.
You may not alter,
transform, or build
upon this work.