Presto is a distributed SQL query engine used at Facebook for analytics workloads such as warehouse ETL, dashboards, and A/B testing. Over the past year it has gained improvements in workload management, security, execution performance, and language features. Upcoming work includes improving scalability, adding more connectors, and adopting advanced query planning techniques such as dynamic filtering. Future projects aim to improve failure recovery and temporary tables, and to introduce user-defined functions.
3. Presto @ Facebook
• Multiple uses
• Warehouse ETL and ad-hoc analytics
• Dashboards
• Analytics backend for A/B testing system
• Analytics backend for user-facing products
• 1000s of nodes across several clusters
• 100s of PBs and quadrillions of rows processed/day
• > 70% of new warehouse ETL workloads on Presto
5. Since last year's meetup...
• 30 releases (0.176 to 0.205)
• ~3,500 commits (13,800 total)
• ~140 contributors (330 total)
6. Workload management
• Fairness in local scheduler
• Resource group improvements
• Query runtime limits
7. Security
• Authentication
• Client certificates
• Password
• JSON Web Tokens (RFC 7519)
• Worker-to-worker encryption
8. Security
• Column-level access control
public interface ConnectorAccessControl
{
    ...
    default void checkCanSelectFromColumns(
            ConnectorTransactionHandle transaction,
            Identity identity,
            SchemaTableName table,
            Set<String> columns)
}
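As a rough illustration of what a column-level check of this shape might do, the sketch below returns normally when every requested column is readable and throws otherwise. It does not use the Presto SPI: `ColumnAccessSketch`, the `user|table` policy map, and the plain `String` arguments are hypothetical stand-ins chosen so the example is self-contained.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a connector's access-control hook. These are
// not the Presto SPI types, only a sketch of a column-level check.
final class ColumnAccessSketch
{
    // Hypothetical policy: "user|table" -> columns that user may read.
    private final Map<String, Set<String>> readableColumns;

    ColumnAccessSketch(Map<String, Set<String>> readableColumns)
    {
        this.readableColumns = readableColumns;
    }

    // Mirrors the shape of checkCanSelectFromColumns: returns normally when
    // access is allowed, throws when any requested column is not readable.
    void checkCanSelectFromColumns(String user, String table, Set<String> columns)
    {
        Set<String> allowed = readableColumns.getOrDefault(user + "|" + table, Set.of());
        Set<String> denied = new TreeSet<>(columns);
        denied.removeAll(allowed);
        if (!denied.isEmpty()) {
            throw new SecurityException(
                    "Access Denied: " + user + " cannot select columns " + denied + " from " + table);
        }
    }

    public static void main(String[] args)
    {
        ColumnAccessSketch acl = new ColumnAccessSketch(
                Map.of("alice|sales.orders", Set.of("orderkey", "orderdate")));

        // Allowed: every requested column is in the policy.
        acl.checkCanSelectFromColumns("alice", "sales.orders", Set.of("orderkey"));

        // Denied: totalprice is not readable by alice.
        try {
            acl.checkCanSelectFromColumns("alice", "sales.orders", Set.of("orderkey", "totalprice"));
        }
        catch (SecurityException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Passing the full set of referenced columns in one call lets the check report every denied column at once instead of failing on the first.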
9. Execution
• Partitioned (a.k.a. "bucket by bucket") query execution
• Distributed sort
• Dynamic writer scaling
• Support for JOIN spilling
• Improved memory accounting
• Java 9 support
• ... many, many performance and reliability improvements
10. Language
• Geospatial functions and optimizations (SQL/MM Part 3)
• Language constructs
• GROUPING
• CURRENT_USER
• New data types
• IPADDRESS
• SET_DIGEST
12. Language
• Ordered aggregates
SELECT id, array_agg(value ORDER BY index)
FROM t,
    UNNEST(t.elements) WITH ORDINALITY
        AS u(value, index)
GROUP BY id
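To make the semantics of an ordered aggregate concrete, here is a minimal Java sketch of what the query computes: rows are grouped by id, each group is sorted by its ordinal position, and the values are collected in that order. The `Row` class and `arrayAggOrdered` method are hypothetical names for illustration and say nothing about how Presto implements the feature.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class OrderedAggSketch
{
    // One unnested row: the group key, the element's ordinal position, and
    // the element itself. A hypothetical shape used only for illustration.
    static final class Row
    {
        final int id;
        final long index;
        final String value;

        Row(int id, long index, String value)
        {
            this.id = id;
            this.index = index;
            this.value = value;
        }
    }

    // Emulates: SELECT id, array_agg(value ORDER BY index) ... GROUP BY id
    static Map<Integer, List<String>> arrayAggOrdered(List<Row> rows)
    {
        // GROUP BY id
        Map<Integer, List<Row>> groups = new TreeMap<>();
        for (Row row : rows) {
            groups.computeIfAbsent(row.id, key -> new ArrayList<>()).add(row);
        }

        Map<Integer, List<String>> result = new TreeMap<>();
        for (Map.Entry<Integer, List<Row>> entry : groups.entrySet()) {
            List<Row> group = entry.getValue();
            // ORDER BY index, applied inside each group before aggregating
            group.sort(Comparator.comparingLong((Row r) -> r.index));
            List<String> values = new ArrayList<>();
            for (Row r : group) {
                values.add(r.value);
            }
            result.put(entry.getKey(), values);
        }
        return result;
    }

    public static void main(String[] args)
    {
        List<Row> rows = List.of(
                new Row(1, 2, "b"),
                new Row(1, 1, "a"),
                new Row(2, 1, "x"));
        System.out.println(arrayAggOrdered(rows));
    }
}
```

Without the ORDER BY inside the aggregate, the order of elements in each output array would depend on the order in which rows happen to arrive.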
13. Language
• LATERAL join
SELECT orderkey, total
FROM orders o, LATERAL (
    SELECT sum(totalprice) total
    FROM orders
    WHERE custkey = o.custkey
        AND orderdate <= o.orderdate
        AND orderpriority > o.orderpriority
) t
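Conceptually, a LATERAL subquery is re-evaluated for each row of the table to its left, with that row's columns in scope. The Java sketch below emulates the query as a nested loop to show the semantics only; a real engine would plan it differently. The `Order` and `Result` classes are hypothetical, and the sketch assumes `orderpriority` is a string compared lexicographically, as in TPC-H.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

final class LateralJoinSketch
{
    // Hypothetical row type holding only the columns the query touches.
    static final class Order
    {
        final long orderkey;
        final long custkey;
        final LocalDate orderdate;
        final String orderpriority;
        final double totalprice;

        Order(long orderkey, long custkey, LocalDate orderdate, String orderpriority, double totalprice)
        {
            this.orderkey = orderkey;
            this.custkey = custkey;
            this.orderdate = orderdate;
            this.orderpriority = orderpriority;
            this.totalprice = totalprice;
        }
    }

    static final class Result
    {
        final long orderkey;
        final Double total; // null models SQL NULL: sum() over zero rows

        Result(long orderkey, Double total)
        {
            this.orderkey = orderkey;
            this.total = total;
        }
    }

    static List<Result> lateralTotals(List<Order> orders)
    {
        List<Result> out = new ArrayList<>();
        for (Order o : orders) {                 // outer: FROM orders o
            Double total = null;
            for (Order inner : orders) {         // subquery sees o's columns
                if (inner.custkey == o.custkey
                        && !inner.orderdate.isAfter(o.orderdate)
                        && inner.orderpriority.compareTo(o.orderpriority) > 0) {
                    total = (total == null ? 0.0 : total) + inner.totalprice;
                }
            }
            // An aggregate without GROUP BY yields exactly one row, so every
            // outer row appears once in the output.
            out.add(new Result(o.orderkey, total));
        }
        return out;
    }

    public static void main(String[] args)
    {
        List<Order> orders = List.of(
                new Order(1, 10, LocalDate.of(2018, 1, 1), "3-MEDIUM", 100.0),
                new Order(2, 10, LocalDate.of(2018, 1, 2), "1-URGENT", 50.0));
        for (Result r : lateralTotals(orders)) {
            System.out.println(r.orderkey + " -> " + r.total);
        }
    }
}
```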