1. The document summarizes a presentation given by Kamil Bajda-Pawlikowski and Matt Fuller at the Boston Hadoop User Group Meetup on July 7, 2015 about Presto and Teradata's involvement with it.
2. Presto is an open source distributed SQL query engine that allows fast interactive querying of large datasets. It was originally developed at Facebook and is now supported by Teradata.
3. Teradata acquired Hadapt in July 2014, renamed it Teradata Center for Hadoop, and has been contributing to the open source Presto project, with plans to further its support and expand Presto's capabilities and adoption over multiple phases.
Boston Hadoop Meetup: Presto for the Enterprise
1. 1
Boston Hadoop User Group Meetup, July 7, 2015
Kamil Bajda-Pawlikowski
Matt Fuller
2. 2
• History of Teradata Center for Hadoop
– Formerly Hadapt, founded in July 2010 by Borgman, Bajda-Pawlikowski, and Abadi
– Pioneered SQL-on-Hadoop market
– Based on work done by the database research group in the Yale Computer Science Department
– Hybrid of Hadoop scalability and DBMS performance
• Today
– Acquired by Teradata in July, 2014, renamed Teradata Center for Hadoop
– 30 developers with deep Hadoop and database expertise
– Headquarters in Boston, MA
– Contributors to open source project Presto
Who are we? - Teradata Center for Hadoop!
3. 3
• What is Presto?
• What is Teradata doing?
• Can I see a Demo?
• How can I contribute?
Talk Agenda
4. 4
• 100% open source distributed ANSI SQL engine for Big Data
– Modern code base
– Proven scalability
– Optimized for low-latency, interactive querying
• Cross platform query capability, not only SQL on Hadoop
• Distributed under the Apache license, now supported by Teradata
• Used by a community of well-known, well-respected technology companies
What is Presto?
5. 5
History of Presto
• FALL 2008
– Facebook open sources Hive
• FALL 2012
– 4 developers start Presto development
• SPRING 2013
– Presto rolled out within Facebook
• FALL 2013
– Facebook open sources Presto
• FALL 2014
– 88 Releases
– 41 Contributors
– 3943 Commits
• SPRING 2015
– 98 Releases
– 65 Contributors
– 4587 Commits
– Teradata joins Presto community & offers support
Timeline image courtesy of Facebook
6. 6
Presto Architecture
• Client
• Coordinator
– Parser/analyzer
– Planner
– Scheduler
• Workers (three shown)
• Pluggable APIs
– Metadata API
– Data location API
– Data stream API
https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920
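As a rough illustration of the flow in the diagram, the sketch below mimics a coordinator that cuts a query's input into splits and a scheduler that hands them to workers. Every class and method name here is hypothetical, and the round-robin assignment is an assumption made for brevity; this is not Presto's code or API.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class Worker:
    """Hypothetical worker that processes the splits assigned to it."""
    name: str
    splits: list = field(default_factory=list)

    def run(self):
        # Each split is just a list of numbers in this toy model.
        return sum(sum(split) for split in self.splits)


class Coordinator:
    """Hypothetical coordinator: plan a query into splits, then schedule them."""

    def __init__(self, workers):
        self.workers = workers

    def plan(self, rows, split_size):
        # "Planning" is reduced to cutting the input into fixed-size splits.
        return [rows[i:i + split_size] for i in range(0, len(rows), split_size)]

    def schedule(self, splits):
        # Round-robin assignment (an assumption, chosen for simplicity).
        for worker, split in zip(cycle(self.workers), splits):
            worker.splits.append(split)

    def execute_sum(self, rows, split_size=3):
        self.schedule(self.plan(rows, split_size))
        # Combine the partial results returned by every worker.
        return sum(worker.run() for worker in self.workers)


workers = [Worker("worker-1"), Worker("worker-2"), Worker("worker-3")]
coordinator = Coordinator(workers)
total = coordinator.execute_sum(list(range(10)))
print(total)
```

The client only ever talks to the coordinator; the workers see splits, not the whole query input.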
7. 7
Presto Extensibility – connectors
• Coordinator
– Parser/analyzer
– Planner
– Scheduler
• Worker
• Pluggable APIs, each implemented per connector
– Metadata API: Hive, Cassandra, Kafka, MySQL, …
– Data location API: Hive, Cassandra, Kafka, MySQL, …
– Data stream API: Hive, Cassandra, Kafka, MySQL, …
https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920
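One way to picture the connector idea is as a single interface with three responsibilities, one per API on the slide. The sketch below is a toy model under that assumption: the names `Connector`, `InMemoryConnector`, `list_columns`, `get_splits`, and `read_split` are invented for illustration and do not correspond to Presto's actual plugin interfaces.

```python
from abc import ABC, abstractmethod


class Connector(ABC):
    """Hypothetical connector exposing the three pluggable APIs."""

    @abstractmethod
    def list_columns(self, table):
        """Metadata API: which columns a table has."""

    @abstractmethod
    def get_splits(self, table):
        """Data location API: how the table is divided into splits."""

    @abstractmethod
    def read_split(self, table, split):
        """Data stream API: yield the rows of one split."""


class InMemoryConnector(Connector):
    """Toy connector backed by Python lists, standing in for a real source."""

    def __init__(self, tables, split_size=2):
        self.tables = tables
        self.split_size = split_size

    def list_columns(self, table):
        return list(self.tables[table][0].keys())

    def get_splits(self, table):
        n = len(self.tables[table])
        return [(start, min(start + self.split_size, n))
                for start in range(0, n, self.split_size)]

    def read_split(self, table, split):
        start, end = split
        yield from self.tables[table][start:end]


def scan(connector, table):
    """Engine-side code that depends only on the Connector interface."""
    for split in connector.get_splits(table):
        yield from connector.read_split(table, split)


catalog = {"toy": InMemoryConnector({"users": [
    {"id": 1, "name": "ann"},
    {"id": 2, "name": "bob"},
    {"id": 3, "name": "cy"},
]})}
columns = catalog["toy"].list_columns("users")
rows = list(scan(catalog["toy"], "users"))
print(columns, len(rows))
```

Because `scan` never looks inside the connector, adding another data source in this model means writing one more `Connector` subclass and registering it in `catalog`.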
8. 8
• Data stays in memory during execution and is pipelined across nodes MPP-style
• Vectorized columnar processing
• Presto is written in highly tuned Java
– Efficient in-memory data structures
– Very careful coding of inner loops
– Bytecode generation
• Optimized ORC reader
Presto = Performance
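The first two bullets can be pictured with a small sketch: data moves through the operators in columnar batches, and each operator hands a batch to the next one as soon as it is ready instead of materializing the whole input. The function names and the dict-of-lists "page" layout are assumptions made for illustration; plain Python also will not show the speedups a tuned engine gets from this layout.

```python
def scan_pages(prices, quantities, page_size):
    """Yield columnar pages: each page maps a column name to a list of values."""
    for start in range(0, len(prices), page_size):
        end = start + page_size
        yield {"price": prices[start:end], "quantity": quantities[start:end]}


def filter_pages(pages, min_quantity):
    """Filter page by page, deciding which positions to keep from one column."""
    for page in pages:
        keep = [i for i, q in enumerate(page["quantity"]) if q >= min_quantity]
        yield {name: [column[i] for i in keep] for name, column in page.items()}


def revenue(pages):
    """Aggregate column-wise; pages flow through without being stored."""
    total = 0
    for page in pages:
        total += sum(p * q for p, q in zip(page["price"], page["quantity"]))
    return total


prices = [10, 20, 30, 40, 50]
quantities = [1, 0, 2, 5, 0]

# Generators chain the operators, so only one page is in flight per stage.
pipeline = filter_pages(scan_pages(prices, quantities, page_size=2), min_quantity=1)
total_revenue = revenue(pipeline)
print(total_revenue)
```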
9. 9
• Facebook
– Multiple production clusters (100s of nodes total)
- Including 300PB Hadoop data warehouse
– 1000s of internal daily active users
– Millions of queries each month
– Multiple PBs scanned every day
– Trillions of rows a day
• Netflix
– Over 200-node production cluster on EC2
– Over 15 PB in S3 (Parquet format)
– Over 300 users and 2.5K queries daily
Presto in Production
10. 10
• 100% open source contributions to Presto to increase adoption in the enterprise
• A multi-year roadmap commitment to phased enhancements of the open source code
• The first ever commercial support offering for Presto
What is Teradata Doing?
Teradata Certified Presto
www.teradata.com/presto
11. 11
• Hadoop Distro Agnostic
• Modern Code Base
– Presto is well-designed open source software with proper database architecture
• Strong Like-Minded Community
• Push down processing across multiple data platforms
• Leverage Teradata expertise to make SQL for Hadoop viable
Why is Teradata Contributing to Presto?
14. 14
• Ease of install and management via Presto-Admin tool
– www.github.com/prestodb/presto-admin
– Packaging Presto as an RPM
• Testing Framework for Presto
– www.github.com/prestodb/tempto
– Added large number of tests
• Improvements to JDBC driver
– To be open sourced on www.github.com/prestodb soon!
• Various SQL improvements
Teradata’s Contributions
15. 15
• YARN Integration
• Ambari Integration
• ODBC & JDBC Drivers that actually work
• Security – Authentication & Authorization
• Continued SQL Improvements
• BI tool certifications – e.g. Tableau
• More Connectors – e.g. HBase
• Open Source our Docker based Dev Env
• Open our Continuous Integration platform to the community
Teradata’s Contribution Product Roadmap