Boston Hadoop Meetup: Presto for the Enterprise

1
Boston Hadoop User Group Meetup, July 7, 2015
Kamil Bajda-Pawlikowski
Matt Fuller

2
Who are we? - Teradata Center for Hadoop!
•  History of Teradata Center for Hadoop
–  Formerly Hadapt, founded in July 2010 by Borgman, Bajda-Pawlikowski, and Abadi
–  Pioneered the SQL-on-Hadoop market
–  Based on work done by the database research group in the Yale Computer Science Department
–  Hybrid of Hadoop scalability and DBMS performance
•  Today
–  Acquired by Teradata in July 2014 and renamed Teradata Center for Hadoop
–  30 developers with deep Hadoop and database expertise
–  Headquarters in Boston, MA
–  Contributors to the open source project Presto

3
Talk Agenda
•  What is Presto?
•  What is Teradata doing?
•  Can I see a demo?
•  How can I contribute?

4
What is Presto?
•  100% open source distributed ANSI SQL engine for Big Data
–  Modern code base
–  Proven scalability
–  Optimized for low-latency, interactive querying
•  Cross-platform query capability, not only SQL on Hadoop
•  Distributed under the Apache license, now supported by Teradata
•  Used by a community of well-known, well-respected technology companies

5
History of Presto
•  Fall 2008: Facebook open sources Hive
•  Fall 2012: 4 developers start Presto development
•  Spring 2013: Presto rolled out within Facebook
•  Fall 2013: Facebook open sources Presto
•  Fall 2014: 88 releases, 41 contributors, 3943 commits
•  Spring 2015: 98 releases, 65 contributors, 4587 commits; Teradata joins the Presto community and offers support
Timeline image courtesy of Facebook

6
Presto Architecture
[Figure: Presto architecture diagram. Components shown: Client; Coordinator (Parser/analyzer, Planner, Scheduler); three Workers. Pluggable interfaces shown: Metadata API, Data location API, Data stream API.]
https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920
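The coordinator/worker split in the diagram can be illustrated with a toy sketch. This is not Presto code: the names `Split`, `plan_splits`, `worker_execute`, and `coordinator_run` are hypothetical, and a local thread pool stands in for remote worker nodes. It only shows the general pattern of a coordinator dividing a scan into splits, scheduling them on workers, and merging the partial results.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Split:
    """Hypothetical unit of work: a contiguous slice of the input rows."""
    start: int
    end: int


def plan_splits(row_count: int, split_size: int) -> List[Split]:
    """Coordinator side: divide the table into fixed-size splits."""
    return [Split(s, min(s + split_size, row_count))
            for s in range(0, row_count, split_size)]


def worker_execute(table: List[int], split: Split) -> int:
    """Worker side: compute a partial SUM over one split."""
    return sum(table[split.start:split.end])


def coordinator_run(table: List[int], split_size: int = 4, workers: int = 3) -> int:
    """Schedule splits on a pool of workers and merge the partial sums."""
    splits = plan_splits(len(table), split_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each split runs independently; the thread pool is a stand-in
        # for worker nodes in a real cluster.
        partials = pool.map(lambda s: worker_execute(table, s), splits)
        return sum(partials)
```

Calling `coordinator_run(list(range(10)))` plans three splits and sums the partial results. A real engine would also stream intermediate data between workers rather than returning everything to the coordinator.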

7
Presto Extensibility – connectors
[Figure: Coordinator (Parser/analyzer, Planner, Scheduler) and Worker, with each pluggable interface backed by connector implementations:]
•  Metadata API: Hive, Cassandra, Kafka, MySQL, …
•  Data location API: Hive, Cassandra, Kafka, MySQL, …
•  Data stream API: Hive, Cassandra, Kafka, MySQL, …
https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920
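To make the connector idea concrete, here is a minimal sketch of an engine that reads from a data source only through three narrow interfaces, mirroring the three APIs named on the slide. The class and method names (`Connector`, `list_columns`, `get_splits`, `read_split`, `InMemoryConnector`, `scan`) are assumptions made for illustration and are not Presto's actual connector interfaces.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Tuple

Row = Tuple


class Connector(ABC):
    """Hypothetical interface mirroring the three pluggable APIs on the slide."""

    @abstractmethod
    def list_columns(self, table: str) -> List[str]:
        """Metadata API: describe a table."""

    @abstractmethod
    def get_splits(self, table: str) -> List[range]:
        """Data location API: report where the data lives, as row ranges."""

    @abstractmethod
    def read_split(self, table: str, split: range) -> Iterator[Row]:
        """Data stream API: stream the rows of one split."""


class InMemoryConnector(Connector):
    """Toy connector backed by Python lists (a stand-in for a real source)."""

    def __init__(self, tables: Dict[str, Tuple[List[str], List[Row]]],
                 split_size: int = 2):
        self._tables = tables
        self._split_size = split_size

    def list_columns(self, table: str) -> List[str]:
        return self._tables[table][0]

    def get_splits(self, table: str) -> List[range]:
        n = len(self._tables[table][1])
        return [range(i, min(i + self._split_size, n))
                for i in range(0, n, self._split_size)]

    def read_split(self, table: str, split: range) -> Iterator[Row]:
        rows = self._tables[table][1]
        for i in split:
            yield rows[i]


def scan(connector: Connector, table: str) -> List[Row]:
    """Engine side: scan any source using only the Connector interface."""
    out: List[Row] = []
    for split in connector.get_splits(table):
        out.extend(connector.read_split(table, split))
    return out
```

Because `scan` depends only on the abstract interface, supporting another source in this sketch means writing another `Connector` subclass, with no change to the engine side.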

8
Presto = Performance
•  Data stays in memory during execution and is pipelined across nodes MPP-style
•  Vectorized columnar processing
•  Presto is written in highly tuned Java
–  Efficient in-memory data structures
–  Very careful coding of inner loops
–  Bytecode generation
•  Optimized ORC reader
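The first two bullets can be pictured with a small sketch: operators pass batches of column values down a pipeline instead of materializing full intermediate tables. This only illustrates the batch-at-a-time, column-oriented shape; plain Python lists are not truly vectorized, and the names `scan_batches`, `filter_batches`, and `sum_column` are hypothetical rather than anything from Presto.

```python
from typing import Dict, Iterator, List

# One batch of rows, stored column by column: column name -> values.
Batch = Dict[str, List[float]]


def scan_batches(columns: Batch, batch_size: int) -> Iterator[Batch]:
    """Source operator: emit the table as a stream of columnar batches."""
    n = len(next(iter(columns.values()), []))
    for start in range(0, n, batch_size):
        yield {name: vals[start:start + batch_size]
               for name, vals in columns.items()}


def filter_batches(batches: Iterator[Batch], column: str,
                   threshold: float) -> Iterator[Batch]:
    """Filter operator: keep row positions where column > threshold."""
    for batch in batches:
        keep = [i for i, v in enumerate(batch[column]) if v > threshold]
        yield {name: [vals[i] for i in keep] for name, vals in batch.items()}


def sum_column(batches: Iterator[Batch], column: str) -> float:
    """Sink operator: aggregate one column across all incoming batches."""
    total = 0.0
    for batch in batches:
        total += sum(batch[column])
    return total


# Usage: roughly SELECT sum(qty) FROM t WHERE price > 10, batch by batch.
table = {"price": [5.0, 20.0, 15.0, 1.0, 30.0],
         "qty": [1.0, 2.0, 3.0, 4.0, 5.0]}
result = sum_column(filter_batches(scan_batches(table, 2), "price", 10.0), "qty")
```

Since each operator is a generator, a batch flows through the filter and into the aggregate before the next batch is read, so no intermediate table is held in full.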

9
Presto in Production
•  Facebook
–  Multiple production clusters (100s of nodes total), including a 300 PB Hadoop data warehouse
–  1000s of internal daily active users
–  Millions of queries each month
–  Multiple PBs scanned every day
–  Trillions of rows a day
•  Netflix
–  Over 200-node production cluster on EC2
–  Over 15 PB in S3 (Parquet format)
–  Over 300 users and 2.5K queries daily

10
What is Teradata Doing?
•  100% open source contributions to Presto to increase adoption in the enterprise
•  A multi-year roadmap commitment to phased enhancements of the open source code
•  The first ever commercial support offering for Presto
Teradata Certified Presto
www.teradata.com/presto

11
Why is Teradata Contributing to Presto?
•  Hadoop Distro Agnostic
•  Modern Code Base
–  Presto is well-designed open source software with proper database architecture
•  Strong Like-Minded Community
•  Push down processing across multiple data platforms
•  Leverage Teradata expertise to make SQL for Hadoop viable

13
Teradata Contributions to Presto
Phase 1 (June 8, 2015): Implement
Phase 2 (Q4 2015): Integrate
Phase 3 (2016): Proliferate
Roadmap items:
•  Installer
•  Documentation
•  Monitoring & Support Tools
•  Management Tool Integration
•  YARN Integration
•  ODBC / JDBC Drivers
•  BI Certification
•  Security
•  Connectors
•  Commercial Support
•  Expanding ANSI SQL Coverage

14
Teradata’s Contributions
•  Ease of install and management via the Presto-Admin tool
–  www.github.com/prestodb/presto-admin
–  Packaging Presto as an RPM
•  Testing framework for Presto
–  www.github.com/prestodb/tempto
–  Added a large number of tests
•  Improvements to the JDBC driver
–  To be open sourced on www.github.com/prestodb soon!
•  Various SQL improvements

15
Teradata’s Contribution Product Roadmap
•  YARN Integration
•  Ambari Integration
•  ODBC & JDBC Drivers that actually work
•  Security: Authentication & Authorization
•  Continued SQL Improvements
•  BI tool certifications, e.g. Tableau
•  More Connectors, e.g. HBase
•  Open source our Docker-based development environment
•  Open our Continuous Integration platform to the community

16
How can I contribute?
www.github.com/facebook/presto
www.github.com/prestodb
Certified Distro: www.teradata.com/presto
Website: www.prestodb.io
Presto User’s Group: www.groups.google.com/group/presto-users
Facebook Page: www.facebook.com/prestodb
Twitter: #prestodb

17
Presto 101t certified by Teradata
Available for Download
–  Presto 101t Server, CLI, JDBC
–  Presto-Admin 0.1
–  Documentation
–  HDP w/ Presto VM Sandbox
–  CDH w/ Presto VM Sandbox
www.teradata.com/presto
