Presto is a distributed SQL query engine that runs on a cluster of machines. It enables interactive, ad hoc analytics on large volumes of data in data lakes. Presto queries data where it lives, including HDFS, AWS S3, relational databases, NoSQL databases, and some proprietary data stores.

Open Source Presto 

Presto is built for high-performance interactive querying with in-memory execution.

Key characteristics include:

  • High scalability, from a single worker to thousands of workers
  • Flexibility to support a wide range of SQL use cases
  • Highly pluggable architecture that makes it easy to extend Presto with custom integrations for security, event listeners, etc.
  • Federation of data sources, particularly data lakes, via Presto connectors
  • Seamless integration with existing SQL systems through support for the ANSI SQL standard
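
To make the federation point concrete, here is a minimal sketch of how a single query can reference tables from two different sources. Presto addresses tables as catalog.schema.table, where each catalog is backed by a connector. The catalog names (hive, mysql), the schemas, the tables, and the columns below are all hypothetical and would depend on how a given cluster is configured.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified table name in the form catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


# Hypothetical catalogs: "hive" for a data lake, "mysql" for a relational database.
clicks = qualified("hive", "web", "clicks")
customers = qualified("mysql", "crm", "customers")

# One SQL statement joins data that lives in two different systems.
federated_query = f"""
SELECT u.region, count(*) AS clicks
FROM {clicks} AS c
JOIN {customers} AS u ON c.customer_id = u.id
GROUP BY u.region
""".strip()

print(federated_query)
```

The query text itself is ordinary SQL; only the catalog prefix on each table name indicates which connector serves the data.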

A full deployment of Presto has a coordinator and multiple workers. Queries are submitted to the coordinator by a client such as the command line interface (CLI), a BI tool, or a notebook that supports SQL. The coordinator parses and analyzes each query, then creates an optimized query execution plan using metadata and data distribution information. That plan is then distributed to the workers for processing. Because Presto queries data where it lives rather than managing its own storage, compute is decoupled from storage. The advantage of this decoupled storage model is that Presto can provide a single view of all the data that has been aggregated into a data storage tier such as the Hadoop Distributed File System (HDFS).
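
The coordinator and worker roles described above can be illustrated with a toy model. This is not Presto code and does not use any Presto API: the function names, the round-robin splitting, and the thread pool standing in for worker machines are all assumptions made for illustration. Real Presto plans are multi-stage and considerably more elaborate. The sketch only shows the general shape: the coordinator divides the work, workers process their share in parallel, and the partial results are merged.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def make_splits(rows, n_workers):
    """Coordinator step (toy): divide the input into one split per worker."""
    return [rows[i::n_workers] for i in range(n_workers)]


def worker_partial_count(split):
    """Worker step (toy): compute a partial GROUP BY count over one split."""
    return Counter(region for region, _ in split)


def coordinator_run(rows, n_workers=3):
    """Distribute splits to workers, then merge their partial counts."""
    splits = make_splits(rows, n_workers)
    # Threads stand in for separate worker machines in this sketch.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(worker_partial_count, splits))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


# Hypothetical rows of (region, click_id), as if read from a data lake table.
rows = [("eu", 1), ("us", 2), ("eu", 3), ("apac", 4), ("us", 5), ("eu", 6)]
result = coordinator_run(rows)
print(result)
```

The result is the same as counting over the whole input at once; the point of the split is that each worker only touches its own portion of the data.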

