Google Dremel
2025-07-23 20:36:04 +03:00


Hadoop Ecosystem

Key Ideas

  • Leverages a columnar file format.
  • Optimized for SQL query performance.

Concepts

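  • Tree-based query execution: a root server fans the query out through intermediate servers to leaf servers that scan the data, and partial results are aggregated on the way back up.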
  • Efficient scanning and aggregation of nested columnar data.
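The tree-based execution idea can be sketched as a small, sequential Python simulation. It rests on several assumptions: the `Node` and `Partial` classes, the `avg` helper and the tablet values are all hypothetical, and real Dremel servers run in parallel across many machines. Each leaf scans only its local tablet of one column. Every inner node merges its children's partial results, so the root combines only a few small summaries.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Partial:
    """Mergeable partial state for AVG: a count and a running sum."""
    count: int = 0
    total: float = 0.0

    def merge(self, other: "Partial") -> "Partial":
        return Partial(self.count + other.count, self.total + other.total)

@dataclass
class Node:
    """One server in a hypothetical serving tree.

    A leaf owns a tablet (its slice of one column's values);
    the root and intermediate servers only have children.
    """
    children: List["Node"] = field(default_factory=list)
    tablet: Optional[List[float]] = None

    def execute(self, predicate: Callable[[float], bool]) -> Partial:
        if self.tablet is not None:
            # Leaf server: scan only the locally stored column values.
            matched = [v for v in self.tablet if predicate(v)]
            return Partial(len(matched), float(sum(matched)))
        # Root/intermediate server: fan out to children, merge their partials.
        result = Partial()
        for child in self.children:
            result = result.merge(child.execute(predicate))
        return result

def avg(root: Node, predicate: Callable[[float], bool] = lambda v: True) -> Optional[float]:
    """Roughly SELECT AVG(x) FROM t WHERE predicate(x), finalized at the root."""
    p = root.execute(predicate)
    return p.total / p.count if p.count else None

# Root -> two intermediate servers -> four leaves holding tablets of column x.
leaves = [Node(tablet=[1, 2, 3]), Node(tablet=[4, 5]), Node(tablet=[6]), Node(tablet=[7, 8, 9, 10])]
root = Node(children=[Node(children=leaves[:2]), Node(children=leaves[2:])])

print(avg(root))                   # 5.5
print(avg(root, lambda v: v > 5))  # 8.0
```

The sketch deliberately uses AVG. Averaging the averages of tablets with different sizes gives the wrong answer, so each level passes up a (count, sum) pair and only the root performs the division.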

Columnar data format

An illustration of what columnar storage is about, given a table with three columns:

!Screenshot 2025-07-23 at 18.42.46.png

In row-oriented storage, the data is laid out one row at a time:

!Screenshot 2025-07-23 at 18.45.25.png

In column-oriented storage, it is laid out one column at a time:

!Screenshot 2025-07-23 at 18.46.55.png
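Here is a small sketch of the two layouts. It assumes a hypothetical three-column table (`id`, `name`, `price`), because the screenshots' data is not reproduced here. Python lists stand in for on-disk storage; real columnar formats store each column as a contiguous, usually compressed, block.

```python
from typing import Dict, List, Tuple

# Hypothetical three-column table: (id, name, price).
COLUMNS = ("id", "name", "price")
ROWS: List[Tuple[int, str, float]] = [
    (1, "apple", 0.5),
    (2, "banana", 0.25),
    (3, "cherry", 3.0),
]

def row_layout(rows) -> list:
    """Row-oriented: every field of row 1, then every field of row 2, ..."""
    return [value for row in rows for value in row]

def column_layout(rows) -> Dict[str, list]:
    """Column-oriented: every value of column 1, then every value of column 2, ..."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}

def sum_price_rows(flat: list) -> float:
    # Strides through the whole row layout, stepping over id and name.
    i = COLUMNS.index("price")
    return sum(flat[i::len(COLUMNS)])

def sum_price_columns(cols: Dict[str, list]) -> float:
    # Reads a single contiguous column; id and name are never touched.
    return sum(cols["price"])

flat = row_layout(ROWS)
cols = column_layout(ROWS)
print(flat)  # [1, 'apple', 0.5, 2, 'banana', 0.25, 3, 'cherry', 3.0]
print(cols)  # {'id': [1, 2, 3], 'name': ['apple', 'banana', 'cherry'], 'price': [0.5, 0.25, 3.0]}
print(sum_price_rows(flat), sum_price_columns(cols))  # 3.75 3.75
```

Both functions return the same sum, but they read different amounts of data. The row layout forces the scan to stride across every record. The column layout reads one list and never touches `id` or `name`. This is why column stores scan less data for analytical SQL queries that reference only a few columns.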

Nested data in columnar format

!Screenshot 2025-07-23 at 18.50.10.png
!Screenshot 2025-07-23 at 18.50.16.png
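Dremel stores nested records column by column by attaching two small integers to every value. The repetition level (r) tells at which repeated field in the column's path the value repeated. The definition level (d) counts how many optional or repeated fields on that path are actually present. The sketch below covers only this striping step for one column, not record reassembly. The `stripe` function, the `(field, kind)` path encoding and the dict/list record representation are assumptions made for illustration. The sample records are modeled on the `Document` example from the Dremel paper.

```python
from typing import Any, List, Optional, Tuple

# A schema path from the root to one leaf column, e.g. Name.Language.Code.
# Each step is (field_name, kind), kind in {"required", "optional", "repeated"}.
Path = List[Tuple[str, str]]
Entry = Tuple[Optional[Any], int, int]  # (value, repetition level, definition level)

def stripe(record: dict, path: Path) -> List[Entry]:
    """Emit (value, r, d) triples for one leaf column of one record.

    r: at which repeated field of the path the value repeated (0 = new record).
    d: how many optional/repeated fields of the path are actually present.
    """
    out: List[Entry] = []

    def walk(node: Any, depth: int, r: int, d: int, rep_depth: int) -> None:
        if depth == len(path):
            out.append((node, r, d))  # reached a leaf value
            return
        name, kind = path[depth]
        child = node.get(name) if isinstance(node, dict) else None
        if kind == "repeated":
            items = child or []
            if not items:
                out.append((None, r, d))  # NULL: path stops before this field
                return
            level = rep_depth + 1
            for i, item in enumerate(items):
                # First occurrence keeps the incoming r; later ones repeat at this level.
                walk(item, depth + 1, r if i == 0 else level, d + 1, level)
        elif kind == "optional":
            if child is None:
                out.append((None, r, d))
                return
            walk(child, depth + 1, r, d + 1, rep_depth)
        else:  # required: always present, so it does not raise d
            walk(child, depth + 1, r, d, rep_depth)

    walk(record, 0, 0, 0, 0)
    return out

r1 = {
    "DocId": 10,
    "Links": {"Forward": [20, 40, 60]},
    "Name": [
        {"Language": [{"Code": "en-us", "Country": "us"}, {"Code": "en"}], "Url": "http://A"},
        {"Url": "http://B"},
        {"Language": [{"Code": "en-gb", "Country": "gb"}]},
    ],
}
r2 = {"DocId": 20, "Links": {"Backward": [10, 30], "Forward": [80]}, "Name": [{"Url": "http://C"}]}

CODE: Path = [("Name", "repeated"), ("Language", "repeated"), ("Code", "required")]
print(stripe(r1, CODE))  # [('en-us', 0, 2), ('en', 2, 2), (None, 1, 1), ('en-gb', 1, 2)]
print(stripe(r2, CODE))  # [(None, 0, 1)]
```

The NULL entries are stored explicitly. From r and d alone, a reader can then tell where each record and each repeated group starts and how much of the path was defined. As a result, a query over `Name.Language.Code` can be answered from that column without reading any other column.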

Frameworks inspired by Google Dremel

  • Apache Drill (MapR)
  • Apache Impala (Cloudera)
  • Apache Tez (Hortonworks)
  • Presto (Facebook)