quartz/Apache Hive.md at 7b7a97b7cf19075c84db31690cf504e4dd0069a8

GitHub/quartz

Fork 2

mirror of https://github.com/jackyzha0/quartz.git synced 2025-12-23 21:04:07 -06:00

DeGefen 7b7a97b7cf Add my Obsidian notes

2025-07-23 20:36:04 +03:00

1.5 KiB

Raw Blame History

aliases

Hive

Hadoop Eccosystem

Apache Hive

Key Features

Developed by Apache.
General SQL-like syntax for querying HDFS or other large databases
Translates SQL queries into one or more MapReduce jobs.
Maps data in HDFS into virtual RDBMS-like tables.
Pro:
- Convenient for data analytics uses SQL.

Con:
- Quite slow in response time

Hive Data Model

Structure

Physical: Data stored in HDFS blocks across nodes.
Virtual Table: Defined with schema using metadata.
Partitions: Logical splits of data to speed up queries.

Metadata

Hive stores metadatain DB
Map physical files to tables.
Map fields (columns) to line structures in raw data.

Hive Architecture !

Hive Usage

#Start a hive shell:
$hive

#create hive table:
hive> CREATE TABLE mta (id BIGINT, name STRING, startdate TIMESTAMP, email STRING)

#Show all tables:
hive> SHOW TABLES;

#Add a new column to the table:
hive> ALTER TABLE mta ADD COLUMNS (description STRING);

#Load HDFS data file into the table:
hive> LOAD DATA INPATH '/home/hadoop/mta_users' OVERWRITE INTO TABLE mta;

#Query employees that work more than a year:
hive> SELECT name FROM mta WHERE (unix_timestamp() - startdate > 365 * 24 * 60 * 60);

#Execute command without shell
$hive -e 'SELECT name FROM mta;'

#Execute script from file
$hive -f hive_script.txt

1.5 KiB Raw Blame History

Apache Hive

Key Features

Hive Data Model

Hive Usage

1.5 KiB

Raw Blame History