quartz/content/BigData/Hadoop/Apache Hive.md

---
aliases:
  - Hive
---
> [[Hadoop Eccosystem|Systems based on MapReduce]]

### Apache Hive
##### **Key Features**
- Developed by **Apache**.
- General SQL-like syntax for querying [[HDFS]] or other large databases
- Translates SQL queries into one or more [[MapReduce]] jobs.
- Maps data in [[HDFS]] into virtual [[RDBMS]]-like tables.
- **Pro**:
	- Convenient for **data analytics** uses SQL.
* **Con**:
	* Quite slow in response time

##### **Hive Data Model**
**Structure**
- **Physical**: Data stored in [[HDFS]] blocks across nodes.
- **Virtual Table**: Defined with schema using metadata.
- **Partitions**: Logical splits of data to speed up queries.

**Metadata**
- Hive stores metadatain DB
- Map physical files to tables.
- Map fields (columns) to line structures in raw data.

![[Screenshot 2025-07-23 at 18.25.32.png]]

**Hive Architecture**
![[Screenshot 2025-07-23 at 18.27.30.png|]]

##### Hive Usage
```
#Start a hive shell:
$hive

#create hive table:
hive> CREATE TABLE mta (id BIGINT, name STRING, startdate TIMESTAMP, email STRING)

#Show all tables:
hive> SHOW TABLES;

#Add a new column to the table:
hive> ALTER TABLE mta ADD COLUMNS (description STRING);

#Load HDFS data file into the table:
hive> LOAD DATA INPATH '/home/hadoop/mta_users' OVERWRITE INTO TABLE mta;

#Query employees that work more than a year:
hive> SELECT name FROM mta WHERE (unix_timestamp() - startdate > 365 * 24 * 60 * 60);

#Execute command without shell
$hive -e 'SELECT name FROM mta;'

#Execute script from file
$hive -f hive_script.txt
```