
##### HDFS ([[Hadoop]] Distributed File System)
Stores huge files (typically GBs to TBs each) across multiple machines.
- Breaks files into **blocks** (typically 128 MB).
- **Replicates** blocks (default 3 copies) for fault tolerance.
- Accessed through its own API and shell; the interface is POSIX-like, but HDFS relaxes several POSIX requirements for streaming throughput.
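The block math above can be sketched directly. A minimal illustration, assuming the default 128 MiB block size and replication factor 3 (both are configurable):

```python
import math

BLOCK_SIZE = 128 * 1024 ** 2   # default HDFS block size: 128 MiB
REPLICATION = 3                # default replication factor

def block_count(file_size: int) -> int:
    """Number of HDFS blocks a file of `file_size` bytes occupies."""
    return max(1, math.ceil(file_size / BLOCK_SIZE))

def raw_cluster_bytes(file_size: int) -> int:
    """Bytes consumed across the cluster once every block is replicated."""
    return file_size * REPLICATION

one_gib = 1024 ** 3
print(block_count(one_gib))        # 8 blocks
print(raw_cluster_bytes(one_gib))  # 3 GiB of raw storage
```

Note that a block holding less than 128 MiB only consumes its actual size on disk; the block size is an upper bound per block, not an allocation unit.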
##### HDFS design principles
* **Immutable**: **write-once, read-many** — files can be appended to, but not edited in place
* **Failure-tolerant**: disk and node failures are expected and handled without affecting the file system
* **File Size Unlimited**: up to 512 YiB in theory (2^63 blocks × 64 MiB)
* **File Num Limited**: at most 1,048,576 entries per directory (default)
* **Prefer bigger files**: fewer, larger files perform better than many small ones
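The 512 YiB figure can be checked directly from the two numbers given (2^63 addressable blocks of 64 MiB each):

```python
max_blocks = 2 ** 63           # addressable blocks
block_bytes = 64 * 1024 ** 2   # 64 MiB per block, as in the note above
total = max_blocks * block_bytes
print(total == 512 * 1024 ** 8)  # True -- 512 yobibytes (2^89 bytes)
```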
##### HDFS File Formats
- Text/CSV - no schema; no metadata stored with the data
- JSON Records - metadata (field names) is stored with every record
- Avro Files - row-based binary format; schema is stored independently of the data
- Sequence Files - binary key/value files (used as intermediate storage in M/R)
- RC Files - Record Columnar files
- ORC Files - Optimized RC files; compress better
- Parquet Files - another columnar format, widely supported beyond Hadoop
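The schema/metadata trade-off between the first two formats can be shown with the standard library alone (real Avro/Parquet writers need extra libraries, so they are omitted here):

```python
import csv
import io
import json

rows = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]

# Text/CSV: values only -- the schema lives outside the file.
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerows(r.values() for r in rows)
csv_text = buf.getvalue()
print(csv_text)   # 1,alice / 2,bob -- nothing says what the columns mean

# JSON Records: field names repeat in every record, so metadata travels
# with the data at the cost of extra bytes per row.
json_text = "\n".join(json.dumps(r) for r in rows)
print(json_text)
```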
##### HDFS Command Line
```
# List files
hadoop fs -ls /path
# Make directory
hadoop fs -mkdir /user/hadoop
# Print file
hadoop fs -cat /file
# Upload file
hadoop fs -copyFromLocal file.txt hdfs://...
```
#### HDFS Architecture Main Components
##### 1. NameNode (Master Node)
- **Stores metadata** about the filesystem:
	- Filenames
	- Directory structure
	- Block locations
	- Permissions
- It **does not store the actual data**.
- There is **one active NameNode** per cluster.
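A toy picture of what the NameNode tracks — hypothetical structure and names, purely for illustration (the real NameNode keeps this in memory, with block locations reported by DataNodes):

```python
# NameNode side: file -> blocks + attributes, and block -> DataNode locations.
# No file contents appear anywhere here -- metadata only.
namenode_metadata = {
    "/user/hadoop/file.txt": {
        "blocks": ["blk_001", "blk_002"],
        "permissions": "rw-r--r--",
    },
}
block_locations = {
    "blk_001": ["datanode-1", "datanode-2", "datanode-3"],
    "blk_002": ["datanode-2", "datanode-3", "datanode-4"],
}

def locate(path):
    """Return the DataNodes holding each block of `path` (no data touched)."""
    return {b: block_locations[b] for b in namenode_metadata[path]["blocks"]}

print(locate("/user/hadoop/file.txt"))
```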
##### 2. DataNodes (Worker Nodes)
- Store the **actual data blocks** of files.
- Send **heartbeat** messages to the NameNode to report that they are alive.
- When a file is written, it's split into blocks and distributed across many DataNodes.
- DataNodes also **replicate** blocks (typically 3 copies) to provide **fault tolerance**.
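The heartbeat mechanism can be sketched as follows — the timeout here is illustrative (real HDFS marks a DataNode dead only after roughly 10.5 minutes by default), and the node names are hypothetical:

```python
import time

HEARTBEAT_TIMEOUT = 30.0  # seconds; illustrative, not the real HDFS default

# Last heartbeat the NameNode saw from each DataNode:
# datanode-2's heartbeat is 60 s old, well past the timeout.
now = time.time()
last_heartbeat = {"datanode-1": now, "datanode-2": now - 60}

def live_datanodes(at=now):
    """DataNodes whose most recent heartbeat is within the timeout."""
    return [n for n, t in last_heartbeat.items() if at - t < HEARTBEAT_TIMEOUT]

print(live_datanodes())  # ['datanode-1'] -- datanode-2 is presumed dead
```

Once a node is presumed dead, the NameNode schedules re-replication of its blocks so the replica count recovers.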
#### File Read / Write
**When a file is written:**
1. The client contacts the **NameNode** to ask: “Where should I write the blocks?”
2. The NameNode responds with a list of **DataNodes** to use.
3. The client sends the blocks of the file to those DataNodes.
4. Blocks are **replicated** automatically across different nodes for redundancy.
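The four write steps can be simulated end to end. This is a deliberately tiny sketch — round-robin placement, the client writing every replica itself (real HDFS pipelines replicas DataNode-to-DataNode), and all names hypothetical:

```python
import itertools

class DataNode:
    """Toy DataNode: just a dict of block_id -> bytes."""
    def __init__(self, name):
        self.name, self.storage = name, {}
    def store(self, block_id, data):
        self.storage[block_id] = data

class NameNode:
    """Toy NameNode: metadata only, round-robin block placement."""
    def __init__(self, datanodes):
        self._cycle = itertools.cycle(datanodes)
        self.metadata = {}
    def choose_datanodes(self, replication):
        return [next(self._cycle) for _ in range(replication)]
    def record(self, path, placement):
        self.metadata[path] = placement

def write_file(namenode, path, data, block_size=4, replication=3):
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    placement = {}
    for i, block in enumerate(blocks):
        block_id = f"blk_{i}"
        targets = namenode.choose_datanodes(replication)  # steps 1-2: ask NameNode
        for dn in targets:                                # step 3: ship the block
            dn.store(block_id, block)                     # step 4: replicas land
        placement[block_id] = [dn.name for dn in targets]
    namenode.record(path, placement)                      # metadata update
    return placement

nodes = [DataNode(f"datanode-{i}") for i in range(1, 4)]
nn = NameNode(nodes)
print(write_file(nn, "/user/hadoop/file.txt", b"hello world!"))
```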
**When a file is read:**
1. The client contacts the **NameNode** to get the list of DataNodes storing the required blocks.
2. The client reads the blocks **directly** from the DataNodes.
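The read path, sketched with the same hypothetical names: the only thing the NameNode supplies is the block-to-DataNode map; the bytes flow straight from the DataNodes to the client.

```python
# What each DataNode holds (toy data, hypothetical names).
datanode_storage = {
    "datanode-1": {"blk_0": b"hello "},
    "datanode-2": {"blk_0": b"hello ", "blk_1": b"world"},
}
# What the NameNode answers in step 1.
block_locations = {"blk_0": ["datanode-1", "datanode-2"],
                   "blk_1": ["datanode-2"]}

def read_file(block_ids):
    data = b""
    for blk in block_ids:
        dn = block_locations[blk][0]       # step 1: location from the NameNode
        data += datanode_storage[dn][blk]  # step 2: read directly from a DataNode
    return data

print(read_file(["blk_0", "blk_1"]))  # b'hello world'
```

Because any replica can serve a read, the client typically picks the closest DataNode holding the block; here it simply takes the first one listed.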