## RDD (Resilient Distributed Dataset)
> An RDD is an immutable (read-only), distributed collection of objects.
>
> The dataset in an RDD is divided into logical partitions, which may be computed on different nodes of the cluster.

![[Screenshot 2025-07-23 at 19.08.40.png|600]]
##### **Key Properties:**
- Distributed: Data is automatically partitioned across the nodes of the cluster.
- Lazy Evaluation: Transformations aren’t executed until an action is called.
- Fault-tolerant: Can **recompute lost partitions** using the lineage graph.
- Parallel: Partitions are processed concurrently across the cluster's cores.
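The properties above can be sketched with a toy, single-process model. This is not the Spark API: `ToyRDD`, `parallelize`, `compute_partition`, and `compute_calls` are hypothetical names chosen for illustration, and partitions are processed sequentially here, whereas a real cluster would process them in parallel on different nodes.

```python
class ToyRDD:
    """Toy stand-in for an RDD (hypothetical; not the Spark API)."""

    def __init__(self, source=None, parent=None, fn=None):
        self._source = source    # list of partitions (root dataset only)
        self._parent = parent    # lineage: the dataset this one derives from
        self._fn = fn            # per-partition function applied to the parent
        self.compute_calls = 0   # number of partition computations requested

    @classmethod
    def parallelize(cls, data, num_partitions):
        # Distributed: split the data into contiguous logical partitions.
        size = -(-len(data) // num_partitions)  # ceiling division
        parts = [data[i * size:(i + 1) * size] for i in range(num_partitions)]
        return cls(source=parts)

    @property
    def num_partitions(self):
        if self._parent is None:
            return len(self._source)
        return self._parent.num_partitions

    def map(self, f):
        # Lazy: record the transformation in the lineage, run nothing yet.
        return ToyRDD(parent=self, fn=lambda part: [f(x) for x in part])

    def filter(self, pred):
        # Lazy, like map: returns a new immutable dataset description.
        return ToyRDD(parent=self, fn=lambda part: [x for x in part if pred(x)])

    def compute_partition(self, i):
        # Fault tolerance: one partition can be rebuilt from lineage alone.
        self.compute_calls += 1
        if self._parent is None:
            return list(self._source[i])
        return self._fn(self._parent.compute_partition(i))

    def collect(self):
        # Action: triggers the actual computation, partition by partition.
        out = []
        for i in range(self.num_partitions):
            out.extend(self.compute_partition(i))
        return out


rdd = ToyRDD.parallelize(list(range(1, 9)), num_partitions=4)
pipeline = rdd.filter(lambda x: x % 2 == 0).map(lambda x: x * x)

calls_before_action = pipeline.compute_calls  # still 0: nothing has run
result = pipeline.collect()                   # [4, 16, 36, 64]
lost = pipeline.compute_partition(2)          # rebuild only partition 2: [36]
```

Calling `filter` and `map` only extends the lineage; work happens at `collect`. Recomputing partition 2 replays the lineage for that partition alone, without touching the other three.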
##### Data Sharing
> In [[Hadoop]] [[MapReduce]]

![[Screenshot 2025-07-23 at 19.11.44.png|500]]

> In [[Apache Spark|Spark]]

![[Screenshot 2025-07-23 at 19.12.57.png|500]]
>In-memory data sharing is 10-100x faster than sharing through the network and disk!
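The two sharing patterns can be contrasted in a minimal sketch. It assumes a local temporary JSON file as a stand-in for the stable storage that MapReduce jobs write to between steps; `stage_one`, `stage_two`, `share_via_disk`, and `share_in_memory` are hypothetical names, and the sketch shows the shape of the data flow only, not the speedup.

```python
import json
import os
import tempfile


def stage_one(numbers):
    # First job: produce an intermediate dataset.
    return [n * n for n in numbers]


def stage_two(intermediate):
    # Second job: consume the intermediate dataset.
    return sum(intermediate)


def share_via_disk(numbers):
    # MapReduce-style sharing: serialize to storage, then read it back.
    fd, path = tempfile.mkstemp(suffix=".json")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(stage_one(numbers), f)
        with open(path) as f:
            intermediate = json.load(f)
    finally:
        os.remove(path)
    return stage_two(intermediate)


def share_in_memory(numbers):
    # Spark-style sharing: pass the in-memory result straight to the next job.
    return stage_two(stage_one(numbers))


numbers = list(range(1, 101))
disk_total = share_via_disk(numbers)
memory_total = share_in_memory(numbers)
```

Both paths produce the same result; the disk path simply pays for serialization and I/O on every hand-off between jobs, which is the cost the in-memory path avoids.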