## RDD (Resilient Distributed Dataset)
> An RDD is an immutable (read-only) distributed collection of objects.
>
> The dataset in an RDD is divided into logical partitions, which may be computed on different nodes of the cluster.

![[Screenshot 2025-07-23 at 19.08.40.png|600]]
##### **Key Properties:**
- Distributed: Automatically split across cluster nodes.
- Lazy Evaluation: Transformations aren’t executed until an action is called.
- Fault-tolerant: Can **recompute lost partitions** using the lineage graph (the recorded chain of transformations that produced them).
- Parallel: Operates concurrently across cluster cores.
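
The properties above can be illustrated with a small pure-Python sketch. This is a toy, single-process model and not Spark's API: the class `ToyRDD` and its `computed` counter are hypothetical names made up for illustration, and only `parallelize`, `map`, `filter`, and `collect` loosely mirror the names of real RDD operations. It shows data being split into partitions, transformations recording lineage without running, and a single partition being rebuilt from that lineage.

```python
class ToyRDD:
    """Toy stand-in for an RDD (hypothetical; not Spark's API)."""

    def __init__(self, source_partitions, parent=None, fn=None):
        self._source = source_partitions  # only set on the root dataset
        self._parent = parent             # lineage: dataset this one derives from
        self._fn = fn                     # lineage: function applied per partition
        self.computed = 0                 # partitions actually computed so far

    @classmethod
    def parallelize(cls, data, num_partitions):
        # Split the data into logical partitions (round-robin for simplicity).
        return cls([data[i::num_partitions] for i in range(num_partitions)])

    def num_partitions(self):
        if self._parent is None:
            return len(self._source)
        return self._parent.num_partitions()

    def map(self, f):
        # Transformation: only records lineage, computes nothing.
        return ToyRDD(None, parent=self, fn=lambda part: [f(x) for x in part])

    def filter(self, pred):
        # Transformation: only records lineage, computes nothing.
        return ToyRDD(None, parent=self, fn=lambda part: [x for x in part if pred(x)])

    def compute_partition(self, index):
        # Rebuild one partition by replaying its lineage from the source.
        # A lost partition would be recovered the same way.
        self.computed += 1
        if self._parent is None:
            return list(self._source[index])
        return self._fn(self._parent.compute_partition(index))

    def collect(self):
        # Action: triggers computation of every partition.
        out = []
        for i in range(self.num_partitions()):
            out.extend(self.compute_partition(i))
        return out


numbers = ToyRDD.parallelize(list(range(10)), num_partitions=3)
evens_squared = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
# Nothing has run yet: evens_squared.computed is still 0 at this point.

result = evens_squared.collect()                 # action: work happens here
recovered = evens_squared.compute_partition(1)   # "lost" partition rebuilt from lineage
```

In real Spark the partitions live on different executors and are processed in parallel. The sketch runs them sequentially, so it illustrates laziness and lineage only, not distribution.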
##### Data Sharing 
> In [[Hadoop]] [[MapReduce]], jobs share data by writing it to stable storage (such as HDFS) and reading it back:
![[Screenshot 2025-07-23 at 19.11.44.png|500]]

> In [[Apache Spark|Spark]], data is shared by keeping it in memory between operations:
![[Screenshot 2025-07-23 at 19.12.57.png|500]]
> Sharing data in memory is 10-100x faster than going through the network and disk!
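
The difference between the two sharing styles can be sketched in plain Python. This is only an analogy under stated assumptions: the function names `stage_one`, `share_via_disk`, and `share_via_memory` are hypothetical, a local temp file stands in for distributed storage, and the code does not measure or reproduce the 10-100x figure. It shows that both routes give the same answer, while the disk route adds a serialize, write, read, and deserialize round trip between steps.

```python
import os
import pickle
import tempfile


def stage_one(data):
    # First step: an intermediate result that a later step wants to reuse.
    return [x * 2 for x in data]


def share_via_disk(data, path):
    # MapReduce-style: serialize the intermediate result to storage,
    # then the next step reads it back and deserializes it.
    with open(path, "wb") as f:
        pickle.dump(stage_one(data), f)
    with open(path, "rb") as f:
        intermediate = pickle.load(f)
    return sum(intermediate)


def share_via_memory(data):
    # Spark-style: the intermediate result stays in memory for the next step.
    intermediate = stage_one(data)
    return sum(intermediate)


data = list(range(1000))
with tempfile.TemporaryDirectory() as tmp:
    disk_total = share_via_disk(data, os.path.join(tmp, "intermediate.pkl"))
memory_total = share_via_memory(data)
```

In an iterative or interactive workload the disk round trip repeats on every step, which is where in-memory sharing pays off.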