mirror of https://github.com/jackyzha0/quartz.git
synced 2025-12-23 12:54:06 -06:00
Add my Obsidian notes
parent 7b7a97b7cf
commit b8edcf9d55
@@ -32,6 +32,7 @@ aliases:
![[Screenshot 2025-07-23 at 18.27.30.png|]]

##### Hive Usage
{% raw %}
```
#Start a hive shell:
$hive
@@ -57,3 +58,4 @@ $hive -e 'SELECT name FROM mta;'
#Execute script from file
$hive -f hive_script.txt
```
{% endraw %}
@@ -36,6 +36,7 @@ aliases:
- Each [[RDD]] keeps track of how it was derived. If a node fails, Spark **recomputes only the lost partition** from the original transformations.

##### Writing Spark Code in Python
{% raw %}
```
# Spark Context Initialization
from pyspark import SparkConf, SparkContext
@@ -52,6 +53,8 @@ distData = sc.parallelize(data)
distFile = sc.textFile("data.txt")
distFile = sc.textFile("folder/*.txt")
```
{% endraw %}

##### **RDD Transformations (Lazy)**
These create a new RDD from an existing one.
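
The "lazy" part can be sketched without Spark at all. The block below is a toy model in plain Python, not the PySpark API: `ToyRDD`, its `lineage` tuple, and the `square` helper are hypothetical names made up for illustration. It assumes only what the notes state: a transformation returns a new RDD that remembers how it was derived, and nothing is computed until an action such as `collect` runs.

```python
# Toy model of lazy RDD transformations (NOT the PySpark API).
class ToyRDD:
    def __init__(self, source, lineage=()):
        self._source = source          # zero-argument callable that yields the records
        self.lineage = tuple(lineage)  # names of the transformations applied so far

    @classmethod
    def parallelize(cls, data):
        items = list(data)
        return cls(lambda: iter(items), ("parallelize",))

    def map(self, fn):
        # Transformation: returns a new ToyRDD; fn is not called yet.
        return ToyRDD(lambda: (fn(x) for x in self._source()),
                      self.lineage + ("map",))

    def filter(self, pred):
        # Transformation: returns a new ToyRDD; pred is not called yet.
        return ToyRDD(lambda: (x for x in self._source() if pred(x)),
                      self.lineage + ("filter",))

    def collect(self):
        # Action: only now is the chain of transformations evaluated.
        return list(self._source())


calls = []

def square(x):
    calls.append(x)  # record when the function actually runs
    return x * x

base = ToyRDD.parallelize([1, 2, 3, 4])
squares = base.map(square)                    # new RDD, nothing computed
evens = squares.filter(lambda x: x % 2 == 0)  # new RDD, nothing computed

calls_before_action = list(calls)  # still empty at this point
result = evens.collect()           # the action triggers evaluation
```

Because each `ToyRDD` keeps its `lineage` and a recipe rather than the data, calling `collect` again simply replays the recipe. That is the same idea, in miniature, as recomputing a lost partition from the original transformations.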
@@ -21,6 +21,7 @@ Stores huge files (Typical file size GB-TB) across multiple machines.
- Parquet Files - Yet another RC file

##### HDFS Command Line
{% raw %}
```
# List files
hadoop fs -ls /path
@@ -34,7 +35,7 @@ hadoop fs -cat /file
# Upload file
hadoop fs -copyFromLocal file.txt hdfs://...
```

{% endraw %}
#### HDFS Architecture – Main Components
##### **1.** NameNode (Master Node)
- **Stores metadata** about the filesystem: