diff --git a/content/BigData/Hadoop/Apache Hive.md b/content/BigData/Hadoop/Apache Hive.md
index 7d6b5f0ff..59052de03 100644
--- a/content/BigData/Hadoop/Apache Hive.md
+++ b/content/BigData/Hadoop/Apache Hive.md
@@ -32,6 +32,7 @@ aliases:
 
 ![[Screenshot 2025-07-23 at 18.27.30.png|]]
 ##### Hive Usage
+{% raw %}
 ```
 #Start a hive shell:
 $hive
@@ -57,3 +58,4 @@ $hive -e 'SELECT name FROM mta;'
 #Execute script from file
 $hive -f hive_script.txt
 ```
+{% endraw %}
\ No newline at end of file
diff --git a/content/BigData/Hadoop/Apache Spark.md b/content/BigData/Hadoop/Apache Spark.md
index f3188d9c0..4cd1e978b 100644
--- a/content/BigData/Hadoop/Apache Spark.md
+++ b/content/BigData/Hadoop/Apache Spark.md
@@ -36,6 +36,7 @@ aliases:
 - Each [[RDD]] keeps track of how it was derived. If a node fails, Spark **recomputes only the lost partition** from the original transformations.
 
 ##### Writing Spark Code in Python
+{% raw %}
 ```
 # Spark Context Initialization
 from pyspark import SparkConf, SparkContext
@@ -52,6 +53,8 @@ distData = sc.parallelize(data)
 distFile = sc.textFile("data.txt")
 distFile = sc.textFile("folder/*.txt")
 ```
+{% endraw %}
+
 ##### **RDD Transformations (Lazy)**
 These create a new RDD from an existing one.
 
diff --git a/content/BigData/Hadoop/HDFS.md b/content/BigData/Hadoop/HDFS.md
index 276a5029a..99d6184b2 100644
--- a/content/BigData/Hadoop/HDFS.md
+++ b/content/BigData/Hadoop/HDFS.md
@@ -21,6 +21,7 @@ Stores huge files (Typical file size GB-TB) across multiple machines.
 - Parquet Files
 - Yet another RC file
 ##### HDFS Command Line
+{% raw %}
 ```
 # List files
 hadoop fs -ls /path
@@ -34,7 +35,7 @@ hadoop fs -cat /file
 # Upload file
 hadoop fs -copyFromLocal file.txt hdfs://...
 ```
-
+{% endraw %}
 #### HDFS Architecture – Main Components
 ##### **1.** NameNode (Master Node)
 - **Stores metadata** about the filesystem: