diff --git a/content/.gitkeep b/content/.gitkeep deleted file mode 100644 index e69de29bb..000000000 diff --git a/content/AI&DATA/Data Engineering/Apache Airflow.md b/content/AI&DATA/Data Engineering/Apache Airflow.md new file mode 100644 index 000000000..4d8652dda --- /dev/null +++ b/content/AI&DATA/Data Engineering/Apache Airflow.md @@ -0,0 +1,2105 @@ +Source Code of Book [url](https://github.com/BasPH/data-pipelines-with-apache-airflow.git) + +Data pipelines with apache airflow book name + +# Chapter 1 + +Airflow’s key feature is that it enables you to easily build scheduled +data pipelines using a flexible Python framework, while also providing many building blocks that allow you to stitch together the many different technologies encountered in modern technological landscapes. + +Airflow is not a data processing tool in itself but orchestrates the different com- +ponents responsible for processing your data in data pipelines. + +![[Screenshot from 2023-06-08 00-47-21.png]] + + + + +The **Airflow scheduler** —Parses DAGs, checks their schedule interval, and (if the +DAGs’ schedule has passed) starts scheduling the DAGs’ tasks for execution by +passing them to the Airflow workers. + + +The **Airflow workers**—Pick up tasks that are scheduled for execution and execute +them. As such, the workers are responsible for actually “doing the work.” + +The **Airflow webserver** —Visualizes the DAGs parsed by the scheduler and provides +the main interface for users to monitor DAG runs and their results. + +![[Screenshot from 2023-06-08 01-03-45.png]] + +![[Pasted image 20230609230856.png]] + +This property of Airflow’s schedule intervals is invaluable for implementing efficient data pipelines, as it allows you to build incremental data pipelines. In these incremental pipelines, each DAG run processes only data for the corresponding **time slot (the data’s delta)** instead of having to reprocess the entire data set every time. Especially for larger data sets, this can provide significant time and cost benefits by avoiding expensive recomputation of existing results. Schedule intervals become even more powerful when combined with the concept of #backfillingAirflow, which allows you to execute a new DAG for historical schedule intervals that occurred in the past. This feature allows you to easily create (or backfill) new data sets with historical data simply by running your DAG for these past schedule intervals. Moreover, by clearing the results of past runs, you can also use this Airflow feature to easily rerun any historical tasks if you make changes to your task code, allowing you to easily reprocess an entire data set when needed. +# Chapter 2 + +# Anatomy of an Airflow DAG + +![[Pasted image 20230609232930.png]] + +Download_launches >> get_pictures >>notify is our pipeline. 
+*DAGS for Downloading and proceeding rocket launch data* +```python +import json + +import pathlib + + + +import airflow.utils.dates + +import requests + +import requests.exceptions as requests_exceptions + +from airflow import DAG + +from airflow.operators.bash import BashOperator + +from airflow.operators.python import PythonOperator + + + +dag = DAG( + +dag_id="download_rocket_launches", + +description="Download rocket pictures of recently launched rockets.", + +start_date=airflow.utils.dates.days_ago(14), + +schedule_interval="@daily", + +) + + + +download_launches = BashOperator( + +task_id="download_launches", + +bash_command="curl -o /tmp/launches.json -L 'https://ll.thespacedevs.com/2.0.0/launch/upcoming'", # noqa: E501 + +dag=dag, + +) + + + + +def _get_pictures(): + +# Ensure directory exists + +pathlib.Path("/tmp/images").mkdir(parents=True, exist_ok=True) + + + +# Download all pictures in launches.json + +with open("/tmp/launches.json") as f: + +launches = json.load(f) + +image_urls = [launch["image"] for launch in launches["results"]] + +for image_url in image_urls: + +try: + +response = requests.get(image_url) + +image_filename = image_url.split("/")[-1] + +target_file = f"/tmp/images/{image_filename}" + +with open(target_file, "wb") as f: + +f.write(response.content) + +print(f"Downloaded {image_url} to {target_file}") + +except requests_exceptions.MissingSchema: + +print(f"{image_url} appears to be an invalid URL.") + +except requests_exceptions.ConnectionError: + +print(f"Could not connect to {image_url}.") + + + + +get_pictures = PythonOperator( + +task_id="get_pictures", python_callable=_get_pictures, dag=dag + +) + + + +notify = BashOperator( + +task_id="notify", + +bash_command='echo "There are now $(ls /tmp/images/ | wc -l) images."', + +dag=dag, + +) + + + +download_launches >> get_pictures >> notify +``` + +Each operator performs a single unit of work, and multiple operators together form a +workflow or DAG in Airflow. Operators run independently of each other, although +you can define the order of execution, which we call dependencies in Airflow: +download_launches >> get_pictures >> notify + +**Tasks vs. operators** +In this context and throughout the Airflow documentation, we see the terms operator and task used interchangeably. From a user’s perspective, they refer to the same +thing, and the two often substitute each other in discussions. Operators provide the +implementation of a piece of work. Airflow has a class called BaseOperator and many +subclasses inheriting from the BaseOperator, such as PythonOperator, EmailOperator, +and OracleOperator. + +![[Screenshot from 2023-06-08 01-23-04.png]] + + +NOTE It is unnecessary to restart the entire workflow. A nice feature of Airflow is that you can restart from the point of failure and onward, without having to restart any previously succeeded tasks. 
+ +![[Pasted image 20230610004618.png]] + + +# Scheduling in Airflow +```python +from datetime import datetime + +from pathlib import Path + + + +import pandas as pd + +from airflow import DAG + +from airflow.operators.bash import BashOperator + +from airflow.operators.python import PythonOperator + + + +dag = DAG( + +dag_id="01_unscheduled", start_date=datetime(2019, 1, 1), schedule_interval=None + +) + + + +fetch_events = BashOperator( + +task_id="fetch_events", + +bash_command=( + +"mkdir -p /data/events && " + +"curl -o /data/events.json http://events_api:5000/events" + +), + +dag=dag, + +) + + + + +def _calculate_stats(input_path, output_path): + +"""Calculates event statistics.""" + + + +Path(output_path).parent.mkdir(exist_ok=True) + + + +events = pd.read_json(input_path) + +stats = events.groupby(["date", "user"]).size().reset_index() + + + +stats.to_csv(output_path, index=False) + + + + +calculate_stats = PythonOperator( + +task_id="calculate_stats", + +python_callable=_calculate_stats, + +op_kwargs={"input_path": "/data/events.json", "output_path": "/data/stats.csv"}, + +dag=dag, + +) + + + +fetch_events >> calculate_stats + +``` + + +```python +#schedule intervals for every 3 days +""" +This would result in our DAG being run every three days following the start date (on the 4th, 7th, 10th, and so on of January 2019). Of course, you can also use this approach to run your DAG every 10 minutes (using timedelta(minutes=10)) or every two hours (using timedelta(hours=2)) +""" +dag = DAG( dag_id="04_time_delta", schedule_interval=dt.timedelta(days=3), start_date=dt.datetime(year=2019, month=1, day=1), end_date=dt.datetime(year=2019, month=1, day=5), ) +``` + + + +In Airflow, we can use these execution dates by referencing them in our operators. For example, in the BashOperator, we can use Airflow’s templating functionality to include the execution dates dynamically in our Bash command. Templating is covered in detail in chapter 4. + + +```python +import datetime as dt + +from pathlib import Path + + + +import pandas as pd + + + +from airflow import DAG + +from airflow.operators.bash import BashOperator + +from airflow.operators.python import PythonOperator + + + +dag = DAG( + +dag_id="06_templated_query", + +schedule_interval="@daily", + +start_date=dt.datetime(year=2019, month=1, day=1), + +end_date=dt.datetime(year=2019, month=1, day=5), + +) + + + +fetch_events = BashOperator( + +task_id="fetch_events", +# THIS ONE IS IMPORTANT +bash_command=( + +"mkdir -p /data/events && " + +"curl -o /data/events.json " + +"http://events_api:5000/events?" + +"start_date={{execution_date.strftime('%Y-%m-%d')}}&" + +"end_date={{next_execution_date.strftime('%Y-%m-%d')}}" + +), + +dag=dag, + +) + + + + +def _calculate_stats(input_path, output_path): + +"""Calculates event statistics.""" + + + +events = pd.read_json(input_path) + +stats = events.groupby(["date", "user"]).size().reset_index() + + + +Path(output_path).parent.mkdir(exist_ok=True) + +stats.to_csv(output_path, index=False) + + + + +calculate_stats = PythonOperator( + +task_id="calculate_stats", + +python_callable=_calculate_stats, + +op_kwargs={"input_path": "/data/events.json", "output_path": "/data/stats.csv"}, + +dag=dag, + +) + + + +fetch_events >> calculate_stats +``` + +![[Screenshot from 2023-06-08 14-12-22.png]] + +![[Screenshot from 2023-06-08 14-14-44.png]] + + +Without an end date, Airflow will (in principle) keep executing our DAG on this daily schedule until the end of time. 
However, if we already know that our project has a fixed duration, we can tell Airflow to stop running our DAG after a certain date using the end_date parameter. + + + + +AIRFLOW schedule_interval paramater uses linux cron jobs syntax + +![[Screenshot from 2023-06-08 14-37-30.png]] + +```python +dag = DAG( dag_id="04_time_delta", schedule_interval=dt.timedelta(days=3), start_date=dt.datetime(year=2019, month=1, day=1), end_date=dt.datetime(year=2019, month=1, day=5), ) + +``` + +# Templating tasks using the Airflow context + +In Airflow, you have a number of variables available at runtime from the task context. One of these variables is execution_date. Airflow uses the Pendulum (https:// pendulum.eustace.io) library for datetimes, and execution_date is such a Pendulum datetime object. It is a drop-in replacement for native Python datetime, so all methods that can be applied to Python can also be applied to Pendulum. Just like you can do datetime.now().year, you get the same result with pendulum.now().year. + + ### Bash Operator templating +```python +import airflow +from airflow import DAG +from airflow.operators.bash import BashOperator + +dag = DAG( + dag_id="listing_4_01", + start_date=airflow.utils.dates.days_ago(3), + schedule_interval="@hourly", +) + +get_data = BashOperator( + task_id="get_data", + bash_command=( + "curl -o /tmp/wikipageviews.gz " + "https://dumps.wikimedia.org/other/pageviews/" + "{{ execution_date.year }}/" + "{{ execution_date.year }}-{{ '{:02}'.format(execution_date.month) }}/" + "pageviews-{{ execution_date.year }}" + "{{ '{:02}'.format(execution_date.month) }}" + "{{ '{:02}'.format(execution_date.day) }}-" + "{{ '{:02}'.format(execution_date.hour) }}0000.gz" + ), + dag=dag, +) +``` + +![[Screenshot from 2023-06-08 15-53-58.png]] + + +### Python operator templating + +```python +from urllib import request + +import airflow.utils.dates +from airflow import DAG +from airflow.operators.python import PythonOperator + +dag = DAG( + dag_id="listing_4_05", + start_date=airflow.utils.dates.days_ago(1), + schedule_interval="@hourly", +) + + +def _get_data(execution_date): + year, month, day, hour, *_ = execution_date.timetuple() + url = ( + "https://dumps.wikimedia.org/other/pageviews/" + f"{year}/{year}-{month:0>2}/pageviews-{year}{month:0>2}{day:0>2}-{hour:0>2}0000.gz" + ) + output_path = "/tmp/wikipageviews.gz" + request.urlretrieve(url, output_path) + + +get_data = PythonOperator(task_id="get_data", python_callable=_get_data, dag=dag) +``` + +In Apache Airflow, `op_args` and `op_kwargs` are both used to pass arguments to a PythonOperator. However, there is a key difference between the two: `op_args` is a list of positional arguments, while `op_kwargs` is a dictionary of keyword arguments. + +- **op_args** + +`op_args` is a list of positional arguments that will be unpacked when calling the callable. For example, if you have a Python function that takes two arguments, you can pass them to the PythonOperator using the `op_args` argument: + +Code snippet + +```python +from airflow.operators.python import PythonOperator + +def my_function(arg1, arg2): + print(arg1, arg2) + +operator = PythonOperator( + task_id="my_task", + python_callable=my_function, + op_args=["arg1", "arg2"], +) +``` + +When the `operator` is run, the `my_function` function will be called with the arguments `arg1` and `arg2`. + +- **op_kwargs** + +`op_kwargs` is a dictionary of keyword arguments that will get unpacked in your function. 
For example, if you have a Python function that takes two keyword arguments, you can pass them to the PythonOperator using the `op_kwargs` argument:

```python
from airflow.operators.python import PythonOperator

def my_function(arg1, arg2):
    print(arg1, arg2)

operator = PythonOperator(
    task_id="my_task",
    python_callable=my_function,
    op_kwargs={"arg1": "arg1_value", "arg2": "arg2_value"},
)
```

When the `operator` is run, the `my_function` function will be called with the arguments `arg1="arg1_value"` and `arg2="arg2_value"`.

- **Which one should you use?**

In general, you should use `op_kwargs` if you need to pass keyword arguments to your Python function. However, if you only need to pass positional arguments, you can use `op_args`.

Here is a table that summarizes the differences between `op_args` and `op_kwargs`:

| Argument | Description |
| --- | --- |
| `op_args` | A list of positional arguments that will be unpacked when calling the callable. |
| `op_kwargs` | A dictionary of keyword arguments that will get unpacked in your function. |

This code currently prints the found pageview count, and now we want to connect the dots by writing those results to the Postgres table. The PythonOperator currently prints the results but does not write to the database, so we'll need a second task to write the results. In Airflow, there are two ways of passing data between tasks:

- By using the Airflow metastore to write and read results between tasks. This is called XCom and covered in chapter 5.
- By writing results to and from a persistent location (e.g., disk or database) between tasks.

Airflow tasks run independently of each other, possibly on different physical machines depending on your setup, and therefore cannot share objects in memory. Data between tasks must therefore be persisted elsewhere, where it resides after a task finishes and can be read by another task.

Airflow provides one mechanism out of the box called XCom, which allows storing and later reading any picklable object in the Airflow metastore. Pickle is Python's serialization protocol, and serialization means converting an object in memory to a format that can be stored on disk to be read again later, possibly by another process. By default, all objects built from basic Python types (e.g., string, int, dict, list) can be pickled.

By default, Airflow will schedule and run any past schedule intervals that have not been run. As such, specifying a past start date and activating the corresponding DAG will result in all intervals that have passed before the current time being executed. This behavior is controlled by the DAG catchup parameter and can be disabled by setting catchup to false.

# Code for no catchup

```python
import datetime as dt
from pathlib import Path

import pandas as pd

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

dag = DAG(
    dag_id="09_no_catchup",
    schedule_interval="@daily",
    start_date=dt.datetime(year=2019, month=1, day=1),
    end_date=dt.datetime(year=2019, month=1, day=5),
    catchup=False,
)
```

# Best practices for designing tasks

Two key properties of well-designed Airflow tasks: atomicity and idempotency.

## Atomicity

The term atomicity is frequently used in database systems, where an atomic transaction is considered an indivisible and irreducible series of database operations such that either all occur or nothing occurs.
Similarly, in Airflow, tasks should be defined so that they either succeed and produce some proper result or fail in a manner that does not affect the state of the system.

*Figure 3.8 Backfilling in Airflow: by default, Airflow will run tasks for all past intervals up to the current time. This behavior can be disabled by setting the catchup parameter of a DAG to false, in which case Airflow will only start executing tasks from the current interval.*

![[Screenshot from 2023-06-10 21-01-40.png]]

Example: Sending an email after writing to CSV creates two pieces of work in a single function, which breaks the atomicity of the task. To implement this functionality in an atomic fashion, we could simply split the email functionality into a separate task.

## Idempotency

Tasks are said to be idempotent if calling the same task multiple times with the same inputs has no additional effect. This means that rerunning a task without changing the inputs should not change the overall output.

```python
fetch_events = BashOperator(
    task_id="fetch_events",
    bash_command=(
        "mkdir -p /data/events && "
        "curl -o /data/events/{{ds}}.json "
        "http://localhost:5000/events?"
        "start_date={{ds}}&"
        "end_date={{next_ds}}"
    ),
    dag=dag,
)
```

Rerunning this task for a given date would result in the task fetching the same set of events as its previous execution (assuming the date is within our 30-day window), and overwriting the existing JSON file in the /data/events folder, producing the same result. As such, this implementation of the fetch events task is clearly idempotent.

![[Screenshot from 2023-06-10 21-06-43.png]]

Chapter summary:

+ DAGs can run at regular intervals by setting the schedule interval.
+ The work for an interval is started at the end of the interval.
+ The schedule interval can be configured with cron and timedelta expressions.
+ Data can be processed incrementally by dynamically setting variables with templating.
+ The execution date refers to the start datetime of the interval, not to the actual time of execution.
+ A DAG can be run back in time with backfilling.
+ Idempotency ensures tasks can be rerun while producing the same output results + +# Templating tasks using the Airflow context + +![[Screenshot from 2023-06-11 02-50-56.png]] + +```python +import airflow.utils.dates + +from airflow import DAG + +from airflow.operators.python import PythonOperator + + + +dag = DAG( + +dag_id="listing_4_08", + +start_date=airflow.utils.dates.days_ago(3), + +schedule_interval="@daily", + +) + + + + +def _print_context(**context): + +start = context["execution_date"] + +end = context["next_execution_date"] + +print(f"Start: {start}, end: {end}") + + + + +# Prints e.g.: + +# Start: 2019-07-13T14:00:00+00:00, end: 2019-07-13T15:00:00+00:00 + + + + +print_context = PythonOperator( + +task_id="print_context", python_callable=_print_context, dag=dag + +) +``` + + +# Providing User defined Variables to Python Operators + +```python +def _get_data(year, month, day, hour, output_path, **_): + +url = ( + +"https://dumps.wikimedia.org/other/pageviews/" + +f"{year}/{year}-{month:0>2}/pageviews-{year}{month:0>2}{day:0>2}-{hour:0>2}0000.gz" + +) + +request.urlretrieve(url, output_path) + + + + +get_data = PythonOperator( + +task_id="get_data", + +python_callable=_get_data, + +op_kwargs={ + +"year": "{{ execution_date.year }}", + +"month": "{{ execution_date.month }}", + +"day": "{{ execution_date.day }}", + +"hour": "{{ execution_date.hour }}", + +"output_path": "/tmp/wikipageviews.gz", + +}, + +dag=dag, + +) +``` + +In Apache Airflow, `op_args` and `op_kwargs` are parameters used in the task definition to pass arguments to operators. + +Operators in Airflow are the building blocks of workflows, representing individual tasks that need to be executed. Each operator has a set of arguments that define its behavior. However, in some cases, you may want to pass dynamic or variable values to these arguments when defining the tasks. + +That's where `op_args` and `op_kwargs` come in. + +`op_args` is used to pass a list of arguments to an operator. These arguments are positional and must be provided in the correct order expected by the operator. For example: + +```python +op_args=['value1', 'value2', 'value3'] +``` + +On the other hand, `op_kwargs` is used to pass a dictionary of keyword arguments to an operator. This allows you to specify the arguments by their names, regardless of their order. For example: + +```python +op_kwargs={'arg1': 'value1', 'arg2': 'value2', 'arg3': 'value3'} +``` + +Both `op_args` and `op_kwargs` can be used together, allowing you to pass a combination of positional and keyword arguments to an operator. For example: + +```python +op_args=['value1'] +op_kwargs={'arg2': 'value2', 'arg3': 'value3'} +``` + +When defining a task in Airflow, you can use these parameters to pass arguments to the operator. Here's an example of how you can use `op_args` and `op_kwargs` while defining a task: + +```python +my_task = MyOperator(task_id='my_task_id', op_args=['value1'], op_kwargs={'arg2': 'value2'}) +``` + +In this example, `my_task` is an instance of the `MyOperator` class, and it will receive `'value1'` as a positional argument and `'value2'` as a keyword argument with the name `'arg2'`. The operator can then use these values during its execution. + +Using `op_args` and `op_kwargs` provides flexibility in passing dynamic values to operators, allowing you to customize their behavior based on the specific context or requirements of your workflow. + +A useful tool to debug issues with templated arguments is the Airflow UI. 
You can inspect the templated argument values after running a task by selecting it in either the graph or tree view and clicking the Rendered Template button + +![[Screenshot from 2023-06-11 16-50-37.png]] + + +The CLI provides us with exactly the same information as shown in the Airflow UI, without having to run a task, which makes it easier to inspect the result. The command to render templates using the CLI is +```python +airflow tasks render [dag id] [task id] [desired execution date] +``` + + +### Hooking up other systems + +![[Screenshot from 2023-06-11 16-56-37.png]] + +it’s typically advised to apply XComs only for transferring small pieces of data such as a handful of strings (e.g., a list of names). + +### What is XCom how it is works? +In Apache Airflow, Xcom (short for cross-communication) is a mechanism that allows tasks to exchange small amounts of data between them. It serves as a communication channel for sharing information or passing values between different tasks within a workflow. + +The Xcom system in Airflow works as follows: + +1. During the execution of a task, an operator can produce output or results that need to be shared with other tasks. This output could be a value, a small dataset, or any other piece of information. + +2. The task can use the `xcom_push()` method to push the output to the Xcom system. This method takes a `key` and a `value` as parameters. The `key` is used to identify the output data, while the `value` represents the actual data to be shared. + +3. Other tasks in the workflow can retrieve the output of a previous task by using the `xcom_pull()` method. This method takes the `task_ids` and an optional `key` parameter. It returns the value associated with the specified `key` from the specified task. + +4. The Xcom system stores the output data in its backend database, which can be a relational database like MySQL or PostgreSQL, or a key-value store like Redis, depending on the configuration. + +By default, Airflow stores Xcom data in its metadata database, but you can also configure it to use an external database or message broker for scalability and durability. + +Here's an example that demonstrates the usage of Xcom in Airflow: + +```python +from airflow import DAG +from airflow.operators.python import PythonOperator + +def push_data(**context): + data = "Hello, Airflow!" + context['ti'].xcom_push(key='my_key', value=data) + +def pull_data(**context): + data = context['ti'].xcom_pull(key='my_key', task_ids='push_task') + print(data) + +with DAG('xcom_example', start_date=datetime(2023, 6, 11), schedule_interval=None) as dag: + push_task = PythonOperator(task_id='push_task', python_callable=push_data) + pull_task = PythonOperator(task_id='pull_task', python_callable=pull_data) + + push_task >> pull_task +``` + +In this example, the `push_data()` function is a Python callable used as an operator. It pushes the string "Hello, Airflow!" to the Xcom system using the `xcom_push()` method. + +The `pull_data()` function is another Python callable used as an operator. It retrieves the value from the Xcom system using the `xcom_pull()` method and prints it. + +The `push_task` and `pull_task` are instances of the `PythonOperator` class, representing the tasks in the workflow. The output of the `push_task` is shared with the `pull_task` using the Xcom system. + +When the workflow is executed, the `push_task` pushes the data to Xcom, and the `pull_task` pulls the data from Xcom and prints it. 
+ +Xcom provides a simple way to share information between tasks, enabling coordination and data sharing within an Airflow workflow. + + Some arguments of operators can be templated. + Templating happens at runtime. + Templating the PythonOperator works different from other operators; variables are passed to the provided callable. + The result of templated arguments can be checked with airflow tasks render.  Operators can communicate with other systems via hooks. + Operators describe what to do; hooks determine how to do work. + + +# Defining dependencies between tasks + +![[Screenshot from 2023-06-11 17-21-28.png]] + +```python +import airflow + +from airflow import DAG +from airflow.operators.dummy import DummyOperator + +with DAG( + dag_id="01_start", + start_date=airflow.utils.dates.days_ago(3), + schedule_interval="@daily", +) as dag: + start = DummyOperator(task_id="start") + + fetch_sales = DummyOperator(task_id="fetch_sales") + clean_sales = DummyOperator(task_id="clean_sales") + + fetch_weather = DummyOperator(task_id="fetch_weather") + clean_weather = DummyOperator(task_id="clean_weather") + + join_datasets = DummyOperator(task_id="join_datasets") + train_model = DummyOperator(task_id="train_model") + deploy_model = DummyOperator(task_id="deploy_model") + + start >> [fetch_sales, fetch_weather] + fetch_sales >> clean_sales + fetch_weather >> clean_weather + [clean_sales, clean_weather] >> join_datasets + join_datasets >> train_model >> deploy_model +``` + +# BranchPythonOperator + +In Apache Airflow, the `BranchPythonOperator` is an operator that allows you to create conditional branching within your workflows. It enables you to execute different tasks or branches based on the result of a Python function that you define. + +The `BranchPythonOperator` works as follows: + +1. When defining your workflow, you specify a `BranchPythonOperator` task, which includes the following parameters: + - `task_id`: A unique identifier for the task. + - `python_callable`: A Python function that determines the branching logic. This function should return the task ID of the next task to execute based on the current context. + - Other optional parameters, such as `provide_context` to pass the context to the Python function. + +2. During task execution, the `BranchPythonOperator` calls the specified `python_callable` function, passing the context as an argument. The context includes information such as the current execution date, task instance, and other relevant details. + +3. The `python_callable` function evaluates the necessary conditions based on the context and returns the task ID of the next task to execute. The returned task ID should match the `task_id` of one of the downstream tasks. + +4. The `BranchPythonOperator` uses the returned task ID to determine the next task to execute in the workflow. It dynamically sets the downstream dependency based on the returned task ID. + +Here's an example to illustrate the usage of `BranchPythonOperator`: + +```python +from airflow import DAG +from airflow.operators.python import BranchPythonOperator +from datetime import datetime + +def decide_branch(**context): + current_hour = datetime.now().hour + if current_hour < 12: + return 'morning_task' + else: + return 'afternoon_task' + +with DAG('branching_example', start_date=datetime(2023, 6, 11), schedule_interval=None) as dag: + decide_branch_task = BranchPythonOperator( + task_id='decide_branch_task', + python_callable=decide_branch + ) + + morning_task = ... + afternoon_task = ... 
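    # Note: the '...' placeholders above are stand-ins; in a runnable version they would be
    # real operators whose task_ids match the strings returned by decide_branch(), e.g.
    # DummyOperator(task_id='morning_task') and DummyOperator(task_id='afternoon_task').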
+ + decide_branch_task >> [morning_task, afternoon_task] +``` + +In this example, the `decide_branch()` function is the Python callable that determines the branching logic. It checks the current hour and returns the task ID of either `'morning_task'` or `'afternoon_task'` based on the result. + +The `decide_branch_task` is an instance of the `BranchPythonOperator` class, representing the branching task in the workflow. It uses the `decide_branch()` function to determine the next task to execute dynamically. + +The `morning_task` and `afternoon_task` are downstream tasks, and the dependency is set based on the result of the `decide_branch_task`. + +By using the `BranchPythonOperator`, you can create dynamic and conditional workflows in Airflow, allowing different branches of the workflow to be executed based on the outcome of the Python function. + +### Branching example from book + +Take a look at this line carefully +```python +join_datasets = DummyOperator(task_id="join_datasets", trigger_rule="none_failed") +``` + +![[Screenshot from 2023-06-11 17-36-45.png]] + + +```python +import airflow + +from airflow import DAG +from airflow.operators.dummy import DummyOperator +from airflow.operators.python import PythonOperator, BranchPythonOperator + +ERP_CHANGE_DATE = airflow.utils.dates.days_ago(1) + +def _pick_erp_system(**context): + if context["execution_date"] < ERP_CHANGE_DATE: + return "fetch_sales_old" + else: + return "fetch_sales_new" + +def _fetch_sales_old(**context): + print("Fetching sales data (OLD)...") + +def _fetch_sales_new(**context): + print("Fetching sales data (NEW)...") + +def _clean_sales_old(**context): + print("Preprocessing sales data (OLD)...") + +def _clean_sales_new(**context): + print("Preprocessing sales data (NEW)...") + +with DAG( + dag_id="03_branch_dag", + start_date=airflow.utils.dates.days_ago(3), + schedule_interval="@daily", +) as dag: + start = DummyOperator(task_id="start") + + pick_erp_system = BranchPythonOperator( + task_id="pick_erp_system", python_callable=_pick_erp_system + ) + + fetch_sales_old = PythonOperator( + task_id="fetch_sales_old", python_callable=_fetch_sales_old + ) + clean_sales_old = PythonOperator( + task_id="clean_sales_old", python_callable=_clean_sales_old + ) + + fetch_sales_new = PythonOperator( + task_id="fetch_sales_new", python_callable=_fetch_sales_new + ) + clean_sales_new = PythonOperator( + task_id="clean_sales_new", python_callable=_clean_sales_new + ) + + fetch_weather = DummyOperator(task_id="fetch_weather") + clean_weather = DummyOperator(task_id="clean_weather") + + # Using the wrong trigger rule ("all_success") results in tasks being skipped downstream. + # join_datasets = DummyOperator(task_id="join_datasets") + + join_datasets = DummyOperator(task_id="join_datasets", trigger_rule="none_failed") + train_model = DummyOperator(task_id="train_model") + deploy_model = DummyOperator(task_id="deploy_model") + + start >> [pick_erp_system, fetch_weather] + pick_erp_system >> [fetch_sales_old, fetch_sales_new] + fetch_sales_old >> clean_sales_old + fetch_sales_new >> clean_sales_new + fetch_weather >> clean_weather + [clean_sales_old, clean_sales_new, clean_weather] >> join_datasets + join_datasets >> train_model >> deploy_model +``` + +In Apache Airflow, Trigger Rules are used to define the conditions under which a task should be triggered or skipped during workflow execution. 
Each task in Airflow can have a trigger rule associated with it, which determines how the task's execution is affected by the status of its upstream tasks. + +Here are the available trigger rules in Airflow: + +1. `all_success` (default): The task will be triggered only if all of its upstream tasks have succeeded. If any upstream task has failed, been skipped, or is in a state other than success, the task will be skipped. + +2. `all_failed`: The task will be triggered only if all of its upstream tasks have failed. If any upstream task has succeeded, been skipped, or is in a state other than failure, the task will be skipped. + +3. `all_done`: The task will be triggered only if all of its upstream tasks have completed, regardless of their status. If any upstream task is still running or has been skipped, the task will be skipped. + +4. `one_success`: The task will be triggered if at least one of its upstream tasks has succeeded. It will be skipped only if all of its upstream tasks have failed or have been skipped. + +5. `one_failed`: The task will be triggered if at least one of its upstream tasks has failed. It will be skipped only if all of its upstream tasks have succeeded or have been skipped. + +6. `none_failed`: The task will be triggered if none of its upstream tasks have failed. It will be skipped if any of its upstream tasks have failed, even if others have succeeded or been skipped. + +To apply a trigger rule to a task in Airflow, you can set the `trigger_rule` parameter when defining the task. Here's an example: + +```python +from airflow import DAG +from airflow.operators.dummy import DummyOperator +from datetime import datetime + +with DAG('trigger_rule_example', start_date=datetime(2023, 6, 11), schedule_interval=None) as dag: + task1 = DummyOperator(task_id='task1') + task2 = DummyOperator(task_id='task2', trigger_rule='all_done') + task3 = DummyOperator(task_id='task3', trigger_rule='one_failed') + + task1 >> task2 + task1 >> task3 +``` + +In this example, we have three tasks: `task1`, `task2`, and `task3`. `task1` is connected to both `task2` and `task3`. + +- `task2` has a trigger rule of `'all_done'`, so it will only be triggered if both upstream tasks (`task1`) have completed, regardless of their status. +- `task3` has a trigger rule of `'one_failed'`, so it will be triggered if at least one upstream task (`task1`) has failed. It will be skipped only if all upstream tasks have succeeded or been skipped. + +By setting different trigger rules for tasks, you can define complex dependencies and conditions within your workflows, ensuring that tasks are executed or skipped based on the desired logic and the status of their upstream tasks. 
+ +![[Screenshot from 2023-06-11 17-41-02.png]] + +```python +join_erp = DummyOperator(task_id="join_erp_branch", trigger_rule="none_failed") +``` + +### Conditional tasks + + +![[Screenshot from 2023-06-11 17-51-48.png]] + +```python + +import airflow +import pendulum + +from airflow import DAG +from airflow.exceptions import AirflowSkipException +from airflow.operators.dummy import DummyOperator +from airflow.operators.python import PythonOperator, BranchPythonOperator + +ERP_CHANGE_DATE = airflow.utils.dates.days_ago(1) + +def _pick_erp_system(**context): + if context["execution_date"] < ERP_CHANGE_DATE: + return "fetch_sales_old" + else: + return "fetch_sales_new" + +def _latest_only(**context): + now = pendulum.now("UTC") + left_window = context["dag"].following_schedule(context["execution_date"]) + right_window = context["dag"].following_schedule(left_window) + + if not left_window < now <= right_window: + raise AirflowSkipException() + +with DAG( + dag_id="06_condition_dag", + start_date=airflow.utils.dates.days_ago(3), + schedule_interval="@daily", +) as dag: + start = DummyOperator(task_id="start") + + pick_erp = BranchPythonOperator( + task_id="pick_erp_system", python_callable=_pick_erp_system + ) + + fetch_sales_old = DummyOperator(task_id="fetch_sales_old") + clean_sales_old = DummyOperator(task_id="clean_sales_old") + + fetch_sales_new = DummyOperator(task_id="fetch_sales_new") + clean_sales_new = DummyOperator(task_id="clean_sales_new") + + join_erp = DummyOperator(task_id="join_erp_branch", trigger_rule="none_failed") + + fetch_weather = DummyOperator(task_id="fetch_weather") + clean_weather = DummyOperator(task_id="clean_weather") + + join_datasets = DummyOperator(task_id="join_datasets") + train_model = DummyOperator(task_id="train_model") + + latest_only = PythonOperator(task_id="latest_only", python_callable=_latest_only) + + deploy_model = DummyOperator(task_id="deploy_model") + + start >> [pick_erp, fetch_weather] + pick_erp >> [fetch_sales_old, fetch_sales_new] + fetch_sales_old >> clean_sales_old + fetch_sales_new >> clean_sales_new + [clean_sales_old, clean_sales_new] >> join_erp + fetch_weather >> clean_weather + [join_erp, clean_weather] >> join_datasets + join_datasets >> train_model >> deploy_model + latest_only >> deploy_model + +``` + +![[Screenshot from 2023-06-11 17-53-44.png]] + + +### Shared Data between task + +#### Sharing data using XComs +```python +import uuid + +import airflow +from airflow import DAG + +from airflow.operators.dummy import DummyOperator + +from airflow.operators.python import PythonOperator + +def _train_model(**context): + +model_id = str(uuid.uuid4()) + +context["task_instance"].xcom_push(key="model_id", value=model_id) + + + + +def _deploy_model(**context): + +model_id = context["task_instance"].xcom_pull( + +task_ids="train_model", key="model_id" + +) + +print(f"Deploying model {model_id}") + + + + +with DAG( + +dag_id="10_xcoms", + +start_date=airflow.utils.dates.days_ago(3), + +schedule_interval="@daily", + +) as dag: + +start = DummyOperator(task_id="start") + + + +fetch_sales = DummyOperator(task_id="fetch_sales") + +clean_sales = DummyOperator(task_id="clean_sales") + + + +fetch_weather = DummyOperator(task_id="fetch_weather") + +clean_weather = DummyOperator(task_id="clean_weather") + + + +join_datasets = DummyOperator(task_id="join_datasets") + + + +train_model = PythonOperator(task_id="train_model", python_callable=_train_model) + + + +deploy_model = PythonOperator(task_id="deploy_model", python_callable=_deploy_model) + 
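# Note: the explicit xcom_push in _train_model is not strictly required for a PythonOperator;
# a value returned from the callable is pushed to XCom automatically under the key
# "return_value" and could be pulled with xcom_pull(task_ids="train_model").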
+ + +start >> [fetch_sales, fetch_weather] + +fetch_sales >> clean_sales + +fetch_weather >> clean_weather + +[clean_sales, clean_weather] >> join_datasets + +join_datasets >> train_model >> deploy_model +``` + + +### Chaining Python tasks with the Taskflow API + +dependencies called the Taskflow API. Although not without its flaws, the Taskflow API can considerably simplify your code if you’re primarily using PythonOperators and passing data between them as XComs. + +![[Screenshot from 2023-06-11 17-56-54.png]] + + + +### PART 2 Beyond basics + + +### Triggering Workflows + +#### Polling conditions with sensors + +```python +import airflow.utils.dates + +from airflow import DAG + +from airflow.operators.dummy import DummyOperator + + + +dag = DAG( + +dag_id="figure_6_01", + +start_date=airflow.utils.dates.days_ago(3), + +schedule_interval="0 16 * * *", + +description="A batch workflow for ingesting supermarket promotions data, demonstrating the FileSensor.", + +default_args={"depends_on_past": True}, + +) + + + +create_metrics = DummyOperator(task_id="create_metrics", dag=dag) + + + +for supermarket_id in [1, 2, 3, 4]: + + copy = DummyOperator(task_id=f"copy_to_raw_supermarket_{supermarket_id}", dag=dag) + + process = DummyOperator(task_id=f"process_supermarket_{supermarket_id}", dag=dag) + +copy >> process >> create_metrics +``` + +![[Pasted image 20230611182025.png]] + +### implementing with FileSensor + +```python +import airflow.utils.dates + +from airflow import DAG + +from airflow.operators.dummy import DummyOperator + +from airflow.sensors.filesystem import FileSensor + + + +dag = DAG( + +dag_id="figure_6_05", + +start_date=airflow.utils.dates.days_ago(3), + +schedule_interval="0 16 * * *", + +description="A batch workflow for ingesting supermarket promotions data, demonstrating the FileSensor.", + +default_args={"depends_on_past": True}, + +) + + + +create_metrics = DummyOperator(task_id="create_metrics", dag=dag) + + + +for supermarket_id in [1, 2, 3, 4]: + +wait = FileSensor( + +task_id=f"wait_for_supermarket_{supermarket_id}", + +filepath=f"/data/supermarket{supermarket_id}/data.csv", + +dag=dag, + +) + +copy = DummyOperator(task_id=f"copy_to_raw_supermarket_{supermarket_id}", dag=dag) + +process = DummyOperator(task_id=f"process_supermarket_{supermarket_id}", dag=dag) + +wait >> copy >> process >> create_metrics +``` + +![[Pasted image 20230611182221.png]] + +By default, the sensor timeout is set to seven days. If the DAG schedule_interval is set to once a day, this will lead to an undesired snowball effect—which is surprisingly easy to encounter with many DAGs! The DAG runs once a day, and supermarkets 2, 3, and 4 will fail after seven days, as shown in figure 6.7. However, new DAG runs are added every day and the sensors for those respective days are started, and as a result more and more tasks start running. Here’s the catch: there’s a limit to the number of tasks Airflow can handle and will run (on various levels). + +Setting the maximum number of concurrent tasks in a DAG +```python +dag = DAG( dag_id="couponing_app", start_date=datetime(2019, 1, 1), schedule_interval="0 0 * * *", concurrency=50, ) +``` + + Day 1: Supermarket 1 succeeded; supermarkets 2, 3, and 4 are polling, occupying 3 tasks. + Day 2: Supermarket 1 succeeded; supermarkets 2, 3, and 4 are polling, occupying 6 tasks. + Day 3: Supermarket 1 succeeded; supermarkets 2, 3, and 4 are polling, occupying 9 tasks. + Day 4: Supermarket 1 succeeded; supermarkets 2, 3, and 4 are polling, occupying 12 tasks. 
+ Day 5: Supermarket 1 succeeded; supermarkets 2, 3, and 4 are polling, occupying 15 tasks. + Day 6: Supermarket 1 succeeded; supermarkets 2, 3, and 4 are polling, occupying 16 tasks; two new tasks cannot run, and any other task trying to run is blocked. +This behavior is often referred to as **sensor deadlock**. In this example, the maximum number of running tasks in the supermarket couponing DAG is reached, and thus the impact is limited to that DAG, while other DAGs can still run. However, once the global Airflow limit of maximum tasks is reached, your entire system is stalled, which is obviously undesirable. This issue can be solved in various ways. + +![[Pasted image 20230611183035.png]] + + +### TriggerDagOperator + +The `TriggerDagRunOperator` is an operator in Apache Airflow that allows you to trigger the execution of another DAG (Directed Acyclic Graph) from within your workflow. It enables you to programmatically start the execution of a separate DAG, providing flexibility and the ability to orchestrate complex workflows. + +Here's how the `TriggerDagRunOperator` works: + +1. When defining your main DAG, you include a `TriggerDagRunOperator` task, specifying the DAG ID of the target DAG that you want to trigger. + +2. During task execution, the `TriggerDagRunOperator` triggers the execution of the target DAG by creating a new DagRun for that DAG. A DagRun is an instance of a DAG that represents a specific run or execution of the DAG. + +3. You can provide additional parameters to the `TriggerDagRunOperator` to customize the triggered DagRun. These parameters can include configuration variables, execution dates, and other context variables that will be passed to the triggered DAG. + +4. Once the DagRun is created, the scheduler of Airflow takes over and starts executing the tasks within the triggered DAG, following the defined dependencies and scheduling parameters. + +Here's an example to illustrate the usage of `TriggerDagRunOperator`: + +```python +from airflow import DAG +from airflow.operators.trigger_dagrun import TriggerDagRunOperator +from datetime import datetime + +with DAG('main_dag', start_date=datetime(2023, 6, 11), schedule_interval=None) as dag: + trigger_task = TriggerDagRunOperator( + task_id='trigger_task', + trigger_dag_id='target_dag', + execution_date="{{ execution_date }}" + ) +``` + +In this example, the `main_dag` includes a `TriggerDagRunOperator` task named `trigger_task`. It is configured to trigger the DAG with the ID `'target_dag'`. + +The `execution_date` parameter is set to `"{{ execution_date }}"`, which is a Jinja template variable that passes the current execution date to the triggered DAG. This allows the triggered DAG to use the same execution date as the main DAG. + +When the `trigger_task` is executed, it triggers the execution of `'target_dag'`, creating a new DagRun for that DAG. The target DAG will start executing its tasks based on its own schedule and dependencies. + +By using the `TriggerDagRunOperator`, you can create complex workflows that orchestrate the execution of multiple DAGs, enabling you to modularize and manage your workflows more effectively. 
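The example above assumes that a DAG with the ID `target_dag` already exists and is waiting to be triggered. A minimal sketch of what such a target DAG could look like (the DAG ID matches the example above, but the task and callable here are illustrative, not from the book):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def _report(**context):
    # The triggered run receives the execution date passed by the TriggerDagRunOperator.
    print(f"Triggered run for {context['execution_date']}")

with DAG('target_dag', start_date=datetime(2023, 6, 11), schedule_interval=None) as dag:
    report_task = PythonOperator(task_id='report_task', python_callable=_report)
```

Because `schedule_interval` is `None`, this DAG only runs when something else (here, the `TriggerDagRunOperator` in `main_dag`) triggers it.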
+ +### ExternalTaskSensor +![[Pasted image 20230611185557.png]] + +### Starting workflows with REST/CLI + +```bash +#!/usr/bin/env bash + + + +# Trigger DAG with Airflow CLI + +airflow dags trigger listing_6_8 --conf '{"supermarket": 1}' + + + +# Trigger DAG with Airflow REST API + +curl -X POST "http://localhost:8080/api/v1/dags/listing_6_8/dagRuns" -H "Content-Type: application/json" -d '{"conf": {"supermarket": 1}}' --user "admin:admin" +``` + +# Communicating with external systems + +### Moving data from between systems + +Let’s imagine we have a very large job that would take all resources on the machine Airflow is running on. In this case, it’s better to run the job elsewhere; Airflow will start the job and wait for it to complete. The idea is that there should be a strong separation between orchestration and execution, which we can achieve by Airflow starting the job and waiting for completion and a data-processing framework such as Spark performing the actual work + +In Spark, there are various ways to start a job: + Using the SparkSubmitOperator—This requires a spark-submit binary and YARN client config on the Airflow machine to find the Spark instance. + Using the SSHOperator—This requires SSH access to a Spark instance but does not require Spark client config on the Airflow instance. + Using the SimpleHTTPOperator—This requires running Livy, a REST API for Apache Spark, to access Spark. + +![[Pasted image 20230611195143.png]] + + +### Building custom components + + + + +### How to store API keys or connection variables like datas in apache airflow + +![[Pasted image 20230612000055.png]] + +In Apache Airflow, you can store API tokens or any other sensitive information using Airflow's built-in feature called Connections. Connections allow you to securely store and manage credentials, API tokens, and other connection details used by your Airflow tasks. + +Here are the steps to store API tokens using Airflow's Connections feature: + +1. Access the Airflow web UI: Open your web browser and navigate to the Airflow web interface. + +2. Go to the Admin section: Click on the "Admin" menu on the top navigation bar. You will need the necessary permissions to access this section. + +3. Select "Connections": In the Admin section, find and click on the "Connections" option. This will open the Connections management page. + +4. Add a new connection: On the Connections page, click on the "+ Add a Connection" button to create a new connection. + +5. Enter connection details: In the connection creation form, fill in the necessary details for your API token. You can provide a connection name, connection type, host, login, password, and any other relevant information. + +6. Save the connection: Once you have entered the connection details, click on the "Save" button to save the connection. + +7. Use the connection in your tasks: In your Airflow DAGs, you can access the stored API token by referencing the connection you created. You can use the connection's name or provide the connection ID to retrieve the necessary credentials or API token within your task code. + +By storing the API token as a connection in Airflow, you can avoid hardcoding sensitive information in your code and manage credentials centrally through the Airflow UI. + +Note: It's important to ensure that only authorized users have access to the Airflow web UI and appropriate permissions to view and manage connections, as they contain sensitive information. 
+ + +### And How can I get back in python code + +To retrieve the stored API token or connection information from Apache Airflow within your Python code, you can use the `BaseHook` class provided by Airflow. The `BaseHook` class provides methods to retrieve connection details based on the connection name or ID. + +Here's an example of how you can retrieve the API token using `BaseHook`: + +```python +from airflow.hooks.base_hook import BaseHook + +# Retrieve the connection by name +connection_name = "my_api_connection" +connection = BaseHook.get_connection(connection_name) + +# Retrieve the API token +api_token = connection.password + +# Use the API token in your code +print(f"API token: {api_token}") +``` + +In the above example, replace `"my_api_connection"` with the actual name of your connection that holds the API token. The `BaseHook.get_connection()` method retrieves the connection object based on the connection name provided. Then, you can access the API token using the `password` attribute of the connection object. + +You can also retrieve the connection by ID instead of the name using `BaseHook.get_connection_by_id()`: + +```python +from airflow.hooks.base_hook import BaseHook + +# Retrieve the connection by ID +connection_id = 123 +connection = BaseHook.get_connection_by_id(connection_id) + +# Retrieve the API token +api_token = connection.password + +# Use the API token in your code +print(f"API token: {api_token}") +``` + +In this case, replace `123` with the actual ID of your connection. + +By using the `BaseHook` class and its methods, you can easily retrieve the stored API token or any other connection details within your Python code in Apache Airflow. + +![[Pasted image 20230612000055.png]] + +Custom hooks + +```python +import requests + + + +from airflow.hooks.base_hook import BaseHook + + + + +class MovielensHook(BaseHook): + +""" + +Hook for the MovieLens API. + + + +Abstracts details of the Movielens (REST) API and provides several convenience + +methods for fetching data (e.g. ratings, users, movies) from the API. Also + +provides support for automatic retries of failed requests, transparent + +handling of pagination, authentication, etc. + + + +Parameters + +---------- + +conn_id : str + +ID of the connection to use to connect to the Movielens API. Connection + +is expected to include authentication details (login/password) and the + +host that is serving the API. + +""" + + + +DEFAULT_SCHEMA = "http" + +DEFAULT_PORT = 5000 + + + +def __init__(self, conn_id, retry=3): + +super().__init__() + +self._conn_id = conn_id + +self._retry = retry + + + +self._session = None + +self._base_url = None + + + +def __enter__(self): + +return self + + + +def __exit__(self, exc_type, exc_val, exc_tb): + +self.close() + + + +def get_conn(self): + +""" + +Returns the connection used by the hook for querying data. + +Should in principle not be used directly. + +""" + + + +if self._session is None: + +# Fetch config for the given connection (host, login, etc). + +config = self.get_connection(self._conn_id) + + + +if not config.host: + +raise ValueError(f"No host specified in connection {self._conn_id}") + + + +schema = config.schema or self.DEFAULT_SCHEMA + +port = config.port or self.DEFAULT_PORT + + + +self._base_url = f"{schema}://{config.host}:{port}" + + + +# Build our session instance, which we will use for any + +# requests to the API. 
+ +self._session = requests.Session() + + + +if config.login: + +self._session.auth = (config.login, config.password) + + + +return self._session, self._base_url + + + +def close(self): + +"""Closes any active session.""" + +if self._session: + +self._session.close() + +self._session = None + +self._base_url = None + + + +# API methods: + + + +def get_movies(self): + +"""Fetches a list of movies.""" + +raise NotImplementedError() + + + +def get_users(self): + +"""Fetches a list of users.""" + +raise NotImplementedError() + + + +def get_ratings(self, start_date=None, end_date=None, batch_size=100): + +""" + +Fetches ratings between the given start/end date. + + + +Parameters + +---------- + +start_date : str + +Start date to start fetching ratings from (inclusive). Expected + +format is YYYY-MM-DD (equal to Airflow's ds formats). + +end_date : str + +End date to fetching ratings up to (exclusive). Expected + +format is YYYY-MM-DD (equal to Airflow's ds formats). + +batch_size : int + +Size of the batches (pages) to fetch from the API. Larger values + +mean less requests, but more data transferred per request. + +""" + + + +yield from self._get_with_pagination( + +endpoint="/ratings", + +params={"start_date": start_date, "end_date": end_date}, + +batch_size=batch_size, + +) + + + +def _get_with_pagination(self, endpoint, params, batch_size=100): + +""" + +Fetches records using a get request with given url/params, + +taking pagination into account. + +""" + + + +session, base_url = self.get_conn() + +url = base_url + endpoint + + + +offset = 0 + +total = None + +while total is None or offset < total: + +response = session.get( + +url, params={**params, **{"offset": offset, "limit": batch_size}} + +) + +response.raise_for_status() + +response_json = response.json() + + + +yield from response_json["result"] + + + +offset += batch_size + +total = response_json["total"] +``` + +### Building Custom Operator + +Although building a MovielensHook has allowed us to move a lot of complexity from our DAG into the hook, we still have to write a considerable amount of boilerplate code for defining start/end dates and writing the ratings to an output file. This means that, if we want to reuse this functionality in multiple DAGs, we will still have some considerable code duplication and extra effort involved. Fortunately, Airflow also allows us to build custom operators, which can be used to perform repetitive tasks with a minimal amount of boilerplate code. In this case, we could, for example, use this functionality to build a MovielensFetchRatingsOperator, which would allow us to fetch movie ratings using a specialized operator class. + +```python +import json + +import os + + + +from airflow.models import BaseOperator + +from airflow.utils.decorators import apply_defaults + + + +from custom.hooks import MovielensHook + + + + +class MovielensFetchRatingsOperator(BaseOperator): + +""" + +Operator that fetches ratings from the Movielens API (introduced in Chapter 8). + + + +Parameters + +---------- + +conn_id : str + +ID of the connection to use to connect to the Movielens API. Connection + +is expected to include authentication details (login/password) and the + +host that is serving the API. + +output_path : str + +Path to write the fetched ratings to. + +start_date : str + +(Templated) start date to start fetching ratings from (inclusive). + +Expected format is YYYY-MM-DD (equal to Airflow's ds formats). + +end_date : str + +(Templated) end date to fetching ratings up to (exclusive). 
+ +Expected format is YYYY-MM-DD (equal to Airflow's ds formats). + +batch_size : int + +Size of the batches (pages) to fetch from the API. Larger values + +mean less requests, but more data transferred per request. + +""" + + + +template_fields = ("_start_date", "_end_date", "_output_path") + + + +@apply_defaults + +def __init__( + +self, + +conn_id, + +output_path, + +start_date="{{ds}}", + +end_date="{{next_ds}}", + +batch_size=1000, + +**kwargs, + +): + +super(MovielensFetchRatingsOperator, self).__init__(**kwargs) + + + +self._conn_id = conn_id + +self._output_path = output_path + +self._start_date = start_date + +self._end_date = end_date + +self._batch_size = batch_size + + + +# pylint: disable=unused-argument,missing-docstring + +def execute(self, context): + +hook = MovielensHook(self._conn_id) + + + +try: + +self.log.info( + +f"Fetching ratings for {self._start_date} to {self._end_date}" + +) + +ratings = list( + +hook.get_ratings( + +start_date=self._start_date, + +end_date=self._end_date, + +batch_size=self._batch_size, + +) + +) + +self.log.info(f"Fetched {len(ratings)} ratings") + +finally: + +# Make sure we always close our hook's session. + +hook.close() + + + +self.log.info(f"Writing ratings to {self._output_path}") + + + +# Make sure output directory exists. + +output_dir = os.path.dirname(self._output_path) + +os.makedirs(output_dir, exist_ok=True) + + + +# Write output as JSON. + +with open(self._output_path, "w") as file_: + +json.dump(ratings, fp=file_) +``` + +### Building custom sensors + +```python +"""Module containing file system sensors.""" + + + +from airflow.sensors.base import BaseSensorOperator + +from airflow.utils.decorators import apply_defaults + + + +from custom.hooks import MovielensHook + + + + +class MovielensRatingsSensor(BaseSensorOperator): + +""" + +Sensor that waits for the Movielens API to have ratings for a time period. + + + +start_date : str + +(Templated) start date of the time period to check for (inclusive). + +Expected format is YYYY-MM-DD (equal to Airflow's ds formats). + +end_date : str + +(Templated) end date of the time period to check for (exclusive). + +Expected format is YYYY-MM-DD (equal to Airflow's ds formats). + +""" + + + +template_fields = ("_start_date", "_end_date") + + + +@apply_defaults + +def __init__(self, conn_id, start_date="{{ds}}", end_date="{{next_ds}}", **kwargs): + +super().__init__(**kwargs) + +self._conn_id = conn_id + +self._start_date = start_date + +self._end_date = end_date + + + +# pylint: disable=unused-argument,missing-docstring + +def poke(self, context): + +hook = MovielensHook(self._conn_id) + + + +try: + +next( + +hook.get_ratings( + +start_date=self._start_date, end_date=self._end_date, batch_size=1 + +) + +) + +# If no StopIteration is raised, the request returned at least one record. + +# This means that there are records for the given period, which we indicate + +# to Airflow by returning True. + +self.log.info( + +f"Found ratings for {self._start_date} to {self._end_date}, continuing!" + +) + +return True + +except StopIteration: + +self.log.info( + +f"Didn't find any ratings for {self._start_date} " + +f"to {self._end_date}, waiting..." + +) + +# If StopIteration is raised, we know that the request did not find + +# any records. This means that there a no ratings for the time period, + +# so we should return False. + +return False + +finally: + +# Make sure we always close our hook's session. 
+ +hook.close() + +``` + +### NEED TO READ AND PRACTICE IN TESTING UNIT 9 + +### Running tasks in containers Unit 10 + +![[Pasted image 20230612010748.png]] + +```python +import datetime as dt + +import os + + + +from airflow import DAG + +from airflow.providers.docker.operators.docker import DockerOperator + + + + +with DAG( + +dag_id="01_docker", + +description="Fetches ratings from the Movielens API using Docker.", + +start_date=dt.datetime(2019, 1, 1), + +end_date=dt.datetime(2019, 1, 3), + +schedule_interval="@daily", + +) as dag: + + + +fetch_ratings = DockerOperator( + +task_id="fetch_ratings", + +image="manning-airflow/movielens-fetch", + +command=[ + +"fetch-ratings", + +"--start_date", + +"{{ds}}", + +"--end_date", + +"{{next_ds}}", + +"--output_path", + +"/data/ratings/{{ds}}.json", + +"--user", + +os.environ["MOVIELENS_USER"], + +"--password", + +os.environ["MOVIELENS_PASSWORD"], + +"--host", + +os.environ["MOVIELENS_HOST"], + +], + +network_mode="airflow", + +# Note: this host path is on the HOST, not in the Airflow docker container. + +volumes=["/tmp/airflow/data:/data"], + +) + + + +rank_movies = DockerOperator( + +task_id="rank_movies", + +image="manning-airflow/movielens-rank", + +command=[ + +"rank-movies", + +"--input_path", + +"/data/ratings/{{ds}}.json", + +"--output_path", + +"/data/rankings/{{ds}}.csv", + +], + +volumes=["/tmp/airflow/data:/data"], + +) + + + +fetch_ratings >> rank_movies +``` +![[Pasted image 20230612011006.png]] + + +![[Pasted image 20230612011104.png]] + + Airflow deployments can be difficult to manage if they involve many different operators, as this requires knowledge of the different APIs and complicates debugging and dependency management. + One way of tackling this issue is to use container technologies such as Docker to encapsulate your tasks inside container images and run these images from within Airflow. + This containerized approach has several advantages, including easier dependency management, a more uniform interface for running tasks, and improved testability of tasks. + Using the DockerOperator, you can run tasks in container images directly using Docker, similar to the docker run CLI command. + You can use the KubernetesPodOperator to run containerized tasks in pods on a Kubernetes cluster. + Kubernetes allows you to scale your containerized tasks across a compute cluster, which provides (among other things) greater scalability and more flexibility in terms of computing resources. + + diff --git a/content/AI&DATA/Data Engineering/data eng index.md b/content/AI&DATA/Data Engineering/data eng index.md new file mode 100644 index 000000000..0cf361973 --- /dev/null +++ b/content/AI&DATA/Data Engineering/data eng index.md @@ -0,0 +1,3 @@ +#index +* [[Apache Airflow]] +* \ No newline at end of file diff --git a/content/AI&DATA/Generative AI Book.md b/content/AI&DATA/Generative AI Book.md new file mode 100644 index 000000000..19207612d --- /dev/null +++ b/content/AI&DATA/Generative AI Book.md @@ -0,0 +1,122 @@ + +# Generative Modeling + +A generative model describes how a dataset is generated, in terms of a probabilistic +model. By sampling from this model, we are able to generate new data. + +First, we require a dataset consisting of many examples of the entity we are trying to +generate. This is known as the training data, and one such data point is called an +**observation**. + +![[Screenshot from 2023-07-18 09-32-29.png]] + + +generative model must also be probabilistic rather than deterministic. 
If our model +is merely a fixed calculation, such as taking the average value of each pixel in the +dataset, it is not generative because the model produces the same output every time. +The model must include a stochastic (random) element that influences the individual +samples generated by the model. + + +![[Screenshot from 2023-07-18 09-34-24.png]] + +One key difference is that when performing discriminative modeling, each observa‐ +tion in the training data has a label. + +Discriminative modeling estimates p( y | x) —the probability of a label y given observa‐ +tion x. + +Generative modeling estimates p(x) —the probability of observing observation x. +If the dataset is labeled, we can also build a generative model that estimates the distri‐ +bution p(x | y) . + +In other words, discriminative modeling attempts to estimate the probability that an +observation x belongs to category y. Generative modeling doesn’t care about labeling +observations. Instead, it attempts to estimate the probability of seeing the observation +at all. + + +sample space, +density function, +parametric modeling, +maximum likelihood estimation + +![[Screenshot from 2023-07-18 09-54-01.png]] + +![[Pasted image 20230718102453.png]] + +Generative Modeling Challenges +• How does the model cope with the high degree of conditional dependence +between features? +• How does the model find one of the tiny proportion of satisfying possible gener‐ +ated observations among a high-dimensional sample space? + +The fact that deep learning can form its own features in a lower-dimensional space +means that it is a form of representation learning. It is important to understand the +key concepts of representation learning before we tackle deep learning in the next +chapter. + +### Representation Learning + +The core idea behind representation learning is that instead of trying to model the +high-dimensional sample space directly, we should instead describe each observation +in the training set using some low-dimensional latent space and then learn a mapping +function that can take a point in the latent space and map it to a point in the original +domain. In other words, each point in the latent space is the representation of some +high-dimensional image. + +![[Pasted image 20230718105536.png]] + +## Variational Autoencoder VAE + + + +• An encoder network that compresses high-dimensional input data into a lower- +dimensional representation vector +• A decoder network that decompresses a given representation vector back to the +original domain + + + + +![[Pasted image 20230719124201.png]] + +The network is trained to find weights for the encoder and decoder that minimize the +loss between the original input and the reconstruction of the input after it has passed +through the encoder and decoder. + +Autoencoders can be used to generate new data through a process called "autoencoder decoding" or "autoencoder sampling." Autoencoders are neural network models that learn to encode input data into a lower-dimensional representation (latent space) and then decode it back to reconstruct the original input. This reconstruction process can also be used to generate new data that resembles the patterns learned during training. + +Here's a general approach to using an autoencoder for data generation: + +1. Train an Autoencoder: Start by training an autoencoder on a dataset of your choice. 
The autoencoder consists of an encoder network that maps the input data to a lower-dimensional latent space and a decoder network that reconstructs the original input from the latent space representation. + +2. Latent Space Exploration: After training, you can explore the learned latent space by sampling points from it. Randomly generate vectors or sample from a probability distribution to create latent space representations. + +3. Decoding: Pass the sampled latent space representations through the decoder network to generate new data. The decoder will transform the latent space representations back into the original data space, generating synthetic data that resembles the patterns learned during training. + +4. Control Generation: By manipulating the values of the latent space representations, you can control the characteristics of the generated data. For example, you can interpolate between two latent space points to create a smooth transition between two data samples or explore specific directions in the latent space to generate variations of a particular feature. + + +It's important to note that the quality of the generated data heavily depends on the quality of the trained autoencoder and the complexity of the dataset. Autoencoders are most effective when trained on datasets with clear patterns and structure. + +There are variations of autoencoders, such as variational autoencoders (VAEs), that introduce probabilistic components and offer more control over the generation process. VAEs can generate data that follows a specific distribution by sampling latent variables from the learned distributions. + +Remember that the generated data is synthetic and may not perfectly match the real data distribution. It's crucial to evaluate the generated samples and assess their usefulness for your specific application. + + +Variational autoencoders solve these problems, by introducing randomness into the +model and constraining how points in the latent space are distributed. We saw that +with a few minor adjustments, we can transform our autoencoder into a variational +autoencoder, thus giving it the power to be a generative model. + + +Finally, we applied our new technique to the problem of face generation and saw how +we can simply choose points from a standard normal distribution to generate new +faces. Moreover, by performing vector arithmetic within the latent space, we can ach‐ +ieve some amazing effects, such as face morphing and feature manipulation. With +these features, it is easy to see why VAEs have become a prominent technique for gen‐ +erative modeling in recent years. + +![[Pasted image 20230719165826.png]] + diff --git a/content/AI&DATA/Generative AI By GOOGLE.md b/content/AI&DATA/Generative AI By GOOGLE.md new file mode 100644 index 000000000..92f4302b2 --- /dev/null +++ b/content/AI&DATA/Generative AI By GOOGLE.md @@ -0,0 +1,23 @@ + +## # What are the 4 Vs of Big Data? + There are generally four characteristics that must be part of a dataset to qualify it as big data—volume, velocity, variety and veracity [link](https://bernardmarr.com/what-are-the-4-vs-of-big-data/#:~:text=There%20are%20generally%20four%20characteristics,%2C%20velocity%2C%20variety%20and%20veracity.) + +### What is ETL + +ETL provides the foundation for data analytics and machine learning workstreams. 
Through a series of business rules, ETL cleanses and organizes data in a way which addresses specific business intelligence needs, like monthly reporting, but it can also tackle more advanced analytics, which can improve back-end processes or end user experiences. ETL is often used by an organization to:  + +- Extract data from legacy systems +- Cleanse the data to improve data quality and establish consistency +- Load data into a target database + + +### Apache Beam +Apache Beam is an open-source, unified programming model and set of tools for building batch and streaming data processing pipelines. It provides a way to express data processing pipelines that can run on various distributed processing backends, such as Apache Spark, Apache Flink, Google Cloud Dataflow, and others. Apache Beam offers a high-level API that abstracts away the complexities of distributed data processing and allows developers to write pipeline code in a language-agnostic manner. + +The key concept in Apache Beam is the data processing pipeline, which consists of a series of transforms that are applied to input data to produce an output. A transform represents a specific operation on the data, such as filtering, mapping, aggregating, or joining. Apache Beam provides a rich set of built-in transforms, as well as the ability to create custom transforms to suit specific processing needs. + +One of the main advantages of Apache Beam is its portability across different processing engines. With Apache Beam, you can write your pipeline code once and run it on multiple execution engines without modifying the code. This flexibility allows you to choose the processing engine that best fits your requirements or take advantage of the capabilities offered by different engines for specific tasks. + +Apache Beam supports both batch and streaming processing. It provides a programming model that enables developers to write pipelines that can handle both bounded (batch) and unbounded (streaming) data. This makes it possible to build end-to-end data processing solutions that can handle diverse data processing scenarios. + +Overall, Apache Beam simplifies the development of data processing pipelines by providing a unified model and a set of tools that abstract away the complexities of distributed data processing. It allows developers to focus on the logic of their data transformations rather than the intricacies of the underlying execution engines. \ No newline at end of file diff --git a/content/AI&DATA/Haystack by Deepset.md b/content/AI&DATA/Haystack by Deepset.md new file mode 100644 index 000000000..991162c3d --- /dev/null +++ b/content/AI&DATA/Haystack by Deepset.md @@ -0,0 +1,188 @@ + +Haystack is an **open-source framework** for building **search systems** that work intelligently over large document collections + + +### The Building Blocks of Haystack + +### Nodes + +* Haystack offers [nodes](https://docs.haystack.deepset.ai/docs/nodes_overview) that perform different kinds of text processing +* These are often powered by the latest transformer models. + + +### Transformers + +The Transformer model revolutionized the field of NLP and became the foundation for many subsequent advancements, including OpenAI's GPT models. Unlike earlier NLP models that relied on recurrent neural networks (RNNs) or convolutional neural networks (CNNs), the Transformer model relies on a self-attention mechanism. 
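As a rough illustration of that mechanism (a minimal NumPy sketch under simplifying assumptions, not code from Haystack or the Hugging Face transformers library), scaled dot-product self-attention scores every token against every other token and uses a softmax over those scores to mix the value vectors:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal self-attention sketch; Q, K, V are (seq_len, d_k) arrays."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # token-to-token similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax -> attention weights
    return weights @ V                                   # weighted mix of value vectors

tokens = np.random.rand(4, 8)                               # 4 toy tokens, 8-dim embeddings
out = scaled_dot_product_attention(tokens, tokens, tokens)  # self-attention: Q = K = V
print(out.shape)                                            # (4, 8)
```

In a real Transformer, Q, K, and V are learned linear projections of the token embeddings, and many such attention heads run in parallel.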
+ +The self-attention mechanism allows the Transformer model to capture dependencies between different words in a sentence or sequence by assigning different weights to each word based on its relevance to other words in the sequence. This enables the model to effectively model long-range dependencies and improve performance on various NLP tasks such as machine translation, text summarization, and question answering. + +The Transformer model consists of an encoder-decoder architecture, where the encoder processes the input sequence and the decoder generates the output sequence. Both the encoder and decoder are composed of multiple layers of self-attention mechanisms and feed-forward neural networks. The model is trained using a technique called "unsupervised learning" on large amounts of text data. + +Overall, the Transformer model has significantly advanced the state of the art in NLP and has become a crucial component in many applications involving natural language understanding and generation. + +NLP’s Transformer is a new architecture that aims to solve tasks sequence-to-sequence while easily handling long-distance dependencies. Computing the input and output representations without using sequence-aligned RNNs or convolutions and it relies entirely on self-attention. Lets look in detail what are transformers. + +https://blog.knoldus.com/what-are-transformers-in-nlp-and-its-advantages/ + +```python +reader = FARMReader(model="deepset/roberta-base-squad2") result = reader.predict( query="Which country is Canberra located in?", documents=documents, top_k=10 ) +#https://docs.haystack.deepset.ai/reference/reader-api +``` + + +### Pipelines + +```python +p = Pipeline() +p.add_node(component=retriever, name="Retriever", inputs=["Query"]) +p.add_node(component=reader, name="Reader", inputs=["Retriever"]) +result = p.run(query="What did Einstein work on?") + +``` + + + +**Readers**, also known as Closed-Domain Question Answering systems in machine learning speak, are powerful models that closely analyze documents and perform the core task of question answering. The Readers in Haystack are trained from the latest transformer-based language models and can be significantly sped up using GPU acceleration. But it's not currently feasible to use the Reader directly on a large collection of documents. + +The **Retriever** assists the Reader by acting as a lightweight filter that reduces the number of documents the Reader must process. It scans through all documents in the database, quickly identifies the relevant ones, and dismisses the irrelevant ones. It ends up with a small set of candidate documents that it passes on to the Reader. + +```python +p = ExtractiveQAPipeline(reader, retriever) +result = p.run(query="What is the capital of Australia?") +``` + +You can't do question answering with a Retriever only. And with just a Reader, it would be unacceptably slow. The power of this system comes from the combination of the two nodes. + + +### Agent + + +[The Agent](https://docs.haystack.deepset.ai/docs/agent) is a very versatile, prompt-based component that uses a large language model and employs reasoning to answer complex questions beyond the capabilities of extractive or generative question answering. It's particularly useful for multi-hop question answering scenarios where it must combine information from multiple sources to arrive at an answer. When the Agent receives a query, it forms a plan of action consisting of steps it has to complete. 
It then starts with choosing the right tool and proceeds using the output from each tool as input for the next. It uses the tools in a loop until it reaches the final answer. + +```python +agent = Agent( prompt_node=prompt_node, prompt_template=few_shot_agent_template, tools=[web_qa_tool], final_answer_pattern=r"Final Answer\s*:\s*(.*)", ) +hotpot_questions = [ "What year was the father of the Princes in the Tower born?", "Name the movie in which the daughter of Noel Harrison plays Violet Trefusis.", "Where was the actress who played the niece in the Priest film born?", "Which author is English: John Braine or Studs Terkel?", ] +``` + +### REST API + +To deploy a search system, you need more than just a Python script. You need a service that can stay on, handle requests as they come in, and be callable by many different applications. For this, Haystack comes with a [REST API](https://docs.haystack.deepset.ai/docs/rest_api) designed to work in production environments. + +# Tutorial: Build Your First Question Answering System + + +DocumentStore stores the Documents that the question answering system uses to find answers to your questions. In this tutorial, we’re using the `InMemoryDocumentStore`, which is the simplest DocumentStore to get started with. It requires no external dependencies and it’s a good option for smaller projects and debugging. But it doesn’t scale up so well to larger Document collections, so it’s not a good choice for production systems. To learn more about the DocumentStore and the different types of external databases that we support, see [DocumentStore](https://docs.haystack.deepset.ai/docs/document_store). + +```python +from haystack.document_stores import InMemoryDocumentStore +document_store = InMemoryDocumentStore(use_bm25=True) +``` + +### In Haystack Which one connect to Documentstore first retriver or reader ? + +In Haystack, both the retriever and the reader components can be connected to the document store, but the order in which they are connected depends on the specific pipeline configuration and use case. + +The document store is responsible for storing and indexing the documents that the retriever component will search through. It acts as the initial source of information for the retrieval process. + +Typically, the retriever component is connected to the document store first. The retriever performs an initial search using a given query to retrieve a set of relevant documents or passages from the document store based on their similarity or relevance to the query. The retrieved documents or passages are then passed on to the reader component for further processing. + +Once the retriever component retrieves the relevant documents or passages, the reader component is connected to the retriever's output. The reader component is responsible for extracting the answer or information from the retrieved documents or passages, typically using techniques like machine reading comprehension. 
+ +Here's an example of how the retriever and reader components can be connected to the document store in a Haystack pipeline: + +```python +# Connect retriever to document store +p.add_node(component=retriever, name="Retriever", inputs=["Query"], outputs=["RetrievedDocuments"]) +p.add_node(component=document_store, name="DocumentStore", inputs=["Retriever"]) + +# Connect reader to retriever's output +p.add_node(component=reader, name="Reader", inputs=["RetrievedDocuments"]) +``` + +In this example, the retriever component is connected to the document store, and its output is named "RetrievedDocuments". The document store serves as the input to the retriever component. The reader component is then connected to the "RetrievedDocuments" output, using the retrieved information as its input. + +Please note that the actual configuration and connection of components in your Haystack pipeline may differ based on your specific requirements and implementation. + + +### What is BM25 ? +https://www.elastic.co/blog/practical-bm25-part-2-the-bm25-algorithm-and-its-variables + +Download 517 articles from the Game of Thrones Wikipedia. You can find them in _data/build_your_first_question_answering_system_ as a set of _.txt_ files +```python +from haystack.utils import fetch_archive_from_http + +doc_dir = "data/build_your_first_question_answering_system" + +fetch_archive_from_http( + url="https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt1.zip", + output_dir=doc_dir +) +``` + +2. Use `TextIndexingPipeline` to convert the files you just downloaded into Haystack [Document objects](https://docs.haystack.deepset.ai/docs/documents_answers_labels#document) and write them into the DocumentStore: +```python +import os + +from haystack.pipelines.standard_pipelines import TextIndexingPipeline + + + +files_to_index = [doc_dir + "/" + f for f in os.listdir(doc_dir)] + +indexing_pipeline = TextIndexingPipeline(document_store) + +indexing_pipeline.run_batch(file_paths=files_to_index) +``` + +## Initializing the Retriever + +Our search system will use a Retriever, so we need to initialize it. A Retriever sifts through all the Documents and returns only the ones relevant to the question. This tutorial uses the BM25 algorithm. For more Retriever options, see [Retriever](https://docs.haystack.deepset.ai/docs/retriever). + +Let's initialize a BM25Retriever and make it use the InMemoryDocumentStore we initialized earlier in this tutorial: + +```python +from haystack.nodes import BM25Retriever +retriever = BM25Retriever(document_store=document_store) +``` + +## Initializing the Reader + +A Reader scans the texts it received from the Retriever and extracts the top answer candidates. Readers are based on powerful deep learning models but are much slower than Retrievers at processing the same amount of text. In this tutorial, we're using a FARMReader with a base-sized RoBERTa question answering model called [`deepset/roberta-base-squad2`](https://huggingface.co/deepset/roberta-base-squad2). It's a strong all-round model that's good as a starting point. To find the best model for your use case, see [Models](https://haystack.deepset.ai/pipeline_nodes/reader#models). + +Let's initialize the Reader: +```python +from haystack.nodes import FARMReader + + + +reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True) +``` + +### Create Retriver-Reader pipeline + +In this tutorial, we're using a ready-made pipeline called `ExtractiveQAPipeline`. 
It connects the Reader and the Retriever. The combination of the two speeds up processing because the Reader only processes the Documents that the Retriever has passed on. To learn more about pipelines, see [Pipelines](https://docs.haystack.deepset.ai/docs/pipelines). + +To create the pipeline, run: + +```python +from haystack.pipelines import ExtractiveQAPipeline + +pipe = ExtractiveQAPipeline(reader, retriever) + +prediction = pipe.run( + +query="Who is the father of Arya Stark?", + +params={ + +"Retriever": {"top_k": 10}, + +"Reader": {"top_k": 5} + +} + +) + + +``` + diff --git a/content/AI&DATA/ML index.md b/content/AI&DATA/ML index.md new file mode 100644 index 000000000..99a1ef77a --- /dev/null +++ b/content/AI&DATA/ML index.md @@ -0,0 +1 @@ +#index \ No newline at end of file diff --git a/content/Article&Books/TECH ARTICLE.md b/content/Article&Books/TECH ARTICLE.md new file mode 100644 index 000000000..839e2b290 --- /dev/null +++ b/content/Article&Books/TECH ARTICLE.md @@ -0,0 +1,4 @@ +* https://www.darkreading.com/dr-tech/10-free-purple-team-security-tools-2023 +* https://www.darkreading.com/application-security/10-cool-security-tools-open-sourced-by-the-internet-s-biggest-innovators +* https://github.com/danluu/post-mortems +* \ No newline at end of file diff --git a/content/Article&Books/academic paper ideas.md b/content/Article&Books/academic paper ideas.md new file mode 100644 index 000000000..e7ab3f089 --- /dev/null +++ b/content/Article&Books/academic paper ideas.md @@ -0,0 +1,3 @@ +#todos + +# write about diff --git a/content/Article&Books/article index.md b/content/Article&Books/article index.md new file mode 100644 index 000000000..e6db979ab --- /dev/null +++ b/content/Article&Books/article index.md @@ -0,0 +1,4 @@ +#index + +* [[medium article ideas]] +* [[academic paper ideas]] \ No newline at end of file diff --git a/content/Article&Books/books/Book Index.md b/content/Article&Books/books/Book Index.md new file mode 100644 index 000000000..e18cd0dfa --- /dev/null +++ b/content/Article&Books/books/Book Index.md @@ -0,0 +1 @@ +#index diff --git a/content/Cloud/AWS/AWS CL 01 Exam Questions Notes.md b/content/Cloud/AWS/AWS CL 01 Exam Questions Notes.md new file mode 100644 index 000000000..2199d4fa4 --- /dev/null +++ b/content/Cloud/AWS/AWS CL 01 Exam Questions Notes.md @@ -0,0 +1,1097 @@ + +# 6 Pillars of AWS + +https://aws.amazon.com/blogs/apn/the-6-pillars-of-the-aws-well-architected-framework/ + +## 1. Operational Excellence +The Operational Excellence pillar includes the ability to support development and run workloads effectively, gain insight into their operation, and continuously improve supporting processes and procedures to deliver business value + + +- Perform operations as code +- Make frequent, small, reversible changes +- Refine operations procedures frequently +- Anticipate failure +- Learn from all operational failures + +## 2. Security + +There are seven design principles for security in the cloud: + +- Implement a strong identity foundation +- Enable traceability +- Apply security at all layers +- Automate security best practices +- Protect data in transit and at rest +- Keep people away from data +- Prepare for security events + +## 3. Reliability + +The Reliability pillar encompasses the ability of a workload to perform its intended function correctly and consistently when it’s expected to. This includes the ability to operate and test the workload through its total lifecycle.
+ +- Automatically recover from failure +- Test recovery procedures +- Scale horizontally to increase aggregate workload availability +- Stop guessing capacity +- Manage change in automation + +## 4. Performance Efficiency + +- Democratize advanced technologies +- Go global in minutes +- Use serverless architectures +- Experiment more often +- Consider mechanical sympathy + +## 5. Cost Optimization + +- Implement cloud financial management +- Adopt a consumption model +- Measure overall efficiency +- Stop spending money on undifferentiated heavy lifting +- Analyze and attribute expenditure + +## 6. Sustainability + +The discipline of sustainability addresses the long-term environmental, economic, and societal impact of your business activities. + +- Understand your impact +- Establish sustainability goals +- Maximize utilization +- Anticipate and adopt new, more efficient hardware and software offerings +- Use managed services +- Reduce the downstream impact of your cloud workloads + + +# Aws config + +AWS Config is a service that enables you to assess, audit, and evaluate the configurations of your AWS resources. Config continuously monitors and records your AWS resource configurations and allows you to automate the evaluation of recorded configurations against desired configurations. +![[awsconfig.png]] + +# **AWS CloudTrail** + +AWS CloudTrail is a service that enables governance, compliance, operational auditing, and risk auditing of your AWS account. With CloudTrail, you can log, continuously monitor, and retain account activity related to actions across your AWS infrastructure. CloudTrail provides event history of your AWS account activity, including actions taken through the AWS Management Console, AWS SDKs, command-line tools, and other AWS services.![[Product-Page-Diagram-AWSX-CloudTrail_How-it-Works.d2f51f6e3ec3ea3b33d0c48d472f0e0b59b46e59.png]] + +# **Amazon CloudWatch** + +Amazon CloudWatch is a monitoring and observability service built for DevOps engineers, developers, site reliability engineers (SREs), and IT managers. CloudWatch provides data and actionable insights to monitor applications, respond to system-wide performance changes, optimize resource utilization, and get a unified view of operational health. + + +# **AWS Trusted Advisor** + +AWS Trusted Advisor is an online tool that provides you real-time guidance to help you provision your resources following AWS best practices on cost optimization, security, fault tolerance, service limits, and performance improvement. + +**Amazon Inspector** - Amazon Inspector is an automated security assessment service that helps improve the security and compliance of applications deployed on AWS. Amazon Inspector automatically assesses applications for exposure, vulnerabilities, and deviations from best practices. + +**Amazon GuardDuty** - Amazon GuardDuty is a threat detection service that monitors malicious activity and unauthorized behavior to protect your AWS account. GuardDuty analyzes billions of events across your AWS accounts from AWS CloudTrail (AWS user and API activity in your accounts), Amazon VPC Flow Logs (network traffic data), and DNS Logs (name query patterns). This service is for AWS account level access, not for instance-level management like an EC2. GuardDuty cannot be used to check OS vulnerabilities. + + +On-demand EC2 pricing is seconds based. 
**Per-Second Billing for EC2 and EBS** + +ec2 pricings [url](https://aws.amazon.com/ec2/pricing/) + RESERVED VS SAVINGS + +Amazon offers two pricing models for EC2 instances: **EC2 Reserved Instances** and **Savings Plans**. EC2 Reserved Instances provide you with a significant discount (up to 72%) compared to On-Demand Instance pricing, and can be purchased for a 1-year or 3-year term. Customers have the flexibility to change the Availability Zone, the instance size, and networking type of their Standard Reserved Instances. Purchase Convertible Reserved Instances if you need additional flexibility, such as the ability to use different instance families, operating systems, or tenancies over the Reserved Instance term. [Convertible Reserved Instances provide you with a significant discount (up to 66%) compared to On-Demand Instances and can be purchased for a 1-year or 3-year term](https://aws.amazon.com/ec2/pricing/reserved-instances/pricing/)[1](https://aws.amazon.com/ec2/pricing/reserved-instances/pricing/). + +Savings Plans is another flexible pricing model that provides savings of up to 72% on your AWS compute usage. This pricing model offers lower prices on Amazon EC2 instances usage, regardless of instance family, size, OS, tenancy or AWS Region, and also applies to AWS Fargate and AWS Lambda usage. Savings Plans offer significant savings over On-Demand Instances, just like EC2 Reserved Instances, in exchange for a commitment to use a specific amount of compute power (measured in $/hour) for a one or three-year period. [You can sign up for Savings Plans for a one- or three-year term and easily manage your plans by taking advantage of recommendations, performance reporting and budget alerts in the AWS Cost Explorer](https://docs.aws.amazon.com/whitepapers/latest/cost-optimization-reservation-models/savings-plans.html)[2](https://docs.aws.amazon.com/whitepapers/latest/cost-optimization-reservation-models/savings-plans.html) + +**Instance Store** + +An instance store provides temporary block-level storage for your instance. This storage is located on disks that are physically attached to the host computer. This is a good option when you need storage with very low latency, but you don't need the data to persist when the instance terminates or you can take advantage of fault-tolerant architectures. For this use-case, the computation application itself has a fault tolerant architecture, so it can automatically handle any failures of Instance Store volumes. + +As the Instance Store volumes are included as part of the instance's usage cost, therefore this is the correct option. + +**AWS Shield** + +AWS Shield is a managed Distributed Denial of Service (DDoS) protection service that safeguards applications running on AWS. AWS Shield provides always-on detection and automatic inline mitigations that minimize application downtime and latency, so there is no need to engage AWS Support to benefit from DDoS protection. There are two tiers of AWS Shield - Standard and Advanced. + +All AWS customers benefit from the automatic protections of AWS Shield Standard, at no additional charge. AWS Shield Standard defends against most common, frequently occurring network and transport layer DDoS attacks that target your web site or applications. When you use AWS Shield Standard with Amazon CloudFront and Amazon Route 53, you receive comprehensive availability protection against all known infrastructure (Layer 3 and 4) attacks. 
+ +For higher levels of protection against attacks targeting your applications running on Amazon Elastic Compute Cloud (EC2), Elastic Load Balancing (ELB), Amazon CloudFront, AWS Global Accelerator and Amazon Route 53 resources, you can subscribe to AWS Shield Advanced. In addition to the network and transport layer protections that come with Standard, AWS Shield Advanced provides additional detection and mitigation against large and sophisticated DDoS attacks, near real-time visibility into attacks, and integration with AWS WAF, a web application firewall. + + +# Amazon S3 Storage Classes + +https://aws.amazon.com/s3/storage-classes/ + +![[pt1-q9-i1.jpg]] + +For archive data that does not require immediate access but needs the flexibility to retrieve large sets of data at no cost, such as backup or disaster recovery use cases, choose S3 Glacier Flexible Retrieval (formerly S3 Glacier), with retrieval in minutes or free bulk retrievals in 5—12 hours. To save even more on long-lived archive storage such as compliance archives and digital media preservation, choose S3 Glacier Deep Archive, the lowest cost storage in the cloud with data retrieval from 12—48 hours. + +*Cloudtrail logs are encryption enabled by default...* +By default, the log files delivered by CloudTrail to your S3 bucket are encrypted using server-side encryption with Amazon S3–managed encryption keys (SSE-S3). + + s3 encryption types https://www.encryptionconsulting.com/amazon-s3-simple-storage-service-encryption-at-a-glance/#:~:text=AWS%20S3%20%E2%80%93%20Client%20and%20Server%20Side%20Encryption,...%204%20Comparison%20of%20S3%20encryption%20options%3A%20 + +## Client-side Encryption for s3 +- Use a CMK (customer [master key](https://www.encryptionconsulting.com/education-center/master-key/)) stored in AWS KMS (Key Management Service) +- Use a Customer provided master key stored in the customer’s proprietary application + +## Server-side Encryption + + +1. **Use Amazon S3-managed keys (SSE-S3)**In this, the key material and the key will be provided by AWS itself to encrypt the objects in the S3 bucket. + +2. **Use CMK (Customer Master key) in AWS KMS (SSE-KMS)**In this, key material and the key will be generated in AWS KMS service to encrypt the objects in S3 bucket. + +3. **Use a Customer provided encryption key (SSE-C)**In this, the key will be provided by the customer and Amazon S3 manages the encryption and [decryption](https://learn.encryptionconsulting.com/what-is-decryption/) process while uploading/downloading the objects into the S3 bucket. + +You may see a question around this concept in the exam. Just remember that only S3 and DynamoDB support VPC Endpoint Gateway. All other services that support VPC Endpoints use a VPC Endpoint Interface. + + +**SQS** - Amazon Simple Queue Service (SQS) is a fully managed message queuing service that enables you to decouple and scale microservices, distributed systems, and serverless applications. Using SQS, you can send, store, and receive messages between software components at any volume, without losing messages or requiring other services to be available. + +**SNS** - Amazon Simple Notification Service (SNS) is a highly available, durable, secure, fully managed pub/sub messaging service that enables you to decouple microservices, distributed systems, and serverless applications. 
Using Amazon SNS topics, your publisher systems can fan-out messages to a large number of subscriber endpoints for parallel processing, including Amazon SQS queues, AWS Lambda functions, and HTTP/S webhooks. Additionally, SNS can be used to fan out notifications to end users using mobile push, SMS, and email. + + +**Business** - AWS recommends Business Support if you have production workloads on AWS and want 24x7 phone, email and chat access to technical support and architectural guidance in the context of your specific use-cases. You get full access to AWS Trusted Advisor Best Practice Checks. Also, you get access to Infrastructure Event Management for an additional fee. + +**Developer** - AWS recommends Developer Support if you are testing or doing early development on AWS and want the ability to get email-based technical support during business hours as well as general architectural guidance as you build and test. + +**Basic** - The basic plan only provides access to the following: + +Customer Service & Communities - 24x7 access to customer service, documentation, whitepapers, and support forums. AWS Trusted Advisor - Access to the 7 core Trusted Advisor checks and guidance to provision your resources following best practices to increase performance and improve security. AWS Health - Your Account Health Dashboard: A personalized view of the health of your AWS services, and alerts when your resources are impacted. + +**Enterprise** - AWS Enterprise Support provides customers with concierge-like service where the main focus is helping the customer achieve their outcomes and find success in the cloud. With Enterprise Support, you get 24x7 technical support from high-quality engineers, tools and technology to automatically manage the health of your environment, consultative architectural guidance delivered in the context of your applications and use-cases, and a designated Technical Account Manager (TAM) to coordinate access to proactive/preventative programs and AWS subject matter experts. Access to Infrastructure Event Management is included in the plan. + +https://aws.amazon.com/premiumsupport/plans/ + +**EFS** - Amazon EFS is a file storage service for use with Amazon EC2. Amazon EFS provides a file system interface, file system access semantics, and concurrently-accessible storage for up to thousands of Amazon EC2 instances. Amazon EFS uses the Network File System protocol. + +How EFS works: + + +![[Pasted image 20230424003955.png]] + + +**Amazon GuardDuty** - Amazon GuardDuty is a threat detection service that monitors malicious activity and unauthorized behavior to protect your AWS account. GuardDuty analyzes billions of events across your AWS accounts from AWS CloudTrail (AWS user and API activity in your accounts), Amazon VPC Flow Logs (network traffic data), and DNS Logs (name query patterns). This service is for AWS account level access, not for instance-level management like an EC2. GuardDuty cannot be used to check OS vulnerabilities. + + +**Amazon Macie** - Amazon Macie is a fully managed data security and data privacy service that uses machine learning and pattern matching to discover and protect your sensitive data in AWS. Macie helps identify and alert you to sensitive data, such as personally identifiable information (PII).
This service is for securing data and has nothing to do with an EC2 security assessment. Macie cannot be used to check OS vulnerabilities. + + +**AWS Direct Connect** + +AWS Direct Connect is a cloud service solution that makes it easy to establish a dedicated network connection from your premises to AWS. You can use AWS Direct Connect to establish a private virtual interface from your on-premise network directly to your Amazon VPC, providing you with a private, high bandwidth network connection between your network and your VPC. This connection is private and does not go over the public internet. It takes at least a month to establish this physical connection. + +![[Pasted image 20230424004931.png]] + +**Amazon VPC Endpoint** - A VPC endpoint enables you to privately connect your VPC to supported AWS services and VPC endpoint services powered by AWS PrivateLink without requiring an internet gateway, NAT device, VPN connection, or AWS Direct Connect connection. Instances in your VPC do not require public IP addresses to communicate with resources in the service. Traffic between your VPC and the other service does not leave the Amazon network. VPC Endpoint cannot be used to privately connect on-premises data center to AWS Cloud. + + +**Each AWS Region consists of a minimum of three Availability Zones** + +**Each Availability Zone (AZ) consists of one or more discrete data centers** + + + +Amazon EC2 Spot Instances let you take advantage of unused EC2 capacity in the AWS cloud. Spot Instances are available at up to a 90% discount compared to On-Demand prices. You can use Spot Instances for various stateless, fault-tolerant, or flexible applications such as big data, containerized workloads, CI/CD, web servers, high-performance computing (HPC), and other test & development workloads + + +AWS Web Application Firewall (WAF) offers protection from common web exploits at which layer? +**Layer 7** + +AWS WAF is a web application firewall that lets you monitor the HTTP and HTTPS requests that are forwarded to an Amazon API Gateway API, Amazon CloudFront or an Application Load Balancer. HTTP and HTTPS requests are part of the Application layer, which is layer 7. + +Incorrect options: + +**Layer 3** - Layer 3 is the Network layer and this layer decides which physical path data will take when it moves on the network. AWS Shield offers protection at this layer. WAF does not offer protection at this layer. + +**Layer 4** - Layer 4 is the Transport layer and this layer data transmission occurs using TCP or UDP protocols. AWS Shield offers protection at this layer. WAF does not offer protection at this layer. + +**Infrastructure as a Service (IaaS) one example is ec2** + +AWS Shield Advanced provides expanded DDoS attack protection for web applications running on which of the following resources? (Select two) + +**Amazon Route 53** + +**AWS Global Accelerator** + +AWS Shield Standard is activated for all AWS customers, by default. For higher levels of protection against attacks, you can subscribe to AWS Shield Advanced. With Shield Advanced, you also have exclusive access to advanced, real-time metrics and reports for extensive visibility into attacks on your AWS resources. With the assistance of the DRT (DDoS response team), AWS Shield Advanced includes intelligent DDoS attack detection and mitigation for not only for network layer (layer 3) and transport layer (layer 4) attacks but also for application layer (layer 7) attacks. 
+ +AWS Shield Advanced provides expanded DDoS attack protection for web applications running on the following resources: Amazon Elastic Compute Cloud, Elastic Load Balancing (ELB), Amazon CloudFront, Amazon Route 53, AWS Global Accelerator. AWS Global Accelerator is a service in which you create _accelerators_ to improve the performance of your applications for local and global users. Depending on the type of accelerator you choose, you can gain additional benefits: + +- With a standard accelerator, you can improve availability of your internet applications that are used by a global audience. With a standard accelerator, Global Accelerator directs traffic over the AWS global network to endpoints in the nearest Region to the client. + +- With a custom routing accelerator, you can map one or more users to a specific destination among many destinations. + + + **Compute Optimizer** - AWS Compute Optimizer recommends optimal AWS resources for your workloads to reduce costs and improve performance by using machine learning to analyze historical utilization metrics. Over-provisioning resources can lead to unnecessary infrastructure costs, and under-provisioning resources can lead to poor application performance. Compute Optimizer helps you choose optimal configurations for three types of AWS resources: Amazon EC2 instances, Amazon EBS volumes, and AWS Lambda functions, based on your utilization data. + +Compute Optimizer recommends up to 3 options from 140+ EC2 instance types, as well as a wide range of EBS volume and Lambda function configuration options, to right-size your workloads. Compute Optimizer also projects what the CPU utilization, memory utilization, and run time of your workload would have been on recommended AWS resource options. This helps you understand how your workload would have performed on the recommended options before implementing the recommendations. + +How Compute Optimizer works:![[Pasted image 20230424010029.jpg]] + +**AWS Budgets** - AWS Budgets allows you to set custom budgets to track your cost and usage from the simplest to the most complex use cases. With AWS Budgets, you can choose to be alerted by email or SNS notification when actual or forecasted cost and usage exceed your budget threshold, or when your actual RI and Savings Plans' utilization or coverage drops below your desired threshold. With AWS Budget Actions, you can also configure specific actions to respond to cost and usage status in your accounts, so that if your cost or usage exceeds or is forecasted to exceed your threshold, actions can be executed automatically or with your approval to reduce unintentional over-spending. + +**AWS Cost Explorer** - AWS Cost Explorer has an easy-to-use interface that lets you visualize, understand, and manage your AWS costs and usage over time. Cost Explorer Resource Rightsizing Recommendations and Compute Optimizer use the same recommendation engine. The Compute Optimizer recommendation engine delivers recommendations to help customers identify optimal EC2 instance types for their workloads. The Cost Explorer console and API surface a subset of these recommendations that may lead to cost savings, and augments them with customer-specific cost and savings information (e.g. billing information, available credits, RI, and Savings Plans) to help Cost Management owners quickly identify savings opportunities through infrastructure rightsizing. Compute Optimizer console and its API delivers all recommendations regardless of the cost implications. 
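As a loose sketch of how this cost and right-sizing data can be pulled programmatically (assuming boto3 with valid credentials, and that Cost Explorer and Compute Optimizer are enabled for the account; the dates below are placeholders):

```python
import boto3

# Monthly unblended cost, grouped by service, from Cost Explorer
ce = boto3.client("ce")
costs = ce.get_cost_and_usage(
    TimePeriod={"Start": "2023-04-01", "End": "2023-05-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)

# Right-sizing findings from Compute Optimizer
co = boto3.client("compute-optimizer")
recs = co.get_ec2_instance_recommendations()
for rec in recs["instanceRecommendations"]:
    print(rec["currentInstanceType"], rec["finding"])
```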
+ +# Route 53 Routing Policy +![[Pasted image 20230424010230.jpg]] + +# Supported aws reservation to optimize cost +**EC2 Instances** + +**DynamoDB** + +**RDS** + +The following AWS services support reservations to optimize costs: + +Amazon EC2 Reserved Instances: You can use Amazon EC2 Reserved Instances to reserve capacity and receive a discount on your instance usage compared to running On-Demand instances. + +Amazon DynamoDB Reserved Capacity: If you can predict your need for Amazon DynamoDB read-and-write throughput, Reserved Capacity offers significant savings over the normal price of DynamoDB provisioned throughput capacity. + +Amazon ElastiCache Reserved Nodes: Amazon ElastiCache Reserved Nodes give you the option to make a low, one-time payment for each cache node you want to reserve and, in turn, receive a significant discount on the hourly charge for that node. + +Amazon RDS RIs: Like Amazon EC2 RIs, Amazon RDS RIs can be purchased using No Upfront, Partial Upfront, or All Upfront terms. All Reserved Instance types are available for Aurora, MySQL, MariaDB, PostgreSQL, Oracle, and SQL Server database engines. + +Amazon Redshift Reserved Nodes: If you intend to keep an Amazon Redshift cluster running continuously for a prolonged period, you should consider purchasing reserved-node offerings. These offerings provide significant savings over on-demand pricing, but they require you to reserve compute nodes and commit to paying for those nodes for either a 1- or 3-year duration. + + +**EBS volume can be attached to a single instance in the same Availability Zone whereas EFS file system can be mounted on instances across multiple Availability Zones** + +Amazon Elastic File System (Amazon EFS) provides a simple, scalable, fully managed elastic NFS file system for use with AWS Cloud services and on-premises resources. It is built to scale on-demand to petabytes without disrupting applications, growing and shrinking automatically as you add and remove files, eliminating the need to provision and manage capacity to accommodate growth. + +The service is designed to be highly scalable, highly available, and highly durable. Amazon EFS file systems store data and metadata across multiple Availability Zones in an AWS Region. EFS file system can be mounted on instances across multiple Availability Zones. + +Amazon Elastic Block Store (EBS) is an easy to use, high-performance block storage service designed for use with Amazon Elastic Compute Cloud (EC2) for both throughput and transaction-intensive workloads at any scale. + +Designed for mission-critical systems, EBS volumes are replicated within an Availability Zone (AZ) and can easily scale to petabytes of data. You can attach an available EBS volume to one instance that is in the same Availability Zone as the volume. + +**AWS Artifact** + +AWS Artifact is your go-to, central resource for compliance-related information that matters to your organization. It provides on-demand access to AWS’ security and compliance reports and select online agreements. Reports available in AWS Artifact include our Service Organization Control (SOC) reports, Payment Card Industry (PCI) reports, and certifications from accreditation bodies across geographies and compliance verticals that validate the implementation and operating effectiveness of AWS security controls. Different types of agreements are available in AWS Artifact Agreements to address the needs of customers subject to specific regulations. 
For example, the Business Associate Addendum (BAA) is available for customers that need to comply with the Health Insurance Portability and Accountability Act (HIPAA). It is not a service, it's a no-cost, self-service portal for on-demand access to AWS’ compliance reports. + + +![[Screenshot from 2023-04-24 01-19-35.png]] + + +**Customer Managed CMK** + +A customer master key (CMK) is a logical representation of a master key. The CMK includes metadata, such as the key ID, creation date, description, and key state. The CMK also contains the key material used to encrypt and decrypt data. These are created and managed by the AWS customer. Access to these can be controlled using the AWS IAM service. + +Incorrect options: + +**Secrets Manager** - AWS Secrets Manager helps you protect secrets needed to access your applications, services, and IT resources. The service enables you to easily rotate, manage, and retrieve database credentials, API keys, and other secrets throughout their lifecycle. You cannot use Secrets Manager for creating and using your own keys for encryption on AWS services. + +**AWS Managed CMK** - AWS managed CMKs are CMKs in your account that are created, managed, and used on your behalf by an AWS service that is integrated with AWS KMS. + +**AWS Owned CMK** - AWS owned CMKs are a collection of CMKs that an AWS service owns and manages for use in multiple AWS accounts. AWS owned CMKs are not in your AWS account. You cannot view or manage these CMKs. + +Reference: + +[https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#master_keys](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#master_keys) + + +**Use Access Key ID and Secret Access Key to access AWS resources programmatically** + +Access keys are long-term credentials for an IAM user or the AWS account root user. You can use access keys to sign programmatic requests to the AWS CLI or AWS API (directly or using the AWS SDK). Access keys consist of two parts: an access key ID and a secret access key. As a user name and password, you must use both the access key ID and secret access key together to authenticate your requests. When you create an access key pair, save the access key ID and secret access key in a secure location. The secret access key is available only at the time you create it. If you lose your secret access key, you must delete the access key and create a new one. + +**Amazon DynamoDB with global tables** + +Amazon DynamoDB is a fully managed, serverless, key-value NoSQL database designed to run high-performance applications at any scale. DynamoDB offers built-in security, continuous backups, automated multi-region replication, in-memory caching, and data export tools. + +DynamoDB global tables replicate data automatically across your choice of AWS Regions and automatically scale capacity to accommodate your workloads. With global tables, your globally distributed applications can access data locally in the selected regions to get single-digit millisecond read and write performance. DynamoDB offers active-active cross-region support that is needed for the company. +![[Pasted image 20230424012706.jpg]] + +**AWS Pricing Calculator** + +AWS Pricing Calculator lets you explore AWS services and create an estimate for the cost of your use cases on AWS. You can model your solutions before building them, explore the price points and calculations behind your estimate, and find the available instance types and contract terms that meet your needs. 
This enables you to make informed decisions about using AWS. You can plan your AWS costs and usage or price out setting up a new set of instances and services. AWS Pricing Calculator can provide the estimate of the AWS service usage based on the list of AWS services.![[Pasted image 20230424012922.png]] + +![[Pasted image 20230424013004.jpg]] + +![[Screenshot from 2023-04-24 01-36-06.png]] + +The AWS Partner Network (APN) is the global partner program for technology and consulting businesses that leverage Amazon Web Services to build solutions and services for customers. + +APN Consulting Partners are professional services firms that help customers of all types and sizes design, architect, build, migrate, and manage their workloads and applications on AWS, accelerating their migration to AWS cloud. + +![[Pasted image 20230424013722.jpg]] + + +There are three fundamental drivers of cost with AWS: compute, storage, and outbound data transfer. In most cases, there is no charge for inbound data transfer or data transfer between other AWS services within the same region. Outbound data transfer is aggregated across services and then charged at the outbound data transfer rate. + +Per AWS pricing, data transfer between S3 and EC2 instances within the same region is not charged, so there would be no data transfer charge for moving 500 GB of data from an EC2 instance to an S3 bucket in the same region. + + +![[Pasted image 20230424013916.jpg]] +![[Pasted image 20230424013923.jpg]] + +![[Pasted image 20230424013934.jpg]] +NAT Gateway is managed by AWS but NAT Instance is managed by you. + +**Leverage AWS Professional Services to accelerate the infrastructure migration** + +The AWS Professional Services organization is a global team of experts that can help you realize your desired business outcomes when using the AWS Cloud. AWS Professional Services consultants can supplement your team with specialized skills and experience that can help you achieve quick results. Therefore, leveraging AWS Professional Services can accelerate the infrastructure migration for the startup. + +**Utilize AWS Partner Network (APN) to build a custom solution for this infrastructure migration** + +The AWS Partner Network (APN) is the global partner program for technology and consulting businesses that leverage Amazon Web Services to build solutions and services for customers. The startup can work with experts from APN to build a custom solution for this infrastructure migration. + +![[Pasted image 20230424144542.jpg]] + +# **Amazon Elastic Container Service - Fargate launch type** + +AWS Fargate is a serverless compute engine for containers. It works with both Amazon Elastic Container Service (ECS) and Amazon Elastic Kubernetes Service (EKS). Fargate makes it easy for you to focus on building your applications. Fargate removes the need to provision and manage servers, lets you specify and pay for resources per application, and improves security through application isolation by design. Fargate allocates the right amount of compute, eliminating the need to choose instances and scale cluster capacity. You only pay for the resources required to run your containers, so there is no over-provisioning and paying for additional servers. Fargate runs each task or pod in its kernel providing the tasks and pods their own isolated compute environment. 
This enables your application to have workload isolation and improved security by design.![[Pasted image 20230424144719.png]] + +![[Pasted image 20230424144908.jpg]]![[Pasted image 20230424144912.jpg]] + +**AWS Service Catalog** - AWS Service Catalog allows organizations to create and manage catalogs of IT services that are approved for use on AWS. These IT services can include everything from virtual machine images, servers, software, and databases to complete multi-tier application architectures. + +**AWS Partner Network** - Organizations can take help from the AWS Partner Network (APN) to identify the right AWS services to build solutions on AWS Cloud. APN is the global partner program for technology and consulting businesses that leverage Amazon Web Services to build solutions and services for customers. + + +**A VPC spans all of the Availability Zones in the Region whereas a subnet spans only one Availability Zone in the Region** Amazon Virtual Private Cloud (Amazon VPC) is a logically isolated section of the AWS Cloud where you can launch AWS resources in a virtual network that you define. You have complete control over your virtual networking environment, including the selection of your IP address range, creation of subnets, and configuration of route tables and network gateways. A VPC spans all of the Availability Zones in the Region. + +A subnet is a range of IP addresses within your VPC. A subnet spans only one Availability Zone in the Region. + + +![[Pasted image 20230424145138.jpg]] + + +**S3 is object based storage, EBS is block based storage and EFS is file based storage** +# AWS Trusted advisor +![[Pasted image 20230424145312.png]] + +https://aws.amazon.com/premiumsupport/technology/trusted-advisor/ + +[https://aws.amazon.com/premiumsupport/technology/trusted-advisor/](https://aws.amazon.com/premiumsupport/technology/trusted-advisor/)![[Pasted image 20230424145430.jpg]] + +[https://aws.amazon.com/rds/features/multi-az/](https://aws.amazon.com/rds/features/multi-az/) + +**AWS Step Function** - AWS Step Function lets you coordinate multiple AWS services into serverless workflows. You can design and run workflows that stitch together services such as AWS Lambda, AWS Glue and Amazon SageMaker.![[Pasted image 20230424145650.png]] +[https://aws.amazon.com/step-functions/](https://aws.amazon.com/step-functions/) +[https://aws.amazon.com/batch/](https://aws.amazon.com/batch/) + +Understand the difference between AWS Step Functions and AWS Batch. You may get questions to choose one over the other. AWS Batch runs batch computing workloads by provisioning the compute resources. AWS Step Function does not provision any resources. Step Function only orchestrates AWS services required for a given workflow. You cannot use Step Functions to plan, schedule and execute your batch computing workloads by provisioning underlying resources. + + [https://aws.amazon.com/lambda/](https://aws.amazon.com/lambda/) + + +**AWS Config** + +AWS Config is a service that enables you to assess, audit, and evaluate the configurations of your AWS resources. Config continuously monitors and records your AWS resource configurations and allows you to automate the evaluation of recorded configurations against desired configurations. Think resource-specific history, audit, and compliance; think Config. + +With AWS Config, you can do the following: 1. Evaluate your AWS resource configurations for desired settings. 2. 
Get a snapshot of the current configurations of the supported resources that are associated with your AWS account. 3. Retrieve configurations of one or more resources that exist in your account. 4. Retrieve historical configurations of one or more resources. 5. Receive a notification whenever a resource is created, modified, or deleted. 6.View relationships between resources. For example, you might want to find all resources that use a particular security group. + +```question + +A photo sharing web application wants to store thumbnails of user-uploaded images on Amazon S3. The thumbnails are rarely used but need to be immediately accessible from the web application. The thumbnails can be regenerated easily if they are lost. Which is the most cost-effective way to store these thumbnails on S3? +``` + +**Use S3 One-Zone Infrequent Access (One-Zone IA) to store the thumbnails** + +S3 One Zone-IA is for data that is accessed less frequently but requires rapid access when needed. Unlike other S3 Storage Classes which store data in a minimum of three Availability Zones (AZs), S3 One Zone-IA stores data in a single AZ and costs 20% less than S3 Standard-IA. S3 One Zone-IA offers the same high durability, high throughput, and low latency of S3 Standard, with a low per GB storage price and per GB retrieval fee. Although S3 One Zone-IA offers less availability than S3 Standard but that's not an issue for the given use-case since the thumbnails can be regenerated easily. + +As the thumbnails are rarely used but need to be rapidly accessed when required, so S3 One Zone-IA is the best choice for this use-case. + +[https://docs.aws.amazon.com/vpc/latest/userguide/VPC_SecurityGroups.html](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_SecurityGroups.html) + [https://docs.aws.amazon.com/vpc/latest/userguide/vpc-network-acls.html](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-network-acls.html) + + +**AWS Local Zones** + +AWS Local Zones allow you to use select AWS services, like compute and storage services, closer to more end-users, providing them very low latency access to the applications running locally. AWS Local Zones are also connected to the parent region via Amazon’s redundant and very high bandwidth private network, giving applications running in AWS Local Zones fast, secure, and seamless access to the rest of AWS services. + +You should use AWS Local Zones to deploy workloads closer to your end-users for low-latency requirements. AWS Local Zones have their connection to the internet and support AWS Direct Connect, so resources created in the Local Zone can serve local end-users with very low-latency communications. + +Various AWS services such as Amazon Elastic Compute Cloud (EC2), Amazon Virtual Private Cloud (VPC), Amazon Elastic Block Store (EBS), Amazon FSx, Amazon Elastic Load Balancing, Amazon EMR, Amazon ElastiCache, and Amazon Relational Database Service (RDS) are available locally in the AWS Local Zones. You can also use services that orchestrate or work with local services such as Amazon EC2 Auto Scaling, Amazon EKS clusters, Amazon ECS clusters, Amazon EC2 Systems Manager, Amazon CloudWatch, AWS CloudTrail, and AWS CloudFormation. 
AWS Local Zones also provide a high-bandwidth, secure connection to the AWS Region, allowing you to seamlessly connect to the full range of services in the AWS Region through the same APIs and toolsets.

**AWS Edge Locations** - An AWS Edge location is a site that CloudFront uses to cache copies of the content for faster delivery to users at any location.

[https://aws.amazon.com/rekognition/](https://aws.amazon.com/rekognition/)

[https://aws.amazon.com/sns/](https://aws.amazon.com/sns/)

Note: Auto Scaling groups (ASG) and IAM are always free to use; you pay only for the underlying AWS resources.

![[Screenshot from 2023-04-24 15-21-36.png]]

The AWS Well-Architected pillars need to be recapped in detail.


**AWS Systems Manager Session Manager**

AWS SSM Session Manager is a fully-managed service that provides you with an interactive browser-based shell and CLI experience. It helps provide secure and auditable instance management without the need to open inbound ports, maintain bastion hosts, and manage SSH keys. Session Manager helps to enable compliance with corporate policies that require controlled access to instances, increase security and auditability of access to the instances while providing simplicity and cross-platform instance access to end-users.

![[Pasted image 20230424155853.jpg]]
https://aws.amazon.com/storagegateway/features/


The AWS Health - Service Health Dashboard is the single place to learn about the availability and operations of AWS services. You can view the overall status of AWS services, and you can sign in to view personalized communications about your particular AWS account or organization.

You can check on this page [https://health.aws.amazon.com/health/status](https://health.aws.amazon.com/health/status) to get current status information.


# AWS Compute Optimizer

AWS Compute Optimizer helps you identify the optimal AWS resource configurations, such as Amazon EC2 instance types, Amazon EBS volume configurations, and AWS Lambda function memory sizes, using machine learning to analyze historical utilization metrics. AWS Compute Optimizer delivers recommendations for selected types of EC2 instances, EC2 Auto Scaling groups, EBS volumes, and Lambda functions.

Compute Optimizer calculates an individual performance risk score for each resource dimension of the recommended instance, including CPU, memory, EBS throughput, EBS IOPS, disk throughput, disk IOPS, network throughput, and network packets per second (PPS).

AWS Compute Optimizer provides EC2 instance type and size recommendations for EC2 Auto Scaling groups with a fixed group size, meaning desired, minimum, and maximum are all set to the same value and have no scaling policy attached.

AWS Compute Optimizer supports IOPS and throughput recommendations for General Purpose (SSD) (gp3) volumes and IOPS recommendations for Provisioned IOPS (io1 and io2) volumes.

Compute Optimizer helps you optimize two categories of Lambda functions. The first category includes Lambda functions that may be over-provisioned in memory sizes. The second category includes compute-intensive Lambda functions that may benefit from additional CPU power.


# **Amazon MQ**
Amazon MQ is a managed message broker service for Apache ActiveMQ and RabbitMQ that makes it easy to set up and operate message brokers on AWS. Amazon MQ reduces your operational responsibilities by managing the provisioning, setup, and maintenance of message brokers for you.
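As a rough sketch of what provisioning a managed broker looks like, the snippet below uses boto3 (the AWS SDK for Python) to request a small, single-instance ActiveMQ broker. The broker name, engine version, instance type, and credentials are placeholder assumptions, not values taken from the exam material.

```python
import boto3

# Hypothetical example: request a small, single-instance ActiveMQ broker.
# Every name, version, and credential below is a placeholder assumption.
mq = boto3.client("mq", region_name="us-east-1")

response = mq.create_broker(
    BrokerName="demo-broker",
    EngineType="ACTIVEMQ",                 # Amazon MQ also supports RABBITMQ
    EngineVersion="5.17.6",                # assumed to be an available engine version
    HostInstanceType="mq.t3.micro",
    DeploymentMode="SINGLE_INSTANCE",      # no standby broker; fine for a demo
    PubliclyAccessible=False,
    AutoMinorVersionUpgrade=True,
    Users=[{"Username": "admin", "Password": "change-me-1234567"}],
)

print("Broker ARN:", response["BrokerArn"])
```

Amazon MQ then handles the patching and maintenance of the broker, while applications keep talking standard protocols such as AMQP, MQTT, OpenWire, or STOMP to it.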
Because Amazon MQ connects to your current applications with industry-standard APIs and protocols, you can easily migrate to AWS without having to rewrite code. + +If you're using messaging with existing applications, and want to move the messaging functionality to the cloud quickly and easily, AWS recommends you consider Amazon MQ. It supports industry-standard APIs and protocols so you can switch from any standards-based message broker to Amazon MQ without rewriting the messaging code in your applications. If you are building brand new applications in the cloud, AWS recommends you consider Amazon SQS and Amazon SNS. + +How MQ works: + +![[Pasted image 20230424161932.jpg]] + +# AWS Budget and Cloud watch +via - [https://aws.amazon.com/aws-cost-management/aws-budgets/](https://aws.amazon.com/aws-cost-management/aws-budgets/) + +Exam Alert: + +It is useful to note the difference between CloudWatch Billing vs Budgets: + +CloudWatch Billing Alarms: Sends an alarm when the actual cost exceeds a certain threshold. + +Budgets: Sends an alarm when the actual cost exceeds the budgeted amount or even when the cost forecast exceeds the budgeted amount. + +![[Pasted image 20230424162035.jpg]] +**AWS Budgets** + +AWS Budgets gives the ability to set custom budgets that alert you when your costs or usage exceed (or are forecasted to exceed) your budgeted amount. You can also use AWS Budgets to set reservation utilization or coverage targets and receive alerts when your utilization drops below the threshold you define. Budgets can be created at the monthly, quarterly, or yearly level, and you can customize the start and end dates. You can further refine your budget to track costs associated with multiple dimensions, such as AWS service, linked account, tag, +and others. Budget alerts can be sent via email and/or Amazon Simple Notification Service (SNS) topic. + + +**AWS Cost Explorer** - AWS Cost Explorer has an easy-to-use interface that lets you visualize, understand, and manage your AWS costs and usage over time. AWS Cost Explorer includes a default report that helps you visualize the costs and usage associated with your top five cost-accruing AWS services, and gives you a detailed breakdown on all services in the table view. The reports let you adjust the time range to view historical data going back up to twelve months to gain an understanding of your cost trends. + +AWS Cost Explorer Reports: + +![[Pasted image 20230424170820.jpg]] + + +**IAM Role** - An IAM role is similar to an IAM user, in that it is an AWS identity with permission policies that determine what the identity can and cannot do in AWS. However, instead of being uniquely associated with one person, a role is intended to be assumable by anyone who needs it. + +**IAM Group** - An IAM group is a collection of IAM users. Groups let you specify permissions for multiple users, which can make it easier to manage the permissions for those users. + +**AWS Policy** - You manage access in AWS by creating policies and attaching them to IAM identities (users, groups of users, or roles) or AWS resources. A policy is an object in AWS that, when associated with an identity or resource, defines their permissions. + +[https://d1.awsstatic.com/whitepapers/architecture/AWS_Well-Architected_Framework.pdf](https://d1.awsstatic.com/whitepapers/architecture/AWS_Well-Architected_Framework.pdf) + + + + Amazon S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket. 
Transfer Acceleration takes advantage of Amazon CloudFront’s globally distributed edge locations. As the data arrives at an edge location, data is routed to Amazon S3 over an optimized network path. Transfer Acceleration cannot be used to improve the performance of a static website. + +https://docs.aws.amazon.com/AmazonS3/latest/dev/transfer-acceleration.html + +# Autoscaling group + +[https://docs.aws.amazon.com/autoscaling/ec2/userguide/AutoScalingGroup.html](https://docs.aws.amazon.com/autoscaling/ec2/userguide/AutoScalingGroup.html) + +![[Screenshot from 2023-04-24 23-56-06.png]] + +**AWS Trusted Advisor** + +AWS Trusted Advisor is an online tool that provides real-time guidance to help provision your resources following AWS best practices. Whether establishing new workflows, developing applications, or as part of ongoing improvement, recommendations provided by Trusted Advisor regularly help keep your solutions provisioned optimally. AWS Trusted Advisor analyzes your AWS environment and provides best practice recommendations in five categories: Cost Optimization, Performance, Security, Fault Tolerance, Service Limits. + +AWS Trusted Advisor checks the Amazon Elastic Compute Cloud (Amazon EC2) instances that were running at any time during the last 14 days and alerts you if the daily CPU utilization was 10% or less and network I/O was 5 MB or less on 4 or more days. +via - [https://aws.amazon.com/premiumsupport/technology/trusted-advisor/best-practice-checklist/#Cost_Optimization](https://aws.amazon.com/premiumsupport/technology/trusted-advisor/best-practice-checklist/#Cost_Optimization) + +**AWS Cost Explorer** + +AWS Cost Explorer has an easy-to-use interface that lets you visualize, understand, and manage your AWS costs and usage over time. AWS Cost Explorer includes a default report that helps you visualize the costs and usage associated with your top five cost-accruing AWS services, and gives you a detailed breakdown of all services in the table view. The reports let you adjust the time range to view historical data going back up to twelve months to gain an understanding of your cost trends. + +The rightsizing recommendations feature in Cost Explorer helps you identify cost-saving opportunities by downsizing or terminating EC2 instances. You can see all of your underutilized EC2 instances across member accounts in a single view to immediately identify how much you can save. + +via - [https://aws.amazon.com/aws-cost-management/aws-cost-explorer/](https://aws.amazon.com/aws-cost-management/aws-cost-explorer/) +"AWS Cost Explorer" vs "AWS Cost and Usage Reports": ![](https://assets-pt.media.datacumulus.com/aws-clf-pt/assets/pt3-q59-i2.png) +![](https://assets-pt.media.datacumulus.com/aws-clf-pt/assets/pt3-q59-i3.png) via - [https://aws.amazon.com/aws-cost-management/aws-cost-and-usage-reporting/](https://aws.amazon.com/aws-cost-management/aws-cost-and-usage-reporting/) + +References: + +[https://docs.aws.amazon.com/cur/latest/userguide/what-is-cur.html](https://docs.aws.amazon.com/cur/latest/userguide/what-is-cur.html) + +[https://aws.amazon.com/aws-cost-management/aws-cost-explorer/](https://aws.amazon.com/aws-cost-management/aws-cost-explorer/) + +[https://aws.amazon.com/aws-cost-management/aws-cost-and-usage-reporting/](https://aws.amazon.com/aws-cost-management/aws-cost-and-usage-reporting/) + +NEED TO WORK COST RELATED AWS PRODUCTS. + +** **Effect, Action** - Most policies are stored in AWS as JSON documents. 
Identity-based policies and policies used to set permissions boundaries are JSON policy documents that you attach to a user or role. Resource-based policies are JSON policy documents that you attach to a resource. + +A JSON policy document includes these elements: + +1. Optional policy-wide information at the top of the document +2. One or more individual statements + +Each statement includes information about a single permission. The information in a statement is contained within a series of elements. + +1. Version – Specify the version of the policy language that you want to use. As a best practice, use the latest 2012-10-17 version. + +2. Statement – Use this main policy element as a container for the following elements. You can include more than one statement in a policy. + + 1. Sid (Optional) – Include an optional statement ID to differentiate between your statements. + + 2. Effect – Use Allow or Deny to indicate whether the policy allows or denies access. + + 3. Principal (Required in only some circumstances) – If you create a resource-based policy, you must indicate the account, user, role, or federated user to which you would like to allow or deny access. If you are creating an IAM permissions policy to attach to a user or role, you cannot include this element. The principal is implied as that user or role. + + 4. Action – Include a list of actions that the policy allows or denies. + + 5. Resource (Required in only some circumstances) – If you create an IAM permissions policy, you must specify a list of resources to which the actions apply. If you create a resource-based policy, this element is optional. If you do not include this element, then the resource to which the action applies is the resource to which the policy is attached. + + 6. Condition (Optional) – Specify the circumstances under which the policy grants permission. + + +Amazon EFS Overview: ![](https://assets-pt.media.datacumulus.com/aws-clf-pt/assets/pt3-q23-i1.jpg) via - [https://aws.amazon.com/efs/](https://aws.amazon.com/efs/) + +**DynamoDB** - Amazon DynamoDB is a key-value and document database that delivers single-digit millisecond performance at any scale. It's a fully managed, multi-Region, multi-master, durable database with built-in security, backup and restore, and in-memory caching for internet-scale applications. All of your data is stored on solid-state disks (SSDs) and is automatically replicated across multiple Availability Zones in an AWS Region, providing built-in high availability and data durability. + +DynamoDB High Availability: ![](https://assets-pt.media.datacumulus.com/aws-clf-pt/assets/pt3-q12-i1.jpg) via - [https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Introduction.html](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Introduction.html) + +**EFS** - Amazon Elastic File System (Amazon EFS) provides a simple, scalable, fully managed elastic NFS file system for use with AWS Cloud services and on-premises resources. It is built to scale on-demand to petabytes without disrupting applications, growing and shrinking automatically as you add and remove files, eliminating the need to provision and manage capacity to accommodate growth. Amazon EFS is a regional service storing data within and across multiple Availability Zones (AZs) for high availability and durability. + +EFS High Availability: ![](https://assets-pt.media.datacumulus.com/aws-clf-pt/assets/pt3-q12-i2.jpg) via - [https://aws.amazon.com/efs/faq/](https://aws.amazon.com/efs/faq/) + +**US East (N. 
Virginia) - us-east-1**

You can monitor your estimated AWS charges by using Amazon CloudWatch. Billing metric data is stored in the US East (N. Virginia) Region and represents worldwide charges. This data includes the estimated charges for every service in AWS that you use, in addition to the estimated overall total of your AWS charges.

You may see a question around this concept in the exam. Just remember that only **S3 and DynamoDB** support **VPC Endpoint Gateway**. **All other services that support VPC Endpoints use a VPC Endpoint Interface.**

**Read Replica improves database scalability**


**Virtual Private Gateway**

**Customer Gateway**

AWS Site-to-Site VPN enables you to securely connect your on-premises network or branch office site to your Amazon Virtual Private Cloud (Amazon VPC). VPN Connections are a good solution if you have an immediate need, and have low to modest bandwidth requirements. This connection goes over the public internet. Virtual Private Gateway (or a Transit Gateway) and Customer Gateway are the components of an AWS Site-to-Site VPN connection.

A virtual private gateway is the VPN concentrator on the Amazon side of the Site-to-Site VPN connection. A customer gateway is a resource in AWS that provides information to AWS about your customer gateway device.

Components of an AWS Site-to-Site VPN: ![](https://assets-pt.media.datacumulus.com/aws-clf-pt/assets/pt3-q11-i1.jpg) via - [https://docs.aws.amazon.com/vpn/latest/s2svpn/how_it_works.html](https://docs.aws.amazon.com/vpn/latest/s2svpn/how_it_works.html)

**You will pay a fee each time you read from or write to data stored on the EFS - Infrequent Access storage class** - The Infrequent Access storage class is cost-optimized for files accessed less frequently. Data stored on the Infrequent Access storage class costs less than Standard and you will pay a fee each time you read from or write to a file.

**Amazon EBS Snapshots are stored incrementally, which means you are billed only for the changed blocks stored** - Amazon EBS Snapshots are a point-in-time copy of your block data. For the first snapshot of a volume, Amazon EBS saves a full copy of your data to Amazon S3. EBS Snapshots are stored incrementally, which means you are billed only for the changed blocks stored.

**You must use an AMI from the same region as that of the EC2 instance. The region of the AMI has no bearing on the performance of the EC2 instance**

An Amazon Machine Image (AMI) provides the information required to launch an instance. You must specify an AMI when you launch an instance. You can launch multiple instances from a single AMI when you need multiple instances with the same configuration.

The AMI must be in the same region as that of the EC2 instance to be launched. If the AMI exists in a different region, you can copy that AMI to the region where you want to launch the EC2 instance. The region of the AMI has no bearing on the performance of the EC2 instance.

Amazon Machine Images (AMI) Overview: ![](https://assets-pt.media.datacumulus.com/aws-clf-pt/assets/pt3-q34-i1.jpg) via - [https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html)

**For each resource, each tag key must be unique, and each tag key can have only one value**

**You must activate both AWS generated tags and user-defined tags separately before they can appear in Cost Explorer or on a cost allocation report**

A Cost Allocation Tag is a label that you or AWS assigns to an AWS resource.
Each tag consists of a key and a value. For each resource, each tag key must be unique, and each tag key can have only one value. You can use tags to organize your resources, and cost allocation tags to track your AWS costs on a detailed level. + +AWS provides two types of cost allocation tags, an AWS generated tags and user-defined tags. AWS defines, creates, and applies the AWS generated tags for you, and you define, create, and apply user-defined tags. You must activate both types of tags separately before they can appear in Cost Explorer or on a cost allocation report. + +AWS Cost Allocation Tags Overview: ![](https://assets-pt.media.datacumulus.com/aws-clf-pt/assets/pt3-q64-i1.jpg) via - [https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/cost-alloc-tags.html](https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/cost-alloc-tags.html) + +**Amazon S3 Glacier** - Amazon S3 Glacier (S3 Glacier), is a storage service optimized for infrequently used data, or "cold data. Data at rest stored in S3 Glacier is automatically server-side encrypted using 256-bit Advanced Encryption Standard (AES-256) with keys maintained by AWS. + +**AWS Storage Gateway** - AWS Storage Gateway is a hybrid cloud storage service that gives you on-premises access to virtually unlimited cloud storage. All data transferred between the gateway and AWS storage is encrypted using SSL (for all three types of gateways - File, Volume and Tape Gateways). + +**AWS Migration Evaluator** + +Migration Evaluator (Formerly TSO Logic) is a complimentary service to create data-driven business cases for AWS Cloud planning and migration. + +Migration Evaluator quickly provides a business case to make sound AWS planning and migration decisions. With Migration Evaluator, your organization can build a data-driven business case for AWS, gets access to AWS expertise, visibility into the costs associated with multiple migration strategies, and insights on how reusing existing software licensing reduces costs further. + +**AWS Shield Advanced offers protection against higher fees that could result from a DDoS attack** + +AWS Shield Advanced offers some cost protection against spikes in your AWS bill that could result from a DDoS attack. This cost protection is provided for your Elastic Load Balancing load balancers, Amazon CloudFront distributions, Amazon Route 53 hosted zones, Amazon Elastic Compute Cloud instances, and your AWS Global Accelerator accelerators. + +AWS Shield Advanced is a paid service for all customers, irrespective of the Support plan. + +**Amazon Kendra** - Amazon Kendra is an intelligent search service powered by machine learning. Kendra reimagines enterprise search for your websites and applications so your employees and customers can easily find the content they are looking for, even when it’s scattered across multiple locations and content repositories within your organization. + +Using Amazon Kendra, you can stop searching through troves of unstructured data and discover the right answers to your questions, when you need them. Amazon Kendra is a fully managed service, so there are no servers to provision, and no machine learning models to build, train, or deploy. Kendra supports unstructured and semi-structured data in .html, MS Office (.doc, .ppt), PDF, and text formats. + +Unlike conventional search technology, natural language search capabilities return the answers you’re looking for quickly and accurately, no matter where the information lives within your organization. 
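As a quick aside before continuing, the natural language search described above comes down to a single query call against an index. Below is a minimal boto3 sketch, assuming an index already exists and has been populated; the index ID and question are placeholders.

```python
import boto3

# Hypothetical: query an existing Kendra index. The index ID is a placeholder.
kendra = boto3.client("kendra", region_name="us-east-1")

result = kendra.query(
    IndexId="11111111-2222-3333-4444-555555555555",   # placeholder index ID
    QueryText="What is our parental leave policy?",
)

for item in result["ResultItems"]:
    # Each result carries a type (e.g. ANSWER or DOCUMENT) and a text excerpt.
    print(item["Type"], "-", item["DocumentExcerpt"]["Text"][:120])
```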
Kendra’s deep learning models come pre-trained across 14 industry domains, allowing it to extract more accurate answers across a wide range of business use cases from the get-go. You can also fine-tune search results by manually adjusting the importance of data sources, authors, freshness, or using custom tags.

Incorrect options:

**Amazon Personalize** - Amazon Personalize enables developers to build applications with the same machine learning (ML) technology used by Amazon.com for real-time personalized recommendations. Amazon Personalize makes it easy for developers to build applications capable of delivering a wide array of personalization experiences, including specific product recommendations, personalized product re-ranking, and customized direct marketing.

**Amazon Comprehend** - Amazon Comprehend is a natural-language processing (NLP) service that uses machine learning to uncover information in unstructured data. Instead of combing through documents, the process is simplified and unseen information is easier to understand.

Amazon Kendra provides ML-powered search capabilities for all unstructured data customers store in AWS. Kendra offers easy-to-use native connectors to popular AWS repository types such as S3 and RDS databases. Other AI services such as Amazon Comprehend, Amazon Transcribe, and Amazon Comprehend Medical can be used to pre-process documents, generate searchable text, extract entities, and enrich their metadata for more specialized search experiences.

**Amazon Lex** - Amazon Lex is a service for building conversational interfaces into any application using voice and text. Amazon Lex provides the advanced deep learning functionalities of automatic speech recognition (ASR) for converting speech to text, and natural language understanding (NLU) to recognize the intent of the text, to enable you to build applications with highly engaging user experiences and lifelike conversational interactions.

Reference:

[https://aws.amazon.com/kendra/](https://aws.amazon.com/kendra/)

**The company should just start creating new resources in the destination AWS Region and then migrate the relevant data and applications into this new AWS Region** - The company needs to create resources in the new AWS Region and then move the relevant data and applications into the new AWS Region. There is no off-the-shelf solution or service that the company can use to facilitate this transition.

**AWS Transit Gateway**

AWS Transit Gateway connects VPCs and on-premises networks through a central hub. This simplifies your network and puts an end to complex peering relationships. It acts as a cloud router – each new connection is only made once. As you expand globally, inter-Region peering connects AWS Transit Gateways using the AWS global network. Your data is automatically encrypted and never travels over the public internet.
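A minimal sketch of this hub-and-spoke setup with boto3: create the transit gateway once, then attach each VPC to it instead of peering every VPC with every other VPC. The VPC and subnet IDs below are placeholders.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create the central hub (a minimal sketch; the IDs below are placeholders).
tgw = ec2.create_transit_gateway(Description="demo hub")
tgw_id = tgw["TransitGateway"]["TransitGatewayId"]

# Attach a VPC to the hub instead of peering it with every other VPC.
ec2.create_transit_gateway_vpc_attachment(
    TransitGatewayId=tgw_id,
    VpcId="vpc-0123456789abcdef0",           # placeholder VPC ID
    SubnetIds=["subnet-0123456789abcdef0"],  # one subnet per AZ you want reachable
)
```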
+ +How Transit Gateway can simplify your network: ![](https://assets-pt.media.datacumulus.com/aws-clf-pt/assets/pt3-q5-i1.jpg) via - [https://aws.amazon.com/transit-gateway/](https://aws.amazon.com/transit-gateway/) +**VPC Peering** - A VPC peering connection is a networking connection between two VPCs that enables you to route traffic between them privately. VPC peering is not transitive, a separate VPC peering connection has to be made between two VPCs that need to talk to each other. With growing VPCs, this gets difficult to manage. + +Transitive VPC Peering is not allowed: ![](https://docs.aws.amazon.com/vpc/latest/peering/images/transitive-peering-diagram.png) via - [https://docs.aws.amazon.com/vpc/latest/peering/invalid-peering-configurations.html](https://docs.aws.amazon.com/vpc/latest/peering/invalid-peering-configurations.html) + + +**Use Cross-Region replication (CRR) to replicate data between distant AWS Regions** + +Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. + +Replication enables automatic, asynchronous copying of objects across Amazon S3 buckets. Buckets that are configured for object replication can be owned by the same AWS account or by different accounts. You can copy objects between different AWS Regions or within the same Region. + +Although Amazon S3 stores your data across multiple geographically distant Availability Zones by default, compliance requirements might dictate that you store data at even greater distances. Cross-Region Replication (CRR) allows you to replicate data between distant AWS Regions to satisfy these requirements. + +Incorrect options: + +**Use Same-Region replication (SRR) to replicate data between distant AWS Regions** - SRR is used to copy objects across Amazon S3 buckets in the same AWS Region, so this option is incorrect. + +Exam Alert: + +Please review the differences between SRR and CRR: ![](https://assets-pt.media.datacumulus.com/aws-clf-pt/assets/pt3-q8-i1.jpg) via - [https://docs.aws.amazon.com/AmazonS3/latest/dev/replication.html](https://docs.aws.amazon.com/AmazonS3/latest/dev/replication.html) + +Amazon Virtual Private Cloud (Amazon VPC) enables you to launch AWS resources into a virtual network that you've defined. + +The following are the key concepts for VPCs: + +Virtual private cloud (VPC) — A virtual network dedicated to your AWS account. + +Subnet — A range of IP addresses in your VPC. + +Route table — A set of rules, called routes, that are used to determine where network traffic is directed. + +Internet Gateway — A gateway that you attach to your VPC to enable communication between resources in your VPC and the internet. + +VPC endpoint — Enables you to privately connect your VPC to supported AWS services and VPC endpoint services powered by PrivateLink without requiring an internet gateway, NAT device, VPN connection, or AWS Direct Connect connection. + +[https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/consolidatedbilling-other.html](https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/consolidatedbilling-other.html) + +AZ is One or more data centers in the same location + +![[Pasted image 20230425100803.jpg]] + +There are four cost components to consider for S3 pricing – storage pricing; request and data retrieval pricing; data transfer and transfer acceleration pricing; and data management features pricing. 
Under "Data Transfer", you pay for all bandwidth into and out of Amazon S3, except for the following: (1) Data transferred in from the internet, (2) Data transferred out to an Amazon Elastic Compute Cloud (Amazon EC2) instance, when the instance is in the same AWS Region as the S3 bucket, (3) Data transferred out to Amazon CloudFront (CloudFront).


# **AWS Device Farm**
AWS Device Farm is an application testing service that lets you improve the quality of your web and mobile apps by testing them across an extensive range of desktop browsers and real mobile devices, without having to provision and manage any testing infrastructure. The service enables you to run your tests concurrently on multiple desktop browsers or real devices to speed up the execution of your test suite, and generates videos and logs to help you quickly identify issues with your app.

AWS Device Farm is designed for developers, QA teams, and customer support representatives who are building, testing, and supporting mobile apps to increase the quality of their apps. Application quality is increasingly important, and also getting complex due to the number of device models, variations in firmware and OS versions, carrier and manufacturer customizations, and dependencies on remote services and other apps. AWS Device Farm accelerates the development process by executing tests on multiple devices, giving developers, QA and support professionals the ability to perform automated tests and manual tasks like reproducing customer issues, exploratory testing of new functionality, and executing manual test plans. AWS Device Farm also offers significant savings by eliminating the need for internal device labs, lab managers, and automation infrastructure development.

How it works:

![[Pasted image 20230425101319.jpg]]


# AWS Shield Advanced
AWS Shield Advanced provides expanded DDoS attack protection for web applications running on the following resources: Amazon Elastic Compute Cloud, Elastic Load Balancing (ELB), Amazon CloudFront, Amazon Route 53, AWS Global Accelerator.

AWS Lambda and AWS Fargate are serverless.

**S3 Transfer Acceleration**

Amazon S3 Transfer Acceleration (S3TA) enables fast, easy, and secure transfers of files over long distances between your client and your Amazon S3 bucket. S3 Transfer Acceleration leverages Amazon CloudFront’s globally distributed AWS Edge Locations. As data arrives at an AWS Edge Location, data is routed to your Amazon S3 bucket over an optimized network path. S3 Transfer Acceleration is designed to optimize transfer speeds from across the world into S3 buckets. If you are uploading to a centralized bucket from geographically dispersed locations, or if you regularly transfer GBs or TBs of data across continents, you may save hours or days of data transfer time with S3 Transfer Acceleration.

Benefits of S3 Transfer Acceleration (S3TA): ![](https://assets-pt.media.datacumulus.com/aws-clf-pt/assets/pt4-q33-i1.jpg)


via - [https://aws.amazon.com/s3/transfer-acceleration/](https://aws.amazon.com/s3/transfer-acceleration/)

![[Pasted image 20230425102110.jpg]]![[Pasted image 20230425102238.jpg]]

AWS Systems Manager gives you visibility and control of your infrastructure on AWS. Systems Manager provides a unified user interface so you can view operational data from multiple AWS services and allows you to automate operational tasks such as collecting software inventory, running commands, managing patches, and configuring servers across AWS Cloud as well as on-premises infrastructure.
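To make the "running commands" part concrete, here is a minimal boto3 sketch that runs a shell command on a managed instance through Run Command and then reads the output. The instance ID is a placeholder, and the instance is assumed to be registered with Systems Manager (SSM agent plus an instance role).

```python
import boto3

ssm = boto3.client("ssm", region_name="us-east-1")

# Run a shell command on a managed instance (placeholder instance ID).
# Assumes the instance is registered with Systems Manager (SSM agent + IAM role).
cmd = ssm.send_command(
    InstanceIds=["i-0123456789abcdef0"],
    DocumentName="AWS-RunShellScript",
    Parameters={"commands": ["uptime", "df -h"]},
)

command_id = cmd["Command"]["CommandId"]

# Once the command has finished, fetch the output of the invocation.
output = ssm.get_command_invocation(
    CommandId=command_id,
    InstanceId="i-0123456789abcdef0",
)
print(output["Status"], output["StandardOutputContent"])
```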
+ +AWS Systems Manager offers utilities for running commands, patch-management and configuration compliance: ![](https://assets-pt.media.datacumulus.com/aws-clf-pt/assets/pt4-q11-i1.png) via - [https://aws.amazon.com/systems-manager/faq/](https://aws.amazon.com/systems-manager/faq/) + +![](https://d1.awsstatic.com/AWS%20Systems%20Manager/product-page-diagram-AWS-Systems-Manager_how-it-works.2e7c5d550e833eed0f49fb8dc1872de23b09d183.png) via - [https://aws.amazon.com/systems-manager/](https://aws.amazon.com/systems-manager/) + + +**Warm Standby strategy** + +When selecting your DR strategy, you must weigh the benefits of lower RTO (recovery time objective) and RPO (recovery point objective) vs the costs of implementing and operating a strategy. The pilot light and warm standby strategies both offer a good balance of benefits and cost. + +This strategy replicates data from the primary Region to data resources in the recovery Region, such as Amazon Relational Database Service (Amazon RDS) DB instances or Amazon DynamoDB tables. These data resources are ready to serve requests. In addition to replication, this strategy requires you to create a continuous backup in the recovery Region. This is because when "human action" type disasters occur, data can be deleted or corrupted, and replication will replicate the bad data. Backups are necessary to enable you to get back to the last known good state. + +The warm standby strategy deploys a functional stack, but at reduced capacity. The DR endpoint can handle requests, but cannot handle production levels of traffic. It may be more, but is always less than the full production deployment for cost savings. If the passive stack is deployed to the recovery Region at full capacity, however, then this strategy is known as “hot standby.” Because warm standby deploys a functional stack to the recovery Region, this makes it easier to test Region readiness using synthetic transactions. + +DR strategies: ![](https://assets-pt.media.datacumulus.com/aws-clf-pt/assets/pt4-q56-i1.jpg) via - [https://aws.amazon.com/blogs/architecture/disaster-recovery-dr-architecture-on-aws-part-iii-pilot-light-and-warm-standby/](https://aws.amazon.com/blogs/architecture/disaster-recovery-dr-architecture-on-aws-part-iii-pilot-light-and-warm-standby/) + +Incorrect options: + +**Multi-site active-active strategy** - This strategy uses AWS Regions as your active sites, creating a multi-Region active/active architecture. Generally, two Regions are used. Each Region hosts a highly available, multi-Availability Zone (AZ) workload stack. In each Region, data is replicated live between the data stores and also backed up. This protects against disasters that include data deletion or corruption since the data backup can be restored to the last known good state. Each regional stack serves production traffic effectively. But, this strategy is cost involving and should only be used for mission-critical applications. + +**Pilot Light strategy** - Pilot Light, like Warm Standby strategy, replicates data from the primary Region to data resources in the recovery Region, such as Amazon Relational Database Service (Amazon RDS) DB instances or Amazon DynamoDB tables. But, the DR Region in a pilot light strategy (unlike warm standby) cannot serve requests until additional steps are taken. A pilot light in a home furnace does not provide heat to the home. It provides a quick way to light the furnace burners that then provide heat. + +Warm standby can handle traffic at reduced levels immediately. 
Pilot light requires you to first deploy infrastructure and then scale out resources before the workload can handle requests. + +**Backup & Restore strategy** - Backup and Restore is associated with higher RTO (recovery time objective) and RPO (recovery point objective). This results in longer downtimes and greater loss of data between when the disaster event occurs and recovery. However, backup and restore can still be the right strategy for workloads because it is the easiest and least expensive strategy to implement. + +Reference: + +[https://aws.amazon.com/blogs/architecture/disaster-recovery-dr-architecture-on-aws-part-iii-pilot-light-and-warm-standby/](https://aws.amazon.com/blogs/architecture/disaster-recovery-dr-architecture-on-aws-part-iii-pilot-light-and-warm-standby/) + +**EC2 Instance Connect** + +Amazon EC2 Instance Connect provides a simple and secure way to connect to your instances using Secure Shell (SSH). With EC2 Instance Connect, you use AWS Identity and Access Management (IAM) policies and principals to control SSH access to your instances, removing the need to share and manage SSH keys. All connection requests using EC2 Instance Connect are logged to AWS CloudTrail so that you can audit connection requests. + +You can use Instance Connect to connect to your Linux instances using a browser-based client, the Amazon EC2 Instance Connect CLI, or the SSH client of your choice. EC2 Instance Connect can be used to connect to an EC2 instance from a Mac OS, Windows or Linux based computer. + +https://docs.aws.amazon.com/vpc/latest/peering/what-is-vpc-peering.html + +**S3 is a key value based object storage service** + +**S3 stores data in a flat non-hierarchical structure** + +**Credential Reports** + +You can generate and download a credential report that lists all users in your account and the status of their various credentials, including passwords, access keys, and MFA devices. You can use credential reports to assist in your auditing and compliance efforts. You can use the report to audit the effects of credential lifecycle requirements, such as password and access key rotation. You can provide the report to an external auditor, or grant permissions to an auditor so that he or she can download the report directly. + +Amazon Elastic File System (Amazon EFS) provides a simple, scalable, fully managed elastic NFS file system for use with AWS Cloud services and on-premises resources. + +To access EFS file systems from on-premises, you must have an AWS Direct Connect or AWS VPN connection between your on-premises datacenter and your Amazon VPC. You mount an EFS file system on your on-premises Linux server using the standard Linux mount command for mounting a file system + +![[Pasted image 20230425103933.jpg]] + +**Amazon S3 Replication** + +Replication enables automatic, asynchronous copying of objects across Amazon S3 buckets. Buckets that are configured for object replication can be owned by the same AWS account or by different accounts. You can copy objects between different AWS Regions or within the same Region. You can use replication to make copies of your objects that retain all metadata, such as the original object creation time and version IDs. This capability is important if you need to ensure that your replica is identical to the source object. + +Exam Alert: + +Amazon S3 supports two types of replication: Cross Region Replication vs Same Region Replication. 
Please review the differences between SRR and CRR: ![](https://assets-pt.media.datacumulus.com/aws-clf-pt/assets/pt4-q53-i1.jpg) via - [https://docs.aws.amazon.com/AmazonS3/latest/dev/replication.html](https://docs.aws.amazon.com/AmazonS3/latest/dev/replication.html) + +**Amazon SQS** + +Amazon Simple Queue Service (Amazon SQS) is a fully managed message queuing service that enables you to decouple and scale microservices, distributed systems, and serverless applications. SQS eliminates the complexity and overhead associated with managing and operating message-oriented middleware, and empowers developers to focus on differentiating work. + +Using SQS, you can send, store, and receive messages between software components at any volume, without losing messages or requiring other services to be available. + +Incorrect options: + +**Amazon SNS** - Amazon Simple Notification Service (Amazon SNS) is a highly available, durable, secure, fully managed pub/sub messaging service that enables you to decouple microservices, distributed systems, and serverless applications. + +Please review this reference architecture for building a decoupled order processing system using SNS and SQS: ![](https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2017/06/19/OrderDispatcher2-1024x718.png) via - [https://aws.amazon.com/blogs/compute/building-loosely-coupled-scalable-c-applications-with-amazon-sqs-and-amazon-sns/](https://aws.amazon.com/blogs/compute/building-loosely-coupled-scalable-c-applications-with-amazon-sqs-and-amazon-sns/) + +**Create tags for each department** + +You can assign metadata to your AWS resources in the form of tags. Each tag is a label consisting of a user-defined key and value. Tags can help you manage, identify, organize, search for, and filter resources. You can create tags to categorize resources by purpose, owner, environment, or other criteria. + +Typically, you use business tags such as cost center/business unit, customer, or project to associate AWS costs with traditional cost-allocation dimensions. But a cost allocation report can include any tag. This lets you associate costs with technical or security dimensions, such as specific applications, environments, or compliance programs. + +Example of tagging for cost optimization: ![](https://assets-pt.media.datacumulus.com/aws-clf-pt/assets/pt5-q62-i1.jpg) via - [https://docs.aws.amazon.com/general/latest/gr/aws_tagging.html](https://docs.aws.amazon.com/general/latest/gr/aws_tagging.html) + +**AWS CloudHSM** + +The AWS CloudHSM service helps you meet corporate, contractual, and regulatory compliance requirements for data security by using a dedicated Hardware Security Module (HSM) instances within the AWS cloud. + +CloudHSM allows you to securely generate, store, and manage cryptographic keys used for data encryption in a way that keys are accessible only by you. + +How AWS CloudHSM works: ![](https://assets-pt.media.datacumulus.com/aws-clf-pt/assets/pt5-q45-i1.jpg) via - [https://aws.amazon.com/cloudhsm/](https://aws.amazon.com/cloudhsm/) + +**AWS OpsWorks** + +AWS OpsWorks is a configuration management service that provides managed instances of Chef and Puppet. Chef and Puppet are automation platforms that allow you to use code to automate the configurations of your servers. OpsWorks lets you use Chef and Puppet to automate how servers are configured, deployed, and managed across your Amazon EC2 instances or on-premises compute environments. 
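Coming back to the Amazon SQS section above, the decoupling pattern it describes is just a send on one side and a receive/delete on the other. A minimal boto3 sketch follows; the queue name and message body are placeholders.

```python
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")

# Create (or look up) a queue; the name is a placeholder.
queue_url = sqs.create_queue(QueueName="demo-orders")["QueueUrl"]

# Producer side: send a message without caring who consumes it, or when.
sqs.send_message(QueueUrl=queue_url, MessageBody='{"order_id": 42}')

# Consumer side: poll for messages, process them, then delete them.
resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=5)
for msg in resp.get("Messages", []):
    print("processing:", msg["Body"])
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```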
**AWS Glue** - AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics. An AWS Glue job is meant to be used for batch ETL data processing.

How AWS Glue works: ![](https://assets-pt.media.datacumulus.com/aws-clf-pt/assets/pt5-q42-i1.jpg) via - [https://aws.amazon.com/glue/](https://aws.amazon.com/glue/)


**Convertible Reserved Instances**

Purchase Convertible Reserved Instances if you need additional flexibility, such as the ability to use different instance families, operating systems, or tenancies over the Reserved Instance term. Convertible Reserved Instances provide you with a significant discount (up to 54%) compared to On-Demand Instances and can be purchased for a 1-year or 3-year term.

Convertible Reserved Instances can be useful when workloads are likely to change. In this case, a Convertible Reserved Instance enables you to adapt as needs evolve while still obtaining discounts and capacity reservations.

EC2 Pricing Options Overview: ![](https://assets-pt.media.datacumulus.com/aws-clf-pt/assets/pt5-q1-i1.jpg) via - [https://aws.amazon.com/ec2/pricing/](https://aws.amazon.com/ec2/pricing/)

**Standard Reserved Instances** - Standard Reserved Instances provide you with a significant discount (up to 72%) compared to On-Demand Instance pricing, and can be purchased for a 1-year or 3-year term. Standard Reserved Instances do not offer as much flexibility as Convertible Reserved Instances (such as not being able to change the instance family type), and therefore are not best-suited for this use case.

Review the differences between Standard Reserved Instances and Convertible Reserved Instances: [https://docs.aws.amazon.com/whitepapers/latest/cost-optimization-reservation-models/standard-vs.-convertible-offering-classes.html](https://docs.aws.amazon.com/whitepapers/latest/cost-optimization-reservation-models/standard-vs.-convertible-offering-classes.html)
![[Screenshot from 2023-04-25 14-07-06.png]]

For load balancing with Route 53, use an Elastic Load Balancer; Route 53 does not load balance traffic by itself out of the box, it only provides DNS-based routing.

**S3 Access Logs**

Server access logging provides detailed records for the requests that are made to a bucket. Server access logs are useful for many applications. For example, access log information can be useful in security and access audits.

It can also help you learn about your customer base and understand your Amazon S3 bill.


![](https://assets-pt.media.datacumulus.com/aws-clf-pt/assets/pt5-q19-i1.jpg) via - [https://aws.amazon.com/architecture/well-architected/](https://aws.amazon.com/architecture/well-architected/)


**AWS DataSync**

AWS DataSync is a secure online data transfer service that simplifies, automates, and accelerates copying terabytes of data to and from AWS storage services. Easily migrate or replicate large data sets without having to build custom solutions or oversee repetitive tasks. DataSync can copy data between Network File System (NFS) shares, or Server Message Block (SMB) shares, self-managed object storage, AWS Snowcone, Amazon Simple Storage Service (Amazon S3) buckets, Amazon Elastic File System (Amazon EFS) file systems, and Amazon FSx for Windows File Server file systems.

You can use AWS DataSync for ongoing transfers from on-premises systems into or out of AWS for processing. DataSync can help speed up your critical hybrid cloud storage workflows in industries that need to move active files into AWS quickly.
This includes machine learning in life sciences, video production in media and entertainment, and big data analytics in financial services. DataSync provides timely delivery to ensure dependent processes are not delayed. You can specify exclude filters, include filters, or both, to determine which files, folders or objects get transferred each time your task runs.

AWS DataSync employs an AWS-designed transfer protocol (decoupled from the storage protocol) to accelerate data movement. The protocol performs optimizations on how, when, and what data is sent over the network. Network optimizations performed by DataSync include incremental transfers, in-line compression, and sparse file detection, as well as in-line data validation and encryption.

Data Transfer between on-premises and AWS using DataSync: ![](https://assets-pt.media.datacumulus.com/aws-clf-pt/assets/pt5-q34-i1.jpg) via - [https://aws.amazon.com/datasync/](https://aws.amazon.com/datasync/)


via - [https://aws.amazon.com/aws-cost-management/aws-cost-explorer/](https://aws.amazon.com/aws-cost-management/aws-cost-explorer/)

![AWS Cost and Usage Reports](https://assets-pt.media.datacumulus.com/aws-clf-pt/assets/pt5-q46-i2.jpg) via - [https://aws.amazon.com/aws-cost-management/aws-cost-and-usage-reporting/](https://aws.amazon.com/aws-cost-management/aws-cost-and-usage-reporting/)

References:

[https://aws.amazon.com/about-aws/whats-new/2020/03/aws-cost-explorer-now-offers-savings-plans-recommendations-for-member-linked-accounts/](https://aws.amazon.com/about-aws/whats-new/2020/03/aws-cost-explorer-now-offers-savings-plans-recommendations-for-member-linked-accounts/)

[https://docs.aws.amazon.com/cur/latest/userguide/what-is-cur.html](https://docs.aws.amazon.com/cur/latest/userguide/what-is-cur.html)

[https://aws.amazon.com/aws-cost-management/aws-cost-explorer/](https://aws.amazon.com/aws-cost-management/aws-cost-explorer/)

[https://aws.amazon.com/aws-cost-management/aws-cost-and-usage-reporting/](https://aws.amazon.com/aws-cost-management/aws-cost-and-usage-reporting/)

**A VPC spans all Availability Zones (AZs) within a region**

Amazon Virtual Private Cloud (Amazon VPC) lets you provision a logically isolated section of the AWS Cloud where you can launch AWS resources in a virtual network that you define. You have complete control over your virtual networking environment, including selection of your own IP address range, creation of subnets, and configuration of route tables and network gateways.

A VPC spans all Availability Zones (AZs) within a region.

**AWS IAM Identity Center**

IAM Identity Center is the successor to AWS Single Sign-On. It is built on top of AWS Identity and Access Management (IAM) to simplify access management to multiple AWS accounts, AWS applications, and other SAML-enabled cloud applications. In IAM Identity Center, you create, or connect, your workforce users for use across AWS. You can choose to manage access just to your AWS accounts, just to your cloud applications, or to both.

You can create users directly in IAM Identity Center, or you can bring them from your existing workforce directory. With IAM Identity Center, you get a unified administration experience to define, customize, and assign fine-grained access. Your workforce users get a user portal to access their assigned AWS accounts or cloud applications.
+ +You can use IAM Identity Center to quickly and easily assign and manage your employees’ access to multiple AWS accounts, SAML-enabled cloud applications (such as Salesforce, Microsoft 365, and Box), and custom-built in-house applications, all from a central place. + +How IAM Identity Center works: ![](https://d1.awsstatic.com/product-marketing/IAM/product-page-diagram_AWS-IAM-Identity-Center_SSO-Rework.45817a4d5cdf0acf33a75257713d3266879196b1.png) via - [https://aws.amazon.com/iam/identity-center/](https://aws.amazon.com/iam/identity-center/) + + +**Amazon Cognito** + +Amazon Cognito lets you add user sign-up, sign-in, and access control to your web and mobile apps quickly and easily. With Amazon Cognito, you also have the option to authenticate users through social identity providers such as Facebook, Twitter, or Amazon, with SAML identity solutions, or by using your own identity system. + +**S3 Lifecycle management** + +To manage your objects so that they are stored cost-effectively throughout their lifecycle, configure their Amazon S3 Lifecycle. An S3 Lifecycle configuration is a set of rules that define actions that Amazon S3 applies to a group of objects. There are two types of actions: Transition actions (define when objects transition to another storage class) and expiration actions (define when objects expire. Amazon S3 deletes expired objects on your behalf). + + +**AWS Quick Starts references** + +Quick Starts are built by AWS solutions architects and partners to help you deploy popular technologies on AWS, based on AWS best practices for security and high availability. These accelerators reduce hundreds of manual procedures into just a few steps, so you can build your production environment quickly and start using it immediately. + +Each Quick Start includes AWS CloudFormation templates that automate the deployment and a guide that discusses the architecture and provides step-by-step deployment instructions. + + +**IAM access advisor** + +Access advisor shows the service permissions granted to a user and when those services were last accessed. You can use this information to revise your policies. To summarize, you can identify unnecessary permissions so that you can revise your IAM policies accordingly. + + +https://docs.aws.amazon.com/ARG/latest/userguide/resource-groups.html + +**AWS Resource Groups** - In AWS, a resource is an entity that you can work with. Examples include an Amazon EC2 instance, an AWS CloudFormation stack, or an Amazon S3 bucket. If you work with multiple resources, you might find it useful to manage them as a group rather than move from one AWS service to another for each task. If you manage large numbers of related resources, such as EC2 instances that make up an application layer, you likely need to perform bulk actions on these resources at one time. + +You can use Resource Groups to organize your AWS resources. Resource groups make it easier to manage and automate tasks on large numbers of resources at a time. Resource Groups feature permissions are at the account level. As long as users who are sharing your account have the correct IAM permissions, they can work with resource groups that you create. + + +**AWS CloudTrail Insights** - AWS CloudTrail Insights helps AWS users identify and respond to unusual activity associated with write API calls by continuously analyzing CloudTrail management events. + +Insights events are logged when CloudTrail detects unusual write management API activity in your account. 
If you have CloudTrail Insights enabled, and CloudTrail detects unusual activity, Insights events are delivered to the destination S3 bucket for your trail. You can also see the type of insight and the incident time period when you view Insights events on the CloudTrail console. Unlike other types of events captured in a CloudTrail trail, Insights events are logged only when CloudTrail detects changes in your account's API usage that differ significantly from the account's typical usage patterns. + +CloudTrail Insights can help you detect unusual API activity in your AWS account by raising Insights events. CloudTrail Insights measures your normal patterns of API call volume, also called the baseline, and generates Insights events when the volume is outside normal patterns. + +CloudTrail Insights continuously monitors CloudTrail write management events, and uses mathematical models to determine the normal levels of API and service event activity for an account. CloudTrail Insights identifies behavior that is outside normal patterns, generates Insights events, and delivers those events to a /CloudTrail-Insight folder in the chosen destination S3 bucket for your trail. You can also access and view Insights events in the AWS Management Console for CloudTrail. + +Identify and Respond to Unusual API Activity using CloudTrail Insights: ![](https://assets-pt.media.datacumulus.com/aws-clf-pt/assets/pt6-q50-i1.jpg) via - [https://aws.amazon.com/blogs/aws/announcing-cloudtrail-insights-identify-and-respond-to-unusual-api-activity/](https://aws.amazon.com/blogs/aws/announcing-cloudtrail-insights-identify-and-respond-to-unusual-api-activity/) + +**Amazon Lightsail** - Amazon Lightsail is the easiest way to get started with AWS for developers, small businesses, students, and other users who need a solution to build and host their applications on the cloud. Lightsail provides developers with compute, storage, and networking capacity and capabilities to deploy and manage websites and web applications in the cloud. Lightsail includes everything you need to launch your project quickly – virtual machines, containers, databases, CDN, load balancers, DNS management, etc. – for a low, predictable monthly price. + +You can get preconfigured virtual private server plans that include everything to easily deploy and manage your application. Lightsail is best suited to projects that require a few virtual private servers and users who prefer a simple management interface. Common use cases for Lightsail include running websites, web applications, blogs, e-commerce sites, simple software, and more. + +Also referred to as a bundle, a Lightsail plan includes a virtual server with a fixed amount of memory (RAM) and compute (vCPUs), SSD-based storage (disks), and a free data transfer allowance. Lightsail plans also offer static IP addresses (5 per account) and DNS management (3 domain zones per account). Lightsail plans are charged on an hourly, on-demand basis, so you only pay for a plan when you're using it. + +Lightsail offers a number of preconfigured, one-click-to-launch operating systems, development stacks, and web applications, including Linux and Windows OS, WordPress, LAMP, CentOS, and more. 
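Lightsail's plans (bundles) and one-click images (blueprints) described above can also be listed programmatically. Below is a minimal boto3 sketch, assuming credentials are already configured; the region and the printed fields are taken from the public Lightsail API but should be treated as illustrative rather than exhaustive.

```python
import boto3

# Assumes AWS credentials are already configured (e.g. via the AWS CLI).
# The region is an arbitrary example; Lightsail is not available everywhere.
lightsail = boto3.client("lightsail", region_name="us-east-1")

# Bundles are the fixed-price plans (RAM, vCPU, SSD disk, transfer allowance).
bundles = lightsail.get_bundles()["bundles"]
for b in bundles:
    if b.get("isActive"):
        print(f'{b["bundleId"]}: {b["ramSizeInGb"]} GB RAM, '
              f'{b["cpuCount"]} vCPU, {b["diskSizeInGb"]} GB SSD, '
              f'${b["price"]}/month')

# Blueprints are the preconfigured OS / app images (WordPress, LAMP, etc.).
blueprints = lightsail.get_blueprints()["blueprints"]
print([bp["blueprintId"] for bp in blueprints[:10]])
```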
+ + +**The Elastic Beanstalk health monitoring can determine that the environment's Auto Scaling group is available and has a minimum of at least one instance** - In addition to Elastic Load Balancing health checks, Elastic Beanstalk monitors resources in your environment and changes health status to red if they fail to deploy, are not configured correctly, or become unavailable. These checks confirm that: 1. The environment's Auto Scaling group is available and has a minimum of at least one instance. 2. The environment's security group is available and is configured to allow incoming traffic on port 80. 3. The environment CNAME exists and is pointing to the right load balancer. 4. In a worker environment, the Amazon Simple Queue Service (Amazon SQS) queue is being polled at least once every three minutes. + +**With basic health reporting, the Elastic Beanstalk service does not publish any metrics to Amazon CloudWatch** - With basic health reporting, the Elastic Beanstalk service does not publish any metrics to Amazon CloudWatch. The CloudWatch metrics used to produce graphs on the Monitoring page of the environment console are published by the resources in your environment. + + +**AWS Quick Starts** - AWS Quick Starts are automated reference deployments for key workloads on the AWS Cloud. Each Quick Start launches, configures and runs the AWS compute, network, storage, and other services required to deploy a specific workload on AWS, using AWS best practices for security and availability. + +Quick Starts are accelerators that condense hundreds of manual procedures into just a few steps. They are fast, low-cost, and customizable. They are fully functional and designed for production. + +Quick Starts include: 1. A reference architecture for the deployment 2. AWS CloudFormation templates (JSON or YAML scripts) that automate and configure the deployment 3. A deployment guide, which explains the architecture and implementation in detail, and provides instructions for customizing the deployment. + +Quick Starts also include integrations that extend the cloud-based contact center functionality provided by Amazon Connect with key services and solutions from APN Partners—for customer relationship management (CRM), workforce optimization (WFO), analytics, unified communications (UC), and other use cases. + + +**AWS Outposts** - AWS Outposts is a fully managed service that offers the same AWS infrastructure, AWS services, APIs, and tools to virtually any data center, co-location space, or on-premises facility for a truly consistent hybrid experience. AWS Outposts is ideal for workloads that require low latency access to on-premises systems, local data processing, data residency, and migration of applications with local system interdependencies. + +AWS compute, storage, database, and other services run locally on Outposts, and you can access the full range of AWS services available in the Region to build, manage, and scale your on-premises applications using familiar AWS services and tools. + +You can use Outposts to support your applications that have low latency or local data processing requirements. These applications may need to generate near real-time responses to end-user applications or need to communicate with other on-premises systems or control on-site equipment. These can include workloads running on factory floors for automated operations in manufacturing, real-time patient diagnosis or medical imaging, and content and media streaming. 
You can use Outposts to securely store and process customer data that needs to remain on-premises or in countries where there is no AWS region. You can run data-intensive workloads on Outposts and process data locally when transmitting data to the cloud is expensive and wasteful and for better control on data analysis, back-up and restore. + +How Outposts Works: ![](https://d1.awsstatic.com/re19/HIW-Diagram_Outposts.93f8622abf9168de83eb929a2678b8fa7543d4e5.png) via - [https://aws.amazon.com/outposts/](https://aws.amazon.com/outposts/) + + +**AWS Well-Architected Tool** - The AWS Well-Architected Tool helps you review the state of your workloads and compares them to the latest AWS architectural best practices. The tool is based on the AWS Well-Architected Framework, developed to help cloud architects build secure, high-performing, resilient, and efficient application infrastructure. + +To use this free tool, available in the AWS Management Console, just define your workload and answer a set of questions regarding operational excellence, security, reliability, performance efficiency, and cost optimization. The AWS Well-Architected Tool then provides a plan on how to architect for the cloud using established best practices. + +The AWS Well-Architected Tool gives you access to knowledge and best practices used by AWS architects, whenever you need it. You answer a series of questions about your workload, and the tool delivers an action plan with step-by-step guidance on how to build better workloads for the cloud. + +How Well-Architected Tool works: ![](https://assets-pt.media.datacumulus.com/aws-clf-pt/assets/pt6-q54-i1.jpg) via - [https://aws.amazon.com/well-architected-tool/](https://aws.amazon.com/well-architected-tool/) + +**Amazon Elastic Transcoder** - Amazon Elastic Transcoder lets you convert media files that you have stored in Amazon S3 into media files in the formats required by consumer playback devices. For example, you can convert large, high-quality digital media files into formats that users can playback on mobile devices, tablets, web browsers, and connected televisions. + +Amazon Elastic Transcoder manages all aspects of the media transcoding process for you transparently and automatically. There’s no need to administer software, scale hardware, tune performance, or otherwise manage transcoding infrastructure. You simply create a transcoding “job” specifying the location of your source media file and how you want it transcoded. Amazon Elastic Transcoder also provides transcoding presets for popular output formats, which means that you don’t need to guess about which settings work best on particular devices. All these features are available via service API, AWS SDKs and the AWS Management Console. + +[https://aws.amazon.com/appstream2/](https://aws.amazon.com/appstream2/) + +[https://aws.amazon.com/workspaces/](https://aws.amazon.com/workspaces/) + +**Management events** - An event in CloudTrail is the record of an activity in an AWS account. This activity can be an action taken by a user, role, or service that is monitorable by CloudTrail. CloudTrail events provide a history of both API and non-API account activity made through the AWS Management Console, AWS SDKs, command line tools, and other AWS services. + +**There are three types of events that can be logged in CloudTrail: management events, data events, and CloudTrail Insights events. + +**By default, CloudTrail logs all management events and does not include data events or Insights events. 
Additional charges apply for data and Insights events. All event types use the same CloudTrail JSON log format.**

Management events provide information about management operations that are performed on resources in your AWS account. These are also known as control plane operations. Examples include registering devices, configuring rules for routing data, setting up logging, etc.

**AWS Cost Explorer** - AWS Cost Explorer lets you explore your AWS costs and usage at both a high level and at a detailed level of analysis, empowering you to dive deeper using a number of filtering dimensions (e.g., AWS Service, Region, Member Account, etc.). AWS Cost Explorer also gives you access to a set of default reports to help you get started, while also allowing you to create custom reports from scratch.

You can explore your usage and costs using the main graph, the Cost Explorer cost and usage reports, or the Cost Explorer RI report. You can view data for up to the last 12 months, forecast how much you're likely to spend for the next 12 months, and get recommendations for what Reserved Instances to purchase. You can use Cost Explorer to identify areas that need further inquiry and see trends that you can use to understand your costs.

You can view your costs and usage using the Cost Explorer user interface free of charge. You can also access your data programmatically using the Cost Explorer API.

When you first sign up for Cost Explorer, AWS prepares the data about your costs for the current month and the last 12 months and then calculates the forecast for the next 12 months. The current month's data is available for viewing in about 24 hours. The rest of your data takes a few days longer. Cost Explorer updates your cost data at least once every 24 hours. After you sign up, Cost Explorer can display up to 12 months of historical data (if you have that much), the current month, and the forecasted costs for the next 12 months.

How AWS WAF works: ![](https://d1.awsstatic.com/products/WAF/product-page-diagram_AWS-WAF_How-it-Works@2x.452efa12b06cb5c87f07550286a771e20ca430b9.png) via - [https://aws.amazon.com/waf/](https://aws.amazon.com/waf/)

**AWS CloudTrail logs, Amazon VPC Flow Logs and Amazon GuardDuty findings** - Amazon Detective can analyze trillions of events from multiple data sources such as Virtual Private Cloud (VPC) Flow Logs, AWS CloudTrail, and Amazon GuardDuty, and automatically creates a unified, interactive view of your resources, users, and the interactions between them over time.

Amazon Detective conforms to the AWS shared responsibility model, which includes regulations and guidelines for data protection. Once enabled, Amazon Detective will process data from AWS CloudTrail logs, VPC Flow Logs, and Amazon GuardDuty findings for any accounts where it has been turned on.

Amazon Detective requires that you have Amazon GuardDuty enabled on your accounts for at least 48 hours before you enable Detective on those accounts. However, you can use Detective to investigate more than just your GuardDuty findings. Amazon Detective provides detailed summaries, analysis, and visualizations of the behaviors and interactions amongst your AWS accounts, EC2 instances, AWS users, roles, and IP addresses. This information can be very useful in understanding security issues or operational account activity.
+ +How Amazon Detective Works: ![](https://d1.awsstatic.com/re19/Diagram_Detective.93ebed7d2e3452fc03c6496bd7faf5b8f2ef9a6e.png) via - [https://aws.amazon.com/detective/](https://aws.amazon.com/detective/) + +**Amazon S3 Glacier Vault Lock** - S3 Glacier Vault Lock allows you to easily deploy and enforce compliance controls for individual S3 Glacier vaults with a vault lock policy. You can specify controls such as “write once read many” (WORM) in a vault lock policy and lock the policy from future edits. Once locked, the policy can no longer be changed. + +A vault lock policy can be locked to prevent future changes, providing strong enforcement for your compliance controls. You can use the vault lock policy to deploy regulatory and compliance controls, which typically require tight controls on data access. + +As an example of a Vault Lock policy, suppose that you are required to retain archives for one year before you can delete them. To implement this requirement, you can create a Vault Lock policy that denies users permission to delete an archive until the archive has existed for one year. You can test this policy before locking it down. After you lock the policy, the policy becomes immutable. + + +**Amazon Quantum Ledger Database** - Amazon QLDB is a fully managed ledger database that provides a transparent, immutable, and cryptographically verifiable transaction log ‎owned by a central trusted authority. Amazon QLDB can be used to track each and every application data change and maintains a complete and verifiable history of changes over time. + +Ledgers are typically used to record a history of economic and financial activity in an organization. Many organizations build applications with ledger-like functionality because they want to maintain an accurate history of their applications' data, for example, tracking the history of credits and debits in banking transactions, verifying the data lineage of an insurance claim, or tracing the movement of an item in a supply chain network. Ledger applications are often implemented using custom audit tables or audit trails created in relational databases. + +Amazon QLDB is a new class of database that eliminates the need to engage in the complex development effort of building your own ledger-like applications. With QLDB, your data’s change history is immutable – it cannot be altered or deleted – and using cryptography, you can easily verify that there have been no unintended modifications to your application’s data. QLDB uses an immutable transactional log, known as a journal, that tracks each application data change and maintains a complete and verifiable history of changes over time. QLDB is easy to use because it provides developers with a familiar SQL-like API, a flexible document data model, and full support for transactions. QLDB’s streaming capability provides a near real-time flow of your data stored within QLDB, allowing you to develop event-driven workflows, real-time analytics, and to replicate data to other AWS services to support advanced analytical processing. QLDB is also serverless, so it automatically scales to support the demands of your application. There are no servers to manage and no read or write limits to configure. With QLDB, you only pay for what you use. 
+ +How Amazon Quantum Ledger Database Works: ![](https://d1.awsstatic.com/r2018/h/99Product-Page-Diagram_AWS-Quantum.f03953678ba33a2d1b12aee6ee530e45507e7ac9.png) via - [https://aws.amazon.com/qldb/](https://aws.amazon.com/qldb/) + +**API Gateway can call an AWS Lambda function to create the front door of a serverless application** - Amazon API Gateway is an AWS service for creating, publishing, maintaining, monitoring, and securing REST, HTTP, and WebSocket APIs at any scale. API developers can create APIs that access AWS or other web services, as well as data stored in the AWS Cloud. + +API Gateway acts as a "front door" for applications to access data, business logic, or functionality from your backend services, such as workloads running on Amazon Elastic Compute Cloud (Amazon EC2), code running on AWS Lambda, any web application, or real-time communication applications. + +**API Gateway can be configured to send data directly to Amazon Kinesis Data Stream** - Amazon API Gateway can execute AWS Lambda functions in your account, start AWS Step Functions state machines, or call HTTP endpoints hosted on AWS Elastic Beanstalk, Amazon EC2, and also non-AWS hosted HTTP based operations that are accessible via the public Internet.API Gateway also allows you to specify a mapping template to generate static content to be returned, helping you mock your APIs before the backend is ready. You can also integrate API Gateway with other AWS services directly – for example, you could expose an API method in API Gateway that sends data directly to Amazon Kinesis. + +How API Gateway Works: ![](https://d1.awsstatic.com/serverless/New-API-GW-Diagram.c9fc9835d2a9aa00ef90d0ddc4c6402a2536de0d.png) via - [https://aws.amazon.com/api-gateway/](https://aws.amazon.com/api-gateway/) + +**AWS Elastic Beanstalk** - There is no additional charge for AWS Elastic Beanstalk. You pay for AWS resources (e.g. EC2 instances or S3 buckets) you create to store and run your application. You only pay for what you use, as you use it; there are no minimum fees and no upfront commitments. + +**AWS Auto Scaling** - There is no additional charge for AWS Auto Scaling. You pay only for the AWS resources needed to run your applications and Amazon CloudWatch monitoring fees. + +**CloudEndure Disaster Recovery** - CloudEndure Disaster Recovery, available from the AWS Marketplace, continuously replicates server-hosted applications and server-hosted databases from any source into AWS using block-level replication of the underlying server. CloudEndure Disaster Recovery enables you to use AWS Cloud as a disaster recovery Region for an on-premises workload and its environment. It can also be used for disaster recovery of AWS hosted workloads if they consist only of applications and databases hosted on EC2 (i.e. not RDS). + +Features of CloudEndure Disaster Recovery: + +1. Continuous replication: CloudEndure Disaster Recovery provides continuous, asynchronous, block-level replication of your source machines into a staging area. This allows you to achieve sub-second Recovery Point Objectives (RPOs), since up-to-date applications are always ready to be spun up on AWS if a disaster strikes. + +2. Low-cost staging area: Data is continually kept in sync in a lightweight staging area in your target AWS Region. The staging area contains low-cost resources that are automatically provisioned and managed by CloudEndure Disaster Recovery. 
This eliminates the need for duplicate resources and significantly reduces your disaster recovery total cost of ownership (TCO). + +3. Automated machine conversion and orchestration: In the event of a disaster or drill, CloudEndure Disaster Recovery triggers a highly automated machine conversion process and a scalable orchestration engine that quickly spins up thousands of machines in your target AWS Region in parallel. This enables Recovery Time Objectives (RTOs) of minutes. Unlike application-level solutions, CloudEndure Disaster Recovery replicates entire machines, including OS, system state configuration, system disks, databases, applications, and files. + +4. Point-in-time recovery: Granular point-in-time recovery allows you to recover applications and IT environments that have been corrupted as a result of accidental system changes, ransomware, or other malicious attacks. In such cases, you can launch applications from a previous consistent point in time rather than launching applications in their most up-to-date state. During the recovery, you can select either the latest state or an earlier state from a list of points in time. + +5. Easy, non-disruptive drills: With CloudEndure Disaster Recovery, you can conduct disaster recovery drills without disrupting your source environment or risking data loss. During drills, CloudEndure Disaster Recovery spins up machines in your target AWS Region in complete isolation to avoid network conflicts and performance impact. + +6. Wide application and infrastructure support: Because CloudEndure Disaster Recovery replicates data at the block level, you can use it for all applications and databases that run on supported versions of Windows and Linux OS. + + +CloudEndure Disaster Recovery: ![](https://assets-pt.media.datacumulus.com/aws-clf-pt/assets/pt6-q65-i1.jpg) via - [https://docs.aws.amazon.com/whitepapers/latest/disaster-recovery-workloads-on-aws/disaster-recovery-options-in-the-cloud.html](https://docs.aws.amazon.com/whitepapers/latest/disaster-recovery-workloads-on-aws/disaster-recovery-options-in-the-cloud.html) + + +**AWS IoT Core** - AWS IoT Core lets you connect IoT devices to the AWS cloud without the need to provision or manage servers. AWS IoT Core can support billions of devices and trillions of messages and can process and route those messages to AWS endpoints and to other devices reliably and securely. With AWS IoT Core, your applications can keep track of and communicate with all your devices, all the time, even when they aren’t connected. + +AWS IoT Core also makes it easy to use AWS and Amazon services like AWS Lambda, Amazon Kinesis, Amazon S3, Amazon SageMaker, Amazon DynamoDB, Amazon CloudWatch, AWS CloudTrail, Amazon QuickSight, and Alexa Voice Service to build IoT applications that gather, process, analyze and act on data generated by connected devices, without having to manage any infrastructure. + +AWS IoT Core lets you select the communication protocol most appropriate for your use case to connect and manage IoT devices. AWS IoT Core supports MQTT (Message Queuing and Telemetry Transport), HTTPS (Hypertext Transfer Protocol - Secure), MQTT over WSS (WebSockets Secure), and LoRaWAN (low-power long-range wide-area network). + +AWS IoT Core provides automated configuration and authentication upon a device’s first connection to AWS IoT Core, as well as end-to-end encryption throughout all points of connection, so that data is never exchanged between devices and AWS IoT Core without proven identity. 
In addition, you can secure access to your devices and applications by applying policies with granular permissions. + +AWS IoT Core capabilities: ![](https://assets-pt.media.datacumulus.com/aws-clf-pt/assets/pt6-q53-i1.jpg) via - [https://aws.amazon.com/iot-core/](https://aws.amazon.com/iot-core/) + +**AWS OpsHub** - AWS OpsHub is a graphical user interface you can use to manage your AWS Snowball devices, enabling you to rapidly deploy edge computing workloads and simplify data migration to the cloud. With just a few clicks in AWS OpsHub, you have the full functionality of the Snowball devices at your fingertips; you can unlock and configure devices, drag-and-drop data to devices, launch applications, and monitor device metrics. + +Previously, customers operated Snowball devices by either entering commands into a command-line interface or by using REST APIs. Now with AWS OpsHub, you have an easier way to deploy and manage even large fleets of Snowball devices, all while operating without an internet connection. + +AWS OpsHub takes all the existing operations available in the Snowball API and presents them as a simple graphical user interface. This interface helps you quickly and easily migrate data to the AWS Cloud and deploy edge computing applications on Snow Family Devices. + +AWS OpsHub provides a unified view of AWS services that are running on Snow Family Devices and automates operational tasks through AWS Systems Manager. With AWS OpsHub, users with different levels of technical expertise can easily manage a large number of Snow Family Devices. With just a few clicks, you can unlock devices, transfer files, manage Amazon EC2 instances, and monitor device metrics. + +When your Snow device arrives at your site, you download, install, and launch the AWS OpsHub application on a client machine, such as a laptop. After installation, you can unlock the device and start managing it and using supported AWS services locally. AWS OpsHub provides a dashboard that summarizes key metrics such as storage capacity and active instances on your device. It also provides a selection of the AWS services that are supported on the Snow Family Devices. Within minutes, you can begin transferring files to the device. + +**AWS Wavelength** - AWS Wavelength is an AWS Infrastructure offering optimized for mobile edge computing applications. Wavelength Zones are AWS infrastructure deployments that embed AWS compute and storage services within communications service providers’ (CSP) data centers at the edge of the 5G network, so application traffic from 5G devices can reach application servers running in Wavelength Zones without leaving the telecommunications network. This avoids the latency that would result from application traffic having to traverse multiple hops across the Internet to reach their destination, enabling customers to take full advantage of the latency and bandwidth benefits offered by modern 5G networks. + +AWS enterprise customers that build applications to serve their own use-cases such as IoT, live media production, and industrial automation can use Wavelength to deliver low-latency solutions. Customers with edge data processing needs such as image and video recognition, inference, data aggregation, and responsive analytics can use Wavelength to perform low-latency operations and processing right where their data is generated, reducing the need to move large amounts of data to be processed in centralized locations. 
+ +How Wavelength works: ![](https://assets-pt.media.datacumulus.com/aws-clf-pt/assets/pt6-q56-i2.jpg) via - [https://aws.amazon.com/wavelength/](https://aws.amazon.com/wavelength/) + +**Compute Savings Plans, EC2 Instance Savings Plans** - Savings Plans is a flexible pricing model that provides savings of up to 72% on your AWS compute usage. This pricing model offers lower prices on Amazon EC2 instances usage, regardless of instance family, size, OS, tenancy or AWS Region, and also applies to AWS Fargate and AWS Lambda usage. + +Savings Plans offer significant savings over On-Demand, just like EC2 Reserved Instances, in exchange for a commitment to use a specific amount of compute power (measured in $/hour) for a one or three-year period. You can sign up for Savings Plans for a 1- or 3-year term and easily manage your plans by taking advantage of recommendations, performance reporting and budget alerts in the AWS Cost Explorer. + +AWS offers two types of Savings Plans: + +1. Compute Savings Plans provide the most flexibility and help to reduce your costs by up to 66%. These plans automatically apply to EC2 instance usage regardless of instance family, size, AZ, region, OS or tenancy, and also apply to Fargate and Lambda usage. For example, with Compute Savings Plans, you can change from C4 to M5 instances, shift a workload from EU (Ireland) to EU (London), or move a workload from EC2 to Fargate or Lambda at any time and automatically continue to pay the Savings Plans price. + +2. EC2 Instance Savings Plans provide the lowest prices, offering savings up to 72% in exchange for a commitment to the usage of individual instance families in a region (e.g. M5 usage in N. Virginia). This automatically reduces your cost on the selected instance family in that region regardless of AZ, size, OS or tenancy. EC2 Instance Savings Plans give you the flexibility to change your usage between instances within a family in that region. For example, you can move from c5.xlarge running Windows to c5.2xlarge running Linux and automatically benefit from the Savings Plans prices. + + +How Savings Plans Work: ![](https://d1.awsstatic.com/diagrams/Savings_Plan_Diagram.c47c77f0fc91f9ad6190f2755b65f8e57345116f.png) via - [https://aws.amazon.com/savingsplans/](https://aws.amazon.com/savingsplans/) + +**CloudTrail Logs, S3 Glacier, AWS Storage Gateway** - By default, all data stored by AWS Storage Gateway in S3 is encrypted server-side with Amazon S3-Managed Encryption Keys (SSE-S3). Also, you can optionally configure different gateway types to encrypt stored data with AWS Key Management Service (KMS) via the Storage Gateway API. + +Data at rest stored in S3 Glacier is automatically server-side encrypted using 256-bit Advanced Encryption Standard (AES-256) with keys maintained by AWS. If you prefer to manage your own keys, you can also use client-side encryption before storing data in S3 Glacier. + +By default, the log files delivered by CloudTrail to your bucket are encrypted by Amazon server-side encryption with Amazon S3-managed encryption keys (SSE-S3). To provide a security layer that is directly manageable, you can instead use server-side encryption with AWS KMS–managed keys (SSE-KMS) for your CloudTrail log files. To use SSE-KMS with CloudTrail, you create and manage a KMS key, also known as a customer master key (CMK). + +**Amazon Quicksight** - Amazon QuickSight is a scalable, serverless, embeddable, machine learning-powered business intelligence (BI) service built for the cloud. 
QuickSight lets you easily create and publish interactive BI dashboards that include Machine Learning-powered insights. QuickSight dashboards can be accessed from any device, and seamlessly embedded into your applications, portals, and websites.

With QuickSight, you can quickly embed interactive dashboards into your applications, websites, and portals. QuickSight provides a rich set of APIs and SDKs that allow you to easily customize the look and feel of the dashboards to match applications. With QuickSight, you can manage your dashboard versions, grant dashboard authoring privileges, and share usage reports with your end-customers. If your application is used by customers that belong to different teams or organizations, QuickSight ensures that their data is always siloed and secure.

Amazon QuickSight has a serverless architecture that automatically scales to tens of thousands of users without the need to set up, configure, or manage your own servers. It also ensures that your users don't have to deal with slow dashboards during peak hours when multiple BI users are accessing the same dashboards or datasets. And with pay-per-session pricing, you only pay when your users access the dashboards or reports, which makes it cost-effective for deployments with lots of users. There are no upfront costs or annual commitments for using QuickSight.

How QuickSight Works: ![](https://d1.awsstatic.com/r2018/h/QuickSight%20Q/How%20QuickSight%20Works_without%20Q_final.026e51297c1fa18b850ce2ffc1575a9124bbad16.png) via - [https://aws.amazon.com/quicksight/](https://aws.amazon.com/quicksight/)

Connecting QuickSight to your Data Lakes (e.g. Amazon S3): ![](https://assets-pt.media.datacumulus.com/aws-clf-pt/assets/pt6-q48-i1.jpg) via - [https://aws.amazon.com/quicksight/](https://aws.amazon.com/quicksight/) \ No newline at end of file diff --git a/content/Cloud/AWS/AWS CLF-01.md b/content/Cloud/AWS/AWS CLF-01.md new file mode 100644 index 000000000..319a39265 --- /dev/null +++ b/content/Cloud/AWS/AWS CLF-01.md @@ -0,0 +1,50 @@

**Multi-tenancy** is a **software architecture where a single software instance or application serves multiple customers or user groups, called tenants**. It is the opposite of single tenancy, where a software instance or system has only one user or group. Multi-tenancy is the backbone of cloud computing, where software is hosted, provisioned and managed by a cloud provider and accessed by users over the Internet. The tenants are logically isolated from each other, but physically integrated in the shared environment. They can customize some aspects of the software, but not the code.

**SCALABILITY** - ability of a _software system_ to process a higher amount of workload on its current hardware resources (_scale up_) or on current and additional hardware resources (_scale out_) without application service interruption;

**ELASTICITY** - ability of the _hardware layer_ below (usually cloud infrastructure) to increase or shrink the amount of the physical resources offered by that hardware layer to the software layer above. The increase / decrease is triggered by business rules defined in advance (usually related to the application's demands). The increase / decrease happens on the fly without physical service interruption.

Scalability is the ability of the system to accommodate larger loads just by adding resources, either making hardware stronger (scale up) or adding additional nodes (scale out).

Elasticity is the ability to fit the resources needed to cope with loads dynamically, usually in relation to scaling out: when the load increases you scale by adding more resources, and when demand wanes you shrink back and remove unneeded resources. Elasticity is mostly important in cloud environments where you pay per use and don't want to pay for resources you do not currently need on the one hand, and want to meet rising demand when needed on the other hand.

**AWS Regions**

AWS has the concept of a Region, which is a physical location around the world where we cluster data centers. We call each group of logical data centers an Availability Zone. Each AWS Region consists of a minimum of three, isolated, and physically separate AZs within a geographic area.

**Availability Zones**

An Availability Zone (AZ) is one or more discrete data centers with redundant power, networking, and connectivity in an AWS Region.
AZs give customers the ability to operate production applications and databases that are more highly available, fault tolerant, and scalable than would be possible from a single data center. All AZs in an AWS Region are interconnected with high-bandwidth, low-latency networking, over fully redundant, dedicated metro fiber providing high-throughput, low-latency networking between AZs. All traffic between AZs is encrypted. The network performance is sufficient to accomplish synchronous replication between AZs. AZs make partitioning applications for high availability easy. If an application is partitioned across AZs, companies are better isolated and protected from issues such as power outages, lightning strikes, tornadoes, earthquakes, and more. AZs are physically separated by a meaningful distance, many kilometers, from any other AZ, although all are within 100 km (60 miles) of each other. + + +**AWS Edge Locations** + +Edge locations are endpoints for AWS which are used for **caching content** and used as Content delivery network (CDN). + +This consists of Amazon Cloud front (CF).There are many more edge locations than regions (217 Points of Presence (205 Edge Locations and 12 Regional Edge Caches)) across globe. + +Edge locations serve requests for CloudFront and Route 53. CloudFront is a content delivery network, while Route 53 is a DNS service. Requests going to either one of these services will be routed to the nearest edge location automatically. **This allows for low latency no matter where the end user is located**. + +**AWS Local Zones** + +AWS Local Zones allow you to use select AWS services, like compute and storage services, closer to more end-users, providing them very low latency access to the applications running locally. + +AWS Local Zones are also connected to the parent region via Amazon’s redundant and very high bandwidth private network, giving applications running in AWS Local Zones fast, secure, and seamless access to the rest of AWS services. + +AWS Local Zones have their own connection to the internet and support AWS Direct Connect, so resources created in the Local Zone can serve **local end-users** with very low-latency communications. + +# Policy and Role + +Users can manage access in AWS through the creation of policies and then associating them with IAM identities or AWS resources. The policy is an AWS object that defines permissions of identity or resource, with which it associates. + +AWS undertakes an evaluation of these policies upon the request by a principal entity such as user or role. + +[docs](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies.html#Overview%20of%20Json%20Policies) + +Elastic Beanstalk or Elastic Container service ? + +EB vs ECS really comes down to control. Do you want to control your scaling and capacity or do you want to have that more abstracted and instead focus primarily on your app. ECS will give you control, as you have to specify the size and number of nodes in the cluster and whether or not auto-scaling should be used. With EB, you simply provide a Dockerfile and EB takes care of scaling your provisioning of number and size of nodes, you basically can forget about the infrastructure with the EB route. 
Here's the EB documentation on Docker: [http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/create_deploy_docker.html](http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/create_deploy_docker.html)

With ECS you'll have to build the infrastructure first before you can start deploying the Dockerfile, so it really comes down to 1) your familiarity with infrastructure and 2) the level of effort that you want to spend on the infrastructure vs the app. \ No newline at end of file diff --git a/content/Cloud/AWS/AWS Cloud AWS Cloud Practitioner Study Plan.md b/content/Cloud/AWS/AWS Cloud AWS Cloud Practitioner Study Plan.md new file mode 100644 index 000000000..7f862f414 --- /dev/null +++ b/content/Cloud/AWS/AWS Cloud AWS Cloud Practitioner Study Plan.md @@ -0,0 +1,8 @@

* AWS Cloud Practitioner Udemy Course [url](https://havelsan.udemy.com/course/aws-certified-cloud-practitioner-training-course/learn/lecture/26140426#overview)
* Question Practice [url](https://havelsan.udemy.com/course/practice-exams-aws-certified-cloud-practitioner/learn/quiz/4915789#overview)
* Study Guide book [https://github.com/mohankumarbm/aws-ccp-certification](https://github.com/mohankumarbm/aws-ccp-certification)
* Github Markdown Notes [url](https://github.com/kennethleungty/AWS-Certified-Cloud-Practitioner-Notes)
* Cheatsheet [url](https://digitalcloud.training/category/aws-cheat-sheets/aws-cloud-practitioner/)
* This book [url](https://lib-5jhezsvfqkepb7glrqx6ivwm.1lib.me/book/5240517/361808)
* Learn AWS in a Month of Lunches \ No newline at end of file diff --git a/content/Cloud/AWS/AWS Projects.md b/content/Cloud/AWS/AWS Projects.md new file mode 100644 index 000000000..97ad52655 --- /dev/null +++ b/content/Cloud/AWS/AWS Projects.md @@ -0,0 +1,4 @@
* https://engineering.opsgenie.com/convert-radio-waves-to-alerts-using-sdr-aws-lambda-and-amazon-transcribe-7ba64f8eefa
* Create an AMI for AWS pentesting; build the image from scratch

* Create a price calculator for AWS with ChatGPT and open source ChatGPT frameworks, or just use Lambda functions like [this](https://alexanderhose.com/implementing-chatgpt-on-aws-a-step-by-step-guide/) \ No newline at end of file diff --git a/content/Cloud/AWS/AWS TODOS.md b/content/Cloud/AWS/AWS TODOS.md new file mode 100644 index 000000000..96762cbf0 --- /dev/null +++ b/content/Cloud/AWS/AWS TODOS.md @@ -0,0 +1,2 @@
DEVOPS AWS
https://www.edx.org/xseries/aws-devops-on-aws?index=product&queryID=f7dea23ade2982a1e0b7acacf47ae165&position=1 diff --git a/content/Cloud/AWS/Cloud Practitioner Book.md b/content/Cloud/AWS/Cloud Practitioner Book.md new file mode 100644 index 000000000..790b1d802 --- /dev/null +++ b/content/Cloud/AWS/Cloud Practitioner Book.md @@ -0,0 +1,57 @@
Study guide

 #Chapter1 Cloud

Since there's no human processing involved in cloud compute billing, it's as easy for a provider to charge a few pennies as it is thousands of dollars. This metered payment makes it possible to consider entirely new ways of testing and delivering your applications, and it often means your cost-cycle expenses will be considerably lower than they would if you were using physical servers running on-premises.

Comparing the costs of cloud deployments against on-premises deployments requires that you fully account for both capital expenses (capex) and operating expenses (opex). On-premises infrastructure tends to be very capex-heavy since you need to purchase loads of expensive hardware up front. Cloud operations, on the other hand, involve virtually no capex costs at all.
Instead, your costs are ongoing, consisting mostly of per-hour resource "rental" fees.

# Cloud Platform Models

* Infrastructure as a Service
You'll learn much more about these examples later in the book, but AWS IaaS products include Elastic Compute Cloud (EC2) for virtual machine instances, Elastic Block Store (EBS) for storage volumes, and Elastic Load Balancing.

* Platform as a Service
AWS PaaS products include Elastic Beanstalk and Elastic Container Service (ECS).
* Software as a Service
While some may disagree with the designation, AWS SaaS products arguably include Simple Email Service and Amazon WorkSpaces.

![[Screenshot from 2023-03-23 10-00-42.png]]

* Serverless
The serverless model—as provided by services like AWS Lambda—makes it possible to design code that reacts to external events. When, for instance, a video file is uploaded to a repository (like an AWS S3 bucket or even an on-premises FTP site), it can trigger a Lambda function that will convert the file to a new video format. There's no need to maintain and pay for an actual instance running 24/7, just for the moments your code is actually running. And there's no administration overhead to worry about.

While the precise layout and organization will change over time, as of this writing the main AWS documentation page can be found at https://docs.aws.amazon.com. There you'll find links to more than 100 AWS services along with tutorials and projects, software development kits (SDKs), toolkits, and general resources.

https://aws.amazon.com/premiumsupport/knowledge-center/ is basically a frequently asked questions (FAQ) page that accidentally swallowed a family pack–sized box of steroids and then walked through the radioactive core of a nuclear power plant wearing wet pajamas. Or, in simpler terms, there's a lot of information collected here.

The page, found at https://aws.amazon.com/security/security-resources, points to AWS blogs, white papers, articles, and tutorials covering topics such as security best practices and encrypting your data in transit and at rest.

# AWS Global Infrastructure: AWS Regions
AWS performs its cloud magic using hundreds of thousands of servers maintained within physical data centers located in a widely distributed set of geographic regions.

Dividing resources among regions lets you do the following:
* Locate your infrastructure geographically closer to your users to allow access with the lowest possible latency
* Locate your infrastructure within national borders to meet regulatory compliance with legal and banking rules
* Isolate groups of resources from each other and from larger networks to allow the greatest possible security

AWS Shared Responsibility Model ==> the security and integrity of the resources you run in the cloud are your responsibility, but the security of the cloud itself is managed by AWS.

Comparing the costs of cloud deployments against on-premises deployments requires that you fully account for both capital expenses (capex) and operating expenses (opex). On-premises infrastructure tends to be very capex-heavy since you need to purchase loads of expensive hardware up front. Cloud operations, on the other hand, involve virtually no capex costs at all. Instead, your costs are ongoing, consisting mostly of per-hour resource "rental" fees.
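To make the capex vs. opex point concrete, here is a tiny back-of-the-envelope comparison. Every number in it (hardware price, yearly running costs, hourly rate, utilization) is invented for illustration and is not AWS pricing.

```python
# Hypothetical 3-year cost comparison: on-premises (capex-heavy) vs. cloud (opex-only).
YEARS = 3
HOURS_PER_YEAR = 24 * 365

# On-premises: large upfront purchase plus yearly running costs (all numbers invented).
server_capex = 12_000          # hardware bought up front
onprem_opex_per_year = 3_000   # power, cooling, rack space, maintenance
onprem_total = server_capex + onprem_opex_per_year * YEARS

# Cloud: no upfront purchase, pay per hour only while the instance runs.
hourly_rate = 0.20             # invented per-hour "rental" fee
utilization = 0.40             # instance only needed 40% of the time
cloud_total = hourly_rate * HOURS_PER_YEAR * utilization * YEARS

print(f"On-premises, {YEARS} years: ${onprem_total:,.0f}")
print(f"Cloud,       {YEARS} years: ${cloud_total:,.0f}")
# The cloud bill tracks actual usage: halve the utilization and the cost halves,
# whereas the on-prem capex is sunk regardless of how busy the server is.
```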
![[Screenshot from 2023-04-03 22-53-28.png]]

IaaS (Infrastructure as a Service) ==> AWS EC2, Elastic Block Store (EBS)
PaaS (Platform as a Service) ==> AWS Elastic Beanstalk and Elastic Container Service (ECS)
SaaS (Software as a Service) ==> Simple Email Service, Amazon WorkSpaces
Serverless model ==> Lambda

# Scalability and Elasticity

A scalable service will automatically grow in capacity to seamlessly meet any changes in demand. A large cloud provider like AWS will, for all practical purposes, have endless available capacity, so the only practical limit to the maximum size of your application is your organization's budget.

Elasticity: The reason the word elastic is used in the names of so many AWS services (Elastic Compute Cloud, Elastic Load Balancing, Elastic Beanstalk, and so on) is because those services are built to be easily and automatically resized.

Understand how scalability allows applications to grow to meet need. A cloud-optimized application allows for automated provisioning of server instances that are designed from scratch to perform a needed compute function within an appropriate network environment. Understand how elasticity matches compute power to both rising and falling demand. The scaling services of a cloud provider—like AWS Auto Scaling—should be configured to force compliance with your budget and application needs. You set the upper and lower limits, and the scaler handles the rest.

diff --git a/content/Cloud/cloud index.md b/content/Cloud/cloud index.md new file mode 100644 index 000000000..99a1ef77a --- /dev/null +++ b/content/Cloud/cloud index.md @@ -0,0 +1 @@ #index \ No newline at end of file diff --git a/content/Devops&DevSecOps/Azure DevOps.md b/content/Devops&DevSecOps/Azure DevOps.md new file mode 100644 index 000000000..a66353ed3 --- /dev/null +++ b/content/Devops&DevSecOps/Azure DevOps.md @@ -0,0 +1,115 @@

A tool from Microsoft that provides version control, reporting, requirements management, project management, automated builds, testing, and release capabilities.

### Continuous Integration

- Automated tests make sure that bugs are captured in the early phases, and fewer bugs reach the production phase.
- After the issues are resolved efficiently, it becomes easy to build the release.
- Developers are alerted when they break any build, so they have to rebuild and fix the build before moving on to the next task.
- As Continuous Integration can run multiple tests within seconds, the cost of testing decreases drastically.
- When less time is invested in testing, more time can be spent on improving quality.
### Continuous Delivery
- The process of deploying software is no longer complex, and the team does not need to spend a lot of time preparing a release anymore.
- Releases can be made more frequently, which in turn speeds up the feedback loop with customers.
- Iterations become faster.
### Continuous Deployment
- There is no need to stop development for releases anymore, as the entire deployment process is now automated.
- The release process is less prone to risk and is easily fixable in the case of any issues, as only small batches of changes are deployed.
- There is a continuous chain of quality improvements every day; development no longer happens in long cycles of a month or a year.
Continuous Delivery vs. Continuous Deployment
Continuous Delivery is a software engineering practice where code changes are prepared so that they can be released at any time.
Continuous Deployment aims at continuously releasing code changes into the production environment.

# Azure pipelines

* **Build pipelines**:
These take instructions from a YAML file and build and publish artifacts from the cloned source code.
* **Release pipelines**
These pipelines deploy build artifacts to agent machines.
* **Create release**
This provides the complete end-to-end CI/CD pipeline.

Example Azure YAML templates [url](https://github.com/microsoft/azure-pipelines-yaml)

Azure Boards supports Agile boards.

# Azure DevSecOps [URL](https://havelsan.udemy.com/course/devsecops-with-azure-devops/learn/lecture/33386494#overview)

![[Screenshot from 2023-03-13 14-15-06.png]]

* [[SAST(Static Application Security testing)]]
* [[SCA (Software Composition Analysis)]]
* [[DAST (Dynamic Application Security Testing)]]
* [[IAST(Interactive Application Security Testing)]]
* [[IAC(infrastructure as code)]]
* [[API Security]]

The shift-left approach is the DevSecOps approach.

## Development stage
* Git secrets
* Security plugins in the IDE
* TruffleHog (has an enterprise license), similar to git-secrets

## Security
* Code quality tools (SonarQube)
* SAST security tools (Fortify, Veracode, Checkmarx)
* SCA tools (Snyk, Veracode, Fortify, Black Duck)
* DAST tools (OWASP ZAP, WebInspect, Veracode DAST, Acunetix)
* IaC tools (Snyk, Bridgecrew)
* Container security (Aqua, Qualys, Prisma Cloud)

## Operations

* Build pipeline tools (Jenkins, AWS, GCP Cloud Build, Azure DevOps, GitHub Actions, GitLab)
* Cloud security posture (Aqua, Bridgecrew)
* Container registry scanning tools (Aqua, AWS native registry scanning)
* Infrastructure scanning tools (Chef InSpec (compliance), Nessus)
* Cloud security (Azure Defender, AWS Security Hub)

# DevSecOps in Azure DevOps

![[Screenshot from 2023-03-13 14-34-07.png]]

Take a look at the repository section:

https://github.com/asecurityguru/just-another-vulnerable-java-application

Added Azure DevOps YAML ==>
https://github.com/asecurityguru/devsecops-azure-devops-simple-yaml-file-repo

# SonarCloud

SaaS code quality and security tool. #todos/recordingangel

SonarCloud custom quality gate ==> add it to the Azure YAML for the DevSecOps pipeline.

Use section 4 of the course for examples showing custom quality gates.

**Need to add a quality gate for our pipeline** (see the gate script sketch below, after the Snyk list).

Use the environment section in Azure DevOps for the token referenced in the YAML.

# Snyk

Snyk is a SaaS tool that scans:

* Source code
* Open source / third-party libraries
* Containers
* Infrastructure as Code
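The quality-gate note above can be implemented as a small script step in the Azure Pipelines YAML that fails the build when a scanner reports serious findings. The sketch below is generic and hypothetical: `scan-report.json` and its `severity` field stand in for whatever format your SAST/SCA tool (SonarCloud, Snyk, etc.) actually emits.

```python
"""Minimal security/quality gate: fail the pipeline on high-severity findings."""
import json
import sys

REPORT_PATH = "scan-report.json"   # hypothetical path produced by an earlier scan step
BLOCKING = {"critical", "high"}    # severities that should break the build

with open(REPORT_PATH) as f:
    findings = json.load(f)        # assumed format: a list of {"id", "severity", "title"}

blocking = [x for x in findings if x.get("severity", "").lower() in BLOCKING]

for finding in blocking:
    print(f'BLOCKING: [{finding["severity"]}] {finding.get("title", finding.get("id"))}')

if blocking:
    print(f"Quality gate failed: {len(blocking)} blocking finding(s).")
    sys.exit(1)                    # a non-zero exit code fails the Azure DevOps job

print("Quality gate passed.")
```

In the pipeline YAML this would run as a script step right after the scan task, with any scanner token supplied from the pipeline's variable/environment settings rather than hard-coded.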
# OWASP ZAP

diff --git a/content/Devops&DevSecOps/Detect Secrets in source code.md b/content/Devops&DevSecOps/Detect Secrets in source code.md new file mode 100644 index 000000000..6308f712d --- /dev/null +++ b/content/Devops&DevSecOps/Detect Secrets in source code.md @@ -0,0 +1,25 @@
https://www.cybersecasia.net/tips/nine-devsecops-scanning-tools-to-keep-the-bad-guys-at-bay

**[gitLeaks](https://github.com/zricethezav/gitleaks)**

**[Git-Secrets](https://github.com/awslabs/git-secrets)**

**[Whispers](https://github.com/Skyscanner/whispers)**

**[GitHub Secret Scanning](https://docs.github.com/en/developers/overview/secret-scanning)**

**[GittyLeaks](https://github.com/kootenpv/gittyleaks)**

**[Scan](https://slscan.io/)**

**[Git-all-secrets](https://github.com/anshumanbh/git-all-secrets)**

**[Detect-secrets](https://github.com/Yelp/detect-secrets)**

**[SpectralOps](https://spectralops.io/)**

https://github.com/Comcast/xGitGuard

[TruffleHog](https://github.com/trufflesecurity/trufflehog)

diff --git a/content/Devops&DevSecOps/DevOps.md b/content/Devops&DevSecOps/DevOps.md new file mode 100644 index 000000000..ea86e0bab --- /dev/null +++ b/content/Devops&DevSecOps/DevOps.md @@ -0,0 +1,10 @@
[[Efective DevOps Building a Culture of Collaboration, Afnity, and Tooling at Scale]]

[[Kubernetes]]

# Books to Read
* The Phoenix Project by Gene Kim
* Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation
* Hands-On Security in DevOps
* DevOpsSec book

diff --git a/content/Devops&DevSecOps/DevSecOps Article.md b/content/Devops&DevSecOps/DevSecOps Article.md new file mode 100644 index 000000000..0e349ef9e --- /dev/null +++ b/content/Devops&DevSecOps/DevSecOps Article.md @@ -0,0 +1,36 @@
## Design and Practice of Security Architecture via DevSecOps Technology DOI:10.1109/ICSESS54813.2022.9930212

![[Screenshot from 2023-03-15 10-31-39.png]]

![[Screenshot from 2023-03-15 10-31-59.png]]

The DevSecOps architecture design is divided into 10 phases.

The DevSecOps architecture is designed to meet the leading international cloud native security 4C model (CNCF standard: cloud, cluster, container, code) and the security development lifecycle (Microsoft standard) evaluation system. Across the two areas of R&D performance and security, security is introduced into every stage of the R&D process (DORA Level 5 standard: integrate security in the requirements, design, build, test, and deployment phases).

![[Screenshot from 2023-03-15 10-41-07.png]]

## Implementation of DevSecOps by Integrating Static and Dynamic Security Testing in CI/CD Pipelines DOI:10.1109/ICOSNIKOM56551.2022.10034883

https://github.com/lianahq/skinner ==> a Python script named Skinner that performs automated security testing with Burp Suite Pro on the GitLab CI pipeline using the DevSecOps implementation procedure.
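The OWASP ZAP heading above is still a stub, and the paper just cited wires DAST into a GitLab CI pipeline with Burp Suite Pro. As a free stand-in for Burp, ZAP can be driven the same way from a pipeline job; below is a minimal sketch using the `python-owasp-zap-v2.4` client. It assumes a ZAP daemon is already listening on 127.0.0.1:8080 with the given API key, and the target URL is a placeholder for a test application you are allowed to scan.

```python
import time
from zapv2 import ZAPv2  # pip install python-owasp-zap-v2.4

TARGET = "http://localhost:3000"   # placeholder: a test app you are allowed to scan
zap = ZAPv2(apikey="changeme",
            proxies={"http": "http://127.0.0.1:8080",
                     "https": "http://127.0.0.1:8080"})

zap.urlopen(TARGET)                # make sure the target is in ZAP's site tree

# Spider (crawl) the target, then wait for the crawl to finish.
scan_id = zap.spider.scan(TARGET)
while int(zap.spider.status(scan_id)) < 100:
    time.sleep(2)

# Actively scan the crawled URLs.
scan_id = zap.ascan.scan(TARGET)
while int(zap.ascan.status(scan_id)) < 100:
    time.sleep(5)

# Report the alerts; a CI gate could fail the job on High risk alerts.
alerts = zap.core.alerts(baseurl=TARGET)
high = [a for a in alerts if a.get("risk") == "High"]
print(f"{len(alerts)} alerts, {len(high)} high risk")
```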
+ +## Challenges and solutions when adopting DevSecOps: A systematic review [https://doi.org/10.1016/j.infsof.2021.106700](https://doi.org/10.1016/j.infsof.2021.106700 "Persistent link using digital object identifier") + + + +![[Screenshot from 2023-03-15 13-01-09.png]] +# Challanges About DevSecOps +![[Screenshot from 2023-03-15 13-41-20.png]] +![[Screenshot from 2023-03-15 14-49-53.png]] +![[Screenshot from 2023-03-15 14-50-13.png]]![[Screenshot from 2023-03-15 14-52-15.png]] + diff --git a/content/Devops&DevSecOps/DevSecOps Sans.md b/content/Devops&DevSecOps/DevSecOps Sans.md new file mode 100644 index 000000000..5a868e192 --- /dev/null +++ b/content/Devops&DevSecOps/DevSecOps Sans.md @@ -0,0 +1,206 @@ + +User access keys provide programmatic access to AWS services using the CLI, PowerShell Tools, and SDKs: +• Allow full access to AWS under the user's assigned group/policies +• Do NOT hard-code access keys in source code +• Do NOT check in to source control repositories +• Store in the AWS credentials file, user environment variables, or not at all (more on this later) +• Rotate access keys regularly +• Remove access keys as part of employee off boarding process + +### Docker Image to AMI [URL](https://stackoverflow.com/a/45146861) + +### Passwords for VM SANS +Username: student Password: StartTheLabs + +• Development: rapid and frictionless delivery of features through Agile and Lean methods, by small colocated teams, Continuous Integration, work managed by sticky notes on a wall, “working software over documentation” +• Operations: minimize firefighting and downtime, maximize stability and efficiency by following ITSM governance frameworks (ITIL, COBIT), rigorous change management, using standardized technology, configuration management, work managed in ticketing systems +• Security and Compliance: risk-focused, assurance of controls through stage gates, point-in-time audits, pen testing, spreadsheets, and checklists + +DevOpsSec: break down the barriers with security and compliance + +_Dogfooding_ is short for "Eating your own dog food," which represents the practice of using your own products.  For software developers, that means working with, as a real user, the applications you're building, or at least working closely with people who do use it.  Dogfooding provides a number of advantages, both marketing and technical. + +Three key characteristics of DevOps unicorns: +1. Omnipresent culture: around values of accountability, continuous learning, collaboration, and experimentation. High levels of patience, trust, ethics, and empowerment. Little patience for waste and inefficiency in decision making and bureaucracy. +2. Technology savvy, customer-obsessed business leadership. Executives at all levels fully understand the importance of technology to their success. +3. Optimized organizational structure: prepared to rethink structure, staffing, performance metrics, and ownership + + + + +![[Screenshot from 2023-07-27 21-15-52.png]] + +Security Development Lifecycle (SDL): Microsoft has had a version of its SDL since 2004. However, in 2008, they began publishing/releasing their SDL, and many companies have used their SDL to model their own internal secure development efforts. They have also released tools to assist with the security activities within the SDL, such as the Attack Surface Analyzer (https://www.microsoft.com/en-us/download/details.aspx?id=24487) and the Microsoft Threat Modeling Tool (https://www.microsoft.com/en-us/download/details.aspx?id=49168). 
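The SANS notes above recommend rotating user access keys regularly and removing them when employees are offboarded. A minimal boto3 sketch of that kind of audit follows; the 90-day threshold is an assumption, and the deactivation call is commented out so the script stays read-only by default.

```python
from datetime import datetime, timezone

import boto3

MAX_AGE_DAYS = 90                    # assumed rotation policy, adjust to your own
iam = boto3.client("iam")

for page in iam.get_paginator("list_users").paginate():
    for user in page["Users"]:
        keys = iam.list_access_keys(UserName=user["UserName"])["AccessKeyMetadata"]
        for key in keys:
            age = (datetime.now(timezone.utc) - key["CreateDate"]).days
            if key["Status"] == "Active" and age > MAX_AGE_DAYS:
                print(f'{user["UserName"]}: key {key["AccessKeyId"]} is {age} days old')
                # To enforce the policy, deactivate the key (intentionally commented out):
                # iam.update_access_key(UserName=user["UserName"],
                #                       AccessKeyId=key["AccessKeyId"],
                #                       Status="Inactive")
```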
+ + +Amazon has developed an extensive set of cloud-based security services that are available to users of AWS +• IAM, CloudWatch, CloudTrail, Trusted Advisor, Inspector, DDOS protection, KMS, managed WAF… Shared Responsibility Model +• Understand and separate what Amazon is responsible for and what the customer is responsible for +• You are responsible for using AWS capabilities correctly AWS Cloud Compliance +• Certified operating environments for finance, healthcare, government, PCI +• Higher SLAs and detailed guidance + + +CAMS (or CALMS) is a common lens for understanding DevOps and for driving DevOps change Your organization succeeds when it reaches “CALMS” +• Culture: people come first +• Automation: rely on tools for efficiency and repeatability +• Lean: apply Lean engineering practices to continuously improve +• Measurement: use data to drive decisions and improvements +• Sharing: share ideas, information, and goals across silos + +Developers are lazy, so make it easy for them to do the right thing. Make systems safe by default: Provide safe libraries and safe default configurations in templates and make them available to engineers. Bake security into base images and watch closely when base (“gold”) images are changed. Publish and evangelize safe patterns. • Engineering autonomy: Provide developers with self-service tools so that they can take responsibility for security in whatever they are working on. • Undifferentiated heavy lifting: Work with Amazon AWS to provide high-quality, safe infrastructure as a service, and leverage the cloud provider’s built-in capabilities to scale efficiently. Take advantage of AWS (cloud) APIs to do security work: snapshot drive for forensic analysis, change firewall config, inventory systems… • Scale engineering (and security) through extensive automation. • Eliminate snowflake configurations through standard deployment tools and templates. • Microservices: Assess risks at the service level and provide transparency to teams. • Continuous Deployment: Hundreds of small changes are made every day, which means that there are many chances for making small errors, so…. • Trust, but verify. No security gates or change review boards. Extensive checks in test and production (security, compliance, reliability…). + +In DevOps, the goal is to automate as much of the work as possible through code. Get everything out of paper (policies, procedures, run books, checklists) and spreadsheets and into code that can be reviewed, scanned, tracked, and tested. + +All code needs to be checked in to a source code control system/repository—if possible, a common repository or set of repositories shared by dev and ops—not just application code and unit tests written by developers, but database schemas, application configuration specifications, documentation, build and deployment scripts, operational scripts, job schedules, and everything needed to set up, deploy, and run the system, from the bare metal up (configuration cookbooks or manifests and associated tests, hardening templates… + +MTTR: Mean Time to Recover or Repair from a failure. Together with Change Failure Rate, this measures the reliability/quality of service and availability. Some teams may want to separately track, and optimize for, MTTD—Mean Time to Detect a failure—so that they can look for ways to identify problems quickly. Note that many DevOps teams do not measure or optimize for MTTF (Mean Time to Failure) because they recognize that failures will happen. 
Instead, they work on trying to minimize the impact and cost of failures. See John Allspaw: https://www.kitchensoap.com/2010/11/07/mttr-mtbf-formost-types-of-f/ + +Change Lead Time or Cycle Time. The average time it takes to get a change or fix into production, which is a key metric for DevOps teams (and Lean teams) to optimize for. This can be measured from three points: 1. Change cycle time: from when a change was requested by the business to when it is deployed. This looks at the full value stream, both upstream and downstream of development. 2. Development change lead time: from when development starts to when the change is deployed (a subset of the change cycle time, which focuses on speeding up development, testing, and deployment) 3. Deployment lead time: from when development is finished to when the change is deployed (the tail end of the change cycle time, which focuses on speeding up acceptance testing, change control, and deployment) + +## Security measurement +Measure automated test coverage for high-risk code • Track # of vulnerabilities found… and where they were found in the pipeline • Track # of vulnerabilities fixed • How long vulnerabilities remain open (window of exposure) • Type of vulnerability (OWASP Top 10) for Root Cause Analysis • Elapsed time for security testing—make feedback loops as short as possible • False positives versus true positives—improve quality of feedback • Vulnerability escape rate to production. + +Continuous Deployment: from 2x/week to 50x/day • Engineers push (small) changes to production on their first day on the job A “Just Culture” shared across the organization • Blameless Postmortems (and Morgue) It is safe to make mistakes—as long as you own them and help fix them • Security Outreach: don’t be a jerk to developers Measure Everything: data-driven learning and decisions • If in doubt, measure it: engineers are “addicted to data porn” • Make data visible: Etsy “worships at the church of graphs” • Use real data to improve security: “attack-driven defense”. + +Automatically monitor changes to high-risk code: why is somebody changing crypto or authentication functions? + +Attack-driven (and data-driven) defense: monitor attack activity at the application level in production, and use this to prioritize testing and defensive actions. What kind of attacks are you seeing? Replay these attacks to see which ones are succeeding. Make information about security attacks visible to everyone in engineering and ops. + + +Technology: How do you manage risks in new, rapidly evolving platforms and architectures such as microservices, cloud, containers, serverless? Integrity: Is there enough time to fully test and review changes before they make it to production? Availability: Does frequent change increase chances of failure? Confidentiality: In “you build it, you run it”, how do you control developer access to production data? + +# DevOps Kata - Single Line of Code + +Since DevOps is a broad topic, it can be difficult to determine if a team has enough skills and is doing enough knowledge sharing to keep the [Bus Factor](http://en.wikipedia.org/wiki/Bus_factor) low. It can also be difficult for someone interested in learning to know where to start. I thought I’d try to brainstorm some DevOps katas to give people challenges to learn and refine their skills. If you’re worried about your bus factor, challenge less experienced team members to do these katas, imagining the senior team members are unavailable. 
+ +## Single Line of Code + +Goal: Deploy a change that involves a single line of code to production. + +The Deployment Kata is also a useful tool for compliance and governance. By deploying a simple easy-tofollow change, you can walk auditors through how patches, upgrades, and other changes are made to a system, showing them all of the steps and tests, and letting them review the build artifacts and evidence created along the path. + + +Opportunities for security testing in Continuous Integration are limited because of the rapid cycle time in CI. Testing in CI is designed to catch regressions on a code change. In order to encourage fast feedback to developers, the entire check-in and build/test cycle has to complete within a few minutes at most, which means that tests have to execute quickly and cannot require complex setup. All of the tests that execute in CI have to provide unambiguous pass/fail results. Flakey tests, and tests that may return false positives, will be ignored by development teams. There is no time for comprehensive static or dynamic scanning in Continuous Integration. + + + +CI often includes at least some basic static analysis (checks for hardcoded credentials, dangerous functions, dependency checks) and incremental static analysis checking if this is supported by the tools that you are using + + +Smoke testing, also called _build verification testing_ or _confidence testing_, is a software testing method that is used to determine if a new software [build](https://www.techtarget.com/searchsoftwarequality/definition/build) is ready for the next testing phase. This testing method determines if the most crucial functions of a program work but does not delve into finer details. + + +## CD + +Pipeline model and control framework built on/extending Continuous Integration and Agile build/test practices • Uses latest good build from CI, packages for deployment, and release • Changes are automatically pushed to test/staging environments to conduct more realistic/comprehensive tests • Can insert manual reviews/testing/approvals between pipeline stages • Log steps and results to provide audit trail from check-in to deploy • Any failures will “stop the line”: No additional changes can be accepted until the failure is corrected • Ensures that code is always ready to be deployed: Changes may be batched up before production release + +A CD workflow could consist of the following steps: +1. IDE checking for coding/security mistakes as code is entered/changed +2. Pre-commit code reviews +3. Pre-commit smoke test +4. Commit build in CI with fast feedback to developers: SAST (incremental), automated unit tests with code coverage failure, integration sanity checks (some of these steps could be done in parallel) +5. Software Component Analysis (SCA) on open-source components to identify code with known vulnerabilities (some SCA tools will also check for licensing risks) +6. Alert on high-risk code changes (e.g., unit tests that check hash value of code, or quick scanning for dangerous functions) which require review by InfoSec +7. Store binaries, configuration files, and other artifacts in repository +8. Deploy to acceptance test environment (configure and stand up test systems using Puppet/Chef, Terraform, Docker…) and run post-deployment asserts/smoke tests +9. Automated acceptance and integration testing +10. Automated performance and load tests (in parallel) +11. Automated dynamic (DAST) scans—with clear pass/fail criteria +12. 
Deploy to staging using same deployment tools and instructions as acceptance test—and run postdeployment asserts/smoke tests +13. Environmental and data migration tests +14. Code is now ready to be deployed to production +15. Environmental/data migration checks +16. Operations tests +17. Code is ready to be deployed and released to production +![[Screenshot from 2023-07-29 17-06-55.png]] + +![[Screenshot from 2023-07-29 17-09-50.png]] + + + +Blue/Green Deployment is a pattern for managing Continuous Deployment. You run two different environments (one “blue”, one “green”) in production. The blue environment is active. Changes are rolled out to the green environment. Once the changes are deployed and the green environment is running and warmed up, load balancing is used to reroute traffic from the blue to the green environment. Once all customer traffic has been routed to the green environment, the blue environment is available to be updated. + + +Canary Releasing (https://martinfowler.com/bliki/CanaryRelease.html) Another technique to minimize the impact and risk of Continuous Deployment is “canary releasing”. Changes are pushed to one server and carefully monitored to ensure that the update was done correctly, and everything is running as expected. Then the change is pushed to two servers and checked, then ten servers and checked again, then half of the servers, before finally being pushed to all servers. At any point, if monitoring or other checks determine that the change is not working as expected, the change can be automatically rolled back, or the roll out can be halted until a fix is rolled out + +Before deployment, check that operational dependencies are correct After deployment, ensure that the system is set up and running correctly • Simple, end-to-end tests of core functions using test data/simulated transactions • Ensure that all connections are running • Check that monitoring functions are working correctly • Configuration checks • Version/dependency checks • Basic runtime security smoke tests to catch obvious mistakes + +![[Screenshot from 2023-07-29 17-18-38.png]] + + + +CD PIPELINE RULES + +1. Use the CD pipeline for all changes to all environments: changes to code, infrastructure, and runtime configuration 2. Build the binaries once (and protect them) 3. Keep development and test environments in sync with production (as closely as possible) 4. Isolate differences between environments in runtime variables 5. Stop if any step fails—and fix it immediately 6. Run smoke tests/checks after every deployment 7. Automate repetitive/expensive work 8. Timestamp and record every step +1. Use the CD Pipeline for all changes to all environments: code changes, changes to runtime configuration, changes to infrastructure. 2. Build binaries once. Version them and sign them or otherwise protect them to ensure that they are not tampered with along the pipeline stages. 3. Use automated configuration management to set up development and test environments to match production (as closely as possible) and to keep all environments in sync. 4. Isolate differences between environments (test, acceptance, staging, production…) in runtime variables that are supplied to the configuration. 5. If any step fails, stop the line. Based on Toyota’s Lean Manufacturing principles: if something is wrong, pull the “Andon Cord”. 6. Run automated health checks/smoke tests after every deployment or configuration change. 7. Automate repetitive and expensive work wherever possible—minimize manual steps and checks. 8. 
Audit everything, taking advantage of logs from automated tools. Protect and archive these logs to ensure integrity. + +![[Screenshot from 2023-07-29 17-27-10.png]] + + +production runtimes are immutable, and nobody has direct access to production servers. Every change (to applications and to infrastructure) must be checked in to source control and deployed through automated pipelines. All pipelines must be identified and registered in advance. Every change must be peer reviewed and must pass several levels of testing and scanning. + +https://www.cloudbees.com/blog/blue-ocean-makes-creating-pipelines-fun-and-intuitive-teams-doing-continuous-delivery + +Some security tools can’t be easily automated in pipelines—simpler tools that are API-driven work best • Some checks take too long and have to be done out of band • Get Security, Dev, and Ops working together to solve problems • Help engineers to write their own tests + +![[Screenshot from 2023-07-29 23-22-40.png]] + +In many cases, the “code is the design”, which means that to understand the design, people need to be able to read the code. And this also means that the design changes as the code changes—which is often. + +CODE IS DESIGN + +This makes it difficult for InfoSec to understand where and how they can review the design for security risks. How do you do threat modeling of the design when the design is never finished and is always changing? + + +Tools to help perform rapid risk assessments: +• PayPal risk questionnaire for new apps/services +• Mozilla Rapid Risk Assessment (RRA) model: 30-minute review +• Slack goSDL for questions to determine initial risk rating + + +High-level, basic risk assessments should be done in upfront platform selection and architecture decisions. This should focus on: +• Data classification: What data is sensitive, restricted, or confidential and needs to be protected? What are the legal/compliance restrictions and obligations (for auditing, archival, encryption…)? + +• Security risks in platform choice (OS, cloud platform), data management solutions (SQL or NoSQL), languages, and frameworks. The team needs to understand their tools and how to use them properly. + +• CD toolchain support: What scanning (DAST, SAST, IAST) tools and other test tools are available based on the language(s) and platform that the team is using? + + +Ask these questions when you are making changes (based on SAFECode’s Tactical Threat Modeling Guide): +1. Are you changing the attack surface? +2. Are you changing the technology stack? +3. Are you changing application security controls? +4. Are you adding confidential/sensitive data? +5. Are you modifying high-risk code? 
+https://safecode.org/safecodepublications/tactical-threat-modeling/
+
+### Version Control
+* Local (e.g., RCS, SCCS)
+* Client-Server (e.g., CVS, Subversion)
+* Distributed (e.g., git, mercurial)
+
+![[Screenshot from 2023-07-29 23-41-33.png]]
+
+**Code ownership is a model in which a developer or team of developers is responsible for a specific piece of code within a software project.** Code owners can be defined in the special file named CODEOWNERS. People with write permissions for the repository can create or edit the CODEOWNERS file and be listed as code owners. People with admin or owner permissions can require that pull requests be approved by code owners before they can be merged.
+
+Take advantage of engineering teams that are "test obsessed":
+* Ensure high levels of unit test coverage for high-risk code
+* Review unit tests as well as code when changes are made
+* Use "OWASP User Security Stories", "Abuse Cases", and OWASP ASVS Verification Requirements to come up with test cases (more later)
+* Make tests count: too many tests will make it expensive to change code
+* Red means STOP—ensure the team does not ignore/remove broken tests
+* Write unit tests first when fixing vulnerabilities
+* Leverage unit tests to refactor buggy/complex code—cover the code in tests, then clean it up in small steps
+* Use unit tests to alert on changes to high-risk code (more later)
+
+![[Screenshot from 2023-07-30 00-45-44.png]]
+
+![[Screenshot from 2023-07-30 01-44-31.png]]
+
+Evil User Story: As a software engineer, I shall not be able to deploy high-risk code to production without a security review.
+
+High-risk code includes:
+* security controls (authentication, password handling, access control, output encoding libraries, data entitlement checks, user management, crypto methods)
+* admin functions
+* application code that works with private data
+* runtime frameworks
+* public network-facing APIs
+* legacy code that is known to be tricky to change (high complexity…) or that is known to be buggy
+* release/deployment scripts or tooling
+
+Many organizations (especially large enterprises) operate a centralized "Scanning Factory" where code is scanned periodically, the results are triaged and reviewed by InfoSec and then submitted back to the development team for remediation. However, by this time the developers may have already moved on to other work, especially in Agile environments… and in Continuous Deployment, the code has already been deployed to production.
+
+![[Screenshot from 2023-07-30 02-02-49.png]]
+
 diff --git a/content/Devops&DevSecOps/DevSecOps.md b/content/Devops&DevSecOps/DevSecOps.md
new file mode 100644
index 000000000..4971353ae
--- /dev/null
+++ b/content/Devops&DevSecOps/DevSecOps.md
@@ -0,0 +1,222 @@
+Instead of trying to plan and design everything upfront, [[DevOps]] organizations are running continuous experiments and using data from these experiments to drive design and process improvements.
+
+* Taking advantage of new tools such as programmable configuration managers and application release automation to simplify and scale everything from design to build, deployment, and operations, and taking advantage of cloud services, virtualization, and containers to spin up and run systems faster and cheaper.
+
+DevOps:
+
+Infrastructure as Code:
+ * Tools such as Chef, Puppet, and Terraform increase the speed of building systems and scaling them.
+ * Full visibility into configuration details, control over configuration drift and elimination of one-off snowflakes, and a way to define and automatically enforce security policies at runtime (see the sketch after this section).
+
+Continuous Delivery:
+
+Continuous monitoring and measurement
+This involves creating feedback loops from production back to engineering, collecting metrics, and making them visible to everyone to understand how the system is actually used and using this data to learn and improve. You can extend this to security to provide insight into security threats and enable "Attack-Driven Defense."
+
+Learning From Failure
+
+* [[chaos engineering]], and practicing for failure in game days.
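+
+As a minimal sketch of "security policy enforced as code" from the Infrastructure as Code bullet above, here is an illustrative Ansible playbook (Ansible is one of the configuration tools named later in these notes; the host group and the specific settings are assumptions):
+
+```yaml
+# hardening.yml -- enforce a small SSH policy on every configuration run (illustrative only)
+- hosts: webservers
+  become: true
+  tasks:
+    - name: Disallow root login over SSH
+      ansible.builtin.lineinfile:
+        path: /etc/ssh/sshd_config
+        regexp: '^#?PermitRootLogin'
+        line: 'PermitRootLogin no'
+      notify: restart sshd
+
+    - name: Disallow password authentication over SSH
+      ansible.builtin.lineinfile:
+        path: /etc/ssh/sshd_config
+        regexp: '^#?PasswordAuthentication'
+        line: 'PasswordAuthentication no'
+      notify: restart sshd
+
+  handlers:
+    - name: restart sshd
+      ansible.builtin.service:
+        name: sshd
+        state: restarted
+```
+
+Because the playbook is idempotent, re-running it both detects and corrects configuration drift, which is exactly the property the bullet above describes.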
+
+Amazon has thousands of small ("two pizza") engineering teams working independently and continuously deploying changes across their infrastructure. In 2014, Amazon deployed 50 million changes: that's more than one change deployed every second of every day. So much change so fast... How can security possibly keep up with this rate of change?
+
+[[Lean principles]]
+
+DevOps is heavily influenced by Lean principles: maximizing efficiency and eliminating waste, delays, and unnecessary costs.
+
+Major security risks facing users of cloud computing services:
+
+1. Data breaches
+2. Weak identity, credential, and access management
+3. Insecure interfaces and APIs
+4. System and application vulnerabilities
+5. Account hijacking
+6. Malicious insiders
+7. Advanced Persistent Threats (APTs)
+8. Data loss
+9. Insufficient due diligence
+10. Abuse and nefarious use of cloud services
+11. Denial of Service
+12. Shared technology issues
+
+#microservice
+An individual [[microservice]] fits in your head, but the interrelationships among them exceed any human's understanding.
+
+Attack surface. The attack surface of any microservice might be tiny, but the total attack surface of the system can be enormous and hard to see.
+
+Unlike a tiered web application, there is no clear perimeter, no obvious "choke points" where you can enforce authentication or access control rules. You need to make sure that trust boundaries are established and consistently enforced.
+
+The polyglot programming problem. If each team is free to use what they feel are the right tools for the job (like at Amazon), it can become extremely hard to understand and manage security risks across many different languages and frameworks.
+
+Logging strategy, forensics, and auditing across different services with different logging approaches can be a nightmare.
+
+[[containerForensics]]
+
+Docker Security Risks
+
+* Kernel exploits
+* DoS attacks
+ One container can monopolize access to certain resources–including memory and more esoteric resources such as user IDs (UIDs)—and starve out other containers on the host, resulting in a denial-of-service (DoS), whereby legitimate users are unable to access part or all of the system.
+* Container breakouts
+ Users are not namespaced, so any process that breaks out of the container will have the same privileges on the host as it did in the container; if you were `root` in the container, you will be `root` on the host.
+* Poisoned Images
+* Compromising Secrets
+
+[Docker Bench Security](https://github.com/docker/docker-bench-security)
+#todos We could automate this check whenever we run docker pull, etc.
+You can lock down a container by using CIS guidelines and other security best practices and using scanning tools like Docker Bench, and you can minimize the container's attack surface by stripping down the runtime dependencies and making sure that developers don't package up development tools in a production container. But all of this requires extra work and knowing what to do.
+
+Etsy's DevSecOps
+
+* Trust people to do the right thing, but still verify. Rely on code reviews and testing and secure defaults and training to prevent or catch mistakes.
+* "If it Moves, Graph it." Make data visible to everyone so that everyone can understand and act on it, including information about security risks, threats, and attacks. Data visualizations.
+* "Just Ship It." Every engineer can push to production at any time. This includes security engineers. If something is broken and you can fix it, fix it and ship the fix out right away. (Rich Smith, Director of Security Engineering, Etsy. "Crafting an Effective Security Organization." QCon 2015, http://www.infoq.com/presentations/security-etsy)
+* Understand the real risk to the system and to the organization and deal with problems appropriately.
+
+"*Designated Hackers*" is a system by which each security engineer supports four or five development teams across the organization and is involved in design and standups.
+
+![[Screenshot from 2023-03-10 13-08-33.png]]
+ This is called *Shifting Security Left*: security engineers show developers how to take advantage of security features in their application frameworks and security libraries to prevent common security vulnerabilities like injection attacks. The OWASP and [SAFECode](http://www.safecode.org/) communities provide a lot of useful, free tools, frameworks, and guidance to help developers understand and solve common application security problems in any kind of system.
+
+[[FuzzingSoftware]]
+
+OWASP Proactive Controls
+1. Verify for security early and often
+2. Parameterize queries ==> prevent SQL injection by using a parameterized database interface
+3. Encode data
+4. Validate all inputs
+5. Implement identity and authentication controls
+6. Implement appropriate access controls
+7. Protect data
+8. Implement logging and intrusion detection
+9. Take advantage of security frameworks and libraries
+10. Error and exception handling
+
+**CANARY RELEASING**
+Another way to minimize the risk of change in Continuous Delivery or Continuous Deployment is canary releasing. Changes can be rolled out to a single node first, and automatically checked to ensure that there are no errors or negative trends in key metrics (for example, conversion rates), based on "the canary in a coal mine" metaphor. If problems are found with the canary system, the change is rolled back, the deployment is canceled, and the pipeline is shut down until a fix is ready to go out. After a specified period of time, if the canary is still healthy, the changes are rolled out to more servers, and then eventually to the entire environment (see the pipeline sketch at the end of this section).
+
+**Honeymoon Effect**
+Older software that is more vulnerable is easier to attack than software that has recently been changed.
+Attacks take time. It takes time to identify vulnerabilities, time to understand them, and time to craft and execute an exploit. This is why many attacks are made against legacy code with known vulnerabilities. In an environment where code and configuration changes are rolled out quickly and changed often, it is more difficult for attackers to follow what is going on, to identify a weakness, and to understand how to exploit it. The system becomes a moving target. By the time attackers are ready to make their move, the code or configuration might have already been changed and the vulnerability might have been moved or closed.
+
+Continuous Delivery means provisioning and configuring test environments to match production as closely as possible—automatically. This includes packaging the code and deploying it to test environments; running acceptance, stress, and performance tests, as well as security tests and other checks, with pass/fail feedback to the team, all automatically; and auditing all of these steps and communicating status to a dashboard. Later, you use the same pipeline to deploy the changes to production.
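+
+The canary idea above maps naturally onto pipeline stages. A rough GitLab CI sketch (the deploy and smoke-test scripts are placeholders; a real rollout would usually be driven by the load balancer or orchestrator):
+
+```yaml
+deploy_canary:
+  stage: deploy
+  script:
+    - ./deploy.sh --hosts canary-01                 # placeholder deploy script
+    - ./smoke_test.sh https://canary.example.com    # a failing check stops the rollout here
+
+rollout_remaining:
+  stage: deploy
+  needs: ["deploy_canary"]
+  when: manual        # promote to the rest of the fleet only once the canary looks healthy
+  script:
+    - ./deploy.sh --hosts remaining
+```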
+ +# Injecting Security into Continuous Delivery + +Ask these questions before you start: + +What happens before and when a change is checked in? +• Where are the repositories? Who has access to them? +• How do changes transition from check-in to build to Continu‐ ous Integration and unit testing, to functional and integration testing, and to staging and then finally to production? +• What tests are run? Where are the results logged? +• What tools are used? How do they work? +• What manual checks or reviews are performed and when? + +![[Screenshot from 2023-03-10 13-47-48.png]] + +## Precommit + +lightweight iterative threat modeling and risk assesments +SAST (Static Analysis) checking in engineers IDE +Peer code reviews ( for defensive coding and security vulnerabilities) + +## Commit Stage (Continuos Integration) + +This is automatically triggered by a check in. In this stage, you build and perform basic automated testing of the system. These steps return fast feedback to developers: did this change “break the build”? + +security checks that you should include in this stage: + +• Compile and build checks, ensuring that these steps are clean, and that there are no errors or warnings + +* Software Component Analysis in build, identifying risk in thirdparty components +* Incremental static analysis scanning for bugs and security vul‐ nerabilities +* • Alerting on high-risk code changes through static analysis checks or tests +* * Automated unit testing of security functions, with code cover‐ age analysis +* Digitally signing binary artifacts and storing them in secure repositories (For software that is distributed externally, this should involve signing the code with a code-signing certificate from a third-party CA. For internal code, a hash should be enough to ensure code integrity.) +* + +## Acceptance Stage + +To minimize the time required, these tests are often fanned out to different test servers and executed in parallel. Following a “fail fast” approach, the more expensive and time-consuming tests are left until as late as possible in the test cycle, so that they are only executed if other tests have already passed. + +• Secure, automated configuration management and provisioning of the runtime environment (using tools like Ansible, Chef, Puppet, Salt, and/or Docker). Ensure that the test environment is clean and configured to match production as closely as possi‐ ble. + +• Automatically deploy the latest good build from the binary arti‐ fact repository. + +• Smoke tests (including security tests) designed to catch mistakes in configuration or deployment. + +• Targeted dynamic scanning (DAST). + +• Automated functional and integration testing of security fea‐ tures. + +• Automated security attacks, using Gauntlt or other security tools. • Deep static analysis scanning (can be done out of band). + +•Fuzzing (of APIs, files). This can be done out of band. + +• Manual pen testing (out of band). + +## Production Deployment and Post-Deployment + +ending manual review/approvals and scheduling (in Continuous Delivery) or automatically (in Continu‐ ous Deployment). 
+
+* Secure automated configuration management and provisioning of the runtime environment
+* Automated deployment and release orchestration
+* Post-Deployment [[smoke test]]
+* Automated runtime asserts and compliance checks (monkeys)
+* Production monitoring/feedback
+* Runtime defense
+* Red teaming
+* Bug bounties
+* Blameless postmortems (learning from failure)
+
+## Source code
+
+Luckily, you can do this automatically by using Software Component Analysis (SCA) tools like OWASP's Dependency-Check project or commercial tools like Sonatype's Nexus Lifecycle or SourceClear.
+
+OWASP's Dependency-Check is an open source scanner that catalogs open source components used in an application. It works for Java, .NET, Ruby (gemspec), PHP (composer), Node.js and Python, and some C/C++ projects. Dependency-Check integrates with common build tools (including Ant, Maven, and Gradle) and CI servers like Jenkins.
+
+Code review tools still need to be investigated.
+
+You should not rely on only one tool—even the best tools will catch only some of the problems in your code. Good practice would be to run at least one of each kind to look for different problems in the code, as part of an overall code quality and security program.
+
+You can use tools like OWASP ZAP to automatically scan a web app for common vulnerabilities as part of the Continuous Integration/Continuous Delivery pipeline. You can do this by running the scanner in headless mode through the command line, through the scanner's API, or by using a wrapper of some kind, such as the ZAProxy Jenkins plug-in or a higher-level test framework like BDD-Security (which we'll look at in a later section).
+
+#fuzzing
+Some newer fuzzing tools are designed to run (or can be adapted to run) in Continuous Integration and Continuous Delivery. They let you seed test values to create repeatable tests, set time boxes on test runs, detect duplicate errors, and write scripts to automatically set up/restore state in case the system crashes. But you might still find that fuzz testing is best done out of band.
+
+Behavior-Driven Development (BDD) and Test-Driven Development (TDD)—wherein developers write tests before they write the code—encourage developers to create a strong set of automated tests to catch mistakes and protect themselves from regressions as they add new features or make changes or fixes to the code.
+
+## Automated Attacks
+
+Tools for automated attacks:
+* Gauntlt
+* Mittn
+* BDD-Security
+
+## Vulnerability Management
+
+* How many vulnerabilities have you found?
+* How were they found? What tools or testing approaches are giving you the best returns?
+* What are the most serious vulnerabilities?
+* How long are they taking to get fixed? Is this getting better or worse over time?
+
+Feed the results from the Continuous Delivery pipelines into a vulnerability manager, such as Code Dx or ThreadFix.
+
+Securing the Continuous Delivery pipeline: • Harden the systems that host the source and build artifact repositories, the Continuous Integration and Continuous Delivery server(s), and the systems that host the configuration management, build, deployment, and release tools. Ensure that you clearly understand—and control—what is done on-premises and what is in the cloud. • Harden the Continuous Integration and/or Continuous Delivery server. Tools like Jenkins are designed for developer convenience and are not secure by default. Ensure that these tools (and the required plug-ins) are kept up-to-date and tested frequently.
+• Lock down and harden your configuration management tools.
See “How to be a Secure Chef,” for example. • Ensure that keys, credentials, and other secrets are protected. Get secrets out of scripts and source code and plain-text files and use an audited, secure secrets manager like Chef Vault, Square’s KeyWhiz project, or HashiCorp Vault. • Secure access to the source and binary repos and audit access to them. • Implement access control across the entire tool chain. Do not allow anonymous or shared access to the repos, to the Continu‐ ous Integration server, or confirmation manager or any other tools. • Change the build steps to sign binaries and other build artifacts to prevent tampering. • Periodically review the logs to ensure that they are complete and that you can trace a change through from start to finish. Ensure that the logs are immutable, that they cannot be erased or forged. • Ensure that all of these systems are monitored as part of the production environment. + +Runtime Application Security Protection/Self-Protection (RASP) +which uses run-time instrumentation to catch security problems as they occur. Like application firewalls, RASP can automatically identify and block attacks. And like application firewalls, you can extend RASP to leg‐ acy apps for which you don’t have source code. + +There are only a small number of RASP solutions available today, mostly limited to applications that run in the Java JVM and .NET CLR, although support for other languages like Node.js, Python, and Ruby is emerging. These tools include the following: + +• Immunio • Waratek • Prevoty \ No newline at end of file diff --git a/content/Devops&DevSecOps/Github Actions.md b/content/Devops&DevSecOps/Github Actions.md new file mode 100644 index 000000000..673c726f6 --- /dev/null +++ b/content/Devops&DevSecOps/Github Actions.md @@ -0,0 +1,18 @@ + + +GitHub Actions goes beyond just DevOps and lets you run workflows when other events happen in your repository. For example, you can run a workflow to automatically add the appropriate labels whenever someone creates a new issue in your repository. + +GitHub provides Linux, Windows, and macOS virtual machines to run your workflows, or you can host your own self-hosted runners in your own data center or cloud infrastructure. + + + +### [Actions](https://docs.github.com/en/actions/learn-github-actions/understanding-github-actions#actions) + +An _action_ is a custom application for the GitHub Actions platform that performs a complex but frequently repeated task. Use an action to help reduce the amount of repetitive code that you write in your workflow files. An action can pull your git repository from GitHub, set up the correct toolchain for your build environment, or set up the authentication to your cloud provider. + +You can write your own actions, or you can find actions to use in your workflows in the GitHub Marketplace. + +For more information, see "[Creating actions](https://docs.github.com/en/actions/creating-actions)." 
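+
+A minimal workflow to make the terms above concrete (the build command is a placeholder; workflow files live under `.github/workflows/` in the repository):
+
+```yaml
+# .github/workflows/ci.yml
+name: CI
+on:
+  push:
+    branches: [main]
+  pull_request:
+
+jobs:
+  build:
+    runs-on: ubuntu-latest         # GitHub-hosted runner; a self-hosted runner also works
+    steps:
+      - uses: actions/checkout@v4  # an action from the Marketplace
+      - name: Build and test
+        run: ./build_and_test.sh   # placeholder for the project's real build command
+```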
+ + +https://youtu.be/TLB5MY9BBa4 \ No newline at end of file diff --git a/content/Devops&DevSecOps/Gitlab.md b/content/Devops&DevSecOps/Gitlab.md new file mode 100644 index 000000000..b8f9d0da3 --- /dev/null +++ b/content/Devops&DevSecOps/Gitlab.md @@ -0,0 +1,647 @@ +Example yaml file name must be **.gitlab-ci.yaml** + +```yaml +stages: + # - test + - build + - deploy + +pre-job: + stage: .pre + script: + - echo 'This message is from .pre-job' + +build-job: + stage: build + script: + - echo 'This message is from build-job' + +test-job: + stage: test + script: + - echo 'This message is from test-job' + +deploy-job: + stage: deploy + script: + - echo 'This message is from deploy-job' + +post-job: + stage: .post + script: + - echo 'This message is from .post-job' +``` +![[Screenshot from 2023-03-14 11-12-33.png]] + +Default stages use default order other than that you can use +```yaml +stages: + -test1 + -test2 + -test3 +``` + +Example: +```yaml +stages: + - build + - deploy + +build: + image: node + stage: build + script: + # - apt update -y + # - apt install npm -y + - npm install + artifacts: + paths: + - node_modules + - package-lock.json + # expire_in: 1 week + +deploy: + image: node + stage: deploy + script: + # - apt update -y + # - apt install nodejs -y + - node index.js > /dev/null 2>&1 & # these command runs in background and does not effect timeout +``` + + +## Gitlab Runners + +Application that works for picking CI/CD and execute CI/CD jobs. + + +Settings > shared runners or specific runners + +Runners has tag like docker mongodb ruby. That means which can runners can handle. + +for example windows tag we can use in our yaml. +```yaml +windows-info: + tags: + - windows + script: + - systeminfo +``` + +Runner must be same version with gitlab. + +**sudo gitlab-runner register** for register runner. You can take runner token from setttings + +run gitlab-runner locally +```yaml +stages: + - build_stage + - deploy_stage + +build: + stage: build_stage + script: + - docker --version + - docker build -t pyapp . + tags: + - localshell + - localrunner + +deploy: + stage: deploy_stage + script: + - docker stop pyappcontainer1 || true && docker rm pyappcontainer1 || true + - docker run -d --name pyappcontainer1 -p 8080:8080 pyapp + tags: + - localshell + - localrunner +``` + + Git-runner add admin group ==> sudo usermod -aG docker gitlab-runner + +![[Screenshot from 2023-03-14 13-08-06.png]] + +Variables ==> use security token,url , long string etc. +Gitlab variable [url](https://docs.gitlab.com/ee/ci/variables/predefined_variables.html) + +predefine +```yaml +demo_job: + script: + - echo $CI_COMMIT_MESSAGE + - echo $CI_JOB_NAME +``` + +direcly set in yaml +```yaml +variables: + name: 'John' + message: 'How are you?' 
+ +display_message: + variables: + name: 'Mark' + script: + - echo "Hello $name, $message" +``` + +secret variable + +```yaml +push_image: + script: + - docker login -u $USERNAME -p $PASSWORD + - docker tag pyapp:latest $USERNAME/mypyapp:latest + - docker push $USERNAME/mypyapp:latest + tags: + - localshell + - localrunner +``` + +# Enviroments +```yaml +stages: + - test + - build + - deploy staging + - automated testing + - deploy production + +variables: + IMAGE_TAG: $CI_REGISTRY_IMAGE/employee-image:$CI_COMMIT_SHORT_SHA + STAGING_APP: emp-portal-staging + PRODUCTION_APP: emp-portal-production + + HEROKU_STAGING: "registry.heroku.com/$STAGING_APP/web" + HEROKU_PRODUCTION: "registry.heroku.com/$PRODUCTION_APP/web" + + +lint_test: + image: python:3.8.0-slim + stage: test + before_script: + - pip install flake8-html + script: + - flake8 --format=html --htmldir=flake_reports/ + artifacts: + when: always + paths: + - flake_reports/ + +pytest: + image: python:3.8.0-slim + stage: test + before_script: + - pip install pytest-html + - pip install -r requirements.txt + script: + - pytest --html=pytest_reports/pytest-report.html --self-contained-html + artifacts: + when: always + paths: + - pytest_reports/ + +build: + image: docker:latest + services: + - docker:dind + stage: build + before_script: + - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY + script: + - docker build -t $IMAGE_TAG . + - docker images + - docker push $IMAGE_TAG + +deploy_stage: + image: docker:latest + services: + - docker:dind + stage: deploy staging + environment: + name: staging + url: https://$STAGING_APP.herokuapp.com/ + before_script: + - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY + script: + - docker pull $IMAGE_TAG + - docker tag $IMAGE_TAG $HEROKU_STAGING + - docker login -u _ -p $HEROKU_STAGING_API_KEY registry.heroku.com + - docker push $HEROKU_STAGING + - docker run --rm -e HEROKU_API_KEY=$HEROKU_STAGING_API_KEY wingrunr21/alpine-heroku-cli container:release web --app $STAGING_APP + - echo "App deployed to stagig server at https://$STAGING_APP.herokuapp.com/" + only: + - main + +test_stage: + image: alpine + stage: automated testing + before_script: + - apk --no-cache add curl + script: + - curl https://$STAGING_APP.herokuapp.com/ | grep "Employee Data" + only: + - main + +deploy_production: + image: docker:latest + services: + - docker:dind + stage: deploy production + environment: + name: production + url: https://$PRODUCTION_APP.herokuapp.com/ + before_script: + - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY + script: + - docker pull $IMAGE_TAG + - docker tag $IMAGE_TAG $HEROKU_PRODUCTION + - docker login -u _ -p $HEROKU_PRODUCTION_API_KEY registry.heroku.com + - docker push $HEROKU_PRODUCTION + - docker run --rm -e HEROKU_API_KEY=$HEROKU_PRODUCTION_API_KEY wingrunr21/alpine-heroku-cli container:release web --app $PRODUCTION_APP + - echo "App deployed to production server at https://$PRODUCTION_APP.herokuapp.com/"Project - deploy to production + only: + - main +``` + + environment: + name: production + url: https://$PRODUCTION_APP.herokuapp.com/ + +# Dynamic enviroments + +https://gitlab.com/gitlab-org/gitlab-runner/-/issues/1809 + + +```yaml +stages: + - test + - build + - deploy feature + - automated feature testing + - deploy staging + - automated testing + - deploy production + +variables: + IMAGE_TAG: $CI_REGISTRY_IMAGE/employee-image:$CI_COMMIT_SHORT_SHA + STAGING_APP: emp-portal-staging + PRODUCTION_APP: 
emp-portal-production + + HEROKU_STAGING: "registry.heroku.com/$STAGING_APP/web" + HEROKU_PRODUCTION: "registry.heroku.com/$PRODUCTION_APP/web" + + +lint_test: + image: python:3.8.0-slim + stage: test + before_script: + - pip install flake8-html + script: + - flake8 --format=html --htmldir=flake_reports/ + artifacts: + when: always + paths: + - flake_reports/ + +pytest: + image: python:3.8.0-slim + stage: test + before_script: + - pip install pytest-html + - pip install -r requirements.txt + script: + - pytest --html=pytest_reports/pytest-report.html --self-contained-html + artifacts: + when: always + paths: + - pytest_reports/ + +build: + image: docker:latest + services: + - docker:dind + stage: build + before_script: + - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY + script: + - docker build -t $IMAGE_TAG . + - docker images + - docker push $IMAGE_TAG + +deploy_feature: + image: docker:latest + services: + - docker:dind + stage: deploy feature + environment: + name: review/$CI_COMMIT_REF_NAME + url: https://$CI_ENVIRONMENT_SLUG.herokuapp.com/ + before_script: + - export FEATURE_APP="$CI_ENVIRONMENT_SLUG" + - export HEROKU_FEATURE="registry.heroku.com/$FEATURE_APP/web" + - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY + script: + - echo "FEATURE_APP=$CI_ENVIRONMENT_SLUG" >> deploy_feature.env + - docker pull $IMAGE_TAG + - docker tag $IMAGE_TAG $HEROKU_FEATURE + - docker run --rm -e HEROKU_API_KEY=$HEROKU_STAGING_API_KEY wingrunr21/alpine-heroku-cli create $FEATURE_APP + - docker login -u _ -p $HEROKU_STAGING_API_KEY registry.heroku.com + - docker push $HEROKU_FEATURE + - docker run --rm -e HEROKU_API_KEY=$HEROKU_STAGING_API_KEY wingrunr21/alpine-heroku-cli container:release web --app $FEATURE_APP + - echo "App deployed to FEATURE server at https://$FEATURE_APP.herokuapp.com/" + artifacts: + reports: + dotenv: deploy_feature.env + only: + - /^feature-.*$/ + +test_feature: + image: alpine + stage: automated feature testing + before_script: + - apk --no-cache add curl + script: + - curl https://$FEATURE_APP.herokuapp.com/ | grep "Employee Data" + dependencies: + - deploy_feature + only: + - /^feature-.*$/ + +deploy_stage: + image: docker:latest + services: + - docker:dind + stage: deploy staging + environment: + name: staging + url: https://$STAGING_APP.herokuapp.com/ + before_script: + - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY + script: + - docker pull $IMAGE_TAG + - docker tag $IMAGE_TAG $HEROKU_STAGING + - docker login -u _ -p $HEROKU_STAGING_API_KEY registry.heroku.com + - docker push $HEROKU_STAGING + - docker run --rm -e HEROKU_API_KEY=$HEROKU_STAGING_API_KEY wingrunr21/alpine-heroku-cli container:release web --app $STAGING_APP + - echo "App deployed to stagig server at https://$STAGING_APP.herokuapp.com/" + only: + - main + +test_stage: + image: alpine + stage: automated testing + before_script: + - apk --no-cache add curl + script: + - curl https://$STAGING_APP.herokuapp.com/ | grep "Employee Data" + only: + - main + +deploy_production: + image: docker:latest + services: + - docker:dind + stage: deploy production + environment: + name: production + url: https://$PRODUCTION_APP.herokuapp.com/ + before_script: + - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY + script: + - docker pull $IMAGE_TAG + - docker tag $IMAGE_TAG $HEROKU_PRODUCTION + - docker login -u _ -p $HEROKU_PRODUCTION_API_KEY registry.heroku.com + - docker push $HEROKU_PRODUCTION + - docker run --rm -e 
HEROKU_API_KEY=$HEROKU_PRODUCTION_API_KEY wingrunr21/alpine-heroku-cli container:release web --app $PRODUCTION_APP + - echo "App deployed to production server at https://$PRODUCTION_APP.herokuapp.com/"Project - deploy to production + only: + - main + when: manual + +``` + + +# GitLab DevSecOps + +SAST sonar cloud. +```yaml +stages: + - runSAST + +run-sast-job: + stage: runSAST + image: maven:3.8.5-openjdk-11-slim + script: | + mvn verify package sonar:sonar -Dsonar.host.url=https://sonarcloud.io/ -Dsonar.organization=gitlabdevsecopsintegration -Dsonar.projectKey=gitlabdevsecopsintegration -Dsonar.login=token01 +``` +================================================================== +Sonar cloud quality gateways +``` +1) Create Custom Quality Gate in SonarCloud and Add conditions to the Quality Gate +2) Assign this Quality Gate to the Project +3) Add script in .gitlab-ci.yml file to enable quality gate check (Note: This will fail your build in case Quality Gate fails) + +sleep 5 +apt-get update +apt-get -y install curl jq +quality_status=$(curl -s -u 14ad4797c02810a818f21384add02744d3f9e34d: https://sonarcloud.io/api/qualitygates/project_status?projectKey=gitLabdevsecopsintegration | jq -r '.projectStatus.status') +echo "SonarCloud Analysis Status is $quality_status"; +if [[ $quality_status == "ERROR" ]] ; then exit 1;fi + + +-----------Sample JSON Response from SonarCloud or SonarQube Quality Gate API--------------------- + +{ + "projectStatus": { + "status": "ERROR", + "conditions": [ + { + "status": "ERROR", + "metricKey": "coverage", + "comparator": "LT", + "errorThreshold": "90", + "actualValue": "0.0" + } + ], + "periods": [], + "ignoredConditions": false + } +} +``` + +```yaml +stages: + - runSAST + +run-sast-job: + stage: runSAST + image: maven:3.8.5-openjdk-11-slim + script: | + apt-get update + apt-get -y install curl jq + mvn verify package sonar:sonar -Dsonar.host.url=https://sonarcloud.io/ -Dsonar.organization=gitlabdevsecopsintegrtion -Dsonar.projectKey=gitLabdevsecopsintegration -Dsonar.login=14ad4797c02810a818f21384add02744d3f9e34d + sleep 5 + quality_status=$(curl -s -u 14ad4797c02810a818f21384add02744d3f9e34d: https://sonarcloud.io/api/qualitygates/project_status?projectKey=gitLabdevsecopsintegration | jq -r '.projectStatus.status') + echo "SonarCloud Analysis Status is $quality_status"; + if [[ $quality_status == "ERROR" ]] ; then exit 1;fi +``` +================================================================== +Test coverage +``` +1) Unit Test cases should be present in test folder +2) Junit Plugin should be present in pom.xml +3) Jacoco Plugin should be present in pom.xml +4) Jacoco report execution goal should be present in build tag in pom.xml +5) Maven "verify" goal should be run while running sonar analysis +``` + +```yaml +stages: + - runSAST + +run-sast-job: + stage: runSAST + image: maven:3.8.5-openjdk-11-slim + script: | + mvn verify package sonar:sonar -Dsonar.host.url=https://sonarcloud.io/ -Dsonar.organization=gitlabdevsecopsintegration -Dsonar.projectKey=gitlabdevsecopsintegration -Dsonar.login=2fda8f4a1af600afbede42c54c868083d8e34c01 +``` +================================================================== +SCA in gitlab security + +Steps to integrate Snyk using .gitlab-ci.yml file: + +1) Add Snyk Plugin to Pom.xml +2) Define Snyk Token as an environment Variable on the runner machine +3) Add code changes to .gitlab-ci.yml file + +```yaml +stages: + - runSCAScanUsingSnyk + +run-sca-job: + stage: runSCAScanUsingSnyk + image: maven:3.8.5-openjdk-11-slim + 
script: | + SNYK_TOKEN='2f4afa39-c493-4c6d-b34e-080c1a8f9014' + export SNYK_TOKEN + mvn snyk:test -fn +``` + +================================================================== + +DAST tool using OWASP ZAP + +```yaml +stages: + - runDASTScanUsingZAP + +run-dast-job: + stage: runDASTScanUsingZAP + image: maven:3.8.5-openjdk-11-slim + script: | + apt-get update + apt-get -y install wget + wget https://github.com/zaproxy/zaproxy/releases/download/v2.11.1/ZAP_2.11.1_Linux.tar.gz + mkdir zap + tar -xvf ZAP_2.11.1_Linux.tar.gz + cd ZAP_2.11.1 + ./zap.sh -cmd -quickurl https://www.example.com -quickprogress -quickout ../zap_report.html + artifacts: + paths: + - zap_report.html +``` +================================================================== +End to end CI/CD pipeline for java projects + + +```yaml +stages: + - runSASTScanUsingSonarCloud + - runSCAScanUsingSnyk + - runDASTScanUsingZAP + +run-sast-job: + stage: runSASTScanUsingSonarCloud + image: maven:3.8.5-openjdk-11-slim + script: | + mvn verify package sonar:sonar -Dsonar.host.url=https://sonarcloud.io/ -Dsonar.organization=gitlabdevsecopsintegration -Dsonar.projectKey=gitlabdevsecopsintegration -Dsonar.login=2fda8f4a1af600afbede42c54c868083d8e34c01 + +run-sca-job: + stage: runSCAScanUsingSnyk + image: maven:3.8.5-openjdk-11-slim + script: | + SNYK_TOKEN='2f4afa39-c493-4c6d-b34e-080c1a8f9014' + export SNYK_TOKEN + mvn snyk:test -fn + +run-dast-job: + stage: runDASTScanUsingZAP + image: maven:3.8.5-openjdk-11-slim + script: | + apt-get update + apt-get -y install wget + wget https://github.com/zaproxy/zaproxy/releases/download/v2.11.1/ZAP_2.11.1_Linux.tar.gz + mkdir zap + tar -xvf ZAP_2.11.1_Linux.tar.gz + cd ZAP_2.11.1 + ./zap.sh -cmd -quickurl https://www.example.com -quickprogress -quickout ../zap_report.html + artifacts: + paths: + - zap_report.html +``` + +# Gitlab buildin SAST and DAST +GitLab SAST Analyzer Documentation Page: https://docs.gitlab.com/ee/user/application_security/sast/ + + +GitLab DAST Analyzer Documentation Page: https://docs.gitlab.com/ee/user/application_security/dast/ + +```yaml +include: + - template: Security/SAST.gitlab-ci.yml + - template: DAST.gitlab-ci.yml + +variables: + SAST_EXPERIMENTAL_FEATURES: "true" + DAST_WEBSITE: http://www.example.com + DAST_FULL_SCAN_ENABLED: "true" + DAST_BROWSER_SCAN: "true" + +stages: + - test + - runSASTScanUsingSonarCloud + - runSCAScanUsingSnyk + - runDASTScanUsingZAP + - dast + +run-sast-job: + stage: runSASTScanUsingSonarCloud + image: maven:3.8.5-openjdk-11-slim + script: | + mvn verify package sonar:sonar -Dsonar.host.url=https://sonarcloud.io/ -Dsonar.organization=gitlabdevsecopsintegrationkey -Dsonar.projectKey=gitlabdevsecopsintegrationkey -Dsonar.login=9ff892826b54980437f4fb0fbc72f4049ec97585 + +run-sca-job: + stage: runSCAScanUsingSnyk + image: maven:3.8.5-openjdk-11-slim + script: | + SNYK_TOKEN='2f4afa39-c493-4c6d-b34e-080c1a8f9014' + export SNYK_TOKEN + mvn snyk:test -fn + +run-dast-job: + stage: runDASTScanUsingZAP + image: maven:3.8.5-openjdk-11-slim + script: | + apt-get update + apt-get -y install wget + wget https://github.com/zaproxy/zaproxy/releases/download/v2.11.1/ZAP_2.11.1_Linux.tar.gz + mkdir zap + tar -xvf ZAP_2.11.1_Linux.tar.gz + cd ZAP_2.11.1 + ./zap.sh -cmd -quickurl https://www.example.com -quickprogress -quickout ../zap_report.html + artifacts: + paths: + - zap_report.html +``` + diff --git a/content/Devops&DevSecOps/Jenkins.md b/content/Devops&DevSecOps/Jenkins.md new file mode 100644 index 000000000..fecf79b6d --- /dev/null +++ 
b/content/Devops&DevSecOps/Jenkins.md @@ -0,0 +1,55 @@ +# How to create simple setup with docker-compose + +Look this [url](https://www.cloudbees.com/blog/how-to-install-and-run-jenkins-with-docker-compose) +Note: in This link jvm /java path is wrong =>/opt/java/openjdk/bin/java + +# Reddit recommendations + +[YOUTUBE](https://youtu.be/MTm3cb7qiEo?list=PLVx1qovxj-akoYTAboxT1AbHlPmrvRYYZ) +[Docs](https://www.jenkins.io/doc/pipeline/tour/getting-started/) + +Install Linux slave in jenkins [url](https://youtu.be/pzG_ZQNbZug) +Install Windows slave in windows [url](https://youtu.be/655a1itG3xg?list=PLVx1qovxj-akoYTAboxT1AbHlPmrvRYYZ) + + + +==================================================== +Books +* Jenkins: The Definitive Guide +* Jenkins 2: Up and Running Evolve Your Deployment Pipeline for Next Generation Automation + +# Jenkins: The Definitive Guide + + + +[Github Repo Link](https://github.com/ricardoandre97/jenkins-resources.git) + +Continuous Integration is about reducing risk by providing faster feedback. +First and foremost, it is designed to help identify and fix integration and regression issues faster, resulting in smoother, quicker delivery, and fewer bugs. +The practice of automatically deploying every successful build directly into production is generally known as Continuous Deployment. However, a pure Continuous Deployment approach is not always appropriate for everyone. For example, many users would not appreciate new versions falling into their laps several times a week, and prefer a more predictable (and transparent) release cycle. Commercial and marketing considerations might also play a role in when a new release should actually be deployed. + + +# Introducing Continuous Integration into Your Organization + +Phase 1—Create Build Server + +Phase 2—Nightly Builds + +Phase 3—Nightly Builds and Basic Automated Tests + +Phase 4—Enter the Metrics +Automated code quality and code coverage metrics. code quality build also automatically generates API documentation for the application. + +P +hase 5—Getting More Serious About Testing +Test-Driven Development are more widely practiced. The application is no longer simply compiled and tested, but if the tests pass, it is automatically deployed to an application server for more comprehensive end-to-end tests and performance tests. + +Phase 6—Automated Acceptance Tests and More Automated Deployment + +Behavior-Driven Development and Acceptance-Test Driven Development tools to act as communication and documentation tools and documentation as much as testing tools, publishing reports on test results in business terms that non-developers can understand.The application is automatically deployed into test environments for testing by the QA team either as changes are committed, or on a nightly basis; a version can be deployed (or “promoted”) to UAT and possibly production environments using a manually-triggered build when testers consider it ready. rolling back to a previous release, if something goes horribly wrong. 

Phase 7—Continuous Deployment

# Chapter 2

# Udemy Course diff --git a/content/Devops&DevSecOps/KEGM DevSecOps.md b/content/Devops&DevSecOps/KEGM DevSecOps.md new file mode 100644 index 000000000..e916b4e48 --- /dev/null +++ b/content/Devops&DevSecOps/KEGM DevSecOps.md @@ -0,0 +1,154 @@ +
# Index

* Setting up the Azure DevOps environment
* Demonstrating Azure Dev and the HVL cloud
* [Local install steps](https://www.flexmind.co/azure-devops-local-server/#:~:text=Azure%20DevOps%20Server%20Installation%20Steps%20%3A%201%201.,exe%20file%20downloaded%20for%20us%20.%20More%20items)
```bash
No hosted parallelism has been purchased or granted. To request a free parallelism grant, please fill out the following form https://aka.ms/azpipelines-parallelism-reques

```
* Installing the local agent

### Repo URL [link](https://github.com/HVLRED/azure-devops-basics)

### First pipeline YAML
File name: azure-pipeline.yml
```yaml
# Maven
# Build your Java project and run tests with Apache Maven.
# Add steps that analyze code, save build artifacts, deploy, and more:
# https://docs.microsoft.com/azure/devops/pipelines/languages/java

trigger:
- main

pool:
  name: hvlubuntu

steps:
- task: Maven@1
  inputs:
    mavenPomFile: 'pom.xml'
    publishJUnitResults: true
    testResultsFiles: '**/surefire-reports/TEST-*.xml'
    javaHomeOption: 'JDKVersion'
    mavenVersionOption: 'Default'
    mavenAuthenticateFeed: false
    effectivePomSkip: false
    sonarQubeRunAnalysis: false
```

![[Pasted image 20230714155723.png]]

**CI/CD Build and Release Pipelines**

![[Pasted image 20230714155935.png]]

### Change index.jsp and trigger the pipeline

## Show the source code and build directory

![[Screenshot from 2023-07-14 16-37-45.png]]

### Copy artifacts

```yaml
# Maven
# Build your Java project and run tests with Apache Maven.
# Add steps that analyze code, save build artifacts, deploy, and more:
# https://docs.microsoft.com/azure/devops/pipelines/languages/java

trigger:
- main

pool:
  name: hvlubuntu

steps:
- task: Maven@1
  inputs:
    mavenPomFile: 'pom.xml'
    publishJUnitResults: true
    testResultsFiles: '**/surefire-reports/TEST-*.xml'
    javaHomeOption: 'JDKVersion'
    mavenVersionOption: 'Default'
    mavenAuthenticateFeed: false
    effectivePomSkip: false
    sonarQubeRunAnalysis: false

- task: CopyFiles@2
  inputs:
    Contents: '**/*.war'
    TargetFolder: '$(build.artifactstagingdirectory)'
```

![[Pasted image 20230714164425.png]]

### To see results in Azure DevOps we need to publish artifacts

![[Pasted image 20230714164855.png]]

```yaml
# Maven
# Build your Java project and run tests with Apache Maven.
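# Note: compared with the first pipeline above, this variant additionally copies
# the built .war file to the artifact staging directory (CopyFiles@2) and then
# publishes the pipeline workspace as an artifact named 'warfile'
# (PublishPipelineArtifact@1), so the build output is visible in Azure DevOps.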
# Add steps that analyze code, save build artifacts, deploy, and more:
# https://docs.microsoft.com/azure/devops/pipelines/languages/java

trigger:
- main

pool:
  name: hvlubuntu

steps:
- task: Maven@1
  inputs:
    mavenPomFile: 'pom.xml'
    publishJUnitResults: true
    testResultsFiles: '**/surefire-reports/TEST-*.xml'
    javaHomeOption: 'JDKVersion'
    mavenVersionOption: 'Default'
    mavenAuthenticateFeed: false
    effectivePomSkip: false
    sonarQubeRunAnalysis: false

- task: CopyFiles@2
  inputs:
    Contents: '**/*.war'
    TargetFolder: '$(build.artifactstagingdirectory)'
- task: PublishPipelineArtifact@1
  inputs:
    targetPath: '$(Pipeline.Workspace)'
    artifact: 'warfile'
    publishLocation: 'pipeline'
```

 diff --git a/content/Devops&DevSecOps/Lockheed Martin Software Factory.md b/content/Devops&DevSecOps/Lockheed Martin Software Factory.md new file mode 100644 index 000000000..beea23ced --- /dev/null +++ b/content/Devops&DevSecOps/Lockheed Martin Software Factory.md @@ -0,0 +1,75 @@ +![[Screenshot from 2023-03-14 15-53-57.png]]

**Software Dojos: Iterative Learning**

Software Dojos are training facilities. With a globally diverse workforce, we strive to provide ways for our employees to upskill and master DevSecOps practices and commit to the most effective software delivery. After developing initial skills, adoption of and adherence to these best practices can grow over time. With common training grounds, our employees continue to leverage the best code across multiple domains.

#article
## The Best of Both Worlds: Agile Development Meets Product Line Engineering at Lockheed Martin

Product line engineering (PLE) brings large-scale improvements in cost, time to market, product quality, and more. Agile promotes adaptive planning, evolutionary development, early delivery, and continuous improvement, and encourages rapid and flexible response to change. This paper conveys the experience of Lockheed Martin, the world’s largest defense contractor, as it applies PLE and Agile together on one of its largest and most important projects. Not only is the project highly visible with demanding requirements, it is also very large, comprising some 10 million lines of code.

## PLE as Factory

Manufacturers have long used engineering techniques to create a product line of similar products using a common factory that assembles and configures parts to produce the varying products in the product line. For example, automotive manufacturers can create thousands of unique variations of one car model using a single pool of parts carefully designed to be configurable and factories specifically designed to configure and assemble those parts.

In PLE, the configurator is the factory’s automation component; the “parts” are the assets in the factory’s supply chain. A statement of the properties desired in the end product tells the configurator how to configure the assets.

A product specification at the top tells the configurator how to configure the assets coming in from the left. This enables the rapid production of any variant of any of the assets for any of the products in the portfolio. The products can comprise any combination of software, systems in which software runs, or non-software systems that have software-representable artifacts (such as requirements, engineering models, or development plans) associated with the engineering process that produces them.
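
To make the factory analogy concrete, here is a minimal sketch of what a configurator does, written in plain Python with invented feature and asset names; it illustrates the concept only and is not the actual PLE tooling used on the Lockheed Martin program.

```python
# Toy PLE "configurator": shared assets contain variation points, and a
# product specification (a set of feature choices) decides how each asset
# is instantiated for a particular product in the portfolio.

# Shared assets: each variation point maps a feature choice to concrete content.
SHARED_ASSETS = {
    "requirements.md": {
        "radar": "The system shall track airborne targets via radar.",
        "sonar": "The system shall track submerged targets via sonar.",
    },
    "build.gradle": {
        "radar": "dependencies { implementation 'sensors:radar-driver:1.2' }",
        "sonar": "dependencies { implementation 'sensors:sonar-driver:3.0' }",
    },
}


def configure(product_spec: dict) -> dict:
    """Assemble every shared asset according to a product specification."""
    product = {}
    for asset_name, variants in SHARED_ASSETS.items():
        choice = product_spec["sensor"]         # the selected feature
        product[asset_name] = variants[choice]  # exercise the variation point
    return product


if __name__ == "__main__":
    # Two products built from the same asset supply chain; only the
    # specification differs.
    print(configure({"sensor": "radar"}))
    print(configure({"sensor": "sonar"}))
```

Running it with two different specifications yields two consistent sets of artifacts (requirements, build files, and so on) from one shared supply chain, which is the point of the configurator-as-factory analogy.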

In this context “product” means not only the primary entity being built and delivered, but also all of the artifacts that are produced along with it. Some of these support the engineering process (such as requirements, project plans, design models, and test cases), while others are delivered alongside the thing being built (such as user manuals, shipping labels, and parts lists). These artifacts are the product line’s assets.

Shared assets can include, but are not limited to, requirements, design specifications, design models, source code, build files, test plans and test cases, user documentation, repair manuals and installation guides, project budgets, schedules, and work plans, product calibration and configuration files, data models, parts lists, and more.

![[Screenshot from 2023-03-14 16-07-26.png]]

PLE stands in contrast to traditional product-centric development, in which each individual product is developed and evolved independently from other products, or (at best) starts out as a cloned copy of a similar product that is then changed to suit the new product’s specific needs. Product-centric development takes very little advantage of the commonalities among products in a portfolio after the initial clone operation.

Picture a production shop in which N products are developed and maintained. In this stylized view, each product comprises requirements, design models, source code, and test cases. Each engineer in this shop works primarily on a single product. When a new product is launched, its project copies the most similar assets it can find, and starts adapting them to meet the new product’s needs.

![[Screenshot from 2023-03-14 16-08-26.png]]

A “just enough, just in time” approach is used, with just enough detail to size a project and ensure its technical and economic feasibility.

Agile provides for more level loading and resource allocation. Previously, the response to a looming deadline was to “surge,” adding resources for a milestone and then ramping back down. Now, with better planning and tighter customer involvement, that can be avoided. Lessons about teaming are emerging. First, teams should be co-located, if possible. Second, this structure can expose weaker individuals; everyone needs to carry their weight, since there’s no place for mediocre performers to hide on a small team. Not everyone is cut out for this approach, as it requires individuals to perform to their best abilities. Culminating each sprint with a review or demo for the customer (typically showing off new features or architectural improvements) establishes trust and instills confidence in the customer and other stakeholders.
+ +# Recommendations from DOD + +Recommendation 1: Software Factory +Recommendation 2: Continuous Iterative Development + * deliver a series of viable products (starting with MVP) followed by successive next viable products (NVPs); + * establish MVP and the equivalent of a product manager for each program in its formal + acquisition strategy, and arrange for the warfighter to adopt the initial operational + capability (IOC) as an MVP for evaluation and feedback + * engage Congress to change statutes to transition Configuration Steering Boards (CSB) to support rapid iterative approaches +Recommendation 3: Risk Reduction and Metrics for New Programs + * Sprint Burndown + * Epic and Release burndown + * Velocity +Recommendation 6: Software is Immortal – Software Sustainment +Recommendation 7: Independent Verification and Validation for Machine Learning + +![[Screenshot from 2023-03-14 16-20-58.png]] + +![[Screenshot from 2023-03-14 16-21-19.png]] + +#softwarefactory diff --git a/content/SoftwareEnginnering/test.md b/content/SoftwareEnginnering/test.md new file mode 100644 index 000000000..68c56b6ec --- /dev/null +++ b/content/SoftwareEnginnering/test.md @@ -0,0 +1,11 @@ +# Software Enginnering +adsmsadsa +d +asd +as +d +as +da +sd +as +d diff --git a/content/imgs/Pasted image 20230424003955.png b/content/imgs/Pasted image 20230424003955.png new file mode 100644 index 000000000..0f11dd06b Binary files /dev/null and b/content/imgs/Pasted image 20230424003955.png differ diff --git a/content/imgs/Pasted image 20230424004931.png b/content/imgs/Pasted image 20230424004931.png new file mode 100644 index 000000000..8a7658a20 Binary files /dev/null and b/content/imgs/Pasted image 20230424004931.png differ diff --git a/content/imgs/Pasted image 20230424010029.jpg b/content/imgs/Pasted image 20230424010029.jpg new file mode 100644 index 000000000..5fa349200 Binary files /dev/null and b/content/imgs/Pasted image 20230424010029.jpg differ diff --git a/content/imgs/Pasted image 20230424010230.jpg b/content/imgs/Pasted image 20230424010230.jpg new file mode 100644 index 000000000..f5b5ef296 Binary files /dev/null and b/content/imgs/Pasted image 20230424010230.jpg differ diff --git a/content/imgs/Pasted image 20230424012706.jpg b/content/imgs/Pasted image 20230424012706.jpg new file mode 100644 index 000000000..9f62b8e02 Binary files /dev/null and b/content/imgs/Pasted image 20230424012706.jpg differ diff --git a/content/imgs/Pasted image 20230424012922.png b/content/imgs/Pasted image 20230424012922.png new file mode 100644 index 000000000..68d507aac Binary files /dev/null and b/content/imgs/Pasted image 20230424012922.png differ diff --git a/content/imgs/Pasted image 20230424013004.jpg b/content/imgs/Pasted image 20230424013004.jpg new file mode 100644 index 000000000..924850282 Binary files /dev/null and b/content/imgs/Pasted image 20230424013004.jpg differ diff --git a/content/imgs/Pasted image 20230424013722.jpg b/content/imgs/Pasted image 20230424013722.jpg new file mode 100644 index 000000000..99f6f5279 Binary files /dev/null and b/content/imgs/Pasted image 20230424013722.jpg differ diff --git a/content/imgs/Pasted image 20230424013916.jpg b/content/imgs/Pasted image 20230424013916.jpg new file mode 100644 index 000000000..4429681b8 Binary files /dev/null and b/content/imgs/Pasted image 20230424013916.jpg differ diff --git a/content/imgs/Pasted image 20230424013923.jpg b/content/imgs/Pasted image 20230424013923.jpg new file mode 100644 index 000000000..fc8c92b65 Binary files /dev/null and 
b/content/imgs/Pasted image 20230424013923.jpg differ diff --git a/content/imgs/Pasted image 20230424013934.jpg b/content/imgs/Pasted image 20230424013934.jpg new file mode 100644 index 000000000..a312e17e7 Binary files /dev/null and b/content/imgs/Pasted image 20230424013934.jpg differ diff --git a/content/imgs/Pasted image 20230424144542.jpg b/content/imgs/Pasted image 20230424144542.jpg new file mode 100644 index 000000000..890cc1b82 Binary files /dev/null and b/content/imgs/Pasted image 20230424144542.jpg differ diff --git a/content/imgs/Pasted image 20230424144719.png b/content/imgs/Pasted image 20230424144719.png new file mode 100644 index 000000000..1e14e68b1 Binary files /dev/null and b/content/imgs/Pasted image 20230424144719.png differ diff --git a/content/imgs/Pasted image 20230424144908.jpg b/content/imgs/Pasted image 20230424144908.jpg new file mode 100644 index 000000000..a30e34a46 Binary files /dev/null and b/content/imgs/Pasted image 20230424144908.jpg differ diff --git a/content/imgs/Pasted image 20230424144912.jpg b/content/imgs/Pasted image 20230424144912.jpg new file mode 100644 index 000000000..9ed4b87eb Binary files /dev/null and b/content/imgs/Pasted image 20230424144912.jpg differ diff --git a/content/imgs/Pasted image 20230424145138.jpg b/content/imgs/Pasted image 20230424145138.jpg new file mode 100644 index 000000000..936655645 Binary files /dev/null and b/content/imgs/Pasted image 20230424145138.jpg differ diff --git a/content/imgs/Pasted image 20230424145312.png b/content/imgs/Pasted image 20230424145312.png new file mode 100644 index 000000000..a6c2017b8 Binary files /dev/null and b/content/imgs/Pasted image 20230424145312.png differ diff --git a/content/imgs/Pasted image 20230424145430.jpg b/content/imgs/Pasted image 20230424145430.jpg new file mode 100644 index 000000000..c76579894 Binary files /dev/null and b/content/imgs/Pasted image 20230424145430.jpg differ diff --git a/content/imgs/Pasted image 20230424145650.png b/content/imgs/Pasted image 20230424145650.png new file mode 100644 index 000000000..3a7bff451 Binary files /dev/null and b/content/imgs/Pasted image 20230424145650.png differ diff --git a/content/imgs/Pasted image 20230424155853.jpg b/content/imgs/Pasted image 20230424155853.jpg new file mode 100644 index 000000000..0e23cb3ff Binary files /dev/null and b/content/imgs/Pasted image 20230424155853.jpg differ diff --git a/content/imgs/Pasted image 20230424161932.jpg b/content/imgs/Pasted image 20230424161932.jpg new file mode 100644 index 000000000..e6a92fa3c Binary files /dev/null and b/content/imgs/Pasted image 20230424161932.jpg differ diff --git a/content/imgs/Pasted image 20230424162035.jpg b/content/imgs/Pasted image 20230424162035.jpg new file mode 100644 index 000000000..6b7c371dd Binary files /dev/null and b/content/imgs/Pasted image 20230424162035.jpg differ diff --git a/content/imgs/Pasted image 20230424170820.jpg b/content/imgs/Pasted image 20230424170820.jpg new file mode 100644 index 000000000..8c98012d6 Binary files /dev/null and b/content/imgs/Pasted image 20230424170820.jpg differ diff --git a/content/imgs/Pasted image 20230425100803.jpg b/content/imgs/Pasted image 20230425100803.jpg new file mode 100644 index 000000000..737bbb6c8 Binary files /dev/null and b/content/imgs/Pasted image 20230425100803.jpg differ diff --git a/content/imgs/Pasted image 20230425101319.jpg b/content/imgs/Pasted image 20230425101319.jpg new file mode 100644 index 000000000..6c625ce8c Binary files /dev/null and b/content/imgs/Pasted image 
20230425101319.jpg differ diff --git a/content/imgs/Pasted image 20230425102110.jpg b/content/imgs/Pasted image 20230425102110.jpg new file mode 100644 index 000000000..e22ab9b80 Binary files /dev/null and b/content/imgs/Pasted image 20230425102110.jpg differ diff --git a/content/imgs/Pasted image 20230425102238.jpg b/content/imgs/Pasted image 20230425102238.jpg new file mode 100644 index 000000000..21fddfb01 Binary files /dev/null and b/content/imgs/Pasted image 20230425102238.jpg differ diff --git a/content/imgs/Pasted image 20230425103933.jpg b/content/imgs/Pasted image 20230425103933.jpg new file mode 100644 index 000000000..2232e97d6 Binary files /dev/null and b/content/imgs/Pasted image 20230425103933.jpg differ diff --git a/content/imgs/Pasted image 20230609230856.png b/content/imgs/Pasted image 20230609230856.png new file mode 100644 index 000000000..c177f2cb5 Binary files /dev/null and b/content/imgs/Pasted image 20230609230856.png differ diff --git a/content/imgs/Pasted image 20230609232930.png b/content/imgs/Pasted image 20230609232930.png new file mode 100644 index 000000000..8f24d6a55 Binary files /dev/null and b/content/imgs/Pasted image 20230609232930.png differ diff --git a/content/imgs/Pasted image 20230610004618.png b/content/imgs/Pasted image 20230610004618.png new file mode 100644 index 000000000..2b8ad64b4 Binary files /dev/null and b/content/imgs/Pasted image 20230610004618.png differ diff --git a/content/imgs/Pasted image 20230611182025.png b/content/imgs/Pasted image 20230611182025.png new file mode 100644 index 000000000..c33dafa15 Binary files /dev/null and b/content/imgs/Pasted image 20230611182025.png differ diff --git a/content/imgs/Pasted image 20230611182221.png b/content/imgs/Pasted image 20230611182221.png new file mode 100644 index 000000000..9e6888955 Binary files /dev/null and b/content/imgs/Pasted image 20230611182221.png differ diff --git a/content/imgs/Pasted image 20230611183035.png b/content/imgs/Pasted image 20230611183035.png new file mode 100644 index 000000000..cbc86d28a Binary files /dev/null and b/content/imgs/Pasted image 20230611183035.png differ diff --git a/content/imgs/Pasted image 20230611185557.png b/content/imgs/Pasted image 20230611185557.png new file mode 100644 index 000000000..37f5c41d1 Binary files /dev/null and b/content/imgs/Pasted image 20230611185557.png differ diff --git a/content/imgs/Pasted image 20230611195143.png b/content/imgs/Pasted image 20230611195143.png new file mode 100644 index 000000000..32e5b7b1d Binary files /dev/null and b/content/imgs/Pasted image 20230611195143.png differ diff --git a/content/imgs/Pasted image 20230612000055.png b/content/imgs/Pasted image 20230612000055.png new file mode 100644 index 000000000..fd1e4cc6a Binary files /dev/null and b/content/imgs/Pasted image 20230612000055.png differ diff --git a/content/imgs/Pasted image 20230612010748.png b/content/imgs/Pasted image 20230612010748.png new file mode 100644 index 000000000..f28efcdcc Binary files /dev/null and b/content/imgs/Pasted image 20230612010748.png differ diff --git a/content/imgs/Pasted image 20230612011006.png b/content/imgs/Pasted image 20230612011006.png new file mode 100644 index 000000000..bdc1de6fb Binary files /dev/null and b/content/imgs/Pasted image 20230612011006.png differ diff --git a/content/imgs/Pasted image 20230612011104.png b/content/imgs/Pasted image 20230612011104.png new file mode 100644 index 000000000..1d576a6e1 Binary files /dev/null and b/content/imgs/Pasted image 20230612011104.png differ diff --git 
a/content/imgs/Pasted image 20230714155723.png b/content/imgs/Pasted image 20230714155723.png new file mode 100644 index 000000000..4719a4dce Binary files /dev/null and b/content/imgs/Pasted image 20230714155723.png differ diff --git a/content/imgs/Pasted image 20230714155935.png b/content/imgs/Pasted image 20230714155935.png new file mode 100644 index 000000000..7cd53ebd7 Binary files /dev/null and b/content/imgs/Pasted image 20230714155935.png differ diff --git a/content/imgs/Pasted image 20230714164425.png b/content/imgs/Pasted image 20230714164425.png new file mode 100644 index 000000000..1ef4c0139 Binary files /dev/null and b/content/imgs/Pasted image 20230714164425.png differ diff --git a/content/imgs/Pasted image 20230714164855.png b/content/imgs/Pasted image 20230714164855.png new file mode 100644 index 000000000..bc67d6cc7 Binary files /dev/null and b/content/imgs/Pasted image 20230714164855.png differ diff --git a/content/imgs/Pasted image 20230718102453.png b/content/imgs/Pasted image 20230718102453.png new file mode 100644 index 000000000..c0793787f Binary files /dev/null and b/content/imgs/Pasted image 20230718102453.png differ diff --git a/content/imgs/Pasted image 20230718105536.png b/content/imgs/Pasted image 20230718105536.png new file mode 100644 index 000000000..b5355d96a Binary files /dev/null and b/content/imgs/Pasted image 20230718105536.png differ diff --git a/content/imgs/Pasted image 20230719124201.png b/content/imgs/Pasted image 20230719124201.png new file mode 100644 index 000000000..3a292f629 Binary files /dev/null and b/content/imgs/Pasted image 20230719124201.png differ diff --git a/content/imgs/Pasted image 20230719165826.png b/content/imgs/Pasted image 20230719165826.png new file mode 100644 index 000000000..90b540e20 Binary files /dev/null and b/content/imgs/Pasted image 20230719165826.png differ diff --git a/content/imgs/Pasted image 20230719171203.png b/content/imgs/Pasted image 20230719171203.png new file mode 100644 index 000000000..90b540e20 Binary files /dev/null and b/content/imgs/Pasted image 20230719171203.png differ diff --git a/content/imgs/Product-Page-Diagram-AWSX-CloudTrail_How-it-Works.d2f51f6e3ec3ea3b33d0c48d472f0e0b59b46e59.png b/content/imgs/Product-Page-Diagram-AWSX-CloudTrail_How-it-Works.d2f51f6e3ec3ea3b33d0c48d472f0e0b59b46e59.png new file mode 100644 index 000000000..d19b8ecb1 Binary files /dev/null and b/content/imgs/Product-Page-Diagram-AWSX-CloudTrail_How-it-Works.d2f51f6e3ec3ea3b33d0c48d472f0e0b59b46e59.png differ diff --git a/content/imgs/Screenshot from 2023-03-10 13-08-33.png b/content/imgs/Screenshot from 2023-03-10 13-08-33.png new file mode 100644 index 000000000..089ac1b34 Binary files /dev/null and b/content/imgs/Screenshot from 2023-03-10 13-08-33.png differ diff --git a/content/imgs/Screenshot from 2023-03-10 13-47-48.png b/content/imgs/Screenshot from 2023-03-10 13-47-48.png new file mode 100644 index 000000000..9ff553d42 Binary files /dev/null and b/content/imgs/Screenshot from 2023-03-10 13-47-48.png differ diff --git a/content/imgs/Screenshot from 2023-03-13 14-15-06.png b/content/imgs/Screenshot from 2023-03-13 14-15-06.png new file mode 100644 index 000000000..b0467e781 Binary files /dev/null and b/content/imgs/Screenshot from 2023-03-13 14-15-06.png differ diff --git a/content/imgs/Screenshot from 2023-03-13 14-34-07.png b/content/imgs/Screenshot from 2023-03-13 14-34-07.png new file mode 100644 index 000000000..06f0a5695 Binary files /dev/null and b/content/imgs/Screenshot from 2023-03-13 14-34-07.png differ diff 
--git a/content/imgs/Screenshot from 2023-03-14 11-12-33.png b/content/imgs/Screenshot from 2023-03-14 11-12-33.png new file mode 100644 index 000000000..7fec34607 Binary files /dev/null and b/content/imgs/Screenshot from 2023-03-14 11-12-33.png differ diff --git a/content/imgs/Screenshot from 2023-03-14 13-08-06.png b/content/imgs/Screenshot from 2023-03-14 13-08-06.png new file mode 100644 index 000000000..1811d2e51 Binary files /dev/null and b/content/imgs/Screenshot from 2023-03-14 13-08-06.png differ diff --git a/content/imgs/Screenshot from 2023-03-14 15-53-57.png b/content/imgs/Screenshot from 2023-03-14 15-53-57.png new file mode 100644 index 000000000..4df0957ab Binary files /dev/null and b/content/imgs/Screenshot from 2023-03-14 15-53-57.png differ diff --git a/content/imgs/Screenshot from 2023-03-14 16-07-26.png b/content/imgs/Screenshot from 2023-03-14 16-07-26.png new file mode 100644 index 000000000..059495277 Binary files /dev/null and b/content/imgs/Screenshot from 2023-03-14 16-07-26.png differ diff --git a/content/imgs/Screenshot from 2023-03-14 16-08-26.png b/content/imgs/Screenshot from 2023-03-14 16-08-26.png new file mode 100644 index 000000000..57a9a6614 Binary files /dev/null and b/content/imgs/Screenshot from 2023-03-14 16-08-26.png differ diff --git a/content/imgs/Screenshot from 2023-03-14 16-20-58.png b/content/imgs/Screenshot from 2023-03-14 16-20-58.png new file mode 100644 index 000000000..90935109f Binary files /dev/null and b/content/imgs/Screenshot from 2023-03-14 16-20-58.png differ diff --git a/content/imgs/Screenshot from 2023-03-14 16-21-19.png b/content/imgs/Screenshot from 2023-03-14 16-21-19.png new file mode 100644 index 000000000..17a703a56 Binary files /dev/null and b/content/imgs/Screenshot from 2023-03-14 16-21-19.png differ diff --git a/content/imgs/Screenshot from 2023-03-15 10-31-39.png b/content/imgs/Screenshot from 2023-03-15 10-31-39.png new file mode 100644 index 000000000..365e473b8 Binary files /dev/null and b/content/imgs/Screenshot from 2023-03-15 10-31-39.png differ diff --git a/content/imgs/Screenshot from 2023-03-15 10-31-59.png b/content/imgs/Screenshot from 2023-03-15 10-31-59.png new file mode 100644 index 000000000..6053b3a5b Binary files /dev/null and b/content/imgs/Screenshot from 2023-03-15 10-31-59.png differ diff --git a/content/imgs/Screenshot from 2023-03-15 10-41-07.png b/content/imgs/Screenshot from 2023-03-15 10-41-07.png new file mode 100644 index 000000000..9816d26c6 Binary files /dev/null and b/content/imgs/Screenshot from 2023-03-15 10-41-07.png differ diff --git a/content/imgs/Screenshot from 2023-03-15 13-01-09.png b/content/imgs/Screenshot from 2023-03-15 13-01-09.png new file mode 100644 index 000000000..66bb5e4cd Binary files /dev/null and b/content/imgs/Screenshot from 2023-03-15 13-01-09.png differ diff --git a/content/imgs/Screenshot from 2023-03-15 13-41-20.png b/content/imgs/Screenshot from 2023-03-15 13-41-20.png new file mode 100644 index 000000000..a25fc9c86 Binary files /dev/null and b/content/imgs/Screenshot from 2023-03-15 13-41-20.png differ diff --git a/content/imgs/Screenshot from 2023-03-15 14-49-53.png b/content/imgs/Screenshot from 2023-03-15 14-49-53.png new file mode 100644 index 000000000..058fddba2 Binary files /dev/null and b/content/imgs/Screenshot from 2023-03-15 14-49-53.png differ diff --git a/content/imgs/Screenshot from 2023-03-15 14-50-13.png b/content/imgs/Screenshot from 2023-03-15 14-50-13.png new file mode 100644 index 000000000..a0d8b739b Binary files /dev/null and 
b/content/imgs/Screenshot from 2023-03-15 14-50-13.png differ diff --git a/content/imgs/Screenshot from 2023-03-15 14-52-15.png b/content/imgs/Screenshot from 2023-03-15 14-52-15.png new file mode 100644 index 000000000..b544d0ee9 Binary files /dev/null and b/content/imgs/Screenshot from 2023-03-15 14-52-15.png differ diff --git a/content/imgs/Screenshot from 2023-03-20 09-17-17.png b/content/imgs/Screenshot from 2023-03-20 09-17-17.png new file mode 100644 index 000000000..00a4da4f8 Binary files /dev/null and b/content/imgs/Screenshot from 2023-03-20 09-17-17.png differ diff --git a/content/imgs/Screenshot from 2023-03-20 09-35-09.png b/content/imgs/Screenshot from 2023-03-20 09-35-09.png new file mode 100644 index 000000000..557cf0560 Binary files /dev/null and b/content/imgs/Screenshot from 2023-03-20 09-35-09.png differ diff --git a/content/imgs/Screenshot from 2023-03-20 09-44-01.png b/content/imgs/Screenshot from 2023-03-20 09-44-01.png new file mode 100644 index 000000000..c260395a9 Binary files /dev/null and b/content/imgs/Screenshot from 2023-03-20 09-44-01.png differ diff --git a/content/imgs/Screenshot from 2023-03-20 10-19-45.png b/content/imgs/Screenshot from 2023-03-20 10-19-45.png new file mode 100644 index 000000000..c404412f0 Binary files /dev/null and b/content/imgs/Screenshot from 2023-03-20 10-19-45.png differ diff --git a/content/imgs/Screenshot from 2023-03-23 10-00-42.png b/content/imgs/Screenshot from 2023-03-23 10-00-42.png new file mode 100644 index 000000000..f15b97860 Binary files /dev/null and b/content/imgs/Screenshot from 2023-03-23 10-00-42.png differ diff --git a/content/imgs/Screenshot from 2023-04-03 22-53-28.png b/content/imgs/Screenshot from 2023-04-03 22-53-28.png new file mode 100644 index 000000000..2405ed1ab Binary files /dev/null and b/content/imgs/Screenshot from 2023-04-03 22-53-28.png differ diff --git a/content/imgs/Screenshot from 2023-04-12 09-25-39.png b/content/imgs/Screenshot from 2023-04-12 09-25-39.png new file mode 100644 index 000000000..8cb2c9137 Binary files /dev/null and b/content/imgs/Screenshot from 2023-04-12 09-25-39.png differ diff --git a/content/imgs/Screenshot from 2023-04-12 09-56-37.png b/content/imgs/Screenshot from 2023-04-12 09-56-37.png new file mode 100644 index 000000000..661dd7ba7 Binary files /dev/null and b/content/imgs/Screenshot from 2023-04-12 09-56-37.png differ diff --git a/content/imgs/Screenshot from 2023-04-24 01-19-35.png b/content/imgs/Screenshot from 2023-04-24 01-19-35.png new file mode 100644 index 000000000..47b5357ac Binary files /dev/null and b/content/imgs/Screenshot from 2023-04-24 01-19-35.png differ diff --git a/content/imgs/Screenshot from 2023-04-24 01-36-06.png b/content/imgs/Screenshot from 2023-04-24 01-36-06.png new file mode 100644 index 000000000..b08557527 Binary files /dev/null and b/content/imgs/Screenshot from 2023-04-24 01-36-06.png differ diff --git a/content/imgs/Screenshot from 2023-04-24 15-21-36.png b/content/imgs/Screenshot from 2023-04-24 15-21-36.png new file mode 100644 index 000000000..082a8890d Binary files /dev/null and b/content/imgs/Screenshot from 2023-04-24 15-21-36.png differ diff --git a/content/imgs/Screenshot from 2023-04-24 23-56-06.png b/content/imgs/Screenshot from 2023-04-24 23-56-06.png new file mode 100644 index 000000000..823084e18 Binary files /dev/null and b/content/imgs/Screenshot from 2023-04-24 23-56-06.png differ diff --git a/content/imgs/Screenshot from 2023-04-25 14-07-06.png b/content/imgs/Screenshot from 2023-04-25 14-07-06.png new file mode 
100644 index 000000000..9caf9ea4f Binary files /dev/null and b/content/imgs/Screenshot from 2023-04-25 14-07-06.png differ diff --git a/content/imgs/Screenshot from 2023-06-08 00-47-21.png b/content/imgs/Screenshot from 2023-06-08 00-47-21.png new file mode 100644 index 000000000..093136cf6 Binary files /dev/null and b/content/imgs/Screenshot from 2023-06-08 00-47-21.png differ diff --git a/content/imgs/Screenshot from 2023-06-08 01-03-45.png b/content/imgs/Screenshot from 2023-06-08 01-03-45.png new file mode 100644 index 000000000..4310e11c7 Binary files /dev/null and b/content/imgs/Screenshot from 2023-06-08 01-03-45.png differ diff --git a/content/imgs/Screenshot from 2023-06-08 01-23-04.png b/content/imgs/Screenshot from 2023-06-08 01-23-04.png new file mode 100644 index 000000000..d29c0d5a6 Binary files /dev/null and b/content/imgs/Screenshot from 2023-06-08 01-23-04.png differ diff --git a/content/imgs/Screenshot from 2023-06-08 14-12-22.png b/content/imgs/Screenshot from 2023-06-08 14-12-22.png new file mode 100644 index 000000000..b75f8a3c8 Binary files /dev/null and b/content/imgs/Screenshot from 2023-06-08 14-12-22.png differ diff --git a/content/imgs/Screenshot from 2023-06-08 14-14-44.png b/content/imgs/Screenshot from 2023-06-08 14-14-44.png new file mode 100644 index 000000000..656696d3d Binary files /dev/null and b/content/imgs/Screenshot from 2023-06-08 14-14-44.png differ diff --git a/content/imgs/Screenshot from 2023-06-08 14-37-30.png b/content/imgs/Screenshot from 2023-06-08 14-37-30.png new file mode 100644 index 000000000..b3632fa1b Binary files /dev/null and b/content/imgs/Screenshot from 2023-06-08 14-37-30.png differ diff --git a/content/imgs/Screenshot from 2023-06-08 15-53-58.png b/content/imgs/Screenshot from 2023-06-08 15-53-58.png new file mode 100644 index 000000000..9e5e96b0f Binary files /dev/null and b/content/imgs/Screenshot from 2023-06-08 15-53-58.png differ diff --git a/content/imgs/Screenshot from 2023-06-10 21-01-40.png b/content/imgs/Screenshot from 2023-06-10 21-01-40.png new file mode 100644 index 000000000..0467e115e Binary files /dev/null and b/content/imgs/Screenshot from 2023-06-10 21-01-40.png differ diff --git a/content/imgs/Screenshot from 2023-06-10 21-06-43.png b/content/imgs/Screenshot from 2023-06-10 21-06-43.png new file mode 100644 index 000000000..0b857403e Binary files /dev/null and b/content/imgs/Screenshot from 2023-06-10 21-06-43.png differ diff --git a/content/imgs/Screenshot from 2023-06-11 02-50-56.png b/content/imgs/Screenshot from 2023-06-11 02-50-56.png new file mode 100644 index 000000000..1f9a4bb06 Binary files /dev/null and b/content/imgs/Screenshot from 2023-06-11 02-50-56.png differ diff --git a/content/imgs/Screenshot from 2023-06-11 16-50-37.png b/content/imgs/Screenshot from 2023-06-11 16-50-37.png new file mode 100644 index 000000000..44fd08cdd Binary files /dev/null and b/content/imgs/Screenshot from 2023-06-11 16-50-37.png differ diff --git a/content/imgs/Screenshot from 2023-06-11 16-56-37.png b/content/imgs/Screenshot from 2023-06-11 16-56-37.png new file mode 100644 index 000000000..600f54c23 Binary files /dev/null and b/content/imgs/Screenshot from 2023-06-11 16-56-37.png differ diff --git a/content/imgs/Screenshot from 2023-06-11 17-21-28.png b/content/imgs/Screenshot from 2023-06-11 17-21-28.png new file mode 100644 index 000000000..93fb538e4 Binary files /dev/null and b/content/imgs/Screenshot from 2023-06-11 17-21-28.png differ diff --git a/content/imgs/Screenshot from 2023-06-11 17-36-45.png 
b/content/imgs/Screenshot from 2023-06-11 17-36-45.png new file mode 100644 index 000000000..2975e8bde Binary files /dev/null and b/content/imgs/Screenshot from 2023-06-11 17-36-45.png differ diff --git a/content/imgs/Screenshot from 2023-06-11 17-41-02.png b/content/imgs/Screenshot from 2023-06-11 17-41-02.png new file mode 100644 index 000000000..3ab6755e8 Binary files /dev/null and b/content/imgs/Screenshot from 2023-06-11 17-41-02.png differ diff --git a/content/imgs/Screenshot from 2023-06-11 17-51-48.png b/content/imgs/Screenshot from 2023-06-11 17-51-48.png new file mode 100644 index 000000000..35aa62fa8 Binary files /dev/null and b/content/imgs/Screenshot from 2023-06-11 17-51-48.png differ diff --git a/content/imgs/Screenshot from 2023-06-11 17-53-44.png b/content/imgs/Screenshot from 2023-06-11 17-53-44.png new file mode 100644 index 000000000..d6a1f4227 Binary files /dev/null and b/content/imgs/Screenshot from 2023-06-11 17-53-44.png differ diff --git a/content/imgs/Screenshot from 2023-06-11 17-56-54.png b/content/imgs/Screenshot from 2023-06-11 17-56-54.png new file mode 100644 index 000000000..3facbd3a2 Binary files /dev/null and b/content/imgs/Screenshot from 2023-06-11 17-56-54.png differ diff --git a/content/imgs/Screenshot from 2023-07-14 16-37-45.png b/content/imgs/Screenshot from 2023-07-14 16-37-45.png new file mode 100644 index 000000000..70f818717 Binary files /dev/null and b/content/imgs/Screenshot from 2023-07-14 16-37-45.png differ diff --git a/content/imgs/Screenshot from 2023-07-18 09-32-29.png b/content/imgs/Screenshot from 2023-07-18 09-32-29.png new file mode 100644 index 000000000..67cecd377 Binary files /dev/null and b/content/imgs/Screenshot from 2023-07-18 09-32-29.png differ diff --git a/content/imgs/Screenshot from 2023-07-18 09-34-24.png b/content/imgs/Screenshot from 2023-07-18 09-34-24.png new file mode 100644 index 000000000..19ad3d7c6 Binary files /dev/null and b/content/imgs/Screenshot from 2023-07-18 09-34-24.png differ diff --git a/content/imgs/Screenshot from 2023-07-18 09-54-01.png b/content/imgs/Screenshot from 2023-07-18 09-54-01.png new file mode 100644 index 000000000..b596f25d1 Binary files /dev/null and b/content/imgs/Screenshot from 2023-07-18 09-54-01.png differ diff --git a/content/imgs/Screenshot from 2023-07-27 21-15-52.png b/content/imgs/Screenshot from 2023-07-27 21-15-52.png new file mode 100644 index 000000000..083847f23 Binary files /dev/null and b/content/imgs/Screenshot from 2023-07-27 21-15-52.png differ diff --git a/content/imgs/Screenshot from 2023-07-29 17-06-55.png b/content/imgs/Screenshot from 2023-07-29 17-06-55.png new file mode 100644 index 000000000..bbeef91b4 Binary files /dev/null and b/content/imgs/Screenshot from 2023-07-29 17-06-55.png differ diff --git a/content/imgs/Screenshot from 2023-07-29 17-09-50.png b/content/imgs/Screenshot from 2023-07-29 17-09-50.png new file mode 100644 index 000000000..33efd6dc3 Binary files /dev/null and b/content/imgs/Screenshot from 2023-07-29 17-09-50.png differ diff --git a/content/imgs/Screenshot from 2023-07-29 17-18-38.png b/content/imgs/Screenshot from 2023-07-29 17-18-38.png new file mode 100644 index 000000000..977d1229c Binary files /dev/null and b/content/imgs/Screenshot from 2023-07-29 17-18-38.png differ diff --git a/content/imgs/Screenshot from 2023-07-29 17-27-10.png b/content/imgs/Screenshot from 2023-07-29 17-27-10.png new file mode 100644 index 000000000..cdb42a99f Binary files /dev/null and b/content/imgs/Screenshot from 2023-07-29 17-27-10.png differ diff 
--git a/content/imgs/Screenshot from 2023-07-29 23-22-40.png b/content/imgs/Screenshot from 2023-07-29 23-22-40.png new file mode 100644 index 000000000..bbb358115 Binary files /dev/null and b/content/imgs/Screenshot from 2023-07-29 23-22-40.png differ diff --git a/content/imgs/Screenshot from 2023-07-29 23-41-33.png b/content/imgs/Screenshot from 2023-07-29 23-41-33.png new file mode 100644 index 000000000..161f95570 Binary files /dev/null and b/content/imgs/Screenshot from 2023-07-29 23-41-33.png differ diff --git a/content/imgs/Screenshot from 2023-07-30 00-45-44.png b/content/imgs/Screenshot from 2023-07-30 00-45-44.png new file mode 100644 index 000000000..db4978f73 Binary files /dev/null and b/content/imgs/Screenshot from 2023-07-30 00-45-44.png differ diff --git a/content/imgs/Screenshot from 2023-07-30 01-44-31.png b/content/imgs/Screenshot from 2023-07-30 01-44-31.png new file mode 100644 index 000000000..7e26cd10b Binary files /dev/null and b/content/imgs/Screenshot from 2023-07-30 01-44-31.png differ diff --git a/content/imgs/Screenshot from 2023-07-30 02-02-49.png b/content/imgs/Screenshot from 2023-07-30 02-02-49.png new file mode 100644 index 000000000..5a2bbedb0 Binary files /dev/null and b/content/imgs/Screenshot from 2023-07-30 02-02-49.png differ diff --git a/content/imgs/awsconfig.png b/content/imgs/awsconfig.png new file mode 100644 index 000000000..2fa95b307 Binary files /dev/null and b/content/imgs/awsconfig.png differ diff --git a/content/imgs/pt1-q9-i1.jpg b/content/imgs/pt1-q9-i1.jpg new file mode 100644 index 000000000..cc400882e Binary files /dev/null and b/content/imgs/pt1-q9-i1.jpg differ diff --git a/content/index.md b/content/index.md new file mode 100644 index 000000000..56fb37414 --- /dev/null +++ b/content/index.md @@ -0,0 +1,14 @@ +#index +Github [Link](https://github.com/ErdemOzgen/ObsidianSTSV/tree/main) + +* Books [[Book Index]] +* DailyNotes +* HVL [[HVL index]] +* Freelance [[freelance index]] +* Cloud [[cloud index]] +* ML [[ML index]] +* Data eng [[data eng index]] +* NATO LOCKSHIELD [[NATO LOCKSHIELD STUDY PLAN]] +* Article [[article index]] +* Project X [[Project x index]] +* \ No newline at end of file diff --git a/quartz.config.ts b/quartz.config.ts index f677a18f9..fc0c4ad9d 100644 --- a/quartz.config.ts +++ b/quartz.config.ts @@ -3,13 +3,13 @@ import * as Plugin from "./quartz/plugins" const config: QuartzConfig = { configuration: { - pageTitle: "🪴 Quartz 4.0", + pageTitle: "Erdem's Second Brain", enableSPA: true, enablePopovers: true, analytics: { provider: "plausible", }, - baseUrl: "quartz.jzhao.xyz", + baseUrl: "erdemozgen.github.io/brain", ignorePatterns: ["private", "templates", ".obsidian"], defaultDateType: "created", theme: {