general refactoring
@ -3,3 +3,6 @@ tags:
|
||||
- "#cyber"
|
||||
---
|
||||
#cyber
|
||||
#sourcecodeaudit
|
||||
#devsecops
|
||||
#oswe
|
||||
|
||||
@ -1,23 +0,0 @@
|
||||
A data mart is a subset of a data warehouse that is usually oriented to a specific business line or team. Whereas data warehouses have enterprise-wide depth, data marts are often smaller sections of data warehouses segmented for specific uses. Here are some key points about data marts:
|
||||
|
||||
1. **Purpose-Oriented**: Data marts are designed to meet the specific needs of a particular group of users, such as a department within a company, like sales, finance, or marketing. They contain data relevant to that group.
|
||||
|
||||
2. **Scope and Size**: They are smaller in scope and size compared to data warehouses. This makes them easier to implement and manage.
|
||||
|
||||
3. **Data Sources**: Data marts can draw data from a wide range of sources. They may pull data from internal systems like ERP and CRM, as well as external data sources.
|
||||
|
||||
4. **Performance**: Because of their smaller size and focused nature, data marts can improve query performance. They enable users to access and analyze relevant data more quickly and efficiently than sifting through a larger data warehouse.
|
||||
|
||||
5. **Types of Data Marts**: There are two main types: independent data marts, which are created without a data warehouse and rely solely on data from source systems; and dependent data marts, which are created from an existing data warehouse.
|
||||
|
||||
6. **Implementation**: Implementing a data mart is generally faster and less costly than a full data warehouse because of its limited scope.
|
||||
|
||||
7. **Business Intelligence**: Data marts are often used in business intelligence (BI) applications, where they provide the data for reporting and analysis tools.
|
||||
|
||||
8. **Customization**: They can be customized to fit the specific needs of different user groups within an organization, providing more relevant and tailored data for analysis.
|
||||
|
||||
In summary, data marts are a focused and scaled-down version of data warehouses, designed to provide specific groups within an organization with the data they need for analysis and decision-making.
|
||||
|
||||
https://community.databricks.com/t5/data-engineering/how-to-build-data-warehouses-and-data-marts-with-python/td-p/42843
|
||||
|
||||
|
||||
@ -1,82 +0,0 @@
|
||||
# Modern Data Stack
|
||||
|
||||
Last updated Dec 7, 2023
|
||||
|
||||
- [Dataquestion](https://www.ssp.sh/brain/tags/dataquestion/)
|
||||
|
||||
Table of Contents
|
||||
|
||||
1. [Why the Modern Data Stack?](https://www.ssp.sh/brain/modern-data-stack/#why-the-modern-data-stack)
|
||||
2. [Integrating with [[Dagster]]](https://www.ssp.sh/brain/modern-data-stack/#integrating-with-dagster)
|
||||
3. [A comment I made on Social](https://www.ssp.sh/brain/modern-data-stack/#a-comment-i-made-on-social)
|
||||
4. [Further Links](https://www.ssp.sh/brain/modern-data-stack/#further-links)
|
||||
5. [[[Counter Arguments against Modern Data Stack]]](https://www.ssp.sh/brain/modern-data-stack/#counter-arguments-against-modern-data-stack)
|
||||
|
||||
The Modern Data Stack (MDS) comprises a suite of open-source tools designed for end-to-end analytics. This includes data ingestion, transformation, machine learning, and integration into a columnar data warehouse or lake solution, all complemented by an analytics BI dashboard backend. The stack’s versatility allows extensions for data quality, data cataloging, and more.
|
||||
|
||||
MDS aims to enable data insights using the best-suited tools for each process. It’s worth noting that “Modern Data Stack” is a relatively new term, with its definition still evolving.
|
||||
|
||||
> Synonym Names
|
||||
> A burgeoning term, [ngods (new generation open-source data stack)](https://www.ssp.sh/brain/ngods-new-generation-open-source-data-stack), has emerged. Previously, I’ve referred to this concept as the Open Data Stack Project. Additionally, Dagster introduced the term DataStack 2.0 in a [recent blog post](https://dagster.io/blog/evolution-iq-case-study). [Open Data Stack](https://www.ssp.sh/brain/open-data-stack) is my own definition of it.
|
||||
|
||||
> Closed Source vs Open Source
|
||||
> Closed Source examples: dbt, Looker, Snowflake, Fivetran, Hightouch, Census
|
||||
> Open Source alternatives: airbyte, dbt, dagster, Superset, Reverse-ETL?
|
||||
|
||||
> Modern Data Stack on a Laptop
|
||||
> [DuckDB: Modern Data Stack in a Box](https://duckdb.org/2022/10/12/modern-data-stack-in-a-box.html)
|
||||
|
||||
[
|
||||
|
||||
## # Why the Modern Data Stack?
|
||||
|
||||
](https://www.ssp.sh/brain/modern-data-stack/#why-the-modern-data-stack)
|
||||
|
||||
A perspective from [Reddit](https://www.reddit.com/r/dataengineering/comments/12acdrk/comment/jes4pr8/?utm_source=share&utm_medium=web2x&context=3) highlights the shift in data warehousing and analytics. It underscores the reduced need for extensive teams and infrastructure, thanks to new tools that streamline data management and reporting. Particularly for small and mid-sized companies, MDS offers a competitive edge in data handling, allowing even a single data engineer to manage vast datasets efficiently.
|
||||
|
||||
---
|
||||
|
||||
A notable article discussing Lakehouse, Metrics Layer, and Clickhouse:
|
||||
[The Next Cloud Data Platform | Greylock](https://greylock.com/greymatter/the-next-cloud-data-platform/)
|
||||

|
||||
|
||||
[](https://www.ssp.sh/brain/modern-data-stack/#integrating-with-dagster)
|
||||
|
||||
## [# Integrating with](https://www.ssp.sh/brain/modern-data-stack/#integrating-with-dagster) Dagster
|
||||
|
||||
The downside of MDS is the unbundling of Bundling vs Unbundling- Monolith Data vs Microservices, but Dagster helps integrate the full data stack together:
|
||||

|
||||

|
||||
Dagster elevates the Modern Data Stack:
|
||||

|
||||

|
||||
Explore more about its power with Dagster and [Data Assets](https://www.ssp.sh/brain/data-assets).
|
||||
|
||||
[
|
||||
|
||||
## # A comment I made on Social
|
||||
|
||||
](https://www.ssp.sh/brain/modern-data-stack/#a-comment-i-made-on-social)
|
||||
|
||||
I often ponder over the ideal tools for a data stack. My preference leans toward a [Cloud Data Warehouse](https://www.sspaeti.com/blog/olap-whats-coming-next/#cloud-data-warehouses) such as [](https://www.firebolt.io/)Firebolt, [Snowflake](https://www.snowflake.com/), [BigQuery](https://cloud.google.com/bigquery/), [Redshift](https://aws.amazon.com/redshift/), or [Synapse](https://azure.microsoft.com/en-us/services/synapse-analytics/), as a starting point.
|
||||
|
||||
The journey typically begins with [Airbyte](https://airbyte.com/) for data integration, followed by SQL-based transformation with [dbt](https://www.getdbt.com/). Orchestrating the processes in [Python](https://www.sspaeti.com/blog/business-intelligence-meets-data-engineering/#8220use-python-and-sql-if-possible8221) with tools like [dagster](https://ask.astorik.com/c/resources-sspaeti/what-are-common-alternatives-to-apache-airflow) is crucial.
|
||||
|
||||
From there, I would integrate additional open-source tools based on specific needs: [Spark](https://spark.apache.org/) for processing, [Delta Lake](https://delta.io/) for data lake formatting and ACID Transactions, [Amundsen](http://amundsen.io/) for data cataloging, and [Great Expectation](https://greatexpectations.io/) for data quality, among others. For smaller projects, [DuckDB](https://duckdb.org/) is suitable for local [OLAP](https://www.ssp.sh/brain/olap) scenarios, while [Kubernetes](https://kubernetes.io/) and DevOps provide scalability.
|
||||
|
||||
For teams without data engineering resources, closed-source options like [Ascend](https://www.ascend.io/) or [Foundry](https://www.palantir.com/platforms/foundry/) are viable alternatives.
|
||||
|
||||
Feel free to reach out for further discussion or clarifications.
|
||||
|
||||
[
|
||||
|
||||
## # Further Links
|
||||
|
||||
](https://www.ssp.sh/brain/modern-data-stack/#further-links)
|
||||
|
||||
- [The next layer of the modern data stack](https://www.getdbt.com/blog/next-layer-of-the-modern-data-stack/)
|
||||
- [What Is the Modern Data Stack? | Fivetran](https://www.fivetran.com/blog/what-is-the-modern-data-stack)
|
||||
|
||||
[](https://www.ssp.sh/brain/modern-data-stack/#counter-arguments-against-modern-data-stack)
|
||||
|
||||
## [#](https://www.ssp.sh/brain/modern-data-stack/#counter-arguments-against-modern-data-stack) Counter Arguments against Modern Data Stack
|
||||
2
content/SoftwareEngineering/Leetcode/links.md
Normal file
@ -0,0 +1,2 @@
|
||||
|
||||
https://www.youtube.com/watch?v=lvO88XxNAzs
|
||||
@ -6,7 +6,7 @@ tags:
|
||||
---
|
||||
-
|
||||
[[Go By example]]
|
||||
[[ProgramingLang/Index|Index]]
|
||||
[[SoftwareEngineering/ProgramingLang/Index|Index]]
|
||||
|
||||
|
||||
|
||||
@ -1 +0,0 @@
|
||||
{}
|
||||
@ -1 +0,0 @@
|
||||
{}
|
||||
@ -1 +0,0 @@
|
||||
{}
|
||||
|
Before Width: | Height: | Size: 1.2 MiB After Width: | Height: | Size: 1.2 MiB |
|
Before Width: | Height: | Size: 370 KiB After Width: | Height: | Size: 370 KiB |
|
Before Width: | Height: | Size: 279 KiB After Width: | Height: | Size: 279 KiB |
|
Before Width: | Height: | Size: 246 KiB After Width: | Height: | Size: 246 KiB |
|
Before Width: | Height: | Size: 563 KiB After Width: | Height: | Size: 563 KiB |
|
Before Width: | Height: | Size: 551 KiB After Width: | Height: | Size: 551 KiB |
|
Before Width: | Height: | Size: 234 KiB After Width: | Height: | Size: 234 KiB |