diff --git a/content/.trash/algorithm/Untitled.md b/content/.trash/algorithm/Untitled.md new file mode 100644 index 000000000..e69de29bb diff --git a/content/.trash/tmp.md b/content/.trash/tmp.md new file mode 100644 index 000000000..81dd113fa --- /dev/null +++ b/content/.trash/tmp.md @@ -0,0 +1,12 @@ +--- +title: tmp_note +tags: + - tmp_note +date: +--- + +1. Angle or distance: which one is it, really? +2. Horizontal and vertical: what do they refer to? +3. Individual error: differences between people +4. Feature grouping: split into separate correction curves +5. Consistency between the left and right eyes \ No newline at end of file diff --git a/content/computer_sci/MOC.md b/content/computer_sci/MOC.md index 81f87ee4c..68fe1c27f 100644 --- a/content/computer_sci/MOC.md +++ b/content/computer_sci/MOC.md @@ -17,4 +17,6 @@ date: 2024-05-21 * [Multi-Processing - MOC](computer_sci/multiProcessing/MOC.md) -* [Computational Geometry - MOC](computer_sci/computational_geometry/MOC.md) \ No newline at end of file +* [Computational Geometry - MOC](computer_sci/computational_geometry/MOC.md) + +* [Interview MOC](computer_sci/interview/interview_MOC.md) \ No newline at end of file diff --git a/content/computer_sci/deep_learning_and_machine_learning/LLM/basic/precision/LLM_precision.md b/content/computer_sci/deep_learning_and_machine_learning/LLM/basic/precision/LLM_precision.md new file mode 100644 index 000000000..4543bf18e --- /dev/null +++ b/content/computer_sci/deep_learning_and_machine_learning/LLM/basic/precision/LLM_precision.md @@ -0,0 +1,21 @@ +--- +title: LLM Precision About +tags: + - LLM +date: 2024-09-26 +--- +# Default Precision + +In conventional scientific computing, we typically use 64-bit floats for higher precision. When training deep neural networks on a GPU, we typically use lower-than-maximum precision, namely 32-bit floating-point operations. PyTorch uses 32-bit floats by default. 
+ +Reasons deep learning uses 32-bit precision: + +* 64-bit precision is unnecessary and computationally expensive +* GPUs are not optimized for 64-bit precision + +**32-bit floating-point operations have become the standard for training deep neural networks on GPUs.** + + +# Reference + +[1] Raschka, Sebastian. “Accelerating Large Language Models with Mixed-Precision Techniques.” _Sebastian Raschka, PhD_, 11 May 2023, https://sebastianraschka.com/blog/2023/llm-mixed-precision-copy.html. \ No newline at end of file diff --git a/content/computer_sci/deep_learning_and_machine_learning/Trick/quantile_loss.md b/content/computer_sci/deep_learning_and_machine_learning/Trick/quantile_loss.md index 570051f36..b7d773d7d 100644 --- a/content/computer_sci/deep_learning_and_machine_learning/Trick/quantile_loss.md +++ b/content/computer_sci/deep_learning_and_machine_learning/Trick/quantile_loss.md @@ -13,7 +13,7 @@ Quantile loss用于衡量预测分布和目标分布之间的差异,特别适 # What is quantile -[quantile_concept](math/Statistics/basic_concepot/quantile_concept.md) +[quantile_concept](math/statistic/basic_concepot/quantile_concept.md) # What is a prediction interval diff --git a/content/computer_sci/interview/interview_MOC.md b/content/computer_sci/interview/interview_MOC.md new file mode 100644 index 000000000..1642f5626 --- /dev/null +++ b/content/computer_sci/interview/interview_MOC.md @@ -0,0 +1,8 @@ +--- +title: CS interview MOC +tags: + - MOC + - cs +date: 2024-09-29 +--- +* [machine learning interview](computer_sci/interview/machine_learning_interview.md) \ No newline at end of file diff --git a/content/computer_sci/interview/machine_learning_interview.md b/content/computer_sci/interview/machine_learning_interview.md new file mode 100644 index 000000000..11dea814b --- /dev/null +++ b/content/computer_sci/interview/machine_learning_interview.md @@ -0,0 +1,12 @@ +--- +title: Machine Learning Interview +tags: + - machine-learning + - cs + - interview +date: 2024-09-29 +--- +# Transformer + +## 
Attention Formula + diff --git a/content/math/MOC.md b/content/math/MOC.md index b0b73c1c7..99a872924 100644 --- a/content/math/MOC.md +++ b/content/math/MOC.md @@ -10,11 +10,16 @@ date: 2023-12-03 ## Basic Concept -* [quantile_concept](math/Statistics/basic_concepot/quantile_concept.md) +* [quantile_concept](math/statistic/basic_concepot/quantile_concept.md) ## Significance Test -* [Basic about significance test](math/Statistics/significance_test/whats_the_significance_test.md) +* [Basic about significance test](math/statistic/significance_test/whats_the_significance_test.md) + +## Anomaly Detection + +* [Z-Score](math/statistic/anomaly_detection/z_score.md) +* [IQR](math/statistic/anomaly_detection/IQR.md) # Discrete mathematics diff --git a/content/math/statistic/anomaly_detection/IQR.md b/content/math/statistic/anomaly_detection/IQR.md new file mode 100644 index 000000000..f2610a490 --- /dev/null +++ b/content/math/statistic/anomaly_detection/IQR.md @@ -0,0 +1,54 @@ +--- +title: Interquartile Range +tags: + - math + - statistics + - anomaly +date: 2024-10-08 +--- +# What is IQR + +**Interquartile Range** is abbreviated as IQR. +IQR-based anomaly detection is commonly used to find outliers in data that is not normally distributed. It uses the quartiles of the data (Q1 and Q3) to identify and remove outliers, and it is better suited than the [Z-score](math/statistic/anomaly_detection/z_score.md) method to skewed or non-normal data. + +- **First quartile (Q1)**: the lower quartile, the value below which the smallest 25% of the data points fall. +- **Third quartile (Q3)**: the upper quartile, the value above which the largest 25% of the data points fall. +- **Interquartile range (IQR)**: the difference between Q3 and Q1, computed as: +$$ +IQR = Q3 - Q1 +$$ + +# Algorithm Detail + +1. **Sort the data**: + + - Sort the data in ascending order. +2. **Compute the quartiles**: + + - **Q1**: the value at the 25% position of the sorted data. + - **Q3**: the value at the 75% position of the sorted data. +3. **Compute the interquartile range**: + + - IQR = Q3 - Q1, which measures the spread of the middle portion of the data. +4. **Set the lower and upper bounds**: + + - Define a **lower bound** and an **upper bound** used to decide which points are anomalies. + - **Lower bound** = Q1 - 1.5 × IQR + - **Upper bound** = Q3 + 1.5 × IQR + - The factor 1.5 is a common rule of thumb; it can be changed to another multiple (such as 2 or 3) depending on the application. +5. 
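**Detect anomalies**: + + - Any data point below the lower bound or above the upper bound is considered an anomaly. + + +# Pros and Cons + +### Pros: + +- **No distributional assumption**: the IQR method does not assume normally distributed data, so it suits skewed or asymmetric distributions. +- **Robust to extreme values**: unlike the Z-score, the IQR is largely unaffected by extreme values because it relies on the median and quartiles rather than the mean and standard deviation. + +### Cons: + +- **Less efficient on large datasets**: computing the quartiles and the IQR on a large dataset can be time-consuming. +- **Sensitivity near the bounds**: although the IQR identifies extreme outliers effectively, borderline points close to the lower or upper bound may be over-flagged as anomalies. \ No newline at end of file diff --git a/content/math/statistic/anomaly_detection/z_score.md b/content/math/statistic/anomaly_detection/z_score.md new file mode 100644 index 000000000..79b893a33 --- /dev/null +++ b/content/math/statistic/anomaly_detection/z_score.md @@ -0,0 +1,30 @@ +--- +title: Z-score +tags: + - math + - statistics +date: 2024-10-08 +--- +# What is Z-score + +$$ +z = \frac{X-\mu}{\sigma} +$$ +* $X$: a single data point +* $\mu$: the population mean +* $\sigma$: the population standard deviation + +The Z-score expresses how far a data point lies from the mean, measured in standard deviations. Specifically: + +- A Z-score of 0 means the data point equals the mean. +- A Z-score between -1 and +1 means the data point is within one standard deviation of the mean. +- A data point whose Z-score exceeds ±3 (that is, $|z| > 3$) is usually treated as an anomaly. + + +# Pros and Cons + +The Z-score is conceptually simple and quick to deploy. + +It is called the Z-score because **the symbol Z comes from the normal distribution**. In statistics, the standard normal distribution, a special normal distribution with mean 0 and standard deviation 1, is usually denoted by the letter **Z**. + +For the same reason, the Z-score is typically applied to normally distributed data and depends on that assumption. It is also sensitive to extreme values: the mean and standard deviation are themselves easily distorted by outliers, which can lead to misjudgments. \ No newline at end of file diff --git a/content/math/Statistics/basic_concepot/distribution/attachments/2bbb645362366906ace3296d35612625_720.jpg b/content/math/statistic/basic_concepot/distribution/attachments/2bbb645362366906ace3296d35612625_720.jpg similarity index 100% rename from content/math/Statistics/basic_concepot/distribution/attachments/2bbb645362366906ace3296d35612625_720.jpg rename to content/math/statistic/basic_concepot/distribution/attachments/2bbb645362366906ace3296d35612625_720.jpg diff --git a/content/math/Statistics/basic_concepot/distribution/attachments/df15541df80b6065fb8296d80ffceac5_720.jpg b/content/math/statistic/basic_concepot/distribution/attachments/df15541df80b6065fb8296d80ffceac5_720.jpg similarity index 100% rename from content/math/Statistics/basic_concepot/distribution/attachments/df15541df80b6065fb8296d80ffceac5_720.jpg rename to 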
content/math/statistic/basic_concepot/distribution/attachments/df15541df80b6065fb8296d80ffceac5_720.jpg diff --git a/content/math/Statistics/basic_concepot/distribution/attachments/prove.jpg b/content/math/statistic/basic_concepot/distribution/attachments/prove.jpg similarity index 100% rename from content/math/Statistics/basic_concepot/distribution/attachments/prove.jpg rename to content/math/statistic/basic_concepot/distribution/attachments/prove.jpg diff --git a/content/math/Statistics/basic_concepot/distribution/beta_binomial.md b/content/math/statistic/basic_concepot/distribution/beta_binomial.md similarity index 100% rename from content/math/Statistics/basic_concepot/distribution/beta_binomial.md rename to content/math/statistic/basic_concepot/distribution/beta_binomial.md diff --git a/content/math/Statistics/basic_concepot/distribution/exponential_distribution_and_poisson_distribution.md b/content/math/statistic/basic_concepot/distribution/exponential_distribution_and_poisson_distribution.md similarity index 95% rename from content/math/Statistics/basic_concepot/distribution/exponential_distribution_and_poisson_distribution.md rename to content/math/statistic/basic_concepot/distribution/exponential_distribution_and_poisson_distribution.md index 91651d3be..9a6d2cafa 100644 --- a/content/math/Statistics/basic_concepot/distribution/exponential_distribution_and_poisson_distribution.md +++ b/content/math/statistic/basic_concepot/distribution/exponential_distribution_and_poisson_distribution.md @@ -63,7 +63,7 @@ $$ # Deduction -![](math/Statistics/basic_concepot/distribution/attachments/2bbb645362366906ace3296d35612625_720.jpg) +![](math/statistic/basic_concepot/distribution/attachments/2bbb645362366906ace3296d35612625_720.jpg) # Reference diff --git a/content/math/Statistics/basic_concepot/distribution/gamma_distribution.md b/content/math/statistic/basic_concepot/distribution/gamma_distribution.md similarity index 91% rename from 
content/math/Statistics/basic_concepot/distribution/gamma_distribution.md rename to content/math/statistic/basic_concepot/distribution/gamma_distribution.md index a0a0fa3ae..a6374b1f7 100644 --- a/content/math/Statistics/basic_concepot/distribution/gamma_distribution.md +++ b/content/math/statistic/basic_concepot/distribution/gamma_distribution.md @@ -35,7 +35,7 @@ $$ $$ 证明如下: -![](math/Statistics/basic_concepot/distribution/attachments/prove.jpg) +![](math/statistic/basic_concepot/distribution/attachments/prove.jpg) 同时,在integer节点,Gamma function也和阶乘对应起来,即: @@ -45,7 +45,7 @@ $$ 证明如下: -![](math/Statistics/basic_concepot/distribution/attachments/df15541df80b6065fb8296d80ffceac5_720.jpg) +![](math/statistic/basic_concepot/distribution/attachments/df15541df80b6065fb8296d80ffceac5_720.jpg) @@ -53,7 +53,7 @@ $$ Exponential Distribution指的是,probability of the waiting time between events in a Poisson Process -Here's the exponential distribution explain: [Exponential Distribution](math/Statistics/basic_concepot/distribution/exponential_distribution_and_poisson_distribution.md) +Here's the exponential distribution explain: [Exponential Distribution](math/statistic/basic_concepot/distribution/exponential_distribution_and_poisson_distribution.md) # Introduction diff --git a/content/math/Statistics/basic_concepot/distribution/students_t_distribution.md b/content/math/statistic/basic_concepot/distribution/students_t_distribution.md similarity index 100% rename from content/math/Statistics/basic_concepot/distribution/students_t_distribution.md rename to content/math/statistic/basic_concepot/distribution/students_t_distribution.md diff --git a/content/math/Statistics/basic_concepot/quantile_concept.md b/content/math/statistic/basic_concepot/quantile_concept.md similarity index 100% rename from content/math/Statistics/basic_concepot/quantile_concept.md rename to content/math/statistic/basic_concepot/quantile_concept.md diff --git 
a/content/math/Statistics/game_theory/counterfactual_regret_minimization.md b/content/math/statistic/game_theory/counterfactual_regret_minimization.md similarity index 100% rename from content/math/Statistics/game_theory/counterfactual_regret_minimization.md rename to content/math/statistic/game_theory/counterfactual_regret_minimization.md diff --git a/content/math/Statistics/significance_test/attachments/Pasted image 20240415174359.png b/content/math/statistic/significance_test/attachments/Pasted image 20240415174359.png similarity index 100% rename from content/math/Statistics/significance_test/attachments/Pasted image 20240415174359.png rename to content/math/statistic/significance_test/attachments/Pasted image 20240415174359.png diff --git a/content/math/Statistics/significance_test/whats_the_significance_test.md b/content/math/statistic/significance_test/whats_the_significance_test.md similarity index 98% rename from content/math/Statistics/significance_test/whats_the_significance_test.md rename to content/math/statistic/significance_test/whats_the_significance_test.md index aac9d3194..2562344bf 100644 --- a/content/math/Statistics/significance_test/whats_the_significance_test.md +++ b/content/math/statistic/significance_test/whats_the_significance_test.md @@ -80,7 +80,7 @@ T值的大小并不直接影响相关性的可重复性。然而,如果我们 ### P-value -![](math/Statistics/significance_test/attachments/Pasted%20image%2020240415174359.png) +![](math/statistic/significance_test/attachments/Pasted%20image%2020240415174359.png) P值(P-value),全称为概率值(Probability value),是统计假设检验中的一个重要概念。**它用于帮助我们决定是否拒绝零假设**(null hypothesis)。**P值衡量的是,在零假设为真的情况下,观察到的统计量(如T值、Z值等)或更极端情况出现的概率**。 diff --git a/content/math/Statistics/stochastic_process/attachments/6fd1795d98c9031bc791909a8d098e25.jpg b/content/math/statistic/stochastic_process/attachments/6fd1795d98c9031bc791909a8d098e25.jpg similarity index 100% rename from content/math/Statistics/stochastic_process/attachments/6fd1795d98c9031bc791909a8d098e25.jpg rename to 
content/math/statistic/stochastic_process/attachments/6fd1795d98c9031bc791909a8d098e25.jpg diff --git a/content/math/Statistics/stochastic_process/markov_chain.md b/content/math/statistic/stochastic_process/markov_chain.md similarity index 96% rename from content/math/Statistics/stochastic_process/markov_chain.md rename to content/math/statistic/stochastic_process/markov_chain.md index c3d7492ba..c36d4749b 100644 --- a/content/math/Statistics/stochastic_process/markov_chain.md +++ b/content/math/statistic/stochastic_process/markov_chain.md @@ -35,7 +35,7 @@ $$ An RPG with a 33% blitz rate. But if the first two times you don't blitz, the third time you're bound to blitz. So what is the actual hit rate? -![](math/Statistics/stochastic_process/attachments/6fd1795d98c9031bc791909a8d098e25.jpg) +![](math/statistic/stochastic_process/attachments/6fd1795d98c9031bc791909a8d098e25.jpg) Simulation Code: diff --git a/content/signal/signal_processing/algorithm/advanced_statistic/autocorrelation/autocorrelation.md b/content/signal/signal_processing/algorithm/advanced_statistic/autocorrelation/autocorrelation.md index fe0cca155..b37e9abef 100644 --- a/content/signal/signal_processing/algorithm/advanced_statistic/autocorrelation/autocorrelation.md +++ b/content/signal/signal_processing/algorithm/advanced_statistic/autocorrelation/autocorrelation.md @@ -70,7 +70,7 @@ For correlation, we usually use **p-value** to **quantify the confidence** of th ![](signal/signal_processing/algorithm/advanced_statistic/autocorrelation/attachments/Pasted%20image%2020240415171855.png) -About P-value, you have better know what's [significance test](math/Statistics/significance_test/whats_the_significance_test.md) +About P-value, you have better know what's [significance test](math/statistic/significance_test/whats_the_significance_test.md) ## Random Signal