Add notes and change file structure

This commit is contained in:
PinkR1ver 2024-10-10 10:39:11 +08:00
parent 0049b139a3
commit 6f6d21292c
24 changed files with 155 additions and 11 deletions

12
content/.trash/tmp.md Normal file

@ -0,0 +1,12 @@
---
title: tmp_note
tags:
- tmp_note
date:
---
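1. Angle or distance: which one is it, exactly?
2. Horizontal and vertical: what do they mean?
3. Individual error: differences between people
4. Group by features and fit a separate correction curve per group
5. Consistency between left and right eyes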

@ -17,4 +17,6 @@ date: 2024-05-21
* [Multi-Processing - MOC](computer_sci/multiProcessing/MOC.md)
* [Computational Geometry - MOC](computer_sci/computational_geometry/MOC.md)
* [Computational Geometry - MOC](computer_sci/computational_geometry/MOC.md)
* [Interview MOC](computer_sci/interview/interview_MOC.md)

@ -0,0 +1,21 @@
---
title: LLM Precision About
tags:
- LLM
date: 2024-09-26
---
# Default Precision
In conventional scientific computing, we typically use 64-bit floats for higher precision. When training deep neural networks on a GPU, however, we typically use lower-than-maximum precision, namely 32-bit floating-point operations. PyTorch uses 32-bit floats by default.
Reasons deep learning uses 32-bit precision:
* 64-bit precision is unnecessary and computationally expensive
* GPUs are not optimized for 64-bit precision
**32-bit floating point operations have become the standard for training deep neural networks on GPUs.**
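
To make the precision gap concrete, here is a minimal sketch using only Python's standard library rather than PyTorch. The helper `to_float32` is a hypothetical name introduced for illustration; it rounds a 64-bit Python float to the nearest 32-bit float by packing and unpacking it with `struct`.

```python
import struct


def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


x64 = 0.1               # Python floats are 64-bit IEEE 754 doubles
x32 = to_float32(x64)   # the same value after a 32-bit round trip

print(struct.calcsize("<d"), struct.calcsize("<f"))  # bytes per value: 8 4
print(f"{x64:.20f}")    # 64-bit approximation of 0.1
print(f"{x32:.20f}")    # 32-bit approximation of 0.1
print(abs(x32 - x64))   # rounding error introduced by 32-bit storage
```

The 32-bit value uses half the memory per number at the cost of a rounding error on the order of 1e-9 for this input, which is usually negligible next to the noise in neural network training.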
# Reference
[1] Raschka, Sebastian. “Accelerating Large Language Models with Mixed-Precision Techniques.” _Sebastian Raschka, PhD_, 11 May 2023, https://sebastianraschka.com/blog/2023/llm-mixed-precision-copy.html.

@ -13,7 +13,7 @@ Quantile loss measures the difference between the predicted distribution and the target distribution, and is particularly suit
# What is quantile
[quantile_concept](math/Statistics/basic_concepot/quantile_concept.md)
[quantile_concept](math/statistic/basic_concepot/quantile_concept.md)
# What is a prediction interval

@ -0,0 +1,8 @@
---
title: CS interview MOC
tags:
- MOC
- cs
date: 2024-09-29
---
* [machine learning interview](computer_sci/interview/machine_learning_interview.md)

@ -0,0 +1,12 @@
---
title: Machine Learning Interview
tags:
- machine-learning
- cs
- interview
date: 2024-09-29
---
# Transformer
## Attention Formula

@ -10,11 +10,16 @@ date: 2023-12-03
## Basic Concept
* [quantile_concept](math/Statistics/basic_concepot/quantile_concept.md)
* [quantile_concept](math/statistic/basic_concepot/quantile_concept.md)
## Significance Test
* [Basic about significance test](math/Statistics/significance_test/whats_the_significance_test.md)
* [Basic about significance test](math/statistic/significance_test/whats_the_significance_test.md)
## Anomaly Detection
* [Z-Score](math/statistic/anomaly_detection/z_score.md)
* [IQR](math/statistic/anomaly_detection/IQR.md)
# Discrete mathematics

@ -0,0 +1,54 @@
---
title: Interquartile Range
tags:
- math
- statistics
- anomaly
date: 2024-10-08
---
# What is IQR
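**Interquartile Range** (IQR).
IQR-based anomaly detection is commonly used to find outliers in data that is not normally distributed. It identifies and removes outliers using the quartiles Q1 and Q3, which makes it better suited than the [Z-score](math/statistic/anomaly_detection/z_score.md) method to skewed or non-normal data.
- **First quartile (Q1)**: the lower quartile, the value below which the smallest 25% of the data fall.
- **Third quartile (Q3)**: the upper quartile, the value above which the largest 25% of the data fall.
- **Interquartile range (IQR)**: the difference between Q3 and Q1, computed as: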
$$
IQR = Q3 - Q1
$$
# Algorithm Detail
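1. **Sort the data**
    - Sort the data in ascending order.
2. **Compute the quartiles**
    - **Q1**: the value at the 25% position of the sorted data.
    - **Q3**: the value at the 75% position of the sorted data.
3. **Compute the interquartile range**
    - IQR = Q3 - Q1, the spread of the middle 50% of the data.
4. **Set the lower and upper bounds**
    - Define a **lower bound** and an **upper bound** for deciding what counts as an outlier.
        - **Lower bound** = Q1 - 1.5 × IQR
        - **Upper bound** = Q3 + 1.5 × IQR
    - The factor 1.5 is a common rule of thumb. It can be changed to another multiple, such as 2 or 3, depending on the application.
5. **Detect outliers**
    - Any data point below the lower bound or above the upper bound is considered an outlier.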
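
The steps above can be sketched in Python with the standard library. This is a minimal illustration, not a reference implementation: the function names `iqr_bounds` and `iqr_outliers` and the sample `readings` are hypothetical, and the quartiles use the `inclusive` interpolation method of `statistics.quantiles`, which is one of several quartile conventions (other conventions can give slightly different bounds).

```python
import statistics


def iqr_bounds(data, k=1.5):
    """Return (lower, upper) bounds using Q1 - k*IQR and Q3 + k*IQR."""
    # statistics.quantiles sorts the data internally (step 1) and
    # returns the three cut points Q1, Q2 (median), Q3 for n=4 (step 2).
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1                            # step 3
    return q1 - k * iqr, q3 + k * iqr        # step 4


def iqr_outliers(data, k=1.5):
    """Return the points that fall outside the IQR bounds (step 5)."""
    lower, upper = iqr_bounds(data, k)
    return [x for x in data if x < lower or x > upper]


# Hypothetical sample: nine ordinary readings and one extreme value.
readings = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]
print(iqr_bounds(readings))    # (9.375, 16.375)
print(iqr_outliers(readings))  # [102]
```

Raising `k` to 2 or 3 widens the bounds and flags fewer points.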
# Pros and Cons
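### Pros:
- **No distributional assumption**: the IQR method does not assume the data is normally distributed, so it suits skewed or asymmetric distributions.
- **Robust to extreme values**: unlike the Z-score, the IQR is barely affected by extreme values, because it relies on the median and quartiles rather than the mean and standard deviation.
### Cons:
- **Less efficient on large datasets**: computing the quartiles and the IQR on a large dataset can be time-consuming.
- **Sensitivity near the bounds**: the IQR method identifies extreme outliers well, but it may over-flag borderline points that lie close to the lower or upper bound.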

@ -0,0 +1,30 @@
---
title: Z-score
tags:
- math
- statistics
date: 2024-10-08
---
# What is Z-score
$$
z = \frac{X-\mu}{\sigma}
$$
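* $X$: a single data point
* $\mu$: the population mean
* $\sigma$: the population standard deviation
With this formula, the Z-score expresses the distance between a data point and the mean in units of standard deviations. Specifically:
- A Z-score of 0 means the data point equals the mean.
- A Z-score between -1 and +1 means the data point lies within one standard deviation of the mean.
- A data point whose Z-score is beyond ±3 is usually treated as an outlier.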
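
As a minimal sketch of the formula and the ±3 rule, the following uses Python's standard library. The function names and the sample `readings` are hypothetical. Note that a single extreme value in a very small sample can never reach a Z-score of 3 under the population standard deviation, so the sample here has 20 points.

```python
import statistics


def z_scores(data):
    """Return the Z-score of every point, using the population mean and std."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)   # population standard deviation
    if sigma == 0:
        raise ValueError("all values are identical; the Z-score is undefined")
    return [(x - mu) / sigma for x in data]


def z_score_outliers(data, threshold=3.0):
    """Return the points whose absolute Z-score exceeds the threshold."""
    return [x for x, z in zip(data, z_scores(data)) if abs(z) > threshold]


# Hypothetical sample: nineteen ordinary readings and one extreme value.
readings = [10, 12, 12, 13, 12, 11, 14, 13, 15, 11,
            12, 13, 12, 14, 11, 13, 12, 10, 14, 102]
print(z_score_outliers(readings))  # [102]
```

The extreme value also inflates the mean and the standard deviation that are used to judge it, which is the sensitivity discussed in the next section.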
# Pros and Cons
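The Z-score is a straightforward concept and quick to deploy.
Why is it called the Z-score? **The symbol Z comes from the normal distribution.** In statistics, the standard normal distribution is a special normal distribution with mean 0 and standard deviation 1, and it is usually denoted by the letter **Z**.
For the same reason, the Z-score is typically applied to normally distributed data and depends on that assumption. It is also sensitive to extreme values: the mean and standard deviation are easily distorted by them, which can lead to misjudgments.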

@ -63,7 +63,7 @@ $$
# Deduction
![](math/Statistics/basic_concepot/distribution/attachments/2bbb645362366906ace3296d35612625_720.jpg)
![](math/statistic/basic_concepot/distribution/attachments/2bbb645362366906ace3296d35612625_720.jpg)
# Reference

@ -35,7 +35,7 @@ $$
$$
The proof is as follows:
![](math/Statistics/basic_concepot/distribution/attachments/prove.jpg)
![](math/statistic/basic_concepot/distribution/attachments/prove.jpg)
At integer points, the Gamma function also corresponds to the factorial:
@ -45,7 +45,7 @@ $$
The proof is as follows:
![](math/Statistics/basic_concepot/distribution/attachments/df15541df80b6065fb8296d80ffceac5_720.jpg)
![](math/statistic/basic_concepot/distribution/attachments/df15541df80b6065fb8296d80ffceac5_720.jpg)
@ -53,7 +53,7 @@ $$
The exponential distribution describes the probability of the waiting time between events in a Poisson process.
Here is an explanation of the exponential distribution: [Exponential Distribution](math/Statistics/basic_concepot/distribution/exponential_distribution_and_poisson_distribution.md)
Here is an explanation of the exponential distribution: [Exponential Distribution](math/statistic/basic_concepot/distribution/exponential_distribution_and_poisson_distribution.md)
# Introduction

@ -80,7 +80,7 @@ The magnitude of the T value does not directly affect the reproducibility of the correlation. However, if we
### P-value
![](math/Statistics/significance_test/attachments/Pasted%20image%2020240415174359.png)
![](math/statistic/significance_test/attachments/Pasted%20image%2020240415174359.png)
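The P-value (short for probability value) is an important concept in statistical hypothesis testing. **It helps us decide whether to reject the null hypothesis.** **The P-value measures the probability, assuming the null hypothesis is true, of observing a test statistic (such as a T value or Z value) at least as extreme as the one actually observed.**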
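
As a small illustration of this definition, the sketch below computes a two-sided P-value for a Z statistic, assuming the statistic follows a standard normal distribution under the null hypothesis. The function name `two_sided_p_value` is hypothetical; `NormalDist` comes from Python's standard `statistics` module.

```python
from statistics import NormalDist


def two_sided_p_value(z: float) -> float:
    """P(|Z| >= |z|) for a standard normal Z, i.e. both tails combined."""
    upper_tail = 1.0 - NormalDist().cdf(abs(z))
    return 2.0 * upper_tail


# A Z value of 1.96 sits right at the conventional 5% significance level.
print(round(two_sided_p_value(1.96), 4))  # 0.05
```

A small P-value means the observed statistic would be rare if the null hypothesis were true, which is the usual basis for rejecting it.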

@ -35,7 +35,7 @@ $$
Consider an RPG with a 33% blitz rate, with one extra rule: if the first two attempts do not blitz, the third attempt is guaranteed to blitz. What is the actual blitz rate?
![](math/Statistics/stochastic_process/attachments/6fd1795d98c9031bc791909a8d098e25.jpg)
![](math/statistic/stochastic_process/attachments/6fd1795d98c9031bc791909a8d098e25.jpg)
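
The long-run rate can also be worked out in closed form. The sketch below assumes the miss counter resets after every blitz, so attempts split into independent cycles that each end with exactly one blitz. The function name `effective_rate` and the parameter `pity` (the attempt on which a blitz is guaranteed) are hypothetical names for illustration.

```python
def effective_rate(p: float, pity: int = 3) -> float:
    """Long-run blitz rate when a blitz is forced on the `pity`-th attempt."""
    q = 1.0 - p  # probability that an ordinary attempt does not blitz
    # Expected attempts per cycle: sum over k of P(cycle lasts more than k
    # attempts), which is q**k for k = 0 .. pity-1 and 0 afterwards.
    expected_attempts = sum(q ** k for k in range(pity))
    # Each cycle contains exactly one blitz.
    return 1.0 / expected_attempts


print(round(effective_rate(0.33), 4))  # 0.4719
```

Under this assumption the guarantee lifts the nominal 33% rate to roughly 47%.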
Simulation Code:

@ -70,7 +70,7 @@ For correlation, we usually use **p-value** to **quantify the confidence** of th
![](signal/signal_processing/algorithm/advanced_statistic/autocorrelation/attachments/Pasted%20image%2020240415171855.png)
To understand the P-value, you should first know what a [significance test](math/Statistics/significance_test/whats_the_significance_test.md) is.
To understand the P-value, you should first know what a [significance test](math/statistic/significance_test/whats_the_significance_test.md) is.
## Random Signal