hugo to v4

This commit is contained in:
PinkR1ver 2024-02-27 19:32:43 +08:00
parent 5e7c5da7d4
commit 8858d3cfb1
565 changed files with 9313 additions and 0 deletions

Binary file not shown. (added; 70 MiB)

Binary file not shown. (added; 55 MiB)

Binary file not shown. (added; 35 MiB)

Binary file not shown. (added; 24 KiB)

Binary file not shown. (added; 9.9 KiB)

Binary file not shown. (added; 12 KiB)

Binary file not shown. (added; 190 KiB)

Binary file not shown. (added; 36 KiB)

Binary file not shown. (added; 112 KiB)

Binary file not shown. (added; 536 KiB)

Binary file not shown. (added; 490 KiB)

View File

@ -0,0 +1,97 @@
---
title: Optical Aberration
tags:
- optical
- photography
- basic
---
# What is optical aberration
Optical aberration refers to defects in lens design that cause light rays to spread out rather than converge into a sharp image. The effect can range from every ray in the image being defocused to only certain points or edges losing focus. Several types of optical aberration can appear when forming an image. Building an ideal vision system corrected for every possible aberration would dramatically increase the cost of the lens. In practice, some form of aberration is always present in a lens, but minimizing its effects is essential, so manufacturing any lens usually involves some compromise.
# Circle of confusion
To explain how aberrations blur an image, we first need to answer: what is the circle of confusion? When a point of light from a target reaches the lens and then converges exactly on the sensor, it is sharp. Otherwise, if it converges in front of or behind the sensor, the light is distributed over a wider area of the sensor. This can be seen in Figure 1, where a point source converges on the sensor, and as the sensor position changes, so does the amount of light spread along the sensor.
![](Physics/Optical/attachments/Fig_1_Circles_of_confusion.gif)
The more spread out the light, the less in focus the image. Unless the aperture is very small, targets at very different distances from each other will usually leave the background or foreground out of focus, because light converging from the foreground meets at a different point than light coming from more distant targets in the background.
# Types of Optical Aberration
## Coma (彗差)
Coma, also called comatic aberration, is named after the comet-like tail of the light distribution it produces.
![](Physics/Optical/attachments/Pasted%20image%2020230424110844.png)
It is a defect inherent to some lenses, or caused by the optical design, that distorts off-axis point sources such as stars. Specifically, coma is defined as a variation in magnification across the entrance pupil. In refractive or diffractive optical systems, and especially in images covering a wide spectral range, coma is a function of wavelength.
## Astigmatism (像散)
Astigmatism arises when rays propagating in two perpendicular planes come to focus at different points.
This can be seen in Figure 3, where the two focal points are represented by the red horizontal plane and the blue vertical plane. The point of best sharpness in the image lies between these two points, where the circle of confusion of either plane is not too wide.
![](Physics/Optical/attachments/Pasted%20image%2020230424111226.png)
When the optics are misaligned, astigmatism distorts the sides and edges of the image. It is usually described as a lack of sharpness when looking at lines in the image.
This form of aberration can be corrected with the appropriate lens design found in most quality optics. The first optical designs correcting astigmatism were done by Carl Zeiss and have been refined for over a hundred years. At this point, it usually shows up only in very low-quality lenses, or in cases where the internal optical elements are damaged or have shifted after the lens was dropped.
## (Petzval) Field Curvature (场曲)
Many lenses have a curved plane of focus. This produces soft corners while keeping mainly the center of the image in focus. Most lenses retain some curvature of focus, so the whole image cannot be brought into focus without some cropping.
Field curvature is the result of the image plane becoming non-flat due to multiple focal points.
![](Physics/Optical/attachments/Pasted%20image%2020230424112159.png)
Camera lenses have largely corrected for this, but some field curvature can still be found on many lenses. Some sensor manufacturers are actually working on curved sensors that compensate for the curved focal region. Such a design lets the sensor correct the aberration without requiring expensive lens designs manufactured to that precision, so cheaper lenses can produce high-quality results. A real example of this can be seen in the Kepler space observatory, where a curved sensor array corrects for the telescope's large spherical optics.
## Distortion (畸变)
Distortion is an aberration in which different parts of an object are imaged at different magnifications as the object passes through the lens system. It degrades the geometric similarity between object and image, but does not affect sharpness. Based on the difference in magnification between the periphery and the center of the object, distortion falls into two classes:
### Barrel distortion (桶形畸变)
In an image with barrel distortion, the edges and sides curve away from the center. Visually it looks as if the image bulges, because it captures the appearance of a curved field of view (FoV). For example, using a short-focal-length lens (also called a wide-angle lens) from high up on a tall building captures a wider FoV. As shown in Figure 5, the effect is most exaggerated with a fisheye lens, which produces a highly distorted, very wide FoV. In this image, grid lines help illustrate how the distortion stretches the image outward toward the sides and edges.
![](Physics/Optical/attachments/Pasted%20image%2020230424113453.png)
### Pincushion distortion (枕型畸变)
With pincushion distortion, light bends toward the optical axis and the image appears stretched inward: the edges and sides appear to curve toward the center of the image.
This form of aberration is most common in telephoto lenses with long focal lengths.
![](Physics/Optical/attachments/Pasted%20image%2020230424113838.png)
### Mustache distortion
**Mustache distortion** 😂 is a combination of pincushion and barrel distortion: the inner part of the image bends outward while the outer part bends inward. It is a fairly rare aberration in which more than one distortion pattern affects the image. Mustache distortion is usually the sign of a very poorly designed lens, as it is the culmination of optical errors that blend aberrations together.
## Chromatic aberration (色差)
### Longitudinal / axial aberration
Each color of light corresponds to a particular wavelength. Because of refraction, a color image has multiple wavelengths entering the lens and focusing at different points. Longitudinal (axial) chromatic aberration is caused by different wavelengths focusing at different points along the optical axis. The shorter the wavelength, the closer its focal point is to the lens, and vice versa, as shown in Figure 8. With a smaller aperture, the incoming light may still focus at different points, but the width (diameter) of the circle of confusion becomes much smaller, producing less drastic blur.
![](Physics/Optical/attachments/Fig_8_Chromatic_abberation_animation.gif)
### Transverse / lateral aberration
Off-axis light that spreads different wavelengths across the image plane causes transverse (lateral) chromatic aberration. This produces color fringing on the edges of subjects in the image, and it is harder to correct than longitudinal chromatic aberration.
![](Physics/Optical/attachments/Fig_9_Chromatic_aberration_lateral.gif)
It can be fixed with an achromatic doublet, which introduces elements of different refractive indices. By bringing the two ends of the visible spectrum to a single focal point, color fringing can be eliminated. For both lateral and longitudinal chromatic aberration, reducing the aperture size also helps. In addition, it can be beneficial not to image targets in high-contrast settings (i.e. images with a very bright background). In microscopy, a lens may use an apochromatic (APO) lens instead of an achromatic one; it uses three lens elements to correct all wavelengths of the incoming light. When color matters most, making sure chromatic aberration is mitigated will give the best results.
# Reference
* [Six Optical Aberrations That Could Be Impacting Your Vision System, lumenera.com](https://www.lumenera.com/blog/six-optical-aberrations-that-could-be-impacting-your-vision-system)
* [光学像差重要知识点详解|光学经典理论, 知乎 - 监控李誉](https://zhuanlan.zhihu.com/p/40149006)

View File

@ -0,0 +1,10 @@
---
title: Physics MOC
tags:
- physics
- MOC
---
# Electromagnetism
* [Electromagnetism MOC](Physics/Electromagnetism/Electromagnetism_MOC.md)

View File

@ -0,0 +1,47 @@
---
title: Doppler Effect
tags:
- physics
- basic
- wave
---
The **Doppler effect** is the phenomenon whereby, when a wave source and an observer are in relative motion, the frequency the observer receives differs from the frequency the source emits.
The whistle of a train rushing toward us sounds sharper (higher frequency, shorter wavelength), while the whistle of a train moving away sounds deeper (lower frequency, longer wavelength); this is the Doppler effect at work. The same phenomenon occurs with car horns and train bells.
# General
In classical physics, when the speeds of the source and the receiver are much smaller than the speed of the wave in the medium, the observed frequency $f$ and the emitted frequency $f_0$ are related by:
$$
f = (\frac{c \pm v_r}{c \pm v_s})f_0
$$
* $c$ is the speed of the wave in the medium
* $v_r$ is the speed of the receiver relative to the medium; the numerator takes the plus sign if the receiver moves toward the source, and the minus sign otherwise
* $v_s$ is the speed of the source relative to the medium; the denominator takes the plus sign if the source moves away from the receiver, and the minus sign otherwise
> [!note]
> Note that this relationship predicts that the frequency will decrease if either the source or the receiver moves away from the other.
$$
\frac{f}{v_{wr}} = \frac{f_0}{v_{ws}} = \frac{1}{\lambda}
$$
* $v_{wr}$ is the wave speed relative to the receiver
* $v_{ws}$ is the wave speed relative to the source
* $\lambda$ is the wavelength
## Example
![](Physics/Wave/attachments/Dopplereffectsourcemovingrightatmach0.7.gif)
Here $v_s = 0.7c$: the wavefronts bunch up on the right side of (in front of) the source and spread further apart on the left side (behind it).
A receiver in front will hear a higher frequency, $f = \frac{c}{c-0.7c}f_0 = 3.33f_0$; a receiver behind will hear a lower frequency, $f = \frac{c}{c + 0.7c}f_0 = 0.59f_0$.
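As a quick numerical check of the general formula (a minimal sketch; the sign handling below simply follows the conventions listed above):
```python
def doppler_frequency(f0, c, v_r=0.0, v_s=0.0):
    """Observed frequency for emitted frequency f0.

    c   : wave speed in the medium
    v_r : receiver speed relative to the medium (positive toward the source)
    v_s : source speed relative to the medium (positive away from the receiver)
    """
    return (c + v_r) / (c + v_s) * f0

c, f0 = 343.0, 440.0  # speed of sound in air (m/s), emitted frequency (Hz)

# Source moving toward a stationary receiver at 0.7c, as in the example:
print(doppler_frequency(f0, c, v_s=-0.7 * c) / f0)  # ~3.33
# Source moving away from the receiver at 0.7c:
print(doppler_frequency(f0, c, v_s=+0.7 * c) / f0)  # ~0.59
```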
# Reference
* [多普勒效应 - Wiki](https://zh.wikipedia.org/wiki/%E5%A4%9A%E6%99%AE%E5%8B%92%E6%95%88%E5%BA%94)

Binary file not shown. (added; 568 KiB)

Binary file not shown. (added; 3.1 KiB)

View File

@ -0,0 +1,50 @@
Testing the antenna's ranging capability
# 背景
![](Report/attachments/96251ac46494ab01294e570e352c426.png)
# Test results
## Infinite-distance measurement
With no reflector within 30 cm in front of the antenna (beyond this radar's ranging limit, approximated as no reflection out to infinite distance), the voltage at the collecting end was:
![](Report/attachments/7983094eb03d1dcc285edf9c1768018.png)
Data collected by the previous antenna:
![](Report/attachments/f5d557933b15f8ea7f6861f70663d13.png)
The problems are twofold:
* The current antenna is not stable enough
* The core signal peak has dropped to about 1.7 V, versus 2.2 V for the previous core signal
## Real-time ranging experiment
*In the real-time ranging experiment, the signal is measured at the antenna end in real time while a metal baffle is placed in front of it on a schedule, to test the antenna's ranging ability.*
The approximate placement schedule was:
1. 0-25 s: no metal baffle
2. 25-50 s: baffle pressed against the antenna
3. 50-75 s: no metal baffle
4. 75-100 s: baffle at 10 cm
5. 100-125 s: no metal baffle
6. 125-150 s: baffle at 20 cm
7. 150-175 s: baffle at 30 cm
8. 175-200 s: no metal baffle
Data collected by the new antenna:
![](Report/attachments/abaec3368e16f2c9be67b5edbba39be.png)
Signal collected by the old antenna:
![](Report/attachments/ac4c5aa53392835d3db04a78e73476b.png)
The problems:
* The new antenna's signal is unstable, consistent with the infinite-distance test result.
* This destroys the ability to distinguish signals at different distances.

Binary file not shown. (added; 28 KiB)

Binary file not shown. (added; 27 KiB)

Binary file not shown. (added; 27 KiB)

Binary file not shown. (added; 76 KiB)

Binary file not shown. (added; 28 KiB)

Binary file not shown. (added; 132 KiB)

Binary file not shown. (added; 25 KiB)

content/_index.md Normal file (25 lines)

File diff suppressed because one or more lines are too long

View File

content/atlas.md Normal file (62 lines)
View File

@ -0,0 +1,62 @@
---
title: Atlas - Map of Maps
tags:
- MOC
---
🚧 There are notebooks about his research career:
* [Deep Learning & Machine Learning](computer_sci/deep_learning_and_machine_learning/Deep%20_Learning_MOC.md)
* [[synthetic_aperture_radar_imaging/SAR_MOC| Synthetic Aperture Radar(SAR) Imaging]]
💻 Also, his research needs some basic science to support it:
* [Data Structure and Algorithm MOC](computer_sci/data_structure_and_algorithm/MOC.md)
* [Hardware](computer_sci/Hardware/Hardware_MOC.md)
* [Physics](Physics/Physics_MOC.md)
* [Signal Processing](signal_processing/signal_processing_MOC.md)
* [Data Science](data_sci/data_sci_MOC.md)
* [About coding language design detail](computer_sci/coding_knowledge/coding_lang_MOC.md)
* [Math](Math/MOC.md)
* [Computational Geometry](computer_sci/computational_geometry/MOC.md)
* [Code Framework Learn](computer_sci/code_frame_learn/MOC.md)
🦺 I also need some tools to help me:
* [Git](toolkit/git/git_MOC.md)
💻 Code Practice:
* [💽Programming Problem Solution Record](https://github.com/PinkR1ver/JudeW-Problemset)
🛶 Also, he learns some things related to his hobbies:
* [📷 Photography](Photography/Photography_MOC.md)
* [📮Literature (文学)](文学/文学_MOC.md)
* [🥐Food](food/MOC.md)
* [🎬Watching List](https://pinkr1ver.notion.site/5e136466f3664ff1aaaa75b85446e5b4?v=a41efbce52a84f7aa89d8f649f4620f6&pvs=4)
⭐ Here to find my recent study:
* [Recent notes (this feature is not available on the web)](recent.md)
* [Papers Recently Read](research_career/papers_read.md)
🎏 I also have some plans in mind:
* [Life List🚀](plan/life.md)
☁️ I also have some daily thoughts:
* [Logs](log/log_MOC.md)

View File

@ -0,0 +1,22 @@
---
title: Deep Learning - MOC
tags:
- MOC
- deep-learning
---
# Tech Explanation
* [⭐Deep Learning MOC](computer_sci/deep_learning_and_machine_learning/deep_learning/deep_learning_MOC.md)
* [✨Machine Learning MOC](computer_sci/deep_learning_and_machine_learning/machine_learning/MOC.md)
* [LLM - MOC](computer_sci/deep_learning_and_machine_learning/LLM/LLM_MOC.md)
# Deep-learning Research
* [Model Interpretability](computer_sci/deep_learning_and_machine_learning/Model_interpretability/Model_Interpretability_MOC.md)
* [Famous Model - MOC](computer_sci/deep_learning_and_machine_learning/Famous_Model/Famous_Model_MOC.md)
* [Model Evaluation - MOC](computer_sci/deep_learning_and_machine_learning/Evaluation/model_evaluation_MOC.md)

Binary file not shown. (added; 119 KiB)

Binary file not shown. (added; 119 KiB)

Binary file not shown. (added; 77 KiB)

Binary file not shown. (added; 66 KiB)

Binary file not shown. (added; 68 KiB)

Binary file not shown. (added; 55 KiB)

Binary file not shown. (added; 55 KiB)

Binary file not shown. (added; 52 KiB)

Binary file not shown. (added; 25 KiB)

Binary file not shown. (added; 43 KiB)

View File

@ -0,0 +1,8 @@
---
title: Model Evaluation - MOC
tags:
- deep-learning
- evaluation
---
* [Model Evaluation in Time Series Forecasting](computer_sci/deep_learning_and_machine_learning/Evaluation/time_series_forecasting.md)

View File

@ -0,0 +1,121 @@
---
title: Model Evaluation in Time Series Forecasting
tags:
- deep-learning
- evaluation
- time-series-dealing
---
![](computer_sci/deep_learning_and_machine_learning/Evaluation/attachments/Pasted%20image%2020230526162839.png)
# Some famous time series scoring techniques
1. **MAE, RMSE and AIC**
2. **Mean Forecast Accuracy**
3. **Warning: The time series model EVALUATION TRAP!**
4. **RdR Score Benchmark**
## MAE, RMSE, AIC
MAE stands for **Mean Absolute Error** and RMSE stands for **Root Mean Squared Error**.
These are two well-known metrics for measuring the accuracy of continuous variables. MAE has long been the common choice; a 2016 comparison observed that RMSE (and other R-squared variants) were increasingly being used.
*We need to understand when each metric is the better choice.*
### MAE
$$
\text{MAE} = \frac{1}{n}\sum_{j=1}^n |y_j - \hat{y}_j|
$$
The defining property of MAE is that all individual differences carry equal weight.
If the absolute value is removed, MAE becomes the **Mean Bias Error (MBE)**; when using MBE, beware that positive and negative biases cancel each other out.
### RMSE
$$
\text{RMSE} = \sqrt{\frac{1}{n} \sum_{j=1}^n (y_j - \hat{y}_j)^2}
$$
Root Mean Squared Error (RMSE) is a quadratic scoring rule that also measures the average magnitude of error: it is the square root of the average squared difference between predictions and actual observations.
### AIC
$$
\text{AIC} = 2k - 2\ln{(\hat{L})}
$$
$k$ is the number of estimated model parameters, and $\hat{L}$ is the maximized value of the model's likelihood function.
The **Akaike information criterion** (AIC) is a metric that helps compare models, since it accounts both for how well a model fits the data and for the model's complexity.
AIC measures information loss and **penalizes model complexity**: it is the *negative log-likelihood penalized by the number of parameters*. The key idea of AIC is that fewer model parameters are better. **AIC lets you test how well a model fits the dataset without overfitting it.**
### Comparison
#### Similarities between MAE and RMSE
Both MAE and RMSE express the average model prediction error in the units of the variable of interest. Both metrics range from 0 to ∞ and are indifferent to the direction of the error. They are negatively oriented scores: lower values are better.
#### Differences between MAE and RMSE
*Because the errors are squared before being averaged, RMSE gives relatively high weight to large errors.* This means RMSE should be more useful when large errors are particularly undesirable, whereas in MAE's average those large errors are diluted.
![](computer_sci/deep_learning_and_machine_learning/Evaluation/attachments/Pasted%20image%2020230526161422.png)
For AIC, lower is better, but there is no perfect score; it can only compare the performance of different models on the same dataset.
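A minimal NumPy sketch of the two error metrics (the arrays are made up for illustration):
```python
import numpy as np

y_true = np.array([25.0, 30.0, 28.0, 33.0])
y_pred = np.array([24.0, 35.0, 27.0, 250.0])  # one large outlier error

mae = np.mean(np.abs(y_true - y_pred))
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
print(mae, rmse)  # RMSE >> MAE because the outlier error is squared
```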
## Mean Forecast Accuracy
![](computer_sci/deep_learning_and_machine_learning/Evaluation/attachments/Pasted%20image%2020230526162035.png)
Compute the forecast accuracy at each point, then take the average to get the Mean Forecast Accuracy.
The major flaw of Mean Forecast Accuracy is that large deviations have a huge negative impact, e.g. $1 - \frac{|\hat{y}_j - y_j|}{y_j} = 1 - \frac{250-25}{25} = -800\%$.
The fix is to clamp the per-point forecast accuracy at a minimum of 0%; you can also use the median instead of the mean.
In general, **when your error distribution is skewed, you should use the median rather than the mean**. In some cases Mean Forecast Accuracy can be meaningless altogether. If you remember your statistics, the **coefficient of variation** (CV) is the ratio of the standard deviation to the mean ($\text{CV} = \text{Standard Deviation}/\text{Mean} \times 100$). A large CV value means large variability, which also means greater dispersion around the mean. **For example, anything with a CV above 0.7 can be treated as highly variable and not truly predictable; it also tells you that your forecasting model's predictive power is unstable!**
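A minimal sketch of the clamped version (hypothetical arrays; `clamp` implements the 0% floor mentioned above):
```python
import numpy as np

def mean_forecast_accuracy(y_true, y_pred, clamp=True):
    """Average per-point forecast accuracy, optionally floored at 0%."""
    acc = 1 - np.abs(y_pred - y_true) / y_true
    if clamp:
        acc = np.maximum(acc, 0.0)  # large misses count as 0%, not -800%
    return acc.mean()

y_true = np.array([25.0, 30.0, 28.0])
y_pred = np.array([250.0, 29.0, 27.0])  # one huge miss
print(mean_forecast_accuracy(y_true, y_pred, clamp=False))  # dominated by the miss
print(mean_forecast_accuracy(y_true, y_pred, clamp=True))
```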
## RdR Score Benchmark (an experimental metric; the blogger notes it has not appeared in research papers)
RdR metric stands for:
* *R*: **Naïve Random Walk**
* *d*: **Dynamic Time Warping**
* *R*: **Root Mean Squared Error**
### DTW to deal with shape similarity
![](computer_sci/deep_learning_and_machine_learning/Evaluation/attachments/Pasted%20image%2020230526163614.png)
Metrics like RMSE and MAE ignore one important criterion: **shape similarity**.
The RdR Score Benchmark uses [**Dynamic Time Warping (DTW)**](computer_sci/deep_learning_and_machine_learning/Trick/DTW.md) as its shape-similarity metric.
![](computer_sci/deep_learning_and_machine_learning/Evaluation/attachments/Pasted%20image%2020230526164106.png)
Euclidean distance can be a poor choice between time series because of warping along the time axis.
* DTW finds the optimal (minimum-distance) warping path between two time series by "synchronizing"/"aligning" the signals along the time axis
### What the RdR score means
![](computer_sci/deep_learning_and_machine_learning/Evaluation/attachments/Pasted%20image%2020230529130501.png)
![](computer_sci/deep_learning_and_machine_learning/Evaluation/attachments/Pasted%20image%2020230529130509.png)
The *RdR score* is computed from RMSE and the DTW distance, and measures how much better your model is than a naïve random walk (*the random walk's RdR score is 0*).
### RdR calculation details
The RdR score can be computed by plotting RMSE vs. DTW; the plot looks like this:
![](computer_sci/deep_learning_and_machine_learning/Evaluation/attachments/Pasted%20image%2020230529130856.png)
The RdR score is computed from the area of the rectangle (the article does not fully describe the calculation; it is in the [github code](https://github.com/CoteDave/blog/tree/master/RdR%20score), so this is not certain).
# Reference
* Cote, Dave, M.Sc. “RdR Score Metric for Evaluating Time Series Forecasting Models.” _Medium_, 8 Feb. 2022, https://medium.com/@dave.cote.msc/rdr-score-metric-for-evaluating-time-series-forecasting-models-1c23f92f80e7.
* JJ. “MAE and RMSE — Which Metric Is Better?” _Human in a Machine World_, 23 Mar. 2016, https://medium.com/human-in-a-machine-world/mae-and-rmse-which-metric-is-better-e60ac3bde13d.
* _Accelerating Dynamic Time Warping Subsequence Search with GPU_. https://www.slideshare.net/DavideNardone/accelerating-dynamic-time-warping-subsequence-search-with-gpu. Accessed 29 May 2023.

View File

@ -0,0 +1,77 @@
---
title: DeepAR - Time Series Forecasting
tags:
- deep-learning
- model
- time-series-dealing
---
DeepAR, an autoregressive recurrent network developed by Amazon, was the first model that could natively work on multiple time series. It is a milestone in the time-series community.
# What is DeepAR
> [!quote]
> DeepAR is the first successful model to combine Deep Learning with traditional Probabilistic Forecasting.
* **Multiple time-series support**
* **Extra covariates**: *DeepAR* allows extra features (covariates). This was very important for me when learning *DeepAR*, because in my task I have a corresponding feature for each time series.
* **Probabilistic output**:  Instead of making a single prediction, the model leverages [**quantile loss**](computer_sci/deep_learning_and_machine_learning/Trick/quantile_loss.md) to output prediction intervals.
* **“Cold” forecasting:** By learning from thousands of time-series that potentially share a few similarities, _DeepAR_ can provide forecasts for time-series that have little or no history at all.
# Block used in DeepAR
* [LSTM](computer_sci/deep_learning_and_machine_learning/deep_learning/LSTM.md)
# *DeepAR* Architecture
The DeepAR model does not use the LSTMs to compute the prediction directly; instead, it estimates the parameters of a Gaussian likelihood function, $\theta=(\mu,\sigma)$, i.e. the mean and standard deviation of that Gaussian likelihood.
## Training Step-by-Step
![](computer_sci/deep_learning_and_machine_learning/Famous_Model/attachments/Pasted%20image%2020230523134255.png)
Suppose we are at time $t$ of time series $i$:
1. The LSTM cell takes as input the covariates $x_{i,t}$ (the value of $x_i$ at time $t$) and the previous target value $z_{i,t-1}$; the LSTM also takes the previous hidden state $h_{i,t-1}$
2. The LSTM then outputs the current hidden state $h_{i,t}$, which feeds into the next step
3. The parameters $\mu$ and $\sigma$ of the Gaussian likelihood function are computed from $h_{i,t}$, though not directly; the details come later
> [!quote]
> In other words, the model's goal is to find the best $\mu$ and $\sigma$ for building a Gaussian distribution that brings the prediction closer to $z_{i,t}$. And because *DeepAR* trains on and predicts a single data point at a time, it is also called an autoregressive model
## Inference Step-by-Step
![](computer_sci/deep_learning_and_machine_learning/Famous_Model/attachments/Pasted%20image%2020230523141219.png)
When using the model for prediction, the only change is that the predicted value $\hat{z}$ replaces the ground truth $z$; $\hat{z}_{i,t}$ is sampled from the Gaussian distribution our model has learned. But the parameters $\mu$ and $\sigma$ of that Gaussian are not learned by the model directly. How does *DeepAR* manage that?
# Gaussian Likelihood
$$
\ell_G(z|\mu,\sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(z-\mu)^2}{2\sigma^2}\right)
$$
The task of estimating a Gaussian distribution is usually turned into the task of maximizing the Gaussian log-likelihood function, i.e. **MLE** (maximum likelihood estimation)
**Gaussian log-likelihood function**:
$$
\mathcal{L} = \sum_{i=1}^{N}\sum_{t=t_o}^{T} \log{\ell(z_{i,t}|\theta(h_{i,t}))}
$$
# Parameter estimation in *DeepAR*
In statistics, a Gaussian distribution is usually estimated with closed-form MLE formulas, but *DeepAR* does not do this; instead it uses two dense layers for the estimation, as in the figure below:
![](computer_sci/deep_learning_and_machine_learning/Famous_Model/attachments/Pasted%20image%2020230523151201.png)
Dense layers are used to estimate the Gaussian distribution because this lets the parameters be learned by backpropagation.
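A minimal PyTorch sketch of this idea (my own illustration, not Amazon's implementation; using softplus to keep $\sigma$ positive is one common choice):
```python
import torch
import torch.nn as nn

class GaussianHead(nn.Module):
    """Two dense layers mapping the LSTM hidden state h_{i,t} to (mu, sigma)."""
    def __init__(self, hidden_size):
        super().__init__()
        self.mu = nn.Linear(hidden_size, 1)
        self.sigma = nn.Linear(hidden_size, 1)

    def forward(self, h):
        mu = self.mu(h)
        sigma = nn.functional.softplus(self.sigma(h))  # enforce sigma > 0
        return mu, sigma

def gaussian_nll(z, mu, sigma):
    """Negative Gaussian log-likelihood; minimizing it maximizes the likelihood."""
    return (0.5 * torch.log(2 * torch.pi * sigma ** 2)
            + (z - mu) ** 2 / (2 * sigma ** 2)).mean()

head = GaussianHead(hidden_size=40)
h = torch.randn(8, 40)   # fake LSTM hidden states for a batch
z = torch.randn(8, 1)    # fake targets
mu, sigma = head(h)
loss = gaussian_nll(z, mu, sigma)
loss.backward()          # gradients flow back through the dense layers
```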
# Reference
* [https://towardsdatascience.com/deepar-mastering-time-series-forecasting-with-deep-learning-bc717771ce85](https://towardsdatascience.com/deepar-mastering-time-series-forecasting-with-deep-learning-bc717771ce85)

View File

@ -0,0 +1,11 @@
---
title: Famous Model MOC
tags:
- deep-learning
- MOC
---
# Time-series
* [DeepAR](computer_sci/deep_learning_and_machine_learning/Famous_Model/DeepAR.md)

View File

@ -0,0 +1,8 @@
---
title: Temporal Fusion Transformer
tags:
- deep-learning
- model
- time-series-dealing
---

Binary file not shown. (added; 44 KiB)

Binary file not shown. (added; 44 KiB)

Binary file not shown. (added; 55 KiB)

Binary file not shown. (added; 65 KiB)

View File

@ -0,0 +1,25 @@
---
title: Large Language Model(LLM) - MOC
tags:
- deep-learning
- LLM
- NLP
---
# Training
* [Training Tech Outline](computer_sci/deep_learning_and_machine_learning/LLM/train/steps.md)
* [⭐⭐⭐Train LLM from scratch](computer_sci/deep_learning_and_machine_learning/LLM/train/train_LLM.md)
* [⭐⭐⭐Detailed explanation of RLHF technology](computer_sci/deep_learning_and_machine_learning/LLM/train/RLHF.md)
* [How to use fine-tuning to create your chatbot](computer_sci/deep_learning_and_machine_learning/LLM/train/finr_tune/how_to_fine_tune.md)
* [Learn finetune by Stanford Alpaca](computer_sci/deep_learning_and_machine_learning/LLM/train/finr_tune/learn_finetune_byStanfordAlpaca.md)
# Metrics
How to evaluate an LLM's performance?
* [Tasks to evaluate BERT - maybe applicable to other LMs](computer_sci/deep_learning_and_machine_learning/LLM/metircs/some_task.md)
# Basic
* [LLM Hyperparameter](computer_sci/deep_learning_and_machine_learning/LLM/basic/llm_hyperparameter.md)

Binary file not shown. (added; 216 KiB)

Binary file not shown. (added; 216 KiB)

Binary file not shown. (added; 173 KiB)

Binary file not shown. (added; 444 KiB)

Binary file not shown. (added; 28 KiB)

Binary file not shown. (added; 6.5 MiB)

Binary file not shown. (added; 1.8 MiB)

View File

@ -0,0 +1,56 @@
---
title: LLM hyperparameter
tags:
- hyperparameter
- LLM
- deep-learning
- basic
---
# LLM Temperature
The definition of temperature comes from its physical meaning: the higher the temperature, the faster the atoms move, and the more randomness there is.
![](computer_sci/deep_learning_and_machine_learning/LLM/basic/attachments/physic_temp.gif)
LLM temperature is a hyperparameter that regulates **the randomness, or creativity,** of the output.
* The higher the LLM temperature, the more diverse and creative the output, with an increasing likelihood of straying from the context.
* The lower the LLM temperature, the more focused and deterministic the output, sticking closely to the most likely prediction.
![](computer_sci/deep_learning_and_machine_learning/LLM/basic/attachments/Pasted%20image%2020230627160125.png)
## More detail
The LLM's job is to give a probability for the next word, like this:
![](computer_sci/deep_learning_and_machine_learning/LLM/basic/attachments/Pasted%20image%2020230627162848.png)
"A cat is chasing a …", there are lots of words can be filled in that blank. Different words have different probabilities, in the model, we output the next word ratings.
Sure, we can always pick the highest rating word, but that would result in very standard predictable boring sentences, and the model wouldn't be equivalent to human language, because we don't always use the most common word either.
So, we want to design a mechanism that **allows all words with a decent rating to occur with a reasonable probability**, that's why we need temperature in LLM model.
Like real physic world, we can do samples to describe the distribution, *we use SoftMax to describe the distribution of the probability of the next word*. The temperature is the element $T$ in the formula:
$$
p_i = \frac{\exp{(\frac{R_i}{T})}}{\sum_i \exp{(\frac{R_i}{T})}}
$$
![](computer_sci/deep_learning_and_machine_learning/LLM/basic/attachments/Pasted%20image%2020230627163514.png)
The lower the $T$, the closer the highest-rated word's probability goes to 100%; the higher the $T$, the smoother the probabilities become across all words.
*The gif below is important and intuitive.*
![](computer_sci/deep_learning_and_machine_learning/LLM/basic/attachments/rating_probabililty.gif)
So with a different $T$, the next-word probabilities change, and we output the next word according to that probability distribution.
![](computer_sci/deep_learning_and_machine_learning/LLM/basic/attachments/Pasted%20image%2020230627165311.png)
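A minimal NumPy sketch of this temperature softmax (the ratings below are made up for illustration):
```python
import numpy as np

def next_word_probs(ratings, T):
    """Softmax over next-word ratings at temperature T."""
    logits = np.asarray(ratings, dtype=float) / T
    logits -= logits.max()          # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

ratings = [5.0, 3.0, 2.0, 0.5]      # e.g. "mouse", "ball", "laser", "car"
for T in (0.2, 1.0, 2.0):
    print(T, next_word_probs(ratings, T).round(3))
# Low T piles nearly all probability on the top-rated word;
# high T flattens the distribution across all words.
```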
# Reference
* [LLM Temperature, dedpchecks](https://deepchecks.com/glossary/llm-parameters/#:~:text=One%20intriguing%20parameter%20within%20LLMs,of%20straying%20from%20the%20context.)
* [⭐⭐⭐https://www.youtube.com/watch?v=YjVuJjmgclU](https://www.youtube.com/watch?v=YjVuJjmgclU)

Binary file not shown. (added; 272 KiB)

View File

@ -0,0 +1,44 @@
---
title: LangChain Explained
tags:
- LLM
- basic
- langchain
---
# What is LangChain
LangChain is an open source framework that allows AI developers to combine LLMs like GPT-4 *with external sources of computation and data*.
# Why LangChain
LangChain can make an LLM answer questions based on your own documents. It can help you build lots of amazing apps.
You can use LangChain to have GPT analyze your own company data, book flights around your schedule, summarize abstracts across bunches of PDFs, and so on.
# LangChain value propositions
## Components
* LLM Wrappers
* Prompt Templates
* Indexes for relevant information retrieval
## Chains
Assemble components to solve a specific task, such as finding information in a book.
## Agents
Agents allow LLMs to interact with their environment, for instance making an API request to perform a specific action.
# LangChain Framework
![](computer_sci/deep_learning_and_machine_learning/LLM/langchain/attachments/Pasted%20image%2020230627154149.png)
# Reference
* [https://www.youtube.com/watch?v=aywZrzNaKjs](https://www.youtube.com/watch?v=aywZrzNaKjs)

Binary file not shown. (added; 88 KiB)

Binary file not shown. (added; 88 KiB)

View File

@ -0,0 +1,36 @@
---
title: Tasks to evaluate BERT - maybe applicable to other LMs
tags:
- LLM
- metrics
- deep-learning
- benchmark
---
# Overview
![](computer_sci/deep_learning_and_machine_learning/LLM/metircs/attachments/Pasted%20image%2020230629140929.png)
# MNLI-m (Multi-Genre Natural Language Inference - Matched):
MNLI-m is a benchmark dataset and task for natural language inference (NLI). The goal of NLI is to determine the logical relationship between two given sentences: whether the relationship is "entailment," "contradiction," or "neutral." MNLI-m focuses on matched data, which means the sentences are drawn from the same genres as the sentences in the training set. It is part of the GLUE (General Language Understanding Evaluation) benchmark, which evaluates the performance of models on various natural language understanding tasks.
# QNLI (Question Natural Language Inference):
QNLI is another NLI task included in the GLUE benchmark. In this task, the model is given a sentence that is a premise and a sentence that is a question related to the premise. The goal is to determine whether the answer to the question can be inferred from the given premise. The dataset for QNLI is derived from the Stanford Question Answering Dataset (SQuAD).
# MRPC (Microsoft Research Paraphrase Corpus):
MRPC is a dataset used for paraphrase identification or semantic equivalence detection. It consists of sentence pairs from various sources that are labeled as either paraphrases or not. The task is to classify whether a given sentence pair expresses the same meaning (paraphrase) or not. MRPC is also part of the GLUE benchmark and helps evaluate models' ability to understand sentence similarity and equivalence.
# SST-2 (Stanford Sentiment Treebank - Binary Sentiment Classification):
SST-2 is a binary sentiment classification task based on the Stanford Sentiment Treebank dataset. The dataset contains sentences from movie reviews labeled as either positive or negative sentiment. The task is to classify a given sentence as expressing a positive or negative sentiment. SST-2 is often used to evaluate the ability of models to understand and classify sentiment in natural language.
# SQuAD (Stanford Question Answering Dataset):
SQuAD is a widely known dataset and task for machine reading comprehension. It consists of questions posed by humans on a set of Wikipedia articles, where the answers to the questions are spans of text from the corresponding articles. The goal is to build models that can accurately answer the questions based on the provided context. SQuAD has been instrumental in advancing the field of question answering and evaluating models' reading comprehension capabilities.
Overall, these tasks and datasets serve as benchmarks for evaluating natural language understanding and processing models. They cover a range of language understanding tasks, including natural language inference, paraphrase identification, sentiment analysis, and machine reading comprehension.

View File

@ -0,0 +1,65 @@
---
title: Reinforcement Learning from Human Feedback
tags:
- LLM
- deep-learning
- RLHF
- LLM-training-method
---
# Review: Reinforcement Learning Basics
![](computer_sci/deep_learning_and_machine_learning/LLM/train/attachments/Pasted%20image%2020230628145009.png)
Reinforcement learning is a mathematical framework.
To demystify it: reinforcement learning is an open-ended model that uses a reward function to optimize an agent to solve complex tasks in a target environment.
<!---
# Origins of RLHF
## Pre Deep RL
![](Deep_Learning_And_Machine_Learning/LLM/train/attachments/Pasted%20image%2020230628160836.png)
Before deep RL, a neural network was not used to represent the policy. That earlier system was a machine learning system that created a policy by having humans label the actions an agent took as correct or incorrect. It was just a simple decision rule where humans labeled every action as good or bad: essentially a reward model and a policy put together.
## For Deep RL
![](Deep_Learning_And_Machine_Learning/LLM/train/attachments/Pasted%20image%2020230628161627.png)
--->
# Step by Step
For the RLHF training method, there are three core steps:
1. Pretraining a language model
2. Gathering data (question-answer pairs) and training a reward model
3. Fine-tuning the LM with reinforcement learning
## Step 1. Pretraining Language Models
Read this to learn how to train an LM:
[Pretraining language models](computer_sci/deep_learning_and_machine_learning/LLM/train/train_LLM.md)
OpenAI used a smaller version of GPT-3 for its first popular RLHF model, InstructGPT.
RLHF is still a new area; there is no settled answer as to which model makes the best starting point for RLHF, and fine-tuning on expensive augmented data is not strictly necessary.
## Step 2. Reward model training
In the reward model, we integrate human preferences into the system.
![](computer_sci/deep_learning_and_machine_learning/LLM/train/attachments/Pasted%20image%2020230629145231.png)
# Reference
* [Reinforcement Learning from Human Feedback: From Zero to chatGPT, YouTube, HuggingFace](https://www.youtube.com/watch?v=2MBJOuVq380)
* [Hugging Face blog, ChatGPT 背后的“功臣”——RLHF 技术详解](https://huggingface.co/blog/zh/rlhf)

Binary file not shown. (added; 62 KiB)

Binary file not shown. (added; 70 KiB)

Binary file not shown. (added; 47 KiB)

Binary file not shown. (added; 90 KiB)

Binary file not shown. (added; 86 KiB)

View File

@ -0,0 +1,8 @@
---
title: How to make a custom dataset?
tags:
- dataset
- LLM
- deep-learning
---

View File

@ -0,0 +1,7 @@
---
title: How to use fine-tuning to create your chatbot
tags:
- deep-learning
- LLM
---

View File

@ -0,0 +1,19 @@
---
title: Learn finetune by Stanford Alpaca
tags:
- deep-learning
- LLM
- fine-tune
- LLaMA
---
![](computer_sci/deep_learning_and_machine_learning/LLM/train/finr_tune/attachments/Pasted%20image%2020230627145954.png)
# Reference
* [https://www.youtube.com/watch?v=pcszoCYw3vc](https://www.youtube.com/watch?v=pcszoCYw3vc)
* [https://crfm.stanford.edu/2023/03/13/alpaca.html](https://crfm.stanford.edu/2023/03/13/alpaca.html)

View File

@ -0,0 +1,24 @@
---
title: LLM training steps
tags:
- LLM
- deep-learning
---
Training a large language model (LLM) typically involves the following steps:
1. **Data collection**: Collect large-scale text data as training data, such as text from the internet, books, articles, news, and dialogue transcripts. The quality and diversity of the data are crucial for training a high-quality LLM.
2. **Preprocessing**: Preprocess the data to make it suitable for training. This includes tokenization (splitting text into words or subword units), building a vocabulary (mapping words to numeric representations), and cleaning and normalizing the text.
3. **Model architecture**: Choose an appropriate architecture for the LLM. The most common architecture today is the Transformer, with multiple layers of self-attention and feed-forward networks.
4. **Pretraining**: Pretrain the model on a large-scale text dataset. Pretraining is unsupervised: the model extracts language knowledge through tasks such as predicting missing words or the next word, which lets it learn rich language representations.
5. **Fine-tuning**: After pretraining, fine-tune the model on task-specific data. Fine-tuning is supervised training on labeled data for a specific task, such as text generation or question answering, so the model can better meet that task's requirements.
6. **Hyperparameter tuning**: Adjust the model's hyperparameters, such as learning rate, batch size, and number of layers, to get better performance and results.
7. **Evaluation and iteration**: Evaluate the trained model and iterate based on the results. This may involve adjusting the architecture, adding training data, or changing the training strategy.
These steps are usually iterative: through continued training and refinement, the LLM performs and generates better across a variety of natural language processing tasks. Note that training an LLM requires a huge amount of compute and time, and is usually carried out by professional teams in large-scale computing environments.

View File

@ -0,0 +1,143 @@
---
title: Train LLM from scratch
tags:
- LLM
- LLM-training-method
- deep-learning
---
# Find a dataset
Find a corpus of text in the language you prefer.
* Such as [OSCAR](https://oscar-project.org/)
Intuitively, the more data you can get to pretrain on, the better results you will get.
# Train a tokenizer
There are some things you need to take into consideration when training a tokenizer.
## Tokenization
You can read more detailed post - [Tokenization](computer_sci/deep_learning_and_machine_learning/NLP/basic/tokenization.md)
Tokenization is the process of **breaking text into words or sentences**. These tokens help the machine learn the context of the text, which helps in *interpreting the meaning behind it*. Hence, tokenization is *the first and foremost step when working with text*. Once tokenization has been performed on the corpus, the resulting tokens can be used to build the vocabulary used in the further steps of training the model.
Example:
“The city is on the river bank” -> “The”, ”city”, ”is”, ”on”, ”the”, ”river”, ”bank”
Here are some typical tokenization:
* Word ( White Space ) Tokenization
* Character Tokenization
* **Subword Tokenization (SOTA)**
Subword tokenization can handle the OOV (Out Of Vocabulary) problem effectively (a training sketch follows the list below).
### Subword Tokenization Algorithm
* **Byte pair encoding** *(BPE)*
* **Byte-level byte pair encoding**
* **WordPiece**
* **unigram**
* **SentencePiece**
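A minimal sketch of training a BPE subword tokenizer with the HuggingFace `tokenizers` package (the tiny in-memory corpus and the vocabulary size are made up for illustration):
```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Build a BPE tokenizer and train it on a toy corpus
tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

corpus = ["The city is on the river bank", "The bank approved the loan"]
trainer = trainers.BpeTrainer(vocab_size=200, special_tokens=["[UNK]"])
tokenizer.train_from_iterator(corpus, trainer)

# Unseen words fall back to subword pieces instead of failing as OOV
print(tokenizer.encode("riverbank").tokens)
```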
## Word embedding
After tokenization, our text has become tokens. We also want to represent each token mathematically; for that we use a word embedding technique, converting words into vectors.
Here are some typical word embedding algorithms:
* **Word2Vec**
* skip-gram
* continuous bag-of-words (CBOW)
* **GloVe** (Global Vectors for Word Representations)
* **FastText**
* **ELMo** (Embeddings from Language Models)
* **BERT** (Bidirectional Encoder Representations from Transformers)
* a language model rather than a traditional word embedding algorithm. **While BERT does generate word embeddings as a byproduct of its training process**, its primary purpose is to learn contextualized representations of words and text segments.
# Train a language model from scratch
We need to be clear about the definition of a language model.
## Language model definition
Simply put, a language model is a computational model or algorithm designed to understand and generate human language. It is a type of artificial intelligence (AI) model that uses *statistical and probabilistic techniques to predict and generate sequences of words and sentences*.
It captures the statistical relationships between words or characters and *builds a probability distribution of the likelihood of a particular word or sequence of words appearing in a given context.*
Language models can be used for various NLP tasks, including machine translation, speech recognition, text generation, and so on.
As usual, a language model takes a seed input or prompt and uses its *learned knowledge of language (model weights)* to predict the most likely words or characters to follow.
The SOTA language model today is GPT-4.
## Language model algorithm
### Classical LM
* **n-gram**
* N-gram can be used as *both a tokenization algorithm and a component of a language model*. In my experience, n-grams are easier to understand as a language model that predicts a likelihood distribution.
* **HMMs** (Hidden Markov Models)
* **RNNs** (Recurrent Neural Networks)
### Cutting-edge
* **GPT** (Generative Pre-trained Transformer)
* **BERT** (Bidirectional Encoder Representations from Transformers)
* **T5** (Text-To-Text Transfer Transformer)
* **Megatron-LM**
## Training Method
Differently designed models usually have different training methods. Here we take a BERT-like model as an example.
### BERT-Like model
![](computer_sci/deep_learning_and_machine_learning/LLM/train/attachments/Pasted%20image%2020230629104307.png)
To train a BERT-like model, we'll train it on the task of **Masked Language Modeling** (MLM), i.e. predicting how to fill the arbitrary tokens that we randomly mask in the dataset.
We'll also train the BERT-like model using **Next Sentence Prediction** (NSP). *MLM teaches BERT to understand relationships between words, and NSP teaches BERT to understand long-term dependencies across sentences.* In NSP training, BERT is given two sentences, A and B, and must determine whether B follows A, i.e. outputting `IsNextSentence` or `NotNextSentence`
With NSP training, BERT performs better.
| Task | MNLI-m (acc) | QNLI (acc) | MRPC (acc) | SST-2 (acc) | SQuAD (f1) |
| --- | --- | --- | --- | --- | --- |
| With NSP | 84.4 | 88.4 | 86.7 | 92.7 | 88.5 |
| Without NSP | 83.9 | 84.9 | 86.5 | 92.6 | 87.9 |
[Table source](https://arxiv.org/pdf/1810.04805.pdf)
[Table metrics explain](computer_sci/deep_learning_and_machine_learning/LLM/metircs/some_task.md)
# Check that the LM actually trained
## Taking BERT as an example
Aside from watching the training and eval losses go down, we can check our model using the `FillMaskPipeline`.
This method takes an input containing *a masked token (here, `<mask>`) and returns a list of the most probable filled-in sequences, with their probabilities.*
With this method, we can see whether our LM has captured semantic knowledge or even some sort of (statistical) common-sense reasoning.
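A minimal usage sketch with the `transformers` fill-mask pipeline (the checkpoint name is just an example; RoBERTa-style models use the `<mask>` token):
```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="roberta-base")

# Each prediction carries the filled-in token and its probability
for pred in fill_mask("The city is on the river <mask>."):
    print(pred["token_str"], round(pred["score"], 3))
```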
# Fine-tune our LM on a downstream task
Finally, we can fine-tune our LM on a downstream task such as translation, chat, or text generation.
Different downstream tasks may need different fine-tuning methods.
# Example
[https://colab.research.google.com/github/huggingface/blog/blob/main/notebooks/01_how_to_train.ipynb#scrollTo=G-kkz81OY6xH](https://colab.research.google.com/github/huggingface/blog/blob/main/notebooks/01_how_to_train.ipynb#scrollTo=G-kkz81OY6xH)
# Reference
* [HuggingFace blog, How to train a new language model from scratch using Transformers and Tokenizers](https://huggingface.co/blog/how-to-train)
* [Medium blog, NLP Tokenization](https://medium.com/nerd-for-tech/nlp-tokenization-2fdec7536d17)
* [Radford, A., Narasimhan, K., Salimans, T. & Sutskever, I. (2018). Improving language understanding by generative pre-training. , .](https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf)

View File

@ -0,0 +1,9 @@
---
title: Model Interpretability - MOC
tags:
- MOC
- deep-learning
- interpretability
---
* [SHAP](computer_sci/deep_learning_and_machine_learning/Model_interpretability/SHAP.md)

View File

@ -0,0 +1,193 @@
---
title: SHAP - a reliable way to analyze model interpretability
tags:
- deep-learning
- interpretability
- algorithm
---
SHAP is the most popular model-agnostic technique that is used to explain predictions. SHAP stands for **SH**apley **A**dditive ex**P**lanations
Shapley values are obtained by incorporating concepts from *cooperative game theory* and *local explanations*.
# Mathematical and Algorithm Foundation
## Shapley Values
Shapley values come from game theory and were invented by Lloyd Shapley, as a way of providing a fair answer to the following question:
> [!question]
> If we have a coalition **C** that collaborates to produce a value **V**: how much did each individual member contribute to the final value?
The method for assessing each individual member's contribution is to remove that member to form new coalitions and then compare their production, as in these graphs:
![](computer_sci/deep_learning_and_machine_learning/Model_interpretability/attachments/Pasted%20image%2020230329165429.png)
Then, for member 1, we get every coalition with and without them, like this:
![](computer_sci/deep_learning_and_machine_learning/Model_interpretability/attachments/Pasted%20image%2020230329165523.png)
Taking the left value minus the right value gives the differences, as in the image above left; we then calculate their weighted mean:
$$
\varphi_i=\frac{1}{\text{Members}}\sum_{\forall \text{C s.t. i}\notin \text{C}} \frac{\text{Marginal Contribution of i to C}}{\text{Coalitions of size |C|}}
$$
## Shapely Additive Explanations
We need to know what **additive** means here. Lundberg and Lee define an additive feature attribution as follows:
![](computer_sci/deep_learning_and_machine_learning/Model_interpretability/attachments/Pasted%20image%2020230329165623.png)
![](computer_sci/deep_learning_and_machine_learning/Model_interpretability/attachments/Pasted%20image%2020230329165818.png)
$x'$, the simplified local input, usually means that we turn a feature vector into a discrete binary vector, where features are either included or excluded. The explanation model $g(x')$ takes this form:
$$
g(x')=\varphi_0+\sum_{i=1}^N \varphi_i {x'}_i
$$
* $\varphi_0$ is the **null output** of the model, that is, its **average output**
* $\varphi_i$ is the **feature effect**: how much that feature changes the output of the model, as introduced above. It is called the **attribution**
![](computer_sci/deep_learning_and_machine_learning/Model_interpretability/attachments/Pasted%20image%2020230329165840.png)
Lundberg and Lee go on to describe a set of three desirable properties of such an additive feature method: **local accuracy**, **missingness**, and **consistency**.
### Local accuracy
$$
g(x')\approx f(x) \quad \text{if} \quad x'\approx x
$$
### Missingness
$$
{x_i}' = 0 \rightarrow \varphi_i = 0
$$
If a feature is excluded from the model, its attribution must be zero; that is, the only thing that can affect the output of the explanation model is the inclusion of features, not their exclusion.
### Consistency
If a feature's contribution changes, its attributed effect cannot change in the opposite direction.
# Why SHAP
Lundberg and Lee argue in their paper that only SHAP satisfies all three properties, provided **the feature attributions in the additive explanatory model are specifically chosen to be the Shapley values of those features**.
# SHAP, step-by-step process (the same as shap.Explainer)
As an example, consider an ice cream shop in an airport; we know four features that can be used to predict its business.
$$
\begin{bmatrix}
\text{temperature} & \text{day of weeks} & \text{num of flights} & \text{num of hours}
\end{bmatrix}
\\
\rightarrow \\
\begin{bmatrix}
T & D & F & H
\end{bmatrix}
$$
For example, to find the Shapley value of temperature 80 in the sample [80 1 100 4], here are the steps:
- Step 1. Get a random permutation of the features, and put a bracket around the feature we care about and everything to its right (done manually)
$$
\begin{bmatrix}
F & D & \underbrace{T \quad H}
\end{bmatrix}
$$
- Step 2. Pick a random sample from the dataset
For example [200 5 70 8], in the form [F D T H]
- Step 3. Form the vectors $x_1, x_2$
$$
x_1=[100 \quad 1 \quad 80 \quad \color{#BF40BF} 8 \color{#FFFFFF}]
$$
$x_1$ is partially from the original sample and partially from the randomly chosen one: the bracketed features take the random sample's values, except for the feature we care about
$$
x_2 = [100 \quad 1 \quad \color{#BF40BF} 70 \quad 8 \color{#FFFFFF}]
$$
$x_2$ additionally changes the feature we care about to the randomly chosen sample's value for that feature
Then run the model on both vectors, and record the difference of its outputs $c_1 = f(x_1)$ and $c_2 = f(x_2)$:
$$
DIFF = c_1 - c_2
$$
- Step 4. Record the diff, return to Step 1, and repeat many times
$$
\text{SHAP}(T=80 | [80 \quad 1 \quad 100 \quad 4]) = \text{average(DIFF)}
$$
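In practice the `shap` package wraps this whole procedure. A minimal sketch on made-up ice-cream-shop data (the model, the data, and the coefficients are all hypothetical; `shap.Explainer` is the package's general entry point and should dispatch to a fast tree algorithm here):
```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

# Hypothetical samples in the form [T, D, F, H]
rng = np.random.default_rng(0)
X = rng.uniform([60, 1, 50, 1], [100, 7, 300, 12], size=(500, 4))
y = 0.5 * X[:, 0] + 0.02 * X[:, 2] * X[:, 3] + rng.normal(0, 5, 500)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

explainer = shap.Explainer(model, X)
shap_values = explainer(X[:10])
print(shap_values.values.shape)  # (10, 4): one attribution per feature per sample
```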
# Shapley kernel
## Too many coalitions need to be sampled
Like the Shapley values introduced above, for each $\varphi_i$ we need to sample a lot of coalitions to compute the differences.
For 4 features, we need 64 total coalitions to sample; for 32 features, 17.1 billion coalitions.
It's entirely untenable.
So, to get over this difficulty, we need to devise a **Shapley kernel**, and that's what Lundberg and Lee did:
![](computer_sci/deep_learning_and_machine_learning/Model_interpretability/attachments/Pasted%20image%2020230329181956.png)
## Detail
![](computer_sci/deep_learning_and_machine_learning/Model_interpretability/attachments/Pasted%20image%2020230329182011.png)
Though most ML models won't just let you omit a feature, what we do is define a **background dataset** B, one that contains a set of representative data points the model was trained over. We then fill in the omitted feature or features with values from the background dataset, while holding the features that are included in the permutation fixed at their original values. We then take the average of the model output over all of these new synthetic data points as our model output for that feature permutation, which we call $\bar{y}$.
$$
E_{i \in B}\left[\,y_{12i4}\,\right] = \bar{y}_{124}
$$
![](computer_sci/deep_learning_and_machine_learning/Model_interpretability/attachments/Pasted%20image%2020230329205039.png)
We then have a number of samples computed in this way, as in the image on the left.
We can formulate this as a weighted linear regression, with each feature assigned a coefficient.
And it can be proven that, for a special choice of weights, the coefficients are the Shapley values. **This weighting scheme is the basis of the Shapley kernel.** In this setting, the weighted linear regression process as a whole is Kernel SHAP.
### Different types of SHAP
- **Kernel SHAP**
- Low-order SHAP
- Linear SHAP
- Max SHAP
- Deep SHAP
- Tree SHAP
![](computer_sci/deep_learning_and_machine_learning/Model_interpretability/attachments/Pasted%20image%2020230329205130.png)
### You need to notice
As we saw, we ultimately calculate Shapley values using linear regression, so there must be some error involved. Some Python packages cannot give us the error bound, so it is hard to know whether the error comes from the linear regression, the data, or the model.
# Reference
* [Shapley Additive Explanations (SHAP)](https://www.youtube.com/watch?v=VB9uV-x0gtg)
* [SHAP: A reliable way to analyze your model interpretability](https://towardsdatascience.com/shap-a-reliable-way-to-analyze-your-model-interpretability-874294d30af6)
* [【Python可解释机器学习库SHAP】Python的可解释机器学习库SHAP](https://zhuanlan.zhihu.com/p/483622352)
* [Shapley Values : Data Science Concepts](https://www.youtube.com/watch?v=NBg7YirBTN8)
# Appendix
Other methods to interpret models:
* [Papers with Code - SHAP Explained](https://paperswithcode.com/method/shap)

View File

@ -0,0 +1,9 @@
---
title: Tokenization
tags:
- NLP
- deep-learning
- tokenization
- basic
---

View File

@ -0,0 +1,58 @@
---
title: Dynamic Time Warping (DTW)
tags:
- metrics
- time-series-dealing
- evaluation
---
![](computer_sci/deep_learning_and_machine_learning/Trick/attachments/Pasted%20image%2020230526164724.png)
Euclidean distance between time series can be a poor choice because of warping along the time axis. DTW is a metric that compares two time series with a distance measure that accounts for this warping. This section explains how to compute the DTW distance.
# Detail
## Step 1. Prepare the input sequences
Assume two time series, A and B.
## Step 2. Compute the distance matrix
Create a distance matrix whose elements are the distances between each pair of time points in A and B. Common distance measures include Euclidean distance, Manhattan distance, and cosine similarity; choose the one appropriate for your data type and needs.
## Step 3. Initialize the accumulated distance matrix
Create an accumulated distance matrix of the same size as the distance matrix, to store the accumulated distance from the start to each position. Set the accumulated distance at the start point (0, 0) to the distance matrix's starting value.
## Step 4. Compute the accumulated distances
Starting from the origin, fill in the accumulated distance matrix by dynamic programming. For each position (i, j), **the accumulated distance equals the local distance at that position plus the minimum of the accumulated distances at the three neighboring positions:**
$$
DTW(i, j) = d_{i,j} + \min{\{DTW(i-1,j), DTW(i, j-1), DTW(i-1, j-1)\}}
$$
## Step 5. Backtrack the optimal path
Starting from the bottom-right corner of the accumulated distance matrix, backtrack along the minimum accumulated distances to the start point (0, 0). The recorded path is the optimal warping path.
## Step 6. Compute the final distance
Compute the final DTW distance from the accumulated distance along the optimal path.
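A minimal NumPy sketch of steps 2-6 (using squared difference as the local distance and a square root at the end, matching the example's convention):
```python
import numpy as np

def dtw_distance(a, b):
    """DTW distance between 1-D sequences a and b via the accumulated-cost recursion."""
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2          # local distance d_{i,j}
            acc[i, j] = d + min(acc[i - 1, j],      # step from above
                                acc[i, j - 1],      # step from the left
                                acc[i - 1, j - 1])  # diagonal step
    return np.sqrt(acc[n, m])  # square root of the accumulated cost

print(dtw_distance([1, 2, 3, 4, 2], [1, 1, 2, 3, 4]))
```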
# Example
![](computer_sci/deep_learning_and_machine_learning/Trick/attachments/Pasted%20image%2020230526170120.png)
On the left is the distance matrix; on the right is the DTW matrix, i.e. the accumulated distance matrix.
![](computer_sci/deep_learning_and_machine_learning/Trick/attachments/Pasted%20image%2020230526170921.png)
![](computer_sci/deep_learning_and_machine_learning/Trick/attachments/Pasted%20image%2020230526171119.png)
Backtracking finds the optimal warping path; the DTW distance is the square root of the accumulated cost along the optimal warping path, in this example $\sqrt{15}$.

Binary file not shown. (added; 98 KiB)

Binary file not shown. (added; 55 KiB)

Binary file not shown. (added; 521 KiB)

Binary file not shown. (added; 563 KiB)

Binary file not shown. (added; 880 KiB)

View File

@ -0,0 +1,63 @@
---
title: Quantile loss
tags:
- loss-function
- deep-learning
- deep-learning-math
---
In most real-world forecasting problems, the uncertainty carried by our predictions is valuable. Compared with providing only a point estimate, knowing the prediction range can significantly improve decision-making in many business applications. **Quantile loss** is the loss function that helps us learn that prediction range.
Quantile loss measures the discrepancy between the predicted distribution and the target distribution, and is especially suited to prediction problems with high uncertainty.
# What is quantile
[Quantile](Math/Statistics/Basic/Quantile.md)
# What is a prediction interval
A prediction interval is a way of quantifying the uncertainty of a prediction. It provides **a range with probabilistic upper and lower bounds** for the estimate of the outcome variable.
![](computer_sci/deep_learning_and_machine_learning/Trick/attachments/Pasted%20image%2020230522151015.png)
The output itself is a random variable and therefore has a distribution. The purpose of the prediction interval is to understand how likely the outcome is to be correct.
# What is Quantile Loss
In quantile loss, both the prediction and the target are expressed as quantiles: for example, we can represent the prediction by its α-quantile and the target by the true value's α-quantile. Quantile loss then measures the discrepancy between these two distributions, usually via the quantile loss function below.
The quantile regression loss function is used to predict quantiles. For example, a prediction for quantile 0.9 should over-predict 90% of the time.
For one data point, with prediction $y_i^p$ and true value $y_i$, the mean regression loss for a quantile $q$ is:
$$
L(y_i^p, y_i) = \max[q(y_i^p - y_i), (q-1)(y_i^p - y_i)]
$$
Minimizing this loss function over a series of predictions yields the quantile $q$.
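A direct NumPy sketch of the formula above (the arrays are made up for illustration):
```python
import numpy as np

def quantile_loss(y_true, y_pred, q):
    """Mean pinball loss for quantile q, following the max-form formula above."""
    e = y_pred - y_true
    return np.mean(np.maximum(q * e, (q - 1) * e))

y_true = np.array([10.0, 12.0, 9.0, 11.0])
y_pred = np.array([11.0, 11.0, 10.0, 13.0])

for q in (0.25, 0.5, 0.75):
    # the two error directions are weighted by q and 1-q
    print(q, quantile_loss(y_true, y_pred, q))
```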
## Intuitive Understanding
In the regression loss equation above, since $q$ takes values between 0 and 1, the first term is positive and dominates when over-predicting ($y_i^p > y_i$), while the second term dominates when under-predicting ($y_i^p < y_i$). When $q$ equals 0.5, under-prediction and over-prediction are penalized by the same factor, and the median is obtained. The larger the value of $q$, the more heavily over-predictions are penalized compared with under-predictions. For example, at $q$ = 0.75, over-predictions are penalized by a factor of 0.75 and under-predictions by a factor of 0.25; the model then finds over-predicting *three times as hard* as under-predicting, yielding the 0.75 quantile.
## Why Quantile loss
> [!quote]
> **Homoscedasticity, the constant-variance assumption**
>
> In least-squares regression, prediction intervals rest on the assumption that the residuals have constant variance across the values of the independent variables. This assumption is called "homoscedasticity" or the "constant-variance assumption".
>
> It is a reasonable assumption about the nature of the error term in the regression model. In least-squares regression, we assume each observed value of the dependent variable consists of the true value plus an error term, and that the error terms are independent and identically distributed, i.e. they have the same distribution at every value of the independent variables.
>
> If the residuals have constant variance across the values of the independent variables, the size of the error does not change significantly as the independent variables change. In that case, we can use statistical methods to compute a prediction interval that gives a confidence level for future observations.
>
> However, if the constant-variance assumption does not hold, i.e. the residuals have different variances at different values of the independent variables, the results of least-squares regression can be problematic. Prediction intervals may then under- or over-estimate the prediction uncertainty, making the confidence estimate for future observations inaccurate.

Quantile loss regression can provide reasonable prediction intervals, even for residuals with non-constant variance or a non-normal distribution
# Reference
* [Kandi, Shabeel. “Prediction Intervals in Forecasting: Quantile Loss Function.” _Analytics Vidhya_, 24 Apr. 2023, https://medium.com/analytics-vidhya/prediction-intervals-in-forecasting-quantile-loss-function-18f72501586f.](https://medium.com/analytics-vidhya/prediction-intervals-in-forecasting-quantile-loss-function-18f72501586f)

View File

@ -0,0 +1,109 @@
import cv2
import numpy as np
import matplotlib.pyplot as plt
from tkinter import Tk, filedialog
from mpl_toolkits.mplot3d import Axes3D
from sklearn.cluster import KMeans
# Create a Tkinter root window
root = Tk()
root.withdraw()
# Open a file explorer dialog to select an image file
file_path = filedialog.askopenfilename()
# Read the selected image using cv2
image = cv2.imread(file_path)
# Convert the image to RGB color space
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# Get the dimensions of the image
height, width, _ = image_rgb.shape
# Reshape the image to a 2D array of pixels, one is pixel number, one is pixel channel
pixels = image_rgb.reshape((height * width, 3))
# Copy the pixel array; the k-means cluster centers will be written back into it later
dataset = pixels.copy()
# Get the RGB values from the dataset
red = dataset[:, 0]
green = dataset[:, 1]
blue = dataset[:, 2]
# plot show
'''
# Plot the histograms
plt.figure(figsize=(10, 6))
plt.hist(red, bins=256, color='red', alpha=0.5, label='Red')
plt.hist(green, bins=256, color='green', alpha=0.5, label='Green')
plt.hist(blue, bins=256, color='blue', alpha=0.5, label='Blue')
plt.title('RGB Value Histogram')
plt.xlabel('RGB Value')
plt.ylabel('Frequency')
plt.legend()
plt.show()
# Plot the 3D scatter graph
fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(red, green, blue, c='#000000', s=1)
ax.set_xlabel('Red')
ax.set_ylabel('Green')
ax.set_zlabel('Blue')
ax.set_title('RGB Scatter Plot')
plt.show()
'''
# Perform k-means clustering
num_clusters = 3 # Specify the desired number of clusters
kmeans = KMeans(n_clusters=num_clusters, n_init='auto', random_state=42)
labels = kmeans.fit_predict(dataset)
# Show K-means Clustering result
'''
# Plot the scatter plot for each iteration of the k-means algorithm
fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')
for i in range(num_clusters):
cluster_points = dataset[labels == i]
ax.scatter(cluster_points[:, 0], cluster_points[:, 1], cluster_points[:, 2], s=1)
ax.set_xlabel('Red')
ax.set_ylabel('Green')
ax.set_zlabel('Blue')
ax.set_title('RGB Scatter Plot - K-Means Clustering')
plt.show()
'''
center_values = kmeans.cluster_centers_.astype(int)
for i in range(num_clusters):
dataset[labels == i] = center_values[i]
# Reshape the pixels array back into an image with the original dimensions and convert it to BGR color space
reshaped_image = dataset.reshape((height, width, 3))
reshaped_image_bgr = cv2.cvtColor(reshaped_image.astype(np.uint8), cv2.COLOR_RGB2BGR)
# Display the image using matplotlib
plt.imshow(reshaped_image)
plt.show()
# Opencv store image
cv2.imwrite('C:/Users/BME51/Desktop/color8bit_style.jpg', reshaped_image_bgr)

Binary file not shown. (added; 730 KiB)

Binary file not shown. (added; 518 KiB)

View File

@ -0,0 +1,102 @@
---
title: K-means Clustering Algorithm
tags:
- machine-learning
- clustering
- algorithm
---
# Step by Step
Our algorithm works as follows, assuming we have inputs $x_1, x_2, \cdots, x_n$ and a value of $K$
- **Step 1** - Pick $K$ random points as cluster centers called centroids.
- **Step 2** - Assign each $x_i$ to nearest cluster by calculating its distance to each centroid.
- **Step 3** - Find new cluster center by taking the average of the assigned points.
- **Step 4** - Repeat Steps 2 and 3 until none of the cluster assignments change.
![](computer_sci/deep_learning_and_machine_learning/clustering/k-means/attachments/k4XcapI.gif)
# Implementation
## Core code
### Distance calculation:
```python
# Euclidean Distance Caculator
def dist(a, b, ax=1):
return np.linalg.norm(a - b, axis=ax)
```
### Generate Random Clustering center at first
```python
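# X is the (n_samples, 2) array of data points prepared earlier in the tutorial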
# Number of clusters
k = 3
# X coordinates of random centroids
C_x = np.random.randint(0, np.max(X)-20, size=k)
# Y coordinates of random centroids
C_y = np.random.randint(0, np.max(X)-20, size=k)
C = np.array(list(zip(C_x, C_y)), dtype=np.float32)
print(C)
```
### Calculate distances and assign points, then update each cluster's center
```python
# To store the value of centroids when it updates
C_old = np.zeros(C.shape)
# Cluster labels (0, 1, 2)
clusters = np.zeros(len(X))
# Error func. - Distance between new centroids and old centroids
error = dist(C, C_old, None)
# Loop will run till the error becomes zero
while error != 0:
# Assigning each value to its closest cluster
for i in range(len(X)):
distances = dist(X[i], C)
cluster = np.argmin(distances)
clusters[i] = cluster
# Storing the old centroid values
C_old = deepcopy(C)
# Finding the new centroids by taking the average value
for i in range(k):
points = [X[j] for j in range(len(X)) if clusters[j] == i]
C[i] = np.mean(points, axis=0)
error = dist(C, C_old, None)
```
## Simple approach by scikit-learn
```python
from sklearn.cluster import KMeans
# Number of clusters
kmeans = KMeans(n_clusters=3)
# Fitting the input data
kmeans = kmeans.fit(X)
# Getting the cluster labels
labels = kmeans.predict(X)
# Centroid values
centroids = kmeans.cluster_centers_
# Comparing with scikit-learn centroids
print(C) # From Scratch
print(centroids) # From sci-kit learn
```
# Application
## 8bit style
Read an image and use k-means to cluster the pixel values, giving the picture an 8-bit color style.
![](computer_sci/deep_learning_and_machine_learning/clustering/k-means/attachments/3ed5fee41bd566be093bebd62a33d12.jpg)
[color8bit_style.py](https://github.com/PinkR1ver/Jude.W-s-Knowledge-Brain/blob/master/Deep_Learning_And_Machine_Learning/clustering/k-means/application/color8bit_style.py)
# Reference
* [K-Means Clustering in Python, https://mubaris.com/posts/kmeans-clustering/. Accessed 3 July 2023.](https://mubaris.com/posts/kmeans-clustering/)

View File

@ -0,0 +1,38 @@
---
title: AdaBoost
tags:
- deep-learning
- ensemble-learning
---
# Video you need to watch first
* [AdaBoost, Clearly Explained](https://www.youtube.com/watch?v=LsK-xG1cLYA)
# Key words and equations
- **A stump (树桩) is a one-level tree that classifies by just one feature**
- Amount of say
$$
\text{Amount of say} = \frac{1}{2}\log{(\frac{1-\text{Total Error}}{\text{Total Error}})}
$$
- Misclassified sample's new weight
$$
\text{New Sample Weight} = \text{Sample Weight}\times e^{\text{amount of say}}
$$
- Correctly classified sample's new weight
$$
\text{New Sample Weight} = \text{Sample Weight}\times e^{-\text{amount of say}}
$$
- After reassigning the sample weights, bootstrap-sample based on the new weights; high-weight samples will be picked many times, steering the next model (a tiny numeric sketch of one round follows this list)
- In the final prediction, the **amount of say** decides which result we pick.
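A tiny NumPy sketch of one round of these weight updates (the sample count and the misclassified indices are made up for illustration):
```python
import numpy as np

n = 8
w = np.full(n, 1 / n)                 # initial sample weights
wrong = np.array([0, 3])              # indices the stump misclassified

total_error = w[wrong].sum()
amount_of_say = 0.5 * np.log((1 - total_error) / total_error)

w[wrong] *= np.exp(amount_of_say)     # boost misclassified samples
correct = np.setdiff1d(np.arange(n), wrong)
w[correct] *= np.exp(-amount_of_say)  # shrink correctly classified samples
w /= w.sum()                          # renormalize before bootstrap sampling

print(round(amount_of_say, 3), w.round(3))
```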
# Question
- **[why decision stumps instead of trees?](https://stats.stackexchange.com/questions/520667/adaboost-why-decision-stumps-instead-of-trees)**

Some files were not shown because too many files have changed in this diff.