hugo to v4
content/Physics/Optical/optical_abberation.md
---
title: Optical Aberration
tags:
- optical
- photography
- basic
---

# What is optical aberration

Optical aberrations are imperfections in a lens design that cause light rays to spread out rather than converge into a sharp image. The effect ranges from every ray in the image being defocused to only certain points or edges losing focus. Several types of optical aberration can occur when imaging. Building an ideal optical system corrected for every possible aberration would raise the cost of the lens dramatically. In practice, some form of aberration is always present in a lens; what matters is minimizing its impact, so manufacturing any lens usually involves some compromises.

# Circle of confusion

To explain how aberrations blur an image, we first need the concept of the circle of confusion. When light from a point on the subject passes through the lens and converges exactly on the sensor, the point is sharp. If instead it converges in front of or behind the sensor, the light is spread over a wider area of the sensor. This can be seen in Figure 1, where a point source converges on the sensor, but as the sensor position shifts, the amount of light spread across the sensor changes as well.

The more the light spreads, the less in focus the image is. Unless the aperture is very small, subjects separated by large distances in the scene will usually leave the background or the foreground out of focus, because light from the foreground converges at a different point than light from more distant subjects in the background.
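As a rough, not-from-the-source illustration of how much a defocused point spreads, the thin-lens approximation below sketches the blur-circle diameter; the formula and the 50 mm f/2 example numbers are assumptions for illustration only.

```python
def blur_circle_mm(f_mm, n_stop, focus_mm, subject_mm):
    """Thin-lens approximation of the circle-of-confusion diameter (mm)
    for a lens of focal length f_mm at aperture f/n_stop, focused at
    focus_mm, imaging a point at subject_mm."""
    aperture = f_mm / n_stop  # aperture diameter in mm
    return aperture * abs(subject_mm - focus_mm) / subject_mm * f_mm / (focus_mm - f_mm)

# A 50 mm f/2 lens focused at 2 m: a point 10 m away lands on the sensor
# as a blur circle roughly half a millimetre across.
print(blur_circle_mm(50, 2.0, 2000, 10000))  # about 0.513 mm
print(blur_circle_mm(50, 2.0, 2000, 2000))   # 0.0, in perfect focus
```

Note how stopping down (a larger n_stop) shrinks the aperture term and with it the blur circle, which is exactly why small apertures keep more of the scene acceptably sharp.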

# Types of Optical Aberration

## Coma (彗差)

Coma, or comatic aberration, gets its name from the comet-like tail that this aberration produces.

It is a defect inherent to some lenses, or introduced by the optical design, that distorts off-axis point sources such as stars. Specifically, coma is defined as a variation in magnification across the entrance pupil. In refractive or diffractive optical systems, especially in images spanning a wide spectral range, coma is a function of wavelength.

## Astigmatism (像散)

Astigmatism arises when rays propagating in two perpendicular planes come to a focus at different points.

This can be seen in Figure 3, where the two focal points are represented by the red horizontal plane and the blue vertical plane. The point of best sharpness in the image lies between these two points, where the circle of confusion of neither plane is too wide.

When the optics are misaligned, astigmatism distorts the sides and edges of the image. It is usually described as a lack of sharpness when looking at lines in the image.

This form of aberration can be corrected with an appropriate lens design, found in most good optics. The first optical designs correcting astigmatism were produced by Carl Zeiss and have been refined for more than a century. At this point it usually appears only in very low-quality lenses, or when internal optical elements are damaged or have shifted, for example after the lens has been dropped.

## (Petzval) Field Curvature (场曲)

Many lenses have a curved focal surface. This produces soft corners while keeping mainly the center of the image in focus. Most lenses retain some focal-surface curvature, so the entire frame cannot be brought into focus without some cropping.

Field curvature is the result of the image plane being effectively non-flat because the lens focuses different field positions at different depths.

Camera lenses largely correct for this, but some field curvature can still be found on many of them. Some sensor manufacturers are actually working on curved sensors that match the curved focal region. Such a design lets the sensor compensate for the aberration instead of requiring expensive lens designs manufactured to that precision, so cheaper lenses can produce high-quality results. A real example can be seen in the Kepler space observatory, where a curved sensor array corrects for the telescope's large spherical optics.

## Distortion (畸变)

Distortion is an aberration in which different parts of an object are imaged with different magnification through the lens system. It degrades the geometric similarity between object and image but does not affect sharpness. Based on the difference in magnification between the periphery and the center, it falls into two types:

### Barrel distortion (桶形畸变)

In an image with barrel distortion, the edges and sides curve away from the center. Visually the image appears to bulge, because it captures the look of a curved field of view (FoV). For example, using a shorter-focal-length lens (a wide-angle lens) from high up on a tall building captures a wider FoV. As shown in Figure 5, the effect is most exaggerated with a fisheye lens, which produces a very distorted and very wide FoV. In that image, grid lines help illustrate how the distortion stretches the image outward toward the sides and edges.

### Pincushion distortion (枕型畸变)

With pincushion distortion, rays bend toward the optical axis and the image appears stretched inward, so the edges and sides of the image appear to curve toward its center.

This form of aberration is most common in telephoto lenses with long focal lengths.
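Barrel and pincushion distortion are commonly modeled with a radial polynomial; the one-term model below is a standard simplification (not from the source), and the coefficient values are invented for illustration.

```python
def radial_distort(x, y, k1):
    """One-term radial distortion model: r_d = r * (1 + k1 * r^2),
    applied to a point (x, y) in normalized image coordinates."""
    scale = 1.0 + k1 * (x * x + y * y)
    return x * scale, y * scale

# k1 < 0 shrinks magnification toward the edges (the barrel look),
# k1 > 0 increases it (the pincushion look); sign conventions vary by tool.
print(radial_distort(1.0, 0.0, -0.1))  # (0.9, 0.0)
print(radial_distort(1.0, 0.0, 0.1))   # (1.1, 0.0)
```

Points near the center (small r) are barely moved in either case, which matches the observation that distortion is most visible at the sides and edges of the frame.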

### Mustache distortion

**Mustache distortion** 😂 is a combination of pincushion and barrel distortion: the inner part of the image bows outward while the outer part bows inward. It is a fairly rare aberration in which more than one distortion pattern affects the image at once. Mustache distortion is usually the sign of a very poorly designed lens, since it is the culmination of optical errors that blend several aberrations together.

## Chromatic aberration (色差)

### Longitudinal / axial aberration

The color of light corresponds to a particular wavelength. Because of refraction, a color image has multiple wavelengths entering the lens and focusing at different points. Longitudinal, or axial, chromatic aberration is caused by different wavelengths focusing at different points along the optical axis: the shorter the wavelength, the closer its focal point is to the lens, and the longer the wavelength, the farther away, as shown in Figure 8. With a smaller aperture, the incoming light may still focus at different points, but the width (diameter) of the circle of confusion is much smaller, so the blur is far less severe.

### Transverse / lateral aberration

Transverse, or lateral, chromatic aberration is off-axis light causing different wavelengths to spread out along the image plane. It produces colored fringes at the edges of subjects in the image, and it is harder to correct than longitudinal chromatic aberration.

It can be fixed with an achromatic doublet, which combines elements of different refractive indices. By bringing the two ends of the visible spectrum to a common focus, color fringing can be eliminated. For both transverse and longitudinal chromatic aberration, reducing the aperture size also helps, as does avoiding imaging subjects in high-contrast settings (i.e., images with a very bright background). In microscopy, a lens may use an apochromatic (APO) design instead of an achromat: it uses three lens elements to bring three wavelengths of the incoming light to a common focus. When color matters most, mitigating chromatic aberration will yield the best results.

# Reference

* [Six Optical Aberrations That Could Be Impacting Your Vision System, lumenera.com](https://www.lumenera.com/blog/six-optical-aberrations-that-could-be-impacting-your-vision-system)
* [光学像差重要知识点详解|光学经典理论, 知乎 - 监控李誉](https://zhuanlan.zhihu.com/p/40149006)
content/Physics/Physics_MOC.md
---
title: Physics MOC
tags:
- physics
- MOC
---

# Electromagnetism

* [Electromagnetism MOC](Physics/Electromagnetism/Electromagnetism_MOC.md)
content/Physics/Wave/Doppler_Effect.md
---
title: Doppler Effect
tags:
- physics
- basic
- wave
---

The **Doppler effect** is the phenomenon in which, when a wave source and an observer are in relative motion, the frequency the observer receives differs from the frequency the source emits.

The whistle of a train rushing toward us sounds higher-pitched (higher frequency, shorter wavelength), while the whistle of a train moving away sounds lower (lower frequency, longer wavelength). This is the Doppler effect; the same happens with car horns and the ringing of a train's bell.

# General

In classical physics, when the speed of the source and the speed of the receiver are much smaller than the speed of the wave in the medium, the observed frequency $f$ and the emitted frequency $f_0$ are related by:

$$
f = \left(\frac{c \pm v_r}{c \pm v_s}\right)f_0
$$

* $c$ is the speed of the wave in the medium
* $v_r$ is the speed of the receiver relative to the medium; the numerator takes the plus sign if the receiver moves toward the source, the minus sign otherwise
* $v_s$ is the speed of the source relative to the medium; the denominator takes the plus sign if the source moves away from the receiver, the minus sign otherwise

> [!note]
> Note that this relation predicts the frequency will decrease if either the source or the receiver is moving away from the other.

$$
\frac{f}{v_{wr}} = \frac{f_0}{v_{ws}} = \frac{1}{\lambda}
$$

* $v_{wr}$ is the wave speed relative to the receiver
* $v_{ws}$ is the wave speed relative to the source
* $\lambda$ is the wavelength

## Example

With $v_s = 0.7c$, the wavefronts begin to bunch up on the right (front) side of the source and spread farther apart on its left (behind).

A receiver in front hears a higher frequency, $f = \frac{c}{c-0.7c}f_0 = 3.33f_0$; a receiver behind hears a lower frequency, $f = \frac{c}{c + 0.7c}f_0 = 0.59f_0$.
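The numbers in this example can be checked with a few lines; the helper below is a sketch of mine following the sign convention stated earlier in the note.

```python
def doppler(f0, c, v_r=0.0, v_s=0.0):
    """Observed frequency. v_r > 0 means the receiver moves toward the
    source; v_s > 0 means the source moves away from the receiver."""
    return f0 * (c + v_r) / (c + v_s)

# Source approaching at 0.7c (v_s = -0.7c): receiver in front hears 3.33 f0;
# source receding at 0.7c: receiver behind hears 0.59 f0.
print(doppler(1.0, 1.0, v_s=-0.7))  # 3.333...
print(doppler(1.0, 1.0, v_s=0.7))   # 0.588...
```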

# Reference

* [多普勒效应 - Wiki](https://zh.wikipedia.org/wiki/%E5%A4%9A%E6%99%AE%E5%8B%92%E6%95%88%E5%BA%94)
content/Report/2023.04.16 天线测试.md

Testing the ranging capability of the antenna.

# Background

# Test results

## Infinite-distance measurement

With no reflector within 30 cm in front, beyond this radar's ranging limit, the setup approximates "no reflection at any distance". The voltage at the collection end was recorded.

Data collected with the previous antenna:

There are two problems:

* The current antenna is not stable enough
* The peak of the core signal has dropped to about 1.7 V, versus 2.2 V for the previous core signal

## Real-time ranging experiment

*In the real-time ranging experiment, the signal is measured at the antenna end in real time while a metal plate is placed in front on a schedule, to test the antenna's ranging capability.*

The approximate placement schedule:
1. 0-25 s: no metal plate
2. 25-50 s: metal plate flush against the antenna
3. 50-75 s: no metal plate
4. 75-100 s: metal plate placed at 10 cm
5. 100-125 s: no metal plate
6. 125-150 s: metal plate placed at 20 cm
7. 150-175 s: metal plate placed at 30 cm
8. 175-200 s: no metal plate

Data collected with the new antenna:

Signal collected with the old antenna:

The problems are:

* The new antenna's signal is unstable, consistent with the result of the infinite-distance test.
* As a consequence, signals at different distances can no longer be distinguished.
content/_index.md
content/assets/pdf/NUS_Transcript.pdf.md
content/atlas.md
---
title: Atlas - Map of Maps
tags:
- MOC
---

🚧 There are notebooks about his research career:

* [Deep Learning & Machine Learning](computer_sci/deep_learning_and_machine_learning/Deep%20_Learning_MOC.md)

* [[synthetic_aperture_radar_imaging/SAR_MOC| Synthetic Aperture Radar(SAR) Imaging]]

💻 Also, his research needs some basic science to support it:

* [Data Structure and Algorithm MOC](computer_sci/data_structure_and_algorithm/MOC.md)

* [Hardware](computer_sci/Hardware/Hardware_MOC.md)

* [Physics](Physics/Physics_MOC.md)

* [Signal Processing](signal_processing/signal_processing_MOC.md)

* [Data Science](data_sci/data_sci_MOC.md)

* [About coding language design details](computer_sci/coding_knowledge/coding_lang_MOC.md)

* [Math](Math/MOC.md)

* [Computational Geometry](computer_sci/computational_geometry/MOC.md)

* [Code Framework Learn](computer_sci/code_frame_learn/MOC.md)

🦺 I also need some tools to help me:

* [Git](toolkit/git/git_MOC.md)

💻 Code Practice:

* [💽Programing Problem Solution Record](https://github.com/PinkR1ver/JudeW-Problemset)

🛶 Also, he keeps some notes on his hobbies:

* [📷 Photography](Photography/Photography_MOC.md)

* [📮文学](文学/文学_MOC.md)

* [🥐Food](food/MOC.md)

* [🎬Watching List](https://pinkr1ver.notion.site/5e136466f3664ff1aaaa75b85446e5b4?v=a41efbce52a84f7aa89d8f649f4620f6&pvs=4)

⭐ Here you can find my recent study:

* [Recent notes (this function cannot be used on the web)](recent.md)
* [Papers Recently Read](research_career/papers_read.md)

🎏 I also have some plans in mind:

* [Life List🚀](plan/life.md)

☁️ I also have some daily thoughts:

* [Logs](log/log_MOC.md)
---
title: Deep Learning - MOC
tags:
- MOC
- deep-learning
---

# Tech Explanation

* [⭐Deep Learning MOC](computer_sci/deep_learning_and_machine_learning/deep_learning/deep_learning_MOC.md)
* [✨Machine Learning MOC](computer_sci/deep_learning_and_machine_learning/machine_learning/MOC.md)
* [LLM - MOC](computer_sci/deep_learning_and_machine_learning/LLM/LLM_MOC.md)

# Deep-learning Research

* [Model Interpretability](computer_sci/deep_learning_and_machine_learning/Model_interpretability/Model_Interpretability_MOC.md)
* [Famous Model - MOC](computer_sci/deep_learning_and_machine_learning/Famous_Model/Famous_Model_MOC.md)
* [Model Evaluation - MOC](computer_sci/deep_learning_and_machine_learning/Evaluation/model_evaluation_MOC.md)
---
title: Model Evaluation - MOC
tags:
- deep-learning
- evaluation
---

* [Model Evaluation in Time Series Forecasting](computer_sci/deep_learning_and_machine_learning/Evaluation/time_series_forecasting.md)
---
title: Model Evaluation in Time Series Forecasting
tags:
- deep-learning
- evaluation
- time-series-dealing
---

# Some famous time series scoring techniques

1. **MAE, RMSE and AIC**
2. **Mean Forecast Accuracy**
3. **Warning: the time series model EVALUATION TRAP!**
4. **RdR Score Benchmark**
## MAE, RMSE, AIC

MAE stands for **Mean Absolute Error (MAE)** and RMSE stands for **Root Mean Squared Error (RMSE)**.

These are two well-known metrics for measuring the accuracy of continuous variables. MAE has long been the usual choice in the literature; observations from 2016 onward show RMSE and various flavors of R-squared gradually coming into wider use.

*We need to understand when each metric is the better choice.*

### MAE

$$
\text{MAE} = \frac{1}{n}\sum_{j=1}^n |y_j - \hat{y}_j|
$$

The defining property of MAE is that all individual differences carry equal weight.

If the absolute value is dropped, MAE becomes the **Mean Bias Error (MBE)**; when using MBE, beware that positive and negative biases cancel each other out.
### RMSE

$$
\text{RMSE} = \sqrt{\frac{1}{n} \sum_{j=1}^n (y_j - \hat{y}_j)^2}
$$

Root mean squared error (RMSE) is a quadratic scoring rule that also measures the average magnitude of the error. It is the square root of the average of the squared differences between predictions and actual observations.
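A minimal sketch of both metrics in plain Python; the toy numbers are my own and include one large miss to make the contrast visible.

```python
import math

def mae(y, y_hat):
    """Mean Absolute Error: every individual error carries equal weight."""
    return sum(abs(a - p) for a, p in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    """Root Mean Squared Error: squaring weights large errors more heavily."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y, y_hat)) / len(y))

y = [10.0, 12.0, 14.0, 16.0]
y_hat = [11.0, 12.0, 13.0, 26.0]  # one large miss
print(mae(y, y_hat))   # 3.0
print(rmse(y, y_hat))  # about 5.05: the outlier dominates
```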
### AIC

$$
\text{AIC} = 2k - 2\ln{(\hat{L})}
$$

where $k$ is the number of estimated parameters in the model and $\hat{L}$ is the maximized value of the model's likelihood function.

The **Akaike information criterion** (AIC) is a metric that helps compare models, because it accounts both for how well a model fits the data and for how complex the model is.

AIC measures the loss of information and **penalizes model complexity**: it is the *negative log-likelihood penalized by the number of parameters*. The core idea of AIC is that fewer model parameters is better. **AIC lets you test how well your model fits the dataset without overfitting it.**
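For a model with Gaussian residuals, the maximized log-likelihood has a closed form, so AIC can be computed directly; the identity below is standard, but the residual values are invented for illustration.

```python
import math

def gaussian_aic(residuals, k):
    """AIC = 2k - 2 ln(L-hat) for residuals modeled as zero-mean Gaussian
    noise, with the variance set to its maximum-likelihood estimate."""
    n = len(residuals)
    var = sum(r * r for r in residuals) / n  # MLE of sigma^2
    log_l = -0.5 * n * (math.log(2 * math.pi * var) + 1.0)
    return 2 * k - 2 * log_l

res = [0.1, -0.2, 0.05, 0.15, -0.1]
# Same fit quality, more parameters: AIC penalizes the bigger model.
print(gaussian_aic(res, 2))
print(gaussian_aic(res, 5))  # larger by exactly 2 * (5 - 2) = 6
```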
### Comparison

#### Similarities between MAE and RMSE

Both MAE and RMSE express the average model prediction error in the units of the variable of interest. Both metrics can range from 0 to ∞ and are indifferent to the direction of the error. Both are negatively oriented scores: lower is better.

#### Differences between MAE and RMSE

*Since the errors are squared before being averaged, RMSE gives relatively high weight to large errors.* This means RMSE should be more useful when large errors are particularly undesirable, whereas in MAE's average those large errors are diluted.

For AIC, lower is better, but there is no perfect score; it can only compare the performance of different models on the same dataset.
## Mean Forecast Accuracy

Compute the forecast accuracy at every point and then average, giving the Mean Forecast Accuracy.

Its major flaw is that large deviations have an outsized negative impact, for example $1 - \frac{|\hat{y}_j - y_j|}{y_j} = 1 - \frac{250-25}{25} = -800\%$.

The fix is to clip the forecast accuracy at a minimum of 0%; you can also use the median instead of the mean.

In general, **when your error distribution is skewed, you should use the median rather than the mean**. In some cases Mean Forecast Accuracy can be meaningless altogether. If you remember your statistics, the **coefficient of variation** (CV) is the ratio of the standard deviation to the mean ($\text{CV} = (\text{Standard Deviation}/\text{Mean}) \times 100$). A large CV value means large variability, which also means greater dispersion around the mean. **For example, anything with a CV above 0.7 can be treated as highly variable and not truly forecastable; it can also indicate that your model's predictive power is very unstable!**
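A sketch of the clipped version with mean and median aggregates; the 25 -> 250 point is the same pathological example used above, the other numbers are mine.

```python
import statistics

def forecast_accuracy(y, y_hat):
    """Per-point accuracy 1 - |error| / actual, floored at 0%."""
    return [max(0.0, 1.0 - abs(p - a) / a) for a, p in zip(y, y_hat)]

acc = forecast_accuracy([25.0, 100.0, 100.0], [250.0, 95.0, 110.0])
print(acc)                     # [0.0, 0.95, 0.9]: the -800% point is clipped to 0
print(statistics.mean(acc))    # still dragged down by the bad point
print(statistics.median(acc))  # 0.9, more robust to the outlier
```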
## RdR Score Benchmark (an experimental metric; the blogger notes it has not appeared in any research paper)

The RdR metric stands for:
* *R*: **Naïve Random Walk**
* *d*: **Dynamic Time Warping**
* *R*: **Root Mean Squared Error**

### DTW to deal with shape similarity

Metrics such as RMSE and MAE ignore one important criterion: **shape similarity**.

The RdR Score Benchmark uses [**Dynamic Time Warping (DTW)**](computer_sci/deep_learning_and_machine_learning/Trick/DTW.md) as its shape-similarity metric.

Euclidean distance can be a poor choice between time series, because the time axis may be warped.

* DTW finds the optimal (minimum-distance) warping path between two time series by "synchronizing"/"aligning" their time axes.
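A minimal dynamic-programming DTW, written as a toy sketch of my own with absolute difference as the local cost (production code would typically use a library such as dtaidistance or fastdtw):

```python
def dtw_distance(a, b):
    """Classic O(n*m) DTW: cost of the best warping path between a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # repeat a point of b
                                 d[i][j - 1],      # repeat a point of a
                                 d[i - 1][j - 1])  # advance both
    return d[n][m]

# A time-shifted copy: point-by-point distance is large, DTW cost is 0.
x = [0, 0, 1, 2, 1, 0, 0]
y = [0, 1, 2, 1, 0, 0, 0]
print(dtw_distance(x, y))  # 0.0
```

This shows exactly the failure mode of Euclidean distance mentioned above: the two series have the same shape, only shifted in time, and DTW recognizes that.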
### What the RdR score means

The *RdR score* is computed from RMSE and DTW distance, and measures how much better your model is compared to a Naïve Random Walk (*whose RdR score is 0*).

### RdR calculation details

The RdR score can be computed by plotting RMSE vs. DTW distance.

The score is then computed from the area of the resulting rectangle. (The article does not fully describe the calculation; it appears in the [github code](https://github.com/CoteDave/blog/tree/master/RdR%20score), but this is not certain.)
# Reference

* M.Sc, Dave Cote. "RdR Score Metric for Evaluating Time Series Forecasting Models." _Medium_, 8 Feb. 2022, https://medium.com/@dave.cote.msc/rdr-score-metric-for-evaluating-time-series-forecasting-models-1c23f92f80e7.
* JJ. "MAE and RMSE — Which Metric Is Better?" _Human in a Machine World_, 23 Mar. 2016, https://medium.com/human-in-a-machine-world/mae-and-rmse-which-metric-is-better-e60ac3bde13d.
* _Accelerating Dynamic Time Warping Subsequence Search with GPU_. https://www.slideshare.net/DavideNardone/accelerating-dynamic-time-warping-subsequence-search-with-gpu. Accessed 29 May 2023.
---
title: DeepAR - Time Series Forecasting
tags:
- deep-learning
- model
- time-series-dealing
---

DeepAR, an autoregressive recurrent network developed by Amazon, is the first model that could natively work on multiple time series. It is a milestone in the time-series community.

# What is DeepAR

> [!quote]
> DeepAR is the first successful model to combine Deep Learning with traditional Probabilistic Forecasting.

* **Multiple time-series support**
* **Extra covariates**: *DeepAR* allows extra features (covariates). This mattered a lot to me when I learned *DeepAR*, because in my task each time series has a corresponding feature.
* **Probabilistic output**: instead of making a single prediction, the model leverages [**quantile loss**](computer_sci/deep_learning_and_machine_learning/Trick/quantile_loss.md) to output prediction intervals.
* **"Cold" forecasting**: by learning from thousands of time series that potentially share a few similarities, *DeepAR* can provide forecasts for time series that have little or no history at all.

# Blocks used in DeepAR

* [LSTM](computer_sci/deep_learning_and_machine_learning/deep_learning/LSTM.md)

# *DeepAR* Architecture

The DeepAR model does not use LSTMs to compute the prediction directly. Instead, it uses them to estimate the parameters of a Gaussian likelihood function, $\theta=(\mu,\sigma)$: the mean and standard deviation of the Gaussian.
## Training Step-by-Step

Suppose we are at time step $t$ of time series $i$:

1. The LSTM cell takes as input the covariates $x_{i,t}$ (the value of $x_i$ at time $t$), the previous target value $z_{i,t-1}$, and the previous hidden state $h_{i,t-1}$.
2. The LSTM then outputs the current hidden state $h_{i,t}$, which feeds into the next step.
3. The parameters $\mu$ and $\sigma$ of the Gaussian likelihood function are computed indirectly from $h_{i,t}$; the computation details come later.

> [!quote]
> In other words, the model's job is to find the best $\mu$ and $\sigma$ for the Gaussian distribution, so that the prediction gets closer to $z_{i,t}$. Also, because *DeepAR* trains on and predicts a single data point at a time, it is called an autoregressive model.
## Inference Step-by-Step

When the model is used for prediction, the only change is that the predicted value $\hat{z}$ replaces the ground truth $z$: $\hat{z}_{i,t}$ is sampled from the Gaussian distribution the model has learned. But the parameters $\mu$ and $\sigma$ of that Gaussian are not learned directly by the model, so how does *DeepAR* do it?
# Gaussian Likelihood

$$
\ell_G(z|\mu,\sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp{\left(-\frac{(z-\mu)^2}{2\sigma^2}\right)}
$$

The task of estimating the Gaussian distribution is usually turned into the task of maximizing the Gaussian log-likelihood function, i.e. maximum likelihood estimation (MLE).

**Gaussian log-likelihood function**:

$$
\mathcal{L} = \sum_{i=1}^{N}\sum_{t=t_0}^{T} \log{\ell(z_{i,t}|\theta(h_{i,t}))}
$$
# Parameter estimation in *DeepAR*

In statistics, a Gaussian distribution is usually estimated with closed-form MLE formulas. *DeepAR* does not do this; instead it uses two dense layers, one per parameter, to produce the estimates.

The reason for estimating the Gaussian distribution with dense layers is that they can be trained with backpropagation.
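A toy sketch of the idea in plain Python. The softplus on $\sigma$, which keeps the standard deviation positive, follows the approach used in DeepAR; everything else here (sizes, random stand-in weights) is purely illustrative.

```python
import math
import random

random.seed(0)
HIDDEN = 8  # toy LSTM hidden size
h_t = [random.gauss(0, 1) for _ in range(HIDDEN)]  # stand-in for the LSTM output

# One dense layer per Gaussian parameter (weights are random stand-ins;
# in training they would be learned by backpropagation).
w_mu = [random.gauss(0, 1) for _ in range(HIDDEN)]
w_sigma = [random.gauss(0, 1) for _ in range(HIDDEN)]

def dot(w, h):
    return sum(wi * hi for wi, hi in zip(w, h))

mu = dot(w_mu, h_t)                              # mean: plain linear layer
sigma = math.log1p(math.exp(dot(w_sigma, h_t)))  # softplus keeps sigma > 0

z_hat = random.gauss(mu, sigma)  # inference: sample the next value
print(mu, sigma, z_hat)
```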

# Reference

* [https://towardsdatascience.com/deepar-mastering-time-series-forecasting-with-deep-learning-bc717771ce85](https://towardsdatascience.com/deepar-mastering-time-series-forecasting-with-deep-learning-bc717771ce85)
---
title: Famous Model MOC
tags:
- deep-learning
- MOC
---

# Time-series

* [DeepAR](computer_sci/deep_learning_and_machine_learning/Famous_Model/DeepAR.md)
---
title: Temporal Fusion Transformer
tags:
- deep-learning
- model
- time-series-dealing
---
---
title: Large Language Model(LLM) - MOC
tags:
- deep-learning
- LLM
- NLP
---

# Training

* [Training Tech Outline](computer_sci/deep_learning_and_machine_learning/LLM/train/steps.md)
* [⭐⭐⭐Train LLM from scratch](computer_sci/deep_learning_and_machine_learning/LLM/train/train_LLM.md)
* [⭐⭐⭐Detailed explanation of RLHF technology](computer_sci/deep_learning_and_machine_learning/LLM/train/RLHF.md)
* [How to use fine-tune tech to create your chatbot](computer_sci/deep_learning_and_machine_learning/LLM/train/finr_tune/how_to_fine_tune.md)
* [Learn finetune by Stanford Alpaca](computer_sci/deep_learning_and_machine_learning/LLM/train/finr_tune/learn_finetune_byStanfordAlpaca.md)

# Metrics

How do we evaluate an LLM's performance?

* [Tasks to evaluate BERT - Maybe can be deployed in other LM](computer_sci/deep_learning_and_machine_learning/LLM/metircs/some_task.md)

# Basic

* [LLM Hyperparameter](computer_sci/deep_learning_and_machine_learning/LLM/basic/llm_hyperparameter.md)
---
title: LLM hyperparameter
tags:
- hyperparameter
- LLM
- deep-learning
- basic
---

# LLM Temperature

The name comes from the physical meaning of temperature: the higher the temperature, the faster the atoms move, which means more randomness.

LLM temperature is a hyperparameter that regulates **the randomness, or creativity**, of the output.

* The higher the LLM temperature, the more diverse and creative the output, with a greater likelihood of straying from the context.
* The lower the LLM temperature, the more focused and deterministic the output, sticking closely to the most likely prediction.

## More detail

An LLM assigns a probability to each candidate next word, like this:

"A cat is chasing a …": lots of words can fill that blank, each with a different probability, and the model outputs a rating for each candidate next word.

Sure, we could always pick the highest-rated word, but that would result in very standard, predictable, boring sentences, and the model would not match human language, because we don't always use the most common word either.

So we want to design a mechanism that **allows every word with a decent rating to occur with reasonable probability**; that is why LLMs need temperature.
As in the real physical world, we can take samples to describe a distribution; *we use a softmax to describe the probability distribution over the next word*. The temperature is the $T$ in the formula:

$$
p_i = \frac{\exp{(\frac{R_i}{T})}}{\sum_i \exp{(\frac{R_i}{T})}}
$$

The lower $T$ is, the closer the highest-rated word's probability gets to 100%; the higher $T$ is, the smoother the probabilities become across all words.
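A small sketch of that formula with made-up ratings:

```python
import math

def temperature_softmax(ratings, t):
    """Turn next-word ratings into probabilities; t flattens or sharpens."""
    exps = [math.exp(r / t) for r in ratings]
    total = sum(exps)
    return [e / total for e in exps]

ratings = [5.0, 3.0, 1.0]  # made-up ratings for three candidate words
cold = temperature_softmax(ratings, 0.5)  # sharply peaked on the top word
hot = temperature_softmax(ratings, 5.0)   # much flatter
print(cold)
print(hot)
```

At low temperature the top word takes nearly all the probability mass; at high temperature even the lowest-rated word keeps a reasonable chance of being sampled.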

*The gif below is important and intuitive.*

So, for different settings of $T$, the next-word probabilities change, and we output the next word by sampling from that distribution.

# Reference

* [LLM Temperature, deepchecks](https://deepchecks.com/glossary/llm-parameters/#:~:text=One%20intriguing%20parameter%20within%20LLMs,of%20straying%20from%20the%20context.)
* [⭐⭐⭐https://www.youtube.com/watch?v=YjVuJjmgclU](https://www.youtube.com/watch?v=YjVuJjmgclU)
---
title: LangChain Explained
tags:
- LLM
- basic
- langchain
---

# What is LangChain

LangChain is an open source framework that allows AI developers to combine LLMs like GPT-4 *with external sources of computation and data*.

# Why LangChain

LangChain can make an LLM answer questions grounded in your own documents, which enables lots of amazing apps.

You can use LangChain to have GPT analyze your own company data, book flights based on your schedule, summarize stacks of PDFs, and more.

# LangChain value propositions

## Components

* LLM Wrappers
* Prompt Templates
* Indexes for relevant information retrieval

## Chains

Assemble components to solve a specific task, such as finding information in a book.

## Agents

Agents allow LLMs to interact with their environment, for instance making an API request to perform a specific action.

# LangChain Framework

# Reference

* [https://www.youtube.com/watch?v=aywZrzNaKjs](https://www.youtube.com/watch?v=aywZrzNaKjs)
---
title: Tasks to evaluate BERT - Maybe can be deployed in other LM
tags:
- LLM
- metircs
- deep-learning
- benchmark
---

# Overview

# MNLI-m (Multi-Genre Natural Language Inference - Matched)

MNLI-m is a benchmark dataset and task for natural language inference (NLI). The goal of NLI is to determine the logical relationship between two given sentences: whether the relationship is "entailment," "contradiction," or "neutral." MNLI-m focuses on matched data, which means the sentences are drawn from the same genres as the sentences in the training set. It is part of the GLUE (General Language Understanding Evaluation) benchmark, which evaluates the performance of models on various natural language understanding tasks.

# QNLI (Question Natural Language Inference)

QNLI is another NLI task included in the GLUE benchmark. In this task, the model is given a sentence that is a premise and a sentence that is a question related to the premise. The goal is to determine whether the answer to the question can be inferred from the given premise. The dataset for QNLI is derived from the Stanford Question Answering Dataset (SQuAD).

# MRPC (Microsoft Research Paraphrase Corpus)

MRPC is a dataset used for paraphrase identification or semantic equivalence detection. It consists of sentence pairs from various sources that are labeled as either paraphrases or not. The task is to classify whether a given sentence pair expresses the same meaning (paraphrase) or not. MRPC is also part of the GLUE benchmark and helps evaluate models' ability to understand sentence similarity and equivalence.

# SST-2 (Stanford Sentiment Treebank - Binary Sentiment Classification)

SST-2 is a binary sentiment classification task based on the Stanford Sentiment Treebank dataset. The dataset contains sentences from movie reviews labeled as either positive or negative sentiment. The task is to classify a given sentence as expressing a positive or negative sentiment. SST-2 is often used to evaluate the ability of models to understand and classify sentiment in natural language.

# SQuAD (Stanford Question Answering Dataset)

SQuAD is a widely known dataset and task for machine reading comprehension. It consists of questions posed by humans on a set of Wikipedia articles, where the answers to the questions are spans of text from the corresponding articles. The goal is to build models that can accurately answer the questions based on the provided context. SQuAD has been instrumental in advancing the field of question answering and evaluating models' reading comprehension capabilities.

Overall, these tasks and datasets serve as benchmarks for evaluating natural language understanding and processing models. They cover a range of language understanding tasks, including natural language inference, paraphrase identification, sentiment analysis, and machine reading comprehension.
---
title: Reinforcement Learning from Human Feedback
tags:
- LLM
- deep-learning
- RLHF
- LLM-training-method
---

# Review: Reinforcement Learning Basics

Reinforcement learning is a mathematical framework.

Demystified, a reinforcement learning model is an open-ended model that uses a reward function to optimize an agent to solve complex tasks in a target environment.

<!---
# Origins of RLHF

## Pre Deep RL

Before deep RL, systems did not use a neural network to represent the policy. One early machine learning system created a policy by having humans label the actions an agent took as roughly correct or incorrect. This was just a simple decision rule where humans labeled every action as good or bad, and it was essentially a reward model and a policy put together.

## For Deep RL

--->

# Step by Step

The RLHF training method has three core steps:

1. Pretraining a language model
2. Gathering data (question-and-answer data) and training a reward model
3. Fine-tuning the LM with reinforcement learning

## Step 1. Pretraining Language Models

Read this to learn how to train an LM:

[Pretraining language models](computer_sci/deep_learning_and_machine_learning/LLM/train/train_LLM.md)

OpenAI used a smaller version of GPT-3 for its first popular RLHF model, InstructGPT.

RLHF is still a new area: there is no definitive answer as to which model is the best starting point for RLHF, and fine-tuning on expensive augmented data is not necessarily required.

## Step 2. Reward model training

In the reward model, we integrate human preferences into the system.

# Reference

* [Reinforcement Learning from Human Feedback: From Zero to chatGPT, YouTube, HuggingFace](https://www.youtube.com/watch?v=2MBJOuVq380)
* [Hugging Face blog, ChatGPT 背后的“功臣”——RLHF 技术详解](https://huggingface.co/blog/zh/rlhf)
---
title: How to make custom dataset?
tags:
- dataset
- LLM
- deep-learning
---
---
title: How to use fine-tune tech to create your chatbot
tags:
- deep-learning
- LLM
---
---
title: Learn finetune by Stanford Alpaca
tags:
- deep-learning
- LLM
- fine-tune
- LLaMA
---

# Reference

* [https://www.youtube.com/watch?v=pcszoCYw3vc](https://www.youtube.com/watch?v=pcszoCYw3vc)
* [https://crfm.stanford.edu/2023/03/13/alpaca.html](https://crfm.stanford.edu/2023/03/13/alpaca.html)
@ -0,0 +1,24 @@
|
||||
---
title: LLM training steps
tags:
- LLM
- deep-learning
---

Training a large language model (LLM) typically involves the following steps:

1. **Data collection**: Gather large-scale text data as training data — web text, books, articles, news, dialogue transcripts, and so on. The quality and diversity of the data are critical to training a high-quality LLM.

2. **Preprocessing**: Preprocess the data so it is suitable for model training. This includes tokenization (splitting text into words or subword units), building a vocabulary (mapping tokens to numeric IDs), and cleaning and normalizing the text.

3. **Model architecture**: Choose a suitable architecture for the LLM. The most common choice today is the Transformer, which stacks layers of self-attention and feed-forward networks.

4. **Pretraining**: Pretrain the model on a large-scale text corpus. Pretraining is unsupervised: the model extracts language knowledge by learning tasks such as predicting masked tokens or the next token, which lets it learn rich language representations.

5. **Fine-tuning**: After pretraining, fine-tune the model on task-specific data. Fine-tuning is supervised training on labeled data for a particular task, such as text generation or question answering, so the model adapts better to that task's requirements.

6. **Hyperparameter tuning**: Tune the model's hyperparameters, such as learning rate, batch size, and number of layers, to get better performance.

7. **Evaluation and iteration**: Evaluate the trained model and iterate based on the results. This may include adjusting the architecture, adding training data, or changing the training strategy.

These steps are usually iterative: through repeated training and refinement, the LLM improves its performance and generation ability across a range of NLP tasks. Note that training an LLM requires massive compute and time, and is usually carried out by a dedicated team in a large-scale computing environment.
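As a toy illustration of step 2, here is a minimal sketch of whitespace tokenization and vocabulary building; the corpus and special tokens are made up for illustration:

```python
from collections import Counter

corpus = ["the cat sat on the mat", "the dog sat"]

# Tokenize: naive whitespace split
tokens = [tok for line in corpus for tok in line.split()]

# Build a vocabulary mapping tokens to integer ids, most frequent
# first, with reserved special tokens for padding and unknown words
specials = ["<pad>", "<unk>"]
vocab = {tok: i for i, tok in enumerate(specials)}
for tok, _ in Counter(tokens).most_common():
    vocab.setdefault(tok, len(vocab))

def encode(text: str) -> list:
    """Map text to ids, falling back to <unk> for unseen tokens."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in text.split()]

print(vocab["the"])          # most frequent word gets the first free id
print(encode("the bird sat"))
```

Real LLM pipelines use subword tokenizers instead of whitespace splitting, but the vocabulary-building idea is the same.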
|
||||
@ -0,0 +1,143 @@
|
||||
---
title: Train LLM from scratch
tags:
- LLM
- LLM-training-method
- deep-learning
---

# Find a dataset

Find a corpus of text in the language you prefer.
* Such as [OSCAR](https://oscar-project.org/)

Intuitively, the more data you can get to pretrain on, the better results you will get.

# Train a tokenizer

There are some things you need to take into consideration when training a tokenizer.

## Tokenization

You can read a more detailed post - [Tokenization](computer_sci/deep_learning_and_machine_learning/NLP/basic/tokenization.md)

Tokenization is the process of **breaking text into words or sentences**. These tokens help the machine learn the context of the text, which helps in *interpreting the meaning behind the text*. Hence, tokenization is *the first and foremost step when working on text*. Once tokenization is performed on the corpus, the resulting tokens can be used to prepare a vocabulary for the further steps of training the model.

Example:

“The city is on the river bank” -> “The”, ”city”, ”is”, ”on”, ”the”, ”river”, ”bank”

Here are some typical tokenization approaches:
* Word (white space) tokenization
* Character tokenization
* **Subword tokenization (SOTA)**

Subword tokenization can handle the OOV (Out Of Vocabulary) problem effectively.

### Subword Tokenization Algorithms

* **Byte pair encoding** *(BPE)*
* **Byte-level byte pair encoding**
* **WordPiece**
* **Unigram**
* **SentencePiece**
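To make the BPE idea concrete, here is a minimal sketch of one merge round of byte pair encoding on a toy word-frequency table; the corpus is made up, and real tokenizer libraries do this at scale with many merge rounds:

```python
from collections import Counter

# Toy corpus: words split into symbols, with an end-of-word marker </w>
vocab = {("l", "o", "w", "</w>"): 5,
         ("l", "o", "w", "e", "r", "</w>"): 2,
         ("n", "e", "w", "e", "s", "t", "</w>"): 6}

def most_frequent_pair(vocab):
    """Count all adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(vocab, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for word, freq in vocab.items():
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1]); i += 2
            else:
                out.append(word[i]); i += 1
        merged[tuple(out)] = freq
    return merged

pair = most_frequent_pair(vocab)  # the most frequent adjacent pair
vocab = merge_pair(vocab, pair)
print(pair)
```

Repeating this merge step builds up a subword vocabulary; rare words then decompose into known subwords, which is how BPE sidesteps the OOV problem.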
## Word embedding

After tokenization, our text is split into tokens. We also want to represent tokens mathematically. Here we use word embedding techniques, which convert words into vectors.

Here are some typical word embedding algorithms:

* **Word2Vec**
    * skip-gram
    * continuous bag-of-words (CBOW)
* **GloVe** (Global Vectors for Word Representations)
* **FastText**
* **ELMo** (Embeddings from Language Models)
* **BERT** (Bidirectional Encoder Representations from Transformers)
    * BERT is a language model rather than a traditional word embedding algorithm. **While BERT does generate word embeddings as a byproduct of its training process**, its primary purpose is to learn contextualized representations of words and text segments.

# Train a language model from scratch

We need to clarify the definition of a language model.

## Language model definition

Simply put, a language model is a computational model or algorithm designed to understand and generate human language. It is a type of artificial intelligence (AI) model that uses *statistical and probabilistic techniques to predict and generate sequences of words and sentences*.

It captures the statistical relationships between words or characters and *builds a probability distribution over the likelihood of a particular word or sequence of words appearing in a given context.*

Language models can be used for various NLP tasks, including machine translation, speech recognition, text generation, and so on.

Typically, a language model takes a seed input or prompt and uses its *learned knowledge of language (model weights)* to predict the most likely words or characters to follow.

The SOTA language model today is GPT-4.

## Language model algorithms

### Classical LM

* **n-gram**
    * N-grams can be used as *both a tokenization scheme and a component of a language model*. In my experience, n-grams are easier to understand as a language model that predicts a likelihood distribution.
* **HMMs** (Hidden Markov Models)
* **RNNs** (Recurrent Neural Networks)
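As a tiny illustration of the classical approach, here is a bigram (2-gram) language model with maximum-likelihood estimates over a made-up toy corpus:

```python
from collections import Counter

corpus = "the cat sat . the cat ran . the dog sat .".split()

# Count bigrams and the contexts (all tokens that have a successor)
bigrams = Counter(zip(corpus, corpus[1:]))
contexts = Counter(corpus[:-1])

def p(word, prev):
    """Maximum-likelihood estimate of P(word | prev)."""
    return bigrams[(prev, word)] / contexts[prev]

print(p("cat", "the"))  # P(cat | the)
print(p("dog", "the"))  # P(dog | the)
```

Real n-gram models add smoothing for unseen pairs, but the core is exactly this conditional frequency table.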
### Cutting-edge

* **GPT** (Generative Pre-trained Transformer)
* **BERT** (Bidirectional Encoder Representations from Transformers)
* **T5** (Text-To-Text Transfer Transformer)
* **Megatron-LM**

## Training Method

Differently designed models usually have different training methods. Here we take a BERT-like model as an example.

### BERT-like model

![](attachments/Pasted image 20230705161115.png)

To train a BERT-like model, we train it on the task of **Masked Language Modeling** (MLM), i.e. predicting how to fill arbitrary tokens that we randomly mask in the dataset.
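A minimal sketch of the MLM masking step; the 15% masking rate follows the BERT paper, while the token list and mask token are illustrative. (The full BERT recipe additionally replaces some selected tokens with random or unchanged tokens; this sketch only shows the basic masking idea.)

```python
import random

def mask_tokens(tokens, mask_token="<mask>", rate=0.15, seed=0):
    """Randomly replace ~rate of the tokens with the mask token.

    Returns the masked sequence and the labels: the original token at
    masked positions (which the model must recover), None elsewhere.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < rate:
            masked.append(mask_token)
            labels.append(tok)   # model is scored on recovering this
        else:
            masked.append(tok)
            labels.append(None)  # position not scored
    return masked, labels

masked, labels = mask_tokens("the cat sat on the mat".split())
print(masked)
```

The model is then trained with cross-entropy only on the masked positions.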
We also train a BERT-like model using **Next Sentence Prediction** (NSP). *MLM teaches BERT to understand relationships between words, and NSP teaches BERT to understand long-term dependencies across sentences.* In NSP training, we give BERT two sentences, A and B, and BERT determines whether B is A's next sentence, i.e. it outputs `IsNextSentence` or `NotNextSentence`.

With NSP training, BERT has better performance.

| Task | MNLI-m (acc) | QNLI (acc) | MRPC (acc) | SST-2 (acc) | SQuAD (f1) |
| --- | --- | --- | --- | --- | --- |
| With NSP | 84.4 | 88.4 | 86.7 | 92.7 | 88.5 |
| Without NSP | 83.9 | 84.9 | 86.5 | 92.6 | 87.9 |

[Table source](https://arxiv.org/pdf/1810.04805.pdf)
[Table metrics explained](computer_sci/deep_learning_and_machine_learning/LLM/metircs/some_task.md)

# Check the LM actually trained

## Take BERT as an example

Aside from watching the training and eval losses go down, we can check our model using `FillMaskPipeline`.

This pipeline takes *input containing a masked token (here, `<mask>`) and returns a list of the most probable filled sequences, with their probabilities.*

With this method, we can see whether our LM has captured semantic knowledge or even some sort of (statistical) common-sense reasoning.

# Fine-tune our LM on a downstream task

Finally, we can fine-tune our LM on a downstream task such as translation, chatbots, or text generation.

Different downstream tasks may need different fine-tuning methods.

# Example

[https://colab.research.google.com/github/huggingface/blog/blob/main/notebooks/01_how_to_train.ipynb#scrollTo=G-kkz81OY6xH](https://colab.research.google.com/github/huggingface/blog/blob/main/notebooks/01_how_to_train.ipynb#scrollTo=G-kkz81OY6xH)

# Reference

* [HuggingFace blog, How to train a new language model from scratch using Transformers and Tokenizers](https://huggingface.co/blog/how-to-train)
* [Medium blog, NLP Tokenization](https://medium.com/nerd-for-tech/nlp-tokenization-2fdec7536d17)
* [Radford, A., Narasimhan, K., Salimans, T. & Sutskever, I. (2018). Improving language understanding by generative pre-training.](https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf)
---
title: Model Interpretability - MOC
tags:
- MOC
- deep-learning
- interpretability
---

* [SHAP](computer_sci/deep_learning_and_machine_learning/Model_interpretability/SHAP.md)
---
title: SHAP - a reliable way to analyze model interpretability
tags:
- deep-learning
- interpretability
- algorithm
---

SHAP is the most popular model-agnostic technique used to explain predictions. SHAP stands for **SH**apley **A**dditive ex**P**lanations.

Shapley values are obtained by incorporating concepts from *cooperative game theory* and *local explanations*.

# Mathematical and Algorithmic Foundation

## Shapley Values

Shapley values come from game theory and were invented by Lloyd Shapley. They were invented to provide a fair answer to the following question:

> [!question]
> If we have a coalition **C** that collaborates to produce a value **V**: how much did each individual member contribute to the final value?

The method by which we assess each individual member's contribution is to remove each member to get a new coalition and then compare their production, as in these graphs:

![](attachments/Pasted image 20230701133934.png)

Then, for each member, we collect every pair of coalitions with and without that member, like this:

![](attachments/Pasted image 20230701134252.png)

Taking the left value minus the right value, we get the differences shown in the image above; we then compute their weighted mean:

$$
\varphi_i=\frac{1}{\text{Members}}\sum_{\forall \text{C s.t. i}\notin \text{C}} \frac{\text{Marginal Contribution of i to C}}{\text{Coalitions of size |C|}}
$$
## Shapley Additive Explanations

We need to know what **additive** means here. Lundberg and Lee define an additive feature attribution as follows:

![](attachments/Pasted image 20230701141745.png)

![](attachments/Pasted image 20230701141802.png)

$x'$, the simplified local input, usually means that we turn a feature vector into a discrete binary vector, where features are either included or excluded. Also, $g(x')$ should take this form:

$$
g(x')=\varphi_0+\sum_{i=1}^N \varphi_i {x'}_i
$$

* $\varphi_0$ is the **null output** of the model, that is, the **average output** of the model
* $\varphi_i$ is the **feature effect**: how much that feature changes the output of the model, as introduced above. It is called the **attribution**

![](attachments/Pasted image 20230701142652.png)

Lundberg and Lee go on to describe a set of three desirable properties of such an additive feature method: **local accuracy**, **missingness**, and **consistency**.

### Local accuracy

$$
g(x')\approx f(x) \quad \text{if} \quad x'\approx x
$$

### Missingness

$$
{x_i}' = 0 \rightarrow \varphi_i = 0
$$

If a feature is excluded from the model, its attribution must be zero; that is, the only thing that can affect the output of the explanation model is the inclusion of features, not the exclusion.

### Consistency

If a feature's contribution changes, the feature effect cannot change in the opposite direction.

# Why SHAP

Lee and Lundberg argue in their paper that an additive explanatory model satisfies all three properties only if **the feature attributions are specifically chosen to be the Shapley values of those features**.

# SHAP, step-by-step process, same as shap.explainer

For example, consider an ice cream shop in an airport; it has four features we can use to predict its business:

$$
\begin{bmatrix}
\text{temperature} & \text{day of week} & \text{num of flights} & \text{num of hours}
\end{bmatrix}
\rightarrow
\begin{bmatrix}
T & D & F & H
\end{bmatrix}
$$

For example, say we want to know the Shapley value of temperature 80 in the sample [80 1 100 4]. Here are the steps:

- Step 1. Get a random permutation of the features, and put a bracket around the feature we care about and everything to its right. (manually)

$$
\begin{bmatrix}
F & D & \underbrace{T \quad H}
\end{bmatrix}
$$

- Step 2. Pick a random sample from the dataset.

For example, [200 5 70 8], in the form [F D T H].

- Step 3. Form vectors $x_1$ and $x_2$.

$$
x_1=[100 \quad 1 \quad 80 \quad \color{#BF40BF} 8 \color{#FFFFFF}]
$$

$x_1$ is partially from the original sample and partially from the randomly chosen one: the features in the bracket come from the randomly chosen sample, excluding the feature we care about.

$$
x_2 = [100 \quad 1 \quad \color{#BF40BF} 70 \quad 8 \color{#FFFFFF}]
$$

$x_2$ additionally changes the feature we care about to the randomly chosen sample's value for that feature.

Then calculate the difference between the model outputs and record it:

$$
\text{DIFF} = f(x_1) - f(x_2)
$$

- Step 4. Record the diff, return to Step 1, and repeat many times.

$$
\text{SHAP}(T=80 \mid [80 \quad 1 \quad 100 \quad 4]) = \text{average(DIFF)}
$$
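The four steps above can be sketched in plain Python. The model `f` here is a made-up stand-in (a simple linear function of the four features) and the dataset is tiny, just to show the sampling loop:

```python
import random

def f(x):
    # Stand-in model: made-up linear "business" score over [T, D, F, H]
    T, D, F, H = x
    return 2.0 * T + 1.0 * D + 0.5 * F + 3.0 * H

data = [[80, 1, 100, 4], [70, 5, 200, 8], [60, 3, 150, 6]]  # form [T D F H]
x = [80, 1, 100, 4]
care = 0  # index of the feature we care about (T)

rng = random.Random(0)
diffs = []
for _ in range(2000):
    perm = list(range(4))
    rng.shuffle(perm)                      # Step 1: random permutation
    right = perm[perm.index(care) + 1:]    # features to the right of T
    z = rng.choice(data)                   # Step 2: random sample
    # Step 3: bracketed features (right of T) come from the random sample
    x1 = [z[i] if i in right else x[i] for i in range(4)]
    x2 = list(x1)
    x2[care] = z[care]                     # additionally swap T itself
    diffs.append(f(x1) - f(x2))            # Step 4: record the diff

shap_T = sum(diffs) / len(diffs)
print(shap_T)  # approximate Shapley value of T = 80 for this sample
```

For this linear stand-in model the estimate converges to twice the gap between T = 80 and the dataset's mean temperature, which is a useful sanity check on the loop.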
# Shapley kernel

## Too many coalitions need to be sampled

As we introduced with Shapley values above, for each $\varphi_i$ we need to sample a lot of coalitions to compute the differences.

For 4 features, we need 64 total coalitions to sample; for 32 features, we need 17.1 billion coalitions to sample.

It's entirely untenable.

So, to get over this difficulty, we need to devise a **Shapley kernel**, and that is what Lee and Lundberg do.

![](attachments/Pasted image 20230701165758.png)

## Detail

![](attachments/Pasted image 20230701170336.png)

Though most ML models won't just let you omit a feature, what we do is define a **background dataset** B, one that contains a set of representative data points the model was trained over. We then fill in the omitted feature or features with values from the background dataset, while holding the features that are included in the permutation fixed to their original values. We then take the average of the model output over all of these new synthetic data points as our model output for that feature permutation, which we call $\bar{y}$.

$$
E[y_{\text{12i4}}\ \ \forall \ \text{i}\in B] = \bar{y}_{\text{124}}
$$

![](attachments/Pasted image 20230701171047.png)

Then we have a number of samples computed in this way, as in the image on the left.

We can formulate this as a weighted linear regression, with each feature assigned a coefficient.

And it can be proven that, for a special choice of weights, the coefficients are the Shapley values. **This weighting scheme is the basis of the Shapley kernel.** In this setting, the weighted linear regression process as a whole is Kernel SHAP.

### Different types of SHAP

- **Kernel SHAP**
- Low-order SHAP
- Linear SHAP
- Max SHAP
- Deep SHAP
- Tree SHAP

![](attachments/Pasted image 20230701173152.png)

### You need to notice

As we can see, we ultimately calculate Shapley values using linear regression, so there must be some error. But some Python packages cannot give us the error bound, so it is hard to know whether the error comes from the linear regression, the data, or the model.

# Reference

[Shapley Additive Explanations (SHAP)](https://www.youtube.com/watch?v=VB9uV-x0gtg)

[SHAP: A reliable way to analyze your model interpretability](https://towardsdatascience.com/shap-a-reliable-way-to-analyze-your-model-interpretability-874294d30af6)

[Zhihu: Python interpretable machine learning library SHAP](https://zhuanlan.zhihu.com/p/483622352)

[Shapley Values : Data Science Concepts](https://www.youtube.com/watch?v=NBg7YirBTN8)

# Appendix

Other methods to interpret models:

[Papers with Code - SHAP Explained](https://paperswithcode.com/method/shap)
---
title: Tokenization
tags:
- NLP
- deep-learning
- tokenization
- basic
---
---
title: Dynamic Time Warping (DTW)
tags:
- metrics
- time-series-dealing
- evalution
---

![](attachments/Pasted image20230707111450.png)

Euclidean distance can be a poor choice for comparing time series, because the series may be warped along the time axis. DTW is a distance measure that accounts for this warping when comparing two time series. This section explains how to compute the DTW distance.

# Detail

## Step 1. Prepare the input sequences

Assume two time series, A and B.

## Step 2. Compute the distance matrix

Create a distance matrix whose elements represent the distance between each pair of time points in sequences A and B. Common distance measures include Euclidean distance, Manhattan distance, and cosine similarity; choose one appropriate for your data type and needs.

## Step 3. Initialize the accumulated distance matrix

Create an accumulated distance matrix of the same size as the distance matrix, used to store the accumulated distance from the start point to each position. Set the accumulated distance at the start point (0, 0) to the distance matrix's value at the start point.

## Step 4. Compute the accumulated distances

Starting from the start point, fill in the accumulated distance matrix by dynamic programming. For each position (i, j), **the accumulated distance equals the distance at that position plus the minimum accumulated distance among the three neighboring positions.**

$$
DTW(i, j) = d_{i,j} + \min{\{DTW(i-1,j), DTW(i, j-1), DTW(i-1, j-1)\}}
$$

## Step 5. Backtrack the optimal path

Starting from the bottom-right corner of the accumulated distance matrix, backtrack to the start point (0, 0) along the path of minimum accumulated distance. The recorded path is the optimal warping path.

## Step 6. Compute the final distance

From the accumulated distance along the optimal path, compute the final DTW distance.
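The steps above can be sketched directly in Python, using squared differences as the local cost and taking the square root of the final accumulated cost (the convention used in this note). Explicit path backtracking is omitted since the final distance only needs the bottom-right cell; the input series are made up:

```python
import math

def dtw_distance(a, b):
    """DTW with squared local distances; returns sqrt of the accumulated cost."""
    n, m = len(a), len(b)
    INF = float("inf")
    # Step 3: initialize the accumulated distance matrix
    D = [[INF] * m for _ in range(n)]
    D[0][0] = (a[0] - b[0]) ** 2
    # Step 4: dynamic-programming fill
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            d = (a[i] - b[j]) ** 2
            best = min(D[i - 1][j] if i > 0 else INF,
                       D[i][j - 1] if j > 0 else INF,
                       D[i - 1][j - 1] if i > 0 and j > 0 else INF)
            D[i][j] = d + best
    # Step 6: final distance along the optimal path
    return math.sqrt(D[n - 1][m - 1])

print(dtw_distance([1, 2, 3], [2, 2, 3]))
```

Here the only mismatch the warping cannot absorb is the 1-vs-2 at the start, so the distance is 1.0.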

# Example

![](attachments/Pasted image 20230707113437.png)

On the left is the distance matrix; on the right is the DTW matrix, i.e. the accumulated distance matrix.

![](attachments/Pasted image 20230707113523.png)

![](attachments/Pasted image 20230707113632.png)

By backtracking, we find the optimal warping path. The DTW distance is the square root of the accumulated cost along the optimal warping path; in this example it is $\sqrt{15}$.
---
title: Quantile loss
tags:
- loss-function
- deep-learning
- deep-learning-math
---

In most real-world prediction problems, the uncertainty of our predictions carries important value. Knowing the prediction range, rather than only a point estimate, can significantly improve decision-making in many business applications. **Quantile loss** is the loss function that helps us learn such prediction ranges.

Quantile loss measures the difference between the predicted distribution and the target distribution, and is especially suitable for prediction problems with high uncertainty.

# What is a quantile

[Quantile](Math/Statistics/Basic/Quantile.md)

# What is a prediction interval

A prediction interval quantifies the uncertainty of a prediction. It provides a **probabilistic upper and lower bound** on the estimate of the outcome variable.

![](attachments/Pasted image 20230710135612.png)

The output itself is a random variable and therefore has a distribution. The purpose of the prediction interval is to capture how likely the outcome is to be correct.

# What is Quantile Loss

With quantile loss, we express both predictions and targets as quantiles: for example, we can represent the prediction by its α-quantile and the target by the α-quantile of the true values. Quantile loss then measures the discrepancy between them, typically via the quantile (pinball) loss function.

The quantile regression loss function is used to predict quantiles. For example, a prediction for quantile 0.9 should over-predict 90% of the time.

For a single data point with prediction $y_i^p$ and true value $y_i$, the regression loss for a quantile $q$ is:

$$
L(y_i^p, y_i) = \max[q(y_i - y_i^p), (q-1)(y_i - y_i^p)]
$$
Minimizing this loss over a series of predictions yields the quantile $q$.
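The quantile (pinball) loss is a one-liner; a minimal sketch with illustrative numbers, following the convention that under-prediction is weighted by $q$ and over-prediction by $1-q$:

```python
def quantile_loss(y_pred: float, y_true: float, q: float) -> float:
    """Pinball loss for quantile q: penalizes under-prediction with
    weight q and over-prediction with weight (1 - q)."""
    e = y_true - y_pred
    return max(q * e, (q - 1) * e)

q = 0.9
print(quantile_loss(0.0, 1.0, q))  # under-predict by 1
print(quantile_loss(2.0, 1.0, q))  # over-predict by 1
```

With q = 0.9, under-predicting by 1 costs 0.9 while over-predicting by 1 costs only 0.1, so the minimizer drifts upward toward the 0.9 quantile.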
## Intuitive Understanding

In the regression loss above, since q ranges between 0 and 1, the first term is positive and dominates when we under-predict ($y_i^p < y_i$), while the second term dominates when we over-predict ($y_i^p > y_i$). When q equals 0.5, under-prediction and over-prediction are penalized by the same factor, and we obtain the median. The larger the value of q, the more heavily under-prediction is penalized compared to over-prediction. For example, when q equals 0.75, under-prediction is penalized by a factor of 0.75 and over-prediction by a factor of 0.25. Under-predicting then costs the model three times as much as over-predicting, which yields the 0.75 quantile.

## Why Quantile loss

> [!quote]
> **"Homoscedasticity", the "constant variance assumption"**
>
> In least-squares regression, prediction intervals rest on the assumption that the residuals have constant variance across the values of the independent variables. This assumption is called "homoscedasticity" or the "constant variance assumption".
>
> It is a reasonable assumption about the nature of the error term in a regression model. In least-squares regression, we assume each observation of the dependent variable consists of the true value plus an error term, and that these error terms are independent and identically distributed across the values of the independent variables.
>
> If the residuals have constant variance across the independent variables, the magnitude of the error does not change significantly as the independent variables change. In that case, we can use statistical methods to compute a prediction interval that gives a confidence level for future observations.
>
> However, if the constant variance assumption does not hold, that is, if the residuals have different variances at different values of the independent variables, least-squares regression may run into problems. The prediction interval may then under- or over-estimate the uncertainty of the prediction, making the confidence estimate for future observations inaccurate.

Quantile loss regression can provide reasonable prediction intervals even when the residuals have non-constant variance or a non-normal distribution.

# Reference

* [Kandi, Shabeel. "Prediction Intervals in Forecasting: Quantile Loss Function." _Analytics Vidhya_, 24 Apr. 2023](https://medium.com/analytics-vidhya/prediction-intervals-in-forecasting-quantile-loss-function-18f72501586f)
import cv2
import numpy as np
import matplotlib.pyplot as plt
from tkinter import Tk, filedialog
from mpl_toolkits.mplot3d import Axes3D
from sklearn.cluster import KMeans


# Create a Tkinter root window
root = Tk()
root.withdraw()

# Open a file explorer dialog to select an image file
file_path = filedialog.askopenfilename()

# Read the selected image using cv2
image = cv2.imread(file_path)

# Convert the image to RGB color space
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Get the dimensions of the image
height, width, _ = image_rgb.shape

# Reshape the image to a 2D array of pixels: one axis is the pixel index,
# the other is the color channel
pixels = image_rgb.reshape((height * width, 3))

# Build the dataset of RGB vectors, one per pixel
dataset = np.array([pixel for pixel in pixels])

# Get the RGB values from the dataset
red = dataset[:, 0]
green = dataset[:, 1]
blue = dataset[:, 2]


# plot show
'''
# Plot the histograms
plt.figure(figsize=(10, 6))
plt.hist(red, bins=256, color='red', alpha=0.5, label='Red')
plt.hist(green, bins=256, color='green', alpha=0.5, label='Green')
plt.hist(blue, bins=256, color='blue', alpha=0.5, label='Blue')
plt.title('RGB Value Histogram')
plt.xlabel('RGB Value')
plt.ylabel('Frequency')
plt.legend()
plt.show()


# Plot the 3D scatter graph
fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(red, green, blue, c='#000000', s=1)
ax.set_xlabel('Red')
ax.set_ylabel('Green')
ax.set_zlabel('Blue')
ax.set_title('RGB Scatter Plot')
plt.show()
'''


# Perform k-means clustering
num_clusters = 3  # Specify the desired number of clusters
kmeans = KMeans(n_clusters=num_clusters, n_init='auto', random_state=42)
labels = kmeans.fit_predict(dataset)


# Show k-means clustering result
'''
# Plot the scatter plot for each cluster found by the k-means algorithm
fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')

for i in range(num_clusters):
    cluster_points = dataset[labels == i]
    ax.scatter(cluster_points[:, 0], cluster_points[:, 1], cluster_points[:, 2], s=1)

ax.set_xlabel('Red')
ax.set_ylabel('Green')
ax.set_zlabel('Blue')
ax.set_title('RGB Scatter Plot - K-Means Clustering')
plt.show()
'''

# Replace every pixel's color with its cluster center color
center_values = kmeans.cluster_centers_.astype(int)

for i in range(num_clusters):
    dataset[labels == i] = center_values[i]


# Reshape the pixels array back into an image with the original dimensions
# and convert it to BGR color space
reshaped_image = dataset.reshape((height, width, 3))
reshaped_image_bgr = cv2.cvtColor(reshaped_image.astype(np.uint8), cv2.COLOR_RGB2BGR)

# Display the image using matplotlib
plt.imshow(reshaped_image)
plt.show()

# Use OpenCV to store the image
cv2.imwrite('C:/Users/BME51/Desktop/color8bit_style.jpg', reshaped_image_bgr)
---
title: K-means Clustering Algorithm
tags:
- machine-learning
- clustering
- algorithm
---

# Step by Step

The algorithm works as follows, assuming we have inputs $x_1, x_2, \cdots, x_n$ and a value of $K$:

- **Step 1** - Pick $K$ random points as cluster centers, called centroids.
- **Step 2** - Assign each $x_i$ to the nearest cluster by calculating its distance to each centroid.
- **Step 3** - Find the new cluster centers by taking the average of the assigned points.
- **Step 4** - Repeat Steps 2 and 3 until none of the cluster assignments change.

![](attachments/Pasted image 20230703161207.png)

# Implementation

## Core code

### Distance calculation:

```python
# Euclidean distance calculator
def dist(a, b, ax=1):
    return np.linalg.norm(a - b, axis=ax)
```

### Generate random cluster centers at first

```python
# Number of clusters
k = 3
# X coordinates of random centroids
C_x = np.random.randint(0, np.max(X)-20, size=k)
# Y coordinates of random centroids
C_y = np.random.randint(0, np.max(X)-20, size=k)
C = np.array(list(zip(C_x, C_y)), dtype=np.float32)
print(C)
```

### Calculate distances, label each point, then update each label's new center

```python
# To store the value of centroids when it updates
C_old = np.zeros(C.shape)
# Cluster labels (0, 1, 2)
clusters = np.zeros(len(X))
# Error func. - Distance between new centroids and old centroids
error = dist(C, C_old, None)
# Loop will run till the error becomes zero
while error != 0:
    # Assigning each value to its closest cluster
    for i in range(len(X)):
        distances = dist(X[i], C)
        cluster = np.argmin(distances)
        clusters[i] = cluster
    # Storing the old centroid values
    C_old = deepcopy(C)
    # Finding the new centroids by taking the average value
    for i in range(k):
        points = [X[j] for j in range(len(X)) if clusters[j] == i]
        C[i] = np.mean(points, axis=0)
    error = dist(C, C_old, None)
```

## Simple approach with scikit-learn

```python
from sklearn.cluster import KMeans

# Number of clusters
kmeans = KMeans(n_clusters=3)
# Fitting the input data
kmeans = kmeans.fit(X)
# Getting the cluster labels
labels = kmeans.predict(X)
# Centroid values
centroids = kmeans.cluster_centers_

# Comparing with scikit-learn centroids
print(C)          # From scratch
print(centroids)  # From scikit-learn
```

# Application

## 8-bit style

Read an image and use k-means to cluster the pixel values, converting the picture to an 8-bit color style.

![](attachments/Pasted image 20230703172747.png)

[color8bit_style.py](https://github.com/PinkR1ver/Jude.W-s-Knowledge-Brain/blob/master/Deep_Learning_And_Machine_Learning/clustering/k-means/application/color8bit_style.py)

# Reference

* [K-Means Clustering in Python, https://mubaris.com/posts/kmeans-clustering/. Accessed 3 July 2023.](https://mubaris.com/posts/kmeans-clustering/)
---
title: AdaBoost
tags:
- deep-learning
- ensemble-learning
---

# Video you need to watch first

* [AdaBoost, Clearly Explained](https://www.youtube.com/watch?v=LsK-xG1cLYA)

# Key words and equations

- **A stump is a one-level decision tree, i.e. a classifier that uses just one feature**
- Amount of say

$$
\text{Amount of say} = \frac{1}{2}\log{(\frac{1-\text{Total Error}}{\text{Total Error}})}
$$

- New weight for a wrongly classified sample

$$
\text{New Sample Weight} = \text{Sample Weight}\times e^{\text{amount of say}}
$$

- New weight for a correctly classified sample

$$
\text{New Sample Weight} = \text{Sample Weight}\times e^{-\text{amount of say}}
$$

- After reassigning the sample weights, draw a bootstrap sample based on the new weights; high-weight samples will be selected many times, which steers the next model
- In the final prediction, the **amount of say** decides which result we pick.
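The update rules above can be sketched in a few lines; the weights and error values here are illustrative:

```python
import math

def amount_of_say(total_error: float) -> float:
    """Half the log-odds of the stump being correct."""
    return 0.5 * math.log((1 - total_error) / total_error)

def update_weights(weights, correct, say):
    """Scale up wrongly classified samples, scale down correct ones,
    then renormalize so the weights sum to 1."""
    new = [w * math.exp(-say if ok else say)
           for w, ok in zip(weights, correct)]
    total = sum(new)
    return [w / total for w in new]

# Four samples with equal weight; the first is misclassified by the stump
weights = [0.25, 0.25, 0.25, 0.25]
correct = [False, True, True, True]
say = amount_of_say(total_error=0.25)  # stump got 1 of 4 wrong
print(round(say, 4))
print(update_weights(weights, correct, say))
```

After one update the misclassified sample carries half of the total weight, so the next stump is strongly pushed to get it right.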

# Question

- **[Why decision stumps instead of trees?](https://stats.stackexchange.com/questions/520667/adaboost-why-decision-stumps-instead-of-trees)**