Big change: rename top-level directories from uppercase to lowercase (A -> a)
@ -9,16 +9,16 @@ tags:
|
||||
|
||||
## Basic concept
|
||||
|
||||
* [Quantile](Math/Statistics/Basic/Quantile.md)
|
||||
* [Quantile](math/Statistics/Basic/Quantile.md)
|
||||
|
||||
# Discrete mathematics
|
||||
|
||||
## Set theory
|
||||
|
||||
* [Cantor Expansion](Math/discrete_mathematics/set_theory/cantor_expansion/cantor_expansion.md)
|
||||
* [Cantor Expansion](math/discrete_mathematics/set_theory/cantor_expansion/cantor_expansion.md)
|
||||
|
||||
|
||||
# Optimization Problem
|
||||
|
||||
|
||||
* [Quadratic Programming](Math/optimization_problem/Quadratic_Programming.md)
|
||||
* [Quadratic Programming](math/optimization_problem/Quadratic_Programming.md)
|
||||
@ -13,10 +13,10 @@ $$
|
||||
|
||||
|
||||
|
||||

|
||||

|
||||
|
||||
|
||||

|
||||

|
||||
|
||||
The Cauchy principal value is a method for assigning values to *certain improper integrals* that would otherwise be undefined. In this method, a singularity on the integration interval is avoided by restricting the integral to the non-singular domain.
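In symbols, for an integrand with a singularity at an interior point $c \in (a, b)$, a minimal sketch of the definition is:

$$
\mathrm{p.v.}\int_a^b f(x)\,dx = \lim_{\varepsilon \to 0^+} \left[ \int_a^{c-\varepsilon} f(x)\,dx + \int_{c+\varepsilon}^b f(x)\,dx \right]
$$

For example, $\mathrm{p.v.}\int_{-1}^{1} \frac{dx}{x} = 0$: the contributions on the two sides of the singularity at $x = 0$ cancel.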
|
||||
|
||||
|
||||
@ -6,4 +6,4 @@ tags:
|
||||
- MOC
|
||||
---
|
||||
|
||||
* [🌊Sea MOC](Photography/Aesthetic/Landscape/Sea/Sea_MOC.md)
|
||||
* [🌊Sea MOC](photography/Aesthetic/Landscape/Sea/Sea_MOC.md)
|
||||
@ -6,22 +6,22 @@ tags:
|
||||
- photography
|
||||
---
|
||||
|
||||

|
||||

|
||||
|
||||
|
||||

|
||||

|
||||
|
||||
|
||||

|
||||

|
||||
|
||||
|
||||

|
||||

|
||||
|
||||
|
||||

|
||||

|
||||
|
||||
|
||||

|
||||

|
||||
|
||||
# Reference
|
||||
|
||||
|
||||
@ -7,5 +7,5 @@ tags:
|
||||
- aesthetic
|
||||
---
|
||||
|
||||
* [Fujifilm Blue🌊, 小红书-Philips谢骏](Photography/Aesthetic/Landscape/Sea/Fujifilm_Blue_by_小红书_Philips谢骏.md)
|
||||
* [豊島🏝, Instagram-shiifoncake](Photography/Aesthetic/Landscape/Sea/豊島_Instagram_shiifoncake.md)
|
||||
* [Fujifilm Blue🌊, 小红书-Philips谢骏](photography/Aesthetic/Landscape/Sea/Fujifilm_Blue_by_小红书_Philips谢骏.md)
|
||||
* [豊島🏝, Instagram-shiifoncake](photography/Aesthetic/Landscape/Sea/豊島_Instagram_shiifoncake.md)
|
||||
@ -6,17 +6,17 @@ tags:
|
||||
- landscape
|
||||
- aesthetic
|
||||
---
|
||||

|
||||

|
||||
|
||||
.jpg)
|
||||
.jpg)
|
||||
|
||||

|
||||

|
||||
|
||||
.jpg)
|
||||
.jpg)
|
||||
|
||||

|
||||

|
||||
|
||||

|
||||

|
||||
|
||||
|
||||
# Reference
|
||||
|
||||
@ -6,4 +6,4 @@ tags:
|
||||
- MOC
|
||||
---
|
||||
|
||||
* [🖼How to show Polaroid photo in a great way](Photography/Aesthetic/Polaroid/Polaroid_showcase.md)
|
||||
* [🖼How to show Polaroid photo in a great way](photography/Aesthetic/Polaroid/Polaroid_showcase.md)
|
||||
@ -8,18 +8,18 @@ tags:
|
||||
|
||||
|
||||
|
||||

|
||||

|
||||
|
||||
|
||||
|
||||

|
||||

|
||||
|
||||
|
||||
|
||||

|
||||

|
||||
|
||||
|
||||
|
||||

|
||||

|
||||
|
||||
Credits to [比扫描仪更easy的宝丽来翻拍解决方案 -BonBon的Pan](https://www.xiaohongshu.com/user/profile/6272c025000000002102353b/6331af53000000001701acfd)
|
||||
@ -9,45 +9,45 @@ tags:
|
||||
Credits to [Marta Bevacqua](https://www.martabevacquaphotography.com/),
|
||||
Thanks🌸
|
||||
|
||||

|
||||

|
||||
|
||||

|
||||

|
||||
|
||||

|
||||

|
||||
|
||||

|
||||

|
||||
|
||||

|
||||

|
||||
|
||||

|
||||

|
||||
|
||||

|
||||

|
||||
|
||||

|
||||

|
||||
|
||||

|
||||

|
||||
|
||||
.jpg)
|
||||
.jpg)
|
||||
|
||||
.jpg)
|
||||
.jpg)
|
||||
|
||||
.jpg)
|
||||
.jpg)
|
||||
|
||||
.jpg)
|
||||
.jpg)
|
||||
|
||||
.jpg)
|
||||
.jpg)
|
||||
|
||||
.jpg)
|
||||
.jpg)
|
||||
|
||||
.jpg)
|
||||
.jpg)
|
||||
|
||||
.jpg)
|
||||
.jpg)
|
||||
|
||||
.jpg)
|
||||
.jpg)
|
||||
|
||||
.jpg)
|
||||
.jpg)
|
||||
|
||||
.jpg)
|
||||
.jpg)
|
||||
|
||||

|
||||

|
||||
|
||||
|
||||
@ -14,22 +14,22 @@ Thanks
|
||||
Also, I see this in [摄影灵感|那有一点可爱 - by
|
||||
小八怪](https://www.xiaohongshu.com/explore/63f0a27e0000000013002b05)
|
||||
|
||||

|
||||

|
||||
|
||||

|
||||

|
||||
|
||||

|
||||

|
||||
|
||||

|
||||

|
||||
|
||||

|
||||

|
||||
|
||||

|
||||

|
||||
|
||||

|
||||

|
||||
|
||||

|
||||

|
||||
|
||||

|
||||

|
||||
|
||||

|
||||

|
||||
@ -7,5 +7,5 @@ tags:
|
||||
- MOC
|
||||
---
|
||||
|
||||
* [🌸Flower & Girl](Photography/Aesthetic/Portrait/Flower_and_Girl.md)
|
||||
* [👧🇰🇷Cute Portrait from Korean MV <Today's Mood>](Photography/Aesthetic/Portrait/From%20Korean%20MV%20Todays_Mod.md)
|
||||
* [🌸Flower & Girl](photography/Aesthetic/Portrait/Flower_and_Girl.md)
|
||||
* [👧🇰🇷Cute Portrait from Korean MV <Today's Mood>](photography/Aesthetic/Portrait/From%20Korean%20MV%20Todays_Mod.md)
|
||||
|
||||
@ -7,10 +7,10 @@ tags:
|
||||
- share
|
||||
---
|
||||
|
||||

|
||||

|
||||
|
||||
|
||||

|
||||

|
||||
|
||||
|
||||
# Reference
|
||||
|
||||
@ -7,5 +7,5 @@ tags:
|
||||
- MOC
|
||||
---
|
||||
|
||||
* [🌅Warmth - Nguan](Photography/Aesthetic/Style/Warmth_by_Nguan.md)
|
||||
* [📗 Grainy Green](Photography/Aesthetic/Style/Grainy_Green.md)
|
||||
* [🌅Warmth - Nguan](photography/Aesthetic/Style/Warmth_by_Nguan.md)
|
||||
* [📗 Grainy Green](photography/Aesthetic/Style/Grainy_Green.md)
|
||||
|
||||
@ -8,19 +8,19 @@ tags:
|
||||
Credits to [Nguan](https://www.instagram.com/_nguan_/)
|
||||
|
||||
|
||||

|
||||

|
||||
|
||||

|
||||

|
||||
|
||||

|
||||

|
||||
|
||||
|
||||

|
||||

|
||||
|
||||
|
||||

|
||||

|
||||
|
||||
|
||||

|
||||

|
||||
|
||||
|
||||
|
||||
@ -21,7 +21,7 @@ tags:
|
||||
# What is MTF Curve
|
||||
|
||||
|
||||
The modulation transfer function (MTF) curve is an information-dense metric that reflects how a lens *reproduces contrast as a function of spatial frequency (resolution)*. Under a fixed set of base parameters, the MTF curve provides a composite view of how [**optical aberrations**](Physics/Optical/optical_abberation.md) affect lens performance.
The modulation transfer function (MTF) curve is an information-dense metric that reflects how a lens *reproduces contrast as a function of spatial frequency (resolution)*. Under a fixed set of base parameters, the MTF curve provides a composite view of how [**optical aberrations**](physics/Optical/optical_abberation.md) affect lens performance.
|
||||
|
||||
From the MTF chart we can learn:
|
||||
|
||||
@ -41,11 +41,11 @@ tags:
|
||||
|
||||
As you may know, a lens images much better at its center than at its edges, so testing only the center or only the edge cannot represent a lens's quality. Manufacturers therefore pick several test points from the center outward. As the figure below shows, Nikon's full-frame bodies use points 5 mm, 10 mm, 15 mm, and 20 mm from the center; for APS-C, because the sensor is smaller, points such as 3 mm, 6 mm, 9 mm, and 12 mm are used instead. Different manufacturers may differ.
|
||||
|
||||

|
||||

|
||||
|
||||
The test generally uses black lines on a white background
|
||||
|
||||

|
||||

|
||||
|
||||
* **Thick lines** are used to test **contrast**, at a density of 10 lines/mm
* **Thin lines** are used to test **resolution**, at a density of 30 lines/mm
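The contrast these charts report is the modulation of the line pattern as imaged. A minimal sketch of the computation (the intensity profiles here are made up for illustration):

```python
def modulation(profile):
    """Contrast M = (I_max - I_min) / (I_max + I_min) of an intensity profile."""
    i_max, i_min = max(profile), min(profile)
    return (i_max - i_min) / (i_max + i_min)

# A perfect target alternates black (0.0) and white (1.0), so M = 1.
target = [0.0, 1.0, 0.0, 1.0]
# The lens blurs the lines, so the imaged profile has less swing.
imaged = [0.25, 0.75, 0.25, 0.75]

# MTF at this spatial frequency = imaged modulation / target modulation
mtf = modulation(imaged) / modulation(target)
print(mtf)  # 0.5
```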
|
||||
@ -53,11 +53,11 @@ tags:
|
||||
|
||||
The image quality in the figure below degrades progressively:
|
||||
|
||||

|
||||

|
||||
|
||||
# How to read MTF curve
|
||||
|
||||

|
||||

|
||||
|
||||
The horizontal axis is the distance from the lens center; the vertical axis is the value of contrast or resolution.
|
||||
|
||||
@ -67,7 +67,7 @@ tags:
|
||||
|
||||
The blue line is obtained from the **thin-line** test and represents **resolution**.
|
||||
|
||||

|
||||

|
||||
|
||||
An ordinary lens's curves look like the ones below (red is contrast, blue is resolution): contrast and resolution are best at the center of the lens and worsen toward the edges.
|
||||
|
||||
@ -77,11 +77,11 @@ tags:
|
||||
|
||||
Waviness indicates field curvature; the bigger the waves, the more severe it is, though in practice it is usually not a big problem.
|
||||
|
||||

|
||||

|
||||
|
||||
The most common MTF curves look like this:
|
||||
|
||||

|
||||

|
||||
|
||||
1. The red line (10 lines/mm, the thick lines used for contrast testing above) decreases from the lens center to the edge, showing that the lens's contrast gradually drops toward the edge.
2. Resolution likewise decreases gradually from the center to the edge
|
||||
|
||||
@ -9,4 +9,4 @@ tags:
|
||||
|
||||
# Rollei
|
||||
|
||||
* [Rollei35](Photography/Cameras_Research/Pocket_film/Rollei_35.md)
|
||||
* [Rollei35](photography/Cameras_Research/Pocket_film/Rollei_35.md)
|
||||
@ -9,7 +9,7 @@ tags:
|
||||
|
||||
# Polaroid Background
|
||||
|
||||

|
||||

|
||||
|
||||
Polaroid is an American camera and film company founded in 1937 that was once the leader in the instant-camera market. It introduced its first instant camera in the 1950s and, over the following decades, released many models of instant cameras and film, becoming a widely used brand worldwide.
|
||||
|
||||
@ -21,5 +21,5 @@ Polaroid最著名的特点之一是它的“即时影像”技术,这种技术
|
||||
|
||||
# Polaroid Camera Review
|
||||
|
||||
* [Polaroid one600](Photography/Cameras_Research/Polaroid/Polaroid_one600.md)
|
||||
* [Polaroid Integral 600 Series](Photography/Cameras_Research/Polaroid/Polaroid_600.md)
|
||||
* [Polaroid one600](photography/Cameras_Research/Polaroid/Polaroid_one600.md)
|
||||
* [Polaroid Integral 600 Series](photography/Cameras_Research/Polaroid/Polaroid_600.md)
|
||||
|
||||
@ -8,7 +8,7 @@ tags:
|
||||
---
|
||||
|
||||
|
||||

|
||||

|
||||
|
||||
# Specifications
|
||||
|
||||
|
||||
@ -7,4 +7,4 @@ tags:
|
||||
---
|
||||
|
||||
|
||||
* [idea - reference image](Photography/MoodBoard/Sea_20230428/idea.md)
|
||||
* [idea - reference image](photography/MoodBoard/Sea_20230428/idea.md)
|
||||
|
||||
@ -6,40 +6,40 @@ tags:
|
||||
- idea
|
||||
---
|
||||
|
||||
# [Fujifilm_Blue_by_小红书_Philips谢骏](Photography/Aesthetic/Landscape/Sea/Fujifilm_Blue_by_小红书_Philips谢骏.md)
|
||||
# [Fujifilm_Blue_by_小红书_Philips谢骏](photography/Aesthetic/Landscape/Sea/Fujifilm_Blue_by_小红书_Philips谢骏.md)
|
||||
|
||||

|
||||

|
||||
|
||||
|
||||

|
||||

|
||||
|
||||
|
||||

|
||||

|
||||
|
||||
|
||||

|
||||

|
||||
|
||||
|
||||

|
||||

|
||||
|
||||
|
||||

|
||||

|
||||
|
||||
# [豊島_Instagram_shiifoncake](Photography/Aesthetic/Landscape/Sea/豊島_Instagram_shiifoncake.md)
|
||||
# [豊島_Instagram_shiifoncake](photography/Aesthetic/Landscape/Sea/豊島_Instagram_shiifoncake.md)
|
||||
|
||||

|
||||

|
||||
|
||||
.jpg)
|
||||
.jpg)
|
||||
|
||||

|
||||

|
||||
|
||||
.jpg)
|
||||
.jpg)
|
||||
|
||||

|
||||

|
||||
|
||||

|
||||

|
||||
|
||||
|
||||
# [寄り道の理由。- Instagram, photono_gen](https://www.instagram.com/p/CrVPFjZvvlr/)
|
||||
|
||||

|
||||

|
||||
@ -18,38 +18,38 @@ Also, here's my notes about learning photography
|
||||
|
||||
## About Basic Concepts:
|
||||
|
||||
* [Saturation](Photography/Basic/Saturation.md)
|
||||
* [Saturation](photography/Basic/Saturation.md)
|
||||
|
||||
## Appreciation of other works - about ***aesthetic***
|
||||
|
||||
* [👧Portrait](Photography/Aesthetic/Portrait/Portrait_MOC.md)
|
||||
* [🏔Landscape](Photography/Aesthetic/Landscape/Landscape_MOC.md)
|
||||
* [☝Style](Photography/Aesthetic/Style/Style_MOC.md)
|
||||
* [✨Polaroid](Photography/Aesthetic/Polaroid/Polaroid_aesthetic_MOC.md)
|
||||
* [👧Portrait](photography/Aesthetic/Portrait/Portrait_MOC.md)
|
||||
* [🏔Landscape](photography/Aesthetic/Landscape/Landscape_MOC.md)
|
||||
* [☝Style](photography/Aesthetic/Style/Style_MOC.md)
|
||||
* [✨Polaroid](photography/Aesthetic/Polaroid/Polaroid_aesthetic_MOC.md)
|
||||
|
||||
## Camera Research
|
||||
|
||||
* [✨Polaroid](Photography/Cameras_Research/Polaroid/Polaroid.md)
|
||||
* [📷Lens Structure](Photography/Cameras_Research/Lens_Structure/Lens_Structure_MOC.md)
|
||||
* [📸Pocket film camera](Photography/Cameras_Research/Pocket_film/Pocket_film_camera_MOC.md)
|
||||
* [✨Polaroid](photography/Cameras_Research/Polaroid/Polaroid.md)
|
||||
* [📷Lens Structure](photography/Cameras_Research/Lens_Structure/Lens_Structure_MOC.md)
|
||||
* [📸Pocket film camera](photography/Cameras_Research/Pocket_film/Pocket_film_camera_MOC.md)
|
||||
|
||||
## Skills I learned
|
||||
|
||||
* [How to measure light using Polaroid?](Photography/Skills/Polaroid_light.md)
|
||||
* [How to use Moodboard](Photography/Skills/Moodboard.md)
|
||||
* [How to show your Polaroid Picture](Photography/Aesthetic/Polaroid/Polaroid_showcase.md)
|
||||
* [How to measure light using Polaroid?](photography/Skills/Polaroid_light.md)
|
||||
* [How to use Moodboard](photography/Skills/Moodboard.md)
|
||||
* [How to show your Polaroid Picture](photography/Aesthetic/Polaroid/Polaroid_showcase.md)
|
||||
|
||||
## Photography story
|
||||
|
||||
* [夜爬蛤蟆峰拍Polaroid慢门 - 2023.04.14](Photography/Story/Rainy_evening_hiking_Polaroid.md)
|
||||
* [夜爬蛤蟆峰拍Polaroid慢门 - 2023.04.14](photography/Story/Rainy_evening_hiking_Polaroid.md)
|
||||
|
||||
## Mood Board
|
||||
|
||||
* [🌊Sea - 2023.04.28](Photography/MoodBoard/Sea_20230428/Sea_20230428.md)
|
||||
* [🌊Sea - 2023.04.28](photography/MoodBoard/Sea_20230428/Sea_20230428.md)
|
||||
|
||||
## Meme
|
||||
|
||||
* [Photography meme](Photography/Photography_meme/Photography_meme.md)
|
||||
* [Photography meme](photography/Photography_meme/Photography_meme.md)
|
||||
|
||||
|
||||
# Reference
|
||||
|
||||
@ -7,4 +7,4 @@ tags:
|
||||
- happy
|
||||
---
|
||||
|
||||

|
||||

|
||||
@ -6,4 +6,4 @@ tags:
|
||||
- skill
|
||||
---
|
||||
|
||||
* [宝丽来翻拍9宫格](Photography/Aesthetic/Polaroid/Polaroid_showcase.md)
|
||||
* [宝丽来翻拍9宫格](photography/Aesthetic/Polaroid/Polaroid_showcase.md)
|
||||
@ -15,7 +15,7 @@ tags:
|
||||
|
||||
At the foot of the hill, the light rain already gave a distinct Tyndall-effect look.
|
||||
|
||||

|
||||

|
||||
|
||||
The rain gradually made the rocks slippery, and climbing the rocks near the top of Hama Peak quickly becomes very dangerous; this is hard to describe, so perhaps ask a friend from Hangzhou. Zhou Tan fell just before the final stretch of the climb; fortunately his backpack absorbed almost all of the impact. It also made him realize how dangerous it is to come here in the rain; it has the character of an extreme sport.
|
||||
|
||||
@ -25,14 +25,14 @@ tags:
|
||||
|
||||
Shooting long exposures at the top of Hama Peak requires some tripod-setup and metering skill, and in the rain it becomes even harder.
|
||||
|
||||

|
||||

|
||||
|
||||

|
||||

|
||||
|
||||
|
||||
After metering and adjusting exposure with the Polaroid app, the night shots were planned at $f/22$, 30 s shutter speed, on i-type 640 ISO film. First, the results:
|
||||
|
||||

|
||||

|
||||
|
||||
The photo went from film to digital via the Polaroid app scanner on an iPhone 12 mini. The result is mediocre, but it is clear that the exposure is unsatisfactory, which I attribute to the following:
* Bad weather and high air humidity worsened the dispersion of light
|
||||
@ -41,11 +41,11 @@ tags:
|
||||
|
||||
Also, not knowing that night how to use the + button on the Now+ wasted a sheet of film. Here is how the + button works:
|
||||
|
||||

|
||||

|
||||
|
||||
Also, during one exposure that night the aperture was accidentally bumped to $f/33$, making the underexposure even worse; the effect looks roughly like this:
|
||||
|
||||

|
||||

|
||||
|
||||
Also note that Polaroid's maximum exposure time is 30 s. For a longer exposure you can hold the film back and expose a second time, but exposures beyond 30 s may look very poor.
|
||||
|
||||
@ -53,10 +53,10 @@ tags:
|
||||
|
||||
I shot two portraits with the same exposure settings, $f/22$, 30 s shutter speed, i-type 640 ISO film, with the Polaroid flash at its highest level:
|
||||
|
||||

|
||||

|
||||
|
||||
|
||||

|
||||

|
||||
|
||||
The first portrait is a bit sharper; in my view that is because of the reflection caused by the umbrella
|
||||
|
||||
@ -72,9 +72,9 @@ tags:
|
||||
|
||||
The driver had gone to Binjiang, so we could only wait at the foot of the hill, by the Zhong'er noodle shop on Baochu Road. Zhou Tan happened to be still hungry, so by coincidence we also got a bowl of banchuan noodles there, a fairly typical Hangzhou dish.
|
||||
|
||||

|
||||

|
||||
|
||||
|
||||
# Route
|
||||
|
||||

|
||||

|
||||
@ -18,7 +18,7 @@ $$
|
||||
* $X_L$ = inductive reactance
|
||||
* $X_C$ = capacitive reactance
|
||||
|
||||

|
||||

|
||||
|
||||
**Impedance** is the collective term for the opposition that resistance, inductance, and capacitance in a circuit present to alternating current. Impedance is a complex number: the real part is called **resistance** and the imaginary part is called **reactance**. The opposition a capacitor presents to AC is called **capacitive reactance**, the opposition an inductor presents is called **inductive reactance**, and together they are called **reactance**.
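A small numeric sketch of this decomposition for a series RLC branch (the component values are made up):

```python
import math

def series_rlc_impedance(r, l, c, f):
    """Z = R + j(X_L - X_C): resistance plus net reactance of a series RLC branch."""
    x_l = 2 * math.pi * f * l        # inductive reactance
    x_c = 1 / (2 * math.pi * f * c)  # capacitive reactance
    return complex(r, x_l - x_c)

z = series_rlc_impedance(r=50.0, l=1e-3, c=1e-6, f=1000.0)
print(z.real)  # 50.0 -> the resistance
print(z.imag)  # negative here: at 1 kHz the capacitive reactance dominates
```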
|
||||
|
||||
|
||||
@ -8,12 +8,12 @@ tags:
|
||||
|
||||
# Basic
|
||||
|
||||
* [Electric units](Physics/Electromagnetism/Basic/Electric_units.md)
|
||||
* [Electric units](physics/Electromagnetism/Basic/Electric_units.md)
|
||||
|
||||
## Advanced
|
||||
|
||||
* [Maxwell's equation](Physics/Electromagnetism/Maxwells_equation.md)
|
||||
* [Maxwell's equation](physics/Electromagnetism/Maxwells_equation.md)
|
||||
|
||||
# Circuit
|
||||
|
||||
* [Resonant circuit](Physics/Electromagnetism/Resonant_circuit.md)
|
||||
* [Resonant circuit](physics/Electromagnetism/Resonant_circuit.md)
|
||||
@ -32,11 +32,11 @@ Essentially a vector field is what you get if you associate each point in space
|
||||
> [!note]
|
||||
> If you were to draw the vectors to scale, the longer ones end up just cluttering the whole thing, so it's common to basically lie a little and artificially shorten ones that are too long. Maybe using **color to give some vague sense of length**.
|
||||
|
||||

|
||||

|
||||
|
||||
## Divergence
|
||||
|
||||

|
||||

|
||||
|
||||
The divergence of a vector field measures the field's ability to produce fluid at the point (x, y)
|
||||
|
||||
@ -44,26 +44,26 @@ Divergence $\cdot$ Vector filed是来衡量在(x, y)点你产生fluid的能力
|
||||
|
||||
At the sink points where fluid flows in, the divergence of the vector field is negative
|
||||
|
||||

|
||||

|
||||
|
||||
Likewise, if fluid can flow into a point slowly and flow out quickly, the divergence of the vector field at that point is also positive
|
||||
|
||||

|
||||

|
||||
|
||||
A vector field takes an input point to a multi-dimensional output: a direction with a magnitude. The divergence of the vector field, by contrast, depends on the behavior of the field in a small neighborhood around that point; its output is a single number measuring whether the point acts as a source or a sink
|
||||
|
||||

|
||||

|
||||
|
||||
> [!note]
|
||||
> For actual fluid flow: $\text{div} F = 0$ everywhere
|
||||
|
||||
## Curl
|
||||
|
||||

|
||||

|
||||
|
||||
Curl measures how much the fluid is rotated at a point; by the standard convention, counterclockwise rotation corresponds to positive curl and clockwise rotation to negative curl.
|
||||
|
||||

|
||||

|
||||
|
||||
The curl at this point in the figure above is also nonzero, because the fluid moves faster on top and slower below, resulting in a clockwise influence
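Both quantities can be sanity-checked numerically with finite differences; a minimal 2D sketch (the two fields below are made-up examples):

```python
def div_and_curl(fx, fy, x, y, h=1e-6):
    """2D finite-difference estimates: div F = dFx/dx + dFy/dy, curl F = dFy/dx - dFx/dy."""
    dfx_dx = (fx(x + h, y) - fx(x - h, y)) / (2 * h)
    dfx_dy = (fx(x, y + h) - fx(x, y - h)) / (2 * h)
    dfy_dx = (fy(x + h, y) - fy(x - h, y)) / (2 * h)
    dfy_dy = (fy(x, y + h) - fy(x, y - h)) / (2 * h)
    return dfx_dx + dfy_dy, dfy_dx - dfx_dy

# Radial source field F = (x, y): everything flows outward, no rotation.
print(div_and_curl(lambda x, y: x, lambda x, y: y, 1.0, 2.0))   # divergence ~2, curl ~0
# Rotating field F = (-y, x): pure rotation, no source or sink.
print(div_and_curl(lambda x, y: -y, lambda x, y: x, 1.0, 2.0))  # divergence ~0, curl ~2
```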
|
||||
|
||||
@ -94,23 +94,23 @@ F_y
|
||||
= \frac{\partial F_y}{\partial x} - \frac{\partial F_x}{\partial y}
|
||||
$$
|
||||
|
||||

|
||||

|
||||
|
||||
### Detail Explanation
|
||||
|
||||

|
||||

|
||||
|
||||

|
||||

|
||||
|
||||
Taking a tiny differential step from $(x_0, y_0)$ gives a new vector, which differs from the original vector by some amount.
|
||||
|
||||

|
||||

|
||||
|
||||
$\text{div} F(x_0, y_0)$ corresponds to the average of Step $\cdot$ Difference over all directions ($360^\circ$)
|
||||
|
||||
Picture a source emitting vectors in all directions: its Step $\cdot$ Difference is naturally positive
|
||||
|
||||

|
||||

|
||||
|
||||
Similarly, it is not hard to see that $\text{curl} F(x_0, y_0)$ corresponds to Step $\times$ Difference
|
||||
|
||||
@ -125,7 +125,7 @@ $$
|
||||
$$
|
||||
|
||||
|
||||

|
||||

|
||||
|
||||
* $\rho$ is the charge density
* $\epsilon_0$ is epsilon naught, the permittivity of free space, which determines the strength of the electric field in free space
|
||||
@ -146,7 +146,7 @@ $$
|
||||
\text{div} B = 0
|
||||
$$
|
||||
|
||||

|
||||

|
||||
|
||||
The divergence of the magnetic field is zero everywhere, meaning that the magnetic field, viewed as a fluid, is incompressible, with no sources and no sinks, just like water. Another interpretation is that magnetic monopoles do not exist
|
||||
|
||||
@ -156,16 +156,16 @@ $$
|
||||
\nabla \times E = - \frac{1}{c} \frac{\partial B}{\partial t}
|
||||
$$
|
||||
|
||||

|
||||

|
||||
|
||||

|
||||

|
||||
## Ampère's circuital law (with Maxwell's addition)
|
||||
|
||||
$$
|
||||
\nabla \times B = \frac{1}{c} (4\pi J + \frac{\partial E}{\partial t})
|
||||
$$
|
||||
|
||||

|
||||

|
||||
|
||||
|
||||
# Maxwells equation explain EM wave
|
||||
@ -184,7 +184,7 @@ $$
|
||||
|
||||
Electromagnetic waves lie in a band that cannot be observed by the naked eye; only after Maxwell's death did Hertz experimentally confirm their existence.
|
||||
|
||||

|
||||

|
||||
|
||||
# Reference
|
||||
|
||||
|
||||
@ -16,7 +16,7 @@ In physics and engineering, the quality factor or Q factor is a **dimensionless*
|
||||
|
||||
|
||||
|
||||
<font size=1>Fig. A damped oscillation. A low Q factor – about 5 here – means the oscillation dies out rapidly.</font>
|
||||
<font size=1>Fig. A damped oscillation. A low Q factor – about 5 here – means the oscillation dies out rapidly.</font>
|
||||
|
||||
|
||||
At resonance, an oscillator with a higher Q factor has a **larger amplitude** near the resonant frequency, but it resonates over a **narrower range of frequencies**; this range is called the bandwidth.
|
||||
@ -31,7 +31,7 @@ Q因子较高的振子在共振时,在共振频率附近的**振幅较大**,
|
||||
|
||||
# Definition
|
||||
|
||||

|
||||

|
||||
|
||||
<font size=1>Fig. The bandwidth $\Delta f$ of a damped harmonic oscillator, shown on a frequency-energy plot. The Q factor of a damped oscillator (or filter) is $f_{0}/\Delta f$. The higher the Q factor, the taller and narrower the peak</font>
|
||||
|
||||
|
||||
@ -19,7 +19,7 @@ tags:
|
||||
|
||||
## *Resonant Frequency*
|
||||
|
||||
Resonance occurs when the [reactances](Physics/Electromagnetism/Basic/Electric_units.md#Electrical%20impedance) of the capacitor and the inductor are equal
Resonance occurs when the [reactances](physics/Electromagnetism/Basic/Electric_units.md#Electrical%20impedance) of the capacitor and the inductor are equal
|
||||
|
||||
$$
|
||||
|X_C| = |\frac{1}{j2\pi fC}| = |X_L| = |j2\pi fL|
|
||||
@ -38,7 +38,7 @@ $$
|
||||
|
||||
* The impedance is at its minimum and purely resistive: $Z = R + jX_L - jX_C = R$
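The resonance condition above implies $f_0 = \frac{1}{2\pi\sqrt{LC}}$; a quick numeric check (the component values are made up):

```python
import math

def resonant_frequency(l, c):
    """f0 = 1 / (2*pi*sqrt(L*C)), the frequency where |X_L| = |X_C|."""
    return 1 / (2 * math.pi * math.sqrt(l * c))

l, c = 1e-3, 1e-6                 # 1 mH, 1 uF
f0 = resonant_frequency(l, c)
x_l = 2 * math.pi * f0 * l        # inductive reactance at f0
x_c = 1 / (2 * math.pi * f0 * c)  # capacitive reactance at f0
print(round(f0))       # ~5033 Hz
print(abs(x_l - x_c))  # ~0: the reactances cancel, leaving Z = R
```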
|
||||
|
||||
## **品质因子** ([*Q factor*](Physics/Electromagnetism/Q_factor.md))
|
||||
## **品质因子** ([*Q factor*](physics/Electromagnetism/Q_factor.md))
|
||||
|
||||
* The ratio of the reactive power produced by the inductor or capacitor at resonance to the average power consumed by the resistor is called the quality factor at resonance.
|
||||
|
||||
|
||||
@ -14,7 +14,7 @@ tags:
|
||||
|
||||
To explain how aberrations blur an image, first: what is the circle of confusion? When light from a point on the target reaches the lens and converges exactly on the sensor, it is sharp. Otherwise, if it converges before or after the sensor, the light is distributed more widely across the sensor. This can be seen in Fig. 1, where a point source converges on the sensor, but as the sensor position changes, so does the amount of light spread along the sensor.
|
||||
|
||||

|
||||

|
||||
|
||||
The more spread out the light, the less in focus the image. Unless the aperture is very small, targets at very different distances within the image will usually leave the background or foreground out of focus. This is because light converging for the foreground converges at a different point than light from more distant targets in the background.
|
||||
|
||||
@ -25,7 +25,7 @@ tags:
|
||||
|
||||
Coma, also called comatic aberration, is named for the comet-like tail shape of its distribution.
|
||||
|
||||

|
||||

|
||||
|
||||
It is a defect inherent to some lenses or caused by the optical design, which distorts off-axis point sources such as stars. Specifically, coma is defined as a variation in magnification across the entrance pupil. In refractive or diffractive optical systems, especially in images over a wide spectral range, coma is a function of wavelength.
|
||||
|
||||
@ -35,7 +35,7 @@ tags:
|
||||
|
||||
This can be seen in Fig. 3, where the two focal points are indicated by the red horizontal plane and the blue vertical plane. The point of best sharpness in the image lies between these two points, where neither plane's circle of confusion is too wide.
|
||||
|
||||

|
||||

|
||||
|
||||
When the optics are misaligned, astigmatism distorts the sides and edges of the image. It is usually described as a lack of sharpness when viewing lines in the image.
|
||||
|
||||
@ -47,7 +47,7 @@ tags:
|
||||
|
||||
Field curvature is the result of the image plane becoming non-flat due to multiple focal points.
|
||||
|
||||

|
||||

|
||||
|
||||
Camera lenses have largely corrected for this, but some field curvature can be found on many lenses. Some sensor manufacturers are actually working on curved sensors that can correct for the curved focal region. Such a design would let the sensor correct the aberration without requiring expensive lens designs produced to that precision; with this type of sensor, cheaper lenses can produce high-quality results. A real example of this can be seen in the Kepler space observatory, where a curved sensor array corrects for the telescope's large spherical optics.
|
||||
|
||||
@ -59,7 +59,7 @@ tags:
|
||||
|
||||
In an image with barrel distortion, the edges and sides bend away from the center. Visually this looks like a bulge in the image, because it captures the appearance of a curved field of view (FoV). For example, a lower-focal-length (wide-angle) lens used high up on a tall building captures a wider FoV. As Fig. 5 shows, the effect is most exaggerated with a fisheye lens, which produces a very distorted, wide FoV; in that image, grid lines help illustrate how the distortion stretches the image outward near the sides and edges.
|
||||
|
||||

|
||||

|
||||
|
||||
|
||||
### Pincushion distortion (枕型畸变)
|
||||
@ -68,7 +68,7 @@ tags:
|
||||
|
||||
This form of aberration is most common in telephoto lenses with long focal lengths.
|
||||
|
||||

|
||||

|
||||
|
||||
### Mustache distortion
|
||||
|
||||
@ -81,13 +81,13 @@ tags:
|
||||
|
||||
The color of light corresponds to a particular wavelength. Because of refraction, a color image has multiple wavelengths entering the lens and focusing at different points. Longitudinal or axial chromatic aberration is caused by different wavelengths focusing at different points along the optical axis: the shorter the wavelength, the closer its focus is to the lens, and the longer the wavelength, the farther from the lens, as shown in Fig. 8. With a smaller aperture, the incoming light may still focus at different points, but the width (diameter) of the circle of confusion will be much smaller, producing far less dramatic blur.
|
||||
|
||||

|
||||

|
||||
|
||||
### Transverse / lateral aberration
|
||||
|
||||
Off-axis light that spreads different wavelengths along the image plane causes transverse or lateral chromatic aberration. It produces colored fringing at subject edges in the image, and it is harder to correct than longitudinal chromatic aberration.
|
||||
|
||||

|
||||

|
||||
|
||||
It can be corrected with an achromatic doublet that introduces different refractive indices. By bringing the two ends of the visible spectrum to a single focus, the color fringing is eliminated. For both lateral and longitudinal chromatic aberration, reducing the aperture size also helps. In addition, it can be beneficial not to image targets in high-contrast settings (i.e. images with a very bright background). In microscopy, apochromatic (APO) lenses may be used instead of achromats; they use three lens elements to correct all wavelengths of incoming light. When color matters most, making sure chromatic aberration is mitigated will give the best results.
|
||||
|
||||
|
||||
@ -7,4 +7,4 @@ tags:
|
||||
|
||||
# Electromagnetism
|
||||
|
||||
* [Electromagnetism MOC](Physics/Electromagnetism/Electromagnetism_MOC.md)
|
||||
* [Electromagnetism MOC](physics/Electromagnetism/Electromagnetism_MOC.md)
|
||||
@ -33,7 +33,7 @@ $$
|
||||
|
||||
## Example
|
||||
|
||||

|
||||

|
||||
|
||||
where $v_s = 0.7c$; the wavefronts begin to bunch up to the right of (in front of) the source and spread farther apart to its left (behind it).
|
||||
|
||||
|
||||
@ -3,7 +3,7 @@
|
||||
|
||||
# Background
|
||||
|
||||

|
||||

|
||||
|
||||
# Test results
|
||||
|
||||
@ -11,11 +11,11 @@
|
||||
|
||||
With no reflection within 30 cm ahead, which is beyond this radar's ranging limit and can be approximated as no reflection at any distance, the voltage at the receiving end is:
|
||||
|
||||

|
||||

|
||||
|
||||
Data collected with the previous antenna:
|
||||
|
||||

|
||||

|
||||
|
||||
There are two problems:
|
||||
|
||||
@ -38,11 +38,11 @@
|
||||
|
||||
Data collected with the new antenna:
|
||||
|
||||

|
||||

|
||||
|
||||
Signal collected with the old antenna:
|
||||
|
||||

|
||||

|
||||
|
||||
The problem is:
|
||||
|
||||
|
||||
@ -17,7 +17,7 @@ tags:
|
||||
|
||||
* [Hardware](computer_sci/Hardware/Hardware_MOC.md)
|
||||
|
||||
* [Physics](Physics/Physics_MOC.md)
|
||||
* [Physics](physics/Physics_MOC.md)
|
||||
|
||||
* [Signal Processing](signal_processing/signal_processing_MOC.md)
|
||||
|
||||
@ -25,7 +25,7 @@ tags:
|
||||
|
||||
* [About coding language design detail](computer_sci/coding_knowledge/coding_lang_MOC.md)
|
||||
|
||||
* [Math](Math/MOC.md)
|
||||
* [Math](math/MOC.md)
|
||||
|
||||
* [Computational Geometry](computer_sci/computational_geometry/MOC.md)
|
||||
|
||||
@ -41,7 +41,7 @@ tags:
|
||||
|
||||
🛶 Also, he learns some things related to his hobbies:
|
||||
|
||||
* [📷 Photography](Photography/Photography_MOC.md)
|
||||
* [📷 Photography](photography/Photography_MOC.md)
|
||||
|
||||
* [📮文学](文学/文学_MOC.md)
|
||||
|
||||
|
||||
@ -12,7 +12,7 @@ Quantile loss用于衡量预测分布和目标分布之间的差异,特别适
|
||||
|
||||
# What is quantile
|
||||
|
||||
[Quantile](Math/Statistics/Basic/Quantile.md)
|
||||
[Quantile](math/Statistics/Basic/Quantile.md)
|
||||
|
||||
# What is a prediction interval
|
||||
|
||||
|
||||
@ -10,7 +10,7 @@ XGBoost is an open-source software library that implements optimized distributed
|
||||
|
||||
# What you need to know first
|
||||
|
||||
* [🚧🚧AdaBoost](computer_sci/deep_learning_and_machine_learning/deep_learning/AdaBoost.md)
|
||||
* [🚧🚧AdaBoost](computer_sci/deep_learning_and_machine_learning/deep_learning/adaBoost.md)
|
||||
|
||||
# What is XGBoost
|
||||
|
||||
|
||||
@ -8,21 +8,21 @@ tags:
|
||||
|
||||
# Attention is all you need
|
||||
|
||||
* [[computer_sci/deep_learning_and_machine_learning/deep_learning/⭐Attention|Attention Blocker]]
|
||||
* [[computer_sci/deep_learning_and_machine_learning/deep_learning/Transformer|Transformer]]
|
||||
* [[computer_sci/deep_learning_and_machine_learning/deep_learning/attention|Attention Blocker]]
|
||||
* [[computer_sci/deep_learning_and_machine_learning/deep_learning/transformer|transformer]]
|
||||
|
||||
|
||||
# Tree-like architecture
|
||||
|
||||
* [Decision Tree](computer_sci/deep_learning_and_machine_learning/deep_learning/Decision_Tree.md)
|
||||
* [Random Forest](computer_sci/deep_learning_and_machine_learning/deep_learning/Random_Forest.md)
|
||||
* [Deep Neural Decision Forests](computer_sci/deep_learning_and_machine_learning/deep_learning/Deep_Neural_Decision_Forests.md)
|
||||
* [Decision Tree](computer_sci/deep_learning_and_machine_learning/deep_learning/decision_tree.md)
|
||||
* [Random Forest](computer_sci/deep_learning_and_machine_learning/deep_learning/random_forest.md)
|
||||
* [Deep Neural Decision Forests](computer_sci/deep_learning_and_machine_learning/deep_learning/deep_neural_decision_forests.md)
|
||||
* [XGBoost](computer_sci/deep_learning_and_machine_learning/deep_learning/XGBoost.md)
|
||||
|
||||
|
||||
# Ensemble Learning
|
||||
|
||||
* [AdaBoost](computer_sci/deep_learning_and_machine_learning/deep_learning/AdaBoost.md)
|
||||
* [adaBoost](computer_sci/deep_learning_and_machine_learning/deep_learning/adaBoost.md)
|
||||
* [XGBoost](computer_sci/deep_learning_and_machine_learning/deep_learning/XGBoost.md)
|
||||
|
||||
|
||||
|
||||
|
After Width: | Height: | Size: 119 KiB |
|
After Width: | Height: | Size: 119 KiB |
|
After Width: | Height: | Size: 77 KiB |
|
After Width: | Height: | Size: 66 KiB |
|
After Width: | Height: | Size: 68 KiB |
|
After Width: | Height: | Size: 55 KiB |
|
After Width: | Height: | Size: 55 KiB |
|
After Width: | Height: | Size: 52 KiB |
|
After Width: | Height: | Size: 25 KiB |
|
After Width: | Height: | Size: 43 KiB |
@ -0,0 +1,8 @@
|
||||
---
|
||||
title: Model Evaluation - MOC
|
||||
tags:
|
||||
- deep-learning
|
||||
- evaluation
|
||||
---
|
||||
|
||||
* [Model Evaluation in Time Series Forecasting](computer_sci/deep_learning_and_machine_learning/Evaluation/time_series_forecasting.md)
|
||||
@ -0,0 +1,121 @@
|
||||
---
|
||||
title: Model Evaluation in Time Series Forecasting
|
||||
tags:
|
||||
- deep-learning
|
||||
- evaluation
|
||||
- time-series-dealing
|
||||
---
|
||||
|
||||

|
||||
|
||||
# Some famous time series scoring technics
|
||||
|
||||
1. **MAE, RMSE and AIC**
|
||||
2. **Mean Forecast Accuracy**
|
||||
3. **Warning: The time series model EVALUATION TRAP!**
|
||||
4. **RdR Score Benchmark**
|
||||
|
||||
## MAE, RMSE, AIC
|
||||
|
||||
MAE means **Mean Absolute Error (MAE)** and RMSE means **Root Mean Squared Error (RMSE)**.
|
||||
|
||||
These are two well-known metrics for measuring the accuracy of continuous variables. MAE has long been used in earlier articles; an observation from 2016 already found RMSE, and other versions of R-squared, gradually coming into use
|
||||
|
||||
*We need to understand when it is better to use which metric*
|
||||
|
||||
### MAE
|
||||
|
||||
$$
|
||||
\text{MAE} = \frac{1}{n}\sum_{j=1}^n |y_j - \hat{y}_j|
|
||||
$$
|
||||
MAE's distinguishing feature is that all individual differences have equal weight
|
||||
|
||||
If the absolute value is removed, MAE becomes the **Mean Bias Error (MBE)**; when using MBE, beware that positive and negative biases cancel each other out
|
||||
|
||||
### RMSE
|
||||
|
||||
$$
|
||||
\text{RMSE} = \sqrt{\frac{1}{n} \sum_{j=1}^n (y_j - \hat{y}_j)^2}
|
||||
$$
|
||||
|
||||
The root mean squared error (RMSE) is a quadratic scoring rule that also measures the average magnitude of the error. It is the square root of the average of the squared differences between predictions and actual observations.
|
||||
|
||||
### AIC
|
||||
|
||||
$$
|
||||
\text{AIC} = 2k - 2\ln{(\hat{L})}
|
||||
$$
|
||||
$k$ is the number of estimated model parameters, and $\hat{L}$ is the maximized value of the model's likelihood function
|
||||
|
||||
The **Akaike information criterion** (AIC) is a metric that helps compare models, because it considers both how well the model fits the data and how complex the model is.
|
||||
|
||||
AIC measures the loss of information and **penalizes model complexity**. It is the *negative log-likelihood penalized by the number of parameters*. The main idea of AIC is that fewer model parameters is better. **AIC lets you test how well a model fits the data without overfitting the dataset**
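A minimal sketch of all three metrics on a toy forecast (the numbers are made up; the AIC helper just applies the formula above, with the log-likelihood supplied by whatever model is being compared):

```python
import math

def mae(y, y_hat):
    """Mean Absolute Error: every error has equal weight."""
    return sum(abs(a - p) for a, p in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    """Root Mean Squared Error: large errors are weighted more heavily."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y, y_hat)) / len(y))

def aic(k, log_likelihood):
    """Akaike information criterion: fit quality penalized by parameter count."""
    return 2 * k - 2 * log_likelihood

y     = [10.0, 12.0, 14.0, 16.0]
y_hat = [11.0, 12.0, 13.0, 20.0]        # one large error of 4
print(mae(y, y_hat))                    # 1.5
print(rmse(y, y_hat))                   # ~2.12: the single large error dominates
print(aic(k=3, log_likelihood=-120.0))  # 246.0
```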
|
||||
|
||||
### Comparison
|
||||
|
||||
#### Similarities between MAE and RMSE
|
||||
|
||||
The mean absolute error (MAE) and the root mean squared error (RMSE) both express the average model prediction error in the units of the variable of interest. Both metrics can range from 0 to ∞ and are indifferent to the direction of the error. They are negatively oriented scores, meaning lower values are better.
|
||||
|
||||
#### Differences between MAE and RMSE
|
||||
|
||||
*Because the errors are squared before being averaged, RMSE gives relatively high weight to large errors*. This means RMSE should be more useful when large errors are particularly undesirable, whereas in MAE's average those large errors are diluted.
|
||||
|
||||

|
||||
|
||||
For AIC, lower is better, but there is no perfect score; it can only be used to compare the performance of different models on the same dataset
|
||||
|
||||
## Mean Forecast Accuracy
|
||||
|
||||

|
||||
|
||||
Compute the Forecast Accuracy at each point, then take the average to get the Mean Forecast Accuracy
|
||||
|
||||
The major flaw of Mean Forecast Accuracy is that large deviations have an enormous negative impact, e.g. $1 - \frac{|\hat{y}_j - y_j|}{y_j} = 1 - \frac{250-25}{25} = -800\%$
|
||||
|
||||
The fix is to clip the minimum Forecast Accuracy at 0%; the median can also be used instead of the mean.
|
||||
|
||||
In general, **you should use the median instead of the mean when your error distribution is skewed**. In some cases Mean Forecast Accuracy may also be meaningless. If you remember your statistics, the coefficient of variation (**CV**) is the ratio of the standard deviation to the mean ($\text{CV} = \text{Standard Deviation}/\text{Mean} \times 100$). A large CV value means large variability, which also means a greater degree of dispersion around the mean. **For example, we can treat anything with a CV above 0.7 as highly variable and not truly predictable. It can also mean your forecasting model's predictive power is very unstable!**
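A sketch of the clipped, median-based variant described above (the data is made up):

```python
import statistics

def forecast_accuracy(y, y_hat):
    """Per-point accuracy 1 - |error| / actual, clipped below at 0%."""
    return [max(0.0, 1 - abs(p - a) / a) for a, p in zip(y, y_hat)]

y     = [25.0, 100.0, 100.0]
y_hat = [250.0, 90.0, 110.0]   # the first point is a wild outlier
acc = forecast_accuracy(y, y_hat)
print(acc)                     # [0.0, 0.9, 0.9]: the outlier is clipped instead of -800%
print(statistics.median(acc))  # 0.9, more robust than the mean here
```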
|
||||
|
||||
## RdR Score Benchmark (an experimental metric; the blogger notes it has not appeared in research papers)
|
||||
|
||||
RdR metric stands for:
|
||||
* *R*: **Naïve Random Walk**
|
||||
* *d*: **Dynamic Time Warping**
|
||||
* *R*: **Root Mean Squared Error**
|
||||
|
||||
### DTW to deal with shape similarity
|
||||
|
||||

|
||||
|
||||
Metrics like RMSE and MAE all ignore an important criterion: **shape similarity**
|
||||
|
||||
The RdR Score Benchmark uses [**Dynamic Time Warping (DTW)**](computer_sci/deep_learning_and_machine_learning/Trick/DTW.md) as its shape-similarity metric
|
||||
|
||||

|
||||
Euclidean distance can be a bad choice between time series, because of warping along the time axis.
|
||||
|
||||
* DTW: finds the optimal (minimum-distance) warping path between two time series by synchronizing/aligning the signals along the time axis
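A minimal DTW sketch, using the classic dynamic-programming formulation (the series are made up):

```python
def dtw_distance(a, b):
    """Minimum cumulative |a_i - b_j| cost over all monotone alignments of a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed alignment moves.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

# The second series is the first one shifted in time; DTW aligns the shapes.
s1 = [0, 1, 2, 1, 0]
s2 = [0, 0, 1, 2, 1]
print(dtw_distance(s1, s2))  # 1.0, while the point-by-point absolute distance is 4
```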
|
||||
|
||||
### RdR score means
|
||||
|
||||

|
||||
|
||||

|
||||
|
||||
The *RdR score* is computed from RMSE and the DTW distance, and measures how much better your model is than a Random Walk (*the Random Walk's RdR score = 0*)
|
||||
|
||||
### RdR calculation details
|
||||
|
||||
The RdR score can be computed by plotting RMSE vs. DTW, as shown below:
|
||||
|
||||

|
||||
|
||||
|
||||
The RdR score is computed from the area of the rectangle (the article does not fully describe the computation; it appears to be in the [github code](https://github.com/CoteDave/blog/tree/master/RdR%20score), though this is not certain)
|
||||
|
||||
# Reference
|
||||
|
||||
* M.Sc, Dave Cote. “RdR Score Metric for Evaluating Time Series Forecasting Models.” _Medium_, 8 Feb. 2022, https://medium.com/@dave.cote.msc/rdr-score-metric-for-evaluating-time-series-forecasting-models-1c23f92f80e7.
|
||||
* JJ. “MAE and RMSE — Which Metric Is Better?” _Human in a Machine World_, 23 Mar. 2016, https://medium.com/human-in-a-machine-world/mae-and-rmse-which-metric-is-better-e60ac3bde13d.
|
||||
* _Accelerating Dynamic Time Warping Subsequence Search with GPU_. https://www.slideshare.net/DavideNardone/accelerating-dynamic-time-warping-subsequence-search-with-gpu. Accessed 29 May 2023.
|
||||
@ -0,0 +1,77 @@
|
||||
---
|
||||
title: DeepAR - Time Series Forcasting
|
||||
tags:
|
||||
- deep-learning
|
||||
- model
|
||||
- time-series-dealing
|
||||
---
|
||||
|
||||
DeepAR, an autoregressive recurrent network developed by Amazon, is the first model that could natively work on multiple time series. It is a milestone in the time-series community.
|
||||
|
||||
# What is DeepAR
|
||||
|
||||
> [!quote]
|
||||
> DeepAR is the first successful model to combine Deep Learning with traditional Probabilistic Forecasting.
|
||||
|
||||
* **Multiple time-series support**
|
||||
* **Extra covariates**: *DeepAR* allows extra features (covariates). This was very important to me when learning *DeepAR*, because in my task I have a corresponding feature for each time series.
|
||||
* **Probabilistic output**: Instead of making a single prediction, the model leverages [**quantile loss**](computer_sci/deep_learning_and_machine_learning/Trick/quantile_loss.md) to output prediction intervals.
|
||||
* **“Cold” forecasting:** By learning from thousands of time-series that potentially share a few similarities, _DeepAR_ can provide forecasts for time-series that have little or no history at all.
|
||||
|
||||
# Block used in DeepAR
|
||||
|
||||
* [LSTM](computer_sci/deep_learning_and_machine_learning/deep_learning/LSTM.md)
|
||||
|
||||
# *DeepAR* Architecture
|
||||
|
||||
Rather than computing predictions directly with LSTMs, the DeepAR model estimates the parameters of a Gaussian likelihood function, $\theta=(\mu,\sigma)$: the mean and standard deviation of that function.
|
||||
|
||||
## Training Step-by-Step
|
||||
|
||||

|
||||
|
||||
Suppose we are currently at time t of time series $i$:
|
||||
|
||||
1. The LSTM cell takes as input the covariates $x_{i,t}$ (the value of $x_i$ at time t), the previous target value $z_{i,t-1}$, and the previous hidden state $h_{i,t-1}$
2. The LSTM then outputs the current hidden state $h_{i,t}$, which feeds into the next step
3. The parameters of the Gaussian likelihood function, $\mu$ and $\sigma$, are computed indirectly from $h_{i,t}$; the details follow below
|
||||
|
||||
> [!quote]
|
||||
> In other words, the model's goal is to find the best $\mu$ and $\sigma$ for building the Gaussian distribution, so that the prediction gets closer to $z_{i,t}$; and because *DeepAR* trains on and predicts a single data point at a time, it is also called an autoregressive model
|
||||
|
||||
|
||||
## Inference Step-by-Step
|
||||
|
||||
|
||||

|
||||
|
||||
|
||||
When using the model for prediction, the only change is that the predicted value $\hat{z}$ replaces the true value $z$; $\hat{z}_{i,t}$ is sampled from the Gaussian distribution our model has learned. But the parameters $\mu$ and $\sigma$ of that Gaussian are not learned directly by the model, so how does *DeepAR* do this?
|
||||
|
||||
# Gaussian Likelihood
|
||||
|
||||
$$
|
||||
\ell_G(z\mid\mu,\sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(z-\mu)^2}{2\sigma^2}\right)
|
||||
$$
|
||||
|
||||
The task of estimating a Gaussian distribution is usually converted into maximizing the Gaussian log-likelihood function, i.e. **MLE** (maximum likelihood estimation).
|
||||
**Gaussian log-likelihood function**:
|
||||
|
||||
$$
|
||||
\mathcal{L} = \sum_{i=1}^{N}\sum_{t=t_0}^{T} \log{\ell(z_{i,t}|\theta(h_{i,t}))}
|
||||
$$
|
||||
|
||||
|
||||
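The log-likelihood above translates directly into code; here is a minimal NumPy sketch (not DeepAR's actual training loop):

```python
import numpy as np

def gaussian_log_likelihood(z, mu, sigma):
    """Sum of log N(z | mu, sigma^2) over all observations; the training
    objective is to maximize this (or minimize its negative)."""
    z, mu, sigma = (np.asarray(a, dtype=float) for a in (z, mu, sigma))
    return float(np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                        - (z - mu) ** 2 / (2 * sigma**2)))
```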
# Parameter estimation in *DeepAR*
|
||||
|
||||
|
||||
In statistics, a Gaussian distribution is usually estimated with the MLE formulas, but *DeepAR* does not do this; instead, it uses two dense layers for the estimation, as shown below:
|
||||
|
||||

|
||||
|
||||
The reason for estimating the Gaussian distribution with dense layers is that they can be trained end-to-end with backpropagation.
|
||||
|
||||
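A minimal sketch of the two projection heads (hypothetical shapes and weights; the real model trains these jointly with the LSTM by backpropagation):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = 16

# Two dense layers on top of the LSTM hidden state h_t
w_mu, b_mu = rng.normal(size=(hidden, 1)), np.zeros(1)
w_sigma, b_sigma = rng.normal(size=(hidden, 1)), np.zeros(1)

def softplus(x):
    return np.log1p(np.exp(x))  # maps any real value to a positive one

def gaussian_heads(h_t):
    mu = h_t @ w_mu + b_mu                     # mean head: plain linear layer
    sigma = softplus(h_t @ w_sigma + b_sigma)  # std head: softplus keeps sigma > 0
    return mu, sigma

mu, sigma = gaussian_heads(rng.normal(size=(1, hidden)))
```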
|
||||
# Reference
|
||||
|
||||
* [https://towardsdatascience.com/deepar-mastering-time-series-forecasting-with-deep-learning-bc717771ce85](https://towardsdatascience.com/deepar-mastering-time-series-forecasting-with-deep-learning-bc717771ce85)
|
||||
@ -0,0 +1,11 @@
|
||||
---
|
||||
title: Famous Model MOC
|
||||
tags:
|
||||
- deep-learning
|
||||
- MOC
|
||||
---
|
||||
|
||||
# Time-series
|
||||
|
||||
* [DeepAR](computer_sci/deep_learning_and_machine_learning/Famous_Model/DeepAR.md)
|
||||
|
||||
@ -0,0 +1,8 @@
|
||||
---
|
||||
title: Temporal Fusion Transformer
|
||||
tags:
|
||||
- deep-learning
|
||||
- model
|
||||
- time-series-dealing
|
||||
---
|
||||
|
||||
|
After Width: | Height: | Size: 44 KiB |
|
After Width: | Height: | Size: 44 KiB |
|
After Width: | Height: | Size: 55 KiB |
|
After Width: | Height: | Size: 65 KiB |
@ -0,0 +1,25 @@
|
||||
---
|
||||
title: Large Language Model(LLM) - MOC
|
||||
tags:
|
||||
- deep-learning
|
||||
- LLM
|
||||
- NLP
|
||||
---
|
||||
|
||||
# Training
|
||||
|
||||
* [Training Tech Outline](computer_sci/deep_learning_and_machine_learning/LLM/train/steps.md)
|
||||
* [⭐⭐⭐Train LLM from scratch](computer_sci/deep_learning_and_machine_learning/LLM/train/train_LLM.md)
|
||||
* [⭐⭐⭐Detailed explanation of RLHF technology](computer_sci/deep_learning_and_machine_learning/LLM/train/RLHF.md)
|
||||
* [How to do use fine tune tech to create your chatbot](computer_sci/deep_learning_and_machine_learning/LLM/train/finr_tune/how_to_fine_tune.md)
|
||||
* [Learn finetune by Stanford Alpaca](computer_sci/deep_learning_and_machine_learning/LLM/train/finr_tune/learn_finetune_byStanfordAlpaca.md)
|
||||
|
||||
# Metrics
|
||||
|
||||
How do we evaluate an LLM's performance?
|
||||
|
||||
* [Tasks to evaluate BERT - Maybe can be deployed in other LM](computer_sci/deep_learning_and_machine_learning/LLM/metircs/some_task.md)
|
||||
|
||||
# Basic
|
||||
|
||||
* [LLM Hyperparameter](computer_sci/deep_learning_and_machine_learning/LLM/basic/llm_hyperparameter.md)
|
||||
|
After Width: | Height: | Size: 216 KiB |
|
After Width: | Height: | Size: 216 KiB |
|
After Width: | Height: | Size: 173 KiB |
|
After Width: | Height: | Size: 444 KiB |
|
After Width: | Height: | Size: 28 KiB |
|
After Width: | Height: | Size: 6.5 MiB |
|
After Width: | Height: | Size: 1.8 MiB |
@ -0,0 +1,56 @@
|
||||
---
|
||||
title: LLM hyperparameter
|
||||
tags:
|
||||
- hyperparameter
|
||||
- LLM
|
||||
- deep-learning
|
||||
- basic
|
||||
---
|
||||
|
||||
# LLM Temperature
|
||||
|
||||
The term temperature comes from its physical meaning: the higher the temperature, the faster atoms move, which means more randomness.
|
||||
|
||||

|
||||
|
||||
LLM temperature is a hyperparameter that regulates **the randomness, or creativity, of the output.**
|
||||
|
||||
* The higher the temperature, the more diverse and creative the output, with a greater likelihood of straying from context.
|
||||
* The lower the temperature, the more focused and deterministic the output, sticking closely to the most likely prediction.
|
||||
|
||||

|
||||
|
||||
## More detail
|
||||
|
||||
An LLM assigns a probability to the next word, like this:
|
||||
|
||||

|
||||
|
||||
In "A cat is chasing a …", there are lots of words that could fill the blank. Different words have different probabilities; the model outputs a rating for each candidate next word.
|
||||
|
||||
Sure, we could always pick the highest-rated word, but that would produce very standard, predictable, boring sentences, and the model wouldn't sound like human language, because we don't always use the most common word either.
|
||||
|
||||
So we want a mechanism that **allows every word with a decent rating to occur with a reasonable probability**; that's why LLMs need a temperature.
|
||||
|
||||
As in the physical world, we can draw samples to describe a distribution; *we use the softmax function to describe the probability distribution of the next word*. The temperature is the element $T$ in the formula:
|
||||
|
||||
$$
|
||||
p_i = \frac{\exp{(\frac{R_i}{T})}}{\sum_j \exp{(\frac{R_j}{T})}}
|
||||
$$
|
||||
|
||||

|
||||
|
||||
The lower the $T$, the closer the highest-rated word's probability gets to 100%; the higher the $T$, the smoother the probabilities become across all words.
|
||||
|
||||
*The gif below is important and intuitive.*
|
||||
|
||||

|
||||
|
||||
So, with different settings of $T$, the next-word probabilities change, and we output the next word by sampling from that distribution.
|
||||
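The formula above can be tried out directly; here is a small sketch with made-up ratings:

```python
import numpy as np

def temperature_softmax(ratings, T):
    """p_i = exp(R_i / T) / sum_j exp(R_j / T)."""
    z = np.asarray(ratings, dtype=float) / T
    z -= z.max()               # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

ratings = [5.0, 3.0, 1.0]                   # hypothetical next-word ratings
low = temperature_softmax(ratings, 0.2)     # near-deterministic
high = temperature_softmax(ratings, 100.0)  # near-uniform
```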
|
||||

|
||||
|
||||
# Reference
|
||||
|
||||
* [LLM Temperature, dedpchecks](https://deepchecks.com/glossary/llm-parameters/#:~:text=One%20intriguing%20parameter%20within%20LLMs,of%20straying%20from%20the%20context.)
|
||||
* [⭐⭐⭐https://www.youtube.com/watch?v=YjVuJjmgclU](https://www.youtube.com/watch?v=YjVuJjmgclU)
|
||||
|
After Width: | Height: | Size: 272 KiB |
@ -0,0 +1,44 @@
|
||||
---
|
||||
title: LangChain Explained
|
||||
tags:
|
||||
- LLM
|
||||
- basic
|
||||
- langchain
|
||||
---
|
||||
|
||||
# What is LangChain
|
||||
|
||||
LangChain is an open source framework that allows AI developers to combine LLMs like GPT-4 *with external sources of computation and data*.
|
||||
|
||||
# Why LangChain
|
||||
|
||||
LangChain lets an LLM answer questions grounded in your own documents, and it helps you build many kinds of amazing apps.
|
||||
|
||||
You can use LangChain to have GPT analyze your own company data, book flights based on a schedule, summarize batches of PDFs, and more.
|
||||
|
||||
# LangChain value propositions
|
||||
|
||||
## Components
|
||||
|
||||
* LLM Wrappers
|
||||
* Prompt Templates
|
||||
* Indexes for relevant information retrieval
|
||||
|
||||
## Chains
|
||||
|
||||
Chains assemble components to solve a specific task, such as finding information in a book.
|
||||
|
||||
## Agents
|
||||
|
||||
Agents allow LLMs to interact with their environment - for instance, making an API request to perform a specific action.
|
||||
|
||||
# LangChain Framework
|
||||
|
||||

|
||||
|
||||
|
||||
|
||||
# Reference
|
||||
|
||||
* [https://www.youtube.com/watch?v=aywZrzNaKjs](https://www.youtube.com/watch?v=aywZrzNaKjs)
|
||||
|
||||
|
After Width: | Height: | Size: 88 KiB |
|
After Width: | Height: | Size: 88 KiB |
@ -0,0 +1,36 @@
|
||||
---
|
||||
title: Tasks to evaluate BERT - Maybe can be deployed in other LM
|
||||
tags:
|
||||
- LLM
|
||||
- metircs
|
||||
- deep-learning
|
||||
- benchmark
|
||||
---
|
||||
|
||||
# Overview
|
||||
|
||||

|
||||
|
||||
# MNLI-m (Multi-Genre Natural Language Inference - Matched):
|
||||
|
||||
MNLI-m is a benchmark dataset and task for natural language inference (NLI). The goal of NLI is to determine the logical relationship between two given sentences: whether the relationship is "entailment," "contradiction," or "neutral." MNLI-m focuses on matched data, which means the sentences are drawn from the same genres as the sentences in the training set. It is part of the GLUE (General Language Understanding Evaluation) benchmark, which evaluates the performance of models on various natural language understanding tasks.
|
||||
|
||||
# QNLI (Question Natural Language Inference):
|
||||
|
||||
QNLI is another NLI task included in the GLUE benchmark. In this task, the model is given a sentence that is a premise and a sentence that is a question related to the premise. The goal is to determine whether the answer to the question can be inferred from the given premise. The dataset for QNLI is derived from the Stanford Question Answering Dataset (SQuAD).
|
||||
|
||||
# MRPC (Microsoft Research Paraphrase Corpus):
|
||||
|
||||
MRPC is a dataset used for paraphrase identification or semantic equivalence detection. It consists of sentence pairs from various sources that are labeled as either paraphrases or not. The task is to classify whether a given sentence pair expresses the same meaning (paraphrase) or not. MRPC is also part of the GLUE benchmark and helps evaluate models' ability to understand sentence similarity and equivalence.
|
||||
|
||||
# SST-2 (Stanford Sentiment Treebank - Binary Sentiment Classification):
|
||||
|
||||
SST-2 is a binary sentiment classification task based on the Stanford Sentiment Treebank dataset. The dataset contains sentences from movie reviews labeled as either positive or negative sentiment. The task is to classify a given sentence as expressing a positive or negative sentiment. SST-2 is often used to evaluate the ability of models to understand and classify sentiment in natural language.
|
||||
|
||||
# SQuAD (Stanford Question Answering Dataset):
|
||||
|
||||
SQuAD is a widely known dataset and task for machine reading comprehension. It consists of questions posed by humans on a set of Wikipedia articles, where the answers to the questions are spans of text from the corresponding articles. The goal is to build models that can accurately answer the questions based on the provided context. SQuAD has been instrumental in advancing the field of question answering and evaluating models' reading comprehension capabilities.
|
||||
|
||||
Overall, these tasks and datasets serve as benchmarks for evaluating natural language understanding and processing models. They cover a range of language understanding tasks, including natural language inference, paraphrase identification, sentiment analysis, and machine reading comprehension.
|
||||
|
||||
|
||||
@ -0,0 +1,65 @@
|
||||
---
|
||||
title: Reinforcement Learning from Human Feedback
|
||||
tags:
|
||||
- LLM
|
||||
- deep-learning
|
||||
- RLHF
|
||||
- LLM-training-method
|
||||
---
|
||||
|
||||
|
||||
# Review: Reinforcement Learning Basics
|
||||
|
||||

|
||||
|
||||
|
||||
Reinforcement learning is a mathematical framework.
|
||||
|
||||
To demystify it: reinforcement learning is an open-ended framework that uses a reward function to optimize an agent to solve complex tasks in a target environment.
|
||||
|
||||
<!---
|
||||
# Origins of RLHF
|
||||
|
||||
## Pre Deep RL
|
||||
|
||||

|
||||
|
||||
|
||||
Before deep RL, systems didn't use a neural network to represent the policy. This early system was a machine learning system that created a policy by having humans label the actions an agent took as correct or incorrect. It was a simple decision rule where humans labeled every action as good or bad - essentially a reward model and a policy put together.
|
||||
|
||||
## For Deep RL
|
||||
|
||||

|
||||
|
||||
--->
|
||||
|
||||
# Step by Step
|
||||
|
||||
The RLHF training method has three core steps:
|
||||
|
||||
1. Pretraining a language model
|
||||
2. Gathering data (question-answer pairs) and training a reward model
|
||||
3. Fine-tuning the LM with reinforcement learning
|
||||
|
||||
## Step 1. Pretraining Language Models
|
||||
|
||||
Read this to learn how to train an LM:
|
||||
|
||||
[Pretraining language models](computer_sci/deep_learning_and_machine_learning/LLM/train/train_LLM.md)
|
||||
|
||||
OpenAI used a smaller version of GPT-3 for its first popular RLHF model - InstructGPT.
|
||||
|
||||
RLHF is still a new area: there is no settled answer as to which model is the best starting point for RLHF, and fine-tuning on expensive augmented data is not necessarily required.
|
||||
|
||||
## Step 2. Reward model training
|
||||
|
||||
In the reward model, we integrate human preferences into the system.
|
||||
|
||||

|
||||
|
||||
|
||||
|
||||
# Reference
|
||||
|
||||
* [Reinforcement Learning from Human Feedback: From Zero to chatGPT, YouTube, HuggingFace](https://www.youtube.com/watch?v=2MBJOuVq380)
|
||||
* [Hugging Face blog, ChatGPT 背后的“功臣”——RLHF 技术详解](https://huggingface.co/blog/zh/rlhf)
|
||||
|
After Width: | Height: | Size: 62 KiB |
|
After Width: | Height: | Size: 70 KiB |
|
After Width: | Height: | Size: 47 KiB |
|
After Width: | Height: | Size: 90 KiB |
|
After Width: | Height: | Size: 86 KiB |
@ -0,0 +1,8 @@
|
||||
---
|
||||
title: How to make custom dataset?
|
||||
tags:
|
||||
- dataset
|
||||
- LLM
|
||||
- deep-learning
|
||||
---
|
||||
|
||||
|
After Width: | Height: | Size: 240 KiB |
@ -0,0 +1,7 @@
|
||||
---
|
||||
title: How to do use fine tune tech to create your chatbot
|
||||
tags:
|
||||
- deep-learning
|
||||
- LLM
|
||||
---
|
||||
|
||||
@ -0,0 +1,19 @@
|
||||
---
|
||||
title: Learn finetune by Stanford Alpaca
|
||||
tags:
|
||||
- deep-learning
|
||||
- LLM
|
||||
- fine-tune
|
||||
- LLaMA
|
||||
---
|
||||
|
||||

|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
# Reference
|
||||
|
||||
* [https://www.youtube.com/watch?v=pcszoCYw3vc](https://www.youtube.com/watch?v=pcszoCYw3vc)
|
||||
* [https://crfm.stanford.edu/2023/03/13/alpaca.html](https://crfm.stanford.edu/2023/03/13/alpaca.html)
|
||||
@ -0,0 +1,24 @@
|
||||
---
|
||||
title: LLM training steps
|
||||
tags:
|
||||
- LLM
|
||||
- deep-learning
|
||||
---
|
||||
|
||||
Training a large language model (LLM) typically involves the following steps:
|
||||
|
||||
1. **Data collection**: Gather large-scale text as training data, such as text from the internet, books, articles, news, and dialogue transcripts. The quality and diversity of the data are crucial for training a high-quality LLM.
|
||||
|
||||
2. **Preprocessing**: Preprocess the data to make it suitable for training. This includes tokenization (splitting text into words or subword units), building a vocabulary (mapping words to numeric representations), and cleaning and normalizing the text.
|
||||
|
||||
3. **Model architecture**: Choose an appropriate architecture for the LLM. The most common architecture today is the Transformer, which stacks layers of self-attention and feed-forward networks.
|
||||
|
||||
4. **Pretraining**: Pretrain the model on a large-scale text dataset. Pretraining extracts linguistic knowledge in an unsupervised way by having the model learn tasks such as predicting missing words or the next word, which lets it learn rich language representations.
|
||||
|
||||
5. **Fine-tuning**: After pretraining, fine-tune the model on task-specific data. Fine-tuning is supervised training on labeled data for a specific task, such as text generation or question answering; it lets the model better fit that task's requirements.
|
||||
|
||||
6. **Hyperparameter tuning**: Adjust hyperparameters such as the learning rate, batch size, and number of layers to achieve better performance.
|
||||
|
||||
7. **Evaluation and iteration**: Evaluate the trained model and iterate based on the results. This may include changing the architecture, adding training data, or adjusting the training strategy.
|
||||
|
||||
These steps are usually iterative: through continued training and refinement, the LLM develops better performance and generation ability across a wide range of natural language processing tasks. Note that training an LLM requires enormous amounts of compute and time, and is usually carried out by specialized teams in large-scale computing environments.
|
||||
@ -0,0 +1,143 @@
|
||||
---
|
||||
title: Train LLM from scratch
|
||||
tags:
|
||||
- LLM
|
||||
- LLM-training-method
|
||||
- deep-learning
|
||||
---
|
||||
|
||||
# Find a dataset
|
||||
|
||||
Find a corpus of text in the language you prefer.
|
||||
* Such as [OSCAR](https://oscar-project.org/)
|
||||
|
||||
Intuitively, the more data you can get to pretrain on, the better results you will get.
|
||||
|
||||
# Train a tokenizer
|
||||
|
||||
There are a few things you need to take into consideration when training a tokenizer.
|
||||
|
||||
## Tokenization
|
||||
|
||||
You can read more detailed post - [Tokenization](computer_sci/deep_learning_and_machine_learning/NLP/basic/tokenization.md)
|
||||
|
||||
Tokenization is the process of **breaking text into words or sentences**. These tokens help the machine learn the context of the text, which helps in *interpreting the meaning behind the text*. Hence, tokenization is *the first and foremost step when working with text*. Once tokenization is performed on the corpus, the resulting tokens can be used to build the vocabulary used in the subsequent training steps.
|
||||
|
||||
Example:
|
||||
|
||||
“The city is on the river bank” -> “The”, ”city”, ”is”, ”on”, ”the”, ”river”, ”bank”
|
||||
|
||||
Here are some typical tokenization:
|
||||
* Word ( White Space ) Tokenization
|
||||
* Character Tokenization
|
||||
* **Subword Tokenization (SOTA)**
|
||||
|
||||
|
||||
Subword tokenization handles the OOV (out-of-vocabulary) problem effectively.
|
||||
|
||||
### Subword Tokenization Algorithm
|
||||
|
||||
* **Byte pair encoding** *(BPE)*
|
||||
* **Byte-level byte pair encoding**
|
||||
* **WordPiece**
|
||||
* **unigram**
|
||||
* **SentencePiece**
|
||||
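A toy sketch of the byte pair encoding idea - learning merge rules from word frequencies (real tokenizers are far more sophisticated than this):

```python
from collections import Counter

def learn_bpe_merges(words, n_merges):
    """Repeatedly merge the most frequent adjacent symbol pair."""
    vocab = Counter(tuple(w) + ("</w>",) for w in words)  # words as symbol tuples
    merges = []
    for _ in range(n_merges):
        pairs = Counter()
        for sym, freq in vocab.items():
            for a, b in zip(sym, sym[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)   # most frequent adjacent pair
        merges.append(best)
        new_vocab = Counter()
        for sym, freq in vocab.items():
            out, i = [], 0
            while i < len(sym):
                if i + 1 < len(sym) and (sym[i], sym[i + 1]) == best:
                    out.append(sym[i] + sym[i + 1]); i += 2
                else:
                    out.append(sym[i]); i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges, vocab
```

Frequent substrings become single tokens, so rare or unseen words decompose into known subwords - which is how subword tokenizers stay robust to OOV words.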
|
||||
## Word embedding
|
||||
|
||||
After tokenization, we have turned our text into tokens. We also want to represent each token mathematically. Here we use word embedding techniques, which convert words into numeric vectors.
|
||||
|
||||
Here are some typical word embedding algorithms:
|
||||
|
||||
* **Word2Vec**
|
||||
* skip-gram
|
||||
* continuous bag-of-words (CBOW)
|
||||
* **GloVe** (Global Vectors for Word Representations)
|
||||
* **FastText**
|
||||
* **ELMo** (Embeddings from Language Models)
|
||||
* **BERT** (Bidirectional Encoder Representations from Transformers)
|
||||
* a language model rather than a traditional word embedding algorithm. **While BERT does generate word embeddings as a byproduct of its training process**, its primary purpose is to learn contextualized representations of words and text segments.
|
||||
|
||||
# Train a language model from scratch
|
||||
|
||||
First, we need to clarify the definition of a language model.
|
||||
|
||||
## Language model definition
|
||||
|
||||
Simply to say, the language model is a computational model or algorithm that is designed to understand and generate human language. It is a type of artificial intelligence(AI) model that uses *statistical and probabilistic techniques to predict and generate sequences of words and sentences*.
|
||||
|
||||
It captures the statistical relationships between words or characters and *builds a probability distribution of the likelihood of a particular word or sequence of words appearing in a given context.*
|
||||
|
||||
Language models can be used for various NLP tasks, including machine translation, speech recognition, text generation, and so on.
|
||||
|
||||
As usual, a language model takes a seed input or prompt and uses its *learned knowledge of language (model weights)* to predict the most likely words or characters to follow.
|
||||
|
||||
The SOTA language model today is GPT-4.
|
||||
|
||||
## Language model algorithm
|
||||
|
||||
|
||||
### Classical LM
|
||||
|
||||
* **n-gram**
|
||||
* N-grams can be used as *both a tokenization algorithm and a component of a language model*. In my experience, n-grams are easiest to understand as a language model that predicts a likelihood distribution.
|
||||
* **HMMs** (Hidden Markov Models)
|
||||
* **RNNs** (Recurrent Neural Networks)
|
||||
|
||||
### Cutting-edge
|
||||
|
||||
* **GPT** (Generative Pre-trained Transformer)
|
||||
* **BERT** (Bidirectional Encoder Representations from Transformers)
|
||||
* **T5** (Text-To-Text Transfer Transformer)
|
||||
* **Megatron-LM**
|
||||
|
||||
## Train Method
|
||||
|
||||
Different designed models usually have different training methods. Here we take BERT-like model as example.
|
||||
|
||||
### BERT-Like model
|
||||
|
||||

|
||||
|
||||
To train a BERT-like model, we'll train it on the task of **Masked Language Modeling** (MLM), i.e. predicting how to fill in arbitrary tokens that we randomly mask in the dataset.
|
||||
|
||||
Also, we'll train the BERT-like model using **Next Sentence Prediction** (NSP). *MLM teaches BERT to understand relationships between words, and NSP teaches BERT to understand long-term dependencies across sentences.* In NSP training, we give BERT two sentences, A and B, and BERT determines whether B is A's next sentence, outputting `IsNextSentence` or `NotNextSentence`.
|
||||
|
||||
With NSP training, BERT will have better performance.
|
||||
|
||||
| Task | MNLI-m (acc) | QNLI (acc) | MRPC (acc) | SST-2 (acc) | SQuAD (f1) |
|
||||
| --- | --- | --- | --- | --- | --- |
|
||||
| With NSP | 84.4 | 88.4 | 86.7 | 92.7 | 88.5 |
|
||||
| Without NSP | 83.9 | 84.9 | 86.5 | 92.6 | 87.9 |
|
||||
|
||||
[Table source](https://arxiv.org/pdf/1810.04805.pdf)
|
||||
[Table metrics explain](computer_sci/deep_learning_and_machine_learning/LLM/metircs/some_task.md)
|
||||
|
||||
|
||||
# Check LM actually trained
|
||||
|
||||
## Take BERT as example
|
||||
|
||||
Aside from looking at the training and eval losses going down, we can check our model using `FillMaskPipeline`.
|
||||
|
||||
This method takes input containing *a masked token (here, `<mask>`) and returns a list of the most probable filled sequences, with their probabilities.*
|
||||
|
||||
With this method, we can check whether our LM has captured semantic knowledge or even some sort of (statistical) common-sense reasoning.
|
||||
|
||||
# Fine-tune our LM on a downstream task
|
||||
|
||||
Finally, we can fine-tune our LM on a downstream task such as translation, chatbot, text generation and so on.
|
||||
|
||||
Different downstream tasks may need different fine-tuning methods.
|
||||
|
||||
# Example
|
||||
|
||||
[https://colab.research.google.com/github/huggingface/blog/blob/main/notebooks/01_how_to_train.ipynb#scrollTo=G-kkz81OY6xH](https://colab.research.google.com/github/huggingface/blog/blob/main/notebooks/01_how_to_train.ipynb#scrollTo=G-kkz81OY6xH)
|
||||
|
||||
|
||||
# Reference
|
||||
|
||||
* [HuggingFace blog, How to train a new language model from scratch using Transformers and Tokenizers](https://huggingface.co/blog/how-to-train)
|
||||
* [Medium blog, NLP Tokenization](https://medium.com/nerd-for-tech/nlp-tokenization-2fdec7536d17)
|
||||
* [Radford, A., Narasimhan, K., Salimans, T. & Sutskever, I. (2018). Improving language understanding by generative pre-training. , .](https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf)
|
||||
|
||||
@ -0,0 +1,9 @@
|
||||
---
|
||||
title: Model Interpretability - MOC
|
||||
tags:
|
||||
- MOC
|
||||
- deep-learning
|
||||
- interpretability
|
||||
---
|
||||
|
||||
* [SHAP](computer_sci/deep_learning_and_machine_learning/Model_interpretability/SHAP.md)
|
||||
@ -0,0 +1,193 @@
|
||||
---
|
||||
title: SHAP - a reliable way to analyze model interpretability
|
||||
tags:
|
||||
- deep-learning
|
||||
- interpretability
|
||||
- algorithm
|
||||
---
|
||||
|
||||
SHAP is the most popular model-agnostic technique used to explain predictions. SHAP stands for **SH**apley **A**dditive ex**P**lanations.
|
||||
|
||||
Shapley values are obtained by incorporating concepts from *Cooperative Game Theory* and *local explanations*.
|
||||
|
||||
# Mathematical and Algorithm Foundation
|
||||
|
||||
## Shapley Values
|
||||
|
||||
Shapley values come from game theory and were invented by Lloyd Shapley as a way of providing a fair answer to the following question:
|
||||
|
||||
> [!question]
|
||||
> If we have a coalition **C** that collaborates to produce a value **V**, how much did each individual member contribute to the final value?
|
||||
|
||||
The way we assess each individual member's contribution is to remove each member, form a new coalition, and then compare the production of the two, as in this graph:
|
||||
|
||||

|
||||
|
||||
We then form every coalition with member 1 included or excluded, like this:
|
||||
|
||||

|
||||
|
||||
Subtracting the right value from the left value gives the differences shown in the image above left; we then calculate their mean:
|
||||
|
||||
$$
|
||||
\varphi_i=\frac{1}{\text{Members}}\sum_{\forall \text{C s.t. i}\notin \text{C}} \frac{\text{Marginal Contribution of i to C}}{\text{Coalitions of size |C|}}
|
||||
$$
|
||||
|
||||
## Shapley Additive Explanations
|
||||
|
||||
We need to know what **additive** means here. Lundberg and Lee define an additive feature attribution as follows:
|
||||
|
||||

|
||||
|
||||

|
||||
|
||||
$x'$, the simplified local input, usually means that we turn a feature vector into a discrete binary vector, where features are either included or excluded. The explanation model $g(x')$ then takes this form:
|
||||
|
||||
$$
|
||||
g(x')=\varphi_0+\sum_{i=1}^N \varphi_i {x'}_i
|
||||
$$
|
||||
|
||||
* $\varphi_0$ is the **null output** of this model, that is, the **average output** of this model
|
||||
- $\varphi_i$ is the **feature effect**: how much that feature changes the output of the model, as introduced above. It's called the **attribution**
|
||||
|
||||

|
||||
|
||||
Now Lundberg and Lee go on to describe a set of three desirable properties of such an additive feature method, **local accuracy**, **missingness**, and **consistency**.
|
||||
|
||||
### Local accuracy
|
||||
|
||||
$$
|
||||
g(x')\approx f(x) \quad \text{if} \quad x'\approx x
|
||||
$$
|
||||
|
||||
### Missingness
|
||||
|
||||
$$
|
||||
{x_i}' = 0 \rightarrow \varphi_i = 0
|
||||
$$
|
||||
|
||||
If a feature is excluded from the model, its attribution must be zero; that is, the only thing that can affect the output of the explanation model is the inclusion of features, not their exclusion.
|
||||
|
||||
### Consistency
|
||||
|
||||
If a feature's contribution changes, its attributed effect cannot change in the opposite direction.
|
||||
|
||||
# Why SHAP
|
||||
|
||||
In their paper, Lundberg and Lee argue that an additive explanatory model satisfies all three properties only if **the feature attributions are specifically chosen to be the Shapley values of those features**.
|
||||
|
||||
# SHAP step-by-step process, same as `shap.explainer`
|
||||
|
||||
As an example, consider an ice cream shop in an airport; it has four features we can use to predict its business.
|
||||
|
||||
$$
|
||||
\begin{bmatrix}
|
||||
\text{temperature} & \text{day of weeks} & \text{num of flights} & \text{num of hours}
|
||||
\end{bmatrix}
|
||||
\\
|
||||
\rightarrow \\
|
||||
\begin{bmatrix}
|
||||
T & D & F & H
|
||||
\end{bmatrix}
|
||||
$$
|
||||
|
||||
For example, suppose we want the Shapley value of temperature = 80 in the sample [80 1 100 4]. Here are the steps:
|
||||
|
||||
- Step 1. Get a random permutation of the features, and place a bracket around the feature we care about and everything to its right.
|
||||
|
||||
$$
|
||||
\begin{bmatrix}
|
||||
F & D & \underbrace{T \quad H}
|
||||
\end{bmatrix}
|
||||
$$
|
||||
|
||||
- Step 2. Pick a random sample from the dataset
|
||||
|
||||
For example, [200 5 70 8], in the form [F D T H].
|
||||
|
||||
- Step 3. Form vectors $x_1$ and $x_2$
|
||||
|
||||
$$
|
||||
x_1=[100 \quad 1 \quad 80 \quad \color{#BF40BF} 8 \color{#FFFFFF}]
|
||||
$$
|
||||
|
||||
$x_1$ is partially from the original sample and partially from the randomly chosen one: the bracketed features take the randomly chosen sample's values, except the feature we care about.
|
||||
|
||||
$$
|
||||
x_2 = [100 \quad 1 \quad \color{#BF40BF} 70 \quad 8 \color{#FFFFFF}]
|
||||
$$
|
||||
|
||||
$x_2$ additionally changes the feature we care about to the randomly chosen sample's value for that feature.
|
||||
|
||||
Then run the model on both vectors, calculate the difference of the outputs, and record it:
|
||||
|
||||
$$
|
||||
\text{DIFF} = f(x_1) - f(x_2)
|
||||
$$
|
||||
|
||||
- Step 4. Record the diff, return to Step 1, and repeat many times
|
||||
|
||||
$$
|
||||
\text{SHAP}(T=80 | [80 \quad 1 \quad 100 \quad 4]) = \text{average(DIFF)}
|
||||
$$
|
||||
|
||||
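The four steps above can be sketched as a Monte Carlo estimator (a simplified illustration, not the `shap` library's implementation):

```python
import numpy as np

def mc_shap(model, x, feature, X_background, n_iter=1000, seed=0):
    """Monte Carlo estimate of one feature's Shapley value for sample x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    X_background = np.asarray(X_background, dtype=float)
    diffs = []
    for _ in range(n_iter):
        perm = rng.permutation(len(x))            # Step 1: random feature order
        pos = int(np.where(perm == feature)[0][0])
        after = perm[pos + 1:]                    # everything to the feature's right
        z = X_background[rng.integers(len(X_background))]  # Step 2: random sample
        x1 = x.copy(); x1[after] = z[after]       # Step 3: fill the bracket, keep the feature
        x2 = x1.copy(); x2[feature] = z[feature]  # ... x2 also swaps the feature itself
        diffs.append(model(x1) - model(x2))       # difference of model outputs
    return float(np.mean(diffs))                  # Step 4: average over repetitions
```

For a linear model the estimate converges to the exact attribution; for general models it is an approximation whose variance shrinks with `n_iter`.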
# Shapley kernel
|
||||
|
||||
## Too many coalitions need to be sampled
|
||||
|
||||
As we introduced for Shapley values above, for each $\varphi_i$ we need to sample a lot of coalitions to compute the differences.
|
||||
|
||||
For 4 features, we need 64 total coalitions to sample; for 32 features, we need 17.1 billion.
|
||||
|
||||
It’s entirely untenable.
|
||||
|
||||
So, to get over this difficulty, we need to devise a **Shapley kernel**, and that's what Lundberg and Lee do.
|
||||
|
||||

|
||||
|
||||
## Detail
|
||||

|
||||
|
||||
Most ML models won't simply let you omit a feature, so instead we define a **background dataset** $B$ containing a set of representative data points the model was trained on. We then fill in the omitted feature or features with values from the background dataset, while holding the features included in the permutation fixed to their original values. We then take the average of the model output over all of these new synthetic data points as the model output for that feature permutation, which we call $\bar{y}$.
|
||||
|
||||
$$
|
||||
\bar{y}_{124} = \mathbb{E}_{i\in B}\left[\, y_{12i4} \,\right]
|
||||
$$
|
||||

|
||||
|
||||
Then we have a number of samples computed in this way, as in the image on the left.
|
||||
|
||||
We can formulate this as a weighted linear regression, with each feature assigned a coefficient.
|
||||
|
||||
And we can prove that, with a special choice of weights, the coefficients are exactly the Shapley values. **This weighting scheme is the basis of the Shapley kernel.** In this setting, the weighted linear regression process as a whole is Kernel SHAP.
|
||||
|
||||
### Different types of SHAP
|
||||
|
||||
- **Kernel SHAP**
|
||||
- Low-order SHAP
|
||||
- Linear SHAP
|
||||
- Max SHAP
|
||||
- Deep SHAP
|
||||
- Tree SHAP
|
||||
|
||||

|
||||
|
||||
### You need to notice
|
||||
In the end, we compute the Shapley values via linear regression, so there is necessarily error. Some Python packages cannot give us an error bound, making it hard to know whether the error comes from the linear regression, the data, or the model.
|
||||
|
||||
|
||||
# Reference
|
||||
|
||||
[Shapley Additive Explanations (SHAP)](https://www.youtube.com/watch?v=VB9uV-x0gtg)
|
||||
|
||||
[SHAP: A reliable way to analyze your model interpretability](https://towardsdatascience.com/shap-a-reliable-way-to-analyze-your-model-interpretability-874294d30af6)
|
||||
|
||||
[【Python可解释机器学习库SHAP】:Python的可解释机器学习库SHAP](https://zhuanlan.zhihu.com/p/483622352)
|
||||
|
||||
[Shapley Values : Data Science Concepts](https://www.youtube.com/watch?v=NBg7YirBTN8)
|
||||
|
||||
# Appendix
|
||||
|
||||
Other methods to interpret models:
|
||||
|
||||
[Papers with Code - SHAP Explained](https://paperswithcode.com/method/shap)
|
||||
|
After Width: | Height: | Size: 73 KiB |
|
After Width: | Height: | Size: 73 KiB |
|
After Width: | Height: | Size: 93 KiB |
|
After Width: | Height: | Size: 81 KiB |
|
After Width: | Height: | Size: 88 KiB |
|
After Width: | Height: | Size: 318 KiB |
|
After Width: | Height: | Size: 254 KiB |
|
After Width: | Height: | Size: 351 KiB |
|
After Width: | Height: | Size: 317 KiB |
|
After Width: | Height: | Size: 288 KiB |
@ -0,0 +1,9 @@
|
||||
---
|
||||
title: Tokenization
|
||||
tags:
|
||||
- NLP
|
||||
- deep-learning
|
||||
- tokenization
|
||||
- basic
|
||||
---
|
||||
|
||||
@ -0,0 +1,58 @@
|
||||
---
|
||||
title: Dynamic Time Warping (DTW)
|
||||
tags:
|
||||
- metrics
|
||||
- time-series-dealing
|
||||
- evalution
|
||||
---
|
||||
|
||||

|
||||
|
||||
Euclidean distance can be a poor choice for comparing time series because of warping along the time axis. DTW is a distance measure that accounts for this warping when comparing two time series. This section explains how to compute the DTW distance.
|
||||
|
||||
# Detail
|
||||
|
||||
|
||||
## Step 1. Prepare the input sequences
|
||||
|
||||
Assume two time series, A and B.
|
||||
|
||||
## Step 2. Compute the distance matrix
|
||||
|
||||
Create a distance matrix whose entries are the distances between each pair of time points in sequences A and B. Common distance measures include Euclidean distance, Manhattan distance, and cosine similarity; choose one appropriate for your data type and needs.
|
||||
|
||||
## Step 3. Initialize the accumulated distance matrix
|
||||
|
||||
Create an accumulated distance matrix of the same size as the distance matrix, which stores the accumulated distance from the start to each position. Set the accumulated distance at the starting point (0, 0) to the distance matrix's starting value.
|
||||
|
||||
## Step 4. Compute the accumulated distances
|
||||
|
||||
Starting from the origin, fill in the accumulated distance matrix by dynamic programming. For each position (i, j), **the accumulated distance equals the local distance at that position plus the minimum accumulated distance among the three adjacent positions:**
|
||||
|
||||
$$
|
||||
DTW(i, j) = d_{i,j} + \min{\{DTW(i-1,j), DTW(i, j-1), DTW(i-1, j-1)\}}
|
||||
$$
|
||||
|
||||
|
||||
## Step 5. Backtrack the optimal path
|
||||
|
||||
Starting from the bottom-right corner of the accumulated distance matrix, backtrack along the minimum-accumulated-distance path to the origin (0, 0). The recorded path is the optimal warping path.
|
||||
|
||||
## Step 6. Compute the final distance
|
||||
|
||||
From the accumulated distances along the optimal path, compute the final DTW distance.
|
||||
|
||||
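The six steps above can be sketched in a few lines - a straightforward $O(nm)$ dynamic program using squared pointwise distance, with a final square root over the accumulated cost:

```python
import numpy as np

def dtw_distance(a, b):
    """DTW distance with squared pointwise distance; returns the square root
    of the accumulated cost along the optimal warping path."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)   # accumulated distance matrix
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2          # local distance d_{i,j}
            acc[i, j] = d + min(acc[i - 1, j],      # three adjacent positions
                                acc[i, j - 1],
                                acc[i - 1, j - 1])
    return float(np.sqrt(acc[n, m]))
```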
# Example
|
||||
|
||||

|
||||
|
||||
The left is the distance matrix; the right is the DTW matrix, i.e., the accumulated distance matrix.
|
||||
|
||||

|
||||
|
||||

|
||||
|
||||
By backtracking, we find the optimal warping path; the DTW distance is the square root of the accumulated cost along the optimal warping path, which in this example is $\sqrt{15}$.
|
||||
|
||||
|
||||
|
||||
|
After Width: | Height: | Size: 98 KiB |
|
After Width: | Height: | Size: 55 KiB |
|
After Width: | Height: | Size: 521 KiB |