
# Linear Regression Explanation: Weights Vector and Feature Matrix

## 1. How the Weights Vector Multiplies with the Feature Matrix

In linear regression, we model the relationship between input features and the target output as a linear combination of the input features. The formula for linear regression is:


$$
\mathbf{y} = \mathbf{X} \mathbf{w} + \mathbf{b}
$$

where:

- $\mathbf{y}$ is the vector of predicted outputs.
- $\mathbf{X}$ is the feature matrix, where each row represents a data point and each column represents a feature. If there are $n$ data points and $p$ features, then $\mathbf{X}$ has shape $n \times p$.
- $\mathbf{w}$ is the weights vector (also called the coefficient vector), which has size $p \times 1$. Each weight corresponds to the importance or contribution of a specific feature.
- $\mathbf{b}$ is the bias term, which shifts the output up or down.

### Explanation of $\mathbf{X} \mathbf{w}$

1. Matrix-vector multiplication: in $\mathbf{X} \mathbf{w}$, each row of $\mathbf{X}$ (a single data point) is multiplied by the weights vector $\mathbf{w}$ to produce a prediction for that data point (illustrated in the sketch after this list).
2. Output: the result $\mathbf{X} \mathbf{w}$ is an $n \times 1$ vector of predictions for all $n$ data points. The bias term $\mathbf{b}$ is added to each element of this vector to produce the final prediction vector.
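
To see this in code, here is a minimal NumPy sketch; the values of `X`, `w`, and `b` below are made up purely for illustration:

```python
import numpy as np

# Feature matrix X: n = 3 data points (rows), p = 2 features (columns).
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Weights vector w: one coefficient per feature.
w = np.array([0.5, -1.0])

# Bias term b: a scalar that shifts every prediction up or down.
b = 2.0

# X @ w takes the dot product of each row of X with w, producing an
# n-element vector of predictions; b is then added to every element.
y = X @ w + b
print(y)  # [ 0.5 -0.5 -1.5]
```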

## 2. The Weights Vector $\mathbf{w}$

The weights vector, typically denoted $\mathbf{w}$, is a column vector that holds the coefficients assigned to each feature in linear regression. It defines the influence or contribution of each feature to the output prediction.

For a model with $p$ features, the weights vector $\mathbf{w}$ has size $p \times 1$ and can be represented as:


$$
\mathbf{w} = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_p \end{bmatrix}
$$

where each element $w_i$ of the weights vector is the coefficient for feature $x_i$:

- $w_1$ is the weight for the first feature.
- $w_2$ is the weight for the second feature.
- ...
- $w_p$ is the weight for the $p$-th feature.

For example, if the model has three features, then the weights vector would be:


$$
\mathbf{w} = \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix}
$$

In practice, these weights are learned during training by adjusting them to minimize the error between the model's predictions and the actual target values.
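
As an illustration of that training step, the sketch below fits the weights (with the bias folded in as an extra column of ones) using NumPy's least-squares solver on synthetic data; all names and values here are assumptions for the example, not part of the course material:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic data: n = 100 points, p = 3 features, with known "true" weights.
n, p = 100, 3
X = rng.normal(size=(n, p))
true_w = np.array([2.0, -1.0, 0.5])
true_b = 3.0
y = X @ true_w + true_b + rng.normal(scale=0.1, size=n)  # noisy targets

# Append a column of ones so the bias is learned as one more weight.
X_aug = np.hstack([X, np.ones((n, 1))])

# np.linalg.lstsq finds the weights minimizing ||X_aug @ w - y||^2.
w_hat, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
print(w_hat)  # close to [ 2.0, -1.0, 0.5, 3.0 ]
```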


## 3. The Feature Matrix $\mathbf{X}$

The feature matrix, denoted $\mathbf{X}$, contains all input data points for a linear regression model. In this matrix:

- Each row represents a single data point (observation).
- Each column represents a feature (variable) of that data point.

If there are $n$ data points and $p$ features, the feature matrix $\mathbf{X}$ has size $n \times p$ and the following structure:


$$
\mathbf{X} = \begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{bmatrix}
$$

where $x_{ij}$ is the value of the $j$-th feature for the $i$-th data point.

### Example of a Feature Matrix

For three data points and two features, the feature matrix would look like this:


$$
\mathbf{X} = \begin{bmatrix}
x_{11} & x_{12} \\
x_{21} & x_{22} \\
x_{31} & x_{32}
\end{bmatrix}
$$

In this example:

- The first row $[x_{11}, x_{12}]$ represents the features for the first data point.
- The second row $[x_{21}, x_{22}]$ represents the features for the second data point.
- The third row $[x_{31}, x_{32}]$ represents the features for the third data point.

During the linear regression calculation, each row of $\mathbf{X}$ is multiplied by the weights vector $\mathbf{w}$ to generate the prediction for that specific data point.
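
To make the row-by-row view concrete, here is a small sketch (arbitrary numbers) showing that the per-row dot products and the full matrix-vector product agree:

```python
import numpy as np

# Three data points, two features, as in the example above (arbitrary values).
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
w = np.array([10.0, 0.1])

# Prediction for data point i: dot product of row i of X with w.
row_predictions = np.array([X[i] @ w for i in range(len(X))])

# The matrix-vector product computes all of these dot products at once.
assert np.allclose(row_predictions, X @ w)
print(X @ w)  # [10.2 30.4 50.6]
```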

## Singularity in Linear Algebra

In linear algebra, a singularity typically refers to a singular matrix: a matrix that does not have an inverse. Here's what that means:

1. Singular matrix:
    - A matrix $\mathbf{A}$ is singular if its determinant is zero: $\det(\mathbf{A}) = 0$ (verified in the sketch after this list).
    - A singular matrix cannot be inverted: there is no matrix $\mathbf{A}^{-1}$ such that $\mathbf{A} \mathbf{A}^{-1} = \mathbf{I}$, where $\mathbf{I}$ is the identity matrix.
2. Implications of singularity:
    - Linearly dependent rows or columns: a matrix is singular if its rows or columns are linearly dependent, meaning that at least one row or column can be written as a linear combination of the others.
    - No unique solutions: when a singular matrix appears in a system of linear equations $\mathbf{A}\mathbf{x} = \mathbf{b}$, the system has either no solutions or infinitely many solutions, but never a unique solution.
    - Non-invertible transformations: as a transformation, a singular matrix squashes some or all of the space, so the transformation is not fully reversible.
3. Geometric interpretation:
    - As a transformation of a vector space, a singular matrix maps some vectors onto a lower-dimensional space (e.g., a 2D plane in 3D space), causing a loss of dimensionality. This often manifests as "flattening" or "collapsing" parts of the space onto each other.
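
The first two points are easy to verify numerically; the sketch below uses a made-up $2 \times 2$ matrix whose second row is twice its first:

```python
import numpy as np

# Rows are linearly dependent (row 2 = 2 * row 1), so A is singular.
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

print(np.linalg.det(A))  # 0.0 (up to floating-point round-off)

try:
    np.linalg.inv(A)  # inverting a singular matrix raises an error
except np.linalg.LinAlgError as err:
    print("Not invertible:", err)  # "Singular matrix"
```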

In summary, singularity in linear algebra indicates a loss of invertibility caused by linearly dependent rows or columns, yielding a matrix whose transformation cannot be uniquely reversed.