
# Linear Regression Explanation: Weights Vector and Feature Matrix

## 1. How the Weights Vector Multiplies with the Feature Matrix

In linear regression, we model the relationship between input features and the target output as a linear combination of the input features. The formula for linear regression is:


$$
\mathbf{y} = \mathbf{X} \mathbf{w} + \mathbf{b}
$$

where:

- $\mathbf{y}$ is the vector of predicted outputs.
- $\mathbf{X}$ is the feature matrix, where each row represents a data point and each column represents a feature. If there are $n$ data points and $p$ features, then $\mathbf{X}$ has shape $n \times p$.
- $\mathbf{w}$ is the weights vector (also called the coefficient vector), which has size $p \times 1$. Each weight corresponds to the importance or contribution of a specific feature.
- $\mathbf{b}$ is the bias term, which shifts the output up or down.

### Explanation of $\mathbf{X} \mathbf{w}$

1. Matrix-vector multiplication: in $\mathbf{X} \mathbf{w}$, each row of $\mathbf{X}$ (a single data point) is multiplied by the weights vector $\mathbf{w}$ to produce a prediction for that data point (illustrated in the sketch after this list).
2. Output: the result $\mathbf{X} \mathbf{w}$ is an $n \times 1$ vector of predictions for all $n$ data points. The bias term $\mathbf{b}$ is added to each element of this vector to produce the final prediction vector.
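
To see this in code, here is a minimal NumPy sketch; the values of `X`, `w`, and `b` below are made up purely for illustration:

```python
import numpy as np

# Feature matrix X: n = 3 data points (rows), p = 2 features (columns).
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Weights vector w: one coefficient per feature.
w = np.array([0.5, -1.0])

# Bias term b: a scalar that shifts every prediction up or down.
b = 2.0

# X @ w takes the dot product of each row of X with w, producing an
# n-element vector of predictions; b is then added to every element.
y = X @ w + b
print(y)  # [ 0.5 -0.5 -1.5]
```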

## 2. The Weights Vector $\mathbf{w}$

The weights vector, typically denoted $\mathbf{w}$, is a column vector that holds the coefficients assigned to each feature in linear regression. It defines the influence or contribution of each feature to the output prediction.

For a model with $p$ features, the weights vector $\mathbf{w}$ has size $p \times 1$ and can be represented as:


$$
\mathbf{w} = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_p \end{bmatrix}
$$

where each element $w_i$ of the weights vector is the coefficient for feature $x_i$:

- $w_1$ is the weight for the first feature.
- $w_2$ is the weight for the second feature.
- ...
- $w_p$ is the weight for the $p$-th feature.

For example, if the model has three features, then the weights vector would be:


$$
\mathbf{w} = \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix}
$$

In practice, these weights are learned during training by adjusting them to minimize the error between the model's predictions and the actual target values.
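
As an illustration of that training step, the sketch below fits the weights (with the bias folded in as an extra column of ones) using NumPy's least-squares solver on synthetic data; all names and values here are assumptions for the example, not part of the course material:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic data: n = 100 points, p = 3 features, with known "true" weights.
n, p = 100, 3
X = rng.normal(size=(n, p))
true_w = np.array([2.0, -1.0, 0.5])
true_b = 3.0
y = X @ true_w + true_b + rng.normal(scale=0.1, size=n)  # noisy targets

# Append a column of ones so the bias is learned as one more weight.
X_aug = np.hstack([X, np.ones((n, 1))])

# np.linalg.lstsq finds the weights minimizing ||X_aug @ w - y||^2.
w_hat, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
print(w_hat)  # close to [ 2.0, -1.0, 0.5, 3.0 ]
```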


## 3. The Feature Matrix $\mathbf{X}$

The feature matrix, denoted $\mathbf{X}$, contains all input data points for a linear regression model. In this matrix:

- Each row represents a single data point (observation).
- Each column represents a feature (variable) of that data point.

If there are $n$ data points and $p$ features, the feature matrix $\mathbf{X}$ has size $n \times p$ and the following structure:


$$
\mathbf{X} = \begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{bmatrix}
$$

where $x_{ij}$ is the value of the $j$-th feature for the $i$-th data point.

### Example of a Feature Matrix

For three data points and two features, the feature matrix would look like this:


$$
\mathbf{X} = \begin{bmatrix}
x_{11} & x_{12} \\
x_{21} & x_{22} \\
x_{31} & x_{32}
\end{bmatrix}
$$

In this example:

- The first row $[x_{11}, x_{12}]$ represents the features for the first data point.
- The second row $[x_{21}, x_{22}]$ represents the features for the second data point.
- The third row $[x_{31}, x_{32}]$ represents the features for the third data point.

During the linear regression calculation, each row of $\mathbf{X}$ is multiplied by the weights vector $\mathbf{w}$ to generate the prediction for that specific data point.
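
To make the row-by-row view concrete, here is a small sketch (arbitrary numbers) showing that the per-row dot products and the full matrix-vector product agree:

```python
import numpy as np

# Three data points, two features, as in the example above (arbitrary values).
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
w = np.array([10.0, 0.1])

# Prediction for data point i: dot product of row i of X with w.
row_predictions = np.array([X[i] @ w for i in range(len(X))])

# The matrix-vector product computes all of these dot products at once.
assert np.allclose(row_predictions, X @ w)
print(X @ w)  # [10.2 30.4 50.6]
```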

## Singularity in Linear Algebra

In linear algebra, a singularity typically refers to a singular matrix: a matrix that does not have an inverse. Here's what that means:

1. Singular matrix:
    - A matrix $\mathbf{A}$ is singular if its determinant is zero: $\det(\mathbf{A}) = 0$ (verified in the sketch after this list).
    - A singular matrix cannot be inverted: there is no matrix $\mathbf{A}^{-1}$ such that $\mathbf{A} \mathbf{A}^{-1} = \mathbf{I}$, where $\mathbf{I}$ is the identity matrix.
2. Implications of singularity:
    - Linearly dependent rows or columns: a matrix is singular if its rows or columns are linearly dependent, meaning that at least one row or column can be written as a linear combination of the others.
    - No unique solutions: when a singular matrix appears in a system of linear equations $\mathbf{A}\mathbf{x} = \mathbf{b}$, the system has either no solutions or infinitely many solutions, but never a unique solution.
    - Non-invertible transformations: as a transformation, a singular matrix squashes some or all of the space, so the transformation is not fully reversible.
3. Geometric interpretation:
    - As a transformation of a vector space, a singular matrix maps some vectors onto a lower-dimensional space (e.g., a 2D plane in 3D space), causing a loss of dimensionality. This often manifests as "flattening" or "collapsing" parts of the space onto each other.
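
The first two points are easy to verify numerically; the sketch below uses a made-up $2 \times 2$ matrix whose second row is twice its first:

```python
import numpy as np

# Rows are linearly dependent (row 2 = 2 * row 1), so A is singular.
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

print(np.linalg.det(A))  # 0.0 (up to floating-point round-off)

try:
    np.linalg.inv(A)  # inverting a singular matrix raises an error
except np.linalg.LinAlgError as err:
    print("Not invertible:", err)  # "Singular matrix"
```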

In summary, singularity in linear algebra indicates a loss of invertibility caused by linearly dependent rows or columns, yielding a matrix whose transformation cannot be uniquely reversed.