Suppose we have 3 observations, \((x_1, y_1) = (1, 2)\), \((x_2, y_2) = (2, 4)\), and \((x_3, y_3) = (3, 5)\).
Our model is
\[ \begin{aligned} y_i &= \beta_0 + \beta_1 x_i + \varepsilon_i \\ \varepsilon_i &\sim \text{Normal}(0, \sigma^2) \end{aligned} \]
The design matrix is \(X = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}\)
We can compute the least-squares estimates \(\hat{\beta} = (X^\top X)^{-1} X^\top y\) of \(\beta_0\) and \(\beta_1\) with the following code:
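# Design matrix: a column of 1s for the intercept, then the x values
X <- cbind(
  c(1, 1, 1),
  c(1, 2, 3)
)
# Response vector as a 3 x 1 column matrix
y <- matrix(c(2, 4, 5))
# Normal equations: beta_hat = (X'X)^{-1} X'y
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y
beta_hat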
## [,1]
## [1,] 0.6666667
## [2,] 1.5000000
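As a cross-check, R's built-in `lm()` should return the same estimates. It solves the least-squares problem through a QR decomposition of the design matrix rather than by explicitly inverting \(X^\top X\). A minimal sketch using the same data:

```r
# Same data as above: x values 1, 2, 3 and responses 2, 4, 5
x <- c(1, 2, 3)
y <- c(2, 4, 5)

# lm() builds the design matrix (intercept column plus x) and fits by least squares
fit <- lm(y ~ x)

# Expected to agree with beta_hat (about 0.667 and 1.5)
coef(fit)
```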
Here is a picture of the RSS as a function of \(\beta_0\) and \(\beta_1\), with our estimates \((\hat{\beta}_0, \hat{\beta}_1)\) shown with a red point:
\[\text{RSS}(\beta_0, \beta_1) = \sum_{i = 1}^n \{y_i - (\beta_0 + \beta_1 x_i)\}^2 = \{2 - (\beta_0 + \beta_1 \cdot 1)\}^2 + \{4 - (\beta_0 + \beta_1 \cdot 2)\}^2 + \{5 - (\beta_0 + \beta_1 \cdot 3)\}^2\]
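A picture like this is typically drawn from RSS values evaluated on a grid of candidate coefficients. The sketch below computes such a grid and checks that its smallest value sits next to the estimates from `solve()`. The helper `rss()` and the grid limits and spacing are hypothetical choices made for illustration, not taken from the original figure.

```r
# Data from the first example
x <- c(1, 2, 3)
y <- c(2, 4, 5)

# Hypothetical helper: RSS for a candidate intercept b0 and slope b1
rss <- function(b0, b1) sum((y - (b0 + b1 * x))^2)

# Grid of candidate coefficients (limits and spacing are arbitrary choices)
b0_grid <- seq(-1, 2, by = 0.01)
b1_grid <- seq(0.5, 2.5, by = 0.01)

# RSS at every (b0, b1) grid point; rows index b0, columns index b1
rss_grid <- outer(b0_grid, b1_grid, Vectorize(rss))

# Row and column of the smallest RSS on the grid
best <- arrayInd(which.min(rss_grid), dim(rss_grid))
c(b0 = b0_grid[best[1, 1]], b1 = b1_grid[best[1, 2]])
```

Passing `b0_grid`, `b1_grid`, and `rss_grid` to base R's `contour()` or `persp()` would draw the surface itself.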
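Now suppose instead that we have 3 observations whose \(x\) values are all equal: \((x_1, y_1) = (2, 2)\), \((x_2, y_2) = (2, 4)\), and \((x_3, y_3) = (2, 5)\). Our model is the same: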
\[ \begin{aligned} y_i &= \beta_0 + \beta_1 x_i + \varepsilon_i \\ \varepsilon_i &\sim \text{Normal}(0, \sigma^2) \end{aligned} \]
The design matrix is \(X = \begin{bmatrix} 1 & 2 \\ 1 & 2 \\ 1 & 2 \end{bmatrix}\)
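Because the second column of \(X\) is exactly twice the first, the columns of \(X\) are linearly dependent. Then \(X^\top X\) is singular, so \((X^\top X)^{-1}\) does not exist and there is no unique \(\hat{\beta}\) that minimizes RSS.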
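The rank deficiency can be checked directly in R. This sketch rebuilds the second design matrix, confirms that its rank is 1 rather than 2, shows that `solve()` fails on \(X^\top X\), and shows how `lm()` handles the redundant column (by default it drops the aliased coefficient and reports `NA` for it). The variable names are illustrative.

```r
# Second example: every x_i equals 2
x <- c(2, 2, 2)
y <- c(2, 4, 5)
X <- cbind(1, x)  # design matrix: intercept column and x

# The second column is 2 times the first, so the rank is 1, not 2
qr(X)$rank

# X'X is singular, so solve() raises an error instead of returning an inverse
XtX <- t(X) %*% X
inv_ok <- tryCatch({ solve(XtX); TRUE }, error = function(e) FALSE)
inv_ok

# lm() drops the aliased slope: NA for x, and the intercept becomes mean(y)
coef(lm(y ~ x))
```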
Here is a picture of the RSS as a function of \(\beta_0\) and \(\beta_1\):
\[\text{RSS}(\beta_0, \beta_1) = \sum_{i = 1}^n \{y_i - (\beta_0 + \beta_1 x_i)\}^2 = \{2 - (\beta_0 + \beta_1 \cdot 2)\}^2 + \{4 - (\beta_0 + \beta_1 \cdot 2)\}^2 + \{5 - (\beta_0 + \beta_1 \cdot 2)\}^2\]
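Note that there is no unique pair \((\beta_0, \beta_1)\) that minimizes RSS. Since every \(x_i = 2\), RSS depends on the coefficients only through \(\beta_0 + 2\beta_1\), so every pair on the line \(\beta_0 + 2\beta_1 = \bar{y} = 11/3\) attains the same minimal RSS.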
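To see the flat valley numerically, this sketch evaluates RSS at three different pairs on the line \(\beta_0 + 2\beta_1 = 11/3\); all three give the same value, \(14/3\). The helper `rss()` and the chosen pairs are hypothetical, picked only to illustrate the point. A pair off the line, such as \((0, 0)\), gives a larger RSS.

```r
# Data from the second example
x <- c(2, 2, 2)
y <- c(2, 4, 5)

# Hypothetical helper: RSS for a candidate intercept b0 and slope b1
rss <- function(b0, b1) sum((y - (b0 + b1 * x))^2)

# Three pairs satisfying b0 + 2 * b1 = mean(y) = 11/3
pairs <- rbind(c(11/3, 0), c(5/3, 1), c(-1/3, 2))
rss_vals <- apply(pairs, 1, function(b) rss(b[1], b[2]))
rss_vals  # all equal to 14/3, up to floating-point rounding

# A pair off the line, for comparison
rss(0, 0)
```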