Suppose we have 3 observations $(x_i, y_i)$: $(1, 2)$, $(2, 4)$, and $(3, 5)$.
Our model is
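$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \varepsilon_i \sim \text{Normal}(0, \sigma^2)$$
The design matrix is
$$X = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}$$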
We know that we can use the following code to find the estimates of β0 and β1:
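# Design matrix: a column of ones for the intercept, then the x values
X <- cbind(
  c(1, 1, 1),
  c(1, 2, 3)
)
# Responses as a 3 x 1 column matrix
y <- matrix(c(2, 4, 5))
# Least squares estimates from the normal equations
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y
beta_hat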
## [,1]
## [1,] 0.6666667
## [2,] 1.5000000
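As a cross-check, the same estimates can be obtained in two other ways. The sketch below assumes only base R: `lm()` builds the intercept column itself, and `solve(A, b)` solves the normal equations without forming the inverse explicitly. The names `x`, `fit`, and `beta_hat_alt` are introduced here for illustration.

```r
# Same data as in the example, entered as plain vectors
x <- c(1, 2, 3)
y <- c(2, 4, 5)

# lm() adds the intercept column automatically
fit <- lm(y ~ x)
coef(fit)

# Solve (X'X) b = X'y directly instead of inverting X'X
X <- cbind(1, x)
beta_hat_alt <- solve(crossprod(X), crossprod(X, y))
beta_hat_alt
```

Both results should agree with `beta_hat` up to floating-point error.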
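Here is a picture of the RSS as a function of $\beta_0$ and $\beta_1$, with our estimates $(\hat\beta_0, \hat\beta_1)$ shown with a red point:
$$\text{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \{2 - (\beta_0 + \beta_1 \cdot 1)\}^2 + \{4 - (\beta_0 + \beta_1 \cdot 2)\}^2 + \{5 - (\beta_0 + \beta_1 \cdot 3)\}^2$$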
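The RSS formula can also be evaluated numerically. This is a minimal sketch in which `rss()` is a helper name introduced here, not part of the original code. It evaluates RSS at the least squares estimates and at two nearby points.

```r
x <- c(1, 2, 3)
y <- c(2, 4, 5)

# RSS for a candidate pair (b0, b1)
rss <- function(b0, b1) sum((y - (b0 + b1 * x))^2)

rss(2/3, 1.5)        # at the least squares estimates
rss(2/3 + 0.1, 1.5)  # intercept nudged upward
rss(2/3, 1.5 - 0.1)  # slope nudged downward
```

The first value should be the smallest of the three, consistent with a single minimum on the RSS surface.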
Now suppose we have 3 observations $(x_i, y_i)$: $(2, 2)$, $(2, 4)$, and $(2, 5)$.
Our model is
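$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \varepsilon_i \sim \text{Normal}(0, \sigma^2)$$
The design matrix is
$$X = \begin{bmatrix} 1 & 2 \\ 1 & 2 \\ 1 & 2 \end{bmatrix}$$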
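We know that there is not a unique $\hat\beta$ that minimizes RSS because the columns of $X$ are not linearly independent: the second column is 2 times the first, so $X^\top X$ is singular and cannot be inverted.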
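The linear dependence can be confirmed numerically. The sketch below assumes only base R and checks the rank of this design matrix and the determinant of `t(X) %*% X`.

```r
X <- cbind(
  c(1, 1, 1),
  c(2, 2, 2)
)

# Number of linearly independent columns; full rank would be 2
qr(X)$rank

# t(X) %*% X and its determinant
XtX <- crossprod(X)
XtX
det(XtX)
```

A rank below 2 and a determinant of zero (up to rounding) mean that `solve(t(X) %*% X)` cannot be used to compute the estimates here.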
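Here is a picture of the RSS as a function of $\beta_0$ and $\beta_1$:
$$\text{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \{2 - (\beta_0 + \beta_1 \cdot 2)\}^2 + \{4 - (\beta_0 + \beta_1 \cdot 2)\}^2 + \{5 - (\beta_0 + \beta_1 \cdot 2)\}^2$$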
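Note that there is no unique pair $(\beta_0, \beta_1)$ that minimizes RSS: here RSS depends on the coefficients only through $\beta_0 + 2\beta_1$, so every pair with $\beta_0 + 2\beta_1 = \bar{y} = 11/3$ attains the same minimum.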
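The lack of a unique minimizer can be seen numerically. Every fitted value in this example equals $\beta_0 + 2\beta_1$, so pairs with the same value of that sum give the same RSS. In this sketch `rss()` is a helper name introduced here, and the three pairs are arbitrary choices with $\beta_0 + 2\beta_1 = \bar{y}$.

```r
x <- c(2, 2, 2)
y <- c(2, 4, 5)

# RSS for a candidate pair (b0, b1)
rss <- function(b0, b1) sum((y - (b0 + b1 * x))^2)

# Three different pairs, each with b0 + 2 * b1 equal to mean(y)
rss(mean(y), 0)
rss(0, mean(y) / 2)
rss(mean(y) - 2, 1)
```

All three calls should return the same value, which is the minimum RSS for these data.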