Feature Scaling
This note addresses the question: should we also scale the predicted variable (usually denoted y) when the independent variables (usually denoted Xs) are scaled?
Load an example data set and separate the predicted variable from the features (X matrix).
rm(list = ls())
dat <- read.table("ex1data2.txt", header = FALSE, sep = ",")
X <- dat[, 1:2]
y <- dat[, 3]
# show the first 6 lines of the data
head(dat)
## V1 V2 V3
## 1 2104 3 399900
## 2 1600 3 329900
## 3 2400 3 369000
## 4 1416 2 232000
## 5 3000 4 539900
## 6 1985 4 299900
mod1 <- lm(V3 ~ V1 + V2, data = dat)
coef(mod1) # thetas
## (Intercept) V1 V2 
## 89597.9 139.2 -8738.0
predict(mod1, newdata = data.frame(V1 = 1650, V2 = 3))
## 1 
## 293081
# 293081
# ------------------------------------------
# Feature scaling and mean normalisation
# ------------------------------------------
dat2 <- as.data.frame(cbind(scale(dat[, 1:2]), dat[, 3]))
# y is in original form and Xs are scaled
head(dat2)
## V1 V2 V3
## 1 0.13001 -0.2237 399900
## 2 -0.50419 -0.2237 329900
## 3 0.50248 -0.2237 369000
## 4 -0.73572 -1.5378 232000
## 5 1.25748 1.0904 539900
## 6 -0.01973 1.0904 299900
# Build the linear model
mod2 <- lm(V3 ~ V1 + V2, data = dat2)
coef(mod2) # theta
## (Intercept) V1 V2 
## 340413 110631 -6649
predict(mod2, newdata = data.frame(V1 = 1650, V2 = 3))
## 1 
## 182861697
# 182861697 (wrong: the model was fitted on scaled features, but the new inputs were passed unscaled)
# Scale the new observation's features with the training means and standard
# deviations before predicting
V1 <- (1650 - colMeans(X)[1])/apply(X, 2, sd)[1]
V2 <- (3 - colMeans(X)[2])/apply(X, 2, sd)[2]
predict(mod2, newdata = data.frame(V1 = V1, V2 = V2))
## V1 
## 293081
# 293081.5
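The comparison above can be extended to the case where y is scaled as well. The following is a minimal sketch, not the original analysis: it uses only the six rows printed by `head(dat)`, so its numbers differ from the full-data fit, and the names `dat_small`, `fit_x`, `fit_xy`, `y_mean` and `y_sd` are introduced here for illustration. It assumes the same centring and scaling that `scale()` applies by default.

```r
# Only the six rows printed by head(dat) above; results differ from the full-data fit
dat_small <- data.frame(
  V1 = c(2104, 1600, 2400, 1416, 3000, 1985),
  V2 = c(3, 3, 3, 2, 4, 4),
  V3 = c(399900, 329900, 369000, 232000, 539900, 299900)
)
new_raw <- data.frame(V1 = 1650, V2 = 3)

# Reference fit in the original units
fit_raw <- lm(V3 ~ V1 + V2, data = dat_small)
pred_raw <- unname(predict(fit_raw, newdata = new_raw))

# scale() keeps the training means and standard deviations as attributes
X_scaled <- scale(dat_small[, c("V1", "V2")])
centers <- attr(X_scaled, "scaled:center")
sds <- attr(X_scaled, "scaled:scale")
new_scaled <- as.data.frame(scale(new_raw, center = centers, scale = sds))

# Case 1: scaled features, y left in its original units
fit_x <- lm(V3 ~ V1 + V2, data = data.frame(X_scaled, V3 = dat_small$V3))
pred_x <- unname(predict(fit_x, newdata = new_scaled))

# Case 2: y scaled too, so the prediction has to be mapped back to original units
y_mean <- mean(dat_small$V3)
y_sd <- sd(dat_small$V3)
fit_xy <- lm(V3 ~ V1 + V2,
             data = data.frame(X_scaled, V3 = (dat_small$V3 - y_mean) / y_sd))
pred_xy <- unname(predict(fit_xy, newdata = new_scaled)) * y_sd + y_mean

# All three routes should agree up to floating-point error
all.equal(pred_raw, pred_x)
all.equal(pred_raw, pred_xy)
```

In this sketch, scaling y changes the units of the thetas and adds a back-transformation step, but it does not change the prediction of an ordinary least-squares fit, which suggests that leaving y unscaled is the simpler choice here.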
The notes below show how to estimate the thetas with the normal equation.
Normal equation
X <- dat[, 1:2]
y <- dat[, 3]
X <- as.matrix(cbind(rep(1, nrow(X)), X))
colnames(X) <- paste("theta", 0:2, sep = "")
solve(t(X) %*% X) %*% t(X) %*% y
## [,1]
## theta0 89597.9
## theta1 139.2
## theta2 -8738.0
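The normal equation can also be applied to the scaled features, and the resulting thetas converted back to the original units. The sketch below illustrates this under stated assumptions: it again uses only the six rows printed by `head(dat)`, so the thetas differ from the full-data values above, and the names `dat_small`, `theta_std` and `theta_back` are hypothetical. It uses `crossprod()` in place of the explicit `t(X) %*% X` product, which computes the same quantity.

```r
# Only the six rows printed by head(dat); thetas differ from the full-data values
dat_small <- data.frame(
  V1 = c(2104, 1600, 2400, 1416, 3000, 1985),
  V2 = c(3, 3, 3, 2, 4, 4),
  V3 = c(399900, 329900, 369000, 232000, 539900, 299900)
)
y <- dat_small$V3

# Normal equation on the original features (column of ones for the intercept)
X_raw <- cbind(theta0 = 1, as.matrix(dat_small[, c("V1", "V2")]))
theta_raw <- solve(crossprod(X_raw), crossprod(X_raw, y))

# Normal equation on the scaled features
X_scaled <- scale(dat_small[, c("V1", "V2")])
mu <- attr(X_scaled, "scaled:center")  # training means
s <- attr(X_scaled, "scaled:scale")    # training standard deviations
X_std <- cbind(theta0 = 1, X_scaled)
theta_std <- solve(crossprod(X_std), crossprod(X_std, y))

# Convert back to original units:
#   slope_j   = scaled slope_j / s_j
#   intercept = scaled intercept - sum(slope_j * mu_j)
slopes <- theta_std[-1, 1] / s
intercept <- theta_std[1, 1] - sum(slopes * mu)
theta_back <- c(intercept, slopes)

# The two sets of thetas should match up to a small numerical tolerance
all.equal(unname(theta_back), unname(theta_raw[, 1]), tolerance = 1e-6)
```

With centred features the scaled intercept equals the mean of y, which matches the pattern seen in `coef(mod2)` above.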