Feature Scaling
This note addresses the question: should we also scale the predicted variable (usually denoted y) when the independent variables (usually denoted Xs) are scaled?
Load an example data set and separate the predicted variable from the features (X matrix).
rm(list = ls())
dat <- read.table("ex1data2.txt", header = FALSE, sep = ",")
X <- dat[, 1:2]
y <- dat[, 3]
# show the first 6 lines of the data
head(dat)
## V1 V2 V3
## 1 2104 3 399900
## 2 1600 3 329900
## 3 2400 3 369000
## 4 1416 2 232000
## 5 3000 4 539900
## 6 1985 4 299900
First, let us estimate the thetas without scaling any variables. Here I use R's lm to perform linear regression on the raw data.
mod1 <- lm(V3 ~ V1 + V2, data = dat)
coef(mod1) # thetas
## (Intercept) V1 V2
## 89597.9 139.2 -8738.0
Also print the predicted price for a house of 1650 square feet (V1) with 3 bedrooms (V2).
predict(mod1, newdata = data.frame(V1 = 1650, V2 = 3))
## 1
## 293081
# 293081
So the predicted price is $293,081. Now let's try scaling the Xs only.
# ------------------------------------------
# Feature scaling and mean normalisation
# ------------------------------------------
dat2 <- as.data.frame(cbind(scale(dat[, 1:2]), dat[, 3]))
# y is in original form and Xs are scaled
head(dat2)
## V1 V2 V3
## 1 0.13001 -0.2237 399900
## 2 -0.50419 -0.2237 329900
## 3 0.50248 -0.2237 369000
## 4 -0.73572 -1.5378 232000
## 5 1.25748 1.0904 539900
## 6 -0.01973 1.0904 299900
# Build the linear model
mod2 <- lm(V3 ~ V1 + V2, data = dat2)
coef(mod2) # theta
## (Intercept) V1 V2
## 340413 110631 -6649
Do the prediction using mod2 (Xs are scaled, but not y). The new inputs are supplied in their original, unscaled form.
predict(mod2, newdata = data.frame(V1 = 1650, V2 = 3))
## 1
## 182861697
# 182861697
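The size of this number follows from the coefficients of mod2: 110631 is the price change per one standard deviation of V1, yet it is multiplied here by the raw value 1650. A rough check, using the rounded coefficients printed above (so the result differs slightly from the exact prediction):

```r
# Rounded coefficients copied from coef(mod2) above
intercept <- 340413
theta_V1  <- 110631   # per standard deviation of V1, not per square foot
theta_V2  <- -6649    # per standard deviation of V2

# Plugging in unscaled inputs reproduces the inflated prediction
wrong <- intercept + theta_V1 * 1650 + theta_V2 * 3
wrong
```

The small gap from the value printed by predict comes only from rounding in the displayed coefficients.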
Hmm, the number is not the same as before. Clearly, this procedure is incorrect: the model was fitted on scaled Xs but given unscaled inputs. Let's try scaling the new inputs (1650 and 3) with the training means and standard deviations and using those instead.
# Scale and normalise the new inputs
V1 <- (1650 - colMeans(X)[1])/apply(X, 2, sd)[1]
V2 <- (3 - colMeans(X)[2])/apply(X, 2, sd)[2]
predict(mod2, newdata = data.frame(V1 = V1, V2 = V2))
## V1
## 293081
# 293081.5
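The manual scaling above can also be done by reusing the centre and scale that scale() stores as attributes on its result. The following is a minimal sketch, not the procedure used above. It assumes only the six rows shown by head(dat), so the numbers differ from the full-data results. The names centers, scales, new_raw and new_scaled are introduced here for illustration.

```r
# Six rows copied from head(dat); the full ex1data2.txt is not assumed here
dat <- data.frame(
  V1 = c(2104, 1600, 2400, 1416, 3000, 1985),
  V2 = c(3, 3, 3, 2, 4, 4),
  V3 = c(399900, 329900, 369000, 232000, 539900, 299900)
)

# Scale the features once and keep the training means and standard deviations
X_scaled <- scale(dat[, 1:2])
centers  <- attr(X_scaled, "scaled:center")
scales   <- attr(X_scaled, "scaled:scale")

dat2 <- data.frame(X_scaled, V3 = dat$V3)   # y stays in original units
mod1 <- lm(V3 ~ V1 + V2, data = dat)        # unscaled fit
mod2 <- lm(V3 ~ V1 + V2, data = dat2)       # scaled-X fit

# Apply the same training centre and scale to the new inputs
new_raw    <- data.frame(V1 = 1650, V2 = 3)
new_scaled <- as.data.frame(scale(new_raw, center = centers, scale = scales))

p1 <- predict(mod1, newdata = new_raw)
p2 <- predict(mod2, newdata = new_scaled)
all.equal(unname(p1), unname(p2))
```

Storing centers and scales alongside the model avoids recomputing them from X each time a new observation is scored.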
The answer is that it is OK not to scale the y variable when estimating thetas from a training set. If the thetas were estimated from scaled training-set Xs, the Xs in the test set must be scaled the same way (with the training means and standard deviations) before predicting y. Otherwise, the predicted y won't be right.
The notes below show how to estimate thetas with the normal equation.
Normal equation
X <- dat[, 1:2]
y <- dat[, 3]
X <- as.matrix(cbind(rep(1, nrow(X)), X))
colnames(X) <- paste("theta", 0:2, sep = "")
solve(t(X) %*% X) %*% t(X) %*% y
## [,1]
## theta0 89597.9
## theta1 139.2
## theta2 -8738.0
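To see why leaving y unscaled is harmless, the thetas from a scaled-X fit can be mapped back to the original units, where they should match the unscaled thetas. Each slope is divided by the feature's standard deviation, and the intercept is shifted by the scaled means. The sketch below again assumes only the six rows shown by head(dat), and uses lm rather than the normal equation. The names mu, sigma, slopes and theta_back are illustrative.

```r
# Six rows copied from head(dat); the full ex1data2.txt is not assumed here
dat <- data.frame(
  V1 = c(2104, 1600, 2400, 1416, 3000, 1985),
  V2 = c(3, 3, 3, 2, 4, 4),
  V3 = c(399900, 329900, 369000, 232000, 539900, 299900)
)

mu    <- colMeans(dat[, 1:2])      # training means
sigma <- sapply(dat[, 1:2], sd)    # training standard deviations

theta_raw    <- coef(lm(V3 ~ V1 + V2, data = dat))
dat2         <- data.frame(scale(dat[, 1:2]), V3 = dat$V3)
theta_scaled <- coef(lm(V3 ~ V1 + V2, data = dat2))

# y = a + sum_j b_j * (x_j - mu_j) / sigma_j
#   = (a - sum_j b_j * mu_j / sigma_j) + sum_j (b_j / sigma_j) * x_j
slopes     <- theta_scaled[c("V1", "V2")] / sigma
intercept  <- theta_scaled["(Intercept)"] - sum(slopes * mu)
theta_back <- c(intercept, slopes)

all.equal(unname(theta_back), unname(theta_raw))
```

The same relation is visible in the full-data output earlier: 110631 / 139.2 is roughly 795, which is consistent with the standard deviation of V1 implied by the scaled values.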