Goal of this lab session

We practice the diagnostic tools and data transformation using Canadian Survey of Labour and Income Dynamics (SLID). We want to replicate the analysis in Chapter 12 in JF.

Preparations

Let’s first load the data.

slid.data <- read.table("SLID-Ontario.txt", header = TRUE)
# show dimension
dim(slid.data)
## [1] 3997    4

Fit linear model in 12.1

\[ Wage = \beta_0 + \beta_1 \cdot Sex + \beta_2 \cdot Age + \beta_3 \cdot Education + \epsilon \]

lmod <- lm(compositeHourlyWages ~ sex + age + yearsEducation, data = slid.data)
summary(lmod)
## 
## Call:
## lm(formula = compositeHourlyWages ~ sex + age + yearsEducation, 
##     data = slid.data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -26.490  -4.302  -0.746   3.233  35.871 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -8.124231   0.598977  -13.56   <2e-16 ***
## sexMale         3.473670   0.207009   16.78   <2e-16 ***
## age             0.261293   0.008664   30.16   <2e-16 ***
## yearsEducation  0.929649   0.034257   27.14   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.542 on 3993 degrees of freedom
## Multiple R-squared:  0.3074, Adjusted R-squared:  0.3069 
## F-statistic: 590.7 on 3 and 3993 DF,  p-value: < 2.2e-16

Q.1

Write out the parameter estimates.

Write out the associated hypothesis tests for each p-values in the above summary table

What is the \(R^2\) value?

Q2, Replicate Figure 12.1 (a) Quantile-comparison plot for the studentized residuals from the SLID regression.

library(car)
qqPlot(lmod, distribution = , line = "none", envelop = list(style = "lines"))  # fill in the distribution type

Q3

Fit another linear regression with log2 (base 2) transformed wage as the response variable.

\[ \log_2(Wage) = \beta_0' + \beta_1' \cdot Sex + \beta_2' \cdot Age + \beta_3' \cdot Education + \epsilon' \]

lmod.2 <- lm()
summary(lmod.2)

Compare your parameter estimates to (12.2)

Report \(R^2\) value

Replicate Figure 12.2 (a)

qqPlot(lmod.2, distribution = , line = "none", 
       envelop = list(style = "lines", col = 1, alpha = 0.1))

Has the transformation corrected the positive skew in the original regression model?

Q4

Replicate Figure 12.3 (a) of Studentized residuals versus fitted values from the first model (without data transformation)

residualPlot(lmod, ) # look at the documentation of this function and finish it

Replicate Figure 12.4 (a) of studentized residuals versus fitted values from the second model (with log2 transformed wage)

residualPlot(lmod.2, )

Comment on your findings from the last two plots

Q5

Replicate Figure 12.6, Component-plus-residual plots for age and education

in the SLID regression of log wages on these variables and sex.

# plot all together
crPlots(lmod.2, col = "grey")

# plot one at a time
# I didn't figure out how to change line colors...
crPlot(lmod.2, variable = "age", col = "grey",
       smooth = list(smoother=loessLine, span = 0.4, col = "black"))

crPlot(lmod.2, variable = "yearsEducation", col = "grey",
       smooth = list(smoother=loessLine, span = 0.4, col = "black"))

Write out your findings