Why R plots Residuals vs Leverage instead of Residuals vs Factor Levels (ANOVA test & model with ‘aov’)

I’m analysing a data set with the weight of newborn babies and some info about their mothers, including a categorical variable ‘smoke’ – whether a mother is a smoker, or not.

I fitted a one-way ANOVA with aov and wanted to draw its diagnostic plots. I expected to get four plots, including a ‘Residuals vs Factor Levels’ plot. Instead, the fourth plot is ‘Residuals vs Leverage’, as if my categorical variable were a numeric one.

You can find the dataset here:

In the dataset, the variable ‘smoke’ has values 0 (non-smoker) and 1 (smoker). I used mutate to convert it into a proper factor (along with other variables, such as parity), then ran aov and plotted the result to verify the assumptions. My code is below:

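library(dplyr)  # provides %>% and mutate()

babies <- read.csv("babies.csv")

# Convert the 0/1 codes into labelled factors
babies <- babies %>%
  mutate(parity = factor(parity,
                         levels = c(0, 1),
                         labels = c("not firstborn", "firstborn"))) %>%
  mutate(smoke = factor(smoke,
                        levels = c(0, 1),
                        labels = c("non smoker", "smoker")))

# One-way ANOVA of birth weight on smoking status
model6 <- aov(bwt ~ smoke, data = babies)

# Draw the four default diagnostic plots in a 2 x 2 grid
par(mfrow = c(2, 2))
plot(model6)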

The result I’m getting in the fourth plot is this:

I checked whether ‘smoke’ is really a factor, like this:

> head(babies$smoke)
[1] non smoker non smoker smoker     non smoker smoker     non smoker
Levels: non smoker smoker
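
A further check that might be relevant: in a one-way model the leverage of an observation depends only on the size of its group (1/n for a group of n observations), so groups of unequal size give unequal leverages. As far as I can tell from the plot.lm help page, the ‘Residuals vs Factor Levels’ variant is only drawn when the leverage is constant. Below is a minimal sketch on made-up data; the `toy` data frame, its group sizes and its values are hypothetical and are not the babies dataset.

```r
# Hypothetical stand-in for the babies data: an unbalanced two-level factor
set.seed(1)
toy <- data.frame(
  smoke = factor(rep(c("non smoker", "smoker"), times = c(30, 20))),
  bwt   = rnorm(50, mean = 120, sd = 15)
)

fit <- aov(bwt ~ smoke, data = toy)

# Number of observations per factor level
group_sizes <- table(toy$smoke)
print(group_sizes)

# Leverage (hat value) of every observation; rounding removes
# floating-point noise before looking at the distinct values
lev <- hatvalues(fit)
distinct_lev <- sort(unique(round(lev, 10)))
print(distinct_lev)

# Compare with 1 / group size
print(sort(1 / as.numeric(group_sizes)))
```

Running the same `table()` and `hatvalues()` calls on the real model would show whether the smoker and non-smoker groups differ in size, and hence in leverage.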

Since ‘smoke’ is a factor (as far as I understand) and a categorical variable, why does the plot show leverage, as it would for a numeric variable? How can I fix this and get the proper plot?
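
For reference, a plot of residuals against the factor levels can also be drawn by hand, independently of what plot() chooses for its fourth panel. This is only a rough sketch of the kind of plot I am after, again on made-up data (the `toy` data frame is hypothetical), and it is not necessarily how plot() builds its own version.

```r
# Hypothetical stand-in for the babies data
set.seed(2)
toy <- data.frame(
  smoke = factor(rep(c("non smoker", "smoker"), times = c(30, 20))),
  bwt   = rnorm(50, mean = 120, sd = 15)
)

fit <- aov(bwt ~ smoke, data = toy)

# Standardized residuals, split by factor level
std_res <- rstandard(fit)
res_by_level <- split(std_res, toy$smoke)

# One box per level, with a reference line at zero
boxplot(res_by_level,
        xlab = "smoke",
        ylab = "Standardized residuals",
        main = "Residuals vs Factor Levels (drawn manually)")
abline(h = 0, lty = 2)
```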

Thanks for the help in advance!

