I’m analysing a dataset of newborn babies’ weights together with some information about their mothers, including a categorical variable ‘smoke’ that indicates whether the mother is a smoker.
I fitted a one-way ANOVA with aov() and wanted to use it to draw the diagnostic plots. I expected to get four plots, including a ‘Residuals vs Factor Levels’ plot. Instead, the fourth plot is ‘Residuals vs Leverage’, as if my categorical variable were numeric.
You can find the dataset here: https://drive.google.com/file/d/1VwiAHdYZF2BrGZZ875GGdkyamKMgxmGU/view?usp=sharing
In the dataset, ‘smoke’ takes the values 0 (non-smoker) and 1 (smoker). I used mutate() to convert it into a proper factor (along with other variables, such as parity), then fitted the aov model and plotted it to check the assumptions. My code is below:
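library(dplyr)  # provides %>% and mutate()

babies <- read.csv("babies.csv")

# Convert the 0/1 codes into labelled factors
babies <- babies %>%
  mutate(parity = factor(parity, levels = c(0, 1), labels = c("not firstborn", "firstborn"))) %>%
  mutate(smoke = factor(smoke, levels = c(0, 1), labels = c("non smoker", "smoker")))

# One-way ANOVA of birth weight on smoking status
model6 <- aov(babies$bwt ~ babies$smoke)

# Draw the diagnostic plots in a 2 x 2 grid
par(mfrow = c(2,2))
plot(aov(babies$bwt ~ babies$smoke))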
The fourth plot I get is ‘Residuals vs Leverage’ rather than ‘Residuals vs Factor Levels’.
I checked whether ‘smoke’ really is a factor, like this:
> head(babies$smoke)
non smoker non smoker smoker non smoker smoker non smoker
Levels: non smoker smoker
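In case the file is not accessible, here is a minimal self-contained sketch of the same steps that can be run without downloading anything. The data frame `sim`, its values, and the group sizes (60 and 40) are made up purely for illustration and are not taken from my dataset; only the variable names `bwt` and `smoke` and the factor labels match my code.

```r
# Simulated stand-in for babies.csv. All numbers here are hypothetical,
# chosen only so that the example runs on its own.
set.seed(1)
sim <- data.frame(
  bwt   = c(rnorm(60, mean = 120, sd = 17), rnorm(40, mean = 115, sd = 18)),
  smoke = factor(rep(c(0, 1), times = c(60, 40)),
                 levels = c(0, 1), labels = c("non smoker", "smoker"))
)

is.factor(sim$smoke)  # check that the predictor is stored as a factor
table(sim$smoke)      # number of observations in each group

# Same model and plotting calls as in my code, using the data argument
fit <- aov(bwt ~ smoke, data = sim)
par(mfrow = c(2, 2))
plot(fit)             # the four default diagnostic plots
```

The only deliberate difference from my code is passing `data = sim` instead of writing `sim$bwt ~ sim$smoke`, which I assume does not change which plots are produced.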
Since ‘smoke’ is a factor (as far as I understand) and therefore categorical, why does the fourth plot show leverage, as it would for a numeric variable? How can I fix this and get the ‘Residuals vs Factor Levels’ plot?
Thanks for the help in advance!