Why does R plot Residuals vs Leverage instead of Residuals vs Factor Levels (ANOVA test & model with ‘aov’)?


I’m analysing a data set with the weight of newborn babies and some info about their mothers, including a categorical variable ‘smoke’ – whether a mother is a smoker, or not.

I fitted a one-way ANOVA with aov and wanted to draw the diagnostic plots for that model. I expected to get four plots, including a ‘Residuals vs Factor Levels’ plot. Instead, I got a ‘Residuals vs Leverage’ plot, as if my categorical variable were numeric.

You can find the dataset here: https://drive.google.com/file/d/1VwiAHdYZF2BrGZZ875GGdkyamKMgxmGU/view?usp=sharing

In the dataset, the variable ‘smoke’ takes the values 0 (non-smoker) and 1 (smoker). I used mutate to turn it into a proper factor (along with other variables, such as parity), then fitted the aov model and tried to plot the diagnostics to verify the assumptions. My code is below:

library(dplyr)  # provides %>% and mutate()

babies <- read.csv("babies.csv")
babies <- babies %>% 
  mutate(parity = factor(parity, 
                         levels = c(0, 1), 
                         labels = c("not firstborn", "firstborn")),
         smoke = factor(smoke, 
                        levels = c(0, 1), 
                        labels = c("non smoker", "smoker")))

# Fit the model once with data = babies, then reuse the fitted object for the plots
model6 <- aov(bwt ~ smoke, data = babies)
par(mfrow = c(2, 2))
plot(model6)

The fourth plot comes out as ‘Residuals vs Leverage’ rather than ‘Residuals vs Factor Levels’.

I checked whether ‘smoke’ really is a factor, like this:

> head(babies$smoke)
[1] non smoker non smoker smoker     non smoker smoker     non smoker
Levels: non smoker smoker
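
A related check is to look at the group sizes and the leverages directly. The following is only a minimal sketch on made-up data: the `toy` data frame, its values and the 7/3 split are hypothetical stand-ins, not taken from babies.csv. In a one-way model each observation's leverage equals 1 divided by the size of its group, and the help page ?plot.lm notes that the factor-level version of the last plot is used when the leverages are constant, as is typical for a balanced design. Unequal group sizes would therefore be one plausible reason for seeing ‘Residuals vs Leverage’.

```r
# Hypothetical stand-in for babies.csv: two groups of unequal size, made-up weights
set.seed(1)
toy <- data.frame(
  bwt   = rnorm(10, mean = 120, sd = 15),
  smoke = factor(rep(c(0, 1), times = c(7, 3)),
                 levels = c(0, 1),
                 labels = c("non smoker", "smoker"))
)

# Confirm the predictor is a factor and count observations per level
is.factor(toy$smoke)
group_sizes <- table(toy$smoke)
print(group_sizes)

# Fit the one-way model and extract the leverage (hat value) of each observation
fit <- aov(bwt ~ smoke, data = toy)
lev <- hatvalues(fit)

# Average leverage per level: 1 / (group size) in a one-way layout
lev_by_group <- tapply(lev, toy$smoke, mean)
print(lev_by_group)

# TRUE only if every observation has (numerically) the same leverage
leverage_is_constant <- diff(range(lev)) < 1e-10 * mean(lev)
print(leverage_is_constant)
```

Running the same three checks on the real data (with `babies` in place of `toy`) would show whether the smoker and non-smoker groups differ in size and whether the leverages differ as a result.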

Since ‘smoke’ is a factor (as far as I understand) and therefore a categorical variable, why is leverage plotted as it would be for a numeric variable? How can I fix this and get the proper plot?

Thanks for the help in advance!



Source: https://stackoverflow.com/questions/70625885/why-r-plots-residuals-vs-leverage-instead-of-residuals-vs-factor-levels-anova-t
