Why do results differ for dplyr left_join() and right_join() using these two dataframes?


I am learning how to use the R dplyr ‘join’ functions by doing the exercises from this course: https://github.com/uclouvain-cbio/WSBIM1207 and got stuck on the problem described below.

First, install the package that contains the example dataframes used for this question:

BiocManager::install("UCLouvain-CBIO/rWSBIM1207")

Load the package:

library(rWSBIM1207)

Then, in R/RStudio, load the two dataframes, ‘clinical2’ and ‘expression’, by typing:

data(clinical2)
data(expression)

The first task is:
Join the expression and clinical2 tables by the patient reference, using the left_join and the right_join functions.
I did that as follows:

library(dplyr)

left_join(expression, clinical2,
          by = c("patient" = "patientID"))
right_join(expression, clinical2,
           by = c("patient" = "patientID"))

The second task is to explain why the results differ. I found that the right_join output has 3 more rows than the left_join output. This seems odd to me, given that ‘clinical2’ has 516 rows whereas ‘expression’ has 570 rows. The 3 extra rows in the right_join output all contain multiple NA values, which presumably correspond to patients found in ‘clinical2’ but not in ‘expression’. I don’t really understand what is going on here and would be grateful for any help.
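To make the puzzle concrete, here is a minimal sketch on made-up data (the names `expression_toy` and `clinical_toy` and all values are hypothetical, not the course tables). It assumes the situation is similar to the real one: the left table has several rows per patient, and the right table has fewer rows overall but includes a patient who never appears in the left table. Under those assumptions, right_join returns more rows than left_join even though the right table is the smaller one.

```r
library(dplyr)

# Hypothetical toy data: several expression rows per patient
expression_toy <- data.frame(
  patient = c("A", "A", "A", "B"),
  value   = c(1.2, 0.8, 1.5, 2.1)
)

# Hypothetical clinical table: one row per patient, "D" has no expression rows
clinical_toy <- data.frame(
  patientID = c("A", "B", "D"),
  age       = c(61, 54, 70)
)

# left_join keeps every row of expression_toy
left_res <- left_join(expression_toy, clinical_toy,
                      by = c("patient" = "patientID"))

# right_join keeps every row of clinical_toy, repeated once per matching
# expression row, plus one NA-filled row for each unmatched patient
right_res <- right_join(expression_toy, clinical_toy,
                        by = c("patient" = "patientID"))

nrow(left_res)   # 4
nrow(right_res)  # 5

# Patients present in the clinical table but absent from the expression table
unmatched <- anti_join(clinical_toy, expression_toy,
                       by = c("patientID" = "patient"))
unmatched
```

Running the same `anti_join()` call on ‘clinical2’ and ‘expression’ (with the key columns from the question) would be one way to check whether the 3 extra rows are patients with no expression data.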



Source: https://stackoverflow.com/questions/70536460/why-do-results-differ-for-dplyr-left-join-and-right-join-using-these-two-dat
