Python v R Series: Part 02: Join Data
Introduction
Joining and merging data are straightforward to do in both R and Python. The examples were inspired by Matt Dancho and his incredible library of training videos on Business Science University.
In the R examples, we will be using the RStudio IDE. It is assumed that you have a project set up in the folder where the data files exist. For more information about getting a project set up in R Studio and the Python environment set up in VS Code, click the links below.
In both examples, we will be importing and joining 3 different excel data sets. Do not worry if you do not have the exact data sets to follow along. You can simply use a data set of your own and follow along.
R: Joining Data
library(tidyverse) library(readxl) bikes <- read_excel("00_Data_Files/bikes.xlsx") bikeshops <- read_excel("00_Data_Files/bikeshops.xlsx") orderlines <- read_excel("00_Data_Files/orderlines.xlsx")
bikes bikeshops orderlines
bikes_data_joined <- orderlines %>% left_join(bikes, by=c("product.id" = "bike.id")) %>% left_join(bikeshops, by=c("customer.id" = "bikeshop.id")) bikes_data_joined %>% glimpse()
Python: Joining Data
# 1.0 Load Libraries ---- # # Load Libraries # Core Python Data Analysis from numpy.core.defchararray import index import pandas as pd import numpy as np
bikes_df = pd.read_excel("00_data_raw/bikes.xlsx")
bikeshops_df = pd.read_excel("00_data_raw/bikeshops.xlsx")
orderlines_df = pd.read_excel("00_data_raw/orderlines.xlsx")
The data tables listed below give the same results as the R example above.
We will perform the same exact joins as we did in the R example above. You will notice slight differences in the code, but the end result is precisely the same.
bike_orderlines_joined_df = orderlines_df \ .drop(columns='Unnamed: 0', axis=1) \ .merge( right = bikes_df, how='left', left_on='product.id', right_on='bike.id' ) \ .merge( right=bikeshops_df, how = 'left', left_on='customer.id', right_on='bikeshop.id' )
bike_orderlines_joined_df
bike_orderlines_joined_df.info()
Conclusion
The joining process between R and Python was again almost identical. Both the platforms make it incredibly easy to join the three data sets.
For the complete list of R v Python topics, click on the links below.
03: Change Data Types
04: String Splits
05: Calculate New Columns
06: Organize Columns
07: Rename Columns
08: Saving Data
09: Aggregations