Python v R Series: Part 08: Saving Data
Introduction
Saving data in R and Python is fairly similar. The examples were inspired by Matt Dancho and his incredible library of training videos on Business Science University.
In the R examples, we will be using the RStudio IDE. It is assumed that you have a project set up in the folder where the data files exist. For more information about getting a project set up in R Studio and the Python environment set up in VS Code, click the links below.
R: Saving Data
library(tidyverse) library(readxl) bikes <- read_excel("00_Data_Files/bikes.xlsx") bikeshops <- read_excel("00_Data_Files/bikeshops.xlsx") orderlines <- read_excel("00_Data_Files/orderlines.xlsx")
bikes bikeshops orderlines
bikes_data_joined <- orderlines %>% left_join(bikes, by=c("product.id" = "bike.id")) %>% left_join(bikeshops, by=c("customer.id" = "bikeshop.id")) bikes_data_joined %>% glimpse()
df <- bikes_data_joined %>%
## Split Description into 3 columns separate(description, c("category.1", "category.2", "frame.material"), " - ") %>% ## Split location into 2 columns separate(location, c("city", "state"), ", ", remove = FALSE) %>% glimpse()
df <- df %>% mutate(total.price = quantity * price) %>% glimpse()
df <- df %>% select(order.id, order.line, order.date, model, quantity, price, total.price, bikeshop.name, location, category.1, category.2, frame.material, city, state) %>% glimpse()
names(df)
names(df) <- str_replace_all(names(df), "\\.", "_") df %>% glimpse()
df_2 <- df
Python: Rename Columns
# 1.0 Load Libraries ---- # # Load Libraries # Core Python Data Analysis from numpy.core.defchararray import index import pandas as pd import numpy as np
bikes_df = pd.read_excel("00_data_raw/bikes.xlsx")
bikeshops_df = pd.read_excel("00_data_raw/bikeshops.xlsx")
orderlines_df = pd.read_excel("00_data_raw/orderlines.xlsx")
bikes_df = pd.read_excel("00_data_raw/bikes.xlsx")
bikeshops_df = pd.read_excel("00_data_raw/bikeshops.xlsx")
orderlines_df = pd.read_excel("00_data_raw/orderlines.xlsx")
bike_orderlines_joined_df = orderlines_df \ .drop(columns='Unnamed: 0', axis=1) \ .merge( right = bikes_df, how='left', left_on='product.id', right_on='bike.id' ) \ .merge( right=bikeshops_df, how = 'left', left_on='customer.id', right_on='bikeshop.id' )
bike_orderlines_joined_df
bike_orderlines_joined_df.info()
df = bike_orderlines_joined_df df.T
temp_df = df['description'].str.split(pat=' - ', expand = True) df['category.1'] = temp_df[0] df['category.2'] = temp_df[1] df['frame.material'] = temp_df[2] df.T
temp_df = df['location'].str.split(', ', n = 1, expand = True) df['city'] = temp_df[0] df['state'] = temp_df[1] df.T
df['total.price'] = df['quantity'] * df['price'] df.T
df.columns cols_to_keep_list = [ 'order.id', 'order.line', 'order.date', 'model', 'quantity', 'price', 'total.price', 'bikeshop.name', 'location', 'category.1', 'category.2', 'frame.material', 'city', 'state' ] cols_to_keep_list df = df[cols_to_keep_list] df.T
df.columns = df.columns.str.replace(".", "_") df.T
We can perform that same exercise in Python if we want to save our work as a different object. Since Python does not make copies of data frames, the benefit of renaming the data frame object is that we do not have to run long sections of code each time we want to test new code out. To keep things simple and consistent, we will save the new data set as df_2. The code is pretty straightforward in Python as well.
df_2 = df.copy()
Conclusion
The process of saving data sets to new objects in R and Python was very similar. In the next section, we will look at the similarities and differences between aggregations in R and Python.
For the complete list of R v Python topics, click on the links below.