Python v R Series: Part 08: Saving Data

 


Introduction

Saving data in R and Python is fairly similar.  The examples were inspired by Matt Dancho and his incredible library of training videos on Business Science University

In the R examples, we will be using the RStudio IDE.  It is assumed that you have a project set up in the folder where the data files exist. For more information about getting a project set up in R Studio and the Python environment set up in VS Code, click the links below.


R: Saving Data

In the previous articles, we detailed how to perform the following functions listed below.  For more information, click on the links below.


library(tidyverse)
library(readxl)

bikes      <- read_excel("00_Data_Files/bikes.xlsx")
bikeshops  <- read_excel("00_Data_Files/bikeshops.xlsx")
orderlines <- read_excel("00_Data_Files/orderlines.xlsx")

bikes
bikeshops
orderlines

bikes_data_joined <- orderlines %>% 
  left_join(bikes, by=c("product.id" = "bike.id")) %>%
  left_join(bikeshops, by=c("customer.id" = "bikeshop.id"))

bikes_data_joined %>% glimpse()

df <- bikes_data_joined %>%
  ## Split Description into 3 columns
  separate(description, c("category.1", "category.2", "frame.material"), " - ") %>%
  ## Split location into 2 columns
  separate(location, c("city", "state"), ", ", remove = FALSE) %>%
  glimpse()

df <- df %>% 
  mutate(total.price = quantity * price) %>% 
  glimpse()

df <- df %>%
  select(order.id,
         order.line,
         order.date,
         model,
         quantity,
         price,
         total.price,
         bikeshop.name,
         location,
         category.1,
         category.2,
         frame.material,
         city,
         state) %>%
  glimpse()

names(df)

names(df) <- str_replace_all(names(df), "\\.", "_")

df %>% glimpse()


If we now want to save our work as a different object we can pipe it into another name.  The benefit of this is that we can work with this new data set without overwriting previous code.  To keep things simple and consistent, we will save the new data set as df_2.  The code is pretty straightforward in R.

df_2 <- df



Python: Rename Columns

In the previous articles, we detailed how to perform the following functions listed below.  For more information, click on the links below.


# 1.0 Load Libraries ----

# # Load Libraries

# Core Python Data Analysis
from numpy.core.defchararray import index
import pandas as pd
import numpy as np

bikes_df      = pd.read_excel("00_data_raw/bikes.xlsx")
bikeshops_df  = pd.read_excel("00_data_raw/bikeshops.xlsx")
orderlines_df = pd.read_excel("00_data_raw/orderlines.xlsx")

bikes_df      = pd.read_excel("00_data_raw/bikes.xlsx")
bikeshops_df  = pd.read_excel("00_data_raw/bikeshops.xlsx")
orderlines_df = pd.read_excel("00_data_raw/orderlines.xlsx")

bike_orderlines_joined_df = orderlines_df \
    .drop(columns='Unnamed: 0', axis=1) \
    .merge(
        right = bikes_df,
        how='left',
        left_on='product.id',
        right_on='bike.id'
    ) \
    .merge(
        right=bikeshops_df,
        how = 'left',
        left_on='customer.id',
        right_on='bikeshop.id'
    )

bike_orderlines_joined_df
bike_orderlines_joined_df.info()

df = bike_orderlines_joined_df
df.T


temp_df = df['description'].str.split(pat=' - ', expand = True)

df['category.1'] = temp_df[0]
df['category.2'] = temp_df[1]
df['frame.material'] = temp_df[2]

df.T

temp_df = df['location'].str.split(', ', n = 1, expand = True)

df['city'] = temp_df[0]
df['state'] = temp_df[1]

df.T

df['total.price'] = df['quantity'] * df['price']

df.T

df.columns

cols_to_keep_list = [
    'order.id', 
    'order.line', 
    'order.date', 
    'model',  
    'quantity',
    'price', 
    'total.price',
    'bikeshop.name', 
    'location', 
    'category.1', 
    'category.2',
    'frame.material', 
    'city', 
    'state'
]

cols_to_keep_list

df = df[cols_to_keep_list]

df.T

df.columns = df.columns.str.replace(".", "_")

df.T


We can perform that same exercise in Python if we want to save our work as a different object. Since Python does not make copies of data frames, the benefit of renaming the data frame object is that we do not have to run long sections of code each time we want to test new code out.  To keep things simple and consistent, we will save the new data set as df_2.  The code is pretty straightforward in Python as well.

df_2 = df.copy()


Conclusion

The process of saving data sets to new objects in R and Python was very similar.  In the next section, we will look at the similarities and differences between aggregations in R and  Python. 

For the complete list of R v Python topics, click on the links below.


Popular posts from this blog

MySQL Part 1: Getting MySQL Set Up in goormIDE

Do Popular Market Index Returns Follow a Normal Distribution?