Introduction

Saving data in R and Python is fairly similar. The examples were inspired by Matt Dancho and his incredible library of training videos on Business Science University.

In the R examples, we will be using the RStudio IDE. It is assumed that you have a project set up in the folder where the data files exist. For more information about getting a project set up in R Studio and the Python environment set up in VS Code, click the links below.

Setting Up Projects in R

Setting Up Python VS Code Environment

R: Saving Data

In the previous articles, we detailed how to perform the following functions listed below. For more information, click on the links below.

01: Data Imports

02: Join Data
03: Change Data Types
04: String Splits
05: Calculate New Columns
06: Organize Columns
07: Rename Columns

library(tidyverse)
library(readxl)

bikes      <- read_excel("00_Data_Files/bikes.xlsx")
bikeshops  <- read_excel("00_Data_Files/bikeshops.xlsx")
orderlines <- read_excel("00_Data_Files/orderlines.xlsx")

bikes
bikeshops
orderlines

bikes_data_joined <- orderlines %>% 
  left_join(bikes, by=c("product.id" = "bike.id")) %>%
  left_join(bikeshops, by=c("customer.id" = "bikeshop.id"))

bikes_data_joined %>% glimpse()

df <- bikes_data_joined %>%

  ## Split Description into 3 columns
  separate(description, c("category.1", "category.2", "frame.material"), " - ") %>%
  ## Split location into 2 columns
  separate(location, c("city", "state"), ", ", remove = FALSE) %>%
  glimpse()

df <- df %>% 
  mutate(total.price = quantity * price) %>% 
  glimpse()

df <- df %>%
  select(order.id,
         order.line,
         order.date,
         model,
         quantity,
         price,
         total.price,
         bikeshop.name,
         location,
         category.1,
         category.2,
         frame.material,
         city,
         state) %>%
  glimpse()

names(df)

names(df) <- str_replace_all(names(df), "\\.", "_")

df %>% glimpse()

If we now want to save our work as a different object we can pipe it into another name. The benefit of this is that we can work with this new data set without overwriting previous code. To keep things simple and consistent, we will save the new data set as df_2. The code is pretty straightforward in R.

df_2 <- df

Python: Rename Columns

In the previous articles, we detailed how to perform the following functions listed below. For more information, click on the links below.

01: Data Imports

02: Join Data
03: Change Data Types
04: String Splits
05: Calculate New Columns
06: Organize Columns
07: Rename Columns

# 1.0 Load Libraries ----

# # Load Libraries

# Core Python Data Analysis
from numpy.core.defchararray import index
import pandas as pd
import numpy as np

bikes_df      = pd.read_excel("00_data_raw/bikes.xlsx")
bikeshops_df  = pd.read_excel("00_data_raw/bikeshops.xlsx")
orderlines_df = pd.read_excel("00_data_raw/orderlines.xlsx")

bikes_df      = pd.read_excel("00_data_raw/bikes.xlsx")
bikeshops_df  = pd.read_excel("00_data_raw/bikeshops.xlsx")
orderlines_df = pd.read_excel("00_data_raw/orderlines.xlsx")

bike_orderlines_joined_df = orderlines_df \
    .drop(columns='Unnamed: 0', axis=1) \
    .merge(
        right = bikes_df,
        how='left',
        left_on='product.id',
        right_on='bike.id'
    ) \
    .merge(
        right=bikeshops_df,
        how = 'left',
        left_on='customer.id',
        right_on='bikeshop.id'
    )

bike_orderlines_joined_df

bike_orderlines_joined_df.info()

df = bike_orderlines_joined_df
df.T

temp_df = df['description'].str.split(pat=' - ', expand = True)

df['category.1'] = temp_df[0]
df['category.2'] = temp_df[1]
df['frame.material'] = temp_df[2]

df.T

temp_df = df['location'].str.split(', ', n = 1, expand = True)

df['city'] = temp_df[0]
df['state'] = temp_df[1]

df.T

df['total.price'] = df['quantity'] * df['price']

df.T

df.columns

cols_to_keep_list = [
    'order.id', 
    'order.line', 
    'order.date', 
    'model',  
    'quantity',
    'price', 
    'total.price',
    'bikeshop.name', 
    'location', 
    'category.1', 
    'category.2',
    'frame.material', 
    'city', 
    'state'
]

cols_to_keep_list

df = df[cols_to_keep_list]

df.T

df.columns = df.columns.str.replace(".", "_")

df.T

We can perform that same exercise in Python if we want to save our work as a different object. Since Python does not make copies of data frames, the benefit of renaming the data frame object is that we do not have to run long sections of code each time we want to test new code out. To keep things simple and consistent, we will save the new data set as df_2. The code is pretty straightforward in Python as well.

df_2 = df.copy()

Conclusion

The process of saving data sets to new objects in R and Python was very similar. In the next section, we will look at the similarities and differences between aggregations in R and Python.

For the complete list of R v Python topics, click on the links below.

01: Data Imports

02: Join Data
03: Change Data Types
04: String Splits
05: Calculate New Columns
06: Organize Columns
07: Rename Columns
08: Saving Data
09: Aggregations

Search This Blog

Data Exploration

Python v R Series: Part 08: Saving Data

Introduction

R: Saving Data

Python: Rename Columns

Conclusion

Popular posts from this blog

Website Time Series Anomaly Detection

MySQL Part 1: Getting MySQL Set Up in goormIDE

Do Popular Market Index Returns Follow a Normal Distribution?