R v Python Series: Part 05: Calculate New Columns

 


Introduction

Adding columns in both R and Python is pretty straightforward.  The examples were inspired by Matt Dancho and his incredible library of training videos on Business Science University

In the R examples, we will be using the RStudio IDE.  It is assumed that you have a project set up in the folder where the data files exist. For more information about getting a project set up in R Studio and the Python environment set up in VS Code, click the links below.


R: Mutate (Add) Columns

In the previous articles, we detailed how to perform the following functions listed below.  For more information, click on the links below.


library(tidyverse)
library(readxl)

bikes      <- read_excel("00_Data_Files/bikes.xlsx")
bikeshops  <- read_excel("00_Data_Files/bikeshops.xlsx")
orderlines <- read_excel("00_Data_Files/orderlines.xlsx")

bikes
bikeshops
orderlines

bikes_data_joined <- orderlines %>% 
  left_join(bikes, by=c("product.id" = "bike.id")) %>%
  left_join(bikeshops, by=c("customer.id" = "bikeshop.id"))

bikes_data_joined %>% glimpse()

df <- bikes_data_joined %>%
  ## Split Description into 3 columns
  separate(description, c("category.1", "category.2", "frame.material"), " - ") %>%
  ## Split location into 2 columns
  separate(location, c("city", "state"), ", ", remove = FALSE) %>%
  glimpse()











Next, we are going to add a calculated column called total.price.  This column multiplies the price times the quantity to get the order total.  The code is listed below, with the new column highlighted as well.

df <- df %>% 
  mutate(total.price = quantity * price) %>% 
  glimpse()



Python: (Add) Columns

In the previous articles, we detailed how to perform the following functions listed below.  For more information, click on the links below.


# 1.0 Load Libraries ----

# # Load Libraries

# Core Python Data Analysis
from numpy.core.defchararray import index
import pandas as pd
import numpy as np

bikes_df      = pd.read_excel("00_data_raw/bikes.xlsx")
bikeshops_df  = pd.read_excel("00_data_raw/bikeshops.xlsx")
orderlines_df = pd.read_excel("00_data_raw/orderlines.xlsx")

bikes_df      = pd.read_excel("00_data_raw/bikes.xlsx")
bikeshops_df  = pd.read_excel("00_data_raw/bikeshops.xlsx")
orderlines_df = pd.read_excel("00_data_raw/orderlines.xlsx")

bike_orderlines_joined_df = orderlines_df \
    .drop(columns='Unnamed: 0', axis=1) \
    .merge(
        right = bikes_df,
        how='left',
        left_on='product.id',
        right_on='bike.id'
    ) \
    .merge(
        right=bikeshops_df,
        how = 'left',
        left_on='customer.id',
        right_on='bikeshop.id'
    )

bike_orderlines_joined_df
bike_orderlines_joined_df.info()

df = bike_orderlines_joined_df
df.T


temp_df = df['description'].str.split(pat=' - ', expand = True)

df['category.1'] = temp_df[0]
df['category.2'] = temp_df[1]
df['frame.material'] = temp_df[2]

df.T


temp_df = df['location'].str.split(', ', n = 1, expand = True)

df['city'] = temp_df[0]
df['state'] = temp_df[1]

df.T


Similar to the R example, we are going to add a calculated column called total.price.  This column multiplies the price times the quantity to get the order total.  The code is listed below, with the new column highlighted as well.

df['total.price'] = df['quantity'] * df['price']

df.T





Conclusion

The process of adding calculated columns in R and Python was very comparable.  In the next section, we will organize and eliminate unneeded columns in our data sets.

For the complete list of R v Python topics, click on the links below.


Popular posts from this blog

MySQL Part 1: Getting MySQL Set Up in goormIDE

Do Popular Market Index Returns Follow a Normal Distribution?