Python v R Series: Part 07: Renaming Columns

 


Introduction

Renaming columns in both R and Python is simple.  The examples were inspired by Matt Dancho and his incredible library of training videos on Business Science University

In the R examples, we will be using the RStudio IDE.  It is assumed that you have a project set up in the folder where the data files exist. For more information about getting a project set up in R Studio and the Python environment set up in VS Code, click the links below.


R: Rename Columns

In the previous articles, we detailed how to perform the following functions listed below.  For more information, click on the links below.


library(tidyverse)
library(readxl)

bikes      <- read_excel("00_Data_Files/bikes.xlsx")
bikeshops  <- read_excel("00_Data_Files/bikeshops.xlsx")
orderlines <- read_excel("00_Data_Files/orderlines.xlsx")

bikes
bikeshops
orderlines

bikes_data_joined <- orderlines %>% 
  left_join(bikes, by=c("product.id" = "bike.id")) %>%
  left_join(bikeshops, by=c("customer.id" = "bikeshop.id"))

bikes_data_joined %>% glimpse()

df <- bikes_data_joined %>%
  ## Split Description into 3 columns
  separate(description, c("category.1", "category.2", "frame.material"), " - ") %>%
  ## Split location into 2 columns
  separate(location, c("city", "state"), ", ", remove = FALSE) %>%
  glimpse()

df <- df %>% 
  mutate(total.price = quantity * price) %>% 
  glimpse()

df <- df %>%
  select(order.id,
         order.line,
         order.date,
         model,
         quantity,
         price,
         total.price,
         bikeshop.name,
         location,
         category.1,
         category.2,
         frame.material,
         city,
         state) %>%
  glimpse()

Now that we have only the columns we need, we can rename some of these columns.  Before we rename the columns, we can get a list of the names of the columns using the names function.

names(df)




Using that same function, we can then use the str_replace_all function to replace all periods with underscores.

names(df) <- str_replace_all(names(df), "\\.", "_")

df %>% glimpse()

The df column headers now have underscores in the titles as opposed to the periods.








Python: Rename Columns

In the previous articles, we detailed how to perform the following functions listed below.  For more information, click on the links below.


# 1.0 Load Libraries ----

# # Load Libraries

# Core Python Data Analysis
from numpy.core.defchararray import index
import pandas as pd
import numpy as np

bikes_df      = pd.read_excel("00_data_raw/bikes.xlsx")
bikeshops_df  = pd.read_excel("00_data_raw/bikeshops.xlsx")
orderlines_df = pd.read_excel("00_data_raw/orderlines.xlsx")

bikes_df      = pd.read_excel("00_data_raw/bikes.xlsx")
bikeshops_df  = pd.read_excel("00_data_raw/bikeshops.xlsx")
orderlines_df = pd.read_excel("00_data_raw/orderlines.xlsx")

bike_orderlines_joined_df = orderlines_df \
    .drop(columns='Unnamed: 0', axis=1) \
    .merge(
        right = bikes_df,
        how='left',
        left_on='product.id',
        right_on='bike.id'
    ) \
    .merge(
        right=bikeshops_df,
        how = 'left',
        left_on='customer.id',
        right_on='bikeshop.id'
    )

bike_orderlines_joined_df
bike_orderlines_joined_df.info()

df = bike_orderlines_joined_df
df.T


temp_df = df['description'].str.split(pat=' - ', expand = True)

df['category.1'] = temp_df[0]
df['category.2'] = temp_df[1]
df['frame.material'] = temp_df[2]

df.T

temp_df = df['location'].str.split(', ', n = 1, expand = True)

df['city'] = temp_df[0]
df['state'] = temp_df[1]

df.T

df['total.price'] = df['quantity'] * df['price']

df.T

df.columns

cols_to_keep_list = [
    'order.id', 
    'order.line', 
    'order.date', 
    'model', 
    'quantity',
    'price', 
    'total.price',
    'bikeshop.name', 
    'location', 
    'category.1', 
    'category.2',
    'frame.material', 
    'city', 
    'state'
]

cols_to_keep_list

df = df[cols_to_keep_list]

df.T

Now that we have only the columns we need in Python, we can rename some of these columns.  Currently, the headers use periods as separators.  



Similar to the script in R, we can find all periods and replace them with underscores.  

df.columns = df.columns.str.replace(".", "_")

df.T

The df column headers now have underscores in the titles as opposed to the periods.




Conclusion

The process of renaming columns in R and Python was very similar.  In the next section, we will save our data results so that we do not have to re-run large amounts of code to interact with our data results going forward. 

For the complete list of R v Python topics, click on the links below.


Popular posts from this blog

MySQL Part 1: Getting MySQL Set Up in goormIDE

Do Popular Market Index Returns Follow a Normal Distribution?