Python v R Series: Part 03: Change Data Types

Introduction

Changing data types is simple in both R and Python. The examples were inspired by Matt Dancho and his incredible library of training videos on Business Science University.

In the R examples, we will be using the RStudio IDE. It is assumed that you have a project set up in the folder where the data files exist. For more information about setting up a project in RStudio and the Python environment in VS Code, click the links below.

In both examples, we will import and join three different Excel data sets. Do not worry if you do not have the exact data sets; you can follow along with a data set of your own.
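
If you do not have the Excel files, one option is to build small stand-in tables directly in pandas. The sketch below is hypothetical: only the key columns (bike.id, bikeshop.id, product.id, customer.id), order.date and the "Unnamed: 0" column that the Python code later drops come from this article; every other column name and all of the values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for bikes.xlsx, bikeshops.xlsx and orderlines.xlsx.
# Key column names match the article; other columns and all values are invented.
bikes_df = pd.DataFrame({
    "bike.id": [1, 2, 3],
    "model": ["Model A", "Model B", "Model C"],
    "price": [1000, 2500, 4000],
})

bikeshops_df = pd.DataFrame({
    "bikeshop.id": [10, 20],
    "bikeshop.name": ["Shop X", "Shop Y"],
})

orderlines_df = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],  # mimics the unnamed first column the Python code drops
    "order.date": pd.to_datetime(["2011-01-07", "2011-01-07", "2011-01-10"]),
    "customer.id": [10, 20, 10],
    "product.id": [2, 3, 1],
    "quantity": [1, 1, 2],
})

# order.date starts out as a date-time column
print(orderlines_df.dtypes)
```

With these three data frames defined, the Python join code in this article can be run without the read_excel() calls.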


R: Change Data Types

In the previous articles, we discussed how to correctly import, view, and join our data. The code below includes all of the code from the last article. Click the links below for more details.


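# Load libraries
library(tidyverse)  # dplyr verbs such as left_join() and mutate(), plus %>%
library(readxl)     # read_excel()

# Import the three Excel files
bikes      <- read_excel("00_Data_Files/bikes.xlsx")
bikeshops  <- read_excel("00_Data_Files/bikeshops.xlsx")
orderlines <- read_excel("00_Data_Files/orderlines.xlsx")

# Print each table to check that it imported correctly
bikes
bikeshops
orderlines

# Left join the bike details and bike shop details onto the order lines
bikes_data_joined <- orderlines %>%
  left_join(bikes, by = c("product.id" = "bike.id")) %>%
  left_join(bikeshops, by = c("customer.id" = "bikeshop.id"))

# glimpse() lists each column with its data type
bikes_data_joined %>% glimpse()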

Now that all three data sources are joined together, we can work through an example of changing a data type. The glimpse() output shows order.date as a dttm (date-time) data type. Because it is a date-time, you can use the built-in date functions in various R packages. However, suppose you want to convert it to a character (chr) data type. The code below shows how to do this. Converting a date to text is not a best practice, but it is useful to know how to change data types in both R and Python.

The statement below uses mutate() to overwrite the existing order.date column with a character version of itself. Passing an existing column name to mutate() replaces that column; mutate() is more typically used to add new columns.

bikes_data_joined %>%
  mutate(order.date = as.character(order.date)) %>%
  glimpse()
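
For readers comparing the two languages, pandas' DataFrame.assign() behaves much like mutate(): it returns a new data frame, reusing an existing column name replaces that column, and a new name adds one. The sketch below runs on a hypothetical one-column table rather than the article's data, and the order_date_chr column name is made up for illustration.

```python
import pandas as pd

# Hypothetical one-column table standing in for the joined bike data
orders = pd.DataFrame({"order.date": pd.to_datetime(["2011-01-07", "2011-01-10"])})

# Reusing an existing column name replaces that column (like mutate()).
# "order.date" is not a valid Python keyword argument, so it is passed via **{...}.
overwritten = orders.assign(**{"order.date": orders["order.date"].astype(str)})

# A new column name adds a column and keeps the original date-time column
added = orders.assign(order_date_chr=orders["order.date"].astype(str))

print(overwritten.dtypes)
print(added.dtypes)
```

As with the R pipeline, assign() does not modify orders itself; only the returned data frames hold the character version.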

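We will not use this code going forward, since we want to keep order.date in its date-time format for the examples that follow. Note that the pipeline above only prints its result: because it is not assigned back to bikes_data_joined, the original data frame still stores order.date as a dttm column.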
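
The main reason to keep the date-time type is that date helpers only work on it. This pandas sketch, using two hypothetical dates, shows the .dt accessor working on a date-time column, becoming unavailable after conversion to strings, and pd.to_datetime() parsing the strings back into the original values.

```python
import pandas as pd

# Hypothetical order dates
dates = pd.Series(pd.to_datetime(["2011-01-07", "2011-02-15"]), name="order.date")

# Date-time columns expose date helpers through the .dt accessor
print(dates.dt.year.tolist())   # [2011, 2011]
print(dates.dt.month.tolist())  # [1, 2]

# After conversion to strings, the .dt accessor is no longer available
as_text = dates.astype(str)
try:
    as_text.dt.year
except AttributeError as err:
    print("AttributeError:", err)

# pd.to_datetime() parses the strings back into a date-time column
restored = pd.to_datetime(as_text)
print(restored.equals(dates))  # True
```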


Python: Change Data Types

In the previous article, we discussed how to correctly import, view, and join our data. The code below includes all of the code from the last article. Click the link below for more details.


# 1.0 Load Libraries ----

# Core Python Data Analysis
import pandas as pd
import numpy as np

# 2.0 Import Data ----
bikes_df      = pd.read_excel("00_data_raw/bikes.xlsx")
bikeshops_df  = pd.read_excel("00_data_raw/bikeshops.xlsx")
orderlines_df = pd.read_excel("00_data_raw/orderlines.xlsx")

# 3.0 Join Data ----
# Drop the unnamed first column ("Unnamed: 0") read in from the spreadsheet,
# then left join the bikes and bike shops onto the order lines
bike_orderlines_joined_df = (
    orderlines_df
    .drop(columns="Unnamed: 0")
    .merge(
        right=bikes_df,
        how="left",
        left_on="product.id",
        right_on="bike.id",
    )
    .merge(
        right=bikeshops_df,
        how="left",
        left_on="customer.id",
        right_on="bikeshop.id",
    )
)

# Display the joined data and the data type of each column
bike_orderlines_joined_df
bike_orderlines_joined_df.info()
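
Because both merges use how="left", every row of orderlines_df should survive the join. The code above does not check this, but merge() has two optional arguments that make it easy to verify; the sketch below demonstrates them on hypothetical toy tables. validate="many_to_one" raises a MergeError if the right-hand key is duplicated (which would silently add rows), and indicator=True adds a _merge column recording whether each row found a match.

```python
import pandas as pd

# Hypothetical toy tables using the article's key column names
orderlines = pd.DataFrame({"product.id": [1, 2, 9], "customer.id": [10, 10, 20]})
bikes = pd.DataFrame({"bike.id": [1, 2], "model": ["A", "B"]})

joined = orderlines.merge(
    bikes,
    how="left",
    left_on="product.id",
    right_on="bike.id",
    validate="many_to_one",  # raises MergeError if bike.id contained duplicates
    indicator=True,          # adds a _merge column: "both" or "left_only"
)

# A left join keeps every order line, matched or not
print(len(joined) == len(orderlines))  # True
print(joined["_merge"].tolist())       # ['both', 'both', 'left_only']
```

Product 9 has no matching bike, so its model column is NaN and _merge reports "left_only".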


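In Python, it is common practice to give a working data frame a shorter name. Below, we assign bike_orderlines_joined_df to df. Keep in mind that this does not copy the data: both names refer to the same DataFrame, so changing a column of df in place also changes bike_orderlines_joined_df. Similar to the R example, we then select the order.date column and change its data type to a string. To leave the original date-time column intact for later articles, the conversion is done on a copy.

df = bike_orderlines_joined_df  # shorter alias; not a copy
df_str = df.copy()              # independent copy for the conversion
df_str['order.date'] = df_str['order.date'].astype(str)
df_str.info()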
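
Assigning a DataFrame to a new name in Python does not copy it. This self-contained sketch, on a hypothetical one-row table, shows that a change made through an alias also appears in the original, while a change made to a .copy() leaves the source untouched.

```python
import pandas as pd

# Plain assignment binds a second name to the same DataFrame object
original = pd.DataFrame({"order.date": pd.to_datetime(["2011-01-07"])})
alias = original
alias["order.date"] = alias["order.date"].astype(str)
print(alias is original)  # True
print(pd.api.types.is_datetime64_any_dtype(original["order.date"]))  # False

# .copy() creates an independent DataFrame, so the source keeps its dates
source = pd.DataFrame({"order.date": pd.to_datetime(["2011-01-07"])})
working = source.copy()
working["order.date"] = working["order.date"].astype(str)
print(pd.api.types.is_datetime64_any_dtype(source["order.date"]))  # True
```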

Conclusion

Once again, the process of changing data types is very similar in R and Python. Note that we will not use the code from this article going forward, so order.date will remain a date data type in the articles that follow.

For the complete list of R v Python topics, click on the links below.


