Python v R Series: Part 03: Change Data Types
Introduction
Changing data types is simple in both R and Python. The examples were inspired by Matt Dancho and his incredible library of training videos on Business Science University.
In the R examples, we will be using the RStudio IDE. It is assumed that you have a project set up in the folder where the data files exist. For more information about getting a project set up in R Studio and the Python environment set up in VS Code, click the links below.
In both examples, we will be importing and joining 3 different excel data sets. Do not worry if you do not have the exact data sets to follow along. You can simply use a data set of your own and follow along.
R: Change Data Types
library(tidyverse) library(readxl) bikes <- read_excel("00_Data_Files/bikes.xlsx") bikeshops <- read_excel("00_Data_Files/bikeshops.xlsx") orderlines <- read_excel("00_Data_Files/orderlines.xlsx")
bikes bikeshops orderlines
bikes_data_joined <- orderlines %>% left_join(bikes, by=c("product.id" = "bike.id")) %>% left_join(bikeshops, by=c("customer.id" = "bikeshop.id")) bikes_data_joined %>% glimpse()
bikes_data_joined %>% mutate(order.date = as.character(order.date)) %>% glimpse()
Python: Change Data Types
# 1.0 Load Libraries ---- # # Load Libraries # Core Python Data Analysis from numpy.core.defchararray import index import pandas as pd import numpy as np
bikes_df = pd.read_excel("00_data_raw/bikes.xlsx")
bikeshops_df = pd.read_excel("00_data_raw/bikeshops.xlsx")
orderlines_df = pd.read_excel("00_data_raw/orderlines.xlsx")
bikes_df = pd.read_excel("00_data_raw/bikes.xlsx")
bikeshops_df = pd.read_excel("00_data_raw/bikeshops.xlsx")
orderlines_df = pd.read_excel("00_data_raw/orderlines.xlsx")
bike_orderlines_joined_df = orderlines_df \ .drop(columns='Unnamed: 0', axis=1) \ .merge( right = bikes_df, how='left', left_on='product.id', right_on='bike.id' ) \ .merge( right=bikeshops_df, how = 'left', left_on='customer.id', right_on='bikeshop.id' )
bike_orderlines_joined_df
bike_orderlines_joined_df.info()
df = bike_orderlines_joined_df df['order.date'] = df["order.date"].astype(str) df.info()
Conclusion
The process of changing data types between R and Python was again very similar. Going forward, just note that we will ignore the new code presented in this article and order.date will be a date data type going forward.
For the complete list of R v Python topics, click on the links below.
03: Change Data Types
04: String Splits
05: Calculate New Columns
06: Organize Columns
07: Rename Columns
08: Saving Data
09: Aggregations