R v Python Series: Part 01: Data Imports
Introduction
Data imports are easy to do in both R and Python. The examples in this post were inspired by Matt Dancho and his incredible library of training videos on Business Science University.
In the R examples, we will be using the RStudio IDE. It is assumed that you have a project set up in the folder where the data files exist. For more information about setting up a project in RStudio and a Python environment in VS Code, click the links below.
In both examples, we will be importing and joining three different Excel data sets. Do not worry if you do not have the exact data sets. You can use a data set of your own and follow along.
R: Data Import
library(tidyverse)
library(readxl)

bikes <- read_excel("00_Data_Files/bikes.xlsx")
bikeshops <- read_excel("00_Data_Files/bikeshops.xlsx")
orderlines <- read_excel("00_Data_Files/orderlines.xlsx")
bikes
bikeshops
orderlines
Python: Data Import
In the Python scripts below, we assume VS Code is installed and your Python environment is up and running. We will be using a different set of files and folders for the Python environment. The Excel files we loaded in the R examples above are located in this path for the Python example.
To open the folder with the Excel files in VS Code, click File > Open Folder and select the folder containing your Excel files.
If you successfully loaded the files into your environment, you will see them as shown below.
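The same check can also be done from code. The snippet below is a minimal sketch, assuming the working directory is the folder you just opened and that the data lives in the `00_data_raw` folder used later in this post. The helper name `find_missing_files` is made up for this illustration.

```python
from pathlib import Path

# Folder and file names follow the Python example in this post.
DATA_DIR = Path("00_data_raw")
EXPECTED_FILES = ["bikes.xlsx", "bikeshops.xlsx", "orderlines.xlsx"]


def find_missing_files(folder, filenames):
    """Return the names in `filenames` that are not files inside `folder`."""
    folder = Path(folder)
    return [name for name in filenames if not (folder / name).is_file()]


# Paths are resolved relative to the current working directory.
missing = find_missing_files(DATA_DIR, EXPECTED_FILES)
if missing:
    print(f"Missing from {DATA_DIR}: {', '.join(missing)}")
else:
    print("All expected Excel files were found.")
```

If anything is reported as missing, the usual cause is that VS Code was opened one folder level too high or too low.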
Next, we will load some popular Python libraries.
# 1.0 Load Libraries ----

# Core Python Data Analysis
# (pd.read_excel also needs the openpyxl package installed to read .xlsx files)
import pandas as pd
import numpy as np
The code to load the .xlsx files is very similar to the R code, as shown below.
bikes_df = pd.read_excel("00_data_raw/bikes.xlsx")
bikeshops_df = pd.read_excel("00_data_raw/bikeshops.xlsx")
orderlines_df = pd.read_excel("00_data_raw/orderlines.xlsx")
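When several files share a folder and an extension, the three calls above can also be written as a loop. The following is only a sketch of that alternative, not the approach used in the rest of this post: the function `load_tables` and its `reader` parameter are hypothetical names introduced here, and the folder layout is assumed to match the example above.

```python
from pathlib import Path

import pandas as pd


def load_tables(folder, names, reader=pd.read_excel):
    """Read <folder>/<name>.xlsx for each name and return a dict of results.

    `reader` is a hypothetical hook added for this sketch. It defaults to
    pd.read_excel, which needs the openpyxl package for .xlsx files.
    """
    folder = Path(folder)
    return {name: reader(folder / f"{name}.xlsx") for name in names}


# Example call, assuming the folder layout used in this post:
# tables = load_tables("00_data_raw", ["bikes", "bikeshops", "orderlines"])
# bikes_df = tables["bikes"]
```

Keeping the data frames in a dictionary is convenient when the list of files grows, while separate variables such as `bikes_df` are easier to read with only three files.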
To check that the data was imported correctly, type the name of each object and press Shift + Enter after each line.
bikes_df
bikeshops_df
orderlines_df
The results of the data imports are listed below.
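Beyond printing each object, a few quick checks can confirm that an import looks sensible. The sketch below uses a small stand-in data frame so it runs without the Excel files. Its column names and values are invented for illustration and are not the contents of the real files, and `summarize` is a hypothetical helper.

```python
import pandas as pd

# Stand-in data: these columns and values are made up for illustration and
# are not the contents of the real bikes.xlsx file.
sample_df = pd.DataFrame(
    {
        "bike_id": [1, 2, 3],
        "model": ["Model A", "Model B", "Model C"],
        "price": [1000, 2500, 4000],
    }
)


def summarize(df):
    """Collect a few basic facts about a DataFrame."""
    return {
        "rows": df.shape[0],
        "columns": list(df.columns),
        "missing_values": int(df.isna().sum().sum()),
    }


print(sample_df.head())    # first rows (up to five)
print(sample_df.dtypes)    # column types pandas inferred
print(summarize(sample_df))
```

The same calls work on `bikes_df`, `bikeshops_df`, and `orderlines_df` once they are loaded.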
Conclusion
The data import process in R and Python is almost identical. Both platforms make it easy to read in .xlsx files.
For the complete list of R v Python topics, click on the links below.
03: Change Data Types
04: String Splits
05: Calculate New Columns
06: Organize Columns
07: Rename Columns
08: Saving Data
09: Aggregations