Loading Multiple Files from Various Folders

Problem Statement

This tutorial shows you how to load multiple files from multiple folders using R.  As a bonus, this article will also show you how to add folder and file names to the final tibble.

Load Libraries

Tidyverse and readxl are the only libraries required for this tutorial.

library(tidyverse)
library(readxl)


Data Description

In this example, Spotify playlist data was downloaded using Exportify. Each .xlsx file contains information about the playlist, including the artist name, track name, album name, and song duration.

A sample of one of the playlist data files is listed below.




Folder Structure

Each playlist was previously converted to a .xlsx file and placed into the appropriate genre’s folder. The three genre folders include Country, Hip-Hop, and Rock.

Warning: please make sure to update your paths below to represent your appropriate project paths.

list.dirs(path = "Data_Sources/2020_05_14_Loading_Mult_Files/Genres/", 
          full.names = FALSE, 
          recursive  = FALSE)



File Structure

Each genre folder includes multiple playlist files. The example below shows the three playlists that exist inside the Country genre folder.

list.files(path = "Data_Sources/2020_05_14_Loading_Mult_Files/Genres/Country/", 
          full.names = FALSE, 
          recursive  = FALSE)



Create Function to List All Files in Each Folder

A custom function called list_files_in_folder() was created that lists the .xlsx files in each genre folder.


list_files_in_folder <- function(genre) {
    list.files(path = str_c("Data_Sources/2020_05_14_Loading_Mult_Files/Genres/", genre),
                          pattern = "*.xlsx", 
                          full.names = TRUE) %>% 
    tibble::enframe()
}


View Files in Country Genre Folder

For example, if we want to view all the files in the Country genre folder, we can easily pass “Country” into the function.

list_files_in_folder("Country")



Create Genre List

Next, a genre_list() was created to include all three (3) genres. This list allows us to map the list_files_in_folder() function to multiple genre folders later on.

genre_list <- c("Country", "Hip_Hop", "Rock")


View All Files In All Genre Folders

Using the purrr::map_df() function, we can pass the genre_list() into the list_files_in_folder() function. The final result produces a tibble that includes all the playlists from all of the genre folders.

playlist_tbl <- genre_list %>% 
  purrr::map_df(list_files_in_folder)

As you can see below, the playlist_tbl now includes 9 observations. Each of the three (3) genre folders has three playlist .xlsx files.







Extract “value” Column From playlist_tbl

The value column was then selected from the playlist_tbl above and converted into a list.

playlist_list <- playlist_tbl %>%
  select(value) %>%
  as.list()

playlist_list$value



Read All Excel Files and Add Folder and File Names

The playlist_list$value was mapped into the function below, which read and loaded each .xlsx file. The code below also added the filename and entire path into a new column called filename.

song_tbl <- playlist_list$value %>% 
    map_df(function(add_file_name)
        read_excel(add_file_name) %>%
            mutate(playlist_name = gsub(".xlsx", 
                                        "", 
                                        basename(add_file_name))) %>%
            mutate(filename = add_file_name)
    )

song_tbl %>% 
  glimpse()



Separate “filename” Column

The entire path and filename were listed in the new filename column in the song_tbl. Using the separate() function, we can split out three (3) folders, Genre and the Playlist, into five (5) different columns.

song_tidy_tbl <- song_tbl %>% 
            separate(col    = filename, 
                     into   = c("Folder_01", 
                                "Folder_02", 
                                "Folder_03", 
                                "Genre", 
                                "Playlist"),
                     sep    = "/",
                     remove = FALSE)


Final Tidy Tibble

The final tibble is complete and tidy, with all 794 songs listed.

song_tidy_tbl %>% 
  glimpse()











Popular posts from this blog

MySQL Part 1: Getting MySQL Set Up in goormIDE

Do Popular Market Index Returns Follow a Normal Distribution?