Accessing Spotify's Track Info Using R
Introduction
Spotify has an API that lets you build applications on its platform. This article shows how to use R to gather the data needed to build the Tableau Public application shown above. It assumes that you have already downloaded the Spotify application.
To view the live version of the Tableau Public application, please click here.
Some additional resources I used along the way are listed below. The first article goes into greater detail on how to retrieve the tokens and keys needed to call Spotify's API.
https://medium.com/swlh/accessing-spotifys-api-using-r-1a8eef0507c
https://www.rcharlie.com/spotifyr/
Tokens, IDs, and Secrets
As stated earlier, you will need several items to access the Spotify API. More are available if you want to download additional information, but this article requires only the five items listed below.
- profileID
- clientID
- secret
- auth_token_01 - this can be obtained by clicking on this link. Note that this token expires every hour and will need to be reestablished every time it expires.
- auth_token_02 - this can be obtained by clicking on this link. Note that this token expires every hour and will need to be reestablished every time it expires.
profileID <- "Your profileID"
clientID <- "Your clientID"
secret <- "Your secret"
auth_token_01 <- "Your auth_token_01"
auth_token_02 <- "Your auth_token_02"
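Pasting secrets directly into a script makes them easy to leak when the script is shared. One common alternative is to read them from environment variables. The sketch below assumes hypothetical variable names (`SPOTIFY_CLIENT_ID`, `SPOTIFY_CLIENT_SECRET`) and a hypothetical helper `get_credential()`; neither comes from Spotify or from the resources above.

```r
# Hypothetical helper: read a required credential from an environment variable
# and stop with a clear message if it has not been set.
get_credential <- function(var_name) {
  value <- Sys.getenv(var_name, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable '", var_name, "' is not set.", call. = FALSE)
  }
  value
}

# Demo values so the sketch runs on its own. In practice you would set these
# outside the script (for example in your .Renviron file).
Sys.setenv(SPOTIFY_CLIENT_ID = "demo-client-id",
           SPOTIFY_CLIENT_SECRET = "demo-secret")

clientID <- get_credential("SPOTIFY_CLIENT_ID")
secret   <- get_credential("SPOTIFY_CLIENT_SECRET")
```

The rest of the article works the same way whether the values are typed in or read from the environment.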
Download Libraries
You will need to install/load the following libraries.
library(httr)
library(tidyverse)
library(xlsx)
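If you are not sure whether these packages are already installed, a quick check can save a failed `library()` call. This is a minimal sketch; `find_missing_packages()` is a hypothetical helper name, and it only reports missing packages rather than installing anything.

```r
# Packages loaded in this article.
required_pkgs <- c("httr", "tidyverse", "xlsx")

# Hypothetical helper: return the packages in `pkgs` that are not installed.
# system.file() returns "" when a package cannot be found.
find_missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs,
                         function(p) nzchar(system.file(package = p)),
                         logical(1))
  pkgs[!is_installed]
}

missing_pkgs <- find_missing_packages(required_pkgs)
if (length(missing_pkgs) > 0) {
  message("Install these first: install.packages(c(",
          paste0('"', missing_pkgs, '"', collapse = ", "), "))")
}
```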
Access Playlist
The following code requests an access token from Spotify using your clientID and secret (the client credentials flow).
response <- POST(
  "https://accounts.spotify.com/api/token",
  accept_json(),
  authenticate(clientID, secret),
  body = list(grant_type = "client_credentials"),
  encode = "form",
  verbose()
)
access_token <- content(response)$access_token
access_token
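Because tokens expire, it can help to record when one was issued so you know when to request a new one. The sketch below works on a mocked response body and makes no network call. It assumes the parsed body carries an `expires_in` field in seconds, as OAuth 2.0 token responses normally do. `make_token()` and `token_expired()` are hypothetical helper names.

```r
# Mock of what content(response) might return for a successful token request.
# The values are made up.
token_body <- list(access_token = "demo-token",
                   token_type = "Bearer",
                   expires_in = 3600)

# Hypothetical helper: keep the token together with its expiry time.
make_token <- function(body, issued_at = Sys.time()) {
  list(value = body$access_token,
       expires_at = issued_at + body$expires_in)
}

# Hypothetical helper: TRUE once the token is within `margin_secs` of expiring.
token_expired <- function(token, now = Sys.time(), margin_secs = 60) {
  now >= token$expires_at - margin_secs
}

issued <- as.POSIXct("2024-01-01 12:00:00", tz = "UTC")
token  <- make_token(token_body, issued_at = issued)

still_valid <- !token_expired(token, now = issued + 10 * 60)    # 10 minutes in
near_expiry <- token_expired(token, now = issued + 59.5 * 60)   # inside the margin
```

The safety margin is a design choice: refreshing slightly early avoids a request failing because the token expired while the request was in flight.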
Please note that the code below reads the first playlist in the response, so make sure the playlist you want is listed at the top of all other playlists within your Spotify app.
We then use auth_token_01 to retrieve some playlist information, including the playlist ID, playlist name, and the number of tracks in the playlist.
header_value <- str_c("Bearer ", auth_token_01)
URI <- str_c("https://api.spotify.com/v1/me/playlists")
response <- GET(url = URI, add_headers(Authorization = header_value))
response
pl <- content(response)

## Ensure the new playlist is listed at the top
pl_id <- pl$items[[1]]$id
pl_name <- pl$items[[1]]$name
pl_num_tracks <- pl$items[[1]]$tracks$total
pl_tbl <- tibble(pl_id, pl_name, pl_num_tracks)
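Relying on the playlist being first in the list is fragile. As an alternative, you could look the playlist up by name. The sketch below runs on a mocked `pl` object trimmed to the fields used in this article. The playlist names and IDs are made up, `find_playlist()` is a hypothetical helper, and it returns a plain data frame rather than a tibble to stay dependency-free.

```r
# Mock of the parsed /v1/me/playlists response (made-up playlists).
pl <- list(items = list(
  list(id = "id_jazz", name = "Jazz Mix",      tracks = list(total = 42)),
  list(id = "id_rock", name = "Rock Classics", tracks = list(total = 120))
))

# Hypothetical helper: find a playlist by its exact name.
find_playlist <- function(pl, target_name) {
  pl_names <- vapply(pl$items, function(item) item$name, character(1))
  idx <- match(target_name, pl_names)
  if (is.na(idx)) {
    stop("No playlist named '", target_name, "' in this response.", call. = FALSE)
  }
  item <- pl$items[[idx]]
  data.frame(pl_id = item$id,
             pl_name = item$name,
             pl_num_tracks = item$tracks$total,
             stringsAsFactors = FALSE)
}

pl_tbl <- find_playlist(pl, "Rock Classics")
```

Note that this only searches the playlists contained in the one response you pass in.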
The pl_tbl tibble that we created looks like this.
Access Tracks Within Playlist
Now that we have the playlist ID, we can access the contents of the playlist. We also want to set up a genre object that we will tack onto our tibble later in the pipeline, and we need to define an offset. The offset is the zero-based position of the first track to return. The Spotify API only allows you to download 100 songs at a time, so the offset tells it where each batch of up to 100 starts. For our purposes, we will use 0 as our default offset, which returns the first 100 tracks.
genre <- "Rock"
offset <- "0"
header_value_2 <- str_c("Bearer ", auth_token_02)
URI_2 <- str_c("https://api.spotify.com/v1/playlists/", pl_tbl$pl_id, "/tracks?offset=", offset)
response_2 <- GET(url = URI_2, add_headers(Authorization = header_value_2))
response_2
pl_2 <- content(response_2)
pl_2_track_limit <- pl_2$limit

# Use the smaller of the API limit and the playlist's track count
ntracks_tbl <- tibble(limit = pl_2_track_limit,
                      track_cnt = pl_tbl$pl_num_tracks,
                      final_track_cnt = case_when(limit < track_cnt ~ limit,
                                                  TRUE ~ track_cnt))
ntracks <- ntracks_tbl$final_track_cnt
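If a playlist has more than 100 tracks, the same request can be repeated with different offsets. The sketch below only computes the offsets and builds the URLs; it does not call the API. `playlist_offsets()` is a hypothetical helper, and `"PLAYLIST_ID"` is a placeholder rather than a real ID.

```r
# Hypothetical helper: zero-based offsets needed to page through a playlist.
playlist_offsets <- function(total_tracks, page_size = 100) {
  if (total_tracks <= 0) return(integer(0))
  # as.integer() keeps large offsets from being pasted in scientific notation
  as.integer(seq(0, total_tracks - 1, by = page_size))
}

# A 250-track playlist needs three requests.
offsets <- playlist_offsets(250)

# One URL per page, built the same way as URI_2.
page_urls <- paste0("https://api.spotify.com/v1/playlists/", "PLAYLIST_ID",
                    "/tracks?offset=", offsets)
```

Each URL would then be fetched and parsed like `URI_2`, and the resulting tibbles stacked together.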
Create Function to Loop Through Tracks
Now that we have accessed the playlist's content, we can set up a generic function to retrieve the following items for the first 100 tracks within the playlist.
list_playlist_tracks <- function(track_num) {
  # Pull out the track once instead of repeating the full path
  track <- pl_2$items[[track_num]]$track

  track_id <- track$id
  track_name <- track$name
  track_popularity <- as.character(track$popularity)
  track_duration <- as.character(track$duration_ms)
  artist_name <- track$artists[[1]]$name
  artist_id <- track$artists[[1]]$id
  album_release_date <- track$album$release_date
  album_name <- track$album$name
  # album_image_url and album_image_lrg both point at the first (largest) image
  album_image_url <- track$album$images[[1]]$url
  album_image_lrg <- track$album$images[[1]]$url
  album_image_med <- track$album$images[[2]]$url
  album_image_sma <- track$album$images[[3]]$url
  album_id <- track$album$id
  song_preview_url <- track$preview_url

  output <- list(track_id, track_name, track_popularity, track_duration,
                 artist_name, artist_id, album_release_date, album_name,
                 album_image_url, album_image_lrg, album_image_med,
                 album_image_sma, album_id, song_preview_url)
  return(output)
}
This function can be mapped over the first 100 songs. In other words, the code below loops through the first 100 songs in the playlist and retrieves the information listed above. The results then need to be unlisted, enframed, and unnested to produce a workable tibble later in the pipeline.
playlist_tracks_list <- map(1:ntracks, list_playlist_tracks)

playlist_tracks_tbl_1 <- playlist_tracks_list %>%
  unlist(recursive = FALSE) %>%
  enframe() %>%
  unnest(c(name, value)) %>%
  mutate(row_cnt = max(row_number()),
         ntracks = ntracks,
         distinct_names = row_cnt/ntracks)
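One thing to watch for: the pivot later in the pipeline relies on every track contributing the same number of values. If a field such as the preview URL comes back empty (NULL) for some track, the counts can fall out of step. A minimal sketch of one way to guard against that, using a made-up track and a hypothetical helper `or_na()`:

```r
# Hypothetical helper: replace a missing (NULL) field with NA so that every
# track contributes the same number of values.
or_na <- function(x) if (is.null(x)) NA_character_ else as.character(x)

# Mock track in which preview_url is absent.
track <- list(id = "t1", name = "Song A", popularity = 55, preview_url = NULL)

fields <- c("id", "name", "popularity", "preview_url")
values <- vapply(fields, function(f) or_na(track[[f]]), character(1))
```

Wrapping each field in `list_playlist_tracks()` with a helper like this would keep a placeholder for every missing value instead of losing it.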
The results of the playlist_tracks_tbl_1 are listed below.
Pivot Track Data
As you can see above, the data for the playlist is long and narrow. In other words, each song has 14 rows associated with it and will need to be pivoted so that each song has one row with many descriptors.
The code below allows us to pivot the data into a wider format.
distinct_names_lst <- playlist_tracks_tbl_1$distinct_names[1]

distinct_count_tbl <- tibble(track_num = as.factor(rep(1:ntracks, each = distinct_names_lst)))

distinct_names_tbl <- tibble(description = as.factor(rep(c("track_id",
                                                           "track_name",
                                                           "track_popularity",
                                                           "track_duration",
                                                           "artist_name",
                                                           "artist_id",
                                                           "album_release_date",
                                                           "album_name",
                                                           "album_image_url",
                                                           "album_image_lrg",
                                                           "album_image_med",
                                                           "album_image_sma",
                                                           "album_id",
                                                           "song_preview_url"), ntracks)))

playlist_tracks_tbl_2 <- cbind(distinct_count_tbl, distinct_names_tbl, playlist_tracks_tbl_1) %>%
  mutate(genre = genre) %>%
  select(track_num, description, genre, value)

new_playlist_info <- playlist_tracks_tbl_2 %>%
  pivot_wider(names_from = description, values_from = value)
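To see what the pivot does in isolation, here is a toy version with two tracks and three descriptors each. All values are made up.

```r
library(tibble)
library(tidyr)

# Toy long-format data: one row per (track, descriptor) pair.
long_tbl <- tibble(
  track_num   = rep(1:2, each = 3),
  description = rep(c("track_id", "track_name", "artist_name"), times = 2),
  value       = c("t1", "Song A", "Artist X",
                  "t2", "Song B", "Artist Y")
)

# Each distinct description becomes a column; each track becomes one row.
wide_tbl <- pivot_wider(long_tbl, names_from = description, values_from = value)
```

Six long rows collapse into two wide rows, with `track_num` identifying each track.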
The results of the new_playlist_info are listed below. We now have a tibble set up where each song is recorded on one row.
Access Additional Song Information
We can add additional song info (energy, danceability, etc.) using another path in the Spotify API. A lot of the same steps listed above are repeated in the code below.
list_playlist_track_detail <- function(track_id) {
  # Sys.sleep(0.02)
  URI_track <- str_c("https://api.spotify.com/v1/audio-features/", track_id)
  response_track <- GET(url = URI_track, add_headers(Authorization = header_value))
  track <- content(response_track)
  return(track)
}

playlist_tracks_lst <- new_playlist_info$track_id
playlist_track_detail_map <- map(playlist_tracks_lst, list_playlist_track_detail)

playlist_track_detail <- playlist_track_detail_map %>%
  unlist(recursive = TRUE) %>%
  enframe()

# Keep only the track ID and the audio features we need
playlist_track_detail_filtered <- playlist_track_detail %>%
  filter(name %in% c("id", "danceability", "energy", "valence", "tempo"))

# Work out how many rows belong to each track so we can number them
num_of_row <- playlist_track_detail_filtered %>% nrow()
num_of_tracks <- playlist_track_detail_filtered %>% filter(name == "id") %>% count()
num_of_rows_per_track <- num_of_row/num_of_tracks$n
track_number_col <- rep(c(1:num_of_tracks$n), each = num_of_rows_per_track)

playlist_track_detail_pivot_wide <- playlist_track_detail_filtered %>%
  cbind(track_number_col) %>%
  select(track_number_col, everything()) %>%
  pivot_wider(names_from = name, values_from = value) %>%
  rename(track_id = id)

playlist_track_detail_pivot_wide %>% glimpse()
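The code above makes one request per track. To my knowledge, Spotify has also documented a batch form of this endpoint that takes a comma-separated `ids` query parameter with up to 100 IDs per call; treat that limit as an assumption and check the current API reference before relying on it. The sketch below only groups the IDs and builds the URLs. `chunk_ids()` is a hypothetical helper and the IDs are made up.

```r
# Hypothetical helper: split IDs into groups of at most `size`.
chunk_ids <- function(ids, size = 100) {
  split(ids, ceiling(seq_along(ids) / size))
}

# Made-up IDs standing in for new_playlist_info$track_id.
track_ids <- paste0("track", 1:230)

id_chunks <- chunk_ids(track_ids)

# One request URL per chunk (assumed batch endpoint, see the note above).
batch_urls <- vapply(
  id_chunks,
  function(chunk) paste0("https://api.spotify.com/v1/audio-features?ids=",
                         paste(chunk, collapse = ",")),
  character(1)
)
```

A batch response is shaped differently from a single-track response, so the parsing steps above would need adjusting before this could replace them.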
The playlist_track_detail_pivot_wide tibble result is listed below.
Access Additional Artist Information
We can also grab additional artist info using similar code to what we had listed above.
list_playlist_artist_detail <- function(artist_id) {
  # Sys.sleep(0.02)
  URI_artist <- str_c("https://api.spotify.com/v1/artists/", artist_id)
  response_artist <- GET(url = URI_artist, add_headers(Authorization = header_value))
  artist <- content(response_artist)
  return(artist)
}

playlist_artist_lst <- new_playlist_info$artist_id
playlist_artist_detail_map <- map(playlist_artist_lst, list_playlist_artist_detail)

playlist_artist_detail <- playlist_artist_detail_map %>%
  unlist(recursive = TRUE) %>%
  enframe()

playlist_artist_detail %>% glimpse()

# Keep only the artist ID, follower count, and popularity
playlist_artist_detail_filtered <- playlist_artist_detail %>%
  filter(name %in% c("id", "followers.total", "popularity"))

# Work out how many rows belong to each artist lookup so we can number them
num_of_row_artist_info <- playlist_artist_detail_filtered %>% nrow()
num_of_tracks_artist_info <- playlist_artist_detail_filtered %>% filter(name == "id") %>% count()
num_of_rows_per_track_artist_info <- num_of_row_artist_info/num_of_tracks_artist_info$n
track_number_col_artist_info <- rep(c(1:num_of_tracks_artist_info$n), each = num_of_rows_per_track_artist_info)

# One row per distinct artist
playlist_artist_detail_pivot_wide <- playlist_artist_detail_filtered %>%
  cbind(track_number_col_artist_info) %>%
  select(track_number_col_artist_info, everything()) %>%
  pivot_wider(names_from = name, values_from = value) %>%
  rename(artist_id = id) %>%
  select(artist_id, followers.total, popularity) %>%
  rename(artist_popularity = popularity) %>%
  distinct()
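Because the pipeline above ends with `distinct()`, an artist who appears on several tracks is requested several times and then collapsed to one row. A small sketch of de-duplicating the IDs before making any requests, using made-up IDs:

```r
# Made-up artist IDs standing in for new_playlist_info$artist_id.
artist_ids <- c("a1", "a2", "a1", "a3", "a2", "a1")

# Request each artist only once.
unique_artist_ids <- unique(artist_ids)

requests_before <- length(artist_ids)
requests_after  <- length(unique_artist_ids)
```

Passing `unique_artist_ids` to `map()` instead of the full column would cut the number of API calls without changing the final one-row-per-artist table.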
Putting It All Together
We can then join all three tibbles together for our final tibble used in the Tableau Public application.
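# Join on explicit keys: track_id for audio features, artist_id for artist info
new_playlist_tbl <- new_playlist_info %>%
  left_join(playlist_track_detail_pivot_wide, by = "track_id") %>%
  left_join(playlist_artist_detail_pivot_wide, by = "artist_id")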
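A toy version of the two joins, with the key columns spelled out, shows how the pieces line up. The tibbles below are made up and much smaller than the real ones.

```r
library(dplyr)
library(tibble)

# Toy stand-ins for the three tibbles built above.
tracks <- tibble(track_id  = c("t1", "t2", "t3"),
                 artist_id = c("a1", "a2", "a1"))
features <- tibble(track_id = c("t1", "t2", "t3"),
                   energy   = c("0.8", "0.5", "0.9"))
artists <- tibble(artist_id         = c("a1", "a2"),
                  artist_popularity = c("70", "64"))

# Track features match one-to-one; artist info repeats for shared artists.
combined <- tracks %>%
  left_join(features, by = "track_id") %>%
  left_join(artists, by = "artist_id")
```

The row count stays at one per track because the artist table has one row per artist, which is why the `distinct()` step earlier matters.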
The final tibble now has 100 rows and 23 columns.
To access the Tableau Public application, please click here.






