Accessing Spotify's Track Info Using R

Introduction

Spotify has an API that allows you to build applications on its platform. This article shows you how to use R to gather the data needed to construct the Tableau Public application shown above. It also assumes that you have already downloaded the Spotify application.

To view the live version of the Tableau Public application, please click here.

Some additional resources I used along the way are listed below.  The first article goes into greater detail regarding retrieving the appropriate tokens and keys necessary to ping Spotify's API.

https://medium.com/swlh/accessing-spotifys-api-using-r-1a8eef0507c

http://rcharlie.net/sentify/

https://www.rcharlie.com/spotifyr/


Tokens, IDs, and Secrets

As stated earlier, you will need several items to access the Spotify API. Other credentials exist for downloading additional information, but this article requires only the five items listed below.

  • profileID
  • clientID
  • secret
  • auth_token_01 - this can be obtained by clicking on this link.  Note that this token expires every hour and will need to be reestablished every time it expires.
  • auth_token_02 - this can be obtained by clicking on this link.  Note that this token expires every hour and will need to be reestablished every time it expires.

Once you have retrieved all the necessary items above, you can assign them to the objects below. Replace each placeholder string with your own value.

profileID <- "Your profileID"
clientID  <- "Your clientID"
secret    <- "Your secret"
auth_token_01 <- "Your auth_token_01"
auth_token_02 <- "Your auth_token_02"
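
If you would rather not type secrets directly into a script, one option is to read them from environment variables. The sketch below is only an illustration: the helper read_secret() and the variable name SPOTIFY_CLIENT_ID are hypothetical, and the Sys.setenv() call is there only so the example runs on its own.

```r
# Hypothetical helper: read a credential from an environment variable
# and fail loudly if it has not been set.
read_secret <- function(var_name) {
  value <- Sys.getenv(var_name, unset = NA_character_)
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable ", var_name, " is not set.", call. = FALSE)
  }
  value
}

# Demo only: normally the variable is set in ~/.Renviron or the shell,
# not inside the script. SPOTIFY_CLIENT_ID is an assumed name.
Sys.setenv(SPOTIFY_CLIENT_ID = "Your clientID")
demo_client_id <- read_secret("SPOTIFY_CLIENT_ID")
```

The same helper could be called once per credential, so the script itself never contains the real values.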


Download Libraries

You will need to install/load the following libraries.

library(httr)
library(tidyverse)
library("xlsx")


Access Playlist

The following code will allow you to retrieve the appropriate access tokens used later in the pipeline.

response <- POST(
  "https://accounts.spotify.com/api/token",
  accept_json(),
  authenticate(clientID, secret),
  body = list(grant_type = "client_credentials"),
  encode = "form",
  verbose()
)

access_token <- content(response)$access_token
access_token
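
Because tokens expire, it can help to record when a token was issued. The sketch below assumes the parsed token response carries an expires_in field (in seconds) next to access_token. It uses a mocked list in place of content(response), and the helper token_needs_refresh() is hypothetical.

```r
# Mocked stand-in for content(response); values are placeholders.
token_response <- list(
  access_token = "example-token",
  token_type   = "Bearer",
  expires_in   = 3600
)

requested_at <- Sys.time()
# Adding a number to a POSIXct value adds that many seconds.
expires_at <- requested_at + token_response$expires_in

# Hypothetical helper: TRUE once the token is within `margin_secs` of expiring.
token_needs_refresh <- function(expires_at, now = Sys.time(), margin_secs = 60) {
  now >= expires_at - margin_secs
}

fresh_check <- token_needs_refresh(expires_at, now = requested_at)
late_check  <- token_needs_refresh(expires_at, now = requested_at + 3590)
```

A check like this could run before each request so that an expired token is replaced before the API rejects it.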

Please note that the code below reads the first playlist returned by the API, so make sure the playlist you want to access is listed above all other playlists within your Spotify app.


We then use auth_token_01, rather than the access token created above, to retrieve some playlist information, including the playlist ID, the playlist name, and the number of tracks in the playlist.

header_value <- str_c("Bearer ", auth_token_01)

URI <- str_c("https://api.spotify.com/v1/me/playlists")
response <- GET(url = URI, add_headers(Authorization = header_value))
response
pl <- content(response)

## Ensure the new playlist is listed at the top
pl_id          <- pl$items[[1]]$id
pl_name        <- pl$items[[1]]$name
pl_num_tracks  <- pl$items[[1]]$tracks$total


pl_tbl <- tibble(pl_id,
                 pl_name,
                 pl_num_tracks)
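
Relying on the playlist being first in the list is simple but fragile. As a hedged alternative, the sketch below looks a playlist up by name instead. It runs against a mocked list shaped like the pl object above, and the helper find_playlist() is hypothetical. It only searches the items that were actually returned, so a playlist on a later page of results would not be found.

```r
library(purrr)

# Mocked stand-in for content(response) from /v1/me/playlists.
pl_mock <- list(items = list(
  list(id = "id_a", name = "Road Trip", tracks = list(total = 42)),
  list(id = "id_b", name = "Rock",      tracks = list(total = 100))
))

# Hypothetical helper: return the first playlist whose name matches exactly.
find_playlist <- function(pl, playlist_name) {
  match <- detect(pl$items, function(item) identical(item$name, playlist_name))
  if (is.null(match)) {
    stop("No playlist named '", playlist_name, "' in these results.", call. = FALSE)
  }
  match
}

rock <- find_playlist(pl_mock, "Rock")
rock_id    <- rock$id
rock_total <- rock$tracks$total
```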

The pl_tbl tibble that we created looks like this.




Access Tracks Within Playlist

Now that we have the playlist ID, we can access the tracks within the playlist. We also set up a genre object that we will tack onto our tibble later in the pipeline, and we define an offset. The offset is the position in the playlist at which the download starts. The Spotify API only allows you to download 100 songs at a time, so a longer playlist has to be retrieved in batches by increasing the offset. For our purposes, we will use 0 as our default offset.

genre  <- "Rock"
offset <- "0"

header_value_2 <- str_c("Bearer ", auth_token_02)

URI_2 <- str_c("https://api.spotify.com/v1/playlists/", pl_tbl$pl_id, "/tracks?offset=", offset)
response_2 <- GET(url = URI_2, add_headers(Authorization = header_value_2))
response_2
pl_2 <- content(response_2)

pl_2_track_limit <- pl_2$limit

ntracks_tbl <- tibble(limit = pl_2_track_limit,
                      track_cnt = pl_tbl$pl_num_tracks,
                      final_track_cnt = case_when(limit < track_cnt ~ limit,
                                                  TRUE ~ track_cnt))

ntracks <- ntracks_tbl$final_track_cnt
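
For playlists longer than one batch, the offsets can be computed up front. This is a minimal sketch that takes the 100-song batch size from the text above. The helper page_offsets() and the placeholder playlist ID are hypothetical, and no request is actually sent.

```r
# Hypothetical helper: offsets needed to cover `total_tracks` in batches of `page_size`.
page_offsets <- function(total_tracks, page_size = 100) {
  if (total_tracks <= 0) {
    return(integer(0))
  }
  as.integer(seq(from = 0, to = total_tracks - 1, by = page_size))
}

offsets_250 <- page_offsets(250)
offsets_100 <- page_offsets(100)

# One URL per batch; "PLAYLIST_ID" is a placeholder, not a real ID.
batch_urls <- paste0("https://api.spotify.com/v1/playlists/", "PLAYLIST_ID",
                     "/tracks?offset=", offsets_250)
```

Each URL could then be requested in turn, with the results combined before the rest of the pipeline runs.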


Create Function to Loop Through Tracks

Now that we have accessed the playlist's content, we can set up a generic function to retrieve the following items for the first 100 tracks within the playlist.

list_playlist_tracks <- function(track_num) {
  track_id           <- pl_2$items[[track_num]]$track$id
  track_name         <- pl_2$items[[track_num]]$track$name
  track_popularity   <- as.character(pl_2$items[[track_num]]$track$popularity)
  track_duration     <- as.character(pl_2$items[[track_num]]$track$duration_ms)
  artist_name        <- pl_2$items[[track_num]]$track$artists[[1]]$name
  artist_id          <- pl_2$items[[track_num]]$track$artists[[1]]$id
  album_release_date <- pl_2$items[[track_num]]$track$album$release_date
  album_name         <- pl_2$items[[track_num]]$track$album$name
  album_image_url    <- pl_2$items[[track_num]]$track$album$images[[1]]$url
  album_image_lrg    <- pl_2$items[[track_num]]$track$album$images[[1]]$url
  album_image_med    <- pl_2$items[[track_num]]$track$album$images[[2]]$url
  album_image_sma    <- pl_2$items[[track_num]]$track$album$images[[3]]$url
  album_id           <- pl_2$items[[track_num]]$track$album$id
  song_preview_url   <- pl_2$items[[track_num]]$track$preview_url

  output <- list(track_id,
                 track_name,
                 track_popularity,
                 track_duration,
                 artist_name,
                 artist_id,
                 album_release_date,
                 album_name,
                 album_image_url,
                 album_image_lrg,
                 album_image_med,
                 album_image_sma,
                 album_id,
                 song_preview_url)
  
  return(output)
}

This function can be mapped for the first 100 songs.  In other words, the code below will allow you to loop through the first 100 songs in the playlist and retrieve the information listed above.  In addition, the results need to be unlisted, enframed, and unnested to produce a workable tibble later in the pipeline.

playlist_tracks_list <- map(1:ntracks, list_playlist_tracks)

playlist_tracks_tbl_1 <- playlist_tracks_list %>% 
  unlist(recursive = FALSE) %>% 
  enframe() %>% 
  unnest(c(name, value)) %>%
  mutate(row_cnt = max(row_number()),
         ntracks = ntracks,
         distinct_names = row_cnt/ntracks)
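
One thing to watch for with this approach: some fields, such as preview_url, can be null in Spotify's responses, and missing values may be dropped during unnesting, which would leave some tracks with fewer rows than others. The sketch below is an alternative and not the method used in this article. It builds one row per track directly and fills missing fields with NA. The mocked items and the helper track_row() are hypothetical, and only three of the fields are shown.

```r
library(purrr)
library(tibble)
library(dplyr)

# Mocked stand-in for two entries of pl_2$items; the second has no preview.
items_mock <- list(
  list(track = list(id = "t1", name = "Song A",
                    preview_url = "https://example.com/a.mp3")),
  list(track = list(id = "t2", name = "Song B", preview_url = NULL))
)

# Hypothetical helper: one tibble row per track, with NA for missing fields.
track_row <- function(item) {
  tibble(
    track_id         = pluck(item, "track", "id",          .default = NA_character_),
    track_name       = pluck(item, "track", "name",        .default = NA_character_),
    song_preview_url = pluck(item, "track", "preview_url", .default = NA_character_)
  )
}

tracks_tbl <- bind_rows(map(items_mock, track_row))
```

Because every track yields exactly one row, a table built this way is already in wide format.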

The results of the playlist_tracks_tbl_1 are listed below.  






Pivot Track Data

As you can see above, the data for the playlist is long and narrow.  In other words, each song has 14 rows associated with it and will need to be pivoted so that each song has one row with many descriptors.  

The code below allows us to pivot the data into a wider format.

distinct_names_lst <- playlist_tracks_tbl_1$distinct_names[1]

distinct_count_tbl <- tibble(track_num = as.factor(rep(1:ntracks, each = distinct_names_lst)))
distinct_names_tbl <- tibble(description = as.factor(rep(c("track_id", 
                                                           "track_name",
                                                           "track_popularity",
                                                           "track_duration",
                                                           "artist_name",
                                                           "artist_id",
                                                           "album_release_date",
                                                           "album_name",
                                                           "album_image_url",
                                                           "album_image_lrg",
                                                           "album_image_med",
                                                           "album_image_sma",
                                                           "album_id",
                                                           "song_preview_url"), 
                                                         ntracks)))

playlist_tracks_tbl_2 <- cbind(distinct_count_tbl, distinct_names_tbl, playlist_tracks_tbl_1) %>%
  mutate(genre = genre) %>%
  select(track_num, description, genre, value)

new_playlist_info <- playlist_tracks_tbl_2 %>% pivot_wider(names_from = description, values_from = value)
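
To make the reshaping step easier to follow, here is the same pivot_wider() call on a small, made-up table with two tracks and two descriptors each. The data is purely illustrative.

```r
library(tibble)
library(tidyr)

# Long format: one row per track and descriptor.
long_tbl <- tribble(
  ~track_num, ~description, ~value,
  1,          "track_id",   "t1",
  1,          "track_name", "Song A",
  2,          "track_id",   "t2",
  2,          "track_name", "Song B"
)

# Wide format: one row per track, one column per descriptor.
wide_tbl <- pivot_wider(long_tbl, names_from = description, values_from = value)
```

The four long rows collapse into two wide rows, with track_id and track_name as columns.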

The results of the new_playlist_info are listed below.  We now have a tibble set up where each song is recorded on one row.







Access Additional Song Information

We can add additional song info (energy, danceability, etc.) using another path in the Spotify API.  A lot of the same steps listed above are repeated in the code below.    

list_playlist_track_detail <- function(track_id) {
  #Sys.sleep(0.02)
  URI_track      <- str_c("https://api.spotify.com/v1/audio-features/", track_id)
  response_track <- GET(url = URI_track, add_headers(Authorization = header_value))
  track          <- content(response_track)
  return(track)
}

playlist_tracks_lst <- new_playlist_info$track_id

playlist_track_detail_map <- map(playlist_tracks_lst, list_playlist_track_detail)

playlist_track_detail <- playlist_track_detail_map %>% 
  unlist(recursive = TRUE) %>% 
  enframe()

playlist_track_detail_filtered <- playlist_track_detail %>%
  filter(name %in% c("id",
                     "danceability",
                     "energy",
                     "valence",
                     "tempo"))

num_of_row <- playlist_track_detail_filtered %>%
  nrow()

num_of_tracks <- playlist_track_detail_filtered %>% 
  filter(name == "id") %>%
  count()

num_of_rows_per_track <- num_of_row/num_of_tracks$n

track_number_col <- rep(c(1:num_of_tracks$n),each=num_of_rows_per_track)

playlist_track_detail_pivot_wide <- playlist_track_detail_filtered %>%
  cbind(track_number_col) %>%
  select(track_number_col, everything()) %>%
  pivot_wider(names_from = name, values_from = value) %>%
  rename(track_id = id)

playlist_track_detail_pivot_wide %>% glimpse()
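
The code above sends one request per track. Spotify also documents a batch form of this endpoint that accepts a comma-separated ids parameter, so a hedged alternative is to build one URL per group of IDs. The sketch below only constructs the URLs and sends nothing. The helper audio_features_urls() is hypothetical, and the default chunk size of 100 is an assumption to check against the current API documentation.

```r
# Hypothetical helper: split track IDs into chunks and build one URL per chunk.
audio_features_urls <- function(track_ids, chunk_size = 100) {
  chunk_index <- ceiling(seq_along(track_ids) / chunk_size)
  chunks <- split(track_ids, chunk_index)
  urls <- vapply(
    chunks,
    function(ids) {
      paste0("https://api.spotify.com/v1/audio-features?ids=",
             paste(ids, collapse = ","))
    },
    character(1)
  )
  unname(urls)
}

# Small chunk size here only to show the splitting.
demo_urls <- audio_features_urls(c("id1", "id2", "id3"), chunk_size = 2)
```

The response from a batch request is shaped differently from the single-track response, so the unlist and filter steps above would need adjusting.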

The playlist_track_detail_pivot_wide tibble result is listed below.






Access Additional Artist Information

We can also grab additional artist info using similar code to what we had listed above.  

list_playlist_artist_detail <- function(artist_id) {
  #Sys.sleep(0.02)
  URI_track       <- str_c("https://api.spotify.com/v1/artists/", artist_id)
  response_artist <- GET(url = URI_track, add_headers(Authorization = header_value))
  artist          <- content(response_artist)
  return(artist)
}

playlist_artist_lst <- new_playlist_info$artist_id

playlist_artist_detail_map <- map(playlist_artist_lst, list_playlist_artist_detail)

playlist_artist_detail <- playlist_artist_detail_map %>% 
  unlist(recursive = TRUE) %>% 
  enframe()

playlist_artist_detail %>% glimpse()

playlist_artist_detail_filtered <- playlist_artist_detail %>%
  filter(name %in% c("id",
                     "followers.total",
                     "popularity"
                     ))

num_of_row_artist_info <- playlist_artist_detail_filtered %>%
  nrow()

num_of_tracks_artist_info <- playlist_artist_detail_filtered %>% 
  filter(name == "id") %>%
  count()

num_of_rows_per_track_artist_info <- num_of_row_artist_info/num_of_tracks_artist_info$n

track_number_col_artist_info <- rep(c(1:num_of_tracks_artist_info$n),each=num_of_rows_per_track_artist_info)

playlist_artist_detail_pivot_wide <- playlist_artist_detail_filtered %>%
  cbind(track_number_col_artist_info) %>%
  select(track_number_col_artist_info, everything()) %>%
  pivot_wider(names_from = name, values_from = value) %>%
  rename(artist_id = id) %>%
  select(artist_id, followers.total, popularity) %>%
  rename(artist_popularity = popularity) %>%
  distinct()

The results from the playlist_artist_detail_pivot_wide tibble are listed below.





Putting it all Together

We can then join all three tibbles together for our final tibble used in the Tableau Public application.

# Join on explicit keys so the matching columns are clear.
new_playlist_tbl <- new_playlist_info %>%
  left_join(playlist_track_detail_pivot_wide, by = "track_id") %>%
  left_join(playlist_artist_detail_pivot_wide, by = "artist_id")

The final tibble now has 100 rows and 23 columns.











To access the Tableau Public application, please click here.
