Music Taste Analysis

Ever been asked what sort of music you like and felt unable to describe it convincingly? This notebook is my attempt to answer that question once and for all, because, yes, I think it really is this complicated.

How to Use

My first pass at this depended upon Watsonbox's Exportify, but I decided I didn't like his version because of bugs and inadequate output detail. So I went and forked it, cleaned up the code, and hosted it myself.

As such, the code here depends on .csv inputs in the format output by my version.

  1. To get started, hop on over there, sign in to Spotify to give the app access to your playlists, and export whatever you like.
  2. Next, either download this .ipynb file and run the notebook yourself or launch it in Binder.
  3. Either put the downloaded .csv in the same directory as the notebook, or upload it in Binder.
  4. Open the .ipynb in your browser, update the filename variable in the first code cell to point to your playlist, and press Shift+Enter in each following code cell to generate the corresponding plot. (Or select Cell -> Run All from the menu to make all graphs at once.)

Read the Data

For years I've been accumulating my favorite songs in a single master playlist called "music that tickles my fancy". It holds thousands of songs, and it's what I'll be analyzing. Let's look at the first few rows to get a sense of what we're dealing with.
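In a notebook, `pandas.read_csv(filename)` followed by `.head()` is the usual way to peek at the export. The sketch below does the same with only the standard library so it runs anywhere. The inline sample and its column names (`Track Name`, `Artist Name(s)`, `Added At`, and so on) are assumptions standing in for a real export, so check them against the header row of your own file.

```python
import csv
import io

# Hypothetical two-row sample standing in for the exported playlist .csv.
# Column names are assumptions; compare them with your export's header row.
SAMPLE_CSV = """Track Name,Artist Name(s),Added At,Release Date,Popularity,Duration (ms),Genres
Song A,Artist One,2016-03-01T12:00:00Z,2014-05-20,42,215000,"indie pop,dreamo"
Song B,"Artist One,Artist Two",2017-07-15T08:30:00Z,2009,17,402000,freak folk
"""

def read_playlist(fileobj):
    """Return the playlist as a list of dicts, one per track, keyed by column name."""
    return list(csv.DictReader(fileobj))

rows = read_playlist(io.StringIO(SAMPLE_CSV))

# Peek at the first few rows, like DataFrame.head().
for row in rows[:5]:
    print(row["Track Name"], "|", row["Artist Name(s)"], "|", row["Genres"])
```

With a real file, replace the `io.StringIO` wrapper with `open(filename, newline="", encoding="utf-8")`.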

Artist Bar Chart

Number of songs binned by artist.

Note that I've attributed songs with multiple artists to multiple bars, so the sum of the bar heights is the number of unique song-artist pairs, not the number of songs.
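A minimal sketch of that counting, assuming collaborating artists share one comma-separated field (the sample fields are hypothetical). Each song adds one to the bar of every artist it credits, so the total over all bars counts song-artist pairs.

```python
from collections import Counter

# Hypothetical artist fields, one string per song; collaborators are assumed
# to be joined with commas in a single field.
artist_fields = ["Artist One", "Artist One,Artist Two", "Artist Three", "Artist Two"]

def count_artists(fields, sep=","):
    """Count song-artist pairs: a song with k artists adds 1 to each of k bars."""
    counts = Counter()
    for field in fields:
        for name in field.split(sep):
            name = name.strip()
            if name:
                counts[name] += 1
    return counts

artist_counts = count_artists(artist_fields)
print(artist_counts.most_common())

# The total across all bars is the number of song-artist pairs, not songs.
print(sum(artist_counts.values()), "pairs from", len(artist_fields), "songs")
```

`artist_counts.most_common(50)` gives the top 50 for a readable chart. Splitting on commas mangles artist names that themselves contain a comma (Tyler, The Creator, for example), so a separator that can't appear in names would be safer if your export allows it.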

It seems to follow a Pareto distribution. Let's try to fit one.

The best fit still falls off too sharply for the data, and I tried for a good long while to make it fit better, so I conclude this doesn't quite follow a power law.
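The fitting method isn't shown here. One standard approach, sketched as an assumption rather than the method used above, is the maximum-likelihood estimate of the Pareto shape with a known x_min: alpha = n / sum(ln(x_i / x_min)). It treats integer song counts as continuous, which is only a rough approximation when counts are small.

```python
import math

def pareto_alpha_mle(samples, x_min=None):
    """MLE of the Pareto shape alpha, assuming x_min is known (defaults to the smallest sample).

    Samples below x_min are ignored; integer counts are treated as continuous.
    """
    if x_min is None:
        x_min = min(samples)
    if x_min <= 0:
        raise ValueError("x_min must be positive")
    tail = [x for x in samples if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("every sample equals x_min, so alpha is undefined")
    return len(tail) / log_sum

def pareto_survival(x, alpha, x_min):
    """P(X >= x) under the fitted Pareto; compare with the empirical tail on log-log axes."""
    return (x_min / x) ** alpha if x >= x_min else 1.0

# Hypothetical songs-per-artist counts.
songs_per_artist = [1, 1, 2, 4]
alpha = pareto_alpha_mle(songs_per_artist)
print(f"alpha = {alpha:.3f}")
```

Plotting `pareto_survival` against the fraction of artists with at least x songs makes "too sharp" easy to see: the fitted line drops below the data in the tail.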

Let's plot the top 50 artists so we can actually read who they are.

Volume Added Over Time

My proclivity to add songs to this playlist is a proxy for my interest in listening to music generally. How has it waxed and waned over time?

The initial spike is from when I first started using Spotify as the home for this collection and manually added hundreds of songs from my previous list.
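A sketch of the binning behind a plot like this, assuming the export stores add times as ISO 8601 UTC strings such as `2016-03-01T12:00:00Z` (the sample timestamps are hypothetical). Filling empty months with zeros keeps quiet stretches visible instead of silently skipping them.

```python
from collections import Counter
from datetime import datetime

# Hypothetical "Added At" timestamps; the format is an assumption about the export.
added_at = [
    "2016-03-01T12:00:00Z",
    "2016-03-02T09:10:00Z",
    "2016-04-20T22:05:00Z",
    "2017-07-15T08:30:00Z",
]

def adds_per_month(timestamps):
    """Bin add timestamps by calendar month, filling empty months with 0."""
    dates = [datetime.strptime(t, "%Y-%m-%dT%H:%M:%SZ") for t in timestamps]
    counts = Counter((d.year, d.month) for d in dates)
    # Walk every month from the first add to the last so gaps appear as zeros.
    (year, month), end = min(counts), max(counts)
    series = []
    while (year, month) <= end:
        series.append(((year, month), counts.get((year, month), 0)))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return series

monthly = adds_per_month(added_at)
for (year, month), n in monthly:
    print(f"{year}-{month:02d} {n}")
```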

Eclecticness Measure (Frequency Transform)

This one is a personal favorite. I want to know how many of my songs are one-offs from their artist (individual pieces I found fantastic and ended up adding after a few listens), how many are two-offs, et cetera. I'm sure it's heavily skewed toward the low numbers.

So, yes, it's much more common for an artist to make it into my list a few times than many times. In fact, a plurality of my top songs come from artists who appear only once.

Conversely, this view also highlights the few musicians from whom I've collected dozens of songs.

Note that here, as in the artist bar charts, some songs are double-counted: where artists collaborated, I counted the song in each of their bins.
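The frequency transform is a histogram of a histogram: start from songs per artist, then count how many artists share each count. A sketch with hypothetical per-artist counts; the second function weights each bin by its count, which answers the "how many of my songs" version of the question rather than "how many artists".

```python
from collections import Counter

# Hypothetical per-artist song counts (artist -> songs in the playlist).
artist_counts = Counter({"A": 1, "B": 1, "C": 1, "D": 2, "E": 2, "F": 5, "G": 31})

def frequency_transform(counts):
    """Map songs-per-artist -> number of artists with exactly that many songs."""
    return dict(sorted(Counter(counts.values()).items()))

def songs_by_artist_frequency(counts):
    """Map songs-per-artist -> song-artist pairs contributed by artists with that count."""
    return {k: k * n for k, n in frequency_transform(counts).items()}

hist = frequency_transform(artist_counts)
print(hist)
print(songs_by_artist_frequency(artist_counts))
```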

Genres Bar Chart

Alright, enough messing around. Everything above was possible with the output from Watsonbox's Exportify. Let's get to the novel stuff you came here for.

People describe music by genre. As we'll see, genre names are flippin' hilarious and extremely varied, but in theory, if my songs cluster around a few, those should give you a flavor of my tastes.

So many! Let's do the same thing as with the artists and, for giggles, see if it fits a power law.

Still too sharp, but it fits better than it did with the artists.

Let's look at the top 50 so we can read the names.

"Indie poptimism" lol. wtf? "Dreamo", "Vapor soul", "Freak folk", "Tropical house", "Post-grunge", "Hopebeat", "Noise pop", "Mellow gold"

These are too good. Next time someone asks me my music taste, I'm definitely using these.

If these are the most popular names, what are the really unique ones at the bottom of the chart?
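Both ends of the genre chart come from one tally. A sketch assuming each track's genres arrive as one comma-separated field (the sample fields are hypothetical, reusing names from this section): `most_common(50)` gives the head, and the genres tied at the lowest count form the tail.

```python
from collections import Counter

# Hypothetical per-track genre fields; comma separation is an assumption.
genre_fields = [
    "indie poptimism,dreamo",
    "indie poptimism,vapor soul",
    "indie poptimism",
    "dreamo,hauntology",
    "stomp and whittle",
]

# Tally every genre mention across all tracks.
genre_counts = Counter(
    genre.strip()
    for field in genre_fields
    for genre in field.split(",")
    if genre.strip()
)

top = genre_counts.most_common(50)  # head of the chart
lowest = min(genre_counts.values())
rarest = sorted(g for g, n in genre_counts.items() if n == lowest)  # tail of the chart

print(top[:3])
print(rarest)
```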

"hauntology", "psychedelic folk", "stomp and whittle", "dark trap", "filthstep", "shamanic", "deep underground hip hop", "future garage"

That was fun.

Release Dates

Which era of music do I prefer?

It seems to follow a gamma distribution! That makes sense: I'm more likely to have heard music released nearer to me in time, but it takes a while for a song to make it through my process and become a favorite.

Let's fit that gamma to the time-reversed data, that is, to each song's age in years before the present.
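Here "time-reversed" means measuring each song's age back from the present, so the long tail points into the past. The fitting method isn't stated; a maximum-likelihood fit such as `scipy.stats.gamma.fit` is common, but this sketch swaps in a simpler method-of-moments fit (shape k = mean^2 / variance, scale theta = variance / mean) using only the standard library. The release years and `current_year` are hypothetical. The peak of the fitted density, (k - 1) * theta when k >= 1, is one way to quantify which era the fit favors.

```python
import statistics

def gamma_moments_fit(ages):
    """Method-of-moments gamma fit: shape k = mean^2 / var, scale theta = var / mean."""
    mean = statistics.fmean(ages)
    var = statistics.pvariance(ages, mu=mean)
    return mean ** 2 / var, var / mean

def gamma_mode(k, theta):
    """Peak of the fitted density: (k - 1) * theta when k >= 1, otherwise 0."""
    return (k - 1) * theta if k >= 1 else 0.0

# Hypothetical release years; current_year is an assumption, set it to today's year.
current_year = 2020
release_years = [2019, 2018, 2016, 2015, 2015, 2014, 2012, 2008, 2001]

# Time-reverse: age in years before the present.
ages = [current_year - y for y in release_years]

k, theta = gamma_moments_fit(ages)
print(f"shape k = {k:.2f}, scale theta = {theta:.2f}, peak about {gamma_mode(k, theta):.1f} years ago")
```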

Pretty good fit! I seem to be extra partial to music from about 5 years ago. We'll see whether the present or maybe the further past catches up.

Popularity Contest

I was happy to find popularity listed as a field in Spotify's track JSON. It's a relative score from 0 to 100, which Spotify computes mostly from a track's total plays and how recent those plays are, rather than an absolute number of plays. Still, it can be used to give a notion of how hipster I am.
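A quick way to summarize that field, assuming the scores come straight from the popularity column (the sample values are hypothetical): a median plus a histogram in 10-point bins. Since the score is relative, read the median as a rough indicator rather than a precise ranking.

```python
import statistics
from collections import Counter

# Hypothetical popularity scores (0-100) from the export.
popularity = [12, 35, 41, 8, 67, 23, 50, 3, 29, 74]

def popularity_summary(scores, bin_width=10):
    """Return the median score and a histogram of scores in bins of `bin_width`.

    A score of exactly 100 is folded into the top bin.
    """
    top_bin = (100 // bin_width) - 1
    bins = Counter(min(s // bin_width, top_bin) * bin_width for s in scores)
    return statistics.median(scores), dict(sorted(bins.items()))

median_pop, pop_bins = popularity_summary(popularity)
print("median popularity:", median_pop)
print(pop_bins)
```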

Damn, I'm a hipster.

Track Duration

Do I prefer long songs or short ones?

The median is lower than the mean, so the distribution is skewed right. That is, I like a few really long songs. What are they?
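A sketch of both checks, with hypothetical tracks whose durations are assumed to be in milliseconds, as Spotify's `duration_ms` field reports them: compare the mean with the median, then sort to surface the longest songs. A median below the mean usually signals a long right tail, though it isn't a strict test of skewness.

```python
import statistics

# Hypothetical (track, duration in ms) pairs.
tracks = [
    ("Song A", 215_000),
    ("Song B", 402_000),
    ("Song C", 190_000),
    ("Song D", 1_260_000),
    ("Song E", 230_000),
]

minutes = [ms / 60_000 for _, ms in tracks]
mean_min = statistics.fmean(minutes)
median_min = statistics.median(minutes)

# A few very long songs pull the mean above the median.
right_skewed = median_min < mean_min
print(f"mean {mean_min:.2f} min, median {median_min:.2f} min, right-skewed: {right_skewed}")

# The really long ones: sort by duration, longest first.
longest = sorted(tracks, key=lambda t: t[1], reverse=True)[:3]
for name, ms in longest:
    print(f"{name}: {ms / 60_000:.1f} min")
```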

Musical Features

In the interest of understanding user tastes and providing the best possible music recommendations, Spotify has done some really sophisticated analysis of actual track content. Music is a time series, but most similarity metrics (and most ML methods generally) require inputs to be vectors, that is, points in some feature space. So they've transformed each track into numerical features such as Energy and Valence (continuous) and Key (discrete).
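To make "points in some feature space" concrete: once each track is a vector, similarity can be as simple as the distance between points. The sketch uses hypothetical (energy, valence, danceability) values and plain Euclidean distance; it illustrates the idea and is not Spotify's recommendation method. Key is left out because it's categorical, so numerically adjacent keys aren't necessarily similar.

```python
import math

# Hypothetical feature vectors (energy, valence, danceability), each in [0, 1].
track_features = {
    "Song A": (0.80, 0.65, 0.70),
    "Song B": (0.78, 0.60, 0.72),
    "Song C": (0.20, 0.15, 0.30),
}

def distance(u, v):
    """Euclidean distance between two tracks in feature space."""
    return math.dist(u, v)

def nearest(name, features):
    """Return the other track closest to `name` in feature space."""
    return min(
        (other for other in features if other != name),
        key=lambda other: distance(features[name], features[other]),
    )

print(nearest("Song A", track_features))
```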

For the continuous metrics, they provide distributions across all music. Here they are next to similar plots of my own songs.
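Comparing my songs with a distribution over all music only works if both are on the same scale, so each histogram should be normalized to unit area. A minimal standard-library sketch, assuming the feature ranges over [0, 1] as Energy and Valence do; the energy values are hypothetical. In the notebook itself, matplotlib's `hist(..., density=True)` applies the same normalization.

```python
def density_histogram(values, bins=20, lo=0.0, hi=1.0):
    """Histogram normalized so bar areas sum to 1, making samples of any size comparable.

    Values equal to `hi` go in the last bin; values outside [lo, hi] are dropped.
    """
    width = (hi - lo) / bins
    counts = [0] * bins
    kept = 0
    for v in values:
        if lo <= v <= hi:
            counts[min(int((v - lo) / width), bins - 1)] += 1
            kept += 1
    if kept == 0:
        return [0.0] * bins
    return [c / (kept * width) for c in counts]

# Hypothetical energy values for my tracks.
my_energy = [0.12, 0.35, 0.36, 0.58, 0.61, 0.63, 0.90, 1.0]
my_density = density_histogram(my_energy, bins=4)
print(my_density)
```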