After singing together with the same people for a few years, I couldn't help thinking, 'How many songs have we sung together in total? What have we sung so far?' Unfortunately, Smule, just like any other social media platform, does not give you a year-end report the way your credit card company gives you a usage statement😂
Getting the list of songs from your Smule page is a nearly endless process. You have to scroll through the page, and as you scroll down it loads more content until you hit the first song you ever sang. Imagine there are thousands of songs on your page and the page shows only 25 at a time... and your internet suddenly crashes☠ You get the idea, right?
Now let's say you were really passionate about finding the list of all the songs and scrolled through those hundreds of pages (FYI: even I don't do that!). You would probably still have to copy and paste each item on the page into some sort of spreadsheet, field by field and one by one. You can pay an intern to do this for months. Or just let the code do it for you🤓
Yes, I am very obsessed with getting this data! What could be better than using my own singing data to demonstrate my coding skills AND my personal interests!?
I just want a list of all the songs I have sung, and I want to know what my singing activity has looked like since I started using this app in 2016 ( •̀ ω •́ )✧ That's it! No elaborate hopes and dreams! ...Though, for the purpose of this project, I wrote up a summary report to get some insight!
I want to be respectful of my singing buddies' privacy, so I will redact everything that can be traced back to them.
Trying to get my hands on the data was actually the hardest part of this project. Here's my journey:
After googling around, I finally found the URL that returns the data I need in JSON format. Bless those beautiful souls for sharing the URL on Stack Overflow and Sing Salon.
STARTER URL ACQUIRED✨https://www.smule.com/s/profile/performance/<username>/sing?offset=<#>
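A minimal sketch of how paging through that URL could work, not the actual code from the GitHub project. Two things here are assumptions: that each JSON page keeps its records under a `list` key, and that `offset` counts records so the next page starts at `offset + len(records)`. `fetch_page` is a hypothetical callable you supply yourself (for example a thin wrapper around `requests.get(url).json()`), which keeps the sketch runnable without touching the network.

```python
from typing import Callable, Dict, List

BASE_URL = "https://www.smule.com/s/profile/performance/{username}/sing?offset={offset}"


def build_url(username: str, offset: int) -> str:
    """Fill the username and offset into the starter URL."""
    return BASE_URL.format(username=username, offset=offset)


def collect_performances(
    username: str,
    fetch_page: Callable[[str], Dict],
    max_pages: int = 1000,
) -> List[Dict]:
    """Request page after page until a page comes back with no records."""
    records: List[Dict] = []
    offset = 0
    for _ in range(max_pages):  # hard cap so a surprise in the payload cannot loop forever
        page = fetch_page(build_url(username, offset))
        items = page.get("list", [])  # assumed key name
        if not items:
            break
        records.extend(items)
        offset += len(items)  # assumed: offset counts records
    return records
```

Passing the fetcher in as an argument also makes it easy to swap in a slower, more polite fetcher later without changing the paging loop.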
Let the cherry-picking begin!
I actually created a project on my GitHub dedicated to extracting the data [and transforming it too]. The code I share there tells you how many songs two people have sung together. For my own purposes here, I extracted only the data from my account.
Unfortunately, the code that I shared for Data Extraction does not always work on the first go. One day I was able to run the whole notebook in one go; another day the second loop threw an error. Also, each time I ran the code, I had to wait at least 30 minutes before I could run the same request again, and I could not access the Smule page for a similar amount of time (I got a 418 error code). To this day, I haven't figured out how to get around the rate limit on requests for the JSON data. I added try-except error handling, which seemed to work at first, but later it gave me the same result as the code without it. I tried adding time.sleep(2) and that didn't do anything either.
As a last resort, instead of making two requests (2 usernames, 2 URLs) in the same notebook, I made only one request for just one username. I exported the output to CSV, waited 30 minutes, restarted and reran the code with the second username, and exported that second CSV. Then I appended the two dataframes and reset the index (needed for the SQL database) before exporting the final dataframe to CSV to be loaded. A whole set of Jupyter notebooks was created for this alternative method. Yeah, I know. I AM obsessed with getting this data.
Most of the data returned is actually pretty clean, but I want it output in a certain way. I tweaked the output of two fields: date ("created_at") and web URL ("web_url").
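For anyone who wants to experiment further, here is a hedged sketch of a retry loop with a much longer, doubling wait than `time.sleep(2)`. There is no evidence that this gets around Smule's limit; it only shows the general backoff pattern. `fetch` is a hypothetical callable that returns a `(status_code, parsed_json_or_None)` pair, and the 60-second starting delay is an arbitrary guess.

```python
import time
from typing import Callable, Optional, Tuple


def fetch_with_backoff(
    fetch: Callable[[], Tuple[int, Optional[dict]]],
    max_attempts: int = 5,
    base_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call fetch() until it returns status 200, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        status, body = fetch()
        if status == 200:
            return body
        if attempt < max_attempts - 1:
            # Waits base_delay, then 2x, then 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
    return None  # every attempt failed (for example, repeated 418 responses)
```

The `sleep` parameter exists only so the function can be tried out without actually waiting.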
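A small sketch of the append-and-reset-index step, using made-up frames in place of the two exported CSVs. The column names and the output file name are placeholders, not the real ones. `pd.concat(..., ignore_index=True)` stacks the frames and renumbers the rows in one call, which gives the same result as appending and then resetting the index.

```python
import pandas as pd


def combine_exports(first: pd.DataFrame, second: pd.DataFrame) -> pd.DataFrame:
    """Stack two per-username exports and renumber the rows from 0."""
    return pd.concat([first, second], ignore_index=True)


# Tiny made-up frames standing in for the two CSV exports.
user_a = pd.DataFrame({"title": ["Song A", "Song B"], "owner": ["user_a", "user_a"]})
user_b = pd.DataFrame({"title": ["Song C"], "owner": ["user_b"]})

combined = combine_exports(user_a, user_b)
print(combined)

# Hypothetical export; the index becomes an "id" column for the database load.
# combined.to_csv("combined.csv", index_label="id")
```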
Instead of having the date and timestamp (stalkable parameter right there 😱), I only want the date returned:
The web URL returned is only part of the URL. I want the full URL:
I did an extra clean-up step to replace the real date with a fake one and to replace the actual username with just '-username-' for demo purposes.
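A minimal sketch of trimming the timestamp, assuming `created_at` arrives as an ISO 8601 string (the sample value below is made up). It keeps the calendar date exactly as written in the string and does no time zone conversion.

```python
from datetime import datetime


def date_only(created_at: str) -> str:
    """Return the YYYY-MM-DD part of an ISO 8601 timestamp string."""
    # Older Python versions reject a trailing "Z" in fromisoformat(), so normalise it first.
    parsed = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
    return parsed.date().isoformat()


print(date_only("2020-09-01T12:34:56Z"))
```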
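A minimal sketch of completing the URL, assuming `web_url` is a site-relative path and that `https://www.smule.com` is the right prefix. The sample path is made up.

```python
from urllib.parse import urljoin

SITE = "https://www.smule.com"  # assumed prefix for the partial web_url values


def full_url(web_url: str) -> str:
    """Turn a site-relative path into an absolute URL; absolute URLs pass through unchanged."""
    return urljoin(SITE, web_url)


print(full_url("/recording/example-song/0000000000_0000000000"))
```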
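The username part of that clean-up could look roughly like this. The record layout and field names are hypothetical; the only thing taken from the text above is the '-username-' placeholder.

```python
from typing import Dict


def redact_username(
    record: Dict[str, object],
    username: str,
    placeholder: str = "-username-",
) -> Dict[str, object]:
    """Return a copy of the record with the username replaced in every string field."""
    return {
        key: value.replace(username, placeholder) if isinstance(value, str) else value
        for key, value in record.items()
    }


# Made-up record for illustration.
sample = {"owner": "real_name", "web_url": "/recording/real_name/song", "plays": 3}
print(redact_username(sample, "real_name"))
```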
Once all the data is in the format I want, it's time for pandas to shine!
I ran the code to write the dataframe to CSV. Then I ran some SQL queries to present the data and talk about it.
NICE TABLE ACQUIRED✨ From September 2016, when I started using the app, until September 2020, when the data was extracted, I made a total of 1858 recordings.
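A hedged sketch of the kind of counting query this involves, using an in-memory SQLite database purely as a stand-in (which SQL database was actually used is not stated here). The table name, column names, and rows are all made up.

```python
import sqlite3

# Made-up rows standing in for the exported CSV: (id, title, created_at).
rows = [
    (0, "Song A", "2016-09-10"),
    (1, "Song B", "2017-01-05"),
    (2, "Song A", "2017-02-11"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE performances (id INTEGER PRIMARY KEY, title TEXT, created_at TEXT)")
conn.executemany("INSERT INTO performances VALUES (?, ?, ?)", rows)

# Total recordings versus distinct song titles.
total_songs, total_titles = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT title) FROM performances"
).fetchone()
print(total_songs, total_titles)
conn.close()
```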
Metric | Count |
---|---|
Number of Songs | 1858 |
Number of Song Titles | 1071 |
Number of Invitation Spawners | 185 |
Collaboration type | Number of Songs |
---|---|
Inviting | 869 |
Joining | 989 |
Invitation Spawner | Number of Songs |
---|---|
Buddy1 | 110 |
Buddy2 | 56 |
Buddy3 | 54 |
Buddy4 | 44 |
Buddy5 | 39 |
Buddy6 | 36 |
... | ... |
StrangerY | 1 |
StrangerZ | 1 |
There are 869 recordings I created, either solos or duets for others to join. I have joined others on 989 songs.
Across those 989 songs, I have sung with 185 different Smule users. I sang as many as 110 songs with one person and as few as 1 song with another.
Now, let's just look at my own singing activity for these past years:
Year | Number of Songs |
---|---|
2016 | 106 |
2017 | 746 |
2018 | 453 |
2019 | 327 |
2020 | 226 |
Yearly
Month | Number of Songs |
---|---|
1 | 203 |
2 | 182 |
3 | 152 |
4 | 148 |
5 | 173 |
6 | 186 |
7 | 159 |
8 | 139 |
9 | 116 |
10 | 143 |
11 | 139 |
12 | 118 |
Monthly
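The yearly and monthly tallies above boil down to grouping the date-only values. Here is a minimal standard-library sketch on made-up dates; note that the monthly count pools the same month across all years, as in the monthly table.

```python
from collections import Counter
from datetime import date

# Made-up date-only strings standing in for the cleaned created_at column.
dates = ["2016-09-10", "2017-01-05", "2017-01-20", "2018-06-02"]

parsed = [date.fromisoformat(d) for d in dates]
per_year = Counter(d.year for d in parsed)
per_month = Counter(d.month for d in parsed)  # months pooled across all years

for year in sorted(per_year):
    print(year, per_year[year])
for month in sorted(per_month):
    print(month, per_month[month])
```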
Let's look at some of the songs that I sang most frequently next.
Looking at the resulting table, there are 1071 unique titles. I say 'unique titles' because if the same song has different titles within the app (depending on how the song was uploaded and whether it's a piano version, a guitar version, etc.), each title is counted separately.
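To make the 'unique titles' caveat concrete, here is a small sketch that counts titles by exact string match, so a piano version listed under a different title becomes its own entry. The titles are made up.

```python
from collections import Counter

# Made-up titles; "Song A (Piano ver.)" is the same song uploaded under another title.
titles = ["Song A", "Song B", "Song A", "Song A (Piano ver.)", "Song B", "Song A"]

frequency = Counter(titles)
unique_titles = len(frequency)   # exact-match titles, not distinct songs
top = frequency.most_common(2)   # the two most frequent titles

print(unique_titles)
print(top)
```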
Top 15 Most Frequent Songs
サリシノハラ is pretty easy to sing and very popular, so I can see why I sang it so much.
心做し is also popular, though only the short version. The notes are pretty high for singing it casually that often. My favorites from this list are actually HEAVEN and しわ. Nah, those URLs don't lead to my recordings🤭
Song Title | Frequency |
---|---|
サリシノハラ | 14 |
心做し【Short.】 | 14 |
独りんぼエンヴィー | 9 |
HEAVEN | 9 |
Tokyo Teddy Bear | 8 |
Magnet | 8 |
しわ -Romaji- | 8 |
アイのシナリオ (TV Size) | 8 |
Zoetrope | 8 |
夜もすがら君想ふ / Romaji | 8 |
背徳の記憶 ~The Lost Memory~ | 7 |
only my railgun | 7 |
小夜子 [Original] | 7 |
WAVE | 7 |
Acute | 7 |
Note: I edited some of the song titles to be more readable.
What are some of the inferences we could make from this?
Want to see some actual dashboards out of this data? Go to the '2.0' project, Data Visualization: Smule App Data.