

Profile Information

  • Interests
    computer music, R, ggplot, data viz, machine learning, supercollider, cyberpunk

lextreefrog's Achievements



  1. No. Can anyone help me find a *clean* source for the text of the names so I can add them? Something easily scrapeable, like the tables Wikipedia uses, would be ideal.
  2. I absolutely want the most complete dataset possible. If anyone wants to point out any beefy releases missing from Rap Genius, I'd need them in the form below. I'm glad I pulled this up to screenshot since, as you'll notice, Rap Genius has the fan titles for SAWII! We clearly shouldn't be counting those, as we are interested in Aphex Twin's authorial preference, not how that preference is perceived or interpreted. We want to prove authorial intent! Apologies for the oversight. ...wonder what happens if we remove them. It looks about the same, but let's save our eyeballs some trouble: only 2 of the top 10 overindexed letters in AFX titles ("x","z",...,"l") increase in the relative distribution when the SAWII fan titles are included, while 6 of the bottom 10 increase. This is evidence that the fan titles bring the distribution of AFX titles closer to the distribution of the English language. I love how "w" stayed at the bottom AND is the most affected by the inclusion. Aphex Twin doesn't ask questions!
  3. Love these questions! The dataset I pulled is every "Aphex Twin" release on Rap Genius except Rushup Edge, which I think I couldn't pull down properly because it has a special character in it. There are 11 instances of "q" in the 307 unique song titles I pulled. Your other concerns are, I think, best addressed by explaining what is meant by "Relative Distribution". If I were just to plot the number of occurrences of each letter in Aphex Twin song titles, a's and e's and s's and all the other 1-point Scrabble pieces would be top ranked. But what the graph represents is the distribution of those letters relative to their usage in the English language (using frequency in all words in the dictionary as an approximation). So while "a" may still be used a lot more than "q" in Aphex Twin song titles, it's more interesting to know which letters come up more or less often in Aphex Twin song titles THAN YOU WOULD EXPECT them to if the titles had the same letter distribution as the English language. Here's a way of thinking about it. You have two buckets, A and B. A is filled with all the letters of Aphex Twin's song titles. B is filled with regular Scrabble tiles. Each bucket has the same number of tiles in it. If you were blindfolded, so that you didn't know which bucket you were reaching into, and pulled out a letter, the graph tells you the odds the house would give you if you were to bet on which bucket that letter came from. If it were an "i", you'd have 50:50 odds of it being from bucket A. If it were an "x", you'd have 20:1 (ish) odds. Make sense? This is what we want to look at if we want to quantify the extent to which Aphex Twin's use of English characters diverges from the English language. I removed all special characters, as I couldn't get a number for how often they are used in the English language. I thought about getting the raw text from a bunch of different newspapers or something so I could get the frequency of special characters.
Here's the frequency table with special characters included for fun.
  4. Anyone else make dumb stuff like this? I used Josiah Parry's R package geniusR to retrieve song titles from Rap Genius, and took the dictionary letter distribution from Wikipedia. I spent a lot of time trying to find a (freely available) font that looks like Syro, to no avail. If anyone can recommend a better font, that'd be appreciated.
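
For anyone who wants to play with the "relative distribution" idea described in the posts above, here is a minimal Python sketch. It is not the code behind the original graphs (those were made in R); the function names, the three hand-typed titles, and the tiny reference word list are all placeholders chosen for illustration, and a real run would need the full title list and a proper dictionary letter-frequency table. The ratio is each letter's share in the titles divided by its share in the baseline, and the two-bucket bet works out to ratio / (1 + ratio) when both buckets hold the same number of tiles.

```python
from collections import Counter
import string


def letter_shares(texts):
    """Share of each a-z letter among all a-z letters in `texts` (case-insensitive).

    Digits, punctuation, and other special characters are ignored.
    """
    counts = Counter(
        ch for text in texts for ch in text.lower() if ch in string.ascii_lowercase
    )
    total = sum(counts.values())
    return {ch: counts[ch] / total for ch in string.ascii_lowercase}


def relative_distribution(title_shares, baseline_shares):
    """Title share divided by baseline share; values above 1 mean over-indexed.

    Letters that never occur in the baseline are skipped (the ratio is undefined).
    """
    return {
        ch: title_shares[ch] / baseline_shares[ch]
        for ch in string.ascii_lowercase
        if baseline_shares[ch] > 0
    }


def bucket_a_probability(ratio):
    """Chance a drawn tile came from bucket A (the titles), given equal-size buckets."""
    return ratio / (1 + ratio)


def count_increased(before, after, letters):
    """How many of `letters` have a larger value in `after` than in `before`."""
    return sum(1 for ch in letters if after[ch] > before[ch])


# Placeholder inputs: a few titles typed by hand and a toy "dictionary".
titles = ["Xtal", "Flim", "Windowlicker"]
reference_words = ["example", "words", "standing", "in", "for", "a", "dictionary", "text"]

ratios = relative_distribution(letter_shares(titles), letter_shares(reference_words))

# Print letters from most over-indexed to most under-indexed.
for letter, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{letter}: ratio={ratio:.2f}  P(bucket A)={bucket_a_probability(ratio):.2f}")
```

A ratio of 1 gives a probability of 0.5, which is the 50:50 case, and a ratio of 20 gives 20/21, which is the 20:1 case. To check the effect of including or excluding a set of titles, compute the ratios both ways and pass them to `count_increased` along with the top or bottom ten letters.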