The Fiction Genome Project

by [anonymous], 29th Jun 2012


The Music Genome Project is what powers Pandora. According to Wikipedia:


The Music Genome Project was first conceived by Will Glaser and Tim Westergren in late 1999. In January 2000, they joined forces with Jon Kraft to found Pandora Media to bring their idea to market.[1] The Music Genome Project was an effort to "capture the essence of music at the fundamental level" using almost 400 attributes to describe songs and a complex mathematical algorithm to organize them. Under the direction of Nolan Gasser, the musical structure and implementation of the Music Genome Project, made up of 5 Genomes (Pop/Rock, Hip-Hop/Electronica, Jazz, World Music, and Classical), was advanced and codified.


A given song is represented by a vector (a list of attributes) containing approximately 400 "genes" (analogous to trait-determining genes for organisms in the field of genetics). Each gene corresponds to a characteristic of the music, for example, gender of lead vocalist, level of distortion on the electric guitar, type of background vocals, etc. Rock and pop songs have 150 genes, rap songs have 350, and jazz songs have approximately 400. Other genres of music, such as world and classical music, have 300–500 genes. The system depends on a sufficient number of genes to render useful results. Each gene is assigned a number between 1 and 5, in half-integer increments.[2]


Given the vector of one or more songs, a list of other similar songs is constructed using a distance function. Each song is analyzed by a musician in a process that takes 20 to 30 minutes per song.[3] Ten percent of songs are analyzed by more than one technician to ensure conformity with the in-house standards and statistical reliability. The technology is currently used by Pandora to play music for Internet users based on their preferences. Because of licensing restrictions, Pandora is available only to users whose location is reported to be in the USA by Pandora's geolocation software.[4]
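
To make the quoted description concrete, here is a minimal Python sketch of the idea: songs as vectors of gene scores, compared with a distance function. The excerpt does not say which distance function Pandora uses, so the Euclidean distance below is purely an assumption, and the song names and four-gene vectors are made up for illustration.

```python
import math

# Hypothetical songs with four made-up gene scores each
# (the real project is described as using hundreds of genes).
SONGS = {
    "song_a": [4.5, 2.0, 3.0, 1.0],
    "song_b": [4.0, 2.5, 3.0, 1.5],
    "song_c": [1.0, 5.0, 1.5, 4.0],
}


def is_valid_gene(value):
    """True if value lies between 1 and 5 in half-integer increments."""
    return 1 <= value <= 5 and value * 2 == int(value * 2)


def distance(u, v):
    """Euclidean distance between two gene vectors (an assumed metric)."""
    if len(u) != len(v):
        raise ValueError("gene vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def most_similar(seed, songs, k=2):
    """Return the names of the k songs closest to the seed, nearest first."""
    scored = [
        (distance(songs[seed], vec), name)
        for name, vec in songs.items()
        if name != seed
    ]
    return [name for _, name in sorted(scored)[:k]]


if __name__ == "__main__":
    print(most_similar("song_a", SONGS))
```

Any other metric (cosine, weighted per-gene distances) would slot into `distance` without changing the rest.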



Eminent LessWronger, strategist, and blogger Sebastian Marshall wonders:


Personally, I was thinking of doing a sort of “DNA analysis” of successful writing. Have you heard of the Music Genome Project? It powers


So I was thinking, you could probably do something like that for writing, and then try to craft a written work with elements known to appeal to people. For instance, if you wished to write a best selling detective novel, you might do an analysis of when the antagonist(s) appear in the plot for the first time. You might find that 15% of bestsellers open with the primary antagonist committing their crime, 10% have the antagonist mixed in quickly into the plot, and 75% keep the primary antagonist a vague and shadowy figure until shortly before the climax.


I don’t know if the pattern fits that – I don’t read many detective novels – but it would be a bit of a surprise if it did. You might think, well, hey, I better either introduce the antagonist right away having them commit their crime, or keep him shadowy for a while.



Or, to use an easier example – perhaps you could wholesale adopt the use of engineering checklists into your chosen discipline? It seems to me like lots of fields don’t use checklists that could benefit tremendously from them. I run this through my mind again and again – what kind of checklist could be built here? I first came across the concept of checklists being adopted in surgery from engineering, and then having surgical accidents and mistakes go way down.


Some people at TV Tropes came across that article and thought that their wiki's database might be a good starting point for making this project a reality. I came here looking for the savvy, intelligence, and technical expertise in all things AI and NIT that I've come to expect of this site's user base. I hope some of you will be interested in having a look at the discussion and perhaps joining in, or at least sharing some good advice.
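
For a rough sense of what a starting point might look like, here is a small Python sketch that treats each work as a set of trope labels and ranks other works by Jaccard similarity. Everything in it is an assumption for illustration: the work titles and trope labels are placeholders rather than data taken from the wiki, and Jaccard similarity is just one simple choice of measure, not something anyone in the discussion has settled on.

```python
# Placeholder works, each reduced to a set of illustrative trope labels.
WORKS = {
    "Work A": {"Red Herring", "Locked Room Mystery", "The Butler Did It"},
    "Work B": {"Red Herring", "Locked Room Mystery", "Amateur Sleuth"},
    "Work C": {"Chosen One", "Dark Lord"},
}


def jaccard(a, b):
    """Shared labels divided by all labels appearing in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_similar(title, works):
    """Return (score, other_title) pairs, most similar first."""
    seed = works[title]
    scored = [
        (jaccard(seed, tropes), other)
        for other, tropes in works.items()
        if other != title
    ]
    # Sort by descending score, breaking ties alphabetically by title.
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


if __name__ == "__main__":
    for score, other in rank_similar("Work A", WORKS):
        print(f"{other}: {score:.2f}")
```

Unlike the Music Genome Project's graded scores, this treats each trope as simply present or absent. A closer analogue would need someone to rate how strongly each element features in a work.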

Thank you. (Also, should I make this post "Discussion" or "Top Level"?)