Tuesday, February 06, 2007

Transcript of BriefingsDirect Podcast on Music Search Technology and Implications

Edited transcript of BriefingsDirect[TM] B2B informational podcast on music search with Sun Labs, recorded Jan. 10, 2007.

Listen to the podcast here.

If you'd like to learn more about BriefingsDirect B2B informational podcasts, or to become a sponsor of this or other B2B podcasts, contact Dana Gardner at 603-528-2435.

Dana Gardner: Hi, this is Dana Gardner, principal analyst at Interarbor Solutions, and you're listening to BriefingsDirect. Today, a discussion with Paul Lamere, a staff engineer at Sun Microsystems Labs and principal investigator for Search Inside the Music.

This interesting research project is taking search to quite a new level. Instead of using lists of data about the music author, album, or composer, Search Inside the Music actually digs into the characteristics of the music, finding patterns, and then associating that with other music.

The possibilities here are pretty impressive. I want to thank Paul Lamere for joining us. Welcome to the show, Paul.

Paul Lamere: Hi, Dana. Thanks for having me.

Gardner: I had the opportunity to see your presentation face-to-face when I visited the Sun Labs’ campus in Burlington, Mass., back in December 2006, and it really had me thinking for a few days after. I kept coming back to it and thinking, “Wow! What about this and what about that?” I thought it would be good to bring this to a wider audience and to use the medium of audio, seeing as it is so germane to this project.

Let’s get a little historical context. Tell us how you got involved with this project, why music, and how does that relate to IT in general?

Lamere: Okay, as you know, I work in Sun’s research lab, and our role as researchers is to be the eyes and ears for Sun. We’re trying to look out on the distant horizon for Sun to keep an eye on the interesting things that are happening in the world of computers. Clearly music has been something that's been greatly changed by computers since Napster and iTunes.

So I started to look at what's going on in the music space, especially around music discovery. The thing that I thought was really interesting is looking at Chris Anderson’s book and website, The Long Tail. You know, music is sort of the sweet spot for the long tail; the audio is a nice conveniently sized packet, and people want them. What we’re seeing right now is the major labels putting out 1,000 songs a week. If you start to include some of the independent labels and the music that's happening out on MySpace and that sort of thing, you may start to see more like 10,000 songs a week.

Chris Anderson said, “The key to The Long Tail is just to make everything available.” Now imagine not just every indie artist, but every kid with a drum machine in their basement and a PC putting on their tracks, and every recording of every performance of "Louie Louie" on the Web, and that same thing happening all over the world and sticking that on the Web. Now we may start having millions of songs arriving on the Web every week.

Gardner: Not to mention the past 800 or 1,000 years’ worth of music that has been recorded in the last 50 or 100 years.

Lamere: That’s right. So, we have many orders of magnitude more music to sift through, and 99.9 percent of that music is something you would never ever want to listen to. However, there are probably some songs in there that would be our favorites if we could just find them.

I’m really interested in trying to figure out how to get through this slush pile to find the gems that are in there. Folks like Amazon have traditionally used collaborative filtering to work through content. I’m sure you’re familiar with Amazon’s “customers who bought this book also bought this book,” and that works well if you have lots of people who are interested in the content. You can take advantage of the Wisdom of Crowds. However, when you have…

Gardner: They are working on the "short head," but not the long tail.

Lamere: That’s right. When you have millions of songs out there, some that people just haven’t listened to, you have no basis for recommending music. So, you end up with this feedback where, because nobody’s listening to the music, it’s never going to be recommended -- and because it’s never recommended, people won’t be listening to the music. And so there is no real entry-point for these new bands. You end up once again with the short head, where you have U2 and The Beatles who are very, very popular and are recommended all the time because everyone is listening to them. But there is no entry point for that garage band.

Gardner: Yes. When I use Amazon or Netflix and they try to match me up, they tell me things I already know; they haven't told me things that I don’t know.

Lamere: That’s right. Did you really need to know that if you liked The Beatles, you might like The Rolling Stones? So, we’re taking a look at some alternative ways to help weed through this huge amount of music. One of the things that we’re looking at is the idea of doing content-based recommendation. Instead of relying on just the Wisdom of Crowds -- actually rely on the audio content.

We use some techniques very similar to what a speech recognizer does. It will take the audio and will run signal processing algorithms over it and try to extract out some key features that describe the music. We then use some machine-learning techniques basically to teach this system how to recognize music that is both similar and dissimilar. So at the end, we have a music similarity model and this is the neat part. We can then use this music similarity model to recommend music that sounds similar to music that you already like.
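As a rough illustration of content-based recommendation, the sketch below ranks tracks by how close their feature vectors are to a seed track. It is not the Sun Labs system: the feature values are made-up placeholders standing in for what signal processing would extract from the audio, the names FEATURES, cosine_similarity, and recommend are hypothetical, and plain cosine similarity is swapped in for the learned similarity model described above.

```python
import math

# Hypothetical per-track feature vectors. A real system would derive
# these from the audio itself; the numbers here are invented.
FEATURES = {
    "punk_a": [0.9, 0.8, 0.1],
    "punk_b": [0.85, 0.75, 0.15],
    "quartet": [0.1, 0.2, 0.9],
    "indie": [0.6, 0.5, 0.4],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recommend(seed, features, k=2):
    """Return the k tracks whose features are most similar to the seed."""
    seed_vec = features[seed]
    scored = [
        (cosine_similarity(seed_vec, vec), name)
        for name, vec in features.items()
        if name != seed
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


print(recommend("punk_a", FEATURES))
```

Note that no metadata or listening history is involved: a track with zero listeners can still be recommended as long as its audio features are close to the seed.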

Gardner: Yes, this is fascinating to me because you can scan or analyze music digitally and come out and say, this is blues, this is delta blues; this is jazz, this is New Orleans jazz. I mean, it’s amazing how discrete you can get on the type of music.

Lamere: Yes, that’s right, and the key is that you can do this with any kind of music without having any metadata at all. So, you can be given raw audio and you can either classify it into rather rich sets of genres or just say, "Hey, this sounds similar to that Green Day song that you’ve been listening to, so you might like to listen to this, too."

Gardner: Fascinating. So once we’re able to get characteristics and label and categorize music, we can scan all sorts of music and help people find what is similar to what they want. Perhaps they’re experimenting and might listen to something and think, “I wonder if I am interested in that,” and do all kinds of neat things. So, explain the next step.

Lamere: Well, there are lots of ways to go. One of the things that we can do with this similarity model is to provide different ways of exploring their music collections. If you look through current music interfaces, they look like spreadsheets. You have lists of albums, tracks, and artists and you can scroll through them much like you would through Lotus 1-2-3 or whatever spreadsheet you are using.

It should be fun; it should be interesting. When people look for music, they want to be engaged in the music. Our similarity model gives people new and different ways of interacting with their music collections.

We can now take our music collection and essentially toss it into a three-dimensional space based on music similarity, and give the listener a visualization of the space and actually let them fly through their collection. The songs are clustered based on what they sound like. So you may see one little cluster of music that’s your punk and at the other end of the space, trying to be as far away from the punk music, might be your Mozart.

Using this kind of visualization gives you a way of doing interesting things like exploring for one thing, or seeing your favorite songs or some songs that you forgot about. You can make regular playlists or you can make playlists that have trajectories. If you want to listen to high-energy music while driving home from work, you can play music in the high-energy, edgy space part of your music space. If you like to be mellowed out by the time you get home, you have a playlist that takes you gradually from hard-driving music to relaxing and mellow music by the time you pull into the driveway.
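A playlist with a trajectory can be sketched as a walk along a straight line through the similarity space, picking the nearest unplayed track at each step. This is only one plausible reading of the idea, not the project's actual method; the two-dimensional coordinates, the track names, and the function trajectory_playlist are all hypothetical.

```python
import math

# Hypothetical positions in a 2-D similarity space. High values stand
# for hard-driving music, low values for mellow music.
TRACKS = {
    "thrash": (0.95, 0.9),
    "rock": (0.7, 0.7),
    "pop": (0.6, 0.8),
    "folk": (0.4, 0.35),
    "ambient": (0.1, 0.1),
}


def trajectory_playlist(tracks, start, end, length):
    """Pick the nearest unused track to each waypoint from start to end."""
    remaining = dict(tracks)
    playlist = []
    for step in range(length):
        if not remaining:
            break
        # Fraction of the way along the line from start to end.
        t = step / (length - 1) if length > 1 else 0.0
        waypoint = [s + t * (e - s) for s, e in zip(start, end)]
        nearest = min(
            remaining, key=lambda name: math.dist(remaining[name], waypoint)
        )
        playlist.append(nearest)
        del remaining[nearest]  # never repeat a track
    return playlist


print(trajectory_playlist(TRACKS, start=(1.0, 1.0), end=(0.0, 0.0), length=4))
```

Removing each chosen track from the candidate pool keeps the playlist free of repeats, at the cost of occasionally picking a track that sits a little off the path.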

Gardner: Now, for those who are listening, I’m going to provide some links so you see some of these visualizations. It’s fascinating because it does look like some of these Hubble Telescope cosmos-wide diagrams where you see clusters of galaxies, and then you’re shown the same sort of visualization with clusters of types of music and how they relate.

If we take the scale in the other direction -- down into our brains and with what we know now about brain mapping and where activities take place and how brain cells actually create connections across different parts of the brain -- there is probably a physical mapping within our brains about the music that we like. We’re almost capturing that and then using that as a tool to further our enjoyment of music.

Lamere: That’s an interesting idea.

Gardner: Now, I’m looking here on my PC at my iTunes while we’re talking and I’ve got 4,690 items, 15 days of music, and 26.71GB. And it turns out -- even when I use shuffle and I’ve got my playlists and I’ve dug into this -- I’m probably only listening to about 20 percent or 30 percent of my music. What’s up with that?

Lamere: Yes, exactly. We did a study on that. We looked at 5,000 users and saw that the 80-20 rule really applies to people’s music collections: 80 percent of their listening time is really concentrated in about 20 percent of their music. In fact, we found that these 5,000 users had about 25 million songs on their iPods and we found that 63 percent of the songs had not been listened to even once. So, you can think of your iPod as the place that your music goes to die, because once it’s there, the chances are you will never listen to it again.
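The two statistics quoted here are easy to compute for any collection. The sketch below uses invented play counts purely to show the arithmetic; the function name listening_concentration and the sample data are hypothetical and are not the study's data.

```python
def listening_concentration(play_counts, top_fraction=0.2):
    """Return (share of plays in the top tracks, share of tracks never played)."""
    counts = sorted(play_counts, reverse=True)
    total = sum(counts)
    top_n = max(1, round(len(counts) * top_fraction))
    top_share = sum(counts[:top_n]) / total if total else 0.0
    never_played = sum(1 for c in counts if c == 0) / len(counts)
    return top_share, never_played


# Invented play counts for a ten-track collection.
PLAYS = [50, 30, 5, 5, 4, 3, 2, 1, 0, 0]

top_share, never_played = listening_concentration(PLAYS)
print(f"top 20% of tracks account for {top_share:.0%} of plays")
print(f"{never_played:.0%} of tracks were never played")
```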

Gardner: I don’t want it to be like that. So, clearly we can use some better richer tools. Is that right?

Lamere: That’s right. Shuffle play is great if you have only a few hundred songs that you can pick and put on there, but your iPod is a lot like mine. It has 5,000 songs and it also has my 11-year-old daughter’s High School Musical and DisneyMania tracks. I have Christmas music and some tracks I really don’t want to listen to.

When I hit shuffle play, sometimes those come up. Also with shuffle play, you end up going from something like Mozart to Rammstein. I call that the iPod whiplash. A system that understands a little bit about the content of the music can certainly help you generate playlists that are easier to listen to and also help you explore your music collection.

So you can imagine instead of hitting shuffle play or just playing the same albums over again, you could have a button on your iPod, or your music player, that lets you play music you like. That’s something that is really needed out there.

Gardner: So, instead of a playlist, you could have a "moodlist." For example, I’m stressed and I need to relax, or I really want to get pumped up because I’m going to a party and I want to feel high-energy, or I have the kids in the back seat and I want them to relax. Your mood can, in a sense, dictate how you assemble a playlist.

Lamere: That’s right. Imagine that a few years from now, (it’s not hard to extrapolate with the new iPhone), we’re going to have wireless iPods that are probably connected to a cloud of millions of tracks. It’s not hard to imagine all of the world’s music will be available on your hand-held audio player in a few years. Try using shuffle play when you have 500 million songs at your disposal; you’ll never find anything.

Gardner: I don’t have the benefit of a DJ to pick out what I want either, so I’m sort of stuck. I’m not in the old top 40 days, but I’m in the 40 million tracks days.

Lamere: That’s right.

Gardner: Now let’s look at some practical and commercial uses. Professional playlists assemblers who are creating what goes into these channels that we get through satellite, cable probably could use this to advantage. However, that doesn’t strike me as the biggest market. Have you thought about the market opportunity? Would people be willing to pay another five dollars a month to add it to their service so they have all their music readily accessible? How do you foresee it commercializing?

Lamere: I’m sure you’ve heard of Netflix. They are probably one of the biggest DVD shippers and one of their biggest advantages is their recommendation engine. I’m not sure if you’ve heard about the Netflix contest. Any researcher who can improve their recommendation by just 10 percent will receive $1 million from Netflix. I think that really represents how valuable good recommendations are to companies trying to deliver Long Tail content.

Amazon has built their business around helping connect people with their content as well. The same things are going to happen (or are happening now) within the music world. There is a lot of value in hooking people up with music; getting them into the Long Tail.

Gardner: Can we easily transfer this into a full multi-media experience -- video and audio? Have you tried to use this with tracks from a movie or a television show? Is there an option to take just from the audio alone; a characteristic map that people could say, "I like this TV show, now give me ones that are like it?" Is that too far-fetched? How do we go to full multi-media search with this?

Lamere: Those are all really interesting research questions. We haven't got that far yet. There are so many interesting spaces to bring this -- for instance, digital games. People are interacting with characters, monsters, and things like that. Currently, they may be in the same situation 50 times because they keep playing the game over and over again.

Wouldn’t it be nice to hook up music that matches the mood and follows its changes, or that may even push the latest songs that match the mood into the games, so that instead of listening to the same song, you get new music that does not detract from the game? There are all sorts of really interesting things that could happen there.

Gardner: I saw you demonstrating a 3D map in real time that was shifting as music was going in and out of the library as different songs were playing. It was dynamic; it was visual; there were little icons that represented the cover art from the albums floating around. It was very impressive. Now, that doesn’t happen with a small amount of computer power. Is this a service that could be delivered with the required amount of back-end computational power?

Lamere: Yes. We can take advantage of all of the nifty gaming hardware out there to give us the whiz bang with the 3D visualizations. The real interesting stuff, the signal processing and the machine learning when dealing with millions of songs, is going to use vast computer resources.

If you think music is driving a lot of bandwidth now in storage and computation, in a few years when people start really gravitating toward this content-based analysis, music will be driving lots and lots of CPU cycles. A small company with this new way of content-based recommendation can rent time on a grid at a per-CPU-hour rate and analyze an iTunes-sized music collection (a few million songs) in a weekend as opposed to the five or 10 years it would take on a couple of desktop systems.

Gardner: Interesting. The technology that you’ve applied to music began with speech. Is there a way of moving this back over to speech? We do have quite a bit of metadata and straight text and traditional search capabilities. What if we create an overlay of intonation, emphasis, and some of the audio cues that we get in language that don’t show up in the written presentation or in what is searchable? Does that add another level of capability or "color" to how we manage the spoken word and/or the written word? With my podcasting, for example, I start with audio -- but I go to text, and then try to enjoy the benefits of both.

Lamere: Right. These are all great research questions; the things that researchers could spend years on in the lab. I think one interesting application would be tied to meetings; work meetings, conference meetings, just like when you visited Sun last month.

Suppose we had a computer that was listening to the meeting and maybe trying to do some transcripts, but also noting some of the audio events in the meeting, such as when everybody laughed or when things got really quiet. You could start to use those as keys for searching content in the meetings. So, you could go back to a recording of the meeting and find the introductions again very easily so you can remember who was at the meeting or find that summary slide and the spoken words that went with the conclusion of the talk.
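One very simple way to flag an audio event such as "things got really quiet" is to measure the energy of each stretch of the recording and note where it drops below a threshold. The sketch below assumes the recording is already available as a list of samples between -1.0 and 1.0; the function name find_quiet_spans, the window size, and the threshold are hypothetical choices, and detecting laughter would need a trained classifier rather than an energy check.

```python
import math


def find_quiet_spans(samples, sample_rate, window_s=1.0, threshold=0.05):
    """Return start times (in seconds) of windows whose RMS energy is low."""
    window = int(sample_rate * window_s)
    quiet_starts = []
    for start in range(0, len(samples) - window + 1, window):
        frame = samples[start:start + window]
        rms = math.sqrt(sum(x * x for x in frame) / window)
        if rms < threshold:
            quiet_starts.append(start / sample_rate)
    return quiet_starts


# Toy recording at 10 samples per second: talk, silence, talk.
RECORDING = [0.5] * 10 + [0.0] * 10 + [0.5] * 10

print(find_quiet_spans(RECORDING, sample_rate=10))
```

The returned timestamps could then serve as search keys alongside a transcript, letting a listener jump to the moments just before or after a pause.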

Gardner: Almost like a focus group ability from just some of these audio cues.

Lamere: That’s right.

Gardner: Hey, I’ve got something that you could take to the airlines. I tend to sit on planes for long periods of time and after my battery runs out, I am stuck listening to what the airline audio provides through those little air tubes. Wouldn’t it be nice if there were audio selections that were rich and that really fit my mood? This is a perfect commercialization.

Lamere: Yes, you can have your favorite music as part of your travel profile.

Gardner: This could also work with automakers. Now that everyone has found a way to hook up to their iPods or their MP3 equivalent in their automobile, the automakers can give you what you want off of the satellite feed.

Lamere: Definitely.

Gardner: There are many different directions to take this. Obviously you’ve got some interest in promoting Sun’s strategic direction. There must be some other licensing opportunities for something like this. Is this patented or are you approaching it from an intellectual property standpoint? If our listeners have some ideas that we haven’t thought of, what should they do?

Lamere: When you’re in the labs, the people who run the lab really like to make sure that the researchers are not tempted by other people’s ideas because it can make it difficult down the road. If people have some ideas they want to send my way, it’s always great to hear more things. We do have some patents around the space. We generally don’t try to exploit the patents to charge people entry into a particular space.

Gardner: Since this does fall under the category of search, there are some big companies out there that are interested in that technology. We have a lot of Google beta projects, for example, such as Google News, Google Blogs, Google Podcasts, and Google Base. Why not Google Music?

Lamere: Google has two -- I guess, you may call them Friday projects -- on their labs site around music. One is Google Trends, and the idea there is they’re trying to track which music is popular. If you use Google’s instant messenger, you can download a plug-in that will also track your listening behavior. So every time you play a song, it sends the name of the artist and the track to Google and they keep track of that. They give you charts of top 100 music, top 100 artists, whatever. The other thing they have is a search tailored toward music.

If you type in Coldplay or The Beatles, you’ll get a search result that’s oriented toward music. You’ll see links of the artist page and links to lyrics but, interestingly enough, they haven't done anything in public to my knowledge about indexing music itself. This is interesting because Google has never been shy about indexing.

Its mission is to index all the world’s information, and certainly music falls into that category. They haven't been shy about going up against publishers when it comes to their library project, where they’re scanning all the books in all the libraries despite some of the objections of the book publishers. But for some reason they don’t want to do the same with music. So, it’s an open question. But probably they’ll be announcing the Google Music Search tomorrow.

Gardner: At least we can safely say we’re in the infancy of music search, right?

Lamere: That’s right. I see a lot of companies trying to get into the space. Most of them are trying to use the collaborative filtering models. The collaborative filtering models really require lots of data about lots of users. So they have a hard time attracting users because until they get a lot of users, their recommendations are not that good. And because their recommendations are not that good, they don’t get a lot of users.

Gardner: The classic chicken-and-egg dilemma.

Lamere: Yes, it’s called the "cold start" problem.
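The cold start problem shows up even in a toy "people who listened to this also listened to that" recommender. The sketch below is a minimal co-occurrence counter, not any vendor's algorithm; the listening histories and the names co_listen_counts and also_listened are invented for illustration.

```python
from collections import Counter
from itertools import permutations

# Invented listening histories, one list per user. The catalog also
# contains "garage_band", which nobody has played yet.
HISTORIES = [
    ["u2", "beatles"],
    ["u2", "beatles", "stones"],
    ["beatles", "stones"],
]
CATALOG = ["u2", "beatles", "stones", "garage_band"]


def co_listen_counts(histories):
    """Count how often each pair of tracks appears in the same history."""
    counts = {}
    for tracks in histories:
        for a, b in permutations(set(tracks), 2):
            counts.setdefault(a, Counter())[b] += 1
    return counts


def also_listened(track, counts, k=3):
    """Return up to k tracks most often co-listened with the given track."""
    return [name for name, _ in counts.get(track, Counter()).most_common(k)]


COUNTS = co_listen_counts(HISTORIES)
for track in CATALOG:
    print(track, "->", also_listened(track, COUNTS))
```

Because "garage_band" appears in no history, it has no neighbors and is never suggested for any other track, which is the feedback loop described above: no listeners means no recommendations, and no recommendations means no listeners.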

Gardner: I firmly believe in the "medium is the message"-effect, and not just for viewing news or getting information. When I was getting my music through the AM radio, that characterized a certain way that I listened and enjoyed music. I had to be in a car, or I had to be close to a radio, and then I might avoid sitting under a bridge.

Then I had my LPs and my eight tracks and they changed from one song into an album format for me. We’re going back a number of years here. I’ve been quite fond of my iPod and iTunes for the last several years and that has also changed the way I relate to music. Now, you have had the chance to enjoy your Search Inside the Music benefit. How has using this changed the way you relate to and use music?

Lamere: That’s a good question. I agree that the medium is the message, and that really affects our way of dealing with music. As we switch over to MP3s, I think listening to music has shifted from the living room to the computer. People are now jogging with their iPod and listening experiences are much more casual.

They may have access to 10,000 tracks. They’re not listening to the same track over and over like we used to. So I think over time music is going to shift back from the computer to the living room, back to the living spaces in our house and away from the computer.

I try to use our system away from the computer -- just because I like to listen to music when I’m living, not just when I’m working. So I can use something like Search Inside the Music to generate interesting playlists that I don’t have to think about.

Instead of just putting the shuffle play on The Beatles, I can start from where I was yesterday, and since we were eating dinner, let’s circle around this area of string quartets and then when we’re done, we will ramp it up to some new indie music. You still have the surprise that you get with shuffle, which is really nice, but you also have some kind of arc to the listening.

Gardner: So you are really creating a musical journey across different moods, sensibilities, and genres. You can chart that beyond one or two songs or a half-dozen songs into a full-day playlist.

Lamere: That’s right.

Gardner: Very interesting. Well, thanks for joining us, Paul. We’ve been talking with Paul Lamere, a researcher and staff engineer at the Sun Microsystems Research Labs in Burlington, Mass. We’ve been discussing Search Inside the Music, a capability that he’s been investigating. A lot of thoughts and possibilities came from this. Paul is the principal investigator. I wish you well on the project.

Lamere: Thanks, Dana. It’s been great talking to you.

Gardner: This is Dana Gardner, principal analyst at Interarbor Solutions. You’ve been listening to BriefingsDirect. Come back next time.

If any of our listeners are interested in learning more about BriefingsDirect B2B informational podcasts or to become a sponsor of this or other B2B podcasts, please feel free to contact Dana Gardner at 603-528-2435.


Transcript of BriefingsDirect podcast on music search with Sun Labs. Copyright Interarbor Solutions, LLC, 2005-2007. All rights reserved.