Technical Spiders

So I built a Netflix app last weekend...

07 Feb 2011

I had a simple goal. I wanted to find the absolute worst 100 movies on Netflix. You see, I’m a bad movie fan. I’m not talking a little bad. I’m talking the worst of the worst. The unwatchable. The unthinkable. I’ve found the humor in them and they totally crack me up (much, much thanks to Jane). I was also very much inspired by IMDB’s Bottom 100.

So, I set forth on this task. First, I looked at Netflix’s APIs. I started down the path of using their REST API, which wasn’t bad, but had a few drawbacks.

First, you can only search movies by terms. That makes it difficult to search their catalog by rating, or by pretty much anything else. They do have a full index, so I set off to fetch that. It took a while to download the full catalog (around 300 MB), and it was a bit of a pain because it sits behind their OAuth APIs. After that, with grep and sed I was able to extract the 76k or so titles from the catalog that were movies (not TV series, people, or genres). I wrote a simple client for this with oauth & crack. But the catalog didn’t have user ratings, so I had to go fetch and store all of them, then find the worst.
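The grep-and-sed step could just as well be done in Ruby. Here is a minimal sketch, assuming (the post does not confirm this) that movie entries in the catalog dump can be recognised by an id URL containing `/catalog/titles/movies/` followed by digits. The pattern and the `movie_ids` helper are illustrative, not the original script.

```ruby
# Assumed pattern: movie entries carry an id URL ending in
# "/catalog/titles/movies/<digits>". Series, people and genres do not match.
MOVIE_ID = %r{/catalog/titles/movies/(\d+)}

# Accepts anything that yields lines (an Array, or File.foreach(path)),
# and returns the unique movie ids in the order they first appear.
def movie_ids(lines)
  lines.flat_map { |line| line.scan(MOVIE_ID).flatten }.uniq
end
```

Passing `File.foreach("catalog.xml")` (a hypothetical file name) streams the dump line by line rather than loading all of it into memory.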

This was a great chance to play around with Redis - it supports sets with scores and it’s wicked simple and fast. So, I whipped up some code to determine the worst movies. I was all set.
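To show why a set with scores fits this job, here is a rough sketch rather than the original code. The `worst_movies` helper and the `movies:by_rating` key are made up for illustration, and `FakeSortedSet` is a tiny in-memory stand-in so the example runs without a Redis server. It mirrors the argument order of `zadd` and `zrange` in the redis gem.

```ruby
# In-memory stand-in for a Redis sorted set (hypothetical, for illustration).
class FakeSortedSet
  def initialize
    @scores = Hash.new { |hash, key| hash[key] = {} }
  end

  # Same argument order as ZADD: key, score, member.
  def zadd(key, score, member)
    @scores[key][member] = score.to_f
  end

  # Like ZRANGE: members by ascending score (ties by member), inclusive indexes.
  def zrange(key, start, stop)
    @scores[key].sort_by { |member, score| [score, member] }.map(&:first)[start..stop]
  end
end

# Store every rating as a score, then read back the lowest-scored titles.
def worst_movies(store, ratings, limit = 100)
  ratings.each { |title, rating| store.zadd("movies:by_rating", rating, title) }
  store.zrange("movies:by_rating", 0, limit - 1)
end

ratings = { "Movie A" => 3.9, "Movie B" => 1.2, "Movie C" => 2.4 }
p worst_movies(FakeSortedSet.new, ratings, 2) # prints ["Movie B", "Movie C"]
```

With the redis gem installed and a server running, a `Redis.new` client should be usable in place of `FakeSortedSet`, since it responds to the same two calls.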

Except one problem. Netflix’s API limits are 5,000 requests a day. With 76k or so titles, even if I run this every day, it will take a little over 15 days before I can finish my app. That’s no fun. So, I went back to the drawing board and looked over at Netflix’s OData APIs. I had never really heard of OData before, but it looked like it provided a way to search and filter on the data that I wanted. Finding the ruby odata gem gave me the start of what I needed.
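The rate-limit estimate is simple arithmetic. As a quick sketch, assuming one request per title and the approximate figures above (the helper name is just for illustration):

```ruby
# Whole days needed to fetch every title at a fixed daily request limit,
# assuming one API request per title.
def days_to_fetch(titles, daily_limit)
  (titles.to_f / daily_limit).ceil
end

puts days_to_fetch(76_000, 5_000) # prints 16 (15.2 days, rounded up)
```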

I ended up whipping up a bit of code that let me not only search and sort by average rating, but also filter by DVD availability and Netflix Instant.
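For a feel of what such a query looks like on the wire, here is a hedged sketch that builds the URL by hand rather than through the gem. `$filter`, `$orderby` and `$top` are standard OData query options. The base URL is a placeholder, and the property names (`AverageRating`, `Instant/Available`, `Dvd/Available`) are assumptions about the catalog's schema.

```ruby
require "uri"

# Placeholder endpoint; the real service root is not given in this post.
BASE_URL = "http://odata.example.com/Catalog/Titles"

# Build an OData query for the lowest-rated titles.
# Property names below are assumed, not taken from the service metadata.
def worst_titles_url(limit: 100, instant: false, dvd: false)
  filters = []
  filters << "Instant/Available eq true" if instant
  filters << "Dvd/Available eq true" if dvd

  params = { "$orderby" => "AverageRating asc", "$top" => limit }
  params["$filter"] = filters.join(" and ") unless filters.empty?

  "#{BASE_URL}?#{URI.encode_www_form(params)}"
end

puts worst_titles_url(limit: 100, instant: true)
```

Doing the sort and the filter on the server is the point: one request returns the bottom 100, instead of one request per title.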

From there, it was a matter of throwing it together in a Rails app and deploying to Heroku. Heroku is simple and easy for hosting Rails apps, and free for smaller usage.

So, I finished up and deployed it. I then realized it was a bit slow (from looking at New Relic), and decided to add some caching. Heroku provides a free memcached plugin (up to 5 MB), so I plugged that in and fragment cached the views. Since this data doesn’t seem to change often, that seems like a pretty good compromise.
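The idea behind fragment caching is fetch-or-render: look up the rendered HTML by key, and only do the slow work on a miss or after expiry. The sketch below is plain Ruby, not Rails or memcached code. `TinyCache` and its injectable clock are hypothetical, loosely modelled on the shape of `Rails.cache.fetch`.

```ruby
# Minimal in-process cache with expiry (illustrative only).
class TinyCache
  def initialize(clock: -> { Time.now })
    @entries = {}
    @clock = clock
  end

  # Return the cached value for key if still fresh; otherwise run the block,
  # store its result for expires_in seconds, and return it.
  def fetch(key, expires_in:)
    now = @clock.call
    entry = @entries[key]
    return entry[:value] if entry && entry[:expires_at] > now

    value = yield
    @entries[key] = { value: value, expires_at: now + expires_in }
    value
  end
end

cache = TinyCache.new
renders = 0
2.times do
  cache.fetch("worst-100", expires_in: 3600) do
    renders += 1 # stands in for the slow API call plus view rendering
    "<ul>...</ul>"
  end
end
puts renders # prints 1, the second call is served from the cache
```

Because the ratings rarely change, a long expiry trades a little staleness for far fewer slow requests.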

Challenges, Issues, Final Thoughts

All in all, pretty simple and fun, and not bad for a small weekend project. There are some takeaways. First, I still can’t get this 100% right. Netflix exposes average ratings, but not the number of ratings. So, if there are ties, I can’t break them by number of votes (which would be fairer and more accurate - it’s much harder to maintain a low rating as more votes come in). Second, I’m still having issues getting the ruby odata client to filter by genre. I know it can be done, but I haven’t been able to get it to work with the gem yet. Ideally, I want the worst for each genre as well. Overall, Netflix has done an okay job of providing some APIs, but they could really use some love. They last updated the OData “preview API” about 10 months ago, and the REST API 2 years ago. They were hiring 7 months ago, so I’m hoping they got their new devs and will be making some changes soon.
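If vote counts were available, the tie-break described above would be a one-line sort. This is purely a sketch: the `Movie` struct, the `votes` field and the sample data are invented for illustration, since the API does not expose the number of ratings.

```ruby
# Hypothetical record; the real API gives the rating but not the vote count.
Movie = Struct.new(:title, :rating, :votes)

# Lowest rating first; among equal ratings, the title with more votes ranks
# as worse, since a low average is harder to sustain over many votes.
def worst_first(movies)
  movies.sort_by { |movie| [movie.rating, -movie.votes] }
end

sample = [
  Movie.new("Movie A", 1.5, 40),
  Movie.new("Movie B", 1.5, 9_000),
  Movie.new("Movie C", 1.2, 12)
]
p worst_first(sample).map(&:title) # prints ["Movie C", "Movie B", "Movie A"]
```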

So, now it’s off to the movies (if you dare).

ruby netflix weekend project worst movies
