more netflixing
Mood: accomplished
Posted on 2006-10-05 09:35:00
Tags: netflixprize
Words: 152

So I submitted my first entry last night. It did poorly, of course, but at least I'm on the Leaderboard! (I'm "teamgreg" because I really didn't want to think of a name.) I crunched some more numbers last night in support of my next submission, which I have to wait a week for. It's a little annoying that just reading in all 100 million entries takes at least 40 minutes (not counting any processing, though that's all been pretty quick so far). It works out decently when we're doing other things, though, since I can start a run and then get back to something else. As long as that "something else" isn't WoW, because it really sloooows my computer down. :-)

I'm impressed that one team already has an RMSE (root mean squared error) of .9571, which is within spitting distance of .9474, which is how Netflix's algorithm does on the data.
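
For anyone wondering what RMSE actually measures, here's a minimal sketch in Python. It is not my real submission code or Netflix's scoring code; the function name and the sample ratings are made up for illustration. The idea is just: take the difference between each predicted and actual rating, square it, average the squares, and take the square root.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists of ratings."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("need at least one rating")
    # Average of the squared differences, then the square root of that average.
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up example: guesses for four ratings on a 1-5 star scale.
guesses = [3.5, 4.0, 2.0, 5.0]
truth = [4, 4, 1, 5]
print(rmse(guesses, truth))
```

Lower is better, and because the errors are squared before averaging, a few big misses hurt more than lots of small ones.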


3 comments

Comment from onefishclappin:
2006-10-05T09:23:33+00:00

FYI Gary and I were talking about this last night and were discussing pulling zip code/census data (freely available stuff, IIRC) into it. Does that make sense from what you get to work with?

Comment from gregstoll:
2006-10-05T10:13:20+00:00

Neat idea! Unfortunately, the data set doesn't include any information about users (only a unique identifier). For the movies it includes the title and a date, which is either when the movie was released or when it was released on DVD (so using the date is a bit sketchy), so you could pull in external data about the movies...

Comment from onefishclappin:
2006-10-05T11:09:23+00:00

Suck... if you get just a zipcode, you could harvest/infer all sorts of good info about people (with minimal privacy concerns).
I wonder if my father is too busy on stuff to work on this - he's been doing various datamining whatnot for years... Somehow I suspect that he's too busy in retirement to want to do more "work" :)
Keep letting us know how it's going!
