Posted on 2006-10-24 08:59:00
So, my best result on the Netflix Prize scored an RMSE of 0.981 on the probe set (this was using the movie correlation). My attempt to use user-based correlation resulted in a hideous 1.03 RMSE, which is still better than just taking the average rating of each movie and predicting that for every user, but not by much. So there is a little information in there, and this morning when I should have been showering I cooked up a little script to take a weighted average of the two results (weighted towards the movie correlation one, since it scored much better). The results were not terribly encouraging: I managed to get the RMSE down to 0.976 or so, but that's it. The last person on the leaderboard has an RMSE of 0.9597 right now, so I'm way off...
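For the curious, the blending idea is easy to sketch. This is not the actual script, just a minimal Python illustration under some assumptions: the two prediction lists and the true probe ratings are already lined up in the same order, and the function names (rmse, blend, best_weight) and the simple grid search over weights are made up for the example.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(actual))

def blend(movie_preds, user_preds, movie_weight):
    """Weighted average of the two predictions for each probe rating.

    movie_weight is the share given to the movie-correlation prediction;
    the user-correlation prediction gets the remaining 1 - movie_weight.
    """
    return [movie_weight * m + (1.0 - movie_weight) * u
            for m, u in zip(movie_preds, user_preds)]

def best_weight(movie_preds, user_preds, actual, steps=20):
    """Try weights 0, 1/steps, ..., 1 and keep the one with the lowest RMSE.

    The grid search is an assumption for illustration; the weight could
    just as well be picked by hand.
    """
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates,
               key=lambda w: rmse(blend(movie_preds, user_preds, w), actual))
```

Since weights 0 and 1 are both in the grid, the best blend can never score worse on the probe set than either input alone. How much better it gets depends on how different the two sets of errors are.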
I do have one more way of calculating user correlations that is running now, but assuming that doesn't yield fabulous results I'll probably give up and reclaim my computer within the week. It's a little disappointing for it to end this way, but I did give it a good shot and even made it on the leaderboard for a short amount of time. And I had fun, and kept up my C++ skills a little. So it wasn't a waste!
Also, I've read that the movie correlation data is kinda interesting (for example, which movies people liked about as much as Miss Congeniality), so I'll probably cook up a little script to show that data somehow. That would be fun.
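A rough idea of what such a script might look like, as a hedged Python sketch. It assumes the correlation is a plain Pearson correlation over users who rated both movies, which may not match what my real code does, and the data layout (a dict from movie title to a dict of user id to rating) and the function names are invented for the example.

```python
import math

def pearson(ratings_a, ratings_b):
    """Pearson correlation of two movies over the users who rated both.

    Each argument maps user id -> rating. Returns 0.0 when there are fewer
    than two common users or when either movie's ratings do not vary.
    """
    common = ratings_a.keys() & ratings_b.keys()
    n = len(common)
    if n < 2:
        return 0.0
    xs = [ratings_a[u] for u in common]
    ys = [ratings_b[u] for u in common]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def most_similar(target, ratings_by_movie, top_n=5):
    """Return (correlation, title) pairs for the movies closest to target."""
    scores = [(pearson(ratings_by_movie[target], ratings), title)
              for title, ratings in ratings_by_movie.items()
              if title != target]
    scores.sort(reverse=True)  # highest correlation first
    return scores[:top_n]
```

Calling most_similar with a title and the full ratings dict would give a short "people who liked this also liked" list. On the real data it would probably need a minimum number of common raters, since a correlation computed from a handful of users is mostly noise.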
I read a paper that suggests including data from IMDb about the movies (actors, directors, etc.) could improve things, but there are 17,770 movies and only one of me, and I'm short on free time as it is.
Comment from anonymous:
I also got a very low RMSE when using user correlation, and I was wondering if something was wrong with my code.
But it is reassuring to see that I am not the only one.
The movie correlations do give good results. I would have been on the leaderboard, but I am a week too late.
Comment from anonymous:
Correction: I got a very high RMSE when using user correlation.
Comment from gregstoll:
Yeah, it's pretty difficult to make it on the leaderboard now. I'm going to try to combine the user and movie correlations in a more intelligent way, but it's going to take a really long time to run (presumably) and I'm not convinced the user stuff is adding much useful information...