obsession #12358: bsg
Mood: bored
Posted on 2007-05-14 12:53:00
Tags: ngrams
Words: 348

So destroyerj gave us many seasons of Battlestar Galactica as a present, and we sat down kinda late Saturday night to watch the miniseries. (i.e. Disc 1 of Season 1) I had heard from a few people that the miniseries was kinda slow since it had to introduce all the characters and such. To the contrary, I enjoyed it a lot, and we stayed up until 1:30 or so to finish it. Good stuff, although I liked the first "real" episode less than the miniseries. My mom's coming this week (she has a conference in Columbia...what are the odds?) but I guess we'll pick it up next week...

As one obsession waxes, so does another wane. My licensing problem with the Google n-grams led me to try to find my own. I found a corpus based on Wikipedia that I'm currently using, but the parser I wrote to extract words isn't very good, and I'm not sure that it can be improved due to inconsistent data. So my data is not particularly clean. (sometimes two words are strung together, sometimes a single word is broken into two) I wrote a quick little thingy to just do a simple lookup on the popularity of a word, and I'll probably make a web interface for that, but I'm highly doubtful that a cryptogram solver would actually work. (and I've basically given up with trying to extract 2-grams, 3-grams, etc. that would generate English-looking text)
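The "simple lookup on the popularity of a word" could look something like the sketch below (a minimal stand-in — the toy corpus string, the letters-only tokenizer, and the function names are illustrative assumptions, not the actual parser, which had to deal with much messier data):

```python
import re
from collections import Counter

def tokenize(text):
    """Crude tokenizer: lowercase runs of letters only.
    (A real Wikipedia-derived corpus is noisier -- words sometimes
    get merged together or split apart.)"""
    return re.findall(r"[a-z]+", text.lower())

def ngrams(words, n):
    """All contiguous n-word sequences, e.g. n=2 for 2-grams."""
    return list(zip(*(words[i:] for i in range(n))))

# Toy stand-in for the Wikipedia-based corpus.
corpus = "the cat sat on the mat and the cat napped"
words = tokenize(corpus)
counts = Counter(words)
total = sum(counts.values())

def popularity(word):
    """Relative frequency of a word in the corpus (0.0 if unseen)."""
    return counts[word] / total

print(counts["the"])         # 3
print(popularity("cat"))     # 0.2
print(ngrams(words, 2)[:2])  # [('the', 'cat'), ('cat', 'sat')]
```

A web interface would just wrap `popularity()` behind a form; the hard part is the data cleaning, not the lookup.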

So I'm really losing motivation to work on it, and not having an exciting project to work on leaves me in a weird and unstable condition. I like having a drive to do things like this, but sometimes it's kinda irritating. Maybe a little time off from the n-grams will give me some more inspiration...or maybe I'll give up completely and move on to something else. In a very real way it shouldn't matter (it's just something I'm doing for fun, not for anyone in particular), but I still feel bad starting a project, then giving up and moving on to another one a week later.


Comment from destroyerj:

re: the miniseries, I had heard that from various people prior to my viewing it, but I also didn't find it to be the case when I watched it the first time; I found it quite enjoyable (though long, of course). But when I rewatched it, I was suddenly aware that it was really sloooow. I dunno, I think that as a whole it probably is a pretty slow miniseries, but it has enough introductions of things/people/concepts that your mind is paying attention to all that and doesn't notice the slower pace when everything's new to you. I bet to someone who'd seen a couple episodes and went back to see the miniseries for the first time, or maybe even someone who'd seen the original series back in the day, it'd be more noticeable.

Comment from gregstoll:

That's a good point, especially since I have to concentrate to try to remember who's who. (I'm bad at recognizing faces, which can really kill a show with lots of characters)

Low expectations are also good :-)

Comment from djedi:

I think they were going for a more suspenseful setting than would typically work for a TV show. Also, I think we're programmed to expect that an action movie has to be nonstop bangs and flashes, or else we realize that the plot is really weak. Since this one had a plot and character development (a lot of character development, actually), the slowness was OK. On a second viewing, knowing the characters and the plot and that we're just setting up the TV show would definitely make it less interesting.

Comment from wonderjess:

I just watched the miniseries, too! (actually, I just watched the entire original series, then the miniseries -- old series, hokey but surprisingly good). I'm getting the rest of it on netflix soon-ish...wanted to take a break, first.

Comment from taesmar:

BSG is like crack. As I recall, Shawn was the one who hated the miniseries. I liked it.

Comment from ikarpov:

I'm not sure if it has what you are looking for, but I really enjoyed using NLTK-Lite for NLP stuff. It has taggers, chunkers, parsers, the Brown and the Penn corpora, and other nice stuff.

Comment from ikarpov:

Also, parsing is going to be noisy for natural language - that's life. If you want something more reliable but not as deep, just do chunking.

Comment from gregstoll:

Hey, this looks pretty useful! I'll try to take a look at it next week. Thanks!

This backup was done by LJBackup.