Maybe I'll detail my adventure here.
So I started out with regex. Each span has a unique ID, so regex -- why not? Slurp the whole file, match on the IDs, barf out the content.
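For the curious, a minimal sketch of that kind of regex approach (not the actual script). The sample markup, the IDs, and the span_content helper are all made up for illustration, and a heredoc stands in for the slurped file.

```perl
use strict;
use warnings;

# Hypothetical sample input; stands in for the slurped file.
my $html = <<'HTML';
<div>
  <span id="price">42</span>
  <span id="name">Widget</span>
</div>
HTML

# Return the text between <span id="..."> and the next </span>.
# Assumes double-quoted id attributes and no nested spans, which is
# exactly the kind of assumption real-world HTML tends to break.
sub span_content {
    my ($doc, $id) = @_;
    return $doc =~ m{<span\b[^>]*\bid="\Q$id\E"[^>]*>(.*?)</span>}si ? $1 : undef;
}

print span_content($html, 'name'), "\n";
```

It works on tidy input like this, but a single-quoted attribute or a span inside a span is enough to throw it off.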
Not my most brilliant moment, I've got to admit. The HTML is often formatted strangely, and this method is pretty inefficient for large files. I figured someone had already solved this (I knew it, in fact; I just thought regex would be faster and simpler), so on to method #2.
The HTML::Parser class from CPAN. Write hooks for start tags, text, and end tags, [i]then[/i] match on ID. Didn't seem elegant or efficient.
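A minimal sketch of what the hook approach looks like with HTML::Parser's version 3 handler API. The sample markup, the ID, and the variable names are assumptions for illustration, not the real files.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical sample input and target ID.
my $html = '<div><span id="price">42</span> <span id="name">Widget</span></div>';
my $want = 'name';

my $inside  = 0;    # true while we are inside the span we want
my $content = '';   # text collected from that span

my $parser = HTML::Parser->new(
    api_version => 3,
    # Start tag: switch collection on when the span's id matches.
    start_h => [ sub {
        my ($tag, $attr) = @_;
        $inside = 1 if $tag eq 'span' && ($attr->{id} // '') eq $want;
    }, 'tagname, attr' ],
    # Text: append decoded text while collection is on.
    text_h => [ sub { $content .= $_[0] if $inside }, 'dtext' ],
    # End tag: switch collection off at the next </span>
    # (so a nested span would cut the content short).
    end_h => [ sub { $inside = 0 if $_[0] eq 'span' }, 'tagname' ],
);

$parser->parse($html);
$parser->eof;

print "$content\n";
```

Three callbacks and a state flag just to pull out one span, which is where the "not elegant" feeling comes from.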
I bet there's some programmer looking at me now, laughing, because I'm just stumbling around in the dark.
What I'd ideally want are CSS-like selectors that operate on the HTML ... in Perl. So I could say ("#myUniqueID").content to get the content. jQuery is awesome in this respect: I can just ask for what I want.
I know there's a way to do this, and someone's already done it. I've just got to keep hunting around to find it. Maybe Perl isn't the right language for this.
In other news, I think we should load up the new BC2 maps on Server 2. I'd rather play them now than the old maps.
parsing HTML with perl