Wow, the state of the art for parsing HTML with the Python standard library is... pretty far back, huh


@xor I'd just use BeautifulSoup4 with the lxml backend

@divergentdave as a challenge I'm trying to stick to the standard library and hoo boy it is way less good

(I got it to work with regex but like...)

@xor (extremely xkcd voice) the HTML5 spec, with its standardization of parsing, postdates Python 3 by six years!

@xor well-formed HTML or the kind of HTML you find in the wild wild web?

@brainwane this was about exceedingly well-formed HTML! The kind of HTML that BeautifulSoup cuts through like a hot knife through butter

@brainwane (in fact, it is well-formed enough that my solution was regular expressions. don't tell anyone)
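
For comparison, here is a minimal sketch of the standard-library route without regex, using `html.parser.HTMLParser`. The thread doesn't say what was being extracted, so this assumes the goal was something like pulling link targets out of well-formed markup; `LinkCollector`, `extract_links`, and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this;
        # attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    """Feed a whole document to the parser and return the collected hrefs."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Made-up sample markup, not taken from the thread.
sample = '<p>See <a href="https://example.com/a">one</a> and <a href="/b">two</a>.</p>'
print(extract_links(sample))
```

The parser is event-based: it calls back on each start tag rather than building a tree, so anything like "the text inside the third table cell" means tracking state by hand. That is a lot of why it feels so far behind BeautifulSoup.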
