
FYI, BeautifulSoup is actually dropping its own parser entirely in the next version, in favour of being a wrapper around lxml/html5lib/html.parser: http://www.crummy.com/2012/02/02/0


Ah, interesting, thanks. I confess I haven't followed the latest round of developments in BeautifulSoup - I abandoned it last year when I switched to Python 3, since the previous effort to port BS to Python 3 had stalled/failed at that point (looks like they're back on track on that one now, too). Then I found I didn't really miss BS while using lxml.html, so I'm not sure I see the point of layering BS on top of lxml.


I think the idea behind BS is that it saves some typing and thinking for common scraping tasks. In that light, it fits perfectly on top of some other (faster) parsing library.
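To make the "convenience layer on top of a parser" idea concrete, here is a rough sketch using only the standard library's html.parser. It is not how BS is implemented; LinkCollector and find_links are hypothetical names made up for illustration, and the point is only that a thin wrapper can turn subclass-and-feed boilerplate into a single call.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: gathers href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def find_links(markup):
    """Hypothetical convenience wrapper: one call instead of subclass-and-feed."""
    collector = LinkCollector()
    collector.feed(markup)
    collector.close()
    return collector.links
```

The caller only ever touches find_links(markup); the underlying parser could be swapped for a faster one without changing that call, which is roughly the arrangement described above.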


Yup, but everything in the Quick Start section of http://www.crummy.com/software/BeautifulSoup/bs4/doc/ has a fairly close equivalent in lxml.html.

That being said, I think it's great that he's still maintaining BS, and porting it to Python 3 in particular - it keeps existing code working and will allow more people to make the switch. And a responsible and committed maintainer is good advertising for a package in itself, of course.



