
As pioneers of scraping (NetProphet, the first interactive stock charting app with push data), we initially scraped every quote in our database from other sites.

The fundamental problem is that web pages can change a lot. Our scraper scripts failed constantly, either because a page changed for some innocuous reason or because the site noticed the scraping and blocked us.

We resorted to keeping a list of scrape targets and constantly updating our scrape scripts to adapt to the 'market'. We also pinged each target to find the least congested one.
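
For anyone curious what that looks like in practice, here is a rough Python sketch of the idea, not what we actually ran. It assumes TCP connect time is an acceptable stand-in for a "ping", and every name in it (tcp_latency, rank_targets, scrape_first_working, the scrape callback) is made up for illustration.

```python
import socket
import time
from typing import Callable, Iterable, Optional


def tcp_latency(host: str, port: int = 443, timeout: float = 2.0) -> float:
    """Seconds taken to open a TCP connection (assumed proxy for congestion)."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return time.monotonic() - start


def rank_targets(
    hosts: Iterable[str],
    probe: Callable[[str], float] = tcp_latency,
) -> list[str]:
    """Order hosts from fastest to slowest, dropping any whose probe fails."""
    timed = []
    for host in hosts:
        try:
            timed.append((probe(host), host))
        except OSError:
            continue  # unreachable, timed out, or blocking us: skip it
    return [host for _, host in sorted(timed)]


def scrape_first_working(
    hosts: Iterable[str],
    probe: Callable[[str], float],
    scrape: Callable[[str], Optional[str]],
) -> Optional[tuple[str, str]]:
    """Try hosts in latency order; return (host, result) from the first success.

    `scrape` is a hypothetical per-site script that returns None when the
    page layout no longer matches what it expects.
    """
    for host in rank_targets(hosts, probe):
        try:
            result = scrape(host)
        except Exception:
            continue  # script broke on this site: move on to the next target
        if result is not None:
            return host, result
    return None
```

Passing the probe and the scrape script in as callables means a broken script for one site only takes that site out of rotation, and the rest of the list keeps working while you fix it.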

Eventually we got our own stock feed (the guy who did that is a research scientist at Adobe now) and stopped scraping altogether. But it was a wild ride.



We still need to scrape many (several hundred) clients' websites because they are unable to give us adequate product feeds, or any feeds at all, for their stores. But hey, it gives us a small edge because we try harder than the competition.



