> It could just be me, but I prefer more control, and using the requests library gave me that
I had the exact same opinion until a month back when I started using scrapy + scrapyd[1] for a serious scraping task. Yes, it's a long running spider which is why I decided to use Scrapy in the first place. But so far I have been really impressed with it in general and might consider it even for one-off crawling in future. IMO, pipelines etc. are worth learning. I would also recommend using the httpcache middleware during development which makes testing and debugging easy without having to download content again and again.
On a separate note, for one-off scraping, it's also worth checking whether a python script is required at all. For example, if you just need to download all images, wget with recursive download option should work pretty well in most cases. For eg. a while back I wanted to download all game of life configs (plain text files with .cells extension) from a certain site[2]. Initially I wrote a Python script, but when I learnt about wget's -r option, I replaced it with the following one-liner:
I had the exact same opinion until a month back when I started using scrapy + scrapyd[1] for a serious scraping task. Yes, it's a long running spider which is why I decided to use Scrapy in the first place. But so far I have been really impressed with it in general and might consider it even for one-off crawling in future. IMO, pipelines etc. are worth learning. I would also recommend using the httpcache middleware during development which makes testing and debugging easy without having to download content again and again.
On a separate note, for one-off scraping, it's also worth checking whether a python script is required at all. For example, if you just need to download all images, wget with recursive download option should work pretty well in most cases. For eg. a while back I wanted to download all game of life configs (plain text files with .cells extension) from a certain site[2]. Initially I wrote a Python script, but when I learnt about wget's -r option, I replaced it with the following one-liner:
$ wget -P ./cells -nd -r -l 2 -A cells http://www.bitstorm.org/gameoflife/lexicon/
[1]: http://scrapyd.readthedocs.org/en/latest/
[2]: http://www.bitstorm.org/gameoflife/lexicon/