Since the third party service conducted rate-limiting based on IP
address (stated in their docs), my solution was to put the code that
hit their service into client-side JavaScript, then send the results
back to my server from each client.
This way, the requests would appear to come from thousands of
different places, since each client would presumably have their own
unique IP address, and none of them would individually be going over
the rate limit.
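A minimal sketch of the approach described above, with assumed names throughout (the API URL, item IDs, and the /collect endpoint are all hypothetical): each visitor's browser fetches one item from the third party API, so the request originates from that visitor's IP rather than the scraping server's, then relays the result back for aggregation.

```javascript
// Hypothetical sketch: runs in each visitor's browser. The third-party
// API URL and the /collect endpoint are assumptions for illustration.
async function scrapeOneItem(itemId) {
  // This request goes out from the *visitor's* IP, not our server's,
  // so the third party's per-IP rate limit sees thousands of distinct clients.
  const resp = await fetch(`https://api.example.com/items/${itemId}`);
  const data = await resp.json();

  // Relay the fetched result back to our own server for aggregation.
  await fetch("/collect", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ itemId, data }),
  });
}
```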
Pretty sure the browser Same-Origin Policy forbids this. Think about it: if this worked, you'd be able to scrape inside corporate firewalls simply by having users visit your website from behind the firewall.
> Since the third party service conducted rate-limiting based on IP
By the way, that's one of my projects. You can use a basic Fibonacci-related algorithm to figure out, in the fewest requests, exactly what the rate limit is. This way, you can scrape at just under the maximum limit. I am still working on this core library though. :|
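The commenter's actual library isn't shown, but one way a Fibonacci-flavored probe could work is a sketch like the following: grow the probed rate along the Fibonacci sequence until a probe gets rejected, then binary-search the bracketed interval. `tryRate` is an assumed callback that returns true if `n` requests in one window all succeed (i.e., no 429 response).

```javascript
// Hedged sketch, not the commenter's actual algorithm. Finds the largest
// rate n for which tryRate(n) succeeds, using Fibonacci growth to bracket
// the limit cheaply, then binary search to pin it down.
function findRateLimit(tryRate) {
  let lo = 0; // highest rate known to succeed
  let prev = 1,
    curr = 1;

  // Phase 1: Fibonacci growth until a probe fails, giving an upper bound.
  while (tryRate(curr)) {
    lo = curr;
    [prev, curr] = [curr, prev + curr];
  }
  let hi = curr; // lowest rate known to fail

  // Phase 2: binary search the interval (lo, hi).
  while (hi - lo > 1) {
    const mid = Math.floor((lo + hi) / 2);
    if (tryRate(mid)) lo = mid;
    else hi = mid;
  }
  return lo;
}
```

Because phase 1 grows geometrically and phase 2 halves the interval each probe, the total number of probes is logarithmic in the limit, which is what "the most minimal number of requests" is getting at.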
That's a great point: for most web services, these requests would be blocked at the browser level by the Same-Origin Policy. Fortunately for me, this site allowed client-side calls by returning an Access-Control-Allow-Origin: * header[1], specifically designed to allow this type of cross-domain access.