First, you could try DataSift or Gnip, who both sell Twitter data, and thus have no API limit. It does have a cost, though, so I'm not sure you can afford it.
Second, maybe you could use the streaming API: you could get part of the data that way, which leaves you more REST API credits for the rest.
If users follow back, you could use Site Streams, although it's quite different to work with than the REST API.
Thirdly, if I read correctly: if Alice and Bob are both your clients, and Fred follows both, you now collect Fred's data twice, right? I would put a cache in between. Riak, a Redis cluster, or even S3 or DynamoDB would work.
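To make the Alice/Bob/Fred point concrete, here is a rough sketch of a read-through cache. Everything in it is made up for illustration: `ProfileCache`, `fake_fetch` and the follower lists are hypothetical, and the in-memory dict is only a stand-in for Riak, Redis, S3 or DynamoDB.

```python
from typing import Callable, Dict, List


class ProfileCache:
    """Read-through cache: each user is fetched once, however many clients share them."""

    def __init__(self, fetch: Callable[[str], dict]) -> None:
        self._fetch = fetch                  # the expensive call (e.g. a REST API request)
        self._store: Dict[str, dict] = {}    # stand-in for a real key-value store
        self.api_calls = 0                   # counts how often we actually hit the API

    def get(self, user_id: str) -> dict:
        if user_id not in self._store:
            self.api_calls += 1
            self._store[user_id] = self._fetch(user_id)
        return self._store[user_id]


def fake_fetch(user_id: str) -> dict:
    # Hypothetical placeholder for the real API call.
    return {"id": user_id}


# Fred follows both Alice and Bob, so he appears in both follower lists.
followers: Dict[str, List[str]] = {
    "alice": ["fred", "gina"],
    "bob": ["fred", "hank"],
}

cache = ProfileCache(fake_fetch)
for client, follower_ids in followers.items():
    for follower_id in follower_ids:
        cache.get(follower_id)

print(cache.api_calls)  # 3 API calls for 4 lookups: Fred is fetched only once
```

A real version would also need some expiry so stale profiles get refreshed, which this sketch leaves out.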
If I can help more, send me an email (it's in my profile).
Lastly, if you have Twitter investors in your userbase, ask for an intro to talk to Twitter. They see the value of your service.
Hey! DataSift and Gnip both seem to supply conversational data, which I currently don't track. My focus is on user data, which neither seems to provide. And yes, they're expensive, wow.
The streaming API idea is a good one to get the most out of all of Twitter's data sources... someone else suggested this as well. Seems like it's worth a try.
Your Alice/Bob/Fred assumption is correct. I had a cache of sorts: a large MySQL table that grew well beyond a couple of gigabytes. It kept crashing all the time, and restoring it took half a day.
The Twitter investors haven't had a chance to see any of the service since they are still waiting for results :) tough one to ask for intros ;)
The first thing I would try is rebuilding the cache. Since you only need to do key-value lookups, there are many (easy) approaches that can scale better than MySQL. You could do JSON files on a filesystem (with some nesting, as you don't want to put 20 million files in one directory). Or Redis: 20 million x 1 KB is 20 GB. You will need more with overhead, but a few machines would work. Or Riak (we went > 1 billion items with Riak).
But even MySQL should be able to handle this: we had over 100 million records in MySQL (on SSDs) before switching to something else.
That seems to be the quickest win to reduce the number of requests, but how much it helps will depend on how much your clients' user sets overlap.
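For the JSON-files-with-nesting option, something like the following could work. It's only a sketch under my own assumptions: the two-level layout, the SHA-1 prefix and the helper names (`record_path`, `save`, `load`) are illustrative, and it assumes user IDs are plain numeric strings that are safe to use as file names.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Optional


def record_path(root: Path, user_id: str) -> Path:
    """Map a user ID to root/ab/cd/<user_id>.json using a hash prefix.

    Two levels of 2 hex characters give 256 * 256 = 65,536 directories,
    so 20 million records come to roughly 300 files per directory.
    """
    digest = hashlib.sha1(user_id.encode("utf-8")).hexdigest()
    return root / digest[:2] / digest[2:4] / f"{user_id}.json"


def save(root: Path, user_id: str, record: dict) -> None:
    path = record_path(root, user_id)
    path.parent.mkdir(parents=True, exist_ok=True)  # create the nested dirs on demand
    path.write_text(json.dumps(record), encoding="utf-8")


def load(root: Path, user_id: str) -> Optional[dict]:
    path = record_path(root, user_id)
    if not path.exists():
        return None  # cache miss: the caller falls back to the API
    return json.loads(path.read_text(encoding="utf-8"))


# Small usage example in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    cache_root = Path(tmp)
    save(cache_root, "12345", {"id": 12345, "name": "fred"})
    print(load(cache_root, "12345"))
    print(load(cache_root, "99999"))  # not stored yet, so this is a miss
```

Hashing the ID rather than using its leading digits spreads the files evenly across directories, whatever the ID distribution looks like.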