Yeah, I was about to suggest Datasift as well. IIRC I had a chat to some of them at the last SiliconMilkRoundabout & they seemed like they knew their stuff.
It does occur to me that the OP is effectively trying to replicate large chunks of the Twitter datastore & that's going to be very difficult to manage! It's not like Twitter themselves were particularly reliable to start with, after all.
Yeah, their data records are sometimes faulty, and I need to scale down (even when it's not a timeout) to 1 record per request to find the faulty one among a hundred accounts. So that sucks.
And regarding replicating, that's what I did originally. At the peak I had almost 20M of Twitter's 140M records cached - but that probably wasn't cool with them in the long run, and I was unable to maintain a database with a single table holding multiple gigabytes of data.
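On hunting down the faulty record in a batch of 100: instead of dropping straight to 1 record per request, one option might be to halve the failing batch repeatedly. This is only a rough sketch of that idea, not what I actually ran. `fetch` is a hypothetical stand-in for whatever batch lookup call you use, and the sketch assumes a failure is caused by specific records (so it repeats on retry) rather than by a timeout or rate limit.

```python
from typing import Callable, List, Sequence


def find_faulty(ids: Sequence[int],
                fetch: Callable[[Sequence[int]], object]) -> List[int]:
    """Return the ids that make fetch() fail, found by halving failing batches.

    fetch is a hypothetical batch lookup: it takes a sequence of ids and
    raises an exception if the batch contains a faulty record.
    """
    if not ids:
        return []
    try:
        fetch(ids)
        return []  # the whole batch went through, nothing faulty in here
    except Exception:
        if len(ids) == 1:
            return list(ids)  # narrowed down to a single faulty id
        mid = len(ids) // 2
        # Assumption: the failure is deterministic, so each half can be
        # retried on its own and the bad half will fail again.
        return find_faulty(ids[:mid], fetch) + find_faulty(ids[mid:], fetch)
```

With a single bad record in a batch of 100 this should need far fewer requests than 100 single-record lookups, which matters if you're rate limited. If most of the batch is bad, it ends up costing more requests than going one by one.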