Yeah, I was about to suggest Datasift as well. IIRC I had a chat to some of them...

mittermayr · on Nov 5, 2012

Yeah, their data records are sometimes faulty, and I need to scale down to 1 record per request (even when it's not a timeout) to find the faulty one among a hundred accounts. So that sucks.
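For what it's worth, here is a rough sketch of one way to narrow down a bad record without going all the way to 1 record per request for the whole batch. It is not what I actually run: it swaps in a split-in-half search, and it assumes a failing batch can be told apart from a good one. `fetch`, `BatchError`, and `fake_fetch` are made-up names for illustration, not part of any real API.

```python
from typing import Callable, List, Sequence


class BatchError(Exception):
    """Hypothetical error: the batch contained at least one faulty record."""


def find_faulty(ids: Sequence[int],
                fetch: Callable[[Sequence[int]], None]) -> List[int]:
    """Return the ids that make `fetch` fail, halving any batch that fails."""
    if not ids:
        return []
    try:
        fetch(ids)
        return []  # the whole batch went through, nothing faulty here
    except BatchError:
        if len(ids) == 1:
            return [ids[0]]  # narrowed down to a single faulty record
        mid = len(ids) // 2
        # Check both halves, since there may be more than one bad record.
        return find_faulty(ids[:mid], fetch) + find_faulty(ids[mid:], fetch)


# Stand-in for the real API call (assumption): fails if the batch has a bad id.
BAD_IDS = {42}
calls = 0


def fake_fetch(batch: Sequence[int]) -> None:
    global calls
    calls += 1
    if any(i in BAD_IDS for i in batch):
        raise BatchError("faulty record in batch")


faulty = find_faulty(list(range(100)), fake_fetch)
print(faulty, "found in", calls, "requests")
```

With a single bad record in a batch of a hundred, this needs far fewer requests than fetching every account individually. If most batches contain several bad records, the advantage shrinks.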

And regarding replicating, that's what I did originally. I had up to almost 20M of Twitter's 140M records cached, but that probably wasn't cool with them in the long run, and I was unable to maintain a database with one table holding multiple gigabytes of data.