I'd build it with two stages. First, a cache that holds user information (essentially replicating profiles): follower count and so on. This would be shared among all users and could go into a dedicated Redis instance (and, why not, also replicated to a MySQL/InnoDB table for convenience).
Then, the "graph" DB (the follower lists), which I'd also put into Redis. With some scripting and Redis magic, you can keep users automatically sorted server-side by their follower count. You'll just need a lot of RAM (get a dedicated server; look at OVH or others — cloud is usually more expensive and less reliable when it comes to RAM).
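The "automatically sorted by follower count" part maps onto a Redis sorted set (ZADD to update a score, ZREVRANGE to read the top entries). Since a live Redis server can't be assumed here, this is a minimal in-memory Python sketch of that idea; the class and key names are illustrative, not a real Redis client.

```python
# In-memory sketch of a Redis sorted set keyed by follower count.
# Redis equivalents are noted per method; "followers_rank" is a
# made-up key name for illustration.

class FollowerRanking:
    def __init__(self):
        self._scores = {}  # user_id -> follower count

    def update(self, user_id, follower_count):
        # Redis: ZADD followers_rank <follower_count> <user_id>
        # (re-adding an existing member just updates its score)
        self._scores[user_id] = follower_count

    def top(self, n):
        # Redis: ZREVRANGE followers_rank 0 n-1 WITHSCORES
        return sorted(self._scores.items(), key=lambda kv: -kv[1])[:n]

ranking = FollowerRanking()
ranking.update("alice", 1_500_000)
ranking.update("bob", 42)
ranking.update("carol", 90_000)
print(ranking.top(2))  # [('alice', 1500000), ('carol', 90000)]
```

The nice property of doing this in Redis rather than client-side is that the ordering is maintained incrementally on every score update, so "top N users by followers" stays an O(log N) read instead of a full sort.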
You can collect profile information before everyone is forced onto API 1.1 (which requires auth) to populate the global DB. Then you'd only have to fetch users' follower IDs (using the 1.1 followers/ids call), which I believe is far more reliable — and progressively, by pooling queries, populate the profile database in batches of 500 or 250 users, using the follower lists with user details.
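The batching step above can be sketched as a simple chunking routine: take the ID list that followers/ids returns and split it into fixed-size groups for a bulk profile lookup. The batch size and the pretend ID list here are assumptions for illustration, not Twitter API specifics.

```python
# Hypothetical sketch of pooled profile fetching: chunk the IDs from
# followers/ids into batches, each batch becoming one bulk-lookup call.

def chunked(ids, batch_size):
    """Yield successive batches of at most batch_size IDs."""
    for i in range(0, len(ids), batch_size):
        yield ids[i:i + batch_size]

# Pretend followers/ids returned 1200 follower IDs.
follower_ids = list(range(1, 1201))
batches = list(chunked(follower_ids, 500))
print([len(b) for b in batches])  # [500, 500, 200]
```

Each batch can then be handed to whatever bulk user-details endpoint you're rate-limited on, and the profile cache fills in progressively rather than in one giant burst.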
This means data can be queried dynamically without killing the server (or servers — there should be more than one), allowing for "partial results" (1M followers -> info about the first 10,000 right after signing up, for example).
You do have a sensible approach here. Originally, without using Redis, I wanted to use MySQL to cache all user data and insert/refresh details in it over time. The table quickly grew to 20M records (with metadata taking at least 1K per record, if not more), and the database grew to multiple gigabytes. Twitter has 140M accounts or more now, so I'd need headroom here, although I'd likely never touch a large fraction of Twitter's users.
Also, the system started making sense after a while, once I had user IDs already cached (you are correct — I get the IDs through followers/ids, which is a much more well-thought-out call in terms of limits).
But then MySQL constantly crashed, and repairing/backing up a multi-gigabyte table exceeded my technical abilities, so I gave up. I split everything into per-user SQLite databases that I back up to S3. I lose the ability to access a shared cache of users, though, since I can't query other users' SQLite databases in a sane way to see if they have metadata for a given user ID.
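For what it's worth, SQLite can query across separate database files on one connection via ATTACH DATABASE, which is one possible way to peek into another user's cache file. The file layout and `meta` schema below are made up for illustration — this is a sketch, not the author's actual setup.

```python
# Sketch: check whether another user's per-user SQLite file already
# holds metadata for a given user ID, using ATTACH DATABASE.
# Paths and schema are hypothetical.
import os
import sqlite3
import tempfile

tmp = tempfile.mkdtemp()
alice_db = os.path.join(tmp, "alice.db")
bob_db = os.path.join(tmp, "bob.db")

# Create two per-user databases with the same (made-up) schema.
for path, rows in [(alice_db, [(1, "userdata-1")]),
                   (bob_db, [(2, "userdata-2")])]:
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE meta (user_id INTEGER PRIMARY KEY, data TEXT)")
    con.executemany("INSERT INTO meta VALUES (?, ?)", rows)
    con.commit()
    con.close()

# From alice's connection, attach bob's file and query it directly.
con = sqlite3.connect(alice_db)
con.execute("ATTACH DATABASE ? AS bob", (bob_db,))
row = con.execute("SELECT data FROM bob.meta WHERE user_id = ?",
                  (2,)).fetchone()
print(row)  # ('userdata-2',)
con.close()
```

This doesn't scale to scanning thousands of files at once (SQLite caps the number of attached databases per connection), but it does make targeted cross-file lookups sane.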
The major problem is that I believe Twitter will eventually shut me down if I duplicate/replicate their user database (and I constantly need to refresh it, since user data eventually goes stale).
Well, the API is intended to be used with followers/ids and the other calls. Twitter might come after you, but that would mean disregarding their own API… If they pull the plug, it will be because of your features, not because of how you use the API :(
You could easily apply sharding to that DB table and then scale horizontally as much as you like (just plan big enough so that you don't have to re-shard too soon…).