So, as I mentioned in the other reply, I'm not really trying to start a business here; I just want to understand more about the processes and tools underpinning these vast, scalable web apps that surround us. But this advice sounds pretty reasonable relative to how one would actually go about creating a small tech startup and iterating on it enough to get into YCombinator (for example).
Given it doesn't sound like you're gonna share the info about scalable infrastructure, could you possibly provide some guides or extra reading for the patterns and practices one should be going with at the small and feisty stage? I might as well take some notes down about this side of things if you have materials that speak to it.
The advice given by the original commenter is how you should start any application, whether it's one that's going to be used internally at a company, the main product at a startup, or a single app among a suite of existing applications at an established company.
Scaling is a reward for building something useful. Building the useful thing is harder.
Some generalities, though: try to organize your application so that data that is often needed together is stored close together, geographically, and try to shard (i.e. separate out) your data based on some identifier that can be sliced into many small pieces.
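To make the sharding idea concrete, here's a minimal sketch in Python. The shard count and function name are just illustrative, and real systems layer consistent hashing and rebalancing on top of this, but the core routing decision looks like:

```python
import hashlib

NUM_SHARDS = 8  # hypothetical shard count

def shard_for(user_id: str) -> int:
    """Map an identifier to a shard by hashing it.

    Hashing spreads keys evenly across shards, while all of one
    user's data consistently lands on the same shard, keeping
    data that's needed together stored together.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# The same identifier always routes to the same shard:
assert shard_for("user-42") == shard_for("user-42")
```

The reason to slice by an identifier with many small pieces (a user ID rather than, say, a country code) is that lots of small buckets can be redistributed evenly; a handful of big buckets can't.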
Given I have zero intention of making it any of those things, and this question was aimed at specifically learning more about the technical underpinnings of the rewarding part, I'm not really sure it's the right advice for what I'm doing. You will note that I indeed recognize it as being good advice in general though.
The generalizable advice about data co-location/data sharding is definitely something I will keep in mind (if this weird learner project really involves data in such quantities) however, thanks!
I can give you examples of scalability benchmarks from gaming and extrapolate from there why it's such a moving target to pin down what makes systems fast and scalable.
Now, the general measure of technical proficiency in game engines is how detailed your scenes are and how fast they render. If you are rendering empty space, it's quite easy to blow up the scale by making the numbers big, and this is how early space games like the original Elite operate: a simple scene defined by a few numbers and some procedural generation can be made into a whole galaxy by repeating that scene with a different seed number. It takes advantage of the saying "it's easy to get a wrong answer infinitely fast" by defining wrong answers to be right.
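The seed trick is easy to demonstrate. Here's a toy sketch in Python (the "scene" fields and the seed formula are made up for illustration, not how Elite actually did it): the whole galaxy is never stored, it's just a deterministic function of coordinates.

```python
import random

def star_system(seed: int) -> dict:
    """Derive a tiny 'scene' deterministically from one seed number.

    The same seed always produces the same system, so nothing
    needs to be stored: the galaxy is regenerated on demand.
    """
    rng = random.Random(seed)
    return {
        "star_class": rng.choice("OBAFGKM"),
        "planets": rng.randint(0, 12),
    }

def galaxy(width: int, height: int):
    # One scene, repeated with a different seed per grid cell.
    for x in range(width):
        for y in range(height):
            yield (x, y), star_system(seed=x * 100003 + y)
```

A 256x256 "galaxy" here costs nothing to store, which is exactly why raw scale numbers say so little about how hard the system is actually working.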
So we have to look at what's actually being processed to simulate and render the scene to understand scaling. And right away that should trigger something in your head about applications: if they have fewer features, their processing is simpler, so they scale more readily. Scaling problems are produced by feature complexity creating bottlenecks that can't be optimized by rote. And in most cases, we would rather have our apps produce right answers slowly than wrong ones quickly, hence product design is a critical part of optimization: if we know our design will never need a certain feature, that's a place where we can optimize.
From there, you can dig into the nuts and bolts of defining what kind of performance envelope you expect to have: in games you might use a target frame rate, polygon count, texture memory, and the number of live AIs and entities. But as you build out the game, the numbers start moving around because you're still adding features: when you add detailed animations with a lot of bones, you spend more of your CPU budget deforming the model. Every shader effect has a GPU time cost. When you add audio and audio processing, you have to allocate memory and CPU time to playback and effects. If you want to continuously stream in a scene (as is done in open-world games), you have to consider the rate and latency at which you can load it off persistent media, which leads to various different strategies. So you don't know at the beginning quite what you need. Instead you set general targets for what you'll try to hit, stand up a test scene with similar numbers, and then iterate on them later as the project gets more developed, with more features, fleshed-out scenes, and final assets.
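A performance envelope like that often gets written down as a per-frame budget that the team checks against as features land. A minimal sketch, with entirely hypothetical numbers for a 60 fps target (16.6 ms per frame):

```python
# Hypothetical per-frame CPU/GPU budget for a 60 fps target.
FRAME_BUDGET_MS = {
    "rendering": 8.0,
    "animation": 2.5,        # skeletal deforms grow with bone counts
    "audio": 1.0,            # playback mixing and effects
    "ai_and_entities": 3.0,
    "streaming": 1.0,        # pulling scene data off persistent media
    "slack": 1.1,            # headroom for spikes
}

def check_budget(target_ms: float = 16.6) -> float:
    """Sum the line items and fail loudly if the frame is over budget."""
    total = sum(FRAME_BUDGET_MS.values())
    assert total <= target_ms, f"over budget: {total:.1f} ms > {target_ms} ms"
    return total
```

The point isn't the specific numbers; it's that every new feature has to claim its time from some line item, which is what forces the iteration the paragraph above describes.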
On early cartridge platforms, streaming was generally done off ROM with bank switching, which made it nearly instant: NES Zelda II does it up to hundreds of times on the overworld screen, because it was a rough port from the Famicom Disk System version, which had more working RAM. This causes slowdowns in some parts of the map.
Games on CDs and DVDs had huge capacity but limited bandwidth and high latency: the strategy to get the most out of them involved physically locating the data in places where the drive head could seek quickly, then linearizing the data so the drive didn't have to stop and start, which meant some data had multiple copies for different scenes.
Modern gaming on SSDs changes the paradigm again, back towards lower latency accesses bolstered by hardware decompression: that allows the games on new consoles to eliminate loading screens.
Now, in a web app you can encounter a similar kind of thing with your database accesses and frontends. Some applications need to write very frequently, others need to read a lot. The distribution of reads and writes can vary (e.g. hosting one very popular video versus a sprawling e-commerce platform). These things determine where scaling needs to take place. But if you have no real users, you have an "empty space" scene where the bottlenecks aren't present because there's nothing to do; you can guess, but even the best guesses tend to be wrong when a site starts getting serious traction. Will you be able to batch things up like a DVD access? Will you need global state, like a social network, or is the state limited to the user session? You don't really know what it'll look like until the features go in and you can start profiling against real-world samples.
It's not that anyone is trying to hide the secrets - it's just that scaling is a speciality you only end up possessing through the direct experience of trying to get a little more out of the architecture you have; the specific thing you learned may not apply if your next project has a different performance profile and different hardware.
In the meantime, the next best thing is to take large existing datasets, construct synthetic benchmarks out of them, and have fun optimizing. Stuff like "how fast can I load this enormous CSV, do trivial processing, then store the result?"
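That benchmark fits in a page of Python. A sketch (the data and the "processing" step are synthetic stand-ins; swap in a real dataset and a transform you care about):

```python
import csv
import random
import time

def make_csv(path: str, rows: int = 100_000) -> None:
    """Generate a synthetic dataset to benchmark against."""
    rng = random.Random(0)
    with open(path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["id", "value"])
        for i in range(rows):
            w.writerow([i, rng.random()])

def benchmark(src: str, dst: str) -> float:
    """Load the CSV, do trivial processing, store the result; return seconds."""
    start = time.perf_counter()
    with open(src, newline="") as f_in, open(dst, "w", newline="") as f_out:
        reader = csv.reader(f_in)
        writer = csv.writer(f_out)
        writer.writerow(next(reader))                    # copy the header
        for row_id, value in reader:
            writer.writerow([row_id, float(value) * 2])  # "trivial processing"
    return time.perf_counter() - start
```

The fun starts when you profile it and try variants: bigger read buffers, batching writes, a streaming parser versus loading everything into memory — the same kinds of trade-offs as the disc-access strategies above, scaled down to a laptop.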
Alright, you win: this answer is fantastic. This is a far, far better way to think about what limits scalability than simple things like pages served per millisecond. I never did expect to find ready answers that will guide me to making the universally scalable app, but now I see the problem can be reduced even further into niche sorts of scaling which, while solvable with hardware tricks, do their very best to escape generalization.