The heavy usage of third-party tools does sound about right with respect to what I've heard gets used to vastly expand apps once they have a minimum viable product people actually want to use, particularly the use of CDNs to reduce latency. However, I haven't heard much detail about SSR/ISR vs. SSG, so if you have any reading that elaborates on it I'd love to take a look.
Additionally, taking this scaled t-shirt app approach as a template, can you share any insight into how a more complicated platform (e.g. YouTube, Amazon) would further leverage such in-house/third-party services to most efficiently use its resources (i.e. at the fully mature platform stage)?
Is it possible you could share more specifics about these simulation figures? While I'm still very much pursuing technical/organizational info about what underpins the big scalable services (or the meta-scalable ones like Firebase, which itself helps other devs make scalable apps), I'd be very interested to know how stark the difference in cost/performance is between this managed-CRUD/Firebase-heavy solution and other offerings, or even just as a baseline for what's out there in the space of easier-to-architect scaling options.
I've heard this book (assuming it's https://dataintensive.net/) recommended before, but some have mentioned it's overly theoretical and doesn't offer much in the way of helping you build working examples of the material (although I suppose the complexity of the material does make toy examples a bit tricky to come up with). Do you know of anything that does have such offerings, or possibly if I'm thinking of a different book?
I'd actually been thinking about giving Go a try after seeing some very clever use of concurrently operating goroutines. Do you know of any particularly interesting server examples you could share?
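For context, the kind of pattern that caught my eye was the classic worker pool built from goroutines and channels. A minimal sketch (all names here are my own, just for illustration):

```go
package main

import (
	"fmt"
	"sync"
)

// sumOfSquares fans jobs out to nWorkers goroutines and collects the
// results. Result order isn't guaranteed, so we sum them to get a
// deterministic answer.
func sumOfSquares(nums []int, nWorkers int) int {
	jobs := make(chan int)
	results := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < nWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for n := range jobs {
				results <- n * n
			}
		}()
	}

	// Close results once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	// Feed the jobs channel, then close it so workers terminate.
	go func() {
		for _, n := range nums {
			jobs <- n
		}
		close(jobs)
	}()

	total := 0
	for r := range results {
		total += r
	}
	return total
}

func main() {
	fmt.Println(sumOfSquares([]int{1, 2, 3, 4}, 3)) // 1+4+9+16 = 30
}
```

The same shape shows up in real servers for bounding concurrency: the channel acts as the work queue and the number of workers caps how much runs at once.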
As for the homework...
Naive answer: constantly update every t-shirt app instance with a record of the current stock as every transaction completes, then make sure the whole system agrees before allowing additional transactions. Sort of like a really inefficient internal distributed ledger.
Horizontal/vertical-scale-agnostic answer: split stock tracking into two separate services. One is focused on managing transactions and maintains the number of t-shirts (presumably a hash table mapping shirt types to quantities); the other is updated every time that inventory table changes. The first service can probably be further decomposed into constituent services, but for now we'll keep it simpler as a single big one. At page-load time the app instance sends a GET (or equivalent) request to the secondary service for the quantities of all the shirts on a given page (matched by item ID/hash), which is then cached in something like localStorage, with limited-supply or out-of-stock items denoted as such by the browser/app on render. On transaction completion you send an update request to the primary stock service, which then updates the table and passes the updated table to service two. This lets customers add products to their cart with a much lower chance of finding out at checkout that some are out of stock, while (hopefully) keeping the stock-tracking footprint lower than something like a constantly-polling or monolithic tracking system.
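To make the core of that primary service concrete, here's a toy sketch of the mutex-guarded inventory table I have in mind (the names and shape are purely illustrative; the real services would sit behind HTTP endpoints):

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Inventory is a stand-in for the primary stock service: a map of
// item ID to quantity, guarded by a mutex.
type Inventory struct {
	mu    sync.Mutex
	stock map[string]int
}

var ErrOutOfStock = errors.New("out of stock")

// Purchase is the write path: it checks and decrements stock atomically,
// so two concurrent checkouts can never both take the last shirt.
func (inv *Inventory) Purchase(id string, qty int) error {
	inv.mu.Lock()
	defer inv.mu.Unlock()
	if inv.stock[id] < qty {
		return ErrOutOfStock
	}
	inv.stock[id] -= qty
	return nil
}

// Snapshot is the read path the secondary service would serve: a copy of
// current quantities the frontend can cache (e.g. in localStorage).
func (inv *Inventory) Snapshot() map[string]int {
	inv.mu.Lock()
	defer inv.mu.Unlock()
	out := make(map[string]int, len(inv.stock))
	for k, v := range inv.stock {
		out[k] = v
	}
	return out
}

func main() {
	inv := &Inventory{stock: map[string]int{"tee-red": 2}}
	fmt.Println(inv.Purchase("tee-red", 1)) // <nil>
	fmt.Println(inv.Purchase("tee-red", 2)) // out of stock
	fmt.Println(inv.Snapshot()["tee-red"])  // 1
}
```

The key point is that only the write path needs strict consistency; the read snapshot can be stale, which is what keeps the footprint low.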
Given the bottlenecks I can already see with the checkout transaction part, I'm actually extra curious how to improve it, so I hope you'll share what the exemplar solution would be!
This sounds right up my alley, honestly. Could you explain a bit more about the affordances you set up for the decoupled container component to let it grow from a few containers to an orchestrated K8s cluster? Just some insight into how and when one decides to grow from a container or two to full-on clusters would be really good to know.
As for sharing, while it's very generous of you to offer access and I'm certainly interested in using it once it gets a public launch, I'm much more in pursuit of the thinking behind system design choices and better developing the intuition that makes those choices. Honestly the framework you're putting together sounds like it could help a lot of people with similar problems to me though :D
Sure: because it's a monolith, things are simpler, since there's just the one backend container to scale. So it's a matter of making more instances of it available (via the number of pod replicas in your k8s cluster) to increase availability.
You can start using k8s right away with just 3 replicas. As you scale, simply up the number of replicas (and nodes, as you need them) as you go.
Your real bottleneck then becomes the database (along with any blocking third-party APIs you may be using), which I would not host in k8s but via a managed service such as AWS RDS. This bottleneck will make itself apparent later on, depending on your application and the scale you reach. But you should definitely have the resources to cross that bridge if and when you reach it, because by then you should be dealing with a large number of customers.
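For concreteness, the replica count I'm describing is just the `replicas` field on a Deployment. A minimal, hypothetical manifest (names and image are placeholders) might look like:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: monolith-backend        # hypothetical name
spec:
  replicas: 3                   # the knob you turn as traffic grows
  selector:
    matchLabels:
      app: monolith-backend
  template:
    metadata:
      labels:
        app: monolith-backend
    spec:
      containers:
        - name: backend
          image: example/monolith:latest   # placeholder image
          ports:
            - containerPort: 8080
```

Bumping `replicas` (or running `kubectl scale deployment monolith-backend --replicas=5`) is the whole scaling story until the database becomes the bottleneck.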
Ah gotcha, yeah: at the small(?) scale we're talking about, monolithic applications, being a bit simpler to organize and run, do still make for a compelling solution. That's a great tip about handling ballooning storage via managed cloud offerings (in the weird case I make something that really works) that I hadn't considered. However, it's starting to feel like these scaling questions/solutions are a lot more akin to Factorio bottleneck-chasing than I would like, haha.
That could work for a smallish CRUD application, I think (just using the function-as-a-service approach to really squeeze efficiency out of the compute time), but yeah, the specific numbers aren't quite as important as learning how to acquire a scalable fraction of the power behind the massively distributed platforms that now dominate the world. This seems like one such approach that some devs have very definitely made working services from, based on a quick search.
Technical debt is a made up problem and cannot hurt you :,(
All of this is entirely fair and worth considering, given that pretty much all mature frameworks tend to have some extension or build-out option to make them "scalable," as far as I can tell. Now to choose one that's not Django, which I've heard is a bit of a nightmare for this sort of building out (but maybe more because of the problems you mentioned than anything wrong with the framework itself?)
I think any framework that has been deployed at scale is going to have people saying it's a nightmare. But very few people have deployed comparable applications in different frameworks at a comparable scale. With only one data point you can't really draw any conclusions.
It's a different matter if you can argue from a specific technical feature of a framework that makes it unsuitable. I don't know Django, so I don't know if anything like that applies to it. For now I think you should focus on finding something, anything, that's both enjoyable to work with, and lets you focus more on developing your business than worrying about architecture or menial implementation details.
So a lot of the scalability complaints arise from the fact that it's written in Python, but as far as I can tell they come largely from newer developers trying to grow some web app they made rather than people presenting at PyCon about why it's awful. Fortunately I am not trying to bootstrap a business from this thread, but I agree that it's far better to spend a few weeks getting my hands dirty with a framework or language I'm familiar with than just reading about all the cool things I could do in some other language or framework that I still need to learn.
Thank you for the fleshed-out answer! These are the sorts of considerations and solutions I was looking for. The first scaling step is probably the most informative of all, though; is it possible you could point me toward additional reading on how developers have handled that first scaling step and the decision-making behind what gets prioritized for the limited cache space (naive example: localStorage in JS)? Even if you can't, I really do appreciate what you've shared already!
Ah, that makes a bit more sense. Could you possibly point me in a more specific direction, e.g. a current system design guide aimed at developers just trying to get their feet wet with things more complex than an SPA? Yeah, that issue is one I've noticed: you're expected to sort of pick this knowledge up on the job, but to get the job where you become acquainted with it, you'll likely need at least some relevant experience first.
I've heard it recommended before, but some have mentioned it's overly theoretical and doesn't offer much in the way of helping you build working examples of the material (although I suppose the complexity of the material does make toy examples a bit tricky to come up with). Do you know of anything that does have such offerings?
It's not exactly what you're asking for, but if you want to grok distributed systems and managing workloads, then learning some Erlang/Elixir (the OTP runtime) is what really helped me, as you can "code along" with your book of choice while they handle real-world situations like node failure and backpressure management.
Other topics that come to mind are books about building out microservice architectures. There are certainly plenty of war stories out there, and microservices seem to tend towards re-implementing OTP runtime primitives in arbitrary languages as design patterns, so you get even more of a feel for what's going on at a lower level of abstraction.
So, as I mentioned in the other reply, I'm not really trying to start a business here; I just want to understand more about the processes and tools underpinning these vast scalable web apps that surround us. But this advice sounds pretty reasonable for how one would actually go about creating a small tech startup and iterating on it enough to get into Y Combinator (for example).
Given it doesn't sound like you're gonna share the info about scalable infrastructure, could you possibly provide some guides or extra reading for the patterns and practices one should be going with at the small and feisty stage? I might as well take some notes on this side of things if you have materials that speak to it.
The advice given by the original commenter is how you should start any application, whether it's one that's going to be used internally at a company, the main product at a startup, or a single app among a suite of existing applications at an established company.
Scaling is a reward for building something useful. Building the useful thing is harder.
Some generalities, though: try to organize your application so that data that is often needed together is stored close together, geographically, and try to shard (i.e. separate out) your data based on some identifier that can be sliced into many small pieces.
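As a sketch of what I mean by sharding on an identifier (names are purely illustrative), hash the ID and take it modulo the shard count, so the same customer always lands on the same shard:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps an identifier (say, a customer ID) onto one of n shards
// by hashing it. The same ID always lands on the same shard, so all of
// that customer's data stays together.
func shardFor(id string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(id))
	return h.Sum32() % n
}

func main() {
	for _, id := range []string{"customer-1", "customer-2", "customer-3"} {
		fmt.Printf("%s -> shard %d\n", id, shardFor(id, 8))
	}
}
```

Real systems layer more on top (consistent hashing so adding a shard doesn't reshuffle everything, for instance), but the slicing principle is the same.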
Given I have zero intention of making it any of those things, and this question was aimed at specifically learning more about the technical underpinnings of the rewarding part, I'm not really sure it's the right advice for what I'm doing. You will note that I indeed recognize it as being good advice in general though.
The generalizable advice about data co-location/data sharding is definitely something I will keep in mind (if this weird learner project really involves data in such quantities) however, thanks!
I can give you examples of scalability benchmarks from gaming and extrapolate from there why it's such a moving target to pin down what makes systems fast and scalable.
Now, the general measure of technical proficiency in game engines is in how detailed your scenes are and how fast they render. If you are rendering empty space then it's quite easy to blow up the scale by making the numbers big, and this is how early space games like the original Elite operate; a simple scene defined by a few numbers and some procedural generation can be made into the whole galaxy by repeating that scene with a different seed number. It is taking advantage of the saying "it's easy to get a wrong answer infinitely fast" by defining wrong answers to be right.
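The seed trick is tiny to demonstrate. A toy sketch (not Elite's actual algorithm, just the principle that a whole "galaxy" is nothing but seed numbers):

```go
package main

import (
	"fmt"
	"math/rand"
)

// starField deterministically generates n "star" positions from a seed.
// Nothing is ever stored: revisiting a sector just means re-running the
// generator with that sector's seed.
func starField(seed int64, n int) []int {
	rng := rand.New(rand.NewSource(seed))
	stars := make([]int, n)
	for i := range stars {
		stars[i] = rng.Intn(1000)
	}
	return stars
}

func main() {
	// Sector 42 looks identical on every visit; sector 43 is a new scene.
	fmt.Println(starField(42, 3))
	fmt.Println(starField(43, 3))
}
```

That's the "wrong answer infinitely fast" move: the content is whatever the generator says it is, so it costs almost nothing to scale.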
So we have to look at what's actually being processed to simulate and render the scene to understand scaling. And right away that should trigger something in your head about applications: if they have fewer features, their processing is simpler, so they scale more readily. Scaling problems are produced by feature complexity creating bottlenecks that can't be optimized by rote. And in most cases, we would rather have our apps produce right answers slowly than wrong ones quickly, hence the product design is a critical part to optimization: if we know our design will never need a certain feature, that's the place where we can optimize it.
From there, you can dig into the nuts and bolts of defining what kind of performance envelope you expect to have: so in games you might use a target frame rate, polygon count, texture memory, and the number of live AIs and entities. But as you build out the game the numbers start moving around because you're still adding features: when you add detailed animations with a lot of bones you spend some more of your CPU budget to deform the model. Every shader effect could have a GPU time cost. When you add audio and audio processing you have to allocate some memory and CPU time to the playback and effects. If you want to continuously stream in a scene (as is done in open-world games) you have to consider the rate and latency at which you can load it off persistent media, which leads to various different strategies. So you don't know at the beginning quite what you need. Instead you try to set general targets for what you'll try to hit, stand up a test scene with similar numbers, and then iterate on them later as you get more developed, with more features, fleshed-out scenes and final assets.
On early cartridge platforms streaming was generally done off ROM with bank switching, which made it nearly instant: NES Zelda 2 does it up to hundreds of times on the overworld screen, because it was a rough port from the Famicom Disk System version, which had more working RAM; this causes slowdowns in some parts of the map.
Games on CDs and DVDs had a huge capacity but limited bandwidth and high latency: this meant that the strategy to get the most out of them involved physically locating the data in places where the drive head would seek quickly, and then linearizing the data so that it didn't have to stop and start, which meant some data had multiple copies for different scenes.
Modern gaming on SSDs changes the paradigm again, back towards lower latency accesses bolstered by hardware decompression: that allows the games on new consoles to eliminate loading screens.
Now, in a web app you can encounter a similar kind of thing with your database accesses and frontends. Some applications need to write very frequently, others need to read a lot. The distribution of reads and writes can vary (e.g. hosting one very popular video versus a sprawling e-commerce platform). These things determine where scaling needs to take place. But if you have no real users, you have an "empty space" scene where the bottlenecks aren't present because there's nothing to do: you can guess, but even the best guesses tend to be wrong when a site starts getting serious traction. Will you be able to batch things up like a DVD access? Will you need global state, like a social network, or is the state limited to the user session? You don't really know what it'll look like until the features go in and you can start profiling against real-world samples.
It's not that anyone is trying to hide the secrets - it's just that scaling is a speciality you only end up possessing through the direct experience of trying to get a little more out of the architecture you have; the specific thing you learned may not apply if your next project has a different performance profile and different hardware.
In the meantime, the next best thing would be to take large existing datasets, construct synthetic benchmarks out of those, and then have fun optimizing them. Stuff like "how fast can I load this enormous CSV, do trivial processing, then store the result".
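A minimal starting point for that kind of synthetic benchmark, assuming Go (swap the in-memory string for a real multi-gigabyte file to make it interesting):

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
	"time"
)

// sumColumn is a toy "load a CSV, do trivial processing" kernel: parse
// all rows and sum one numeric column. With a real dataset you'd wrap
// this in timing and try to beat your own numbers.
func sumColumn(data string, col int) (int, error) {
	r := csv.NewReader(strings.NewReader(data))
	rows, err := r.ReadAll()
	if err != nil {
		return 0, err
	}
	total := 0
	for _, row := range rows {
		v, err := strconv.Atoi(row[col])
		if err != nil {
			return 0, err
		}
		total += v
	}
	return total, nil
}

func main() {
	csvData := "1,10\n2,20\n3,30\n"
	start := time.Now()
	total, err := sumColumn(csvData, 1)
	if err != nil {
		panic(err)
	}
	fmt.Printf("sum=%d in %v\n", total, time.Since(start))
}
```

Obvious next experiments: stream rows instead of `ReadAll`, process chunks in parallel goroutines, and watch where the profile says the time actually goes.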
Alright, you win: this answer is fantastic. This is a far, far better way to think about what limits scalability than simple things like pages served per millisecond. I never did expect to find ready answers that will guide me to making the universally scalable app, but now I see the problem can be reduced even further into niche sorts of scaling which, while solvable with hardware tricks, do their very best to escape generalization.