I read some of these massively complex data architecture posts and I almost always come away asking "What the hell is this for?" I know Shopify is a huge business, but when I see this kind of engineering complexity all I can think is that it has to cost tens of millions to build and operate, and I wonder how they could possibly be getting ROI. There are ten boxes on that diagram and none of them have a user interface for anyone except other developers.
A lot of the time this is used for data warehousing, so product managers and others can query the database of one app joined with another, especially in an environment with microservices. You might join a table of orders with a table that came from a totally different DB, like payments, to find out which kinds of items are the best candidates to offer buy now, pay later (BNPL) on, or something like that.
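To make the cross-database join concrete, here's a minimal sketch in Python using the standard-library sqlite3 module. The table names, columns, and sample rows are all hypothetical, and a real pipeline would use an actual warehouse and ingestion tooling rather than in-memory SQLite. The point is just the shape: copy tables out of separate service databases into one place, then join them.

```python
import sqlite3

# Hypothetical schemas and data: in a real system these tables would live in
# separate production databases owned by different services.
def make_orders_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, category TEXT, amount REAL)")
    db.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "furniture", 900.0), (2, "furniture", 1200.0), (3, "apparel", 40.0), (4, "apparel", 60.0)],
    )
    return db

def make_payments_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE payments (order_id INTEGER PRIMARY KEY, method TEXT)")
    db.executemany(
        "INSERT INTO payments VALUES (?, ?)",
        [(1, "bnpl"), (2, "bnpl"), (3, "card"), (4, "bnpl")],
    )
    return db

def build_warehouse(orders_db: sqlite3.Connection, payments_db: sqlite3.Connection) -> sqlite3.Connection:
    """Copy (extract + load) both source tables into a single warehouse DB."""
    wh = sqlite3.connect(":memory:")
    wh.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, category TEXT, amount REAL)")
    wh.execute("CREATE TABLE payments (order_id INTEGER PRIMARY KEY, method TEXT)")
    wh.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders_db.execute("SELECT * FROM orders"))
    wh.executemany("INSERT INTO payments VALUES (?, ?)", payments_db.execute("SELECT * FROM payments"))
    return wh

def bnpl_share_by_category(wh: sqlite3.Connection) -> list[tuple[str, float]]:
    """Fraction of each category's orders paid with BNPL, highest first."""
    return wh.execute(
        """
        SELECT o.category,
               AVG(CASE WHEN p.method = 'bnpl' THEN 1.0 ELSE 0.0 END) AS bnpl_share
        FROM orders o
        JOIN payments p ON p.order_id = o.order_id
        GROUP BY o.category
        ORDER BY bnpl_share DESC
        """
    ).fetchall()

if __name__ == "__main__":
    wh = build_warehouse(make_orders_db(), make_payments_db())
    for category, share in bnpl_share_by_category(wh):
        print(f"{category}: {share:.0%}")
```

Neither source database can answer that question by itself; only the copy that has both tables side by side can.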
The author also mentions that it’s used for machine learning models, which ultimately feed back into Shopify’s front end.
I know what a data warehouse is for, but this whole setup doesn't even cover the reporting system, which is itself a level removed from any actual product decisions, let alone from whether those decisions bring in enough incremental revenue to justify the cost. My company is a fraction of the size of Shopify, but we have a robust data pipeline and reporting on millions of users, run by two people with off-the-shelf software.