The author seems to slightly miss the point of Node. It's designed for IO-bound work, and essentially nothing else.
If you have a long-running synchronous algorithm, you should not be running it in Node; alternatively, you can dispatch it to another process or a C lib, have it run in a true thread, and asynchronously wait for the result.
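For instance, one way to do that kind of dispatch inside Node itself is the worker_threads module. A minimal sketch (not necessarily what the author had in mind; slowFib is just a stand-in for real CPU-bound work, and this assumes a CommonJS build so __filename resolves to the compiled file):

    // Hypothetical sketch: offload a CPU-bound function to a worker thread and await it.
    import { Worker, isMainThread, parentPort, workerData } from 'node:worker_threads';

    // Stand-in for a long-running synchronous algorithm.
    function slowFib(n: number): number {
      return n < 2 ? n : slowFib(n - 1) + slowFib(n - 2);
    }

    if (isMainThread) {
      // Main thread: stays free to service IO while the worker grinds.
      const runInWorker = (n: number): Promise<number> =>
        new Promise((resolve, reject) => {
          const worker = new Worker(__filename, { workerData: n });
          worker.once('message', resolve);
          worker.once('error', reject);
        });

      runInWorker(40).then((result) => console.log('fib(40) =', result));
    } else {
      // Worker thread: run the blocking computation, then post the result back.
      parentPort!.postMessage(slowFib(workerData as number));
    }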
I work on a giant [Health|Fin|Defense]Tech monolith which has been around forever, has to do everything for everyone, and has been worked on by hundreds or thousands of developers with radically different skill levels. It connects to many databases, external services, etc., and does some immensely complex data munging just to render what you'd think are simple pages (since the inflexibility of the backing model and the limited room for denormalizing really big data mean everyone has to do super complex aggregations in the app server, across data sources X, Y, and Z, all to show the user a 10-row table).
In short, it's huge, ugly, and computationally expensive.
I was asked to quickly research the benefits of switching its platform (a single-threaded scripting language) to NodeJS (I wasn't told to research anything other than NodeJS, despite my objections).
I figured the savings would be minimal, since all our application servers are constantly running out of CPU (page loads crazy expensive, see above) or mem (aggregations crazy expensive, see above).
So I broke down what the app was doing on some representative servers, working from the coarse level (dtrace/system resource usage) down to the fine level (flame graphs of calls, wait time, and yield events within the application runtime itself). I didn't profile the batch processing services; they were RPC'd to via the renderers, and used more appropriate languages/patterns for huge-data manipulations. As far as my profiling was concerned, they were functionally databases.
The result? On average, 88% of time was spent waiting on IO or in blocking non-block-file system calls. The P90 was 99% blocking.
That went totally against my assumptions.
Sure, our webservers were overloaded with non-IO load, but if we were to switch to non-blocking IO and buy more webservers, we'd get a massive performance increase without having to change the fundamental architecture of our webapp.
That was when I started seriously considering the benefits of reactive-style programming in a single thread, a la NodeJS; it strikes a nice balance between "make programmers who aren't necessarily super skilled engage with a full/real concurrency system" and "do everything blocking, one request per process".
There are tons of downsides, of course. Switching to nonblocking IO after spending so long in a blocking world would require a massive technical expenditure, and would probably also require reorganizing the capacity planning of all the other services/databases the app servers talked to, since they'd be fielding a lot more requests. Basically, the blocking nature of the render loops was an informal rate limiter on database queries. Parallelizing the render loops via processes gave much more direct control over changes in resource utilization, which is nice for proactive scaling. Additionally, Node/callback style is still harder to learn (even with async/await sugar) than plain ol' top-to-bottom sequential code. All that said, while we'd be rewriting code on a new platform that looked different, the code could still do the same things, per render, in the same order, which is a huge benefit.
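To illustrate that last point with a minimal sketch (db and template here are stand-ins, not a real API): the async/await version of a render keeps the same steps in the same order, it just yields the event loop at each wait instead of parking a whole OS process.

    // Minimal sketch: the same steps, in the same order, as a blocking render
    // would do them, but each await yields the event loop instead of parking a
    // whole OS process. db and template are stand-ins, not a real API.
    const db = {
      fetchUser: async (id: string) => ({ id, name: 'Ada' }),
      fetchOrders: async (userId: string) => [{ userId, total: 42 }],
    };
    const template = {
      render: (ctx: object) => `<pre>${JSON.stringify(ctx)}</pre>`,
    };

    async function renderAccountPage(userId: string): Promise<string> {
      const user = await db.fetchUser(userId);       // was: blocking query #1
      const orders = await db.fetchOrders(user.id);  // was: blocking query #2
      return template.render({ user, orders });      // pure CPU, stays synchronous
    }

    renderAccountPage('u-123').then(console.log);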
A platform that hides preemption/concurrency while allowing people to program in the sequential style (e.g. Erlang) might have been a better fit, but . . . we were already using one of the best M:N resource schedulers in the world, the Linux process scheduler, to multiplex concurrent sequential processes that were just . . . linux processes. At the end of the day, I gained a lot of respect for the power and balance struck by single-thread/event-loop-driven reactive runtimes like Node.
That's very interesting: you were given a huge old legacy app that had scaling issues (CPU and memory). Presumably the business was tired of throwing hardware at it? Or had they not gotten to that stage yet? Continuing: tasked with diagnosing the performance problems, you looked at both the system and the application. I can read up on dtrace, but how did you profile the application-level stuff (time/yield)? Was it some functionality provided by the runtime, like Java VisualVM, for example?
This is a problem many, many companies have: ill-performing legacy apps that the "legacy" staff aren't capable of handling (because the talented people left long ago, i.e. "don't move my cheese"). It'd be really educational to see a write-up of this!
It broke down roughly like this. I can't write it up, and am being vague, because I don't wanna get yelled at, sorry. Googling the below techniques will get you started, though.
1. Simple resource usage (system time vs. user time, memory, etc.) got me the metrics for how long the OS thought the app was spending waiting on IO (a rough sketch of this kind of measurement follows the list).
2. Dtrace was able to slice those up by where/how they were being called, which syscalls were being made, and what was being passed as arguments. This was important for filtering out syscalls that would remain a constant, high cost (e.g. blocking local file operations on old versions of Linux, which we have, get farmed out to a thread pool in NodeJS, so I pessimistically budgeted as if that thread pool were constantly exhausted due to volume + filesystem overuse).
3. In-runtime profiling. We have the equivalent of Java VisualVM (well, more primitive; more like jstack plus some additional nice in-house features we built, like speculative replay), but for our scripting language platform. That generated flame graphs and call charts for processes. Those were somewhat useful, but tended to fall into black boxes where things called into native code libraries, which is where the dtrace-based filtering data was able to help disambiguate. Using this, we got a comprehensive map of "time spent actually waiting for IO".
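To make step 1 concrete, here is a purely illustrative, Linux-only sketch (not the tooling described above): a rough user/system/iowait split for the whole box, taken by diffing the aggregate "cpu" line of /proc/stat over a few seconds.

    // Illustrative only, and Linux-only: rough user/system/iowait split for the
    // whole machine, by diffing the aggregate "cpu" line of /proc/stat.
    // Line format: "cpu  user nice system idle iowait irq softirq ..."
    import { readFileSync } from 'node:fs';

    function cpuSample(): number[] {
      const line = readFileSync('/proc/stat', 'utf8').split('\n')[0];
      return line.trim().split(/\s+/).slice(1).map(Number);
    }

    const before = cpuSample();
    setTimeout(() => {
      const delta = cpuSample().map((v, i) => v - before[i]);
      const total = delta.reduce((a, b) => a + b, 0);
      const [user, , system, , iowait] = delta;
      const pct = (n: number) => ((n / total) * 100).toFixed(1) + '%';
      console.log(`user ${pct(user)}  system ${pct(system)}  iowait ${pct(iowait)}`);
    }, 5000);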
There was a lot more to it than that, though:
Since different syscalls had both different call overhead (which also varied with the arguments supplied) and different blocking times, all three steps were necessary.
For example, an old monolithic chunk of code that did ten sequential queries over already-open database connections is going to issue select(2) (or epoll or whatever) at least ten times. Conversion to Node, and its single-poll-per-tick model, would vastly reduce that cost, moving the performance needle a lot. Of course, that's only true if the ten queries in question can actually be parallelized, which typically requires understanding the code . . . if it can be understood, which is not a given.
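A sketch of that difference (query here is just a placeholder for any async call over an already-open connection, not a real driver API):

    // Sketch: `query` stands in for any async call over an already-open connection.
    const query = async (sql: string) => ({ sql, rows: [] as unknown[] });

    async function tenSequential(sqls: string[]) {
      // Same shape as the old monolith: one await (one poll) per query.
      const results = [];
      for (const sql of sqls) results.push(await query(sql));
      return results;
    }

    async function tenParallel(sqls: string[]) {
      // All queries in flight at once; total latency is roughly that of the
      // slowest query, but only if the queries really are independent.
      return Promise.all(sqls.map(query));
    }

    const sqls = Array.from({ length: 10 }, (_, i) => `/* query ${i} */ SELECT 1`);
    tenSequential(sqls).then(() => tenParallel(sqls)).then(() => console.log('done'));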
However, a page render that called ten different HTTP services would make ten full-cost connect(2) calls in the worst case, and ten low-cost (keepalive'd) connect calls in the best case. Node would still have to make those same ten calls, so moving to nonblocking IO moves the needle less there (though the time spent waiting for connect to complete or time out would no longer be paid directly in the render, which had to be accounted for as a positive).

And it goes deeper: depending on the services being hit, the keepalive window, and the rate at which they were called during a typical server's render operations, we had to calculate how often, say, a 50-process appserver worker pool would be redundantly connecting to those services (because separate sibling processes can't share the sockets if they're initiating the connection, and, before you ask, I would not like to add to the chaos by passing open file descriptors over unix sockets between uncoordinated processes, thank you very much). If the redundant connect rate was high, Node might offer significant savings by allowing keepalive'd sharing of connections within a single Node process (we'd need many fewer of those per server than appserver workers). If it was low, fewer savings.
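A minimal sketch of that within-process sharing, using Node's built-in http.Agent with keepAlive (the upstream host name is made up):

    // Sketch: one keep-alive Agent per Node process, so every render handled by
    // this process reuses a small shared socket pool to the upstream service,
    // instead of 50 sibling worker processes each opening their own connections.
    import * as http from 'node:http';

    const upstreamAgent = new http.Agent({
      keepAlive: true,
      maxSockets: 8, // cap concurrent connections to this upstream, per process
    });

    function callUpstream(path: string): Promise<string> {
      return new Promise((resolve, reject) => {
        http
          .get({ host: 'internal-service.example', path, agent: upstreamAgent }, (res) => {
            let body = '';
            res.on('data', (chunk) => (body += chunk));
            res.on('end', () => resolve(body));
          })
          .on('error', reject);
      });
    }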
TL;DR it's complicated but possible to measure this data using established techniques. You don't have to get super far down the rabbit hole to get a decent guess as to whether it will be beneficial to performance, but transforming decent into good requires a fairly thorough tour of Linux internals.
And, as always, the decision of whether or not to switch hinged primarily on the engineering time required, not the benefits of switching. C'est la vie.
That's what other languages are for. Logic handling IO goes in Node.js, and everything else can go in native modules, child Node processes, or other services that fulfill requests.
If you can design an application where this kind of separation won't overcomplicate everything, Node.js is probably pleasant to deal with.
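As a hypothetical sketch of the "child Node processes" option (file names and doHeavyWork are made up): the IO-facing parent forks a long-lived child for CPU-heavy work and awaits answers over the IPC channel, one outstanding job at a time to keep the example simple.

    // Hypothetical sketch: the IO-facing parent forks a long-lived child Node
    // process for CPU-heavy work and awaits answers over the IPC channel
    // (one outstanding job at a time, to keep the example simple).
    import { fork } from 'node:child_process';

    const cruncher = fork('./cruncher.js'); // made-up path to the child script

    function crunch(payload: object): Promise<unknown> {
      return new Promise((resolve) => {
        cruncher.once('message', resolve);
        cruncher.send(payload);
      });
    }

    // The child (cruncher.js) would look roughly like:
    //   process.on('message', (job) => {
    //     const result = doHeavyWork(job); // synchronous is fine over here
    //     process.send!(result);
    //   });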
I run a motor insurance company. The vast majority of our services are indeed exclusively IO-bound. The very small amount of non-IO-bound work happens mostly in Go.
Not sure that's fair. I think his point was that instead of improving performance by switching to async and navigating callback soup and task unfairness, we should really be improving the performance of context switching, and he probably has a point.
Just a few weeks ago there was an interview with Ryan Dahl where he said:
"...I think Node is not the best system to build a massive server web. I would definitely use Go for that. And honestly, that’s basically the reason why I left Node. It was the realization that: oh, actually, this is not the best server side system ever."[0]