Pretty cool use case for virtual threads. Recently I implemented something with core.async in order to batch messages and dispatch them to some UI code at a fixed rate. It was definitely a lot more complicated than using a single semaphore as shown in this article.
On a related note, it would be nice to have an implementation of CSP using proper virtual threads rather than the thread pool currently used by core.async. The promesa library currently has a proof-of-concept[1], but it doesn't seem interoperable with existing core.async code.
(author here) Virtual threads being "free" definitely makes things simpler to implement, I was actually quite surprised.
Where I think things would get more complicated is if you wanted to control how bursty the token bucket is. You'd probably need another semaphore to limit the burst size, etc.
This implementation is naive and does create a virtual thread for each permit that is waiting, so you have 2x the number of threads you would have with a more complex implementation (i.e., one per running task and one per waiting permit). This would be a no-go with regular threads, but it seems to work fine with virtual threads. It's pretty cool that virtual threads make naive implementations work.
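To make the shape of that concrete, here's a minimal sketch of the naive approach described above (the class and method names are mine, not from the article): each submitted task blocks on a semaphore permit in its own virtual thread, and a second short-lived virtual thread hands the permit back after a fixed delay, which is what caps the start rate and produces the "one thread per running task plus one per outstanding permit" count.

```java
import java.util.concurrent.Semaphore;

public class NaiveRateLimiter {
    private final Semaphore permits;
    private final long intervalMillis;

    // e.g. permitsPerInterval = 10, intervalMillis = 1000 → ~10 task starts/sec
    public NaiveRateLimiter(int permitsPerInterval, long intervalMillis) {
        this.permits = new Semaphore(permitsPerInterval);
        this.intervalMillis = intervalMillis;
    }

    public void submit(Runnable task) {
        // one virtual thread per running (or waiting) task
        Thread.startVirtualThread(() -> {
            try {
                permits.acquire(); // blocking is cheap on a virtual thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            // one more virtual thread per outstanding permit: sleep for the
            // interval, then release, so starts are capped per interval
            Thread.startVirtualThread(() -> {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    permits.release();
                }
            });
            task.run();
        });
    }

    public static void main(String[] args) throws InterruptedException {
        NaiveRateLimiter limiter = new NaiveRateLimiter(2, 500);
        for (int i = 0; i < 4; i++) {
            final int n = i;
            limiter.submit(() -> System.out.println("task " + n));
        }
        Thread.sleep(1500); // give the virtual threads time to finish
    }
}
```

With platform threads the releaser threads alone would make this design a non-starter; here each one is just a parked continuation.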
Good to know that refs and atoms are not safe to use in virtual threads. I would have definitely found that out the hard way since atoms are so simple to use.
There have been several comments from the core team folks that a vthread variant of core.async is being considered. It would be a different library and may be somewhat API-compatible -- but that's all up in the air right now.
> things get complicated with virtual threads, they shouldn't be pooled, as they aren't a scarce resource
Why not pool virtual threads, though? I get that they’re not scarce, but if you’re looking to limit throughput anyway wouldn’t that be easier to achieve using a thread pool than semaphores?
(author here) From what I've read, beyond the documentation saying they shouldn't be pooled, the reasoning is that by design they are meant to run and then get garbage collected. There's also some overhead in managing the pool. If someone has a deeper understanding of virtual threads I'd love to know why in more detail.
As to why use a semaphore over a thread pool for this implementation? A thread pool couples throughput to the number of running threads. A semaphore lets me couple throughput to started tasks per second. I don't care how many threads are currently running, I care about how many requests I'm making per second. Does that make more sense?
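A rough sketch of that distinction (names and numbers are mine, chosen for illustration): a fixed pool of N threads caps how many tasks run *at once*, so throughput ends up depending on how long each task happens to take, while a semaphore whose permits are refilled on a schedule caps task *starts* per interval regardless of task duration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class StartRateLimit {
    // Start `tasks` virtual threads at most `rate` per `intervalMs`;
    // returns elapsed milliseconds.
    static long runBatch(int tasks, int rate, long intervalMs) throws InterruptedException {
        // Contrast: a fixed thread pool of size `rate` would cap *concurrency*
        // instead, coupling throughput to task duration.
        Semaphore budget = new Semaphore(rate);
        ScheduledExecutorService refill = Executors.newSingleThreadScheduledExecutor();
        refill.scheduleAtFixedRate(() -> {
            budget.drainPermits();   // reset the start budget each interval
            budget.release(rate);
        }, intervalMs, intervalMs, TimeUnit.MILLISECONDS);

        long t0 = System.nanoTime();
        for (int i = 0; i < tasks; i++) {
            budget.acquire();        // block until this interval has budget left
            Thread.startVirtualThread(() -> { /* make the request here */ });
        }
        refill.shutdownNow();
        return (System.nanoTime() - t0) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        // 4 starts at 2 per 250ms: the loop has to wait for at least one refill
        System.out.println("elapsed ms: " + runBatch(4, 2, 250));
    }
}
```

However many requests are in flight, the loop only controls how fast new ones begin, which matches "requests per second" rather than "threads running".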
Pooling virtual threads has no upside and potentially a bit of downside: 1. You hang on to unused objects for longer instead of returning them to the more general pool that is the GC; 2. You risk leaking context between multiple tasks sharing the thread, which may have security implications. Because of these and similar downsides you should only ever pool objects that give you benefit when they're shared -- e.g. they're expensive to create -- and shouldn't pool objects otherwise.
Thank you! You incur this risk when pooling any kind of thread, too, but with platform threads at least pooling makes sense because they're costly, so you just need to be careful with thread locals on a shared thread pool. Not needing to share threads and potentially leak context is a security advantage of virtual threads.
Aren't "virtual threads" built on a thread pool themselves? I suppose there would be no advantage in pooling an already pooled resource since presumably the runtime would manage pooling better than user code.
[1] https://funcool.github.io/promesa/latest/promesa.exec.csp.ht...