It's impressive to me that you can build a project on something (here Ruby), and if you get big enough that the initial decision is a problem for some reason (here performance), you can just improve on it like that.
How does one generally get to work on compilers? Is it mostly through academia? Through open source contributions to compilers?
I've been meaning to write a blog post about this because I get asked about my personal journey to working on the team a bunch.
The makeup of the current team is a split of:
- people who worked on programming languages & compilers in academia
- people who worked on developer productivity/product infrastructure at large tech companies
- people who just really like types (me)
Which is to say: there's a good deal of luck involved, but there are also some fairly straightforward ways to inch closer to a PL/compilers job in industry.
That all being said, the piece of advice I always give people who bring this up: there's nothing stopping you from working on PLs/compilers right now! Find a ticket labeled "good first bug," show up in the #internals channel on the Sorbet Slack[1], and someone would love to help you get started.
Don't care about Sorbet? That's fine! Pick your programming language of choice, find where the developers hang out, and repeat the process. Oftentimes, there's nothing holding you back from working on PLs and compilers except yourself.
The Rust compiler is a great example of one that is very beginner friendly. Diagnostic (error message) issues, of which there are many, are a relatively easy way to get started and learn the codebase. The community is also very friendly to new contributors, and you can learn your way around the compiler and choose to work on and learn the things that interest you. There are certain things you'll need to know for certain parts of the compiler, such as some graph theory if you're working on Rust's MIR, and you should read a high-level overview of how compilers work. Compilers are a huge, never-ending field of study/work. Start small and work your way up from there.
If you start looking for mini languages in a normal project (or where they would fit) you will see that there are frequently many opportunities to do something useful with small domain specific languages.
I suggest starting with parser combinators rather than page 1 in the Dragon Book. They are a nice way to get things done quickly.
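To make that concrete, here is a minimal sketch of parser combinators in plain Ruby. Every name in it (`token`, `seq`, `alt`, `many`, `map_result`, `parse_sum`) is invented for illustration and does not come from any particular library; real combinator libraries add error reporting, backtracking control, and much more.

```ruby
# A parser is a lambda taking (input, pos) and returning [value, new_pos],
# or nil when it does not match. All names below are made up for this sketch.

# Match a regex at the current position.
def token(regex)
  anchored = /\A(?:#{regex.source})/
  lambda do |input, pos|
    match = anchored.match(input[pos..-1])
    match && [match[0], pos + match[0].length]
  end
end

# Run parsers one after another; succeed with an array of their values.
def seq(*parsers)
  lambda do |input, pos|
    parsers.reduce([[], pos]) do |(values, at), parser|
      result = parser.call(input, at)
      break nil unless result
      [values + [result[0]], result[1]]
    end
  end
end

# Try each alternative in order and keep the first success.
def alt(*parsers)
  lambda do |input, pos|
    result = nil
    parsers.find { |parser| result = parser.call(input, pos) }
    result
  end
end

# Apply a parser zero or more times (it must consume input when it succeeds).
def many(parser)
  lambda do |input, pos|
    values = []
    while (result = parser.call(input, pos))
      values << result[0]
      pos = result[1]
    end
    [values, pos]
  end
end

# Transform the value of a successful parse.
def map_result(parser, &block)
  lambda do |input, pos|
    result = parser.call(input, pos)
    result && [block.call(result[0]), result[1]]
  end
end

# Grammar: sum := number (("+" | "-") number)*
def sum_parser
  number = map_result(token(/\d+\s*/)) { |text| text.to_i }
  operator = alt(token(/\+\s*/), token(/-\s*/))
  map_result(seq(number, many(seq(operator, number)))) do |first, rest|
    rest.reduce(first) { |total, (op, n)| op.start_with?("-") ? total - n : total + n }
  end
end

# Returns the computed value, or nil unless the whole input was consumed.
def parse_sum(text)
  value, pos = sum_parser.call(text, 0)
  pos == text.length ? value : nil
end

puts parse_sum("10 - 3 + 35") # prints 42
```

The grammar reads almost like the comment above it, which is most of the appeal: you get a working parser for a small language without writing a separate tokenizer or generator step.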
Almost everyone who works on the D compiler has come to it as a hobby. The books don't teach you how to write a good compiler: most textbooks spend years on parsing theory when 99% of real compilers use handwritten parsers which can basically be made up on the fly.
Facebook did the same for PHP: added gradual typing (enforced in 99.9% of code with mandatory linters), a better standard library, and a compiler. That was much cheaper than rewriting everything.
Just wondering, does FB still use PHP / Hack for new services? I personally don’t mind the language but for some reason I imagine it being unpopular internally.
They still use it for anything that runs on the application layer, i.e. stuff that runs in-process on the machines that serve web and GraphQL API requests.
FB isn’t as steeped in modern service-oriented ideology as most similar companies, so that `www` monorepo represents a lot more code than you probably expect, including stuff you’d normally think of as “services”.
True backend services do exist outside the PHP application, and are typically written in C++. Java, Rust, and Python also exist in various niches.
The only mainstream language I have never seen anywhere at FB is Go, although I’m sure any language you can imagine is used by _some_ random team out there.
I’m not sure why Hack would be unpopular. On the contrary, it’s the default choice for starting new projects since all of the internal tooling is built mostly with Hack in mind, so it’s the path of least resistance for developers.
> How does one generally get to work on compilers? Is it mostly through academia? Through open source contributions to compilers?
You can just do it! As a place to start I'd recommend looking for tutorials on recursive-descent parsing, and just try to make a program that parses a toy language and transpiles it into C, or some other language you're familiar with. Then you can grow it from there.
You can get really fancy with things like automatic parser generation, and using an LLVM back-end to enable optimization, but writing a simple compiler is a reasonable project for someone with even a basic handle on programming and a little creativity to just jump in and figure out.
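As a rough idea of the scale involved, here is a sketch of a recursive-descent parser for a toy language that transpiles to C. The language (integer assignments plus a `print` statement) and the `ToyCompiler` class are invented for this example, and there is no checking for undeclared variables, so treat it as a starting point rather than a finished tool.

```ruby
# Toy language -> C transpiler. Grammar (all hypothetical):
#   statement := "print" expr ";" | name "=" expr ";"
#   expr      := term (("+" | "-") term)*
#   term      := factor (("*" | "/") factor)*
#   factor    := number | name | "(" expr ")"
class ToyCompiler
  def initialize(source)
    # Crude tokenizer: characters it does not recognise are silently skipped.
    @tokens = source.scan(%r{\d+|[a-z_]+|[-+*/()=;]})
    @pos = 0
    @declared = []
  end

  # Returns the generated C program as a string.
  def compile
    body = []
    body << statement until @pos >= @tokens.length
    ["#include <stdio.h>", "int main(void) {", *body, "  return 0;", "}"].join("\n")
  end

  private

  def peek
    @tokens[@pos]
  end

  def advance
    token = @tokens[@pos]
    @pos += 1
    token
  end

  def expect(token)
    actual = advance
    raise ArgumentError, "expected #{token.inspect}, got #{actual.inspect}" unless actual == token
  end

  def statement
    if peek == "print"
      advance
      value = expr
      line = "  printf(\"%ld\\n\", (long)(#{value}));"
    else
      name = advance
      raise ArgumentError, "expected a name, got #{name.inspect}" unless name =~ /\A[a-z_]+\z/
      expect("=")
      value = expr
      # Declare the variable as a C long the first time it is assigned.
      prefix = @declared.include?(name) ? "" : "long "
      @declared << name
      line = "  #{prefix}#{name} = #{value};"
    end
    expect(";")
    line
  end

  def expr
    left = term
    while ["+", "-"].include?(peek)
      op = advance
      left = "(#{left} #{op} #{term})"
    end
    left
  end

  def term
    left = factor
    while ["*", "/"].include?(peek)
      op = advance
      left = "(#{left} #{op} #{factor})"
    end
    left
  end

  def factor
    token = advance
    case token
    when /\A\d+\z/, /\A[a-z_]+\z/
      token
    when "("
      inner = expr
      expect(")")
      inner
    else
      raise ArgumentError, "unexpected token #{token.inspect}"
    end
  end
end

puts ToyCompiler.new("x = 1 + 2 * 3; print (x - 1) / 2;").compile
```

Each grammar rule becomes one method, and operator precedence falls out of which method calls which. From here you can add `if`, loops, or functions one rule at a time and let the C compiler do the heavy lifting.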
I've been using Sorbet on a personal project (https://github.com/connorshea/vglist) for almost two years now, and it's been great (although I've had to build myself a lot of tooling for it over time).
I don't think I'll use the compiler (for now, at least), but I'm very interested in seeing how it grows over time :)
A bit off topic, but I'm constantly impressed by the flexibility of LLVM to be adapted to many different language compilation tasks, and immensely grateful for its existence. One of my favourite examples of its use is SAW [0], which symbolically executes LLVM bytecode to allow automated verification of C programs. Excited to see the barrier to optimised compilation come down for new languages!
IMO the Sorbet additions do impact readability, but they dramatically improve understandability. In a codebase past a certain size/number of contributors, you're always going to have ambiguity about what sort of data you're working with. For example, in my part of the codebase we have models representing an entity that is identified by a number in external systems and represented as a whole data object in our code. If I have `foo.entity`, it's unclear just from the "real" code whether that'll return a data object or just the numerical identifier. Having the Sorbet annotations on that method (and available on hover in the IDE) makes it a lot easier to navigate the codebase and resolve these ambiguities.
I agree with this take. I work at Shopify using Sorbet and Rails. On a monolithic code base it can be very helpful to know what type is going to be returned.
On a smaller code base I may not bother with it for the reasons the original commenter mentioned.
Does it still feel like Ruby though or is it a new beast? I mean Typescript doesn't really look like Javascript at all and your whole mindset changes when you use it.
So is Shopify now typing the whole monolith?
IMO it depends on what part of the ruby feel you most enjoy. Sorbet makes it harder to work with the runtime-magic parts of Ruby (and in some cases we rely on tooling-generated code instead of runtime dynamic stuff), but it doesn’t affect the code you’d write within methods all that much.
I second this. I tried using Sorbet on a project with non-trivial logic, and it significantly degraded readability, and stuff that I would find "normal" to do in Ruby had unfortunate issues with typing. It's good, but nowhere near TS/JS, and that's just an unfortunate effect of fighting the syntax of the language. IMO they should've done a "soft fork" of the language like TS did, with special syntax for typing that compiled down to good ol' Ruby.
The only thing that I find a bit disheartening is that typing in Ruby diverged between the official Ruby 3 implementation and Sorbet. It would have been much better, in my opinion, if there was one typing solution that both Ruby core team and Stripe worked together on.
Sorbet works well with .rbs files! (The Sorbet team helped the Ruby Core team define the syntax).
I agree it's too bad that Sorbet's syntax choices weren't adopted "officially" by Ruby Core (I think matz had some unfortunately strong opinions about types never appearing in normal .rb files), but I wouldn't be too disheartened as a Rubyist as long as the community adopts Sorbet anyway, which it seems to be doing.
> as long as the community adopts Sorbet anyway, which it seems to be doing
Really no to that, where is this coming from? Stripe is Stripe and yes perhaps a couple of huge companies here and there may give Sorbet a try (although I believe Shopify didn't go this route, neither has Github afaik), but as a community Ruby is a dynamic community. And also, Ruby is much more characterized as a tech for small-mid teams (where types are arguably not helpful) than as an enterprise tech which is mostly typed.
I don't get the whole idea of choosing Ruby and sprinkling types on it with some tool. You have much better tools for that: they are called Java/C#.
Alright. A few very lucky and successful companies adopted Sorbet "to some extent". That doesn't mean the Ruby community is moving in that direction at all. It just means if you are in the lucky 1% of companies that reach that crazy scale and success you'll have better tools now.
I don't think your average startup needs Sorbet or that the Ruby community is moving in that direction (let's face it: the Ruby community is mostly Rails, and Rails is not going in that direction).
Yep, totally agree with you. I was merely correcting a factoid, and sorry for the very terse response, I was on the go.
And even the "better tooling" part is somewhat subjective. The added strictness and explicitness is valued by some, but for others it's seen as degraded readability.
Any source for this statement? The guy you are responding to (byroot) works there and is a pretty well-known figure in the Ruby/Rails/Shopify worlds, so I have no reason not to accept what he's saying.
I also work at Shopify, though I only joined recently.
It seems like he works on a different part of Shopify, which might explain our difference in opinion. The Ruby code I write has to be typed: true at minimum, usually typed: strict.
They did communicate and collaborate at least a bit. I remember reading about the reasons why Ruby didn't go with the sorbet approach, but unfortunately I don't remember now. I did find this but it doesn't fully answer: https://sorbet.org/blog/2020/07/30/ruby-3-rbs-sorbet
I think I understand GP's motivation: RBI files and RBS files are two different formats, and as a user of the language, people tend to want to use the officially blessed solution the language provides.
In case you weren't aware, parlour[1] is a popular open source project for working with RBI files. I believe it supports transparently converting between RBI files (Sorbet) and RBS files (Ruby 3).
There is also rbs_parser[2], a C++ parser for RBS files to convert them to RBI files, written by Shopify, a major user of Sorbet.
Stepping back: I haven't personally read many complaints from Sorbet users describing how the current state of RBI/RBS interop gets in the way of what they can actually do with Sorbet. Almost all the feature requests we get about Sorbet (both inside Stripe and outside) are for fixing bugs or implementing new language-level features. RBI files as implemented seem to work.
Sorbet already has an extensive set of RBI files covering the Ruby standard library (to my knowledge at least as good as, if not better than, any existing repository of types for RBS files), and there are plentiful tools for working with RBI files, listed here.[3]
If lack of first-party RBS support in Sorbet is holding you back from trying Sorbet, I'd strongly encourage you to give Sorbet a try anyways! Many people have shared great experiences adopting Sorbet in their Ruby codebases.
What are your thoughts on eventually forking the language and moving it in a different direction? Like 100% typed and AOT compiled, similar to Hack/PHP situation with Facebook.
We're reluctant to completely fork the language. We benefit from so many open source tools, technologies, and libraries built around Ruby. For example, we've built:
- A type checker
- LSP-based editor tooling
- A compiler
But we haven't:
- implemented our own GraphQL, protobuf, gRPC, or JSON libraries
- built a Ruby debugger or debug protocol adapter
- built custom performance monitoring tools
- etc.
There might come a time when it makes sense to fork the language, but we've been very reluctant to do so from the start, because we know what we'd be throwing away. "Compatible with Ruby" has been an explicit design principle of Sorbet from the start:
Would you guys ever consider reimplementing the runtime for more aggressive optimizations? Thinking back that was the first step Facebook took towards their Hack transition.
Normally I wouldn't suggest Crystal [1] to a Rails shop given how different they are, but since Stripe uses a tight subset of Ruby and type checks everything, I am wondering if they have ever looked at Crystal and what their thoughts on it are, given they are looking at Java and Go as well.
>Architected this way, the Sorbet Compiler turns Ruby into a language for writing Ruby native extensions! Instead of having to write C, C++, Rust, or some other compiled language to write native extensions, people can continue to write Ruby but gain the benefits of native compiled speeds.
To me this is the biggest feature that could impact the whole Ruby Ecosystem. Along with another Ruby JIT that is currently being tested at Shopify.
Sorbet clearly aims to be interoperable with the rest of the Ruby ecosystem, whereas Crystal is a relatively small garden: Ruby code does not de facto work in Crystal.
Every post about Sorbet must have someone talking about Crystal. If everyone suggesting Crystal actually used Crystal, then either a) the Crystal community would be huge, or b) they wouldn't be suggesting Crystal as a "drop-in" replacement for Ruby + Sorbet.
Here is a bit of a critical read of this blog post, from the point of view of alternative Ruby implementations:
> instead of having to ship an entire language runtime to production
Except they ship the CRuby language runtime in production.
Ruby is not a language that can run without a runtime.
> Not only did we not need Java VM-level interoperability
So they almost discard the entire idea because they don't need a specific additional feature?
> choosing either alternative Ruby implementation would have made for a difficult migration path.
So what is it?
> Stripe relies heavily on gems with native extensions
Yes that's a problem on JRuby (when there is no java extension in that gem), but TruffleRuby supports native extensions, as very clearly stated in many places.
> as you can imagine, a multi-million line Ruby codebase over time starts to depend on Ruby-the-implementation, not just Ruby-the-language.
Except all serious Ruby implementations know they need to be compatible with whatever CRuby does, not just an incomplete ISO specification of the language.
In fact alternative Ruby implementations match CRuby behavior as much as possible for compatibility, even when it seems weird or makes little sense (they report it in this case but have to match behavior anyway).
If you're using the types from Sorbet to compile, that means you must trust them. Does that mean that soundness is mandatory in Sorbet, and more significantly, that sound interop with untyped is always checked in Sorbet?
This is a great question! I've written up an internal blog post at Stripe about this, and now that the beans are spilled I might port it over to my personal blog or the Sorbet blog. The tl;dr:
- The Sorbet types are hints for optimizations in the Compiler. The compiler doesn't blindly trust them, but rather it checks whether they're correct and if so does something faster.
- The Sorbet compiler can frequently check types much faster than the interpreter could because it can look directly at the object representation, rather than having to fall back to calling a full-blown method like .is_a? or .nil?. Many common type checks are a single assembly instruction, so type checks are actually fast most of the time.
- Sorbet is and has always been designed to have runtime type checks.[1] These have been a part of Sorbet since even before we open sourced the typechecker. Every method already does runtime signature checking, when interpreted, and this is no different when compiled.
- The power of LLVM means that a lot of these type tests end up coalescing. For example, if a signature says "this method accepts ints" and then the compiler sees "if this is an int, I can do something faster," that's frequently only one type test, because the power of LLVM magically coalesces the checks.
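For anyone who finds the last point abstract, here is a very loose illustration written in plain Ruby. This is not how the Sorbet Compiler is implemented (it emits LLVM IR, not Ruby), and the method names are invented; the sketch only shows why a signature check and an optimization guard on the same value can collapse into a single type test.

```ruby
# Before coalescing: the signature check and the optimization guard both
# test the same thing. Method names are hypothetical.
def double_checked_twice(x)
  raise TypeError, "expected Integer, got #{x.class}" unless x.is_a?(Integer) # signature check

  if x.is_a?(Integer)       # optimization guard: is the fast path safe?
    x + x                   # stands in for a cheap native integer add
  else
    x.public_send(:+, x)    # stands in for full dynamic method dispatch
  end
end

# After coalescing: once the signature check has passed, the guard is known
# to be true, so the second test and the slow branch can be dropped.
def double_coalesced(x)
  raise TypeError, "expected Integer, got #{x.class}" unless x.is_a?(Integer)

  x + x
end

puts double_checked_twice(21) # prints 42
puts double_coalesced(21)     # prints 42
```

Both versions behave identically for every input; the second simply does less work, which is the kind of simplification an optimizer can perform once it sees both checks in the same function.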
This is really interesting, thanks. As an academic working on gradual typing and dynamic checks, the sorbet experience is extremely valuable. (I've cited the threshold someone on your team mentioned for when the cost of checks gets too high.) So I'd love to see the blog post.
It sounds like you're taking an approach fairly similar to what Facebook is doing with Hack these days: Dynamic checks implied by types help catch bugs and can be often optimized away, and static types are hints but not trusted, but the type checker means that using the hint is almost always a good idea. Is that accurate?
Yep, that's accurate. If anything, there aren't enough hints in Stripe's codebase, so much so that I'd love people to add more!
When I'm looking at a production performance profile result, trying to figure out why the compiler didn't speed something up, the first thing I do is add or improve types. It has never slowed down the resulting compiled code, and usually speeds it up substantially.
While progress doesn't seem to be fast, the approach itself sounds pretty nice. You can make improvements module-by-module, from the bottom to the top. With this approach, it's very easy to set priorities using conventional profilers. No large deviation from usual daily operations.
On the other hand, this won't be very attractive for small projects, where the biggest bottlenecks are in 3rd party frameworks and libraries. Not being 100% compatible with upstream Ruby (from what I heard), this compiler might not be able to save your day.
Very roughly Ruby is about 20-40x slower than C. It used to be quite a bit slower but there has been a lot of performance work in the past handful of years.
For Python people, I believe the Sorbet Compiler is basically to Ruby what Cython is to Python. Although, Cython is happy to compile things without types attached.
Love this work! From what I’ve heard the Hack/HHVM team has been heading in this general direction too, in their case using the typechecker to improve the IR.
A comment of appreciation which I think a lot of OSS devs don't get due to threads being filled with criticism or pessimism.
Sorbet is great - chuck it on your core entities and core methods and boom, it feels great. Thank you team for creating this tool in OSS and continuing to work on it. Looking forward to the future of it! Thank you Jez!
I understand if Ruby (MRI) doesn’t want to make typing or ahead-of-time compilation mandatory. I think that’s wise.
But would be neat if this would make its way into official Ruby eventually, so that projects could choose typing + ahead of time compilation if they wanted.
Just as Rails has influenced a lot of Ruby, I hope this will do the same!
I've actually found that a lot of I/O bottlenecks are often Ruby related, for the fact that Ruby doesn't give you a lot of great options in terms of doing that I/O in a non-blocking way. If your Ruby program has to do things like fetch data from upstream via a network request, you have to be very careful not to cause request saturation / queueing if that upstream request starts timing out. Even a well-behaving upstream can cause problems if your user patterns require you to hit it a lot. Shopify was fortunate that we had a lot of spare money flying around and could just scale the shit out of things horizontally, but that's a crude solution.
Granted, there's been a lot of interesting work done on Ruby concurrency lately, but it's far from a solved problem at the language level.
> Curious if there’s anything public about improving ruby performance from the I/O angle mentioned in the post.
I'm currently working on Polyphony [0], a Ruby gem for writing highly-concurrent Ruby apps. It uses Ruby fibers under the hood, and does I/O using io_uring (on Linux, there is also a libev-based backend).
I'm sure you are aware of ioquatix's work on Async/Falcon etc, how do you see your project differing from his work? And why hasn't anything changed in the Ruby server space - it's all processes and threads afaik.
"Stripe relies heavily on gems with native extensions, and as you can imagine, a multi-million line Ruby codebase over time starts to depend on Ruby-the-implementation, not just Ruby-the-language."
I get it, this is a big one. How big though? I wish as a community we had a list of important gems that have an extension, maybe with a combined effort we can port them to the JVM or whatever else so people can switch between JRuby/Truffle/MRuby with relative ease.
I know about the big ones: of course the Postgres/MySQL gems come with extensions, and probably most Ruby web servers. But what else? What are the big ones?
TruffleRuby supports many C extensions, like DB adapters, out of the box, just as CRuby does. JRuby has Java ports for most popular gems that ship alongside them. Nokogiri is one gem that's a Rails dependency and stands out as a bit of a pain, but both JRuby and TruffleRuby put in substantial effort to support it too. All three generally work with web servers, including Puma, which ships with Rails. There are still gaps in the C extensions that TruffleRuby covers, and a lack of Java ports for less popular but still used gems. They may have some things like gRPC C extensions that don't work, and they don't want to port them either. Generally the ecosystem just works between these popular Ruby implementations.
MRuby uses MGems and is just a different ecosystem entirely. It has parallel libraries but they're not shared with the above implementations.
So why the relative lack of popularity for JRuby/Truffle? I know about startup time and a degraded dev experience (compile/restart server etc is slower). Are these the major pain points?
Or is it simply that most Ruby shops have a good enough performance and they're not looking for anything else?
Very slow startup and the compatibility isn't 100%. You don't get the latest Ruby features and every once in a while, code that would work on CRuby doesn't work out of the box. It's kind of a similar situation with PyPy as far as I can tell. It's been around for a long time but not much adoption.
Long before Sorbet existed I explored type signatures for Ruby -- I even used `#sig` (I suppose it is the obvious choice). From that experience, I am perplexed by Sorbet's verbose syntax. Why the use of `#params`? Is it just to be able to add return types as a method call?
After my experiments I came to the conclusion that type information would probably be better if specified in structured comments, something like TomDoc. That way type information would be had just by writing the documentation one should be writing anyway. (Two birds, one stone.) It would also make the type signatures very readable.
Strongly agreed. I think there is a kind of x/y situation¹ going on with regard to Ruby type annotations. Are we trying to make it easier for developers to scan the code and understand what's going on? If so, then type signatures are more like a form of enforced documentation. Or are we interested in the ability to perform static analysis and catch bugs in an automated fashion? Then type signatures are something else.
My beef with Sorbet is that it seems like these two goals have been conflated a bit. While it's true that they aren't entirely orthogonal to each other, you definitely optimize for different things depending on which goal is more important.
If I'm primarily interested in developer understanding, then it's really important to use a DSL for annotations that's easy to visually scan and parse. I don't think Sorbet is great in that department. Even among Sorbet advocates, I often hear complaints about the syntax. That's because there's a direct correlation between the expressiveness of a language (Ruby being very expressive), and the expressiveness of the DSL you need to describe that language.
Sorbet went with an approach that tries to capture as much of Ruby's expressiveness as possible. But the reality of day-to-day Ruby programs is often that you could capture 90% of the use cases with 10% of the DSL footprint. If your primary heuristic is developer ease-of-understanding, then leaving that remaining 10% language coverage on the floor could be your best option.
Something like Typescript has its cake, and eats it too, by virtue of becoming its own language -- it's able to be very expressive while still remaining relatively easy to scan and parse. On the other hand, Sorbet is fighting with one hand tied behind its back in having to stick to standard Ruby syntax.
A few years ago at Shopify, I worked on an internal comment-based type checker for Ruby that would instrument our methods at runtime, and compare their inputs/outputs to what was declared above in YARDOC. It was definitely skewed more towards developer understanding than program-correctness. You couldn't really use it for static analysis, but it was pretty good at providing guidance unobtrusively. It was just comments, after all -- you could color them however you liked in your editor.
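For readers wondering what a comment-driven runtime checker could look like, here is a small sketch. It is not the internal Shopify tool; the `DocChecked` module and `check_docs` method are invented for illustration, the doc text is passed in as a string to keep the example self-contained (a real tool would read it from the comments above the method), and only positional arguments with simple class names are handled.

```ruby
# Hypothetical helper: wraps a method with runtime checks derived from
# YARD-style "@param name [Type]" and "@return [Type]" tags.
module DocChecked
  def check_docs(method_name, doc)
    # Collect [name, Class] pairs in declaration order.
    params = doc.scan(/@param\s+(\w+)\s+\[(\w+)\]/).map do |name, type|
      [name, Object.const_get(type)]
    end
    return_name = doc[/@return\s+\[(\w+)\]/, 1]
    return_type = return_name && Object.const_get(return_name)
    original = instance_method(method_name)

    # Replace the method with a wrapper that checks inputs and output.
    define_method(method_name) do |*args|
      params.zip(args).each do |(name, type), arg|
        next if arg.is_a?(type)
        raise TypeError, "#{method_name}: #{name} should be #{type}, got #{arg.class}"
      end
      result = original.bind(self).call(*args)
      if return_type && !result.is_a?(return_type)
        raise TypeError, "#{method_name}: should return #{return_type}, got #{result.class}"
      end
      result
    end
  end
end

class Cart
  extend DocChecked

  def total(prices)
    prices.sum
  end
  check_docs :total, <<~DOC
    @param prices [Array]
    @return [Integer]
  DOC
end

puts Cart.new.total([1, 2, 3]) # prints 6
```

A version aimed at guidance rather than enforcement would log a warning instead of raising, which is closer to the unobtrusive behavior described above.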
Right before we started adopting Sorbet in our monolith, we had high-level discussions about which type checker we wanted to use, and we ultimately went with Sorbet. I think that was probably the right call (more cumulative momentum). But I do still wish that Sorbet optimized a bit more for ease-of-understanding by restricting itself to a simpler DSL (at the expense of some descriptiveness).
It's possible to write a YARDOC-to-rbi converter so that you can still rely on comments instead of having to pepper your code with Sorbet annotations proper, but I haven't seen much use of that in the wild. We used such a converter at Shopify for migrating off of YARDOCs in places, though.
So what was/is the strategy at Shopify for typing - was the policy everything is going to eventually be typed? If so, I don't understand all the JIT/CRuby efforts Shopify is doing; they can just use this Sorbet compiler I guess.
I left Shopify late last year, so I don't know what the current internal stance is, nor do I speak for the company in any capacity.
But before I left, the strategy was to begin making inroads into typedness by attacking the most important places first (which in our case was often the inter-component abstraction boundaries). Teams were encouraged to add Sorbet annotations to their intra-component code where appropriate but it was not a blanket requirement.
Regarding JIT/CRuby/Sorbet compiler -- I don't think it's either/or. A company of Shopify's size can afford to fund work on multiple projects with overlapping goals.
There is an interpreter VM in progress; check asterite's progress in the asterite:repl branch. Very cool addition. It was announced at the virtual conference last month. Check that video, too.