> Hmm, the strong reason could be latency and layout stability. Tree-sitter parses on the main thread (or a close worker) typically in sub-ms timeframes
One of the designers/architects of 'Roslyn' here, the semantic analysis engine that powers the C#/VB compilers, VS IDE experiences, and our LSP server.
Note: for Roslyn, we aim for microsecond (not millisecond) parsing. Even for very large files, even if the initial parse is milliseconds, we have an incremental parser design (https://github.com/dotnet/roslyn/blob/main/docs/compilers/De...) that makes 99.99+% of edits happen in microseconds, while reusing 99.99+% of syntax nodes, and while producing an independent, immutable tree (thus ensuring no threading concerns when sharing these trees out to concurrent consumers).
> you introduce a flash of unstyled content or color-shifting artifacts every time you type, because the round-trip to the server (even a local one) and the subsequent re-tokenization takes longer than the frame budget.
This would indicate a serious problem somewhere.
It's also no different from any modern UI stack. A modern UI stack would never want external code coming in that could block it. So all potentially unbounded processing work happens off the UI thread, ensuring that the UI thread is always responsive.
Note that "because the round-trip to the server (even a local one)" is no different from round-tripping to a processing thread. Indeed, in Visual Studio that is how it works as we have no need to run our server in a separate process space. Instead, the LSP server itself for roslyn simply runs in-process in VS as a normal library. No different than any other component that might have previously been doing this work.
> Relying on LSP for the base layer makes the editor feel sluggish.
It really should not. Note: this does take some amount of smart work. For example, in Roslyn's classification system we have a cascading set of classifying threads: one that classifies lexically, one for syntax, one for semantics, and finally one for embedded languages (imagine embedded regex/JSON, or even C# nested in C#). And, of course, these embedded languages have cascading classification as well :D
Note that this concept is used in other places in LSP as well. For example, our diagnostics server computes compiler-syntax diagnostics, compiler-semantics diagnostics, and 3rd-party analyzer diagnostics separately.
This approach has several benefits. First, we can scale up with the capabilities of the machine. So if there are free cores, we can put them to work computing less relevant data concurrently. Second, as the results of some operation are computed, they can be displayed to the user without having to wait for the rest to finish. Being fine-grained means the UI can appear crisp and responsive, while potentially slower operations take longer but eventually appear.
For example, compiler syntax diagnostics generally take microseconds, while 3rd-party analyzer diagnostics might take seconds. No point in stalling the former while waiting for the latter to run. LSP makes multiplexing this stuff easy.
> For roslyn, we aim for microsecond (not millisecond) parsing. Even for very large files, even if the initial parse is milliseconds, we have an incremental parser design [] that makes 99.99+% of edits happen in microseconds
I'm curious how you can make such statements involving absolute time values, without specifying what the minimum hardware requirements are.
I often write code on a 10-year-old Celeron, and I've opted for tree-sitter on the assumption that a language server would show unbearable latency, but I might have been wrong all this time. Do you claim your engine would give me sub-ms feedback on such hardware?
> I'm curious how you can make such statements involving absolute time values, without specifying what the minimum hardware requirements are.
That's a very fair point. In this case, I'm using the minimum requirements for Visual Studio.
> Do you claim your engine would give me sub-ms feedback on such hardware?
I would expect yes, for nearly all edits. See the links I've provided in this discussion to our incremental parsing architecture.
Briefly, you can expect an edit to only cause a small handful of allocations. And the parser will be able to reuse almost the entirety of the other tree, skipping over vast swaths of it (before and after the edit) trivially.
Say you have 100 types, each with 100 members, each with 100 statements. An edit to a statement will trivially blow through 99 of the types, reusing them. Then, in the type surrounding the edited statement, it will reuse 99 members. Then, in the edited member, it will reuse 99 statements and just reparse the one affected one.
So basically it's just the computer walking 297 nodes (absolutely cheap on any machine), and reparsing a statement (also cheap).
So this should still be microseconds.
--
Now, that relates to parsing. But you did ask: "would give me sub-ms feedback on such hardware?"
So it depends on what you mean by feedback. I don't make any claims here about layers higher up and how they operate. But I can make real, measured claims about incremental parsing performance.
C# Language Designer here, and one of the designers/architects of 'Roslyn', the semantic analysis engine that powers the C#/VB compilers, VS IDE experiences, and our LSP server.
The original post conflates some concepts worth separating. LSP and language servers operate at an IDE/Editor feature level, whereas tree-sitter is a particular technological choice for parsing text and producing a syntax tree. They serve different purposes but can work together.
What does a language server actually do? LSP defines features like code completion, hover info, go-to-definition, find-all-references, rename, and diagnostics.
A language server for language X could use tree-sitter internally to implement these features. But it can use whatever technologies it wants. LSP is protocol-level; tree-sitter is an implementation detail.
The article talks about tree-sitter avoiding the problem of "maintaining two parsers" (one for the compiler, one for the editor). This misunderstands how production compiler/IDE systems actually work. In Roslyn, we don't have two parsers. We have one parser that powers both the compiler and the IDE. Same code, same behavior, same error recovery. This works better, not worse. You want your IDE to understand code exactly the way the compiler does, not approximately.
The article highlights tree-sitter being "error-tolerant" and "incremental" as key advantages. These are real concerns. If you're starting from scratch with no existing language infrastructure, tree-sitter's error tolerance is valuable. But this isn't unique to tree-sitter. Production compiler parsers are already extremely error-tolerant because they have to be. People are typing invalid code 99% of the time in an editor.
Roslyn was designed from day one for IDE scenarios. We do incremental parsing (https://github.com/dotnet/roslyn/blob/main/docs/compilers/De...), but more importantly, we do incremental semantic analysis. When you change a file, we recompute semantic information for just the parts that changed, not the entire project. Tree-sitter gives you incremental parsing. That's good. But if you want rich IDE features, you need incremental semantics too.
The article suggests language servers are inherently "heavy" while tree-sitter is "lightweight." This isn't quite right. An LSP server is as heavy or light as you make it. If all you need is parsing and there's no existing language library, fine, use tree-sitter and build a minimal LSP server on top. But if you want to do more, LSP is designed for that. The protocol supports everything from basic syntax highlighting to complex refactorings.
Now, as to syntax highlighting. Despite the name, it isn't just syntactic in modern IDEs. In C#, we call this "classification," and it's powered by the full semantic model. A reference to a symbol is classified by what that symbol is: local, parameter, field, property, class, struct, type parameter, method, etc. Symbol attributes affect presentation. Static members are italicized, unused variables are faded, overwritten values are underlined. We classify based on runtime behavior: `async` methods, `const` fields, extension methods.
This requires deep semantic understanding. Binding symbols, resolving types, understanding scope and lifetime. Tree-sitter gives you a parse tree. That's it. It's excellent at what it does, but it's fundamentally a syntactic tool.
Example: in C#, `var x = GetValue();` is syntactically ambiguous. Is `var` a keyword or a type name? Only semantic analysis can tell you definitively. Tree-sitter would have to guess or mark it generically.
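A minimal illustration of that ambiguity (hypothetical names; `var` is only a contextual keyword, so a user-defined type named `var` takes precedence):

```csharp
// If a type named 'var' is in scope, 'var x = ...' refers to that type
// rather than triggering implicit typing. Only binding resolves which.
// (Modern compilers warn on lowercase type names, but this still compiles.)
class var
{
    public int Value;
}

class Demo
{
    static var GetValue() => new var { Value = 42 };

    static void M()
    {
        var x = GetValue(); // 'var' here binds to the class above
    }
}
```

A purely syntactic tool sees the same token stream either way; only semantic binding can distinguish the two readings.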
Tree-sitter is definitely a great technology though. Want to add basic syntax highlighting for a new language to your editor? Tree-sitter makes this trivial. Need structural editing or code folding? Perfect use case. However, for rich IDE experiences, the kind where clicking on a variable highlights all its uses, or where hovering shows documentation, or where renaming a method updates all call sites across your codebase, you need semantic analysis. That's a fundamentally different problem than parsing.
Tree-sitter definitely lowers the barrier to supporting new languages in editors. But it's not a replacement for language servers or semantic analysis engines. They're complementary technologies. For languages with mature compilers and semantic engines (C#, TypeScript, Rust, etc.), using the real compiler infrastructure for IDE features makes sense. For cases with simpler tooling needs, tree-sitter is an excellent foundation to build on.
I wrote several of TypeScript's initial compilers. We didn't use red/green for a few reasons:
• The JS engines of the time were not efficient with that design. This was primarily from testing V8 and Chakra (IE/Edge's prior engine).
• Red/green takes advantage of many things .net provides to be extremely efficient. For example structs. These are absent in js, making things much more costly. See the document on red-green trees I wrote here for more detail: https://github.com/dotnet/roslyn/blob/main/docs/compilers/De...
• The problem domains are a bit different. In Roslyn the design is a highly concurrent, multi-threaded feature set that wants to share immutable data. TS/JS, being single threaded, doesn't have the same concerns, so there is less need to efficiently create an immutable data structure. Having it be mutable meant working well with the engines of the time, without sacrificing too much.
• The TS parser is incremental, and operates very similarly to what I describe for Roslyn in https://github.com/dotnet/roslyn/blob/main/docs/compilers/De.... However, because it operates on the equivalent of a red tree, it does need to do extra work to update positions and parent pointers.
TL;DR: different engine performance and different consumption patterns pushed us to a different model.
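For readers who haven't seen the red/green model, it can be sketched roughly like this (a heavily simplified illustration, not the actual Roslyn types):

```csharp
// Green nodes are immutable and position-free, so an incremental reparse
// can reuse them wholesale across tree versions. Red nodes are thin
// wrappers materialized on demand, adding the absolute position and the
// parent pointer that green nodes deliberately never store.
abstract class GreenNode
{
    public int Width;                 // width of the text spanned, not an offset
    public GreenNode[] Children = []; // shared, immutable structure
}

sealed class RedNode
{
    public readonly GreenNode Green;  // the shared payload
    public readonly RedNode? Parent;  // only exists in the red layer
    public readonly int Position;     // absolute offset, computed top-down

    public RedNode(GreenNode green, RedNode? parent, int position)
    {
        Green = green;
        Parent = parent;
        Position = position;
    }
}
```

Because positions live only in the red layer, an edit can shift every absolute offset without touching the shared green structure at all.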
I ask because I picked up where the TS and Roslyn teams left off. I actually brought red/green trees into JS.
My finding is that the historical reasons against this no longer seem to apply today. With a monomorphic code style, JS gets close enough to structs. Multi-threading is now essential for perf.
I don't even think multithreading is the strongest argument for immutability, because it's not only parallelization that immutability unlocks but also safe concurrency, and/or the ability to give trusted data to an untrusted plugin without risking corruption
Note: being more concise is not really the goal of the `?` features. The goal is actually to be more correct and clear. A core problem these features help avoid is the unfortunate situation people need to be in with null checks where they either do:
if (some_expr != null)
some_expr...
Or the more correct, but much more unwieldy:
var temp = some_expr;
if (temp != null)
temp...
`?` allows the collapsing of all the concepts together. The computation is only performed once, and the check and subsequent operation on it only happens when it is non-null.
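Putting the three forms side by side (hypothetical names):

```csharp
// Re-evaluates the expression twice; the second evaluation can observe
// a different (possibly null) value than the one that was checked.
if (GetWidget() != null)
    GetWidget().Refresh();

// Correct, but unwieldy: evaluate once into a temp, then check.
var temp = GetWidget();
if (temp != null)
    temp.Refresh();

// Collapses the pattern: one evaluation, one check, then the call.
GetWidget()?.Refresh();
```

The `?.` form has the semantics of the temp version with the surface weight of the first.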
Note that this is not a speculative concern. Codebases have shipped with real bugs because people opted for the former form versus the latter.
Our goal is to make it so that the most correct form feels the nicest to write and maintain. Some languages opt for making the user continuously write out verbose patterns over and over to accomplish this, but we actually view that as a negative (you are welcome to disagree, of course). We think forcing users into unwieldy patterns everywhere ends up increasing the noise of the program and decreasing the signal. Having common patterns fade away while becoming more correct (and often more performant) is what we see as a primary purpose of the language in the first place.
As a really long term C# engineer, I feel quite strongly that C# has become a harder and harder language over time, with a massive over abundance of different ways of doing the same thing, tons of new syntactic sugar, so 5 different devs can write five different ways of doing the same thing, even if it's a really simple thing!
At this point, even though I've been doing .net since version 2, I get confused with what null checks I should be doing and what is the new "right" and best syntax. It's kind of becoming a huge fucking mess, in my opinion anyway.
The issues with null-checks are easily avoided though: Just don’t declare values as nullable.
C# grows because they add improvements but cannot remove older ways of doing things due to backwards compatibility. If you want a language without so much cruft, I recommend F#.
I'd love to see some good examples of those bugs you referred to, in order to get some more context.
Is the intent of the second form to evaluate only once, and cache that answer to avoid re-evaluating some_expr?
When some_expr is a simple variable, I didn't think there was any difference between the two forms, and always thought the first form was canonical. It's what I've seen in codebases forever, going all the way back to C, and it's always been very clear.
When some_expr is more complex, i.e. difficult to compute or mutable in my timeframe of interest, I'm naturally inclined to the second form. I've personally found that case less common (eg. how exactly are you using nulls such that you have to bury them so deep down, and is it possible you're over-using nullable types?).
I appreciate what you're saying about nudging developers to the most correct pattern and letting the noise fade away. I always felt C# struck a good balance with that, although as the language evolved it feels like there's been a growing risk of "too many different right ways" to do things.
Btw while you're here, I understand why prefix increment/decrement could get complicated and why it isn't supported, but being forced to do car.Wheel?.Skids += 1 instead of car.Wheel?.Skids++ also feels odd.
Think of it this way. We already supported these semantics in existing syntax through things like invocations (which are freely allowed to mutate/write). So `x?.SetSomething(e1)`. We want properties to feel and behave similarly to methods (after all, they're just methods under the covers), but these sorts of deviations end up making that not the case.
In this situation, we felt like we were actually reducing concept count by removing yet another way that properties don't compose as well with other language features as something like invocation calls do.
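A sketch of the consistency being described (hypothetical names; the null-conditional assignment form is the newer feature under discussion):

```csharp
// A method call after ?. has always been allowed, and skips evaluating
// its arguments entirely when the receiver is null.
customer?.SetName(ComputeName());

// Null-conditional assignment makes the property form behave the same
// way: the right-hand side is not evaluated when 'customer' is null.
customer?.Name = ComputeName();
```

With both forms allowed, properties no longer carry a special carve-out that methods lack.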
Note: when we make these features, we do examine the ecosystem to see how useful the feature would be. We also communicate continuously with our community to see just how desirable such a feature is. This goes beyond just those who participate on the open-source design site; it also includes tons of private partners, as well as tens of thousands of developers participating at our conferences and other events.
This feature had been a continued thorn for many, and we received continuous feedback in the decade since `?.` was introduced about this. We are very cautious on adding features. But in this case, given the continued feedback, positive reception from huge swaths of the ecosystem, minimal costs, lowered complexity, and increased consistency in the language, this felt like a very reasonable change to make.
The space there is large and complex, and we have a large amount of resources devoted to it. There was no way that `a?.b = c` was going to change if/when unions come to the language.
For unions, nothing has actually been delayed. We continue working hard on it, and we'll release it when we think it's suitable and ready for the future of the lang.
The only feature that actually did get delayed was 'dictionary expressions' (one that I'm working on). But that will hopefully be in C# 15 to fill out the collection-expression space.
Thank you for working on it, I hope we will see it in a release soon.
By delayed I mean that the committee has been discussing discriminated unions for a long time, and it was never "the right time". You can see the discussions related to implementing discriminated unions on GitHub.
They also often need a lot of scaffolding to be built along the way. We like breaking hard problems into much smaller, composable, units that we can build into the language and then compose to a final full solution. We've been doing that for many years, with unions being a major goal we've been leading to. At this point, we think we have the right pieces in place to naturally add this in a way that feels right to the C# ecosystem.
That's been part and parcel for C# for over 10 years at this point. When we added `?.` originally, it was its nature that it would not execute code that was now unnecessary due to the receiver being null. For example:
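Something along these lines (hypothetical names):

```csharp
// If 'Settings' is null, nothing to the right of ?. runs: neither
// Save nor its ComputeBackupPath() argument is evaluated.
app.Settings?.Save(ComputeBackupPath());
```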
This would already not run anything on the RHS of the `?.` if `Settings` was null.
So this feature behaves consistently with how the language has always treated this space. Except now it doesn't have an artificial limitation on which 'expression' level it stops at.
Sure. But that was the case prior to this feature as well. Previously you'd have to store the variable somewhere and null check it. Those could all lead to different outcomes. This just encapsulates that pattern safely in a regular manner. :)
> Cute, but is this actually needed? It's one more thing to remember, one more thing to know the subtleties of, and for what?
Hi there! C# language designer here :-)
In this case, it's more that this feature made the language more uniform. We've had `?.` for more than 10 years now, and it worked properly for most expressions except assignment.
During that time we got a large amount of feedback from users asking for this, and we commonly ran into it ourselves. At a language and implementation level, these were both very easy to add, so this was a low-cost QoL feature that just made things nicer and more consistent.
> It feels like the C# designers have a hard time saying "no" to ideas coming their way.
We say no to more than 99% of requests.
> We're trading brevity for complexity
There's no new keyword here. And this makes usage and processing of `?.` more uniform and consistent. Imo, that is a good thing. You have less complexity that way.
Thank you for all the hard work on C#! I’ve been loving the past 5 years of developments and don’t agree with the parent comment here.
p.s. I will take the opportunity to say that I dream of the day when C# gets bounded sum types with compiler enforced exhaustive pattern matching. It feels like we are soooo close with records and switch expression, but just missing one or two pieces to make it work.
Thanks @klysm! I think we're getting close to that. `unions` are a big part of this discussion, and we're working very hard to hopefully get them in soon :)
I should have been clearer in my message. This specific feature is nice, and the semantics are straightforward. My message is more about some of the myriad other language features with questionable benefits. There's simply more and more "stuff" in the language and a growing number of ways to write the same logic, often with subtle semantic differences between each variant. There are just too many different, often overlapping, concepts. The number of keywords is a symptom of that.
Just a minor correction (as I'm the author of C#'s raw string literal feature).
The indentation of the final ` """` line is what is removed from all other lines. Not the indentation of the first line. This allows the first line to be indented as well.
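A quick sketch of the rule (the string content here is purely illustrative):

```csharp
var json = """
        {
          "name": "demo"
        }
        """;
// The closing """ is indented eight spaces, so eight spaces of leading
// whitespace are stripped from every content line, leaving:
//
// {
//   "name": "demo"
// }
```

This lets the literal sit at whatever indentation matches the surrounding code, without that indentation leaking into the string.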
Cheers, and I'm glad you like it. I thought we did a really good job with that feature :-)
Really not trying to go into any of the "holy wars" here, but could you please compare C#'s feature to Java's multi-line strings? I'm only familiar with the latter, and I would like to know if they are similar in concept or not.
Hi there! I'm a developer on the .NET team, and I work heavily on our IDE offerings. You can absolutely write a small program that is only a CLI, and our tooling is heavily tailored to make that a great experience. First off, you don't need to use an IDE for this at all (if you don't want to). You can just do `dotnet new ...` from the command line to spit out what is needed to do CLI development.
If you do want to use an IDE, there are many choices available. First-party options include Visual Studio itself (which has varying SKUs depending on what you're interested in). For just CLI development, the Community SKU would work great. Then there is VS Code, which has both the open-source "C# Extension" (also built by us) and the closed-source add-on "DevKit", which enhances that further with more features.
Regardless of which environment you use, writing a CLI is extremely simple, and the language and environment cater to it. A simple 'Hello World' for C# literally is just:
Console.WriteLine("Hello World!");
And you can grow on that as you want to flesh out whatever your CLI needs to do. If you're interested in doing anything web/server related, then ASP.NET Core also fits into this very simply, allowing you to stand up a web server from your CLI app trivially.
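For instance, a complete ASP.NET Core server is only a few lines in the minimal-API style (this sketch assumes a project created from the `web` template, e.g. `dotnet new web`):

```csharp
// Build the app host, map a single route, and start listening.
var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

app.MapGet("/", () => "Hello World!");

app.Run();
```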
We definitely want .NET, including the language, runtime, and tooling, to scale all the way from these sorts of experiences to the "enterprisey" space, in a clean and consistent fashion.
If you do run into issues with any of the above, definitely let us know. You can see all the work we do on .NET over at github.com/dotnet/..., including what's being worked on now and what we're continuing to invest in for future releases.
You may not know this, but ESRI is more or less the Microsoft of GIS (Geographic Information Systems). While they largely settled on Python, some of the newer "add-ins" for ArcGIS Pro rely on .NET, instead. As such, I thought it would be a good idea to begin looking into .NET, C#, and the like to see if I could develop some add-ins myself.
I'm one of those solo "dark matter" developers who ends up writing middleware, custom ETLs, and such that almost nobody will ever see and I had begun to despair of finding tooling for "the little guy." I will look into the SKUs you mentioned, it gives me some hope.
You can see the entire history of the proposal there. To answer your specific question, we went with `..` because that's what the language already uses for the complementary 'pattern matching deconstruction' form for collection patterns.
In other words, you can already say this today:
if (x is [var start, .. var middle, var end]) { ... }
So the construction complement to that is:
M([start, .. middle, end])
We very much want 'construction/deconstruction' to have this sort of parity, and we will be continuing to follow that principle with new features we are continuing to invest in.
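Put together, the two forms mirror each other (illustrative values; C# 12 collection expressions):

```csharp
int[] items = [1, 2, 3, 4, 5];

// Deconstruction: a list pattern slices out the middle.
if (items is [var start, .. var middle, var end])
{
    // Construction: the same shape rebuilds an equivalent collection.
    int[] rebuilt = [start, .. middle, end];
}
```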
--
Now, if your next question is "why was `..` picked for collection deconstruction?", the answer is "because we considered it the nicest of the syntaxes we looked at". Syntax is often very subjective, and we often come up with numerous forms to consider (along with examining other languages to see what they've done). In this case, we simply preferred `..` over `...`. The extra dot didn't add anything for us, and we also felt `...` might be an operator we'd want to use in the future for other language features.
--
Finally, if you're interested in these sorts of questions/designs, definitely participate on github.com/dotnet/csharplang. We do all our design in the open over there, and are always interested in community perspectives on these sorts of things.
A general offtopic comment, but something that confused me when I was looking at the github issues and docs for the spread operator.
I've noticed that `...` is frequently used to denote that there's some code that's omitted. However with the spread operator being added (and being `...` in other languages) and "we also felt like it might be an operator we might want to use in the future for other language features" it's a bit confusing at a glance whether that's new syntax or just a shorthand. And it makes searching difficult.
I'm not sure what to replace it with, possibly a comment would be clearer: `/* code here */` or at least `/* ... */`.
P.S. Collection expressions are an awesome new addition!