
The author of that post has it _so easy_! Formatting well-formed programs, in a young language with a short spec. He just starts with an AST!

No no my friend. The challenge is when the program is _not_ well-formed, i.e. has errors, or worse yet - refers to code elsewhere, so you can't even decide if it's well-formed or not. So no AST for you. You have to go into the dark and dangerous world of speculative and partial parsing of language grammars, representing ambiguity, replacing undefined, ambiguous or erroneously-defined semantics with deductions from the author's existing formatting and so on. That's where the real challenges lie.

And - it's not gonna be 3,853 lines, I can tell you this much.



> with a short spec.

Oh, how I wish. Dart is no C/C++ with preprocessor insanity, but it has quite a rich grammar.

> The challenge is when the program is _not_ well-formed, i.e. has errors

Being able to format erroneous code has come up as a feature request a number of times. It would be useful because then you can format even in the middle of entering code into your IDE. We do have a very robust parser that can handle incorrect, incomplete code and give you a partial AST.

One of the reasons I've hesitated so far is that testing that in any sort of rigorous way seems really difficult to me. Not to mention even defining what "correct" formatting is with code like that.

> refers to code elsewhere, so you can't even decide if it's well-formed or not.

Thankfully most languages are context free, so that's rarely a problem outside of C/C++. :)

Dart does have a couple of semi-contextual syntax corners that could potentially affect formatting. An expression like:

    foo.bar();

Could potentially be:

* Call the method "bar" on the local variable "foo".

* Call the top-level function "bar" imported with prefix "foo".

* Call the named constructor "bar" on the class "foo".

Because dartfmt does not do symbol or import resolution (for many reasons), it can't distinguish those cases. In practice, all that means is that we don't have the ability to have different formatting rules for those different cases and they are all formatted uniformly. So far, that seems to be an acceptable limitation (though it would be nice if we could treat import prefixes as stickier).
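
As a rough analogy in Python (not Dart, and not a description of how dartfmt works internally), the same thing happens there: `foo.bar()` parses to an identical tree whether `foo` is a local object, an import alias, or a class. A minimal sketch using the standard `ast` module; the three snippets and the helper `call_shape` are made up for illustration:

```python
import ast

# Three hypothetical programs in which `foo` means something different.
# Nothing is executed; the sources are only parsed.
SNIPPETS = {
    "method on a local": "foo = make_widget()\nfoo.bar()",
    "function via import prefix": "import some_lib as foo\nfoo.bar()",
    "alternate constructor on a class": (
        "class foo:\n"
        "    @classmethod\n"
        "    def bar(cls):\n"
        "        return cls()\n"
        "foo.bar()"
    ),
}


def call_shape(source: str) -> str:
    """Return a dump of the AST for the last statement in `source`."""
    tree = ast.parse(source)
    return ast.dump(tree.body[-1])


shapes = {label: call_shape(src) for label, src in SNIPPETS.items()}
for label, shape in shapes.items():
    print(f"{label}: {shape}")

# Without resolving names, all three calls look the same to a formatter.
print("identical:", len(set(shapes.values())) == 1)
```

Telling the cases apart would take name resolution on top of parsing, which is exactly the step a syntax-only formatter skips.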


I actually really like a "format on save; don't format without AST" combination.

It's really nice knowing as soon as you hit ctrl+s whether you've just written in a syntax error or not.
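
A minimal sketch of that combination in Python, assuming a hypothetical `format_on_save` hook and using `ast.unparse` (Python 3.9+) purely as a stand-in for a real formatter, since it discards comments and the original layout:

```python
import ast


def format_on_save(source: str) -> tuple[str, bool]:
    """Return (text_to_write, parsed_ok).

    If the source does not parse, it is handed back untouched, so the
    missing reformat is itself the signal that there is a syntax error.
    """
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return source, False
    # Placeholder "formatter": regenerate the source from the AST.
    return ast.unparse(tree) + "\n", True


for text in ("x=1+2\n", "x = (1 +\n"):
    formatted, ok = format_on_save(text)
    print(repr(text), "->", repr(formatted), "| parsed:", ok)
```

The only design point here is the early return: no AST, no formatting, and the buffer is never made worse.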


To see this concept taken to its logical conclusion, watch Casey Muratori write C code with the editor configured to reformat on every keystroke: https://www.youtube.com/watch?v=S3JutszP9fg&t=11m37s

I'm not sure I'd ever go this far, but that might just be years of conditioning by editors that format code much more slowly than the fork of 4coder that he's using in the video.


The idea of autoformatted code editing is intriguing.

I've been exploring that (plus instant preview/eval, but that's another story) in a home-made language with a basic editor. For smaller files/modules it's totally feasible to do on every key press, fast enough to be fairly seamless.

It took a bit of getting used to, but there's something catchy about the experience, like molding clay - if the clay was made of a "smart material" that re-formed itself to an optimal shape.

The immediacy of working with self-reshaping code (and live reload too) reduces mental friction, so I can forget about formatting or compilation altogether. I hope more languages make it a part of the developer experience, IDE, etc.


I feel like it's a subset of a much larger, more general topic of how latency impacts computer interactions. People have long tried to emulate real-world activities in computer programs, but not much attention has been paid to making those programs instantaneous.

When you mold clay, there is no lag - the tactile and visual feedback is immediate. Same with handwriting, painting, or petting a dog. On the other hand, even in this current world of multi-core 4GHz+ computers most software fails to provide immediate feedback.

100ms, I believe, is the magic number: the maximum delay between cause and effect that the human brain still registers as "instantaneous". Yet, surprisingly, it's still a rarely reached target. FPS video game studios might be the only major industry that consistently cares about and delivers on this metric. (Edit: "major" was an intentionally vague word to use here. Obviously, all kinds of embedded software projects have to deal with real-time or soft-real-time requirements.)

If reformatting the whole current code file took 1ms, why wouldn't you enable it? If compilation took 50ms, why not recompile on each new line? There's a magic latency barrier below which actions feel "free", so why not try and move as many of them as possible below that threshold?


> When you mold clay, there is no lag

Wonderful phrasing.

On the importance of immediate feedback in the creative process, it reminds me of audio/music production - in particular, MIDI instruments. The latency between keypress and sound reaching the ear should be as close to zero as possible.

Recently I read a discussion about remote collaborative music performance. From what I understood, network latency is not nearly low enough to achieve it. There's also the physical limit of the speed of light.

10ms was mentioned as acceptable for musicians - but then someone said, even when musicians are in the same room, there can already be too much latency if it's a big room and they're distant from each other, making it difficult to play together.

> why not recompile on each new line?

This is becoming the standard in web development, with incremental compilation (and "hot reload" on the client) of only the changed code/module. I saw that it's getting applied in building mobile apps as well, to get closer to the ideal of instant feedback.

In thinking of the ideal developer experience, I often come back to Bret Victor's work, Learnable Programming. http://worrydream.com/#!/LearnableProgramming


> 10ms was mentioned as acceptable for musicians - but then someone said, even when musicians are in the same room, there can already be too much latency if it's a big room and they're distant from each other, making it difficult to play together.

Sound only travels ~3.4 meters in 10 ms, so it makes sense that musicians spread across a large venue easily exceed that budget.
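
The arithmetic, as a quick Python sketch. The 343 m/s figure is an assumption (roughly the speed of sound in dry air at about 20 °C), and the function names are made up for illustration:

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed: dry air at about 20 °C


def distance_for_delay_m(delay_ms: float) -> float:
    """Distance sound covers in `delay_ms` milliseconds."""
    return SPEED_OF_SOUND_M_PER_S * delay_ms / 1000.0


def acoustic_delay_ms(distance_m: float) -> float:
    """One-way delay, in milliseconds, for sound to cross `distance_m`."""
    return distance_m / SPEED_OF_SOUND_M_PER_S * 1000.0


print(f"10 ms of delay covers {distance_for_delay_m(10):.2f} m")
for metres in (3.43, 10.0, 30.0):
    print(f"{metres:5.2f} m apart -> {acoustic_delay_ms(metres):5.1f} ms")
```

So two players only about 10 m apart are already near three times the 10 ms figure before any network is involved.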


This is how I have Emacs set up, but when I use VS Code I really appreciate that syntax errors get the red squiggly underlines.


I know that I’ve missed a delimiter when clang-format arbitrarily adds a level of indentation to the latter half of the file.


Yes, it's like people try to make more science of it than it really is...


When does one need to format syntactically invalid code? What would that even mean?


All the time:

1. While you are in the middle of writing your file the first time.

2. When your IDE/editor doesn't see all the includes/imports

3. In languages where syntactic validity depends on semantics, beyond merely known and unknown identifiers.

4. When you make a syntax error. Which is all the time, basically, until you've gotten your program or library to compile.

5. When you choose to format just a small segment of code, not the whole file.


The idea is that your code could be formatted for you as you type instead of having to wait until you reach a point where your code has no errors and hitting save.


IDEs like IntelliJ do this all the time. For instance, formatting a block of code when it's pasted into a file with syntax errors elsewhere.


If your language can't even be formatted properly if it refers to code elsewhere, I have to wonder what the fuck compilers do: Can't parse the file until we've loaded some unknown number of other files, the names of which we can't determine until we parse the file, which we can't do until we've loaded an unknown number of... C++ by way of MC Escher, sounds like.

Common Lisp has big, meaty, possibly-undecidable macros which compilers need to expand, but Common Lisp formatters are piss-easy because the basic grammar doesn't change unless you go so wild you invent Dylan or something.

So, yes: In the general case of Really Advanced Programming Languages, the amount of information you need to format is the same as the amount of information you need to compile, but that's fine, because it means you can flip it around and see that if you can't format cleanly you're never going to compile anyway, so what's the point of formatting code you can't even use?


> If your language can't even be formatted properly if it refers to code elsewhere, I have to wonder what the fuck compilers do

They are split into passes. With C macros it's totally fine to have a file that says

    #define LBRACE {

And then in a different file the compiler necessarily has to access the first file in order to find the opening brace. It's not even invalid for an include file to contain unbalanced braces with the expectation that whatever code includes it fixes that.


> If your language can't even be formatted properly if it refers to code elsewhere,

Let me introduce you to a certain obscure language named C++ ...



