sdf45's comments

sdf45 · on Sept 18, 2015

While the advice in the article is funny, on a deeper level the author seems to argue that program text is a rather basic representation that often obstructs insight into the semantics of the program.

The quest for other program representations is still open. The state of the art uses text as "storage" format plus an IDE that does some semantic analysis to help the developer navigate. Many of the ideas mentioned in the article (navigation, coloring, auto-format) have been integrated into IDEs and editors.

With almost 20 years of research between the article and now, what semantic techniques are you dreaming of in your development environment? What are you missing? What feature would greatly improve your productivity (but is possibly too costly to implement by yourself)?

sdf45 · on Aug 18, 2015

I don't know much about Lua and LuaJIT, so let me ask a naive question:

If you would start out with a JavaScript JIT (like V8) what things would you have to add (i.e. things that are not required to JIT JavaScript) besides the obvious modifications in the parser?

One point I can think of is support for efficient compilation of co-routines.

haberman · on Aug 18, 2015

> If you would start out with a JavaScript JIT (like V8) what things would you have to add (i.e. things that are not required to JIT JavaScript) besides the obvious modifications in the parser?

This is such a tempting thought -- that JavaScript and Lua are similar enough languages that an engine for one could be retargeted to the other with some parser changes and a few new features.

In practice it doesn't work out that way. Here is the story of Tessel, who originally set out to do it the other way around (implement JavaScript on LuaJIT) but reversed course after two years: https://tessel.io/blog/112888410737/moving-faster-with-iojs

"We believed that over time we could cover all of the corner cases of JavaScript with our own runtime, but perhaps we should have taken a cue from the “Wat talk” that it was going to be an uphill battle. While the semantics between JavaScript and Lua are very similar, they are also just slightly different in many ways (For example, the comparisons between null vs undefined vs falsy), and capturing all of those idiosyncrasies has proved frustrating for both the users and developers of the Runtime. [...] I still do believe it’s possible to make a nearly compatible runtime, but it’s going to take much more work than we expected and that resource investment would be an unwise business decision."

yoklov · on Aug 18, 2015

Lua has goto (and JS does not), which means not all cfg's are well structured. That could impact the compiler in certain ways.

Lua also has finalizers, which affect the GC's design. The Lua/C api also makes it impossible for the GC to move objects, which means pretty much every current JS VM's GC is out.

Lua 5.3 has 64 bit integers, which would affect most JS VM's in a significant way (but to my knowledge LuaJIT doesn't support 5.3 so...)

There are other issues too, but these are just off the top of my head.

mraleph · on Aug 18, 2015

> Lua also has finalizers, which effect the GC's design.

V8 has a weak callback mechanism, which (while not exposed to JS) allows reentering JS from inside a weak callback - which means you can emulate Lua's __gc on top of this mechanism.

> The Lua/C api also makes it impossible for the GC to move objects, which means pretty much every current JS VM's GC is out.

If we disregard lua_topointer then the Lua/C API only leaks internal pointers for strings (lua_tostring), userdata (lua_newuserdata, lua_touserdata) and threads (lua_newthread, lua_tothread) - everything else is manipulated using lua_State's stack.

This means the VM only has to take care with these objects. Userdata and threads can simply be allocated outside the movable part of the heap, and strings can be "externalized" (i.e. their payload relocated into the immovable space) on first access via lua_tostring. Coincidentally, the last thing is something that V8 supports[1] (though of course externalization is not a cheap operation, as it requires copying).

> Lua 5.3 has 64 bit integers, which would effect most JS VM's in a significant way

Yeah, that's certainly a whole ton of work, but most of this work would be pretty technical.

JS engines might actually get int64/uint64 value types in the future (at some point there was an ES7 proposal - but currently it does not seem to be on track for inclusion).

[1] https://github.com/v8/v8-git-mirror/blob/master/include/v8.h...

piotrjurkiewicz · on Aug 18, 2015

LuaJIT does not support 64 bit integers. It uses double NaN tagging for storing object references. That's the basic principle of LuaJIT's design and one of the most important sources of its superior performance. In this matter, it is very similar to JS VMs.

Supporting 64 bit integers would require abandoning this model and completely redesigning the LuaJIT VM. Mike's opinion about that was very negative.

I believe that this was one of the reasons he decided to abandon the project: he was disappointed by the Lua creators' decision to introduce 64 bit ints and by the fact that LuaJIT can't be made Lua 5.3 compatible without rebuilding it from scratch (but that's only my personal impression).
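To make the NaN tagging point concrete, here is a minimal sketch of the general idea. The tag constant, the 48-bit payload width, and all names (`Value`, `Unpacked`, `from_ref`) are assumptions chosen for illustration and are not LuaJIT's actual encoding. Every value is one 64-bit word: ordinary doubles are stored as-is, and object references hide inside bit patterns that would otherwise be NaNs.

```rust
// Hypothetical NaN-boxing layout for illustration only; not LuaJIT's real encoding.
const TAG_MASK: u64 = 0xFFFF_0000_0000_0000;
const REF_TAG: u64 = 0xFFFC_0000_0000_0000; // top 16 bits of a negative quiet NaN
const PAYLOAD_MASK: u64 = 0x0000_FFFF_FFFF_FFFF; // 48 bits left for the payload
const CANONICAL_NAN: u64 = 0x7FF8_0000_0000_0000;

#[derive(Clone, Copy, Debug, PartialEq)]
struct Value(u64);

#[derive(Debug, PartialEq)]
enum Unpacked {
    Number(f64),
    Ref(u64),
}

impl Value {
    /// Store a double directly. NaNs are canonicalized so that no
    /// arithmetic result can be mistaken for a tagged reference.
    fn from_number(n: f64) -> Value {
        if n.is_nan() {
            Value(CANONICAL_NAN)
        } else {
            Value(n.to_bits())
        }
    }

    /// Store a reference in the 48-bit payload. Anything wider does not fit.
    fn from_ref(addr: u64) -> Option<Value> {
        if addr & !PAYLOAD_MASK != 0 {
            None
        } else {
            Some(Value(REF_TAG | addr))
        }
    }

    /// Decide by the top 16 bits whether the word is a reference or a double.
    fn unpack(self) -> Unpacked {
        if self.0 & TAG_MASK == REF_TAG {
            Unpacked::Ref(self.0 & PAYLOAD_MASK)
        } else {
            Unpacked::Number(f64::from_bits(self.0))
        }
    }
}
```

Under these assumptions the trouble with 64 bit integers is visible directly: a full 64-bit value does not fit into the 48-bit payload, and storing it as a double loses precision above 2^53, so it would need either a heap box or a different value representation.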

striking · on Aug 18, 2015

LuaJIT supports the 5.1 Lua standard, so goto doesn't exist.

Yeah, finalizers and weak key value stores would be an issue in certain versions of JS.

pygy_ · on Aug 18, 2015

LuaJIT supports goto out of the box, as well as many other Lua 5.2 features, some behind a compiler flag.

striking · on Aug 18, 2015

I stand corrected. Thanks!

wolf550e · on Aug 18, 2015

People use LuaJIT instead of v8 because LuaJIT is faster than v8 (on some code), smaller, and more easily embeddable. Or at least that is my impression, I have no personal experience with it.

malkia · on Aug 18, 2015

For me the main reason was the FFI, even if back in the day it did not support re-entrance - e.g. something in the "C" land has to call back "lua" land.

But the JIT in luajit is simply too impressive to skip over. I was able to quickly prototype things with it, running almost at "C" speed, and sometimes even faster.

camperman · on Aug 18, 2015

There's an in-depth discussion of those issues with respect to both LuaJIT and Javascript here:

http://lambda-the-ultimate.org/node/3851

barkingcat · on Aug 18, 2015

I'd think that the javascript JIT would need to be gutted and extensively re-engineered. There's a reason why each language has its own jit - in order to speed up specific parts of the language.

While certain things are similar, I'm sure the specific optimizations follow the spec of the language so closely that they're not readily transferable to other languages.

The general optimization strategies, like tracing, etc are techniques that can be ported, but if you start with a highly optimized jit for javascript, you're gonna have to rewrite large portions - so much that it would be as much work as rewriting luajit from scratch.

sdf45 · on Aug 14, 2015

While tooling may help, I think that a completely new approach is necessary.

The correctness invariants of complex C++ programs (such as browsers and JITs) cannot be 'discovered' by static analysis - they must be, at least in part, supplied by the programmer.

C and C++ were not designed to allow programmers to specify such invariants (and have them automatically checked). I am not convinced that introducing them can be done in a clean way.

flohofwoe · on Aug 14, 2015

Hmm true, we could have a new 'safe' keyword (or even a #pragma) which would switch off 'unsafe' language features (basically the opposite of Rust's 'unsafe': enforce a stricter, more restrictive code style which is easier for the static analyzer to reason about). The majority of even high-performance C/C++ apps only need to twiddle bits in very small areas of the code. That's still a lot better than trying to rewrite basically all software that has been written in the last 50 years ;)

sdf45 · on Aug 14, 2015

Please show me how to implement a graph library without using unsafe in Rust.

jganetsk · on Aug 14, 2015

Several answers to this challenge

1) It's doable http://smallcultfollowing.com/babysteps/blog/2015/04/06/mode...

2) Which attack surface would you rather deal with... the small fraction of your graph library that deals with mutability... or all of Adobe Flash?

3) The fact is that "unsafe" doesn't mean unsafe, it means "trust the programmer that this is safe". It's reasonable to assume that safety can be maintained in Rust libraries that use the unsafe keyword.
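As a small illustration of point 3, here is a sketch of a safe function that contains one `unsafe` block. The function `middle` is a hypothetical example, not taken from any library. Callers cannot trigger undefined behavior through it, because the condition the `unsafe` code relies on is established by the surrounding safe code.

```rust
/// Returns the element at index len / 2, or None for an empty slice.
fn middle(xs: &[i32]) -> Option<i32> {
    if xs.is_empty() {
        return None;
    }
    let i = xs.len() / 2;
    // SAFETY: xs is non-empty, so i = len / 2 is strictly less than len.
    // This is the part the compiler trusts the programmer to get right.
    Some(unsafe { *xs.get_unchecked(i) })
}
```

The audit surface here is the three lines around the `unsafe` block, not every call site.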

cousin_it · on Aug 14, 2015

It seems to me that an undirected graph can be represented as a map from keys to sets of keys (which can be integers, strings, or anything else), with the invariant that map[x] contains y iff map[y] contains x. For a directed graph, use two maps for incoming and outgoing, with the invariant that incoming[x] contains y iff outgoing[y] contains x.

This approach doesn't require unsafe anywhere and all graph operations are easy to implement, including deleting nodes. Am I missing something?
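A minimal sketch of the undirected case described above, assuming `u32` keys and the standard library's `BTreeMap` and `BTreeSet`. The type and method names are hypothetical. Nothing here uses `unsafe`, and deleting a node restores the invariant by removing the node from each neighbor's set.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Undirected graph as a map from node key to the set of neighbor keys.
/// Invariant: adj[x] contains y iff adj[y] contains x.
#[derive(Default)]
struct Graph {
    adj: BTreeMap<u32, BTreeSet<u32>>,
}

impl Graph {
    fn add_node(&mut self, n: u32) {
        self.adj.entry(n).or_default();
    }

    /// Inserts both directions so the invariant holds after every call.
    fn add_edge(&mut self, a: u32, b: u32) {
        self.adj.entry(a).or_default().insert(b);
        self.adj.entry(b).or_default().insert(a);
    }

    /// Removes the node and every back-reference to it.
    fn remove_node(&mut self, n: u32) {
        if let Some(neighbors) = self.adj.remove(&n) {
            for m in neighbors {
                if let Some(set) = self.adj.get_mut(&m) {
                    set.remove(&n);
                }
            }
        }
    }

    /// Neighbors in ascending key order; empty for unknown nodes.
    fn neighbors(&self, n: u32) -> Vec<u32> {
        self.adj
            .get(&n)
            .map(|s| s.iter().copied().collect())
            .unwrap_or_default()
    }

    /// Checks the symmetry invariant over the whole graph.
    fn invariant_holds(&self) -> bool {
        self.adj.iter().all(|(x, ys)| {
            ys.iter()
                .all(|y| self.adj.get(y).map_or(false, |s| s.contains(x)))
        })
    }
}
```

The cost of this design is a map lookup per edge traversal instead of a pointer dereference, which is the usual trade-off people have in mind when they reach for pointers and `unsafe`.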

lifthrasiir · on Aug 14, 2015

This argument is very flawed, since you are forced to use `unsafe` when you need anything communicating to the outside world (which includes, obviously, `println!`). The very point of Rust is to limit the unsafe surface, not to completely eliminate that.

sdf45 · on Aug 14, 2015

First of all a graph library does not communicate with the outside world.

My point was that even for the ubiquitous task of implementing a graph structure, unsafe is necessary.

So while Rust may provide a clean separation between unsafe and safe code (enforced by the type system), the original problem remains: How do we ensure correctness of the unsafe parts of the code?

lifthrasiir · on Aug 14, 2015

For what it's worth (and I intentionally didn't point this out in the parent), you can make a safe graph library in Rust with a typed arena (slightly less ergonomic and faster) or a refcounted smart pointer (slightly more ergonomic and slower). But this still does not validate your point, since the memory allocator is in many cases unsafe.

On how to ensure correctness of the `unsafe` code: that is what we have been doing with entire C/C++ codebases for decades, so what's the problem? We could however concentrate on a much smaller amount of code if we were using safer languages.
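A rough sketch of the arena idea mentioned above. To stay within the standard library, it uses a plain `Vec` with integer indices as a simpler stand-in for a typed arena; the names `Arena`, `NodeId`, and `reachable` are hypothetical. Edges are indices rather than references, so cycles need neither `unsafe` nor reference counting.

```rust
/// Index into the arena; plays the role a pointer would play in C++.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct NodeId(usize);

struct Node<T> {
    value: T,
    out_edges: Vec<NodeId>,
}

/// All nodes live in one Vec and are freed together when the arena is dropped.
struct Arena<T> {
    nodes: Vec<Node<T>>,
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { nodes: Vec::new() }
    }

    fn alloc(&mut self, value: T) -> NodeId {
        self.nodes.push(Node { value, out_edges: Vec::new() });
        NodeId(self.nodes.len() - 1)
    }

    fn add_edge(&mut self, from: NodeId, to: NodeId) {
        self.nodes[from.0].out_edges.push(to);
    }

    fn value(&self, id: NodeId) -> &T {
        &self.nodes[id.0].value
    }

    /// Depth-first traversal from `start`; cycles are handled by the `seen` table.
    fn reachable(&self, start: NodeId) -> Vec<NodeId> {
        let mut seen = vec![false; self.nodes.len()];
        let mut stack = vec![start];
        let mut order = Vec::new();
        while let Some(id) = stack.pop() {
            if seen[id.0] {
                continue;
            }
            seen[id.0] = true;
            order.push(id);
            stack.extend(self.nodes[id.0].out_edges.iter().copied());
        }
        order
    }
}
```

The ergonomic cost is that individual nodes cannot be freed, and an out-of-range `NodeId` causes a bounds-check panic rather than undefined behavior.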

zamalek · on Aug 14, 2015

I've never actually used Rust myself - I'm just familiar with their goals: so I'm not entirely sure what you are referring to.

Just keep in mind that the amount of work done on the various C++ compilers is most likely measured in man-decades, where Rust is probably still man-hours.

Do you mean a directed graph or a graphical graph? I've implemented directed graphs in at least two different managed languages (which are more constrained than Rust) and had to use no unsafe breakouts. There might be some complexity for some reason with Rust, but if it's possible in a managed language then surely...

steveklabnik · on Aug 14, 2015

  >  Rust is probably still man-hours.

It's nowhere near the amount of time put into various C++ compilers, but

  1. We use LLVM, so all that time is working for us as well.
  2. Mozilla has been paying at least 4 people for at least a
     few years to write Rust full-time, I would bet we're coming
     up on a person-decade of time for Rust. The project has existed
     for eight years in total, though four of that was just as a side
     project.

dbaupp · on Aug 14, 2015

I would guess that it's well past one decade for Rust (there have been 8 paid people on the team for at least a year).

sdf45 · on Aug 14, 2015

As of yet, a browser (in particular the JavaScript JIT) cannot be implemented in a safe language.

It is not even clear what a safe language that would permit this would look like.

Thinking that banning C/C++ magically solves all problems is naive.

epidemian · on Aug 14, 2015

> It is not even clear how a safe language that would permit this would look like.

If this hypothetical browser is implemented in a safe language that runs on a JIT-compiled environment itself, then it could compile JS to that same safe language and allow that same runtime environment to JIT-compile the resulting compiled JS.

Would that make sense? (I have no empirical data to back this up, but I'd like to know if at least hypothetically this approach could work)