A New R6RS Scheme Compiler (weinholt.se)
182 points by widdershins on Oct 2, 2019 | hide | past | favorite | 57 comments


There is definitely room for a Scheme compiler. But the Scheme ecosystem felt extremely fractured the last time I took a look. Chicken had a decent but aging package repository. Chez had a great compiler (I prefer not compiling to C), but its package ecosystem, if it had one, was not well advertised. Racket has too many dialects, which might appeal to some, but I found the number of sub-languages overwhelming for a casual Schemer.


Making languages is the thing that Racket does better than any other language, so they do tend to proliferate. However, for getting stuff done, the majority of work is done in plain Racket. You don't need to know the other languages unless you're interested in some specialist task they target.

Back when I was a regular Scheme user Racket had by far the biggest ecosystem and community, so it would be my suggestion if you're interested in exploring Scheme.


> Back when I was a regular Scheme user Racket had by far the biggest ecosystem and community,

IIRC, you were instrumental in building the ecosystem, and attracting many other contributing community members.

Network effects bootstrapping blame, where it's due. :)


Ha! I blame you! :-)


Nobody else does! :)


Does anyone use Racket outside academia/hobby?


Yes. I've written a test framework for another piece of software I'm developing. It (optionally) sets up an environment for the executable under test, (optionally) launches and monitors the executable (process state, stderr/stdout), connects to the process via a WebSocket (that's what that executable does), and then tests the surface of the exposed WS API. As the executable runs under Windows and Linux, embedded and not, this approach lets me run the test suite under all the different hosts and connect to localhost or to the thing running elsewhere. So, in my case, you may not see it in a product; but the product we're building is continuously tested with a myriad of Racket programs.



>Making languages is the thing that Racket does better than any other language

Have you tried Rebol/Red?


I did, and even though I'm forever in love with both Rebol and Red, Racket does languages in a more fun and powerful way for my own personal taste. I don't know if you have checked out this book, but https://beautifulracket.com/ is a wonderful introduction to building languages with Racket.


For me, the power of the Redbol approach lies in the rich set of datatypes, so you can create dialects at the block/value level, leveraging the syntax without having to tokenize at the string level. It's no small thing to be able to create languages that natively understand values like email addresses, URLs, IP addresses, files as distinct from strings, and more.


From a brief check of the book I see that Racket has a lot of infrastructure for making languages, but fundamentally I don't see the difference from Parse in Red. One can easily add a tokenizer, reader, and expander similar to what is there in Racket.


For those who haven't used Rebol or Red: it is a homoiconic language like Lisp, but without s-expressions, as I understand it.

Rebol is really nice, with a lot of built-in primitives that make it super useful, although Rebol is kind of dead now (some users exist, but there's been no new version in a while). Red is being worked on, and if it ever reaches 1.0 it could be a game changer.

I think it is pretty easy to build little DSLs, but I don't think macros are super easy unless I'm mistaken. If somebody could shed some light for me I'd appreciate it.


You don't need macros in Red (though it's easy to roll your own macro layer, if one so desires [1]); all code transformations can be achieved at runtime. A brief explanation of how this works is given in [2].

As for Red / Rebol vs. Racket, I wrote an extensive post on that some time ago [3], in response to a similar question.

The TL;DR, from a language-creation perspective, is that Racket has a state-of-the-art macro system and all the necessary infrastructure, while Red has a metaDSL (that is, a DSL for creating DSLs) called Parse [4], which is basically PEG on steroids (think OMeta), plus a different take on homoiconicity compared to Lisps, which most of the time renders custom readers and tokenizers unnecessary.

[1]: https://www.red-lang.org/2016/12/entering-world-of-macros.ht...

[2]: https://github.com/red/red/wiki/%5BNOTES%5D-How-Red-Works---...

[3]: https://www.reddit.com/r/redlang/comments/aebxct/contrast_re...

[4]: https://www.red-lang.org/2013/11/041-introducing-parse.html


I tried it and would consider it a write-once language, similar to Forth in that respect. It seems good for quick hacks but not for long-term maintainable projects.


I'd like more info on that as well (as part of Team Red). And, for comparison, which language(s) do you feel are best for that, and what features lead to that?


Can you elaborate?


I have not.


They basically have a PEG-like parser built into the language, allowing you to execute arbitrary code, up to and including modifying the parsing rules themselves.

Here's a small shell dialect I made while learning it: https://gitlab.com/maxvel/red-shell-dialect/blob/master/shel...

So you can do things like

  shell/run [cat %/etc/passwd | grep {root} > myvar]
where both `cat` and `grep` can be either shell commands or Red functions, and you can redirect input/output from/to variables as well as files. You can also mix in Red data types; in the example here, `%/etc/passwd` is a file instance, not a string.

Also, the language itself heavily utilizes this (the Parse dialog): for example, they don't have regular expressions at all, JSON is parsed using it as well, the GUI system is a fully separate dialog, and so on.


s/dialog/dialect/

The concept of a dialect (aka "embedded DSL") is not tied to Parse: you can implement one with it, for sure, but it's not a strict requirement.


Racket 7.4 can now run on Chez if you so wish. Check out [1]; in there you'll also find links to the whys, hows, and whats of the whole effort.

I've been using Racket quite a lot lately and really like it. The thing I do not like about Racket is the whole "Racket 2" effort. Abandon s-exps? Really? To leave behind the "popularity boundaries" of Lisps? D'oh. How about dropping the common USP you share with the other Lisps to become... what, exactly? Irrelevant?

Cf. your "dialect" point: in reality, the languages are just (supposed to be) DSLs for specific (sub-)problems of your problem space. I don't mind at all. You just require the libraries written in that other dialect from Racket and use the stuff in there, et voilà.

[1] https://blog.racket-lang.org/2019/08/racket-v7-4.html


I always thought that was the best feature of Scheme. Nobody agrees on anything, except mostly R5RS, keeping it from going down the path of "one implementation for everything, always" that most languages suffer from once they become a little popular. Scheme is one of the very few languages that has resisted accruing a lot of bloat throughout its existence (R6RS never was very popular for exactly that reason).

People underestimate the upside of being able to paste any example snippet into your repl and have it work transparently in almost every scheme, without having to worry about libraries or eco-systems.

For anything more exotic, you grab the nearest SRFI and squeeze it around until it works in the scheme you're working with.

Scheme: it's so simple, any subgenius could use it!


I am just getting into Scheme and have heard good things about Chez. Any pointers to what is standard out in the wild? What de facto Scheme tools are people using these days?


https://akkuscm.org/ is a seemingly vibrant package manager for Chez and others. It is also by the author of the article.


The story of Scheme is that it was so easy to implement (relatively) that they ended up with dozens of implementations, so they decided to write a standard... and then started doing the same thing with the standard(s).

It's a language family, or even philosophy, rather than a single language.


The standard was quite OK until the version 7 disagreement about what should actually be in it.


There's always been disagreement. R7RS (small|large) is an effort to resolve that disagreement.


I thought that R7RS was the one that sparked it.


R6RS was the one that really sparked the divide, and for a long time only had a limited number of implementations due to this controversy. The big thing that R6RS brought, in my opinion, was a syntax for creating modules and sharing them. R7RS was an attempt to create an R5RS-like minimal standard that still had a library/module syntax so that you could get a bunch of implementations and share code between them. R7RS-small did this, and then R7RS-large attempts to use the module syntax to create a big body of "industrial strength" code that you can use to get "real things" done with scheme. The R7RS-large process seems to be using the SRFIs as a staging ground for the new modules, standardizing on chunks of them as they go in different "editions".

So the theory now is that you can write a smaller R7RS-small compliant scheme, and do all your language experimentation in that, and someone else could come along and bootstrap that into a fairly useful large implementation without a ton of effort, and modules you write on that scheme could then be reused on other implementations.


Likewise, thanks for clarifying it as well.


There are three ways to think about this.

1. R7RS explicitly decided to repudiate R6RS, leading to the disagreement.

2. R6RS made changes that did not have consensus, and the resulting vote didn't have a big enough supermajority requirement, leading to a controversial standard.

3. There was always major disagreement about all the relevant points, and R5RS itself was only created by not introducing anything new post 1992, and by not taking a position on the controversial topics. So any standard effort was bound to be controversial.


Thanks for clarifying it.


I just got into Scheme this past month, but I started with GNU Guile, and really like it.


The related article about using CPU alignment checking for "type checking" tagged pointers is quite interesting. I wonder if this technique has been used before (in lisp machines or something?). I'd be interested in hearing about practical matters like performance and CPU support too.

https://weinholt.se/articles/alignment-check/


The SPARC CPU was designed to make use of this. Alignment checking was enabled by default, and the tagged add/sub instructions required that the tag in the lower two bits be 0 for an integer; tags for pointers were expected to be 3.

Lisp Machines were word addressed so the technique wouldn't work for them.


Lisp Machines used tag bits.


This looks like impressive progress towards a Scheme implementation for systems programming. I'll be interested to see how GC, interrupt-handling, etc. work.

With an aarch64 backend, and some graphics & GUI work, it might also make sense atop the PostmarketOS Linux kernel, for handhelds.[1] If one needs to heavily rework the Linux userland anyway, and especially if motivated by enthusiasm, it's an opportunity to rethink, rather than automatically inherit, decades of desktop stack.

[1] Notes on approaching this incrementally at https://www.neilvandyke.org/postmarketos/ , though you could alternatively start by building up atop a bare kernel & drivers, and Loko Scheme looks like a promising candidate for that.


> It is written in R6RS Scheme and a wafer thin amount of assembly. Once it has been bootstrapped it can self-compile. There is no C code in the compiler or the runtime.

> ...multiboot binaries for bare hardware

Multiple brownie points combo!

Congratulations on the work achieved.


PicoLisp also has a tiny assembler layer. (And two parallel implementations, one in Java, one in C.)


I am aware of it, thanks for bringing it up.

The more the merrier.


About the AGPL license: does it extend to programs compiled with Loko? The author seems not to want it to, judging by the comment about the license not extending to user space. My understanding, however, is that it would, but only if the programs used a library provided by the AGPL'd compiler. Hence why GCC has the linking exception.


The author has opened an issue[1] for discussion of the license matter on the issue tracker. He writes:

    > Loko is AGPLv3 for various reasons. There are benefits to using the AGPLv3 that I prefer to keep, but it's not a perfect match for a compiler + runtime. To make Loko more widely usable, to even developers of closed source applications, I would like to formulate a suitable exception to the license. People should not need to release the code to their application that is running on Loko. There are precedents in Linux and GCC.
    > An expert in free software licenses is likely needed for this.
[1] https://gitlab.com/weinholt/loko/issues/2


Yes, unless there is a specific exception, the program would be covered by the AGPL. Even if you just use it on a server, you have to release your program's code under the AGPL.


Wow, really? So it’s essentially impossible to use this for proprietary software? I knew about the server stuff, but I didn’t think the AGPL bound compiled programs to the terms of the compiler.

If I use an AGPL text editor, does my written text also become AGPL?


An (A)GPL text editor works (see, for example, Emacs), as the text created doesn't contain any part of the editor itself. But a compiler emits code which is based on both your source and the compiler itself. Also, it usually links against a standard library, which would be AGPL. So this makes your created executable AGPL unless there is an explicit exemption, as is the case with GCC.


Impossible to use for proprietary software you distribute. So you can write internal software, and only need to share the source with yourself.


Probably more pointedly, it's not usable for open-source projects that want to distribute their stuff under a license more permissive than the AGPL.

Which I'm guessing is intentional.


Well, it only prevents distributing binaries. You could still do source-only distribution.


Actually, it seems the author might be a bit confused about linking exceptions:

"Linux’s license doesn’t extend to user space and I want that aspect to work the same for Loko."

Sounds like they really want something closer to gcc/glibc licensing.

Not quite sure if they really want to close the "proprietary fork running as compiler-as-a-service in the cloud"-hole that using the AGPL (with link exception) would close, though.


The Linux kernel has some UAPI headers which are usually (transitively) incorporated into your programs by the libc you use. The exception covers this code.


Right, the author might not be aware of:

https://github.com/torvalds/linux/blob/master/Documentation/...

"The Linux Kernel is provided under the terms of the GNU General Public License version 2 only (GPL-2.0), as provided in LICENSES/preferred/GPL-2.0, with an explicit syscall exception described in LICENSES/exceptions/Linux-syscall-note, as described in the COPYING file."

In other words, the Linux kernel isn't pure gplv2, but comes with an exception in order to allow linking.


The compiled program. So you could give users the compiler (under AGPL) and source code to your program under a proprietary license. But you probably wouldn't, I guess.

Any compiled artifacts would need to be distributed (if they are distributed) under AGPL.

I'm a fan of Free software, but much prefer GPL/LGPL for compilers and runtimes/standard libraries.


It depends... if the compiled program is 100% compiler output, then it's fine. If the compiled program ends up including bits of library code that ships with the compiler, then, in the absence of special exceptions, the output will also need to be AGPL-licensed.

That's exactly why, for example, LLVM ships with a modified version of the Apache license: http://llvm.org/foundation/relicensing/LICENSE.txt (see the bottom)


I'm having trouble seeing how this wouldn't incorporate a substantial runtime in all builds. It's a full Scheme system...


All of my software that is not brief and trivial, and so would benefit from it, is licensed under the AGPLv3, although I don't use the "or any later version" clause, as I don't trust licenses that have yet to be written, and recent events make me feel all the more justified for not having done so. I'm glad to see more AGPLv3 software. My opinion on a batch compiler differs from my opinion on an interactive environment, and I'll share a method for making money with the AGPLv3:

For a batch compiler, that is, a language that doesn't need the compiler at run time, write an AGPLv3 compiler without a linking exception and then demand that sufficient funds be raised to release a version with the linking exception. This is a way to write a Free Software compiler and still demand money for it, since many will actually want to use it. I wouldn't license something such as a Lisp compiler with the exception, however, as the implementation generally exists alongside the program, and that removes some advantages of it being AGPLv3. Fortunately, it seems the author has used the AGPLv3, and any desired exception uses vague language not in the license itself, so it's legally irrelevant. I'd find it best if there were no exception for this program, but it's not my decision.

In closing, this has made me want again to write my own Scheme implementation, so perhaps I'll do that at some point soon.


I had a similar idea: start out dual-licensed so I could straight-up sell it to companies, optionally moving to a more permissive license if it hit a certain funding total. Each new version would be done the same way, so upgrades to the more-permissive version still get paid for. Plus, the "get what you pay for" companies might only buy a proprietary version.


Sounds like a very interesting project, but actually I belong to the crowd who would close the tab basically after reading the second paragraph. Which is a pity, because this sounds like an implementation that could be very nice.



