There's a patch to make the current conservative GC a precise GC, which will totally fix this problem. It's in the process of being reviewed and integrated into the core.
FWIW, the panelists on the recent "Go in production" panel at I/O responded that garbage collection wasn't an issue for them on 64-bit systems. http://www.youtube.com/watch?v=kKQLhGZVN4A
That's awesome, kudos. How do you deal with maintaining the distinction between pointers and values on the stack and in registers? I didn't see any code to do that, skimming through that patch. Since it's a precise GC, you need to maintain the distinction.
In Rust we have a very large patchset to LLVM to allow us to do this; it involves modifying nearly every stage of the LLVM pipeline. Curious how you did that in gccgo.
Edit: I just looked at the source and it still seems conservative for the stack and registers. So it's not a precise GC yet and will not eliminate false positives. (This is of course totally fine, I just wanted to clarify for myself.)
That's too bad. Hopefully someone will improve `gccgo` as well. I actually started out experimenting with Go using the `gc` compiler, and found performance (basically manipulating large arrays sequentially) atrocious. Moving to `gccgo` improved performance four-fold, making my unoptimized Go test code in some cases faster than my unoptimized C test code.
Go 1.0.2, which is the newest stable release. I will see if I have some extra time to file a report.
Slightly off topic (but relevant to my comment above), do you know why this:
func TraverseInline() int {
	ints := make([]int, N)
	sum := 0
	for i := range ints {
		sum += i
	}
	return sum
}
should be slower than:
func TraverseArray(ints []int) int {
	sum := 0
	for i := range ints {
		sum += i
	}
	return sum
}
func TraverseWithFunc() int {
	ints := make([]int, N)
	return TraverseArray(ints)
}
Seeing a huge difference here. With an array of 300000000 ints, TraverseInline() takes about 750ms, whereas TraverseWithFunc() takes 394ms. With gccgo (GCC 4.7.1), the difference is much smaller, but it's still there: 196ms compared to 172ms. (These are best-case figures; I'm running many iterations.) Go is compiled ahead of time, not JITed, so I don't see why going through a function call should offer better performance.
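For anyone who wants to reproduce this, here's roughly the harness I'd use; a sketch, not the original benchmark code (the function names match the post, the lowercase helpers and the smaller N are mine, to keep the allocation modest):

```go
package main

import (
	"fmt"
	"time"
)

// N is deliberately smaller than the 300000000 in the post.
const N = 1 << 22

// traverseInline allocates and sums the indices in one function.
func traverseInline(n int) int {
	ints := make([]int, n)
	sum := 0
	for i := range ints {
		sum += i
	}
	return sum
}

// traverseArray sums the indices of a slice passed in.
func traverseArray(ints []int) int {
	sum := 0
	for i := range ints {
		sum += i
	}
	return sum
}

// traverseWithFunc allocates, then delegates the loop to traverseArray.
func traverseWithFunc(n int) int {
	ints := make([]int, n)
	return traverseArray(ints)
}

func main() {
	start := time.Now()
	s1 := traverseInline(N)
	fmt.Printf("inline:    sum=%d in %v\n", s1, time.Since(start))

	start = time.Now()
	s2 := traverseWithFunc(N)
	fmt.Printf("with func: sum=%d in %v\n", s2, time.Since(start))
}
```

Running each variant many times and taking the best figure, as the post does, reduces noise from the allocator and the OS.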
I don't have any immediate insight for the difference in performance you're seeing (my hunch is that it's to do with where and how memory is being allocated), but I would suggest you ask about it on the Go-Nuts mailing list: https://groups.google.com/forum/?fromgroups#!forum/golang-nu...
The Go authors should be very receptive and informative on such an issue.
Not sure why the difference in performance between function and non-function, particularly considering this looks inline-able.
Two theories on why gccgo is doing better: maybe it's unrolling the loop and avoiding a bunch of jumps. Alternatively, it could be using SSE, but that theory is much less likely, because I would expect it to be four times faster, not twice as fast. (gc doesn't use SSE instructions yet.)
I'm not seeing any difference between the two functions using the 1.0.2 x64 gc compiler. Here's the code I used (slightly modified to compile): http://play.golang.org/p/gcf0BOGncQ
If you want to dig deeper and are not averse to assembly, you can pass -gcflags -S to 'go build' and see the compiler output. Here's what I got for the above code:
The numbers are from Go 1.0 as that's what is available on Ubuntu Precise, my test box. (Getting similar numbers with 1.0.2 on a different box where I don't have gccgo.)
I realized that the first loop was using "i < len(ints)" as a loop test, which turns out to be more expensive than I thought, and the compiler doesn't optimize it into a constant (which would be expecting too much, I guess). After rewriting the test, the function call case is only slightly faster, although it is still significantly faster with gccgo.
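To illustrate what that rewrite amounts to: the difference is between a loop whose condition re-reads `len(ints)` on every iteration and one where the bound is hoisted out. A sketch (the helper names are mine, not from the thread):

```go
package main

import "fmt"

// sumLenTest evaluates len(ints) in the loop condition on every
// iteration; with the gc compiler of that era this bound was not
// hoisted out of the loop.
func sumLenTest(ints []int) int {
	sum := 0
	for i := 0; i < len(ints); i++ {
		sum += i
	}
	return sum
}

// sumHoisted caches the length once before the loop, which is what
// the rewrite amounts to; a `for i := range ints` loop has the same
// effect.
func sumHoisted(ints []int) int {
	sum := 0
	n := len(ints)
	for i := 0; i < n; i++ {
		sum += i
	}
	return sum
}

func main() {
	ints := make([]int, 1000)
	fmt.Println(sumLenTest(ints), sumHoisted(ints))
}
```

Both versions compute the same sum; only the per-iteration cost of the bounds check differs.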
I've talked with Chris Lattner about upstreaming it and the LLVM developers seem open to it when it's ready. In the meantime you can find the most recent work here:
Currently, you are mostly out of luck if you try to use mainline, unless you're okay with the llvm.gcroot intrinsics, which pin things to the stack so they can never live in registers (causing a corresponding performance loss). Probably your best bet is to do what Go will do once the patch enneff was referring to is merged: scan conservatively on the stack and precisely on the heap. You can still get false positives and leaks, and you can't do moving GC that way, but it's better than fully conservative GC.
By far, the hardest part of GC is precisely marking the stack and registers. Yet you have to be able to do it if you want to prevent leaks. Our goal is to put in the hard work so that new languages will be able to have proper GC without having to roll a custom code generator or target the JVM or CLR.
> Our goal is to put in the hard work so that new languages will be able to have proper GC without having to roll a custom code generator or target the JVM or CLR.
As someone who has ambitions of writing a fun little language this is exactly what I want out of LLVM. Can't wait for your patches to hopefully get to mainline.
http://codereview.appspot.com/6114046/