
A coworker and I tested this out when the announcement was originally made, at least in the use case of iterating a massive array. (Insert 100M trues and one false at the end, and find me the false)

Code is here: https://gist.github.com/Karunamon/abc6483ac1d08f6cc137.

The result was that streams were roughly 4x slower.
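A minimal sketch of that comparison (this is not the gist's exact code; the class name and the IntStream-over-indices shape are my own assumptions):

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class FindFalse {
    // Imperative scan: break out at the first false
    static int findFalseLoop(boolean[] data) {
        for (int i = 0; i < data.length; i++) {
            if (!data[i]) return i;
        }
        return -1;
    }

    // Stream version: iterate indices so we can stay on the primitive array
    static int findFalseStream(boolean[] data) {
        return IntStream.range(0, data.length)
                .filter(i -> !data[i])
                .findFirst()
                .orElse(-1);
    }

    public static void main(String[] args) {
        // 100M trues with a single false at the end
        boolean[] data = new boolean[100_000_000];
        Arrays.fill(data, true);
        data[data.length - 1] = false;

        long t0 = System.nanoTime();
        int a = findFalseLoop(data);
        long t1 = System.nanoTime();
        int b = findFalseStream(data);
        long t2 = System.nanoTime();

        System.out.printf("loop:   index %d in %d ms%n", a, (t1 - t0) / 1_000_000);
        System.out.printf("stream: index %d in %d ms%n", b, (t2 - t1) / 1_000_000);
    }
}
```

(Naive nanoTime timing like this has all the warmup problems discussed further down the thread; it only sketches the shape of the test.)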

Still, I really like some of the new constructs that Java is getting - they make the language a bit more expressive, lack of which has always been my main gripe with the language.



Write code to be as clear and expressive as possible, then optimize for performance when you know performance is a problem. This is why I don't mind the performance cost of new language features like this. 90% of the time it won't matter, and you can optimize the other 10.


That's the mindset that eventually makes everything slow, and you can't easily see that 10% anymore because it gets spread out further with each abstraction you introduce.

When loops stop looking like loops, it makes it rather harder to find them.


Aka "flat profile". Using these things in full force requires really banking on JIT doing the right thing (if perf is a concern), which is optimistic. A simple loop is just as readable as these one liners, and carries less risk of not being compiled as tightly.


These constructs don't always make code any easier to understand / maintain. I have played a lot with NetBeans' feature that automatically translates loops into their "functional" equivalent. Sometimes the code was so clever I couldn't understand what was happening.

Sure, code is more compact, but compactness is not an end in itself.


Compactness is more of a side effect and can be a detrimental one. Personally I find that this kind of code is more declarative of intent than the more imperative for loop. By using particular functional tools for the job you're avoiding any possibility of a bug in your looping construct and making your purpose explicit.

I also like how it removes the boilerplate for handling different container types, making it easier to switch to a different one.
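As a sketch of that "declarative of intent" point (the class, method, and data names here are mine, purely for illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class DeclarativeExample {
    // Imperative version: the loop machinery is explicit,
    // and the result container is hard-coded
    static List<String> shortNamesLoop(List<String> names) {
        List<String> result = new ArrayList<>();
        for (String n : names) {
            if (n.length() <= 4) {
                result.add(n.toUpperCase());
            }
        }
        return result;
    }

    // Declarative version: the intent (filter, then map) reads directly,
    // and the collector decides the container
    static List<String> shortNamesStream(List<String> names) {
        return names.stream()
                .filter(n -> n.length() <= 4)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }
}
```

Swapping the collector (e.g. `Collectors.toSet()`) is all it takes to change the output container, which is the boilerplate-removal point above.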


This reminds me a lot of Resharper's refactoring common things to LINQ in Visual Studio. A lot of times I just ran it thinking "oh wow that's a neat way to do it" and then reverted it back because it wasn't a common idiom or it made it much harder to understand.

The same thing is going to happen with Java 8 until the feature becomes less shiny.


Sometimes it is the loop itself that contributes most of the processing time (imagine a big array of ints and some simple transformations). In such cases the refactoring would be easy, but anyway...


If you want new language constructs, why bother writing 90% of your Java code in pure Java to begin with anyway? Use something that compiles to bytecode like Groovy. Profile your app, and write whatever is performance critical in pure Java.


You chose a scripting language as your example. The profiler would probably tell you to rewrite the entire app in Java, which can be tricky because although the Groovy code uses the Java syntax, the semantics are often different. If you use a language for building systems, the profiler would probably tell you nothing needs rewriting.


>(Insert 100M trues and one false at the end, and find me the false)

That sounds like arrays benefited heavily from the easy branch prediction.


Echoing everyone else, Java microbenchmarking is annoying. You should probably use Caliper or whatever anyone else posts.

Of course since I've never used any of them, here's my terrible example with stream beating for loop.

https://gist.github.com/anonymous/2395fb0728e491bc54f5

  Warmup
  Warming up done
  9262288 # for loop
  6156414 # stream
  Done

edit: tuning WARMUP_RUNS to something lower (like 500 vs 10k) and for loop wins consistently vs warmup runs at 10k.
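A rough illustration of why the warmup count changes the result: the first calls run interpreted, and only after enough iterations does the JIT compile the hot method at full tier (the workload here is a hypothetical stand-in, not the gist's code):

```java
public class WarmupDemo {
    // Hypothetical workload standing in for the benchmarked loop
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += i;
        return sum;
    }

    static long timeOneRun(int warmupRuns, int n) {
        // With few warmup runs, work() may still be interpreted when timed;
        // with many, it has likely been JIT-compiled, so the timing drops.
        for (int i = 0; i < warmupRuns; i++) work(n);
        long t0 = System.nanoTime();
        work(n);
        return System.nanoTime() - t0;
    }

    public static void main(String[] args) {
        System.out.println("cold:   " + timeOneRun(0, 1_000_000) + " ns");
        System.out.println("warmed: " + timeOneRun(10_000, 1_000_000) + " ns");
    }
}
```

Which of the two competing versions "wins" can flip depending on which one got compiled first, which is exactly why a harness like JMH isolates warmup per benchmark.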

If I turn on -XX:+PrintCompilation I see some extra compilation output, but I don't really understand it. I assume some of this contributes, but there's way more output than I expected, tbh:

    4929  263       3       java.lang.invoke.LambdaForm$DMH/1581781576::invokeStatic_L_L (14 bytes)   made not entrant
    4929  339       4       java.util.function.Predicate::isEqual (20 bytes)


Yes, your example is bad. If the JIT inlines through, the code is trivially dead and can be removed. So yes, please use a proper benchmark harness (JMH).


Ya I thought it might mark it as dead but even if I append the result to something like a List and print the list at the end (so it can't just not run the code?), Stream wins. Anyway I'm off to work for today, maybe I'll post in evening.

edit: https://gist.github.com/anonymous/ed0d8f4a5c6553fe8435

Is there some way in this example it could not actually run the code here?


Well, you're not really testing for loops, since there are other artifacts here:

1) the forEach driver method is receiving multiple types, so it's not monomorphic

2) you may be hitting OSR compilations

3) the for loop may hit range checks on each get()

4) the for loop version warms the cache for the stream version, and this benchmark is memory-reference heavy

So, please try to use JMH to get more accurate picture. And, as mentioned, this isn't really testing for loop vs streams.


I'm really surprised that it isn't a zero cost abstraction, doesn't the Java JIT inline?


HotSpot (the OpenJDK JVM) does, and it should in this case, too, but this usage suffers from "the inlining problem"[1] and/or the profile pollution problem[2]. These are problems that are continuously addressed and improved with each release, but have not yet been satisfactorily resolved.

[1]: http://www.azulsystems.com/blog/cliff/2011-04-04-fixing-the-...

[2]: https://wiki.openjdk.java.net/display/HotSpot/MethodData


I think Graal does more aggressive inlining than Hotspot.

Looking forward to the day it will be in the reference JDK.


I don't think the 100M entries example suffers from inlining or profile pollution. It's just that the manual loop is going to be as tight as you can get it and it's likely the stream version leaves artifacts behind that are noticeable when the loop kernel is dead simple like this.

-XX:UnlockDiagnosticVMOptions -XX:+PrintInlining will tell you whether this inlined, and dumping the JIT asm can be done to see what was actually generated.


Did you see that they're planning on making Graal a plugin of the standard OpenJDK build in Java 9? It's right there: http://openjdk.java.net/jeps/243


I know, but I imagine it will only be fully merged if Project Sumatra turns out to be integrated.


Sumatra is dead, AFAIK. But why would graal depend on sumatra for integration? Bigger challenge is how to bootstrap graal itself such that compilation time and thus time to peak perf isn't degraded substantially.


Really?!

> But why would graal depend on sumatra for integration?

The other way around: since Sumatra depends on Graal, integrating Sumatra means Graal also has to be integrated.

Pity; ever since I learned about Maxine and JikesRVM back in the day, I have looked forward to the day the reference JVM would be meta-circular.


Sumatra was about GPU offload, not so much metacircular JVM. If you look at sumatra dev mailing list archive, you'll see the last email there states it's not in active development. The project appeared to have been driven by AMD, but they may have re-prioritized things.

I also don't think it's currently possible to write the bulk of the JVM in java, if you want comparable performance and memory footprint to Hotspot.


Sumatra was using Graal.

> I also don't think it's currently possible to write the bulk of the JVM in java, if you want comparable performance and memory footprint to Hotspot.

Better check Graal and JikesRVM research papers then.

One reason why reference JDK JIT doesn't get rewritten is the ROI.

Just check how long it has taken to rewrite the C# and VB.NET compilers while keeping the new compilers 1:1 compatible, or the new RyuJIT and the multiple AOT compiler iterations in .NET land.


As already mentioned, Graal is just the JIT compiler, it's not the entire VM. JikesRVM is a research VM, which has different needs/characteristics from production JVMs.


But why do you think Graal wouldn't be integrated? They've already started the process with Sumatra or without.


Maybe with Sumatra the integration would be deeper than just an API, e.g. replacing HotSpot completely and also adding SubstrateVM into it.

I don't know, let's see how it turns out.


Graal is not a HotSpot replacement. It's a JIT for HotSpot or an AOT compiler for SubstrateVM which is a separate JVM altogether. If Graal matures and proves itself, it will become HotSpot's JIT. And Substrate may or may not become a product regardless.

And project Sumatra -- while cool -- was never a big influence over OpenJDK's plans. Being able to run streams on GPUs is absolutely awesome, but not the number one priority for the majority of Java users. My point is that Sumatra wouldn't have played a significant role in the decision of when to make Graal HotSpot's default JIT.

BTW, you don't even need Graal to be the default JIT in order to support Sumatra, anyway. Graal as a plugin (JEP 243) is good enough for that.


I don't think you want a production/industrial JVM written in java :).


It's not the whole JVM -- just the JIT. What difference does it make what language the JIT is written in? It is my understanding that if Graal proves itself, it will replace C2 (if not C1 as well).


I was replying to pjmlp's comment:

>Maybe with Sumatra the integration would be deeper than just an API, e.g. replacing HotSpot completely and also adding SubstrateVM into it.


There are a few commercial ones. :)


Which ones?


I think Java 9 paves the way to (or actually includes) JIT caching.


Don't think there's anything in 9 for JIT caching unless I missed it - do you have a reference? JIT caching is a non-trivial problem for HotSpot due to the nature of speculative optimization, so it may take quite some time to appear. Having said that, Azul has some form of it in its ReadyNow feature, but I don't know the details.

I think likely some form of AOT will be needed.


It's related to Project Jigsaw, but I don't know if it's actually scheduled for Java 9. I assumed the idea is to cache C1 output, not C2. I think I saw it in one of Paul Sandoz's videos. I'll look for it, and if I can't find it, I'll ask Paul.


Yes there is: JEP 197, Segmented Code Cache.

http://openjdk.java.net/jeps/197

This is just the early work to be improved in later versions.

However there are commercial JVMs, like J9 that already do JIT caching.


Segmented Code Cache has nothing to do with JIT caching. Or rather, it has as much to do with it as the existing homogenous code cache.


But you'll be able to use it as a 100% Java plugin regardless.


It depends on which JIT you mean, there are plenty to choose from.

In any case, at least JITWatch, Solaris Studio, and Intel VTune Amplifier allow you to look at the generated assembly.


I think there's some other secret sauce happening in there, especially since making the process parallelized is as easy as changing .stream to .parallelStream.
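For instance (a toy sketch; the class name and workload are made up), the sequential and parallel versions differ by a single call:

```java
import java.util.Arrays;
import java.util.List;

public class ParallelSwitch {
    static long sumOfSquares(List<Integer> xs) {
        return xs.stream()
                .mapToLong(x -> (long) x * x)
                .sum();
    }

    static long sumOfSquaresParallel(List<Integer> xs) {
        // The only change: .stream() -> .parallelStream().
        // The work is then split across the common ForkJoinPool.
        return xs.parallelStream()
                .mapToLong(x -> (long) x * x)
                .sum();
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, 2, 3, 4);
        System.out.println(sumOfSquares(xs));          // 30
        System.out.println(sumOfSquaresParallel(xs));  // 30
    }
}
```

Whether the parallel version is actually faster depends on the workload size and the per-element cost; for tiny or memory-bound loops the fork/join overhead can dominate.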


That's not a good benchmark; it doesn't take JIT warmup into account.

Should have used JMH instead.

http://openjdk.java.net/projects/code-tools/jmh/


Poo. Nice analysis, btw.



