
A coworker and I tested this out when the announcement was originally made, at least in the use case of iterating a massive array. (Insert 100M trues and one false at the end, and find me the false)

Code is here: https://gist.github.com/Karunamon/abc6483ac1d08f6cc137.

The result was that streams were roughly 4x slower.
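A minimal sketch of that comparison (this is not the gist's exact code; the class name and the IntStream-over-indices shape are my own assumptions):

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class FindFalse {
    // Imperative scan: break out at the first false
    static int findFalseLoop(boolean[] data) {
        for (int i = 0; i < data.length; i++) {
            if (!data[i]) return i;
        }
        return -1;
    }

    // Stream version: iterate indices so we can stay on the primitive array
    static int findFalseStream(boolean[] data) {
        return IntStream.range(0, data.length)
                .filter(i -> !data[i])
                .findFirst()
                .orElse(-1);
    }

    public static void main(String[] args) {
        // 100M trues with a single false at the end
        boolean[] data = new boolean[100_000_000];
        Arrays.fill(data, true);
        data[data.length - 1] = false;

        long t0 = System.nanoTime();
        int a = findFalseLoop(data);
        long t1 = System.nanoTime();
        int b = findFalseStream(data);
        long t2 = System.nanoTime();

        System.out.printf("loop:   index %d in %d ms%n", a, (t1 - t0) / 1_000_000);
        System.out.printf("stream: index %d in %d ms%n", b, (t2 - t1) / 1_000_000);
    }
}
```

(Naive nanoTime timing like this has all the warmup problems discussed further down the thread; it only sketches the shape of the test.)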

Still, I really like some of the new constructs that Java is getting - they make the language a bit more expressive, lack of which has always been my main gripe with the language.



Write code to be as clear and expressive as possible, then optimize for performance when you know performance is a problem. This is why I don't mind the performance cost of new language features like this. 90% of the time it won't matter, and you can optimize the other 10.


That's the mindset that eventually makes everything slow, and you can't easily see that 10% anymore because it gets spread out further with each abstraction you introduce.

When loops stop looking like loops, it makes it rather harder to find them.


Aka "flat profile". Using these things in full force requires really banking on JIT doing the right thing (if perf is a concern), which is optimistic. A simple loop is just as readable as these one liners, and carries less risk of not being compiled as tightly.


These constructs don't always make code any easier to understand / maintain. I have played a lot with NetBeans' feature that automatically translates loops into their "functional" equivalent. Sometimes the code was so clever I couldn't understand what was happening.

Sure, code is more compact, but compactness is not an end in itself.


Compactness is more of a side effect and can be a detrimental one. Personally I find that this kind of code is more declarative of intent than the more imperative for loop. By using particular functional tools for the job you're avoiding any possibility of a bug in your looping construct and making your purpose explicit.

I also like how it removes the boilerplate for handling different container types, making it easier to switch to a different one.
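As a sketch of that "declarative of intent" point (the class, method, and data names here are mine, purely for illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class DeclarativeExample {
    // Imperative version: the loop machinery is explicit,
    // and the result container is hard-coded
    static List<String> shortNamesLoop(List<String> names) {
        List<String> result = new ArrayList<>();
        for (String n : names) {
            if (n.length() <= 4) {
                result.add(n.toUpperCase());
            }
        }
        return result;
    }

    // Declarative version: the intent (filter, then map) reads directly,
    // and the collector decides the container
    static List<String> shortNamesStream(List<String> names) {
        return names.stream()
                .filter(n -> n.length() <= 4)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }
}
```

Swapping the collector (e.g. `Collectors.toSet()`) is all it takes to change the output container, which is the boilerplate-removal point above.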


This reminds me a lot of Resharper's refactoring common things to LINQ in Visual Studio. A lot of times I just ran it thinking "oh wow that's a neat way to do it" and then reverted it back because it wasn't a common idiom or it made it much harder to understand.

The same thing is going to happen with Java 8 until the feature becomes less shiny.


Sometimes it is the loop itself that contributes most of the processing time (imagine a big array of ints and some simple transformations). In such cases the refactoring would be easy, but anyway...


If you want new language constructs, why bother writing 90% of your Java code in pure Java to begin with anyway? Use something that compiles to bytecode like Groovy. Profile your app, and write whatever is performance critical in pure Java.


You chose a scripting language as your example. The profiler would probably tell you to rewrite the entire app in Java, which can be tricky because although the Groovy code uses the Java syntax, the semantics are often different. If you use a language for building systems, the profiler would probably tell you nothing needs rewriting.


>(Insert 100M trues and one false at the end, and find me the false)

That sounds like arrays benefited heavily from the easy branch prediction.


Echoing everyone else, Java microbenchmarking is annoying. You should probably use Caliper or whatever anyone else posts.

Of course since I've never used any of them, here's my terrible example with stream beating for loop.

https://gist.github.com/anonymous/2395fb0728e491bc54f5

  Warmup
  Warming up done
  9262288 # for loop
  6156414 # stream
  Done

edit: tuning WARMUP_RUNS to something lower (like 500 vs 10k) and for loop wins consistently vs warmup runs at 10k.
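A rough illustration of why the warmup count changes the result: the first calls run interpreted, and only after enough iterations does the JIT compile the hot method at full tier (the workload here is a hypothetical stand-in, not the gist's code):

```java
public class WarmupDemo {
    // Hypothetical workload standing in for the benchmarked loop
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += i;
        return sum;
    }

    static long timeOneRun(int warmupRuns, int n) {
        // With few warmup runs, work() may still be interpreted when timed;
        // with many, it has likely been JIT-compiled, so the timing drops.
        for (int i = 0; i < warmupRuns; i++) work(n);
        long t0 = System.nanoTime();
        work(n);
        return System.nanoTime() - t0;
    }

    public static void main(String[] args) {
        System.out.println("cold:   " + timeOneRun(0, 1_000_000) + " ns");
        System.out.println("warmed: " + timeOneRun(10_000, 1_000_000) + " ns");
    }
}
```

Which of the two competing versions "wins" can flip depending on which one got compiled first, which is exactly why a harness like JMH isolates warmup per benchmark.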

If I turn on -XX:+PrintCompilation I see some extra compilation output, but I don't really understand it. I assume some of this contributes, but there's way more output than I expected, tbh:

    4929  263       3       java.lang.invoke.LambdaForm$DMH/1581781576::invokeStatic_L_L (14 bytes)   made not entrant
    4929  339       4       java.util.function.Predicate::isEqual (20 bytes)


Yes, your example is bad. If the JIT inlines through, the code is trivially dead and can be removed. So yes, please use a proper benchmark harness (JMH).


Ya I thought it might mark it as dead but even if I append the result to something like a List and print the list at the end (so it can't just not run the code?), Stream wins. Anyway I'm off to work for today, maybe I'll post in evening.

edit: https://gist.github.com/anonymous/ed0d8f4a5c6553fe8435

Is there some way in this example it could not actually run the code here?


Well, you're not really testing for loops, since there are other artifacts here:

1) the forEach driver method is receiving multiple types, so it's not monomorphic

2) you may be hitting OSR compilations

3) the for loop may hit range checks on each get()

4) the for loop version warms the cache for the stream version, and this benchmark is memory-reference heavy

So, please try to use JMH to get more accurate picture. And, as mentioned, this isn't really testing for loop vs streams.


I'm really surprised that it isn't a zero cost abstraction, doesn't the Java JIT inline?


HotSpot (the OpenJDK JVM) does, and it should in this case, too, but this usage suffers from "the inlining problem"[1] and/or the profile pollution problem[2]. These are problems that are continuously addressed and improved with each release, but have not yet been satisfactorily resolved.

[1]: http://www.azulsystems.com/blog/cliff/2011-04-04-fixing-the-...

[2]: https://wiki.openjdk.java.net/display/HotSpot/MethodData


I think Graal does more aggressive inlining than Hotspot.

Looking forward to the day it will be in the reference JDK.


I don't think the 100M entries example suffers from inlining or profile pollution. It's just that the manual loop is going to be as tight as you can get it and it's likely the stream version leaves artifacts behind that are noticeable when the loop kernel is dead simple like this.

-XX:UnlockDiagnosticVMOptions -XX:+PrintInlining will tell you whether this inlined, and dumping the JIT asm can be done to see what was actually generated.


Did you see that they're planning on making Graal a plugin of the standard OpenJDK build in Java 9? It's right there: http://openjdk.java.net/jeps/243


I know, but I imagine it will only be fully merged if Project Sumatra turns out to be integrated.


Sumatra is dead, AFAIK. But why would graal depend on sumatra for integration? Bigger challenge is how to bootstrap graal itself such that compilation time and thus time to peak perf isn't degraded substantially.


Really?!

> But why would graal depend on sumatra for integration?

The other way around: since Sumatra depends on Graal, integrating Sumatra means Graal also has to be integrated.

Pity; ever since I learned about Maxine and JikesRVM back in the day, I have looked forward to the day the reference JVM would be meta-circular.


Sumatra was about GPU offload, not so much metacircular JVM. If you look at sumatra dev mailing list archive, you'll see the last email there states it's not in active development. The project appeared to have been driven by AMD, but they may have re-prioritized things.

I also don't think it's currently possible to write the bulk of the JVM in java, if you want comparable performance and memory footprint to Hotspot.


Sumatra was using Graal.

> I also don't think it's currently possible to write the bulk of the JVM in java, if you want comparable performance and memory footprint to Hotspot.

Better check Graal and JikesRVM research papers then.

One reason why reference JDK JIT doesn't get rewritten is the ROI.

Just check how long it has taken to rewrite the C# and VB.NET compilers while keeping the new compilers 1:1 compatible, or the new RyuJIT and the multiple AOT compiler iterations in .NET land.


As already mentioned, Graal is just the JIT compiler, it's not the entire VM. JikesRVM is a research VM, which has different needs/characteristics from production JVMs.


But why do you think Graal wouldn't be integrated? They've already started the process with Sumatra or without.


Maybe with Sumatra the integration would be deeper than just an API, e.g. replacing HotSpot completely and also adding SubstrateVM into it.

I don't know, let's see how it turns out.


Graal is not a HotSpot replacement. It's a JIT for HotSpot or an AOT compiler for SubstrateVM which is a separate JVM altogether. If Graal matures and proves itself, it will become HotSpot's JIT. And Substrate may or may not become a product regardless.

And project Sumatra -- while cool -- was never a big influence over OpenJDK's plans. Being able to run streams on GPUs is absolutely awesome, but not the number one priority for the majority of Java users. My point is that Sumatra wouldn't have played a significant role in the decision of when to make Graal HotSpot's default JIT.

BTW, you don't even need Graal to be the default JIT in order to support Sumatra, anyway. Graal as a plugin (JEP 243) is good enough for that.


I don't think you want a production/industrial JVM written in java :).


It's not the whole JVM -- just the JIT. What difference does it make what language the JIT is written in? It is my understanding that if Graal proves itself, it will replace C2 (if not C1 as well).


I was replying to pjmlp's comment:

>Maybe with Sumatra the integration would be deeper than just an API, e.g. replacing HotSpot completely and also adding SubstrateVM into it.


There are a few commercial ones. :)


Which ones?


I think Java 9 paves the way to (or actually includes) JIT caching.


Don't think there's anything in 9 for JIT caching unless I missed it - do you have a reference? JIT caching is a non-trivial problem for HotSpot due to the nature of speculative optimization, so it may take quite some time to appear. Having said that, Azul has some form of it in its ReadyNow feature, but I don't know the details.

I think likely some form of AOT will be needed.


It's related to Project Jigsaw, but I don't know if it's actually scheduled for Java 9. I assumed the idea is to cache C1 output, not C2. I think I saw it in one of Paul Sandoz's videos. I'll look for it, and if I can't find it, I'll ask Paul.


Yes there is: JEP 197, Segmented Code Cache.

http://openjdk.java.net/jeps/197

This is just the early work to be improved in later versions.

However there are commercial JVMs, like J9 that already do JIT caching.


Segmented Code Cache has nothing to do with JIT caching. Or rather, it has as much to do with it as the existing homogenous code cache.


But you'll be able to use it as a 100% Java plugin regardless.


It depends on which JIT you mean, there are plenty to choose from.

In any case, at least JITWatch, Solaris Studio, and Intel VTune Amplifier allow you to look at the generated assembly.


I think there's some other secret sauce happening in there, especially since making the process parallelized is as easy as changing .stream to .parallelStream.
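For instance (a toy sketch; the class name and workload are made up), the sequential and parallel versions differ by a single call:

```java
import java.util.Arrays;
import java.util.List;

public class ParallelSwitch {
    static long sumOfSquares(List<Integer> xs) {
        return xs.stream()
                .mapToLong(x -> (long) x * x)
                .sum();
    }

    static long sumOfSquaresParallel(List<Integer> xs) {
        // The only change: .stream() -> .parallelStream().
        // The work is then split across the common ForkJoinPool.
        return xs.parallelStream()
                .mapToLong(x -> (long) x * x)
                .sum();
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, 2, 3, 4);
        System.out.println(sumOfSquares(xs));          // 30
        System.out.println(sumOfSquaresParallel(xs));  // 30
    }
}
```

Whether the parallel version is actually faster depends on the workload size and the per-element cost; for tiny or memory-bound loops the fork/join overhead can dominate.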


That's not a good benchmark; it doesn't take JIT warmup into account.

Should have used JMH instead.

http://openjdk.java.net/projects/code-tools/jmh/


Poo. Nice analysis, btw.



