
One good reason to use a VM: JITs can generate very, very fast code. HotSpot has gotten so good that there are several cases where numeric Java code I've written is faster than reasonably tuned C.


Can you show us some Java code that's faster than reasonably tuned C compiled with a good C compiler with all optimizations on? Performance issues in Java are often related to memory management and lack of control over the layout of your objects. JITs also have real-time constraints; you can't sit around all day while the JIT does its thing.


I do machine learning, so I don't particularly care about object layout: almost everything I deal with is arrays of floats, doubles, or longs. I'm also not bothered by GC issues, since I simply iterate over the same arrays many times. Lastly, with the kind of code I run, JIT compilation of the hot spots finishes long before the first iteration over the data is complete.

A coworker bet that Java would be within 30% of similarly structured C code. I didn't believe him, so I rewrote two of my company's computationally expensive algorithms: a variant of matrix factorization and a neural network.

The production version of the C code used some tricks that would not be possible in Java, so I first rewrote the C versions to be a bit simpler. I then ported the simpler C programs to Java, a fairly straightforward and mostly mechanical job.

The MF code was 5-6% slower in Java than in C; the RBM code (the neural network) was about 1% faster than the C.

Granted, the tricks I was able to exploit in the production versions made those versions decisively faster than the simple Java version, but Java was, and is, much better than I realized.


I'd be interested in seeing the code. My perspective is that with C or C++, getting something to run fast is a process. The first naive implementation will run pretty fast with a good compiler, but then you whip out the profiler and look at the generated assembly. It's not just the language; it's also the tooling. As an example, you might find that to make good use of SSE you want to process multiple matrices concurrently. You might arrange your input data so that you can quickly load the corner elements of 4 matrices into an SSE register, and further rearrange things so that you hide the latency of certain instructions. A good compiler can do some of that for you, but the biggest thing is having visibility, and being able to exert control, at this level. This can make an order-of-magnitude difference in performance.

Now, with a VM you can't really do that. Even if you could for one implementation of the VM, you might get terrible performance on another. The run-anywhere VM approach means you're giving up the ability to fine-tune things, and there's really no way around that. All you can do is try to minimize the impact, and I guess JIT compilation is one way. It's certainly true that it's a lot faster than it used to be, but presumably there are some sweet spots (patterns the JIT is very good at) and some less-than-sweet spots (patterns where it isn't), and your visibility and ability to engineer things are reduced...


So, you wrote the C code to be slower, and then Java seemed fast? :-) I do numeric software for a living, similar to you: lots of arrays allocated once and iterated over many times. It would be great if Java could cut it, but I'll never believe the claim that Java is nearly the speed of C until someone shows me a Java FFT that compares to FFTW. One simple benchmark, and I'll believe the Java believers.


I work on ML code as well.

One awesome trick the JIT does: in the case of child classes with virtual functions, if only one child class is used in a given run, it can remove the virtualness and inline the function used.

So, for example, one place you would see this is with distributions: say class Distribution has a virtual member function like deviance. For a given run of your code, you only ever use one distribution, and the loop that calls deviance is hot enough that you can see the slowdown from the vtable lookup and the function call. The Java JIT will inline the deviance function for the distribution you actually use (assuming the method is small enough, which it often is). The only way to get this behavior in C++ is to lift the decision out of the hot loop and generate one function for each distribution. I've done this and seen noticeable performance gains (10-15% on runs that take hours or days), but it does not make the code nice to work with.


Very much true, but Sun had to pump lots of research money into it to make it so.

Had it been an optimizing compiler since day one, it wouldn't have had to be like that.


Well... a lot of HotSpot's potential comes from runtime profiling, which is not necessarily the same as having an optimizing compiler generating code at compile time.


Modern C compilers have profile-guided optimization (PGO) for that. You run your code, profile it, and the optimizer uses that data in the next pass. So this is something you can get without a JIT compiler.


In my experience, PGO is something that relatively few C/C++ programmers know about, or at least use... roughly the same percentage of Java programmers know how the JIT works. But all Java programmers get to benefit from the JIT.

Don't get me wrong, a lot of Java bugs the hell out of me, and I'm personally much more comfortable writing C++. I'm just pointing out that code running on a virtual machine isn't some horrible backwater.



