This is a pretty crappy article in my opinion, and certainly won't do anything to help the problem that the author describes - that developers' knowledge of concurrency comes piecemeal without any structure.
I challenge anyone to gain an understanding of volatile variables from his description. His example of volatile vs. synchronised contains an example of incorrect code but not an example of how it should be fixed, and doesn't even contain a particularly clear explanation of why it's bad.
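For the record, here's the fix the article should have shown. This is my own hypothetical example, not the author's code: the classic stop-flag bug, where a plain boolean can be hoisted out of the loop by the JIT so the worker thread never observes the update. Declaring it volatile guarantees the write is visible to other threads.

```java
// Hypothetical example (mine, not the article's): the classic stop-flag bug.
// Without volatile, the JIT may hoist the read of 'running' out of the loop,
// so the worker thread never sees the update. Declaring the field volatile
// guarantees the write in stop() is visible to the reading thread.
class Worker implements Runnable {
    private volatile boolean running = true;  // volatile fixes the visibility bug

    public void stop() {
        running = false;
    }

    @Override
    public void run() {
        while (running) {
            // do work...
        }
    }
}
```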
The explanation of the bytecode generated for a synchronised method compared to a synchronised block is curious if you're into bytecode but totally useless for a working programmer, and offers no guidance on choosing between them. Worse, it actually makes it sound like the option you almost certainly don't want (the synchronised method) is somehow more efficient than the almost universally better method (synchronising on a lock object, which provides much better lock granularity, prevents deadlock in the case of some external object claiming your object lock and generally makes lock ownership much clearer and easier to track).
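To make the recommendation concrete, here's a sketch of the private lock object pattern (my own example, not from the article). Because nothing outside the class can synchronise on the lock, no caller can accidentally hold it and deadlock you, and every acquisition is visible in one file:

```java
// Sketch of the private lock object pattern (hypothetical example).
// No code outside this class can synchronize on 'lock', so no external
// code can hold it and cause deadlock, and lock ownership stays obvious.
class Counter {
    private final Object lock = new Object(); // inaccessible to callers
    private long count;

    void increment() {
        synchronized (lock) {  // not synchronized(this), not a synchronized method
            count++;
        }
    }

    long get() {
        synchronized (lock) {
            return count;
        }
    }
}
```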
Finally, his example of the AtomicReferenceFieldUpdater ignores the javadoc (This class is designed for use in atomic data structures in which several reference fields of the same node are independently subject to atomic updates) and proceeds to show a contrived example of a compareAndSet that is really only a more confusing set, and doesn't actually achieve anything at all.
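By way of contrast, here's a sketch (mine, not the article's) of a compareAndSet that actually earns its keep: a Treiber-style lock-free stack, where the head is swung atomically and the loop simply retries on contention.

```java
import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

// Hypothetical example of a meaningful compareAndSet: a Treiber-style
// lock-free stack. The updated field must be volatile, and its name must
// match the string passed to newUpdater exactly.
class Node {
    final int value;
    volatile Node next;
    Node(int value) { this.value = value; }
}

class Stack {
    private static final AtomicReferenceFieldUpdater<Stack, Node> HEAD =
            AtomicReferenceFieldUpdater.newUpdater(Stack.class, Node.class, "head");
    private volatile Node head;

    void push(int value) {
        Node n = new Node(value);
        Node h;
        do {
            h = head;
            n.next = h;
        } while (!HEAD.compareAndSet(this, h, n)); // retry if another thread won

    }

    Integer pop() {
        Node h;
        do {
            h = head;
            if (h == null) return null;            // stack empty
        } while (!HEAD.compareAndSet(this, h, h.next));
        return h.value;
    }
}
```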
There are much better sources to learn this stuff if you really need to know it, starting with Java Concurrency In Practice - no Java programmer should be writing concurrent code without having read the first 7 chapters of this.
Agreed. It's hard to discern the real value of using synchronized methods vs synchronized blocks when you end up reading: "Creating the synchronized block yielded 16 lines of bytecode, whereas synchronizing the method returned just 5". As you point out, synchronization on a lock object is, as a rule of thumb, a much better option than synchronizing on the intrinsic lock, and thus losing control over it.
The explanation of volatile is woeful, to say the least. He doesn't even mention the main issue that volatile addresses (the only one?): visibility. Anyone really interested in this should read the following article by the author of JCIP: http://www.ibm.com/developerworks/java/library/j-jtp06197.ht...
I liked the section regarding ThreadLocal though (an example would've helped a bit).
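Something like this is the example I'd have liked to see (hypothetical, not from the article): SimpleDateFormat is not thread-safe and is costly to create, so giving each thread its own copy via ThreadLocal is the classic use case.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical ThreadLocal example (not from the article): each thread
// lazily gets its own SimpleDateFormat, avoiding both the thread-safety
// problem and the cost of creating one per call.
class DateFormats {
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
            ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));

    static String format(Date d) {
        return FORMAT.get().format(d); // each thread sees its own instance
    }
}
```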
Any limits depend on the particular JVM that you are using. In practice one needs to experiment.
A few years back I did some comparisons between Java and Erlang with respect to the number of threads the system could handle. Somewhere around a thousand threads was sufficient to make the JVM go very slowly.
See Kilim (http://www.malhar.net/sriram/kilim) for a lightweight thread solution for Java that easily outpaces Erlang for thread creation and message passing. It's the basis for the Erjang project which runs Erlang bytecode on the JVM. While it's faster at a low level, it obviously doesn't provide everything that Erlang does (supervisor hierarchies etc). It's still pretty impressive that this can be achieved on the JVM though.
Well, sure. Erlang processes aren't threads. They're more like pre-empted coroutines, with a scheduler that factors in when they're blocking and waiting to receive a message (among many other things).
What would you say the essential difference is between pre-empted coroutines with a scheduler and threads?
The only thing I could think of is that with coroutines one can explicitly pass control over to another coroutine, but that is something that Erlang does not provide, so I don't really understand what you mean.
Erlang processes are managed by its scheduler, while threads are managed by the OS. Erlang has more info about appropriate stack sizes, garbage collection is done per-process* , it only pre-empts processes that are currently working or recently got a new message, and probably a half dozen other optimizations I'm not aware of. Let me know if you have more questions, I really like Erlang and I've wanted an excuse to research this. :)
* And often doesn't need to be done at all, because the process terminates first.
In my case it was about trying to make Java threads work in a situation where pre-emptive scheduling was needed for very many threads. The problem with the JVMs I tried was that they all started too many OS threads, and I couldn't get them to limit this enough. I might have done something wrong though. My general feel is that the threading model is too underspecified in Java.
I haven't been doing Java for a while now, but I hope that the situation has become better. The Erlang model of starting threads for all the tasks to do is very alluring once one gets used to it :)
It may be very alluring but it's not going to work on Java since the model is different, and if you try to emulate Erlang on Java you're in for at least one surprise. This doesn't mean it's underspecified, in fact Java has arguably the most completely specified concurrency and memory model around. You just have to understand how it works to get the most out of it, like anything.
The memory model is nicely specified, given that the program is race-free. If it isn't, then all bets are off. There are a lot of programs that are not race-free (although the races themselves might be benign), and as such have a very weak memory model.
The concurrency model does have a very weak specification. It essentially says that a) threads exist, and b) threads with the highest priority will run at some point in time. This specification was left purposefully vague, AFAIK, so that a JVM implementation can rely on OS threads. Unfortunately, it also means that the programmer has to be aware of the specifics of all relevant JVM+OS combinations where a particular program is run. This is what I meant by underspecified, although I understand the reason for it.
This is absolutely not true. If your program is not race free, then you're going to have problems no matter what your concurrency model is. The memory model is not a property of the program, it's a property of the platform - since in Java you're programming to the JVM platform, the fact that the platform is very exactly specified is what allows you to show that your program is race free.
I think you're looking for a very different sort of specification than what the JMM provides. It doesn't provide you with exact specifications of exactly how threads will behave, but no system that I'm aware of completely specifies that and any program that relies on that is fundamentally broken on any platform except exactly one version of one platform anyway - that which it was written and tested on. It's simply too dependent on the exact OS and version, the exact version of the processor, and a million other factors you can't control.
Instead, the JVM, the JMM and company work together to exactly specify the interactions between threads that are safe. It turns out that this really is all you need to know to write correct concurrent programs, and you don't have to worry about combinations of anything, or how threads are implemented, or how many processors are available, etc etc. Additionally, with the j.u.c libraries in Java 5+ there are sufficiently good abstractions that you don't need to worry about any of this low level gore much.
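A quick sketch of what "sufficiently good abstractions" means in practice (my own hypothetical example): submit tasks to a pool and collect results through Futures, with no explicit threads or locks anywhere in sight.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example of the j.u.c abstractions: compute a sum of squares
// across a fixed thread pool, with Futures hiding all the low-level gore.
class SquareSum {
    static long sum(int n) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                final long x = i;
                futures.add(pool.submit(() -> x * x)); // runs on a pool thread
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get(); // blocks until that task completes
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```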
Seriously, if you're going to be writing multithreaded code in Java, read the book. It's awesome.
No, the JMM specification is not vague. Abridged, it provides a minimum set of guarantees in terms of data visibility between threads -happens-before rules- that the application programmer can expect. Think of it as an abstraction layer that portrays a uniform view of the myriad of hardware configurations on which the JVM runs -e.g. cache coherence strategies-, thus relieving the programmer from the burden of knowing the nuts and bolts of the target system for which their code is intended.
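A concrete happens-before example (mine, for illustration): the write to the plain field happens-before the volatile write to the flag, and that write happens-before any read that observes the flag as true. So a reader that sees ready == true is guaranteed to see data == 42 as well, even though data itself is not volatile.

```java
// Hypothetical example of a happens-before edge: the volatile write to
// 'ready' publishes the earlier plain write to 'data'. Any thread that
// reads ready == true is guaranteed to also see data == 42.
class Publisher {
    int data;                 // plain field, published safely via the flag
    volatile boolean ready;   // the volatile write/read forms the edge

    void publish() {
        data = 42;
        ready = true;         // all earlier writes become visible with this one
    }

    Integer read() {
        return ready ? data : null;
    }
}
```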
This is a little misleading. It may be true that the Java specification doesn't require Java threads to map onto system threads but for all intents and purposes they're the same in all real implementations. The Executor framework doesn't change this, it gives you a flexible framework for task scheduling using thread pools.
A lot of my earlier work was in formal specification of JVM thread behaviour, so I was very interested in what the standard said and required. I agree that in practice, it probably will map to threads, which is unfortunate. The experiments were mostly to satisfy my curiosity on how stuff behaved in the real world.
For people actually interested in parallel programming, particularly on Java, Herlihy's _The Art of Multiprocessor Programming_ is a fantastic introduction. On top of the grab-bag of techniques you'd expect, it boasts enough formal treatment to help folks learn to reason about correctness in their own programs.
The only part I found interesting was the first part, but only because it raises an interesting question that the author (unfortunately) didn't address:
Supposing you want to synchronize only one part of your method, which approach would cause the JVM to do less work: putting that part in a synchronized block (thus generating extra bytecode instructions) or extracting it into a synchronized method (thus generating an additional call)?
Not that it's really important -- micro-optimizations like that are silly -- but it does make me curious.
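For the curious, the two shapes side by side (hypothetical example). Both acquire the monitor on `this`; in practice the JIT will usually inline the extracted method, so neither the extra bytecode of option A nor the extra call of option B is likely to matter.

```java
// The two shapes under comparison (hypothetical example). Both acquire
// the same monitor (this); the JIT typically inlines the extracted call,
// so the bytecode-size difference is noise.
class Example {
    private int state;

    // Option A: synchronized block around just the critical part
    int incrementInline() {
        synchronized (this) {
            return ++state;
        }
    }

    // Option B: critical part extracted into a synchronized method
    int incrementExtracted() {
        return criticalPart();
    }

    private synchronized int criticalPart() {
        return ++state;
    }
}
```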
The article really fails here, this is a horrible example and horrible reasoning.
I'm no Java expert, but the reason you'd synchronize a smaller block of code, or use any finer grain lock, rather than a whole method is that you'd have a shorter time to wait if many threads are competing for the lock. You'd increase utilization.
Lines of bytecode doesn't really tell you anything, especially since the synchronized method is a stub that does nothing. Not only that, but the synchronized method does all its synchronization work implicitly; the JVM still has to acquire the lock somehow, it's just not written in the bytecode.
What really matters is what happens when the bytecode is compiled to native code, where you could actually see the synchronized method's call overhead. If the article had analyzed that, it would have been helpful.
Reference updates are atomic, but without volatile (or more heavyweight synchronization) there may be an arbitrary delay before the change is visible to other threads.