David R. MacIver's Blog: Java collections and concurrency

Java collections and concurrency

14 January 2008

This is a general tip about Java collections and concurrency. I’m not the best person to write about this, so I’m going to keep this post limited to a simple note, but it’s an important point which far too many people get wrong.

There are various methods in Collections such as synchronizedList, synchronizedMap, etc. These are for wrapping non threadsafe collections in a way that synchronizes important operations.

Don’t use them. Ever.

In a similar theme, never write code that looks like the following:

synchronized(myMap){
  doStuffTo(myMap);
}

Concurrency is not an afterthought. If you’re going to be doing concurrent programming you should be using datastructures designed for concurrent use. java.util.concurrent has a number of good ones. Further, you should avoid explicitly synchronizing if at all possible and have your structures be internally threadsafe. If you try to ensure thread safety by synchronizing on the structures you’re mutating you will

a) Make a mistake. Almost certainly. This will introduce bizarre bugs which you will have a serious headache tracking down.
b) Have worse concurrent performance than using a properly designed datastructure - e.g. a ConcurrentHashMap has finer grained locking, so it actually is possible for multiple threads to write to it in a safe manner.
c) Have really ugly code with synchronization logic spread all over the place. This is not a minor point - if your threading code is simple, it’s much easier to determine if it’s correct (although still not easy).

Comments

Andreas Krey on 2008-01-14 20:37:00:

Concurrency is not an afterthought.

No, but neither is an explicit synchronized. I often enough run into cases where having a synchronized map isn’t enough, like needing to insert an object into a map only if there isn’t one yet at that place: get and optional put, and those two must be in one synchronized block.

Hence, I need to use synchronized, and since I cannot rely on a ConcurrentHashmap synchronizing itself on its whole this, I need to wrap all other accesses just as you showed not to.

Basically, multithreading is stupefyingly hard, especially if you want it performing fast, correct and deadlock-free, and the specific thing you need hasn’t been addressed in the standard (container) libraries.

(I notice that ConcurrentHashmap has putIfAbsent. Good, but I still need to create the object every time I try.)

David R. MacIver on 2008-01-14 22:57:00:

Absolutely. Sometimes you need more than a single point of synchronization. But when this is the case, I would still strongly discourage the use of synchronized constructs spread throughout your code and instead build a more suitable synchronized datastructure using the building blocks java.util.concurrent provides you with. In particular I tend to build internally synchronized structures with a ReentrantReadWriteLock rather than synchronization.

The problem with using synchronized is that you now have to use it on every single access to that structure (more or less) or your code is incorrect. It puts an unnecessarily high burden on your code for correctness. In your example, I’d be amazed if creating the object every time you tried was actually as much an overhead as synchronizing in concurrent access would be. Additionally, the following code is correct without synchronization:

value = myMap.get(key);

if (value == null){
existingValue = myMap.putIfAbsent(key, newEntry(key));
if (existingValue != null) value = existingValue;
}

This can very rarely result in newEntry being called unnecessarily, but sufficiently rarely that it’s not worth worrying about.

I agree that multithreading is stupefyingly hard. My note is in no way meant to be a definitive guide to writing good multithreaded code! And despite my absolute statements it’s not really the case that explicit synchronization like that is always wrong. Just most of the time. :-) (Using the synchronizedCollection methods *is* always wrong though)

Tom on 2008-01-15 09:14:00:

Ok. Having once coded my own multithreaded OS, including spinlocks and stuff, I can appriciate the finer details of concurrency.

One issue still eludes me. I really like the concept of a business model: you put as much non UI related logic into a single JAR. Then you can have a swing, web or EDI application use that jar and know that all constraints and restrictions are met.

Great. Works like a charm too. However, certain operations may take a while, so for Swing you may want to spawn a separate thread. As of that moment my business model must be thread safe... Oh my.

So I either have one-thread-per-entity-manager or I need a way to make the BM behave right.

Any suggestions?

David R. MacIver on 2008-01-15 11:29:00:

I’m really far from being the best person to ask, but I’ll try to suggest something!

The short answer is “Make your business model threadsafe”. If you can’t guarantee the threading model of the application that’s talking to it then you don’t have a lot of choice.

The best way to do this is to make it almost entirely stateless. My preferred approach is approximately

a) Data objects should be immutable (This is hard. Too many fucking tools require some approximation to the javabean spec. In particular I don’t know how to make this work with JPA. Hibernate might have more specific tools for constructor mapping. I know ibatis does).
b) Almost all your business logic should live in entirely stateless objects.
c) Any classes containing mutable state must be internally synchronized. Preferably using some explicit read/write lock. You will need to think really hard about these classes in order to make sure they’re correct, so they should be as small as possible and as infrequent as possible! Most of these can optionally be replaced with a database if that’s your thing.

An alternative approach is to force some sort of specific threading model on your business logic and communicate via some sort of CSP / actors based approach. But I’ve not actually tried this. :-)

Tom on 2008-01-15 12:03:00:

I see where you are going. However, let’s consider this scenario: there are two worker threads doing their thing in the business model. Both of them updating properties and adding and removed entities. At the same time a Swing window is open showing one of the properties in a text field, bound by a binding library (e.g. jgoodies).

Issues to take into account:

1. a property is changed and via the PCE and binding the worker (non EDT thread) will alter a Swing component.

2. one worker is walking through a collection (e.g. items in a bill) while the other worker is adding to it.

Both issues have nothing to do with state. The second is especially interesting, since it is totally BM internal: should the list be synchronized?

David R. MacIver on 2008-01-15 12:22:00:

Um. Both of those are almost 100% concerned with state. Are you using some meaning of the word of which I am not aware?

Making your business model threadsafe doesn’t automatically resolve all threading issues, but it makes them cross the line from intractable to tractable.

Tom on 2008-01-15 13:10:00:

Maybe we see the concept of “state” differently? State to me means: information associated with a process. For example: an order proces is in the “payment” state.

Stateful would be a servlet that has an instance variable with data associated to one request. That cannot be. Servlets need to be stateless, hence have no data associated with the process of handling one request.

So. Then.

The fact that an entity contains a list of other entities, I do not consider state, that is just, ehm, data?

But the worker thread modifying the list could be considered a process. And thus is the modified list stateful. Hence the BM would not be stateless at the time of modifying.

How does one make the BM stateless? Should the worker thread construct the modified list internally and then “set” it as a whole using a synchronized method, instead of using “add” and “removed” methods?

The BM does not know it is being constructed internally, so it may call all kinds of methods and setters. I do not see that happening.

I’m trying to get my finger behind an overall concept of how to handle a BM in a multithreading environment. So your blog entry lured me in :-)

At the moment I’m coming to the conclusion that either the BM is totally not threading-aware and the application must solve it. Or maybe the Swing alike “one thread for the BM”, so you must hand over any BM calls to the BM-thread.

Ben on 2008-01-16 11:36:00:

In the future, I’d recomend rephrasing these types of posting. It really, really sounds like someone who discovered Java 5’s util.concurrent and now wrote a blog post preaching how everyone else is doing it wrong. A lot of us already migrated. Heck, a really large number of us were using Doug Lea’s library for years and read his book. So I know you don’t mean to come off this way, but blanket statements generally cause me to think the person just wants to get on a blog roll and sound superior.

Java synchronization has gotten pretty fast - its biggest problem was (a) giant lock usage, (b) poor performance at high contention. You can get excellent performance even with synchronized blocks if you minimize those two facets.

I’ve written quite a bit of concurrent code and a fair amount of lock free code too. Its really not that hard and I think those who do find it challenging lack a deep understanding of the hardware. If you understand those details, the outcomes never seem surprising. Take that as a grain of salt, though, since I was a EE in a former life.

I do agree overall. If you have the libraries, there’s no reason not to use them. At the same time, you need to be careful because *many* people come out of these discussions thinking “synchronization is bad” and only synchronize writes, but not reads, believing that minimizes the pain. They don’t fully digest the discussion and can easily write worse code.

David R. MacIver on 2008-01-16 12:33:00:

You’re right, it’s definitely an overstated point and I didn’t write this post very well - it was thrown together quickly

To be honest, I got the emphasis in this post all wrong. The main issue I was objecting to was people using synchronizedCollection methods when there are perfectly good internally synchronized collections available. I see people recommending their usage a lot in ##java and when I call them on it their response is usually along the lines of “Oh. Right. I wonder why I didn’t think of that”. After far too many instances of seeing people recommend incorrect code I got a bit annoyed. Hence the above post, and I stand by my claim that the use of synchronizedCollection methods is flat out wrong.

The general point about synchronization was intended more to be “synchronize internally rather than externally by default” but I stated it rather badly. I definitely don’t mean “don’t synchronize”, just that you should write multithreaded code in such a way that it is correct without external synchronization.

As to the performance, it really wasn’t intended to be the main point. It’s more of a “And on top of all these other problems, your code will be slower!”. I know synchronization is fast (ish). The issue is one of lock contention, not lock acquisition, and the performance point was solely intended to be that ubiquitous synchronized blocks will result in threads waiting on eachother when it’s deeply unneccessary.