[concurrency-interest] LinkedBlockingDeque deadlock?
Ariel Weisberg ariel at weisberg.ws
Wed Jul 8 22:57:55 UTC 2009
Hi,
> The poll()ing thread is blocked waiting for the internal lock, but
> there's no indication of any thread owning that lock. You're using an
> OpenJDK 6 build ... can you try JDK7 ?
I got a chance to do that today. I downloaded JDK 7 from http://www.java.net/download/jdk7/binaries/jdk-7-ea-bin-b63-linux-x64-02_jul_2009.bin and was able to reproduce the problem. I have attached the stack trace from the 1.7 run. It is the same situation as before, except that there are 9 execution sites running on each host. No threads are missing and none have been restarted. Foo Network thread (selector thread) and Network Thread - 0 are waiting on 0x00002aaab43d3b28. I also ran with LinkedBlockingQueue under both JDK 6 and JDK 7 and was not able to recreate the problem using that structure.
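For checking lock ownership from inside the JVM rather than by reading a stack trace, here is a minimal sketch that lists every thread waiting on a lock or condition together with the owner the JVM reports. It uses only the standard java.lang.management API; the class name LockOwnerReport and the method blockedThreads are hypothetical names chosen for illustration and are not part of the application.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class LockOwnerReport {
    // Returns one line per thread that is blocked or waiting on a lock or
    // condition, including the owning thread if the JVM reports one.
    static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Only request monitor/synchronizer details the JVM says it supports.
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(),
                mx.isSynchronizerUsageSupported());
        List<String> report = new ArrayList<String>();
        for (ThreadInfo info : infos) {
            if (info == null || info.getLockName() == null) {
                continue; // thread is not waiting on anything
            }
            String owner = info.getLockOwnerName();
            report.add(info.getThreadName() + " waits on " + info.getLockName()
                    + ", owner: " + (owner == null ? "<none reported>" : owner));
        }
        return report;
    }

    public static void main(String[] args) {
        for (String line : blockedThreads()) {
            System.out.println(line);
        }
    }
}
```

Several threads reported as waiting on the same lock with no owner would match the situation described in the attached traces. Threads parked on a condition also show no owner, so such entries need to be read alongside the stack frames.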
> I don't recall anything similar to this, but I don't know what version
> that OpenJDK6 build relates to.
The cluster is running on CentOS 5.3.
[aweisberg at 3f ~]$ rpm -qi java-1.6.0-openjdk-1.6.0.0-0.30.b09.el5
Name        : java-1.6.0-openjdk           Relocations: (not relocatable)
Version     : 1.6.0.0                           Vendor: CentOS
Release     : 0.30.b09.el5                  Build Date: Tue 07 Apr 2009 07:24:52 PM EDT
Install Date: Thu 11 Jun 2009 03:27:46 PM EDT      Build Host: builder10.centos.org
Group       : Development/Languages         Source RPM: java-1.6.0-openjdk-1.6.0.0-0.30.b09.el5.src.rpm
Size        : 76336266                         License: GPLv2 with exceptions
Signature   : DSA/SHA1, Wed 08 Apr 2009 07:55:13 AM EDT, Key ID a8a447dce8562897
URL         : http://icedtea.classpath.org/
Summary     : OpenJDK Runtime Environment
Description :
The OpenJDK runtime environment.
> Make sure you haven't missed any exceptions occurring in other threads.

There are no threads missing in the application (terminated threads are not replaced), and there is a try/catch block (it prints the error and rethrows) around the run loop of each thread. It is still possible that an exception was swallowed somewhere.
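One way to rule out threads dying quietly is a JVM-wide uncaught exception handler. The following is a minimal sketch using the standard Thread.setDefaultUncaughtExceptionHandler call; the class name LoudThreads, the FAILURES queue, and install() are hypothetical names for illustration, not code from the application.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class LoudThreads {
    // Hypothetical record of failures, kept so a test or monitor can inspect them.
    static final Queue<String> FAILURES = new ConcurrentLinkedQueue<String>();

    // Installs a handler that runs for any thread that terminates because of
    // an exception nobody caught (and that has no handler of its own).
    static void install() {
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            FAILURES.add(thread.getName() + ": " + error);
            System.err.println("Uncaught in " + thread.getName() + ": " + error);
        });
    }

    public static void main(String[] args) throws InterruptedException {
        install();
        Thread t = new Thread(() -> {
            throw new IllegalStateException("boom");
        }, "ExecutionSite: 0");
        t.start();
        t.join();
        System.out.println(FAILURES);
    }
}
```

This only reports exceptions that escape a thread's run method. An exception that is caught and dropped inside application code never reaches the handler, so it would still need to be found by inspecting the catch blocks.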
> A small reproducible test case from you would be useful.

I am working on that. I wrote a test case that mimics the application's use of the LBD, but I have not succeeded in reproducing the problem in the test case. The app has a single thread (network selector) that polls the LBD and several threads (ExecutionSites, and network threads that return results from remote ExecutionSites) that offer results into the queue. About 120k items will go into/out of the deque each second.

In the actual app the problem is reproducible but inconsistent. If I run on my dual core laptop I can't reproduce it, and it is less likely to occur with a small cluster, but with 6 nodes (~560k transactions/sec) the problem will usually appear. Sometimes the cluster will run for several minutes without issue and other times it will deadlock immediately.
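The access pattern described above (one thread calling the non-blocking poll(), several threads calling offer()) can be sketched roughly as follows. This is an illustration only, not the actual test case: the class name DequeStress, the run method, and the thread and item counts are assumptions, and nothing here is known to reproduce the hang.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class DequeStress {
    // Starts `producers` offering threads and one polling thread.
    // Returns the number of items polled, or -1 if the poller did not
    // finish within the timeout (which would suggest a hang).
    static long run(int producers, final int itemsPerProducer, long timeoutSeconds)
            throws InterruptedException {
        final LinkedBlockingDeque<Integer> deque = new LinkedBlockingDeque<Integer>();
        final long expected = (long) producers * itemsPerProducer;
        final AtomicLong consumed = new AtomicLong();
        final CountDownLatch done = new CountDownLatch(1);

        // Single consumer spinning on the non-blocking poll().
        Thread poller = new Thread(() -> {
            while (consumed.get() < expected) {
                if (deque.poll() != null) {
                    consumed.incrementAndGet();
                }
            }
            done.countDown();
        }, "Network Selector");
        poller.setDaemon(true);
        poller.start();

        // Several producers offering results into the unbounded deque.
        for (int p = 0; p < producers; p++) {
            Thread site = new Thread(() -> {
                for (int i = 0; i < itemsPerProducer; i++) {
                    deque.offer(i);
                }
            }, "ExecutionSite: " + p);
            site.setDaemon(true);
            site.start();
        }

        return done.await(timeoutSeconds, TimeUnit.SECONDS) ? consumed.get() : -1;
    }

    public static void main(String[] args) throws InterruptedException {
        long n = run(8, 100000, 60);
        System.out.println(n < 0 ? "timed out (possible hang)" : "consumed " + n);
    }
}
```

All threads are daemon threads so that a hung run does not keep the JVM alive after the timeout. On a machine with few cores this kind of test may simply never hit the problem, which is consistent with the laptop observation above.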
Thanks,
Ariel
On Wed, 08 Jul 2009 05:14 +1000, "Martin Buchholz" <martinrb at google.com> wrote:
[+core-libs-dev]
Doug Lea and I are (slowly) working on a new version of LinkedBlockingDeque. I was not aware of a deadlock but can vaguely imagine how it might happen. A small reproducible test case from you would be useful.

Unfinished work in progress can be found here:
http://cr.openjdk.java.net/~martin/webrevs/openjdk7/BlockingQueue/

Martin
On Wed, 08 Jul 2009 05:14 +1000, "David Holmes" <davidcholmes at aapt.net.au> wrote:
Ariel,

The poll()ing thread is blocked waiting for the internal lock, but there's no indication of any thread owning that lock. You're using an OpenJDK 6 build ... can you try JDK7 ?

I don't recall anything similar to this, but I don't know what version that OpenJDK6 build relates to.

Make sure you haven't missed any exceptions occurring in other threads.

David Holmes

> -----Original Message-----
> From: concurrency-interest-bounces at cs.oswego.edu
> [mailto:concurrency-interest-bounces at cs.oswego.edu]On Behalf Of Ariel
> Weisberg
> Sent: Wednesday, 8 July 2009 8:31 AM
> To: concurrency-interest at cs.oswego.edu
> Subject: [concurrency-interest] LinkedBlockingDeque deadlock?
>
>
> Hi all,
>
> I did a search on LinkedBlockingDeque and didn't find anything similar
> to what I am seeing. Attached is the stack trace from an application
> that is deadlocked with three threads waiting for 0x00002aaab3e91080
> (threads "ExecutionSite: 26", "ExecutionSite:27", and "Network
> Selector"). The execution sites are attempting to offer results to the
> deque and the network thread is trying to poll for them using the
> non-blocking version of poll. I am seeing the network thread never
> return from poll (straight poll()). Do my eyes deceive me?
>
> Thanks,
>
> Ariel Weisberg
>