[jira] [Created] (IGNITE-13082) Deadlock between topology update and CQ registration.

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

[jira] [Created] (IGNITE-13082) Deadlock between topology update and CQ registration.

Anton Vinogradov (Jira)
Andrey Mashenkov created IGNITE-13082:
-----------------------------------------

             Summary: Deadlock between topology update and CQ registration.
                 Key: IGNITE-13082
                 URL: https://issues.apache.org/jira/browse/IGNITE-13082
             Project: Ignite
          Issue Type: Task
            Reporter: Andrey Mashenkov


Relevant stack traces:Relevant stack traces:
{code}"sys-stripe-0-#65483%cache.BinaryMetadataRegistrationInsideEntryProcessorTest0%" #85739 prio=5 os_prio=0 tid=0x00007fda80139800 nid=0x5618 waiting on condition [0x00007fdc018e8000]   java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for  <0x00000000fc138298> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283) at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727) at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.lockListenerReadLock(GridCacheMapEntry.java:5032) at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerUpdate(GridCacheMapEntry.java:2262) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateSingle(GridDhtAtomicCache.java:2574) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update(GridDhtAtomicCache.java:2034) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1854) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1668) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3239) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:139) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:273) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:268) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109) at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308) at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1626) at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1246) at org.apache.ignite.internal.managers.communication.GridIoManager.access$4300(GridIoManager.java:142) at org.apache.ignite.internal.managers.communication.GridIoManager$8.execute(GridIoManager.java:1137) at org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:50) at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:559) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119) at java.lang.Thread.run(Thread.java:748)\{code}
{code}"disco-notifier-worker-#65517%cache.BinaryMetadataRegistrationInsideEntryProcessorTest0%" #85777 prio=5 os_prio=0 tid=0x00007fda800a9800 nid=0x5639 waiting on condition [0x00007fdc006d9000]   java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for  <0x00000000fbde5f30> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283) at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727) at org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.localUpdateCounters(GridDhtPartitionTopologyImpl.java:2810) at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler$2.onRegister(CacheContinuousQueryHandler.java:379) at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.registerListener(CacheContinuousQueryManager.java:946) at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.register(CacheContinuousQueryHandler.java:628) at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.registerHandler(GridContinuousProcessor.java:1818) at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.processStartRequest(GridContinuousProcessor.java:1444) at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.access$400(GridContinuousProcessor.java:113) at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor$2.onCustomEvent(GridContinuousProcessor.java:205) at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor$2.onCustomEvent(GridContinuousProcessor.java:196) at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:639) at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.lambda$onDiscovery$0(GridDiscoveryManager.java:510) - locked <0x00000000fb58bbc8> (a java.lang.Object) at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4$$Lambda$91/1259207939.run(Unknown Source) at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2650) at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryManager.java:2688) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119) at java.lang.Thread.run(Thread.java:748)\{code}
The problematic code is in \{{CacheContinuousQueryManager.registerListener}}. It first acquires CQ listener write lock, and then it acquires topology read lock when update counters are being read.During cache update, we first acquire topology read lock and then acquire CQ listener read lock.If some other thread will try to acquire topology write lock in between, those two threads are deadlocked.
The issue seems to be introduced by IGNITE-10755 (topology read lock is inserted inside CQ write lock), so only 2.7/8.7 and upwards are affected.
The issue causes occasional Cache 1 suite hangs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)