Hello,
I want to work on ticket IGNITE-4365 <https://issues.apache.org/jira/browse/IGNITE-4365>. The problem comes from DataStreamerImpl.

Some methods use DataStreamerImpl while holding the GridCacheGateway lock, but DataStreamerImpl#doFlush() contains a "while(true)" loop. If someone calls GridCacheGateway#onStopped() in the meantime, the application can hang: one thread keeps spinning in the loop in DataStreamerImpl#doFlush(), while the other waits to acquire the lock in GridCacheGateway#onStopped().

So I need an expert opinion about DataStreamerImpl#doFlush().

1) Can I just drop the unfinished futures in DataStreamerImpl#doFlush() when someone calls GridCacheGateway#onStopped()? I can track that by adding a volatile boolean flag to GridCacheGateway.
2) Or is it better to change how the futures are completed in DataStreamerImpl#load0(), so that onDone is called with an exception or something like that?

Methods that use or might use DataStreamerImpl under the lock:

1) GridCacheAdapter#localLoad()
2) GridCacheAdapter#localLoadAndUpdate()
3) GridCacheAdapter#localLoadCache()
4) GridDistributedCacheAdapter.GlobalRemoveAllJob#localExecute() (this is exactly what happens in the thread dump in the ticket)
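To make option 1 concrete, here is a minimal JDK-only sketch. It is not Ignite code: `FlushOnStopSketch`, `Gateway` and `flush` are hypothetical stand-ins for GridCacheGateway and the doFlush() loop, and it assumes the gateway behaves like a read-write lock where operations hold the read side and onStopped() takes the write side.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.locks.LockSupport;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class FlushOnStopSketch {
    /** Hypothetical stand-in for GridCacheGateway (assumption: read lock for operations, write lock for stop). */
    static final class Gateway {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private volatile boolean stopping;

        void enter() { lock.readLock().lock(); }
        void leave() { lock.readLock().unlock(); }
        boolean stopping() { return stopping; }

        void onStopped() {
            stopping = true;           // publish the flag before blocking on the write lock
            lock.writeLock().lock();   // waits until every operation has left the gateway
            lock.writeLock().unlock();
        }
    }

    /** Hypothetical stand-in for the flush loop; returns false if the flush was abandoned because of a stop. */
    static boolean flush(Gateway gw, CompletableFuture<?> pending) {
        gw.enter();
        try {
            while (true) {
                if (pending.isDone())
                    return true;
                if (gw.stopping())
                    return false;      // drop the unfinished future instead of spinning forever
                LockSupport.parkNanos(1_000_000L); // short pause between checks
            }
        } finally {
            gw.leave();
        }
    }

    public static void main(String[] args) {
        Gateway gw = new Gateway();
        CompletableFuture<Void> neverDone = new CompletableFuture<>();
        CompletableFuture<Boolean> flusher = CompletableFuture.supplyAsync(() -> flush(gw, neverDone));
        gw.onStopped();                // returns only because flush() checks the flag
        System.out.println("flush completed normally: " + flusher.join());
    }
}
```

Without the `gw.stopping()` check, `onStopped()` in this sketch would never return, because the pending future is never completed and the read lock is never released.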
Alex, can you please share a test that demonstrates the hang?

--Yakov

2017-06-29 14:27 GMT+03:00 Александр Меньшиков <[hidden email]>:
> [...]

I don't have one. I got all the information from the thread dump that you attached to the ticket: one thread is stuck in DataStreamerImpl#doFlush() (called from GridDistributedCacheAdapter.GlobalRemoveAllJob#localExecute()), and the other in GridCacheGateway#onStopped() (called from GridCacheProcessor#onExchangeDone()).

I read about the trouble reproducing it (Alexey Kuznetsov's first comment in JIRA) and decided to approach the problem from a different angle. The code still looks dangerous, so I don't think the problem has resolved itself.

The thread dump involves two tests:

1) GridCacheNearTxForceKeyTest
2) CrossCacheTxRandomOperationsTest

Both passed when run on their own.

2017-06-29 15:31 GMT+03:00 Yakov Zhdanov <[hidden email]>:
> [...]
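
The pattern seen in the thread dump can be illustrated outside Ignite with a small JDK-only sketch. Everything here is an assumption made for illustration: `GatewayHangSketch` is a hypothetical name, a plain `ReentrantReadWriteLock` stands in for the gateway, and the stop attempt uses a bounded `tryLock` so that the sketch terminates instead of hanging.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class GatewayHangSketch {
    /** Returns {stop succeeded while the flush loop was running, stop succeeded after the loop exited}. */
    static boolean[] run() {
        ReentrantReadWriteLock gateway = new ReentrantReadWriteLock();
        CompletableFuture<Void> pending = new CompletableFuture<>();
        CountDownLatch entered = new CountDownLatch(1);

        // Plays the role of the thread stuck in the flush loop while holding the gateway read lock.
        Thread flusher = new Thread(() -> {
            gateway.readLock().lock();
            try {
                entered.countDown();
                while (!pending.isDone())
                    LockSupport.parkNanos(1_000_000L);
            } finally {
                gateway.readLock().unlock();
            }
        }, "flusher");
        flusher.setDaemon(true);
        flusher.start();

        try {
            entered.await();                        // the flusher now holds the read lock
            boolean during = tryStop(gateway, 200); // plays the role of the stop call
            pending.complete(null);                 // let the loop finish
            flusher.join();
            boolean after = tryStop(gateway, 200);
            return new boolean[] {during, after};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Tries to take and release the write lock within the given time. */
    static boolean tryStop(ReentrantReadWriteLock gateway, long millis) throws InterruptedException {
        if (!gateway.writeLock().tryLock(millis, TimeUnit.MILLISECONDS))
            return false;
        gateway.writeLock().unlock();
        return true;
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("stop acquired while flushing: " + r[0]);
        System.out.println("stop acquired after flush: " + r[1]);
    }
}
```

In this sketch the first stop attempt times out because the loop never releases the read lock on its own; with an untimed lock call it would block for as long as the future stays incomplete.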