[jira] [Created] (IGNITE-6530) Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

[jira] [Created] (IGNITE-6530) Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager

Anton Vinogradov (Jira)
Joel Lang created IGNITE-6530:
---------------------------------

             Summary: Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager
                 Key: IGNITE-6530
                 URL: https://issues.apache.org/jira/browse/IGNITE-6530
             Project: Ignite
          Issue Type: Bug
          Components: persistence
    Affects Versions: 2.2
            Reporter: Joel Lang


Just started receiving a great deal of warnings about possible starvation in stripped pool.

The stack trace shows it happens in checkpointReadLock() in GridCacheDatabaseSharedManager.

Here are the log messages:

{noformat}
2017-09-28 13:15:12 [grid-timeout-worker-#15%mbe%] WARN  o.a.ignite.internal.util.typedef.G - >>> Possible starvation in striped pool.
    Thread name: sys-stripe-4-#5%mbe%
    Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearAtomicSingleUpdateRequest [key=BinaryObjectImpl [arr= true, ctx=false, start=0], parent=GridNearAtomicAbstractUpdateRequest [res=null, flags=]]]]]
    Deadlock: true
    Completed: 3212
Thread [name="sys-stripe-4-#5%mbe%", id=19, state=WAITING, blockCnt=12, waitCnt=5835]
    Lock [object=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@2e1993dd, ownerName=null, ownerId=-1]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(Unknown Source)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(Unknown Source)
        at o.a.i.i.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:847)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1770)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1686)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3063)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:129)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:265)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:260)
        at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1042)
        at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:561)
        at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
        at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
        at o.a.i.i.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
        at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293)
        at o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
        at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
        at o.a.i.i.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
        at o.a.i.i.managers.communication.GridIoManager$9.run(GridIoManager.java:1097)
        at o.a.i.i.util.StripedExecutor$Stripe.run(StripedExecutor.java:483)
        at java.lang.Thread.run(Unknown Source)

2017-09-28 13:15:12 [grid-timeout-worker-#15%mbe%] WARN  o.a.ignite.internal.util.typedef.G - >>> Possible starvation in striped pool.
    Thread name: sys-stripe-5-#6%mbe%
    Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearAtomicSingleUpdateRequest [key=BinaryObjectImpl [arr= true, ctx=false, start=0], parent=GridNearAtomicAbstractUpdateRequest [res=null, flags=]]]]]
    Deadlock: true
    Completed: 3524
Thread [name="sys-stripe-5-#6%mbe%", id=20, state=WAITING, blockCnt=3, waitCnt=6730]
    Lock [object=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@2e1993dd, ownerName=null, ownerId=-1]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(Unknown Source)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(Unknown Source)
        at o.a.i.i.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:847)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1770)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1686)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3063)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:129)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:265)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:260)
        at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1042)
        at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:561)
        at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
        at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
        at o.a.i.i.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
        at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293)
        at o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
        at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
        at o.a.i.i.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
        at o.a.i.i.managers.communication.GridIoManager$9.run(GridIoManager.java:1097)
        at o.a.i.i.util.StripedExecutor$Stripe.run(StripedExecutor.java:483)
        at java.lang.Thread.run(Unknown Source)

2017-09-28 13:15:12 [grid-timeout-worker-#15%mbe%] WARN  o.a.ignite.internal.util.typedef.G - >>> Possible starvation in striped pool.
    Thread name: sys-stripe-6-#7%mbe%
    Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearAtomicSingleUpdateRequest [key=BinaryObjectImpl [arr= true, ctx=false, start=0], parent=GridNearAtomicAbstractUpdateRequest [res=null, flags=]]]]]
    Deadlock: true
    Completed: 3074
Thread [name="sys-stripe-6-#7%mbe%", id=21, state=WAITING, blockCnt=1, waitCnt=5686]
    Lock [object=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@2e1993dd, ownerName=null, ownerId=-1]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(Unknown Source)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(Unknown Source)
        at o.a.i.i.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:847)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1770)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1686)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3063)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:129)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:265)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:260)
        at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1042)
        at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:561)
        at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
        at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
        at o.a.i.i.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
        at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293)
        at o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
        at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
        at o.a.i.i.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
        at o.a.i.i.managers.communication.GridIoManager$9.run(GridIoManager.java:1097)
        at o.a.i.i.util.StripedExecutor$Stripe.run(StripedExecutor.java:483)
        at java.lang.Thread.run(Unknown Source)

2017-09-28 13:15:12 [grid-timeout-worker-#15%mbe%] WARN  o.a.ignite.internal.util.typedef.G - >>> Possible starvation in striped pool.
    Thread name: sys-stripe-7-#8%mbe%
    Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearAtomicSingleUpdateRequest [key=BinaryObjectImpl [arr= true, ctx=false, start=0], parent=GridNearAtomicAbstractUpdateRequest [res=null, flags=]]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearAtomicSingleUpdateRequest [key=BinaryObjectImpl [arr= true, ctx=false, start=0], parent=GridNearAtomicAbstractUpdateRequest [res=null, flags=]]]]]
    Deadlock: true
    Completed: 3249
Thread [name="sys-stripe-7-#8%mbe%", id=22, state=WAITING, blockCnt=4, waitCnt=5892]
    Lock [object=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@2e1993dd, ownerName=null, ownerId=-1]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(Unknown Source)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(Unknown Source)
        at o.a.i.i.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:847)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1770)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1686)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3063)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:129)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:265)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:260)
        at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1042)
        at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:561)
        at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
        at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
        at o.a.i.i.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
        at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293)
        at o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
        at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
        at o.a.i.i.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
        at o.a.i.i.managers.communication.GridIoManager$9.run(GridIoManager.java:1097)
        at o.a.i.i.util.StripedExecutor$Stripe.run(StripedExecutor.java:483)
        at java.lang.Thread.run(Unknown Source)
{noformat}




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
Reply | Threaded
Open this post in threaded view
|

Re: [jira] [Created] (IGNITE-6530) Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager

Dmitry Gorchakov
This issue once reproduced in my productive with similar stack trace. During
sources analysis I saw the use unsafe of method
writeLock().isHeldByCurrentThread() for unlocking checkpoint. This may cause
thread to starvation. Problem no longer appeared after local fix.
I prepared pull request https://github.com/apache/ignite/pull/7445.
Accept please jira issue to me and review my PR.



--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: [jira] [Created] (IGNITE-6530) Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager

Nikolay Izhikov-2
In reply to this post by Anton Vinogradov (Jira)
Hello, Dmitriy.

Do you have a reproducer? minimal application that can reproduce this specific bug.

Can you, please, run a TC for your PR and get a TC bot visa?

> 25 февр. 2020 г., в 11:08, Dmitry Gorchakov <[hidden email]> написал(а):
>
> This issue once reproduced in my productive with similar stack trace. During
> sources analysis I saw the use unsafe of method
> writeLock().isHeldByCurrentThread() for unlocking checkpoint. This may cause
> thread to starvation. Problem no longer appeared after local fix.
> I prepared pull request https://github.com/apache/ignite/pull/7445.
> Accept please jira issue to me and review my PR.
>
>
>
>
> --
> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/

Reply | Threaded
Open this post in threaded view
|

Re: [jira] [Created] (IGNITE-6530) Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager

slava.koptilin
In reply to this post by Anton Vinogradov (Jira)
Hello Dmitry,

To be honest, I don't quite understand the proposed fix. Could you please
describe in detail the issue you are trying to fix?
By the way, please take a look at
GridCacheDatabaseSharedManager.checkpointReadLock:

    @Override public void checkpointReadLock() {
        if (checkpointLock.writeLock().isHeldByCurrentThread())
            return;
        ...
    }

In other words, the read lock is not acquired in case the write lock is
already held.
So, do we really need to unconditionally unlock the read lock at
checkpointReadUnlock as you propose?

Thanks,
S.

вт, 25 февр. 2020 г. в 13:32, Dmitry Gorchakov <
[hidden email]>:

> This issue once reproduced in my productive with similar stack trace.
> During
> sources analysis I saw the use unsafe of method
> writeLock().isHeldByCurrentThread() for unlocking checkpoint. This may
> cause
> thread to starvation. Problem no longer appeared after local fix.
> I prepared pull request https://github.com/apache/ignite/pull/7445.
> Accept please jira issue to me and review my PR.
>
>
>
>
> --
> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
>
Reply | Threaded
Open this post in threaded view
|

Re: [jira] [Created] (IGNITE-6530) Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager

Dmitry Gorchakov
> Could you please describe in detail the issue you are trying to fix?

I once got same deadlock on a productive in v2.7.5. New iteration of checkpoint could not start. Dump attached.

> By the way, please take a look at GridCacheDatabaseSharedManager.checkpointReadLock:

Thanks, I will look more closely at this check.

> So, do we really need to unconditionally unlock the read lock at checkpointReadUnlock as you propose?

Checking for a writе lock holding through method isHeldByCurrentThread is not atomic, so there is no certainty that this thread still owns a write lock.
I assume that a read lock was obtained and in this code we did not release it.

Thanks!

вт, 25 февр. 2020 г. в 14:35, Вячеслав Коптилин <[hidden email]>:
Hello Dmitry,

To be honest, I don't quite understand the proposed fix. Could you please
describe in detail the issue you are trying to fix?
By the way, please take a look at
GridCacheDatabaseSharedManager.checkpointReadLock:

    @Override public void checkpointReadLock() {
        if (checkpointLock.writeLock().isHeldByCurrentThread())
            return;
        ...
    }

In other words, the read lock is not acquired in case the write lock is
already held.
So, do we really need to unconditionally unlock the read lock at
checkpointReadUnlock as you propose?

Thanks,
S.

вт, 25 февр. 2020 г. в 13:32, Dmitry Gorchakov <
[hidden email]>:

> This issue once reproduced in my productive with similar stack trace.
> During
> sources analysis I saw the use unsafe of method
> writeLock().isHeldByCurrentThread() for unlocking checkpoint. This may
> cause
> thread to starvation. Problem no longer appeared after local fix.
> I prepared pull request https://github.com/apache/ignite/pull/7445.
> Accept please jira issue to me and review my PR.
>
>
>
>
> --
> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
>

threaddump.txt (571K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [jira] [Created] (IGNITE-6530) Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager

Andrew Mashenkov
Write lock is an exclusive lock and no other threads can obtain read or
write lock while current thread holding write lock.

Why you think checking for current thread is lock-owner is not atomic?
A thread that already holds a write lock has no needs to obtain read locks
unless it releases the write-lock.

Your change can have serious impact.
Can you share a reproducer?


вт, 25 февр. 2020 г., 20:42 Дмитрий Горчаков <
[hidden email]>:

> > Could you please describe in detail the issue you are trying to fix?
>
> I once got same deadlock on a productive in v2.7.5. New iteration of
> checkpoint could not start. Dump attached.
>
> > By the way, please take a look at
> GridCacheDatabaseSharedManager.checkpointReadLock:
>
> Thanks, I will look more closely at this check.
>
> > So, do we really need to unconditionally unlock the read lock at
> checkpointReadUnlock as you propose?
>
> Checking for a writе lock holding through method isHeldByCurrentThread is
> not atomic, so there is no certainty that this thread still owns a write
> lock.
> I assume that a read lock was obtained and in this code we did not release
> it.
>
> Thanks!
>
> вт, 25 февр. 2020 г. в 14:35, Вячеслав Коптилин <[hidden email]
> >:
>
>> Hello Dmitry,
>>
>> To be honest, I don't quite understand the proposed fix. Could you please
>> describe in detail the issue you are trying to fix?
>> By the way, please take a look at
>> GridCacheDatabaseSharedManager.checkpointReadLock:
>>
>>     @Override public void checkpointReadLock() {
>>         if (checkpointLock.writeLock().isHeldByCurrentThread())
>>             return;
>>         ...
>>     }
>>
>> In other words, the read lock is not acquired in case the write lock is
>> already held.
>> So, do we really need to unconditionally unlock the read lock at
>> checkpointReadUnlock as you propose?
>>
>> Thanks,
>> S.
>>
>> вт, 25 февр. 2020 г. в 13:32, Dmitry Gorchakov <
>> [hidden email]>:
>>
>> > This issue once reproduced in my productive with similar stack trace.
>> > During
>> > sources analysis I saw the use unsafe of method
>> > writeLock().isHeldByCurrentThread() for unlocking checkpoint. This may
>> > cause
>> > thread to starvation. Problem no longer appeared after local fix.
>> > I prepared pull request https://github.com/apache/ignite/pull/7445.
>> > Accept please jira issue to me and review my PR.
>> >
>> >
>> >
>> >
>> > --
>> > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
>> >
>>
>
Reply | Threaded
Open this post in threaded view
|

Re: [jira] [Created] (IGNITE-6530) Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager

agura
Agree with Andrey and Slava.

Your change could lead to exception in case when current thread does
not own read lock.

> Checking for a writе lock holding through method isHeldByCurrentThread is not atomic, so there is no certainty that this thread still owns a write lock.

What do you mean when talk "method isHeldByCurrentThread is not
atomic"? There is no need in any additional guarantees here because
current thread always sees own changes and any other thread could not
change thread reference if current thread is owner. Also current
thread can't check lock owning and release lock concurrently.

> I assume that a read lock was obtained and in this code we did not release it.

If current thread is write lock owner then it can't acquire read lock.
RRWL doesn't allow upgrades or downgrades.

Stack traces from JIRA issue don't point to deadlock. "Deadlock: true"
means that JVM detected deadlock but it doesn't mean that exactly this
threads in deadlock.

On Tue, Feb 25, 2020 at 9:25 PM Andrey Mashenkov
<[hidden email]> wrote:

>
> Write lock is an exclusive lock and no other threads can obtain read or
> write lock while current thread holding write lock.
>
> Why you think checking for current thread is lock-owner is not atomic?
> A thread that already holds a write lock has no needs to obtain read locks
> unless it releases the write-lock.
>
> Your change can have serious impact.
> Can you share a reproducer?
>
>
> вт, 25 февр. 2020 г., 20:42 Дмитрий Горчаков <
> [hidden email]>:
>
> > > Could you please describe in detail the issue you are trying to fix?
> >
> > I once got same deadlock on a productive in v2.7.5. New iteration of
> > checkpoint could not start. Dump attached.
> >
> > > By the way, please take a look at
> > GridCacheDatabaseSharedManager.checkpointReadLock:
> >
> > Thanks, I will look more closely at this check.
> >
> > > So, do we really need to unconditionally unlock the read lock at
> > checkpointReadUnlock as you propose?
> >
> > Checking for a writе lock holding through method isHeldByCurrentThread is
> > not atomic, so there is no certainty that this thread still owns a write
> > lock.
> > I assume that a read lock was obtained and in this code we did not release
> > it.
> >
> > Thanks!
> >
> > вт, 25 февр. 2020 г. в 14:35, Вячеслав Коптилин <[hidden email]
> > >:
> >
> >> Hello Dmitry,
> >>
> >> To be honest, I don't quite understand the proposed fix. Could you please
> >> describe in detail the issue you are trying to fix?
> >> By the way, please take a look at
> >> GridCacheDatabaseSharedManager.checkpointReadLock:
> >>
> >>     @Override public void checkpointReadLock() {
> >>         if (checkpointLock.writeLock().isHeldByCurrentThread())
> >>             return;
> >>         ...
> >>     }
> >>
> >> In other words, the read lock is not acquired in case the write lock is
> >> already held.
> >> So, do we really need to unconditionally unlock the read lock at
> >> checkpointReadUnlock as you propose?
> >>
> >> Thanks,
> >> S.
> >>
> >> вт, 25 февр. 2020 г. в 13:32, Dmitry Gorchakov <
> >> [hidden email]>:
> >>
> >> > This issue once reproduced in my productive with similar stack trace.
> >> > During
> >> > sources analysis I saw the use unsafe of method
> >> > writeLock().isHeldByCurrentThread() for unlocking checkpoint. This may
> >> > cause
> >> > thread to starvation. Problem no longer appeared after local fix.
> >> > I prepared pull request https://github.com/apache/ignite/pull/7445.
> >> > Accept please jira issue to me and review my PR.
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> >> >
> >>
> >
Reply | Threaded
Open this post in threaded view
|

Re: [jira] [Created] (IGNITE-6530) Deadlock in checkpointReadLock method in GridCacheDatabaseSharedManager

Ivan Pavlukhin
I might have missed some context, but a side note:

> If current thread is write lock owner then it can't acquire read lock.
> RRWL doesn't allow upgrades or downgrades.

From ReentrantReadWriteLock javadoc:
Reentrancy also allows downgrading from the write lock to a read lock,
by acquiring the write lock, then the read lock and then releasing the
write lock. However, upgrading from a read lock to the write lock is
not possible.


Best regards,
Ivan Pavlukhin

вт, 25 февр. 2020 г. в 22:05, Andrey Gura <[hidden email]>:

>
> Agree with Andrey and Slava.
>
> Your change could lead to exception in case when current thread does
> not own read lock.
>
> > Checking for a writе lock holding through method isHeldByCurrentThread is not atomic, so there is no certainty that this thread still owns a write lock.
>
> What do you mean when talk "method isHeldByCurrentThread is not
> atomic"? There is no need in any additional guarantees here because
> current thread always sees own changes and any other thread could not
> change thread reference if current thread is owner. Also current
> thread can't check lock owning and release lock concurrently.
>
> > I assume that a read lock was obtained and in this code we did not release it.
>
> If current thread is write lock owner then it can't acquire read lock.
> RRWL doesn't allow upgrades or downgrades.
>
> Stack traces from JIRA issue don't point to deadlock. "Deadlock: true"
> means that JVM detected deadlock but it doesn't mean that exactly this
> threads in deadlock.
>
> On Tue, Feb 25, 2020 at 9:25 PM Andrey Mashenkov
> <[hidden email]> wrote:
> >
> > Write lock is an exclusive lock and no other threads can obtain read or
> > write lock while current thread holding write lock.
> >
> > Why you think checking for current thread is lock-owner is not atomic?
> > A thread that already holds a write lock has no needs to obtain read locks
> > unless it releases the write-lock.
> >
> > Your change can have serious impact.
> > Can you share a reproducer?
> >
> >
> > вт, 25 февр. 2020 г., 20:42 Дмитрий Горчаков <
> > [hidden email]>:
> >
> > > > Could you please describe in detail the issue you are trying to fix?
> > >
> > > I once got same deadlock on a productive in v2.7.5. New iteration of
> > > checkpoint could not start. Dump attached.
> > >
> > > > By the way, please take a look at
> > > GridCacheDatabaseSharedManager.checkpointReadLock:
> > >
> > > Thanks, I will look more closely at this check.
> > >
> > > > So, do we really need to unconditionally unlock the read lock at
> > > checkpointReadUnlock as you propose?
> > >
> > > Checking for a writе lock holding through method isHeldByCurrentThread is
> > > not atomic, so there is no certainty that this thread still owns a write
> > > lock.
> > > I assume that a read lock was obtained and in this code we did not release
> > > it.
> > >
> > > Thanks!
> > >
> > > вт, 25 февр. 2020 г. в 14:35, Вячеслав Коптилин <[hidden email]
> > > >:
> > >
> > >> Hello Dmitry,
> > >>
> > >> To be honest, I don't quite understand the proposed fix. Could you please
> > >> describe in detail the issue you are trying to fix?
> > >> By the way, please take a look at
> > >> GridCacheDatabaseSharedManager.checkpointReadLock:
> > >>
> > >>     @Override public void checkpointReadLock() {
> > >>         if (checkpointLock.writeLock().isHeldByCurrentThread())
> > >>             return;
> > >>         ...
> > >>     }
> > >>
> > >> In other words, the read lock is not acquired in case the write lock is
> > >> already held.
> > >> So, do we really need to unconditionally unlock the read lock at
> > >> checkpointReadUnlock as you propose?
> > >>
> > >> Thanks,
> > >> S.
> > >>
> > >> вт, 25 февр. 2020 г. в 13:32, Dmitry Gorchakov <
> > >> [hidden email]>:
> > >>
> > >> > This issue once reproduced in my productive with similar stack trace.
> > >> > During
> > >> > sources analysis I saw the use unsafe of method
> > >> > writeLock().isHeldByCurrentThread() for unlocking checkpoint. This may
> > >> > cause
> > >> > thread to starvation. Problem no longer appeared after local fix.
> > >> > I prepared pull request https://github.com/apache/ignite/pull/7445.
> > >> > Accept please jira issue to me and review my PR.
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> > >> >
> > >>
> > >