Apache Ignite Developers - Legacy Mail Archive

IgniteSemaphore and failoverSafe flag

dkarachentsev

Re: IgniteSemaphore and failoverSafe flag

It's not 100% reproducible; to get it to fail locally I ran it many times
in a loop (an IntelliJ IDEA feature).
N.B. This test was muted before the fix, so yes, the fix may not be the cause.

Thanks!

14.04.2017 17:23, Vladisav Jelisavcic wrote:

> Hmm, I cannot reproduce this behavior locally.
> My guess is that the interrupt flag is not always cleared properly in the
> #GridCacheSemaphore.acquire method (but that doesn't have anything to do
> with the latest fix).
>
> Can you make it reproducible?
>
> On Fri, Apr 14, 2017 at 2:46 PM, Dmitry Karachentsev
> <[hidden email]> wrote:
>
> Vladislav,
>
> One more thing: this test [1] started failing on semaphore close
> when this fix [2] was introduced.
> Could you check it, please?
>
> [1]
> http://ci.ignite.apache.org/viewLog.html?buildId=547151&tab=buildResultsDiv&buildTypeId=IgniteTests_IgniteDataStrucutures#testNameId-979977708202725050
> [2] https://issues.apache.org/jira/browse/IGNITE-1977
>
> Thanks!
>
> 14.04.2017 15:27, Dmitry Karachentsev wrote:
>> Vladislav,
>>
>> Yep, you're right. I'll fix it.
>>
>> Thanks!
>>
>> 14.04.2017 15:18, Vladisav Jelisavcic wrote:
>>> Hi Dmitry,
>>>
>>> it looks to me that this test is not valid: after semaphore 2
>>> fails, the permits are redistributed,
>>> so the expected number of permits should really be 20, not 10. Do
>>> you agree?
>>>
>>> I guess that before the latest fix this test was (incorrectly) passing
>>> because permits weren't released properly.
>>>
>>> What do you think?
>>>
>>> On Fri, Apr 14, 2017 at 11:27 AM, Dmitry Karachentsev
>>> <[hidden email]> wrote:
>>>
>>> Hi Vladislav,
>>>
>>> It looks like after the fix was merged, these tests [1] started
>>> failing. Could you please take a look?
>>>
>>> [1]
>>> http://ci.ignite.apache.org/viewLog.html?buildId=544238&tab=buildResultsDiv&buildTypeId=IgniteTests_IgniteBinaryObjectsDataStrucutures
>>>
>>> Thanks!
>>>
>>> -Dmitry.
>>>
>>> 13.04.2017 16:15, Dmitry Karachentsev wrote:
>>>> Thanks a lot!
>>>>
>>>> 12.04.2017 16:35, Vladisav Jelisavcic wrote:
>>>>> Hi Dmitry,
>>>>>
>>>>> sure, I made a fix; take a look at the PR and the comments
>>>>> in the ticket.
>>>>>
>>>>> Best regards,
>>>>> Vladisav
>>>>>
>>>>> On Tue, Apr 11, 2017 at 3:00 PM, Dmitry Karachentsev
>>>>> <[hidden email]> wrote:
>>>>>
>>>>> Hi Vladislav,
>>>>>
>>>>> Thanks for your contribution! But it seems it doesn't fix the
>>>>> related tickets, in particular [1].
>>>>> Could you please take a look?
>>>>>
>>>>> [1] https://issues.apache.org/jira/browse/IGNITE-4173
>>>>>
>>>>> Thanks!
>>>>>
>>>>> 06.04.2017 16:27, Vladisav Jelisavcic wrote:
>>>>>> Hey Dmitry,
>>>>>>
>>>>>> sorry for the late reply, I'll try to prepare a PR later
>>>>>> today.
>>>>>>
>>>>>> Best regards,
>>>>>> Vladisav
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Apr 4, 2017 at 11:05 AM, Dmitry Karachentsev
>>>>>> <[hidden email]> wrote:
>>>>>>
>>>>>> Hi Vladislav,
>>>>>>
>>>>>> I see you've been working on [1] for a while. Have you
>>>>>> had a chance to fix it? If not, is there an
>>>>>> estimate?
>>>>>>
>>>>>> [1]
>>>>>> https://issues.apache.org/jira/browse/IGNITE-1977
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> -Dmitry.
>>>>>>
>>>>>>
>>>>>>
>>>>>> 20.03.2017 10:28, Alexey Goncharuk wrote:
>>>>>>
>>>>>> I think re-creation should be handled by a
>>>>>> user who will make sure that
>>>>>> nobody else is currently executing the
>>>>>> guarded logic before the
>>>>>> re-creation. This is exactly the same
>>>>>> semantics as with
>>>>>> BrokenBarrierException for j.u.c.CyclicBarrier.
>>>>>>
>>>>>> 2017-03-17 2:39 GMT+03:00 Vladisav Jelisavcic
>>>>>> <[hidden email]>:
>>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> I agree with Val, he's got a point:
>>>>>> recreating the lock doesn't seem possible
>>>>>> (at least not with the transactional
>>>>>> cache lock/semaphore we have).
>>>>>> Is this re-create behavior really needed?
>>>>>>
>>>>>> Best regards,
>>>>>> Vladisav
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Mar 16, 2017 at 8:34 PM, Valentin Kulichenko
>>>>>> <[hidden email]> wrote:
>>>>>>
>>>>>> Guys,
>>>>>>
>>>>>> How does recreation of the lock help? My understanding is that the
>>>>>> scenario is the following:
>>>>>>
>>>>>> 1. Client A creates and acquires a lock, and then starts to execute
>>>>>> the guarded logic.
>>>>>> 2. Client B tries to acquire the same lock and parks to wait.
>>>>>> 3. Before client A unlocks, all affinity nodes for the lock fail, and
>>>>>> the lock disappears from the cache.
>>>>>> 4. Client B fails with an exception, recreates the lock, acquires it,
>>>>>> and starts to execute the guarded logic concurrently with client A.
>>>>>>
>>>>>> In my view this is wrong anyway, regardless of whether this happens
>>>>>> silently or with an exception handled in the user's code, because this
>>>>>> code doesn't have any way to know whether client A still holds the
>>>>>> lock.
>>>>>>
>>>>>> Am I missing something?
>>>>>>
>>>>>> -Val
>>>>>>
>>>>>> On Tue, Mar 14, 2017 at 10:14 AM, Dmitriy Setrakyan
>>>>>> <[hidden email]> wrote:
>>>>>>
>>>>>> On Tue, Mar 14, 2017 at 12:46 AM, Alexey Goncharuk
>>>>>> <[hidden email]> wrote:
>>>>>>
>>>>>> Which user operation would result in exception? To my knowledge,
>>>>>> user may already be holding the lock and not invoking any Ignite
>>>>>> APIs, no?
>>>>>>
>>>>>> Yes, this is exactly my point.
>>>>>>
>>>>>> Imagine that a node already holds a lock and another node is waiting
>>>>>> for the lock. If all partition nodes leave the grid and the lock is
>>>>>> re-created, this second node will immediately acquire the lock and we
>>>>>> will have two lock owners. I think in this case this second node
>>>>>> (blocked on lock()) should get an exception saying that the lock was
>>>>>> lost (which is, by the way, the current behavior), and the first node
>>>>>> should get an exception on unlock.
>>>>>>
>>>>>> Makes sense.
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>
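
The comparison with BrokenBarrierException for j.u.c.CyclicBarrier made in the
20.03.2017 message can be illustrated with plain java.util.concurrent, without
Ignite. This is only a minimal sketch under stated assumptions, not Ignite code:
the class name BrokenBarrierDemo and the method run() are hypothetical, and a
wait timeout stands in for the loss of the affinity nodes. Once the barrier is
broken, later waiters get an exception, and it is up to the caller to reset it
after making sure nobody else is still using it.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo class; it is not part of Ignite.
public class BrokenBarrierDemo {
    // Returns a short trace of what happened to the barrier.
    public static String run() {
        CyclicBarrier barrier = new CyclicBarrier(2);
        StringBuilder trace = new StringBuilder();

        try {
            // Only one of the two parties arrives, so the wait times out
            // and the barrier is left in the broken state.
            barrier.await(10, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            trace.append("timeout;");
        } catch (InterruptedException | BrokenBarrierException e) {
            throw new IllegalStateException(e);
        }
        trace.append("broken=").append(barrier.isBroken()).append(';');

        try {
            // Any later waiter fails immediately instead of blocking.
            barrier.await();
        } catch (BrokenBarrierException e) {
            trace.append("BrokenBarrierException;");
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }

        // Re-creation is an explicit user decision: the caller resets the
        // barrier only after ensuring no other party is still using it.
        barrier.reset();
        trace.append("broken=").append(barrier.isBroken());
        return trace.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Running main prints "timeout;broken=true;BrokenBarrierException;broken=false".
The barrier never repairs itself silently, which is the property argued for in
the thread for a lock or semaphore whose state was lost.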