IgniteSemaphore and failoverSafe flag

Re: IgniteSemaphore and failoverSafe flag

dkarachentsev
It's not 100% reproducible; to get it to fail locally I ran it many times
in a loop (an IntelliJ IDEA feature).
N.B. This test was muted before the fix, so yes, it might not be the cause.

Thanks!

14.04.2017 17:23, Vladisav Jelisavcic wrote:

> Hmm, I cannot reproduce this behavior locally;
> my guess is that the interrupt flag is not always cleared properly in the
> GridCacheSemaphore#acquire method (but that doesn't have anything to do
> with the latest fix).
>
> Can you make it reproducible?
>
> On Fri, Apr 14, 2017 at 2:46 PM, Dmitry Karachentsev
> <[hidden email] <mailto:[hidden email]>> wrote:
>
>     Vladisav,
>
>     One more thing: this test [1] started failing on semaphore close
>     when this fix [2] was introduced.
>     Could you check it, please?
>
>     [1]
>     http://ci.ignite.apache.org/viewLog.html?buildId=547151&tab=buildResultsDiv&buildTypeId=IgniteTests_IgniteDataStrucutures#testNameId-979977708202725050
>     [2] https://issues.apache.org/jira/browse/IGNITE-1977
>
>     Thanks!
>
>     14.04.2017 15:27, Dmitry Karachentsev wrote:
>>     Vladisav,
>>
>>     Yep, you're right. I'll fix it.
>>
>>     Thanks!
>>
>>     14.04.2017 15:18, Vladisav Jelisavcic wrote:
>>>     Hi Dmitry,
>>>
>>>     it looks to me that this test is not valid: after semaphore 2
>>>     fails, the permits are redistributed, so the expected number of
>>>     permits should really be 20, not 10. Do you agree?
>>>
>>>     I guess before the latest fix this test was (incorrectly) passing
>>>     because permits weren't released properly.
>>>
>>>     What do you think?
>>>
>>>     On Fri, Apr 14, 2017 at 11:27 AM, Dmitry Karachentsev
>>>     <[hidden email] <mailto:[hidden email]>>
>>>     wrote:
>>>
>>>         Hi Vladisav,
>>>
>>>         It looks like these tests [1] started failing after the fix
>>>         was merged. Could you please take a look?
>>>
>>>         [1]
>>>         http://ci.ignite.apache.org/viewLog.html?buildId=544238&tab=buildResultsDiv&buildTypeId=IgniteTests_IgniteBinaryObjectsDataStrucutures
>>>
>>>         Thanks!
>>>
>>>         -Dmitry.
>>>
>>>         13.04.2017 16:15, Dmitry Karachentsev wrote:
>>>>         Thanks a lot!
>>>>
>>>>         12.04.2017 16:35, Vladisav Jelisavcic wrote:
>>>>>         Hi Dmitry,
>>>>>
>>>>>         sure, I made a fix, take a look at the PR and the comments
>>>>>         in the ticket.
>>>>>
>>>>>         Best regards,
>>>>>         Vladisav
>>>>>
>>>>>         On Tue, Apr 11, 2017 at 3:00 PM, Dmitry Karachentsev
>>>>>         <[hidden email]
>>>>>         <mailto:[hidden email]>> wrote:
>>>>>
>>>>>             Hi Vladisav,
>>>>>
>>>>>             Thanks for your contribution! But it seems it doesn't
>>>>>             fix the related tickets, in particular [1].
>>>>>             Could you please take a look?
>>>>>
>>>>>             [1] https://issues.apache.org/jira/browse/IGNITE-4173
>>>>>
>>>>>             Thanks!
>>>>>
>>>>>             06.04.2017 16:27, Vladisav Jelisavcic wrote:
>>>>>>             Hey Dmitry,
>>>>>>
>>>>>>             sorry for the late reply, I'll try to bake a pr later
>>>>>>             during the day.
>>>>>>
>>>>>>             Best regards,
>>>>>>             Vladisav
>>>>>>
>>>>>>
>>>>>>
>>>>>>             On Tue, Apr 4, 2017 at 11:05 AM, Dmitry Karachentsev
>>>>>>             <[hidden email]
>>>>>>             <mailto:[hidden email]>> wrote:
>>>>>>
>>>>>>                 Hi Vladisav,
>>>>>>
>>>>>>                 I see you've been working on [1] for a while. Have
>>>>>>                 you had a chance to fix it? If not, is there any
>>>>>>                 estimate?
>>>>>>
>>>>>>                 [1]
>>>>>>                 https://issues.apache.org/jira/browse/IGNITE-1977
>>>>>>
>>>>>>                 Thanks!
>>>>>>
>>>>>>                 -Dmitry.
>>>>>>
>>>>>>
>>>>>>
>>>>>>                 20.03.2017 10:28, Alexey Goncharuk wrote:
>>>>>>
>>>>>>                     I think re-creation should be handled by a
>>>>>>                     user who will make sure that
>>>>>>                     nobody else is currently executing the
>>>>>>                     guarded logic before the
>>>>>>                     re-creation. This is exactly the same
>>>>>>                     semantics as with
>>>>>>                     BrokenBarrierException for j.u.c.CyclicBarrier.
>>>>>>
>>>>>>                     2017-03-17 2:39 GMT+03:00 Vladisav Jelisavcic
>>>>>>                     <[hidden email]
>>>>>>                     <mailto:[hidden email]>>:
>>>>>>
>>>>>>                         Hi everyone,
>>>>>>
>>>>>>                         I agree with Val, he's got a point;
>>>>>>                         recreating the lock doesn't seem possible
>>>>>>                         (at least not with the transactional
>>>>>>                         cache lock/semaphore we have).
>>>>>>                         Is this re-create behavior really needed?
>>>>>>
>>>>>>                         Best regards,
>>>>>>                         Vladisav
>>>>>>
>>>>>>
>>>>>>
>>>>>>                         On Thu, Mar 16, 2017 at 8:34 PM, Valentin
>>>>>>                         Kulichenko <
>>>>>>                         [hidden email]
>>>>>>                         <mailto:[hidden email]>>
>>>>>>                         wrote:
>>>>>>
>>>>>>                             Guys,
>>>>>>
>>>>>>                             How does recreating the lock help?
>>>>>>                             My understanding is that the scenario
>>>>>>                             is the following:
>>>>>>
>>>>>>                             1. Client A creates and acquires a
>>>>>>                             lock, and then starts to execute
>>>>>>                             guarded logic.
>>>>>>                             2. Client B tries to acquire the same
>>>>>>                             lock and parks to wait.
>>>>>>                             3. Before client A unlocks, all
>>>>>>                             affinity nodes for the lock fail, and
>>>>>>                             the lock disappears from the cache.
>>>>>>                             4. Client B fails with an exception,
>>>>>>                             recreates the lock, acquires it, and
>>>>>>                             starts to execute guarded logic
>>>>>>                             concurrently with client A.
>>>>>>
>>>>>>                             In my view this is wrong anyway,
>>>>>>                             regardless of whether this happens
>>>>>>                             silently or with an exception handled
>>>>>>                             in the user's code, because this code
>>>>>>                             doesn't have any way to know whether
>>>>>>                             client A still holds the lock or not.
>>>>>>
>>>>>>                             Am I missing something?
>>>>>>
>>>>>>                             -Val
>>>>>>
>>>>>>                             On Tue, Mar 14, 2017 at 10:14 AM,
>>>>>>                             Dmitriy Setrakyan <[hidden email]>
>>>>>>                             wrote:
>>>>>>
>>>>>>                                 On Tue, Mar 14, 2017 at 12:46 AM,
>>>>>>                                 Alexey Goncharuk <
>>>>>>                                 [hidden email]
>>>>>>                                 <mailto:[hidden email]>>
>>>>>>                                 wrote:
>>>>>>
>>>>>>                                         Which user operation
>>>>>>                                         would result in exception?
>>>>>>                                         To my knowledge, user may
>>>>>>                                         already be holding the
>>>>>>                                         lock and not invoking any
>>>>>>                                         Ignite APIs, no?
>>>>>>
>>>>>>                                     Yes, this is exactly my point.
>>>>>>
>>>>>>                                     Imagine that a node already
>>>>>>                                     holds a lock and another node
>>>>>>                                     is waiting for the lock. If all
>>>>>>                                     partition nodes leave the grid
>>>>>>                                     and the lock is re-created,
>>>>>>                                     this second node will
>>>>>>                                     immediately acquire the lock
>>>>>>                                     and we will have two lock
>>>>>>                                     owners. I think in this case
>>>>>>                                     this second node (blocked on
>>>>>>                                     lock()) should get an exception
>>>>>>                                     saying that the lock was lost
>>>>>>                                     (which is, by the way, the
>>>>>>                                     current behavior), and the
>>>>>>                                     first node should get an
>>>>>>                                     exception on unlock.
>>>>>>
>>>>>>                                 Makes sense.
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>
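
The BrokenBarrierException semantics of j.u.c.CyclicBarrier mentioned in the thread can be seen in a small standalone sketch. It uses only java.util.concurrent and no Ignite APIs; the class name BrokenBarrierDemo and the method run() are made up for illustration, and the timeout merely stands in for "the synchronizer was lost". The point is that a blocked waiter is told the barrier is broken instead of silently proceeding, and the barrier stays broken until the caller decides it is safe to reset it.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

public class BrokenBarrierDemo {
    /** Returns "timedOut,waiterSawBroken,brokenBeforeReset,brokenAfterReset". */
    public static String run() throws InterruptedException {
        // Three parties are required, but only two ever arrive,
        // so the barrier can never trip normally.
        CyclicBarrier barrier = new CyclicBarrier(3);
        AtomicBoolean waiterSawBroken = new AtomicBoolean(false);

        // Plays the role of the party blocked on the synchronizer.
        Thread waiter = new Thread(() -> {
            try {
                barrier.await();
            } catch (BrokenBarrierException e) {
                // The waiter is notified instead of silently proceeding.
                waiterSawBroken.set(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        waiter.start();

        // A timed await that expires breaks the barrier for everyone.
        boolean timedOut = false;
        try {
            barrier.await(100, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            timedOut = true;
        } catch (BrokenBarrierException e) {
            // Not expected here: nothing else breaks the barrier.
        }
        waiter.join();

        // The barrier stays broken until the caller explicitly resets it.
        boolean brokenBeforeReset = barrier.isBroken();
        barrier.reset();
        boolean brokenAfterReset = barrier.isBroken();

        return timedOut + "," + waiterSawBroken.get() + ","
            + brokenBeforeReset + "," + brokenAfterReset;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());
    }
}
```

Calling reset() is the caller's decision, which matches the argument in the thread: only user code can know whether anyone is still executing the guarded logic, so re-creation has to be an explicit step taken after that has been checked.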
