Unexpected behavior of DiscoveryCustomMessage acks

Sergey Chugunov
Hello folks,

While working on IGNITE-4302 <https://issues.apache.org/jira/browse/IGNITE-4302>,
I developed a protocol for delivering metadata updates to all nodes in the
cluster.

This protocol relies on a guarantee of *DiscoveryCustomMessage* that each
message is delivered to *CustomEventListener* exactly once; duplicates are
not possible.

But the test *GridEventConsumeSelfTest::testMultithreadedWithNodeRestart*,
run with my latest code changes, seems to fail precisely because this
guarantee is violated.
I can see that acknowledgement messages, which are also
DiscoveryCustomMessages, make two passes across the cluster when some nodes
are restarted.

My question is: is it a bug, or just a detail of the guarantees around
acknowledgement messages?
I can easily filter out these duplicates at the protocol level, but it is
better to fix this in case it is a bug.

Thanks,
Sergey.

Re: Unexpected behavior of DiscoveryCustomMessage acks

Vladimir Ozerov
Sounds very nasty.

On Thu, Feb 2, 2017 at 3:50 PM, Sergey Chugunov <[hidden email]>
wrote:

> Hello folks,
>
> While working on IGNITE-4302 <https://issues.apache.org/jira/browse/IGNITE-4302>,
> I developed a protocol for delivering metadata updates to all nodes in the
> cluster.
>
> This protocol relies on a guarantee of *DiscoveryCustomMessage* that each
> message is delivered to *CustomEventListener* exactly once; duplicates are
> not possible.
>
> But the test *GridEventConsumeSelfTest::testMultithreadedWithNodeRestart*,
> run with my latest code changes, seems to fail precisely because this
> guarantee is violated.
> I can see that acknowledgement messages, which are also
> DiscoveryCustomMessages, make two passes across the cluster when some nodes
> are restarted.
>
> My question is: is it a bug, or just a detail of the guarantees around
> acknowledgement messages?
> I can easily filter out these duplicates at the protocol level, but it is
> better to fix this in case it is a bug.
>
> Thanks,
> Sergey.
>

Re: Unexpected behavior of DiscoveryCustomMessage acks

yzhdanov
Can anyone point to a place in the javadoc that states that there is an
exactly-once guarantee?

Imagine you have nodes A, B and C. A sends a custom message to B and gets an
ack back; B sends it to C and dies. A connects to C and resends the message.
Result: C has received the message twice. Currently the handling logic is
responsible for resolving duplicates.

--Yakov
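
The A, B, C scenario above can be sketched as a small simulation. This is only an illustration under stated assumptions: `DedupHandler` and `RingResendDemo` are hypothetical names, not Ignite classes, and the sketch assumes every message carries a unique ID assigned by its sender. It shows how handling logic on node C could resolve the duplicate that arrives when A resends after B dies.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

/** Hypothetical stand-in for the custom-message handling logic on one node; not an Ignite class. */
class DedupHandler {
    /** IDs of messages this node has already applied (assumes senders assign unique IDs). */
    private final Set<UUID> seen = new HashSet<>();

    /** Payloads that were actually applied, in order. */
    private final List<String> applied = new ArrayList<>();

    /** Returns true if the message was applied, false if it was a duplicate and skipped. */
    boolean onMessage(UUID msgId, String payload) {
        if (!seen.add(msgId))
            return false; // Already handled: this is a resend.

        applied.add(payload);
        return true;
    }

    List<String> applied() {
        return applied;
    }
}

class RingResendDemo {
    public static void main(String[] args) {
        DedupHandler nodeC = new DedupHandler();
        UUID id = UUID.randomUUID();

        // B forwards the message to C and then dies.
        boolean first = nodeC.onMessage(id, "metadata-update");

        // A reconnects directly to C and resends the same message.
        boolean second = nodeC.onMessage(id, "metadata-update");

        // Prints "true false 1": the resend is detected and applied only once.
        System.out.println(first + " " + second + " " + nodeC.applied().size());
    }
}
```

The set of seen IDs grows without bound in this sketch; a real handler would need some rule for forgetting old IDs.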

Re: Unexpected behavior of DiscoveryCustomMessage acks

Sergey Chugunov
Yakov,

Thanks for the clear explanation; I also found exactly that logic in the
RingMessageWorker code.

But I strongly believe that this behavior should have been documented in the
*DiscoveryCustomMessage* interface (I think it is the best place for this).

Messaging managers like the discovery manager must provide very explicit and
detailed documentation of the guarantees they provide to their users, so that
developers don't have to guess what to expect.

Anyway, modifying my protocol for IGNITE-4302 is not a big deal; I can
easily change it to handle situations like this.
Also, as part of the JIRA I'll try to clarify the documentation at least for
this case.

Thanks,
Sergey.

On Fri, Feb 3, 2017 at 1:30 PM, Yakov Zhdanov <[hidden email]> wrote:

> Can anyone point to a place in the javadoc that states that there is an
> exactly-once guarantee?
>
> Imagine you have nodes A, B and C. A sends a custom message to B and gets an
> ack back; B sends it to C and dies. A connects to C and resends the message.
> Result: C has received the message twice. Currently the handling logic is
> responsible for resolving duplicates.
>
> --Yakov
>

Re: Unexpected behavior of DiscoveryCustomMessage acks

Alexey Goncharuk
I think we should have duplicate filtering logic in the discovery manager. As
far as I remember, we wanted custom events to be consistent with other
discovery events, and we rely on the fact that node joined and node left
events won't be received twice.



2017-02-03 14:40 GMT+03:00 Sergey Chugunov <[hidden email]>:

> Yakov,
>
> Thanks for the clear explanation; I also found exactly that logic in the
> RingMessageWorker code.
>
> But I strongly believe that this behavior should have been documented in the
> *DiscoveryCustomMessage* interface (I think it is the best place for this).
>
> Messaging managers like the discovery manager must provide very explicit and
> detailed documentation of the guarantees they provide to their users, so that
> developers don't have to guess what to expect.
>
> Anyway, modifying my protocol for IGNITE-4302 is not a big deal; I can
> easily change it to handle situations like this.
> Also, as part of the JIRA I'll try to clarify the documentation at least for
> this case.
>
> Thanks,
> Sergey.

Re: Unexpected behavior of DiscoveryCustomMessage acks

yzhdanov
Alex, this will require some more coding. The difference between nodes and
custom messages is that a node can easily be identified by ID, but messages
currently do not have such strong IDs, and it is pretty hard to compare them
in the general case. However, this is, of course, possible.

--Yakov
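
One way to picture manager-level filtering once messages do carry a strong ID is the sketch below. It rests on loud assumptions: `IdentifiedMessage` and `DedupFilter` are hypothetical names and do not reflect the actual discovery manager or any Ignite API; the ID is assumed to be assigned once by the original sender and preserved on resend; and the filter remembers only a bounded number of recent IDs.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.function.Consumer;

/** Hypothetical envelope pairing a payload with an ID assigned once by the original sender. */
final class IdentifiedMessage<T> {
    final UUID id;
    final T payload;

    IdentifiedMessage(UUID id, T payload) {
        this.id = id;
        this.payload = payload;
    }
}

/** Hypothetical manager-side filter that drops messages whose ID was seen recently. */
final class DedupFilter<T> {
    /** Insertion-ordered set holding at most maxIds recent message IDs. */
    private final Set<UUID> recent;

    /** Downstream listener that should see each message at most once. */
    private final Consumer<T> listener;

    DedupFilter(final int maxIds, Consumer<T> listener) {
        this.listener = listener;
        this.recent = Collections.newSetFromMap(new LinkedHashMap<UUID, Boolean>() {
            @Override protected boolean removeEldestEntry(Map.Entry<UUID, Boolean> eldest) {
                return size() > maxIds; // Evict the oldest ID once the bound is exceeded.
            }
        });
    }

    /** Passes the payload to the listener unless the ID is still remembered; returns true if delivered. */
    boolean deliver(IdentifiedMessage<T> msg) {
        if (!recent.add(msg.id))
            return false;

        listener.accept(msg.payload);
        return true;
    }
}
```

The bound is a trade-off: a resend that arrives after its ID has been evicted is delivered again, so `maxIds` would have to cover the longest window in which a resend can occur.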

2017-02-03 19:32 GMT+07:00 Alexey Goncharuk <[hidden email]>:

> I think we should have duplicate filtering logic in the discovery manager. As
> far as I remember, we wanted custom events to be consistent with other
> discovery events, and we rely on the fact that node joined and node left
> events won't be received twice.