Hello!
I tried to summarize some ideas on how to make ATOMIC caches stay consistent in case of topology changes.

Imagine a partitioned ATOMIC cache with 2 backups configured (primary + 2 backup copies of the partition). One possible scenario leading to divergence between the primary and its backups is the following: the update-initiating node sends the update operation to the primary node, the primary node propagates the update to 1 of its 2 backups and then dies. If the initiating node crashes as well (and therefore cannot retry the operation), then in the current implementation the 2 copies of the partition remaining in the cluster may differ. Note that both situations are possible: the new primary contains the latest update and the backup does not, and vice versa. A new backup will be elected according to the configured affinity and will rebalance the partition from a random owner, but the copies may still be inconsistent for the reason described above.

This problem does not affect TRANSACTIONAL caches, as the 2PC protocol handles scenarios of this kind very well.

Here is the link to the IEP:
https://cwiki.apache.org/confluence/display/IGNITE/IEP-12+Make+ATOMIC+Caches+Consistent+Again

Sam, Alex G, Vladimir, please share your thoughts.

--Yakov
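[A minimal sketch of the fan-out Yakov describes, with hypothetical names rather than the actual Ignite internals: the primary applies the update and then propagates it to each backup with no prepare/commit round, so a crash after the first send leaves one backup updated and the other stale.]

    import java.util.List;

    class AtomicUpdateSketch {
        /** Fire-and-forget transport to a backup node (hypothetical). */
        interface Node {
            void send(UpdateRequest req);
        }

        /** A single ATOMIC update addressed to one key. */
        record UpdateRequest(Object key, Object val) {}

        static void processOnPrimary(UpdateRequest req, List<Node> backups) {
            applyLocally(req);            // primary applies the update first

            for (Node backup : backups) { // then propagates it to every backup
                backup.send(req);         // if the primary dies after the first send,
                                          // only one backup has the update; with the
                                          // initiating node also gone, nobody retries
            }
        }

        static void applyLocally(UpdateRequest req) {
            // update the local partition store (omitted)
        }
    }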
Yakov,
As one of the prevention mechanisms for the scenario you describe, would it be possible to send the update message to the backup nodes in the order in which they would become primary nodes? For example, if the backup1 node is next in the chain to become the primary node, then the current primary node should send the update message to backup1 before it sends it to backup2.

In this case, if the primary and client nodes fail, then the backup1 node will become primary and will have the latest version of the data, no? Will this solve the situation you describe, at least partially?

D.
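[A hypothetical sketch of the ordering Dmitriy suggests, with made-up names: the primary sorts its backups by the order in which they would take over as primary and sends to them in that order. Note this only orders the sends; it does not by itself guarantee the order in which the messages are received or applied.]

    import java.util.Comparator;
    import java.util.List;

    class OrderedBackupFanout {
        interface Node {
            /** Position in the affinity order: 0 = next primary candidate, 1 = after that, ... */
            int affinityOrder();

            void send(Object update);
        }

        static void sendToBackups(Object update, List<Node> backups) {
            backups.stream()
                .sorted(Comparator.comparingInt(Node::affinityOrder))
                .forEach(b -> b.send(update)); // backup1 (the next primary) is addressed first
        }
    }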
Dmitry,
This will not solve the problem:

1. New primary election may be affinity function dependent.
2. If you want to introduce order in updating backup copies, then you lose parallelism. This will increase ops latency.

My suggestion should not affect current performance on stable or unstable topologies.

Btw, I have just got an idea - if we need to rebalance a partition to more than 1 node, then we need to scan it only once. I believe now we do it each time from the very beginning.

Yakov Zhdanov,
www.gridgain.com
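[A rough sketch of the single-scan rebalancing idea, with hypothetical names rather than the actual rebalancing code: when several nodes demand the same partition, iterate it once and feed every demander from the same pass instead of re-scanning per demander.]

    import java.util.List;

    class SingleScanRebalance {
        /** A node that has demanded this partition (hypothetical). */
        interface Demander {
            void supply(Object entry);
        }

        static void rebalance(Iterable<Object> partitionEntries, List<Demander> demanders) {
            for (Object entry : partitionEntries) {  // single pass over the partition
                for (Demander d : demanders)         // fan each entry out to all demanders
                    d.supply(entry);
            }
        }
    }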
On Thu, Dec 28, 2017 at 9:12 AM, Yakov Zhdanov <[hidden email]> wrote:

> Dmitry,
>
> This will not solve the problem:
>
> 1. New primary election may be affinity function dependent.

Yes, of course.

> 2. If you want to introduce order in updating backup copies, then you lose
> parallelism. This will increase ops latency.

I did not mean waiting until an update is finished. I suggested waiting until an ack for the message is received. In my view, this will significantly reduce the possibility of out-of-sync data.

> My suggestion should not affect current performance on stable or unstable
> topologies.
>
> Btw, I have just got an idea - if we need to rebalance a partition to more
> than 1 node, then we need to scan it only once. I believe now we do it each
> time from the very beginning.

I am not sure what this really means, but you should file a ticket for it with a proper description.

D.
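[A hypothetical sketch of the clarified proposal, using a made-up API rather than Ignite code: wait only for a receipt acknowledgement from the backup that is next in line to become primary, then fan out to the remaining backups without waiting for the update to be applied anywhere.]

    import java.util.List;
    import java.util.concurrent.CompletableFuture;

    class AckGatedFanout {
        interface Node {
            /** Completes when the backup acknowledges receipt of the message, not when it applies the update. */
            CompletableFuture<Void> sendWithReceiptAck(Object update);

            void send(Object update);
        }

        static void update(Object update, Node nextPrimary, List<Node> otherBackups) {
            nextPrimary.sendWithReceiptAck(update)
                .thenRun(() -> otherBackups.forEach(b -> b.send(update))); // remaining backups in parallel after the ack
        }
    }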