Crash recovery speed-up #3, Cellular Switch

Anton Vinogradov-2
Igniters,

PME-free switch [1] (available since 2.8) skips PME when a node leaves,
whenever possible (baseline topology + fully rebalanced cluster).
This means the switch already waits for nothing except recovery.
This optimization allows operations that have already started to continue
during or after the switch, as long as they are not affected by the failed
primary.
But upcoming operations still can't start until the switch is finished
cluster-wide.

Let me propose an additional optimization: Cellular switch.
Cellular Affinity [2] means that nodes are combined into virtual cells where,
for each partition, the backups are located in the same cell as the primary.
The simplest way to get Cellular Affinity is to use backup filters [3].
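
As a rough illustration of what a backup filter achieves, the plain-Java
sketch below tags each node with a cell and picks backups only from the
primary's cell. It does not use the Ignite API: the Node class, the assign
method, and the cell names are hypothetical, and the modulo-based choice of
primary is a simplification, not Ignite's affinity function.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: not Ignite code. All names here are hypothetical. */
public class CellularAffinitySketch {
    /** A node tagged with the cell it belongs to. */
    static final class Node {
        final String id;
        final String cell;

        Node(String id, String cell) {
            this.id = id;
            this.cell = cell;
        }

        @Override public String toString() {
            return id + "@" + cell;
        }
    }

    /**
     * Picks owners for one partition: the primary comes first, then up to
     * {@code backups} nodes taken only from the primary's cell.
     */
    static List<Node> assign(int partition, List<Node> nodes, int backups) {
        // Simplified primary choice (assumption): plain modulo over the node list.
        Node primary = nodes.get(Math.floorMod(partition, nodes.size()));

        List<Node> owners = new ArrayList<>();
        owners.add(primary);

        for (Node candidate : nodes) {
            if (owners.size() > backups)
                break; // Primary plus the requested number of backups collected.

            // The "backup filter": accept only nodes from the primary's cell.
            if (candidate != primary && candidate.cell.equals(primary.cell))
                owners.add(candidate);
        }

        return owners;
    }

    public static void main(String[] args) {
        List<Node> nodes = Arrays.asList(
            new Node("n1", "cell-A"), new Node("n2", "cell-A"),
            new Node("n3", "cell-B"), new Node("n4", "cell-B"));

        for (int p = 0; p < 4; p++)
            System.out.println("partition " + p + " -> " + assign(p, nodes, 1));
    }
}
```

With such an assignment, every copy of a partition lives inside one cell, so
a failed node can only affect partitions owned by its own cell.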

Cellular Affinity allows the switch to finish instantly outside the affected
cell, under the following assumptions:
- Replicated caches should be recovered first, since every node is affected
(as a backup) by any failed primary.
  However, replicated caches are expected to be effectively read-only (with
extremely rare updates), so there is nothing to wait for here.
- Upcoming replicated transactions (with non-failed primaries) can be
started but can't be committed until the switch is finished cluster-wide.
- Upcoming transactions related to the broken cell will wait for cell
recovery (cluster-wide switch finish).
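
The rules above can be read as a small decision table. The sketch below is
one possible encoding in plain Java, not the actual implementation from the
fix: the Decision enum, the decide method, and its parameters are
hypothetical. Treating a replicated transaction with a failed primary as
"wait" is an assumption, since the list only covers non-failed primaries.

```java
/** Illustrative only: not Ignite code. All names here are hypothetical. */
public class CellularSwitchSketch {
    /** What an upcoming transaction may do while the switch is in progress. */
    enum Decision {
        /** Not related to the failed cell: may start and commit. */
        PROCEED,
        /** May start, but commit is held until the cluster-wide switch finishes. */
        START_BUT_DELAY_COMMIT,
        /** Must wait for cell recovery (cluster-wide switch finish). */
        WAIT_FOR_SWITCH
    }

    /**
     * @param replicated    the transaction touches a replicated cache
     * @param primaryFailed one of the transaction's primaries was on the failed node
     * @param inFailedCell  the transaction touches a partition owned by the failed cell
     */
    static Decision decide(boolean replicated, boolean primaryFailed, boolean inFailedCell) {
        if (primaryFailed)
            return Decision.WAIT_FOR_SWITCH; // Assumption for the replicated case.

        if (replicated)
            return Decision.START_BUT_DELAY_COMMIT; // Every node is a backup.

        if (inFailedCell)
            return Decision.WAIT_FOR_SWITCH; // Related to the broken cell.

        return Decision.PROCEED; // Partitioned cache, untouched cell.
    }

    public static void main(String[] args) {
        System.out.println(decide(false, false, false)); // PROCEED
        System.out.println(decide(true, false, false));  // START_BUT_DELAY_COMMIT
        System.out.println(decide(false, false, true));  // WAIT_FOR_SWITCH
    }
}
```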

... and this means:
In addition to the PME-free switch, where we are able to continue already
started operations during or after the switch, we are now also able to
perform most of the upcoming operations during the switch.

In other words, Cellular switch has little effect on an operation's
latency when the operation is not related to the failed cell.

According to the benchmark [4], which checks "how fast upcoming transactions
(started after switch start) can be committed when we have thousands of
prepared transactions (prepared before switch start)", operation latency is
5326 ms [5] on master and 65 ms [6] with the proposed fix, which is ~80 times
faster.
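
As a quick check of the ratio between the two latencies quoted above, here
is a minimal Java sketch. Only the two numbers come from the benchmark
results; the SpeedupCheck class and the speedup method are hypothetical
names.

```java
import java.util.Locale;

/** Illustrative only: recomputes the ratio of the two quoted latencies. */
public class SpeedupCheck {
    /** Returns how many times faster the improved latency is than the baseline. */
    static double speedup(double baselineMs, double improvedMs) {
        return baselineMs / improvedMs;
    }

    public static void main(String[] args) {
        double ratio = speedup(5326, 65); // master [5] vs. proposed fix [6]

        // Prints: speedup: 81.9x
        System.out.println(String.format(Locale.ROOT, "speedup: %.1fx", ratio));
    }
}
```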

The fix [7] (as a part of IEP-45 [8]) is ready to be reviewed.
Waiting for your review!


[1] http://apache-ignite-developers.2346864.n4.nabble.com/Non-blocking-PME-Phase-One-Node-fail-tp43531p44586.html
[2] https://cwiki.apache.org/confluence/display/IGNITE/IEP-45%3A+Crash+Recovery+Speed-Up#IEP-45:CrashRecoverySpeed-Up-Cellularswitch
[3] https://gist.github.com/anton-vinogradov/c50f9d0ce3e3e2997646f84ba7eba5f5#file-bench-java-L417
[4] https://gist.github.com/anton-vinogradov/c50f9d0ce3e3e2997646f84ba7eba5f5
[5] https://gist.github.com/anton-vinogradov/a35a3a8151b7494aa84b83f58cb75889#file-master-txt-L15
[6] https://gist.github.com/anton-vinogradov/a35a3a8151b7494aa84b83f58cb75889#file-fix-txt-L15
[7] https://issues.apache.org/jira/browse/IGNITE-12617
[8] https://cwiki.apache.org/confluence/display/IGNITE/IEP-45%3A+Crash+Recovery+Speed-Up

Re: Crash recovery speed-up #3, Cellular Switch

dmagda
Hi Anton,

Generally, it means that Ignite will keep executing operations/transactions
that are mapped to the partitions of those cells that won't be
rebalanced, is that correct?

-
Denis


On Wed, May 6, 2020 at 3:24 AM Anton Vinogradov <[hidden email]> wrote:


Re: Crash recovery speed-up #3, Cellular Switch

Anton Vinogradov-2
Denis,

Rebalancing is not expected here, since this optimization works only on a
fully rebalanced cluster with a baseline topology.

On Sat, May 9, 2020 at 12:48 AM Denis Magda <[hidden email]> wrote:


Re: Crash recovery speed-up #3, Cellular Switch

Anton Vinogradov-2
Folks,

It seems we have tacit agreement here.
I'm going to merge the fix on May 15.

On Tue, May 12, 2020 at 10:08 AM Anton Vinogradov <[hidden email]> wrote:
