Igniters,

PME-free switch [1] (available since 2.8) skips PME (Partition Map Exchange) when a node leaves, where possible (a baseline topology is set and the cluster is fully rebalanced).
This means we already wait for nothing (except recovery) to perform the switch.
This optimization allows operations that have already started to continue during or after the switch, provided they are not affected by the failed primary.
But upcoming operations still can't be started until the switch is finished cluster-wide.

Let me propose an additional optimization - Cellular switch.
Cellular Affinity [2] means that nodes are combined into virtual cells where, for each partition, the backups are located in the same cell as the primary.
The simplest way to get Cellular Affinity is to use backup filters [3].

Cellular Affinity allows the switch to finish instantly outside the affected cell, under the following assumptions:
- Replicated caches should be recovered first, since every node is affected (as a backup) by any failed primary. However, replicated caches are expected to be effectively read-only (updates are extremely rare), so there is nothing to wait for here.
- Upcoming replicated transactions (with non-failed primaries) can be started but can't be committed until the switch is finished cluster-wide.
- Upcoming transactions related to the broken cell will wait for cell recovery (cluster-wide switch finish).

... and this means:
In addition to the PME-free switch, where we are able to continue already started operations during or after the switch, we are now also able to perform most upcoming operations during the switch.

In other words, Cellular switch has little effect on an operation's latency when the operation is not related to the failed cell.

According to the benchmark [4], which checks "how fast upcoming transactions (started after switch start) can be committed when we have thousands of prepared transactions (prepared before switch start)", operation latency is 5326 ms [5] on master and 65 ms [6] with the proposed fix, which is about 80 times faster (5326 / 65 ≈ 82).

The fix [7] (part of IEP-45 [8]) is ready for review.
Waiting for your review!

[1] http://apache-ignite-developers.2346864.n4.nabble.com/Non-blocking-PME-Phase-One-Node-fail-tp43531p44586.html
[2] https://cwiki.apache.org/confluence/display/IGNITE/IEP-45%3A+Crash+Recovery+Speed-Up#IEP-45:CrashRecoverySpeed-Up-Cellularswitch
[3] https://gist.github.com/anton-vinogradov/c50f9d0ce3e3e2997646f84ba7eba5f5#file-bench-java-L417
[4] https://gist.github.com/anton-vinogradov/c50f9d0ce3e3e2997646f84ba7eba5f5
[5] https://gist.github.com/anton-vinogradov/a35a3a8151b7494aa84b83f58cb75889#file-master-txt-L15
[6] https://gist.github.com/anton-vinogradov/a35a3a8151b7494aa84b83f58cb75889#file-fix-txt-L15
[7] https://issues.apache.org/jira/browse/IGNITE-12617
[8] https://cwiki.apache.org/confluence/display/IGNITE/IEP-45%3A+Crash+Recovery+Speed-Up
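
To make the three assumptions in the proposal concrete, here is a minimal sketch in plain Java with no Ignite dependency. It is only a toy model, not the code of the fix: the class, the enum, the node names and the two-cell topology are all hypothetical, and it assumes each upcoming transaction touches a single primary node. `acceptBackup` mirrors the idea of a backup filter (backups stay in the primary's cell), and `decide` says what an upcoming transaction may do while the switch is still in progress.

```java
import java.util.Map;

/** Toy model of the Cellular switch rules; none of these names are Ignite APIs. */
class CellularSwitchSketch {
    /** What an upcoming transaction may do while the switch is still in progress. */
    enum Decision { PROCEED, START_BUT_DELAY_COMMIT, WAIT_FOR_SWITCH }

    /** Hypothetical topology: two cells with three nodes each. */
    static final Map<String, String> CELL_OF = Map.of(
        "n1", "cellA", "n2", "cellA", "n3", "cellA",
        "n4", "cellB", "n5", "cellB", "n6", "cellB");

    /** Backup-filter rule behind Cellular Affinity: backups stay in the primary's cell. */
    static boolean acceptBackup(String candidate, String primary) {
        return CELL_OF.get(candidate).equals(CELL_OF.get(primary));
    }

    /** Applies the three assumptions from the proposal to one upcoming transaction. */
    static Decision decide(String primary, boolean replicatedCache, String failedNode) {
        if (primary.equals(failedNode))
            return Decision.WAIT_FOR_SWITCH; // The primary itself has failed.

        if (replicatedCache)
            return Decision.START_BUT_DELAY_COMMIT; // Replicated: every node is affected as a backup.

        // Partitioned cache: only the cell that lost a node has to wait.
        boolean sameCell = CELL_OF.get(primary).equals(CELL_OF.get(failedNode));

        return sameCell ? Decision.WAIT_FOR_SWITCH : Decision.PROCEED;
    }

    public static void main(String[] args) {
        String failed = "n1";

        System.out.println("n2 may back up n1: " + acceptBackup("n2", "n1"));
        System.out.println("n4 may back up n1: " + acceptBackup("n4", "n1"));
        System.out.println("partitioned tx on n5: " + decide("n5", false, failed));
        System.out.println("partitioned tx on n2: " + decide("n2", false, failed));
        System.out.println("replicated tx on n5:  " + decide("n5", true, failed));
    }
}
```

In this model only the cell that lost a node blocks new partitioned transactions, which is the effect the benchmark above measures for transactions outside the failed cell.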
Hi Anton,

Generally, it means that Ignite will keep executing operations/transactions that are mapped into the partitions of those cells that won't be rebalanced, is that correct?

-
Denis

On Wed, May 6, 2020 at 3:24 AM Anton Vinogradov wrote:
> Let me propose an additional optimization - Cellular switch.
Denis,

Rebalance is not expected here, since this optimization works only on a fully rebalanced cluster with a baseline topology.

On Sat, May 9, 2020 at 12:48 AM Denis Magda wrote:
> Generally, it means that Ignite will keep executing operations/transactions
> that are mapped into the partitions of those cells that won't be
> rebalanced, is that correct?
Folks,

It seems we have tacit agreement here.
I'm going to merge the fix on May 15.

On Tue, May 12, 2020 at 10:08 AM Anton Vinogradov wrote:
> Rebalance is not expected here since this optimization works only on a
> fully rebalanced cluster with baseline.