Hello Igniters,
The current implementation of GridDhtPartitionsExchangeFuture#waitPartitionRelease does not guarantee that, once the method completes, there are no ongoing atomic or transactional updates on the current node during the main stage of PME. It guarantees only that all primary updates on that node have finished; the node can still receive and process backup updates after the method returns. An example of such a case is described in https://issues.apache.org/jira/browse/IGNITE-7871

To avoid such situations, we would like to add a second phase to the waitPartitionRelease method. In this phase, every server node participating in PME waits until all other server nodes have finished their ongoing updates.

Here is a brief description of the algorithm:

Non-coordinator node:
1) Finish all ongoing atomic & transactional updates.
2) Send an acknowledgement to the coordinator.
3) Wait for the final acknowledgement from the coordinator, confirming that all nodes have finished their updates.
4) Continue PME.

Coordinator node:
1) Finish all ongoing atomic & transactional updates.
2) Wait for acknowledgements from all server nodes.
3) Send the final acknowledgement to all server nodes.
4) Continue PME.

Acknowledgement messages are tiny, so the network pressure and the overall performance drop should be minimal.

Another solution is simply to cancel atomic backup updates and transactional backup updates in the PREPARED phase if the topology version has changed. But from the user's perspective, it is not right to get transaction errors even when a node is merely joining the cluster.

Any thoughts?
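To make the message flow above concrete, here is a minimal, self-contained sketch that models it with threads and latches. It is a toy simulation, not Ignite code: the class TwoPhaseReleaseSketch, the methods simulate and finishLocalUpdates, and the use of CountDownLatch in place of real network acknowledgements are all assumptions made for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of the proposed two-phase release. Hypothetical names, not Ignite API. */
final class TwoPhaseReleaseSketch {
    /** Stands in for "finish all ongoing atomic & transactional updates". */
    private static void finishLocalUpdates() throws InterruptedException {
        Thread.sleep(ThreadLocalRandom.current().nextInt(50));
    }

    /**
     * Simulates one exchange with nodeCnt server nodes (node 0 acts as coordinator).
     * Returns how many nodes continued PME only after ALL nodes had finished updates.
     */
    static int simulate(int nodeCnt) {
        CountDownLatch acks = new CountDownLatch(nodeCnt - 1); // acks sent to the coordinator
        CountDownLatch finalAck = new CountDownLatch(1);       // coordinator's final ack
        AtomicInteger finished = new AtomicInteger();          // nodes with no ongoing updates
        AtomicInteger continuedSafely = new AtomicInteger();

        // One thread per node: every node must be able to block at the same time.
        ExecutorService pool = Executors.newFixedThreadPool(nodeCnt);

        for (int i = 0; i < nodeCnt; i++) {
            final boolean coordinator = i == 0;

            pool.execute(() -> {
                try {
                    finishLocalUpdates();               // step 1 on every node
                    finished.incrementAndGet();

                    if (coordinator) {
                        acks.await();                   // step 2: wait for all acks
                        finalAck.countDown();           // step 3: send the final ack
                    }
                    else {
                        acks.countDown();               // step 2: ack to the coordinator
                        finalAck.await();               // step 3: wait for the final ack
                    }

                    // Step 4: continue PME. By now no node should still be updating.
                    if (finished.get() == nodeCnt)
                        continuedSafely.incrementAndGet();
                }
                catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        pool.shutdown();

        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS))
                throw new IllegalStateException("Simulation did not finish in time");
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        return continuedSafely.get();
    }

    public static void main(String[] args) {
        int nodes = 4;
        System.out.println(simulate(nodes) + " of " + nodes + " nodes continued PME safely");
    }
}
```

In this model, simulate(4) returns 4: no node reaches step 4 before the coordinator has collected every acknowledgement. The cost is one small message per non-coordinator node in each direction per exchange.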
Hi Igniters,
I prefer option 1 (the second waitPartitionRelease phase), because throwing exceptions is bad for product usability. I think we should go that way only if it is unavoidable.

At the same time, it would be good if we could provide a solution that is just as reliable but optimized in terms of message count.

Please share your vision.

Sincerely,
Dmitriy Pavlov

(In reply to the message from Pavel Kovalenko <[hidden email]>, Mon, 19 Mar 2018, 20:15.)
For now, I think the two-phase await is the only option. After the fix is prototyped, we need to benchmark it and check how the change affects PME timing.

(In reply to the message from Dmitry Pavlov <[hidden email]>, 2018-03-20 18:09 GMT+03:00.)