[jira] [Created] (IGNITE-8828) Detecting and stopping unresponsive nodes during Partition Map Exchange

Anton Vinogradov (Jira)
Sergey Chugunov created IGNITE-8828:
---------------------------------------

             Summary: Detecting and stopping unresponsive nodes during Partition Map Exchange
                 Key: IGNITE-8828
                 URL: https://issues.apache.org/jira/browse/IGNITE-8828
             Project: Ignite
          Issue Type: Improvement
            Reporter: Sergey Chugunov


During the Partition Map Exchange (PME) process, the coordinator (1) gathers local partition maps from all nodes and (2) sends the calculated full partition map back to all nodes in the topology.

However, if one or more nodes fail to send their local partition maps in step 1 for any reason, the PME process hangs and blocks all operations. Currently the only solution is to manually identify and stop the nodes that failed to send their information to the coordinator.

The coordinator should do this itself: if it does not receive local partition maps from some nodes in time, it should check that stopping those nodes will not lead to data loss and then stop them forcibly.
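
A minimal sketch of the proposed coordinator-side decision, for illustration only. It is not Ignite code: the class PmeStragglerCheck, its methods, and the use of plain string node IDs are hypothetical. It also assumes the coordinator still knows the partition owners from the last completed exchange, and that "no data loss" means every partition keeps at least one owner among the nodes that stay.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper, not part of the Ignite API. */
final class PmeStragglerCheck {
    /** Nodes in the topology whose local partition map did not arrive in time. */
    static Set<String> unresponsive(Set<String> topology, Set<String> received) {
        Set<String> missing = new TreeSet<>(topology);
        missing.removeAll(received);
        return missing;
    }

    /** True if every partition keeps at least one owner outside toStop. */
    static boolean safeToStop(Map<Integer, Set<String>> owners, Set<String> toStop) {
        for (Set<String> partOwners : owners.values()) {
            // Stopping every owner of a partition would lose its data.
            if (!partOwners.isEmpty() && toStop.containsAll(partOwners))
                return false;
        }
        return true;
    }

    /** Nodes the coordinator may stop forcibly; empty if stopping them is unsafe. */
    static Set<String> nodesToStop(Set<String> topology, Set<String> received,
                                   Map<Integer, Set<String>> owners) {
        Set<String> missing = unresponsive(topology, received);
        return safeToStop(owners, missing) ? missing : Collections.<String>emptySet();
    }
}
```

The coordinator would call nodesToStop once its wait for step 1 times out. An empty result means either that all nodes responded or that stopping the silent nodes would lose data, in which case manual intervention is still needed.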



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)