Semen Boikov created IGNITE-4798:
------------------------------------
Summary: Cluster does not finish rebalancing after nodes leaving
Key: IGNITE-4798
URL: https://issues.apache.org/jira/browse/IGNITE-4798
Project: Ignite
Issue Type: Bug
Reporter: Denis Kholodov
Hi Valentin,
I managed to reproduce the stability issue we've been having in production in a relatively sterile environment.
The logs and stack traces are accessible here:
https://drive.google.com/open?id=0B1YMrCiHZq1PMWJsblBYSXhaX1k
The situation is:
1. Start up a cluster of 223 nodes.
2. Wait for everything to stabilize (took about 2 minutes).
3. Shut down 112 nodes.
4. Wait for everything to stabilize.
From that point on, I can't connect client nodes to the cluster:
2017-02-15 23:13:16.396 WARN o.a.i.i.p.c.GridCachePartitionExchangeManager main ctx: actor: - Failed to wait for initial partition map exchange. Possible reasons are:
^-- Transactions in deadlock.
^-- Long running transactions (ignore if this is the case).
^-- Unreleased explicit locks.
Other cache operations are also stuck.
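For anyone going through the attached logs, a minimal sketch of how the warning above could be located is shown here. It is plain Python and not part of Ignite; the function name find_exchange_warnings and the timestamp pattern are assumptions based only on the single log line quoted in this report, so the pattern may need adjusting for other log layouts.

```python
import re

# Assumption: the timestamp layout matches the one sample line quoted in this report.
EXCHANGE_WARNING = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+WARN\s+.*"
    r"Failed to wait for initial partition map exchange"
)


def find_exchange_warnings(lines):
    """Return the timestamps of every stuck-exchange warning in the given log lines."""
    matches = (EXCHANGE_WARNING.match(line) for line in lines)
    return [m.group("ts") for m in matches if m]


# Sample input copied from the warning quoted in this report.
sample = [
    "2017-02-15 23:13:16.396 WARN o.a.i.i.p.c.GridCachePartitionExchangeManager "
    "main ctx: actor: - Failed to wait for initial partition map exchange. "
    "Possible reasons are:",
    "^-- Transactions in deadlock.",
]

print(find_exchange_warnings(sample))
```

Running this over each node's log gives the times at which nodes started waiting on the exchange, which may help line up the warnings with the moment the 112 nodes were shut down.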
Let me know what other information I can provide.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)