[jira] [Created] (IGNITE-8785) Node may hang indefinitely in CONNECTING state during cluster segmentation

Anton Vinogradov (Jira)
Pavel Kovalenko created IGNITE-8785:
---------------------------------------

             Summary: Node may hang indefinitely in CONNECTING state during cluster segmentation
                 Key: IGNITE-8785
                 URL: https://issues.apache.org/jira/browse/IGNITE-8785
             Project: Ignite
          Issue Type: Bug
          Components: cache
    Affects Versions: 2.5
            Reporter: Pavel Kovalenko
             Fix For: 2.6


Affected test: org.apache.ignite.internal.processors.cache.IgniteTopologyValidatorGridSplitCacheTest#testTopologyValidatorWithCacheGroup

The node hangs with the following stack trace:

{noformat}
"grid-starter-testTopologyValidatorWithCacheGroup-22" #117619 prio=5 os_prio=0 tid=0x00007f17dd19b800 nid=0x304a in Object.wait() [0x00007f16b19df000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:931)
        - locked <0x0000000705ee4a60> (a java.lang.Object)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:373)
        at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:1948)
        at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:297)
        at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:915)
        at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1739)
        at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1046)
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2014)
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1723)
        - locked <0x0000000705995ec0> (a org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance)
        at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1151)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:649)
        at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:882)
        at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:845)
        at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:833)
        at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:799)
        at org.apache.ignite.testframework.junits.GridAbstractTest$3.call(GridAbstractTest.java:742)
        at org.apache.ignite.testframework.GridTestThread.run(GridTestThread.java:86)
{noformat}

It seems that the node never receives an acknowledgment from the coordinator.

Before the hang, the following warning was logged:

{noformat}
[org.apache.ignite:ignite-core] [2018-06-10 04:59:18,876][WARN ][grid-starter-testTopologyValidatorWithCacheGroup-22][IgniteCacheTopologySplitAbstractTest$SplitTcpDiscoverySpi] Node has not been connected to topology and will repeat join process. Check remote nodes logs for possible error messages. Note that large topology may require significant time to start. Increase 'TcpDiscoverySpi.networkTimeout' configuration property if getting this message on the starting nodes [networkTimeout=5000]
{noformat}
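
The stack trace shows the starting thread in a timed Object.wait() inside ServerImpl.joinTopology, which suggests a loop that keeps waiting until the join is acknowledged. The following is a minimal sketch of that pattern, not Ignite's actual code: the class JoinWaitSketch, its methods, and the CONNECTED state are hypothetical names introduced for illustration. It shows why a lost acknowledgment leaves the caller waiting forever when no overall deadline is set, and how a bounded join timeout lets the caller give up.

```java
import java.util.concurrent.TimeUnit;

public class JoinWaitSketch {
    /** Hypothetical node states; only the CONNECTING name comes from the issue title. */
    public enum State { CONNECTING, CONNECTED }

    private final Object mux = new Object();
    private State state = State.CONNECTING;

    /** Called by a (hypothetical) message thread when the coordinator's acknowledgment arrives. */
    public void onJoinAcknowledged() {
        synchronized (mux) {
            state = State.CONNECTED;
            mux.notifyAll();
        }
    }

    /**
     * Waits until the join is acknowledged.
     * joinTimeoutMs <= 0 means "wait forever", which hangs if the acknowledgment is lost.
     * Returns true if the node joined, false on timeout or interruption.
     */
    public boolean awaitJoin(long pollMs, long joinTimeoutMs) {
        boolean bounded = joinTimeoutMs > 0;
        long start = System.nanoTime();
        long timeoutNanos = TimeUnit.MILLISECONDS.toNanos(joinTimeoutMs);

        synchronized (mux) {
            while (state == State.CONNECTING) {
                // Never pass 0 to wait(): that would mean "wait without a timeout".
                long waitMs = Math.max(1, pollMs);

                if (bounded) {
                    long leftNanos = timeoutNanos - (System.nanoTime() - start);
                    if (leftNanos <= 0)
                        return false; // Deadline reached without an acknowledgment.
                    waitMs = Math.min(waitMs, Math.max(1, TimeUnit.NANOSECONDS.toMillis(leftNanos)));
                }

                try {
                    mux.wait(waitMs); // Shows up as TIMED_WAITING in a thread dump.
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Case 1: the acknowledgment never arrives, so the bounded wait gives up.
        JoinWaitSketch lost = new JoinWaitSketch();
        System.out.println("joined=" + lost.awaitJoin(20, 200));

        // Case 2: the acknowledgment arrives from another thread shortly after.
        JoinWaitSketch ok = new JoinWaitSketch();
        Thread ack = new Thread(() -> {
            try { Thread.sleep(50); } catch (InterruptedException ignored) { return; }
            ok.onJoinAcknowledged();
        });
        ack.start();
        System.out.println("joined=" + ok.awaitJoin(20, 2000));
        ack.join();
    }
}
```

With joinTimeoutMs set to 0 in the first case, the call would not return, which matches the symptom described in this issue. Whether the real fix should bound the wait or guarantee delivery of the acknowledgment is a separate question that this sketch does not answer.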




