Hello Ignite community,
When testing Ignite in a dockerized environment, I ran into the following issue with the current TcpCommunicationSpi implementation.

I had several physical machines, and each Ignite node running inside a Docker container had at least two InetAddresses associated with it: one IP address of the physical host and one additional IP address of the Docker bridge interface, *which was the default one and the same across all physical machines*.

Each node publishes the address of its Docker bridge in the list of its addresses, although it is not reachable from remote nodes. So when a node tries to establish a communication connection using a remote node's Docker address, its request goes to itself, as if it were a loopback address.

I would suggest implementing a simple heuristic to avoid this: before connecting to a remote node's address, CommunicationSpi should check whether the local node has exactly the same address. If the "remote" and local addresses are the same, CommunicationSpi should skip that address in the remote node's list and proceed with the next one.

Is it safe to implement such a heuristic in TcpCommunicationSpi, or are there risks I'm missing? I would really appreciate help from an expert with deep knowledge of Communication mechanics.

If such an improvement makes sense, I'll file a ticket and start working on it.

Thanks,
Sergey.
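To make the proposed heuristic concrete, here is a minimal sketch in plain Java. It is not Ignite code: the class `LocalAddressFilter` and its methods are hypothetical names chosen for illustration, and how such a filter would be wired into TcpCommunicationSpi is left open.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper illustrating the "skip addresses that are also local" heuristic. */
final class LocalAddressFilter {
    /** Collects every address bound to a local network interface. */
    static Set<InetAddress> localAddresses() throws SocketException {
        Set<InetAddress> res = new HashSet<>();
        Enumeration<NetworkInterface> itfs = NetworkInterface.getNetworkInterfaces();

        // Some JDK versions return null when no interfaces are found.
        if (itfs == null)
            return res;

        for (NetworkInterface itf : Collections.list(itfs))
            res.addAll(Collections.list(itf.getInetAddresses()));

        return res;
    }

    /**
     * Returns the remote node's addresses in their original order,
     * minus those whose IP is also assigned to the local node.
     */
    static List<InetSocketAddress> skipLocal(
        Collection<InetSocketAddress> remoteAddrs,
        Set<InetAddress> locAddrs
    ) {
        List<InetSocketAddress> res = new ArrayList<>();

        for (InetSocketAddress addr : remoteAddrs) {
            // Unresolved addresses carry no IP to compare, so they are kept.
            if (addr.isUnresolved() || !locAddrs.contains(addr.getAddress()))
                res.add(addr);
        }

        return res;
    }
}
```

A caller would pass the remote node's address list together with `LocalAddressFilter.localAddresses()` and try only the addresses that remain. One thing this sketch does not handle: two nodes running on the same host share their IP addresses and differ only by port, so a comparison by IP alone would also drop those addresses and that case would need separate treatment.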
Sergey,
The way I "solved" this problem was to modify both org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi#getNodeAddresses(TcpDiscoveryNode, boolean) and org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi#nodeAddresses(ClusterNode) so that the external IP addresses (the ones in the ATTR_EXT_ADDRS attribute of the cluster node) are listed first in the returned collection.

It did fix the problem and significantly reduced the connection time, as Ignite no longer had to waste time attempting to connect to the remote node's internal Docker IP. Such an attempt always results in a socket timeout (2 seconds by default), which, with multiple nodes, makes cluster startup very slow and unreliable.

Of course, having a Docker Swarm with an overlay network would probably solve this problem more elegantly without any code changes, but I'm not a Docker expert, and Docker Swarm is not my target execution environment anyway. I'd like to be able to deploy Ignite nodes in standalone containers and have them join the cluster as if they were running on physical hardware.

Hope it helps.
Andrey

________________________________
From: Sergey Chugunov <[hidden email]>
Sent: Friday, February 9, 2018 3:54 AM
To: [hidden email]
Subject: TcpCommunicationSpi in dockerized environment
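The reordering Andrey describes in his reply (external addresses first, so they are tried before the Docker-internal ones) could look roughly like the sketch below. This is plain Java, not the actual patch: `ExternalFirstOrdering` and `externalFirst` are hypothetical names, and the sketch assumes the external addresses (those from ATTR_EXT_ADDRS) and the node's other addresses are already available as two collections.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;

/** Hypothetical helper illustrating "external addresses first" ordering. */
final class ExternalFirstOrdering {
    /**
     * Returns external addresses followed by the remaining node addresses.
     * An address present in both collections appears once, in the external position.
     */
    static List<InetSocketAddress> externalFirst(
        Collection<InetSocketAddress> extAddrs,
        Collection<InetSocketAddress> nodeAddrs
    ) {
        // LinkedHashSet keeps insertion order and silently ignores duplicates.
        LinkedHashSet<InetSocketAddress> ordered = new LinkedHashSet<>(extAddrs);
        ordered.addAll(nodeAddrs);

        return new ArrayList<>(ordered);
    }
}
```

With this ordering a connection attempt reaches the externally routable address first, so the timeout on the unreachable Docker bridge address is only paid if the external address fails.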