Sort nodes in the ring in order to minimize the number of reconnections

41 messages

Александр Меньшиков
Hello everyone,

As far as I know, nodes are connected in a ring. For example, if I have 6
nodes named A, B, C, D, E and F, they can be connected in the ring in any
order: A-B-C-D-E-F-A, or A-F-B-E-C-D-A, etc. If some node falls out of the
topology, its neighbors must reconnect. Suppose nodes A, B and C are located
in one physical location and D, E and F in another, and at some point the two
locations become unavailable to each other. Depending on the ring order, we
can get a different number of reconnections. In the best case the ring looks
like A-B-CxD-E-FxA ('x' means a broken link), and we get only one reconnection
(C reconnects to A, or F reconnects to D, depending on which part of the
cluster stays alive). But the ring may also be AxFxBxExCxDxA, and then we get
a lot of reconnections (A to B, B to C, C to A; in general n/2 reconnections,
where n is the number of nodes). So I think we should add something to ensure
that the nodes are always well ordered in the ring (A-B-C-...-Z-A).
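To make the counting argument above concrete, here is a minimal Java sketch of the simplified model it uses: every surviving node whose ring successor is gone reconnects once, to the next surviving node. The class and method names are hypothetical, this is not how TcpDiscoverySpi actually works internally, and List.of/Set.of need Java 9+.

```java
import java.util.List;
import java.util.Set;

final class RingReconnections {
    /**
     * Counts the reconnections needed after every node outside {@code alive} drops out:
     * each surviving node whose ring successor is gone must reconnect once
     * (to the next surviving node further along the ring).
     */
    static int reconnections(List<String> ring, Set<String> alive) {
        int count = 0;
        int n = ring.size();
        for (int i = 0; i < n; i++) {
            String node = ring.get(i);
            String successor = ring.get((i + 1) % n); // the ring wraps around
            if (alive.contains(node) && !alive.contains(successor)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Set<String> firstSite = Set.of("A", "B", "C");
        // Nodes grouped by site: only C loses its successor (D).
        System.out.println(reconnections(List.of("A", "B", "C", "D", "E", "F"), firstSite));
        // Sites interleaved: A, B and C all lose their successors.
        System.out.println(reconnections(List.of("A", "F", "B", "E", "C", "D"), firstSite));
    }
}
```

Running main prints 1 for the grouped ring and 3 for the interleaved one, i.e. n/2 for n = 6.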

Of course, in the real world there can be multiple levels of physical closeness.

In my opinion, it is enough to add one 'int' parameter to the configuration
(named something like 'ExtraNodeOrder') and to change the node comparison
method so that it compares 'ExtraNodeOrder' first and only then falls back to
the old criterion (as far as I know, Ignite uses the topology version). Users
who have multiple levels of physical closeness can then use different bits,
for example the high 16 bits for the DC number and the low 16 bits for the
rack.
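A minimal Java sketch of the 'ExtraNodeOrder' idea, assuming a hypothetical node view that carries the proposed value and the existing order number (neither the field nor the setting exists in Ignite). One detail the bit-packing scheme has to handle: a DC number of 0x8000 or more sets the sign bit, so the sketch compares the packed values as unsigned ints.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class ExtraNodeOrderSketch {
    /** Hypothetical node view: the proposed ExtraNodeOrder value plus the existing order number. */
    static final class Node {
        final String name;
        final int extraNodeOrder; // proposed, user-configured (does not exist in Ignite)
        final long order;         // existing criterion: order assigned on join

        Node(String name, int extraNodeOrder, long order) {
            this.name = name;
            this.extraNodeOrder = extraNodeOrder;
            this.order = order;
        }
    }

    /** Packs a DC number into the high 16 bits and a rack number into the low 16 bits. */
    static int pack(int dc, int rack) {
        if (dc < 0 || dc > 0xFFFF || rack < 0 || rack > 0xFFFF) {
            throw new IllegalArgumentException("dc and rack must each fit in 16 bits");
        }
        return (dc << 16) | rack;
    }

    /** ExtraNodeOrder first (unsigned, so DC numbers >= 0x8000 still sort last), then the old criterion. */
    static final Comparator<Node> RING_ORDER =
        ((Comparator<Node>) (a, b) -> Integer.compareUnsigned(a.extraNodeOrder, b.extraNodeOrder))
            .thenComparingLong(n -> n.order);

    static String ringOf(List<Node> nodes) {
        return nodes.stream().sorted(RING_ORDER).map(n -> n.name).collect(Collectors.joining());
    }

    public static void main(String[] args) {
        // Nodes joined in interleaved order A, D, B, E, C, F and got orders 1..6.
        List<Node> nodes = List.of(
            new Node("A", pack(1, 0), 1), new Node("D", pack(2, 0), 2),
            new Node("B", pack(1, 0), 3), new Node("E", pack(2, 0), 4),
            new Node("C", pack(1, 1), 5), new Node("F", pack(2, 0), 6));
        System.out.println(ringOf(nodes)); // ABCDEF
    }
}
```

Here the interleaved join order still yields the ring A-B-C-D-E-F-A, because the DC bits dominate the comparison.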

Alternatively, we can add an array of 'int' to the configuration and compare
nodes element by element, from the zeroth element to the last.
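The array variant can lean on java.util.Arrays.compare(int[], int[]) (Java 9+), which compares element by element from index 0 and treats a proper prefix as smaller. As before, the Node class and its fields are hypothetical stand-ins for the proposed configuration, not an existing Ignite API.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class ArrayNodeOrderSketch {
    /** Hypothetical node view: proposed per-level keys (e.g. {dc, rack}) plus the join order. */
    static final class Node {
        final String name;
        final int[] levels; // proposed configuration value, most significant level first
        final long order;   // existing criterion, used as the final tie-breaker

        Node(String name, int[] levels, long order) {
            this.name = name;
            this.levels = levels;
            this.order = order;
        }
    }

    /** Lexicographic comparison of the arrays, then the existing order number. */
    static final Comparator<Node> RING_ORDER =
        ((Comparator<Node>) (a, b) -> Arrays.compare(a.levels, b.levels))
            .thenComparingLong(n -> n.order);

    static String ringOf(List<Node> nodes) {
        return nodes.stream().sorted(RING_ORDER).map(n -> n.name).collect(Collectors.joining());
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(
            new Node("A", new int[] {1, 1}, 1),
            new Node("D", new int[] {2, 1}, 2),
            new Node("B", new int[] {1, 2}, 3),
            new Node("C", new int[] {1, 1}, 4));
        System.out.println(ringOf(nodes)); // ACBD
    }
}
```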

Re: Sort nodes in the ring in order to minimize the number of reconnections

daradurvs
Hello, Alex!

I think it is a great idea.

I suggest building the connections between nodes based on weight (or priority).

For example, ordering by latency:
- nodes on one host = 1
- nodes in one rack-blade = 2
- nodes in one server-rack = 3
- nodes in one physical cluster = 4
- nodes in one subnet = 5
- etc.

Maybe it would be better to use some metrics from the ClusterMetrics interface.

The ordering algorithm can be implemented as a class such as a Comparator
and used when we build the cluster or select a place for a new node.
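A minimal sketch of the weight idea, under loudly stated assumptions: the Node class, its host/rack/subnet strings and the Closeness levels are all hypothetical (Ignite does not expose rack or blade information this way, and how to obtain it is exactly the open question), and placement is a simple greedy rule that inserts the joining node right after the closest existing node.

```java
import java.util.List;

final class WeightedPlacement {
    /** Hypothetical node description; how host, rack and subnet are obtained is left open. */
    static final class Node {
        final String name, host, rack, subnet;

        Node(String name, String host, String rack, String subnet) {
            this.name = name;
            this.host = host;
            this.rack = rack;
            this.subnet = subnet;
        }
    }

    /** Closeness levels, closest first; the declaration order acts as the weight. */
    enum Closeness { SAME_HOST, SAME_RACK, SAME_SUBNET, OTHER }

    static Closeness closeness(Node a, Node b) {
        if (a.host.equals(b.host)) return Closeness.SAME_HOST;
        if (a.rack.equals(b.rack)) return Closeness.SAME_RACK;
        if (a.subnet.equals(b.subnet)) return Closeness.SAME_SUBNET;
        return Closeness.OTHER;
    }

    /**
     * Greedy placement: index of the existing ring node after which the joining node
     * goes, i.e. the closest one (earliest ring position on ties); -1 for an empty ring.
     */
    static int insertAfter(List<Node> ring, Node joining) {
        int best = -1;
        for (int i = 0; i < ring.size(); i++) {
            if (best < 0
                || closeness(joining, ring.get(i)).compareTo(closeness(joining, ring.get(best))) < 0) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Node> ring = List.of(
            new Node("A", "h1", "r1", "s1"),
            new Node("B", "h2", "r2", "s1"));
        // X shares only the subnet with A but the rack with B, so it goes after B (index 1).
        System.out.println(insertAfter(ring, new Node("X", "h3", "r2", "s1")));
    }
}
```

Being greedy, this rule does not guarantee that groups stay contiguous at every level; it only illustrates how a weight-based comparison could drive placement.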

--
With best regards,
Vyacheslav Daradur

2016-12-22 13:59 GMT+03:00 Александр Меньшиков <[hidden email]>:

Re: Sort nodes in the ring in order to minimize the number of reconnections

dsetrakyan
I think having some user-defined ordering can be beneficial. However, we
are only talking about the node discovery protocol here, which maintains the
cluster. All other communication between nodes happens directly (it does not
go through the ring).

D.

On Thu, Dec 22, 2016 at 6:32 AM, Vyacheslav Daradur <[hidden email]>
wrote:

Re: Sort nodes in the ring in order to minimize the number of reconnections

daradurvs
Thanks for the reply.

I have some questions:

1. Where is the logic of Ignite cluster building implemented? In DiscoverySpi
and TcpDiscoveryMulticastIpFinder?

2. Which standard Ignite metrics would you recommend using for node ordering?

2016-12-22 19:08 GMT+03:00 Dmitriy Setrakyan <[hidden email]>:

Re: Sort nodes in the ring in order to minimize the number of reconnections

Valentin Kulichenko
Hi Vyacheslav,

Discovery logic is encapsulated in TcpDiscoverySpi.
TcpDiscoveryMulticastIpFinder is one of many implementations of the IP finder.
The only purpose of the IP finder is to provide a list of addresses to which a
node can send its initial join request, and the fact that it sends this initial
request to node A doesn't mean that it will be connected to A in the ring.
That said, I doubt the IP finder will be affected in any way if the discussed
change is implemented.

The discovery protocol already maintains consistent information about the
ring, so every node in the topology already knows everything about the other
nodes, including their order in the ring. So at the discovery level it should
not be very difficult to customize where a joining node is placed in the ring.

However, here is my concern. Currently, when a new node joins, the
coordinator assigns it an order number (e.g. if we already have nodes 1, 2
and 3, the new node gets order 4). This node then becomes the last one in the
ring, i.e. nodes are always ordered in the ring by this order number
(1->2->3->4->1). If we change this, we will basically allow a node to be
placed anywhere else (something like 1->2->4->3->1). I'm not 100% sure this is
going to cause issues, but it sounds dangerous.
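The concern can be modeled in a few lines. In this hypothetical sketch (the Node class is a stand-in, not Ignite's TcpDiscoveryNode, and "arc" is just an illustrative grouping key), the coordinator assigns max(order) + 1, and the ring is simply the topology sorted by some comparator; swapping the comparator is what would move a node away from the end of the ring.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class JoinOrderSketch {
    /** Minimal stand-in for a discovery node (hypothetical). */
    static final class Node {
        final long order; // assigned by the coordinator on join
        final int arc;    // hypothetical user-supplied placement group

        Node(long order, int arc) {
            this.order = order;
            this.arc = arc;
        }
    }

    /** Coordinator side: the joining node gets max(existing order) + 1. */
    static Node join(List<Node> topology, int arc) {
        long max = 0;
        for (Node n : topology) {
            max = Math.max(max, n.order);
        }
        Node joined = new Node(max + 1, arc);
        topology.add(joined);
        return joined;
    }

    /** The ring as every node would see it: the topology sorted with the given comparator. */
    static List<Long> ring(List<Node> topology, Comparator<Node> cmp) {
        return topology.stream().sorted(cmp).map(n -> n.order).collect(Collectors.toList());
    }

    static final Comparator<Node> BY_ORDER = Comparator.comparingLong(n -> n.order);
    static final Comparator<Node> BY_ARC_THEN_ORDER =
        Comparator.<Node>comparingInt(n -> n.arc).thenComparingLong(n -> n.order);

    public static void main(String[] args) {
        List<Node> topology = new ArrayList<>();
        join(topology, 0);
        join(topology, 0);
        join(topology, 1);
        join(topology, 0); // node 4 belongs to the same group as nodes 1 and 2
        System.out.println(ring(topology, BY_ORDER));          // [1, 2, 3, 4]
        System.out.println(ring(topology, BY_ARC_THEN_ORDER)); // [1, 2, 4, 3]
    }
}
```

With BY_ORDER the newest node is always last; with BY_ARC_THEN_ORDER node 4 lands between 2 and 3, which is exactly the 1->2->4->3->1 situation described above.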

Yakov, can you please chime in and share your thoughts on this?

-Val

On Fri, Dec 23, 2016 at 2:46 AM, Vyacheslav Daradur <[hidden email]>
wrote:

Re: Sort nodes in the ring in order to minimize the number of reconnections

Александр Меньшиков
What I am actually worried about is the following situation:

As I said, suppose we have the ring A->F->B->E->C->D->A, and the connection
between A, B, C and D, E, F is broken. The nodes will not detect the
unavailability of the other nodes at the same time, and meanwhile the client
keeps performing transactional operations. Transactions may then be rolled
back many times in the following sequence of events:

0. Everything is fine: A->F->B->E->C->D->A.
1. The connection between A, B, C and D, E, F is broken.
2. "A" sees that "F" has fallen out of the topology and reconnects to "B";
all transactions using "F" are rolled back and restarted on a backup node
("B", for example).
3. After that, "B" sees that "E" has fallen out of the topology and
reconnects to "C"; all transactions using "E" are rolled back and restarted
on a backup node ("C", for example).
4. After that, "C" sees that "D" has fallen out of the topology and
reconnects to "A"; all transactions using "D" are rolled back and restarted
on a backup node ("A", for example).

So we get 3 different sets of rollbacks instead of one.

2016-12-23 22:43 GMT+03:00 Valentin Kulichenko <[hidden email]>:

Re: Sort nodes in the ring in order to minimize the number of reconnections

dmagda
Alexander,

This is something different and looks unrelated to the discussion we are having here.

A transaction will not be rolled back the way you're describing. It will be either committed once or rolled back once. There can and will be inter-node communication when something fails during the commit phase, but this depends on how the affinity function distributes keys and partitions, not on how the nodes are connected at the discovery SPI layer.
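To illustrate why the discovery ring order does not decide which nodes take part in a transaction, here is a simplified Java sketch of key-to-partition-to-node mapping using rendezvous (highest random weight) hashing. Rendezvous hashing is the technique behind Ignite's RendezvousAffinityFunction, but this is not its code: the partition count, the hash mixing and the single-primary rule are assumptions made for the example.

```java
import java.util.List;

final class AffinitySketch {
    static final int PARTITIONS = 1024; // hypothetical partition count

    /** Key -> partition: depends only on the key. */
    static int partition(Object key) {
        return Math.floorMod(key.hashCode(), PARTITIONS);
    }

    /** SplitMix64 finalizer, used here as a cheap 64-bit mixing function. */
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    /**
     * Partition -> primary node by rendezvous hashing: the node with the highest weight wins.
     * The result depends on the set of node IDs, not on the order of the list.
     */
    static String primary(int partition, List<String> nodeIds) {
        String best = null;
        long bestWeight = 0;
        for (String id : nodeIds) {
            long w = mix(id.hashCode() * 31L + partition);
            // Break (unlikely) weight ties by ID so the list order never matters.
            if (best == null || w > bestWeight || (w == bestWeight && id.compareTo(best) < 0)) {
                best = id;
                bestWeight = w;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int p = partition("some-key");
        String grouped = primary(p, List.of("A", "B", "C", "D", "E", "F"));
        String interleaved = primary(p, List.of("A", "F", "B", "E", "C", "D"));
        System.out.println(grouped.equals(interleaved)); // true
    }
}
```

The check in main prints true: in this model the primary for a key is a function of the key and the set of live nodes, whatever the discovery ring looks like.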

Here you can learn more about how the two-phase commit protocol handles failures:
http://gridgain.blogspot.com/2014/09/two-phase-commit-for-in-memory-caches.html


Denis

> On Dec 23, 2016, at 12:24 PM, Александр Меньшиков <[hidden email]> wrote:

Re: Sort nodes in the ring in order to minimize the number of reconnections

yzhdanov
In reply to this post by Valentin Kulichenko
>>
For example, ordering on latency:
- nodes on one host = 1
- nodes in one rack-blade = 2
- nodes in one server-rack = 3
- nodes in one physical cluster = 4
- nodes in one subnet = 5
- etc.

Maybe it'll be better to use some metrics from ClusterMetrics interface.

The algorithm of ordering can be implemented in a class such as Comparator
and use it when we build a cluster or we select a place for a new node.
>>

Vyacheslav, please elaborate on how we can determine whether nodes are on the
same rack. I am not sure this is possible in the general case. Please see my
suggestions below.

>>
However, here is the concern I have. Currently when a new node joins,
coordinator assigns order number to this node (e.g. if we already have
nodes 1,2 and 3, new node will have order 4). This node will then be the
last one on the ring, i.e. nodes are always ordered in the ring by this
order number (1->2->3->4->1). If we change this, we will basically allow a
node to be placed anywhere else (smth like 1->2->4->3->1). I'm not 100%
sure if this is going to cause issues, but sounds dangerous.

Yakov, can you please chime in and share your thoughts on this?
>>

I don't think this will cause issues. Node ordering and placement are
implemented in TcpDiscoveryNodesRing, and I think we will just need to alter
the logic of
org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing#nextNode(java.util.Collection<org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNode>).
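A simplified model of the successor lookup mentioned above, assuming only that the ring is the topology sorted by some comparator. This is not the body of TcpDiscoveryNodesRing#nextNode; the class and method here are hypothetical and only show that changing the ring comparator is enough to change each node's successor.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

final class NextNodeSketch {
    /**
     * Sorts the topology with the ring comparator, then walks forward from {@code local},
     * wrapping around and skipping excluded nodes. Returns null if no other node is available.
     */
    static <N> N nextNode(Collection<N> topology, N local, Collection<N> excluded,
                          Comparator<? super N> ringOrder) {
        List<N> ring = new ArrayList<>(topology);
        ring.sort(ringOrder);
        int start = ring.indexOf(local);
        if (start < 0) {
            throw new IllegalArgumentException("local node is not in the topology");
        }
        for (int step = 1; step < ring.size(); step++) {
            N candidate = ring.get((start + step) % ring.size());
            if (!excluded.contains(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Integer> topology = List.of(1, 2, 3, 4);
        // Legacy-style ring by order number: 1->2->3->4->1.
        System.out.println(nextNode(topology, 2, List.of(3), Comparator.<Integer>naturalOrder())); // 4
        // Hypothetical grouping that puts node 3 in its own arc: 1->2->4->3->1.
        Comparator<Integer> byArc = Comparator.<Integer>comparingInt(n -> n == 3 ? 1 : 0).thenComparingInt(n -> n);
        System.out.println(nextNode(topology, 4, List.<Integer>of(), byArc)); // 3
    }
}
```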

As for the design, I would suggest the following.

1. The user should be able to define an ARC_ID for the node. I suggest
"arc" for this since we are using the "ring" concept. This will be the most
honored characteristic for node placement. By default, arc_id is 0, and it
can be set with the system property IGNITE_DISCO_ARC_ID, an environment
variable, or the new method TcpDiscoverySpi.setArcId().
So, if I have nodes A, D, G with arc_id set to 1 and B, Z with arc_id set
to 5, then the ring should be built as follows: A->D->G->B->Z->A. Here arcs
can represent different racks or data centers.

I am strongly against giving users the ability to point to an exact place in
the ring with something like this interface: [int getIndex(Node newNode,
List<Node> currentRing)]. This is very error-prone and may require tricky
consistency checks just to make sure that the implementation of this
interface is consistent across the topology.
With the "arcs" approach, the user can automatically assign proper IDs based
on the physical network topology and network routes.

2. Subnet is the 2nd most honored parameter. Nodes on the same subnet should
be placed side by side within the same arc.

3. Physical host is the 3rd most honored parameter. Nodes on the same
physical host should be placed together automatically within the same arc.

4. The new mode involving points 1-3 should become the default, and we should
also provide the ability to switch to the current mode, which should become
legacy.
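The proposed placement rules map naturally onto a chained Comparator: arc first, then subnet, then host, then the existing order number as the tie-breaker. Everything here is hypothetical: arcId mirrors the proposed setting (nothing named setArcId exists yet), and subnet/host are plain strings whose derivation is left open.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class ArcRingSketch {
    /** Hypothetical node attributes for the proposed "arc" placement. */
    static final class Node {
        final String name;
        final int arcId;     // proposed setting; 0 by default
        final String subnet; // assumption: some subnet identifier
        final String host;   // assumption: some physical host identifier
        final long order;    // existing join order, the final tie-breaker

        Node(String name, int arcId, String subnet, String host, long order) {
            this.name = name;
            this.arcId = arcId;
            this.subnet = subnet;
            this.host = host;
            this.order = order;
        }
    }

    /** Arc first (point 1), then subnet (point 2), then host (point 3), then the old order. */
    static final Comparator<Node> ARC_RING_ORDER =
        Comparator.comparingInt((Node n) -> n.arcId)
            .thenComparing((Node n) -> n.subnet)
            .thenComparing((Node n) -> n.host)
            .thenComparingLong((Node n) -> n.order);

    static String ringOf(List<Node> nodes) {
        return nodes.stream().sorted(ARC_RING_ORDER).map(n -> n.name).collect(Collectors.joining());
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(
            new Node("A", 1, "s", "h", 1), new Node("B", 5, "s", "h", 2),
            new Node("D", 1, "s", "h", 3), new Node("Z", 5, "s", "h", 4),
            new Node("G", 1, "s", "h", 5));
        System.out.println(ringOf(nodes)); // ADGBZ
    }
}
```

Main prints ADGBZ, i.e. the A->D->G->B->Z->A ring from point 1. Comparing subnets and hosts as strings only serves to group equal values; the relative order of different subnets is arbitrary.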

--Yakov

Re: Sort nodes in the ring in order to minimize the number of reconnections

daradurvs
>>
Vyacheslav, please elaborate on how we can determine whether we are on the
same rack. I am not sure this is possible in the general case. Please see my
suggestions below.
>>

I was thinking of latency values:

latency between nodes on the same host < latency between nodes in the same
rack < latency between nodes in the same subnet < etc.


In reply to Yakov Zhdanov (2016-12-26 12:20 GMT+03:00).

Re: Sort nodes in the ring in order to minimize the number of reconnections

Александр Меньшиков
In reply to this post by yzhdanov
Thank you, Denis, for your explanation. Then, as I understand it, a large
number of reconnections in the ring cannot cause major performance problems,
even temporary ones. So in general this optimization would change practically
nothing. Or am I missing something?

In reply to Yakov Zhdanov (2016-12-26 12:20 GMT+03:00).

Re: Sort nodes in the ring in order to minimize the number of reconnections

yzhdanov
> Then, as I understand it, a large number of reconnections in the ring cannot
> cause major performance problems, even temporary ones. So in general this
> optimization would change practically nothing. Or am I missing something?

I am afraid I did not understand this at all. Please elaborate.

I did not suggest any reconnections or ring rebuilds. All I suggest is to
control the ring-building process with arcs. And, yes, a properly built ring
should reduce the latency of ring messages IMO.

--Yakov
Reply | Threaded
Open this post in threaded view
|

Re: Sort nodes in the ring in order to minimize the number of reconnections

Александр Меньшиков
> I am afraid I did not understand this at all. Please elaborate.

I just want to understand what benefits we get by implementing what we're
talking about. If the major benefit is reduced latency of ring messages, then
assigning the 'ARC ID' according to latency values is quite enough. But if
there are hidden problems caused by a large number of reconnections (like the
ones I described in the first message of this discussion), then it is better
to find a way to determine the real physical location.

> And, yes, a properly built ring should reduce the latency of ring messages IMO.

Okay, then I think Vyacheslav's idea (using latency values) is quite enough
when we can't determine the real physical location.

In reply to Yakov Zhdanov (2016-12-26 13:03 GMT+03:00).

Re: Sort nodes in the ring in order to minimize the number of reconnections

yzhdanov
In reply to this post by daradurvs
>>
I was thinking of latency values:

latency between nodes on the same host < latency between nodes in the same
rack < latency between nodes in the same subnet < etc.
>>

Vyacheslav, I agree that latency increases in the way you describe, but I
still don't understand how we would use this information in discovery.
Latency may vary over time depending on many factors. I still think the arc
approach is more intuitive for the user and easier to implement.

--Yakov

Re: Sort nodes in the ring in order to minimize the number of reconnections

yzhdanov
In reply to this post by Александр Меньшиков
>>
I just want to understand what benefits we get by implementing what we're
talking about. If the major benefit is reduced latency of ring messages, then
assigning the 'ARC ID' according to latency values is quite enough. But if
there are hidden problems caused by a large number of reconnections (like the
ones I described in the first message of this discussion), then it is better
to find a way to determine the real physical location.
>>

I suggest solving ring building and reducing the number of reconnects
separately. If we have AxB-C-D-A, then A will try to reconnect to B, then to
C, then to D. This is how discovery works now. I agree this should be fixed,
and I have a couple of ideas on how we can do it, but let's keep these topics
separate.
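A toy model of the current behavior described above: after losing its next node, a node tries the following nodes in ring order until one answers. The ring list and the `down` set are hypothetical stand-ins for real connectivity; this is not how TcpDiscoverySpi is structured internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of sequential reconnect attempts; not Ignite code.
final class ReconnectSketch {
    // 'ring' lists node names in ring order, 'down' holds unreachable nodes.
    // Returns the connection attempts 'self' makes, stopping at the first
    // reachable node or after trying every other node in the ring.
    static List<String> attempts(List<String> ring, String self, Set<String> down) {
        int start = ring.indexOf(self);
        if (start < 0)
            throw new IllegalArgumentException("unknown node: " + self);
        List<String> tried = new ArrayList<>();
        for (int step = 1; step < ring.size(); step++) {
            String candidate = ring.get((start + step) % ring.size());
            tried.add(candidate);
            if (!down.contains(candidate))
                break; // first reachable node becomes the new next node
        }
        return tried;
    }
}
```

For the ring A-B-C-D-A with B and C both unreachable, A's attempts are B, C, D. In an interleaved ring such as AxFxBxExCxDxA, every node whose site went down costs its surviving predecessor one such wasted attempt, which is why placement and reconnect logic interact.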

>>
Okay, then I think Vyacheslav's idea (using latency values) is quite enough
when we can't determine the real physical location.
>>

Can you please explain why this is better than the arc approach?

--Yakov

Re: Sort nodes in the ring in order to minimize the number of reconnections

daradurvs
>>
Vyacheslav, I agree that latency increases in the way you describe, but I
still don't understand how we would use this information in discovery.
Latency may vary over time depending on many factors. I still think the arc
approach is more intuitive for the user and easier to implement.
>>

The way latency increases is just the main idea.

I suggest connecting a new node according to a priority.
General approach:
--
if [ there is a node on the same host ] then [ connect to it ]
else if [ there are nodes on the same subnet ] then [ connect to one of them ]
    // how to choose a node from the subnet set? Choose the one with the minimum latency to the new node
else [ connect to one of the remote nodes ]
    // how to choose a node from the set of remote nodes? Choose the one with the minimum latency to the new node
--
Maybe we can describe other intermediate steps as well.
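The priority rule above can be sketched as follows, assuming the joining node already knows each candidate's host, a subnet label, and a latency it measured itself. All names are hypothetical. Picking the lowest-latency node inside the same-host tier is an added assumption, since the rule only says "connect to it".

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the host > subnet > remote priority; not Ignite code.
final class PriorityJoinSketch {
    static final class Candidate {
        final String id;
        final String host;        // host identifier (assumed)
        final String subnet;      // subnet label, e.g. "10.0.1" (assumed)
        final long latencyMicros; // latency measured by the joining node (assumed)

        Candidate(String id, String host, String subnet, long latencyMicros) {
            this.id = id;
            this.host = host;
            this.subnet = subnet;
            this.latencyMicros = latencyMicros;
        }
    }

    // Same host first, then same subnet, then any remote node; within each
    // tier the candidate with the lowest measured latency wins.
    static Optional<Candidate> choose(String myHost, String mySubnet, List<Candidate> nodes) {
        Comparator<Candidate> byLatency = Comparator.comparingLong(c -> c.latencyMicros);

        Optional<Candidate> sameHost = nodes.stream()
            .filter(c -> c.host.equals(myHost))
            .min(byLatency);
        if (sameHost.isPresent())
            return sameHost;

        Optional<Candidate> sameSubnet = nodes.stream()
            .filter(c -> c.subnet.equals(mySubnet))
            .min(byLatency);
        if (sameSubnet.isPresent())
            return sameSubnet;

        return nodes.stream().min(byLatency); // empty only if there are no nodes
    }
}
```

Note that the result depends on latencies measured by the joining node, so other nodes cannot recompute it from shared topology data alone.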


In reply to Yakov Zhdanov (2016-12-26 15:08 GMT+03:00).

Re: Sort nodes in the ring in order to minimize the number of reconnections

Александр Меньшиков
> Can you please explain why this is better than the arc approach?

We had a misunderstanding. Everything is okay with the arc approach. But we
must decide how nodes will determine "ARC_ID", and I think it can be
calculated from latency values. If users are able to set "ARC_ID" in a config
file, then they can set a different 'ARC_ID' on every node and, in effect,
specify the exact place in the ring, which is what we would like to avoid.

In reply to Vyacheslav Daradur (2016-12-26 15:36 GMT+03:00).

Re: Sort nodes in the ring in order to minimize the number of reconnections

Alexei Scherbakov
Yakov,

To me, the ARC_ID approach seems to be just a variation of
node-attribute-based ordering.

I suggest a more generic approach.

What if we define node ordering using something like a NodeComparator?

When a new node joins the topology, it calculates the node to join by sorting
the current topology plus the new node.

nextNode just takes the first element of the sorted list. It is guaranteed
that all nodes will produce the same sorted list for a given topology version.

We can provide a default implementation based on IP address:

nodes on the same host : nodes on the same subnet : other nodes

I think this will work for most cases.

If needed, the user can provide their own comparison strategy based on
latencies, data centers, or anything else.
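One possible reading of the IP-based default ("nodes on the same host : nodes on the same subnet : other nodes") is a comparator built relative to the joining node. This sketch makes several simplifying assumptions: one IPv4 address per node, a /24 subnet, join order as the tie breaker, and sorting only the existing nodes rather than the topology plus the new node. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of an IP-based default NodeComparator; not Ignite code.
final class NodeComparatorSketch {
    static final class Node {
        final long order; // join order assigned by the coordinator
        final int[] ip;   // one IPv4 address per node (simplifying assumption)

        Node(long order, int a, int b, int c, int d) {
            this.order = order;
            this.ip = new int[] {a, b, c, d};
        }
    }

    // 0 = same host, 1 = same /24 subnet (assumed prefix), 2 = anything else.
    static int closeness(Node joining, Node n) {
        if (Arrays.equals(joining.ip, n.ip))
            return 0;
        if (Arrays.equals(Arrays.copyOf(joining.ip, 3), Arrays.copyOf(n.ip, 3)))
            return 1;
        return 2;
    }

    // Depends only on data every node sees in the topology (addresses and join
    // order), so every node computes the same result for the same inputs.
    static Comparator<Node> relativeTo(Node joining) {
        return Comparator.<Node>comparingInt(n -> closeness(joining, n))
            .thenComparingLong(n -> n.order);
    }

    static List<Node> sortedFor(Node joining, List<Node> topology) {
        List<Node> copy = new ArrayList<>(topology);
        copy.sort(relativeTo(joining));
        return copy; // copy.get(0) is the preferred neighbor for the joining node
    }
}
```

Because the comparator reads only addresses and join orders, any node holding the same topology version can reproduce the joining node's choice, which is the property the paragraph above relies on.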







In reply to Александр Меньшиков (2016-12-26 17:17 GMT+03:00).

--

Best regards,
Alexei Scherbakov

Re: Sort nodes in the ring in order to minimize the number of reconnections

Alexei Scherbakov
Of course, there is no need to sort all nodes.

It's enough just to select the smallest node.
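This refinement can be illustrated generically: for any comparator, a single linear pass with the standard JDK method `Collections.min` finds the head of the sorted order without paying for an O(n log n) sort. The class name is hypothetical; both methods return the same element whenever the comparator imposes a strict order (no ties).

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper contrasting full sorting with a single min() pass.
final class SmallestNodeSketch {
    // O(n log n): copy, sort, take the head. Throws on an empty collection.
    static <T> T firstOfSorted(Collection<T> nodes, Comparator<? super T> cmp) {
        List<T> copy = new ArrayList<>(nodes);
        copy.sort(cmp);
        return copy.get(0);
    }

    // O(n): one pass over the nodes. Throws NoSuchElementException if empty.
    static <T> T smallest(Collection<T> nodes, Comparator<? super T> cmp) {
        return Collections.min(nodes, cmp);
    }
}
```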

In reply to Alexei Scherbakov (2016-12-26 22:29 GMT+03:00).

--

Best regards,
Alexei Scherbakov

Re: Sort nodes in the ring in order to minimize the number of reconnections

dsetrakyan
I think the NodeComparator approach will work. The user can choose to sort
nodes from one rack before nodes from another rack. The same goes for subnets
or data centers.

My main concern here is code complexity. Yakov, how difficult is it to insert
a new node at an arbitrary spot in the discovery ring?

D.

In reply to Alexei Scherbakov (Mon, Dec 26, 2016 at 12:42 PM).

Re: Sort nodes in the ring in order to minimize the number of reconnections

yzhdanov
>>
My main concern here is code complexity. Yakov, how difficult is it to insert
a new node at an arbitrary spot in the discovery ring?
>>

Dmitry, I think this is not hard. At least I don't see any issues now.

>>
I think the NodeComparator approach will work. The user can choose to sort
nodes from one rack before nodes from another rack. The same goes for subnets
or data centers.
>>

Dmitry, can you please explain why you would force the user to write code?
This does not seem convenient to me at all. If the user wants to write code,
they can do it to calculate a proper arc_id.

Another point, which I already posted to this thread: this is very error prone.

>>
I am strongly against giving the user an opportunity to specify the exact
place in the ring through something like this interface:
[int getIndex(Node newNode, List<Node> currentRing)]. This is very error
prone and may require tricky consistency checks just to make sure that the
implementation of this interface is consistent across the topology.
With the "arcs" approach, the user can automatically assign proper ids based
on the physical network topology and network routes.
>>

I still think arc_id is better:
1. No code on the user side, only an environment variable or system property
on a machine.
2. All code is inside Ignite, so it is easy to fix and change if required.
3. All benefits of the comparator are still available.
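To illustrate point 1, here is a sketch of how a node could resolve its arc id without any user code. IGNITE_DISCO_ARC_ID is only the name proposed in this thread, not an existing Ignite property. Using the same name for the environment variable, and letting the system property take precedence over it, are assumptions.

```java
// Hypothetical sketch: IGNITE_DISCO_ARC_ID is a proposed name, not Ignite API.
final class ArcIdSketch {
    static final String ARC_ID_KEY = "IGNITE_DISCO_ARC_ID";

    // Pure helper so the precedence rules can be checked without touching the
    // JVM or OS: a non-null system property wins over the environment variable,
    // and a missing or blank value falls back to the default arc 0.
    static int resolveArcId(String sysProp, String envVar) {
        String raw = sysProp != null ? sysProp : envVar;
        if (raw == null || raw.trim().isEmpty())
            return 0;
        return Integer.parseInt(raw.trim()); // throws NumberFormatException on non-numeric input
    }

    // Reads the actual JVM system property and environment variable.
    static int resolveArcId() {
        return resolveArcId(System.getProperty(ARC_ID_KEY), System.getenv(ARC_ID_KEY));
    }
}
```

Since the value lives in deployment configuration, an operator (or a provisioning script) can derive it from the physical network layout once per machine.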

Alex, I still don't get how you (and the others as well) want to deal with
latencies here. I would like you to explain how you would solve this: you have
1000 IP addresses, and you need to sort them in your beloved latency order,
but please note that you need to get exactly the same ring on all of these
1000 machines.
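A toy illustration of the determinism requirement, using made-up numbers: an ordering built from values every node shares (an arc id plus the coordinator-assigned join order) is identical everywhere, while an ordering built from latencies measured locally by each observer can differ from machine to machine. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of shared vs. locally measured ordering keys.
final class DeterminismSketch {
    // Uses only values every node sees identically in the topology, so every
    // node computes the same ring. Assumes every node in 'orders' has an arc id.
    static List<String> ringByArc(Map<String, Integer> arcIds, Map<String, Long> orders) {
        List<String> ids = new ArrayList<>(orders.keySet());
        ids.sort(Comparator.<String>comparingInt(arcIds::get)
            .thenComparingLong(orders::get));
        return ids;
    }

    // Uses latencies measured by one observer; an observer with different
    // measurements gets a different order for the same set of nodes.
    static List<String> ringByLocalLatency(Map<String, Long> latencyFromObserver) {
        List<String> ids = new ArrayList<>(latencyFromObserver.keySet());
        ids.sort(Comparator.<String>comparingLong(latencyFromObserver::get)
            .thenComparing(Comparator.<String>naturalOrder()));
        return ids;
    }
}
```

With arc ids A=1, B=2, C=1 and join orders 1, 2, 3, every node gets [A, C, B]. Two observers that measure A, B, C as 1/5/9 ms and 9/5/1 ms get [A, B, C] and [C, B, A] respectively.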

--Yakov