IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts


Alexey Popov
Hi Igniters,

We often see similar questions from users and customers about the
IgniteConfiguration, TcpDiscoverySpi and TcpCommunicationSpi timeouts and how
they relate to each other. We also see a number of side effects caused by
incorrect timeout configuration.

I tried to briefly describe these timeout settings (please see below) and
found that most of them make no sense in terms of cluster
functions/operations and cannot be explained to users.

I propose to deprecate most of them and leave only the timeouts we can
explain in common terms (setFailureDetectionTimeout, setNetworkTimeout,
setJoinTimeout and some others).

Please let me know your thoughts.

Thanks,
Alexey

GLOBAL:

IgniteConfiguration.setNetworkTimeout:
It is a global timeout for high-level operations that involve the network.
For instance, IgniteMessaging delivery and the DiscoverySpi handshake use
this timeout.

IgniteConfiguration.setFailureDetectionTimeout:
It is a global timeout for detecting failures in IgniteSpi implementations
(including DiscoverySpi and CommunicationSpi).
The failure detection algorithm bounds the series of low-level network
operations that make up a single logical operation (for instance, the
reliable delivery of a DiscoverySpi message within a cluster).
The failure detection timeout is a cumulative timeout that covers the socket
connection, sending and receiving data bytes, and all socket retries (if
some failure happens).
This timeout is intended to simplify the failure detection condition from a
user perspective.
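
To make the "cumulative" wording concrete, here is a minimal sketch of one time budget shared by every low-level step of a single logical operation. OperationBudget and its methods are hypothetical names invented for this illustration, not Ignite classes, and the clock is injected only so that the arithmetic can be checked without sleeping.

```java
import java.util.function.LongSupplier;

/** Hypothetical illustration: one deadline shared by connect, write, read and all retries. */
final class OperationBudget {
    private final LongSupplier clockMillis; // injected clock, e.g. System::currentTimeMillis
    private final long deadlineMillis;

    OperationBudget(long failureDetectionTimeoutMillis, LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
        this.deadlineMillis = clockMillis.getAsLong() + failureDetectionTimeoutMillis;
    }

    /** Time left for the next low-level step; 0 means the whole operation has failed. */
    long remainingMillis() {
        return Math.max(0L, deadlineMillis - clockMillis.getAsLong());
    }

    boolean exhausted() {
        return remainingMillis() == 0L;
    }

    public static void main(String[] args) {
        long[] now = {0L}; // fake clock so the example is deterministic
        OperationBudget budget = new OperationBudget(10_000L, () -> now[0]);
        now[0] += 4_000L;  // pretend the connect attempt took 4 s
        System.out.println(budget.remainingMillis()); // 6000 left for write, read and retries
        now[0] += 7_000L;  // a retry overruns the budget
        System.out.println(budget.exhausted());       // true
    }
}
```

The point of the sketch is that no individual step has its own timeout: each step may only use whatever is left of the single budget.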

IgniteConfiguration.setClientFailureDetectionTimeout:
It is a special case of the failure detection timeout, used by DiscoverySpi
for client nodes.
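
For reference, a configuration that relies only on the three global timeouts might look like the sketch below. It assumes ignite-core is on the classpath; the class name GlobalTimeoutsExample and all values are illustrative assumptions, not recommendations.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class GlobalTimeoutsExample {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Illustrative values in milliseconds, chosen only for the example.
        cfg.setFailureDetectionTimeout(10_000L);       // failure detection for server nodes
        cfg.setClientFailureDetectionTimeout(30_000L); // failure detection for client nodes
        cfg.setNetworkTimeout(5_000L);                 // high-level network operations

        // No SPI-level timeouts are set, so the failure detection logic stays enabled.
        try (Ignite ignite = Ignition.start(cfg)) {
            System.out.println("Failure detection timeout: "
                + ignite.configuration().getFailureDetectionTimeout() + " ms");
        }
    }
}
```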

TCP DISCOVERY SPI:

If you need more control over the failure detection algorithm for
TcpDiscoverySpi, you can explicitly set the following low-level options
(doing so disables the failureDetectionTimeout logic):

1. TcpDiscoverySpi.setConnectTimeout - socket connection timeout.
2. TcpDiscoverySpi.setReconnectCount - number of reconnect attempts made
when establishing a connection with the remote node and sending messages to it.
3. TcpDiscoverySpi.setSocketTimeout - socket write timeout. A write
operation that exceeds this timeout is repeated up to getReconnectCount()
times.
4. TcpDiscoverySpi.setAckTimeout - message acknowledgment timeout. If a
message acknowledgment is not received within this timeout, the send is
considered failed and the SPI repeats it. The timeout is automatically
doubled on each retry, up to the getMaxAckTimeout value.
5. TcpDiscoverySpi.setMaxAckTimeout - maximum acknowledgment timeout. Once
the doubled ack timeout reaches getMaxAckTimeout, the SPI gives up retrying.
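
The interplay of the ack timeout and its maximum can be sketched as follows. This is a hypothetical model of the doubling described in items 4 and 5, not Ignite's actual code; AckTimeoutSchedule is an invented name, and the rule "the attempt that uses the capped value is the last one" is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of ack-timeout doubling; not taken from the Ignite code base. */
final class AckTimeoutSchedule {
    /** Returns the ack timeout used on each attempt: doubled per retry, capped at maxAckTimeout. */
    static List<Long> timeouts(long ackTimeout, long maxAckTimeout) {
        if (ackTimeout <= 0 || maxAckTimeout < ackTimeout)
            throw new IllegalArgumentException("need 0 < ackTimeout <= maxAckTimeout");

        List<Long> schedule = new ArrayList<>();
        long current = ackTimeout;
        while (true) {
            schedule.add(current);
            if (current >= maxAckTimeout)
                break; // cap reached: assumed to be the last attempt
            // Double without overflowing, never exceeding the cap.
            current = (current > maxAckTimeout / 2) ? maxAckTimeout : current * 2;
        }
        return schedule;
    }

    public static void main(String[] args) {
        // Illustrative values in milliseconds.
        System.out.println(timeouts(50, 400)); // [50, 100, 200, 400]
    }
}
```

Under these assumptions the worst-case wait before giving up is the sum of the schedule (750 ms in the example), which is the kind of derived number users currently have to work out by hand.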

Other important TcpDiscoverySpi timeouts:

TcpDiscoverySpi.setJoinTimeout - timeout for the join process when a new or
restarted node joins a cluster. The node tries to connect to all
available IP addresses provided by ipFinder within this timeout.
If the timeout is exceeded, the node gives up and throws an exception
from Ignition.start().

TcpDiscoverySpi.setNetworkTimeout - timeout for high-level operations like
the handshake. It looks like it should be deprecated and
IgniteConfiguration.getNetworkTimeout should be used instead.

TCP COMMUNICATION SPI:

If you need more control over the failure detection algorithm for
TcpCommunicationSpi, you can explicitly set the following low-level options
(doing so disables the failureDetectionTimeout logic):

1. TcpCommunicationSpi.setConnectTimeout - socket connection timeout. It is
automatically doubled on each retry (up to getReconnectCount retries)
within a single logical operation.
2. TcpCommunicationSpi.setMaxConnectTimeout - maximum connection timeout,
the upper limit for the repeatedly doubled getConnectTimeout.
3. TcpCommunicationSpi.setReconnectCount - number of reconnect attempts made
when establishing a connection with the remote node and sending messages to it.
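
As a rough illustration of how these three options combine, the sketch below adds up the per-attempt connect timeouts. It is a hypothetical model, not Ignite's implementation: ConnectTimeoutEstimate is an invented name, and it assumes that the total number of attempts equals reconnectCount.

```java
/** Hypothetical model of connect-timeout doubling; not taken from the Ignite code base. */
final class ConnectTimeoutEstimate {
    /** Sum of per-attempt timeouts: doubled on each attempt, capped at maxConnectTimeout. */
    static long worstCaseMillis(long connectTimeout, long maxConnectTimeout, int reconnectCount) {
        long total = 0;
        long current = Math.min(connectTimeout, maxConnectTimeout);
        for (int attempt = 0; attempt < reconnectCount; attempt++) {
            total += current;
            // Double without overflowing, never exceeding the cap.
            current = (current > maxConnectTimeout / 2) ? maxConnectTimeout : current * 2;
        }
        return total;
    }

    public static void main(String[] args) {
        // Illustrative values: 1 s initial timeout, 4 s cap, 5 attempts.
        // 1000 + 2000 + 4000 + 4000 + 4000 = 15000
        System.out.println(worstCaseMillis(1_000, 4_000, 5)); // 15000
    }
}
```

The effective failure detection time is therefore a function of all three settings at once, which is a large part of why they are hard to explain individually.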

Other important TcpCommunicationSpi timeouts:

TcpCommunicationSpi.setSocketWriteTimeout - timeout for sending a message.
TcpCommunicationSpi.setIdleConnectionTimeout - maximum idle time after
which a connection is closed.




--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/

Re: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts

Ilya Kasnacheev
I agree with you.

I think we could restrict usage of e.g. setConnectTimeout/setSocketTimeout
to people extending SPIs, since different implementations may need
different values.

However, for user configurations we should only expose timeouts we can
explain; everything else should have reasonable values.

--
Ilya Kasnacheev

2018-03-01 17:01 GMT+03:00 Alexey Popov <[hidden email]>:


Re: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts

Valentin Kulichenko
+1. Low level timeouts that we still have in discovery and communication
are very hard to explain and I doubt there is anyone who fully understands
how they currently work. They bring a lot of complexity and almost zero
value. Let's deprecate them and leave only failureDetectionTimeout plus
other high level settings that Alexey mentioned.

-Val

On Thu, Mar 1, 2018 at 6:06 AM, Ilya Kasnacheev <[hidden email]>
wrote:


Re: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts

Dmitriy Pavlov
A clear and intuitive API is a strength of Ignite, so I am also +1 for
removing the non-obvious settings.

On Fri, Mar 2, 2018 at 4:11, Valentin Kulichenko <[hidden email]> wrote:


Re: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts

yzhdanov
Alexey, generally I agree. However, I don't understand what exactly you
suggest. Can you please list (1) the parameters you want to deprecate,
(2) the internal logic changes and (3) the updates to the
javadocs/descriptions of the params you want to keep?

--Yakov

Re: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts

Alexey Popov
Yakov,

1. The proposed list of parameters to deprecate:
TcpDiscoverySpi.setConnectTimeout (covered by IgniteConfiguration.setFailureDetectionTimeout)
TcpDiscoverySpi.setReconnectCount (covered by IgniteConfiguration.setFailureDetectionTimeout)
TcpDiscoverySpi.setSocketTimeout (covered by IgniteConfiguration.setFailureDetectionTimeout)
TcpDiscoverySpi.setAckTimeout (covered by IgniteConfiguration.setFailureDetectionTimeout)
TcpDiscoverySpi.setMaxAckTimeout (covered by IgniteConfiguration.setFailureDetectionTimeout)
TcpDiscoverySpi.setNetworkTimeout (IgniteConfiguration.setNetworkTimeout should be used here)
TcpCommunicationSpi.setConnectTimeout (covered by IgniteConfiguration.setFailureDetectionTimeout)
TcpCommunicationSpi.setMaxConnectTimeout (covered by IgniteConfiguration.setFailureDetectionTimeout)
TcpCommunicationSpi.setReconnectCount (covered by IgniteConfiguration.setFailureDetectionTimeout)
TcpCommunicationSpi.setSocketWriteTimeout (IgniteConfiguration.setNetworkTimeout should be used here)

2. Internal logic should continue to use
IgniteConfiguration.setFailureDetectionTimeout and
IgniteConfiguration.setNetworkTimeout as it is now.
The deprecated parameters should stay available for a while, with the
corresponding javadoc update.
TcpDiscoverySpi.getNetworkTimeout should use
IgniteConfiguration.getNetworkTimeout by default.
TcpCommunicationSpi.getSocketWriteTimeout should use
IgniteConfiguration.getNetworkTimeout by default.

In a few releases, the deprecated parameters could be removed.
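
The "use the global value by default" behaviour from item 2 can be sketched like this. GlobalConfig and DiscoverySettings are hypothetical stand-ins invented for the illustration, not the real IgniteConfiguration and TcpDiscoverySpi; the sketch only shows the fallback rule, with the SPI-level setter kept as a deprecated override.

```java
/** Hypothetical illustration of an SPI-level timeout that falls back to the global one. */
final class NetworkTimeoutFallback {
    /** Stand-in for the global configuration. */
    static final class GlobalConfig {
        final long networkTimeout;

        GlobalConfig(long networkTimeout) {
            this.networkTimeout = networkTimeout;
        }
    }

    /** Stand-in for an SPI whose own timeout is being deprecated. */
    static final class DiscoverySettings {
        private final GlobalConfig global;
        private Long networkTimeout; // null means "not set explicitly"

        DiscoverySettings(GlobalConfig global) {
            this.global = global;
        }

        /** Kept only for backward compatibility while the parameter is deprecated. */
        @Deprecated
        void setNetworkTimeout(long networkTimeout) {
            this.networkTimeout = networkTimeout;
        }

        /** An explicit value wins; otherwise the global timeout is used. */
        long getNetworkTimeout() {
            return networkTimeout != null ? networkTimeout : global.networkTimeout;
        }
    }

    public static void main(String[] args) {
        DiscoverySettings spi = new DiscoverySettings(new GlobalConfig(5_000L));
        System.out.println(spi.getNetworkTimeout()); // 5000: inherited from the global setting
        spi.setNetworkTimeout(8_000L);
        System.out.println(spi.getNetworkTimeout()); // 8000: explicit override still honoured
    }
}
```

With this shape, existing configurations that set the SPI-level value keep working during the deprecation period, while new configurations only need the global setting.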

3. I think we can keep the existing descriptions of the parameters. The
descriptions of IgniteConfiguration.setFailureDetectionTimeout and
IgniteConfiguration.setNetworkTimeout could probably be reworded to be
clearer.

Thanks,
Alexey




Re: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts

Denis Mekhanikov
Absolutely agree.

Personally, I find it particularly frustrating that
IgniteConfiguration.networkTimeout and TcpDiscoverySpi.networkTimeout are
not the same thing.

If we had a small set of timeouts with simple and clear semantics, it would
make everybody happier.

Denis

On Tue, Mar 6, 2018 at 15:23, Alexey Popov <[hidden email]> wrote:


Re: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts

Alexey Popov
Hi Yakov,

Do the proposed changes look good to you?

Thanks,
Alexey




RE: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts

Stanislav Lukyanov
In reply to this post by Alexey Popov
Hi folks,

It looks like we stopped half-way with this activity. I’d like to pick it up.

All seem to agree that we should simplify the timeout settings.
Here are the specific actions I’d like to propose:

1. Promote the use of global timeouts as the best practice
*What*: update the docs to encourage users to rely on the following timeouts for their “network stability” settings
IgniteConfiguration.failureDetectionTimeout
IgniteConfiguration.clientFailureDetectionTimeout
IgniteConfiguration.networkTimeout
*When*: update readme.io docs for 2.5 and Javadoc for 2.6

2. Discourage the use of finer timeouts
*What*:
- update the docs to discourage users from using the following timeouts and announce their upcoming deprecation and removal
TcpDiscoverySpi.socketTimeout
TcpDiscoverySpi.ackTimeout
TcpDiscoverySpi.maxAckTimeout
TcpDiscoverySpi.reconnectCount
TcpCommunicationSpi.connectTimeout
TcpCommunicationSpi.maxConnectTimeout
TcpCommunicationSpi.reconnectCount
- deprecate the properties in code
- remove the properties in code
*When*:
- readme.io update with deprecation announcement for 2.5
- @Deprecated in code + Javadoc update + respective readme.io rewording for 2.6
- properties removal in 3.0

3. Make “orphan” timeouts rely on global timeouts, then deprecate and remove
*What*:
Two settings currently don’t default to the global equivalents, although they should:
- TcpCommunicationSpi.socketWriteTimeout should default to failureDetectionTimeout
- TcpDiscoverySpi.networkTimeout should default to IgniteConfiguration.networkTimeout
So the course of action would be:
- update the docs to explain that these timeouts have to be used for now, but announce their upcoming deprecation and removal
- change the properties to default to their global counterparts and deprecate them in code
- remove the properties in code
*When*:
- readme.io update with deprecation announcement for 2.5
- changing defaults + @Deprecated in code + Javadoc update + respective readme.io rewording for 2.6
- properties removal in 3.0

4. Don’t touch other timeouts
Other timeouts, like TcpDiscoverySpi.joinTimeout or TcpCommunicationSpi.idleConnectionTimeout, are orthogonal to the whole
“network stability” theme discussed above, and don’t have to be changed.

Finally, I’ve prepared a draft of the docs page that may be used as a base for the readme.io update.
This email is pretty long already, so please find the draft attached to the JIRA issue
https://issues.apache.org/jira/browse/IGNITE-7704.

Please share your thoughts.

Thanks,
Stan

From: Alexey Popov
Sent: March 1, 2018, 17:01
To: [hidden email]
Subject: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts

Hi Igniters,

We often see similar questions from users and customers related to
IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts and their
relations. And we see several side-effects after incorrect timeout
configuration.

I tried to briefly describe these timeout settings (please see below) and
found out that the most of them do not have sense in terms of cluster
functions/operations and could not be explained to the users.

I propose to deprecate most of them and leave only the timeouts we can
explain in common terms ( (setFailureDetectionTimeout, setNetworkTimeout,
setJoinTimeout and some others).

Please let me know your thoughts.

Thanks,
Alexey

GLOBAL:

IgniteConfiguration.setNetworkTimeout:
It is a global timeout for high-level operations where a network is
involved. For instance, IgniteMessaging delivery uses this timeout or
DiscoverySpi handshake.

IgniteConfiguration.setFailureDetectionTimeout:
It is a global timeout for detecting failures at IgniteSpi implementations
(including DiscoverySpi and CommunicationSpi).
The failure detection algorithm actually limits a range of simple network
operations related to a single logical operation (for instance, a reliable
delivery of some DiscoverySpi message within a cluster).
Failure detection timeout is a cumulative timeout for a socket connection,
sending and receiving data bytes and all possible socket retries (if some
failure happens).
This timeout is intended to simplify the failure detection condition from a
user perspective.

IgniteConfiguration.setClientFailureDetectionTimeout: - it is a special case
for DiscoverySpi client-node Ignite.

TCP DISCOVERY SPI:

If you need more control over failure detection algorithm for
TcpDiscoverySpi you can explicitly use the following low-level options (that
will disable failureDetectoinTimeout logic):

1. TcpDiscoverySpi.setConnectTimeout - socket connection timeout
2. TcpDiscoverySpi.setReconnectCount - number of reconnect attempts used
when establishing connection with the remote node and sending messages to it
3. TcpDiscoverySpi.setSocketTimeout - socket write timeout. The write
operation will be repeated getReconnectCount() times if it exceeds this
timeout
4. TcpDiscoverySpi.setAckTimeout - message acknowledgment timeout. If a
message acknowledgment is not received within this timeout, sending is
considered as failed and SPI will try to repeat send operation. It is
automatically doubled for simultaneous retries up to getMaxAckTimeout value.
5. TcpDiscoverySpi.setMaxAckTimeout - maximum connection timeout, if the
getAckTimeout reaches getMaxAckTimeout then SPI give up sending retries

Another important TcpDiscoverySpi timeouts:

TcpDiscoverySpi.setJoinTimeout - It is a timeout for join process when a
new/restarted node joins a cluster. The node tries to connect to all
available IP addresses provided by ipFinder within this timeout.
If the timeout is exceeded, the node will give up and throw an exception
from Ignition.start().

TcpDiscoverySpi.setNetworkTimeout - timeout for high-level operations like
handshake. It looks like it should be deprecated, with
IgniteConfiguration.getNetworkTimeout used here instead.

TCP COMMUNICATION SPI:

If you need more control over the failure detection algorithm for
TcpCommunicationSpi, you can explicitly set the following low-level options
(doing so disables the failureDetectionTimeout logic):

1. TcpCommunicationSpi.setConnectTimeout - socket connection timeout. It is
automatically doubled on each successive retry (up to getReconnectCount
retries) within a single logical operation
2. TcpCommunicationSpi.setMaxConnectTimeout - maximum connection timeout:
the upper limit for getConnectTimeout as it is doubled across retries
3. TcpCommunicationSpi.setReconnectCount - number of reconnect attempts made
when establishing a connection with a remote node and sending messages to it

Other important TcpCommunicationSpi timeouts:

TcpCommunicationSpi.setSocketWriteTimeout - timeout to send a message.
TcpCommunicationSpi.setIdleConnectionTimeout - maximum idle connection
timeout, after which a connection is closed.




--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/

Re: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts

Valentin Kulichenko
Hi Stan,

I'm 100% for this activity, however I don't think we should change the
behavior of timeouts you listed in #2 - this can lead to unexpected
behavior for users who already use them. I would just deprecate them and
eventually remove.

-Val

On Mon, May 28, 2018 at 1:29 PM, Stanislav Lukyanov <[hidden email]>
wrote:

> Hi folks,
>
> It looks like we stopped half-way with this activity. I’d like to pick it
> up.
>
> All seem to agree that we should simplify the timeout settings.
> Here are the specific actions I’d like to propose:
>
> 1. Promote the use of global timeouts as the best practice
> *What*: update the docs to encourage users to rely on the following
> timeouts for their “network stability” settings
> IgniteConfiguration.failureDetectionTimeout
> IgniteConfiguration.clientFailureDetectionTimeout
> IgniteConfiguration.networkTimeout
> *When*: update readme.io docs for 2.5 and Javadoc for 2.6
>
> 2. Discourage the use of finer timeouts
> *What*:
> - update the docs to discourage users from using the following timeouts and
> announce their upcoming deprecation and removal
> TcpDiscoverySpi.socketTimeout
> TcpDiscoverySpi.ackTimeout
> TcpDiscoverySpi.maxAckTimeout
> TcpDiscoverySpi.reconnectCount
> TcpCommunicationSpi.connectTimeout
> TcpCommunicationSpi.maxConnectTimeout
> TcpCommunicationSpi.reconnectCount
> - deprecate the properties in code
> - remove the properties in code
> *When*:
> - readme.io update with deprecation announcement for 2.5
> - @Deprecated in code + Javadoc update + respective readme.io rewording
> for 2.6
> - properties removal in 3.0
>
> 3. Make “orphan” timeouts rely on global timeouts, then deprecate and
> remove
> *What*:
> Two settings currently don’t default to the global equivalents, although
> they should:
> - TcpCommunicationSpi.socketWriteTimeout should default to
> failureDetectionTimeout
> - TcpDiscoverySpi.networkTimeout should default to
> IgniteConfiguration.networkTimeout
> So the course of action would be:
> - update the docs to explain that these timeouts have to be used for now,
> but announce their upcoming deprecation and removal
> - change the properties to default to their global counterparts and
> deprecate them in code
> - remove the properties in code
> *When*:
> - readme.io update with deprecation announcement for 2.5
> - changing defaults + @Deprecated in code + Javadoc update + respective
> readme.io rewording for 2.6
> - properties removal in 3.0
>
> 4. Don’t touch other timeouts
> Other timeouts, like TcpDiscoverySpi.joinTimeout or TcpCommunicationSpi.idleConnectionTimeout,
> are orthogonal to the whole
> “network stability” theme discussed above, and don’t have to be changed.
>
> Finally, I’ve prepared a draft of the docs page that may be used as a base
> for the readme.io update.
> This email is pretty long already, so please find the draft attached to
> the JIRA issue
> https://issues.apache.org/jira/browse/IGNITE-7704.
>
> Please share your thoughts.
>
> Thanks,
> Stan

RE: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts

Stanislav Lukyanov
Val,

Which timeouts do you mean?

In #2 I don’t propose to change behavior.

I propose to change behavior for a couple of settings in #3 though.
I believe the correct approach here would be to target the behavior change for 2.6,
but keep in mind that we’ll need to carefully analyze the impact before actually making the changes.

Thanks,
Stan


Re: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts

Valentin Kulichenko
Stan,

OK, I got confused a little :)

I do agree that TcpDiscoverySpi.networkTimeout should inherit from
IgniteConfiguration.networkTimeout if not set explicitly. Do we have the
same setting for TcpCommunicationSpi, BTW? If yes, the behavior should be
consistent.
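
A minimal sketch of the "inherit if not set explicitly" rule, to make the
intended behavior concrete. SpiTimeout is a hypothetical class used only for
illustration; it is not how the SPI stores its settings today.

```java
/** Hypothetical sketch (not Ignite code): an SPI-level timeout that falls back to a global one. */
final class SpiTimeout {
    /** null means the user did not set the SPI-level value explicitly. */
    private Long explicitMillis;

    /** Explicit SPI-level setting, e.g. what TcpDiscoverySpi.setNetworkTimeout would record. */
    void set(long millis) {
        explicitMillis = millis;
    }

    /** Effective value: the explicit setting if present, otherwise the global timeout. */
    long effective(long globalMillis) {
        return explicitMillis != null ? explicitMillis : globalMillis;
    }
}
```

Here effective(globalNetworkTimeout) returns the global value until set(...)
is called, after which the explicit SPI value wins, so existing
configurations that set the SPI property keep their behavior.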

As for TcpCommunicationSpi.socketWriteTimeout, I'm not sure why you want to
change its behavior. Can we just deprecate it and eventually remove, just
as we plan to do for all timeouts from #2?

-Val


RE: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts

Stanislav Lukyanov
On networkTimeout: no, we don’t have anything like that in TcpCommunicationSpi.

On socketWriteTimeout:
First, its semantics are very close to those of TcpDiscoverySpi.socketTimeout (with the exception that communication uses NIO), and the latter defaults to failureDetectionTimeout,
so I think making it default the same way would help to avoid confusion.
Second, I think we can’t deprecate something without an alternative that would work for most users.
On the other hand, if we do default socketWriteTimeout to failureDetectionTimeout then we reach a pretty decent API state
where one only needs two properties in IgniteConfiguration, neither of which we’re considering for deprecation and removal in 3.0.

Stan

From: Valentin Kulichenko
Sent: 29 мая 2018 г. 22:17
To: [hidden email]
Subject: Re: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpitimeouts

Stan,

OK, I got confused a little :)

I do agree that TcpDiscoverySpi.networkTimeout should inherit from
IgniteConfiguration.networkTImeout if not set explicitly. Do we have the
same setting for TcpCommunicationSpi, BTW? If yes, behavior should be
consistent.

As for TcpCommunicationSpi.socketWriteTimeout, I'm not sure why you want to
change its behavior. Can we just deprecate it and eventually remove, just
as we plan to do for all timeouts from #2?

-Val

On Tue, May 29, 2018 at 3:50 AM, Stanislav Lukyanov <[hidden email]>
wrote:

> Val,
>
> Which timeouts do you mean?
>
> In #2 I don’t propose to change behavior.
>
> I propose to change behavior for a couple of settings in #3 though.
> I believe the correct approach here would be to target the behavior change
> for 2.6,
> but keep in mind that we’ll need to carefully analyze the impact before
> actually making the changes.
>
> Thanks,
> Stan
>
> From: Valentin Kulichenko
> Sent: 29 мая 2018 г. 0:57
> To: [hidden email]
> Subject: Re: IgniteConfiguration, TcpDiscoverySpi,
> TcpCommunicationSpitimeouts
>
> Hi Stan,
>
> I'm 100% for this activity, however I don't think we should change the
> behavior of timeouts you listed in #2 - this can lead to unexpected
> behavior for users who already use them. I would just deprecate them and
> eventually remove.
>
> -Val
>
> On Mon, May 28, 2018 at 1:29 PM, Stanislav Lukyanov <
> [hidden email]>
> wrote:
>
> > Hi folks,
> >
> > It looks like we stopped half-way with this activity. I’d like to pick it
> > up.
> >
> > All seem to agree that we should simplify the timeout settings.
> > Here are the specific actions I’d like to propose:
> >
> > 1. Promote the use of global timeouts as the best practice
> > *What*: update the docs to encourage users to rely on the following
> > timeouts for their “network stability” settings
> > IgniteConfiguration.failureDetectionTimeout
> > IgniteConfiguration.clientFailureDetectionTimeout
> > IgniteConfiguration.networkTimeout
> > *When*: update readme.io docs for 2.5 and Javadoc for 2.6
> >
> > 2. Discourage the use of finer timeouts
> > *What*:
> > - update the docs to discourage users to use the following timeouts and
> > announce their upcoming deprecation and removal
> > TcpDiscoverySpi.socketTimeout
> > TcpDiscoverySpi.ackTimeout
> > TcpDiscoverySpi.maxAckTimeout
> > TcpDiscoverySpi.reconnectCount
> > TcpCommunicationSpi.connectTimeout
> > TcpCommunicationSpi.maxConnectTimeout
> > TcpCommunicationSpi.reconnectCount
> > - deprecate the properties in code
> > - remove the properties in code
> > *When*:
> > - readme.io update with deprecation announcement for 2.5
> > - @Deprecated in code + Javadoc update + respective readme.io rewording
> > for 2.6
> > - properties removal in 3.0
> >
> > 3. Make “orphan” timeouts rely on global timeouts, then deprecate and
> > remove
> > *What*:
> > Two settings currently don’t default to the global equivalents, although
> > they should:
> > - TcpCommunicationSpi.socketWriteTimeout should default to
> > failureDetectionTimeout
> > - TcpDiscoverySpi.networkTimeout should default to IgniteConfiguration.
> > networkTImeout
> > So the course of action would be:
> > - update the docs to explain that these timeouts have to be used for now,
> > but announce their upcoming deprecation and removal
> > - change the properties to default to their global counterparts and
> > deprecate them in code
> > - remove the properties in code
> > *When*:
> > - readme.io update with deprecation announcement for 2.5
> > - changing defaults + @Deprecated in code + Javadoc update + respective
> > readme.io rewording for 2.6
> > - properties removal in 3.0
> >
> > 4. Don’t touch other timeouts
> > Other timeouts, like TcpDiscoverySpi.joinTimeout or TcpCommunicationSpi.
> idleConnectionTimeout,
> > are orthogonal to the whole
> > “network stability” theme discussed above, and don’t have to be changed.
> >
> > Finally, I’ve prepared a draft of the docs page that may be used as a
> base
> > for the readme.io update.
> > This email is pretty long already, so please find the draft attached to
> > the JIRA issue
> > https://issues.apache.org/jira/browse/IGNITE-7704.
> >
> > Please share your thoughts.
> >
> > Thanks,
> > Stan

Re: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts

Valentin Kulichenko
Stan,

It looks like you suggest changing only the defaults. If so, that's OK. But
let's not change the behavior of these timeouts when they are explicitly
set in the config.

Thanks,
Val
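
To make the distinction concrete, here is a minimal sketch of the "change only the default" rule: an SPI-level timeout that was set explicitly wins, and the global value is used only when nothing was set. TimeoutDefaults and effectiveTimeout are hypothetical names chosen for illustration, not Ignite code, and the real SPIs may track "explicitly set" differently.

```java
/** Hypothetical helper, not part of Ignite: resolves the timeout an SPI would use. */
final class TimeoutDefaults {
    private TimeoutDefaults() {
    }

    /**
     * @param explicitSpiTimeout value the user set on the SPI, or null if it was left unset
     * @param globalTimeout      value of the global counterpart (for example failureDetectionTimeout)
     * @return the explicit SPI value when present, otherwise the global value
     */
    static long effectiveTimeout(Long explicitSpiTimeout, long globalTimeout) {
        // An explicit setting keeps its current behavior; only the unset case changes.
        return explicitSpiTimeout != null ? explicitSpiTimeout : globalTimeout;
    }
}
```

With the SPI value left unset and a global value of 10000 ms, the sketch yields 10000; with the SPI value set to 2000 ms it yields 2000, so configs that already set the property behave as before.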

RE: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts

Stanislav Lukyanov
Hi,

I’ve updated the proposed documentation with a description of metricsUpdateFrequency and a detailed description of how failureDetectionTimeout and clientFailureDetectionTimeout relate. The draft is attached to https://issues.apache.org/jira/browse/IGNITE-7704.

It seems that the relationship between failureDetectionTimeout and clientFailureDetectionTimeout is currently too tricky and should also be changed in the future.
The problem is that in a server-client connection the server uses clientFailureDetectionTimeout but the client uses failureDetectionTimeout.
In other words, clients ignore clientFailureDetectionTimeout and just use failureDetectionTimeout. Because of that, one has to provide different values of failureDetectionTimeout in the server and client configs, which seems confusing and inconvenient.
So I’d like to add one more point to my earlier proposal:

5. Always use clientFailureDetectionTimeout on clients instead of failureDetectionTimeout
*What*: change code to use clientFailureDetectionTimeout on clients
*When*: update code and readme.io docs in 2.7

Thanks,
Stan
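
The asymmetry described in this message can be sketched as a small model. This is only an illustration of the behavior as stated above, not Ignite code: FailureDetectionModel, NodeConfig, currentTimeout and proposedTimeout are hypothetical names, and the fields merely mirror the IgniteConfiguration properties clientMode, failureDetectionTimeout and clientFailureDetectionTimeout.

```java
/** Hypothetical model of the behavior described in the message; not Ignite code. */
final class FailureDetectionModel {
    /** Minimal stand-in for the relevant IgniteConfiguration properties. */
    static final class NodeConfig {
        final boolean clientMode;
        final long failureDetectionTimeout;
        final long clientFailureDetectionTimeout;

        NodeConfig(boolean clientMode, long failureDetectionTimeout, long clientFailureDetectionTimeout) {
            this.clientMode = clientMode;
            this.failureDetectionTimeout = failureDetectionTimeout;
            this.clientFailureDetectionTimeout = clientFailureDetectionTimeout;
        }
    }

    private FailureDetectionModel() {
    }

    /** Current behavior as described: only a server watching a client uses clientFailureDetectionTimeout. */
    static long currentTimeout(NodeConfig local, NodeConfig remote) {
        if (!local.clientMode && remote.clientMode)
            return local.clientFailureDetectionTimeout;
        // Server-to-server links and the client side both fall back to failureDetectionTimeout.
        return local.failureDetectionTimeout;
    }

    /** Point 5 of the proposal: the client side also uses its own clientFailureDetectionTimeout. */
    static long proposedTimeout(NodeConfig local, NodeConfig remote) {
        if (local.clientMode || remote.clientMode)
            return local.clientFailureDetectionTimeout;
        return local.failureDetectionTimeout;
    }
}
```

With the same example values on both nodes (10000 ms and 30000 ms), the current model gives 30000 on the server side of a server-client link but 10000 on the client side, which is why the client config needs a different failureDetectionTimeout today. Under the proposed rule both sides resolve to 30000.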

Re: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts

Valentin Kulichenko
Stan,

What is the purpose of clientFailureDetectionTimeout? Why can't we just
always use failureDetectionTimeout? Is there any difference between these
two timeouts?

-Val



> > > On Mon, May 28, 2018 at 1:29 PM, Stanislav Lukyanov <
> > > [hidden email]>
> > > wrote:
> > >
> > > > Hi folks,
> > > >
> > > > It looks like we stopped half-way with this activity. I’d like to pick
> > > > it up.
> > > >
> > > > All seem to agree that we should simplify the timeout settings.
> > > > Here are the specific actions I’d like to propose:
> > > >
> > > > 1. Promote the use of global timeouts as the best practice
> > > > *What*: update the docs to encourage users to rely on the following
> > > > timeouts for their “network stability” settings
> > > > IgniteConfiguration.failureDetectionTimeout
> > > > IgniteConfiguration.clientFailureDetectionTimeout
> > > > IgniteConfiguration.networkTimeout
> > > > *When*: update readme.io docs for 2.5 and Javadoc for 2.6
> > > >
> > > > 2. Discourage the use of finer timeouts
> > > > *What*:
> > > > - update the docs to discourage users to use the following timeouts
> and
> > > > announce their upcoming deprecation and removal
> > > > TcpDiscoverySpi.socketTimeout
> > > > TcpDiscoverySpi.ackTimeout
> > > > TcpDiscoverySpi.maxAckTimeout
> > > > TcpDiscoverySpi.reconnectCount
> > > > TcpCommunicationSpi.connectTimeout
> > > > TcpCommunicationSpi.maxConnectTimeout
> > > > TcpCommunicationSpi.reconnectCount
> > > > - deprecate the properties in code
> > > > - remove the properties in code
> > > > *When*:
> > > > - readme.io update with deprecation announcement for 2.5
> > > > - @Deprecated in code + Javadoc update + respective readme.io
> > rewording
> > > > for 2.6
> > > > - properties removal in 3.0
> > > >
> > > > 3. Make “orphan” timeouts rely on global timeouts, then deprecate and
> > > > remove
> > > > *What*:
> > > > Two settings currently don’t default to the global equivalents,
> > although
> > > > they should:
> > > > - TcpCommunicationSpi.socketWriteTimeout should default to
> > > > failureDetectionTimeout
> > > > - TcpDiscoverySpi.networkTimeout should default to
> IgniteConfiguration.
> > > > networkTImeout
> > > > So the course of action would be:
> > > > - update the docs to explain that these timeouts have to be used for
> > now,
> > > > but announce their upcoming deprecation and removal
> > > > - change the properties to default to their global counterparts and
> > > > deprecate them in code
> > > > - remove the properties in code
> > > > *When*:
> > > > - readme.io update with deprecation announcement for 2.5
> > > > - changing defaults + @Deprecated in code + Javadoc update +
> respective
> > > > readme.io rewording for 2.6
> > > > - properties removal in 3.0
> > > >
> > > > 4. Don’t touch other timeouts
> > > > Other timeouts, like TcpDiscoverySpi.joinTimeout or
> > TcpCommunicationSpi.
> > > idleConnectionTimeout,
> > > > are orthogonal to the whole
> > > > “network stability” theme discussed above, and don’t have to be
> > changed.
> > > >
> > > > Finally, I’ve prepared a draft of the docs page that may be used as a
> > > base
> > > > for the readme.io update.
> > > > This email is pretty long already, so please find the draft attached
> to
> > > > the JIRA issue
> > > > https://issues.apache.org/jira/browse/IGNITE-7704.
> > > >
> > > > Please share your thoughts.
> > > >
> > > > Thanks,
> > > > Stan
> > > >
> > > > From: Alexey Popov
> > > > Sent: 1 марта 2018 г. 17:01
> > > > To: [hidden email]
> > > > Subject: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi
> > > timeouts
> > > >
> > > > Hi Igniters,
> > > >
> > > > We often see similar questions from users and customers related to
> > > > IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts
> and
> > > > their
> > > > relations. And we see several side-effects after incorrect timeout
> > > > configuration.
> > > >
> > > > I tried to briefly describe these timeout settings (please see below)
> > and
> > > > found out that the most of them do not have sense in terms of cluster
> > > > functions/operations and could not be explained to the users.
> > > >
> > > > I propose to deprecate most of them and leave only the timeouts we
> can
> > > > explain in common terms ( (setFailureDetectionTimeout,
> > setNetworkTimeout,
> > > > setJoinTimeout and some others).
> > > >
> > > > Please let me know your thoughts.
> > > >
> > > > Thanks,
> > > > Alexey
> > > >
> > > > GLOBAL:
> > > >
> > > > IgniteConfiguration.setNetworkTimeout:
> > > > It is a global timeout for high-level operations where a network is
> > > > involved. For instance, IgniteMessaging delivery uses this timeout or
> > > > DiscoverySpi handshake.
> > > >
> > > > IgniteConfiguration.setFailureDetectionTimeout:
> > > > It is a global timeout for detecting failures at IgniteSpi
> > > implementations
> > > > (including DiscoverySpi and CommunicationSpi).
> > > > The failure detection algorithm actually limits a range of simple
> > network
> > > > operations related to a single logical operation (for instance, a
> > > reliable
> > > > delivery of some DiscoverySpi message within a cluster).
> > > > Failure detection timeout is a cumulative timeout for a socket
> > > connection,
> > > > sending and receiving data bytes and all possible socket retries (if
> > some
> > > > failure happens).
> > > > This timeout is intended to simplify the failure detection condition
> > > from a
> > > > user perspective.
> > > >
> > > > IgniteConfiguration.setClientFailureDetectionTimeout: - it is a
> > special
> > > > case
> > > > for DiscoverySpi client-node Ignite.
> > > >
> > > > TCP DISCOVERY SPI:
> > > >
> > > > If you need more control over failure detection algorithm for
> > > > TcpDiscoverySpi you can explicitly use the following low-level
> options
> > > > (that
> > > > will disable failureDetectoinTimeout logic):
> > > >
> > > > 1. TcpDiscoverySpi.setConnectTimeout - socket connection timeout
> > > > 2. TcpDiscoverySpi.setReconnectCount - number of reconnect attempts
> > used
> > > > when establishing connection with the remote node and sending
> messages
> > to
> > > > it
> > > > 3. TcpDiscoverySpi.setSocketTimeout - socket write timeout. The write
> > > > operation will be repeated getReconnectCount() times if it exceeds
> this
> > > > timeout
> > > > 4. TcpDiscoverySpi.setAckTimeout - message acknowledgment timeout.
> If a
> > > > message acknowledgment is not received within this timeout, sending
> is
> > > > considered as failed and SPI will try to repeat send operation. It is
> > > > automatically doubled for simultaneous retries up to getMaxAckTimeout
> > > > value.
> > > > 5. TcpDiscoverySpi.setMaxAckTimeout - maximum connection timeout, if
> > the
> > > > getAckTimeout reaches getMaxAckTimeout then SPI give up sending
> retries
> > > >
> > > > Another important TcpDiscoverySpi timeouts:
> > > >
> > > > TcpDiscoverySpi.setJoinTimeout - It is a timeout for join process
> when
> > a
> > > > new/restarted node joins a cluster. The node tries to connect to all
> > > > available IP addresses provided by ipFinder within this timeout.
> > > > If the timeout is exceeded, the node will give up and throw an
> > exception
> > > > from Ignition.start().
> > > >
> > > > TcpDiscoverySpi.setNetworkTimeout - timeout for high-level operations
> > > like
> > > > handshake. It looks like it should be deprecated and the
> > > > IgniteConfiguration.getNetworkTimeout should be used here.
> > > >
> > > > TCP COMMUNICATION SPI:
> > > >
> > > > If you need more control over failure detection algorithm for
> > > > TcpCommunicationSpi you can explicitly use the following low-level
> > > options
> > > > (that will disable failureDetectoinTimeout logic):
> > > >
> > > > 1. TcpCommunicationSpi.setConnectTimeout - socket connection timeout,
> > > will
> > > > be automatically doubled for simultaneous retries (up to
> > > getReconnectCount)
> > > > related to a single logical operation
> > > > 2. TcpCommunicationSpi.setMaxConnectTimeout - maximum connection
> > > timeout,
> > > > the higher limit of getReconnectCount-times doubled getConnectTimeout
> > > > 3. TcpCommunicationSpi.setReconnectCount - number of reconnect
> > attempts
> > > > used
> > > > when establishing connection with the remote node and sending
> messages
> > to
> > > > it
> > > >
> > > > Another important TcpCommunicationSpi timeouts:
> > > >
> > > > TcpDiscoverySpi.setSocketWriteTimeout - timeout to send a message.
> > > > TcpDiscoverySpi.setIdleConnectionTimeout - maximum idle connection
> > > timeout
> > > > upon which a connection will be closed.
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> > > >
> > > >
> > >
> > >
> >
> >
>
>

RE: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts

Stanislav Lukyanov
We could just use failureDetectionTimeout everywhere, I guess.
The only benefit of clientFailureDetectionTimeout is that it allows clients to be slower, or on a slower network, than servers.

Do you think it isn’t worth having a separate setting just for that?

Thanks,
Stan

From: Valentin Kulichenko
Sent: July 5, 2018, 18:16
To: [hidden email]
Subject: Re: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts

Stan,

What is the purpose of clientFailureDetectionTimeout? Why can't we just
always use failureDetectionTimeout? Is there any difference between these
two timeouts?

-Val



On Wed, Jul 4, 2018 at 7:00 AM Stanislav Lukyanov <[hidden email]>
wrote:

> Hi,
>
> I’ve updated the proposed documentation change with a description of
> metricsUpdateFrequency and a detailed description of how
> failureDetectionTimeout and clientFailureDetectionTimeout relate. The
> draft is attached to https://issues.apache.org/jira/browse/IGNITE-7704.
>
> It seems that the relation between failureDetectionTimeout and
> clientFailureDetectionTimeout is currently too tricky and should also be
> changed in the future.
> The problem is that in a server-client connection the server uses
> clientFailureDetectionTimeout but the client uses failureDetectionTimeout.
> In other words, clients ignore clientFailureDetectionTimeout and just use
> failureDetectionTimeout. Because of that, one has to provide different
> values of failureDetectionTimeout in server and client configs, which seems
> confusing and inconvenient.
> So I’d like to add one more point to my earlier proposal:
>
> 5. Always use clientFailureDetectionTimeout on clients instead of
> failureDetectionTimeout
> *What*: change code to use clientFailureDetectionTimeout on clients
> *When*: update code and readme.io docs in 2.7
>
> Thanks,
> Stan
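
A minimal sketch of the behavior described in the quoted message, in plain Java. It only models which timeout each side of a connection applies, as stated in the thread; the class and method names are hypothetical, the millisecond values are arbitrary examples, and none of this is Ignite code.

```java
/** Illustrative model only; this class is not part of Apache Ignite. */
final class FailureDetectionModel {
    /**
     * Current behavior as described in the thread: a server checking a client
     * uses clientFailureDetectionTimeout; every other case, including the
     * client side of the connection, uses failureDetectionTimeout.
     * Both values are taken from the local node's configuration.
     */
    static long current(boolean localIsClient, boolean remoteIsClient,
                        long failureDetectionTimeout, long clientFailureDetectionTimeout) {
        return (!localIsClient && remoteIsClient)
            ? clientFailureDetectionTimeout
            : failureDetectionTimeout;
    }

    /**
     * Point 5 of the proposal: clients would also use
     * clientFailureDetectionTimeout, so both sides of a server-client
     * connection rely on the same setting.
     */
    static long proposed(boolean localIsClient, boolean remoteIsClient,
                         long failureDetectionTimeout, long clientFailureDetectionTimeout) {
        return (localIsClient || remoteIsClient)
            ? clientFailureDetectionTimeout
            : failureDetectionTimeout;
    }

    public static void main(String[] args) {
        long fdt = 10_000L;  // example failureDetectionTimeout, ms
        long cfdt = 30_000L; // example clientFailureDetectionTimeout, ms

        System.out.println("server -> client, current:  " + current(false, true, fdt, cfdt));
        System.out.println("client -> server, current:  " + current(true, false, fdt, cfdt));
        System.out.println("client -> server, proposed: " + proposed(true, false, fdt, cfdt));
    }
}
```

In the current model the two sides of one connection can apply different limits unless failureDetectionTimeout in the client config is set to match clientFailureDetectionTimeout in the server config, which is the inconvenience the message points out.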


Re: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts

Valentin Kulichenko
Stan,

Can you explain the semantics of both parameters? How do they behave when
set on client or on server?

-Val

> > > > > From: Alexey Popov
> > > > > > Sent: March 1, 2018, 17:01
> > > > > > To: [hidden email]
> > > > > > Subject: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts
> > > > >
> > > > > Hi Igniters,
> > > > >
> > > > > > We often see similar questions from users and customers about the
> > > > > > IgniteConfiguration, TcpDiscoverySpi and TcpCommunicationSpi timeouts
> > > > > > and how they relate to each other. We also see several side effects
> > > > > > of incorrect timeout configuration.
> > > > > >
> > > > > > I tried to briefly describe these timeout settings (please see below)
> > > > > > and found that most of them do not make sense in terms of cluster
> > > > > > functions/operations and cannot be explained to users.
> > > > > >
> > > > > > I propose to deprecate most of them and leave only the timeouts we can
> > > > > > explain in common terms (setFailureDetectionTimeout, setNetworkTimeout,
> > > > > > setJoinTimeout and some others).
> > > > >
> > > > > Please let me know your thoughts.
> > > > >
> > > > > Thanks,
> > > > > Alexey
> > > > >
> > > > > GLOBAL:
> > > > >
> > > > > IgniteConfiguration.setNetworkTimeout:
> > > > > > It is a global timeout for high-level operations that involve the
> > > > > > network. For instance, IgniteMessaging delivery and the DiscoverySpi
> > > > > > handshake use this timeout.
> > > > >
> > > > > IgniteConfiguration.setFailureDetectionTimeout:
> > > > > > It is a global timeout for detecting failures in IgniteSpi
> > > > > > implementations (including DiscoverySpi and CommunicationSpi).
> > > > > > The failure detection algorithm limits the whole set of simple network
> > > > > > operations that belong to a single logical operation (for instance, the
> > > > > > reliable delivery of a DiscoverySpi message within a cluster).
> > > > > > The failure detection timeout is a cumulative timeout covering the
> > > > > > socket connection, sending and receiving data, and all socket retries
> > > > > > (if a failure happens).
> > > > > > This timeout is intended to simplify the failure detection condition
> > > > > > from a user perspective.
> > > > >
> > > > > IgniteConfiguration.setClientFailureDetectionTimeout: - it is a
> > > special
> > > > > case
> > > > > for DiscoverySpi client-node Ignite.
> > > > >
> > > > > TCP DISCOVERY SPI:
> > > > >
> > > > > If you need more control over failure detection algorithm for
> > > > > TcpDiscoverySpi you can explicitly use the following low-level
> > options
> > > > > (that
> > > > > will disable failureDetectoinTimeout logic):
> > > > >
> > > > > 1. TcpDiscoverySpi.setConnectTimeout - socket connection timeout
> > > > > 2. TcpDiscoverySpi.setReconnectCount - number of reconnect attempts
> > > used
> > > > > when establishing connection with the remote node and sending
> > messages
> > > to
> > > > > it
> > > > > 3. TcpDiscoverySpi.setSocketTimeout - socket write timeout. The
> write
> > > > > operation will be repeated getReconnectCount() times if it exceeds
> > this
> > > > > timeout
> > > > > 4. TcpDiscoverySpi.setAckTimeout - message acknowledgment timeout.
> > If a
> > > > > message acknowledgment is not received within this timeout, sending
> > is
> > > > > considered as failed and SPI will try to repeat send operation. It
> is
> > > > > automatically doubled for simultaneous retries up to
> getMaxAckTimeout
> > > > > value.
> > > > > 5. TcpDiscoverySpi.setMaxAckTimeout - maximum connection timeout,
> if
> > > the
> > > > > getAckTimeout reaches getMaxAckTimeout then SPI give up sending
> > retries
> > > > >
> > > > > Another important TcpDiscoverySpi timeouts:
> > > > >
> > > > > TcpDiscoverySpi.setJoinTimeout - It is a timeout for join process
> > when
> > > a
> > > > > new/restarted node joins a cluster. The node tries to connect to
> all
> > > > > available IP addresses provided by ipFinder within this timeout.
> > > > > If the timeout is exceeded, the node will give up and throw an
> > > exception
> > > > > from Ignition.start().
> > > > >
> > > > > TcpDiscoverySpi.setNetworkTimeout - timeout for high-level
> operations
> > > > like
> > > > > handshake. It looks like it should be deprecated and the
> > > > > IgniteConfiguration.getNetworkTimeout should be used here.
> > > > >
> > > > > TCP COMMUNICATION SPI:
> > > > >
> > > > > If you need more control over failure detection algorithm for
> > > > > TcpCommunicationSpi you can explicitly use the following low-level
> > > > options
> > > > > (that will disable failureDetectoinTimeout logic):
> > > > >
> > > > > 1. TcpCommunicationSpi.setConnectTimeout - socket connection
> timeout,
> > > > will
> > > > > be automatically doubled for simultaneous retries (up to
> > > > getReconnectCount)
> > > > > related to a single logical operation
> > > > > 2. TcpCommunicationSpi.setMaxConnectTimeout - maximum connection
> > > > timeout,
> > > > > the higher limit of getReconnectCount-times doubled
> getConnectTimeout
> > > > > 3. TcpCommunicationSpi.setReconnectCount - number of reconnect
> > > attempts
> > > > > used
> > > > > when establishing connection with the remote node and sending
> > messages
> > > to
> > > > > it
> > > > >
> > > > > Another important TcpCommunicationSpi timeouts:
> > > > >
> > > > > TcpDiscoverySpi.setSocketWriteTimeout - timeout to send a message.
> > > > > TcpDiscoverySpi.setIdleConnectionTimeout - maximum idle connection
> > > > timeout
> > > > > upon which a connection will be closed.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> > > > >
> > > > >
> > > >
> > > >
> > >
> > >
> >
> >
>
>

RE: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts

Stanislav Lukyanov
A server uses its failureDetectionTimeout when talking to other servers and its clientFailureDetectionTimeout when talking to clients.
E.g. a Communication link from server to server uses failureDetectionTimeout, and a link from server to client uses clientFailureDetectionTimeout.

A client uses its failureDetectionTimeout all the time, ignoring clientFailureDetectionTimeout.

Asymmetric settings are even possible.
Say the server and the client use the same config, with failureDetectionTimeout set to 10 seconds and clientFailureDetectionTimeout set to 20 seconds.
When these two nodes communicate, the server will use a timeout of 20 seconds and the client will use a timeout of 10 seconds.
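
The selection rule above can be sketched as a small stand-alone Java model. This is only an illustration of the rule as described in this message, not Ignite code: the class FailureDetectionTimeoutModel, the NodeConfig holder and the effectiveTimeout method are hypothetical names, and the values are written in milliseconds.

```java
public class FailureDetectionTimeoutModel {
    /** Hypothetical holder for one node's settings; values are in milliseconds. */
    public static final class NodeConfig {
        final boolean clientMode;
        final long failureDetectionTimeout;
        final long clientFailureDetectionTimeout;

        public NodeConfig(boolean clientMode, long failureDetectionTimeout,
                          long clientFailureDetectionTimeout) {
            this.clientMode = clientMode;
            this.failureDetectionTimeout = failureDetectionTimeout;
            this.clientFailureDetectionTimeout = clientFailureDetectionTimeout;
        }
    }

    /** Timeout the local node applies on a link to a remote node of the given kind. */
    public static long effectiveTimeout(NodeConfig local, boolean remoteIsClient) {
        // A client always uses failureDetectionTimeout and ignores the client-specific value.
        if (local.clientMode)
            return local.failureDetectionTimeout;

        // A server picks the timeout based on the kind of node on the other side.
        return remoteIsClient ? local.clientFailureDetectionTimeout
                              : local.failureDetectionTimeout;
    }

    public static void main(String[] args) {
        // Same settings on both nodes: 10 s and 20 s, as in the example above.
        NodeConfig server = new NodeConfig(false, 10_000L, 20_000L);
        NodeConfig client = new NodeConfig(true, 10_000L, 20_000L);

        System.out.println("server -> server: " + effectiveTimeout(server, false) + " ms");
        System.out.println("server -> client: " + effectiveTimeout(server, true) + " ms");
        System.out.println("client -> server: " + effectiveTimeout(client, false) + " ms");
    }
}
```

With identical settings on both sides, the model gives 20 seconds for the server side of a server-client link and 10 seconds for the client side, which is the asymmetry mentioned above.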

Stan


Re: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts

Valentin Kulichenko
If clientFailureDetectionTimeout is not set on a server node, will it use
failureDetectionTimeout instead?

Either way, this configuration seems to be a bit confusing, but I don't
think we can change it now. Let's just make sure it's properly documented.
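
For the documentation, a configuration fragment along these lines might help. It is only a sketch that sets both values explicitly, so it does not depend on what happens when clientFailureDetectionTimeout is left unset. The class name ExplicitTimeoutsConfig and the concrete values are assumptions chosen for illustration; the setters are the existing IgniteConfiguration ones and take milliseconds.

```java
import org.apache.ignite.configuration.IgniteConfiguration;

public class ExplicitTimeoutsConfig {
    /** Server config: both timeouts are set explicitly (values in milliseconds). */
    static IgniteConfiguration serverConfig() {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Applied by this server on links to other servers.
        cfg.setFailureDetectionTimeout(10_000L);

        // Applied by this server on links to clients.
        cfg.setClientFailureDetectionTimeout(20_000L);

        return cfg;
    }

    /** Client config: only failureDetectionTimeout matters on the client side. */
    static IgniteConfiguration clientConfig() {
        IgniteConfiguration cfg = new IgniteConfiguration();

        cfg.setClientMode(true);

        // Chosen here to match the servers' clientFailureDetectionTimeout,
        // so both ends of a server-client link wait for the same period.
        cfg.setFailureDetectionTimeout(20_000L);

        return cfg;
    }
}
```

Matching the two values this way is a design choice for the example, not a requirement stated in this thread.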

-Val

> > > > > > > From: Alexey Popov
> > > > > > > Sent: March 1, 2018, 17:01
> > > > > > > To: [hidden email]
> > > > > > > Subject: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts
> > > > > >
> > > > > > Hi Igniters,
> > > > > >
> > > > > > > We often see similar questions from users and customers related to
> > > > > > > IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts and
> > > > > > > their relations. And we see several side-effects after incorrect timeout
> > > > > > > configuration.
> > > > > > >
> > > > > > > I tried to briefly describe these timeout settings (please see below) and
> > > > > > > found out that most of them do not make sense in terms of cluster
> > > > > > > functions/operations and could not be explained to the users.
> > > > > > >
> > > > > > > I propose to deprecate most of them and leave only the timeouts we can
> > > > > > > explain in common terms (setFailureDetectionTimeout, setNetworkTimeout,
> > > > > > > setJoinTimeout and some others).
> > > > > >
> > > > > > Please let me know your thoughts.
> > > > > >
> > > > > > Thanks,
> > > > > > Alexey
> > > > > >
> > > > > > GLOBAL:
> > > > > >
> > > > > > IgniteConfiguration.setNetworkTimeout:
> > > > > > It is a global timeout for high-level operations where a network
> is
> > > > > > involved. For instance, IgniteMessaging delivery uses this
> timeout
> > or
> > > > > > DiscoverySpi handshake.
> > > > > >
> > > > > > > IgniteConfiguration.setFailureDetectionTimeout:
> > > > > > > It is a global timeout for detecting failures in IgniteSpi
> > > > > > > implementations (including DiscoverySpi and CommunicationSpi).
> > > > > > > The failure detection algorithm bounds the whole series of
> > > > > > > simple network operations that make up a single logical
> > > > > > > operation (for instance, reliable delivery of a DiscoverySpi
> > > > > > > message within a cluster).
> > > > > > > The failure detection timeout is cumulative: it covers the
> > > > > > > socket connection, sending and receiving data bytes, and all
> > > > > > > socket retries (if a failure happens).
> > > > > > > This timeout is intended to simplify the failure detection
> > > > > > > condition from a user perspective.
> > > > > >
> > > > > > > IgniteConfiguration.setClientFailureDetectionTimeout:
> > > > > > > It is a special case of the failure detection timeout for
> > > > > > > DiscoverySpi client nodes.
> > > > > >
> > > > > > TCP DISCOVERY SPI:
> > > > > >
> > > > > > > If you need more control over the failure detection algorithm
> > > > > > > for TcpDiscoverySpi, you can explicitly set the following
> > > > > > > low-level options (doing so disables the
> > > > > > > failureDetectionTimeout logic):
> > > > > >
> > > > > > > 1. TcpDiscoverySpi.setConnectTimeout - socket connection timeout.
> > > > > > > 2. TcpDiscoverySpi.setReconnectCount - number of reconnect
> > > > > > > attempts used when establishing a connection with the remote
> > > > > > > node and sending messages to it.
> > > > > > > 3. TcpDiscoverySpi.setSocketTimeout - socket write timeout. If
> > > > > > > a write operation exceeds this timeout, it is repeated up to
> > > > > > > getReconnectCount() times.
> > > > > > > 4. TcpDiscoverySpi.setAckTimeout - message acknowledgment
> > > > > > > timeout. If a message acknowledgment is not received within
> > > > > > > this timeout, sending is considered failed and the SPI tries
> > > > > > > to repeat the send operation. The timeout is automatically
> > > > > > > doubled on each consecutive retry, up to the getMaxAckTimeout
> > > > > > > value.
> > > > > > > 5. TcpDiscoverySpi.setMaxAckTimeout - maximum acknowledgment
> > > > > > > timeout. If getAckTimeout reaches getMaxAckTimeout, the SPI
> > > > > > > gives up retrying.
> > > > > >
> > > > > > > Other important TcpDiscoverySpi timeouts:
> > > > > >
> > > > > > > TcpDiscoverySpi.setJoinTimeout - timeout for the join process
> > > > > > > when a new or restarted node joins a cluster. The node tries to
> > > > > > > connect to all available IP addresses provided by ipFinder
> > > > > > > within this timeout. If the timeout is exceeded, the node gives
> > > > > > > up and throws an exception from Ignition.start().
> > > > > >
> > > > > > > TcpDiscoverySpi.setNetworkTimeout - timeout for high-level
> > > > > > > operations such as the handshake. It looks like it should be
> > > > > > > deprecated and IgniteConfiguration.getNetworkTimeout should be
> > > > > > > used here instead.
> > > > > >
> > > > > > TCP COMMUNICATION SPI:
> > > > > >
> > > > > > > If you need more control over the failure detection algorithm
> > > > > > > for TcpCommunicationSpi, you can explicitly set the following
> > > > > > > low-level options (doing so disables the
> > > > > > > failureDetectionTimeout logic):
> > > > > >
> > > > > > > 1. TcpCommunicationSpi.setConnectTimeout - socket connection
> > > > > > > timeout. It is automatically doubled on each consecutive retry
> > > > > > > (up to getReconnectCount retries) within a single logical
> > > > > > > operation.
> > > > > > > 2. TcpCommunicationSpi.setMaxConnectTimeout - maximum connection
> > > > > > > timeout, that is, the upper limit for getConnectTimeout after it
> > > > > > > has been doubled up to getReconnectCount times.
> > > > > > > 3. TcpCommunicationSpi.setReconnectCount - number of reconnect
> > > > > > > attempts used when establishing a connection with the remote
> > > > > > > node and sending messages to it.
> > > > > >
> > > > > > > Other important TcpCommunicationSpi timeouts:
> > > > > >
> > > > > > > TcpCommunicationSpi.setSocketWriteTimeout - timeout to send a
> > > > > > > message.
> > > > > > > TcpCommunicationSpi.setIdleConnectionTimeout - maximum idle
> > > > > > > connection timeout, after which the connection is closed.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Sent from:
> http://apache-ignite-developers.2346864.n4.nabble.com/
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > >
> > > >
> > >
> > >
> >
> >
>
>