Hi Igniters,

Whilst diagnosing a problem with a slow query, I became aware of a potential issue in the Ignite codebase. When executing a SQL query that is to run remotely, the IgniteH2Indexing#send() method is called, with a Collection<ClusterNode> as one of its parameters. This collection is iterated sequentially, and ctx.io().sendGeneric() is called synchronously for each node. This is inefficient if

a) This is the first execution of a query, and thus TCP connections have to be established

b) The cost of establishing a TCP connection is high

And optionally

c) There are a large number of nodes in the cluster

In my current situation, developers want to run test queries from their code running locally, but connected via VPN to their UAT server environment. Opening a TCP connection takes multiple seconds, as you can see from this Ignite log file snippet:

2017-05-22 18:29:48,908 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:56924, rmtAddr=/10.132.80.3:47100]
2017-05-22 18:29:52,294 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:56923, rmtAddr=/10.132.80.30:47102]
2017-05-22 18:29:58,659 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:56971, rmtAddr=/10.132.80.23:47101]
2017-05-22 18:30:03,183 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:56972, rmtAddr=/10.132.80.21:47100]
2017-05-22 18:30:06,039 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:56973, rmtAddr=/10.132.80.21:47103]
2017-05-22 18:30:10,828 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:57020, rmtAddr=/10.132.80.20:47100]
2017-05-22 18:30:13,060 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:57021, rmtAddr=/10.132.80.29:47103]
2017-05-22 18:30:22,144 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:57022, rmtAddr=/10.132.80.22:47103]
2017-05-22 18:30:26,513 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:57024, rmtAddr=/10.132.80.20:47101]
2017-05-22 18:30:28,526 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:57025, rmtAddr=/10.132.80.30:47103]

Compare this with the same code executed inside the UAT environment (so not using the VPN):

2017-05-22 18:22:18,102 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/10.175.11.38:53288, rmtAddr=/10.175.11.58:47100]
2017-05-22 18:22:18,105 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/10.175.11.38:45890, rmtAddr=/10.175.11.54:47101]
2017-05-22 18:22:18,108 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/127.0.0.1:47582, rmtAddr=/127.0.0.1:47100]
2017-05-22 18:22:18,111 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/127.0.0.1:45240, rmtAddr=/127.0.0.1:47103]
2017-05-22 18:22:18,114 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/10.175.11.38:46280, rmtAddr=/10.175.11.15:47100]
2017-05-22 18:22:18,118 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/10.132.80.21:51476, rmtAddr=/10.132.80.29:47103]
2017-05-22 18:22:18,120 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/10.132.80.21:56274, rmtAddr=pocfd-master1/10.132.80.22:47103]
2017-05-22 18:22:18,124 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/10.132.80.21:53558, rmtAddr=pocfd-ignite1/10.132.80.20:47101]
2017-05-22 18:22:18,127 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/10.132.80.21:56216, rmtAddr=/10.132.80.30:47103]

This is a design flaw in the Ignite code, as we are relying on the client's network behaving in a particular way (i.e., port opening being very fast). We should instead try to mask this potential slowness by establishing connections in parallel, and waiting on the results.

I would like to hear others' thoughts and comments before we open a JIRA to look at this.

Regards
Mike

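To make the proposal above concrete, here is a minimal sketch of the difference between connecting to each node in turn and starting all connects at once. It is not Ignite code: the class ParallelConnectSketch, the connect() stand-in (which only sleeps to simulate a slow link) and the latency value are all assumptions made for illustration, and a real change would have to live inside the communication layer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelConnectSketch {
    /** Hypothetical stand-in for a blocking TCP connect; the sleep simulates a slow VPN link. */
    public static String connect(String addr, long latencyMillis) {
        try {
            Thread.sleep(latencyMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while connecting to " + addr, e);
        }
        return "connected:" + addr;
    }

    /** One blocking connect after another: total time is roughly the sum of all latencies. */
    public static List<String> connectSequentially(List<String> addrs, long latencyMillis) {
        List<String> res = new ArrayList<>();
        for (String addr : addrs)
            res.add(connect(addr, latencyMillis));
        return res;
    }

    /** Start every connect at once, then wait: total time is roughly the slowest single connect. */
    public static List<String> connectInParallel(List<String> addrs, long latencyMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, addrs.size()));
        try {
            List<CompletableFuture<String>> futs = new ArrayList<>();
            for (String addr : addrs)
                futs.add(CompletableFuture.supplyAsync(() -> connect(addr, latencyMillis), pool));

            List<String> res = new ArrayList<>();
            for (CompletableFuture<String> fut : futs)
                res.add(fut.join()); // Joining in submission order keeps the input order.
            return res;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> addrs = Arrays.asList("10.132.80.3:47100", "10.132.80.30:47102", "10.132.80.23:47101");
        long t0 = System.nanoTime();
        connectSequentially(addrs, 200);
        long t1 = System.nanoTime();
        connectInParallel(addrs, 200);
        long t2 = System.nanoTime();
        System.out.println("sequential ms: " + (t1 - t0) / 1_000_000);
        System.out.println("parallel ms:   " + (t2 - t1) / 1_000_000);
    }
}
```

Both methods return the same results in the same order; only the waiting differs. One thread per node is the simplest way to show the idea, but a non-blocking connect would probably be a better fit for code that already uses NIO.
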
Hi Mike,

Generally, establishing connections in parallel could make sense, but note that in most cases this would be a minor optimization, because:

- Under load, connections are established once and then reused. If you observe disconnections during the application lifetime under load, then this should probably be addressed first.
- Actual communication is asynchronous; we use NIO for this. If a connection already exists, sendGeneric() basically just puts a message into a queue.

-Val

On Mon, May 22, 2017 at 7:04 PM, Michael Griggs <[hidden email]> wrote:
> We should instead try to mask this potential slowness by establishing
> connections in parallel, and waiting on the results.

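The point about reuse can be pictured with a toy model: the first send to a node pays for the connect, and every later send to that node only appends to a queue. This is a sketch under stated assumptions and not how Ignite implements it; ReusedConnectionSketch, its map of per-node queues and the connect counter are hypothetical names invented for this example.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class ReusedConnectionSketch {
    /** One outbound queue per node; a queue being present stands for "a connection already exists". */
    private final Map<String, Queue<String>> outboundQueues = new ConcurrentHashMap<>();

    /** Counts how often the (simulated) expensive connect step ran. */
    private final AtomicInteger connectsPerformed = new AtomicInteger();

    /** First send to a node "connects"; later sends to the same node only enqueue the message. */
    public void send(String nodeId, String msg) {
        outboundQueues.computeIfAbsent(nodeId, id -> {
            connectsPerformed.incrementAndGet(); // Expensive step, runs at most once per node.
            return new ConcurrentLinkedQueue<String>();
        }).add(msg);
    }

    public int connectsPerformed() {
        return connectsPerformed.get();
    }

    public int queued(String nodeId) {
        Queue<String> q = outboundQueues.get(nodeId);
        return q == null ? 0 : q.size();
    }

    public static void main(String[] args) {
        ReusedConnectionSketch io = new ReusedConnectionSketch();
        io.send("nodeA", "query-1");
        io.send("nodeA", "query-2");
        io.send("nodeB", "query-1");
        System.out.println("connects: " + io.connectsPerformed()); // prints "connects: 2"
        System.out.println("queued for nodeA: " + io.queued("nodeA")); // prints "queued for nodeA: 2"
    }
}
```

In this model the connect cost is paid once per node, so a long-running client hardly notices it, while a client that starts, runs a single query and exits pays it on every run.
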
Hi Val

This is precisely my point: it's only a minor optimization until the point when establishing each connection takes 3-4 seconds, and we establish 32 of them in sequence. At that point it becomes a serious issue: the customer cannot run SQL queries from their development machines without them timing out once out of every two or three runs. These kinds of problems undermine confidence in Ignite.

Mike

-----Original Message-----
From: Valentin Kulichenko [mailto:[hidden email]]
Sent: 22 May 2017 19:15
To: [hidden email]
Subject: Re: Inefficient approach to executing remote SQL queries

Generally, establishing connections in parallel could make sense, but note that in most cases this would be a minor optimization [...]

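The numbers in this thread can be checked with a few lines of arithmetic. The sketch below takes the first and last timestamps of the two log snippets from the original message and the figures quoted above (32 connections at 3-4 seconds each); the class and method names are made up for this example.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class ConnectCostEstimate {
    /** Timestamp layout used in the log lines quoted in this thread. */
    private static final DateTimeFormatter LOG_TS = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Milliseconds between two log timestamps. */
    public static long elapsedMillis(String first, String last) {
        return Duration.between(LocalDateTime.parse(first, LOG_TS), LocalDateTime.parse(last, LOG_TS)).toMillis();
    }

    /** Sequential connects cost the sum of the individual connect times. */
    public static long sequentialSeconds(int connections, long secondsPerConnection) {
        return connections * secondsPerConnection;
    }

    public static void main(String[] args) {
        // First and last "Established outgoing communication connection" lines over the VPN.
        long vpn = elapsedMillis("2017-05-22 18:29:48,908", "2017-05-22 18:30:28,526");
        // First and last lines of the run inside the UAT environment.
        long uat = elapsedMillis("2017-05-22 18:22:18,102", "2017-05-22 18:22:18,127");

        System.out.println("VPN, first to last connection: " + vpn + " ms"); // 39618 ms
        System.out.println("UAT, first to last connection: " + uat + " ms"); // 25 ms

        // 32 connections at 3-4 seconds each, established one after another.
        System.out.println("32 sequential connects: " + sequentialSeconds(32, 3) + "-" + sequentialSeconds(32, 4) + " s"); // 96-128 s
    }
}
```

If the connects ran fully in parallel, the wait would instead be close to the slowest single connect, that is, a few seconds rather than a minute or more. That is an idealised bound and ignores any limits on how many connects the VPN or the client can have in flight.
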
Michael,

I see your point. I think it should not be too hard to start establishing connections to all the needed nodes asynchronously.

I've created the respective issue in Jira: https://issues.apache.org/jira/browse/IGNITE-5277

Sergi

2017-05-23 11:56 GMT+03:00 Michael Griggs <[hidden email]>:
> This is precisely my point: it's only a minor optimization until the point
> when establishing each connection takes 3-4 seconds, and we establish 32 of
> them in sequence.

Why do we close the connections once they are established? Why not keep them open until an endpoint explicitly closes them?

On Tue, May 23, 2017 at 2:16 AM, Sergi Vladykin <[hidden email]> wrote:
> I've created the respective issue in Jira:
> https://issues.apache.org/jira/browse/IGNITE-5277

The problem here is with the initial opening of connections. For a client that connects and disconnects quickly and frequently, a connection time of 30 seconds or more is not workable.

Mike

On 23 May 2017 6:51 pm, "Dmitriy Setrakyan" <[hidden email]> wrote:
> Why do we turn off the connections, once established? Why not keep them
> open, until an endpoint explicitly closes them?
Got it. In this case we should definitely connect concurrently, not in
parallel. I set the ticket to 2.1 release. Let's see if anyone picks it up.

On Tue, May 23, 2017 at 2:56 PM, Michael Griggs <[hidden email]> wrote:

> The problem here is with the initial opening of connections. With a client
> who connects and disconnects quickly, and frequently, a 30-second plus
> connection time is not workable.
>
> Mike
>
> On 23 May 2017 6:51 pm, "Dmitriy Setrakyan" <[hidden email]> wrote:
>
> > Why do we turn off the connections, once established? Why not keep them
> > open, until an endpoint explicitly closes them?
> >
> > On Tue, May 23, 2017 at 2:16 AM, Sergi Vladykin <[hidden email]> wrote:
> >
> > > Michael,
> > >
> > > I see your point. I think it must not be too hard to start
> > > asynchronously establishing connections to all the needed nodes.
> > >
> > > I've created respective issue in Jira:
> > > https://issues.apache.org/jira/browse/IGNITE-5277
> > >
> > > Sergi
> > >
> > > 2017-05-23 11:56 GMT+03:00 Michael Griggs <[hidden email]>:
> > >
> > > > Hi Val
> > > >
> > > > This is precisely my point: it's only a minor optimization until the
> > > > point when establishing each connection takes 3-4 seconds, and we
> > > > establish 32 of them in sequence. At that point it becomes a serious
> > > > issue: the customer cannot run SQL queries from their development
> > > > machines without them timing out once out of every two or three runs.
> > > > These kind of problems undermine confidence in Ignite.
> > > >
> > > > Mike
> > > >
> > > > -----Original Message-----
> > > > From: Valentin Kulichenko [mailto:[hidden email]]
> > > > Sent: 22 May 2017 19:15
> > > > To: [hidden email]
> > > > Subject: Re: Inefficient approach to executing remote SQL queries
> > > >
> > > > Hi Mike,
> > > >
> > > > Generally, establishing connections in parallel could make sense, but
> > > > note that in most this would be a minor optimization, because:
> > > >
> > > > - Under load connections are established once and then reused. If you
> > > >   observe disconnections during application lifetime under load, then
> > > >   probably this should be addressed first.
> > > > - Actual communication is asynchronous, we use NIO for this. If
> > > >   connection already exists, sendGeneric() basically just puts a
> > > >   message into a queue.
> > > >
> > > > -Val
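For readers following the thread, the idea of establishing connections to all nodes at once and then waiting on the results can be sketched roughly as below. This is only an illustration using plain JDK classes; it is not Ignite code and not the change made under IGNITE-5277. The class `ParallelConnect`, the method `connectAll`, and the `connector` callback are hypothetical names, and the sleep in `main` merely stands in for a slow TCP handshake over a VPN.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

public class ParallelConnect {
    /**
     * Hypothetical helper: starts one connection attempt per address on a
     * thread pool, then waits for all of them. Results keep the input order.
     */
    public static <C> List<C> connectAll(List<String> addrs,
                                         Function<String, C> connector,
                                         int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit every connection attempt before waiting on any of them.
            List<CompletableFuture<C>> futs = new ArrayList<>();
            for (String addr : addrs)
                futs.add(CompletableFuture.supplyAsync(() -> connector.apply(addr), pool));

            // Total wait is bounded by the slowest attempt, not the sum of all.
            List<C> conns = new ArrayList<>();
            for (CompletableFuture<C> fut : futs)
                conns.add(fut.join());
            return conns;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> addrs = Arrays.asList(
            "10.132.80.3:47100", "10.132.80.30:47102", "10.132.80.23:47101");

        long start = System.nanoTime();
        List<String> conns = connectAll(addrs, addr -> {
            try {
                Thread.sleep(300); // Stand-in for a slow connection handshake.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "connected:" + addr;
        }, addrs.size());
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println(conns + " in " + elapsedMs + " ms");
    }
}
```

With one thread per address, the elapsed time should be close to a single handshake rather than the sum of all of them, which is the effect Mike is asking for. A real implementation inside Ignite would more likely build on the existing NIO machinery than on a separate thread pool.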