Igniters (specifically Sergi),
It has come to my attention today that nested sub-select statements, when used in combination with non-collocated joins do not work properly in Ignite. So a query like this, where A, B, and C are all stored in Partitioned caches and are **not** collocated at all, will not work. > *select * from A, B where a.id <http://a.id> = b.a_id and b.somefield in > (select somefield from C where c.zipcode = ?)* The main reason it is not supported right now is because, in the absence of collocation, such query may create N^N complexity and it was decided that it is best not supporting it at all. However, I am not sure why N^N complexity is required. Why not support it as follows? 1. execute the nested subquery and store the result in a temporary Replicated table. 2. execute the original query and use the temporary Replicated table instead of the sub-query. Sergi, given that you are the author of the code, can you provide some insight here? Thanks, D. |
The approach you are suggesting will be very complex for current
implementation. Also most probably very inefficient. Actually I was thinking about another but similar approach: in many cases we can rewrite a subquery in WHERE clause into JOIN subquery. Like the following: SELECT x.* FROM x WHERE x.a = (SELECT MAX(y.a) FROM y WHERE y.b = x.b) ===> SELECT x.* FROM x, (SELECT MAX(y.a), y.b FROM y GROUP BY y.b) z WHERE x.b = z.b There are still problems here: 1. We will not be able to rewrite all the queries. 2. We should not rewrite queries like this by default because this will have a noticeable performance penalty for correctly collocated subqueries. Probably we will need some flag for that. Sergi 2017-05-31 21:26 GMT+03:00 Dmitriy Setrakyan <[hidden email]>: > Igniters (specifically Sergi), > > It has come to my attention today that nested sub-select statements, when > used in combination with non-collocated joins do not work properly in > Ignite. > > So a query like this, where A, B, and C are all stored in Partitioned > caches and are **not** collocated at all, will not work. > > > > *select * from A, B where a.id <http://a.id> = b.a_id and b.somefield in > > (select somefield from C where c.zipcode = ?)* > > > The main reason it is not supported right now is because, in the absence of > collocation, such query may create N^N complexity and it was decided that > it is best not supporting it at all. > > However, I am not sure why N^N complexity is required. Why not support it > as follows? > > 1. execute the nested subquery and store the result in a temporary > Replicated table. > 2. execute the original query and use the temporary Replicated table > instead of the sub-query. > > Sergi, given that you are the author of the code, can you provide some > insight here? > > Thanks, > D. > |
Sergi,
I am OK with any improvement here, but we need to be able to clearly state to a user what is supported and what is not. If we cannot clearly describe it, I would rather not support it at all and throw an exception. Is this going to be possible with your solution? D. On Thu, Jun 1, 2017 at 2:51 AM, Sergi Vladykin <[hidden email]> wrote: > The approach you are suggesting will be very complex for current > implementation. Also most probably very inefficient. > > Actually I was thinking about another but similar approach: in many cases > we can rewrite a subquery in WHERE clause into JOIN subquery. > > Like the following: > > SELECT x.* FROM x WHERE x.a = (SELECT MAX(y.a) FROM y WHERE y.b = x.b) > > ===> > > SELECT x.* FROM x, (SELECT MAX(y.a), y.b FROM y GROUP BY y.b) z WHERE x.b = > z.b > > There are still problems here: > > 1. We will not be able to rewrite all the queries. > 2. We should not rewrite queries like this by default because this will > have a noticeable performance penalty for correctly collocated subqueries. > Probably we will need some flag for that. > > Sergi > > 2017-05-31 21:26 GMT+03:00 Dmitriy Setrakyan <[hidden email]>: > > > Igniters (specifically Sergi), > > > > It has come to my attention today that nested sub-select statements, when > > used in combination with non-collocated joins do not work properly in > > Ignite. > > > > So a query like this, where A, B, and C are all stored in Partitioned > > caches and are **not** collocated at all, will not work. > > > > > > > *select * from A, B where a.id <http://a.id> = b.a_id and b.somefield > in > > > (select somefield from C where c.zipcode = ?)* > > > > > > The main reason it is not supported right now is because, in the absence > of > > collocation, such query may create N^N complexity and it was decided that > > it is best not supporting it at all. > > > > However, I am not sure why N^N complexity is required. Why not support it > > as follows? > > > > 1. execute the nested subquery and store the result in a temporary > > Replicated table. > > 2. execute the original query and use the temporary Replicated table > > instead of the sub-query. > > > > Sergi, given that you are the author of the code, can you provide some > > insight here? > > > > Thanks, > > D. > > > |
I guess it must work the following way:
If distributed joins are enabled we can try to prove that the subquery is collocated, if we can't then try to rewrite it, if we can't, then throw an exception. Still this can not be done 100% correct, probably we have to have some flag which allows to disable this subquery rewriting. Sergi 2017-06-01 21:33 GMT+03:00 Dmitriy Setrakyan <[hidden email]>: > Sergi, > > I am OK with any improvement here, but we need to be able to clearly state > to a user what is supported and what is not. If we cannot clearly describe > it, I would rather not support it at all and throw an exception. > > Is this going to be possible with your solution? > > D. > > On Thu, Jun 1, 2017 at 2:51 AM, Sergi Vladykin <[hidden email]> > wrote: > > > The approach you are suggesting will be very complex for current > > implementation. Also most probably very inefficient. > > > > Actually I was thinking about another but similar approach: in many cases > > we can rewrite a subquery in WHERE clause into JOIN subquery. > > > > Like the following: > > > > SELECT x.* FROM x WHERE x.a = (SELECT MAX(y.a) FROM y WHERE y.b = x.b) > > > > ===> > > > > SELECT x.* FROM x, (SELECT MAX(y.a), y.b FROM y GROUP BY y.b) z WHERE > x.b = > > z.b > > > > There are still problems here: > > > > 1. We will not be able to rewrite all the queries. > > 2. We should not rewrite queries like this by default because this will > > have a noticeable performance penalty for correctly collocated > subqueries. > > Probably we will need some flag for that. > > > > Sergi > > > > 2017-05-31 21:26 GMT+03:00 Dmitriy Setrakyan <[hidden email]>: > > > > > Igniters (specifically Sergi), > > > > > > It has come to my attention today that nested sub-select statements, > when > > > used in combination with non-collocated joins do not work properly in > > > Ignite. > > > > > > So a query like this, where A, B, and C are all stored in Partitioned > > > caches and are **not** collocated at all, will not work. > > > > > > > > > > *select * from A, B where a.id <http://a.id> = b.a_id and > b.somefield > > in > > > > (select somefield from C where c.zipcode = ?)* > > > > > > > > > The main reason it is not supported right now is because, in the > absence > > of > > > collocation, such query may create N^N complexity and it was decided > that > > > it is best not supporting it at all. > > > > > > However, I am not sure why N^N complexity is required. Why not support > it > > > as follows? > > > > > > 1. execute the nested subquery and store the result in a temporary > > > Replicated table. > > > 2. execute the original query and use the temporary Replicated table > > > instead of the sub-query. > > > > > > Sergi, given that you are the author of the code, can you provide some > > > insight here? > > > > > > Thanks, > > > D. > > > > > > |
On Thu, Jun 1, 2017 at 12:32 PM, Sergi Vladykin <[hidden email]>
wrote: > I guess it must work the following way: > > If distributed joins are enabled we can try to prove that the subquery is > collocated, if we can't then try to rewrite it, if we can't, then throw an > exception. > > Still this can not be done 100% correct, probably we have to have some flag > which allows to disable this subquery rewriting. > Sergi, but how do you explain to users what is supported and what is not? |
If you don't see an exception then it must be supported. This is the whole
point of this exception, right? Sergi 2017-06-01 22:50 GMT+03:00 Dmitriy Setrakyan <[hidden email]>: > On Thu, Jun 1, 2017 at 12:32 PM, Sergi Vladykin <[hidden email]> > wrote: > > > I guess it must work the following way: > > > > If distributed joins are enabled we can try to prove that the subquery is > > collocated, if we can't then try to rewrite it, if we can't, then throw > an > > exception. > > > > Still this can not be done 100% correct, probably we have to have some > flag > > which allows to disable this subquery rewriting. > > > > Sergi, but how do you explain to users what is supported and what is not? > |
On Thu, Jun 1, 2017 at 1:07 PM, Sergi Vladykin <[hidden email]>
wrote: > If you don't see an exception then it must be supported. This is the whole > point of this exception, right? > Exception is just to enforce the constraint. We still must clearly document what is supported. |
Here is a “known limitations” section on readme.io:
https://apacheignite.readme.io/docs/sql-queries#section-known-limitations Feel free to update it with the limitation discussed. — Denis > On Jun 1, 2017, at 1:19 PM, Dmitriy Setrakyan <[hidden email]> wrote: > > On Thu, Jun 1, 2017 at 1:07 PM, Sergi Vladykin <[hidden email]> > wrote: > >> If you don't see an exception then it must be supported. This is the whole >> point of this exception, right? >> > > Exception is just to enforce the constraint. We still must clearly document > what is supported. |
Free forum by Nabble | Edit this page |