Apache Ignite Developers - Legacy Mail Archive

Distributive SQL Joins

Classic

List

Threaded

7 messages Options

Sergi

Distributive SQL Joins

Igniters,

I'm going to start working on distributed SQL joins soon and want to put my
thoughts on this matter here.

To provide collocated and non-collocated joins we need to know affinity key
for each table.
For example we have a table `Organization(id int)` where `id` is affinity
key and `Person(orgId int)`
where `orgId` is affinity key. This way when we see join `Person p,
Organization o on p.orgId = o.id`
we know that it does not make sense to try to find joins between different
nodes because
if the affinity fields are equal then they must be on the same node. Also
this allows to handle
transitive collocated joins like `Person pe, Product pr join on pe.orgId =
pr.orgId` where `orgId`
is affinity key for `Product` but is not a primary key. This will be a
backward incompatible configuration change.
This configuration approach is consistent with what MemSQL does.

Obviously we can have multiple tables joined on different keys in the same
query.
Lets say `+` is a join on collocated key and `-` is a join on
non-collocated key.
Suppose we have the following join in query
`a + b + c - d + e - f`

A. We can run it the following way with shuffle:
run collocated `(a + b + c) = m` , `(d + e) = n` resulting into
`m - n - f`
Then there are two possibilities (because we have only 3 non-collocated
entities)
either they joined on the same key or not. In the the first case we can
shuffle them
in a single step `(m - n - f) = z` to achieve collocation on joined fields,
in the second we have
to do shuffle `(m - n) = k` and then shuffle `(k - f) = z` which will be
our
resulting entity which is known to be collocated on joined fields of `k`
and `f`.

B. Another approach is to request data from remote nodes as needed for a
join.
It means that we are running locally `(a + b + c)` and when we fetch a row
from there, we request a joining part for this row from `(d + e)` part.
And if it is known that we join on affinity key from `(d + e)` then we know
on which node this part exists exactly. Otherwise we have to broadcast this
request.
After that the same must happen with `f` part. Of course batching here is
applicable
as well, no need to request data for each node separately.

If we analyze these two approaches from the performance standpoint then
we can see that the best one is B with known affinity key of remote side.
It has the same number of messages as shuffle but reduces traffic because
on request we need to send only keys (while on shuffle we need to send the
whole local table part to be joined on a third node) and in response we
will receive only
joined row parts (while in shuffle we need to send the whole remote
table part to be joined on a third node).

Obviously B without known affinity key requires broadcast and is not
practically
useful. But it seems that such a case will be quite uncommon since it is a
join of two partitioned tables which are in turn collocated with two
different partitioned tables.

Since implementing approach A is much more complex (it will require more
complex
query planning and generation as well as more complex coordination between
nodes)
and B with known affinity keys is simpler and more effective
my preference is to implement B with known affinity keys, and forbid
case when we don't know remote affinity key.

Feel free to comment.

Sergi

Alexey Kuznetsov-2

Re: Distributive SQL Joins

Sergi,

Questions about plan "B" :)
1) It is possible to throw exception on query prepare state (fail fast) when
we don't know remote affinity key?
2) Could you provide an example when we don't know remote affinity key? I
think we always have some default affinity (no?)?

--
Alexey Kuznetsov
GridGain Systems
www.gridgain.com

Sergi

Re: Distributive SQL Joins

Alexey,

1. Yes, in my plan it should work exactly like that: if both keys in join
are affinity keys, then we are fully collocated, if only one then we can
run join remotely as described, if none of them we will fail to run the
query.

2. I mean we don't have values for these affinity keys in our local query
result to map requests to remote nodes.
Example:
Lest say we have 4 partitioned tables:
- Organization(id) with affinity key `id`.
- Person(id, orgId, name) with affinity key `orgId` (it means that it will
be collocated with `Organization`)
- Manufacturer(id) with affinity key `id`.
- Purchase(id, personId, manufId) with affinity key `manufId` (collocated
with `Manufacturer`)

As you can see `Purchase` has a reference to a `Person` and we may want to
join them by this reference in a query like this:

SELECT pe.name FROM Person pe JOIN Purchase pu ON pe.id = pu.personId WHERE
pu.id = ?

as you can see neither `pe.id` nor `pu.personId` is an affinity key here.
But if the `Person` has affinity key `id` and thus is not collocated with
`Organization`
we can run query on `Purchase`, take value of `personId` and find the
affinity node to get the needed `Person`.

Of course it is a restriction but there are multiple ways to workaround it,
so I don't think it is really a problem:

1. Use primary key as affinity key if table is used in such joins. This way
`Person` still can be joined to `Organization`
(less effective though) and `Pusrchase` can be joined to `Person` as well.
2. Use denormalization: instread of having `Purchase.personId` store
`Person` object itself there.
3. Introduce another entity which can duplicate data from `Person` but have
collocation needed for this failing query:
For our example it can be an entity PersonForPurchace(id, manufId, name)
with the same affinity key `manufId` as `Purchase`.
This way our query can be rewritten in fully collocated style:

SELECT pe.name FROM PersonForPurchace pe JOIN Purchase pu ON pe.id =
pu.personId AND pu.manufId = pe.manufId WHERE pu.id = ?

Sergi

2015-08-07 11:42 GMT+03:00 Alexey Kuznetsov <[hidden email]>:

> Sergi,
>
> Questions about plan "B" :)
> 1) It is possible to throw exception on query prepare state (fail fast)
> when
> we don't know remote affinity key?
> 2) Could you provide an example when we don't know remote affinity key? I
> think we always have some default affinity (no?)?
>
> --
> Alexey Kuznetsov
> GridGain Systems
> www.gridgain.com
>

dsetrakyan

Re: Distributive SQL Joins

Sergi,

I personally don't like that for certain types of queries we will be
throwing an exception.

After analyzing the approaches you suggested, I can think of cases where A
performs better than B, as well as when B performs better than A.

However, if you prefer B, I don't mind us taking that approach. As you have
mentioned yourself, in case of non-collocated non-affinity-ID queries, you
would require a broadcast which is a performance hit. I still vote that we
take this performance hit and do the broadcast (optimized with batching, of
course), and execute the query instead of throwing an exception.

D.

On Fri, Aug 7, 2015 at 3:40 AM, Sergi Vladykin <[hidden email]>
wrote:

> Alexey,
>
> 1. Yes, in my plan it should work exactly like that: if both keys in join
> are affinity keys, then we are fully collocated, if only one then we can
> run join remotely as described, if none of them we will fail to run the
> query.
>
> 2. I mean we don't have values for these affinity keys in our local query
> result to map requests to remote nodes.
> Example:
> Lest say we have 4 partitioned tables:
> - Organization(id) with affinity key `id`.
> - Person(id, orgId, name) with affinity key `orgId` (it means that it will
> be collocated with `Organization`)
> - Manufacturer(id) with affinity key `id`.
> - Purchase(id, personId, manufId) with affinity key `manufId` (collocated
> with `Manufacturer`)
>
> As you can see `Purchase` has a reference to a `Person` and we may want to
> join them by this reference in a query like this:
>
> SELECT pe.name FROM Person pe JOIN Purchase pu ON pe.id = pu.personId
> WHERE
> pu.id = ?
>
> as you can see neither `pe.id` nor `pu.personId` is an affinity key here.
> But if the `Person` has affinity key `id` and thus is not collocated with
> `Organization`
> we can run query on `Purchase`, take value of `personId` and find the
> affinity node to get the needed `Person`.
>
> Of course it is a restriction but there are multiple ways to workaround it,
> so I don't think it is really a problem:
>
> 1. Use primary key as affinity key if table is used in such joins. This way
> `Person` still can be joined to `Organization`
> (less effective though) and `Pusrchase` can be joined to `Person` as well.
> 2. Use denormalization: instread of having `Purchase.personId` store
> `Person` object itself there.
> 3. Introduce another entity which can duplicate data from `Person` but have
> collocation needed for this failing query:
> For our example it can be an entity PersonForPurchace(id, manufId, name)
> with the same affinity key `manufId` as `Purchase`.
> This way our query can be rewritten in fully collocated style:
>
> SELECT pe.name FROM PersonForPurchace pe JOIN Purchase pu ON pe.id =
> pu.personId AND pu.manufId = pe.manufId WHERE pu.id = ?
>
> Sergi
>
>
>
>
>
> 2015-08-07 11:42 GMT+03:00 Alexey Kuznetsov <[hidden email]>:
>
> > Sergi,
> >
> > Questions about plan "B" :)
> > 1) It is possible to throw exception on query prepare state (fail fast)
> > when
> > we don't know remote affinity key?
> > 2) Could you provide an example when we don't know remote affinity key? I
> > think we always have some default affinity (no?)?
> >
> > --
> > Alexey Kuznetsov
> > GridGain Systems
> > www.gridgain.com
> >
>

Sergi

Re: Distributive SQL Joins

I was thinking about protecting users from doing stupid things,
but ok, we can do a broadcast as well.

Sergi

2015-08-08 3:46 GMT+03:00 Dmitriy Setrakyan <[hidden email]>:

> Sergi,
>
> I personally don't like that for certain types of queries we will be
> throwing an exception.
>
> After analyzing the approaches you suggested, I can think of cases where A
> performs better than B, as well as when B performs better than A.
>
> However, if you prefer B, I don't mind us taking that approach. As you have
> mentioned yourself, in case of non-collocated non-affinity-ID queries, you
> would require a broadcast which is a performance hit. I still vote that we
> take this performance hit and do the broadcast (optimized with batching, of
> course), and execute the query instead of throwing an exception.
>
> D.
>
> On Fri, Aug 7, 2015 at 3:40 AM, Sergi Vladykin <[hidden email]>
> wrote:
>
> > Alexey,
> >
> > 1. Yes, in my plan it should work exactly like that: if both keys in join
> > are affinity keys, then we are fully collocated, if only one then we can
> > run join remotely as described, if none of them we will fail to run the
> > query.
> >
> > 2. I mean we don't have values for these affinity keys in our local query
> > result to map requests to remote nodes.
> > Example:
> > Lest say we have 4 partitioned tables:
> > - Organization(id) with affinity key `id`.
> > - Person(id, orgId, name) with affinity key `orgId` (it means that it
> will
> > be collocated with `Organization`)
> > - Manufacturer(id) with affinity key `id`.
> > - Purchase(id, personId, manufId) with affinity key `manufId` (collocated
> > with `Manufacturer`)
> >
> > As you can see `Purchase` has a reference to a `Person` and we may want
> to
> > join them by this reference in a query like this:
> >
> > SELECT pe.name FROM Person pe JOIN Purchase pu ON pe.id = pu.personId
> > WHERE
> > pu.id = ?
> >
> > as you can see neither `pe.id` nor `pu.personId` is an affinity key
> here.
> > But if the `Person` has affinity key `id` and thus is not collocated with
> > `Organization`
> > we can run query on `Purchase`, take value of `personId` and find the
> > affinity node to get the needed `Person`.
> >
> > Of course it is a restriction but there are multiple ways to workaround
> it,
> > so I don't think it is really a problem:
> >
> > 1. Use primary key as affinity key if table is used in such joins. This
> way
> > `Person` still can be joined to `Organization`
> > (less effective though) and `Pusrchase` can be joined to `Person` as
> well.
> > 2. Use denormalization: instread of having `Purchase.personId` store
> > `Person` object itself there.
> > 3. Introduce another entity which can duplicate data from `Person` but
> have
> > collocation needed for this failing query:
> > For our example it can be an entity PersonForPurchace(id, manufId, name)
> > with the same affinity key `manufId` as `Purchase`.
> > This way our query can be rewritten in fully collocated style:
> >
> > SELECT pe.name FROM PersonForPurchace pe JOIN Purchase pu ON pe.id =
> > pu.personId AND pu.manufId = pe.manufId WHERE pu.id = ?
> >
> > Sergi
> >
> >
> >
> >
> >
> > 2015-08-07 11:42 GMT+03:00 Alexey Kuznetsov <[hidden email]>:
> >
> > > Sergi,
> > >
> > > Questions about plan "B" :)
> > > 1) It is possible to throw exception on query prepare state (fail fast)
> > > when
> > > we don't know remote affinity key?
> > > 2) Could you provide an example when we don't know remote affinity
> key? I
> > > think we always have some default affinity (no?)?
> > >
> > > --
> > > Alexey Kuznetsov
> > > GridGain Systems
> > > www.gridgain.com
> > >
> >
>

dsetrakyan

Re: Distributive SQL Joins

On Mon, Aug 10, 2015 at 11:26 PM, Sergi Vladykin <[hidden email]>
wrote:

> I was thinking about protecting users from doing stupid things,
> but ok, we can do a broadcast as well.
>

I think we should print out a warning in this case for sure. Is there a
Jira created for this? We should provide a link to this discussion there.

>
> Sergi
>
> 2015-08-08 3:46 GMT+03:00 Dmitriy Setrakyan <[hidden email]>:
>
> > Sergi,
> >
> > I personally don't like that for certain types of queries we will be
> > throwing an exception.
> >
> > After analyzing the approaches you suggested, I can think of cases where
> A
> > performs better than B, as well as when B performs better than A.
> >
> > However, if you prefer B, I don't mind us taking that approach. As you
> have
> > mentioned yourself, in case of non-collocated non-affinity-ID queries,
> you
> > would require a broadcast which is a performance hit. I still vote that
> we
> > take this performance hit and do the broadcast (optimized with batching,
> of
> > course), and execute the query instead of throwing an exception.
> >
> > D.
> >
> > On Fri, Aug 7, 2015 at 3:40 AM, Sergi Vladykin <[hidden email]
> >
> > wrote:
> >
> > > Alexey,
> > >
> > > 1. Yes, in my plan it should work exactly like that: if both keys in
> join
> > > are affinity keys, then we are fully collocated, if only one then we
> can
> > > run join remotely as described, if none of them we will fail to run the
> > > query.
> > >
> > > 2. I mean we don't have values for these affinity keys in our local
> query
> > > result to map requests to remote nodes.
> > > Example:
> > > Lest say we have 4 partitioned tables:
> > > - Organization(id) with affinity key `id`.
> > > - Person(id, orgId, name) with affinity key `orgId` (it means that it
> > will
> > > be collocated with `Organization`)
> > > - Manufacturer(id) with affinity key `id`.
> > > - Purchase(id, personId, manufId) with affinity key `manufId`
> (collocated
> > > with `Manufacturer`)
> > >
> > > As you can see `Purchase` has a reference to a `Person` and we may want
> > to
> > > join them by this reference in a query like this:
> > >
> > > SELECT pe.name FROM Person pe JOIN Purchase pu ON pe.id = pu.personId
> > > WHERE
> > > pu.id = ?
> > >
> > > as you can see neither `pe.id` nor `pu.personId` is an affinity key
> > here.
> > > But if the `Person` has affinity key `id` and thus is not collocated
> with
> > > `Organization`
> > > we can run query on `Purchase`, take value of `personId` and find the
> > > affinity node to get the needed `Person`.
> > >
> > > Of course it is a restriction but there are multiple ways to workaround
> > it,
> > > so I don't think it is really a problem:
> > >
> > > 1. Use primary key as affinity key if table is used in such joins. This
> > way
> > > `Person` still can be joined to `Organization`
> > > (less effective though) and `Pusrchase` can be joined to `Person` as
> > well.
> > > 2. Use denormalization: instread of having `Purchase.personId` store
> > > `Person` object itself there.
> > > 3. Introduce another entity which can duplicate data from `Person` but
> > have
> > > collocation needed for this failing query:
> > > For our example it can be an entity PersonForPurchace(id, manufId,
> name)
> > > with the same affinity key `manufId` as `Purchase`.
> > > This way our query can be rewritten in fully collocated style:
> > >
> > > SELECT pe.name FROM PersonForPurchace pe JOIN Purchase pu ON pe.id =
> > > pu.personId AND pu.manufId = pe.manufId WHERE pu.id = ?
> > >
> > > Sergi
> > >
> > >
> > >
> > >
> > >
> > > 2015-08-07 11:42 GMT+03:00 Alexey Kuznetsov <[hidden email]>:
> > >
> > > > Sergi,
> > > >
> > > > Questions about plan "B" :)
> > > > 1) It is possible to throw exception on query prepare state (fail
> fast)
> > > > when
> > > > we don't know remote affinity key?
> > > > 2) Could you provide an example when we don't know remote affinity
> > key? I
> > > > think we always have some default affinity (no?)?
> > > >
> > > > --
> > > > Alexey Kuznetsov
> > > > GridGain Systems
> > > > www.gridgain.com
> > > >
> > >
> >
>

Sergi

Re: Distributive SQL Joins

I've created respective Jira issue
https://issues.apache.org/jira/browse/IGNITE-1232

Sergi

2015-08-11 9:31 GMT+03:00 Dmitriy Setrakyan <[hidden email]>:

> On Mon, Aug 10, 2015 at 11:26 PM, Sergi Vladykin <[hidden email]
> >
> wrote:
>
> > I was thinking about protecting users from doing stupid things,
> > but ok, we can do a broadcast as well.
> >
>
> I think we should print out a warning in this case for sure. Is there a
> Jira created for this? We should provide a link to this discussion there.
>
>
> >
> > Sergi
> >
> > 2015-08-08 3:46 GMT+03:00 Dmitriy Setrakyan <[hidden email]>:
> >
> > > Sergi,
> > >
> > > I personally don't like that for certain types of queries we will be
> > > throwing an exception.
> > >
> > > After analyzing the approaches you suggested, I can think of cases
> where
> > A
> > > performs better than B, as well as when B performs better than A.
> > >
> > > However, if you prefer B, I don't mind us taking that approach. As you
> > have
> > > mentioned yourself, in case of non-collocated non-affinity-ID queries,
> > you
> > > would require a broadcast which is a performance hit. I still vote that
> > we
> > > take this performance hit and do the broadcast (optimized with
> batching,
> > of
> > > course), and execute the query instead of throwing an exception.
> > >
> > > D.
> > >
> > > On Fri, Aug 7, 2015 at 3:40 AM, Sergi Vladykin <
> [hidden email]
> > >
> > > wrote:
> > >
> > > > Alexey,
> > > >
> > > > 1. Yes, in my plan it should work exactly like that: if both keys in
> > join
> > > > are affinity keys, then we are fully collocated, if only one then we
> > can
> > > > run join remotely as described, if none of them we will fail to run
> the
> > > > query.
> > > >
> > > > 2. I mean we don't have values for these affinity keys in our local
> > query
> > > > result to map requests to remote nodes.
> > > > Example:
> > > > Lest say we have 4 partitioned tables:
> > > > - Organization(id) with affinity key `id`.
> > > > - Person(id, orgId, name) with affinity key `orgId` (it means that it
> > > will
> > > > be collocated with `Organization`)
> > > > - Manufacturer(id) with affinity key `id`.
> > > > - Purchase(id, personId, manufId) with affinity key `manufId`
> > (collocated
> > > > with `Manufacturer`)
> > > >
> > > > As you can see `Purchase` has a reference to a `Person` and we may
> want
> > > to
> > > > join them by this reference in a query like this:
> > > >
> > > > SELECT pe.name FROM Person pe JOIN Purchase pu ON pe.id =
> pu.personId
> > > > WHERE
> > > > pu.id = ?
> > > >
> > > > as you can see neither `pe.id` nor `pu.personId` is an affinity key
> > > here.
> > > > But if the `Person` has affinity key `id` and thus is not collocated
> > with
> > > > `Organization`
> > > > we can run query on `Purchase`, take value of `personId` and find the
> > > > affinity node to get the needed `Person`.
> > > >
> > > > Of course it is a restriction but there are multiple ways to
> workaround
> > > it,
> > > > so I don't think it is really a problem:
> > > >
> > > > 1. Use primary key as affinity key if table is used in such joins.
> This
> > > way
> > > > `Person` still can be joined to `Organization`
> > > > (less effective though) and `Pusrchase` can be joined to `Person` as
> > > well.
> > > > 2. Use denormalization: instread of having `Purchase.personId` store
> > > > `Person` object itself there.
> > > > 3. Introduce another entity which can duplicate data from `Person`
> but
> > > have
> > > > collocation needed for this failing query:
> > > > For our example it can be an entity PersonForPurchace(id, manufId,
> > name)
> > > > with the same affinity key `manufId` as `Purchase`.
> > > > This way our query can be rewritten in fully collocated style:
> > > >
> > > > SELECT pe.name FROM PersonForPurchace pe JOIN Purchase pu ON pe.id =
> > > > pu.personId AND pu.manufId = pe.manufId WHERE pu.id = ?
> > > >
> > > > Sergi
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > 2015-08-07 11:42 GMT+03:00 Alexey Kuznetsov <[hidden email]
> >:
> > > >
> > > > > Sergi,
> > > > >
> > > > > Questions about plan "B" :)
> > > > > 1) It is possible to throw exception on query prepare state (fail
> > fast)
> > > > > when
> > > > > we don't know remote affinity key?
> > > > > 2) Could you provide an example when we don't know remote affinity
> > > key? I
> > > > > think we always have some default affinity (no?)?
> > > > >
> > > > > --
> > > > > Alexey Kuznetsov
> > > > > GridGain Systems
> > > > > www.gridgain.com
> > > > >
> > > >
> > >
> >
>