Apache Ignite Developers - Legacy Mail Archive

affinity key syntax

Classic

List

Threaded

7 messages Options

dsetrakyan

affinity key syntax

Igniters,

Please take a look at how the affinity syntax works in Google Spanner:
https://stackoverflow.com/questions/46903159/can-i-have-
multiple-tables-with-same-parent-in-google-spanner

I find it rather nice. Perhaps we can borrow from it.

D.

Vladimir Ozerov

Re: affinity key syntax

Dima,

Yes, I saw it also. But this is not about syntax only. Spanner use this
information to store data efficiently - child entries a located near to
their parents. We can think of it as if all related tables were logical
caches inside one physical cache, sorted by the key. With this storage
format it will be possible to implement very efficient co-located joins.

On Wed, Oct 25, 2017 at 1:26 PM, Dmitriy Setrakyan <[hidden email]>
wrote:

> Igniters,
>
> Please take a look at how the affinity syntax works in Google Spanner:
> https://stackoverflow.com/questions/46903159/can-i-have-
> multiple-tables-with-same-parent-in-google-spanner
>
> I find it rather nice. Perhaps we can borrow from it.
>
> D.
>

dsetrakyan

Re: affinity key syntax

On Wed, Oct 25, 2017 at 3:32 AM, Vladimir Ozerov <[hidden email]>
wrote:

> Dima,
>
> Yes, I saw it also. But this is not about syntax only. Spanner use this
> information to store data efficiently - child entries a located near to
> their parents. We can think of it as if all related tables were logical
> caches inside one physical cache, sorted by the key. With this storage
> format it will be possible to implement very efficient co-located joins.
>

Hm... I don't think Ignite's approach for collocated joins is lees
efficient. However, back to Spanner, the first value in the child table key
is the parent table key. This tells me that Spanner collocates based on an
approach very similar to Ignite's affinity key. Am I wrong?

Vladimir Ozerov

Re: affinity key syntax

In Spanner once parent key is found you don't need to search for child keys
from scratch - they are located just after the parent key in the tree. In
Ignite child and parent keys are located in different trees, hence more
lookups are needed.

On Wed, Oct 25, 2017 at 1:36 PM, Dmitriy Setrakyan <[hidden email]>
wrote:

> On Wed, Oct 25, 2017 at 3:32 AM, Vladimir Ozerov <[hidden email]>
> wrote:
>
> > Dima,
> >
> > Yes, I saw it also. But this is not about syntax only. Spanner use this
> > information to store data efficiently - child entries a located near to
> > their parents. We can think of it as if all related tables were logical
> > caches inside one physical cache, sorted by the key. With this storage
> > format it will be possible to implement very efficient co-located joins.
> >
>
> Hm... I don't think Ignite's approach for collocated joins is lees
> efficient. However, back to Spanner, the first value in the child table key
> is the parent table key. This tells me that Spanner collocates based on an
> approach very similar to Ignite's affinity key. Am I wrong?
>

Vladimir Ozerov

Re: affinity key syntax

For example, currently every table in Ignite has at least two PK indexes -
one for cache operations, and another one for H2. If you have two tables
(parent - child), you have either 4 indexes (if they are in different
groups), or 3 indexes (same logical group). But even if certain tree is
shared between caches, dependent data entries are located in completely
different parts.

With Spanner we will need only 1 index for both tables.

On Wed, Oct 25, 2017 at 1:39 PM, Vladimir Ozerov <[hidden email]>
wrote:

> In Spanner once parent key is found you don't need to search for child
> keys from scratch - they are located just after the parent key in the tree.
> In Ignite child and parent keys are located in different trees, hence more
> lookups are needed.
>
> On Wed, Oct 25, 2017 at 1:36 PM, Dmitriy Setrakyan <[hidden email]>
> wrote:
>
>> On Wed, Oct 25, 2017 at 3:32 AM, Vladimir Ozerov <[hidden email]>
>> wrote:
>>
>> > Dima,
>> >
>> > Yes, I saw it also. But this is not about syntax only. Spanner use this
>> > information to store data efficiently - child entries a located near to
>> > their parents. We can think of it as if all related tables were logical
>> > caches inside one physical cache, sorted by the key. With this storage
>> > format it will be possible to implement very efficient co-located joins.
>> >
>>
>> Hm... I don't think Ignite's approach for collocated joins is lees
>> efficient. However, back to Spanner, the first value in the child table
>> key
>> is the parent table key. This tells me that Spanner collocates based on an
>> approach very similar to Ignite's affinity key. Am I wrong?
>>
>
>

dsetrakyan

Re: affinity key syntax

Thanks, Vladimir, got it. However, even though we may have different
indexes, the data should be on the same node, if it is properly collocated.
I do agree that we should try to remove extra index lookups, if possible.
Do we have a ticket for it? Is it a lot of work?

D.

On Wed, Oct 25, 2017 at 3:42 AM, Vladimir Ozerov <[hidden email]>
wrote:

> For example, currently every table in Ignite has at least two PK indexes -
> one for cache operations, and another one for H2. If you have two tables
> (parent - child), you have either 4 indexes (if they are in different
> groups), or 3 indexes (same logical group). But even if certain tree is
> shared between caches, dependent data entries are located in completely
> different parts.
>
> With Spanner we will need only 1 index for both tables.
>
> On Wed, Oct 25, 2017 at 1:39 PM, Vladimir Ozerov <[hidden email]>
> wrote:
>
> > In Spanner once parent key is found you don't need to search for child
> > keys from scratch - they are located just after the parent key in the
> tree.
> > In Ignite child and parent keys are located in different trees, hence
> more
> > lookups are needed.
> >
> > On Wed, Oct 25, 2017 at 1:36 PM, Dmitriy Setrakyan <
> [hidden email]>
> > wrote:
> >
> >> On Wed, Oct 25, 2017 at 3:32 AM, Vladimir Ozerov <[hidden email]>
> >> wrote:
> >>
> >> > Dima,
> >> >
> >> > Yes, I saw it also. But this is not about syntax only. Spanner use
> this
> >> > information to store data efficiently - child entries a located near
> to
> >> > their parents. We can think of it as if all related tables were
> logical
> >> > caches inside one physical cache, sorted by the key. With this storage
> >> > format it will be possible to implement very efficient co-located
> joins.
> >> >
> >>
> >> Hm... I don't think Ignite's approach for collocated joins is lees
> >> efficient. However, back to Spanner, the first value in the child table
> >> key
> >> is the parent table key. This tells me that Spanner collocates based on
> an
> >> approach very similar to Ignite's affinity key. Am I wrong?
> >>
> >
> >
>

Vladimir Ozerov

Re: affinity key syntax

Dima,

I filed a ticket [1]. This is very promising approach which will allow us
to replace regular joins with nested tables in many cases, thus boosting
performance of JOINs and making Ignite's affinity configuration easier form
UX perspecitve. But this is a big thing, as it would require fully-fledged
SQL parser (we cannot process SELECT statements at the moment) and quire a
few changes to our B+Tree.

[1] https://issues.apache.org/jira/browse/IGNITE-7038

On Tue, Oct 31, 2017 at 6:28 AM, Dmitriy Setrakyan <[hidden email]>
wrote:

> Thanks, Vladimir, got it. However, even though we may have different
> indexes, the data should be on the same node, if it is properly collocated.
> I do agree that we should try to remove extra index lookups, if possible.
> Do we have a ticket for it? Is it a lot of work?
>
> D.
>
> On Wed, Oct 25, 2017 at 3:42 AM, Vladimir Ozerov <[hidden email]>
> wrote:
>
> > For example, currently every table in Ignite has at least two PK indexes
> -
> > one for cache operations, and another one for H2. If you have two tables
> > (parent - child), you have either 4 indexes (if they are in different
> > groups), or 3 indexes (same logical group). But even if certain tree is
> > shared between caches, dependent data entries are located in completely
> > different parts.
> >
> > With Spanner we will need only 1 index for both tables.
> >
> > On Wed, Oct 25, 2017 at 1:39 PM, Vladimir Ozerov <[hidden email]>
> > wrote:
> >
> > > In Spanner once parent key is found you don't need to search for child
> > > keys from scratch - they are located just after the parent key in the
> > tree.
> > > In Ignite child and parent keys are located in different trees, hence
> > more
> > > lookups are needed.
> > >
> > > On Wed, Oct 25, 2017 at 1:36 PM, Dmitriy Setrakyan <
> > [hidden email]>
> > > wrote:
> > >
> > >> On Wed, Oct 25, 2017 at 3:32 AM, Vladimir Ozerov <
> [hidden email]>
> > >> wrote:
> > >>
> > >> > Dima,
> > >> >
> > >> > Yes, I saw it also. But this is not about syntax only. Spanner use
> > this
> > >> > information to store data efficiently - child entries a located near
> > to
> > >> > their parents. We can think of it as if all related tables were
> > logical
> > >> > caches inside one physical cache, sorted by the key. With this
> storage
> > >> > format it will be possible to implement very efficient co-located
> > joins.
> > >> >
> > >>
> > >> Hm... I don't think Ignite's approach for collocated joins is lees
> > >> efficient. However, back to Spanner, the first value in the child
> table
> > >> key
> > >> is the parent table key. This tells me that Spanner collocates based
> on
> > an
> > >> approach very similar to Ignite's affinity key. Am I wrong?
> > >>
> > >
> > >
> >
>