Apache Ignite Developers - Legacy Mail Archive

Unstructured object format.

Classic

List

Threaded

13 messages Options

Sergi

Unstructured object format.

Guys, specially Alexey G.

I've attended PostgreSQL conference and there was a talk about unstructured
data format.
They had an interesting idea of serialized layout close enough to ours, I'm
not sure how much this is different from our approach and if we can use
some ideas from it but anywaus it looks really promising to me and I want
to share.

The structure basically is the following:

[key headers] [keys] [values]

Key headers are [key offset, key length] so they are of a fixed length.

The cool idea here is that keys and respectively the key headers sorted by
(key length, key) so that you can do a lookup first by fast picking key of
the needed length without looking at keys at all and then pick an exact
key. Both searches can be done with fast scan if there are small number of
keys and binary search for a larger number of keys.

Alexey G., could you please compare this to our new marshalling approach
you are about to merge?
BTW, it would be nice if you will describe it in details here.

Sergi

Atri Sharma

Re: Unstructured object format.

Are you referring to JSONB here?

On Thu, Jul 16, 2015 at 1:10 PM, Sergi Vladykin <[hidden email]>
wrote:

> Guys, specially Alexey G.
>
> I've attended PostgreSQL conference and there was a talk about unstructured
> data format.
> They had an interesting idea of serialized layout close enough to ours, I'm
> not sure how much this is different from our approach and if we can use
> some ideas from it but anywaus it looks really promising to me and I want
> to share.
>
> The structure basically is the following:
>
> [key headers] [keys] [values]
>
> Key headers are [key offset, key length] so they are of a fixed length.
>
> The cool idea here is that keys and respectively the key headers sorted by
> (key length, key) so that you can do a lookup first by fast picking key of
> the needed length without looking at keys at all and then pick an exact
> key. Both searches can be done with fast scan if there are small number of
> keys and binary search for a larger number of keys.
>
> Alexey G., could you please compare this to our new marshalling approach
> you are about to merge?
> BTW, it would be nice if you will describe it in details here.
>
> Sergi
>

--
Regards,

Atri
*l'apprenant*

Sergi

Re: Unstructured object format.

HSTORE and JSONB appeared to have similar format in Postgresql (because
they was developed by the same people). They noticed that they switched off
of using key length sorting because they sometimes need lexicographical
order but this is irrelevant for us.

Sergi

2015-07-16 10:43 GMT+03:00 Atri Sharma <[hidden email]>:

> Are you referring to JSONB here?
>
> On Thu, Jul 16, 2015 at 1:10 PM, Sergi Vladykin <[hidden email]>
> wrote:
>
> > Guys, specially Alexey G.
> >
> > I've attended PostgreSQL conference and there was a talk about
> unstructured
> > data format.
> > They had an interesting idea of serialized layout close enough to ours,
> I'm
> > not sure how much this is different from our approach and if we can use
> > some ideas from it but anywaus it looks really promising to me and I want
> > to share.
> >
> > The structure basically is the following:
> >
> > [key headers] [keys] [values]
> >
> > Key headers are [key offset, key length] so they are of a fixed length.
> >
> > The cool idea here is that keys and respectively the key headers sorted
> by
> > (key length, key) so that you can do a lookup first by fast picking key
> of
> > the needed length without looking at keys at all and then pick an exact
> > key. Both searches can be done with fast scan if there are small number
> of
> > keys and binary search for a larger number of keys.
> >
> > Alexey G., could you please compare this to our new marshalling approach
> > you are about to merge?
> > BTW, it would be nice if you will describe it in details here.
> >
> > Sergi
> >
>
>
>
> --
> Regards,
>
> Atri
> *l'apprenant*
>

Atri Sharma

Re: Unstructured object format.

Keep in mind that JSONB's performance comes from the fact that it uses
server encoding, is binary represented and can have GIN indexes on top of
it. Not sure if Ignite's marshalling approach keeps those features as well.

On Thu, Jul 16, 2015 at 1:20 PM, Sergi Vladykin <[hidden email]>
wrote:

> HSTORE and JSONB appeared to have similar format in Postgresql (because
> they was developed by the same people). They noticed that they switched off
> of using key length sorting because they sometimes need lexicographical
> order but this is irrelevant for us.
>
> Sergi
>
> 2015-07-16 10:43 GMT+03:00 Atri Sharma <[hidden email]>:
>
> > Are you referring to JSONB here?
> >
> > On Thu, Jul 16, 2015 at 1:10 PM, Sergi Vladykin <
> [hidden email]>
> > wrote:
> >
> > > Guys, specially Alexey G.
> > >
> > > I've attended PostgreSQL conference and there was a talk about
> > unstructured
> > > data format.
> > > They had an interesting idea of serialized layout close enough to ours,
> > I'm
> > > not sure how much this is different from our approach and if we can use
> > > some ideas from it but anywaus it looks really promising to me and I
> want
> > > to share.
> > >
> > > The structure basically is the following:
> > >
> > > [key headers] [keys] [values]
> > >
> > > Key headers are [key offset, key length] so they are of a fixed length.
> > >
> > > The cool idea here is that keys and respectively the key headers sorted
> > by
> > > (key length, key) so that you can do a lookup first by fast picking key
> > of
> > > the needed length without looking at keys at all and then pick an exact
> > > key. Both searches can be done with fast scan if there are small number
> > of
> > > keys and binary search for a larger number of keys.
> > >
> > > Alexey G., could you please compare this to our new marshalling
> approach
> > > you are about to merge?
> > > BTW, it would be nice if you will describe it in details here.
> > >
> > > Sergi
> > >
> >
> >
> >
> > --
> > Regards,
> >
> > Atri
> > *l'apprenant*
> >
>

--
Regards,

Atri
*l'apprenant*

Alexey Goncharuk

Re: Unstructured object format.

Currently an index-enabled serialized object form has the following layout
(simplified):

[object fields][field1Offset,field1Length,
field2Offset,field2Length,...,fieldNOffset,fieldNLength]

where fields order is determined upon the first object serialization and
stored in metadata cache which is available on all nodes. Thus, the field
lookup is performed as follows:

fieldName -> fieldIndex (metadata lookup, O(1)), fieldIndex->fieldOffset in
footer (O(1)), fieldOffset->fieldValue (O(1)).

BTW, I am finalizing the branch with marshaller changes and will send this
for a preliminary review soon.

2015-07-16 0:55 GMT-07:00 Atri Sharma <[hidden email]>:

> Keep in mind that JSONB's performance comes from the fact that it uses
> server encoding, is binary represented and can have GIN indexes on top of
> it. Not sure if Ignite's marshalling approach keeps those features as well.
>
> On Thu, Jul 16, 2015 at 1:20 PM, Sergi Vladykin <[hidden email]>
> wrote:
>
> > HSTORE and JSONB appeared to have similar format in Postgresql (because
> > they was developed by the same people). They noticed that they switched
> off
> > of using key length sorting because they sometimes need lexicographical
> > order but this is irrelevant for us.
> >
> > Sergi
> >
> > 2015-07-16 10:43 GMT+03:00 Atri Sharma <[hidden email]>:
> >
> > > Are you referring to JSONB here?
> > >
> > > On Thu, Jul 16, 2015 at 1:10 PM, Sergi Vladykin <
> > [hidden email]>
> > > wrote:
> > >
> > > > Guys, specially Alexey G.
> > > >
> > > > I've attended PostgreSQL conference and there was a talk about
> > > unstructured
> > > > data format.
> > > > They had an interesting idea of serialized layout close enough to
> ours,
> > > I'm
> > > > not sure how much this is different from our approach and if we can
> use
> > > > some ideas from it but anywaus it looks really promising to me and I
> > want
> > > > to share.
> > > >
> > > > The structure basically is the following:
> > > >
> > > > [key headers] [keys] [values]
> > > >
> > > > Key headers are [key offset, key length] so they are of a fixed
> length.
> > > >
> > > > The cool idea here is that keys and respectively the key headers
> sorted
> > > by
> > > > (key length, key) so that you can do a lookup first by fast picking
> key
> > > of
> > > > the needed length without looking at keys at all and then pick an
> exact
> > > > key. Both searches can be done with fast scan if there are small
> number
> > > of
> > > > keys and binary search for a larger number of keys.
> > > >
> > > > Alexey G., could you please compare this to our new marshalling
> > approach
> > > > you are about to merge?
> > > > BTW, it would be nice if you will describe it in details here.
> > > >
> > > > Sergi
> > > >
> > >
> > >
> > >
> > > --
> > > Regards,
> > >
> > > Atri
> > > *l'apprenant*
> > >
> >
>
>
>
> --
> Regards,
>
> Atri
> *l'apprenant*
>

Sergi

Re: Unstructured object format.

I think O(N) reasoning does not make a real sense here since N is always
small, lets not fool ourselves.
To my mind operation cost of cache access (with all busy locks...),
hashCode/equals and stuff like that has much bigger impact here.
Do we still have a pluggable marshaller? Can my approach be implemented
separately?

Sergi

2015-07-21 9:14 GMT+03:00 Alexey Goncharuk <[hidden email]>:

> Currently an index-enabled serialized object form has the following layout
> (simplified):
>
> [object fields][field1Offset,field1Length,
> field2Offset,field2Length,...,fieldNOffset,fieldNLength]
>
> where fields order is determined upon the first object serialization and
> stored in metadata cache which is available on all nodes. Thus, the field
> lookup is performed as follows:
>
> fieldName -> fieldIndex (metadata lookup, O(1)), fieldIndex->fieldOffset in
> footer (O(1)), fieldOffset->fieldValue (O(1)).
>
> BTW, I am finalizing the branch with marshaller changes and will send this
> for a preliminary review soon.
>
> 2015-07-16 0:55 GMT-07:00 Atri Sharma <[hidden email]>:
>
> > Keep in mind that JSONB's performance comes from the fact that it uses
> > server encoding, is binary represented and can have GIN indexes on top of
> > it. Not sure if Ignite's marshalling approach keeps those features as
> well.
> >
> > On Thu, Jul 16, 2015 at 1:20 PM, Sergi Vladykin <
> [hidden email]>
> > wrote:
> >
> > > HSTORE and JSONB appeared to have similar format in Postgresql (because
> > > they was developed by the same people). They noticed that they switched
> > off
> > > of using key length sorting because they sometimes need lexicographical
> > > order but this is irrelevant for us.
> > >
> > > Sergi
> > >
> > > 2015-07-16 10:43 GMT+03:00 Atri Sharma <[hidden email]>:
> > >
> > > > Are you referring to JSONB here?
> > > >
> > > > On Thu, Jul 16, 2015 at 1:10 PM, Sergi Vladykin <
> > > [hidden email]>
> > > > wrote:
> > > >
> > > > > Guys, specially Alexey G.
> > > > >
> > > > > I've attended PostgreSQL conference and there was a talk about
> > > > unstructured
> > > > > data format.
> > > > > They had an interesting idea of serialized layout close enough to
> > ours,
> > > > I'm
> > > > > not sure how much this is different from our approach and if we can
> > use
> > > > > some ideas from it but anywaus it looks really promising to me and
> I
> > > want
> > > > > to share.
> > > > >
> > > > > The structure basically is the following:
> > > > >
> > > > > [key headers] [keys] [values]
> > > > >
> > > > > Key headers are [key offset, key length] so they are of a fixed
> > length.
> > > > >
> > > > > The cool idea here is that keys and respectively the key headers
> > sorted
> > > > by
> > > > > (key length, key) so that you can do a lookup first by fast picking
> > key
> > > > of
> > > > > the needed length without looking at keys at all and then pick an
> > exact
> > > > > key. Both searches can be done with fast scan if there are small
> > number
> > > > of
> > > > > keys and binary search for a larger number of keys.
> > > > >
> > > > > Alexey G., could you please compare this to our new marshalling
> > > approach
> > > > > you are about to merge?
> > > > > BTW, it would be nice if you will describe it in details here.
> > > > >
> > > > > Sergi
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Regards,
> > > >
> > > > Atri
> > > > *l'apprenant*
> > > >
> > >
> >
> >
> >
> > --
> > Regards,
> >
> > Atri
> > *l'apprenant*
> >
>

Alexey Goncharuk

Re: Unstructured object format.

Metadata cache access is backed by a local hash map, so the real cost is a
String object hashcode which is cached in the String object and a hashmap
lookup by an integer key.

On the other hand, the marshaller is still pluggable and after the ticket
is completed, it should be fairly easy to implement this approach and
compare performance.

--Alexey

2015-07-21 10:01 GMT-07:00 Sergi Vladykin <[hidden email]>:

> I think O(N) reasoning does not make a real sense here since N is always
> small, lets not fool ourselves.
> To my mind operation cost of cache access (with all busy locks...),
> hashCode/equals and stuff like that has much bigger impact here.
> Do we still have a pluggable marshaller? Can my approach be implemented
> separately?
>
> Sergi
>
>
>
>
> 2015-07-21 9:14 GMT+03:00 Alexey Goncharuk <[hidden email]>:
>
> > Currently an index-enabled serialized object form has the following
> layout
> > (simplified):
> >
> > [object fields][field1Offset,field1Length,
> > field2Offset,field2Length,...,fieldNOffset,fieldNLength]
> >
> > where fields order is determined upon the first object serialization and
> > stored in metadata cache which is available on all nodes. Thus, the field
> > lookup is performed as follows:
> >
> > fieldName -> fieldIndex (metadata lookup, O(1)), fieldIndex->fieldOffset
> in
> > footer (O(1)), fieldOffset->fieldValue (O(1)).
> >
> > BTW, I am finalizing the branch with marshaller changes and will send
> this
> > for a preliminary review soon.
> >
> > 2015-07-16 0:55 GMT-07:00 Atri Sharma <[hidden email]>:
> >
> > > Keep in mind that JSONB's performance comes from the fact that it uses
> > > server encoding, is binary represented and can have GIN indexes on top
> of
> > > it. Not sure if Ignite's marshalling approach keeps those features as
> > well.
> > >
> > > On Thu, Jul 16, 2015 at 1:20 PM, Sergi Vladykin <
> > [hidden email]>
> > > wrote:
> > >
> > > > HSTORE and JSONB appeared to have similar format in Postgresql
> (because
> > > > they was developed by the same people). They noticed that they
> switched
> > > off
> > > > of using key length sorting because they sometimes need
> lexicographical
> > > > order but this is irrelevant for us.
> > > >
> > > > Sergi
> > > >
> > > > 2015-07-16 10:43 GMT+03:00 Atri Sharma <[hidden email]>:
> > > >
> > > > > Are you referring to JSONB here?
> > > > >
> > > > > On Thu, Jul 16, 2015 at 1:10 PM, Sergi Vladykin <
> > > > [hidden email]>
> > > > > wrote:
> > > > >
> > > > > > Guys, specially Alexey G.
> > > > > >
> > > > > > I've attended PostgreSQL conference and there was a talk about
> > > > > unstructured
> > > > > > data format.
> > > > > > They had an interesting idea of serialized layout close enough to
> > > ours,
> > > > > I'm
> > > > > > not sure how much this is different from our approach and if we
> can
> > > use
> > > > > > some ideas from it but anywaus it looks really promising to me
> and
> > I
> > > > want
> > > > > > to share.
> > > > > >
> > > > > > The structure basically is the following:
> > > > > >
> > > > > > [key headers] [keys] [values]
> > > > > >
> > > > > > Key headers are [key offset, key length] so they are of a fixed
> > > length.
> > > > > >
> > > > > > The cool idea here is that keys and respectively the key headers
> > > sorted
> > > > > by
> > > > > > (key length, key) so that you can do a lookup first by fast
> picking
> > > key
> > > > > of
> > > > > > the needed length without looking at keys at all and then pick an
> > > exact
> > > > > > key. Both searches can be done with fast scan if there are small
> > > number
> > > > > of
> > > > > > keys and binary search for a larger number of keys.
> > > > > >
> > > > > > Alexey G., could you please compare this to our new marshalling
> > > > approach
> > > > > > you are about to merge?
> > > > > > BTW, it would be nice if you will describe it in details here.
> > > > > >
> > > > > > Sergi
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Regards,
> > > > >
> > > > > Atri
> > > > > *l'apprenant*
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Regards,
> > >
> > > Atri
> > > *l'apprenant*
> > >
> >
>

Atri Sharma

Re: Unstructured object format.

So does that mean that local hashmap is not controlled with all the heavy
locks that are present around the cache?
On 22 Jul 2015 07:31, "Alexey Goncharuk" <[hidden email]> wrote:

> Metadata cache access is backed by a local hash map, so the real cost is a
> String object hashcode which is cached in the String object and a hashmap
> lookup by an integer key.
>
> On the other hand, the marshaller is still pluggable and after the ticket
> is completed, it should be fairly easy to implement this approach and
> compare performance.
>
> --Alexey
>
> 2015-07-21 10:01 GMT-07:00 Sergi Vladykin <[hidden email]>:
>
> > I think O(N) reasoning does not make a real sense here since N is always
> > small, lets not fool ourselves.
> > To my mind operation cost of cache access (with all busy locks...),
> > hashCode/equals and stuff like that has much bigger impact here.
> > Do we still have a pluggable marshaller? Can my approach be implemented
> > separately?
> >
> > Sergi
> >
> >
> >
> >
> > 2015-07-21 9:14 GMT+03:00 Alexey Goncharuk <[hidden email]>:
> >
> > > Currently an index-enabled serialized object form has the following
> > layout
> > > (simplified):
> > >
> > > [object fields][field1Offset,field1Length,
> > > field2Offset,field2Length,...,fieldNOffset,fieldNLength]
> > >
> > > where fields order is determined upon the first object serialization
> and
> > > stored in metadata cache which is available on all nodes. Thus, the
> field
> > > lookup is performed as follows:
> > >
> > > fieldName -> fieldIndex (metadata lookup, O(1)),
> fieldIndex->fieldOffset
> > in
> > > footer (O(1)), fieldOffset->fieldValue (O(1)).
> > >
> > > BTW, I am finalizing the branch with marshaller changes and will send
> > this
> > > for a preliminary review soon.
> > >
> > > 2015-07-16 0:55 GMT-07:00 Atri Sharma <[hidden email]>:
> > >
> > > > Keep in mind that JSONB's performance comes from the fact that it
> uses
> > > > server encoding, is binary represented and can have GIN indexes on
> top
> > of
> > > > it. Not sure if Ignite's marshalling approach keeps those features as
> > > well.
> > > >
> > > > On Thu, Jul 16, 2015 at 1:20 PM, Sergi Vladykin <
> > > [hidden email]>
> > > > wrote:
> > > >
> > > > > HSTORE and JSONB appeared to have similar format in Postgresql
> > (because
> > > > > they was developed by the same people). They noticed that they
> > switched
> > > > off
> > > > > of using key length sorting because they sometimes need
> > lexicographical
> > > > > order but this is irrelevant for us.
> > > > >
> > > > > Sergi
> > > > >
> > > > > 2015-07-16 10:43 GMT+03:00 Atri Sharma <[hidden email]>:
> > > > >
> > > > > > Are you referring to JSONB here?
> > > > > >
> > > > > > On Thu, Jul 16, 2015 at 1:10 PM, Sergi Vladykin <
> > > > > [hidden email]>
> > > > > > wrote:
> > > > > >
> > > > > > > Guys, specially Alexey G.
> > > > > > >
> > > > > > > I've attended PostgreSQL conference and there was a talk about
> > > > > > unstructured
> > > > > > > data format.
> > > > > > > They had an interesting idea of serialized layout close enough
> to
> > > > ours,
> > > > > > I'm
> > > > > > > not sure how much this is different from our approach and if we
> > can
> > > > use
> > > > > > > some ideas from it but anywaus it looks really promising to me
> > and
> > > I
> > > > > want
> > > > > > > to share.
> > > > > > >
> > > > > > > The structure basically is the following:
> > > > > > >
> > > > > > > [key headers] [keys] [values]
> > > > > > >
> > > > > > > Key headers are [key offset, key length] so they are of a fixed
> > > > length.
> > > > > > >
> > > > > > > The cool idea here is that keys and respectively the key
> headers
> > > > sorted
> > > > > > by
> > > > > > > (key length, key) so that you can do a lookup first by fast
> > picking
> > > > key
> > > > > > of
> > > > > > > the needed length without looking at keys at all and then pick
> an
> > > > exact
> > > > > > > key. Both searches can be done with fast scan if there are
> small
> > > > number
> > > > > > of
> > > > > > > keys and binary search for a larger number of keys.
> > > > > > >
> > > > > > > Alexey G., could you please compare this to our new marshalling
> > > > > approach
> > > > > > > you are about to merge?
> > > > > > > BTW, it would be nice if you will describe it in details here.
> > > > > > >
> > > > > > > Sergi
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Regards,
> > > > > >
> > > > > > Atri
> > > > > > *l'apprenant*
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Regards,
> > > >
> > > > Atri
> > > > *l'apprenant*
> > > >
> > >
> >
>

Alexey Goncharuk

Re: Unstructured object format.

2015-07-21 22:15 GMT-07:00 Atri Sharma <[hidden email]>:

> So does that mean that local hashmap is not controlled with all the heavy
> locks that are present around the cache?
>

Metadata cache is basically populated once and then never changes, so only
first get goes to the cache which is guarded by gateways and other logic,
then result is cached in the local map and all ongoing gets are done from
that map.

dsetrakyan

Re: Unstructured object format.

In reply to this post by Atri Sharma

On Tue, Jul 21, 2015 at 10:15 PM, Atri Sharma <[hidden email]> wrote:

> So does that mean that local hashmap is not controlled with all the heavy
> locks that are present around the cache?
>

Yes Atri, you are right. The data is stored in local hash map to avoid
touching a distributed cache whenever serializing objects.

> On 22 Jul 2015 07:31, "Alexey Goncharuk" <[hidden email]>
> wrote:
>
> > Metadata cache access is backed by a local hash map, so the real cost is
> a
> > String object hashcode which is cached in the String object and a hashmap
> > lookup by an integer key.
> >
> > On the other hand, the marshaller is still pluggable and after the ticket
> > is completed, it should be fairly easy to implement this approach and
> > compare performance.
> >
> > --Alexey
> >
> > 2015-07-21 10:01 GMT-07:00 Sergi Vladykin <[hidden email]>:
> >
> > > I think O(N) reasoning does not make a real sense here since N is
> always
> > > small, lets not fool ourselves.
> > > To my mind operation cost of cache access (with all busy locks...),
> > > hashCode/equals and stuff like that has much bigger impact here.
> > > Do we still have a pluggable marshaller? Can my approach be implemented
> > > separately?
> > >
> > > Sergi
> > >
> > >
> > >
> > >
> > > 2015-07-21 9:14 GMT+03:00 Alexey Goncharuk <[hidden email]
> >:
> > >
> > > > Currently an index-enabled serialized object form has the following
> > > layout
> > > > (simplified):
> > > >
> > > > [object fields][field1Offset,field1Length,
> > > > field2Offset,field2Length,...,fieldNOffset,fieldNLength]
> > > >
> > > > where fields order is determined upon the first object serialization
> > and
> > > > stored in metadata cache which is available on all nodes. Thus, the
> > field
> > > > lookup is performed as follows:
> > > >
> > > > fieldName -> fieldIndex (metadata lookup, O(1)),
> > fieldIndex->fieldOffset
> > > in
> > > > footer (O(1)), fieldOffset->fieldValue (O(1)).
> > > >
> > > > BTW, I am finalizing the branch with marshaller changes and will send
> > > this
> > > > for a preliminary review soon.
> > > >
> > > > 2015-07-16 0:55 GMT-07:00 Atri Sharma <[hidden email]>:
> > > >
> > > > > Keep in mind that JSONB's performance comes from the fact that it
> > uses
> > > > > server encoding, is binary represented and can have GIN indexes on
> > top
> > > of
> > > > > it. Not sure if Ignite's marshalling approach keeps those features
> as
> > > > well.
> > > > >
> > > > > On Thu, Jul 16, 2015 at 1:20 PM, Sergi Vladykin <
> > > > [hidden email]>
> > > > > wrote:
> > > > >
> > > > > > HSTORE and JSONB appeared to have similar format in Postgresql
> > > (because
> > > > > > they was developed by the same people). They noticed that they
> > > switched
> > > > > off
> > > > > > of using key length sorting because they sometimes need
> > > lexicographical
> > > > > > order but this is irrelevant for us.
> > > > > >
> > > > > > Sergi
> > > > > >
> > > > > > 2015-07-16 10:43 GMT+03:00 Atri Sharma <[hidden email]>:
> > > > > >
> > > > > > > Are you referring to JSONB here?
> > > > > > >
> > > > > > > On Thu, Jul 16, 2015 at 1:10 PM, Sergi Vladykin <
> > > > > > [hidden email]>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Guys, specially Alexey G.
> > > > > > > >
> > > > > > > > I've attended PostgreSQL conference and there was a talk
> about
> > > > > > > unstructured
> > > > > > > > data format.
> > > > > > > > They had an interesting idea of serialized layout close
> enough
> > to
> > > > > ours,
> > > > > > > I'm
> > > > > > > > not sure how much this is different from our approach and if
> we
> > > can
> > > > > use
> > > > > > > > some ideas from it but anywaus it looks really promising to
> me
> > > and
> > > > I
> > > > > > want
> > > > > > > > to share.
> > > > > > > >
> > > > > > > > The structure basically is the following:
> > > > > > > >
> > > > > > > > [key headers] [keys] [values]
> > > > > > > >
> > > > > > > > Key headers are [key offset, key length] so they are of a
> fixed
> > > > > length.
> > > > > > > >
> > > > > > > > The cool idea here is that keys and respectively the key
> > headers
> > > > > sorted
> > > > > > > by
> > > > > > > > (key length, key) so that you can do a lookup first by fast
> > > picking
> > > > > key
> > > > > > > of
> > > > > > > > the needed length without looking at keys at all and then
> pick
> > an
> > > > > exact
> > > > > > > > key. Both searches can be done with fast scan if there are
> > small
> > > > > number
> > > > > > > of
> > > > > > > > keys and binary search for a larger number of keys.
> > > > > > > >
> > > > > > > > Alexey G., could you please compare this to our new
> marshalling
> > > > > > approach
> > > > > > > > you are about to merge?
> > > > > > > > BTW, it would be nice if you will describe it in details
> here.
> > > > > > > >
> > > > > > > > Sergi
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Regards,
> > > > > > >
> > > > > > > Atri
> > > > > > > *l'apprenant*
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Regards,
> > > > >
> > > > > Atri
> > > > > *l'apprenant*
> > > > >
> > > >
> > >
> >
>

Sergi

Re: Unstructured object format.

Ok, looks good to me.

Sergi

2015-07-22 10:46 GMT+03:00 Dmitriy Setrakyan <[hidden email]>:

> On Tue, Jul 21, 2015 at 10:15 PM, Atri Sharma <[hidden email]> wrote:
>
> > So does that mean that local hashmap is not controlled with all the heavy
> > locks that are present around the cache?
> >
>
> Yes Atri, you are right. The data is stored in local hash map to avoid
> touching a distributed cache whenever serializing objects.
>
>
> > On 22 Jul 2015 07:31, "Alexey Goncharuk" <[hidden email]>
> > wrote:
> >
> > > Metadata cache access is backed by a local hash map, so the real cost
> is
> > a
> > > String object hashcode which is cached in the String object and a
> hashmap
> > > lookup by an integer key.
> > >
> > > On the other hand, the marshaller is still pluggable and after the
> ticket
> > > is completed, it should be fairly easy to implement this approach and
> > > compare performance.
> > >
> > > --Alexey
> > >
> > > 2015-07-21 10:01 GMT-07:00 Sergi Vladykin <[hidden email]>:
> > >
> > > > I think O(N) reasoning does not make a real sense here since N is
> > always
> > > > small, lets not fool ourselves.
> > > > To my mind operation cost of cache access (with all busy locks...),
> > > > hashCode/equals and stuff like that has much bigger impact here.
> > > > Do we still have a pluggable marshaller? Can my approach be
> implemented
> > > > separately?
> > > >
> > > > Sergi
> > > >
> > > >
> > > >
> > > >
> > > > 2015-07-21 9:14 GMT+03:00 Alexey Goncharuk <
> [hidden email]
> > >:
> > > >
> > > > > Currently an index-enabled serialized object form has the following
> > > > layout
> > > > > (simplified):
> > > > >
> > > > > [object fields][field1Offset,field1Length,
> > > > > field2Offset,field2Length,...,fieldNOffset,fieldNLength]
> > > > >
> > > > > where fields order is determined upon the first object
> serialization
> > > and
> > > > > stored in metadata cache which is available on all nodes. Thus, the
> > > field
> > > > > lookup is performed as follows:
> > > > >
> > > > > fieldName -> fieldIndex (metadata lookup, O(1)),
> > > fieldIndex->fieldOffset
> > > > in
> > > > > footer (O(1)), fieldOffset->fieldValue (O(1)).
> > > > >
> > > > > BTW, I am finalizing the branch with marshaller changes and will
> send
> > > > this
> > > > > for a preliminary review soon.
> > > > >
> > > > > 2015-07-16 0:55 GMT-07:00 Atri Sharma <[hidden email]>:
> > > > >
> > > > > > Keep in mind that JSONB's performance comes from the fact that it
> > > uses
> > > > > > server encoding, is binary represented and can have GIN indexes
> on
> > > top
> > > > of
> > > > > > it. Not sure if Ignite's marshalling approach keeps those
> features
> > as
> > > > > well.
> > > > > >
> > > > > > On Thu, Jul 16, 2015 at 1:20 PM, Sergi Vladykin <
> > > > > [hidden email]>
> > > > > > wrote:
> > > > > >
> > > > > > > HSTORE and JSONB appeared to have similar format in Postgresql
> > > > (because
> > > > > > > they was developed by the same people). They noticed that they
> > > > switched
> > > > > > off
> > > > > > > of using key length sorting because they sometimes need
> > > > lexicographical
> > > > > > > order but this is irrelevant for us.
> > > > > > >
> > > > > > > Sergi
> > > > > > >
> > > > > > > 2015-07-16 10:43 GMT+03:00 Atri Sharma <[hidden email]>:
> > > > > > >
> > > > > > > > Are you referring to JSONB here?
> > > > > > > >
> > > > > > > > On Thu, Jul 16, 2015 at 1:10 PM, Sergi Vladykin <
> > > > > > > [hidden email]>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Guys, specially Alexey G.
> > > > > > > > >
> > > > > > > > > I've attended PostgreSQL conference and there was a talk
> > about
> > > > > > > > unstructured
> > > > > > > > > data format.
> > > > > > > > > They had an interesting idea of serialized layout close
> > enough
> > > to
> > > > > > ours,
> > > > > > > > I'm
> > > > > > > > > not sure how much this is different from our approach and
> if
> > we
> > > > can
> > > > > > use
> > > > > > > > > some ideas from it but anywaus it looks really promising to
> > me
> > > > and
> > > > > I
> > > > > > > want
> > > > > > > > > to share.
> > > > > > > > >
> > > > > > > > > The structure basically is the following:
> > > > > > > > >
> > > > > > > > > [key headers] [keys] [values]
> > > > > > > > >
> > > > > > > > > Key headers are [key offset, key length] so they are of a
> > fixed
> > > > > > length.
> > > > > > > > >
> > > > > > > > > The cool idea here is that keys and respectively the key
> > > headers
> > > > > > sorted
> > > > > > > > by
> > > > > > > > > (key length, key) so that you can do a lookup first by fast
> > > > picking
> > > > > > key
> > > > > > > > of
> > > > > > > > > the needed length without looking at keys at all and then
> > pick
> > > an
> > > > > > exact
> > > > > > > > > key. Both searches can be done with fast scan if there are
> > > small
> > > > > > number
> > > > > > > > of
> > > > > > > > > keys and binary search for a larger number of keys.
> > > > > > > > >
> > > > > > > > > Alexey G., could you please compare this to our new
> > marshalling
> > > > > > > approach
> > > > > > > > > you are about to merge?
> > > > > > > > > BTW, it would be nice if you will describe it in details
> > here.
> > > > > > > > >
> > > > > > > > > Sergi
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Regards,
> > > > > > > >
> > > > > > > > Atri
> > > > > > > > *l'apprenant*
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Regards,
> > > > > >
> > > > > > Atri
> > > > > > *l'apprenant*
> > > > > >
> > > > >
> > > >
> > >
> >
>

Atri Sharma

Re: Unstructured object format.

+1 from me too

On Wed, Jul 22, 2015 at 3:34 PM, Sergi Vladykin <[hidden email]>
wrote:

> Ok, looks good to me.
>
> Sergi
>
> 2015-07-22 10:46 GMT+03:00 Dmitriy Setrakyan <[hidden email]>:
>
> > On Tue, Jul 21, 2015 at 10:15 PM, Atri Sharma <[hidden email]>
> wrote:
> >
> > > So does that mean that local hashmap is not controlled with all the
> heavy
> > > locks that are present around the cache?
> > >
> >
> > Yes Atri, you are right. The data is stored in local hash map to avoid
> > touching a distributed cache whenever serializing objects.
> >
> >
> > > On 22 Jul 2015 07:31, "Alexey Goncharuk" <[hidden email]>
> > > wrote:
> > >
> > > > Metadata cache access is backed by a local hash map, so the real cost
> > is
> > > a
> > > > String object hashcode which is cached in the String object and a
> > hashmap
> > > > lookup by an integer key.
> > > >
> > > > On the other hand, the marshaller is still pluggable and after the
> > ticket
> > > > is completed, it should be fairly easy to implement this approach and
> > > > compare performance.
> > > >
> > > > --Alexey
> > > >
> > > > 2015-07-21 10:01 GMT-07:00 Sergi Vladykin <[hidden email]
> >:
> > > >
> > > > > I think O(N) reasoning does not make a real sense here since N is
> > > always
> > > > > small, lets not fool ourselves.
> > > > > To my mind operation cost of cache access (with all busy locks...),
> > > > > hashCode/equals and stuff like that has much bigger impact here.
> > > > > Do we still have a pluggable marshaller? Can my approach be
> > implemented
> > > > > separately?
> > > > >
> > > > > Sergi
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > 2015-07-21 9:14 GMT+03:00 Alexey Goncharuk <
> > [hidden email]
> > > >:
> > > > >
> > > > > > Currently an index-enabled serialized object form has the
> following
> > > > > layout
> > > > > > (simplified):
> > > > > >
> > > > > > [object fields][field1Offset,field1Length,
> > > > > > field2Offset,field2Length,...,fieldNOffset,fieldNLength]
> > > > > >
> > > > > > where fields order is determined upon the first object
> > serialization
> > > > and
> > > > > > stored in metadata cache which is available on all nodes. Thus,
> the
> > > > field
> > > > > > lookup is performed as follows:
> > > > > >
> > > > > > fieldName -> fieldIndex (metadata lookup, O(1)),
> > > > fieldIndex->fieldOffset
> > > > > in
> > > > > > footer (O(1)), fieldOffset->fieldValue (O(1)).
> > > > > >
> > > > > > BTW, I am finalizing the branch with marshaller changes and will
> > send
> > > > > this
> > > > > > for a preliminary review soon.
> > > > > >
> > > > > > 2015-07-16 0:55 GMT-07:00 Atri Sharma <[hidden email]>:
> > > > > >
> > > > > > > Keep in mind that JSONB's performance comes from the fact that
> it
> > > > uses
> > > > > > > server encoding, is binary represented and can have GIN indexes
> > on
> > > > top
> > > > > of
> > > > > > > it. Not sure if Ignite's marshalling approach keeps those
> > features
> > > as
> > > > > > well.
> > > > > > >
> > > > > > > On Thu, Jul 16, 2015 at 1:20 PM, Sergi Vladykin <
> > > > > > [hidden email]>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > HSTORE and JSONB appeared to have similar format in
> Postgresql
> > > > > (because
> > > > > > > > they was developed by the same people). They noticed that
> they
> > > > > switched
> > > > > > > off
> > > > > > > > of using key length sorting because they sometimes need
> > > > > lexicographical
> > > > > > > > order but this is irrelevant for us.
> > > > > > > >
> > > > > > > > Sergi
> > > > > > > >
> > > > > > > > 2015-07-16 10:43 GMT+03:00 Atri Sharma <[hidden email]
> >:
> > > > > > > >
> > > > > > > > > Are you referring to JSONB here?
> > > > > > > > >
> > > > > > > > > On Thu, Jul 16, 2015 at 1:10 PM, Sergi Vladykin <
> > > > > > > > [hidden email]>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Guys, specially Alexey G.
> > > > > > > > > >
> > > > > > > > > > I've attended PostgreSQL conference and there was a talk
> > > about
> > > > > > > > > unstructured
> > > > > > > > > > data format.
> > > > > > > > > > They had an interesting idea of serialized layout close
> > > enough
> > > > to
> > > > > > > ours,
> > > > > > > > > I'm
> > > > > > > > > > not sure how much this is different from our approach and
> > if
> > > we
> > > > > can
> > > > > > > use
> > > > > > > > > > some ideas from it but anywaus it looks really promising
> to
> > > me
> > > > > and
> > > > > > I
> > > > > > > > want
> > > > > > > > > > to share.
> > > > > > > > > >
> > > > > > > > > > The structure basically is the following:
> > > > > > > > > >
> > > > > > > > > > [key headers] [keys] [values]
> > > > > > > > > >
> > > > > > > > > > Key headers are [key offset, key length] so they are of a
> > > fixed
> > > > > > > length.
> > > > > > > > > >
> > > > > > > > > > The cool idea here is that keys and respectively the key
> > > > headers
> > > > > > > sorted
> > > > > > > > > by
> > > > > > > > > > (key length, key) so that you can do a lookup first by
> fast
> > > > > picking
> > > > > > > key
> > > > > > > > > of
> > > > > > > > > > the needed length without looking at keys at all and then
> > > pick
> > > > an
> > > > > > > exact
> > > > > > > > > > key. Both searches can be done with fast scan if there
> are
> > > > small
> > > > > > > number
> > > > > > > > > of
> > > > > > > > > > keys and binary search for a larger number of keys.
> > > > > > > > > >
> > > > > > > > > > Alexey G., could you please compare this to our new
> > > marshalling
> > > > > > > > approach
> > > > > > > > > > you are about to merge?
> > > > > > > > > > BTW, it would be nice if you will describe it in details
> > > here.
> > > > > > > > > >
> > > > > > > > > > Sergi
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Regards,
> > > > > > > > >
> > > > > > > > > Atri
> > > > > > > > > *l'apprenant*
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Regards,
> > > > > > >
> > > > > > > Atri
> > > > > > > *l'apprenant*
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

--
Regards,

Atri
*l'apprenant*

Atri Sharma

Re: Unstructured object format.

+1 as well.

I wonder if it makes sense to further enhance the header fields to have
more metadata readily available (so that we can support more JSONB like
operators in a much efficient manner).

On Wed, Jul 22, 2015 at 3:37 PM, Atri Sharma <[hidden email]> wrote:

> +1 from me too
>
> On Wed, Jul 22, 2015 at 3:34 PM, Sergi Vladykin <[hidden email]>
> wrote:
>
>> Ok, looks good to me.
>>
>> Sergi
>>
>> 2015-07-22 10:46 GMT+03:00 Dmitriy Setrakyan <[hidden email]>:
>>
>> > On Tue, Jul 21, 2015 at 10:15 PM, Atri Sharma <[hidden email]>
>> wrote:
>> >
>> > > So does that mean that local hashmap is not controlled with all the
>> heavy
>> > > locks that are present around the cache?
>> > >
>> >
>> > Yes Atri, you are right. The data is stored in local hash map to avoid
>> > touching a distributed cache whenever serializing objects.
>> >
>> >
>> > > On 22 Jul 2015 07:31, "Alexey Goncharuk" <[hidden email]>
>> > > wrote:
>> > >
>> > > > Metadata cache access is backed by a local hash map, so the real
>> cost
>> > is
>> > > a
>> > > > String object hashcode which is cached in the String object and a
>> > hashmap
>> > > > lookup by an integer key.
>> > > >
>> > > > On the other hand, the marshaller is still pluggable and after the
>> > ticket
>> > > > is completed, it should be fairly easy to implement this approach
>> and
>> > > > compare performance.
>> > > >
>> > > > --Alexey
>> > > >
>> > > > 2015-07-21 10:01 GMT-07:00 Sergi Vladykin <[hidden email]
>> >:
>> > > >
>> > > > > I think O(N) reasoning does not make a real sense here since N is
>> > > always
>> > > > > small, lets not fool ourselves.
>> > > > > To my mind operation cost of cache access (with all busy
>> locks...),
>> > > > > hashCode/equals and stuff like that has much bigger impact here.
>> > > > > Do we still have a pluggable marshaller? Can my approach be
>> > implemented
>> > > > > separately?
>> > > > >
>> > > > > Sergi
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > 2015-07-21 9:14 GMT+03:00 Alexey Goncharuk <
>> > [hidden email]
>> > > >:
>> > > > >
>> > > > > > Currently an index-enabled serialized object form has the
>> following
>> > > > > layout
>> > > > > > (simplified):
>> > > > > >
>> > > > > > [object fields][field1Offset,field1Length,
>> > > > > > field2Offset,field2Length,...,fieldNOffset,fieldNLength]
>> > > > > >
>> > > > > > where fields order is determined upon the first object
>> > serialization
>> > > > and
>> > > > > > stored in metadata cache which is available on all nodes. Thus,
>> the
>> > > > field
>> > > > > > lookup is performed as follows:
>> > > > > >
>> > > > > > fieldName -> fieldIndex (metadata lookup, O(1)),
>> > > > fieldIndex->fieldOffset
>> > > > > in
>> > > > > > footer (O(1)), fieldOffset->fieldValue (O(1)).
>> > > > > >
>> > > > > > BTW, I am finalizing the branch with marshaller changes and will
>> > send
>> > > > > this
>> > > > > > for a preliminary review soon.
>> > > > > >
>> > > > > > 2015-07-16 0:55 GMT-07:00 Atri Sharma <[hidden email]>:
>> > > > > >
>> > > > > > > Keep in mind that JSONB's performance comes from the fact
>> that it
>> > > > uses
>> > > > > > > server encoding, is binary represented and can have GIN
>> indexes
>> > on
>> > > > top
>> > > > > of
>> > > > > > > it. Not sure if Ignite's marshalling approach keeps those
>> > features
>> > > as
>> > > > > > well.
>> > > > > > >
>> > > > > > > On Thu, Jul 16, 2015 at 1:20 PM, Sergi Vladykin <
>> > > > > > [hidden email]>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > HSTORE and JSONB appeared to have similar format in
>> Postgresql
>> > > > > (because
>> > > > > > > > they was developed by the same people). They noticed that
>> they
>> > > > > switched
>> > > > > > > off
>> > > > > > > > of using key length sorting because they sometimes need
>> > > > > lexicographical
>> > > > > > > > order but this is irrelevant for us.
>> > > > > > > >
>> > > > > > > > Sergi
>> > > > > > > >
>> > > > > > > > 2015-07-16 10:43 GMT+03:00 Atri Sharma <[hidden email]
>> >:
>> > > > > > > >
>> > > > > > > > > Are you referring to JSONB here?
>> > > > > > > > >
>> > > > > > > > > On Thu, Jul 16, 2015 at 1:10 PM, Sergi Vladykin <
>> > > > > > > > [hidden email]>
>>
>> > > > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > Guys, specially Alexey G.
>> > > > > > > > > >
>> > > > > > > > > > I've attended PostgreSQL conference and there was a talk
>> > > about
>> > > > > > > > > unstructured
>> > > > > > > > > > data format.
>> > > > > > > > > > They had an interesting idea of serialized layout close
>> > > enough
>> > > > to
>> > > > > > > ours,
>> > > > > > > > > I'm
>> > > > > > > > > > not sure how much this is different from our approach
>> and
>> > if
>> > > we
>> > > > > can
>> > > > > > > use
>> > > > > > > > > > some ideas from it but anywaus it looks really
>> promising to
>> > > me
>> > > > > and
>> > > > > > I
>> > > > > > > > want
>> > > > > > > > > > to share.
>> > > > > > > > > >
>> > > > > > > > > > The structure basically is the following:
>> > > > > > > > > >
>> > > > > > > > > > [key headers] [keys] [values]
>> > > > > > > > > >
>> > > > > > > > > > Key headers are [key offset, key length] so they are of
>> a
>> > > fixed
>> > > > > > > length.
>> > > > > > > > > >
>> > > > > > > > > > The cool idea here is that keys and respectively the key
>> > > > headers
>> > > > > > > sorted
>> > > > > > > > > by
>> > > > > > > > > > (key length, key) so that you can do a lookup first by
>> fast
>> > > > > picking
>> > > > > > > key
>> > > > > > > > > of
>> > > > > > > > > > the needed length without looking at keys at all and
>> then
>> > > pick
>> > > > an
>> > > > > > > exact
>> > > > > > > > > > key. Both searches can be done with fast scan if there
>> are
>> > > > small
>> > > > > > > number
>> > > > > > > > > of
>> > > > > > > > > > keys and binary search for a larger number of keys.
>> > > > > > > > > >
>> > > > > > > > > > Alexey G., could you please compare this to our new
>> > > marshalling
>> > > > > > > > approach
>> > > > > > > > > > you are about to merge?
>> > > > > > > > > > BTW, it would be nice if you will describe it in details
>> > > here.
>> > > > > > > > > >
>> > > > > > > > > > Sergi
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > --
>> > > > > > > > > Regards,
>> > > > > > > > >
>> > > > > > > > > Atri
>> > > > > > > > > *l'apprenant*
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > --
>> > > > > > > Regards,
>> > > > > > >
>> > > > > > > Atri
>> > > > > > > *l'apprenant*
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
>
> --
> Regards,
>
> Atri
> *l'apprenant*
>

--
Regards,

Atri
*l'apprenant*