Guys, specially Alexey G.
I've attended PostgreSQL conference and there was a talk about unstructured data format. They had an interesting idea of serialized layout close enough to ours, I'm not sure how much this is different from our approach and if we can use some ideas from it but anywaus it looks really promising to me and I want to share. The structure basically is the following: [key headers] [keys] [values] Key headers are [key offset, key length] so they are of a fixed length. The cool idea here is that keys and respectively the key headers sorted by (key length, key) so that you can do a lookup first by fast picking key of the needed length without looking at keys at all and then pick an exact key. Both searches can be done with fast scan if there are small number of keys and binary search for a larger number of keys. Alexey G., could you please compare this to our new marshalling approach you are about to merge? BTW, it would be nice if you will describe it in details here. Sergi |
Are you referring to JSONB here?
On Thu, Jul 16, 2015 at 1:10 PM, Sergi Vladykin <[hidden email]> wrote: > Guys, specially Alexey G. > > I've attended PostgreSQL conference and there was a talk about unstructured > data format. > They had an interesting idea of serialized layout close enough to ours, I'm > not sure how much this is different from our approach and if we can use > some ideas from it but anywaus it looks really promising to me and I want > to share. > > The structure basically is the following: > > [key headers] [keys] [values] > > Key headers are [key offset, key length] so they are of a fixed length. > > The cool idea here is that keys and respectively the key headers sorted by > (key length, key) so that you can do a lookup first by fast picking key of > the needed length without looking at keys at all and then pick an exact > key. Both searches can be done with fast scan if there are small number of > keys and binary search for a larger number of keys. > > Alexey G., could you please compare this to our new marshalling approach > you are about to merge? > BTW, it would be nice if you will describe it in details here. > > Sergi > -- Regards, Atri *l'apprenant* |
HSTORE and JSONB appeared to have similar format in Postgresql (because
they was developed by the same people). They noticed that they switched off of using key length sorting because they sometimes need lexicographical order but this is irrelevant for us. Sergi 2015-07-16 10:43 GMT+03:00 Atri Sharma <[hidden email]>: > Are you referring to JSONB here? > > On Thu, Jul 16, 2015 at 1:10 PM, Sergi Vladykin <[hidden email]> > wrote: > > > Guys, specially Alexey G. > > > > I've attended PostgreSQL conference and there was a talk about > unstructured > > data format. > > They had an interesting idea of serialized layout close enough to ours, > I'm > > not sure how much this is different from our approach and if we can use > > some ideas from it but anywaus it looks really promising to me and I want > > to share. > > > > The structure basically is the following: > > > > [key headers] [keys] [values] > > > > Key headers are [key offset, key length] so they are of a fixed length. > > > > The cool idea here is that keys and respectively the key headers sorted > by > > (key length, key) so that you can do a lookup first by fast picking key > of > > the needed length without looking at keys at all and then pick an exact > > key. Both searches can be done with fast scan if there are small number > of > > keys and binary search for a larger number of keys. > > > > Alexey G., could you please compare this to our new marshalling approach > > you are about to merge? > > BTW, it would be nice if you will describe it in details here. > > > > Sergi > > > > > > -- > Regards, > > Atri > *l'apprenant* > |
Keep in mind that JSONB's performance comes from the fact that it uses
server encoding, is binary represented and can have GIN indexes on top of it. Not sure if Ignite's marshalling approach keeps those features as well. On Thu, Jul 16, 2015 at 1:20 PM, Sergi Vladykin <[hidden email]> wrote: > HSTORE and JSONB appeared to have similar format in Postgresql (because > they was developed by the same people). They noticed that they switched off > of using key length sorting because they sometimes need lexicographical > order but this is irrelevant for us. > > Sergi > > 2015-07-16 10:43 GMT+03:00 Atri Sharma <[hidden email]>: > > > Are you referring to JSONB here? > > > > On Thu, Jul 16, 2015 at 1:10 PM, Sergi Vladykin < > [hidden email]> > > wrote: > > > > > Guys, specially Alexey G. > > > > > > I've attended PostgreSQL conference and there was a talk about > > unstructured > > > data format. > > > They had an interesting idea of serialized layout close enough to ours, > > I'm > > > not sure how much this is different from our approach and if we can use > > > some ideas from it but anywaus it looks really promising to me and I > want > > > to share. > > > > > > The structure basically is the following: > > > > > > [key headers] [keys] [values] > > > > > > Key headers are [key offset, key length] so they are of a fixed length. > > > > > > The cool idea here is that keys and respectively the key headers sorted > > by > > > (key length, key) so that you can do a lookup first by fast picking key > > of > > > the needed length without looking at keys at all and then pick an exact > > > key. Both searches can be done with fast scan if there are small number > > of > > > keys and binary search for a larger number of keys. > > > > > > Alexey G., could you please compare this to our new marshalling > approach > > > you are about to merge? > > > BTW, it would be nice if you will describe it in details here. > > > > > > Sergi > > > > > > > > > > > -- > > Regards, > > > > Atri > > *l'apprenant* > > > -- Regards, Atri *l'apprenant* |
Currently an index-enabled serialized object form has the following layout
(simplified): [object fields][field1Offset,field1Length, field2Offset,field2Length,...,fieldNOffset,fieldNLength] where fields order is determined upon the first object serialization and stored in metadata cache which is available on all nodes. Thus, the field lookup is performed as follows: fieldName -> fieldIndex (metadata lookup, O(1)), fieldIndex->fieldOffset in footer (O(1)), fieldOffset->fieldValue (O(1)). BTW, I am finalizing the branch with marshaller changes and will send this for a preliminary review soon. 2015-07-16 0:55 GMT-07:00 Atri Sharma <[hidden email]>: > Keep in mind that JSONB's performance comes from the fact that it uses > server encoding, is binary represented and can have GIN indexes on top of > it. Not sure if Ignite's marshalling approach keeps those features as well. > > On Thu, Jul 16, 2015 at 1:20 PM, Sergi Vladykin <[hidden email]> > wrote: > > > HSTORE and JSONB appeared to have similar format in Postgresql (because > > they was developed by the same people). They noticed that they switched > off > > of using key length sorting because they sometimes need lexicographical > > order but this is irrelevant for us. > > > > Sergi > > > > 2015-07-16 10:43 GMT+03:00 Atri Sharma <[hidden email]>: > > > > > Are you referring to JSONB here? > > > > > > On Thu, Jul 16, 2015 at 1:10 PM, Sergi Vladykin < > > [hidden email]> > > > wrote: > > > > > > > Guys, specially Alexey G. > > > > > > > > I've attended PostgreSQL conference and there was a talk about > > > unstructured > > > > data format. > > > > They had an interesting idea of serialized layout close enough to > ours, > > > I'm > > > > not sure how much this is different from our approach and if we can > use > > > > some ideas from it but anywaus it looks really promising to me and I > > want > > > > to share. > > > > > > > > The structure basically is the following: > > > > > > > > [key headers] [keys] [values] > > > > > > > > Key headers are [key offset, key length] so they are of a fixed > length. > > > > > > > > The cool idea here is that keys and respectively the key headers > sorted > > > by > > > > (key length, key) so that you can do a lookup first by fast picking > key > > > of > > > > the needed length without looking at keys at all and then pick an > exact > > > > key. Both searches can be done with fast scan if there are small > number > > > of > > > > keys and binary search for a larger number of keys. > > > > > > > > Alexey G., could you please compare this to our new marshalling > > approach > > > > you are about to merge? > > > > BTW, it would be nice if you will describe it in details here. > > > > > > > > Sergi > > > > > > > > > > > > > > > > -- > > > Regards, > > > > > > Atri > > > *l'apprenant* > > > > > > > > > -- > Regards, > > Atri > *l'apprenant* > |
I think O(N) reasoning does not make a real sense here since N is always
small, lets not fool ourselves. To my mind operation cost of cache access (with all busy locks...), hashCode/equals and stuff like that has much bigger impact here. Do we still have a pluggable marshaller? Can my approach be implemented separately? Sergi 2015-07-21 9:14 GMT+03:00 Alexey Goncharuk <[hidden email]>: > Currently an index-enabled serialized object form has the following layout > (simplified): > > [object fields][field1Offset,field1Length, > field2Offset,field2Length,...,fieldNOffset,fieldNLength] > > where fields order is determined upon the first object serialization and > stored in metadata cache which is available on all nodes. Thus, the field > lookup is performed as follows: > > fieldName -> fieldIndex (metadata lookup, O(1)), fieldIndex->fieldOffset in > footer (O(1)), fieldOffset->fieldValue (O(1)). > > BTW, I am finalizing the branch with marshaller changes and will send this > for a preliminary review soon. > > 2015-07-16 0:55 GMT-07:00 Atri Sharma <[hidden email]>: > > > Keep in mind that JSONB's performance comes from the fact that it uses > > server encoding, is binary represented and can have GIN indexes on top of > > it. Not sure if Ignite's marshalling approach keeps those features as > well. > > > > On Thu, Jul 16, 2015 at 1:20 PM, Sergi Vladykin < > [hidden email]> > > wrote: > > > > > HSTORE and JSONB appeared to have similar format in Postgresql (because > > > they was developed by the same people). They noticed that they switched > > off > > > of using key length sorting because they sometimes need lexicographical > > > order but this is irrelevant for us. > > > > > > Sergi > > > > > > 2015-07-16 10:43 GMT+03:00 Atri Sharma <[hidden email]>: > > > > > > > Are you referring to JSONB here? > > > > > > > > On Thu, Jul 16, 2015 at 1:10 PM, Sergi Vladykin < > > > [hidden email]> > > > > wrote: > > > > > > > > > Guys, specially Alexey G. > > > > > > > > > > I've attended PostgreSQL conference and there was a talk about > > > > unstructured > > > > > data format. > > > > > They had an interesting idea of serialized layout close enough to > > ours, > > > > I'm > > > > > not sure how much this is different from our approach and if we can > > use > > > > > some ideas from it but anywaus it looks really promising to me and > I > > > want > > > > > to share. > > > > > > > > > > The structure basically is the following: > > > > > > > > > > [key headers] [keys] [values] > > > > > > > > > > Key headers are [key offset, key length] so they are of a fixed > > length. > > > > > > > > > > The cool idea here is that keys and respectively the key headers > > sorted > > > > by > > > > > (key length, key) so that you can do a lookup first by fast picking > > key > > > > of > > > > > the needed length without looking at keys at all and then pick an > > exact > > > > > key. Both searches can be done with fast scan if there are small > > number > > > > of > > > > > keys and binary search for a larger number of keys. > > > > > > > > > > Alexey G., could you please compare this to our new marshalling > > > approach > > > > > you are about to merge? > > > > > BTW, it would be nice if you will describe it in details here. > > > > > > > > > > Sergi > > > > > > > > > > > > > > > > > > > > > -- > > > > Regards, > > > > > > > > Atri > > > > *l'apprenant* > > > > > > > > > > > > > > > -- > > Regards, > > > > Atri > > *l'apprenant* > > > |
Metadata cache access is backed by a local hash map, so the real cost is a
String object hashcode which is cached in the String object and a hashmap lookup by an integer key. On the other hand, the marshaller is still pluggable and after the ticket is completed, it should be fairly easy to implement this approach and compare performance. --Alexey 2015-07-21 10:01 GMT-07:00 Sergi Vladykin <[hidden email]>: > I think O(N) reasoning does not make a real sense here since N is always > small, lets not fool ourselves. > To my mind operation cost of cache access (with all busy locks...), > hashCode/equals and stuff like that has much bigger impact here. > Do we still have a pluggable marshaller? Can my approach be implemented > separately? > > Sergi > > > > > 2015-07-21 9:14 GMT+03:00 Alexey Goncharuk <[hidden email]>: > > > Currently an index-enabled serialized object form has the following > layout > > (simplified): > > > > [object fields][field1Offset,field1Length, > > field2Offset,field2Length,...,fieldNOffset,fieldNLength] > > > > where fields order is determined upon the first object serialization and > > stored in metadata cache which is available on all nodes. Thus, the field > > lookup is performed as follows: > > > > fieldName -> fieldIndex (metadata lookup, O(1)), fieldIndex->fieldOffset > in > > footer (O(1)), fieldOffset->fieldValue (O(1)). > > > > BTW, I am finalizing the branch with marshaller changes and will send > this > > for a preliminary review soon. > > > > 2015-07-16 0:55 GMT-07:00 Atri Sharma <[hidden email]>: > > > > > Keep in mind that JSONB's performance comes from the fact that it uses > > > server encoding, is binary represented and can have GIN indexes on top > of > > > it. Not sure if Ignite's marshalling approach keeps those features as > > well. > > > > > > On Thu, Jul 16, 2015 at 1:20 PM, Sergi Vladykin < > > [hidden email]> > > > wrote: > > > > > > > HSTORE and JSONB appeared to have similar format in Postgresql > (because > > > > they was developed by the same people). They noticed that they > switched > > > off > > > > of using key length sorting because they sometimes need > lexicographical > > > > order but this is irrelevant for us. > > > > > > > > Sergi > > > > > > > > 2015-07-16 10:43 GMT+03:00 Atri Sharma <[hidden email]>: > > > > > > > > > Are you referring to JSONB here? > > > > > > > > > > On Thu, Jul 16, 2015 at 1:10 PM, Sergi Vladykin < > > > > [hidden email]> > > > > > wrote: > > > > > > > > > > > Guys, specially Alexey G. > > > > > > > > > > > > I've attended PostgreSQL conference and there was a talk about > > > > > unstructured > > > > > > data format. > > > > > > They had an interesting idea of serialized layout close enough to > > > ours, > > > > > I'm > > > > > > not sure how much this is different from our approach and if we > can > > > use > > > > > > some ideas from it but anywaus it looks really promising to me > and > > I > > > > want > > > > > > to share. > > > > > > > > > > > > The structure basically is the following: > > > > > > > > > > > > [key headers] [keys] [values] > > > > > > > > > > > > Key headers are [key offset, key length] so they are of a fixed > > > length. > > > > > > > > > > > > The cool idea here is that keys and respectively the key headers > > > sorted > > > > > by > > > > > > (key length, key) so that you can do a lookup first by fast > picking > > > key > > > > > of > > > > > > the needed length without looking at keys at all and then pick an > > > exact > > > > > > key. Both searches can be done with fast scan if there are small > > > number > > > > > of > > > > > > keys and binary search for a larger number of keys. > > > > > > > > > > > > Alexey G., could you please compare this to our new marshalling > > > > approach > > > > > > you are about to merge? > > > > > > BTW, it would be nice if you will describe it in details here. > > > > > > > > > > > > Sergi > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > Regards, > > > > > > > > > > Atri > > > > > *l'apprenant* > > > > > > > > > > > > > > > > > > > > > -- > > > Regards, > > > > > > Atri > > > *l'apprenant* > > > > > > |
So does that mean that local hashmap is not controlled with all the heavy
locks that are present around the cache? On 22 Jul 2015 07:31, "Alexey Goncharuk" <[hidden email]> wrote: > Metadata cache access is backed by a local hash map, so the real cost is a > String object hashcode which is cached in the String object and a hashmap > lookup by an integer key. > > On the other hand, the marshaller is still pluggable and after the ticket > is completed, it should be fairly easy to implement this approach and > compare performance. > > --Alexey > > 2015-07-21 10:01 GMT-07:00 Sergi Vladykin <[hidden email]>: > > > I think O(N) reasoning does not make a real sense here since N is always > > small, lets not fool ourselves. > > To my mind operation cost of cache access (with all busy locks...), > > hashCode/equals and stuff like that has much bigger impact here. > > Do we still have a pluggable marshaller? Can my approach be implemented > > separately? > > > > Sergi > > > > > > > > > > 2015-07-21 9:14 GMT+03:00 Alexey Goncharuk <[hidden email]>: > > > > > Currently an index-enabled serialized object form has the following > > layout > > > (simplified): > > > > > > [object fields][field1Offset,field1Length, > > > field2Offset,field2Length,...,fieldNOffset,fieldNLength] > > > > > > where fields order is determined upon the first object serialization > and > > > stored in metadata cache which is available on all nodes. Thus, the > field > > > lookup is performed as follows: > > > > > > fieldName -> fieldIndex (metadata lookup, O(1)), > fieldIndex->fieldOffset > > in > > > footer (O(1)), fieldOffset->fieldValue (O(1)). > > > > > > BTW, I am finalizing the branch with marshaller changes and will send > > this > > > for a preliminary review soon. > > > > > > 2015-07-16 0:55 GMT-07:00 Atri Sharma <[hidden email]>: > > > > > > > Keep in mind that JSONB's performance comes from the fact that it > uses > > > > server encoding, is binary represented and can have GIN indexes on > top > > of > > > > it. Not sure if Ignite's marshalling approach keeps those features as > > > well. > > > > > > > > On Thu, Jul 16, 2015 at 1:20 PM, Sergi Vladykin < > > > [hidden email]> > > > > wrote: > > > > > > > > > HSTORE and JSONB appeared to have similar format in Postgresql > > (because > > > > > they was developed by the same people). They noticed that they > > switched > > > > off > > > > > of using key length sorting because they sometimes need > > lexicographical > > > > > order but this is irrelevant for us. > > > > > > > > > > Sergi > > > > > > > > > > 2015-07-16 10:43 GMT+03:00 Atri Sharma <[hidden email]>: > > > > > > > > > > > Are you referring to JSONB here? > > > > > > > > > > > > On Thu, Jul 16, 2015 at 1:10 PM, Sergi Vladykin < > > > > > [hidden email]> > > > > > > wrote: > > > > > > > > > > > > > Guys, specially Alexey G. > > > > > > > > > > > > > > I've attended PostgreSQL conference and there was a talk about > > > > > > unstructured > > > > > > > data format. > > > > > > > They had an interesting idea of serialized layout close enough > to > > > > ours, > > > > > > I'm > > > > > > > not sure how much this is different from our approach and if we > > can > > > > use > > > > > > > some ideas from it but anywaus it looks really promising to me > > and > > > I > > > > > want > > > > > > > to share. > > > > > > > > > > > > > > The structure basically is the following: > > > > > > > > > > > > > > [key headers] [keys] [values] > > > > > > > > > > > > > > Key headers are [key offset, key length] so they are of a fixed > > > > length. > > > > > > > > > > > > > > The cool idea here is that keys and respectively the key > headers > > > > sorted > > > > > > by > > > > > > > (key length, key) so that you can do a lookup first by fast > > picking > > > > key > > > > > > of > > > > > > > the needed length without looking at keys at all and then pick > an > > > > exact > > > > > > > key. Both searches can be done with fast scan if there are > small > > > > number > > > > > > of > > > > > > > keys and binary search for a larger number of keys. > > > > > > > > > > > > > > Alexey G., could you please compare this to our new marshalling > > > > > approach > > > > > > > you are about to merge? > > > > > > > BTW, it would be nice if you will describe it in details here. > > > > > > > > > > > > > > Sergi > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > Regards, > > > > > > > > > > > > Atri > > > > > > *l'apprenant* > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > Regards, > > > > > > > > Atri > > > > *l'apprenant* > > > > > > > > > > |
2015-07-21 22:15 GMT-07:00 Atri Sharma <[hidden email]>:
> So does that mean that local hashmap is not controlled with all the heavy > locks that are present around the cache? > Metadata cache is basically populated once and then never changes, so only first get goes to the cache which is guarded by gateways and other logic, then result is cached in the local map and all ongoing gets are done from that map. |
In reply to this post by Atri Sharma
On Tue, Jul 21, 2015 at 10:15 PM, Atri Sharma <[hidden email]> wrote:
> So does that mean that local hashmap is not controlled with all the heavy > locks that are present around the cache? > Yes Atri, you are right. The data is stored in local hash map to avoid touching a distributed cache whenever serializing objects. > On 22 Jul 2015 07:31, "Alexey Goncharuk" <[hidden email]> > wrote: > > > Metadata cache access is backed by a local hash map, so the real cost is > a > > String object hashcode which is cached in the String object and a hashmap > > lookup by an integer key. > > > > On the other hand, the marshaller is still pluggable and after the ticket > > is completed, it should be fairly easy to implement this approach and > > compare performance. > > > > --Alexey > > > > 2015-07-21 10:01 GMT-07:00 Sergi Vladykin <[hidden email]>: > > > > > I think O(N) reasoning does not make a real sense here since N is > always > > > small, lets not fool ourselves. > > > To my mind operation cost of cache access (with all busy locks...), > > > hashCode/equals and stuff like that has much bigger impact here. > > > Do we still have a pluggable marshaller? Can my approach be implemented > > > separately? > > > > > > Sergi > > > > > > > > > > > > > > > 2015-07-21 9:14 GMT+03:00 Alexey Goncharuk <[hidden email] > >: > > > > > > > Currently an index-enabled serialized object form has the following > > > layout > > > > (simplified): > > > > > > > > [object fields][field1Offset,field1Length, > > > > field2Offset,field2Length,...,fieldNOffset,fieldNLength] > > > > > > > > where fields order is determined upon the first object serialization > > and > > > > stored in metadata cache which is available on all nodes. Thus, the > > field > > > > lookup is performed as follows: > > > > > > > > fieldName -> fieldIndex (metadata lookup, O(1)), > > fieldIndex->fieldOffset > > > in > > > > footer (O(1)), fieldOffset->fieldValue (O(1)). > > > > > > > > BTW, I am finalizing the branch with marshaller changes and will send > > > this > > > > for a preliminary review soon. > > > > > > > > 2015-07-16 0:55 GMT-07:00 Atri Sharma <[hidden email]>: > > > > > > > > > Keep in mind that JSONB's performance comes from the fact that it > > uses > > > > > server encoding, is binary represented and can have GIN indexes on > > top > > > of > > > > > it. Not sure if Ignite's marshalling approach keeps those features > as > > > > well. > > > > > > > > > > On Thu, Jul 16, 2015 at 1:20 PM, Sergi Vladykin < > > > > [hidden email]> > > > > > wrote: > > > > > > > > > > > HSTORE and JSONB appeared to have similar format in Postgresql > > > (because > > > > > > they was developed by the same people). They noticed that they > > > switched > > > > > off > > > > > > of using key length sorting because they sometimes need > > > lexicographical > > > > > > order but this is irrelevant for us. > > > > > > > > > > > > Sergi > > > > > > > > > > > > 2015-07-16 10:43 GMT+03:00 Atri Sharma <[hidden email]>: > > > > > > > > > > > > > Are you referring to JSONB here? > > > > > > > > > > > > > > On Thu, Jul 16, 2015 at 1:10 PM, Sergi Vladykin < > > > > > > [hidden email]> > > > > > > > wrote: > > > > > > > > > > > > > > > Guys, specially Alexey G. > > > > > > > > > > > > > > > > I've attended PostgreSQL conference and there was a talk > about > > > > > > > unstructured > > > > > > > > data format. > > > > > > > > They had an interesting idea of serialized layout close > enough > > to > > > > > ours, > > > > > > > I'm > > > > > > > > not sure how much this is different from our approach and if > we > > > can > > > > > use > > > > > > > > some ideas from it but anywaus it looks really promising to > me > > > and > > > > I > > > > > > want > > > > > > > > to share. > > > > > > > > > > > > > > > > The structure basically is the following: > > > > > > > > > > > > > > > > [key headers] [keys] [values] > > > > > > > > > > > > > > > > Key headers are [key offset, key length] so they are of a > fixed > > > > > length. > > > > > > > > > > > > > > > > The cool idea here is that keys and respectively the key > > headers > > > > > sorted > > > > > > > by > > > > > > > > (key length, key) so that you can do a lookup first by fast > > > picking > > > > > key > > > > > > > of > > > > > > > > the needed length without looking at keys at all and then > pick > > an > > > > > exact > > > > > > > > key. Both searches can be done with fast scan if there are > > small > > > > > number > > > > > > > of > > > > > > > > keys and binary search for a larger number of keys. > > > > > > > > > > > > > > > > Alexey G., could you please compare this to our new > marshalling > > > > > > approach > > > > > > > > you are about to merge? > > > > > > > > BTW, it would be nice if you will describe it in details > here. > > > > > > > > > > > > > > > > Sergi > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > Regards, > > > > > > > > > > > > > > Atri > > > > > > > *l'apprenant* > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > Regards, > > > > > > > > > > Atri > > > > > *l'apprenant* > > > > > > > > > > > > > > > |
Ok, looks good to me.
Sergi 2015-07-22 10:46 GMT+03:00 Dmitriy Setrakyan <[hidden email]>: > On Tue, Jul 21, 2015 at 10:15 PM, Atri Sharma <[hidden email]> wrote: > > > So does that mean that local hashmap is not controlled with all the heavy > > locks that are present around the cache? > > > > Yes Atri, you are right. The data is stored in local hash map to avoid > touching a distributed cache whenever serializing objects. > > > > On 22 Jul 2015 07:31, "Alexey Goncharuk" <[hidden email]> > > wrote: > > > > > Metadata cache access is backed by a local hash map, so the real cost > is > > a > > > String object hashcode which is cached in the String object and a > hashmap > > > lookup by an integer key. > > > > > > On the other hand, the marshaller is still pluggable and after the > ticket > > > is completed, it should be fairly easy to implement this approach and > > > compare performance. > > > > > > --Alexey > > > > > > 2015-07-21 10:01 GMT-07:00 Sergi Vladykin <[hidden email]>: > > > > > > > I think O(N) reasoning does not make a real sense here since N is > > always > > > > small, lets not fool ourselves. > > > > To my mind operation cost of cache access (with all busy locks...), > > > > hashCode/equals and stuff like that has much bigger impact here. > > > > Do we still have a pluggable marshaller? Can my approach be > implemented > > > > separately? > > > > > > > > Sergi > > > > > > > > > > > > > > > > > > > > 2015-07-21 9:14 GMT+03:00 Alexey Goncharuk < > [hidden email] > > >: > > > > > > > > > Currently an index-enabled serialized object form has the following > > > > layout > > > > > (simplified): > > > > > > > > > > [object fields][field1Offset,field1Length, > > > > > field2Offset,field2Length,...,fieldNOffset,fieldNLength] > > > > > > > > > > where fields order is determined upon the first object > serialization > > > and > > > > > stored in metadata cache which is available on all nodes. Thus, the > > > field > > > > > lookup is performed as follows: > > > > > > > > > > fieldName -> fieldIndex (metadata lookup, O(1)), > > > fieldIndex->fieldOffset > > > > in > > > > > footer (O(1)), fieldOffset->fieldValue (O(1)). > > > > > > > > > > BTW, I am finalizing the branch with marshaller changes and will > send > > > > this > > > > > for a preliminary review soon. > > > > > > > > > > 2015-07-16 0:55 GMT-07:00 Atri Sharma <[hidden email]>: > > > > > > > > > > > Keep in mind that JSONB's performance comes from the fact that it > > > uses > > > > > > server encoding, is binary represented and can have GIN indexes > on > > > top > > > > of > > > > > > it. Not sure if Ignite's marshalling approach keeps those > features > > as > > > > > well. > > > > > > > > > > > > On Thu, Jul 16, 2015 at 1:20 PM, Sergi Vladykin < > > > > > [hidden email]> > > > > > > wrote: > > > > > > > > > > > > > HSTORE and JSONB appeared to have similar format in Postgresql > > > > (because > > > > > > > they was developed by the same people). They noticed that they > > > > switched > > > > > > off > > > > > > > of using key length sorting because they sometimes need > > > > lexicographical > > > > > > > order but this is irrelevant for us. > > > > > > > > > > > > > > Sergi > > > > > > > > > > > > > > 2015-07-16 10:43 GMT+03:00 Atri Sharma <[hidden email]>: > > > > > > > > > > > > > > > Are you referring to JSONB here? > > > > > > > > > > > > > > > > On Thu, Jul 16, 2015 at 1:10 PM, Sergi Vladykin < > > > > > > > [hidden email]> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > Guys, specially Alexey G. > > > > > > > > > > > > > > > > > > I've attended PostgreSQL conference and there was a talk > > about > > > > > > > > unstructured > > > > > > > > > data format. > > > > > > > > > They had an interesting idea of serialized layout close > > enough > > > to > > > > > > ours, > > > > > > > > I'm > > > > > > > > > not sure how much this is different from our approach and > if > > we > > > > can > > > > > > use > > > > > > > > > some ideas from it but anywaus it looks really promising to > > me > > > > and > > > > > I > > > > > > > want > > > > > > > > > to share. > > > > > > > > > > > > > > > > > > The structure basically is the following: > > > > > > > > > > > > > > > > > > [key headers] [keys] [values] > > > > > > > > > > > > > > > > > > Key headers are [key offset, key length] so they are of a > > fixed > > > > > > length. > > > > > > > > > > > > > > > > > > The cool idea here is that keys and respectively the key > > > headers > > > > > > sorted > > > > > > > > by > > > > > > > > > (key length, key) so that you can do a lookup first by fast > > > > picking > > > > > > key > > > > > > > > of > > > > > > > > > the needed length without looking at keys at all and then > > pick > > > an > > > > > > exact > > > > > > > > > key. Both searches can be done with fast scan if there are > > > small > > > > > > number > > > > > > > > of > > > > > > > > > keys and binary search for a larger number of keys. > > > > > > > > > > > > > > > > > > Alexey G., could you please compare this to our new > > marshalling > > > > > > > approach > > > > > > > > > you are about to merge? > > > > > > > > > BTW, it would be nice if you will describe it in details > > here. > > > > > > > > > > > > > > > > > > Sergi > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > Regards, > > > > > > > > > > > > > > > > Atri > > > > > > > > *l'apprenant* > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > Regards, > > > > > > > > > > > > Atri > > > > > > *l'apprenant* > > > > > > > > > > > > > > > > > > > > > |
+1 from me too
On Wed, Jul 22, 2015 at 3:34 PM, Sergi Vladykin <[hidden email]> wrote: > Ok, looks good to me. > > Sergi > > 2015-07-22 10:46 GMT+03:00 Dmitriy Setrakyan <[hidden email]>: > > > On Tue, Jul 21, 2015 at 10:15 PM, Atri Sharma <[hidden email]> > wrote: > > > > > So does that mean that local hashmap is not controlled with all the > heavy > > > locks that are present around the cache? > > > > > > > Yes Atri, you are right. The data is stored in local hash map to avoid > > touching a distributed cache whenever serializing objects. > > > > > > > On 22 Jul 2015 07:31, "Alexey Goncharuk" <[hidden email]> > > > wrote: > > > > > > > Metadata cache access is backed by a local hash map, so the real cost > > is > > > a > > > > String object hashcode which is cached in the String object and a > > hashmap > > > > lookup by an integer key. > > > > > > > > On the other hand, the marshaller is still pluggable and after the > > ticket > > > > is completed, it should be fairly easy to implement this approach and > > > > compare performance. > > > > > > > > --Alexey > > > > > > > > 2015-07-21 10:01 GMT-07:00 Sergi Vladykin <[hidden email] > >: > > > > > > > > > I think O(N) reasoning does not make a real sense here since N is > > > always > > > > > small, lets not fool ourselves. > > > > > To my mind operation cost of cache access (with all busy locks...), > > > > > hashCode/equals and stuff like that has much bigger impact here. > > > > > Do we still have a pluggable marshaller? Can my approach be > > implemented > > > > > separately? > > > > > > > > > > Sergi > > > > > > > > > > > > > > > > > > > > > > > > > 2015-07-21 9:14 GMT+03:00 Alexey Goncharuk < > > [hidden email] > > > >: > > > > > > > > > > > Currently an index-enabled serialized object form has the > following > > > > > layout > > > > > > (simplified): > > > > > > > > > > > > [object fields][field1Offset,field1Length, > > > > > > field2Offset,field2Length,...,fieldNOffset,fieldNLength] > > > > > > > > > > > > where fields order is determined upon the first object > > serialization > > > > and > > > > > > stored in metadata cache which is available on all nodes. Thus, > the > > > > field > > > > > > lookup is performed as follows: > > > > > > > > > > > > fieldName -> fieldIndex (metadata lookup, O(1)), > > > > fieldIndex->fieldOffset > > > > > in > > > > > > footer (O(1)), fieldOffset->fieldValue (O(1)). > > > > > > > > > > > > BTW, I am finalizing the branch with marshaller changes and will > > send > > > > > this > > > > > > for a preliminary review soon. > > > > > > > > > > > > 2015-07-16 0:55 GMT-07:00 Atri Sharma <[hidden email]>: > > > > > > > > > > > > > Keep in mind that JSONB's performance comes from the fact that > it > > > > uses > > > > > > > server encoding, is binary represented and can have GIN indexes > > on > > > > top > > > > > of > > > > > > > it. Not sure if Ignite's marshalling approach keeps those > > features > > > as > > > > > > well. > > > > > > > > > > > > > > On Thu, Jul 16, 2015 at 1:20 PM, Sergi Vladykin < > > > > > > [hidden email]> > > > > > > > wrote: > > > > > > > > > > > > > > > HSTORE and JSONB appeared to have similar format in > Postgresql > > > > > (because > > > > > > > > they was developed by the same people). They noticed that > they > > > > > switched > > > > > > > off > > > > > > > > of using key length sorting because they sometimes need > > > > > lexicographical > > > > > > > > order but this is irrelevant for us. > > > > > > > > > > > > > > > > Sergi > > > > > > > > > > > > > > > > 2015-07-16 10:43 GMT+03:00 Atri Sharma <[hidden email] > >: > > > > > > > > > > > > > > > > > Are you referring to JSONB here? > > > > > > > > > > > > > > > > > > On Thu, Jul 16, 2015 at 1:10 PM, Sergi Vladykin < > > > > > > > > [hidden email]> > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > Guys, specially Alexey G. > > > > > > > > > > > > > > > > > > > > I've attended PostgreSQL conference and there was a talk > > > about > > > > > > > > > unstructured > > > > > > > > > > data format. > > > > > > > > > > They had an interesting idea of serialized layout close > > > enough > > > > to > > > > > > > ours, > > > > > > > > > I'm > > > > > > > > > > not sure how much this is different from our approach and > > if > > > we > > > > > can > > > > > > > use > > > > > > > > > > some ideas from it but anywaus it looks really promising > to > > > me > > > > > and > > > > > > I > > > > > > > > want > > > > > > > > > > to share. > > > > > > > > > > > > > > > > > > > > The structure basically is the following: > > > > > > > > > > > > > > > > > > > > [key headers] [keys] [values] > > > > > > > > > > > > > > > > > > > > Key headers are [key offset, key length] so they are of a > > > fixed > > > > > > > length. > > > > > > > > > > > > > > > > > > > > The cool idea here is that keys and respectively the key > > > > headers > > > > > > > sorted > > > > > > > > > by > > > > > > > > > > (key length, key) so that you can do a lookup first by > fast > > > > > picking > > > > > > > key > > > > > > > > > of > > > > > > > > > > the needed length without looking at keys at all and then > > > pick > > > > an > > > > > > > exact > > > > > > > > > > key. Both searches can be done with fast scan if there > are > > > > small > > > > > > > number > > > > > > > > > of > > > > > > > > > > keys and binary search for a larger number of keys. > > > > > > > > > > > > > > > > > > > > Alexey G., could you please compare this to our new > > > marshalling > > > > > > > > approach > > > > > > > > > > you are about to merge? > > > > > > > > > > BTW, it would be nice if you will describe it in details > > > here. > > > > > > > > > > > > > > > > > > > > Sergi > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > Regards, > > > > > > > > > > > > > > > > > > Atri > > > > > > > > > *l'apprenant* > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > Regards, > > > > > > > > > > > > > > Atri > > > > > > > *l'apprenant* > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- Regards, Atri *l'apprenant* |
+1 as well.
I wonder if it makes sense to further enhance the header fields to have more metadata readily available (so that we can support more JSONB like operators in a much efficient manner). On Wed, Jul 22, 2015 at 3:37 PM, Atri Sharma <[hidden email]> wrote: > +1 from me too > > On Wed, Jul 22, 2015 at 3:34 PM, Sergi Vladykin <[hidden email]> > wrote: > >> Ok, looks good to me. >> >> Sergi >> >> 2015-07-22 10:46 GMT+03:00 Dmitriy Setrakyan <[hidden email]>: >> >> > On Tue, Jul 21, 2015 at 10:15 PM, Atri Sharma <[hidden email]> >> wrote: >> > >> > > So does that mean that local hashmap is not controlled with all the >> heavy >> > > locks that are present around the cache? >> > > >> > >> > Yes Atri, you are right. The data is stored in local hash map to avoid >> > touching a distributed cache whenever serializing objects. >> > >> > >> > > On 22 Jul 2015 07:31, "Alexey Goncharuk" <[hidden email]> >> > > wrote: >> > > >> > > > Metadata cache access is backed by a local hash map, so the real >> cost >> > is >> > > a >> > > > String object hashcode which is cached in the String object and a >> > hashmap >> > > > lookup by an integer key. >> > > > >> > > > On the other hand, the marshaller is still pluggable and after the >> > ticket >> > > > is completed, it should be fairly easy to implement this approach >> and >> > > > compare performance. >> > > > >> > > > --Alexey >> > > > >> > > > 2015-07-21 10:01 GMT-07:00 Sergi Vladykin <[hidden email] >> >: >> > > > >> > > > > I think O(N) reasoning does not make a real sense here since N is >> > > always >> > > > > small, lets not fool ourselves. >> > > > > To my mind operation cost of cache access (with all busy >> locks...), >> > > > > hashCode/equals and stuff like that has much bigger impact here. >> > > > > Do we still have a pluggable marshaller? Can my approach be >> > implemented >> > > > > separately? >> > > > > >> > > > > Sergi >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > 2015-07-21 9:14 GMT+03:00 Alexey Goncharuk < >> > [hidden email] >> > > >: >> > > > > >> > > > > > Currently an index-enabled serialized object form has the >> following >> > > > > layout >> > > > > > (simplified): >> > > > > > >> > > > > > [object fields][field1Offset,field1Length, >> > > > > > field2Offset,field2Length,...,fieldNOffset,fieldNLength] >> > > > > > >> > > > > > where fields order is determined upon the first object >> > serialization >> > > > and >> > > > > > stored in metadata cache which is available on all nodes. Thus, >> the >> > > > field >> > > > > > lookup is performed as follows: >> > > > > > >> > > > > > fieldName -> fieldIndex (metadata lookup, O(1)), >> > > > fieldIndex->fieldOffset >> > > > > in >> > > > > > footer (O(1)), fieldOffset->fieldValue (O(1)). >> > > > > > >> > > > > > BTW, I am finalizing the branch with marshaller changes and will >> > send >> > > > > this >> > > > > > for a preliminary review soon. >> > > > > > >> > > > > > 2015-07-16 0:55 GMT-07:00 Atri Sharma <[hidden email]>: >> > > > > > >> > > > > > > Keep in mind that JSONB's performance comes from the fact >> that it >> > > > uses >> > > > > > > server encoding, is binary represented and can have GIN >> indexes >> > on >> > > > top >> > > > > of >> > > > > > > it. Not sure if Ignite's marshalling approach keeps those >> > features >> > > as >> > > > > > well. >> > > > > > > >> > > > > > > On Thu, Jul 16, 2015 at 1:20 PM, Sergi Vladykin < >> > > > > > [hidden email]> >> > > > > > > wrote: >> > > > > > > >> > > > > > > > HSTORE and JSONB appeared to have similar format in >> Postgresql >> > > > > (because >> > > > > > > > they was developed by the same people). They noticed that >> they >> > > > > switched >> > > > > > > off >> > > > > > > > of using key length sorting because they sometimes need >> > > > > lexicographical >> > > > > > > > order but this is irrelevant for us. >> > > > > > > > >> > > > > > > > Sergi >> > > > > > > > >> > > > > > > > 2015-07-16 10:43 GMT+03:00 Atri Sharma <[hidden email] >> >: >> > > > > > > > >> > > > > > > > > Are you referring to JSONB here? >> > > > > > > > > >> > > > > > > > > On Thu, Jul 16, 2015 at 1:10 PM, Sergi Vladykin < >> > > > > > > > [hidden email]> >> >> > > > > > > > > wrote: >> > > > > > > > > >> > > > > > > > > > Guys, specially Alexey G. >> > > > > > > > > > >> > > > > > > > > > I've attended PostgreSQL conference and there was a talk >> > > about >> > > > > > > > > unstructured >> > > > > > > > > > data format. >> > > > > > > > > > They had an interesting idea of serialized layout close >> > > enough >> > > > to >> > > > > > > ours, >> > > > > > > > > I'm >> > > > > > > > > > not sure how much this is different from our approach >> and >> > if >> > > we >> > > > > can >> > > > > > > use >> > > > > > > > > > some ideas from it but anywaus it looks really >> promising to >> > > me >> > > > > and >> > > > > > I >> > > > > > > > want >> > > > > > > > > > to share. >> > > > > > > > > > >> > > > > > > > > > The structure basically is the following: >> > > > > > > > > > >> > > > > > > > > > [key headers] [keys] [values] >> > > > > > > > > > >> > > > > > > > > > Key headers are [key offset, key length] so they are of >> a >> > > fixed >> > > > > > > length. >> > > > > > > > > > >> > > > > > > > > > The cool idea here is that keys and respectively the key >> > > > headers >> > > > > > > sorted >> > > > > > > > > by >> > > > > > > > > > (key length, key) so that you can do a lookup first by >> fast >> > > > > picking >> > > > > > > key >> > > > > > > > > of >> > > > > > > > > > the needed length without looking at keys at all and >> then >> > > pick >> > > > an >> > > > > > > exact >> > > > > > > > > > key. Both searches can be done with fast scan if there >> are >> > > > small >> > > > > > > number >> > > > > > > > > of >> > > > > > > > > > keys and binary search for a larger number of keys. >> > > > > > > > > > >> > > > > > > > > > Alexey G., could you please compare this to our new >> > > marshalling >> > > > > > > > approach >> > > > > > > > > > you are about to merge? >> > > > > > > > > > BTW, it would be nice if you will describe it in details >> > > here. >> > > > > > > > > > >> > > > > > > > > > Sergi >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > > -- >> > > > > > > > > Regards, >> > > > > > > > > >> > > > > > > > > Atri >> > > > > > > > > *l'apprenant* >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > -- >> > > > > > > Regards, >> > > > > > > >> > > > > > > Atri >> > > > > > > *l'apprenant* >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> > > > > -- > Regards, > > Atri > *l'apprenant* > -- Regards, Atri *l'apprenant* |
Free forum by Nabble | Edit this page |