Guys,
It seems that for simple equality queries like SELECT * FROM x WHERE a = ? it is more effective to use hash indexes which do not even need to be snapshotable. I think it will be easy to implement one based on ConcurrentHashMap. Thoughts? Sergi |
Sergi,
Does it mean that field "a" will now have 2 indexes, hash and sorted? D. On Fri, Sep 18, 2015 at 4:01 PM, Sergi Vladykin <[hidden email]> wrote: > Guys, > > It seems that for simple equality queries like > > SELECT * FROM x WHERE a = ? > > it is more effective to use hash indexes which do not even need to be > snapshotable. > I think it will be easy to implement one based on ConcurrentHashMap. > > Thoughts? > > Sergi > |
Not necessary, you can configure to have either sorted index or hash index
or both. In the last case as far as I understand optimizer just will pick up hash index for equality conditions because it will have lower cost. The only thing I'm currently not sure of is how to add this to configuration (our indexed types config already looks like piece of crap, don't want to complicate it even more). Sergi 2015-09-18 19:05 GMT+03:00 Dmitriy Setrakyan <[hidden email]>: > Sergi, > > Does it mean that field "a" will now have 2 indexes, hash and sorted? > > D. > > On Fri, Sep 18, 2015 at 4:01 PM, Sergi Vladykin <[hidden email]> > wrote: > > > Guys, > > > > It seems that for simple equality queries like > > > > SELECT * FROM x WHERE a = ? > > > > it is more effective to use hash indexes which do not even need to be > > snapshotable. > > I think it will be easy to implement one based on ConcurrentHashMap. > > > > Thoughts? > > > > Sergi > > > |
On Sat, Sep 19, 2015 at 2:04 PM, Sergi Vladykin <[hidden email]>
wrote: > Not necessary, you can configure to have either sorted index or hash index > or both. > In the last case as far as I understand optimizer just will pick up hash > index for > equality conditions because it will have lower cost. > > The only thing I'm currently not sure of is how to add this to > configuration > (our indexed types config already looks like piece of crap, don't want to > complicate it even more). > I think index type should be specified at the annotation level. As far as configuring query metadata in XML, I agree with you, we should clean up the design. > > Sergi > > 2015-09-18 19:05 GMT+03:00 Dmitriy Setrakyan <[hidden email]>: > > > Sergi, > > > > Does it mean that field "a" will now have 2 indexes, hash and sorted? > > > > D. > > > > On Fri, Sep 18, 2015 at 4:01 PM, Sergi Vladykin < > [hidden email]> > > wrote: > > > > > Guys, > > > > > > It seems that for simple equality queries like > > > > > > SELECT * FROM x WHERE a = ? > > > > > > it is more effective to use hash indexes which do not even need to be > > > snapshotable. > > > I think it will be easy to implement one based on ConcurrentHashMap. > > > > > > Thoughts? > > > > > > Sergi > > > > > > |
As part of this discussion I want to talk about non-snapshotable indexes in
general. There are few things we can do to improve performance of queries and indexing. 1. We can implement SkipList based index. Probably it will not be much faster than current SnapTree, but GC and Optimizer will feel better. 2. We can implement ConcurrentMap based index as I wrote earlier. For joins on big caches it must give us a good boost. 3. Because these indexes are not snapshotable we can eliminate taking writeLock and for taking snapshots. This writeLock stops concurrent updates of indexes, so on mixed query/update workloads it will be noticeable improvement. 4. Implement off-heap SkipList. I already did that for Hadoop Accelerator (though it was append only), so this code can be taken as a base. 5. Implement offheap ConcurrentMap. Probably we can use something from our current GridOffheapProcessor. The benefit of this is improved performance. The drawback is that all the updates (even partial) will be immediately visible and index search can occur when index is in inconsistent state like when on update the old value was already removed from index but the new one was not added yet. I think items 1-2 can be done in a few days, item 3 can take couple of days as well. Items 4-5 are probably the toughest ones, I think we should postpone them until we will be production ready with 1-3. Sergi 2015-09-19 17:49 GMT+03:00 Dmitriy Setrakyan <[hidden email]>: > On Sat, Sep 19, 2015 at 2:04 PM, Sergi Vladykin <[hidden email]> > wrote: > > > Not necessary, you can configure to have either sorted index or hash > index > > or both. > > In the last case as far as I understand optimizer just will pick up hash > > index for > > equality conditions because it will have lower cost. > > > > The only thing I'm currently not sure of is how to add this to > > configuration > > (our indexed types config already looks like piece of crap, don't want to > > complicate it even more). > > > > I think index type should be specified at the annotation level. As far as > configuring query metadata in XML, I agree with you, we should clean up the > design. > > > > > > Sergi > > > > 2015-09-18 19:05 GMT+03:00 Dmitriy Setrakyan <[hidden email]>: > > > > > Sergi, > > > > > > Does it mean that field "a" will now have 2 indexes, hash and sorted? > > > > > > D. > > > > > > On Fri, Sep 18, 2015 at 4:01 PM, Sergi Vladykin < > > [hidden email]> > > > wrote: > > > > > > > Guys, > > > > > > > > It seems that for simple equality queries like > > > > > > > > SELECT * FROM x WHERE a = ? > > > > > > > > it is more effective to use hash indexes which do not even need to be > > > > snapshotable. > > > > I think it will be easy to implement one based on ConcurrentHashMap. > > > > > > > > Thoughts? > > > > > > > > Sergi > > > > > > > > > > |
Sergey,
Thanks for detailed explanation! I agree with your suggestion, let's do on-heap (points 1 to 3) first and then switch to off-heap. Can you please create a parent ticket with necessary subtasks, so we can properly track it in Jira? D. On Wed, Sep 23, 2015 at 11:10 AM, Sergi Vladykin <[hidden email]> wrote: > As part of this discussion I want to talk about non-snapshotable indexes in > general. > > There are few things we can do to improve performance of queries and > indexing. > > 1. We can implement SkipList based index. Probably it will not be much > faster than current SnapTree, but GC and Optimizer will feel better. > 2. We can implement ConcurrentMap based index as I wrote earlier. For joins > on big caches it must give us a good boost. > > 3. Because these indexes are not snapshotable we can eliminate taking > writeLock and for taking snapshots. This writeLock stops concurrent updates > of indexes, so on mixed query/update workloads it will be noticeable > improvement. > > 4. Implement off-heap SkipList. I already did that for Hadoop Accelerator > (though it was append only), so this code can be taken as a base. > 5. Implement offheap ConcurrentMap. Probably we can use something from our > current GridOffheapProcessor. > > The benefit of this is improved performance. > The drawback is that all the updates (even partial) will be immediately > visible and index search can occur when index is in inconsistent state like > when on update the old value was already removed from index but the new one > was not added yet. > > I think items 1-2 can be done in a few days, item 3 can take couple of days > as well. Items 4-5 are probably the toughest ones, I think we should > postpone them until we will be production ready with 1-3. > > Sergi > > > > 2015-09-19 17:49 GMT+03:00 Dmitriy Setrakyan <[hidden email]>: > > > On Sat, Sep 19, 2015 at 2:04 PM, Sergi Vladykin < > [hidden email]> > > wrote: > > > > > Not necessary, you can configure to have either sorted index or hash > > index > > > or both. > > > In the last case as far as I understand optimizer just will pick up > hash > > > index for > > > equality conditions because it will have lower cost. > > > > > > The only thing I'm currently not sure of is how to add this to > > > configuration > > > (our indexed types config already looks like piece of crap, don't want > to > > > complicate it even more). > > > > > > > I think index type should be specified at the annotation level. As far as > > configuring query metadata in XML, I agree with you, we should clean up > the > > design. > > > > > > > > > > Sergi > > > > > > 2015-09-18 19:05 GMT+03:00 Dmitriy Setrakyan <[hidden email]>: > > > > > > > Sergi, > > > > > > > > Does it mean that field "a" will now have 2 indexes, hash and sorted? > > > > > > > > D. > > > > > > > > On Fri, Sep 18, 2015 at 4:01 PM, Sergi Vladykin < > > > [hidden email]> > > > > wrote: > > > > > > > > > Guys, > > > > > > > > > > It seems that for simple equality queries like > > > > > > > > > > SELECT * FROM x WHERE a = ? > > > > > > > > > > it is more effective to use hash indexes which do not even need to > be > > > > > snapshotable. > > > > > I think it will be easy to implement one based on > ConcurrentHashMap. > > > > > > > > > > Thoughts? > > > > > > > > > > Sergi > > > > > > > > > > > > > > > |
Free forum by Nabble | Edit this page |